deepcausalmmm.core

Core components of DeepCausalMMM.

Functions

train_mmm(*args, **kwargs)

class deepcausalmmm.core.DeepCausalMMM(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)[source]

Deep Causal Marketing Mix Model with DAG structure and channel interactions.

This model combines deep learning with causal inference to understand the impact of marketing channels on business KPIs while learning causal relationships between channels through a Directed Acyclic Graph (DAG).

The model features:

  • GRU-based temporal modeling for time-varying coefficients

  • Learnable coefficient bounds for realistic attribution

  • DAG learning for causal channel interactions (triangular mask or opt-in NOTEARS)

  • Adstock and saturation transformations

  • Multi-region support with shared and region-specific parameters

  • Zero-hardcoding philosophy: all parameters are learnable or configurable

Parameters:
  • n_media (int, default=10) – Number of media channels in the dataset

  • ctrl_dim (int, default=15) – Number of control variables (weather, events, etc.)

  • hidden (int, default=32) – Hidden dimension size for GRU and MLP layers

  • n_regions (int, default=2) – Number of geographic regions or DMAs

  • dropout (float, default=0.1) – Dropout rate for regularization during training

  • sparsity_weight (float, default=0.01) – Weight for sparsity regularization on coefficients

  • enable_dag (bool, default=True) – Whether to enable DAG learning for channel interactions

  • enable_interactions (bool, default=True) – Whether to enable channel interaction modeling

  • l1_weight (float, default=0.001) – L1 regularization weight for coefficient sparsity

  • l2_weight (float, default=0.001) – L2 regularization weight for coefficient smoothness

  • burn_in_weeks (int, default=4) – Number of initial weeks for GRU stabilization

  • use_coefficient_momentum (bool, default=True) – Whether to use momentum for coefficient stabilization

  • momentum_decay (float, default=0.9) – Decay rate for coefficient momentum

  • use_warm_start (bool, default=True) – Whether to use warm start training initialization

  • warm_start_epochs (int, default=50) – Number of epochs for warm start phase

  • stabilization_method (str, default="exponential") – Method for coefficient stabilization (“linear”, “exponential”, “sigmoid”)

  • coeff_l2_weight (float, default=0.1) – L2 regularization specifically for media coefficients

  • coeff_gen_l2_weight (float, default=0.05) – L2 regularization for coefficient generation layers

  • gru_layers (int, default=1) – Number of GRU layers (configured, not hardcoded)

  • ctrl_hidden_ratio (float, default=0.5) – Control hidden size as ratio of main hidden dimension

  • dag_mode (str, default="triangular") – DAG acyclicity mode: "triangular" (upper-triangular mask, default) or "notears" (continuous NOTEARS penalty; Zheng et al., 2018)

  • notears_lambda1 (float, default=0.01) – L1 sparsity on the full adjacency in NOTEARS mode

  • notears_rho_init (float, default=1.0) – Initial augmented-Lagrangian penalty rho

  • notears_alpha_init (float, default=0.0) – Initial dual variable alpha for NOTEARS

  • notears_rho_max (float, default=1e16) – Upper cap on rho for numerical safety

  • dag_temperature (float, default=1.0) – Sigmoid temperature for DAG edge weights (< 1 sharpens toward {0, 1})

  • notears_group_l1 (float, default=0.0) – Column-group L1 over adjacency columns (NOTEARS only)

media_coeffs

Time-varying coefficients for media channels

Type:

torch.nn.Parameter

ctrl_coeffs

Coefficients for control variables

Type:

torch.nn.Parameter

dag_matrix

Learnable DAG adjacency matrix for channel interactions

Type:

torch.nn.Parameter

region_baseline

Region-specific baseline contributions

Type:

torch.nn.Parameter

seasonal_coeff

Learnable coefficient for seasonal component

Type:

torch.nn.Parameter

Examples

>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>>
>>> # Initialize model
>>> model = DeepCausalMMM(
...     n_media=5,
...     ctrl_dim=3,
...     n_regions=2,
...     hidden=64
... )
>>>
>>> # Prepare data tensors
>>> media_data = torch.randn(2, 104, 5)    # [regions, weeks, channels]
>>> control_data = torch.randn(2, 104, 3)  # [regions, weeks, controls]
>>> regions = torch.arange(2).unsqueeze(1).repeat(1, 104)
>>>
>>> # Forward pass
>>> predictions, media_coeffs, media_contributions, outputs = model(
...     media_data, control_data, regions
... )
>>>
>>> print(f"Predictions shape: {predictions.shape}")
>>> print(f"Media contributions: {outputs['contributions'].shape}")
__init__(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

initialize_baseline(y_data: Tensor)[source]

Initialize baseline to match target data statistics.

Note: y_data is expected to already be in scaled space (y / y_mean per region). All baseline parameters are extracted directly from the observed data distribution, and padding weeks are skipped so that they do not bias the baseline.

initialize_hill_from_data(Xm: Tensor)[source]

Initialize hill_g based on per-channel SOV (Share of Voice) distribution.

For each channel, hill_g is set to the 60th percentile of its SOV values, ensuring the inflection point matches where the channel typically operates.

Parameters:

Xm – Media data [n_regions, n_timesteps, n_channels] in SOV-scaled space [0, 1]
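The percentile rule above can be sketched as follows. This is an illustration only, not the library's implementation: the helper name hill_g_from_sov and the choice to pool regions and weeks before taking the quantile are assumptions.

```python
import torch


def hill_g_from_sov(Xm: torch.Tensor, q: float = 0.60) -> torch.Tensor:
    """Per-channel quantile of SOV values.

    Xm has shape [n_regions, n_timesteps, n_channels]; the result has
    one value per channel. (Hypothetical helper, for illustration.)
    """
    n_channels = Xm.shape[-1]
    flat = Xm.reshape(-1, n_channels)      # pool regions and weeks
    return torch.quantile(flat, q, dim=0)  # 60th percentile by default


Xm = torch.rand(2, 104, 5)  # synthetic SOV-scaled media in [0, 1)
hill_g = hill_g_from_sov(Xm)
print(hill_g.shape)
```

Placing the half-saturation point near a typical operating level keeps the Hill curve responsive in the range where the channel actually spends.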

initialize_stable_coefficients_from_data(Xm: Tensor, Xc: Tensor, y: Tensor)[source]

Initialize stable coefficients based on simple linear regression on the data. This provides domain-informed starting points for coefficient stabilization.

warm_start_training(Xm: Tensor, Xc: Tensor, R: Tensor, y: Tensor, optimizer: Optimizer, epochs: int = None)[source]

Warm-start training phase to stabilize GRU coefficients before main training. Uses only stable coefficients and focuses on learning good hidden state initialization.

adstock(x: Tensor) Tensor[source]

STABILIZED adstock transformation.

hill(x: Tensor) Tensor[source]

STABILIZED Hill saturation transformation.
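For readers unfamiliar with the two transformations, here is a minimal sketch of their textbook forms: geometric adstock and a Hill curve. The source does not spell out the stabilized variants used by the model, so the function names, the decay, g, and a parameters, and the exact formulas below are assumptions.

```python
import torch


def geometric_adstock(x: torch.Tensor, decay: float = 0.5) -> torch.Tensor:
    """Carry-over along the time axis: out[t] = x[t] + decay * out[t-1].

    x has shape [batch, time, channels]. Textbook form, not the
    library's stabilized version.
    """
    out = torch.zeros_like(x)
    carry = torch.zeros_like(x[:, 0])
    for t in range(x.shape[1]):
        carry = x[:, t] + decay * carry
        out[:, t] = carry
    return out


def hill_saturation(x: torch.Tensor, g: float = 0.5, a: float = 2.0,
                    eps: float = 1e-8) -> torch.Tensor:
    """Hill curve x^a / (x^a + g^a); equals about 0.5 at x == g."""
    xa = x.clamp_min(0.0).pow(a)
    return xa / (xa + g ** a + eps)


x = torch.zeros(1, 4, 1)
x[0, 0, 0] = 1.0                          # single impulse in week 0
adstocked = geometric_adstock(x, decay=0.5)
half = hill_saturation(torch.tensor([0.5]), g=0.5)
print(adstocked[0, :, 0], half)
```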

dag_interaction(x: Tensor) Tensor[source]

Load-bearing DAG-driven channel interactions.

Each target channel j’s effective input becomes a learned blend of itself and a DAG-driven aggregation of its causal parents:

parents_j = sum_i adj[i, j] * x_i    (column j of adj applied to x)

x_j_new = (1 - mix_j) * x_j + mix_j * parents_j

Two design choices make NOTEARS actually structure-learning here rather than decorative:

  • mix is a per-channel learnable scalar in (0, 1). The model can turn the DAG up where it helps prediction (strong causal parents) and down where it doesn’t, so each adj column receives genuine per-channel gradient signal rather than a globally-averaged one.

  • adj uses a temperature-controlled sigmoid (dag_temperature), so the model is encouraged toward {near-0, near-1} edges instead of a soft cluster around 0.5 / sigmoid floor — this is the standard trick for making sigmoid gates bi-modal.

With the previous additive form x + scalar * matmul(x, adj) the loss was satisfied even with a uniform adjacency at the L1 floor, which is why the learned graph collapsed to ~equal edges. The blended form forces the model to either pick informative parents or keep mix_j close to 0 (effectively ignoring the DAG for that channel).
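The blended form can be sketched in a few lines. This is a simplified illustration for the triangular mode: the function name, the logit parameterization of adj and mix, and the mask construction are assumptions, not the library's code.

```python
import torch


def blend_with_parents(x: torch.Tensor, adj_logits: torch.Tensor,
                       mix_logits: torch.Tensor,
                       temperature: float = 1.0) -> torch.Tensor:
    """Return (1 - mix_j) * x_j + mix_j * sum_i adj[i, j] * x_i.

    x has shape [batch, time, n_media]. Hypothetical helper.
    """
    n = x.shape[-1]
    adj = torch.sigmoid(adj_logits / temperature)         # edge weights in (0, 1)
    adj = adj * torch.triu(torch.ones(n, n), diagonal=1)  # keep only edges i -> j with i < j
    mix = torch.sigmoid(mix_logits)                       # per-channel scalar in (0, 1)
    parents = x @ adj                                     # parents[..., j] = sum_i x[..., i] * adj[i, j]
    return (1.0 - mix) * x + mix * parents


x = torch.rand(2, 6, 3)
adj_logits = torch.zeros(3, 3)  # every allowed edge has weight 0.5
dag_off = blend_with_parents(x, adj_logits, torch.full((3,), -50.0))
dag_on = blend_with_parents(x, adj_logits, torch.full((3,), 50.0))
print(dag_off.shape, dag_on.shape)
```

With mix close to 0 the input passes through unchanged; with mix close to 1 each channel is replaced by the weighted sum of its parents, so the first channel (which has no parents under the mask) goes to zero.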

process_media(X: Tensor) Tuple[Tensor, Dict[str, Tensor]][source]

Process media variables through transformations.

apply_burn_in_stabilization(coeffs: Tensor, stable_coeff: Tensor) Tensor[source]

Advanced burn-in stabilization with multiple transition methods.

Parameters:
  • coeffs – Time-varying coefficients [B, T, dim]

  • stable_coeff – Stable reference coefficients [dim]

Returns:

Stabilized coefficients with smooth burn-in transition
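One plausible reading of the burn-in transition is a time-dependent blend between the stable reference and the GRU-generated coefficients. The sketch below is an assumption throughout: the weight schedules, the constants 3.0 and 6.0, and the helper name are hypothetical and are not taken from the library.

```python
import torch


def burn_in_blend(coeffs: torch.Tensor, stable_coeff: torch.Tensor,
                  burn_in_weeks: int = 4,
                  method: str = "exponential") -> torch.Tensor:
    """Blend w_t * coeffs + (1 - w_t) * stable_coeff, with w_t rising from 0.

    coeffs: [B, T, dim], stable_coeff: [dim]. Hypothetical schedules.
    """
    T = coeffs.shape[1]
    t = torch.arange(T, dtype=coeffs.dtype)
    if method == "linear":
        w = (t / burn_in_weeks).clamp(max=1.0)
    elif method == "exponential":
        w = 1.0 - torch.exp(-3.0 * t / burn_in_weeks)
    elif method == "sigmoid":
        w = torch.sigmoid(6.0 * (t / burn_in_weeks - 0.5))
    else:
        raise ValueError(f"unknown method: {method}")
    w = w.view(1, T, 1)  # broadcast over batch and coefficient dims
    return w * coeffs + (1.0 - w) * stable_coeff


coeffs = torch.ones(1, 8, 2)
stable = torch.zeros(2)
blended = burn_in_blend(coeffs, stable, burn_in_weeks=4, method="linear")
print(blended[0, :, 0])
```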

forward(Xm: Tensor, Xc: Tensor, R: Tensor) Tuple[Tensor, Tensor, Tensor, Dict[str, Any]][source]

Forward pass through the DeepCausalMMM model.

Processes media and control variables through the neural network to generate predictions, time-varying coefficients, per-channel media contributions, and a detailed outputs dict (baseline, seasonality, control contributions, DAG, etc.).

Parameters:
  • Xm (torch.Tensor) – Media data tensor of shape [batch_size, time_steps, n_media] Should be SOV-scaled (Share of Voice) normalized to [0, 1] range

  • Xc (torch.Tensor) – Control variables tensor of shape [batch_size, time_steps, ctrl_dim] Should be standardized (z-score normalized)

  • R (torch.Tensor) – Region indicators tensor of shape [batch_size, time_steps] Integer values representing region/DMA IDs

Returns:

  • predictions (torch.Tensor) – Model predictions (scaled KPI), shape typically [batch_size, time_steps] (broadcast with components before prediction_scale).

  • media_coefficients (torch.Tensor) – Time-varying media coefficients, shape [batch_size, time_steps, n_media].

  • media_contributions (torch.Tensor) – Per-channel media contributions X_processed * media_coefficients, shape [batch_size, time_steps, n_media].

  • outputs (Dict[str, Any]) – Detailed tensors, including:

      - contributions: same as media_contributions (media channel breakdown)
      - coefficients: same as the returned media_coefficients
      - control_contributions: control variable contributions [batch, time, ctrl_dim]
      - control_coefficients: control coefficients [batch, time, ctrl_dim]
      - baseline: baseline without seasonality (for waterfall-style splits)
      - seasonal_contribution: seasonal term [batch, time]
      - dag_matrix (when DAG enabled): adjacency [n_media, n_media]
      - adstocked_media, media_hill, media_dag (when applicable): media pipeline stages

Examples

>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>> model = DeepCausalMMM(n_media=3, ctrl_dim=2, n_regions=2)
>>>
>>> # Prepare input tensors
>>> media = torch.rand(2, 52, 3)  # 2 regions, 52 weeks, 3 channels
>>> control = torch.randn(2, 52, 2)  # 2 control variables
>>> regions = torch.tensor([[0]*52, [1]*52])  # Region indicators
>>>
>>> # Forward pass
>>> pred, media_coeffs, media_contrib, outputs = model(media, control, regions)
>>>
>>> # Access detailed outputs
>>> media_contrib_out = outputs['contributions']
>>> ctrl_contrib = outputs['control_contributions']
>>> dag_matrix = outputs.get('dag_matrix')

Notes

The forward pass applies the following transformations in order:

  1. Media processing: Adstock -> Hill saturation -> DAG interactions

  2. Feature processing: Media features + Control features -> GRU

  3. Coefficient generation: Time-varying coefficients from GRU states

  4. Contribution calculation: Features * Coefficients

  5. Final prediction: Baseline + Seasonality + Media + Control contributions

The model enforces several constraints:

  • DAG acyclicity: upper-triangular mask (default) or NOTEARS penalty when dag_mode='notears'

  • Non-negative baseline and seasonal contributions

  • Learnable coefficient bounds to prevent explosion

  • Burn-in period stabilization for initial weeks

h_acyclicity(W: Tensor) Tensor[source]

NOTEARS acyclicity scalar: h(W) = tr(exp(W ⊙ W)) − d.

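h(W) is zero if and only if W is the weighted adjacency matrix of a DAG, and strictly positive otherwise. It is smooth and differentiable everywhere, so it can serve as a penalty in gradient-based training. See Zheng et al., 2018 (https://arxiv.org/abs/1803.01422).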

Parameters:

W – Square adjacency matrix (n_media × n_media).

Returns:

Scalar tensor; minimised toward 0 during training.
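The formula translates directly into PyTorch. The sketch below is a standalone version for experimentation; the function name notears_h is chosen here to avoid implying that it is the method's actual body.

```python
import torch


def notears_h(W: torch.Tensor) -> torch.Tensor:
    """h(W) = tr(exp(W * W)) - d, with * the elementwise product."""
    d = W.shape[0]
    return torch.trace(torch.linalg.matrix_exp(W * W)) - d


dag = torch.tensor([[0.0, 0.8],
                    [0.0, 0.0]])     # single edge 0 -> 1, acyclic
cyclic = torch.tensor([[0.0, 0.8],
                       [0.8, 0.0]])  # edges 0 -> 1 and 1 -> 0, a 2-cycle
h_dag = notears_h(dag)
h_cyclic = notears_h(cyclic)
print(float(h_dag), float(h_cyclic))
```

For the acyclic matrix, W * W is nilpotent, so the exponential has trace d and h vanishes; the 2-cycle contributes positive terms to the trace and h becomes positive.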

get_dag_loss() Tensor[source]

DAG regularisation. Mode-aware: triangular uses sparsity/confidence penalties only (acyclicity is structural); NOTEARS additionally adds the augmented-Lagrangian acyclicity term 0.5·rho·h(W)² + alpha·h(W) and an L1 penalty on the full adjacency.

notears_update_duals(factor: float = 10.0, progress: float = 0.25) Dict[str, float][source]

Augmented-Lagrangian dual update (NOTEARS outer loop).

Call once every K epochs from the trainer. Returns diagnostic dict ({“h”, “rho”, “alpha”}) for logging; returns empty dict in triangular mode.

Parameters:
  • factor – Multiplicative growth applied to rho when h(W) stalls.

  • progress – Required relative shrinkage of h between outer iterations. If h_new > progress * h_prev, rho is grown by factor.
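The penalty and the dual update can be sketched with plain floats. This is a simplified variant of the outer loop described by Zheng et al. (2018), written under the assumption that rho is grown first and alpha is then stepped by rho * h; the library's exact update order and return value may differ, and both function names are hypothetical.

```python
from typing import Tuple


def augmented_lagrangian(h: float, rho: float, alpha: float) -> float:
    """Acyclicity term 0.5 * rho * h^2 + alpha * h."""
    return 0.5 * rho * h * h + alpha * h


def update_duals(h_new: float, h_prev: float, rho: float, alpha: float,
                 factor: float = 10.0, progress: float = 0.25,
                 rho_max: float = 1e16) -> Tuple[float, float]:
    """Grow rho when h did not shrink enough, then step the dual variable."""
    if h_new > progress * h_prev:
        rho = min(rho * factor, rho_max)  # capped for numerical safety
    alpha = alpha + rho * h_new
    return rho, alpha


# h only halved (0.5 > 0.25 * 1.0), so rho is multiplied by factor.
rho_stalled, alpha_stalled = update_duals(0.5, 1.0, rho=1.0, alpha=0.0)
# h shrank tenfold, so rho is left unchanged.
rho_ok, alpha_ok = update_duals(0.1, 1.0, rho=1.0, alpha=0.0)
penalty = augmented_lagrangian(0.5, rho_stalled, alpha_stalled)
print(rho_stalled, alpha_stalled, rho_ok, alpha_ok, penalty)
```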

get_dag_adjacency_matrix(eps: float | None = None) Tensor[source]

Learned adjacency with dag_temperature and tri_mask applied.

Parameters:

eps – If None, return continuous edge weights. If a float, zero entries with |w| < eps (same rule as threshold_dag).

Returns:

Square adjacency tensor [n_media, n_media].

threshold_dag(eps: float = 0.3) Tensor[source]

Post-training pruning: zero out adjacency entries with |w| < eps.

Returns the thresholded adjacency tensor. For NOTEARS mode this is the recommended way to obtain a clean discrete DAG from the continuous W.
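The pruning rule itself is a one-liner. A minimal standalone sketch (the helper name prune_edges is hypothetical):

```python
import torch


def prune_edges(W: torch.Tensor, eps: float = 0.3) -> torch.Tensor:
    """Zero out entries with |w| < eps; other entries are kept as-is."""
    return torch.where(W.abs() < eps, torch.zeros_like(W), W)


W = torch.tensor([[0.0, 0.9, 0.1],
                  [0.0, 0.0, 0.4],
                  [0.0, 0.0, 0.0]])
pruned = prune_edges(W, eps=0.3)
print(pruned)
```

Here the weak 0.1 edge is removed while the 0.9 and 0.4 edges survive.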

get_sparsity_loss() Tensor[source]

Sparsity loss to encourage sparse coefficients.

get_regularization_loss() Tensor[source]

Calculate combined regularization loss including DAG penalty and coefficient regularization.

get_parameters() Dict[str, Tensor][source]

Get model parameters for analysis.

deepcausalmmm.core.get_default_config() Dict[str, Any][source]

Get default configuration settings for the model.

Includes DAG structure-learning keys under dag_mode:

  • 'triangular' (default) — upper-triangular acyclicity mask

  • 'notears' — NOTEARS augmented-Lagrangian mode; see also notears_warmup_epochs, notears_lambda1, notears_dual_*, dag_temperature, and notears_group_l1

Returns:

Dict containing all configuration parameters

deepcausalmmm.core.update_config(base_config: Dict[str, Any], updates: Dict[str, Any]) Dict[str, Any][source]

Update base configuration with new values.

Parameters:
  • base_config – Base configuration dictionary

  • updates – Dictionary containing updates to apply

Returns:

Updated configuration dictionary
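As an illustration of the intended behavior, the sketch below performs a non-mutating shallow merge. Whether update_config merges nested dictionaries or modifies base_config in place is not stated here, so the helper merge_config and the example values are assumptions.

```python
from typing import Any, Dict


def merge_config(base_config: Dict[str, Any],
                 updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: base_config overridden by updates (shallow)."""
    merged = dict(base_config)  # copy so the caller's dict is untouched
    merged.update(updates)
    return merged


base = {"n_epochs": 500, "learning_rate": 0.001, "dag_mode": "triangular"}
new = merge_config(base, {"dag_mode": "notears", "n_epochs": 1000})
print(new)
```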

class deepcausalmmm.core.ModelTrainer(config: Dict[str, Any] | None = None)[source]

Reusable trainer class for DeepCausalMMM models.

This class provides a complete training pipeline for DeepCausalMMM models with advanced features including early stopping, learning rate scheduling, gradient clipping, and comprehensive logging. It supports both MSE and Huber loss functions with automatic device detection and mixed precision training.

Features:

  • Config-driven model initialization (zero hardcoding)

  • Automatic device detection (CPU/CUDA)

  • Multiple loss functions (MSE, Huber, optional Focal)

  • Early stopping with patience

  • Learning rate scheduling (StepLR, Cosine Annealing)

  • Gradient clipping (global and parameter-specific)

  • Comprehensive metrics tracking (RMSE, R², MAE)

  • Progress bars with detailed statistics

  • Holdout evaluation during training

Parameters:

config (Dict[str, Any], optional) – Configuration dictionary containing all training parameters. If None, uses default configuration from get_default_config().

model

The initialized model instance

Type:

DeepCausalMMM

optimizer

The optimizer (Adam by default)

Type:

torch.optim.Optimizer

scheduler

Learning rate scheduler if enabled

Type:

torch.optim.lr_scheduler._LRScheduler

device

Training device (CPU or CUDA)

Type:

torch.device

best_rmse

Best holdout RMSE achieved during training

Type:

float

train_losses

Training loss history

Type:

List[float]

train_rmses

Training RMSE history

Type:

List[float]

train_r2s

Training R² history

Type:

List[float]

Examples

>>> from deepcausalmmm.core.trainer import ModelTrainer
>>> from deepcausalmmm.core.config import get_default_config
>>>
>>> # Initialize trainer with custom config
>>> config = get_default_config()
>>> config['n_epochs'] = 1000
>>> config['learning_rate'] = 0.01
>>> trainer = ModelTrainer(config)
>>>
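>>> # Train model (assumes the train/holdout tensors are already prepared,
>>> # e.g. by UnifiedDataPipeline)
>>> results = trainer.train(
...     X_media_train, X_control_train, R_train, y_train,
...     X_media_holdout, X_control_holdout, R_holdout, y_holdout,
... )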
>>>
>>> # Access training history
>>> print(f"Final RMSE: {results['holdout_rmse']:.0f}")
>>> print(f"Final R²: {results['holdout_r2']:.3f}")
__init__(config: Dict[str, Any] | None = None)[source]

Initialize the trainer with configuration.

Parameters:

config – Configuration dictionary. If None, uses default config.

create_model(n_media: int, n_control: int, n_regions: int) DeepCausalMMM[source]

Create and initialize model from config with reproducible initialization.

Parameters:
  • n_media – Number of media channels

  • n_control – Number of control variables

  • n_regions – Number of regions

Returns:

Initialized DeepCausalMMM model

create_optimizer_and_scheduler()[source]

Create optimizer and learning rate scheduler from config.

warm_start_training(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor, verbose: bool = True) None[source]

Perform warm-start training to stabilize GRU coefficients.

Parameters:
  • X_media – Media data tensor

  • X_control – Control data tensor

  • R – Region tensor

  • y – Target tensor

  • verbose – Whether to show progress

train_epoch(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor) Tuple[float, float, float][source]

Train for one epoch.

Parameters:
  • X_media – Media data tensor

  • X_control – Control data tensor

  • R – Region tensor

  • y – Target tensor

Returns:

Tuple of (loss, rmse, r2)

evaluate_holdout(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor) Tuple[float, float, float][source]

Evaluate model on holdout data.

Parameters:
  • X_media – Holdout media data tensor

  • X_control – Holdout control data tensor

  • R – Holdout region tensor

  • y – Holdout target tensor

Returns:

Tuple of (loss, rmse, r2)

should_stop_early(current_rmse: float) bool[source]

Check if training should stop early based on RMSE improvement.

Parameters:

current_rmse – Current epoch’s RMSE

Returns:

True if training should stop
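Patience-based early stopping is commonly implemented along the following lines. The class name, the patience and min_delta parameters, and the strict-improvement rule are assumptions for illustration; the trainer's actual criterion is driven by its config.

```python
class PatienceStopper:
    """Stop once RMSE has failed to improve for `patience` consecutive epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, current_rmse: float) -> bool:
        if current_rmse < self.best - self.min_delta:
            self.best = current_rmse  # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no improvement this epoch
        return self.bad_epochs >= self.patience


stopper = PatienceStopper(patience=2)
decisions = [stopper.should_stop(r) for r in [1.0, 0.9, 0.95, 0.93]]
print(decisions)
```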

train(X_media_train: Tensor, X_control_train: Tensor, R_train: Tensor, y_train: Tensor, X_media_holdout: Tensor | None = None, X_control_holdout: Tensor | None = None, R_holdout: Tensor | None = None, y_holdout: Tensor | None = None, pipeline: Any | None = None, verbose: bool = True) Dict[str, Any][source]

Full training loop with warm-start, main training, and holdout evaluation.

Parameters:
  • X_media_train – Training media data

  • X_control_train – Training control data

  • R_train – Training region data

  • y_train – Training target data

  • X_media_holdout – Optional holdout media data

  • X_control_holdout – Optional holdout control data

  • R_holdout – Optional holdout region data

  • y_holdout – Optional holdout target data

  • pipeline – Optional UnifiedDataPipeline for accessing scaler

  • verbose – Whether to show progress

Returns:

Dictionary with training results

class deepcausalmmm.core.InferenceManager(model: DeepCausalMMM, pipeline: UnifiedDataPipeline | None = None, scaler: SimpleGlobalScaler | None = None, config: Dict[str, Any] | None = None, channel_names: List[str] | None = None, control_names: List[str] | None = None)[source]

Modern class-based interface for DeepCausalMMM model inference.

Handles:

  • Model predictions on new data

  • Contribution analysis (media, control, baseline)

  • Coefficient extraction

  • Data preprocessing for inference

  • Inverse transformations for interpretable results

__init__(model: DeepCausalMMM, pipeline: UnifiedDataPipeline | None = None, scaler: SimpleGlobalScaler | None = None, config: Dict[str, Any] | None = None, channel_names: List[str] | None = None, control_names: List[str] | None = None)[source]

Initialize the inference manager.

Parameters:
  • model – Trained DeepCausalMMM model

  • pipeline – UnifiedDataPipeline used for training (preferred)

  • scaler – SimpleGlobalScaler used for training (legacy support)

  • config – Configuration dictionary

  • channel_names – List of media channel names

  • control_names – List of control variable names

predict(X_media: ndarray, X_control: ndarray, return_contributions: bool = True, remove_padding: bool = True, return_media_coefficients: bool = False) Dict[str, ndarray][source]

Make predictions on new data.

Parameters:
  • X_media – Media data [n_regions, n_weeks, n_media_channels]

  • X_control – Control data [n_regions, n_weeks, n_control_vars]

  • return_contributions – Whether to return contribution breakdowns

  • remove_padding – Whether to remove burn-in padding from results

  • return_media_coefficients – If True, include time-varying media coefficients (second tensor from forward()) as media_coefficients.

Returns:

Dictionary containing predictions and optionally contributions

predict_and_inverse_transform(X_media: ndarray, X_control: ndarray, return_contributions: bool = True) Dict[str, ndarray][source]

Make predictions and apply inverse transformations for interpretable results.

Parameters:
  • X_media – Media data [n_regions, n_weeks, n_media_channels]

  • X_control – Control data [n_regions, n_weeks, n_control_vars]

  • return_contributions – Whether to return contribution breakdowns

Returns:

Dictionary containing predictions and contributions in original scale

get_coefficients() Dict[str, ndarray][source]

Extract model coefficients.

Returns:

Dictionary containing media and control coefficients

get_dag_adjacency(threshold: bool = False, eps: float | None = None) ndarray | None[source]

Extract DAG adjacency matrix if available.

Uses the same mask and dag_temperature scaling as training. Set threshold=True (or pass eps) to prune weak edges; when eps is not given, the cutoff defaults to config['notears_threshold'].

Parameters:
  • threshold – If True, zero entries below eps.

  • eps – Pruning cutoff; defaults to config['notears_threshold'].

Returns:

Adjacency matrix or None if DAG is not enabled

analyze_contributions(X_media: ndarray, X_control: ndarray, aggregate_regions: bool = True, aggregate_time: bool = False) Dict[str, Any][source]

Comprehensive contribution analysis.

Parameters:
  • X_media – Media data

  • X_control – Control data

  • aggregate_regions – Whether to aggregate across regions

  • aggregate_time – Whether to aggregate across time

Returns:

Dictionary with detailed contribution analysis

class deepcausalmmm.core.VisualizationManager(config: Dict[str, Any] | None = None)[source]

Visualization manager for creating consistent plots in DeepCausalMMM analysis.

Provides a unified interface for creating training progress, coefficient analysis, contribution plots, DAG visualizations, and other MMM-related charts. All plot parameters are driven by configuration for consistency.

Parameters:

config (Dict[str, Any], optional) – Configuration dictionary. If None, uses default configuration.

__init__(config: Dict[str, Any] | None = None)[source]

Initialize the visualization manager.

Parameters:

config – Configuration dictionary. If None, uses default config.

create_training_progress_plot(train_losses: List[float], train_rmses: List[float], train_r2s: List[float], title: str = 'Training Progress') Figure[source]

Create a training progress plot with loss, RMSE, and R².

Parameters:
  • train_losses – Training losses over epochs

  • train_rmses – Training RMSEs over epochs

  • train_r2s – Training R² scores over epochs

  • title – Plot title

Returns:

Plotly figure

create_actual_vs_predicted_plot(y_actual: ndarray, y_predicted: ndarray, title: str = 'Actual vs Predicted', weeks: List[int] | None = None) Figure[source]

Create an actual vs predicted time series plot.

Parameters:
  • y_actual – Actual values

  • y_predicted – Predicted values

  • title – Plot title

  • weeks – Optional week indices for x-axis

Returns:

Plotly figure

create_scatter_plot(x: ndarray, y: ndarray, title: str = 'Scatter Plot', x_label: str = 'X', y_label: str = 'Y', color: str = 'blue') Figure[source]

Create a scatter plot with perfect correlation line.

Parameters:
  • x – X values

  • y – Y values

  • title – Plot title

  • x_label – X-axis label

  • y_label – Y-axis label

  • color – Marker color

Returns:

Plotly figure

create_waterfall_chart(categories: List[str], values: List[float], title: str = 'Waterfall Chart') Figure[source]

Create a proper waterfall chart using Plotly’s go.Waterfall.

Parameters:
  • categories – Category names

  • values – Values for each category

  • title – Chart title

Returns:

Plotly figure

create_contribution_stacked_bar(media_contributions: ndarray, control_contributions: ndarray, baseline: ndarray, media_names: List[str], control_names: List[str], weeks: List[int] | None = None, title: str = 'Contributions Over Time') Figure[source]

Create a stacked bar chart of contributions over time.

Parameters:
  • media_contributions – Media contributions [n_weeks, n_media]

  • control_contributions – Control contributions [n_weeks, n_controls]

  • baseline – Baseline values [n_weeks]

  • media_names – Media channel names

  • control_names – Control variable names

  • weeks – Optional week indices

  • title – Chart title

Returns:

Plotly figure

create_dag_network_plot(adjacency_matrix: ndarray, node_names: List[str], title: str = 'DAG Network') Figure[source]

Create a DAG network visualization.

Parameters:
  • adjacency_matrix – Adjacency matrix [n_nodes, n_nodes]

  • node_names – Node names

  • title – Plot title

Returns:

Plotly figure

create_dag_heatmap(adjacency_matrix: ndarray, node_names: List[str], title: str = 'DAG Adjacency Matrix') Figure[source]

Create a DAG adjacency matrix heatmap.

Parameters:
  • adjacency_matrix – Adjacency matrix [n_nodes, n_nodes]

  • node_names – Node names

  • title – Plot title

Returns:

Plotly figure

save_plot(fig: Figure, filepath: str, include_plotlyjs: str = 'cdn') bool[source]

Save a Plotly figure to HTML file.

Parameters:
  • fig – Plotly figure to save

  • filepath – Output file path

  • include_plotlyjs – How to include Plotly.js (‘cdn’, ‘inline’, etc.)

Returns:

True if successful, False otherwise

create_comprehensive_dashboard(results: Dict[str, Any], output_dir: str = 'dashboard_comprehensive') List[Tuple[str, str]][source]

Create a comprehensive dashboard with multiple plots.

Parameters:
  • results – Training results dictionary

  • output_dir – Output directory for plots

Returns:

List of (plot_name, filepath) tuples for created plots

class deepcausalmmm.core.UnifiedDataPipeline(config: Dict[str, Any])[source]

Unified data processing pipeline for DeepCausalMMM models.

This pipeline ensures consistent data transformations between training and holdout datasets, implementing the complete preprocessing workflow required for MMM analysis. It handles temporal splitting, multi-scale normalization, seasonal decomposition, and tensor preparation for PyTorch models.

Key Features:

  • Temporal train/holdout splitting (respects time series nature)

  • SOV (Share of Voice) scaling for media channels

  • Z-score normalization for control variables

  • Min-Max scaling for seasonal components (per region)

  • Burn-in padding for GRU stabilization

  • Automatic tensor conversion and device handling

  • Inverse transformation utilities for interpretation

  • Region encoding and validation

The pipeline maintains data integrity by:

  • Using the same scaler fit on training data for holdout

  • Preserving temporal order in all transformations

  • Handling missing values and outliers appropriately

  • Ensuring consistent tensor shapes across regions

Parameters:

config (Dict[str, Any]) – Configuration dictionary containing:

  • ‘holdout_ratio’: Fraction of data for holdout (default 0.08)

  • ‘burn_in_weeks’: Number of weeks for padding (default 6)

  • ‘random_seed’: Seed for reproducible operations (default 42)

  • Media channel names, control variable names, etc.

scaler

Fitted scaler for consistent transformations

Type:

SimpleGlobalScaler

seasonal_detector

Seasonal decomposition utility

Type:

DetectSeasonality

media_columns

Names of media channel columns

Type:

List[str]

control_columns

Names of control variable columns

Type:

List[str]

region_column

Name of region identifier column

Type:

str

target_column

Name of target variable column

Type:

str

Examples

>>> import pandas as pd
>>> from deepcausalmmm.core.data import UnifiedDataPipeline
>>> from deepcausalmmm.core.config import get_default_config
>>>
>>> # Load your MMM dataset
>>> df = pd.read_csv('mmm_data.csv')
>>> config = get_default_config()
>>>
>>> # Initialize and fit pipeline
>>> pipeline = UnifiedDataPipeline(config)
>>> processed_data = pipeline.fit_transform(df)
>>>
>>> # Access processed tensors
>>> X_media_train = processed_data['X_media_train']
>>> y_train = processed_data['y_train']
>>>
>>> # Get holdout data
>>> X_media_holdout = processed_data['X_media_holdout']
>>> y_holdout = processed_data['y_holdout']
>>>
>>> print(f"Training shape: {X_media_train.shape}")
>>> print(f"Holdout shape: {X_media_holdout.shape}")
__init__(config: Dict[str, Any])[source]

Initialize the unified data pipeline.

Parameters:

config – Configuration dictionary with all parameters

temporal_split(X_media: ndarray, X_control: ndarray, y: ndarray, holdout_ratio: float | None = None) Tuple[Dict[str, ndarray], Dict[str, ndarray]][source]

Perform a time-series split of the data using a ratio-based approach. This ensures adequate holdout data regardless of the number of burn-in weeks.

Parameters:
  • X_media – Media data [regions, weeks, channels]

  • X_control – Control data [regions, weeks, controls]

  • y – Target data [regions, weeks]

  • holdout_ratio – Fraction of data for holdout (uses config if None)

Returns:

Tuple of (train_data_dict, holdout_data_dict)
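A ratio-based temporal split keeps the last fraction of weeks as holdout and never shuffles. The sketch below shows the idea with NumPy; the function name, the rounding rule, and the dictionary keys are assumptions and may not match what the pipeline returns.

```python
from typing import Dict, Tuple

import numpy as np


def ratio_split(X_media: np.ndarray, X_control: np.ndarray, y: np.ndarray,
                holdout_ratio: float = 0.08
                ) -> Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]:
    """Split along the week axis (axis 1); the final weeks become holdout."""
    n_weeks = y.shape[1]
    n_holdout = max(1, int(round(n_weeks * holdout_ratio)))
    cut = n_weeks - n_holdout
    train = {"X_media": X_media[:, :cut],
             "X_control": X_control[:, :cut],
             "y": y[:, :cut]}
    holdout = {"X_media": X_media[:, cut:],
               "X_control": X_control[:, cut:],
               "y": y[:, cut:]}
    return train, holdout


rng = np.random.default_rng(42)
X_media = rng.random((2, 104, 5))  # [regions, weeks, channels]
X_control = rng.random((2, 104, 3))  # [regions, weeks, controls]
y = rng.random((2, 104))           # [regions, weeks]
train, holdout = ratio_split(X_media, X_control, y, holdout_ratio=0.08)
print(train["y"].shape, holdout["y"].shape)
```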

fit_and_transform_training(train_data: Dict[str, ndarray]) Dict[str, Tensor][source]

Fit scaler on training data and transform it.

Parameters:

train_data – Dictionary with training data arrays

Returns:

Dictionary with transformed and padded tensors

transform_holdout(holdout_data: Dict[str, ndarray]) Dict[str, Tensor][source]

Transform holdout data using the fitted scaler (same transformations as training).

Parameters:

holdout_data – Dictionary with holdout data arrays

Returns:

Dictionary with transformed and padded tensors

inverse_transform_predictions(y_pred_scaled: Tensor, remove_padding: bool = True) Tensor[source]

Inverse transform predictions to original scale.

Parameters:
  • y_pred_scaled – Predictions in scaled space

  • remove_padding – Whether to remove padding weeks

Returns:

Predictions in original scale

get_evaluation_data(y_true_padded: Tensor, y_pred_padded: Tensor) Tuple[Tensor, Tensor][source]

Extract evaluation data (removing burn-in padding).

Parameters:
  • y_true_padded – True values with padding

  • y_pred_padded – Predicted values with padding

Returns:

Tuple of (y_true_eval, y_pred_eval) without padding

inverse_transform_contributions(media_contributions: Tensor, y_true: Tensor) Tensor[source]

Inverse transform media contributions to original scale.

Parameters:
  • media_contributions – Media contributions in scaled space

  • y_true – True values in original scale (for scaling reference)

Returns:

Media contributions in original scale

get_scaler() → SimpleGlobalScaler[source]

Get the fitted scaler for external use.

Returns:

Fitted SimpleGlobalScaler instance

predict_and_postprocess(model, X_media: ndarray, X_control: ndarray, channel_names: List[str], control_names: List[str], combine_with_holdout: bool = True) → Dict[str, Any][source]

Generate predictions and contributions using the unified pipeline.

Parameters:
  • model – Trained model

  • X_media – Media data (full dataset for contributions)

  • X_control – Control data (full dataset for contributions)

  • channel_names – Media channel names

  • control_names – Control variable names

  • combine_with_holdout – Whether to combine train+holdout for contributions

Returns:

Dictionary with predictions, contributions, and metadata

calculate_metrics(y_true: Tensor, y_pred: Tensor, prefix: str = '') → Dict[str, float][source]

Calculate comprehensive metrics for model evaluation.

Parameters:
  • y_true – True values

  • y_pred – Predicted values

  • prefix – Prefix for metric names (e.g., 'train_', 'holdout_')

Returns:

Dictionary of metrics
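
The prefix argument lets training and holdout metrics share one dictionary without key collisions. The sketch below shows the idea with three common regression metrics. Which metrics the method actually returns is not stated here, so the metric set, the key names, and the helper prefixed_metrics are assumptions.

```python
import numpy as np

def prefixed_metrics(y_true, y_pred, prefix=""):
    """Hypothetical helper: a few common regression metrics with prefixed keys."""
    err = y_pred - y_true
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        f"{prefix}rmse": float(np.sqrt(np.mean(err ** 2))),
        f"{prefix}mae": float(np.mean(np.abs(err))),
        f"{prefix}r2": 1.0 - ss_res / ss_tot,
    }

y_true = np.array([[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]])
y_pred = y_true + 1.0   # constant error of one unit

metrics = prefixed_metrics(y_true, y_pred, prefix="holdout_")
print(sorted(metrics))
```

Calling the helper once with prefix="train_" and once with prefix="holdout_" yields two dictionaries that can be merged directly.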

get_processed_full_data()[source]

Get the processed full dataset (train + holdout) with all transformations applied. This includes seasonality features, scaling, and padding, exactly as the model expects.

Returns:

Dictionary containing processed X_media and X_control tensors

class deepcausalmmm.core.SimpleGlobalScaler(config: Dict[str, Any] | None = None)[source]

Linear scaling approach (y/y_mean) for additive attribution.

Scaling features:
  • Media: Share-of-voice scaling with outlier smoothing

  • Control: Robust standardization with adaptive clipping

  • Target: Linear scaling by region mean (y/y_mean) for additive decomposition

  • Adaptive normalization with distribution-aware clipping

  • Advanced outlier handling for extreme value stability

__init__(config: Dict[str, Any] | None = None)[source]

Initialize the scaler with optional config parameters.

fit(X_media: ndarray, X_control: ndarray, y: ndarray) → None[source]

Fit the scaler using simple global statistics.

Parameters:
  • X_media – Media variables [n_regions, n_timesteps, n_channels]

  • X_control – Control variables [n_regions, n_timesteps, n_controls]

  • y – Target variable [n_regions, n_timesteps]

transform(X_media: ndarray, X_control: ndarray, y: ndarray) → Tuple[Tensor, Tensor, Tensor][source]

Transform data using fitted parameters.

Parameters:
  • X_media – Media variables [n_regions, n_timesteps, n_channels]

  • X_control – Control variables [n_regions, n_timesteps, n_controls]

  • y – Target variable [n_regions, n_timesteps]

Returns:

Tuple of (X_media_scaled, X_control_scaled, y_scaled)

inverse_transform_target(y_scaled: Tensor) → Tensor[source]

Inverse transform target variable.

Parameters:

y_scaled – Scaled target [n_regions, n_timesteps]

Returns:

Original scale target
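
The target scaling described for this class (y/y_mean per region) and its inverse can be checked in a few lines. This is a minimal sketch on NumPy arrays. It shows only the arithmetic and leaves out whatever clipping or outlier handling the scaler applies.

```python
import numpy as np

y = np.array([[100.0, 120.0, 80.0], [10.0, 30.0, 20.0]])   # [n_regions, n_timesteps]

# Fit: one mean per region, kept with a trailing axis for broadcasting.
y_mean = y.mean(axis=1, keepdims=True)

y_scaled = y / y_mean            # transform
y_restored = y_scaled * y_mean   # inverse transform

print(y_scaled.mean(axis=1))     # each region averages to 1 after scaling
print(np.allclose(y_restored, y))
```

Because the transform is a pure multiplication, regions with very different KPI levels end up on a comparable scale without any nonlinear distortion.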

inverse_transform_contributions(media_contributions: Tensor, baseline: Tensor = None, control_contributions: Tensor = None, seasonal_contributions: Tensor = None, trend_contributions: Tensor = None, prediction_scale: Tensor = None) → dict[source]

Inverse transform ALL contributions to original scale using simple multiplication.

With linear scaling (y/y_mean), the inverse transform is straightforward:

    component_orig = component_scaled * prediction_scale * y_mean_per_region

This preserves additivity: sum(components_orig) = prediction_orig

Parameters:
  • media_contributions – Media contributions in scaled space [regions, timesteps, channels]

  • baseline – Baseline in scaled space [regions, timesteps]

  • control_contributions – Control contributions in scaled space [regions, timesteps, controls]

  • seasonal_contributions – Seasonal contributions in scaled space [regions, timesteps]

  • trend_contributions – Trend contributions in scaled space [regions, timesteps]

  • prediction_scale – Model’s prediction_scale factor (from F.softplus(self.prediction_scale))

Returns:

Dictionary with all contributions in original scale
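
The additivity claim can be verified numerically. The sketch below uses synthetic values and only two components (media and baseline) for brevity. It assumes that the scaled prediction equals prediction_scale times the sum of the scaled components, which is one reading of the formula above and not a confirmed detail of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_weeks, n_channels = 2, 5, 3

# Scaled-space components (synthetic values, for illustration only).
media = rng.random((n_regions, n_weeks, n_channels))
baseline = rng.random((n_regions, n_weeks))
y_mean = np.array([[1000.0], [250.0]])   # per-region target mean, [regions, 1]
prediction_scale = 1.3                    # stands in for the model's scalar factor

# Assumption: the scaled prediction is prediction_scale * sum of components.
pred_scaled = prediction_scale * (media.sum(axis=-1) + baseline)
pred_orig = pred_scaled * y_mean

# component_orig = component_scaled * prediction_scale * y_mean_per_region
media_orig = media * prediction_scale * y_mean[..., None]
baseline_orig = baseline * prediction_scale * y_mean

print(np.allclose(media_orig.sum(axis=-1) + baseline_orig, pred_orig))
```

The check holds because every component is multiplied by the same per-region factor, so summing before or after the inverse transform gives the same result. A nonlinear target transform (for example a log) would not have this property.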

fit_transform(X_media: ndarray, X_control: ndarray, y: ndarray) → Tuple[Tensor, Tensor, Tensor][source]

Fit the scaler and transform data in one step.

Parameters:
  • X_media – Media variables [n_regions, n_timesteps, n_channels]

  • X_control – Control variables [n_regions, n_timesteps, n_controls]

  • y – Target variable [n_regions, n_timesteps]

Returns:

Tuple of (X_media_scaled, X_control_scaled, y_scaled)

deepcausalmmm.core.GlobalScaler

alias of SimpleGlobalScaler

class deepcausalmmm.core.NodeToEdge(node_dim: int, edge_dim: int)[source]

Transform node features to edge features using an attention mechanism.

__init__(node_dim: int, edge_dim: int)[source]

Initialize the node to edge transformation.

Parameters:
  • node_dim – Dimension of node features

  • edge_dim – Dimension of edge features

forward(nodes: Tensor, adj_matrix: Tensor) → Tensor[source]

Transform node features to edge features.

Parameters:
  • nodes – Node features [batch_size, n_nodes, 1]

  • adj_matrix – Adjacency matrix [n_nodes, n_nodes]

Returns:

Edge features [batch_size, n_nodes, n_nodes, edge_dim]

class deepcausalmmm.core.EdgeToNode(edge_dim: int, node_dim: int)[source]

Aggregate edge features back to nodes.

__init__(edge_dim: int, node_dim: int)[source]

Initialize the edge to node transformation.

Parameters:
  • edge_dim – Dimension of edge features

  • node_dim – Dimension of node features

forward(edges: Tensor, nodes: Tensor, adj_matrix: Tensor) → Tensor[source]

Aggregate edge features to update node features.

Parameters:
  • edges – Edge features [batch_size, n_nodes, n_nodes, edge_dim]

  • nodes – Node features [batch_size, n_nodes, 1]

  • adj_matrix – Adjacency matrix [n_nodes, n_nodes]

Returns:

Updated node features [batch_size, n_nodes, 1]
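
The tensor shapes documented for NodeToEdge and EdgeToNode can be traced with a small NumPy sketch. It is a shape-level illustration only: the pairwise projection below replaces the attention mechanism with a plain linear map, the weights are random stand-ins for learned parameters, and the convention that adj[i, j] marks an edge from node i to node j is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_nodes, edge_dim = 4, 3, 8

nodes = rng.random((batch, n_nodes, 1))            # [batch_size, n_nodes, 1]
adj = np.triu(np.ones((n_nodes, n_nodes)), k=1)    # [n_nodes, n_nodes]

# Node -> edge: pair every source i with every target j, then project.
src = np.broadcast_to(nodes[:, :, None, :], (batch, n_nodes, n_nodes, 1))
dst = np.broadcast_to(nodes[:, None, :, :], (batch, n_nodes, n_nodes, 1))
pairs = np.concatenate([src, dst], axis=-1)        # [batch, n_nodes, n_nodes, 2]
W_edge = rng.normal(size=(2, edge_dim))            # stand-in for learned weights
edges = np.tanh(pairs @ W_edge) * adj[None, :, :, None]   # masked by adjacency

# Edge -> node: sum incoming edges per target node, project back to one feature.
W_node = rng.normal(size=(edge_dim, 1))
incoming = edges.sum(axis=1)                       # [batch, n_nodes, edge_dim]
updated = nodes + incoming @ W_node                # [batch, n_nodes, 1]

print(edges.shape, updated.shape)
```

With a strictly upper triangular adjacency matrix, the first node has no incoming edges, so its feature passes through unchanged in this sketch.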

class deepcausalmmm.core.DAGConstraint(n_nodes: int, sparsity_weight: float = 0.1, temperature: float = 1.0)[source]

Enforce acyclicity in the graph structure using a strict triangular constraint.

__init__(n_nodes: int, sparsity_weight: float = 0.1, temperature: float = 1.0)[source]

Initialize the DAG constraint module.

Parameters:
  • n_nodes – Number of nodes in the graph

  • sparsity_weight – Weight for the sparsity penalty

  • temperature – Initial temperature for Gumbel-Softmax

gumbel_softmax(logits: Tensor, tau: float) → Tensor[source]

Gumbel-Softmax sampling with straight-through gradients.

Parameters:
  • logits – Input logits

  • tau – Temperature parameter

Returns:

Sampled probabilities
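
For readers unfamiliar with the technique, here is the forward pass of Gumbel-Softmax sampling in NumPy. It is a generic sketch, not the method's actual code: NumPy has no automatic differentiation, so the straight-through gradient is not shown, and the function name and return values are hypothetical.

```python
import numpy as np

def gumbel_softmax_forward(logits, tau, rng):
    """Forward pass only: returns (soft probabilities, hard one-hot sample)."""
    u = rng.uniform(low=1e-10, high=1.0 - 1e-10, size=logits.shape)
    gumbel = -np.log(-np.log(u))                 # Gumbel(0, 1) noise
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)        # stabilise the softmax
    soft = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    hard = np.eye(logits.shape[-1])[soft.argmax(axis=-1)]
    return soft, hard

rng = np.random.default_rng(0)
logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
soft, hard = gumbel_softmax_forward(logits, tau=0.5, rng=rng)
print(soft.sum(axis=-1), hard.sum(axis=-1))
```

In an autograd framework, the straight-through variant returns the hard sample in the forward pass while routing gradients through the soft probabilities. PyTorch exposes this as torch.nn.functional.gumbel_softmax(logits, tau, hard=True). Lower values of tau push the soft probabilities closer to one-hot.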

get_adjacency() → Tensor[source]

Get the current adjacency matrix using Gumbel-Softmax sampling. This enforces unidirectional edges and allows a discrete structure to be learned.

update_temperature(epoch: int, total_epochs: int, min_temp: float = 0.1)[source]

Update the temperature using an exponential decay schedule.

Parameters:
  • epoch – Current epoch

  • total_epochs – Total number of epochs

  • min_temp – Minimum temperature
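
One common form of exponential temperature decay is geometric interpolation between the initial and minimum temperatures. The exact formula used by update_temperature is not given here, so the function below, including its name and the initial_temp argument, is an assumed illustration.

```python
def annealed_temperature(epoch, total_epochs, initial_temp=1.0, min_temp=0.1):
    """Hypothetical schedule: geometric decay from initial_temp to min_temp."""
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / max(total_epochs, 1), 0.0), 1.0)
    temp = initial_temp * (min_temp / initial_temp) ** progress
    # Never drop below the floor, even with rounding error.
    return max(temp, min_temp)

schedule = [annealed_temperature(e, total_epochs=100) for e in (0, 25, 50, 100)]
print([round(t, 4) for t in schedule])
```

A high temperature early in training keeps edge sampling exploratory, while the floor min_temp prevents the samples from becoming fully deterministic.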

dag_loss() → Tensor[source]

Compute the DAG constraint loss with sparsity penalty. Because the adjacency matrix is strictly upper triangular, acyclicity is guaranteed by construction, so only a sparsity penalty is needed.

Returns:

Loss term combining sparsity and entropy
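
The statement that a strictly upper triangular adjacency matrix is acyclic by construction can be checked numerically: such a matrix is nilpotent, so its n-th power is zero and no cycle can exist. The sketch below uses random weights. The L1 form of the sparsity penalty is an assumption, not the confirmed loss used by dag_loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 5

weights = rng.random((n_nodes, n_nodes))            # unconstrained edge scores
mask = np.triu(np.ones((n_nodes, n_nodes)), k=1)    # strictly upper triangular
adjacency = weights * mask                          # edges only from lower to higher index

# A strictly upper triangular n x n matrix is nilpotent: A^n = 0, so no walk of
# length n exists and the graph cannot contain a cycle.
power = np.linalg.matrix_power(adjacency, n_nodes)
print(np.allclose(power, 0.0))

# Assumed form of the sparsity term: an L1 penalty on the surviving edges.
sparsity_weight = 0.1
sparsity_penalty = sparsity_weight * np.abs(adjacency).sum()
print(sparsity_penalty > 0)
```

The trade-off of the triangular mask is that it fixes a node ordering in advance: an edge can only run from an earlier channel to a later one.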

deepcausalmmm.core.train_mmm(*args, **kwargs)[source]

Deprecated since version 1.0.0: Use the ModelTrainer class instead.

Modules

config

Configuration settings for DeepCausalMMM model.

dag_model

DAG model implementation with Node-to-Edge and Edge-to-Node transformations.

data

Data preprocessing and loading utilities for DeepCausalMMM.

inference

Modern InferenceManager class for DeepCausalMMM model inference.

scaling

Simple, proven scaling implementation that works reliably.

seasonality

train_model

Training functions for DeepCausalMMM models.

trainer

Reusable ModelTrainer class for DeepCausalMMM training.

unified_model

DeepCausalMMM model implementation combining GRU, DAG, and interaction components.

visualization

Reusable VisualizationManager class for creating consistent plots.