deepcausalmmm.core

Core components of DeepCausalMMM.

Functions

train_mmm(*args, **kwargs)

class deepcausalmmm.core.DeepCausalMMM(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)[source]

Deep Causal Marketing Mix Model with DAG structure and channel interactions.

This model combines deep learning with causal inference to understand the impact of marketing channels on business KPIs while learning causal relationships between channels through a Directed Acyclic Graph (DAG).

The model features:
- GRU-based temporal modeling for time-varying coefficients
- Learnable coefficient bounds for realistic attribution
- DAG learning for causal channel interactions (triangular mask or opt-in NOTEARS)
- Adstock and saturation transformations
- Multi-region support with shared and region-specific parameters
- Zero hardcoding philosophy: all parameters are learnable or configurable

Parameters:

n_media (int, default=10) – Number of media channels in the dataset
ctrl_dim (int, default=15) – Number of control variables (weather, events, etc.)
hidden (int, default=32) – Hidden dimension size for GRU and MLP layers
n_regions (int, default=2) – Number of geographic regions or DMAs
dropout (float, default=0.1) – Dropout rate for regularization during training
sparsity_weight (float, default=0.01) – Weight for sparsity regularization on coefficients
enable_dag (bool, default=True) – Whether to enable DAG learning for channel interactions
enable_interactions (bool, default=True) – Whether to enable channel interaction modeling
l1_weight (float, default=0.001) – L1 regularization weight for coefficient sparsity
l2_weight (float, default=0.001) – L2 regularization weight for coefficient smoothness
burn_in_weeks (int, default=4) – Number of initial weeks for GRU stabilization
use_coefficient_momentum (bool, default=True) – Whether to use momentum for coefficient stabilization
momentum_decay (float, default=0.9) – Decay rate for coefficient momentum
use_warm_start (bool, default=True) – Whether to use warm start training initialization
warm_start_epochs (int, default=50) – Number of epochs for warm start phase
stabilization_method (str, default="exponential") – Method for coefficient stabilization (“linear”, “exponential”, “sigmoid”)
coeff_l2_weight (float, default=0.1) – L2 regularization specifically for media coefficients
coeff_gen_l2_weight (float, default=0.05) – L2 regularization for coefficient generation layers
gru_layers (int, default=1) – Number of stacked GRU layers
ctrl_hidden_ratio (float, default=0.5) – Control hidden size, expressed as a ratio of the main hidden dimension (hidden)
dag_mode (str, default="triangular") – DAG acyclicity mode: "triangular" (upper-triangular mask, default) or "notears" (continuous NOTEARS penalty; Zheng et al., 2018)
notears_lambda1 (float, default=0.01) – L1 sparsity on the full adjacency in NOTEARS mode
notears_rho_init (float, default=1.0) – Initial augmented-Lagrangian penalty rho
notears_alpha_init (float, default=0.0) – Initial dual variable alpha for NOTEARS
notears_rho_max (float, default=1e16) – Upper cap on rho for numerical safety
dag_temperature (float, default=1.0) – Sigmoid temperature for DAG edge weights (< 1 sharpens toward {0, 1})
notears_group_l1 (float, default=0.0) – Group L1 penalty applied column-wise to the adjacency (NOTEARS mode only)

media_coeffs

Time-varying coefficients for media channels

Type:: torch.nn.Parameter

ctrl_coeffs

Coefficients for control variables

Type:: torch.nn.Parameter

dag_matrix

Learnable DAG adjacency matrix for channel interactions

Type:: torch.nn.Parameter

region_baseline

Region-specific baseline contributions

Type:: torch.nn.Parameter

seasonal_coeff

Learnable coefficient for seasonal component

Type:: torch.nn.Parameter

Examples

>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>>
>>> # Initialize model
>>> model = DeepCausalMMM(
...     n_media=5,
...     ctrl_dim=3,
...     n_regions=2,
...     hidden=64
... )
>>>
>>> # Prepare data tensors
>>> media_data = torch.rand(2, 104, 5)     # [regions, weeks, channels], SOV-scaled to [0, 1]
>>> control_data = torch.randn(2, 104, 3)  # [regions, weeks, controls]
>>> regions = torch.arange(2).unsqueeze(1).repeat(1, 104)
>>>
>>> # Forward pass
>>> predictions, media_coeffs, media_contributions, outputs = model(
...     media_data, control_data, regions
... )
>>>
>>> print(f"Predictions shape: {predictions.shape}")
>>> print(f"Media contributions: {outputs['contributions'].shape}")

__init__(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)[source]: Initialize the model. Parameters are documented in the class description.

initialize_baseline(y_data: Tensor)[source]

Initialize baseline to match target data statistics.

y_data must already be in scaled space (y / y_mean per region). All baseline parameters are estimated directly from the observed data distribution, and burn-in padding weeks are skipped so that they do not bias the baseline.

initialize_hill_from_data(Xm: Tensor)[source]

Initialize hill_g based on per-channel SOV (Share of Voice) distribution.

For each channel, hill_g is set to the 60th percentile of its SOV values, ensuring the inflection point matches where the channel typically operates.

Parameters:: Xm – Media data [n_regions, n_timesteps, n_channels] in SOV-scaled space [0, 1]
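As an illustration of the 60th-percentile rule, the sketch below computes one value per channel after pooling regions and weeks. The helper name hill_g_from_sov is hypothetical, and it assumes every week (including zero or padding weeks) counts toward the percentile; the library may filter weeks differently.

```python
import torch


def hill_g_from_sov(Xm: torch.Tensor, q: float = 0.6) -> torch.Tensor:
    """Per-channel q-th quantile of SOV values (hypothetical helper).

    Xm: [n_regions, n_timesteps, n_channels], values in [0, 1].
    Returns: [n_channels] tensor of candidate hill_g values.
    """
    # Pool regions and weeks so each channel has one flat distribution
    flat = Xm.reshape(-1, Xm.shape[-1])  # [n_regions * n_timesteps, n_channels]
    # dim=0 yields one quantile per channel
    return torch.quantile(flat, q, dim=0)


Xm = torch.rand(2, 104, 5)
hill_g = hill_g_from_sov(Xm)
print(hill_g.shape)  # torch.Size([5])
```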

initialize_stable_coefficients_from_data(Xm: Tensor, Xc: Tensor, y: Tensor)[source]: Initialize stable coefficients from a simple linear regression on the data. This provides data-driven starting points for coefficient stabilization.
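A minimal sketch of one way to obtain such starting points, assuming a single ordinary-least-squares fit of y on the media and control columns plus an intercept, with regions and weeks pooled. The helper ols_stable_coefficients is hypothetical; the library's actual regression setup is not documented here.

```python
import torch


def ols_stable_coefficients(Xm: torch.Tensor, Xc: torch.Tensor, y: torch.Tensor):
    """Hypothetical sketch: fit y ~ Xm + Xc + intercept by ordinary least squares.

    Xm: [R, T, n_media], Xc: [R, T, n_ctrl], y: [R, T].
    Returns (media_coeffs [n_media], ctrl_coeffs [n_ctrl], intercept scalar).
    """
    R, T, n_media = Xm.shape
    # Pool regions and weeks into one design matrix with an intercept column
    ones = torch.ones(R, T, 1, dtype=Xm.dtype)
    X = torch.cat([Xm, Xc, ones], dim=-1).reshape(R * T, -1)
    target = y.reshape(R * T, 1)
    # torch.linalg.lstsq minimizes ||X @ beta - target||_2
    beta = torch.linalg.lstsq(X, target).solution.squeeze(-1)
    return beta[:n_media], beta[n_media:-1], beta[-1]
```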

warm_start_training(Xm: Tensor, Xc: Tensor, R: Tensor, y: Tensor, optimizer: Optimizer, epochs: int = None)[source]: Warm-start training phase to stabilize GRU coefficients before main training. Uses only stable coefficients and focuses on learning good hidden state initialization.

adstock(x: Tensor) → Tensor[source]: Stabilized adstock (carry-over) transformation.

hill(x: Tensor) → Tensor[source]: Stabilized Hill saturation transformation.
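The exact adstock and Hill parameterizations, and what "stabilized" adds to them, are not specified here. The sketch below shows the common geometric adstock and Hill forms as an assumption; g is the half-saturation point (the curve equals 0.5 at x = g), which is the role hill_g appears to play. Function names are hypothetical.

```python
import torch


def geometric_adstock(x: torch.Tensor, decay: torch.Tensor) -> torch.Tensor:
    """Carry-over a[t] = x[t] + decay * a[t-1], applied along the time axis.

    x: [B, T, C]; decay: [C] with values in [0, 1).
    """
    out = []
    carry = torch.zeros_like(x[:, 0, :])
    for t in range(x.shape[1]):
        carry = x[:, t, :] + decay * carry
        out.append(carry)
    return torch.stack(out, dim=1)


def hill_saturation(x: torch.Tensor, g: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """Hill curve x^a / (x^a + g^a); approximately 0.5 when x == g."""
    xa = x.clamp(min=0) ** a
    return xa / (xa + g ** a + 1e-8)  # small epsilon guards against 0 / 0
```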

dag_interaction(x: Tensor) → Tensor[source]

DAG-driven channel interactions that directly shape each channel's effective input.

Each target channel j’s effective input becomes a learned blend of itself and a DAG-driven aggregation of its causal parents:

parents_j = sum_i adj[i, j] * x_i    (column j of adj applied to x, i.e. (x @ adj)[..., j])
x_j_new = (1 - mix_j) * x_j + mix_j * parents_j

Two design choices ensure that NOTEARS genuinely learns structure here rather than being decorative:

mix is a per-channel learnable scalar in (0, 1). The model can turn the DAG up where it helps prediction (strong causal parents) and down where it doesn’t, so each adj column receives genuine per-channel gradient signal rather than a globally-averaged one.
adj uses a temperature-controlled sigmoid (dag_temperature). With a temperature below 1, edges are pushed toward near-0 or near-1 values instead of clustering softly around 0.5 or the sigmoid floor; temperature sharpening is a standard trick for making sigmoid gates bimodal.

With the previous additive form, x + scalar * matmul(x, adj), the loss could be satisfied even by a uniform adjacency at the L1 floor, which is why the learned graph collapsed to roughly equal edges. The blended form forces the model either to pick informative parents or to keep mix_j close to 0 (effectively ignoring the DAG for that channel).
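The blended update can be sketched in PyTorch as follows. This assumes mix is stored as a logit passed through a sigmoid and that triangular mode uses a strictly upper-triangular mask; the function name dag_blend and its arguments are hypothetical.

```python
import torch


def dag_blend(x, adj_logits, mix_logits, temperature=1.0):
    """x: [B, T, C]; adj_logits: [C, C]; mix_logits: [C].

    parents_j = sum_i adj[i, j] * x_i
    x_j_new   = (1 - mix_j) * x_j + mix_j * parents_j
    """
    C = x.shape[-1]
    # Temperature-scaled sigmoid: temperature < 1 pushes edges toward 0 or 1
    adj = torch.sigmoid(adj_logits / temperature)
    # Strictly upper-triangular mask: only edges i -> j with i < j survive,
    # so the graph is acyclic by construction and has no self-loops
    adj = adj * torch.triu(torch.ones(C, C), diagonal=1)
    mix = torch.sigmoid(mix_logits)  # per-channel blend weight in (0, 1)
    parents = x @ adj  # [B, T, C]; column j aggregates channel j's parents
    return (1 - mix) * x + mix * parents, adj


x = torch.rand(2, 52, 3)
x_new, adj = dag_blend(x, torch.randn(3, 3), torch.zeros(3), temperature=0.5)
```

Driving a channel's mix logit strongly negative makes that channel ignore the DAG, which is the escape hatch described above.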

process_media(X: Tensor) → Tuple[Tensor, Dict[str, Tensor]][source]: Process media variables through transformations.

apply_burn_in_stabilization(coeffs: Tensor, stable_coeff: Tensor) → Tensor[source]

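Burn-in stabilization of the time-varying coefficients over the first burn_in_weeks, blending them with stable reference coefficients. The transition curve is selected by stabilization_method ("linear", "exponential", or "sigmoid").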

Parameters:

coeffs – Time-varying coefficients [B, T, dim]
stable_coeff – Stable reference coefficients [dim]

Returns:

Stabilized coefficients with smooth burn-in transition
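A sketch of how the three stabilization_method options could shape the transition from stable to time-varying coefficients. The curve constants (a rate of 3 for "exponential", a steepness of 6 for "sigmoid") and the helper names are assumptions chosen for illustration, not the library's values.

```python
import torch


def burn_in_weights(T: int, burn_in: int, method: str = "exponential") -> torch.Tensor:
    """Weight on the time-varying coefficients, rising from ~0 to 1 over burn_in weeks."""
    t = torch.arange(T, dtype=torch.float32)
    if method == "linear":
        w = t / burn_in
    elif method == "exponential":
        w = 1 - torch.exp(-3.0 * t / burn_in)  # hypothetical rate of 3
    elif method == "sigmoid":
        w = torch.sigmoid(6.0 * (t / burn_in - 0.5))  # hypothetical steepness of 6
    else:
        raise ValueError(f"unknown method: {method}")
    w = w.clamp(0, 1)
    w[t >= burn_in] = 1.0  # after burn-in, use the time-varying coefficients as-is
    return w


def stabilize(coeffs, stable_coeff, burn_in=4, method="exponential"):
    """coeffs: [B, T, dim]; stable_coeff: [dim]."""
    w = burn_in_weights(coeffs.shape[1], burn_in, method).view(1, -1, 1)
    return w * coeffs + (1 - w) * stable_coeff
```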

forward(Xm: Tensor, Xc: Tensor, R: Tensor) → Tuple[Tensor, Tensor, Tensor, Dict[str, Any]][source]

Forward pass through the DeepCausalMMM model.

Processes media and control variables through the neural network to generate predictions, time-varying coefficients, per-channel media contributions, and a detailed outputs dict (baseline, seasonality, control contributions, DAG, etc.).

Parameters:

Xm (torch.Tensor) – Media data tensor of shape [batch_size, time_steps, n_media]. Should be SOV-scaled (Share of Voice), i.e. normalized to the [0, 1] range.
Xc (torch.Tensor) – Control variables tensor of shape [batch_size, time_steps, ctrl_dim]. Should be standardized (z-score normalized).
R (torch.Tensor) – Region indicators tensor of shape [batch_size, time_steps]. Integer values representing region/DMA IDs.

Returns:

predictions (torch.Tensor) – Model predictions (scaled KPI), shape typically [batch_size, time_steps] (broadcast with components before prediction_scale).
media_coefficients (torch.Tensor) – Time-varying media coefficients, shape [batch_size, time_steps, n_media].
media_contributions (torch.Tensor) – Per-channel media contributions X_processed * media_coefficients, shape [batch_size, time_steps, n_media].
outputs (Dict[str, Any]) – Detailed tensors, including:
- contributions: same as media_contributions (media channel breakdown)
- coefficients: same as the returned media_coefficients
- control_contributions: control variable contributions [batch, time, ctrl_dim]
- control_coefficients: control coefficients [batch, time, ctrl_dim]
- baseline: baseline without seasonality (for waterfall-style splits)
- seasonal_contribution: seasonal term [batch, time]
- dag_matrix (when DAG is enabled): adjacency [n_media, n_media]
- adstocked_media, media_hill, media_dag (when applicable): media pipeline stages

Examples

>>> import torch
>>> model = DeepCausalMMM(n_media=3, ctrl_dim=2, n_regions=2)
>>>
>>> # Prepare input tensors
>>> media = torch.rand(2, 52, 3)  # 2 regions, 52 weeks, 3 channels
>>> control = torch.randn(2, 52, 2)  # 2 control variables
>>> regions = torch.tensor([[0]*52, [1]*52])  # Region indicators
>>>
>>> # Forward pass
>>> pred, media_coeffs, media_contrib, outputs = model(media, control, regions)
>>>
>>> # Access detailed outputs
>>> media_contrib_out = outputs['contributions']
>>> ctrl_contrib = outputs['control_contributions']
>>> dag_matrix = outputs.get('dag_matrix')

Notes

The forward pass applies the following transformations in order:
1. Media processing: Adstock -> Hill saturation -> DAG interactions
2. Feature processing: Media features + Control features -> GRU
3. Coefficient generation: Time-varying coefficients from GRU states
4. Contribution calculation: Features * Coefficients
5. Final prediction: Baseline + Seasonality + Media + Control contributions

The model enforces several constraints:
- DAG acyclicity: upper-triangular mask (default), or the NOTEARS penalty when dag_mode='notears'
- Non-negative baseline and seasonal contributions
- Learnable coefficient bounds to prevent explosion
- Burn-in period stabilization for initial weeks

h_acyclicity(W: Tensor) → Tensor[source]

NOTEARS acyclicity scalar: h(W) = tr(exp(W ⊙ W)) − d, where exp is the matrix exponential, ⊙ is the elementwise product, and d = n_media.

Equals zero iff W is the adjacency of a DAG; smooth and differentiable elsewhere. See Zheng et al., 2018 (https://arxiv.org/abs/1803.01422).

Parameters:: W – Square adjacency matrix (n_media × n_media).
Returns:: Scalar tensor; minimised toward 0 during training.
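The formula translates directly into PyTorch via torch.linalg.matrix_exp. This is a sketch, not necessarily how the library computes it: for a strictly upper-triangular W the result is 0 up to floating-point error, while any cycle makes it positive.

```python
import torch


def h_acyclicity(W: torch.Tensor) -> torch.Tensor:
    """NOTEARS constraint h(W) = tr(exp(W ⊙ W)) - d, exp = matrix exponential."""
    d = W.shape[0]
    return torch.trace(torch.linalg.matrix_exp(W * W)) - d


# h is smooth, so it can be minimized with ordinary autograd
W = torch.tensor([[0.0, 0.8], [0.5, 0.0]], requires_grad=True)
h = h_acyclicity(W)
h.backward()
```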

get_dag_loss() → Tensor[source]: DAG regularization. The penalty depends on dag_mode: triangular mode applies only sparsity/confidence penalties, since acyclicity is enforced structurally by the mask; NOTEARS mode additionally adds the augmented-Lagrangian acyclicity term 0.5·rho·h(W)² + alpha·h(W) and an L1 penalty on the full adjacency.
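The NOTEARS-specific part of this loss can be written as below, given h from h_acyclicity and the current rho and alpha. The function name is hypothetical, and the sketch omits the sparsity/confidence penalties and the notears_group_l1 term, whose exact forms are not documented here.

```python
import torch


def notears_penalty(W: torch.Tensor, h: torch.Tensor, rho: float, alpha: float,
                    lambda1: float) -> torch.Tensor:
    """Augmented-Lagrangian acyclicity term plus L1 sparsity on the full adjacency.

    h is the scalar h(W); rho and alpha are the penalty weight and dual
    variable maintained by the outer (dual-update) loop.
    """
    return 0.5 * rho * h ** 2 + alpha * h + lambda1 * W.abs().sum()
```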

notears_update_duals(factor: float = 10.0, progress: float = 0.25) → Dict[str, float][source]

Augmented-Lagrangian dual update (NOTEARS outer loop).

Call once every K epochs from the trainer. Returns diagnostic dict ({“h”, “rho”, “alpha”}) for logging; returns empty dict in triangular mode.

Parameters:

factor – Multiplicative growth applied to rho when h(W) stalls.
progress – Required relative shrinkage of h between outer iterations. If h_new > progress * h_prev, rho is grown by factor.
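A sketch of this dual update written against a plain dict rather than model attributes. It follows the standard NOTEARS outer loop (grow rho when h does not shrink enough, then take a dual-ascent step alpha += rho·h) and caps rho at notears_rho_max; the state layout and function name are assumptions.

```python
def notears_update_duals_sketch(state, h_new, factor=10.0, progress=0.25, rho_max=1e16):
    """One outer-loop step of the augmented Lagrangian (hypothetical state dict).

    state holds 'rho', 'alpha' and 'h_prev' (h at the previous outer iteration).
    """
    h_prev = state.get("h_prev")
    # Grow rho when h has not shrunk enough since the last outer iteration
    if h_prev is not None and h_new > progress * h_prev:
        state["rho"] = min(state["rho"] * factor, rho_max)
    # Dual ascent on alpha, the Lagrange multiplier for the constraint h(W) = 0
    state["alpha"] += state["rho"] * h_new
    state["h_prev"] = h_new
    return {"h": h_new, "rho": state["rho"], "alpha": state["alpha"]}
```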

get_dag_adjacency_matrix(eps: float | None = None) → Tensor[source]

Learned adjacency with dag_temperature and tri_mask applied.

Parameters:: eps – If None, return continuous edge weights. If a float, zero entries with |w| < eps (same rule as threshold_dag).
Returns:: Square adjacency tensor [n_media, n_media].

threshold_dag(eps: float = 0.3) → Tensor[source]

Post-training pruning: zero out adjacency entries with |w| < eps.

Returns the thresholded adjacency tensor. For NOTEARS mode this is the recommended way to obtain a clean discrete DAG from the continuous W.
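The pruning rule amounts to a single torch.where. The helper below is a hypothetical standalone equivalent that returns a new tensor and leaves the input unchanged.

```python
import torch


def threshold_adjacency(adj: torch.Tensor, eps: float = 0.3) -> torch.Tensor:
    """Zero out edges with |w| < eps; returns a new tensor."""
    return torch.where(adj.abs() < eps, torch.zeros_like(adj), adj)
```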

get_sparsity_loss() → Tensor[source]: Sparsity loss to encourage sparse coefficients.

get_regularization_loss() → Tensor[source]: Calculate combined regularization loss including DAG penalty and coefficient regularization.

get_parameters() → Dict[str, Tensor][source]: Get model parameters for analysis.

deepcausalmmm.core.get_default_config() → Dict[str, Any][source]

Get default configuration settings for the model.

Includes the DAG structure-learning keys; dag_mode selects one of:

- 'triangular' (default): upper-triangular acyclicity mask
- 'notears': NOTEARS augmented-Lagrangian mode; see also notears_warmup_epochs, notears_lambda1, notears_dual_*, dag_temperature, and notears_group_l1

Returns:: Dict containing all configuration parameters

deepcausalmmm.core.update_config(base_config: Dict[str, Any], updates: Dict[str, Any]) → Dict[str, Any][source]

Update base configuration with new values.

Parameters:

base_config – Base configuration dictionary
updates – Dictionary containing updates to apply

Returns:

Updated configuration dictionary
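The merge semantics of update_config (copy vs. in-place, shallow vs. nested) are not stated. The sketch below assumes a non-mutating, shallow merge; update_config_sketch is a hypothetical stand-in, not the library function.

```python
import copy
from typing import Any, Dict


def update_config_sketch(base_config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config with `updates` applied; base_config is left untouched."""
    merged = copy.deepcopy(base_config)
    merged.update(updates)  # shallow: a nested dict in `updates` replaces, not merges
    return merged
```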

class deepcausalmmm.core.ModelTrainer(config: Dict[str, Any] | None = None)[source]

Reusable trainer class for DeepCausalMMM models.

This class provides a complete training pipeline for DeepCausalMMM models with advanced features including early stopping, learning rate scheduling, gradient clipping, and comprehensive logging. It supports both MSE and Huber loss functions with automatic device detection and mixed precision training.

Features:
- Config-driven model initialization (zero hardcoding)
- Automatic device detection (CPU/CUDA)
- Multiple loss functions (MSE, Huber, optional Focal)
- Early stopping with patience
- Learning rate scheduling (StepLR, Cosine Annealing)
- Gradient clipping (global and parameter-specific)
- Comprehensive metrics tracking (RMSE, R², MAE)
- Progress bars with detailed statistics
- Holdout evaluation during training

Parameters:: config (Dict[str, Any], optional) – Configuration dictionary containing all training parameters. If None, uses default configuration from get_default_config().

model

The initialized model instance

Type:: DeepCausalMMM

optimizer

The optimizer (Adam by default)

Type:: torch.optim.Optimizer

scheduler

Learning rate scheduler if enabled

Type:: torch.optim.lr_scheduler._LRScheduler

device

Training device (CPU or CUDA)

Type:: torch.device

best_rmse

Best holdout RMSE achieved during training

Type:: float

train_losses

Training loss history

Type:: List[float]

train_rmses

Training RMSE history

Type:: List[float]

train_r2s

Training R² history

Type:: List[float]

Examples

>>> from deepcausalmmm.core.trainer import ModelTrainer
>>> from deepcausalmmm.core.config import get_default_config
>>>
>>> # Initialize trainer with custom config
>>> config = get_default_config()
>>> config['n_epochs'] = 1000
>>> config['learning_rate'] = 0.01
>>> trainer = ModelTrainer(config)
>>>
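>>> # Train model (assumes the train/holdout tensors are already prepared)
>>> results = trainer.train(
...     X_media_train, X_control_train, R_train, y_train,
...     X_media_holdout, X_control_holdout, R_holdout, y_holdout,
... )
>>> model = trainer.model
>>>
>>> # Access holdout metrics
>>> print(f"Final RMSE: {results['holdout_rmse']:.0f}")
>>> print(f"Final R²: {results['holdout_r2']:.3f}")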

__init__(config: Dict[str, Any] | None = None)[source]

Initialize the trainer with configuration.

Parameters:: config – Configuration dictionary. If None, uses default config.

create_model(n_media: int, n_control: int, n_regions: int) → DeepCausalMMM[source]

Create and initialize model from config with reproducible initialization.

Parameters:

n_media – Number of media channels
n_control – Number of control variables
n_regions – Number of regions

Returns:

Initialized DeepCausalMMM model

create_optimizer_and_scheduler()[source]: Create optimizer and learning rate scheduler from config.

warm_start_training(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor, verbose: bool = True) → None[source]

Perform warm-start training to stabilize GRU coefficients.

Parameters:

X_media – Media data tensor
X_control – Control data tensor
R – Region tensor
y – Target tensor
verbose – Whether to show progress

train_epoch(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor) → Tuple[float, float, float][source]

Train for one epoch.

Parameters:

X_media – Media data tensor
X_control – Control data tensor
R – Region tensor
y – Target tensor

Returns:

Tuple of (loss, rmse, r2)

evaluate_holdout(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor) → Tuple[float, float, float][source]

Evaluate model on holdout data.

Parameters:

X_media – Holdout media data tensor
X_control – Holdout control data tensor
R – Holdout region tensor
y – Holdout target tensor

Returns:

Tuple of (loss, rmse, r2)

should_stop_early(current_rmse: float) → bool[source]

Check if training should stop early based on RMSE improvement.

Parameters:: current_rmse – Current epoch’s RMSE
Returns:: True if training should stop
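A minimal patience-based sketch of this check, assuming training stops after patience consecutive epochs without an RMSE improvement larger than min_delta. The class and its parameter names are hypothetical; the real config keys for patience are not listed here.

```python
class EarlyStopper:
    """Patience-based early stopping on RMSE (hypothetical helper)."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_rmse = float("inf")
        self.bad_epochs = 0

    def should_stop_early(self, current_rmse: float) -> bool:
        if current_rmse < self.best_rmse - self.min_delta:
            # Improvement: remember it and reset the patience counter
            self.best_rmse = current_rmse
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```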

train(X_media_train: Tensor, X_control_train: Tensor, R_train: Tensor, y_train: Tensor, X_media_holdout: Tensor | None = None, X_control_holdout: Tensor | None = None, R_holdout: Tensor | None = None, y_holdout: Tensor | None = None, pipeline: Any | None = None, verbose: bool = True) → Dict[str, Any][source]

Full training loop with warm-start, main training, and holdout evaluation.

Parameters:

X_media_train – Training media data
X_control_train – Training control data
R_train – Training region data
y_train – Training target data
X_media_holdout – Optional holdout media data
X_control_holdout – Optional holdout control data
R_holdout – Optional holdout region data
y_holdout – Optional holdout target data
pipeline – Optional UnifiedDataPipeline for accessing scaler
verbose – Whether to show progress

Returns:

Dictionary with training results

class deepcausalmmm.core.InferenceManager(model: DeepCausalMMM, pipeline: UnifiedDataPipeline | None = None, scaler: SimpleGlobalScaler | None = None, config: Dict[str, Any] | None = None, channel_names: List[str] | None = None, control_names: List[str] | None = None)[source]

Modern class-based interface for DeepCausalMMM model inference.

Handles:
- Model predictions on new data
- Contribution analysis (media, control, baseline)
- Coefficient extraction
- Data preprocessing for inference
- Inverse transformations for interpretable results

Initialize the inference manager.

Parameters:

model – Trained DeepCausalMMM model
pipeline – UnifiedDataPipeline used for training (preferred)
scaler – SimpleGlobalScaler used for training (legacy support)
config – Configuration dictionary
channel_names – List of media channel names
control_names – List of control variable names

predict(X_media: ndarray, X_control: ndarray, return_contributions: bool = True, remove_padding: bool = True, return_media_coefficients: bool = False) → Dict[str, ndarray][source]

Make predictions on new data.

Parameters:

X_media – Media data [n_regions, n_weeks, n_media_channels]
X_control – Control data [n_regions, n_weeks, n_control_vars]
return_contributions – Whether to return contribution breakdowns
remove_padding – Whether to remove burn-in padding from results
return_media_coefficients – If True, also return the time-varying media coefficients (the second tensor returned by forward()) under the key media_coefficients.

Returns:

Dictionary containing predictions and optionally contributions

predict_and_inverse_transform(X_media: ndarray, X_control: ndarray, return_contributions: bool = True) → Dict[str, ndarray][source]

Make predictions and apply inverse transformations for interpretable results.

Parameters:

X_media – Media data [n_regions, n_weeks, n_media_channels]
X_control – Control data [n_regions, n_weeks, n_control_vars]
return_contributions – Whether to return contribution breakdowns

Returns:

Dictionary containing predictions and contributions in original scale

get_coefficients() → Dict[str, ndarray][source]

Extract model coefficients.

Returns:: Dictionary containing media and control coefficients

get_dag_adjacency(threshold: bool = False, eps: float | None = None) → ndarray | None[source]

Extract DAG adjacency matrix if available.

Applies the same mask and dag_temperature scaling as during training. Set threshold=True (or pass eps) to prune weak edges; the cutoff defaults to config['notears_threshold'].

Parameters:

threshold – If True, zero entries below eps.
eps – Pruning cutoff; defaults to config['notears_threshold'].

Returns:

Adjacency matrix or None if DAG is not enabled

analyze_contributions(X_media: ndarray, X_control: ndarray, aggregate_regions: bool = True, aggregate_time: bool = False) → Dict[str, Any][source]

Comprehensive contribution analysis.

Parameters:

X_media – Media data
X_control – Control data
aggregate_regions – Whether to aggregate across regions
aggregate_time – Whether to aggregate across time

Returns:

Dictionary with detailed contribution analysis

class deepcausalmmm.core.VisualizationManager(config: Dict[str, Any] | None = None)[source]

Visualization manager for creating consistent plots in DeepCausalMMM analysis.

Provides a unified interface for creating training progress, coefficient analysis, contribution plots, DAG visualizations, and other MMM-related charts. All plot parameters are driven by configuration for consistency.

Parameters:: config (Dict[str, Any], optional) – Configuration dictionary. If None, uses default configuration.

__init__(config: Dict[str, Any] | None = None)[source]

Initialize the visualization manager.

Parameters:: config – Configuration dictionary. If None, uses default config.

create_training_progress_plot(train_losses: List[float], train_rmses: List[float], train_r2s: List[float], title: str = 'Training Progress') → Figure[source]

Create a training progress plot with loss, RMSE, and R².

Parameters:

train_losses – Training losses over epochs
train_rmses – Training RMSEs over epochs
train_r2s – Training R² scores over epochs
title – Plot title

Returns:

Plotly figure

create_actual_vs_predicted_plot(y_actual: ndarray, y_predicted: ndarray, title: str = 'Actual vs Predicted', weeks: List[int] | None = None) → Figure[source]

Create an actual vs predicted time series plot.

Parameters:

y_actual – Actual values
y_predicted – Predicted values
title – Plot title
weeks – Optional week indices for x-axis

Returns:

Plotly figure

create_scatter_plot(x: ndarray, y: ndarray, title: str = 'Scatter Plot', x_label: str = 'X', y_label: str = 'Y', color: str = 'blue') → Figure[source]

Create a scatter plot with perfect correlation line.

Parameters:

x – X values
y – Y values
title – Plot title
x_label – X-axis label
y_label – Y-axis label
color – Marker color

Returns:

Plotly figure

create_waterfall_chart(categories: List[str], values: List[float], title: str = 'Waterfall Chart') → Figure[source]

Create a proper waterfall chart using Plotly’s go.Waterfall.

Parameters:

categories – Category names
values – Values for each category
title – Chart title

Returns:

Plotly figure

create_contribution_stacked_bar(media_contributions: ndarray, control_contributions: ndarray, baseline: ndarray, media_names: List[str], control_names: List[str], weeks: List[int] | None = None, title: str = 'Contributions Over Time') → Figure[source]

Create a stacked bar chart of contributions over time.

Parameters:

media_contributions – Media contributions [n_weeks, n_media]
control_contributions – Control contributions [n_weeks, n_controls]
baseline – Baseline values [n_weeks]
media_names – Media channel names
control_names – Control variable names
weeks – Optional week indices
title – Chart title

Returns:

Plotly figure

create_dag_network_plot(adjacency_matrix: ndarray, node_names: List[str], title: str = 'DAG Network') → Figure[source]

Create a DAG network visualization.

Parameters:

adjacency_matrix – Adjacency matrix [n_nodes, n_nodes]
node_names – Node names
title – Plot title

Returns:

Plotly figure

create_dag_heatmap(adjacency_matrix: ndarray, node_names: List[str], title: str = 'DAG Adjacency Matrix') → Figure[source]

Create a DAG adjacency matrix heatmap.

Parameters:

adjacency_matrix – Adjacency matrix [n_nodes, n_nodes]
node_names – Node names
title – Plot title

Returns:

Plotly figure

save_plot(fig: Figure, filepath: str, include_plotlyjs: str = 'cdn') → bool[source]

Save a Plotly figure to HTML file.

Parameters:

fig – Plotly figure to save
filepath – Output file path
include_plotlyjs – How to include Plotly.js (‘cdn’, ‘inline’, etc.)

Returns:

True if successful, False otherwise

create_comprehensive_dashboard(results: Dict[str, Any], output_dir: str = 'dashboard_comprehensive') → List[Tuple[str, str]][source]

Create a comprehensive dashboard with multiple plots.

Parameters:

results – Training results dictionary
output_dir – Output directory for plots

Returns:

List of (plot_name, filepath) tuples for created plots

class deepcausalmmm.core.UnifiedDataPipeline(config: Dict[str, Any])[source]

Unified data processing pipeline for DeepCausalMMM models.

This pipeline ensures consistent data transformations between training and holdout datasets, implementing the complete preprocessing workflow required for MMM analysis. It handles temporal splitting, multi-scale normalization, seasonal decomposition, and tensor preparation for PyTorch models.

Key Features:
- Temporal train/holdout splitting (respects time series nature)
- SOV (Share of Voice) scaling for media channels
- Z-score normalization for control variables
- Min-Max scaling for seasonal components (per region)
- Burn-in padding for GRU stabilization
- Automatic tensor conversion and device handling
- Inverse transformation utilities for interpretation
- Region encoding and validation

The pipeline maintains data integrity by:
- Using the same scaler fit on training data for holdout
- Preserving temporal order in all transformations
- Handling missing values and outliers appropriately
- Ensuring consistent tensor shapes across regions

Parameters:: config (Dict[str, Any]) – Configuration dictionary containing:
- ‘holdout_ratio’: Fraction of data for holdout (default 0.08)
- ‘burn_in_weeks’: Number of weeks for padding (default 6)
- ‘random_seed’: Seed for reproducible operations (default 42)
- Media channel names, control variable names, etc.

scaler

Fitted scaler for consistent transformations

Type:: SimpleGlobalScaler

seasonal_detector

Seasonal decomposition utility

Type:: DetectSeasonality

media_columns

Names of media channel columns

Type:: List[str]

control_columns

Names of control variable columns

Type:: List[str]

region_column

Name of region identifier column

Type:: str

target_column

Name of target variable column

Type:: str

Examples

>>> import pandas as pd
>>> from deepcausalmmm.core.data import UnifiedDataPipeline
>>> from deepcausalmmm.core.config import get_default_config
>>>
>>> # Load your MMM dataset
>>> df = pd.read_csv('mmm_data.csv')
>>> config = get_default_config()
>>>
>>> # Initialize and fit pipeline
>>> pipeline = UnifiedDataPipeline(config)
>>> processed_data = pipeline.fit_transform(df)
>>>
>>> # Access processed tensors
>>> X_media_train = processed_data['X_media_train']
>>> y_train = processed_data['y_train']
>>>
>>> # Get holdout data
>>> X_media_holdout = processed_data['X_media_holdout']
>>> y_holdout = processed_data['y_holdout']
>>>
>>> print(f"Training shape: {X_media_train.shape}")
>>> print(f"Holdout shape: {X_media_holdout.shape}")

__init__(config: Dict[str, Any])[source]

Initialize the unified data pipeline.

Parameters:: config – Configuration dictionary with all parameters

temporal_split(X_media: ndarray, X_control: ndarray, y: ndarray, holdout_ratio: float | None = None) → Tuple[Dict[str, ndarray], Dict[str, ndarray]][source]

Perform a ratio-based time-series split of the data. This keeps the holdout set adequately sized regardless of the number of burn-in weeks.

Parameters:

X_media – Media data [regions, weeks, channels]
X_control – Control data [regions, weeks, controls]
y – Target data [regions, weeks]
holdout_ratio – Fraction of data for holdout (uses config if None)

Returns:

Tuple of (train_data_dict, holdout_data_dict)
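A sketch of a ratio-based temporal split along the week axis, assuming the holdout is the final holdout_ratio fraction of weeks (rounded, with at least one week). The dict keys and function name are hypothetical.

```python
import numpy as np


def temporal_split_sketch(X_media, X_control, y, holdout_ratio=0.08):
    """Keep the last `holdout_ratio` of weeks as holdout (time axis = 1).

    X_media: [regions, weeks, channels]; X_control: [regions, weeks, controls];
    y: [regions, weeks].
    """
    n_weeks = y.shape[1]
    n_holdout = max(1, int(round(n_weeks * holdout_ratio)))  # at least one week
    cut = n_weeks - n_holdout
    # Slicing (not shuffling) preserves temporal order on both sides of the cut
    train = {"X_media": X_media[:, :cut], "X_control": X_control[:, :cut], "y": y[:, :cut]}
    holdout = {"X_media": X_media[:, cut:], "X_control": X_control[:, cut:], "y": y[:, cut:]}
    return train, holdout
```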

fit_and_transform_training(train_data: Dict[str, ndarray]) → Dict[str, Tensor][source]

Fit scaler on training data and transform it.

Parameters:: train_data – Dictionary with training data arrays
Returns:: Dictionary with transformed and padded tensors

transform_holdout(holdout_data: Dict[str, ndarray]) → Dict[str, Tensor][source]

Transform holdout data using the fitted scaler (same transformations as training).

Parameters:: holdout_data – Dictionary with holdout data arrays
Returns:: Dictionary with transformed and padded tensors

inverse_transform_predictions(y_pred_scaled: Tensor, remove_padding: bool = True) → Tensor[source]

Inverse transform predictions to original scale.

Parameters:

y_pred_scaled – Predictions in scaled space
remove_padding – Whether to remove padding weeks

Returns:

Predictions in original scale

get_evaluation_data(y_true_padded: Tensor, y_pred_padded: Tensor) → Tuple[Tensor, Tensor][source]

Extract evaluation data (removing burn-in padding).

Parameters:

y_true_padded – True values with padding
y_pred_padded – Predicted values with padding

Returns:

Tuple of (y_true_eval, y_pred_eval) without padding

inverse_transform_contributions(media_contributions: Tensor, y_true: Tensor) → Tensor[source]

Inverse transform media contributions to original scale.

Parameters:

media_contributions – Media contributions in scaled space
y_true – True values in original scale (for scaling reference)

Returns:

Media contributions in original scale

get_scaler() → SimpleGlobalScaler[source]

Get the fitted scaler for external use.

Returns:: Fitted SimpleGlobalScaler instance

predict_and_postprocess(model, X_media: ndarray, X_control: ndarray, channel_names: List[str], control_names: List[str], combine_with_holdout: bool = True) → Dict[str, Any][source]

Generate predictions and contributions using the unified pipeline.

Parameters:

model – Trained model
X_media – Media data (full dataset for contributions)
X_control – Control data (full dataset for contributions)
channel_names – Media channel names
control_names – Control variable names
combine_with_holdout – Whether to combine train+holdout for contributions

Returns:

Dictionary with predictions, contributions, and metadata

calculate_metrics(y_true: Tensor, y_pred: Tensor, prefix: str = '') → Dict[str, float]

Calculate comprehensive metrics for model evaluation.

Parameters:

y_true – True values
y_pred – Predicted values
prefix – Prefix for metric names (e.g., ‘train_’, ‘holdout_’)

Returns:

Dictionary of metrics
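
calculate_metrics returns a flat dictionary whose keys carry the given prefix. This lets training and holdout metrics sit side by side in one dictionary. This page does not list which metrics are computed. The sketch below assumes R², RMSE, and MAE purely for illustration, and the function name calculate_metrics_sketch is hypothetical.

```python
import numpy as np


def calculate_metrics_sketch(y_true, y_pred, prefix=""):
    """Hypothetical stand-in: R^2, RMSE and MAE with prefixed keys."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        f"{prefix}r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        f"{prefix}rmse": float(np.sqrt(np.mean(residuals ** 2))),
        f"{prefix}mae": float(np.mean(np.abs(residuals))),
    }


# Prefixes keep the two evaluation splits distinct in one dictionary.
metrics = {}
metrics.update(calculate_metrics_sketch([1, 2, 3, 4], [1, 2, 3, 5], prefix="train_"))
metrics.update(calculate_metrics_sketch([2, 4], [3, 3], prefix="holdout_"))
```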

get_processed_full_data()

Get the processed full dataset (train + holdout) with all transformations applied, including seasonality features, scaling, and padding, in exactly the form the model expects.

Returns:

Dictionary containing processed X_media and X_control tensors

class deepcausalmmm.core.SimpleGlobalScaler(config: Dict[str, Any] | None = None)

Linear scaling approach (y/y_mean) for additive attribution.

Scaling features:

- Media: share-of-voice scaling with outlier smoothing
- Control: robust standardization with adaptive clipping
- Target: linear scaling by region mean (y/y_mean) for additive decomposition
- Adaptive normalization with distribution-aware clipping
- Advanced outlier handling for extreme value stability
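
The target scaling is what keeps attribution additive. Dividing each region's series by a single constant (its mean) is a linear operation, so any additive split of the scaled prediction maps back to an additive split in original units. The NumPy sketch below covers the media and target scalings listed above. It assumes "share of voice" means each channel's value divided by the total across channels for the same region and week. Outlier smoothing and clipping are omitted, and the helper names are hypothetical.

```python
import numpy as np


def share_of_voice(x_media: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """x_media: [n_regions, n_timesteps, n_channels].

    Returns per-channel shares that sum to 1 for every (region, week) with
    nonzero total spend; weeks with zero total spend stay at 0.
    """
    totals = x_media.sum(axis=-1, keepdims=True)
    return x_media / np.maximum(totals, eps)  # eps guards weeks with zero spend


def scale_target(y: np.ndarray):
    """y: [n_regions, n_timesteps] -> (y / y_mean per region, y_mean)."""
    y_mean = y.mean(axis=1, keepdims=True)
    return y / y_mean, y_mean


x_media = np.array([[[30., 10.], [0., 0.]]])  # 1 region, 2 weeks, 2 channels
y = np.array([[50., 150.]])

sov = share_of_voice(x_media)
y_scaled, y_mean = scale_target(y)
y_back = y_scaled * y_mean  # the inverse is a single multiplication
```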

__init__(config: Dict[str, Any] | None = None)

Initialize the scaler with optional configuration parameters.

fit(X_media: ndarray, X_control: ndarray, y: ndarray) → None

Fit the scaler using simple global statistics.

Parameters:

X_media – Media variables [n_regions, n_timesteps, n_channels]
X_control – Control variables [n_regions, n_timesteps, n_controls]
y – Target variable [n_regions, n_timesteps]

transform(X_media: ndarray, X_control: ndarray, y: ndarray) → Tuple[Tensor, Tensor, Tensor]

Transform data using fitted parameters.

Parameters:

X_media – Media variables [n_regions, n_timesteps, n_channels]
X_control – Control variables [n_regions, n_timesteps, n_controls]
y – Target variable [n_regions, n_timesteps]

Returns:

Tuple of (X_media_scaled, X_control_scaled, y_scaled)

inverse_transform_target(y_scaled: Tensor) → Tensor

Inverse-transform the target variable to the original scale.

Parameters:

y_scaled – Scaled target [n_regions, n_timesteps]

Returns:

Target in the original scale

inverse_transform_contributions(media_contributions: Tensor, baseline: Tensor = None, control_contributions: Tensor = None, seasonal_contributions: Tensor = None, trend_contributions: Tensor = None, prediction_scale: Tensor = None) → dict

Inverse-transform all contributions to the original scale using simple multiplication.

With linear scaling (y/y_mean), the inverse transform is straightforward:

component_orig = component_scaled * prediction_scale * y_mean_per_region

This preserves additivity: sum(components_orig) = prediction_orig.

Parameters:

media_contributions – Media contributions in scaled space [regions, timesteps, channels]
baseline – Baseline in scaled space [regions, timesteps]
control_contributions – Control contributions in scaled space [regions, timesteps, controls]
seasonal_contributions – Seasonal contributions in scaled space [regions, timesteps]
trend_contributions – Trend contributions in scaled space [regions, timesteps]
prediction_scale – Model’s prediction_scale factor (from F.softplus(self.prediction_scale))

Returns:

Dictionary with all contributions in original scale
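
The additivity claim can be checked numerically. The sketch below assumes the scaled model output equals prediction_scale times the sum of the scaled components (media, control, baseline, seasonal, trend). It also assumes the original-scale prediction is that output multiplied by y_mean. This is what the formula above implies, but it is not confirmed on this page. Array shapes follow the parameter list, and the helper name to_original is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_weeks, n_channels, n_controls = 2, 5, 3, 2

# Scaled-space components, shaped as in the parameter list.
media = rng.random((n_regions, n_weeks, n_channels))
control = rng.random((n_regions, n_weeks, n_controls))
baseline = rng.random((n_regions, n_weeks))
seasonal = rng.random((n_regions, n_weeks))
trend = rng.random((n_regions, n_weeks))

prediction_scale = np.log1p(np.exp(0.3))  # softplus of a raw parameter
y_mean = np.array([[120.0], [80.0]])       # per-region mean, shape [n_regions, 1]


def to_original(component, scale, y_mean):
    """component_orig = component_scaled * prediction_scale * y_mean_per_region."""
    # Broadcast y_mean over any trailing channel/control axis.
    extra_dims = component.ndim - 2
    return component * scale * y_mean.reshape(y_mean.shape + (1,) * extra_dims)


parts = {
    "media": to_original(media, prediction_scale, y_mean).sum(axis=-1),
    "control": to_original(control, prediction_scale, y_mean).sum(axis=-1),
    "baseline": to_original(baseline, prediction_scale, y_mean),
    "seasonal": to_original(seasonal, prediction_scale, y_mean),
    "trend": to_original(trend, prediction_scale, y_mean),
}

# Assumed model output in scaled space, then undo y / y_mean.
prediction_scaled = prediction_scale * (
    media.sum(axis=-1) + control.sum(axis=-1) + baseline + seasonal + trend
)
prediction_orig = prediction_scaled * y_mean
total_orig = sum(parts.values())
```

Because every component is multiplied by the same per-region constant, the original-scale parts still sum to the original-scale prediction. A nonlinear target transform such as a log would break this property.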

fit_transform(X_media: ndarray, X_control: ndarray, y: ndarray) → Tuple[Tensor, Tensor, Tensor]

Fit the scaler and transform data in one step.

Parameters:

X_media – Media variables [n_regions, n_timesteps, n_channels]
X_control – Control variables [n_regions, n_timesteps, n_controls]
y – Target variable [n_regions, n_timesteps]

Returns:

Tuple of (X_media_scaled, X_control_scaled, y_scaled)

deepcausalmmm.core.GlobalScaler: alias of SimpleGlobalScaler

class deepcausalmmm.core.NodeToEdge(node_dim: int, edge_dim: int)

Transform node features to edge features using an attention mechanism.

__init__(node_dim: int, edge_dim: int)

Initialize the node-to-edge transformation.

Parameters:

node_dim – Dimension of node features
edge_dim – Dimension of edge features

forward(nodes: Tensor, adj_matrix: Tensor) → Tensor

Transform node features to edge features.

Parameters:

nodes – Node features [batch_size, n_nodes, 1]
adj_matrix – Adjacency matrix [n_nodes, n_nodes]

Returns:

Edge features [batch_size, n_nodes, n_nodes, edge_dim]
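
The NumPy sketch below makes the documented shapes concrete with a node-to-edge step that uses a simple attention weight. It is an illustrative assumption, not the library's implementation. Each edge feature is a linear projection of the concatenated (source, target) node values. That projection is scaled by a sigmoid attention score and masked by the adjacency matrix, so absent edges carry zero features. The weight names W_edge, w_query, and w_key are hypothetical.

```python
import numpy as np


def node_to_edge(nodes, adj, W_edge, w_query, w_key):
    """nodes: [B, N, 1], adj: [N, N] -> edges: [B, N, N, edge_dim]."""
    B, N, _ = nodes.shape
    src = np.broadcast_to(nodes[:, :, None, :], (B, N, N, 1))  # node i along axis 1
    dst = np.broadcast_to(nodes[:, None, :, :], (B, N, N, 1))  # node j along axis 2
    pair = np.concatenate([src, dst], axis=-1)                  # [B, N, N, 2]
    # Scalar attention score per (i, j) pair, squashed to (0, 1).
    score = 1.0 / (1.0 + np.exp(-(w_query * src) * (w_key * dst)))
    edges = (pair @ W_edge) * score                              # [B, N, N, edge_dim]
    return edges * adj[None, :, :, None]                         # zero out absent edges


rng = np.random.default_rng(0)
B, N, edge_dim = 2, 4, 8
nodes = rng.random((B, N, 1))
adj = np.triu(np.ones((N, N)), k=1)  # strictly upper triangular adjacency
edges = node_to_edge(nodes, adj, rng.normal(size=(2, edge_dim)), 0.5, 0.5)
```

Source and target values are concatenated in order, so the pairs (i, j) and (j, i) would get different features. With a triangular adjacency, only one direction survives the mask.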

class deepcausalmmm.core.EdgeToNode(edge_dim: int, node_dim: int)

Aggregate edge features back into node features.

__init__(edge_dim: int, node_dim: int)

Initialize the edge-to-node transformation.

Parameters:

edge_dim – Dimension of edge features
node_dim – Dimension of node features

forward(edges: Tensor, nodes: Tensor, adj_matrix: Tensor) → Tensor

Aggregate edge features to update node features.

Parameters:

edges – Edge features [batch_size, n_nodes, n_nodes, edge_dim]
nodes – Node features [batch_size, n_nodes, 1]
adj_matrix – Adjacency matrix [n_nodes, n_nodes]

Returns:

Updated node features [batch_size, n_nodes, 1]
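
EdgeToNode's documented shapes can be illustrated with a minimal NumPy sketch. It assumes each node sums the features of its incoming edges (sources i with adj[i, j] = 1) and projects that sum back to a scalar. The result is then added to the node's current value as a residual update. The projection vector w_out and the residual form are hypothetical choices, not taken from the library.

```python
import numpy as np


def edge_to_node(edges, nodes, adj, w_out):
    """edges: [B, N, N, E], nodes: [B, N, 1], adj: [N, N] -> [B, N, 1]."""
    masked = edges * adj[None, :, :, None]  # keep only existing edges i -> j
    incoming = masked.sum(axis=1)           # sum over sources i: [B, N, E]
    message = incoming @ w_out              # project E -> 1: [B, N]
    return nodes + message[..., None]       # residual update: [B, N, 1]


rng = np.random.default_rng(1)
B, N, E = 2, 3, 4
edges = rng.random((B, N, N, E))
nodes = rng.random((B, N, 1))
adj = np.triu(np.ones((N, N)), k=1)
updated = edge_to_node(edges, nodes, adj, rng.normal(size=E))
```

Under a strictly upper-triangular adjacency, node 0 has no parents. In this sketch it therefore passes through unchanged.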

class deepcausalmmm.core.DAGConstraint(n_nodes: int, sparsity_weight: float = 0.1, temperature: float = 1.0)

Enforce acyclicity in the graph structure using a strictly upper-triangular constraint.

__init__(n_nodes: int, sparsity_weight: float = 0.1, temperature: float = 1.0)

Initialize the DAG constraint module.

Parameters:

n_nodes – Number of nodes in the graph
sparsity_weight – Weight for the sparsity penalty
temperature – Initial temperature for Gumbel-Softmax

gumbel_softmax(logits: Tensor, tau: float) → Tensor

Gumbel-Softmax sampling with straight-through gradients.

Parameters:

logits – Input logits
tau – Temperature parameter

Returns:

Sampled probabilities
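
The NumPy sketch below shows the forward pass of Gumbel-Softmax sampling. It adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax. For the straight-through variant, it then emits the one-hot argmax. NumPy has no autograd, so the gradient half of the straight-through trick is only described in comments. In PyTorch that trick is written y_hard - y_soft.detach() + y_soft, and it is also available as torch.nn.functional.gumbel_softmax(logits, tau, hard=True). Whether DAGConstraint calls that function or its own code is not stated here.

```python
import numpy as np


def gumbel_softmax_sample(logits, tau, rng, hard=False):
    """Sample from the Gumbel-Softmax distribution along the last axis."""
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))               # Gumbel(0, 1) noise
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    y_soft = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    if not hard:
        return y_soft
    # Straight-through: the forward value is the one-hot argmax; in an autograd
    # framework the gradient would flow through y_soft instead.
    y_hard = np.zeros_like(y_soft)
    np.put_along_axis(y_hard, y_soft.argmax(axis=-1)[..., None], 1.0, axis=-1)
    return y_hard


rng = np.random.default_rng(0)
logits = np.array([[2.0, 0.5], [-1.0, 1.0]])  # e.g. "edge" vs "no edge" per pair
soft = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
hard = gumbel_softmax_sample(logits, tau=0.5, rng=rng, hard=True)
```

Lower tau pushes soft samples closer to one-hot vectors. This is why temperature is annealed during training.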

get_adjacency() → Tensor

Get the current adjacency matrix using Gumbel-Softmax sampling. This enforces unidirectional edges and allows a discrete structure to be learned.
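
The acyclicity guarantee behind a strictly upper-triangular adjacency can be checked directly. Every edge then goes from a lower-indexed channel to a higher-indexed one, so no path can return to its start. Equivalently, the matrix A is nilpotent, and A^n = 0 for n nodes. The sketch below uses a sigmoid over free logits as a stand-in for the sampled edge weights, because this page only says Gumbel-Softmax sampling is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
logits = rng.normal(size=(n, n))
edge_prob = 1.0 / (1.0 + np.exp(-logits))  # stand-in for sampled edge weights
mask = np.triu(np.ones((n, n)), k=1)        # 1 strictly above the diagonal
adj = edge_prob * mask

# (A^n)[i, j] sums the weights of all length-n paths from i to j; a strictly
# upper-triangular A admits no such paths, so A^n is the zero matrix.
adj_power = np.linalg.matrix_power(adj, n)
is_acyclic = np.allclose(adj_power, 0.0)
```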

update_temperature(epoch: int, total_epochs: int, min_temp: float = 0.1)

Update the temperature using an exponential decay schedule.

Parameters:

epoch – Current epoch
total_epochs – Total number of epochs
min_temp – Minimum temperature
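
The exact schedule is not given on this page. One common exponential form, assumed here for illustration, decays from the initial temperature tau_0 to min_temp over training and clamps at min_temp: tau(epoch) = max(min_temp, tau_0 * exp(-ln(tau_0 / min_temp) * epoch / total_epochs)). The function name exponential_temperature is hypothetical.

```python
import math


def exponential_temperature(epoch, total_epochs, tau_0=1.0, min_temp=0.1):
    """Hypothetical schedule: decay exponentially from tau_0 to min_temp."""
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    progress = min(max(epoch / total_epochs, 0.0), 1.0)  # clamp to [0, 1]
    rate = math.log(tau_0 / min_temp)                     # total log-decay over training
    tau = tau_0 * math.exp(-rate * progress)
    return max(min_temp, tau)


schedule = [exponential_temperature(e, 100) for e in (0, 50, 100)]
```

With these defaults, the temperature halfway through training is sqrt(0.1) ≈ 0.316, the geometric mean of the start and end values.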

dag_loss() → Tensor

Compute the DAG constraint loss with a sparsity penalty. Because the adjacency matrix is strictly upper triangular, acyclicity is guaranteed by construction, so only the sparsity penalty is needed.

Returns:

Loss term combining sparsity and entropy
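
The return value is described as combining sparsity and entropy. The sketch below rests on several assumptions. The sparsity term is sparsity_weight times the L1 norm of the edge probabilities above the diagonal. The entropy term is the mean binary entropy of those probabilities, which is smallest when every edge is confidently on or off. How DAGConstraint actually weights and signs the two terms is not documented here, so the entropy_weight parameter and the function name are hypothetical.

```python
import numpy as np


def dag_loss_sketch(edge_prob, sparsity_weight=0.1, entropy_weight=0.01, eps=1e-8):
    """edge_prob: [N, N] probabilities; only entries above the diagonal count."""
    n = edge_prob.shape[0]
    upper = np.triu_indices(n, k=1)               # admissible edges i -> j with i < j
    p = np.clip(edge_prob[upper], eps, 1.0 - eps)
    sparsity = sparsity_weight * np.abs(p).sum()  # L1 penalty pushes edges toward 0
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p)).mean()
    return sparsity + entropy_weight * entropy


confident = np.triu(np.full((3, 3), 0.999), k=1)
uncertain = np.triu(np.full((3, 3), 0.5), k=1)
```

With the sparsity term switched off, undecided edges (p = 0.5) cost ln 2 each on average. Near-certain edges cost almost nothing, which is how an entropy term nudges the learned structure toward discrete choices.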

deepcausalmmm.core.train_mmm(*args, **kwargs)

Deprecated since version 1.0.0: train_mmm() is deprecated. Use the ModelTrainer class instead.

Modules

`config`	Configuration settings for DeepCausalMMM model.
`dag_model`	DAG model implementation with Node-to-Edge and Edge-to-Node transformations.
`data`	Data preprocessing and loading utilities for DeepCausalMMM.
`inference`	Modern InferenceManager class for DeepCausalMMM model inference.
`scaling`	Simple, proven scaling implementation that works reliably.
`seasonality`
`train_model`	Training functions for DeepCausalMMM models.
`trainer`	Reusable ModelTrainer class for DeepCausalMMM training.
`unified_model`	DeepCausalMMM model implementation combining GRU, DAG, and interaction components.
`visualization`	Reusable VisualizationManager class for creating consistent plots.