deepcausalmmm.core
Core components of DeepCausalMMM.
- class deepcausalmmm.core.DeepCausalMMM(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)
Deep Causal Marketing Mix Model with DAG structure and channel interactions.
This model combines deep learning with causal inference to understand the impact of marketing channels on business KPIs while learning causal relationships between channels through a Directed Acyclic Graph (DAG).
The model features:
  - GRU-based temporal modeling for time-varying coefficients
  - Learnable coefficient bounds for realistic attribution
  - DAG learning for causal channel interactions (triangular mask or opt-in NOTEARS)
  - Adstock and saturation transformations
  - Multi-region support with shared and region-specific parameters
  - Zero-hardcoding philosophy: all parameters are learnable or configurable
- Parameters:
n_media (int, default=10) – Number of media channels in the dataset
ctrl_dim (int, default=15) – Number of control variables (weather, events, etc.)
hidden (int, default=32) – Hidden dimension size for GRU and MLP layers
n_regions (int, default=2) – Number of geographic regions or DMAs
dropout (float, default=0.1) – Dropout rate for regularization during training
sparsity_weight (float, default=0.01) – Weight for sparsity regularization on coefficients
enable_dag (bool, default=True) – Whether to enable DAG learning for channel interactions
enable_interactions (bool, default=True) – Whether to enable channel interaction modeling
l1_weight (float, default=0.001) – L1 regularization weight for coefficient sparsity
l2_weight (float, default=0.001) – L2 regularization weight for coefficient smoothness
burn_in_weeks (int, default=4) – Number of initial weeks for GRU stabilization
use_coefficient_momentum (bool, default=True) – Whether to use momentum for coefficient stabilization
momentum_decay (float, default=0.9) – Decay rate for coefficient momentum
use_warm_start (bool, default=True) – Whether to use warm start training initialization
warm_start_epochs (int, default=50) – Number of epochs for warm start phase
stabilization_method (str, default="exponential") – Method for coefficient stabilization (“linear”, “exponential”, “sigmoid”)
coeff_l2_weight (float, default=0.1) – L2 regularization specifically for media coefficients
coeff_gen_l2_weight (float, default=0.05) – L2 regularization for coefficient generation layers
gru_layers (int, default=1) – Number of GRU layers (configured, not hardcoded)
ctrl_hidden_ratio (float, default=0.5) – Control hidden size as ratio of main hidden dimension
dag_mode (str, default="triangular") – DAG acyclicity mode: "triangular" (upper-triangular mask, default) or "notears" (continuous NOTEARS penalty; Zheng et al., 2018)
notears_lambda1 (float, default=0.01) – L1 sparsity on the full adjacency in NOTEARS mode
notears_rho_init (float, default=1.0) – Initial augmented-Lagrangian penalty rho
notears_alpha_init (float, default=0.0) – Initial dual variable alpha for NOTEARS
notears_rho_max (float, default=1e16) – Upper cap on rho for numerical safety
dag_temperature (float, default=1.0) – Sigmoid temperature for DAG edge weights (< 1 sharpens toward {0, 1})
notears_group_l1 (float, default=0.0) – Column-group L1 over adjacency columns (NOTEARS only)
- media_coeffs
Time-varying coefficients for media channels
- Type:
torch.nn.Parameter
- ctrl_coeffs
Coefficients for control variables
- Type:
torch.nn.Parameter
- dag_matrix
Learnable DAG adjacency matrix for channel interactions
- Type:
torch.nn.Parameter
- region_baseline
Region-specific baseline contributions
- Type:
torch.nn.Parameter
- seasonal_coeff
Learnable coefficient for seasonal component
- Type:
torch.nn.Parameter
Examples
>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>>
>>> # Initialize model
>>> model = DeepCausalMMM(
...     n_media=5,
...     ctrl_dim=3,
...     n_regions=2,
...     hidden=64
... )
>>>
>>> # Prepare data tensors (media is SOV-scaled to [0, 1], controls are z-scored)
>>> media_data = torch.rand(2, 104, 5)     # [regions, weeks, channels]
>>> control_data = torch.randn(2, 104, 3)  # [regions, weeks, controls]
>>> regions = torch.arange(2).unsqueeze(1).repeat(1, 104)  # [regions, weeks]
>>>
>>> # Forward pass
>>> predictions, media_coeffs, media_contributions, outputs = model(
...     media_data, control_data, regions
... )
>>>
>>> print(f"Predictions shape: {predictions.shape}")
>>> print(f"Media contributions: {outputs['contributions'].shape}")
- __init__(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)
Initialize the model. Arguments are as documented for the DeepCausalMMM class.
- initialize_baseline(y_data: Tensor)
Initialize the baseline to match target data statistics.
y_data must already be in scaled space (y / y_mean per region). All baseline parameters are derived directly from the observed data distribution, and burn-in padding weeks are skipped so they do not bias the baseline.
- initialize_hill_from_data(Xm: Tensor)
Initialize hill_g based on per-channel SOV (Share of Voice) distribution.
For each channel, hill_g is set to the 60th percentile of its SOV values, ensuring the inflection point matches where the channel typically operates.
- Parameters:
Xm – Media data [n_regions, n_timesteps, n_channels] in SOV-scaled space [0, 1]
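A minimal NumPy sketch of the SOV-percentile rule described above. It assumes each channel's SOV values are pooled across all regions and timesteps before taking the percentile (the docstring does not say whether zero-spend weeks are excluded); `init_hill_g` is a hypothetical helper, not part of the package.

```python
import numpy as np


def init_hill_g(xm: np.ndarray, q: float = 60.0) -> np.ndarray:
    """Return one Hill inflection point per channel: the q-th percentile of its SOV values.

    xm: SOV-scaled media of shape [n_regions, n_timesteps, n_channels] in [0, 1].
    Assumption: all regions and timesteps are pooled per channel.
    """
    n_channels = xm.shape[-1]
    pooled = xm.reshape(-1, n_channels)      # [n_regions * n_timesteps, n_channels]
    return np.percentile(pooled, q, axis=0)  # shape [n_channels]


rng = np.random.default_rng(0)
xm = rng.uniform(0.0, 1.0, size=(2, 52, 3))
xm[..., 2] *= 0.2                            # channel 2 typically operates at low SOV
hill_g = init_hill_g(xm)
print(hill_g)
```

Because the percentile tracks where each channel actually operates, a low-SOV channel gets a correspondingly low inflection point.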
- initialize_stable_coefficients_from_data(Xm: Tensor, Xc: Tensor, y: Tensor)
Initialize stable coefficients from a simple linear regression on the data, providing domain-informed starting points for coefficient stabilization.
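The docstring says only that a simple linear regression supplies the starting points. As an assumption, the sketch below runs one pooled ordinary-least-squares fit across all regions and weeks with an intercept; any sign constraints, clipping, or per-region fits the library may use are not shown, and `ols_stable_coefficients` is a hypothetical name.

```python
import numpy as np


def ols_stable_coefficients(xm, xc, y):
    """Pooled OLS of y on media, controls, and an intercept.

    xm: [n_regions, T, n_media], xc: [n_regions, T, n_ctrl], y: [n_regions, T].
    Returns (media_coefs [n_media], ctrl_coefs [n_ctrl], intercept).
    """
    n_media, n_ctrl = xm.shape[-1], xc.shape[-1]
    features = np.concatenate([xm, xc], axis=-1).reshape(-1, n_media + n_ctrl)
    design = np.hstack([features, np.ones((features.shape[0], 1))])  # intercept column
    beta, *_ = np.linalg.lstsq(design, y.reshape(-1), rcond=None)
    return beta[:n_media], beta[n_media:n_media + n_ctrl], beta[-1]


rng = np.random.default_rng(1)
xm = rng.uniform(size=(2, 60, 3))
xc = rng.normal(size=(2, 60, 2))
true_media = np.array([0.5, 1.0, 0.2])
true_ctrl = np.array([0.3, -0.1])
y = xm @ true_media + xc @ true_ctrl + 1.0   # noiseless synthetic target
media_coefs, ctrl_coefs, intercept = ols_stable_coefficients(xm, xc, y)
```

On noiseless synthetic data the fit recovers the generating coefficients, which is a quick sanity check for the reshaping order (rows of the design matrix and entries of y must both be region-major).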
- warm_start_training(Xm: Tensor, Xc: Tensor, R: Tensor, y: Tensor, optimizer: Optimizer, epochs: int = None)
Warm-start training phase to stabilize GRU coefficients before main training. Uses only stable coefficients and focuses on learning good hidden state initialization.
- dag_interaction(x: Tensor) → Tensor
Apply DAG-driven channel interactions that feed directly into the media pipeline (load-bearing rather than decorative).
Each target channel j’s effective input becomes a learned blend of itself and a DAG-driven aggregation of its causal parents:
parents_j = sum_i adj[i, j] * x_i   (column j of adj applied to x)
x_j_new = (1 - mix_j) * x_j + mix_j * parents_j
Two design choices make NOTEARS genuinely learn structure here rather than remain decorative:
  - mix is a per-channel learnable scalar in (0, 1). The model can turn the DAG up where it helps prediction (strong causal parents) and down where it does not, so each adj column receives genuine per-channel gradient signal rather than a globally averaged one.
  - adj uses a temperature-controlled sigmoid (dag_temperature), so the model is encouraged toward near-0 or near-1 edges instead of a soft cluster around 0.5 or at the sigmoid floor. This is the standard trick for making sigmoid gates bimodal.
With the previous additive form, x + scalar * matmul(x, adj), the loss could be satisfied even by a uniform adjacency at the L1 floor, which is why the learned graph collapsed to roughly equal edges. The blended form forces the model either to pick informative parents or to keep mix_j close to 0 (effectively ignoring the DAG for that channel).
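A NumPy sketch of the blended update above. Parameterizing mix_j as a sigmoid of an unconstrained logit is an assumption about how the (0, 1) range is enforced, adj is taken as given (already masked and temperature-scaled), and `dag_blend` is a hypothetical helper rather than the method's implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dag_blend(x, adj, mix_logits):
    """Blend each channel with the DAG aggregation of its parents.

    x: [..., n_media] media features.
    adj: [n_media, n_media], adj[i, j] = weight of edge i -> j (already masked).
    mix_logits: [n_media] unconstrained parameters; mix_j = sigmoid(logit_j).
    """
    mix = sigmoid(mix_logits)            # per-channel mix_j in (0, 1)
    parents = x @ adj                    # parents[..., j] = sum_i x[..., i] * adj[i, j]
    return (1.0 - mix) * x + mix * parents


x = np.array([[2.0, 1.0, 4.0]])          # one week, three channels
adj = np.zeros((3, 3))
adj[0, 1] = 1.0                          # channel 0 is the only parent of channel 1
mix_logits = np.array([-20.0, 0.0, -20.0])  # channels 0 and 2 effectively ignore the DAG
out = dag_blend(x, adj, mix_logits)
```

Channel 1 ends up halfway between its own value (1.0) and its parent's (2.0), while a strongly negative mix logit recovers the raw input, which is the "ignore the DAG" escape hatch described above.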
- process_media(X: Tensor) → Tuple[Tensor, Dict[str, Tensor]]
Process media variables through the media transformations (adstock, Hill saturation, and DAG interactions when enabled).
- apply_burn_in_stabilization(coeffs: Tensor, stable_coeff: Tensor) → Tensor
Burn-in stabilization with multiple transition methods (selected by stabilization_method).
- Parameters:
coeffs – Time-varying coefficients [B, T, dim]
stable_coeff – Stable reference coefficients [dim]
- Returns:
Stabilized coefficients with smooth burn-in transition
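The transition curves behind stabilization_method are not specified in this reference. The sketch below is one plausible reading, labeled as an assumption throughout: during the first burn_in_weeks steps the output is a convex blend w(t) * coeffs + (1 - w(t)) * stable_coeff, where w rises from about 0 to 1 along a linear, exponential, or sigmoid curve. The helper names and curve constants are hypothetical.

```python
import numpy as np


def burn_in_weights(n_steps: int, burn_in: int, method: str = "exponential") -> np.ndarray:
    """Weight on the learned coefficients per step: 0 = stable reference, 1 = fully learned.

    The curve shapes and constants (5.0, 10.0) are illustrative assumptions.
    """
    t = np.arange(n_steps, dtype=float)
    frac = np.clip(t / max(burn_in, 1), 0.0, 1.0)  # progress through the burn-in window
    if method == "linear":
        w = frac
    elif method == "exponential":
        w = (1.0 - np.exp(-5.0 * frac)) / (1.0 - np.exp(-5.0))  # reaches exactly 1 at burn_in
    elif method == "sigmoid":
        w = 1.0 / (1.0 + np.exp(-10.0 * (frac - 0.5)))
        w[t >= burn_in] = 1.0                       # fully learned after burn-in
    else:
        raise ValueError(f"unknown stabilization method: {method!r}")
    return w


def stabilize(coeffs: np.ndarray, stable_coeff: np.ndarray, burn_in: int,
              method: str = "exponential") -> np.ndarray:
    """coeffs: [B, T, dim]; stable_coeff: [dim]. Returns the blended coefficients."""
    w = burn_in_weights(coeffs.shape[1], burn_in, method)[None, :, None]  # [1, T, 1]
    return w * coeffs + (1.0 - w) * stable_coeff


coeffs = np.ones((2, 10, 3))      # learned coefficients
stable = np.zeros(3)              # stable reference
blended = {m: stabilize(coeffs, stable, burn_in=4, method=m)
           for m in ("linear", "exponential", "sigmoid")}
```

With learned coefficients of 1 and a stable reference of 0, the blended output is exactly the weight curve, which makes the three schedules easy to compare.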
- forward(Xm: Tensor, Xc: Tensor, R: Tensor) → Tuple[Tensor, Tensor, Tensor, Dict[str, Any]]
Forward pass through the DeepCausalMMM model.
Processes media and control variables through the neural network to generate predictions, time-varying coefficients, per-channel media contributions, and a detailed outputs dict (baseline, seasonality, control contributions, DAG, etc.).
- Parameters:
Xm (torch.Tensor) – Media data tensor of shape [batch_size, time_steps, n_media]. Should be SOV (Share of Voice) scaled to the [0, 1] range.
Xc (torch.Tensor) – Control variables tensor of shape [batch_size, time_steps, ctrl_dim]. Should be standardized (z-score normalized).
R (torch.Tensor) – Region indicators tensor of shape [batch_size, time_steps], with integer values representing region/DMA IDs.
- Returns:
predictions (torch.Tensor) – Model predictions (scaled KPI), typically of shape [batch_size, time_steps] (broadcast with components before prediction_scale).
media_coefficients (torch.Tensor) – Time-varying media coefficients, shape [batch_size, time_steps, n_media].
media_contributions (torch.Tensor) – Per-channel media contributions X_processed * media_coefficients, shape [batch_size, time_steps, n_media].
outputs (Dict[str, Any]) – Detailed tensors, including:
  - contributions: same as media_contributions (media channel breakdown)
  - coefficients: same as the returned media_coefficients
  - control_contributions: control variable contributions [batch, time, ctrl_dim]
  - control_coefficients: control coefficients [batch, time, ctrl_dim]
  - baseline: baseline without seasonality (for waterfall-style splits)
  - seasonal_contribution: seasonal term [batch, time]
  - dag_matrix (when DAG is enabled): adjacency [n_media, n_media]
  - adstocked_media, media_hill, media_dag (when applicable): media pipeline stages
Examples
>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>> model = DeepCausalMMM(n_media=3, ctrl_dim=2, n_regions=2)
>>>
>>> # Prepare input tensors
>>> media = torch.rand(2, 52, 3)              # 2 regions, 52 weeks, 3 channels
>>> control = torch.randn(2, 52, 2)           # 2 control variables
>>> regions = torch.tensor([[0]*52, [1]*52])  # Region indicators
>>>
>>> # Forward pass
>>> pred, media_coeffs, media_contrib, outputs = model(media, control, regions)
>>>
>>> # Access detailed outputs
>>> media_contrib_out = outputs['contributions']
>>> ctrl_contrib = outputs['control_contributions']
>>> dag_matrix = outputs.get('dag_matrix')
Notes
The forward pass applies the following transformations in order:
  1. Media processing: Adstock -> Hill saturation -> DAG interactions
  2. Feature processing: Media features + Control features -> GRU
  3. Coefficient generation: Time-varying coefficients from GRU states
  4. Contribution calculation: Features * Coefficients
  5. Final prediction: Baseline + Seasonality + Media + Control contributions
The model enforces several constraints:
  - DAG acyclicity: upper-triangular mask (default) or NOTEARS penalty when dag_mode='notears'
  - Non-negative baseline and seasonal contributions
  - Learnable coefficient bounds to prevent explosion
  - Burn-in period stabilization for initial weeks
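- h_acyclicity(W: Tensor) → Tensor
NOTEARS acyclicity scalar: h(W) = tr(exp(W ⊙ W)) − d, where exp is the matrix exponential, ⊙ is the elementwise (Hadamard) product, and d is the number of nodes (n_media).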
Equals zero iff W is the adjacency of a DAG; smooth and differentiable elsewhere. See Zheng et al., 2018 (https://arxiv.org/abs/1803.01422).
- Parameters:
W – Square adjacency matrix (n_media × n_media).
- Returns:
Scalar tensor; minimised toward 0 during training.
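To make the formula concrete, here is a NumPy evaluation of h(W). It computes the trace of the matrix exponential through eigenvalues, which is convenient for a small illustration; a training implementation needs a differentiable matrix exponential on tensors instead (for example torch.matrix_exp), and this sketch is not the package's code.

```python
import numpy as np


def h_acyclicity(w: np.ndarray) -> float:
    """NOTEARS h(W) = tr(exp(W ⊙ W)) - d, with exp the matrix exponential.

    Uses the identity tr(exp(A)) = sum_k exp(lambda_k), valid for any square A.
    """
    a = w * w                                  # elementwise square, so a >= 0
    eigvals = np.linalg.eigvals(a)             # possibly complex
    return float(np.sum(np.exp(eigvals)).real - w.shape[0])


dag = np.array([[0.0, 0.8, 0.3],
                [0.0, 0.0, 0.5],
                [0.0, 0.0, 0.0]])    # strictly upper-triangular, hence acyclic
cycle = np.array([[0.0, 1.0],
                  [1.0, 0.0]])       # edges 0 -> 1 and 1 -> 0 form a cycle
print(h_acyclicity(dag), h_acyclicity(cycle))
```

For the 2-cycle, W ⊙ W has eigenvalues ±1, so h = e + 1/e − 2 ≈ 1.086 > 0, while a DAG's W ⊙ W is nilpotent and gives h = 0.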
- get_dag_loss() → Tensor
DAG regularisation. Mode-aware: triangular uses sparsity/confidence penalties only (acyclicity is structural); NOTEARS additionally adds the augmented-Lagrangian acyclicity term 0.5·rho·h(W)² + alpha·h(W) and an L1 penalty on the full adjacency.
- notears_update_duals(factor: float = 10.0, progress: float = 0.25) → Dict[str, float]
Augmented-Lagrangian dual update (NOTEARS outer loop).
Call once every K epochs from the trainer. Returns diagnostic dict ({“h”, “rho”, “alpha”}) for logging; returns empty dict in triangular mode.
- Parameters:
factor – Multiplicative growth applied to rho when h(W) stalls.
progress – Required relative shrinkage of h between outer iterations. If h_new > progress * h_prev, rho is grown by factor.
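The augmented-Lagrangian bookkeeping behind get_dag_loss and notears_update_duals can be sketched in plain Python on scalar h values. The rho growth rule follows the description above and the defaults mirror the documented factor, progress, and notears_rho_max values; the alpha update (alpha += rho * h) is the standard NOTEARS dual step and is an assumption about this library. The L1 and group-L1 terms are omitted, and both helper names are hypothetical.

```python
from typing import Optional, Tuple


def augmented_lagrangian_term(h: float, rho: float, alpha: float) -> float:
    """Acyclicity part of the NOTEARS loss: 0.5 * rho * h**2 + alpha * h."""
    return 0.5 * rho * h * h + alpha * h


def update_duals(h_new: float, h_prev: Optional[float], rho: float, alpha: float,
                 factor: float = 10.0, progress: float = 0.25,
                 rho_max: float = 1e16) -> Tuple[float, float]:
    """One outer-loop step; returns the updated (rho, alpha)."""
    if h_prev is not None and h_new > progress * h_prev:
        rho = min(rho * factor, rho_max)  # h did not shrink enough: strengthen the penalty
    alpha = alpha + rho * h_new           # dual ascent on alpha (standard NOTEARS step)
    return rho, alpha


rho, alpha, h_prev = 1.0, 0.0, None
for h in [0.8, 0.5, 0.1]:                 # h(W) observed at successive outer iterations
    rho, alpha = update_duals(h, h_prev, rho, alpha)
    h_prev = h
```

In this trace h drops from 0.8 to 0.5, which is not below 0.25 * 0.8, so rho grows to 10; the next drop to 0.1 clears 0.25 * 0.5 and rho stays put.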
- get_dag_adjacency_matrix(eps: float | None = None) → Tensor
Learned adjacency with dag_temperature and tri_mask applied.
- Parameters:
eps – If None, return continuous edge weights. If a float, zero entries with |w| < eps (same rule as threshold_dag).
- Returns:
Square adjacency tensor [n_media, n_media].
- threshold_dag(eps: float = 0.3) → Tensor
Post-training pruning: zero out adjacency entries with |w| < eps.
Returns the thresholded adjacency tensor. For NOTEARS mode this is the recommended way to obtain a clean discrete DAG from the continuous W.
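A NumPy sketch of how a pruned adjacency could be derived from raw edge logits. The form sigmoid(logits / dag_temperature) and the strictly upper-triangular tri_mask are assumptions consistent with the triangular mode described above (the NOTEARS-mode mask may differ), and `adjacency_from_logits` is a hypothetical helper; the pruning rule is the documented |w| < eps cutoff.

```python
from typing import Optional

import numpy as np


def adjacency_from_logits(logits: np.ndarray, temperature: float = 1.0,
                          eps: Optional[float] = None) -> np.ndarray:
    """Temperature-scaled sigmoid edges, strictly upper-triangular mask, optional pruning."""
    n = logits.shape[0]
    weights = 1.0 / (1.0 + np.exp(-logits / temperature))  # temperature < 1 sharpens toward {0, 1}
    tri_mask = np.triu(np.ones((n, n)), k=1)               # keep only edges i -> j with i < j
    adj = weights * tri_mask
    if eps is not None:
        adj = np.where(np.abs(adj) < eps, 0.0, adj)        # zero entries with |w| < eps
    return adj


logits = np.array([[0.0, 2.0, -2.0],
                   [0.0, 0.0, 0.2],
                   [0.0, 0.0, 0.0]])
soft = adjacency_from_logits(logits)                     # continuous weights
sharp = adjacency_from_logits(logits, temperature=0.25)  # pushed toward 0 or 1
pruned = adjacency_from_logits(logits, eps=0.3)          # weak edge 0 -> 2 removed
```

Lowering the temperature moves the same logits closer to 0 or 1, which is why a fixed eps such as 0.3 separates edges more cleanly after training with dag_temperature < 1.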
- deepcausalmmm.core.get_default_config() → Dict[str, Any]
Get default configuration settings for the model.
Includes DAG structure-learning keys under dag_mode:
  - 'triangular' (default): upper-triangular acyclicity mask
  - 'notears': NOTEARS augmented-Lagrangian mode; see also notears_warmup_epochs, notears_lambda1, notears_dual_*, dag_temperature, and notears_group_l1
- Returns:
Dict containing all configuration parameters
- deepcausalmmm.core.update_config(base_config: Dict[str, Any], updates: Dict[str, Any]) → Dict[str, Any]
Update base configuration with new values.
- Parameters:
base_config – Base configuration dictionary
updates – Dictionary containing updates to apply
- Returns:
Updated configuration dictionary
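A short usage sketch for opting into NOTEARS mode through the two helpers documented above. The override values are illustrative only, not tuned recommendations, and update_config is assumed to return the merged dictionary as its Returns entry states.

```python
from deepcausalmmm.core import get_default_config, update_config

# Start from the defaults and override only the DAG structure-learning keys.
config = update_config(get_default_config(), {
    "dag_mode": "notears",    # default is "triangular"
    "notears_lambda1": 0.01,  # L1 sparsity on the full adjacency
    "dag_temperature": 0.5,   # values < 1 sharpen edge weights toward {0, 1}
})
```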
- class deepcausalmmm.core.ModelTrainer(config: Dict[str, Any] | None = None)
Reusable trainer class for DeepCausalMMM models.
This class provides a complete training pipeline for DeepCausalMMM models with advanced features including early stopping, learning rate scheduling, gradient clipping, and comprehensive logging. It supports both MSE and Huber loss functions with automatic device detection and mixed precision training.
Features:
  - Config-driven model initialization (zero hardcoding)
  - Automatic device detection (CPU/CUDA)
  - Multiple loss functions (MSE, Huber, optional Focal)
  - Early stopping with patience
  - Learning rate scheduling (StepLR, Cosine Annealing)
  - Gradient clipping (global and parameter-specific)
  - Comprehensive metrics tracking (RMSE, R², MAE)
  - Progress bars with detailed statistics
  - Holdout evaluation during training
- Parameters:
config (Dict[str, Any], optional) – Configuration dictionary containing all training parameters. If None, uses default configuration from get_default_config().
- model
The initialized model instance
- Type:
DeepCausalMMM
- optimizer
The optimizer (Adam by default)
- Type:
torch.optim.Optimizer
- scheduler
Learning rate scheduler if enabled
- Type:
torch.optim.lr_scheduler._LRScheduler
- device
Training device (CPU or CUDA)
- Type:
torch.device
Examples
>>> from deepcausalmmm.core.trainer import ModelTrainer
>>> from deepcausalmmm.core.config import get_default_config
>>>
>>> # Initialize trainer with custom config
>>> config = get_default_config()
>>> config['n_epochs'] = 1000
>>> config['learning_rate'] = 0.01
>>> trainer = ModelTrainer(config)
>>>
>>> # Train model (assumes the train/holdout tensors have already been prepared)
>>> results = trainer.train(
...     X_media_train, X_control_train, R_train, y_train,
...     X_media_holdout=X_media_holdout, X_control_holdout=X_control_holdout,
...     R_holdout=R_holdout, y_holdout=y_holdout,
... )
>>> model = trainer.model
>>>
>>> # Inspect holdout metrics
>>> print(f"Final RMSE: {results['holdout_rmse']:.0f}")
>>> print(f"Final R²: {results['holdout_r2']:.3f}")
- __init__(config: Dict[str, Any] | None = None)
Initialize the trainer with configuration.
- Parameters:
config – Configuration dictionary. If None, uses default config.
- create_model(n_media: int, n_control: int, n_regions: int) → DeepCausalMMM
Create and initialize model from config with reproducible initialization.
- Parameters:
n_media – Number of media channels
n_control – Number of control variables
n_regions – Number of regions
- Returns:
Initialized DeepCausalMMM model
- warm_start_training(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor, verbose: bool = True) → None
Perform warm-start training to stabilize GRU coefficients.
- Parameters:
X_media – Media data tensor
X_control – Control data tensor
R – Region tensor
y – Target tensor
verbose – Whether to show progress
- train_epoch(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor) → Tuple[float, float, float]
Train for one epoch.
- Parameters:
X_media – Media data tensor
X_control – Control data tensor
R – Region tensor
y – Target tensor
- Returns:
Tuple of (loss, rmse, r2)
- evaluate_holdout(X_media: Tensor, X_control: Tensor, R: Tensor, y: Tensor) → Tuple[float, float, float]
Evaluate model on holdout data.
- Parameters:
X_media – Holdout media data tensor
X_control – Holdout control data tensor
R – Holdout region tensor
y – Holdout target tensor
- Returns:
Tuple of (loss, rmse, r2)
- should_stop_early(current_rmse: float) → bool
Check if training should stop early based on RMSE improvement.
- Parameters:
current_rmse – Current epoch’s RMSE
- Returns:
True if training should stop
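should_stop_early is documented only as an RMSE-improvement check. The sketch below shows a common patience-based rule in plain Python; EarlyStopper, patience, and min_delta are hypothetical names, and the trainer's actual thresholds and config keys may differ.

```python
class EarlyStopper:
    """Signal a stop once RMSE fails to improve by more than min_delta for `patience` checks."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, current_rmse: float) -> bool:
        if current_rmse < self.best - self.min_delta:
            self.best = current_rmse  # new best RMSE: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no meaningful improvement this epoch
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
flags = [stopper.should_stop(rmse) for rmse in [5.0, 4.0, 4.5, 4.2, 3.0]]
```

In a real loop training would break at the first True; the last element only shows that a new best resets the counter.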
- train(X_media_train: Tensor, X_control_train: Tensor, R_train: Tensor, y_train: Tensor, X_media_holdout: Tensor | None = None, X_control_holdout: Tensor | None = None, R_holdout: Tensor | None = None, y_holdout: Tensor | None = None, pipeline: Any | None = None, verbose: bool = True) → Dict[str, Any]
Full training loop with warm-start, main training, and holdout evaluation.
- Parameters:
X_media_train – Training media data
X_control_train – Training control data
R_train – Training region data
y_train – Training target data
X_media_holdout – Optional holdout media data
X_control_holdout – Optional holdout control data
R_holdout – Optional holdout region data
y_holdout – Optional holdout target data
pipeline – Optional UnifiedDataPipeline for accessing scaler
verbose – Whether to show progress
- Returns:
Dictionary with training results
- class deepcausalmmm.core.InferenceManager(model: DeepCausalMMM, pipeline: UnifiedDataPipeline | None = None, scaler: SimpleGlobalScaler | None = None, config: Dict[str, Any] | None = None, channel_names: List[str] | None = None, control_names: List[str] | None = None)
Class-based interface for DeepCausalMMM model inference.
Handles:
  - Model predictions on new data
  - Contribution analysis (media, control, baseline)
  - Coefficient extraction
  - Data preprocessing for inference
  - Inverse transformations for interpretable results
- __init__(model: DeepCausalMMM, pipeline: UnifiedDataPipeline | None = None, scaler: SimpleGlobalScaler | None = None, config: Dict[str, Any] | None = None, channel_names: List[str] | None = None, control_names: List[str] | None = None)
Initialize the inference manager.
- Parameters:
model – Trained DeepCausalMMM model
pipeline – UnifiedDataPipeline used for training (preferred)
scaler – SimpleGlobalScaler used for training (legacy support)
config – Configuration dictionary
channel_names – List of media channel names
control_names – List of control variable names
- predict(X_media: ndarray, X_control: ndarray, return_contributions: bool = True, remove_padding: bool = True, return_media_coefficients: bool = False) → Dict[str, ndarray]
Make predictions on new data.
- Parameters:
X_media – Media data [n_regions, n_weeks, n_media_channels]
X_control – Control data [n_regions, n_weeks, n_control_vars]
return_contributions – Whether to return contribution breakdowns
remove_padding – Whether to remove burn-in padding from results
return_media_coefficients – If True, include the time-varying media coefficients (the second tensor returned by forward()) under the key media_coefficients.
- Returns:
Dictionary containing predictions and optionally contributions
- predict_and_inverse_transform(X_media: ndarray, X_control: ndarray, return_contributions: bool = True) → Dict[str, ndarray]
Make predictions and apply inverse transformations for interpretable results.
- Parameters:
X_media – Media data [n_regions, n_weeks, n_media_channels]
X_control – Control data [n_regions, n_weeks, n_control_vars]
return_contributions – Whether to return contribution breakdowns
- Returns:
Dictionary containing predictions and contributions in original scale
- get_coefficients() → Dict[str, ndarray]
Extract model coefficients.
- Returns:
Dictionary containing media and control coefficients
- get_dag_adjacency(threshold: bool = False, eps: float | None = None) → ndarray | None
Extract DAG adjacency matrix if available.
Uses the same mask + dag_temperature scaling as training. Set threshold=True (or pass eps) to prune weak edges; by default the cutoff is notears_threshold from the config.
- Parameters:
threshold – If True, zero entries below eps.
eps – Pruning cutoff; defaults to config['notears_threshold'].
- Returns:
Adjacency matrix or None if DAG is not enabled
- analyze_contributions(X_media: ndarray, X_control: ndarray, aggregate_regions: bool = True, aggregate_time: bool = False) → Dict[str, Any]
Comprehensive contribution analysis.
- Parameters:
X_media – Media data
X_control – Control data
aggregate_regions – Whether to aggregate across regions
aggregate_time – Whether to aggregate across time
- Returns:
Dictionary with detailed contribution analysis
- class deepcausalmmm.core.VisualizationManager(config: Dict[str, Any] | None = None)
Visualization manager for creating consistent plots in DeepCausalMMM analysis.
Provides a unified interface for creating training progress, coefficient analysis, contribution plots, DAG visualizations, and other MMM-related charts. All plot parameters are driven by configuration for consistency.
- Parameters:
config (Dict[str, Any], optional) – Configuration dictionary. If None, uses default configuration.
- __init__(config: Dict[str, Any] | None = None)
Initialize the visualization manager.
- Parameters:
config – Configuration dictionary. If None, uses default config.
- create_training_progress_plot(train_losses: List[float], train_rmses: List[float], train_r2s: List[float], title: str = 'Training Progress') → Figure
Create a training progress plot with loss, RMSE, and R².
- Parameters:
train_losses – Training losses over epochs
train_rmses – Training RMSEs over epochs
train_r2s – Training R² scores over epochs
title – Plot title
- Returns:
Plotly figure
- create_actual_vs_predicted_plot(y_actual: ndarray, y_predicted: ndarray, title: str = 'Actual vs Predicted', weeks: List[int] | None = None) → Figure
Create an actual vs predicted time series plot.
- Parameters:
y_actual – Actual values
y_predicted – Predicted values
title – Plot title
weeks – Optional week indices for x-axis
- Returns:
Plotly figure
- create_scatter_plot(x: ndarray, y: ndarray, title: str = 'Scatter Plot', x_label: str = 'X', y_label: str = 'Y', color: str = 'blue') → Figure
Create a scatter plot with a perfect-correlation (y = x) reference line.
- Parameters:
x – X values
y – Y values
title – Plot title
x_label – X-axis label
y_label – Y-axis label
color – Marker color
- Returns:
Plotly figure
- create_waterfall_chart(categories: List[str], values: List[float], title: str = 'Waterfall Chart') → Figure
Create a waterfall chart using Plotly’s go.Waterfall.
- Parameters:
categories – Category names
values – Values for each category
title – Chart title
- Returns:
Plotly figure
- create_contribution_stacked_bar(media_contributions: ndarray, control_contributions: ndarray, baseline: ndarray, media_names: List[str], control_names: List[str], weeks: List[int] | None = None, title: str = 'Contributions Over Time') → Figure
Create a stacked bar chart of contributions over time.
- Parameters:
media_contributions – Media contributions [n_weeks, n_media]
control_contributions – Control contributions [n_weeks, n_controls]
baseline – Baseline values [n_weeks]
media_names – Media channel names
control_names – Control variable names
weeks – Optional week indices
title – Chart title
- Returns:
Plotly figure
- create_dag_network_plot(adjacency_matrix: ndarray, node_names: List[str], title: str = 'DAG Network') → Figure
Create a DAG network visualization.
- Parameters:
adjacency_matrix – Adjacency matrix [n_nodes, n_nodes]
node_names – Node names
title – Plot title
- Returns:
Plotly figure
- create_dag_heatmap(adjacency_matrix: ndarray, node_names: List[str], title: str = 'DAG Adjacency Matrix') → Figure
Create a DAG adjacency matrix heatmap.
- Parameters:
adjacency_matrix – Adjacency matrix [n_nodes, n_nodes]
node_names – Node names
title – Plot title
- Returns:
Plotly figure
- save_plot(fig: Figure, filepath: str, include_plotlyjs: str = 'cdn') → bool
Save a Plotly figure to HTML file.
- Parameters:
fig – Plotly figure to save
filepath – Output file path
include_plotlyjs – How to include Plotly.js (‘cdn’, ‘inline’, etc.)
- Returns:
True if successful, False otherwise
- create_comprehensive_dashboard(results: Dict[str, Any], output_dir: str = 'dashboard_comprehensive') → List[Tuple[str, str]]
Create a comprehensive dashboard with multiple plots.
- Parameters:
results – Training results dictionary
output_dir – Output directory for plots
- Returns:
List of (plot_name, filepath) tuples for created plots
- class deepcausalmmm.core.UnifiedDataPipeline(config: Dict[str, Any])
Unified data processing pipeline for DeepCausalMMM models.
This pipeline ensures consistent data transformations between training and holdout datasets, implementing the complete preprocessing workflow required for MMM analysis. It handles temporal splitting, multi-scale normalization, seasonal decomposition, and tensor preparation for PyTorch models.
Key Features:
  - Temporal train/holdout splitting (respects time series nature)
  - SOV (Share of Voice) scaling for media channels
  - Z-score normalization for control variables
  - Min-Max scaling for seasonal components (per region)
  - Burn-in padding for GRU stabilization
  - Automatic tensor conversion and device handling
  - Inverse transformation utilities for interpretation
  - Region encoding and validation
The pipeline maintains data integrity by:
  - Using the same scaler fit on training data for holdout
  - Preserving temporal order in all transformations
  - Handling missing values and outliers appropriately
  - Ensuring consistent tensor shapes across regions
- Parameters:
config (Dict[str, Any]) – Configuration dictionary containing:
  - ‘holdout_ratio’: Fraction of data for holdout (default 0.08)
  - ‘burn_in_weeks’: Number of weeks for padding (default 6)
  - ‘random_seed’: Seed for reproducible operations (default 42)
  - Media channel names, control variable names, etc.
- scaler
Fitted scaler for consistent transformations
- Type:
SimpleGlobalScaler
- seasonal_detector
Seasonal decomposition utility
- Type:
Examples
>>> import pandas as pd
>>> from deepcausalmmm.core.data import UnifiedDataPipeline
>>> from deepcausalmmm.core.config import get_default_config
>>>
>>> # Load your MMM dataset
>>> df = pd.read_csv('mmm_data.csv')
>>> config = get_default_config()
>>>
>>> # Initialize and fit pipeline
>>> pipeline = UnifiedDataPipeline(config)
>>> processed_data = pipeline.fit_transform(df)
>>>
>>> # Access processed tensors
>>> X_media_train = processed_data['X_media_train']
>>> y_train = processed_data['y_train']
>>>
>>> # Get holdout data
>>> X_media_holdout = processed_data['X_media_holdout']
>>> y_holdout = processed_data['y_holdout']
>>>
>>> print(f"Training shape: {X_media_train.shape}")
>>> print(f"Holdout shape: {X_media_holdout.shape}")
- __init__(config: Dict[str, Any])
Initialize the unified data pipeline.
- Parameters:
config – Configuration dictionary with all parameters
- temporal_split(X_media: ndarray, X_control: ndarray, y: ndarray, holdout_ratio: float | None = None) → Tuple[Dict[str, ndarray], Dict[str, ndarray]]
Perform a time-series split of the data using a ratio-based approach. This ensures adequate holdout data regardless of the number of burn-in weeks.
- Parameters:
X_media – Media data [regions, weeks, channels]
X_control – Control data [regions, weeks, controls]
y – Target data [regions, weeks]
holdout_ratio – Fraction of data for holdout (uses config if None)
- Returns:
Tuple of (train_data_dict, holdout_data_dict)
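A NumPy sketch of a ratio-based temporal split on [regions, weeks, ...] arrays, using the documented default holdout_ratio of 0.08. The rounding rule and the dictionary keys are assumptions, not the pipeline's actual output format; burn-in padding is applied afterwards by the transform methods, so it is not shown.

```python
import numpy as np


def temporal_split(x_media, x_control, y, holdout_ratio=0.08):
    """Split along the week axis (axis 1): earliest weeks train, most recent weeks holdout."""
    n_weeks = y.shape[1]
    n_holdout = max(1, int(round(n_weeks * holdout_ratio)))  # rounding rule is an assumption
    cut = n_weeks - n_holdout
    train = {"X_media": x_media[:, :cut], "X_control": x_control[:, :cut], "y": y[:, :cut]}
    holdout = {"X_media": x_media[:, cut:], "X_control": x_control[:, cut:], "y": y[:, cut:]}
    return train, holdout


n_regions, n_weeks = 2, 100
x_media = np.random.default_rng(2).uniform(size=(n_regions, n_weeks, 3))
x_control = np.zeros((n_regions, n_weeks, 2))
y = np.tile(np.arange(n_weeks, dtype=float), (n_regions, 1))  # y encodes the week index
train, holdout = temporal_split(x_media, x_control, y)
```

Splitting on the week axis rather than shuffling keeps every holdout week strictly after every training week, which is what makes the holdout a genuine forecast test.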
- fit_and_transform_training(train_data: Dict[str, ndarray]) → Dict[str, Tensor]
Fit scaler on training data and transform it.
- Parameters:
train_data – Dictionary with training data arrays
- Returns:
Dictionary with transformed and padded tensors
- transform_holdout(holdout_data: Dict[str, ndarray]) → Dict[str, Tensor]
Transform holdout data using the fitted scaler (same transformations as training).
- Parameters:
holdout_data – Dictionary with holdout data arrays
- Returns:
Dictionary with transformed and padded tensors
- inverse_transform_predictions(y_pred_scaled: Tensor, remove_padding: bool = True) → Tensor
Inverse transform predictions to original scale.
- Parameters:
y_pred_scaled – Predictions in scaled space
remove_padding – Whether to remove padding weeks
- Returns:
Predictions in original scale
- get_evaluation_data(y_true_padded: Tensor, y_pred_padded: Tensor) → Tuple[Tensor, Tensor]
Extract evaluation data (removing burn-in padding).
- Parameters:
y_true_padded – True values with padding
y_pred_padded – Predicted values with padding
- Returns:
Tuple of (y_true_eval, y_pred_eval) without padding
- inverse_transform_contributions(media_contributions: Tensor, y_true: Tensor) → Tensor
Inverse transform media contributions to original scale.
- Parameters:
media_contributions – Media contributions in scaled space
y_true – True values in original scale (for scaling reference)
- Returns:
Media contributions in original scale
- get_scaler() → SimpleGlobalScaler
Get the fitted scaler for external use.
- Returns:
Fitted SimpleGlobalScaler instance
- predict_and_postprocess(model, X_media: ndarray, X_control: ndarray, channel_names: List[str], control_names: List[str], combine_with_holdout: bool = True) → Dict[str, Any]
Generate predictions and contributions using the unified pipeline.
- Parameters:
model – Trained model
X_media – Media data (full dataset for contributions)
X_control – Control data (full dataset for contributions)
channel_names – Media channel names
control_names – Control variable names
combine_with_holdout – Whether to combine train+holdout for contributions
- Returns:
Dictionary with predictions, contributions, and metadata
- class deepcausalmmm.core.SimpleGlobalScaler(config: Dict[str, Any] | None = None)
Linear scaling approach (y/y_mean) for additive attribution.
Scaling features:
  - Media: Share-of-voice scaling with outlier smoothing
  - Control: Robust standardization with adaptive clipping
  - Target: Linear scaling by region mean (y/y_mean) for additive decomposition
  - Adaptive normalization with distribution-aware clipping
  - Advanced outlier handling for extreme value stability
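A minimal NumPy sketch of the two core scalings: share of voice for media and y / y_mean per region for the target. Defining SOV as each channel's share of total media in the same region-week is an assumption about what "share of voice" means here, the outlier smoothing and adaptive clipping listed above are omitted, and the helper names are hypothetical.

```python
import numpy as np


def sov_scale(x_media: np.ndarray) -> np.ndarray:
    """Each channel's share of total media in the same region-week (0 where nothing ran)."""
    x = x_media.astype(float)
    total = x.sum(axis=-1, keepdims=True)                     # [regions, weeks, 1]
    return np.divide(x, total, out=np.zeros_like(x), where=total > 0)


def scale_target(y: np.ndarray):
    """Linear scaling y / y_mean with one mean per region; returns (y_scaled, y_mean)."""
    y_mean = y.mean(axis=1, keepdims=True)                    # [regions, 1]
    return y / y_mean, y_mean


x_media = np.array([[[1.0, 3.0], [0.0, 0.0]]])                # 1 region, 2 weeks, 2 channels
y = np.array([[2.0, 4.0, 6.0], [10.0, 10.0, 10.0]])           # 2 regions, 3 weeks
sov = sov_scale(x_media)
y_scaled, y_mean = scale_target(y)
```

Because the target is divided by a positive per-region constant, each region's scaled series has mean 1, and any additive decomposition in scaled space maps back to the original scale by a single multiplication.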
- __init__(config: Dict[str, Any] | None = None)
Initialize the scaler with optional config parameters.
- fit(X_media: ndarray, X_control: ndarray, y: ndarray) → None
Fit the scaler using simple global statistics.
- Parameters:
X_media – Media variables [n_regions, n_timesteps, n_channels]
X_control – Control variables [n_regions, n_timesteps, n_controls]
y – Target variable [n_regions, n_timesteps]
- transform(X_media: ndarray, X_control: ndarray, y: ndarray) → Tuple[Tensor, Tensor, Tensor]
Transform data using fitted parameters.
- Parameters:
X_media – Media variables [n_regions, n_timesteps, n_channels]
X_control – Control variables [n_regions, n_timesteps, n_controls]
y – Target variable [n_regions, n_timesteps]
- Returns:
Tuple of (X_media_scaled, X_control_scaled, y_scaled)
- inverse_transform_target(y_scaled: Tensor) → Tensor
Inverse transform target variable.
- Parameters:
y_scaled – Scaled target [n_regions, n_timesteps]
- Returns:
Target in original scale
- inverse_transform_contributions(media_contributions: Tensor, baseline: Tensor = None, control_contributions: Tensor = None, seasonal_contributions: Tensor = None, trend_contributions: Tensor = None, prediction_scale: Tensor = None) → dict
Inverse transform all contributions to original scale by simple multiplication.
With linear scaling (y/y_mean), each component is inverted as: component_orig = component_scaled * prediction_scale * y_mean_per_region
Because every component in a region is multiplied by the same factor (prediction_scale * y_mean_per_region), additivity is preserved: sum(components_orig) = prediction_orig
- Parameters:
media_contributions – Media contributions in scaled space [regions, timesteps, channels]
baseline – Baseline in scaled space [regions, timesteps]
control_contributions – Control contributions in scaled space [regions, timesteps, controls]
seasonal_contributions – Seasonal contributions in scaled space [regions, timesteps]
trend_contributions – Trend contributions in scaled space [regions, timesteps]
prediction_scale – Model’s prediction_scale factor (from F.softplus(self.prediction_scale))
- Returns:
Dictionary with all contributions in original scale
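The additivity claim can be checked numerically. The sketch below uses random scaled-space components with the documented shapes, assumes the scaled-space prediction is the plain sum of those components, and treats prediction_scale as an already-evaluated scalar (standing in for F.softplus(model.prediction_scale)); all values are made up for illustration.

```python
import numpy as np

# Hypothetical check of the linear inverse transform:
# component_orig = component_scaled * prediction_scale * y_mean_per_region.
rng = np.random.default_rng(42)
n_regions, n_timesteps, n_channels, n_controls = 2, 5, 3, 2

media = rng.random((n_regions, n_timesteps, n_channels))
control = rng.random((n_regions, n_timesteps, n_controls))
baseline = rng.random((n_regions, n_timesteps))
seasonal = rng.random((n_regions, n_timesteps))
trend = rng.random((n_regions, n_timesteps))

# Assumption: scaled-space prediction is the sum of all components.
pred_scaled = baseline + seasonal + trend + media.sum(-1) + control.sum(-1)

prediction_scale = 1.3                  # assumed softplus(prediction_scale) value
y_mean = np.array([[250.0], [40.0]])    # per-region mean, shape [n_regions, 1]
factor = prediction_scale * y_mean      # one multiplier per region

media_orig = media * factor[..., None]  # extra axis broadcasts over channels
control_orig = control * factor[..., None]
baseline_orig, seasonal_orig, trend_orig = (c * factor for c in (baseline, seasonal, trend))
pred_orig = pred_scaled * factor

total = baseline_orig + seasonal_orig + trend_orig + media_orig.sum(-1) + control_orig.sum(-1)
```

Since multiplication by a per-region constant distributes over addition, total equals pred_orig up to floating-point error; a non-linear target transform (for example a log) would not have this property.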
- fit_transform(X_media: ndarray, X_control: ndarray, y: ndarray) → Tuple[Tensor, Tensor, Tensor]
Fit the scaler and transform data in one step.
- Parameters:
X_media – Media variables [n_regions, n_timesteps, n_channels]
X_control – Control variables [n_regions, n_timesteps, n_controls]
y – Target variable [n_regions, n_timesteps]
- Returns:
Tuple of (X_media_scaled, X_control_scaled, y_scaled)
- deepcausalmmm.core.GlobalScaler
Alias of SimpleGlobalScaler.
- class deepcausalmmm.core.NodeToEdge(node_dim: int, edge_dim: int)
Transform node features to edge features using an attention mechanism.
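The attention mechanism itself is not documented here. As a hypothetical illustration of the shape change only, [batch, n_nodes, node_dim] to [batch, n_nodes, n_nodes, edge_dim], one generic design projects every (source, target) pair and weights it with a softmax score over sources. The function node_to_edge and the random weights W_src, W_tgt, v are assumptions, not the module's parameters.

```python
import numpy as np

# Hypothetical sketch of a node-to-edge step with additive attention.
# Weights are random; this is NOT the library's NodeToEdge implementation.

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def node_to_edge(nodes, W_src, W_tgt, v):
    """nodes: [B, N, node_dim] -> edge features [B, N, N, edge_dim]."""
    src = nodes @ W_src                             # [B, N, edge_dim]
    tgt = nodes @ W_tgt                             # [B, N, edge_dim]
    pair = src[:, :, None, :] + tgt[:, None, :, :]  # [B, N, N, edge_dim], entry (i, j)
    scores = np.tanh(pair) @ v                      # [B, N, N] attention logits
    attn = softmax(scores, axis=1)                  # normalize over sources i per target j
    return pair * attn[..., None]                   # attention-weighted edge features

rng = np.random.default_rng(0)
B, N, node_dim, edge_dim = 4, 5, 1, 8
nodes = rng.normal(size=(B, N, node_dim))
edges = node_to_edge(nodes,
                     rng.normal(size=(node_dim, edge_dim)),
                     rng.normal(size=(node_dim, edge_dim)),
                     rng.normal(size=edge_dim))
print(edges.shape)  # (4, 5, 5, 8)
```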
- class deepcausalmmm.core.EdgeToNode(edge_dim: int, node_dim: int)
Aggregate edge features back to nodes.
- __init__(edge_dim: int, node_dim: int)
Initialize the edge-to-node transformation.
- Parameters:
edge_dim – Dimension of edge features
node_dim – Dimension of node features
- forward(edges: Tensor, nodes: Tensor, adj_matrix: Tensor) → Tensor
Aggregate edge features to update node features.
- Parameters:
edges – Edge features [batch_size, n_nodes, n_nodes, edge_dim]
nodes – Node features [batch_size, n_nodes, 1]
adj_matrix – Adjacency matrix [n_nodes, n_nodes]
- Returns:
Updated node features [batch_size, n_nodes, 1]
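Using the shapes documented for forward(), a minimal aggregation can be sketched as follows. It assumes adj_matrix[i, j] != 0 means an edge from node i to node j, sums incoming edge features, projects them to one value per node, and adds the result to the existing node features; edge_to_node and w_out are hypothetical.

```python
import numpy as np

# Hypothetical sketch of edge-to-node aggregation using the documented shapes:
# edges [B, N, N, edge_dim], nodes [B, N, 1], adj_matrix [N, N] -> [B, N, 1].
# Assumed convention: adj_matrix[i, j] != 0 means an edge i -> j.

def edge_to_node(edges, nodes, adj_matrix, w_out):
    masked = edges * adj_matrix[None, :, :, None]  # zero out absent edges
    incoming = masked.sum(axis=1)                  # sum over sources i -> [B, N, edge_dim]
    update = incoming @ w_out                      # project edge_dim -> 1: [B, N, 1]
    return nodes + update                          # residual update of node features

rng = np.random.default_rng(1)
B, N, edge_dim = 2, 4, 8
edges = rng.normal(size=(B, N, N, edge_dim))
nodes = rng.normal(size=(B, N, 1))
adj = np.triu(np.ones((N, N)), k=1)  # i -> j allowed only for i < j
out = edge_to_node(edges, nodes, adj, rng.normal(size=(edge_dim, 1)))
print(out.shape)  # (2, 4, 1)
```

With this triangular adjacency, node 0 has no incoming edges, so its features pass through unchanged.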
- class deepcausalmmm.core.DAGConstraint(n_nodes: int, sparsity_weight: float = 0.1, temperature: float = 1.0)
Enforce acyclicity in the graph structure using a strict triangular constraint.
- __init__(n_nodes: int, sparsity_weight: float = 0.1, temperature: float = 1.0)
Initialize the DAG constraint module.
- Parameters:
n_nodes – Number of nodes in the graph
sparsity_weight – Weight for the sparsity penalty
temperature – Initial temperature for Gumbel-Softmax
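Why a strict triangular constraint guarantees acyclicity: if edges i → j are allowed only when i < j, every directed path visits nodes in increasing order and can never return to where it started. Equivalently, the binary adjacency matrix A of a graph with n nodes is nilpotent (A^n = 0). The sketch below checks this with NumPy; strict_upper_mask and is_acyclic are illustrative helpers, not library functions.

```python
import numpy as np

# Illustrative check: a strictly upper-triangular mask (diagonal excluded)
# only permits edges i -> j with i < j, so the masked graph is always a DAG.

def strict_upper_mask(n_nodes: int) -> np.ndarray:
    return np.triu(np.ones((n_nodes, n_nodes)), k=1)

def is_acyclic(adj: np.ndarray) -> bool:
    """True if no directed cycle exists: a DAG on n nodes has no walk of length n."""
    n = adj.shape[0]
    walks = np.linalg.matrix_power((adj != 0).astype(float), n)  # counts length-n walks
    return bool(np.all(walks == 0))

rng = np.random.default_rng(0)
n = 5
raw = rng.random((n, n))                # unconstrained weights, includes self-loops
masked = raw * strict_upper_mask(n)
print(is_acyclic(raw), is_acyclic(masked))  # False True
```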
- gumbel_softmax(logits: Tensor, tau: float) → Tensor
Gumbel-Softmax sampling with straight-through gradients.
- Parameters:
logits – Input logits
tau – Temperature parameter
- Returns:
Sampled probabilities
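Gumbel-Softmax adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax. The straight-through variant returns the hard one-hot sample in the forward pass while back-propagating through the soft sample; that gradient trick needs an autograd framework, so this NumPy sketch (a hypothetical helper, not the method itself) shows only the forward values.

```python
import numpy as np

# Forward-pass sketch of Gumbel-Softmax sampling. In an autograd framework the
# straight-through estimator is usually written y_hard - y_soft.detach() + y_soft.

def gumbel_softmax_sample(logits, tau, rng, hard=False):
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)  # avoid log(0)
    gumbel = -np.log(-np.log(u))                              # Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    y_soft = y / y.sum(axis=-1, keepdims=True)                # softmax over last axis
    if not hard:
        return y_soft
    y_hard = np.zeros_like(y_soft)                            # one-hot of the argmax
    np.put_along_axis(y_hard, y_soft.argmax(axis=-1)[..., None], 1.0, axis=-1)
    return y_hard

rng = np.random.default_rng(0)
logits = np.log(np.array([[0.7, 0.2, 0.1]]))
soft = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
hard = gumbel_softmax_sample(logits, tau=0.5, rng=rng, hard=True)
```

Lower values of tau push soft samples toward one-hot vectors; higher values push them toward uniform.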
- get_adjacency() → Tensor
Get the current adjacency matrix using Gumbel-Softmax sampling. This enforces unidirectional edges and allows the model to learn a discrete structure.
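One way these pieces could fit together, sketched under assumptions: each candidate edge i → j has a logit, a binary Gumbel-Softmax (relaxed Bernoulli) sample turns it into a soft edge weight in (0, 1), and a strict upper-triangular mask removes self-loops and reverse edges. For two classes with logits [l, 0], the Gumbel-Softmax probability reduces to sigmoid((l + logistic noise) / tau), which is what the code uses. The function names and the per-edge logit parameterization are hypothetical.

```python
import numpy as np

# Hypothetical get_adjacency-style sketch: relaxed Bernoulli edge samples
# combined with a strict upper-triangular mask, so the result is always a DAG.

def relaxed_bernoulli(logits, tau, rng):
    u = rng.uniform(1e-10, 1.0 - 1e-10, size=logits.shape)
    noise = np.log(u) - np.log1p(-u)  # Logistic(0, 1), i.e. a difference of two Gumbels
    return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))

def get_adjacency_sketch(edge_logits, tau, rng):
    n = edge_logits.shape[0]
    mask = np.triu(np.ones((n, n)), k=1)  # allow i -> j only for i < j
    return relaxed_bernoulli(edge_logits, tau, rng) * mask

rng = np.random.default_rng(0)
edge_logits = rng.normal(size=(4, 4))
adj = get_adjacency_sketch(edge_logits, tau=1.0, rng=rng)
```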
- deepcausalmmm.core.train_mmm(*args, **kwargs)
Deprecated since version 1.0.0: train_mmm() is deprecated. Use the ModelTrainer class instead.
Modules
- Configuration settings for DeepCausalMMM model.
- DAG model implementation with Node-to-Edge and Edge-to-Node transformations.
- Data preprocessing and loading utilities for DeepCausalMMM.
- Modern InferenceManager class for DeepCausalMMM model inference.
- Simple, proven scaling implementation that works reliably.
- Training functions for DeepCausalMMM models.
- Reusable ModelTrainer class for DeepCausalMMM training.
- DeepCausalMMM model implementation combining GRU, DAG, and interaction components.
- Reusable VisualizationManager class for creating consistent plots.