deepcausalmmm

DeepCausalMMM: Deep Learning Marketing Mix Modeling with Causal Structure

A PyTorch-based implementation of Marketing Mix Modeling that incorporates:

  • GRU-based time-varying coefficients with advanced stabilization

  • DAG (Directed Acyclic Graph) structure for causal relationships (upper-triangular mask by default; opt-in NOTEARS continuous learning)

  • Channel interaction modeling

  • Regional scaling and analysis

Main Components:

  • DeepCausalMMM: Core model class

  • ComprehensiveAnalyzer: Advanced post-processing and visualization

  • Configuration system for reproducible experiments

class deepcausalmmm.DeepCausalMMM(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)[source]

Deep Causal Marketing Mix Model with DAG structure and channel interactions.

This model combines deep learning with causal inference to understand the impact of marketing channels on business KPIs while learning causal relationships between channels through a Directed Acyclic Graph (DAG).

The model features:

  • GRU-based temporal modeling for time-varying coefficients

  • Learnable coefficient bounds for realistic attribution

  • DAG learning for causal channel interactions (triangular mask or opt-in NOTEARS)

  • Adstock and saturation transformations

  • Multi-region support with shared and region-specific parameters

  • Zero-hardcoding philosophy: all parameters are learnable or configurable

Parameters:
  • n_media (int, default=10) – Number of media channels in the dataset

  • ctrl_dim (int, default=15) – Number of control variables (weather, events, etc.)

  • hidden (int, default=32) – Hidden dimension size for GRU and MLP layers

  • n_regions (int, default=2) – Number of geographic regions or DMAs

  • dropout (float, default=0.1) – Dropout rate for regularization during training

  • sparsity_weight (float, default=0.01) – Weight for sparsity regularization on coefficients

  • enable_dag (bool, default=True) – Whether to enable DAG learning for channel interactions

  • enable_interactions (bool, default=True) – Whether to enable channel interaction modeling

  • l1_weight (float, default=0.001) – L1 regularization weight for coefficient sparsity

  • l2_weight (float, default=0.001) – L2 regularization weight for coefficient smoothness

  • burn_in_weeks (int, default=4) – Number of initial weeks for GRU stabilization

  • use_coefficient_momentum (bool, default=True) – Whether to use momentum for coefficient stabilization

  • momentum_decay (float, default=0.9) – Decay rate for coefficient momentum

  • use_warm_start (bool, default=True) – Whether to use warm start training initialization

  • warm_start_epochs (int, default=50) – Number of epochs for warm start phase

  • stabilization_method (str, default="exponential") – Method for coefficient stabilization (“linear”, “exponential”, “sigmoid”)

  • coeff_l2_weight (float, default=0.1) – L2 regularization specifically for media coefficients

  • coeff_gen_l2_weight (float, default=0.05) – L2 regularization for coefficient generation layers

  • gru_layers (int, default=1) – Number of GRU layers (configured, not hardcoded)

  • ctrl_hidden_ratio (float, default=0.5) – Control hidden size as ratio of main hidden dimension

  • dag_mode (str, default="triangular") – DAG acyclicity mode: "triangular" (upper-triangular mask, default) or "notears" (continuous NOTEARS penalty; Zheng et al., 2018)

  • notears_lambda1 (float, default=0.01) – L1 sparsity on the full adjacency in NOTEARS mode

  • notears_rho_init (float, default=1.0) – Initial augmented-Lagrangian penalty rho

  • notears_alpha_init (float, default=0.0) – Initial dual variable alpha for NOTEARS

  • notears_rho_max (float, default=1e16) – Upper cap on rho for numerical safety

  • dag_temperature (float, default=1.0) – Sigmoid temperature for DAG edge weights (< 1 sharpens toward {0, 1})

  • notears_group_l1 (float, default=0.0) – Column-group L1 over adjacency columns (NOTEARS only)

media_coeffs

Time-varying coefficients for media channels

Type:

torch.nn.Parameter

ctrl_coeffs

Coefficients for control variables

Type:

torch.nn.Parameter

dag_matrix

Learnable DAG adjacency matrix for channel interactions

Type:

torch.nn.Parameter

region_baseline

Region-specific baseline contributions

Type:

torch.nn.Parameter

seasonal_coeff

Learnable coefficient for seasonal component

Type:

torch.nn.Parameter

Examples

>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>>
>>> # Initialize model
>>> model = DeepCausalMMM(
...     n_media=5,
...     ctrl_dim=3,
...     n_regions=2,
...     hidden=64
... )
>>>
>>> # Prepare data tensors
>>> media_data = torch.randn(2, 104, 5)    # [regions, weeks, channels]
>>> control_data = torch.randn(2, 104, 3)  # [regions, weeks, controls]
>>> regions = torch.arange(2).unsqueeze(1).repeat(1, 104)
>>>
>>> # Forward pass
>>> predictions, media_coeffs, media_contributions, outputs = model(
...     media_data, control_data, regions
... )
>>>
>>> print(f"Predictions shape: {predictions.shape}")
>>> print(f"Media contributions: {outputs['contributions'].shape}")
__init__(n_media: int = 10, ctrl_dim: int = 15, hidden: int = 32, n_regions: int = 2, dropout: float = 0.1, sparsity_weight: float = 0.01, enable_dag: bool = True, enable_interactions: bool = True, l1_weight: float = 0.001, l2_weight: float = 0.001, burn_in_weeks: int = 4, use_coefficient_momentum: bool = True, momentum_decay: float = 0.9, use_warm_start: bool = True, warm_start_epochs: int = 50, stabilization_method: str = 'exponential', coeff_l2_weight: float = 0.1, coeff_gen_l2_weight: float = 0.05, gru_layers: int = 1, ctrl_hidden_ratio: float = 0.5, dag_mode: str = 'triangular', notears_lambda1: float = 0.01, notears_rho_init: float = 1.0, notears_alpha_init: float = 0.0, notears_rho_max: float = 1e+16, dag_temperature: float = 1.0, notears_group_l1: float = 0.0)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

initialize_baseline(y_data: Tensor)[source]

Initialize baseline to match target data statistics.

Note: y_data is expected to already be in scaled space (y / y_mean per region). All baseline parameters are extracted directly from the actual data distribution, and padding weeks are skipped to avoid biasing the baseline.

initialize_hill_from_data(Xm: Tensor)[source]

Initialize hill_g based on per-channel SOV (Share of Voice) distribution.

For each channel, hill_g is set to the 60th percentile of its SOV values, ensuring the inflection point matches where the channel typically operates.

Parameters:

Xm – Media data [n_regions, n_timesteps, n_channels] in SOV-scaled space [0, 1]

initialize_stable_coefficients_from_data(Xm: Tensor, Xc: Tensor, y: Tensor)[source]

Initialize stable coefficients based on simple linear regression on the data. This provides domain-informed starting points for coefficient stabilization.

warm_start_training(Xm: Tensor, Xc: Tensor, R: Tensor, y: Tensor, optimizer: Optimizer, epochs: int = None)[source]

Warm-start training phase to stabilize GRU coefficients before main training. Uses only stable coefficients and focuses on learning good hidden state initialization.

adstock(x: Tensor) Tensor[source]

STABILIZED adstock transformation.

hill(x: Tensor) Tensor[source]

STABILIZED Hill saturation transformation.

dag_interaction(x: Tensor) Tensor[source]

Load-bearing DAG-driven channel interactions.

Each target channel j’s effective input becomes a learned blend of itself and a DAG-driven aggregation of its causal parents:

parents_j = sum_i adj[i, j] * x_i    (column j of adj applied to x)

x_j_new = (1 - mix_j) * x_j + mix_j * parents_j

Two design choices make NOTEARS actually structure-learning here rather than decorative:

  • mix is a per-channel learnable scalar in (0, 1). The model can turn the DAG up where it helps prediction (strong causal parents) and down where it doesn’t, so each adj column receives genuine per-channel gradient signal rather than a globally-averaged one.

  • adj uses a temperature-controlled sigmoid (dag_temperature), so the model is encouraged toward {near-0, near-1} edges instead of a soft cluster around 0.5 / sigmoid floor — this is the standard trick for making sigmoid gates bi-modal.

With the previous additive form x + scalar * matmul(x, adj) the loss was satisfied even with a uniform adjacency at the L1 floor, which is why the learned graph collapsed to ~equal edges. The blended form forces the model to either pick informative parents or keep mix_j close to 0 (effectively ignoring the DAG for that channel).
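The blended form can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the names `sigmoid`, `blend_with_parents`, `adj_logits`, and `mix_logits` are hypothetical, and the convention that only edges with i < j are active is an assumption based on the upper-triangular mask.

```python
import math


def sigmoid(z: float, temperature: float = 1.0) -> float:
    """Temperature-controlled sigmoid; temperature < 1 sharpens toward {0, 1}."""
    return 1.0 / (1.0 + math.exp(-z / temperature))


def blend_with_parents(x, adj_logits, mix_logits, temperature=1.0):
    """Blend each channel with a DAG-weighted sum of its parent channels.

    x          -- one time step of channel inputs, length n
    adj_logits -- n x n raw edge scores; only i < j is used (assumed mask)
    mix_logits -- length-n raw scores mapped into (0, 1) by a plain sigmoid
    """
    out = []
    for j in range(len(x)):
        # parents_j = sum_i adj[i, j] * x_i, restricted to the upper triangle
        parents_j = sum(
            sigmoid(adj_logits[i][j], temperature) * x[i] for i in range(j)
        )
        mix_j = sigmoid(mix_logits[j])
        out.append((1.0 - mix_j) * x[j] + mix_j * parents_j)
    return out


x = [0.2, 0.5, 0.8]
logits = [[0.0, 4.0, -4.0], [0.0, 0.0, 4.0], [0.0, 0.0, 0.0]]

# Strongly negative mix logits switch the DAG off: output is almost x itself.
ignored = blend_with_parents(x, logits, mix_logits=[-20.0] * 3)

# A lower temperature pushes the same logit closer to a hard 0/1 edge.
soft_edge = sigmoid(4.0, temperature=1.0)
sharp_edge = sigmoid(4.0, temperature=0.25)
```

With `mix_j` near 0 the channel ignores the graph entirely, which is the escape hatch described above; the temperature only changes how decisively each edge is on or off.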

process_media(X: Tensor) Tuple[Tensor, Dict[str, Tensor]][source]

Process media variables through transformations.

apply_burn_in_stabilization(coeffs: Tensor, stable_coeff: Tensor) Tensor[source]

Advanced burn-in stabilization with multiple transition methods.

Parameters:
  • coeffs – Time-varying coefficients [B, T, dim]

  • stable_coeff – Stable reference coefficients [dim]

Returns:

Stabilized coefficients with smooth burn-in transition
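As a rough illustration of what a burn-in transition could look like, the sketch below blends each week's coefficient toward the stable reference, with the blend weight rising from 0 to 1 over the burn-in window. The weight formulas, the rate constants, and the names `burn_in_weight` and `stabilize` are assumptions made for this example; the library's actual transition functions may differ.

```python
import math


def burn_in_weight(t: int, burn_in_weeks: int, method: str = "exponential") -> float:
    """Weight on the time-varying coefficient at week t (0 means fully stable)."""
    if burn_in_weeks <= 0 or t >= burn_in_weeks:
        return 1.0
    frac = t / burn_in_weeks
    if method == "linear":
        return frac
    if method == "exponential":
        # Rate 3.0 is arbitrary; the division makes the weight reach 1 at frac = 1.
        return (1.0 - math.exp(-3.0 * frac)) / (1.0 - math.exp(-3.0))
    if method == "sigmoid":
        return 1.0 / (1.0 + math.exp(-10.0 * (frac - 0.5)))
    raise ValueError(f"unknown stabilization method: {method}")


def stabilize(coeffs, stable_coeff, burn_in_weeks=4, method="exponential"):
    """Blend a [T][dim] coefficient path toward a [dim] stable reference."""
    out = []
    for t, row in enumerate(coeffs):
        w = burn_in_weight(t, burn_in_weeks, method)
        out.append([(1.0 - w) * s + w * c for c, s in zip(row, stable_coeff)])
    return out


path = [[2.0, 0.0]] * 6          # raw coefficients for 2 channels over 6 weeks
stable = [1.0, 1.0]              # stable reference values
smoothed = stabilize(path, stable, burn_in_weeks=4, method="linear")
```

In this sketch week 0 equals the stable reference, and from week `burn_in_weeks` onward the raw coefficients pass through unchanged.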

forward(Xm: Tensor, Xc: Tensor, R: Tensor) Tuple[Tensor, Tensor, Tensor, Dict[str, Any]][source]

Forward pass through the DeepCausalMMM model.

Processes media and control variables through the neural network to generate predictions, time-varying coefficients, per-channel media contributions, and a detailed outputs dict (baseline, seasonality, control contributions, DAG, etc.).

Parameters:
  • Xm (torch.Tensor) – Media data tensor of shape [batch_size, time_steps, n_media]. Should be SOV-scaled (Share of Voice), i.e. normalized to the [0, 1] range.

  • Xc (torch.Tensor) – Control variables tensor of shape [batch_size, time_steps, ctrl_dim]. Should be standardized (z-score normalized).

  • R (torch.Tensor) – Region indicators tensor of shape [batch_size, time_steps]. Integer values representing region/DMA IDs.

Returns:

  • predictions (torch.Tensor) – Model predictions (scaled KPI), shape typically [batch_size, time_steps] (broadcast with components before prediction_scale).

  • media_coefficients (torch.Tensor) – Time-varying media coefficients, shape [batch_size, time_steps, n_media].

  • media_contributions (torch.Tensor) – Per-channel media contributions X_processed * media_coefficients, shape [batch_size, time_steps, n_media].

  • outputs (Dict[str, Any]) – Detailed tensors, including:

      - contributions: same as media_contributions (media channel breakdown)
      - coefficients: same as returned media_coefficients
      - control_contributions: control variable contributions [batch, time, ctrl_dim]
      - control_coefficients: control coefficients [batch, time, ctrl_dim]
      - baseline: baseline without seasonality (for waterfall-style splits)
      - seasonal_contribution: seasonal term [batch, time]
      - dag_matrix (when DAG enabled): adjacency [n_media, n_media]
      - adstocked_media, media_hill, media_dag (when applicable): media pipeline stages

Examples

>>> import torch
>>> from deepcausalmmm import DeepCausalMMM
>>>
>>> model = DeepCausalMMM(n_media=3, ctrl_dim=2, n_regions=2)
>>>
>>> # Prepare input tensors
>>> media = torch.rand(2, 52, 3)  # 2 regions, 52 weeks, 3 channels
>>> control = torch.randn(2, 52, 2)  # 2 control variables
>>> regions = torch.tensor([[0]*52, [1]*52])  # Region indicators
>>>
>>> # Forward pass
>>> pred, media_coeffs, media_contrib, outputs = model(media, control, regions)
>>>
>>> # Access detailed outputs
>>> media_contrib_out = outputs['contributions']
>>> ctrl_contrib = outputs['control_contributions']
>>> dag_matrix = outputs.get('dag_matrix')

Notes

The forward pass applies the following transformations in order:

  1. Media processing: Adstock -> Hill saturation -> DAG interactions

  2. Feature processing: Media features + Control features -> GRU

  3. Coefficient generation: Time-varying coefficients from GRU states

  4. Contribution calculation: Features * Coefficients

  5. Final prediction: Baseline + Seasonality + Media + Control contributions

The model enforces several constraints:

  • DAG acyclicity: upper-triangular mask (default) or NOTEARS penalty when dag_mode='notears'

  • Non-negative baseline and seasonal contributions

  • Learnable coefficient bounds to prevent explosion

  • Burn-in period stabilization for initial weeks
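The media-processing and final-prediction steps of the forward pass can be sketched in plain Python. This assumes a geometric adstock and a Hill curve normalized to (0, 1); the model's stabilized versions are not specified here, so `geometric_adstock`, `hill`, `predict`, `decay`, and `half_sat` are illustrative names, not library API.

```python
def geometric_adstock(x, decay=0.5):
    """Carry over a fraction `decay` of the previous adstocked value."""
    out, carry = [], 0.0
    for value in x:
        carry = value + decay * carry
        out.append(carry)
    return out


def hill(x, half_sat=0.5, slope=1.0):
    """Hill saturation: 0 at x = 0, 0.5 at x = half_sat, approaching 1."""
    return [v**slope / (half_sat**slope + v**slope) for v in x]


def predict(baseline, seasonality, media_contrib, control_contrib):
    """Final step: additive combination of all components for one time step."""
    return baseline + seasonality + sum(media_contrib) + sum(control_contrib)


spend = [1.0, 0.0, 0.0, 0.0]             # a single burst of media pressure
adstocked = geometric_adstock(spend)      # [1.0, 0.5, 0.25, 0.125]
saturated = hill(adstocked, half_sat=0.5)
y_hat = predict(1.0, 0.5, media_contrib=[0.25, 0.25], control_contrib=[0.0])
```

Because the final prediction is a plain sum, each component can be reported as its own contribution, which is what makes waterfall-style decompositions possible.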

h_acyclicity(W: Tensor) Tensor[source]

NOTEARS acyclicity scalar: h(W) = tr(exp(W ⊙ W)) − d.

Equals zero if and only if W is the adjacency matrix of a DAG, and is smooth and differentiable everywhere, so it can be used directly as a training penalty. See Zheng et al., 2018 (https://arxiv.org/abs/1803.01422).

Parameters:

W – Square adjacency matrix (n_media × n_media).

Returns:

Scalar tensor; minimised toward 0 during training.
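To make the formula concrete, here is a small dependency-free sketch that evaluates h(W) with a truncated power series for the matrix exponential. The helper names and the 30-term truncation are assumptions for illustration; a real implementation would more likely use a library routine for the matrix exponential.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def h_acyclicity(w, terms=30):
    """tr(exp(W ⊙ W)) - d, using a truncated series for the exponential."""
    d = len(w)
    sq = [[v * v for v in row] for row in w]                 # W ⊙ W
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # identity
    trace = float(d)                                         # k = 0 term
    for k in range(1, terms):
        # power becomes (W ⊙ W)^k / k!
        power = [[v / k for v in row] for row in matmul(power, sq)]
        trace += sum(power[i][i] for i in range(d))
    return trace - d


dag = [[0.0, 0.8, 0.3], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]   # strictly upper-triangular
cyclic = [[0.0, 1.0], [1.0, 0.0]]                             # 0 -> 1 -> 0

h_dag = h_acyclicity(dag)      # nilpotent W ⊙ W, so every trace term vanishes
h_cyc = h_acyclicity(cyclic)   # equals 2*cosh(1) - 2, roughly 1.086
```

The strictly upper-triangular matrix gives exactly zero, which is why the triangular mode needs no acyclicity penalty at all.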

get_dag_loss() Tensor[source]

DAG regularisation. Mode-aware: triangular uses sparsity/confidence penalties only (acyclicity is structural); NOTEARS additionally adds the augmented-Lagrangian acyclicity term 0.5·rho·h(W)² + alpha·h(W) and an L1 penalty on the full adjacency.
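The NOTEARS part of this loss can be written out as a short sketch. Only the two terms named above are included; the sparsity/confidence penalties of triangular mode are left out, and the function name `notears_penalty` and its argument names are hypothetical.

```python
def notears_penalty(h, rho, alpha, adjacency, lambda1=0.01):
    """Augmented-Lagrangian acyclicity term plus L1 sparsity on all edges."""
    acyclicity = 0.5 * rho * h**2 + alpha * h
    l1 = lambda1 * sum(abs(v) for row in adjacency for v in row)
    return acyclicity + l1


adj = [[0.0, 0.5], [0.25, 0.0]]

# With h = 0 only the L1 term remains.
loss_acyclic = notears_penalty(h=0.0, rho=1.0, alpha=0.0, adjacency=adj)

# A positive h is penalized quadratically (rho) and linearly (alpha).
loss_cyclic = notears_penalty(h=0.5, rho=4.0, alpha=1.0, adjacency=adj)
```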

notears_update_duals(factor: float = 10.0, progress: float = 0.25) Dict[str, float][source]

Augmented-Lagrangian dual update (NOTEARS outer loop).

Call once every K epochs from the trainer. Returns diagnostic dict ({“h”, “rho”, “alpha”}) for logging; returns empty dict in triangular mode.

Parameters:
  • factor – Multiplicative growth applied to rho when h(W) stalls.

  • progress – Required relative shrinkage of h between outer iterations. If h_new > progress * h_prev, rho is grown by factor.
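A minimal sketch of one outer-loop step follows. The rho growth rule and the returned keys come from the description above; the dual ascent step `alpha += rho * h` is the standard NOTEARS update and is assumed here, as are the `rho_max` cap and the function name `update_duals`.

```python
def update_duals(h_new, h_prev, rho, alpha,
                 factor=10.0, progress=0.25, rho_max=1e16):
    """One outer-loop step: raise alpha, and grow rho if h did not shrink enough."""
    alpha = alpha + rho * h_new              # assumed standard dual ascent step
    if h_new > progress * h_prev:            # h stalled: tighten the penalty
        rho = min(rho * factor, rho_max)     # capped for numerical safety
    return {"h": h_new, "rho": rho, "alpha": alpha}


# h barely moved (0.9 > 0.25 * 1.0), so rho is multiplied by factor.
stalled = update_duals(h_new=0.9, h_prev=1.0, rho=1.0, alpha=0.0)

# h shrank enough (0.1 <= 0.25 * 1.0), so rho is left alone.
improved = update_duals(h_new=0.1, h_prev=1.0, rho=1.0, alpha=0.0)
```

Logging the returned dictionary every K epochs gives a quick view of whether h is actually approaching zero or whether rho keeps escalating.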

get_dag_adjacency_matrix(eps: float | None = None) Tensor[source]

Learned adjacency with dag_temperature and tri_mask applied.

Parameters:

eps – If None, return continuous edge weights. If a float, zero entries with |w| < eps (same rule as threshold_dag).

Returns:

Square adjacency tensor [n_media, n_media].

threshold_dag(eps: float = 0.3) Tensor[source]

Post-training pruning: zero out adjacency entries with |w| < eps.

Returns the thresholded adjacency tensor. For NOTEARS mode this is the recommended way to obtain a clean discrete DAG from the continuous W.
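The pruning rule is simple enough to show on nested lists. This is a sketch of the rule as described, not the library code; `threshold_adjacency` and the example matrix are made up for illustration.

```python
def threshold_adjacency(w, eps=0.3):
    """Zero out entries with |w| < eps; keep the rest unchanged."""
    return [[v if abs(v) >= eps else 0.0 for v in row] for row in w]


learned = [[0.0, 0.82, 0.07], [0.0, 0.0, 0.31], [0.0, 0.0, 0.0]]
pruned = threshold_adjacency(learned, eps=0.3)

# Surviving edges as (parent, child) index pairs.
edges = [(i, j) for i, row in enumerate(pruned)
         for j, v in enumerate(row) if v != 0.0]
```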

get_sparsity_loss() Tensor[source]

Sparsity loss to encourage sparse coefficients.

get_regularization_loss() Tensor[source]

Calculate combined regularization loss including DAG penalty and coefficient regularization.

get_parameters() Dict[str, Tensor][source]

Get model parameters for analysis.

deepcausalmmm.get_default_config() Dict[str, Any][source]

Get default configuration settings for the model.

Includes DAG structure-learning keys under dag_mode:

  • 'triangular' (default) — upper-triangular acyclicity mask

  • 'notears' — NOTEARS augmented-Lagrangian mode; see also notears_warmup_epochs, notears_lambda1, notears_dual_*, dag_temperature, and notears_group_l1

Returns:

Dict containing all configuration parameters

deepcausalmmm.update_config(base_config: Dict[str, Any], updates: Dict[str, Any]) Dict[str, Any][source]

Update base configuration with new values.

Parameters:
  • base_config – Base configuration dictionary

  • updates – Dictionary containing updates to apply

Returns:

Updated configuration dictionary

class deepcausalmmm.ComprehensiveAnalyzer(model, media_cols: List[str], control_cols: List[str], output_dir: str = 'mmm_analysis_results', pipeline=None, auto_detect_burnin: bool = True, manual_burnin_weeks: int | None = None, config: Dict | None = None, inference: InferenceManager | None = None)[source]

Modernized comprehensive analyzer for DeepCausalMMM with config-driven visualizations.

__init__(model, media_cols: List[str], control_cols: List[str], output_dir: str = 'mmm_analysis_results', pipeline=None, auto_detect_burnin: bool = True, manual_burnin_weeks: int | None = None, config: Dict | None = None, inference: InferenceManager | None = None)[source]

Initialize the comprehensive analyzer.

Parameters:
  • model – Trained DeepCausalMMM model

  • media_cols – List of media column names

  • control_cols – List of control column names

  • output_dir – Directory to save outputs

  • pipeline – UnifiedDataPipeline instance for modern data processing

  • auto_detect_burnin – Whether to automatically detect burn-in weeks from model

  • manual_burnin_weeks – Manually specify burn-in weeks (overrides auto-detection)

  • config – Configuration dictionary (uses default if None)

  • inference – Modern InferenceManager instance

inverse_transform_target(y_scaled: ndarray) ndarray[source]

Apply inverse transformation to target variable using modern pipeline.

Parameters:

y_scaled – Scaled target values

Returns:

Unscaled target values

inverse_transform_contributions(contributions_scaled: ndarray, y_original: ndarray) ndarray[source]

Apply inverse transformation to contributions using modern pipeline.

Parameters:
  • contributions_scaled – Scaled contributions

  • y_original – Original scale target values

Returns:

Contributions in original scale

analyze_with_unified_pipeline(X_media: ndarray, X_control: ndarray, y_true: ndarray, create_plots: bool = True) Dict[str, Any][source]

Perform comprehensive analysis using the unified pipeline.

Parameters:
  • X_media – Media data (full dataset)

  • X_control – Control data (full dataset)

  • y_true – True target values (full dataset)

  • create_plots – Whether to create visualization plots

Returns:

Dictionary with analysis results

analyze_comprehensive(X_media: ndarray, X_control: ndarray, y_true: ndarray, region_ids: ndarray, weeks: List[int] | None = None) Dict[str, Any][source]

Run comprehensive analysis with all visualizations. Automatically removes burn-in/padding from all outputs.

Parameters:
  • X_media – Media variables [n_regions, n_weeks, n_channels] (may include padding)

  • X_control – Control variables [n_regions, n_weeks, n_controls] (may include padding)

  • y_true – True target values (scaled, may include padding)

  • region_ids – Region identifiers

  • weeks – Week labels (optional)

Returns:

Dictionary containing all analysis results (burn-in removed)

class deepcausalmmm.ResponseCurveFit(data: DataFrame, *, bottom_param: bool = False, model_level: Literal['Overall', 'DMA'] = 'Overall', date_col: str = 'week_monday')[source]

Fit Hill equation response curves to marketing mix model predictions.

The Hill equation models saturation effects: y = bottom + (top - bottom) * x^slope / (saturation^slope + x^slope)
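The equation can be checked numerically with a few lines of Python. The function name `hill_response` and the parameter values are illustrative only; they are not fitted results.

```python
def hill_response(x, top, saturation, slope, bottom=0.0):
    """y = bottom + (top - bottom) * x^slope / (saturation^slope + x^slope)."""
    return bottom + (top - bottom) * x**slope / (saturation**slope + x**slope)


# Illustrative parameters only, not fitted values.
params = {"top": 1000.0, "saturation": 50.0, "slope": 2.0}

at_zero = hill_response(0.0, **params)       # equals bottom
at_half = hill_response(50.0, **params)      # halfway between bottom and top
near_top = hill_response(5000.0, **params)   # approaches top from below
```

At x equal to `saturation` the response is exactly halfway between `bottom` and `top`, which is why that parameter is read as the half-maximum spend level.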

Parameters:
  • data (pd.DataFrame) – DataFrame with columns ‘week_monday’, ‘spend’, ‘impressions’, and ‘predicted’. DMA-level fitting also needs a ‘dmacode’ column.

  • bottom_param (bool, default=False) – Whether to fit a non-zero intercept (bottom parameter). For MMM this is typically False (response at zero spend = 0).

  • model_level (str, default='Overall') – ‘Overall’: single aggregated curve across all regions; ‘DMA’: separate curves for each DMA.

  • date_col (str, default='week_monday') – Name of the date column

top

Maximum response (saturation level)

Type:

float

bottom

Minimum response (typically 0)

Type:

float

saturation

Spend level at half-maximum response

Type:

float

slope

Steepness of the curve

Type:

float

r_2

R-squared score of the fitted curve

Type:

float

equation

String representation of the fitted equation

Type:

str

figure

Plotly figure object (if generate_figure=True)

Type:

go.Figure

Examples

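>>> import pandas as pd
>>> from deepcausalmmm import ResponseCurveFit
>>>
>>> # Prepare data (dates, spend_values, impression_values, and
>>> # model_predictions are equal-length sequences you supply)
>>> data = pd.DataFrame({
...     'week_monday': dates,
...     'spend': spend_values,
...     'impressions': impression_values,
...     'predicted': model_predictions
... })
>>>
>>> # Fit overall response curve
>>> fitter = ResponseCurveFit(data, model_level='Overall')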
>>> fitter.fit_model(
...     title="Response Curve",
...     x_label="Impressions",
...     y_label="Predicted Visits",
...     generate_figure=True,
...     save_figure=True,
...     output_path='response_curve.html'
... )
>>> print(f"R²: {fitter.r_2:.3f}")
>>> print(f"Slope: {fitter.slope:.3f}")
__init__(data: DataFrame, *, bottom_param: bool = False, model_level: Literal['Overall', 'DMA'] = 'Overall', date_col: str = 'week_monday') None[source]

Initialize ResponseCurveFit.

Parameters:
  • data (pd.DataFrame) – Input data with required columns

  • bottom_param (bool, default=False) – Whether to fit non-zero intercept

  • model_level ({'Overall', 'DMA'}, default='Overall') – Aggregation level for fitting

  • date_col (str, default='week_monday') – Name of date column

Hill(X: ndarray, *params) ndarray[source]

Backward compatibility wrapper for _hill_equation.

get_param(curve_fit_kws: dict) List[float][source]

Backward compatibility wrapper for _fit_curve.

regression(x_fit, y_fit, x_label, y_label, title, sigfigs, log_x, print_r_sqr, generate_figure, view_figure, *params) None[source]

Backward compatibility wrapper for _calculate_r2_and_plot.

fit(*, x_label: str = 'x', y_label: str = 'y', title: str = 'Fitted Hill equation', sigfigs: int = 6, log_x: bool = False, print_r_sqr: bool = True, generate_figure: bool = True, view_figure: bool = False, save_figure: bool = False, output_path: str | None = None, curve_fit_kws: dict | None = None) DataFrame | None[source]

Fit Hill equation to the data.

Parameters:
  • x_label (str, default='x') – X-axis label

  • y_label (str, default='y') – Y-axis label

  • title (str, default='Fitted Hill equation') – Plot title

  • sigfigs (int, default=6) – Significant figures for equation display

  • log_x (bool, default=False) – Whether to use log scale for x-axis

  • print_r_sqr (bool, default=True) – Whether to print R² score

  • generate_figure (bool, default=True) – Whether to generate visualization

  • view_figure (bool, default=False) – Whether to display the figure

  • save_figure (bool, default=False) – Whether to save the figure

  • output_path (str, optional) – Path to save the figure (if save_figure=True)

  • curve_fit_kws (dict, optional) – Additional keyword arguments for scipy.optimize.curve_fit

Returns:

For DMA-level: DataFrame with parameters for each DMA. For Overall: None (parameters stored as attributes).

Return type:

pd.DataFrame or None

fit_model(**kwargs) DataFrame | None[source]

Backward compatibility wrapper for fit().

predict(X: ndarray) ndarray[source]

Predict response for new spend levels (Overall level only).

Parameters:

X (np.ndarray) – New spend/impression values

Returns:

Predicted response values

Return type:

np.ndarray

get_summary()[source]

Get summary of fitted parameters.

Returns:

Dictionary with ‘params’, ‘r2’, and ‘equation’

Return type:

dict

deepcausalmmm.ResponseCurveFitter

alias of ResponseCurveFit

class deepcausalmmm.BudgetOptimizer(budget: float, channels: List[str], response_curves: Dict[str, Dict], *, num_weeks: int = 52, method: str = 'trust-constr')[source]

Optimize marketing budget allocation using response curves.

Uses constrained optimization (trust-constr, SLSQP, or differential evolution) with Hill transformation curves from ResponseCurveFit to find optimal spend allocation that maximizes total response subject to business constraints.

Parameters:
  • budget (float) – Total budget to allocate across all channels

  • channels (List[str]) – List of channel names to include in optimization

  • response_curves (Dict[str, Dict]) – Response curve parameters by channel from ResponseCurveFit. Each channel dict should contain: ‘top’, ‘bottom’, ‘saturation’, ‘slope’

  • num_weeks (int, default=52) – Number of weeks for planning horizon (annual by default)

  • method (str, default='trust-constr') – Optimization method: ‘trust-constr’, ‘SLSQP’, ‘differential_evolution’, ‘hybrid’

constraints_df

DataFrame with channel-level constraints (lower, upper bounds)

Type:

pd.DataFrame or None

Examples

>>> from deepcausalmmm import BudgetOptimizer
>>>
>>> # After fitting response curves with ResponseCurveFit
>>> curves = {
...     'TV': {'top': 1000000, 'bottom': 0, 'saturation': 50000, 'slope': 1.5},
...     'Search': {'top': 800000, 'bottom': 0, 'saturation': 30000, 'slope': 2.0},
...     'Social': {'top': 600000, 'bottom': 0, 'saturation': 20000, 'slope': 1.8}
... }
>>>
>>> optimizer = BudgetOptimizer(
...     budget=1000000,
...     channels=['TV', 'Search', 'Social'],
...     response_curves=curves,
...     num_weeks=52
... )
>>>
>>> # Optional: Set channel-specific constraints
>>> optimizer.set_constraints({
...     'TV': {'lower': 50000, 'upper': 500000},
...     'Search': {'lower': 100000, 'upper': 400000}
... })
>>>
>>> # Run optimization
>>> result = optimizer.optimize()
>>>
>>> # View results
>>> if result.success:
...     print("Optimal Allocation:")
...     for channel, spend in result.allocation.items():
...         print(f"  {channel}: ${spend:,.0f}")
...     print(f"\nTotal Response: {result.predicted_response:,.0f}")
...     print(f"\nDetailed Results:\n{result.by_channel}")

Notes

The optimizer maximizes total response using the Hill equation:

\[response = bottom + (top - bottom) * \frac{spend^{slope}}{saturation^{slope} + spend^{slope}}\]

Where:

  • top: Maximum response (saturation level)

  • bottom: Minimum response (typically 0)

  • saturation: Spend level at half-maximum response

  • slope: Steepness of the response curve

The optimization problem is:

\[\begin{aligned}
\max_{x_1, \ldots, x_n} \quad & \sum_{i=1}^{n} response_i(x_i) \\
\text{s.t.} \quad & \sum_{i=1}^{n} x_i = budget \\
& lower_i \leq x_i \leq upper_i \quad \forall i
\end{aligned}\]
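To make the optimization problem tangible, the sketch below evaluates the objective and enforces both constraints with a brute-force grid search over two channels. The grid search is a stand-in chosen for readability, not the trust-constr, SLSQP, or differential-evolution methods the class uses, and every function name here is hypothetical.

```python
def hill_response(x, top, bottom, saturation, slope):
    """Per-channel response from the Hill equation."""
    return bottom + (top - bottom) * x**slope / (saturation**slope + x**slope)


def total_response(allocation, curves):
    """Objective: sum of per-channel Hill responses."""
    return sum(hill_response(spend, **curves[ch])
               for ch, spend in allocation.items())


def grid_search_two_channels(budget, curves, bounds, steps=100):
    """Brute-force stand-in for a real solver; only handles two channels."""
    a, b = list(curves)
    best = None
    for k in range(steps + 1):
        x_a = budget * k / steps
        x_b = budget - x_a                   # equality constraint: spend sums to budget
        ok_a = bounds[a][0] <= x_a <= bounds[a][1]
        ok_b = bounds[b][0] <= x_b <= bounds[b][1]
        if not (ok_a and ok_b):
            continue                         # violates a lower/upper bound
        candidate = {a: x_a, b: x_b}
        score = total_response(candidate, curves)
        if best is None or score > best[1]:
            best = (candidate, score)
    return best


curves = {
    "TV": {"top": 1_000_000, "bottom": 0, "saturation": 50_000, "slope": 1.5},
    "Search": {"top": 800_000, "bottom": 0, "saturation": 30_000, "slope": 2.0},
}
bounds = {"TV": (0, 100_000), "Search": (0, 100_000)}
allocation, response = grid_search_two_channels(100_000, curves, bounds)
```

Hill curves with slope above 1 are S-shaped rather than concave, so the objective can have several local optima; that is one reason a global or hybrid method may be worth trying alongside a gradient-based one.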
__init__(budget: float, channels: List[str], response_curves: Dict[str, Dict], *, num_weeks: int = 52, method: str = 'trust-constr')[source]

Initialize BudgetOptimizer with budget, channels, and response curves.

set_constraints(constraints: Dict[str, Dict[str, float]]) None[source]

Set spend constraints for channels.

Parameters:

constraints (Dict[str, Dict[str, float]]) – Channel constraints: {‘channel’: {‘lower’: min_spend, ‘upper’: max_spend}}

Examples

>>> optimizer.set_constraints({
...     'TV': {'lower': 50000, 'upper': 500000},
...     'Search': {'lower': 100000, 'upper': 400000},
...     'Social': {'lower': 25000, 'upper': 300000}
... })

Notes

  • Channels not specified in constraints get default bounds: [0, budget]

  • Upper bounds are automatically capped at total budget

  • If lower > upper, lower is reset to 0

  • Upper bounds cannot be 0 (would make channel unusable)
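The rules in these notes can be summarized as a small normalization function. This is a sketch of the documented behavior, not the library's code: `normalize_bounds` is a hypothetical name, and raising `ValueError` for a zero upper bound is an assumption, since the notes do not say how that case is handled.

```python
def normalize_bounds(channels, constraints, budget):
    """Apply the documented defaulting and capping rules to raw constraints."""
    bounds = {}
    for ch in channels:
        spec = constraints.get(ch, {})
        lower = spec.get("lower", 0.0)                    # default lower bound: 0
        upper = min(spec.get("upper", budget), budget)    # cap at total budget
        if upper <= 0:
            # Assumption: reject rather than silently repair a zero upper bound.
            raise ValueError(f"upper bound for {ch!r} must be positive")
        if lower > upper:
            lower = 0.0                                   # inconsistent pair: reset lower
        bounds[ch] = (lower, upper)
    return bounds


bounds = normalize_bounds(
    ["TV", "Search", "Social"],
    {
        "TV": {"lower": 50_000, "upper": 2_000_000},      # upper exceeds budget
        "Search": {"lower": 900_000, "upper": 400_000},   # lower > upper
    },
    budget=1_000_000,
)
```

Here TV's upper bound is capped at the budget, Search's lower bound is reset to 0, and Social falls back to the default range.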

optimize() OptimizationResult[source]

Run optimization to find optimal budget allocation.

Returns:

Optimization results including allocation, predicted response, and details

Return type:

OptimizationResult

Examples

>>> result = optimizer.optimize()
>>> if result.success:
...     print("Optimization successful!")
...     print(f"Predicted response: {result.predicted_response:,.0f}")
...     for channel, spend in result.allocation.items():
...         roi = result.by_channel[result.by_channel['channel']==channel]['roi'].iloc[0]
...         print(f"{channel}: ${spend:,.0f} (ROI: {roi:.2f})")
... else:
...     print(f"Optimization failed: {result.message}")
compare_scenarios(scenarios: Dict[str, Dict[str, float]]) DataFrame[source]

Compare different budget allocation scenarios.

Parameters:

scenarios (Dict[str, Dict[str, float]]) – Dictionary of scenarios: {‘scenario_name’: {‘channel’: spend, …}}

Returns:

Comparison of scenarios with predicted responses and ROIs

Return type:

pd.DataFrame

Examples

>>> scenarios = {
...     'Current': {'TV': 400000, 'Search': 350000, 'Social': 250000},
...     'Optimized': result.allocation,
...     'Heavy TV': {'TV': 600000, 'Search': 250000, 'Social': 150000}
... }
>>> comparison = optimizer.compare_scenarios(scenarios)
>>> print(comparison)
class deepcausalmmm.OptimizationResult(success: bool, allocation: Dict[str, float], predicted_response: float, by_channel: DataFrame, message: str = '', method: str = 'trust-constr')[source]

Result from budget optimization.

success

Whether optimization converged successfully

Type:

bool

allocation

Optimal spend allocation by channel

Type:

Dict[str, float]

predicted_response

Total predicted response at optimal allocation

Type:

float

by_channel

Detailed results by channel with spend, response, and ROI

Type:

pd.DataFrame

message

Optimization status message

Type:

str

method

Optimization method used

Type:

str

Examples

>>> result = optimizer.optimize()
>>> if result.success:
...     print(f"Optimal allocation: {result.allocation}")
...     print(f"Expected response: {result.predicted_response:,.0f}")
...     print(result.by_channel)
success: bool
allocation: Dict[str, float]
predicted_response: float
by_channel: DataFrame
message: str = ''
method: str = 'trust-constr'
__init__(success: bool, allocation: Dict[str, float], predicted_response: float, by_channel: DataFrame, message: str = '', method: str = 'trust-constr') None
deepcausalmmm.optimize_budget_from_curves(budget: float, curve_params: DataFrame, *, channel_col: str = 'channel', num_weeks: int = 52, constraints: Dict[str, Dict[str, float]] | None = None, method: str = 'trust-constr') OptimizationResult[source]

Convenience function to optimize budget directly from curve parameters DataFrame.

This function is useful when you have response curve parameters in a DataFrame (e.g., from ResponseCurveFit fitted on multiple channels) and want to quickly run optimization without manually setting up the BudgetOptimizer.

Parameters:
  • budget (float) – Total budget to allocate

  • curve_params (pd.DataFrame) – DataFrame with response curve parameters. Required columns: channel, top, bottom, saturation, slope

  • channel_col (str, default='channel') – Name of the channel column in curve_params

  • num_weeks (int, default=52) – Number of weeks for planning horizon

  • constraints (Dict[str, Dict[str, float]], optional) – Channel-specific constraints: {‘channel’: {‘lower’: min, ‘upper’: max}}

  • method (str, default='trust-constr') – Optimization method

Returns:

Optimization results

Return type:

OptimizationResult

Examples

>>> import pandas as pd
>>> from deepcausalmmm import optimize_budget_from_curves
>>>
>>> # After fitting curves for multiple channels
>>> curves_df = pd.DataFrame({
...     'channel': ['TV', 'Search', 'Social'],
...     'top': [1000000, 800000, 600000],
...     'bottom': [0, 0, 0],
...     'saturation': [50000, 30000, 20000],
...     'slope': [1.5, 2.0, 1.8]
... })
>>>
>>> result = optimize_budget_from_curves(
...     budget=1000000,
...     curve_params=curves_df,
...     constraints={'TV': {'lower': 100000, 'upper': 600000}}
... )
>>> print(result.allocation)
class deepcausalmmm.SimpleGlobalScaler(config: Dict[str, Any] | None = None)[source]

Linear scaling approach (y/y_mean) for additive attribution.

Scaling features:

  • Media: Share-of-voice scaling with outlier smoothing

  • Control: Robust standardization with adaptive clipping

  • Target: Linear scaling by region mean (y/y_mean) for additive decomposition

  • Adaptive normalization with distribution-aware clipping

  • Advanced outlier handling for extreme value stability

__init__(config: Dict[str, Any] | None = None)[source]

Initialize the scaler with optional config parameters.

fit(X_media: ndarray, X_control: ndarray, y: ndarray) None[source]

Fit the scaler using simple global statistics.

Parameters:
  • X_media – Media variables [n_regions, n_timesteps, n_channels]

  • X_control – Control variables [n_regions, n_timesteps, n_controls]

  • y – Target variable [n_regions, n_timesteps]

transform(X_media: ndarray, X_control: ndarray, y: ndarray) Tuple[Tensor, Tensor, Tensor][source]

Transform data using fitted parameters.

Parameters:
  • X_media – Media variables [n_regions, n_timesteps, n_channels]

  • X_control – Control variables [n_regions, n_timesteps, n_controls]

  • y – Target variable [n_regions, n_timesteps]

Returns:

Tuple of (X_media_scaled, X_control_scaled, y_scaled)

inverse_transform_target(y_scaled: Tensor) Tensor[source]

Inverse transform target variable.

Parameters:

y_scaled – Scaled target [n_regions, n_timesteps]

Returns:

Original scale target

inverse_transform_contributions(media_contributions: Tensor, baseline: Tensor = None, control_contributions: Tensor = None, seasonal_contributions: Tensor = None, trend_contributions: Tensor = None, prediction_scale: Tensor = None) dict[source]

Inverse transform ALL contributions to original scale using simple multiplication.

With linear scaling (y/y_mean), the inverse transform is straightforward: component_orig = component_scaled * prediction_scale * y_mean_per_region

This preserves additivity: sum(components_orig) = prediction_orig

Parameters:
  • media_contributions – Media contributions in scaled space [regions, timesteps, channels]

  • baseline – Baseline in scaled space [regions, timesteps]

  • control_contributions – Control contributions in scaled space [regions, timesteps, controls]

  • seasonal_contributions – Seasonal contributions in scaled space [regions, timesteps]

  • trend_contributions – Trend contributions in scaled space [regions, timesteps]

  • prediction_scale – Model’s prediction_scale factor (from F.softplus(self.prediction_scale))

Returns:

Dictionary with all contributions in original scale
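The additivity claim can be verified by hand for a single region and week. The sketch assumes the scaled prediction is `prediction_scale` times the sum of the scaled components; the component names and numbers are illustrative, and `to_original_scale` is a hypothetical helper.

```python
def to_original_scale(components_scaled, prediction_scale, y_mean):
    """component_orig = component_scaled * prediction_scale * y_mean (one region)."""
    return {name: value * prediction_scale * y_mean
            for name, value in components_scaled.items()}


# One region, one week; values are illustrative.
scaled = {"baseline": 0.5, "seasonal": 0.125, "tv": 0.25, "search": 0.125}
prediction_scale, y_mean = 2.0, 1000.0

original = to_original_scale(scaled, prediction_scale, y_mean)

# Assumed form of the prediction: the same linear map applied to the total.
prediction_orig = sum(scaled.values()) * prediction_scale * y_mean
```

Because the map is a single multiplication per region, the components sum to the prediction in the original scale just as they do in scaled space.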

fit_transform(X_media: ndarray, X_control: ndarray, y: ndarray) Tuple[Tensor, Tensor, Tensor][source]

Fit the scaler and transform data in one step.

Parameters:
  • X_media – Media variables [n_regions, n_timesteps, n_channels]

  • X_control – Control variables [n_regions, n_timesteps, n_controls]

  • y – Target variable [n_regions, n_timesteps]

Returns:

Tuple of (X_media_scaled, X_control_scaled, y_scaled)

deepcausalmmm.GlobalScaler

alias of SimpleGlobalScaler

deepcausalmmm.get_device(device: str | None = None) device[source]

Get the appropriate device for model training/inference.

Parameters:

device – Device specification (‘auto’, ‘cpu’, ‘cuda’, ‘cuda:0’, etc.). If None or ‘auto’, CUDA is used when available.

Returns:

Selected device

Return type:

torch.device

Modules

cli

Command-line interface for DeepCausalMMM.

core

Core components of DeepCausalMMM.

exceptions

Custom exceptions for DeepCausalMMM package.

postprocess

Post-processing utilities for DeepCausalMMM analysis and visualization.

utils

Utility functions for DeepCausalMMM.