DAG and NOTEARS structure learning
DeepCausalMMM learns how media channels influence each other through a Directed Acyclic Graph (DAG) embedded in the forward pass. Each channel’s effective input can blend its own signal with a weighted sum of causal parents.
Two modes are available (config['dag_mode']):
triangular (default): acyclicity enforced by an upper-triangular adjacency mask. Stable, fast, and backward compatible with earlier releases.
notears (opt-in): continuous structure learning via the NOTEARS smooth penalty h(W) = tr(exp(W ⊙ W)) − d (Zheng et al., 2018), optimised under an augmented Lagrangian with periodic dual updates.
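For intuition, the penalty h(W) can be evaluated directly on a small matrix. The sketch below is illustrative NumPy, not the library's implementation: notears_h is a hypothetical helper, and the truncated power series stands in for a proper matrix exponential such as scipy.linalg.expm or torch.matrix_exp.

```python
import numpy as np


def notears_h(W, n_terms=30):
    """Evaluate h(W) = tr(exp(W * W)) - d with a truncated power series.

    Hypothetical helper for illustration only; suitable for small matrices
    with modest entries.
    """
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    A = W * W                      # elementwise (Hadamard) square, so A >= 0
    term = np.eye(d)               # A^0 / 0!
    trace_exp = float(d)           # tr(I) = d
    for k in range(1, n_terms):
        term = term @ A / k        # A^k / k!
        trace_exp += np.trace(term)
    return trace_exp - d


# Strictly upper-triangular weights describe a DAG, so h(W) is zero.
W_dag = np.array([[0.0, 0.8, 0.3],
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])

# A 2-cycle (0 -> 1 -> 0) makes h(W) strictly positive.
W_cyclic = np.array([[0.0, 1.0],
                     [1.0, 0.0]])

print(notears_h(W_dag))
print(notears_h(W_cyclic))
```

Because h(W) is zero exactly when the weighted graph has no directed cycles, driving it toward zero during training is what replaces the fixed triangular mask in notears mode.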
Triangular mode (default)
No configuration change is required:
from deepcausalmmm.core import get_default_config
config = get_default_config()
assert config['dag_mode'] == 'triangular'
The model learns sparse adjacency weights subject to the triangular mask.
Inspect edges after training with model.threshold_dag() or the DAG network
plot in examples/dashboard_rmse_optimized.py.
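The masking idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the model's code: the strict upper triangle (diagonal excluded) and the row-to-column edge direction are assumptions, and raw_weights is a stand-in for the learnable parameters.

```python
import numpy as np

d = 4  # number of channels (illustrative)
rng = np.random.default_rng(0)
raw_weights = rng.normal(size=(d, d))      # stand-in for learnable parameters

# Strict upper triangle: entry (i, j) may be non-zero only when i < j,
# so edges can only run from a lower-indexed channel to a higher-indexed one.
mask = np.triu(np.ones((d, d)), k=1)
adjacency = raw_weights * mask

# Any strictly triangular d x d matrix is nilpotent: its d-th power is zero,
# which means the graph has no directed cycles.
assert np.allclose(np.linalg.matrix_power(adjacency, d), 0.0)
print(adjacency.round(2))
```

The trade-off is that the channel ordering fixes which edges are even possible, which is the restriction that notears mode lifts.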
NOTEARS mode (opt-in)
Enable data-driven topology discovery:
from deepcausalmmm.core import get_default_config
from deepcausalmmm.core.trainer import ModelTrainer
config = get_default_config()
config['dag_mode'] = 'notears'
# Recommended starting points (see config.py for defaults):
config['notears_warmup_epochs'] = 500 # Huber-only, then enable penalty
config['notears_lambda1'] = 0.005 # L1 sparsity on adjacency
config['dag_temperature'] = 0.5 # Sharper {0,1} edge weights
config['notears_group_l1'] = 0.01 # Focused parents per channel
config['notears_dual_factor'] = 3.0 # Gentler rho growth when h stalls
config['notears_dual_update_every'] = 100 # Outer-loop cadence (epochs)
trainer = ModelTrainer(config)
# ... create model, prepare data, trainer.train(...) as in quickstart ...
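To see why a lower dag_temperature gives sharper edge weights, consider a temperature-scaled sigmoid. The form sigmoid(logit / temperature) is an assumption made for illustration; this page only states that dag_temperature is a sigmoid temperature. edge_weight is a hypothetical helper.

```python
import math


def edge_weight(logit, temperature):
    """Temperature-scaled sigmoid (assumed form, not taken from the library)."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


for logit in (-2.0, -0.5, 0.5, 2.0):
    soft = edge_weight(logit, temperature=1.0)
    sharp = edge_weight(logit, temperature=0.5)
    # The lower temperature pushes each weight further from 0.5, toward 0 or 1.
    print(f"logit={logit:+.1f}  T=1.0: {soft:.3f}  T=0.5: {sharp:.3f}")
```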
Key config keys
| Key | Role |
|---|---|
| dag_mode | triangular (default) or notears |
| notears_warmup_epochs | Epochs of Huber-only training before the NOTEARS penalty activates |
| notears_lambda1 | L1 sparsity on the learned adjacency |
|  | Initial augmented-Lagrangian penalty and dual variable |
| notears_dual_update_every | How often (in epochs) the dual update runs |
| notears_dual_factor | Multiplier applied to rho when h(W) stalls |
| dag_temperature | Sigmoid temperature for edge weights |
| notears_group_l1 | Column-group L1 encouraging focused parent sets per channel |
|  | Pruning cutoff for threshold_dag() |
|  | Minimum edge weight shown in dashboard DAG plot |
|  | Global cap on strongest edges in the dashboard network chart |
Training behaviour
When dag_mode='notears' and notears_warmup_epochs > 0:
Warmup: prediction (Huber) loss only; notears_active is False.
Activation: at the warmup epoch, the NOTEARS penalty and dual updates turn on. Verbose logs print [NOTEARS] warmup complete ....
Outer loop: every notears_dual_update_every epochs, model.notears_update_duals(factor=notears_dual_factor) adjusts rho and alpha based on the current h(W).
Huber prediction loss is unchanged; NOTEARS terms are added in
get_dag_loss() only.
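The three phases above can be sketched as a standalone schedule. This follows the textbook augmented-Lagrangian recipe from the NOTEARS paper and is not taken from the library: the stall test (h has not dropped below a quarter of its previous value), the starting values rho=1.0 and alpha=0.0, the update cadence counted from the warmup epoch, and all three function names are assumptions.

```python
def augmented_lagrangian(h, rho, alpha):
    """Penalty added to the prediction loss once NOTEARS is active."""
    return alpha * h + 0.5 * rho * h * h


def update_duals(rho, alpha, h, h_prev, factor=3.0, stall_ratio=0.25):
    """One outer-loop step of a textbook augmented-Lagrangian scheme.

    Assumed rule: grow rho by `factor` when h has not dropped below
    `stall_ratio * h_prev`, then take a dual ascent step on alpha.
    """
    if h > stall_ratio * h_prev:
        rho *= factor
    alpha += rho * h
    return rho, alpha


def is_dual_update_epoch(epoch, warmup_epochs=500, dual_update_every=100):
    """True on epochs where the outer loop would fire (assumed cadence)."""
    if epoch <= warmup_epochs:
        return False          # warmup: Huber loss only, no penalty yet
    return (epoch - warmup_epochs) % dual_update_every == 0


rho, alpha, h_prev = 1.0, 0.0, float("inf")
# Hypothetical h(W) readings at successive outer-loop steps.
for h in (0.80, 0.60, 0.10, 0.02):
    rho, alpha = update_duals(rho, alpha, h, h_prev)
    h_prev = h
    print(f"h={h:.2f}  rho={rho:.1f}  alpha={alpha:.3f}  "
          f"penalty={augmented_lagrangian(h, rho, alpha):.4f}")
```

A smaller factor, such as the 3.0 suggested earlier, raises rho more gradually than the factor of 10 used in the original NOTEARS algorithm, which keeps the penalty from swamping the prediction loss when h(W) plateaus.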
Inspecting the learned graph
After training:
W = model.threshold_dag(eps=0.3) # pruned adjacency tensor
print(W)
With examples/dashboard_rmse_optimized.py, the DAG network plot uses global
top-N edges and writes dag_adjacency.csv beside the HTML output.
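For post-processing outside the dashboard, pruning and global top-N selection can be approximated as below. This is a sketch under assumptions, not the library's logic: threshold_adjacency and top_n_edges are hypothetical helpers, the magnitude-based cutoff is assumed, and a PyTorch tensor would first need converting with W.detach().cpu().numpy().

```python
import numpy as np


def threshold_adjacency(W, eps=0.3):
    """Zero every entry whose magnitude is below eps (assumed pruning rule)."""
    W = np.asarray(W, dtype=float)
    return np.where(np.abs(W) >= eps, W, 0.0)


def top_n_edges(W, n):
    """Return the n strongest non-zero edges as (row, col, weight) tuples."""
    W = np.asarray(W, dtype=float)
    rows, cols = np.nonzero(W)
    edges = [(int(i), int(j), float(W[i, j])) for i, j in zip(rows, cols)]
    edges.sort(key=lambda e: abs(e[2]), reverse=True)
    return edges[:n]


W = np.array([[0.0, 0.72, 0.10],
              [0.0, 0.00, 0.45],
              [0.0, 0.00, 0.00]])

pruned = threshold_adjacency(W, eps=0.3)
print(pruned)
print(top_n_edges(pruned, n=2))
```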
API reference
NOTEARS logic lives on DeepCausalMMM.
Defaults and tunables are in get_default_config().
See also Quick Start Guide (NOTEARS subsection) and the v1.0.21 entry in
CHANGELOG.md.