deepcausalmmm.utils.data_generator

Config-driven synthetic data generator for DeepCausalMMM. Replaces hardcoded data generation with configurable parameters.

Functions

generate_synthetic_mmm_data([n_regions, ...])

Simple wrapper to generate synthetic MMM data as a DataFrame.

get_synthetic_data_config()

Get default synthetic data configuration.

update_config_with_synthetic_data(config)

Update configuration with synthetic data parameters.

Classes

ConfigurableDataGenerator([config])

Generate synthetic MMM data using configuration parameters.

class deepcausalmmm.utils.data_generator.ConfigurableDataGenerator(config: Dict[str, Any] | None = None)

Generate synthetic MMM data using configuration parameters.

All data generation parameters are driven by configuration to ensure consistency and reproducibility across examples and tests.

__init__(config: Dict[str, Any] | None = None)

Initialize the data generator.

Parameters:

config – Configuration dictionary. If None, the default configuration is used.

generate_mmm_dataset(n_regions: int = 2, n_weeks: int = 104, n_media_channels: int = 5, n_control_channels: int = 3) → Tuple[ndarray, ndarray, ndarray]

Generate a complete MMM dataset with realistic patterns.

Parameters:
  • n_regions – Number of regions

  • n_weeks – Number of weeks

  • n_media_channels – Number of media channels

  • n_control_channels – Number of control variables

Returns:

Tuple of (X_media, X_control, y) arrays
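A minimal usage sketch of the call documented above. The argument values are arbitrary, and because this reference does not state the array shapes, the sketch prints them rather than assuming a layout. `summarize_dataset` is a hypothetical helper, not part of the package, and the import is guarded so the snippet also runs where `deepcausalmmm` is not installed.

```python
from typing import Any, Dict, Tuple


def summarize_dataset(dataset: Tuple[Any, Any, Any]) -> Dict[str, Any]:
    """Map each array in an (X_media, X_control, y) tuple to its shape.

    Hypothetical helper for illustration; not part of deepcausalmmm.
    """
    names = ("X_media", "X_control", "y")
    # getattr keeps the helper usable with any object that exposes .shape
    return {name: getattr(arr, "shape", None) for name, arr in zip(names, dataset)}


try:
    from deepcausalmmm.utils.data_generator import ConfigurableDataGenerator
except ImportError:
    ConfigurableDataGenerator = None  # package not available in this environment

if ConfigurableDataGenerator is not None:
    generator = ConfigurableDataGenerator()  # config=None selects the default config
    dataset = generator.generate_mmm_dataset(
        n_regions=4,
        n_weeks=52,
        n_media_channels=5,
        n_control_channels=3,
    )
    print(summarize_dataset(dataset))
```

Printing the shapes once is a quick way to confirm how regions, weeks, and channels are arranged before feeding the arrays to a model.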

deepcausalmmm.utils.data_generator.get_synthetic_data_config() → Dict[str, Any]

Get default synthetic data configuration.

Returns:

Dictionary with synthetic data parameters
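Since this reference does not list the keys of the returned dictionary, one way to discover them is to inspect the defaults directly. The sketch below assumes nothing about the key names; `describe_config` is a hypothetical helper, not part of the package, and the import is guarded so the snippet runs even without `deepcausalmmm` installed.

```python
from typing import Any, Dict, List


def describe_config(config: Dict[str, Any]) -> List[str]:
    """Return one 'key: type' line per top-level entry, sorted by key.

    Hypothetical helper for illustration; not part of deepcausalmmm.
    """
    return [f"{key}: {type(config[key]).__name__}" for key in sorted(config)]


try:
    from deepcausalmmm.utils.data_generator import get_synthetic_data_config
except ImportError:
    get_synthetic_data_config = None  # package not available in this environment

if get_synthetic_data_config is not None:
    defaults = get_synthetic_data_config()
    print("\n".join(describe_config(defaults)))
```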

deepcausalmmm.utils.data_generator.update_config_with_synthetic_data(config: Dict[str, Any]) → Dict[str, Any]

Update configuration with synthetic data parameters.

Parameters:

config – Base configuration

Returns:

Updated configuration with synthetic data settings
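To see what the update contributes, the returned dictionary can be compared against the base configuration. This is a sketch under several assumptions: the base key `example_setting` is made up, the function is assumed to accept an arbitrary dictionary, and only top-level keys are compared (nested changes would not show up). `added_keys` is a hypothetical helper, not part of the package.

```python
import copy
from typing import Any, Dict, List


def added_keys(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """List top-level keys present in `after` but not in `before`.

    Hypothetical helper for illustration; not part of deepcausalmmm.
    """
    return sorted(set(after) - set(before))


try:
    from deepcausalmmm.utils.data_generator import update_config_with_synthetic_data
except ImportError:
    update_config_with_synthetic_data = None  # package not available here

if update_config_with_synthetic_data is not None:
    base_config = {"example_setting": 1}  # hypothetical key, for illustration only
    # Pass a deep copy: this reference does not say whether the input is mutated.
    updated = update_config_with_synthetic_data(copy.deepcopy(base_config))
    print(added_keys(base_config, updated))
```

Passing a copy keeps `base_config` intact whether or not the function modifies its argument in place.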

deepcausalmmm.utils.data_generator.generate_synthetic_mmm_data(n_regions: int = 10, n_weeks: int = 52, n_media: int = 5, n_controls: int = 3, seed: int = 42)

Simple wrapper to generate synthetic MMM data as a DataFrame.

Parameters:
  • n_regions – Number of regions (DMAs, designated market areas)

  • n_weeks – Number of weeks

  • n_media – Number of media channels

  • n_controls – Number of control variables

  • seed – Random seed for reproducibility

Returns:

pandas DataFrame with synthetic MMM data
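A short usage sketch for the wrapper, including a check that a fixed `seed` yields the same data twice. The argument values are arbitrary and the column names of the returned DataFrame are not stated in this reference, so the sketch only prints the shape and first rows. `is_reproducible` is a hypothetical helper, not part of the package; it relies on `DataFrame.equals` when the result provides it.

```python
from typing import Any, Callable


def is_reproducible(generate: Callable[..., Any], seed: int = 42, **kwargs: Any) -> bool:
    """Call `generate` twice with the same seed and compare the results.

    Hypothetical helper for illustration; not part of deepcausalmmm.
    """
    first = generate(seed=seed, **kwargs)
    second = generate(seed=seed, **kwargs)
    # pandas DataFrames compare with .equals(); plain objects fall back to ==
    equals = getattr(first, "equals", None)
    if callable(equals):
        return bool(equals(second))
    return bool(first == second)


try:
    from deepcausalmmm.utils.data_generator import generate_synthetic_mmm_data
except ImportError:
    generate_synthetic_mmm_data = None  # package not available in this environment

if generate_synthetic_mmm_data is not None:
    df = generate_synthetic_mmm_data(
        n_regions=3, n_weeks=26, n_media=5, n_controls=3, seed=7
    )
    print(df.shape)
    print(df.head())
    print(is_reproducible(generate_synthetic_mmm_data, seed=7, n_regions=3, n_weeks=26))
```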