deepcausalmmm.utils.data_generator
Config-driven synthetic data generator for DeepCausalMMM. Replaces hardcoded data generation with configurable parameters.
Functions
- generate_synthetic_mmm_data: Simple wrapper to generate synthetic MMM data as a DataFrame.
- get_synthetic_data_config: Get default synthetic data configuration.
- update_config_with_synthetic_data: Update configuration with synthetic data parameters.
Classes
- ConfigurableDataGenerator: Generate synthetic MMM data using configuration parameters.
- class deepcausalmmm.utils.data_generator.ConfigurableDataGenerator(config: Dict[str, Any] | None = None)
Generate synthetic MMM data using configuration parameters.
All data generation parameters are driven by configuration to ensure consistency and reproducibility across examples and tests.
- __init__(config: Dict[str, Any] | None = None)
Initialize the data generator.
- Parameters:
config – Configuration dictionary. If None, uses default config.
- generate_mmm_dataset(n_regions: int = 2, n_weeks: int = 104, n_media_channels: int = 5, n_control_channels: int = 3) -> Tuple[ndarray, ndarray, ndarray]
Generate a complete MMM dataset with realistic patterns.
- Parameters:
n_regions – Number of regions
n_weeks – Number of weeks
n_media_channels – Number of media channels
n_control_channels – Number of control variables
- Returns:
Tuple of (X_media, X_control, y) arrays
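A minimal usage sketch for the class, based only on the signatures documented above. The helper name build_dataset is hypothetical, and the import is guarded so the snippet still runs where deepcausalmmm is not installed. The array layouts are not documented here, so the sketch prints the shapes instead of assuming them.

```python
from typing import Any, Optional, Tuple


def build_dataset(n_regions: int = 2, n_weeks: int = 104) -> Optional[Tuple[Any, Any, Any]]:
    """Return (X_media, X_control, y), or None if deepcausalmmm is unavailable.

    build_dataset is a hypothetical helper for this example, not part of the library.
    """
    try:
        from deepcausalmmm.utils.data_generator import ConfigurableDataGenerator
    except ImportError:
        return None

    # config=None (the default) tells the generator to use its default config.
    generator = ConfigurableDataGenerator()
    return generator.generate_mmm_dataset(
        n_regions=n_regions,
        n_weeks=n_weeks,
        n_media_channels=5,
        n_control_channels=3,
    )


dataset = build_dataset()
if dataset is not None:
    X_media, X_control, y = dataset
    # Inspect the shapes rather than assuming an axis order.
    print(X_media.shape, X_control.shape, y.shape)
```

Keeping the four size arguments explicit makes it easy to see which dimension of each returned array corresponds to regions, weeks, and channels.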
- deepcausalmmm.utils.data_generator.get_synthetic_data_config() -> Dict[str, Any]
Get default synthetic data configuration.
- Returns:
Dictionary with synthetic data parameters
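The keys of the default configuration are not listed on this page, so one reasonable first step is to print them. The sketch below assumes nothing beyond the documented return type (a dictionary); load_synthetic_defaults is a hypothetical helper, and the import is guarded for environments without the package.

```python
from typing import Any, Dict, Optional


def load_synthetic_defaults() -> Optional[Dict[str, Any]]:
    """Return the default synthetic data config, or None if the package is missing.

    load_synthetic_defaults is a hypothetical helper for this example.
    """
    try:
        from deepcausalmmm.utils.data_generator import get_synthetic_data_config
    except ImportError:
        return None
    return get_synthetic_data_config()


defaults = load_synthetic_defaults()
if defaults is not None:
    # List each top-level key with the type of its value.
    for key in sorted(defaults):
        print(key, type(defaults[key]).__name__)
```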
- deepcausalmmm.utils.data_generator.update_config_with_synthetic_data(config: Dict[str, Any]) -> Dict[str, Any]
Update configuration with synthetic data parameters.
- Parameters:
config – Base configuration
- Returns:
Updated configuration with synthetic data settings
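A hedged sketch of calling this function on a base configuration. The page does not say whether the input dictionary is modified in place, so the example passes a deep copy and then compares keys. The key "experiment_name" and the helper with_synthetic_settings are hypothetical and serve only as illustration.

```python
import copy
from typing import Any, Dict, Optional


def with_synthetic_settings(base: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return base plus synthetic data settings, or None if the package is missing.

    with_synthetic_settings is a hypothetical helper for this example.
    """
    try:
        from deepcausalmmm.utils.data_generator import update_config_with_synthetic_data
    except ImportError:
        return None
    # Pass a deep copy because in-place mutation is neither documented nor ruled out.
    return update_config_with_synthetic_data(copy.deepcopy(base))


base_config = {"experiment_name": "demo"}  # hypothetical key, illustration only
updated = with_synthetic_settings(base_config)
if updated is not None:
    added = sorted(set(updated) - set(base_config))
    print("keys added:", added)
```

Assuming the returned dictionary is accepted by ConfigurableDataGenerator(config=...), which this page does not confirm, it could be passed there directly.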
- deepcausalmmm.utils.data_generator.generate_synthetic_mmm_data(n_regions: int = 10, n_weeks: int = 52, n_media: int = 5, n_controls: int = 3, seed: int = 42)
Simple wrapper to generate synthetic MMM data as a DataFrame.
- Parameters:
n_regions – Number of regions/DMAs
n_weeks – Number of weeks
n_media – Number of media channels
n_controls – Number of control variables
seed – Random seed for reproducibility
- Returns:
pandas DataFrame with synthetic MMM data
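Since seed is described as controlling reproducibility, a quick check is to generate the data twice with the same seed and compare the results. This is a sketch under that assumption. The column names of the DataFrame are not documented here, so the example only prints its shape. same_seed_matches is a hypothetical helper, and the import is guarded for environments without the package.

```python
from typing import Optional


def same_seed_matches(seed: int = 42) -> Optional[bool]:
    """Return whether two runs with the same seed agree, or None if the package is missing.

    same_seed_matches is a hypothetical helper for this example.
    """
    try:
        from deepcausalmmm.utils.data_generator import generate_synthetic_mmm_data
    except ImportError:
        return None

    first = generate_synthetic_mmm_data(n_regions=3, n_weeks=12, seed=seed)
    second = generate_synthetic_mmm_data(n_regions=3, n_weeks=12, seed=seed)
    print(first.shape)  # (rows, columns) of the generated DataFrame
    # DataFrame.equals compares both values and dtypes.
    return bool(first.equals(second))


reproducible = same_seed_matches()
if reproducible is not None:
    print("same seed, same data:", reproducible)
```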