Response Curves

The response curves module provides non-linear saturation analysis using Hill equations to model diminishing returns in marketing channels.

Overview

Response curves help you understand:

  • Saturation Points: When additional spend/impressions yield diminishing returns

  • Optimal Allocation: Which channels have room for increased investment

  • S-Shaped Relationships: Non-linear effects of marketing activities

  • Channel Efficiency: Compare saturation across different channels

Key Features

  • Hill Equation Fitting: Fits S-shaped saturation curves to channel data

  • Automatic Aggregation: Aggregates DMA-week data to national weekly level

  • Direct Attribution: Works with additive contributions from linear scaling (v1.0.19+)

  • Interactive Visualizations: Plotly-based plots with hover details

  • Performance Metrics: R², slope, and saturation point for each channel

  • Backward Compatibility: Maintains support for legacy method names

ResponseCurveFit Class

class deepcausalmmm.postprocess.response_curves.ResponseCurveFit(data: DataFrame, *, bottom_param: bool = False, model_level: Literal['Overall', 'DMA'] = 'Overall', date_col: str = 'week_monday')

Bases: object

Fit Hill equation response curves to marketing mix model predictions.

The Hill equation models saturation effects: y = bottom + (top - bottom) * x^slope / (saturation^slope + x^slope)

Parameters:
  • data (pd.DataFrame) – DataFrame with columns ‘week_monday’, ‘spend’, ‘impressions’, and ‘predicted’. DMA-level fitting also requires a ‘dmacode’ column.

  • bottom_param (bool, default=False) – Whether to fit a non-zero intercept (the bottom parameter). For MMM this is typically False, since the response at zero spend is 0.

  • model_level (str, default='Overall') – ‘Overall’ fits a single aggregated curve across all regions; ‘DMA’ fits a separate curve for each DMA.

  • date_col (str, default='week_monday') – Name of the date column.

top

Maximum response (saturation level)

Type:

float

bottom

Minimum response (typically 0)

Type:

float

saturation

Spend level at half-maximum response

Type:

float

slope

Steepness of the curve

Type:

float

r_2

R-squared score of the fitted curve

Type:

float

equation

String representation of the fitted equation

Type:

str

figure

Plotly figure object (if generate_figure=True)

Type:

go.Figure

Examples

>>> # Prepare data
>>> data = pd.DataFrame({
...     'week_monday': dates,
...     'spend': spend_values,
...     'impressions': impression_values,
...     'predicted': model_predictions
... })
>>>
>>> # Fit overall response curve
>>> fitter = ResponseCurveFit(data, model_level='Overall')
>>> fitter.fit(
...     title="Response Curve",
...     x_label="Impressions",
...     y_label="Predicted Visits",
...     generate_figure=True,
...     save_figure=True,
...     output_path='response_curve.html'
... )
>>> print(f"R²: {fitter.r_2:.3f}")
>>> print(f"Slope: {fitter.slope:.3f}")

__init__(data: DataFrame, *, bottom_param: bool = False, model_level: Literal['Overall', 'DMA'] = 'Overall', date_col: str = 'week_monday') -> None

Initialize ResponseCurveFit.

Parameters:
  • data (pd.DataFrame) – Input data with required columns

  • bottom_param (bool, default=False) – Whether to fit non-zero intercept

  • model_level ({'Overall', 'DMA'}, default='Overall') – Aggregation level for fitting

  • date_col (str, default='week_monday') – Name of date column

Hill(X: ndarray, *params) -> ndarray

Backward compatibility wrapper for _hill_equation.

get_param(curve_fit_kws: dict) -> List[float]

Backward compatibility wrapper for _fit_curve.

regression(x_fit, y_fit, x_label, y_label, title, sigfigs, log_x, print_r_sqr, generate_figure, view_figure, *params) -> None

Backward compatibility wrapper for _calculate_r2_and_plot.

fit(*, x_label: str = 'x', y_label: str = 'y', title: str = 'Fitted Hill equation', sigfigs: int = 6, log_x: bool = False, print_r_sqr: bool = True, generate_figure: bool = True, view_figure: bool = False, save_figure: bool = False, output_path: str | None = None, curve_fit_kws: dict | None = None) -> DataFrame | None

Fit Hill equation to the data.

Parameters:
  • x_label (str, default='x') – X-axis label

  • y_label (str, default='y') – Y-axis label

  • title (str, default='Fitted Hill equation') – Plot title

  • sigfigs (int, default=6) – Significant figures for equation display

  • log_x (bool, default=False) – Whether to use log scale for x-axis

  • print_r_sqr (bool, default=True) – Whether to print R² score

  • generate_figure (bool, default=True) – Whether to generate visualization

  • view_figure (bool, default=False) – Whether to display the figure

  • save_figure (bool, default=False) – Whether to save the figure

  • output_path (str, optional) – Path to save the figure (if save_figure=True)

  • curve_fit_kws (dict, optional) – Additional keyword arguments for scipy.optimize.curve_fit

Returns:

For DMA-level: a DataFrame with the fitted parameters for each DMA. For Overall: None (the parameters are stored as attributes).

Return type:

pd.DataFrame or None

fit_model(**kwargs) -> DataFrame | None

Backward compatibility wrapper for fit().

predict(X: ndarray) -> ndarray

Predict response for new spend levels (Overall level only).

Parameters:

X (np.ndarray) – New spend/impression values

Returns:

Predicted response values

Return type:

np.ndarray

get_summary()

Get summary of fitted parameters.

Returns:

Dictionary with ‘params’, ‘r2’, and ‘equation’

Return type:

dict

Basic Usage

Fitting Response Curves

from deepcausalmmm.postprocess import ResponseCurveFit
import pandas as pd

# Prepare your data (columns as documented for ResponseCurveFit)
channel_data = pd.DataFrame({
    'week_monday': dates,
    'spend': spend_values,
    'impressions': [10000, 15000, 20000, ...],
    'predicted': [500000, 650000, 750000, ...]
})

# Initialize fitter
fitter = ResponseCurveFit(
    data=channel_data,
    model_level='Overall',
    date_col='week_monday'
)

# Fit the curve; for model_level='Overall' the parameters are stored as attributes
fitter.fit(generate_figure=False)
print(f"Slope (a): {fitter.slope:.3f}")
print(f"Half-Saturation Point (g): {fitter.saturation:.0f}")

Calculating R² and Plotting

# Generate the interactive plot and read R² from the fitted object
fitter.fit(
    generate_figure=True,
    save_figure=True,
    output_path='response_curve_channel.html'
)
r2_score = fitter.r_2
print(f"R² Score: {r2_score:.3f}")

Complete Workflow

from deepcausalmmm.postprocess import ResponseCurveFit
import pandas as pd

# Load channel data (impressions and contributions)
df = pd.read_csv('channel_data.csv')

# Initialize and fit (df needs 'week_monday', 'spend', 'impressions', 'predicted')
fitter = ResponseCurveFit(
    data=df,
    model_level='Overall',
    date_col='week_monday'
)

# Fit, save the plot, and read the fitted parameters from the attributes
fitter.fit(save_figure=True, output_path='curve.html')
slope, saturation, r2 = fitter.slope, fitter.saturation, fitter.r_2

# Interpret results
print(f"Channel Saturation Analysis:")
print(f"  Slope (a): {slope:.3f}")
print(f"  Half-Saturation (g): {saturation:,.0f} impressions")
print(f"  Fit Quality (R²): {r2:.3f}")

if slope >= 2.0:
    print("  Strong S-shaped curve (diminishing returns)")
else:
    print("  Gentle curve (less pronounced saturation)")

if r2 >= 0.8:
    print("  Excellent fit")
elif r2 >= 0.6:
    print("  Good fit")
else:
    print("  Moderate fit - review data quality")

Advanced Usage

Batch Processing Multiple Channels

import pandas as pd
from deepcausalmmm.postprocess import ResponseCurveFit

# Assume you have a DataFrame with multiple channels
all_channels_data = pd.read_csv('all_channels.csv')

results = []
for channel in all_channels_data['channel'].unique():
    # Filter data for this channel
    channel_df = all_channels_data[
        all_channels_data['channel'] == channel
    ].copy()

    # Fit response curve (parameters are stored on the fitter for 'Overall')
    fitter = ResponseCurveFit(
        data=channel_df,
        model_level='Overall',
        date_col='week_monday'
    )
    fitter.fit(
        save_figure=True,
        output_path=f'curves/{channel}_response_curve.html'
    )
    slope, saturation, r2 = fitter.slope, fitter.saturation, fitter.r_2

    results.append({
        'channel': channel,
        'slope': slope,
        'saturation': saturation,
        'r2': r2
    })

# Create summary DataFrame
summary = pd.DataFrame(results)
summary = summary.sort_values('r2', ascending=False)
print(summary)

Interpreting Results

Slope (a) Parameter:

  • a >= 3.0: Very strong S-curve, rapid saturation

  • 2.0 <= a < 3.0: Strong S-curve, clear diminishing returns

  • 1.0 <= a < 2.0: Gentle curve, gradual saturation

  • a < 1.0: Very gentle, no S-shape (the curve is concave from the origin)

Half-Saturation Point (g):

  • The impression/spend level where the channel reaches 50% of maximum effect

  • Lower values indicate faster saturation

  • Compare across channels to identify efficiency

R² Score:

  • >= 0.8: Excellent fit, high confidence

  • 0.6 <= R² < 0.8: Good fit, reasonable confidence

  • 0.4 <= R² < 0.6: Moderate fit, review data

  • < 0.4: Poor fit, investigate data quality or model assumptions
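
The bands above can be applied programmatically when many channels are reviewed at once. The following is a minimal sketch and not part of the module: the helper names `describe_slope` and `describe_r2` and the sample values are hypothetical, and the cut-offs are simply the ones listed in this section.

```python
def describe_slope(a):
    """Return the qualitative band for a fitted slope (cut-offs from the list above)."""
    if a >= 3.0:
        return "very strong"
    if a >= 2.0:
        return "strong"
    if a >= 1.0:
        return "gentle"
    return "very gentle"


def describe_r2(r2):
    """Return the qualitative band for an R² score (cut-offs from the list above)."""
    if r2 >= 0.8:
        return "excellent"
    if r2 >= 0.6:
        return "good"
    if r2 >= 0.4:
        return "moderate"
    return "poor"


# Hypothetical fitted values for two channels (illustration only, not real results).
fits = {"tv": (2.4, 0.83), "search": (0.9, 0.55)}
for channel, (slope, r2) in fits.items():
    print(f"{channel}: {describe_slope(slope)} curve, {describe_r2(r2)} fit")
```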

Hill Equation

The response curve uses the Hill equation:

\[y = \frac{x^a}{x^a + g^a}\]

Where:

  • x: Input variable (impressions or spend)

  • y: Output variable (contributions or response)

  • a: Slope parameter (controls steepness of S-curve)

  • g: Half-saturation point (x value where y = 0.5)

Properties:

  • Monotonic: Always increasing

  • Bounded: Output between 0 and 1 (when normalized)

  • S-Shaped: When a > 1 (the curve has an inflection point); most pronounced when a >= 2.0

  • Half-Saturation: y(g) = 0.5
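
These properties can be checked numerically. The sketch below is a standalone implementation of the normalized Hill function, not the module's internal `_hill_equation`; the function name `hill` and the sample values of a, g, and x are illustrative assumptions.

```python
def hill(x, a, g):
    """Normalized Hill function y = x^a / (x^a + g^a) for x >= 0, a > 0, g > 0."""
    xa = x ** a
    return xa / (xa + g ** a)


a, g = 2.0, 15000.0  # illustrative slope and half-saturation point
xs = [0.0, 5000.0, 15000.0, 50000.0, 500000.0]
ys = [hill(x, a, g) for x in xs]

# Half-saturation: the response at x = g is half of the maximum.
print(hill(g, a, g))

# Monotonic: each value is larger than the one before it.
print(all(y0 < y1 for y0, y1 in zip(ys, ys[1:])))

# Bounded: every value lies in [0, 1).
print(all(0.0 <= y < 1.0 for y in ys))
```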

Technical Details

Fitting Algorithm

The module uses scipy.optimize.curve_fit with:

  • Initial Guess: a=1, g=median(x)

  • Bounds: a [0.01, 100], g [0.01, max(x) × 10]

  • Method: Trust Region Reflective ('trf'), which curve_fit selects by default when bounds are supplied

  • Max Iterations: 10,000
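
The settings above translate into a `scipy.optimize.curve_fit` call along the following lines. This is a sketch under stated assumptions rather than the module's actual code: the `hill` function, the synthetic data, and the mapping of "Max Iterations" to `max_nfev` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(x, a, g):
    """Normalized Hill function, written as 1 / (1 + (g/x)^a) for numerical stability (x > 0)."""
    return 1.0 / (1.0 + (g / x) ** a)


# Synthetic, noise-free data generated from known parameters (illustration only).
x = np.linspace(1_000.0, 60_000.0, 40)
y = hill(x, 2.0, 20_000.0)

params, _ = curve_fit(
    hill,
    x,
    y,
    p0=[1.0, np.median(x)],                          # initial guess: a=1, g=median(x)
    bounds=([0.01, 0.01], [100.0, x.max() * 10]),    # a in [0.01, 100], g in [0.01, 10*max(x)]
    method="trf",                                    # Trust Region Reflective
    max_nfev=10_000,                                 # assumed meaning of "Max Iterations"
)
a_fit, g_fit = params
print(f"a = {a_fit:.3f}, g = {g_fit:,.0f}")
```

Additional options can be forwarded to the real fitter through the documented `curve_fit_kws` argument of `fit()`.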

Data Preprocessing

  1. Aggregation: Groups by date column and sums x and y

  2. Sorting: Sorts by x values for consistent fitting

  3. Normalization: Internally normalizes y for numerical stability

  4. Scaling: Scales fitted curve back to original y scale
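
The four steps can be sketched with pandas as follows. This is an illustration of the described preprocessing, not the module's implementation: the sample rows are made up, the choice of 'impressions' as x and 'predicted' as y is an assumption, and dividing by the maximum is only one possible normalization.

```python
import pandas as pd

# Hypothetical DMA-week rows; column names follow those documented for ResponseCurveFit.
raw = pd.DataFrame({
    "week_monday": ["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"],
    "dmacode": [501, 803, 501, 803],
    "impressions": [6000, 4000, 3000, 2000],
    "predicted": [300.0, 200.0, 180.0, 120.0],
})

# 1. Aggregation: sum x and y within each week (DMA-week -> national week).
weekly = raw.groupby("week_monday", as_index=False)[["impressions", "predicted"]].sum()

# 2. Sorting: order rows by x.
weekly = weekly.sort_values("impressions").reset_index(drop=True)

# 3. Normalization: divide y by its maximum (assumed normalization scheme).
y_scale = weekly["predicted"].max()
weekly["y_norm"] = weekly["predicted"] / y_scale

# 4. Scaling: multiply normalized values by the same factor to return to the original scale.
weekly["y_restored"] = weekly["y_norm"] * y_scale

print(weekly)
```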

Backward Compatibility

Legacy Method Names

For backward compatibility, the following legacy method names are supported:

fitter = ResponseCurveFit(data=df)

# Current API (recommended)
fitter.fit()

# Legacy API (still works)
fitter.fit_model()  # Wrapper for fit()

# Hill(), get_param(), and regression() are also kept as wrappers for the
# internal helpers _hill_equation, _fit_curve, and _calculate_r2_and_plot.

Legacy Parameter Names

# New API (recommended)
fitter = ResponseCurveFit(
    data=df,
    model_level='Overall',
    date_col='week_monday'
)

# Legacy API (still works)
fitter = ResponseCurveFit(
    data=df,
    Modellevel='Overall',  # Old name
    Datecol='week_monday'  # Old name
)

ResponseCurveFitter Alias

The original class name ResponseCurveFitter is maintained as an alias:

from deepcausalmmm.postprocess import ResponseCurveFitter

# This works identically to ResponseCurveFit
fitter = ResponseCurveFitter(data=df)

Best Practices

Data Quality

  • Sufficient Points: Use at least 20-30 data points for reliable fitting

  • Range Coverage: Ensure data covers a wide range of x values

  • Outlier Handling: Remove or investigate extreme outliers before fitting

  • Monotonicity: Response curves assume y generally increases with x

Aggregation Level

  • National Weekly: Recommended for most analyses (reduces noise)

  • Regional: Use when analyzing regional differences

  • DMA-Level: Use with caution (high variance)

Model Validation

  • R² Threshold: Aim for R² >= 0.6 for reliable insights

  • Visual Inspection: Always review the generated plots

  • Business Logic: Ensure fitted parameters make business sense

  • Cross-Validation: Test on holdout periods when possible
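
A holdout check might look like the sketch below: fit on the earlier weeks, then score R² on the later ones. Everything here is an assumption for illustration (the standalone `hill` and `r_squared` helpers, the synthetic data, and the 40/12 week split); it does not use or reproduce the module's own fitting code.

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(x, a, g):
    """Normalized Hill function for x > 0."""
    return 1.0 / (1.0 + (g / x) ** a)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# One synthetic year of weekly data with mild noise (illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(2_000.0, 60_000.0, size=52)
y = hill(x, 2.0, 20_000.0) + rng.normal(0.0, 0.02, size=52)

# Fit on the first 40 weeks, evaluate on the last 12.
train, holdout = slice(0, 40), slice(40, 52)
(a_fit, g_fit), _ = curve_fit(
    hill, x[train], y[train],
    p0=[1.0, np.median(x[train])],
    bounds=([0.01, 0.01], [100.0, x.max() * 10]),
)
r2_holdout = r_squared(y[holdout], hill(x[holdout], a_fit, g_fit))

print(f"holdout R² = {r2_holdout:.3f}")
print("meets the 0.6 threshold" if r2_holdout >= 0.6 else "review the fit")
```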

Common Issues

Poor Fit (Low R²):

  • Check for outliers or data quality issues

  • Verify monotonic relationship between x and y

  • Consider if Hill equation is appropriate for your data

  • Try different aggregation levels

Unrealistic Parameters:

  • Very high slope (a > 10): May indicate overfitting

  • Very high saturation (g >> max(x)): Channel not reaching saturation

  • Very low saturation (g << median(x)): Most data in saturated region
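
These checks are easy to automate once parameters have been fitted. In the sketch below, `flag_parameters` is a hypothetical helper, and the `far` ratio of 10 is an assumed stand-in for ">>" and "<<", which the list above does not quantify.

```python
from statistics import median


def flag_parameters(a, g, x, far=10.0):
    """Return warnings for the patterns listed above.

    `far` is an assumed ratio used to interpret ">>" and "<<".
    """
    flags = []
    if a > 10:
        flags.append("very high slope: may indicate overfitting")
    if g > far * max(x):
        flags.append("g far above max(x): channel not reaching saturation")
    if g < median(x) / far:
        flags.append("g far below median(x): most data in saturated region")
    return flags


# Hypothetical impression levels and fitted parameters (illustration only).
x = [10_000, 15_000, 20_000, 30_000]
print(flag_parameters(a=12.0, g=500_000, x=x))  # two warnings
print(flag_parameters(a=2.0, g=18_000, x=x))    # []
```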

Convergence Issues:

  • Increase max iterations

  • Try different initial guesses

  • Check for numerical issues (very large/small values)

  • Normalize your data before fitting
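
Several of these remedies can be combined in one routine: rescale x, raise the evaluation limit, and retry from several starting slopes. The sketch below is one way to do that outside the module; `fit_with_restarts`, its default starting slopes, and the bounds on the rescaled g are assumptions, and y is assumed to be already normalized to [0, 1].

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(x, a, g):
    """Normalized Hill function for x > 0."""
    return 1.0 / (1.0 + (g / x) ** a)


def fit_with_restarts(x, y, slopes=(0.5, 1.0, 2.0, 4.0), max_nfev=50_000):
    """Fit on rescaled x from several initial slopes and keep the best result."""
    x_scale = x.max()
    xn = x / x_scale                      # rescale x to (0, 1] to avoid very large values
    best = None
    for a0 in slopes:
        try:
            p, _ = curve_fit(
                hill, xn, y,
                p0=[a0, np.median(xn)],
                bounds=([0.01, 1e-6], [100.0, 10.0]),
                max_nfev=max_nfev,        # higher evaluation limit
            )
        except RuntimeError:              # raised by curve_fit when the fit does not converge
            continue
        sse = float(np.sum((y - hill(xn, *p)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, p[0], p[1] * x_scale)   # convert g back to original x units
    if best is None:
        raise RuntimeError("no starting point converged")
    return best[1], best[2]


# Synthetic, noise-free data (illustration only).
x = np.linspace(1_000.0, 60_000.0, 40)
y = hill(x, 2.0, 20_000.0)
a_fit, g_fit = fit_with_restarts(x, y)
print(f"a = {a_fit:.3f}, g = {g_fit:,.0f}")
```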

Examples

See the examples/ directory for complete examples:

  • example_response_curves.py: Full workflow with DeepCausalMMM integration

  • dashboard_rmse_optimized.py: Dashboard with integrated response curves

See Also