Response Curves
===============

The response curves module provides non-linear saturation analysis using Hill equations to model diminishing returns in marketing channels.

Overview
--------

Response curves help you understand:

* **Saturation Points**: When additional spend/impressions yield diminishing returns
* **Optimal Allocation**: Which channels have room for increased investment
* **S-Shaped Relationships**: Non-linear effects of marketing activities
* **Channel Efficiency**: Compare saturation across different channels

Key Features
------------

* **Hill Equation Fitting**: Fits S-shaped saturation curves to channel data
* **Automatic Aggregation**: Aggregates DMA-week data to national weekly level
* **Direct Attribution**: Works with additive contributions from linear scaling (v1.0.19+)
* **Interactive Visualizations**: Plotly-based plots with hover details
* **Performance Metrics**: R², slope, and saturation point for each channel
* **Backward Compatibility**: Maintains support for legacy method names

ResponseCurveFit Class
----------------------

.. autoclass:: deepcausalmmm.postprocess.response_curves.ResponseCurveFit
   :members:
   :undoc-members:
   :show-inheritance:
   :special-members: __init__

Basic Usage
-----------

Fitting Response Curves
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from deepcausalmmm.postprocess import ResponseCurveFit
    import pandas as pd

    # Prepare your data (each trailing ... marks values omitted here;
    # supply complete lists of equal length)
    channel_data = pd.DataFrame({
        'week': [1, 2, 3, ...],
        'impressions': [10000, 15000, 20000, ...],
        'contributions': [500000, 650000, 750000, ...]
    })

    # Initialize fitter
    fitter = ResponseCurveFit(
        data=channel_data,
        x_col='impressions',
        y_col='contributions',
        model_level='national',
        date_col='week'
    )

    # Fit the curve
    slope, saturation = fitter.fit_curve()
    print(f"Slope (a): {slope:.3f}")
    print(f"Half-Saturation Point (g): {saturation:.0f}")

Calculating R² and Plotting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Calculate R² and generate interactive plot
    r2_score = fitter.calculate_r2_and_plot(
        save_path='response_curve_channel.html'
    )
    print(f"R² Score: {r2_score:.3f}")

Complete Workflow
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from deepcausalmmm.postprocess import ResponseCurveFit
    import pandas as pd

    # Load channel data (impressions and contributions)
    df = pd.read_csv('channel_data.csv')

    # Initialize and fit
    fitter = ResponseCurveFit(
        data=df,
        x_col='impressions',
        y_col='contributions',
        model_level='national',
        date_col='week'
    )

    # Get fitted parameters
    slope, saturation = fitter.fit_curve()

    # Generate plot and get R²
    r2 = fitter.calculate_r2_and_plot(save_path='curve.html')

    # Interpret results
    print("Channel Saturation Analysis:")
    print(f"  Slope (a): {slope:.3f}")
    print(f"  Half-Saturation (g): {saturation:,.0f} impressions")
    print(f"  Fit Quality (R²): {r2:.3f}")
    
    if slope >= 2.0:
        print("  Strong S-shaped curve (diminishing returns)")
    else:
        print("  Gentle curve (less pronounced saturation)")
    
    if r2 >= 0.8:
        print("  Excellent fit")
    elif r2 >= 0.6:
        print("  Good fit")
    else:
        print("  Moderate fit - review data quality")

Advanced Usage
--------------

Batch Processing Multiple Channels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

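    from pathlib import Path

    import pandas as pd
    from deepcausalmmm.postprocess import ResponseCurveFit

    # Assume you have a DataFrame with multiple channels
    all_channels_data = pd.read_csv('all_channels.csv')

    # Create the plot output directory up front, in case the
    # plotting call does not create it
    Path('curves').mkdir(exist_ok=True)

    results = []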
    for channel in all_channels_data['channel'].unique():
        # Filter data for this channel
        channel_df = all_channels_data[
            all_channels_data['channel'] == channel
        ].copy()
        
        # Fit response curve
        fitter = ResponseCurveFit(
            data=channel_df,
            x_col='impressions',
            y_col='contributions',
            model_level='national',
            date_col='week'
        )
        
        slope, saturation = fitter.fit_curve()
        r2 = fitter.calculate_r2_and_plot(
            save_path=f'curves/{channel}_response_curve.html'
        )
        
        results.append({
            'channel': channel,
            'slope': slope,
            'saturation': saturation,
            'r2': r2
        })
    
    # Create summary DataFrame
    summary = pd.DataFrame(results)
    summary = summary.sort_values('r2', ascending=False)
    print(summary)

Interpreting Results
~~~~~~~~~~~~~~~~~~~~

**Slope (a) Parameter:**

* ``a >= 3.0``: Very strong S-curve, rapid saturation
* ``2.0 <= a < 3.0``: Strong S-curve, clear diminishing returns
* ``1.0 <= a < 2.0``: Gentle curve, gradual saturation
* ``a < 1.0``: Very gentle, concave from the origin with no S-shape

**Half-Saturation Point (g):**

* The impression/spend level where the channel reaches 50% of maximum effect
* Lower values indicate faster saturation
* Compare across channels to identify efficiency

**R² Score:**

* ``R² >= 0.8``: Excellent fit, high confidence
* ``0.6 <= R² < 0.8``: Good fit, reasonable confidence
* ``0.4 <= R² < 0.6``: Moderate fit, review data
* ``R² < 0.4``: Poor fit, investigate data quality or model assumptions
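
The bands above can be applied programmatically when many channels are fitted at once. The following is a minimal sketch in plain Python; the helper names ``describe_slope`` and ``describe_r2`` and the channel values are hypothetical and are not part of ``deepcausalmmm``.

.. code-block:: python

    def describe_slope(a):
        """Map a fitted slope to the qualitative bands listed above."""
        if a >= 3.0:
            return "very strong"
        if a >= 2.0:
            return "strong"
        if a >= 1.0:
            return "gentle"
        return "very gentle"


    def describe_r2(r2):
        """Map an R² score to the fit-quality bands listed above."""
        if r2 >= 0.8:
            return "excellent"
        if r2 >= 0.6:
            return "good"
        if r2 >= 0.4:
            return "moderate"
        return "poor"


    # Hypothetical (slope, R²) pairs for three channels, not real results
    fits = {"tv": (3.4, 0.86), "search": (1.3, 0.64), "display": (0.7, 0.31)}
    summary = {
        name: (describe_slope(a), describe_r2(r2))
        for name, (a, r2) in fits.items()
    }

The resulting labels could be added as columns to the batch-processing summary so that weak fits are easy to filter out before comparing channels.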

Hill Equation
-------------

The response curve uses the Hill equation:

.. math::

    y = \frac{x^a}{x^a + g^a}

Where:

* ``x``: Input variable (impressions or spend)
* ``y``: Output variable (contributions or response)
* ``a``: Slope parameter (controls steepness of S-curve)
* ``g``: Half-saturation point (x value where y = 0.5)

Properties:

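* **Monotonic**: Strictly increasing in ``x`` for ``x > 0`` and ``a > 0``
* **Bounded**: Output between 0 and 1 (when normalized)
* **S-Shaped**: Has an inflection point whenever ``a > 1``; the S-shape becomes pronounced for ``a >= 2.0``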
* **Half-Saturation**: ``y(g) = 0.5``
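
These properties can be checked numerically. The following is a minimal sketch in plain Python; the function name ``hill`` and the parameter values are illustrative, and the code is not the library's internal implementation.

.. code-block:: python

    def hill(x, a, g):
        """Normalized Hill response x**a / (x**a + g**a) for x >= 0, a > 0, g > 0."""
        xa = x ** a
        return xa / (xa + g ** a)


    a, g = 2.0, 1000.0
    xs = [0.0, 250.0, 500.0, 1000.0, 2000.0, 8000.0]
    ys = [hill(x, a, g) for x in xs]

    # Half-saturation: the response at x = g
    half = hill(g, a, g)

    # Monotonic and bounded: responses rise with x and stay within [0, 1)
    is_increasing = all(y1 < y2 for y1, y2 in zip(ys, ys[1:]))
    is_bounded = all(0.0 <= y < 1.0 for y in ys)

Here ``half`` evaluates to ``0.5`` regardless of the slope, because ``g**a / (g**a + g**a)`` simplifies to one half for any ``a``.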

Technical Details
-----------------

Fitting Algorithm
~~~~~~~~~~~~~~~~~

The module uses ``scipy.optimize.curve_fit`` with:

* **Initial Guess**: ``a=1``, ``g=median(x)``
* **Bounds**: ``a ∈ [0.01, 100]``, ``g ∈ [0.01, max(x) × 10]``
* **Method**: Trust Region Reflective (``trf``), which ``curve_fit`` selects by default when bounds are supplied
* **Max Iterations**: 10,000
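
The settings above can be reproduced directly with SciPy. The following is a minimal sketch, assuming synthetic noise-free data and a standalone ``hill`` function; it is not the library's own fitting code. It also assumes that the iteration limit maps to ``max_nfev``, the function-evaluation cap that ``curve_fit`` forwards to the bounded solver.

.. code-block:: python

    import numpy as np
    from scipy.optimize import curve_fit


    def hill(x, a, g):
        """Hill response for x > 0.

        Written as 1 / (1 + (g / x)**a), which is algebraically equal to
        x**a / (x**a + g**a) but avoids overflow for large x and a.
        """
        return 1.0 / (1.0 + np.power(g / x, a))


    # Synthetic data generated from known parameters (illustrative only)
    x = np.linspace(1_000, 50_000, 40)
    y = hill(x, 2.0, 15_000.0)

    params, _ = curve_fit(
        hill,
        x,
        y,
        p0=[1.0, np.median(x)],                        # a=1, g=median(x)
        bounds=([0.01, 0.01], [100.0, x.max() * 10]),  # ranges for a and g
        max_nfev=10_000,                               # assumed meaning of the limit
    )
    a_fit, g_fit = params

Because bounds are passed, ``curve_fit`` uses the ``trf`` method without it being named explicitly. On real data the recovered parameters depend on noise and on how much of the curve the observed ``x`` range covers.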

Data Preprocessing
~~~~~~~~~~~~~~~~~~

1. **Aggregation**: Groups by date column and sums x and y
2. **Sorting**: Sorts by x values for consistent fitting
3. **Normalization**: Internally normalizes y for numerical stability
4. **Scaling**: Scales fitted curve back to original y scale
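
The four steps above can be sketched with pandas as follows. The column names and values are hypothetical, and dividing by the maximum is only one plausible normalization; the scheme the library uses internally is not specified here.

.. code-block:: python

    import pandas as pd

    # Hypothetical DMA-week rows: two DMAs per week
    raw = pd.DataFrame({
        "week": [1, 1, 2, 2, 3, 3],
        "dma": ["A", "B", "A", "B", "A", "B"],
        "impressions": [4000, 6000, 9000, 11000, 7000, 8000],
        "contributions": [200.0, 300.0, 380.0, 420.0, 310.0, 340.0],
    })

    # 1. Aggregation: sum x and y within each date
    national = raw.groupby("week", as_index=False)[
        ["impressions", "contributions"]
    ].sum()

    # 2. Sorting: order rows by x
    national = national.sort_values("impressions").reset_index(drop=True)

    # 3. Normalization (assumed): divide y by its maximum
    y_scale = national["contributions"].max()
    national["y_norm"] = national["contributions"] / y_scale

    # 4. Scaling: multiply by the same factor to return to original units
    national["y_restored"] = national["y_norm"] * y_scale

After sorting, the rows follow impression volume rather than calendar order, so the date column should no longer be read as a time index.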

Backward Compatibility
----------------------

Legacy Method Names
~~~~~~~~~~~~~~~~~~~

For backward compatibility, the following legacy method names are supported:

.. code-block:: python

    fitter = ResponseCurveFit(data=df, x_col='x', y_col='y')
    
    # New API (recommended)
    result = fitter._hill_equation(x, a, g)
    slope, sat = fitter.fit_curve()
    r2 = fitter.calculate_r2_and_plot()
    
    # Legacy API (still works)
    result = fitter.Hill(x, a, g)
    slope, sat = fitter.get_param()
    r2 = fitter.regression()
    slope, sat = fitter.fit_model()  # Alias for fit_curve

Legacy Parameter Names
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # New API (recommended)
    fitter = ResponseCurveFit(
        data=df,
        x_col='impressions',
        y_col='contributions',
        model_level='national',
        date_col='week'
    )
    
    # Legacy API (still works)
    fitter = ResponseCurveFit(
        data=df,
        x_col='impressions',
        y_col='contributions',
        Modellevel='national',  # Old name
        Datecol='week'  # Old name
    )

ResponseCurveFitter Alias
~~~~~~~~~~~~~~~~~~~~~~~~~

The original class name ``ResponseCurveFitter`` is maintained as an alias:

.. code-block:: python

    from deepcausalmmm.postprocess import ResponseCurveFitter
    
    # This works identically to ResponseCurveFit
    fitter = ResponseCurveFitter(data=df, x_col='x', y_col='y')

Best Practices
--------------

Data Quality
~~~~~~~~~~~~

* **Sufficient Points**: Use at least 20-30 data points for reliable fitting
* **Range Coverage**: Ensure data covers a wide range of x values
* **Outlier Handling**: Remove or investigate extreme outliers before fitting
* **Monotonicity**: Response curves assume y generally increases with x

Aggregation Level
~~~~~~~~~~~~~~~~~

* **National Weekly**: Recommended for most analyses (reduces noise)
* **Regional**: Use when analyzing regional differences
* **DMA-Level**: Use with caution (high variance)

Model Validation
~~~~~~~~~~~~~~~~

* **R² Threshold**: Aim for R² >= 0.6 for reliable insights
* **Visual Inspection**: Always review the generated plots
* **Business Logic**: Ensure fitted parameters make business sense
* **Cross-Validation**: Test on holdout periods when possible
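
A holdout check can be sketched outside the library as follows. This is a minimal example under stated assumptions: the data are synthetic, the ``hill`` and ``r_squared`` helpers are illustrative, and the split of 40 training weeks and 12 holdout weeks is arbitrary.

.. code-block:: python

    import numpy as np
    from scipy.optimize import curve_fit


    def hill(x, a, g):
        """Hill response for x > 0, equal to x**a / (x**a + g**a)."""
        return 1.0 / (1.0 + np.power(g / x, a))


    def r_squared(y_true, y_pred):
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1.0 - ss_res / ss_tot


    # Synthetic weekly series in time order (not real campaign data)
    rng = np.random.default_rng(0)
    x = rng.uniform(2_000, 40_000, size=52)
    y = hill(x, 1.8, 12_000.0) + rng.normal(0.0, 0.02, size=52)

    # Fit on the first 40 weeks and hold out the last 12
    x_train, y_train = x[:40], y[:40]
    x_test, y_test = x[40:], y[40:]

    (a_fit, g_fit), _ = curve_fit(
        hill,
        x_train,
        y_train,
        p0=[1.0, np.median(x_train)],
        bounds=([0.01, 0.01], [100.0, x_train.max() * 10]),
    )

    r2_train = r_squared(y_train, hill(x_train, a_fit, g_fit))
    r2_holdout = r_squared(y_test, hill(x_test, a_fit, g_fit))

A holdout R² far below the training R² suggests the fitted curve does not generalize to later periods, even when the in-sample fit looks strong.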

Common Issues
~~~~~~~~~~~~~

**Poor Fit (Low R²)**:

* Check for outliers or data quality issues
* Verify monotonic relationship between x and y
* Consider if Hill equation is appropriate for your data
* Try different aggregation levels

**Unrealistic Parameters**:

* Very high slope (a > 10): May indicate overfitting
* Very high saturation (g >> max(x)): Channel not reaching saturation
* Very low saturation (g << median(x)): Most data in saturated region
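
These checks can be automated after fitting. The following is a minimal sketch; the function ``flag_parameters`` is hypothetical, and ``far_factor`` is an illustrative stand-in for ">>" and "<<" that should be tuned for your data.

.. code-block:: python

    from statistics import median


    def flag_parameters(a, g, x_values, far_factor=5.0):
        """Return warnings for fitted Hill parameters that look implausible.

        far_factor is an assumed threshold, not a value defined by the library.
        """
        warnings = []
        if a > 10:
            warnings.append("slope above 10: possible overfitting")
        if g > far_factor * max(x_values):
            warnings.append("g far above max(x): channel not reaching saturation")
        if g < median(x_values) / far_factor:
            warnings.append("g far below median(x): most data in saturated region")
        return warnings


    # Hypothetical weekly impression levels and fitted parameters
    x_values = [10_000, 15_000, 20_000, 25_000, 30_000]
    ok = flag_parameters(a=2.1, g=18_000, x_values=x_values)
    suspect = flag_parameters(a=14.0, g=400_000, x_values=x_values)

In this example ``ok`` is an empty list, while ``suspect`` contains the slope warning and the high-saturation warning.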

**Convergence Issues**:

* Increase max iterations
* Try different initial guesses
* Check for numerical issues (very large/small values)
* Normalize your data before fitting
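
Several of these remedies can be combined in one helper. The following is a minimal sketch that fits directly with ``scipy.optimize.curve_fit`` rather than through ``ResponseCurveFit``; the function ``fit_with_restarts``, its starting slopes, and the bounds on the rescaled ``g`` are all assumptions made for illustration.

.. code-block:: python

    import numpy as np
    from scipy.optimize import curve_fit


    def hill(x, a, g):
        """Hill response for x > 0, equal to x**a / (x**a + g**a)."""
        return 1.0 / (1.0 + np.power(g / x, a))


    def fit_with_restarts(x, y, slope_guesses=(0.5, 1.0, 2.0, 4.0), max_nfev=20_000):
        """Fit on x / max(x) from several starting slopes; keep the lowest-error fit.

        Returns (a, g, sse) with g in the original units of x, or None if
        every start fails to converge.
        """
        x_scale = float(np.max(x))
        x_unit = x / x_scale  # normalized inputs in (0, 1]
        best = None
        for a0 in slope_guesses:
            try:
                params, _ = curve_fit(
                    hill,
                    x_unit,
                    y,
                    p0=[a0, float(np.median(x_unit))],
                    bounds=([0.01, 1e-6], [100.0, 10.0]),
                    max_nfev=max_nfev,  # raised evaluation budget
                )
            except RuntimeError:
                continue  # this start did not converge; try the next one
            sse = float(np.sum((y - hill(x_unit, *params)) ** 2))
            if best is None or sse < best[2]:
                best = (float(params[0]), float(params[1]) * x_scale, sse)
        return best


    # Synthetic noise-free data from known parameters (illustrative only)
    x = np.linspace(500, 30_000, 36)
    y = hill(x, 3.0, 9_000.0)
    a_fit, g_fit, sse = fit_with_restarts(x, y)

Rescaling is safe because the Hill equation is unchanged when ``x`` and ``g`` are divided by the same constant, so multiplying the fitted ``g`` by ``max(x)`` returns it to the original units.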

Examples
--------

See the ``examples/`` directory for complete examples:

* ``example_response_curves.py``: Full workflow with DeepCausalMMM integration
* ``dashboard_rmse_optimized.py``: Dashboard with integrated response curves

See Also
--------

* :doc:`analysis`: General analysis utilities
* :doc:`core`: Core model components
* :doc:`../tutorials/index`: Step-by-step tutorials
* :doc:`../examples/index`: Practical examples