CUPED Explained: Reduce A/B Test Duration by Up to 50%

TL;DR

CUPED (Controlled-experiment Using Pre-Experiment Data) reduces variance in A/B tests by using each user's pre-experiment behavior as a covariate. When the pre-experiment metric correlates well with the test metric, this can cut test duration by 30-50%, or let you detect smaller effects in the same amount of time. It requires user-level tracking and historical data. Best for: returning users, subscription metrics, repeat purchase behavior.

What is CUPED?

CUPED is a variance reduction technique introduced by Microsoft researchers in 2013. The core idea:

If you know a user's historical behavior, you can predict their behavior in the test.

By adjusting for this prediction, you reduce noise and get clearer signals.

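Example: User A historically converts at 10%, User B at 2%. If both convert in your test, User B's conversion is the more surprising one, because User A was likely to convert anyway. CUPED subtracts the part of each user's outcome that their history already predicts, so what remains is mostly the variation the treatment could have caused.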

Why Use CUPED?

Reduce test duration by 30-50%

Same statistical power with less data

Detect smaller effects

Find roughly 7% improvements instead of needing 10% (same traffic, 50% less variance)

Run more experiments

Faster iteration = more tests per year
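
How much time CUPED actually saves depends on the correlation ρ between the pre-experiment metric and the test metric: with the optimal coefficient, the adjusted metric keeps a fraction 1 - ρ² of the original variance. The sketch below turns that into rough planning numbers. The function names are illustrative, and it assumes the required sample size scales linearly with variance, as in a standard two-sample test.

```python
import math


def variance_ratio(rho: float) -> float:
    """Fraction of the original variance left after CUPED: 1 - rho^2."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    return 1.0 - rho ** 2


def sample_size_ratio(rho: float) -> float:
    """Relative sample size (or duration at constant traffic) for the same power."""
    return variance_ratio(rho)


def mde_ratio(rho: float) -> float:
    """Relative minimum detectable effect at the same sample size."""
    return math.sqrt(variance_ratio(rho))


if __name__ == "__main__":
    for rho in (0.3, 0.5, 0.7, 0.9):
        print(
            f"rho={rho:.1f}  "
            f"duration x{sample_size_ratio(rho):.2f}  "
            f"MDE x{mde_ratio(rho):.2f}"
        )
```

Under these assumptions, the 30-50% range corresponds to correlations of roughly 0.55 to 0.7. At a correlation of 0.3 the saving is only about 9%.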

Requirements for CUPED

Pre-experiment data

You need each user's historical value of the metric (for example, past conversions)

User-level tracking

Must track same users before and during test

Stable metric

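The pre-experiment metric should correlate with the test metric, and it must be measured before users are exposed to the treatment so the treatment cannot influence it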
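
In practice these requirements come down to joining two per-user datasets on a stable user ID. Here is a minimal sketch of that join using plain dictionaries. The names `build_pairs`, `pre_period`, and `test_period` are hypothetical, and filling in the mean for users without history is one possible convention, not the only one.

```python
from statistics import mean


def build_pairs(
    pre: dict[str, float], during: dict[str, float]
) -> list[tuple[float, float]]:
    """Return one (x, y) pair per user who took part in the experiment.

    Users with no pre-experiment record get the mean of the observed x values
    (an assumption of this sketch), which makes their CUPED adjustment zero.
    """
    observed = [pre[user] for user in during if user in pre]
    if not observed:
        raise ValueError("no overlap between pre-experiment and experiment users")
    fallback = mean(observed)
    return [(pre.get(user, fallback), y) for user, y in during.items()]


# Hypothetical data: purchases per user before and during the test.
pre_period = {"u1": 2.0, "u2": 0.0, "u3": 4.0}
test_period = {"u1": 3.0, "u2": 1.0, "u4": 2.0}  # u4 is new; u3 never entered the test

pairs = build_pairs(pre_period, test_period)
```

If a large share of users ends up with the fallback value, the covariate carries little information and the variance reduction will be small.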

The CUPED Formula

Y_cuped = Y - θ(X - E[X])

Where:
- Y = metric value during experiment
- X = pre-experiment metric value
- E[X] = average pre-experiment value
- θ = Cov(X,Y) / Var(X) (the coefficient that minimizes the variance of Y_cuped; it equals the slope of a linear regression of Y on X)
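
Why this choice of θ is the optimal one follows from a short derivation, sketched here. The symbols θ* (the minimizing value) and ρ (the correlation between X and Y) are notation added for this sketch.

```latex
\begin{aligned}
\operatorname{Var}(Y_{\text{cuped}})
  &= \operatorname{Var}(Y) - 2\theta\,\operatorname{Cov}(X, Y) + \theta^{2}\operatorname{Var}(X) \\
\frac{d}{d\theta}\operatorname{Var}(Y_{\text{cuped}}) = 0
  &\;\Rightarrow\; \theta^{*} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \\
\operatorname{Var}(Y_{\text{cuped}})\Big|_{\theta = \theta^{*}}
  &= \operatorname{Var}(Y) - \frac{\operatorname{Cov}(X, Y)^{2}}{\operatorname{Var}(X)}
   = \operatorname{Var}(Y)\,(1 - \rho^{2})
\end{aligned}
```

So the stronger the correlation between pre-experiment and test behavior, the more variance is removed.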

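The CUPED adjustment shifts each user's metric by the amount their historical behavior predicts. Because users are randomized, E[X] is the same in both arms, so the adjustment leaves the expected treatment effect unchanged and only removes variance.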
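
To make the formula concrete, here is a minimal sketch in standard-library Python on synthetic data. The data-generating numbers (means, noise levels, a true effect of 0.5) are made up for illustration, and estimating θ and E[X] on the pooled data from both arms is a common convention assumed here.

```python
import random
from statistics import mean, variance


def cuped_theta(x: list[float], y: list[float]) -> float:
    """theta = Cov(X, Y) / Var(X), using sample estimates."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / variance(x)


def cuped_adjust(
    x: list[float], y: list[float], theta: float, x_mean: float
) -> list[float]:
    """Y_cuped = Y - theta * (X - E[X]) for each user."""
    return [b - theta * (a - x_mean) for a, b in zip(x, y)]


# Synthetic example: the test metric tracks the pre-experiment metric plus noise.
rng = random.Random(0)
n = 2000
x_c = [rng.gauss(10, 3) for _ in range(n)]        # control, pre-experiment
x_t = [rng.gauss(10, 3) for _ in range(n)]        # treatment, pre-experiment
y_c = [a + rng.gauss(0, 2) for a in x_c]          # control, during test
y_t = [a + 0.5 + rng.gauss(0, 2) for a in x_t]    # treatment, true effect = 0.5

# Estimate theta and E[X] once, on the pooled data from both arms.
x_all, y_all = x_c + x_t, y_c + y_t
theta = cuped_theta(x_all, y_all)
x_mean = mean(x_all)

adj_c = cuped_adjust(x_c, y_c, theta, x_mean)
adj_t = cuped_adjust(x_t, y_t, theta, x_mean)

raw_effect = mean(y_t) - mean(y_c)
cuped_effect = mean(adj_t) - mean(adj_c)

print(f"theta           = {theta:.3f}")
print(f"raw effect      = {raw_effect:.3f}")
print(f"CUPED effect    = {cuped_effect:.3f}")
print(f"variance ratio  = {variance(adj_c) / variance(y_c):.3f}")
```

Both estimates target the same effect; the CUPED version is computed from values with much lower variance, which is what shortens the test. The adjusted values can then go into whatever significance test you already use.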

When CUPED Works Best

Good Use Cases

• Subscription metrics (MRR, churn)
• Repeat purchase behavior
• Engagement metrics (DAU, sessions)
• Returning user experiments

Poor Use Cases

• New user acquisition
• First-time purchase
• No historical data available
• Metrics with low correlation between pre-experiment and test-period values
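
One way to tell a good use case from a poor one before committing engineering time is to measure the correlation on historical data, for example the same metric for the same users in two consecutive periods. The sketch below does that with the standard library. The helper names and the 10% threshold are arbitrary choices for illustration, not an established rule.

```python
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant metric carries no predictive signal
    return sxy / (sxx * syy) ** 0.5


def cuped_worthwhile(
    x: list[float], y: list[float], min_reduction: float = 0.10
) -> bool:
    """True if the expected variance reduction (r^2) meets the threshold.

    The 10% default is an arbitrary cut-off chosen for this sketch.
    """
    return pearson_r(x, y) ** 2 >= min_reduction


# Hypothetical per-user sessions in two consecutive weeks.
week_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
week_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pearson_r(week_1, week_2), cuped_worthwhile(week_1, week_2))
```

For new users there is no earlier period to correlate against, which is why acquisition and first-purchase experiments sit in the poor-fit list.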

CUPED Support

CUPED requires custom implementation and historical data infrastructure. Most visual A/B testing tools (including ExperimentHQ) don't support it natively. Best for: data teams at scale-ups with data warehouses.

For most teams, standard A/B testing is sufficient. Focus on running more tests rather than optimizing variance reduction.