A/B testing is how the best growth teams make decisions. Instead of debating whether a blue or green button converts better, they test it. Instead of guessing what headline works, they measure it. This guide walks you through everything you need to run your first experiment — from hypothesis to analysis.
What You'll Learn
- What A/B testing is and why it matters
- How to write a strong hypothesis
- Setting up your first experiment (with code examples)
- How to read and interpret results
- Common mistakes that invalidate tests
What Is A/B Testing?
A/B testing (also called split testing) is showing two different versions of a webpage to different visitors and measuring which version performs better. "Version A" is your current page (the control), and "Version B" is the modified version (the variant).
Control
Your current page, unchanged. This is the baseline you're comparing against.
Variant
The modified version with your change. This is what you think will perform better.
Traffic is split randomly — typically 50/50 — so each version gets a statistically comparable audience. After collecting enough data, you analyze which version had a higher conversion rate (or whichever metric you're tracking).
Why A/B Testing Matters
Without testing, product decisions come down to opinions — the loudest person in the room wins. A/B testing replaces opinions with evidence:
Compound growth: A 5% improvement per month compounds to 80% annual growth. Small, consistent wins add up dramatically.
Risk reduction: Instead of rolling out a redesign and praying, test changes on a portion of traffic first.
Data-driven culture: Teams that test regularly make better decisions across the board — not just on websites.
Step 1: Form a Hypothesis
Every good experiment starts with a hypothesis — a specific, testable prediction about what will happen and why. A strong hypothesis follows this format:
"If we [change X], then [metric Y] will [increase/decrease] because [reason Z]."
Here are examples of strong vs. weak hypotheses:
Strong: "If we change the CTA button from 'Submit' to 'Get My Free Report', signups will increase by 10% because action-oriented copy creates urgency and clarifies the value."
Weak: "Let's try a different button color and see what happens."
The "because" is the most important part. It forces you to think about why a change would work, which helps you learn even when tests lose.
Step 2: Choose Your Primary Metric
Before running any test, decide on one primary metric that determines success. This is your North Star for the experiment.
| If You're Testing... | Primary Metric | Example |
|---|---|---|
| Signup page | Signup conversion rate | Signups ÷ unique visitors |
| Pricing page | Plan selection rate | Clicks on pricing CTA ÷ page views |
| Product page | Add-to-cart rate | Add to cart clicks ÷ product views |
| Landing page | Lead conversion rate | Form submissions ÷ page visits |
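Every metric in the table reduces to the same arithmetic: conversions divided by the visitors who had the chance to convert. A tiny helper (the function name here is just illustrative) makes that concrete:

```javascript
// Compute a conversion rate from raw counts out of your analytics tool.
function conversionRate(conversions, visitors) {
  if (visitors === 0) return 0; // guard against division by zero on day one
  return conversions / visitors;
}

// Example: 450 signups from 15,000 unique visitors
const rate = conversionRate(450, 15000); // 0.03, i.e. a 3% signup rate
console.log((rate * 100).toFixed(1) + '%');
```

Whichever metric you pick, make sure the numerator and denominator come from the same time window and the same set of users.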
Avoid Multiple Primary Metrics
Testing against multiple primary metrics inflates your false positive rate. Choose one. You can track secondary metrics for learning, but only one metric decides the winner.
Step 3: Calculate Sample Size
Before launching, estimate how many visitors you'll need. This prevents two common mistakes: stopping too early (false positives) or running too long (wasting time).
The sample size depends on three factors:
Baseline Conversion Rate
Your current conversion rate. If it's 3%, you need more visitors than if it's 30%.
Minimum Detectable Effect (MDE)
The smallest improvement you care about. Detecting a 5% relative lift requires far fewer visitors than detecting a 2% lift.
Statistical Power & Confidence
Industry standard: 80% power, 95% confidence. This means a 5% chance of false positives and 20% chance of missing a real effect.
Quick rule of thumb: For a typical SaaS signup page with a 3% conversion rate, detecting a 10% relative improvement (3.0% → 3.3%) at 80% power and 95% confidence requires roughly 53,000 visitors per variant, or just over 100,000 total.
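If you want to run the numbers yourself rather than rely on the rule of thumb, here is a minimal sketch of the standard two-proportion sample-size formula. The function name is illustrative, and it assumes the industry-standard settings above (two-sided 95% confidence, 80% power):

```javascript
// Rough per-variant sample size for a two-proportion A/B test.
// Assumes two-sided alpha = 0.05 (z = 1.96) and 80% power (z = 0.8416).
function sampleSizePerVariant(baselineRate, relativeLift) {
  const zAlpha = 1.96;   // 95% confidence, two-sided
  const zBeta = 0.8416;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const pBar = (p1 + p2) / 2;
  const delta = p2 - p1;
  const n =
    Math.pow(
      zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
        zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
      2
    ) / (delta * delta);
  return Math.ceil(n);
}

// 3% baseline, 10% relative lift (3.0% -> 3.3%)
console.log(sampleSizePerVariant(0.03, 0.10)); // roughly 53,000 per variant
```

Notice how sensitive the result is to the MDE: halving the lift you want to detect roughly quadruples the traffic you need.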
Step 4: Set Up Your Experiment
There are two approaches to implementation: no-code (visual editor) and code-based. For your first test, we recommend the no-code approach.
Option A: No-Code with a Visual Editor
Tools like ExperimentHQ, VWO, and Optimizely offer visual editors that let you modify your page without writing code. Here's the setup process with ExperimentHQ:
Add the ExperimentHQ snippet to your site's <head> tag:
<!-- ExperimentHQ A/B Testing Snippet -->
<script async
src="https://cdn.experimenthq.io/snippet.min.js"
data-site="YOUR_SITE_KEY">
</script>
Open the visual editor from your ExperimentHQ dashboard and navigate to the page you want to test.
Click any element to modify it — change text, swap images, hide or rearrange elements.
Set your conversion goal (e.g., click on signup button) and traffic allocation (usually 50/50).
Hit "Launch" and start collecting data.
Option B: Code-Based A/B Testing
For more complex changes or single-page apps, you can implement tests in code. Here's a minimal JavaScript example:
// Simple client-side A/B test
// Persist the assignment so a visitor sees the same version on every visit
let variant = localStorage.getItem('homepage_hero_test');
if (variant === null) {
  variant = Math.random() < 0.5 ? 'control' : 'variant';
  localStorage.setItem('homepage_hero_test', variant);
}

if (variant === 'variant') {
  document.querySelector('.hero-headline').textContent =
    'Start Growing Your Revenue Today';
  document.querySelector('.cta-button').textContent =
    'Get Started Free';
}

// Track which variant the user saw
analytics.track('experiment_viewed', {
  experiment: 'homepage_hero_test',
  variant: variant
});
Watch Out for Flicker
Client-side testing can cause a "flash of original content" (FOOC) where visitors briefly see the control before the variant loads. This biases results. Use async-ready tools like ExperimentHQ that apply changes before the page renders, or implement anti-flicker snippets.
Step 5: Run the Experiment
Once launched, the hardest part is being patient. Here are the rules for running a clean experiment:
Don't peek at results early
Checking results repeatedly and stopping as soon as one check shows significance inflates your false positive rate. This is called "peeking", and it's the #1 reason teams think they found a winner when they didn't. Set a review date and stick to it.
Run for at least 1–2 full weeks
User behavior varies by day of week. A test that runs Monday–Wednesday will miss weekend patterns. Always include at least one full business cycle.
Don't change anything mid-test
No redesigns, no traffic source changes, no new promotions. Any external change contaminates results and makes them uninterpretable.
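The peeking problem is easy to demonstrate with a quick simulation: run a test where both versions are identical (an A/A test), check for significance at every interim look, and count how often you "find" a winner that cannot exist. This is a sketch with illustrative names; the tiny seeded generator just keeps the run reproducible:

```javascript
// Seeded linear congruential generator so the simulation is reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (1664525 * state + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Simulate A/A tests: both arms convert at the same true rate, so every
// "significant" result is a false positive by construction.
function peekingFalsePositiveRate(simulations, visitorsPerArm, peeks, rate) {
  const rng = makeRng(42);
  let falsePositives = 0;
  for (let s = 0; s < simulations; s++) {
    const step = Math.floor(visitorsPerArm / peeks);
    let convA = 0, convB = 0, n = 0;
    let flagged = false;
    for (let peek = 0; peek < peeks; peek++) {
      for (let i = 0; i < step; i++) {
        if (rng() < rate) convA++;
        if (rng() < rate) convB++;
        n++;
      }
      // Two-proportion z-test at this interim look
      const pA = convA / n, pB = convB / n;
      const pPool = (convA + convB) / (2 * n);
      const se = Math.sqrt(pPool * (1 - pPool) * (2 / n));
      if (se > 0 && Math.abs(pA - pB) / se > 1.96) { flagged = true; break; }
    }
    if (flagged) falsePositives++;
  }
  return falsePositives / simulations;
}

// With 10 interim looks, the false positive rate climbs well above the
// nominal 5% you'd expect from a single test at 95% confidence.
console.log(peekingFalsePositiveRate(500, 2000, 10, 0.05));
```

One look at the end keeps you near the promised 5%; ten looks along the way can triple it or worse. That is why the review date matters.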
Step 6: Analyze Results
When your test reaches the target sample size, it's time to analyze. Here's what to look for:
Statistical Significance
Statistical significance tells you how unlikely your result would be if the change had no real effect. The industry standard is 95% confidence (p-value < 0.05): if the two versions truly performed the same, a difference this large would show up by chance less than 5% of the time.
| Confidence | What It Means |
|---|---|
| >95% | Significant: implement the winner |
| 85–95% | Promising: consider extending the test |
| <85% | Not significant: no clear winner |
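Your testing tool will compute this for you, but the underlying math is a standard two-proportion z-test. Here is a minimal sketch, using the classic Abramowitz–Stegun approximation for the normal CDF (the function names are illustrative, and this is a learning aid, not a replacement for your tool's stats engine):

```javascript
// Two-proportion z-test: returns the z-score and two-sided p-value.
function zTest(convA, visitorsA, convB, visitorsB) {
  const pA = convA / visitorsA;
  const pB = convB / visitorsB;
  const pPool = (convA + convB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / visitorsA + 1 / visitorsB));
  const z = (pB - pA) / se;
  return { z: z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation.
function normalCdf(x) {
  const t = 1 / (1 + 0.3275911 * (Math.abs(x) / Math.SQRT2));
  const erf =
    1 -
    (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t) *
      Math.exp(-(x * x) / 2);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Control: 500 / 10,000 (5.0%); Variant: 600 / 10,000 (6.0%)
const result = zTest(500, 10000, 600, 10000);
console.log(result.z.toFixed(2),
  result.pValue < 0.05 ? 'significant' : 'not significant');
```

For the example numbers above, the z-score is about 3.1 and the p-value is well under 0.05, so the variant's lift would clear the 95% bar.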
What to Do with Results
Variant wins: Implement the variant as the new default. Document what you learned and move on to the next test.
No clear winner: The change had no meaningful impact. This is still a win — you've learned that this element isn't the bottleneck.
Variant loses: The change hurt conversions. Revert immediately — now you know what not to change.
5 Mistakes That Invalidate A/B Tests
1. Stopping the test too early
The most common mistake. A variant that's 20% ahead after 100 visitors might be 5% behind after 5,000. Always run until your pre-determined sample size is reached.
2. Testing too many changes at once
If you change the headline, the image, and the button color simultaneously, you won't know which change drove the result. Test one meaningful change at a time.
3. Ignoring the novelty effect
Returning visitors often react to a change simply because it's new, not because it's better. Run tests long enough for the novelty to wear off and to capture multiple cohorts of returning visitors.
4. Ignoring segment differences
An experiment might win overall but lose for mobile users. Always segment results by device, traffic source, and user type before declaring a winner.
5. Running tests during unusual periods
Black Friday, product launches, PR spikes, and holidays distort traffic patterns. Never start a new test during these periods.
What Should You Test First?
Focus your first tests on pages with the highest leverage — where a small improvement has the biggest revenue impact.
High-Impact Pages
- Homepage hero section
- Pricing page layout
- Signup / checkout flow
- Product detail page
High-Impact Elements
- CTA button copy and color
- Headline and subheadline
- Social proof placement
- Form length and field order
Ready to run your first test?
ExperimentHQ has a free tier — 3 experiments, 50K monthly visitors. No credit card required.