A/B testing is how the best growth teams make decisions. Instead of debating whether a blue or green button converts better, they test it. Instead of guessing what headline works, they measure it. This guide walks you through everything you need to run your first experiment — from hypothesis to analysis.
What You'll Learn
- What A/B testing is and why it matters
- How to write a strong hypothesis
- Setting up your first experiment (with code examples)
- How to read and interpret results
- Common mistakes that invalidate tests
What Is A/B Testing?
A/B testing (also called split testing) is showing two different versions of a webpage to different visitors and measuring which version performs better. "Version A" is your current page (the control), and "Version B" is the modified version (the variant).
Control
Your current page, unchanged. This is the baseline you're comparing against.
Variant
The modified version with your change. This is what you think will perform better.
Traffic is split randomly — typically 50/50 — so each version gets a statistically comparable audience. After collecting enough data, you analyze which version had a higher conversion rate (or whichever metric you're tracking).
Why A/B Testing Matters
Without testing, product decisions come down to opinions — the loudest person in the room wins. A/B testing replaces opinions with evidence:
Compound growth: A 5% improvement per month compounds to 80% annual growth. Small, consistent wins add up dramatically.
Risk reduction: Instead of rolling out a redesign and praying, test changes on a portion of traffic first.
Data-driven culture: Teams that test regularly make better decisions across the board — not just on websites.
Step 1: Form a Hypothesis
Every good experiment starts with a hypothesis — a specific, testable prediction about what will happen and why. A strong hypothesis follows this format:
"If we [change X], then [metric Y] will [increase/decrease] because [reason Z]."
Here are examples of strong vs. weak hypotheses:
Strong: "If we change the CTA button from 'Submit' to 'Get My Free Report', signups will increase by 10% because action-oriented copy creates urgency and clarifies the value."
Weak: "Let's try a different button color and see what happens."
The "because" is the most important part. It forces you to think about why a change would work, which helps you learn even when tests lose.
Step 2: Choose Your Primary Metric
Before running any test, decide on one primary metric that determines success. This is your North Star for the experiment.
| If You're Testing... | Primary Metric | Example |
|---|---|---|
| Signup page | Signup conversion rate | Signups ÷ unique visitors |
| Pricing page | Plan selection rate | Clicks on pricing CTA ÷ page views |
| Product page | Add-to-cart rate | Add to cart clicks ÷ product views |
| Landing page | Lead conversion rate | Form submissions ÷ page visits |
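Every metric in the table reduces to the same arithmetic: conversions divided by the visitors who had the chance to convert. A tiny helper (the function name here is just illustrative) makes that concrete:

```javascript
// Compute a conversion rate from raw counts out of your analytics tool.
function conversionRate(conversions, visitors) {
  if (visitors === 0) return 0; // guard against division by zero on day one
  return conversions / visitors;
}

// Example: 450 signups from 15,000 unique visitors
const rate = conversionRate(450, 15000); // 0.03, i.e. a 3% signup rate
console.log((rate * 100).toFixed(1) + '%');
```

Whichever metric you pick, make sure the numerator and denominator come from the same time window and the same set of users.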
Avoid Multiple Primary Metrics
Testing against multiple primary metrics inflates your false positive rate. Choose one. You can track secondary metrics for learning, but only one metric decides the winner.
Step 3: Calculate Sample Size
Before launching, estimate how many visitors you'll need. This prevents two common mistakes: stopping too early (false positives) or running too long (wasting time).
The sample size depends on three factors:
Baseline Conversion Rate
Your current conversion rate. If it's 3%, you need more visitors than if it's 30%.
Minimum Detectable Effect (MDE)
The smallest improvement you care about. Detecting a 5% relative lift requires far fewer visitors than detecting a 2% lift.
Statistical Power & Confidence
Industry standard: 80% power, 95% confidence. This means a 5% chance of false positives and 20% chance of missing a real effect.
Quick rule of thumb: For a typical SaaS signup page with a 3% conversion rate, detecting a 10% relative improvement (3.0% → 3.3%) at 80% power and 95% confidence requires roughly 53,000 visitors per variant, or just over 100,000 total.
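If you want to run the numbers yourself rather than rely on the rule of thumb, here is a minimal sketch of the standard two-proportion sample-size formula. The function name is illustrative, and it assumes the industry-standard settings above (two-sided 95% confidence, 80% power):

```javascript
// Rough per-variant sample size for a two-proportion A/B test.
// Assumes two-sided alpha = 0.05 (z = 1.96) and 80% power (z = 0.8416).
function sampleSizePerVariant(baselineRate, relativeLift) {
  const zAlpha = 1.96;   // 95% confidence, two-sided
  const zBeta = 0.8416;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const pBar = (p1 + p2) / 2;
  const delta = p2 - p1;
  const n =
    Math.pow(
      zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
        zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
      2
    ) / (delta * delta);
  return Math.ceil(n);
}

// 3% baseline, 10% relative lift (3.0% -> 3.3%)
console.log(sampleSizePerVariant(0.03, 0.10)); // roughly 53,000 per variant
```

Notice how sensitive the result is to the MDE: halving the lift you want to detect roughly quadruples the traffic you need.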
Step 4: Set Up Your Experiment
There are two approaches to implementation: no-code (visual editor) and code-based. For your first test, we recommend the no-code approach.
Option A: No-Code with a Visual Editor
Tools like ExperimentHQ, VWO, and Optimizely offer visual editors that let you modify your page without writing code. Here's the setup process with ExperimentHQ:
Add the ExperimentHQ snippet to your site's <head> tag:
<!-- ExperimentHQ A/B Testing Snippet -->
<script async
src="https://cdn.experimenthq.io/snippet.min.js"
data-site="YOUR_SITE_KEY">
</script>
Open the visual editor from your ExperimentHQ dashboard and navigate to the page you want to test.
Click any element to modify it — change text, swap images, hide or rearrange elements.
Set your conversion goal (e.g., click on signup button) and traffic allocation (usually 50/50).
Hit "Launch" and start collecting data.
Option B: Code-Based A/B Testing
For more complex changes or single-page apps, you can implement tests in code. Here's a minimal JavaScript example:
// Simple client-side A/B test
// Persist the assignment so a visitor sees the same version on every visit
let variant = localStorage.getItem('homepage_hero_test');
if (variant === null) {
  variant = Math.random() < 0.5 ? 'control' : 'variant';
  localStorage.setItem('homepage_hero_test', variant);
}

if (variant === 'variant') {
  document.querySelector('.hero-headline').textContent =
    'Start Growing Your Revenue Today';
  document.querySelector('.cta-button').textContent =
    'Get Started Free';
}

// Track which variant the user saw
analytics.track('experiment_viewed', {
  experiment: 'homepage_hero_test',
  variant: variant
});
Watch Out for Flicker
Client-side testing can cause a "flash of original content" (FOOC) where visitors briefly see the control before the variant loads. This biases results. Use async-ready tools like ExperimentHQ that apply changes before the page renders, or implement anti-flicker snippets.
Step 5: Run the Experiment
Once launched, the hardest part is being patient. Here are the rules for running a clean experiment:
Don't peek at results early
Checking results repeatedly and stopping as soon as one check shows significance inflates your false positive rate. This is called "peeking", and it's the #1 reason teams think they found a winner when they didn't. Set a review date and stick to it.
Run for at least 1–2 full weeks
User behavior varies by day of week. A test that runs Monday–Wednesday will miss weekend patterns. Always include at least one full business cycle.
Don't change anything mid-test
No redesigns, no traffic source changes, no new promotions. Any external change contaminates results and makes them uninterpretable.
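The peeking problem is easy to demonstrate with a quick simulation: run a test where both versions are identical (an A/A test), check for significance at every interim look, and count how often you "find" a winner that cannot exist. This is a sketch with illustrative names; the tiny seeded generator just keeps the run reproducible:

```javascript
// Seeded linear congruential generator so the simulation is reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (1664525 * state + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Simulate A/A tests: both arms convert at the same true rate, so every
// "significant" result is a false positive by construction.
function peekingFalsePositiveRate(simulations, visitorsPerArm, peeks, rate) {
  const rng = makeRng(42);
  let falsePositives = 0;
  for (let s = 0; s < simulations; s++) {
    const step = Math.floor(visitorsPerArm / peeks);
    let convA = 0, convB = 0, n = 0;
    let flagged = false;
    for (let peek = 0; peek < peeks; peek++) {
      for (let i = 0; i < step; i++) {
        if (rng() < rate) convA++;
        if (rng() < rate) convB++;
        n++;
      }
      // Two-proportion z-test at this interim look
      const pA = convA / n, pB = convB / n;
      const pPool = (convA + convB) / (2 * n);
      const se = Math.sqrt(pPool * (1 - pPool) * (2 / n));
      if (se > 0 && Math.abs(pA - pB) / se > 1.96) { flagged = true; break; }
    }
    if (flagged) falsePositives++;
  }
  return falsePositives / simulations;
}

// With 10 interim looks, the false positive rate climbs well above the
// nominal 5% you'd expect from a single test at 95% confidence.
console.log(peekingFalsePositiveRate(500, 2000, 10, 0.05));
```

One look at the end keeps you near the promised 5%; ten looks along the way can triple it or worse. That is why the review date matters.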
Step 6: Analyze Results
When your test reaches the target sample size, it's time to analyze. Here's what to look for:
Statistical Significance
Statistical significance tells you how unlikely your result would be if the change had no real effect. The industry standard is 95% confidence (p-value < 0.05): if the two versions truly performed the same, a difference this large would show up by chance less than 5% of the time.
| Confidence | What It Means |
|---|---|
| >95% | Significant: implement the winner |
| 85–95% | Promising: consider extending the test |
| <85% | Not significant: no clear winner |
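Your testing tool will compute this for you, but the underlying math is a standard two-proportion z-test. Here is a minimal sketch, using the classic Abramowitz–Stegun approximation for the normal CDF (the function names are illustrative, and this is a learning aid, not a replacement for your tool's stats engine):

```javascript
// Two-proportion z-test: returns the z-score and two-sided p-value.
function zTest(convA, visitorsA, convB, visitorsB) {
  const pA = convA / visitorsA;
  const pB = convB / visitorsB;
  const pPool = (convA + convB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / visitorsA + 1 / visitorsB));
  const z = (pB - pA) / se;
  return { z: z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation.
function normalCdf(x) {
  const t = 1 / (1 + 0.3275911 * (Math.abs(x) / Math.SQRT2));
  const erf =
    1 -
    (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t) *
      Math.exp(-(x * x) / 2);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Control: 500 / 10,000 (5.0%); Variant: 600 / 10,000 (6.0%)
const result = zTest(500, 10000, 600, 10000);
console.log(result.z.toFixed(2),
  result.pValue < 0.05 ? 'significant' : 'not significant');
```

For the example numbers above, the z-score is about 3.1 and the p-value is well under 0.05, so the variant's lift would clear the 95% bar.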
What to Do with Results
Variant wins: Implement the variant as the new default. Document what you learned and move on to the next test.
No clear winner: The change had no meaningful impact. This is still a win — you've learned that this element isn't the bottleneck.
Variant loses: The change hurt conversions. Revert immediately — now you know what not to change.
5 Mistakes That Invalidate A/B Tests
1. Stopping the test too early
The most common mistake. A variant that's 20% ahead after 100 visitors might be 5% behind after 5,000. Always run until your pre-determined sample size is reached.
2. Testing too many changes at once
If you change the headline, the image, and the button color simultaneously, you won't know which change drove the result. Test one meaningful change at a time.
3. Ignoring the novelty effect
Returning visitors often react to a change simply because it's new, not because it's better. Run tests long enough for the novelty to wear off and to capture multiple cohorts of returning visitors.
4. Ignoring segment differences
An experiment might win overall but lose for mobile users. Always segment results by device, traffic source, and user type before declaring a winner.
5. Running tests during unusual periods
Black Friday, product launches, PR spikes, and holidays distort traffic patterns. Never start a new test during these periods.
What Should You Test First?
Focus your first tests on pages with the highest leverage — where a small improvement has the biggest revenue impact.
High-Impact Pages
- Homepage hero section
- Pricing page layout
- Signup / checkout flow
- Product detail page
High-Impact Elements
- CTA button copy and color
- Headline and subheadline
- Social proof placement
- Form length and field order
Ready to run your first test?
ExperimentHQ has a free tier — 3 experiments, 50K monthly visitors. No credit card required.