Feature flags are the most underrated tool in a development team's arsenal. They decouple deployment from release, let you ship incomplete features safely, and turn every release into a controlled experiment. But they also create technical debt if managed poorly. This guide covers everything from the four types of flags to the cleanup strategies that keep your codebase healthy.
1. What Feature Flags Are (And Aren't)
A feature flag is a conditional check in your code that controls whether a feature is active. At its simplest:
if (featureFlags.isEnabled('new-checkout')) {
  renderNewCheckout()
} else {
  renderOldCheckout()
}

That's it. The power comes from what controls that boolean: a static config file, a remote service, a user attribute, a percentage rollout, or an A/B test assignment.
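To make the "what controls that boolean" point concrete, here is a minimal sketch of the simplest backing store, a static config object. The `createFeatureFlags` helper and the `default` option are assumptions made for illustration, not the API of any particular flag library.

```javascript
// Hypothetical in-memory flag client backed by a static config object.
function createFeatureFlags(config) {
  return {
    // Returns the configured value, or the caller's default when the flag is unknown.
    isEnabled(name, { default: fallback = false } = {}) {
      const known = Object.prototype.hasOwnProperty.call(config, name)
      return known ? Boolean(config[name]) : fallback
    },
  }
}

const featureFlags = createFeatureFlags({
  'new-checkout': true,
  'new-sidebar': false,
})

console.log(featureFlags.isEnabled('new-checkout')) // true
console.log(featureFlags.isEnabled('new-sidebar')) // false
console.log(featureFlags.isEnabled('unknown-flag')) // false (unknown flags default to off)
```

Swapping the config object for a remote lookup or a per-user rule changes where the boolean comes from, but the call site stays the same.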
Feature flags are:
Feature flags are NOT:
The best teams use flags aggressively for deployment safety and experimentation, but clean them up ruthlessly afterward.
2. Four Types of Feature Flags
Not all flags are equal. Understanding the four types helps you set appropriate lifespans and ownership:
Release Flags
Purpose: Hide incomplete features in production while developing in main.
Lifespan: Days to weeks. Remove as soon as the feature is fully launched.
Example: new-dashboard-v2 — shows the new dashboard to internal users while building it.
Experiment Flags
Purpose: Run A/B tests by assigning users to different code paths.
Lifespan: 2-8 weeks (duration of the experiment). Remove after the test concludes.
Example: pricing-page-v3-test — 50% of users see the new pricing layout.
Ops Flags (Kill Switches)
Purpose: Disable features that depend on external services or have known failure modes.
Lifespan: Long-lived. These stay in the codebase permanently as safety mechanisms.
Example: enable-third-party-chat — disable the chat widget if the provider has an outage.
Permission Flags
Purpose: Expose features to specific user segments (beta users, enterprise tier, internal team).
Lifespan: Medium to long. Stays active while the feature is restricted.
Example: advanced-analytics-beta — only available to users on the Pro plan.
Each type has different ownership, different cleanup urgency, and different monitoring needs. Treat them accordingly.
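One way to "treat them accordingly" is to record the type and owner when a flag is created and derive a review date from it. The sketch below is only an illustration: the `defineFlag` helper, the owner name, and the day limits are assumptions loosely based on the lifespans listed above.

```javascript
// Assumed lifespan limits in days per flag type; null means no fixed expiry.
const MAX_LIFESPAN_DAYS = {
  release: 30, // "days to weeks" (30 is an assumed ceiling)
  experiment: 56, // 8 weeks, the upper end of the range above
  ops: null, // kill switches stay in the codebase permanently
  permission: null, // lives as long as the feature is restricted
}

const DAY_MS = 24 * 60 * 60 * 1000

// Hypothetical helper: validates a flag definition and derives a review date.
function defineFlag({ name, type, owner, createdAt }) {
  if (!Object.prototype.hasOwnProperty.call(MAX_LIFESPAN_DAYS, type)) {
    throw new Error(`Unknown flag type "${type}" for flag "${name}"`)
  }
  const maxDays = MAX_LIFESPAN_DAYS[type]
  const reviewBy =
    maxDays === null ? null : new Date(createdAt.getTime() + maxDays * DAY_MS)
  return { name, type, owner, createdAt, reviewBy }
}

const dashboardFlag = defineFlag({
  name: 'new-dashboard-v2',
  type: 'release',
  owner: 'dashboard-team', // hypothetical owner
  createdAt: new Date('2024-01-01T00:00:00Z'),
})

console.log(dashboardFlag.reviewBy.toISOString()) // 2024-01-31T00:00:00.000Z
```

Requiring a type at creation time means nobody has to guess later whether a flag was meant to be temporary.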
3. Implementation Patterns
There are three main patterns for evaluating feature flags, each suited to different architectures:
Pattern 1: Server-side evaluation
// Server-side (Node.js / Next.js API route)
import { getFlags } from './feature-flags'

export async function handler(req, res) {
  const flags = await getFlags(req.user.id)
  // Optional chaining: a missing flag falls through to the legacy response
  if (flags['new-api-response']?.enabled) {
    return res.json(newApiResponse(req))
  }
  return res.json(legacyApiResponse(req))
}

Best for: API responses, server-rendered pages, sensitive logic. No exposure of flag config to the client.
Pattern 2: Client-side evaluation
// React component
function Dashboard() {
  const flags = useFeatureFlags()
  return (
    <div>
      <Header />
      {/* Optional chaining: a missing flag renders the old sidebar */}
      {flags['new-sidebar']?.enabled
        ? <NewSidebar />
        : <OldSidebar />
      }
      <MainContent />
    </div>
  )
}

Best for: UI variations, interactive features, non-sensitive toggles. Fast evaluation but flag config is visible to users.
Pattern 3: Edge evaluation
// Cloudflare Worker / Vercel Edge Middleware
// getUserId and evaluateFlag are application-defined helpers (not shown)
export default {
  async fetch(request, env) {
    const userId = getUserId(request)
    const variant = await evaluateFlag('new-landing', userId)
    if (variant === 'enabled') {
      return fetch(new URL('/landing-v2', request.url))
    }
    return fetch(request)
  }
}

Best for: CDN-level routing, landing page experiments, geographic targeting. Low-latency decisions at the edge, before the request reaches your origin.
4. Gradual Rollouts
Gradual rollouts are the killer use case for feature flags. Instead of shipping to 100% of users and praying, you ship to 1% and verify.
The standard rollout pattern:
The key to gradual rollouts: sticky assignment. A user who sees the new version at 1% must continue seeing it at 10% and 50%. Otherwise your metrics are garbage. This is typically done by hashing the user ID:
function isInRollout(userId, percentage) {
  // Deterministic: same user always gets same result.
  // murmurhash3 is assumed to come from a hashing library and to return a non-negative integer.
  const hash = murmurhash3(userId) % 100
  return hash < percentage
}
// At 1%: users with hash 0 see the feature
// At 10%: users with hash 0-9 see the feature
// At 50%: users with hash 0-49 see the feature
// Original 1% users are always included

ExperimentHQ handles sticky assignment automatically, including across sessions, devices (when logged in), and rollout percentage changes.
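If you are rolling your own, the hashing idea above can be sketched without any dependencies. This version swaps murmurhash3 for a small FNV-1a hash purely so the example runs as-is, and it salts the hash with the flag name, which is an added design choice rather than something the pattern requires. The `rolloutBucket` helper is an illustrative name.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units.
// A dependency-free stand-in for murmurhash3, not a drop-in equivalent.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Maps a user to a stable bucket from 0 to 99 for a given flag.
// Salting with the flag name means different flags bucket users differently,
// so the same users are not first in line for every rollout.
function rolloutBucket(flagName, userId) {
  return fnv1a(`${flagName}:${userId}`) % 100
}

function isInRollout(flagName, userId, percentage) {
  return rolloutBucket(flagName, userId) < percentage
}

const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`)
const at10 = users.filter((u) => isInRollout('new-checkout', u, 10))
const at50 = users.filter((u) => isInRollout('new-checkout', u, 50))

// Everyone included at 10% is still included at 50%.
console.log(at10.every((u) => at50.includes(u))) // true
```

Because a user's bucket never changes, raising the percentage only ever adds users. Lowering it removes the most recently added buckets first.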
5. Kill Switches
A kill switch is a feature flag that defaults to ON and can be turned OFF instantly. It's your emergency brake.
Every feature that depends on an external service should have one:
// Kill switch pattern
const chatEnabled = featureFlags.isEnabled('enable-live-chat', {
  default: true, // ON by default
})
if (chatEnabled) {
  loadLiveChatWidget()
}
// When Intercom goes down at 2am:
// Flip the flag to OFF in the dashboard
// No deploy needed. Instant. Global.

Kill switches should be clearly named: enable-stripe-checkout, not flag_47.

The cost of not having kill switches: a 2am PagerDuty alert, a frantic deploy that takes 15 minutes, and a potential rollback that affects unrelated changes. With a kill switch: one click, problem solved.
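A kill switch is only useful if reading it cannot itself take the feature down. Here is a minimal sketch of a lookup that falls back to the default when the flag source fails or has no value. `createKillSwitch` and the `Map` standing in for dashboard state are assumptions for illustration.

```javascript
// Wraps a flag lookup so that a failing or empty lookup returns the default.
// `lookup` is any function that returns true, false, or undefined for a flag name.
function createKillSwitch(lookup) {
  return function isEnabled(name, { default: fallback = true } = {}) {
    try {
      const value = lookup(name)
      return typeof value === 'boolean' ? value : fallback
    } catch (err) {
      // The flag source itself is unavailable: use the default rather than crash.
      return fallback
    }
  }
}

const overrides = new Map() // stands in for the state set in a remote dashboard
const isEnabled = createKillSwitch((name) => overrides.get(name))

console.log(isEnabled('enable-live-chat', { default: true })) // true (no override yet)

overrides.set('enable-live-chat', false) // "flip the flag to OFF"
console.log(isEnabled('enable-live-chat', { default: true })) // false
```

The default should be the state you want during a flag-service outage, which for most kill switches is ON.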
6. A/B Testing with Feature Flags
A/B testing is a feature flag with measurement. Instead of just toggling on/off, you assign users to variants and track which performs better.
The workflow:
// Experiment flag (multi-variant)
function Checkout({ userId }) {
  const variant = featureFlags.getVariant('checkout-experiment', userId)
  switch (variant) {
    case 'one-page':
      return <OnePageCheckout />
    case 'accordion':
      return <AccordionCheckout />
    default:
      return <OriginalCheckout />
  }
}
// Track conversion (called wherever the order completes)
featureFlags.track('purchase', { revenue: order.total })

The advantage of combining flags and experiments: you can start with a gradual rollout (is the feature stable?) and seamlessly transition to a measured experiment (is the feature better?), all with the same flag.
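Under the hood, variant assignment can use the same deterministic hashing as a percentage rollout, with the 0-99 bucket range split by weight. The following is a rough sketch under that assumption. The experiment shape, the weights, and the `getVariant` signature are illustrative and not taken from any specific SDK.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units (dependency-free stand-in for murmurhash3).
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Hypothetical experiment definition; weights are percentages that sum to 100.
const checkoutExperiment = {
  name: 'checkout-experiment',
  variants: [
    { name: 'control', weight: 34 },
    { name: 'one-page', weight: 33 },
    { name: 'accordion', weight: 33 },
  ],
}

// Walks the cumulative weights until the user's bucket falls inside a variant's range.
function getVariant(experiment, userId) {
  const bucket = fnv1a(`${experiment.name}:${userId}`) % 100
  let upperBound = 0
  for (const variant of experiment.variants) {
    upperBound += variant.weight
    if (bucket < upperBound) return variant.name
  }
  // Only reached if the weights sum to less than 100.
  return experiment.variants[0].name
}

const first = getVariant(checkoutExperiment, 'user-42')
const second = getVariant(checkoutExperiment, 'user-42')
console.log(first === second) // true: the same user always lands in the same variant
```

Keeping assignment deterministic means no assignment table has to be stored, although changing the weights mid-experiment will move some users between variants.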
7. Flag Cleanup Strategies
Feature flags are temporary by nature (except kill switches). But in practice, they accumulate like technical debt because removing them feels less urgent than adding new ones. This is a real problem — stale flags make code harder to understand, increase branch complexity, and can cause subtle bugs.
Here's how top teams manage flag hygiene:
A practical cleanup pattern:
// Before cleanup (flag at 100% for 3 weeks)
if (featureFlags.isEnabled('new-pricing-page')) {
  return <NewPricingPage /> // ← everyone sees this
} else {
  return <OldPricingPage /> // ← dead code
}

// After cleanup
return <NewPricingPage /> // Flag removed, old code deleted

Rule of thumb: if you have more than 20 active flags at any time, you're accumulating debt faster than you're paying it down. Treat flag cleanup with the same urgency as bug fixes.
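Flags like the one above, sitting at 100% for weeks, are easy to find automatically if you record when each flag's rollout last changed. This is a minimal sketch of that check. The record shape (`type`, `rollout`, `lastChangedAt`) and the 14-day threshold are assumptions, not a standard.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000

// Returns the names of non-ops flags that have been fully rolled out
// for longer than `maxDaysAtFull` days.
function findCleanupCandidates(flags, now, maxDaysAtFull = 14) {
  return flags
    .filter((flag) => flag.type !== 'ops') // kill switches are meant to stay
    .filter((flag) => flag.rollout === 100)
    .filter((flag) => (now - flag.lastChangedAt) / DAY_MS > maxDaysAtFull)
    .map((flag) => flag.name)
}

// Hypothetical flag records for illustration.
const now = new Date('2024-03-22T00:00:00Z')
const flags = [
  { name: 'new-pricing-page', type: 'release', rollout: 100, lastChangedAt: new Date('2024-03-01T00:00:00Z') },
  { name: 'new-sidebar', type: 'release', rollout: 10, lastChangedAt: new Date('2024-03-01T00:00:00Z') },
  { name: 'enable-live-chat', type: 'ops', rollout: 100, lastChangedAt: new Date('2023-01-01T00:00:00Z') },
]

console.log(findCleanupCandidates(flags, now)) // [ 'new-pricing-page' ]
```

Running a check like this in CI or on a schedule turns cleanup from something people have to remember into a visible list.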
8. Feature Flags with ExperimentHQ
ExperimentHQ combines feature flags and A/B testing in a single platform, so you don't need separate tools for deployment safety and experimentation:
The typical workflow: create a release flag for safe deployment, convert it to an experiment flag for measurement, ship the winner, remove the flag. All in one tool, one dashboard, one line of code.