
Yes, but it’s tricky. Testing multiple changes at once can confuse results, making it hard to know which change worked. This is why A/B testing typically focuses on one change at a time for clarity. If you want to test multiple elements together, multivariate testing is a better option, but it requires more traffic and careful planning.
Key Points:
- A/B Testing: Compare two versions to find the better one.
- Multivariate Testing: Test multiple elements at once to see how they interact.
- Challenges: Testing too many changes risks unclear results and false positives.
- Best Practices: Start with one change, then scale up to multivariate testing for high-traffic sites.
Quick Comparison:
- A/B testing: two versions, one change at a time, works with modest traffic, and is simple to set up.
- Multivariate testing: many element combinations tested at once, needs far more traffic, and shows how elements interact.
If you're short on traffic, stick to A/B testing with bold changes. For high-traffic websites, multivariate testing can uncover deeper insights. Always plan your tests carefully to get clear, actionable results.
Can You Test More Than One Change at a Time?
Yes, you can test multiple changes at once, but it’s not as straightforward as it sounds. While it might seem like a time-saving approach, testing several changes simultaneously can lead to confusing results and is generally discouraged by experts.
Why Testing Multiple Changes Is Tricky
When you test multiple elements at the same time, it becomes difficult to figure out which specific change led to the observed outcome. This lack of clarity makes it hard to learn from the test and apply those lessons to future optimization efforts.
"When testing multiple changes on a single page, it can sometimes become difficult to keep track of which change is performing well and which isn't to be able to streamline your optimization efforts accordingly. Especially in a losing campaign, it becomes difficult to understand whether the entire test was flawed, or was it just one change that impacted the overall performance of the test."
Testing multiple elements also increases the likelihood of false positives. For example, if you’re using a standard significance level of 0.05 and testing 20 scenarios, at least one outcome is likely to appear significant purely by chance. Google’s infamous experiment with 41 shades of blue highlighted this issue - at a 95% confidence level, the risk of a false positive reached 88% [2].
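To make that math concrete, here is a minimal sketch (plain Python, assuming the comparisons are independent and treating the 41-shades experiment as roughly 40 non-control comparisons) that computes the chance of at least one false positive:

```python
# Family-wise error rate: probability that at least one of k independent
# comparisons looks significant purely by chance when each uses `alpha`.
def family_wise_error_rate(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

print(f"20 comparisons: {family_wise_error_rate(20):.0%}")  # ~64%
print(f"40 comparisons: {family_wise_error_rate(40):.0%}")  # ~87%, in line with the ~88% figure cited above
```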
Here’s what can go wrong when you test multiple changes at once:
- Murky analysis: Was it the headline tweak, the button color change, or the page redesign that drove the result? Without clarity, replicating success becomes a guessing game.
- Conflicting effects: One change might improve conversions while another has the opposite effect. The overall result could look neutral, masking what might have been a winning element on its own.
- Sample contamination: Longer testing durations - common when multiple variations are involved - can lead to issues like cookie deletion or shifts in user behavior over time, skewing your data [2].
"Each additional variation and goal adds a new combination of individual statistics for online experiments comparisons to an experiment. In a scenario where there are four variations and four goals, that's 16 potential outcomes that need to be controlled for separately."
- Optimizely Practical Guide to Stats [2]
When Multivariate Testing Makes Sense
Despite the challenges, there are times when multivariate testing - a method specifically designed for testing multiple elements - can be the right choice. This approach is particularly useful when you want to understand how different components of a page work together.
Multivariate testing is ideal for fine-tuning an already successful landing page, especially if it has a high conversion rate (over 10%) and you’re seeking incremental improvements [3].
"Multivariate testing can be useful to understand the impact that different parts of a user experience have on conversion - however conversion is defined in a specific context. We call this element contribution. This can be informative to do before a larger scale A/B test or a website redesign, so you can have a clearer understanding of how important certain elements are to success."
- Jake Sapirstein, Head of Strategy at LiftCentro [4]
It works best in controlled settings, like landing pages, checkout processes, or other defined user flows. However, keep in mind that multivariate testing requires significantly more traffic than A/B testing. Testing multiple combinations - often between 8 and 25 - means each variation needs enough visitors to reach statistical significance [3].
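To see why the traffic requirement balloons, the rough sketch below (hypothetical element counts and traffic, full-factorial design assumed) multiplies out the combinations and splits monthly visitors among them:

```python
# Full-factorial multivariate test: every variant of every element is combined.
headline_variants = 2
hero_image_variants = 3
cta_variants = 4  # hypothetical element counts

combinations = headline_variants * hero_image_variants * cta_variants  # 24 page versions

monthly_visitors = 120_000  # hypothetical traffic
per_combination = monthly_visitors // combinations
print(f"{combinations} combinations, ~{per_combination:,} visitors each per month")
# 24 combinations, ~5,000 visitors each - often too few to reach significance on a low base rate.
```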
"Instead of testing each variable one by one with a traditional A/B test, you can see results on multiple variables faster. The time to build the test will be longer. However, once you determine the combination that has the highest conversion rate, you can implement and reap the rewards."
For businesses using PIMMS, the platform’s A/B testing tools allow you to distribute traffic across 2–4 different landing pages with custom weights. This ensures clear attribution, so you know exactly which page performs better. If your monthly traffic is under 100,000 unique visitors, stick to A/B tests that focus on single elements. Save multivariate testing for when you have high traffic volumes and want to understand how various elements interact.
How Many Variants Can You Run in a Single Test?
You can run 3, 4, or even more variants in a single test. However, keep in mind that every additional variant increases the traffic and time required to achieve statistically reliable results. This approach, called A/B/n testing, can be highly effective when executed thoughtfully, but it demands careful planning to avoid pitfalls.
Balancing Traffic and Variants
The relationship between the number of variants and the traffic required isn’t straightforward - it grows rapidly. For example, if your website has a 5% conversion rate and you’re aiming for a 10% improvement, moving from 2 to 3 variants increases the traffic needed to reach 95% statistical significance by about 50%, and adding a fourth variant pushes it up by another 34%. Beyond this, there’s the challenge of the multiple comparisons problem, which increases the likelihood of false positives.
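The exact visitor counts depend on the calculator you use, but a standard two-proportion power calculation (a sketch assuming 80% power and a two-sided test at alpha = 0.05) reproduces the pattern: the per-variant sample size stays roughly constant, so total traffic grows almost in lockstep with the number of arms.

```python
from scipy.stats import norm

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-proportion z-test (normal approximation)."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

n = visitors_per_variant(0.05, 0.10)  # 5% baseline, 10% relative improvement
for arms in (2, 3, 4):
    print(f"{arms} variants: ~{arms * n:,.0f} total visitors")
# Moving from 2 to 3 arms adds ~50% more traffic, and 3 to 4 adds ~33%
# (slightly more in practice once multiple-comparison corrections are applied).
```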
"With more variations, you will need more visitors and conversions to prove a winner." - Ton Wesseling, founder of Online Dialogue [6]
"The greater the difference in appearance, the faster you detect significant performance differences." - Idan Michaeli, Dynamic Yield [6]
Your site’s traffic volume plays a key role in deciding how many variants to test. If you have low traffic, it’s better to test fewer variations with noticeable differences. On the other hand, high-traffic sites can afford to experiment with more ambitious setups. If your goal is to gain deep insights into user behavior, fewer variants may yield clearer results. But if you’re chasing a quick win and have the traffic to support it, testing more variations could be a worthwhile gamble [6][1].
Best Practices for A/B/n Testing
Experts have different opinions on the ideal number of variants. Some recommend limiting tests to five variations (including the control), while others argue for testing multiple options to maximize discovery:
"I don't think it's possible to give a general answer. The specific test setup depends on a number of factors (see below). From my personal experience and opinion, I would never test more than five variations (including control) at the same time." - Dr. Julia Engelmann, Head of Data & Analytics at Web Arts/konversionsKRAFT [6]
"The fewer options, the less valuable the test. Anything with less than four variants is a no go as far as I am concerned for our program, because of the limited chances of discovery and success." - Andrew Anderson, Director of Optimization at Recovery Brands [6]
Your testing strategy should consider your available traffic, the resources needed to create variations, and your learning objectives. Platforms like PIMMS allow traffic distribution across 2–4 landing pages with custom weights, making it easier to maintain clear attribution.
When testing multiple variants, account for the increased error rate by applying cumulative alpha corrections. The formula is Cumulative alpha = 1 - (1 - Alpha)^k, where Alpha is typically 0.05, and k is the number of variants [6]. Fortunately, most modern testing tools handle these adjustments automatically.
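If your tool does not apply a correction for you, the formula is easy to use by hand. The sketch below treats k as the number of variant-versus-control comparisons and also inverts the formula (a Sidak-style correction, one reasonable choice among several) to find the stricter per-comparison alpha that keeps the overall risk at 5%:

```python
def cumulative_alpha(alpha: float, k: int) -> float:
    """Overall false-positive risk when k comparisons each use `alpha`."""
    return 1 - (1 - alpha) ** k

def sidak_corrected_alpha(target: float, k: int) -> float:
    """Per-comparison alpha that keeps the cumulative risk at `target`."""
    return 1 - (1 - target) ** (1 / k)

k = 3  # e.g. three variants each compared against the control
print(f"Uncorrected cumulative alpha: {cumulative_alpha(0.05, k):.3f}")   # ~0.143
print(f"Per-comparison alpha needed:  {sidak_corrected_alpha(0.05, k):.4f}")  # ~0.0170
```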
"The main point is not to get too hung up on which [correction] approach, just that it is done." - Matt Gershoff, CEO of Conductrics [6]
Start with a conservative number of variants and focus on bold changes rather than small tweaks. This increases the chances of identifying meaningful differences, especially if traffic is limited [7]. On average, roughly one in seven A/B tests produces a winning result [5].
"It comes down to how bold you are and how quickly you want results." - Idan Michaeli, Dynamic Yield [6]
What's a Good Conversion Lift from A/B Testing?
When it comes to interpreting conversion lifts, it's important to align results with realistic goals. A common question is: "What kind of improvement should I expect?" While there’s no universal answer, understanding typical benchmarks can help you set achievable expectations and build a solid, sustainable testing strategy.
Typical Success Metrics
Only about 20% of A/B tests reach the 95% significance threshold. Among these, the average lift is an impressive 61%. However, when you factor in all tests - successful or not - the average lift drops to around 4% [9][13]. Statistically, just 1 in every 7.5 tests delivers a significant improvement [9].
Even small gains can lead to huge business impacts. For instance, a minor tweak to Expedia's checkout form resulted in an additional $12 million in profits [11]. Across industries, specific changes have led to notable improvements:
- Simplifying landing page copy in SaaS can double conversion rates.
- Streamlined language on B2B pages has delivered up to a 50% lift.
- Incorporating user-generated content has driven conversion increases as high as 161% [9].
"Revenue per user is particularly useful for testing different pricing strategies or upsell offers. It's not always feasible to directly measure revenue, especially for B2B experimentation, where you don't necessarily know the LTV of a customer for a long time." - Alex Birkett, Co-founder, Omniscient Digital [10]
These metrics provide a framework for understanding results and recognizing why unusually large gains often require a closer look.
Understanding Large Results
While big conversion lifts - over 50% - might seem exciting, they often indicate issues with the test rather than exceptional success. Here’s why:
- External factors, like holidays, weather, or media coverage, can create temporary spikes that don’t last.
- The novelty effect might generate short-term enthusiasm that fades as users adjust to the change.
- Technical fixes, such as faster load times or better mobile usability, may reflect repairs to a broken baseline rather than genuine optimization.
For example, Dell once reported a 300% increase in conversion rates through A/B testing [8]. While impressive on the surface, such a dramatic figure likely pointed to problems with the original version rather than the new variation being groundbreaking.
To ensure reliable outcomes, tests should run for at least 3–4 weeks to account for weekly seasonality. Additionally, aim for each variation to achieve between 250 and 400 conversions before drawing any conclusions [12]. Taking this measured approach helps confirm that observed improvements are sustainable and not just artifacts of external influences.
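A quick back-of-the-envelope check (hypothetical traffic and conversion figures) shows how long it takes each variation to collect that many conversions, which is often what stretches tests out to several weeks:

```python
# How many days until each variation collects enough conversions?
daily_visitors = 1_000      # hypothetical site traffic
variations = 2              # control plus one challenger
conversion_rate = 0.02      # 2% baseline

daily_conversions_per_variation = daily_visitors / variations * conversion_rate  # 10 per day
for target in (250, 400):
    days = target / daily_conversions_per_variation
    print(f"{target} conversions per variation: ~{days:.0f} days (~{days / 7:.1f} weeks)")
# At this traffic level, 250 conversions takes ~25 days and 400 takes ~40 days,
# so the conversion target, not the 3-4 week floor, is what sets the test length.
```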
"At least 80% of winning tests are completely worthless." - Martin Goodson, Qubit Research Lead [12]
If you’re using tools like PIMMS for A/B testing, it’s essential to track both immediate results and long-term performance. Built-in analytics can help determine whether observed gains are genuine optimizations or temporary anomalies. This ensures your testing program delivers meaningful, lasting results.
How Much Traffic Do You Need for A/B Testing?
A common question businesses face is, "How much traffic is necessary for A/B testing?" While there's no one-size-fits-all answer, understanding the basics can help you design more effective tests.
"When it comes to A/B testing, figuring out the right sample size is a big deal. It's about ensuring valid and reliable results." - The Statsig Team [14]
The amount of traffic you need depends on the size of the change you're testing. If you're aiming for small improvements, you'll need significantly more traffic to detect those changes. On the other hand, testing for larger, more noticeable changes requires less traffic to reach reliable conclusions.
Basic Sample Size Guidelines
There's no universal traffic requirement, but some general benchmarks can guide your planning. For example, if your current conversion rate is 30% and you're expecting a substantial increase of over 20%, you'll need around 1,000 visits per variation to achieve statistically significant results [15].
However, as your baseline conversion rate drops or the expected improvement shrinks, sample size requirements grow quickly. For instance:
- A 5% conversion rate with a 20% expected increase needs about 7,500 visits per variation.
- A 2% conversion rate with only a 5% expected improvement demands nearly 310,000 visits per variation [15].
This explains why low-traffic pages often struggle with A/B testing. If your site gets fewer than 1,000 unique visitors per week or fewer than 5-10 conversions weekly, traditional A/B testing may not yield reliable results.
Several factors influence how much traffic you’ll need:
- Baseline conversion rate: Higher rates require fewer visits per variation, while lower rates demand more.
- Minimum detectable effect: Smaller changes are harder to detect and need more traffic.
- Statistical power and significance level: These settings, often 0.80 and 0.05 respectively, also impact sample size requirements [14].
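One way to run these numbers yourself is with a power calculation, for example via statsmodels (a sketch; the exact figures depend on assumptions such as one-sided versus two-sided tests and the chosen power, so they will not match every calculator to the digit):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def visits_per_variation(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variation to detect a relative lift over the baseline rate."""
    effect = proportion_effectsize(baseline * (1 + relative_lift), baseline)
    return NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                        power=power, alternative="two-sided")

print(round(visits_per_variation(0.30, 0.20)))  # ~1,000 visits per variation
print(round(visits_per_variation(0.05, 0.20)))  # ~8,000 (the cited ~7,500 reflects slightly different assumptions)
print(round(visits_per_variation(0.02, 0.05)))  # ~315,000 (close to the cited ~310,000)
```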
For businesses with limited traffic, focus on testing bold, impactful changes rather than minor tweaks. Big changes are more likely to produce noticeable results, helping you draw conclusions faster. Additionally, consider testing micro-conversions like email signups or "Add to Cart" clicks, which tend to have higher conversion rates than final purchases [16].
Using a Sample Size Calculator
Sample size calculators can help you determine the exact traffic you need based on your baseline conversion rate, the smallest change you want to detect, and your desired confidence levels [14].
When using these tools, keep your expectations realistic. Remember, only one in seven A/B tests results in a winning outcome [5]. Setting conservative goals for improvement ensures you gather enough data for meaningful results.
Many A/B testing platforms include built-in tools to estimate the traffic or time needed for your tests [15]. For instance, if you’re using PIMMS, these calculations are integrated into the workflow, making it easier to plan timelines and allocate traffic across variations. Accurate sample size estimates not only guide traffic needs but also help determine how long your tests should run.
Most tests should last between 2 and 6 weeks, depending on your traffic and the size of the effect you’re measuring [14]. This timeframe accounts for weekly patterns and avoids drawing conclusions based on short-term fluctuations. Resist the urge to check results too early - it can lead to false insights and unreliable data [14].
To improve your A/B testing outcomes, focus on site-wide changes and straightforward A/B tests, especially if your site has low traffic. These approaches are more likely to deliver actionable results compared to page-specific or multivariate tests.
A/B Testing vs. Multivariate Testing: Key Differences
Building on earlier discussions about testing strategies, let’s dive into how A/B testing and multivariate testing differ in their scope and application.
When deciding on a testing method for your website, you’ll likely encounter these two main approaches. Both aim to improve your site’s performance, but they operate differently and are suited for distinct purposes.
A/B testing evaluates two full-page designs to determine which performs better, while multivariate testing analyzes multiple elements simultaneously to find the best combination [3][19]. Essentially, A/B testing focuses on the big picture (global changes), whereas multivariate testing hones in on smaller details (local changes) [3].
“Think of A/B and multivariate testing as complementary optimization methods” [20].
Comparison of Methods
Choosing between these methods depends on your goals, available traffic, and resources. Here’s a breakdown of their key differences:
- Traffic required: A/B testing works with modest traffic; multivariate testing needs far more to cover every combination.
- Setup complexity: A/B tests are quick to build and launch; multivariate tests require more planning and advanced tooling.
- Insights gained: A/B testing tells you which overall design wins; multivariate testing shows how individual elements interact.
- Scale of changes: A/B testing suits drastically different designs; multivariate testing suits subtle tweaks to a proven page.
Choosing the Right Approach
Traffic levels are a critical factor when selecting a testing method. A/B testing is ideal for situations with limited traffic, as it splits visitors between just two versions. Multivariate testing, on the other hand, requires significantly more traffic to evaluate the numerous combinations being tested [3][18].
Ease of setup is another key difference. A/B testing is relatively simple to implement, making it a quick way to identify a winning version. Multivariate testing, however, involves more complexity, as it requires advanced tools to analyze how different elements interact [19].
The insights you gain also vary. A/B testing is best for determining which overall page design works better, while multivariate testing digs deeper, showing how individual elements perform in combination [3]. Additionally, the scale of changes matters: A/B testing shines when comparing drastically different designs, whereas multivariate testing is perfect for assessing subtle tweaks [3].
Combining Both Methods
Many successful strategies use a combination of A/B and multivariate testing. Start with A/B testing to identify the best overall design or layout. Once you’ve nailed that, move on to multivariate testing to fine-tune individual elements - assuming your site has enough traffic to support it [3].
For example, tools like PIMMS allow you to split traffic across 2–4 links with custom weights, making it a great option for quick A/B testing when traffic is limited. By understanding the strengths of each method, you can create a testing strategy that balances speed, depth, and available resources effectively.
Best Practices for A/B Testing Success
To make the most out of your A/B testing efforts, it's important to stick to proven methods that yield reliable insights and actionable results. Let’s dive into the key practices that can set you up for success.
Key Takeaways
Start with a solid hypothesis and clear success metrics. A strong hypothesis should be rooted in research and follow an "if-then" structure. For example, "If we change the call-to-action button color to blue, then we expect a 20–30% increase in clicks." Knowing your target metrics, whether they involve major conversion boosts or smaller incremental improvements, ensures you can accurately assess the test outcomes [5].
Test one element at a time, unless you’re running multivariate tests. This approach makes it easier to pinpoint which change influenced the results. If your traffic volume supports multivariate testing, you can explore multiple variables simultaneously without losing clarity [22].
Determine your sample size in advance and stick to it. Planning your sample size upfront prevents premature conclusions. For example, AAA conducted 450 real-time A/B tests over 18 months and saw a 45% increase in online memberships through disciplined testing, which included proper sample size calculations [21].
Run tests for at least one full business cycle. A business cycle - often a week - captures natural fluctuations in user behavior. Ending tests too early can lead to skewed results or false conclusions [22].
Document everything systematically. Use a consistent template to record your hypothesis, test setup, duration, results, and observations. This practice not only helps with future analysis but also keeps your team aligned [22].
By following these steps, you’ll create a strong foundation for continuous testing and improvement.
Building a Testing Mindset
Once you’ve mastered the basics, it’s time to embrace a mindset that prioritizes ongoing experimentation and learning.
Focus on the process, not just the results. Even tests that don’t yield the desired outcome can offer valuable lessons. For instance, Save the Children’s shift to digital fundraising led to an 85% increase in conversions and a 25% boost in revenue per visitor. Their success - raising £1.5 million for Ukraine in just two weeks - was the result of refining earlier tests, even those that initially fell short [21].
Share insights across your team. A/B testing isn’t just about the numbers - it’s about driving smarter decisions. DocuSign improved mobile conversion rates by 35% by streamlining its sign-up process, and HP generated $21 million in additional revenue through nearly 500 experiments. Both companies succeeded because they ensured test insights were shared and applied across their teams [23].
Expand your testing program strategically. Start with high-impact areas, like key pages in your conversion funnel, and gradually expand as you gain expertise. Tools like PIMMS make it easier to manage traffic distribution across multiple variants.
Take True Botanicals, for example. By systematically testing social proof elements, they achieved a 4.9% site-wide conversion rate and an estimated ROI increase of over $2 million. Embedding experimentation into your workflows can turn A/B testing into a powerful competitive edge [23].
FAQs
When should I choose A/B testing over multivariate testing for my website?
When deciding between A/B testing and multivariate testing, it all comes down to your website's traffic and the complexity of the changes you're evaluating.
A/B testing is a go-to option for websites with lower traffic or when you're focused on comparing two versions of a single element - like a headline or a call-to-action button. It’s simple to set up and doesn’t need as much traffic or time to produce reliable results.
On the flip side, multivariate testing is better suited for high-traffic sites where you want to test multiple elements at once and understand how they interact. This method provides a deeper look into how various changes work together, but it does require a larger audience and more time to gather meaningful data. If you're trying to figure out the combined impact of several changes on user behavior, multivariate testing is the way to go.
What should I watch out for when testing multiple changes or variants in an A/B test?
When running A/B tests with multiple changes or variants, there are some common mistakes you’ll want to steer clear of to ensure your results are both reliable and actionable.
- Making too many changes at once: If you test several elements in a single experiment, it becomes nearly impossible to determine which specific change caused the impact you’re seeing. Keep it focused.
- Skipping a clear hypothesis: Without a well-defined hypothesis, you might end up tracking metrics that don’t align with your goals, which can lead to confusion and wasted effort.
- Not having enough traffic or sample size: Testing too many variations without sufficient traffic can slow down your results or leave you with inconclusive data. Make sure your sample size is large enough to reach statistical significance.
- Running overlapping tests: If you’re testing multiple things on the same audience at the same time, you risk splitting your traffic too thin and distorting your outcomes. This makes it tough to draw accurate conclusions.
By avoiding these pitfalls, you can design more effective A/B tests and make smarter, data-backed decisions.
How can I calculate the right sample size and traffic needed for reliable A/B test results?
To get trustworthy A/B test results, you'll need a sample size that's big enough to reveal noticeable differences. A good rule is to aim for at least 1,000 visitors per variant, though the exact number can vary. It depends on things like your current conversion rate, the size of the change you're testing for, and the confidence level you want to achieve.
For instance, if your conversion rate is 20% and you're looking to spot a small 2% improvement, you'll need a larger sample size than if you were testing for a more significant change. Tools like sample size calculators can help you figure out the traffic you'll need based on your specific scenario.
It's also important to let your tests run for at least 1–2 weeks. This allows you to capture variations in user behavior, such as differences between weekdays and weekends. By doing this, you can ensure your results are reliable and actionable.