If you’re not A/B testing, you’re missing out. If you still don’t have methods for introducing variations and tracking the result on subsequent conversions, this article isn’t for you. I’m talking to the people who already A/B test, and especially those who A/B test a lot. If you subscribe to WhichTestWon.com, you’re the one I’m talking to. Cluster testing is the next step in testing and CRO for those who are seeing diminishing returns from standard methods of CRO.
What is cluster testing?
A/B testing is great, but if you’re not cluster testing you’re missing out. Cluster testing is the idea that a positive change in one section of a webpage/email/form/etc. might be good on its own, but could be better if it is accompanied by a change in another part. In the graphic below, if we are trying to optimize the newsletter signup flow, there are 3 obvious sections that jump out for testing: the copy of the header, the copy of the sub-header, and the button copy.
I might be interested in testing the difference between “Signup” and “Subscribe” for the button CTA, but if I test the button copy for “Signup” vs. “Subscribe” and the header copy for “Signup” vs. “Subscribe” independently of one another, I may be missing out on the potential lift of making both changes concurrently. Having the header match the button CTA actually turns out to be more effective at producing conversion lift than simply picking the right word. In this particular test, switching either the header or the button CTA from “Subscribe” (the original wording) to “Signup” resulted in a ~1.5% lift in conversions. It would be logical, therefore, to assume that the combination of the two would result in a lift somewhat less than (but close to) 3%. In practice, however, both being “Signup” resulted in a ~4.5% lift: the combined gain was more than the sum of the individual gains.
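To put numbers on that reasoning, here’s a minimal Python sketch using the approximate lifts from this test; the arithmetic is the only thing it demonstrates:

```python
# Approximate numbers from the test described above
header_lift = 0.015    # ~1.5% lift from switching only the header to "Signup"
button_lift = 0.015    # ~1.5% lift from switching only the button CTA

naive_sum = header_lift + button_lift   # the "close to 3%" expectation
observed_together = 0.045               # ~4.5% lift with both changed

interaction = observed_together - naive_sum
print(f"Expected if independent: {naive_sum:.1%}")
print(f"Observed together:       {observed_together:.1%}")
print(f"Interaction surplus:     {interaction:.1%}")
# The ~1.5-point surplus is the interaction effect that two
# independent A/B tests can never measure.
```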
What does this mean?
Potential customers respond to your overall message, not just parts of it. Thus, changes to copy that aren’t made in isolation from other changes can often produce a lift or drop that isn’t the sum of the parts you’d measure if you tested each change independently. Think about it: if you were trying to find the best color for text in a header that was 15 characters long, would changing the color of one letter at a time and running 15 different tests yield the same result as combining all those letter-color changes into one test and changing the whole header color? That’s extremely unlikely.
So, given that it makes sense to run tests in clusters (changing the entire background color, not one pixel at a time), why do we so often run tests that are independent of other content changes? Because statistical significance. The reality is that each possible combination of factors needs to reach statistical significance all on its own, each additional combination dilutes the size of each test group, and each added variation multiplies the number of test groups you need.
For example, just doing a header/copy cluster analysis with 4 variations of each (A/B/C/D) would require you to hit statistical significance across 16 clusters. Running the header and copy A/B/C/D tests separately would require reaching statistical significance only 8 times; so, assuming a similar flow of leads and a comparable conversion rate, it would take over twice as long to run even a simple test as a cluster analysis as opposed to standard separate testing.
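A quick back-of-the-envelope sketch shows why the traffic cost grows so fast; the per-cell visitor requirement here is a made-up placeholder, not a universal rule:

```python
from math import prod

# Hypothetical: suppose each test cell needs ~10,000 visitors to reach
# statistical significance at your baseline conversion rate.
VISITORS_PER_CELL = 10_000  # illustrative number only

variations_per_element = [4, 4]  # A/B/C/D for each of two elements

separate_cells = sum(variations_per_element)   # 4 + 4 = 8 cells, run as two tests
cluster_cells = prod(variations_per_element)   # 4 * 4 = 16 cells in one factorial test

print(f"Separate: {separate_cells} cells, "
      f"{separate_cells * VISITORS_PER_CELL:,} visitors total")
print(f"Cluster:  {cluster_cells} cells, "
      f"{cluster_cells * VISITORS_PER_CELL:,} visitors total")
# At the same traffic rate, the full-factorial cluster test needs
# twice the traffic, and therefore takes twice as long.
```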
Why bother then?
Cluster testing can show interesting results that will never appear in separated testing. Cluster testing shows you what happens when you make changes in the wild, while separated testing is more like reproducing results in a laboratory devoid of real-world interactions. Cluster testing will ultimately more accurately reflect the bottom-line, long-term effects of your changes. It will often point in the same direction that separated testing does, but not always. Sometimes marketers may find that making changes in pairs or groups produces a positive result, while making the same changes individually didn’t produce as positive a result (or even produced a negative change).
For example, in the above header/CTA copy test, just A/B testing separately would show that the best results would be had with the header copy “Newsletter Signup” and the button CTA “Signup”: when you run the header “Newsletter Signup” against the 3 possible button CTAs, the winner is “Signup”, and when you run the button CTA “Signup” against the header options, the header copy that performs best is “Newsletter Signup”. But, as you can see, users were responding to the disconnect between the wording “subscribe” and “signup”, and running a 9-cluster test actually showed that. The best performers were the ones where the wording matched between the two, and the combination of changing both the header and the button CTA together produced a conversion rate of 1.69%, versus the best you could get by separate testing at 1.48%.
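Here’s a hedged sketch of how you might read a 3×3 cluster test; only the 1.69% winner and the 1.48% separate-testing result come from the test above, and every other cell (and the third header/button option) is hypothetical filler:

```python
# Conversion rate for each (header, button) cell of a 3x3 cluster test.
# Only 0.0169 and 0.0148 are from the article; the rest are placeholders.
results = {
    ("Newsletter Signup",    "Signup"):    0.0169,  # matched wording: winner
    ("Newsletter Signup",    "Subscribe"): 0.0148,  # best via separate testing
    ("Newsletter Signup",    "Join"):      0.0131,
    ("Newsletter Subscribe", "Signup"):    0.0129,  # mismatched wording
    ("Newsletter Subscribe", "Subscribe"): 0.0152,  # matched wording again
    ("Newsletter Subscribe", "Join"):      0.0125,
    ("Our Newsletter",       "Signup"):    0.0138,
    ("Our Newsletter",       "Subscribe"): 0.0136,
    ("Our Newsletter",       "Join"):      0.0122,
}

best_combo = max(results, key=results.get)
print(f"Best cluster: {best_combo} at {results[best_combo]:.2%}")
# Ranking the full grid, rather than each element's marginal winner,
# is what surfaces the matched-wording interaction.
```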
When to do cluster testing?
While it would be amazing to run all your A/B tests through a full cluster analysis, unless you have a near-infinite sample group, it is impossible. The number of clusters needed grows exponentially with the number of elements you are testing (multiply together the variation counts of every element), so even the most trafficked site will at best be able to cluster test 4 or 5 elements together.
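A minimal sketch of that growth, assuming three variations per element:

```python
variations = 3
for elements in range(1, 6):
    print(f"{elements} element(s) -> {variations ** elements} clusters")
# 3, 9, 27, 81, 243 cells, each of which must reach
# statistical significance on its own.
```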
Separate testing can actually give you clues about what to test. If standard separate testing on text color shows high levels of variation in results, while changes to font show lower levels, then color would be the factor to choose over font when picking your clusters, as in the sketch below. Look at some of the bigger factors affecting conversion and begin introducing those into a cluster testing group.
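One way to apply that heuristic is to rank elements by the spread of results seen in past separate tests; in this sketch the element names and rates are all hypothetical:

```python
# Hypothetical conversion rates observed across past separate tests,
# keyed by page element. Wider spread = more sensitivity = better
# candidate for a cluster test.
past_results = {
    "text color":  [0.0121, 0.0148, 0.0163],  # high variation
    "font":        [0.0139, 0.0141, 0.0143],  # low variation
    "header copy": [0.0125, 0.0155, 0.0169],
    "button copy": [0.0130, 0.0150, 0.0160],
}

def spread(rates):
    return max(rates) - min(rates)

ranked = sorted(past_results, key=lambda e: spread(past_results[e]), reverse=True)
print("Cluster candidates, most sensitive first:", ranked)
# Take the top 2-4 elements to keep the cluster count manageable.
```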
Ultimately, it’s an art backed up by science. There isn’t one “true” way of doing cluster analysis that is going to get you the highest conversion rate. Knowledge of the industry you’re testing for helps, statistical intuition helps, and there is no replacement for good old-fashioned experience. Cluster analysis isn’t for everyone, but if you’ve maximized the results from standard separated testing, it may be the next step to bump up your conversion rate even further.