Be data driven, but don’t be fooled by statistics!
Over and over again, I read posts by SEM professionals telling their readers to optimize their campaigns by analyzing the data, including data from A/B testing. And I absolutely agree! Marketing instinct is good, but you need to look at the data and see what is actually happening.
The problem, though, is that it is too easy to make decisions based on too little data, that is, on results that are not statistically significant. The data accurately shows what did happen, but what we want is to use it to predict the future, and for that we need results that are highly likely to be predictive. For PPC optimization, that means seeing enough clicks and conversions.
Let me give an example. If I told you that one test showed a 67% conversion rate and another showed 50%, you might say the difference is huge and go with the higher-converting setup, reasoning that the higher rate is likely to continue in the future. But let's look more closely at the data. Suppose those results came from flipping coins: one coin showed heads in 2 of 3 flips (67%) and the other in 2 of 4 flips (50%). Clearly there are not enough flips to say whether either coin is fair or rigged (and with 3 flips, exactly 50% heads is not even a possible outcome). The data could easily have come from two fair coins, and it would be a mistake to predict that the coin showing heads in 2 of 3 flips will continue to favor heads. Most likely, continued flips will tend closer to 50%-50% (what statisticians call regression to the mean).
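As a quick sanity check on the coin example, the binomial formula tells us how likely a perfectly fair coin is to produce each of those outcomes. A minimal sketch in Python (the function name `binom_pmf` is just my label for the standard binomial probability):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips of a coin with heads-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance a FAIR coin shows heads in exactly 2 of 3 flips (the "67%" result)
print(binom_pmf(2, 3))  # 0.375
# Chance a FAIR coin shows heads in exactly 2 of 4 flips (the "50%" result)
print(binom_pmf(2, 4))  # 0.375
```

A fair coin produces each of those "dramatically different" results more than a third of the time, which is exactly why so few flips tell us nothing about bias.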
The same is true for analysis of paid-search campaigns. Let’s look a little closer at some examples from campaigns we have worked on.
In the first example, our e-commerce customer wanted to test a new landing page for an under-performing ad group. For the previous 30-day period, the ad group had 100 clicks and 2 conversions, a conversion rate of 2%. An A/B test splitting the clicks evenly between the current landing page and the new one would send about 50 clicks to each. Now, say the existing landing page generated 1 conversion in its 50 clicks. If the new landing page also generated 1 conversion, we would have no information about which is better: a 50% probability that the new page is better (and, correspondingly, a 50% probability that the original page is better). What if the new page had 2 conversions? Statistically, the chance that the new page is better is about 72% (and, correspondingly, a 28% chance that it is worse). That is not enough to base a decision on; a standard threshold is 95% certainty. Suppose the expected improvement in our A/B experiment were a 25% higher conversion rate, that is, an increase of 0.5% from the current 2% to 2.5%. Then we would expect the new landing page to generate, on average, one additional conversion for every 200 clicks. At 50 clicks a month for the new landing page in the A/B test, that is one additional conversion every 4 months. As in the coin-toss example, it would take many months to gather enough data to be a reliable predictor. It is very tough to get enough data to make a decision.
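You can estimate a "chance the new page is better" figure like this yourself. One common approach (and an assumption on my part about how such a number is derived; the exact value depends on the method and priors used) is a Bayesian comparison: model each page's unknown conversion rate with a uniform Beta(1, 1) prior, update it with the observed clicks and conversions, and simulate. A minimal Monte Carlo sketch:

```python
import random

random.seed(42)  # fixed seed so the estimate is reproducible

def prob_b_beats_a(conv_a, clicks_a, conv_b, clicks_b, n_draws=100_000):
    """Monte Carlo estimate of P(page B's true conversion rate > page A's),
    assuming a uniform Beta(1, 1) prior on each page's rate."""
    wins = 0
    for _ in range(n_draws):
        # Posterior for each page is Beta(1 + conversions, 1 + non-conversions)
        rate_a = random.betavariate(1 + conv_a, 1 + clicks_a - conv_a)
        rate_b = random.betavariate(1 + conv_b, 1 + clicks_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / n_draws

# 1 conversion in 50 clicks (current page) vs. 2 in 50 (new page)
print(prob_b_beats_a(1, 50, 2, 50))  # lands in the neighborhood of the ~72% cited
```

With identical results on each page the estimate sits near 50%, as the article's 1-vs-1 case describes.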
For another e-commerce retailer, we were testing a change for a much higher-volume campaign: about 1,000 clicks per day and a 3% conversion rate. Suppose we ran the A/B test for ten days and saw that B had 10% more conversions than A: A had 5,000 clicks and 150 conversions, and B had 5,000 clicks and 165 conversions. Is that enough? Unfortunately, no. Statistically, we are now about 80% sure that B is better, but that is below our 95% threshold, so the result is not statistically significant. If, however, we saw a 20% improvement, that is, B had 180 conversions on its 5,000 clicks, then we would be 95% certain that B was better, and the result would be statistically significant.
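At these volumes you can check the numbers with a standard one-sided two-proportion z-test (my assumption about the method behind the 80% and 95% figures; other tests give similar answers at this sample size). A short sketch using only the standard library:

```python
from math import sqrt, erf

def prob_b_better(clicks_a, conv_a, clicks_b, conv_b):
    """One-sided two-proportion z-test: confidence that B's true rate exceeds A's."""
    rate_a, rate_b = conv_a / clicks_a, conv_b / clicks_b
    pooled = (conv_a + conv_b) / (clicks_a + clicks_b)
    se = sqrt(pooled * (1 - pooled) * (1 / clicks_a + 1 / clicks_b))
    z = (rate_b - rate_a) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF of z

print(round(prob_b_better(5000, 150, 5000, 165), 2))  # ≈ 0.80, not significant
print(round(prob_b_better(5000, 150, 5000, 180), 2))  # ≈ 0.95, significant
```

The 10% lift gives about 80% confidence, below threshold; the 20% lift crosses 95%, matching the numbers above.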
Computing the statistics does not have to be difficult. Kissmetrics has developed a really nice web tool to let you quickly evaluate the statistical significance for conversion optimization. Try their A/B Significance Test — it is very simple to use.
What is the right statistical threshold for making a decision? If the work involved in changing the campaign is small and the risk of a change actually hurting the campaign is low, then your threshold before making a change can be relatively low. For major decisions with larger implications, you should gather more data and demand a higher threshold before deciding. In either case, keep analyzing the new data to continually deepen your understanding of your campaigns.
Remember, it is very important to be data driven. Just don’t be fooled by statistics.
Want to know more? Just email me. Or you can click here to request information on Vioby’s products for optimizing your advertising campaigns.