A/B Testing Mastery: SSM Thoughts Bridge Review (Part 1)

 

A/B testing is really important for scaling the decisions we make, and it also helps us determine where to optimize, automate, or scale. A/B testing puts effectiveness first. A/B testing (also known as split testing or bucket testing) is a method of comparing two versions of a webpage or app against each other to determine which one performs better. It is essentially an experiment in which two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal.
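The "statistical analysis" step can be sketched with a two-proportion z-test, a common way to compare the conversion rates of variants A and B. This is only a minimal illustration; the function name and the traffic numbers below are my own assumptions, not from the article:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B (two-sided z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers: 500/10,000 conversions for A vs. 580/10,000 for B
z, p = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # declare a winner at the 5% level if p < 0.05
```

In practice an A/B testing tool runs this kind of test (or a more sophisticated one) for you; the sketch just shows what "statistical analysis" means here.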

This is Part 1 of my writings on A/B testing. This article is based on what I have learned from CXL Institute. I will write more about A/B testing in the following weeks.

History of A/B Testing:

Ton Wesseling (Founder and Co-Owner, Online Dialogue) showed this diagram of the history of A/B testing at CXL Institute. According to him, before 1995 there was no such thing as A/B testing. From 1995 until 2000 there was some testing, but it amounted to comparing the results of week 1 against week 2, which is not essentially A/B testing. Around 2000, meta-refresh redirects were used to split traffic. From 2003 there were "poor man's" A/B testing solutions, with tools like Offermatica, Optimost, and Memetrics. Google launched Google Website Optimizer in 2006, which changed the entire game of A/B testing. In 2010, VWO and Optimizely came onto the marketplace. From 2016, the quality of the tools started to become much better. Finally, by 2019, A/B testing moved server-side, with a great deal of potential.

[Diagram: timeline of the history of A/B testing]


The Value of A/B Testing:

A/B testing is considered the big silver bullet for companies. We need to gather evidence before we commit to a decision. This is similar to the pyramid of evidence in the health industry, which shows how we start from opinions and hypotheses and, after proper testing and analysis, arrive at a winner. At the top of that pyramid sit randomized controlled trials (a.k.a. A/B tests). It is a pyramid of evidence, so the silver bullet is the fact that randomized controlled trials sit at the top. It is based on a hierarchy of evidence (this was also referred to as a waterfall methodology). What we aim to do in A/B testing is take small steps (each step representing growth) while getting more done than we previously could, using a smaller amount of resources.

[Diagram: pyramid of evidence, with randomized controlled trials at the top]


“Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day…” - Jeff Bezos, CEO, Amazon


If you want to add effectiveness to your efficiency, and want to make sure you're making the right decisions, better decisions, trustworthy decisions, there's only one way to go: applying randomized controlled trials. Applying A/B testing in a company makes sure we are making the right decisions, speeds them up, and bases them on trustworthy evidence rather than expert opinions and the other methods lower down the pyramid. That is the real value of A/B testing: it helps us make better, more trustworthy decisions.

When to conduct A/B Testing

A/B testing can be valuable when it comes to deployments. These can involve legal reasons, new products, a change in operations, etc. When designing an A/B test, it is important to split traffic properly between the "A" and "B" variants to make sure there are no negative impacts. The goal of A/B testing is optimization, and optimization equals lean deployments. Long story short, we are looking for wins.

Even if we aren’t achieving the wins we desire, A/B tests can also be useful for identifying signals. For instance, conversion signal maps are created by running experiments that leave out specific elements to see if there is an impact. Fly-in advertisements and pop-ups are examples of controlled variables that can create signals by drawing attention to themselves. This may lead to wins in the short term, but with too many at once they can ultimately lower conversions.


The ROAR Model

The ROAR model was developed by Ton Wesseling. This was created back in 2015 as a rule of thumb model to explain when we can run how many experiments. The ROAR model consists of four [optimization] phases:

·        Risk,
·        Optimization,
·        Automation, and
·        Re-think.

[Diagram: the ROAR model]

The risk phase recommends we follow the standard of at least 1,000 conversions per month in order to test (this ensures we have enough data). Conversions can include leads, clicks, transactions, or any type of goal we’ve established on our website.

We require a 15% impact, which can typically be detected in three weeks if we have enough data. If we are lacking data, we should still research and hypothesize. There are two things Ton recommends we measure from our website:

1.     Percent of transactions on the website
2.     Number of unique visitors

Keep in mind that the buying cycle is typically one week (7-8 days). Weekends versus weekdays can also impact buying behavior. It is recommended to test for no more than 4 weeks. It is also important to finish a buying cycle before ending the test (i.e. test for one full week, two full weeks, or three full weeks).
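The rules of thumb above (enough conversions, a 15% detectable impact, whole buying cycles, at most 4 weeks) can be sketched as a small duration check. This is only an illustration: it uses the standard sample-size approximation for comparing two proportions at roughly 95% confidence and 80% power, and the baseline rate and traffic figures are assumptions of mine, not from the article:

```python
from math import ceil, sqrt

def weeks_needed(baseline_rate, relative_lift, visitors_per_week,
                 z_alpha=1.96, z_beta=0.84):
    """Estimate how many full buying cycles (weeks) a test needs.

    Uses the textbook sample-size approximation for comparing two
    proportions at ~95% confidence and ~80% power.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # visitors needed per variant
    n = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
         / (p2 - p1) ** 2)
    total = 2 * n                      # two variants, 50/50 split
    # round up to whole weeks so the test spans complete buying cycles
    return ceil(total / visitors_per_week)

# Illustrative: 5% baseline conversion, 15% relative lift, 20,000 visitors/week
print(weeks_needed(0.05, 0.15, 20_000))  # -> 2 with these illustrative numbers
```

With less traffic or a smaller detectable lift, the estimate quickly exceeds the recommended 4-week maximum, which is exactly when the risk phase says to keep researching instead of testing.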



The optimization phase has its border set at 10,000 conversions per month. With this many conversions, it is suggested to run four A/B tests a week, about 200 per year. This requires a 5% detectable impact (10 percentage points less than the risk phase) and is typically managed by a full-fledged optimization team. Automation, phase three, is reached when conversions grow beyond 10,000, requiring another optimization team. Typically when this phase occurs, experimentation data becomes part of the DNA of the company. Lastly, the re-think phase challenges users to go back and see what can be further improved in future testing.
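Using the 1,000 and 10,000 conversions-per-month borders from the model, a toy helper can show which phase a site sits in. The thresholds come from the article; the function itself is purely illustrative (Re-think is a later, deliberate review step rather than a traffic band, so it is left out):

```python
def roar_phase(conversions_per_month: int) -> str:
    """Map monthly conversions to a ROAR phase using the article's borders."""
    if conversions_per_month < 1_000:
        return "Risk"          # not enough data: research and hypothesize
    if conversions_per_month < 10_000:
        return "Optimization"  # run A/B tests with a dedicated team
    return "Automation"        # ~200 tests/year, 5% detectable impact

print(roar_phase(500))      # Risk
print(roar_phase(5_000))    # Optimization
print(roar_phase(50_000))   # Automation
```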

Which KPI to Pick?

KPI, in business, means Key Performance Indicator. When A/B testing, we need to choose a goal metric rather than a KPI. This helps us maintain focus on the behavior we are aiming to change. Ton has segmented the goal metrics into a hierarchical funnel order.

[Diagram: hierarchy funnel of goal metrics]


The top tier, or “golden metric” if you will, is the potential lifetime value of the user. We look for key indicators to identify this within our target audience of users. This can be extremely difficult for immature or startup companies that are still working to establish a solid foundation. Revenue per user is another metric to pay attention to. Because it is easy to raise transactions simply by lowering the price, revenue is a higher goal metric than transactions, though it is also harder to measure.

Transactions (leads/conversions) reside in the middle, as they serve as a good driver to optimize, followed by behavior, which is part of the key research methodology. On the bottom tier we see clicks, the lowest-tier metric to consider: there are ways to lift clicks that do not translate into changes in behavior or transactions. Clicks are not recommended as a goal metric when testing for financial transactions.

***

Sheikh Shafi Mahmud is an Economics graduate of Notre Dame and a digital full-stack marketer. He is the author and founder of SSM Thoughts Bridge, and the co-founder, music composer, and producer of Apeiruss. He can be reached at sheikhshafimahmud@gmail.com.
