The Business of Social Games and Casino

How to succeed in the mobile game space by Lloyd Melnick

Day: November 19, 2019

An often-better alternative to AB testing?

While AB testing is an integral element of mobile and social game development (as well as development of most digital products), in many situations there is a better option. Several years ago, I had the opportunity to serve as an advisor to a company that had some brilliant people. Their CTO was a strong advocate of using multi-armed bandit testing as a superior alternative to AB testing. Multi-armed bandit testing is not new: there was a popular post about it in 2012 (http://stevehanov.ca/blog/index.php?id=132), and it is used by Google and other tech giants. Yet people (especially product managers) still regularly default to traditional ABn testing.

The problem with AB testing is that you leave money and performance on the table. Until the test is over, the poorer performing variant(s) will always get a significant share of your traffic. With the multi-armed bandit approach, you allocate increasingly less traffic to poorly performing variants.

What is multi-armed bandit testing

A multi-armed bandit approach allows you to dynamically allocate traffic to variations that are performing well while allocating less and less traffic to underperforming variations. Instead of two distinct periods of pure exploration and pure exploitation, bandit tests are adaptive, and simultaneously include exploration and exploitation. As Optimizely wrote recently, “multi-armed bandit optimizations aim to maximize performance of your primary metric across all your variations. They do this by dynamically re-allocating traffic to whichever variation is currently performing best. This will help you extract as much value as possible from the leading variation during the experiment lifecycle, so you avoid the opportunity cost of showing sub-optimal experiences.”
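To make the idea of dynamic reallocation concrete, here is a minimal sketch of epsilon-greedy, one of the simplest bandit strategies. It is an illustration only, not the method Optimizely or any particular vendor uses. The function name, the 10% exploration rate and the simulated conversion rates are all assumptions made for the example.

```python
import random

def epsilon_greedy(true_rates, n_users, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit (hypothetical helper).

    true_rates are the hidden conversion rates of each variant; in a real
    test they are unknown and only the observed wins/shows are available.
    """
    rng = random.Random(seed)
    shows = [0] * len(true_rates)   # users sent to each variant
    wins = [0] * len(true_rates)    # conversions observed per variant
    for _ in range(n_users):
        if rng.random() < epsilon or 0 in shows:
            # Explore: pick a variant at random (also used until every
            # variant has been shown at least once).
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the variant with the best observed rate so far.
            arm = max(range(len(true_rates)), key=lambda i: wins[i] / shows[i])
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return shows, wins

shows, wins = epsilon_greedy([0.02, 0.10], n_users=20_000)
print("traffic per variant:", shows)
```

With a clear gap between the variants, most of the traffic ends up on the stronger one, while the fixed exploration share keeps testing the weaker one in case the early data was misleading.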

Multi-armed bandit testing is often implemented as a Bayesian approach to AB testing. As Shawn Lu writes in a post titled Beyond A/B testing, “The foundation of the multi-armed bandit experiment is Bayesian updating. Each treatment (called “arm”) has a probability of success, which is modeled as a Bernoulli process. The probability of success is unknown, and is modeled by a Beta distribution. As the experiment continues, each arm receives user traffic, and the Beta distribution is updated accordingly.”
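The Bayesian updating Lu describes is commonly implemented with Thompson sampling. The sketch below assumes that technique, a uniform Beta(1, 1) prior and simulated conversion rates. The function name and parameters are hypothetical, and this is not Lu's own code.

```python
import random

def thompson_sampling(true_rates, n_users, seed=0):
    """Simulate a Beta-Bernoulli bandit with Thompson sampling (sketch)."""
    rng = random.Random(seed)
    # Beta(1, 1) is a uniform prior. alpha grows with each success,
    # beta grows with each failure.
    alpha = [1] * len(true_rates)
    beta = [1] * len(true_rates)
    for _ in range(n_users):
        # Draw one plausible conversion rate per arm from its posterior
        # and send this user to the arm with the highest draw.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = samples.index(max(samples))
        # Bernoulli outcome: the user converts or does not.
        if rng.random() < true_rates[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return alpha, beta

alpha, beta = thompson_sampling([0.04, 0.05], n_users=20_000)
traffic = [a + b - 2 for a, b in zip(alpha, beta)]  # subtract the prior
print("traffic per arm:", traffic)
```

An arm with little data has a wide posterior, so it still wins some draws and keeps receiving traffic. As evidence accumulates, the posteriors narrow and traffic concentrates on the arm that looks best.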

A recap on ABn testing

To compare bandit testing with ABn testing (AB has two variants, a test and a control; the n allows for additional variants), let’s quickly recap how AB testing works. Alex Atkins summarizes it succinctly, writing “in statistical terms, a/b testing consists of a short period of pure exploration, where you’re randomly assigning equal numbers of users to Version A and Version B. It then jumps into a long period of pure exploitation, where you send 100% of your users to the more successful version of your site.”

Benefits of multi-armed bandit testing

Bandit algorithms try to minimize opportunity costs and regret (the difference between your actual return and the return you would have collected had you deployed the optimal options at every opportunity). Rather than letting an AB test run until it is statistically significant, a bandit test moves subjects into the best performing group faster, allowing you to capture more gains. Matt Gershoff writes, “Some like to call it earning while learning. You need to both learn in order to figure out what works and what doesn’t, but to earn; you take advantage of what you have learned. This is what I really like about the Bandit way of looking at the problem, it highlights that collecting data has a real cost, in terms of opportunities lost.”
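Regret as defined above can be put into numbers with a few lines of code. This is a rough sketch: the function name is hypothetical, the 4% and 5% conversion rates are borrowed from the example later in this post, and the 10/90 split is an invented allocation standing in for what a bandit might do, not a measured result.

```python
def expected_regret(true_rates, traffic):
    """Expected conversions lost versus sending every user to the best variant."""
    best = max(true_rates)
    return sum(n * (best - p) for p, n in zip(true_rates, traffic))

# Even 50/50 split, as in a classic AB test with 22,330 users in total.
even_split = expected_regret([0.04, 0.05], [11_165, 11_165])

# Hypothetical bandit-style allocation: 10% to the weaker variant, 90% to the stronger.
skewed = expected_regret([0.04, 0.05], [2_233, 20_097])

print(round(even_split, 2), round(skewed, 2))
```

The even split gives up about 111.65 expected conversions, while the skewed allocation gives up about 22.33. The weaker variant costs one percentage point for every user sent to it, so regret scales directly with the traffic it receives.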

A related advantage of multi-armed bandit testing is that you make fewer mistakes. Until it ends, an A/B test will always send a significant portion of traffic to the sub-optimal group.

Also, as Shawn Lu writes, “[an] advantage of bandit experiment is that it terminates earlier than A/B test because it requires much smaller sample. In a two-armed experiment with click-through rate 4% and 5%, traditional A/B testing requires 11,165 in each treatment group at 95% significance level. With 100 users a day, the experiment will take 223 days. In the bandit experiment, however, simulation ended after 31 days, at the above termination criterion.” The A/B test takes so long because, even “if the treatment group is clearly superior, we still have to spend lots of traffic on the control group, in order to obtain statistical significance.”
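The quoted sample size can be sanity-checked with the textbook formula for comparing two proportions. The sketch below makes assumptions the quote does not spell out: a two-sided test at the 5% level, 95% power, and the pooled-variance form of the formula. Those choices happen to land very close to the quoted figure, but they are a guess at how it was derived, not a confirmed reconstruction. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.95):
    """Approximate users needed per group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power (assumed 95%)
    p_bar = (p1 + p2) / 2                          # pooled rate under the null hypothesis
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2

n = sample_size_per_group(0.04, 0.05)
days = 2 * n / 100  # two groups sharing 100 users a day
print(round(n), round(days))
```

Under these assumptions the result is roughly 11,166 users per group and about 223 days at 100 users a day, in line with the numbers in the quote.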

Finally, while not mathematically an advantage, bandit testing relieves the pressure to end a test too early. With ABn testing, frequently you will see one option perform better “directionally” and decide, or be forced to decide, to terminate the test and move everyone to the higher performing bucket before you get significant results. Unfortunately, this sometimes leads to picking an option that would be reversed once there is more data.

Why multi-armed bandit is not always the correct approach

The value of bandit testing does not mean you should completely abandon ABn testing. In Lu’s post, he writes “the convenience of smaller sample size comes at a cost of a larger false positive rate.” That is, you sometimes end up gravitating to the sub-optimal solution.

Alex Atkins also writes, “in essence, there shouldn’t be an ‘a/b testing vs. bandit testing, which is better?’ debate, because it’s comparing apples to oranges. These two methodologies serve two different needs.”

“A/B testing is a better option when the company has large enough user base, when it’s important to control for type I error (false positives), and when there are few enough variants that we can test each one of them against the control group one at a time.”

The Bandit Option

While multi-armed bandit testing is not always a better option than ABn testing, you should look closely at using bandit testing when possible. It can reduce the opportunity cost of your testing and relieve pressure to terminate tests prematurely.

Key takeaways

  • While AB testing is the most common method of optimizing between alternatives, in many situations the multi-armed bandit approach is the better option.
  • A multi-armed bandit approach allows you to dynamically allocate traffic to variations that are performing well while allocating less and less traffic to underperforming variations.
  • Multi-armed bandit testing reduces regret (the loss from pursuing multiple options rather than the best option), is faster and relieves the pressure to end the test prematurely.

Posted by Lloyd Melnick on November 19, 2019 in Analytics, Bayes' Theorem, General Social Games Business, General Tech Business, Social Casino

Lloyd Melnick

This is Lloyd Melnick’s personal blog.  All views and opinions expressed on this website are mine alone and do not represent those of people, institutions or organizations that I may or may not be associated with in professional or personal capacity.

I am a serial builder of businesses (senior leadership on three exits worth over $700 million), successful in big (Disney, Stars Group/PokerStars, Zynga) and small companies (Merscom, Spooky Cool Labs) with over 20 years experience in the gaming and casino space.  Currently, I am the GM of VGW’s Chumba Casino and on the Board of Directors of Murka Games and Luckbox.
