The Business of Social Games and Casino

How to succeed in the mobile game space by Lloyd Melnick

Day: November 19, 2019

An often-better alternative to AB testing?

While AB testing is an integral element of mobile and social game development (as well as the development of most digital products), in many situations there is a better option. Several years ago, I had the opportunity to serve as an advisor to a company that had some brilliant people. Their CTO was a strong advocate of using multi-armed bandit testing as a superior alternative to AB testing. Multi-armed bandit testing is not new. There was a popular post on it in 2012 (http://stevehanov.ca/blog/index.php?id=132), and it is used by Google and other tech giants, but people (especially product managers) still regularly default to traditional ABn testing.

The problem with AB testing is that you leave money and performance on the table. Until the test is over, the poorer performing variant(s) will always get a significant share of your traffic. With the multi-armed bandit approach, you allocate increasingly less traffic to poorly performing variants.

What is multi-armed bandit testing

A multi-armed bandit approach allows you to dynamically allocate traffic to variations that are performing well while allocating less and less traffic to underperforming variations. Instead of two distinct periods of pure exploration and pure exploitation, bandit tests are adaptive, and simultaneously include exploration and exploitation. As Optimizely wrote recently, “multi-armed bandit optimizations aim to maximize performance of your primary metric across all your variations. They do this by dynamically re-allocating traffic to whichever variation is currently performing best. This will help you extract as much value as possible from the leading variation during the experiment lifecycle, so you avoid the opportunity cost of showing sub-optimal experiences.”

Multi-armed bandit testing is a Bayesian approach to AB testing. As Shawn Lu writes in a post titled Beyond A/B testing, “The foundation of the multi-armed bandit experiment is Bayesian updating. Each treatment (called “arm”) has a probability of success, which is modeled as a Bernoulli process. The probability of success is unknown, and is modeled by a Beta distribution. As the experiment continues, each arm receives user traffic, and the Beta distribution is updated accordingly.”
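The Bayesian updating Lu describes can be sketched in a few lines. This is a minimal illustration, not the implementation used by any of the tools or posts cited here. It assumes a uniform Beta(1, 1) prior and uses Thompson sampling (draw one value from each arm's posterior, serve the arm with the highest draw) as the allocation rule; the quote does not say which rule is used, so that choice is an assumption. The names `BetaBernoulliArm`, `choose_arm` and `run_simulation` are made up for this example.

```python
import random


class BetaBernoulliArm:
    """One variant ("arm") with a Beta posterior over its unknown success rate."""

    def __init__(self, name, prior_alpha=1.0, prior_beta=1.0):
        self.name = name
        self.alpha = prior_alpha  # prior pseudo-count of successes (assumed uniform prior)
        self.beta = prior_beta    # prior pseudo-count of failures

    def update(self, success):
        # Bayesian update for a Bernoulli outcome: add one to the matching count.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    def sample(self, rng):
        # Draw a plausible success rate from the current Beta posterior.
        return rng.betavariate(self.alpha, self.beta)

    def posterior_mean(self):
        return self.alpha / (self.alpha + self.beta)


def choose_arm(arms, rng):
    """Thompson sampling: serve the arm whose posterior draw is largest."""
    return max(arms, key=lambda arm: arm.sample(rng))


def run_simulation(true_rates, n_users, seed=0):
    """Simulate n_users visits; true_rates maps arm name -> real conversion rate."""
    rng = random.Random(seed)
    arms = [BetaBernoulliArm(name) for name in true_rates]
    pulls = {name: 0 for name in true_rates}
    for _ in range(n_users):
        arm = choose_arm(arms, rng)
        converted = rng.random() < true_rates[arm.name]
        arm.update(converted)
        pulls[arm.name] += 1
    return arms, pulls


if __name__ == "__main__":
    arms, pulls = run_simulation({"A": 0.10, "B": 0.30}, n_users=5000)
    for arm in arms:
        print(arm.name, pulls[arm.name], round(arm.posterior_mean(), 3))
```

Because each arm is chosen in proportion to how often its posterior draw wins, traffic drifts toward the stronger variant as evidence accumulates, while the weaker variant still gets occasional exposure.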

A recap on ABn testing

To compare bandit testing with ABn testing (AB uses two variants, a test and a control; the n allows for additional variants), let’s quickly recap how AB testing works. Alex Atkins summarizes it succinctly, writing “in statistical terms, a/b testing consists of a short period of pure exploration, where you’re randomly assigning equal numbers of users to Version A and Version B. It then jumps into a long period of pure exploitation, where you send 100% of your users to the more successful version of your site.”
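The two-phase structure Atkins describes can be written down as a short simulation. This is a rough sketch under simplifying assumptions: the winner is picked by the raw observed conversion rate at the end of the exploration phase, with no significance test, and the function name `ab_test_allocation` and its parameters are invented for this example.

```python
import random


def ab_test_allocation(true_rates, n_explore, n_total, seed=0):
    """Explore with random assignment for n_explore users, then send everyone to the winner."""
    rng = random.Random(seed)
    names = list(true_rates)
    shown = {name: 0 for name in names}
    successes = {name: 0 for name in names}

    # Phase 1, pure exploration: each user is randomly assigned to a variant.
    for _ in range(n_explore):
        name = rng.choice(names)
        shown[name] += 1
        if rng.random() < true_rates[name]:
            successes[name] += 1

    # Pick the variant with the best observed rate (simplification: no significance test).
    winner = max(names, key=lambda n: successes[n] / max(shown[n], 1))

    # Phase 2, pure exploitation: all remaining users see the winner.
    shown[winner] += n_total - n_explore
    return winner, shown


if __name__ == "__main__":
    winner, shown = ab_test_allocation({"A": 0.10, "B": 0.30}, n_explore=2000, n_total=10000)
    print(winner, shown)
```

Note that during the first phase roughly half of the users see the weaker variant no matter how clear the early results are.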

Benefits of multi-armed bandit testing

Bandit algorithms try to minimize opportunity costs and regret (the difference between your actual return and the return you would have collected had you deployed the optimal options at every opportunity). Rather than letting an AB test run until it is statistically significant, a bandit test moves subjects into the best performing group faster, allowing you to capture more gains. Matt Gershoff writes, “Some like to call it earning while learning. You need to both learn in order to figure out what works and what doesn’t, but to earn; you take advantage of what you have learned. This is what I really like about the Bandit way of looking at the problem, it highlights that collecting data has a real cost, in terms of opportunities lost.”
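To make the definition of regret concrete, here is a small sketch that computes expected regret from how many users each variant received. It assumes you know the true conversion rates, which you only do in a simulation or in hindsight; the function name `expected_regret` and the example numbers are illustrative.

```python
def expected_regret(true_rates, users_per_variant):
    """Expected conversions lost versus always showing the best variant.

    true_rates: variant name -> true conversion rate (known only in simulation).
    users_per_variant: variant name -> number of users who saw that variant.
    """
    best_rate = max(true_rates.values())
    return sum(
        users_per_variant[name] * (best_rate - rate)
        for name, rate in true_rates.items()
    )


if __name__ == "__main__":
    rates = {"control": 0.04, "test": 0.05}
    # Even split: 1,000 users each, so 1,000 users saw the weaker variant.
    print(expected_regret(rates, {"control": 1000, "test": 1000}))
    # Skewed split: only 200 users saw the weaker variant.
    print(expected_regret(rates, {"control": 200, "test": 1800}))
```

With these example numbers, the even split gives up about 10 expected conversions and the skewed split about 2, which is the gap a bandit is trying to close.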

A related advantage of multi-armed bandit testing is that you make fewer mistakes, in the sense that fewer users are shown a sub-optimal variant. An A/B test sends a significant portion of traffic to the sub-optimal group for as long as it runs.

Also, as Shawn Lu writes, “[an] advantage of bandit experiment is that it terminates earlier than A/B test because it requires much smaller sample. In a two-armed experiment with click-through rate 4% and 5%, traditional A/B testing requires 11,165 in each treatment group at 95% significance level. With 100 users a day, the experiment will take 223 days. In the bandit experiment, however, simulation ended after 31 days, at the above termination criterion.” By contrast, in an A/B test, “if the treatment group is clearly superior, we still have to spend lots of traffic on the control group, in order to obtain statistical significance.”
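The arithmetic behind the A/B numbers in that quote can be checked with a short sketch. It uses one common normal-approximation formula for comparing two proportions. The quote does not state the statistical power behind the 11,165 figure, so the 95% power used below is an assumption, and other formulas or power levels give different answers. Under that assumption the result lands in the same range as the quoted number rather than matching it exactly. The function names are made up for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.95):
    """Approximate users needed per group to detect p1 vs p2 (two-sided test).

    power=0.95 is an assumption; the quoted post does not state it.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def days_to_finish(n_per_group, n_groups, users_per_day):
    """How long the test runs if all daily users enter the experiment."""
    return n_per_group * n_groups / users_per_day


if __name__ == "__main__":
    print(sample_size_per_group(0.04, 0.05))
    # Using the per-group figure from the quote: 2 groups, 100 users a day.
    print(days_to_finish(11165, 2, 100))
```

The second calculation reproduces the 223 days in the quote: two groups of 11,165 users is 22,330 users, which takes just over 223 days at 100 users a day.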

Finally, while not mathematically an advantage, bandit testing relieves the pressure to end a test too early. With ABn testing, you will frequently see one option perform better “directionally” and decide, or be forced to decide, to terminate the test and move everyone to the higher performing bucket before you get significant results. Unfortunately, this sometimes leads to picking an option that more data would have shown to be the weaker one.

Why multi-armed bandit is not always the correct approach

The value of bandit testing does not mean you should completely abandon ABn testing. In Lu’s post, he writes “the convenience of smaller sample size comes at a cost of a larger false positive rate.” That is, you will sometimes end up gravitating to the sub-optimal solution.

Alex Atkins also writes, “in essence, there shouldn’t be an ‘a/b testing vs. bandit testing, which is better?’ debate, because it’s comparing apples to oranges. These two methodologies serve two different needs.”

Lu is more specific about when to stay with the traditional approach: “A/B testing is a better option when the company has large enough user base, when it’s important to control for type I error (false positives), and when there are few enough variants that we can test each one of them against the control group one at a time.”

The Bandit Option

While multi-armed bandit testing is not always a better option than ABn testing, you should look closely at using bandit testing when possible. It can reduce the opportunity cost of your testing and relieve pressure to terminate tests prematurely.

Key takeaways

  • While AB testing is the most common method of optimizing between alternatives, in many situations the multi-armed bandit approach is the better option.
  • A multi-armed bandit approach allows you to dynamically allocate traffic to variations that are performing well while allocating less and less traffic to underperforming variations.
  • Multi-armed bandit testing reduces regret (the loss from pursuing multiple options rather than the best option), is faster, and eases the pressure to end the test prematurely.

Author: Lloyd Melnick. Posted on November 19, 2019. Categories: Analytics, Bayes' Theorem, General Social Games Business, General Tech Business, Social Casino. Tags: A/B testing, analytics, multi-armed bandit, testing.
