
Multi-armed bandit

What is Multi-armed Bandit (MAB)?

Multi-armed bandit (MAB) refers to a family of approaches for optimizing toward a desired outcome, with the goal of maximizing the cumulative benefit over the long term. The name comes from the problem of identifying which of several slot machines (nicknamed "one-armed bandits") has the best payout without losing too much money along the way. Players would play multiple one-armed bandits at once (literally, a multi-armed bandit strategy) and decide which machines to keep playing based on what they observed.

Multi-armed bandit is about finding the right balance between exploration and exploitation.

You need to explore (i.e., try out) different options, actions, and experiences in order to exploit the best of them and maximize the desired outcome. The data gathered during exploration informs the optimization.
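One of the simplest ways to balance exploration and exploitation is the epsilon-greedy strategy, sketched below as a small simulation. It is only an illustration, not the method of any particular product: the arms are assumed to be Bernoulli (convert / don't convert), and the conversion rates, `epsilon`, and the function name `epsilon_greedy` are hypothetical. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the best observed conversion rate so far.

```python
import random


def epsilon_greedy(true_rates, rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit on Bernoulli arms (illustrative sketch).

    true_rates: hypothetical conversion probability of each arm, unknown to the agent.
    Returns (pulls, estimates): how often each arm was played and its observed rate.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    rewards = [0] * n_arms

    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick an arm at random (also used until every arm has data).
            arm = rng.randrange(n_arms)
        else:
            # Exploit: play the arm with the best observed conversion rate so far.
            arm = max(range(n_arms), key=lambda a: rewards[a] / pulls[a])
        converted = rng.random() < true_rates[arm]
        pulls[arm] += 1
        rewards[arm] += int(converted)

    estimates = [rewards[a] / pulls[a] if pulls[a] else 0.0 for a in range(n_arms)]
    return pulls, estimates


pulls, estimates = epsilon_greedy([0.05, 0.10, 0.20])
print(pulls, estimates)
```

A larger `epsilon` spends more traffic learning about weaker arms; a smaller one commits sooner but risks settling on a lucky early leader.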

A/B Testing vs. Multi-armed Bandit

A/B testing (and similar approaches) runs tests that compare a control against one or more variations, splitting traffic between them, until the results reach statistical significance and confirm which variant performs best over a set period of time.

  • Static
  • Considered an exploratory approach
  • Uses random assignment: each visitor has an equal probability of seeing any given variant
  • Can require very large sample sizes to reach statistical significance, especially when the differences between variants are small
  • Traffic stays split between the winner and the loser(s), so a potentially large amount of traffic goes to the loser(s) for a long time (reaching statistical significance can take a while)
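To see why fixed-split A/B tests can need so much traffic, the sketch below applies the standard normal-approximation formula for a two-sided, two-proportion z-test. This is a textbook approximation, not necessarily what any specific testing tool uses; the baseline and variant conversion rates are hypothetical, and `alpha` (significance level) and `power` default to the common 0.05 and 0.80.

```python
import math
from statistics import NormalDist


def ab_sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH arm of a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired statistical power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control                 # the lift we want to be able to detect
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: detect a lift from a 10% to a 12% conversion rate.
print(ab_sample_size_per_variant(0.10, 0.12))
```

For this example the formula gives on the order of 3,800 visitors per variant, and because the required size scales with one over the squared lift, halving the detectable lift roughly quadruples it.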

Multi-armed bandit covers several approaches, but in general they explore intermittently to get a sense of what is and isn't working toward the goal, and then exploit those learnings to maximize the outcome. Exploration continues in some capacity throughout, so the exploitation stays well informed.

  • Adaptive
  • Considered an exploration and exploitation approach (both run simultaneously)
  • Uses data gathered during exploration to optimize for a desired outcome
  • Typically needs smaller sample sizes to produce useful results
  • Traffic is gradually shifted toward the variants that perform well, speeding up the test and sending fewer visitors to underperforming variants
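A common bandit algorithm that shows this gradual, adaptive allocation is Thompson sampling, sketched below for Bernoulli (convert / don't convert) variants. This is a generic illustration rather than any vendor's implementation; the conversion rates and the function name `thompson_sampling` are hypothetical. Each variant keeps a Beta posterior over its conversion rate, and every round the variant with the highest random draw from its posterior is shown, so traffic drifts toward likely winners while uncertain variants still get occasional exploration.

```python
import random


def thompson_sampling(true_rates, rounds=5_000, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    Each arm's belief about its conversion rate is Beta(successes + 1, failures + 1),
    i.e. a uniform prior updated with observed outcomes.
    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible conversion rate per arm from its posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a (simulated) visitor outcome and update that arm's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [successes[a] + failures[a] for a in range(n_arms)]


pulls = thompson_sampling([0.05, 0.10, 0.20])
print(pulls)
```

Unlike the fixed split of an A/B test, the share of traffic each variant receives here is not set in advance; it emerges from how confident the model is that the variant is the best.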

How Does Multi-armed Bandit Fit into Intellimize Continuous Conversion™?

Like multi-armed bandit, Intellimize's Continuous Conversion™ is optimization-focused, leveraging both exploration and exploitation. However, Intellimize takes it a step further. Continuous Conversion™ is a machine learning model that Intellimize developed from the ground up to maximize conversions. Intellimize uses a probabilistic deep neural network that provides three key benefits:

  • Automatic learning of predictive performance for each variation per visitor, based on what is known about the variation and the visitor. As visitors interact with your variations, the system learns which variations resonate with which types of visitors. The machine learning (ML) model can then optimize each visitor's journey individually, surfacing the variations that are most likely to get them to convert.
  • Automatic optimization toward sitewide goals, using on-page goals as engagement signals, to ensure a test is driving the metrics that matter and not just one-off engagement.
  • Automatic re-learning of predictive performance of variations. If visitor behavior changes over time, the system dynamically adapts to always maximize conversions. Likewise, as variations are added, paused, or removed, the system adapts to serve the best content at any given time.
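The ideas of learning per visitor type and re-learning when behavior changes can be illustrated, in a much simplified form, with per-segment Thompson sampling plus exponential forgetting. To be clear, this is NOT Intellimize's probabilistic deep neural network; it is a swapped-in, far simpler technique used only to make the two ideas concrete. The class name, the `segment` labels ("mobile", "desktop"), the variations, the conversion rates, and the `decay` parameter are all hypothetical. Each segment keeps its own posteriors, and older evidence is discounted so the model can follow a shift in behavior.

```python
import random
from collections import defaultdict


class SegmentedDiscountedBandit:
    """Hypothetical sketch: per-segment Beta-Bernoulli Thompson sampling with forgetting.

    `segment` is any hashable visitor attribute; `decay` < 1 shrinks old evidence
    so each segment's posteriors can track behavior that changes over time.
    """

    def __init__(self, variations, decay=0.99, seed=0):
        self.variations = list(variations)
        self.decay = decay
        self.rng = random.Random(seed)
        # stats[segment][variation] = [discounted successes, discounted failures]
        self.stats = defaultdict(lambda: {v: [0.0, 0.0] for v in self.variations})

    def choose(self, segment):
        seg = self.stats[segment]
        # Sample a plausible conversion rate per variation; show the highest.
        return max(
            self.variations,
            key=lambda v: self.rng.betavariate(seg[v][0] + 1, seg[v][1] + 1),
        )

    def update(self, segment, variation, converted):
        seg = self.stats[segment]
        # Forget a little of the past for every variation in this segment...
        for counts in seg.values():
            counts[0] *= self.decay
            counts[1] *= self.decay
        # ...then record the new observation.
        seg[variation][0 if converted else 1] += 1.0


def simulate(bandit, rates, rounds, rng):
    """rates[segment][variation] is a hypothetical true conversion rate."""
    chosen = defaultdict(lambda: defaultdict(int))
    for _ in range(rounds):
        segment = rng.choice(list(rates))
        variation = bandit.choose(segment)
        bandit.update(segment, variation, rng.random() < rates[segment][variation])
        chosen[segment][variation] += 1
    return chosen


rates = {"mobile": {"A": 0.20, "B": 0.05}, "desktop": {"A": 0.05, "B": 0.20}}
bandit = SegmentedDiscountedBandit(["A", "B"])
rng = random.Random(1)
before = simulate(bandit, rates, 4_000, rng)

# Behavior shifts: the better variation flips in both segments.
shifted = {"mobile": {"A": 0.05, "B": 0.20}, "desktop": {"A": 0.20, "B": 0.05}}
after = simulate(bandit, shifted, 4_000, rng)
print(dict(before["mobile"]), dict(after["mobile"]))
```

The `decay` value trades stability for responsiveness: closer to 1 remembers more history, while smaller values adapt faster but make noisier choices.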

Intellimize Continuous Conversion™ is designed to drive more conversions while reducing exploration costs. Continuous Conversion™ speeds up testing by independently running many experiences across a page and/or the site, allowing more tests to run in parallel than traditional A/B testing, which has to split traffic between variants. This lets both you and the system learn more, gain insights sooner, and adapt accordingly.