<!DOCTYPE html>
<html lang="en">
<head>
<!-- ## for client-side less
<link rel="stylesheet/less" type="text/css" href="/theme/css/style.less">
<script src="http://cdnjs.cloudflare.com/ajax/libs/less.js/1.7.3/less.min.js" type="text/javascript"></script>
-->
<link rel="stylesheet" type="text/css" href="/theme/css/style.css">
<link rel="stylesheet" type="text/css" href="/theme/css/pygments.css">
<link rel="stylesheet" type="text/css" href="//fonts.googleapis.com/css?family=PT+Sans|PT+Serif|PT+Mono">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="author" content="Gaël">
<meta name="description" content="Posts and writings by Gaël">
<meta name="keywords" content="">
<title>
And yet it moves
– The pros of the A/B testing method, Multi-armed bandit vs. A/B testing for web service optimization </title>
</head>
<body>
<aside>
<div id="user_meta">
<div id="metalogo">
<a href="/">
<img src="/images/logo.svg" alt="logo" id="logo">
</a>
<a href="/" id="title">And yet it moves</a>
</div>
<p></p>
<ul>
<li><a href="/category/data-processing.html" class="active">Data processing</a></li>
</ul>
</div>
</aside>
<main>
<header>
<p>
<a href="/">Index</a> ¦ <a href="/archives.html">Archives</a>
</p>
</header>
<article>
<div class="series_toc">
<p>This post is part 4 of the "AB vs. MAB" series:</p>
<ol class="parts">
<li >
<a href='/AB-vs-MAB.html'>Multi-armed bandit vs. A/B testing for web service optimization</a>
</li>
<li >
<a href='/AB-vs-MAB--basics.html'>Basics of testing for web optimisation, Multi-armed bandit vs. A/B testing for web service optimization</a>
</li>
<li >
<a href='/AB-vs-MAB--making-a-choice.html'>Choosing the best variant, Multi-armed bandit vs. A/B testing for web service optimization</a>
</li>
<li class="active">
<a href='/AB-vs-MAB--pros-of-AB-testing.html'>The pros of the A/B testing method, Multi-armed bandit vs. A/B testing for web service optimization</a>
</li>
<li >
<a href='/AB-vs-MAB--reward-maximisation.html'>Reward maximisation, Multi-armed bandit vs. A/B testing for web service optimization</a>
</li>
<li >
<a href='/AB-vs-MAB--other-drawbacks-conclusion.html'><span class="caps">MAB</span> other drawbacks and conclusion, Multi-armed bandit vs. A/B testing for web service optimization</a>
</li>
</ol>
</div>
<div class="article_meta">
<p style="text-align: right; margin: 0;">Posted on: Sun, 15 May 2016</p>
</div>
<div class="article_title">
<h3><a href="/AB-vs-MAB--pros-of-AB-testing.html">The pros of the A/B testing method, Multi-armed bandit vs. A/B testing for web service optimization</a></h3>
</div>
<div class="article_text">
<center><p><img alt="image0" src="/images/dab.gif" style="width: 640px; height: auto; max-width: 100%;"/></p>
</center><p>In order to bring some nuances and perspective into the discussion
(a.k.a. <em>unsupported claims smashing</em>™), this post is dedicated to
showing what A/B testing is good at and how it compares to the
multi-armed bandit strategy.</p>
<div class="section" id="a-b-testing-is-more-powerful-than-mab-testing">
<h2 id="ab-testing-is-more-powerful-than-mab-testing">A/B testing is more powerful than <span class="caps">MAB</span> testing</h2>
<p>When a (statistical) test is applied to data, it answers the question
for which it was designed with a given <em>level of significance</em> and
<em>power</em> (the two components of what we could improperly call “level of
confidence” for simplicity):</p>
<ul class="simple">
<li>The <em>level of significance</em> controls the rate of false alarms: one minus
that level is the probability of correctly concluding that the variants (<object data="/images/orange_btn.svg" type="image/svg+xml">the orange</object> and <object data="/images/green_btn.svg" type="image/svg+xml">the green button</object>
here) <strong>have no</strong> <em>significant impact</em> on the click-through rate.</li>
<li>The <em>power</em> is the probability of correctly concluding that the
variants <strong>have</strong> <em>a significant impact</em> on the click-through rate.</li>
</ul>
<p>The power is the most important measure of confidence in our case: if
we see a difference, we would like to know whether it is significant. So we
modelled the whole process and estimated the power of each method at a
given level of significance and for different sample sizes.</p>
<div class="figure">
<object data="/images/power_vs_sample_size.svg" type="image/svg+xml">
</object>
<p class="caption">If you can’t see the figure, please try another web browser which
supports <span class="caps">SVG</span> images</p>
</div>
<p>A/B testing clearly reaches higher (and therefore better) power
at far smaller sample sizes: it typically requires sample sizes <em>5
times</em> smaller than <span class="caps">MAB</span> to reach the same power!</p>
<p><em>Why is that?</em> As we claimed above when presenting the A/B testing
method, it is generally more efficient to present each variant equally often
than to favour one. A/B testing is better here, and we didn’t even have
to look for a convoluted setup: this holds for all the setups we tried!</p>
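<p>To make the comparison concrete, here is a minimal Monte Carlo sketch of such a power estimation. It is only an illustration under assumptions that are not taken from the post: true click-through rates of 10% and 15%, a plain 50/50 split for A/B, an epsilon-greedy bandit as the <span class="caps">MAB</span> strategy, and a chi-squared contingency test.</p>
<pre class="literal-block">
# A minimal power-estimation sketch (assumed setup, see above): simulate a
# campaign with a 50/50 split or an epsilon-greedy bandit, then run a
# chi-squared contingency test and count how often it detects the difference.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

def run_campaign(n_visitors, p_a, p_b, epsilon=None):
    """Return a 2x2 table of (clicks, no clicks) per variant."""
    table = np.zeros((2, 2), dtype=int)        # rows: A, B; cols: click, no click
    for _ in range(n_visitors):
        shows = table.sum(axis=1)
        if epsilon is None or rng.random() &lt; epsilon or shows.min() == 0:
            arm = rng.integers(2)              # A/B split, or bandit exploration
        else:
            arm = int(np.argmax(table[:, 0] / shows))  # bandit exploitation
        p = p_a if arm == 0 else p_b
        table[arm, 0 if rng.random() &lt; p else 1] += 1
    return table

def estimate_power(n_visitors, epsilon=None, alpha=0.05, n_runs=2000):
    """Fraction of simulated campaigns in which the test detects the difference."""
    hits = 0
    for _ in range(n_runs):
        table = run_campaign(n_visitors, 0.10, 0.15, epsilon)
        if table.sum(axis=0).min() == 0 or table.sum(axis=1).min() == 0:
            continue                           # degenerate table, no test possible
        if chi2_contingency(table)[1] &lt; alpha:
            hits += 1
    return hits / n_runs

for n in (100, 300, 1000):
    print(n, estimate_power(n), estimate_power(n, epsilon=0.1))
</pre>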
</div>
<div class="section" id="a-b-testing-is-correct-more-often-at-small-sample-sizes">
<h2 id="ab-testing-is-correct-more-often-at-small-sample-sizes">A/B testing is correct more often at small sample sizes</h2>
<p>There are cases, however, where no test is used at all. This is equivalent
to choosing a level of significance of <span class="math">\(100\%\)</span>: a winner
is always declared, even when there is no real difference. Yet that
does not mean that the power reaches <span class="math">\(100\%\)</span> (in this case,
the power is simply how often the seemingly best-performing
variant really is the best performer).</p>
<p>As it turned out, even in that case, the A/B data-gathering strategy
does provide better results:</p>
<div class="figure">
<object data="/images/correct_vs_sample_size.svg" type="image/svg+xml">
</object>
<p class="caption">If you can’t see the figure, please try another web browser which
supports <span class="caps">SVG</span> images</p>
</div>
<p>In this example, the A/B framework again provides better results at
smaller sample sizes. For instance, the A/B framework requires only
<span class="math">\(300\)</span> trials to be correct in <span class="math">\(9\)</span> campaigns out of
<span class="math">\(10\)</span>. To provide the <em>same</em> guarantee, the <span class="caps">MAB</span> strategy would
require <span class="math">\(600\)</span> trials: twice as many!</p>
<p>This can be explained the same way as above: presenting each website
variant equally often to the visitors is a more efficient way to identify
the best-performing one.</p>
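<p>The “no test” scenario above is even simpler to simulate: pick whichever variant shows the higher observed click-through rate and count how often that pick is actually the best one. A minimal sketch, reusing <code>run_campaign</code> and the illustrative 10% vs. 15% rates from the power sketch above:</p>
<pre class="literal-block">
# "No test": always declare the variant with the higher observed rate the
# winner, and measure how often that pick is really the best variant
# (variant B by construction here). Reuses run_campaign from above.
def fraction_correct(n_visitors, epsilon=None, n_runs=2000):
    correct = 0
    for _ in range(n_runs):
        table = run_campaign(n_visitors, 0.10, 0.15, epsilon)
        rates = table[:, 0] / np.maximum(table.sum(axis=1), 1)
        correct += int(np.argmax(rates) == 1)   # arm 1 (B) is the true best
    return correct / n_runs

print(fraction_correct(300), fraction_correct(300, epsilon=0.1))
</pre>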
</div>
<div class="section" id="a-b-testing-provides-more-accurate-estimations-of-the-differences">
<h2 id="ab-testing-provides-more-accurate-estimations-of-the-differences">A/B testing provides more accurate estimations of the differences</h2>
<p>A correct estimation of the difference in click-through rates is
important: it is required by methods that rely on quantifying it
(e.g. multivariate analysis), but also simply to determine which
variant performs better:</p>
<div class="math">
\begin{equation*}
a > b
\end{equation*}
</div>
<p>is equivalent to</p>
<div class="math">
\begin{equation*}
a - b > 0\text{.}
\end{equation*}
</div>
<p>I estimated the difference in click-through rates (denoted
<span class="math">\(P(O|B) - P(O|A)\)</span> in the figure, a notation explained earlier in this series) in the same setup
as above and obtained the following figure:</p>
<div class="figure">
<object data="/images/difference_vs_sample_size.svg" type="image/svg+xml">
</object>
<p class="caption">If you can’t see the figure, please try another web browser which
supports <span class="caps">SVG</span> images</p>
</div>
<p>We can clearly see that the A/B testing data-gathering method converges
toward the expected value of <span class="math">\(5\%\)</span> much faster than the <span class="caps">MAB</span>
strategy. Worse, the residual error in the <span class="caps">MAB</span> estimate, though small,
persists for a very long time.</p>
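<p>The same simulation machinery can also be pointed at the estimate itself rather than at the decision: average the observed difference over many campaigns and compare it with the true difference. A minimal sketch, again reusing <code>run_campaign</code> and the illustrative 10% vs. 15% rates (a true difference of 5 percentage points):</p>
<pre class="literal-block">
# Average observed difference in click-through rates, P(O|B) - P(O|A), over
# many simulated campaigns; with the assumed rates the true value is 0.05.
# Reuses run_campaign from the power sketch above.
def mean_estimated_difference(n_visitors, epsilon=None, n_runs=2000):
    diffs = []
    for _ in range(n_runs):
        table = run_campaign(n_visitors, 0.10, 0.15, epsilon)
        rates = table[:, 0] / np.maximum(table.sum(axis=1), 1)
        diffs.append(rates[1] - rates[0])        # estimated P(O|B) - P(O|A)
    return float(np.mean(diffs))

print(mean_estimated_difference(1000),                # A/B allocation
      mean_estimated_difference(1000, epsilon=0.1))   # MAB allocation
</pre>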
</div>
<div class="section" id="summary-what-is-a-b-testing-good-at">
<h2 id="summary-what-is-ab-testing-good-at">Summary: what is A/B testing good at?</h2>
<p>These results illustrated what it means for A/B testing to answer the
question: <em>Is there a difference?</em></p>
<ul class="simple">
<li>Its data-gathering scheme is efficient: it provides a very high power
in the contingency tests and leads to correct results more often at
smaller sample sizes than the <span class="caps">MAB</span> strategy.</li>
<li>It makes use of the contingency test to actually answer that question
at a given “level of confidence” (significance and power). The same test is also
used to determine what the sample size should be.</li>
<li>It also generally provides more precise and more accurate estimates
of the actual difference in click-through rates.</li>
</ul>
<p>Beyond invalidating the most outrageous claims of the original blog
post, these results highlight another very important fact: the <span class="caps">MAB</span>
strategy generally requires more samples to provide the exact same
guarantees as A/B testing. This means in particular that it would not be
fair to compare click-through rates at a single given sample size: that
sample size should be adjusted separately for each method in order to
provide the same guarantees.</p>
<p>From my perspective, this is just like an insurance seller undermining
the competition by comparing the price of his most basic contract with
the price of a competitor’s broader contract (which covers more). Yes, the basic
contract is cheaper, but this is comparing apples to oranges. That said,
the basic contract might very well be sufficient, and that is what I am
going to assess in the next post.</p>
</div>
</div>
</article>
<div id="ending_message">
<p>© Gaël. Built using <a href="http://getpelican.com" target="_blank">Pelican</a>. Theme by Giulio Fidente on <a href="https://github.com/gfidente/pelican-svbhack" target="_blank">github</a>. </p>
</div>
</main>
</body>
</html>