<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>MAB other drawbacks</title>
<link rel="stylesheet" href="/theme/css/main.css" />
<!--[if IE]>
<script src="https://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body id="index" class="home">
<header id="banner" class="body">
<h1><a href="/">And yet it moves! </a></h1>
<nav><ul>
<li class="active"><a href="/category/data-processing.html">Data processing</a></li>
</ul></nav>
</header><!-- /#banner -->
<section id="content" class="body">
<article>
<header>
<h1 class="entry-title">
<a href="/MAB-other-drawbacks.html" rel="bookmark"
title="Permalink to MAB other drawbacks"><span class="caps">MAB</span> other drawbacks</a></h1>
</header>
<div class="entry-content">
<footer class="post-info">
<abbr class="published" title="2016-05-19T14:58:00+01:00">
Published: Thu 19 May 2016
</abbr>
<address class="vcard author">
By <a class="url fn" href="/author/gael.html">Gaël</a>
</address>
<p>In <a href="/category/data-processing.html">Data processing</a>.</p>
</footer><!-- /.post-info --> <p>Until now, I have spent time (and <span class="caps">CPU</span> power) to precisely illustrate what the
two methods were designed for:</p>
<ul>
<li>A/B testing efficiently answers the question <em>is there a difference?</em></li>
<li>The <span class="caps">MAB</span> strategy provides the highest click-through rates over
a wider range of sample sizes.</li>
</ul>
<p>As their respective purposes are completely different, thinking of one of them as
intrinsically better than the other is nonsensical, especially as we have shown
that <strong>each</strong> method is clearly better than the other at what it was
designed for. Saying that one is better than the other is like saying that
a screwdriver is better than a hammer: it makes no sense because
each of them is useful for different things.</p>
<p>In this last post, I want to emphasize a few extra drawbacks of the
<span class="caps">MAB</span> strategy, mostly to show some side effects of the technique that were
seldom mentioned in what I read.</p>
<h1 id="the-mab-strategy-does-skew-the-results">The <span class="caps">MAB</span> strategy does skew the results</h1>
<p>Results are skewed when there is a bias in the measurement process:
the “ideal” measured value is no longer equal to the hidden “true” value.</p>
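<p>In symbols, and with a notation that is mine rather than the series’ (<span class="math">\(p\)</span> is the hidden true click-through rate of a variant and <span class="math">\(\hat{p}\)</span> the rate estimated from the collected data), a skewed measurement is one whose bias is not zero:</p>
<p>$$\operatorname{bias}(\hat{p}) = \mathbb{E}[\hat{p}] - p \neq 0$$</p>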
<p>In statistics, there are a few dragons that are known to almost invariably
skew the results: changing the measurement process using the measurements
themselves is one of them.
The problem is that this kind of bias is hard for our brains to grasp
and, oddly enough, it is the very crux of most (false) probability “paradoxes”
(like the Monty Hall problem,
<a href="https://en.wikipedia.org/wiki/Monty_Hall_problem">wiki link</a>).</p>
<p>Back to the original blog post, the blogger claimed:</p>
<blockquote>
<p>Showing the different options at different rates will skew the results.
(No it won’t. You always have an estimate of the click through rate for
each choice)</p>
</blockquote>
<p>That claim is rather strange: having data does not mean these data are not
skewed. Of course, showing different options at different rates won’t
(usually) skew the results… however, as I said, changing these rates
<strong>according</strong> to the measurements most likely will…</p>
<p>So how can I prove that? I already gave a hint earlier in the series in this figure:</p>
<p><img alt="If you can't see the figure, please try another web browser which supports
SVG images" src="/images/difference_vs_sample_size.svg"/></p>
<p>As I said at the time, there seems to be a remaining difference at higher
sample sizes but it is not obvious and really small. I can also plot the
distribution of these differences:</p>
<p><img alt="If you can't see the figure, please try another web browser which supports
SVG images" src="/images/dist_difference_vs_sample_size.svg"/></p>
<p>and the same conclusion applies: <strong>if</strong> there is a skew, it is not really
visible: the two distributions are centered on <span class="math">\(5\%\)</span> and roughly
Gaussian-shaped (which allowed us to calculate and represent the arithmetic
mean as the expected value earlier).</p>
<p>But remember, what <em>usually</em> introduces a skew is altering some probabilities
somewhere using the data themselves as an input. This situation is
likely to occur more often if the difference in click-through rates is small
or even null. I therefore calculated the same kind of distributions but in a
setup with equal click-through rates:</p>
<p><img alt="If you can't see the figure, please try another web browser which supports
SVG images" src="/images/dist_zero_difference_vs_sample_size.svg"/></p>
<p>The Gaussian-shaped distribution suddenly turned into a very strange one:
it is still roughly symmetrical but the
expected value (<span class="math">\(0\%\)</span>) is actually <strong>not</strong> the most probable value anymore.</p>
<p>As we can already guess, we should be able to see what happens more clearly
by calculating the difference between the click-through rate of the variant which
happened to be mostly favored and that of the one which happened to be mostly
disfavored. The resulting distributions are represented in the following figure:</p>
<p><img alt="If you can't see the figure, please try another web browser which supports
SVG images" src="/images/abs_dist_zero_difference_vs_sample_size.svg"/></p>
<p>We can observe that:</p>
<ul>
<li>in the case of A/B testing, the distribution is half of a Gaussian with
its maximal value reached at <span class="math">\(0\%\)</span>, as expected;</li>
<li>in the case of the <span class="caps">MAB</span> strategy, the distribution is not Gaussian anymore
and the maximal value is reached at <span class="math">\(0.4\%\)</span>.</li>
</ul>
<p>The <span class="caps">MAB</span> strategy therefore skews the data themselves by introducing a
difference where there shouldn’t be any.</p>
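<p>The effect can be reproduced with a few lines of Python. This is a minimal sketch, not the code used to produce the figures above. It assumes an epsilon-greedy bandit with two variants sharing the same true click-through rate, and every name and parameter value in it (<code>run_campaign</code>, <code>mean_gap</code>, 1000 visitors, a 10% rate, 300 simulated campaigns) is a choice of mine for illustration. Setting <code>epsilon=1.0</code> is my shorthand for uniform, A/B-style data gathering. The “favoured” variant is taken to be the one shown most often, which is only one possible reading of the figure above.</p>

```python
import random
from statistics import mean


def run_campaign(n_visitors, ctr, epsilon, rng):
    """Simulate one campaign with two variants sharing the same true CTR.

    epsilon is the probability of showing a variant picked at random;
    otherwise the variant with the best measured rate so far is shown.
    epsilon=1.0 therefore mimics uniform, A/B-style data gathering.
    Returns (measured CTR of the most shown variant) - (that of the other).
    """
    shows = [0, 0]
    clicks = [0, 0]
    for _ in range(n_visitors):
        if epsilon > rng.random():
            arm = rng.randrange(2)  # explore: the measurements are ignored
        else:
            # exploit: the measurements decide who is shown next
            # (a variant never shown yet is treated optimistically)
            rates = [clicks[i] / shows[i] if shows[i] else 1.0 for i in range(2)]
            best = max(rates)
            arm = rng.choice([i for i in range(2) if rates[i] == best])
        shows[arm] += 1
        if ctr > rng.random():
            clicks[arm] += 1
    rates = [clicks[i] / shows[i] if shows[i] else 0.0 for i in range(2)]
    favoured = 0 if shows[0] >= shows[1] else 1
    return rates[favoured] - rates[1 - favoured]


def mean_gap(epsilon, runs=300, n_visitors=1000, ctr=0.1, seed=0):
    """Average favoured-minus-disfavoured gap over many simulated campaigns."""
    rng = random.Random(seed)
    return mean(run_campaign(n_visitors, ctr, epsilon, rng) for _ in range(runs))


ab_gap = mean_gap(epsilon=1.0)   # uniform allocation
mab_gap = mean_gap(epsilon=0.1)  # allocation driven by the measurements
print(f"uniform allocation: {ab_gap:+.4f}")
print(f"epsilon-greedy:     {mab_gap:+.4f}")
```

<p>With identical variants, the uniform allocation should give an average gap close to zero, whereas the epsilon-greedy allocation should give a positive one: the variant that ends up favoured is, on average, the one that looks better in the data.</p>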
<p>This is an issue for two reasons:</p>
<ol>
<li>Actually the individual click-through rate distributions are skewed too,
which basically means that we cannot trust any contingency test performed
on the data gathered using the <span class="caps">MAB</span> strategy.</li>
<li>There is no way to know whether a given difference is <strong>actually</strong> a difference.</li>
</ol>
<p>These two extra reasons also justify why applying the <span class="caps">MAB</span> strategy to drug testing
and the like is a very bad idea.</p>
<h1 id="no-test-no-confidence-no-conclusion">No test, no confidence: no conclusion?</h1>
<p>Most people seem to forget completely about the test part of A/B <strong>testing</strong>.
The reasoning here, as seen elsewhere, is that whether or not the test is
positive, what appears to be the best-performing variant would be used anyway.
Therefore the test itself is useless; A/B testing itself is
useless; A/B testing can be replaced with something providing higher
click-through rates.</p>
<p>But the <em>test</em> in A/B testing is actually intended as feedback, a way to
<strong>estimate</strong> your confidence in the results you obtained and make decisions
with both pieces of information. In contrast, the <span class="caps">MAB</span> strategy seems to be
used far more often relying on one’s “good luck”: it is <strong>assumed</strong> that the
population is large enough to eventually provide meaningful results and the
problem is precisely that this is <strong>usually</strong> true.</p>
<p>Think of a successful campaign as a light bulb. When you switch it on, you
expect to get light and this is what <strong>usually</strong> happens.
The <span class="caps">MAB</span> strategy is like
saying “let’s ask a blind person to turn that light on: he will move more easily in
the dark, in that sense he will be more efficient for the job”.
On the other hand, the A/B testing method would be more like saying
“let’s ask a sighted person
to switch it on because he needs to know whether he was successful and report
back”. Yes, in most cases, the light will be on anyway!
However, there is no way to detect that the light did not turn on for whatever
reason in the <span class="caps">MAB</span> strategy.</p>
<p>Mind you, the final and allegedly real results given in the original blog
post do not even pass the contingency test (the maximal <span class="math">\(p\)</span>-value is only <span class="math">\(0.7\)</span>).
So strictly speaking, his conclusions are questionable, especially given that
the <span class="caps">MAB</span> strategy is known to introduce such a difference.</p>
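<p>For readers who have never run one, here is a sketch of a contingency test on two variants. It assumes Pearson’s chi-squared test without continuity correction on a 2x2 table (clicks and non-clicks for each variant), which is one common choice and not necessarily the exact test used in this series. The function name and the click counts are made up for illustration; they are not the figures from the original blog post.</p>

```python
from math import erfc, sqrt


def chi2_contingency_2x2(clicks_a, shows_a, clicks_b, shows_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Returns (chi2, p_value). With one degree of freedom, the p-value is
    erfc(sqrt(chi2 / 2)). Every row and column total must be positive.
    """
    observed = [
        [clicks_a, shows_a - clicks_a],
        [clicks_b, shows_b - clicks_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # count expected in this cell if both variants behaved the same
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical campaigns with 1000 visitors per variant.
chi2_clear, p_clear = chi2_contingency_2x2(100, 1000, 150, 1000)
chi2_close, p_close = chi2_contingency_2x2(100, 1000, 105, 1000)
print(f"100 vs 150 clicks: chi2={chi2_clear:.2f}, p={p_clear:.4f}")
print(f"100 vs 105 clicks: chi2={chi2_close:.2f}, p={p_close:.4f}")
```

<p>A small <span class="math">\(p\)</span>-value means the observed difference would be unlikely if both variants had the same click-through rate. A large one means the data give no ground to claim a difference.</p>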
<h1 id="conclusion">Conclusion</h1>
<h2 id="other-drawbacks">Other drawbacks</h2>
<p>There are a few moot points that I did not mention, such as the effect of
time-varying click-through rates, the effect of the lack of an equally-sampled
control group, the implications of the actual overlap of the click-through rate
distributions depending on the method used, the relative spread of the
distributions, etc. I decided not to include them because the most disastrous
outcomes only occur under stringent (albeit not that rare) conditions.
I did not want to weaken the whole series because some would dismiss these
arguments saying “what are the odds?”.</p>
<h2 id="original-post">Original post</h2>
<p>I’ve generally not spoken about the original blog itself throughout
the series. The main reason is that everyone has the right not to be completely
right, and nitpicking each moot point would not have been fair.</p>
<p>That said, it seems important to indicate that if one does not want to talk
about actual A/B <strong>testing</strong>, but about the A/B testing <strong>data-gathering</strong> strategy, it would
have been less misleading to talk about <em><span class="caps">MAB</span> with <span class="math">\(\varepsilon = 0\%\)</span></em>
vs. <em><span class="caps">MAB</span> with <span class="math">\(\varepsilon = 90\%\)</span></em> as it is rigorously what is compared in
that post.</p>
<p>Also, A/B testing was not designed in a set-and-forget state of mind: there
are two different stages (<em>testing</em> and <em>exploitation</em>) that must be used. In
particular, conflating the two as done in the original post is misleading.</p>
<p>Finally, sorry to raise that specific point but it illustrates well the degree
of understanding of the original blogger:</p>
<blockquote>
<p>In the epsilon-first strategy, you can explore 100% of the time in
the beginning and once you have a good sample, switch to pure-greedy.</p>
</blockquote>
<p>is an <strong>exact and accurate</strong> description of what A/B testing is (assuming the
“good sample” part is assessed with a contingency test).</p>
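<p>To make the two stages explicit, here is a minimal sketch of the epsilon-first idea. It rests on assumptions of mine: the testing stage alternates between the variants to get equally-sized groups, the “good sample” is simply a fixed number of visitors (<code>n_explore</code>), and the leader is picked once and kept, as an A/B campaign would do. All names and numbers are hypothetical. A real campaign would run a contingency test at the point where the winner is picked.</p>

```python
import random


def epsilon_first(n_visitors, n_explore, true_ctrs, seed=0):
    """Test on the first n_explore visitors, then always show the leader.

    Returns (shows, clicks, winner), where winner is the index of the
    variant kept for the exploitation stage (None if it never started).
    """
    n_arms = len(true_ctrs)
    if n_arms > n_explore:
        raise ValueError("every variant must be shown at least once")
    rng = random.Random(seed)
    shows = [0] * n_arms
    clicks = [0] * n_arms
    winner = None
    for visitor in range(n_visitors):
        if n_explore > visitor:
            arm = visitor % n_arms  # testing stage: equally-sized groups
        else:
            if winner is None:
                # end of the testing stage: this is where the contingency
                # test would tell whether the sample is good enough
                rates = [clicks[i] / shows[i] for i in range(n_arms)]
                winner = rates.index(max(rates))
            arm = winner  # exploitation stage: only the winner is shown
        shows[arm] += 1
        if true_ctrs[arm] > rng.random():
            clicks[arm] += 1
    return shows, clicks, winner


# Hypothetical campaign: 5% vs 10% true rates, 2000 visitors for testing.
shows, clicks, winner = epsilon_first(10_000, 2_000, [0.05, 0.10])
print("shows per variant:", shows)
print("variant kept for exploitation:", winner)
```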
<p>I won’t go further down that road: the point is made and not knowing that tiny
bit is a shame for someone who blogged about it. But it is perfectly
understandable and fine (it happens to everyone and I guess that series of
posts does contain its share of inaccuracies and incomplete understanding).
In particular, this does not make him a bad developer, far from it.</p>
<h2 id="of-hammers-and-screwdrivers">Of hammers and screwdrivers</h2>
<p>As surprising as it may appear, A/B testing and the multi-armed bandit strategy
were designed for two completely different purposes:</p>
<ul>
<li>The purpose of A/B testing is to determine whether or not there is a
difference and provide that answer (with the uncertainties) through a
statistical test.</li>
<li>The purpose of the multi-armed bandit strategy is to maximize the reward
over a large range of sample sizes. Usually it can (and rigorously should)
be used in a set-and-forget mode.</li>
</ul>
<p>These methods are just like a hammer and a screwdriver: both can be used to do
nearly the same thing, yet cannot really be freely swapped.</p>
<h2 id="when-should-i-choose-ab-testing">When should I choose A/B testing?</h2>
<ul>
<li>When you need to really determine which variant is better (e.g. drug trials).</li>
<li>When you need to estimate the difference precisely (e.g. multivariate analysis).</li>
<li>When you need the shortest testing period possible for a given confidence
in the results.</li>
<li>When you need to know the uncertainties on the results.</li>
<li>When you need to assess that there is actually no difference.</li>
</ul>
<h2 id="when-should-i-use-the-mab-strategy">When should I use the <span class="caps">MAB</span> strategy?</h2>
<ul>
<li>When being sometimes wrong without any indication of it is not that important.</li>
<li>When you cannot set up an optimal A/B testing campaign because you lack too
much information (minimum population size, estimate of the difference, etc.).</li>
</ul>
<h2 id="the-end">The End?</h2>
<p>What this post series lacks (like most posts out there) is references: I am
pretty sure that all this work and far, far more has already been done by people
out there. Though I cannot personally recommend it (as I have not read it), I
know there is a short <a href="http://shop.oreilly.com/product/0636920027393.do?sortby=publicationDate">O’Reilly book</a>
on the subject of bandit algorithms in general. I also know that there are a
lot of references in Google Scholar about both methods.</p>
<p>Literature scanning is not just for scientists: it is necessary to efficiently
reuse the knowledge humankind already gathered on the subject in order to
eventually avoid the traps, pitfalls and inefficiencies caused by starting from
scratch. I am sure a lot of great minds already worked on this; this is what
you should read, not some random blog posts on the internet, if you really want
to do things correctly.</p>
<p>As far as this blog is concerned, it just started out as a few tests and it is
mainly intended to show once again that the world is not just black and white
by testing some of the original blogger’s claims.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
var location_protocol = (false) ? 'https' : document.location.protocol;
if (location_protocol !== 'http' && location_protocol !== 'https') location_protocol = 'https:';
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = location_protocol + '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>
</div><!-- /.entry-content -->
</article>
</section>
<section id="extras" class="body">
</section><!-- /#extras -->
<footer id="contentinfo" class="body">
<address id="about" class="vcard body">
Proudly powered by <a href="http://getpelican.com/">Pelican</a>, which takes great advantage of <a href="http://python.org">Python</a>.
</address><!-- /#about -->
<p>The theme is by <a href="http://coding.smashingmagazine.com/2009/08/04/designing-a-html-5-layout-from-scratch/">Smashing Magazine</a>, thanks!</p>
</footer><!-- /#contentinfo -->
</body>
</html>