<!DOCTYPE html>
<html lang="zh-cmn-Hans">
<head>
<!-- hexo-inject:begin --><!-- hexo-inject:end --><meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>nameoverflow</title>
<!--link rel="stylesheet" href="//cdn.jsdelivr.net/highlight.js/9.10.0/styles/github-gist.min.css"-->
<link rel="stylesheet" href="//cdn.jsdelivr.net/highlight.js/9.10.0/styles/github-gist.min.css">
<link rel="stylesheet" href="/css/style.css"><!-- hexo-inject:begin --><link rel='stylesheet' href='https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.5.1/katex.min.css'><!-- hexo-inject:end -->
</head>
<body>
<!-- hexo-inject:begin --><!-- hexo-inject:end --><div class="Shell">
<aside class='SideBar'>
<section class='avatar' style="background-image: url(/assets/header.png)">
<div class='av-pic' style="background-image: url(/assets/tree_small.png)">
</div>
</section>
<section class='menu'>
<div>nameoverflow</div>
<div>What the f__k?</div>
<ul>
<a href="/" class="Btn">
<li>Home</li>
</a>
<a href="/archives/" class="Btn">
<li>Archive</li>
</a>
<a href="/tags/" class="Btn">
<li>Tags</li>
</a>
<a href="/about/" class="Btn">
<li>About</li>
</a>
</ul>
</section>
<section class="media">
<a href="https://github.com/nameoverflow">
<img src="/assets/github.svg" />
</a>
<a href="https://www.facebook.com/profile.php?id=100004252391322">
<img src="/assets/facebook.svg" />
</a>
</section>
</aside>
<div class="container">
<div data-pager-shell>
<ul class="Index">
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2019/09/06/bayesian-neural-network/">Bayesian Neural Networks</a>
</h1>
<div class='ListMeta'>
<time datetime="2019-09-06T03:45:14.000Z" itemprop="datePublished">
2019-09-06
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/neural-network/">neural network</a> }
</li>
<li class="meta-text">
{ <a href="/tags/deep-learning/">deep learning</a> }
</li>
<li class="meta-text">
{ <a href="/tags/variational-inference/">variational inference</a> }
</li>
</ul>
</div>
</header>
<div>
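<link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css"><p>Put simply, a Bayesian neural network regularizes a neural network by introducing uncertainty over its weights. Equivalently, it makes predictions with an ensemble of infinitely many networks whose weights are drawn from a distribution.</p>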
<p>This post is mainly based on Blundell et al. 2015<sup id="fnref:1"><a href="#fn:1" rel="footnote"><span class="hint--top hint--error hint--medium hint--rounded hint--bounce" aria-label="https://arxiv.org/abs/1505.05424
">[1]</span></a></sup>.</p>
<p>Also published on <a href="https://zhuanlan.zhihu.com/p/81170602" target="_blank" rel="external">Zhihu</a>.</p>
<h2 id="神经网络的概率模型"><a href="#神经网络的概率模型" class="headerlink" title="Neural networks as probabilistic models"></a>Neural networks as probabilistic models</h2><p>A neural network can be viewed as a conditional distribution model <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">y</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">x</mi></mrow><mo separator="true">,</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{y}|\mathbf{x},\mathbf{w})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathbf">x</span></span><span class="mpunct">,</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span></span>: given an input <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">x</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{x}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf">x</span></span></span></span></span>, it outputs a distribution over the predicted value <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">y</mi></mrow></mrow><annotation 
encoding="application/x-tex">\mathbf{y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.63888em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span></span></span></span>, where <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span> denotes the weights of the network. In classification, this distribution gives the probability of each class. In regression, it is usually taken to be a Gaussian with a fixed standard deviation, and its mean serves as the prediction. Accordingly, training a neural network can be viewed as maximum likelihood estimation (MLE):</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msup><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mi mathvariant="normal">M</mi><mi mathvariant="normal">L</mi><mi mathvariant="normal">E</mi></mrow></msup></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>max</mi><mrow><mi mathvariant="bold">w</mi></mrow></msub><mi>log</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>max</mi><mrow><mi mathvariant="bold">w</mi></mrow></msub><msub><mo>∑</mo><mi>i</mi></msub><mi>log</mi><mi>P</mi><mo>(</mo><msub><mrow><mi mathvariant="bold">y</mi></mrow><mi>i</mi></msub><mi mathvariant="normal">∣</mi><msub><mrow><mi mathvariant="bold">x</mi></mrow><mi>i</mi></msub><mo separator="true">,</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned} \mathbf{w}^\mathrm{MLE}&=\arg\max_\mathbf{w}\log P(\mathcal{D}|\mathbf{w})\\
&=\arg\max_\mathbf{w}\sum_i\log P(\mathbf{y}_i|\mathbf{x}_i,\mathbf{w}) \end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.2095025em;"></span><span class="strut bottom" style="height:3.919005em;vertical-align:-1.7095024999999997em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-1.3181715000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">M</span><span class="mord mathrm">L</span><span class="mord mathrm">E</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span><span style="top:0.43183349999999976em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-1.3181715000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mop">ar<span 
style="margin-right:0.01389em;">g</span></span><span class="mop op-limits"><span class="vlist"><span style="top:0.6em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="mop">max</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span><span style="top:0.43183349999999976em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mop op-limits"><span class="vlist"><span style="top:0.6em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span><span class="mop">max</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mop op-limits"><span class="vlist"><span style="top:1.1776689999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span style="top:-0.000005000000000143778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="op-symbol large-op mop">∑</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span><span class="vlist"><span style="top:0.24444em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf">x</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span class="baseline-fix"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>Here <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="script">D</mi></mrow></mrow><annotation encoding="application/x-tex">\mathcal{D}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span></span></span></span> is the dataset used for training. In regression, substituting a Gaussian distribution yields the mean squared error (MSE); in classification, substituting the logistic (or, for multiple classes, softmax) function yields the cross-entropy. The minimum of the resulting negative log-likelihood loss is usually found by gradient descent, implemented with back-propagation (BP).<br>MLE makes no assumption about the prior probability of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span>; in other words, every value of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span> is considered equally likely. Introducing a prior on <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi 
mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span> turns the problem into maximum a posteriori (MAP) estimation:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msup><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mi mathvariant="normal">M</mi><mi mathvariant="normal">A</mi><mi mathvariant="normal">P</mi></mrow></msup></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>max</mi><mrow><mi mathvariant="bold">w</mi></mrow></msub><mi>log</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>max</mi><mrow><mi mathvariant="bold">w</mi></mrow></msub><mi>log</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>+</mo><mi>log</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned} \mathbf{w}^\mathrm{MAP}&=\arg\max_\mathbf{w}\log P(\mathbf{w}|\mathcal{D})\\
&=\arg\max_\mathbf{w}\log P(\mathcal{D}|\mathbf{w}) + \log P(\mathbf{w}) \end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.8156655000000002em;"></span><span class="strut bottom" style="height:3.1313310000000003em;vertical-align:-1.3156655000000002em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.9243345000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">M</span><span class="mord mathrm">A</span><span class="mord mathrm">P</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span><span style="top:0.6156655000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-0.9243345000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span 
class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mop op-limits"><span class="vlist"><span style="top:0.6em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="mop">max</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span><span style="top:0.6156655000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mop op-limits"><span class="vlist"><span style="top:0.6em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span><span class="mop">max</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mbin">+</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span>
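<p>Substituting a Gaussian prior yields L2 regularization (which favors small weights), while substituting a Laplace prior yields L1 regularization (which favors weights of exactly 0, making the weights sparse).</p>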
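<p>The link between a Gaussian prior and L2 regularization can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a one-parameter linear model y = w·x with Gaussian noise, and the names <code>sigma</code> (noise standard deviation), <code>tau</code> (prior standard deviation) and <code>minimize_1d</code> are illustrative choices. It minimizes the negative log-likelihood and the negative log-posterior directly, then compares the results with the closed-form least-squares and ridge solutions.</p>

```python
import math
import random


def neg_log_likelihood(w, xs, ys, sigma):
    """Negative Gaussian log-likelihood of y_i ~ N(w * x_i, sigma^2)."""
    n = len(xs)
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return sse / (2 * sigma ** 2) + n * math.log(sigma * math.sqrt(2 * math.pi))


def neg_log_prior(w, tau):
    """Negative log-density of the (assumed) Gaussian prior w ~ N(0, tau^2)."""
    return w ** 2 / (2 * tau ** 2) + math.log(tau * math.sqrt(2 * math.pi))


def minimize_1d(f, lo=-10.0, hi=10.0, iters=200):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(a) < f(b):
            hi = b  # the minimum lies in [lo, b]
        else:
            lo = a  # the minimum lies in [a, hi]
    return (lo + hi) / 2


# Synthetic data for illustration only: true weight 2.0, Gaussian noise.
random.seed(0)
sigma, tau = 0.5, 1.0
xs = [random.uniform(-1, 1) for _ in range(20)]
ys = [2.0 * x + random.gauss(0, sigma) for x in xs]

# MLE: minimize the negative log-likelihood only.
w_mle = minimize_1d(lambda w: neg_log_likelihood(w, xs, ys, sigma))
# MAP: minimize negative log-likelihood plus negative log-prior.
w_map = minimize_1d(
    lambda w: neg_log_likelihood(w, xs, ys, sigma) + neg_log_prior(w, tau)
)

# Closed forms: ordinary least squares, and ridge with lambda = sigma^2 / tau^2.
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
w_least_squares = sxy / sxx
w_ridge = sxy / (sxx + sigma ** 2 / tau ** 2)

print("MLE:", w_mle, "least squares:", w_least_squares)
print("MAP:", w_map, "ridge:", w_ridge)
```

<p>In this sketch the L2 penalty strength is sigma²/tau²: a narrower prior (smaller <code>tau</code>) penalizes large weights more strongly, so the MAP estimate shrinks further toward 0 relative to the MLE.</p>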
<h2 id="贝叶斯起来了!"><a href="#贝叶斯起来了!" class="headerlink" title="Going Bayesian!"></a>Going Bayesian!</h2><p>Bayesian estimation also introduces a prior. Unlike MAP, it computes the full posterior distribution <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{w}|\mathcal{D})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span> of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span> rather than only its <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>arg</mi><mi>max</mi></mrow><annotation encoding="application/x-tex">\arg\max</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" 
style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mop">max</span></span></span></span>, which lets us attach uncertainty to the network's predictions. Because what we obtain is a distribution over <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.44444em;"></span><span class="strut bottom" style="height:0.44444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span>, the probabilistic model for predicting from an input <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mover accent="true"><mrow><mrow><mi mathvariant="bold">x</mi></mrow></mrow><mo>^</mo></mover></mrow><annotation encoding="application/x-tex">\hat{\mathbf{x}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.70788em;"></span><span class="strut bottom" style="height:0.70788em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathbf">x</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> the output <span class="katex"><span 
class="katex-mathml"><math><semantics><mrow><mover accent="true"><mrow><mrow><mi mathvariant="bold">y</mi></mrow></mrow><mo>^</mo></mover></mrow><annotation encoding="application/x-tex">\hat{\mathbf{y}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.70788em;"></span><span class="strut bottom" style="height:0.90232em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> becomes:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>P</mi><mo>(</mo><mover accent="true"><mrow><mrow><mi mathvariant="bold">y</mi></mrow></mrow><mo>^</mo></mover><mi mathvariant="normal">∣</mi><mover accent="true"><mrow><mrow><mi mathvariant="bold">x</mi></mrow></mrow><mo>^</mo></mover><mo>)</mo><mo>=</mo><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow></msub><mo>[</mo><mi>P</mi><mo>(</mo><mover accent="true"><mrow><mrow><mi mathvariant="bold">y</mi></mrow></mrow><mo>^</mo></mover><mi mathvariant="normal">∣</mi><mover accent="true"><mrow><mrow><mi mathvariant="bold">x</mi></mrow></mrow><mo>^</mo></mover><mo separator="true">,</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
P(\hat{\mathbf{y}}|\hat{\mathbf{x}})=\mathbb{E}_{P(\mathbf{w}|\mathcal{D})}[P(\hat{\mathbf{y}}|\hat{\mathbf{x}},\mathbf{w})]
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8500000000000001em;"></span><span class="strut bottom" style="height:1.2000000000000002em;vertical-align:-0.35000000000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.010000000000000009em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle cramped"><span class="mord displaystyle textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle cramped"><span class="mord displaystyle textstyle cramped"><span class="mord mathbf">x</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span><span class="mrel">=</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span class="vlist"><span style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord scriptstyle cramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle cramped"><span class="mord displaystyle textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle cramped"><span class="mord displaystyle textstyle cramped"><span class="mord mathbf">x</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>So every prediction of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mover accent="true"><mrow><mrow><mi mathvariant="bold">y</mi></mrow></mrow><mo>^</mo></mover></mrow><annotation encoding="application/x-tex">\hat{\mathbf{y}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.70788em;"></span><span class="strut bottom" style="height:0.90232em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">y</span></span></span></span><span style="top:-0.013440000000000007em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>^</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> requires taking an expectation. The catch is that we cannot actually compute this expectation, because doing so amounts to evaluating the predictions of every possible neural network under <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{w}|\mathcal{D})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span>.<br>On the other hand, obtaining the posterior <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{w}|\mathcal{D})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span> is troublesome in its own right. By Bayes' theorem, <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{w}|\mathcal{D})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span> is given by:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo><mo>=</mo><mfrac><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo separator="true">,</mo><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow></mfrac><mo>=</mo><mfrac><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow></mfrac></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
P(\mathbf{w}|\mathcal{D})= \frac{P(\mathbf{w},\mathcal{D})}{P(\mathcal{D})} =\frac{P(\mathcal{D}|\mathbf{w})P(\mathbf{w})}{P(\mathcal{D})}
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.4315em;"></span><span class="strut bottom" style="height:2.363em;vertical-align:-0.9315000000000001em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.0044999999999999485em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle cramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span><span style="top:-0.2300000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span 
class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mpunct">,</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle cramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span><span style="top:-0.2300000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathcal" 
style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>This is intractable as well, because the evidence in the denominator requires integrating over all possible weights.</p>
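<p>So, to bring Bayesian inference into neural networks, we need a way to approximate these quantities, ideally by recasting them as an optimization problem, which is far more to the taste of us deep learning "alchemists".</p>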
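<p>To make the intractability concrete, here is a minimal sketch (not part of the original derivation) that computes the posterior and the predictive expectation by brute force for a toy model with a single weight. The model (a one-weight logistic classifier with a standard normal prior), the data points, and all function names are assumptions chosen for illustration only.</p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, data):
    # Assumed toy model: P(y=1 | x, w) = sigmoid(w * x), one scalar weight.
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x)
        total += math.log(p if y == 1 else 1.0 - p)
    return total

def log_prior(w, sigma=1.0):
    # Assumed prior: w ~ N(0, sigma^2).
    return -0.5 * (w / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def posterior_on_grid(data, lo=-10.0, hi=10.0, n=2001):
    # Evaluate the joint P(D|w)P(w) on a grid and normalise by the evidence P(D).
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    joint = [math.exp(log_likelihood(w, data) + log_prior(w)) for w in grid]
    evidence = sum(joint) * step  # Riemann-sum approximation of P(D)
    posterior = [j / evidence for j in joint]
    return grid, posterior, step, evidence

def predictive(x_new, grid, posterior, step):
    # E_{P(w|D)}[P(y=1 | x_new, w)], again approximated by a Riemann sum.
    return sum(sigmoid(w * x_new) * p for w, p in zip(grid, posterior)) * step

def grid_points(num_weights, n=2001):
    # The same grid over d weights needs n**d evaluations.
    return n ** num_weights

# Hypothetical (x, y) pairs, used only to exercise the functions.
data = [(1.0, 1), (2.0, 1), (-1.0, 0), (0.5, 1), (-2.0, 0)]
grid, posterior, step, evidence = posterior_on_grid(data)
p_new = predictive(1.5, grid, posterior, step)
```

<p>With one weight the grid has 2001 points; with only ten weights the same grid would already need 2001 to the power of 10 evaluations, which is why this direct approach does not scale to a neural network.</p>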
<h2 id="变分估计"><a href="#变分估计" class="headerlink" title="变分估计"></a>Variational Inference</h2><p>With a variational method, we can use a distribution <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">q(\mathbf{w}|\theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span>, controlled by a set of parameters <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>θ</mi></mrow><annotation encoding="application/x-tex">\theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span></span>, to approximate the true posterior <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{w}|\mathcal{D})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span></span></span></span>. With a Gaussian approximation, for example, <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>θ</mi></mrow><annotation encoding="application/x-tex">\theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span></span> is <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mo>(</mo><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">(\mu,\sigma)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mopen">(</span><span class="mord mathit">μ</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="mclose">)</span></span></span></span>. This turns the problem of finding the posterior into an optimization problem: finding the best <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>θ</mi></mrow><annotation encoding="application/x-tex">\theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span></span>. We do this by minimizing the Kullback-Leibler (KL) divergence between the two distributions:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msup><mi>θ</mi><mo>∗</mo></msup></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>min</mi><mi>θ</mi></msub><msub><mi>D</mi><mrow><mi mathvariant="normal">K</mi><mi mathvariant="normal">L</mi></mrow></msub><mo>[</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi mathvariant="normal">∣</mi><mi mathvariant="normal">∣</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="script">D</mi></mrow><mo>)</mo><mo>]</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>min</mi><mi>θ</mi></msub><mo>∫</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi>log</mi><mfrac><mrow><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow></mfrac><mi>d</mi><mrow><mi mathvariant="bold">w</mi></mrow></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>arg</mi><msub><mi>min</mi><mi>θ</mi></msub><msub><mi>D</mi><mrow><mi mathvariant="normal">K</mi><mi mathvariant="normal">L</mi></mrow></msub><mo>[</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi mathvariant="normal">∣</mi><mi mathvariant="normal">∣</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo><mo>−</mo><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>q</mi><mo>(</mo><mrow><mi 
mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow></msub><mo>[</mo><mi>log</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
\theta^*&=\arg\min_\theta D_\mathrm{KL}[q(\mathbf{w}|\theta)||P(\mathbf{w}|\mathcal{D})]\\
&=\arg\min_\theta \int q(\mathbf{w}|\theta)\log \frac{q(\mathbf{w}|\theta)}{P(\mathbf{w})P(\mathcal{D}|\mathbf{w})} d\mathbf{w}\\
&=\arg\min_\theta D_\mathrm{KL}[q(\mathbf{w}|\theta)||P(\mathbf{w})] - \mathbb{E}_{q(\mathbf{w}|\theta)}[\log P(\mathcal{D}|\mathbf{w})]
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:3.0236079999999994em;"></span><span class="strut bottom" style="height:5.547215999999999em;vertical-align:-2.523608em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-2.1836079999999995em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span><span style="top:-0.0044999999999998375em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:1.7715em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-2.1836079999999995em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span 
class="mop op-limits"><span class="vlist"><span style="top:0.6521079999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-2.7755575615628914e-17em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="mop">min</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">D</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathrm">K</span><span class="mord mathrm">L</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mord mathrm">∣</span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord 
mathcal" style="margin-right:0.02778em;">D</span></span><span class="mclose">)</span><span class="mclose">]</span></span></span><span style="top:-0.0044999999999998375em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mop op-limits"><span class="vlist"><span style="top:0.6521079999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-2.7755575615628914e-17em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="mop">min</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="op-symbol large-op mop" style="margin-right:0.44445em;top:-0.0011249999999999316em;">∫</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span 
class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle cramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class="mord textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span></span><span style="top:-0.2300000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord mathit">d</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span><span style="top:1.7715em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mop op-limits"><span class="vlist"><span style="top:0.6521079999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-2.7755575615628914e-17em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="mop">min</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">D</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathrm">K</span><span class="mord mathrm">L</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mord mathrm">∣</span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span 
class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span><span class="mbin">−</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span class="vlist"><span style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span>
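<p>The derivation above drops the constant log P(D), which does not depend on θ. As a sanity check, the following sketch verifies numerically that the KL divergence to the posterior equals the last line of the derivation plus log P(D). It assumes a toy conjugate model that is not in the original text (prior w ~ N(0, 1), likelihood y ~ N(w, 1), Gaussian q with parameters (mu, s)), where every term has a closed form; the data values and function names are hypothetical.</p>

```python
import math

def kl_gauss(m1, s1, m2, s2):
    # Closed-form KL[N(m1, s1^2) || N(m2, s2^2)].
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def log_normal_pdf(x, m, s):
    return -0.5 * math.log(2 * math.pi * s ** 2) - (x - m) ** 2 / (2 * s ** 2)

# Hypothetical observations for the assumed model y_i ~ N(w, 1), w ~ N(0, 1).
ys = [0.8, 1.3, 0.4, 1.1]
n = len(ys)

# Exact posterior of the conjugate model: N(post_mean, post_std^2).
post_var = 1.0 / (1.0 + n)
post_mean = sum(ys) * post_var
post_std = math.sqrt(post_var)

def expected_log_lik(mu, s):
    # E_q[log P(D|w)] for q = N(mu, s^2), using E_q[(y - w)^2] = (y - mu)^2 + s^2.
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * ((y - mu) ** 2 + s ** 2)
               for y in ys)

# log P(D) = log P(D|w) + log P(w) - log P(w|D), evaluated at w = 0.
log_evidence = (sum(log_normal_pdf(y, 0.0, 1.0) for y in ys)
                + log_normal_pdf(0.0, 0.0, 1.0)
                - log_normal_pdf(0.0, post_mean, post_std))

def objective(mu, s):
    # Last line of the derivation: KL[q || prior] - E_q[log P(D|w)].
    return kl_gauss(mu, s, 0.0, 1.0) - expected_log_lik(mu, s)

def kl_to_posterior(mu, s):
    # First line of the derivation: KL[q || P(w|D)].
    return kl_gauss(mu, s, post_mean, post_std)

# The two sides differ only by the constant log P(D).
gap = kl_to_posterior(0.3, 0.7) - (objective(0.3, 0.7) + log_evidence)
```

<p>Because the two expressions differ only by a constant, minimizing the last line over (mu, s) yields the same optimum as minimizing the KL divergence to the true posterior, without ever evaluating P(D).</p>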
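<p>The last form of the derivation is much easier to work with than the original KL divergence against the posterior. Written as an objective function, it is:</p>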
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mrow><mi mathvariant="script">F</mi></mrow><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo><mo>=</mo><msub><mi>D</mi><mrow><mi mathvariant="normal">K</mi><mi mathvariant="normal">L</mi></mrow></msub><mo>[</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi mathvariant="normal">∣</mi><mi mathvariant="normal">∣</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo><mo>−</mo><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow></msub><mo>[</mo><mi>log</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo></mrow></mtd></mtr></mtable><mtext> </mtext><mtext> </mtext><mtext> </mtext><mtext> </mtext><mtext> </mtext><mo>(</mo><mn>1</mn><mo>)</mo></mrow><annotation encoding="application/x-tex">\begin{aligned}
\mathcal{F}(\mathcal{D},\theta)=D_\mathrm{KL}[q(\mathbf{w}|\theta)||P(\mathbf{w})]-\mathbb{E}_{q(\mathbf{w}|\theta)}[\log P(\mathcal{D}|\mathbf{w})]
\end{aligned}\ \ \ \ \ (1)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8500000000000001em;"></span><span class="strut bottom" style="height:1.2000000000000002em;vertical-align:-0.35000000000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.010000000000000009em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">D</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathrm">K</span><span class="mord mathrm">L</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span 
class="mclose">)</span><span class="mord mathrm">∣</span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span><span class="mbin">−</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span class="vlist"><span style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span><span class="mord mspace"> 
</span><span class="mord mspace"> </span><span class="mord mspace"> </span><span class="mord mspace"> </span><span class="mord mspace"> </span><span class="mopen">(</span><span class="mord mathrm">1</span><span class="mclose">)</span></span></span></span></span>
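<p>This still cannot be computed exactly, but at least it looks more like something computable. The first term is the KL divergence between our variational posterior and the prior; the second term depends on the training data. The first term is called the complexity cost and describes how well the weights agree with the prior; the second is called the likelihood cost and describes how well the model fits the samples. Optimizing this objective can be seen as the kind of regularization that deep learning "alchemists" know best: striking a balance between the two costs.</p>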
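<p>To make the trade-off between the two costs concrete, here is a minimal sketch under assumptions that are not part of the text above: a toy model with a single weight, a standard normal prior, a Gaussian variational posterior, and observations drawn as <code>y_i ~ N(w, 1)</code>. In this special case both terms of (1) have closed forms, and the exact posterior is itself Gaussian, so the objective should be smallest there. The function names and the data are hypothetical.</p>

```python
import math

def complexity_cost(mu, sigma):
    """Closed-form KL[N(mu, sigma^2) || N(0, 1)] for one weight (assumed prior)."""
    return -math.log(sigma) + 0.5 * (sigma ** 2 + mu ** 2) - 0.5

def likelihood_cost(mu, sigma, ys):
    """-E_q[log P(D | w)] for the assumed toy likelihood y_i ~ N(w, 1)."""
    # E_q[(y - w)^2] = (y - mu)^2 + sigma^2 when w ~ N(mu, sigma^2)
    return sum(
        0.5 * math.log(2 * math.pi) + 0.5 * ((y - mu) ** 2 + sigma ** 2)
        for y in ys
    )

def objective(mu, sigma, ys):
    """Objective (1): complexity cost plus likelihood cost."""
    return complexity_cost(mu, sigma) + likelihood_cost(mu, sigma, ys)

ys = [1.2, 0.8, 1.5, 0.9]          # made-up observations
n = len(ys)

# Exact posterior of this conjugate toy model: precision 1 + n, mean sum(y) / (1 + n)
post_mu = sum(ys) / (n + 1)
post_sigma = math.sqrt(1.0 / (n + 1))

f_post = objective(post_mu, post_sigma, ys)
f_other = objective(post_mu + 0.3, post_sigma * 2.0, ys)
print(f_post, f_other)             # the first value should be the smaller one
```

<p>Moving the variational parameters away from the exact posterior raises the objective: a narrower or shifted distribution pays either in complexity cost or in likelihood cost.</p>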
<p>For the form of the prior <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo></mrow><annotation encoding="application/x-tex">P(\mathbf{w})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span></span></span></span>, a scale mixture Gaussian prior is used:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>=</mo><msub><mo>∏</mo><mrow><mi>j</mi></mrow></msub><mi>π</mi><mrow><mi mathvariant="script">N</mi></mrow><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mi>j</mi></mrow></msub><mi mathvariant="normal">∣</mi><mn>0</mn><mo separator="true">,</mo><msubsup><mi>σ</mi><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></msubsup><mo fence="true">)</mo></mrow><mo>+</mo><mo>(</mo><mn>1</mn><mo>−</mo><mi>π</mi><mo>)</mo><mrow><mi mathvariant="script">N</mi></mrow><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mi>j</mi></mrow></msub><mi mathvariant="normal">∣</mi><mn>0</mn><mo separator="true">,</mo><msubsup><mi>σ</mi><mrow><mn>2</mn></mrow><mrow><mn>2</mn></mrow></msubsup><mo fence="true">)</mo></mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
P(\mathbf{w})=\prod_{j} \pi \mathcal{N}\left(\mathbf{w}_{j} | 0, \sigma_{1}^{2}\right)+(1-\pi) \mathcal{N}\left(\mathbf{w}_{j} | 0, \sigma_{2}^{2}\right)
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.481891em;"></span><span class="strut bottom" style="height:2.463782em;vertical-align:-0.9818910000000001em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.431886em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mrel">=</span><span class="mop op-limits"><span class="vlist"><span style="top:1.177669em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span style="top:-0.000005000000000032756em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="op-symbol large-op mop">∏</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.03588em;">π</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.14736em;">N</span></span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size1">(</span></span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord 
mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord mathrm">0</span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="vlist"><span style="top:0.247em;margin-left:-0.03588em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathrm">1</span></span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">2</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size1">)</span></span></span><span class="mbin">+</span><span class="mopen">(</span><span class="mord mathrm">1</span><span class="mbin">−</span><span class="mord mathit" style="margin-right:0.03588em;">π</span><span class="mclose">)</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.14736em;">N</span></span><span class="minner displaystyle textstyle 
uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size1">(</span></span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord mathrm">0</span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="vlist"><span style="top:0.247em;margin-left:-0.03588em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathrm">2</span></span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">2</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size1">)</span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span>
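<p>That is, the prior on each weight is a mixture of two Gaussians with the same mean (zero) and different standard deviations.</p>
<p>The next step is to keep approximating the objective until it can actually be computed.</p>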
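<p>As a rough illustration of this prior, the sketch below evaluates its log-density for a list of weights. The hyperparameter values (<code>pi=0.5</code>, <code>sigma1=1.0</code>, <code>sigma2=0.1</code>) and the function names are assumptions chosen for the example, not values taken from the text.</p>

```python
import math

def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_scale_mixture_prior(weights, pi=0.5, sigma1=1.0, sigma2=0.1):
    """log P(w) = sum_j log[pi * N(w_j | 0, sigma1^2) + (1 - pi) * N(w_j | 0, sigma2^2)]."""
    total = 0.0
    for w in weights:
        mixture = pi * normal_pdf(w, sigma1) + (1 - pi) * normal_pdf(w, sigma2)
        total += math.log(mixture)
    return total

small = log_scale_mixture_prior([0.0, 0.05, -0.02])
large = log_scale_mixture_prior([2.0, -1.5, 3.0])
print(small, large)  # weights near zero receive a much higher log-prior
```

<p>The narrow component concentrates mass near zero while the wide one keeps heavier tails, so large weights are penalized but not ruled out. A production implementation would more likely combine the two components with a log-sum-exp for numerical stability.</p>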
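<h2 id="遇事不决,蒙特卡罗"><a href="#遇事不决,蒙特卡罗" class="headerlink" title="遇事不决,蒙特卡罗"></a>When in Doubt, Monte Carlo</h2><p>The Monte Carlo method is practically written into the DNA of deep learning "alchemists". Equation (1) contains an expectation that is hard to evaluate, and this well-loved approach can be used to estimate it.</p>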
<p>As is well known, the Variational Auto-Encoder (VAE)<sup id="fnref:2"><a href="#fn:2" rel="footnote"><span class="hint--top hint--error hint--medium hint--rounded hint--bounce" aria-label="https://arxiv.org/abs/1312.6114
">[2]</span></a></sup>, which is likewise derived from Bayesian inference, introduced the wonderful reparameterization trick. For <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>z</mi><mo>∼</mo><mrow><mi mathvariant="script">N</mi></mrow><mo>(</mo><mi>μ</mi><mo separator="true">,</mo><msup><mi>σ</mi><mn>2</mn></msup><mo>)</mo></mrow><annotation encoding="application/x-tex">z\sim \mathcal{N}(\mu,\sigma^2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8141079999999999em;"></span><span class="strut bottom" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.04398em;">z</span><span class="mrel">∼</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.14736em;">N</span></span><span class="mopen">(</span><span class="mord mathit">μ</span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathrm">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span></span></span></span>, sampling directly from <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="script">N</mi></mrow><mo>(</mo><mi>μ</mi><mo separator="true">,</mo><msup><mi>σ</mi><mn>2</mn></msup><mo>)</mo></mrow><annotation encoding="application/x-tex">\mathcal{N}(\mu,\sigma^2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8141079999999999em;"></span><span class="strut bottom" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.14736em;">N</span></span><span class="mopen">(</span><span class="mord mathit">μ</span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathrm">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span></span></span></span> produces a sample through which gradients cannot flow back to <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>μ</mi></mrow><annotation encoding="application/x-tex">\mu</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">μ</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>σ</mi></mrow><annotation encoding="application/x-tex">\sigma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">σ</span></span></span></span>. To obtain their gradients, <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>z</mi></mrow><annotation encoding="application/x-tex">z</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.04398em;">z</span></span></span></span> is rewritten as <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>z</mi><mo>=</mo><mi>σ</mi><mi>ϵ</mi><mo>+</mo><mi>μ</mi></mrow><annotation encoding="application/x-tex">z=\sigma \epsilon+\mu</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.58333em;"></span><span class="strut bottom" style="height:0.7777700000000001em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.04398em;">z</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="mord mathit">ϵ</span><span class="mbin">+</span><span class="mord mathit">μ</span></span></span></span>, where <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi><mo>∼</mo><mrow><mi mathvariant="script">N</mi></mrow><mo>(</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><mo>)</mo></mrow><annotation encoding="application/x-tex">\epsilon\sim \mathcal{N}(0,1)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span><span class="mrel">∼</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.14736em;">N</span></span><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mpunct">,</span><span class="mord mathrm">1</span><span class="mclose">)</span></span></span></span>. This way the randomness is first sampled from a standard Gaussian, and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>μ</mi></mrow><annotation encoding="application/x-tex">\mu</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">μ</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>σ</mi></mrow><annotation encoding="application/x-tex">\sigma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">σ</span></span></span></span> are then introduced in a differentiable way.</p>
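<p>The reparameterization described above can be sketched in a few lines of plain Python. The helper name <code>sample_reparameterized</code> and the numeric values are made up for this illustration; the point is only that the noise is drawn independently of the parameters, so the partial derivatives of the sample with respect to the mean and standard deviation are available in closed form.</p>

```python
import random

def sample_reparameterized(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as z = sigma * eps + mu with eps ~ N(0, 1).

    Returns the sample together with dz/dmu and dz/dsigma, which exist
    because eps does not depend on mu or sigma.
    """
    eps = rng.gauss(0.0, 1.0)
    z = sigma * eps + mu
    dz_dmu = 1.0
    dz_dsigma = eps
    return z, dz_dmu, dz_dsigma

rng = random.Random(0)
mu, sigma = 2.0, 0.5
samples = [sample_reparameterized(mu, sigma, rng)[0] for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be close to mu = 2.0 and sigma^2 = 0.25
```

<p>The samples follow the same distribution as direct sampling would give; what changes is that each sample is now an explicit differentiable function of the two parameters.</p>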
<p>This trick can be generalized. For a random variable <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span> with probability density <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>q</mi><mo>(</mo><mi>ϵ</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">q(\epsilon)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord mathit">ϵ</span><span class="mclose">)</span></span></span></span>, where the weights are a deterministic, differentiable function of the parameters and the noise, it can be shown that as long as <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>q</mi><mo>(</mo><mi>ϵ</mi><mo>)</mo><mi>d</mi><mi>ϵ</mi><mo>=</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi>d</mi><mrow><mi mathvariant="bold">w</mi></mrow></mrow><annotation encoding="application/x-tex">q(\epsilon)d\epsilon=q(\mathbf{w}|\theta)d\mathbf{w}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord mathit">ϵ</span><span class="mclose">)</span><span class="mord mathit">d</span><span class="mord mathit">ϵ</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mord mathit">d</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span> holds, the same idea yields an unbiased estimate of the derivative of an expectation, obtained by differentiating inside the expectation:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>θ</mi></mrow></mfrac><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow></msub><mo>[</mo><mi>f</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo><mo>]</mo><mo>=</mo><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>q</mi><mo>(</mo><mi>ϵ</mi><mo>)</mo></mrow></msub><mrow><mo fence="true">[</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo></mrow><mrow><mi mathvariant="normal">∂</mi><mrow><mi mathvariant="bold">w</mi></mrow></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mrow><mi mathvariant="bold">w</mi></mrow></mrow><mrow><mi mathvariant="normal">∂</mi><mi>θ</mi></mrow></mfrac><mo>+</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo></mrow><mrow><mi mathvariant="normal">∂</mi><mi>θ</mi></mrow></mfrac><mo fence="true">]</mo></mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
\frac{\partial}{\partial \theta} \mathbb{E}_{q(\mathbf{w} | \theta)}[f(\mathbf{w}, \theta)]=\mathbb{E}_{q(\epsilon)}\left[\frac{\partial f(\mathbf{w}, \theta)}{\partial \mathbf{w}} \frac{\partial \mathbf{w}}{\partial \theta}+\frac{\partial f(\mathbf{w}, \theta)}{\partial \theta}\right]
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.450015em;"></span><span class="strut bottom" style="height:2.40003em;vertical-align:-0.9500149999999999em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.000015000000000098268em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span 
class="vlist"><span style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mclose">]</span><span class="mrel">=</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span class="vlist"><span style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord mathit">ϵ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" 
style="top:0em;"><span class="delimsizing size3">[</span></span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord textstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mbin">+</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span 
style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size3">]</span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>Using this result, we obtain a Monte Carlo approximation of (1):</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mrow><mi mathvariant="script">F</mi></mrow><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo><mo>≈</mo><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>n</mi></mrow></msubsup><mi>log</mi><mi>q</mi><mrow><mo fence="true">(</mo><msup><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mi mathvariant="normal">∣</mi><mi>θ</mi><mo fence="true">)</mo></mrow><mo>−</mo><mi>log</mi><mi>P</mi><mrow><mo fence="true">(</mo><msup><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mo fence="true">)</mo></mrow><mo>−</mo><mi>log</mi><mi>P</mi><mrow><mo fence="true">(</mo><mrow><mi mathvariant="script">D</mi></mrow><mi mathvariant="normal">∣</mi><msup><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mo fence="true">)</mo></mrow></mrow></mtd></mtr></mtable><mtext> </mtext><mtext> </mtext><mtext> </mtext><mtext> </mtext><mtext> </mtext><mo>(</mo><mn>2</mn><mo>)</mo></mrow><annotation encoding="application/x-tex">\begin{aligned}
\mathcal{F}(\mathcal{D}, \theta) \approx \sum_{i=1}^{n} \log q\left(\mathbf{w}^{(i)} | \theta\right)-\log P\left(\mathbf{w}^{(i)}\right) -\log P\left(\mathcal{D} | \mathbf{w}^{(i)}\right)
\end{aligned}\ \ \ \ \ (2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.714533em;"></span><span class="strut bottom" style="height:2.929066em;vertical-align:-1.214533em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.06313599999999986em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mrel">≈</span><span class="mop op-limits"><span class="vlist"><span style="top:1.1776689999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mrel">=</span><span class="mord mathrm">1</span></span></span></span><span style="top:-0.000005000000000143778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="op-symbol large-op mop">∑</span></span></span><span style="top:-1.2500050000000003em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathit">n</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size2">(</span></span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathit">i</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size2">)</span></span></span><span class="mbin">−</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size2">(</span></span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span 
class="mopen">(</span><span class="mord mathit">i</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size2">)</span></span></span><span class="mbin">−</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size2">(</span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mord mathrm">∣</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathit">i</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size2">)</span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span><span class="mord mspace"> </span><span class="mord mspace"> </span><span class="mord mspace"> </span><span class="mord mspace"> </span><span class="mord mspace"> 
</span><span class="mopen">(</span><span class="mord mathrm">2</span><span class="mclose">)</span></span></span></span></span>
<p>where <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mrow><mi mathvariant="bold">w</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup></mrow><annotation encoding="application/x-tex">\mathbf{w}^{(i)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8879999999999999em;"></span><span class="strut bottom" style="height:0.8879999999999999em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class=""><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathit">i</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> is the i-th Monte Carlo sample of the weights, drawn from the variational posterior q(w | θ).</p>
<p>The approximation proposed in [1] also applies Monte Carlo to the KL term of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="script">F</mi></mrow><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">\mathcal{F}(\mathcal{D}, \theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span>, even though for many choices of prior this KL term has a closed-form solution. [1] does so in order to accommodate more complex prior/posterior forms. Another paper considers only Gaussian priors and therefore uses the closed-form KL term in the same evidence lower bound. In practice, the approximation can be chosen according to the prior in use.</p>
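<p>To make the trade-off concrete, here is a minimal sketch comparing the two ways of handling the KL term. It assumes a diagonal Gaussian variational posterior and a zero-mean Gaussian prior with a shared standard deviation; neither choice is fixed by the text above, and the function names are hypothetical. The Monte Carlo estimate is averaged over the samples so that it is directly comparable to the closed form, whereas (2) is written as a plain sum.</p>

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def kl_analytic(mu, sigma, prior_std):
    """Closed-form KL[N(mu, sigma^2) || N(0, prior_std^2)], summed over dimensions."""
    return sum(
        math.log(prior_std / s) + (s ** 2 + m ** 2) / (2.0 * prior_std ** 2) - 0.5
        for m, s in zip(mu, sigma)
    )


def kl_monte_carlo(mu, sigma, prior_std, n_samples, rng):
    """Estimate the same KL as the sample average of log q(w|theta) - log P(w)."""
    total = 0.0
    for _ in range(n_samples):
        for m, s in zip(mu, sigma):
            # Reparameterized sample: w = mu + sigma * eps with eps ~ N(0, 1).
            w = m + s * rng.gauss(0.0, 1.0)
            total += log_normal(w, m, s) - log_normal(w, 0.0, prior_std)
    return total / n_samples


# Illustrative (made-up) variational parameters for three weights.
mu = [0.3, -1.2, 0.8]
sigma = [0.5, 0.9, 0.2]
rng = random.Random(0)

exact = kl_analytic(mu, sigma, prior_std=1.0)
estimate = kl_monte_carlo(mu, sigma, prior_std=1.0, n_samples=20000, rng=rng)
print(exact, estimate)
```

<p>With a Gaussian prior the two values should agree up to sampling noise, so the closed form is the cheaper and lower-variance option. The sampled version only needs the log densities to be evaluable, which is why it carries over to priors without a closed-form KL.</p>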
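<h2 id="贝叶斯小批梯度下降"><a href="#贝叶斯小批梯度下降" class="headerlink" title="贝叶斯小批梯度下降"></a>Bayesian Mini-batch Gradient Descent</h2><p>Both the objective in (1) and its approximation in (2) are bounds computed over the entire dataset. In practice, modern deep learning trains with mini-batch gradient descent, so the complexity cost has to be scaled accordingly. Suppose the dataset is split into M mini-batches; the simplest scheme spreads the KL term evenly across them:</p>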
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msubsup><mrow><mi mathvariant="script">F</mi></mrow><mrow><mi>i</mi></mrow><mrow><mrow><mi mathvariant="normal">E</mi><mi mathvariant="normal">Q</mi></mrow></mrow></msubsup><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="script">D</mi></mrow><mrow><mi>i</mi></mrow></msub><mo separator="true">,</mo><mi>θ</mi><mo fence="true">)</mo></mrow><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>M</mi></mrow></mfrac><mrow><mi mathvariant="normal">K</mi><mi mathvariant="normal">L</mi></mrow><mo>[</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi mathvariant="normal">∥</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo><mo>−</mo><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow></msub><mrow><mo fence="true">[</mo><mi>log</mi><mi>P</mi><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="script">D</mi></mrow><mrow><mi>i</mi></mrow></msub><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo fence="true">)</mo></mrow><mo fence="true">]</mo></mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
\mathcal{F}_{i}^{\mathrm{EQ}}\left(\mathcal{D}_{i}, \theta\right)=\frac{1}{M} \mathrm{KL}[q(\mathbf{w} | \theta) \| P(\mathbf{w})] -\mathbb{E}_{q(\mathbf{w} | \theta)}\left[\log P\left(\mathcal{D}_{i} | \mathbf{w}\right)\right]
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.25372em;"></span><span class="strut bottom" style="height:2.00744em;vertical-align:-0.7537200000000001em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:0.06772em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="vlist"><span style="top:0.276864em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span style="top:-0.4809079999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">E</span><span class="mord mathrm">Q</span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle 
scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathrm">K</span><span class="mord mathrm">L</span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord 
mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mord mathrm">∥</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span><span class="mbin">−</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span class="vlist"><span style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">[</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="vlist"><span 
style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">]</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>This ensures that <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mo>∑</mo><mrow><mi>i</mi></mrow></msub><msubsup><mrow><mi mathvariant="script">F</mi></mrow><mrow><mi>i</mi></mrow><mrow><mrow><mi mathvariant="normal">E</mi><mi mathvariant="normal">Q</mi></mrow></mrow></msubsup><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="script">D</mi></mrow><mrow><mi>i</mi></mrow></msub><mo separator="true">,</mo><mi>θ</mi><mo fence="true">)</mo></mrow><mo>=</mo><mrow><mi mathvariant="script">F</mi></mrow><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">\sum_{i} \mathcal{F}_{i}^{\mathrm{EQ}}\left(\mathcal{D}_{i}, \theta\right)=\mathcal{F}(\mathcal{D}, \theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.959239em;"></span><span class="strut bottom" style="height:1.259249em;vertical-align:-0.30001em;"></span><span class="base textstyle uncramped"><span class="mop"><span class="op-symbol small-op mop" style="top:-0.0000050000000000050004em;">∑</span><span class="vlist"><span style="top:0.30001em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class=""><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="vlist"><span style="top:0.276864em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span 
class="mord mathit">i</span></span></span></span><span style="top:-0.480908em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">E</span><span class="mord mathrm">Q</span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="minner textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class=""><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="mrel">=</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span> holds. Building on this, another scaling has also been proposed:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msubsup><mrow><mi mathvariant="script">F</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>π</mi></mrow></msubsup><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="script">D</mi></mrow><mrow><mi>i</mi></mrow></msub><mo separator="true">,</mo><mi>θ</mi><mo fence="true">)</mo></mrow><mo>=</mo><msub><mi>π</mi><mrow><mi>i</mi></mrow></msub><mrow><mi mathvariant="normal">K</mi><mi mathvariant="normal">L</mi></mrow><mo>[</mo><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo><mi mathvariant="normal">∥</mi><mi>P</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mo>)</mo><mo>]</mo><mo>−</mo><msub><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mi>q</mi><mo>(</mo><mrow><mi mathvariant="bold">w</mi></mrow><mi mathvariant="normal">∣</mi><mi>θ</mi><mo>)</mo></mrow></msub></mrow></mtd><mtd><mrow><mrow></mrow><mrow><mo fence="true">[</mo><mi>log</mi><mi>P</mi><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="script">D</mi></mrow><mrow><mi>i</mi></mrow></msub><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">w</mi></mrow><mo fence="true">)</mo></mrow><mo fence="true">]</mo></mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
\mathcal{F}_{i}^{\pi}\left(\mathcal{D}_{i}, \theta\right)=\pi_{i} \mathrm{KL}[q(\mathbf{w} | \theta)\| P(\mathbf{w})] -\mathbb{E}_{q(\mathbf{w} | \theta)} &\left[\log P\left(\mathcal{D}_{i} | \mathbf{w}\right)\right]
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8500000000000001em;"></span><span class="strut bottom" style="height:1.2000000000000002em;vertical-align:-0.35000000000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.010000000000000009em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="vlist"><span style="top:0.247em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">π</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle 
cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">π</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathrm">K</span><span class="mord mathrm">L</span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mord mathrm">∥</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mclose">)</span><span class="mclose">]</span><span class="mbin">−</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathbb">E</span></span><span class="vlist"><span 
style="top:0.18019999999999992em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord scriptstyle cramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-0.010000000000000009em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">[</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class=""><span class="mord displaystyle textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span 
class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">w</span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">]</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>If we take <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>π</mi><mo>∈</mo><mo>[</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><msup><mo>]</mo><mrow><mi>M</mi></mrow></msup></mrow><annotation encoding="application/x-tex">\pi \in[0,1]^{M}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8413309999999999em;"></span><span class="strut bottom" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">π</span><span class="mrel">∈</span><span class="mopen">[</span><span class="mord mathrm">0</span><span class="mpunct">,</span><span class="mord mathrm">1</span><span class="mclose"><span class="mclose">]</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> and ensure <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>M</mi></mrow></msubsup><msub><mi>π</mi><mrow><mi>i</mi></mrow></msub><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\sum_{i=1}^{M} \pi_{i}=1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8423309999999999em;"></span><span class="strut bottom" style="height:1.142341em;vertical-align:-0.30001em;"></span><span class="base textstyle uncramped"><span class="mop"><span class="op-symbol small-op mop" 
style="top:-0.0000050000000000050004em;">∑</span><span class="vlist"><span style="top:0.30001em;margin-left:0em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mrel">=</span><span class="mord mathrm">1</span></span></span></span><span style="top:-0.364em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">π</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord mathrm">1</span></span></span></span>, then <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>M</mi></mrow></msubsup><msubsup><mrow><mi mathvariant="script">F</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>π</mi></mrow></msubsup><mrow><mo fence="true">(</mo><msub><mrow><mi mathvariant="script">D</mi></mrow><mrow><mi>i</mi></mrow></msub><mo separator="true">,</mo><mi>θ</mi><mo fence="true">)</mo></mrow></mrow><annotation 
encoding="application/x-tex">\sum_{i=1}^{M} \mathcal{F}_{i}^{\pi}\left(\mathcal{D}_{i}, \theta\right)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8423309999999999em;"></span><span class="strut bottom" style="height:1.142341em;vertical-align:-0.30001em;"></span><span class="base textstyle uncramped"><span class="mop"><span class="op-symbol small-op mop" style="top:-0.0000050000000000050004em;">∑</span><span class="vlist"><span style="top:0.30001em;margin-left:0em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mrel">=</span><span class="mord mathrm">1</span></span></span></span><span style="top:-0.364em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class=""><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="vlist"><span style="top:0.258664em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span 
class="mord mathit" style="margin-right:0.03588em;">π</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="minner textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class=""><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span></span></span></span> is an unbiased estimate of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="script">F</mi></mrow><mo>(</mo><mrow><mi mathvariant="script">D</mi></mrow><mo separator="true">,</mo><mi>θ</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">\mathcal{F}(\mathcal{D}, \theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.09931em;">F</span></span><span class="mopen">(</span><span class="mord textstyle uncramped"><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span><span class="mpunct">,</span><span class="mord mathit" 
style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span>. In particular, choosing <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>π</mi><mrow><mi>i</mi></mrow></msub><mo>=</mo><mfrac><mrow><msup><mn>2</mn><mrow><mi>M</mi><mo>−</mo><mi>i</mi></mrow></msup></mrow><mrow><msup><mn>2</mn><mrow><mi>M</mi></mrow></msup><mo>−</mo><mn>1</mn></mrow></mfrac></mrow><annotation encoding="application/x-tex">\pi_{i}=\frac{2^{M-i}}{2^{M}-1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9897649999999999em;"></span><span class="strut bottom" style="height:1.4020609999999998em;vertical-align:-0.4122959999999999em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">π</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.35396499999999986em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord"><span class="mord mathrm">2</span><span class="vlist"><span style="top:-0.289em;margin-right:0.07142857142857144em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span 
class="reset-scriptstyle scriptscriptstyle cramped"><span class="mord scriptscriptstyle cramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mbin">−</span><span class="mord mathrm">1</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.394em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord"><span class="mord mathrm">2</span><span class="vlist"><span style="top:-0.363em;margin-right:0.07142857142857144em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-scriptstyle scriptscriptstyle uncramped"><span class="mord scriptscriptstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span><span class="mbin">−</span><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span></span> makes the early minibatches of each training epoch emphasize matching the prior and the later ones emphasize fitting the data, which yields a (somewhat mysterious) performance improvement.</p>
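<p>As a small illustration of this weighting scheme, the following NumPy sketch computes the weights for a given number of minibatches and checks that they sum to one. The function name <code>minibatch_kl_weights</code> and the choice of five minibatches are assumptions made for this example only; they do not come from the original post.</p>

```python
import numpy as np


def minibatch_kl_weights(num_batches):
    """Return pi_i = 2^(M - i) / (2^M - 1) for i = 1..M (hypothetical helper)."""
    i = np.arange(1, num_batches + 1)
    return 2.0 ** (num_batches - i) / (2.0 ** num_batches - 1.0)


# Example with M = 5 minibatches per epoch (an arbitrary choice).
weights = minibatch_kl_weights(5)
print(weights)        # each weight is half of the previous one
print(weights.sum())  # the weights sum to 1 up to floating-point rounding
```

<p>Because each weight is half of the previous one, the first minibatch carries roughly half of the total weight on the prior-matching term, and that weight decays geometrically over the rest of the epoch.</p>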
<h2 id="局部重参数化"><a href="#局部重参数化" class="headerlink" title="局部重参数化"></a>Local Reparameterization</h2><p>The uncertainty introduced into the network weights so far can be regarded as global uncertainty. Global uncertainty means that every parameter has to be sampled during the forward (inference) computation, and this is more expensive than it might seem. For example, a <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn><mo>×</mo><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn></mrow><annotation encoding="application/x-tex">1000\times1000</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.64444em;"></span><span class="strut bottom" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span></span></span></span> fully connected layer applied to an <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>M</mi><mo>×</mo><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn></mrow><annotation encoding="application/x-tex">M\times1000</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.76666em;vertical-align:-0.08333em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span></span></span></span> input (here M is the batch size) needs <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>M</mi><mo>×</mo><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn><mo>×</mo><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn></mrow><annotation 
encoding="application/x-tex">M\times1000\times1000</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.76666em;vertical-align:-0.08333em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">M</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span></span></span></span> distinct samples. Worse still, in an ordinary neural network such a layer shares one weight matrix across the whole batch, so it reduces to a single multiplication of an (M,1000) matrix by a (1000,1000) matrix. Once uncertainty is introduced, M different sets of (1000,1000) parameters have to be sampled and M separate multiplications of a (1,1000) matrix by a (1000,1000) matrix carried out. For a typical parallel matrix library these are two entirely different workloads.</p>
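<p>The contrast between the two computations can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the sizes are shrunk from 1000 to single digits so that it runs instantly, and the names <code>mu</code>, <code>sigma</code>, <code>eps</code> and <code>w_per_example</code> are hypothetical, not taken from the original post.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 4, 6, 3             # small stand-ins for M, 1000, 1000
x = rng.normal(size=(batch, d_in))
mu = rng.normal(size=(d_in, d_out))       # assumed posterior means of W
sigma = 0.1 * np.ones((d_in, d_out))      # assumed posterior standard deviations of W

# Ordinary layer: one shared W, a single (M, d_in) by (d_in, d_out) product.
y_shared = x @ mu

# Global weight noise with a separate W for every example:
# batch * d_in * d_out Gaussian draws, then M separate
# (1, d_in) by (d_in, d_out) products (a batched matmul).
eps = rng.normal(size=(batch, d_in, d_out))
w_per_example = mu + sigma * eps
y_sampled = np.matmul(x[:, None, :], w_per_example)[:, 0, :]

print(eps.size)         # number of Gaussian draws: batch * d_in * d_out
print(y_sampled.shape)  # same output shape as the shared-weight layer
```

<p>Both paths produce an output of the same shape, but the second one has to materialize a full weight tensor per example, which is where the extra memory and compute go.</p>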
<p>To address this problem, <sup id="fnref:3"><a href="#fn:3" rel="footnote"><span class="hint--top hint--error hint--medium hint--rounded hint--bounce" aria-label="https://arxiv.org/abs/1506.02557">[3]</span></a></sup> observes that if all the weights follow independent Gaussian distributions, then each entry of the matrix product, conditioned on the input, is also Gaussian, and the entries within one output row are mutually independent. That is, for <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">Y</mi></mrow><mo>=</mo><mrow><mi mathvariant="bold">X</mi></mrow><mrow><mi mathvariant="bold">W</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{Y}=\mathbf{X}\mathbf{W}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68611em;"></span><span class="strut bottom" style="height:0.68611em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.02875em;">Y</span></span><span class="mrel">=</span><span class="mord textstyle uncramped"><span class="mord mathbf">X</span></span><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">W</span></span></span></span></span>, if</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>q</mi><mrow><mo fence="true">(</mo><msub><mi>w</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo fence="true">)</mo></mrow><mo>=</mo><mi>N</mi><mrow><mo fence="true">(</mo><msub><mi>μ</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo separator="true">,</mo><msubsup><mi>σ</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow><mrow><mn>2</mn></mrow></msubsup><mo fence="true">)</mo></mrow><mi mathvariant="normal">∀</mi><msub><mi>w</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo>∈</mo><mrow><mi mathvariant="bold">W</mi></mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
q\left(w_{i, j}\right)=N\left(\mu_{i, j}, \sigma_{i, j}^{2}\right) \forall w_{i, j} \in \mathbf{W}
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8736079999999999em;"></span><span class="strut bottom" style="height:1.2472159999999999em;vertical-align:-0.37360800000000005em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.009499999999999953em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class="mord"><span class="mord mathit" style="margin-right:0.02691em;">w</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02691em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size1">(</span></span><span class="mord"><span class="mord mathit">μ</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="vlist"><span style="top:0.24700000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span style="top:-0.41300000000000003em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">2</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;"><span class="delimsizing size1">)</span></span></span><span class="mord mathrm">∀</span><span class="mord"><span class="mord mathit" style="margin-right:0.02691em;">w</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02691em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span 
class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">∈</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">W</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>then for <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">Y</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{Y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68611em;"></span><span class="strut bottom" style="height:0.68611em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.02875em;">Y</span></span></span></span></span> we have</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>q</mi><mrow><mo fence="true">(</mo><msub><mi>y</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mi mathvariant="normal">∣</mi><mrow><mi mathvariant="bold">X</mi></mrow><mo fence="true">)</mo></mrow><mo>=</mo><mi>N</mi><mrow><mo fence="true">(</mo><msub><mi>γ</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo separator="true">,</mo><msub><mi>δ</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo fence="true">)</mo></mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
q\left(y_{m, j} | \mathbf{X}\right)=N\left(\gamma_{m, j}, \delta_{m, j}\right)
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8500000000000001em;"></span><span class="strut bottom" style="height:1.2000000000000002em;vertical-align:-0.35000000000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.010000000000000009em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">q</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">y</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mord mathbf">X</span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="minner displaystyle textstyle uncramped"><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">(</span><span class="mord"><span class="mord mathit" style="margin-right:0.05556em;">γ</span><span 
class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05556em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.03785em;">δ</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03785em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="style-wrap reset-textstyle textstyle uncramped" style="top:0em;">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>where <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>γ</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo>=</mo><msub><mo>∑</mo><mrow><mi>i</mi></mrow></msub><msub><mi>x</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>i</mi></mrow></msub><msub><mi>μ</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub></mrow><annotation encoding="application/x-tex">\gamma_{m, j}=\sum_{i} x_{m, i} \mu_{i, j}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1.0500099999999999em;vertical-align:-0.30001em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.05556em;">γ</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05556em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mop"><span class="op-symbol small-op mop" style="top:-0.0000050000000000050004em;">∑</span><span class="vlist"><span style="top:0.30001em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span 
class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit">μ</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>δ</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo>=</mo><msub><mo>∑</mo><mrow><mi>i</mi></mrow></msub><msubsup><mi>x</mi><mrow><mi>m</mi><mo separator="true">,</mo><mi>i</mi></mrow><mrow><mn>2</mn></mrow></msubsup><msubsup><mi>σ</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow><mrow><mn>2</mn></mrow></msubsup></mrow><annotation encoding="application/x-tex">\delta_{m, j}=\sum_{i} x_{m, i}^{2} \sigma_{i, j}^{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8141079999999999em;"></span><span class="strut bottom" style="height:1.20888em;vertical-align:-0.394772em;"></span><span class="base textstyle 
uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03785em;">δ</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03785em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mop"><span class="op-symbol small-op mop" style="top:-0.0000050000000000050004em;">∑</span><span class="vlist"><span style="top:0.30001em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.258664em;margin-left:0em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">m</span><span class="mpunct">,</span><span class="mord mathit">i</span></span></span></span><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">2</span></span></span></span><span 
class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="vlist"><span style="top:0.258664em;margin-left:-0.03588em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">2</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> (a variance, not a standard deviation).</p>
<p>With this result, there is no need to sample the parameters <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">W</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{W}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68611em;"></span><span class="strut bottom" style="height:0.68611em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">W</span></span></span></span></span> on every pass. We can compute the mean and variance of the output <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">Y</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{Y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68611em;"></span><span class="strut bottom" style="height:0.68611em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.02875em;">Y</span></span></span></span></span> directly, sample from that distribution, and then backpropagate to <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mrow><mi mathvariant="bold">W</mi></mrow></mrow><annotation encoding="application/x-tex">\mathbf{W}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68611em;"></span><span class="strut bottom" style="height:0.68611em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathbf" style="margin-right:0.01597em;">W</span></span></span></span></span>. The sampling done in each computation is then local to the corresponding data point, which is why the technique is called local reparameterization.</p>
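<p>As a rough illustration of the idea above, the following sketch samples the layer output directly from its per-element mean and variance. It is not code from the cited paper: the function name <code>local_reparam_forward</code>, the variable name <code>gamma</code> for the mean, and the list-of-lists representation are assumptions made here, and only the forward sampling step is shown (no backpropagation).</p>

```python
import math
import random


def local_reparam_forward(x, mu, sigma, rng=random):
    """Sample Y = X W for a factorized Gaussian W without sampling W itself.

    x:     M x I list of lists (a minibatch of inputs)
    mu:    I x J list of lists (means of the weights)
    sigma: I x J list of lists (standard deviations of the weights)
    All names here are illustrative assumptions.
    """
    n_out = len(mu[0])
    y = []
    for row in x:
        out = []
        for j in range(n_out):
            # mean of y_{m,j}: sum_i x_{m,i} * mu_{i,j}
            gamma = sum(xi * mu[i][j] for i, xi in enumerate(row))
            # variance of y_{m,j}: delta_{m,j} = sum_i x_{m,i}^2 * sigma_{i,j}^2
            delta = sum(xi ** 2 * sigma[i][j] ** 2 for i, xi in enumerate(row))
            # one independent noise draw per (data point, output unit)
            eps = rng.gauss(0.0, 1.0)
            out.append(gamma + math.sqrt(delta) * eps)
        y.append(out)
    return y
```

<p>The noise is drawn once per data point and output unit rather than once per weight, which is the "local" part. When every standard deviation is zero, the output reduces to the ordinary matrix product of the inputs and the weight means.</p>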
<div id="footnotes"><hr><div id="footnotelist"><ol style="list-style: none; padding-left: 0; margin-left: 40px"><li id="fn:1"><span style="display: inline-block; vertical-align: top; padding-right: 10px; margin-left: -40px">1.</span><span style="display: inline-block; vertical-align: top; margin-left: 10px;"><a href="https://arxiv.org/abs/1505.05424" target="_blank" rel="external">https://arxiv.org/abs/1505.05424</a><a href="#fnref:1" rev="footnote"> ↩</a></span></li><li id="fn:2"><span style="display: inline-block; vertical-align: top; padding-right: 10px; margin-left: -40px">2.</span><span style="display: inline-block; vertical-align: top; margin-left: 10px;"><a href="https://arxiv.org/abs/1312.6114" target="_blank" rel="external">https://arxiv.org/abs/1312.6114</a><a href="#fnref:2" rev="footnote"> ↩</a></span></li><li id="fn:3"><span style="display: inline-block; vertical-align: top; padding-right: 10px; margin-left: -40px">3.</span><span style="display: inline-block; vertical-align: top; margin-left: 10px;"><a href="https://arxiv.org/abs/1506.02557" target="_blank" rel="external">https://arxiv.org/abs/1506.02557</a><a href="#fnref:3" rev="footnote"> ↩</a></span></li></ol></div></div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2018/01/13/Windows 输入法的 metro 应用兼容性改造/">[Rant] Making a Windows Input Method Compatible with Metro Apps</a>
</h1>
<div class='ListMeta'>
<time datetime="2018-01-13T13:51:55.000Z" itemprop="datePublished">
2018-01-13
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/input-method/">input method</a> }
</li>
<li class="meta-text">
{ <a href="/tags/windows/">windows</a> }
</li>
<li class="meta-text">
{ <a href="/tags/named-pipe/">named pipe</a> }
</li>
<li class="meta-text">
{ <a href="/tags/interprocess/">interprocess</a> }
</li>
</ul>
</div>
</header>
<div>
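<h2 id="TL-DR"><a href="#TL-DR" class="headerlink" title="TL;DR"></a>TL;DR</h2><p>Annoyingly, Microsoft does not allow shared memory inside UWP apps, which broke Weasel (小狼毫). After switching its interprocess communication to Windows <code>named pipe</code>s, it works again.</p>
<h2 id="缘由"><a href="#缘由" class="headerlink" title="缘由"></a>Background</h2><p>After picking Wubi back up, I struggled to find a Wubi input method that suited me. As a half-beginner, the feature I care about most is reverse lookup of Wubi codes: I cannot open a dedicated lookup tool for every character I do not know how to type, and the Wubi input method that ships with Windows only offers a not very usable mixed Wubi/Pinyin mode. The long-established third-party Wubi input methods either have ugly interfaces or have not been updated in ages, and some even behave like rogue software.</p>
<p>I tried RIME a few years ago and knew it was a powerful input method that can be customized in all sorts of ways. Back then, however, I had no development skills and RIME's customization was somewhat complicated, so I did not stick with it. Looking back now that I have a real need, RIME is clearly the best fit for me.</p>
<p>Using RIME on Windows today, though, comes with a problem that did not exist back then: compatibility. As one of the first Windows 10 users, I need an input method that works properly in UWP apps. Yet Weasel (小狼毫), RIME's original Windows frontend, is incompatible with the Metro apps introduced in Windows 8, including Windows 10's UWP apps. The original author, short on time, has <a href="https://github.com/rime/home/issues/25">given up maintaining Weasel</a>, and the frontends written by other developers have various stability problems. For example, users from the RIME forum on Baidu Tieba ported RIME to an input method framework called <a href="https://github.com/EasyIME/PIME">PIME</a>. Its UI implementation is very crude, and its input service is painful as well: it crashes often, has no reliable exception handling, and frequently has to be restarted by hand.</p>
<p>A basic rule of open source is "if you can do better, do it yourself" and "show me the code". I never ran into crashes back when I used Weasel, which suggests its design is solid in that respect; the main problem today is just that it does not work with Metro apps. Now that I am able to develop software, I might as well maintain it myself, for my own convenience and for others'.</p>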
<div class="more-link">
<a href="/2018/01/13/Windows 输入法的 metro 应用兼容性改造/#more">Read On »</a>
</div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2018/01/06/Miko-Geisha-的半残编译器/">Miko: A Half-Finished Compiler for Geisha</a>
</h1>
<div class='ListMeta'>
<time datetime="2018-01-06T07:13:23.000Z" itemprop="datePublished">
2018-01-06
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/programming-language/">programming language</a> }
</li>
<li class="meta-text">
{ <a href="/tags/compiler/">compiler</a> }
</li>
<li class="meta-text">
{ <a href="/tags/rust/">rust</a> }
</li>
</ul>
</div>
</header>
<div>
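<p>Life is unpredictable: in the end, the Geisha compiler was hacked together as a university course project, in spare moments while I was preparing for the graduate entrance exam.</p>
<p>For various reasons, what I handed in is quite far from what I had planned. For example, there was no time to implement GC, so closures are incomplete, and user-defined types do not exist at all. Still, it ended up as something that produces results (even if it is half-finished).</p>
<p>This post is the document I submitted as the course report during the summer break. As for follow-up work, the exam is over now, but a pile of other things has suddenly come up, so I can only keep improving it whenever I find the time...</p>
<p>The code is of course <a href="https://github.com/geisha-lang/miko">on GitHub</a>, but all the stars went to the earlier frontend demo written in Haskell, and this one gets no attention, hmm.</p>
<h2 id="整体设计"><a href="#整体设计" class="headerlink" title="整体设计"></a>Overall design</h2><p>The parser is generated with <a href="https://github.com/kevinmehall/rust-peg">rust-peg</a>, and LLVM serves as the code generation backend.</p>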
<div class="more-link">
<a href="/2018/01/06/Miko-Geisha-的半残编译器/#more">Read On »</a>
</div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2017/09/04/如何编译函数闭包/">How to Compile Function Closures</a>
</h1>
<div class='ListMeta'>
<time datetime="2017-09-04T08:16:53.000Z" itemprop="datePublished">
2017-09-04
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/functional-programming/">functional programming</a> }
</li>
<li class="meta-text">
{ <a href="/tags/compile/">compile</a> }
</li>
</ul>
</div>
</header>
<div>
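<p>// Zhihu not only refused to give me a column, it also flagged this post as politically sensitive. That broken site is doomed sooner or later.</p>
<blockquote>
<p> A <strong>closure</strong>, also called a <strong>lexical closure</strong> or <strong>function closure</strong>, is a function that references free variables. The referenced free variables live on together with the function, even after it has left the environment that created it. Hence another view holds that a closure is an entity made up of a function together with its associated referencing environment. A closure can have multiple instances at run time: different referencing environments combined with the same function yield different instances.</p>
</blockquote>
<p>To implement a programming language that supports both lexical scoping and first-class functions, one important transformation is lambda lifting: nested function definitions (including anonymous functions) are turned into standalone global function definitions, and references to free variables are turned into parameter passing.</p>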
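<p>The transformation described above can be sketched by hand in Python. This is only an illustration, not output of the Geisha compiler: the names <code>make_adder</code>, <code>add_lifted</code>, and <code>call_closure</code> are made up here, and representing the closure value as a <code>(code, env)</code> tuple is an assumption of this sketch (it borrows from what is usually called closure conversion).</p>

```python
# Before lifting: `add` is nested and refers to the free variable `n`.
def make_adder(n):
    def add(x):
        return x + n
    return add


# After lifting: `add_lifted` is a standalone global function, and the
# former free variable `n` has become an explicit parameter.
def add_lifted(n, x):
    return x + n


def make_adder_lifted(n):
    # The closure value is now plain data: the lifted code plus the
    # captured values (an assumed representation for this sketch).
    return (add_lifted, (n,))


def call_closure(closure, *args):
    # Calling a closure passes the captured values ahead of the real arguments.
    code, env = closure
    return code(*env, *args)
```

<p>Both <code>make_adder(3)(4)</code> and <code>call_closure(make_adder_lifted(3), 4)</code> compute the same sum; the second form no longer relies on nested function definitions, which is what makes it straightforward to emit as ordinary global functions.</p>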
<div class="more-link">
<a href="/2017/09/04/如何编译函数闭包/#more">Read On »</a>
</div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2017/05/11/plagiarism-detection/">Document Plagiarism Detection Based on Hashing and Winnowing</a>
</h1>
<div class='ListMeta'>
<time datetime="2017-05-11T15:20:08.000Z" itemprop="datePublished">
2017-05-11
</time>
</div>
</header>
<div>
<blockquote>
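<p>This is just a lab report.</p>
</blockquote>
<h2 id="设计思路"><a href="#设计思路" class="headerlink" title="设计思路"></a>Design approach</h2><p>Given realistic computing power, comparing many long documents for similarity cannot rely on slow methods such as traditional text diff algorithms. A hash value reflects the characteristics of the data to some extent, but ordinary hash functions are designed to avoid collisions, so a small change in the source data can cause a large change in the hash. Plagiarism detection needs features extracted from a document that stay similar whenever the source data is similar.</p>
<p>Winnowing is a method for extracting document features (document fingerprints) proposed by Saul Schleimer et al. It hashes the sequence of k-grams of a document and selects from the hashes a sequence of feature values that reflects document similarity. These sequences are then compared [1], and the proportion of identical feature values indicates how similar the documents are.</p>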
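<p>A minimal sketch of the procedure just described, under several assumptions that are not taken from the report: the k-gram hash is a truncated MD5, ties inside a window are resolved by taking the rightmost minimum, and the similarity score divides the number of shared fingerprint values by the size of the smaller fingerprint set. The function names and the default values of <code>k</code> and <code>w</code> are illustrative.</p>

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-gram (substring of length k) of the text."""
    hashes = []
    for i in range(len(text) - k + 1):
        digest = hashlib.md5(text[i:i + k].encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def winnow(hashes, w):
    """Select the minimum hash of each window of w consecutive hashes.

    Returns a set of (hash, position) pairs; on ties the rightmost
    occurrence of the minimum inside the window is chosen.
    """
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # offset of the rightmost occurrence of the minimum in this window
        offset = w - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(text_a, text_b, k=5, w=4):
    """Share of fingerprint values the two texts have in common (assumed metric)."""
    fa = {h for h, _ in winnow(kgram_hashes(text_a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(text_b, k), w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / min(len(fa), len(fb))
```

<p>Because neighbouring windows usually share their minimum, far fewer values are kept than there are k-grams, while any sufficiently long shared passage still contributes at least one common fingerprint.</p>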
<div class="more-link">
<a href="/2017/05/11/plagiarism-detection/#more">Read On »</a>
</div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2017/04/09/Windows-LLVM-环境的二三事/">A Few Notes on Setting Up LLVM on Windows</a>
</h1>
<div class='ListMeta'>
<time datetime="2017-04-08T17:44:05.000Z" itemprop="datePublished">
2017-04-09
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/llvm/">llvm</a> }
</li>
<li class="meta-text">
{ <a href="/tags/compile/">compile</a> }
</li>
</ul>
</div>
</header>
<div>
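<link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css"><p>Honestly, as someone preparing for the graduate entrance exam, having my lazy life interrupted by environment setup is no fun.</p>
<p>It started because <code>llvm-general</code>, the Haskell LLVM binding, only supports up to version 3.5 on Hackage. The repo does contain later versions, but the author says in the issues that they are not ready yet.</p>
<p>Fine, 3.5 it is. I downloaded the LLVM 3.5.2 sources; CMake ran without trouble and the solution opened fine in VS 2017, but the build failed with errors</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">(pretend there is a build log here; I forgot to save it)</div></pre></td></tr></table></figure>
<p>In short, a method of some class carries a const qualifier, but the context it is called from does not return const, so the compiler rejects it.</p>
<p>Oddly, the error is reported inside a standard library file shipped with VS. That is awkward: I would either have to patch the VS standard library or dig through the LLVM sources.</p>
<p>I tried the build in WSL and it compiled painlessly, so VS seems to have some annoying restriction. A quick search online turned up no similar reports (in the 3.5 days they probably did not officially support building with VS yet). So I decided to build with MinGW.</p>
<p>Generating MinGW makefiles with CMake worked, but make then reported a command syntax error</p>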
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">命令语法不正确。</div><div class="line">mingw32-make.exe[2]: *** [tools\lto\CMakeFiles\LTO_exports.dir\build.make:61: tools/lto/LTO.def] Error 1</div><div class="line">mingw32-make.exe[2]: *** Deleting file 'tools/lto/LTO.def'</div><div class="line">mingw32-make.exe[1]: *** [CMakeFiles\Makefile2:9977: tools/lto/CMakeFiles/LTO_exports.dir/all] Error 2</div><div class="line">mingw32-make.exe: *** [Makefile:151: all] Error 2</div></pre></td></tr></table></figure>
<p>I located this build.make and found that the commands on that line look odd</p>
<figure class="highlight powershell"><table><tr><td class="code"><pre><div class="line">cd /d C:\Users\hcyue\code\llvm-<span class="number">3.5</span>.<span class="number">2</span>.build\tools\lto && <span class="string">"C:\Program Files\CMake\bin\cmake.exe"</span> -E echo EXPORTS > LTO.def</div><div class="line">cd /d C:\Users\hcyue\code\llvm-<span class="number">3.5</span>.<span class="number">2</span>.build\tools\lto && type C:/Users/hcyue/code/llvm-<span class="number">3.5</span>.<span class="number">2</span>.src/tools/lto/lto.exports >> LTO.def</div></pre></td></tr></table></figure>
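<p><del>In all my years on Windows I had never seen cd take a <code>/d</code> flag</del> It seems that PowerShell's cd does not support the /d flag (<code>/d</code> is a <code>cmd.exe</code> switch that changes the drive along with the directory). So I deleted it.</p>
<p>The build then failed again: symbols could not be found at link time. The culprit turned out to be <code>LTO.def</code>, the target file generated by the two lines I had just edited. For some reason it had not been written as a normal text file but as a binary file (VS Code reports that the file is too large or binary). Printing it with <code>type</code> shows something that looks like a text file with oddly wide spacing between the characters. Infuriating.</p>
<p>But since a <code>.def</code> is just a text file anyway, I can build the <code>def</code> by hand. So I ran the command manually to <code>type</code> out the contents of <code>lto.exports</code></p>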
<figure class="highlight powershell"><table><tr><td class="code"><pre><div class="line">type C:/Users/hcyue/code/llvm-<span class="number">3.5</span>.<span class="number">2</span>.src/tools/lto/lto.exports</div></pre></td></tr></table></figure>
<p>which prints a long list of symbol names</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><div class="line">lto_get_error_message</div><div class="line">lto_get_version</div><div class="line">lto_initialize_disassembler</div><div class="line">lto_module_create</div><div class="line">lto_module_create_from_fd</div><div class="line">lto_module_create_from_fd_at_offset</div><div class="line">lto_module_create_from_memory</div><div class="line"></div><div class="line">(... many more lto symbol names)</div><div class="line"></div><div class="line">LLVMCreateDisasm</div><div class="line">LLVMCreateDisasmCPU</div><div class="line">LLVMDisasmDispose</div><div class="line">LLVMDisasmInstruction</div><div class="line">LLVMSetDisasmOptions</div></pre></td></tr></table></figure>
<p>I created a new <code>LTO.def</code> to overwrite the old one and pasted the names in.</p>
<p>I re-ran <code>cmake --build .</code> and it finally passed.</p>
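<p>The manual fix can also be scripted. The sketch below rebuilds the file the same way the two generated commands were meant to: an <code>EXPORTS</code> line followed by the contents of <code>lto.exports</code>, written explicitly as ASCII. The function name <code>write_def_file</code> and both path arguments are hypothetical. The widely spaced output described earlier is what UTF-16 text tends to look like when shown as single-byte text, so one plausible cause (not confirmed in this post) is that the redirection wrote the file as UTF-16; writing with an explicit encoding sidesteps that either way.</p>

```python
from pathlib import Path


def write_def_file(exports_path, def_path):
    """Write a module-definition file: EXPORTS, then one symbol per line.

    Both paths are placeholders supplied by the caller; nothing here is
    specific to the LLVM build tree.
    """
    symbols = [
        line.strip()
        for line in Path(exports_path).read_text(encoding="ascii").splitlines()
        if line.strip()  # skip blank lines
    ]
    # Explicit ASCII output avoids any shell-dependent redirection encoding.
    Path(def_path).write_text(
        "EXPORTS\n" + "\n".join(symbols) + "\n", encoding="ascii"
    )
    return symbols
```

<p>It would be called with the source tree's <code>lto.exports</code> and the build tree's <code>LTO.def</code> before re-running the build.</p>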
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2017/03/03/使用-Parsec-处理左递归/">Handling Left Recursion with Parsec</a>
</h1>
<div class='ListMeta'>
<time datetime="2017-03-03T12:41:36.000Z" itemprop="datePublished">
2017-03-03
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/haskell/">haskell</a> }
</li>
<li class="meta-text">
{ <a href="/tags/compile/">compile</a> }
</li>
<li class="meta-text">
{ <a href="/tags/parsec/">parsec</a> }
</li>
</ul>
</div>
</header>
<div>
<p>While putting an expression syntax in front of the Lisp interpreter I wrote earlier, I ran into grammar rules like these</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mi>a</mi><mi>c</mi><mi>t</mi><mi>o</mi><mi>r</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>s</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi><mo separator="true">,</mo><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>s</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mi>F</mi><mi>a</mi><mi>c</mi><mi>t</mi><mi>o</mi><mi>r</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>I</mi><mi>n</mi><mi>t</mi><mi>e</mi><mi>g</mi><mi>e</mi><mi>r</mi><mi mathvariant="normal">∣</mi><mi>A</mi><mi>p</mi><mi>p</mi><mi>l</mi><mi>y</mi><mi mathvariant="normal">∣</mi><mi>I</mi><mi>d</mi><mi>e</mi><mi>n</mi><mi>t</mi><mi>i</mi><mi>f</mi><mi>y</mi><mi mathvariant="normal">∣</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">∣</mi><mrow><mo>(</mo></mrow><mrow><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi></mrow><mrow><mo>)</mo></mrow></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mi>I</mi><mi>n</mi><mi>t</mi><mi>e</mi><mi>g</mi><mi>e</mi><mi>r</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi 
mathvariant="normal">.</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mi>A</mi><mi>p</mi><mi>p</mi><mi>l</mi><mi>y</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mi>a</mi><mi>c</mi><mi>t</mi><mi>o</mi><mi>r</mi><mo>(</mo><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>s</mi><mo>)</mo></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{aligned}
Expr & \rightarrow Factor ... \\\\
Exprs & \rightarrow Expr , Exprs\\\\
Factor & \rightarrow Integer|Apply|Identify|...|{(} {Expr} {)} \\\\
Integer & \rightarrow... \\\\
...\\\\
Apply & \rightarrow Factor ( Exprs )
\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:6.849999999999999em;"></span><span class="strut bottom" style="height:13.2em;vertical-align:-6.35em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-6.009999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.02778em;">r</span></span></span><span style="top:-4.809999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:-3.609999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit">s</span></span></span><span style="top:-2.409999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:-1.209999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mord mathit">a</span><span class="mord mathit">c</span><span class="mord mathit">t</span><span class="mord mathit">o</span><span 
class="mord mathit" style="margin-right:0.02778em;">r</span></span></span><span style="top:-0.009999999999997733em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:1.1900000000000024em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.07847em;">I</span><span class="mord mathit">n</span><span class="mord mathit">t</span><span class="mord mathit">e</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mord mathit">e</span><span class="mord mathit" style="margin-right:0.02778em;">r</span></span></span><span style="top:2.390000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:3.590000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathrm">.</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span></span></span><span style="top:4.79em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:5.989999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span><span class="mord mathit">p</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit" style="margin-right:0.03588em;">y</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-6.009999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mord mathit">a</span><span class="mord mathit">c</span><span class="mord mathit">t</span><span class="mord mathit">o</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span></span></span><span style="top:-3.6099999999999985em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit">s</span></span></span><span style="top:-1.2099999999999989em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.07847em;">I</span><span class="mord mathit">n</span><span class="mord mathit">t</span><span class="mord mathit">e</span><span class="mord mathit" 
style="margin-right:0.03588em;">g</span><span class="mord mathit">e</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathrm">∣</span><span class="mord mathit">A</span><span class="mord mathit">p</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit" style="margin-right:0.03588em;">y</span><span class="mord mathrm">∣</span><span class="mord mathit" style="margin-right:0.07847em;">I</span><span class="mord mathit">d</span><span class="mord mathit">e</span><span class="mord mathit">n</span><span class="mord mathit">t</span><span class="mord mathit">i</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mord mathit" style="margin-right:0.03588em;">y</span><span class="mord mathrm">∣</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span><span class="mord mathrm">∣</span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.02778em;">r</span></span><span class="mord displaystyle textstyle uncramped"><span class="mclose">)</span></span></span></span><span style="top:1.1900000000000015em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span><span class="mord mathrm">.</span></span></span><span style="top:5.989999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle 
uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mord mathit">a</span><span class="mord mathit">c</span><span class="mord mathit">t</span><span class="mord mathit">o</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit">s</span><span class="mclose">)</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span>
<p>Clearly, the leftmost symbol in the derivation of the non-terminal <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mi>p</mi><mi>p</mi><mi>l</mi><mi>y</mi></mrow><annotation encoding="application/x-tex">Apply</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mord mathit">p</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit" style="margin-right:0.03588em;">y</span></span></span></span> is <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi><mi>a</mi><mi>c</mi><mi>t</mi><mi>o</mi><mi>r</mi></mrow><annotation encoding="application/x-tex">Factor</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mord mathit">a</span><span class="mord mathit">c</span><span class="mord mathit">t</span><span class="mord mathit">o</span><span class="mord mathit" style="margin-right:0.02778em;">r</span></span></span></span>, which in turn can lead straight back to <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mi>p</mi><mi>p</mi><mi>l</mi><mi>y</mi></mrow><annotation encoding="application/x-tex">Apply</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mord mathit">p</span><span class="mord mathit">p</span><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit" style="margin-right:0.03588em;">y</span></span></span></span>. Textbook left recursion.</p>
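<p>One common way out, sketched here in Python rather than with Parsec and not necessarily the approach the full post takes, is to parse a non-recursive atom first and then consume any number of trailing argument lists in a loop. The token set, the tuple-shaped AST, and all function names are assumptions of this sketch; the operators elided in the grammar are left out, and an argument list is assumed to hold one or more comma-separated expressions.</p>

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[(),])")


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def expect(tokens, pos, tok):
    if pos >= len(tokens) or tokens[pos] != tok:
        raise SyntaxError(f"expected {tok!r} at token {pos}")


def parse_expr(tokens, pos):
    # Expr -> Factor ...   (operators are elided in the grammar, and here too)
    return parse_factor(tokens, pos)


def parse_atom(tokens, pos):
    # The alternatives of Factor that do not start with Factor itself.
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        expr, pos = parse_expr(tokens, pos + 1)
        expect(tokens, pos, ")")
        return expr, pos + 1
    if tok.isdigit():
        return ("int", int(tok)), pos + 1
    if tok in (")", ","):
        raise SyntaxError(f"unexpected {tok!r} at token {pos}")
    return ("id", tok), pos + 1


def parse_exprs(tokens, pos):
    # Exprs -> Expr { "," Expr }
    first, pos = parse_expr(tokens, pos)
    args = [first]
    while pos < len(tokens) and tokens[pos] == ",":
        nxt, pos = parse_expr(tokens, pos + 1)
        args.append(nxt)
    return args, pos


def parse_factor(tokens, pos):
    # Factor -> Atom { "(" Exprs ")" }
    # The loop replaces the left-recursive rule Apply -> Factor ( Exprs ).
    node, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "(":
        args, pos = parse_exprs(tokens, pos + 1)
        expect(tokens, pos, ")")
        node = ("apply", node, args)
        pos += 1
    return node, pos


def parse(src):
    tokens = tokenize(src)
    node, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at token {pos}")
    return node
```

<p>Each pass through the loop wraps the node parsed so far, so chained calls such as <code>f(1)(2)</code> come out left-nested, just as the left-recursive rule would have produced them.</p>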
<div class="more-link">
<a href="/2017/03/03/使用-Parsec-处理左递归/#more">Read On »</a>
</div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2016/08/24/Call-With-Current-Continuation-Coroutine/">Call With Current Continuation: Coroutine</a>
</h1>
<div class='ListMeta'>
<time datetime="2016-08-24T14:33:53.000Z" itemprop="datePublished">
2016-08-24
</time>
</div>
</header>
<div>
<link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css"><p>Coroutines can be implemented conveniently by nesting <code>call/cc</code></p>
<figure class="highlight"><table><tr><td class="code"><pre><div class="line">(define (coroutine routine)</div><div class="line"> (let ((current routine)</div><div class="line"> (status 'suspended))</div><div class="line"> (lambda args</div><div class="line"> (cond ((null? args) </div><div class="line"> (if (eq? status 'dead)</div><div class="line"> (error 'dead-coroutine)</div><div class="line"> (let ((continuation-and-value</div><div class="line"> (call/cc (lambda (return)</div><div class="line"> (let ((returner</div><div class="line"> (lambda (value)</div><div class="line"> (call/cc (lambda (next)</div><div class="line"> (return (cons next value)))))))</div><div class="line"> (current returner)</div><div class="line"> (set! status 'dead))))))</div><div class="line"> (if (pair? continuation-and-value)</div><div class="line"> (begin (set! current (car continuation-and-value))</div><div class="line"> (cdr continuation-and-value))</div><div class="line"> continuation-and-value))))</div><div class="line"> ((eq? (car args) 'status?) status)</div><div class="line"> ((eq? (car args) 'dead?) (eq? status 'dead))</div><div class="line"> ((eq? (car args) 'alive?) (not (eq? status 'dead)))</div><div class="line"> ((eq? (car args) 'kill!) (set! status 'dead))</div><div class="line"> (true nil)))))</div></pre></td></tr></table></figure>
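<p>For readers more at home in Python, here is a rough counterpart with a similar interface. It swaps the technique: Python has no <code>call/cc</code>, so this sketch uses a generator, where <code>yield</code> plays the role of the <code>returner</code> continuation. The class name <code>Coroutine</code> and its method names are made up for illustration.</p>

```python
class Coroutine:
    """Generator-based stand-in for the call/cc coroutine above."""

    def __init__(self, routine):
        self._gen = routine()       # routine must be a generator function
        self.status = "suspended"

    def resume(self):
        """Run the routine until its next yield and return the yielded value."""
        if self.status == "dead":
            raise RuntimeError("dead-coroutine")
        try:
            return next(self._gen)
        except StopIteration as stop:
            # The routine ran to completion: mark it dead, hand back its result.
            self.status = "dead"
            return stop.value

    def is_dead(self):
        return self.status == "dead"

    def is_alive(self):
        return not self.is_dead()

    def kill(self):
        self._gen.close()
        self.status = "dead"
```

<p>Calling <code>resume()</code> repeatedly walks through the yielded values in order; once the routine has returned, or after <code>kill()</code>, further calls raise an error, mirroring the <code>'dead</code> status in the Scheme version.</p>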
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2016/08/01/Write-You-a-Scheme/">Write You a Scheme</a>
</h1>
<div class='ListMeta'>
<time datetime="2016-08-01T01:33:57.000Z" itemprop="datePublished">
2016-08-01
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/haskell/">haskell</a> }
</li>
<li class="meta-text">
{ <a href="/tags/scheme/">scheme</a> }
</li>
<li class="meta-text">
{ <a href="/tags/interpreter/">interpreter</a> }
</li>
<li class="meta-text">
{ <a href="/tags/lambda-calculus/">lambda calculus</a> }
</li>
</ul>
</div>
</header>
<div>
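<p>I hacked together a <a href="https://github.com/nameoverflow/LittleScheme">Scheme interpreter</a>, so I can now say I have built something with Haskell (even if it is only a toy).</p>
<p>The biggest takeaway is that I got familiar with Haskell and consolidated my Scheme at the same time (I had read SICP but never really understood things like quasiquote and call/cc).</p>
<p>I originally planned to define a language of my own (like <a href="https://hcyue.me/article/56eb92c6d09a19dd0956c469">this one</a>), but that turned out to be a lot of trouble (just kidding), and what I care about more is the interpretation process anyway, so I decided to implement Scheme instead. I mostly followed <a href="https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours">Write Yourself a Scheme in 48 Hours</a> and added things like continuations on top of it.</p>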
<div class="more-link">
<a href="/2016/08/01/Write-You-a-Scheme/#more">Read On »</a>
</div>
</div>
</article>
</li>
<li>
<article class='ListView'>
<header class="title">
<h1>
<a href="/2016/05/14/什么是函数式编程思维/">What Is Functional Programming Thinking?</a>
</h1>
<div class='ListMeta'>
<time datetime="2016-05-14T06:00:30.000Z" itemprop="datePublished">
2016-05-14
</time>
|
<ul>
<li class="meta-text">
{ <a href="/tags/functional-programming/">functional programming</a> }
</li>
</ul>
</div>
</header>
<div>
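<link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css"><p>Why am I moving my Zhihu answer here... probably because I have not posted anything in ages and need filler.</p>
<p>Author: nameoverflow</p>
<p>Link: <a href="https://www.zhihu.com/question/28292740/answer/100284611" target="_blank" rel="external">https://www.zhihu.com/question/28292740/answer/100284611</a></p>
<p>Source: Zhihu</p>
<p>Copyright belongs to the author. For commercial reproduction, contact the author for permission; for non-commercial reproduction, credit the source.</p>
<p>The biggest difference between functional and imperative programming is really this:</p>
<p><strong>Functional programming is concerned with mappings between data; imperative programming is concerned with the steps that solve a problem</strong></p>
<p>A mapping here is the mathematical notion of a "function": a correspondence between one thing and another.</p>
<p>That is also why "functional programming" is called "functional".</p>
<p>What does that mean?</p>
<p>Suppose you are interviewing at Google and the interviewer asks you to mirror a binary tree (just kidding)</p>
<p>Almost without thinking, you could write Python code like this:</p>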
<figure class="highlight python"><table><tr><td class="code"><pre><div class="line"></div><div class="line"><span class="function"><span class="keyword">def</span> <span class="title">invertTree</span><span class="params">(root)</span>:</span></div><div class="line"></div><div class="line"> <span class="keyword">if</span> root <span class="keyword">is</span> <span class="keyword">None</span>:</div><div class="line"></div><div class="line"> <span class="keyword">return</span> <span class="keyword">None</span></div><div class="line"></div><div class="line"> <span class="keyword">if</span> root.left:</div><div class="line"></div><div class="line"> invertTree(root.left)</div><div class="line"></div><div class="line"> <span class="keyword">if</span> root.right:</div><div class="line"></div><div class="line"> invertTree(root.right)</div><div class="line"></div><div class="line"> root.left, root.right = root.right, root.left</div><div class="line"></div><div class="line"> <span class="keyword">return</span> root</div></pre></td></tr></table></figure>
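<p>Now stop and look at what this code actually says:</p>
<p>It means: first check whether the node is empty; then invert the left subtree; then invert the right subtree; finally swap left and right.</p>
<p>That is imperative programming: whatever you want done, you describe in detail the steps that get you there and hand them to the machine to run.</p>
<p>This is exactly the character of the Turing machine, the theoretical model behind imperative programming: a tape full of data and a machine that moves according to what is on the tape, where every step the machine takes has to be spelled out on the tape.</p>
<p>So how do you invert a binary tree without doing it this way?</p>
<p>Functional thinking offers another route:</p>
<p>"Inverting a binary tree" can be seen as obtaining a new binary tree that is the mirror image of the original one.</p>
<p>The defining property of the new tree is that every node is, recursively, the mirror of the corresponding node in the original tree.</p>
<p>Expressed in Haskell:</p>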
<figure class="highlight haskell"><table><tr><td class="code"><pre><div class="line"><span class="class"><span class="keyword">data</span> <span class="type">Tree</span> a = <span class="type">Nil</span> | <span class="type">Node</span> a (<span class="type">Tree</span> <span class="title">a</span>) (<span class="type">Tree</span> <span class="title">a</span>)</span></div><div class="line"> <span class="keyword">deriving</span> (<span class="type">Show</span>, <span class="type">Eq</span>)</div><div class="line"></div><div class="line"><span class="title">invert</span> :: <span class="type">Tree</span> a -> <span class="type">Tree</span> a</div><div class="line"><span class="title">invert</span> <span class="type">Nil</span> = <span class="type">Nil</span></div><div class="line"><span class="title">invert</span> (<span class="type">Node</span> v l r) = <span class="type">Node</span> v (invert r) (invert l)</div></pre></td></tr></table></figure>
<p>(In case that is hard to read, here it is translated into equivalent Python.)</p>
<figure class="highlight python"><table><tr><td class="code"><pre><div class="line"><span class="function"><span class="keyword">def</span> <span class="title">invert</span><span class="params">(node)</span>:</span></div><div class="line">    <span class="keyword">if</span> node <span class="keyword">is</span> <span class="keyword">None</span>:</div><div class="line">        <span class="keyword">return</span> <span class="keyword">None</span></div><div class="line">    <span class="keyword">else</span>:</div><div class="line">        <span class="comment"># Tree(value, left, right) is assumed to build a new node.</span></div><div class="line">        <span class="keyword">return</span> Tree(node.value, invert(node.right), invert(node.left))</div></pre></td></tr></table></figure>
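<p>The thinking behind this code is a mapping from the old tree to the new tree: for any binary tree, its mirror image is the tree whose left and right children are the recursively mirrored right and left children.</p>
<p>The end result is still an inverted binary tree, but the way it gets there is fundamentally different from the imperative Python code earlier: it describes an old tree -> new tree mapping, rather than describing "what to do to turn the old tree into the new one".</p>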
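<p>To make the contrast concrete, here is a minimal runnable sketch of the mapping style. The frozen <code>Tree</code> dataclass and its field names are assumptions made for illustration (the original answer does not define a Python <code>Tree</code>), with <code>None</code> standing in for Haskell's <code>Nil</code>.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tree:
    """Immutable binary tree node (hypothetical); None plays the role of Nil."""
    value: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def invert(node: Optional[Tree]) -> Optional[Tree]:
    """Describe the mirror image as a new tree; the input is never modified."""
    if node is None:
        return None
    return Tree(node.value, invert(node.right), invert(node.left))


original = Tree(1, Tree(2, Tree(4)), Tree(3))
mirrored = invert(original)

# The old tree is untouched, and mirroring twice gives back an equal tree.
print(mirrored)
print(invert(mirrored) == original)
```

<p>Because <code>invert</code> only builds new nodes, the old tree is still available afterwards, which is what lets you treat the function as a pure old tree -> new tree correspondence.</p>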
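<p>So what is the benefit of thinking this way?</p>
<p>First, and most obviously, functional-style code can be very concise, which greatly reduces wear and tear on your keyboard (joking).</p>
<p>More importantly, functional code is a "description of a mapping". It can describe not only correspondences between data structures such as binary trees, but correspondences between anything that can be represented in a computer: mappings between functions (functors, for example), or mappings from external events to a GUI (the so-called FRP that frontend developers are so excited about right now). Its level of abstraction can be very high, which means functional code is easier to reuse.</p>
<p>Writing code in this form also makes it easier to study with mathematical methods (which is why lofty mathematical notions like "a ___ in the category of ___" get dragged in).</p>
<p>As for things like currying and immutable data, those are merely outward manifestations of this idea.</p>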
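<p>As a rough illustration of a "mapping between functions", the sketch below turns any function on values into a function on whole trees. The names <code>Tree</code> and <code>lift</code> are hypothetical and not from the original answer; in Haskell this role would be played by <code>fmap</code> from a <code>Functor</code> instance.</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Tree:
    """Immutable binary tree node (hypothetical); None is the empty tree."""
    value: object
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def lift(f: Callable) -> Callable[[Optional[Tree]], Optional[Tree]]:
    """Map a function on values to a function on trees of values."""
    def mapped(node: Optional[Tree]) -> Optional[Tree]:
        if node is None:
            return None
        return Tree(f(node.value), mapped(node.left), mapped(node.right))
    return mapped


# The same lift is reused for unrelated value-level functions.
double_all = lift(lambda x: x * 2)
label_all = lift(str)

numbers = Tree(1, Tree(2), Tree(3))
print(double_all(numbers))
print(label_all(numbers))
```

<p>The reuse comes from the abstraction: <code>lift</code> knows nothing about doubling or string conversion, only about how the tree shape is preserved.</p>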
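<p>For readers who have not met these two terms, here is a small sketch of what they look like in Python. The names <code>scale</code> and <code>Point</code> are made up for this example; Python does not curry functions automatically, so the curried form is written by hand.</p>

```python
from dataclasses import dataclass, replace


def scale(factor):
    """Curried form: take one argument, return a function awaiting the next."""
    def apply(x):
        return factor * x
    return apply


triple = scale(3)  # a new function obtained by supplying only the first argument


@dataclass(frozen=True)
class Point:
    """Immutable record: 'changing' a field means building a new value."""
    x: int
    y: int


p = Point(1, 2)
q = replace(p, x=10)  # p itself is left unchanged

print(triple(5))
print(p, q)
```

<p>Both features follow from treating programs as mappings: a curried function is a mapping that returns another mapping, and immutable data guarantees that applying a mapping never disturbs its input.</p>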
</div>
</article>
</li>
</ul>
<section id="nav-wrapper">
<nav id="page-nav">
<span class="page-number current">1</span><a class="page-number" href="/page/2/">2</a><a class="page-number" href="/page/3/">3</a><span class="space">…</span><a class="page-number" href="/page/5/">5</a><a class="extend next" rel="next" href="/page/2/">next »</a>
</nav>
</section>
<footer>
<div>© 2016 - Hcyue , CC BY-NC-SA 4.0 </div>
<div>
Powered by Hexo
</div>
</footer>
</div>
</div>
</div>
<script src="/js/pager/dist/singlepager.js"></script>
<script>
var sp = new Pager('data-pager-shell')
</script><!-- hexo-inject:begin --><!-- hexo-inject:end -->
</body>
</html>