-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathData Locality · Optimization Patterns · Game Programming Patterns.html
1123 lines (1014 loc) · 69 KB
/
Data Locality · Optimization Patterns · Game Programming Patterns.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- saved from url=(0053)http://gameprogrammingpatterns.com/data-locality.html -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Data Locality · Optimization Patterns · Game Programming Patterns</title>
<!-- Tell mobile browsers we're optimized for them and they don't need to crop
the viewport. -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="./Data Locality · Optimization Patterns · Game Programming Patterns_files/style.css">
<link href="./Data Locality · Optimization Patterns · Game Programming Patterns_files/css" rel="stylesheet" type="text/css">
<link rel="icon" type="image/png" href="http://gameprogrammingpatterns.com/favicon-32x32.png" sizes="32x32">
<link rel="icon" type="image/png" href="http://gameprogrammingpatterns.com/favicon-16x16.png" sizes="16x16">
<script async="" src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/analytics.js"></script><script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-42804721-1', 'gameprogrammingpatterns.com');
ga('send', 'pageview');
</script>
<script src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/jquery-1.10.1.min.js"></script>
<script src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/script.js"></script>
</head>
<body id="top">
<div class="page sidebar">
<div class="content">
<nav class="top">
<table>
<tbody><tr>
<td>← <a href="http://gameprogrammingpatterns.com/optimization-patterns.html">Previous<span class="full-nav"> Chapter</span></a>
</td><td class="gap">
</td><td>≡ <a href="http://gameprogrammingpatterns.com/">About <span class="full-nav">The Book</span></a>
</td><td class="gap">
</td><td>§ <a href="http://gameprogrammingpatterns.com/contents.html">Contents</a>
</td><td class="gap">
</td><td><a href="http://gameprogrammingpatterns.com/dirty-flag.html">Next<span class="full-nav"> Chapter</span></a> →
</td></tr>
</tbody></table>
</nav>
<h1>Data Locality</h1>
<h1 class="book"><a href="http://gameprogrammingpatterns.com/">Game Programming Patterns</a><span class="section"><a href="http://gameprogrammingpatterns.com/optimization-patterns.html">Optimization Patterns</a></span></h1>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#intent" name="intent">Intent</a></h2>
<p><em>Accelerate memory access by arranging data to take advantage of CPU caching.</em></p>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#motivation" name="motivation">Motivation</a></h2>
<p>We’ve been lied to. They keep showing us charts where CPU speed goes up and up
every year as if Moore’s Law isn’t just a historical observation but some kind
of divine right. Without lifting a finger, we software folks watch our programs
magically accelerate just by virtue of new hardware.</p>
<p>Chips <em>have</em> been getting faster (though even that’s plateauing now), but the
hardware heads failed to mention something. Sure, we can <em>process</em> data faster
than ever, but we can’t <em>get</em> that data faster.</p>
<p><span name="legend"></span></p>
<p><img src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/data-locality-chart.png" alt="A chart showing processor and RAM speed from 1980 to 2010. Processor speed increases quickly, but RAM speed lags behind."></p>
<aside name="legend" style="top: 658px;">
<p>Processor and RAM speed relative to their respective speeds in 1980. As you can
see, CPUs have grown in leaps and bounds, but RAM access is lagging far behind.</p>
<p>Data for this is from <em>Computer Architecture: A Quantitative Approach</em>
by John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau by way of Tony
Albrecht’s “<a href="http://seven-degrees-of-freedom.blogspot.com/2009/12/pitfalls-of-object-oriented-programming.html">Pitfalls of Object-Oriented Programming</a>”.</p>
</aside>
<p>For your super-fast CPU to blow through a ream of calculations, it actually has
to get the data out of main memory and into registers. As you can see, RAM hasn’t
been keeping up with increasing CPU speeds. Not even close.</p>
<p>With today’s hardware, it can take <em>hundreds</em> of cycles to fetch a byte of data
from <span name="ram">RAM</span>. If most instructions need data, and it takes
hundreds of cycles to get it, how is it that our CPUs aren’t sitting idle 99%
of the time waiting for data?</p>
<p>Actually, they <em>are</em> stuck waiting on memory an astonishingly large fraction of
time these days, but it’s not as bad as it could be. To explain how, let’s take
a trip to the Land of Overly Long Analogies…</p>
<aside name="ram" style="top: 1133px;">
<p>It’s called “random access memory” because, unlike disc drives, you can
theoretically access any piece of it as quick as any other. You don’t have
to worry about reading things consecutively like you do a disc.</p>
<p>Or, at least, you <em>didn’t</em>. As we’ll see, RAM isn’t so random access anymore.</p>
</aside>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#a-data-warehouse" name="a-data-warehouse">A data warehouse</a></h3>
<p>Imagine you’re an accountant in a tiny little office. Your job is to request a
box of papers and then do some <span name="accountant">accountant</span>-y stuff
with them — add up a bunch of numbers or something. You must do this for
specific labeled boxes according to some arcane logic that only makes sense to
other accountants.</p>
<aside name="accountant" style="top: 1376px;">
<p>I probably shouldn’t have used a job I know absolutely nothing about in this
analogy.</p>
</aside>
<p>Thanks to a mixture of hard work, natural aptitude, and stimulants, you can
finish an entire box in, say, a minute. There’s a little problem, though. All of
those boxes are stored in a warehouse in a separate building. To get a box, you
have to ask the warehouse guy to bring it to you. He goes and gets a forklift
and drives around the aisles until he finds the box you want.</p>
<p>It takes him, seriously, an entire day to do this. Unlike you, he’s not getting
employee of the month any time soon. This means that no matter how fast you are,
you only get one box a day. The rest of the time, you just sit there and
question the life decisions that led to this soul-sucking job.</p>
<p>One day, a group of industrial designers shows up. Their job is to improve the
efficiency of operations — things like making assembly lines go faster. After
watching you work for a few days, they notice a few things:</p>
<ul>
<li>
<p>Pretty often, when you’re done with one box, the next box you request is
right <span name="next">next</span> to it on the same shelf in the
warehouse.</p>
</li>
<li>
<p>Using a forklift to carry a single box of papers is pretty dumb.</p>
</li>
<li>
<p>There’s actually a little bit of spare room in the corner of your office.</p>
</li>
</ul>
<aside name="next" style="top: 1820px;">
<p>The technical term for using something near the thing you just used is <em>locality
of reference</em>.</p>
</aside>
<p>They come up with a clever fix. Whenever you request a box from the warehouse
guy, he’ll grab an entire pallet of them. He gets the box you want and then some
more boxes that are next to it. He doesn’t know if you want those (and, given
his work ethic, clearly doesn’t care); he simply takes as many as he can fit on
the pallet.</p>
<p>He loads the whole pallet and brings it to you. Disregarding concerns for
workplace safety, he drives the forklift right in and drops the pallet in the
corner of your office.</p>
<p>When you need a new box, now, the first thing you do is see if it’s already on
the pallet in your office. If it is, great! It only takes you a second to grab
it and get back to crunching numbers. If a pallet holds fifty boxes and you
got lucky and <em>all</em> of the boxes you need happen to be on it, you can churn
through fifty times more work than you could before.</p>
<p>But if you need a box that’s <em>not</em> on the pallet, you’re back to square one.
Since you can only fit one pallet in your office, your warehouse friend will
have to take that one back and then bring you an entirely new one.</p>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#a-pallet-for-your-cpu" name="a-pallet-for-your-cpu">A pallet for your CPU</a></h3>
<p>Strangely enough, this is similar to how CPUs in modern computers work. In case
it isn’t obvious, you play the role of the CPU. Your desk is the CPU’s
registers, and the box of papers is the data you can fit in them. The warehouse
is your machine’s RAM, and that annoying warehouse guy is the bus that pulls
data from main memory into registers.</p>
<p>If I were writing this chapter thirty years ago, the analogy would stop there.
But as chips got faster and RAM, well, <em>didn’t</em>, hardware engineers started
looking for solutions. What they came up with was <em>CPU caching</em>.</p>
<p>Modern computers have a <span name="caches">little chunk</span> of memory right
inside the chip. The CPU can pull data from this much faster than it can from
main memory. It’s small because it has to fit in the chip and because the faster
type of memory it uses (static RAM or “SRAM”) is way more expensive.</p>
<aside name="caches" style="top: 2648px;">
<p>Modern hardware has multiple levels of caching, which is what they mean
when you hear “L1”, “L2”, “L3”, etc. Each level is larger but slower than the
previous. For this chapter, we won’t worry about the fact that memory is
actually a <a href="http://en.wikipedia.org/wiki/Memory_hierarchy">hierarchy</a>, but it’s
important to know.</p>
</aside>
<p>This little chunk of memory is called a <em>cache</em> (in particular, the chunk on the
chip is your <em>L1 cache</em>), and in my belabored analogy, its part was played by the
pallet of boxes. Whenever your chip needs a byte of data from RAM, it
automatically grabs a whole chunk of contiguous memory — usually around 64 to
128 bytes — and puts it in the cache. This dollop of memory is called a <em>cache
line</em>.</p>
<p><img src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/data-locality-cache-line.png" alt="A cache line showing the one byte requested along with the adjacent bytes that also get loaded into the cache."></p>
<p>If the <span name="pallet">next byte</span> of data you need happens to be in
that chunk, the CPU reads it straight from the cache, which is <em>much</em> faster
than hitting RAM. Successfully finding a piece of data in the cache is called a
<em>cache hit</em>. If it can’t find it in there and has to go to main memory, that’s a
<em>cache miss</em>.</p>
<aside name="pallet" style="top: 3103px;">
<p>I glossed over (at least) one detail in the analogy. In your office, there was
only room for one pallet, or one cache line. A real cache contains a number of
cache lines. The details about how those work is out of scope here, but search
for “cache associativity” to feed your brain.</p>
</aside>
<p>When a cache miss occurs, the CPU <em>stalls</em> — it can’t process the next
instruction because it needs data. It sits there, bored out of its mind for a
few hundred cycles until the fetch completes. Our mission is to avoid that.
Imagine you’re trying to optimize some performance-critical piece of game code
and it looks like this:</p>
<div class="codehilite"><pre><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">NUM_THINGS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sleepFor500Cycles</span><span class="p">();</span>
<span class="n">things</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">doStuff</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>What’s the first change you’re going to make to that code? Right. Take out that
pointless, expensive function call. That call is equivalent to the performance
cost of a cache miss. Every time you bounce to main memory, it’s like you put a
delay in your code.</p>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#wait,-data-is-performance" name="wait,-data-is-performance">Wait, data is performance?</a></h3>
<p>When I started working on this chapter, I spent some time putting together
little game-like programs that would trigger best case and worst case cache
usage. I wanted benchmarks that would thrash the cache so I could see first-hand
how much bloodshed it causes.</p>
<p>When I got some stuff working, I was surprised. I knew it was a big deal, but
there’s nothing quite like seeing it with your own eyes. <span name="ymmv">I
wrote two programs</span> that did the <em>exact same</em> computation. The only
difference was how many cache misses they caused. The slow one was <em>fifty times</em>
slower than the other.</p>
<aside name="ymmv" style="top: 3757px;">
<p>There are a lot of caveats here. In particular, different computers have different
cache setups, so my machine may be different from yours, and dedicated game
consoles are very different from PCs, which are quite different from mobile
devices.</p>
<p>Your mileage will vary.</p>
</aside>
<p>This was a real eye-opener to me. I’m used to thinking of performance being an
aspect of <em>code</em>, not <em>data</em>. A byte isn’t slow or fast, it’s just some static
thing sitting there. But because of caching, <em>the way you organize data
directly impacts performance</em>.</p>
<p>The challenge now is to wrap that up into something that fits into a chapter
here. Optimization for cache usage is a huge topic. I haven’t even touched on
<em>instruction caching</em>. Remember, code is in memory too and has to be loaded onto
the CPU before it can be executed. Someone more versed on the subject could
write an entire <span name="book">book</span> on it.</p>
<aside name="book" style="top: 4051px;">
<p>In fact, someone <em>did</em> write a book on it: <a href="http://www.dataorienteddesign.com/dodmain/"><em>Data-Oriented
Design</em></a>, by Richard Fabian.</p>
</aside>
<p>Since you’re already reading <em>this</em> book right now, though, I have a few basic
techniques that will get you started along the path of thinking about how data
structures impact your performance.</p>
<p>It all boils down to something pretty simple: whenever the chip reads some
memory, it gets a whole cache line. The more you can use stuff in that <span name="line">cache line, the faster you go</span>. So the goal then is to
<em>organize your data structures so that the things you’re processing are next to
each other in memory</em>.</p>
<aside name="line" style="top: 4201px;">
<p>There’s a key assumption here, though: one thread. If you are modifying nearby
data on multiple threads, it’s faster to have it on <em>different</em> cache lines. If
two threads try to tweak data on the same cache line, both cores have to do some
costly synchronization of their caches.</p>
</aside>
<p>In other words, if your code is crunching on <code>Thing</code>, then <code>Another</code>, then
<code>Also</code>, you want them laid out in memory like this:</p>
<p><img src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/data-locality-things.png" alt="Thing, Another, and Also laid out directly next to each other in order in memory."></p>
<p>Note, these aren’t <em>pointers</em> to <code>Thing</code>, <code>Another</code>, and <code>Also</code>. This is the
actual data for them, in place, lined up one after the other. As soon as the CPU
reads in <code>Thing</code>, it will start to get <code>Another</code> and <code>Also</code> too (depending on
how big they are and how big a cache line is). When you start working on them
next, they’ll already be cached. Your chip is happy, and you’re happy.</p>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#the-pattern" name="the-pattern">The Pattern</a></h2>
<p>Modern CPUs have <strong>caches to speed up memory access</strong>. These can access memory
<strong>adjacent to recently accessed memory much quicker</strong>. Take advantage of that to
improve performance by <strong>increasing data locality</strong> — keeping data in
<strong>contiguous memory in the order that you process it</strong>.</p>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#when-to-use-it" name="when-to-use-it">When to Use It</a></h2>
<p>Like most optimizations, the first guideline for using the Data Locality pattern is <em>when you have a
performance problem.</em> Don’t waste time applying this to some infrequently
executed corner of your codebase. Optimizing code that doesn’t need it just
makes your life harder since the result is almost always more complex and less
flexible.</p>
<p>With this pattern specifically, you’ll also want to be sure your performance
problems <em>are caused by cache misses</em>. If your code is slow for other reasons,
this won’t help.</p>
<p>The cheap way to profile is to manually add a bit of instrumentation that checks
how much time has elapsed between two points in the code, hopefully using a
precise timer. To catch poor cache usage, you’ll want something a little more
sophisticated. You really want to see how many cache misses are occurring and
where.</p>
<p>Fortunately, there are <span name="cachegrind">profilers</span> out there that
report this. It’s worth spending the time to get one of these working and make
sure you understand the (surprisingly complex) numbers it throws at you before
you do major surgery on your data structures.</p>
<aside name="cachegrind" style="top: 5227px;">
<p>Unfortunately, most of those tools aren’t cheap. If you’re on a console dev
team, you probably already have licenses for them.</p>
<p>If not, an excellent free option is
<a href="http://valgrind.org/docs/manual/cg-manual.html">Cachegrind</a>. It runs your
program on top of a simulated CPU and cache hierarchy and then reports all of
the cache interactions.</p>
</aside>
<p>That being said, cache misses <em>will</em> affect the performance of your game. While
you shouldn’t spend a ton of time pre-emptively optimizing for cache usage, do
think about how cache-friendly your data structures are throughout the design
process.</p>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#keep-in-mind" name="keep-in-mind">Keep in Mind</a></h2>
<p>One of the hallmarks of software architecture is <em>abstraction</em>. A large chunk of
this book is about patterns to decouple pieces of code from each other so that
they can be changed more easily. In object-oriented languages, this almost
always means interfaces.</p>
<p>In C++, using interfaces implies accessing objects through <span name="virtual">pointers or references</span>. But going through a pointer means
hopping across memory, which leads to the cache misses this pattern works to
avoid.</p>
<aside name="virtual" style="top: 5626px;">
<p>The other half of interfaces is <em>virtual method calls</em>. Those require the CPU to
look up an object’s vtable and then find the pointer to the actual method to
call there. So, again, you’re chasing pointers, which can cause cache misses.</p>
</aside>
<p>In order to please this pattern, you will have to sacrifice some of your
precious abstractions. The more you design your program around data locality,
the more you will have to give up inheritance, interfaces, and the benefits
those tools can provide. There’s no silver bullet here, only challenging
trade-offs. That’s what makes it fun!</p>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#sample-code" name="sample-code">Sample Code</a></h2>
<p>If you really go down the rathole of optimizing for data locality, you’ll
discover countless ways to slice and dice your data structures into pieces your
CPU can most easily digest. To get you started, I’ll show an example for each of
a few of the most common ways to organize your data. We’ll cover them in the
context of some specific part of a game engine, but (as with other patterns),
keep in mind that the general technique can be applied anywhere it fits.</p>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#contiguous-arrays" name="contiguous-arrays">Contiguous arrays</a></h3>
<p>Let’s start with a <a href="http://gameprogrammingpatterns.com/game-loop.html" class="pattern">game loop</a> that
processes a bunch of game entities. Those entities are decomposed into different
domains — AI, physics, and rendering — using the <a href="http://gameprogrammingpatterns.com/component.html" class="pattern">Component</a> pattern. Here’s the <code>GameEntity</code> class:</p>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">GameEntity</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">GameEntity</span><span class="p">(</span><span class="n">AIComponent</span><span class="o">*</span> <span class="n">ai</span><span class="p">,</span>
<span class="n">PhysicsComponent</span><span class="o">*</span> <span class="n">physics</span><span class="p">,</span>
<span class="n">RenderComponent</span><span class="o">*</span> <span class="n">render</span><span class="p">)</span>
<span class="o">:</span> <span class="n">ai_</span><span class="p">(</span><span class="n">ai</span><span class="p">),</span> <span class="n">physics_</span><span class="p">(</span><span class="n">physics</span><span class="p">),</span> <span class="n">render_</span><span class="p">(</span><span class="n">render</span><span class="p">)</span>
<span class="p">{}</span>
<span class="n">AIComponent</span><span class="o">*</span> <span class="n">ai</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">ai_</span><span class="p">;</span> <span class="p">}</span>
<span class="n">PhysicsComponent</span><span class="o">*</span> <span class="n">physics</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">physics_</span><span class="p">;</span> <span class="p">}</span>
<span class="n">RenderComponent</span><span class="o">*</span> <span class="n">render</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">render_</span><span class="p">;</span> <span class="p">}</span>
<span class="nl">private:</span>
<span class="n">AIComponent</span><span class="o">*</span> <span class="n">ai_</span><span class="p">;</span>
<span class="n">PhysicsComponent</span><span class="o">*</span> <span class="n">physics_</span><span class="p">;</span>
<span class="n">RenderComponent</span><span class="o">*</span> <span class="n">render_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>Each component has a relatively small amount of state, maybe little more than a
few vectors or a matrix, and then a method to <span name="update">update</span>
it. The details aren’t important here, but imagine something roughly along the
lines of:</p>
<aside name="update" style="top: 6554px;">
<p>As the name implies, these are examples of the <a href="http://gameprogrammingpatterns.com/update-method.html" class="pattern">Update Method</a> pattern. Even <code>render()</code> is this pattern, just
by another name.</p>
</aside>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">AIComponent</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">update</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* Work with and modify state... */</span> <span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// Goals, mood, etc. ...</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">PhysicsComponent</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">update</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* Work with and modify state... */</span> <span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// Rigid body, velocity, mass, etc. ...</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">RenderComponent</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">render</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* Work with and modify state... */</span> <span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// Mesh, textures, shaders, etc. ...</span>
<span class="p">};</span>
</pre></div>
<p>The game maintains a big array of pointers to all of the entities in the world.
Each spin of the game loop, we need to run the following:</p>
<ol>
<li>
<p>Update the AI components for all of the entities.</p>
</li>
<li>
<p>Update the physics components for them.</p>
</li>
<li>
<p>Render them using their render components.</p>
</li>
</ol>
<p>Lots of game engines implement that like so:</p>
<div class="codehilite"><pre><span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">gameOver</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Process AI.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numEntities</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">entities</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">ai</span><span class="p">()</span><span class="o">-></span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Update physics.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numEntities</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">entities</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">physics</span><span class="p">()</span><span class="o">-></span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Draw to screen.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numEntities</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">entities</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">render</span><span class="p">()</span><span class="o">-></span><span class="n">render</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Other game loop machinery for timing...</span>
<span class="p">}</span>
</pre></div>
<p>Before you ever heard of a CPU cache, this looked totally innocuous. But by now,
you’ve got an inkling that something isn’t right here. This code isn’t just
thrashing the cache, it’s taking it around back and beating it to a pulp. Watch
what it’s doing:</p>
<ol>
<li>
<p>The array of game entities is storing <em>pointers</em> to them, so for each element in the
array, we have to traverse that pointer. That’s a cache miss.</p>
</li>
<li>
<p>Then the game entity has a pointer to the component. Another cache miss.</p>
</li>
<li>
<p>Then we update the component.</p>
</li>
<li>
<p>Now we go back to step one for <em>every component of every entity in the
game</em>.</p>
</li>
</ol>
<p>The scary part is that we have no idea how these objects are laid out in memory.
We’re completely at the mercy of the memory manager. As entities get allocated
and freed over time, the heap is likely to become increasingly randomly
organized.</p>
<p><span name="lines"></span></p>
<p><img src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/data-locality-pointer-chasing.png" alt="A tangled mess of objects strewn randomly through memory with pointers wiring them all together."></p>
<aside name="lines" style="top: 8077px;">
<p>Every frame, the game loop has to follow all of those arrows to get to the data
it cares about.</p>
</aside>
<p>If our goal was to take a whirlwind tour around the game’s address space like
some “256MB of RAM in Four Nights!” cheap vacation package, this would be a
fantastic deal. But our goal is to run the game quickly, and <span name="chase">traipsing</span> all over main memory is <em>not</em> the way to do that.
Remember that <code>sleepFor500Cycles()</code> function? Well this code is effectively
calling that <em>all the time</em>.</p>
<aside name="chase" style="top: 8461px;">
<p>The term for wasting a bunch of time traversing pointers is “pointer chasing”,
which it turns out is nowhere near as fun as it sounds.</p>
</aside>
<p>Let’s do something better. Our first observation is that the only reason we
follow a pointer to get to the game entity is so we can immediately follow
<em>another</em> pointer to get to a component. <code>GameEntity</code> itself has no interesting
state and no useful methods. The <em>components</em> are what the game loop cares
about.</p>
<p>Instead of a giant constellation of game entities and components scattered
across the inky darkness of address space, we’re going to get back down to
Earth. We’ll have a big array for each type of component: a flat array of AI
components, another for physics, and another for rendering.</p>
<p>Like this:</p>
<p><span name="long-name"></span></p>
<div class="codehilite"><pre><span class="n">AIComponent</span><span class="o">*</span> <span class="n">aiComponents</span> <span class="o">=</span>
<span class="k">new</span> <span class="n">AIComponent</span><span class="p">[</span><span class="n">MAX_ENTITIES</span><span class="p">];</span>
<span class="n">PhysicsComponent</span><span class="o">*</span> <span class="n">physicsComponents</span> <span class="o">=</span>
<span class="k">new</span> <span class="n">PhysicsComponent</span><span class="p">[</span><span class="n">MAX_ENTITIES</span><span class="p">];</span>
<span class="n">RenderComponent</span><span class="o">*</span> <span class="n">renderComponents</span> <span class="o">=</span>
<span class="k">new</span> <span class="n">RenderComponent</span><span class="p">[</span><span class="n">MAX_ENTITIES</span><span class="p">];</span>
</pre></div>
<aside name="long-name" style="top: 8854px;">
<p>My least favorite part about using components is how long the word “component”
is.</p>
</aside>
<p>Let me stress that these are arrays of <em>components</em> and not <em>pointers to
components</em>. The data is all there, one byte after the other. The game loop can
then walk these directly:</p>
<p><span name="arrow"></span></p>
<div class="codehilite"><pre><span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">gameOver</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Process AI.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numEntities</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">aiComponents</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Update physics.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numEntities</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">physicsComponents</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Draw to screen.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numEntities</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">renderComponents</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">render</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Other game loop machinery for timing...</span>
<span class="p">}</span>
</pre></div>
<aside name="arrow" style="top: 9074px;">
<p>One hint that we’re doing better here is how few <code>-></code> operators there are in the
new code. If you want to improve data locality, look for indirection operators
you can get rid of.</p>
</aside>
<p>We’ve ditched all of that pointer chasing. Instead of skipping around in memory,
we’re doing a straight crawl through three contiguous arrays.</p>
<p><img src="./Data Locality · Optimization Patterns · Game Programming Patterns_files/data-locality-component-arrays.png" alt="An array for each of three different kinds of components. Each array neatly packs its components together."></p>
<p>This pumps a solid stream of bytes right into the hungry maw of the CPU. In my
testing, this change made the update loop <em>fifty times</em> faster than the previous
version.</p>
<p>Interestingly, we haven’t lost much encapsulation here. Sure, the game loop is
updating the components directly instead of going through the game entities, but
it was doing that before to ensure they were processed in the right order. Even
so, each component itself is still nicely encapsulated. It owns its own data and
methods. We simply changed the way it’s used.</p>
<p>This doesn’t mean we need to get rid of <code>GameEntity</code> either. We can leave it as it
is with pointers to its components. They’ll just point into those
arrays. This is still useful for other parts of the game where you want to pass
around a conceptual “game entity” and everything that goes with it. The
important part is that the performance-critical game loop sidesteps that and
goes straight to the data.</p>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#packed-data" name="packed-data">Packed data</a></h3>
<p>Say we’re doing a particle system. Following the advice of the previous section,
we’ve got all of our particles in a nice big contiguous array. Let’s wrap it in
a little <span name="pool">manager class</span> too:</p>
<aside name="pool" style="top: 10178px;">
<p>The <code>ParticleSystem</code> class is an example of an <a href="http://gameprogrammingpatterns.com/object-pool.html" class="pattern">object pool</a> custom built for a single type of object.</p>
</aside>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">Particle</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">update</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* Gravity, etc. ... */</span> <span class="p">}</span>
<span class="c1">// Position, velocity, etc. ...</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">ParticleSystem</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">ParticleSystem</span><span class="p">()</span>
<span class="o">:</span> <span class="n">numParticles_</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="p">{}</span>
<span class="kt">void</span> <span class="n">update</span><span class="p">();</span>
<span class="nl">private:</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">MAX_PARTICLES</span> <span class="o">=</span> <span class="mi">100000</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">numParticles_</span><span class="p">;</span>
<span class="n">Particle</span> <span class="n">particles_</span><span class="p">[</span><span class="n">MAX_PARTICLES</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>A rudimentary update method for the system just looks like this:</p>
<div class="codehilite"><pre><span class="kt">void</span> <span class="n">ParticleSystem</span><span class="o">::</span><span class="n">update</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numParticles_</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">particles_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>But it turns out that we don’t actually need to process <em>all</em> of the particles
all the time. The particle system has a fixed-size pool of objects, but they
aren’t usually all actively twinkling across the screen. The easy answer is
something like this:</p>
<div class="codehilite"><pre><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numParticles_</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">particles_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">isActive</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">particles_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>We give <code>Particle</code> a flag to track whether its in use or not. In the update
loop, we <span name="branch">check</span> that for each particle. That loads the
flag into the cache along with all of that particle’s other data. If the
particle <em>isn’t</em> active, then we skip over it to the next one. The rest
of the particle’s data that we loaded into the cache is a waste.</p>
<p>The fewer active particles there are, the more we’re skipping across memory. The
more we do that, the more cache misses there are between actually doing useful
work updating active particles. If the array is large and has <em>lots</em> of inactive
particles in it, we’re back to thrashing the cache again.</p>
<p>Having objects in a contiguous array doesn’t solve much if the objects we’re
actually processing aren’t contiguous in it. If it’s littered with inactive
objects we have to dance around, we’re right back to the original problem.</p>
<aside name="branch" style="top: 11062px;">
<p>Savvy low-level coders can see another problem here. Doing an <code>if</code> check for
every particle can cause a <em>branch misprediction</em> and a <em>pipeline stall</em>. In
modern CPUs, a single “instruction” actually takes several clock cycles. To keep
the CPU busy, instructions are <em>pipelined</em> so that the subsequent instructions
start processing before the first one finishes.</p>
<p>To do that, the CPU has to guess which instructions it will be executing next.
In straight line code, that’s easy, but with control flow, it’s harder. While
it’s executing the instructions for that <code>if</code>, does it guess that the particle
is active and start executing the code for the <code>update()</code> call, or does it guess
that it isn’t?</p>
<p>To answer that, the chip does <em>branch prediction</em> — it sees which branches your
code previously took and guesses that it will do that again. But when the loop
is constantly toggling between particles that are and aren’t active, that
prediction fails.</p>
<p>When it does, the CPU has to ditch the instructions it had started speculatively
processing (a <em>pipeline flush</em>) and start over. The performance impact of this
varies widely by machine, but this is why you sometimes see developers avoid
flow control in hot code.</p>
</aside>
<p>Given the title of this section, you can probably guess the answer. Instead of
<em>checking</em> the active flag, we’ll <em>sort</em> by it. We’ll keep all of the active
particles in the front of the list. If we know all of those particles are
active, we don’t have to check the flag at all.</p>
<p>We can also easily keep track of how many active particles there are. With this,
our update loop turns into this thing of beauty:</p>
<div class="codehilite"><pre><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numActive_</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">particles</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">update</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>Now we aren’t skipping over <em>any</em> data. Every byte that gets sucked into the
cache is a piece of an active particle that we actually need to process.</p>
<p>Of course, I’m not saying you should quicksort the entire collection of
particles every frame. That would more than eliminate the gains here. What we
want to do is <em>keep</em> the array sorted.</p>
<p>Assuming the array is already sorted — and it is at first when all particles
are inactive — the only time it can become <em>un</em>sorted is when a particle has
been activated or deactivated. We can handle those two cases pretty easily. When
a particle gets activated, we move it up to the end of the active particles by
swapping it with the first <em>in</em>active one:</p>
<div class="codehilite"><pre><span class="kt">void</span> <span class="n">ParticleSystem</span><span class="o">::</span><span class="n">activateParticle</span><span class="p">(</span><span class="kt">int</span> <span class="n">index</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Shouldn't already be active!</span>
<span class="n">assert</span><span class="p">(</span><span class="n">index</span> <span class="o">>=</span> <span class="n">numActive_</span><span class="p">);</span>
<span class="c1">// Swap it with the first inactive particle</span>
<span class="c1">// right after the active ones.</span>
<span class="n">Particle</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">particles_</span><span class="p">[</span><span class="n">numActive_</span><span class="p">];</span>
<span class="n">particles_</span><span class="p">[</span><span class="n">numActive_</span><span class="p">]</span> <span class="o">=</span> <span class="n">particles_</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="n">particles_</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="n">temp</span><span class="p">;</span>
<span class="c1">// Now there's one more.</span>
<span class="n">numActive_</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>To deactivate a particle, we just do the opposite:</p>
<div class="codehilite"><pre><span class="kt">void</span> <span class="n">ParticleSystem</span><span class="o">::</span><span class="n">deactivateParticle</span><span class="p">(</span><span class="kt">int</span> <span class="n">index</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Shouldn't already be inactive!</span>
<span class="n">assert</span><span class="p">(</span><span class="n">index</span> <span class="o"><</span> <span class="n">numActive_</span><span class="p">);</span>
<span class="c1">// There's one fewer.</span>
<span class="n">numActive_</span><span class="o">--</span><span class="p">;</span>
<span class="c1">// Swap it with the last active particle</span>
<span class="c1">// right before the inactive ones.</span>
<span class="n">Particle</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">particles_</span><span class="p">[</span><span class="n">numActive_</span><span class="p">];</span>
<span class="n">particles_</span><span class="p">[</span><span class="n">numActive_</span><span class="p">]</span> <span class="o">=</span> <span class="n">particles_</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="n">particles_</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="n">temp</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Lots of programmers (myself included) have developed allergies to moving things
around in memory. Schlepping a bunch of bytes around <em>feels</em> heavyweight
compared to assigning a pointer. But when you add in the cost of
<em>traversing</em> that pointer, it turns out that our intuition is sometimes wrong.
In <span name="profile">some cases</span>, it’s cheaper to push things around in
memory if it helps you keep the cache full.</p>
<aside name="profile" style="top: 12588px;">
<p>This is your friendly reminder to <em>profile</em> when making these kinds of
decisions.</p>
</aside>
<p>There’s a neat consequence of keeping the particles <em>sorted</em> by their active
state — we don’t need to store an active flag in each particle at all. It can be
inferred by its position in the array and the <code>numActive_</code> counter. This makes
our particle objects smaller, which means we can pack more in our cache lines,
and that makes them even faster.</p>
<p>It’s not all rosy, though. As you can see from the API, we’ve lost a bit of
object orientation here. The <code>Particle</code> class no longer controls its own active
state. You can’t call some <code>activate()</code> method on it since it doesn’t know
its index. Instead, any code that wants to activate particles needs access to
the particle <em>system</em>.</p>
<p>In this case, I’m OK with <code>ParticleSystem</code> and <code>Particle</code> being tightly tied
like this. I think of them as a single <em>concept</em> spread across two physical
<em>classes</em>. It just means accepting the idea that particles are <em>only</em> meaningful
in the context of some particle system. Also, in this case it’s likely to be the
particle system that will be spawning and killing particles anyway.</p>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#hotcold-splitting" name="hotcold-splitting">Hot/cold splitting</a></h3>
<p>OK, this is the last example of a simple technique for making your cache
happier. Say we’ve got an AI component for some game entity. It has some state
in it — the animation it’s currently playing, a goal position its heading
towards, energy level, etc. — stuff it checks and tweaks every single frame.
Something like:</p>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">AIComponent</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">update</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">}</span>
<span class="nl">private:</span>
<span class="n">Animation</span><span class="o">*</span> <span class="n">animation_</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">energy_</span><span class="p">;</span>
<span class="n">Vector</span> <span class="n">goalPos_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>But it also has some state for rarer eventualities. It stores some data
describing what loot it drops when it has an unfortunate encounter with the
noisy end of a shotgun. That drop data is only used once in the entity’s
lifetime, right at its bitter end:</p>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">AIComponent</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">update</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// Previous fields...</span>
<span class="n">LootType</span> <span class="n">drop_</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">minDrops_</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">maxDrops_</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">chanceOfDrop_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>Assuming we followed the earlier patterns, when we update these AI components,
we walk through a nice packed, contiguous array of data. But that data includes
all of the loot drop information. That makes each component bigger, which
reduces the number of components we can fit in a cache line. We get more cache misses
because the total memory we walk over is larger. The loot data gets pulled into
the cache for every component in every frame, even though we aren’t even touching
it.</p>
<p>The solution for this is called “hot/cold splitting”. The idea is to break our
data structure into two separate pieces. The first holds the “hot” data, the
state we need to touch every frame. The other piece is the “cold” data,
everything else that gets used less frequently.</p>
<p>The hot piece is the <em>main</em> AI component. It’s the one we need to use the most,
so we don’t want to chase a pointer to find it. The cold component can be off to
the side, but we still need to get to it, so we give the hot component a pointer
to it, like so:</p>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">AIComponent</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="c1">// Methods...</span>
<span class="nl">private:</span>
<span class="n">Animation</span><span class="o">*</span> <span class="n">animation_</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">energy_</span><span class="p">;</span>
<span class="n">Vector</span> <span class="n">goalPos_</span><span class="p">;</span>
<span class="n">LootDrop</span><span class="o">*</span> <span class="n">loot_</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">LootDrop</span>
<span class="p">{</span>
<span class="k">friend</span> <span class="k">class</span> <span class="nc">AIComponent</span><span class="p">;</span>
<span class="n">LootType</span> <span class="n">drop_</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">minDrops_</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">maxDrops_</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">chanceOfDrop_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>Now when we’re walking the AI components every frame, the only data that gets
loaded into the cache is stuff we are actually processing (with the <span name="parallel">exception</span> of that one little pointer to the cold data).</p>
<aside name="parallel" style="top: 14583px;">
<p>We could conceivably ditch the pointer too by having parallel arrays for the hot
and cold components. Then we can find the cold AI data for a component since
both pieces will be at the same index in their respective arrays.</p>
</aside>
<p>You can see how this starts to get fuzzy, though. In my example here, it’s
pretty obvious which data should be hot and cold, but in a real game it’s
rarely so clear-cut.
What if you have fields that are used when an entity is in a certain mode but
not in others? What if entities use a certain chunk of data only when they’re in
certain parts of the level?</p>
<p>Doing this kind of optimization is somewhere between a black art and a rathole.
It’s easy to get sucked in and spend endless time pushing data around to see
what speed difference it makes. It will take practice to get a handle on where
to spend your effort.</p>
<h2><a href="http://gameprogrammingpatterns.com/data-locality.html#design-decisions" name="design-decisions">Design Decisions</a></h2>
<p>This pattern is really about a mindset — it’s getting you to think about your
data’s arrangement in memory as a key part of your game’s performance story. The
actual concrete design space is wide open. You can let <span name="dod">data
locality</span> affect your whole architecture, or maybe it’s just a localized
pattern you apply to a few core data structures.</p>
<p>The biggest questions you’ll need to answer are when and where you apply this
pattern, but here are a couple of others that may come up.</p>
<aside name="dod" style="top: 14982px;">
<p>Noel Llopis’ <a href="http://gamesfromwithin.com/data-oriented-design">famous article</a>
that got a lot more people thinking about designing games around cache usage
calls this “data-oriented design”.</p>
</aside>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#how-do-you-handle-polymorphism" name="how-do-you-handle-polymorphism">How do you handle polymorphism?</a></h3>
<p>Up to this point, we’ve avoided subclassing and virtual methods. We have assumed we
have nice packed arrays of <em>homogenous</em> objects. That way, we know they’re all
the exact same size. But polymorphism and dynamic dispatch are useful tools
too. How do we reconcile this?</p>
<ul>
<li>
<p><strong>Don’t:</strong></p>
<p>The <span name="type">simplest</span> answer is to avoid subclassing,
or at least avoid it in places where you’re optimizing for cache usage.
Software engineer culture is drifting away from heavy use of inheritance
anyway.</p>
<aside name="type" style="top: 15327px;">
<p>One way to keep much of the flexibility of polymorphism without using
subclassing is through the <a href="http://gameprogrammingpatterns.com/type-object.html" class="pattern">Type
Object</a> pattern.</p>
</aside>
<ul>
<li>
<p><em>It’s safe and easy.</em> You know exactly what class you’re dealing with,
and all objects are obviously the same size.</p>
</li>
<li>
<p><em>It’s faster.</em> Dynamic dispatch means looking up the method in the
vtable and then traversing that pointer to get to the actual code. While
the cost of this varies widely across different hardware, there is <span name="cpp"><em>some</em></span> cost to dynamic dispatch.</p>
</li>
</ul>
<aside name="cpp" style="top: 15525px;">
<p>As usual, the only absolute is that there are no absolutes. In most cases, a
C++ compiler will require an indirection for a virtual method call. But in
<em>some</em> cases, the compiler may be able to do <em>devirtualization</em> and
statically call the right method if it knows what concrete type the receiver
is. Devirtualization is more common in just-in-time compilers for languages
like Java and JavaScript.</p>
</aside>
<ul>
<li><em>It’s inflexible.</em> Of course, the reason we use dynamic dispatch is
because it gives us a powerful way to vary behavior between objects. If
you want different entities in your game to have their own rendering
styles or their own special moves and attacks, virtual methods are a
proven way to model that. Having to instead stuff all of that code into
a single non-virtual method that does something like a big <code>switch</code> gets
messy quickly.</li>
</ul>
</li>
<li>
<p><strong>Use separate arrays for each type:</strong></p>
<p>We use polymorphism so that we can invoke behavior on an object whose type
we don’t know. In other words, we have a mixed bag of stuff, and we want each
object in there to do its own thing when we tell it to go.</p>
<p>But that raises the question of why mix the bag to begin with? Instead,
why not maintain separate, homogenous collections for each type?</p>
<ul>
<li>
<p><em>It keeps objects tightly packed.</em> Since each array only contains
objects of one class, there’s no padding or other weirdness.</p>
</li>
<li>
<p><em>You can statically dispatch.</em> Once you’ve got objects partitioned by
type, you don’t need polymorphism at all any more. You can use
regular, non-virtual method calls.</p>
</li>
<li>
<p><em>You have to keep track of a bunch of collections.</em> If you have a lot of
different object types, the overhead and complexity of maintaining
separate arrays for each can be a chore.</p>
</li>
<li>
<p><em>You have to be aware of every type</em>. Since you have to maintain
separate collections for each type, you can’t be decoupled from the
<em>set</em> of classes. Part of the magic of polymorphism is that it’s
<em>open-ended</em> — code that works with an interface can be completely
decoupled from the potentially large set of types that implement that
interface.</p>
</li>
</ul>
</li>
<li>
<p><strong>Use a collection of pointers:</strong></p>
<p>If you weren’t worried about caching, this is the natural solution. Just
have an array of pointers to some base class or interface type. You get all the
polymorphism you could want, and objects can be whatever size they want.</p>
<ul>
<li>
<p><em>It’s flexible.</em> The code that consumes the collection can work with
objects of any type as long as it supports the interface you care about.
It’s completely open-ended.</p>
</li>
<li>
<p><em>It’s less cache-friendly.</em> Of course, the whole reason we’re discussing
other options here is because this means cache-unfriendly pointer
indirection. But, remember, if this code isn’t performance-critical,
that’s probably OK.</p>
</li>
</ul>
</li>
</ul>
<h3><a href="http://gameprogrammingpatterns.com/data-locality.html#how-are-game-entities-defined" name="how-are-game-entities-defined">How are game entities defined?</a></h3>
<p>If you use this pattern in tandem with the <a href="http://gameprogrammingpatterns.com/component.html" class="pattern">Component</a> pattern, you’ll have nice contiguous arrays for
all of the components that make up your game entities. The game loop will be
iterating over those directly, so the object for the game entity itself is less
important, but it’s still useful in other parts of the codebase where you want
to work with a single conceptual “entity”.</p>
<p>The question then is how should it be represented? How does it keep track of its
components?</p>
<ul>
<li>
<p><strong>If game entities are classes with pointers to their components:</strong></p>
<p>This is what our first example looked like. It’s sort of the vanilla OOP
solution. You’ve got a class for <code>GameEntity</code>, and it has pointers to the
components it owns. Since they’re just pointers, it’s agnostic about where
and how those components are organized in memory.</p>
<ul>
<li>
<p><em>You can store components in contiguous arrays.</em> Since the game entity
doesn’t care where its components are, you can organize them in a nice
packed array to optimize iterating over them.</p>
</li>
<li>
<p><em>Given an entity, you can easily get to its components.</em> They’re just a
pointer indirection away.</p>
</li>
<li>
<p><em>Moving components in memory is hard.</em> When components get enabled or
disabled, you may want to move them around in the array to keep the
active ones up front and contiguous. If you move a component while the
entity has a raw pointer to it, though, that pointer gets broken if you
aren’t careful. You’ll have to make sure to update the entity’s pointer
at the same time.</p>
</li>
</ul>
</li>
<li>
<p><strong>If game entities are classes with IDs for their components:</strong></p>
<p>The challenge with raw pointers to components is that it makes it harder to
move them around in memory. You can address that by using something more
abstract: an ID or index that can be used to <em>look up</em> a component.</p>
<p>The actual semantics of the ID and lookup process are up to you. It could be
as simple as storing a unique ID in each component and walking the array, or
more complex like a hash table that maps IDs to their current index in the
component array.</p>
<ul>
<li>
<p><em>It’s more complex.</em> Your ID system doesn’t have to be rocket science,
but it’s still more work than a basic pointer. You’ll have to implement
and debug it, and there will be memory overhead for bookkeeping.</p>
</li>
<li>
<p><em>It’s slower</em>. It’s hard to beat traversing a raw pointer. There may be
some searching or hashing involved to get from an entity to one
of its components.</p>
</li>
<li>
<p><em>You’ll need access to the component “manager”.</em> The basic idea is that
you have some abstract ID that identifies a component. You can use it to
get a reference to the actual component object. But to do that, you need
to hand that ID to something that can actually find the component. That
will be the class that wraps your raw contiguous array of component
objects.</p>
<p>With raw pointers, if you have a game entity, you can find its