09-Locating-memory-leaks-with-Memcheck.md
66 additions & 28 deletions
Original file line number
Diff line number
Diff line change
@@ -4,7 +4,7 @@
4
4
5
5
It's the night before the Data Structures linked list assignment is due, and you are so ready.
6
6
Not only do you *understand* linked lists, you *are* a linked list.
7
-
You sit down at your terminal[^term], crack your knuckles, and (digitally) pen a masterpiece.
7
+
You sit down at your terminal,[^term] crack your knuckles, and (digitally) pen a masterpiece.
8
8
Never before in the history of computer science has there been such a succinct, elegant linked list implementation.
9
9
10
10
You compile it and run the test suite, expecting a beautiful `All Tests Passed!`.
@@ -13,18 +13,20 @@ I guess that's what you get for expecting a machine to appreciate beauty!
13
13
14
14
A more pragmatic reason for that segmentation fault is that somewhere your program has accessed memory it didn't have permission to access.
15
15
Segmentation faults are but one kind of memory safety bug.
16
-
Other memory bugs tend to be less immediately obvious, but they can introduce hard-to-find bugs.
16
+
You can also unintentionally overwrite other variables in your program,
17
+
or interpret one kind of variable as another (say, treating a class as an `int`)!
18
+
These sorts of memory bugs can cause your program to behave unpredictably and thus can be quite difficult to find.
17
19
In this chapter we will explore the different types of memory safety bugs you may encounter as well as tools for detecting and analyzing them.
18
20
19
21
There is another incentive for ruthlessly excising memory safety bugs: every kind of memory safety bug gives an attacker a way to exploit
20
22
your program.
21
23
Many of these bugs allow for arbitrary code execution, where the attacker injects their own code into your program to be executed.
22
24
23
25
The first widespread internet worm, the [Morris worm](https://en.wikipedia.org/wiki/Morris_worm), used a memory safety bug to infect other machines in 1988.
24
-
Unfortunately, even after almost *thirty years*, memory safety bugs are incredibly common in popular software and many viruses still use memory safety bugs.
26
+
Unfortunately, *thirty years* later, memory safety bugs are still incredibly common in popular software and many viruses continue to exploit them.
25
27
For one example, the [WannaCry](https://en.wikipedia.org/wiki/WannaCry_ransomware_attack) and [Petya](https://en.wikipedia.org/wiki/2017_NotPetya_cyberattack)
26
-
viruses use a memory safety exploit called [EternalBlue](https://www.rapid7.com/db/modules/exploit/windows/smb/ms17_010_eternalblue)"allegedly" developed
27
-
by the NSA and released by "Russian" hackers early in 2017.
28
+
viruses use a memory safety exploit called [EternalBlue](https://www.rapid7.com/db/modules/exploit/windows/smb/ms17_010_eternalblue) developed
29
+
by the NSA[^allegedly] and released by Russian[^allegedly2] hackers early in 2017.
Your operating system provides two areas where memory can be allocated: the stack and the heap.
47
-
Memory on the stack is managed automatically[^joint], but any allocation only lives as long as the function that makes it.
49
+
Memory on the stack is managed automatically,[^joint] but any allocation only lives as long as the function that makes it is executing.
48
50
When a function is called, a *stack frame* is pushed onto the stack.
49
-
The stack frame holds variables as well as some bookkeeping information.
50
-
Crucially, this information includes the memory address of the code to return to once the function completes.
51
+
The stack frame holds variables declared in that function as well as some bookkeeping information.
52
+
Crucially, this bookkeeping information includes the memory address of the code to return to once the function completes.
51
53
When that function `return`s, its stack frame is popped off the stack and the associated memory is used for the stack frame of the next called function.
52
54
53
55
Memory allocated on the heap lives as long as you like it to; however, you have to manually allocate and free that memory using `new` and `delete`.[^free]
54
-
While the automatic management of the stack is nice, the freedom of being able to make memory allocations that live longer than the function that created them
56
+
While the automatic management of the stack is nice, the freedom to make memory allocations that live longer than the function that created them
55
57
is essential, especially in large programs.
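To make the difference in lifetimes concrete, here is a minimal sketch. The helper names `make_counter` and `doubled` are made up for illustration and do not appear elsewhere in this chapter.

```cpp
#include <iostream>

// Hypothetical helper: the int it allocates lives on the heap, so it
// outlives this function call. The caller owns it and must `delete` it.
int* make_counter(int start)
{
    int* counter = new int(start);
    return counter;
}

// Hypothetical helper: `local` lives in this function's stack frame,
// which is popped when the function returns. We return a copy of its
// value, never its address.
int doubled(int value)
{
    int local = value * 2;
    return local;
}

// Hypothetical driver showing both lifetimes side by side.
void show_lifetimes()
{
    int* counter = make_counter(41); // still valid after make_counter returns
    *counter += 1;
    std::cout << *counter << " " << doubled(21) << std::endl;
    delete counter;                  // heap memory is freed only when we say so
}
```

Calling `show_lifetimes()` from `main` exercises both helpers. Returning `&local` from `doubled` instead would hand the caller a pointer into a stack frame that no longer exists.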
56
58
57
-
On modern Intel CPUs, the stack starts at a high memory address and grows downward, while the heap starts at a low memory address and grows upward.
59
+
To help you get the mental picture right,
60
+
on modern Intel CPUs, the stack starts at a high memory address and grows downward, while the heap starts at a low memory address and grows upward.
58
61
The addresses they start at vary from system to system, and are often randomized to make writing exploits more difficult.
59
62
60
63
### Uninitialized Values
@@ -75,7 +78,8 @@ that you know for sure are there.
75
78
In other words: initialize your dang variables!
76
79
77
80
Here's an example of uninitialized values, one on the stack and one on the heap:
78
-
```c++
81
+
82
+
~~~{.cpp .numberLines}
79
83
#include<iostream>
80
84
using namespace std;
81
85
@@ -99,7 +103,7 @@ int main()
99
103
100
104
return 0;
101
105
}
102
-
```
106
+
~~~
103
107
104
108
Your first hint that this isn't right is from the compiler itself if you use the `-Wall` flag:[^linebreak]
105
109
@@ -117,9 +121,15 @@ Here GCC is smart enough to catch the uninitialized use of our stack-allocated v
117
121
118
122
Valgrind's Memcheck tool[^memcheck] can detect when your program uses an uninitialized value.
119
123
Memcheck can also track where the uninitialized value is created with the `--track-origins=yes` option.
120
-
If we run the above program (named `uninitialized-values`) through Valgrind (`valgrind --track-origins=yes uninitialized-values`), we get two messages.
124
+
To run the above program (named `uninitialized-value`) through Valgrind, do the following:
When we do this, we get messages about both of our variables.
121
131
122
-
The stack-allocated uninitialized value was accessed on line 8 and created on line 5:
132
+
The stack-allocated uninitialized value is accessed on line 8 and created on line 5:
123
133
```
124
134
==19296== Conditional jump or move depends on uninitialised value(s)
125
135
==19296== at 0x4008FE: main (uninitialized-value.cpp:8)
@@ -139,15 +149,15 @@ The heap-allocated uninitialized value was accessed on line 15 and created by a
139
149
==19296== by 0x400941: main (uninitialized-value.cpp:14)
140
150
```
141
151
142
-
Heap-allocated uninitialized values[^ptr] cannot be caught by the compiler -- you must use a tool like Valgrind to find them.
152
+
Heap-allocated uninitialized values[^ptr] cannot be caught by the compiler --- you must use a tool like Valgrind to find them.
143
153
144
154
Using `--track-origins=yes` is particularly handy when debugging heap uninitialized values as it is possible for something to be `new`'d in one function
145
155
and then not used until much later on.
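For contrast, here is a sketch of what initializing everything up front might look like. The function name `sum_initialized` is hypothetical; the point is only that each value gets a known starting state where it is created.

```cpp
#include <iostream>

// Hypothetical helper: every value is given a known starting state
// before it is read, on the stack and on the heap.
int sum_initialized()
{
    int on_stack = 0;           // explicit initial value
    int* on_heap = new int(0);  // initial value supplied at allocation
    int* zeros = new int[5]();  // the trailing () zero-initializes each element

    int total = on_stack + *on_heap;
    for (int i = 0; i < 5; i++)
    {
        total += zeros[i];
    }

    delete on_heap;
    delete [] zeros;
    return total;
}
```

Written as `new int` and `new int[5]` (without the initializers), the same allocations would leave their contents indeterminate, which is the situation Memcheck reports.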
146
156
147
157
### Unallocated or Out-of-Bounds Reads and Writes
148
158
149
159
Perhaps the most common memory bug is reading or writing to memory you ought not to.
150
-
This type of bug comes in a few flavors: you could use a pointer with an uninitialized value,
160
+
This type of bug comes in a few flavors: you could use a pointer with an uninitialized value (as opposed to a pointer *to* an uninitialized value),
151
161
or you could access outside of an array's bounds, or you could use a pointer after deleting the thing it points at.
152
162
153
163
Sometimes this kind of error causes a segmentation fault, but sometimes the memory being accessed happens to be something else your program
@@ -161,7 +171,7 @@ Once they have this, they can have the computer start executing whatever code th
161
171
This kind of exploit is known as a buffer overflow exploit.
162
172
163
173
You can detect these kinds of bugs using either Valgrind or Address Sanitizer (a.k.a. `asan`).
164
-
`asan` is part library, part compiler feature that instruments your code at compile time.
174
+
`asan` is part runtime library, part compiler feature that instruments your code at compile time.
165
175
Then when you run your program, the instrumentation tracks memory information much in the way Valgrind does.
166
176
`asan` is much faster than Valgrind, but requires special compiler flags to work.
`asan`'s output is somewhat terrifying to see, but the relevant parts look like this:[^edit]
198
219
199
220
```
200
221
==29210==ERROR: AddressSanitizer: stack-buffer-overflow on ↩
@@ -225,10 +246,12 @@ rather than wade through screenfulls of errors trying to figure out which one un
225
246
226
247
Pretty handy, eh? What more could you ask for!
227
248
249
+
#### Out-of-bounds Heap Access
250
+
228
251
Both Valgrind and `asan` can detect heap out-of-bounds accesses.
229
252
Here is a small sample program that demonstrates an out-of-bounds write:
230
253
231
-
```c++
254
+
```{.cpp .numberLines}
232
255
#include<iostream>
233
256
234
257
int main()
@@ -279,12 +302,14 @@ The write itself occurred on line 9.
279
302
Furthermore, they show that the write happened 0 bytes to the right[^chickens] (in other words, after the end of) our allocated chunk,
280
303
indicating that we are writing one index past the end of the array.
281
304
305
+
#### Use After Free
306
+
282
307
Finally, let's see an example of a use-after-free.
283
308
This type of bug is exploitable by means similar to using an uninitialized value, but it is usually far easier to control
284
309
the contents of memory for a use-after-free bug.[^exploit]
285
-
Like out-of-bounds accesses, this type of bug can go undetected; the below example appears to work, even though it is incorrect!
310
+
Like out-of-bounds accesses, this type of bug can go undetected if you don't check for it; the example below appears to work, even though it is incorrect!
286
311
287
-
```c++
312
+
```{.cpp .numberLines}
288
313
#include<iostream>
289
314
290
315
int main()
@@ -341,9 +366,13 @@ previously allocated by thread T0 here:
341
366
#2 0x7f7c327c682f in __libc_start_main
342
367
```
343
368
344
-
Both outputs show that a 4-byte (i.e., `int`) read happened 4 bytes (i.e., at index 1) inside our block of 20 bytes
369
+
Both outputs show that a 4-byte (the size of an `int`) read happened 4 bytes (so, at index 1) inside the block of 20 bytes
345
370
that is our array of 5 `int`s.
346
371
372
+
Invalid read or write bugs can have a number of fixes.
373
+
Sometimes they're as simple as adding a bounds check somewhere to make sure you don't write off the end of an array.
374
+
Other times, you'll have to think carefully about where your code went astray --- see the Debugging chapter for more advice on this.
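As one illustration of the simple case, a bounds check might be sketched like this. The helper `checked_write` is made up for this example; how you report a rejected write (return value, exception, assertion) is a design choice the chapter does not prescribe.

```cpp
#include <cstddef>

// Hypothetical helper: writes `value` at `index` only when the index is
// inside the array. Returns false and leaves the array untouched otherwise.
bool checked_write(int* array, std::size_t length, std::size_t index, int value)
{
    if (array == NULL || index >= length)
    {
        return false;
    }

    array[index] = value;
    return true;
}
```

With a five-element array, `checked_write(array, 5, 4, 7)` succeeds, while `checked_write(array, 5, 5, 7)` is rejected instead of writing one index past the end.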
375
+
347
376
### Mismatched and Double Deletes
348
377
349
378
Mismatched deletes occur when you use `delete` to delete an array or `delete []` to delete a non-array.
@@ -392,14 +421,15 @@ Both identify where the delete and matching allocation occurred (here, on lines
392
421
You can tell what the exact mismatch is by looking at the operators called by the deletion and allocation lines.
393
422
In this example, `operator delete` is called to delete the allocation, but `operator new[]` is called to allocate it.
394
423
424
+
A double-delete occurs when you delete the same block of memory twice.
395
425
Double deletes may seem innocuous, but they can be easily turned into a use-after-free bug.
396
426
This is because freed memory is usually re-used in future allocations.
397
427
So deleting something, then allocating a second thing, then deleting the first thing again results in
398
428
the second thing being deleted!
399
429
Any future uses of the second thing then become a use-after-free problem, and attempting to properly clean up
400
430
that second allocation brings on a double delete.
401
431
For example,
402
-
```c++
432
+
```{.cpp .numberLines}
403
433
#include<iostream>
404
434
405
435
int main()
@@ -459,6 +489,8 @@ previously allocated by thread T0 here:
459
489
Both show the location of the allocation and the first delete.
460
490
Typically, this kind of bug arises when you don't properly keep track of whether
461
491
a pointer has been `delete`d yet.
492
+
Sometimes a quick fix for this is to set your pointers to `NULL` after you call `delete`.
493
+
Other times, you'll get a double-delete if you forget to write a copy constructor for a class (or write a buggy one).
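Both fixes can be sketched briefly. The names `release` and `Box` below are hypothetical and exist only for illustration; this is one possible shape for each fix, not the only one.

```cpp
#include <cstddef>

// Hypothetical helper: deletes the allocation and resets the pointer.
// Deleting a NULL pointer does nothing, so a second call is harmless.
void release(int*& pointer)
{
    delete pointer;
    pointer = NULL;
}

// Hypothetical class that owns one heap-allocated int.
class Box
{
public:
    explicit Box(int value) : data(new int(value)) {}

    // Copy constructor: give the copy its own allocation. The
    // compiler-generated version would copy the pointer itself, so both
    // destructors would delete the same block.
    Box(const Box& other) : data(new int(*other.data)) {}

    // Copy assignment: copy the value, keep our own allocation.
    Box& operator=(const Box& other)
    {
        if (this != &other)
        {
            *data = *other.data;
        }
        return *this;
    }

    ~Box()
    {
        delete data;
    }

    int get() const { return *data; }
    void set(int value) { *data = value; }

private:
    int* data;
};
```

Note that `release` only protects the one pointer variable you pass to it. If another pointer still refers to the same block, deleting through that one is still a double delete.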
462
494
463
495
### Memory Leaks
464
496
@@ -473,7 +505,7 @@ The distinction is drawn because typically indirect memory leaks occur due to no
473
505
474
506
Both Valgrind and Address Sanitizer can detect memory leaks.
475
507
Let's look at a simple example that has one directly leaked block and one indirectly leaked block:
- [Paper on Address Sanitizer](https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf)
588
624
589
625
[^term]: And the computer it is running on, being that it's the 21st century and all.
590
-
[^joint]: It's a joint effort between how the compiler compiles your code and the operating system.
626
+
[^joint]: It's a joint effort between the compiler and the operating system.
591
627
[^free]: Or if you're writing C, with the `malloc` and `free` functions.
592
628
[^random]: It's not even a good source of random numbers, unless you like random numbers that aren't very random.
593
-
[^always]: It doesn't always do this because statically analyzing software (i.e., at compile time) is Really Hard, but it still catches some stuff.
629
+
[^always]: It doesn't always do this because statically analyzing software (i.e., at compile time) is Really Hard, but it still catches most obvious things.
594
630
[^memcheck]: Valgrind has a whole bunch of tools included, but it runs the Memcheck tool by default.
595
631
We'll see some other Valgrind tools in future chapters of this book.
596
632
[^ptr]: And more generally, any uninitialized value being accessed through a pointer.
@@ -601,9 +637,11 @@ Don't be concerned if the output you see is slightly different from what is prin
601
637
It's best to make a special `asan` makefile target that turns on the relevant compiler flags.
602
638
[^chickens]: Did you know that chickens also visually organize smaller quantities on the left and larger quantities on the right?
603
639
[^exploit]: Since this is not a book on exploiting software, we won't go into further detail; writing exploits is its own universe of rabbit holes.
604
-
[^implementation]: The implementation of `delete []` isn't specified, but the size of the allocation is stored somewhere;
640
+
[^implementation]: The implementation of `delete []` isn't specified, but the size of the allocation has to be stored somewhere;
605
641
depending on where it is stored, various Bad Things can happen if you try to `delete []` something that wasn't intended to be.
606
642
[^lazy]: It's not fair to say that the runtime developers are lazy, though.
607
643
There are some technical difficulties with freeing this memory, and since it is in use up until your program exits anyway,
608
644
there is little benefit to freeing it yourself: the operating system reclaims it as soon as your program exits.
609
645
[^llvm]: This requires `llvm` to be installed. Also, depending on the system you are running, you may need to append a version number, e.g., ``export ASAN_SYMBOLIZER_PATH=`which llvm-symbolizer-3.9` ``