Commit 77f59c0

Merge pull request #14 from LearnYouSomeComputer/ch9
Chapter 9 edits
2 parents 942e9ed + a9d63c1 commit 77f59c0

1 file changed

+66
-28
lines changed

09-Locating-memory-leaks-with-Memcheck.md

Lines changed: 66 additions & 28 deletions
@@ -4,7 +4,7 @@
44

55
It's the night before the Data Structures linked list assignment is due, and you are so ready.
66
Not only do you *understand* linked lists, you *are* a linked list.
7-
You sit down at your terminal[^term], crack your knuckles, and (digitally) pen a masterpiece.
7+
You sit down at your terminal,[^term] crack your knuckles, and (digitally) pen a masterpiece.
88
Never before in the history of computer science has there been such a succinct, elegant linked list implementation.
99

1010
You compile it and run the test suite, expecting a beautiful `All Tests Passed!`.
@@ -13,18 +13,20 @@ I guess that's what you get for expecting a machine to appreciate beauty!
1313

1414
A more pragmatic reason for that segmentation fault is that somewhere your program has accessed memory it didn't have permission to access.
1515
Segmentation faults are but one kind of memory safety bug.
16-
Other memory bugs tend to be less immediately obvious, but they can introduce hard-to-find bugs.
16+
You can also unintentionally overwrite other variables in your program,
17+
or interpret one kind of variable as another (say, treating a class as an `int`)!
18+
These sorts of memory bugs can cause your program to behave unpredictably and thus can be quite difficult to find.
1719
In this chapter we will explore the different types of memory safety bugs you may encounter as well as tools for detecting and analyzing them.
1820

1921
There is another incentive for ruthlessly excising memory safety bugs: every kind of memory safety bug allows an attacker to use it to exploit
2022
your program.
2123
Many of these bugs allow for arbitrary code execution, where the attacker injects their own code into your program to be executed.
2224

2325
The first widespread internet worm, the [Morris worm](https://en.wikipedia.org/wiki/Morris_worm), used a memory safety bug to infect other machines in 1988.
24-
Unfortunately, even after almost *thirty years*, memory safety bugs are incredibly common in popular software and many viruses still use memory safety bugs.
26+
Unfortunately, *thirty years* later, memory safety bugs are still incredibly common in popular software and many viruses continue to exploit them.
2527
For one example, the [WannaCry](https://en.wikipedia.org/wiki/WannaCry_ransomware_attack) and [Petya](https://en.wikipedia.org/wiki/2017_NotPetya_cyberattack)
26-
viruses use a memory safety exploit called [EternalBlue](https://www.rapid7.com/db/modules/exploit/windows/smb/ms17_010_eternalblue) "allegedly" developed
27-
by the NSA and released by "Russian" hackers early in 2017.
28+
viruses use a memory safety exploit called [EternalBlue](https://www.rapid7.com/db/modules/exploit/windows/smb/ms17_010_eternalblue) developed
29+
by the NSA[^allegedly] and released by Russian[^allegedly2] hackers early in 2017.
2830

2931
<!--
3032
Image is in the public domain.
@@ -44,17 +46,18 @@ source: https://commons.wikimedia.org/wiki/File:Smokey3.jpg
4446
### The Stack and The Heap
4547

4648
Your operating system provides two areas where memory can be allocated: the stack and the heap.
47-
Memory on the stack is managed automatically[^joint], but any allocation only lives as long as the function that makes it.
49+
Memory on the stack is managed automatically,[^joint] but any allocation only lives as long as the function that makes it is executing.
4850
When a function is called, a *stack frame* is pushed onto the stack.
49-
The stack frame holds variables as well as some bookkeeping information.
50-
Crucially, this information includes the memory address of the code to return to once the function completes.
51+
The stack frame holds variables declared in that function as well as some bookkeeping information.
52+
Crucially, this bookkeeping information includes the memory address of the code to return to once the function completes.
5153
When that function `return`s, its stack frame is popped off the stack and the associated memory is used for the stack frame of the next called function.
5254

5355
Memory allocated on the heap lives as long as you like it to; however, you have to manually allocate and free that memory using `new` and `delete`.[^free]
54-
While the automatic management of the stack is nice, the freedom of being able to make memory allocations that live longer than the function that created them
56+
While the automatic management of the stack is nice, the freedom to make memory allocations that live longer than the function that created them
5557
is essential, especially in large programs.
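
To make the lifetime difference concrete, here is a minimal sketch. The helper names `make_counts` and `sum_counts` are made up for illustration and do not appear elsewhere in this chapter.

```cpp
// Hypothetical helper: the array is allocated on the heap, so it
// outlives the call to make_counts().
int* make_counts(int size)
{
  int* counts = new int[size];
  for(int i = 0; i < size; i++)
  {
    counts[i] = 0; // initialize every element before anyone reads it
  }
  return counts;   // the caller now owns this memory and must delete [] it
}

// Hypothetical helper: 'total' and 'i' live in sum_counts()'s stack frame
// and disappear when the function returns; only the returned copy survives.
int sum_counts(const int* counts, int size)
{
  int total = 0;
  for(int i = 0; i < size; i++)
  {
    total += counts[i];
  }
  return total;
}
```

A caller would write something like `int* c = make_counts(4);`, use `c` for as long as it likes, and finish with `delete [] c;`. Forgetting that last step is exactly the kind of bug the rest of this chapter is about.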
5658

57-
On modern Intel CPUs, the stack starts at a high memory address and grows downward, while the heap starts at a low memory address and grows upward.
59+
To help you get the mental picture right,
60+
on modern Intel CPUs, the stack starts at a high memory address and grows downward, while the heap starts at a low memory address and grows upward.
5861
The addresses they start at vary from system to system, and are often randomized to make writing exploits more difficult.
5962

6063
### Uninitialized Values
@@ -75,7 +78,8 @@ that you know for sure are there.
7578
In other words: initialize your dang variables!
7679

7780
Here's an example of uninitialized values, one on the stack and one on the heap:
78-
```c++
81+
82+
~~~{.cpp .numberLines}
7983
#include<iostream>
8084
using namespace std;
8185

@@ -99,7 +103,7 @@ int main()
99103

100104
return 0;
101105
}
102-
```
106+
~~~
103107

104108
Your first hint that this isn't right is from the compiler itself if you use the `-Wall` flag:[^linebreak]
105109

@@ -117,9 +121,15 @@ Here GCC is smart enough to catch the uninitialized use of our stack-allocated v
117121

118122
Valgrind's Memcheck tool[^memcheck] can detect when your program uses a value uninitialized.
119123
Memcheck can also track where the uninitialized value is created with the `--track-origins=yes` option.
120-
If we run the above program (named `uninitialized-values`) through Valgrind (`valgrind --track-origins=yes uninitialized-values`), we get two messages.
124+
To run the above program (named `uninitialized-value`) through Valgrind, do the following:
125+
126+
~~~
127+
$ valgrind --track-origins=yes ./uninitialized-value
128+
~~~
129+
130+
When we do this, we get messages about both of our variables.
121131

122-
The stack-allocated uninitialized value was accessed on line 8 and created on line 5:
132+
The stack-allocated uninitialized value is accessed on line 8 and created on line 5:
123133
```
124134
==19296== Conditional jump or move depends on uninitialised value(s)
125135
==19296== at 0x4008FE: main (uninitialized-value.cpp:8)
@@ -139,15 +149,15 @@ The heap-allocated uninitialized value was accessed on line 15 and created by a
139149
==19296== by 0x400941: main (uninitialized-value.cpp:14)
140150
```
141151

142-
Heap-allocated uninitialized values[^ptr] cannot be caught by the compiler -- you must use a tool like Valgrind to find them.
152+
Heap-allocated uninitialized values[^ptr] cannot be caught by the compiler --- you must use a tool like Valgrind to find them.
143153

144154
Using `--track-origins=yes` is particularly handy when debugging heap uninitialized values as it is possible for something to be `new`'d in one function
145155
and then not used until much later on.
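
One way to sidestep heap uninitialized values altogether is to initialize the memory at the moment you allocate it. The sketch below uses a made-up helper name, `make_zeroed`, and relies on the fact that a trailing `()` after `new int[size]` value-initializes every element to zero.

```cpp
// Hypothetical helper: returns a heap array whose elements all start at 0.
// Without the trailing (), the elements would hold indeterminate values.
int* make_zeroed(int size)
{
  return new int[size]();
}
```

The same trick works for single values: `new int()` gives you a zero, while plain `new int` gives you whatever happened to be in that memory.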
146156

147157
### Unallocated or Out-of-Bounds Reads and Writes
148158

149159
Perhaps the most common memory bug is reading or writing to memory you ought not to.
150-
This type of bug comes in a few flavors: you could use a pointer with an uninitialized value,
160+
This type of bug comes in a few flavors: you could use a pointer with an uninitialized value (as opposed to a pointer *to* an uninitialized value),
151161
or you could access outside of an array's bounds, or you could use a pointer after deleting the thing it points at.
152162

153163
Sometimes this kind of error causes a segmentation fault, but sometimes the memory being accessed happens to be something else your program
@@ -161,7 +171,7 @@ Once they have this, they can have the computer start executing whatever code th
161171
This kind of exploit is known as a buffer overflow exploit.
162172

163173
You can detect these kinds of bugs using either Valgrind or Address Sanitizer (a.k.a. `asan`).
164-
`asan` is part library, part compiler feature that instruments your code at compile time.
174+
`asan` is part runtime library, part compiler feature that instruments your code at compile time.
165175
Then when you run your program, the instrumentation tracks memory information much in the way Valgrind does.
166176
`asan` is much faster than Valgrind, but requires special compiler flags to work.
167177

@@ -174,9 +184,12 @@ export ASAN_OPTIONS=symbolize=1
174184
```
175185

176186
Let's look at some examples of this class of bugs and the relevant Valgrind and `asan` output.
187+
188+
#### Out-of-bounds Stack Access
189+
177190
First up, out-of-bounds accesses on a stack-allocated array:
178191

179-
```c++
192+
```{.cpp .numberLines}
180193
#include<iostream>
181194

182195
int main()
@@ -192,9 +205,17 @@ int main()
192205
If you run this program normally, it probably won't crash, and in fact it will probably behave how you expect.
193206
This is a mirage. It only works because whatever is one `int` after `array` in `main()`'s stack frame happens to not be used again.
194207
This illustrates how important it is to check that you do not have these bugs!
208+
195209
Even worse, Valgrind does not detect this out-of-bounds access!
210+
However, `asan` does.
211+
Compile the program with this command:
196212

197-
However, `asan` does. Its output is somewhat terrifying to see, but the relevant parts look like this:[^edit]
213+
~~~
214+
$ g++ -g -fsanitize=address -fno-omit-frame-pointer ↩
215+
invalid-stack.cpp -o invalid-stack
216+
~~~
217+
218+
`asan`'s output is somewhat terrifying to see, but the relevant parts look like this:[^edit]
198219

199220
```
200221
==29210==ERROR: AddressSanitizer: stack-buffer-overflow on ↩
@@ -225,10 +246,12 @@ rather than wade through screenfuls of errors trying to figure out which one un
225246

226247
Pretty handy, eh? What more could you ask for!
227248

249+
#### Out-of-bounds Heap Access
250+
228251
Both Valgrind and `asan` can detect heap out-of-bounds accesses.
229252
Here is a small sample program that demonstrates an out-of-bounds write:
230253

231-
```c++
254+
```{.cpp .numberLines}
232255
#include<iostream>
233256

234257
int main()
@@ -279,12 +302,14 @@ The write itself occurred on line 9.
279302
Furthermore, they show that the write happened 0 bytes to the right[^chickens] (in other words, after the end of) our allocated chunk,
280303
indicating that we are writing one index past the end of the array.
281304

305+
#### Use After Free
306+
282307
Finally, let's see an example of a use-after-free.
283308
This type of bug is exploitable by means similar to using an uninitialized value, but it is usually far easier to control
284309
the contents of memory for a use-after-free bug.[^exploit]
285-
Like out-of-bounds accesses, this type of bug can go undetected; the below example appears to work, even though it is incorrect!
310+
Like out-of-bounds accesses, this type of bug can go undetected if you don't check for it; the below example appears to work, even though it is incorrect!
286311

287-
```c++
312+
```{.cpp .numberLines}
288313
#include<iostream>
289314

290315
int main()
@@ -341,9 +366,13 @@ previously allocated by thread T0 here:
341366
#2 0x7f7c327c682f in __libc_start_main
342367
```
343368

344-
Both outputs show that a 4-byte (i.e., `int`) read happened 4 bytes (i.e., at index 1) inside our block of 20 bytes
369+
Both outputs show that a 4-byte (the size of an `int`) read happened 4 bytes (so, at index 1) inside the block of 20 bytes
345370
that is our array of 5 `int`s.
346371

372+
Invalid read or write bugs can have a number of fixes.
373+
Sometimes they're as simple as adding a bounds check somewhere to make sure you don't write off the end of an array.
374+
Other times, you'll have to think carefully about where your code went astray --- see the Debugging chapter for more advice on this.
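
As a rough illustration of the bounds check idea, here is a small sketch. The function name `checked_write` and its parameters are assumptions made up for this example; your own code will likely look different.

```cpp
#include <cstddef>

// Hypothetical helper: stores 'value' at array[index] only when 'index'
// is inside the array. Returns false (and writes nothing) otherwise.
bool checked_write(int* array, std::size_t size, std::size_t index, int value)
{
  if(index >= size)
  {
    return false; // index is one past the end or further: refuse to write
  }

  array[index] = value;
  return true;
}
```

With a 5-element array, `checked_write(array, 5, 4, 9)` succeeds, while `checked_write(array, 5, 5, 9)` returns `false` instead of scribbling on whatever lives after the array.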
375+
347376
### Mismatched and Double Deletes
348377

349378
Mismatched deletes occur when you use `delete` to delete an array or `delete []` to delete a non-array.
@@ -392,14 +421,15 @@ Both identify where the delete and matching allocation occurred (here, on lines
392421
You can tell what the exact mismatch is by looking at the operators called by the deletion and allocation lines.
393422
In this example, `operator delete` is called to delete the allocation, but `operator new[]` is called to allocate it.
394423

424+
A double-delete occurs when you delete the same block of memory twice.
395425
Double deletes may seem innocuous, but they can be easily turned into a use-after-free bug.
396426
This is because freed memory is usually re-used in future allocations.
397427
So deleting something, then allocating a second thing, then deleting the first thing again results in
398428
the second thing being deleted!
399429
Any future uses of the second thing then become a use-after-free problem, and attempting to properly clean up
400430
that second allocation brings on a double delete.
401431
For example,
402-
```c++
432+
```{.cpp .numberLines}
403433
#include<iostream>
404434

405435
int main()
@@ -459,6 +489,8 @@ previously allocated by thread T0 here:
459489
Both show the location of the allocation and the first delete.
460490
Typically, this kind of bug arises when you don't properly keep track of whether
461491
a pointer has been `delete`d yet.
492+
Sometimes a quick fix for this is to set your pointers to `NULL` after you call `delete`.
493+
Other times, you'll get a double-delete if you forget to write a copy constructor for a class (or write a buggy one).
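
Here is a minimal sketch of the "null it out after deleting" quick fix. The helper name `destroy` is made up for illustration. It works because deleting a null pointer is defined to do nothing.

```cpp
// Hypothetical helper: frees the int that 'ptr' points at, then resets
// 'ptr' so that a second call is harmless.
void destroy(int*& ptr)
{
  delete ptr;    // deleting a null pointer is a no-op
  ptr = nullptr; // any later delete of this same pointer now does nothing
}
```

Keep in mind that this only protects the one pointer variable you pass in. If another pointer holds a copy of the same address, it still dangles, which is why a missing or buggy copy constructor can lead to a double delete anyway.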
462494

463495
### Memory Leaks
464496

@@ -473,7 +505,7 @@ The distinction is drawn because typically indirect memory leaks occur due to no
473505

474506
Both Valgrind and Address Sanitizer can detect memory leaks.
475507
Let's look at a simple example that has one directly leaked block and one indirectly leaked block:
476-
```c++
508+
```{.cpp .numberLines}
477509
struct List
478510
{
479511
int value;
@@ -547,6 +579,10 @@ SUMMARY: AddressSanitizer: 32 byte(s) leaked in 2 allocation(s).
547579

548580
As opposed to Valgrind, Address Sanitizer shows where both directly and indirectly leaked blocks are allocated.
549581

582+
Sometimes memory leaks come from a missing or buggy destructor.
583+
Other times, they happen when you accidentally overwrite a pointer, say in a buggy `insert()` function.
584+
These can be more difficult to track down; again, see the Debugging chapter for advice.
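
For the missing destructor case, the fix usually looks something like the sketch below. The `Node` and `IntList` types are hypothetical stand-ins written for this example, not the list from your assignment or the one shown earlier in this chapter.

```cpp
// Hypothetical node type for a singly linked list of ints.
struct Node
{
  int value;
  Node* next;
};

// Hypothetical list class that owns every node it allocates.
class IntList
{
  public:
    IntList() : head(nullptr) {}

    // The destructor frees every node, so nothing is leaked
    // when the list goes out of scope.
    ~IntList() { clear(); }

    // Copying is disabled in this sketch; two lists sharing the same
    // nodes would each try to delete them.
    IntList(const IntList&) = delete;
    IntList& operator=(const IntList&) = delete;

    void push_front(int value)
    {
      head = new Node{value, head};
    }

    int size() const
    {
      int count = 0;
      for(Node* cur = head; cur != nullptr; cur = cur->next)
      {
        count++;
      }
      return count;
    }

    void clear()
    {
      while(head != nullptr)
      {
        Node* next = head->next; // save the rest of the list first
        delete head;
        head = next;
      }
    }

  private:
    Node* head;
};
```

Note that `clear()` saves `head->next` before deleting `head`. Reading `next` after the `delete` would be a use-after-free, and overwriting `head` before the `delete` would leak the node.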
585+
550586
\newpage
551587
## Questions
552588
Name: `______________________________`
@@ -587,10 +623,10 @@ export ASAN_OPTIONS=symbolize=1
587623
- [Paper on Address Sanitizer](https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf)
588624
589625
[^term]: And the computer it is running on, being that it's the 21st century and all.
590-
[^joint]: It's a joint effort between how the compiler compiles your code and the operating system.
626+
[^joint]: It's a joint effort between the compiler and the operating system.
591627
[^free]: Or if you're writing C, with the `malloc` and `free` functions.
592628
[^random]: It's not even a good source of random numbers, unless you like random numbers that aren't very random.
593-
[^always]: It doesn't always do this because statically analyzing software (i.e., at compile time) is Really Hard, but it still catches some stuff.
629+
[^always]: It doesn't always do this because statically analyzing software (i.e., at compile time) is Really Hard, but it still catches most obvious things.
594630
[^memcheck]: Valgrind has a whole bunch of tools included, but it runs the Memcheck tool by default.
595631
We'll see some other Valgrind tools in future chapters of this book.
596632
[^ptr]: And more generally, any uninitialized value being accessed through a pointer.
@@ -601,9 +637,11 @@ Don't be concerned if the output you see is slightly different from what is prin
601637
It's best to make a special `asan` makefile target that turns on the relevant compiler flags.
602638
[^chickens]: Did you know that chickens also visually organize smaller quantities on the left and larger quantities on the right?
603639
[^exploit]: Since this is not a book on exploiting software, we won't go into further detail; writing exploits is its own universe of rabbit holes.
604-
[^implementation]: The implementation of `delete []` isn't specified, but the size of the allocation is stored somewhere;
640+
[^implementation]: The implementation of `delete []` isn't specified, but the size of the allocation has to be stored somewhere;
605641
depending on where it is stored, various Bad Things can happen if you try to `delete []` something that wasn't intended to be.
606642
[^lazy]: It's not fair to say that the runtime developers are lazy, though.
607643
There are some technical difficulties with freeing this memory, and since it is in use up until your program exits anyway,
608644
there is little benefit to going to the effort of freeing it: the operating system deallocates it once your program exits.
609645
[^llvm]: This requires `llvm` to be installed. Also, depending on the system you are running, you may need to append a version number, e.g., ``export ASAN_SYMBOLIZER_PATH=`which llvm-symbolizer-3.9` ``
646+
[^allegedly]: Allegedly
647+
[^allegedly2]: Allegedly
