_parts/part8.md
Lines changed: 20 additions & 29 deletions
@@ -1,13 +1,13 @@
1
1
---
2
2
title: Part 8 - B-Tree Leaf Node Format
3
-
date: 2017-09-24
3
+
date: 2017-09-25
4
4
---
5
5
6
6
We're changing the format of our table from an unsorted array of rows to a B-Tree. This is a pretty big change that is going to take multiple articles to implement. By the end of this article, we'll define the layout of a leaf node and support inserting key/value pairs into a single-node tree. But first, let's recap the reasons for switching to a tree structure.
7
7
8
8
## Alternative Table Formats
9
9
10
-
With the current format, each page stores only rows (no metadata) so it is pretty space efficient. Insertion is also fast because we just append it to the end. However, finding a particular row can only be done by scanning the entire table. And if we want to delete a row, we have to move every row that comes after it to fill in the hole.
10
+
With the current format, each page stores only rows (no metadata) so it is pretty space efficient. Insertion is also fast because we just append to the end. However, finding a particular row can only be done by scanning the entire table. And if we want to delete a row, we have to fill in the hole by moving every row that comes after it.
11
11
12
12
If we stored the table as an array, but kept rows sorted by id, we could use binary search to find a particular id. However, insertion would have the same problem as deletion where we have to move a lot of rows to make space.
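To make that trade-off concrete, here is a minimal sketch of the sorted-array idea, assuming the ids live in a plain `uint32_t` array. The helper names `lower_bound_id` and `sorted_insert` are made up for illustration and are not part of the database code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Binary search: index of the first id that is >= the one we want.
   If the id is present, this is its position; otherwise it is the
   position where the id would have to be inserted. */
size_t lower_bound_id(const uint32_t* ids, size_t count, uint32_t id) {
  size_t lo = 0;
  size_t hi = count;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (ids[mid] < id) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

/* Insert an id while keeping the array sorted. Returns how many
   existing ids had to be moved to make space. */
size_t sorted_insert(uint32_t* ids, size_t count, size_t capacity,
                     uint32_t id) {
  assert(count < capacity);
  size_t pos = lower_bound_id(ids, count, id);
  /* Everything from pos onward moves one slot to the right. */
  memmove(&ids[pos + 1], &ids[pos], (count - pos) * sizeof ids[0]);
  ids[pos] = id;
  return count - pos;
}
```

The lookup only needs a logarithmic number of comparisons, but inserting a small id into a large array moves almost every element.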
13
13
@@ -30,9 +30,9 @@ Leaf nodes and internal nodes have different layouts. Let's make an enum to keep
30
30
+typedef enum NodeType_t NodeType;
31
31
```
32
32
33
-
Each node will correspond to one page. Internal nodes will point to their children by storing the page number that stores the child. The btree receives pointers to pages by asking the pager for a particular page number.
33
+
Each node will correspond to one page. Internal nodes will point to their children by storing the page number that stores the child. The btree asks the pager for a particular page number and gets back a pointer into the page cache. Pages are stored in the database file one after the other in order of page number.
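As a rough sketch of what "one after the other in order of page number" means for the file layout: the byte offset of a page is its page number times the page size. `page_file_offset` and `pages_in_file` are hypothetical helpers, not functions from the pager, and the whole-page check assumes every page is written out in full.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* same page size the tutorial uses */

/* Byte offset in the database file where a given page starts. */
uint64_t page_file_offset(uint32_t page_num) {
  return (uint64_t)page_num * PAGE_SIZE;
}

/* Number of pages in a file of the given length.
   Assumes the file only ever contains whole pages. */
uint32_t pages_in_file(uint64_t file_length) {
  assert(file_length % PAGE_SIZE == 0);
  return (uint32_t)(file_length / PAGE_SIZE);
}
```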
34
34
35
-
Nodes need to store some metadata in a header at the beginning of the page. Both types of nodes will store what type of node they are, whether or not they are the root node, and a pointer to their parent (to allow finding a node's siblings). I define constants for the size and offset of every header field:
35
+
Nodes need to store some metadata in a header at the beginning of the page. Every node will store what type of node it is, whether or not it is the root node, and a pointer to its parent (to allow finding a node's siblings). I define constants for the size and offset of every header field:
36
36
37
37
```diff
38
38
+/*
@@ -79,13 +79,13 @@ The body of a leaf node is an array of cells. Each cell is a key followed by a v
Based on these constants, here's what the layout of leaf node looks like currently:
82
+
Based on these constants, here's what the layout of a leaf node looks like currently:
83
83
84
84
{% include image.html url="assets/images/leaf-node-format.png" description="Our leaf node format" %}
85
85
86
-
It's a little space inefficient to use an entire byte per boolean value in the header, but this makes it a little easier to write code to access those values.
86
+
It's a little space inefficient to use an entire byte per boolean value in the header, but this makes it easier to write code to access those values.
87
87
88
-
Also notice that there's some wasted space at the end. We store as many cells as we can after the header, but there is some space left over that can't hold an entire cell. We leave it empty to avoid splitting cells between nodes.
88
+
Also notice that there's some wasted space at the end. We store as many cells as we can after the header, but the leftover space can't hold an entire cell. We leave it empty to avoid splitting cells between nodes.
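Here's a small sketch of the arithmetic behind that leftover space. The field sizes below are assumptions picked for illustration (one byte each for the node type and the root flag, four bytes each for the parent pointer, the cell count and the key, and a hypothetical 293-byte row as the value). Only the 4096-byte page size is taken from the article's code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Assumed sizes, for illustration only. */
#define HEADER_SIZE (1u + 1u + 4u + 4u) /* type, is_root, parent, num_cells */
#define KEY_SIZE 4u
#define VALUE_SIZE 293u /* hypothetical serialized row size */
#define CELL_SIZE (KEY_SIZE + VALUE_SIZE)

/* Bytes left in the page once the header is accounted for. */
uint32_t leaf_space_for_cells(void) {
  return PAGE_SIZE - HEADER_SIZE;
}

/* How many whole cells fit; integer division drops the partial cell. */
uint32_t leaf_max_cells(void) {
  assert(CELL_SIZE <= leaf_space_for_cells());
  return leaf_space_for_cells() / CELL_SIZE;
}

/* Bytes at the end of the page that can't hold an entire cell. */
uint32_t leaf_wasted_bytes(void) {
  return leaf_space_for_cells() - leaf_max_cells() * CELL_SIZE;
}
```

With these assumed sizes, a page has 4086 bytes available for cells, fits 13 cells of 297 bytes each, and leaves 225 bytes unused.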
89
89
90
90
## Accessing Leaf Node Fields
91
91
@@ -96,7 +96,7 @@ The code to access keys, values and metadata all involve pointer arithmetic usin
@@ -167,7 +167,7 @@ Every node is going to take up exactly one page, even if it's not full. That mea
167
167
printf("Error closing db file.\n");
168
168
```
169
169
170
-
Now it makes more sense to store the number of pages in our database rather than the number of rows. The number of pages should be assoicated with the pager, object, not the table, since it's the number of pages used by the database, not a particular table.
170
+
Now it makes more sense to store the number of pages in our database rather than the number of rows. The number of pages should be associated with the pager object, not the table, since it's the number of pages used by the database, not a particular table. A btree is identified by its root node page number, so the table object needs to keep track of that.
171
171
172
172
```diff
173
173
const uint32_t PAGE_SIZE = 4096;
@@ -223,7 +223,7 @@ Now it makes more sense to store the number of pages in our database rather than
223
223
224
224
## Changes to the Cursor Object
225
225
226
-
A cursor represents a position in the table. When our table was a simple array of rows, that could be represented by a row number. Now that it's a tree, we identify a position by the page number of the node, and the cell number within that node.
226
+
A cursor represents a position in the table. When our table was a simple array of rows, we could access a row given just the row number. Now that it's a tree, we identify a position by the page number of the node, and the cell number within that node.
227
227
228
228
```diff
229
229
struct Cursor_t {
@@ -355,9 +355,9 @@ Next we'll make a function for inserting a key/value pair into a leaf node. It w
355
355
+
356
356
```
357
357
358
-
We holding off on implementing splitting for now, so we error if the node is full. Next we shift cells once space to the right to make room for the new cell. Then we write the new key/value into the empty space.
358
+
We haven't implemented splitting yet, so we error if the node is full. Next we shift cells one space to the right to make room for the new cell. Then we write the new key/value into the empty space.
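The same three steps can be sketched on a simplified, standalone leaf. This is not the page-based code from the article: the `Cell` and `Leaf` structs, the tiny `MAX_CELLS` capacity and the `leaf_insert` name are all assumptions for illustration, and the sketch returns an error code where the real code reports an error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CELLS 4u /* hypothetical capacity, kept small on purpose */

typedef struct {
  uint32_t key;
  uint32_t value;
} Cell;

typedef struct {
  uint32_t num_cells;
  Cell cells[MAX_CELLS];
} Leaf;

/* Insert key/value at position cell_num.
   Returns 0 on success, -1 if the leaf is full (no splitting yet). */
int leaf_insert(Leaf* leaf, uint32_t cell_num, uint32_t key, uint32_t value) {
  if (leaf->num_cells >= MAX_CELLS) {
    return -1;
  }
  assert(cell_num <= leaf->num_cells);

  /* Shift cells at and after cell_num one space to the right. */
  memmove(&leaf->cells[cell_num + 1], &leaf->cells[cell_num],
          (leaf->num_cells - cell_num) * sizeof(Cell));

  /* Write the new key/value into the gap. */
  leaf->cells[cell_num].key = key;
  leaf->cells[cell_num].value = value;
  leaf->num_cells += 1;
  return 0;
}
```

Inserting at `cell_num == num_cells` appends without moving anything, which is the case we hit when the position comes from the end of the table.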
359
359
360
-
Since we're assuming the tree has only one node for now, our `execute_insert()` function simply needs to call this helper method:
360
+
Since we assume the tree only has one node, our `execute_insert()` function simply needs to call this helper method:
Uh oh, we're still not storing rows in sorted order. You'll notice that `execute_insert()` inserts into the leaf node at the position returned by `table_end()`. So rows are stored in the order they were inserted, just like before.
497
495
498
-
## Current Limitations
496
+
## Next Time
499
497
500
498
This all might seem like a step backwards. Our database now stores fewer rows than it did before, and we're still storing rows in unsorted order. But like I said at the beginning, this is a big change and it's important to break it up into manageable steps.
501
499
@@ -576,7 +574,7 @@ Next time, we'll implement finding a record by primary key, and start storing ro