
Commit ad33856

Write part 1.
1 parent fb5c3bd commit ad33856

5 files changed: +228 -2 lines changed


Diff for: Gemfile

+1 -1
@@ -1,3 +1,3 @@
 source 'https://rubygems.org'
 gem 'github-pages', group: :jekyll_plugins
-gem "jekyll-theme-minimal"
+gem "jekyll-theme-minimal"

Diff for: _includes/image.html

+4
@@ -0,0 +1,4 @@
<table class="image">
  <caption align="bottom">{{ include.description }}</caption>
  <tr><td><img src="{{ include.url }}" alt="{{ include.description }}"/></td></tr>
</table>

Diff for: assets/css/style.scss

+16
@@ -0,0 +1,16 @@
---
---

@import "{{ site.theme }}";

// Everything below this line will override the default template styles

a:hover, a:focus { color: #069; font-weight: normal; text-decoration: underline; }

table.image td {
  text-align: center;
}

body {
  color: #323232;
}

Diff for: assets/images/arch1.gif

7.82 KB

Diff for: part1.md

+207 -1
@@ -1,3 +1,209 @@
# Part 1 - Introduction and Setting up the REPL

-Hello World

As a web developer, I use relational databases every day at my job, but they were always a black box to me. Some questions I had:
- What format is data saved in? (in memory and on disk)
- When does it move from memory to disk?
- Why can there only be one primary key per table?
- How does rolling back a transaction work?
- How are indexes formatted?
- When and how does a full table scan happen?
- What format is a prepared statement saved in?

In other words, how does a database _work_?

To figure things out, I started writing a database from scratch. It's modeled on SQLite because it is designed to be small, with fewer features than MySQL or PostgreSQL, so I have a better hope of understanding it. The entire database is stored in a single file!

# SQLite

There's lots of documentation of [SQLite internals on their website](https://www.sqlite.org/arch.html), plus I've got a copy of [SQLite Database System: Design and Implementation](https://play.google.com/store/books/details?id=9Z6IQQnX1JEC).

{% include image.html url="/assets/images/arch1.gif" description="sqlite architecture (https://www.sqlite.org/zipvfs/doc/trunk/www/howitworks.wiki)" %}

A query goes through a chain of components in order to retrieve or modify data. The _front-end_ consists of the:
- tokenizer
- parser
- code generator

The input to the front-end is a SQL query. The output is SQLite virtual machine bytecode (essentially a compiled program that can operate on the database).

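To make the first front-end stage a bit more concrete, here is a minimal tokenizer sketch. It is my own toy illustration, not SQLite's tokenizer: the names `next_token`, `count_tokens`, and `MAX_TOKEN_LENGTH` are hypothetical, and the rules are deliberately crude (runs of letters, digits, and underscores form one token; any other non-space character is a token by itself).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_TOKEN_LENGTH 64 /* assumed limit for this sketch */

/* Copies the next token starting at *cursor into `out` and advances *cursor.
 * Returns 1 if a token was produced, 0 at the end of the input.
 * Words longer than MAX_TOKEN_LENGTH - 1 characters are split into pieces. */
int next_token(const char** cursor, char out[MAX_TOKEN_LENGTH]) {
  assert(cursor != NULL && *cursor != NULL);
  const char* p = *cursor;

  /* Skip leading whitespace. */
  while (isspace((unsigned char)*p)) {
    p++;
  }
  if (*p == '\0') {
    *cursor = p;
    return 0;
  }

  size_t length = 0;
  if (isalnum((unsigned char)*p) || *p == '_') {
    /* A word or number: consume the whole run. */
    while ((isalnum((unsigned char)*p) || *p == '_') &&
           length < MAX_TOKEN_LENGTH - 1) {
      out[length++] = *p++;
    }
  } else {
    /* Punctuation: one character per token. */
    out[length++] = *p++;
  }
  out[length] = '\0';
  *cursor = p;
  return 1;
}

/* Counts how many tokens the input splits into. */
size_t count_tokens(const char* sql) {
  char token[MAX_TOKEN_LENGTH];
  size_t count = 0;
  while (next_token(&sql, token)) {
    count++;
  }
  return count;
}
```

With these rules, `select id from users;` splits into five tokens: four words and the trailing semicolon. A real tokenizer would also classify each token (keyword, identifier, literal) for the parser.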
The _back-end_ consists of the:
- virtual machine
- B-tree
- pager
- os interface

The **virtual machine** takes bytecode generated by the front-end as instructions. It can then issue operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.

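The "big switch statement" idea can be sketched with a toy interpreter. Everything here is an assumption made for illustration: the opcodes (`OP_PUSH`, `OP_ADD`, `OP_HALT`), the `Instruction` struct, and the stack-based design are made up and are not SQLite's actual instruction set.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 16 /* assumed limit for this sketch */

/* Hypothetical opcodes, not SQLite's real ones. */
typedef enum { OP_PUSH, OP_ADD, OP_HALT } OpCode;

typedef struct {
  OpCode opcode;
  int operand; /* only used by OP_PUSH */
} Instruction;

/* Executes `program` and returns the value on top of the stack when
 * OP_HALT is reached or the program ends (0 if the stack is empty). */
int run_program(const Instruction* program, size_t length) {
  int stack[STACK_SIZE];
  size_t depth = 0;

  for (size_t pc = 0; pc < length; pc++) {
    const Instruction* ins = &program[pc];
    /* The whole VM is this one switch on the instruction type. */
    switch (ins->opcode) {
      case OP_PUSH:
        assert(depth < STACK_SIZE);
        stack[depth++] = ins->operand;
        break;
      case OP_ADD:
        assert(depth >= 2);
        stack[depth - 2] += stack[depth - 1];
        depth--;
        break;
      case OP_HALT:
        return depth > 0 ? stack[depth - 1] : 0;
    }
  }
  return depth > 0 ? stack[depth - 1] : 0;
}
```

Running the program `PUSH 2, PUSH 3, ADD, HALT` through `run_program` returns 5. A database VM has the same shape, only with instructions that open tables, move cursors, and read or write rows instead of doing arithmetic.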
Each **B-tree** consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.

The **pager** receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.

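Here is a rough sketch of what "reading/writing at appropriate offsets" plus a page cache could look like. The names (`Pager`, `pager_open`, `pager_get_page`, `pager_flush`), the 4096-byte page size, and the fixed 8-page cache are all assumptions for illustration, not SQLite's actual design. The key idea is that page `n` lives at byte offset `n * PAGE_SIZE` in the file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096     /* assumed page size for this sketch */
#define MAX_CACHED_PAGES 8 /* assumed cache capacity */

typedef struct {
  FILE* file;
  char* pages[MAX_CACHED_PAGES]; /* NULL means the page is not cached yet */
} Pager;

Pager* pager_open(FILE* file) {
  assert(file != NULL);
  Pager* pager = malloc(sizeof(Pager));
  if (pager == NULL) return NULL;
  pager->file = file;
  for (size_t i = 0; i < MAX_CACHED_PAGES; i++) {
    pager->pages[i] = NULL;
  }
  return pager;
}

/* Returns the cached page, loading it from the file on a cache miss. */
char* pager_get_page(Pager* pager, size_t page_num) {
  if (page_num >= MAX_CACHED_PAGES) return NULL;
  if (pager->pages[page_num] == NULL) {
    char* page = calloc(1, PAGE_SIZE);
    if (page == NULL) return NULL;
    long offset = (long)(page_num * PAGE_SIZE);
    if (fseek(pager->file, offset, SEEK_SET) != 0) {
      free(page);
      return NULL;
    }
    /* A short read (past the end of the file) leaves the rest zeroed. */
    size_t bytes_read = fread(page, 1, PAGE_SIZE, pager->file);
    (void)bytes_read;
    pager->pages[page_num] = page;
  }
  return pager->pages[page_num];
}

/* Writes one cached page back to its offset. Returns 0 on success. */
int pager_flush(Pager* pager, size_t page_num) {
  if (page_num >= MAX_CACHED_PAGES || pager->pages[page_num] == NULL) {
    return -1;
  }
  long offset = (long)(page_num * PAGE_SIZE);
  if (fseek(pager->file, offset, SEEK_SET) != 0) return -1;
  if (fwrite(pager->pages[page_num], 1, PAGE_SIZE, pager->file) != PAGE_SIZE) {
    return -1;
  }
  return fflush(pager->file) == 0 ? 0 : -1;
}
```

Callers modify the memory returned by `pager_get_page` and later call `pager_flush` to persist it. A real pager would also decide on its own when to evict and write back pages; this sketch leaves that to the caller.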
The **os interface** is the layer that differs depending on which operating system SQLite was compiled for. In this tutorial, I'm not going to support multiple platforms.

[A journey of a thousand miles begins with a single step](https://en.wiktionary.org/wiki/a_journey_of_a_thousand_miles_begins_with_a_single_step), so let's start with something a little more straightforward: the REPL.

## Making a Simple REPL

SQLite starts a read-execute-print loop (REPL) when you start it from the command line:

```shell
~ sqlite3
SQLite version 3.16.0 2016-11-04 19:09:39
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table users (id int, username varchar(255), email varchar(255));
sqlite> .tables
users
sqlite> .exit
~
```

To do that, our main function will have an infinite loop that prints the prompt, gets a line of input, then processes that line of input:

```c
int main(int argc, char* argv[]) {
  InputBuffer* input_buffer = new_input_buffer();
  while (true) {
    print_prompt();
    read_input(input_buffer);

    if (strcmp(input_buffer->buffer, ".exit") == 0) {
      exit(EXIT_SUCCESS);
    } else {
      printf("Unrecognized command '%s'.\n", input_buffer->buffer);
    }
  }
}
```

We'll define `InputBuffer` as a small wrapper around the state we need to store to interact with [getline()](http://man7.org/linux/man-pages/man3/getline.3.html). (More on that in a minute.)

```c
struct InputBuffer_t {
  char* buffer;          // line storage, allocated by getline()
  size_t buffer_length;  // size of the allocated buffer
  ssize_t input_length;  // length of the line, excluding the newline
};
typedef struct InputBuffer_t InputBuffer;

InputBuffer* new_input_buffer() {
  InputBuffer* input_buffer = malloc(sizeof(InputBuffer));
  input_buffer->buffer = NULL;
  input_buffer->buffer_length = 0;
  input_buffer->input_length = 0;

  return input_buffer;
}
```

Next, `print_prompt()` prints a prompt to the user. We do this before reading each line of input.

```c
void print_prompt() { printf("db > "); }
```

To read a line of input, use [getline()](http://man7.org/linux/man-pages/man3/getline.3.html):

```c
ssize_t getline(char **lineptr, size_t *n, FILE *stream);
```

`lineptr` : a pointer to the variable we use to point to the buffer containing the read line.

`n` : a pointer to the variable we use to save the size of the allocated buffer.

`stream` : the input stream to read from. We'll be reading from standard input.

`return value` : the number of bytes read, which may be less than the size of the buffer, or -1 on error or end-of-file.

We tell `getline` to store the read line in `input_buffer->buffer` and the size of the allocated buffer in `input_buffer->buffer_length`. We store the return value in `input_buffer->input_length`.

`buffer` starts as null, so `getline` allocates enough memory to hold the line of input and makes `buffer` point to it.

```c
void read_input(InputBuffer* input_buffer) {
  ssize_t bytes_read =
      getline(&(input_buffer->buffer), &(input_buffer->buffer_length), stdin);

  if (bytes_read <= 0) {
    printf("Error reading input\n");
    exit(EXIT_FAILURE);
  }

  // Ignore trailing newline
  input_buffer->input_length = bytes_read - 1;
  input_buffer->buffer[bytes_read - 1] = 0;
}
```

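One property of `getline` is easier to see in isolation: when the same pointer and size variables are passed on every call, it reuses (and grows, if needed) the buffer it allocated earlier, so a single `free` at the end is enough. The helper below is a hypothetical example of mine, not part of the tutorial's program; it reads every line of a stream through one buffer and reports the longest line.

```c
#define _POSIX_C_SOURCE 200809L /* exposes getline() on strict compilers */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the length of the longest line in `stream`,
 * not counting the trailing newline. */
size_t longest_line(FILE* stream) {
  assert(stream != NULL);
  char* line = NULL;   /* NULL plus capacity 0 asks getline() to allocate */
  size_t capacity = 0; /* updated by getline() whenever it reallocates */
  size_t longest = 0;
  ssize_t bytes_read;

  while ((bytes_read = getline(&line, &capacity, stream)) != -1) {
    size_t length = (size_t)bytes_read;
    /* The newline, when present, is included in the returned count. */
    if (length > 0 && line[length - 1] == '\n') {
      length--;
    }
    if (length > longest) {
      longest = length;
    }
  }

  free(line); /* one free covers every call, since the buffer was reused */
  return longest;
}
```

Note that the last line of a stream may not end in a newline, which is why this sketch checks for `'\n'` before subtracting one from the length.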
Finally, we parse and execute the command. There is only one recognized command right now: `.exit`, which terminates the program. Otherwise we print an error message and continue the loop.

```c
if (strcmp(input_buffer->buffer, ".exit") == 0) {
  exit(EXIT_SUCCESS);
} else {
  printf("Unrecognized command '%s'.\n", input_buffer->buffer);
}
```

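Because `strcmp` only returns 0 on an exact match, the input must have its trailing newline removed before this check, which is what `read_input` does. One way to see that, and to make the check testable without exiting the process, is to separate recognizing a command from acting on it. The `MetaCommand` enum and `classify_command` function below are hypothetical names for this sketch, not part of the program above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum { META_COMMAND_EXIT, META_COMMAND_UNRECOGNIZED } MetaCommand;

/* Classifies one line of input that has already had its newline stripped. */
MetaCommand classify_command(const char* input) {
  assert(input != NULL);
  if (strcmp(input, ".exit") == 0) {
    return META_COMMAND_EXIT;
  }
  return META_COMMAND_UNRECOGNIZED;
}
```

With this split, `".exit"` is classified as `META_COMMAND_EXIT`, while `".exit\n"` and `".tables"` are both `META_COMMAND_UNRECOGNIZED`. The main loop would then call `exit(EXIT_SUCCESS)` or print the error message based on the returned value.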
Let's try it out!

```shell
~ ./db
db > .tables
Unrecognized command '.tables'.
db > .exit
~
```

Alright, we've got a working REPL. In the next part we'll try creating and retrieving records in-memory. Meanwhile, here's the entire program from this part:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct InputBuffer_t {
  char* buffer;
  size_t buffer_length;
  ssize_t input_length;
};
typedef struct InputBuffer_t InputBuffer;

InputBuffer* new_input_buffer() {
  InputBuffer* input_buffer = malloc(sizeof(InputBuffer));
  input_buffer->buffer = NULL;
  input_buffer->buffer_length = 0;
  input_buffer->input_length = 0;

  return input_buffer;
}

void print_prompt() { printf("db > "); }

void read_input(InputBuffer* input_buffer) {
  ssize_t bytes_read =
      getline(&(input_buffer->buffer), &(input_buffer->buffer_length), stdin);

  if (bytes_read <= 0) {
    printf("Error reading input\n");
    exit(EXIT_FAILURE);
  }

  // Ignore trailing newline
  input_buffer->input_length = bytes_read - 1;
  input_buffer->buffer[bytes_read - 1] = 0;
}

int main(int argc, char* argv[]) {
  InputBuffer* input_buffer = new_input_buffer();
  while (true) {
    print_prompt();
    read_input(input_buffer);

    if (strcmp(input_buffer->buffer, ".exit") == 0) {
      exit(EXIT_SUCCESS);
    } else {
      printf("Unrecognized command '%s'.\n", input_buffer->buffer);
    }
  }
}
```
