Skip to content

Commit 424ef50

Browse files
author
Daniel Kroening
authored
Merge pull request #1090 from karkhaz/kk-neu-documentation
CBMC documentation campaign
2 parents 4577401 + 58f6dde commit 424ef50

File tree

163 files changed

+4420
-14314
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

163 files changed

+4420
-14314
lines changed

doc/CPPLINT.cfg

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
exclude_files=.*

doc/architectural/cbmc-guide.md

Lines changed: 601 additions & 0 deletions
Large diffs are not rendered by default.

doc/architectural/front-page.md

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
CProver Documentation
2+
=====================
3+
4+
\author Kareem Khazem
5+
6+
These pages contain user tutorials, automatically-generated API
7+
documentation, and higher-level architectural overviews for the
8+
CProver codebase. Users can download CProver tools from the
9+
<a href="http://www.cprover.org/">CProver website</a>; contributors
10+
should use the <a href="https://github.com/diffblue/cbmc">repository</a>
11+
hosted on GitHub.
12+
13+
### For users:
14+
15+
* The \ref cprover-manual "CProver Manual" details the capabilities of
16+
CBMC and SATABS and describes how to install and use these tools. It
17+
also covers the underlying theory and prerequisite concepts behind how
18+
these tools work.
19+
20+
### For contributors:
21+
22+
* If you already know exactly what you're looking for, the API reference
23+
is generated from the codebase. You can search for classes and class
24+
members in the search bar at top-right or use one of the links in the
25+
sidebar.
26+
27+
* For higher-level architectural information, each of the pages under
28+
the "Modules" link in the sidebar gives an overview of a directory in
29+
the CProver codebase.
30+
31+
* The \ref module_cbmc "CBMC guided tour" is a good start for new
32+
contributors to CBMC. It describes the stages through which CBMC
33+
transforms source files into bug reports and counterexamples, linking
34+
to the relevant documentation for each stage.
35+
36+
* The \subpage cbmc-hacking "CBMC hacking HOWTO" helps new contributors
37+
to CProver to get their feet wet through a series of programming
38+
exercises---mostly modifying goto-instrument, and thus learning to
39+
manipulate the main data structures used within CBMC.
40+
41+
* The \subpage cbmc-guide "CBMC guide" is a single document describing
42+
the layout of the codebase and many of the important data structures.
43+
It probably contains more information than the module pages at the
44+
moment, but may be somewhat out-of-date.
45+
46+
\defgroup module_hidden _hidden

doc/architectural/howto.md

Lines changed: 243 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,243 @@
1+
\ingroup module_hidden
2+
\page cbmc-hacking CBMC Hacking HOWTO
3+
4+
\author Kareem Khazem
5+
6+
This is an introduction to hacking on the `cprover` codebase. It is not
7+
intended as a user guide to `CBMC` or related tools. It is structured
8+
as a series of programming exercises that aim to acclimatise the reader
9+
to the basic data structures and workflow needed for contributing to
10+
`CBMC`.
11+
12+
13+
## Initial setup
14+
15+
Clone the [CBMC repository][cbmc-repo] and build it:
16+
17+
git clone https://github.com/diffblue/cbmc.git
18+
cd cbmc/src
19+
make minisat2-download
20+
make
21+
22+
Ensure that [graphviz][graphviz] is installed on your system (in
23+
particular, you should be able to run a program called `dot`). Install
24+
[Doxygen][doxygen] and generate doxygen documentation:
25+
26+
# In the src directory
27+
doxygen doxyfile
28+
# View the documentation in a web browser
29+
firefox doxy/html/index.html
30+
31+
If you've never used doxygen documentation before, get familiar with the
32+
layout. Open the generated HTML page in a web browser; search for the
33+
class `goto_programt` in the search bar, and jump to the documentation
34+
for that class; and read through the copious documentation.
35+
36+
The build writes executable programs into several of the source
37+
directories. In this tutorial, we'll be using binaries inside the
38+
`cbmc`, `goto-instrument`, and `goto-cc` directories. Add these
39+
directories to your `$PATH`:
40+
41+
# Assuming you cloned CBMC into ~/code
42+
export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH
43+
# Add to your shell's startup configuration file so that you don't have to run that command every time.
44+
echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc
45+
46+
Optional: install an image viewer that can read images on stdin.
47+
I use [feh][feh].
48+
49+
[cbmc-repo]: https://github.com/diffblue/cbmc/
50+
[doxygen]: http://www.stack.nl/~dimitri/doxygen/
51+
[graphviz]: http://www.graphviz.org/
52+
[feh]: https://feh.finalrewind.org/
53+
54+
55+
56+
## Whirlwind tour of the tools
57+
58+
CBMC's code is located under the `cbmc` directory. Even if you plan to
59+
contribute only to CBMC, it is important to be familiar with several
60+
other of cprover's auxiliary tools.
61+
62+
63+
### Compiling with `goto-cc`
64+
65+
There should be an executable file called `goto-cc` in the `goto-cc`
66+
directory; make a symbolic link to it called `goto-gcc`:
67+
68+
cd cbmc/src/goto-cc
69+
ln -s "$(pwd)/goto-cc" goto-gcc
70+
71+
Find or write a moderately-interesting C program; we'll call it `main.c`.
72+
Run the following commands:
73+
74+
goto-gcc -o main.goto main.c
75+
cc -o main.exe main.c
76+
77+
Invoke `./main.goto` and `./main.exe` and observe that they run identically.
78+
The version that was compiled with `goto-gcc` is larger, though:
79+
80+
du -hs *.{goto,exe}
81+
82+
Programs compiled with `goto-gcc` are mostly identical to their `clang`-
83+
or `gcc`-compiled counterparts, but contain additional object code in
84+
cprover's intermediate representation. The intermediate representation
85+
is (informally) called a *goto-program*.
86+
87+
88+
### Viewing goto-programs
89+
90+
`goto-instrument` is a Swiss army knife for viewing goto-programs and
91+
performing single program analyses on them. Run the following command:
92+
93+
goto-instrument --show-goto-functions main.goto
94+
95+
Many of the instructions in the goto-program intermediate representation
96+
are similar to their C counterparts. `if` and `goto` statements replace
97+
structured programming constructs.
98+
99+
Find or write a small C program (2 or 3 functions, each containing a few
100+
varied statements). Compile it using `goto-gcc` as above into an object
101+
file called `main`. If you installed `feh`, try the following command
102+
to dump a control-flow graph:
103+
104+
goto-instrument --dot main | tail -n +2 | dot -Tpng | feh -
105+
106+
If you didn't install `feh`, you can write the diagram to the file and
107+
then view it:
108+
109+
goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png
110+
Now open main.png with an image viewer
111+
112+
(the invocation of `tail` is used to filter out the first line of
113+
`goto-instrument` output. If `goto-instrument` writes more or less
114+
debug output by the time you read this, read the output of
115+
`goto-instrument --dot main` and change the invocation of `tail`
116+
accordingly.)
117+
118+
There are a few other views of goto-programs. Run `goto-instrument -h`
119+
and try the various switches under the "Diagnosis" section.
120+
121+
122+
123+
## Learning about goto-programs
124+
125+
In this section, you will learn about the basic goto-program data
126+
structures. Reading from and manipulating these data structures form
127+
the core of writing an analysis for CBMC.
128+
129+
130+
### First steps with `goto-instrument`
131+
132+
<div class=memdoc>
133+
**Task:** Write a simple C program with a few functions, each containing
134+
a few statements. Compile the program with `goto-gcc` into a binary
135+
called `main`.
136+
</div>
137+
138+
139+
The entry point of `goto-instrument` is in `goto_instrument_main.cpp`.
140+
Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`.
141+
At some point in that function, there will be a long sequence of `if` statements.
142+
143+
<div class=memdoc>
144+
**Task:** Add a `--greet` switch to `goto-instrument`, taking an optional
145+
argument, with the following behaviour:
146+
147+
$ goto-instrument --greet main
148+
hello, world!
149+
$ goto-instrument --greet Leperina main
150+
hello, Leperina!
151+
152+
You will also need to add the `greet` option to the
153+
`goto_instrument_parse_options.h` file in order for this to work.
154+
Notice that in the `.h` file, options that take an argument are followed
155+
by a colon (like `(property):`), while simple switches have no colon.
156+
Make sure that you `return 0;` after printing the message.
157+
</div>
158+
159+
The idea behind `goto-instrument` is that it parses a goto-program and
160+
then performs one single analysis on that goto-program, and then
161+
returns. Each of the switches in `doit` function of
162+
`goto_instrument_parse_options` does something different with the
163+
goto-program that was supplied on the command line.
164+
165+
166+
### Goto-program basics
167+
168+
At this point in `goto-instrument_parse_options` (where the `if`
169+
statements are), the goto-program will have been loaded into the object
170+
`goto_functions`, of type `goto_functionst`. This has a field called
171+
`function_map`, a map from function names to functions.
172+
173+
174+
<div class="memdoc">
175+
**Task:** Add a `--print-function-names` switch to `goto-instrument`
176+
that prints out the name of every function in the goto-binary. Are
177+
there any functions that you didn't expect to see?
178+
</div>
179+
180+
The following is quite difficult to follow from doxygen, but: the value
181+
type of `function_map` is `goto_function_templatet<goto_programt>`.
182+
183+
184+
<div class=memdoc>
185+
**Task:** Read the documentation for `goto_function_templatet<bodyT>`
186+
and `goto_programt`.
187+
</div>
188+
189+
Each goto_programt object contains a list of
190+
\ref goto_program_templatet::instructiont called
191+
`instructions`. Each instruction has a field called `code`, which has
192+
type \ref codet.
193+
194+
<div class=memdoc>
195+
**Task:** Add a `--pretty-program` switch to `goto-instrument`. This
196+
switch should use the `codet::pretty()` function to pretty-print every
197+
\ref codet in the entire program. The strings that `pretty()` generates
198+
for a codet look like this:
199+
200+
index
201+
* type: unsignedbv
202+
* width: 8
203+
* #c_type: char
204+
0: symbol
205+
* type: array
206+
* size: nil
207+
* type:
208+
* #source_location:
209+
* file: src/main.c
210+
* line: 18
211+
* function:
212+
* working_directory: /some/dir
213+
0: unsignedbv
214+
* width: 8
215+
* #c_type: char
216+
...
217+
</div>
218+
219+
The sub-nodes of a particular node in the pretty representation are
220+
numbered, starting from 0. They can be accessed through the `op0()`,
221+
`op1()` and `op2()` methods in the `exprt` class.
222+
223+
Every node in the pretty representation has an identifier, accessed
224+
through the `id()` function. The file `util/irep_ids.def` lists the
225+
possible values of these identifiers; have a quick scan through that
226+
file. In the pretty representation above, the following facts are true
227+
of that particular node:
228+
229+
- `node.id() == ID_index`
230+
- `node.type().id() == ID_unsignedbv`
231+
- `node.op0().id() == ID_symbol`
232+
- `node.op0().type().id() == ID_array`
233+
234+
The fact that the `op0()` child has a `symbol` ID menas that you could
235+
cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the
236+
function `to_symbol_expr`.
237+
238+
<div class=memdoc>
239+
**Task:** Add flags to `goto-instrument` to print out the following information:
240+
* the name of every function that is *called* in the program;
241+
* the value of every constant in the program;
242+
* the value of every symbol in the program.
243+
</div>
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

0 commit comments

Comments
 (0)