Skip to content

Commit 1eb26aa

Browse files
committed
Add some high-level documentation on symex and GOTO-program instructions.
1 parent eb883a6 commit 1eb26aa

File tree

2 files changed

+130
-2
lines changed

2 files changed

+130
-2
lines changed

doc/architectural/cbmc-architecture.md

+3-2
Original file line numberDiff line numberDiff line change
@@ -71,9 +71,10 @@ To be documented.
7171

7272
To be documented.
7373

74-
## Goto functions → BMC → Counterexample (trace) ##
74+
## Goto functions → BMC → Counterexample (trace)
7575

76-
To be documented.
76+
For an explanation of part of how the BMC (Symex) process works, please refer
77+
to [Symex and GOTO program instructions](symex-instructions.md)
7778

7879
## Trace → interpreter → memory map ##
7980

+127
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,127 @@
1+
# Symex and GOTO program instructions
2+
3+
In [doc/central-data-structures](central-data-structures.md) we talked about
4+
the relationship between various central data structures of CBMC.
5+
6+
We identified the `goto_programt` as being the core data structure representing
7+
GOTO-IR, which we defined as a list of GOTO program instructions.
8+
9+
As a reminder, instructions are composed of three subcomponents:
10+
11+
* An instruction type,
12+
* A statement (denoting the actual code the instruction contains),
13+
* An instruction guard (an expression that needs to be evaluated to `true` before
14+
the instruction is to be executed - if one is attached to the instruction).
15+
16+
In this document, we are going to take a closer look at the first subcomponent,
17+
the instruction types, along with the interplay between the instructions and a
18+
central CBMC module, the *symbolic execution engine* (from now on, just *symex*).
19+
20+
## A (very) short introduction to Symex
21+
22+
Symex is, at its core, a GOTO-program interpreter. While Symex is interpreting
23+
the program, it's also building a list of SSA steps that form part of the equation
24+
that is to be sent to the solver.
25+
26+
You can see the main instruction dispatcher (what corresponds to the main interpreter
27+
loop) at goto_symext::execute_next_instruction.
28+
29+
Symex's source code is available under [src/goto-symex](../../src/goto-symex/).
30+
31+
## Instruction Types
32+
33+
Every GOTO-program instruction has a specific type. You can see the instruction type
34+
at #goto_program_instruction_typet but we will also list some of them here for illustration
35+
purposes:
36+
37+
```c
38+
enum goto_program_instruction_typet
39+
{
40+
GOTO = 1, // branch, possibly guarded
41+
ASSUME = 2, // non-failing guarded self loop
42+
ASSERT = 3, // assertions
43+
SKIP = 5, // just advance the PC
44+
SET_RETURN_VALUE = 12, // set function return value (no control-flow change)
45+
ASSIGN = 13, // assignment lhs:=rhs
46+
DECL = 14, // declare a local variable
47+
DEAD = 15, // marks the end-of-live of a local variable
48+
FUNCTION_CALL = 16, // call a function
49+
};
50+
```
51+
52+
(*NOTE*: The above is for illustration purposes only - the list is not complete, neither is it
53+
expected to be kept up-to-date. Please refer to the type #goto_program_instruction_typet for a
54+
list of what the instructions look like at this point in time.)
55+
56+
You can observe these instruction types in the user-friendly print of the goto-program that
57+
various CPROVER programs produce with the flag `--show-goto-functions`. For a live example,
58+
consider the following C file:
59+
60+
```c
61+
int main(int argc, char **argv)
62+
{
63+
int arry[] = {0, 1, 2, 3};
64+
__CPROVER_assert(arry[3] != 3, "expected failure: last arry element is equal to 3");
65+
}
66+
```
67+
68+
If we ask CBMC to print the GOTO-program instructions with `--show-goto-functions`, then part
69+
of the output looks like this:
70+
71+
```c
72+
[...]
73+
74+
main /* main */
75+
// 0 file /tmp/example.c line 3 function main
76+
DECL main::1::arry : signedbv[32][4]
77+
// 1 file /tmp/example.c line 3 function main
78+
ASSIGN main::1::arry := { 0, 1, 2, 3 }
79+
// 2 file /tmp/example.c line 4 function main
80+
ASSERT main::1::arry[cast(3, signedbv[64])] ≠ 3 // expected failure: last arry element is equal to 3
81+
// 3 file /tmp/example.c line 5 function main
82+
DEAD main::1::arry
83+
// 4 file /tmp/example.c line 5 function main
84+
SET RETURN VALUE side_effect statement="nondet" is_nondet_nullable="1"
85+
// 5 file /tmp/example.c line 5 function main
86+
END_FUNCTION
87+
```
88+
89+
In the above excerpt, the capitalised words at the beginning each instruction are the
90+
correspondent instruction types (`DECL`, `ASSIGN`, etc).
91+
92+
---
93+
94+
Symex (as mentioned above) is going to pick a designated entry point and then start going through
95+
each instruction. This happens at goto_symext::execute_next_instruction. While doing so, it will
96+
inspect the instruction's type, and then dispatch to a designated handling function (which usually go
97+
by the name `symex_<instruction-type>`) to handle that particular instruction type and its symbolic
98+
execution. In pseudocode, it looks like this:
99+
100+
```c
101+
switch(instruction.type())
102+
{
103+
[...]
104+
105+
case GOTO:
106+
symex_goto(state);
107+
break;
108+
109+
case ASSUME:
110+
symex_assume(state, instruction.condition());
111+
break;
112+
```
113+
114+
The way the [`symex` subfolder](../../src/goto-symex/) is structured, the different
115+
dispatching functions are usually in their own file, designated by the instruction's
116+
name. As an example, you can find the code for the function goto_symext::symex_goto
117+
in [symex_goto.cpp](../../src/goto-symex/symex_goto.cpp)
118+
119+
The output of symex is an symex_target_equationt which consists of equalities between
120+
different expressions in the program. You can visualise it as a data structure that
121+
serialises to this: `(a = 5 ∨ a = 3) ∧ (b = 3) ∧ (c = 4) ∧ ...` that describe assignments
122+
and conditions for a range of possible executions of a program that cover a range of
123+
potential paths within it.
124+
125+
This is a high-level overview of symex and goto-program instructions.
126+
For more information (and a more robust introduction), please have a look
127+
at \ref symbolic-execution.

0 commit comments

Comments
 (0)