Skip to content

Document symex instructions #7497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

NlightNFotis
Copy link
Contributor

Add documentation for the goto-symex and goto-program-instructions.

The `goto_modelt` is the main data structure that CBMC (and the other tools) use to
represent GOTO-IR (the `GOTO` Intermediate Representation).

A `goto_modelt` is effectively a product type, consisting of:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as on the previous draft; I would suggest using "pair" or "tuple" as a more C++-ish way of saying this.


A `goto_modelt` is effectively a product type, consisting of:

* A list of GOTO functions (pseudocode: `type goto_functionst = list<goto_functiont>`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really a list...

A `goto_functiont` is also a product type. It's designed to represent a function
at the IR level, and effectively it's the following ADT (in pseudocode):

```js
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use C++-style pseudocode?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And / or a reference to the code.

Another important data structure that needs to be discussed at this point is
`source_locationt`.

This is an `irept`. `irep`s are the central data structure that model most entities inside
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

irept is pretty critical and deserves space of its own, not just under source_locationt

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


## A (very) short introduction to Symex

Symex is, at its core, a GOTO-program interpreter. While Symex is interpreting
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'an interpreter BUT using symbolic values, not literal ones. For example... Note that this produces a formula which describes all possible outputs rather than a single output value.'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Changed

}
```

If we ask CBMC to print the GOTO-program instructions with `--show-goto-functions`, then part
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the goto-functions documentation rather than just in symex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume (hope) it is. Maybe here we should have a cross reference to that part of the goto-functions documentation.

^SIGNAL=0$
--
--
The test is a minimized version of code generated from s2n-quic using Kani.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More tests are always good but why is this in a documentation PR? I guess I have missed something...

@@ -215,14 +215,6 @@ void symex_assignt::assign_non_struct_symbol(
assignment.rhs,
target,
symex_config.allow_pointer_unsoundness);
// Erase the composite symbol from our working state. Note that we need to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could well be a good justification for this removal of code but please could they not be in a documentation PR?

@NlightNFotis
Copy link
Contributor Author

NlightNFotis commented Jan 24, 2023

Hi @martin-cs , this was branched of at an earlier point from the same branch as the other documentation PR (#7470), and it hasn't tracked the changes that addressed the comments on the other PR, so it appears like it has the same problems, but it's just being stale.

This means that the earlier commits here are just adding noise, so I will focus on addressing everything on the other PR, get it merged, and then rebase this one on top of develop, for a cleaner, less stale history. I will then address the comments that remain relevant after the rebase and ask you for a re-review.

@@ -64,9 +71,10 @@ To be documented.

To be documented.

## Goto functions &rarr; BMC &rarr; Counterexample (trace) ##
## Goto functions &rarr; BMC &rarr; Counterexample (trace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Doesn't seem to match the other lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this as a drive-by clean up. As far as I know headings in markdown are only prepended by a ##.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second visit, it appears that doxygen does support any number of # after the title, but that's invalid markdown.

If this change offends in any capacity however, it's easy to be reverted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this file has that issue multiple times. Can you please clean up all occurrences? (src/goto-instrument/README.md also suffers from this problem. Maybe do one commit that does the cleanup for all of these?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean fair enough. I was thinking maybe there should be a separate PR that does this clean up. Or, at least, maybe consistently clean them all up in this file as there are many line this (as Michael as noticed). You want to leave the file in a consistent state so that newcomers will follow the example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make trailing # consistent by removing them in the files touched by this PR.

Another important data structure that needs to be discussed at this point is
`source_locationt`.

This is an `irept`. `irep`s are the central data structure that model most entities inside
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

purposes:

```c
enum goto_program_instruction_typet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw in some "..."s here to emphasize that this isn't the full set.

enum goto_program_instruction_typet
{
GOTO = 1, // branch, possibly guarded
ASSUME = 2, // non-failing guarded self loop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps not the worlds most obvious description of an assumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is that it's a copy from goto_program_instruction_typet where the comment of ASSUME states what was written.

I changed it to ASSUME to avoid confusion.

```c
int main(int argc, char **argv)
{
int arry[] = {0, 1, 2, 3};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this point you may as well just say "array", or go the whole way and call it "a". "arry" offers neither brevity nor clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a and improved assertion comment

}
```

If we ask CBMC to print the GOTO-program instructions with `--show-goto-functions`, then part
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume (hope) it is. Maybe here we should have a cross reference to that part of the goto-functions documentation.

dispatching functions are usually in their own file, designated by the instruction's
name. As an example, you can find the code for the function goto_symext::symex_goto
in [symex_goto.cpp](../../src/goto-symex/symex_goto.cpp)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A description of what symex produces, why, and what it means could use a longer explanation.

@@ -74,8 +74,8 @@ void multi_path_symex_only_checkert::generate_equation()
{
const auto symex_start = std::chrono::steady_clock::now();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems like it should be a couple of PRs about different things. This doesn't feel like it belongs in a documentation PR.

@NlightNFotis NlightNFotis force-pushed the document_symex_instructions branch from 1ce9b87 to 1eb26aa Compare February 9, 2023 14:47
@NlightNFotis NlightNFotis assigned esteffin and unassigned NlightNFotis Feb 9, 2023
@codecov
Copy link

codecov bot commented Feb 9, 2023

Codecov Report

Base: 78.49% // Head: 78.49% // No change to project coverage 👍

Coverage data is based on head (1eb26aa) compared to base (eb883a6).
Patch has no changes to coverable lines.

❗ Current head 1eb26aa differs from pull request most recent head a4464fb. Consider uploading reports for the commit a4464fb to get more accurate results

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7497   +/-   ##
========================================
  Coverage    78.49%   78.49%           
========================================
  Files         1663     1663           
  Lines       191339   191339           
========================================
  Hits        150185   150185           
  Misses       41154    41154           
Impacted Files Coverage Δ
src/analyses/ai.h 64.28% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -64,9 +71,10 @@ To be documented.

To be documented.

## Goto functions &rarr; BMC &rarr; Counterexample (trace) ##
## Goto functions &rarr; BMC &rarr; Counterexample (trace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this file has that issue multiple times. Can you please clean up all occurrences? (src/goto-instrument/README.md also suffers from this problem. Maybe do one commit that does the cleanup for all of these?)

Comment on lines 22 to 24
Symex is, at its core, a GOTO-program interpreter. While Symex is interpreting
the program, it's also building a list of SSA steps that form part of the equation
that is to be sent to the solver.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to refer the reader to src/goto-symex/README.md?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added

Comment on lines +84 to +90
SET RETURN VALUE side_effect statement="nondet" is_nondet_nullable="1"
// 5 file /tmp/example.c line 5 function main
END_FUNCTION
```

In the above excerpt, the capitalised words at the beginning each instruction are the
correspondent instruction types (`DECL`, `ASSIGN`, etc).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the above really belongs to goto programs, not so much to symex, does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in this file we are describing the Instruction type that is exactly what we are showing.
In my opinion it is useful to show how the goto-program looks like.

@esteffin
Copy link
Contributor

@martin-cs Can you please have a look and confirm your concerns have been addressed?

@TGWDB
Copy link
Contributor

TGWDB commented Feb 27, 2023

@martin-cs Can you please act on this soon? It's been a week since the last request for review and more than that since various fixes were pushed. Note that we plan to force merge this in about 24 hours unless there is some new specific concern. We can make further changes in follow up tickets if necessary, but we'd like to merge this and progress ASAP.

@peterschrammel peterschrammel merged commit 796fb38 into diffblue:develop Feb 28, 2023
@NlightNFotis NlightNFotis deleted the document_symex_instructions branch February 28, 2023 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

8 participants