Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation of Arrays_rel #1268

Merged
merged 3 commits into from
Jan 27, 2025
Merged

Conversation

Halbaroth
Copy link
Collaborator

While writing this documentation, I read the paper Generalized, Efficient Array Decision Procedures of de Moura and Bjorner.

Before writing this documentation, I believed that we only implement the basic
inferences rules, which is exposed in Fig 1 of the paper. These rules are:

  1. (idx) a == store(b, i, v) ===> a[i] = v
  2. (down) a == store(b, i, v), w == a'[j], a ~ a' ===> i = j \/ a[j] = b[j]
  3. (up) a == store(b, i, v), w = b'[j], b ~ b' ===> i = j \/ a[j] = b[j]
  4. (ext) a, b know ===> a = b \/ a[k] <> b[k] (k fresh name)

(I use the double equality symbol == to denote syntactical equality and ~ for the
congruence relation.)

In fact, we implement a slightly different version of these rules. The down
, up and ext inference rules are implemented by respectively get_of_set,
get_and_set and extensionality functions in Arrays_rel module.

The idx axiom is not implemented by a dedicated function. Instead, different
instantiations of this axiom are produced by get_of_set, get_and_set and
get_from_set:

1a. (idx a) a[i] known, a ~ store(b, j, v) ==> i <> j \/ a[i] = v
1b. (idx b) store(b, i, v), store(a, j, w) known, a ~ b ==> store(a, j, w)[j] = w.

Technically, our implementation does not send directly instantiated axioms to the
SAT solver. Consider for instance the axiom down and assume that all its hypotheses
are satisfied. We produce the deduction i = j \/ a[j] = b[j] but we do not send it
to the SAT solver. Instead, we add the equality i = j in the set env.split in
order to split on its negation later. If we split on it or we see i <> j in assume,
we send a[j] = b[j] to the SAT solver.

Our implementation suffers from a number of issues:

  1. We produce a lot of redundant axioms (see section A and section B of the paper).
    For instance, we do not need to propagate a[i] = v if we already know that
    a'[i'] = v' for with a ~ a', i ~ i' and v ~ v'. The same goes for other axioms.
  2. We store all the relevant terms in huge maps which is not efficient.
  3. We transform literal expressions into literal semantic values.

To be more efficient, our implementation only should follow equivalent classes
of read/write terms and relevant indices. When some of these classes merge, we
may have to instantiate axioms. This new version could be implemented using a union-find
in Arrays_rel itself but as I explained during the last dev meeting, this modification
will conflict with our plan for the union-find in UF (using Basile's store).

Copy link
Contributor

@hra687261 hra687261 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation (and the edits to the code), it makes the code more readable. I made some comments mainly on the form and the used language.

type stype = {s:E.t; st:E.t; si:E.t; sv:E.t}
module S :Set.S with type elt = stype = Set.Make
module S = Set.Make
(struct type t = stype let compare t1 t2 = E.compare t1.s t2.s end)

(* map t |-> {set(t,-,-)} qui associe a chaque tableau l'ensemble
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed translation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't translate it because this comment is useless as the meaning of the TBS module is explained in the documentation of the field tbset. I suggest to remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, there is another in line 49 that should also be removed

Comment on lines 97 to 98
gets : G.t;
(* Set of all the known reads. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentations talks about reads and writes (referring to select and store) but the code refers to them as sets and gets. I think it should be harmonized and only one notation should be used.

Maybe at some point we should simply switch to the standard and use select and store internally, but I think select is used in the ADT theory as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's a mess... There are three notations:

  • set/get from the code
  • select/store from the SMT-LIB standard,
  • read/write from many papers on Array theories.

In my opinion, we should use the SMT-LIB standard but, as you pointed out, select is already in use in the ADT theory. I suggest that we keep set/get in Array and harmonize notations later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a new commit to address your comments. Is it better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! LGTM


split : LRset.t;
(* Set of equalities or disequalities on relevant indices.
There are the hypotheses of consequences in the field [conseq].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There are the hypotheses of consequences in the field [conseq].
These are the hypotheses of consequences in the field [conseq].

Comment on lines 129 to 132
(* size_splits : Numbers.Q.t; *)
(* Limit the number of case splits performed on indices. This field
is currently unused. *)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it can be removed

`gi <> si` at the same time. *)
assert false

(* Produce a formula to propagate a read over a store.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the comment above about notation, we should settle one notation and not mix multiple ones (in comments use read and write, in the code we use get and set, and here we use store as well)


(* Assume that [stype] represents the write `(store b j v)`.
For all known writes of the form `(store a i w)` such that `a` and
`b` are equal, we form the formula:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this is more of a preference really

Suggested change
`b` are equal, we form the formula:
`b` are equal, we create the formula:

let two = Numbers.Q.from_int 2

(* Choose a equality/disequality between relevant indices to split on it. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* Choose a equality/disequality between relevant indices to split on it. *)
(* Choose an equality/disequality between relevant indices to split on it. *)

Comment on lines 467 to 477
(* unused
let count_splits env la =
let nb =
List.fold_left
(fun nb (_,_,_,i) ->
match i with
| Th_util.CS (Th_util.Th_arrays, n) -> Numbers.Q.mult nb n
| _ -> nb
)env.size_splits la
in
{env with size_splits = nb}
) env.size_splits la
in
{env with size_splits = nb} *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To remove?

@Halbaroth Halbaroth force-pushed the documentation-arrays branch from e4397fa to 3807cee Compare January 27, 2025 10:53
@Halbaroth Halbaroth requested a review from hra687261 January 27, 2025 10:54
Copy link
Contributor

@hra687261 hra687261 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Other than the comment that needs to be remove.

and an index `i`.
- A relevant index is any term that appears as an index in get or set terms.
- A get or set term is known if it has been encountered in [assume]. *)

let src = Logs.Src.create ~doc:"Arrays_rel" __MODULE__
module Log = (val Logs.src_log src : Logs.LOG)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, GitHub doesn't allow comments on unchanged lines, but the comment in line 49 should be removed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this issue, it's really annoying!

@Halbaroth Halbaroth enabled auto-merge (squash) January 27, 2025 11:10
Comment on lines +38 to +44
(* In this module, we follow the below conventions in the comments:
- A get term refers to a term of the form `(select a i)` for an array `a` and
an index `i`.
- A set term refers to a term of the form `(store a i v)` for an array `a`
and an index `i`.
- A relevant index is any term that appears as an index in get or set terms.
- A get or set term is known if it has been encountered in [assume]. *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that it would make more sense to call these "select terms" and "store terms" in documentation, with a comment mentioning that the variable names are legacy (or also rename the variables).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the other discussion — ignore this comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've reviewed the PR, I need your approbation to merge it now :D

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, go ahead

@Halbaroth
Copy link
Collaborator Author

Strangely the CI failed on Ubuntu targets. I remember there is a test that fails sometimes on MacOS because the MacOS runner is too slow but it didn't happen on Ubuntu too? I restarted these tasks.

@bclement-ocp
Copy link
Collaborator

Strangely the CI failed on Ubuntu targets. I remember there is a test that fails sometimes on MacOS because the MacOS runner is too slow but it didn't happen on Ubuntu too? I restarted these tasks.

It is (somewhat) the opposite -- that test is supposed to timeout, but sometimes the runner is fast and it doesn't (and fails with a reported reason of "incomplete" or "proof-search" instead of "timeout"), because our "reproducible-resource-limit" is related to time and not actually reproducible.

See #1277 for an attempt at a fix.

@Halbaroth Halbaroth merged commit 2f10948 into OCamlPro:next Jan 27, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants