Huge SMT file and slow proof for simple array function #8617
Comments
The reproducible test case is here: Using CBMC 6.5.0, go there and do
to generate the GOTO file. Then
to generate SMT - warning - this takes about 30 minutes, and fills 1.5GB of disk. |
FWIW - I re-wrote this example in SPARK just to make sure it really does verify. The entire analysis and proof run succeeds and takes 2 seconds using 1 CPU core on my Mac... |
It seems that the assigns-clause/write-set tracking is causing very large array update expressions. We need to review why we are using |
It turns out the
diff --git a/issues/8617/poly.h b/issues/8617/poly.h
index b65bbbb..8070d99 100644
--- a/issues/8617/poly.h
+++ b/issues/8617/poly.h
@@ -25,8 +25,10 @@ __contract__(
requires(memory_no_alias(a0, sizeof(poly)))
requires(memory_no_alias(a, sizeof(poly)))
requires(array_bound(a->coeffs, 0, MLDSA_N, 0, MLDSA_Q))
- assigns(memory_slice(a1, sizeof(poly)))
- assigns(memory_slice(a0, sizeof(poly)))
+ // assigns(memory_slice(a1, sizeof(poly)))
+ // assigns(memory_slice(a0, sizeof(poly)))
+ assigns(object_whole(a1))
+ assigns(object_whole(a0))
ensures(array_bound(a1->coeffs, 0, MLDSA_N, 0, (MLDSA_Q-1)/(2*MLDSA_GAMMA2)))
ensures(array_abs_bound(a0->coeffs, 0, MLDSA_N, MLDSA_GAMMA2+1))
);
diff --git a/issues/8617/polyvec.c b/issues/8617/polyvec.c
index 530743b..9bf7f7a 100644
--- a/issues/8617/polyvec.c
+++ b/issues/8617/polyvec.c
@@ -12,13 +12,20 @@ void polyveck_decompose(polyveck *v1, polyveck *v0, const polyveck *v)
for (i = 0; i < MLDSA_K; ++i)
__loop__(
- assigns(i, memory_slice(v0, sizeof(polyveck)), memory_slice(v1, sizeof(polyveck)))
+ // assigns(i, memory_slice(v0, sizeof(polyveck)), memory_slice(v1, sizeof(polyveck)))
+ assigns(i, object_whole(v0), object_whole(v1))
invariant(i <= MLDSA_K)
invariant(forall(k1, 0, i,
array_bound(v1->vec[k1].coeffs, 0, MLDSA_N, 0, (MLDSA_Q-1)/(2*MLDSA_GAMMA2)) &&
array_abs_bound(v0->vec[k1].coeffs, 0, MLDSA_N, MLDSA_GAMMA2+1)))
)
{
- poly_decompose(&v1->vec[i], &v0->vec[i], &v->vec[i]);
+ poly c1, c0;
+ c1 = v1->vec[i];
+ c0 = v0->vec[i];
+ // poly_decompose(&v1->vec[i], &v0->vec[i], &v->vec[i]);
+ poly_decompose(&c1, &c0, &v->vec[i]);
+ v1->vec[i] = c1;
+ v0->vec[i] = c0;
}
} |
Thanks. I probably need to apply the same to several other functions. What are the basic rules in play here? e.g. "Don't do X, but do Y"? |
Why are the first two assignments to c0 and c1 required when poly_decompose() re-initializes those objects entirely? |
I confirm your patch passes all our test cases, and the proof is fast. I also find that removing the first two assignments is OK. The whole-struct assignments are large (1024 bytes each, I think), so we're adding copying of MLDSA_K * 2048 bytes here, which is not good for performance. |
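To make the copying cost concrete, here is a minimal sketch of the size arithmetic. It assumes `poly` holds `MLDSA_N = 256` coefficients of type `int32_t` and picks `MLDSA_K = 4` purely for illustration; the type definitions and helper names are stand-ins, not the project's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define MLDSA_N 256
#define MLDSA_K 4

/* Stand-in types mirroring the shapes used in the thread. */
typedef struct { int32_t coeffs[MLDSA_N]; } poly;
typedef struct { poly vec[MLDSA_K]; } polyveck;

/* Bytes copied per loop iteration by the two write-back assignments
   (v1->vec[i] = c1; v0->vec[i] = c0;) once the copy-in is dropped. */
size_t writeback_bytes_per_iteration(void)
{
  /* The struct is expected to be exactly its coefficient array. */
  assert(sizeof(poly) == MLDSA_N * sizeof(int32_t));
  return 2 * sizeof(poly);
}

/* Total bytes copied over all MLDSA_K iterations of the loop. */
size_t writeback_bytes_total(void)
{
  return (size_t)MLDSA_K * writeback_bytes_per_iteration();
}
```

Under these assumptions each `poly` is 1024 bytes, so the write-back alone costs 2048 bytes per iteration, matching the MLDSA_K * 2048 estimate above. Keeping the two copy-in assignments would double that.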
@rod-chapman to avoid the static explosion of the SMT formula we replace With this new contract, So in the context of this loop, for (i = 0; i < MLDSA_K; ++i)
__loop__(
assigns(i, object_whole(v0), object_whole(v1))
invariant(i <= MLDSA_K)
invariant(forall(k1, 0, i,
array_bound(v1->vec[k1].coeffs, 0, MLDSA_N, 0, (MLDSA_Q-1)/(2*MLDSA_GAMMA2)) &&
array_abs_bound(v0->vec[k1].coeffs, 0, MLDSA_N, MLDSA_GAMMA2+1)))
)
{
poly_decompose(&v1->vec[i], &v0->vec[i], &v->vec[i]);
} To be able to both use for (i = 0; i < MLDSA_K; ++i)
__loop__(
assigns(i, object_whole(v0), object_whole(v1))
invariant(i <= MLDSA_K)
invariant(forall(k1, 0, i,
array_bound(v1->vec[k1].coeffs, 0, MLDSA_N, 0, (MLDSA_Q-1)/(2*MLDSA_GAMMA2)) &&
array_abs_bound(v0->vec[k1].coeffs, 0, MLDSA_N, MLDSA_GAMMA2+1)))
)
{
poly c1, c0;
c1 = v1->vec[i];
c0 = v0->vec[i];
poly_decompose(&c1, &c0, &v->vec[i]);
v1->vec[i] = c1;
v0->vec[i] = c0;
} But that's just a temporary workaround. |
It would be great if call-sites of function contracts would interpret |
That's the obvious interpretation. Why isn't this the default behaviour? |
I looked back at the discussion on PR#8603, when (March 4th) Remi wrote: "We use __CPROVER_object_whole(ptr) internally when inferring side effects of loops and need to widen the loop footprint to a sound superset. For instance when a loop touches both i and arr[i], and arr is itself embedded in some aggregate, and we don't know anything about 'i', we just widen the footprint of the loop to __CPROVER_object_whole(arr) to havoc the whole underlying object." This strikes me as slightly unusual. You say "we don't know anything about i", but we do know about i - in particular, we're going to prove that |
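For readers following along, the widening described in that quote can be pictured by writing the widened footprint out by hand as a loop contract. This is only a sketch: `holder`, `fill`, and the `LOOP_*` macros are hypothetical, and the macros assume the proof build passes `-DCBMC` so that an ordinary compiler sees plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEN 8

/* Hypothetical aggregate: an array embedded next to another field. */
typedef struct { uint32_t tag; uint8_t arr[LEN]; } holder;

/* Loop-contract clauses exist only in a proof build (assumed -DCBMC). */
#ifdef CBMC
#define LOOP_ASSIGNS(...) __CPROVER_assigns(__VA_ARGS__)
#define LOOP_INVARIANT(...) __CPROVER_loop_invariant(__VA_ARGS__)
#else
#define LOOP_ASSIGNS(...)
#define LOOP_INVARIANT(...)
#endif

void fill(holder *h, uint8_t value)
{
  size_t i;
  for (i = 0; i < LEN; i++)
  /* Widened footprint: the whole object that h->arr lives in, so the
     proof treats h->tag as havoced too unless an invariant restates it. */
  LOOP_ASSIGNS(i, __CPROVER_object_whole(h->arr))
  LOOP_INVARIANT(i <= LEN)
  {
    h->arr[i] = value; /* at run time only arr[0..LEN-1] is written */
  }
}
```

The gap between what the loop writes at run time (just `arr`) and what the widened clause lets the proof assume has changed (the whole `holder`) is the imprecision being questioned here.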
A parameter with type |
I think the discussion (and example) in #8570 is relevant here: that's a case where (the equivalent of) |
That's true, the signature alone is not enough, but the precondition should be? The caller should not havoc more than what the callee could, based on its preconditions, have legitimately accessed. But I can imagine that making this precise is not as easy as it may seem at first... |
If the function signature says "int *a" then fair enough, but you really should have an Is_Fresh() precondition telling you how much data "a" is pointing at, right? BUT... in almost all our crypto code, we have statically constrained array parameters, where the formal parameter type is something like Then surely an assignment |
You can already express that using As far as havocing goes in CBMC, we have the following primitives/mechanisms:
The first two are super efficient; the last one expands to different things depending on the exact backend we use and the type of the pointer, whether the object size and the slice sizes are known or symbolic, whether the size is a multiple of the object size in case ptr is an array of structs, etc. It can sometimes trigger the array theory, etc., so we're really trying to avoid using it whenever possible.
Yes, that's something we can do by leveraging type information at the time of instrumentation, but it would still result in a havoc_slice operation which can unexpectedly blow up in size, as you have witnessed. Back then we thought that since havocing the whole object is sound and does not blow up, it was the best default. |
Having spent more time collecting the relevant sections from the C standard, I believe I'll have to take back the above claim: the example from #8570 does not seem to be well-defined, for it is not just multi-dimensional arrays but an array-of-structs where the struct members are arrays. As such, we have an aggregate object, which implies that pointer comparison (C11, 6.5.8 Relational operators) is well defined, but pointer addition/subtraction (6.5.6 Additive operators) beyond individual members of the aggregate object is not. This, however, CBMC does not currently enforce, as all bounds-related checks are tied to the aggregate object, not individual members of aggregate objects. |
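A small sketch of that distinction, using a hypothetical `pair` type (an array of structs whose members are arrays): indexing each member within its own bounds and comparing pointers to members of the same struct are fine, whereas stepping a pointer from one member array into the next is the case the standard does not define.

```c
#include <assert.h>
#include <stddef.h>

#define LEN 4
#define COUNT 3

/* Hypothetical struct whose members are arrays. */
typedef struct { int lo[LEN]; int hi[LEN]; } pair;

/* Well-defined: every index stays inside its own member array. */
int sum_member_wise(const pair *p, size_t count)
{
  int sum = 0;
  for (size_t k = 0; k < count; k++)
    for (size_t j = 0; j < LEN; j++)
      sum += p[k].lo[j] + p[k].hi[j];
  return sum;
}

/* Well-defined (6.5.8): both pointers refer to members of the same
   struct object, and a later member compares greater. */
int lo_precedes_hi(const pair *p)
{
  return &p->lo < &p->hi;
}

/* Not well-defined under the reading above (6.5.6), so left as a comment:
     const int *q = p[0].lo;
     q[LEN]      -- reads past the end of lo[], even though the address
                    still lies inside the enclosing pair object
*/
```

With bounds checks tied to the aggregate object, an out-of-member access like the commented one would stay inside the object and therefore not be flagged, which is the gap described above.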
Hey sorry for the late realization but instead of using assigns(memory_slice(a1, sizeof(polyvec)))
assigns(memory_slice(a0, sizeof(polyvec))) you should really just use assigns(*a0, *a1) which directly compiles to the most optimal thing we can do (a single nondet assignment). |
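As a rough illustration of the `assigns(*a0, *a1)` form, here is a sketch written against CBMC's underlying `__CPROVER_*` clauses rather than the project's `__contract__` macros. The function `split_halves`, the size `N`, and the `REQUIRES`/`ASSIGNS` wrappers are hypothetical, and the wrappers assume the proof build passes `-DCBMC` so the file still compiles as plain C otherwise.

```c
#include <assert.h>
#include <stdint.h>

#define N 8 /* hypothetical small size, not MLDSA_N */

typedef struct { int32_t coeffs[N]; } poly;

/* Contract clauses exist only in a proof build (assumed -DCBMC). */
#ifdef CBMC
#define REQUIRES(...) __CPROVER_requires(__VA_ARGS__)
#define ASSIGNS(...) __CPROVER_assigns(__VA_ARGS__)
#else
#define REQUIRES(...)
#define ASSIGNS(...)
#endif

/* Hypothetical stand-in for poly_decompose: splits each coefficient
   into a high part (a1) and a low part (a0). */
void split_halves(poly *a1, poly *a0, const poly *a)
REQUIRES(__CPROVER_is_fresh(a1, sizeof(poly)))
REQUIRES(__CPROVER_is_fresh(a0, sizeof(poly)))
REQUIRES(__CPROVER_is_fresh(a, sizeof(poly)))
/* Whole-struct targets: each names one typed lvalue, not a byte slice. */
ASSIGNS(*a1, *a0)
{
  for (int i = 0; i < N; i++)
  {
    a1->coeffs[i] = a->coeffs[i] / 16;
    a0->coeffs[i] = a->coeffs[i] % 16;
  }
}
```

A real proof of this function would also need a loop contract or an unwinding bound for the inner loop; that part is omitted here to keep the focus on the assigns clause.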
@remi-delmas-3000 Interesting, thank you! What if a |
We did a deep dive on that issue today and even the single assignment causes some blowup, but luckily @tautschnig identified a way to propagate more precise information about pointer offsets and their alignment that should enable more static simplifications and prevent the explosion, we’ll keep you posted. |
Resolves #117 Unrolling resulted in too poor performance. The direct proof works, but it requires to workaround the CBMC limitation described in: #102 diffblue/cbmc#8617 Signed-off-by: Matthias J. Kannwischer <[email protected]>
Resolves #127 Unrolling resulted in too poor performance. The direct proof works, but it requires to workaround the CBMC limitation described in: #102 diffblue/cbmc#8617 This also moves the correctness post-condition in poly_sub into a assert at the end of the function. This vastly improves performance. Signed-off-by: Matthias J. Kannwischer <[email protected]>
Resolves #135 Unrolling resulted in too poor performance. The direct proof works, but it requires to workaround the CBMC limitation described in: #102 diffblue/cbmc#8617 Signed-off-by: Matthias J. Kannwischer <[email protected]>
I am experiencing a huge (1.5GB) SMT file resulting from an attempt to verify what appears to be a very simple function. I will add a link below to a simple reproducer.
This is drawn from the mldsa-native project.
This issue is currently blocking progress on mldsa-native and mlkem-native, so high priority for me.