-
Notifications
You must be signed in to change notification settings - Fork 41
Disable event history sanity check in test #2520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Disable event history sanity check in test #2520
Conversation
As the validator backend is being stopped in this test, the event history sync is expected to not be in sync with other SVs Signed-off-by: Divam <[email protected]>
It seems that CI had some problem, a re-run might fix it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logs pointed respectively:
Simtime: Test failed: 'SvTimeBasedRoundMgmtIntegrationTest/round management', message: forAll failed, because...
Wallclock: Test failed: 'SvReconcileSynchronizerConfigIntegrationTest/SV automation reconcile amulet config change to domain parameter', message: forAll failed, because...
This fix might still make sense, but it's not the one we stumbled upon, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in wallclock this test was also run, and the failure was reported for tests which ran together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I'm saying is that the failure happened, for wallclock tests, in SvReconcileSynchronizerConfigIntegrationTest
as this log indicates:
│2025-10-02T13:23:23.288Z [⋮] WARN - c.d.c.LogReporter:reporter=scala-test (---) - Test failed: '***SvReconcileSynchronizerConfigIntegrationTest***/SV automation reconcile amulet config change to ┤
domain parameter', message: forAll failed, because: │
│ at index 0, Vector((EventHistoryItem
Furthermore, the test SvDevNetReonboardingIntegrationTest.scala
(which you've disabled here), which was scheduled to run afterwards, does not even manage to run because the plugin failed before it:
│2025-10-02T13:23:24.133Z [⋮] ERROR - o.l.s.i.t.***SvDevNetReonboardingIntegrationTest***:SvDevNetReonboardingIntegrationTest (20e07fc75471f6eb977531fc84ce3f87-newSpan--d5cbdb014e5d3ee8) - Failed ┤
creating environment: Creating fixture (EnvironmentSetup.scala:154) java.io.UncheckedIOException: Could not create Prometheus HTTP server
The error "Could not create Prometheus HTTP server" happens for silly reasons when the plugin fails, as it did above.
The point is, the flake happened in SvReconcileSynchronizerConfigIntegrationTest
, not in SvDevNetReonboardingIntegrationTest
, which didn't manage to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the logs again, the SvDevNetReonboardingIntegrationTest
did ran before the failure happened.
2025-10-02T13:14:30.5404944Z [info] Test still running after 1 minute, 9 seconds: suite name: SvDevNetReonboardingIntegrationTest, test name: Reonboarding an SV
with the same name removes the old SV from PartyToParticipant and the Decentralized Namespace.
2025-10-02T13:23:00.2196524Z [info] Test still running after 9 minutes, 39 seconds: suite name: SvDevNetReonboardingIntegrationTest, test name: Reonboarding an
SV with the same name removes the old SV from PartyToParticipant and the Decentralized Namespace.
Though I agree that this fix is not sufficient to avoid the failure happening due to the plugin.
If a test which disables the EventHistorySanityCheckPlugin
is run along with another test which does not disable it, then the failure could still happen in the test which did not disable the plugin, since the underlying DB used by the tests is the same in both. (AFAIK the DB is not reset when running different tests).
So the correct fix would be to run the tests requiring EventHistorySanityCheckPlugin
to be disabled, separate from other tests. Perhaps the tests with the plugin disabled should be run in a separate bucket? (as done in test-full-class-names-*.log
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(let me number things so I don't get lost 😄 )
- ah, I see where the confusion is 😃 the test shows up as "running" according to the
Test still running after
log, but there's also this:
│2025-10-02T13:23:23.285Z [⋮] DEBUG - c.d.c.i.ConcurrentEnvironmentLimiter$ (---) - org.lfdecentralizedtrust.splice.integration.tests.SvDevNetReonboardingIntegrationTest: Queued => Running after 606s (3 pending)
and
│2025-10-02T13:23:24.131Z [⋮] DEBUG - o.l.s.i.t.SvDevNetReonboardingIntegrationTest:SvDevNetReonboardingIntegrationTest (20e07fc75471f6eb977531fc84ce3f87-newSpan--d5cbdb014e5d3ee8) - Starting creating environment for SvDevNetReonboardingIntegrationTest, test 'Reonboarding an SV with the same name removes the old SV from PartyToParticipant and the Decentralized Namespace':
so it's not actually doing any work until 13:23:24.131Z
... sorry I didn't mention/catch that earlier 🤦
- As for running the tests using the plugin... we have the same problem in our other 2 sanity check plugins. So far when checking for test failures we've accepted "Prometheus failure = some other test failed before, likely in a plugin". I'm not too worried about that behavior (as annoying as it is), but of course we still want to figure out how it's flaking, because it could be a bug.
- I'm happy to merge this PR because it still fixes stuff, but
SvTimeBasedRoundMgmtIntegrationTest
andSvReconcileSynchronizerConfigIntegrationTest
might flake again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, how did you see these DEBUG messages?
Given this, the diff of the events reported by the plugin, after running the
SvReconcileSynchronizerConfigIntegrationTest
, could be due to some real problem in synchronizer's workings / sync of events logic.
The SV4 synchronizer emitted a reject verdict with record_time "1:39:36.000156Z" (having confirming_parties: [dso, sv4, digital-asset-eng-4]. This verdict was captured in SV1's (or SV2) scan-app, but was not part of the other SV's scan events.
Its hard to say what could be the cause of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, how did you see these DEBUG messages?
it's in the logs in the drive I shared with you, let me know if you don't have access / lost the link
You should also be able to find there by updateid (or similar) the verdict and updates as they happen across all 4 SVs
As the validator backend is being stopped in this test, the event history sync is expected to not be in sync with other SVs
Fixes #2513
Pull Request Checklist
Cluster Testing
/cluster_test
on this PR to request it, and ping someone with access to the DA-internal system to approve it./hdm_test
on this PR to request it, and ping someone with access to the DA-internal system to approve it.PR Guidelines
Fixes #n
, and mention issues worked on using#n
Merge Guidelines