-
Notifications
You must be signed in to change notification settings - Fork 110
Detect accidentally dropped errors in tests #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
One more patch coming up to fix the remaining dropped error in |
|
@DerGuteMoritz Sorry to say this, but I no longer remember Manifold well enough to review these, I think. Nor do I really have the time atm. I would suggest soliciting some helpers from the #manifold slack channel. |
| [f handle-dropped-errors] | ||
| (assert (nil? dropped-errors) "with-dropped-error-detection may not be nested") | ||
| ;; Flush out any pending dropped errors from before | ||
| (System/gc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told before that System/gc is more of a suggestion to the VM than an actual call
Did you observe this working reliably?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, there is no guarantee that System/gc actually triggers a GC, let alone whether it reclaims all garbage. Quoting from the docs
Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for reuse by the Java Virtual Machine. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all unused objects. There is no guarantee that this effort will recycle any particular number of unused objects, reclaim any particular amount of space, or complete at any particular time, if at all, before the method returns or ever. There is also no guarantee that this effort will determine the change of reachability in any particular number of objects, or that any particular number of Reference objects will be cleared and enqueued.
However, it does work pretty well in practice, see for example the failing tests for this PR which uses OpenJDK 8. Locally, I've also successfully used it with OpenJDK 21. So yeah, it's not deterministic but still quite useful as evidenced by the issues it uncovered 🙂
Manifold has a debug mechanism which instruments deferreds so that a warning is logged when they are in an error state with no error handler attached at garbage collection time. Such errors are referred to as "dropped errors" and the instrumented deferreds are called "leak-aware". The dropped error logging feature is enabled by default but for performance reasons, only every 1024th deferred is made leak-aware (see `manifold.deferred/deferred`). However, deferreds constructed via `manifold.deferred/error-deferred` are always leak-aware. This patch introduces a new option `manifold.debug/*leak-aware-deferred-rate*` which allows changing the rate of leak-aware deferreds being instantiated, with default still being 1024. When this option is set to to 1 (= every deferred will be leak-aware), dropped errors can be detected more reliably e.g. during testing. To that end, the patch also includes a test util for instrumenting all tests in a namespace so that they fail when a dropped error was detected. All test namespaces are now instrumented like this (except for `:benchmark` and `:stress` tests), which already uncovered a few latent dropped errors. These will be fixed in separate patches.
Also, make test-alt a bit more reliable and make the exception messages distinct so that a dropped error can easily be matched up with the deferred.
Involving `d/future` makes these tests needlessly flaky. By decoupling the `d/alt` invocation from dereferencing its result, we can deterministically control the order of events.
The result deferred `d` may end up in an error state when `(f)` throws an exception or returns an error deferred. Now since the cancellation callback was only attached via `d/chain` for side-effect (i.e. the resulting deferred is not returned), the dropped error detection would be triggered in this case. The fix is simply to also attach an error handler with `d/on-realized`. Furthermore, putting the result deferred in an error state would not have cancelled the timer because there was no error handler attached to it. This is now also fixed and the tests are extended to cover the cancellation API, too. The latter uncovered an oversight in the mock clock implementation where the `in` method didn't return a cancellation function, resulting in an NPE. This is now also fixed.
This was a mistake, this metadata flag is supposed to go on the test *var* only.
d31871a to
cff7428
Compare
When putting the result deferred `d` in an error state, deferred returned by `chain` would be detected as a dropped error because it's only used to attach a side-effecting callback for cancelling the timeout. The fix then is to use `on-realized` instead to attach a handler for both cases. As a consequence, timeouts are now also properly cancelled when the result deferred is put in an error state, freeing up resources in a more timely fashion. This fixes one of the causes of clj-commons/aleph#766, as well as 2 of the 3 dropped errors in `manifold.go-off-test`. Also, add dedicated tests for `d/timeout!`.
When `in` is closed and `out` is also closed by the time the chained handler runs, closing `s'`
would fail because it's `::none`. This non-deterministic case was occasionally observed with
`manifold.stream-test/test-cleanup` failing due to a leaked error deferred like so:
WARNING: unconsumed deferred in error state, make sure you're using `catch`.
java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to manifold.stream.core.IEventStream
at manifold.stream$concat$this__3189__auto____14958$fn__14959$fn__14960.invoke(stream.clj:819)
at manifold.deferred$eval3018$chain_SINGLEQUOTE____3039.invoke(deferred.clj:904)
at manifold.stream$concat$this__3189__auto____14958$fn__14959.invoke(stream.clj:816)
at manifold.stream$concat$this__3189__auto____14958.invoke(stream.clj:815)
at clojure.lang.AFn.applyToHelper(AFn.java:154)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.core$apply.invokeStatic(core.clj:669)
at clojure.core$apply.invoke(core.clj:662)
at manifold.stream$concat$this__3189__auto____14958$fn__14984.invoke(stream.clj:815)
Timeout tests are changed so that every timeout exception has a uniquely identifiable message which makes it easier to match up the dropped error warning with the originating timeout.
This saves a bit of overhead relative to `d/chain` + `d/catch`.
Manifold has a debug mechanism which instruments deferreds so that a warning is logged when they are in an error state with no error handler attached at garbage collection time. Such errors are referred to as "dropped errors" and the instrumented deferreds are called "leak-aware".
The dropped error logging feature is enabled by default but for performance reasons, only every 1024th deferred is made leak-aware (see
manifold.deferred/deferred). However, deferreds constructed viamanifold.deferred/error-deferredare always leak-aware.This patch introduces a new option
manifold.debug/*leak-aware-deferred-rate*which allows changing the rate of leak-aware deferreds being instantiated, with default still being 1024. When this option is set to to 1 (= every deferred will be leak-aware), dropped errors can be detected more reliably e.g. during testing.To that end, the patch also includes a test util for instrumenting all tests in a namespace so that they fail when a dropped error was detected. All test namespaces are now instrumented like this (except for
:benchmarkand:stresstests), which already uncovered a few latent dropped errors. These will be fixed in separate patches.This was prompted by investigating the causes of clj-commons/aleph#766. One of these was found to originate in Manifold itself which will be addressed in a follow-up PR.
This PR is equivalent to #243 but the source branch lives in this here repository which allows us to create stacked PRs with the fixes for it.