-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor and property test callbacks; log replicas on timeout #4030
base: trunk
Are you sure you want to change the base?
Conversation
- Includes rewrite of Ideal CL metrics calculation Patch by Josh McKenzie; reviewed by Jon Meredith and David Capwell for CASSANDRA-XXXXXX
public void checkStillAppliesTo(ClusterMetadata newMetadata) | ||
{ | ||
stillAppliesTo(newMetadata); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not rename the stillAppliesTo
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this came in with TCM and I'm not sure the provenance, but we have:
public boolean stillAppliesTo(...)
which is overridden in some child classes to actually return false in cases where it doesn't apply vs. the base that throws, and then we have usage of the method that takes the form:
if (stillAppliesTo(thing)) // looks like vestigial noop
return;
} // end of method
at the tail end of a method to throw if things have shifted.
My real beef here was that we had a method signature that looked like a noop from the outside because the semantics of the base version of the call are basically "return true or throw". I was going for a lightweight wrapper to shift the verb from "hey, I expect to get a boolean back", to "hey, void return. You either work or you barf."
In theory I could go deeper on the surgery to get the ReplicaPlan hierarchy away from overloading the usage patterns of this base method but... meh. Figured this semantic sugar verb wrapper would be enough to make me stop being sad.
* TODO: this method is brittle for its purpose of deciding when we should fail a query; | ||
* this needs to be aware of which nodes are live/down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this TODO carried through in the refactor or should there be a JIRA here that tracks the work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Carried through in the refactor. Probably should be a JIRA but I don't quite know the original context. Looks like it was Benedict on CASSANDRA-14404 but the method's been touched twice since then and then we have TCM in the mix, so not sure if it still applies.
} | ||
|
||
/** | ||
* TODO: Add {@code Gen<BigInteger>} support into {@link AbstractTypeGenerators#PRIMITIVE_TYPE_DATA_GENS} and call that, similar to {@link #nextUUID} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; added a note to make a follow up JIRA to move both nextBigInteger
and nextBigDecimal
into AbstractTypeGenerators.java
.
ByteBuffer bb = ByteBuffer.wrap(new byte[numBytes]); | ||
for (int i = 0; i < numBytes / 4; i++) | ||
bb.putInt(nextInt()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
org.apache.cassandra.index.sai.SAITester.Randomization#nextBigInteger(int)
is interesting, passing the random source into BigInteger
to generate. It doesn't look like nextBoundedBigInteger
that calls this is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not here; that's used on 2 other patches that I'm working to upstream separately so it's a shared code artifact between them. 😇
I like the simplicity of the SAI impl but this impl lets us lean on RandomGenerator
instead of Random
which is where the ability to lean on the better period length and distribution of MersenneTwister
comes in. Is it going to realistically make a big difference here? Almost certainly not. Did it let me sleep better at night when writing fuzz testing for paging across tombstones and I originally authored it?
Yes. Yes it did.
I'm receptive to the argument that we should K.I.S.S. on this one, but otherwise I'm fine leaving as is if you are.
{ | ||
case PAXOS: | ||
minRF = 2; | ||
testedCL = EnumSet.of(ConsistencyLevel.SERIAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't overwriting the class level testedCLs mean the tested CLs are dependent on the order runTestMatrix
is called if it isn't reset for each type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I believe you're right. Think we need to set that back in the NORMAL and default case to the default value of:
private EnumSet<ConsistencyLevel> defaultCLTests = EnumSet.of(ConsistencyLevel.ALL, ConsistencyLevel.QUORUM, ConsistencyLevel.ONE);
private EnumSet<ConsistencyLevel> testedCL = EnumSet.of(ConsistencyLevel.ALL, ConsistencyLevel.QUORUM, ConsistencyLevel.ONE);
I originally wrote these tests as 1 property test per file but then realized I could easily co-locate them all in the same file, but that certainly shifted the implications in terms of durable static state now didn't it? 🤔
{ | ||
runTestMatrix(CallbackType.EACH_QUORUM, () -> { | ||
debugLog("---------------- CYCLE. rf: {}. nd1: {}. nd2: {}. quorum: {}", rf, dc1NodesDown, dc2NodesDown, quorum); | ||
EachQuorumResponseHandler callback = new EachQuorumResponseHandler(eachQuorumPlan, null, WriteType.SIMPLE, null, requestTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the test with coverage on and noticed the ideal CL stuff wasn't being excercised, so I tried adding
callback.trackIdealCL(writeReplicaPlan.withConsistencyLevel(ConsistencyLevel.ALL));
and found a counterexample
java.lang.AssertionError: Property falsified after 7 example(s)
Smallest found falsifying value(s) :-
{rf: 6, cl: ONE, nodesDown: 0}
Cause was :-
java.lang.AssertionError: Expected successful callback resolution but saw exception.
at org.junit.Assert.fail(Assert.java:88)
at org.apache.cassandra.net.CallbackPropertyTest.validateExceptionState(CallbackPropertyTest.java:773)
I'm assuming setting an ideal CL should never fail the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at this; I added the idealCL stuff last week (and by that I mean I rewrote it) and these property tests predated that change. I didn't think to update the property tests to include that aspect as well (given it's exercised at least nominally in WriteResponseHandlerTest
); I'm glad you thought to check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think this is because in the runMultiDCTest
case, we don't actually initialize writeReplicaPlan
but instead relied on eachQuorumPlan
and localQuorumPlan
. This is an artifact of the previous hierarchy of having an abstract base that each individual CL-based property test extended from and needing that context constructed and stored in the base to be available on each test invocation. I've tweaked it so all the write side tests regardless of CL use writeReplicaPlan
, which will then not be null and then not NPE on the CallbackResponseTracker#buildDCMap
call. Prelim "set idealCL and run this" is passing fine on both testEachQuorum
and testLocalQuorum
locally.
Open question as to whether we think we should gracefully allow passing in a null ReplicaPlan.ForWrite
as an argument to that method; it's kind of nonsensical to say "build a DC map for me of this Non-Thing".
Looking some more at coverage testing for the It currently doesn't exercise a few aspects:
|
I'm pro adding all of the above w/the following context:
|
Patch by Josh McKenzie; reviewed by Jon Meredith and David Capwell for CASSANDRA-20505