
Cleanup 15mar25 #3986

Open

belliottsmith wants to merge 330 commits into cep-15-accord

Conversation

belliottsmith (Contributor)

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow this format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

belliottsmith and others added 30 commits September 19, 2024 15:09
Co-authored-by: Ariel Weisberg <[email protected]>
Co-authored-by: Benedict Elliott Smith <[email protected]>
Introduce a special kind of non-durable read that provides only per-key linearizable isolation; i.e. strict-serializable isolation for single partition-key reads.
This read creates only a happens-before edge, by collecting dependencies for execution and ensuring that execution happens strictly after these dependencies
have executed, but at no precise time otherwise. So later writes may be witnessed, and if multiple keys are read they may represent different points in time.

patch by Benedict; reviewed by Ariel Weisberg for CASSANDRA-19305
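
A self-contained sketch of the mechanism that commit describes; all names and types below are hypothetical stand-ins, not the Accord API:

```java
import java.util.Set;

// Hypothetical stand-in types; the real transaction/key/result types live in Accord.
interface TxnId {}
interface Key {}
interface Result {}

interface Coordinator
{
    Set<TxnId> collectDependencies(Key key); // transactions this read must be ordered after
    void awaitExecuted(Set<TxnId> deps);     // block until every dependency has executed
    Result readLocally(Key key);             // perform the read at any later moment
}

final class WeakRead
{
    // The only guarantee is the happens-before edge over deps: the read
    // executes strictly after them, but at no precise time otherwise, so
    // later writes may be witnessed and reads of multiple keys may reflect
    // different points in time.
    static Result read(Coordinator coordinator, Key key)
    {
        Set<TxnId> deps = coordinator.collectDependencies(key);
        coordinator.awaitExecuted(deps);
        return coordinator.readLocally(key);
    }
}
```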
…endency elision

patch by Benedict; reviewed by Aleksey Yeshchenko for CASSANDRA-19310
Patch by Blake Eggleston; Reviewed by Ariel Weisberg for CASSANDRA-19016
patch by David Capwell; reviewed by Benedict Elliott Smith for CASSANDRA-19355
…d uses the sync point empty txn for reads

patch by David Capwell; reviewed by Blake Eggleston for CASSANDRA-19503
…tdown causing the test to fail

patch by David Capwell; reviewed by Blake Eggleston for CASSANDRA-19514
Patch by Blake Eggleston; Reviewed by Ariel Weisberg and David Capwell for CASSANDRA-19472
patch by Caleb Rackliffe; reviewed by David Capwell for CASSANDRA-18987
patch by Benedict Elliott Smith, David Capwell; reviewed by Benedict Elliott Smith for CASSANDRA-19605
…ct issues earlier before they are seen in Cassandra

patch by Benedict Elliott Smith, David Capwell; reviewed by Benedict Elliott Smith, David Capwell for CASSANDRA-19618
patch by Aleksey Yeschenko; reviewed by Alex Petrov and Benedict Elliott Smith for CASSANDRA-18888
patch by David Capwell; reviewed by Blake Eggleston for CASSANDRA-19642
patch by Caleb Rackliffe; reviewed by David Capwell and Ariel Weisberg for CASSANDRA-18732
ifesdjeen and others added 16 commits March 4, 2025 16:44
Patch by Alex Petrov; reviewed by Benedict Elliott Smith for CASSANDRA-20393.
patch by Benedict Elliott Smith; reviewed by David Capwell for CASSANDRA-20420
…a NPE

patch by David Capwell; reviewed by Benedict Elliott Smith for CASSANDRA-20417
Patch by Alex Petrov; reviewed by Benedict Elliott Smith for CASSANDRA-20347.
Patch by Alex Petrov; reviewed by Benedict Elliott Smith for CASSANDRA-20424.
… required topology changes

patch by David Capwell; reviewed by Benedict Elliott Smith for CASSANDRA-20426
 - Decouple command serialization from TableMetadata version; introduce ColumnMetadata ids; gracefully handle missing TableId
 - DataInputPlus.readLeastSignificantBytes must truncate high bits
 - Fix RandomPartitioner accord serialization
 - Fast path stable commits must not override recovery propose/commit decisions regarding visibility of a transaction
 - RejectBefore must mergeMax, not merge, to ensure we maintain epoch and hlc increasing independently
 - Bad commitInvalidate decision
 - consistent filtering for touches and stillTouches
 - ensure TRUNCATE_BEFORE implies SHARD_APPLIED
 - TopologyManager.unsyncedOnly off-by-one error
 - DurabilityQueue should not retry SyncPointErased
 - handle rare case of no deps but none needed
 - not updating CFK synchronously on recovery, which can lead to erroneous recovery decisions for other transactions
 - Don't return partial read response when one commandStore rejects the commit
 - Filter touches/stillTouches consistently
 - WaitingState computeLowEpoch must use hasTouched to handle historic key with no route
Improve:
 - Use format parameters to defer building Invariants.requireArgument string
 - streamline RedundantStatus/RedundantBefore
Patch by Alex Petrov; reviewed by Benedict Elliott Smith for CASSANDRA-20114
…e changes without impacting C* messaging

patch by David Capwell; reviewed by Aleksey Yeschenko, Alex Petrov for CASSANDRA-20403
…ioner not being set up. The first example is the empty range, so no update to the static partitioner is done, which leads to this NPE
 - Accord Journal purging was disabled
 - remove unique_id from schema keyspace
 - avoid String.format in Compactor hot path
 - avoid string concatenation on hot path; improve segment compactor partition build efficiency
 - Partial compaction should update records in place to ensure truncation of discontiguous compactions does not lead to an incorrect field version being used
 - StoreParticipants.touches behaviour for RX was erroneously modified; should touch all non-redundant ranges including those no longer owned
 - SetShardDurable should correctly set DurableBefore Majority/Universal based on the Durability parameter
 - fix erroneous prunedBefore invariant
 - Journal compaction should not rewrite fields shadowed by a newer record
 - Don't save updates to ERASED commands
 - Simplify CommandChange.getFlags
 - fix handling of Durability for Invalidated
 - Don't use ApplyAt for GC_BEFORE with partial input, as there might be a saveStatus >= ApplyAtKnown but with executeAt < ApplyAtKnown

patch by Benedict; reviewed by Alex Petrov for CASSANDRA-20441
  * Fix short accord simulation test (seed 0x6bea128ae851724b), ConcurrentModificationException
  * Increase wait time during closing to avoid Unterminated threads
  * Increase timeouts, improve test stability
  * More descriptive output from CQL test
  * Shorten max CMS delay
  * Improve future handling in config service

Patch by Alex Petrov; reviewed by Benedict Elliott Smith for CASSANDRA-20440
@dcapwell changed the base branch from trunk to cep-15-accord on March 21, 2025
import static java.util.concurrent.TimeUnit.NANOSECONDS;
import static org.apache.cassandra.concurrent.ExecutorFactory.Global.executorFactory;

public class LockWithAsyncSignalTest
Contributor:

wouldn't this be better in the Simulator? There is only one real "assert" in the test, and it's

FBUtilities.waitOnFutures(consumers, 2L, TimeUnit.SECONDS);

and environmental issues can cause this to exceed the 2s timeout, making the test flaky. If we wish to debug why the test failed, it wouldn't be reproducible.

Contributor Author:

The simulator is insufficient for fully testing this kind of construct, as it cannot simulate the memory model. The simulator does help validate systems like this, but we also need to run them as the hardware intended.

I think as now designed it would be fine to increase this timeout arbitrarily high though. It would be nice to further embellish the testing of these constructs, but this is a fairly good investment:reward tradeoff (and at the moment in particular, everything is about balance of investment)
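
For reference, a concrete version of that suggestion, using the same call the test already makes (the timeout value here is purely illustrative):

```java
// Same assertion as in the test, with a far more generous bound, so that
// environmental hiccups cannot plausibly exhaust it. (1 minute is arbitrary.)
FBUtilities.waitOnFutures(consumers, 1L, TimeUnit.MINUTES);
```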

Contributor Author:

That said, this is lightweight enough that I would hope 2s isn't unreasonable for any environment. Have you seen timeouts?

Contributor:

trying to fix the repeat job to work with unit tests and on accord... that will help collect data about how stable this test is

Contributor (@dcapwell, Mar 27, 2025):

$ junit_report_viewer.py
All tests successful

1000 iterations with weaker resources and it was stable, so I guess we can leave this alone for now

import static org.apache.cassandra.service.accord.AccordExecutor.Mode.RUN_WITH_LOCK;
import static org.apache.cassandra.utils.Clock.Global.nanoTime;

-class AccordExecutorInfiniteLoops implements Shutdownable
+class AccordExecutorLoops
Contributor:

can you explain why you moved away from implements Shutdownable? During shutdown we no longer directly tell this class that we are shutting down; we just hope each thread was told to shut down. This feels like it should cause stability issues during nodetool drain and jvm-dtest shutdown, no?

Contributor:

Looking closer, this class is only used by AccordExecutorAsyncSubmit, AccordExecutorSemiSyncSubmit, and AccordExecutorSyncSubmit. So this class really doesn't care about being shut down directly

private final AtomicInteger running = new AtomicInteger();
private final Condition terminated = Condition.newOneTimeCondition();

public AccordExecutorLoops(Mode mode, int threads, IntFunction<String> name, Function<Mode, Runnable> tasks)
Contributor:

given that this class no longer loops, this name feels off to me. We define that there are N threads, and we are done whenever all N finish running their task. This class doesn't really have anything to do with "looping", nor does it enforce that the sub-tasks are looped (the old logic did enforce this)

Contributor Author:

This is arbitrary, and to me it still makes perfect sense to call these threads "loops" since that's what they do - we could, if you like, rename the loop provider from tasks to loops if it helps your mental model.
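
For what it's worth, a self-contained sketch of the shape being debated, inferred from the visible fields and constructor signature (a CountDownLatch stands in for the running counter plus terminated condition; this is not the actual implementation):

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.Function;
import java.util.function.IntFunction;

// Sketch: N threads each run the Runnable produced by `tasks` (each Runnable
// is itself the loop); we are "done" once all N threads finish their task.
final class LoopsSketch<Mode>
{
    private final CountDownLatch terminated;

    LoopsSketch(Mode mode, int threads, IntFunction<String> name, Function<Mode, Runnable> tasks)
    {
        terminated = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++)
        {
            Runnable loop = tasks.apply(mode);
            Thread thread = new Thread(() -> {
                try { loop.run(); }
                finally { terminated.countDown(); } // last thread out marks termination
            }, name.apply(i));
            thread.setDaemon(true);
            thread.start();
        }
    }

    void awaitTermination() throws InterruptedException
    {
        terminated.await();
    }
}
```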

public Object shutdownNow()
{
    shutdown();
    return null;
Contributor:

isn't the semantics of shutdownNow that we drop the backlog? Yet this abstract class looks to push that to each implementation. In the locking case we only check whether it's shut down once we get a null task.

Contributor Author:

I’m afraid I don’t follow. This could really be changed, as we don’t implement Shutdownable, so we don’t need to return anything

Contributor:

I would remove this method to avoid confusion. shutdownNow normally clears the backlog, so if you see this method you assume it does so as well.

Contributor Author:

My comment above was mistaken (made from a phone on holiday, making assumptions about code location). This class does implement Shutdownable. Removing this method therefore means not implementing Shutdownable. The description of shutdownNow in Shutdownable makes clear it is up to the implementation whether shutdownNow actually does anything to stop ongoing tasks, and this is true even for ExecutorService (though the description there is 'best effort', it makes clear there are no guarantees, which is broadly equivalent to implementation-defined).

I don't terribly mind not implementing Shutdownable though, if you're strongly of this view.
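
To illustrate the convention being referenced (this is plain JDK behaviour, not the Accord code): ExecutorService.shutdownNow drains and returns the backlog on a best-effort basis, which is why a shutdownNow that merely delegates to shutdown() can surprise readers.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ShutdownNowDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 10; i++)
            exec.submit(() -> {
                try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });

        // JDK convention: best-effort interruption of running tasks, and the
        // never-started backlog is removed from the queue and handed back.
        List<Runnable> backlog = exec.shutdownNow();
        System.out.println("drained " + backlog.size() + " queued tasks");
        exec.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```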

@Override
public void shutdown()
{
    shutdown = true;
Contributor:

this isn't volatile, so how do we make sure the visibility is correct for all threads?

Contributor:

here is what I see for AccordExecutorSyncSubmit:

  1. shutdown is called on a random thread during shutdown
  2. notifyWorkExclusive does hasWork.signal();
  3. the with-lock case only checks this once we have no work to do, which is after we access this field

so from a JMM point of view this doesn't look like it actually forces the visibility?

Contributor:

Also, we don't block new work once we shut down, so if the node is very busy this will keep going?

Contributor Author (@belliottsmith, Mar 28, 2025):

Visibility is enforced by other volatile accesses. The writer notifies work as you correctly surmised (which involves a volatile write), and the readers are all guarded by volatile reads on the lock they are guarded by (that we notify of work). The volatile keyword on this field would provide no guarantees on its own and offers no additional guarantees in concert with the lock.

it is true that we don't reject new work, but this executor should be shut down after networking, and we should not really reject any work produced by the local machine during shutdown.
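
A self-contained illustration of that argument (not the Accord code; here the write happens under the same lock, a simplification of the signal-mediated ordering described above):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class LockGuardedShutdown
{
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition hasWork = lock.newCondition();
    private boolean shutdown; // deliberately a plain field, not volatile

    void shutdown()
    {
        lock.lock();
        try
        {
            shutdown = true;
            hasWork.signal(); // wake the worker so it re-checks the flag
        }
        finally
        {
            lock.unlock();
        }
    }

    void workerLoop() throws InterruptedException
    {
        lock.lock();
        try
        {
            // Read under the same lock: the unlock/lock pair establishes the
            // happens-before edge, so volatile on the field would add nothing.
            while (!shutdown)
                hasWork.await();
        }
        finally
        {
            lock.unlock();
        }
    }
}
```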

@dcapwell (Contributor):

is there a reason this PR depends on apache/cassandra-accord#183? As far as I can tell they are unrelated?

@dcapwell (Contributor):

Triggering CI to help validate the shutdown logic and the impact on CI. The shutdown semantics are the only thing really open, so I want to collect data on this.

@dcapwell (Contributor):

Ran CI, and the shutdown logic is causing tests to fail: we don't close the cluster within 3m

@belliottsmith (Contributor Author):

> is there a reason this PR depends on apache/cassandra-accord#183? As far as I can tell they are unrelated?

They are related in the sense that they each represent the latest batch of significant fixes for either side. It just so happens this time that it appears the two sides do not actually intersect. I will update the description and remove the dependency.

@dcapwell force-pushed the cep-15-accord branch 2 times, most recently from 7be53d5 to de9925f on April 2, 2025
@@ -24,7 +24,7 @@
 import java.util.function.IntFunction;

 import accord.utils.Invariants;
-import org.agrona.collections.LongHashSet;
+import io.netty.util.collection.LongObjectHashMap;
Contributor:

curious why you moved off agrona to netty's. Not against it, just wondering if there was a reason

Contributor Author:

Mistake - I intended only to switch to a map to avoid the unlikely scenario of a thread id being reused for a dead thread (close to impossible given how we use it, but safer). Will switch to the Agrona map version for consistency
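
As a sketch of why a map keyed by thread id is the safer structure here (illustrative code, not the PR's; agrona's Long2ObjectHashMap is presumably the kind of "Agrona map version" meant):

```java
import org.agrona.collections.Long2ObjectHashMap;

// With a bare set of thread ids, a dead thread whose id the JVM later reuses
// is indistinguishable from the live one; mapping id -> Thread lets us check
// identity. (Illustrative registry, not the PR's code.)
final class ThreadRegistry
{
    private final Long2ObjectHashMap<Thread> byId = new Long2ObjectHashMap<>();

    synchronized void register(Thread thread)
    {
        byId.put(thread.getId(), thread);
    }

    synchronized boolean isRegistered(Thread thread)
    {
        // True only if the id still maps to this exact thread instance.
        return byId.get(thread.getId()) == thread;
    }
}
```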

notifyWorkExclusive();
notifyWork();
lock.lock();
try
Contributor:

think this try/finally isn't needed?

Contributor Author:

Oh yes, leftover it looks like, not sure how that happened
