Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new SAT-based, process-isolated quorum intersection checker #4653

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jayz22
Copy link
Contributor

@jayz22 jayz22 commented Feb 24, 2025

Description

Resolves #4106

New quorum intersection checker take 2 (with process-isolation, supersedes #4629).

  • Bridge implementation for stellar/stellar-quorum-analyzer, the underneath Rust library that utilizes SAT-solving.
  • New solver can be toggled via config flag USE_QUORUM_INTERSECTION_CHECKER_V2, or if running in command line --v2
  • Rust side: New quorum intersection checker stellar-quorum-analyzer#1
  • Provides process isolation -- the live network quorum health monitoring runs stellar-core check-quorum-intersection in a separate process
  • Resource bounding and interruption -- the new solver can specify time and memory limits (QUORUM_INTERSECTION_CHECKER_TIME_LIMIT_MS and QUORUM_INTERSECTION_CHECKER_MEMORY_LIMIT_BYTES in the config file, or --time-limit-ms and `--memory-limit-bytes if running command-line mode).

This approach is significantly more performant than the current one, allowing live monitoring of network beyond 7 tier-1 organizations.

A single quorum-intersection check for a 16-org tier 1 quorum configuration sample took roughly 2ms, whereas previously it was unable to solve.
(Here are some sample quorums configs, you can check them by running the following command
$ stellar-core check-quorum-intersection ~/$PATH-TO-RUST-REPO/tests/test_data/random/for_stellar_core/almost_symmetric_network_16_orgs_delete_prob_factor_11_for_stellar_core.json --result-json output.json under your favorite benchmark tool)

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

(Marking draft for stellar/stellar-quorum-analyzer#1 to be merged first)

Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking this looks great! Really clear work. Couple minor comments that I'm happy to see deferred to followups if you just want to get this landed, I realize we don't have a lot more iterations left before I'm away.

mLastQuorumMapIntersectionState, mApp.getProcessManager(), true,
mApp.getConfig().QUORUM_INTERSECTION_CHECKER_TIME_LIMIT_MS,
mApp.getConfig().QUORUM_INTERSECTION_CHECKER_MEMORY_LIMIT_BYTES);
// TODO: parse the metrics and send to medida
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not going to do this in this PR, can you file it as a followup?

@@ -1858,21 +1911,21 @@ HerderImpl::checkAndMaybeReanalyzeQuorumMap()
ZoneScoped;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this prefix of the v1 and v2 code -- the bit that checks whether it is stable, or already running, etc. -- seems like it's repeated between the two. Do you think it could be factored out easily?

Comment on lines +20 to +33
template <typename T>
std::vector<uint8_t>
toVec(T const& t)
{
return std::vector<uint8_t>(xdr::xdr_to_opaque(t));
}

template <typename T>
CxxBuf
toCxxBuf(T const& t)
{
return CxxBuf{std::make_unique<std::vector<uint8_t>>(toVec(t))};
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated code from InvokeHostFunctionOpFrame.cpp. Please factor out into src/utils/types.h or add a src/rust/RustUtils.h or something?

namespace stellar
{

// static adaptor class over rust QuorumChecker that exposes its functionalities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal (and you don't need to change if you're happy as-is) but .. generally I think if we have a bunch of essentially static/free functions, we'd just stick them in a sub-namespace rather than a class-full-of-static-methods. Like the class part seems a little odd if nobody ever instantiates it. But I'm not too particular.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The quorum-intersection checker cannot handle large networks
2 participants