feat(electrum): optimize merkle proof validation with batching #1957

Open
wants to merge 3 commits into master

Conversation

LagginTimes (Contributor)

Replaces #1908, originally authored by @Keerthi421.
Fixes #1891.

Description

This PR improves the Electrum client's Merkle proof validation performance, addressing the significant regression in BDK 1.1.0 where full sync time increased from 4s to 26s.

Key improvements:

  • Implemented batch processing for Merkle proof validations.
  • Added Merkle proof caching to prevent redundant network calls.
  • Optimized header handling with pre-fetching and reuse.
  • Modified core functions to use batch operations instead of individual calls.

Also adds reorg-safe eviction of stale proofs: before each Merkle batch, cached block hashes are checked against the current chain, highest height first, and on a mismatch the affected proofs are evicted until the fork point is reached.
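
As a rough illustration of the caching idea, the sketch below checks the cache before hitting the network. The free-function form, the error handling, and the assumption that GetMerkleRes is cloneable are illustrative, not the PR's exact code:

use std::{collections::HashMap, sync::Mutex};

use bitcoin::{BlockHash, Txid};
use electrum_client::{ElectrumApi, Error, GetMerkleRes};

fn fetch_merkle_cached<E: ElectrumApi>(
    inner: &E,
    merkle_cache: &Mutex<HashMap<(Txid, BlockHash), GetMerkleRes>>,
    txid: Txid,
    block_hash: BlockHash,
    height: usize,
) -> Result<GetMerkleRes, Error> {
    // Serve a previously fetched proof from the cache to avoid a network round trip.
    if let Some(res) = merkle_cache.lock().unwrap().get(&(txid, block_hash)) {
        return Ok(res.clone());
    }
    // Otherwise fetch the proof once and remember it for later requests.
    let res = inner.transaction_get_merkle(&txid, height)?;
    merkle_cache
        .lock()
        .unwrap()
        .insert((txid, block_hash), res.clone());
    Ok(res)
}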

Notes to the reviewers

The optimization approach focuses on three main areas:

  1. Reducing network round trips through batched Merkle proof requests.
  2. Minimizing redundant operations with a new Merkle proof cache.
  3. Improving header handling efficiency with pre-fetching.

The batch size is set to 100 as a balance between performance and memory usage. This value can be adjusted based on testing results.
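
For header pre-fetching, the chunking could look roughly like the sketch below, which groups requests through electrum_client's batch_block_header; the constant and helper names are illustrative rather than the PR's actual code:

use std::collections::HashMap;

use bitcoin::block::Header;
use electrum_client::{ElectrumApi, Error};

// Illustrative constant; 100 is the performance/memory trade-off discussed above.
const BATCH_SIZE: usize = 100;

fn prefetch_headers<E: ElectrumApi>(
    inner: &E,
    heights: &[u32],
) -> Result<HashMap<u32, Header>, Error> {
    let mut headers = HashMap::with_capacity(heights.len());
    // Request headers in fixed-size chunks so no single request grows unbounded.
    for chunk in heights.chunks(BATCH_SIZE) {
        let fetched = inner.batch_block_header(chunk.iter().copied())?;
        for (height, header) in chunk.iter().zip(fetched) {
            headers.insert(*height, header);
        }
    }
    Ok(headers)
}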

Changelog notice

  • New Merkle proof cache to prevent redundant network calls.
  • Batch processing for Merkle proof validations.
  • Performance tests to verify sync time improvements.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@LagginTimes LagginTimes requested a review from evanlinjin May 15, 2025 19:06
@LagginTimes LagginTimes self-assigned this May 15, 2025
@evanlinjin (Member) left a comment

Thanks for moving this forward.

This is not a full review, but I think it's enough to push this PR in a good direction.

Comment on lines +28 to +29
/// The Merkle proof cache
merkle_cache: Mutex<HashMap<(Txid, BlockHash), GetMerkleRes>>,

It will be more efficient if we cache anchors instead of GetMerkleRes here.
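
A possible shape for that, assuming the ConfirmationBlockTime anchor type from bdk_core and an illustrative struct name:

use std::{collections::HashMap, sync::Mutex};

use bdk_core::ConfirmationBlockTime;
use bitcoin::{BlockHash, Txid};

struct AnchorCache {
    /// Cache the validated anchor rather than the raw GetMerkleRes, so each
    /// proof is fetched and verified at most once and later hits skip both
    /// the network call and the validation work.
    anchors: Mutex<HashMap<(Txid, BlockHash), ConfirmationBlockTime>>,
}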

Comment on lines +566 to +592
/// Remove any proofs for blocks that may have been re-orged out.
///
/// Checks if the latest cached block hash matches the current chain tip. If not, evicts proofs
/// for blocks that were re-orged out, stopping at the fork point.
fn clear_stale_proofs(&self) -> Result<(), Error> {
    let mut cache = self.merkle_cache.lock().unwrap();

    // Collect one (height, old_hash) pair per proof.
    let mut entries: Vec<(u32, BlockHash)> = cache
        .iter()
        .map(|((_, old_hash), res)| (res.block_height as u32, *old_hash))
        .collect();

    // Sort descending and dedup so we only check each height once.
    entries.sort_unstable_by(|a, b| b.0.cmp(&a.0));
    entries.dedup();

    // Evict any stale proofs until fork point is found.
    for (height, old_hash) in entries {
        let current_hash = self.fetch_header(height)?.block_hash();
        if current_hash == old_hash {
            break;
        }
        cache.retain(|&(_txid, bh), _| bh != old_hash);
    }
    Ok(())
}

Reorgs don't happen that often so we won't have much "extra data". This method looks like it's O(n^2). Let's remove it.

Comment on lines +318 to 322
// Batch validate all collected transactions.
if !txs_to_validate.is_empty() {
    let proofs = self.batch_fetch_merkle_proofs(&txs_to_validate)?;
    self.batch_validate_merkle_proofs(tx_update, proofs)?;
}

Instead of having every populate_with_{} method call this internally, it will be more efficient and make more logical sense if we extract this so that we only call it at the end of full_scan and sync.

In other words, populate_with_{} should no longer fetch anchors. Instead, they should either mutate, or return a list of (Txid, BlockId) for which we try to fetch anchors for in a separate step.

It will be even better if full txs are fetched in a separate step too.
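
A rough sketch of that split, with all names here being placeholders rather than the crate's API: each populate step only reports which anchors it needs, and full_scan/sync merge those lists before a single batched fetch-and-validate pass (e.g. the batch_fetch_merkle_proofs / batch_validate_merkle_proofs pair from the diff above):

use bdk_core::BlockId;
use bitcoin::Txid;

/// Illustrative only: what a populate_with_{} step could hand back instead of
/// fetching anchors itself.
struct PopulateOutcome {
    /// Transactions observed during this pass, with the block each was
    /// reported confirmed in; anchors for these are resolved later.
    pending_anchors: Vec<(Txid, BlockId)>,
}

fn merge_pending(outcomes: Vec<PopulateOutcome>) -> Vec<(Txid, BlockId)> {
    // full_scan/sync would concatenate the pending lists from every populate
    // step, then fetch and validate all Merkle proofs in one batched pass.
    let mut pending: Vec<(Txid, BlockId)> = outcomes
        .into_iter()
        .flat_map(|o| o.pending_anchors)
        .collect();
    pending.sort_unstable();
    pending.dedup();
    pending
}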

Successfully merging this pull request may close these issues: Electrum client Performance issues