feat: Separate location for ancillary for cardano database v1 #2380

Open
wants to merge 17 commits into
base: main

Alenar
Collaborator

@Alenar Alenar commented Mar 19, 2025

Content

This PR upgrades Cardano database v1 snapshots: instead of producing one archive that contains the whole database, two archives are now produced:

  • the main archive that contains all completed immutable files, identified using the same fields as before in the artifact (locations and size).
  • an ancillary archive that contains the last ledger and the uncompleted immutable trio, identified with two new optional fields (ancillary_locations and ancillary_size).

No breaking changes are introduced

  • for a client that is not updated: it will still be able to validate a restored database, but the ancillary files won't be downloaded, leading to a slow start of the restored node.
  • for an updated client: no new options are added to the public API, meaning that the download of the ancillary files is seamless to users and produces the same result as when there was only one archive.
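
To illustrate why this stays backward compatible, here is a minimal sketch of how an updated client could decide what to download. Only the four field names come from this PR; `SnapshotSketch`, `Download`, and `planned_downloads` are hypothetical names for this example and are not the actual mithril-client code.

```rust
/// Simplified stand-in for the snapshot artifact (hypothetical type).
#[derive(Debug, Clone)]
pub struct SnapshotSketch {
    pub locations: Vec<String>,
    pub size: u64,
    /// Absent on artifacts produced before this change.
    pub ancillary_locations: Option<Vec<String>>,
    pub ancillary_size: Option<u64>,
}

/// One archive to fetch, with the candidate locations to try.
#[derive(Debug, PartialEq)]
pub enum Download<'a> {
    Immutables(&'a [String]),
    Ancillary(&'a [String]),
}

/// The main archive is always planned; the ancillary archive is planned only
/// when at least one location is advertised.
pub fn planned_downloads(snapshot: &SnapshotSketch) -> Vec<Download<'_>> {
    let mut plan = vec![Download::Immutables(snapshot.locations.as_slice())];
    if let Some(locations) = snapshot.ancillary_locations.as_deref() {
        if !locations.is_empty() {
            plan.push(Download::Ancillary(locations));
        }
    }
    plan
}
```

With an artifact that has no ancillary fields, the plan contains only the main archive, which matches the behavior described for the existing flow.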

Details

open-api & mithril-common

  • Define and add new optional ancillary_locations and ancillary_size fields to Snapshot artifact and messages
  • Snapshot: removed the `new` constructor. Adding the new fields triggered the "more than 7 parameters" clippy lint, and since the constructor was not used much, it was easier to construct the structure directly than to introduce a sub-structure to solve the issue.
  • DummyCardanoDb: store the path to the created files in the ledger directory

mithril-aggregator

  • CardanoImmutableFilesFullArtifactBuilder: separate build and upload for completed immutable files and ancillary
  • LocalSnapshotUploader: removed. Its role was to copy the file to its storage location (something also covered by the LocalUploader) and to compute the location URI, but that URI targeted the /artifact/snapshot/{digest}/download route, which could not easily be reused for the ancillary download. A LocalUploader is used instead, meaning that, when local upload is set, the locations in the artifact are now /snapshot_download/{filename}.
  • snapshotter:
    • add snapshot_all_completed_immutables
    • remove snapshot_all since it is no longer used
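
The location issue mentioned for LocalSnapshotUploader can be sketched as follows. The two route shapes are the ones quoted above; the function names, the `server_url` prefix, and the example file names are assumptions made for this illustration only.

```rust
/// Previous scheme (sketch): the location is derived from the digest only,
/// so every file sharing a digest maps to the same route.
pub fn digest_based_location(server_url: &str, digest: &str) -> String {
    format!("{server_url}/artifact/snapshot/{digest}/download")
}

/// New scheme (sketch): the location is derived from the uploaded file name,
/// so two archives of the same snapshot get distinct routes.
pub fn file_name_based_location(server_url: &str, file_name: &str) -> String {
    format!("{server_url}/snapshot_download/{file_name}")
}
```

For two hypothetical files `snapshot.abc123.tar.zst` and `snapshot.abc123.ancillary.tar.zst` that share the digest `abc123`, the first function returns a single route for both, while the second returns one route per file.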

mithril-client

  • snapshot_client: updated to also download the ancillary archive if a location is set in ancillary_locations.
  • feedback: add three new events for cardano db v1 ancillary archive download (SnapshotAncillaryDownloadStarted, SnapshotAncillaryDownloadProgress, and SnapshotAncillaryDownloadCompleted).
  • refactor integration tests:
    • promote fake.rs to a directory and split its components so there is one file per integration test
    • move the utility functions that create fake snapshot archives into a snapshot_archives.rs module so they can easily be reused in multiple scenarios.
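
As a rough illustration of how a feedback receiver might consume the three new events, here is a minimal sketch. The event names are the ones listed above; their fields, the `AncillaryEvent` enum, and the `AncillaryProgress` type are assumptions for this example and do not reflect the real mithril-client definitions.

```rust
/// Hypothetical, simplified version of the three new events; the field names
/// and types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum AncillaryEvent {
    SnapshotAncillaryDownloadStarted { size: u64 },
    SnapshotAncillaryDownloadProgress { downloaded_bytes: u64 },
    SnapshotAncillaryDownloadCompleted,
}

/// Minimal state that a progress bar could be rendered from.
#[derive(Debug, Default)]
pub struct AncillaryProgress {
    total: u64,
    downloaded: u64,
    finished: bool,
}

impl AncillaryProgress {
    /// Updates the state from one event.
    pub fn handle(&mut self, event: &AncillaryEvent) {
        match event {
            AncillaryEvent::SnapshotAncillaryDownloadStarted { size } => {
                self.total = *size;
                self.downloaded = 0;
                self.finished = false;
            }
            AncillaryEvent::SnapshotAncillaryDownloadProgress { downloaded_bytes } => {
                self.downloaded = *downloaded_bytes;
            }
            AncillaryEvent::SnapshotAncillaryDownloadCompleted => self.finished = true,
        }
    }

    /// Percentage in 0..=100; a zero total yields 0 instead of dividing by zero.
    pub fn percent(&self) -> u64 {
        if self.total == 0 {
            return 0;
        }
        let done = self.downloaded.min(self.total) as u128;
        (done * 100 / self.total as u128) as u64
    }

    pub fn is_finished(&self) -> bool {
        self.finished
    }
}
```

The zero-size guard mirrors the kind of edge case a receiver has to tolerate when the advertised ancillary size is 0.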

mithril-client-cli

  • update cardano-db-v1 disk space requirement check so the ancillary archive size is taken into account
  • update feedback mechanism to add a progress bar when cardano-db-v1 ancillary file is downloaded
  • fixed dev.json configuration file: the genesis_verification_key was not the one used in mithril-end-to-end

example-cardano-database (v1)

  • update feedback mechanism to add a progress bar when cardano-db-v1 ancillary file is downloaded

Pre-submit checklist

  • Branch
    • Tests are provided (if possible)
    • Crates versions are updated (if relevant)
    • CHANGELOG file is updated (if relevant)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
  • PR
    • All check jobs of the CI have succeeded
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested
  • Documentation
    • Update documentation website (if relevant)
    • No new TODOs introduced

Issue(s)

Relates to #2362

github-actions bot commented Mar 19, 2025

Test Results

    3 files  ± 0     57 suites  ±0   11m 38s ⏱️ +3s
1 778 tests + 6  1 778 ✅ + 6  0 💤 ±0  0 ❌ ±0 
2 184 runs  +14  2 184 ✅ +14  0 💤 ±0  0 ❌ ±0 

Results for commit 80668fe. ± Comparison against base commit 81e90f7.

This pull request removes 14 and adds 20 tests. Note that renamed tests count towards both.
mithril-aggregator ‑ artifact_builder::cardano_immutable_files_full::tests::snapshot_archive_name_after_beacon_values
mithril-aggregator ‑ artifact_builder::cardano_immutable_files_full::tests::snapshot_archive_name_after_compression_algorithm
mithril-aggregator ‑ file_uploaders::local_snapshot_uploader::tests::should_copy_file_to_target_location
mithril-aggregator ‑ file_uploaders::local_snapshot_uploader::tests::should_error_if_path_is_a_directory
mithril-aggregator ‑ file_uploaders::local_snapshot_uploader::tests::should_extract_digest_to_deduce_location
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database::complete_without_start_should_not_panic
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database::completed_immutable_downloads_bump_progress
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database::start_including_ancillary_add_one_to_total_downloads
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database::start_then_complete_should_remove_multi_progress_bar
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database::starting_ancillary_downloads_should_add_a_progress_bar
…
mithril-aggregator ‑ artifact_builder::cardano_immutable_files_full::tests::snapshot_archive_name_include_beacon_and_network_values
mithril-aggregator ‑ file_uploaders::dumb_uploader::tests::get_history_of_multiple_uploads
mithril-aggregator ‑ services::snapshotter::compressed_archive_snapshotter::tests::snapshot_all_completed_immutables::include_only_completed_immutables
mithril-aggregator ‑ services::snapshotter::test_doubles::tests::dumb_snapshotter::set_dumb_snapshotter_archive_size
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database_v1::start_and_progress_ancillary_download_with_a_size_of_zero_should_not_crash
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database_v1::start_then_complete_should_remove_ancillary_progress_bar
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database_v1::start_then_complete_should_remove_immutables_progress_bar
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database_v1::starting_full_immutables_and_ancillary_together_spawn_two_progress_bars
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database_v2::complete_without_start_should_not_panic
mithril-client-cli ‑ utils::feedback_receiver::tests::cardano_database_v2::completed_immutable_downloads_bump_progress
…

♻️ This comment has been updated with latest results.

@Alenar Alenar force-pushed the ensemble/2362/cdb_v1/separate_location_for_ancillary branch from 6883aba to a3294c6 Compare March 19, 2025 14:16
Alenar and others added 13 commits March 19, 2025 15:18
This will supersede `snapshot_all` since for v1 snapshots the ancillary
files will be separated into another archive.

Co-authored-by: Damien Lachaume <[email protected]>
Instead of including the ancillary files.

Also suppress the now unused `snapshot_all` from the `Snapshotter` trait.

Co-authored-by: Damien Lachaume <[email protected]>
… creation in cdb v1 artifact builder

In order to re-use the name for the ancillary archive later.

Co-authored-by: Damien Lachaume <[email protected]>
…oads

Superseding the previously used `LocalSnapshotUploader`.

This is needed because the previous uploaders return the same location
for two different paths that contain the same digest, meaning that the
ancillary location was targeting the immutable location in snapshot
artifacts.

Co-authored-by: Damien Lachaume <[email protected]>
By splitting the `fake` module into several files so they can be browsed
more easily.

Co-authored-by: Damien Lachaume <[email protected]>
…fake ancillary archive

Co-authored-by: Damien Lachaume <[email protected]>
…db v1 snapshot

+ add new feedback events related to this download.

Co-authored-by: Damien Lachaume <[email protected]>
@Alenar Alenar force-pushed the ensemble/2362/cdb_v1/separate_location_for_ancillary branch from a3294c6 to e582efe Compare March 19, 2025 14:18
@Alenar Alenar temporarily deployed to testing-preview March 19, 2025 14:27 — with GitHub Actions Inactive
@Alenar Alenar requested review from jpraynaud and sfauvel March 19, 2025 15:04
@Alenar Alenar marked this pull request as ready for review March 19, 2025 15:05
@Alenar Alenar temporarily deployed to testing-preview March 19, 2025 15:31 — with GitHub Actions Inactive
Collaborator

@sfauvel sfauvel left a comment

LGTM

@@ -151,7 +151,7 @@ impl CardanoDbDownloadCommand {
 CardanoDbDownloadChecker::ensure_dir_exist(db_dir)?;
 if let Err(e) = CardanoDbDownloadChecker::check_prerequisites_for_archive(
 db_dir,
-cardano_db.size,
+cardano_db.size + cardano_db.ancillary_size.unwrap_or(0),
Collaborator

Would it make sense to create a function on Snapshot that computes this total size?
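
A minimal sketch of what such a helper could look like, assuming a reduced `SnapshotSizes` struct that keeps only the two size fields (both the struct and the `total_size` name are hypothetical). It uses `saturating_add` rather than the plain `+` from the diff, which is a choice made for this example so that an absurdly large value cannot overflow.

```rust
/// Reduced stand-in for the snapshot message, keeping only the size fields.
pub struct SnapshotSizes {
    pub size: u64,
    pub ancillary_size: Option<u64>,
}

impl SnapshotSizes {
    /// Total number of bytes to download: main archive plus ancillary archive
    /// when one is advertised.
    pub fn total_size(&self) -> u64 {
        self.size.saturating_add(self.ancillary_size.unwrap_or(0))
    }
}
```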

pub size: u64,

/// Locations where the binary content of the snapshot can be retrieved
/// Size of the ancillary files in Bytes
Collaborator

I wonder whether a structure holding size + locations could be created and used for both immutable and ancillary (wrapped in an Option for the latter).
It may be too much work for too little real value.
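
For reference, the idea could look roughly like the sketch below. `ArchiveInfo`, `SnapshotArchives`, and `from_flat_fields` are hypothetical names; this is an in-memory illustration only and says nothing about how the serialized messages would have to change.

```rust
/// Size and locations of one archive (hypothetical shared structure).
#[derive(Debug, Clone, PartialEq)]
pub struct ArchiveInfo {
    pub locations: Vec<String>,
    pub size: u64,
}

/// Both archives of a snapshot; the ancillary one is optional.
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotArchives {
    pub immutables: ArchiveInfo,
    pub ancillary: Option<ArchiveInfo>,
}

impl SnapshotArchives {
    /// Builds the grouped form from the flat fields. The ancillary entry is
    /// present only when both its locations and its size are provided.
    pub fn from_flat_fields(
        locations: Vec<String>,
        size: u64,
        ancillary_locations: Option<Vec<String>>,
        ancillary_size: Option<u64>,
    ) -> Self {
        let ancillary = ancillary_locations
            .zip(ancillary_size)
            .map(|(locations, size)| ArchiveInfo { locations, size });
        Self {
            immutables: ArchiveInfo { locations, size },
            ancillary,
        }
    }
}
```

One possible benefit of grouping is that a half-specified ancillary archive (locations without a size, or the reverse) cannot be represented.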

Member

@jpraynaud jpraynaud left a comment

LGTM 🚀

@@ -126,7 +127,7 @@ pub struct SnapshotClient {
 #[cfg(feature = "fs")]
 http_file_downloader: Arc<dyn FileDownloader>,
 #[cfg(feature = "fs")]
-feedback_sender: FeedbackSender,
+_feedback_sender: FeedbackSender,
Member

Could you explain why this field is prefixed with an underscore? Is it to avoid a breaking change in the API?

Collaborator

A comment was added to clarify the reason for the underscore prefix.

Comment on lines 28 to 31
// Ugly horror needed to update the snapshot location after the server is started, server
// which need said snapshot to start and run in another thread.
// The RwLock is needed to mutate the value across threads and the Arc to transfer it
// to the server thread while keeping an access on the main thread.
Member

Maybe this comment is not really relevant:

Suggested change
// Ugly horror needed to update the snapshot location after the server is started, server
// which need said snapshot to start and run in another thread.
// The RwLock is needed to mutate the value across threads and the Arc to transfer it
// to the server thread while keeping an access on the main thread.

Collaborator

The comment was removed.
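
For readers unfamiliar with the pattern the removed comment described, here is a small standalone sketch using only the standard library. A plain thread and two channels stand in for the test server; the function name and the example values are hypothetical and unrelated to the actual test tooling.

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

/// Returns what a stand-in "server" thread reads before and after the main
/// thread updates the shared location.
pub fn read_before_and_after_update() -> (String, String) {
    // Arc shares ownership with the server thread; RwLock allows mutation
    // from the main thread while the server keeps read access.
    let location = Arc::new(RwLock::new(String::from("not-yet-known")));
    let (request_tx, request_rx) = mpsc::channel::<()>();
    let (response_tx, response_rx) = mpsc::channel::<String>();

    let server_location = Arc::clone(&location);
    let server = thread::spawn(move || {
        // Answer each request with the current value of the shared location.
        for () in request_rx {
            let current = server_location.read().unwrap().clone();
            if response_tx.send(current).is_err() {
                break;
            }
        }
    });

    request_tx.send(()).unwrap();
    let before = response_rx.recv().unwrap();

    // Update the value after the server has started.
    *location.write().unwrap() = String::from("updated-location");

    request_tx.send(()).unwrap();
    let after = response_rx.recv().unwrap();

    // Closing the request channel ends the server loop.
    drop(request_tx);
    server.join().unwrap();
    (before, after)
}
```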

debug!(self.logger, ">> create_ancillary_snapshot_archive");

let snapshotter = self.snapshotter.clone();
let snapshot_name = format!("ancillary-{base_file_name_without_extension}");
Member

I guess that this format is better as it will be easy to see files related to the same snapshot when listed alphabetically:

Suggested change
-let snapshot_name = format!("ancillary-{base_file_name_without_extension}");
+let snapshot_name = format!("{base_file_name_without_extension}-ancillary");

Collaborator

@dlachaume dlachaume Mar 21, 2025

Done. We changed the name by appending .ancillary, which both meets your request and stays compatible with the function that extracts the digest from the file name:

let snapshot_name = format!("{base_file_name_without_extension}.ancillary");
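
The effect of the suffix on alphabetical listings can be checked with a small sketch. The `format!` call is the one quoted above; the helper names and the base names used in the example are hypothetical, and the digest extraction function is not modeled because it is not shown in this conversation.

```rust
/// Name of the ancillary archive for a given base name.
pub fn ancillary_archive_name(base_file_name_without_extension: &str) -> String {
    format!("{base_file_name_without_extension}.ancillary")
}

/// Lists main and ancillary archive names in alphabetical order.
pub fn sorted_archive_names(base_names: &[&str]) -> Vec<String> {
    let mut names: Vec<String> = base_names
        .iter()
        .flat_map(|base| [base.to_string(), ancillary_archive_name(base)])
        .collect();
    names.sort();
    names
}
```

With the base names `snap-a` and `snap-b`, each ancillary archive sorts right after its main archive, whereas an `ancillary-` prefix would group all ancillary archives together ahead of the main ones.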

…illary' at the end, and removal of the explanation comment about the technical reason for the use of RwLock in the test tooling of client integration tests.
@dlachaume dlachaume force-pushed the ensemble/2362/cdb_v1/separate_location_for_ancillary branch from 5581b05 to f2da5a6 Compare March 21, 2025 13:33
@dlachaume dlachaume temporarily deployed to testing-preview March 21, 2025 13:42 — with GitHub Actions Inactive
@dlachaume dlachaume temporarily deployed to testing-preview March 21, 2025 14:45 — with GitHub Actions Inactive
4 participants