Skip to content

Conversation

aawsome
Copy link
Member

@aawsome aawsome commented May 31, 2025

This PR refactors indexer and packer into a version which is completely independent from I/O and concurrency but simply provides the functionality.
This can then be separated into real "core" functionality.

As a side effect, this PR solves the following topics:

  • Ensure that really no duplicate blobs are saved (closes The copy command may copy duplicate data  #146)
  • Solves potential race conditions (which in theory could lead to data loss) between finalizing packer/indexed and file_saver by using self in the finalize methods.

Copy link

codecov bot commented Jun 4, 2025

Codecov Report

❌ Patch coverage is 61.81102% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.0%. Comparing base (6d3577d) to head (9555d5b).

Files with missing lines Patch % Lines
crates/core/src/blob/repopacker.rs 60.1% 53 Missing ⚠️
crates/core/src/commands/prune.rs 56.6% 13 Missing ⚠️
crates/core/src/blob/pack_sizer.rs 65.0% 7 Missing ⚠️
crates/core/src/blob/packer.rs 73.9% 6 Missing ⚠️
crates/core/src/commands/repair/index.rs 0.0% 6 Missing ⚠️
crates/core/src/index/indexer.rs 89.2% 3 Missing ⚠️
crates/core/src/backend/decrypt.rs 33.3% 2 Missing ⚠️
crates/core/src/commands/copy.rs 0.0% 2 Missing ⚠️
crates/core/src/commands/merge.rs 0.0% 2 Missing ⚠️
crates/core/src/commands/repair/snapshots.rs 0.0% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
Files with missing lines Coverage Δ
crates/core/src/archiver/file_archiver.rs 63.1% <100.0%> (ø)
crates/core/src/archiver/tree_archiver.rs 69.7% <100.0%> (ø)
crates/core/src/blob.rs 72.7% <ø> (-7.3%) ⬇️
crates/core/src/chunker.rs 50.5% <ø> (-1.2%) ⬇️
crates/core/src/archiver.rs 58.9% <66.6%> (-0.8%) ⬇️
crates/core/src/backend/decrypt.rs 48.1% <33.3%> (+0.2%) ⬆️
crates/core/src/commands/copy.rs 0.0% <0.0%> (ø)
crates/core/src/commands/merge.rs 0.0% <0.0%> (ø)
crates/core/src/commands/repair/snapshots.rs 0.0% <0.0%> (ø)
crates/core/src/index/indexer.rs 91.6% <89.2%> (+45.5%) ⬆️
... and 5 more

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aawsome aawsome requested a review from simonsan June 5, 2025 10:39
@aawsome aawsome added the S-waiting-for-review Status: PRs waiting for review label Jun 5, 2025
Copy link
Contributor

@simonsan simonsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the direction, feel like there is a bit more error handling needed at certain places. Need to take another, deeper look at RepoPacker, few TODOs and docs still open.

pub fn needs_save(&self) -> bool {
// check if IndexFile needs to be saved
let elapsed = self.created.elapsed().unwrap_or_else(|err| {
warn!("couldn't get elapsed time from system time: {err:?}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be part of the error handling here and not a warning in the logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the error handling. When we can't get an elapsed time when we check if an index file should be saved, we have two options:

  1. return an error which would mean we don't finish the backup
  2. omit the check on the elapsed time, i.e. only write when the index file is "full" and warn the user. This will successfully finish the backup.

I strongly prefer option 2) which is the current error handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't what I meant, what I meant was if we have a warning in the log, user of rustic_core are not able to handle this, instead of a log warning this should return a rustic_error with the message so a user can handle this if we get to it from somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could discuss is we don't log the warning but just silently handle this case.
Anyway, this discussion doesn't belong to this PR, IMO, as the behavior is not changed w.r.t. main, see https://github.com/rustic-rs/rustic_core/blob/main/crates/core/src/blob/packer.rs#L610.

finish: finish_rx,
};

let _join_handle = std::thread::spawn(move || {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to use the join handles somewhere and join the threads, e.g. in case we get Ctrl + C or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. Actually the _join_handle_raw is joined by waiting for the status which is sent via a channel. And the join_handle is joined by the fact that it sends to the second thread, i.e. the _join_handle_raw will not finish unless _join_handle finishes. I'm not aware if there is a problem about something like Ctrl+C - can you depict what would be happening in this case?

Copy link
Contributor

@nardoor nardoor Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the threads can terminate quite "normally" even in case of an interruption in the main thread because they "simply" iterate on the rx:

  • main threads closes
  • tx is closed/dropped
  • _join_handle thread will then terminate
  • tx_raw is closed/dropped
  • _join_handle_raw thread will then terminate

Copy link
Contributor

@nardoor nardoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are nice API changes, it's good to see the BE backend generic type be less propagated if that makes sense.

pub fn save(&self) -> RusticResult<()> {
if (self.file.packs.len() + self.file.packs_to_delete.len()) > 0 {
_ = self.be.save_file(&self.file)?;
pub fn save(&mut self) -> Option<IndexFile> {
Copy link
Contributor

@nardoor nardoor Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has totally changed, some other methods totally changed as well, would you mind quickly updating their documentations pls :)?

Also, do you think it deserves a rename? (same questions for other methods that changed signature)

Comment on lines 174 to 179
/// * `id` - The id to check.
pub fn has(&self, id: &BlobId) -> bool {
pub fn has(&mut self, id: &BlobId) -> bool {
self.indexed
.as_ref()
.as_mut()
.is_some_and(|indexed| indexed.contains(id))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we need mut there, is there any reason you left it as such?

finish: finish_rx,
};

let _join_handle = std::thread::spawn(move || {
Copy link
Contributor

@nardoor nardoor Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the threads can terminate quite "normally" even in case of an interruption in the main thread because they "simply" iterate on the rx:

  • main threads closes
  • tx is closed/dropped
  • _join_handle thread will then terminate
  • tx_raw is closed/dropped
  • _join_handle_raw thread will then terminate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-for-review Status: PRs waiting for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The copy command may copy duplicate data

3 participants