[ENH] Support list_prefix operations for S3 and AC/S3. #4637
base: rescrv/wal3-gc
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
    options: GetOptions,
) -> Result<Vec<String>, StorageError> {
    let atomic_priority = Arc::new(AtomicUsize::new(options.priority.as_usize()));
    let _permit = self.rate_limiter.enter(atomic_priority, None).await;
Should list be part of the same rate limiting?
I think so. It's a request to S3. A new tier might make sense, but I think rate limiting it as a read makes sense absent any other direction.
The current rate limits assume a read is roughly 8MB and try to saturate the network bandwidth accordingly. A list seems to have different characteristics, so it might not be OK to put it behind the same rate limiter.
Should we make it separate? All I'm seeing is that we'll underperform during lists. Given that the only lists are offline ops, it seems OK.
If lists don't compete with the usual reads and writes for tokens, then it's probably fine to keep it; otherwise I'd make it separate.
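To make the trade-off in this thread concrete, here is a minimal sketch contrasting a limiter whose lists and reads draw from one token pool with one that gives lists their own pool. `TokenPool`, `Limiter`, and `RequestKind` are hypothetical names invented for illustration; this is not the rate limiter used in the PR, which also takes priorities into account.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical request classes; not taken from the PR.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RequestKind {
    Read,
    List,
}

/// A fixed pool of tokens; `try_acquire` takes one if any remain.
struct TokenPool {
    available: AtomicUsize,
}

impl TokenPool {
    fn new(tokens: usize) -> Self {
        Self { available: AtomicUsize::new(tokens) }
    }

    fn try_acquire(&self) -> bool {
        // Decrement only when the count is non-zero.
        self.available
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }
}

/// Routes each request kind to a pool. The two fields may point at the same pool.
struct Limiter {
    reads: Arc<TokenPool>,
    lists: Arc<TokenPool>,
}

impl Limiter {
    /// Lists and reads compete for the same tokens.
    fn shared(tokens: usize) -> Self {
        let pool = Arc::new(TokenPool::new(tokens));
        Self { reads: Arc::clone(&pool), lists: pool }
    }

    /// Lists get their own budget and cannot starve reads.
    fn separate(read_tokens: usize, list_tokens: usize) -> Self {
        Self {
            reads: Arc::new(TokenPool::new(read_tokens)),
            lists: Arc::new(TokenPool::new(list_tokens)),
        }
    }

    fn try_enter(&self, kind: RequestKind) -> bool {
        match kind {
            RequestKind::Read => self.reads.try_acquire(),
            RequestKind::List => self.lists.try_acquire(),
        }
    }
}

fn main() {
    // Shared pool: two lists use up the tokens a read would have needed.
    let shared = Limiter::shared(2);
    assert!(shared.try_enter(RequestKind::List));
    assert!(shared.try_enter(RequestKind::List));
    assert!(!shared.try_enter(RequestKind::Read));

    // Separate pools: exhausting the list budget leaves reads untouched.
    let separate = Limiter::separate(2, 1);
    assert!(separate.try_enter(RequestKind::List));
    assert!(!separate.try_enter(RequestKind::List));
    assert!(separate.try_enter(RequestKind::Read));
}
```

With a shared pool, the cost of a list is whatever a read token represents, which is why the thread questions the 8MB assumption. If lists only run offline, that mismatch mostly shows up as slower lists rather than slower reads.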
Can you please add some discussion of the intended consumer of this?
Add S3 list_prefix support and implement WAL3 garbage collection CPU-side logic

This PR adds support for a new

Key Changes:

Affected Areas:

Potential Impact:
Functionality: Enables WAL3-based garbage collection with verifiable setsum accounting; adds ability to list objects via prefix for S3-like backends. No breaking API changes for external users unless they depend on manifest internal fields.
Performance: No significant runtime/latency penalties; list_prefix operation is paginated and used for GC/bookkeeping operations, not critical path. Some extra setsum tracking may slightly increase manifest size.
Security: No negative impact; manifests and GC logic maintain strong integrity invariants via setsum.
Scalability: Scales to large logs; list_prefix paginates and GC happens out-of-band. No increases to hot path object storage contention or batch sizes.

Review Focus:

Testing Needed
• Run all new and existing property-based and integration tests; ensure manifest/garbage collection invariants are not violated

Code Quality Assessment
rust/wal3/src/manifest.rs: Model changes are backward-compatible and clearly documented; code is modular with added GC application/check logic
rust/storage/src/s3.rs: Standard paginator ListObjectsV2 usage; good error handling and testing; aligns with AWS SDK best practices
rust/storage/src/admissioncontrolleds3.rs: Appropriate rate-limiting, prioritized requests, follows review suggestion to rate limit list_prefix
rust/wal3/tests/properties.rs: Thorough property/proptest-based testing of manifest+GC interaction
rust/wal3/src/gc.rs: Well-structured, clearly documented; correct use of setsum differential accounting, explicit error cases for corruption

Best Practices
Testing: Error Handling: Documentation: Modularity:

Potential Issues
• Manifest backward compatibility: If old code persists and loads manifests, they may see missing 'collected' field (protected by default serde behavior, but deployments should be checked)

This summary was automatically generated by @propel-code-bot
Description of changes
This adds support for a list_prefix operation that calls the
ListObjectsV2 API in the AWS SDK. I've included an integration test.
The intended consumer of this API is the wal3 garbage collector,
which needs to list cursors in the directory to find the lowest-version
cursor.
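As a rough illustration of that consumer, the sketch below follows continuation tokens until a prefix listing is exhausted, then picks the cursor with the lowest position. Everything here is an assumption made for the example: `FakeStore` is an in-memory stand-in rather than the S3 client, the `cursor/` key layout is invented, and storing each cursor's position directly as a `u64` value is a simplification. None of it is wal3's actual format or API.

```rust
use std::collections::BTreeMap;

/// One page of results, shaped loosely like a ListObjectsV2 response.
struct Page {
    keys: Vec<String>,
    /// `Some(token)` when more results remain.
    next_token: Option<String>,
}

/// Hypothetical in-memory object store. Values stand in for cursor positions.
struct FakeStore {
    objects: BTreeMap<String, u64>,
    page_size: usize,
}

impl FakeStore {
    /// Returns up to `page_size` keys with `prefix`, strictly after `token`.
    fn list_page(&self, prefix: &str, token: Option<&str>) -> Page {
        // Fetch one extra key to learn whether another page exists.
        let mut keys: Vec<String> = self
            .objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .filter(|k| token.map_or(true, |t| k.as_str() > t))
            .take(self.page_size + 1)
            .cloned()
            .collect();
        let next_token = if keys.len() > self.page_size {
            keys.truncate(self.page_size);
            keys.last().cloned()
        } else {
            None
        };
        Page { keys, next_token }
    }

    /// Follows continuation tokens until the listing is exhausted.
    fn list_prefix(&self, prefix: &str) -> Vec<String> {
        let mut all = Vec::new();
        let mut token: Option<String> = None;
        loop {
            let page = self.list_page(prefix, token.as_deref());
            all.extend(page.keys);
            match page.next_token {
                Some(t) => token = Some(t),
                None => return all,
            }
        }
    }
}

/// Lists every key under `prefix` and returns the one with the smallest position.
fn lowest_cursor(store: &FakeStore, prefix: &str) -> Option<(String, u64)> {
    store
        .list_prefix(prefix)
        .into_iter()
        .filter_map(|k| {
            let pos = *store.objects.get(&k)?;
            Some((k, pos))
        })
        .min_by_key(|(_, pos)| *pos)
}

fn main() {
    let mut objects = BTreeMap::new();
    objects.insert("cursor/a".to_string(), 30u64);
    objects.insert("cursor/b".to_string(), 10);
    objects.insert("cursor/c".to_string(), 20);
    objects.insert("log/0001".to_string(), 0);
    // A page size of 2 forces the cursor listing to span two pages.
    let store = FakeStore { objects, page_size: 2 };

    assert_eq!(store.list_prefix("cursor/").len(), 3);
    assert_eq!(
        lowest_cursor(&store, "cursor/"),
        Some(("cursor/b".to_string(), 10))
    );
}
```

The point of paginating inside `list_prefix` is that the caller sees a single complete `Vec<String>`, matching the return type shown in the review diff, regardless of how many pages the backend needed.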
Test plan
Integration test added.
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
N/A