Add timeouts, and test restore with network partition #2247
Open
aredridel wants to merge 2 commits into
Add integration tests for libsql-server bottomless replication restore behavior when interrupted by various failure modes. Tests verify sqld can resume and complete an interrupted restore from S3-compatible object storage (minio) without requiring a restart.

Test cases:
- basic_restore: Sanity check that sqld restores from minio
- sqld_interrupted: sqld killed mid-restore, restarted, completes
- minio_interrupted: minio stopped mid-restore, restarted, sqld retries
- network_partition: sqld disconnected from network mid-restore, reconnected

Infrastructure:
- Docker-based fixtures with isolated networks per test
- Unique container/network names and ports via atomic counters
- Port mapping (not host networking) for isolation
- Automatic cleanup of Docker resources after each test

Files added:
- tests/bottomless/mod.rs
- tests/bottomless/fixtures.rs
- tests/bottomless/basic_restore.rs
- tests/bottomless/sqld_interrupted.rs
- tests/bottomless/minio_interrupted.rs
- tests/bottomless/network_partition.rs
- tests/bottomless/README.md

Files modified:
- tests/tests.rs: Add bottomless module
- Cargo.toml: Add reqwest dev-dependency, remove duplicate hex
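A minimal sketch of how unique per-test names and host ports could be handed out with atomic counters, so tests running in parallel never collide. The `TestResources` struct, `allocate_resources`, `BASE_PORT = 19000`, and including the process id in names are all hypothetical illustrations, not the contents of tests/bottomless/fixtures.rs.

```rust
use std::sync::atomic::{AtomicU16, AtomicUsize, Ordering};

// Hypothetical base for mapped host ports; the real fixtures may differ.
const BASE_PORT: u16 = 19000;

static NEXT_ID: AtomicUsize = AtomicUsize::new(0);
static NEXT_PORT_OFFSET: AtomicU16 = AtomicU16::new(0);

/// Names and host ports for one test's Docker resources (hypothetical shape).
#[derive(Debug)]
struct TestResources {
    network: String,
    minio_container: String,
    sqld_container: String,
    minio_port: u16,
    sqld_port: u16,
}

fn allocate_resources(prefix: &str) -> TestResources {
    // fetch_add returns the previous value, so every caller gets a distinct id
    // even when tests run concurrently on several threads.
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    // Reserve two consecutive host ports per test: one for minio, one for sqld.
    let offset = NEXT_PORT_OFFSET.fetch_add(2, Ordering::Relaxed);
    // Assumption: the pid separates names across concurrent test binaries.
    let pid = std::process::id();
    TestResources {
        network: format!("{prefix}-net-{pid}-{id}"),
        minio_container: format!("{prefix}-minio-{pid}-{id}"),
        sqld_container: format!("{prefix}-sqld-{pid}-{id}"),
        minio_port: BASE_PORT + offset,
        sqld_port: BASE_PORT + offset + 1,
    }
}

fn main() {
    let a = allocate_resources("bottomless");
    let b = allocate_resources("bottomless");
    assert_ne!(a.network, b.network);
    assert_ne!(a.minio_port, b.minio_port);
    println!("{a:?}\n{b:?}");
}
```

Counters only guarantee uniqueness within one process; ports can still clash with unrelated services on the host, which is one reason to prefer port mapping over host networking.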
- Add LIBSQL_BOTTOMLESS_S3_READ_TIMEOUT_SECS (default 5s)
- Add LIBSQL_BOTTOMLESS_S3_CONNECT_TIMEOUT_SECS (default 5s)
- Add LIBSQL_BOTTOMLESS_S3_OPERATION_ATTEMPT_TIMEOUT_SECS (default 10s)
- Configure TimeoutConfig on aws_sdk_s3::Config in bottomless::replicator::Options::client_config()
- Update meta_store.rs Options construction to include new timeout fields
- Remove #[ignore] from network_partition test
- Fix test fixtures: endpoint timing, image caching, mut minio
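A minimal, standard-library-only sketch of reading the three timeout variables with the defaults listed above. The `S3Timeouts` struct and `secs_from_env` helper are hypothetical, and falling back to the default when a value does not parse is an assumption; the PR may reject invalid input instead. The commented builder call shows roughly how the durations would map onto the SDK's TimeoutConfig.

```rust
use std::env;
use std::time::Duration;

/// Timeout settings for the bottomless S3 client (hypothetical struct; the PR
/// adds equivalent fields to bottomless::replicator::Options).
#[derive(Debug, Clone, Copy, PartialEq)]
struct S3Timeouts {
    read: Duration,
    connect: Duration,
    operation_attempt: Duration,
}

/// Reads a whole number of seconds from the variable `name`, falling back to
/// `default_secs` when it is unset or not a valid u64 (assumed behavior).
fn secs_from_env(name: &str, default_secs: u64) -> Duration {
    let secs = env::var(name)
        .ok()
        .and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(default_secs);
    Duration::from_secs(secs)
}

impl S3Timeouts {
    fn from_env() -> Self {
        S3Timeouts {
            read: secs_from_env("LIBSQL_BOTTOMLESS_S3_READ_TIMEOUT_SECS", 5),
            connect: secs_from_env("LIBSQL_BOTTOMLESS_S3_CONNECT_TIMEOUT_SECS", 5),
            operation_attempt: secs_from_env(
                "LIBSQL_BOTTOMLESS_S3_OPERATION_ATTEMPT_TIMEOUT_SECS",
                10,
            ),
        }
    }
}

fn main() {
    let t = S3Timeouts::from_env();
    println!(
        "read={:?} connect={:?} attempt={:?}",
        t.read, t.connect, t.operation_attempt
    );
    // With the AWS SDK these values would feed a TimeoutConfig, roughly:
    // TimeoutConfig::builder()
    //     .read_timeout(t.read)
    //     .connect_timeout(t.connect)
    //     .operation_attempt_timeout(t.operation_attempt)
    //     .build()
}
```

The per-attempt timeout is set larger than the read and connect timeouts so that a single slow but progressing request is not cut off before the lower-level timeouts have a chance to fire.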
I'm happy to discuss and/or rework this, but I found that sqld did not recover when object storage was unresponsive.
Adding timeouts to the S3 client fixes this for me, and failures are now detected.
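To illustrate why a timeout turns a silent hang into a detectable failure, here is a standard-library analogy, not the AWS SDK's mechanism: each attempt runs on a worker thread and the caller waits with `mpsc::Receiver::recv_timeout`, so a stalled attempt becomes an `Err` that a retry loop can act on. `call_with_timeout` and `retry_with_timeout` are hypothetical helpers.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a worker thread and waits at most `timeout` for its result.
/// A call that never returns becomes an `Err` instead of blocking the caller.
/// Note: on timeout the worker thread is abandoned, not cancelled.
fn call_with_timeout<T, F>(op: F, timeout: Duration) -> Result<T, &'static str>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send(op());
    });
    rx.recv_timeout(timeout).map_err(|_| "attempt timed out")
}

/// Builds a fresh operation for each attempt and retries after a timeout,
/// giving up after `max_attempts`.
fn retry_with_timeout<T, G, F>(
    mut make_op: F,
    timeout: Duration,
    max_attempts: u32,
) -> Result<T, &'static str>
where
    T: Send + 'static,
    G: FnOnce() -> T + Send + 'static,
    F: FnMut(u32) -> G,
{
    for attempt in 1..=max_attempts {
        if let Ok(value) = call_with_timeout(make_op(attempt), timeout) {
            return Ok(value);
        }
    }
    Err("all attempts timed out")
}

fn main() {
    // Simulate object storage that hangs on the first attempt only.
    let result = retry_with_timeout(
        |attempt| {
            move || {
                if attempt == 1 {
                    thread::sleep(Duration::from_millis(500));
                }
                attempt
            }
        },
        Duration::from_millis(100),
        3,
    );
    assert_eq!(result, Ok(2));
    println!("{result:?}");
}
```

Without the timeout, the first attempt would block the caller for as long as the storage stays unresponsive, and the retry logic would never run. In async code a timeout typically cancels the in-flight request by dropping its future; this thread-based sketch can only abandon the worker.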