-
Notifications
You must be signed in to change notification settings - Fork 14
Refactor storage access layer using obstore and obspec #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Fix logic in _resolve_search_dir and _resolve_path functions - Add detailed docstrings explaining prefix handling behavior - Clarify comments about directory vs file prefix logic
- Add tests/test_caching.py with 40 tests for LRUCache and CachedStore - Add tests/test_httpfs.py with 30 tests for FUSE operations - Test multi-tier caching behavior and scheme-based cache isolation - Test path_to_url helper, load_store function, and all FUSE methods - Use obstore.MemoryStore for realistic testing scenarios - Include proper exception handling and error condition testing
- Refactor CachedStore to use base_url parameter instead of separate scheme - Add dedicated cache key generation methods for metadata and blocks - Include URI scheme in cache keys to prevent collisions across storage backends - Fix critical bug in get_range() method passing wrong path parameter - Simplify list_with_delimiter() implementation
- Update to use obspec.exceptions for proper error handling with map_exception() - Replace isinstance checks with proper exception mapping - Enhance error handling in getattr(), readdir(), and read() methods - Add proper FUSE error codes (EACCES, ENOENT, EIO) for different error types - Update _load_cached_store() method for consistent store creation - Improve logging and debug output formatting
64015cc
to
f36122b
Compare
f36122b
to
9363f9c
Compare
9363f9c
to
02a6a52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the simple-httpfs codebase to modernize its architecture by replacing custom HTTP fetching with standardized cloud storage access via obstore/obspec libraries. The main purpose is to support HTTP(s), S3, GCS, Azure, and FTP protocols through a unified API while implementing a two-tier caching system.
Key changes include:
- Migration from custom HTTP fetching to obspec/obstore for standardized cloud storage access
- Implementation of a custom FTP store following the obspec protocol
- Complete modernization of the package infrastructure from setup.py to pyproject.toml
- Addition of comprehensive test coverage across all major components
Reviewed Changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
simple_httpfs/httpfs.py | Complete rewrite of the core FUSE filesystem using obstore/obspec APIs with improved error handling |
simple_httpfs/_caching.py | New caching layer implementation with LRU cache and CachedStore wrapper |
simple_httpfs/_ftp.py | Custom FTP store implementation following obspec protocol |
tests/ | Comprehensive test suite covering HttpFs, caching layers, and FTP store functionality |
pyproject.toml | Modern Python packaging configuration replacing setup.py |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
block = self.store.get_range( | ||
"", start=block_num * self.block_size, length=self.block_size | ||
) |
Copilot
AI
Sep 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The empty string path parameter ('') appears to be a magic value. Consider adding a comment explaining why an empty path is used here or defining it as a named constant.
Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
This PR migrates from custom HTTP fetching to obspec/obstore for standardized cloud storage access: HTTP(s), S3, GCS, Azure. This includes implementing a custom obspec store for FTP and a
CachedStore
wrapper to manage the metadata caching and the two-tier caching of blocks.There is no longer a separate mount point for each URL scheme. Instead the url scheme is given explicitly and the request is dispatched to the appropriate storage client, e.g.:
simple-httpfs -f --log log.txt /tmp/cloud & cat /tmp/cloud/https://example.com/README.txt... cat /tmp/cloud/s3://example/README.txt...
Configuring the storage clients and their request behavior is supported through the
store_configs
,credential_providers
,client_options
, andretry_config
options. These are not yet exposed via the CLI, but can also be configured with environment variables (see docs). The default behavior includes retry attempts. When no storage config is provided for an object storage backend, we setskip_signature
to True to support public buckets.Additional updates
FUSE interface
readdir
operation to supportls
.noappledouble
)Modernization
Issues