Skip to content

Conversation

nvictus
Copy link
Member

@nvictus nvictus commented Aug 31, 2025

This PR migrates from custom HTTP fetching to obspec/obstore for standardized cloud storage access: HTTP(s), S3, GCS, Azure. This includes implementing a custom obspec store for FTP and a CachedStore wrapper to manage the metadata caching and the two-tier caching of blocks.

There is no longer a separate mount point for each URL scheme. Instead the url scheme is given explicitly and the request is dispatched to the appropriate storage client, e.g.:

simple-httpfs -f --log log.txt /tmp/cloud &

cat /tmp/cloud/https://example.com/README.txt...
cat /tmp/cloud/s3://example/README.txt...

Configuring the storage clients and their request behavior is supported through the store_configs, credential_providers, client_options, and retry_config options. These are not yet exposed via the CLI, but can also be configured with environment variables (see docs). The default behavior includes retry attempts. When no storage config is provided for an object storage backend, we set skip_signature to True to support public buckets.

Additional updates

FUSE interface

  • Improved FUSE operations and error reporting.
  • Implemented the readdir operation to support ls.
  • The trailing sentinel string to recognize URLs is now customizable.
  • Suppress the use of OS cruft files on MacOS (noappledouble)

Modernization

  • Package infra: Migrated from setup.py/versioneer to pyproject.toml with hatchling backend, uv, and ruff.
  • Testing: Added tests covering all major components (HttpFs, caching layers, FTP store). Added a smoke test bash script for manual use as well.
  • Type checking: Added type annotations and strict mypy configuration.

Issues

  • Native background mode doesn't seem to work properly on a mac.

nvictus added 28 commits August 17, 2025 13:32
- Fix logic in _resolve_search_dir and _resolve_path functions
- Add detailed docstrings explaining prefix handling behavior
- Clarify comments about directory vs file prefix logic
- Add tests/test_caching.py with 40 tests for LRUCache and CachedStore
- Add tests/test_httpfs.py with 30 tests for FUSE operations
- Test multi-tier caching behavior and scheme-based cache isolation
- Test path_to_url helper, load_store function, and all FUSE methods
- Use obstore.MemoryStore for realistic testing scenarios
- Include proper exception handling and error condition testing
- Refactor CachedStore to use base_url parameter instead of separate scheme
- Add dedicated cache key generation methods for metadata and blocks
- Include URI scheme in cache keys to prevent collisions across storage backends
- Fix critical bug in get_range() method passing wrong path parameter
- Simplify list_with_delimiter() implementation
- Update to use obspec.exceptions for proper error handling with map_exception()
- Replace isinstance checks with proper exception mapping
- Enhance error handling in getattr(), readdir(), and read() methods
- Add proper FUSE error codes (EACCES, ENOENT, EIO) for different error types
- Update _load_cached_store() method for consistent store creation
- Improve logging and debug output formatting
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the simple-httpfs codebase to modernize its architecture by replacing custom HTTP fetching with standardized cloud storage access via obstore/obspec libraries. The main purpose is to support HTTP(s), S3, GCS, Azure, and FTP protocols through a unified API while implementing a two-tier caching system.

Key changes include:

  • Migration from custom HTTP fetching to obspec/obstore for standardized cloud storage access
  • Implementation of a custom FTP store following the obspec protocol
  • Complete modernization of the package infrastructure from setup.py to pyproject.toml
  • Addition of comprehensive test coverage across all major components

Reviewed Changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
simple_httpfs/httpfs.py Complete rewrite of the core FUSE filesystem using obstore/obspec APIs with improved error handling
simple_httpfs/_caching.py New caching layer implementation with LRU cache and CachedStore wrapper
simple_httpfs/_ftp.py Custom FTP store implementation following obspec protocol
tests/ Comprehensive test suite covering HttpFs, caching layers, and FTP store functionality
pyproject.toml Modern Python packaging configuration replacing setup.py

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +212 to +214
block = self.store.get_range(
"", start=block_num * self.block_size, length=self.block_size
)
Copy link

Copilot AI Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The empty string path parameter ('') appears to be a magic value. Consider adding a comment explaining why an empty path is used here or defining it as a named constant.

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant