| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +py-gtfs-loader is a Python library for loading and manipulating GTFS (General Transit Feed Specification) data. It parses GTFS directories into Python objects with schema validation and provides utilities for reading, modifying, and writing GTFS feeds. |
| 8 | + |
| 9 | +## Development Commands |
| 10 | + |
| 11 | +### Using uv (package manager) |
| 12 | + |
| 13 | +```bash |
| 14 | +# Install dependencies |
| 15 | +uv sync --all-extras --dev |
| 16 | + |
| 17 | +# Run tests |
| 18 | +uv run pytest . |
| 19 | + |
| 20 | +# Run linting |
| 21 | +uv run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics |
| 22 | +uv run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics |
| 23 | + |
| 24 | +# Build package |
| 25 | +uv build |
| 26 | + |
| 27 | +# Run a single test file |
| 28 | +uv run pytest tests/test_runner.py |
| 29 | + |
| 30 | +# Run a specific test case |
| 31 | +uv run pytest tests/test_runner.py::test_default -k "test_name" |
| 32 | +``` |
| 33 | + |
| 34 | +## Architecture |
| 35 | + |
| 36 | +### Core Components |
| 37 | + |
| 38 | +**`gtfs_loader/__init__.py`** - Main entry point with load/patch functions |
| 39 | +- `load(gtfs_dir, ...)`: Parses GTFS directory into structured objects |
| 40 | +- `patch(gtfs, gtfs_in_dir, gtfs_out_dir, ...)`: Modifies and writes GTFS data back to disk |
| 41 | +- Supports both standard GTFS and the Transit itinerary format via the `itineraries=True` flag |
| 42 | +- Supports CSV and GeoJSON file types |
| 43 | + |
| 44 | +**`gtfs_loader/schema.py`** - GTFS entity definitions and schemas |
| 45 | +- Defines all GTFS entities (Agency, Route, Trip, Stop, StopTime, etc.) |
| 46 | +- Entity classes have a `_schema` attribute describing the file structure (ID, grouping, required fields) |
| 47 | +- Two schema collections: `GTFS_SUBSET_SCHEMA` (standard) and `GTFS_SUBSET_SCHEMA_ITINERARIES` (Transit format) |
| 48 | +- Entities reference other entities via the `_gtfs` attribute (e.g., `stop_time.stop` resolves to a Stop object) |
| 49 | + |
| 50 | +**`gtfs_loader/schema_classes.py`** - Schema metadata system |
| 51 | +- `File`: Describes GTFS file structure (primary key, grouping, file type) |
| 52 | +- `Field`: Named tuple for field configuration (type, required, default) |
| 53 | +- `FileCollection`: Container for file schemas |
| 54 | +- Grouping support: entities that share the same ID can be grouped by a secondary key (e.g., stop_times grouped by trip_id + stop_sequence) |
| 55 | + |
| 56 | +**`gtfs_loader/types.py`** - Custom types and base classes |
| 57 | +- `GTFSTime`: Integer-based time allowing >24h (e.g., "25:30:00" for next-day services) |
| 58 | +- `GTFSDate`: datetime subclass parsing YYYYMMDD and YYYY-MM-DD formats |
| 59 | +- `Entity`: Base class for all GTFS entities, dict-like with `_gtfs` reference to parent collection |
| 60 | +- `EntityDict`: Dict subclass storing resolved field metadata |
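The integer-based time idea can be sketched as follows. This is a minimal illustration, not the library's `GTFSTime`: the class name `SketchTime` and its `parse` method are hypothetical, and the sketch assumes times are stored as seconds since the start of the service day.

```python
class SketchTime(int):
    """Hypothetical stand-in for an integer-based GTFS time.

    Assumption: the value is the number of seconds since the start of the
    service day, so times past midnight simply exceed 24 * 3600.
    """

    @classmethod
    def parse(cls, text: str) -> "SketchTime":
        # "HH:MM:SS" where HH may be 24 or more for next-day services.
        hours, minutes, seconds = (int(part) for part in text.split(":"))
        return cls(hours * 3600 + minutes * 60 + seconds)

    def __str__(self) -> str:
        total = int(self)
        return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


late = SketchTime.parse("25:30:00")
evening = SketchTime.parse("23:59:59")
```

Because the value is a plain integer, comparisons and arithmetic across midnight need no special casing: `evening < late` holds, and `str(late)` round-trips to `"25:30:00"`.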
| 61 | + |
| 62 | +### Data Flow |
| 63 | + |
| 64 | +1. **Load**: CSV/GeoJSON → parse headers → validate fields → create Entity objects → index by ID → return nested dict structure |
| 65 | +2. **Access**: `gtfs.stops['stop_id']` or `gtfs.stop_times['trip_id'][sequence_index]` |
| 66 | +3. **Patch**: Flatten nested structures → write CSV with correct headers → preserve unmodified files |
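The load and access steps above can be approximated with the standard library alone. This is a rough sketch under stated assumptions, not the code in `gtfs_loader`: the helper names (`read_rows`, `index_by_id`, `group_by`) and the inline sample data are hypothetical, and rows are left as plain dicts rather than Entity objects.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-ins for stops.txt and stop_times.txt.
STOPS_CSV = "stop_id,stop_name\nS1,Main St\nS2,Oak Ave\n"
STOP_TIMES_CSV = "trip_id,stop_sequence,stop_id\nT1,2,S2\nT1,1,S1\n"


def read_rows(text, required):
    """Parse headers, check that required columns exist, return row dicts."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(required) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return list(reader)


def index_by_id(rows, key):
    """Index primary entities by their ID column."""
    return {row[key]: row for row in rows}


def group_by(rows, key, sort_key):
    """Group rows sharing an ID into lists ordered by a secondary key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for group in groups.values():
        group.sort(key=lambda row: int(row[sort_key]))
    return dict(groups)


stops = index_by_id(read_rows(STOPS_CSV, ["stop_id"]), "stop_id")
stop_times = group_by(
    read_rows(STOP_TIMES_CSV, ["trip_id", "stop_sequence"]),
    "trip_id",
    "stop_sequence",
)
first_stop_time = stop_times["T1"][0]
```

Here `stops["S1"]` mirrors the `gtfs.stops['stop_id']` access pattern, and `stop_times["T1"][0]` mirrors `gtfs.stop_times['trip_id'][sequence_index]`.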
| 67 | + |
| 68 | +### Key Patterns |
| 69 | + |
| 70 | +- **Entity indexing**: Primary entities are indexed by their `id` field; grouped entities form nested dicts/lists |
| 71 | +- **Cross-references**: Entities access related data via the `_gtfs` back-reference (e.g., `trip.route`, `stop_time.stop`) |
| 72 | +- **Computed properties**: Use `@cached_property` for derived values (e.g., `trip.first_departure`) |
| 73 | +- **Two GTFS formats**: Standard (stop_times.txt) vs Transit itinerary format (itinerary_cells.txt + trip arrays) |
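The cross-reference and computed-property patterns might look roughly like this. The classes below (`SketchFeed`, `SketchRoute`, `SketchTrip`) are hypothetical illustrations and do not reproduce the library's `Entity` base class; only `functools.cached_property` is a real API here.

```python
from functools import cached_property


class SketchFeed:
    """Hypothetical parent collection that entities point back to."""

    def __init__(self):
        self.routes = {}
        self.trips = {}


class SketchRoute:
    def __init__(self, route_id, name):
        self.route_id = route_id
        self.name = name


class SketchTrip:
    def __init__(self, gtfs, trip_id, route_id, departures):
        self._gtfs = gtfs  # back-reference to the parent collection
        self.trip_id = trip_id
        self.route_id = route_id
        self.departures = departures  # seconds since start of service day

    @property
    def route(self):
        # Resolve the related entity through the back-reference.
        return self._gtfs.routes[self.route_id]

    @cached_property
    def first_departure(self):
        # Computed once on first access, then stored on the instance.
        return min(self.departures)


feed = SketchFeed()
feed.routes["R1"] = SketchRoute("R1", "Crosstown")
feed.trips["T1"] = SketchTrip(feed, "T1", "R1", [29100, 28800, 29400])
trip = feed.trips["T1"]
```

One consequence of caching worth keeping in mind: if `departures` is modified after `first_departure` has been read, the cached value is stale until it is deleted from the instance (`del trip.first_departure`).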
| 74 | + |
| 75 | +## Itinerary Format Support |
| 76 | + |
| 77 | +The library supports Transit's custom itinerary format where: |
| 78 | +- `itinerary_cells.txt` defines stop sequences (like templates) |
| 79 | +- Trips reference itineraries and contain time arrays instead of individual stop_times |
| 80 | +- Pass `itineraries=True` when loading or patching to use this format |
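To make the template idea concrete, the sketch below expands one trip's time array against an itinerary's stop sequence to recover stop_times-like rows. All field names (`itinerary_id`, `departure_times`) and the function `expand_trip` are assumptions chosen for illustration; the actual column names in Transit's format are not stated here.

```python
# Hypothetical itinerary: an ordered stop sequence shared by many trips.
itinerary_cells = {"IT1": ["S1", "S2", "S3"]}

# Hypothetical trip: references an itinerary and carries one time per stop.
sample_trip = {
    "trip_id": "T1",
    "itinerary_id": "IT1",
    "departure_times": [28800, 29100, 29400],
}


def expand_trip(trip, cells):
    """Pair each stop in the itinerary with the trip's time at that position."""
    stop_ids = cells[trip["itinerary_id"]]
    times = trip["departure_times"]
    if len(stop_ids) != len(times):
        raise ValueError("time array length must match the itinerary length")
    return [
        {
            "trip_id": trip["trip_id"],
            "stop_sequence": sequence,
            "stop_id": stop_id,
            "departure_time": time,
        }
        for sequence, (stop_id, time) in enumerate(zip(stop_ids, times), start=1)
    ]


expanded = expand_trip(sample_trip, itinerary_cells)
```

The design trade-off is that the stop sequence is stored once per itinerary instead of once per trip, so trips sharing a pattern only need their time arrays.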