to-data-library is a Python library for data extraction, transformation, and loading (ETL) across multiple platforms (GCS, S3, BigQuery, FTP, etc). It is intended to be imported and used as a module within other data engineering projects, scripts, or pipelines. This is not a standalone application and is not designed to be run directly, in Docker, or via Airflow on its own.
- Extracts and loads data from Google Cloud Storage, S3, BigQuery, FTP, and more.
- Provides transformation utilities for pandas DataFrames (see the illustration after this list).
- Automated dependency management and code linting.
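The transform utilities' API is not documented in this README, so the snippet below is a plain-pandas sketch of the kind of DataFrame clean-up such utilities perform; it makes no to_data_library calls:

```python
import pandas as pd

# Plain-pandas illustration; to_data_library's own transform helpers
# are not documented in this README.
df = pd.DataFrame({"city": ["London", "Paris"], "visits": ["10", "20"]})
df["visits"] = df["visits"].astype(int)        # normalise column types
df = df.rename(columns={"city": "city_name"})  # standardise column names
```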
to-data-library/
├── to_data_library/
│ ├── data/ # Core data transfer logic (GCS, S3, BQ, FTP)
│ └── __init__.py
├── tests/ # Unit tests and test data
├── devops/ # CI/CD buildspecs
├── requirements.in # Python dependencies (source)
├── requirements.txt # Python dependencies (compiled)
├── setup.py # Python package setup
├── PYTHON_VERSION # Python version pin
└── README.md
- Python 3.10.x (see PYTHON_VERSION)
- timeout-tools for environment setup
# Install timeout-tools
pip install git+ssh://[email protected]/timeoutdigital/timeout-tools
# Clone the repo and set up Python environment
git clone [email protected]:timeoutdigital/to-data-library.git
cd to-data-library
timeout-tools python-setup
Or, using workspace setup:
timeout-tools ws to-data-library <jira_ticket>
- setup.py includes a list of the third-party packages required by this package when distributed.
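For reference, install_requires in setup.py typically takes the shape below; the package names here are placeholders, not the actual dependency list:

```python
# Illustrative shape only; the real dependency list lives in setup.py.
from setuptools import setup, find_packages

setup(
    name="to-data-library",
    packages=find_packages(),
    install_requires=[
        # placeholder entries, not the actual third-party packages
        "pandas",
        "google-cloud-bigquery",
    ],
)
```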
To run the tests, install the requirements and run the unit tests under coverage:
invoke python-install-requirements
coverage run -m unittest
coverage report
PR tests will fail if coverage is lower than the value defined in devops/pr-buildspec.yml:
grep fail-under devops/pr-buildspec.yml
# Example: coverage report --fail-under=78
Increase the value as coverage improves.
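For orientation, a CodeBuild buildspec carrying that threshold might look roughly like this (a hypothetical excerpt, not the contents of devops/pr-buildspec.yml):

```yaml
# Hypothetical excerpt; see devops/pr-buildspec.yml for the real file.
phases:
  build:
    commands:
      - coverage run -m unittest
      - coverage report --fail-under=78
```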
You can also use:
pytest
Import to_data_library in your own Python scripts or projects:
from to_data_library.data import transfer
client = transfer.Client(project='my-gcp-project')
# Use client methods for data transfer, e.g. client.gs_to_bq(...)
This library is not intended to be run directly or as a standalone service. It does not provide a CLI or entrypoint script.
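As a fuller sketch, the client might be wired into a pipeline script as below; gs_to_bq is named above, but the keyword arguments are illustrative placeholders, not the documented signature:

```python
from to_data_library.data import transfer

def run_pipeline():
    # Client scoped to a GCP project, as in the example above.
    client = transfer.Client(project="my-gcp-project")
    # gs_to_bq is named in this README; the keyword arguments below are
    # illustrative placeholders, not the documented signature.
    client.gs_to_bq(
        gs_uris=["gs://my-bucket/exports/*.csv"],
        table="my-gcp-project.my_dataset.my_table",
    )

if __name__ == "__main__":
    run_pipeline()
```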
- Data apps that consume this library run with the permissions granted by specific prod/staging AWS IAM Roles.
- These AWS IAM Roles are defined in:
- Build and linting are handled via AWS CodeBuild using buildspecs in devops/.
- Pre-commit hooks are configured in .pre-commit-config.yaml, if present (a minimal example follows below).
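A minimal .pre-commit-config.yaml has the shape below; the hooks shown are generic examples, not necessarily the repo's actual configuration:

```yaml
# Generic example; the repo's actual hook list may differ.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```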
- Compile requirements: invoke python-build-requirements
- Upgrade requirements: invoke python-upgrade-requirements
- Install requirements: invoke python-install-requirements
- Run pre-commit hooks: pre-commit run --all-files
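For readers unfamiliar with invoke, a task such as python-build-requirements is defined roughly as below; this is a generic sketch assuming a pip-tools based compile step, not this repo's actual tasks.py:

```python
# Generic invoke task sketch; not this repo's actual tasks.py.
from invoke import task

@task
def python_build_requirements(c):
    # pip-tools compiles requirements.in into the pinned requirements.txt
    c.run("pip-compile requirements.in --output-file requirements.txt")
```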
Proprietary - Timeout.com
For questions, contact the Data Engineering team at Timeout.com.