Make sure you have the following installed on your system:
pip install -r requirements.txt
- Manually download the startup investment data from the link and extract its contents to the folder named data/startup-investments in the root directory of the repository.
Or download the data by providing your Kaggle API token:
- Log in to your Kaggle account.
- Locate your username and API key. The credentials can be obtained from the account settings.
- Enter the credentials in the terminal.
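The token-based download can be sketched roughly as follows. This is a minimal illustration, not the repository's own script: the helper names `set_kaggle_credentials` and `download_dataset` are hypothetical, the dataset slug is left for the caller to supply, and it assumes the `kaggle` package is installed. The Kaggle client reads the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables.

```python
import os


def set_kaggle_credentials(username: str, key: str) -> None:
    """Expose the credentials through the environment variables the Kaggle client reads."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def download_dataset(dataset_slug: str, target_dir: str = "data/startup-investments") -> None:
    """Download and unzip a Kaggle dataset (given as 'owner/dataset') into target_dir."""
    # Imported lazily on purpose: the kaggle package may try to authenticate
    # as soon as it is imported, so the credentials should be set first.
    from kaggle.api.kaggle_api_extended import KaggleApi

    os.makedirs(target_dir, exist_ok=True)
    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset_slug, path=target_dir, unzip=True)
```

A caller would first run `set_kaggle_credentials(username, key)` with the values from the account settings, then `download_dataset(...)` with the slug of the dataset behind the link.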
Final folder structure should look like:
└── startup-investments
    ├── csv_file1.csv
    ├── csv_file2.csv
    ├── csv_file3.csv
    └── ...
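Before building the image, it can help to confirm that the data landed in the expected place. The sketch below is an optional check, not part of the repository; the function name `list_csv_files` is hypothetical, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path
from typing import List


def list_csv_files(data_dir: str = "data/startup-investments") -> List[str]:
    """Return the sorted names of the CSV files in data_dir, raising if none exist."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Missing data folder: {folder}")
    # Only files directly inside the folder are considered, matching the layout above.
    csv_files = sorted(p.name for p in folder.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {folder}")
    return csv_files
```

Calling `list_csv_files()` from the repository root either returns the file names or fails with a message naming what is missing.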
In the terminal, navigate to the root directory of the repository.
docker build . -t airflow
Step Notes
This step might take some time on the first run, depending on which Python packages already exist on the system.
An admin Airflow user is created by default for convenience when testing. Remove it from the Dockerfile in production environments.
docker-compose -f docker-compose.yml up
Step Notes
- You can reach the Airflow webserver at: http://localhost:8080. Log in with the default account. (Username: admin, Password: admin)
- You can reach the MongoDB instance at: http://localhost:8081.
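The containers can take a while to start accepting connections. As a rough sketch (the helper `wait_for_port` is hypothetical and only checks that a TCP port is open, not that the service behind it is healthy), one could wait for both ports like this:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8080)` covers the Airflow webserver and `wait_for_port("localhost", 8081)` covers the MongoDB address listed above.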
- Navigate to the root directory of the repo via the terminal.
- The search should return 10 documents containing the query, if any exist. (Note: the search is not case-sensitive.)
python '<query>'
python3 '<query>'
python 'ai'
python3 'ai'
- You can check the MongoDB instance, database, and collection at: http://localhost:8081.
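One way such a case-insensitive search capped at 10 results could be written with pymongo is sketched below. This is an assumption about the approach, not the repository's actual implementation: the function names and the default field name `name` are hypothetical, and `collection` is expected to be a pymongo collection.

```python
import re
from typing import Any, Dict, List


def build_search_filter(query: str, field: str = "name") -> Dict[str, Any]:
    """Build a MongoDB filter matching documents whose field contains query, ignoring case."""
    # re.escape keeps characters such as "." or "+" from being read as regex syntax.
    return {field: {"$regex": re.escape(query), "$options": "i"}}


def search(collection: Any, query: str, field: str = "name", limit: int = 10) -> List[dict]:
    """Return at most `limit` documents from the collection that match the query."""
    return list(collection.find(build_search_filter(query, field)).limit(limit))
```

With this filter, a query of `'ai'` and a query of `'AI'` match the same documents.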
Navigate to the root directory of the repo via the terminal.
- You can check compliance with the Google Python style guide by running:
pylint dags
pylint src
pylint tests
- You can see the test coverage using:
coverage run -m pytest
coverage report --fail-under=100
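If it is more convenient to run the lint and coverage commands in one go, a small wrapper along these lines could be used. It is only a sketch: `CHECKS` and `run_checks` are hypothetical names, and it assumes `pylint`, `coverage`, and `pytest` are available on the PATH.

```python
import subprocess
from typing import Callable, List, Sequence

# The same commands listed above, in the same order.
CHECKS: List[List[str]] = [
    ["pylint", "dags"],
    ["pylint", "src"],
    ["pylint", "tests"],
    ["coverage", "run", "-m", "pytest"],
    ["coverage", "report", "--fail-under=100"],
]


def run_checks(commands: Sequence[Sequence[str]],
               runner: Callable = subprocess.run) -> List[List[str]]:
    """Run every command in order and return the ones that exited with a non-zero code."""
    failed = []
    for command in commands:
        # check=False: keep going so every failing command is reported, not just the first.
        result = runner(list(command), check=False)
        if result.returncode != 0:
            failed.append(list(command))
    return failed
```

`run_checks(CHECKS)` returns an empty list when all checks pass.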