
This project loads data into a cachedb database (see the demo description and the Solution diagram and components description). It can load data generated by the artificial-data-generator project and prepared by the Jupyter notebooks in training-with-artaficial-data.

Contents:

  • data folder -- holds a copy of the data to be loaded
  • create_db.py -- Python script that creates the cache database tables
  • load_data.py -- Python script that loads the data into the database
  • requirements.txt -- Python requirements for the scripts
  • Dockerfile -- used to build a Docker image that can run the scripts

Environment variables

The Python scripts read the following environment variables (a sketch of how they might be consumed follows the list):

  • DATA_PATH -- path to the data folder
  • POSTGRES_USR -- cachedb user name
  • POSTGRES_PW -- cachedb password
  • POSTGRES_DB -- cachedb database name
  • POSTGRES_HST -- postgres host
  • RECREATE_DATABASE -- if set to true, the database is dropped and recreated before the tables are created (default: false)
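
The exact logic lives in the scripts themselves; the snippet below is only a minimal sketch of how create_db.py might consume the Postgres variables and the RECREATE_DATABASE flag, assuming a psycopg2-style driver (check requirements.txt for the driver the scripts actually use):

import os
import psycopg2  # assumed driver; see requirements.txt for what the scripts actually use

db_name = os.environ["POSTGRES_DB"]
recreate = os.getenv("RECREATE_DATABASE", "false").lower() == "true"

# Connect to the maintenance database so the target database can be (re)created.
admin = psycopg2.connect(
    host=os.environ["POSTGRES_HST"],
    dbname="postgres",
    user=os.environ["POSTGRES_USR"],
    password=os.environ["POSTGRES_PW"],
)
admin.autocommit = True
with admin.cursor() as cur:
    if recreate:
        cur.execute(f'DROP DATABASE IF EXISTS "{db_name}"')
        cur.execute(f'CREATE DATABASE "{db_name}"')
admin.close()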

Local development

Run Postgres in a Docker container:

export POSTGRES_USR=cacheUser
export POSTGRES_PW=cachePass
export POSTGRES_DB=cacheDb
export POSTGRES_HST=localhost
docker run --name postgres -p 5432:5432 -e POSTGRES_USER=$POSTGRES_USR -e POSTGRES_PASSWORD=$POSTGRES_PW -e POSTGRES_DB=$POSTGRES_DB -d postgres

Run the scripts

Set the environment variables:

export DATA_PATH=$(pwd)/data
export POSTGRES_USR=cacheUser
export POSTGRES_PW=cachePass
export POSTGRES_DB=cacheDb
export POSTGRES_HST=localhost

Now you can run the scripts:

python create_db.py
python load_data.py
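
load_data.py reads the CSV files from DATA_PATH and inserts them into the tables created by create_db.py. The sketch below only illustrates the general loading pattern, not the script's actual implementation; the psycopg2 COPY approach and the one-table-per-CSV naming are assumptions:

import glob
import os
import psycopg2  # assumed driver; see requirements.txt

conn = psycopg2.connect(
    host=os.environ["POSTGRES_HST"],
    dbname=os.environ["POSTGRES_DB"],
    user=os.environ["POSTGRES_USR"],
    password=os.environ["POSTGRES_PW"],
)
with conn, conn.cursor() as cur:
    for csv_path in sorted(glob.glob(os.path.join(os.environ["DATA_PATH"], "*.csv"))):
        # Hypothetical convention: table name matches the CSV file name.
        table = os.path.splitext(os.path.basename(csv_path))[0]
        with open(csv_path) as f:
            cur.copy_expert(f'COPY "{table}" FROM STDIN WITH CSV HEADER', f)
conn.close()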

Run the scripts in a container

Build the image:

docker build -t cachedb-load-data .

Then you can run the scripts by providing the environment variables, for example:

docker run -it --net=host \
  --env DATA_PATH=/loader/data \
  --env POSTGRES_HST=127.0.0.1 \
  --env POSTGRES_DB=cacheDb \
  --env POSTGRES_USR=cacheUser \
  --env POSTGRES_PW=cachePass \
  cachedb-load-data

You can also mount the data folder into the container at runtime (for example, to load different CSV files than the ones baked into the image):

docker run -it --net=host \
  -v $(pwd)/data:/loader/data \
  --env DATA_PATH=/loader/data \
  --env POSTGRES_HST=localhost \
  --env POSTGRES_DB=cacheDb \
  --env POSTGRES_USR=cacheUser \
  --env POSTGRES_PW=cachePass \
  cachedb-load-data

Run the scripts in Kubernetes/OpenShift

The scripts can be run in Kubernetes/OpenShift as a job. An example job definition is provided in cachedb-load-data-job.yaml.

Update the environment variables in the job definition to match your environment, including the Postgres service name, username, password, and database name.

Then you can run the job:

kubectl apply -f cachedb-load-data-job.yaml

Future work

Right now the scripts use static data embedded in the image. In the future, a script could be added that downloads the data from a remote location, for example from a central config server.
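
A minimal sketch of what such a download step might look like, assuming the files are served over plain HTTPS; the URL and file names are placeholders, and the requests dependency would need to be added to requirements.txt:

import os
import requests  # hypothetical extra dependency

BASE_URL = "https://example.com/cachedb-data"  # placeholder for the remote data location
FILES = ["example.csv"]                        # placeholder file list

os.makedirs(os.environ["DATA_PATH"], exist_ok=True)
for name in FILES:
    resp = requests.get(f"{BASE_URL}/{name}", timeout=30)
    resp.raise_for_status()
    with open(os.path.join(os.environ["DATA_PATH"], name), "wb") as f:
        f.write(resp.content)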

For OpenShift usage, the Helm charts could be updated to include the scripts and the data (for example, as a post-install job), and the image could be built with a BuildConfig.