This project can be used to load data into a cachedb database (see the demo description and the Solution diagram and components description). It loads data generated by the artificial-data-generator project and prepared by the Jupyter notebooks in training-with-artaficial-data.
Contents:
- data folder -- holds a copy of the data to be loaded
- create_db.py -- Python script that creates the cache database tables
- load_data.py -- Python script that loads the data into the database
- requirements.txt -- Python requirements for the scripts
- Dockerfile -- used to build a Docker image that can run the scripts
The Python scripts use the following environment variables:
- DATA_PATH -- path to the data folder
- POSTGRES_USR -- cachedb user name
- POSTGRES_PW -- cachedb password
- POSTGRES_DB -- cachedb database name
- POSTGRES_HST -- Postgres host
- RECREATE_DATABASE -- if set to true, the database is dropped and recreated before the tables are created (default: false)
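To illustrate how these variables are consumed, here is a minimal sketch of building a database connection from them. It assumes the scripts use psycopg2 (the actual driver used by create_db.py and load_data.py may differ); the defaults mirror the list above.

import os
import psycopg2  # assumption: the scripts may use a different Postgres driver

# Read the configuration described above; RECREATE_DATABASE defaults to false.
data_path = os.environ["DATA_PATH"]
recreate_database = os.environ.get("RECREATE_DATABASE", "false").lower() == "true"

# Connect to the cachedb Postgres instance.
conn = psycopg2.connect(
    host=os.environ["POSTGRES_HST"],
    dbname=os.environ["POSTGRES_DB"],
    user=os.environ["POSTGRES_USR"],
    password=os.environ["POSTGRES_PW"],
)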
Run Postgres in a Docker container:
export POSTGRES_USR=cacheUser
export POSTGRES_PW=cachePass
export POSTGRES_DB=cacheDb
export POSTGRES_HST=localhost
docker run --name postgres -p 5432:5432 -e POSTGRES_USER=$POSTGRES_USR -e POSTGRES_PASSWORD=$POSTGRES_PW -d postgres
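As an optional sanity check that the database is up and accepting connections (pg_isready ships with the official postgres image):
docker exec -it postgres pg_isready -U $POSTGRES_USR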
Set the environment variables:
export DATA_PATH=$(pwd)/data
export POSTGRES_USR=cacheUser
export POSTGRES_PW=cachePass
export POSTGRES_DB=cacheDb
export POSTGRES_HST=localhost
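Install the Python dependencies for the scripts (a standard pip install of the listed requirements):
pip install -r requirements.txt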
Now you can run the scripts:
python create_db.py
python load_data.py
Build the image:
docker build -t cachedb-load-data .
Then you can run the scripts by providing the environment variables, for example:
docker run -it --net=host \
--env DATA_PATH=/loader/data \
--env POSTGRES_HST=127.0.0.1 \
--env POSTGRES_DB=cacheDb \
--env POSTGRES_USR=cacheUser \
--env POSTGRES_PW=cachePass \
cachedb-load-data
You can also mount the data folder into the container at runtime (for example, with different CSV files than the ones baked into the image):
docker run -it --net=host \
-v $(pwd)/data:/loader/data \
--env DATA_PATH=/loader/data \
--env POSTGRES_HST=localhost \
--env POSTGRES_DB=cacheDb \
--env POSTGRES_USR=cacheUser \
--env POSTGRES_PW=cachePass \
cachedb-load-data
The scripts can be run in Kubernetes/OpenShift as a job. An example job definition is in cachedb-load-data-job.yaml.
Update the environment variables in the job definition to match your environment, including the Postgres service name, username, password and database name.
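For orientation, a minimal sketch of what such a job definition typically looks like is shown below; the image name, service name and values are placeholders, so use the real cachedb-load-data-job.yaml from this repository:

apiVersion: batch/v1
kind: Job
metadata:
  name: cachedb-load-data
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cachedb-load-data
          image: cachedb-load-data:latest   # placeholder: use your registry/tag
          env:
            - name: DATA_PATH
              value: /loader/data
            - name: POSTGRES_HST
              value: postgres               # placeholder: your Postgres service name
            - name: POSTGRES_DB
              value: cacheDb
            - name: POSTGRES_USR
              value: cacheUser
            - name: POSTGRES_PW
              value: cachePass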
Then you can run the job:
kubectl apply -f cachedb-load-data-job.yaml
Currently the scripts use static data embedded in the image. In the future, a script could be added that downloads the data from a remote location, for example from a central config server.
For OpenShift usage, the Helm charts could be updated to include the scripts and the data (for example, as a post-install job), and the image could be built with a BuildConfig.