- First, create a Docker network so that the Spark cluster and odd-platform can communicate:
  ```shell
  docker network create spark
  ```
- Next, start all services except the collector, which will be started later.
  Services:
  - `database` - PostgreSQL database for odd-platform
  - `odd-platform` - OpenDataDiscovery Platform, started at http://localhost:8082
  - `spark` and `spark-worker` - Spark master and worker
  - `minio` - MinIO storage for Delta Lake, started at http://localhost:9000 with credentials `minioadmin:minioadmin`
  - `jupyter` - Jupyter notebook with a demo notebook, started at http://localhost:8890
  ```shell
  docker compose up -d database odd-platform spark spark-worker minio jupyter
  ```
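  Optionally, you can check that the MinIO service is reachable before moving on. The snippet below is a minimal sketch, assuming `boto3` is installed on your host; the endpoint and `minioadmin:minioadmin` credentials are the ones listed above, and the buckets you see depend on what the demo has created so far.
  ```python
  # Sanity check: list MinIO buckets through its S3-compatible API.
  import boto3

  s3 = boto3.client(
      "s3",
      endpoint_url="http://localhost:9000",  # minio service started above
      aws_access_key_id="minioadmin",
      aws_secret_access_key="minioadmin",
  )

  # Print every bucket currently present in MinIO.
  for bucket in s3.list_buckets()["Buckets"]:
      print(bucket["Name"])
  ```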
- After all services are started, you can open the Jupyter notebook at http://localhost:8890/lab/tree/delta_lake.ipynb
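  The demo notebook works with Delta Lake tables stored in MinIO. If you want a standalone reference for what such a job looks like, the sketch below writes a tiny Delta table over the S3A connector. It is an illustration only, not the notebook's exact code: the bucket name `demo` is an assumption, and it presumes `pyspark`, `delta-spark`, and the Hadoop S3A/AWS jars are available in your environment.
  ```python
  # Minimal PySpark sketch: write a small Delta table to MinIO via S3A.
  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder.appName("delta-minio-demo")
      # Delta Lake integration
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      # S3A connector pointed at the MinIO service started above
      .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
      .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
      .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      .getOrCreate()
  )

  # A tiny example DataFrame to persist as a Delta table.
  df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])

  # Write the DataFrame as a Delta table into the (assumed) "demo" bucket.
  df.write.format("delta").mode("overwrite").save("s3a://demo/delta/sample_table")

  spark.stop()
  ```
  When run from inside the `jupyter` container, the S3A endpoint would typically be the compose service name (e.g. `http://minio:9000`) rather than `localhost`.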
- For the collector, we need to create a token with any name and use it in the collector configuration. The following command creates a token named `odd_collector` and generates the collector configuration file `odd_collector.yaml` in the `/config` directory.
  ```shell
  sh create_collector.sh odd_collector
  ```
- Now that we have the collector configuration file, we can start the collector. Fetching data from the MinIO storage can take some time.
  ```shell
  docker compose up -d odd-collector-aws
  ```
- Open the OpenDataDiscovery Platform UI at http://localhost:8082 to check the results.
- Cleanup
  ```shell
  docker compose down --volumes
  ```
  Remove the network:
  ```shell
  docker network rm spark
  ```