Demo example of Delta Lake tables stored in MinIO storage and discovered with the OpenDataDiscovery Platform.

1. First, create a Docker network so that the Spark cluster and odd-platform can reach each other:

```sh
docker network create spark
```
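For orientation only: a Compose project that relies on a pre-created network usually declares it as external. The excerpt below is a hypothetical sketch, not the repository's actual docker-compose.yml:

```yaml
# Hypothetical excerpt - the demo's real docker-compose.yml may differ.
# Declaring the network as external makes Compose reuse the "spark"
# network created above instead of creating its own.
networks:
  spark:
    external: true
# Each service would then attach to it, e.g.:
#   services:
#     spark:
#       networks:
#         - spark
```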
2. Next, start all services except the collector, which will be started later. The services are:
    1. database - a PostgreSQL database for odd-platform
    2. odd-platform - the OpenDataDiscovery Platform, available at http://localhost:8082
    3. spark and spark-worker - the Spark master and a Spark worker
    4. minio - MinIO storage for Delta Lake, available at http://localhost:9000 with credentials minioadmin:minioadmin
    5. jupyter - a Jupyter server with the demo notebook, available at http://localhost:8890

```sh
docker compose up -d database odd-platform spark spark-worker minio jupyter
```
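Optionally, confirm that all containers came up before moving on:

```sh
docker compose ps
```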
3. After all services are started, open the Jupyter notebook at http://localhost:8890/lab/tree/delta_lake.ipynb and run it.
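The notebook writes Delta tables to MinIO over the S3A protocol. Below is a minimal, hypothetical sketch of that kind of setup; the endpoint, credentials, and bucket name are assumptions based on the defaults listed above, not necessarily what delta_lake.ipynb actually does:

```python
# Hypothetical sketch: a Spark session with Delta Lake support writing
# to MinIO through the S3A connector (the delta-spark and hadoop-aws
# jars are assumed to be available in the Jupyter/Spark images).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-minio-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # MinIO speaks the S3 API; credentials match the defaults above.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Write a tiny Delta table to a bucket (the "demo" bucket name is an assumption).
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.format("delta").mode("overwrite").save("s3a://demo/delta_table")
```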
4. For the collector, we need to create a token (any name will do) and use it in the collector configuration. The command below creates a token named odd_collector and generates the collector configuration file odd_collector.yaml in the '/config' directory:

```sh
sh create_collector.sh odd_collector
```
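The generated odd_collector.yaml is the source of truth; for orientation, ODD collector configs generally follow the shape sketched below. The plugin type and its options here are assumptions, so check the file under '/config' for the real values:

```yaml
# Hypothetical shape of odd_collector.yaml - values and the exact Delta
# plugin type/options are assumptions; rely on the generated file.
platform_host_url: http://odd-platform:8080   # assumed in-network URL
default_pulling_interval: 10                  # assumed interval between runs
token: "<token created by create_collector.sh>"
plugins:
  - type: s3_delta            # assumed plugin for Delta tables on S3/MinIO
    name: minio_delta_adapter
    # Connection details for MinIO (illustrative, not authoritative):
    # endpoint_url: http://minio:9000
    # aws_access_key_id: minioadmin
    # aws_secret_access_key: minioadmin
```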
5. Now that the collector configuration file exists, start the collector. Fetching data from MinIO storage can take some time:

```sh
docker compose up -d odd-collector-aws
```
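Since the initial ingestion can take a while, you can follow the collector's progress in its logs:

```sh
docker compose logs -f odd-collector-aws
```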
6. Open the OpenDataDiscovery Platform UI at http://localhost:8082 to check the results.
7. Cleanup:

```sh
docker compose down --volumes
```

Remove the network:

```sh
docker network rm spark
```