 
 Big data stack running in pseudo-distributed mode with the following components:
 
-- Hadoop
-- Minio
-- Hive
-- Presto
-- Superset
-- Hue
+- Hadoop 2.8.5
+- Minio RELEASE.2018-12-27T18-33-08Z
+- Hive 2.3.4
+- Presto 0.215
+- Superset 0.28.1
+- Hue 4.3.0
+
+For more details, see the following [post](https://johs.me/posts/big-data-stack-running-sql-queries/).
 
 ## Quick start
 
@@ -16,4 +18,41 @@ suitable directory (persistent storage for all containers). Bring up the stack:
 ```
 docker-compose up -d
 ```
+and initialize the databases for Superset and Hue:
+```
+./scripts/init-hue.sh
+./scripts/init-superset.sh
+```
+The stack should now be up and running, with the following services available:
+
+- Hadoop namenode: [http://localhost:50070](http://localhost:50070)
+- Minio: [http://localhost:9000](http://localhost:9000)
+- Presto: [http://localhost:8080](http://localhost:8080)
+- Superset: [http://localhost:8088](http://localhost:8088)
+- Hue: [http://localhost:8888](http://localhost:8888)
+
+## Contents
+
+The stack uses updated/modified Docker images from [Big Data Europe](https://github.com/big-data-europe),
+[shawnzhu](https://github.com/shawnzhu/docker-prestodb), and [Cloudera](https://github.com/cloudera/hue). See
+the Dockerfiles for details.
+
+All needed images are on Docker Hub, but if you want to build the updated/modified images yourself, just run `build-local.sh`
+in the different sub-directories.
+
+Changes compared to the original images:
+
+- Hadoop updated to version 2.8.5
+- Hive updated to version 2.3.4
+- S3 support added
+- Presto updated to 0.215
+- Presto JDBC driver added to Hue
+
+The scripts directory contains some helper scripts:
+
+- `beeline.sh`: Launch Beeline (Hive CLI) in the Hive container
+- `hadoop-client.sh`: Start a container with Hadoop utilities (host filesystem mounted as `/host`). Useful for moving files to HDFS.
+- `init-hue.sh`: Initialize the Hue database
+- `init-superset.sh`: Initialize the Superset database and add Presto as a data source
+- `presto-cli.sh`: Launch the Presto CLI (downloads the jar if needed)
 
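After `docker-compose up -d` and the two init scripts have run, the service URLs listed in the quick start can be probed from the host as a sanity check. The following is a minimal Python sketch and not part of the repository: the names `SERVICES`, `is_reachable`, and `check_all` are hypothetical, and it assumes the ports shown in the README are published on localhost. Any HTTP response, including an error status, is counted as "up", since some services may reject an unauthenticated request to their root URL.

```python
import http.client
import urllib.error
import urllib.request

# Service URLs copied from the README's quick start section.
SERVICES = {
    "Hadoop namenode": "http://localhost:50070",
    "Minio": "http://localhost:9000",
    "Presto": "http://localhost:8080",
    "Superset": "http://localhost:8088",
    "Hue": "http://localhost:8888",
}


def is_reachable(url, timeout=3.0):
    """Return True if something answers HTTP at url, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a 4xx/5xx status, so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


def check_all(services=SERVICES, probe=is_reachable):
    """Map each service name to the result of probing its URL."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name:16} {'up' if ok else 'DOWN'}")
```

Containers can take a while to start, so a service reported as down right after `docker-compose up -d` may simply need more time; re-running the check after a short wait is usually enough to tell the difference.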