Big data stack running in pseudo-distributed mode with the following components:

- - Hadoop
- - Minio
- - Hive
- - Presto
- - Superset
- - Hue
+ - Hadoop 2.8.5
+ - Minio RELEASE.2018-12-27T18-33-08Z
+ - Hive 2.3.4
+ - Presto 0.215
+ - Superset 0.28.1
+ - Hue 4.3.0
+
+ For more details see the following [post](https://johs.me/posts/big-data-stack-running-sql-queries/).

## Quick start

@@ -16,4 +18,41 @@ suitable directory (persistent storage for all containers). Bring up the stack:

```
docker-compose up -d
```
+ and initialize the databases for Superset and Hue:
+ ```
+ ./scripts/init-hue.sh
+ ./scripts/init-superset.sh
+ ```
+ The stack should now be up and running and the following services available:
+
+ - Hadoop namenode: [http://localhost:50070](http://localhost:50070)
+ - Minio: [http://localhost:9000](http://localhost:9000)
+ - Presto: [http://localhost:8080](http://localhost:8080)
+ - Superset: [http://localhost:8088](http://localhost:8088)
+ - Hue: [http://localhost:8888](http://localhost:8888)
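
A quick way to confirm that the services listed above are answering is to probe each URL with `curl`. The following is a minimal sketch, not part of the repository: the function names `services` and `check_services` are hypothetical, and it assumes `curl` is installed on the host. Any HTTP response, including an error status, is treated as "responding".

```shell
# Hypothetical helper, not shipped with the stack.
# Prints one "name=url" pair per line; ports are those listed in the README.
services() {
  cat <<'EOF'
Hadoop namenode=http://localhost:50070
Minio=http://localhost:9000
Presto=http://localhost:8080
Superset=http://localhost:8088
Hue=http://localhost:8888
EOF
}

# Probes each URL and reports whether anything answered within 5 seconds.
check_services() {
  services | while IFS='=' read -r name url; do
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "responding:     $name ($url)"
    else
      echo "not responding: $name ($url)"
    fi
  done
}
```

After `docker-compose up -d` and the two init scripts have finished, calling `check_services` prints one status line per service. Some containers may need a little time after startup before they answer.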
+
+ ## Contents
+
+ The stack uses updated/modified Docker images from [Big Data Europe](https://github.com/big-data-europe),
+ [shawnzhu](https://github.com/shawnzhu/docker-prestodb), and [Cloudera](https://github.com/cloudera/hue). See
+ the Dockerfiles for details.
+
+ All needed images are on Docker Hub, but if you want to build the updated/modified images yourself, just run `build-local.sh`
+ in the different sub-directories.
+
+ Changes compared to the original images:
+
+ - Hadoop updated to version 2.8.5
+ - Hive updated to version 2.3.4
+ - S3 support added
+ - Presto updated to version 0.215
+ - Presto JDBC driver added to Hue
+
+ The scripts directory contains some helper scripts:
+
+ - `beeline.sh`: Launch Beeline (Hive CLI) in Hive container
+ - `hadoop-client.sh`: Start container with Hadoop utilities (host filesystem mounted as `/host`). Useful for moving files to HDFS.
+ - `init-hue.sh`: Initialize Hue database
+ - `init-superset.sh`: Initialize Superset database and add Presto as data source
+ - `presto-cli.sh`: Launch Presto CLI (downloads jar if needed)