Commit b65a34c

Updated README.
1 parent d1400fc commit b65a34c

1 file changed: +45 -6 lines changed
README.md

Lines changed: 45 additions & 6 deletions
@@ -2,12 +2,14 @@

 Big data stack running in pseudo-distributed mode with the following components:

-- Hadoop
-- Minio
-- Hive
-- Presto
-- Superset
-- Hue
+- Hadoop 2.8.5
+- Minio RELEASE.2018-12-27T18-33-08Z
+- Hive 2.3.4
+- Presto 0.215
+- Superset 0.28.1
+- Hue 4.3.0
+
+For more details see the following [post](https://johs.me/posts/big-data-stack-running-sql-queries/).

 ## Quick start

@@ -16,4 +18,41 @@ suitable directory (persistent storage for all containers). Bring up the stack:
 ```
 docker-compose up -d
 ```
+and initialize the databases for Superset and Hue:
+```
+./scripts/init-hue.sh
+./scripts/init-superset.sh
+```
+The stack should now be up and running, with the following services available:
+
+- Hadoop namenode: [http://localhost:50070](http://localhost:50070)
+- Minio: [http://localhost:9000](http://localhost:9000)
+- Presto: [http://localhost:8080](http://localhost:8080)
+- Superset: [http://localhost:8088](http://localhost:8088)
+- Hue: [http://localhost:8888](http://localhost:8888)
+
+## Contents
+
+The stack uses updated/modified Docker images from [Big Data Europe](https://github.com/big-data-europe),
+[shawnzhu](https://github.com/shawnzhu/docker-prestodb), and [Cloudera](https://github.com/cloudera/hue). See
+the Dockerfiles for details.
+
+All needed images are on Docker Hub, but if you want to build the updated/modified images yourself, just run `build-local.sh`
+in the different sub-directories.
+
+Changes compared to original images:
+
+- Hadoop updated to version 2.8.5
+- Hive updated to version 2.3.4
+- S3 support added
+- Presto updated to version 0.215
+- Presto JDBC driver added to Hue
+
+The scripts directory contains some helper scripts:
+
+- `beeline.sh`: Launch Beeline (Hive CLI) in Hive container
+- `hadoop-client.sh`: Start container with Hadoop utilities (host filesystem mounted as `/host`). Useful for moving files to HDFS.
+- `init-hue.sh`: Initialize Hue database
+- `init-superset.sh`: Initialize Superset database and add Presto as data source
+- `presto-cli.sh`: Launch Presto CLI (downloads jar if needed)
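
The Quick start text added in this commit lists five service URLs that should respond once the stack is up. A minimal sketch of how one might probe them from the host is shown here. It is not part of the repository: the `SERVICES` variable and the `service_url` and `check_services` functions are hypothetical names, the ports are copied from the README list, and `curl` is assumed to be installed on the host.

```shell
#!/bin/sh
# Sketch only: probe the service URLs listed in the README's Quick start section.
# The name:port pairs mirror the README; the function names are hypothetical.
SERVICES="namenode:50070 minio:9000 presto:8080 superset:8088 hue:8888"

# Turn a "name:port" entry into the localhost URL used in the README.
service_url() {
  printf 'http://localhost:%s\n' "${1##*:}"
}

# Print "up" or "down" for each service. Any HTTP response counts as "up",
# because some services may answer with a redirect or an error status.
check_services() {
  for entry in $SERVICES; do
    url=$(service_url "$entry")
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "up   ${entry%%:*} $url"
    else
      echo "down ${entry%%:*} $url"
    fi
  done
}
```

Calling `check_services` after `docker-compose up -d` and the two init scripts prints one line per service. The containers may need some time to start, so a "down" result shortly after startup is not necessarily a failure.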
