Hive
Website | https://hive.apache.org/ |
Supported versions | 2.0.0 |
Current responsible(s) | Yiannis Mouchakis @ NCSR-D -- [email protected] |
Docker image(s) | bde2020/hive |
More info | https://github.com/apache/hive.git |
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
The Docker container for Apache Hive is based on https://github.com/big-data-europe/docker-hadoop; see that repository for the Hadoop configuration. The container deploys Hive and starts hiveserver2 on port 10000. By default, metastore_db is located at /hive-metastore. All Hive configuration files are located in the conf directory.
First, clone the repository from https://github.com/big-data-europe/docker-hive.git
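As a minimal sketch of this step, assuming git is installed, the snippet below derives the checkout directory from the URL and clones only when that directory is missing. The helper name clone_docker_hive is hypothetical and not part of the project.

```shell
# Repository URL taken from the text above.
REPO_URL="https://github.com/big-data-europe/docker-hive.git"
# Directory git creates by default: last URL component without ".git".
REPO_DIR="$(basename "$REPO_URL" .git)"

# Hypothetical helper: clone only if the directory does not exist yet.
clone_docker_hive() {
    [ -d "$REPO_DIR" ] || git clone "$REPO_URL" "$REPO_DIR"
}

# Usage: clone_docker_hive && cd "$REPO_DIR"
```

Defining a function rather than running the clone directly keeps the snippet safe to source more than once.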
To build docker-hive, go into the docker-hive directory and run:
docker build -t hive .
To run it, first deploy Hadoop (see https://github.com/big-data-europe/docker-hadoop). Then start hiveserver2 by running:
docker run --name hive --net=hadoop -p 10000:10000 -p 10002:10002 -v <path/to/metastore_db/location>:/hive-metastore --env-file=./hadoop.env hive
You can then access hiveserver2 at localhost:10000 and the hiveserver2 web UI at localhost:10002.
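One way to check the connection is with Beeline, the JDBC command line client that ships with Hive. The sketch below assumes beeline is installed on the host and on the PATH, and that no authentication is configured. The helper names hive_jdbc_url and hive_query are hypothetical.

```shell
# Host and port where hiveserver2 is published (defaults match the docker run command above).
HIVE_HOST="${HIVE_HOST:-localhost}"
HIVE_PORT="${HIVE_PORT:-10000}"

# Build the JDBC URL for Hive's built-in "default" database.
hive_jdbc_url() {
    printf 'jdbc:hive2://%s:%s/default\n' "$HIVE_HOST" "$HIVE_PORT"
}

# Hypothetical helper: run a single statement through beeline (assumed to be on the PATH).
hive_query() {
    beeline -u "$(hive_jdbc_url)" -e "$1"
}

# Usage: hive_query 'SHOW DATABASES;'
```

If hiveserver2 has only just started, the first connection attempt may be refused until the server finishes initializing.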
You can also deploy Hive together with Hadoop using Docker Compose. This sets up a Hadoop cluster with 3 datanodes and Hive with hiveserver2 running. All data is stored in ./data
To do so, first create the hadoop network:
docker network create hadoop
Then deploy the cluster with
docker-compose up
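The two steps above can be wrapped so that they are safe to repeat. This is a sketch, not part of the project: the helper names ensure_network and start_cluster are hypothetical, and it assumes the commands are run from the directory containing docker-compose.yml.

```shell
# Hypothetical helper: create the network only if it does not already exist,
# so re-running the setup does not fail with "network already exists".
ensure_network() {
    docker network inspect "$1" >/dev/null 2>&1 || docker network create "$1"
}

# Hypothetical helper: start the cluster in the background (-d),
# then list the containers and their state.
start_cluster() {
    ensure_network hadoop && docker-compose up -d && docker-compose ps
}

# Usage: start_cluster
```

Running with -d detaches the containers from the terminal; docker-compose logs -f can then be used to follow their output.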
To scale up Hive, you must add more Hadoop nodes. For more information on how to add nodes, see https://github.com/big-data-europe/docker-hadoop. You can also edit the docker-compose.yml file and add more nodes there.