Hive metastore is responsible for storing all the metadata about the database tables created in Presto and Hive. By default, the metastore stores this information in a local embedded Derby database in a PersistentVolume attached to the pod.
Generally, the default configuration of Hive metastore works for small clusters, but you may wish to improve performance or move storage requirements out of the cluster by using a dedicated SQL database for storing the Hive metastore data.
By default, Hive requires one persistent volume to operate. hive-metastore-db-data is the main PVC required by default. This PVC is used by the Hive metastore to store metadata about tables, such as table name, columns, and location. The Hive metastore is used by Presto and the Hive server to look up table metadata when processing queries.
In practice, it is possible to remove this requirement by using MySQL or PostgreSQL for the Hive metastore database.
To install, the Hive metastore requires either that dynamic volume provisioning is enabled via a StorageClass, that a persistent volume of the correct size is manually pre-created, or that you use a pre-existing MySQL or PostgreSQL database.
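If you choose to manually pre-create a persistent volume, a minimal sketch might look like the following. The hostPath backing, the volume name, and the 5Gi capacity are assumptions for illustration; use whatever volume source your cluster supports and size it to match the hive-metastore-db-data PVC.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hive-metastore-db-data-pv   # hypothetical name
spec:
  capacity:
    storage: 5Gi                    # size it to satisfy the hive-metastore-db-data PVC request
  accessModes:
    - ReadWriteOnce
  hostPath:                         # illustration only; any supported volume source works
    path: /mnt/hive-metastore-db-data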
To configure and specify a StorageClass for the hive-metastore-db-data PVC, specify the StorageClass in your MeteringConfig. An example StorageClass section is included in metastore-storage.yaml. Uncomment the spec.hive.spec.metastore.storage.class section and replace the null in class: null with the name of the StorageClass to use. Leaving the value null causes Metering to use the default StorageClass for the cluster.
Use metastore-storage.yaml as a template and adjust the size: "5Gi" value to the desired capacity for the following section:
spec.hive.spec.metastore.storage.size
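As a rough sketch (based on the field paths above, not necessarily the exact contents of metastore-storage.yaml), the storage section has the following shape:
spec:
  hive:
    spec:
      metastore:
        storage:
          # Uncomment class and set it to the StorageClass to use;
          # leaving it null selects the cluster's default StorageClass.
          class: null
          # Adjust to the desired capacity for the metastore PVC.
          size: "5Gi"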
By default, to make the installation easier, Metering configures Hive to use an embedded Java database called Derby. However, this is unsuitable for larger environments or Metering installations where many reports and metrics are collected.
Currently, two alternative options are available, MySQL and PostgreSQL, both of which have been tested with the metering-operator.
There are three configuration options you can use to control the database used by Hive metastore: url, driver, and secretName.
url: the URL of the MySQL or PostgreSQL instance. Examples are shown below.
driver: the class name of the JDBC driver used to store the Hive metadata.
secretName: the name of the secret that contains the base64-encoded username and password for the database instance.
Before proceeding to the following examples, create a secret in the $METERING_NAMESPACE namespace containing the base64-encoded username and password for the database instance. From the command line, run the following command, replacing the values wrapped in <...> markers:
kubectl -n $METERING_NAMESPACE create secret generic <name of the secret> --from-literal=username=<database username> --from-literal=password=<database password>
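The resulting secret looks roughly like the following. The name my-db-credentials and the credential values are placeholders; note that the data values are base64 encoded, not encrypted.
apiVersion: v1
kind: Secret
metadata:
  name: my-db-credentials      # placeholder; reference this name via secretName in your MeteringConfig
data:
  username: aGl2ZQ==           # base64 encoding of "hive"
  password: c3VwZXJzZWNyZXQ=   # base64 encoding of "supersecret"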
Metering supports configuring the internal Hive Metastore to use MySQL 5.6, 5.7, and 8.0 server versions.
The following MeteringConfig snippet serves as a minimal setup to configure the Hive Metastore with an existing MySQL instance:
spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:mysql://mysql.example.com:3306/hive_metastore"
          driver: "com.mysql.cj.jdbc.Driver"
          secretName: "REPLACEME"
You can pass additional JDBC parameters using the spec.hive.spec.config.db.url field. For more details, see the MySQL Connector/J 8.0 documentation.
Note: When configuring Metering to work with older MySQL server versions, such as 5.6 or 5.7, you may need to add the enabledTLSProtocols JDBC URL parameter when configuring the internal Hive metastore. For example, to use the TLSv1.2 protocol, you can use the following snippet as a reference:
...
spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:mysql://<hostname>:<port>/<schema>?enabledTLSProtocols=TLSv1.2"
...
The following MeteringConfig snippet serves as a minimal setup to configure the Hive Metastore with an existing PostgreSQL instance:
spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore"
          driver: "org.postgresql.Driver"
          secretName: "REPLACEME"
          autoCreateMetastoreSchema: false
You can pass additional JDBC parameters using the url field. For more details, see the PostgreSQL JDBC driver documentation.
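For example (a hypothetical URL, assuming your PostgreSQL server is configured for SSL), you could require an encrypted connection by appending the sslmode parameter:
url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore?sslmode=require"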