Configuring the Hive metastore

Hive metastore is responsible for storing all the metadata about the database tables we create in Presto and Hive. By default, the metastore stores this information in a local embedded Derby database in a PersistentVolume attached to the pod.

The default configuration of Hive metastore generally works for small clusters, but users may wish to improve performance or move storage requirements out of the cluster by using a dedicated SQL database for storing the Hive metastore data.

Configuring PersistentVolumes

By default, Hive requires one PersistentVolume to operate.

hive-metastore-db-data is the main PVC required by default. This PVC is used by Hive metastore to store metadata about tables, such as table name, columns, and location. Hive metastore is used by Presto and Hive server to look up table metadata when processing queries. In practice, it is possible to remove this requirement by using MySQL or PostgreSQL for the Hive metastore database.

To install, Hive metastore requires one of the following: dynamic volume provisioning enabled via a StorageClass, a manually pre-created persistent volume of the correct size, or a pre-existing MySQL or PostgreSQL database.

Configuring the Storage Class for Hive Metastore

To configure a StorageClass for the hive-metastore-db-data PVC, specify the StorageClass in your MeteringConfig. An example StorageClass section is included in metastore-storage.yaml.

Uncomment the spec.hive.spec.metastore.storage.class section and replace the null in class: null with the name of the StorageClass to use. Leaving the value null will cause Metering to use the default StorageClass for the cluster.
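
As a rough sketch of what the uncommented section might look like, the fragment below sets the StorageClass at the path named above. The StorageClass name fast-ssd is a hypothetical placeholder, not something shipped with Metering; substitute a StorageClass that exists in your cluster.

```yaml
spec:
  hive:
    spec:
      metastore:
        storage:
          # "fast-ssd" is a hypothetical StorageClass name used for illustration.
          # Leave this as null to fall back to the cluster's default StorageClass.
          class: "fast-ssd"
```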

Configuring the Volume Sizes for Hive Metastore

Use metastore-storage.yaml as a template and adjust the size: "5Gi" value to the desired capacity in the following section:

  • spec.hive.spec.metastore.storage.size
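
For illustration, a fragment that raises the metastore volume from the template's "5Gi" might look like the following. The "10Gi" value is an arbitrary example, not a sizing recommendation.

```yaml
spec:
  hive:
    spec:
      metastore:
        storage:
          # Example capacity only; the template value is "5Gi".
          size: "10Gi"
```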

Configuring the Database for Hive Metastore

By default, to make installation easier, Metering configures Hive to use an embedded Java database called Derby. However, Derby is unsuitable for larger environments or for metering installations that collect a large number of reports and metrics.

Currently two alternative options are available, MySQL and PostgreSQL, both of which have been tested with the metering-operator.

There are three configuration options you can use to control the database used by Hive metastore: url, driver, and secretName.

  • url: the URL of the MySQL or PostgreSQL instance. Examples are shown below.
  • driver: the class name of the JDBC driver that Hive metastore uses to connect to the database.
  • secretName: the name of the secret that contains the username and password for the database instance.

Before proceeding to the following examples, you need to create a secret in the $METERING_NAMESPACE containing the username and password for the database instance. Kubernetes stores these values base64-encoded, which is an encoding, not encryption.

On the command line, run the following command, replacing the values wrapped in <...> markers:

kubectl -n $METERING_NAMESPACE create secret generic <name of the secret> --from-literal=username=<database username> --from-literal=password=<database password>
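
If you prefer to manage the secret declaratively, the command above corresponds roughly to a Secret manifest like the sketch below. The secret name hive-metastore-db-credentials and the namespace metering are hypothetical placeholders; the username and password keys match the --from-literal keys in the command.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Hypothetical name; it must match spec.hive.spec.config.db.secretName.
  name: hive-metastore-db-credentials
  # Hypothetical namespace; use the value of $METERING_NAMESPACE.
  namespace: metering
type: Opaque
# stringData accepts plain-text values; the API server stores them
# base64-encoded under .data.
stringData:
  username: "<database username>"
  password: "<database password>"
```

Because the manifest holds the credentials in plain text, avoid committing it to source control.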

Using MySQL for the Hive Metastore database

Metering supports configuring the internal Hive Metastore to use MySQL 5.6, 5.7, and 8.0 server versions.

The following MeteringConfig snippet serves as a minimal setup to configure the Hive Metastore with an existing MySQL instance:

spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:mysql://mysql.example.com:3306/hive_metastore"
          driver: "com.mysql.cj.jdbc.Driver"
          secretName: "REPLACEME"

You can pass additional JDBC parameters in the spec.hive.spec.config.db.url value. For more details, see the MySQL Connector/J 8.0 documentation.

Note: When configuring Metering to work with older MySQL server versions, like 5.6 or 5.7, you may need to add the enabledTLSProtocols JDBC URL parameter when configuring the internal Hive metastore.

For example, to enable the TLS v1.2 protocol, you can use the following snippet as a reference:

...
spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:mysql://<hostname>:<port>/<schema>?enabledTLSProtocols=TLSv1.2"
...
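
Putting the pieces together, a configuration for an older MySQL server might look like the sketch below. It only combines the minimal setup with the enabledTLSProtocols parameter shown above; the hostname, schema, and secret name are the same placeholders used earlier.

```yaml
spec:
  hive:
    spec:
      config:
        db:
          # Placeholder host and schema, with the TLS parameter appended
          # as a JDBC URL query string.
          url: "jdbc:mysql://mysql.example.com:3306/hive_metastore?enabledTLSProtocols=TLSv1.2"
          driver: "com.mysql.cj.jdbc.Driver"
          # Replace with the name of the secret created earlier.
          secretName: "REPLACEME"
```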

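Using PostgreSQL for the Hive Metastore database

The following MeteringConfig snippet serves as a minimal setup to configure the Hive Metastore with an existing PostgreSQL instance: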

spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore"
          driver: "org.postgresql.Driver"
          secretName: "REPLACEME"
          autoCreateMetastoreSchema: false

You can pass additional JDBC parameters in the url value. For more details, see the PostgreSQL JDBC driver documentation.
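
As one possible example of an additional parameter, the sketch below appends the PostgreSQL JDBC driver's sslmode parameter to the URL. Whether sslmode=require is appropriate is an assumption about your database setup; the remaining values are unchanged from the minimal snippet.

```yaml
spec:
  hive:
    spec:
      config:
        db:
          # sslmode=require is shown as an illustration; choose the mode
          # that matches how your PostgreSQL server is configured.
          url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore?sslmode=require"
          driver: "org.postgresql.Driver"
          secretName: "REPLACEME"
          # Kept as in the minimal snippet.
          autoCreateMetastoreSchema: false
```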