feat(data-catalog): Adding Thriftserver (opendatahub-io#228)
tumido authored Jan 5, 2021
1 parent bbe2b59 commit ca177f4
Showing 24 changed files with 649 additions and 0 deletions.
22 changes: 22 additions & 0 deletions tests/basictests/thriftserver.sh
@@ -0,0 +1,22 @@
#!/bin/bash

source $TEST_DIR/common

MY_DIR=$(readlink -f `dirname "${BASH_SOURCE[0]}"`)

source ${MY_DIR}/../util

os::test::junit::declare_suite_start "$MY_SCRIPT"

function test_thriftserver() {
    header "Testing ODH Thrift Server installation"
    os::cmd::expect_success "oc project ${ODHPROJECT}"
    os::cmd::try_until_text "oc get deployment thriftserver" "thriftserver" $odhdefaulttimeout $odhdefaultinterval
    os::cmd::try_until_text "oc get pods -l deployment=thriftserver --field-selector='status.phase=Running' -o jsonpath='{$.items[*].metadata.name}'" "thriftserver" $odhdefaulttimeout $odhdefaultinterval
    runningpods=($(oc get pods -l deployment=thriftserver --field-selector="status.phase=Running" -o jsonpath="{$.items[*].metadata.name}"))
    os::cmd::expect_success_and_text "echo ${#runningpods[@]}" "1"
}

test_thriftserver

os::test::junit::declare_suite_end
7 changes: 7 additions & 0 deletions tests/setup/kfctl_openshift.yaml
@@ -102,6 +102,13 @@ spec:
          name: manifests
          path: hue/hue
      name: hue
    - kustomizeConfig:
        overlays:
          - create-spark-cluster
        repoRef:
          name: manifests
          path: thriftserver/thriftserver
      name: thriftserver
    # strimzi/kafka moved to bottom due to strange slowness in our test cluster
    # moving it down in the order seems to avoid the slowness
    - kustomizeConfig:
73 changes: 73 additions & 0 deletions thriftserver/README.md
@@ -0,0 +1,73 @@
# Spark Thrift Server - HiveServer2

The Spark Thrift Server component installs Thrift Server, the HiveServer2 variant for Spark SQL. It deploys a Spark SQL Thrift Server that exposes Spark DataFrames, modeled as Hive tables, over a JDBC connection.
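
Once deployed, clients can connect with any HiveServer2-compatible JDBC tool. A minimal sketch using `beeline`, assuming the in-cluster `thriftserver` Service and the default HiveServer2 port 10000 (the actual port is defined by the component's Service manifest, which is not shown in full in this diff):

```bash
# Run from a pod in the same namespace as Thrift Server; adjust the host and
# port to match your deployment, or go through the created Route from outside.
beeline -u "jdbc:hive2://thriftserver:10000" -e "SHOW TABLES;"
```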

### Folders

The Thrift Server component has one main folder, `thriftserver`, which contains the kustomize manifests.

### Installation

To install Thrift Server, add the following to the `kfctl` YAML file.

Minimal install:

```yaml
  - kustomizeConfig:
      parameters:
        - name: spark_url
          value: spark://spark.odh.com
      repoRef:
        name: manifests
        path: thriftserver/thriftserver
    name: thriftserver
```

Standalone install:

```yaml
  - kustomizeConfig:
      overlays:
        - create-spark-cluster
      parameters:
        - name: s3_endpoint_url
          value: s3.odh.com
        - name: s3_credentials_secret
          value: s3-credentials
      repoRef:
        name: manifests
        path: thriftserver/thriftserver
    name: thriftserver
```
### Overlays

The Thrift Server component ships with two overlays.

#### storage-class

Customizes Thrift Server to use a specific `StorageClass` for its PVCs; see the `storage_class` parameter.

#### create-spark-cluster

Requires the `radanalytics/spark` component of ODH to be installed first. It provisions a minimal Spark cluster matching the Thrift Server's Spark version and connects the Thrift Server instance to it as its master Spark cluster. This overlay overrides the value of the `spark_url` parameter and routes the Thrift Server only to the Spark cluster created by this overlay.

### Parameters

Four parameters are exposed via KfDef.

#### storage_class

Name of the storage class used for PVCs created by the Thrift Server component. The `storage-class` **overlay** must also be enabled for this parameter to take effect.
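
For example, a KfDef entry enabling the overlay together with the parameter could look like the following sketch, modeled on the install examples above (`gp2` is a placeholder storage class name):

```yaml
  - kustomizeConfig:
      overlays:
        - storage-class
      parameters:
        - name: storage_class
          value: gp2  # placeholder; use a StorageClass available in your cluster
        - name: spark_url
          value: spark://spark.odh.com
      repoRef:
        name: manifests
        path: thriftserver/thriftserver
    name: thriftserver
```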

#### s3_endpoint_url

HTTP endpoint exposed by your S3 object storage solution, made available to Thrift Server as the default S3 filesystem location. For this value to take effect, the Spark cluster in use must be configured with the same endpoint.

#### spark_url

Spark cluster [`master-url`](https://spark.apache.org/docs/latest/submitting-applications.html#master-urls) in the form `spark://...`, pointing Thrift Server to the Spark cluster it should use. This value is **overridden** if the `create-spark-cluster` overlay is activated, and it **is required** when that overlay is not used.

#### s3_credentials_secret

Along with `s3_endpoint_url`, this parameter configures Thrift Server's access to S3 object storage. Set it to the name of a local OpenShift/Kubernetes Secret and Thrift Server will consume its S3 credentials from that Secret, which must contain the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` keys. Keep in mind that for this value to take effect, the Spark cluster must use the same credentials. If not set, the credentials from [`thriftserver-sample-s3-secret`](thriftserver/base/thriftserver-sample-s3-secret.yaml) are used instead.
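
A minimal sketch of such a Secret, using the `s3-credentials` name from the standalone install example above (the key values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
stringData:
  AWS_ACCESS_KEY_ID: REPLACE_WITH_ACCESS_KEY        # placeholder
  AWS_SECRET_ACCESS_KEY: REPLACE_WITH_SECRET_KEY    # placeholder
```
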
102 changes: 102 additions & 0 deletions thriftserver/thriftserver/base/kustomization.yaml
@@ -0,0 +1,102 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- thriftserver-db-exporter.yaml
- thriftserver-db-pvc.yaml
- thriftserver-db-secret.yaml
- thriftserver-db-service.yaml
- thriftserver-db.yaml
- thriftserver-hdfs-hive-secret.yaml
- thriftserver-pvc.yaml
- thriftserver-route.yaml
- thriftserver-sample-s3-secret.yaml
- thriftserver-server-conf-secret.yaml
- thriftserver-service.yaml
- thriftserver.yaml

commonLabels:
  opendatahub.io/component: "true"
  component.opendatahub.io/name: thriftserver
  component.opendatahub.io/part-of: datacatalog

generatorOptions:
  disableNameSuffixHash: true

configMapGenerator:
  - name: thriftserver-config
    envs:
      - params.env

vars:
  - name: namespace
    objref:
      kind: Service
      apiVersion: v1
      name: thriftserver
    fieldref:
      fieldpath: metadata.namespace
  - name: storage_class
    objref:
      kind: ConfigMap
      apiVersion: v1
      name: thriftserver-config
    fieldref:
      fieldpath: data.storage_class
  - name: spark_url
    objref:
      kind: ConfigMap
      apiVersion: v1
      name: thriftserver-config
    fieldref:
      fieldpath: data.spark_url
  - name: s3_endpoint_url
    objref:
      kind: ConfigMap
      apiVersion: v1
      name: thriftserver-config
    fieldref:
      fieldpath: data.s3_endpoint_url
  - name: s3_credentials_secret
    objref:
      kind: ConfigMap
      apiVersion: v1
      name: thriftserver-config
    fieldref:
      fieldpath: data.s3_credentials_secret
  - name: database_user
    objref:
      kind: Secret
      apiVersion: v1
      name: thriftserver-db
    fieldref:
      fieldpath: stringData.database-user
  - name: database_password
    objref:
      kind: Secret
      apiVersion: v1
      name: thriftserver-db
    fieldref:
      fieldpath: stringData.database-password
  - name: database_name
    objref:
      kind: Secret
      apiVersion: v1
      name: thriftserver-db
    fieldref:
      fieldpath: stringData.database-name

configurations:
- params.yaml

images:
  - name: spark-cluster-image
    newName: quay.io/opendatahub/spark-cluster-image
    newTag: 2.4.3-h2.7
  - name: postgresql
    newName: registry.redhat.io/rhel8/postgresql-12
    newTag: latest
  - name: postgres-exporter
    newName: quay.io/internaldatahub/postgres_exporter
    newTag: latest
4 changes: 4 additions & 0 deletions thriftserver/thriftserver/base/params.env
@@ -0,0 +1,4 @@
storage_class=
s3_endpoint_url=
spark_url=
s3_credentials_secret=thriftserver-sample-s3
10 changes: 10 additions & 0 deletions thriftserver/thriftserver/base/params.yaml
@@ -0,0 +1,10 @@
---
varReference:
  - path: stringData/thrift-server.conf
    kind: Secret
  - path: stringData/DATA_SOURCE_NAME
    kind: Secret
  - path: metadata/annotations/volume.beta.kubernetes.io\/storage-class
    kind: PersistentVolumeClaim
  - path: spec/template/spec/containers[]/env[]/valueFrom/secretKeyRef/name
    kind: Deployment
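
For context, `params.yaml` tells kustomize where to expand the `$(...)` vars declared in `kustomization.yaml`. A hypothetical fragment (not part of this diff) showing how the storage-class path would be consumed in a PVC manifest:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thriftserver            # illustrative name only
  annotations:
    # kustomize substitutes $(storage_class) with the value generated from params.env
    volume.beta.kubernetes.io/storage-class: $(storage_class)
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
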
10 changes: 10 additions & 0 deletions thriftserver/thriftserver/base/thriftserver-db-exporter.yaml
@@ -0,0 +1,10 @@
---
apiVersion: v1
kind: Secret
metadata:
  name: thriftserver-db-exporter
  labels:
    name: thriftserver-db-exporter
stringData:
  DATA_SOURCE_NAME: |
    postgresql://$(database_user):$(database_password)@localhost:5432/$(database_name)?sslmode=disable
11 changes: 11 additions & 0 deletions thriftserver/thriftserver/base/thriftserver-db-pvc.yaml
@@ -0,0 +1,11 @@
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: thriftserver-db
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
9 changes: 9 additions & 0 deletions thriftserver/thriftserver/base/thriftserver-db-secret.yaml
@@ -0,0 +1,9 @@
---
apiVersion: v1
kind: Secret
metadata:
  name: thriftserver-db
stringData:
  database-user: datacatalog
  database-password: datacatalog
  database-name: datacatalog
24 changes: 24 additions & 0 deletions thriftserver/thriftserver/base/thriftserver-db-service.yaml
@@ -0,0 +1,24 @@
---
kind: Service
apiVersion: v1
metadata:
  name: thriftserver-db
  annotations:
    template.openshift.io/expose-uri: |
      postgres://{.spec.clusterIP}:{.spec.ports[?(.name=="postgres")].port}
spec:
  ports:
    - name: postgres
      protocol: TCP
      port: 5432
      targetPort: 5432
    - name: exporter
      protocol: TCP
      port: 9187
      targetPort: 9187
  selector:
    deployment: thriftserver-db
  type: ClusterIP
  sessionAffinity: None
status:
  loadBalancer: {}
98 changes: 98 additions & 0 deletions thriftserver/thriftserver/base/thriftserver-db.yaml
@@ -0,0 +1,98 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thriftserver-db
  annotations:
    template.alpha.openshift.io/wait-for-ready: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      deployment: thriftserver-db
  template:
    metadata:
      labels:
        deployment: thriftserver-db
    spec:
      containers:
        - name: postgres-exporter
          image: postgres-exporter
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 9187
          env:
            - name: DATA_SOURCE_NAME
              valueFrom:
                secretKeyRef:
                  name: thriftserver-db-exporter
                  key: DATA_SOURCE_NAME
          livenessProbe:
            httpGet:
              path: /metrics
              port: 9187
          readinessProbe:
            httpGet:
              path: /metrics
              port: 9187
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 200m
              memory: 300Mi
        - name: postgresql
          image: postgresql
          ports:
            - containerPort: 5432
          readinessProbe:
            timeoutSeconds: 1
            initialDelaySeconds: 5
            exec:
              command:
                - "/usr/libexec/check-container"
          livenessProbe:
            timeoutSeconds: 10
            initialDelaySeconds: 120
            exec:
              command:
                - "/usr/libexec/check-container"
                - "--live"
          env:
            - name: POSTGRESQL_USER
              valueFrom:
                secretKeyRef:
                  name: thriftserver-db
                  key: database-user
            - name: POSTGRESQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: thriftserver-db
                  key: database-password
            - name: POSTGRESQL_DATABASE
              valueFrom:
                secretKeyRef:
                  name: thriftserver-db
                  key: database-name
          resources:
            requests:
              cpu: 300m
              memory: 500Mi
            limits:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - name: "postgresql-data"
              mountPath: "/var/lib/pgsql/data"
          terminationMessagePath: "/dev/termination-log"
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities: {}
            privileged: false
      volumes:
        - name: "postgresql-data"
          persistentVolumeClaim:
            claimName: thriftserver-db
      restartPolicy: Always
      dnsPolicy: ClusterFirst