
## Instructions for Initial Deployment

The Kubernetes ecosystem contains many standardized and custom solutions across a [wide range of cloud and on-premises environments](https://kubernetes.io/docs/setup/production-environment/turnkey-solutions/). To avoid the complexity of a full-fledged [production environment](https://kubernetes.io/docs/setup/production-environment/) and to achieve parity with the [existing docker-compose](https://github.com/broadinstitute/seqr/blob/master/docker-compose.yml), we recommend setting up a simple local Kubernetes cluster on an on-premises server or a cloud virtual machine with at least `32GB` of memory and `750GB` of disk space. While there is no minimum CPU requirement, additional CPUs will significantly speed up data loading and some searches. For ClickHouse specifically, we recommend reading the [OSS usage recommendations](https://clickhouse.com/docs/operations/tips), in particular the [filesystem section](https://clickhouse.com/docs/operations/tips#file-system).

Install the four required Kubernetes infrastructure components:
1. The [`docker`](https://docs.docker.com/engine/install/) container engine.
```
$ kubectl exec pipeline-runner-api-POD-ID -c pipeline-runner-api-sidecar -it --
$ python3 -m 'v03_pipeline.bin.migrate_all_projects_to_clickhouse'
```
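If you prefer not to open an interactive shell, the two steps above can be combined into a single non-interactive command (a sketch; `POD-ID` is a placeholder for your actual pod name, as above):

```
kubectl exec pipeline-runner-api-POD-ID -c pipeline-runner-api-sidecar -- \
  python3 -m 'v03_pipeline.bin.migrate_all_projects_to_clickhouse'
```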

The migration is fully supported whether or not you have configured your environment to run the loading pipeline [on GCP Dataproc](https://github.com/broadinstitute/seqr/blob/master/deploy/LOCAL_INSTALL_HELM.md#option-2), and it will run in the same environment as data loading. It is also idempotent, so it can safely be run multiple times in case of failures.

The migration should take a few minutes per project, substantially less than loading directly from VCF. To check the status of the migration and to debug if required:
- Each project hail table is exported into the format produced by the loading pipeline as if it were a new run. For each of your loaded projects, you should expect a directory to be created:
```
kubectl logs pipeline-runner-api-5557bbc7-vrtcj -c pipeline-runner-api-sidecar
base_hail_table - INFO - UpdatedCachedReferenceDatasetQuery(reference_genome=GRCh37, dataset_type=SNV_INDEL, crdq=CLINVAR_PATH_VARIANTS) start
[Stage 42:========>
```
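To watch the pipeline's progress as it runs rather than fetching a snapshot, the same command can stream logs with `-f` (a sketch; the pod name suffix will differ in your cluster):

```
# Stream the sidecar logs live; press Ctrl-C to stop following
kubectl logs -f pipeline-runner-api-5557bbc7-vrtcj -c pipeline-runner-api-sidecar
```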

- How do I connect to / open a shell in ClickHouse?

There are two ways:

1) You can port-forward and connect with a local ClickHouse client:

```
kubectl port-forward services/seqr-clickhouse 9000:9000
```

Then, in another terminal:
```
./clickhouse client --user seqr_clickhouse_reader --password XXXXX
```
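To quickly verify the tunnel and credentials without starting an interactive session, `clickhouse client` also accepts a one-shot `--query` flag (a sketch, using the same placeholder password as above):

```
./clickhouse client --host localhost --port 9000 \
  --user seqr_clickhouse_reader --password XXXXX \
  --query 'SELECT version()'
```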

2) You can connect directly by opening a shell in the ClickHouse pod with `kubectl exec` and using the bundled client:

```
Desktop % kubectl exec seqr-clickhouse-shard0-0 -it -c clickhouse -- bash
I have no name!@seqr-clickhouse-shard0-0:/$ cd /opt/bitnami/clickhouse/bin/
I have no name!@seqr-clickhouse-shard0-0:/opt/bitnami/clickhouse/bin$ ./clickhouse client
```

The `I have no name!` prompt is expected: the Bitnami image runs as a non-root user with no passwd entry. Then you can navigate the database shell like:

```
USE seqr;
SHOW TABLES;
SELECT COUNT(*) FROM `GRCh38/SNV_INDEL/entries`;
```
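The same queries can also be run non-interactively from outside the pod, which is convenient for quick checks or scripting (a sketch assuming the pod name and binary path shown above):

```
kubectl exec seqr-clickhouse-shard0-0 -c clickhouse -- \
  /opt/bitnami/clickhouse/bin/clickhouse client \
  --query 'SELECT count() FROM seqr.`GRCh38/SNV_INDEL/entries`'
```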