3742 create data backend #64

Open: wants to merge 15 commits into base branch otter

jdhayhurst (Contributor) commented Apr 16, 2025

Added the steps to create the platform backend. Resolves opentargets/issues#3742 and opentargets/issues#3743.

The following steps have been added:

  • data prep
  • OpenSearch load
  • ClickHouse load
  • Google disk snapshots
  • tarballing
  • BigQuery load
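
As a rough illustration of how the steps listed above could be chained, here is a minimal sketch. It is not the code from this PR: the step names in `STEP_ORDER` are hypothetical identifiers derived from the list, the order is simply the order of the list (an assumption, since the real pipeline may arrange them differently), and `run_pipeline` is a made-up helper.

```python
from typing import Callable, Dict, List

# Hypothetical step identifiers derived from the list above.
# Assumption: list order equals execution order; the real pipeline may differ.
STEP_ORDER: List[str] = [
    "data_prep",
    "opensearch_load",
    "clickhouse_load",
    "disk_snapshots",
    "tarballing",
    "bigquery_load",
]


def run_pipeline(
    steps: Dict[str, Callable[[], None]],
    order: List[str] = STEP_ORDER,
) -> List[str]:
    """Run each registered step in order and return the names that completed.

    Raises KeyError if a step in `order` has no registered callable, so a
    misconfigured pipeline fails before any later step runs.
    """
    completed: List[str] = []
    for name in order:
        if name not in steps:
            raise KeyError(f"no step registered for {name!r}")
        steps[name]()  # any exception raised here stops the pipeline
        completed.append(name)
    return completed
```

Each step is passed in as a plain callable, so a step can be swapped for a stub when testing the ordering logic on its own.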

add snapshot method

clean up

add create index

add opensearch load

add restore, fix health check

fix parquet globbing, add load all step, make opensearch client based on host/port

fix specs, move snapshot repo conf out of start task

add scratchpad to config

add dataset config

added more indices, resolve _id from config

workaround for coloc

refactor _id field assignment

add explode_datasets

fix for create index race condition

add variant mappings

rearrange pipeline

tidy up

add evidence, slight improvement to load generator

fix paths

formatting

allow path to be empty
add gcloud

fix create disk image

update dataset paths
fix storage location

fix snapshot

fix snapshot

fix disk image

fix disk image

just do snapshot

fix snapshot step in config

add labels

fix labels

increase snapshot timeout

Update README.md
add clickhouse stop step

move clickhouse config

add skeleton for loading to clickhouse
Update config.yaml

add bigquery

add bigquery dev step

Delete bigquery.py

update datasets and add docstrings to bigquery.py

fix bq access

update bigquery datasets, enable hive partitioning

update bigquery tables and fix hive partitioning
added clickhouse load, updated w2v sql to drop log table
add timeout reset to health check

reorganise services, added run_container helper method
update doc strings

rename step
… image if not exists

increase default timeout for opensearch

build the image only if the image is requested and not found
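
The last commit title describes a guard: build only when an image is requested and cannot be found. A minimal sketch of that pattern follows, under stated assumptions: `ensure_image`, `exists`, and `build` are hypothetical names, and the lookup and build operations are injected as callables rather than tied to any particular container or cloud API. The PR's actual implementation is not shown on this page.

```python
from typing import Callable, Optional


def ensure_image(
    name: Optional[str],
    exists: Callable[[str], bool],
    build: Callable[[str], None],
) -> bool:
    """Build `name` only if an image was requested and is not already present.

    Returns True when a build was triggered, False otherwise.
    `exists` and `build` are hypothetical callables supplied by the caller.
    """
    if not name:
        # No image requested: nothing to do.
        return False
    if exists(name):
        # Image already present: skip the build.
        return False
    build(name)
    return True
```

Because both checks run before `build` is called, repeated runs against an existing image do no work.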
Successfully merging this pull request may close these issues:

  • Load BigQuery: split into tasks
  • Create Platform data backend: split into tasks