Dfp pclr yoda #596

Open. Wants to merge 17 commits into base: master

Collaborator

@kilonzi kilonzi commented Apr 30, 2025

This pull request introduces a deployment pipeline for the PCLR model, including schema definitions, Docker containerization, and preprocessing/postprocessing scripts. The changes focus on defining input/output formats, creating a Dockerized environment for processing, and implementing scripts for preparing and finalizing data.

Deployment Pipeline Setup:

  • Model Schema Definition:
    Added a JSON schema in pclr_model_schema.json to specify the model's input (ecg_tensor) and output (embed) formats, including their shapes and data types.

  • Dockerfile for Processing:
    Created a Dockerfile to set up a lightweight Python 3.9-based environment for running the preprocessing (prepare.py) and postprocessing (finalize.py) scripts. It installs dependencies from requirements.txt and sets the entry point to Python.
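
A minimal sketch of how a schema like the one described above could be loaded and used to check a tensor's shape before inference. The field names `ecg_tensor` and `embed` come from this description; the JSON layout, the shapes, and the dtypes below are placeholder assumptions and are not copied from pclr_model_schema.json. The helpers `nested_shape` and `check_against_schema` are hypothetical names used only for illustration.

```python
import json

# Hypothetical schema. Only the names "ecg_tensor" and "embed" come from the
# PR description; the layout, shapes, and dtypes are assumed placeholders.
SCHEMA_JSON = """
{
  "input":  {"name": "ecg_tensor", "shape": [4096, 12], "dtype": "float32"},
  "output": {"name": "embed",      "shape": [320],      "dtype": "float32"}
}
"""


def nested_shape(value):
    """Return the shape of a nested list by following its first elements.

    This assumes the list is rectangular; it does not verify every row.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def check_against_schema(value, spec):
    """Raise ValueError if the value's shape differs from the spec's shape."""
    actual = nested_shape(value)
    if actual != spec["shape"]:
        raise ValueError(
            f"{spec['name']}: expected shape {spec['shape']}, got {actual}"
        )
    return True


schema = json.loads(SCHEMA_JSON)
```

Running a check like this in the preprocessing step would surface a shape mismatch before the model is called, assuming the real schema exposes shapes in a comparable way.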

Data Processing Scripts:

  • Preprocessing Script (prepare.py):
    Added a script to process raw ECG files into HDF5 tensor format. It reads the input CSVs, extracts the ECG data from the raw files, interpolates and normalizes the signals, and saves the result in a structured HDF5 file.

  • Postprocessing Script (finalize.py):
    Added a script to merge model predictions with input metadata. It reads a CSV of metadata and a JSON of predictions, validates dimensions, and outputs a combined CSV with embeddings appended.

  • Dependencies:
    Added a requirements.txt file listing the necessary Python libraries (pandas, numpy, h5py, smart-open[gcs]) for preprocessing and postprocessing.
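
The two processing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in prepare.py or finalize.py: the description does not say which interpolation or normalization is used, so linear interpolation and z-score normalization are assumed; the predictions JSON is assumed to be a list of embedding vectors in the same order as the metadata rows; and the `embed_<i>` column names are invented. Writing the HDF5 file is omitted, and the merge uses the standard library rather than pandas to keep the sketch self-contained. The function names `resample_and_normalize` and `merge_predictions` are hypothetical.

```python
import csv
import io
import json

import numpy as np


def resample_and_normalize(signal, target_len):
    """Linearly interpolate a 1-D signal to target_len samples, then z-score it.

    Assumption: the PR does not state the interpolation or normalization
    method; linear interpolation and z-scoring are used here as examples.
    """
    signal = np.asarray(signal, dtype=np.float64)
    old_x = np.linspace(0.0, 1.0, num=len(signal))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    resampled = np.interp(new_x, old_x, signal)
    std = resampled.std()
    # Guard against a flat signal so we never divide by zero.
    return (resampled - resampled.mean()) / (std if std > 0 else 1.0)


def merge_predictions(metadata_csv, predictions_json):
    """Append embed_0..embed_{d-1} columns to each metadata row.

    Assumption: predictions_json is a JSON list of equal-length embedding
    vectors, one per metadata row and in the same order.
    """
    rows = list(csv.DictReader(io.StringIO(metadata_csv)))
    embeddings = json.loads(predictions_json)
    if len(rows) != len(embeddings):
        raise ValueError(
            f"{len(rows)} metadata rows but {len(embeddings)} predictions"
        )
    dim = len(embeddings[0]) if embeddings else 0
    for row, embedding in zip(rows, embeddings):
        if len(embedding) != dim:
            raise ValueError("embeddings have inconsistent lengths")
        for i, value in enumerate(embedding):
            row[f"embed_{i}"] = value
    return rows
```

Validating the row count before merging means a truncated predictions file fails loudly instead of silently misaligning embeddings with metadata.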

2 participants