Dataset Selection and Pathing #17
Open
MattsonCam wants to merge 5 commits into main from
Conversation
added 5 commits on May 7, 2026, 13:31
Point training, analysis scripts, notebooks, and documentation to the new dataset location at /mnt/big_drive/nuclear_speckle_data/initial_dataset/initial_dataset_raw. Also make train.py data-root configurable via --data-dir and NUCLEAR_SPECKLES_DATA_DIR, and remove the stale repo-local dataset ignore entry.
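The data-root resolution described above (CLI flag over environment variable over default path) could be sketched as below. The function and variable names are assumptions; only `--data-dir`, `NUCLEAR_SPECKLES_DATA_DIR`, and the default path come from the commit message.

```python
import argparse
import os

# Default path from the commit message; precedence order is an assumption:
# --data-dir flag > NUCLEAR_SPECKLES_DATA_DIR env var > default.
DEFAULT_DATA_DIR = (
    "/mnt/big_drive/nuclear_speckle_data/initial_dataset/initial_dataset_raw"
)

def resolve_data_dir(cli_value=None):
    """Pick the dataset root, preferring the CLI override."""
    if cli_value:
        return cli_value
    return os.environ.get("NUCLEAR_SPECKLES_DATA_DIR", DEFAULT_DATA_DIR)

parser = argparse.ArgumentParser()
parser.add_argument("--data-dir", default=None, help="Override the dataset root")
```

In train.py the resolved path would then replace the previously hardcoded repo-local dataset directory.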
Switch crop cache generation to use U2OS flat TIFF + parquet inputs with underscore-based filename parsing (plate/well/site/channel) and hardcoded U2OS default paths in training. Remove mask-dependent crop processing and use bbox-only CH0->CH2 crop extraction with excluded-folder filtering. Update documentation to match the current state by removing the detailed U2OS-specific data pipeline block from README and keeping the top-level project description concise.
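The underscore-based filename parsing mentioned above might look like the sketch below. The field order (plate, well, site, channel) is taken from the commit message; the example filename and function name are hypothetical.

```python
# Hypothetical sketch of underscore-based TIFF filename parsing.
# Field order plate/well/site/channel is from the commit message;
# everything else is an assumption.
def parse_tiff_name(filename):
    stem = filename.rsplit(".", 1)[0]
    plate, well, site, channel = stem.split("_")[:4]
    # Uppercase the channel ID so cache lookups are case-insensitive.
    return {"plate": plate, "well": well, "site": site, "channel": channel.upper()}

parse_tiff_name("Plate1_B02_s3_ch01.tiff")
# -> {"plate": "Plate1", "well": "B02", "site": "s3", "channel": "CH01"}
```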
Refactor training and cache generation to support selectable dataset configs with per-dataset paths, channel mappings, and schema normalization. U2OS now maps DAPI->CH01 and Gold->CH03, while initial keeps DAPI->CH0 and Gold->CH2, with Image_Metadata_* columns remapped for profile compatibility and top-level parquet directory loading enabled. Move cache outputs to dataset-specific model_cache roots, isolate crop/tensor cache directories per dataset root, and update README usage/docs for dataset selection, channel mappings, and cache locations.
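A minimal sketch of the selectable per-dataset configs described above, with schema normalization. The channel mappings and the `Image_Metadata_*` remapping come from the commit message; the dict layout and function names are assumptions.

```python
# Hypothetical per-dataset config table. Channel mappings are from the
# commit message; keys and structure are assumptions.
DATASET_CONFIGS = {
    "u2os": {
        "channel_map": {"DAPI": "CH01", "Gold": "CH03"},
        "metadata_renames": {"Metadata_Position": "Metadata_Site"},
    },
    "initial": {
        "channel_map": {"DAPI": "CH0", "Gold": "CH2"},
        "metadata_renames": {},
    },
}

def normalize_columns(columns, dataset):
    """Remap Image_Metadata_* columns to Metadata_* for profile
    compatibility, then apply any dataset-specific renames."""
    renames = DATASET_CONFIGS[dataset]["metadata_renames"]
    out = []
    for col in columns:
        if col.startswith("Image_Metadata_"):
            col = col.replace("Image_", "", 1)
        out.append(renames.get(col, col))
    return out
```

Keeping the mappings in one table means cache roots and crop/tensor directories can also be derived per dataset key rather than hardcoded.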
Uppercase parsed/input/target channel IDs in crop cache building to avoid case-sensitivity mismatches during cache lookup and manifest validation. Add Metadata_Position -> Metadata_Site mapping for U2OS so dataset metadata aligns with the training pipeline expectations.
MattsonCam (Member, Author) commented:
The vanilla Unet models were in the repo before, but I accidentally removed them, so I added them back so that training can proceed with this model pipeline.
This PR updates dataset selection and pathing so that I can easily train on different datasets. The main branch will likely not be updated each time I train a new model or perform fine-tuning. Instead, I plan to log the runs in main and develop on separate branches, while still making all of the models available in main. In future PRs I will modify the splitting based on the selected dataset and include additional models.