Commit

Merge pull request #401 from OpenCOMPES/data-fetch
Module for datasets in SED
zain-sohail authored Jun 14, 2024
2 parents 7e5b76a + c1939fd commit f555a74
Showing 21 changed files with 1,143 additions and 144 deletions.
17 changes: 3 additions & 14 deletions .github/workflows/documentation.yml
@@ -76,24 +76,13 @@ jobs:
       - name: download RAW data
         # if: steps.cache-primes.outputs.cache-hit != 'true'
         run: |
-          cd $GITHUB_WORKSPACE/docs/tutorial
-          curl -L --output ./WSe2.zip https://zenodo.org/record/6369728/files/WSe2.zip
-          unzip -o ./WSe2.zip -d .
-          rm WSe2.zip
-          touch WSe2.zip
-          curl -L --output ./TaS2.zip https://zenodo.org/records/10160182/files/TaS2.zip
-          unzip -o ./TaS2.zip -d .
-          rm TaS2.zip
-          touch TaS2.zip
-          curl -L --output ./Gd_W110_flash.zip https://zenodo.org/records/10658470/files/single_event_data.zip
-          unzip -o ./Gd_W110_flash.zip -d .
-          rm Gd_W110_flash.zip
-          touch Gd_W110_flash.zip
+          cd $GITHUB_WORKSPACE/docs
+          poetry run python scripts/download_data.py
       - name: build Flash parquet files
         run: |
           cd $GITHUB_WORKSPACE/docs
-          poetry run python build_flash_parquets.py
+          poetry run python scripts/build_flash_parquets.py
       - name: build Sphinx docs
         run: poetry run sphinx-build -b html $GITHUB_WORKSPACE/docs $GITHUB_WORKSPACE/_build
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,8 +7,10 @@
*.nx
*.nxs
*.zip
**/datasets/*
**/processed/*
**/sed_config.yaml
**/datasets.json

# Byte-compiled / optimized / DLL files
__pycache__/
39 changes: 0 additions & 39 deletions docs/build_flash_parquets.py

This file was deleted.

1 change: 1 addition & 0 deletions docs/index.rst
@@ -23,6 +23,7 @@ Single-Event DataFrame (SED) documentation
sed/loader
sed/binning
sed/calibrator
sed/dataset
sed/diagnostic
sed/io
sed/metadata
29 changes: 29 additions & 0 deletions docs/scripts/build_flash_parquets.py
@@ -0,0 +1,29 @@
from pathlib import Path

import sed
from sed import SedProcessor
from sed.dataset import dataset

config_file = Path(sed.__file__).parent / "config/flash_example_config.yaml"

dataset.get("Gd_W110", root_dir="./tutorial")
data_path = dataset.dir


config_override = {
    "core": {
        "paths": {
            "data_raw_dir": data_path,
            "data_parquet_dir": data_path + "/processed/",
        },
    },
}

runs = ["44762", "44797", "44798", "44799", "44824", "44825", "44826", "44827"]
for run in runs:
    sp = SedProcessor(
        runs=run,
        config=config_override,
        system_config=config_file,
        collect_metadata=False,
    )
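
Note on the override dictionary in the script above: it joins paths by string concatenation, which suggests that dataset.dir is a plain string (an assumption, not confirmed by this diff). A minimal sketch of the same nested structure, built with os.path.join instead, might look as follows. The helper name make_config_override is hypothetical and is not part of sed.

```python
import os


def make_config_override(data_path: str) -> dict:
    """Hypothetical helper: build the nested path override used by the script above.

    Assumes `data_path` is a plain string, as the original concatenation implies.
    """
    return {
        "core": {
            "paths": {
                "data_raw_dir": data_path,
                # The trailing "" makes os.path.join end the result with a
                # path separator, mirroring the original "/processed/" suffix.
                "data_parquet_dir": os.path.join(data_path, "processed", ""),
            },
        },
    }
```

The returned dictionary has the same shape as config_override, so it could be passed as the config argument in the loop above. Whether the trailing separator is required is not stated in the commit; it is kept here only to match the original value.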
7 changes: 7 additions & 0 deletions docs/scripts/download_data.py
@@ -0,0 +1,7 @@
"""run script from docs directory"""
from sed.dataset import dataset

root_dir = "./tutorial"
dataset.get("WSe2", remove_zip=True, root_dir=root_dir)
dataset.get("Gd_W110", remove_zip=True, root_dir=root_dir)
dataset.get("TaS2", remove_zip=True, root_dir=root_dir)
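
For context, the curl / unzip / rm sequence that this commit removes from the workflow can be approximated in pure Python with the standard library. The sketch below is not the implementation of sed.dataset, which this diff does not show. The function name fetch_and_extract and its signature are hypothetical; only the remove_zip flag is named after the keyword used in download_data.py.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urlparse


def fetch_and_extract(url: str, dest_dir: str, remove_zip: bool = True) -> Path:
    """Download a zip archive, unpack it into dest_dir, and return that directory.

    Hypothetical stand-in for the removed shell steps, not sed's own code.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last component of the URL path.
    zip_path = dest / Path(urlparse(url).path).name

    # Roughly `curl -L --output <zip> <url>`; urlopen follows HTTP redirects.
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)

    # Roughly `unzip -o <zip> -d .`; existing files are overwritten.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    # Roughly `rm <zip>`.
    if remove_zip:
        zip_path.unlink()
    return dest
```

A call such as fetch_and_extract("https://zenodo.org/record/6369728/files/WSe2.zip", "./tutorial") would correspond to the first removed block. The removed workflow also ran touch on each archive name after deleting it, leaving an empty placeholder file; the sketch omits that step.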