Some small fixes after testing
tbooth committed May 3, 2024
1 parent 94ed872 commit ade7c58
Showing 4 changed files with 34 additions and 7 deletions.
12 changes: 6 additions & 6 deletions Snakefile.main
@@ -270,12 +270,12 @@ def count_up_passing(counts, cutoff=None, include_unclassified=True):

 # Global wildcard patterns
 wildcard_constraints:
-    pod5file = "\w+",
-    fullid = "[^/]+",
-    barcode = "[^/_]+",
-    pf = "pass|fail",
-    pfs = "pass|fail|skip",
-    _pfs = "_pass|_fail|_skip|",
+    pod5file = r"\w+",
+    fullid = r"[^/]+",
+    barcode = r"[^/_]+",
+    pf = r"pass|fail",
+    pfs = r"pass|fail|skip",
+    _pfs = r"_pass|_fail|_skip|",
 
 # Main target is one yaml file (of metadata) per cell. A little bit like statfrombam.yml in the
 # project QC pipelines.
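
Note on the hunk above: the switch to raw strings does not change any pattern value, since `\w` is not a recognised Python string escape, but newer Python versions warn about such sequences when compiling non-raw literals. The following is a minimal sketch in plain Python, not part of the commit. It uses `re.fullmatch` as a stand-in for how a constraint restricts a wildcard, and the `matches` helper is hypothetical.

```python
import re
import warnings

# Source text of the old, non-raw assignment:  pod5file = "\w+"
old_source = 'pod5file = "\\w+"'

# Compile the old form at runtime so that any compiler warning can be captured.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    namespace = {}
    exec(compile(old_source, "<old constraint>", "exec"), namespace)

# The value is identical to the raw-string form ...
same_value = namespace["pod5file"] == r"\w+"
# ... but the compiler may flag the escape (warning category depends on the Python version).
escape_warnings = [w for w in caught if "invalid escape sequence" in str(w.message)]

# The constraints as they stand after this commit.
constraints = {
    "pod5file": r"\w+",
    "fullid": r"[^/]+",
    "barcode": r"[^/_]+",
    "pf": r"pass|fail",
    "pfs": r"pass|fail|skip",
    "_pfs": r"_pass|_fail|_skip|",
}

def matches(name, value):
    """Hypothetical helper: does value satisfy the named constraint in full?"""
    return re.fullmatch(constraints[name], value) is not None

print(same_value, len(escape_warnings))
print(matches("barcode", "barcode01"), matches("barcode", "bar_code"))
print(matches("_pfs", ""), matches("_pfs", "_skip"))
```

The trailing `|` in `_pfs` allows the empty string, which is what lets a path like `pod5_{barcode}{_pfs}` also match a directory with no pass/fail/skip suffix.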
2 changes: 1 addition & 1 deletion Snakefile.rundata
@@ -68,7 +68,7 @@ def i_copy_md5sum_pod5(wc):
 rule copy_md5sum_pod5:
     output:
         pod5 = "{cell}/pod5_{barcode}{_pfs}/{pod5file}.pod5",
-        md5 = "md5sums/{cell}/pod5_{barcode}{_pfs}/{pod5file}.pod5.md5"
+        md5 = temp("md5sums/{cell}/pod5_{barcode}{_pfs}/{pod5file}.pod5.md5")
     input:
         i_copy_md5sum_pod5
     params:
26 changes: 26 additions & 0 deletions doc/pod5_remake.txt
@@ -43,3 +43,29 @@ May 2024. So ONT have finally fixed the batching of the POD5 files themselves. N
to copy the files and md5sum them. So I can rip out all the batching logic and just replicate
what I previously had for fast5. Well I guess that was a waste of my time after all. I'll get
that sorted when I can.

(Friday 2nd)
It's working and has made a test report. Let us check:

1) Does the report look OK (Did I ever report the total number of POD5 files)?

Yes it does.
Snakefile.main get_cell_info() sets ci['Files in pod5'] and this gets saved into
cell_info.yaml. Here "Files in pod5: 73" which disregards the skipped files and is not
the same as "pod5_files_in_final_dest: 75" in the final summary. I think this is OK.
I'll add this to the report.

2) Are all the POD5 files there?
With the md5sums? - yes, need to make individual files temp
What about the _pass and _fail? - not there
What about the _skip? - not there

3) Do the POD5 files segregate by channel? Would be cool if they did.

$ pod5 view pod5_file.pod5 -i "read_id,channel"

No such luck.

4) Data delivery needs testing, and inevitably fixing.

I'll release and run the code first.
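
Note on check 3 above: one way to test channel segregation across several files is to collect the channel set from each file's `pod5 view` table and look for overlaps. This is a minimal sketch, not part of the commit. It assumes the table is tab-separated with a header row containing `read_id` and `channel`; the function names, file names and sample rows are made up for illustration.

```python
import csv
import io

def channels_in(view_text):
    """Return the set of channel numbers in one table of read_id/channel rows.

    Assumes tab-separated text with a header line naming a 'channel' column.
    """
    reader = csv.DictReader(io.StringIO(view_text), delimiter="\t")
    return {int(row["channel"]) for row in reader}

def segregated_by_channel(tables):
    """True if no channel number appears in more than one file.

    tables maps a file name to the text of its read_id/channel table.
    """
    seen = set()
    for text in tables.values():
        channels = channels_in(text)
        if seen & channels:
            return False
        seen |= channels
    return True

# Made-up sample data, for illustration only.
file_a = "read_id\tchannel\nr1\t1\nr2\t2\n"
file_b = "read_id\tchannel\nr3\t3\nr4\t4\n"
file_c = "read_id\tchannel\nr5\t2\nr6\t7\n"

print(segregated_by_channel({"a.pod5": file_a, "b.pod5": file_b}))
print(segregated_by_channel({"a.pod5": file_a, "c.pod5": file_c}))
```

In practice the text for each file could be captured from the command shown in check 3 and passed in under its file name.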
1 change: 1 addition & 0 deletions make_report.py
@@ -25,6 +25,7 @@ def get_cell_metadata(ci):
 (None, 'Run Time'),
 ('Files in pass',),
 ('Files in fail',),
+('Files in pod5', 'POD5 files (excluding skipped)'),
 ('SequencingKit', 'Sequencing Kit'),
 ('Software', 'Software'),
 ('BasecallConfig', 'Basecaller Config'),
