Hi-C workflow #139
base: main
Changes from 124 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
FROM quay.io/biocontainers/bedtools:2.31.1--hf5e1c6e_2 AS bedtools | ||
FROM python:3.13.0 | ||
|
||
COPY --from=bedtools /usr/local/bin/ /usr/local/bin/ | ||
COPY --from=scripts --chmod=777 hic/filter_hic.py /usr/local/bin/filter_hic.py | ||
|
||
RUN apt-get update && \ | ||
    apt-get install -y --no-install-recommends bc && \ | ||
    rm -rf /var/lib/apt/lists/* | ||
|
||
ENTRYPOINT [ "bash" ] |
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
FROM quay.io/biocontainers/samtools:1.17--h00cdaf9_0 AS samtools | ||
FROM quay.io/biocontainers/bowtie2:2.5.4--he20e202_2 | ||
|
||
COPY --from=samtools /usr/local/bin/ /usr/local/bin/ | ||
COPY --from=samtools /usr/local/lib/ /usr/local/lib/ | ||
COPY --from=samtools /usr/local/libexec/ /usr/local/libexec/ | ||
|
||
ENTRYPOINT [ "bowtie2" ] |
Could we maybe consolidate some of these new Docker images? It looks like we can maybe merge

I thought we were trying to avoid these types of monolithic images? We can do that, but is that the direction we want to go?

Hmm. There's a balance to be found, but I'm not sure what it is yet... Responding to #139 (comment) in this thread to consolidate conversation. Maybe we can merge
Thoughts on that?
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
FROM python:3.13.0 | ||
|
||
COPY --from=scripts --chmod=777 hic/qc_hic.py /usr/local/bin/qc_hic.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
FROM nservant/hicpro:3.0.0 AS hicpro | ||
FROM aidenlab/juicer:1.0.13 | ||
|
||
COPY --from=hicpro /HiC-Pro_3.0.0/bin/utils/hicpro2juicebox.sh /HiC-Pro_3.0.0/bin/utils/hicpro2juicebox.sh | ||
RUN chmod a+rwx /opt/juicer-1.6.2/CPU/common/juicer_tools.1.7.6_jcuda.0.8.jar | ||
|
||
ENTRYPOINT [ "bash" ] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
|
||
from collections import defaultdict | ||
import argparse | ||
|
||
def get_args(): | ||
    parser = argparse.ArgumentParser( | ||
        description="Filter Hi-C data.") | ||
    parser.add_argument( | ||
        "--prefix", type=str, help="Prefix for output file.") | ||
    parser.add_argument( | ||
        "--all_valid_pairs", type=str, help="All valid pairs file.") | ||
    parser.add_argument( | ||
        "--filter_pairs", type=str, help="Filter pairs file.") | ||
|
||
    args = parser.parse_args() | ||
    return args | ||
|
||
if __name__ == "__main__": | ||
    args = get_args() | ||
|
||
    f=open(args.filter_pairs) | ||
    blackID=defaultdict(int) | ||
Looks like this was just a copy+paste, but can we use proper casing in this file? And

Probably wise. These are just copy+paste.
|
||
    while True: | ||
        line=f.readline() | ||
        if not line: | ||
            break | ||
        cols=line.strip().split("\t") | ||
        ID=cols[0] | ||
        blackID[ID]=1 | ||
|
||
    f2=open(args.all_valid_pairs) | ||
    outfile1=args.prefix + ".allValidPairs.filtered" | ||
    outfile2=args.prefix + ".allValidPairs.removed" | ||
    of1=open(outfile1,'w') | ||
    of2=open(outfile2,'w') | ||
    while True: | ||
        line=f2.readline() | ||
        if not line: | ||
            break | ||
        cols=line.strip().split("\t") | ||
        id=cols[0] | ||
        if id in blackID: | ||
            of2.write(line) | ||
        else: | ||
            of1.write(line) |
Should we incorporate some Python tooling? A linter of some sort, formatter, etc? Don't really want to ask you to rewrite all these py scripts which appear to be mostly copy+paste, but also they aren't really conformant to any Python standards...

I ran both of these scripts through

Running through
My major concern is that with the maintenance of a
There's a million and one Python lint frameworks out there, and I don't have strong feelings about which to use, but I do think we should be using something. My local VScode set-up uses the
Maybe we can add a
|
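To make the casing and tooling points in this thread concrete, here is a minimal sketch of what the filtering step could look like with snake_case names, a set instead of `defaultdict(int)`, and no shadowing of the builtin `id`. The function name `split_valid_pairs` and its file-object parameters are assumptions for illustration, not something in this PR; the matching rule (first tab-separated column as the read ID) is taken from the script above.

```python
import io


def split_valid_pairs(filter_pairs, all_valid_pairs, kept_out, removed_out):
    """Route each valid-pairs line to kept_out or removed_out by read ID.

    All four arguments are open text file objects (hypothetical interface).
    The read ID is the first tab-separated column, as in the script above.
    """
    # Only membership is ever tested, so a set replaces defaultdict(int).
    blocked_ids = {line.strip().split("\t")[0] for line in filter_pairs}

    kept = removed = 0
    for line in all_valid_pairs:
        read_id = line.strip().split("\t")[0]  # avoids shadowing builtin id()
        if read_id in blocked_ids:
            removed_out.write(line)
            removed += 1
        else:
            kept_out.write(line)
            kept += 1
    return kept, removed


# Tiny in-memory demonstration with made-up read IDs.
kept_buf, removed_buf = io.StringIO(), io.StringIO()
counts = split_valid_pairs(
    io.StringIO("read2\n"),
    io.StringIO("read1\tchr1\nread2\tchr2\nread3\tchr3\n"),
    kept_buf,
    removed_buf,
)
print(counts)  # (2, 1)
```

In the real script the four file objects would presumably come from `with open(...)` blocks built from `--filter_pairs`, `--all_valid_pairs`, and `--prefix`, which also takes care of closing the output files.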
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
import os | ||
import argparse | ||
|
||
def get_args(): | ||
    parser = argparse.ArgumentParser( | ||
        description="QC a Hi-C experiment.") | ||
    parser.add_argument( | ||
        "--all_valid_pairs_stats", type=str, help="All valid pairs stats file.") | ||
    parser.add_argument( | ||
        "--pairing_stats", type=str, help="Pairing stats file.") | ||
    parser.add_argument( | ||
        "--mapping_stats_read1", type=str, help="Mapping stats for read 1.") | ||
    parser.add_argument( | ||
        "--mapping_stats_read2", type=str, help="Mapping stats for read 2.") | ||
    parser.add_argument( | ||
        "--fithichip_bed", type=str, help="FithiChIP bed file.") | ||
    parser.add_argument( | ||
        "--fithichip_q01_bed", type=str, help="FithiChIP q01 bed file.") | ||
    parser.add_argument( | ||
        "--peaks_bed", type=str, help="Peaks bed file.") | ||
    parser.add_argument( | ||
        "--prefix", type=str, help="Prefix for output file.") | ||
|
||
|
||
    args = parser.parse_args() | ||
|
||
    return args | ||
|
||
if __name__ == "__main__": | ||
    args = get_args() | ||
    FILES = [args.all_valid_pairs_stats, args.pairing_stats, args.mapping_stats_read1, args.mapping_stats_read2] | ||
    VARIABLE = ["valid_interaction", "cis_shortRange", "cis_longRange", "Total_pairs_processed", "Reported_pairs", "total_R2", "mapped_R2", "total_R1", "mapped_R1"] | ||
    RESULTS = {} | ||
    for eachfile in FILES: | ||
        with open(eachfile) as f: | ||
            for line in f: | ||
                array = line.split() | ||
                if array[0] in VARIABLE: | ||
                    RESULTS[array[0]] = int(array[1]) | ||
|
||
    PERCENTAGES = {} | ||
    COUNTS = {} | ||
    VERDICT = {} | ||
    REP_VAR = ["R1_aligned", "R2_aligned", "valid_interactionPairs", "cis_shortRange", "cis_longRange"] | ||
|
||
    if args.fithichip_bed is not None and os.path.isfile(args.fithichip_bed): | ||
        with open(args.fithichip_q01_bed) as fithichip: | ||
            LOOPS_SIGNIFICANT = len(fithichip.readlines())-1 | ||
        with open(args.fithichip_bed) as fithichip: | ||
            LOOPS = len(fithichip.readlines())-1 | ||
    if args.peaks_bed is not None and os.path.isfile(args.peaks_bed): | ||
        with open(args.peaks_bed) as peaks: | ||
            PEAKS = len(peaks.readlines())-1 | ||
|
||
    PERCENTAGES["R1_aligned"] = round((RESULTS["mapped_R1"]*100)/RESULTS["total_R1"]) | ||
    PERCENTAGES["R2_aligned"] = round((RESULTS["mapped_R2"]*100)/RESULTS["total_R2"]) | ||
    PERCENTAGES["valid_interactionPairs"] = round((RESULTS["valid_interaction"]*100)/RESULTS["Reported_pairs"]) | ||
    PERCENTAGES["cis_shortRange"] = round((RESULTS["cis_shortRange"]*100)/RESULTS["valid_interaction"]) | ||
    PERCENTAGES["cis_longRange"] = round((RESULTS["cis_longRange"]*100)/RESULTS["valid_interaction"]) | ||
    COUNTS["R1_aligned"] = RESULTS["mapped_R1"] | ||
    COUNTS["R2_aligned"] = RESULTS["mapped_R2"] | ||
    COUNTS["valid_interactionPairs"] = RESULTS["valid_interaction"] | ||
    COUNTS["cis_shortRange"] = RESULTS["cis_shortRange"] | ||
    COUNTS["cis_longRange"] = RESULTS["cis_longRange"] | ||
|
||
    for aligned in ["R1_aligned", "R2_aligned"]: | ||
        if PERCENTAGES[aligned] > 80: | ||
            VERDICT[aligned] = "GOOD" | ||
        else: | ||
            VERDICT[aligned] = "BAD" | ||
|
||
    if PERCENTAGES["valid_interactionPairs"] > 50: | ||
        VERDICT["valid_interactionPairs"] = "GOOD" | ||
    else: | ||
        VERDICT["valid_interactionPairs"] = "BAD" | ||
|
||
    if PERCENTAGES["cis_shortRange"] > 50: | ||
        VERDICT["cis_shortRange"] = "BAD" | ||
    elif PERCENTAGES["cis_shortRange"] > 30: | ||
        VERDICT["cis_shortRange"] = "MARGINAL" | ||
    else: | ||
        VERDICT["cis_shortRange"] = "GOOD" | ||
|
||
    if PERCENTAGES["cis_longRange"] > 40: | ||
        VERDICT["cis_longRange"] = "GOOD" | ||
    elif PERCENTAGES["cis_longRange"] > 20: | ||
        VERDICT["cis_longRange"] = "MARGINAL" | ||
    else: | ||
        VERDICT["cis_longRange"] = "BAD" | ||
|
||
    REPORT = open(args.prefix + "_QCreport.txt", 'w') | ||
    REPORT.write("STAT\tCOUNTS\tPERCENTAGE\tVERDICT\n") | ||
    REPORT.write("Total_pairs_processed\t" + str(RESULTS["Total_pairs_processed"]) + "\n") | ||
    for var in REP_VAR: | ||
        REPORT.write(var + "\t" + str(COUNTS[var]) + "\t" + str(PERCENTAGES[var]) + "\t" + VERDICT[var] + '\n') | ||
|
||
    if args.peaks_bed is not None and os.path.isfile(args.peaks_bed): | ||
        REPORT.write("peaks\t" + str(PEAKS) + "\n") | ||
    if args.fithichip_bed is not None and os.path.isfile(args.fithichip_bed): | ||
        REPORT.write("loops\t" + str(LOOPS) + "\n") | ||
        REPORT.write("loops_significant\t" + str(LOOPS_SIGNIFICANT) + "\n") |
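The verdict logic in this script is a set of threshold checks, and one way to make it easier to review is to express it as a lookup table. The sketch below is only an illustration under that assumption: the names `VERDICT_RULES` and `verdict` are hypothetical, while the cutoffs and labels are copied from the `if`/`elif` chains above.

```python
# Each stat maps to (ordered cutoffs, fallback label). A cutoff (bound, label)
# applies when the percentage is strictly greater than bound, matching the
# ">" comparisons in the original script.
VERDICT_RULES = {
    "R1_aligned": ([(80, "GOOD")], "BAD"),
    "R2_aligned": ([(80, "GOOD")], "BAD"),
    "valid_interactionPairs": ([(50, "GOOD")], "BAD"),
    # A high short-range cis fraction is penalised, so the labels are inverted.
    "cis_shortRange": ([(50, "BAD"), (30, "MARGINAL")], "GOOD"),
    "cis_longRange": ([(40, "GOOD"), (20, "MARGINAL")], "BAD"),
}


def verdict(stat, percentage):
    """Return the QC label for one stat given its rounded percentage."""
    cutoffs, fallback = VERDICT_RULES[stat]
    for bound, label in cutoffs:
        if percentage > bound:
            return label
    return fallback


# Example with made-up percentages.
for stat, pct in [("R1_aligned", 92), ("cis_shortRange", 35), ("cis_longRange", 20)]:
    print(stat, pct, verdict(stat, pct))
```

With this shape, changing or adding a threshold is a one-line edit to the table, and the boundary cases (for example exactly 80% aligned) are easy to cover in a unit test.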
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
chr1 248956422 | ||
chr2 242193529 | ||
chr3 198295559 | ||
chr4 190214555 | ||
chr5 181538259 | ||
chr6 170805979 | ||
chr7 159345973 | ||
chr8 145138636 | ||
chr9 138394717 | ||
chr10 133797422 | ||
chr11 135086622 | ||
chr12 133275309 | ||
chr13 114364328 | ||
chr14 107043718 | ||
chr15 101991189 | ||
chr16 90338345 | ||
chr17 83257441 | ||
chr18 80373285 | ||
chr19 58617616 | ||
chr20 64444167 | ||
chr21 46709983 | ||
chr22 50818468 | ||
chrX 156040895 | ||
chrY 57227415 | ||
chrM 16569 | ||
chr1_KI270706v1_random 175055 | ||
chr1_KI270707v1_random 32032 | ||
chr1_KI270708v1_random 127682 | ||
chr1_KI270709v1_random 66860 | ||
chr1_KI270710v1_random 40176 | ||
chr1_KI270711v1_random 42210 | ||
chr1_KI270712v1_random 176043 | ||
chr1_KI270713v1_random 40745 | ||
chr1_KI270714v1_random 41717 | ||
chr2_KI270715v1_random 161471 | ||
chr2_KI270716v1_random 153799 | ||
chr3_GL000221v1_random 155397 | ||
chr4_GL000008v2_random 209709 | ||
chr5_GL000208v1_random 92689 | ||
chr9_KI270717v1_random 40062 | ||
chr9_KI270718v1_random 38054 | ||
chr9_KI270719v1_random 176845 | ||
chr9_KI270720v1_random 39050 | ||
chr11_KI270721v1_random 100316 | ||
chr14_GL000009v2_random 201709 | ||
chr14_GL000225v1_random 211173 | ||
chr14_KI270722v1_random 194050 | ||
chr14_GL000194v1_random 191469 | ||
chr14_KI270723v1_random 38115 | ||
chr14_KI270724v1_random 39555 | ||
chr14_KI270725v1_random 172810 | ||
chr14_KI270726v1_random 43739 | ||
chr15_KI270727v1_random 448248 | ||
chr16_KI270728v1_random 1872759 | ||
chr17_GL000205v2_random 185591 | ||
chr17_KI270729v1_random 280839 | ||
chr17_KI270730v1_random 112551 | ||
chr22_KI270731v1_random 150754 | ||
chr22_KI270732v1_random 41543 | ||
chr22_KI270733v1_random 179772 | ||
chr22_KI270734v1_random 165050 | ||
chr22_KI270735v1_random 42811 | ||
chr22_KI270736v1_random 181920 | ||
chr22_KI270737v1_random 103838 | ||
chr22_KI270738v1_random 99375 | ||
chr22_KI270739v1_random 73985 | ||
chrY_KI270740v1_random 37240 | ||
chrUn_KI270302v1 2274 | ||
chrUn_KI270304v1 2165 | ||
chrUn_KI270303v1 1942 | ||
chrUn_KI270305v1 1472 | ||
chrUn_KI270322v1 21476 | ||
chrUn_KI270320v1 4416 | ||
chrUn_KI270310v1 1201 | ||
chrUn_KI270316v1 1444 | ||
chrUn_KI270315v1 2276 | ||
chrUn_KI270312v1 998 | ||
chrUn_KI270311v1 12399 | ||
chrUn_KI270317v1 37690 | ||
chrUn_KI270412v1 1179 | ||
chrUn_KI270411v1 2646 | ||
chrUn_KI270414v1 2489 | ||
chrUn_KI270419v1 1029 | ||
chrUn_KI270418v1 2145 | ||
chrUn_KI270420v1 2321 | ||
chrUn_KI270424v1 2140 | ||
chrUn_KI270417v1 2043 | ||
chrUn_KI270422v1 1445 | ||
chrUn_KI270423v1 981 | ||
chrUn_KI270425v1 1884 | ||
chrUn_KI270429v1 1361 | ||
chrUn_KI270442v1 392061 | ||
chrUn_KI270466v1 1233 | ||
chrUn_KI270465v1 1774 | ||
chrUn_KI270467v1 3920 | ||
chrUn_KI270435v1 92983 | ||
chrUn_KI270438v1 112505 | ||
chrUn_KI270468v1 4055 | ||
chrUn_KI270510v1 2415 | ||
chrUn_KI270509v1 2318 | ||
chrUn_KI270518v1 2186 | ||
chrUn_KI270508v1 1951 | ||
chrUn_KI270516v1 1300 | ||
chrUn_KI270512v1 22689 | ||
chrUn_KI270519v1 138126 | ||
chrUn_KI270522v1 5674 | ||
chrUn_KI270511v1 8127 | ||
chrUn_KI270515v1 6361 | ||
chrUn_KI270507v1 5353 | ||
chrUn_KI270517v1 3253 | ||
chrUn_KI270529v1 1899 | ||
chrUn_KI270528v1 2983 | ||
chrUn_KI270530v1 2168 | ||
chrUn_KI270539v1 993 | ||
chrUn_KI270538v1 91309 | ||
chrUn_KI270544v1 1202 | ||
chrUn_KI270548v1 1599 | ||
chrUn_KI270583v1 1400 | ||
chrUn_KI270587v1 2969 | ||
chrUn_KI270580v1 1553 | ||
chrUn_KI270581v1 7046 | ||
chrUn_KI270579v1 31033 | ||
chrUn_KI270589v1 44474 | ||
chrUn_KI270590v1 4685 | ||
chrUn_KI270584v1 4513 | ||
chrUn_KI270582v1 6504 | ||
chrUn_KI270588v1 6158 | ||
chrUn_KI270593v1 3041 | ||
chrUn_KI270591v1 5796 | ||
chrUn_KI270330v1 1652 | ||
chrUn_KI270329v1 1040 | ||
chrUn_KI270334v1 1368 | ||
chrUn_KI270333v1 2699 | ||
chrUn_KI270335v1 1048 | ||
chrUn_KI270338v1 1428 | ||
chrUn_KI270340v1 1428 | ||
chrUn_KI270336v1 1026 | ||
chrUn_KI270337v1 1121 | ||
chrUn_KI270363v1 1803 | ||
chrUn_KI270364v1 2855 | ||
chrUn_KI270362v1 3530 | ||
chrUn_KI270366v1 8320 | ||
chrUn_KI270378v1 1048 | ||
chrUn_KI270379v1 1045 | ||
chrUn_KI270389v1 1298 | ||
chrUn_KI270390v1 2387 | ||
chrUn_KI270387v1 1537 | ||
chrUn_KI270395v1 1143 | ||
chrUn_KI270396v1 1880 | ||
chrUn_KI270388v1 1216 | ||
chrUn_KI270394v1 970 | ||
chrUn_KI270386v1 1788 | ||
chrUn_KI270391v1 1484 | ||
chrUn_KI270383v1 1750 | ||
chrUn_KI270393v1 1308 | ||
chrUn_KI270384v1 1658 | ||
chrUn_KI270392v1 971 | ||
chrUn_KI270381v1 1930 | ||
chrUn_KI270385v1 990 | ||
chrUn_KI270382v1 4215 | ||
chrUn_KI270376v1 1136 | ||
chrUn_KI270374v1 2656 | ||
chrUn_KI270372v1 1650 | ||
chrUn_KI270373v1 1451 | ||
chrUn_KI270375v1 2378 | ||
chrUn_KI270371v1 2805 | ||
chrUn_KI270448v1 7992 | ||
chrUn_KI270521v1 7642 | ||
chrUn_GL000195v1 182896 | ||
chrUn_GL000219v1 179198 | ||
chrUn_GL000220v1 161802 | ||
chrUn_GL000224v1 179693 | ||
chrUn_KI270741v1 157432 | ||
chrUn_GL000226v1 15008 | ||
chrUn_GL000213v1 164239 | ||
chrUn_KI270743v1 210658 | ||
chrUn_KI270744v1 168472 | ||
chrUn_KI270745v1 41891 | ||
chrUn_KI270746v1 66486 | ||
chrUn_KI270747v1 198735 | ||
chrUn_KI270748v1 93321 | ||
chrUn_KI270749v1 158759 | ||
chrUn_KI270750v1 148850 | ||
chrUn_KI270751v1 150742 | ||
chrUn_KI270752v1 27745 | ||
chrUn_KI270753v1 62944 | ||
chrUn_KI270754v1 40191 | ||
chrUn_KI270755v1 36723 | ||
chrUn_KI270756v1 79590 | ||
chrUn_KI270757v1 71251 | ||
chrUn_GL000214v1 137718 | ||
chrUn_KI270742v1 186739 | ||
chrUn_GL000216v2 176608 | ||
chrUn_GL000218v1 161147 | ||
chrEBV 171823 |
I think this version dir should be `2.31.1-0`

If this is named after `bedtools`, why does it have `hic` scripts in it?

Originally, it was just plain `bedtools`, but with the decision to remove the embedded scripts, those needed to get built into some image and this one depends on `bedtools`. It can go somewhere else, but then we'll need to install `bedtools` into another container.