Program stalled/zombified #15

Closed
oushujun opened this issue Jun 2, 2022 · 7 comments

Comments

@oushujun

oushujun commented Jun 2, 2022

Hello,

I ran aligned_bam_to_cpg_scores.py on aligned reads and it stalled after a few seconds. The log file shows that a few sequences finished, and then there was no more output. Checking the running tasks shows 0% CPU usage and low memory usage. I killed the program, removed the log file, switched to different nodes, and reran the program, with the same result. What could be the cause, and how can I solve it?

python ~/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20

Thanks,
Shujun

@dportik
Member

dportik commented Jun 2, 2022

Hi @oushujun,
It would be helpful to post the log file. This would allow us to see what steps the threads are on when you see this behavior. I have not observed this happening with our test datasets.

@oushujun
Author

oushujun commented Jun 2, 2022

Hello @dportik,

The log file looks normal, except that the program stopped processing reads and sequences.
test-aligned_bam_to_cpg_scores.log

Below are the current tasks and all of them show 0-0.1% of CPU usage. The node is pretty idle, so there should be plenty of resources.

sou6     44933  0.1  0.2 7137972 502136 pts/0  Sl   10:00   0:12 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44982  0.0  0.0 6486996 140228 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44983  0.0  0.0 6486996 140232 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44984  0.0  0.0 6486996 140232 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44985  0.0  0.0 6486996 140236 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44986  0.0  0.0 6486996 140240 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44987  0.0  0.0 6486996 140240 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44988  0.0  0.0 6486996 140244 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44989  0.0  0.0 6486996 140248 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44990  0.0  0.0 6486996 140248 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44991  0.0  0.0 6486996 140252 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44992  0.0  0.0 6486996 140252 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44993  0.0  0.0 6486996 140248 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44994  0.0  0.0 6486996 140248 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44995  0.0  0.0 6486996 140248 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44996  0.0  0.0 6486996 140248 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44997  0.0  0.0 6486996 140252 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44998  0.0  0.0 6486996 140256 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     44999  0.0  0.0 6487252 140260 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45000  0.0  0.0 6487252 140264 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45001  0.0  0.0 6487252 140264 pts/0  S    10:00   0:00 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45005  0.0  0.1 6756532 207616 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45008  0.0  0.1 6888292 207720 pts/0  S    10:00   0:07 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45009  0.0  0.1 6888024 208144 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45010  0.0  0.1 6887784 207276 pts/0  S    10:00   0:05 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45011  0.0  0.1 6888528 207880 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45012  0.0  0.1 6889020 210248 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45013  0.0  0.1 6888104 207848 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45014  0.0  0.1 6888844 208920 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45015  0.0  0.1 6888836 208900 pts/0  S    10:00   0:05 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45016  0.0  0.1 6888264 208080 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45017  0.0  0.1 6888324 207876 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45018  0.0  0.1 6889640 209636 pts/0  S    10:00   0:05 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45019  0.0  0.1 6887532 207612 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45020  0.0  0.1 6888568 208440 pts/0  S    10:00   0:05 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45021  0.0  0.1 6888584 208568 pts/0  S    10:00   0:03 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45022  0.0  0.1 6888608 208476 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45023  0.0  0.1 6888604 208652 pts/0  S    10:00   0:06 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45024  0.0  0.1 6887536 207620 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45025  0.0  0.1 6888252 208068 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20
sou6     45026  0.0  0.1 6887576 207440 pts/0  S    10:00   0:04 python /home/sou6/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b HiFi.CpG.unmapped.CpG.aln.bam -f test.fa -o test --pileup_mode count --modsites reference --min_coverage 4 --min_mapq 1 --threads 20

Shujun
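
A listing like the one above can be summarized with a short script. The following is a minimal sketch, assuming the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name `summarize_ps` is hypothetical, and the sample lines reuse three rows from the listing with the command arguments omitted.

```python
from collections import Counter


def summarize_ps(ps_text):
    """Count processes per main state and list PIDs with non-zero %CPU.

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    states = Counter()
    busy = []
    for line in ps_text.splitlines():
        fields = line.split(None, 10)  # keep the whole command as one field
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        try:
            cpu = float(fields[2])
        except ValueError:
            continue  # skip a header row such as "USER PID %CPU ..."
        pid, stat = fields[1], fields[7]
        states[stat[0]] += 1  # first letter is the main state (R, S, D, Z, ...)
        if cpu > 0.0:
            busy.append(pid)
    return states, busy


# Three rows from the listing above, command arguments omitted.
example = """\
sou6     44933  0.1  0.2 7137972 502136 pts/0  Sl   10:00   0:12 python aligned_bam_to_cpg_scores.py
sou6     44982  0.0  0.0 6486996 140228 pts/0  S    10:00   0:00 python aligned_bam_to_cpg_scores.py
sou6     45005  0.0  0.1 6756532 207616 pts/0  S    10:00   0:04 python aligned_bam_to_cpg_scores.py
"""

states, busy = summarize_ps(example)
print(dict(states), busy)
```

In `ps` output, a STAT beginning with `S` means interruptible sleep and `Z` means a defunct (zombie) process. Every row in the listing is in state `S`, which suggests the processes are waiting on something rather than being zombies in the strict sense.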

@dportik
Member

dportik commented Jun 2, 2022

Hi @oushujun ,
The log file indicates the input BAM file is strange. Expected output from an aligned BAM would be:

07-Apr-22 08:44:13: DEBUG: coordinates chr1: 1-500,000: (1) run_process_region: start
07-Apr-22 08:44:13: DEBUG: coordinates chr1: 500,001-1,000,000: (1) run_process_region: start
07-Apr-22 08:44:13: DEBUG: coordinates chr1: 1,000,001-1,500,000: (1) run_process_region: start
07-Apr-22 08:44:13: DEBUG: coordinates chr1: 1,500,001-2,000,000: (1) run_process_region: start
07-Apr-22 08:44:13: DEBUG: coordinates chr1: 2,000,001-2,500,000: (1) run_process_region: start
07-Apr-22 08:44:13: DEBUG: coordinates chr1: 2,500,001-3,000,000: (1) run_process_region: start

Your log file is reporting:

02-Jun-22 10:00:40: DEBUG: coordinates m54333U_210725_040653/179/ccs: 1-25,378: (1) run_process_region: start
02-Jun-22 10:00:40: DEBUG: coordinates m54333U_210725_040653/268/ccs: 1-19,662: (1) run_process_region: start
02-Jun-22 10:00:40: DEBUG: coordinates m54333U_210725_040653/298/ccs: 1-13,902: (1) run_process_region: start
02-Jun-22 10:00:40: DEBUG: coordinates m54333U_210725_040653/364/ccs: 1-15,381: (1) run_process_region: start
02-Jun-22 10:00:40: DEBUG: coordinates m54333U_210725_040653/538/ccs: 1-12,357: (1) run_process_region: start

Those are not chromosomal coordinates; they are HiFi reads. Is this an unaligned BAM file?
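
One way to check this before running the script is to look at the reference names in the BAM header. The sketch below is illustrative only: it assumes the header text has already been obtained (for example from `samtools view -H`), the function names are hypothetical, and the regular expression is an assumed pattern based on the read names in the log above, not an official definition of HiFi read naming.

```python
import re

# Assumed pattern for HiFi (CCS) read names such as
# "m54333U_210725_040653/179/ccs": movie name / ZMW number / "ccs".
HIFI_READ_NAME = re.compile(r"^m\d+[A-Za-z]?_\d{6}_\d{6}/\d+/ccs$")


def reference_names(header_text):
    """Return the SN (sequence name) value of every @SQ line in a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def read_like_references(header_text):
    """Return the reference names that look like HiFi read names."""
    return [n for n in reference_names(header_text) if HIFI_READ_NAME.match(n)]


# Made-up header text for illustration only.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@SQ\tSN:m54333U_210725_040653/179/ccs\tLN:25378",
    "@SQ\tSN:m54333U_210725_040653/268/ccs\tLN:19662",
])

suspicious = read_like_references(example_header)
total = len(reference_names(example_header))
print(f"{len(suspicious)} of {total} references look like reads")
```

If most or all reference names match the read pattern, the BAM was aligned against reads rather than against a genome assembly.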

@oushujun
Author

oushujun commented Jun 2, 2022

This is an aligned BAM file, except that I aligned the HiFi reads to themselves so that I can call CpGs for each read, as a workaround for #11.

Previously I used --pileup_mode model to process these alignments, and it ended up outputting nothing due to insufficient data, so I am trying the count mode, but it got stuck.

I have had similar stalls before with normal genome alignments, and restarting the job would usually solve the issue. But this time, no matter how many times I restarted the job or changed to different nodes, it just didn't work.

Shujun

@dportik
Member

dportik commented Jun 3, 2022

Hi @oushujun,
Unfortunately we did not design the script for this specific use-case, which will make troubleshooting difficult. Maybe we can revisit issue #11 instead.

@oushujun
Author

oushujun commented Jun 4, 2022

I have this issue with some other genomes too. Some runs stalled in the middle without any further progress, while others ran at a much lower CPU% and were very slow to finish (e.g., a run that could have finished in two hours took 3 days instead).

Maybe it's due to the presence of contigs/sequences shorter than the window size? Or due to the presence of low-coverage sequences? For genome analyses, the stalling issue could be resolved by restarting the analysis. Could you help check whether any code is causing the low-CPU stalling issue?
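
To get a feel for the short-sequence hypothesis, the number of regions a reference would be split into can be estimated up front. This is a rough sketch, not the script's actual code: the 500,000 bp window size is inferred from the expected log output earlier in this thread, and `windows` and `count_tasks` are hypothetical names.

```python
def windows(length, window_size=500_000):
    """Yield 1-based inclusive (start, end) windows covering a sequence."""
    for start in range(1, length + 1, window_size):
        yield start, min(start + window_size - 1, length)


def count_tasks(seq_lengths, window_size=500_000):
    """Return (total windows, number of sequences shorter than one window)."""
    total = 0
    short = 0
    for length in seq_lengths.values():
        total += sum(1 for _ in windows(length, window_size))
        if length < window_size:
            short += 1
    return total, short


# Read lengths taken from the log excerpt earlier in this thread.
read_reference = {
    "m54333U_210725_040653/179/ccs": 25_378,
    "m54333U_210725_040653/268/ccs": 19_662,
    "m54333U_210725_040653/298/ccs": 13_902,
    "m54333U_210725_040653/364/ccs": 15_381,
    "m54333U_210725_040653/538/ccs": 12_357,
}

total, short = count_tasks(read_reference)
print(f"{total} windows, {short} sequences shorter than one window")
print(list(windows(25_378)))
```

With reads as the reference, every sequence becomes its own tiny window, so the number of regions grows with the number of reads. Whether that contributes to the stall is not confirmed anywhere in this thread.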

@ctsa
Member

ctsa commented Sep 15, 2022

Thanks @oushujun, unfortunately I don't think we can scope in this use-case.

@ctsa ctsa closed this as completed Sep 15, 2022