Floating point exception in break10x #15
Comments
Hi Chris, Thanks for reporting the errors with break10x. Could you please go to the working directory and send me the output of "ls -lrt"? Best regards, Zemin
Hi Zemin, The tmp directory has:
-rw-r--r-- 1 claumer marioni 475390678 Feb 16 12:55 tarseq.fastq
One thought: I started with reads that had already been run through 10X's longranger basic program (which trims reads, error-corrects barcodes, etc.). Could it be that this broke break10x? Is it expecting raw reads?
Thanks for the information. It looks like some files are missing, and there are also some files that shouldn't be there. Yes, you can use a bam file from longranger. Try this:
~zn1/src/Scaff10X/src/break10x -nodes 2 -gap 100 -reads 5 -score 20 -cover 50 -ratio 15 -bam /lustre/scratch117/sciops/team117/hpag/zn1/project/bird/hummingbird/QC/refdata-cns_p_ctg/fasta/possorted_bam.bam genome.fa genome-break10x.fasta > try.out
Note: the bam file needs a full path. Let me know how it goes. Cheers. Zemin
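The note about the full bam path can be sketched as a small helper that assembles the same argument list and makes the bam path absolute before use. This is only an illustration: the function name `build_break10x_command` and the bare `break10x` executable name are assumptions, while the flags and their values are copied from the command above.

```python
import os
import shlex


def build_break10x_command(break10x, bam, assembly, output_fasta):
    """Build the break10x argument list quoted in the thread.

    The flag values are the ones from the thread. The only thing this
    helper adds is turning the bam path into an absolute path.
    """
    bam_abs = os.path.abspath(os.path.expanduser(bam))
    return [
        break10x,
        "-nodes", "2", "-gap", "100", "-reads", "5",
        "-score", "20", "-cover", "50", "-ratio", "15",
        "-bam", bam_abs,
        assembly, output_fasta,
    ]


# "break10x" here is a placeholder for wherever the binary is installed.
cmd = build_break10x_command(
    "break10x", "possorted_bam.bam", "genome.fa", "genome-break10x.fasta"
)
print(shlex.join(cmd))
```

Printing the joined command first makes it easy to confirm the bam path is absolute before submitting the job.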
Hi Zemin, To clarify, I meant the unaligned fastqs that you get after running the "longranger basic" command. But I did actually try the raw reads as well after I had this thought, and got the same issue I originally described above. I'm trying to use Scaff10x on the EBI cluster, not the Sanger farm, so I had to scp the files you mentioned over. I did this and ran the command exactly as you suggested, and it looks like it finished without any major errors (although genome-break10x.fasta is the same as the input). try.out contains:
Input target assembly file: /nfs/research1/marioni/claumer/Scaff10X_debug/genome.fa
And stderr contains:
sh: -c: line 0: syntax error near unexpected token
Is this expected behavior? I would be keen to hear how you think we should proceed from here in solving this issue. -Chris L
Hi Chris, Sorry, I have been away for two days and didn't reply to your email in time. For the run using possorted_bam.bam, could you check the sizes of the files in the temporary directory: If the sizes of the above files are not zero but the size of break.dat is zero, this means that no breakpoint was found in the assembly. My pipeline break10x has an issue with zero breakpoints and does not report results correctly. I will try to fix this when I have time. At the moment, if you do not see the file scaffolds-break.name and the input and output fasta files are the same, this means there are no breakpoints. Sorry again for the late reply. Zemin
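The checks described above (an empty break.dat, a missing scaffolds-break.name, and identical input and output fasta files) can be collected in a short script. This is a minimal sketch, not part of Scaff10X: the function name `breakpoint_signals` is hypothetical, and it assumes that scaffolds-break.name would be written into the same temporary directory as break.dat, which the thread does not state.

```python
import filecmp
import tempfile
from pathlib import Path


def breakpoint_signals(tmp_dir, input_fasta, output_fasta):
    """Report the three 'no breakpoints' signals mentioned in the thread."""
    tmp = Path(tmp_dir)
    break_dat = tmp / "break.dat"
    return {
        # break.dat exists but has size zero
        "break_dat_empty": break_dat.is_file() and break_dat.stat().st_size == 0,
        # assumption: scaffolds-break.name would live in the same directory
        "name_file_missing": not (tmp / "scaffolds-break.name").exists(),
        # byte-for-byte comparison of input and output assemblies
        "fasta_identical": filecmp.cmp(input_fasta, output_fasta, shallow=False),
    }


# Small self-contained demonstration with made-up files.
with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    (work / "break.dat").write_bytes(b"")
    (work / "in.fa").write_text(">ctg1\nACGT\n")
    (work / "out.fa").write_text(">ctg1\nACGT\n")
    signals = breakpoint_signals(work, work / "in.fa", work / "out.fa")

print(signals)
```

If all three values are true, the run matches the "no breakpoints" situation described above. The script cannot distinguish that from a run in which an upstream step crashed, so the stderr log is still worth reading.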
If you only run "longranger basic", you won't get very much from the pipeline. I would suggest you run "longranger align" or use the fastq files.
Hi Zemin, Sorry also on my part, I've been doing some intensive labwork the last couple of days and let this fall by the wayside momentarily. So the tmp directory contains:
(base) [claumer@noah-login-01 tmp_rununik_64971]$ ls -lt
Going by your comment, this suggests that break10x worked properly on this dataset but did not find any breakpoints supported by linked reads. That implies that this install works, but for some reason my dataset is breaking break10x, right? I'm not quite sure what to do next. Maybe I can share my reads and assemblies with you so you can examine the problem for yourself? I do know that it is a decent linked-read dataset, with deep coverage and good numbers of links per molecule, although the molecule length is a bit low. I have tried it on both the raw fastqs and the trimmed, barcode-in-read-name fastq format output by "longranger basic", and it has failed with both. Thanks,
Hi Chris, Happy to have a look at your data if you could copy your files (read fastqs and assembly) to a Sanger location. Best regards, Zemin
OK, you can grab the files in question at: /lustre/scratch117/cellgen/team220/cl16/DalyG_GyrF_10X The fastqs starting with DalyG* are the raw linked reads for the species in this assembly (ignore the GyrF* files). The two assemblies I have been trying to break and then scaffold with Scaff10x are called primary_polished.fasta (a FALCON assembly) and DalyG_HiFi_q10_redbean_all.ctg.fa (a wtdbg2 assembly). Regards,
Hi Chris, I have run scaff10x/break10x on your dataset and I have put all the results at /lustre/scratch117/sciops/team117/hpag/zn1/project/DalyG A few points:
https://github.com/dfguan/purge_dups Let me know if you need further information. Zemin
Hi Zemin, That's really interesting, and I thank you very much for your work on these data. Yes, I'm aware of the length issue with the linked reads. It appears that, because the HMW DNA was stored at 4 °C at low concentration for several weeks while a batch was being assembled (the library prep was not done by me, unfortunately), the DNA had a chance to degrade a bit. Still, there is definitely some utility in these linked reads. I'll follow up with you separately and more directly by email about these particular steps. In a sense my issue is solved, since Scaff10x works on my data at least on the Sanger cluster, but it would be good to know why the installation I made at EBI is not working on these data. What would you recommend trying next? Is it possible that I need to use exactly the same GCC version as you, rather than gcc 7.1.0? I'd love to be able to reproduce what you've done on a different cluster. Regards,
Hi there,
I'm running break10x on my cluster as follows:
/nfs/research1/marioni/claumer/Scaff10X/src/break10x -nodes 40 -gap 100 -reads 5 -score 20 -cover 50 -ratio 15 /nfs/research1/marioni/claumer/DalyG_HiFi/Scaff10x/DalyG_HiFi_q10_redbean_all.ctg.fa /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R1.fastq /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R2.fastq DalyG_wtdbg2_break10x DalyG_wtdbg2_break10x_breakpoints
I've installed using a conda environment built to use gcc 7.1.0.
The program finishes, gives a "successful complete" to LSF, but when I check stderr I see these last couple of lines:
[main] Version: 0.7.17-r1198-dirty
[main] CMD: /nfs/research1/marioni/claumer/Scaff10X/src/scaff-bin/bwa mem -t 40 tarseq.fastq /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R1.fastq /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R2.fastq
[main] Real time: 15635.842 sec; CPU: 211509.263 sec
sh: line 1: 23754 Floating point exception/nfs/research1/marioni/claumer/Scaff10X/src/scaff-bin/scaff_barcode-cover -score 20 -cover 50 -ratio 15 align.length-sort break.dat cover.dat > break.out
And it looks like the breakpoints file is completely empty and the assembly that's output is the same as the input. So I assume this means break10x has failed. Can you advise on the cause and remedy?
Regards,
Chris L
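Because the job was reported to LSF as successful even though one step crashed, it can help to scan the stderr log for shell crash messages after each run. The following is a rough sketch under stated assumptions: the function name `find_crash_lines` and the list of patterns are illustrative choices, not anything provided by Scaff10X or LSF. The sample text is taken from the stderr lines quoted in the post.

```python
import re

# Messages a shell typically prints when a child process dies from a signal.
CRASH_PATTERN = re.compile(
    r"Floating point exception|Segmentation fault|Bus error|Aborted"
)


def find_crash_lines(stderr_text):
    """Return the stderr lines that look like a crashed child process."""
    return [line for line in stderr_text.splitlines() if CRASH_PATTERN.search(line)]


sample = (
    "[main] Version: 0.7.17-r1198-dirty\n"
    "[main] Real time: 15635.842 sec; CPU: 211509.263 sec\n"
    "sh: line 1: 23754 Floating point exception"
    "/nfs/research1/marioni/claumer/Scaff10X/src/scaff-bin/scaff_barcode-cover "
    "-score 20 -cover 50 -ratio 15 align.length-sort break.dat cover.dat > break.out\n"
)

crashes = find_crash_lines(sample)
for line in crashes:
    print(line)
```

A non-empty result means at least one pipeline step did not finish normally, whatever exit status the scheduler recorded for the job as a whole.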