failed to generate gcsa file using vg index #442

Open
pliang64 opened this issue Jan 15, 2025 · 1 comment

Comments

@pliang64

Hi team,

Since I don't have a VCF file to start with, I generated a GFA graph using pggb and converted it to vg format using:
"vg convert -g species20.gfa -p -t 6 --vg-algorithm >species20.vg"; I then modified it using
"vg mod -X 256 species20.vg >species20mod.vg", followed by indexing using
"vg index -x species20mod.xg -g species20mod.gcsa -t 2 -p -k16 species20mod.vg"

The last command generated the expected xg file, and the log indicated that it was generating the k-mers, but the job exited after ~1 hr with "State: FAILED (exit code 1)" in the Slurm report. FYI, I am using vg version v1.61.0 "Plodio". The progress report from the run is copied below; it gives no specific reason for the abort and only shows where the run stopped. As a result, the gcsa file required for running vg map or giraffe was not generated. I'd appreciate your input on the issue and on an alternative approach for generating proper indexes to run giraffe or map.

Thank you in advance for your help.
Ping Liang

Building XG index
Saving XG index to species20gfamod.xg
Generating kmer files...
Building the GCSA2 index...
InputGraph::InputGraph(): 701300942 kmers in 1 file(s)
InputGraph::read(): Read 701300942 16-mers from /tmp/vg-T1xOOl/vg-kmers-tmp-0xzeqG
InputGraph::readKeys(): 502070824 unique keys
InputGraph::read(): Read 701300942 16-mers from /tmp/vg-T1xOOl/vg-kmers-tmp-0xzeqG
InputGraph::readFrom(): 447217595 unique start nodes
InputGraph::read(): Read 701300942 16-mers from /tmp/vg-T1xOOl/vg-kmers-tmp-0xzeqG
PathGraph::PathGraph(): 701300942 paths with 1402601884 ranks
PathGraph::PathGraph(): 20.9004 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 278.296 seconds, 24.9885 GB
GCSA::GCSA(): Prefix-doubling from path length 16
GCSA::GCSA(): Step 1 (path length 16 -> 32)
PathGraph::prune(): 701300942 -> 662474342 paths (487022124 ranges)
PathGraph::prune(): 395742535 unique, 0 redundant, 266698646 unsorted, 33161 nondeterministic paths
PathGraph::prune(): 19.7433 GB in 1 file(s)
PathGraph::read(): File 0: Read 662474342 order-16 paths
PathGraph::extend(): File 0: Created 1760072743 order-32 paths
PathGraph::read(): File 0: Read 1760072743 order-32 paths
PathGraphBuilder::sort(): File 0: Sorted 1760072743 paths
PathGraph::extend(): 662474342 -> 1760072743 paths (4884442533 ranks)
PathGraph::extend(): 57.5367 GB in 1 file(s)
GCSA::GCSA(): Step 2 (path length 32 -> 64)
PathGraph::prune(): 1760072743 -> 863874315 paths (724446274 ranges)
PathGraph::prune(): 639160705 unique, 0 redundant, 145851269 unsorted, 78862341 nondeterministic paths
PathGraph::prune(): 27.4893 GB in 1 file(s)
PathGraph::read(): File 0: Read 863874315 order-32 paths
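
The log above gives no failure reason, but the sizes it reports can at least be tabulated to see how much space the prefix-doubling steps were using before the run stopped. The following is a minimal sketch, assuming the "GB in N file(s)" lines refer to the temporary path-graph files; `gcsa_sizes` is a hypothetical helper written for this post, not part of vg.

```python
import re

# Matches size reports such as "PathGraph::extend(): 57.5367 GB in 1 file(s)".
# Assumption: these lines describe the temporary path-graph files on disk.
SIZE_RE = re.compile(r"^(\w+::\w+)\(\): ([\d.]+) GB in (\d+) file\(s\)")

def gcsa_sizes(log_text):
    """Return (stage, size_in_GB) pairs in the order they appear in the log."""
    sizes = []
    for line in log_text.splitlines():
        m = SIZE_RE.match(line.strip())
        if m:
            sizes.append((m.group(1), float(m.group(2))))
    return sizes

# Size lines copied from the log in this issue.
LOG = """\
PathGraph::PathGraph(): 20.9004 GB in 1 file(s)
PathGraph::prune(): 19.7433 GB in 1 file(s)
PathGraph::extend(): 57.5367 GB in 1 file(s)
PathGraph::prune(): 27.4893 GB in 1 file(s)
"""

sizes = gcsa_sizes(LOG)
peak = max(sizes, key=lambda s: s[1])
for stage, gb in sizes:
    print(f"{stage:24s} {gb:8.4f} GB")
print("largest reported size:", peak)
```

Comparing the largest reported size with the free space in the temporary directory and the memory granted to the Slurm job is one way to check whether resources were a factor; the log alone does not confirm this.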

@pliang64

PS: I first tried autoindex, with both the giraffe and map workflows; it generated all the expected files except the gcsa files.
"vg autoindex --workflow map -r S.cerevisiae_S288C_GCF_000146045.2_R64_genomic.fna -g species20.gfa -p species20 -H GCF_000146045.2_R64_genomic.gff -f -a -t 32"
