VEP - structural variants too long to annotate #1224

Open
davmlaw opened this issue Jan 13, 2025 · 6 comments

Comments

Contributor

davmlaw commented Jan 13, 2025

We get these:

9       19287711        .       N       <DEL>   .       .       END=85056911;SVLEN=65769200;SVTYPE=DEL;variant_id=282414451
WARNING: variant . on line 2 is too long to annotate: (65769199)
WARNING: variant . on line 17 is too long to annotate: (79389887)
WARNING: variant . on line 18 is too long to annotate: (89063151)
WARNING: variant . on line 24 is too long to annotate: (77858659)

For these:

  • Are we currently handling these by recording that they were not annotated?
  • Could we do the basics, i.e. check for overlap with a gene, set the type to deletion, etc.?
Contributor Author

davmlaw commented Jan 28, 2025

I think this is driven by:

max_sv_size - Structural Variant size VEP can process. Default = 10000000

We should just mark these as skipped and not pull them out of the DB.

We should be able to make an 11M-long deletion or something similar as a test case.
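
A rough sketch of what that check and test case could look like. Assumptions: the span is approximated as END minus POS (for the record above that gives 65769200, while VEP's warning reports 65769199, so VEP's exact computation differs by one), the comparison is a strict "greater than", and `MAX_SV_SIZE` mirrors the documented default. The helper names are hypothetical and not part of VEP or this codebase.

```python
MAX_SV_SIZE = 10_000_000  # assumed to mirror VEP's default max_sv_size


def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def sv_length(vcf_line: str) -> int:
    """Approximate the SV span as END - POS, falling back to abs(SVLEN)."""
    cols = vcf_line.rstrip("\n").split("\t")
    pos = int(cols[1])
    info = parse_info(cols[7])
    if "END" in info:
        return int(info["END"]) - pos
    return abs(int(info["SVLEN"]))


def is_too_long(vcf_line: str, max_sv_size: int = MAX_SV_SIZE) -> bool:
    """True if the record should be marked as skipped (assumed strict '>')."""
    return sv_length(vcf_line) > max_sv_size


def make_test_deletion(chrom: str = "9", pos: int = 19287711,
                       length: int = 11_000_000) -> str:
    """Build a symbolic <DEL> record of the given length for a test case."""
    end = pos + length
    info = f"END={end};SVLEN={length};SVTYPE=DEL"
    return "\t".join([chrom, str(pos), ".", "N", "<DEL>", ".", ".", info])
```

`is_too_long(make_test_deletion())` is true for the 11M deletion, so a record like that should exercise the skip path.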

Contributor Author

davmlaw commented Jan 29, 2025

The runs were killed by the OOM killer:

[ 2665.425407] Out of memory: Killed process 2364 (perl) total-vm:6523208kB, anon-rss:6276700kB, file-rss:2816kB, shmem-rss:0kB, UID:1001 pgtables:12392kB oom_score_adj:0
[ 5926.724841] Out of memory: Killed process 5411 (perl) total-vm:6571872kB, anon-rss:6406196kB, file-rss:2560kB, shmem-rss:0kB, UID:1001 pgtables:12652kB oom_score_adj:0
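
For reading these lines at a glance, a small sketch that pulls out the PID, process name and anon-rss. The regex and the function name are my own, written against the two lines above only, so other kernel versions may format this differently. Kernel "kB" is treated as KiB here.

```python
import re

# Matches the "Killed process <pid> (<name>) ... anon-rss:<n>kB" part of the line.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_line(line: str):
    """Return (pid, process name, anon-rss in GiB), or None if no match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    gib = int(match["anon_rss_kb"]) / (1024 * 1024)
    return int(match["pid"]), match["name"], gib
```

On the two lines above this gives roughly 6.0 and 6.1 GiB of anonymous memory for the perl processes at the time they were killed.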

It would be a bit of a pain not to write them into the VCF, as we would then need multiple queries for "unannotated variants", i.e. for the checks in the annotation range lock etc.

I don't think writing them out into the VCF harms anything: they'll be quickly skipped, and the skip reason will now be set correctly.

I will pull an SV file down to my local machine, run it once with and once without the long records, and compare memory usage:

   9666 dump_18490_structural_variant.vcf
   9002 dump_18490_structural_variant_no_longies.vcf
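
For reference, the stripping step can be sketched roughly as below. This is an illustration, not the script actually used: the function names are hypothetical, the span is approximated as END minus POS, and the 10000000 threshold is assumed to match `max_sv_size`.

```python
def info_end(info: str):
    """Return the END value from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        if item.startswith("END="):
            return int(item[4:])
    return None


def strip_long_records(lines, max_sv_size: int = 10_000_000):
    """Split VCF lines into (kept, removed); header lines are always kept."""
    kept, removed = [], []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        end = info_end(cols[7]) if len(cols) > 7 else None
        # Records without END (e.g. plain SNVs) are never treated as long.
        if end is not None and end - int(cols[1]) > max_sv_size:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed
```

Reading the original VCF with `open(...)`, passing the file object in, and writing `kept` to the "no_longies" file gives the second input; `len(removed)` is the number of long records dropped.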

Original file (includes 664 long SVs)

	User time (seconds): 493.57
	System time (seconds): 19.49
	Percent of CPU this job got: 101%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 8:24.65
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6402328
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 9281587
	Voluntary context switches: 588651
	Involuntary context switches: 4424
	Swaps: 0
	File system inputs: 573264
	File system outputs: 5047320
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

New file (long SVs stripped out)

	User time (seconds): 475.92
	System time (seconds): 18.74
	Percent of CPU this job got: 101%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 8:05.08
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6354500
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 8737845
	Voluntary context switches: 585553
	Involuntary context switches: 5826
	Swaps: 0
	File system inputs: 0
	File system outputs: 5047096
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Now trying the file that crashed VEP:

692K Jan 29 12:14 dump_18489_structural_variant_no_longies.vcf
802K Jan 29 12:12 dump_18489_structural_variant.vcf

dump_18489_structural_variant.vcf

	User time (seconds): 536.74
	System time (seconds): 22.70
	Percent of CPU this job got: 101%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 9:10.62
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 7317408
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 12423805
	Voluntary context switches: 675864
	Involuntary context switches: 6448
	Swaps: 0
	File system inputs: 591736
	File system outputs: 5802280
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

dump_18489_structural_variant_no_longies.vcf

	User time (seconds): 520.17
	System time (seconds): 25.03
	Percent of CPU this job got: 102%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 8:53.85
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 7233744
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 13762495
	Voluntary context switches: 672652
	Involuntary context switches: 5893
	Swaps: 0
	File system inputs: 0
	File system outputs: 5801816
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Contributor Author

davmlaw commented Jan 29, 2025

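So peak RAM (maximum resident set size) barely changes with the long SVs removed: 6402328 vs 6354500 kB for dump_18490, and 7317408 vs 7233744 kB for dump_18489.

So I won't strip them out.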
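
To put a number on "doesn't vary much", a quick calculation using the maximum resident set sizes from the four `time` reports above (the function name is just for illustration):

```python
def relative_drop(before_kb: int, after_kb: int) -> float:
    """Percentage reduction in peak RSS going from before_kb to after_kb."""
    return (before_kb - after_kb) / before_kb * 100


# (with long SVs, without long SVs), maximum resident set size in kbytes
runs = {
    "dump_18490": (6402328, 6354500),
    "dump_18489": (7317408, 7233744),
}

for name, (with_long, without_long) in runs.items():
    drop = relative_drop(with_long, without_long)
    print(f"{name}: {drop:.2f}% lower peak RSS without long SVs")
```

That works out to about 0.75% and 1.14%, i.e. stripping the long records saves well under 2% of peak memory in both files.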

Contributor Author

davmlaw commented Jan 29, 2025

Plan B is to limit the number of RepeatMasker records returned.

It was getting way out of hand, so I have limited it to 10k.
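
The general shape of such a cap, as a sketch only: where the RepeatMasker records actually come from (and how the real limit is applied) isn't shown in this issue, so `records` here is any iterable and the names are hypothetical.

```python
from itertools import islice

MAX_REPEAT_MASKER_RECORDS = 10_000  # the 10k limit mentioned above


def limit_records(records, limit: int = MAX_REPEAT_MASKER_RECORDS) -> list:
    """Return at most `limit` items, without consuming the rest of the iterable."""
    return list(islice(records, limit))
```

Using `islice` means a very long SV overlapping a huge number of repeats stops being read after the first 10k records, rather than loading everything and truncating afterwards.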

Contributor Author

davmlaw commented Jan 30, 2025

I have removed these from the VCF just before writing, and deployed to vg.com - will see if it makes a difference, re-running the annotation pipelines that crashed.

There really are over 1k variants in there that are >10M long, wtf

Contributor Author

davmlaw commented Jan 30, 2025

Hmm, seems to work. I need to verify that nothing went askew with not dumping those variants (i.e. that they still got the annotation-skipped message etc.).

I think they would only be inserted if there were VEP warnings. Should just do this ASAP.
