- # slivar: filter/annotate variants in VCF/BCF format with simple expressions
+ # slivar: filter/annotate variants in VCF/BCF format with simple expressions [![Build Status](https://travis-ci.org/brentp/slivar.svg?branch=master)](https://travis-ci.org/brentp/slivar)
+
+ slivar is a set of command-line tools that enables rapid querying and filtering of VCF files.
+ It facilitates operations on trios and [groups](#groups) and allows arbitrary expressions using simple javascript.
+
+ #### use-cases for `slivar`
+
+ + annotate variants with combined exomes + whole genomes at > 30K variants/second using only a 1.5GB compressed annotation file
+ + call *denovo* variants with a simple expression that uses *mom*, *dad*, *kid* labels and is applied to each trio in a cohort (as inferred from a pedigree file):
+ `kid.alts == 1 && mom.alts == 0 && dad.alts == 0 && kid.DP > 10 && mom.DP > 10 && dad.DP > 10`
+ + define and filter on arbitrary groups with labels. For example, 7 sets of samples, each with 1 normal and 3 tumor time-points:
+ `normal.AD[1] == 0 && tumor1.AB < tumor2.AB && tumor2.AB < tumor3.AB`
+ + filter variants with simple expressions:
+ `variant.call_rate > 0.9 && variant.FILTER == "PASS" && INFO.AC < 22 && variant.num_hom_alt == 0`
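Because slivar expressions are plain javascript evaluated once per trio, the de novo expression above behaves like an ordinary boolean function of the three labeled samples. The sketch below is illustrative only: `isDenovo` is a hypothetical name, and the samples are plain objects carrying just the `alts` and `DP` fields the expression reads, not slivar's real sample objects.

```javascript
// Illustrative only: mock samples carrying just the fields the expression reads.
// slivar builds its own sample objects; these plain objects are hypothetical.
function isDenovo(kid, mom, dad) {
  // Same logic as the de novo use-case expression above.
  return kid.alts == 1 && mom.alts == 0 && dad.alts == 0 &&
    kid.DP > 10 && mom.DP > 10 && dad.DP > 10;
}

const kid = { alts: 1, DP: 25 };
const mom = { alts: 0, DP: 30 };
const dad = { alts: 0, DP: 18 };
const lowDepthDad = { alts: 0, DP: 8 };

console.log(isDenovo(kid, mom, dad));         // true
console.log(isDenovo(kid, mom, lowDepthDad)); // false: dad.DP is not > 10
```

An unknown kid genotype (`alts == -1`) also fails the check, since the expression requires `kid.alts == 1`.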

slivar has sub-commands:
+ [expr](#expr): trio and group expressions and filtering
- + [gnotate](#gnotate): rapidly annotate a VCF/BCF with gnomad
- + filter: filter a VCF with an expression
+ + [gnotate](#gnotate): filter and/or annotate a VCF/BCF file
+ + [make-gnotate](#gnotate): make a compressed zip file of annotations for use by slivar

# Table of Contents

@@ -47,28 +60,35 @@ will be tested on each of those 200 trios.
The expressions are javascript so the user can make these as complex as needed.


- ```
+ ```bash
+ # --pass-only: output only variants that pass one of the filters (default is to output all variants)
+ # --gnotate: a compressed zip that allows fast annotation so that `gnomad_af` is available in the expressions below
+ # --js: any valid javascript is allowed in this file; use it to provide functions for the expressions below
+ # --info: applied before the trio filters; a stringent filter here can speed evaluation
slivar expr \
    --pass-only \
    --vcf $vcf \
    --ped $ped \
-     --gnomad $gnomad_zip \ # a compressed gnomad zip that allows fast annotation so that `gnomad_af` is available in the expressions below.
-     --load functions.js \ # any valid javascript is allowed here.
+     --gnotate $gnomad_af.zip \
+     --js js/functions.js \
    --out-vcf annotated.bcf \
-     --info "variant.call_rate > 0.9" \ # this filter is applied before the trio filters and can speed evaluation if it is stringent.
+     --info "variant.call_rate > 0.9" \
    --trio "denovo:kid.alts == 1 && mom.alts == 0 && dad.alts == 0 \
        && kid.AB > 0.25 && kid.AB < 0.75 \
        && (mom.AD[1] + dad.AD[1]) == 0 \
        && kid.GQ >= 20 && mom.GQ >= 20 && dad.GQ >= 20 \
        && kid.DP >= 12 && mom.DP >= 12 && dad.DP >= 12" \
-     --trio "informative:kid.GQ > 20 && dad.GQ > 20 && mom.GQ > 20 && kid.alts == 1 && ((mom.alts == 1 && dad.alts == 0) || (mom.alts == 0 && dad.alts == 1))" \
-     --trio "recessive:recessive_func(kid, mom, dad)"
+     --trio "informative:kid.GQ > 20 && dad.GQ > 20 && mom.GQ > 20 && kid.alts == 1 && \
+         ((mom.alts == 1 && dad.alts == 0) || (mom.alts == 0 && dad.alts == 1))" \
+     --trio "recessive:trio_autosomal_recessive(kid, mom, dad)"

```

Note that `slivar` does not give direct access to the genotypes, instead exposing `alts` where 0 is homozygous reference, 1 is heterozygous, 2 is
homozygous alternate and -1 when the genotype is unknown. It is recommended to **decompose** a VCF before sending it to `slivar`.

+ Here it is assumed that `trio_autosomal_recessive` is defined in `js/functions.js`; an example implementation of that
+ and other useful functions is provided [here](https://github.com/brentp/slivar/blob/master/js/functions.js).
+
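For readers who want to see the shape of such a function, here is a hypothetical sketch of `trio_autosomal_recessive`. It is not the implementation linked above; it assumes only the `alts` and `GQ` fields used in the expressions in this README, uses an illustrative GQ threshold of 20, and does not check the chromosome, so a real autosomal helper would also need to exclude the sex chromosomes.

```javascript
// Hypothetical helper of the kind that could live in js/functions.js.
// NOT the linked implementation; assumes each sample exposes `alts`
// (0, 1, 2, or -1 when unknown) and `GQ`, as used elsewhere in this README.
function trio_autosomal_recessive(kid, mom, dad) {
  const minGQ = 20; // illustrative threshold, not a slivar default
  // Kid is homozygous alternate and both parents are heterozygous carriers.
  // Note: the chromosome is not checked, so sex chromosomes are not excluded.
  return kid.alts == 2 && mom.alts == 1 && dad.alts == 1 &&
    kid.GQ >= minGQ && mom.GQ >= minGQ && dad.GQ >= minGQ;
}

// Example calls with mock samples.
console.log(trio_autosomal_recessive({ alts: 2, GQ: 40 }, { alts: 1, GQ: 35 }, { alts: 1, GQ: 50 })); // true
console.log(trio_autosomal_recessive({ alts: 2, GQ: 40 }, { alts: 2, GQ: 35 }, { alts: 1, GQ: 50 })); // false: mom is hom-alt
```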
#### Groups

A `trio` is a special case of a `group` that can be inferred from a pedigree. For more specialized use-cases, a `group` can be
@@ -91,7 +111,7 @@ sample7 sample8 sample9 sample12
```

where `sample10` will be available as "sibling" in the first family and an expression like:
- ```
+ ```bash
kid.alts == 1 && mom.alts == 0 && dad.alts == 0 && sibling.alts == 0
```
could be specified and it would automatically be applied to each of the 3 families.
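To make the label mechanism concrete, here is a minimal javascript sketch of the idea, not slivar's implementation: each row of the groups file binds the column labels to that row's samples, and one expression is evaluated against every row. The family objects, IDs, and genotypes are made up for illustration.

```javascript
// Illustrative only: each object stands for one row of a groups file, mapping
// labels to mock samples; the IDs and genotypes here are hypothetical.
const families = [
  { kid: { id: "famA-kid", alts: 1 }, mom: { id: "famA-mom", alts: 0 },
    dad: { id: "famA-dad", alts: 0 }, sibling: { id: "famA-sib", alts: 0 } },
  { kid: { id: "famB-kid", alts: 1 }, mom: { id: "famB-mom", alts: 0 },
    dad: { id: "famB-dad", alts: 0 }, sibling: { id: "famB-sib", alts: 1 } },
  { kid: { id: "famC-kid", alts: 1 }, mom: { id: "famC-mom", alts: 0 },
    dad: { id: "famC-dad", alts: 0 }, sibling: { id: "famC-sib", alts: 0 } },
];

// The expression is written once, in terms of labels only.
const expr = ({ kid, mom, dad, sibling }) =>
  kid.alts == 1 && mom.alts == 0 && dad.alts == 0 && sibling.alts == 0;

// Apply the same expression to every family and report the kids that pass.
const passing = families.filter((family) => expr(family)).map((family) => family.kid.id);
console.log(passing); // [ 'famA-kid', 'famC-kid' ]
```

The second family fails only because its sibling carries the variant, which is exactly what the extra `sibling` label lets the expression express.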
@@ -106,7 +126,7 @@ ss3 ss16 ss17 ss18 ss19

where, again, each row is a sample and the IDs (starting with "ss") will be injected for each sample to allow a single
expression like:
- ```
+ ```bash
normal.alts == 0 && normal.DP > 10 \
&& tumor1.AB > 0 \
&& tumor1.AB < tumor2.AB \
@@ -117,18 +137,49 @@ normal.alts == 0 && normal.DP > 10 \
to find a somatic variant that has increasing frequency (AB is allele balance) along the tumor time-points.


+ More detail on groups is provided [here](https://github.com/brentp/slivar/wiki/groups-in-slivar).
+
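The tumor expression relies on AB increasing across time-points. A minimal sketch, assuming AB is computed from `AD` as alt reads / (ref + alt reads); slivar's exact definition may differ, for example at multi-allelic sites. `alleleBalance` and `increasingInTumors` are hypothetical names, the read counts are invented, and the final `tumor2.AB < tumor3.AB` comparison is taken from the use-case list because only the start of the tumor expression is shown here.

```javascript
// Assumption: AB is derived from AD = [ref depth, alt depth] as alt / (ref + alt).
// slivar's own definition may differ, for example at multi-allelic sites.
function alleleBalance(ad) {
  const total = ad[0] + ad[1];
  return total > 0 ? ad[1] / total : -1; // -1 marks "no reads" in this sketch
}

// Mock group: one normal and three tumor time-points (hypothetical values).
const group = {
  normal: { alts: 0, DP: 30 },
  tumor1: { AD: [27, 3] },  // AB = 0.1
  tumor2: { AD: [20, 10] }, // AB ~ 0.33
  tumor3: { AD: [10, 20] }, // AB ~ 0.67
};

// Mirrors the tumor expression: clean, covered normal, then strictly increasing AB.
function increasingInTumors(g) {
  const [ab1, ab2, ab3] = [g.tumor1, g.tumor2, g.tumor3].map((s) => alleleBalance(s.AD));
  return g.normal.alts == 0 && g.normal.DP > 10 &&
    ab1 > 0 && ab1 < ab2 && ab2 < ab3;
}

console.log(increasingInTumors(group)); // true
```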
### Gnotate

- This uses a compressed, reduced representation of gnomad allele frequencies **and FILTERs** to reduce from the 600+ GB of data for the
- **whole genome and exome** to a 1.5GB file distributed [here](https://s3.amazonaws.com/gemini-annotations/gnomad-2.1.zip).
- The zip file encodes the popmax_AF (whichever is higher between whole genome and exome) and the FILTER for every variant in gnomad.
- It can annotate at faster than 10K variants per second.
+ The `gnotate` sub-command allows filtering and/or annotating.
+ More extensive documentation and justification for annotating with `gnotate` are [here](https://github.com/brentp/slivar/wiki/gnotate).
+
+ `gnotate` uses a compressed, reduced representation of a single value pulled from a population VCF, along with a boolean that indicates a
+ non-PASS filter. This can, for example, reduce the 600+ GB of data for the **whole genome and exome** from gnomad to a 1.5GB file
+ distributed [here](https://s3.amazonaws.com/gemini-annotations/gnomad-2.1.zip).
+ The zip file encodes the popmax_AF (whichever is higher between whole genome and exome) and whether a non-PASS FILTER is present for every variant
+ in gnomad.
+
+ It can annotate at faster than 30K variants per second (limited by the speed of parsing the query VCF).
+
+ ```bash
+ slivar gnotate --vcf $input_vcf -o $output_bcf --threads 3 --gnotate encoded.zip
+ ```
+ It's also possible to use `gnotate` as a filtering command without specifying any `--gnotate` arguments.
+
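Conceptually, annotating amounts to looking each query variant up in the encoded file and, on a hit, writing the stored value (and a flag for a non-PASS FILTER) into INFO. The sketch below only illustrates that idea: an in-memory `Map` stands in for slivar's compressed zip, and the `chrom:pos:ref:alt` key and the `_filter` suffix for the flag name are hypothetical, not slivar's on-disk format or its output field names.

```javascript
// Conceptual sketch only: a Map stands in for slivar's compressed zip.
// The key format and the `_filter` flag suffix are hypothetical.
function variantKey(v) {
  return `${v.chrom}:${v.pos}:${v.ref}:${v.alt}`;
}

// Each entry stores a single value plus a boolean for a non-PASS FILTER.
const annotations = new Map([
  ["1:12345:A:G", { value: 0.0021, nonPass: false }],
  ["1:22222:C:T", { value: 0.31, nonPass: true }],
]);

// Add the value (and a flag when the population variant was filtered) to INFO.
function annotate(variant, name) {
  const hit = annotations.get(variantKey(variant));
  if (hit === undefined) return variant; // not in the population file: INFO unchanged
  variant.INFO[name] = hit.value;
  if (hit.nonPass) variant.INFO[`${name}_filter`] = true;
  return variant;
}

const v = annotate({ chrom: "1", pos: 22222, ref: "C", alt: "T", INFO: {} }, "gnomad_popmax_af");
console.log(v.INFO); // { gnomad_popmax_af: 0.31, gnomad_popmax_af_filter: true }
```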
+ #### make-gnotate
+
+ Users can make their own `gnotate` files like:
+
+ ```bash
+ slivar make-gnotate --prefix gnomad \
+     --field AF_popmax:gnomad_popmax_af \
+     --field nhomalt:gnomad_num_homalt \
+     gnomad.exomes.r2.1.sites.vcf.gz gnomad.genomes.r2.1.sites.vcf.gz
+ ```
+
+ This will pull `AF_popmax` and `nhomalt` from the INFO field and put them into `gnomad.zip` as `gnomad_popmax_af` and `gnomad_num_homalt`, respectively.
+ The resulting zip file will contain the union of values seen in the exome and genome files, keeping the maximum value for any variant present in both.
+ Note that the names (`gnomad_popmax_af` and `gnomad_num_homalt` in this case) should be chosen carefully, as those will be the names added to the INFO of any file annotated with the resulting `gnomad.zip`.
+
+ More information on `make-gnotate` is [in the wiki](https://github.com/brentp/slivar/wiki/make-gnotate).
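The merge rule for the exome and genome inputs (union of variants, maximum value where both have one) can be sketched per field as below. Plain `Map`s keyed by a hypothetical `chrom:pos:ref:alt` string stand in for the parsed VCFs, and `mergeMax` is an illustrative name, not part of slivar.

```javascript
// Sketch of the merge rule described above: keep the union of variants from
// both inputs and, where a variant appears in both, keep the larger value.
// Maps keyed by a hypothetical "chrom:pos:ref:alt" string stand in for parsed VCFs.
function mergeMax(exomes, genomes) {
  const merged = new Map(exomes); // copy so the exome input is not modified
  for (const [key, value] of genomes) {
    const existing = merged.get(key);
    merged.set(key, existing === undefined ? value : Math.max(existing, value));
  }
  return merged;
}

const exomes = new Map([["1:100:A:G", 0.01], ["1:200:C:T", 0.2]]);
const genomes = new Map([["1:200:C:T", 0.25], ["2:300:G:A", 0.003]]);
const merged = mergeMax(exomes, genomes);
console.log([...merged.entries()]);
```

Here `1:200:C:T` appears in both inputs and keeps 0.25, while the other two variants are carried over unchanged.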

- slivar gnotate --vcf $input_vcf -o $output_bcf --threads 3 -g gnomad-2.1.zip

## Installation

Get the latest binary from: https://github.com/brentp/slivar/releases/latest
+ This will require libhts.so (from [htslib](https://htslib.org)) to be in the usual places or in a directory indicated in `LD_LIBRARY_PATH`.

or use via docker from: [brentp/slivar:latest](https://hub.docker.com/r/brentp/slivar)
