
Commit f04580a

feat(frontend): add ac of 1 content
1 parent f762de2 commit f04580a

13 files changed

+383
-26
lines changed
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
## Executive Summary
2+
3+
The vast majority of variants being discovered within large population datasets – and in particular those with the greatest functional impact – are extremely rare. Some genomic data-generating programs limit the ability to share allele counts (AC) below a certain threshold (e.g. below 20 in the All of Us Research Program unless an exception is granted). However, sharing summary statistics for all variants, including those found only in a single individual, is critical for scientific progress and discovery. Obscuring information about this rare variation would be reasonable if the risks to participants were large, but the experiences of many other large genomics initiatives illustrate that the actual risks to privacy are quite small.
4+
5+
<br />
6+
7+
## Most variants being discovered in genomic population datasets are extremely rare
8+
9+
Precision medicine research implicitly requires exploring individual genetic variants within the human genome. Because every individual’s genome harbors extensive unique variation, the vast majority of variants in any large-scale study – and especially novel variants – tend to be extremely rare. In fact, we expect that the majority of unique variants in any large genomic dataset will be present in at most 1 or 2 participants. It is therefore paramount that information about this critical class of variation is distributed to the research community.
10+
11+
[This paper](https://www.nature.com/articles/nature19057) from the Exome Aggregation Consortium (ExAC) highlights the fact that most variants in a large, diverse dataset will be rare<sup>1</sup>. Figure 1c, copied below, shows that more than 50% of the variants in the exome are singletons, present in only 1 individual. Similarly, unpublished data from the gnomAD version 4.1 exomes (restricting analysis to high-quality variants in canonical transcripts) also show that most variants discovered across over 730k individuals are rare.
12+
13+
<br />
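As a rough illustration of the claim above, the following sketch tabulates, for a toy list of per-variant allele counts, the share of singletons and the share that a reporting threshold of AC below 20 would hide. The array values and the helper `fractionWhere` are hypothetical and chosen only for illustration; they are not ExAC or gnomAD data.

```typescript
// Toy allele counts, one entry per variant. Illustrative values only, not real data.
const alleleCounts: number[] = [1, 1, 1, 2, 1, 5, 1, 37, 1, 2, 1400, 1]

// Fraction of variants whose allele count satisfies the predicate.
// Hypothetical helper, not part of the gnomAD codebase.
const fractionWhere = (counts: number[], predicate: (ac: number) => boolean): number =>
  counts.length === 0 ? 0 : counts.filter(predicate).length / counts.length

// Share of variants seen in exactly one individual (AC = 1).
const singletonFraction = fractionWhere(alleleCounts, (ac) => ac === 1)

// Share of variants that would be withheld under an "AC below 20" rule.
const maskedAtTwenty = fractionWhere(alleleCounts, (ac) => ac < 20)

console.log(`singletons: ${(singletonFraction * 100).toFixed(1)}%`)
console.log(`hidden by AC < 20 threshold: ${(maskedAtTwenty * 100).toFixed(1)}%`)
```

In this toy input, 7 of 12 variants are singletons and 10 of 12 fall below the threshold, so a cutoff of this kind would withhold most of the variant list.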
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
<br />
2+
3+
## Rare variants also tend to have the largest effect sizes
4+
5+
Biologically, negative selection tends to decrease the frequency of functionally damaging variants, which means that variants with the largest effect sizes are more likely to be rare. In other words, it is often the very rare variants that are of most scientific interest to researchers.
6+
7+
<br />
8+
9+
## Obfuscation of data harms downstream science
10+
11+
In theory, obfuscating or randomizing the counts or frequencies of variants could help protect against certain risks to study participants. However, changing the allele frequency values can drastically alter associations with health outcomes, thus presenting misleading data. Purposely limiting the visibility of rare variation impedes scientific discovery. Requiring researchers to request an exemption adds an extra barrier and delays publication and other forms of data sharing.
12+
13+
<br />
14+
15+
## Precedents through NIH and global initiatives
16+
17+
It is standard in the genomics field to display exact allele counts on public browsers as seen in gnomAD ([gnomad.broadinstitute.org](https://gnomad.broadinstitute.org)), _All of Us_ ([https://databrowser.researchallofus.org/snvsindels](https://databrowser.researchallofus.org/snvsindels)) and UK Biobank ([genebass.org](https://genebass.org)), as it
18+
presents a very low risk of re-identification. Ultimately, the analysis and reporting of rare variants in research manuscripts present minimal added risk once those variants and their frequencies are already present in a public browser (see below). There is also precedent for allowing such analysis in numerous scientific programs (many of which are NIH-funded). For example, the NHGRI-funded Clinical Genome Resource (ClinGen), working closely with policy leaders at NIH and the GA4GH, published [guidance](https://pubmed.ncbi.nlm.nih.gov/29437798/) to laboratories stating that submission of classified variants associated with phenotype was allowable without consent, even if limited to a single observation, given the low risk to individuals and the large benefit to science and medicine<sup>5</sup>. This approach was endorsed by leaders in the UK in a [publication](https://pubmed.ncbi.nlm.nih.gov/31886409/)<sup>6</sup> documenting agreement with these principles and noting that it is permitted under the General Data Protection Regulation (GDPR). Furthermore, the UK Biobank has also permitted publication of rare variant association results, leading to widespread benefit to science and medicine without any demonstrated risk. Hundreds of studies of rare variant associations have been published in the past decade based on data released by the UK Biobank without barriers to analysis and publication.
19+
20+
<br />
21+
22+
## Re-identification risks
23+
24+
A handful of well-cited publications have shown that, in theory, information about genomic variants is vulnerable to several types of attacks. For instance, it has been shown that the presence or absence of a set of alleles across the genome could allow a user to probabilistically claim that an individual’s record is in a database (often referred to as a membership inference attack)<sup>7</sup> or that their relative is in the database.<sup>8</sup> Another risk to participants, in the case where a rare variant is published along with its associated phenotype, is that it could allow direct linkage to a known genomic record<sup>9</sup>, thus providing the data user with novel information about a participant. However, these types of attacks all assume worst-case adversarial situations, which **are not likely to arise in a well-governed setting**. It is worth adding that in most of the attack scenarios, the user would learn only that a certain individual is a participant in a large biobank, a fact unlikely to lead to harm. Furthermore, the multi-stage attack described in [9] required access to linked databases that are no longer available.
25+
26+
In addition to new papers<sup>10</sup> arguing that most genomic data can effectively be shared with minimal re-identification risk, we can see in practice that this is indeed the case: countless rare variants have been published in high-profile scientific journals and public databases like [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/), and the only known “attacks” have come from the handful of theoretical publications referenced here. For example, over 3.5 million unique variants, classified for pathogenicity towards a specific disease, have been submitted to ClinVar, and over 75% of these variants have been identified by only a single laboratory. In NIH-funded studies of Mendelian disorders, autism, schizophrenia, cardiovascular disease, and many other human disease phenotypes, millions of rare variants have been identified from genome sequencing and published as novel disease and trait associations, setting these fields in new directions toward understanding disease etiology and pursuing targeted therapeutics. To the best of our knowledge, no participants or patients were harmed, whereas trailblazing science and genetic diagnoses have been achieved.
27+
28+
As such, we firmly believe that a policy against sharing low allele counts protects against scenarios that are not realistic in practice and serves only to hinder scientific advances.
29+
30+
<br />
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
<br />
2+
3+
This is particularly true for structural variants (SVs), as demonstrated in Figure 1 from the gnomAD SV resource.<sup>2</sup> Here, we see in panels 1g and 1h that >70% of all SVs observed had allele counts less than 10, and the proportion of singletons is strongly correlated with SV size.
4+
5+
<br />

browser/about/policies/policies.md

Lines changed: 4 additions & 0 deletions
@@ -9,3 +9,7 @@ While we hope gnomAD exists for decades to come, we recognize the importance of
99
## Data Generation
1010

1111
A full description of the methods used to aggregate and call variants across the exomes and genomes in this project will be provided shortly. In brief: we pulled raw data together from as many exomes and genomes as we could get our hands on, aligned and processed each of these data types through unified processing pipelines based on Picard, and performed variant calling with the GATK HaplotypeCaller following GATK best practices. Processing and variant calling at this enormous scale was only possible thanks to the hard work of the Broad Institute's Data Sciences Platform, and the Intel GenomicsDB team. Downstream analysis relied heavily on the [Hail](https://hail.is/) toolkit.
12+
13+
## Public Release of Low Frequency Allele Count
14+
15+
The vast majority of variants being discovered within large population datasets – and in particular those with the greatest functional impact – are extremely rare. Some genomic data-generating programs limit the ability to share allele counts (AC) below a certain threshold. However, sharing summary statistics for all variants, including those found only in a single individual, is critical for scientific progress and discovery. Obscuring information about this rare variation would be reasonable if the risks to participants were large, but the experiences of many other large genomics initiatives illustrate that the actual risks to privacy are quite small. Follow [this link](/AC1) to read our full statement supporting public release of low frequency allele count summary statistics.
@@ -0,0 +1,5 @@
1+
---
2+
question: 'What is the minimum allele count required to release a variant?'
3+
---
4+
5+
Our policy is to release allele counts (AC) as they are observed in the database, including when a variant is seen in only a single individual (AC=1). Follow [this link](/AC1) to read our full statement supporting public release of low frequency allele count summary statistics.

browser/src/AcOfOnePage.tsx

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
1+
import React from 'react'
2+
import styled from 'styled-components'
3+
import { ExternalLink, ListItem, OrderedList, PageHeading } from '@gnomad/ui'
4+
import { PaperCitation } from './PublicationsPage'
5+
6+
// @ts-expect-error
7+
import exac_frequencies_bar_graph from '../about/acofone/exac_frequencies.png'
8+
// @ts-expect-error
9+
import gnomad_v4_frequencies_bar_graph from '../about/acofone/gnomad_v4_frequencies.png'
10+
// @ts-expect-error
11+
import gnomad_v4_sv_figures from '../about/acofone/gnomad_v4_sv_figures.png'
12+
13+
// @ts-expect-error
14+
import acOfOnePartOne from '../about/acofone/ac-of-one-part-one.md'
15+
// @ts-expect-error
16+
import acOfOnePartTwo from '../about/acofone/ac-of-one-part-two.md'
17+
// @ts-expect-error
18+
import acOfOnePartThree from '../about/acofone/ac-of-one-part-three.md'
19+
20+
import DocumentTitle from './DocumentTitle'
21+
import InfoPage from './InfoPage'
22+
import MarkdownContent from './MarkdownContent'
23+
24+
const Centered = styled.div`
25+
display: flex;
26+
justify-content: space-around;
27+
28+
@media (max-width: 992px) {
29+
display: block;
30+
}
31+
`
32+
33+
const ResponsiveHalfWidthColumn = styled.div`
34+
width: 50%;
35+
36+
@media (max-width: 992px) {
37+
width: 100%;
38+
}
39+
`
40+
41+
const MarginTop = styled.div`
42+
margin-top: 6rem;
43+
`
44+
45+
const AcOfOnePage = () => {
46+
return (
47+
<InfoPage>
48+
<DocumentTitle title="AC=1" />
49+
<PageHeading
50+
// @ts-expect-error
51+
id="ac-one"
52+
>
53+
Arguments Supporting Public Release of Low Frequency Allele Count Summary Statistics
54+
</PageHeading>
55+
56+
<MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartOne.html }} />
57+
58+
<Centered>
59+
<ResponsiveHalfWidthColumn>
60+
<MarginTop>
61+
<img
62+
src={exac_frequencies_bar_graph}
63+
alt="Bar graph of ExAC frequencies"
64+
width="400px"
65+
/>
66+
</MarginTop>
67+
</ResponsiveHalfWidthColumn>
68+
69+
<ResponsiveHalfWidthColumn>
70+
<img
71+
src={gnomad_v4_frequencies_bar_graph}
72+
alt="Bar graph of gnomAD v4 frequencies"
73+
width="400px"
74+
/>
75+
</ResponsiveHalfWidthColumn>
76+
</Centered>
77+
78+
<MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartTwo.html }} />
79+
80+
<Centered>
81+
<img src={gnomad_v4_sv_figures} alt="Figure 1 panels from the gnomAD SV resource" width="650px" />
82+
</Centered>
83+
84+
<MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartThree.html }} />
85+
86+
<h2>References</h2>
87+
{/* @ts-expect-error */}
88+
<OrderedList>
89+
<PaperCitation
90+
authorList="Lek, M., Karczewski, K., Minikel, E."
91+
etAl
92+
title="Analysis of protein-coding genetic variation in 60,706 humans."
93+
journal="Nature"
94+
issue="536"
95+
pages="285-291"
96+
year="2016"
97+
doiLink="https://doi.org/10.1038/nature19057"
98+
/>
99+
100+
<PaperCitation
101+
authorList="Collins, R. L., Brand, H., Karczewski, K. J."
102+
etAl
103+
title="A structural variation reference for medical and population genetics."
104+
journal="Nature"
105+
issue="581"
106+
pages="444-451"
107+
year="2020"
108+
doiLink="https://doi.org/10.1038/s41586-020-2287-8"
109+
/>
110+
111+
{/* @ts-expect-error */}
112+
<ListItem>
113+
<ExternalLink href="https://databrowser.researchallofus.org/snvsindels">
114+
https://databrowser.researchallofus.org/snvsindels
115+
</ExternalLink>
116+
</ListItem>
117+
118+
{/* @ts-expect-error */}
119+
<ListItem>
120+
<ExternalLink href="https://genebass.org">https://genebass.org</ExternalLink>
121+
</ListItem>
122+
123+
{/* TODO: FIX THIS ONE */}
124+
<PaperCitation
125+
authorList="Azzariti, D. R., Riggs, E. R., Niehaus, A., Rodriguez, L. L., Ramos, E. M., Kattman, B., Landrum, M. J., Martin, C. L., & Rehm, H. L."
126+
title="Points to consider for sharing variant-level information from clinical genetic testing with ClinVar."
127+
journal="Cold Spring Harbor molecular case studies"
128+
issue="4(1)"
129+
year="2018"
130+
doiLink="https://doi.org/10.1101/mcs.a002345"
131+
/>
132+
133+
<PaperCitation
134+
authorList="Wright, C. F., Ware, J. S., Lucassen, A. M., Hall, A., Middleton, A., Rahman, N., Ellard, S., & Firth, H. V."
135+
title="Genomic variant sharing: a position statement."
136+
journal="Wellcome open research"
137+
issue="4"
138+
pages="22"
139+
year="2019"
140+
doiLink="https://doi.org/10.12688/wellcomeopenres.15090.2"
141+
/>
142+
143+
<PaperCitation
144+
authorList="Shringarpure, S. S., Bustamante, C. D."
145+
title="Privacy risks from genomic data-sharing beacons."
146+
journal="American Journal of Human Genetics"
147+
issue="97"
148+
pages="631-646"
149+
year="2015"
150+
doiLink="https://doi.org/10.1016/j.ajhg.2015.09.010"
151+
/>
152+
153+
<PaperCitation
154+
authorList="Ayoz K., Aysen M., Ayday E., Cicek A. E."
155+
title="The effect of kinship in re-identification attacks against genomic data sharing beacons."
156+
journal="Bioinformatics"
157+
issue="36"
158+
year="2020"
159+
doiLink="https://doi.org/10.1093/bioinformatics/btaa821"
160+
/>
161+
162+
<PaperCitation
163+
authorList="Erlich Y., Narayanan A."
164+
title="Routes for breaching and protecting genetic privacy."
165+
journal="Nature Reviews Genetics"
166+
issue="15(6)"
167+
pages="409-421"
168+
year="2014"
169+
doiLink="https://doi.org/10.1038/nrg3723"
170+
/>
171+
172+
<PaperCitation
173+
authorList="Wan Z."
174+
etAl
175+
title="Using game theory to thwart multistage privacy intrusions when sharing data."
176+
journal="Science Advances"
177+
issue="7(50)"
178+
year="2021"
179+
doiLink="https://doi.org/10.1126/sciadv.abe9986"
180+
/>
181+
182+
{/* @ts-expect-error */}
183+
<ListItem>
184+
<ExternalLink href="https://www.ncbi.nlm.nih.gov/clinvar/">
185+
https://www.ncbi.nlm.nih.gov/clinvar/
186+
</ExternalLink>
187+
</ListItem>
188+
</OrderedList>
189+
</InfoPage>
190+
)
191+
}
192+
193+
export default AcOfOnePage

browser/src/PublicationsPage.tsx

Lines changed: 6 additions & 2 deletions
@@ -13,6 +13,7 @@ const Citation = styled.cite`
1313
type PaperCitationProps = {
1414
prefix?: string
1515
authorList: string
16+
etAl?: boolean
1617
title: string
1718
journal: string
1819
issue?: string
@@ -24,9 +25,10 @@ type PaperCitationProps = {
2425
pmcid?: string
2526
}
2627

27-
const PaperCitation = ({
28+
export const PaperCitation = ({
2829
prefix,
2930
authorList,
31+
etAl = false,
3032
title,
3133
journal,
3234
issue,
@@ -46,7 +48,9 @@ const PaperCitation = ({
4648
<b>{prefix}</b>:{' '}
4749
</>
4850
)}
49-
<>{`${authorList} ${title} `}</>
51+
<>
52+
{authorList} {etAl && <i>et al.</i>} {title}{' '}
53+
</>
5054
<em>{journal}</em>
5155
<>{`. ${issue ? `${issue}, ` : ''}${pages || ''} (${year}).`}</>
5256
<>
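For review purposes, the text that the updated `PaperCitation` assembles can be approximated with a plain-string helper. The sketch below is illustrative only: `formatCitation` and `CitationFields` are hypothetical names that do not exist in the codebase, the field types are assumed from the call sites in `AcOfOnePage.tsx`, and italics, link markup, and the optional `prefix` are omitted.

```typescript
// Subset of PaperCitationProps relevant to the visible citation text (types assumed).
type CitationFields = {
  authorList: string
  etAl?: boolean
  title: string
  journal: string
  issue?: string
  pages?: string
  year: string
}

// Hypothetical plain-text approximation of what PaperCitation renders.
const formatCitation = ({
  authorList,
  etAl = false,
  title,
  journal,
  issue,
  pages,
  year,
}: CitationFields): string => {
  // The component appends an italicised "et al." only when etAl is set.
  const authors = etAl ? `${authorList} et al.` : authorList
  // Same template as the component: optional issue, optional pages, then the year.
  const locator = `${issue ? `${issue}, ` : ''}${pages || ''} (${year}).`
  // The component always places ". " after the journal name.
  return `${authors} ${title} ${journal}. ${locator}`
}

const exacCitation = formatCitation({
  authorList: 'Lek, M., Karczewski, K., Minikel, E.',
  etAl: true,
  title: 'Analysis of protein-coding genetic variation in 60,706 humans.',
  journal: 'Nature',
  issue: '536',
  pages: '285-291',
  year: '2016',
})

console.log(exacCitation)
// Lek, M., Karczewski, K., Minikel, E. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 536, 285-291 (2016).
```

Because the template always appends a period after the journal, a `journal` value that already ends in a period would produce a doubled period. The template also leaves a trailing comma and a gap when `issue` is given without `pages`.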
