Commit 676b588

fixup: get content right
1 parent 2fabe47 commit 676b588

7 files changed: +60 -10 lines changed

browser/about/acofone/ac-of-one-part-one.md

Lines changed: 2 additions & 2 deletions
@@ -8,6 +8,6 @@ The vast majority of variants being discovered within large population datasets

 Precision medicine research implicitly requires exploring individual genetic variants within the human genome. Because every individual’s genome harbors extensive unique variation, the vast majority of variants in any large scale study – and especially those novel variants – tend to be extremely rare. In fact, our expectation is that the majority of unique variants in any large genomic dataset will be present in at most 1 or 2 participants. It is therefore paramount that information about this critical class of variation is distributed to the research community.

-This paper from the Exome Aggregation Consortium (ExAC) highlights the fact that most variants from a large diverse dataset will be rare1. Figure 1c, copied below, shows that more than 50% of the variants in the exome are singletons, present in only 1 individual. Similarly, unpublished data from the gnomAD version 4.1 exomes (restricting analysis to high quality variants in canonical transcripts) also shows that most variants discovered across over 730k individuals are rare.
+[This paper](https://www.nature.com/articles/nature19057) from the Exome Aggregation Consortium (ExAC) highlights the fact that most variants from a large diverse dataset will be rare<sup>1</sup>. Figure 1c, copied below, shows that more than 50% of the variants in the exome are singletons, present in only 1 individual. Similarly, unpublished data from the gnomAD version 4.1 exomes (restricting analysis to high quality variants in canonical transcripts) also shows that most variants discovered across over 730k individuals are rare.

-# GRAPHS SHOULD GO HERE - 2
+<br />

browser/about/acofone/ac-of-one-part-three.md

Lines changed: 6 additions & 4 deletions
@@ -1,3 +1,5 @@
+<br />
+
 ## Rare variants also tend to have the largest effect sizes

 Biologically, the process of negative selection tends to decrease the frequency of functional damaging variants, which means that variants with the largest effect sizes are more likely to be rare. In other words, it is often the very rare variants that are of most scientific interest to researchers.
@@ -12,16 +14,16 @@ In theory, if one obfuscates or randomizes the counts/frequencies of variants, i

 ## Precedents through NIH and global initiatives

-It is standard in the genomics field to display exact allele counts on public browsers as seen in gnomAD (gnomad.broadinstitute.org), All of Us (https://databrowser.researchallofus.org/snvsindels) and UK Biobank (genebass.org), as it
-presents a very low risk to re-identification. Ultimately, the analysis and reporting of rare variants in research manuscripts presents minimal added risk once those variants and their frequencies are already present in a public browser (see below). And there is precedent for allowing such analysis in numerous scientific programs (many of which are NIH-funded). For example, the NHGRI-funded Clinical Genome Resource (ClinGen), working closely with policy leaders at NIH and the GA4GH, published guidance to laboratories stating that submission of classified variants, associated with phenotype, were allowable without consent and even if limited to a single observation given low risk to individuals and large benefit to science and medicine5 . This approach was endorsed by leaders in the UK in a publication6 documenting agreement with these principles and noting allowance under General Data Protection Regulation (GDPR). Furthermore, the ability to publish results of rare variant associations has also been adopted by the UK Biobank leading to widespread benefit to science and medicine without any demonstrated risk. Hundreds of studies of rare variant associations have been published in the past decade based on data released by the UK Biobank without barriers to analysis and publication.
+It is standard in the genomics field to display exact allele counts on public browsers as seen in gnomAD ([gnomad.broadinstitute.org](https://gnomad.broadinstitute.org)), _All of Us_ ([https://databrowser.researchallofus.org/snvsindels](https://databrowser.researchallofus.org/snvsindels)) and UK Biobank ([genebass.org](https://genebass.org)), as it
+presents a very low risk to re-identification. Ultimately, the analysis and reporting of rare variants in research manuscripts presents minimal added risk once those variants and their frequencies are already present in a public browser (see below). And there is precedent for allowing such analysis in numerous scientific programs (many of which are NIH-funded). For example, the NHGRI-funded Clinical Genome Resource (ClinGen), working closely with policy leaders at NIH and the GA4GH, published [guidance](https://pubmed.ncbi.nlm.nih.gov/29437798/) to laboratories stating that submission of classified variants, associated with phenotype, were allowable without consent and even if limited to a single observation given low risk to individuals and large benefit to science and medicine<sup>5</sup>. This approach was endorsed by leaders in the UK in a [publication](https://pubmed.ncbi.nlm.nih.gov/31886409/)<sup>6</sup> documenting agreement with these principles and noting allowance under General Data Protection Regulation (GDPR). Furthermore, the ability to publish results of rare variant associations has also been adopted by the UK Biobank leading to widespread benefit to science and medicine without any demonstrated risk. Hundreds of studies of rare variant associations have been published in the past decade based on data released by the UK Biobank without barriers to analysis and publication.

 <br />

 ## Re-identification risks

-A handful of well cited publications have shown that, in theory, information about genomic variants is vulnerable to several types of attacks. For instance, it has been shown that the presence/absence of a set of alleles over the genome could allow for a user to probabilistically claim that an individual’s record is in a database (or what is often referred to as a membership inference attack)7 or that their relative is in the database.8 Another risk to participants in the case where a rare variant is published along with its associated phenotype, is that it could allow for direct linkage to a known genomic record9 thus providing the data user with novel information about a participant. However, these types of attacks all assume worst case adversarial situations, which is not likely to be the case in a well-governed setting. It is worth adding that in most of the attack scenarios, the user would learn only that a certain individual is a participant in a large biobank, a fact unlikely to lead to harm. Furthermore, the multi-stage attack described in [9] required having access to linked databases that are not even available anymore.
+A handful of well cited publications have shown that, in theory, information about genomic variants is vulnerable to several types of attacks. For instance, it has been shown that the presence/absence of a set of alleles over the genome could allow for a user to probabilistically claim that an individual’s record is in a database (or what is often referred to as a membership inference attack)<sup>7</sup> or that their relative is in the database.<sup>8</sup> Another risk to participants in the case where a rare variant is published along with its associated phenotype, is that it could allow for direct linkage to a known genomic record<sup>9</sup> thus providing the data user with novel information about a participant. However, these types of attacks all assume worst case adversarial situations, which **is not likely to be the case in a well-governed setting**. It is worth adding that in most of the attack scenarios, the user would learn only that a certain individual is a participant in a large biobank, a fact unlikely to lead to harm. Furthermore, the multi-stage attack described in [9] required having access to linked databases that are not even available anymore.

-In addition to new papers10 arguing that most genomic data can effectively be shared with minimal re-identification risk, we can see in practice that this is indeed the case: countless rare variants have been published in high profile scientific journals and public databases like ClinVar, and the only known “attacks” have come from the handful of theoretical publications referenced here. For example, over 3.5 million unique variants, classified for pathogenicity towards a specific disease, have been submitted to ClinVar, for which over 75% of these variants have only been identified by a single laboratory. In NIH-funded studies of Mendelian disorders, autism, schizophrenia, cardiovascular disease, and many other human disease phenotypes, there have been millions of rare variants identified from genome sequencing and published as novel disease and trait associations that have set these fields in new directions toward understanding disease etiology and the pursuit of targeted therapeutics. To the best of our knowledge, no participants or patients were harmed whereas trailblazing science and genetic diagnoses have been achieved.
+In addition to new papers<sup>10</sup> arguing that most genomic data can effectively be shared with minimal re-identification risk, we can see in practice that this is indeed the case: countless rare variants have been published in high profile scientific journals and public databases like [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/), and the only known “attacks” have come from the handful of theoretical publications referenced here. For example, over 3.5 million unique variants, classified for pathogenicity towards a specific disease, have been submitted to ClinVar, for which over 75% of these variants have only been identified by a single laboratory. In NIH-funded studies of Mendelian disorders, autism, schizophrenia, cardiovascular disease, and many other human disease phenotypes, there have been millions of rare variants identified from genome sequencing and published as novel disease and trait associations that have set these fields in new directions toward understanding disease etiology and the pursuit of targeted therapeutics. To the best of our knowledge, no participants or patients were harmed whereas trailblazing science and genetic diagnoses have been achieved.

 As such, we firmly believe that a policy against sharing low allele counts is protecting against situations that aren’t practical and really only serves to hinder scientific advances.

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-This is particularly true for structural variants (SVs), as demonstrated in Figure 1 from the gnomAD SV resource.2 Here, we see in panels 1g and 1h that >70% of all SVs observed had allele counts less than 10, and the proportion of singletons is strongly correlated with SV size.
+<br />

-# GRAPHS SHOULD GO HERE - 1
+This is particularly true for structural variants (SVs), as demonstrated in Figure 1 from the gnomAD SV resource.<sup>2</sup> Here, we see in panels 1g and 1h that >70% of all SVs observed had allele counts less than 10, and the proportion of singletons is strongly correlated with SV size.

 <br />

Three binary image files (62.3 KB, 90.7 KB, 752 KB); previews not shown.

browser/src/AcOfOnePage.tsx

Lines changed: 50 additions & 2 deletions
@@ -3,6 +3,13 @@ import styled from 'styled-components'
 import { ExternalLink, ListItem, OrderedList, PageHeading } from '@gnomad/ui'
 import { PaperCitation } from './PublicationsPage'

+// @ts-expect-error
+import exac_frequencies_bar_graph from '../about/acofone/exac_frequencies.png'
+// @ts-expect-error
+import gnomad_v4_frequencies_bar_graph from '../about/acofone/gnomad_v4_frequencies.png'
+// @ts-expect-error
+import gnomad_v4_sv_figures from '../about/acofone/gnomad_v4_sv_figures.png'
+
 // @ts-expect-error
 import acOfOnePartOne from '../about/acofone/ac-of-one-part-one.md'
 // @ts-expect-error
@@ -14,6 +21,27 @@ import DocumentTitle from './DocumentTitle'
 import InfoPage from './InfoPage'
 import MarkdownContent from './MarkdownContent'

+const Centered = styled.div`
+  display: flex;
+  justify-content: space-around;
+
+  @media (max-width: 992px) {
+    display: block;
+  }
+`
+
+const ResponsiveHalfWidthColumn = styled.div`
+  width: 50%;
+
+  @media (max-width: 992px) {
+    width: 100%;
+  }
+`
+
+const MarginTop = styled.div`
+  margin-top: 6rem;
+`
+
 const AcOfOnePage = () => {
   return (
     <InfoPage>
@@ -27,11 +55,31 @@ const AcOfOnePage = () => {

       <MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartOne.html }} />

-      {/* TODO: graph section 1 */}
+      <Centered>
+        <ResponsiveHalfWidthColumn>
+          <MarginTop>
+            <img
+              src={exac_frequencies_bar_graph}
+              alt="Bar graph of ExAC frequencies"
+              width="400px"
+            />
+          </MarginTop>
+        </ResponsiveHalfWidthColumn>
+
+        <ResponsiveHalfWidthColumn>
+          <img
+            src={gnomad_v4_frequencies_bar_graph}
+            alt="Bar graph of gnomAD v4 frequencies"
+            width="400px"
+          />
+        </ResponsiveHalfWidthColumn>
+      </Centered>

       <MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartTwo.html }} />

-      {/* TODO: graph section 2 */}
+      <Centered>
+        <img src={gnomad_v4_sv_figures} alt="image4" width="650px" />
+      </Centered>

       <MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartThree.html }} />
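
Note on the figures wired in above: the Markdown in this commit quotes two summary statistics, the share of singletons (allele count of 1) and the share of variants with an allele count below 10. The sketch below shows one way such proportions could be computed from a plain list of allele counts. The `summarizeAlleleCounts` helper, the `AlleleCountSummary` type, and the input values are all hypothetical and made up for illustration; this is not the pipeline that produced the ExAC or gnomAD figures.

```typescript
// Hypothetical helper for illustration; not part of the gnomAD browser codebase.
interface AlleleCountSummary {
  total: number
  singletonFraction: number
  belowTenFraction: number
}

function summarizeAlleleCounts(alleleCounts: number[]): AlleleCountSummary {
  // Keep only variants that were observed at least once (integer AC >= 1).
  const observed = alleleCounts.filter((ac) => Number.isInteger(ac) && ac >= 1)
  const total = observed.length
  if (total === 0) {
    return { total: 0, singletonFraction: 0, belowTenFraction: 0 }
  }

  // Singletons are variants seen exactly once in the dataset.
  const singletons = observed.filter((ac) => ac === 1).length
  // Rare variants here means allele count strictly less than 10.
  const belowTen = observed.filter((ac) => ac < 10).length

  return {
    total,
    singletonFraction: singletons / total,
    belowTenFraction: belowTen / total,
  }
}

// Made-up allele counts, one entry per variant.
const example = summarizeAlleleCounts([1, 1, 1, 2, 5, 12, 40, 1])
console.log(example)
```

With the made-up input, four of the eight variants are singletons and six have an allele count below 10, so the two fractions come out to 0.5 and 0.75.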
