feat(frontend): add ac of 1 content

rileyhgrant · rileyhgrant · commit f04580a1dbe4 · 2025-07-15T10:41:08.000-04:00
diff --git a/browser/about/acofone/ac-of-one-part-one.md b/browser/about/acofone/ac-of-one-part-one.md
@@ -0,0 +1,13 @@
+## Executive Summary
+
+The vast majority of variants being discovered within large population datasets – and in particular those with the greatest functional impact – are extremely rare. Some genomic data generating programs limit the ability to share allele counts (AC) below a certain threshold (e.g. below 20 in the All of Us Research Program unless an exception is granted). However, sharing summary statistics for all variants, including those found only in a single individual, is critical for scientific progress and discovery. Obscuring information about this rare variation would be reasonable if the risks to participants were large, but the experiences of many other large genomics initiatives illustrate that the actual risks to privacy are quite small.
+
+<br />
+
+## Most variants being discovered in genomic population datasets are extremely rare
+
+Precision medicine research implicitly requires exploring individual genetic variants within the human genome. Because every individual’s genome harbors extensive unique variation, the vast majority of variants in any large-scale study – and especially novel variants – tend to be extremely rare. In fact, our expectation is that the majority of unique variants in any large genomic dataset will be present in at most 1 or 2 participants. It is therefore paramount that information about this critical class of variation is distributed to the research community.
+
+[This paper](https://www.nature.com/articles/nature19057) from the Exome Aggregation Consortium (ExAC) highlights the fact that most variants from a large diverse dataset will be rare<sup>1</sup>. Figure 1c, copied below, shows that more than 50% of the variants in the exome are singletons, present in only 1 individual. Similarly, unpublished data from the gnomAD version 4.1 exomes (restricting analysis to high quality variants in canonical transcripts) also shows that most variants discovered across over 730k individuals are rare.
+
+<br />
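
Review note: the singleton claim in part one can be made concrete with a small sketch. The allele counts below are made up for illustration (they are not ExAC or gnomAD data), and `fractionAtOrBelow` is a hypothetical helper, not part of the browser code. It shows how the share of singletons, and the share of variants that an AC-below-20 threshold would hide, could be computed from a list of per-variant allele counts.

```typescript
// Hypothetical per-variant allele counts (AC); illustrative only, not real data.
const alleleCounts: number[] = [1, 1, 1, 2, 5, 1, 40, 1, 3, 1200]

// Fraction of variants whose allele count is at most `maxAc`.
// Returns 0 for an empty list to avoid dividing by zero.
const fractionAtOrBelow = (counts: number[], maxAc: number): number => {
  if (counts.length === 0) return 0
  return counts.filter((ac) => ac <= maxAc).length / counts.length
}

// Singletons are variants with AC = 1.
const singletonFraction = fractionAtOrBelow(alleleCounts, 1)

// Variants that a "no counts below 20" rule would obscure (AC <= 19).
const fractionBelowTwenty = fractionAtOrBelow(alleleCounts, 19)

console.log({ singletonFraction, fractionBelowTwenty })
```

With these example counts, half of the variants are singletons and most of the list falls under the threshold, which is the pattern the text describes for real datasets.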
diff --git a/browser/about/acofone/ac-of-one-part-three.md b/browser/about/acofone/ac-of-one-part-three.md
@@ -0,0 +1,30 @@
+<br />
+
+## Rare variants also tend to have the largest effect sizes
+
+Biologically, the process of negative selection tends to decrease the frequency of functionally damaging variants, which means that variants with the largest effect sizes are more likely to be rare. In other words, it is often the very rare variants that are of most scientific interest to researchers.
+
+<br />
+
+## Obfuscation of data harms downstream science
+
+In theory, if one obfuscates or randomizes the counts/frequencies of variants, it could help protect against certain risks to participants in a study. However, changing the allele frequency values can drastically impact associations with health outcomes, thus presenting misleading data. Purposely limiting visibility of rare variation disrupts the facilitation of scientific discoveries. Requiring researchers to request an exemption adds an extra barrier and additional time to publishing or other forms of data sharing.
+
+<br />
+
+## Precedents through NIH and global initiatives
+
+It is standard in the genomics field to display exact allele counts on public browsers as seen in gnomAD ([gnomad.broadinstitute.org](https://gnomad.broadinstitute.org)), _All of Us_ ([https://databrowser.researchallofus.org/snvsindels](https://databrowser.researchallofus.org/snvsindels)) and UK Biobank ([genebass.org](https://genebass.org)), as it
+presents a very low risk of re-identification. Ultimately, the analysis and reporting of rare variants in research manuscripts presents minimal added risk once those variants and their frequencies are already present in a public browser (see below). There is also precedent for allowing such analysis in numerous scientific programs (many of which are NIH-funded). For example, the NHGRI-funded Clinical Genome Resource (ClinGen), working closely with policy leaders at NIH and the GA4GH, published [guidance](https://pubmed.ncbi.nlm.nih.gov/29437798/) to laboratories stating that submission of classified variants, associated with phenotype, was allowable without consent, even if limited to a single observation, given the low risk to individuals and large benefit to science and medicine<sup>5</sup>. This approach was endorsed by leaders in the UK in a [publication](https://pubmed.ncbi.nlm.nih.gov/31886409/)<sup>6</sup> documenting agreement with these principles and noting allowance under the General Data Protection Regulation (GDPR). Furthermore, the ability to publish results of rare variant associations has also been adopted by the UK Biobank, leading to widespread benefit to science and medicine without any demonstrated risk. Hundreds of studies of rare variant associations have been published in the past decade based on data released by the UK Biobank without barriers to analysis and publication.
+
+<br />
+
+## Re-identification risks
+
+A handful of well-cited publications have shown that, in theory, information about genomic variants is vulnerable to several types of attacks. For instance, it has been shown that the presence/absence of a set of alleles over the genome could allow for a user to probabilistically claim that an individual’s record is in a database (or what is often referred to as a membership inference attack)<sup>7</sup> or that their relative is in the database.<sup>8</sup> Another risk to participants, in the case where a rare variant is published along with its associated phenotype, is that it could allow for direct linkage to a known genomic record<sup>9</sup>, thus providing the data user with novel information about a participant. However, these types of attacks all assume worst-case adversarial situations, which **are not likely to arise in a well-governed setting**. It is worth adding that in most of the attack scenarios, the user would learn only that a certain individual is a participant in a large biobank, a fact unlikely to lead to harm. Furthermore, the multi-stage attack described in [9] required having access to linked databases that are no longer available.
+
+In addition to new papers<sup>10</sup> arguing that most genomic data can effectively be shared with minimal re-identification risk, we can see in practice that this is indeed the case: countless rare variants have been published in high-profile scientific journals and public databases like [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/), and the only known “attacks” have come from the handful of theoretical publications referenced here. For example, over 3.5 million unique variants, classified for pathogenicity towards a specific disease, have been submitted to ClinVar, and over 75% of these variants have been identified by only a single laboratory. In NIH-funded studies of Mendelian disorders, autism, schizophrenia, cardiovascular disease, and many other human disease phenotypes, there have been millions of rare variants identified from genome sequencing and published as novel disease and trait associations that have set these fields in new directions toward understanding disease etiology and the pursuit of targeted therapeutics. To the best of our knowledge, no participants or patients were harmed, whereas trailblazing science and genetic diagnoses have been achieved.
+
+As such, we firmly believe that a policy against sharing low allele counts is protecting against situations that aren’t practical and really only serves to hinder scientific advances.
+
+<br />
diff --git a/browser/about/acofone/ac-of-one-part-two.md b/browser/about/acofone/ac-of-one-part-two.md
@@ -0,0 +1,5 @@
+<br />
+
+This is particularly true for structural variants (SVs), as demonstrated in Figure 1 from the gnomAD SV resource.<sup>2</sup> Here, we see in panels 1g and 1h that >70% of all SVs observed had allele counts less than 10, and the proportion of singletons is strongly correlated with SV size.
+
+<br />
diff --git a/browser/about/acofone/exac_frequencies.png b/browser/about/acofone/exac_frequencies.png
diff --git a/browser/about/acofone/gnomad_v4_frequencies.png b/browser/about/acofone/gnomad_v4_frequencies.png
diff --git a/browser/about/acofone/gnomad_v4_sv_figures.png b/browser/about/acofone/gnomad_v4_sv_figures.png
diff --git a/browser/about/policies/policies.md b/browser/about/policies/policies.md
@@ -9,3 +9,7 @@ While we hope gnomAD exists for decades to come, we recognize the importance of
 ## Data Generation
 
 A full description of the methods used to aggregate and call variants across the exomes and genomes in this project will be provided shortly. In brief: we pulled raw data together from as many exomes and genomes as we could get our hands on, aligned and processed each of these data types through unified processing pipelines based on Picard, and performed variant calling with the GATK HaplotypeCaller following GATK best practices. Processing and variant calling at this enormous scale was only possible thanks to the hard work of the Broad Institute's Data Sciences Platform, and the Intel GenomicsDB team. Downstream analysis relied heavily on the [Hail](https://hail.is/) toolkit.
+
+## Public Release of Low Frequency Allele Count
+
+The vast majority of variants being discovered within large population datasets – and in particular those with the greatest functional impact – are extremely rare. Some genomic data generating programs limit the ability to share allele counts (AC) below a certain threshold. However, sharing summary statistics for all variants, including those found only in a single individual, is critical for scientific progress and discovery. Obscuring information about this rare variation would be reasonable if the risks to participants were large, but the experiences of many other large genomics initiatives illustrate that the actual risks to privacy are quite small. Follow [this link](/AC1) to read our full statement supporting public release of low frequency allele count summary statistics.
diff --git a/browser/help/faq/general/what-is-the-minimum-number-of-allele-counts-you-need-to-release-a-variant.md b/browser/help/faq/general/what-is-the-minimum-number-of-allele-counts-you-need-to-release-a-variant.md
@@ -0,0 +1,5 @@
+---
+question: 'What is the minimum number of allele counts you need to release a variant?'
+---
+
+Our policy is to release allele counts (AC) as they are observed in the database, including for variants seen in only a single individual (AC=1). Follow [this link](/AC1) to read our full statement supporting public release of low frequency allele count summary statistics.
diff --git a/browser/src/AcOfOnePage.tsx b/browser/src/AcOfOnePage.tsx
@@ -0,0 +1,193 @@
+import React from 'react'
+import styled from 'styled-components'
+import { ExternalLink, ListItem, OrderedList, PageHeading } from '@gnomad/ui'
+import { PaperCitation } from './PublicationsPage'
+
+// @ts-expect-error
+import exac_frequencies_bar_graph from '../about/acofone/exac_frequencies.png'
+// @ts-expect-error
+import gnomad_v4_frequencies_bar_graph from '../about/acofone/gnomad_v4_frequencies.png'
+// @ts-expect-error
+import gnomad_v4_sv_figures from '../about/acofone/gnomad_v4_sv_figures.png'
+
+// @ts-expect-error
+import acOfOnePartOne from '../about/acofone/ac-of-one-part-one.md'
+// @ts-expect-error
+import acOfOnePartTwo from '../about/acofone/ac-of-one-part-two.md'
+// @ts-expect-error
+import acOfOnePartThree from '../about/acofone/ac-of-one-part-three.md'
+
+import DocumentTitle from './DocumentTitle'
+import InfoPage from './InfoPage'
+import MarkdownContent from './MarkdownContent'
+
+const Centered = styled.div`
+  display: flex;
+  justify-content: space-around;
+
+  @media (max-width: 992px) {
+    display: block;
+  }
+`
+
+const ResponsiveHalfWidthColumn = styled.div`
+  width: 50%;
+
+  @media (max-width: 992px) {
+    width: 100%;
+  }
+`
+
+const MarginTop = styled.div`
+  margin-top: 6rem;
+`
+
+const AcOfOnePage = () => {
+  return (
+    <InfoPage>
+      <DocumentTitle title="AC=1" />
+      <PageHeading
+        // @ts-expect-error
+        id="ac-one"
+      >
+        Arguments Supporting Public Release of Low Frequency Allele Count Summary Statistics
+      </PageHeading>
+
+      <MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartOne.html }} />
+
+      <Centered>
+        <ResponsiveHalfWidthColumn>
+          <MarginTop>
+            <img
+              src={exac_frequencies_bar_graph}
+              alt="Bar graph of ExAC frequencies"
+              width="400px"
+            />
+          </MarginTop>
+        </ResponsiveHalfWidthColumn>
+
+        <ResponsiveHalfWidthColumn>
+          <img
+            src={gnomad_v4_frequencies_bar_graph}
+            alt="Bar graph of gnomAD v4 frequencies"
+            width="400px"
+          />
+        </ResponsiveHalfWidthColumn>
+      </Centered>
+
+      <MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartTwo.html }} />
+
+      <Centered>
+        <img src={gnomad_v4_sv_figures} alt="Figure panels from the gnomAD SV resource" width="650px" />
+      </Centered>
+
+      <MarkdownContent dangerouslySetInnerHTML={{ __html: acOfOnePartThree.html }} />
+
+      <h2>References</h2>
+      {/* @ts-expect-error */}
+      <OrderedList>
+        <PaperCitation
+          authorList="Lek, M., Karczewski, K., Minikel, E."
+          etAl
+          title="Analysis of protein-coding genetic variation in 60,706 humans."
+          journal="Nature"
+          issue="536"
+          pages="285-291"
+          year="2016"
+          doiLink="https://doi.org/10.1038/nature19057"
+        />
+
+        <PaperCitation
+          authorList="Collins, R. L., Brand, H., Karczewski, K. J."
+          etAl
+          title="A structural variation reference for medical and population genetics."
+          journal="Nature"
+          issue="581"
+          pages="444-451"
+          year="2020"
+          doiLink="https://doi.org/10.1038/s41586-020-2287-8"
+        />
+
+        {/* @ts-expect-error */}
+        <ListItem>
+          <ExternalLink href="https://databrowser.researchallofus.org/snvsindels">
+            https://databrowser.researchallofus.org/snvsindels
+          </ExternalLink>
+        </ListItem>
+
+        {/* @ts-expect-error */}
+        <ListItem>
+          <ExternalLink href="https://genebass.org">https://genebass.org</ExternalLink>
+        </ListItem>
+
+        {/* TODO: FIX THIS ONE */}
+        <PaperCitation
+          authorList="Azzariti, D. R., Riggs, E. R., Niehaus, A., Rodriguez, L. L., Ramos, E. M., Kattman, B., Landrum, M. J., Martin, C. L., & Rehm, H. L."
+          title="Points to consider for sharing variant-level information from clinical genetic testing with ClinVar."
+          journal="Cold Spring Harbor molecular case studies"
+          issue="4(1)"
+          year="2018"
+          doiLink="https://doi.org/10.1101/mcs.a002345"
+        />
+
+        <PaperCitation
+          authorList="Wright, C. F., Ware, J. S., Lucassen, A. M., Hall, A., Middleton, A., Rahman, N., Ellard, S., & Firth, H. V."
+          title="Genomic variant sharing: a position statement."
+          journal="Wellcome open research"
+          issue="4"
+          pages="22"
+          year="2019"
+          doiLink="https://doi.org/10.12688/wellcomeopenres.15090.2"
+        />
+
+        <PaperCitation
+          authorList="Shringarpure, S. S., Bustamante, C. D."
+          title="Privacy risks from genomic data-sharing beacons."
+          journal="American Journal of Human Genetics"
+          issue="97"
+          pages="631-646"
+          year="2015"
+          doiLink="https://doi.org/10.1016/j.ajhg.2015.09.010"
+        />
+
+        <PaperCitation
+          authorList="Ayoz K., Aysen M., Ayday E., Cicek A. E."
+          title="The effect of kinship in re-identification attacks against genomic data sharing beacons."
+          journal="Bioinformatics"
+          issue="36"
+          year="2020"
+          doiLink="https://doi.org/10.1093/bioinformatics/btaa821"
+        />
+
+        <PaperCitation
+          authorList="Erlich Y., Narayanan A."
+          title="Routes for breaching and protecting genetic privacy."
+          journal="Nature Reviews Genetics"
+          issue="15(6)"
+          pages="409-421"
+          year="2014"
+          doiLink="https://doi.org/10.1038/nrg3723"
+        />
+
+        <PaperCitation
+          authorList="Wan Z."
+          etAl
+          title="Using game theory to thwart multistage privacy intrusions when sharing data."
+          journal="Science Advances"
+          issue="7(50)"
+          year="2021"
+          doiLink="https://doi.org/10.1126/sciadv.abe9986"
+        />
+
+        {/* @ts-expect-error */}
+        <ListItem>
+          <ExternalLink href="https://www.ncbi.nlm.nih.gov/clinvar/">
+            https://www.ncbi.nlm.nih.gov/clinvar/
+          </ExternalLink>
+        </ListItem>
+      </OrderedList>
+    </InfoPage>
+  )
+}
+
+export default AcOfOnePage
diff --git a/browser/src/PublicationsPage.tsx b/browser/src/PublicationsPage.tsx
@@ -13,6 +13,7 @@ const Citation = styled.cite`
 type PaperCitationProps = {
   prefix?: string
   authorList: string
+  etAl?: boolean
   title: string
   journal: string
   issue?: string
@@ -24,9 +25,10 @@ type PaperCitationProps = {
   pmcid?: string
 }
 
-const PaperCitation = ({
+export const PaperCitation = ({
   prefix,
   authorList,
+  etAl = false,
   title,
   journal,
   issue,
@@ -46,7 +48,9 @@ const PaperCitation = ({
             <b>{prefix}</b>:{' '}
           </>
         )}
-        <>{`${authorList} ${title} `}</>
+        <>
+          {authorList} {etAl && <i>et al.</i>} {title}{' '}
+        </>
         <em>{journal}</em>
         <>{`. ${issue ? `${issue}, ` : ''}${pages || ''} (${year}).`}</>
         <>
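
Review note: to make the effect of the new `etAl` prop easier to see, here is a minimal plain-string sketch of the text this hunk assembles. `formatCitation` and `CitationParts` are hypothetical names used only for illustration; the real component renders JSX (with `<i>` and `<em>` elements) and also handles `prefix`, DOI, and PubMed links, which this sketch omits. The `year` type is assumed to be a string, as in the call sites added in this commit.

```typescript
// Hypothetical plain-string analogue of the PaperCitation text; not the real component.
type CitationParts = {
  authorList: string
  etAl?: boolean
  title: string
  journal: string
  issue?: string
  pages?: string
  year: string // assumed to be a string, matching the call sites in this commit
}

const formatCitation = ({
  authorList,
  etAl = false,
  title,
  journal,
  issue,
  pages,
  year,
}: CitationParts): string => {
  // Append "et al." only when the flag is set, mirroring `etAl && <i>et al.</i>`.
  const authors = etAl ? `${authorList} et al.` : authorList
  // Same issue/pages template as the unchanged context line in the hunk.
  const locator = `${issue ? `${issue}, ` : ''}${pages || ''}`
  return `${authors} ${title} ${journal}. ${locator} (${year}).`
}

const lekCitation = formatCitation({
  authorList: 'Lek, M., Karczewski, K., Minikel, E.',
  etAl: true,
  title: 'Analysis of protein-coding genetic variation in 60,706 humans.',
  journal: 'Nature',
  issue: '536',
  pages: '285-291',
  year: '2016',
})

console.log(lekCitation)
```

Because `etAl` defaults to `false`, existing callers on the publications page that do not pass it should keep their current text.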
diff --git a/browser/src/Routes.tsx b/browser/src/Routes.tsx
@@ -9,6 +9,7 @@ import DocumentTitle from './DocumentTitle'
 import { DatasetId } from '@gnomad/dataset-metadata/metadata'
 
 // Content pages
+const AcOfOnePage = lazy(() => import('./AcOfOnePage'))
 const AboutPage = lazy(() => import('./AboutPage'))
 const TeamPage = lazy(() => import('./TeamPage/TeamPage'))
 const ContactPage = lazy(() => import('./ContactPage'))
@@ -166,6 +167,8 @@ const Routes = () => {
         }}
       />
 
+      <Route exact path="/AC1" component={AcOfOnePage} />
+
       <Route exact path="/about" component={AboutPage} />
 
       <Route exact path="/team" component={TeamPage} />
diff --git a/browser/src/__snapshots__/PoliciesPage.spec.tsx.snap b/browser/src/__snapshots__/PoliciesPage.spec.tsx.snap
@@ -242,6 +242,10 @@ While we hope gnomAD exists for decades to come, we recognize the importance of
 ## Data Generation
 
 A full description of the methods used to aggregate and call variants across the exomes and genomes in this project will be provided shortly. In brief: we pulled raw data together from as many exomes and genomes as we could get our hands on, aligned and processed each of these data types through unified processing pipelines based on Picard, and performed variant calling with the GATK HaplotypeCaller following GATK best practices. Processing and variant calling at this enormous scale was only possible thanks to the hard work of the Broad Institute's Data Sciences Platform, and the Intel GenomicsDB team. Downstream analysis relied heavily on the [Hail](https://hail.is/) toolkit.
+
+## Public Release of Low Frequency Allele Count
+
+The vast majority of variants being discovered within large population datasets – and in particular those with the greatest functional impact – are extremely rare. Some genomic data generating programs limit the ability to share allele counts (AC) below a certain threshold. However, sharing summary statistics for all variants, including those found only in a single individual, is critical for scientific progress and discovery. Obscuring information about this rare variation would be reasonable if the risks to participants were large, but the experiences of many other large genomics initiatives illustrate that the actual risks to privacy are quite small. Follow [this link](/AC1) to read our full statement supporting public release of low frequency allele count summary statistics.
 ",
       }
     }
diff --git a/browser/src/__snapshots__/PublicationsPage.spec.tsx.snap b/browser/src/__snapshots__/PublicationsPage.spec.tsx.snap
