_research/ribo-ecosystem.md
Lines changed: 18 additions & 23 deletions
@@ -16,28 +16,23 @@ summary: We develop machine learning models to integrate quantitative measuremen
 </center>
 </p>
 
-mRNA translation can be measured transcriptome-wide by sequencing of mRNA
-fragments protected by ribosomes from RNase digestion.
-This approach, called ribosome profiling,
-poses unique computational challenges.
-Unlike RNA sequencing measurements,
-ribosome profiling data typically needs to be analyzed
-as a function of the read/footprint size.
-This results in significant bottlenecks in storage and processing,
-as many values need to be stored for each gene and experiment.
-While there are specialized solutions for other data modalities
-such as <a href="https://samtools.github.io/hts-specs/SAMv1.pdf">BAM</a> for sequence alignment,
-or <a href="https://cooler.readthedocs.io/en/latest/datamodel.html">Cooler</a> /
-<a href="https://github.com/aidenlab/juicer/wiki/Data">hic</a> for chromosome conformation capture data,
-ribosome profiling experiments lacked a comparable dedicated and standardized format.
-We have recently designed a dedicated binary hierarchical data format to efficiently store,
-organize and process ribosome profiling data.
-We are building a computational ecosystem around this file format (<a href="https://academic.oup.com/bioinformatics/article/36/9/2929/5701654">.ribo</a>).
-We currently have a workflow that can generate these files starting
-from raw sequencing reads.
-The resulting file can be used seamlessly to analyze and visualize
-in our downstream analysis software. We continue to improve this ecosystem
-by adding ribosome profiling specific analyses such as improved algorithms
-for pause site detection.
+<p>
+Accurately predicting gene expression across biological contexts requires reliable and reusable data. Ribosome profiling has become a key method for measuring translation, but much of the existing data is scattered across general-purpose databases with poor metadata, limiting reuse and integrative analyses.
+
+To address this, we manually curated and uniformly reprocessed over 3,500 ribosome profiling experiments. This work was enabled by a computational ecosystem around a dedicated binary hierarchical data format to efficiently store,
+organize and process ribosome profiling data that <a href="https://academic.oup.com/bioinformatics/article/36/9/2929/5701654">we have developed</a>. This effort provided us with a large-scale, high-quality compendium of translation efficiency (TE) data across diverse biological conditions.
+</p>
+
+<p>
+Inspired by how co-expression of RNA reveals gene function and regulatory programs, we introduced the concept of translation efficiency covariation (TEC). TEC turns out to be a conserved and biologically meaningful signal, reflecting coordinated translational control. It also uncovers new regulatory mechanisms and can predict protein–protein interactions and gene functions. For instance, TEC revealed a novel regulator of glycolysis that was invisible to RNA expression and protein abundance analyses.
+</p>
 
+<p>
+We have also developed RiboNN, a deep neural network that predicts cell-type-specific translation efficiency from full-length mRNA sequences. Trained on our large compendium, RiboNN is the most accurate model of translation to date. Beyond prediction, the model reveals sequence features linked to translation, mRNA stability, and localization.
+
+These tools and insights open the door to new applications in synthetic biology and therapeutics. RiboNN can help interpret the effects of genetic variants on translation and guide the design of optimized mRNA-based therapies, with implications for both diagnostics and treatment of genetic diseases.
+</p>
+
+<p>
 We emphasize development of reusable, portable and open source software that will be widely distributed.