From 2729b77bf4d94fe6ac74bfa307d1eefb5bba0231 Mon Sep 17 00:00:00 2001
From: Anka Using eisaR for Exon-Intron Split Analysis
Michael
Stadler
- 2024-10-29
+ 2025-01-06
Source: vignettes/eisaR.Rmd
eisaR.Rmd
IntroductionQuasR
+Bioconductor annotation and the QuasR
package. It is also possible to obtain count tables for exons and
introns using some other pipeline or approach, and directly start with
step 3.
EnsDb
objects, e.g. via packages such as TxDb.Mmusculus.UCSC.mm10.knownGene
+or EnsDb.Hsapiens.v86.
You can see available annotations using the following code:
If you would like to use an alternative source of gene annotations,
you might still be able to use eisaR
by first converting
your annotations into a TxDb
or an EnsDb
(for
-creating a TxDb
see makeTxDb
in the txdbmaker
+creating a TxDb
see makeTxDb
in the txdbmaker
package, for creating an EnsDb
see
-makeEnsembldbPackage
in the ensembldb
+makeEnsembldbPackage
in the ensembldb
package).
For this example, eisaR
contains a small
TxDb
to illustrate how regions are extracted. We will load
@@ -208,7 +208,7 @@
regS
into .gtf
files:
@@ -219,23 +219,25 @@Preparing the annotation
Quantify RNA-seq alignments in exons and introns
-For this example we will use the QuasR +
For this example we will use the QuasR package for indexing and alignment of short reads, and a small RNA-seq dataset that is contained in that package. As mentioned, it is also possible to align or also quantify your reads using an alternative aligner/counter, and skip over these steps. For more details about the -syntax, we refer to the documentation and vignette of the QuasR +syntax, we refer to the documentation and vignette of the QuasR package.
Align reads
-Let’s first copy the sample data from the QuasR +
Let’s first copy the sample data from the QuasR package to the current working directory, all contained in a folder named
extdata
:library(QuasR) #> Loading required package: parallel #> Loading required package: Rbowtie +#> Error in get(paste0(generic, ".", class), envir = get_method_env()) : +#> object 'type_sum.accel' not found file.copy(system.file(package = "QuasR", "extdata"), ".", recursive = TRUE) #> [1] TRUE
We next align the reads to a mini-genome (fasta file @@ -265,9 +267,9 @@
Align reads #> Testing the compute nodes...OK #> Loading QuasR on the compute nodes...preparing to run on 1 nodes...done #> Available cores: -#> Mac-1730219867655.local: 1 +#> Mac-1736166207550.local: 1 #> Performing genomic alignments for 4 samples. See progress in the log file: -#> /Users/runner/work/eisaR/eisaR/vignettes/QuasR_log_654e6f91f14f.txt +#> /Users/runner/work/eisaR/eisaR/vignettes/QuasR_log_ff87da4fee5.txt #> Genomic alignments have been created successfully alignmentStats(proj) #> seqlength mapped unmapped @@ -370,7 +372,7 @@
Alternative implementations of EISA
@@ -496,7 +498,7 @@ TRUE
(default): use a model adjusting for the baseline differences among samples, and with condition-specific region effects -(similar to the model described in section 3.5 of the edgeR user +(similar to the model described in section 3.5 of the edgeR user guide)Alternative implementations of EISA #> logical 46 93 # comparison of deltas -ids <- intersect(rownames(res1$DGEList), rownames(res2$DGEList)) +ids <- intersect(rownames(res1$DGEList), rownames(res2$DGEList)) cor(res1$contrasts[ids,"Dex"], res2$contrasts[ids,"Dex"]) #> [1] 0.989731 cor(res1$contrasts[ids,"Din"], res2$contrasts[ids,"Din"]) @@ -531,7 +533,7 @@
On That sample factor is nested in the condition factor (no sample can belong to more than one condition). Thus, we are in the situation described in section 3.5 (‘Comparisons both between and within -subjects’) of the edgeR user +subjects’) of the edgeR user guide, and we use the approach described there to define a design matrix with sample-specific baseline effects as well as condition-specific region effects. @@ -550,7 +552,7 @@
On #> filtering quantifyable genes...keeping 11759 from 20288 (58%) #> fitting statistical model...done #> calculating log-fold changes...done -ids <- intersect(rownames(res3$contrasts), rownames(res4$contrasts)) +ids <- intersect(rownames(res3$contrasts), rownames(res4$contrasts)) # number of genes with significant post-transcriptional regulation sum(res3$tab.ExIn$FDR < 0.05) @@ -678,7 +680,7 @@
Calculate
Statistical analysis
-Finally, we use the replicate measurement in the edgeR +
Finally, we use the replicate measurement in the edgeR framework to calculate the significance of the changes:
+#> [21] dbplyr_2.5.0 XVector_0.47.1 +#> [23] Rsamtools_2.23.1 rmarkdown_2.29 +#> [25] UCSC.utils_1.3.0 ragg_1.3.3 +#> [27] bit_4.5.0.1 xfun_0.49 +#> [29] zlibbioc_1.53.0 cachem_1.1.0 +#> [31] jsonlite_1.8.9 progress_1.2.3 +#> [33] blob_1.2.4 DelayedArray_0.33.3 +#> [35] BiocParallel_1.41.0 jpeg_0.1-10 +#> [37] prettyunits_1.2.0 R6_2.5.1 +#> [39] VariantAnnotation_1.53.0 bslib_0.8.0 +#> [41] stringi_1.8.4 RColorBrewer_1.1-3 +#> [43] jquerylib_0.1.4 Rcpp_1.0.13-1 +#> [45] bookdown_0.41 SummarizedExperiment_1.37.0 +#> [47] knitr_1.49 SGSeq_1.41.0 +#> [49] igraph_2.1.2 tidyselect_1.2.1 +#> [51] Matrix_1.7-1 abind_1.4-8 +#> [53] yaml_2.3.10 codetools_0.2-20 +#> [55] RUnit_0.4.33 hwriter_1.3.2.1 +#> [57] curl_6.0.1 tibble_3.2.1 +#> [59] lattice_0.22-6 ShortRead_1.65.0 +#> [61] KEGGREST_1.47.0 evaluate_1.0.1 +#> [63] desc_1.4.3 BiocFileCache_2.15.0 +#> [65] xml2_1.3.6 Biostrings_2.75.3 +#> [67] filelock_1.0.3 pillar_1.10.0 +#> [69] BiocManager_1.30.25 MatrixGenerics_1.19.0 +#> [71] RCurl_1.98-1.16 hms_1.1.3 +#> [73] GenomicFiles_1.43.0 glue_1.8.0 +#> [75] tools_4.5.0 interp_1.1-6 +#> [77] BiocIO_1.17.1 BSgenome_1.75.0 +#> [79] locfit_1.5-9.10 GenomicAlignments_1.43.0 +#> [81] fs_1.6.5 XML_3.99-0.18 +#> [83] grid_4.5.0 latticeExtra_0.6-30 +#> [85] GenomeInfoDbData_1.2.13 restfulr_0.0.15 +#> [87] cli_3.6.3 rappdirs_0.3.3 +#> [89] textshaping_0.4.1 S4Arrays_1.7.1 +#> [91] dplyr_1.1.4 sass_0.4.9 +#> [93] digest_0.6.37 SparseArray_1.7.2 +#> [95] rjson_0.2.23 memoise_2.0.1 +#> [97] htmltools_0.5.8.1 pkgdown_2.1.1.9000 +#> [99] lifecycle_1.0.4 httr_1.4.7 +#> [101] statmod_1.5.0 bit64_4.5.2# create DGEList object with exonic and intronic counts @@ -779,13 +781,13 @@
Session information
sessionInfo() -#> R version 4.4.1 (2024-06-14) -#> Platform: x86_64-apple-darwin20 -#> Running under: macOS Monterey 12.7.6 +#> R Under development (unstable) (2025-01-03 r87521) +#> Platform: aarch64-apple-darwin20 +#> Running under: macOS Sonoma 14.7.2 #> #> Matrix products: default -#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib -#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 +#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib +#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 #> #> locale: #> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 @@ -798,66 +800,65 @@
Session information#> [8] methods base #> #> other attached packages: -#> [1] edgeR_4.3.21 limma_3.61.12 QuasR_1.45.2 -#> [4] Rbowtie_1.45.0 rtracklayer_1.65.0 GenomicFeatures_1.57.1 -#> [7] AnnotationDbi_1.67.0 Biobase_2.65.1 GenomicRanges_1.57.2 -#> [10] GenomeInfoDb_1.41.2 IRanges_2.39.2 S4Vectors_0.43.2 -#> [13] BiocGenerics_0.51.3 eisaR_1.19.0 BiocStyle_2.33.1 +#> [1] edgeR_4.5.1 limma_3.63.2 QuasR_1.47.2 +#> [4] Rbowtie_1.47.0 rtracklayer_1.67.0 GenomicFeatures_1.59.1 +#> [7] AnnotationDbi_1.69.0 Biobase_2.67.0 GenomicRanges_1.59.1 +#> [10] GenomeInfoDb_1.43.2 IRanges_2.41.2 S4Vectors_0.45.2 +#> [13] BiocGenerics_0.53.3 generics_0.1.3 eisaR_1.19.0 +#> [16] BiocStyle_2.35.0 #> #> loaded via a namespace (and not attached): #> [1] DBI_1.2.3 bitops_1.0-9 -#> [3] deldir_2.0-4 httr2_1.0.5 -#> [5] biomaRt_2.61.3 rlang_1.1.4 -#> [7] magrittr_2.0.3 Rhisat2_1.21.0 -#> [9] matrixStats_1.4.1 compiler_4.4.1 -#> [11] RSQLite_2.3.7 png_0.1-8 +#> [3] deldir_2.0-4 httr2_1.0.7 +#> [5] biomaRt_2.63.0 rlang_1.1.4 +#> [7] magrittr_2.0.3 Rhisat2_1.23.0 +#> [9] matrixStats_1.4.1 compiler_4.5.0 +#> [11] RSQLite_2.3.9 png_0.1-8 #> [13] systemfonts_1.1.0 vctrs_0.6.5 -#> [15] txdbmaker_1.1.2 stringr_1.5.1 -#> [17] pwalign_1.1.0 pkgconfig_2.0.3 +#> [15] txdbmaker_1.3.1 stringr_1.5.1 +#> [17] pwalign_1.3.1 pkgconfig_2.0.3 #> [19] crayon_1.5.3 fastmap_1.2.0 -#> [21] dbplyr_2.5.0 XVector_0.45.0 -#> [23] utf8_1.2.4 Rsamtools_2.21.2 -#> [25] rmarkdown_2.28 UCSC.utils_1.1.0 -#> [27] ragg_1.3.3 bit_4.5.0 -#> [29] xfun_0.48 zlibbioc_1.51.2 -#> [31] cachem_1.1.0 jsonlite_1.8.9 -#> [33] progress_1.2.3 blob_1.2.4 -#> [35] highr_0.11 DelayedArray_0.31.14 -#> [37] BiocParallel_1.39.0 jpeg_0.1-10 -#> [39] prettyunits_1.2.0 R6_2.5.1 -#> [41] VariantAnnotation_1.51.2 bslib_0.8.0 -#> [43] stringi_1.8.4 RColorBrewer_1.1-3 -#> [45] jquerylib_0.1.4 Rcpp_1.0.13 -#> [47] bookdown_0.41 SummarizedExperiment_1.35.5 -#> [49] knitr_1.48 SGSeq_1.39.0 -#> [51] igraph_2.1.1 tidyselect_1.2.1 -#> [53] Matrix_1.7-1 abind_1.4-8 -#> [55] yaml_2.3.10 codetools_0.2-20 -#> [57] RUnit_0.4.33 hwriter_1.3.2.1 -#> [59] curl_5.2.3 tibble_3.2.1 -#> [61] lattice_0.22-6 ShortRead_1.63.2 -#> [63] KEGGREST_1.45.1 evaluate_1.0.1 -#> [65] desc_1.4.3 BiocFileCache_2.13.2 -#> [67] xml2_1.3.6 Biostrings_2.73.2 -#> [69] filelock_1.0.3 pillar_1.9.0 -#> [71] BiocManager_1.30.25 MatrixGenerics_1.17.1 -#> [73] generics_0.1.3 RCurl_1.98-1.16 -#> [75] hms_1.1.3 GenomicFiles_1.41.0 -#> [77] glue_1.8.0 tools_4.4.1 -#> [79] interp_1.1-6 BiocIO_1.15.2 -#> [81] BSgenome_1.73.1 locfit_1.5-9.10 -#> [83] GenomicAlignments_1.41.0 fs_1.6.4 -#> [85] XML_3.99-0.17 grid_4.4.1 -#> [87] latticeExtra_0.6-30 GenomeInfoDbData_1.2.13 -#> [89] restfulr_0.0.15 cli_3.6.3 -#> [91] rappdirs_0.3.3 textshaping_0.4.0 -#> [93] fansi_1.0.6 S4Arrays_1.5.11 -#> [95] dplyr_1.1.4 sass_0.4.9 -#> [97] digest_0.6.37 SparseArray_1.5.45 -#> [99] rjson_0.2.23 memoise_2.0.1 -#> [101] htmltools_0.5.8.1 pkgdown_2.1.1.9000 -#> [103] lifecycle_1.0.4 httr_1.4.7 -#> [105] statmod_1.5.0 bit64_4.5.2
References diff --git a/articles/rna-velocity.html b/articles/rna-velocity.html index 1e0c01e..a1b068a 100644 --- a/articles/rna-velocity.html +++ b/articles/rna-velocity.html @@ -88,7 +88,7 @@
Generating reference files for spliced and
Charlotte Soneson
-2024-10-29
+2025-01-06
Source:vignettes/rna-velocity.Rmd
@@ -121,7 +121,7 @@rna-velocity.Rmd
Generate feature rangesGencode release 28. We extract genomic ranges for spliced transcript and introns, where the latter are defined for each transcript separately -(using the same terminology as in the BUSpaRse +(using the same terminology as in the BUSpaRse package). For each intron, a flanking sequence of 50 nt is added on each side to accommodate reads mapping across an exon/intron boundary.
@@ -135,11 +135,15 @@Generate feature ranges joinOverlappingIntrons = FALSE, verbose = TRUE ) +#> Error in get(paste0(generic, ".", class), envir = get_method_env()) : +#> object 'type_sum.accel' not found #> Import genomic features from the file as a GRanges object ... OK #> Prepare the 'metadata' data frame ... OK #> Make the TxDb object ... -#> Warning in .get_cds_IDX(mcols0$type, mcols0$phase): The "phase" metadata column contains non-NA values for -#> features of type stop_codon. This information was ignored. +#> Warning in .get_cds_IDX(mcols0$type, mcols0$phase): The "phase" metadata column contains non-NA values for features of +#> type stop_codon. This information was ignored. +#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version +#> information is not available for this TxDb object #> OK #> 'select()' returned 1:1 mapping between keys and columns #> Extracting spliced transcript features @@ -243,7 +247,7 @@
Extract feature sequences
Once the genomic ranges of the features of interest are extracted, we can obtain the nucleotide sequences using the -
@@ -280,7 +284,7 @@extractTranscriptSeqs()
function from the GenomicFeatures +extractTranscriptSeqs()
function from the GenomicFeatures package. In addition to the ranges, this requires the genome sequence. This can be obtained, for example, from the corresponding BSgenome package, or from an external FASTA file.Generate an expanded GTF filetximeta. +abundances with tximeta.
+#> [21] tools_4.5.0 yaml_2.3.10 +#> [23] knitr_1.49 prettyunits_1.2.0 +#> [25] S4Arrays_1.7.1 bit_4.5.0.1 +#> [27] curl_6.0.1 DelayedArray_0.33.3 +#> [29] xml2_1.3.6 abind_1.4-8 +#> [31] BiocParallel_1.41.0 txdbmaker_1.3.1 +#> [33] desc_1.4.3 grid_4.5.0 +#> [35] edgeR_4.5.1 biomaRt_2.63.0 +#> [37] SummarizedExperiment_1.37.0 cli_3.6.3 +#> [39] rmarkdown_2.29 crayon_1.5.3 +#> [41] ragg_1.3.3 httr_1.4.7 +#> [43] rjson_0.2.23 DBI_1.2.3 +#> [45] cachem_1.1.0 stringr_1.5.1 +#> [47] zlibbioc_1.53.0 parallel_4.5.0 +#> [49] AnnotationDbi_1.69.0 BiocManager_1.30.25 +#> [51] restfulr_0.0.15 matrixStats_1.4.1 +#> [53] vctrs_0.6.5 Matrix_1.7-1 +#> [55] jsonlite_1.8.9 bookdown_0.41 +#> [57] hms_1.1.3 bit64_4.5.2 +#> [59] systemfonts_1.1.0 GenomicFeatures_1.59.1 +#> [61] locfit_1.5-9.10 limma_3.63.2 +#> [63] jquerylib_0.1.4 glue_1.8.0 +#> [65] pkgdown_2.1.1.9000 codetools_0.2-20 +#> [67] stringi_1.8.4 UCSC.utils_1.3.0 +#> [69] tibble_3.2.1 pillar_1.10.0 +#> [71] rappdirs_0.3.3 htmltools_0.5.8.1 +#> [73] GenomeInfoDbData_1.2.13 dbplyr_2.5.0 +#> [75] httr2_1.0.7 R6_2.5.1 +#> [77] textshaping_0.4.1 evaluate_1.0.1 +#> [79] lattice_0.22-6 Biobase_2.67.0 +#> [81] png_0.1-8 Rsamtools_2.23.1 +#> [83] memoise_2.0.1 bslib_0.8.0 +#> [85] SparseArray_1.7.2 xfun_0.49 +#> [87] fs_1.6.5 MatrixGenerics_1.19.0 +#> [89] pkgconfig_2.0.3exportToGtf( grl, @@ -323,13 +327,13 @@
Session info
sessionInfo() -#> R version 4.4.1 (2024-06-14) -#> Platform: x86_64-apple-darwin20 -#> Running under: macOS Monterey 12.7.6 +#> R Under development (unstable) (2025-01-03 r87521) +#> Platform: aarch64-apple-darwin20 +#> Running under: macOS Sonoma 14.7.2 #> #> Matrix products: default -#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib -#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 +#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib +#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 #> #> locale: #> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 @@ -343,66 +347,66 @@
Session info#> #> other attached packages: #> [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 -#> [2] BSgenome_1.73.1 -#> [3] rtracklayer_1.65.0 -#> [4] BiocIO_1.15.2 -#> [5] Biostrings_2.73.2 -#> [6] XVector_0.45.0 -#> [7] GenomicRanges_1.57.2 -#> [8] GenomeInfoDb_1.41.2 -#> [9] IRanges_2.39.2 -#> [10] S4Vectors_0.43.2 -#> [11] BiocGenerics_0.51.3 -#> [12] eisaR_1.19.0 -#> [13] BiocStyle_2.33.1 +#> [2] BSgenome_1.75.0 +#> [3] rtracklayer_1.67.0 +#> [4] BiocIO_1.17.1 +#> [5] Biostrings_2.75.3 +#> [6] XVector_0.47.1 +#> [7] GenomicRanges_1.59.1 +#> [8] GenomeInfoDb_1.43.2 +#> [9] IRanges_2.41.2 +#> [10] S4Vectors_0.45.2 +#> [11] BiocGenerics_0.53.3 +#> [12] generics_0.1.3 +#> [13] eisaR_1.19.0 +#> [14] BiocStyle_2.35.0 #> #> loaded via a namespace (and not attached): #> [1] tidyselect_1.2.1 dplyr_1.1.4 #> [3] blob_1.2.4 filelock_1.0.3 #> [5] bitops_1.0-9 fastmap_1.2.0 -#> [7] RCurl_1.98-1.16 BiocFileCache_2.13.2 -#> [9] GenomicAlignments_1.41.0 XML_3.99-0.17 +#> [7] RCurl_1.98-1.16 BiocFileCache_2.15.0 +#> [9] GenomicAlignments_1.43.0 XML_3.99-0.18 #> [11] digest_0.6.37 lifecycle_1.0.4 -#> [13] statmod_1.5.0 KEGGREST_1.45.1 -#> [15] RSQLite_2.3.7 magrittr_2.0.3 -#> [17] compiler_4.4.1 rlang_1.1.4 +#> [13] statmod_1.5.0 KEGGREST_1.47.0 +#> [15] RSQLite_2.3.9 magrittr_2.0.3 +#> [17] compiler_4.5.0 rlang_1.1.4 #> [19] sass_0.4.9 progress_1.2.3 -#> [21] tools_4.4.1 utf8_1.2.4 -#> [23] yaml_2.3.10 knitr_1.48 -#> [25] prettyunits_1.2.0 S4Arrays_1.5.11 -#> [27] bit_4.5.0 curl_5.2.3 -#> [29] DelayedArray_0.31.14 xml2_1.3.6 -#> [31] abind_1.4-8 BiocParallel_1.39.0 -#> [33] txdbmaker_1.1.2 desc_1.4.3 -#> [35] grid_4.4.1 fansi_1.0.6 -#> [37] edgeR_4.3.21 biomaRt_2.61.3 -#> [39] SummarizedExperiment_1.35.5 cli_3.6.3 -#> [41] rmarkdown_2.28 crayon_1.5.3 -#> [43] generics_0.1.3 ragg_1.3.3 -#> [45] httr_1.4.7 rjson_0.2.23 -#> [47] DBI_1.2.3 cachem_1.1.0 -#> [49] stringr_1.5.1 zlibbioc_1.51.2 -#> [51] parallel_4.4.1 AnnotationDbi_1.67.0 -#> [53] BiocManager_1.30.25 restfulr_0.0.15 -#> [55] matrixStats_1.4.1 vctrs_0.6.5 -#> [57] Matrix_1.7-1 jsonlite_1.8.9 -#> [59] bookdown_0.41 hms_1.1.3 -#> [61] bit64_4.5.2 systemfonts_1.1.0 -#> [63] GenomicFeatures_1.57.1 locfit_1.5-9.10 -#> [65] limma_3.61.12 jquerylib_0.1.4 -#> [67] glue_1.8.0 pkgdown_2.1.1.9000 -#> [69] codetools_0.2-20 stringi_1.8.4 -#> [71] UCSC.utils_1.1.0 tibble_3.2.1 -#> [73] pillar_1.9.0 rappdirs_0.3.3 -#> [75] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13 -#> [77] dbplyr_2.5.0 httr2_1.0.5 -#> [79] R6_2.5.1 textshaping_0.4.0 -#> [81] evaluate_1.0.1 lattice_0.22-6 -#> [83] Biobase_2.65.1 png_0.1-8 -#> [85] Rsamtools_2.21.2 memoise_2.0.1 -#> [87] bslib_0.8.0 SparseArray_1.5.45 -#> [89] xfun_0.48 fs_1.6.4 -#> [91] MatrixGenerics_1.17.1 pkgconfig_2.0.3