Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 9 changed files with 344 additions and 20 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,17 @@ | ||
Package: ggcoverage | ||
Type: Package | ||
-Title: Visualize Genome Coverage with Various Annotations | ||
-Version: 1.1.0 | ||
+Title: Visualize Genome/Protein Coverage with Various Annotations | ||
+Version: 1.2.0 | ||
Authors@R: | ||
person(given = "Yabing", | ||
family = "Song", | ||
role = c("aut", "cre"), | ||
email = "[email protected]") | ||
Maintainer: Yabing Song <[email protected]> | ||
-Description: The goal of 'ggcoverage' is to simplify the process of visualizing genome coverage. It contains functions to | ||
-load data from BAM, BigWig or BedGraph files, create genome coverage plot, add various annotations to | ||
-the coverage plot, including base and amino acid annotation, GC annotation, gene annotation, transcript annotation, ideogram annotation and peak annotation. | ||
+Description: The goal of 'ggcoverage' is to simplify the process of visualizing genome/protein coverage. It contains functions to | ||
+load data from BAM, BigWig, BedGraph or txt/xlsx files, create genome/protein coverage plot, add various annotations to | ||
+the coverage plot, including base and amino acid annotation, GC annotation, gene annotation, transcript annotation, ideogram annotation, | ||
+peak annotation, contact map annotation, link annotation and protein feature annotation. | ||
License: MIT + file LICENSE | ||
Encoding: UTF-8 | ||
RoxygenNote: 7.1.1 | ||
|
@@ -45,7 +46,10 @@ Imports: | |
ggforce, | ||
HiCBricks, | ||
ggpattern, | ||
-BiocParallel | ||
+BiocParallel, | ||
+openxlsx, | ||
+stringr, | ||
+ggpp | ||
Suggests: | ||
rmarkdown, | ||
knitr, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,179 @@ | ||
#' Layer for Protein Coverage Plot. | ||
#' | ||
#' @param coverage.file Exported protein coverage file, should be an Excel (xlsx) file. | ||
#' @param fasta.file Input reference protein fasta file. | ||
#' @param protein.id The protein ID of exported coverage file. This should be unique and in \code{fasta.file}. | ||
#' @param XCorr.threshold The cross-correlation threshold. Default: 2. | ||
#' @param confidence The confidence level. Default: High. | ||
#' @param contaminant Whether to remove contaminant peptides. Default: NULL (do not remove). | ||
#' @param remove.na Logical value, whether to remove NA value in Abundance column. Default: TRUE. | ||
#' @param color The fill color of coverage plot. Default: grey. | ||
#' @param mark.bare Logical value, whether to mark region where Abundance is zero or NA. Default: TRUE. | ||
#' @param mark.color The color used for the marked region. Default: red. | ||
#' @param mark.alpha The transparency used for the marked region. Default: 0.5. | ||
#' @param show.table Logical value, whether to show coverage summary table. Default: TRUE. | ||
#' @param table.position The position of the coverage summary table, choose from right_top, left_top, left_bottom, right_bottom. | ||
#' Default: right_top. | ||
#' @param table.size The font size of coverage summary table. Default: 4. | ||
#' @param table.color The font color of coverage summary table. Default: black. | ||
#' @param range.size The label size of range text, used when \code{range.position} is in. Default: 3. | ||
#' @param range.position The position of y axis range, chosen from in (move y axis in the plot) and | ||
#' out (normal y axis). Default: in. | ||
#' | ||
#' @return A list of ggplot2 layers and scales that can be added to a plot. | ||
#' @importFrom openxlsx read.xlsx | ||
#' @importFrom magrittr %>% | ||
#' @importFrom dplyr filter group_by summarise arrange | ||
#' @importFrom rlang .data | ||
#' @importFrom Biostrings readAAStringSet | ||
#' @importFrom stringr str_locate | ||
#' @importFrom GenomicRanges reduce GRanges setdiff | ||
#' @importFrom IRanges IRanges | ||
#' @importFrom ggplot2 ggplot geom_rect geom_text aes aes_string scale_x_continuous | ||
#' @importFrom ggpp annotate | ||
#' @importFrom scales scientific | ||
#' @export | ||
#' | ||
#' @examples | ||
#' # library(ggplot2) | ||
#' # library(ggcoverage) | ||
#' # coverage.file <- system.file("extdata", "Proteomics", "MS_BSA_coverage.xlsx", package = "ggcoverage") | ||
#' # fasta.file <- system.file("extdata", "Proteomics", "MS_BSA_coverage.fasta", package = "ggcoverage") | ||
#' # protein.id = "sp| | ||
#' # ggplot() + | ||
#' # geom_protein(coverage.file = coverage.file, fasta.file = fasta.file, protein.id = protein.id) | ||
geom_protein = function(coverage.file, fasta.file, protein.id, XCorr.threshold = 2, | ||
confidence = "High", contaminant = NULL, remove.na = TRUE, | ||
color = "grey", mark.bare = TRUE, mark.color = "red", mark.alpha = 0.5, | ||
show.table = TRUE, table.position = c("right_top", "left_top", "left_bottom", "right_bottom"), | ||
table.size = 4, table.color = "black", range.size = 3, range.position = c("in", "out")){ | ||
# check parameters | ||
table.position <- match.arg(arg = table.position) | ||
range.position <- match.arg(arg = range.position) | ||
|
||
# load coverage dataframe | ||
coverage.df = openxlsx::read.xlsx(coverage.file) | ||
# remove suffix and prefix string | ||
coverage.df$Annotated.Sequence = gsub(pattern = ".*\\.(.*)\\..*", replacement = "\\1", x = coverage.df$Annotated.Sequence) | ||
# filter coverage according to confidence | ||
if(!is.null(confidence)){ | ||
coverage.df = coverage.df[coverage.df[, "Confidence"] == confidence, ] | ||
} | ||
# filter coverage according to contaminant | ||
if(!is.null(contaminant)){ | ||
coverage.df = coverage.df[coverage.df[, "Contaminant"] == contaminant, ] | ||
} | ||
# filter coverage according to cross-correlation | ||
if(!is.null(XCorr.threshold)){ | ||
xcorr.index = grep(pattern = "XCorr", x = colnames(coverage.df)) | ||
coverage.df = coverage.df[coverage.df[, xcorr.index] >= XCorr.threshold, ] | ||
} | ||
# get abundance cols | ||
abundance.col = grep(pattern = "Abundance", x = colnames(coverage.df), value = TRUE) | ||
# remove na abundance | ||
if(remove.na){ | ||
coverage.df = coverage.df %>% dplyr::filter(!is.na(.data[[abundance.col]])) | ||
} | ||
# sum abundance of duplicated Annotated.Sequence | ||
coverage.df = coverage.df %>% | ||
dplyr::group_by(.data[["Annotated.Sequence"]]) %>% | ||
dplyr::summarise(Abundance = sum(.data[[abundance.col]])) %>% | ||
as.data.frame() | ||
colnames(coverage.df) = c("peptide", "abundance") | ||
# check the coverage dataframe | ||
if(nrow(coverage.df) == 0){ | ||
stop("There is no valid peptide, please check!") | ||
} | ||
|
||
# load protein fasta | ||
aa.set = Biostrings::readAAStringSet(fasta.file) | ||
protein.index = which(names(aa.set) == protein.id) | ||
if(length(protein.index) == 1){ | ||
aa.set.used = aa.set[protein.index] | ||
aa.seq.used = paste(aa.set.used) | ||
}else if(length(protein.index) > 1){ | ||
stop("Please check the protein.id you provided, there is more than one in provided fasta file!") | ||
}else{ | ||
stop("Please check the protein.id you provided, it can't be found in provided fasta file!") | ||
} | ||
|
||
# get the region | ||
aa.anno.region = sapply(coverage.df$peptide, function(x){ | ||
stringr::str_locate(string = aa.seq.used, pattern = x) | ||
}) %>% t() %>% as.data.frame() | ||
colnames(aa.anno.region) = c("start", "end") | ||
|
||
# merge | ||
coverage.final = merge(coverage.df, aa.anno.region, by.x = "peptide", by.y = 0, all.x = TRUE) | ||
coverage.final = coverage.final %>% dplyr::arrange(.data[["start"]], .data[["end"]]) | ||
|
||
# get coverage positions | ||
coverage.pos = | ||
GenomicRanges::reduce(GenomicRanges::GRanges(protein.id, IRanges::IRanges(coverage.final$start, coverage.final$end))) %>% | ||
as.data.frame() | ||
coverage.pos$strand = NULL | ||
colnames(coverage.pos) = c("ProteinID", "start", "end", "width") | ||
coverage.pos$Type = "covered" | ||
# get coverage rate | ||
coverage.rate = round(sum(coverage.pos$width)*100/nchar(aa.seq.used), 2) | ||
# non-cover position | ||
non.coverage.pos = | ||
GenomicRanges::setdiff(GenomicRanges::GRanges(protein.id, IRanges::IRanges(1, nchar(aa.seq.used))), | ||
GenomicRanges::GRanges(protein.id, IRanges::IRanges(coverage.final$start, coverage.final$end))) %>% | ||
as.data.frame() | ||
non.coverage.pos$strand = NULL | ||
colnames(non.coverage.pos) = c("ProteinID", "start", "end", "width") | ||
non.coverage.pos$Type = "bare" | ||
# coverage summary | ||
coverage.summary = rbind(coverage.pos, non.coverage.pos) %>% as.data.frame() | ||
|
||
# coverage rect | ||
coverage.rect = geom_rect(data = coverage.final, mapping = aes_string(xmin = "start", xmax = "end", | ||
ymin = "0", ymax = "abundance"), | ||
show.legend = FALSE, fill = color) | ||
plot.ele <- list(coverage.rect) | ||
# mark bare | ||
if(mark.bare){ | ||
bare.rect = geom_rect(data = non.coverage.pos, mapping = aes_string(xmin = "start", xmax = "end", | ||
ymin = "0", ymax = "Inf"), | ||
show.legend = FALSE, fill = mark.color, alpha = mark.alpha) | ||
plot.ele <- append(plot.ele, bare.rect) | ||
} | ||
# summary table | ||
if(show.table){ | ||
# table position | ||
if(table.position == "left_top"){ | ||
table.x = 0 | ||
table.y = max(coverage.final[ , "abundance"]) | ||
}else if(table.position == "right_top"){ | ||
table.x = nchar(aa.seq.used) | ||
table.y = max(coverage.final[ , "abundance"]) | ||
}else if(table.position == "left_bottom"){ | ||
table.x = 0 | ||
table.y = 0 | ||
}else if(table.position == "right_bottom"){ | ||
table.x = nchar(aa.seq.used) | ||
table.y = 0 | ||
} | ||
summary.table = ggpp::annotate(geom = "table", label = list(coverage.summary), x= table.x, y=table.y, | ||
color = table.color, size = table.size) | ||
plot.ele <- append(plot.ele, summary.table) | ||
} | ||
# range position | ||
if (range.position == "in") { | ||
# prepare range | ||
max.abundance = CeilingNumber(max(coverage.final$abundance)) | ||
abundance.range = data.frame(label = paste0("[0, ", scales::scientific(max.abundance, digits = 2), "]")) | ||
range.text = geom_text( | ||
data = abundance.range, | ||
mapping = aes(x = -Inf, y = Inf, label = label), | ||
hjust = 0, | ||
vjust = 1.5, | ||
size = range.size | ||
) | ||
plot.ele <- append(plot.ele, range.text) | ||
} | ||
# change x scale | ||
plot.ele <- append(plot.ele, scale_x_continuous(limits = c(1, nchar(aa.seq.used)), expand = c(0, 0))) | ||
return(plot.ele) | ||
} |
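Review note: the coverage bookkeeping in `geom_protein` (strip flanking residues, locate each peptide in the protein sequence, merge the covered ranges, derive the bare ranges and the coverage rate) can be checked by hand with a small base R sketch. Everything below is illustrative only: the toy sequence, the annotated peptides and the `merge_intervals` helper are assumptions made for this note and are not part of the commit, and the sketch uses base R in place of `stringr`, `GenomicRanges` and `IRanges`.

```r
# Hypothetical toy inputs (not the package's example data).
protein.seq <- "MKWVTFISLLLLFSSAYSRGVFRRDTHK"
annotated <- c("[K].WVTFISLL.[L]", "[F].ISLLLLFSS.[A]", "[R].GVFRR.[D]")

# Keep only the text between the two dots, as the gsub() call in geom_protein does.
peptides <- gsub(pattern = ".*\\.(.*)\\..*", replacement = "\\1", x = annotated)

# Locate each peptide as a literal substring; regexpr() returns -1 when absent.
starts <- vapply(
  peptides,
  function(p) as.integer(regexpr(p, protein.seq, fixed = TRUE)[1]),
  integer(1),
  USE.NAMES = FALSE
)
found <- starts > 0
starts <- starts[found]
ends <- starts + nchar(peptides[found]) - 1L

# Hypothetical helper: merge overlapping or adjacent intervals.
merge_intervals <- function(start, end) {
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  out.start <- start[1]
  out.end <- end[1]
  for (i in seq_along(start)[-1]) {
    n <- length(out.end)
    if (start[i] <= out.end[n] + 1L) {
      # extends the current merged interval
      out.end[n] <- max(out.end[n], end[i])
    } else {
      # starts a new merged interval
      out.start <- c(out.start, start[i])
      out.end <- c(out.end, end[i])
    }
  }
  data.frame(start = out.start, end = out.end)
}

covered <- merge_intervals(starts, ends)
covered$width <- covered$end - covered$start + 1L

# Positions not covered by any peptide.
covered.pos <- unlist(Map(seq, covered$start, covered$end))
bare.pos <- setdiff(seq_len(nchar(protein.seq)), covered.pos)

# Percentage of residues covered, rounded as in geom_protein.
coverage.rate <- round(sum(covered$width) * 100 / nchar(protein.seq), 2)
```

With these toy inputs the merged covered ranges are 3 to 15 and 20 to 24, so 18 of 28 residues are covered. Peptides that cannot be located are dropped in this sketch, which may differ from how the committed code treats unmatched peptides.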
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
#' Create Mass Spectrometry Protein Coverage Plot. | ||
#' | ||
#' @param coverage.file Exported protein coverage file, should be an Excel (xlsx) file. | ||
#' @param fasta.file Input reference protein fasta file. | ||
#' @param protein.id The protein ID of exported coverage file. This should be unique and in \code{fasta.file}. | ||
#' @param XCorr.threshold The cross-correlation threshold. Default: 2. | ||
#' @param confidence The confidence level. Default: High. | ||
#' @param contaminant Whether to remove contaminant peptides. Default: NULL (do not remove). | ||
#' @param remove.na Logical value, whether to remove NA value in Abundance column. Default: TRUE. | ||
#' @param color The fill color of coverage plot. Default: grey. | ||
#' @param mark.bare Logical value, whether to mark region where Abundance is zero or NA. Default: TRUE. | ||
#' @param mark.color The color used for the marked region. Default: red. | ||
#' @param mark.alpha The transparency used for the marked region. Default: 0.5. | ||
#' @param show.table Logical value, whether to show coverage summary table. Default: TRUE. | ||
#' @param table.position The position of the coverage summary table, choose from right_top, left_top, left_bottom, right_bottom. | ||
#' Default: right_top. | ||
#' @param table.size The font size of coverage summary table. Default: 4. | ||
#' @param table.color The font color of coverage summary table. Default: black. | ||
#' @param range.size The label size of range text, used when \code{range.position} is in. Default: 3. | ||
#' @param range.position The position of y axis range, chosen from in (move y axis in the plot) and | ||
#' out (normal y axis). Default: in. | ||
#' | ||
#' @return A ggplot2 object. | ||
#' @importFrom openxlsx read.xlsx | ||
#' @importFrom magrittr %>% | ||
#' @importFrom dplyr filter group_by summarise arrange | ||
#' @importFrom rlang .data | ||
#' @importFrom Biostrings readAAStringSet | ||
#' @importFrom stringr str_locate | ||
#' @importFrom GenomicRanges reduce GRanges setdiff | ||
#' @importFrom IRanges IRanges | ||
#' @importFrom ggplot2 ggplot geom_rect geom_text aes aes_string scale_x_continuous theme_classic theme | ||
#' element_blank annotate rel scale_y_continuous expansion | ||
#' @importFrom ggpp annotate | ||
#' @importFrom scales scientific | ||
#' @export | ||
#' | ||
#' @examples | ||
#' # library(ggcoverage) | ||
#' # coverage.file <- system.file("extdata", "Proteomics", "MS_BSA_coverage.xlsx", package = "ggcoverage") | ||
#' # fasta.file <- system.file("extdata", "Proteomics", "MS_BSA_coverage.fasta", package = "ggcoverage") | ||
#' # protein.id = "sp|P02769|ALBU_BOVIN" | ||
#' # ggprotein(coverage.file = coverage.file, fasta.file = fasta.file, protein.id = protein.id) | ||
ggprotein = function(coverage.file, fasta.file, protein.id, XCorr.threshold = 2, | ||
confidence = "High", contaminant = NULL, remove.na = TRUE, | ||
color = "grey", mark.bare = TRUE, mark.color = "red", mark.alpha = 0.5, | ||
show.table = TRUE, table.position = c("right_top", "left_top", "left_bottom", "right_bottom"), | ||
table.size = 4, table.color = "black", range.size = 3, range.position = c("in", "out"), plot.space = 0.2){ | ||
# check parameters | ||
table.position <- match.arg(arg = table.position) | ||
range.position <- match.arg(arg = range.position) | ||
|
||
# ms protein plot | ||
protein.plot = ggplot() + | ||
geom_protein(coverage.file = coverage.file, fasta.file = fasta.file, protein.id = protein.id, | ||
XCorr.threshold = XCorr.threshold, confidence = confidence, contaminant = contaminant, | ||
remove.na = remove.na, color = color, mark.bare = mark.bare, mark.color = mark.color, | ||
mark.alpha = mark.alpha, show.table = show.table, table.position = table.position, | ||
table.size = table.size, table.color = table.color, range.size = range.size, range.position = range.position) | ||
|
||
# add theme | ||
if (range.position == "in") { | ||
protein.plot + | ||
theme_protein() | ||
} else if (range.position == "out") { | ||
protein.plot + | ||
theme_protein2() | ||
} | ||
} |
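Review note: both new functions declare their choice arguments as character vectors and resolve them with `match.arg`, so the first element acts as the default and anything outside the listed choices raises an error. A minimal sketch of that pattern follows; `pick_theme` is a hypothetical helper written for this note, and it returns the name of the theme function as a string instead of applying it.

```r
# Hypothetical helper mirroring the range.position handling in ggprotein.
pick_theme <- function(range.position = c("in", "out")) {
  # With no argument supplied, match.arg() selects the first choice ("in").
  range.position <- match.arg(arg = range.position)
  if (range.position == "in") {
    "theme_protein"
  } else {
    "theme_protein2"
  }
}

default.theme <- pick_theme()
outer.theme <- pick_theme("out")

# A value outside the declared choices is rejected by match.arg().
invalid <- try(pick_theme("top"), silent = TRUE)
```

Because `match.arg` has already validated the value, a plain `if`/`else` covers both cases and the function always returns a value.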