The goal of ggcoverage
is simplify the process of visualizing genomic
coverage. It contains three main parts:
- Load the data:
ggcoverage
can load bam, bigwig (.bw), bedgraph file from various NGS data, including RNA-seq, ChIP-seq, ATAC-seq, et al. - Create genomic coverage plot
- Add annotaions:
ggcoverage
supports four different annotaions:- gene annotaion: Visualize genomic coverage across whole gene
- transcription annotion: Visualize genomic coverage across different transcripts
- ideogram annotation: Visualize the region showing on whole chromosome
- peak annotation: Visualize genomic coverage and peak identified.
ggcoverage
utilizes ggplot2
plotting system, so its usage is
ggplot2-style!
ggcoverage
is an R package distributed as part of the
Bioconductor project and
CRAN. To install the package, start R and
enter:
# install via Bioconductor
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("ggcoverage")
# install via CRAN
install.package("ggcoverage")
# install via Github
# install.package("remotes") #In case you have not installed it.
remotes::install_github("showteeth/ggcoverage")
Once ggcoverage
is installed, it can be loaded by the following
command.
library("rtracklayer")
library("ggcoverage")
The RNA-seq data used here are from Transcription profiling by high throughput sequencing of HNRNPC knockdown and control HeLa cells, we select four sample to use as example: ERR127307_chr14, ERR127306_chr14, ERR127303_chr14, ERR127302_chr14, and all bam files are converted to bigwig file with deeptools.
Load metadata:
# load metadata
meta.file <- system.file("extdata", "RNA-seq", "meta_info.csv", package = "ggcoverage")
sample.meta = read.csv(meta.file)
sample.meta
#> SampleName Type Group
#> 1 ERR127302_chr14 KO_rep1 KO
#> 2 ERR127303_chr14 KO_rep2 KO
#> 3 ERR127306_chr14 WT_rep1 WT
#> 4 ERR127307_chr14 WT_rep2 WT
Load track files:
# track folder
track.folder = system.file("extdata", "RNA-seq", package = "ggcoverage")
# load bigwig file
track.df = LoadTrackFile(track.folder = track.folder, format = "bw",
meta.info = sample.meta)
# check data
head(track.df)
#> seqnames start end score Type Group
#> 1 chr14 21572751 21630650 0 KO_rep1 KO
#> 2 chr14 21630651 21630700 1 KO_rep1 KO
#> 3 chr14 21630701 21630800 4 KO_rep1 KO
#> 4 chr14 21630801 21657350 0 KO_rep1 KO
#> 5 chr14 21657351 21657450 1 KO_rep1 KO
#> 6 chr14 21657451 21663550 0 KO_rep1 KO
Prepare mark region:
# create mark region
mark.region=data.frame(start=c(21678900,21732001,21737590),
end=c(21679900,21732400,21737650),
label=c("M1", "M2", "M3"))
# check data
mark.region
#> start end label
#> 1 21678900 21679900 M1
#> 2 21732001 21732400 M2
#> 3 21737590 21737650 M3
Load GTF file:
gtf.file = system.file("extdata", "used_hg19.gtf", package = "ggcoverage")
gtf.gr = rtracklayer::import.gff(con = gtf.file, format = 'gtf')
basic.coverage = ggcoverage(data = track.df, color = "auto",
mark.region = mark.region, range.position = "out")
basic.coverage
You can also change Y axis style:
basic.coverage = ggcoverage(data = track.df, color = "auto",
mark.region = mark.region, range.position = "in")
basic.coverage
basic.coverage +
geom_gene(gtf.gr=gtf.gr)
basic.coverage +
geom_transcript(gtf.gr=gtf.gr,label.vjust = 1.5)
basic.coverage +
geom_gene(gtf.gr=gtf.gr) +
geom_ideogram(genome = "hg19",plot.space = 0)
#> [1] "hg19"
#> Loading ideogram...
#> Loading ranges...
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
basic.coverage +
geom_transcript(gtf.gr=gtf.gr,label.vjust = 1.5) +
geom_ideogram(genome = "hg19",plot.space = 0)
#> [1] "hg19"
#> Loading ideogram...
#> Loading ranges...
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
The ChIP-seq data used here are from DiffBind, I select four sample to use as example: Chr18_MCF7_input, Chr18_MCF7_ER_1, Chr18_MCF7_ER_3, Chr18_MCF7_ER_2, and all bam files are converted to bigwig file with deeptools.
Create metadata:
# load metadata
sample.meta = data.frame(SampleName=c('Chr18_MCF7_ER_1','Chr18_MCF7_ER_2','Chr18_MCF7_ER_3','Chr18_MCF7_input'),
Type = c("MCF7_ER_1","MCF7_ER_2","MCF7_ER_3","MCF7_input"),
Group = c("IP", "IP", "IP", "Input"))
sample.meta
#> SampleName Type Group
#> 1 Chr18_MCF7_ER_1 MCF7_ER_1 IP
#> 2 Chr18_MCF7_ER_2 MCF7_ER_2 IP
#> 3 Chr18_MCF7_ER_3 MCF7_ER_3 IP
#> 4 Chr18_MCF7_input MCF7_input Input
Load track files:
# track folder
track.folder = system.file("extdata", "ChIP-seq", package = "ggcoverage")
# load bigwig file
track.df = LoadTrackFile(track.folder = track.folder, format = "bw",
meta.info = sample.meta)
# check data
head(track.df)
#> seqnames start end score Type Group
#> 1 chr18 76799701 76800000 439.316 MCF7_ER_1 IP
#> 2 chr18 76800001 76800300 658.974 MCF7_ER_1 IP
#> 3 chr18 76800301 76800600 219.658 MCF7_ER_1 IP
#> 4 chr18 76800601 76800900 658.974 MCF7_ER_1 IP
#> 5 chr18 76800901 76801200 0.000 MCF7_ER_1 IP
#> 6 chr18 76801201 76801500 219.658 MCF7_ER_1 IP
Prepare mark region:
# create mark region
mark.region=data.frame(start=c(76822533),
end=c(76823743),
label=c("Promoter"))
# check data
mark.region
#> start end label
#> 1 76822533 76823743 Promoter
basic.coverage = ggcoverage(data = track.df, color = "auto", region = "chr18:76822285-76900000",
mark.region=mark.region, show.mark.label = FALSE)
basic.coverage
Add gene, ideogram and peak annotaions. To create peak
annotaion, we first get consensus peaks with
MSPC, you can also use
DEbChIP’s GetConsensusPeak
(MSPC
’s wrapper) to do this.
# get consensus peak file
peak.file = system.file("extdata", "ChIP-seq", "consensus.peak", package = "ggcoverage")
basic.coverage +
geom_gene(gtf.gr=gtf.gr) +
geom_peak(bed.file = peak.file) +
geom_ideogram(genome = "hg19",plot.space = 0)
#> [1] "hg19"
#> Loading ideogram...
#> Loading ranges...
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
Please note that the ggcoverage
project is released with a
Contributor Code of
Conduct.
By contributing to this project, you agree to abide by its terms.