Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using binning causes annotations to fail #46

Open
ijhoskins opened this issue Jan 22, 2025 · 6 comments
Open

Using binning causes annotations to fail #46

ijhoskins opened this issue Jan 22, 2025 · 6 comments

Comments

@ijhoskins
Copy link

I was using the default binning (bin.size=10) in LoadTrackFile with BAM file inputs, which caused the coordinates to be misannotated and led to the following error when adding a geom_transcript layer:

Error in if (nrow(data)) {: argument is of length zero
Traceback:

1. add_ggplot(e1, e2, e2name)
2. ggplot_add(object, p, objectname)
3. ggplot_add.transcript(object, p, objectname)
4. geom_arrows(gene.tx.df.exon, color.by, exon.size, arrow.length, 
 .     arrow.angle, arrow.type)
5. .handleSimpleError(function (cnd) 
 . {
 .     watcher$capture_plot_and_output()
 .     cnd <- sanitize_call(cnd)
 .     watcher$push(cnd)
 .     switch(on_error, continue = invokeRestart("eval_continue"), 
 .         stop = invokeRestart("eval_stop"), error = invokeRestart("eval_error", 
 .             cnd))
 . }, "argument is of length zero", base::quote(if (nrow(data)) {
 .     if (!"strand" %in% colnames(data)) {
 .         data$strand <- "+"
 .     }
 .     geom_segment(inherit.aes = TRUE, data = data, mapping = aes(x = .data[["start"]], 
 .         y = .data[["group"]], xend = .data[["end"]], yend = .data[["group"]], 
 .         color = .data[[color]]), arrow = arrow(ends = ifelse(data$strand == 
 .         "+", "last", "first"), angle = arrow_angle, length = unit(arrow_size, 
 .         "points"), type = arrow_type), lineend = "butt", linejoin = "mitre", 
 .         show.legend = FALSE, linewidth = line_width)

Setting bin.size=NULL fixed the issue.

@m-jahn
Copy link
Collaborator

m-jahn commented Jan 22, 2025

Hi Ian, can you provide a reproducible example?

@ijhoskins
Copy link
Author

Hi @m-jahn let me get a test dataset together from reads from a CCLE cell line that I can share.

@ijhoskins
Copy link
Author

Hi @m-jahn , see attached. I included a BAM intersected with EML4 and ALK exons as well as a filtered GTF. You can test with EML4 region "2:42396521-42488653"

I am using ggcoverage version 1.4.1.

ggcoverage_test_data.tar.gz

@m-jahn
Copy link
Collaborator

m-jahn commented Jan 23, 2025

thank you! Do you also happen to have the code chunk you were running that produced this error?

@m-jahn
Copy link
Collaborator

m-jahn commented Jan 23, 2025

I just assumed the params you used for loading your bam file and I can already confirm the bug with the coordinates. Also strand information seems to get missing. To be honest the LoadTrack function is quite cluttered and would need a major rework.

For reproducibility, this is what I tested with your file:

bam <- "~/Downloads/ggcoverage_test_data/SNU-324_EML4_ALK.intersect.bam"
gtf <- "~/Downloads/ggcoverage_test_data/EML4_ALK.filt.gtf"

meta_data <- data.frame(
  SampleName = "SNU-324_EML4_ALK.intersect",
  Type = "EML4",
  Group = "EML4"
)

track_df <- LoadTrackFile(
  track.file = bam,
  format = "bam",
  region = "2:42396521-42488653",
  norm.method = "None",
  bin.size = 10,
  meta.info = meta_data
)

head(track_df)

With the result:

  seqnames start  end width strand     score Type Group
1        2     1   10    10      *  0.000000 EML4  EML4
2        2  1961 1970    10      *  4.333333 EML4  EML4
3        2  1971 1980    10      * 10.500000 EML4  EML4
4        2  1981 1990    10      * 16.000000 EML4  EML4
5        2  1991 2000    10      * 30.333333 EML4  EML4
6        2  2001 2010    10      * 55.666667 EML4  EML4

I'll try to fix some of this. In the mean time I recommend you use the much simpler rtracklayer interface to parse bam files:
You can select your chromosome, calculate coverage and turn the result into a data.frame for plotting with ggcoverage.

rtracklayer::import(bam, format = "bam") %>% 
  coverage()

m-jahn added a commit to m-jahn/ggcoverage that referenced this issue Jan 31, 2025
@m-jahn
Copy link
Collaborator

m-jahn commented Jan 31, 2025

dear @ijhoskins
you can use the the latest version from this development branch https://github.com/m-jahn/ggcoverage/tree/bug_fixes
and see if it solves your issue. I have tested various fixes with your data and tried to make some functions more robust by adding educated guesses. Some defaults are hard-coded to examples that have no relevance for real users, e.g. the gene.name in geom_transcript.

With your example data I get now:

library(ggcoverage)

bam <- "/home/michael/Downloads/ggcoverage_test_data/SNU-324_EML4_ALK.intersect.bam"
gtf <- "/home/michael/Downloads/ggcoverage_test_data/EML4_ALK.filt.gtf"

meta_data <- data.frame(
  SampleName = "SNU-324_EML4_ALK.intersect",
  Type = "EML4",
  Group = "EML4"
)

# load bam file
track_df <- LoadTrackFile(
  track.file = bam,
  format = "bam",
  norm.method = "None",
  bin.size = NULL,
  meta.info = meta_data
)

# load genome annotation
gtf_gr <- rtracklayer::import.gff(con = gtf, format = "gtf")

# plot
ggcoverage(
  data = track_df,
  plot.type = "facet",
  range.position = "out",
  facet.key = "strand"
) +
  geom_gene(gtf.gr = gtf_gr, plot.height = 0.2) +
  geom_transcript(gtf.gr = gtf_gr, plot.height = 0.2)

Image

Note that the gene plot is not fully correct because there is no gene annotated in your GTF file. But it still plots with a warning instead of throwing a cryptic error message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants