Skip to content

Added solutions to chp07.md #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 59 additions & 7 deletions chp07.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,7 @@ dir(folder) # will show the contents of the folder

**solution:**
```{r,echo=FALSE,eval=FALSE}

folder=(system.file(package="QuasR", "extdata"))
library(Rqc)
folder = system.file(package="ShortRead", "extdata/E-MTAB-1147")

# feeds fastq.qz files in "folder" to quality check function
qcRes=rqc(path = folder, pattern = "^chip", openBrowser=FALSE)
rqcCycleQualityBoxPlot(qcRes)

Expand All @@ -26,15 +21,72 @@ rqcCycleQualityBoxPlot(qcRes)

**solution:**
```{r,echo=FALSE,eval=FALSE}
#coming soon
# Since preprocessReads() doesn't trim by score, you must look at the QC report above (see previous exercise) to determine number of bases to trim. Looking at the previous plot, the last base (#36) is often low-quality (<20) for both samples, while the last 3 bases are marginal in chip_2_1.fq. So, the truncateEndBases argument was set to 3.

# sample files
infiles <- system.file(package="QuasR", "extdata",
c("chip_1_1.fq.bz2","chip_2_1.fq.bz2"))
outfiles <- paste(pattern=c("chip_1_1.","chip_2_1."),"trim3",".fastq",sep="")
unlink(outfiles) # this removes old files, which allows next command to work
preprocessReads(infiles, outfiles, truncateEndBases = 3)

# Alternative Approach: Instead of trimming all reads to the same extent with preprocessReads(), the ShortRead::trimTails() function can trim *individual* reads based on their quality scores.
library(ShortRead)
# Unfortunately, ShortRead:readFastq can't read in .bz2 files!
# readFastq(dirPath = "extdata/", pattern = "chip_._1.fq.bz2")

# but preprocessReads() can read in .fq.bz2 files and output reads into .fastq format
outfiles <- paste(pattern=c("chip_1_1.","chip_2_1."),"fastq",sep="")
unlink(outfiles) # this removes old files, which allows next command to work
preprocessReads(infiles, outfiles) # we want the original reads, so no trimming
# now read each .fastq file into a ShortReadQ object
fq1 = readFastq(outfiles[1]) # here's the first one
fq1 # 2597 reads originally
# use ShortRead::trimTails to trim each read separately; for help, run ?TrimTails
fq1_1 <- trimTails(fq1, 1, "4") # k = 1, a = "4"; trim at first base with quality < 20
# ASCII of "4" is 52, minus 33 equals quality of 19
fq1_1 # 2361 reads after this trimming
# in contrast, ShortRead::trimTailw uses a window to trim; here trim where 2/5 bases <20
fq1_2w <- trimTailw(fq1, 2, "4", 2) # here, window is set to trim where 2/5 bases <20
fq1_2w # 2151 reads after this trimming

```

3. Align the trimmed and untrimmed reads using `QuasR` and plot alignment statistics, did the trimming improve alignments? [Difficulty: **Intermediate/Advanced**]

**solution:**
```{r,echo=FALSE,eval=FALSE}
#coming soon
# Mapping/aligning reads to the genome
# IF "extdata" folder is missing from current working directory, uncomment next line to copy example data
# file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# path to genome file in fasta format
genomeFile <- "extdata/hg19sub.fa"

# alignment with UNTRIMMED reads
# text file with list of fastq file paths and sample names (for grouping)
sampleFile <- "extdata/samples_chip_single.txt"

# check names of files in this list:
read.delim(sampleFile)
# create alignments
proj_trim <- qAlign(sampleFile, genomeFile)
proj_trim
# save a quality control report to the "qc_report_untrim.pdf" file using the qQCReport function
qQCReport(proj_trim, "qc_report_untrim.pdf")

# alignment with TRIMMED reads
# duplicate "samples_chip_single.txt" file, rename as "samples_trimmed.txt", move to folder with trimmed files
# and edit FileNames with file paths to trimmed files and sample names (for grouping)
sampleFile <- "samples_trimmed.txt"
# check that FileNames in this list are for trimmed files:
read.delim(sampleFile)
# create alignments
proj_trim <- qAlign(sampleFile, genomeFile)
# save a quality control report to the "qc_report_trim.pdf" file using the qQCReport function
qQCReport(proj_trim, "qc_report_trim.pdf")

```

*Comparing the qc_report for the sequences trimmed by 3 bases and the qc_report for the untrimmed sequences, this trimming didn’t really improve the alignments significantly. For example, with both the untrimmed and trimmed reads, only 88-90% mapped to the genome (p. 4 of qc_report).*