version update for LRSDAY: v1.1.0 -> v1.2.0
yjx1217 committed Oct 15, 2018
1 parent 819b84e commit 89976b0
Showing 86 changed files with 64,238 additions and 26,500 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

## [1.2.0] - 2018-10-15
### Added
- Support for adapter trimming for Nanopore reads (via Porechop).
- Support for long-read filtering based on both quality and length (via Filtlong).
- Support for long-read-based polishing for PacBio and Nanopore reads (via Quiver/Arrow for PacBio reads and nanopolish for Nanopore reads).
- Support for the bax2bam format conversion for the PacBio RSII reads to make it compatible with PacBio's current SMRT pipeline.
- Support for dedicated mitochondrial gene annotation (via Mfannot).
### Changed
- Treat nuclear genome and mitochondrial genome separately in the annotation phase.
- Better robustness for various processing scripts.
- Software version or downloading URL updates for a number of dependencies.
### Fixed
- Typos in the manual.

## [1.1.0] - 2018-07-11
### Added
- This change log file: Changelog.md
Binary file added Example_Outputs/SK1.assembly.final.fa.gz
Binary file not shown.
Binary file added Example_Outputs/SK1.assembly.final.filter.pdf
Binary file not shown.
17 changes: 17 additions & 0 deletions Example_Outputs/SK1.assembly.final.stats.txt
@@ -0,0 +1,17 @@
total sequence count: 33
total sequence length: 12490496
min sequence length: 1248
max sequence length: 1480288
mean sequence length: 378499.88
median sequence length: 84643.00
N50: 923711
L50: 6
N90: 341493
L90: 14
A%: 30.88
T%: 30.79
G%: 19.13
C%: 19.16
AT%: 61.67
GC%: 38.29
N%: 0.04
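
The N50 and L50 values in this stats file follow the usual assembly definition: sort the sequence lengths in descending order and accumulate them until the running sum reaches half of the total length. N50 is the length of the sequence at which that happens, and L50 is the number of sequences accumulated so far. The sketch below illustrates the calculation on toy data. The helper name `n50_l50` is hypothetical and is not the script LRSDAY uses to produce this file.

```shell
#!/bin/bash
# Hypothetical helper (not part of LRSDAY): reads one sequence length per
# line on stdin and prints "N50 L50".
n50_l50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }
    END {
      half = total / 2
      for (i = 1; i <= NR; i++) {
        acc += len[i]
        # The first sequence that pushes the running sum to half of the
        # total defines N50 (its length) and L50 (its rank).
        if (acc >= half) { print len[i], i; exit }
      }
    }'
}

# Toy input: total length 290, half is 145; 80 + 70 = 150 reaches it.
printf '%s\n' 20 80 40 70 50 30 | n50_l50   # prints: 70 2
```

N90 and L90 follow the same pattern with 90% of the total length as the threshold.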
11,750 changes: 11,750 additions & 0 deletions Example_Outputs/SK1.final.cds.fa

Large diffs are not rendered by default.

Binary file removed Example_Outputs/SK1.final.cds.fa.gz
Binary file not shown.
Binary file removed Example_Outputs/SK1.final.fa.gz
Binary file not shown.
Binary file removed Example_Outputs/SK1.final.filter.pdf
Binary file not shown.
50,286 changes: 25,200 additions & 25,086 deletions Example_Outputs/SK1.final.gff3

Large diffs are not rendered by default.

479 changes: 245 additions & 234 deletions Example_Outputs/SK1.final.manual_check.list

Large diffs are not rendered by default.

11,750 changes: 11,750 additions & 0 deletions Example_Outputs/SK1.final.pep.fa

Large diffs are not rendered by default.

Binary file removed Example_Outputs/SK1.final.pep.fa.gz
Binary file not shown.
17 changes: 0 additions & 17 deletions Example_Outputs/SK1.final.stats.txt

This file was deleted.

11,750 changes: 11,750 additions & 0 deletions Example_Outputs/SK1.final.trimmed_cds.fa

Large diffs are not rendered by default.

Binary file removed Example_Outputs/SK1.final.trimmed_cds.fa.gz
Binary file not shown.
17 changes: 0 additions & 17 deletions Example_Outputs/SK1.final.trimmed_cds.log

This file was deleted.

2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2017 Jia-Xing Yue
+Copyright (c) 2018 Jia-Xing Yue

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
Binary file added LRSDAY_flowchart.png
Binary file modified Manual.pdf
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#!/bin/bash
set -e -o pipefail
#######################################
# load environment variables for LRSDAY
source ./../../env.sh

#######################################
# set project-specific variables

prefix="YGL3210" # The file name prefix for the output files
reads="./../00.Long_Reads/YGL3210.fq.gz" # The file path of the long reads file (in fastq or fastq.gz format).
reads_type="nanopore-raw" # The long reads data type: "pacbio-raw" or "pacbio-corrected" or "nanopore-raw" or "nanopore-corrected".
run_filtering="yes" # Whether to filter the reads: "yes" or "no". Default = "yes".
genome_size="12500000" # The haploid genome size (in bp) of the sequenced organism. Default = "12500000" (i.e. 12.5 Mb for the budding yeast S. cerevisiae genome). This is used together with post_filtering_coverage to calculate the number of bases to keep after read filtering (see below).
post_filtering_coverage="30" # Targeted sequencing coverage after read filtering. Default = "30" (i.e. 30x coverage).
threads=1 # The number of threads to use. Default = "1".

#######################################
# process the pipeline

filtlong_target_bases=$(($genome_size * $post_filtering_coverage))
echo ""
echo "genome_size=$genome_size, post_filtering_coverage=$post_filtering_coverage, filtlong_target_bases=$filtlong_target_bases"
echo ""
if [[ "$reads_type" == "nanopore-raw" || "$reads_type" == "nanopore-corrected" ]]
then
$porechop_dir/porechop -i $reads -o $prefix.porechop.fastq.gz --threads $threads > $prefix.porechop.summary.txt
if [[ "$run_filtering" == "yes" ]]
then
$filtlong_dir/filtlong --min_length 1000 --keep_percent 90 --target_bases $filtlong_target_bases $prefix.porechop.fastq.gz | gzip > $prefix.filtlong.fastq.gz
fi
else
if [[ "$run_filtering" == "yes" ]]
then
$filtlong_dir/filtlong --min_length 1000 --keep_percent 90 --target_bases $filtlong_target_bases $reads | gzip > $prefix.filtlong.fastq.gz
fi
fi

############################
# checking bash exit status
if [[ $? -eq 0 ]]
then
echo ""
echo "LRSDAY message: This bash script has been successfully processed! :)"
echo ""
echo ""
exit 0
fi
############################
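
The script above derives Filtlong's `--target_bases` value as genome size times target coverage, and routes only Nanopore reads through Porechop before filtering. The sketch below isolates those two decisions so they can be checked without Porechop or Filtlong installed. The function names `target_bases` and `needs_adapter_trimming` are hypothetical and do not appear in LRSDAY.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of LRSDAY.

# Number of bases to keep: haploid genome size (bp) x targeted coverage.
target_bases() {
  local genome_size="$1" coverage="$2"
  echo $(( genome_size * coverage ))
}

# Returns 0 for the read types that the script sends through Porechop.
needs_adapter_trimming() {
  case "$1" in
    nanopore-raw|nanopore-corrected) return 0 ;;
    *) return 1 ;;
  esac
}

# With the script's defaults: 12500000 bp x 30 = 375000000 bases.
echo "filtlong_target_bases=$(target_bases 12500000 30)"

for reads_type in pacbio-raw nanopore-raw; do
  if needs_adapter_trimming "$reads_type"; then
    echo "$reads_type: Porechop, then Filtlong"
  else
    echo "$reads_type: Filtlong only"
  fi
done
```

Note that in the script, setting `run_filtering="no"` for PacBio reads means no command is run at all, while Nanopore reads are still adapter-trimmed.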
41 changes: 41 additions & 0 deletions Project_Template/00.Long_Reads/LRSDAY.00.PacBio.RSII_bax2bam.sh
@@ -0,0 +1,41 @@
#!/bin/bash
set -e -o pipefail
#######################################
# load environment variables for LRSDAY
source ./../../env.sh

#######################################
# set project-specific variables

prefix="SK1.SMRTCell.1" # The file name prefix for the output files. For the testing example, run this script four times with the prefix of "SK1.SMRTCell.1", "SK1.SMRTCell.2", "SK1.SMRTCell.3", and "SK1.SMRTCell.4" respectively.
pacbio_RSII_bax_fofn_file="./pacbio_fofn_files/SK1.SMRTCell.1.RSII_bax.fofn" # The fofn file containing file paths to the PacBio RSII bax reads from the same SMRT cell. If you have data from multiple SMRT cells, please run this script separately for each of them. Do not mix reads from different SMRT cells even if they come from the same sample. For the testing example, you can set pacbio_RSII_bax_fofn_file="./pacbio_fofn_files/$prefix.RSII_bax.fofn" so that this parameter is set automatically based on the prefix parameter.

#######################################
# process the pipeline

source $miniconda2_dir/activate $conda_pacbio_dir/../../conda_pacbio_env
$conda_pacbio_dir/bax2bam \
--fofn=$pacbio_RSII_bax_fofn_file \
-o ./pacbio_fofn_files/$prefix.bax2bam \
--subread \
--pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,MergeQV,SubstitutionQV,PulseWidth,SubstitutionTag

cd pacbio_fofn_files
rm $prefix.bax2bam.scraps.bam
rm $prefix.bax2bam.scraps.bam.pbi
echo $(pwd)/$prefix.bax2bam.subreads.bam > $prefix.bam.fofn
cd ..



############################
# checking bash exit status
if [[ $? -eq 0 ]]
then
echo ""
echo "LRSDAY message: This bash script has been successfully processed! :)"
echo ""
echo ""
exit 0
fi
############################
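
The bax2bam step above reads its inputs from a .fofn file (a plain-text "file of file names", one path per line). A missing or mistyped path is easier to diagnose before the conversion starts than from a failure inside it. The following sketch shows one way to pre-check a .fofn file. The helper `check_fofn` is hypothetical and not part of LRSDAY or the PacBio tools.

```shell
#!/bin/bash
# Hypothetical helper (not part of LRSDAY): verify that every non-empty
# line of a .fofn file names an existing regular file.
check_fofn() {
  local fofn="$1" missing=0 path
  while IFS= read -r path || [[ -n "$path" ]]; do
    if [[ -z "$path" ]]; then
      continue
    fi
    if [[ ! -f "$path" ]]; then
      echo "missing: $path" >&2
      missing=$((missing + 1))
    fi
  done < "$fofn"
  [[ "$missing" -eq 0 ]]
}

# Demo with temporary placeholder files standing in for real .bax.h5 data.
demo_dir=$(mktemp -d)
touch "$demo_dir/cell.1.bax.h5" "$demo_dir/cell.2.bax.h5"
printf '%s\n' "$demo_dir/cell.1.bax.h5" "$demo_dir/cell.2.bax.h5" > "$demo_dir/ok.fofn"
printf '%s\n' "$demo_dir/cell.1.bax.h5" "$demo_dir/cell.3.bax.h5" > "$demo_dir/bad.fofn"

if check_fofn "$demo_dir/ok.fofn"; then echo "ok.fofn: all paths exist"; fi
if ! check_fofn "$demo_dir/bad.fofn"; then echo "bad.fofn: at least one path is missing"; fi
rm -r "$demo_dir"
```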
Original file line number Diff line number Diff line change
@@ -6,21 +6,71 @@ source ./../../env.sh

#######################################
# set project-specific variables
-file_name="SK1.filtered_subreads.bam" # the name of the ENA bam file
-file_url="ftp://ftp.sra.ebi.ac.uk/vol1/ERZ448/ERZ448251/SK1.filtered_subreads.bam" # the URL of the ENA bam file
-prefix="SK1" # file name prefix for output files
+file_name="SK1.filtered_subreads.bam" # The name of the ENA bam file for the testing example.
+file_url="ftp://ftp.sra.ebi.ac.uk/vol1/ERZ448/ERZ448251/SK1.filtered_subreads.bam" # The URL of the ENA bam file for the testing example.
+prefix="SK1" # The file name prefix for output files of the testing example.

#######################################
# process the pipeline

-echo "download the bam file from the ENA database"
+echo "download the bam file from the ENA database ..."
wget $file_url
echo "bam2fastq ..."
$bedtools_dir/bedtools bamtofastq -i $file_name -fq $prefix.filtered_subreads.fastq
echo "gzip fastq ..."
gzip $prefix.filtered_subreads.fastq
rm $file_name

cd pacbio_fofn_files
echo "download the metadata and raw PacBio reads in .h5 format ..."

wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.metadata.xml
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.metadata.xml
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA535/ERA535258/pacbio_hdf5/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.metadata.xml
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.metadata.xml

if [[ ! -d Analysis_Results ]]
then
mkdir Analysis_Results
fi

cd Analysis_Results
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.1.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.2.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.3.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.bas.h5

wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.1.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.2.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.3.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.bas.h5

wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA535/ERA535258/pacbio_hdf5/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.1.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA535/ERA535258/pacbio_hdf5/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.2.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA535/ERA535258/pacbio_hdf5/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.3.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA535/ERA535258/pacbio_hdf5/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.bas.h5

wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.1.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.2.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.3.bax.h5
wget ftp://ftp.sra.ebi.ac.uk/vol1/ERA525/ERA525888/pacbio_hdf5/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.bas.h5

echo $(pwd)/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.1.bax.h5 >> ./../$prefix.SMRTCell.1.RSII_bax.fofn
echo $(pwd)/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.2.bax.h5 >> ./../$prefix.SMRTCell.1.RSII_bax.fofn
echo $(pwd)/m150811_092723_00127_c100844062550000001823187612311514_s1_p0.3.bax.h5 >> ./../$prefix.SMRTCell.1.RSII_bax.fofn
echo $(pwd)/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.1.bax.h5 >> ./../$prefix.SMRTCell.2.RSII_bax.fofn
echo $(pwd)/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.2.bax.h5 >> ./../$prefix.SMRTCell.2.RSII_bax.fofn
echo $(pwd)/m150814_233250_00127_c100823152550000001823177111031542_s1_p0.3.bax.h5 >> ./../$prefix.SMRTCell.2.RSII_bax.fofn
echo $(pwd)/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.1.bax.h5 >> ./../$prefix.SMRTCell.3.RSII_bax.fofn
echo $(pwd)/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.2.bax.h5 >> ./../$prefix.SMRTCell.3.RSII_bax.fofn
echo $(pwd)/m150911_220012_00127_c100861772550000001823190702121671_s1_p0.3.bax.h5 >> ./../$prefix.SMRTCell.3.RSII_bax.fofn
echo $(pwd)/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.1.bax.h5 >> ./../$prefix.SMRTCell.4.RSII_bax.fofn
echo $(pwd)/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.2.bax.h5 >> ./../$prefix.SMRTCell.4.RSII_bax.fofn
echo $(pwd)/m150813_110541_00127_c100823112550000001823177111031581_s1_p0.3.bax.h5 >> ./../$prefix.SMRTCell.4.RSII_bax.fofn

cd ..
cd ..

############################
# checking bash exit status
if [[ $? -eq 0 ]]
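
The hunk above builds each per-SMRT-cell .fofn file by appending paths with `>>`, so running the script a second time in the same directory would add every path again. The sketch below shows a variant that truncates the file first and writes the three bax.h5 paths in a loop. The function name `write_bax_fofn` and its arguments are hypothetical; this is an illustration under those assumptions, not the committed code.

```shell
#!/bin/bash
# Hypothetical helper (not part of LRSDAY): write the three bax.h5 paths
# of one SMRT cell into a .fofn file, overwriting any previous content.
write_bax_fofn() {
  local dir="$1" movie="$2" out="$3" i
  : > "$out"   # truncate first so that reruns do not duplicate entries
  for i in 1 2 3; do
    echo "$dir/$movie.$i.bax.h5" >> "$out"
  done
}

# Demo in a temporary directory, using the first movie name from the hunk.
tmp_dir=$(mktemp -d)
fofn="$tmp_dir/SK1.SMRTCell.1.RSII_bax.fofn"
movie="m150811_092723_00127_c100844062550000001823187612311514_s1_p0"

write_bax_fofn "$tmp_dir/Analysis_Results" "$movie" "$fofn"
write_bax_fofn "$tmp_dir/Analysis_Results" "$movie" "$fofn"   # rerun

# Still three lines after the second call.
grep -c '' "$fofn"
```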