Update README.md and minor fixes
kahsieh committed Mar 19, 2018
1 parent 96fbb9f commit d777380
Showing 4 changed files with 119 additions and 28 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
# Project-specific
db_*
tools/imrep
tools/metaphlan2
tools/MiniConda
78 changes: 78 additions & 0 deletions README.md
@@ -15,3 +15,81 @@ at the University of California, Los Angeles (UCLA).

Released under the terms of the General Public License, version 3.0 (GPLv3).
For more information, please visit: <https://github.com/smangul1/rop/wiki>

## Installing ROP

To install ROP, first clone this repository, then run

```
./install.sh
```

from the repository's directory. This will download dependencies and databases.
The default installation will generally suffice, but the following options are
available:

- `-c|--clean`: Just remove installed tools.
    - The installation script must then be re-run to use ROP again.
- `-n|--native`: Use native Python.
    - MiniConda will not be downloaded.
    - May lead to dependency errors.
- `-f|--force`: Unlink databases.
    - Use with caution.
- `-l|--link LINK`: Link databases instead of downloading.
    - Useful if you previously downloaded an ROP database.
    - A symlink will be created in the current directory.
- `-d|--db-dest DB_DEST` (default: `.`): Change database download location.
    - Useful for managing space.
    - A symlink will be created in the current directory.
- `-o|--organism ORGANISM` (default: `human`): Organism to download databases
  for.
- `-s|--select-db SELECT_DB` (default: all): Database(s) to download for the
  specified organism.
    - A comma-separated list of one or more of the following: repeat, immune,
      microbiome, metaphlan, viral, fungi, protozoa.
- `-h|--help`: Display usage information.

## Using ROP

To use ROP, run

```
rop.sh unmapped_reads output_dir
```

Unless otherwise specified using an option, `unmapped_reads`
must be a .fastq/.fq file, and `output_dir` must not exist (it will be created).
Results will be written to `output_dir`, with one subdirectory for every stage
of the pipeline. The following options are available:

- `-o|--organism` (default: `human`): Run for the specified organism instead of
  human.
- `-s|--steps` (default: all except lowq and bacteria): Select the analysis modes to use.
    - A comma-separated list of one or more of the following: lowq, rdna,
      reference, repeats, circrna, immune, microbiome (which may be subdivided
      into bacteria, metaphlan, viral, fungi, protozoa).
    - `-s all` selects everything.
    - circrna and bacteria are not available in this release.
- `-m|--max`: Use a liberal threshold when remapping to reference.
    - May account for more reads.
- `-f|--force`: Overwrite the analysis destination directory.
- `-d|--dev`: Keep intermediate FASTA files.
    - Consumes extra space.
- `-z|--gzip`: Decompress (gunzip) the input file before processing.
- `-b|--bam`: Input unmapped reads in .bam format instead of .fastq format.
- `-a|--fasta`: Input unmapped reads in .fasta format instead of .fastq format.
  Forcibly disables low-quality read filtering.
- `-h|--help`: Display usage information.

A small example file is included in the repository in various formats. To try it
out, run one of the following commands from the repository directory:

```
rop.sh -b example/example.bam ropout
rop.sh example/example.fastq ropout
rop.sh -z example/example.fastq.gz ropout
rop.sh -a example/example.fasta ropout
rop.sh -az example/example.fasta.gz ropout
```

Then, browse to the `ropout` directory to see the analysis results!
6 changes: 2 additions & 4 deletions install.sh
@@ -9,7 +9,7 @@ echo '--------------------------------------------------------------------------
 echo 'Read Origin Protocol: Installer'
 echo '--------------------------------------------------------------------------------'
 DIR=`dirname $(readlink -e "$0")`
-cat "$DIR/README.md"
+sed '/##/ q' "$DIR/README.md" | head -n -2 | tail -n +3
 echo '--------------------------------------------------------------------------------'

 # ------------------------------------------------------------------------------
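
The banner command that replaces `cat` can be read stage by stage: `sed '/##/ q'` prints everything up to and including the first line that contains `##`, `head -n -2` drops the last two of those lines, and `tail -n +3` drops the first two. The sketch below runs the same pipeline on a made-up stand-in file, since the real README's opening lines are not visible in this diff. The `banner` function name is hypothetical, and a negative count for `head -n` assumes GNU coreutils.

```shell
# Hypothetical wrapper around the pipeline used in install.sh and rop.sh.
banner() {
    sed '/##/ q' "$1" | head -n -2 | tail -n +3
}

# Made-up stand-in for README.md: title, blank line, intro, blank line, section.
readme=$(mktemp)
printf '%s\n' '# Title' '' 'Intro line one.' 'Intro line two.' '' \
    '## Installing' 'Section text.' >"$readme"

# Prints only the two intro lines: the title block and the first
# section heading (with the blank line before it) are trimmed away.
banner "$readme"
rm -f "$readme"
```

With this layout only the introductory paragraph survives, which appears to be the intent now that the README also carries the long installation and usage sections.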
@@ -49,7 +49,6 @@ if [ $? -ne 4 ]; then
     echo "Error: Environment doesn't support getopt." >&2
     exit 1
 fi
-set -e

 # Call getopt.
 SHORT_OPTIONS='cnfl:d:o:s:h'
@@ -59,6 +58,7 @@ if [ $? -ne 0 ]; then
     exit 1 # getopt will have printed the error message
 fi
 eval set -- "$PARSED"
+set -e

 # Set default options.
 CLEAN_ONLY=false
@@ -166,12 +166,10 @@ if [ $NATIVE = false ]; then
     sed -i "1c #!$MiniConda" metaphlan2/metaphlan2.py
     sed -i "1c #!$MiniConda" metaphlan2/strainphlan.py
     sed -i "1c #!$MiniConda" metaphlan2/utils/read_fastx.py
-    sed -i "1c #!$MiniConda" ../rop.py
 else
     sed -i '1c #!/usr/bin/env python2.7' metaphlan2/metaphlan2.py
     sed -i '1c #!/usr/bin/env python2.7' metaphlan2/strainphlan.py
     sed -i '1c #!/usr/bin/env python2.7' metaphlan2/utils/read_fastx.py
-    sed -i '1c #!/usr/bin/env python2.7' ../rop.py
 fi

 # ------------------------------------------------------------------------------
62 changes: 38 additions & 24 deletions rop.sh
@@ -9,7 +9,7 @@ echo '--------------------------------------------------------------------------
 echo 'Read Origin Protocol: Main Program'
 echo '--------------------------------------------------------------------------------'
 DIR=`dirname $(readlink -e "$0")`
-cat "$DIR/README.md"
+sed '/##/ q' "$DIR/README.md" | head -n -2 | tail -n +3
 echo '--------------------------------------------------------------------------------'

 # Add MiniConda to PATH if it's available.
@@ -28,17 +28,17 @@ if [ $? -ne 4 ]; then
     echo "Error: Environment doesn't support getopt." >&2
     exit 1
 fi
-set -e

 # Call getopt.
-SHORT_OPTIONS='o:s:mfdzbah'
-LONG_OPTIONS='organism:,steps:,max,force,dev,gzip,bam,fasta,help'
+SHORT_OPTIONS='o:s:mfdzbaxh'
+LONG_OPTIONS='organism:,steps:,max,force,dev,gzip,bam,fasta,commands,help'
 PARSED=`getopt --options="$SHORT_OPTIONS" --longoptions="$LONG_OPTIONS" \
     --name "$0" -- "$@"`
 if [ $? -ne 0 ]; then
     exit 1 # getopt will have printed the error message
 fi
 eval set -- "$PARSED"
+set -e

 # Set default options.
 UNMAPPED_READS=''
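
Adding `x` and `commands` to the option strings only registers the flag; a matching `case` branch in the option loop still has to consume it. The sketch below is a cut-down illustration of that pattern, not the script's actual parser. The `parse_demo` name and its output format are made up, and it falls back to printing `unsupported` when the enhanced (util-linux) `getopt` is not available, mirroring the script's own `getopt --test` check.

```shell
# Hypothetical, cut-down parser: handles only -x/--commands plus positionals.
parse_demo() {
    status=0
    getopt --test >/dev/null 2>&1 || status=$?
    if [ "$status" -ne 4 ]; then
        echo 'unsupported'   # enhanced getopt reports exit status 4 here
        return 0
    fi
    parsed=$(getopt --options='x' --longoptions='commands' \
        --name 'parse_demo' -- "$@") || return 1
    eval set -- "$parsed"    # replace the positionals with the normalised list
    commands=false
    while true; do
        case "$1" in
            -x|--commands) commands=true; shift ;;
            --) shift; break ;;   # end of options; positionals follow
            *) return 1 ;;
        esac
    done
    echo "commands=$commands args=$*"
}

parse_demo --commands reads.fastq outdir
parse_demo reads.fastq outdir
```

Because `getopt` normalises the argument list and always emits a `--` separator, the loop needs only one branch per option plus the terminating `--` case.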
@@ -55,6 +55,7 @@ DEV=false
 GZIP=false
 BAM=false
 FASTA=false
+COMMANDS=false

 # Review parsed options.
 while true; do
@@ -113,6 +114,11 @@ while true; do
             FASTA=true
             shift
             ;;
+        -x|--commands)
+            # Print all commands.
+            COMMANDS=true
+            shift
+            ;;
         -h|--help)
             echo "Usage: $0 [-o ORGANISM] [-s STEPS] [-mfdzba]" \
                 "unmapped_reads output_dir" >&2
@@ -137,10 +143,21 @@ while true; do
     esac
 done

+# Add all steps if selected.
+if [ "$STEPS" = 'all' ]; then
+    STEPS='lowq rdna reference repeats circrna immune microbiome'
+fi
+
 # Convert to absolute paths.
-UNMAPPED_READS=`readlink -e "$UNMAPPED_READS"`
+UNMAPPED_READS=`readlink -m "$UNMAPPED_READS"`
 OUTPUT_DIR=`readlink -m "$OUTPUT_DIR"`

+# Check if UNMAPPED_READS exists.
+if [ ! -e "$UNMAPPED_READS" ]; then
+    echo "Error: $UNMAPPED_READS doesn't exist." >&2
+    exit 1
+fi
+
 # Check if OUTPUT_DIR exists, then make it.
 if [ -d "$OUTPUT_DIR" ]; then
     if [ $FORCE = true ]; then
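
The switch from `readlink -e` to `readlink -m` for the input path pairs with the new existence check. GNU `readlink -e` prints nothing and returns a non-zero status when the path is missing, so a script that relies on it has no path left to name in an error message. `readlink -m` canonicalises the path whether or not it exists, which leaves the decision and the message to the script. A minimal sketch of the difference, assuming GNU coreutils; the `resolve_input` helper and the path are made up for illustration:

```shell
# Hypothetical helper mirroring the new logic: canonicalise first, then test.
resolve_input() {
    path=$(readlink -m "$1")   # succeeds even when the file is missing
    if [ ! -e "$path" ]; then
        echo "Error: $path doesn't exist." >&2
        return 1
    fi
    echo "$path"
}

missing='/nonexistent-dir-for-demo/reads.fastq'   # made-up path
readlink -e "$missing" || echo "readlink -e printed nothing and returned $?"
resolve_input "$missing" || echo 'resolve_input reported the problem itself'
```

With `-m` the full path stays available for the error message, which is presumably why the check was made explicit.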
@@ -162,9 +179,12 @@ mkdir -p "$OUTPUT_DIR"
 SAMPLE=`basename "$UNMAPPED_READS" | sed 's \([^\.]*\)\..* \1 '`
 DB="$DIR/db_$ORGANISM"

-# Duplicate stdout and stderr to the log file.
+# Duplicate stdout and stderr to the log file. Print commands if selected.
 touch "$OUTPUT_DIR/$SAMPLE--general.log"
 exec &> >(tee -i "$OUTPUT_DIR/$SAMPLE--general.log")
+if [ $COMMANDS = true ]; then
+    set -x
+fi
 echo "Input file: $UNMAPPED_READS"

 # Declare output directories.
@@ -279,7 +299,7 @@ reads_present () {
 }

 clean () {
-    if [ $DEV = false ] && [ `dirname "$1"` != "$OUTPUT_DIR" ]; then
+    if [ $DEV = false ]; then
         rm "$1"
     fi
 }
@@ -312,6 +332,7 @@ if [ $BAM == true ]; then
         exit 1
     fi
     samtools bam2fq "$current" >"$post"
+    clean "$current"
     current="$post"
 fi

@@ -364,15 +385,17 @@ if ! grep -q 'lowq' <<<"$STEPS" || [ $FASTA = true ] || \

     # Must convert to fasta to continue.
     if [ $FASTA = false ]; then
-        fastq_to_fasta <"$current" >"$post"
+        fastq_to_fasta -n <"$current" >"$post"
+        clean "$current"
         current="$post"
     fi
 else
     n_reads['01_lowq']=`python "$DIR/helper.py" lowq $MAX $PE \
         --pre "$current" --post "$post"`
     echo "--> Marked lowq in the names of ${n_reads['01_lowq']} low quality" \
-        'reads. Reads not filtered.'
-    # Don't clean $current. It won't have an effect, anyway.
+        'reads.'
+    echo ' These reads are not filtered.'
+    clean "$current"
     current="$post"
 fi

@@ -510,22 +533,19 @@ fi

 echo '7a. MetaPhlAn profiling (-s metaphlan)...'
 cd "${DIRS['07a_metaphlan']}"
-# No $post file (don't reduce unmapped reads using MetaPhlAn results).
+# No post file (don't reduce unmapped reads using MetaPhlAn results).

 if ! grep -qE 'metaphlan|microbiome' <<<"$STEPS" || ! reads_present "$current"; then
     echo '--> Skipped MetaPhlAn profiling.'
 else
-    echo '--> MetaPhlAn profiling will execute in the background.'
-    {
     python "$DIR/tools/metaphlan2/metaphlan2.py" "$current" \
         --input_type multifasta --nproc 8 \
         --bowtie2out "${INTFNS['07a_metaphlan_bowtie2out']}" \
         >"${INTFNS['07a_metaphlan_output']}" 2>"${LOGFNS['07a_metaphlan']}"
-    n_reads['07a_metaphlan']=`cat "${INTFNS['07a_metaphlan_output']}" | wc -l`
-    echo "--> (background) Identified ${n_reads['07a_metaphlan']} reads using" \
-        'MetaPhlAn. Reads not filtered.'
+    n_reads_07a_metaphlan=`cat "${INTFNS['07a_metaphlan_output']}" | wc -l`
+    echo "--> Identified $n_reads_07a_metaphlan reads using MetaPhlAn."
+    echo ' These reads are neither filtered nor included in the total.'
     # Don't clean or change $current.
-    } &
 fi

 echo '7b. Bacterial profiling (-s bacteria)...'
@@ -616,12 +636,6 @@ fi
 # CLEANUP
 # ------------------------------------------------------------------------------

-# Wait for MetaPhlAn to finish.
-if ps -p $! >/dev/null; then
-    echo 'Waiting for MetaPhlAn to finish...'
-    wait
-fi
-
 # Revise low quality read count.
 if grep -q 'lowq' <<<"$STEPS"; then
     n_reads['01_lowq']=`grep -c '^>lowq_' "$current"`
@@ -634,7 +648,7 @@ sum=0
 for key in "${!n_reads[@]}"; do
     steps+="$key,"
     counts+="${n_reads[$key]},"
-    ((sum += ${n_reads[$key]}))
+    ((sum += ${n_reads[$key]})) || true
 done
 steps=`sed 's .$  ' <<<"$steps"`
 counts=`sed 's .$  ' <<<"$counts"`
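
The `|| true` appended to the arithmetic command matters because of how bash reports the result of `(( ... ))`: the command returns status 1 whenever the expression evaluates to zero. Under `set -e`, adding a zero count to a zero sum would therefore abort the script. The sketch below demonstrates this in a fresh `bash -e` child process; the `survives_errexit` helper is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: run a snippet in a fresh bash with errexit enabled
# and report whether it ran to completion.
survives_errexit() {
    if bash -e -c "$1" >/dev/null 2>&1; then
        echo 'completed'
    else
        echo 'aborted'
    fi
}

# Expression evaluates to 0, so (( )) returns 1 and errexit stops the script.
survives_errexit 'sum=0; ((sum += 0)); echo reached'
# The guard turns that status into success, as in the patched loop.
survives_errexit 'sum=0; ((sum += 0)) || true; echo reached'
# A non-zero result returns status 0 and needs no guard.
survives_errexit 'sum=0; ((sum += 3)); echo reached'
```

The three calls print `aborted`, `completed` and `completed` respectively.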
