Fix up options
kahsieh committed Mar 21, 2018
1 parent b83395a commit c3579cb
Showing 4 changed files with 81 additions and 75 deletions.
23 changes: 12 additions & 11 deletions README.md
@@ -30,11 +30,11 @@ available:

- `-c|--clean`: Just remove installed tools.
- The installation script then must be re-run in order to use ROP again.
- `-f|--force`: Unlink databases.
- Use with caution.
- `-n|--native`: Use native python.
- MiniConda will not be downloaded.
- May lead to dependency errors.
- `-f|--force`: Unlink databases.
- Use with caution.
- `-l|--link LINK`: Link databases instead of downloading.
- Useful if you previously downloaded an ROP database.
- A symlink will be created in the current directory.
@@ -47,8 +47,6 @@ available:
specified organism.
- A comma-separated list of one or more of the following: repeat, immune,
microbiome metaphlan, viral, fungi, protozoa.
- `-i|--ignore-extensions`: Ignore incorrect .fastq/.fq/.fasta/.fa file
extensions. Does not ignore incorrect .gz/.bam file extensions.
- `-h|--help`: Displays usage information.

## Using ROP
@@ -72,15 +70,18 @@ of the pipeline. The following options are available:
into bacteria, metaphlan, viral, fungi, protozoa).
- `-s all` selects everything.
- circrna and bacteria are not available in this release.
- `-m|--max`: Use a liberal threshold when remapping to reference.
- May account for more reads.
- `-f|--force`: Overwrite the analysis destination directory.
- `-d|--dev`: Keep intermediate FASTA files.
- Consumes extra space.
- `-z|--gzip`: gunzip the input file.
- `-b|--bam`: Input unmapped reads in .bam format instead of .fastq format.
- `-a|--fasta`: Input unmapped reads in .fasta format instead of .fastq format.
Forcibly disables low-quality read filtering.
- `-b|--bam`: Input unmapped reads in .bam format instead of .fastq format.
- `-z|--gzip`: gunzip the input file.
- `-d|--dev`: Keep intermediate FASTA files.
- Consumes extra space.
- `-f|--force`: Overwrite the analysis destination directory.
- `-i|--ignore-extensions`: Ignore incorrect .fastq/.fq/.fasta/.fa file
extensions. Does not ignore incorrect .gz/.bam file extensions.
- `-m|--max`: Use a liberal threshold when remapping to reference.
- May account for more reads.
- `-x|--commands`: Print all commands (diagnostic mode).
- `-h|--help`: Displays usage information.

A small example file is included in the repository in various formats. To try it
2 changes: 1 addition & 1 deletion helper.py
@@ -1,4 +1,4 @@
#!/u/project/zarlab/kahsieh/rop-wip/tools/MiniConda/bin/python
#!/usr/bin/env python2.7

"""
--------------------------------------------------------------------------------
22 changes: 12 additions & 10 deletions install.sh
@@ -49,21 +49,23 @@ if [ $? -ne 4 ]; then
echo "Error: Environment doesn't support getopt." >&2
exit 1
fi
set -e

# Call getopt.
SHORT_OPTIONS='cnfl:d:o:s:h'
LONG_OPTIONS='clean,native,force,link:,db-dest:,organism:,select-db:,help'
SHORT_OPTIONS='cfnl:d:o:s:h'
LONG_OPTIONS='clean,force,native,link:,db-dest:,organism:,select-db:,help'
set +e
PARSED=`getopt --options="$SHORT_OPTIONS" --longoptions="$LONG_OPTIONS" --name "$0" -- "$@"`
if [ $? -ne 0 ]; then
exit 1 # getopt will have printed the error message
fi
eval set -- "$PARSED"
set -e
eval set -- "$PARSED"

# Set default options.
CLEAN_ONLY=false
NATIVE=false
FORCE=false
NATIVE=false
LINK=''
DB_DEST="$DIR"
ORGANISM='human'
@@ -78,16 +80,16 @@ while true; do
CLEAN_ONLY=true
shift
;;
-n|--native)
# Use native python.
NATIVE=true
shift
;;
-f|--force)
# Unlink databases.
FORCE=true
shift
;;
-n|--native)
# Use native python.
NATIVE=true
shift
;;
-l|--link)
# Link databases instead of downloading.
LINK="$2"
@@ -109,7 +111,7 @@ while true; do
shift 2
;;
-h|--help)
echo "Usage: $0 [-cnf] [-l LINK] [-d DB_DEST] [-o ORGANISM]"\
echo "Usage: $0 [-cfnh] [-l LINK] [-d DB_DEST] [-o ORGANISM]"\
'[-s SELECT_DB]' >&2
exit 0
;;
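
Review note: as far as the unmarked diff shows, this hunk turns on `set -e` after the `getopt --test` check, suspends it with `set +e` around the real `getopt` call so that `$?` can be inspected, and re-enables it before `eval set -- "$PARSED"`. A minimal sketch of that pattern in isolation, assuming an enhanced (util-linux style) `getopt`; the function name `parse_demo` and its two options are hypothetical and not part of ROP:

```shell
#!/bin/sh
# Hypothetical demo (not ROP code): parse -f/--force and -l/--link LINK.
parse_demo () {
    local parsed status force=false link=''
    # getopt exits non-zero on bad input; suspend errexit so the status
    # can be checked by hand instead of aborting the script outright.
    set +e
    parsed=$(getopt --options='fl:' --longoptions='force,link:' \
        --name 'parse_demo' -- "$@")
    status=$?
    set -e
    if [ "$status" -ne 0 ]; then
        return 1  # getopt has already printed the error message
    fi
    # Replace the positional parameters with getopt's normalized output.
    eval set -- "$parsed"
    while true; do
        case "$1" in
            -f|--force) force=true; shift ;;
            -l|--link) link="$2"; shift 2 ;;
            --) shift; break ;;
            *) return 1 ;;
        esac
    done
    echo "force=$force link=$link rest=$*"
}
```

Calling `parse_demo -f --link /tmp/db extra` in a subshell prints the parsed values without leaking the `set -e` state into the caller. Whether errexit should stay on afterwards is a design choice of the surrounding script.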
109 changes: 56 additions & 53 deletions rop.sh
@@ -28,35 +28,38 @@ if [ $? -ne 4 ]; then
echo "Error: Environment doesn't support getopt." >&2
exit 1
fi
set -e

# Call getopt.
SHORT_OPTIONS='o:s:mfdzbaixh'
LONG_OPTIONS='organism:,steps:,max,force,dev,gzip,bam,fasta,ignore-extensions,commands,help'
SHORT_OPTIONS='o:s:abzdfimpqxh'
LONG_OPTIONS='organism:,steps:,fasta,bam,gzip,dev,force,ignore-extensions,max,\
pe,quiet,commands,help'
set +e
PARSED=`getopt --options="$SHORT_OPTIONS" --longoptions="$LONG_OPTIONS" \
--name "$0" -- "$@"`
if [ $? -ne 0 ]; then
exit 1 # getopt will have printed the error message
fi
eval set -- "$PARSED"
set -e
eval set -- "$PARSED"

# Set default options.
UNMAPPED_READS=''
OUTPUT_DIR=''
ORGANISM='human'
STEPS='rdna reference repeats immune metaphlan viral fungi protozoa'
# Non-default: lowq (too slow).
# Disabled: circrna bacteria (databases missing).
FASTA=false
BAM=false
GZIP=false
DEV=false
FORCE=false
IGNORE_EXTENSIONS=false
MAX=''
PE=''
FORCE=false
QUIET=false
DEV=false
GZIP=false
BAM=false
FASTA=false
IGNORE_EXTENSIONS=false
COMMANDS=false
UNMAPPED_READS=''
OUTPUT_DIR=''

# Review parsed options.
while true; do
@@ -71,48 +74,30 @@ while true; do
STEPS=`tr ',' ' ' <<<"$2"`
shift 2
;;
-m|--max)
# Use a liberal threshold when remapping to reference.
MAX='--max'
shift
;;
-p|--pe)
# Not implemented (usage unclear).
# Report the number of discordant read pairs, with reads from the
# same pair classified into different classes.
PE='--pe'
shift
;;
-f|--force)
# Overwrite the analysis destination directory.
FORCE=true
shift
;;
-q|--quiet)
# Not implemented (usage unclear).
# Suppress progress report and warnings.
QUIET=true
-a|--fasta)
# Input unmapped reads in .fasta format instead of .fastq format.
# Forcibly disables low-quality read filtering.
FASTA=true
shift
;;
-d|--dev)
# Keep intermediate FASTA files.
DEV=true
-b|--bam)
# Input unmapped reads in .bam format instead of .fastq format.
BAM=true
shift
;;
-z|--gzip)
# gunzip the input file.
GZIP=true
shift
;;
-b|--bam)
# Input unmapped reads in .bam format instead of .fastq format.
BAM=true
-d|--dev)
# Keep intermediate FASTA files.
DEV=true
shift
;;
-a|--fasta)
# Input unmapped reads in .fasta format instead of .fastq format.
# Forcibly disables low-quality read filtering.
FASTA=true
-f|--force)
# Overwrite the analysis destination directory.
FORCE=true
shift
;;
-i|--ignore-extensions)
@@ -121,13 +106,31 @@ while true; do
IGNORE_EXTENSIONS=true
shift
;;
-m|--max)
# Use a liberal threshold when remapping to reference.
MAX='--max'
shift
;;
-p|--pe)
# Not implemented (usage unclear).
# Report the number of discordant read pairs, with reads from the
# same pair classified into different classes.
PE='--pe'
shift
;;
-q|--quiet)
# Not implemented (usage unclear).
# Suppress progress report and warnings.
QUIET=true
shift
;;
-x|--commands)
# Print all commands.
COMMANDS=true
shift
;;
-h|--help)
echo "Usage: $0 [-o ORGANISM] [-s STEPS] [-mfdzba]" \
echo "Usage: $0 [-o ORGANISM] [-s STEPS] [-abz] [-dfimxh]" \
"unmapped_reads output_dir" >&2
exit 0
;;
@@ -136,8 +139,7 @@ while true; do
UNMAPPED_READS="$2"
OUTPUT_DIR="$3"
if [ "$UNMAPPED_READS" = '' ] || [ "$OUTPUT_DIR" = '' ]; then
echo "Usage: $0 [-o ORGANISM] [-s STEPS] [-mfdzbaih]" \
"unmapped_reads output_dir" >&2
echo 'Error: Insufficient arguments.'
exit 1
fi
shift 3
@@ -297,7 +299,7 @@ declare -A LOGFNS=(
# ------------------------------------------------------------------------------

reads_present () {
if [ `cat "$1" | wc -l` -le 1 ]; then
if [ `wc -l <"$1"` -le 1 ]; then
echo 'No more reads!'
return 1 # false
else
@@ -352,16 +354,17 @@ if [ $FASTA == true ]; then
exit 1
fi
N=`grep -c '^>' "$current"`
READ_LENGTH=$(($(grep -A 1 -m 1 '^>' "$current" | tail -n 1 | wc -m) - 1))
READ_LENGTH=$(($(sed -n '2 p' <"$current" | wc -m) - 1))
else
if [ $IGNORE_EXTENSIONS = false ] && \
[ `basename $current .fastq` == "$current" ] && \
[ `basename $current .fq` == "$current" ]; then
echo 'Error: input file missing .fastq/.fq extension' >&2
exit 1
fi
N=`grep -c '^+$' "$current"`
READ_LENGTH=$(($(grep -B 1 -m 1 '^+$' "$current" | head -n 1 | wc -m) - 1))
line_count=`wc -l <"$current"`
N=`bc <<<"$line_count/4"`
READ_LENGTH=$(($(sed -n '2 p' <"$current" | wc -m) - 1))
fi
echo "Processing $N unmapped reads. The first unmapped read has length $READ_LENGTH."
current=`readlink -e "$current"`
@@ -476,11 +479,11 @@ else
-use_index true -query "$current" -db "$DB/repeats/repbase.fa" \
-outfmt 6 -evalue 1e-05 >"${INTFNS['04_repeats_output']}" \
2>"${LOGFNS['04_repeats']}"
n_reads['03_reference']=`python "$DIR/helper.py" repeats $MAX $PE \
n_reads['04_repeats']=`python "$DIR/helper.py" repeats $MAX $PE \
-i "${INTFNS['04_repeats_output']}" \
-o "${INTFNS['04_repeats_reads']}" \
--pre "$current" --post "$post"`
echo "--> Filtered ${n_reads['03_reference']} reads from repeat sequences."
echo "--> Filtered ${n_reads['04_repeats']} reads from repeat sequences."
clean "$current"
current="$post"
fi
@@ -551,7 +554,7 @@ else
--input_type multifasta --nproc 8 \
--bowtie2out "${INTFNS['07a_metaphlan_bowtie2out']}" \
>"${INTFNS['07a_metaphlan_output']}" 2>"${LOGFNS['07a_metaphlan']}"
n_reads_07a_metaphlan=`cat "${INTFNS['07a_metaphlan_output']}" | wc -l`
n_reads_07a_metaphlan=`wc -l <"${INTFNS['07a_metaphlan_output']}"`
echo "--> Identified $n_reads_07a_metaphlan reads using MetaPhlAn."
echo ' These reads are neither filtered nor included in the total.'
# Don't clean or change $current.
@@ -636,7 +639,7 @@ else
-i "${INTFNS['07e_protozoa_output']}" \
-o "${INTFNS['07e_protozoa_reads']}" \
--pre "$current" --post "$post"`
echo "--> Filtered ${n_reads['07d_fungi']} reads from protozoan genomes."
echo "--> Filtered ${n_reads['07e_protozoa']} reads from protozoan genomes."
clean "$current"
current="$post"
fi
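
Review note: the FASTQ hunk in rop.sh appears to replace counting `^+$` separator lines with dividing the line count by four, and to read the first sequence from line 2 with `sed`. A small sketch of why the two counts can differ, assuming strict four-line FASTQ records (no wrapped sequences); the helper names `count_fastq_reads` and `first_read_length` are hypothetical and not part of ROP:

```shell
#!/bin/sh
# Hypothetical helpers (not ROP code); assume 4 lines per FASTQ record.
count_fastq_reads () {
    local line_count
    line_count=$(wc -l <"$1")
    echo $((line_count / 4))
}

first_read_length () {
    # Line 2 holds the first sequence; wc -m also counts the trailing
    # newline, hence the - 1.
    echo $(($(sed -n '2p' <"$1" | wc -m) - 1))
}

# Two records; the first repeats its identifier on the '+' line, which
# the FASTQ format permits and which '^+$' does not match.
demo_fastq=$(mktemp)
printf '@r1\nACGT\n+r1\nIIII\n@r2\nGG\n+\nII\n' >"$demo_fastq"
count_fastq_reads "$demo_fastq"   # counts both records
first_read_length "$demo_fastq"   # length of ACGT
grep -c '^+$' "$demo_fastq"       # only sees the bare '+' line
```

The line-count approach does not depend on how the separator line is written, at the cost of assuming every record occupies exactly four lines.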