How to install ImReP?

Serghei Mangul edited this page Aug 7, 2017 · 10 revisions

ImReP is written in Python 2.7.
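Since ImReP targets Python 2.7, it can help to confirm that such an interpreter is on your PATH before installing. The snippet below is a minimal sketch, not part of ImReP: the helper name `find_py27` is hypothetical, and the command names `python2.7`, `python2`, and `python` are assumptions about how the interpreter is installed on your system.

```shell
# Hypothetical helper (not part of ImReP): print the first command on PATH
# that reports a Python 2.7 version, or return non-zero if none is found.
find_py27() {
    for cmd in python2.7 python2 python; do
        if command -v "$cmd" >/dev/null 2>&1; then
            # Python 2 writes its version string to stderr, so merge the streams.
            case "$("$cmd" --version 2>&1)" in
                "Python 2.7"*) echo "$cmd"; return 0 ;;
            esac
        fi
    done
    return 1
}

if py=$(find_py27); then
    echo "Found Python 2.7 as: $py"
else
    echo "No Python 2.7 interpreter found on PATH" >&2
fi
```

If nothing is found, install Python 2.7 through your system's package manager or an environment manager before continuing.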

Download ImReP using

git clone https://github.com/mandricigor/imrep.git

and install it from the base directory

cd imrep
./install.sh

It will install the following dependencies:

To check that ImReP is installed properly, run it from your command line. Without arguments it prints the usage line followed by a "too few arguments" error, which is expected:

$ python imrep.py
usage: python2 imrep.py [-h] [--fastq] [--bam] [--chrFormat2] [--hg38]
                        [-a ALLREADS] [--digGold] [-s SPECIES] [-o OVERLAPLEN]
                        [--noOverlapStep] [--extendedOutput] [-c CHAINS]
                        [--noCast] [-f FILTERTHRESHOLD]
                        [--minOverlap1 MINOVERLAP1]
                        [--minOverlap2 MINOVERLAP2] [--misMatch1 MISMATCH1]
                        [--misMatch2 MISMATCH2]
                        reads_file output_clones
python2 imrep.py: error: too few arguments

or

$ python imrep.py -h
usage: python2 imrep.py [-h] [--fastq] [--bam] [--chrFormat2] [--hg38]
                        [-a ALLREADS] [--digGold] [-s SPECIES] [-o OVERLAPLEN]
                        [--noOverlapStep] [--extendedOutput] [-c CHAINS]
                        [--noCast] [-f FILTERTHRESHOLD]
                        [--minOverlap1 MINOVERLAP1]
                        [--minOverlap2 MINOVERLAP2] [--misMatch1 MISMATCH1]
                        [--misMatch2 MISMATCH2]
                        reads_file output_clones

optional arguments:
  -h, --help            show this help message and exit

Necessary Inputs:
  reads_file            unmapped reads in .fasta (default) or .fastq (if flag
                        --fastq is set) or .bam (if --bam or --digGold is set)
  output_clones         output file with CDR3 clonotypes

Optional Inputs:
  --fastq               a binary flag used to indicate that the input file
                        with unmapped reads is in fastq format
  --bam                 a binary flag used to indicate that the input file is
                        a BAM file mapped and unmapped reads
  --chrFormat2          a binary flag used to indicate that the format of
                        chromosome name in the bam file is in this format :
                        chr1, chr2,..,chrX. This options is only compatible
                        with --bam option. By default we asssume chromosmes
                        names are indicated only by numbers :1,2,3,...,X
  --hg38                a binary flag used to indicate that reads were mapped
                        to hg38 rellease. The default is hg19. For mouse we
                        support only mm10 (default).
  -a ALLREADS, --allReads ALLREADS
                        Original raw reads (all reads). Needs to be used with
                        --digGold option
  --digGold             a binary flag used to indicate that the input file is
                        FASTQ file with original raw reads (all reads). And
                        unmapped reads needs to be extracted from the raw
                        reads ( original raw reads are provided using
                        --reads_file option). Use this option only if unmapped
                        reads were not saved. Needs to be used with -m option
  -s SPECIES, --species SPECIES
                        species (human or mouse, default human)
  -o OVERLAPLEN, --overlapLen OVERLAPLEN
                        the minimal length to consider between reads
                        overlapping with a V gene and reads overlapping with a
                        J gene. Default value is 5 amino acids.
  --noOverlapStep       a binary flag used in case if the user does not want
                        to run the second stage of the ImReP assembly.
  --extendedOutput      extended output: write information read by read
  -c CHAINS, --chains CHAINS
                        chains: comma separated values from
                        IGH,IGK,IGL,TRA,TRB,TRD,TRG
  --noCast              specify this option if you want to disable CDR3
                        clustering
  -f FILTERTHRESHOLD, --filterThreshold FILTERTHRESHOLD
                        filter out clonotypes with readcount less or equal
                        than filterThreshold (remove outliers), default is 1

Advanced Inputs:
  --minOverlap1 MINOVERLAP1
                        minimal overlap between the reads and A) the left part
                        of V gene (before C amino acid) and B) the right part
                        of J gene (after W for IGH and F for all other
                        chains), default is 4
  --minOverlap2 MINOVERLAP2
                        minimal overlap between the reads and A) the right
                        part of V gene (after C amino acid) and B) the left
                        part of J gene (before W for IGH and F for all other
                        chains), default is 1
  --misMatch1 MISMATCH1
                        maximal number of mismatches between the reads and A)
                        the left part of V gene (before C amino acid) and B)
                        the right part of J gene (after W for IGH and F for
                        all other chains), default is 2
  --misMatch2 MISMATCH2
                        maximal number of mismatches between the reads and A)
                        the right part of V gene (after C amino acid) and B)
                        the left part of J gene (before W for IGH and F for
                        all other chains), default is 2
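
The help text above can be turned into a first run along these lines. This is only a sketch: the file names `sample_unmapped.fastq` and `sample_clones.cdr3` are hypothetical placeholders, and the chosen flags (`--fastq`, `-s`, `-c`) are taken from the option list above rather than from a recommended configuration.

```shell
# Hypothetical input and output names; replace them with your own files.
reads="sample_unmapped.fastq"
clones="sample_clones.cdr3"

# Assemble the command from flags listed in the help text:
#   --fastq  the reads file is in FASTQ format
#   -s       species (human or mouse)
#   -c       restrict the analysis to the listed chains
set -- python imrep.py --fastq -s human -c IGH,TRB "$reads" "$clones"

# Show the command first; run it only if imrep.py is in the current directory.
preview="$*"
echo "$preview"
if [ -f imrep.py ]; then
    "$@"
fi
```

Run it from the `imrep` base directory so that `imrep.py` is found. Dropping `-c` leaves the chain selection at the tool's default.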