Handling large input files

Igor edited this page Aug 10, 2017 · 1 revision

We provide an additional script, imrep.huge.sh, for handling huge input files.

The script accepts the following three mandatory command-line arguments, in this order:

  1. inputfile - the path of the input file;
  2. number_of_chunks - a positive integer giving the number of chunks into which the input file is split;
  3. outputfile - the path of the output file.
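
Because all three arguments are mandatory, it can be convenient to check them before starting a long run. The function below is only a sketch of such a pre-flight check: check_args is a hypothetical helper, not part of ImReP, and the rules it enforces (a readable input file and a chunk count of at least 1) are assumptions drawn from the argument descriptions above.

```shell
# Hypothetical pre-flight check, not part of ImReP.
# Expects: inputfile number_of_chunks outputfile
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: inputfile number_of_chunks outputfile" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read input file: $1" >&2
        return 1
    fi
    # Reject anything that is empty or contains a non-digit character.
    case "$2" in
        ''|*[!0-9]*)
            echo "number_of_chunks must be a positive integer: $2" >&2
            return 1
            ;;
    esac
    # Digits only at this point, so a numeric comparison is safe.
    if [ "$2" -eq 0 ]; then
        echo "number_of_chunks must be at least 1" >&2
        return 1
    fi
    return 0
}
```

A wrapper could call it as check_args "$@" || exit 1 before handing the same arguments to the script.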

Example

./imreph.huge.sh my_input_unmapped_rna_seq_reads.fastq 10 my_output_clonotypes.txt

This will split the input file into 10 chunks, run ImReP separately on each chunk, merge the resulting clonotypes into one file, and run the Cast clustering algorithm in stand-alone mode.
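
To illustrate the splitting step only, here is a minimal sketch of how a FASTQ file can be divided into chunks without cutting a read in half. It is not the code used by the script: split_fastq and the chunk file naming are hypothetical, and the sketch assumes that every read occupies exactly four lines (no wrapped sequence lines). Because it rounds the chunk size up, it may produce fewer chunks than requested when the reads do not divide evenly.

```shell
# Hypothetical helper, not part of ImReP: split a 4-line-per-read FASTQ file
# into at most n chunks named <prefix>.000.fastq, <prefix>.001.fastq, ...
split_fastq() {
    local input="$1" n="$2" prefix="$3"
    local total_lines records per_chunk

    total_lines=$(wc -l < "$input")
    records=$(( total_lines / 4 ))

    # Ceiling division, so that no more than n chunks are written.
    per_chunk=$(( (records + n - 1) / n ))
    [ "$per_chunk" -gt 0 ] || per_chunk=1

    awk -v per="$per_chunk" -v prefix="$prefix" '
        {
            # Each read is 4 lines; derive the chunk index from the line number.
            chunk = int((NR - 1) / (4 * per))
            out = sprintf("%s.%03d.fastq", prefix, chunk)
            # Close the previous chunk once we move on to the next one.
            if (out != prev) {
                if (prev != "") close(prev)
                prev = out
            }
            print > out
        }' "$input"
}
```

For example, split_fastq my_input_unmapped_rna_seq_reads.fastq 10 chunk would write chunk.000.fastq, chunk.001.fastq and so on; concatenating the chunks in order reproduces the original file.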