Usage Examples
Below are some examples of how to use GIL. If you need further clarification on how to use GIL for your application, please start a discussion here.
To generate 8 bp indexes in TruSeq-compatible primers with default filtering parameters, simply run:
GIL generate_indexes
All generated files can be found in Output, in the directory the command was run from.
Suppose you will be pooling your libraries with an existing set of indexes, for example NEBNext® Multiplex Oligos for Illumina® (Dual Index Primers Set 1). You can ensure that all of your generated indexes are compatible with this existing set by using the --blocklist option.
In the manual you can find the index sequences for both the i7 and i5 primers.
Note that the expected index read is usually the reverse complement of the index sequence present in the primer, except for i5 indexes on certain sequencers. The indexes in the blocklist must be the reverse complement of the index sequence present in the primer; that is, the blocklist indexes should be the expected index reads for MiniSeq®, NextSeq®, HiSeq4000, and HiSeq3000.
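If your source lists the index as it appears in the primer, it has to be reverse-complemented before it goes into the blocklist. The following is a minimal sketch of that conversion. The helper names are hypothetical and are not part of GIL, and the input sequence in the usage note is purely illustrative.

```python
# Hypothetical helpers for preparing a blocklist; these are NOT part of GIL.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_indexes_to_blocklist(primer_indexes):
    """Convert index sequences as written in the primers into the
    reverse-complemented form expected in the blocklist.

    Blank lines are skipped and sequences are upper-cased.
    """
    cleaned = (s.strip().upper() for s in primer_indexes)
    return [reverse_complement(s) for s in cleaned if s]
```

For example, `reverse_complement("CGAGTAAT")` returns `"ATTACTCG"`. If the manual already lists the expected index reads rather than the primer sequences, no conversion is needed.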
To generate indexes that are compatible with this NEB index set, create a text file (let's call it NEB_indexes.txt) containing one index per line:
AGGCTATA
GCCTCTAT
AGGATAGG
TCAGAGCC
CTTCGCCT
TAAGATTA
ACGTCCTG
GTCAGTAC
ATTACTCG
TCCGGAGA
CGCTCATT
GAGATTCC
ATTCAGAA
GAATTCGT
CTGAAGCT
TAATGCGC
CGGCTATG
TCCGCGAA
TCTCGCGC
AGCGATAG
Place this file somewhere, then supply the path to this file after the --blocklist argument:
GIL generate_indexes --blocklist NEB_indexes.txt
All generated indexes will have a Levenshtein distance of at least 3 from any index present in the NEB index set, and libraries indexed with your generated indexes can be pooled with libraries indexed with this NEB set without index collisions.
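To make the distance criterion concrete, here is a minimal sketch of a Levenshtein distance check against a blocklist. It uses the standard dynamic-programming algorithm; GIL's own implementation may differ, and the function names and the `min_dist` parameter are assumptions made for illustration.

```python
# Illustrative only: a textbook Levenshtein distance, not GIL's internal code.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-base insertions, deletions, or
    substitutions needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, base_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, base_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (base_a != base_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def too_close(candidate: str, blocklist, min_dist: int = 3):
    """Return the blocklist indexes closer than min_dist to the candidate."""
    return [b for b in blocklist if levenshtein(candidate, b) < min_dist]
```

A candidate index would be acceptable under this criterion only if `too_close(candidate, blocklist)` returns an empty list.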
You can specify custom primer sequences if you will be using the indexing primers for specialized NGS methods that don't use the standard Nextera or TruSeq adaptor sequences. Let's say you want your libraries to look like this:
5'-TAAAAAAAAAACNNNNNNNNGCCCCCCCCCCA-insert-CTTTTTTTTTTANNNNNNNNACCCCCCCCCCG-3'
3'-ATTTTTTTTTTGNNNNNNNNCGGGGGGGGGGT-insert-GAAAAAAAAAATNNNNNNNNTGGGGGGGGGGC-5'
               idx2 (i5)  read 1              read 2   idx1 (i7)
This means your forward/read 1/index 2/i5 primer should look like this:
5'-TAAAAAAAAAACNNNNNNNNGCCCCCCCCCCA-3'
and your reverse/read 2/index 1/i7 primer should look like this:
5'-CGGGGGGGGGGTNNNNNNNNTAAAAAAAAAAG-3'
To make indexing primers that will create this library structure, run the following:
GIL generate_indexes --library_type custom --primer_sequences "TAAAAAAAAAAC GCCCCCCCCCCA CGGGGGGGGGGT TAAAAAAAAAAG" --library_name Useless-Seq
The --library_name argument is optional and replaces "custom" in the output file names with the name of your custom library type.
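The relationship between the four --primer_sequences values and the two primers can be checked with a short sketch. It assumes, based only on the example above, that the values are given in the order i5 left flank, i5 right flank, i7 left flank, i7 right flank, with the index inserted between each pair. The function names are hypothetical and not part of GIL.

```python
# Hypothetical sanity check for custom primer designs; NOT part of GIL.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_primers(primer_sequences: str, index_length: int = 8):
    """Assemble the i5 and i7 primers (5'->3') with N placeholders.

    Assumed order of the four space-separated values:
    i5 left flank, i5 right flank, i7 left flank, i7 right flank.
    """
    i5_left, i5_right, i7_left, i7_right = primer_sequences.split()
    placeholder = "N" * index_length
    return (i5_left + placeholder + i5_right,
            i7_left + placeholder + i7_right)


# 3' end of the top strand of the desired library, taken from the diagram.
TOP_STRAND_3PRIME_END = "CTTTTTTTTTTANNNNNNNNACCCCCCCCCCG"

i5_primer, i7_primer = build_primers(
    "TAAAAAAAAAAC GCCCCCCCCCCA CGGGGGGGGGGT TAAAAAAAAAAG"
)
# The i7 primer anneals to the top strand, so it must equal the
# reverse complement of that strand's 3' end.
i7_matches_library = reverse_complement(TOP_STRAND_3PRIME_END) == i7_primer
```

Here `i7_matches_library` is `True`, which confirms that the reverse primer is the reverse complement of the right-hand end of the top strand in the diagram.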
By default the generated indexes have a length of 8, and the parameters for filtering were designed with this length in mind. This default length can be changed by using the --length argument. For indexes longer than 9 nt, a random sample of 5,000 starting indexes is used instead of all possible indexes, as the total number of possible indexes is too large to process in a reasonable amount of time. The size of this random sample can be changed with the --sample-n argument.
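The growth in the number of possible indexes is easy to quantify: there are 4^n sequences of length n. The sketch below counts them and shows one way a random starting sample could be drawn without enumerating every sequence. It is an illustration under that assumption and not GIL's actual sampling code; all names are hypothetical.

```python
# Illustrative sketch of index-space size and random sampling; NOT GIL's code.
import random

BASES = "ACGT"


def n_possible_indexes(length: int) -> int:
    """Number of distinct DNA sequences of the given length."""
    return 4 ** length


def decode(number: int, length: int) -> str:
    """Map an integer in [0, 4**length) to a unique sequence (base-4 digits)."""
    bases = []
    for _ in range(length):
        number, remainder = divmod(number, 4)
        bases.append(BASES[remainder])
    return "".join(reversed(bases))


def sample_starting_indexes(length: int, sample_n: int = 5000, seed=None):
    """Draw sample_n distinct random sequences without listing them all."""
    rng = random.Random(seed)
    total = n_possible_indexes(length)
    picks = rng.sample(range(total), min(sample_n, total))
    return [decode(p, length) for p in picks]
```

With this, `n_possible_indexes(8)` is 65,536 while `n_possible_indexes(10)` is 1,048,576, a sixteen-fold increase for two extra bases.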
For longer indexes, you may want the minimum distance between sequences to be larger. This can be changed with the --dist argument. Larger distances will require a larger initial random sample to generate a sufficient number of indexes.
For example, if you want indexes of length 10 with a minimum distance of 4 between any two indexes, you can run the following:
GIL generate_indexes --dist 4 --length 10
This only generates two plates of indexes because the distance filtering is stricter. If you want more plates, you can increase --sample-n from the default of 5000:
GIL generate_indexes --dist 4 --length 10 --sample-n 100000
This still runs in a reasonable amount of time, but now four plates of indexes are generated instead of two.
CDI plates can be created from UDI plates by copying a row of i5 primers across the rows of the CDI plate and copying a row of i7 primers down the columns of the CDI plate, as shown in the diagram:
This creates 96 unique pairs of indexes while only using 20 primers. It's important to use rows for both the i5 and i7 primers because indexes are colour balanced in groups of four across the rows of the UDI plate. If a column of i5 primers was used (which makes more sense at first glance), the CDI plate would not be guaranteed to be colour balanced. Using rows from the UDI plate ensures that the CDI plate is colour balanced in blocks of 4x4 indexes.
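The layout described above can be sketched as a small function that pairs one i5 primer per CDI row with one i7 primer per CDI column. This is an illustration only and not part of GIL. In particular, it assumes that the first eight wells of the twelve-well i5 row are the ones used, which may not match the diagram.

```python
# Illustrative CDI plate layout; NOT part of GIL.
from string import ascii_uppercase


def build_cdi_plate(i5_row, i7_row, n_rows: int = 8, n_cols: int = 12):
    """Return a dict mapping well names (A1..H12) to (i5, i7) pairs.

    Each CDI row shares one i5 primer and each CDI column shares one
    i7 primer. ASSUMPTION: the first n_rows entries of i5_row are used.
    """
    if len(i5_row) < n_rows or len(i7_row) < n_cols:
        raise ValueError("not enough primers for the requested plate size")
    plate = {}
    for r in range(n_rows):
        for c in range(n_cols):
            well = f"{ascii_uppercase[r]}{c + 1}"
            plate[well] = (i5_row[r], i7_row[c])
    return plate


# Placeholder primer names standing in for one row of each UDI plate.
example_i5_row = [f"i5_{n}" for n in range(1, 13)]
example_i7_row = [f"i7_{n}" for n in range(1, 13)]
example_plate = build_cdi_plate(example_i5_row, example_i7_row)
```

The resulting plate has 96 wells, each with a distinct (i5, i7) pair, built from 8 i5 primers and 12 i7 primers.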
Let's assume that you've created the CDI plate from the first i5 and i7 plates generated by GIL. To create a sample sheet for this new CDI plate, run the following:
GIL create_sample_sheets --i7s Output/Plates/Indexes/i7/TruSeq_i7_Indexes_Plate_1.tsv --i5s Output/Plates/Indexes/i5/TruSeq_i5_Indexes_Plate_1.tsv --i7-row A --i5-row A --plate-name CDI_plate1_i5A_i7A
Under Output/Sample_Sheets you should now see a new index sheet, CDI_plate1_i5A_i7A_index_sheet.csv, and two new sample sheets in the Forward_Strand_Workflow_Sample_Sheets and Reverse_Complement_Workflow_Sample_Sheets directories.
Note that --plate-name is a required argument, but it can be whatever you want.