Usage Examples

nmateyko edited this page Mar 20, 2023 · 1 revision

Below are some examples of how to use GIL. If you need further clarification on how to use GIL for your application, please start a discussion here.

Default Settings

To generate 8 bp indexes in TruSeq-compatible primers with default filtering parameters, simply run:

GIL generate_indexes

All generated files can be found in the Output directory, inside the directory the command was run from.

Using a Blocklist

Suppose you will be pooling your libraries with existing sets of indexes, for example NEBNext® Multiplex Oligos for Illumina® (Dual Index Primers Set 1). You can ensure that all of your generated indexes are compatible with this existing set by using the --blocklist option.

In the manual you can find the index sequences for i7 and i5.

Note that the expected index read is usually the reverse complement of the index sequence present in the primer, except for i5 indexes on certain sequencers. The indexes in the blocklist must be the reverse complement of the index sequence present in the primer; i.e. the blocklist indexes should be the expected index reads for MiniSeq®, NextSeq®, HiSeq4000, and HiSeq3000.
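If your existing index set is only documented as the sequences present in the primers, the conversion to blocklist entries can be scripted. The following is a minimal Python sketch, not part of GIL; the function name `reverse_complement` and the two input sequences are illustrative assumptions (the inputs were chosen so that the output matches the first two blocklist entries used in this example).

```python
# Complement table for DNA bases; N is mapped to itself so that
# degenerate positions are preserved.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical input: index sequences as they appear in the primers.
primer_indexes = ["TATAGCCT", "ATAGAGGC"]

# Blocklist entries are the reverse complements, one per line.
blocklist_entries = [reverse_complement(idx) for idx in primer_indexes]
print("\n".join(blocklist_entries))
# AGGCTATA
# GCCTCTAT
```

The printed lines can be pasted directly into the blocklist text file.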

To generate indexes that are compatible with this NEB index set, create a text file (let's call it NEB_indexes.txt) containing one index per line:

AGGCTATA
GCCTCTAT
AGGATAGG
TCAGAGCC
CTTCGCCT
TAAGATTA
ACGTCCTG
GTCAGTAC
ATTACTCG
TCCGGAGA
CGCTCATT
GAGATTCC
ATTCAGAA
GAATTCGT
CTGAAGCT
TAATGCGC
CGGCTATG
TCCGCGAA
TCTCGCGC
AGCGATAG

Save this file anywhere you like, then supply its path after the --blocklist argument:

GIL generate_indexes --blocklist NEB_indexes.txt

All generated indexes will have a Levenshtein distance of at least 3 from any index present in the NEB index set, and libraries indexed with your generated indexes can be pooled with libraries indexed with this NEB set without index collisions.
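To see what this distance guarantee means in practice, you can compute the Levenshtein distance between a candidate index and each blocklist entry yourself. The sketch below is a plain dynamic-programming implementation written for illustration; it is not GIL's own code, and the function names and the candidate sequence are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def min_distance_to_blocklist(candidate: str, blocklist: list[str]) -> int:
    """Smallest edit distance from the candidate to any blocklist entry."""
    return min(levenshtein(candidate, blocked) for blocked in blocklist)


# First three entries of the example blocklist above.
blocklist = ["AGGCTATA", "GCCTCTAT", "AGGATAGG"]

# Hypothetical candidate that differs from the first entry by one base.
candidate = "AGGCTATT"
print(min_distance_to_blocklist(candidate, blocklist))  # 1
```

A candidate like this one, with a minimum distance below 3, is the kind of index the blocklist filter is meant to exclude.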

Using Custom Primer Sequences

You can specify custom primer sequences if you will be using the indexing primers for specialized NGS methods that don't use the standard Nextera or TruSeq adaptor sequences. Let's say you want your libraries to look like this:

5'-TAAAAAAAAAACNNNNNNNNGCCCCCCCCCCA-insert-CTTTTTTTTTTANNNNNNNNACCCCCCCCCCG-3'
3'-ATTTTTTTTTTGNNNNNNNNCGGGGGGGGGGT-insert-GAAAAAAAAAATNNNNNNNNTGGGGGGGGGGC-5'
              idx2 (i5)   read 1              read 2   idx1 (i7)

This means your forward/read 1/index 2/i5 primer should look like this:

5'-TAAAAAAAAAACNNNNNNNNGCCCCCCCCCCA-3'

and your reverse/read 2/index 1/i7 primer should look like this:

5'-CGGGGGGGGGGTNNNNNNNNTAAAAAAAAAAG-3'

To make indexing primers that will create this library structure, run the following:

GIL generate_indexes --library_type custom --primer_sequences "TAAAAAAAAAAC GCCCCCCCCCCA CGGGGGGGGGGT TAAAAAAAAAAG" --library_name Useless-Seq

The --library_name argument is optional and replaces "custom" in the output file names with the name of your custom library type.
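In the command above, the four space-separated sequences correspond to the sequence on either side of the index in the i5 primer, followed by the same for the i7 primer. That ordering is inferred from this example rather than from a formal specification. Under that assumption, here is a small Python sketch (the helper `split_flanks` is hypothetical and not part of GIL) that builds the argument string from the two primer designs:

```python
import re


def split_flanks(primer: str) -> tuple[str, str]:
    """Split a primer of the form <left><N...N><right> into its two flanks."""
    match = re.fullmatch(r"([ACGT]+)(N+)([ACGT]+)", primer)
    if match is None:
        raise ValueError(f"expected <bases><Ns><bases>, got {primer!r}")
    return match.group(1), match.group(3)


# Primer designs from the example above, written 5' to 3'.
i5_primer = "TAAAAAAAAAACNNNNNNNNGCCCCCCCCCCA"
i7_primer = "CGGGGGGGGGGTNNNNNNNNTAAAAAAAAAAG"

# Assumed order: i5 left flank, i5 right flank, i7 left flank, i7 right flank.
flanks = [*split_flanks(i5_primer), *split_flanks(i7_primer)]
primer_sequences = " ".join(flanks)
print(primer_sequences)
# TAAAAAAAAAAC GCCCCCCCCCCA CGGGGGGGGGGT TAAAAAAAAAAG
```

Building the string this way makes it harder to mistype one of the flanks when the primers are long.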

Changing Index Length and Minimum Distance

By default the generated indexes have a length of 8, and the parameters for filtering were designed with this length in mind. This default length can be changed by using the --length argument. For indexes longer than 9 nt, a random sample of 5,000 starting indexes is used instead of using all possible indexes, as the total number of indexes possible is too large to process in a reasonable amount of time. The size of this random sample can be changed with the --sample-n argument.
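The need for sampling follows from how quickly the number of possible indexes grows with length (four choices per position). The Python sketch below shows the counts and one way to draw a random sample without enumerating every sequence. It only illustrates the idea; how GIL draws its sample internally is not described here, and all names in the code are hypothetical.

```python
import random

BASES = "ACGT"


def index_space_size(length: int) -> int:
    """Number of distinct DNA sequences of the given length."""
    return len(BASES) ** length


def decode(code: int, length: int) -> str:
    """Turn an integer in [0, 4**length) into its base-4 DNA sequence."""
    bases = []
    for _ in range(length):
        code, digit = divmod(code, 4)
        bases.append(BASES[digit])
    return "".join(reversed(bases))


def random_indexes(length: int, n: int, seed=None) -> list[str]:
    """Draw n distinct random sequences without listing the whole space."""
    rng = random.Random(seed)
    codes = rng.sample(range(index_space_size(length)), n)
    return [decode(code, length) for code in codes]


for length in (8, 9, 10):
    print(length, index_space_size(length))
# 8 65536
# 9 262144
# 10 1048576

sample = random_indexes(length=10, n=5000, seed=0)
print(len(sample))  # 5000
```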

For longer indexes, you may want the minimum distance between sequences to be larger. This can be changed with the --dist argument. Larger distances will require a larger initial random sample to generate a sufficient number of indexes.

For example, if you want indexes of length 10 with a minimum distance of 4 between any two indexes, you can run the following:

GIL generate_indexes --dist 4 --length 10

This only generates two plates of indexes because the distance filtering is stricter. If you want more plates, you can increase --sample-n from the default of 5000:

GIL generate_indexes --dist 4 --length 10 --sample-n 100000

This still runs in a reasonable amount of time, but now four plates of indexes are generated instead of two.

Making CDI Plates

CDI plates can be created from UDI plates by copying a row of i5 primers across the rows of the CDI plate and copying a row of i7 primers down the columns of the CDI plate.

This creates 96 unique pairs of indexes while only using 20 primers. It's important to use rows for both the i5 and i7 primers because indexes are colour balanced in groups of four across the rows of the UDI plate. If a column of i5 primers were used (which might seem more natural at first glance), the CDI plate would not be guaranteed to be colour balanced. Using rows from the UDI plate ensures that the CDI plate is colour balanced in blocks of 4x4 indexes.
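The arithmetic behind "96 pairs from 20 primers" can be checked with a short Python sketch of the plate layout. It assumes a standard 96-well plate (rows A to H, columns 1 to 12) with one i5 primer per plate row and one i7 primer per plate column; if your layout uses the opposite orientation, swap the two lists. The primer names are placeholders, not real index sequences.

```python
from itertools import product

ROWS = "ABCDEFGH"        # 8 plate rows
COLUMNS = range(1, 13)   # 12 plate columns

# ASSUMPTION: placeholder primer names; one i5 per row, one i7 per column.
i5_primers = [f"i5_{n}" for n in range(1, len(ROWS) + 1)]
i7_primers = [f"i7_{n}" for n in COLUMNS]

# Each well receives the i5 primer of its row and the i7 primer of its column.
cdi_plate = {
    f"{row}{col}": (i5_primers[r], i7_primers[c])
    for (r, row), (c, col) in product(enumerate(ROWS), enumerate(COLUMNS))
}

print(len(i5_primers) + len(i7_primers))  # 20 primers used
print(len(set(cdi_plate.values())))       # 96 unique index pairs
print(cdi_plate["A1"], cdi_plate["H12"])
# ('i5_1', 'i7_1') ('i5_8', 'i7_12')
```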

Let's assume that you've created the CDI plate from the first i5 and i7 plates generated by GIL. To create a sample sheet for this new CDI plate, run the following:

GIL create_sample_sheets --i7s Output/Plates/Indexes/i7/TruSeq_i7_Indexes_Plate_1.tsv --i5s Output/Plates/Indexes/i5/TruSeq_i5_Indexes_Plate_1.tsv --i7-row A --i5-row A --plate-name CDI_plate1_i5A_i7A

Under Output/Sample_Sheets you should now see a new index sheet: CDI_plate1_i5A_i7A_index_sheet.csv, and two new sample sheets in the Forward_Strand_Workflow_Sample_Sheets and Reverse_Complement_Workflow_Sample_Sheets directories. Note that --plate-name is a required argument, but it can be whatever you want.