Frequently asked questions

Tyler Fair edited this page Oct 14, 2019 · 3 revisions

FAQ

Why separate the off-target discovery and scoring modules of FlashFry?

Off-target discovery can be computationally expensive for large putative target sets (tens to hundreds of thousands of candidate guides). To avoid repeating this step every time you'd like to switch scoring metrics, we split the two stages up. You can also discover sites once at the largest mismatch threshold you'd like to use, and then filter the results down in the scoring steps.

Why does the output file look the way it does?

We originally wanted the output to work with common analysis tools such as BEDTools. This meant a format that encoded specific details into BED-file columns, as well as leaving off a traditional header line in favor of listing column details in the header section. In the end, this had limited utility, especially when we added the capability to annotate the output with BED files directly.

How much memory should I give FlashFry?

The memory requirements of FlashFry are determined by the number of guides you're looking at and the number of off-targets you allow per guide candidate. The first factor is controlled by the size of the region you're looking at, and the second is controlled by the --maximumOffTargets parameter in the discovery phase. Generally, with fewer than 100K guides and --maximumOffTargets set to 2000, you'll be able to run with 4g of memory or less (this limit is set in the JVM with the -Xmx4g command line parameter, placed right after java). You will need to increase this number for higher guide counts, a higher mismatch threshold, or if you want to retain more off-targets.
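As a minimal sketch of where these two settings sit on the command line, the Python helper below assembles a discovery invocation. Only -Xmx4g, --maximumOffTargets, and the discover module name come from this page; the jar file name and the helper itself are hypothetical, and any other options the module needs would be passed through extra_args.

```python
import shlex


def build_discover_command(jar_path, max_heap="4g", max_off_targets=2000, extra_args=()):
    """Assemble a java command line for the discover module (hypothetical helper)."""
    return [
        "java",
        f"-Xmx{max_heap}",  # JVM heap limit, placed right after `java`
        "-jar", jar_path,   # jar_path is an assumption; use your own jar location
        "discover",
        "--maximumOffTargets", str(max_off_targets),
        *extra_args,        # remaining discover options, supplied by the caller
    ]


cmd = build_discover_command("FlashFry.jar")
print(shlex.join(cmd))
```

Raising max_heap (for example to "8g") is the knob to turn when guide counts, the mismatch threshold, or the number of retained off-targets go up.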

Why are some scores NA?

If the scoring metric is unable to produce a score for the specified guide it will output NA. This commonly happens when there isn't enough sequence context on either side of a guide for the on-target scoring, which can occur if the guide sits near the beginning or end of the input fasta file.
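The check below is a rough illustration of why guides near the ends of the input sequence can come back as NA. It is not FlashFry's code: the function name is hypothetical, and the flank sizes (4 bases upstream, 3 downstream of a 23-base guide plus PAM) are illustrative defaults, since each on-target metric defines its own context requirement.

```python
def has_scoring_context(seq_len, guide_start, guide_len=23, upstream=4, downstream=3):
    """Return True if the guide plus its required flanks fits inside the sequence.

    guide_start is the 0-based position of the first guide base;
    guide_len includes the PAM. Flank sizes are assumptions for illustration.
    """
    guide_end = guide_start + guide_len  # exclusive end of guide + PAM
    return guide_start - upstream >= 0 and guide_end + downstream <= seq_len


# A guide at the very start of a 100-base input has no upstream context.
print(has_scoring_context(100, 0))
# A guide a few bases in has room on both sides.
print(has_scoring_context(100, 4))
```

If guides of interest fall in this situation, padding the input FASTA with additional genomic sequence on both sides of the region gives the scoring metric the context it needs.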

Should I use candidate guides marked with OVERFLOW?

FlashFry tags all guides that exceed the --maximumOffTargets threshold in the discovery module as "OVERFLOW". These guides are excluded entirely from scoring and will not appear in the scored output table, although they are included in the output of the discovery module. For guides tagged "OVERFLOW", we stop accumulating statistics after we've found too many candidate off-target sites, so the remaining numbers are not reliable. These targets are kept in the discovery output only as a reference, to identify guides that will be lost during scoring due to the "OVERFLOW" tag.
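To see why the counts for an overflowed guide cannot be trusted, consider this toy tally. It is a conceptual sketch and not FlashFry's implementation: the function name and the idea of hits arriving in batches are assumptions made for illustration.

```python
def tally_off_targets(hit_batches, maximum_off_targets=2000):
    """Accumulate off-target hits until the cap is exceeded.

    Returns (count_so_far, overflowed). Once the cap is exceeded, later
    batches are never examined, so the count is only a partial total.
    """
    total = 0
    for batch in hit_batches:
        total += batch
        if total > maximum_off_targets:
            return total, True  # stop early: remaining batches are ignored
    return total, False


# The true total here is 12,400 hits, but counting stops once the cap is passed.
count, overflowed = tally_off_targets([1000, 900, 500, 10000])
print(count, overflowed)
```

The reported count for the overflowed guide understates its real off-target burden, which is why such guides are best treated as unusable rather than compared on their recorded numbers.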

Should I use a masked or unmasked genome to build my database?

Masking a genome obscures repetitive bases by either converting them to Ns (hard-masked) or making them lowercase (soft-masked). We recommend using an unmasked or soft-masked genome. You generally want to consider repetitive content when designing guides (you'd like to know about any off-targets within the genome), and this is not possible with hard-masked genomes.
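The difference between the two masking styles can be shown with a toy sequence. This is an illustrative sketch with made-up sequences and a hypothetical helper: uppercasing a soft-masked sequence recovers every base, while hard-masked positions stay as Ns and can never be matched as off-targets.

```python
def searchable_fraction(seq):
    """Fraction of positions that still carry a real base (A, C, G, or T)."""
    return sum(base in "ACGT" for base in seq.upper()) / len(seq)


original = "ACGTTTAGGGTTAGGGACGT"     # unmasked, with a repeat in the middle
soft_masked = "ACGTttagggttagggACGT"  # repeat shown in lowercase
hard_masked = "ACGTNNNNNNNNNNNNACGT"  # repeat replaced with Ns

# Soft-masking is lossless: uppercasing restores the original sequence.
print(soft_masked.upper() == original)
# Hard-masking discards the repeat, leaving only part of the sequence searchable.
print(searchable_fraction(soft_masked), searchable_fraction(hard_masked))
```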

Can FlashFry score existing guide libraries?

Yes! Append a PAM sequence (such as 'GGG') to each protospacer sequence, keeping in mind that FlashFry will not score guides containing Ns. Then convert the library to FASTA format, use the index module to index your genome of interest for PAM sites, and follow the instructions for the discover and score modules.
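The first two steps (appending a PAM and writing FASTA) might look like the sketch below. The function name, guide names, and the one-record-per-guide layout are assumptions for illustration; adjust them to match how your library is stored.

```python
def library_to_fasta(guides, pam="GGG"):
    """Convert (name, protospacer) pairs to FASTA text with a PAM appended.

    Guides containing N are skipped, since they would not be scored.
    """
    records = []
    for name, protospacer in guides:
        target = (protospacer + pam).upper()
        if "N" in target:
            continue  # drop ambiguous guides up front
        records.append(f">{name}\n{target}\n")
    return "".join(records)


library = [
    ("guide_1", "GAGTCCGAGCAGAAGAAGAA"),
    ("guide_2", "ACGTNACGTACGTACGTACG"),  # contains N, so it is skipped
]
print(library_to_fasta(library), end="")
```

The resulting text can be written to a file and used as the input FASTA for discovery against the indexed genome.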