Frequently asked questions
Off-target discovery can be computationally expensive for large putative target sets (say, 10,000 to several hundred thousand candidate guides). To avoid repeating this step every time you'd like to switch scoring metrics, we thought it best to split discovery and scoring into separate stages. You can also discover sites at the largest mismatch threshold you'd like to use, and then filter this down in the scoring steps.
We originally wanted the output to work with common analysis tools such as BEDTools. This meant a format that encoded specific details into BED-file columns, as well as leaving off a traditional header line in favor of listing column details in the header section. In the end, this had limited utility, especially when we added the capability to annotate the output with BED files directly.
The memory requirements of FlashFry are determined by the number of guides you're looking at and the number of off-targets you allow per guide candidate. The first factor is controlled by the size of the region you're looking at, and the second is controlled by the --maximumOffTargets parameter in the discovery phase. Generally, with fewer than 100K guides and --maximumOffTargets set to 2000, you'll be able to run with 4 GB of memory or less (this limit is set in the JVM with the -Xmx4g command-line parameter, placed right after java). You will need to increase this number with higher guide counts, a higher mismatch threshold, or if you want to retain more off-targets.
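As a rough illustration of where the heap setting goes, here is a minimal Python sketch that assembles the command line. The jar name `FlashFry.jar` and the helper `build_java_command` are hypothetical, and the module arguments are placeholders: a real discover run needs further arguments that are omitted here.

```python
from typing import List


def build_java_command(jar_path: str, module_args: List[str], max_heap: str = "4g") -> List[str]:
    """Return an argument list with the JVM heap limit placed right after `java`."""
    # -Xmx is a JVM option, so it has to appear before -jar; everything after
    # the jar path is passed to the application rather than to the JVM.
    return ["java", f"-Xmx{max_heap}", "-jar", jar_path, *module_args]


# Hypothetical jar path and an incomplete argument list, for illustration only.
cmd = build_java_command("FlashFry.jar", ["discover", "--maximumOffTargets", "2000"])
print(" ".join(cmd))
```

If you hit out-of-memory errors, raising `max_heap` (for example to `"8g"`) is the first thing to try.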
If a scoring metric is unable to produce a score for a guide, it will output NA. This commonly happens when there isn't enough sequence context on either side of a guide for on-target scoring, which can occur if the guide sits near the beginning or end of the input FASTA file.
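If you post-process scores in your own scripts, the NA placeholder needs handling before any numeric comparison. The following is a small sketch, not part of FlashFry: the helper `parse_score` is hypothetical and the example values are made up.

```python
from typing import Optional


def parse_score(field: str) -> Optional[float]:
    """Convert a score field to a float, mapping the NA placeholder to None."""
    field = field.strip()
    if field == "NA":
        return None
    return float(field)


# Made-up score fields, as they might be read from one column of a table.
fields = ["0.62", "NA", "0.18"]
scores = [parse_score(f) for f in fields]

# Keep only the guides for which the metric produced a value.
usable = [s for s in scores if s is not None]
```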
FlashFry tags all guides exceeding the maximumOffTargets threshold from the discovery module as "OVERFLOW". These guides are excluded entirely from scoring and will not be included in the output scored table. However, these guides are included in the output from the discovery module. For guides tagged "OVERFLOW", we stop accumulating statistics on those targets after we've found too many candidate off-target sites. The remaining numbers are not reliable; these targets are kept in the output as a reference, to identify guides that will be lost during scoring due to the "OVERFLOW" tag.
Masking a genome obscures repetitive bases by either converting them to Ns (hard-masked) or making them lowercase (soft-masked). We recommend using an unmasked or soft-masked genome. You generally want to consider repetitive content when designing guides (you'd like to know about any off-targets within the genome), and this is not possible with hard-masked genomes.
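To check how a sequence has been masked before indexing it, you could count the masked bases yourself. This is a minimal sketch under the definitions above (Ns are hard-masked, lowercase bases are soft-masked); the helper `masking_summary` is hypothetical, and it counts a lowercase `n` as hard-masked.

```python
from typing import Dict


def masking_summary(sequence: str) -> Dict[str, int]:
    """Count hard-masked (N) and soft-masked (lowercase) bases in a sequence."""
    # Any N, upper or lower case, carries no base information: hard-masked.
    hard = sum(1 for base in sequence if base in "Nn")
    # Lowercase bases other than n still carry sequence: soft-masked.
    soft = sum(1 for base in sequence if base.islower() and base != "n")
    return {"length": len(sequence), "hard_masked": hard, "soft_masked": soft}


summary = masking_summary("ACGTacgtNNNN")
print(summary)
```

A large `hard_masked` count relative to `length` suggests that repetitive off-targets in those regions cannot be found.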
Yes! Append a PAM sequence to the protospacer sequences (such as 'GGG'; FlashFry will not score guides containing Ns), convert the library to FASTA format, use the index module to index your genome-of-interest for PAM sites, and follow the instructions in the discover and score modules.
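The first two steps (appending a PAM and converting to FASTA) can be sketched in a few lines of Python. This is an illustration rather than a FlashFry utility: the helper `library_to_fasta`, the guide names, and the sequences are all made up, and it assumes the PAM belongs at the 3' end of the protospacer.

```python
from typing import Iterable, Tuple


def library_to_fasta(guides: Iterable[Tuple[str, str]], pam: str = "GGG") -> str:
    """Append a PAM to each protospacer and format the library as FASTA text."""
    records = []
    for name, protospacer in guides:
        target = protospacer.upper() + pam
        # Guides containing Ns would not be scored, so skip them up front.
        if "N" in target:
            continue
        records.append(f">{name}\n{target}")
    return "\n".join(records) + ("\n" if records else "")


# Made-up example library; the second guide contains an N and is dropped.
library = [
    ("guide1", "GAGTCCGAGCAGAAGAAGAA"),
    ("guide2", "ACGTNACGTACGTACGTACG"),
]
fasta_text = library_to_fasta(library)
print(fasta_text, end="")
```

The resulting text can be written to a file and used as the FASTA input for the discover step.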