Site scoring
- input (required) - the input file produced by the discover module
- output (required) - the scored output file
- database (required) - the database of off-target sequences for the genome of interest
- maxMismatch (optional) - the maximum number of mismatches in off-targets to consider. This lets you filter down the off-targets reported in the discover module output (say you ran discover allowing 5 mismatches, but now you only want to consider 3)
- scoringMetrics (required) - which scoring metrics to apply. See below for the supported scoring options.
The following scoring options can be supplied to the --scoringMetrics command line parameter. I recommend reading Haeussler et al. Genome Biology, 2016 for details and comparisons of the various scoring schemes, to determine which are most appropriate for your experimental design. Some of these scores have command line options of their own, documented below.

A reminder that FlashFry tags all guides exceeding the maximumOffTargets threshold from the discover module as "OVERFLOW". These guides are excluded entirely from scoring and will not be included in the scored output table.
- hsu2013 (ranking score) - Off-target guide specificity score, also known as the crispr.mit.edu score. From Hsu et al. Nature Biotechnology, 2013. Developed for catalytic Cas9, this is likely the most widely used off-target specificity score. Scores range from 0 to 100; higher scores indicate lower predicted off-target activity.
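For orientation, here is a minimal sketch of the crispr.mit.edu scoring scheme as it is commonly reimplemented in public code (for example in CRISPOR); it is not FlashFry's implementation, which may differ in details such as rounding or PAM handling. Each 20-nt off-target site gets a hit score from three terms (per-position mismatch weights, mismatch spacing, mismatch count), and the guide score is 100 * 100 / (100 + sum of hit scores). The function names `hsu_hit_score` and `hsu_guide_score` are hypothetical.

```python
from typing import Sequence

# Per-position mismatch weights (index 0 = PAM-distal base) as reproduced in
# public reimplementations of the crispr.mit.edu score, e.g. CRISPOR.
HSU_WEIGHTS = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
               0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583]


def hsu_hit_score(guide: str, site: str) -> float:
    """Score one 20-nt off-target site (PAM excluded); higher = more likely to be cut."""
    if len(guide) != 20 or len(site) != 20:
        raise ValueError("expected 20-nt protospacers without the PAM")
    mismatch_positions = [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper())) if g != s]
    # Term 1: product of (1 - weight) over every mismatched position.
    term1 = 1.0
    for pos in mismatch_positions:
        term1 *= 1.0 - HSU_WEIGHTS[pos]
    # Term 2: closely spaced mismatches lower the hit score more than spread-out ones
    # (uses the mean gap between consecutive mismatches).
    n = len(mismatch_positions)
    if n < 2:
        term2 = 1.0
    else:
        gaps = [b - a for a, b in zip(mismatch_positions, mismatch_positions[1:])]
        mean_gap = sum(gaps) / len(gaps)
        term2 = 1.0 / (((19 - mean_gap) / 19.0) * 4 + 1)
    # Term 3: dampen by the square of the mismatch count.
    term3 = 1.0 if n == 0 else 1.0 / (n ** 2)
    return term1 * term2 * term3 * 100


def hsu_guide_score(guide: str, off_target_sites: Sequence[str]) -> float:
    """Aggregate guide score in (0, 100]; 100 means no scored off-targets."""
    hit_sum = sum(hsu_hit_score(guide, site) for site in off_target_sites)
    return 100.0 * 100.0 / (100.0 + hit_sum)
```

A single off-target that differs only at a zero-weight position scores 100 as a hit, which halves the guide score to 50; a single mismatch at a heavily weighted PAM-proximal position contributes far less.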
- doench2016cfd (ranking score) - Off-target guide specificity score, the cutting frequency determination (CFD) score, a measure of the relative activity of a candidate guide at each off-target site. From Doench et al. Nature Biotechnology, 2016. Scores range from 0 to 1; higher scores indicate higher predicted Cas9 activity at the off-target site. Scores are returned in two ways: (1) DoenchCFD_maxOT, the highest per-site score, corresponding to the off-target site most likely to be edited, and (2) DoenchCFD_specificityscore, an aggregate guide-level score that considers all detected off-targets using the aggregation formula 1/(1 + sum(all_ot_scores)). For the DoenchCFD_specificityscore, a higher score indicates lower predicted off-target activity. According to Haeussler et al. Genome Biology, 2016, the CFD score distinguishes best between validated and false-positive off-targets for catalytic Cas9. If maxMismatch=3, the aggregate CFD score reported by FlashFry will be almost identical to the GuideScan specificity score from Perez et al. Nature Biotechnology, 2017, with the exception that GuideScan excludes all candidate guides with any 1-mismatch off-target sites.
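The two CFD outputs described above can be sketched as follows, assuming you already have one CFD value per detected off-target site (computing per-site CFD values needs the published mismatch/PAM tables and is not shown). The function name `aggregate_off_targets` is hypothetical, and the sketch assumes the list holds only off-target sites, not the on-target site itself.

```python
from typing import Dict, Optional, Sequence


def aggregate_off_targets(ot_scores: Sequence[float]) -> Dict[str, Optional[float]]:
    """Summarize per-site off-target scores (e.g. CFD values in [0, 1]).

    maxOT is the single highest per-site score (None when there are no
    off-targets); specificityscore is 1 / (1 + sum(all_ot_scores)).
    """
    max_ot = max(ot_scores) if len(ot_scores) > 0 else None
    specificity = 1.0 / (1.0 + sum(ot_scores))
    return {"maxOT": max_ot, "specificityscore": specificity}


# Usage: three hypothetical off-target sites for one guide.
print(aggregate_off_targets([0.5, 0.25, 0.25]))  # {'maxOT': 0.5, 'specificityscore': 0.5}
```

The same aggregation formula is described for the JostCRISPRi scores; a guide with no detected off-targets gets a specificity of 1.0.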
- JostandSantos - Off-target guide specificity score, developed specifically for CRISPRi. From Jost and Santos et al. bioRxiv, 2019. For off-targets containing multiple mismatches, the relative activity values for each mismatching base between the guide and off-target are multiplied together. Scores are returned in two ways: (1) JostCRISPRi_maxOT, the highest score, corresponding to the off-target site with the highest predicted activity, and (2) JostCRISPRi_specificityscore, an aggregate guide-level score that considers all detected off-targets using the aggregation formula 1/(1 + sum(all_ot_scores)). For the JostCRISPRi_specificityscore, a higher score indicates lower predicted off-target activity. Note that, due to design rules established in Horlbeck et al. eLife, 2016, position one (PAM-distal) is an invariant guanine (G) nucleotide and is not considered during off-target scoring.
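The per-site multiplication rule can be illustrated with a short sketch. The activity table here is a hypothetical placeholder keyed by (1-based position from the PAM-distal end, guide base, off-target base); the real values come from Jost and Santos et al. and are not reproduced. `jost_site_score` is a hypothetical name, not FlashFry's API.

```python
from typing import Dict, Tuple

# Hypothetical lookup: relative activity (0-1) of a guide carrying one mismatch,
# keyed by (1-based position from the PAM-distal end, guide base, off-target base).
# Real values come from Jost and Santos et al.; callers must supply them.
ActivityTable = Dict[Tuple[int, str, str], float]


def jost_site_score(guide: str, site: str, table: ActivityTable) -> float:
    """Multiply the relative activities of every mismatch between guide and site."""
    if len(guide) != len(site):
        raise ValueError("guide and off-target must be the same length")
    score = 1.0
    for pos, (g, s) in enumerate(zip(guide.upper(), site.upper()), start=1):
        if pos == 1:
            # Position one is the invariant G and is skipped during scoring.
            continue
        if g != s:
            score *= table[(pos, g, s)]  # KeyError if the table lacks this mismatch
    return score
```

Per-site values produced this way would then feed the maxOT and 1/(1 + sum) aggregation described above.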
- doench2014ontarget (ranking score) - On-target guide activity score (Rule Set 1) from Doench et al. Nature Biotechnology, 2014. Scores range from 0 to 1; higher scores indicate higher predicted on-target guide activity. Note that Rule Set 1 is deprecated; for predicting on-target guide activity, consider using Rule Set 2 from Doench et al. Nature Biotechnology, 2016 instead.
- moreno2015 (ranking score) - On-target guide activity score developed by Moreno-Mateos et al. Nature Methods, 2015 using Cas9 editing in zebrafish. Scores range from 0 to 1; higher scores indicate higher predicted on-target guide activity. Read Haeussler et al. Genome Biology, 2016 for caveats about this score.
- rank (meta score) - This score takes the rank-ordering of guides under each of the scoring metrics above, finds each guide's median rank, and then ranks all of the guides by their median rank. Scores that are undefined (such as on-target scores that aren't given enough sequence context to score) are tied at the worst rank before the median rank is calculated. The top 1000 guides are then ranked using the Schulze voting method. This is intended to help users pick the best aggregate targets across multiple scoring schemes. This is still a bit experimental.
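A minimal sketch of the rank meta score as described above, written for illustration and not taken from FlashFry. Assumptions, all hypothetical: each metric's direction is passed in through `higher_is_better`; defined scores are ranked with ties broken by input order; undefined scores all receive rank n (the number of guides); each metric's ranking acts as one Schulze ballot; and the Schulze result is linearized by counting how many rivals each guide beats on strongest paths.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence


def rank_metric(values: Sequence[Optional[float]], higher_is_better: bool) -> List[int]:
    """Rank guides on one metric (1 = best); undefined scores tie at the worst rank."""
    n = len(values)
    defined = [i for i, v in enumerate(values) if v is not None]
    # Python's sort is stable (also with reverse=True), so equal scores keep input order.
    defined.sort(key=lambda i: values[i], reverse=higher_is_better)
    ranks = [n] * n  # assumption: "worst rank" = number of guides
    for rank, i in enumerate(defined, start=1):
        ranks[i] = rank
    return ranks


def schulze_order(ballots: List[List[int]]) -> List[int]:
    """Order candidates with the Schulze method; each ballot ranks every candidate (lower = better)."""
    n = len(ballots[0])
    # d[i][j]: number of ballots that prefer candidate i over candidate j.
    d = [[0] * n for _ in range(n)]
    for ranks in ballots:
        for i in range(n):
            for j in range(n):
                if ranks[i] < ranks[j]:
                    d[i][j] += 1
    # p[i][j]: strength of the strongest path from i to j (Floyd-Warshall variant).
    p = [[d[i][j] if d[i][j] > d[j][i] else 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if len({i, j, k}) == 3:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    # Candidates that beat more rivals on strongest paths come first.
    wins = [sum(p[i][j] > p[j][i] for j in range(n)) for i in range(n)]
    return sorted(range(n), key=lambda i: -wins[i])


def rank_guides(scores: Dict[str, Sequence[Optional[float]]],
                higher_is_better: Dict[str, bool], top: int = 1000) -> List[int]:
    """Return guide indices, best first: median rank, then Schulze on the top guides."""
    ballots = [rank_metric(values, higher_is_better[name]) for name, values in scores.items()]
    medians = [median(guide_ranks) for guide_ranks in zip(*ballots)]
    by_median = sorted(range(len(medians)), key=lambda i: medians[i])
    head, tail = by_median[:top], by_median[top:]
    # Re-rank only the top guides, treating each metric's ranking as one ballot.
    head_ballots = [[ballot[i] for i in head] for ballot in ballots]
    return [head[i] for i in schulze_order(head_ballots)] + tail
```

The Schulze step is a cubic loop; in pure Python that is slow for 1000 guides, so treat this as an illustration of the logic rather than a drop-in tool.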
- bedannotator - annotate the scored output file with annotations from a BED file. Additional command line options:
  - inputAnnotationBed: the BED file to pull annotation information from.
  - transformPositions: the bedannotator module will attempt to assign annotations by transforming the candidates within the target regions from the BED file into the coordinate space specified. Say you pulled your region from the 1 Mb super-enhancer region in front of the human MYC gene, and named it >MYC_Region in the fasta file. You would then include a BED file with a line like chr8 127000000 128000000 MYC_Region (separated by tabs), and the bed annotator would transform each candidate guide into this coordinate space, using the start and stop of that line as offsets.
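The coordinate transformation in the MYC example amounts to adding the BED start to each candidate's position within the region. The sketch below assumes the BED name column matches the fasta record name and that candidate positions are 0-based offsets into that record; FlashFry's actual conventions may differ. `BedRegion`, `parse_bed`, and `to_genome` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BedRegion:
    chrom: str
    start: int  # BED start (0-based)
    stop: int   # BED end (exclusive)
    name: str


def parse_bed(lines: Iterable[str]) -> Dict[str, BedRegion]:
    """Index BED lines by their name column (the 4th tab-separated field)."""
    regions: Dict[str, BedRegion] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, stop, name = line.rstrip("\n").split("\t")[:4]
        regions[name] = BedRegion(chrom, int(start), int(stop), name)
    return regions


def to_genome(region: BedRegion, local_start: int, local_stop: int) -> Tuple[str, int, int]:
    """Shift a candidate's region-local coordinates by the BED start offset."""
    genome_start = region.start + local_start
    genome_stop = region.start + local_stop
    if genome_stop > region.stop:
        raise ValueError("candidate extends past the end of the BED region")
    return region.chrom, genome_start, genome_stop


regions = parse_bed(["chr8\t127000000\t128000000\tMYC_Region\n"])
print(to_genome(regions["MYC_Region"], 1000, 1023))  # ('chr8', 127001000, 127001023)
```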
- dangerous - annotate sequences that would be difficult to work with. Currently this includes:
  - IN_GENOME=X: the number of times a perfect-match target for this guide sequence is seen within the genome of interest.
  - GC_X: flags sequences that have a high (>75%) or low (<25%) GC content.
  - PolyT: flags guide sequences (subsetted from the target sequences) that have four or more thymine (T) bases in a row. These could terminate Pol III transcription early (not an issue with other transcription approaches).
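The GC and PolyT checks reduce to a GC fraction and a substring test. This sketch takes the guide sequence directly (PAM already removed) and uses the thresholds stated above; the labels `GC_HIGH` and `GC_LOW` are hypothetical stand-ins for FlashFry's GC_X tags, and `dangerous_flags` is not FlashFry's API.

```python
from typing import List


def dangerous_flags(guide: str) -> List[str]:
    """Flag GC content outside 25-75% and runs of four or more T bases."""
    seq = guide.upper()
    flags: List[str] = []
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    if gc_fraction > 0.75:
        flags.append("GC_HIGH")  # hypothetical label
    elif gc_fraction < 0.25:
        flags.append("GC_LOW")   # hypothetical label
    # "TTTT" as a substring means at least four consecutive T bases.
    if "TTTT" in seq:
        flags.append("PolyT")
    return flags
```

The boundaries are strict, so a guide at exactly 25% or 75% GC is not flagged.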
- minot - identifies the off-target with the fewest mismatches to the candidate guide and reports the number of times that off-targets with that same number of mismatches appear in the genome (for example, basesDiffToClosestHit = 1 and closestHitCount = 12).
- 0-1-2-3-4_mismatch - lists the number of detected off-targets with N mismatches, for each N.
- reciprocalofftargets - marks guides within the target region that are good off-targets for one another. Such guides can lead to large deletions and drop-out, which can confound results.
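One way to picture this check is a pairwise comparison of the candidate guides in a region. The sketch treats two guides as "good off-targets for one another" when they differ by at most `max_mismatch` bases; that threshold, and the function names, are hypothetical and not FlashFry's actual criterion.

```python
from itertools import combinations
from typing import List, Set


def count_mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def reciprocal_guides(guides: List[str], max_mismatch: int = 1) -> Set[int]:
    """Indices of guides within max_mismatch of another guide in the same region."""
    flagged: Set[int] = set()
    for i, j in combinations(range(len(guides)), 2):
        if len(guides[i]) == len(guides[j]) and count_mismatches(guides[i], guides[j]) <= max_mismatch:
            flagged.update((i, j))
    return flagged
```

The all-pairs loop is quadratic in the number of candidates, which is manageable for a single target region.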
- folding - calculates the free energy of the guide associated with each target. If you'd like to see the free energy for shorter guides, you can specify the parameter -shortestGuideEnergy, which adds a column for each guide length from shortestGuideEnergy up to the full length. For instance, -shortestGuideEnergy 17 with Cas9 will generate free energy calculations for guides of length 17, 18, 19, and 20 bases.
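To make the -shortestGuideEnergy example concrete, this sketch lists the truncated guides that would each get a free-energy column. It assumes bases are dropped from the 5' (PAM-distal) end, as with truncated Cas9 guides; the energy calculation itself (done with an RNA folding tool) is not shown, and `truncated_guides` is a hypothetical name.

```python
from typing import Dict


def truncated_guides(guide: str, shortest: int) -> Dict[int, str]:
    """Map each length from `shortest` to len(guide) to a guide trimmed at its 5' end."""
    if not 1 <= shortest <= len(guide):
        raise ValueError("shortest must be between 1 and the guide length")
    # Assumption: bases are removed from the 5' (PAM-distal) end.
    return {length: guide[len(guide) - length:] for length in range(shortest, len(guide) + 1)}
```

For a 20-nt Cas9 guide and a shortest length of 17, this yields four sequences (17, 18, 19, and 20 bases), matching the four columns described above.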