CRISPR Spacer Database

Spacer database to infer host-virus interactions

Database Compilation

CRISPR spacers were compiled from four distinct sources: (1) CRISPRCasdb, built by running CRISPRCasFinder on completely assembled genomes from RefSeq; (2) a set of spacers identified with CRISPRDetect in all prokaryotic assemblies in NCBI's RefSeq (December 2017); (3) a set of spacers found with MinCED (based on CRT) in 24345 high-quality metagenome-assembled genomes (MAGs) from the human microbiome; and (4) a set of spacers found with MinCED in the 24706 species-representative sequences in GTDB. In total, this resulted in ???? unique spacer sequences across ???? species and/or metagenome contigs.
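As a rough illustration of how spacers from several sources can be collapsed into a set of unique sequences, here is a minimal Python sketch. The function name `merge_spacers`, the record layout (`genome_id`, `spacer_seq`), and the toy sequences are hypothetical and not taken from the actual pipeline. It treats two spacers as identical only when their uppercased sequences match exactly; whether reverse complements should also be collapsed is left open.

```python
from collections import defaultdict


def merge_spacers(sources):
    """Collapse spacers from several sources into unique sequences.

    `sources` maps a source label to an iterable of (genome_id, spacer_seq)
    pairs (a hypothetical layout). Returns a dict mapping each unique,
    uppercased spacer sequence to the set of (source_label, genome_id)
    pairs in which it was found.
    """
    unique = defaultdict(set)
    for label, records in sources.items():
        for genome_id, seq in records:
            # Assumption: uniqueness means exact sequence identity, ignoring case.
            unique[seq.upper()].add((label, genome_id))
    return dict(unique)


# Toy records, invented for illustration only.
sources = {
    "CRISPRCasdb": [("genomeA", "ACGTACGTAA")],
    "MinCED_GTDB": [("genomeB", "acgtacgtaa"), ("genomeB", "TTTTGGGGCC")],
}
merged = merge_spacers(sources)
print(len(merged), "unique spacers")
```

Keeping the provenance set for each sequence makes it possible to count both unique spacers and the genomes or contigs they came from in a single pass.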

Target Discovery in 3000 Metagenomes

We used BLAST to search a set of 3000 diverse assembled metagenomes for sequences targeted by known spacers in our database. Targeted contigs were then assessed to determine whether they were viral (how??? - other groups).
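The spacer-to-contig search could be post-processed along the following lines. This sketch assumes BLAST was run with tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) with spacers as queries and contigs as subjects. The thresholds (at most one mismatch, no gaps, full-length alignment) and the helper name `filter_spacer_hits` are assumptions for illustration, not the criteria used in this work.

```python
def filter_spacer_hits(blast_lines, spacer_lengths, max_mismatches=1, min_coverage=1.0):
    """Return {contig_id: set of spacer_ids} for hits passing the filters.

    `blast_lines` is an iterable of tab-separated BLAST outfmt 6 lines.
    `spacer_lengths` maps each spacer ID to its length in nucleotides.
    The thresholds are illustrative assumptions.
    """
    targets = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        spacer_id, contig_id = fields[0], fields[1]
        aln_len, mismatches, gap_opens = int(fields[3]), int(fields[4]), int(fields[5])
        # Fraction of the spacer covered by the alignment.
        coverage = aln_len / spacer_lengths[spacer_id]
        if gap_opens == 0 and mismatches <= max_mismatches and coverage >= min_coverage:
            targets.setdefault(contig_id, set()).add(spacer_id)
    return targets


# Toy hits, invented for illustration only.
hits = [
    "sp1\tcontig7\t100.000\t30\t0\t0\t1\t30\t500\t529\t1e-10\t56.5",
    "sp2\tcontig7\t93.333\t30\t2\t0\t1\t30\t900\t929\t1e-6\t44.0",
    "sp3\tcontig9\t100.000\t20\t0\t0\t1\t20\t10\t29\t1e-4\t38.0",
]
lengths = {"sp1": 30, "sp2": 30, "sp3": 32}
targets = filter_spacer_hits(hits, lengths)
print(targets)
```

In this toy input only `sp1` passes: `sp2` has two mismatches and `sp3` aligns over only 20 of its 32 nucleotides.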

Validation of Spacers to Discover Host Identity

We used BLAST to match spacers in our database to known viral sequences with known hosts (other group half ????). We then compared each spacer's source taxon to the known host taxon of the matched virus to determine how reliably spacer-based host prediction recovered the known virus-host links in the database.
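One possible way to summarize this comparison is the fraction of spacer-virus matches whose predicted and known host lineages agree down to each taxonomic rank. The sketch below assumes each lineage is a tuple ordered from domain to species; the rank list, the function name `agreement_by_rank`, and the example lineages are illustrative assumptions rather than the validation procedure actually used.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def agreement_by_rank(pairs):
    """Fraction of pairs whose lineages agree down to each rank.

    `pairs` is an iterable of (predicted_lineage, known_lineage), where each
    lineage is a tuple of names ordered as in RANKS (an assumed layout).
    """
    matches = [0] * len(RANKS)
    total = 0
    for predicted, known in pairs:
        total += 1
        for i in range(len(RANKS)):
            # Agreement at a rank requires agreement at all higher ranks too.
            if predicted[: i + 1] == known[: i + 1]:
                matches[i] += 1
    return {rank: (matches[i] / total if total else 0.0) for i, rank in enumerate(RANKS)}


# Toy lineages, invented for illustration only.
ecoli = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Escherichia", "Escherichia coli")
salmonella = ecoli[:5] + ("Salmonella", "Salmonella enterica")

pairs = [
    (ecoli, ecoli),       # spacer source and known host agree to species
    (salmonella, ecoli),  # agree only down to family
]
rates = agreement_by_rank(pairs)
for rank in RANKS:
    print(rank, rates[rank])
```

Reporting agreement per rank shows where spacer-based predictions stop being reliable, for example when they are consistent at the family level but not at the genus level.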