Spacer database to infer host-virus interactions
CRISPR spacers were compiled from four distinct sources: (1) CRISPRCasdb, built by running CRISPRCasFinder on completely assembled genomes from RefSeq; (2) a set of spacers identified with CRISPRDetect in all prokaryotic assemblies in NCBI's RefSeq (December 2017); (3) a set of spacers found with MinCED (based on CRT) in 24345 high-quality metagenome-assembled genomes (MAGs) from the human microbiome; and (4) a set of spacers found with MinCED in the 24706 species-representative sequences in GTDB. In total, this resulted in ???? unique spacer sequences across ???? species and/or metagenome contigs.
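As an illustration of how spacers from the four sources could be pooled into a non-redundant set, the following is a minimal sketch and not the pipeline actually used. It assumes that a spacer and its reverse complement are counted as the same sequence, which the text above does not specify. The function names `canonical` and `merge_spacers` and the input layout are hypothetical.

```python
from collections import defaultdict

# Translation table for complementing DNA (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq):
    """Return the lexicographically smaller of a spacer and its reverse complement.

    Assumption: strand is ignored when deciding whether two spacers are identical.
    """
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def merge_spacers(sources):
    """Pool spacers from several sources into one non-redundant mapping.

    sources: dict mapping a source name (e.g. "CRISPRCasdb") to an iterable of
             (origin_id, spacer_sequence) pairs, where origin_id is the genome
             or contig the spacer was found in.
    Returns a dict mapping each canonical spacer sequence to the set of
    (source_name, origin_id) pairs in which it was observed.
    """
    merged = defaultdict(set)
    for source_name, records in sources.items():
        for origin_id, seq in records:
            merged[canonical(seq)].add((source_name, origin_id))
    return dict(merged)
```

The number of keys in the returned mapping would correspond to the count of unique spacer sequences, and the union of the origin identifiers to the number of source genomes or contigs.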
We used BLAST to search a set of 3000 diverse assembled metagenomes for sequences targeted by known spacers in our database. Targeted contigs were then assessed to determine whether they were viral (how??? - other groups).
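The spacer-to-contig search could be post-processed roughly as follows. This is a sketch under stated assumptions: it expects BLAST's standard 12-column tabular output (`-outfmt 6`) with spacers as queries, and the thresholds (at most one mismatch, no gaps, at least 95% of the spacer aligned) are illustrative values, not the cutoffs used in this work. The function name `targeted_contigs` and the `spacer_lengths` lookup are hypothetical.

```python
import csv

# Column order of BLAST's standard tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def targeted_contigs(blast_lines, spacer_lengths, max_mismatch=1, min_coverage=0.95):
    """Collect contigs hit by near-exact, near-full-length spacer matches.

    blast_lines:    iterable of tab-separated BLAST lines (an open file works).
    spacer_lengths: dict mapping spacer id -> spacer length in nucleotides.
    max_mismatch, min_coverage: illustrative thresholds (assumptions).
    Returns a dict mapping contig id -> set of spacer ids that target it.
    """
    hits = {}
    reader = csv.DictReader(blast_lines, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        spacer = row["qseqid"]
        # Fraction of the spacer covered by the alignment.
        aligned = int(row["qend"]) - int(row["qstart"]) + 1
        coverage = aligned / spacer_lengths[spacer]
        if (
            int(row["mismatch"]) <= max_mismatch
            and int(row["gapopen"]) == 0  # assumption: gapped hits are discarded
            and coverage >= min_coverage
        ):
            hits.setdefault(row["sseqid"], set()).add(spacer)
    return hits
```

Contigs returned by such a filter would still need the separate viral assessment described above.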
We used BLAST to match spacers in our database to viral sequences with known hosts (other group half ????). We then compared the taxon of each spacer's source with the taxon of the known host to determine how reliably spacer-based host prediction recovered the known links in the database.
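One way to quantify this comparison is the fraction of spacer-derived virus-host links whose predicted and known host taxa agree at each taxonomic rank. The sketch below assumes that lineages are available as rank-to-name dictionaries and that a link is scored at a rank only when both lineages are annotated at that rank; neither assumption comes from the text above, and the names `RANKS` and `agreement_by_rank` are hypothetical.

```python
# Ranks compared, from broadest to most specific (an illustrative choice).
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def agreement_by_rank(links, spacer_lineage, known_host_lineage):
    """Fraction of spacer-based links that agree with the known host, per rank.

    links:              iterable of (virus_id, spacer_source_id) pairs, one per
                        spacer match between a source genome and a virus.
    spacer_lineage:     dict mapping spacer_source_id -> {rank: taxon name}.
    known_host_lineage: dict mapping virus_id -> {rank: taxon name}.
    Returns a dict mapping rank -> agreement fraction, omitting ranks at which
    no link could be scored.
    """
    totals = {rank: 0 for rank in RANKS}
    agreeing = {rank: 0 for rank in RANKS}
    for virus_id, source_id in links:
        predicted = spacer_lineage[source_id]
        known = known_host_lineage[virus_id]
        for rank in RANKS:
            # Score a rank only if both lineages are annotated there.
            if rank in predicted and rank in known:
                totals[rank] += 1
                if predicted[rank] == known[rank]:
                    agreeing[rank] += 1
    return {rank: agreeing[rank] / totals[rank] for rank in RANKS if totals[rank]}
```

Reporting agreement per rank, rather than as a single number, would show at which taxonomic depth spacer-based predictions stop being reliable.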