Code for the paper "Keeping it in the family: Using protein family templates to rescue low confidence AlphaFold2 models"
- Table with domains showing bimodal pLDDT distributions
- Comparison with AF3
- Other data are available at:
The pipeline is designed to run on a high-performance computing cluster through SLURM via Nextflow (version 23.04.1).
Download the ColabFold databases as explained here and place them in assets.
The alphafold-pfam database should be created using these instructions and placed in assets.
The Pfam 35 MySQL database can be downloaded here, and each file of the database can be imported as follows:
zcat file.sql.gz | mysql -u <user> -p pfam_35
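Since the import command above has to be repeated for each file, a loop can help. The following is only a minimal sketch: the `DUMP_DIR`, `DB_USER` and `DRY_RUN` variables are hypothetical names that are not part of this repository, and the sketch assumes all downloaded `.sql.gz` files sit in a single directory. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: import every gzipped SQL dump from one directory into pfam_35.
# DUMP_DIR, DB_USER and DRY_RUN are hypothetical names; adjust them to your setup.
set -eu

DUMP_DIR="${DUMP_DIR:-pfam35_dumps}"   # assumed location of the downloaded files
DB_USER="${DB_USER:-pfam}"             # assumed MySQL user
DB_NAME="pfam_35"
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands, 0 = run them

import_dumps() {
    for f in "$DUMP_DIR"/*.sql.gz; do
        # Skip the literal pattern when no file matches.
        [ -e "$f" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            echo "zcat $f | mysql -u $DB_USER -p $DB_NAME"
        else
            zcat "$f" | mysql -u "$DB_USER" -p "$DB_NAME"
        fi
    done
}

import_dumps
```

With `DRY_RUN=0`, `mysql -p` asks for the password once per file, exactly as the single-file command does.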
Once the Pfam database has been created, set its connection details in the file header: the hostname in "params.pfamhost", the username in "params.pfamuser", the password in "params.pfampassword" and the port in "params.pfamport".
Make the scripts executable so that Nextflow can run them:
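Before launching the pipeline, it may be worth checking that these connection details actually reach the database. The sketch below is an assumption-laden illustration, not part of the pipeline: the `PFAM_HOST`, `PFAM_PORT`, `PFAM_USER` and `DRY_RUN` variables are hypothetical stand-ins for the values passed to the `params.pfam*` settings, and 3306 is simply the MySQL default port. By default the command is only printed.

```shell
#!/bin/sh
# Sketch: check that the connection settings reach the pfam_35 database.
# The PFAM_* variables are hypothetical stand-ins for the params.pfam* values.
set -eu

PFAM_HOST="${PFAM_HOST:-localhost}"
PFAM_PORT="${PFAM_PORT:-3306}"     # MySQL default port
PFAM_USER="${PFAM_USER:-pfam}"
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the command, 0 = run it

pfam_check() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "mysql -h $PFAM_HOST -P $PFAM_PORT -u $PFAM_USER -p -e 'SHOW TABLES;' pfam_35"
    else
        # Lists the tables of pfam_35; fails if host, port, user or password is wrong.
        mysql -h "$PFAM_HOST" -P "$PFAM_PORT" -u "$PFAM_USER" -p -e 'SHOW TABLES;' pfam_35
    fi
}

pfam_check
```

If the command lists the Pfam tables when run with `DRY_RUN=0`, the same host, user, password and port should work for the pipeline parameters.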
chmod +x bin/*
To reproduce the results obtained in the paper, run the following:
sbatch < scripts/
To run the pipeline on a customised set of proteins, use this script as an example. Note that you will need to specify the proteins and the domain to be used for each protein.
The following scripts contained in scripts were also used:
- was used to estimate the amount of CO2 produced with the computation;
- was used to extract the whole pLDDT Pfam distributions;
- was used to extract the information about domains considered in the paper;
- was used to run AF2 with multiple seeds.
The images can be reproduced using this notebook after downloading the results folder from here and uncompressing it.
The workflow image was obtained with
Dependencies needed for image generation: Python 3.8, seaborn (version 12.2), pandas (version 1.5.3), matplotlib (version 3.6.2), numpy (version 1.24.3), Biopython (version 1.81), scikit-learn (version 1.2.2).
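One way to pin these dependencies is a requirements file, sketched below. The file name `requirements-figures.txt` is hypothetical and not part of this repository. The sketch also assumes that the seaborn version listed above refers to release 0.12.2, and it uses the PyPI package name `biopython` for Biopython.

```shell
#!/bin/sh
# Sketch: write the pinned plotting dependencies to a requirements file.
# The file name is hypothetical; seaborn 0.12.2 is an assumed reading of "12.2".
set -eu

cat > requirements-figures.txt <<'EOF'
seaborn==0.12.2
pandas==1.5.3
matplotlib==3.6.2
numpy==1.24.3
biopython==1.81
scikit-learn==1.2.2
EOF

# Then, inside a Python 3.8 environment:
#   python -m pip install -r requirements-figures.txt
```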
The diagram was generated with