FranceCosta/AF2Fix

Code for the paper "Keeping it in the family: Using protein family templates to rescue low confidence AlphaFold2 models"

Data:

Graphical diagram

[Image: graphical scheme of the pipeline]

Dependencies

The pipeline is designed to run on a high-performance computing cluster through SLURM via Nextflow (version 23.04.1).

ColabFold database

Download the ColabFold databases as explained here and place them in assets.

PfamxAlphaFold

The alphafold-pfam database should be created using these instructions and placed in assets.

Pfam

The Pfam 35 MySQL database can be downloaded here. Import each file of the database as follows:

zcat file.sql.gz | mysql -u <user> -p pfam_35
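
Because every dump file has to be imported, the command above can be wrapped in a loop. The following is a minimal sketch, assuming all downloaded `.sql.gz` files sit in one directory and can be imported in alphabetical order. The function name `import_pfam_dumps` and the `DRY_RUN` variable are illustrative and not part of this repository.

```shell
# Sketch: import every .sql.gz dump found in a directory into one MySQL database.
# import_pfam_dumps and DRY_RUN are hypothetical names, not part of AF2Fix.
import_pfam_dumps() {
    local dump_dir="$1" user="$2" db="${3:-pfam_35}"
    local f
    for f in "$dump_dir"/*.sql.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be run.
            echo "zcat $f | mysql -u $user -p $db"
        else
            # mysql prompts for the password once per file.
            zcat "$f" | mysql -u "$user" -p "$db"
        fi
    done
}
```

Running `DRY_RUN=1 import_pfam_dumps ./pfam_dumps <user>` first prints the commands without touching the database, which makes it easy to check the file list before the real import.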

Once the Pfam database has been created, set the connection parameters in the header of the pipeline.nf file: the hostname in "params.pfamhost", the username in "params.pfamuser", the password in "params.pfampassword" and the port in "params.pfamport".
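
Nextflow also lets any `params.<name>` value be overridden on the command line as `--<name> <value>`. As a hedged alternative to editing the file header, the sketch below builds those overrides from environment variables. The function `pfam_nextflow_args` and the `PFAM_*` variable names are hypothetical, and whether the pipeline is meant to be launched this way is an assumption.

```shell
# Sketch: build Nextflow command-line overrides for the four Pfam parameters.
# pfam_nextflow_args and the PFAM_* variables are hypothetical names.
pfam_nextflow_args() {
    # Abort with a message if any setting is missing or empty.
    : "${PFAM_HOST:?set PFAM_HOST}" "${PFAM_USER:?set PFAM_USER}"
    : "${PFAM_PASSWORD:?set PFAM_PASSWORD}" "${PFAM_PORT:?set PFAM_PORT}"
    # "--" stops printf from reading the format string as an option.
    printf -- '--pfamhost %s --pfamuser %s --pfampassword %s --pfamport %s\n' \
        "$PFAM_HOST" "$PFAM_USER" "$PFAM_PASSWORD" "$PFAM_PORT"
}
```

A possible use is `nextflow run pipeline.nf $(pfam_nextflow_args)`. The unquoted substitution relies on word splitting, so it only works for values without spaces, and a password passed this way is visible in the process list on shared machines.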

Change script permissions

Allow execution of scripts by nextflow with:

chmod +x bin/*
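
To confirm that the permission change took effect, the files that are still not executable can be listed. This is a small sketch; `list_non_executable` is a hypothetical helper and not part of the repository.

```shell
# Sketch: list regular files in a directory that the owner cannot execute.
# list_non_executable is a hypothetical helper, not part of AF2Fix.
list_non_executable() {
    local dir="${1:-bin}"
    # -perm -100 matches files with the owner-execute bit set; "!" negates it.
    find "$dir" -maxdepth 1 -type f ! -perm -100
}
```

Calling `list_non_executable bin` after the `chmod` should print nothing. Any path it prints is a script that Nextflow would not be able to run directly.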

Run the pipeline

To reproduce the results obtained in the paper, run the following:

sbatch < scripts/run.sh

To run the pipeline on a customised set of proteins, use this script as an example. Note that you will need to specify the proteins and the domain to be used for each protein.

Other scripts

The following scripts, contained in scripts, were also used:

  • estimate_co2.sh was used to estimate the amount of CO2 produced by the computation;
  • get_distribution.py was used to extract the full pLDDT distributions of the Pfam families;
  • get_domain_info.sh was used to extract information about the domains considered in the paper;
  • seed_AF2.py was used to run AF2 with multiple seeds.

The images can be reproduced using this notebook after downloading the results folder from here and uncompressing it.

The workflow diagram was created with draw.io (https://app.diagrams.net).

Dependencies needed for image generation: Python 3.8, seaborn (version 0.12.2), pandas (version 1.5.3), matplotlib (version 3.6.2), numpy (version 1.24.3), Biopython (version 1.81), scikit-learn (version 1.2.2).
