Version 3.0.1 provides new features, improves performance, improves documentation, and fixes bugs. We thank everyone who reported issues, proposed new features, and submitted PRs!
Change log
Below is a selection of new features, performance improvements and bug fixes. Full change log: v3.0.0...v3.0.1.
New features
- It is now possible to run template-free while still running the MSA search.
- It is now possible to run template search when an MSA is provided.
- It is now possible to specify MSA and templates as external files instead of inline in the input JSON.
- Added these new flags: --max_template_date, --diffusion_num_samples, --num_recycles, --num_seeds.
- Exposed --conformer_max_iterations as a flag to allow increasing number of RDKit iterations.
- Added an option to return model embeddings and write them to a file.
- Made it possible to set which GPU to use on a multi-GPU system.
- The per-residue pLDDT is now added in the output mmCIF.
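As a rough illustration of how the new flags listed above might be combined, the sketch below assembles a command line in Python. The flag names come from the list above; the script name `run_alphafold.py`, the helper `build_command`, the `--name=value` syntax, the date format, and every value shown are assumptions made for illustration only. A real invocation would also need input and model paths, which are omitted here.

```python
import shlex


def build_command(script, flags):
    """Return an argv list: the interpreter, the script, then --name=value pairs."""
    argv = ["python", script]
    for name, value in flags.items():
        argv.append(f"--{name}={value}")
    return argv


# Hypothetical values, chosen only to show the shape of the command.
example_flags = {
    "max_template_date": "2021-09-30",  # assumed YYYY-MM-DD format
    "num_recycles": 10,
    "num_seeds": 5,
    "diffusion_num_samples": 5,
    "conformer_max_iterations": 1000,
}

command = build_command("run_alphafold.py", example_flags)
print(shlex.join(command))
```

Building the argument list programmatically keeps each flag on its own entry, which avoids quoting problems if the list is later passed to `subprocess.run`.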
Performance improvements
- Replaced the database download script with a significantly faster one and added GCP post-processing scripts.
- Improved the Stockholm converter to not read more sequences than necessary when converting to A3M.
- Improved the Stockholm to A3M conversion to not read the whole file into RAM.
- Sped up template search by not reading and parsing irrelevant mmCIFs. This is very significant if the PDB is stored on a slow filesystem.
- Refactored input JSON parsing into an iterator, so that inputs are no longer all loaded into memory at once, which could cause an OOM.
- Added sequence deduplication when outputting the JSON, making it smaller for homomers.
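The iterator-based input parsing mentioned above can be pictured with a small generator. This is a minimal sketch of the general technique, not the project's actual code: the function name `iter_json_inputs` is hypothetical, and it assumes each input file holds a single JSON document.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_json_inputs(paths: list[Path]) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON document at a time instead of building a full list."""
    for path in paths:
        with path.open("rt", encoding="utf-8") as f:
            # Only the document currently being processed is held in memory.
            yield json.load(f)
```

A caller would loop with `for fold_input in iter_json_inputs(paths): ...`, so each input can be garbage-collected before the next one is parsed, which is what keeps peak memory bounded when there are many inputs.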
Bug fixes
- The data pipeline is now skipped if a blank MSA and blank templates are provided.
- Silenced irrelevant NumPy warnings.
- Improved Singularity installation instructions.
- Fixed Pallas GLU kernel on large inputs and updated documentation for compilation buckets.
- Fixed setup to work with Python 3.12.
- If the output directory already exists, AlphaFold won't overwrite files in it; instead, it will create a new directory.
- Convert DATIVE bond type to SINGLE to match bond types used during training.
- Convert atom element to uppercase in embedding name to match training.
- Fixed user and permissions when untar-ing the PDB mmCIF files.
- Explicitly use conformer_id when getting generated conformers.
- Fixed A3M -> Stockholm conversion when description contains tabs.
- Sort the input files returned by glob, which are otherwise in arbitrary order.
- Ensure user CCD contains all necessary fields and improve docs.
- Fixed incorrect handling of two-letter atoms in SMILES ligands.
- Use OpenEye canonical SMILES when available.
- Raise clear errors if molecule definitions cannot be created.
- Add XLA flag workaround for CUDA Capability 7.x GPUs.
- Small fix for numerical precision when handling PIDs in Pallas kernel.
- Deal properly with non-standard residues in the single-letter sequence.
- Fixed RASA calculation to work with arbitrary chain IDs.
- Use true chain type for paired MSA creation.
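The glob ordering fix listed above reflects a general property of Python's `glob` module: results follow the order in which the filesystem returns directory entries, which is not guaranteed to be sorted. The sketch below shows the usual remedy; the helper name `list_input_files` and the `*.json` pattern are assumptions for illustration, not the project's actual code.

```python
import glob
import os


def list_input_files(directory: str, pattern: str = "*.json") -> list[str]:
    """Return matching paths in a stable, lexicographic order."""
    # glob.glob gives no ordering guarantee, so sort explicitly.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Sorting makes repeated runs over the same directory process inputs in the same order, regardless of the underlying filesystem.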
We thank the following people for their pull requests and/or great suggestions:
@alchemistcai, @amorehead, @aozalevsky, @ccoulombe, @dailypartita, @DanFosing, @davidecarlson, @eltociear, @hegelab, @jkosinski, @jpcartailler, @linuxfold, @lucajovine, @noghte, @orbsmiv, @Saharsha-N, @smg3d, @vatese, @YoshitakaMo.