Question about Post-Translational Modifications (PTMs) in Protein Prediction #54
I spent some time analyzing the source code and found that: During the alphafold3/src/alphafold3/model/features.py Lines 444 to 445 in 2ffe43f
This relies on
alphafold3/src/alphafold3/common/folding_input.py Lines 912 to 914 in 2ffe43f
alphafold3/src/alphafold3/common/folding_input.py Lines 235 to 236 in 2ffe43f
This means that the In the alphafold3/src/alphafold3/structure/structure.py Lines 1940 to 1945 in 2ffe43f
So, if I understand correctly, if a CCD code from the Then I further checked Therefore, I think there should be a restriction on which CCD codes can be used in PTM (e.g. they must be recorded in Please let me know if I’m wrong. Thanks. |
"MAN" is a glycan and should be defined as a bonded ligand, see https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md#bonds Please note that converting AlphaFold-Server JSONs containing glycans is not currently supported, see https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md#glycans For PDB examples where a cif already exists, one can create the input json from the cif using from_mmcif in the folding_input class: https://github.com/google-deepmind/alphafold3/blob/main/src/alphafold3/common/folding_input.py#L795C7-L795C17 (we will add this info to the input docs soon) |
On your follow-up message: thanks for digging into the code!
Good find - we will look into this. |
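For readers following the bonded-ligand suggestion above, here is a minimal sketch of what such an input might look like when built in Python. The layout (a `ligand` entry with `ccdCodes`, plus `bondedAtomPairs` entries of the form `[entity id, 1-based residue index, atom name]`) follows my reading of the linked input docs. The helper name `build_bonded_ligand_input`, the toy sequence, and the choice of atoms (`OG` on a serine, `C1` on `MAN`) are illustrative assumptions, not taken from 7BBV.

```python
import json


def build_bonded_ligand_input(name, protein_sequence, residue_position,
                              protein_atom, ligand_ccd, ligand_atom, seed=1):
    """Builds an AlphaFold 3 style input dict with one ligand bonded to the protein.

    Hypothetical helper: chain IDs "A" and "G" are arbitrary choices.
    """
    if not 1 <= residue_position <= len(protein_sequence):
        raise ValueError("residue_position is outside the protein sequence")
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            {"protein": {"id": "A", "sequence": protein_sequence}},
            # The glycan is declared as a separate ligand entity, not as a PTM.
            {"ligand": {"id": "G", "ccdCodes": [ligand_ccd]}},
        ],
        # Each bond is a pair of atoms: [entity id, residue index, atom name].
        "bondedAtomPairs": [
            [["A", residue_position, protein_atom], ["G", 1, ligand_atom]],
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


# Toy example (assumed chemistry): MAN C1 bonded to the OG of serine 2.
example = build_bonded_ligand_input(
    name="glycan_demo",
    protein_sequence="GSTAQ",
    residue_position=2,
    protein_atom="OG",
    ligand_ccd="MAN",
    ligand_atom="C1",
)
print(json.dumps(example, indent=2))
```

The atom names have to match those in the CCD entries of the residue and the ligand, so please check them against the real components before using anything like this.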
@wtni-gidle I encountered a similar question while working on my modified sequence. Following your guidance, I successfully ran a modified example. Here’s the case: If you want to apply modifications at positions 2 and 5, changing them to To resolve this, you can refer to the reversed mapping dictionary from |
I am just looking at this for the first time today. I am confident we will have an elegant solution with the x mutation in run_inference in due time. I am just good at typing creative solutions; my old man is an M.D. Let me load up PyCharm. |
Small update: thanks again for reporting this. We are working on a fix, which should land by the end of the week. |
If one manually adds an entry to the dictionary (e.g. 'TYC': 'Y'), will the error be fixed? |
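To make the question concrete, here is a small sketch of the kind of lookup being discussed. The dictionary `ccd_to_one_letter` below is a hypothetical five-entry excerpt and `one_letter_for` is a made-up helper; neither is the actual AlphaFold 3 code, and whether patching the real table removes the error is not confirmed anywhere in this thread.

```python
# Hypothetical excerpt of a CCD-code -> one-letter mapping (assumption:
# the real table in AlphaFold 3 is much larger and may be structured differently).
ccd_to_one_letter = {
    "ALA": "A",
    "TYR": "Y",
    "SEP": "S",  # phosphoserine
    "TPO": "T",  # phosphothreonine
    "PTR": "Y",  # phosphotyrosine
}


def one_letter_for(ccd_code, mapping, default="X"):
    """Returns the one-letter code for a CCD code, or `default` if unmapped."""
    return mapping.get(ccd_code, default)


# Without an entry, the modified residue falls back to the unknown letter.
before = one_letter_for("TYC", ccd_to_one_letter)

# Adding the entry by hand, as suggested in the question above.
patched = {**ccd_to_one_letter, "TYC": "Y"}
after = one_letter_for("TYC", patched)

print(before, after)
```

If the mismatch between the fallback letter and the query sequence is what triggers the error, an added entry would make the two agree, but that is an inference from the discussion here rather than something verified.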
Any luck on this? Thanks for looking into it, by the way; I am also facing this issue. |
This has been fixed in 44dee65, sorry for the delay! Please clone the latest commit and rebuild your AlphaFold 3 container to get the fix. |
Hello @Augustin-Zidek, I would appreciate your input - I just tried to predict the structure of a protein with a single PTM (my user-provided CCD) and no additional molecules, leading to the same error message:
I am not sure whether there is an issue with my syntax or whether this issue persists. Here is the JSON I provide to my inference-only script, which fails with the above error (the MSA and template segments are deleted here due to size): {
"dialect": "alphafold3",
"version": 1,
"name": "5P9J",
"sequences": [
{
"protein": {
"id": "A",
"sequence": "GKNAPSTAGLGYGSWEIDPKDLTFLKELGTGQFGVVKYGKWRGQYDVAIKMIKEGSMSEDEFIEEAKVMMNLSHEKLVQLYGVCTKQRPIFIITEYMANGCLLNYLREMRHRFQTQQLLEMCKDVCEAMEYLESKQFLHRDLAARNCLVNDQGVVKVSDFGLSRYVLDDEYTSSVGSKFPVRWSPPEVLMYSKFSSKSDIWAFGVLMWEIYSLGKMPYERFTNSETAEHIAQGLRLYRPHLASEKVYTIMYSCWHEKADERPTFKILLSNILDVMDEES",
"modifications": [ {"ptmType": "M1", "ptmPosition": 101}]
}
}
],
"modelSeeds": [
1
],
"bondedAtomPairs": null,
"userCCD": "data_M1\n#\n_chem_comp.id M1\n_chem_comp.name af3_enrich\n_chem_comp.type non-polymer\n_chem_comp.formula ?\n_chem_comp.mon_nstd_parent_comp_id ?\n_chem_comp.pdbx_synonyms ?\n_chem_comp.formula_weight ?\n_chem_comp.pdbx_smiles '[H]OC(=O)C([H])(N([H])[H])C([H])([H])SC([H])([H])C([H])([H])C(=O)N1C([H])([H])C([H])([H])C([H])([H])[C@@]([H])(n2nc(-c3c([H])c([H])c(Oc4c([H])c([H])c([H])c([H])c4[H])c([H])c3[H])c3c(N([H])[H])nc([H])nc32)C1([H])[H]'\n#\nloop_\n_chem_comp_atom.atom_id\n_chem_comp_atom.charge\n_chem_comp_atom.pdbx_leaving_atom_flag\n_chem_comp_atom.comp_id\n_chem_comp_atom.pdbx_model_Cartn_x_ideal\n_chem_comp_atom.pdbx_model_Cartn_y_ideal\n_chem_comp_atom.pdbx_model_Cartn_z_ideal\n_chem_comp_atom.type_symbol\nC1 0 N M1 -4.963 -0.007 1.670 C\nCov 0 N M1 -6.443 0.329 1.834 C\nSG 0 N M1 -7.505 -0.328 0.492 S\nCB 0 N M1 -7.027 0.750 -0.914 C\nCA 0 N M1 -7.374 2.235 -0.729 C\nC 0 N M1 -7.097 2.949 -2.052 C\nOXT 0 N M1 -5.792 3.234 -2.234 O\nO 0 N M1 -7.915 3.223 -2.918 O\nN 0 N M1 -8.782 2.460 -0.320 N\nC2 0 N M1 -4.621 -1.440 2.057 C\nO1 0 N M1 -5.447 -2.169 2.607 O\nN1 0 N M1 -3.312 -1.871 1.810 N\nC3 0 N M1 -3.074 -3.312 1.938 C\nC4 0 N M1 -2.382 -1.142 0.932 C\nC5 0 N M1 -1.615 -3.647 2.227 C\nC6 0 N M1 -0.908 -1.473 1.216 C\nC7 0 N M1 -0.678 -2.987 1.220 C\nN2 0 N M1 0.048 -0.795 0.350 N\nC8 0 N M1 -0.052 -0.526 -0.987 C\nN3 0 N M1 1.225 -0.399 0.884 N\nC9 0 N M1 1.135 0.094 -1.347 C\nN4 0 N M1 -1.092 -0.795 -1.796 N\nC10 0 N M1 1.918 0.101 -0.154 C\nC11 0 N M1 1.241 0.519 -2.678 C\nC12 0 N M1 -0.866 -0.373 -3.050 C\nC13 0 N M1 3.293 0.561 0.036 C\nN5 0 N M1 2.315 1.220 -3.208 N\nN6 0 N M1 0.217 0.271 -3.521 N\nC14 0 N M1 3.624 1.347 1.149 C\nC15 0 N M1 4.302 0.205 -0.870 C\nC16 0 N M1 4.937 1.788 1.338 C\nC17 0 N M1 5.614 0.648 -0.684 C\nC18 0 N M1 5.935 1.429 0.430 C\nO2 0 N M1 7.191 1.938 0.673 O\nC19 0 N M1 8.270 1.160 0.322 C\nC20 0 N M1 9.208 1.710 -0.550 C\nC21 0 N M1 8.471 -0.107 0.872 C\nC22 0 N M1 10.335 0.969 -0.910 C\nC23 0 N M1 9.599 
-0.846 0.512 C\nC24 0 N M1 10.527 -0.309 -0.383 C\nH1 0 N M1 -4.391 0.641 2.345 H\nH2 0 N M1 -4.639 0.198 0.647 H\nH3 0 N M1 -6.579 1.413 1.884 H\nH4 0 N M1 -6.828 -0.072 2.777 H\nH5 0 N M1 -5.960 0.622 -1.116 H\nH6 0 N M1 -7.551 0.356 -1.793 H\nH7 0 N M1 -6.737 2.695 0.034 H\nH8 0 N M1 -5.777 3.693 -3.102 H\nH9 0 N M1 -9.059 1.693 0.297 H\nH10 0 N M1 -9.379 2.361 -1.147 H\nH11 0 N M1 -3.697 -3.735 2.732 H\nH12 0 N M1 -3.389 -3.777 0.995 H\nH13 0 N M1 -2.679 -1.407 -0.087 H\nH14 0 N M1 -2.523 -0.063 1.044 H\nH15 0 N M1 -1.360 -3.301 3.236 H\nH16 0 N M1 -1.473 -4.733 2.214 H\nH17 0 N M1 -0.692 -1.100 2.229 H\nH18 0 N M1 -0.848 -3.404 0.221 H\nH19 0 N M1 0.361 -3.215 1.486 H\nH20 0 N M1 -1.654 -0.567 -3.770 H\nH21 0 N M1 2.048 1.670 -4.076 H\nH22 0 N M1 2.809 1.808 -2.549 H\nH23 0 N M1 2.861 1.617 1.877 H\nH24 0 N M1 4.080 -0.424 -1.731 H\nH25 0 N M1 5.182 2.403 2.201 H\nH26 0 N M1 6.372 0.376 -1.414 H\nH27 0 N M1 9.057 2.707 -0.953 H\nH28 0 N M1 7.758 -0.521 1.581 H\nH29 0 N M1 11.063 1.389 -1.598 H\nH30 0 N M1 9.757 -1.836 0.933 H\nH31 0 N M1 11.406 -0.884 -0.662 H\n#\nloop_\n_chem_comp_bond.atom_id_1\n_chem_comp_bond.atom_id_2\n_chem_comp_bond.comp_id\n_chem_comp_bond.pdbx_aromatic_flag\n_chem_comp_bond.pdbx_stereo_config\n_chem_comp_bond.value_order\nC1 Cov M1 N N SING\nCov SG M1 N N SING\nSG CB M1 N N SING\nCB CA M1 N N SING\nCA C M1 N N SING\nC OXT M1 N N SING\nC O M1 N N DOUB\nCA N M1 N N SING\nC1 C2 M1 N N SING\nC2 O1 M1 N N DOUB\nC2 N1 M1 N N SING\nN1 C3 M1 N N SING\nC4 N1 M1 N N SING\nC3 C5 M1 N N SING\nC6 C4 M1 N N SING\nC5 C7 M1 N N SING\nC7 C6 M1 N N SING\nC6 N2 M1 N N SING\nN2 C8 M1 Y N SING\nN3 N2 M1 Y N SING\nC8 C9 M1 Y N DOUB\nN4 C8 M1 Y N SING\nC10 N3 M1 Y N DOUB\nC9 C11 M1 Y N SING\nC9 C10 M1 Y N SING\nC12 N4 M1 Y N DOUB\nC10 C13 M1 N N SING\nC11 N5 M1 N N SING\nC11 N6 M1 Y N DOUB\nN6 C12 M1 Y N SING\nC13 C14 M1 Y N DOUB\nC15 C13 M1 Y N SING\nC14 C16 M1 Y N SING\nC17 C15 M1 Y N DOUB\nC16 C18 M1 Y N DOUB\nC18 C17 M1 Y N SING\nC18 O2 M1 N N SING\nO2 
C19 M1 N N SING\nC19 C20 M1 Y N DOUB\nC21 C19 M1 Y N SING\nC20 C22 M1 Y N SING\nC23 C21 M1 Y N DOUB\nC22 C24 M1 Y N DOUB\nC24 C23 M1 Y N SING\nC1 H1 M1 N N SING\nC1 H2 M1 N N SING\nCov H3 M1 N N SING\nCov H4 M1 N N SING\nCB H5 M1 N N SING\nCB H6 M1 N N SING\nCA H7 M1 N N SING\nOXT H8 M1 N N SING\nN H9 M1 N N SING\nN H10 M1 N N SING\nC3 H11 M1 N N SING\nC3 H12 M1 N N SING\nC4 H13 M1 N N SING\nC4 H14 M1 N N SING\nC5 H15 M1 N N SING\nC5 H16 M1 N N SING\nC6 H17 M1 N N SING\nC7 H18 M1 N N SING\nC7 H19 M1 N N SING\nC12 H20 M1 N N SING\nN5 H21 M1 N N SING\nN5 H22 M1 N N SING\nC14 H23 M1 N N SING\nC15 H24 M1 N N SING\nC16 H25 M1 N N SING\nC17 H26 M1 N N SING\nC20 H27 M1 N N SING\nC21 H28 M1 N N SING\nC22 H29 M1 N N SING\nC23 H30 M1 N N SING\nC24 H31 M1 N N SING\n#\n"
} |
@YoavShamir5 thanks for reporting, this is a bug I will fix. The workaround for now is to change the query sequence in position 101 from |
@Augustin-Zidek This does not sound like a good workaround, as the substitution to 'X' has to be made before the search phase and will thus influence the scoring. Better alternatives are either to change the residue to 'X' in the alignments (the search phase has to be done as a separate call) or to name the PTM with one of the CCD names in CCD_NAME_TO_ONE_LETTER that maps to the amino acid you are modifying. To be entirely honest though, it feels like a very straightforward bug to fix. |
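A rough sketch of the first alternative, masking the modified positions after the search has already run: the function name `mask_modified_positions` is made up for illustration, and it only rewrites a plain sequence string. How the masked query is written back into the stored alignments is left out and would depend on the MSA format in use.

```python
def mask_modified_positions(sequence, modifications, unknown="X"):
    """Replaces every residue listed in `modifications` with `unknown`.

    `modifications` uses the same keys as the input JSON in this thread:
    "ptmType" and a 1-based "ptmPosition".
    """
    residues = list(sequence)
    for mod in modifications:
        position = mod["ptmPosition"]
        if not 1 <= position <= len(residues):
            raise ValueError(f"ptmPosition {position} is outside the sequence")
        residues[position - 1] = unknown  # convert 1-based to 0-based index
    return "".join(residues)


# Toy example: run the search on the unmodified query, then mask position 5.
query = "GKNAC"
masked = mask_modified_positions(query, [{"ptmType": "M1", "ptmPosition": 5}])
print(masked)
```

The point of doing it in this order is that the search still sees the real residue, so the alignment scoring is not affected by the substitution.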
Thanks for providing the AF3 source!
To test AlphaFold3 using the example 7BBV provided by AlphaFold3 server, I used the following JSON file as input:
However, during the run_inference stage, the following error occurred:
Upon inspecting the error, I observed that:
Is it necessary for me to modify the input file, or is there a bug in the source code?
Thanks.