Foldseek clustering stuck for days with errors #390

@YFeriel

Hello,

I am clustering a protein structure database with Foldseek and ran into an issue with the following command:
foldseek cluster /data/foldseek/concat_db /data/db_clusters /data/cluster_tmp_dir

I received this error:
structurerescorediagonal /data/foldseek/concat_db /data/foldseek/concat_db /data/cluster_tmp_dir/4804289747088079168/pref /data/cluster_tmp_dir/4804289747088079168/pref_rescore1 --exact-tmscore 0 --tmsc>[=
Can not write to data file p
Can not write to data file /data/cluster_tmp_dir/4804289747088079168/pref_rescore1.29
...
Error: Rescore with hamming distance step died

Initially, I suspected it could be due to disk space or memory issues, so I added the --remove-tmp-files 1 option. While this resolved potential disk space concerns, the runtime increased significantly. The clustering process has now been running for over seven days and remains stuck at the same step:

structurerescorediagonal /data/foldseek/concat_db /data/foldseek/concat_db /data/cluster_tmp_dir/4804289747088079168/pref /data/cluster_tmp_dir/4804289747088079168/pref_rescore1 --exact-tmscore 0 --tmsc>[=

Could this error be related to disk space or memory limitations, or might it indicate a different issue? Also, are there any optimizations or alternative approaches you would recommend to reduce the runtime and avoid prolonged processing times like this?
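
To help rule disk space in or out, the free space on the filesystem holding the temporary directory can be checked before rerunning. Below is a minimal Python sketch. It assumes the temporary directory is the /data/cluster_tmp_dir path from the command above. The helper names and the threshold are hypothetical and not part of Foldseek.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_headroom(path: str, required_gib: float) -> bool:
    """Return True if at least `required_gib` GiB are free at `path`.

    `required_gib` is a hypothetical threshold chosen by the user; Foldseek
    does not report how much temporary space a clustering run will need.
    """
    return free_gib(path) >= required_gib
```

For example, `has_headroom("/data/cluster_tmp_dir", 500.0)` returns False when less than 500 GiB is free there. That would be consistent with the "Can not write to data file" messages, but would not by itself confirm the cause.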

Thank you for your help!
