[BUG] dpgen is stuck at the stage of "data stating... (this step may take long time)" #5117

@LingjunWu92

Bug summary

Dear Developers,

My dpgen run gets stuck at the stage "data stating... (this step may take long time)". It has sometimes stayed there for as long as 20 hours. I don't know what would happen after that, because I killed the job at that point. While it is stuck, no output is produced and no errors are reported. The init_bulk stage, however, works fine. I have also tried the examples downloaded from the deepmd website, and they ran successfully in both the init_bulk stage and the run stage, so I don't understand why my own task hangs at the run stage. I have attached my input and output data. Could you please give me some advice on what might be going wrong? Thank you so much.

Best wishes,
Lingjun

DeePMD-kit Version

3.1.1

Backend and its version

TensorFlow 2.19.1

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

issue.zip

The attached zip file includes all related files for the dpgen task.

Steps to Reproduce

I ran "dpgen init_bulk param.json machine.json" for the init_bulk stage and "dpgen run param.json machine.json" for the run stage. The param.json and machine.json files for each stage are included in the zip file attached above.
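
For anyone who wants to reproduce the hang without waiting many hours, the commands above could be wrapped in a wall-clock timeout roughly as follows. This is only a sketch under stated assumptions: `run_with_timeout`, the one-hour limit, and the log file name are hypothetical and are not part of dpgen or DeePMD-kit. It only terminates the process it launches directly, so anything that process has spawned may need to be cleaned up separately.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, log_path):
    """Run cmd with stdout and stderr written to log_path.

    Returns the exit code, or None if the command was still running
    after timeout_s seconds (in which case it is killed).
    """
    with open(log_path, "w") as log:
        try:
            done = subprocess.run(
                cmd,
                stdout=log,
                stderr=subprocess.STDOUT,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child before raising this.
            return None
    return done.returncode


# Hypothetical usage with the command from this report
# (the 3600 s limit and "run.log" are arbitrary choices):
# code = run_with_timeout(
#     ["dpgen", "run", "param.json", "machine.json"], 3600, "run.log"
# )
# if code is None:
#     print("still running after the limit; see run.log for the last output")
```

A return value of `None` together with a log whose last line is the "data stating..." message would match the behaviour described here.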

Further Information, Files, and Links

No response


Labels: bug, reproduced (This bug has been reproduced by developers)
