Hi,
I have been studying your paper AgentEvolver and I find it to be a very impressive and valuable piece of work.
However, I encountered an issue while reproducing the code with the AppWorld environment. Memory usage on my node spiked to 1.4 TB, crossing Ray's 0.95 memory usage threshold and causing the worker to be killed.
It appears to be a memory leak. The logs show that two RemoteEnv processes each consumed over 500 GB of memory, which is abnormal.
Logs are shown below:
Memory on the node ... was 1406.36GB / 1480.00GB (0.950242)
Top 10 memory users:
PID MEM(GB) COMMAND
70473 536.45 ray::RemoteEnv.step <-- !!!
70355 533.25 ray::RemoteEnv.step <-- !!!
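To put the numbers above in proportion, here is a small sketch that parses the pasted log excerpt and computes how much of the used memory the two RemoteEnv processes account for. The `summarize` helper and its regular expressions are my own illustration and are not part of Ray or AgentEvolver; they assume the log format shown here.

```python
import re

# Log excerpt copied from this report (the "<-- !!!" markers are stripped).
LOG = """\
Memory on the node ... was 1406.36GB / 1480.00GB (0.950242)
PID MEM(GB) COMMAND
70473 536.45 ray::RemoteEnv.step
70355 533.25 ray::RemoteEnv.step
"""

def summarize(log: str):
    """Return (used_gb, total_gb, remote_env_gb, remote_env_share_of_used)."""
    # Node-level line: "... was <used>GB / <total>GB (...)"
    used, total = map(
        float, re.search(r"was ([\d.]+)GB / ([\d.]+)GB", log).groups()
    )
    # Per-process lines: "<pid> <mem_gb> <command>"
    procs = re.findall(r"^(\d+)[ \t]+([\d.]+)[ \t]+(\S+)", log, flags=re.M)
    env_gb = sum(float(mem) for _, mem, cmd in procs if "RemoteEnv" in cmd)
    return used, total, env_gb, env_gb / used

used, total, env_gb, share = summarize(LOG)
print(f"RemoteEnv processes: {env_gb:.2f} GB, {share:.1%} of used memory")
```

By this estimate the two processes together hold roughly 1070 GB, about three quarters of the memory in use on the node.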
Have you encountered this excessive memory consumption during your experiments? Could you please advise on which part of the code I should inspect to fix this leak?