I run a RayCluster (Ray 2.39.0) using KubeRay (1.2.2) and submit many jobs to it. I found that many zombie processes are left behind after the jobs finish.
The zombie processes cause some psutil methods to run very slowly.
Each submitted job leaves 2 zombie processes. In more detail: when I submit a job, a JobSupervisor starts on the head node to manage the job. The JobSupervisor (pid=152424) runs 2 subprocesses:
/bin/bash -c python numpy-cpu-job-actor.py, pid is 152834
/bin/bash -c while kill -s 0 152424; do sleep 1; done; kill -9 -152824, pid is 152836
When the job finishes, 152424 and 152834 exit, but 152836 and its subprocess are left as zombies: 1) [sh] <defunct>; 2) [sleep] <defunct>
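For anyone trying to reproduce this, here is a minimal sketch for listing zombie processes together with their parent PID. A zombie is a process that has exited but has not yet been reaped by its parent, so the parent PID shows which process is failing to wait on it. The sketch reads /proc directly instead of using psutil, and assumes a Linux head node. The helper names parse_stat and list_zombies are made up for illustration and are not part of Ray or psutil.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    fields = stat_text[right + 1:].split()
    # After the command name come the state letter and the parent PID.
    return pid, comm, fields[0], int(fields[1])


def list_zombies(proc_root="/proc"):
    """Return a list of (pid, comm, ppid) for every process in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process disappeared or the entry was unreadable
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in list_zombies():
        print(f"zombie pid={pid} name={comm} parent={ppid}")
```

Running this on the head node before and after submitting a job should show whether the defunct sh and sleep entries appear, and which PID is their parent.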
Code of numpy-cpu-job-actor.py is
And the submit command is
ray job submit --working-dir . -- python numpy-cpu-job.py
I’m wondering whether I did something wrong that caused this, or whether this is a bug in Ray itself.