v0.6.x: "API error (500): Could not kill running container" "container 8d67c83751e6 PID 19180 is zombie and can not be killed" #856
Comments
I ran into the exact same issue yesterday with Colima 0.6.0 and Lima 0.18.0 on an M1 Pro running Sonoma (14.1.1). I have a default profile that uses vz/virtiofs, and I created an additional test profile using qemu/sshfs (I wanted to see whether some hangs on project start are exclusive to vz or happen on qemu as well). On one of my project stops in DDEV on the qemu/sshfs profile, I ran into the exact same error @rfay ran into. The odd detail with |
Thanks for reporting this. Does it freeze, or does it only display the message before stopping the container? |
It does not freeze, and doing another |
Also no freezing on my end, and yes, the message was displayed. Even though it was called a zombie, it didn't behave like one (except the container would still be present, invisible to
After the error took place, I tested:
I then started the same project again, and after the successful start I did another:
The exited container was removed and there was a new ddev-traefik-router container running instead. |
One additional note: the error just happened for the first time on vz and virtiofs for me, when I wanted to shut down my computer for the day. Until now I had run into it only on qemu and sshfs. |
We're still seeing this in DDEV automated tests: https://github.com/ddev/ddev/actions/runs/6866895347/job/18674123089#step:13:358 That's Colima v0.6.2 on amd64, qemu, sshfs, macOS 12. Changing the tests now to use macOS 13. (macos-latest is |
Getting the same failure now using macos-13 in github actions, https://github.com/ddev/ddev/actions/runs/6870719728/job/18686158079?pr=5540#step:13:842
I haven't had it happen to me on my own machine lately so haven't been able to check to see if there are leftover dead containers when this happens. |
@rfay is it the reason for the github action failure? If yes, is there anywhere I can run or reproduce that locally? Or is there any other |
This happens sporadically but quite often. Just |
I'm working on a script to demonstrate this.
Here's the ddev-router (nginx-proxy router) docker inspect: `API error (500): Could not kill running container 20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6, cannot remove - container 20e8242d350c PID 107081 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes`
|
Here's a script that demonstrates the problem, at least with qemu (mac m1). https://gist.github.com/rfay/60a1e8d9112d178f6a0e86df027926aa |
I note that the failure here hasn't been observed on any other Docker provider: not previous Colima, OrbStack, Docker Desktop (Mac or Windows), docker-ce, etc. I only have speculation about what it could be. I looked in the moby/moby queue to see if there was anything that might have been fixed after Docker 24.0.5, but I'm not finding it. And we used 24.0.5 plenty when it was current. |
Also experiencing this on an Intel-based Mac running macOS Sonoma.
The containers are not removed when the error occurs; running the command a second time does remove them. |
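Since a second attempt reportedly succeeds, a teardown script could retry the removal as a stopgap. This is only a minimal sketch: `remove_with_retry` and `docker_rm` are hypothetical helper names, the attempt count and delay are arbitrary, and nothing here reflects what DDEV or Colima actually do.

```python
import subprocess
import time
from typing import Callable


def docker_rm(container: str) -> subprocess.CompletedProcess:
    """Force-remove a container via the docker CLI (assumes docker is on PATH)."""
    return subprocess.run(
        ["docker", "rm", "-f", container],
        capture_output=True,
        text=True,
    )


def remove_with_retry(
    container: str,
    remove: Callable[[str], subprocess.CompletedProcess] = docker_rm,
    attempts: int = 2,
    delay: float = 1.0,
) -> bool:
    """Hypothetical helper: retry removal, returning True once a call exits 0."""
    for attempt in range(attempts):
        result = remove(container)
        if result.returncode == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # brief pause before the next attempt
    return False
```

The `remove` parameter is injectable so the retry logic can be exercised without a Docker daemon. A retry only hides the symptom; it says nothing about why the first removal fails.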
I updated the script above to capture the pids before the failure point. Partial output:
We see that the pid it's complaining about is 14012, which is traefik, inside the ddev-router container (which is the one that is failing). It does have a parent pid, 13992, which is containerd-shim-runc. AFAICT it's not a zombie then. And I see that several other processes at this time (in other containers, which end up being stopped successfully) are set up exactly the same:
|
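One way to double-check whether a PID really is a zombie is to read its state from `/proc/<pid>/stat`, where the kernel reports `Z` for zombies. The sketch below assumes it runs inside the Colima VM (for example after `colima ssh`), where the container's processes are visible; `parse_stat` and `is_zombie` are illustrative names, not part of any tool mentioned here.

```python
from pathlib import Path
from typing import Tuple


def parse_stat(stat_line: str) -> Tuple[str, int]:
    """Return (state, ppid) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    fields = stat_line.rsplit(")", 1)[1].split()
    return fields[0], int(fields[1])


def is_zombie(pid: int) -> bool:
    """True if the kernel reports state 'Z' for this PID (Linux only)."""
    state, _ppid = parse_stat(Path(f"/proc/{pid}/stat").read_text())
    return state == "Z"
```

For the traefik process above, a state other than `Z` together with parent PID 13992 would support the reading that it is not actually a zombie.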
I don't think we would have expected otherwise, but v0.6.3 doesn't change the behavior and the script is easily able to demonstrate the problem. |
I note that Colima is using the Ubuntu 23.10 docker packages instead of the ones from the Docker repository, https://docs.docker.com/engine/install/ubuntu/ - I wonder if that could make any difference? |
I had the script running for about an hour and could not see the error :( But I've encountered it a few times in other places without being able to pinpoint or reproduce it.
I would give it a try and see. |
I did try installing docker from their repo, but on |
Can you try with the current development version?
|
I ran colima HEAD through 110 iterations of the breakit.sh script without failure. It was a new colima profile, which makes the test perhaps a little questionable, but it sure looked good. Running full DDEV test suite now in ddev/ddev#5549, that wasn't able to succeed in v0.6.*, so fingers crossed now! |
Looks like the test suite passed 🎉 |
Yay, full test pass, the full DDEV test suite, using colima HEAD, and it hadn't passed since v0.6.0. Thanks! I'm a bit baffled why the normal Ubuntu build would have trouble, but docker-ce keeps up better and is pretty well maintained. https://github.com/ddev/ddev/actions/runs/6914246048?pr=5549 |
Description
With DDEV I've never seen this before in previous versions of Colima, but I've now seen it and it's been reported by others. On `ddev stop`: `API error (500): Could not kill running container 8d67c83751e67ee98571cb7c2b64794c30419ef9c9496791d34876dbd040dda3, cannot remove - container 8d67c83751e6 PID 19180 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes`
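For anyone triaging this in test logs, the short container ID and PID can be pulled out of the message so they can be compared against `docker ps -a` and the process table. A minimal sketch, assuming the message keeps the wording quoted above; `parse_zombie_error` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Matches the "container <12-hex-id> PID <n> is zombie" part of the message.
ZOMBIE_ERROR = re.compile(
    r"container (?P<short_id>[0-9a-f]{12}) PID (?P<pid>\d+) is zombie"
)


def parse_zombie_error(message: str) -> Optional[Tuple[str, int]]:
    """Return (short container ID, PID) if the message is the zombie error."""
    match = ZOMBIE_ERROR.search(message)
    if match is None:
        return None
    return match.group("short_id"), int(match.group("pid"))
```

Messages that do not contain the zombie wording yield `None`, so the helper can be used as a filter over mixed log output.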
Version
Colima Version: 0.6.1
Lima Version: 0.18.0
Qemu Version:
Operating System
Output of `colima status`:
INFO[0000] colima is running using QEMU
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: sshfs
INFO[0000] socket: unix:///Users/rfay/.colima/default/docker.sock
Reproduction Steps
It doesn't happen every time, but does happen periodically on `ddev stop`, which is mostly the equivalent of `docker stop` followed by `docker rm`.
Expected behaviour
I didn't see this ever before.
Additional context
No response