pytest-xdist interop: signal only works in main thread #8
Comments
Original comment by Floris Bruynooghe (Bitbucket: flub, GitHub: flub). Oops, apologies for missing the original report! It seems execnet must have changed where xdist's DSession is run, so it no longer runs in the main thread. I'm not immediately sure what to do about this, but I think you should be able to work around it by using |
Original comment by Karthik Borkar (Bitbucket: [Karthik Borkar](https://bitbucket.org/Karthik Borkar), ). Any update on this issue? hitting this regularly |
Original comment by Karthik Borkar (Bitbucket: [Karthik Borkar](https://bitbucket.org/Karthik Borkar), ). @flub From https://pypi.python.org/pypi/pytest-timeout the |
Original comment by Floris Bruynooghe (Bitbucket: flub, GitHub: flub). But it is the most reliable way of using pytest-timeout. pytest-timeout aims to be a last-resort attempt at finding problems that you otherwise would just be stuck with and not get any useful info from. It's kind of misleading to think you can still continue your test run after a timeout occurred. And I do wonder if the default method is wrong because of this. |
Original comment by Miguel Sánchez de León Peque (Bitbucket: Peque, GitHub: Peque). I am getting the same kind of errors when using https://travis-ci.org/Peque/osbrain/jobs/197634358 Using |
Original comment by Bulat Gaifullin (Bitbucket: bgaifullin, GitHub: bgaifullin). the issue affects me too. |
Original comment by Floris Bruynooghe (Bitbucket: flub, GitHub: flub). So currently pytest-timeout and pytest-xdist seem to work together fine. I ran this:
And the timeouts work. So I'll close this unless someone reopens with a reproducible example. Thanks |
Original comment by Darius Lapūnas (Bitbucket: DariusL, GitHub: DariusL). Can still reproduce this. With pytest < 3.3 it would happen nearly every run, now it happens every fifth run or so.
Full output:
Keep in mind, I'm not actually triggering the timeout, it's much larger than the test can reasonably take. Runs fine without pip list:
Running on Ubuntu 16.04. I have not noticed this issue on Windows or Mac. |
Original comment by Floris Bruynooghe (Bitbucket: flub, GitHub: flub). Sorry, with your sample file in a loop I still can't reproduce this. I did a Have you tried this in a complete clean environment? Or is there anything else hidden, like a config file or so (as otherwise pytest-forked wouldn't do anything) |
Original comment by Darius Lapūnas (Bitbucket: DariusL, GitHub: DariusL). Ok, so I tried redoing everything from scratch, these are my steps:
It failed on the second run. I also checked I'm not sure what other system information could be relevant. I'm seeing this both on my Ubuntu 16.04 VM (running in Parallels on a Mac) and on our Jenkins servers, also Ubuntu 16.04. |
Original comment by Miguel Sánchez de León Peque (Bitbucket: Peque, GitHub: Peque). I too think this should be reopened. See: https://travis-ci.org/opensistemas-hub/osbrain/jobs/379614919 With:
That is the result of trying to reintegrate |
Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne). I am not sure that pytest-timeout has anything to do with this directly. FWIW I am getting the same kind of errors in scancode-toolkit test runs, but ONLY on Travis/Linux and not always: roughly every other run, like others have reported here, and re-running the failed run makes the issue go away. See https://travis-ci.org/nexB/scancode-toolkit/jobs/458977684#L1189 The tests I run are using pytest and xdist AND the underlying code supports timeout-based interruption in multiprocessing. It could well be that the code I used for timeouts makes the same calls as pytest-timeout, which could explain why I am facing this too. In any case, I thought posting here could help |
Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne). ok, my timeout handling code is functionally the same set of calls that pytest-timeout uses. So it makes sense to experience the same issues. |
Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne). Now the interesting part is that pytest-xdist does not seem to be part of the problem either, as Darius said above: https://bitbucket.org/pytest-dev/pytest-timeout/issues/8/pytest-xdist-interop-signal-only-works-in#comment-45439827 So this seems to boil down to an issue between pytest and timeout signals |
Any news about this? |
Not sure if this will help or not, but I found some python bugs related to this: I don't have knowledge enough to figure out how to fix this, but if someone could give me some direction I could try to submit a PR for this. |
I am facing this issue too |
I've re-read this whole thread again. Most of these complaints seem to me like some other environments which are making pytest run in not the main thread and/or other signal handling from the application interfering. I'm going to close this again as I don't think this is overly useful currently. |
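For readers unfamiliar with the underlying restriction, here is a minimal sketch of the error named in the issue title. It only shows that CPython refuses to register a signal handler from a non-main thread; it does not involve pytest, xdist or execnet. `try_register_handler` is a hypothetical name, and SIGINT is used only because it exists on every platform.

```python
import signal
import threading


def try_register_handler(results):
    """Try to install a signal handler and record what happened."""
    try:
        # Registering a handler is only allowed from the main thread.
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as exc:
        results.append(f"ValueError: {exc}")


results = []
worker = threading.Thread(target=try_register_handler, args=(results,))
worker.start()
worker.join()
print(results[0])
```

The failure happens at registration time, before any timeout fires, so a test run can hit it even when no test is slow.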
I also encountered this problem with the combination of a CI, xdist and timeout. Signals crashed regularly and threads left the test run stuck at 100%. So in the script my CI worker was executing, I set xdist to use all but one core. This left one core/thread idle, which I hoped the OS could use for whatever else it needed, ensuring that all the test runners are undisturbed and can each claim their own main thread. It has worked so far. For linux: EDIT: It dramatically reduced the frequency, but the error does still occur. |
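A minimal sketch of the "all but one core" idea, assuming the worker count is passed to pytest-xdist through its `-n` option. `xdist_worker_count` is a hypothetical helper, not the command used in the comment above, and as noted there the approach only reduces the frequency of the error.

```python
import os


def xdist_worker_count(total_cores=None):
    """Return the number of xdist workers, leaving one core idle."""
    if total_cores is None:
        # os.cpu_count() may return None when the count is unknown.
        total_cores = os.cpu_count() or 1
    # Always run at least one worker, even on a single-core machine.
    return max(1, total_cores - 1)


print(f"pytest -n {xdist_worker_count()}")
```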
I know this is an old thread, and appreciate the unhelpfulness of "me too" reports, but we just ran into this several-CI-runs-in-a-row, with basically the same stacktrace as the original (Jenkins on If it's just happening for tests that have timed out then it's not much of a problem, it would have failed anyway. But the stacktrace looks like it's a problem registering the handler, rather than executing it? I seem to be able to quite reliably reproduce with a clean environment via Docker:
from ubuntu:latest
RUN apt update && apt install -y python3 python3-pip && pip3 install pytest pytest-xdist pytest-timeout
RUN printf '\
from time import sleep\n\
import pytest\n\
@pytest.mark.timeout(10000)\n\
@pytest.mark.parametrize("value", range(0, 20))\n\
def test_timeout(value):\n\
    sleep(5)\n\
    pass' >> t.py
RUN echo "set -e; while true; do pytest t.py -v -n20; done" > dotest.sh
This seems to reproduce it pretty reliably, every time:
Where |
Some more notes:
import os
import pathlib
import threading
import traceback

# Write one log file per process/thread, recording whether we are in the main thread.
pathlib.Path(f"{os.getpid()}-{threading.current_thread().ident}.log").write_text(f"""
{"GOOD" if threading.current_thread() == threading.main_thread() else "BAD"}
{threading.current_thread().ident=}
{threading.main_thread().ident=}
{os.getpid()=}
{"".join(traceback.format_stack())}
""")
and the tracebacks are identical for good & bad except the bad instances don't have the stack before
which, reading https://github.com/pytest-dev/execnet/blob/master/execnet/gateway_base.py#L267 - seems to indicate that sometimes this is called explicitly as a primary thread, and in some combination of cases I'm not yet certain on, execnet spawns a worker as a non-primary-thread. I'm not sure yet if this is a problem with |
So, ndevenish@bf39d0a fixes this issue, in the most trivial way - it falls back to If this was the route to take - and I suspect it's significantly simpler than finding the cause in xdist/execnet - then maybe a third "auto" mode (or debug "signalwithfallback") could be added and explicitly used by the people who seem to be running into this more often. I'm not sure if this thread is monitored so if not, I'll make a PR to start the conversation at some point. |
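The "auto" idea could be sketched roughly as follows. This is not the code from the commit referenced above and not pytest-timeout's implementation; `pick_timeout_method` is a hypothetical helper that only illustrates choosing the thread method when signal handlers cannot be registered.

```python
import threading


def pick_timeout_method(requested="signal"):
    """Return the requested method, or "thread" if signals are unusable here."""
    in_main_thread = threading.current_thread() is threading.main_thread()
    if requested == "signal" and not in_main_thread:
        # Signal handlers can only be installed from the main thread.
        return "thread"
    return requested


# Call the helper from a worker thread to exercise the fallback path.
chosen = []
worker = threading.Thread(target=lambda: chosen.append(pick_timeout_method()))
worker.start()
worker.join()
print(chosen[0])
```

The alternative raised later in the thread, erroring out with a helpful message instead of switching silently, would replace the `return "thread"` branch with a raised exception.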
Hi @ndevenish, thanks for the explorations and additional info! The commit you point to is very interesting indeed. I'm a bit surprised that threaded mode can hang with xdist; that'd be interesting to understand better as well. Certainly do take this to a PR. At first glance I'm a bit torn on automatically switching to threaded vs just erroring out and forcing someone to decide this explicitly. But it would be nice to improve usability somehow and you found a pretty promising way I think. (Also, apologies, I'm not always the most active; responses can take a while, as you just noticed.) |
I have no expectations for reply times for open source projects! I'll try to pull a PR together, for discussion if nothing else. |
@flub this should be fixed (mostly) on execnet==1.8.0 see also pytest-dev/pytest-xdist#620 |
Ah, that's good to hear. I was settling on "Error, but with a more helpful suggestion" as the least-worst way to "fix", and didn't want to dig into execnet. Evidently it annoyed someone further than it annoyed me :) |
@graingert oh, that's very nice work you did! Might finally fix this issue properly and would mean we don't need a workaround here. |
Yes, we've gotten ever since using xdist + timeout together. It's fairly random:
|
@flub Can you close this? |
I guess so, pytest-timeout now recognises this situation and will refuse to use the signal timeout method, falling back to the thread method. |
Original report by Buck Evan (Bitbucket: bukzor, GitHub: bukzor).
I'm getting this kind of error with increasing frequency as I add tests to my project.
Full detail: