`bazel coverage` fails when using certain Python packages #2575
Thanks for the report. I've spent a bit of time reproducing this and I am a bit stumped... I'm able to reproduce the failure, but I'm not finding any clues, even if I disable sandboxing and enable debugging. Do you have any other ideas, or have you noticed anything specific about the failures? I notice that these are more "complex" packages that may require a GPU. Is it all native-code packages where this occurs? Or is it only these specific torch packages? I imagine we don't have the same issue with pure Python packages?
Hi @groodt, thanks for looking into this.
Yeah, that's the tricky part. There aren't enough debugging messages in this part of python_bootstrap_template.txt. I'll need to look into this part further, though I'm still familiarizing myself with Bazel's internals.
From my latest investigation, it would seem that the issue boils down to certain modules being imported, e.g. it's triggered when something like the following is imported:

```python
from torchvision import utils
from torch.distributed.tensor.parallel.style import (
    ColwiseParallel,
    RowwiseParallel,
)
```

But when looking at such files, there's nothing peculiar about them (apart from maybe some CUDA stuff). I also tried creating the modules locally and importing them, but to no avail.
Yes, it would seem so. It appears PyTorch uses Bazel and even supports code coverage: https://pytorch.org/xla/master/contribute/bazel.html#code-coverage. However, it looks like it has some special handling for C++ code in its BUILD definitions.
Probably related: pytorch/pytorch#112903
Can we try to reproduce using some other native code? Perhaps pydantic? rules_python does not work well with packages that assume a site-packages layout, and at $dayjob we are carrying some patches for torch to work in rules_python. I'm wondering, but can't confirm, if this is related. It would be good to know the following:
No, I don't think it's related at all. Your reproduction in the repo is using torch as a PyPI dep, not as a Bazel dep.
Can confirm that these packages are all okay:
It would seem it has something to do with PyTorch's Dynamo; other packages import fine. For instance, if we have the following in test.py, then coverage works fine:

```python
import torch

def add(x, y):
    return x + y

add(torch.randn(10), torch.randn(10))
```

However, the failure appears if we add @torch.compile:

```python
@torch.compile
def add(x, y):
    return x + y
```

Moreover, we can also force the error by simply having:

```python
import torch._dynamo
```

Seems to be getting closer and closer to the issue. Will keep digging.
Very interesting! Starts to smell a bit funky. Perhaps some weirdness between how coverage.py works using trace functions and the way that torch.compile works. See the torch compiler deep dive and the PEP 523 Frame Evaluation API.
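As a rough illustration of that suspicion (my own sketch, not code from coverage.py or torch): coverage.py measures execution through a trace function registered with `sys.settrace`, while torch.compile/Dynamo hooks CPython frame evaluation via PEP 523 and can run transformed code that such a tracer never observes.

```python
import sys

# A minimal trace function, similar in spirit to what coverage.py installs.
def tracer(frame, event, arg):
    if event == "line":
        print(f"executing {frame.f_code.co_filename}:{frame.f_lineno}")
    return tracer

def add(x, y):
    return x + y

sys.settrace(tracer)
add(1, 2)          # every line of this ordinary Python frame is reported to the tracer
sys.settrace(None)

# Under @torch.compile, Dynamo intercepts frame evaluation (PEP 523) and may run
# rewritten/compiled code instead, so the tracer can see missing or surprising
# events for the original source lines.
```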
If we can't figure it out, it does seem like it's possible to provide exclusions in coverage.py: https://coverage.readthedocs.io/en/7.6.10/excluding.html# or by using something like `# pragma: no cover`. In fact, why is it attempting coverage on third-party code at all... I'm very surprised I can't find anything relevant in a Google search. So I'm guessing this problem doesn't exist when the code is run outside Bazel?
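For reference, a minimal sketch (assuming coverage.py's documented API rather than the exact configuration rules_python generates) of how third-party code could be omitted from measurement entirely instead of excluded line by line:

```python
import coverage

# Omit anything that looks like third-party code from measurement; the patterns
# are illustrative and would need to match the actual Bazel runfiles layout.
cov = coverage.Coverage(omit=["*/site-packages/*", "*/external/*"])
cov.start()

def add(x, y):      # stand-in for the first-party code under test
    return x + y

add(1, 2)           # first-party code still gets measured as usual

cov.stop()
cov.save()
cov.report()
```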
I've tried this, but it seems that if we try to turn off coverage.py for the affected imports, the error is still there:

```python
import torchvision  # pragma: no cover
```

Though I'm not sure if coverage.py is still being used here for some reason; will need to double check.
I think it has something to do with coverage.py's behavior (ref):
Yeah, it's quite Bazel-specific 😆 except for PyTorch's own way of building itself in Bazel: https://github.com/pytorch/pytorch/blob/main/BUILD.bazel
Well, this does hint at it being a possible issue:
I don't see anything relevant to coverage in the pytorch repo? Can you point out what you mean? There's nothing to indicate to me that they are running `bazel coverage`.
Sorry, it seems they have their own way of grabbing the coverage: https://github.com/pytorch/pytorch/tree/58cc6693cb4a3f63af7d05ccdae08588752f7cf0/tools/code_coverage.
Yes, they seem to have a plugin. And then in the torch XLA repo, they seem to have a bazel coverage setup, but that will be for their first-party code I think, not code pulled in as a third-party dep? You could still try a similar `instrumentation_filter` etc. I'm not sure that's the issue. https://github.com/pytorch/xla/blob/93a2ba6be67c9d22e81a2026b6cb35c993ead705/.bazelrc#L131
The torch_xla repo seems to exclude //tests from coverage, and that's the only place I can find any such usage.
We should also try to see if …
So after a bit more digging and tracing the code, it would seem that it's an issue with coverage.py (I can reproduce it outside of Bazel). There are two steps for Bazel to produce the lcov coverage data file for Python, and they can be distilled into two commands.
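Roughly, and assuming coverage.py's public API rather than the exact invocation in rules_python's generated bootstrap, those two steps amount to something like:

```python
import coverage

# Step 1: run the test under coverage.py and persist the collected data.
cov = coverage.Coverage()
cov.start()

def add(x, y):      # stand-in for the real test code
    return x + y

add(1, 2)

cov.stop()
cov.save()

# Step 2: convert the collected data into the lcov file Bazel expects.
cov.lcov_report(outfile="coverage.dat")
```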
I've confirmed that Step 1 returns a successful status code. However, Step 2 returns a non-zero status code.
The cause of this issue is not clear; even coverage.py's author is not aware of it (reference). However, it was recommended to append the `--ignore-errors` flag. I'll file another issue to coverage.py for this specific problem (see nedbat/coveragepy#1921). In the meantime, I'm proposing an escape-hatch solution in #2597.
It makes sense to me that coverage can't find the source for the JIT'ed code. That reference to …
This sounds familiar to a problem I ran into when doing some prototyping with venvs. If you look at how we invoke coverage, there's a couple of paths that get ignored (/dev, among others). I was tempted to enable ignoring errors, but since it was well known which paths needed to be ignored, I did that instead. Maybe we should just have ignore errors enabled by default? This is at least the second, probably more like the third, time coverage has broken because of something unrelated, where ignoring the errors would have been fine. Adding …
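For context, a rough sketch (my own, using coverage.py's documented API; not the actual code in rules_python's coverage integration) of what enabling error-ignoring at report time looks like:

```python
import coverage

cov = coverage.Coverage()
cov.load()  # assumes a .coverage data file from an earlier run exists here

# ignore_errors corresponds to `--ignore-errors` on the CLI and to
# `[report] ignore_errors = True` in a config file: files whose source can't be
# found (e.g. JIT-generated code) are skipped instead of failing the report.
cov.lcov_report(outfile="coverage.dat", ignore_errors=True)
```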
That's a better approach, I think, since it would prevent other possible issues. I tried this alternative, but it doesn't seem to be working inside Bazel: #2599 (comment)
…ests (#2607)

This ensures that un-executed files _(i.e. files that aren't tested)_ are included in the coverage report. The current behavior is that coverage.py excludes them by default. This PR configures source files via the auto-generated `.coveragerc` file. See https://coverage.readthedocs.io/en/7.6.10/source.html#execution:

> If the source option is specified, only code in those locations will be measured. Specifying the source option also enables coverage.py to report on un-executed files, since it can search the source tree for files that haven't been measured at all.

Closes #2599
Closes #2597
Fixes #2575

Co-authored-by: Ignas Anikevicius <[email protected]>
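As a sketch of the coverage.py behavior the PR relies on (my own example via the public API; the actual change writes the setting into the auto-generated `.coveragerc` instead):

```python
import coverage

# Equivalent in spirit to a .coveragerc containing:
#   [run]
#   source = <the test target's source directories>
#
# With `source` set, coverage.py scans those directories at report time and
# includes files that were never executed, instead of silently omitting them.
cov = coverage.Coverage(source=["my_package"])  # "my_package" is a placeholder
```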
🐞 bug report
Affected Rule
py_test
Is this a regression?
Not that I'm aware of.
Description
`bazel coverage` fails without any useful messages when some Python packages are being used/imported. Some examples:
- torchvision
- transformers.models.distilbert.DistilBertModel
🔬 Minimal Reproduction
See https://github.com/BurnzZ/bazel-python-coverage-issue for the full code example.
1. Open test.py and uncomment either of these lines (see the sketch below for roughly what the file looks like), then run bazel coverage. Notice that test coverage fails without useful context or error messages.
2. Run the tests and notice that the code runs successfully when tested.
3. Try commenting out the imports above and run bazel coverage again; it should work.

Curious as to why it doesn't work with a package like torchvision or some parts of transformers.
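For reference, a rough sketch of what the repro's test.py presumably looks like, reconstructed from the discussion above (the exact file and import lines live in the linked repository):

```python
# test.py (sketch; names and import paths are assumptions)
import torch

# Uncommenting either of these lines makes `bazel coverage` fail,
# while the test itself keeps passing:
# from torchvision import utils
# from transformers.models.distilbert import DistilBertModel

def add(x, y):
    return x + y

add(torch.randn(10), torch.randn(10))
```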
🔥 Exception or Error
🌍 Your Environment
Operating System:
Output of `bazel version`: (also tried 7.4.1)
Rules_python version: