
docs: Add outline of unit testing recommendations #619


Draft: lundybernard wants to merge 4 commits into main from docs/testing

Conversation

@lundybernard commented Jul 12, 2025:

This PR contains an outline of a proposed Development Guide / Principles / Unit Testing Recommendations section.

I am opening this PR now to allow discussion of the outline and key points before fleshing out the complete page.


📚 Documentation preview 📚: https://scientific-python-cookie--619.org.readthedocs.build/

@henryiii (Collaborator) commented Jul 13, 2025:

Overall, looks good, just one issue that can be solved mostly by moving stuff around. I do not think we should push anyone toward unittest over pytest for unit tests. The reasons:

  • pytest is the most popular framework. unittest is mostly there so CPython can test itself, and has a few specific uses it's good for. It's particularly bad for scientific work.
  • Keeping the tests as easy to write as possible means more tests are written. unittest has a ton of boilerplate.
  • Fixtures, parametrization, etc. are all much better in pytest than in any xUnit-style framework.
  • Installing a dependency to run unit tests is fine. Developers run unit tests. Developers can install dependencies.

That last point is important: unit tests are for developers to develop the code. End users do not need to run unit tests. Once the units work correctly, then you don't need to verify that again.

However, I think we should instead emphasize a new category of tests, the one we mentioned (I don't remember your term, "smoke test" is what I thought of). That's a separate category from unit tests, and for that one you don't want dependencies; it's a great place to use unittest. So:

  • Integration tests: runs the entire app. Smaller projects often put them in /tests, larger projects might place them in a separate repository. Uses pytest or a command line runner. Can be slow.
  • Unit tests: verifies the various parts work individually. Usually in /tests. Uses pytest. Should be reasonable to run often by the developer.
  • Smoke tests: should be in the package, and runnable from the package. Uses unittest. Most common in compiled packages. Should verify the build works, and any architecture-specific behavior works. Should be very fast.

So I'd basically recommend moving the unittest parts to a new section about smoke tests; the actual content is fine, just I think it needs reordering.

(And replace "smoke" with whatever you mentioned today, I've just forgotten what it was. Validity? Verification? ...)

@@ -0,0 +1,94 @@
## Unit Tests
Collaborator:

As mentioned above, I'd make this three sections. One short one about Integration testing. I think we could discuss ideas of cross-package integration testing there, like how to test that downstream packages still work, etc. Half of the below would go under "Unit tests", and the other half under the new "smoke" tests.

Comment on lines 12 to 13
- If the test requires excessive setup, the unit may be dependent on too many
external variables.
Collaborator:

Suggested change
- If the test requires excessive setup, the unit may be dependent on too many
external variables.
- If the test requires excessive setup, the unit may be dependent on too many
external variables. You might benefit from refactoring the code.

I think it would help here to make sure the reader knows the problem is in the code ("unit" here), not the test. That's clear in the last point, but I'd make sure it's obvious here too. Maybe this could be repeated in every bullet point to drive it in.

- It looks like `unittest discover` cannot find TestCase classes within source
files.

- Test files should be named `test_{{file under test}}.py`, so that stdlib
  unittest can find them easily.

Collaborator:

I'd drop the mention of unittest. The naming is to help the user know what is being tested. Maybe "Test frameworks often require "test" in the file name" - you could keep this framework agnostic, and instead discuss frameworks in a separate section.

- test\_.py files should match your source files (file-under-test) one-to-one,
and contain only tests for code in the file-file-under test.
Collaborator:

file-file-under? Also, test_*.py looks better, I think. I've seen people try to name a test file test.py. Doesn't end well. :)

Comment on lines 32 to 70
- Consider using the stdlib `unittest.TestCase` and other stdlib tools instead
of pytest.
- Allows running unit tests for diagnostics in production environments,
without installing additional packages.

Collaborator:

This is exactly true under the smoke test section. Not under unit tests!

Comment on lines 40 to 41
- Test single units of code! A single Function, or a single attribute or method
on a class. Not the interactions between them. (see mocking and patching)
Collaborator:

I don't think we should tell people not to test something. Instead I'd recommend something like "If you have two units, such as classes, it is better to test them individually, using mocking and patching, instead of testing both in a single test. This makes refactoring easier, helps you understand what is required between classes (or functions, etc), and will correctly tell you which part is failing if one breaks."

(Specifically, if it's hard to test one class without the other, it's better to have the test with both classes than no test at all, which is what the current version sounds like!)
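
A rough sketch of what that recommendation can look like in practice; `mypackage.report` and `load_records` are hypothetical names used only for illustration:

```
# Hypothetical source, mypackage/report.py:
#
#     from .db import load_records
#
#     def summarize():
#         return len(load_records())

from unittest.mock import patch


def test_summarize_counts_records():
    # Patch the collaborator so summarize() is exercised in isolation; if this
    # test fails, the problem is in summarize(), not in the database layer.
    with patch("mypackage.report.load_records", autospec=True) as load:
        load.return_value = ["a", "b", "c"]
        from mypackage.report import summarize

        assert summarize() == 3
```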

@henryiii commented Jul 13, 2025:

Also, I think this is easy to misinterpret. For example, if you have this simplified example:

import dataclasses


@dataclasses.dataclass
class Vector:
    x: float
    y: float

def mag2(vec: Vector) -> float:
    return vec.x**2 + vec.y**2

If you took this at face value, that would mean you'd need to mock a Vector class to test mag2; they are two units. I think it would be good to clarify "interactions between them", or give an example, or mention that things like the above are fine. For the above case, you'd want to test Vector on its own, then it's fine to use it to test other units that use it.

Author:

This is a good example where developer discretion is important.

The pedantic answer is:
for the mag2 unit test, the input Vectors should be mocked.
Assert that the vec input is used as expected, and that the returned value is the sum of the results

vec = MagicMock()
ret = mag2(vec)

vec.x.__pow__.assert_called_with(2)
vec.y.__pow__.assert_called_with(2)
vec.x.__pow__.return_value.__add__.assert_called_with(vec.y.__pow__.return_value)
assert ret is vec.x.__pow__.return_value.__add__.return_value

In this case, this is obviously silly.

  • Vector is available to us in the file-under-test, so it is reasonable to use it in the unit test.
  • Excessive mocking makes our test hard to read, which is a code smell... maybe a different implementation would be better.
  • mag2 is reliant on the behavior of Vector, which could lead to trouble, but Vector is a simple dataclass using all builtin types. This pushes our test case more into the realm of an integration test, checking the interaction between Vector and mag2.

We do not need to isolate mag2 from the Vector implementation unless we support multiple Vector implementations, which could produce different results for the same x, y values.

A simple unit test like

vec = Vector(1,2)
assert mag2(vec) == 5

does make our intention clear.

Collaborator:

That's what I meant; it's not a hard and fast rule, so it's better to explain why it's important (isolating interactions, avoiding side effects, file access, etc.) so the developer can use discretion. Most data classes (the style, not necessarily from dataclasses.dataclass) don't need to be mocked, while classes with non-trivial inits, file access, etc. are better mocked.

Comment on lines 51 to 52
- This ensures that the test file always imports from its source file, and
does not accidentally import from an installed module.
@henryiii commented Jul 13, 2025:

NEVER DO THIS! Only, only test the installed modules. Users install from pip/conda, etc. They do not download your repository and run files from there unless they are you. That's why we require the /src structure, to ensure no one ever accidentally skips the packaging install when developing, so everything behaves just like what users see. Packages with binaries literally won't work if you do this. The importlib tools won't work. Etc.

(However, this is fine for the "smoke test" idea, since those actually live inside the package)
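
For example, with the src layout a unit test just imports the package by its installed name; a sketch reusing the `Vector`/`mag2` example from above (`mypackage` is a hypothetical project name):

```
# tests/test_vectors.py
# Imports the *installed* package (e.g. after an editable install), never
# files addressed relative to the repository checkout.
from mypackage.vectors import Vector, mag2


def test_mag2():
    assert mag2(Vector(1, 2)) == 5
```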

Author:

Quick explanation of where this suggestion came from, and why I agree that it should not be recommended in this context:

The motivation is to test the Python source in complete isolation, allowing us to pull the repo and run tests without any dependencies...
It's a micro-optimization for CI, which may save you from installing dependencies when an error is found in the code.

In retrospect, this is a bit silly:

  • It would force excessive mocking of dependencies.
  • The time & bandwidth cost of creating Python environments is less of a concern today.
  • I think the added complexity out-weighs any advantages.

I suggest adding a section to the Intro to development section to describe the development workflow.
Something like:

  1. create or clone the repository
  2. create a virtual environment
  3. build and install the package
  4. run unit tests (and/or diagnostic tests) to verify the install
  5. run the full test suite
  6. make code changes
  7. rebuild binaries if needed
  8. run unit tests (to fail fast)
  9. run the full test suite
  10. commit and push updates
  11. make coffee while CI runs

Collaborator:

I agree this needs to be expanded and worked on, but I've not been very sure where. There's https://learn.scientific-python.org/development/tutorials/dev-environment/, which does a bit of this, but needs work (and doesn't include the whole workflow). It also should show the modern workflow, which is just:

uv run pytest

Which does steps 2, 3, and 4 all at once for you (and steps 7 and 8 for you if rerun).

And uv-based CI for simple projects doesn't give you time for coffee. A full job can run in 10-15 seconds. :)

Collaborator:

Oh, and the reason I'm not sure where (since that's pretty much a perfect spot) is because there are now other nearly identical pages:

Comment on lines 55 to 58
- If your source runs `from pandas import DataFrame` and you need a DataFrame
for a test, import it `from ..file_under_test import DataFrame` NOT
`from pandas import DataFrame` in the test file.
- This has some important implications for mocking/patching to elaborate on.
Collaborator:

Usually, no. These imports are private; if you replace `from pandas import DataFrame` with `import pandas`, other files should not start breaking, including your tests. I'd instead move this to the mocking section and mention it as a mocking trick.
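
As a sketch of that mocking trick, with a hypothetical `mypackage.frames` module: patch the name where the code under test looks it up, not where it is defined.

```
# Hypothetical source, mypackage/frames.py:
#
#     from pandas import DataFrame
#
#     def empty_frame():
#         return DataFrame()

from unittest.mock import patch


def test_empty_frame_builds_a_dataframe():
    # Patching "pandas.DataFrame" would miss the name already bound inside
    # mypackage.frames, so patch it where it is used.
    with patch("mypackage.frames.DataFrame", autospec=True) as df_cls:
        from mypackage.frames import empty_frame

        assert empty_frame() is df_cls.return_value
```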

Comment on lines 62 to 69
- Pytest is great for running tests in your development environments!
`pytest {{path/to/source}}`
- stdlib's unittest can be used in environments where pytest is not available:
- To use unittest to run tests in your source folder, from your package root,
use
`python -m unittest discover --start-folder {{source folder}} --top-level-directory .`
- To use unittest to run tests from an installed package (outside of your
source repository), use `python -m unittest discover -s {{module.name}}`
Collaborator:

I'd split this - if you use pytest to write the tests, you need to use it to run the tests, so not needed for the unit test section. But we do need a section to explain how to nicely set up the smoke tests to run. I can try to set this up somewhere and report back later on what seems to work well. numpy.test() might be a good example of what I'm thinking for smoke tests (I know that's really the whole unit test suite, but I think there's no need to put the entire unit test suite inside every wheel, just the validity tests).
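
One possible shape for that, sketched with a hypothetical `mypackage`: a small `test()` entry point inside the package that discovers only the tests shipped with it, in the spirit of `numpy.test()` but limited to the smoke/validity tests.

```
# mypackage/__init__.py (excerpt, hypothetical)
import unittest


def test(verbosity: int = 1) -> bool:
    """Run the smoke tests bundled with the installed package."""
    # Same discovery as `python -m unittest discover -s mypackage`.
    suite = unittest.TestLoader().discover("mypackage")
    result = unittest.TextTestRunner(verbosity=verbosity).run(suite)
    return result.wasSuccessful()
```

A deployment could then be checked with something like `python -c "import mypackage; mypackage.test()"`, with no extra test dependencies installed.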

- Verify that any value returned from the external unit is utilized as expected.

```
@patch(f'{SRC}.otherfunction', autospec=True)
```
Collaborator:

I'd show how to use pytest mocking here instead of unittest, or maybe better yet stay framework agnostic and just convey the ideas; the guide page on pytest already covers a bit of mocking and we can put more there.
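
For comparison, the same patch written with the pytest-mock plugin's `mocker` fixture; `SRC` and the module names are hypothetical, as in the snippet above, and the patch is undone automatically when the test finishes:

```
SRC = "mypackage.module"  # hypothetical dotted path of the module under test


def test_calls_otherfunction(mocker):
    # mocker.patch comes from the pytest-mock plugin and is reverted at test end.
    other = mocker.patch(f"{SRC}.otherfunction", autospec=True)

    from mypackage.module import function_under_test

    function_under_test()
    other.assert_called_once()
```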

@henryiii (Collaborator):

The only other thing I'd add is maybe a mention somewhere that these are guidelines for good design, but even poorly structured tests are better than no tests. I don't want to discourage someone from writing tests, but instead give them ways to improve their test design if they want to, and help them as they scale up to larger projects.

@henryiii (Collaborator):

Here's an example I added to boost-histogram: scikit-hep/boost-histogram#1022

@lundybernard (Author):

The term I use is "Diagnostic Tests".
The audience for diagnostic tests is people checking or troubleshooting live / production deployments who need to verify that an installation is correct.

FFR:
"Smoke tests" are a sub-set of test cases, that are run to quickly verify that key features of a system work.
We sometimes resort to creating a smoke test suite out of desperation, when the runtime of the full test suite becomes a burden.


#### Integration Tests

The word "Integration" is a bit overloaded, and can reffer to many levels of
@henryiii commented Jul 25, 2025:

From the pre-commit check:

Suggested change
The word "Integration" is a bit overloaded, and can reffer to many levels of
The word "Integration" is a bit overloaded, and can refer to many levels of

@lundybernard force-pushed the docs/testing branch 3 times, most recently from 80d6751 to 98bb2e9 on July 26, 2025 at 18:18