docs: Add outline of unit testing recommendations #619
Draft: lundybernard wants to merge 4 commits into scientific-python:main from lundybernard:docs/testing (+372 −0)

---
layout: page
title: Testing recommendations
permalink: /principles/testing/
nav_order: 2
parent: Principles
---

{% include toc.html %}

# Testing recommendations

## External or outside-in testing

A good place to start writing tests is from the perspective of a user of your
module or library, as described in the [Test
Tutorial]({% link pages/tutorials/test.md %}) and the [Testing with pytest
guide]({% link pages/guides/pytest.md %}). These test cases live outside your
code, and include many styles or types of tests that you may have heard of
(behavioral, fuzz, end-to-end, feature, etc.). There are many, many kinds of
tests that can be used to verify that your code is correct and works as
expected, and a lot to learn.

### Any test case is better than none

When in doubt, write the test that makes sense to you at the time. While you are
learning and writing your first test suites, try not to get bogged down in the
taxonomy of test cases. As you write and use your test suite, the reasons for
classifying some types of tests and sorting them into different test suites
will become apparent.

### As long as that test is correct...

It can be surprisingly easy to write a test that passes when it should fail,
especially when using complicated mocks and fixtures. The best way to avoid this
is to deliberately break the code you are testing (for example, hard-code a
wrong result) and run the test case to make sure it fails when the code is
broken.

- Check that your test fails when it should!
- Keep it simple: excessive use of mocks and fixtures can make it difficult to
  know if our test is running the code we expect it to.
- Test one thing at a time: a single test should test a single behavior. It is
  better to write many test cases for a single function or class than one giant
  case.

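As an illustration of that check, the sketch below uses a hypothetical `total_price` function and a `Mock` standing in for a tax service (every name here is made up for the example). `weak_check` only asserts that the mock's return value is passed through, so it also passes against `broken_total_price`, a deliberately broken copy. `strict_check` also verifies what the code sent to the mock, so it catches the break.

```python
from unittest.mock import Mock


def total_price(cart, tax_service):
    """Hypothetical code under test: sum the cart, then let tax_service add tax."""
    subtotal = sum(item["price"] for item in cart)
    return tax_service.apply(subtotal)


def broken_total_price(cart, tax_service):
    """A deliberately broken copy that hard-codes a wrong subtotal."""
    return tax_service.apply(0)


CART = [{"price": 10}, {"price": 5}]


def weak_check(impl):
    """Only checks that the mock's return value is passed through."""
    tax = Mock()
    assert impl(CART, tax) is tax.apply.return_value


def strict_check(impl):
    """Also checks the input the code sent to the mocked dependency."""
    tax = Mock()
    ret = impl(CART, tax)
    tax.apply.assert_called_once_with(15)
    assert ret is tax.apply.return_value


def test_total_price():
    strict_check(total_price)
```

Running `weak_check(broken_total_price)` passes silently, while `strict_check(broken_total_price)` raises `AssertionError`, which is exactly the signal a deliberate break is meant to produce.
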
{: .highlight-title }
> A note to new test developers:
>
> This is a good place to pause and go write some tests. The rest of these
> principles apply to more advanced test development. As you gain experience and
> your test suite(s) grow, the taxonomy of test cases, and the use of (and need
> for) different kinds of tests, will become clearer.

### Taxonomy of outside-in tests

A non-exhaustive discussion of some common types of tests.

^_^ Don't Panic ^_^

Depending on your project, you may not need many (or most) of these kinds of
tests.

- A library project probably does not need to test integration with
  microservices.
- A library with no third-party dependencies does not need dependency
  integration tests.
- Fuzz testing is usually reserved for critical code that many users rely on.

#### Behavioral, Feature, or Functional Tests:

High-level tests that ensure a specific feature works. Used for testing things
like:

- Loading a file
- Setting a debug flag results in debug messages being printed
- A configuration option affects the behavior of the code as expected

#### Fuzz Tests

Fuzz tests attempt to test the full range of possible inputs to a function.
They are good for finding edge cases, where what should be valid input causes a
failure. [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) is an
excellent tool for this, and a lot of fun to use.

- SLOW TESTS: fuzz tests can take a very long time to run, and should usually be
  placed in a test suite that we run separately from our faster tests.
- Reserve fuzz testing for the few critical functions where it really matters.

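To show the idea without any third-party dependency, here is a hand-rolled sketch using only the standard library's `random` module. It checks a round-trip property of a hypothetical run-length encoder against many random inputs. Hypothesis does this job far better (smarter input generation, and shrinking a failing input down to a minimal example), so treat this only as an illustration of the concept.

```python
import random

# Hypothetical settings: how many random inputs to try, and a fixed seed so
# that any failure can be reproduced exactly.
N_CASES = 1000
SEED = 0


def run_length_encode(text):
    """Hypothetical function under test: "aaab" -> [("a", 3), ("b", 1)]."""
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(char * count for char, count in pairs)


def test_round_trip_random_inputs():
    rng = random.Random(SEED)
    for _ in range(N_CASES):
        # Random strings with lots of repeats, whitespace, and non-ASCII text.
        text = "".join(rng.choice("ab\n é") for _ in range(rng.randint(0, 50)))
        # Property: decoding an encoding gives back the original input.
        assert run_length_decode(run_length_encode(text)) == text, repr(text)
```

With Hypothesis, the loop would typically be replaced by its `@given` decorator and a strategy such as `st.text()`.
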
#### Integration Tests

The word "integration" is a bit overloaded, and can refer to many levels of
integration between our code, its dependencies, and external systems.

##### Code level

Test the integration between your software and external / third-party
dependencies. This is low-level testing of your code base, where we run the code
imported from dependencies without mocking it.

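As a sketch, assume a hypothetical pair of functions, `save_scores` and `top_score`, built on the standard library's `sqlite3` module (standing in here for any third-party dependency). A code-level integration test runs the real dependency, against a throwaway in-memory database, instead of mocking it:

```python
import sqlite3


def save_scores(conn, scores):
    """Hypothetical code under test: store (name, score) pairs."""
    conn.execute("CREATE TABLE IF NOT EXISTS scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", scores)
    conn.commit()


def top_score(conn):
    """Hypothetical code under test: return the best (name, score) pair."""
    return conn.execute(
        "SELECT name, score FROM scores ORDER BY score DESC LIMIT 1"
    ).fetchone()


def test_scores_round_trip_through_real_sqlite():
    # No mocks: the real sqlite3 library runs, using an in-memory database.
    conn = sqlite3.connect(":memory:")
    try:
        save_scores(conn, [("ada", 3), ("grace", 7), ("alan", 5)])
        assert top_score(conn) == ("grace", 7)
    finally:
        conn.close()
```
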
##### Environment level

Testing that your software works in the environments we plan to run it in:

- Running inside a Docker container
- Using GPUs or other specialized hardware
- Deploying it to cloud servers

##### System level

Testing that it interacts correctly with other software in a larger system:

- Interactions with other services, on local or cloud-based platforms
- Microservice, database, or API connections and interactions

#### End to End Tests

The slowest and most brittle of all tests. Here, we set up an entire
production-like system and run tests against it.

- Create a dev / testing / staging environment, and run tests against it to make
  sure everything works together.
- Fake user input, using tools like
  [Selenium](https://www.selenium.dev/documentation/).
- Process data from a pre-loaded test database.
- Manual QA testing.

## Unit Tests

### Advantages of unit testing:

Unit tests ensure that the code, as written, is correct and executes properly.
They communicate the intention of the code's author: how the code is expected to
behave in its expected use case.

Unit tests should be simple, isolated, and run very quickly. This allows us to
run them frequently while we make changes to the code (even automatically, each
time we save a file, for example) to ensure our changes did not break
anything... or only broke what we expected them to.

Writing unit tests can reveal weaknesses in our implementations, and lead us to
better design decisions:

- If the test requires excessive setup, the unit may be dependent on too many
  external variables.
- If the test requires many assertions, the unit may be doing too many things /
  have too many side effects.
- If the unit is very difficult to test, it will likely be difficult to
  understand and maintain. Refactoring code to make it easier to test often
  leads us to write better code overall.

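A small, hypothetical illustration of the last point: `count_errors_in_file` mixes file I/O with parsing, so a unit test would need a real file or a patched `open`. Splitting out a pure `count_errors` function lets the logic be tested with a plain list of strings, leaving only a thin wrapper that touches the filesystem. All function names here are made up for the example.

```python
def count_errors_in_file(path):
    """Before: I/O and logic are tangled, so testing needs a real file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.startswith("ERROR"))


def count_errors(lines):
    """After: pure logic, testable with any iterable of strings."""
    return sum(1 for line in lines if line.startswith("ERROR"))


def count_errors_in_file_refactored(path):
    """Thin wrapper: the only part left that touches the filesystem."""
    with open(path, encoding="utf-8") as f:
        return count_errors(f)


def test_count_errors():
    log = ["INFO start", "ERROR disk full", "ERROR retry failed", "INFO done"]
    assert count_errors(log) == 2
```
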
### When to write unit tests:

Unit tests are considered "low level", and are used for [Isolation Testing]().
Not all projects need full unit test coverage; some may not need unit tests at
all.

- When your project matures enough to justify the work! Higher-level testing is
  often sufficient for small projects that are not part of critical
  infrastructure.

- When you identify critical parts of the code base, or parts that are
  especially prone to breaking, use unit tests to ensure that code continues to
  behave as designed.

- When other projects start to depend heavily on your library, thorough unit
  testing helps ensure the reliability of your code for your users.

- When doing test-driven development, unit tests should be created after the
  higher-level 'integration' or 'outside-in' test cases, and before writing the
  code that makes the tests pass.

### Guidelines for unit testing:

- Unit tests live alongside the code they test, in a `tests/` folder. They
  should be in a different directory than higher-level tests (integration, e2e,
  behavioral, etc.), so that they can be run quickly before the full test suite,
  and to avoid confusing the two.

- Test files should be named `test_{{file under test}}.py`, so that test runners
  can find them easily.

- `test_*.py` files should match your source files (files under test)
  one-to-one, and contain only tests for code in the file under test. The code
  in `mymodule/source.py` is tested by `mymodule/tests/test_source.py`.

- Keep it simple! If a test case requires extra setup and external tools, it may
  be more appropriate as an external test than as a unit test.

- Avoid the temptation to test edge cases! Focus your unit tests on the "happy
  path". The unit tests should describe the expected and officially supported
  usage of the code under test.

- Isolation: test single units of code! A single function, or a single
  attribute or method on a class. If you have two units (classes, functions,
  class attributes) with deeply coupled behavior, it is better to test them
  individually, using mocking and patching, instead of testing both in a single
  test. This makes refactoring easier, helps you understand the interactions
  between units, and will correctly tell you which part is failing if one
  breaks.

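For example, here is a minimal sketch of testing one method of a hypothetical `Report` class in isolation from another. `word_count` depends on `load`, which reads a file; patching `load` with `unittest.mock.patch.object(..., autospec=True)` lets `word_count` be tested without touching the filesystem, while `load` would get its own separate test. With `autospec=True` on a method, the mock also records `self`, which is why the instance appears in the final assertion.

```python
from unittest.mock import patch


class Report:
    def load(self, path):
        """Reads a file; tested separately in its own unit test."""
        with open(path, encoding="utf-8") as f:
            return f.read()

    def word_count(self, path):
        """The unit under test: depends on load() for its input."""
        return len(self.load(path).split())


def test_word_count_without_real_file():
    with patch.object(Report, "load", autospec=True) as load:
        load.return_value = "three small words"
        report = Report()
        assert report.word_count("data.txt") == 3
        # autospec keeps the method signature, so `self` is recorded as well.
        load.assert_called_once_with(report, "data.txt")
```
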
#### Importing in test files:

Keep things local! Prefer to import only from the file under test when possible.
This helps keep the context of the unit tests focused on the file under test.

It also makes refactoring much smoother; think about factoring a class out of a
source file where many functions operate on it, and tests require it.

```python
# src/project/lib.py
class MyClass: ...


def func(my_class: MyClass): ...


# src/project/tests/test_lib.py
from project.lib import MyClass, func


def test_func():
    ret = func(MyClass())
    ...


class TestMyClass: ...
```

When we move `MyClass` into another source file, we only need to move its
`TestMyClass` unit tests along with it. Even moving `MyClass` to another module,
or swapping it for a drop-in replacement, is minimally disruptive to the tests
that rely on it.

```python
# src/project/util.py
class MyClass: ...


# src/project/lib.py
from .util import MyClass


def func(my_class: MyClass): ...


# src/project/tests/test_util.py (TestMyClass moved here with MyClass)
from project.util import MyClass


class TestMyClass: ...


# src/project/tests/test_lib.py (unchanged)
from project.lib import MyClass, func


def test_func():
    ret = func(MyClass())
    ...
```

- Importing from other source files is a code smell (for unit tests); it
  indicates that the test is not well isolated.

It is worth cultivating a deep understanding of how Python's imports work. The
interactions between imports and patches can sometimes be surprising, and cause
us to write invalid tests... or worse, tests that pass when they should fail.
These are a few of the cases that I have seen cause the most confusion.

- If you import `SomeThing` from your file under test, then patch
  `file.under.test.SomeThing`, it does not patch `SomeThing` in your test file,
  only in the file under test. So code in your file under test that calls
  `SomeThing()` will use the mock, but in your test case, `SomeThing()` will
  still create a new instance of the real class instead of calling the mock.

- Prefer to import only the object that you actually use, not the entire
  library.
  - This simplifies mocking/patching in unit tests.
  - It makes using drop-in replacements simpler. Changing
    `from pandas import DataFrame` to `from polars import DataFrame` in your
    file under test should result in all tests passing, with no other changes.

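The first surprise in the list above can be reproduced in a few lines. To keep the sketch self-contained, it builds a stand-in `file_under_test` module in memory with `types.ModuleType` (in a real project this would be an ordinary source file); the module name, `SomeThing`, and `make_something` are all hypothetical.

```python
import sys
import types
from unittest.mock import Mock, patch

# Build a stand-in "file under test" in memory so the example runs on its own.
file_under_test = types.ModuleType("file_under_test")
exec(
    "class SomeThing:\n"
    "    pass\n"
    "\n"
    "def make_something():\n"
    "    return SomeThing()\n",
    file_under_test.__dict__,
)
sys.modules["file_under_test"] = file_under_test

# The test file imports the name, binding its own reference to the real class.
from file_under_test import SomeThing, make_something  # noqa: E402


def test_patch_only_affects_the_file_under_test():
    with patch("file_under_test.SomeThing") as mock_cls:
        # Code in the file under test looks SomeThing up there: it gets the mock.
        assert make_something() is mock_cls.return_value
        # The test file's own name still points at the real class.
        assert SomeThing is not mock_cls
        assert not isinstance(SomeThing(), Mock)
```
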
It is common practice to import all of pandas or numpy (`import numpy as np`),
and this style is helpful for ensuring that we are using the version of `sum()`
we expect... was it Python's builtin `sum` or `np.sum`? However, as we develop
our unit tests, this can cause difficulty with mocking, and complicate
refactoring. Consider the benefits of refactoring your imports like so:

```python
from numpy import sum as numeric_sum, ndarray as NumericArray
```

#### Running unit tests:

- Pytest is great for running tests in your development environments!
- To run unit tests in your source folder, from your package root, use
  `pytest {{path/to/source}}`.
- To run tests from an installed package (outside of your source repository),
  use `pytest --pyargs {{package name}}`.

#### Mocking and Patching to Isolate the code under test:

When the unit you are testing touches any external unit (usually something you
imported, or another unit that has its own tests), the external unit should be
patched, replacing it with a mock for the duration of the test.

- Verify that the external unit is called with the expected input.
- Verify that any value returned from the external unit is used as expected.

```python
from unittest.mock import Mock

from path.to.module.under.test import myfunction

SRC = "path.to.module.under.test"


# The `mocker` fixture is provided by the pytest-mock plugin.
def test_myfunction(mocker):
    patchme: Mock = mocker.patch(f"{SRC}.patchme", autospec=True)
    ret = myfunction()
    patchme.assert_called_with("input from myfunction")
    assert ret is patchme.return_value
```

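If pytest-mock is not available, the same pattern can be written with the standard library's `unittest.mock.patch`. The sketch below is self-contained: it builds a hypothetical `module_under_test` in memory (in a real project, `SRC` would name an ordinary module), where `myfunction` calls a `patchme` dependency and returns its result.

```python
import sys
import types
from unittest.mock import patch

# Stand-in module under test, built in memory so this sketch runs on its own.
module_under_test = types.ModuleType("module_under_test")
exec(
    "def patchme(text):\n"
    "    raise RuntimeError('slow or side-effecting dependency')\n"
    "\n"
    "def myfunction():\n"
    "    return patchme('input from myfunction')\n",
    module_under_test.__dict__,
)
sys.modules["module_under_test"] = module_under_test

from module_under_test import myfunction  # noqa: E402

SRC = "module_under_test"


def test_myfunction():
    with patch(f"{SRC}.patchme", autospec=True) as patchme:
        ret = myfunction()
    # 1. The dependency was called with the expected input.
    patchme.assert_called_once_with("input from myfunction")
    # 2. Its return value was passed back unchanged.
    assert ret is patchme.return_value
```
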
- Consider what needs to be mocked, and the level of isolation your unit test
  really needs.
  - Anything imported into the module you are testing should probably be mocked.
  - External units with side effects, or which do a lot of computation, should
    probably be mocked.
  - Some people prefer to never test `_private` attributes.

- Excessive mocking is a code smell! Consider ways to refactor the code so that
  it needs fewer mocks, less setup, and fewer assertions in a single test case.
  This frequently leads us to write more readable and maintainable code.

## Diagnostic Tests

Diagnostic tests are used to verify the installation of a package. They should
be runnable on production systems, for example when we need to SSH into a live
server to troubleshoot problems.

### Advantages of Diagnostic Tests

- Diagnostic tests allow us to verify an installation of a package.
- They can be used to verify system-level dependencies like:
  - Compiled binary dependencies
  - Access to specific hardware, like GPUs

### Guidelines for Diagnostic Tests

- Consider using the stdlib `unittest.TestCase` and other stdlib tools instead
  of pytest. This allows running diagnostic tests in production environments
  without installing additional packages.

- Test files should be named `test_{{file under test}}.py`, so that stdlib
  unittest can find them easily (its default discovery pattern is `test*.py`).

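A minimal sketch of such a diagnostic test, using only the standard library. The specific checks are illustrative assumptions: importing and using `sqlite3` stands in for "a compiled binary dependency loads", and the presence of an `nvidia-smi` executable on `PATH` is used as a rough proxy for GPU access. The GPU test is skipped, not failed, on machines without it.

```python
import shutil
import subprocess
import unittest


class TestInstallation(unittest.TestCase):
    def test_compiled_dependency_loads(self):
        # sqlite3 wraps a compiled C library; importing and using it shows the
        # binary extension is installed and loads on this machine.
        import sqlite3

        conn = sqlite3.connect(":memory:")
        try:
            self.assertEqual(conn.execute("SELECT 1 + 1").fetchone(), (2,))
        finally:
            conn.close()

    @unittest.skipUnless(shutil.which("nvidia-smi"), "nvidia-smi not found on PATH")
    def test_gpu_visible(self):
        # Assumption: an NVIDIA driver install provides nvidia-smi, and
        # `nvidia-smi -L` lists one "GPU ..." line per visible device.
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
        self.assertEqual(result.returncode, 0, result.stderr)
        self.assertIn("GPU", result.stdout)
```
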
### Mocking and Patching to Isolate the code under test:

Test isolation is less necessary in diagnostic tests than in unit tests. We
often want diagnostic tests to execute compiled code, or run a test on GPU
hardware. In cases where we do need to mock some part of our code,
`unittest.mock.patch` works much like the `mocker` fixture from pytest-mock.

```python
from unittest import TestCase
from unittest.mock import Mock, patch

from mymodule.path.to.source import myfunction

SRC = "mymodule.path.to.source"


class TestMyFunction(TestCase):
    @patch(f"{SRC}.patchme", autospec=True)
    def test_myfunction(self, patchme: Mock):
        ret = myfunction()
        patchme.assert_called_with("input from myfunction")
        self.assertIs(ret, patchme.return_value)
```

### Running Diagnostic Tests:

The stdlib's unittest can be used in environments where pytest is not available:

- To use unittest to run tests from an installed package (outside of your source
  repository), use `python -m unittest discover -s {{module.name}}`.
- To use unittest to run tests in your source folder, from your package root,
  use
  `python -m unittest discover --start-directory {{source folder}} --top-level-directory .`.

As mentioned above, I'd make this three sections. One short one about Integration testing. I think we could discuss ideas of cross-package integration testing there, like how to test that downstream packages still work, etc. Half of the below would go under "Unit tests", and the other half under the new "smoke" tests.