docs: Add outline of unit testing recommendations #619

docs/pages/principles/testing.md

---
layout: page
title: Testing recommendations
permalink: /principles/testing/
nav_order: 2
parent: Principles
---

{% include toc.html %}

# Testing recommendations

## External or outside-in testing

A good place to start writing tests is from the perspective of a user of your
module or library, as described in the [Test
Tutorial]({% link pages/tutorials/test.md %}) and the [Testing with pytest
guide]({% link pages/guides/pytest.md %}). These test cases live outside your
code, and include many styles or types of test that you may have heard of
(behavioral, fuzz, end-to-end, feature, etc.). There are many kinds of tests
that can be used to verify that your code is correct and works as expected, and
a lot to learn.

### Any test case is better than none

When in doubt, write the test that makes sense to you at the time. While you are
learning and writing your first test suites, try not to get bogged down in the
taxonomy of test cases. As you write and use your test suite, the reasons for
classifying and sorting some types of tests into different test suites will
become apparent.

### As long as that test is correct...

It can be surprisingly easy to write a test that passes when it should fail,
especially when using complicated mocks and fixtures. The best way to avoid this
is to deliberately break the code you are testing, hard-code a failure, and run
the test case to make sure it fails when the code is broken.

- Check that your test fails when it should!
- Keep It Simple: excessive use of mocks and fixtures can make it difficult to
  know if our test is running the code we expect it to.
- Test one thing at a time: a single test should test a single behavior, and it
  is better to write many test cases for a single function or class than one
  giant case.
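
As a concrete sketch of that last point (using a hypothetical `clamp` function),
several small, focused cases read better and fail more informatively than one
giant case:

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Hypothetical function under test: limit x to the range [lo, hi]."""
    return max(lo, min(hi, x))


# One behavior per test, instead of one giant test_clamp() with every assert:
def test_clamp_returns_lower_bound_when_below():
    assert clamp(-5, 0, 10) == 0


def test_clamp_returns_upper_bound_when_above():
    assert clamp(15, 0, 10) == 10


def test_clamp_returns_value_when_within_range():
    assert clamp(5, 0, 10) == 5
```

If the upper-bound behavior breaks, only the test named for it fails, which
tells you exactly what went wrong.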

{: .highlight-title }

> A note to new test developers:
>
> This is a good place to pause and go write some tests. The rest of these
> principles apply to more advanced test development. As you gain experience and
> your test suite(s) grow, the taxonomy of test cases and the need for
> different kinds of tests will become more clear.

### Taxonomy of outside-in tests

A non-exhaustive discussion of some common types of tests.

^_^ Don't Panic ^_^

Depending on your project, you may not need many, or even most, of these kinds
of tests.

- A library project probably does not need to test integration with
  microservices.
- A library with no 3rd-party dependencies does not need to test integration
  with them.
- Fuzz testing is for critical code that many users rely on.

#### Behavioral, Feature, or Functional Tests:

High-level tests which ensure a specific feature works. Used for testing things
like:

- Loading a file
- Setting a debug flag results in debug messages being printed
- A configuration option affects the behavior of the code as expected
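
For instance, a behavioral test of the debug-flag example above might look like
this (a minimal sketch; the `log` function and its output format are
hypothetical):

```python
import io
from contextlib import redirect_stdout


def log(msg: str, debug: bool = False) -> None:
    """Hypothetical function under test: prints debug output when enabled."""
    if debug:
        print(f"DEBUG: {msg}")


def test_debug_flag_prints_messages():
    buf = io.StringIO()
    with redirect_stdout(buf):
        log("hello", debug=True)
    assert buf.getvalue() == "DEBUG: hello\n"


def test_no_debug_output_by_default():
    buf = io.StringIO()
    with redirect_stdout(buf):
        log("hello")
    assert buf.getvalue() == ""
```

Note that the test exercises observable behavior (what gets printed), not the
internals of `log`.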

#### Fuzz Tests

Fuzz tests attempt to test the full range of possible inputs to a function.
They are good for finding edge cases, where what should be valid input causes a
failure. [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) is an
excellent tool for this, and a lot of fun to use.

- SLOW TESTS: fuzz tests can take a very long time to run, and should usually
  be placed in a test suite which we run separately from our faster tests.
- Reserve fuzz testing for the few critical functions where it really matters.
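
To illustrate the idea, here is a minimal hand-rolled fuzz loop over random
inputs (the `normalize` function is hypothetical; Hypothesis automates this far
more thoroughly, with features like failure shrinking):

```python
import random


def normalize(values: list[int]) -> list[float]:
    """Hypothetical function under test: scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)


random.seed(0)  # deterministic for the sketch
for _ in range(1000):
    values = [random.randint(-100, 100) for _ in range(random.randint(0, 5))]
    result = normalize(values)
    # properties that should hold for ANY input:
    assert len(result) == len(values)
    if sum(values):
        assert abs(sum(result) - 1.0) < 1e-9
```

Rather than asserting exact outputs, a fuzz test asserts properties that must
hold for every input, and lets the generator hunt for a counterexample.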

#### Integration Tests

The word "Integration" is a bit overloaded, and can refer to many levels of
integration between our code, its dependencies, and external systems.

##### Code level

Test the integration between your software and external or 3rd-party
dependencies: low-level testing of your code-base, where we run the code
imported from dependencies without mocking it.

##### Environment level

Testing that your software works in the environments you plan to run it in.

- Running inside of a Docker container
- Using GPUs or other specialized hardware
- Deploying it to cloud servers

##### System level

Testing that your software interacts correctly with other software in a larger
system.

- Interactions with other services, on local or cloud-based platforms.
- Microservice, database, or API connections and interactions.

#### End to End Tests

The slowest, and most brittle, of all tests. Here, we set up an entire
production-like system and run tests against it.

- Create a dev/testing/staging environment, and run tests against it to make
  sure everything works together.
- Fake user input, using tools like
[Selenium](https://www.selenium.dev/documentation/)
- Processing data from a pre-loaded test database.
- Manual QA testing

## Unit Tests


### Advantages of unit testing:

Unit tests ensure that the code, as written, is correct and executes properly.
They communicate the intention of the code's author: how the code is expected
to behave in its expected use case.

Unit tests should be simple, isolated, and very fast, which allows us to run
them while we make changes to the code (even automatically, each time we save a
file, for example) to ensure our changes did not break anything... or only
broke what we expected to.

Writing unit tests can reveal weaknesses in our implementations, and lead us to
better design decisions:

- If the test requires excessive setup, the unit may be dependent on too many
external variables.
- If the test requires many assertions, the unit may be doing too many things /
have too many side-effects.
- If the unit is very difficult to test, it will likely be difficult to
understand and maintain. Refactoring code to make it easier to test often
leads us to write better code overall.

### When to write unit tests:

Unit tests are considered "low level", and are used for isolation testing. Not
all projects need full unit test coverage; some may not need unit tests at all.

- When your project matures enough to justify the work! Higher-level testing is
  often sufficient for small projects which are not part of critical
  infrastructure.

- When you identify critical parts of the code-base, parts that are especially
  prone to breaking, use unit tests to ensure that code continues to behave as
  designed.

- When other projects start to depend heavily on your library, thorough unit
testing helps ensure the reliability of your code for your users.

- When doing test-driven development, unit tests should be created after
  higher-level 'integration' or 'outside-in' test cases, but before writing the
  code to make the tests pass.

### Guidelines for unit testing:

- Unit tests live alongside the code they test, in a /tests folder. They should
  be in a different directory than higher-level tests (integration, e2e,
  behavioral, etc.), so that they can be run quickly before the full test
  suite, and to avoid confusing the two.

- Test files should be named `test_{{file under test}}.py`, so that test runners
can find them easily.

- `test_*.py` files should match your source files (file-under-test)
  one-to-one, and contain only tests for code in the file-under-test. The code
  in `mymodule/source.py` is tested by `mymodule/tests/test_source.py`.

- Keep it simple! If a test case requires extra setup and external tools, it
  may be more appropriate as an external test, instead of in the unit tests.

- Avoid the temptation to test edge cases! Focus your unit tests on the
  "happy path". The unit test should describe the expected and officially
  supported usage of the code under test.

- Isolation: test single units of code! A single function, or a single
  attribute or method on a class. If you have two units (classes, functions,
  class attributes) with deeply coupled behavior, it is better to test them
  individually, using mocking and patching, than to test both in a single test.
  This makes refactoring easier, helps you understand the interactions between
  units, and will correctly tell you which part is failing if one breaks.
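
As a self-contained sketch of testing coupled units individually (the `units`
module and its `parse`/`load` functions are hypothetical, fabricated in memory
here purely so the example runs standalone):

```python
import sys
import types
from unittest.mock import patch

# Fabricate a tiny module to stand in for a real source file.
source = """
def parse(raw):
    return raw.strip().split(",")

def load(raw):
    # load() is coupled to parse(); each gets its own tests
    return [int(x) for x in parse(raw)]
"""
units = types.ModuleType("units")
exec(source, units.__dict__)
sys.modules["units"] = units

# Test parse() on its own:
assert units.parse(" 1,2 ") == ["1", "2"]

# Test load() in isolation: patch out parse() so only load()'s logic runs.
with patch("units.parse", return_value=["3", "4"]) as parse_mock:
    assert units.load("raw-text") == [3, 4]
    parse_mock.assert_called_once_with("raw-text")
```

If `parse()` later breaks, only its own test fails; the `load()` test keeps
passing because it never runs the real `parse()`.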

#### Importing in test files:

Keep things local! Prefer to import only from the file-under-test when
possible. This helps keep the context of the unit tests focused on the
file-under-test.

It makes refactoring much smoother; think about factoring a class out of a
source file where many functions operate on it, and tests require it.

```python
# src/project/lib.py
class MyClass: ...


def func(my_class: MyClass): ...


# src/project/tests/test_lib.py
from project.lib import MyClass, func


def test_func():
    ret = func(MyClass())
    ...


class TestMyClass: ...
```

When we move MyClass into another source file, we only need to move its
TestMyClass unit tests along with it. Even moving MyClass to another module, or
swapping it for a drop-in replacement, is minimally disruptive to the tests that
rely on it.

```python
# src/project/lib.py
from .util import MyClass


def func(my_class: MyClass): ...


# src/project/tests/test_lib.py
from project.lib import MyClass, func


def test_func():
    ret = func(MyClass())
    ...
```

- Importing from other source files is a code smell (for unit tests); it
  indicates that the test is not well isolated.

It is worth cultivating a deep understanding of how Python's imports work. The
interactions between imports and patches can sometimes be surprising, and cause
us to write invalid tests... or worse, tests that pass when they should fail.
These are a few of the cases that I have seen cause the most confusion.

- If you import `SomeThing` from your file-under-test, then patch
  `file.under.test.SomeThing`, it does not patch `SomeThing` in your test file,
  only in the file-under-test. So, code in your file-under-test which calls
  `SomeThing()` will use the Mock, but in your test case, `SomeThing()` will
  create a real instance, not call the Mock.
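
A self-contained demonstration of that gotcha (the `mymod` module and its
`SomeThing`/`build` names are hypothetical, fabricated in memory so the sketch
runs standalone):

```python
import sys
import types
from unittest.mock import patch

# Fabricate the "file-under-test" as an in-memory module.
source = """
class SomeThing:
    pass

def build():
    return SomeThing()  # looked up in mymod's globals at call time
"""
mymod = types.ModuleType("mymod")
exec(source, mymod.__dict__)
sys.modules["mymod"] = mymod

# The test file does its import, binding the names locally:
from mymod import SomeThing, build

with patch("mymod.SomeThing") as mock_thing:
    inside = build()       # build() resolves mymod.SomeThing -> the Mock
    outside = SomeThing()  # the test file's own name is NOT patched

assert inside is mock_thing.return_value     # the module used the Mock
assert isinstance(outside, SomeThing)        # the test got the real class
assert not isinstance(inside, SomeThing)
```

The rule of thumb: patch where the name is *looked up* (the file-under-test),
and do not expect names already bound in your test file to change.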

- Prefer to import only the object that you actually use, not the entire
library.
- This simplifies mocking/patching in unit tests.
  - Makes using drop-in replacements simpler. Changing
    `from pandas import DataFrame` to `from polars import DataFrame` in your
    file-under-test should result in all tests passing, with no other changes.

It is common practice to import all of pandas or numpy (`import numpy as np`),
and this style is helpful for ensuring that we are using the version of `sum()`
we expect... was it Python's builtin `sum` or `np.sum`? However, as we develop
our unit tests, this can cause difficulty with mocking, and complicate
refactoring. Consider the benefits of refactoring your imports like so:

```python
from numpy import ndarray as NumericArray, sum as numeric_sum
```

#### Running unit tests:

- Pytest is great for running tests in your development environments!
  - To run unit tests in your source folder, from your package root, use
    `pytest {{path/to/source}}`
  - To run tests from an installed package (outside of your source repository),
    use `pytest --pyargs {{package name}}`

#### Mocking and Patching to Isolate the code under test:

When the unit you are testing touches any external unit (usually something you
imported, or another unit that has its own tests), the external unit should be
patched, replacing it with a Mock for the duration of the test.

- Verify that the external unit is called with the expected input
- Verify that any value returned from the external unit is utilized as expected.

```python
from unittest.mock import Mock

# the function under test would be imported here, e.g.:
# from path.to.module.under.test import myfunction

SRC = "path.to.module.under.test"


def test_myfunction(mocker):  # `mocker` is provided by the pytest-mock plugin
    patchme: Mock = mocker.patch(f"{SRC}.patchme", autospec=True)
    ret = myfunction()
    patchme.assert_called_with("input from myfunction")
    assert ret is patchme.return_value
```

- Consider what needs to be mocked, and the level of isolation your unit test
  really needs.
  - Anything imported into the module you are testing should probably be
    mocked.
  - External units with side-effects, or which do a lot of computation, should
    probably be mocked.
- Some people prefer to never test `_private` attributes.

- Excessive mocking is a code smell! Consider ways to refactor the code, so that
it needs fewer mocks, less setup, and fewer assertions in a single test case.
This frequently leads us to write more readable and maintainable code.

## Diagnostic Tests

Diagnostic tests are used to verify the installation of a package. They should
be runnable on production systems, for example when we need to ssh into a live
server to troubleshoot problems.

### Advantages of Diagnostic Tests

- Diagnostic tests allow us to verify an installation of a package.
- They can be used to verify system-level dependencies like:
- Compiled binary dependencies
- Access to specific hardware, like GPUs
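
A minimal sketch of what such a diagnostic test file might contain, using only
the stdlib (the specific checks are hypothetical stand-ins for your package's
real dependencies):

```python
import os
import sys
import unittest


class TestInstallation(unittest.TestCase):
    def test_interpreter_binary_exists(self):
        # stand-in for verifying a compiled/system binary dependency is present
        self.assertTrue(os.path.exists(sys.executable))

    def test_compression_roundtrip(self):
        # stand-in for exercising a compiled extension end-to-end
        import zlib

        data = b"diagnostic payload"
        self.assertEqual(zlib.decompress(zlib.compress(data)), data)


if __name__ == "__main__":
    unittest.main()
```

Because it uses only `unittest`, this file can be run on a bare production
install with `python -m unittest`, no extra packages required.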

### Guidelines for Diagnostic Tests

- Consider using the stdlib `unittest.TestCase` and other stdlib tools instead
  of pytest, to allow running unit tests for diagnostics in production
  environments without installing additional packages.

- Test files should be named `test_{{file under test}}.py`, so that stdlib
unittest can find them easily.

### Mocking and Patching to Isolate the code under test:

Test isolation is less necessary in diagnostic tests than in unit tests. We
often want diagnostic tests to execute compiled code, or run a test on GPU
hardware. In cases where we do need to mock some part of our code,
`unittest.mock.patch` is similar to the pytest-mock `mocker` fixture.

```python
import unittest
from unittest.mock import patch, Mock

SRC = "mymodule.path.to.source"


class TestMyFunction(unittest.TestCase):
    @patch(f"{SRC}.patchme", autospec=True)
    def test_myfunction(self, patchme: Mock):
        ret = myfunction()
        patchme.assert_called_with("input from myfunction")
        self.assertIs(ret, patchme.return_value)
```

### Running Diagnostic Tests:

stdlib's unittest can be used in environments where pytest is not available:

- To use unittest to run tests from an installed package (outside of your source
repository), use `python -m unittest discover -s {{module.name}}`
- To use unittest to run tests in your source folder, from your package root,
use
`python -m unittest discover --start-folder {{source folder}} --top-level-directory .`