1 change: 1 addition & 0 deletions docs/how-tos/index.rst
@@ -15,6 +15,7 @@ directory. If there's an example you want but don't see, reach out or open an is
ml-training
llm-workflows
run-data-quality-checks
test-hamilton-code
use-hamilton-for-lineage
scale-up
microservice
163 changes: 163 additions & 0 deletions docs/how-tos/test-hamilton-code.rst
@@ -0,0 +1,163 @@
..
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

==============================
Testing Apache Hamilton code
==============================

A common question on `Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>`_
is "how do I test my Hamilton functions?" -- often with a worry that decorators
will get in the way. The good news: a Hamilton function is just a Python
function, so the standard ``pytest`` patterns you already know apply directly.

This guide walks through four cases, in order of increasing scope:

1. Unit-testing a plain function.
2. Unit-testing a decorated function.
3. Integration-testing the full DAG with the ``Driver``, including
   ``inputs=`` and ``overrides=``.
4. Driving an in-memory module for self-contained tests (e.g. of custom
   materializers).

The complete runnable code lives in
`examples/testing <https://github.com/apache/hamilton/tree/main/examples/testing>`_.
Code blocks captioned with a file name are pulled from that folder via
``literalinclude``, so they stay in sync with the runnable example.

Prerequisites
-------------

Install the example's dependencies and run it:

.. code-block:: bash

   cd examples/testing
   pip install -r requirements.txt
   pytest

You should see all 13 tests pass.

1. Unit-testing plain functions
-------------------------------

Hamilton encourages you to put your transformation logic in ordinary modules
that don't import the Driver. That makes them trivial to unit-test:

.. literalinclude:: ../../examples/testing/my_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/my_functions.py``

Tests are just calls to the function:

.. literalinclude:: ../../examples/testing/test_my_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_my_functions.py``

Notes
^^^^^

* No Driver is required. You import the module under test and call its
  functions like any other Python code.
* ``pytest.mark.parametrize`` is a clean way to cover edge cases without
  copy-pasting test bodies.
* Use ``pd.testing.assert_series_equal`` (or ``assert_frame_equal``) for
  pandas outputs -- it gives readable diffs on failure.
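
As a minimal sketch of the notes above, a parametrized test with a pandas
comparison might look like the following. The function ``spend_per_signup``
and its test data are illustrative assumptions, not copied from
``my_functions.py``.

```python
import pandas as pd
import pytest


# Hypothetical dataflow function; in a real project it lives in its own module.
def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Marketing spend divided by the number of signups."""
    return spend / signups


@pytest.mark.parametrize(
    ("spend", "signups", "expected"),
    [
        ([10.0, 20.0], [1, 2], [10.0, 10.0]),
        ([0.0, 30.0], [5, 3], [0.0, 10.0]),  # zero spend is a valid edge case
    ],
)
def test_spend_per_signup(spend, signups, expected):
    # No Driver involved: the function is called like any other Python code.
    actual = spend_per_signup(pd.Series(spend), pd.Series(signups))
    pd.testing.assert_series_equal(actual, pd.Series(expected))
```

Each tuple in the parameter list becomes its own test case, so a failure
report names the exact input that broke.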

2. Unit-testing decorated functions
-----------------------------------

Hamilton's function modifiers (``@tag``, ``@parameterize``, ``@extract_columns``,
...) tell Hamilton how to wire the function into the DAG. They do **not**
change what the function does when you call it directly. You can therefore
mix two complementary techniques:

A. Call the underlying function in a unit test (cheap, fast).
B. Build a Driver and assert on the expanded DAG, to verify the wiring (slower,
   but it exercises the nodes the decorators actually generate).

The decorated module:

.. literalinclude:: ../../examples/testing/decorated_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/decorated_functions.py``

The tests:

.. literalinclude:: ../../examples/testing/test_decorated_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_decorated_functions.py``

3. Integration-testing the DAG
------------------------------

For end-to-end tests, build a Driver from the module(s) under test and call
``execute(...)`` with controlled inputs.

Two arguments are especially useful:

* ``inputs=`` injects test data at the **inputs** of the DAG -- the parameter
  names that aren't produced by any function.
* ``overrides=`` short-circuits an **intermediate** node by pinning its value.
  This is the integration-test sweet spot: instead of fabricating realistic
  raw inputs and re-deriving every intermediate, hand the DAG a known value
  for ``spend`` (or any other node) and assert on the *downstream* logic.

.. literalinclude:: ../../examples/testing/test_driver.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_driver.py``

Tip: ``Driver`` exposes a number of inspection methods --
``what_is_upstream_of``, ``what_is_downstream_of``, ``list_available_variables``
-- that are handy for asserting on graph shape, not just values.

4. In-memory modules for self-contained tests
---------------------------------------------

Sometimes you want a test that defines its own tiny Hamilton module inline
-- to exercise a custom materializer, regression-test a data-quality bug,
or demonstrate a pattern in a doctest. You don't need to create a new
``.py`` file; ``hamilton.ad_hoc_utils.create_temporary_module`` packages
inline-defined functions into a real module that the Driver can consume:

.. literalinclude:: ../../examples/testing/test_ad_hoc_module.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_ad_hoc_module.py``

This is also how Hamilton itself tests several of its built-in materializers,
so it scales up to fairly involved scenarios. See
`tests/test_ad_hoc_utils.py <https://github.com/apache/hamilton/blob/main/tests/test_ad_hoc_utils.py>`_
in the Hamilton source for more usage examples.

Where to go from here
---------------------

* Read the :doc:`/concepts/best-practices/code-organization` page -- the
  module structure it recommends is the same one that makes tests easy to
  write.
* Browse the
  `Hamilton test suite <https://github.com/apache/hamilton/tree/main/tests>`_
  for ideas; the same patterns work for user code.
* Have a testing pattern that isn't covered here? Share it on
  `Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>`_
  -- we'd love to add it.
79 changes: 79 additions & 0 deletions examples/testing/README.md
@@ -0,0 +1,79 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Testing Apache Hamilton code

This is the runnable companion to the
[Testing Hamilton code](https://hamilton.apache.org/how-tos/test-hamilton-code/)
how-to. It shows that Hamilton functions are normal Python -- so the standard
`pytest` patterns you already know apply, including when decorators are
involved.

The example covers the four cases from issue
[#1044](https://github.com/apache/hamilton/issues/1044):

1. **Unit-testing plain functions** -- `test_my_functions.py`
2. **Unit-testing decorated functions** -- `test_decorated_functions.py`
3. **Integration-testing the DAG with `inputs=` and `overrides=`** -- `test_driver.py`
4. **In-memory modules with `ad_hoc_utils.create_temporary_module`** -- `test_ad_hoc_module.py`

## File organization

| File | Purpose |
| ---- | ------- |
| `my_functions.py` | A small marketing dataflow (no decorators). |
| `decorated_functions.py` | The same style of dataflow, using `@tag`, `@parameterize` and `@extract_columns`. |
| `test_my_functions.py` | Unit tests that import and call functions directly. |
| `test_decorated_functions.py` | Unit + driver-level tests for the decorated module. |
| `test_driver.py` | End-to-end tests using `Builder().with_modules(...).build()` plus `inputs=` and `overrides=`. |
| `test_ad_hoc_module.py` | Builds a module from inline-defined functions for self-contained tests. |
| `conftest.py` | Adds this folder to `sys.path` so `import my_functions` works under pytest. |
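
To see why the `sys.path` entry matters, here is a standard-library-only sketch of the same mechanism. The module name `sketch_functions` and its contents are hypothetical; the throwaway folder stands in for this example directory.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical one-function module into a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "sketch_functions.py"), "w") as f:
        f.write("def double(x: int) -> int:\n    return x * 2\n")

    sys.path.insert(0, folder)  # the same call conftest.py makes for its folder
    try:
        importlib.invalidate_caches()
        sketch_functions = importlib.import_module("sketch_functions")
    finally:
        sys.path.remove(folder)  # keep the sketch from affecting later imports

print(sketch_functions.double(21))  # 42
```

Without the `sys.path.insert` call the import would raise `ModuleNotFoundError`, which is the failure `conftest.py` prevents when pytest collects the tests.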

## Running the tests

```bash
pip install -r requirements.txt
pytest
```

You should see all tests pass. Each test file is independently runnable:

```bash
pytest test_my_functions.py -v
pytest test_driver.py -v
```

## What to take away

* A Hamilton function is just a Python function. Testing it does **not**
require building a Driver.
* Decorators (`@tag`, `@parameterize`, `@extract_columns`, ...) leave the
underlying callable intact. Direct function calls still work; the decorator
changes how Hamilton wires the function into the DAG, not what the function
computes.
* For integration tests, `Builder().with_modules(...).build()` is the canonical
entry point. Use `inputs=` to inject test data at the DAG inputs and
`overrides=` to short-circuit intermediate nodes when you want to assert on
downstream logic in isolation.
* Need to test inline -- e.g. for a regression test or a custom materializer
-- without a `.py` file on disk? Use
`hamilton.ad_hoc_utils.create_temporary_module`.

If you have questions, or need help with this example,
join us on [Slack](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g).
28 changes: 28 additions & 0 deletions examples/testing/conftest.py
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Make the example modules importable when running ``pytest`` from this dir.

Hamilton needs to import your dataflow module by name. Adding this folder to
``sys.path`` lets the example tests do ``import my_functions`` directly,
mirroring how a real project would lay out its code.
"""

import os
import sys

sys.path.insert(0, os.path.dirname(__file__))
55 changes: 55 additions & 0 deletions examples/testing/decorated_functions.py
@@ -0,0 +1,55 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Functions that use Hamilton decorators.

Decorators are a common source of confusion when testing. The point of this
module is to show that decorators do not get in the way of unit testing -- the
function below the decorator is still a plain Python callable, so you can call
it directly from a test. To test what the decorator *expands to*, drive the
function through a Driver instead (see ``test_decorated_functions.py``).
"""

import pandas as pd

from hamilton.function_modifiers import extract_columns, parameterize, source, tag, value


@tag(owner="growth-team", pii="false")
def total_signups(signups: pd.Series) -> int:
    """Sum of signups across the time window."""
    return int(signups.sum())


@parameterize(
    spend_in_thousands={"raw_value": source("spend"), "divisor": value(1000.0)},
    signups_in_hundreds={"raw_value": source("signups"), "divisor": value(100.0)},
)
def scaled(raw_value: pd.Series, divisor: float) -> pd.Series:
    """Scale a series by a constant divisor.

    ``@parameterize`` produces one node per entry above. The function itself is
    still a normal callable, so a unit test can call ``scaled(some_series, 1000)``
    directly without a Driver.
    """
    return raw_value / divisor


@extract_columns("scaled_spend", "scaled_signups")
def scaled_features(spend_in_thousands: pd.Series, signups_in_hundreds: pd.Series) -> pd.DataFrame:
    """Bundle the two scaled series into a frame, then expose each column as a node."""
    return pd.DataFrame({"scaled_spend": spend_in_thousands, "scaled_signups": signups_in_hundreds})