Add automatic tests with IPython notebooks #7

Open · rossant opened this issue Oct 13, 2013 · 12 comments

rossant (Owner) commented Oct 13, 2013

No description provided.

ihrke (Collaborator) commented Jan 26, 2015

One way to do this is to use a script like runipy (https://github.com/paulgb/runipy) and then let Travis CI upload the processed notebooks to a specific branch of the ipycache repo.
This post describes how that can be done: http://sleepycoders.blogspot.se/2013/03/sharing-travis-ci-generated-files.html

A disadvantage here is that you have to manually look at the notebooks to see if everything went fine.

Another way would be to define unit tests in an ipy-notebook and run them from the command line (e.g., using py.test https://pypi.python.org/pypi/pytest-ipynb). However, I'm not sure if that works with the magics etc.
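A rough sketch of what the Travis side of the runipy approach could look like. The script name and the examples/ location are placeholders, and it assumes the runipy CLI is invoked as runipy <input.ipynb> <output.ipynb> and exits non-zero when a cell fails; the branch-upload step from the blog post above is not shown:

# run_notebooks.py -- hypothetical CI helper, not an existing script.
import glob
import subprocess
import sys

failed = False
for nb_path in sorted(glob.glob("examples/*.ipynb")):
    out_path = nb_path.replace(".ipynb", ".out.ipynb")
    print("Running", nb_path)
    # Assumption: "runipy <in> <out>" executes the notebook and returns
    # a non-zero exit code if any cell raises.
    if subprocess.call(["runipy", nb_path, out_path]) != 0:
        print("FAILED:", nb_path)
        failed = True

sys.exit(1 if failed else 0)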

rossant (Owner) commented Jan 27, 2015

> Another way would be to define unit tests in an ipy-notebook and run them from the command line (e.g., using py.test https://pypi.python.org/pypi/pytest-ipynb). However, I'm not sure if that works with the magics etc.

I think that would be the best solution. I'm sure we can find a way to make the magic commands work (see e.g. this class). The idea would be to loop over all input cells, execute the code using the InteractiveShell, capture the output, and compare it with the expected output stored in the .ipynb file. That would give us a way to automatically test a notebook containing input + output by running it and checking that the output is correct.
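A minimal sketch of that loop, assuming nbformat, a recent IPython where run_cell returns an ExecutionResult, and that ipycache can be loaded as an extension so %%cache is available (run_and_capture is a name made up here):

import nbformat
from IPython.core.interactiveshell import InteractiveShell
from IPython.utils.capture import capture_output

def run_and_capture(path):
    """Execute every code cell of a notebook and return each cell's captured stdout."""
    nb = nbformat.read(path, as_version=4)
    shell = InteractiveShell.instance()
    shell.run_line_magic("load_ext", "ipycache")  # make %%cache available
    captured = []
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue
        with capture_output() as io:
            result = shell.run_cell(cell.source)
        if result.error_in_exec is not None:
            raise result.error_in_exec            # fail on the first broken cell
        captured.append(io.stdout)                # keep only the text output
    return captured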

ihrke (Collaborator) commented Jan 27, 2015

Interesting idea!

But comparing cell output to previously generated output may break when minor changes are made (this could also happen when external libraries change slightly). Say we generate a plot and some matplotlib default changes so that the plot is not exactly the same; then all our tests would break if we naively compare the stored plots to the new ones. Maybe we should run all the cells as you suggest, but instead of comparing to previously stored output, insert assertions/throw errors?

rossant (Owner) commented Jan 27, 2015

It's true that comparing the base64-encoded plots would not be very robust. Maybe at first we could just compare text output?
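(For the text-only comparison, a hypothetical helper could look like this, assuming nbformat v4-style cell outputs; image data is skipped entirely:)

def text_outputs(cell):
    """Collect only the plain-text parts of a cell's stored outputs."""
    texts = []
    for out in cell.get("outputs", []):
        if out["output_type"] == "stream":
            texts.append(out["text"])
        elif out["output_type"] in ("execute_result", "display_data"):
            # base64-encoded images live under other mime types and are ignored
            texts.append(out.get("data", {}).get("text/plain", ""))
    return "".join(texts)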

What do you mean exactly by inserting assertions / throwing errors?

ihrke (Collaborator) commented Jan 27, 2015

Example:

cell 1

%%cache test.pkl a
a=[i for i in range(10000)]

cell 2

import ipycache
try:
    a = ipycache.load_vars('./test.pkl', ['a'])
except Exception:
    raise CustomErrorThatTellsUsSomething

Then run all the cells (from the command line, via the IPython kernel), catch any raised error, and report it as usual.

rossant (Owner) commented Jan 27, 2015

OK I see. That's a possibility indeed. Maybe it would be worth encapsulating the code in cell 2 in some private testing function like _test_cached_var('./test.pkl', 'a') or something.

Also, I think what you describe is rather close to the unit tests that already exist. We could definitely do that, but I think another sort of notebook-based test would be useful as well.

You would have an actual example notebook that would only contain user-exposed commands (so basically just %%cache) and no testing logic. To test it, we would just compare the text outputs. For example, the cached cell could contain a print() statement, and we could check that it would only show up the first time, etc.

Something roughly like this:

example.ipynb:

# cell 1
%%cache test.pkl a
print("Computing...")
a=[i for i in range(10000)]

# cell 2
print(len(a))

test_notebooks.py:

nb = Notebook('example.ipynb')
nb.run_all()
assert check_nb_outputs(nb, ['Computing...', '10000'])

nb.run_all()
assert check_nb_outputs(nb, ['', '10000'])
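(Notebook, run_all, and check_nb_outputs are hypothetical names. A sketch of the comparison part, written here against a plain list of captured stdout strings, one per code cell, rather than a Notebook object:)

def check_nb_outputs(captured_stdouts, expected):
    """Compare each cell's captured stdout against the expected string."""
    if len(captured_stdouts) != len(expected):
        return False
    return all(got.strip() == want.strip()
               for got, want in zip(captured_stdouts, expected))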

ihrke (Collaborator) commented Jan 27, 2015

Actually, your second test should produce exactly the same output, i.e., Computing...\n10000, since ipycache saves and loads the outputs. The only thing that should differ would be the verbosity-output, i.e., [Saved Variables...] vs. [Skipped the cell's code and loaded, ...]. So we should test if the first run produced [Saved Variables...] and the second [Skipped the cell's ...].

The difference between what I suggested and the tests we currently have is that the magic is run directly through IPython's magic interface instead of through the mock functions used in test_ipycache.py.

Anyway, we could of course mix the approaches: since the output of the cell is always stored in _captured_io, we can just look at

io=ipycache.load_vars('./test.pkl', ['_captured_io'])
assert check_cell_output(io['stdout'].getvalue(), 'Computing...')

after having run the cell. That would also decouple the testing logic from the IPython notebook code (but actually, I would prefer to run the tests in the notebook because it's easier to develop and run).
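A sketch of the two-run check described above, reusing the run_and_capture sketch from earlier; the exact wording of the %%cache status messages, and the assumption that they end up in the captured stdout, would need to be checked against ipycache's real output:

import os

# Start from a clean slate so the first run really has to compute and save.
if os.path.exists("test.pkl"):
    os.remove("test.pkl")

first = run_and_capture("example.ipynb")    # cache file absent: cell runs and saves
second = run_and_capture("example.ipynb")   # cache file present: cell is skipped

# Assumed message prefixes -- verify against ipycache's actual wording.
assert "[Saved Variables" in first[0]           # first[0] = stdout of the %%cache cell
assert "[Skipped the cell's code" in second[0]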

rossant (Owner) commented Jan 27, 2015

> Actually, your second test should produce exactly the same output, i.e., Computing...\n10000, since ipycache saves and loads the outputs. The only thing that should differ would be the verbosity-output, i.e., [Saved Variables...] vs. [Skipped the cell's code and loaded, ...]. So we should test if the first run produced [Saved Variables...] and the second [Skipped the cell's ...].

Ha, I had forgotten that! I'm wondering whether that's good behavior... Seeing Computing... in this example would be confusing, because I'd think that my code was actually being executed!

I do agree that testing logic should be decoupled from the notebook. That being said, having minimal assertions in the notebooks would be fine as long as these are just a couple of lines of code demonstrating what would be expected from normal behavior. Then we could have an "examples" folder with some notebooks demonstrating how ipycache works, and these examples would also be tested by the testing suite (like "doctests" in a way).

ihrke (Collaborator) commented Jan 27, 2015

Ok, but how do we decide whether the notebook passes the test? In a doctest, you have to add some code to a function that defines the test. We could of course add a doctest snippet to each cell, to be executed by the test runner after the cell has run? Say:

%%cache test.pkl a
print("Computing...")
a=[i for i in range(10000)]

"""
#doctest
import os
assert len(a)==10000
assert os.path.exists('test.pkl')
"""

rossant (Owner) commented Jan 27, 2015

Why not just put this doctest code in the next cell? We'd say that the test fails if an assertion error is raised during notebook execution, and otherwise it passes.
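That pass/fail rule maps naturally onto a small pytest wrapper; a sketch, assuming the run_and_capture helper sketched earlier (which raises on the first failing cell) and an examples/ folder of notebooks:

# test_notebooks.py -- hypothetical: a notebook passes if running all of its
# code cells raises no exception (including failed assertions).
import glob
import pytest

@pytest.mark.parametrize("nb_path", sorted(glob.glob("examples/*.ipynb")))
def test_notebook_runs(nb_path):
    run_and_capture(nb_path)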

ihrke mentioned this issue Jan 28, 2015

ihrke (Collaborator) commented Jan 28, 2015

I added an ipynb_runner.py script which can run the IPython notebooks from the command line. It reports whether any of the cells fail (i.e., raise an exception).
This PR already runs the notebooks in examples on Travis CI (just take a look at the last build).

rossant (Owner) commented Feb 14, 2015

FYI I just found this: https://github.com/bollwyvl/nosebook
