# Pytest for Data Scientists

Comprehensive examples and best practices for testing data science code with pytest.

## Directory Structure

```
pytest/
├── README.md                  # This file
├── get_started/               # Basic pytest concepts
│   └── sentiment.py
├── parametrization/           # Parametrized testing
│   ├── process.py
│   ├── process_fixture.py
│   └── sentiment.py
├── test_structure_example/    # Project organization
│   ├── src/
│   └── tests/
├── advanced_fixtures/         # Advanced fixture patterns
│   ├── session_scoped.py
│   ├── autouse_fixtures.py
│   ├── conftest.py
│   └── README.md
├── temporary_files/           # Safe file I/O testing
│   ├── file_operations.py
│   ├── data_pipeline.py
│   └── README.md
├── numerical_testing/         # NumPy/DataFrame testing
│   ├── numpy_arrays.py
│   ├── dataframe_testing.py
│   └── README.md
├── mocking/                   # External dependency mocking
│   ├── api_mocking.py
│   ├── database_mocking.py
│   ├── requirements.txt
│   └── README.md
├── custom_markers/            # Test organization with markers
│   ├── pytest.ini
│   ├── marked_tests.py
│   └── README.md
└── project_config/            # Complete project configuration
    ├── pytest.ini
    ├── conftest.py
    ├── test_with_fixtures.py
    └── README.md
```

## Quick Start

### Basic Installation
```bash
pip install pytest

# Optional plugins: coverage reports, parallel runs, benchmarking
pip install pytest-cov pytest-xdist pytest-benchmark
```

### Run Examples
```bash
# Basic examples
pytest get_started/
pytest parametrization/

# Advanced features
pytest advanced_fixtures/
pytest numerical_testing/
pytest mocking/

# Full project configuration
cd project_config && pytest
```

## Feature Overview

### 🚀 **Basic Concepts** (`get_started/`, `parametrization/`)
- Simple test functions and assertions
- Parametrized tests for multiple test cases
- Basic fixtures for data reuse

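The basics above fit in one small module. The sketch below assumes a hypothetical `clean_text` helper (it is not taken from `get_started/` or `parametrization/`) and shows a parametrized test alongside a plain fixture:

```python
import pytest


def clean_text(text: str) -> str:
    """Hypothetical helper: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


@pytest.fixture
def sample_reviews():
    # Basic fixture: reusable test data injected by argument name
    return ["Great  product", "  TERRIBLE service "]


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Hello World", "hello world"),
        ("  extra   spaces  ", "extra spaces"),
        ("", ""),
    ],
)
def test_clean_text(raw, expected):
    # Runs once per (raw, expected) pair; each case is reported separately
    assert clean_text(raw) == expected


def test_clean_text_on_fixture(sample_reviews):
    assert [clean_text(r) for r in sample_reviews] == ["great product", "terrible service"]
```
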
### 🔧 **Advanced Fixtures** (`advanced_fixtures/`)
- **Session-scoped fixtures**: Load expensive datasets once
- **Autouse fixtures**: Automatic setup for all tests
- **Shared fixtures**: Common test data via `conftest.py`

### 📁 **Safe File Testing** (`temporary_files/`)
- **tmp_path fixture**: Isolated temporary directories
- **File I/O testing**: CSV, JSON, model serialization
- **Pipeline testing**: End-to-end data processing

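A sketch of the `tmp_path` pattern for file I/O, using hypothetical `save_metrics`/`load_metrics` helpers and a JSON file (the helpers are illustrations, not the code in `temporary_files/`):

```python
import json
from pathlib import Path


def save_metrics(metrics: dict, path: Path) -> None:
    """Hypothetical helper: write model metrics to a JSON file."""
    path.write_text(json.dumps(metrics, indent=2))


def load_metrics(path: Path) -> dict:
    """Hypothetical helper: read metrics back from JSON."""
    return json.loads(path.read_text())


def test_metrics_round_trip(tmp_path):
    # tmp_path is a fresh pathlib.Path directory unique to this test
    out_file = tmp_path / "reports" / "metrics.json"
    out_file.parent.mkdir()
    metrics = {"accuracy": 0.93, "f1": 0.88}

    save_metrics(metrics, out_file)

    assert out_file.exists()
    assert load_metrics(out_file) == metrics
```

pytest creates the directory outside your project, so the test never writes into the source tree or collides with files from other tests.
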
### 🔢 **Numerical Testing** (`numerical_testing/`)
- **NumPy arrays**: Floating-point comparison with tolerance
- **Pandas DataFrames**: Proper DataFrame equality testing
- **Statistical validation**: Testing model outputs and data properties

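A hedged sketch of both comparisons named above: `np.testing.assert_allclose` for arrays and `pd.testing.assert_frame_equal` for DataFrames. `normalize` and `add_total` are hypothetical helpers, not taken from the files in `numerical_testing/`:

```python
import numpy as np
import pandas as pd


def normalize(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale values to zero mean and unit variance."""
    return (values - values.mean()) / values.std()


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a row-wise total column."""
    return df.assign(total=df["a"] + df["b"])


def test_normalize_statistics():
    result = normalize(np.array([1.0, 2.0, 3.0, 4.0]))
    # Compare floats with a tolerance instead of ==
    np.testing.assert_allclose(result.mean(), 0.0, atol=1e-12)
    np.testing.assert_allclose(result.std(), 1.0, rtol=1e-12)


def test_add_total():
    df = pd.DataFrame({"a": [0.1, 0.2], "b": [0.2, 0.1]})
    expected = pd.DataFrame({"a": [0.1, 0.2], "b": [0.2, 0.1], "total": [0.3, 0.3]})
    # Checks values, dtypes, index and column order; float columns use a tolerance
    pd.testing.assert_frame_equal(add_total(df), expected)
```

Here `0.1 + 0.2` is not exactly `0.3` in binary floating point, yet `assert_frame_equal` passes because it compares float columns approximately by default.
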
### 🌐 **Mocking External Services** (`mocking/`)
- **API mocking**: Test without hitting real APIs
- **Database mocking**: Test queries without databases
- **Error simulation**: Test failure scenarios safely

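Mocking is easiest when the external client is passed in as an argument. The sketch below assumes a hypothetical `fetch_exchange_rate` function and uses `unittest.mock.Mock` both to fake a successful response and, via `side_effect`, to simulate a failure:

```python
from unittest.mock import Mock

import pytest


def fetch_exchange_rate(client, currency: str) -> float:
    """Hypothetical code under test: `client` is any object with a `get` method."""
    response = client.get(f"/rates/{currency}")
    return float(response.json()["rate"])


def test_fetch_exchange_rate_parses_response():
    client = Mock()
    client.get.return_value.json.return_value = {"rate": "1.25"}

    assert fetch_exchange_rate(client, "EUR") == 1.25
    # Verify the request that would have gone to the real service
    client.get.assert_called_once_with("/rates/EUR")


def test_fetch_exchange_rate_propagates_network_errors():
    client = Mock()
    # side_effect makes the mocked call raise instead of returning
    client.get.side_effect = ConnectionError("service unavailable")

    with pytest.raises(ConnectionError):
        fetch_exchange_rate(client, "EUR")
```

The same idea applies to databases: pass a connection object into your query function and replace it with a `Mock` in tests.
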
### 🏷️ **Custom Markers** (`custom_markers/`)
- **Test organization**: Group tests by speed, requirements, domain
- **Selective execution**: Run specific test categories
- **CI/CD integration**: Different test suites for different stages

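One way to organize tests with markers, sketched as a single file for brevity. The `pytest_configure` hook would normally live in `conftest.py` (registering the markers under `markers` in `pytest.ini` is equivalent), and `mean` is a hypothetical function under test:

```python
import pytest


# conftest.py part: register markers so `pytest --strict-markers` accepts them
def pytest_configure(config):
    config.addinivalue_line("markers", "fast: quick unit tests")
    config.addinivalue_line("markers", "slow: long-running tests")
    config.addinivalue_line("markers", "integration: tests touching several components")


# Test module part
def mean(values):
    """Hypothetical helper under test."""
    return sum(values) / len(values)


@pytest.mark.fast
def test_mean_small_input():
    assert mean([1, 2, 3]) == 2


@pytest.mark.slow
@pytest.mark.integration
def test_mean_large_input():
    values = list(range(1_000_000))
    assert mean(values) == 499_999.5
```

With these markers registered, `pytest -m fast` runs only `test_mean_small_input`, while `pytest -m "not slow"` deselects `test_mean_large_input`.
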
### ⚙️ **Project Configuration** (`project_config/`)
- **Complete setup**: Production-ready pytest configuration
- **Centralized fixtures**: Project-wide test utilities
- **Best practices**: Logging, warnings, reproducibility

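A sketch of a project-wide reproducibility setup for `conftest.py`, assuming NumPy is installed and using an arbitrary `SEED` value; `bootstrap_mean` is a hypothetical helper included so the fixtures have something to exercise:

```python
import random

import numpy as np
import pytest

SEED = 42  # assumption: any fixed value works, as long as it never changes


@pytest.fixture(autouse=True)
def fixed_seed():
    # autouse=True: runs before every test in scope without being requested
    random.seed(SEED)
    np.random.seed(SEED)


@pytest.fixture
def rng():
    # A fresh, explicitly seeded Generator for each test that asks for one
    return np.random.default_rng(SEED)


def bootstrap_mean(values, rng, n_resamples=100):
    """Hypothetical helper: bootstrap estimate of the mean."""
    samples = rng.choice(values, size=(n_resamples, len(values)), replace=True)
    return samples.mean(axis=1).mean()


def test_bootstrap_is_reproducible(rng):
    values = np.arange(10.0)
    first = bootstrap_mean(values, rng)
    second = bootstrap_mean(values, np.random.default_rng(SEED))
    assert first == second
```

The autouse fixture covers older code that relies on global random state, while the `rng` fixture lets newer code receive an explicit `numpy.random.Generator`, so a test's random draws do not depend on which tests ran before it.
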
## Common Workflows

### Development Workflow
```bash
# Fast feedback during development
pytest -m fast

# Before committing changes
pytest -m "fast or (integration and not slow)"

# Full test suite
pytest
```

### Continuous Integration
```bash
# Unit tests (fast feedback)
pytest -m "unit and fast"

# Integration tests
pytest -m "integration and not gpu and not expensive"

# Performance tests (separate stage)
pytest -m "slow or expensive"
```

### Data Science Specific
```bash
# Test data processing pipelines
pytest -m data_processing

# Test model training
pytest -m model_training

# Test without external dependencies
pytest -m "not api and not database"
```

## Key Benefits for Data Scientists

### 🛡️ **Reliability**
- **Reproducible results**: Consistent random seeds
- **Isolated tests**: No interference between tests
- **Proper numerical comparison**: Handle floating-point precision

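Floating-point comparison in practice: `pytest.approx` compares scalars, lists, and dicts of numbers with a relative tolerance of 1e-6 by default. `precision` below is a hypothetical metric helper:

```python
import pytest


def precision(true_positives: int, false_positives: int) -> float:
    """Hypothetical metric helper."""
    return true_positives / (true_positives + false_positives)


def test_float_sum_needs_tolerance():
    # Exact equality fails because 0.1 and 0.2 have no exact binary representation
    assert 0.1 + 0.2 != 0.3
    assert 0.1 + 0.2 == pytest.approx(0.3)


def test_precision():
    # pytest.approx also works element-wise on lists and dicts of numbers
    scores = [precision(1, 2), precision(2, 1)]
    assert scores == pytest.approx([1 / 3, 2 / 3])
    assert {"p": precision(1, 1)} == pytest.approx({"p": 0.5}, abs=1e-9)
```
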
### ⚡ **Performance**
- **Fast feedback**: Separate fast/slow test categories
- **Efficient fixtures**: Load expensive data once
- **Parallel execution**: Run tests concurrently with pytest-xdist (`pytest -n auto`)

### 🔍 **Better Debugging**
- **Clear error messages**: Detailed assertion information
- **Test organization**: Easy to find and run specific tests
- **Comprehensive logging**: Track test execution

### 🤝 **Team Collaboration**
- **Standardized setup**: Consistent test environment
- **Shared fixtures**: Common test data and utilities
- **Documentation**: Clear examples and best practices

## Testing Patterns by Use Case

### Data Processing
```python
def test_data_cleaning(tmp_path):
    # Use temporary files for safe testing
    input_file = tmp_path / "dirty_data.csv"
    # Test cleaning pipeline...
```

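A runnable version of the data-cleaning pattern, assuming pandas and a hypothetical `clean_csv` step that drops rows with missing values and duplicate rows:

```python
from pathlib import Path

import pandas as pd


def clean_csv(input_path: Path, output_path: Path) -> pd.DataFrame:
    """Hypothetical cleaning step: drop rows with missing values and duplicates."""
    df = pd.read_csv(input_path).dropna().drop_duplicates().reset_index(drop=True)
    df.to_csv(output_path, index=False)
    return df


def test_data_cleaning(tmp_path):
    # Write a small dirty file into the per-test temporary directory
    input_file = tmp_path / "dirty_data.csv"
    input_file.write_text("id,score\n1,0.5\n2,\n1,0.5\n3,0.9\n")
    output_file = tmp_path / "clean_data.csv"

    cleaned = clean_csv(input_file, output_file)

    # Row 2 (missing score) and the duplicate of row 1 are removed
    assert cleaned["id"].tolist() == [1, 3]
    assert output_file.exists()
    pd.testing.assert_frame_equal(pd.read_csv(output_file), cleaned)
```
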
### Machine Learning
```python
import pytest


@pytest.fixture(scope="session")
def trained_model():
    # Train once per session, then test many aspects
    return expensive_model_training()  # placeholder for your own training routine


def test_model_accuracy(trained_model):
    # Use the fixture argument; the model is injected as `trained_model`
    assert trained_model.accuracy > 0.9
```

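A runnable variant of the session-scoped model fixture. To keep the sketch cheap, it swaps the placeholder `expensive_model_training()` for a least-squares line fit with `np.polyfit` (`train_linear_model` is hypothetical) and checks the fitted parameters with `pytest.approx` rather than exact equality:

```python
import numpy as np
import pytest


def train_linear_model(x, y):
    """Hypothetical stand-in for expensive training: least-squares line fit."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return {"slope": slope, "intercept": intercept}


@pytest.fixture(scope="session")
def trained_model():
    # Noise-free data generated from y = 3x + 1, so the true parameters are known
    x = np.linspace(0, 10, 50)
    return train_linear_model(x, 3.0 * x + 1.0)


def test_model_recovers_slope(trained_model):
    assert trained_model["slope"] == pytest.approx(3.0, rel=1e-6)


def test_model_recovers_intercept(trained_model):
    assert trained_model["intercept"] == pytest.approx(1.0, abs=1e-6)
```
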
### External APIs
```python
from unittest.mock import patch


@patch('requests.get')
def test_api_integration(mock_get):
    # Mock external calls for reliable testing
    mock_get.return_value.json.return_value = {'data': 'test'}
    # Test your logic...
```

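The `@patch('requests.get')` pattern above needs the `requests` package installed. A fully runnable variant of the same idea can patch the standard-library `urllib.request.urlopen` instead; `fetch_data` is a hypothetical function under test:

```python
import io
import json
import urllib.request
from unittest.mock import patch


def fetch_data(url: str) -> str:
    """Hypothetical code under test: download JSON and return its 'data' field."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["data"]


@patch("urllib.request.urlopen")
def test_fetch_data(mock_urlopen):
    # urlopen is used as a context manager, so configure what __enter__ returns
    mock_urlopen.return_value.__enter__.return_value = io.BytesIO(b'{"data": "test"}')

    assert fetch_data("https://example.com/api") == "test"
    mock_urlopen.assert_called_once_with("https://example.com/api")
```

`patch` replaces a name where the code under test looks it up. Here `fetch_data` reaches `urlopen` through the `urllib.request` module, so patching `"urllib.request.urlopen"` works; if your module did `from urllib.request import urlopen`, you would patch `"your_module.urlopen"` instead.
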
## Getting Help

Most directories contain their own README.md with:
- Specific feature documentation
- Running instructions
- Best practices
- Troubleshooting guides

Start with the examples that match your current testing needs, then explore advanced features as your test suite grows.

## Related Resources

- **Article**: [Pytest for Data Scientists](https://towardsdatascience.com/pytest-for-data-scientists-2990319e55e6)
- **Video Series**: [YouTube Playlist](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO)
- **Official Docs**: [pytest.org](https://docs.pytest.org/)

## Contributing

These examples are designed to be practical and educational. Feel free to adapt them for your specific data science testing needs.