CodeCutTech
diff --git a/.gitignore b/.gitignore
Lines changed: 4 additions & 1 deletion
diff --git a/data_science_tools/pgvector_rag.ipynb b/data_science_tools/pgvector_rag.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/data_science_tools/pytest/README.md b/data_science_tools/pytest/README.md
Lines changed: 219 additions & 0 deletions
diff --git a/data_science_tools/pytest/advanced_fixtures/README.md b/data_science_tools/pytest/advanced_fixtures/README.md
Lines changed: 52 additions & 0 deletions
diff --git a/data_science_tools/pytest/advanced_fixtures/autouse_fixtures.py b/data_science_tools/pytest/advanced_fixtures/autouse_fixtures.py
Lines changed: 23 additions & 0 deletions
diff --git a/data_science_tools/pytest/advanced_fixtures/conftest.py b/data_science_tools/pytest/advanced_fixtures/conftest.py
Lines changed: 63 additions & 0 deletions
@@ -149,4 +149,7 @@ outputs
 marimo_notebooks
 *.csv
 *.parquet
-__marimo__/
+__marimo__/
+
+# Claude Code documentation
+CLAUDE.md
@@ -602,4 +602,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
@@ -1,2 +1,221 @@
 [![View on YouTube](https://img.shields.io/badge/YouTube-Watch%20on%20Youtube-red?logo=youtube)](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO) [![View on Medium](https://img.shields.io/badge/Medium-View%20on%20Medium-blue?logo=medium)](https://towardsdatascience.com/pytest-for-data-scientists-2990319e55e6)
 
+# Pytest for Data Scientists
+
+Comprehensive examples and best practices for testing data science code with pytest.
+
+## Directory Structure
+
+```
+pytest/
+├── README.md                    # This file
+├── get_started/                 # Basic pytest concepts
+│   └── sentiment.py
+├── parametrization/            # Parametrized testing
+│   ├── process.py
+│   ├── process_fixture.py
+│   └── sentiment.py
+├── test_structure_example/     # Project organization
+│   ├── src/
+│   └── tests/
+├── advanced_fixtures/          # Advanced fixture patterns
+│   ├── session_scoped.py
+│   ├── autouse_fixtures.py
+│   ├── conftest.py
+│   └── README.md
+├── temporary_files/           # Safe file I/O testing
+│   ├── file_operations.py
+│   ├── data_pipeline.py
+│   └── README.md
+├── numerical_testing/         # NumPy/DataFrame testing
+│   ├── numpy_arrays.py
+│   ├── dataframe_testing.py
+│   └── README.md
+├── mocking/                   # External dependency mocking
+│   ├── api_mocking.py
+│   ├── database_mocking.py
+│   ├── requirements.txt
+│   └── README.md
+├── custom_markers/            # Test organization with markers
+│   ├── pytest.ini
+│   ├── marked_tests.py
+│   └── README.md
+└── project_config/           # Complete project configuration
+    ├── pytest.ini
+    ├── conftest.py
+    ├── test_with_fixtures.py
+    └── README.md
+```
+
+## Quick Start
+
+### Basic Installation
+```bash
+pip install pytest
+
+# For advanced features
+pip install pytest-cov pytest-xdist pytest-benchmark
+```
+
+### Run Examples
+```bash
+# Basic examples
+pytest get_started/
+pytest parametrization/
+
+# Advanced features
+pytest advanced_fixtures/
+pytest numerical_testing/
+pytest mocking/
+
+# Full project configuration
+cd project_config && pytest
+```
+
+## Feature Overview
+
+### 🚀 **Basic Concepts** (`get_started/`, `parametrization/`)
+- Simple test functions and assertions
+- Parametrized tests for multiple test cases
+- Basic fixtures for data reuse
+
+### 🔧 **Advanced Fixtures** (`advanced_fixtures/`)
+- **Session-scoped fixtures**: Load expensive datasets once
+- **Autouse fixtures**: Automatic setup for all tests  
+- **Shared fixtures**: Common test data via `conftest.py`
+
+### 📁 **Safe File Testing** (`temporary_files/`)
+- **tmp_path fixture**: Isolated temporary directories
+- **File I/O testing**: CSV, JSON, model serialization
+- **Pipeline testing**: End-to-end data processing
+
+### 🔢 **Numerical Testing** (`numerical_testing/`)
+- **NumPy arrays**: Floating-point comparison with tolerance
+- **Pandas DataFrames**: Proper DataFrame equality testing
+- **Statistical validation**: Testing model outputs and data properties
+
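A minimal sketch of tolerance-based comparison, assuming only pytest itself is installed. It uses `pytest.approx` on plain floats and lists; for NumPy arrays and pandas DataFrames the usual counterparts are `np.testing.assert_allclose` and `pd.testing.assert_frame_equal`. The test names and values are illustrative and are not taken from `numerical_testing/`.

```python
import pytest


def test_float_sum_with_tolerance():
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point,
    # so a plain == comparison would fail here.
    assert 0.1 + 0.2 != 0.3
    assert 0.1 + 0.2 == pytest.approx(0.3)


def test_predictions_with_explicit_tolerance():
    # Hypothetical model outputs compared against reference values
    predictions = [0.501, 0.249, 0.250]
    expected = [0.5, 0.25, 0.25]
    # abs=2e-3 accepts an absolute difference of up to 0.002 per element
    assert predictions == pytest.approx(expected, abs=2e-3)
```

An explicit `abs` or `rel` tolerance makes the accepted error visible in the test itself instead of relying on the defaults.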
+### 🌐 **Mocking External Services** (`mocking/`)
+- **API mocking**: Test without hitting real APIs
+- **Database mocking**: Test queries without databases
+- **Error simulation**: Test failure scenarios safely
+
+### 🏷️ **Custom Markers** (`custom_markers/`)
+- **Test organization**: Group tests by speed, requirements, domain
+- **Selective execution**: Run specific test categories
+- **CI/CD integration**: Different test suites for different stages
+
+### ⚙️ **Project Configuration** (`project_config/`)
+- **Complete setup**: Production-ready pytest configuration
+- **Centralized fixtures**: Project-wide test utilities
+- **Best practices**: Logging, warnings, reproducibility
+
+## Common Workflows
+
+### Development Workflow
+```bash
+# Fast feedback during development
+pytest -m fast
+
+# Before committing changes
+pytest -m "fast or (integration and not slow)"
+
+# Full test suite
+pytest
+```
+
+### Continuous Integration
+```bash
+# Unit tests (fast feedback)
+pytest -m "unit and fast"
+
+# Integration tests  
+pytest -m "integration and not gpu and not expensive"
+
+# Performance tests (separate stage)
+pytest -m "slow or expensive"
+```
+
+### Data Science Specific
+```bash
+# Test data processing pipelines
+pytest -m data_processing
+
+# Test model training
+pytest -m model_training
+
+# Test without external dependencies
+pytest -m "not api and not database"
+```
+
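The `-m` expressions above select tests by marker. A sketch of how tests might carry the `fast`, `unit`, `slow`, and `integration` markers used in those commands follows. The test names and bodies are hypothetical, and the markers are assumed to be registered in `pytest.ini` (pytest warns about unregistered markers, and `--strict-markers` turns that into an error).

```python
import pytest


@pytest.mark.unit
@pytest.mark.fast
def test_normalize_scales_to_unit_range():
    values = [2.0, 4.0, 6.0]
    low, high = min(values), max(values)
    scaled = [(v - low) / (high - low) for v in values]
    assert scaled == [0.0, 0.5, 1.0]


@pytest.mark.integration
@pytest.mark.slow
def test_pipeline_end_to_end():
    # Stand-in for a long-running pipeline check
    records = [{"id": i, "value": i * i} for i in range(1000)]
    assert sum(r["value"] for r in records) == 332833500
```

With these markers, `pytest -m "unit and fast"` would select only the first test, and `pytest -m "slow or expensive"` only the second.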
+## Key Benefits for Data Scientists
+
+### 🛡️ **Reliability**
+- **Reproducible results**: Consistent random seeds
+- **Isolated tests**: No interference between tests
+- **Proper numerical comparison**: Handle floating-point precision
+
+### ⚡ **Performance** 
+- **Fast feedback**: Separate fast/slow test categories
+- **Efficient fixtures**: Load expensive data once
+- **Parallel execution**: Run tests concurrently
+
+### 🔍 **Better Debugging**
+- **Clear error messages**: Detailed assertion information
+- **Test organization**: Easy to find and run specific tests
+- **Comprehensive logging**: Track test execution
+
+### 🤝 **Team Collaboration**
+- **Standardized setup**: Consistent test environment
+- **Shared fixtures**: Common test data and utilities  
+- **Documentation**: Clear examples and best practices
+
+## Testing Patterns by Use Case
+
+### Data Processing
+```python
+def test_data_cleaning(tmp_path):
+    # Use temporary files for safe testing
+    input_file = tmp_path / "dirty_data.csv"
+    # Test cleaning pipeline...
+```
+
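The `test_data_cleaning` stub above stops before any pipeline runs. One way it could be filled in is sketched here, assuming a hypothetical `clean_csv` helper that drops rows with an empty field; the helper, the column names, and the sample rows are illustrative only.

```python
import csv
from pathlib import Path


def clean_csv(input_file: Path, output_file: Path) -> int:
    """Hypothetical cleaning step: drop rows that contain an empty field."""
    with input_file.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        kept = [
            row for row in reader
            if all((value or "").strip() for value in row.values())
        ]
    with output_file.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


def test_data_cleaning(tmp_path):
    # tmp_path is a fresh directory per test, so no real data is touched
    input_file = tmp_path / "dirty_data.csv"
    input_file.write_text("name,score\nalice,0.9\nbob,\n,0.4\ncarol,0.7\n")
    output_file = tmp_path / "clean_data.csv"

    assert clean_csv(input_file, output_file) == 2
    assert output_file.read_text().splitlines() == [
        "name,score",
        "alice,0.9",
        "carol,0.7",
    ]
```

Because both the input and the output live under `tmp_path`, the test leaves nothing behind in the working directory.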
+### Machine Learning
+```python
+@pytest.fixture(scope="session")
+def trained_model():
+    # Train once, test many aspects
+    return expensive_model_training()
+
+def test_model_accuracy(trained_model):
+    # The session-scoped model is trained once and reused here
+    assert trained_model.accuracy > 0.9
+```
+
+### External APIs
+```python
+@patch('requests.get')
+def test_api_integration(mock_get):
+    # Mock external calls for reliable testing
+    mock_get.return_value.json.return_value = {'data': 'test'}
+    # Test your logic...
+```
+
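A self-contained variant of the mocking pattern above, sketched with the standard library only: it patches `urllib.request.urlopen` instead of `requests.get`, so it runs without third-party packages. The `fetch_data` function and the URL are hypothetical.

```python
import json
import urllib.request
from unittest.mock import patch


def fetch_data(url: str) -> dict:
    """Hypothetical function under test: fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())


@patch("urllib.request.urlopen")
def test_fetch_data(mock_urlopen):
    # urlopen is used as a context manager, so configure what __enter__ yields
    response = mock_urlopen.return_value.__enter__.return_value
    response.read.return_value = b'{"data": "test"}'

    assert fetch_data("https://example.invalid/items") == {"data": "test"}
    # No network call is made; the mock records how it was invoked
    mock_urlopen.assert_called_once_with("https://example.invalid/items")
```

The same structure applies to `requests.get`: patch the name where it is looked up, set the return value, then assert on both the result and the recorded call.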
+## Getting Help
+
+Most directories contain a README with:
+- Specific feature documentation
+- Running instructions  
+- Best practices
+- Troubleshooting guides
+
+Start with the examples that match your current testing needs, then explore advanced features as your test suite grows.
+
+## Related Resources
+
+- **Article**: [Pytest for Data Scientists](https://towardsdatascience.com/pytest-for-data-scientists-2990319e55e6)
+- **Video Series**: [YouTube Playlist](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO)
+- **Official Docs**: [pytest.org](https://docs.pytest.org/)
+
+## Contributing
+
+These examples are designed to be practical and educational. Feel free to adapt them for your specific data science testing needs.
@@ -0,0 +1,52 @@
+# Advanced Fixtures Examples
+
+This directory demonstrates advanced pytest fixture patterns particularly useful for data science projects.
+
+## Files
+
+- `session_scoped.py` - Session-scoped fixtures for expensive operations (like loading large datasets)
+- `autouse_fixtures.py` - Autouse fixtures that run automatically before each test
+- `conftest.py` - Shared fixtures available to all tests in this directory
+
+## Key Concepts
+
+### Session-Scoped Fixtures
+- Run only once per test session
+- Perfect for loading expensive datasets or training models
+- Shared across all tests that request them
+- Significant performance improvements for test suites
+
+### Autouse Fixtures
+- Applied automatically to every test in the fixture's scope (the module or `conftest.py` directory that defines it), with no explicit request
+- Great for setup that should always happen (like setting random seeds)
+- Ensures consistent test environments
+- Reduces boilerplate code in individual tests
+
+### Conftest.py
+- Provides fixtures to all test files in the directory
+- No imports needed - fixtures are automatically available
+- Each directory level can have its own `conftest.py`
+- Fixtures defined in a parent directory's `conftest.py` are available to tests in its subdirectories
+
+## Running the Examples
+
+```bash
+# Run all advanced fixture examples
+pytest advanced_fixtures/
+
+# Run with verbose output to see fixture setup
+pytest -v advanced_fixtures/
+
+# Run only session-scoped fixture tests
+pytest advanced_fixtures/session_scoped.py
+
+# Run only autouse fixture tests  
+pytest advanced_fixtures/autouse_fixtures.py
+```
+
+## Key Benefits for Data Science
+
+1. **Performance**: Session-scoped fixtures prevent reloading expensive datasets
+2. **Reproducibility**: Autouse fixtures ensure consistent random seeds
+3. **Organization**: Conftest.py centralizes common test data and setup
+4. **Maintainability**: Reduces code duplication across test files
@@ -0,0 +1,23 @@
+import numpy as np
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def setup_random_seeds():
+	print("Setting up random seeds...")
+	np.random.seed(42)
+	import random
+	random.seed(42)
+
+
+def test_model_prediction():
+	# This test will have reproducible random results
+	X = np.random.randn(100, 5)
+	# Your model training and prediction code here
+	assert len(X) == 100
+
+
+def test_data_sampling():
+	# This test also gets reproducible randomness
+	sample = np.random.choice([1, 2, 3, 4, 5], size=10)
+	assert len(sample) == 10
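
The effect of reseeding before every test can be made explicit: each test sees the start of the same random stream, regardless of what earlier tests drew. The sketch below uses the standard-library `random` module to stay dependency-free; `np.random.seed(42)` resets NumPy's global generator in the same way. The fixture and test names are hypothetical.

```python
import random

import pytest


@pytest.fixture(autouse=True)
def seed_random():
    # Runs before every test in this module without being requested
    random.seed(42)


def test_first_draw():
    # A fresh generator seeded with 42 yields the same first value
    assert random.random() == random.Random(42).random()


def test_first_draw_again():
    # Without the autouse reseed, this test would see a later value
    # of the stream whenever it runs after test_first_draw.
    assert random.random() == random.Random(42).random()
```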
@@ -0,0 +1,63 @@
+"""
+Shared fixtures for advanced_fixtures examples.
+
+This conftest.py file provides fixtures that can be used across
+all test files in this directory without explicit imports.
+"""
+
+import numpy as np
+import pandas as pd
+import pytest
+
+
+@pytest.fixture(scope="session")
+def ml_dataset():
+	"""Create a machine learning dataset for testing."""
+	np.random.seed(42)  # For reproducibility
+
+	# Generate features
+	n_samples = 1000
+	n_features = 4
+
+	X = np.random.randn(n_samples, n_features)
+	# Create a target variable with some relationship to features
+	y = (X[:, 0] + X[:, 1] * 0.5 + np.random.normal(0, 0.1, n_samples) > 0).astype(int)
+
+	# Create DataFrame
+	feature_names = [f"feature_{i + 1}" for i in range(n_features)]
+	df = pd.DataFrame(X, columns=feature_names)
+	df["target"] = y
+
+	return df
+
+
+@pytest.fixture(scope="module")
+def data_processing_config():
+	"""Configuration for data processing tests."""
+	return {
+		"train_size": 0.8,
+		"random_state": 42,
+		"normalize": True,
+		"remove_outliers": True,
+		"outlier_threshold": 3.0,
+	}
+
+
+@pytest.fixture
+def sample_predictions():
+	"""Generate sample model predictions for testing."""
+	np.random.seed(42)
+	return {
+		"y_true": np.random.randint(0, 2, 100),
+		"y_pred": np.random.rand(100),  # Probability predictions
+		"y_pred_binary": np.random.randint(0, 2, 100),
+	}
+
+
+@pytest.fixture(autouse=True)
+def reset_random_state():
+	"""Ensure each test starts with a known random state."""
+	np.random.seed(42)
+	import random
+
+	random.seed(42)
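
`ml_dataset` is session-scoped, so every test receives the same DataFrame object, and an in-place change made by one test would be visible to later tests. One possible guard, sketched with a plain dict standing in for the DataFrame, is a function-scoped fixture that hands out a copy. The names `build_dataset`, `shared_dataset`, and `dataset_copy` are hypothetical; with pandas, `ml_dataset.copy()` would take the place of `copy.deepcopy`.

```python
import copy

import pytest


def build_dataset() -> dict:
    """Hypothetical stand-in for the expensive dataset construction."""
    return {"feature_1": [0.1, 0.2, 0.3], "target": [0, 1, 0]}


@pytest.fixture(scope="session")
def shared_dataset():
    # Built once per session; tests should treat it as read-only
    return build_dataset()


@pytest.fixture
def dataset_copy(shared_dataset):
    # Function-scoped: each test gets its own copy that is safe to mutate
    return copy.deepcopy(shared_dataset)


def test_can_mutate_copy(dataset_copy):
    dataset_copy["target"].append(1)
    assert len(dataset_copy["target"]) == 4


def test_shared_data_untouched(shared_dataset):
    # Passes in any order, because only copies are ever modified
    assert len(shared_dataset["target"]) == 3
```

Copying costs far less than rebuilding the data, so the session-scoped load still pays off.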