# Install dependencies
make install
# Alternative installation
pip install -e .[dev]
# Format code
make format # Runs black with line length 79 and fixes import ordering
# Run all tests
make test
# Run specific test file or directory
pytest policyengine_us/tests/path/to/test_file.py
# Run specific test function
pytest policyengine_us/tests/path/to/test_file.py::test_function_name
# Run specific YAML tests
policyengine-core test path/to/tests -c policyengine_us [-v]
# Run microsimulation test
pytest policyengine_us/tests/microsimulation/test_microsim.py
# Run YAML-specific tests
make test-yaml-structural
make test-yaml-no-structural
# Generate documentation
make documentation
- Checkout a PR:
gh pr checkout [PR-NUMBER]
- View PR list:
gh pr list
- View PR details:
gh pr view [PR-NUMBER]
- Contributing to PRs:
- After making code changes, run
make format
to ensure code meets style guidelines - Use
git push
to push changes to the PR branch - Alternatively, use VS Code's "Sync Changes" button in the Source Control panel
- After making code changes, run
- Imports: Use absolute imports from policyengine_us.model_api for Variables
- Formatting: Line length 79 characters; use Black for formatting
- Types: Use type hints; import ArrayLike from numpy.typing
- Variable Naming: Use snake_case for variable names and function names
- Error Handling: Use np.divide with out/where parameters to avoid divide-by-zero errors
- Documentation: Add docstrings to classes and functions; include description, parameters, returns
- Python >= 3.10, < 3.13
- Use
where
notif
andmax_
/min_
notmax
/min
for vectorization - For array comparisons use
(x >= min) & (x <= max)
notmin <= x <= max
- Test-driven development: write YAML tests before implementing variables
- Variables must have descriptive names and follow pattern in existing files
- Parameters defined in YAML with metadata (units, reference, etc.)
- Follow GitHub Flow with PRs targeting master branch
- Every PR needs a changelog entry in changelog_entry.yaml
- Unit tests with scalar values can pass while vectorized microsimulation fails
- When implementing a previously empty variable, be sure to check for dependent formulas
- When using
defined_for
, ensure it's tested in microsimulation context - Be careful with chained comparisons in formulas - they work with scalars but fail with arrays
- Prefer explicit vectorized comparison operators joined with
&
and|
- Avoid using
if
checks with.any()
in variable formulas as they can prevent proper vectorization - Use
p = parameters(period).gov.<program>
pattern for accessing parameter trees - Break complex calculations into separate variables for better modularity and testing
- In YAML tests, use
[val1, val2]
array syntax instead of hyphenated lists for output values - When updating test values, add detailed calculation steps in comments to document the derivation
- For scale parameters that return integers, avoid using
rate_unit: int
in metadata (use/1
instead) due to parameter validation issues - Parameter file naming matters: make sure variables reference the exact parameter file name
- When refactoring from enum to numeric values, update all downstream dependencies consistently
- Prefer parameter-driven calculations over hardcoded logic in formulas when possible
- Use
bool
instead ofint
or/1
inrate_unit
for scale parameters when appropriate to avoid validation issues - For categorization logic, create separate variables for each test (e.g.,
is_school_age
) and use them in downstream variables - When dynamically determining numeric groups/categories, calculate values rather than hardcoding them (e.g.,
max_value + 1
) - Structure parameter files to match program documentation and real-world rules to improve maintainability
- Program takeup is assigned during microdata construction, not simulation time
- Changes to takeup parameters (SNAP, EITC, etc.) have no effect in the web app
- These parameters should include
economy: false
in their metadata
-
Unit Tests:
- Create tests in
variable.yaml
that test only the direct inputs tovariable.py
- Unit tests should focus on validating a single variable's logic in isolation
- Use simple inputs that test specific edge cases and logic branches
- Variables higher in the dependency chain should stub their inputs directly
- Each test should verify one specific behavior or case
- Create tests in
-
Integration Tests:
- Place integration tests in
integration.yaml
within the relevant module directory - Integration tests verify the entire calculation pipeline works together
- Test real-world scenarios with multiple people and complex household structures
- Validate intermediate values along with final outputs
- Include edge cases that test interactions between different parts of the system
- Place integration tests in
- Cross-check parameter values against authoritative external sources (gov websites, calculators)
- Document the source, publication date, and effective dates in parameter metadata
- Include both title and href for references to maintain traceability
- Use multiple sources to validate complex parameters (e.g., tax brackets, benefit amounts)
- For annually updated values, document the adjustment methodology and inflation indices
- Test parameter values with real-world examples through YAML tests
- Be especially careful with multiplicative factors, which can significantly magnify errors
- Consider interactions between parameters when updating values
- When updating parameters, verify both individual values and downstream impacts
- Use descriptive changelog entries that reference authoritative sources for updates
- Always cite specific regulation sections in variable reference and documentation
- When implementing complex benefit calculations, clearly document the step-by-step process based on regulations
- Follow the exact order of operations specified in regulations (e.g., which exclusions apply first)
- Create dedicated unit tests for each step of complex benefit calculations
- Implement integration tests that verify the entire benefit calculation pipeline
- Add detailed comments that reference specific regulatory paragraphs or sections
- Create README documentation for complex modules or benefit programs
- For income tests or phaseouts, pay special attention to threshold values and differential calculations
- Verify behavior at edge cases (income just below/above thresholds, exact boundary conditions)
- Consider real-world examples to validate implementation, including official calculators or published examples
-
⚠️ ABSOLUTELY NEVER HARDCODE LOGIC JUST TO PASS SPECIFIC TEST CASES⚠️ - This is a serious form of dishonesty that undermines code quality and model integrity
- It creates technical debt and maintenance nightmares
- It destroys the ability to trust results and undermines the entire purpose of tests
- NEVER add conditional logic that returns fixed values for specific input combinations
- NEVER use period.start.year or other conditional checks to return hardcoded values for test cases
- If you're tempted to hardcode values to make tests pass, you MUST fix the underlying issue instead
-
Test cases should be treated as verification of the correctness of the calculation logic.
-
If tests fail:
- FIRST understand WHY they are failing:
- Is there a legitimate bug in the implementation?
- Is there a misunderstanding of the regulations/policy?
- Are the test expectations incorrect or outdated?
- Fix the ACTUAL ROOT CAUSE, not the symptom:
- If the implementation is wrong, fix the formula/algorithm
- If the test expectations are wrong, update the test expectations
- If parameters have changed, document this and update accordingly
- FIRST understand WHY they are failing:
-
When dealing with regulatory examples:
- Use period-appropriate parameter values (e.g., for 1986 examples, use 1986 parameter values)
- If you need special handling for an example year, make the algorithm work correctly for that year's parameters
- Document any special time-period specific logic in BOTH code comments and variable documentation
- Always ensure your implementation follows the general rules and principles for all time periods
- Focus on preserving the calculation PROCESS rather than just matching specific OUTCOMES
-
Never sacrifice correctness and maintainability for passing tests. It is better to have a failing test with a clear explanation than a "passing" test with incorrect implementation.
- When using
breakdown
metadata in parameters, avoid using variable references for integer values. Instead use Python expressions likerange(1, 5)
. - The parameter validation system in policyengine-core has issues with certain parameter structures:
- Using boolean keys (
True
/False
) as parameter names can cause validation errors - Using integer output variables in breakdown metadata can cause parameter validation errors
- If you encounter
UnboundLocalError: cannot access local variable 'possible_values'
, it's likely due to parameter validation issues
- Using boolean keys (
- To fix parameter validation issues:
- Split complex parameters into separate, simpler parameter files
- Use string names instead of boolean keys (e.g.,
"eligible"
instead ofTrue
) - Use Python expressions like
range()
instead of variable references for enumerated values - Use breakdown patterns that match existing working examples in the codebase
- See GitHub issue #346 for more details
-
Marital Units:
- Include exactly 1 person (if unmarried) or 2 people (if married)
- Do NOT include children or dependents
- Defined in entities.py as "An unmarried person, or a married and co-habiting couple"
- Used for calculations where spousal relationships matter (like SSI)
marital_unit.nb_persons()
will return 1 or 2, never more
-
SSI Income Attribution:
- For married couples where both are SSI-eligible:
- Combined income is attributed to each spouse via
ssi_marital_earned_income
andssi_marital_unearned_income
- These variables use
ssi_marital_both_eligible
to determine if combined income should be used - When both eligible, each spouse receives the combined household income for SSI calculations
- Combined income is attributed to each spouse via
- For married couples where both are SSI-eligible:
-
SSI Spousal Deeming:
- Only applies when one spouse is eligible and the other is ineligible
- If both are eligible, spousal deeming doesn't apply; instead income is combined through marital income variables
-
Debugging Entity Relationships:
- When checking entity totals or sums, be aware of which entity level you're operating at
- For variables that need to sum across units, use
entity.sum(variable)
- Use
entity.nb_persons()
to count people in an entity