Add selected loss.py targets to unified calibration #946

Merged: MaxGhenis merged 2 commits into main from codex/add-loss-py-targets-unified on May 16, 2026

Collaborator

@anth-volk anth-volk commented May 11, 2026

Fixes #945

Requesting review from @MaxGhenis to confirm that we want these targets in US.h5 and state/district-level datasets.

Summary

  • Port selected legacy loss.py target families into unified calibration target ETL and selection.
  • Add SOI taxable AGI-grid targets for irs_employment_income, pension_income, and social_security.
  • Add or select state EITC claimant counts, state SNAP household counts, LIHEAP household counts, state ACS rent, real-estate-tax itemizer counts, AOTC and education credit targets, and Medicare Part B age-bucket targets.
  • Add focused unit coverage and a Towncrier changelog fragment.

Targets Added

Exact target-selection rules added to policyengine_us_data/calibration/target_config.yaml:

| Variable | Geo level | Domain variable |
| --- | --- | --- |
| household_count | state | snap |
| rent | state | none |
| tax_unit_count | state | eitc |
| tax_unit_count | state | real_estate_taxes,tax_unit_itemizes |
| tax_unit_count | national | adjusted_gross_income,income_tax_before_credits,irs_employment_income |
| irs_employment_income | national | adjusted_gross_income,income_tax_before_credits,irs_employment_income |
| tax_unit_count | national | adjusted_gross_income,filing_status,income_tax_before_credits,irs_employment_income |
| irs_employment_income | national | adjusted_gross_income,filing_status,income_tax_before_credits,irs_employment_income |
| tax_unit_count | national | adjusted_gross_income,income_tax_before_credits,pension_income |
| pension_income | national | adjusted_gross_income,income_tax_before_credits,pension_income |
| tax_unit_count | national | adjusted_gross_income,filing_status,income_tax_before_credits,pension_income |
| pension_income | national | adjusted_gross_income,filing_status,income_tax_before_credits,pension_income |
| tax_unit_count | national | adjusted_gross_income,income_tax_before_credits,social_security |
| social_security | national | adjusted_gross_income,income_tax_before_credits,social_security |
| tax_unit_count | national | adjusted_gross_income,filing_status,income_tax_before_credits,social_security |
| social_security | national | adjusted_gross_income,filing_status,income_tax_before_credits,social_security |
| medicare_part_b_premium | national | age |
| household_count | national | spm_unit_energy_subsidy_reported |
| refundable_american_opportunity_credit | national | refundable_american_opportunity_credit |
| education_tax_credits | national | education_tax_credits |
| tax_unit_count | national | refundable_american_opportunity_credit |
| tax_unit_count | national | education_tax_credits |
| tax_unit_count | national | real_estate_taxes,tax_unit_itemizes |
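
As a rough illustration of how rules of this shape could select rows, the sketch below matches candidate targets on the three columns listed above (variable, geo level, domain variable). The `Rule` and `Target` types and the `select_targets` helper are hypothetical names made up for this example; the actual schema of target_config.yaml and the selection code are not shown in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: one row of the selection table.
    variable: str
    geo_level: str
    domain_variable: Optional[str]  # None stands for "none" in the table


@dataclass(frozen=True)
class Target:
    # Hypothetical candidate target row.
    variable: str
    geo_level: str
    domain_variable: Optional[str]
    value: float


def select_targets(targets, rules):
    """Keep targets whose (variable, geo_level, domain_variable) matches a rule exactly."""
    keys = {(r.variable, r.geo_level, r.domain_variable) for r in rules}
    return [
        t for t in targets
        if (t.variable, t.geo_level, t.domain_variable) in keys
    ]


rules = [
    Rule("household_count", "state", "snap"),
    Rule("rent", "state", None),
]
targets = [
    Target("household_count", "state", "snap", 1.0),
    Target("household_count", "national", "snap", 2.0),  # geo level does not match
    Target("rent", "state", None, 3.0),
]
selected = select_targets(targets, rules)
```

Exact matching on all three fields is what keeps a state-level rule from also pulling in a national row for the same variable.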

New ETL target rows introduced by this PR:

  • State EITC claimant counts: tax_unit_count by state where eitc > 0, sourced from IRS EITC Central controls.
  • State ACS rent totals: rent by state, sourced from acs_housing_costs_YEAR.csv / ACS B25060.
  • Medicare Part B age buckets: medicare_part_b_premium nationally by 10-year age buckets, sourced from healthcare_spending.csv.
  • SOI taxable AGI-grid income rows for irs_employment_income, pension_income, and social_security, with matching positive-domain tax_unit_count rows, sourced from tracked SOI Table 1.4 targets.
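
A positive-domain count such as "tax_unit_count where eitc > 0" can be read as a weighted count of the units whose domain variable is positive, paired with a weighted amount over the same units. A minimal sketch with made-up weights and amounts; the function names and data are illustrative and do not come from the ETL code:

```python
def positive_domain_count(weights, domain_values):
    """Sum the weights of units whose domain variable is strictly positive."""
    return sum(w for w, v in zip(weights, domain_values) if v > 0)


def positive_domain_total(weights, domain_values):
    """Weighted total of the domain variable over the same positive-domain units."""
    return sum(w * v for w, v in zip(weights, domain_values) if v > 0)


# Illustrative tax units: survey weights and EITC amounts (made-up numbers).
weights = [100.0, 250.0, 50.0]
eitc = [0.0, 1200.0, 300.0]

count = positive_domain_count(weights, eitc)  # weights of the two claimants
total = positive_domain_total(weights, eitc)  # weighted EITC amount
```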

Verification

  • python3 -m py_compile policyengine_us_data/db/etl_irs_soi.py policyengine_us_data/db/etl_national_targets.py policyengine_us_data/db/create_field_valid_values.py tests/unit/test_etl_irs_soi_overlay.py tests/unit/test_etl_national_targets.py tests/unit/calibration/test_target_config.py
  • ruff check policyengine_us_data/db/etl_irs_soi.py policyengine_us_data/db/etl_national_targets.py tests/unit/test_etl_irs_soi_overlay.py tests/unit/test_etl_national_targets.py tests/unit/calibration/test_target_config.py
  • ruff format --check policyengine_us_data/db/etl_irs_soi.py policyengine_us_data/db/etl_national_targets.py tests/unit/calibration/test_target_config.py tests/unit/test_etl_irs_soi_overlay.py tests/unit/test_etl_national_targets.py
  • git diff --check

Notes

  • make lint was run but is blocked locally by pre-existing untracked .tmp/ report-card artifacts that ruff sees outside this PR.
  • uv run pytest for the targeted tests remains blocked locally on macOS x86_64 because the lock resolves torch==2.9.1, which has no wheel for this platform.

@anth-volk anth-volk requested a review from MaxGhenis May 11, 2026 12:46
@anth-volk anth-volk marked this pull request as ready for review May 13, 2026 15:27
Contributor

@MaxGhenis MaxGhenis left a comment

Reviewed the target-selection additions, ETL insertion points, and unit coverage. The new SOI positive-domain strata, EITC claimant counts, ACS rent targets, Medicare Part B age buckets, and target_config rules look consistent with the unified calibration flow, and CI is green. No blockers from my review.

Contributor

@MaxGhenis MaxGhenis left a comment

Found one issue that I think should block merge.

[P2] Preserve SOI variable-specific uprating for the new income-domain targets

load_national_taxable_agi_domain_filing_status_targets writes the selected SOI rows with period=int(row["Year"]) and raw Value (policyengine_us_data/db/etl_irs_soi.py:1116-1117). Those rows then go through unified_matrix_builder._get_uprating_info, which applies cpi to every non-*_count variable. That is not what the legacy loss.py SOI target path did: get_soi(time_period) applies SOI_UPRATING_MAP variable-specific factors before the row enters the loss matrix. For 2024, the newly-added 2023-source amount targets differ from legacy by about -0.84% for employment/pension income and -6.60% for Social Security in my spot check, because CPI is lower than the SOI/Social Security uprater. That will push all added irs_employment_income, pension_income, and social_security AGI-domain amount targets to the wrong level whenever the calibration year exceeds the source SOI year.

Please either materialize these SOI target rows at the calibration target year with the same get_soi(target_year) values used by the legacy path, or carry per-target uprating metadata through the DB/builder instead of using generic CPI for these variables. I would also add a regression test comparing a 2024 loaded target value to get_soi(2024) for at least total_social_security/social_security, since that case has the largest miss.
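
To make the size of the mismatch concrete: if the same source-year value is scaled by a generic factor instead of a variable-specific one, the relative error is the ratio of the two factors minus one. The sketch below uses hypothetical growth factors chosen only for illustration; they are not the actual CPI or SOI_UPRATING_MAP values, and `uprate` and `relative_gap` are names made up for this example.

```python
def uprate(value, factor):
    """Scale a source-year amount to the calibration year."""
    return value * factor


def relative_gap(generic_factor, specific_factor):
    """Relative difference of a generically uprated target vs. a variable-specific one."""
    return generic_factor / specific_factor - 1.0


# Hypothetical growth factors from the source year to the calibration year.
cpi_factor = 1.03       # generic CPI uprating (illustrative)
specific_factor = 1.10  # variable-specific uprating (illustrative)

source_value = 1_000.0
generic_target = uprate(source_value, cpi_factor)
specific_target = uprate(source_value, specific_factor)

# Negative when the generic factor grows more slowly than the specific one.
gap = relative_gap(cpi_factor, specific_factor)
```

The gap depends only on the ratio of the two factors, not on the target's level, so every amount row for a given variable would be off by the same percentage.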

Contributor

@MaxGhenis MaxGhenis left a comment

Fixed the SOI domain-target uprating issue in 63feccf: these rows now use get_soi(target_year) values and store period=target_year, with regression coverage that the loaded rows are materialized at the calibration period.
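
The shape of such a regression check might look like the sketch below. `fake_get_soi`, `materialize_rows`, and `check_rows_materialized` are hypothetical stand-ins written for this example, with made-up values; they are not the repository's get_soi, loader, or test code.

```python
def fake_get_soi(target_year):
    """Hypothetical stand-in for get_soi: values already uprated to target_year."""
    base = {"social_security": 100.0}
    growth = {2023: 1.0, 2024: 1.1}  # made-up variable-specific factors
    return {name: value * growth[target_year] for name, value in base.items()}


def materialize_rows(target_year, variables):
    """Build target rows at the calibration year so no further uprating is needed."""
    soi = fake_get_soi(target_year)
    return [
        {"variable": v, "period": target_year, "value": soi[v]}
        for v in variables
    ]


def check_rows_materialized(rows, target_year):
    """Regression-style check: period and value both correspond to target_year."""
    expected = fake_get_soi(target_year)
    for row in rows:
        assert row["period"] == target_year
        assert abs(row["value"] - expected[row["variable"]]) < 1e-9


rows = materialize_rows(2024, ["social_security"])
check_rows_materialized(rows, 2024)
```

Storing the row at the calibration period means a downstream builder that uprates from `period` to the calibration year would apply a factor of one and leave the value unchanged, assuming it uprates only when the two years differ.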

@MaxGhenis MaxGhenis merged commit ef0bc21 into main May 16, 2026
12 checks passed
@MaxGhenis MaxGhenis deleted the codex/add-loss-py-targets-unified branch May 16, 2026 15:50

Development

Successfully merging this pull request may close these issues.

Draft: Port selected loss.py targets into unified calibration

2 participants