Add selected loss.py targets to unified calibration #946
MaxGhenis
left a comment
Reviewed the target-selection additions, ETL insertion points, and unit coverage. The new SOI positive-domain strata, EITC claimant counts, ACS rent targets, Medicare Part B age buckets, and target_config rules look consistent with the unified calibration flow, and CI is green. No blockers from my review.
MaxGhenis
left a comment
Found one issue that I think should block merge.
[P2] Preserve SOI variable-specific uprating for the new income-domain targets
load_national_taxable_agi_domain_filing_status_targets writes the selected SOI rows with period=int(row["Year"]) and raw Value (policyengine_us_data/db/etl_irs_soi.py:1116-1117). Those rows then go through unified_matrix_builder._get_uprating_info, which applies cpi to every non-*_count variable. That is not what the legacy loss.py SOI target path did: get_soi(time_period) applies SOI_UPRATING_MAP variable-specific factors before the row enters the loss matrix. For 2024, the newly-added 2023-source amount targets differ from legacy by about -0.84% for employment/pension income and -6.60% for Social Security in my spot check, because CPI is lower than the SOI/Social Security uprater. That will push all added irs_employment_income, pension_income, and social_security AGI-domain amount targets to the wrong level whenever the calibration year exceeds the source SOI year.
Please either materialize these SOI target rows at the calibration target year with the same get_soi(target_year) values used by the legacy path, or carry per-target uprating metadata through the DB/builder instead of using generic CPI for these variables. I would also add a regression test comparing a 2024 loaded target value to get_soi(2024) for at least total_social_security/social_security, since that case has the largest miss.
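To make the concern concrete, here is a minimal sketch of how a generic CPI factor and a variable-specific factor diverge, plus the shape of the suggested regression check. Everything in it is hypothetical: the factor values are placeholders rather than real CPI or SOI_UPRATING_MAP figures, and the function names are invented for illustration; none of this is the repository's code.

```python
# Illustrative 2023 -> 2024 factors. These numbers are placeholders chosen for
# the example; they are NOT the real CPI or SOI_UPRATING_MAP values.
CPI_FACTOR = 1.03
VARIABLE_SPECIFIC_FACTORS = {
    "social_security": 1.10,
    "irs_employment_income": 1.04,
    "pension_income": 1.04,
}


def uprate_with_cpi(variable: str, value: float) -> float:
    """Generic path: every non-count variable is scaled by CPI."""
    if variable.endswith("_count"):
        return value  # count uprating is outside the scope of this sketch
    return value * CPI_FACTOR


def uprate_variable_specific(variable: str, value: float) -> float:
    """Legacy-style path: use a per-variable factor (hypothetical lookup)."""
    if variable.endswith("_count"):
        return value  # counts are likewise left untouched here
    return value * VARIABLE_SPECIFIC_FACTORS[variable]


def relative_gap(variable: str, source_value: float) -> float:
    """Relative difference of the CPI-uprated target from the legacy-style one."""
    legacy = uprate_variable_specific(variable, source_value)
    return uprate_with_cpi(variable, source_value) / legacy - 1.0


def matches_legacy(variable: str, source_value: float, rel_tol: float = 1e-6) -> bool:
    """Regression-style check: the loaded target should equal the legacy value."""
    return abs(relative_gap(variable, source_value)) <= rel_tol
```

In an actual regression test, the legacy side of the comparison would come from get_soi(2024) and the other side from the target row as loaded through the DB and builder, rather than from hand-written factor tables.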
Fixes #945
Requesting review from @MaxGhenis to confirm that we want these targets in US.h5 and state/district-level datasets.
Summary
Targets Added
Exact target-selection rules added to `policyengine_us_data/calibration/target_config.yaml`:

| Variable | Geo level | Domain variable(s) |
| --- | --- | --- |
| `household_count` | state | `snap` |
| `rent` | state | |
| `tax_unit_count` | state | `eitc` |
| `tax_unit_count` | state | `real_estate_taxes,tax_unit_itemizes` |
| `tax_unit_count` | national | `adjusted_gross_income,income_tax_before_credits,irs_employment_income` |
| `irs_employment_income` | national | `adjusted_gross_income,income_tax_before_credits,irs_employment_income` |
| `tax_unit_count` | national | `adjusted_gross_income,filing_status,income_tax_before_credits,irs_employment_income` |
| `irs_employment_income` | national | `adjusted_gross_income,filing_status,income_tax_before_credits,irs_employment_income` |
| `tax_unit_count` | national | `adjusted_gross_income,income_tax_before_credits,pension_income` |
| `pension_income` | national | `adjusted_gross_income,income_tax_before_credits,pension_income` |
| `tax_unit_count` | national | `adjusted_gross_income,filing_status,income_tax_before_credits,pension_income` |
| `pension_income` | national | `adjusted_gross_income,filing_status,income_tax_before_credits,pension_income` |
| `tax_unit_count` | national | `adjusted_gross_income,income_tax_before_credits,social_security` |
| `social_security` | national | `adjusted_gross_income,income_tax_before_credits,social_security` |
| `tax_unit_count` | national | `adjusted_gross_income,filing_status,income_tax_before_credits,social_security` |
| `social_security` | national | `adjusted_gross_income,filing_status,income_tax_before_credits,social_security` |
| `medicare_part_b_premium` | national | `age` |
| `household_count` | national | `spm_unit_energy_subsidy_reported` |
| `refundable_american_opportunity_credit` | national | `refundable_american_opportunity_credit` |
| `education_tax_credits` | national | `education_tax_credits` |
| `tax_unit_count` | national | `refundable_american_opportunity_credit` |
| `tax_unit_count` | national | `education_tax_credits` |
| `tax_unit_count` | national | `real_estate_taxes,tax_unit_itemizes` |

New ETL target rows introduced by this PR:

- `tax_unit_count` by state where `eitc > 0`, sourced from IRS EITC Central controls.
- `rent` by state, sourced from `acs_housing_costs_YEAR.csv` / ACS B25060.
- `medicare_part_b_premium` nationally by 10-year `age` buckets, sourced from `healthcare_spending.csv`.
- `irs_employment_income`, `pension_income`, and `social_security`, with matching positive-domain `tax_unit_count` rows, sourced from tracked SOI Table 1.4 targets.

Verification
Notes