Assign electric and gas utilities for MD#445
Open
alexhyunminlee wants to merge 11 commits into
Resolve conflict in context/README.md by taking main's updated description for ny_utility_assignment_resstock.md and dropping the now-merged ct_utility_gis_data_sources.md entry (removed in the main refactor). Co-authored-by: Cursor <cursoragent@cursor.com>
alxsmith
approved these changes
Jun 9, 2026
This PR implements GIS-based electric and gas utility assignment for Maryland ResStock buildings, following the same PUMA-overlap probabilistic pattern used for NY, and extending the generic utility infrastructure in `utils.py` with a nearest-neighbor fill for PUMAs that have no HIFLD polygon coverage.
Closes #436
Utility shapefile sources and fetching
Electric and gas service territory polygons are fetched from the HIFLD Open dataset maintained by DOE/CESER and originally hosted at `hifld-geoplatform.hub.arcgis.com`. The HIFLD portal was deactivated on August 26, 2025; both datasets are now archived at DataLumos. The fetch logic in `load_utility_boundaries()` (in `data/resstock/utility/utils.py`) tries a list of live ArcGIS REST mirror endpoints in order and falls back to DataLumos if all fail. For gas territories, there is an additional last-resort fallback to a locally cached DataLumos ZIP file.
On first fetch the result is:
Subsequent runs read directly from S3 using the cached filename; no re-fetch occurs.
state_configs.yaml changes
The `MD` entry in `data/resstock/state_configs.yaml` was updated to add a `utility_assignment` block that registers MD in `SUPPORTED_UTILITY_STATES` and passes configuration to the state module:
No `excluded_gas_utilities` are configured for MD; all HIFLD LDCs are eligible for assignment.
PUMA–utility overlap calculation
Assignment is PUMA-based: each building inherits the utility probability distribution of its 2010-definition Census PUMA (taken from the last 5 characters of `in.puma` in the ResStock metadata).
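As a small illustration of the key extraction, the sketch below takes the last 5 characters of an `in.puma` string. The function name `puma_code` and the sample value are assumptions made for the example, not taken from the PR.

```python
def puma_code(in_puma: str) -> str:
    """Return the 5-character PUMA code from a ResStock `in.puma` value."""
    if len(in_puma) < 5:
        raise ValueError(f"in.puma value too short: {in_puma!r}")
    return in_puma[-5:]


# Illustrative value only; the real metadata format should be checked.
example_code = puma_code("G24001001")
```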
`calculate_puma_utility_overlap()` in `utils.py` performs a spatial intersection between the 44 MD Census PUMAs and each utility's service territory polygon. For each PUMA × utility pair it records the area of intersection in the state-plane CRS. Each pair's intersection area is then divided by the total area of that PUMA intersected by any utility, giving a fractional overlap weight: the share of the PUMA's covered area that falls within that utility's territory.
`calculate_utility_probabilities()` normalises these weights row-by-row so each PUMA's utility probabilities sum to 1, producing a wide probability table (one row per PUMA, one column per utility).
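The arithmetic behind the overlap weights and the wide table can be sketched without any GIS library. The example below assumes the intersection areas have already been computed; the function names `overlap_weights` and `to_wide`, the utility labels, and the areas are hypothetical and do not reflect the actual signatures in `utils.py`.

```python
from collections import defaultdict


def overlap_weights(areas):
    """Convert {(puma, utility): intersection_area} into {puma: {utility: weight}}.

    Each weight is the pair's area divided by the PUMA's total intersected area,
    so the weights for a covered PUMA sum to 1.
    """
    totals = defaultdict(float)
    for (puma, _utility), area in areas.items():
        totals[puma] += area

    table = defaultdict(dict)
    for (puma, utility), area in areas.items():
        if totals[puma] > 0:  # skip PUMAs with no covered area
            table[puma][utility] = area / totals[puma]
    return dict(table)


def to_wide(table):
    """Return (utility column names, {puma: [probability per column]})."""
    utilities = sorted({u for row in table.values() for u in row})
    rows = {puma: [row.get(u, 0.0) for u in utilities] for puma, row in table.items()}
    return utilities, rows


# Made-up areas in arbitrary squared units.
areas = {
    ("01001", "Utility A"): 30.0,
    ("01001", "Utility B"): 10.0,
    ("01002", "Utility A"): 5.0,
}
weights = overlap_weights(areas)
columns, wide = to_wide(weights)
```

With these inputs, PUMA `01001` gets weights 0.75 and 0.25, and `01002` gets 1.0 for its only utility.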
Per-building utility sampling
`sample_utility_per_building()` joins each building to its PUMA's probability row and draws one utility via `np.random.choice` with those probabilities (fixed seed 42 for reproducibility).
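A seeded per-building draw might look like the sketch below. To stay dependency-free it uses the standard library's `random.Random.choices` in place of `np.random.choice`, so the specific draws will differ from the real pipeline; the function name `sample_utilities` and the input shapes are assumptions.

```python
import random


def sample_utilities(buildings, puma_probs, seed=42):
    """Draw one utility per building from its PUMA's probability row.

    buildings:  iterable of (bldg_id, puma) pairs
    puma_probs: {puma: {utility: probability}}
    Buildings whose PUMA has no probabilities are assigned None.
    """
    rng = random.Random(seed)  # fixed seed makes repeated runs identical
    assigned = {}
    for bldg_id, puma in buildings:
        row = puma_probs.get(puma)
        if not row:
            assigned[bldg_id] = None
            continue
        utilities = list(row)
        probs = [row[u] for u in utilities]
        assigned[bldg_id] = rng.choices(utilities, weights=probs, k=1)[0]
    return assigned


# Made-up buildings and probabilities for illustration.
puma_probs = {
    "01001": {"Utility A": 0.75, "Utility B": 0.25},
    "01002": {"Utility A": 1.0},
}
buildings = [(1, "01001"), (2, "01002"), (3, "01001"), (4, "09999")]
first_run = sample_utilities(buildings, puma_probs)
second_run = sample_utilities(buildings, puma_probs)
```

Because the generator is re-created with the same seed on each call, `first_run` and `second_run` are identical, and building 4 stays unassigned because its PUMA has no row.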
The output is written to `metadata_utility/state=MD/utility_assignment.parquet`, a slim file containing only `bldg_id`, `sb.electric_utility`, and `sb.gas_utility`, which is uploaded to `s3://data.sb/nrel/resstock/res_2024_amy2018_2_sb/metadata_utility/state=MD/utility_assignment.parquet`.
HIFLD coverage gaps
HIFLD utility boundaries do not cover the full land area of Maryland: 10 of 44 PUMAs have no electric coverage, and a partly overlapping set of 10 PUMAs has no gas coverage. Without a fix, approximately 25.8% of MD buildings (2,575 of 9,996) would be left with no utility assigned. Analysis showed that:
The gaps arise because HIFLD data is self-reported by utilities (there is no federal requirement for every provider, especially municipal utilities and co-ops, to file precise GIS polygons) and because the portal was deactivated in August 2025, leaving the 2024 snapshot as the final version.
Nearest-neighbor fill for uncovered PUMAs
`fill_missing_puma_probabilities()` (new generic function in `utils.py`) resolves coverage gaps before sampling:
This function is state-generic and opt-in. `assign_utility_md.py` enables it by passing `fill_missing_pumas=True` to `create_hh_utilities()`. After the fill, all 9,996 MD buildings receive an electric utility and all 5,231 natgas-connected buildings receive a gas utility.
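One way a nearest-neighbor fill could work is sketched below. The text here does not spell out the matching rule, so this example assumes that an uncovered PUMA copies the probability row of the covered PUMA with the closest centroid; the function name `fill_missing`, the centroid inputs, and the distance metric are all assumptions and may differ from `fill_missing_puma_probabilities()`.

```python
import math


def fill_missing(puma_probs, centroids):
    """Give every PUMA in `centroids` a probability row.

    puma_probs: {puma: {utility: probability}} for covered PUMAs
    centroids:  {puma: (x, y)} in a projected CRS, for all PUMAs
    Assumed rule: an uncovered PUMA copies the row of the nearest covered centroid.
    """
    covered = [p for p in centroids if puma_probs.get(p)]
    if not covered:
        raise ValueError("no covered PUMAs to copy from")

    filled = {}
    for puma, point in centroids.items():
        row = puma_probs.get(puma)
        if row:
            filled[puma] = dict(row)
            continue
        nearest = min(covered, key=lambda c: math.dist(point, centroids[c]))
        filled[puma] = dict(puma_probs[nearest])
    return filled


# Made-up centroids; "01003" has no coverage and sits closest to "01001".
centroids = {"01001": (0.0, 0.0), "01002": (10.0, 0.0), "01003": (1.0, 1.0)}
covered_probs = {"01001": {"Utility A": 1.0}, "01002": {"Utility B": 1.0}}
filled_probs = fill_missing(covered_probs, centroids)
```

Running the fill before sampling means the sampler never sees an empty row, which is consistent with the observation above that every building receives an assignment once the fill is enabled.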
Reviewer focus