Assign electric and gas utilities for MD #445

Open
alexhyunminlee wants to merge 11 commits into main from 436-fetch-md-utility-shapefiles
@alexhyunminlee alexhyunminlee commented Jun 5, 2026

Collaborator

This PR implements GIS-based electric and gas utility assignment for Maryland ResStock buildings. It follows the same probabilistic PUMA-overlap pattern used for NY and extends the generic utility infrastructure in `utils.py` with a nearest-neighbor fill for PUMAs that have no HIFLD polygon coverage.

Closes #436

Utility shapefile sources and fetching

Electric and gas service territory polygons are fetched from the HIFLD Open dataset maintained by DOE/CESER and originally hosted at `hifld-geoplatform.hub.arcgis.com`. The HIFLD portal was deactivated on August 26, 2025; both datasets are now archived at DataLumos. The fetch logic in `load_utility_boundaries()` (in `data/resstock/utility/utils.py`) tries a list of live ArcGIS REST mirror endpoints in order and falls back to DataLumos if all fail. For gas territories, there is an additional last-resort fallback to a locally cached DataLumos ZIP file.

On first fetch the result is:

  1. Filtered to MD features
  2. Written as a dated WKT CSV (e.g. `md_electric_utilities_20260605.csv`) to a local cache directory
  3. Uploaded to `s3://data.sb/gis/utility_boundaries/`
  4. Registered by filename in `data/resstock/state_configs.yaml` under `MD.utility_assignment.kwargs.electric_poly_filename` / `gas_poly_filename`

Subsequent runs read directly from S3 using the cached filename; no re-fetch occurs.
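
As a rough illustration of steps 1 and 2, the sketch below filters features to one state and builds a dated cache filename matching the example above. The function names, the `STATE` attribute name, and the assumption that gas files follow the same `<state>_<fuel>_utilities_<YYYYMMDD>.csv` pattern are hypothetical.

```python
from datetime import date


def filter_to_state(features, state, state_field="STATE"):
    """Keep only features whose state attribute matches.

    `state_field` is an assumed attribute name, not confirmed from the data.
    """
    return [f for f in features if f.get(state_field) == state]


def dated_cache_filename(state, fuel, fetched_on):
    """Build a dated filename such as md_electric_utilities_20260605.csv."""
    return f"{state.lower()}_{fuel}_utilities_{fetched_on:%Y%m%d}.csv"


sample_features = [
    {"NAME": "Example Utility A", "STATE": "MD"},
    {"NAME": "Example Utility B", "STATE": "VA"},
]
md_features = filter_to_state(sample_features, "MD")
electric_name = dated_cache_filename("MD", "electric", date(2026, 6, 5))
```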

state_configs.yaml changes

The `MD` entry in `data/resstock/state_configs.yaml` was updated to add a `utility_assignment` block that registers MD in `SUPPORTED_UTILITY_STATES` and passes configuration to the state module:

  • `module: data.resstock.utility.assign_utility_md` — the new state-specific module
  • `state_crs: 2248` — NAD83 / Maryland State Plane (feet), used for accurate area calculations during PUMA–polygon intersection
  • `puma_year: 2019` — 2010-definition Census PUMAs matching the vintage used in ResStock `res_2024_amy2018_2`
  • `electric_poly_filename` / `gas_poly_filename` — dated WKT CSV filenames written at first fetch

No `excluded_gas_utilities` are configured for MD; all HIFLD LDCs are eligible for assignment.

PUMA–utility overlap calculation

Assignment is PUMA-based: each building inherits the utility probability distribution of its 2010-definition Census PUMA (taken from the last 5 characters of `in.puma` in the ResStock metadata).

`calculate_puma_utility_overlap()` in `utils.py` performs a spatial intersection between the 44 MD Census PUMAs and each utility's service territory polygon. For each PUMA × utility pair it records the area of intersection in the state-plane CRS. Each intersection area is then divided by the PUMA's total intersected area to produce a fractional overlap weight: the share of the PUMA's covered area that falls within that utility's territory.

`calculate_utility_probabilities()` normalises these weights row-by-row so each PUMA's utility probabilities sum to 1, producing a wide probability table (one row per PUMA, one column per utility).
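
The arithmetic can be sketched without any GIS dependency, starting from precomputed intersection areas. This is an illustration only: it collapses both steps into one normalisation, assumes the "total intersected area" is the sum of a PUMA's per-utility intersection areas, and uses a hypothetical function name and made-up PUMA codes, utilities, and areas.

```python
from collections import defaultdict


def utility_probabilities(intersection_areas):
    """Turn {(puma, utility): area} into {puma: {utility: probability}}.

    Assumption: a PUMA's total intersected area is the sum of its
    per-utility intersection areas, so each row sums to 1.
    """
    totals = defaultdict(float)
    for (puma, _utility), area in intersection_areas.items():
        totals[puma] += area

    probs = defaultdict(dict)
    for (puma, utility), area in intersection_areas.items():
        if totals[puma] > 0:  # skip PUMAs with no covered area
            probs[puma][utility] = area / totals[puma]
    return dict(probs)


# Made-up areas in state-plane square feet.
areas = {
    ("00101", "Utility A"): 30.0,
    ("00101", "Utility B"): 10.0,
    ("00102", "Utility A"): 5.0,
}
puma_probs = utility_probabilities(areas)
```

A PUMA with no intersecting polygon never appears in the input, so it is absent from the result; that absence is what the coverage-gap handling later in this description addresses.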

Per-building utility sampling

`sample_utility_per_building()` joins each building to its PUMA's probability row and draws one utility via `np.random.choice` with those probabilities (fixed seed 42 for reproducibility).

  • Electric: every building is sampled regardless of heating fuel.
  • Gas: only buildings with `has_natgas_connection = True` are sampled; all others receive `null` in `sb.gas_utility`.
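
A simplified sketch of the per-building draw is shown below. To stay dependency-free it uses the standard library's `random.Random(seed).choices` in place of `np.random.choice`, so it will not reproduce the PR's actual draws; the function name, the sample `in.puma` values, and the utility names are hypothetical.

```python
import random


def sample_utilities(buildings, electric_probs, gas_probs, seed=42):
    """Draw one electric utility per building and one gas utility per
    gas-connected building, using a seeded generator for reproducibility.

    Assumes every building's PUMA is present in both probability tables.
    """
    rng = random.Random(seed)

    def draw(dist):
        names = sorted(dist)  # fixed order so a given seed gives stable draws
        return rng.choices(names, weights=[dist[n] for n in names], k=1)[0]

    rows = []
    for b in buildings:
        puma = b["in.puma"][-5:]  # last 5 characters identify the PUMA
        rows.append(
            {
                "bldg_id": b["bldg_id"],
                "sb.electric_utility": draw(electric_probs[puma]),
                "sb.gas_utility": (
                    draw(gas_probs[puma]) if b["has_natgas_connection"] else None
                ),
            }
        )
    return rows


# Hypothetical inputs; the in.puma format here is illustrative only.
buildings = [
    {"bldg_id": 1, "in.puma": "G24000101", "has_natgas_connection": True},
    {"bldg_id": 2, "in.puma": "G24000101", "has_natgas_connection": False},
]
electric_probs = {"00101": {"Electric Co": 1.0}}
gas_probs = {"00101": {"Gas Co": 1.0}}
assigned = sample_utilities(buildings, electric_probs, gas_probs)
```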

The output is written to `metadata_utility/state=MD/utility_assignment.parquet` — a slim file containing only `bldg_id`, `sb.electric_utility`, and `sb.gas_utility` — which is uploaded to `s3://data.sb/nrel/resstock/res_2024_amy2018_2_sb/metadata_utility/state=MD/utility_assignment.parquet`.

HIFLD coverage gaps

HIFLD utility boundaries do not cover the full land area of Maryland: 10 of 44 PUMAs have no electric coverage and a partly overlapping set of 10 of 44 PUMAs have no gas coverage. Without a fix, approximately 25.8% of MD buildings (2,575 of 9,996) would be left with no utility assigned. Analysis showed that:

  • 100% of unassigned buildings have measurable electricity use (as expected — every ResStock building has electrical load).
  • 59% of unassigned buildings have `has_natgas_connection = True`, meaning the gaps are not solely rural/no-gas areas but also include urban/suburban Baltimore-area PUMAs that BGE or SMECO should cover but for which HIFLD polygons are missing.

The gaps arise because HIFLD data is self-reported by utilities — there is no federal requirement for every provider (especially municipal utilities and co-ops) to file precise GIS polygons — and because the portal was deactivated in August 2025 with the 2024 snapshot as the final version.

Nearest-neighbor fill for uncovered PUMAs

`fill_missing_puma_probabilities()` (new generic function in `utils.py`) resolves coverage gaps before sampling:

  1. For each PUMA absent from the probability table, find all covered PUMAs whose geometry touches (shares a boundary segment with) the uncovered PUMA.
  2. Among touching covered PUMAs, pick the one whose centroid is nearest to the uncovered PUMA's centroid.
  3. If no touching covered PUMA exists, fall back to the globally nearest covered PUMA by centroid distance.
  4. Copy the donor PUMA's full probability distribution to the uncovered PUMA.

This function is state-generic and opt-in. `assign_utility_md.py` enables it by passing `fill_missing_pumas=True` to `create_hh_utilities()`. After the fill, all 9,996 MD buildings receive an electric utility and all 5,231 natgas-connected buildings receive a gas utility.
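
To make the donor-selection rule easier to review, here is a dependency-free sketch of the four steps. It is not the implementation in `utils.py`: adjacency is passed in as a precomputed mapping rather than derived from geometries, centroids are plain `(x, y)` tuples, donors are drawn only from originally covered PUMAs, and all names and coordinates are hypothetical.

```python
import math


def fill_missing(probs, centroids, touching):
    """Copy a donor PUMA's distribution to each uncovered PUMA.

    probs:     {puma: {utility: probability}} for covered PUMAs
    centroids: {puma: (x, y)} for every PUMA, covered or not
    touching:  {puma: set of PUMAs sharing a boundary} (assumed precomputed)
    """
    covered = set(probs)
    if not covered:
        raise ValueError("no covered PUMAs to use as donors")

    filled = dict(probs)
    for puma in centroids:
        if puma in covered:
            continue
        # Prefer touching covered PUMAs; otherwise consider all covered PUMAs.
        candidates = (touching.get(puma, set()) & covered) or covered
        donor = min(
            candidates,
            key=lambda c: (math.dist(centroids[puma], centroids[c]), c),
        )
        filled[puma] = dict(probs[donor])  # copy the donor's full distribution
    return filled


# Hypothetical layout: C touches only B; D touches nothing.
probs = {"A": {"Utility A": 1.0}, "B": {"Utility B": 1.0}}
centroids = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (1.0, 0.0), "D": (100.0, 0.0)}
touching = {"C": {"B"}}
filled = fill_missing(probs, centroids, touching)
```

In this layout C takes B's distribution even though A's centroid is closer, because adjacency is checked first; D has no touching neighbor and falls back to the globally nearest covered PUMA.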

Reviewer focus

  • The nearest-neighbor fill logic in `fill_missing_puma_probabilities()` — specifically the adjacency-first → centroid-fallback donor selection and whether copying the donor's distribution wholesale (rather than e.g. interpolating) is the right approach for uncovered PUMAs.
  • The decision to use HIFLD names verbatim in MD (no name crosswalk) rather than mapping to Switchbox-standardised names as NY does.

alexhyunminlee and others added 6 commits May 22, 2026 22:05
Resolve conflict in context/README.md by taking main's updated description
for ny_utility_assignment_resstock.md and dropping the now-merged
ct_utility_gis_data_sources.md entry (removed in the main refactor).

Co-authored-by: Cursor <cursoragent@cursor.com>
@alexhyunminlee alexhyunminlee linked an issue Jun 5, 2026 that may be closed by this pull request
@alexhyunminlee alexhyunminlee requested a review from alxsmith June 8, 2026 15:46

Development

Successfully merging this pull request may close these issues.

Fetch MD utility shapefiles

2 participants