Predicting Zip Code Level Utility Disconnection Rates

A Two-Part Logistic and Spatial Mixed-Effects Framework for Oregon and Washington · Urban Spatial Analytics, 2025–2026

Research poster: Predicting Zip Code Level Utility Disconnection Rates


1. Predicting Occurrence of Utility Disconnection (Logistic Regression)

A logistic regression model is used to predict whether a ZIP code experiences any utility disconnection in a given year. Modeling occurrence separately from magnitude addresses the rarity of disconnection events and the resulting excess of zeros in the data.

Key outputs:

  • Predicted probabilities of disconnection at the ZIP-year level
  • Classification of high-risk ZIP codes using an optimal probability threshold (~0.85)
  • Model performance metrics on held-out test data (2023), including:
    • Area Under Curve ≈ 0.95
    • Balanced accuracy ≈ 0.89
    • Sensitivity ≈ 87% and specificity ≈ 90%

Results show that ZIP codes with larger customer bases, lower median home values, higher shares of elderly female residents, and stronger neighborhood spillover effects face substantially higher odds of experiencing a disconnection.
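The threshold and test metrics listed above can be illustrated with a minimal sketch. The source does not state how the optimal threshold was chosen; this example assumes the cutoff maximizes balanced accuracy (equivalent to Youden's J). The function names and the toy values are hypothetical and are not the study's data.

```python
import numpy as np

def confusion_rates(y_true, p_hat, threshold):
    """Return (sensitivity, specificity) at a given probability cutoff."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(p_hat, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(y_true, p_hat, grid=None):
    """Assumed criterion: pick the cutoff with the highest balanced accuracy."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    scores = [np.mean(confusion_rates(y_true, p_hat, t)) for t in grid]
    return float(grid[int(np.argmax(scores))])

# Toy ZIP-year labels (1 = any disconnection) and predicted probabilities.
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
p = np.array([0.10, 0.40, 0.82, 0.72, 0.90, 0.95, 0.88, 0.99])

threshold = best_threshold(y, p)
sensitivity, specificity = confusion_rates(y, p, threshold)
balanced_accuracy = (sensitivity + specificity) / 2
```

In practice the threshold would be tuned on training or validation data and the metrics then reported on the held-out year, so that the cutoff is not fitted to the test set.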

2. Predicted Magnitude of Disconnection Rates (Spatial Mixed-Effects Model)

Conditional on observing at least one disconnection, a spatial mixed-effects linear model is used to predict the log of annual disconnection rates. This approach explicitly accounts for spatial dependence and unobserved ZIP-level heterogeneity.

Key outputs:

  • ZIP-level predicted log disconnection rates
  • Fixed-effect estimates capturing socioeconomic, housing, demographic, and policy factors
  • Random effects and spatial lag terms capturing ZIP-level heterogeneity and neighborhood spillovers
  • Test-data conditional R² exceeding 0.65, with no residual spatial autocorrelation remaining after model estimation

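Because the dependent variable is log-transformed, each coefficient β is interpretable as an approximate proportional change in the disconnection rate per one-unit increase in the predictor. The exact multiplicative effect is exp(β), i.e., a percent change of 100 × (exp(β) − 1).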
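As a small illustration of reading a coefficient from a log-outcome model, the helper below converts a coefficient into an exact percent change. The function name and the example coefficient values are hypothetical and are not estimates from the study.

```python
import math

def pct_change_from_log_coef(beta, delta=1.0):
    """Percent change in the outcome for a `delta`-unit increase in a predictor,
    given coefficient `beta` from a model of log(outcome)."""
    return (math.exp(beta * delta) - 1.0) * 100.0

# Hypothetical coefficients, for illustration only.
small_effect = pct_change_from_log_coef(0.05)          # close to 100 * beta
doubling = pct_change_from_log_coef(math.log(2.0))     # outcome doubles
no_effect = pct_change_from_log_coef(0.0)
```

For small coefficients the shortcut "100 × β percent" is close to the exact value; the gap widens as the coefficient grows.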

3. Spatial Pattern Identification and Clustering

Spatial diagnostics reveal significant geographic clustering of utility disconnections across ZIP codes. Global Moran's I confirms statistically significant spatial autocorrelation, justifying the spatial modeling framework.
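For readers unfamiliar with the statistic, a minimal NumPy sketch of global Moran's I follows. It assumes a simple binary contiguity matrix for units arranged on a line; the project's actual spatial weights are not described in this summary, and the helper names and toy values are hypothetical.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    s0 = w.sum()  # sum of all spatial weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

def line_weights(n):
    """Binary contiguity for n units on a line (each unit neighbors i-1 and i+1)."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w

w = line_weights(6)
clustered = morans_i([1, 1, 1, 9, 9, 9], w)    # similar values sit together
alternating = morans_i([1, 9, 1, 9, 1, 9], w)  # neighbors are dissimilar
```

Positive values indicate that similar rates cluster in space, while negative values indicate a checkerboard-like pattern. Statistical significance is commonly assessed by comparing the observed value with values obtained after randomly permuting the observations across locations.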

Local Indicators of Spatial Association (LISA) identify High–High clusters of disconnection rates concentrated in:

  • Smaller cities and regional hubs
  • Suburban and rural-adjacent ZIP codes
  • Areas with vulnerable housing stock and lower homeownership rates

These outputs highlight that utility disconnections are not randomly distributed but instead reflect structural and neighborhood-level processes of energy insecurity.
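The High–High labeling used in LISA maps can be sketched as follows. This is a simplified illustration assuming row-standardized weights and toy values on a line of six units. It omits the conditional permutation test that is normally used to decide which local clusters are significant, and the function name is hypothetical.

```python
import numpy as np

def local_moran(x, w):
    """Local Moran's I and quadrant label for each unit (no significance test)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # row-standardize; assumes no isolated units
    z = x - x.mean()                      # deviation from the overall mean
    lag = w @ z                           # average deviation among neighbors
    m2 = (z @ z) / len(x)
    local_i = z * lag / m2
    quadrant = np.where(
        z > 0,
        np.where(lag > 0, "High-High", "High-Low"),
        np.where(lag > 0, "Low-High", "Low-Low"),
    )
    return local_i, quadrant

# Toy contiguity: six units on a line, each neighboring the units beside it.
n = 6
w = np.zeros((n, n))
idx = np.arange(n - 1)
w[idx, idx + 1] = 1.0
w[idx + 1, idx] = 1.0

rates = [2, 1, 3, 8, 9, 10]  # made-up disconnection rates
local_i, quadrant = local_moran(rates, w)
```

A unit is labeled High-High when both its own rate and the average rate of its neighbors are above the overall mean. With row-standardized weights, the mean of the local statistics equals global Moran's I.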

Year: 2025–2026

Category: Urban Spatial Analytics

Institution: University of Pennsylvania · Indiana University

GitHub Repository