Open Source · Apache-2.0

Policy Data Infrastructure

From raw public data to the stories that change how decisions are made.

DojoGenesis · Open Source · Madison, WI
01 — This is what it produces

Five Mornings in Dane County


PDI generates narrative documents from tract-level data. This excerpt was produced by the pipeline for a real census tract in south Madison.

Household One

The Garcias

South Madison · Census Tract 14.01

Maria Garcia wakes at 5:40. Her shift at the food processing plant starts at 7:00. Her two children, ages 6 and 9, attend an elementary school 2.3 miles away. There is no before-school childcare in her census tract; it is designated a childcare desert, with zero licensed providers within walking distance.² The nearest bus stop is a 12-minute walk, and during the AM peak, buses arrive fewer than 2 times per hour.³

Household Income: $38K · Missed School Days: 14 · Tracts with Zero Bus Stops: 31 · Licensed Childcare Providers: 0

Maria earns $38,000. Her neighborhood’s median household income is $42,000, placing it in the lowest income bin in the Atlas.⁴ Schools serving neighborhoods below $45,000 show an average chronic absence rate of 27.8%, compared with 17.6% in neighborhoods above $100,000. The infrastructure is the difference.

Generated from Census ACS 5-Year, WI DPI attendance data, and Metro Transit GTFS. The names are invented. The numbers are not.
² WI DPI Licensed Childcare Provider Map, 2024.
³ Metro Transit GTFS, AM peak frequency analysis, 125 tracts.
⁴ ACS 5-Year Estimates (2022); Atlas RQ 1.2 income threshold analysis, 51 schools.

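The split between invented names and real numbers can be illustrated with a small templating sketch. The comparison table in section 04 lists Go templates for PDI's narratives, so this Python version using the standard library's string.Template is only an analogue; the template wording and the field names walk_min and peak_buses are hypothetical.

```python
from string import Template

# Hypothetical sentence template: the wording is fixed, the numbers come from tract data.
BUS_SENTENCE = Template(
    "The nearest bus stop is a $walk_min-minute walk, and during the AM peak, "
    "buses arrive fewer than $peak_buses times per hour."
)


def render_bus_sentence(tract: dict) -> str:
    """Fill the template from one tract's indicators.

    substitute() raises KeyError when an indicator is missing, so a gap in the
    data fails loudly instead of silently printing a blank.
    """
    return BUS_SENTENCE.substitute(tract)


# Values from the excerpt above; field names are hypothetical.
print(render_bus_sentence({"walk_min": 12, "peak_buses": 2}))
```

Using substitute() rather than safe_substitute() is a deliberate choice in this sketch: a narrative with a silently missing number is worse than no narrative.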
02 — 70 Policies. 72 Counties. Real Data.

Evidence Cards


Each card maps a policy position to tract-level data — identifying which communities would benefit most, grounded in actual indicators, not composite scores.
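A minimal sketch of the card-to-tract mapping, assuming each card names one raw indicator and the direction in which it signals need. The EvidenceCard fields, the highest_need helper, the indicator key cost_burdened, the policy label, and every tract value except Tract 14.01's 48.2% are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvidenceCard:
    """Hypothetical card shape: one policy position tied to one raw indicator."""
    policy: str
    indicator: str
    higher_is_worse: bool = True


def highest_need(card: EvidenceCard, tracts: dict, k: int = 3) -> list:
    """Return up to k tract IDs ranked by need on the card's own indicator.

    tracts maps tract ID -> {indicator name: raw value}. Tracts without the
    indicator are skipped rather than imputed, and no composite is formed.
    """
    scored = [
        (tract_id, row[card.indicator])
        for tract_id, row in tracts.items()
        if row.get(card.indicator) is not None
    ]
    scored.sort(key=lambda item: item[1], reverse=card.higher_is_worse)
    return [tract_id for tract_id, _ in scored[:k]]


# Tract 14.01's cost-burden rate comes from its raw profile; the other tracts are invented.
TRACTS = {
    "14.01": {"cost_burdened": 48.2},
    "T2": {"cost_burdened": 22.0},
    "T3": {"cost_burdened": 35.5},
    "T4": {},
}
card = EvidenceCard(policy="Housing cost relief", indicator="cost_burdened")
print(highest_need(card, TRACTS, k=2))  # ['14.01', 'T3']
```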

Evidence Card Explorer

70 cards · 18 categories · 72 counties


03 — The methodology, explained

How the Infrastructure Thinks


Six methods. Each one grounded in peer-reviewed research. Each one solving a specific problem that simpler approaches get wrong.

Not All Data Is Equally Trustworthy

Census Bureau ACS General Handbook, Chapter 7 (2018)

The Census Bureau’s American Community Survey (ACS) publishes every estimate with a margin of error at the 90% confidence level. At the tract level, a reported poverty rate of 12.4% with a ±8.1-point margin means the true rate could plausibly be anywhere from about 4% to 20%. Acting on that number as if it were precise is acting on noise.

Example: Tract with CV = 0.40
Reported Poverty: 12.4% · Margin of Error: ±8.1% · CV: 0.40 (Low Reliability)

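PDI flags this automatically, so you see the confidence before you see the conclusion. The coefficient of variation (CV) is the standard error divided by the estimate, where the standard error is the published margin of error divided by 1.645 (here, 8.1 ÷ 1.645 ≈ 4.9, and 4.9 ÷ 12.4 ≈ 0.40). Reliability bands: CV < 0.15 is high, 0.15 to 0.30 is moderate, and > 0.30 is low.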
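The flagging rule reduces to a few lines. This is a sketch assuming the standard ACS convention that published margins of error are at 90% confidence; the function names are hypothetical, not PDI's API.

```python
Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """CV = standard error / estimate, with the SE recovered from the 90% MOE."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    standard_error = moe / Z_90
    return standard_error / estimate


def reliability(cv: float) -> str:
    """Map a CV to the high / moderate / low bands."""
    if cv < 0.15:
        return "high"
    if cv <= 0.30:
        return "moderate"
    return "low"


cv = coefficient_of_variation(12.4, 8.1)  # the example tract
print(round(cv, 2), reliability(cv))  # 0.4 low
```

Small estimates produce large CVs under this rule, so rare-population rates in small tracts will often land in the low band, which is the behavior the flag is meant to surface.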

What Composite Scores Hide

Stiglitz-Sen-Fitoussi Commission (2009) · PMC (2022, 2023, 2025)

Most tools combine indicators into a single “vulnerability score.” The CDC Social Vulnerability Index predicts only 38.9% of COVID case variability (PMC, 2022). The Area Deprivation Index, when unstandardized, is 98.8% explained by just 2 of its 17 variables (Health Affairs Scholar, 2023). Environmental composite rankings shift by an average of 45 places across alternative specifications (PMC, 2023).

Tract 14.01 — Composite
87 out of 100 · “Very High Risk”

Is it poverty? Housing? Transit? Health? The score does not say. A program officer cannot tell whether housing is the problem.

Tract 14.01 — Raw Indicators (illustrative profile)
Chronic Absence: 42.1% · Bus Stops: 0 · Cost Burdened: 48.2% · Income: $38K · Uninsured: 23.1% · Diabetes: 4.2%

Transit is the crisis. Housing burden is severe. Diabetes is low. A housing program and transit investment would reach this tract. A health clinic would not. The composite told you none of this.
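To make the argument concrete, here is a minimal sketch with two hypothetical tracts whose indicator percentiles point to different problems yet average to the same composite. The equal-weight average and the percentile values are assumptions for illustration; real indices use their own ranking and weighting schemes.

```python
# Hypothetical percentile ranks (0-100, higher = more need) for two tracts.
TRACT_A = {"transit": 98, "housing": 92, "health": 60, "income": 80}
TRACT_B = {"transit": 60, "housing": 80, "health": 97, "income": 93}


def composite(profile: dict) -> float:
    """Equal-weight average of the indicators: the usual composite recipe."""
    return sum(profile.values()) / len(profile)


def top_need(profile: dict) -> str:
    """The single indicator with the highest need, which the composite discards."""
    return max(profile, key=profile.get)


for name, tract in [("A", TRACT_A), ("B", TRACT_B)]:
    print(name, composite(tract), top_need(tract))
# A 82.5 transit
# B 82.5 health
```

Both tracts score 82.5, but one needs transit and the other needs health services; only the raw profile tells them apart.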

Measuring Polarization Without Collapsing It

Krieger N, Waterman PD, Gryparis A, Coull BA. Health & Place, 2016

Race and income are too correlated in the US to “control for” one while studying the other. The Index of Concentration at the Extremes (ICE) measures both simultaneously:

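ICE = (Privileged − Deprived) ÷ Total, where Privileged and Deprived count the people (or households) at each extreme and Total counts everyone in the tract
+1.0: entirely privileged
0.0: equal numbers at both extremes
−1.0: entirely deprived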

Unlike a composite index, ICE measures a specific, named phenomenon: spatial polarization between privilege and deprivation. It does not average unrelated dimensions.
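The formula translates directly into code. A minimal sketch with hypothetical counts: the example follows the race-plus-income pairing the section describes (for instance, high-income white households as the privileged extreme and low-income Black households as the deprived one), but the choice of extremes and the income cutoffs are assumptions the analyst must state.

```python
def ice(privileged: int, deprived: int, total: int) -> float:
    """Index of Concentration at the Extremes: (privileged - deprived) / total.

    privileged and deprived are counts at the two extremes; total is everyone
    in the tract, including people at neither extreme.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if privileged < 0 or deprived < 0 or privileged + deprived > total:
        raise ValueError("extreme counts must be non-negative and fit within total")
    return (privileged - deprived) / total


# Hypothetical tract: 1,000 households, 150 at the privileged extreme, 450 at the deprived one.
print(ice(150, 450, 1000))  # -0.3
```

Because the denominator is the whole tract, households in the middle pull ICE toward 0 without being counted at either extreme.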

Letting the Data Name Its Own Dimensions

Kolak et al., JAMA Network Open, 2020 — 72K tracts, 71% variance explained

With 50 indicators, which ones belong together? Intuition says “poverty and education are both socioeconomic.” But they may not move together in every geography. EFA examines the correlations and finds clusters that actually co-occur.

Four factors emerged across 72,000 US tracts:
Socioeconomic Advantage: income, education, employment
Limited Mobility: vehicle access, aging, disability
Urban Core Opportunity: density, transit, amenities
Immigrant Cohesion: language, foreign-born, multigenerational

None of these groupings was assumed in advance. An analyst who averaged poverty and vehicle access would be mixing separate dimensions, and the average would mean nothing.
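Full EFA fits a latent-factor model and is usually run with a statistics library. The sketch below covers only the step EFA starts from, the pairwise correlation matrix, and groups indicators whose correlation magnitude passes a hypothetical 0.7 cutoff. It is a simplified stand-in for intuition, not exploratory factor analysis, and the five-tract data are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlated_groups(data, threshold=0.7):
    """Group indicators linked by |r| >= threshold (connected components).

    A crude screen on the correlation matrix, NOT exploratory factor analysis.
    """
    names = list(data)
    adj = {a: set() for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(data[a], data[b])) >= threshold:
                adj[a].add(b)
                adj[b].add(a)
    groups, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first walk over strongly correlated pairs
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(comp))
    return groups


# Invented indicator values (percent) for five tracts.
DATA = {
    "poverty": [10, 20, 30, 40, 50],
    "no_degree": [12, 19, 33, 41, 48],
    "no_vehicle": [5, 30, 10, 25, 15],
    "disability": [6, 29, 12, 24, 14],
}
print(correlated_groups(DATA))  # [['no_degree', 'poverty'], ['disability', 'no_vehicle']]
```

Averaging poverty with vehicle access here would blend two indicators whose correlation is about 0.23, which is the failure the section warns about.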

Where Disadvantage Concentrates

Anselin L. Local Indicators of Spatial Association. 1995

Drawing a line at the 80th percentile is arbitrary. Move it to the 75th and you add dozens of tracts. LISA identifies clusters by actual spatial pattern — which tracts are surrounded by tracts like them.

HH: concentrated disadvantage
LL: concentrated advantage
HL: spatial outlier
NS: not clustered

The HH cluster is where policy should focus — not because an analyst drew a line, but because the pattern is statistically significant.
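A compact sketch of local Moran's I, the most widely used LISA statistic, on a hypothetical 5×5 grid rather than real tract polygons. Rook contiguity (shared edges) stands in for tract adjacency, weights are row-standardized, and significance comes from conditional permutation with a one-sided pseudo p-value. The grid, seed, and 0.05 cutoff are assumptions; production work would normally use a spatial statistics library and adjust for multiple testing.

```python
import random


def rook_neighbors(rows, cols):
    """Cells sharing an edge are neighbors (a stand-in for tract adjacency)."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(a, b) for a, b in cand if 0 <= a < rows and 0 <= b < cols]
    return nbrs


def local_moran(values, nbrs, permutations=999, alpha=0.05, seed=42):
    """Return {cell: (I_i, pseudo_p, label)} using local Moran's I."""
    rng = random.Random(seed)
    cells = list(values)
    n = len(cells)
    mean = sum(values.values()) / n
    sd = (sum((v - mean) ** 2 for v in values.values()) / n) ** 0.5
    z = {c: (values[c] - mean) / sd for c in cells}  # standardized values
    results = {}
    for c in cells:
        k = len(nbrs[c])
        lag = sum(z[j] for j in nbrs[c]) / k  # row-standardized spatial lag
        observed = z[c] * lag
        others = [z[j] for j in cells if j != c]
        # Conditional permutation: keep z_i fixed, draw k random "neighbors".
        extreme = 0
        for _ in range(permutations):
            sim = z[c] * (sum(rng.sample(others, k)) / k)
            if (observed >= 0 and sim >= observed) or (observed < 0 and sim <= observed):
                extreme += 1
        p = (extreme + 1) / (permutations + 1)
        if p > alpha:
            label = "NS"
        elif z[c] > 0:
            label = "HH" if lag > 0 else "HL"
        else:
            label = "LH" if lag > 0 else "LL"  # LH: low value surrounded by high
        results[c] = (observed, p, label)
    return results


rows = cols = 5
# Hypothetical indicator: a 3x3 block of high values in one corner, low elsewhere.
values = {(r, c): 10.0 if r < 3 and c < 3 else 1.0 for r in range(rows) for c in range(cols)}
res = local_moran(values, rook_neighbors(rows, cols))
print(res[(1, 1)][2], res[(4, 4)][2])
```

The center of the high block is surrounded entirely by high cells and comes out HH, while the far corner, though low among low neighbors, is not significant because low values dominate the grid and such neighborhoods arise often by chance.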

Finding Where Outcomes Change

Segmented regression · Madison Equity Atlas, 2026

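Policy needs thresholds: “At what income level does chronic absence spike?” Segmented regression answers this by fitting separate line segments on either side of a candidate breakpoint and keeping the breakpoint where the combined fit is best.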

Finding: The $100K Threshold

Below $100K median household income, chronic absence rises sharply as income falls, with a 10.4-percentage-point gap at the threshold. Above $100K, attendance stabilizes.

[Figure: chronic absence rate vs. median household income]

This is directly actionable. A composite score cannot produce it because it has already averaged income with a dozen other variables.
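A minimal sketch of one common segmented-regression variant: a grid search over candidate breakpoints, with separate least-squares lines on each side, which allows a jump like the gap reported above. The data are synthetic, built with a known break at $100K and an 8-point gap, not the Atlas's 51-school dataset. Variants that force the segments to meet, or that estimate the breakpoint by iterative fitting, are common alternatives.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def find_breakpoint(xs, ys, min_points=3):
    """Grid-search one breakpoint for a two-segment (possibly discontinuous) fit.

    Each candidate splits the sorted data at an observed x; separate lines are
    fitted to each side and the split with the lowest total SSE wins.
    Returns (breakpoint, gap), gap = left line minus right line at the breakpoint.
    Assumes distinct x values and at least 2 * min_points observations.
    """
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pts) - min_points + 1):
        left, right = pts[:i], pts[i:]
        a1, b1, sse1 = fit_line([p[0] for p in left], [p[1] for p in left])
        a2, b2, sse2 = fit_line([p[0] for p in right], [p[1] for p in right])
        bp = right[0][0]
        if best is None or sse1 + sse2 < best[0]:
            gap = (a1 + b1 * bp) - (a2 + b2 * bp)
            best = (sse1 + sse2, bp, gap)
    if best is None:
        raise ValueError("need at least 2 * min_points observations")
    return best[1], best[2]


# Synthetic data: income in $K; absence falls with income below 100, flat at 17% above.
INCOMES = list(range(30, 160, 10))
ABSENCE = [40 - 0.15 * x if x < 100 else 17.0 for x in INCOMES]
bp, gap = find_breakpoint(INCOMES, ABSENCE)
print(bp, round(gap, 1))  # 100 8.0
```

The min_points guard keeps each segment from being fitted to one or two points, which would otherwise let noise masquerade as a threshold.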

04 — Data & Landscape

Twelve Sources. One Platform.


Source · Category · Status · Level
Census ACS 5-Year · Demographic · Live · Tract
TIGER/Line · Geographic · Live · All
CDC PLACES · Health · Live · Tract
EPA EJScreen · Environment · Live · Tract
USDA Food Access · Food · Live · Tract
BLS LAUS · Employment · Live · County
WI DPI · Education · Live · District
HUD CHAS · Housing · Planned · Tract
Eviction Lab · Housing · Planned · Tract
HRSA HPSA · Health · Planned · County
GTFS · Transit · Planned · Route
NCES · Education · Planned · School

Where PDI Fits


Platform · Open Source · National · API · Narrative · Raw-First
Census Reporter · Yes · Yes · Yes · No · Yes
COI 3.0 · Docs · Yes · Download · No · Composite
National Equity Atlas · No · Metro · No · No · Dashboard
Opportunity Insights · Code · Yes · Download · No · Yes
PDI · Yes · Yes · REST+SSE · Go templates · Yes

05 — Where this comes from

From 125 Tracts to 85,000


PDI began as the Madison Equity Atlas, a 22-layer GIS platform analyzing 125 census tracts in Dane County, Wisconsin. The Atlas produced Five Mornings in Madison, 70 evidence cards, a founding partnership proposal, and a field guide for decision-makers.

The Atlas showed that when tract-level data is structured well, it produces stories that move people and evidence that informs campaigns. PDI generalizes that methodology so it can run for any county in the United States.

Go + Python + PostgreSQL/PostGIS · 12 REST endpoints + SSE streaming · Apache-2.0
GitHub → DojoGenesis/policy-data-infrastructure