Shapefiles for Mosaic
Where to find one, what Mosaic needs it to look like, and what to do when it complains.
Where do I get a shapefile?
Mosaic does not include a built-in data fetcher; you bring your own shapefile. These are the cleanest sources for redistricting work:
Redistricting Data Hub →
The most user-friendly start. Pre-cleaned precinct shapefiles with population and election totals, organised by state and election year. Free with registration.
VEST (Harvard Dataverse) →
The Voting and Election Science Team's precinct shapefiles with election results joined in. Strong for partisan-scoring runs. Format and column naming are very consistent across states.
MGGG-states →
Academic-grade shapefiles curated by the Metric Geometry and Gerrymandering Group. Each state repo includes documentation of columns and known caveats. Designed for redistricting research.
Census TIGER/Line →
The official boundary files for counties, congressional districts, tracts, and block groups. No population or vote data attached — you would need to join those yourself.
You can also grab a shapefile from your state's GIS portal. Quality varies wildly; the cleaned sources above are usually less work.
Bonus Mosaic-ready state shapefiles
Forty-three US states pre-joined and ready to load directly into Mosaic:
TIGER/Line VTD geometries, 2020 population + race/age VAP breakdowns, and
2016 / 2020 / 2024 presidential vote totals. Each zip extracts to a single
.shp set; pick it in Mosaic's load dialog and the auto-detect
should pre-fill the population and county columns for you.
Available states: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Utah, Virginia, Washington, West Virginia, Wisconsin (1.4–55.5 MB each).
Three caveats are documented per-state on the repo: eight states
(AR, CT, ME, MI, NJ, OK, OR, PA) use county-swing-estimated 2024
results (DT_EST_24 / KH_EST_24) because
precinct-level 2024 data was unavailable; three states (CA, NY, RI)
have a small number of offshore precincts removed for graph
connectivity; Florida is accurate in Mosaic but will not round-trip
cleanly into Dave's Redistricting. See the per-state notes and DATA_LICENSE.md
for the schema and modeling choices.
Browse and download on GitHub →
Click any _DRA_Mosaic.zip file on the repo, then the
download icon to save it.
Just want to try Mosaic without picking a state? The bundled North Carolina sample
(shapefiles/North_Carolina_Simplified.shp in the extracted
folder) is ready to load with no setup.
What Mosaic needs the shapefile to look like
Geometry
Polygons (or multipolygons), one per precinct. Mosaic will refuse to load a shapefile of points or lines, or one where some rows have null geometry.
Required columns
- Population: numeric, no missing values, no negatives, positive total. Mosaic auto-detects common names: POP, TOTPOP, population, POP100, and the census P0010001. If yours uses something else you can pick it in the column-picker dialog after load.
- Precinct ID: any column that uniquely identifies each row, used when Mosaic writes the assignments CSV. Common names: GEOID, GEOID20, VTDID.
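The auto-detection described above can be pictured as a lookup against a list of known names. The sketch below is illustrative only: the candidate names come from the lists above, but the function name `guess_column` and the matching rule (case-insensitive, first candidate wins) are assumptions, not Mosaic's actual code.

```python
# Candidate names taken from the lists above. The matching rule
# (case-insensitive, first candidate wins) is an assumption.
POPULATION_NAMES = ["POP", "TOTPOP", "population", "POP100", "P0010001"]
PRECINCT_ID_NAMES = ["GEOID", "GEOID20", "VTDID"]


def guess_column(columns, candidates):
    """Return the first column whose name matches a candidate, else None."""
    by_lower = {name.lower(): name for name in columns}
    for candidate in candidates:
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None


columns = ["GEOID20", "CTY", "TOTPOP", "geometry"]
print(guess_column(columns, POPULATION_NAMES))               # TOTPOP
print(guess_column(columns, PRECINCT_ID_NAMES))              # GEOID20
print(guess_column(["NAME", "geometry"], POPULATION_NAMES))  # None
```

When the lookup returns nothing, that corresponds to the case where you pick the column yourself in the column-picker dialog.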
Optional columns
- County: required only if you want to score county splits, county-edge bias, or use the county overlay. Common names: CTY, COUNTY, COUNTYFP.
- Vote totals: one pair (Democratic, Republican) per election. Required for partisan scores (mean-median, efficiency gap, expected seats, etc.). Mosaic supports one election per run.
Coordinate system (CRS)
Mosaic works with any CRS the file declares, but Polsby-Popper compactness
is computed from polygon area and perimeter, so a geographic CRS (degrees of
latitude / longitude) gives misleadingly low scores. For honest
compactness, project your shapefile to an equal-area CRS first. For the
continental US, EPSG:5070 (NAD83 / Conus Albers, an Albers equal-area
projection) is a safe default.
The Redistricting Data Hub, VEST, and MGGG sources above often ship in projected CRSs, but check the file's .prj before trusting compactness scores.
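To see why the CRS matters, here is a small sketch of the Polsby-Popper formula, 4πA / P², applied to the same 10 km square twice: once in projected metres and once in raw degrees at 60° north. It uses plain planar geometry on coordinate lists; the helper names and the 111.32 km-per-degree figure are illustrative assumptions, and this is not how Mosaic itself computes the score.

```python
import math


def polsby_popper(ring):
    """Polsby-Popper score 4*pi*A / P**2 for a closed ring of (x, y) points."""
    area2 = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area2 += x1 * y2 - x2 * y1  # shoelace term (twice the signed area)
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(area2) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2


def rectangle(width, height):
    """Closed ring for an axis-aligned rectangle with one corner at the origin."""
    return [(0, 0), (width, 0), (width, height), (0, height), (0, 0)]


# A 10 km x 10 km square measured in metres (projected coordinates).
projected = rectangle(10_000, 10_000)

# The same square at 60 degrees north, measured in degrees. One degree of
# longitude there covers about half the ground distance of one degree of
# latitude, so in degree units the square looks twice as wide as it is tall.
km_per_degree = 111.32  # approximate length of one degree of latitude
lat_span = 10 / km_per_degree
lon_span = 10 / (km_per_degree * math.cos(math.radians(60)))
geographic = rectangle(lon_span, lat_span)

print(round(polsby_popper(projected), 3))   # 0.785
print(round(polsby_popper(geographic), 3))  # 0.698
```

The shape on the ground is identical in both cases; only the coordinate units differ, and the degree-based score comes out lower.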
Worked example: loading the bundled sample
- Open Mosaic.
- Click Load shapefile and pick shapefiles/North_Carolina_Simplified.shp from the folder you extracted.
- The column-picker dialog opens. Mosaic auto-fills Population with POP, Precinct ID with GEOID20, and County with CTY. Confirm.
- Set district count to 14 (NC's congressional total) and run.
About the cache folder
The first time you load a shapefile, Mosaic builds an adjacency graph
and saves a pickled copy under cache/<shapefile-stem>.pkl
so subsequent loads of the same file skip the build step (typically
saving 5–30 seconds, longer for big files).
The cache is keyed by the shapefile's filename and a content hash of its
.shp / .dbf bytes, so editing the shapefile
invalidates the cache automatically. You can delete the cache/
folder any time; it rebuilds on next load.
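A cache key of the kind described above could be built roughly as follows. This is a sketch under stated assumptions: the text does not say which hash Mosaic uses, so SHA-256 is a stand-in, and the function names `cache_key` and `cache_is_fresh` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(shp_path):
    """Return (stem, digest); the digest covers the .shp and .dbf bytes."""
    shp_path = Path(shp_path)
    digest = hashlib.sha256()  # assumption: the actual hash is not documented
    for suffix in (".shp", ".dbf"):
        part = shp_path.with_suffix(suffix)
        if part.exists():
            digest.update(part.read_bytes())
    return shp_path.stem, digest.hexdigest()


def cache_is_fresh(shp_path, stored_key):
    """True when the stored key still matches the files on disk."""
    return cache_key(shp_path) == stored_key


# Demo with throwaway files standing in for a real shapefile set.
with tempfile.TemporaryDirectory() as folder:
    shp = Path(folder) / "demo.shp"
    shp.write_bytes(b"geometry bytes")
    shp.with_suffix(".dbf").write_bytes(b"attribute bytes")
    key = cache_key(shp)
    print(cache_is_fresh(shp, key))  # True
    shp.with_suffix(".dbf").write_bytes(b"edited attributes")
    print(cache_is_fresh(shp, key))  # False
```

Because the digest changes whenever either file's bytes change, an edited shapefile never matches a stale cache entry.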
If Mosaic refuses to load your shapefile
Mosaic does a few strict checks at load time. If any fail, it tells you why in the status bar (or inline in the column-picker dialog) and refuses to proceed.
"N of M rows have null or empty geometry"
Some rows have no polygon. Remove them in QGIS or your GIS tool and reload. Mosaic cannot build adjacency for rows with no geometry.
"N of M rows are not Polygon / MultiPolygon"
The shapefile contains points or lines. Mosaic needs polygons. You probably grabbed the wrong file — for example, a precinct *centroids* file instead of a precinct *boundaries* file.
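If you want to run the two geometry checks yourself before loading, the logic can be sketched as below. It assumes each row's geometry is available as a GeoJSON-style dict (or None when missing); the function name `geometry_problems` is hypothetical and this is not Mosaic's own loader code. The printed wording mirrors the two messages above.

```python
POLYGON_TYPES = {"Polygon", "MultiPolygon"}


def geometry_problems(geometries):
    """Return (null_rows, non_polygon_rows) as lists of row indices.

    Each item is a GeoJSON-style dict such as
    {"type": "Polygon", "coordinates": [...]}, or None for a missing geometry.
    """
    null_rows = []
    non_polygon_rows = []
    for index, geom in enumerate(geometries):
        if geom is None or not geom.get("coordinates"):
            null_rows.append(index)  # missing or empty geometry
        elif geom.get("type") not in POLYGON_TYPES:
            non_polygon_rows.append(index)  # point, line, etc.
    return null_rows, non_polygon_rows


rows = [
    {"type": "Polygon", "coordinates": [[(0, 0), (1, 0), (1, 1), (0, 0)]]},
    None,
    {"type": "Point", "coordinates": (0.5, 0.5)},
    {"type": "Polygon", "coordinates": []},
]
null_rows, non_polygon_rows = geometry_problems(rows)
print(f"{len(null_rows)} of {len(rows)} rows have null or empty geometry")
print(f"{len(non_polygon_rows)} of {len(rows)} rows are not Polygon / MultiPolygon")
```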
"Population column has N null / NaN value(s)"
Population must be present on every row. Clean the data in your GIS tool (set blanks to 0 if you really want, but understand that 0-pop precincts get treated as empty) and reload.
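The population requirements (numeric, no missing values, no negatives, positive total) can be checked ahead of time with a few lines. This is a minimal sketch over a plain list of values; the function names are hypothetical, and only the "null / NaN value(s)" wording comes from Mosaic's message above. The other messages are made up for illustration.

```python
import math


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def population_problems(values):
    """Return a list of problems; an empty list means the column passes."""
    problems = []
    missing = sum(1 for v in values if is_missing(v))
    if missing:
        problems.append(f"{missing} null / NaN value(s)")
    present = [v for v in values if not is_missing(v)]
    if any(not isinstance(v, (int, float)) for v in present):
        problems.append("non-numeric value(s)")
        return problems  # cannot compare or sum non-numeric values
    negatives = sum(1 for v in present if v < 0)
    if negatives:
        problems.append(f"{negatives} negative value(s)")
    if sum(present) <= 0:
        problems.append("total population is not positive")
    return problems


print(population_problems([120, 0, 45]))
# []
print(population_problems([120, None, float("nan"), -3]))
# ['2 null / NaN value(s)', '1 negative value(s)']
print(population_problems([0, 0]))
# ['total population is not positive']
```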
"Adjacency graph has N disconnected components"
Your shapefile contains islands or exclaves: physically separate regions that don't touch the mainland. Common in coastal and island areas (Hawaii, Alaska, the Outer Banks, barrier islands). ReCom requires a connected graph, so for now the only fix is to remove those features from the shapefile manually and reload. The error message lists the row indices to remove.
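The idea behind this check can be sketched with a breadth-first search over an adjacency mapping. The example assumes adjacency is given as a dict from row index to neighbouring row indices and treats every component other than the largest as removable; both the representation and the function name are illustrative, not Mosaic's internals.

```python
from collections import deque


def connected_components(adjacency):
    """Group row indices into connected components, largest first."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


# Rows 0-3 form the mainland; rows 4 and 5 are an island pair; row 6 is alone.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: [5], 5: [4], 6: []}
components = connected_components(adjacency)
rows_to_remove = sorted(i for part in components[1:] for i in part)
print(len(components))  # 3
print(rows_to_remove)   # [4, 5, 6]
```

A graph with exactly one component passes; anything more corresponds to the "N disconnected components" message.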
Still stuck? Open an issue on GitHub with a description of the data you tried to load and the error message you got.