Mosaic

Shapefiles for Mosaic

Where to find one, what Mosaic needs it to look like, and what to do when it complains.

Where do I get a shapefile?

Mosaic does not include a built-in data fetcher; you bring your own shapefile. These are the cleanest sources for redistricting work:

Redistricting Data Hub →

The most user-friendly start. Pre-cleaned precinct shapefiles with population and election totals, organised by state and election year. Free with registration.

VEST (Harvard Dataverse) →

The Voting and Election Science Team's precinct shapefiles with election results joined in. Strong for partisan-scoring runs. Format and column naming are very consistent across states.

MGGG-states →

Academic-grade shapefiles curated by the Metric Geometry and Gerrymandering Group. Each state repo includes documentation of columns and known caveats. Designed for redistricting research.

Census TIGER/Line →

The official boundary files for counties, congressional districts, tracts, and block groups. No population or vote data attached — you would need to join those yourself.

You can also grab a shapefile from your state's GIS portal. Quality varies wildly; the cleaned sources above are usually less work.
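
If you go the TIGER/Line route, joining population onto the boundaries mostly comes down to matching rows on a shared identifier. The following is a minimal sketch using only the Python standard library. The GEOID and POP column names, the sample values, and the join_population helper are all hypothetical; neither Mosaic nor the Census prescribes them.

```python
import csv
import io

def join_population(rows, pop_csv_text, key="GEOID", pop_col="POP"):
    """Attach a population value to each attribute row by matching on `key`.

    Returns (joined_rows, unmatched_keys). Identifiers are compared as
    strings so leading zeros (for example "01001") are preserved.
    """
    reader = csv.DictReader(io.StringIO(pop_csv_text))
    pop_by_id = {r[key].strip(): int(r[pop_col]) for r in reader}

    joined, unmatched = [], []
    for row in rows:
        row_id = str(row[key]).strip()
        if row_id in pop_by_id:
            joined.append({**row, pop_col: pop_by_id[row_id]})
        else:
            unmatched.append(row_id)
    return joined, unmatched

# Hypothetical data: two boundary rows, one of which has no population record.
boundaries = [{"GEOID": "01001"}, {"GEOID": "01003"}]
population_csv = "GEOID,POP\n01001,1200\n"

joined, unmatched = join_population(boundaries, population_csv)
print(joined)
print(unmatched)
```

Any identifiers left in the unmatched list are worth resolving before loading, because Mosaic rejects rows whose population is null.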

Bonus Mosaic-ready state shapefiles

Forty-three US states pre-joined and ready to load directly into Mosaic: TIGER/Line VTD geometries, 2020 population + race/age VAP breakdowns, and 2016 / 2020 / 2024 presidential vote totals. Each zip extracts to a single .shp set; pick it in Mosaic's load dialog and the auto-detect should pre-fill the population and county columns for you.

Available states: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Utah, Virginia, Washington, West Virginia, Wisconsin (1.4–55.5 MB each).

Three caveats are documented per state on the repo. Eight states (AR, CT, ME, MI, NJ, OK, OR, PA) use county-swing-estimated 2024 results (DT_EST_24 / KH_EST_24) because precinct-level 2024 data was unavailable. Three states (CA, NY, RI) have a small number of offshore precincts removed for graph connectivity. Florida is accurate in Mosaic but will not round-trip cleanly into Dave's Redistricting. See the per-state notes and DATA_LICENSE.md for the schema and modeling choices.

Browse and download on GitHub →

Click any _DRA_Mosaic.zip file on the repo, then the download icon to save it.

License: these files inherit CC BY-SA 4.0 from Dave's Redistricting, along with DRA's explicit no-sale restriction. This is separate from Mosaic itself, which is MIT-licensed. Full attribution and schema notes: DATA_LICENSE.md.

Just want to try Mosaic without picking a state? The bundled North Carolina sample (shapefiles/North_Carolina_Simplified.shp in the extracted folder) is ready to load with no setup.

What Mosaic needs the shapefile to look like

Geometry

Polygons (or multipolygons), one per precinct. Mosaic will refuse to load a shapefile of points or lines, or one where some rows have null geometry.

Required columns

Optional columns

Coordinate system (CRS)

Mosaic works with any CRS the file declares, but Polsby-Popper compactness is computed from polygon area and perimeter, so a geographic CRS (degrees of latitude / longitude) gives misleadingly low scores. For honest compactness, project your shapefile to an equal-area CRS first. For the continental US, EPSG:5070 (NAD83 / Conus Albers, an equal-area projection) is a safe default.

The Redistricting Data Hub, VEST, and MGGG sources above already ship in projected CRSs.
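
A small calculation shows how a geographic CRS affects Polsby-Popper. The sketch below is plain Python, assuming the standard definition 4πA/P² and a spherical approximation for the length of a degree. It is an illustration only, not Mosaic's scoring code.

```python
import math

def polsby_popper(ring):
    """Polsby-Popper score 4*pi*A / P**2 for a simple polygon.

    `ring` is a list of (x, y) vertices; the closing edge is implied.
    """
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return 4 * math.pi * (abs(area2) / 2) / perimeter ** 2

# A 10 km x 10 km square measured in metres (projected coordinates).
side_m = 10_000.0
square_m = [(0, 0), (side_m, 0), (side_m, side_m), (0, side_m)]

# The same ground square expressed in degrees near latitude 35 N.
# Assumption: one degree of latitude is about 111.32 km and one degree of
# longitude is about 111.32 km * cos(latitude). Only coordinate differences
# matter here, so the ring is anchored at (0, 0) for simplicity.
lat = math.radians(35.0)
dlat = side_m / 111_320.0
dlon = side_m / (111_320.0 * math.cos(lat))
square_deg = [(0, 0), (dlon, 0), (dlon, dlat), (0, dlat)]

pp_projected = polsby_popper(square_m)
pp_degrees = polsby_popper(square_deg)
print(f"projected: {pp_projected:.4f}, degrees: {pp_degrees:.4f}")
```

The projected square scores π/4, the value for any true square. In degrees the same shape is stretched east-west, so it scores lower, and the gap widens at higher latitudes.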

Worked example: loading the bundled sample

  1. Open Mosaic.
  2. Click Load shapefile and pick shapefiles/North_Carolina_Simplified.shp from the folder you extracted.
  3. The column-picker dialog opens. Mosaic auto-fills Population with POP, Precinct ID with GEOID20, and County with CTY. Confirm.
  4. Set district count to 14 (NC's congressional total) and run.

About the cache folder

The first time you load a shapefile, Mosaic builds an adjacency graph and saves a pickled copy under cache/<shapefile-stem>.pkl so subsequent loads of the same file skip the build step (typically saving 5–30 seconds, longer for big files).

The cache is keyed by the shapefile's filename and a content hash of its .shp / .dbf bytes, so editing the shapefile invalidates the cache automatically. You can delete the cache/ folder any time; it rebuilds on next load.
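
To see why editing the shapefile invalidates the cache, here is a sketch of a key built from the filename and a content hash. The cache_key helper, the use of SHA-256, and the key format are assumptions made for illustration; Mosaic's actual scheme may differ.

```python
import hashlib
import tempfile
from pathlib import Path

def cache_key(shp_path):
    """Build a key from the file stem plus a hash of the .shp and .dbf bytes."""
    shp_path = Path(shp_path)
    digest = hashlib.sha256()
    for suffix in (".shp", ".dbf"):
        part = shp_path.with_suffix(suffix)
        # Assumption: a missing sidecar file simply contributes nothing.
        if part.exists():
            digest.update(part.read_bytes())
    return f"{shp_path.stem}-{digest.hexdigest()[:16]}"

# Demonstrate with throwaway files: editing the .dbf changes the key.
with tempfile.TemporaryDirectory() as tmp:
    shp = Path(tmp) / "demo.shp"
    dbf = Path(tmp) / "demo.dbf"
    shp.write_bytes(b"fake geometry bytes")
    dbf.write_bytes(b"fake attribute bytes")
    key_before = cache_key(shp)

    dbf.write_bytes(b"edited attribute bytes")
    key_after = cache_key(shp)

print(key_before != key_after)
```

Because the key depends on the file bytes, no timestamp comparison is needed: any change to the geometry or attributes produces a different key, and the stale entry is simply never matched again.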

If Mosaic refuses to load your shapefile

Mosaic does a few strict checks at load time. If any fail, it tells you why in the status bar (or inline in the column-picker dialog) and refuses to proceed.

"N of M rows have null or empty geometry"

Some rows have no polygon. Remove them in QGIS or your GIS tool and reload. Mosaic cannot build adjacency for rows with no geometry.

"N of M rows are not Polygon / MultiPolygon"

The shapefile contains points or lines. Mosaic needs polygons. You probably grabbed the wrong file — for example, a precinct *centroids* file instead of a precinct *boundaries* file.

"Population column has N null / NaN value(s)"

Population must be present on every row. Clean the data in your GIS tool and reload. You can set blanks to 0 if you really want to, but understand that zero-population precincts are then treated as empty.
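
The three row-level checks above can be approximated before you open Mosaic. The sketch below runs on GeoJSON-style feature dicts in plain Python. The preflight helper and the sample rows are hypothetical, and the messages only imitate Mosaic's wording; this is not Mosaic's loader code.

```python
import math

def preflight(features, pop_key):
    """Return a list of problem descriptions for GeoJSON-like features.

    Each feature is a dict with "geometry" (a dict with "type" and
    "coordinates", or None) and "properties" (a dict of attributes).
    """
    total = len(features)
    null_geom = non_polygon = null_pop = 0

    for feat in features:
        geom = feat.get("geometry")
        if not geom or not geom.get("coordinates"):
            null_geom += 1
        elif geom.get("type") not in ("Polygon", "MultiPolygon"):
            non_polygon += 1

        pop = (feat.get("properties") or {}).get(pop_key)
        if pop is None or (isinstance(pop, float) and math.isnan(pop)):
            null_pop += 1

    problems = []
    if null_geom:
        problems.append(f"{null_geom} of {total} rows have null or empty geometry")
    if non_polygon:
        problems.append(f"{non_polygon} of {total} rows are not Polygon / MultiPolygon")
    if null_pop:
        problems.append(f"Population column has {null_pop} null / NaN value(s)")
    return problems

# Hypothetical rows: one valid polygon, one point, and one row with neither
# geometry nor population.
square = [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
features = [
    {"geometry": {"type": "Polygon", "coordinates": square}, "properties": {"POP": 120}},
    {"geometry": {"type": "Point", "coordinates": (0.5, 0.5)}, "properties": {"POP": 80}},
    {"geometry": None, "properties": {"POP": None}},
]

for problem in preflight(features, "POP"):
    print(problem)
```

An empty list means the rows pass these three checks. Connectivity is a separate question and needs the adjacency graph.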

"Adjacency graph has N disconnected components"

Your shapefile contains islands or exclaves: physically separate regions that don't touch the mainland. These are common in coastal areas (Hawaii, Alaska, North Carolina's Outer Banks, and other barrier islands). ReCom requires a connected graph, so for now the only fix is to remove those features from the shapefile manually and reload. The error message lists the row indices to remove.
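
To make "disconnected components" concrete, the sketch below finds the components of a small adjacency graph with a breadth-first search and lists every row outside the largest one. The adjacency data and the connected_components helper are hypothetical, and treating the largest component as the mainland is an assumption of this example, not a description of how Mosaic chooses the rows it reports.

```python
from collections import deque

def connected_components(adjacency):
    """Return components of an undirected graph, largest first.

    `adjacency` maps each row index to the set of row indices it touches.
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

# Hypothetical adjacency: rows 0-3 form the mainland, rows 4-5 an island
# pair, and row 6 a lone exclave with no neighbours.
adjacency = {
    0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2},
    4: {5}, 5: {4},
    6: set(),
}

components = connected_components(adjacency)
mainland, *islands = components
rows_to_remove = sorted(i for comp in islands for i in comp)
print(len(components), rows_to_remove)
```

Here the graph has three components, and removing rows 4, 5, and 6 leaves a single connected mainland.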

Still stuck? Open an issue on GitHub with a description of the data you tried to load and the error message you got.