Mosaic — docs

Loading a shapefile

Where to get a shapefile, what Mosaic needs it to look like (required columns, geometry type, CRS), and what to do when Mosaic refuses to load yours: see the dedicated shapefile guide.

Run Config reference

Mosaic groups its knobs into four popups in the Run Config panel.

ReCom

Number of Districts: How many districts to draw. Set to the chamber size (e.g. 14 for NC's US House delegation, larger for state legislatures). Mosaic will refuse to start if the precinct count is too small to support the requested number of districts.
Max Iterations: Hard cap on the optimisation budget. The annealing schedule is sized against this number, so changing it changes the cooling trajectory, not just the stopping point.
n=3 ReCom Mix: Probability per step of attempting a three-way recombination (merge 3 adjacent districts and re-split into 3) instead of the default two-way step. Three-way moves are slower but escape local minima better; 0.0 disables it.
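
As a rough illustration of how a per-step mix probability behaves, the sketch below picks a two-way or three-way move with a single random draw. The function name `choose_move` is hypothetical and is not Mosaic's internal API.

```python
import random

def choose_move(rng: random.Random, n3_mix: float) -> int:
    """Return 3 for a three-way recombination attempt, otherwise 2."""
    # rng.random() lies in [0, 1), so n3_mix = 0.0 never selects a three-way move.
    return 3 if rng.random() < n3_mix else 2

rng = random.Random(1)
moves = [choose_move(rng, 0.25) for _ in range(10_000)]
share_three_way = moves.count(3) / len(moves)   # close to the requested 0.25
```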

Population

Population Tolerance: Maximum per-district deviation from the ideal population that Mosaic will accept as a valid plan (default 2.5%). Any candidate that violates it is rejected at the recombination step. Tighter tolerances are harder to satisfy and slower to converge.
Safe Harbor: The no-penalty band for the Population Deviation score (default 0.25%). Districts inside it contribute 0 to that metric; each district outside it contributes the square of its excess beyond the band, and the score adds the mean and the maximum of those squares so a single outlier cannot be averaged away. Clamped at run start so it never exceeds Population Tolerance.
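
The two settings play different roles: tolerance is a hard validity test, while the safe harbor only shapes the score. A minimal sketch of both, assuming deviations are measured as a fraction of the ideal (mean) district population; the function names are illustrative, not Mosaic's.

```python
def within_tolerance(populations, tolerance):
    """True if every district is within `tolerance` (a fraction) of the ideal population."""
    ideal = sum(populations) / len(populations)
    return all(abs(p - ideal) / ideal <= tolerance for p in populations)

def clamp_safe_harbor(safe_harbor, tolerance):
    """The safe harbor may never exceed the population tolerance."""
    return min(safe_harbor, tolerance)

districts = [1000, 1020, 980]                        # ideal population is 1000
ok_at_default = within_tolerance(districts, 0.025)   # 2% off is inside 2.5%
ok_at_tight = within_tolerance(districts, 0.01)      # 2% off is outside 1%
```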

Annealing

GUIDED cooling (recommended): Temperature drops fastest during an initial guide window (a fraction of max iterations) and tapers off afterward. The schedule self-tunes to acceptance rate, so the same settings work across very different shapefiles and weight mixes.
STATIC cooling: Temperature decays on a fixed geometric schedule. Useful when reproducing a published methodology or when you want fine control over the cooling curve. Requires more manual tuning to match GUIDED's quality.
Initial Temperature factor & mode: PROPORTIONAL (default) multiplies the factor by the initial plan's score: starting temp = factor × initial_score (default factor 0.2). This is the right mode when you don't know what magnitudes to expect from your weight mix — the temperature scales with whatever the score actually is.
NOMINAL uses the factor as an absolute temperature. Use this when you have a calibrated target temperature in mind, e.g. when reproducing a specific run.
GUIDED — Guide Fraction & Target Temperature: The schedule is sized so that temperature reaches Target Temperature at Guide Fraction × max_iterations. Defaults: 0.9 and 1.0. Lowering Guide Fraction means faster cooling (more aggressive); lowering Target Temperature means a colder endpoint (less random late in the run).
STATIC — Cooling Rate: Per-iteration temperature multiplier (default 0.9995, meaning temp drops 0.05% per step). The total cooling over a run is cooling_rate^max_iterations; e.g. 0.9995 over 10,000 iterations leaves temperature at about 0.7% of initial.
Launch Watch: When enabled (default on, at iteration 250), Mosaic re-anchors the temperature to the current score after the first 250 iterations. Useful when the score collapses early (common with County Splits and County-Edge Bias enabled), because a temperature anchored to the initial score would otherwise be too hot to be useful once the score has stabilized.
n=3 ReCom Mix slider: Frozen at run start (see ReCom). Listed here too because the Annealing popup is where it lives in the GUI.
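
The arithmetic behind the starting temperature and STATIC cooling can be checked in a few lines. The sketch below follows the formulas stated above. The last helper is only a simplification: it finds a fixed geometric rate that satisfies the GUIDED sizing constraint (reach Target Temperature at Guide Fraction × max_iterations) and does not reproduce GUIDED's self-tuning. All function names are hypothetical.

```python
def initial_temperature(factor, initial_score, mode="PROPORTIONAL"):
    """Starting temperature under the two documented modes."""
    if mode == "PROPORTIONAL":
        return factor * initial_score      # scales with the plan's own score
    if mode == "NOMINAL":
        return factor                      # the factor is already a temperature
    raise ValueError(f"unknown mode: {mode}")

def static_temperature(t0, cooling_rate, iteration):
    """STATIC cooling: one multiplication by cooling_rate per iteration."""
    return t0 * cooling_rate ** iteration

def rate_reaching_target(t0, target, guide_fraction, max_iterations):
    """Geometric rate that lands on `target` at guide_fraction * max_iterations."""
    steps = guide_fraction * max_iterations
    return (target / t0) ** (1.0 / steps)

t0 = initial_temperature(0.2, 500.0)                  # 0.2 × 500
remaining = static_temperature(1.0, 0.9995, 10_000)   # fraction of the initial temperature left
rate = rate_reaching_target(t0, 1.0, 0.9, 10_000)     # defaults: guide fraction 0.9, target 1.0
```

Here `remaining` comes out near 0.0067, matching the "about 0.7% of initial" figure quoted for the default cooling rate.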

Seed

Random Seed: Sets the RNG state at the start of the run. 0 (default) leaves the RNG unseeded, so every run is different. Any non-zero integer makes a run reproducible on the same machine, with the same Mosaic version and the same shapefile. Cross-machine or cross-version runs may diverge slightly due to floating-point ordering inside numpy / igraph; the seed buys you reproducibility, not bit-exactness across environments.
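
The "0 means unseeded" convention can be sketched as follows. Python's standard-library `random.Random` stands in here for whatever generator Mosaic actually uses, and `make_rng` is a hypothetical helper.

```python
import random

def make_rng(seed: int) -> random.Random:
    """Seed 0 means unseeded: random.Random(None) seeds itself from OS entropy."""
    return random.Random(None if seed == 0 else seed)

def draws(seed, n=5):
    rng = make_rng(seed)
    return [rng.random() for _ in range(n)]

same_seed_matches = draws(42) == draws(42)   # a non-zero seed repeats exactly
unseeded_differs = draws(0) != draws(0)      # seed 0 gives a fresh stream each time
```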

County-Edge Bias

County-Edge Bias (enable + Multiplier slider): Makes ReCom less likely to propose district boundaries that cut through counties when generating new districts. Higher multiplier = stronger preference for cuts along county borders. Distinct from the County Splits score: County Splits penalizes splits in the score function (post hoc), whereas County-Edge Bias steers the proposal distribution (per step). The two work well together: County-Edge Bias keeps splits rare in proposals; County Splits cleans up the ones that do happen.
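
One common way to bias a spanning-tree proposal toward county borders is to give cross-county edges a weight surcharge before drawing the tree, so the tree rarely leaves a county and its cuts tend to fall on county lines. The sketch below shows that general technique. It is an assumption that Mosaic works this way, and the `surcharge` parameter is a hypothetical stand-in for the Multiplier slider.

```python
import random

def biased_spanning_tree(nodes, edges, county, surcharge, rng):
    """Kruskal's algorithm on random weights, with cross-county edges made heavier."""
    def weight(edge):
        u, v = edge
        crosses = county[u] != county[v]
        return rng.random() + (surcharge if crosses else 0.0)

    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    tree = []
    for u, v in sorted(edges, key=weight):  # lightest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree

# A 2×4 grid of precincts: columns 0-1 are county A, columns 2-3 are county B.
nodes = [(r, c) for r in range(2) for c in range(4)]
edges = [((r, c), (r, c + 1)) for r in range(2) for c in range(3)]
edges += [((0, c), (1, c)) for c in range(4)]
county = {(r, c): "A" if c < 2 else "B" for r, c in nodes}

tree = biased_spanning_tree(nodes, edges, county, surcharge=1.0, rng=random.Random(7))
crossings = sum(county[u] != county[v] for u, v in tree)
```

With a surcharge of 1.0 every cross-county edge is heavier than every within-county edge, so the tree crosses the county line exactly once and each county stays in one piece on its side of that edge.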

Scoring weights

Each enabled metric is multiplied by its weight and summed into PlanScore.total, which the optimizer minimizes. Weight 0 disables the metric; the computation is skipped, reducing per-iteration cost. Per-metric contribution formulas are in the Metrics glossary.
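
A minimal sketch of the weighted sum, with each metric supplied as a callable so that a zero weight really does skip the computation. The metric names and values are made up for illustration.

```python
def plan_total(contributions, weights):
    """Sum weight × contribution over metrics with a non-zero weight."""
    total = 0.0
    for name, weight in weights.items():
        if weight == 0:
            continue                       # disabled metric: never evaluated
        total += weight * contributions[name]()
    return total

contributions = {
    "cut_edges": lambda: 120.0,
    "polsby_popper": lambda: 65.0,
    "reock": lambda: 1 / 0,                # would raise if it were evaluated
}
weights = {"cut_edges": 1.0, "polsby_popper": 2.0, "reock": 0}
total = plan_total(contributions, weights)   # 1.0 × 120 + 2.0 × 65
```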

Partisan calibration

Set in the Partisanship popup. Applied when any of Efficiency Gap (Robust), Expected Seats, Competitiveness, Chance of Majority, or Hinge is enabled.

election_win_prob_at_55 — default 0.9: Calibration anchor: P(D wins | D share = 0.55). Inverted to σ_district = 0.05 / Φ⁻¹(p). Higher values yield sharper per-district win probabilities.
election_swing_sigma — default 0.03 (3pp): Standard deviation of a shared partisan-environment swing added to every district before computing win probability. Higher values widen the partisan metric distributions.

Combined as σ_combined = √(σ_swing² + σ_district²). For point-estimate behavior, set election_swing_sigma to its minimum and disable Robust EG (Partisanship popup → Efficiency Gap mode → Static).
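
The calibration formulas can be reproduced with the standard library's `statistics.NormalDist`. The helper names below are illustrative. The win-probability form Φ((share − 0.5) / σ) is what the anchor definition implies, but treat it as an assumption about Mosaic's internals.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sigma_district(win_prob_at_55):
    """Invert the anchor P(D wins | D share = 0.55) into a per-district sigma."""
    return 0.05 / STD_NORMAL.inv_cdf(win_prob_at_55)

def sigma_combined(swing_sigma, district_sigma):
    """Independent swing and district noise add in quadrature."""
    return sqrt(swing_sigma ** 2 + district_sigma ** 2)

def win_probability(dem_share, sigma):
    return STD_NORMAL.cdf((dem_share - 0.5) / sigma)

sd = sigma_district(0.9)       # default anchor, roughly 0.039
sc = sigma_combined(0.03, sd)  # default swing, roughly 0.049
```

Feeding `sd` back into `win_probability(0.55, sd)` recovers the 0.9 anchor, and raising the anchor (say to 0.95) shrinks `sigma_district`, which is why higher values give sharper win probabilities.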

Run controls

Four buttons in the main panel drive a run.

Start: Begins (or resumes) optimisation with the current Run Config and score weights. Most settings are frozen at run start — including the n=3 mix probability and the annealing schedule — so changes to the popups during a run don't take effect until you Reset.
Pause: Stops the algorithm thread at the next clean iteration boundary. Charts and the map stay where they were. Click Start again to resume from the same iteration.
Revert to Best: Restores the current assignment to the best-scoring plan seen so far in this run, without resetting the iteration counter or temperature. Useful when the algorithm has wandered into a worse region and you want to continue annealing from a known-good point. The score history charts keep their full trace.
Reset: Discards the current run entirely. Clears the map, the iteration counter, the score history, and the annealing state. The loaded shapefile and column choices are kept. Use this when you want to try a fresh run with new weights or a new seed.

Metrics glossary

Each metric appears on a score row in the GUI. Entries below give a definition followed by a meta card listing the unit, optimization direction, contribution to PlanScore.total, and any tunable parameters with their GUI location. Most metrics are internally rescaled to comparable magnitudes; the rescale factors and formulas appear in the meta card.

Five metrics (Efficiency Gap in Robust mode, Expected Seats, Competitiveness, Chance of Majority, Hinge) model each district's Dem share as a Gaussian variable. Two parameters set the spread and are shared across these metrics: election_win_prob_at_55 (anchors per-district noise via σ_district = 0.05 / Φ⁻¹(p)) and election_swing_sigma (standard deviation of a shared partisan-environment draw). They combine as σ_combined = √(σ_swing² + σ_district²). Both are set in the Partisanship popup; see Run Config > Partisan calibration for details.

Cut Edges

Number of precinct-pair adjacencies whose endpoints are in different districts.

Unit	integer count
Direction	lower is better
Contribution	`cut_edges` (no internal rescale)
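
Counting cut edges takes one pass over the adjacency list. A minimal sketch with a hypothetical four-precinct path graph:

```python
def cut_edges(edges, assignment):
    """Count adjacencies whose two precincts sit in different districts."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Path a-b-c-d, split into districts {a, b} and {c, d}.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
assignment = {"a": 1, "b": 1, "c": 2, "d": 2}
count = cut_edges(edges, assignment)   # only b-c straddles the boundary
```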

County Splits

Penalty on splitting counties across districts. Two terms: excess splits beyond the minimum forced by county population, plus the shortfall in clean districts (districts whose precincts all lie within one county).

Unit	weighted score
Direction	lower is better
Contribution	`10 × (excess_splits + (max_clean − clean_districts))`
Tunables	none directly. `max_clean` depends on the run-config Population Tolerance (Population popup → Population Tolerance).
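
A sketch of the contribution formula. Counting clean districts follows directly from the definition above. `excess_splits` and `max_clean` are taken as inputs because this page does not spell out how they are derived.

```python
def clean_districts(assignment, county):
    """Count districts whose precincts all lie within one county."""
    counties_in = {}
    for precinct, district in assignment.items():
        counties_in.setdefault(district, set()).add(county[precinct])
    return sum(1 for counties in counties_in.values() if len(counties) == 1)

def county_splits_contribution(excess_splits, max_clean, clean):
    return 10 * (excess_splits + (max_clean - clean))

county = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
assignment = {"p1": 1, "p2": 2, "p3": 2, "p4": 2}   # district 2 spans counties A and B
clean = clean_districts(assignment, county)          # only district 1 is clean
# excess_splits=1 and max_clean=2 are illustrative inputs, not derived here.
contribution = county_splits_contribution(excess_splits=1, max_clean=2, clean=clean)
```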

Population Deviation

Two-component imbalance penalty. Districts within the safe-harbor band contribute zero. Beyond it, the metric sums mean squared excess and max squared excess, so a single outlier cannot be masked by balanced averages.

Unit	scaled sum of squares
Direction	lower is better; zero indicates all districts inside the safe-harbor
Contribution	`(mean(excess²) + max(excess²)) × 50,000`, where `excess = max(0, |dev_frac| − safe_harbor)`
Tunables	Population popup → Safe Harbor slider (`pop_deviation_safe_harbor`, default 0.25%). Clamped at run start to not exceed Population Tolerance.
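
The contribution formula translates directly into code. This sketch assumes `dev_frac` is each district's deviation as a fraction of the ideal (mean) population.

```python
def population_deviation(populations, safe_harbor=0.0025):
    """(mean(excess²) + max(excess²)) × 50,000, with excess measured past the safe harbor."""
    ideal = sum(populations) / len(populations)
    excess_sq = []
    for p in populations:
        dev_frac = (p - ideal) / ideal
        excess = max(0.0, abs(dev_frac) - safe_harbor)
        excess_sq.append(excess ** 2)
    return (sum(excess_sq) / len(excess_sq) + max(excess_sq)) * 50_000

inside = population_deviation([1002, 998, 1000, 1000])    # 0.2% off: inside the 0.25% band
outside = population_deviation([1010, 990, 1000, 1000])   # 1% off: 0.75% past the band
```

In the second plan two districts each have an excess of 0.0075, so the mean term is half the max term; the max term is what keeps a lone outlier from being diluted as the number of districts grows.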

Polsby-Popper compactness

Compactness penalty. Per-district PP = 4π·area / perimeter² ranges from 0 (elongated) to 1 (circle). The score chart displays mean(PP) directly; the optimizer minimizes (1 − mean(PP)) × 100.

Unit	contribution [0, 100]; chart shows mean PP in [0, 1]
Direction	lower contribution is better
Contribution	`(1 − mean(PP)) × 100`
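
A quick numeric check of the formula, using shapes whose area and perimeter are known exactly:

```python
from math import pi

def polsby_popper(area, perimeter):
    return 4 * pi * area / perimeter ** 2

def pp_contribution(pp_values):
    """(1 − mean(PP)) × 100, the quantity the optimizer minimizes."""
    mean_pp = sum(pp_values) / len(pp_values)
    return (1 - mean_pp) * 100

circle = polsby_popper(pi, 2 * pi)      # unit circle: the maximum, 1
square = polsby_popper(1.0, 4.0)        # unit square: π/4, about 0.785
penalty = pp_contribution([1.0, 0.5])   # mean PP of 0.75 gives a penalty of 25
```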

Reock compactness

Second compactness penalty, complementary to Polsby-Popper. Per-district Reock = area / area(minimum bounding circle) ranges from 0 (elongated) to 1 (circular). Polsby-Popper measures boundary smoothness; Reock measures how round the district is overall. A district can be smooth-edged but stretched thin (high PP, low Reock), or jagged but compact in extent (low PP, high Reock). Optimizing both together usually catches what either misses alone.

Mosaic uses a 16-direction approximation rather than the textbook minimum bounding circle, which is too slow to evaluate on every iteration. The approximation caches each precinct's 16 directional extreme vertices once at load, then computes the bounding diameter from those candidate points. Agreement with textbook Reock is within ~0.8 score points on a 0–100 penalty scale. The optimizer treats the approximation as a canonical deterministic score, just like PP or any other component.

Unit	contribution [0, 100]; chart shows mean Reock in [0, 1]
Direction	lower contribution is better
Contribution	`(1 − mean(Reock)) × 100`
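
The caching idea can be sketched as follows: keep, for each precinct, the vertex that reaches farthest in each of 16 evenly spaced directions, then take the widest span among a district's cached candidates as the bounding diameter. This is one reading of the description above, not Mosaic's actual code. Using the widest span as the circle's diameter is exact for shapes like rectangles but can undershoot the true minimum bounding circle for others (an equilateral triangle, for example).

```python
from math import cos, sin, pi, dist

DIRECTIONS = [(cos(2 * pi * k / 16), sin(2 * pi * k / 16)) for k in range(16)]

def extreme_points(points):
    """For each of 16 directions, keep the point that reaches farthest that way."""
    return {max(points, key=lambda p: p[0] * dx + p[1] * dy) for dx, dy in DIRECTIONS}

def approx_reock(area, candidates):
    """Area divided by the area of a circle whose diameter is the widest candidate span."""
    pts = list(candidates)
    diameter = max(dist(a, b) for a in pts for b in pts)
    return area / (pi * (diameter / 2) ** 2)

# Two unit-square precincts side by side; vertices are (x, y) tuples.
square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_b = [(1, 0), (2, 0), (2, 1), (1, 1)]
cache = {"a": extreme_points(square_a), "b": extreme_points(square_b)}   # built once at load

single = approx_reock(1.0, cache["a"])             # one square: 2/π, about 0.637
pair = approx_reock(2.0, cache["a"] | cache["b"])  # 2×1 rectangle: 8/(5π), about 0.509
```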

Mean-Median

mean(district Dem share) − median(district Dem share). Non-zero values indicate the mean and median districts disagree on partisan balance. Sign: negative = D advantage, positive = R advantage.

Unit	signed share difference (display); penalty in squared percentage points
Direction	distance from target
Contribution	`((raw − target) × 100)²`
Tunables	Score panel → Mean-Median row → Target MM slider (`target_mean_median`, default 0.000; negative aims at a D-favoring plan, positive at R)
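
A worked example with five hypothetical districts, one of them heavily packed:

```python
from statistics import mean, median

def mean_median(dem_shares):
    return mean(dem_shares) - median(dem_shares)

def mm_contribution(raw, target=0.0):
    """Squared distance from target, in percentage points."""
    return ((raw - target) * 100) ** 2

shares = [0.40, 0.45, 0.48, 0.52, 0.70]
raw = mean_median(shares)        # mean 0.51, median 0.48: +0.03, an R advantage
penalty = mm_contribution(raw)   # 3 percentage points, squared
```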

Efficiency Gap

(wasted Dem votes − wasted Rep votes) / total votes. Wasted votes are losing-side votes plus winning-side votes above 50%. Robust mode integrates the gap over a Gaussian swing in closed form; Static mode evaluates at the input shares only. Sign: negative = D bias, positive = R bias.

Unit	signed share (display); penalty in squared percentage points
Direction	distance from target
Contribution	`((raw − target) × 100)²`
Tunables	Score panel → Efficiency Gap row → Target EG slider (`target_efficiency_gap`, default 0.000; negative aims D, positive R)
	Partisanship popup → Efficiency Gap mode radio (`use_robust_eg`, default Robust)
	Shared partisan calibration when Robust is selected
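
The Static-mode definition can be followed step by step in code. This sketch works from raw two-party vote counts per district. Treating an exact tie as a Rep win is an arbitrary choice made here for simplicity.

```python
def efficiency_gap(district_votes):
    """district_votes: list of (dem_votes, rep_votes) pairs, one per district."""
    wasted_dem = wasted_rep = total = 0.0
    for dem, rep in district_votes:
        votes = dem + rep
        half = votes / 2                 # winner's votes beyond 50% are wasted
        if dem > rep:
            wasted_dem += dem - half
            wasted_rep += rep            # every losing-side vote is wasted
        else:
            wasted_rep += rep - half
            wasted_dem += dem
        total += votes
    return (wasted_dem - wasted_rep) / total

# Dems are packed into one district and narrowly lose the other two.
plan = [(75, 25), (40, 60), (40, 60)]
raw = efficiency_gap(plan)            # (105 − 45) / 300 = +0.2, an R bias
penalty = ((raw - 0.0) * 100) ** 2    # contribution with the default target of 0
```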

Competitiveness

Mean non-competitiveness across districts. A district contributes 0 at a 50% win probability and 1 at certainty.

Unit	raw in [0, 1] (display); contribution in [0, 100]
Direction	lower is better
Contribution	`raw × 100`
Tunables	shared partisan calibration
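
The page fixes only the endpoints of the per-district term (0 at a 50% win probability, 1 at certainty). The sketch below uses |2p − 1|, the simplest function that meets both; that choice is an assumption, and Mosaic's actual curve may differ. The sigma passed in is a hypothetical stand-in for the shared partisan calibration.

```python
from statistics import NormalDist

def non_competitiveness(win_prob):
    """0 at a coin flip, 1 at certainty (assumed linear in between)."""
    return abs(2 * win_prob - 1)

def competitiveness_contribution(dem_shares, sigma):
    probs = [NormalDist().cdf((s - 0.5) / sigma) for s in dem_shares]
    raw = sum(non_competitiveness(p) for p in probs) / len(probs)
    return raw * 100

toss_ups = competitiveness_contribution([0.5, 0.5], sigma=0.05)   # every seat a coin flip
safe_seats = competitiveness_contribution([0.9, 0.1], sigma=0.05) # every seat locked in
```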

Expected Seats

Sum of per-district P(D wins) under the swing model. Penalized by squared distance from target, rescaled by 100 to align with the other partisan metrics.

Unit	real-valued seat count (display)
Direction	distance from target
Contribution	`(raw − target)² × 100`
Tunables	Score panel → Expected Dem Seats row → Target S slider (`target_dem_seats`, default 7)
	Shared partisan calibration
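
Because an expectation is a plain sum, expected seats need only each district's marginal win probability. The sketch assumes that probability is Φ((share − 0.5) / σ_combined), consistent with the shared calibration; the helper names are illustrative.

```python
from statistics import NormalDist

def expected_dem_seats(dem_shares, sigma_combined):
    """Sum of per-district P(D wins)."""
    return sum(NormalDist().cdf((s - 0.5) / sigma_combined) for s in dem_shares)

def seats_contribution(raw, target):
    return (raw - target) ** 2 * 100

raw = expected_dem_seats([0.40, 0.50, 0.60], sigma_combined=0.05)   # symmetric plan: 1.5 seats
penalty = seats_contribution(raw, target=2)                          # 0.5 seats short: 25
```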

Chance of Majority

Probability that the selected party wins at least ⌈n/2⌉ districts, integrated over the partisan swing via Gauss-Hermite quadrature. D-majority and R-majority weights are independent.

Unit	probability [0, 1] (display)
Direction	weighted party's probability pushed toward 1
Contribution	`(1 − p)^1.5 × 100`
Tunables	Score panel → Chance of Majority row → D / R checkboxes (selects which party's probability is scored)
	Shared partisan calibration
	Threshold is fixed at `⌈n/2⌉`; use Hinge for custom thresholds.
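
One way to carry out the integration is sketched below. For each Gauss-Hermite node, shift every district by that swing, treat the districts as independent given the swing, and compute the seat-count distribution by dynamic programming. The conditional-independence structure and the function names are assumptions. `numpy.polynomial.hermite.hermgauss` supplies nodes and weights for the weight function exp(−x²), hence the √2·σ change of variable and the division by √π.

```python
from math import ceil, pi, sqrt
from statistics import NormalDist
import numpy as np

def prob_at_least(win_probs, threshold):
    """P(at least `threshold` wins) for independent districts."""
    counts = [1.0]                          # counts[k] = P(exactly k wins so far)
    for p in win_probs:
        nxt = [0.0] * (len(counts) + 1)
        for k, mass in enumerate(counts):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        counts = nxt
    return sum(counts[threshold:])

def chance_of_threshold(dem_shares, threshold, swing_sigma, district_sigma, nodes=32):
    """Integrate P(D seats >= threshold) over a N(0, swing_sigma²) shared swing."""
    xs, ws = np.polynomial.hermite.hermgauss(nodes)
    std_normal = NormalDist()
    total = 0.0
    for x, w in zip(xs, ws):
        swing = sqrt(2) * swing_sigma * float(x)
        probs = [std_normal.cdf((s + swing - 0.5) / district_sigma) for s in dem_shares]
        total += float(w) * prob_at_least(probs, threshold)
    return total / sqrt(pi)

shares = [0.45, 0.50, 0.55]
majority = ceil(len(shares) / 2)            # threshold used by Chance of Majority
p_majority = chance_of_threshold(shares, majority, swing_sigma=0.03, district_sigma=0.039)
```

For this symmetric three-district plan the answer is 0.5 by symmetry, which makes a handy sanity check on the quadrature.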

Hinge

Probability that the selected party wins at least hinge_threshold seats, under the swing model. Penalty curve matches Chance of Majority — in fact Chance of Majority is exactly Hinge with hinge_threshold = ⌈n/2⌉; Hinge is the generalisation that lets you pick a different seat threshold (supermajority, blocking minority, etc.).

Unit	probability [0, 1] (display)
Direction	probability pushed toward 1
Contribution	`(1 − p)^1.5 × 100`
Tunables	Score panel → Supermajority/Hinge row → Threshold slider (`hinge_threshold`, default 1; clipped to `[1, n_districts]`)
	Score panel → Supermajority/Hinge row → D / R checkboxes (`hinge_dem`)
	Shared partisan calibration
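
The penalty curve and the threshold clipping are simple enough to check by hand. In this sketch the probability `p` is supplied directly; the function names are illustrative.

```python
from math import ceil

def clip_threshold(threshold, n_districts):
    """Keep the seat threshold inside [1, n_districts]."""
    return max(1, min(threshold, n_districts))

def hinge_contribution(p):
    """(1 − p)^1.5 × 100: zero when the target is certain, 100 when it is impossible."""
    return (1 - p) ** 1.5 * 100

n_districts = 14
majority_threshold = ceil(n_districts / 2)   # the fixed Chance of Majority threshold
supermajority = clip_threshold(10, n_districts)
penalty = hinge_contribution(0.75)           # 0.25^1.5 × 100 = 12.5
```

The 1.5 exponent makes the penalty fall off faster than linearly as p approaches 1, so the last few points of probability cost relatively little.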

Map overlays

Checkboxes under the live map switch what the map shows. Only one of the colouring modes can be active at a time; the others (county lines, labels) stack on top.

County: Draws grey lines along county boundaries on top of the current colouring. Useful for eyeballing how often district boundaries follow county lines.
Splits: Dims counties that lie entirely inside a single district, leaving split counties at full saturation. Quick visual diagnostic for the County Splits score: more colour = more splits.
Precinct Results: Colours each precinct by its Dem vs Rep share (Classic Mosaic red ↔ blue palette). Independent of district boundaries; shows the partisan terrain Mosaic is working over. Requires election columns at load.
District Results: Colours each district by its aggregate Dem share. The map view of the same data the Mean-Median, Efficiency Gap, and Competitiveness metrics summarise.
Compactness: Colours each district by its Polsby-Popper score (round = greener, elongated = redder). Spot the ugly district at a glance.
Population Deviation: Colours each district by signed deviation from ideal population. Underpopulated districts shade toward one end of the colour scale, overpopulated districts toward the other.
Labels: Draws the district number at each district's interior. Number matches the labels in the District Info panel and the exported CSVs.
Precincts: Draws faint white precinct boundaries on top of the district colouring. Hairline at on-screen resolution; cleans up at high-DPI exports. Useful for previewing how district lines fall relative to precinct lines.

Reading the charts

The left side of the GUI shows live charts you can toggle from the Panels menu. The most useful ones during a run:

Score History: PlanScore.total over iterations (lower is better). Trends down as annealing cools and the search becomes pickier.
Per-metric panels: One chart per enabled metric, plotted in its native units. Metrics with targets (Mean-Median, Efficiency Gap, Expected Seats) draw a horizontal target line for reference.
Temperature: The annealing temperature over time. In GUIDED mode the curve is non-linear by design; in STATIC mode it's a clean geometric decay.
Entropy / Acceptance: Fraction of recent proposals that were accepted. Falls as the temperature drops; useful diagnostic for whether the schedule is too aggressive (drops to near-zero too early) or too gentle (stays high throughout).
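
An acceptance rate over "recent proposals" is naturally a sliding-window average. A minimal sketch; the window size and class name are hypothetical, since this page does not say how many proposals Mosaic looks back over.

```python
from collections import deque

class AcceptanceWindow:
    """Fraction of the last `size` proposals that were accepted."""

    def __init__(self, size=500):
        self.recent = deque(maxlen=size)   # old entries fall off automatically

    def record(self, accepted: bool):
        self.recent.append(accepted)

    def rate(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

window = AcceptanceWindow(size=4)
for accepted in (True, True, False, False):
    window.record(accepted)
early_rate = window.rate()                 # 2 of the last 4 accepted
for accepted in (False, False):
    window.record(accepted)
late_rate = window.rate()                  # the two early acceptances have aged out
```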

Most charts will downsample the displayed buffer once it exceeds about 10,000 points so the GUI stays responsive across long runs. The underlying data isn't lost — it just isn't all drawn.
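
Display-only downsampling can be as simple as drawing every k-th point while the full buffer stays in memory. Stride decimation is an assumption here; this page does not say which method Mosaic uses.

```python
def downsample_for_display(values, max_points=10_000):
    """Return at most max_points evenly strided samples; the input is left untouched."""
    if len(values) <= max_points:
        return values
    stride = -(-len(values) // max_points)   # ceiling division
    return values[::stride]

history = list(range(25_000))
drawn = downsample_for_display(history)      # stride 3 keeps every third point
```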

Exporting results

Mosaic writes two CSVs from the toolbar buttons in the main panel. Both reflect the best plan seen during the run (lowest PlanScore.total), not the most recent one.

Save Assignments → output/assignments_YYYYMMDD_HHMMSS.csv: One row per precinct: the precinct's ID (from whichever column you picked at load) and the district number. District numbers match the labels shown on the live map — same colours, same labels, no surprises.
Save Metrics → output/metrics_YYYYMMDD_HHMMSS.csv: One row per district: population, deviation from ideal, optional vote totals and Dem/Rep percentages, optional Polsby-Popper compactness. Same district numbering as the assignments CSV.

Both files load directly into Dave's Redistricting App via its Color Map from File feature, or join back to your shapefile in QGIS / ArcGIS using the precinct ID column.
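
For a scripted join, the assignments file reads cleanly with Python's `csv` module. The column headers `precinct_id` and `district` below are placeholders. The real ID header is whichever column you picked at load, so check the first line of your export.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported assignments CSV (hypothetical headers and rows).
sample = io.StringIO(
    "precinct_id,district\n"
    "P001,1\n"
    "P002,1\n"
    "P003,2\n"
)

def load_assignments(handle, id_column="precinct_id", district_column="district"):
    """Map each precinct ID to its district number."""
    reader = csv.DictReader(handle)
    return {row[id_column]: int(row[district_column]) for row in reader}

assignment = load_assignments(sample)
precincts_per_district = Counter(assignment.values())
```

With a real export, replace `sample` with `open(path, newline="")` inside a `with` block.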

Map image (PNG)

Two icons sit at the right end of the map-overlay row, just under the District Map panel.

Camera icon → output/map_YYYYMMDD_HHMMSS.png: Quick PNG of the current map. Re-rasterized at native resolution and cropped to the state's bounding box (so vertical states like Mississippi save tall, not letterboxed). Preserves whichever overlays are currently checked.
"..." (Photo Menu) icon: Opens an export dialog for high-resolution output: optional title bar, DPI selection (96 / 144 / 192 / 288 / 384 / 576 — 288 is the default), and a "Made with Mosaic" caption strip across the bottom. The map is re-rasterized at the chosen resolution (not upscaled), border and label thicknesses scale at about half rate so high-DPI exports don't over-emphasize the lines, and a black state-edge outline is added so the boundary reads cleanly when the image is shared.

FAQ & troubleshooting

Mosaic crashed during a run. Where's the log?: Top-level crashes are written to crashes/YYYYMMDD-HHMMSS.log in the directory you launched Mosaic from. The file includes the traceback, Mosaic version, platform, and any context the runner had (current iteration, shapefile path). Attach it when filing an issue.
Mosaic won't load my shapefile.: See the shapefile guide — in particular the "If Mosaic refuses to load" section, which maps each error message to the fix in the source data.
The launcher fails on first run.: See the install page troubleshooting section — SmartScreen on Windows, Gatekeeper on Mac, and a few common uv install failure modes.
Why is the first iteration slow?: Numba JIT-compiles the hot ReCom path on the first ReCom step (several seconds). Subsequent steps run at hundreds per second. The JIT cache persists across runs, so this only hurts once.
Can I run multiple chains in parallel?: Not yet from inside Mosaic. As a workaround, run multiple Mosaic instances in separate folders with different seeds.
Mosaic feels slow / the GUI is laggy.: Two knobs in the View menu help. Limit plots trims the score-history buffer to keep chart rendering fast on long runs (the underlying score data isn't lost, just downsampled for display). Map render interval sets how often the map redraws during a run; raise it from the default to give the algorithm more CPU. Neither knob changes what the algorithm computes; both affect only rendering.

About & methodology

Mosaic is a Python port of Mosaic for R, extended with a live Dear PyGui interface and a simulated-annealing layer on top of the underlying ReCom step.

The ReCom (recombination) algorithm itself is derived from the one developed by the MGGG Redistricting Lab. At each step, Mosaic picks an edge whose endpoints lie in two different districts, merges those districts, draws a random spanning tree on the merged region, and cuts the tree to produce two new districts of balanced population. The simulated-annealing wrapper accepts or rejects each proposed step based on the change in the weighted plan score and the current temperature.
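
The tree-and-cut portion of a ReCom step can be sketched in plain Python. This is a simplified illustration, not Mosaic's implementation. The tree comes from Kruskal's algorithm over shuffled edges (random, but not uniform over spanning trees), and the first balanced edge found is cut rather than a random one. All names are hypothetical.

```python
import random

def random_spanning_tree(nodes, edges, rng):
    """Kruskal's algorithm over randomly shuffled edges; returns tree adjacency lists."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    tree = {n: [] for n in nodes}
    shuffled = list(edges)
    rng.shuffle(shuffled)
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree[u].append(v)
            tree[v].append(u)
    return tree

def balanced_cut(tree, pop, root, tolerance):
    """Return the nodes below one tree edge whose removal balances population, or None."""
    ideal = sum(pop[n] for n in tree) / 2
    parent, order, stack = {root: None}, [], [root]
    while stack:                            # depth-first preorder from the root
        n = stack.pop()
        order.append(n)
        for m in tree[n]:
            if m not in parent:
                parent[m] = n
                stack.append(m)
    below = {n: pop[n] for n in tree}       # population of each node's subtree
    for n in reversed(order):               # children are summed before their parents
        if parent[n] is not None:
            below[parent[n]] += below[n]
    for n in order[1:]:                     # each non-root node names the edge to its parent
        if abs(below[n] - ideal) <= tolerance * ideal:
            side, stack = set(), [n]
            while stack:
                x = stack.pop()
                side.add(x)
                stack.extend(c for c in tree[x] if parent[c] == x)
            return side
    return None

def recom_split(nodes, edges, pop, tolerance, rng, attempts=100):
    """Redraw trees until one admits a balanced cut; return the two new districts."""
    for _ in range(attempts):
        tree = random_spanning_tree(nodes, edges, rng)
        side = balanced_cut(tree, pop, rng.choice(nodes), tolerance)
        if side is not None:
            return side, set(nodes) - side
    return None

# Merged region: a 4×4 grid of unit-population precincts.
nodes = [(r, c) for r in range(4) for c in range(4)]
edges = [((r, c), (r, c + 1)) for r in range(4) for c in range(3)]
edges += [((r, c), (r + 1, c)) for r in range(3) for c in range(4)]
pop = {n: 1 for n in nodes}
result = recom_split(nodes, edges, pop, tolerance=0.25, rng=random.Random(3))
```

Each side of the cut is a subtree of the spanning tree, so both new districts are contiguous by construction.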

Relationship to canonical ReCom, MergeSplit, and GerryChain. Mosaic borrows the recombination proposal from ReCom but accepts each proposal under simulated annealing keyed to a weighted plan score, rather than under a Metropolis-Hastings rule designed to produce a known stationary distribution over valid partitions. That distinction matters: canonical ensemble samplers — ReCom in GerryChain and the corrected MergeSplit variant in redist — are designed so that the distribution of accepted plans supports statistical inference (for example, "this enacted plan is an outlier compared to the distribution of valid plans"). Mosaic, with the simulated-annealing layer biasing the chain toward score improvements as it cools, instead produces an optimized plan under the user's weighted objectives. Mosaic runs should not be cited for ensemble-style outlier claims; use redist or GerryChain for that.
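
For contrast with an ensemble sampler, here is the textbook simulated-annealing acceptance rule: improvements are always taken, and a worse plan is taken with probability exp(−Δ/T). This page does not state Mosaic's exact rule, so treat the function as a generic sketch with hypothetical names.

```python
import math
import random

def accept(current_score, proposed_score, temperature, rng):
    """Metropolis-style acceptance for a score that is being minimized."""
    delta = proposed_score - current_score
    if delta <= 0:
        return True                                   # never reject an improvement
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(0)
hot = sum(accept(10.0, 20.0, 1000.0, rng) for _ in range(1000))   # worse by 10 at T = 1000
cold = sum(accept(10.0, 20.0, 0.1, rng) for _ in range(1000))     # worse by 10 at T = 0.1
```

At high temperature nearly every worse proposal is accepted and the chain wanders; as the temperature falls, the same proposal is almost never accepted, which is what pulls the run toward an optimized plan rather than a representative sample.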

Some features from the R version are implemented differently here, and some are still in progress.

License & acknowledgments

Mosaic is released under the MIT License.

The ReCom algorithm is derived from work by the MGGG Redistricting Lab. The GUI is built on Dear PyGui; graph operations use igraph and NetworkX; hot loops are accelerated with Numba.
