May 13, 2022

New Maps, New Dem Strategy: Our PA Analysis

The 2022 Congressional elections will involve new district maps in all 50 states. Districts will be reshaped, created, and eliminated, which will change Dems’ odds in each race. How can donors figure out which races have the biggest “bang for buck” for flipping tenuous GOP seats or protecting vulnerable Dems?

To help answer that question, we’ve been building and refining a demographic model that predicts Democratic lean in each district based on its makeup in terms of race, sex, education, and population density. We then compare those results to an existing model based on historical data to help Dem donors identify races that we think deserve support for “offense” or “defense”.

In this post, we’re focusing on Pennsylvania. Here’s what we’ll cover:

  1. Dem-lean by district in PA: our demographic model vs. historical data
  2. Districts worthy of Dem donor support
  3. Coming next from Blue Ripple
  4. Coda #1: Demographics of the new PA districts
  5. Coda #2: Brief intro to our methods (for non-experts)

1. Dem-lean by district in PA: our demographic model vs. historical data

Our “demographic” model forecasts the potential Democratic lean of each new district in PA based on attributes like race, education, age, and population density. In the graph and table below, we compare our predictions to a “historical” model (from the excellent Dave’s Redistricting (DR) website) built up from precinct-level results in prior elections [1]. (See methods at the end of this post for more details.) The axes show the projected 2-party Dem vote share with each model. The diagonal line represents where districts would fall on this scatter plot if the two models agreed precisely. In districts to the left of the line, our demographic model thinks the D vote share is higher than the historical model predicts, and to the right of the line, we think it’s lower than the historical model predicts [2].
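As a small illustration of the 2-party vote share plotted on both axes, here is a minimal sketch of the formula given in the footnotes, D Votes/(D Votes + R Votes). The function name and the vote counts are hypothetical, chosen only for illustration; they are not taken from Dave’s Redistricting or from our model.

```python
def two_party_dem_share(d_votes: float, r_votes: float, other_votes: float = 0.0) -> float:
    """Return D / (D + R). Votes for other candidates are deliberately ignored."""
    two_party_total = d_votes + r_votes
    if two_party_total <= 0:
        raise ValueError("need at least one D or R vote")
    return d_votes / two_party_total


# Hypothetical district: 480,000 D votes, 460,000 R votes, 60,000 other votes.
share = two_party_dem_share(480_000, 460_000, 60_000)
print(f"2-party Dem share: {share:.1%}")
```

Dropping the third-party votes is what puts both models on the same scale.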

NB: For this and all scatter charts to follow, you can pan & zoom by dragging with the mouse or moving the scroll wheel. To reset the chart, hold shift and click with the mouse.

We generally focus our attention on districts that fall in the 45-55% range of Dem share in our demographic model and 47-53% in the historical model. That’s because we think a 3-point gap is one that either party could potentially close with some focused energy, resources, and strategic thinking. Our demographic model carries some additional uncertainties, so we expand the range a bit there. Our methodology for using our model and the historical data to classify districts is explained in this post.

With that in mind, here are a few observations on the new PA districts:

  • First, let’s dispense with the obvious ones. Several PA districts are clearly far outside the competitive range and don’t merit serious investment by Dems looking to maximize their impact. Both the demographic model and the historical model agree that PA-2, PA-3, PA-5, and PA-12 are safe D. Both models also agree that PA-9, PA-11, PA-13, PA-14, PA-15, and PA-16 are far out of reach for Dems.

  • We think the rest are competitive, though three of those (PA-4, PA-6, PA-17) are historically safe D.

  • We’re particularly concerned about PA-8, which is historically competitive (D+1) but which our model sees as R+9.
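For readers unfamiliar with the “D+1” and “R+9” shorthand, here is a minimal sketch of how those labels relate to the vote shares in this post, assuming the label is simply the 2-party Dem share minus 50 points (which matches the PA-8 numbers: 51% is D+1 and 41% is R+9). The function name is hypothetical.

```python
def lean_label(dem_share_pct: float) -> str:
    """Convert a 2-party Dem share (in percent) to a 'D+x' / 'R+x' label."""
    edge = round(dem_share_pct - 50)  # positive favors D, negative favors R
    if edge > 0:
        return f"D+{edge}"
    if edge < 0:
        return f"R+{-edge}"
    return "Even"


# PA-8: historical model 51%, demographic model 41%.
print(lean_label(51), lean_label(41))
```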

Here’s a different look at the data, in a table sorted by the Dem share in our demographic model.

Calculated Dem Vote Share (%), PA 2022: Demographic Model vs. Historical Model (DR)

State | District | Demographic Model (Blue Ripple) | Historical Model (Dave's Redistricting) | BR Stance
PA | 3 | 81 | 92 | Safe D (No near-term D risk)
PA | 2 | 72 | 75 | Safe D (No near-term D risk)
PA | 5 | 68 | 66 | Safe D (No near-term D risk)
PA | 12 | 60 | 63 | Safe D (No near-term D risk)
PA | 6 | 52 | 56 | Becoming At-Risk (More Balanced than Advertised)
PA | 4 | 52 | 60 | Becoming At-Risk (More Balanced than Advertised)
PA | 1 | 51 | 53 | Toss-up (Down to the Wire)
PA | 17 | 50 | 55 | Becoming At-Risk (More Balanced than Advertised)
PA | 7 | 50 | 51 | Toss-up (Down to the Wire)
PA | 10 | 47 | 48 | Toss-up (Down to the Wire)
PA | 8 | 41 | 51 | Toss-up (Highly vulnerable for D)
PA | 11 | 40 | 40 | Safe R (No near-term D hope)
PA | 16 | 35 | 42 | Safe R (No near-term D hope)
PA | 14 | 31 | 39 | Safe R (No near-term D hope)
PA | 9 | 30 | 34 | Safe R (No near-term D hope)
PA | 13 | 30 | 30 | Safe R (No near-term D hope)
PA | 15 | 27 | 34 | Safe R (No near-term D hope)
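
To make the “BR Stance” column more concrete, here is a rough sketch of a classifier built only from the ranges mentioned above (45-55% for the demographic model, 47-53% for the historical model). It is an assumption-laden reconstruction that happens to reproduce the stances in the table above, not our actual classification method, which is described in the separate methodology post. The function names and the fallback label are hypothetical.

```python
DEMOGRAPHIC_RANGE = (45, 55)  # competitive band for the demographic model (%)
HISTORICAL_RANGE = (47, 53)   # competitive band for the historical model (%)


def band(share: float, low: float, high: float) -> str:
    """Place a Dem share below, inside (inclusive), or above a competitive band."""
    if share < low:
        return "R"
    if share > high:
        return "D"
    return "competitive"


# Only the combinations that appear in the PA table are mapped here.
STANCES = {
    ("D", "D"): "Safe D",
    ("competitive", "D"): "Becoming At-Risk",
    ("competitive", "competitive"): "Toss-up (Down to the Wire)",
    ("R", "competitive"): "Toss-up (Highly vulnerable for D)",
    ("R", "R"): "Safe R",
}


def classify(demographic: float, historical: float) -> str:
    key = (band(demographic, *DEMOGRAPHIC_RANGE), band(historical, *HISTORICAL_RANGE))
    # Combinations not seen in PA are left unlabeled rather than guessed.
    return STANCES.get(key, "not covered by this sketch")


# (district, demographic %, historical %) rows taken from the table above.
for district, demo, hist in [(6, 52, 56), (1, 51, 53), (8, 41, 51), (11, 40, 40)]:
    print(f"PA-{district}: {classify(demo, hist)}")
```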

2. Districts worthy of Dem donor support

Based on the results above, we think there are six good options for Dem donors in PA: PA-1, PA-6, PA-7, PA-8, PA-10, and PA-17. We think PA-4 is safe, although we’re going to keep an eye on any polling there.

Our findings in PA-8 provide a good opportunity to discuss why BR’s demographic model and the historical model may differ. PA-8 looks like a D-leaning tossup given the voting patterns in the precincts within it, but our demographic model suggests it’s safe R (R+9). What might this mean? One way to answer this question is to consider the difference between the two models. Our demographic model asks: if the voting-age citizens of this district turned out and voted like similar people in other parts of the country, what would we expect the outcome of this election to be? The historical model instead asks how we'd expect the election to turn out if the voters in this district turn out and vote as they have in previous elections. This points to a few possible reasons why a historically tossup district like PA-8 might look like a safe-R district in our model, including, but not limited to, the following:

  • Our model may be wrong about how we define "similar" voters. We've incorporated factors like education and race, but maybe we've missed key things that make voters in PA-8 different from superficially "similar" voters in other districts nationwide.

  • Location-specific factors may support Dem voting. E.g., perhaps the Democratic party or local organizations are particularly well-organized in PA-8, and that shows up in the historical model because it is built from those voters' past results.

  • Democrats may, in fact, have outperformed relative to their potential. Or there may have been demographic shifts in the district since the last election which favor Republicans.

We don't know which (if any) of these explanations is correct. But our model suggests that PA-8 is especially vulnerable.

3. Coming next from Blue Ripple

Here’s where we’re planning to take these analyses over the next few months:

  • We’re going to do the same type of analysis in many (all?) of the states, in order to identify the best options for Dem donors in 2022 on both offense and defense nationwide. Here are our takes on Arizona, Michigan, North Carolina, and Texas.

  • We’re going to continue to refine and improve our demographic model, and we’ll update this post and others as we do so. Feel free to contact us if you want more details on the mechanics, or if you’d like to propose changes or improvements.

  • As maps get solidified, we’ll set up ActBlue donation links for candidates (after the primaries) to make it easy for you to donate.

If you want to stay up-to-date, please sign up for our email updates! We’re also on Twitter, Facebook, and Github.

4. Coda #1: Demographics of the new PA districts

One thing we haven’t seen discussed very much is how redistricting in PA has changed the demographics in each district. As a way of putting the demographic model results in context, let’s look at the underlying population two different ways:

  • The first chart below shows each of PA’s proposed 2022 districts, with the population broken down by race/ethnicity (Black, Hispanic, Asian, White-non-Hispanic and other) and education (college graduate and non-college graduate). Each bar also has a dot representing the (logarithmic) population density [3] of the district. The scale for that dot is on the right-side axis of the chart. For reference, a log density of 5 represents about 150 people per square mile and a log density of 8 represents about 3,000 people per square mile. We’ve ordered the districts by D-share based on our demographic model, which is helpful for understanding how the model responds to demographics and density.

  • In the second chart, we look at these demographics a different way, placing each PA district according to its proportion of college graduates and non-white citizens of voting age. We also indicate (logarithmic) population density via the size of the circle and modeled D-edge (D-share minus 50%) via color. This makes it easier to see that the model predicts larger D vote-share as the district becomes more educated, more non-white and more dense.

It’s hard to see anything specific from these charts, though we are continuing to examine them as we try to understand what might be happening in each specific district.
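
To make the density scale concrete, here is a short sketch. The first part converts natural-log densities back to people per square mile, which reproduces the reference values quoted above. The second part contrasts an unweighted density with a population-weighted one for a made-up district like the one described in the density footnote. The exact weighting formula, the function names, and all the numbers are assumptions for illustration, not a description of our pipeline.

```python
import math

# Natural-log density back to people per square mile.
for log_density in (5, 8):
    print(log_density, "->", round(math.exp(log_density)), "people per square mile")


def unweighted_density(areas):
    """Total population divided by total land area."""
    return sum(pop for pop, _ in areas) / sum(sq_mi for _, sq_mi in areas)


def population_weighted_density(areas):
    """Average of each area's density, weighted by that area's population."""
    total_pop = sum(pop for pop, _ in areas)
    return sum(pop * (pop / sq_mi) for pop, sq_mi in areas) / total_pop


# Hypothetical district as (population, square miles): a dense city holding
# 90% of the people, plus large low-density exurbs holding the other 10%.
district = [(630_000, 63), (70_000, 700)]
print("unweighted:", round(unweighted_density(district)))
print("population-weighted:", round(population_weighted_density(district)))
```

In this made-up case the weighted figure stays close to the city's density, because that is where most residents live, while the unweighted figure is pulled far down by the exurbs' land area.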

5. Coda #2: Brief intro to our methods (for non-experts)

This part of the post contains a general summary, intended for non-experts, of the math behind what we’re doing here. If you want even more technical details, check out the links at the end of this section, visit our Github page, or contact us directly.

Our model is demographic. We use turnout data from the 2020 CPS voter supplement (a self-reported survey); voting and turnout data from the 2020 CES (a validated survey); and election result data from the 2020 presidential, senate and house elections. The survey data from the CPS and CES is broken down by several demographic categories, including sex, education and race/ethnicity.

The election results are trickier to use in the model since we don’t have demographic information paired with turnout or vote choice. What we do know is the overall demographics of the state or house district. So we use the election data to assign a likelihood to the post-stratification of our parameters across the demographics of the relevant region (from the ACS micro-data).

Then we look at the demographics of a particular house or state-legislative district (using tract-level census data from the ACS), break it down into the same categories, and apply our model of turnout and voter preference to estimate the 2-party vote share we expect for a Democratic candidate.
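
The post-stratification step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not our actual model: the demographic cells, the population counts, and the turnout and preference rates below are all invented, and in the real model those rates are estimated from the survey and election data rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    label: str       # demographic category (hypothetical grouping)
    citizens: int    # voting-age citizens of this type in the district
    turnout: float   # modeled probability that such a citizen votes
    dem_pref: float  # modeled 2-party probability of voting Democratic


def dem_two_party_share(cells):
    """Weight each cell's Dem preference by its expected number of voters."""
    expected_voters = sum(c.citizens * c.turnout for c in cells)
    expected_dem_votes = sum(c.citizens * c.turnout * c.dem_pref for c in cells)
    return expected_dem_votes / expected_voters


# Invented numbers for a single hypothetical district.
cells = [
    Cell("college, white", 150_000, 0.75, 0.52),
    Cell("non-college, white", 300_000, 0.60, 0.38),
    Cell("college, non-white", 50_000, 0.70, 0.70),
    Cell("non-college, non-white", 100_000, 0.50, 0.75),
]
print(f"Modeled 2-party Dem share: {dem_two_party_share(cells):.1%}")
```

Because turnout enters the weights, two districts with identical populations can end up with different modeled shares if their residents are expected to vote at different rates.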

This is in contrast to what we call the historical model, a standard way to predict “partisan lean” for any district, old or new: break it into precincts with known voting history (usually a combination of recent presidential, senate and governor’s races) and then aggregate those results to estimate expected results in the district.
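
As a rough sketch of that precinct-aggregation idea: the records, field layout, and function name below are hypothetical, and pooling raw votes across contests is just one simple way to combine them. Dave’s Redistricting may weight or combine elections differently.

```python
from collections import defaultdict

# Hypothetical records: (precinct, new district, contest, D votes, R votes).
precinct_results = [
    ("P1", "A", "president", 600, 400),
    ("P1", "A", "senate", 550, 450),
    ("P2", "A", "president", 300, 700),
    ("P2", "A", "senate", 350, 650),
    ("P3", "B", "president", 800, 200),
    ("P3", "B", "senate", 750, 250),
]


def historical_dem_share(results):
    """Pool D and R votes over every precinct and contest in each new district."""
    totals = defaultdict(lambda: [0, 0])  # district -> [D votes, R votes]
    for _precinct, district, _contest, d_votes, r_votes in results:
        totals[district][0] += d_votes
        totals[district][1] += r_votes
    return {district: d / (d + r) for district, (d, r) in totals.items()}


for district, share in historical_dem_share(precinct_results).items():
    print(f"District {district}: {share:.1%}")
```

Redrawing the lines only changes which precincts are grouped together; the votes inside each precinct stay the same, which is why this approach works for brand-new districts.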

The historical model is likely to be a pretty accurate “predictor” if you think the same people will vote the same way in subsequent elections, regardless of where the district lines lie. So why did we build a demographic model? Three reasons:

  1. We’re interested in places where the history may be misleading, either because of the specific story in a district or because changing politics or demographics may have altered the balance of likely voters. [4]

  2. Our demographic analysis is potentially more useful when the districts are new, since voting history may be less “sticky” there. For example, if I’m a Dem-leaning voter in a strong-D district, I might not have bothered voting much in the past because I figured my vote didn’t matter. But if I now live in a district that’s more competitive in the new map, I might be much more likely to turn out.

  3. We’re not as interested in predicting what will happen in each district as in what plausibly could happen if Dems apply resources in the right way, or fail to when the Republicans do. The historical model is backward-looking, whereas our demographic model is forward-looking, making them complementary when it comes to strategic thinking.

Two final points. First, when it comes to potential Dem share in each district, we’re continuing to improve and refine our demographic model. The Blue Ripple website contains more details on how it works and some prior results of applying a similar model to state legislative districts, something we will also do more of in the near future. Second, for the historical model comparator, we use data from the excellent “Dave’s Redistricting”, which is also the source of our maps for the new districts.

Want to read more from Blue Ripple? Visit our website, sign up for email updates, and follow us on Twitter and Facebook. Folks interested in our data and modeling efforts should also check out our Github page.


  1. One important note about the numbers. Dave’s Redistricting gives estimates of Democratic candidate votes, Republican candidate votes and votes for other candidates. We’ve taken those numbers and computed 2-party vote share for the Democratic candidate, that is, D Votes/(D Votes + R Votes). That makes it comparable with the demographic model, which also produces 2-party vote share.↩︎

  2. We’ve also done this modeling for the old districts and compared that result to the actual 2020 election results. See here.↩︎

  3. We use logarithms here because density varies tremendously over districts, from tens to hundreds of thousands of people per square mile. We use population-weighting because the resulting average more closely expresses the density of where people actually live. For example, consider a district made up of a high-density city where 90% of the population live and then large but low-density exurbs where the other 10% live. Most people in that district live at high density and we want our density to reflect that even though the unweighted average density (people/district size) might be smaller.↩︎

  4. We’re also interested in voter empowerment strategies, in particular questions about where, and among whom, extra turnout might make a difference. The historical model is no help here since it does not attempt to figure out who is voting or who they are voting for in a demographically specific way.↩︎