As we've written about before, voter turnout varies widely among demographic groups. Younger voters and voters of color tend to vote for Democrats, but smaller fractions of those voters cast a vote in most elections. Similarly, white voters without a college degree, who are more likely to vote for Republicans, are less likely to vote than white voters with a college degree.
Unsurprisingly, group-level turnout varies by state: each state has a different mix of eligible voters, with different distributions of age, education, etc. We are interested in the impact of state-level differences (policy, organizing, local candidates or ballot initiatives) on turnout of voters of color (VOC) and white-non-Hispanic voters (WHNV), so we look at the turnout “gap”, the difference between VOC turnout and WHNV turnout. Positive gaps indicate higher turnout among VOC; negative gaps indicate higher turnout among WHNV1.
Below we chart the 2020 turnout gap in each state along with the average gap (about -9%, marked by a vertical orange line) to illustrate the range seen among the states (and DC). Since these turnout estimates are produced from a model2, we also include 90% confidence intervals for our estimates.
Across the states, gaps vary by 32 points, from a high of +4% in AL to a low of -28% in WA. Can this gap be explained by demographic differences alone? The VOC in AL are mostly Black, while the VOC in WA are mostly Hispanic, and Black voters turned out at higher rates than Hispanic voters nationwide. Perhaps this explains the turnout gap between AL and WA? Using our model, we can estimate the gaps that would result if there were no state-specific effects3. We chart this (on the same scale) below.
Demographic differences account for some, but not all, of the variation in turnout gaps among states. That is, turnout gaps vary significantly by state even once we account for the demographic differences among them. For example, only 8 points of the 32-point difference between the AL and WA turnout gaps are demographic in origin. The other 24 points are state-specific, i.e., they have something to do with AL and WA in particular.
Why might there be state-specific effects on turnout?
There’s been a great deal of focus on voter organizing and suppression and how that affected turnout in 2020, particularly in GA, where Dems prevailed, and FL and TX, where we did not. This prompts our interest in a data-informed view of state-specific turnout.
Modeling this is somewhat complicated! First we need a baseline view of state-level turnout across demographic groups. Our model uses data from the Voting and Registration Supplement to the Current Population Survey (CPS-VRS) and the American Community Survey (ACS), both produced by the U.S. Census Bureau; we'll explain a bit about each below. We're also going to talk briefly about MRP modeling (click here for a good 3-minute explanatory video about MRP), since it anchors this and further analyses based on similar models.
Three quick caveats before we dig in.
Let’s dive into the analysis. We’re going to start with two detailed sections about the underlying data and approach, describe our model, and then pose and answer two initial questions about the VOC/WHNV turnout gap.
Each year, the U.S. Census Bureau (USCB) conducts the “American Community Survey” which, for our purposes, is an update to the decennial census. Surveys are sent to ~3.5 million households (of about 125 million) so it’s not a complete count. There is also less geographic specificity to the reported results than the decennial census. The decennial census reports results down to the “block” level, whereas much of the ACS data is available only at the “Public Use Micro-data Area” (PUMA) level— each PUMA has about 100,000 people. Still, this is enough granularity for most work at the state or congressional-district level. We use 2018 ACS data rather than the 2010 decennial census because it provides a more up-to-date picture of the demographics of each state. (We’ll re-run this analysis once we have full 2020 census results or the 2020 ACS.)
In addition, each election year the USCB produces the Voting and Registration Supplement to the Current Population Survey (CPS-VRS), asking approximately 100,000 people nationwide whether they voted in the general election. Responses are paired with county of residence and demographic information (age, sex, race, ethnicity, education, etc.), allowing estimation of voter turnout among various groups and in various places.
We make one important tweak to the CPS-VRS data: the survey responses are “self-reported” and not independently validated, so there are reporting errors that tend to overestimate turnout, and in a way which differs systematically among states. To account for this, we adjust the turnout probabilities from the CPS-VRS so that, when they are post-stratified across the voting-eligible population (VEP) of each state, we get the correct total turnout. This adjustment was first suggested by Achen and Hur, and we follow the procedure outlined by Ghitza and Gelman (p. 769) to compute it for each state, using vote totals from the United States Elections Project and group populations from the ACS.
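Concretely, the Achen-Hur adjustment amounts to a one-dimensional search: for each state, find the uniform shift (on the logit scale) that makes the post-stratified CPS-VRS probabilities match the official turnout. Here is a minimal sketch with made-up numbers; our actual computation follows Ghitza and Gelman's procedure, and this only illustrates the idea:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def achen_hur_delta(probs, pops, official_turnout, lo=-5.0, hi=5.0):
    """Bisect for the logit shift delta such that the shifted group
    probabilities, post-stratified over the group populations,
    reproduce the official state turnout."""
    def ps_turnout(delta):
        votes = sum(n * inv_logit(logit(p) + delta) for p, n in zip(probs, pops))
        return votes / sum(pops)
    for _ in range(60):  # ps_turnout is monotone increasing in delta
        mid = (lo + hi) / 2
        if ps_turnout(mid) < official_turnout:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical state: the raw CPS-VRS probabilities post-stratify to 66% turnout,
# but the official turnout was 58%, so delta comes out negative.
probs = [0.75, 0.60]      # raw CPS-VRS turnout probabilities for two groups
pops  = [40_000, 60_000]  # voting-eligible population of each group
delta = achen_hur_delta(probs, pops, official_turnout=0.58)
```

Because the shift is applied on the logit scale, every group's probability stays between 0 and 1 while the state total is pulled to the official figure.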
Crucially, CPS-VRS data seems to under-report the turnout gaps between white and non-white voters. So all of our results on race-specific turnout should be viewed skeptically: the gaps are likely larger than we see in the data, though it's unclear whether this comes from over-estimating VOC turnout or under-estimating WNH turnout. There are other publicly available surveys that validate responses against state voter files where possible, for example the CCES. That survey is smaller: approximately 50,000 people surveyed each year, with about 40,000 validated voters. For this post we stick with the CPS-VRS because it's bigger; when the data becomes available, we may repeat this analysis with the 2020 CCES survey.
The 100,000 people surveyed by the CPS-VRS are distributed throughout the country, so there will be a limited number of people in each state, particularly less populous ones. Once you start breaking those people down by demographic groups, the number of people per group gets quite small. For example, our model has binary groupings for age, sex and education and a 4-category grouping for race and ethnicity. Considering each of the 50 states plus DC, we have \(2\times 2 \times 2 \times 4 \times 51 = 1632\) groups. If people were distributed evenly among those groups, we might have 60 or so people in each. But people are not distributed equally among those groups! Some states are smaller and some may not have very many people in some of those categories. So how can we hope to understand any state and race effects in turnout?
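The cell arithmetic is quick to check:

```python
# Grouping sizes from our basic model:
# age x sex x education x race/ethnicity x state (50 states + DC)
cells = 2 * 2 * 2 * 4 * 51
print(cells)  # 1632

# If the ~100,000 CPS-VRS respondents were spread evenly across cells:
print(100_000 // cells)  # 61
```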
That's where MRP comes in.
Though there might not be many people in any one Age/Sex/Education/Race/State group, each person surveyed has many things in common with many people in other groups. The MR part of MRP stands for Multi-level Regression, a modeling technique which allows “partial-pooling” of the data. This means that the estimation of the turnout probability in each group is built partially from the data in that group and partially from the data in the shared categories.
Consider a young, female, Black college graduate in MI. We could estimate her turnout probability using just the young, female, Black, college-educated voters in MI. But there won't be many potential voters like her in the CPS-VRS, which makes the estimate very uncertain. However, that potential voter presumably has much in common with young, female, Black, college-educated voters in other states. So we could use all of them to estimate her chance of voting. Unfortunately, then we lose whatever information is specific to such voters in MI! MR models allow us to use both.
Models can be constructed to partially-pool along different groupings. The MR technique and tools we use (namely, Hamiltonian Monte Carlo via Stan) allow the data itself to determine how much partial-pooling leads to the best estimates.
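To build intuition for partial-pooling, think of each group's estimate as a blend of its own data and the pooled data, with small groups shrunk harder toward the pooled mean. This toy formula is only an illustration, not our actual Stan model:

```python
def partial_pool(group_votes, group_n, pooled_rate, pooling_strength):
    """Toy shrinkage estimate: a weighted blend of the group's own
    turnout rate and the pooled rate. `pooling_strength` is a made-up
    stand-in for the hierarchical variance the real model learns."""
    return (group_votes + pooling_strength * pooled_rate) / (group_n + pooling_strength)

# Hypothetical: 7 of 10 respondents in a tiny cell voted, vs. a 55% pooled rate
small = partial_pool(group_votes=7, group_n=10, pooled_rate=0.55, pooling_strength=20)
big   = partial_pool(group_votes=700, group_n=1000, pooled_rate=0.55, pooling_strength=20)
# The tiny cell's raw 70% rate is pulled well toward 55%;
# the big cell's estimate barely moves from its raw 70%.
```

In the real model, the amount of shrinkage is not fixed by hand; it corresponds to hierarchical parameters that are themselves estimated from the data, which is what "letting the data determine how much partial-pooling" means.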
Once we have estimates for every group in every state, we turn them into turnout numbers or probabilities via post-stratification: multiplying the estimated probabilities by the actual number of people in each group and adding these up to figure out how many people are likely to vote. Without post-stratification, we'd need to weight the CPS-VRS to match the population in each state, and that creates thorny modeling issues all by itself4. Instead, we use the CPS-VRS to estimate group-level probabilities and then post-stratify them using the actual populations in each state.
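Post-stratification itself is just a population-weighted average. A sketch, with made-up numbers standing in for the ACS counts:

```python
def post_stratify(probs, pops):
    """Expected statewide turnout: modeled group probabilities
    weighted by the actual group populations."""
    return sum(p * n for p, n in zip(probs, pops)) / sum(pops)

# Hypothetical three-group state
probs = [0.80, 0.55, 0.65]           # modeled turnout probability per group
pops  = [500_000, 300_000, 200_000]  # population of each group (from the ACS)
turnout = post_stratify(probs, pops)
print(round(turnout, 3))  # 0.695
```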
The Monte Carlo modeling produces confidence intervals for the parameters and for the post-stratifications that use them. The fact that some groups are very small, making probabilistic inference difficult, will show up in our results as wide confidence intervals. Partial-pooling helps, but only so much.
Our basic model includes age (under 45 or 45-and-over), sex (female or male), education (non-college-graduate or college-graduate), race/ethnicity (Black-non-Hispanic, Hispanic, Asian/Other, and white-non-Hispanic) and state. We recognize that these categories are reductive. In the case of sex we are limited to the categories provided by the CPS data. For age and education we've chosen to simplify the categories to keep the modeling simple. For race/ethnicity, we're using a slightly richer set of categories, since turnout varies widely among these groups.
We add a congressional-district-level population-density factor and an interaction between education and VOC/WHNV, a term in the model that estimates the effect of being, e.g., white-non-Hispanic (WNH) and college-educated over and above the effects of being in either category separately. Crucially, we also include an interaction between state and VOC/WHNV, a term which estimates the state-dependent portion of the turnout gap.
We fit a binomial model, estimating the probability that voters in each subgroup will vote. The model uses partial-pooling for national turnout by race, for turnout probability in each state, and for the interaction between state and race, allowing the data to determine the best balance among these when estimating the turnout of a particular subgroup.
A more complex model might:
Because we are interested in local organizing and state-level voter suppression, we focus on the state-specific portion of the turnout gap, in particular how much the gap in each state differs from what we would expect based on the demographics (age, sex, education, race/ethnicity, local population density) of those voters. So we post-stratify—using the ACS data6—on VOC and WHNV separately in each state, with and without state/race interactions.
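In other words, for each state we compute two post-stratified gaps and subtract. With purely hypothetical numbers for one state:

```python
def gap(voc_prob, whnv_prob):
    # Turnout gap convention from this post: VOC minus WHNV
    return voc_prob - whnv_prob

# Hypothetical post-stratified turnout for one state
full_gap        = gap(voc_prob=0.54, whnv_prob=0.70)  # with the state/race interaction
demographic_gap = gap(voc_prob=0.58, whnv_prob=0.66)  # interaction switched off

state_specific_gap = full_gap - demographic_gap
print(round(state_specific_gap, 2))  # -0.08
```

A negative result like this toy -8 points would mean the state's gap is worse than its demographics alone predict.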
Now we’re in a position to answer two questions about each state’s VOC turnout:
(1) Observed vs. expected turnout gaps in 2020 by state: Below we look only at the state-specific turnout gap, taking the difference of the full turnout gap in a state (figure 1) and the demographic turnout gap (figure 2). We chart these gaps on the same scale as the previous two charts so comparing magnitudes is straightforward. CO and AL are at the top with 8-point better-than-expected turnout gaps, while WA, something of an outlier, is at the bottom with a 16-point worse-than-expected turnout gap.
The 90% confidence intervals are large—in very few states is the state-specific gap clearly non-zero. Let's zoom in on those, retaining only states where we are 90% (or more) confident that the state-specific component of VOC turnout is not zero. Here we’ll shrink the turnout scale (x-axis) a bit.
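The filtering step is simple: keep a state only if its 90% interval lies entirely on one side of zero. With toy intervals:

```python
# Hypothetical state-specific gap estimates with 90% confidence intervals
intervals = {
    "CO": (0.03, 0.13),    # clearly positive: whole interval above zero
    "WA": (-0.22, -0.10),  # clearly negative: whole interval below zero
    "TX": (-0.05, 0.04),   # interval straddles zero: dropped
}
clearly_nonzero = {s for s, (lo, hi) in intervals.items() if lo > 0 or hi < 0}
print(sorted(clearly_nonzero))  # ['CO', 'WA']
```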
(2) In particular, in 2020, CO, AL, PA, GA, and MI have clearly positive state-specific turnout gaps and WA, IL, MA, NH, and FL have clearly negative state-specific turnout gaps.
One hypothesis we have is that the state-specific gaps reflect a push and pull between voter suppression and organizing in each state. That might be reflected somewhat in this list: there is strong and consistent organizing in swing states like PA and MI, and organizing in GA was obviously a big factor. There's a history of Black-voter organizing in AL which might explain the positive state-specific effect. CO has had a smaller-than-expected turnout gap since 2014, when it instituted universal vote-by-mail (UVBM). However, UVBM alone does not predict positive state-specific effects: WA, NV and OR also have UVBM, and their state-specific gaps are small or negative7.
The negative state-specific effects are more confusing. FL has a long and sordid history of voter suppression, so maybe that explains why the state-specific effect is gap-widening. WA is a UVBM state and, according to the CPS-VRS, its turnout gap was about -8 points in the 2016 election (59% vs. 67%) but -16 points in 2020 (54% vs. 80%); the change comes somewhat from a drop in VOC voting but more from a sharp rise in WHNV voting. IL is a similar story at smaller magnitude. We have fewer ideas about what might have led to increased WHNV turnout while VOC turnout remained nearly level. Maybe effective targeted advertising and organizing aimed at WHNV voters?
What is or isn't happening in CO, AL, PA, GA and MI to improve turnout gaps? Conversely, what is or isn't happening in WA, IL, MA, NH and FL to make the gaps worse? Understanding this might help narrow the gaps in many states.
Why is the demographic component so large? In this piece, we've been focusing on state-specific effects, because we are interested in state-level solutions, especially organizing around VOC turnout and against suppressive state policies. But looming over all of this is the fact that VOC turnout (particularly Hispanic, Asian and Native-American turnout) lags WNH and Black turnout nationwide, by more than 9 points in 2020, which was by no means an atypical year. That tips lots of elections to Republicans in places where Dems ought to be competitive, especially state-legislative and House seats.
At the height of the election cycle we focused nearly all of our pieces on actionable ideas. But in this off-cycle year, we are trying to understand the last election so that we can apply some new wisdom going forward. The organizing that helped carry GA, PA, MI, NV, etc. was inspiring and, we hope, repeatable.
For us, that work begins by seeing what we can learn from the data. We know that is only one way into the story, but for us it helps refine our ideas, pointing in new directions and improving the questions we ask. Some pieces, like this one, raise more questions than they answer.
Next we will look back at the 2012 and 2016 presidential elections, and also at the 2014 and 2018 midterms, to see if the same analysis, and the trends it reveals, can help us understand 2020 and figure out how to apply those lessons moving forward. Stay tuned!
Want to read more from Blue Ripple? Visit our website, sign up for email updates, and follow us on Twitter and Facebook. Folks interested in our data and modeling efforts should also check out our Github page.
We could look instead at the state-specific effect on VOC turnout alone, but variation among states in overall turnout makes this difficult to interpret.↩︎
As we’ll discuss below, the CPS data on the demographic makeup of voters is sparse. So we don’t know the demographics of all voters. We use an MRP model of that sparse data to infer the demographics of all the voters, and these inferences have uncertainty, reflected in the confidence intervals. There are other sources of uncertainty (survey methods, etc.) that we are not quantifying here, so these confidence intervals are probably too small.↩︎
We post-stratify with and without the state/race interaction term. We could instead model the two situations separately since the presence of the interaction term in the model shifts the other parameters. But using two models, one for the demographic effects and a different one for state-specific effects, complicates the estimation of confidence intervals for various quantities. Click here for more details.↩︎
Using weights in the data input to the model raises the question of the uncertainty of the weights themselves, something which might require its own model!↩︎
We ran our model with some of these interactions and found that, other than the ones we already had, they appeared to add little information.↩︎
We don't yet have ACS data for 2020, so for now we are using 2018 ACS data in our post-stratifications.↩︎
The cases of CO and NV are a reminder that we are using a particular data source, which doesn't always agree with other sources. CO and NV are states where most of the VOC are Hispanic. According to exit polls, Hispanic voters in NV had the highest turnout of Hispanic voters in any state, somewhere north of 70%. Exit polls showed high Hispanic turnout in CO as well, but not nearly as high. Yet the CPS-VRS data shows much higher VOC turnout in CO (79%) than in NV (58%).↩︎