Why are political polls so confusing? Why do the same set of polls often lead to very different predictions? Why did the Trump campaign ask CNN to retract a poll and why did CNN refuse?
There are a number of things that make polls confusing and difficult to interpret. Among them: choosing how to survey people (landline phones? Cell phones? A Facebook survey?), selecting which questions to ask and how to phrase them, and then interpreting the data to reach some conclusion, e.g., predicting the outcome of an upcoming election. All of these complexities are worth talking about, but here we’re going to focus on the last step: how do we go from polling data to prediction?
Predicting election outcomes via polling or survey data is hard: the people you survey may not be representative of the electorate, and the composition of the electorate in each state changes election-to-election. When you conduct a poll or survey, you cannot control who responds. So your sample will be different in myriad ways from the population as a whole, and also different from the subset of the population who will vote.
Reconstructing the opinion of the population from a poll is done via weighting: assigning a number to each response that allows users of the data to recover a representative picture of the population from the survey. For example, if a state has roughly the same number of adults above and below 45 but the survey has fewer responses from people over 45, those responses will be assigned higher weights. Weighting is complex, not least because there are many demographic variables to consider.
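To make the mechanics concrete, here is a minimal sketch of that weighting step in Python. The population shares and survey counts are invented for illustration, and real weighting crosses many variables at once, but the arithmetic is the same: a group’s weight is its population share divided by its share of the sample.

```python
# A minimal sketch of demographic weighting. The population shares and the
# survey counts below are made up for illustration; real weighting uses many
# crossed demographic variables, not just one.

# Share of adults in each age group (from, e.g., census data).
population_share = {"under_45": 0.50, "45_and_over": 0.50}

# Responses actually collected by a hypothetical survey.
survey_counts = {"under_45": 700, "45_and_over": 300}
total_responses = sum(survey_counts.values())

# Weight = population share / sample share, so under-represented groups
# (here, respondents 45 and over) count for more.
weights = {
    group: population_share[group] / (count / total_responses)
    for group, count in survey_counts.items()
}

print(weights)  # {'under_45': ~0.71, '45_and_over': ~1.67}
```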
With weighting, we have an estimate of how a person in a particular place is likely to vote, given their demographics. And, from the census and other sources, we know the demographics of each county/district/state. But if we want to predict an election outcome, or decide where we need to focus resources because an election is likely to be close, we also need to know who is going to cast a ballot in the election.
There are a lot of reasons people do and don’t vote: see our piece on voter ID, or Fair Vote’s rundown on voter turnout, for example. For the purposes of this post we’re going to ignore the reasons, and focus on the data problem of uncertain turnout.
Figuring out how to weight opinion data to predict voting is among the most difficult parts of determining which elections and states are likely to be close, and thus where to prioritize work and allocate resources.
Weighting polls is also deeply contentious. As an extreme example, CNN recently published a poll showing Biden up 14 points over Trump. The Trump campaign hired a pollster to “analyze” the CNN poll and then demanded that CNN retract it, which CNN promptly refused to do. The substantive objection made by Trump’s campaign was that CNN should have weighted its poll so that the fractions of Republicans, Democrats, and Independents matched those among voters in 2016. It’s not hard to see why this is a bad idea. First, as people’s views on candidates change, so may their reported party, so weighting to fix partisan identity will tend to obscure exactly the thing you are trying to measure. Second, if partisan identity were a stable feature of the electorate, it might still make sense to weight this way, just to control for a bad polling sample. But partisan identification is not particularly stable! In the chart below we plot partisan identity as reported in the CCES survey from 2006 to 2018 to see how much it shifts from election to election. From the chart we see that though voters are roughly equally distributed among Democrats, Independents, and Republicans, they shift among those groups a significant amount.
So, does the last election provide any useful guidance for weighting? Using partisan identity is particularly bad since it is so heavily correlated with the thing a poll is trying to measure. But perhaps weighting demographically, that is, assuming that turnout in any given demographic group in any state will be similar to 2016, would provide reasonable weights? This doesn’t have the correlation issue of partisan identity, but turnout by demographic group is also variable from election to election. And there are two other issues: data sources on turnout vary dramatically, and how one breaks down the demographics can change the results as well. To illustrate both these points, we built a few simple electorate models from 2016 data, using either the census CPS voter survey or the CCES survey data. For each data-set, we used multi-level regression to infer turnout probabilities in each state for two demographic groupings, differing only in the level of detail we use when considering a person’s race (a simplified sketch of the modeling follows the two groupings below):
ASER: we infer turnout for 16 groups per state, categorizing everyone by age (45 and Over/Under 45), sex (Female/Male), education (Non-College-Grad/College Grad or In College), and race (Non-White/White-Non-Latinx).
ASER5: we infer turnout for 40 groups per state, categorizing everyone by age (45 and Over/Under 45), sex (Female/Male), education (Non-College-Grad/College Grad or In College), and race (Black/Latinx/Asian/Other/White-Non-Latinx).
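Below is a simplified sketch of the kind of multi-level turnout regression described above, written in Python with PyMC as a stand-in; it is not the code we actually use, and the synthetic data, number of states, and prior choices are all placeholders. The point is just the structure: a national baseline, partially-pooled state effects, and demographic-cell effects on the logit scale.

```python
# A sketch (not production code) of a multilevel turnout regression: each
# observation is a (state, demographic-cell) group, and the probability of
# voting is modeled with a national baseline, partially-pooled state effects,
# and cell effects. All data below are synthetic.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

n_states, n_cells = 5, 16                      # e.g., 16 ASER cells per state
state_idx = np.repeat(np.arange(n_states), n_cells)
cell_idx = np.tile(np.arange(n_cells), n_states)

# Synthetic survey counts: respondents per (state, cell) and how many voted.
n_respondents = rng.integers(30, 200, size=n_states * n_cells)
true_p = rng.uniform(0.3, 0.8, size=n_states * n_cells)
n_voted = rng.binomial(n_respondents, true_p)

with pm.Model() as turnout_model:
    mu = pm.Normal("mu", 0.0, 2.0)                  # national baseline (logit scale)
    sigma_state = pm.HalfNormal("sigma_state", 1.0)
    state_eff = pm.Normal("state_eff", 0.0, sigma_state, shape=n_states)
    cell_eff = pm.Normal("cell_eff", 0.0, 1.0, shape=n_cells)
    p = pm.Deterministic(
        "p", pm.math.invlogit(mu + state_eff[state_idx] + cell_eff[cell_idx])
    )
    pm.Binomial("voted", n=n_respondents, p=p, observed=n_voted)
    trace = pm.sample(1000, tune=1000, chains=2, progressbar=False)

# Posterior-mean turnout probability for each (state, cell) group.
p_turnout = trace.posterior["p"].mean(dim=("chain", "draw")).values
```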
We further adjust these inferred turnout numbers such that, when multiplied by the number of people in each group in the state, we reproduce the actual number of ballots cast on election day. If you’re interested in the details, we followed a simplified version of the techniques described in this paper.
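One way to perform that adjustment, sketched below with invented population counts and a made-up ballot total, is to shift a state’s inferred turnout probabilities by a single constant on the logit scale and solve for the shift that reproduces the official count.

```python
# A sketch of the adjustment step: shift all of a state's inferred turnout
# probabilities on the logit scale by one constant, chosen so the implied
# number of ballots matches the number actually cast. All numbers are invented.
import numpy as np
from scipy.optimize import brentq

def adjust_turnout(p_cells, cell_populations, total_ballots):
    """Return turnout probabilities logit-shifted so that
    sum(cell_populations * p_adjusted) == total_ballots."""
    logit = np.log(p_cells / (1 - p_cells))

    def ballots_gap(delta):
        p_adj = 1 / (1 + np.exp(-(logit + delta)))
        return np.dot(cell_populations, p_adj) - total_ballots

    delta = brentq(ballots_gap, -10, 10)   # find the shift that closes the gap
    return 1 / (1 + np.exp(-(logit + delta)))

# Example: four demographic cells in a hypothetical state.
p_inferred = np.array([0.45, 0.55, 0.60, 0.70])    # from the regression above
population = np.array([900_000, 800_000, 700_000, 600_000])
ballots_cast = 1_800_000                            # official total

p_adjusted = adjust_turnout(p_inferred, population, ballots_cast)
print(p_adjusted, np.dot(population, p_adjusted))   # second value ~1,800,000
```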
To illustrate how different these electorate weights are, we apply them to the 2016 election itself, using the census demographics in each state and voter preferences inferred from the CCES data via the same multi-level regression methods. From that we compute the two-party vote share in each state and get four different 2016 outcomes in terms of national popular vote and electoral votes. This is not something that would have been an actual forecast; for that, we would want to simulate many elections with the given set of probabilities. But these results correspond to the average outcome under each of these electorate compositions and so give a picture of how different they are.
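The sketch below shows how turnout and preference estimates of this kind combine, via post-stratification over the demographics, into a state two-party vote share and an expected electoral-vote tally. The states, cells, and all numbers are hypothetical placeholders.

```python
# A sketch of combining turnout and preference estimates into state-level
# two-party vote shares and electoral votes. STATE_DATA and ELECTORAL_VOTES
# are hypothetical; real inputs come from the census and the fitted models.
import numpy as np

# For each state: population per demographic cell, adjusted turnout per cell,
# and inferred Dem share of the two-party vote per cell.
STATE_DATA = {
    "StateA": {
        "population": np.array([500_000, 400_000, 300_000, 200_000]),
        "turnout":    np.array([0.55, 0.60, 0.45, 0.65]),
        "dem_share":  np.array([0.65, 0.40, 0.70, 0.35]),
    },
    "StateB": {
        "population": np.array([800_000, 700_000, 300_000, 100_000]),
        "turnout":    np.array([0.50, 0.62, 0.48, 0.58]),
        "dem_share":  np.array([0.42, 0.47, 0.66, 0.52]),
    },
}
ELECTORAL_VOTES = {"StateA": 10, "StateB": 16}

dem_ev = 0
for state, d in STATE_DATA.items():
    expected_voters = d["population"] * d["turnout"]
    dem_votes = expected_voters * d["dem_share"]
    share = dem_votes.sum() / expected_voters.sum()
    print(f"{state}: Dem two-party share = {share:.3f}")
    if share > 0.5:                    # average outcome, not a simulation
        dem_ev += ELECTORAL_VOTES[state]

print("Dem electoral votes (expected-outcome tally):", dem_ev)
```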
The results are charted below and vary from a Dem popular vote share below 48.5% and 180 electoral votes to a popular vote share of about 50.5% and almost 280 electoral votes! That’s a swing of almost 3% in the popular vote (~4 million votes) and the difference between losing and winning the election.
The census-based models look more like the actual election outcome, though that could just be a coincidence. The census-sourced electorate weights indicate higher turnout among younger and minority voters than the CCES data does, and are thus more Democratic-candidate-friendly.
For a given data-set, there are significant differences between the ASER and ASER5 models. The popular-vote difference comes from the correlation between turnout and preference among the more specific race categories. For instance, in 2016, Black voters were more likely to vote for Democratic candidates and turned out at higher rates than Latinx voters, so separating those groups out results in a model with higher Dem vote-share. The difference in electoral-college outcome is even larger: electoral votes jump by more than you might expect from the shift in popular vote alone, suggesting that where people of various non-white racial backgrounds live is advantageous to a Democratic presidential candidate. In other words, the extra vote-share we see from looking at race in greater detail is disproportionately located in some close (battleground) states.
None of these models is particularly close to the actual election we had, even though we are using data from an election to model that same election. Blindly using that data to model the 2020 election would be foolhardy.
What do good pollsters actually do? They ask respondents a variety of questions to gauge how likely they are to vote. They combine this with demographic weighting (like the examples above) to estimate the makeup of the electorate. This is complicated and hard to get right, and is part of what makes some pollsters more reliable than others. Previous elections may be used as sanity checks or as the baseline for non-poll-based models.
None of these issues is simple or settled. For example, a pollster in Utah recently decided to begin weighting by education, something they had not done before, because they realized they had made errors in 2016 by ignoring it.
Election modelers then use these polls as well as fundamentals—economic trends, incumbency, etc.—to predict the probabilities of various outcomes. For a particularly thorough explanation of one such model, see the Economist’s summary of their work for 2020.
As we continue to work with this data, we refine and improve our modeling. In this post we have shifted from using the census summary of the CPS voter supplement to using the CPS micro-data itself, as harmonized via the IPUMS web portal. This has several advantages. First, the micro-data allows us to get turnout numbers at the state level. Also useful is that the micro-data has more demographic information: for example, the summary tables allow us to explore the variables of age, sex, and race or age, sex, and education, but not age, sex, education, and race. This matters because the combination of education and race is necessary to explore the voting and turnout patterns of the so-called “White Working Class,” a crucial voting bloc in the past few elections. Another benefit is that the micro-data contains more information about voting: it includes data about whether people voted by mail or voted early, and some information about why registered non-voters didn’t vote. For all those reasons, going forward we will be using the micro-data-sourced CPS turnout data instead of the previous nationally-aggregated summaries of that data.
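As an illustration of the kind of crossing that person-level micro-data allows, here is a small, entirely hypothetical Python example; the tiny data frame and its column names are stand-ins for the harmonized IPUMS-CPS records, not the actual fields.

```python
# A hypothetical sketch of why micro-data matters: person-level records let us
# cross any demographic variables, e.g. education x race, which the published
# summary tables do not allow. The DataFrame and its values are invented.
import pandas as pd

cps = pd.DataFrame({
    "state":     ["WI", "WI", "WI", "WI", "MI", "MI", "MI", "MI"],
    "age":       [32, 58, 45, 29, 61, 40, 52, 23],
    "sex":       ["F", "M", "F", "M", "F", "M", "F", "M"],
    "education": ["NonGrad", "Grad", "NonGrad", "Grad",
                  "NonGrad", "NonGrad", "Grad", "NonGrad"],
    "race":      ["WhiteNonLatinx", "WhiteNonLatinx", "Black", "Latinx",
                  "WhiteNonLatinx", "Black", "Asian", "WhiteNonLatinx"],
    "voted":     [1, 1, 1, 0, 1, 0, 1, 0],
})

# Crossing education and race -- not possible with the aggregate tables --
# lets us look at turnout for, e.g., white non-college-graduate respondents.
cps["age45"] = (cps["age"] >= 45).map({True: "45AndOver", False: "Under45"})
turnout_by_cell = (
    cps.groupby(["state", "age45", "sex", "education", "race"])["voted"]
       .mean()
       .rename("turnout_rate")
       .reset_index()
)
print(turnout_by_cell)
```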
Want to read more from Blue Ripple? Visit our website, sign up for email updates, and follow us on Twitter and FaceBook. Folks interested in our data and modeling efforts should also check out our Github page.