John Versus The Midterms, pt. 1: The Senate

The other day, while I was temporarily incapacitated by the kind of hangover only a visit to the old college chums can provide, I decided to take a stab at what I thought would happen in the 2010 U.S. midterm elections. Here’s part one of what I did, which covers the U.S. Senate general elections. Later, I’ll roll out some predictions about governors and the House. First the findings, then the method.


So as it stands right now, I predict the Democrats will hold the Senate.

What I’m doing

In devising the formula I ended up using to make these predictions, I incorporated two key elements: all the polls I could find, and Intrade. The poll data came from Pollster, Electoral-Vote, and RealClearPolitics. The polling data consisted of pure Dem/Rep questions, and I simply included the raw percentage responses without worrying about respondents who were undecided, voting for another candidate, or had no answer to the question. Intrade is an election prediction market, which meant I had to fiddle with it a bit to make it get along with the polling data.

By itself, Intrade is simply a prediction market. People buy shares of one particular issue like any normal stock, and that stock matures when it comes time to test the prediction. Unfortunately, every share is in “thumbs-up or thumbs-down” categories, not in “Candidate A or Candidate B” categories. That means that “Barbara Boxer will win in California” is an entirely different market than “Carly Fiorina will win in California.” As such, the values of the two markets aren’t necessarily on a common scale. To make it feel more percentage-y, I simply translated price differences into meaningful, scalar differences by regarding each candidate’s share value as a proportion of the combined share value. For example, my interpreted Intrade probability of Boxer winning in California is

((the value of Boxer’s shares)/(the value of Boxer’s shares + the value of Fiorina’s shares))*100%.
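The same normalization can be sketched in a few lines of Python. This is only an illustration of the formula above: the function name `intrade_share` is made up, "value of shares" is assumed to mean each contract's price, and the prices are hypothetical rather than real Intrade quotes.

```python
def intrade_share(price_a: float, price_b: float) -> float:
    """Return candidate A's share of the two contracts' combined price, in percent."""
    total = price_a + price_b
    if total <= 0:
        raise ValueError("combined contract price must be positive")
    return price_a / total * 100.0


# Hypothetical prices for the two separate "will win" contracts.
boxer = intrade_share(60.0, 45.0)
fiorina = intrade_share(45.0, 60.0)
print(round(boxer, 1), round(fiorina, 1))  # 57.1 42.9
```

Because both numbers are divided by the same total, the two candidates' shares always sum to 100%, which is what lets them sit alongside the poll percentages.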

I simply took an average of all of the polls that had been done in a particular race and counted it as the polling portion of my final equation. I did not use rolling averages (which would allow me to exclude older polling data altogether) or exponential averaging (which would strongly reduce the influence of said old polling data). Skipping both saved time, and I also don’t think the debate is settled that older polls are actually significantly less accurate than more recent polls, and certainly not extravagantly so. Consider the work of Joe Bafumi, Bob Erikson, and Chris Wlezien, summarized in pretty picture format below. They found only weak evidence to suggest that older polls are substantially worse than newer polls. For example:

Mind the distinctly unimpressive gap.

I don’t see any particular reason to rule out older poll data, and because polling is inconsistent across districts – which would make exponential averaging on a consistent, time-based scale really, really hard – I just went with the numbers as they were. Apparently professional forecasters do tend to use exponential decay, but I’ve never seen any of them explain whether they do so on a poll-by-poll basis for individual races rather than on a unit-of-time basis. I suspect they’re doing the former, and I suspect that it’s wrong, because doing so means they’re not treating time as a constant unit, they’re treating polls as a constant unit. That would mean that three polls for candidate A conducted 100, 50, and 3 days before the election would be weighted the same as three polls for candidate B conducted 4, 3, and 2 days before the election.
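To make the distinction concrete, here is a rough Python sketch of the two kinds of decay, using the poll dates from the example above. Everything else is an assumption for illustration: the percentages are invented, and the 30-day half-life and 0.5 decay factor are arbitrary. Neither function is what I actually used (that was the plain mean), and neither is claimed to be any forecaster's real method.

```python
from statistics import mean


def time_decay_average(polls, half_life_days=30.0):
    """Weighted mean where a poll's weight halves for every half_life_days of age."""
    weights = [0.5 ** (days / half_life_days) for days, _ in polls]
    return sum(w * pct for w, (_, pct) in zip(weights, polls)) / sum(weights)


def order_decay_average(polls, factor=0.5):
    """Weighted mean where weight depends only on recency rank, not on the calendar."""
    ranked = sorted(polls, key=lambda poll: poll[0])  # newest poll first
    weights = [factor ** rank for rank in range(len(ranked))]
    return sum(w * pct for w, (_, pct) in zip(weights, ranked)) / sum(weights)


# (days before the election, candidate's percentage); percentages are made up.
race_a = [(100, 40.0), (50, 44.0), (3, 48.0)]
race_b = [(4, 40.0), (3, 44.0), (2, 48.0)]

for name, polls in (("A", race_a), ("B", race_b)):
    plain = mean(pct for _, pct in polls)
    print(name, plain, order_decay_average(polls), time_decay_average(polls))
```

With these inputs the order-based version returns the same number for both races, since it only sees first, second, and third most recent. The time-based version separates them, because race A's oldest poll is 100 days old and race B's is 4.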

So now I had poll numbers on a nice 0-100% scale, and Intrade share value on a nice 0-100% scale. I decided to weight the value of each in as simple and useful a fashion as I could devise. I simply weighted them both by how accurate Intrade and recent polls have been overall. According to Nate Silver, in the previous election Senate polls were off by about 6 points, House polls by 5.1 points, and gubernatorial polls by 5.3 points. Intrade’s error averaged 4.06 points across the three categories. These differences in errors are actually much closer than I thought they would be – all part of an ongoing debate about the value of prediction markets versus polls, when either is useful, and why (there’s some research here, here, and here, for starters). I’m inclined to think prediction markets are powerful things – but I also work for a polling firm, so…

I know this is an inconsistent metric, an individual breakdown versus a total average, but it’s the only data on the subject out there. Thus, the weighting for each category would be:

Weight of polling average = (Average error of polling data for this type of election)/(Average error of polling data for this type of election + Average error of Intrade proportional value data for this type of election)
Weight of Intrade average = |(1-Weight of polling average)|
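
As a worked example, here are the two formulas applied to the Senate figures quoted earlier (polls off by about 6 points, Intrade by 4.06). The function name `source_weights` is just an illustrative label; the arithmetic follows the formulas exactly as written.

```python
def source_weights(poll_error: float, intrade_error: float) -> tuple[float, float]:
    """Apply the two weight formulas above, exactly as written."""
    poll_weight = poll_error / (poll_error + intrade_error)
    intrade_weight = abs(1.0 - poll_weight)
    return poll_weight, intrade_weight


# Error figures quoted in the post: Senate polls 6 points, Intrade 4.06 points.
poll_w, intrade_w = source_weights(6.0, 4.06)
print(round(poll_w, 3), round(intrade_w, 3))  # 0.596 0.404
```

Note that, as written, the source with the larger historical error ends up with the larger weight. Putting the other source's error in the numerator would instead favor the more accurate source, but that is a different scheme from the one described here.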

(editor’s note: After I finished this post, I found some research that almost justifies my approach – check it out!)

And my predicted winner is simply which of the candidates has the higher score.
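Put together, the last step might look like the sketch below. It assumes the final score is the weighted sum of the polling average and the Intrade share, which is how I read the weights above. The candidate names and numbers are hypothetical, and `combined_score` and `predict_winner` are names invented for this example.

```python
def combined_score(poll_avg: float, intrade_pct: float, poll_weight: float) -> float:
    """Weighted sum of the two 0-100% inputs (assumed form of the final equation)."""
    return poll_weight * poll_avg + (1.0 - poll_weight) * intrade_pct


def predict_winner(candidates: dict, poll_weight: float):
    """candidates maps a name to (polling average %, Intrade share %)."""
    scores = {
        name: combined_score(poll_avg, intrade_pct, poll_weight)
        for name, (poll_avg, intrade_pct) in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical race; 0.596 is the Senate polling weight from the error figures above.
race = {"Candidate A": (47.0, 57.1), "Candidate B": (44.0, 42.9)}
winner, scores = predict_winner(race, poll_weight=0.596)
print(winner, {name: round(score, 1) for name, score in scores.items()})
```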

The reasons the predictions are coming out in multiple posts are as follows:

(1) I have a day job, you ungrateful kids.
(2) Lean/Lock is weird.

Lean/Lock is a game rolled out by Slate and Yahoo that lets you predict who will win the 2010 midterm elections. You sign into the game through your Facebook profile. You are then presented with the option of either “Leaning” or “Locking” towards the candidates for the House, Senate, and the governorships. Players are awarded points every day that they’re “leaned” or “locked” on the candidate who is currently in the lead in the polls. If you “lean” towards a candidate you can switch at a low penalty; if you’re “locked,” you can switch only at a very high penalty (500, which is about 12.5% of the average total score as of right now).

This scoring methodology is strange. Players are rewarded for correctly lining up their choices simply with whoever is in the lead in the polls. This incentivizes meticulously staring at polling websites every morning, not actually developing nifty prediction techniques. I don’t like it, but I’m playing it.

And that’s my excuse for forecasting. All the Senate candidates are out there now, and later on, as I find the time (and as this game gives me excuses to procrastinate from my real work), I’ll put out more. What do you think of my method? How do you predict elections?


