Scotch Tape and Duct Whisky: On the repeatability of NCAA football games

Nate Silver has a new column in the New York Times Sunday Magazine. In the inaugural edition, he writes of the difficulty of ranking college football teams.

But the computers are no dumber — and no smarter — than any of the usual ways to rank teams. Since the beginning of the B.C.S. in 1998, the computer ratings have predicted the winners of B.C.S. bowl games correctly 32 times and incorrectly 20 times. That’s marginally better than the 30-22 record put up by the overall B.C.S. formula, using a combination of human polls and computer ratings, and only marginally worse than the 33-19 record the computers would have compiled if they were allowed to consider victory margin and home-field advantage. Teams favored by a composite of human polls are 30-21-1 in the same games. The teams considered the favorites by Las Vegas gamblers are also just 30-21-1.

Silver goes on to show that pre-season rankings do just about as well at predicting championship outcomes as the estimates described above (all in the neighborhood of 60% correct).
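
The records quoted from Silver's column can be turned into hit rates with a few lines of Python. This is a minimal sketch. It assumes ties are simply dropped from the denominator; counting a tie as half a win would shift the two 30-21-1 records only slightly.

```python
# Records as quoted in Silver's column: (wins, losses, ties) in B.C.S. bowl games, 1998 onward.
RECORDS = {
    "Computer ratings": (32, 20, 0),
    "Overall B.C.S. formula": (30, 22, 0),
    "Computers with margin and home field": (33, 19, 0),
    "Composite of human polls": (30, 21, 1),
    "Las Vegas favorites": (30, 21, 1),
}


def win_rate(wins: int, losses: int) -> float:
    """Fraction of decided games picked correctly (assumption: ties are dropped)."""
    return wins / (wins + losses)


for name, (wins, losses, ties) in RECORDS.items():
    print(f"{name:<38} {wins}-{losses}-{ties}  {win_rate(wins, losses):.1%}")
```

Every method lands between about 57.7% and 63.5%.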

He appears to view a 60% success rate as poor, and he attributes it to a lack of data, since good teams in different conferences rarely play each other:

Without high-quality out-of-conference games, every major conference is in essence an island unto itself. We can identify the best team in the Pac 10, or the best team in the S.E.C. But we don’t have any good way of comparing the Pac 10 against the S.E.C., or against any other conference. It doesn’t matter how smart your computer rankings are, or how wizened the participants in your poll: there simply isn’t enough worthwhile data to work with.

Well, what if we had good data? Let’s say we want to pick the winner of a championship game between the Springfield U Nittany Tide and Springfield A&M Snortin’ Swine. But, hey, these two teams played each other earlier in the season! That’s exactly the kind of direct matchup Silver is looking for. Let’s predict that the rematch will have the same outcome as the first game.

I downloaded James Howell’s NCAA football score archives for the 1998-2009 seasons (from the start of the BCS through the latest season available) and searched each season for rematches, meaning two teams that met twice in the same season. There were 27. In 16 of them, the team that won the first game also won the rematch, a rate of 59.3%. That’s almost exactly the success rate of the computer rankings, human polls, and Vegas bookmakers.
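
The rematch search can be sketched in Python. Howell’s archive layout is not reproduced here: the `Game` record below is a hypothetical structure, and parsing the archive files into it is assumed to have been done already. The sketch groups games by season and unordered pair of teams, compares the first two meetings of each pair, and skips any tied game.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional, Tuple


class Game(NamedTuple):
    """Hypothetical record for one game; not Howell's actual file format."""
    season: int
    date: str  # ISO "YYYY-MM-DD", so string order is chronological order
    team1: str
    team2: str
    score1: int
    score2: int

    def winner(self) -> Optional[str]:
        if self.score1 == self.score2:
            return None  # tied game, if any
        return self.team1 if self.score1 > self.score2 else self.team2


def rematch_repeat_rate(games: List[Game]) -> Tuple[int, int]:
    """Return (repeats, rematches) for same-season rematches.

    A rematch is a second meeting of the same two teams in one season;
    only the first two meetings of a pair are compared.
    """
    meetings = defaultdict(list)
    for g in games:
        # frozenset ignores which team is listed first (home/away order)
        meetings[(g.season, frozenset((g.team1, g.team2)))].append(g)

    repeats = rematches = 0
    for pair_games in meetings.values():
        if len(pair_games) < 2:
            continue
        first, second = sorted(pair_games, key=lambda g: g.date)[:2]
        w1, w2 = first.winner(), second.winner()
        if w1 is None or w2 is None:
            continue  # skip pairs with a tied game
        rematches += 1
        repeats += int(w1 == w2)
    return repeats, rematches


# Toy data for illustration only (invented teams and scores).
toy = [
    Game(2005, "2005-10-01", "A", "B", 21, 14),
    Game(2005, "2005-12-03", "B", "A", 10, 17),  # A wins again
    Game(2005, "2005-09-10", "C", "D", 7, 3),
    Game(2005, "2005-12-03", "C", "D", 0, 24),   # D wins the rematch
    Game(2006, "2006-09-02", "A", "B", 3, 0),    # only one meeting in 2006
]
print(rematch_repeat_rate(toy))  # -> (1, 2)
print(f"{16 / 27:.1%}")          # the post's 16 of 27 rematches -> 59.3%
```

Keying on an unordered pair matters because the same matchup can appear with the teams listed in either order, depending on who hosted.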

Silver calls for a multi-team playoff to better determine the best team. That would help, but I think the discussion is missing a basic fact: a single game is a noisy measurement of which team is better. “On any given Sunday, any team can beat any other team.” That’s why other sports use best-of-5 or best-of-7 playoff series. It’s also why we see intransitive loops in conference play (A beats B, B beats C, C beats A). Getting 60% right may simply be about as good as anyone can do at predicting a championship game.
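
To see why a series helps, here is a sketch under a deliberately simple assumption: each game is independent, and the better team wins any single game with the same fixed probability `p` (0.6 below, chosen only to mirror the roughly 60% rates above, not estimated from any real teams). Winning a best-of-n series has the same probability as winning a majority of n games, because stopping once a team clinches never changes who would have reached the majority.

```python
from math import comb


def series_win_prob(p: float, n: int) -> float:
    """Probability of winning a best-of-n series (n odd).

    Assumes every game is independent and won with the same probability p.
    Computed as the binomial probability of winning a majority of n games.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# p = 0.6 is an illustrative assumption, not an estimate for any real teams.
for n in (1, 3, 5, 7):
    print(f"best-of-{n}: {series_win_prob(0.6, n):.3f}")
```

Under this model the better team’s chance rises from 0.600 for a single game to 0.648 for best-of-3, 0.683 for best-of-5, and 0.710 for best-of-7. Even seven games leave nearly a 30% chance of crowning the weaker team.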

People like to think that a championship game means something, that it determines some ultimate truth about the teams, but a single game between closely-matched teams doesn’t tell you who the “best team” is. It hardly tells you more than the coin toss at the start. It’s a fun spectacle, not a precision measurement. I’m all for the NCAA giving us more spectacle, but we should disabuse ourselves of the notion that we’ll learn more from it.

Sunday, November 21, 2010