From sports to schools to economics to politics and policy, we live in a cloud of numbers … and it’s hard to tell which numbers matter.

Too Much Information, Part I: A Cloud of Numbers

This week Morning Feature looks at an irony of the Information Age: with so much measured data available, it’s easier to make arguments but harder to draw reliable conclusions. Today we consider the first half of that problem: why a cloud of numbers enables us to make so many more and different arguments. Tomorrow we’ll ponder the second half: why the veil of causation makes it so difficult to draw reliable conclusions.

The importance of a good start …

The history of Super Bowl Champions seems to prove it’s important to get off to a good start. Indeed, 36 of the 46 Super Bowl Champions won their opening games. Does this mean that if your favorite team loses its opener next year, it’s doomed? Hardly. More than 20% of Super Bowl winners (10 of 46) lost their first games. Three champions – the 1993 Dallas Cowboys, 2001 New England Patriots, and 2007 New York Giants – lost their first two games.

In fact, in three of the Giants’ four championship seasons – 1986, 2007, and 2011 – they lost their opening games. Should Giants’ fans hope they lose the season opener next year? Somehow I doubt Giants’ head coach Tom Coughlin will suggest that to his players.

What do the statistics of opening game wins or losses tell us? Not much, really. There are now 32 teams in the NFL, so in Week One there will be 16 winners and 16 losers. One of those will win the Super Bowl. It’s more likely that the Super Bowl Champion will have won the season opener, but you could crunch the numbers to show that is true for any one of the 16 games of the regular season. That makes sense, as every champion wins more regular season games than they lose. More likely than not, that includes the season opener.

… or a strong finish

In fact, you could argue it’s more important to win the last game of the regular season, on the reasonable theory that teams playing well early may be mere flukes, or may suffer injuries that derail their hopes. Teams that are hot at the end of the season are more likely to do well in the playoffs … or so sports pundits tell us every December.

Are the sports pundits right?

While 37 of the 46 Super Bowl Champions won their last regular season games, that statistic is misleading. The 1997 and 1998 Denver Broncos lost two of their last three games and the 2006 Indianapolis Colts lost three of their last five. Each won its final game of the regular season, but none was “hot” at the end of the season. Nine teams lost their final regular season game and went on to win the Super Bowl, including the 2009 New Orleans Saints … who lost their final three games.

In short, roughly 80% of Super Bowl Champions won their season openers, and roughly 80% won their season finales. And – surprise! – Super Bowl Champions win an average of 80% of their regular season games. In other words, coaches who tell their players to “take it one week at a time” and “every game matters” are correct … even if they’re stating the obvious.
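If you want to check that arithmetic yourself, here is a minimal Python sketch. It uses only the counts quoted above (36 and 37 of 46 champions); the names `CHAMPIONS`, `WON_OPENER`, `WON_FINALE`, and `share` are made up for this illustration, and the 80% regular season win rate is taken as given rather than computed from game records.

```python
# Counts quoted in the article; no outside data is used here.
CHAMPIONS = 46    # Super Bowl Champions considered
WON_OPENER = 36   # champions that won their season opener
WON_FINALE = 37   # champions that won their last regular season game


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


opener_pct = share(WON_OPENER, CHAMPIONS)
finale_pct = share(WON_FINALE, CHAMPIONS)
lost_opener_pct = share(CHAMPIONS - WON_OPENER, CHAMPIONS)

print(f"Won opener:  {opener_pct}%")
print(f"Won finale:  {finale_pct}%")
print(f"Lost opener: {lost_opener_pct}%")
```

Both win shares land close to 80%, which is about what you would expect from picking any single game out of a schedule in which the team wins roughly four games in five.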

But what about…?

“Ahh,” you say, “but what about net turnovers, or team defense rankings, or scoring average, or quarterback ranking? There are lots of statistics. Surely some must reliably predict Super Bowl winners!”

And you’re right. Surely some do. Given enough sets of numbers, you’re almost certain to find one set that matches up closely with another set. And that’s exactly the problem.

We live in a cloud of numbers, with computers that can easily be programmed to look for matches between sets. If you do that enough, you’ll find intriguing matches for which you can tell a plausible story of cause-and-effect. Sports pundits, backed by statistical researchers, do that every day.

So do political pundits. Many pore over monthly GDP or unemployment statistics as if they held the key to understanding and predicting the 2012 elections. But consider what the New York Times’ Nate Silver, one of the best math mavens in American political analysis, wrote last June:

There are literally thousands of plausible models that one might build, using different economic indicators measured in different ways and over different time periods, taken alone or in combination with one another, and applied to different subsets of elections that are deemed to be relevant. Some of these models, through chance alone, will produce a better fit on the historical data – but the relationships may be spurious and their predictive power will sometimes not be as strong as claimed. Even the most thoughtful, well-designed models – I like this one, for instance – can see their performance deteriorate quite substantially if small, seemingly benign changes are made to their assumptions.
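Silver’s point about models that fit “through chance alone” is easy to see for yourself. The Python sketch below is only an illustration under stated assumptions, not anyone’s actual model: the target series and every candidate series are pure random noise, the series length of 46 simply echoes the 46 Super Bowl seasons, and the helper names `pearson` and `strongest_match` are invented for this example. It searches more and more meaningless series and reports the strongest correlation it finds with the target.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_match(target, candidates):
    """Largest absolute correlation between target and any candidate."""
    return max(abs(pearson(target, c)) for c in candidates)


rng = random.Random(2012)  # fixed seed so the run is repeatable
SEASONS = 46               # assumed length: one value per season

# Pure noise: neither the target nor any candidate measures anything real.
target = [rng.gauss(0, 1) for _ in range(SEASONS)]
candidates = [[rng.gauss(0, 1) for _ in range(SEASONS)] for _ in range(1000)]

# Search the first 10, then the first 100, then all 1000 candidates.
for k in (10, 100, 1000):
    best = strongest_match(target, candidates[:k])
    print(f"best |r| among {k:>4} random series: {best:.2f}")
```

Because each search includes all the series from the one before, the best match can only hold steady or improve as the pile of numbers grows, even though none of the series has anything to do with the target.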

So statistics don’t matter?

Not quite. As we’ll see tomorrow, statistical analysis has been at the core of science for well over a century. When one set of numbers matches another, and if the probability of their matching by random chance is very small, there may well be a real link between them. But with researchers using computers to compare dozens, hundreds, or thousands of sets of numbers, they’re likely to get some meaningless matches. To figure out which are meaningful, you need huge sets of numbers and – ideally – numbers produced in a way that allows you to filter out (what you hope is) irrelevant information.

Even so, the statistics increasingly will not be conclusive. As Silver notes, good statistical models yield a range of probabilities rather than a specific This Will Happen or That Will Not. Returning to our sports metaphor, good statistical analysis is like reading the scoreboard. Ideally, the statistics can show us both which outcomes are more likely (the score) and how much we don’t yet know (the time left on the clock).

But as Ron Klain argued in an article for Bloomberg News, we still have to play the game:

In five-card draw poker, the hand each player is dealt at the beginning has an impact on the outcome. But how they play those cards – the bets they place, whether they keep or discard, draw or fold – decides the final result.

Likewise, in 2012, factors such as the state of the economy and the ideology of the Republican candidate will certainly affect the president’s chances of re-election. But in the end, how the campaign unfolds – the messages the candidates offer, the campaigns they run, their performance on the stump, their get-out-the-vote efforts and their debate appearances – will make the difference. Candidates and their campaigns will dictate the outcome, not calculators.

So when you read political pundits dissecting next month’s unemployment numbers, or reading the tea leaves of some poll, remember that we live in a cloud of numbers so dense it’s easy to find numbers that fit whatever story someone wants to tell. Then remember this:


Happy Tuesday!