Phelps vs Spitz: z-scores tell all

So, yesterday I suggested that, given improvements in training and equipment, Olympic athletes of today should be compared to those of the past using z-scores, rather than raw performance data. This was specifically with reference to comparing swimmer Michael Phelps and the historical performance of Mark Spitz, but I couldn’t find enough data from Spitz’s events in the 1972 Olympics to calculate the standardized z-scores.

(For those just joining us, z-scores use information about a distribution of data points to calculate a “universal” measure of how much one point stands out from the rest – in this case, how much Spitz or Phelps stands out from the swimmers he raced against.)

Anyway: after another round of digging on Google, I’ve found detailed results (i.e., the final times for the top eight competitors) for the men’s 200-meter butterfly in 2008 and 1972. To convert Phelps’s and Spitz’s times to z-scores, I estimated the parameters of a distribution from the other seven men in the top eight by taking the average (arithmetic mean) and standard deviation of those times in good ol’ Microsoft Excel [.xls file]. The z-score is just the difference between a single score and the average, divided by the standard deviation.
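For anyone who would rather not open Excel, here is a rough Python sketch of the same calculation. The function name `z_score_vs_field` is made up for illustration, the times are placeholder values rather than the real race results, and I am assuming the sample standard deviation (dividing by n - 1, which is what Excel's STDEV does); the spreadsheet may have used a different variant.

```python
from statistics import mean, stdev


def z_score_vs_field(winner_time, field_times):
    """Return the winner's z-score relative to the other finalists' times."""
    field_mean = mean(field_times)
    # Assumption: sample standard deviation (n - 1 denominator).
    field_sd = stdev(field_times)
    return (winner_time - field_mean) / field_sd


# Hypothetical times in seconds, NOT the actual 1972 or 2008 results.
winner = 112.0
others = [113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0]

z = z_score_vs_field(winner, others)
print(round(z, 2))  # prints -1.85
```

The result is negative because the winner's time is below the mean of the other seven, and the more negative it is, the further ahead of the field the winner finished.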

And …

Spitz wins! His z-score is -3.67, compared to -2.27 for Phelps. (The numbers are negative because the times are, of course, lower than the average from the other seven.) So, even though Phelps is considerably faster than Spitz, Spitz outperformed his competition by a greater margin than Phelps did.

Michael Phelps is fast, but what’s his z-score?

Even without following the Olympics in any detail, it’s hard not to hear about the success of U.S. swimmer Michael Phelps: a new record for career gold medals won by an athlete in any sport, and new time records for just about every race he swims.


Figure 1: Michael Phelps

Photo by sagicel.

But what do these records mean? Over on Slate, William Saletan lists a whole bunch of advantages Phelps has over past Olympic swimmers, including the high-tech LZR swimsuits, but also things like greater pool depth. All of this makes it hard to directly compare race times achieved by swimmers in the 2008 games with those achieved by past swimmers, including the ones who set the records that Phelps keeps breaking.

Saletan suggests an “Olympic inflation index” based on the year-to-year improvements in athletes’ average performance; the New York Times devotes a whole article and an animated infographic to comparing Phelps to the great American swimmer Mark Spitz. But there’s a better option, proposed years ago by none other than Stephen Jay Gould: compare not the raw performance metrics, but z-scores. A z-score is how much an individual measurement differs from the mean of a group of measurements, divided by the standard deviation of the group. Converting raw performance measurements to z-scores gives us a standardized measure of how much an athlete’s performance stands out from that of his competitors. Gould applied this to batting averages, but it’s easy to do with any set of sports scores. For instance, here’s a scholarly article that does it with basketball results [$-a].
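Written out as a formula, that definition looks like this. The symbols are my own shorthand, not anything from Gould: x is one athlete's result, x-bar is the mean of the comparison group, and s is that group's standard deviation.

```latex
z = \frac{x - \bar{x}}{s}
```

A z of 0 means the athlete is exactly average for the group; a z of 2 (or -2, for race times where lower is better) means two standard deviations away from the average.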

Unfortunately, I can’t make that comparison for Phelps and Spitz. In order to calculate a z-score, you need a reasonable sample size – say, at least five (and that’s if you make some assumptions about the way those scores are distributed). While the New York Times website lists the times for the top eight men in (e.g.) the 200m butterfly at Beijing 2008, I haven’t been able to dig up comparable data for Mark Spitz’s victory in the same event at Munich 1972 – or for any other event, either. Kind of a downer, I know – but I’m going to keep digging around for the data. If anyone has a lead, feel free to comment.

Edit: I found the data! Results in a new post.

Reference

Chatterjee, S., Yilmaz, M. R. (1999). The NBA as an Evolving Multivariate System. The American Statistician, 53, 257-262.