Even without following the Olympics in any detail, it’s hard not to hear about the success of U.S. swimmer Michael Phelps: a new record for career gold medals won by an athlete in any sport, and new time records for just about every race he swims.

But what do these records mean? Over on *Slate*, William Saletan lists a whole bunch of advantages Phelps has over past Olympic swimmers, including the high-tech LZR swimsuits, but also things like greater pool depth. All of this makes it hard to directly compare race times from the 2008 Games with those of past swimmers, including the ones who set the records that Phelps keeps breaking.

Saletan suggests an “Olympic inflation index” based on the year-to-year improvements in athletes’ average performance; the *New York Times* devotes a whole article and an animated infographic to comparing Phelps to the great American swimmer Mark Spitz. But there’s a better option, proposed years ago by none other than Stephen Jay Gould: compare not the raw performance metrics, but *z-scores*. A z-score is the difference between an individual measurement and the mean of a group of measurements, divided by the group’s standard deviation. Converting raw performance measurements to z-scores gives us a standardized measure of how much an athlete’s performance stands out from that of his competitors. Gould applied this to batting averages, but it’s easy to do with any set of sports scores. For instance, here’s a scholarly article that does it with basketball results (Chatterjee & Yilmaz, 1999).
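To make the arithmetic concrete, here’s a minimal sketch of the z-score calculation in Python. The finishing times are made up for illustration (they are not real Olympic results), the function name `z_scores` is my own, and I’m assuming the sample standard deviation (the one with n - 1 in the denominator); the population version would give slightly different numbers.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return each value's z-score relative to the whole group."""
    group_mean = mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    group_sd = stdev(values)
    return [(v - group_mean) / group_sd for v in values]


# Hypothetical finishing times, in seconds, for an eight-swimmer final.
# These are invented numbers, not actual race results.
times = [112.0, 112.7, 112.9, 113.0, 113.2, 113.4, 113.6, 114.0]
scores = z_scores(times)

for t, z in zip(times, scores):
    print(f"{t:6.1f} s   z = {z:+.2f}")
```

Since lower times are better in swimming, the winner ends up with the most negative z-score; the further below zero it sits, the more that swim stands out from the rest of the field.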

Unfortunately, I can’t make that comparison for Phelps and Spitz. In order to calculate a z-score, you need a reasonable sample size – say, at least five (and that’s if you make some assumptions about the way those scores are distributed). While the *New York Times* website lists the times for the top eight men in (e.g.) the 200m butterfly at Beijing 2008, I haven’t been able to dig up comparable data for Mark Spitz’s victory in the same event at Munich 1972 – or for any other event, either. Kind of a downer, I know – but I’m going to keep digging around for the data. If anyone has a lead, feel free to comment.

*Edit: I found the data! Results in a new post.*

**Reference**

Chatterjee, S., & Yilmaz, M. R. (1999). The NBA as an Evolving Multivariate System. *The American Statistician*, 53, 257-262.