Pythagorean Expectation - Empirical Origin

Empirical Origin

Empirically, this formula correlates fairly well with how baseball teams actually perform. However, statisticians have found since the formula was introduced that it routinely misses a team's actual record, typically by about three games. For example, in 2002 the New York Yankees scored 897 runs and allowed 697. According to James' original formula, which squares both quantities, the Yankees should have won 62.35% of their games.

Over a 162-game season, that projects to about 101.01 wins. The 2002 Yankees actually went 103-58.
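The arithmetic above can be checked with a short Python sketch. The function names `pythagorean_win_pct` and `expected_wins` are illustrative, not from any library; the default exponent of 2 corresponds to James' original formula.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: R^x / (R^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float, games: int, exponent: float = 2.0) -> float:
    """Expected wins over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # 2002 New York Yankees: 897 runs scored, 697 allowed, 162-game schedule
    pct = pythagorean_win_pct(897, 697)
    wins = expected_wins(897, 697, 162)
    print(f"{pct:.2%} -> {wins:.2f} expected wins")  # 62.35% -> 101.01 expected wins
```

The roughly two-win gap between this estimate and the actual 103 wins is the kind of error the rest of this section tries to reduce.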

To reduce this error, statisticians have searched extensively for a better exponent.

If a single fixed exponent is used, 1.83 is the most accurate; it is the value used by baseball-reference.com, a widely used website for team and historical baseball statistics. The updated formula therefore reads as follows:

$$\text{Win\%} = \frac{R^{1.83}}{R^{1.83} + RA^{1.83}}$$

where R is runs scored and RA is runs allowed.
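With a fixed exponent of 1.83, expected winning percentage is R^1.83 / (R^1.83 + RA^1.83). The sketch below (function name hypothetical) compares it with the original exponent of 2 on the 2002 Yankees figures. Because the Yankees outscored their opponents, the smaller exponent pulls their estimate toward .500.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 1.83) -> float:
    """R^x / (R^x + RA^x); the default exponent is the single-value 1.83."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


if __name__ == "__main__":
    # Compare James' original exponent with the fixed 1.83 exponent
    for x in (2.0, 1.83):
        pct = pythagorean_win_pct(897, 697, x)
        print(f"exponent {x}: {pct:.4f} ({162 * pct:.1f} wins over 162 games)")
```

Any team with R > RA gets a lower estimate as the exponent shrinks, and any team with R < RA gets a higher one, since (RA/R)^x moves toward 1 as x decreases.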

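Other statisticians let the exponent vary with the scoring environment instead of fixing it. The most widely known of these variable-exponent formulas is the Pythagenport formula, developed by Clay Davenport of Baseball Prospectus:

$$X = 1.50 \log_{10}\!\left(\frac{R + RA}{G}\right) + 0.45$$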

Davenport concluded that the exponent should be calculated for each team from its runs scored (R), runs allowed (RA), and games played (G). By not fixing the exponent to a single value across all teams and seasons, he reported a root-mean-square error of 3.9911, compared with 4.126 for an exponent of 2.
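A minimal sketch of the per-team exponent, assuming the commonly cited Pythagenport form X = 1.50 · log10((R + RA)/G) + 0.45. The helper names are hypothetical, and `rmse` only illustrates how a root-mean-square error like the ones Davenport reported is computed from predicted and actual win totals; it does not reproduce his figures.

```python
import math
from typing import Sequence


def pythagenport_exponent(runs_scored: float, runs_allowed: float, games: int) -> float:
    """Assumed Pythagenport form: X = 1.50 * log10((R + RA) / G) + 0.45."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return 1.50 * math.log10(runs_per_game) + 0.45


def win_pct(runs_scored: float, runs_allowed: float, exponent: float) -> float:
    """Pythagorean winning percentage for a given exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and actual values (e.g. wins)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # 2002 Yankees: the exponent comes from the team's own run environment
    x = pythagenport_exponent(897, 697, 162)
    print(f"exponent {x:.3f}, expected wins {162 * win_pct(897, 697, x):.1f}")
```

To compare exponents as Davenport did, one would compute predicted wins for many team-seasons under each method and pass them, with the actual win totals, to `rmse`.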

Less well known but equally (if not more) effective is the Pythagenpat formula, developed by David Smyth.

Davenport expressed his support for this formula, saying:

After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, X = ((rs + ra)/g)^0.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg.
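The Pythagenpat exponent from the quote can be sketched the same way. The function names are hypothetical; `z = 0.285` is the value Davenport quotes (he allows some wiggle room), and the Pythagenport form 1.50 · log10(rpg) + 0.45 is included only as an assumed point of comparison. At 1 run per game, Pythagenpat returns exactly 1, the "mandatory value" Davenport mentions, whereas the Pythagenport form gives 0.45.

```python
import math


def pythagenpat_exponent(runs_per_game: float, z: float = 0.285) -> float:
    """Pythagenpat: X = ((R + RA) / G) ** z, taking (R + RA) / G directly."""
    return runs_per_game ** z


def pythagenport_exponent(runs_per_game: float) -> float:
    """Assumed Pythagenport form, for comparison: X = 1.50 * log10(rpg) + 0.45."""
    return 1.50 * math.log10(runs_per_game) + 0.45


if __name__ == "__main__":
    # At 1 rpg, Pythagenpat gives exactly 1; Pythagenport gives 0.45
    print(pythagenpat_exponent(1.0), pythagenport_exponent(1.0))  # 1.0 0.45
```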

These variable-exponent formulas are needed mainly in extreme run environments, where the average number of runs scored per game is very high or very low. In most situations, simply squaring runs scored and runs allowed yields accurate results.
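The sketch below illustrates this by comparing the exponent-2 estimate with a Pythagenpat estimate (z = 0.285, as in Davenport's quote) for a hypothetical team that scores 10% more runs than it allows, across several combined run environments (R + RA)/G. Treating Pythagenpat as the reference is an assumption based on that quote; the function names and the 10% margin are illustrative.

```python
def pythagenpat_exponent(runs_per_game: float, z: float = 0.285) -> float:
    """Pythagenpat exponent X = ((R + RA) / G) ** z, with z = 0.285 from Davenport's quote."""
    return runs_per_game ** z


def win_pct(run_ratio: float, exponent: float) -> float:
    """Pythagorean win% written in terms of the ratio R / RA: ratio^x / (ratio^x + 1)."""
    r = run_ratio ** exponent
    return r / (r + 1.0)


def squaring_gap(runs_per_game: float, run_ratio: float = 1.1) -> float:
    """Absolute difference between the exponent-2 estimate and the Pythagenpat estimate."""
    x = pythagenpat_exponent(runs_per_game)
    return abs(win_pct(run_ratio, 2.0) - win_pct(run_ratio, x))


if __name__ == "__main__":
    # Hypothetical team scoring 10% more runs than it allows, in different run environments
    for rpg in (3.0, 6.0, 9.0, 12.0, 20.0):
        print(f"{rpg:>5.1f} rpg: gap {squaring_gap(rpg):.4f}")
```

With z = 0.285, the Pythagenpat exponent equals 2 at about 11.4 combined runs per game (2^(1/0.285)), so the gap is smallest in the middle of the range and grows toward very low or very high scoring.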

There are some systematic deviations between actual and expected winning percentage, attributed to factors such as bullpen quality and luck. In addition, the formula's predictions regress toward the mean: teams that win many games tend to be underestimated by the formula (meaning they "should" have won fewer games than they did), and teams that lose many games tend to be overestimated (meaning they "should" have won more).
