Extreme Value Theory - Data Sampling

Data Sampling

Two approaches exist to fit the tail of a sample's empirical cumulative distribution function (ECDF) to one of the three possible limiting distribution functions (the Fréchet, Gumbel and Weibull distributions). The first method approximates a distribution from a so-called block maxima (or minima) series. In operational statistics it is customary and convenient to take one year as the block and extract the annual maxima, which generates a so-called "Annual Maxima Series" (AMS). The second method samples the points of the data set that exceed a certain threshold (or fall below a certain floor). This method is generally referred to as the "Peak Over Threshold" (POT) method. In terms of the underlying theory, the two approaches are, respectively:

  1. The basic theory approach, as described in the book by Burry (1975). In general this conforms to the first theorem in extreme value theory (Fisher and Tippett, 1928; Gnedenko, 1943).
  2. The tail-fitting approach, currently the most common, which is based on the second theorem in extreme value theory (Pickands, 1975; Balkema and de Haan, 1974).
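The two sampling schemes can be sketched in a few lines of Python. This is a minimal illustration, not a recommended workflow: the synthetic daily series, the 30-year record length, and the 99% quantile used as the threshold are all assumptions made for the example, and the sketch skips the declustering of dependent exceedances that practical POT analyses usually require.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: 30 "years" of 365 daily observations each.
n_years, days_per_year = 30, 365
daily = rng.gumbel(loc=10.0, scale=2.0, size=n_years * days_per_year)

# Block maxima: keep one value per year (the Annual Maxima Series).
ams = daily.reshape(n_years, days_per_year).max(axis=1)

# Peak over threshold: keep every observation above a chosen threshold.
threshold = np.quantile(daily, 0.99)   # illustrative choice of threshold
exceedances = daily[daily > threshold]
excesses = exceedances - threshold      # amounts by which the threshold is exceeded

print(len(ams), len(exceedances))
```

The AMS always contains exactly one point per block, whereas the number of POT points depends on the threshold: a lower threshold gives more data but pushes the sample further from the tail.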

The two theorems differ in how the data are assumed to be generated. Under Theorem I the data are observed over their full range, while under Theorem II data are recorded only when they surpass a certain threshold; models of this kind are called Peak Over Threshold (POT) models. The POT approach was developed largely in the insurance business, where only losses (payouts) above a certain threshold are accessible to the insurance company. Strangely, this approach is often used for cases where Theorem I applies, which creates problems with the basic model assumptions.
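For reference, the limit laws behind the two theorems can be written as follows. This uses a common textbook parametrization that does not appear in the text above: the location, scale and shape symbols (\mu, \sigma, \xi), the normalizing constants (a_n, b_n), the threshold u and the threshold-dependent scale \tilde\sigma are introduced here only for illustration, and the case \xi = 0 is understood as the limit \xi \to 0.

```latex
% Theorem I (Fisher-Tippett, Gnedenko): normalized block maxima M_n
\Pr\!\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x)
  = \exp\!\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\},
  \qquad 1 + \xi\,\frac{x-\mu}{\sigma} > 0 .

% Theorem II (Pickands, Balkema and de Haan): excesses over a high threshold u
\Pr\left(X - u \le y \mid X > u\right) \approx H(y)
  = 1 - \left(1 + \frac{\xi\,y}{\tilde\sigma}\right)^{-1/\xi},
  \qquad y > 0,\quad 1 + \frac{\xi\,y}{\tilde\sigma} > 0 .
```

In this parametrization \xi > 0 corresponds to the Fréchet type, \xi = 0 to the Gumbel type, and \xi < 0 to the Weibull type.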

Extreme value distributions are the limiting distributions for the minimum or the maximum of a very large collection of independent random variables drawn from the same arbitrary distribution. Emil Julius Gumbel (1958) showed that for any well-behaved initial distribution (i.e., F(x) is continuous and has an inverse), only a few models are needed, depending on whether one is interested in the maximum or the minimum and on whether the observations are bounded above or below.
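A small simulation can make the limiting behaviour concrete. The sketch below assumes a unit exponential parent distribution, for which the block maximum minus the logarithm of the block size is known to approach the standard Gumbel distribution; the block size, number of blocks and evaluation grid are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Draw many blocks of i.i.d. unit-exponential variables (assumed parent distribution).
n_blocks, block_size = 2000, 1000
samples = rng.exponential(scale=1.0, size=(n_blocks, block_size))

# For this parent, centring the block maximum by log(block_size) gives
# a variable whose distribution approaches the standard Gumbel law.
centered_maxima = samples.max(axis=1) - np.log(block_size)

def gumbel_cdf(x):
    """Standard Gumbel cumulative distribution function."""
    return np.exp(-np.exp(-x))

# Compare the empirical CDF of the centred maxima with the Gumbel CDF.
grid = np.array([-1.0, 0.0, 1.0, 2.0])
empirical = np.array([(centered_maxima <= x).mean() for x in grid])
max_gap = np.max(np.abs(empirical - gumbel_cdf(grid)))

print(max_gap)
```

Other parent distributions need different centring and scaling constants and may lead to the Fréchet or Weibull type instead of the Gumbel type.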
