Index of Coincidence - Generalization

Generalization

The above description is only an introduction to use of the index of coincidence, which is related to the general concept of correlation. Various forms of Index of Coincidence have been devised; the "delta" I.C. (given by the formula above) in effect measures the autocorrelation of a single distribution, whereas a "kappa" I.C. is used when matching two text strings. Although in some applications constant factors such as and can be ignored, in more general situations there is considerable value in truly indexing each I.C. against the value to be expected for the null hypothesis (usually: no match and a uniform random symbol distribution), so that in every situation the expected value for no correlation is 1.0. Thus, any form of I.C. can be expressed as the ratio of the number of coincidences actually observed to the number of coincidences expected (according to the null model), using the particular test setup.

From the foregoing, it is easy to see that the formula for kappa I.C.' is

where is the common aligned length of the two texts A and B, and the bracketed term is defined as 1 if the -th letter of text A matches the -th letter of text B, otherwise 0.

A related concept, the "bulge" of a distribution, measures the discrepancy between the observed I.C. and the null value of 1.0. The number of cipher alphabets used in a polyalphabetic cipher may be estimated by dividing the expected bulge of the delta I.C. by the observed bulge, although in many cases (such as when a repeating key was used) better techniques are available.

Read more about this topic:  Index Of Coincidence