Item Response Theory - A Comparison of Classical and Item Response Theories

Classical test theory (CTT) and IRT are largely concerned with the same problems but are different bodies of theory and entail different methods. Although the two paradigms are generally consistent and complementary, there are a number of points of difference:

  • IRT makes stronger assumptions than CTT and in many cases provides correspondingly stronger findings, primarily in the characterization of error. Of course, these results hold only when the assumptions of the IRT models are actually met.
  • Although CTT has yielded important practical results, the model-based nature of IRT affords many advantages over analogous CTT findings.
  • CTT test scoring procedures have the advantage of being simple to compute (and to explain) whereas IRT scoring generally requires relatively complex estimation procedures.
  • IRT provides several improvements in scaling items and people. The specifics depend upon the IRT model, but most models scale the difficulty of items and the ability of people on the same metric. Thus the difficulty of an item and the ability of a person can be meaningfully compared.
  • Another improvement provided by IRT is that the parameters of IRT models are generally not sample- or test-dependent, whereas the true score in CTT is defined in the context of a specific test. Thus IRT provides significantly greater flexibility in situations where different samples or test forms are used. These IRT findings are foundational for computerized adaptive testing.
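
The contrast between simple CTT scoring and model-based IRT scoring can be illustrated with a minimal sketch. It assumes a 2PL model whose item parameters are already known; the parameter values, the response pattern, and the function names are hypothetical, and a plain grid search stands in for the more refined estimation procedures used in practice.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ctt_sum_score(responses):
    """CTT-style score: the number of correct responses."""
    return sum(responses)

def irt_ml_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood ability estimate under the 2PL model."""
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            # Log-likelihood contribution of a correct (1) or incorrect (0) answer.
            ll += math.log(p) if x else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Hypothetical item parameters: (discrimination a, difficulty b).
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.5), (2.0, 1.0)]
responses = [1, 1, 0, 0]  # hypothetical response pattern

sum_score = ctt_sum_score(responses)        # simple count of correct answers
theta_hat = irt_ml_theta(responses, items)  # ability on the same metric as b
```

Because ability and difficulty share one metric in this model, `p_correct(b, a, b)` equals 0.5 for any item: a person located exactly at an item's difficulty has an even chance of answering it correctly.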

It is also worth mentioning some specific similarities between CTT and IRT which help to understand the correspondence between concepts. First, Lord showed that under the assumption that $\theta$ is normally distributed, discrimination in the 2PL model is approximately a monotonic function of the point-biserial correlation. In particular:


a_i \cong \frac{\rho_{it}}{\sqrt{1-\rho_{it}^2}}

where $\rho_{it}$ is the point-biserial correlation of item i. Thus, if the assumption holds, where there is a higher discrimination there will generally be a higher point-biserial correlation.
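
As a rough numerical illustration of the approximation above, the following sketch maps a few point-biserial correlations to approximate 2PL discriminations. The function name and the input values are hypothetical, and the result is only as good as the normality assumption behind the formula.

```python
import math

def discrimination_from_point_biserial(rho_it):
    """Approximate 2PL discrimination a_i from a point-biserial correlation."""
    if not -1.0 < rho_it < 1.0:
        raise ValueError("rho_it must lie strictly between -1 and 1")
    return rho_it / math.sqrt(1.0 - rho_it ** 2)

# Hypothetical point-biserial correlations for three items.
for rho in (0.2, 0.4, 0.6):
    a_approx = discrimination_from_point_biserial(rho)
    print(f"rho_it = {rho:.1f} -> a_i is approximately {a_approx:.3f}")
```

The mapping is strictly increasing on the open interval (-1, 1), which is the sense in which a higher point-biserial correlation goes with a higher discrimination.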

Another similarity is that while IRT provides a standard error for each estimate and an information function, it is also possible to obtain an index for a test as a whole which is directly analogous to Cronbach's alpha, called the separation index. To do so, it is necessary to begin with a decomposition of an IRT estimate into a true location and error, analogous to the decomposition of an observed score into a true score and error in CTT. Let


\hat{\theta} = \theta + \epsilon

where $\theta$ is the true location and $\epsilon$ is the error associated with an estimate. Then $\text{SE}(\theta)$ is an estimate of the standard deviation of $\epsilon$ for a person with a given weighted score, and the separation index is obtained as follows


R_\theta = \frac{\text{var}[\theta]}{\text{var}[\hat{\theta}]} = \frac{\text{var}[\hat{\theta}] - \text{var}[\epsilon]}{\text{var}[\hat{\theta}]}

where the mean squared standard error of the person estimates gives an estimate of the variance of the errors, $\epsilon_n$, across persons. The standard errors are normally produced as a by-product of the estimation process. The separation index is typically very close in value to Cronbach's alpha.
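
The separation index can be computed directly from person estimates and their standard errors. The sketch below is a minimal illustration: the data are hypothetical, the function name is an assumption, and using the sample variance (n - 1 denominator) for the estimates is a choice made here rather than something the text specifies.

```python
from statistics import mean, variance

def separation_index(theta_hats, standard_errors):
    """Separation index: (var[theta_hat] - var[error]) / var[theta_hat]."""
    # Variance of the person estimates (sample variance; an assumption).
    var_theta_hat = variance(theta_hats)
    # Error variance estimated as the mean squared standard error across persons.
    var_error = mean(se ** 2 for se in standard_errors)
    return (var_theta_hat - var_error) / var_theta_hat

# Hypothetical person estimates and standard errors from an IRT calibration.
theta_hats = [-1.2, -0.4, 0.1, 0.7, 1.5]
standard_errors = [0.45, 0.38, 0.36, 0.39, 0.48]

r_theta = separation_index(theta_hats, standard_errors)
print(f"separation index: {r_theta:.3f}")
```

As with Cronbach's alpha, values closer to 1 indicate that most of the observed variance in the estimates reflects differences in true location rather than measurement error.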

IRT is sometimes called strong true score theory or modern mental test theory because it is a more recent body of theory and makes more explicit the hypotheses that are implicit within CTT.
