Dummy Variable (statistics) - Precautions in The Usage of Dummy Variables

Precautions in The Usage of Dummy Variables

1. If a qualitative variable introduced as an explanatory variable has n categories, only (n − 1) dummy variables should be introduced. For example, if the qualitative variable is gender, there are 2 categories (female / male). A dummy should be created for only one category, either female or male, not both, and the regression equation should include that dummy along with the intercept term. If this rule is not followed, the regression falls into the dummy variable trap: a situation of perfect collinearity (or perfect multicollinearity) among the explanatory variables, i.e., an exact linear relationship among them. The reason is as follows. The intercept corresponds to a regressor that implicitly takes the value 1 for every observation. Because each observation belongs to exactly one category, adding up the columns of all n dummies for the qualitative variable produces exactly the intercept column. This exact linear dependence is what creates the dummy variable trap, and it means the regression coefficients cannot be uniquely estimated. The trap can be avoided either by introducing (n − 1) dummies plus the intercept term, or by introducing n dummies and no intercept term. In both cases there is no exact linear relationship among the explanatory variables. The latter approach is not recommended, however, because it makes it harder to test whether the dummy categories differ from a base category. Hence the (n − 1) dummies plus intercept approach is the one generally advised for forming the regression model.
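
A minimal sketch of the trap, using only the Python standard library and a small made-up gender sample. The helper names `rank` and `design_matrix` are hypothetical, written only for this illustration. The code builds three design matrices and computes their rank exactly with `fractions.Fraction`: with an intercept plus a dummy for both categories the rank is smaller than the number of columns, which is the exact linear dependence described above, while the other two encodings have full column rank.

```python
from fractions import Fraction


def rank(matrix):
    """Return the rank of a matrix (list of rows) using exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((i for i in range(pivot_row, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends linearly on earlier columns
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate this column from every row below the pivot.
        for i in range(pivot_row + 1, n_rows):
            factor = rows[i][col] / rows[pivot_row][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return pivot_row


def design_matrix(categories, dummy_levels, intercept=True):
    """Build one row per observation: an optional intercept column of 1s,
    then one 0/1 dummy column for each level in dummy_levels."""
    return [([1] if intercept else []) + [1 if c == level else 0 for level in dummy_levels]
            for c in categories]


gender = ["female", "male", "male", "female", "male"]  # made-up sample

trap = design_matrix(gender, ["female", "male"])                        # intercept + n dummies
safe = design_matrix(gender, ["female"])                                # intercept + (n - 1) dummies; male is the base
no_const = design_matrix(gender, ["female", "male"], intercept=False)   # n dummies, no intercept

# In the trap, the dummy columns add up to the intercept column in every row.
print(all(row[1] + row[2] == row[0] for row in trap))  # True
print(rank(trap), len(trap[0]))          # 2 3  -> rank-deficient
print(rank(safe), len(safe[0]))          # 2 2  -> full column rank
print(rank(no_const), len(no_const[0]))  # 2 2  -> full column rank
```

A rank below the column count means the cross-product matrix X'X is singular, so ordinary least squares has no unique solution for the trapped design.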

2. The category for which no dummy is assigned is known as the base group or benchmark category. It is the omitted category, and all comparisons are made against it, which is why it is also called the comparison, reference or control category. This category should be chosen carefully and then excluded when the dummy variables are assigned.

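3. Since the base category has no dummy variable, its mean value is given by the intercept term alone (strictly, this holds when the dummies are the only explanatory variables; with additional regressors, the intercept is the base category's expected value when those regressors are all zero). The intercept is therefore the value against which the categories that have dummies are compared.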
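
For the two-category gender example, assume a model in which the dummy is the only regressor, with D_i = 1 if observation i is female and D_i = 0 if male (the base category), and an error term with zero conditional mean. The symbols below are introduced for illustration. Taking conditional expectations shows why the intercept is the base category's mean and why the dummy coefficient is a difference in means:

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 D_i + u_i, \qquad E(u_i \mid D_i) = 0, \\
E(Y_i \mid D_i = 0) &= \beta_0 \quad \text{(male, base category)}, \\
E(Y_i \mid D_i = 1) &= \beta_0 + \beta_1 \quad \text{(female)}, \\
\beta_1 &= E(Y_i \mid D_i = 1) - E(Y_i \mid D_i = 0).
\end{aligned}
```

In this dummy-only model the sample analogue holds exactly: the OLS intercept equals the sample mean of the base category.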

4. For comparison against the benchmark category, the "slope" coefficients of the dummy variables in the regression equation are examined. These coefficients are called differential intercept coefficients because they indicate by how much the mean value of each dummy category differs from the mean value of the benchmark category (which is given by the intercept).
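
The following sketch checks points 3 and 4 numerically on hypothetical toy data. The helper names `solve` and `ols` are assumptions made for this illustration, not a library API. It fits a regression of y on an intercept and (n − 1) = 2 dummies by solving the normal equations exactly with `fractions.Fraction`, then compares the estimates with group means computed directly. Category A is the benchmark: the intercept should match A's mean, and each dummy coefficient should match the difference between that category's mean and A's mean.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b exactly by Gauss-Jordan elimination.
    Assumes a is nonsingular (true here because the dummy trap is avoided)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Clear this column from every other row.
        for i in range(n):
            if i != col and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [v - factor * p for v, p in zip(m[i], m[col])]
    return [row[n] for row in m]


def ols(x, y):
    """Ordinary least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


groups = ["A", "A", "B", "B", "B", "C", "C"]  # hypothetical category of each observation
y = [10, 12, 15, 17, 16, 20, 22]             # hypothetical response values
base = "A"                                   # benchmark category: gets no dummy
others = ["B", "C"]                          # the (n - 1) categories that get dummies

# Design matrix: intercept column, then one dummy per non-base category.
x = [[1] + [1 if g == level else 0 for level in others] for g in groups]
intercept, *diffs = ols(x, y)

# Group means computed directly, for comparison.
means = {level: Fraction(sum(v for g, v in zip(groups, y) if g == level), groups.count(level))
         for level in [base] + others}

print("intercept:", intercept, "| mean of A:", means[base])  # intercept: 11 | mean of A: 11
for level, d in zip(others, diffs):
    # Differential intercept vs. difference in means: prints "B 5 5", then "C 10 10"
    print(level, d, means[level] - means[base])
```

The match is exact here because the dummies are the only regressors.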

5. The choice of benchmark category is entirely at the researcher's discretion. Whichever category is chosen, the researcher must keep in mind that the intercept then represents the mean value of that benchmark category and that every other dummy category is compared against it.
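
To see that the choice of benchmark changes only the parameterization, consider a hypothetical qualitative variable with three categories A, B and C, category means \mu_A, \mu_B, \mu_C, and dummies as the only regressors (these symbols are introduced for illustration). Switching the base from A to B gives:

```latex
\begin{aligned}
\text{Base } A:\quad & Y_i = \beta_0 + \beta_1 D_{Bi} + \beta_2 D_{Ci} + u_i,
  && \mu_A = \beta_0,\ \mu_B = \beta_0 + \beta_1,\ \mu_C = \beta_0 + \beta_2 \\
\text{Base } B:\quad & Y_i = \gamma_0 + \gamma_1 D_{Ai} + \gamma_2 D_{Ci} + u_i,
  && \gamma_0 = \mu_B = \beta_0 + \beta_1,\ \gamma_1 = \mu_A - \mu_B = -\beta_1,\ \gamma_2 = \mu_C - \mu_B = \beta_2 - \beta_1
\end{aligned}
```

Both parameterizations imply the same three category means; only the intercept and the differential intercept coefficients change, each now measured relative to the new benchmark.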
