Regression Analysis - Underlying Assumptions

Underlying Assumptions

Classical assumptions for regression analysis include:

  • The sample is representative of the population about which inference or prediction is to be made.
  • The error is a random variable with a mean of zero conditional on the explanatory variables.
  • The independent variables are measured with no error. (Note: If this is not so, modeling may be done instead using errors-in-variables model techniques).
  • The predictors are linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others. See Multicollinearity.
  • The errors are uncorrelated, that is, the variance–covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
  • The variance of the error is constant across observations (homoscedasticity). (Note: If not, weighted least squares or other methods might instead be used).
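
As a rough illustration of how the last two assumptions in the list might be checked, the following Python sketch fits a one-predictor line by least squares and computes two simple residual diagnostics. The first is the Durbin-Watson statistic, where values near 2 are consistent with uncorrelated errors. The second is a crude ratio of residual variance between the upper and lower halves of the predictor range, where values near 1 are consistent with homoscedasticity. The function names, the simulated data, and the variance-ratio check are assumptions made for this example; formal tests would normally be preferred in practice.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def durbin_watson(resid):
    """Serial-correlation statistic in observation order; near 2 suggests none."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den

def variance_ratio(x, resid):
    """Mean squared residual for the upper half of x divided by the lower half."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    low = [resid[i] ** 2 for i in order[:half]]
    high = [resid[i] ** 2 for i in order[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Hypothetical simulated data: one series with constant error variance,
# one whose error standard deviation grows with x.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(400)]
y_const = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
y_hetero = [1 + 2 * xi + rng.gauss(0, 0.2 + 0.3 * xi) for xi in x]

dw_const = durbin_watson(residuals(x, y_const))
ratio_const = variance_ratio(x, residuals(x, y_const))
ratio_hetero = variance_ratio(x, residuals(x, y_hetero))

print("Durbin-Watson (constant variance):", round(dw_const, 2))
print("Variance ratio (constant variance):", round(ratio_const, 2))
print("Variance ratio (growing variance):", round(ratio_hetero, 2))
```

A variance ratio well above 1 for the second series is the kind of signal that would point toward weighted least squares or a similar remedy.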

These are sufficient conditions for the least-squares estimator to possess desirable properties; in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. In practice, real data rarely satisfy all of the assumptions, and the method is used even though they do not hold exactly. The degree of departure from the assumptions can sometimes be used as a measure of how far the model is from being useful. Many of these assumptions may be relaxed in more advanced treatments. Reports of statistical analyses usually include tests on the sample data and an assessment of the methodology, in order to judge the fit and usefulness of the model.
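
The claim about unbiasedness, and the effect of violating one assumption, can be explored with a small simulation. The sketch below is a hypothetical Monte Carlo experiment, not a result from the cited literature. It repeatedly generates data from an assumed true line y = 1 + 2x with independent, constant-variance errors, then averages the least-squares slope over many replications, once with the predictor observed exactly and once with random measurement error added to it. All names and parameter values are assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def mean_slope(noise_in_x, reps=500, n=50, seed=1):
    """Average estimated slope over `reps` simulated samples.

    noise_in_x is the standard deviation of the measurement error added to
    the predictor; 0.0 means the predictor is observed without error.
    """
    rng = random.Random(seed)
    true_x = [10 * i / (n - 1) for i in range(n)]  # fixed design on [0, 10]
    total = 0.0
    for _ in range(reps):
        # Assumed true model: intercept 1, slope 2, error sd 1.
        y = [1 + 2 * xi + rng.gauss(0, 1) for xi in true_x]
        observed_x = [xi + rng.gauss(0, noise_in_x) for xi in true_x]
        total += ols_slope(observed_x, y)
    return total / reps

clean = mean_slope(0.0)   # assumptions hold
noisy = mean_slope(2.0)   # predictor measured with error

print("Average slope, exact predictor:", round(clean, 3))
print("Average slope, noisy predictor:", round(noisy, 3))
```

Under these assumed settings the average slope with an exact predictor should sit close to the true value of 2, while the noisy-predictor average should be pulled toward zero, which is the situation that errors-in-variables techniques are meant to address.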

Further assumptions concern the geometrical support of the variables (Cressie, 1996). Independent and dependent variables often refer to values measured at point locations. There may be spatial trends and spatial autocorrelation in the variables that violate the statistical assumptions of regression. Geographically weighted regression is one technique for dealing with such data (Fotheringham et al., 2002). Variables may also include values aggregated by area. With aggregated data, the Modifiable Areal Unit Problem can cause extreme variation in regression parameters (Fotheringham and Wong, 1991). When analyzing data aggregated by political boundaries, postal codes, or census areas, results may be very different with a different choice of units.
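
The sensitivity of regression parameters to the choice of areal units can be illustrated with a toy simulation. The sketch below assumes a hypothetical one-dimensional "map" of 1200 point locations where both variables follow the same spatial trend plus independent local noise, then regresses one on the other at the point level and after averaging into zones of 10 and 100 neighbouring points. The data-generating process and all names are assumptions made for this example, and it shows only the scale effect of aggregation, not the effect of redrawing boundaries at a fixed scale.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def zone_means(values, zone_size):
    """Average consecutive locations into zones of `zone_size` points each."""
    return [
        sum(values[i:i + zone_size]) / zone_size
        for i in range(0, len(values), zone_size)
    ]

# Hypothetical spatial data: a shared linear trend across locations,
# plus independent local noise in each variable.
rng = random.Random(2)
n = 1200
trend = [10 * i / n for i in range(n)]
x = [t + rng.gauss(0, 3) for t in trend]
y = [t + rng.gauss(0, 3) for t in trend]

# Refit the same regression at three levels of aggregation.
slopes = {}
for zone_size in (1, 10, 100):
    slopes[zone_size] = ols_slope(zone_means(x, zone_size),
                                  zone_means(y, zone_size))

for zone_size, slope in slopes.items():
    print("zone size", zone_size, "-> slope", round(slope, 2))
```

In this setup, averaging suppresses the local noise while preserving the trend, so the estimated slope should rise noticeably as the zones get larger, even though the underlying points never change.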
