Underlying Assumptions
Classical assumptions for regression analysis include:
- The sample is representative of the population about which inferences or predictions are to be made.
- The error is a random variable with a mean of zero conditional on the explanatory variables.
- The independent variables are measured with no error. (Note: If this is not so, errors-in-variables models may be used instead.)
- The predictors are linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others. See Multicollinearity.
- The errors are uncorrelated, that is, the variance–covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
- The variance of the error is constant across observations (homoscedasticity). (Note: If not, weighted least squares or other methods might instead be used).
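The rank condition and the homoscedasticity note above can be made concrete with a small least-squares sketch. The plain-Python code below solves the normal equations X'WX b = X'Wy. With all weights equal to 1 it is ordinary least squares. With weights proportional to the inverse of each observation's error variance it becomes weighted least squares. If the predictors are linearly dependent, X'WX is singular and the sketch raises an error instead of returning estimates.

The function names `least_squares` and `_solve` are hypothetical, chosen for illustration. This is a teaching sketch: forming X'WX explicitly keeps it short, but production code would normally use a numerically stabler method such as a QR decomposition.

```python
from typing import Optional, Sequence


def _solve(a: list[list[float]], b: list[float], tol: float = 1e-12) -> list[float]:
    """Illustrative helper: solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            # A (near-)zero pivot means X'WX is singular: some predictor is a
            # linear combination of the others (perfect multicollinearity).
            raise ValueError("predictors are linearly dependent; X'WX is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def least_squares(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> list[float]:
    """Hypothetical helper: solve (X'WX) b = X'Wy for the coefficient vector b.

    weights=None gives ordinary least squares (every weight 1); weights
    proportional to 1 / Var(error_i) give weighted least squares.
    """
    n, p = len(X), len(X[0])
    w = list(weights) if weights is not None else [1.0] * n
    xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return _solve(xtwx, xtwy)
```

Each row of `X` holds one observation, with a leading 1 for the intercept. A design matrix whose third column is exactly twice its second column triggers the `ValueError`, which mirrors the linear-independence assumption.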
These are sufficient conditions for the least-squares estimator to possess desirable properties; in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators (the Gauss–Markov theorem). In practice, actual data rarely satisfy all of the assumptions, yet the method is used anyway. The degree to which the data depart from the assumptions can sometimes be used as a measure of how far the model is from being useful. Many of these assumptions may be relaxed in more advanced treatments. Reports of statistical analyses usually present the results of tests on the sample data and describe the methods used to assess the fit and usefulness of the model.
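The unbiasedness property can be checked by simulation. The sketch below is a hypothetical Monte Carlo experiment, not taken from the source. It repeatedly generates data from y = β0 + β1·x + error, using fixed x values and independent, constant-variance normal noise. It refits a simple linear regression each time and averages the estimates.

With `gamma = 0` the error has mean zero at every x, so the averages should land close to the true coefficients. A non-zero `gamma` adds `gamma * x` to the error, which violates the zero-conditional-mean assumption; the average slope should then drift to roughly β1 + gamma. The names `ols_fit` and `mean_estimates` and all default settings are assumptions of this sketch.

```python
import random
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form simple linear regression: returns (intercept, slope)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def mean_estimates(beta0: float, beta1: float, gamma: float = 0.0,
                   n: int = 20, reps: int = 2000, sigma: float = 1.0,
                   seed: int = 0) -> tuple[float, float]:
    """Hypothetical Monte Carlo: average OLS estimates over `reps` simulated samples."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]  # fixed (non-random) design points
    intercepts, slopes = [], []
    for _ in range(reps):
        # Error = gamma * x + independent N(0, sigma^2) noise.
        # gamma == 0: E[error | x] = 0, so the classical assumption holds.
        # gamma != 0: the error's mean depends on x, violating the assumption.
        y = [beta0 + beta1 * xi + gamma * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = ols_fit(x, y)
        intercepts.append(b0)
        slopes.append(b1)
    return fmean(intercepts), fmean(slopes)
```

For example, `mean_estimates(1.0, 2.0)` should return values near (1.0, 2.0), while `mean_estimates(1.0, 2.0, gamma=0.5)` should give an average slope near 2.5 rather than the true 2.0.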
Assumptions also concern the geometrical support of the variables (Cressie, 1996). Independent and dependent variables often refer to values measured at point locations. There may be spatial trends and spatial autocorrelation in the variables that violate the statistical assumptions of regression. Geographically weighted regression is one technique for dealing with such data (Fotheringham et al., 2002). Variables may also include values aggregated by areas. With aggregated data, the Modifiable Areal Unit Problem can cause extreme variation in regression parameters (Fotheringham and Wong, 1991). When data aggregated by political boundaries, postal codes or census areas are analyzed, the results may differ greatly with a different choice of units.
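Geographically weighted regression fits a separate, locally weighted regression at each location, giving nearby observations more weight than distant ones. The following is a minimal single-predictor sketch. It assumes a fixed Gaussian kernel with weights exp(-0.5 * (d/h)^2), where d is the distance between locations and h is the bandwidth.

The function name `gwr_local_fit` and the fixed bandwidth are illustrative assumptions. In practice the bandwidth is usually selected from the data, for example by cross-validation, and a full implementation also provides inference tools that this sketch omits. Each location receives its own intercept and slope, so a relationship that changes across space is not forced into one global coefficient.

```python
import math
from typing import Sequence


def gwr_local_fit(coords: Sequence[tuple[float, float]],
                  x: Sequence[float], y: Sequence[float],
                  bandwidth: float) -> list[tuple[float, float]]:
    """Hypothetical sketch: one weighted simple regression per location.

    Returns a list of (intercept, slope) pairs, one for each location.
    """
    results = []
    for ui, vi in coords:
        # Fixed Gaussian kernel (an assumption of this sketch): weight decays
        # with Euclidean distance from the regression point (ui, vi).
        w = [math.exp(-0.5 * (math.hypot(uj - ui, vj - vi) / bandwidth) ** 2)
             for uj, vj in coords]
        sw = sum(w)
        # Weighted means, then the weighted least-squares slope and intercept.
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx
        results.append((ybar - slope * xbar, slope))
    return results
```

Consider two well-separated groups of locations whose true slopes differ, say 1 in one group and 3 in the other. A small bandwidth should let each group recover its own slope, whereas a single global regression would blend the two.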