Regression Models
Regression models involve the following variables:
- The unknown parameters, denoted as β, which may represent a scalar or a vector.
- The independent variables, X.
- The dependent variable, Y.
In various fields of application, different terminologies are used in place of dependent and independent variables.
A regression model relates Y to a function of X and β.
The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis, the form of the function f must be specified. Sometimes the form of this function is based on knowledge about the relationship between Y and X that does not rely on the data. If no such knowledge is available, a flexible or convenient form for f is chosen.
Assume now that the vector of unknown parameters β is of length k. In order to perform a regression analysis the user must provide information about the dependent variable Y:
- If N data points of the form (Y,X) are observed, where N < k, most classical approaches to regression analysis cannot be performed: since the system of equations defining the regression model is underdetermined, there is not enough data to recover β.
- If exactly N = k data points are observed, and the function f is linear, the equations Y = f(X, β) can be solved exactly rather than approximately. This reduces to solving a set of N equations with N unknowns (the elements of β), which has a unique solution as long as the X are linearly independent. If f is nonlinear, a solution may not exist, or many solutions may exist.
- The most common situation is where N > k data points are observed. In this case, there is enough information in the data to estimate a unique value for β that best fits the data in some sense, and the regression model when applied to the data can be viewed as an overdetermined system in β.
In the last case, the regression analysis provides the tools for:
- Finding a solution for unknown parameters β that will, for example, minimize the distance between the measured and predicted values of the dependent variable Y (also known as method of least squares).
- Under certain statistical assumptions, the regression analysis uses the surplus of information to provide statistical information about the unknown parameters β and predicted values of the dependent variable Y.
Read more about this topic: Regression Analysis
Famous quotes containing the word models:
“The parents who wish to lead a quiet life I would say: Tell your children that they are very naughtymuch naughtier than most children; point to the young people of some acquaintances as models of perfection, and impress your own children with a deep sense of their own inferiority. You carry so many more guns than they do that they cannot fight you. This is called moral influence and it will enable you to bounce them as much as you please.”
—Samuel Butler (18351902)