Even better: compares how much a variable
adds the to problem
The smallest possible value for VIF is 1,
which indicates the complete absence
of collinearity.
indicates a problematic amount of collinearity.
What to do:
If we would like to create a model, that predicts the flipper length not only from bill length but also bill depth, we need to define a model with multiple predictors.
4.1 Multiple Linear Regression
25 minutes
You will be able to
femur length of person
id | body height | femur length | sex | ethnicity |
---|---|---|---|---|
1 | 185 | 49 | male | caucasian |
2 | 176 | 43 | female | asian |
3 | 179 | 33 | female | african american |
... | ... | ... | ... |
id | body height | femur length | is_female | is_asian | is_caucasian |
---|---|---|---|---|---|
1 | 185 | 49 | 0 | 0 | 1 |
2 | 176 | 43 | 1 | 1 | 0 |
3 | 179 | 33 | 1 | 0 | 0 |
... | ... | ... | ... | ... |
id | body height | femur length | is_female | is_male | is_asian | is_caucasian | is_african_american |
---|---|---|---|---|---|---|---|
1 | 185 | 49 | 0 | 1 | 0 | 1 | 0 |
2 | 176 | 43 | 1 | 0 | 1 | 0 | 0 |
3 | 179 | 33 | 1 | 0 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... |
In reality, is it better to spent all the marketing budget on TV or to split it?
prediction of bank account balance from income
the linear model is no good predictor for the relationship
Imagine researching penguin in the Antarctic caused Your scale to freeze an break. However, You still want to get an estimate of the penguins weight.
Luckily, Your colleagues left You with a data set (train
) with all the other variables you can measure.
Create a multiple linear regression model to predict the penguins weight. Compare the Mean Squared Error of the model in the test
set with your colleagues.
45 minutes