Polynomial Regression in 3 minutes
Content summary:
- How to understand polynomial regression?
- Why would we want to apply polynomial regression?
- How to use polynomial regression in Python?
- What should we beware of?
- Key takeaways
How to understand polynomial regression?
Polynomial regression is a type of linear regression. It generates new features from the original input features (their powers and their products with one another) and then trains a linear model on both the original features and these extra polynomial features. The model stays linear in its coefficients, but this feature transformation lets it capture a nonlinear relationship between the input features and the output/target variable. (If you want the statistical definition of polynomial regression, Wikipedia has a good explanation of it.)
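To make "linear in the coefficients" concrete, here is a minimal NumPy sketch. The data are synthetic, and the quadratic y = 1 + 2x - 3x² is an assumption chosen for illustration. We add x² as an extra feature and solve an ordinary least-squares problem, and the linear solver recovers the quadratic's coefficients.

```python
import numpy as np

# Synthetic data from a known quadratic: y = 1 + 2x - 3x^2 (no noise).
x = np.linspace(-2.0, 2.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2

# Design matrix: a bias column, the original feature, and its square.
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares. The model is linear in the coefficients w,
# even though the fitted curve is quadratic in x.
w, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(w, 6))
```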
Why would we want to apply polynomial regression?
Polynomial regression can describe a more complex relationship than simple linear regression, and can therefore give more accurate predictions. As we mentioned earlier, polynomial feature transformation generates new features from the original ones, such as squared terms and products of pairs of features. The product terms let the model capture interactions between the original features. With these additional features, we have richer inputs to train the linear model, potentially achieving higher accuracy.
Here is a simple scenario:
Suppose we are trying to predict house price (the output variable) using house size and property tax (two input features). We can use simple linear regression to predict the house price, in which case the modeled relationship between them is linear.
However, the house price may not vary as a linear function of house size and property tax (two features) but as a quadratic function of them. A quadratic function involves five features: house size, property tax, house size squared, property tax squared, and the product of house size and property tax. In that case, the relationship is non-linear.
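Writing house size as x_1, property tax as x_2, and the predicted price as ŷ (notation introduced here for illustration), the two models differ as follows. Both are linear in the weights w, which is why the quadratic model can still be fit with linear regression.

```latex
% Simple linear regression: two features
\hat{y} = w_0 + w_1 x_1 + w_2 x_2

% Degree-2 polynomial regression: five features
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2
```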
In this case, polynomial regression can be a better choice for predicting the house price. By adding three extra polynomial features generated from house size and property tax (the two squares and their product), we can capture both the curvature in each feature and the influence of their interaction on the house price. With these additional features, the model is more flexible and can potentially predict house prices more accurately.
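The house-price scenario can be sketched with synthetic data. In this minimal NumPy sketch, the two columns stand in for standardized house size and property tax, and the target is a hypothetical price that depends partly on their product. The helper names `quadratic_features` and `fit_r2` are illustrative and do not come from any library. The linear model sees only the two original features, while the polynomial model also sees the three extra ones.

```python
import numpy as np

def quadratic_features(X):
    """Degree-2 expansion of two features: [x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

def fit_r2(features, y):
    """Fit ordinary least squares with an intercept and return the training R^2."""
    A = np.column_stack([np.ones(len(y)), features])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ w
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
# Synthetic, standardized "size" and "tax" columns (not real housing data).
X = rng.normal(size=(200, 2))
# Hypothetical target with an interaction term plus a little noise.
y = (3.0 + 1.5 * X[:, 0] + 0.5 * X[:, 1]
     + 2.0 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200))

r2_linear = fit_r2(X, y)
r2_quadratic = fit_r2(quadratic_features(X), y)
print(f"linear R^2: {r2_linear:.3f}, quadratic R^2: {r2_quadratic:.3f}")
```

Because the quadratic model's features include the linear ones, its training R² can never be lower than the linear model's. The large gap here comes from the interaction term, which the linear model cannot represent. Training R² alone does not reveal overfitting, so in practice the comparison should be made on held-out data.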
How to use polynomial regression in Python?
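A simple coding workflow is to transform the original input features into polynomial features and then train an ordinary linear regression model on the transformed features.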
What should we beware of?
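Overfitting. As we add more polynomial features (for example, by raising the polynomial degree), the model becomes more and more complex and can start fitting noise in the training data instead of the underlying pattern, which leads to overfitting.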
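How fast does the feature count grow? With n input features and all terms up to total degree k, the number of generated features (excluding the constant bias column) is C(n + k, k) - 1. The short sketch below evaluates this formula. The helper name `n_poly_features` is hypothetical, but the count should match the output width of scikit-learn's `PolynomialFeatures(degree=k, include_bias=False)`.

```python
from math import comb

def n_poly_features(n_features, degree):
    """Count of polynomial terms with total degree 1..degree (no bias column)."""
    return comb(n_features + degree, degree) - 1

# Feature counts for degrees 1 to 4, for a few input sizes.
for n in (2, 10, 50):
    print(n, [n_poly_features(n, k) for k in (1, 2, 3, 4)])
```

For the house-price example (2 features, degree 2), this gives the 5 features mentioned earlier. With 10 features and degree 3, it already gives 285.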
Finding the proper degree of the polynomial (for example, with cross-validation) and using a regularization method like ridge regression can ease the pain.
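The following is a minimal sketch combining both remedies, assuming scikit-learn. `GridSearchCV` picks the polynomial degree and the ridge penalty `alpha` by 5-fold cross-validation. The data are synthetic, and the grid values and the `StandardScaler` step are illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with an interaction term and a squared term.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] * X[:, 1] - X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Expand, standardize (so the ridge penalty treats features comparably), fit ridge.
pipe = make_pipeline(
    PolynomialFeatures(include_bias=False),
    StandardScaler(),
    Ridge(),
)

# Search over polynomial degree and regularization strength with 5-fold CV.
param_grid = {
    "polynomialfeatures__degree": [1, 2, 3, 4, 5],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Scaling before `Ridge` matters because the penalty acts on coefficient size, and higher-degree features can have very different ranges from the original ones.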
Key takeaways
- Polynomial regression is still linear regression (it is linear in its coefficients).
- Polynomial regression can generate new features.
- Polynomial regression can capture non-linear relationships.
- Beware of overfitting.