Regression
You probably already learned about regression in your high school mathematics class. The specific method you learned was probably what is called ordinary least squares (OLS) regression. This 200-year-old technique is computationally fast and can be used for many real-world problems. This chapter will start by reviewing it and showing you how to use it in scikit-learn.
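As a taste of what is to come, here is a minimal sketch of OLS regression with scikit-learn's `LinearRegression` class. The dataset is synthetic and invented for illustration: we generate points along the line y = 3x + 1 with a little noise and check that the fitted model recovers the slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y = 3x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)

# Fit an ordinary least squares model
model = LinearRegression()
model.fit(X, y)

# The fitted slope and intercept should be close to 3 and 1
print(model.coef_[0], model.intercept_)
```

The `fit`/`predict` interface shown here is shared by all the regression classes we will meet in this chapter, so swapping in a more advanced method is usually a one-line change.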
For some problems, however, this method is insufficient. This is particularly true when we have many features, and it completely fails when we have more features than data points: the system becomes underdetermined, and infinitely many solutions fit the training data perfectly. In those cases, we need more advanced methods, which are comparatively modern, with major developments happening in the last 20 years. They go by names such as Lasso, Ridge, or Elastic Net, and they are also available in scikit-learn. We will go into these in detail. In this chapter, we will learn the following:
- How to use different forms of linear regression with scikit-learn
- The importance of proper cross-validation, particularly when we have many features
- When and how to use two layers of cross-validation to set hyperparameters
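To preview the penalized methods mentioned above, here is a hedged sketch, on an invented synthetic dataset, of the scenario where OLS breaks down: 50 data points but 200 features. The Lasso still produces a sensible, sparse solution; the `alpha` value used is an arbitrary choice for illustration (later sections cover choosing it by cross-validation).

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustrative only): 50 samples, 200 features,
# but only the first 5 features actually influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
true_coef = np.zeros(200)
true_coef[:5] = 2.0
y = X @ true_coef + rng.normal(scale=0.1, size=50)

# Lasso adds an L1 penalty, which drives most coefficients to exactly zero
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# Count how many of the 200 coefficients are nonzero
print((lasso.coef_ != 0).sum())
```

`Ridge` and `ElasticNet` follow the same interface with different penalties (L2, and a mix of L1 and L2, respectively); we will compare their behavior in detail in this chapter.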