Python Machine Learning Blueprints
上QQ阅读APP看书,第一时间看更新

Modeling

Once the data preparation is complete, the next phase is modeling. Here, you will be selecting an appropriate algorithm and using the data to train your model. There are a number of best practices to adhere to during this stage, and we will discuss them in detail, but the basic steps involve splitting your data into training, testing, and validation sets. This splitting up of the data may seem illogical—especially when more data typically yields better models—but as we'll see, doing this allows us to get better feedback on how the model will perform in the real world, and prevents us from the cardinal sin of modeling: overfitting. We will talk more about this in later chapters.