Dealing with messy data
As a dataset grows, so do its inconsistencies and errors. Whether as a result of human error, system failure, or changes in the data structure over time, real-world data is rife with invalid, absurd, or missing values. Even when the dataset is spotless, the nature of some variables needs to be adapted to the model. We look at the most common data anomalies and characteristics that need to be corrected in the context of Amazon ML linear models.