Python Machine Learning Blueprints
上QQ阅读APP看书,第一时间看更新

Visualization

So far, we have seen how to select portions of a DataFrame and how to get summary statistics from our data, but let's now move on to learning how to visually inspect the data. But first, why even bother with visual inspection? Let's see an example to understand why.

Here is the summary statistics for four distinct series of x and y values:

Based on the series having identical summary statistics, you might assume that these series would appear visually similar. You would, of course, be wrong. Very wrong. The four series are part of Anscombe's quartet, and they were deliberately created to illustrate the importance of visual data inspection. Each series is plotted as follows:

Clearly, we would not treat these datasets as identical after having visualized them. So, now that we understand the importance of visualization, let's take a look at a pair of useful Python libraries for this.