Python Machine Learning Blueprints

The seaborn library

The next visualization library we'll look at is called seaborn (http://seaborn.pydata.org/index.html). It was created specifically for statistical visualization, and it works especially well with pandas DataFrames in which the columns are features and the rows are observations. This layout is called tidy data, and it is the most common form for machine learning applications.
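To make the tidy layout concrete, here is a minimal sketch of a three-row DataFrame in that shape. The column names mirror the ones used in the plotting code in this section; the measurements are illustrative values typed in by hand, and the integer species codes (0, 1, 2) are an assumption that matches the setosa (0) and virginica (2) labels used later.

```python
import pandas as pd

# Illustrative tidy DataFrame: one row per observation (flower),
# one column per feature, plus a label column.
# The values are hand-typed examples, not a loaded dataset.
tidy = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0, 6.3],
    "sepal width (cm)": [3.5, 3.2, 3.3],
    "petal length (cm)": [1.4, 4.7, 6.0],
    "petal width (cm)": [0.2, 1.4, 2.5],
    "species": [0, 1, 2],  # assumed integer encoding of the class label
})

# Because the data is tidy, splitting features from labels is a column selection.
feature_columns = [col for col in tidy.columns if col != "species"]
X = tidy[feature_columns]
y = tidy["species"]
```

With the data in this shape, a plotting call can refer to a feature or a label simply by its column name.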

Let's now take a look at the power of seaborn:

import seaborn as sns 
sns.pairplot(df, hue='species') 

With just those two lines of code, we get the following:

Seaborn plot

Having just detailed the intricate nuances of matplotlib, you will immediately appreciate the simplicity with which we generated this plot. All of our features have been plotted against each other and properly labeled with just two lines of code. You might wonder if I just wasted dozens of pages teaching you matplotlib when seaborn makes these types of visualizations so simple. Well, that isn't the case, as seaborn is built on top of matplotlib. In fact, you can use all of what you learned about matplotlib to modify and work with seaborn. Let's take a look at another visualization:

# Assumes plt (matplotlib.pyplot), sns (seaborn), and df from the earlier code
fig, ax = plt.subplots(2, 2, figsize=(7, 7))
sns.set(style='white', palette='muted')
# One violin plot per feature, each drawn on its own subplot
sns.violinplot(x=df['species'], y=df['sepal length (cm)'], ax=ax[0,0])
sns.violinplot(x=df['species'], y=df['sepal width (cm)'], ax=ax[0,1])
sns.violinplot(x=df['species'], y=df['petal length (cm)'], ax=ax[1,0])
sns.violinplot(x=df['species'], y=df['petal width (cm)'], ax=ax[1,1])
fig.suptitle('Violin Plots', fontsize=16, y=1.03)
# Rotate the x tick labels on every subplot
for i in ax.flat:
    plt.setp(i.get_xticklabels(), rotation=-90)
fig.tight_layout()

The preceding code generates the following output:

Violin Plots

Here, we have generated a violin plot for each of the four features. A violin plot displays the distribution of a feature: the wider the violin at a given value, the more observations fall near that value. For example, you can easily see that the petal length of setosa (0) is highly clustered between 1 cm and 2 cm, while virginica (2) is much more dispersed, from nearly 4 cm to over 7 cm. You will also notice that we have used much of the same code we used when constructing the matplotlib graphs. The main difference is the sns.violinplot() calls, which take the place of the earlier ax.plot() calls. We have also added a title above all of the subplots, rather than over each individually, with the fig.suptitle() function. One other notable addition is the iteration over each of the subplots to change the rotation of the xticklabels. We loop over ax.flat, which is an attribute rather than a method and yields each subplot axis in turn, and set the property on each one with plt.setp(). This saves us from having to type out ax[0][0] through ax[1][1] individually and set the properties one by one, as we did in the earlier matplotlib subplot code.
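The ax.flat and plt.setp() pattern can be tried in isolation. The following is a minimal sketch that uses only matplotlib, with a placeholder line plotted on each subplot so it does not depend on df; the Agg backend is chosen here only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 2, figsize=(7, 7))

# ax is a 2x2 NumPy array of Axes; ax.flat is an iterator attribute (no parentheses)
# that visits every subplot in row-major order.
for axis in ax.flat:
    axis.plot([0, 1, 2], [0, 1, 4])  # placeholder data so each subplot has ticks
    # plt.setp applies the same property to every tick label in one call
    plt.setp(axis.get_xticklabels(), rotation=-90)

fig.tight_layout()

# Collect the rotation of every x tick label so the effect can be inspected
rotations = [label.get_rotation()
             for axis in ax.flat
             for label in axis.get_xticklabels()]
```

The same loop works for any property that plt.setp() accepts, such as fontsize or color, without indexing each subplot by hand.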

There are hundreds of styles of graphs you can generate using matplotlib and seaborn, and I highly recommend digging into the documentation for these two libraries; it will be time well spent. Even so, the graphs I have detailed in the preceding section should go a long way toward helping you understand the dataset you have, which in turn will help you when building your machine learning models.