
Highlighting data points with influence plots
Influence plots combine three quantities for each data point after a fit: its residual, its leverage, and its influence. They are similar to bubble plots. The scaled residual is plotted on the vertical axis, and a large value can indicate that a data point is an outlier. To understand influence plots, consider the quantities defined in equations (2.1) to (2.5).

According to the statsmodels documentation, the residuals are scaled by their standard deviation (2.1). In (2.2), n is the number of observations and p is the number of regressors. The so-called hat matrix is given by (2.3).
The diagonal elements of the hat matrix give a special metric called leverage. Leverage serves as the horizontal axis of influence plots and indicates potential influence. The influence itself determines the size of the plotted points. Influential points tend to have both large residuals and high leverage. To measure influence, statsmodels can use either Cook's distance (2.4) or DFFITS (2.5).
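For reference, a common textbook formulation of these quantities is sketched below. The notation is an assumption and may differ in detail from the numbered equations: here $\hat{\varepsilon}_i$ is the raw residual of observation $i$, $X$ is the design matrix, $h_{ii}$ is the leverage, $p$ is taken to count all columns of $X$ (including the intercept), and $t_i$ is the externally studentized residual, that is, the residual scaled with a variance estimate computed without observation $i$.

```latex
\begin{align*}
r_i &= \frac{\hat{\varepsilon}_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}
  && \text{(scaled residual)} \\
\hat{\sigma}^2 &= \frac{1}{n - p} \sum_{j=1}^{n} \hat{\varepsilon}_j^{\,2}
  && \text{(residual variance estimate)} \\
H &= X \left(X^{T} X\right)^{-1} X^{T}, \qquad h_{ii} = H_{ii}
  && \text{(hat matrix and leverage)} \\
D_i &= \frac{\hat{\varepsilon}_i^{\,2}}{p\,\hat{\sigma}^2}
       \cdot \frac{h_{ii}}{\left(1 - h_{ii}\right)^2}
  && \text{(Cook's distance)} \\
\mathrm{DFFITS}_i &= t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
  && \text{(DFFITS)}
\end{align*}
```

Both influence measures grow with the size of the residual and with the leverage, which is why influential points tend to sit away from the horizontal zero line and toward the right of the plot.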
How to do it...
- The imports are as follows:
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from dautil import data
- Get the available country codes:
dawb = data.Worldbank()
countries = dawb.get_countries()[['name', 'iso2c']]
- Load the data from the World Bank:
population = dawb.download(indicator=[dawb.get_name('pop_grow'),
                                      dawb.get_name('gdp_pcap'),
                                      dawb.get_name('primary_education')],
                           country=countries['iso2c'],
                           start=2014, end=2014)
population = dawb.rename_columns(population)
- Define an ordinary least squares model, as follows:
population_model = ols("pop_grow ~ gdp_pcap + primary_education", data=population).fit()
- Display an influence plot of the model using Cook's distance:
%matplotlib inline
fig, ax = plt.subplots(figsize=(19.2, 14.4))
fig = sm.graphics.influence_plot(population_model, ax=ax, criterion="cooks")
plt.grid()
Refer to the following plot for the end result:

The code is in the highlighting_influence.ipynb file in this book's code bundle.
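To make the plotted quantities more concrete, the following is a minimal sketch that computes leverage, studentized residuals, Cook's distance, and DFFITS by hand with NumPy. It uses synthetic data rather than the World Bank data, and the variable names (x1, x2, y, and so on) are made up for illustration. The formulas are the usual textbook ones and are only assumed to match what statsmodels computes internally.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the World Bank data).
rng = np.random.default_rng(42)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Turn the last observation into a high-leverage outlier.
x1[-1] = 6.0
y[-1] = -5.0

# Design matrix with an intercept column; p counts all columns.
X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]

# Hat matrix; its diagonal holds the leverage of each observation.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Raw residuals and the residual variance estimate.
resid = y - H @ y
rss = resid @ resid
sigma2 = rss / (n - p)

# Internally studentized residuals (scaled by their standard deviation).
student_int = resid / np.sqrt(sigma2 * (1 - leverage))

# Cook's distance.
cooks = student_int ** 2 / p * leverage / (1 - leverage)

# Externally studentized residuals use a variance estimate that leaves
# out the observation in question; DFFITS is built on top of them.
sigma2_del = (rss - resid ** 2 / (1 - leverage)) / (n - p - 1)
student_ext = resid / np.sqrt(sigma2_del * (1 - leverage))
dffits = student_ext * np.sqrt(leverage / (1 - leverage))

most_influential = int(np.argmax(cooks))
print("Most influential observation (by Cook's distance):", most_influential)
print("Its leverage:", leverage[most_influential])
```

In an influence plot of this synthetic fit, the modified observation would be expected to appear far to the right (high leverage), away from zero on the vertical axis, and with the largest bubble.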
See also
- The Wikipedia page about Cook's distance at https://en.wikipedia.org/wiki/Cook%27s_distance (retrieved July 2015)
- The Wikipedia page about DFFITS at https://en.wikipedia.org/wiki/DFFITS (retrieved July 2015)