
Multidimensional regression

So far, we have only used a single variable for prediction: the number of rooms per dwelling. This is, obviously, not the best we can do. We will now use all the data we have to fit a model, using multidimensional regression, trying to predict a single output (the average house price) from multiple inputs.

The code looks very much like before. In fact, it's even simpler, as we can now pass the value of boston.data directly to the fit method:

x = boston.data    # all input variables, not just the number of rooms
y = boston.target  # the house prices
lr.fit(x, y)
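
If you are starting from a fresh session, the surrounding setup looks roughly like the following sketch. It assumes scikit-learn's load_boston loader, which was removed in scikit-learn 1.2, so an older release is required:

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

boston = load_boston()  # assumes scikit-learn < 1.2, where load_boston still exists
x = boston.data         # all 13 input variables
y = boston.target       # the house prices
lr = LinearRegression()
lr.fit(x, y)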

Using all the input variables, the root mean squared error is only 4.7, which corresponds to a coefficient of determination of 0.74 (the code to compute these is the same as in the previous example). This is better than what we had before, which indicates that the extra variables did help. But we can no longer easily display the regression line as we did before, because we now have a regression hyperplane in a 14-dimensional space (13 input dimensions plus the output) instead of a single line!
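
For reference, those two numbers can be computed with scikit-learn's metrics helpers, as in this minimal sketch (the earlier example in the book may compute them by hand):

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

p = lr.predict(x)
rmse = np.sqrt(mean_squared_error(y, p))  # root mean squared error, about 4.7
r2 = r2_score(y, p)                       # coefficient of determination, about 0.74
print('RMSE: {:.1f}, R2: {:.2f}'.format(rmse, r2))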

One good solution in this situation is to plot the prediction versus the actual value. The code is as follows:

p = lr.predict(x)
fig, ax = plt.subplots()
ax.scatter(p, y)
ax.set_xlabel('Predicted price')
ax.set_ylabel('Actual price')
# diagonal of perfect agreement: predicted == actual
ax.plot([y.min(), y.max()], [y.min(), y.max()], ':')

The last line plots a diagonal that corresponds to perfect agreement; if the model made no errors, all points would lie on this diagonal. This aids with visualization: