Starting with a simple straight line
Let's assume for a moment that the underlying model is a straight line. The challenge then is how to place that line in the chart so that it yields the smallest approximation error. NumPy's polyfit() function does exactly that. Given data x and y and the desired order of the polynomial (a straight line has an order of 1), it finds the model function that minimizes the error function defined earlier:
fp1 = np.polyfit(x, y, 1)
The polyfit() function returns the parameters of the fitted model function, fp1, ordered from the highest power of x down to the constant term:
>>> print("Model parameters: %s" % fp1)
Model parameters: [ 2.59619213 989.02487106]
This means the best straight line fit is the following function:
f(x) = 2.59619213 * x + 989.02487106
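To make the coefficient order concrete, here is a minimal sketch on synthetic data. The names x_demo and y_demo and the line y = 2x + 5 are assumptions made up for illustration; they are not the web traffic data used in this chapter.

```python
import numpy as np

# Synthetic data (illustrative assumption, not the web traffic data):
# ten points lying exactly on the line y = 2x + 5.
x_demo = np.arange(10, dtype=float)
y_demo = 2.0 * x_demo + 5.0

# Degree 1 requests a straight line. polyfit returns the coefficients
# with the highest power first, so the slope comes before the intercept.
slope, intercept = np.polyfit(x_demo, y_demo, 1)
print("slope=%.3f intercept=%.3f" % (slope, intercept))
```

Because the synthetic points contain no noise, the fit should recover the slope and intercept used to generate them, up to floating-point rounding.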
We then use poly1d() to create a model function from the model parameters:
>>> f1 = np.poly1d(fp1)
>>> print(error(f1, x, y))
317389767.34
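The following sketch shows what poly1d() gives us on a tiny hand-made dataset. It assumes that error() is the sum of squared differences between model and data; if the error function defined earlier differs, substitute that definition. The names x_demo, y_demo, and f_demo are hypothetical.

```python
import numpy as np

def error(f, x, y):
    # Assumed definition: sum of squared differences between the
    # model's predictions f(x) and the observed values y.
    return np.sum((f(x) - y) ** 2)

# Tiny hand-made dataset (illustrative assumption).
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 2.0, 5.0])

coeffs = np.polyfit(x_demo, y_demo, 1)
f_demo = np.poly1d(coeffs)  # callable model built from the coefficients

# Calling the model is the same as evaluating slope * x + intercept.
manual = coeffs[0] * x_demo + coeffs[1]
print(np.allclose(f_demo(x_demo), manual))

fitted_err = error(f_demo, x_demo, y_demo)

# Any other straight line should give a larger squared error.
nudged = np.poly1d([coeffs[0] + 0.1, coeffs[1]])
nudged_err = error(nudged, x_demo, y_demo)
print(fitted_err < nudged_err)
```

The second check illustrates what "best fit" means here: nudging the slope away from the fitted value can only increase the squared error on the same data.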
We can now use f1() to plot our first trained model. We have already implemented plot_web_traffic in a way that lets us easily add models to the plot: it accepts a list of models, which currently contains only one:
plot_web_traffic(x, y, [f1])
This will produce the following plot:
It seems like the first four weeks are not that far off, although we can clearly see that there is something wrong with our initial assumption that the underlying model is a straight line. Also, how good or how bad is an error of 317,389,767.34, really?
The absolute value of the error is seldom of use in isolation. However, when comparing two competing models, we can use their errors to judge which one of them is better. Although our first model clearly is not the one we would use, it serves a very important purpose in the workflow. We will use it as our baseline until we find a better one. Whatever model we come up with in the future, we will compare it against the current baseline.
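As a rough sketch of how such a comparison might look, the snippet below fits a straight-line baseline and a hypothetical second candidate to synthetic data and compares their errors. The curved data, the choice of a degree-2 candidate, and the assumed sum-of-squares error() are all illustrative assumptions, not results from the web traffic data.

```python
import numpy as np

def error(f, x, y):
    # Assumed definition: sum of squared differences.
    return np.sum((f(x) - y) ** 2)

# Synthetic, deliberately curved data (illustrative assumption).
x_demo = np.linspace(0.0, 10.0, 50)
y_demo = 0.5 * x_demo ** 2 + 3.0

# The baseline is the straight-line model; the candidate is hypothetical.
baseline = np.poly1d(np.polyfit(x_demo, y_demo, 1))
candidate = np.poly1d(np.polyfit(x_demo, y_demo, 2))

baseline_err = error(baseline, x_demo, y_demo)
candidate_err = error(candidate, x_demo, y_demo)

# Neither number means much alone; their relation tells us which
# model fits this data more closely.
print("baseline error:  %.4f" % baseline_err)
print("candidate error: %.4f" % candidate_err)
print("candidate beats baseline: %s" % (candidate_err < baseline_err))
```

Both errors are measured on the same data with the same error function, which is what makes them comparable.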