Regression with TensorFlow
We will dive into TensorFlow in a future chapter, but regularized linear regression can be implemented with it, so it's a good idea to get a feel for how TensorFlow works.
We will use the Boston dataset for this experiment.
import tensorflow as tf
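The snippets that follow also rely on NumPy, two scikit-learn metrics, and the Boston data being available as x and y. Here is a minimal setup sketch; using load_boston and keeping only column 5 as the single feature are assumptions, chosen to match the single-feature run shown at the end of this section:
import numpy as np
from sklearn.datasets import load_boston  # available in older scikit-learn versions (removed in 1.2)
from sklearn.metrics import mean_squared_error, r2_score

boston = load_boston()
# Keep a single feature (column 5) so that x matches the [None, 1] placeholder
# and the [1, 1] slope defined below; this is an assumption for illustration.
x = boston.data[:, 5:6].astype(np.float32)
# Reshape the target into a column vector to match the Y placeholder
y = boston.target.reshape(-1, 1).astype(np.float32)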
TensorFlow requires you to create symbols for all the elements it works on. These can be variables or placeholders. The former are symbols that TensorFlow will change during the optimization, whereas placeholders are values imposed from outside, by feeding data into them at run time.
For regression, we need two placeholders, one for the input features and one for the output we want to match. We also require two variables, one for the slope and one for the intercept. Compared to scikit-learn's linear regression, we have to write far more code for the same functionality:
X = tf.placeholder(shape=[None, 1], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 1], dtype=tf.float32, name="y")
A = tf.Variable(tf.random_normal(shape=[1, 1]), name="A")
b = tf.Variable(tf.random_normal(shape=[1, 1]), name="b")
The two placeholders have a shape of [None, 1]. This means that they have a dynamic size along the first axis and a size of 1 along the last one (the fastest-varying dimension in terms of memory layout). The two variables are fully static, with a shape of [1, 1], meaning a single element. Both will be initialized by TensorFlow by drawing from a random distribution (a Gaussian with a mean of 0 and a standard deviation of 1).
The type of a symbol can be set with dtype, or, for variables, it can be inferred from the type of the initial value. In this example, it will always be a 32-bit floating-point value.
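As an illustration of these defaults, the declaration of A above is equivalent to spelling out the mean, standard deviation, and dtype of the initializer explicitly (not required, just a sketch of what is implied):
# Equivalent declaration with the implicit defaults written out
A = tf.Variable(tf.random_normal(shape=[1, 1], mean=0.0, stddev=1.0,
                                 dtype=tf.float32), name="A")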
All the symbols are now created, so we can build the loss function. We first create the prediction and then compare it to the ground-truth values:
model_output = tf.matmul(X, A) + b
loss = tf.reduce_mean(tf.square(Y - model_output))
The multiplication in the prediction may seem transposed compared to the usual Ax + b notation, and this is due to the way X was defined: it is indeed transposed! Multiplying the [None, 1] tensor X by the [1, 1] slope A allows model_output to keep a dynamic first dimension.
We can now minimize this cost function with gradient descent. First, we create the TensorFlow objects:
grad_step = 5e-7
my_opt = tf.train.GradientDescentOptimizer(grad_step)
train_step = my_opt.minimize(loss)
We also need a few Python variables for the optimization loop:
batch_size = 50
n_epochs = 20000
steps = 100
The batch size indicates how many elements we compute the loss for at a time. It is also the size of the first dimension of the data fed into the placeholders during the optimization.
The number of epochs is the number of times we go through all the training data to optimize our model. Finally, steps simply controls how often (in epochs) we display the state of the loss function we are optimizing.
Now we can go to the last step and let TensorFlow loose on the function and data we have:
loss_vec = []
with tf.Session() as sess:
    # Initialize A and b with their random starting values
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        # Shuffle the training data for this epoch
        permut = np.random.permutation(len(x))
        for j in range(0, len(x), batch_size):
            batch = permut[j:j+batch_size]
            Xs = x[batch]
            Ys = y[batch]
            sess.run(train_step, feed_dict={X: Xs, Y: Ys})

        # Record the loss over the full training set
        temp_loss = sess.run(loss, feed_dict={X: x, Y: y})
        loss_vec.append(temp_loss)
        if epoch % steps == 0:
            (A_, b_) = sess.run([A, b])
            print('Epoch #%i A = %s b = %s' % (epoch, np.transpose(A_), b_))
            print('Loss = %.8f' % temp_loss)
            print("")

    prediction = sess.run(model_output, feed_dict={X: x, Y: y})
    mse = mean_squared_error(y, prediction)
    print("Mean squared error (on training data): {:.3}".format(mse))
    rmse = np.sqrt(mse)
    print('RMSE (on training data): %f' % rmse)
    r2 = r2_score(y, prediction)
    print("R2 (on training data): %.2f" % r2)
We first create a TensorFlow session. This enables us to evaluate the symbols with calls to sess.run. The first argument is a node (or a list of nodes) of the graph to evaluate, and the results of these nodes are what the call returns; the second argument, feed_dict, is a dictionary mapping placeholders to actual data, so the dimensions must match.
The first call in the session initializes all the variables according to what we specified when they were declared. Then we have two loops, an outer one over epochs and an inner one over batches.
For each epoch, we define a permutation of the training data, which randomizes the order in which the samples are seen. This is important, especially for neural networks, so that the model is not biased by the order of the data and learns from all of it consistently. If the batch size is equal to the size of the training data, we don't need to randomize anything, which is usually the case when we have only a handful of samples. For large datasets, we have to use batches. Each batch is fed into the train_step node, and the variables are updated accordingly.
After each epoch, we save the loss over all the training data for display purposes. Every few epochs, we also print the state of the variables to monitor and check the progress of the optimization.
Finally, we display the mean squared error of the model's predictions on the training data, as well as the R2 score.
Of course, the solution for this plain least-squares loss is analytically known, so let's make things more interesting by adding a regularization term to it:
beta = 0.005
regularizer = tf.nn.l2_loss(A)
loss = loss + beta * regularizer
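Note that tf.nn.l2_loss corresponds to an L2 (ridge-style) penalty. To get a true Lasso result, an L1 penalty on the slope would be used instead, for instance as in this sketch:
# L1 penalty on the slope instead of L2, for a Lasso-style regularization
regularizer = tf.reduce_sum(tf.abs(A))
loss = loss + beta * regularizer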
Then let's run the full optimization to get the regularized result. We can see that TensorFlow doesn't really shine here: it is very slow and requires an awful number of iterations to reach a result that is still far from what scikit-learn can retrieve.
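For comparison, the scikit-learn fit mentioned here could look like the following sketch (the choice of Lasso and of the alpha value are assumptions for illustration):
from sklearn.linear_model import Lasso

# Direct fit on the same single-feature data; this runs in a fraction of a second
lasso = Lasso(alpha=0.005)
lasso.fit(x, y.ravel())
print(lasso.coef_, lasso.intercept_)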
Let's see a fraction of the run when using just feature 5 for this dataset:
Epoch #9400 A = [[ 8.60801601]] b = [[-31.74242401]]
Loss = 43.75216293
Epoch #9500 A = [[ 8.57831573]] b = [[-31.81438446]]
Loss = 43.92549133
Epoch #9600 A = [[ 8.67326164]] b = [[-31.88376808]]
Loss = 43.69957733
Epoch #9700 A = [[ 8.75835037]] b = [[-31.94364548]]
Loss = 43.97978973
Epoch #9800 A = [[ 8.70185089]] b = [[-32.03764343]]
Loss = 43.69329453
Epoch #9900 A = [[ 8.66107273]] b = [[-32.10965347]]
Loss = 43.74081802
Mean squared error (on training data): 1.17e+02
RMSE (on training data): 10.8221888258
R2 (on training data): -0.39
Here is how the loss function behaves:
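The figure can be reproduced from loss_vec with a short matplotlib sketch (assuming matplotlib is installed):
import matplotlib.pyplot as plt

# Loss over the full training set, recorded after each epoch
plt.plot(loss_vec)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()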
Here is the result when using only the fifth feature:
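This second figure can be reproduced by plotting the training data together with the fitted line, for instance like this (a sketch, reusing the prediction computed in the session above):
import matplotlib.pyplot as plt

# Scatter the training data and overlay the fitted regression line
order = np.argsort(x[:, 0])
plt.scatter(x, y, s=5)
plt.plot(x[order], prediction[order], 'r')
plt.xlabel('Feature 5')
plt.ylabel('Target')
plt.show()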