Introduction to Keras
Building ANNs involves creating layers of nodes. Each node can be thought of as a tensor of weights that are learned in the training process. Once the ANN is fitted to the data, a prediction is made by multiplying the input data by the weight matrices layer by layer, applying any other transformation when needed, such as activation functions, until the final output layer is reached. The size of each weight tensor is determined by the shape of the input nodes and the shape of the output nodes. For example, in a single-layer ANN, the size of our single hidden layer can be thought of as follows:
Figure 2.41: Solving the dimensions of the hidden layer of a single-layer ANN
If the input matrix of features has n rows, or observations, and m columns, or features, and we want our predicted target to have n rows (one for each observation) and 1 column (the predicted value), we can determine the size of our hidden layer by what is needed to make the matrix multiplication valid. Here is the representation of a single-layered ANN:
Figure 2.42: Representation of single-layer ANN
Here, we can determine that the weight matrix will be of size (mx1), since this will give us the desired output according to what the inner and outer dimensions should be in order for the matrix multiplication to be valid.
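The dimension argument above can be checked numerically. The following is a minimal sketch using NumPy, which is an assumption here since the text does not name a library for this step; the values of n and m and the random matrices are purely illustrative:

```python
import numpy as np

n, m = 5, 3                      # hypothetical: 5 observations, 3 features
rng = np.random.default_rng(0)

X = rng.normal(size=(n, m))      # input matrix of features, shape (n, m)
W = rng.normal(size=(m, 1))      # weight matrix, shape (m, 1)

y_hat = X @ W                    # (n, m) @ (m, 1) -> (n, 1)
print(y_hat.shape)
```

The inner dimensions (m and m) must match for the multiplication to be valid, and the outer dimensions (n and 1) give the shape of the prediction.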
If we have more than one hidden layer in an ANN, then we have much more freedom with the size of these weight matrices. In fact, the possibilities are endless, depending on how many layers there are, and how many nodes we want in each layer. In practice, however, certain architecture designs work better than others, as we will be learning throughout the book.
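To illustrate this freedom, the sketch below chains several weight matrices whose hidden sizes are chosen arbitrarily. It again assumes NumPy, and the names hidden_sizes and layer_sizes are hypothetical; activation functions are left out because only the shapes are of interest:

```python
import numpy as np

n, m = 5, 3                 # hypothetical numbers of observations and features
hidden_sizes = [4, 2]       # free design choice: nodes in each hidden layer
layer_sizes = [m] + hidden_sizes + [1]

rng = np.random.default_rng(0)
# One weight matrix per pair of consecutive layers, shape (inputs, outputs)
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

out = rng.normal(size=(n, m))
for W in weights:
    out = out @ W           # activations omitted; only the shapes matter here

weight_shapes = [W.shape for W in weights]
print(weight_shapes)
print(out.shape)
```

Only the first and last dimensions are fixed by the data; changing hidden_sizes changes the intermediate weight matrices while the final output keeps the shape (n, 1).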
In general, Keras abstracts much of the linear algebra out of building neural networks so that users can focus on designing the architecture. For most networks, only the input size, output size, and the number of nodes in each hidden layer are needed as minimum requirements to create networks in Keras.
The simplest model structure in Keras is the Sequential model, which can be imported from keras.models. The model of the Sequential class describes an ANN that consists of a linear stack of layers. A Sequential model can be instantiated as follows:
from keras.models import Sequential
model = Sequential()
Layers can be added to this model instance to create the structure of the model.
Layer Types
The notion of layers is part of the Keras core API. A layer can be thought of as a composition of nodes, and at each node a set of computations happens. In Keras, all the nodes of a layer can be initialized by simply initializing the layer itself. The individual operation of a generalized layer node is depicted in Figure 2.43. At each node, the input data is multiplied by a set of weights, using matrix multiplication, as we learned earlier in the chapter. The sum of the products between the weights and the input is then computed, which may or may not include a bias. Further functions may be applied to the output of this matrix multiplication, such as activation functions:
Figure 2.43: A depiction of a layer node
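The operation of a single node can be sketched in plain Python as follows. The function name node_output and the input, weight, and bias values are hypothetical and chosen only for illustration; this is not how Keras computes a layer internally:

```python
import math

def node_output(inputs, weights, bias=0.0, activation=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

inputs = [1.0, 2.0, 3.0]         # hypothetical input values
weights = [0.5, -0.25, 0.25]     # hypothetical weights
bias = 0.25

# Identity activation: returns the weighted sum itself (0.5 - 0.5 + 0.75 + 0.25)
z = node_output(inputs, weights, bias, activation=lambda v: v)
# Default tanh activation applied to the same weighted sum
a = node_output(inputs, weights, bias)
print(z, a)
```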
Some common layer types in Keras are as follows:
- Dense: This is a fully-connected layer in which all nodes of the layer are directly connected to all inputs and all outputs.
- Convolutional: This layer type creates a convolutional kernel that is convolved with the input layer to produce a tensor of outputs. The convolution can occur in one or multiple dimensions.
- Pooling: This type of layer is used to reduce the dimensionality of an input layer. Common types of pooling include max-pooling, in which the maximum value of a given window is passed through to the output, and average-pooling, in which the average value of a window is passed through.
- Recurrent: Recurrent layers learn patterns from sequences, so each output is dependent on the results from the previous step. Recurrent layers are appropriate when modeling sequential data such as natural language or time-series data.
There are other layer types in Keras; however, these are the most common types when learning how to build models using Keras.
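To make the pooling operation described above concrete, here is a small sketch in plain Python that pools a one-dimensional input over non-overlapping windows. The helper names pool_1d and mean and the input values are hypothetical; Keras provides its own pooling layers, which this does not attempt to reproduce:

```python
def pool_1d(values, window, reduce_fn):
    """Apply reduce_fn to consecutive, non-overlapping windows of values."""
    return [reduce_fn(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]

def mean(window_values):
    return sum(window_values) / len(window_values)

signal = [1, 3, 2, 8, 5, 5]                 # hypothetical 1-D input
max_pooled = pool_1d(signal, 2, max)        # [3, 8, 5]
avg_pooled = pool_1d(signal, 2, mean)       # [2.0, 5.0, 5.0]
print(max_pooled, avg_pooled)
```

In both cases six input values are reduced to three, which is the dimensionality reduction that pooling layers are used for.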
We demonstrate how to add layers to a model by instantiating a model of the Sequential class and adding a Dense layer to it. Layers can be imported from keras.layers, and successive layers are added to the model in the order in which we wish the computation to be performed. The number of units, or nodes, needs to be specified. This value also determines the shape of the result from the layer. A Dense layer can be added to a Sequential model in the following way:
from keras.layers import Dense
from keras.models import Sequential
input_shape = 20  # number of input features
units = 1  # number of nodes in the layer
model = Sequential()
model.add(Dense(units, input_dim=input_shape))
Note
After the first layer, the input dimension does not need to be specified, since it is determined from the previous layer.
Activation Functions
An activation function is generally applied to the output of a node to limit or bound its value. The value from each node is unbounded and may take any value from negative to positive infinity. This can be troublesome within neural networks, in which the values of the weights and the calculated losses can head toward infinity and produce unusable results. Activation functions can help in this regard by bounding the value; often, they push the value toward one of two limits. Activation functions are also useful for deciding whether the node should be "fired" or not. Common activation functions are as follows:
- Step function: The value is nonzero if it is above a certain threshold, otherwise it is zero.
- Linear function: $f(x) = ax$, which is a scalar multiplication of the input value.
- Sigmoid function: $f(x) = \frac{1}{1 + e^{-x}}$, like a smoothed-out step function with smooth gradients. This activation function is useful for classification since the values are bounded between zero and one.
- Tanh function: $f(x) = \frac{2}{1 + e^{-2x}} - 1$, which is a scaled version of the sigmoid with steeper gradients around x=0.
- ReLU function: $f(x) = x$ if $x > 0$, otherwise 0.
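The activation functions listed above can be written out in a few lines of plain Python. This sketch assumes the usual textbook definitions; in particular, the step function returning 1 above the threshold and the default slope a=1 of the linear function are assumptions, since the list only states that the value is nonzero and scaled:

```python
import math

def step(x, threshold=0.0):
    return 1.0 if x > threshold else 0.0

def linear(x, a=1.0):
    return a * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0   # a shifted and scaled sigmoid

def relu(x):
    return max(0.0, x)

# Compare the functions at a negative, zero, and positive input
for x in (-2.0, 0.0, 2.0):
    print(x, step(x), linear(x), round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```

Writing tanh in terms of the sigmoid shows why it can be described as a scaled version of it: the output range is stretched from (0, 1) to (-1, 1).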
Now that we have some of the main components, we can begin to see how useful neural networks can be built from them. In fact, we can create a logistic regression model with the concepts we have learned in this chapter. A logistic regression model operates by taking the sum of the products of the inputs and a set of learned weights, and then passing the result through a logistic function. This can be achieved with a single-layer neural network with a sigmoid activation function.
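The forward pass of such a model can be sketched with NumPy, which is an assumption here rather than part of the Keras workflow. The weights below are random stand-ins for values that would normally be learned, and the 0.5 threshold for turning probabilities into class labels is a common convention, not something fixed by the model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # hypothetical: 4 observations, 3 features
w = rng.normal(size=(3, 1))        # weights (random here, learned in practice)
b = 0.0                            # bias term

probabilities = sigmoid(X @ w + b)                # shape (4, 1), values in (0, 1)
predictions = (probabilities >= 0.5).astype(int)  # threshold at 0.5 for class labels
print(probabilities.ravel(), predictions.ravel())
```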
Activation functions can be added to models in the same manner that layers are added to models. The activation function will be applied to the output of the previous step in the model. A tanh activation function can be added to a Sequential model as follows:
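from keras.layers import Dense, Activation
from keras.models import Sequential
input_shape = 20  # number of input features
units = 1  # number of nodes in the layer
model = Sequential()
model.add(Dense(units, input_dim=input_shape))
model.add(Activation('tanh'))  # applied to the output of the Dense layer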
Note
Activation functions can also be added to a model by including them as an argument when defining the layers.
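As a sketch of this alternative, the activation can be passed through the activation argument of the Dense layer. The layer sizes below (20 inputs, 8 hidden units, 1 output) are hypothetical choices for illustration only:

```python
from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
# First layer: 8 units (a hypothetical choice) with tanh passed as an argument
model.add(Dense(8, input_dim=20, activation='tanh'))
# Later layers infer their input size from the previous layer
model.add(Dense(1, activation='sigmoid'))
model.summary()
```

This builds the same kind of stack as adding separate Activation layers, with one line per layer instead of two.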
Model Fitting
Once a model's architecture has been created, the model must be compiled. The compilation process configures all the learning parameters, including which optimizer to use, the loss function to minimize, as well as optional metrics, such as accuracy, to calculate at various stages of model training. Models are compiled using the compile method as follows:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
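To give a feel for what the loss and metric named in this call measure, the sketch below computes a mean binary cross-entropy and an accuracy by hand in plain Python. It illustrates the quantities only and is not Keras's implementation; the clipping constant eps, the 0.5 threshold, and the label and prediction values are all assumptions made for the example:

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps clips predictions away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [1, 0, 1, 0]              # hypothetical labels
y_pred = [0.9, 0.2, 0.4, 0.1]      # hypothetical predicted probabilities
loss = binary_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)     # 3 of 4 correct -> 0.75
print(loss, acc)
```

The loss is the quantity the optimizer tries to reduce, whereas the accuracy is only reported and does not influence the weight updates.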
After the model has been compiled, it is ready to be fit to the training data. This is achieved with an instantiated model using the fit method. Useful arguments to the fit method are as follows:
- x: The array of training feature data to fit the model to.
- y: The array of training target data.
- epochs: The number of epochs to run the model for. An epoch is an iteration over the entire training dataset.
- batch_size: The number of training data samples to use per gradient update.
- validation_split: The proportion of the training data to be used for validation that is evaluated after each epoch.
The fit method can be used on a model in the following way:
history = model.fit(x=X_train, y=y_train['y'], epochs=10, batch_size=32, validation_split=0.2)
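The arguments epochs, batch_size, and validation_split together determine how many gradient updates are performed. The arithmetic can be sketched as follows, assuming a hypothetical training set of 1,000 samples; the rounding of the split point with int() is an assumption about how the boundary is chosen:

```python
import math

n_samples = 1000          # hypothetical size of the training data passed to fit
validation_split = 0.2
batch_size = 32
epochs = 10

n_train = int(n_samples * (1 - validation_split))    # samples used for gradient updates
n_val = n_samples - n_train                           # samples held out for validation
updates_per_epoch = math.ceil(n_train / batch_size)   # the last batch may be smaller
total_updates = updates_per_epoch * epochs
print(n_train, n_val, updates_per_epoch, total_updates)
```

With these values, 800 samples are used for training and 200 for validation, giving 25 updates per epoch and 250 updates over the whole run.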
It is beneficial to save the output of calling the fit method of the model since it contains information on the model's performance throughout training, including the loss, which is evaluated after each epoch. If a validation split is defined, the loss is evaluated after each epoch on the validation split. Likewise, if any metrics are defined in training they are also calculated after each epoch. It is useful to plot such loss and evaluation metrics to determine model performance as a function of epoch. The model's loss as a function of the epoch can be visualized as follows:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(history.history['loss'])
plt.show()
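Besides plotting, the recorded values can be inspected directly, since history.history is a dictionary of lists with one entry per epoch. The sketch below uses a hypothetical dictionary with made-up numbers in the same layout; the 'val_loss' key is present only when validation data or a validation split is used:

```python
# Hypothetical values in the same layout as history.history
history_dict = {
    'loss':     [0.69, 0.55, 0.47, 0.42, 0.39],
    'val_loss': [0.70, 0.58, 0.52, 0.53, 0.55],
}

val_loss = history_dict['val_loss']
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)   # zero-based index
gap = history_dict['loss'][-1] - val_loss[-1]   # negative when validation loss is higher

print(f"lowest validation loss {val_loss[best_epoch]} at epoch {best_epoch + 1}")
```

A training loss that keeps falling while the validation loss starts to rise, as in these made-up numbers, is the pattern usually associated with overfitting.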
Keras models can be evaluated by utilizing the evaluate method of the model instance. This method returns the loss and any metrics that were passed to the model for training. The method can be called as follows when evaluating an out-of-sample test dataset:
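# With metrics=['accuracy'] set at compile time, evaluate returns the loss followed by the accuracy
test_loss, test_accuracy = model.evaluate(X_test, y_test['y'])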
These model-fitting steps represent the basic steps in order to build, train, and evaluate models using the Keras package. From here, there are an infinite number of ways to build and evaluate a model depending on the task you wish to accomplish.
Activity 2: Creating a Logistic Regression Model Using Keras
In this activity, we are going to create a basic model using the Keras library. We will perform the same classification task as we did in Chapter 1, Introduction to Machine Learning with Keras. We will use the same bank dataset and attempt to predict the same variable.
In the previous chapter, we used a logistic regression model to predict whether a client would subscribe to a given product given various attributes of each client, such as their age and occupation. In this activity, we will introduce the Keras library, though we'll continue to utilize the libraries we introduced previously, such as pandas for easy loading in of data, and sklearn for any data preprocessing and model evaluation metrics.
The steps to complete the activity are as follows:
- Load in the feature and target datasets from the previous chapter.
- Split the feature and target data into training and test datasets. The model will be fit to the training dataset and the test dataset will be used to evaluate the model.
- Instantiate a model of the Sequential class from the keras.models library.
- Add a single layer of the Dense class from the keras.layers package to the model instance. The number of input nodes should be equal to the number of features in the feature dataset, and the layer should have a single output unit for the predicted value.
- Add a sigmoid activation function to the model.
- Compile the model instance, specifying the optimizer to use, the loss function to minimize, and any other metrics to evaluate after each epoch.
- Fit the model to the training data, specifying the number of epochs to run for and validation split to use.
- Plot loss and other evaluation metrics with respect to the epoch, evaluated on the training and validation datasets.
- Evaluate the loss and other evaluation metrics on the test dataset.
Note
The solution for this activity can be found on page 294.
In this topic, we looked at some of the fundamental concepts of creating ANNs in Keras, including various layer types and activation functions. We used these components to create a simple logistic regression model in Keras that gives us similar results to the logistic regression model used in Chapter 1, Introduction to Machine Learning with Keras.