Building an ANN model for prediction using Keras and TensorFlow
Now that we have our libraries installed, let's create a folder called aibook and, within it, another folder called chapter2. Move all the code for this chapter into the chapter2 folder. Make sure that the conda environment is still active (the prompt will start with the environment name):
Once inside the chapter2 folder, type jupyter notebook. This will open an interactive Python editor in the browser.
Use the New dropdown in the top-right corner to create a new Python 3 notebook:
We are now ready to build our first ANN using Keras and TensorFlow, to predict real estate prices:
- Import all the libraries that we need for this exercise. Use the first cell to import all the libraries and run it. Here are the four main libraries we will use:
- pandas: We use this to read the data and store it in a dataframe
- sklearn: We use this to standardize data and for k-fold cross-validation
- keras: We use this to build our sequential neural network
- numpy: We use numpy for all math and array operations
Let's import these libraries:
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
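Before moving on, you can optionally confirm which versions the notebook is actually using. This check is not part of the original steps, and it assumes the import cell above has already run:
# optional sanity check: print the versions of the key libraries in use
import tensorflow
import sklearn
import keras
print("numpy:", numpy.__version__)
print("pandas:", pd.__version__)
print("tensorflow:", tensorflow.__version__)
print("keras:", keras.__version__)
print("scikit-learn:", sklearn.__version__)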
- Load the real estate housing data using pandas:
dataframe = pd.read_csv("housing.csv", sep=',', header=0)
dataset = dataframe.values
- To view the feature variables, the target variables, and a few rows of the data, enter the following:
dataframe.head()
The output shows the first few rows of the dataframe, with the feature columns and the target column:
The dataset has eight columns; the details of each column are as follows:
- BIZPROP: Proportion of non-retail business acres per town
- ROOMS: Average number of rooms per dwelling
- AGE: Proportion of owner-occupied units built before 1940
- HIGHWAYS: Index of accessibility to radial highways
- TAX: Full-value property tax rate per $10,000
- PTRATIO: Pupil-to-teacher ratio by town
- LSTAT: Percentage of lower status of the population
- VALUE: Median value of owner-occupied homes in thousand dollars (target variable)
In our use case, we need to predict the VALUE column, so we split the dataset into features (the first seven columns) and the target (the VALUE column). The split into training and validation data happens later, during k-fold cross-validation:
features = dataset[:,0:7]
target = dataset[:,7]
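As an optional sanity check (not in the original listing), you can confirm the shapes of the two arrays; with the housing data described above, you should see seven feature columns and a one-dimensional target:
# optional: verify the column split
print(features.shape)  # expected: (number_of_rows, 7)
print(target.shape)    # expected: (number_of_rows,)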
Also, to make sure we can reproduce the results, let's set a seed for the random number generator. This randomness is used during cross-validation to sample the data:
# fix random seed for reproducibility
seed = 9
numpy.random.seed(seed)
Now we are ready to build our ANN:
- Create a sequential neural network that has a simple and shallow architecture.
- Make a function called simple_shallow_seq_net() that will define the architecture of the neural network:
def simple_shallow_seq_net():
    # create a sequential ANN
    model = Sequential()
    model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='sigmoid'))
    model.add(Dense(1, kernel_initializer='normal'))
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model
- The function does the following:
model = Sequential()
- A sequential model is instantiated: an ANN model built as a linear stack of layers:
model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='sigmoid'))
- Here, we add a dense (fully connected) layer with seven neurons to the sequential network. This layer accepts an input with seven features (one for each feature column used to predict the house price), as indicated by the input_dim parameter. The weights of all the neurons in this layer are initialized using a random normal distribution, as indicated by the kernel_initializer parameter. Similarly, all the neurons in this layer use the sigmoid activation function, as indicated by the activation parameter:
model.add(Dense(1, kernel_initializer='normal'))
- This adds the output layer: a single neuron whose weights are initialized using a random normal distribution. No activation is specified, so the neuron uses the default linear activation and outputs a raw numerical value, which is what we want for regression:
sgd = optimizers.SGD(lr=0.01)
- Set the network to learn using Stochastic Gradient Descent (SGD), available through the optimizers module. We also indicate that the network will use a learning rate (lr) of 0.01 at every step of learning:
model.compile(loss='mean_squared_error', optimizer=sgd)
- Indicate that the network uses the mean squared error (MSE) loss function to measure the magnitude of the model's error, and the SGD optimizer to update the weights so as to minimize this loss:
return model
Finally, the function returns a model with the defined specifications.
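If you would like to inspect the architecture this function builds, Keras models provide a summary() method that prints the layers and parameter counts. A minimal, optional check:
# optional: build the model once and print its layers and parameter counts
net = simple_shallow_seq_net()
net.summary()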
The next step is to set up k-fold cross-validation with a fixed random seed for reproducibility; the seed controls how the data is randomly split into training and validation folds. Here, the data is divided into 10 subsets, each taking a turn as the validation set (note that shuffle=True is needed for random_state to take effect):
seed = 9
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
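To see concretely what KFold produces, you can iterate over its splits; each iteration yields index arrays for the training and validation rows. This peek is an optional sketch, not part of the original listing:
# optional: peek at how k-fold partitions the row indices
for train_idx, test_idx in kfold.split(features):
    print("training rows:", len(train_idx), "validation rows:", len(test_idx))
    break  # inspect only the first fold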
Now, we need to fit this model to predict a numerical value (the house price, in this case), so we use KerasRegressor. KerasRegressor is a wrapper that exposes the Keras model as an sklearn regression estimator:
estimator = KerasRegressor(build_fn=simple_shallow_seq_net, epochs=100, batch_size=50, verbose=0)
Note the following:
- We pass simple_shallow_seq_net as a parameter to indicate the function that returns the model.
- The epochs parameter indicates that the entire training set passes through the network 100 times.
- The batch_size parameter indicates that the network processes 50 training samples at a time during every learning cycle; the short arithmetic sketch after this list shows what that implies.
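To make the epoch and batch arithmetic concrete, here is a small illustrative calculation. The row count of 506 is an assumption (the size of the classic Boston housing dataset); your copy of housing.csv may differ:
import math

# assuming 506 rows: each epoch performs ceil(506 / 50) = 11 weight updates,
# so 100 epochs perform roughly 1,100 updates in total
rows, batch_size, epochs = 506, 50, 100
updates_per_epoch = math.ceil(rows / batch_size)
print(updates_per_epoch, updates_per_epoch * epochs)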
The next step is to train and cross-validate across the subsets of the data and print the MSE, which is the measure of how well the model performs:
results = cross_val_score(estimator, features, target, cv=kfold)
print("simple_shallow_seq_model:(%.2f) MSE" % (results.std()))
This will output the MSE. As you can see, it is pretty high, and we want to make it as low as possible:
simple_shallow_seq_net:(163.41) MSE
Save this model for later use:
estimator.fit(features, target)
estimator.model.save('simple_shallow_seq_net.h5')
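To confirm that the saved file can be restored, you can reload it with Keras's load_model and run a quick prediction. This is an optional check, not part of the original flow:
from keras.models import load_model

# optional: reload the saved network and sanity-check a prediction
saved_net = load_model('simple_shallow_seq_net.h5')
print(saved_net.predict(features[:3]))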
Great, we have built and saved our first neural net to predict real estate prices. Next, let's try to improve it. The first thing to try, before fiddling with the network's parameters, is standardizing the data and seeing whether that improves performance (lowers the MSE):
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('estimator', KerasRegressor(build_fn=simple_shallow_seq_net, epochs=100, batch_size=50, verbose=0)))
pipeline = Pipeline(estimators)
In the preceding code, we created a pipeline that standardizes the data before it is used in every learning cycle of the network. In the following code block, we train and cross-validate the neural network:
results = cross_val_score(pipeline, features, target, cv=kfold)
print("simple_std_shallow_seq_net:(%.2f) MSE" % (results.std()))
This will output a much better MSE than before, so standardizing the data clearly makes a difference:
simple_std_shallow_seq_net:(65.55) MSE
Saving this model is slightly different than before, as we used a pipeline to fit it:
pipeline.fit(features, target)
pipeline.named_steps['estimator'].model.save('standardised_shallow_seq_net.h5')
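One caveat worth noting: the .h5 file stores only the Keras network, not the StandardScaler fitted inside the pipeline, so new inputs must be standardized with that same scaler before prediction. A minimal sketch, assuming the fitted pipeline from above is still in memory:
from keras.models import load_model

# reuse the pipeline's fitted scaler to standardize inputs for the reloaded network
scaler = pipeline.named_steps['standardize']
net = load_model('standardised_shallow_seq_net.h5')
print(net.predict(scaler.transform(features[:3])))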
Let's now fiddle with our network to see whether we can get better results. We can start by creating a deeper network: we will increase the number of hidden (fully connected) layers and alternate between the sigmoid and tanh activation functions:
def deep_seq_net():
    # create a deep sequential model
    model = Sequential()
    model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='sigmoid'))
    model.add(Dense(7, activation='tanh'))
    model.add(Dense(7, activation='sigmoid'))
    model.add(Dense(7, activation='tanh'))
    model.add(Dense(1, kernel_initializer='normal'))
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model
The next block of code standardizes the variables in the training data and then fits the deep neural net model to it. Create the pipeline and fit the model using the standardized data:
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('estimator', KerasRegressor(build_fn=deep_seq_net, epochs=100, batch_size=50, verbose=0)))
pipeline = Pipeline(estimators)
Now, we need to cross-validate the fit model across the subsets of the data and print the MSE:
results = cross_val_score(pipeline, features, target, cv=kfold)
print("simple_std_shallow_seq_net:(%.2f) MSE" % (results.std()))
This will output an MSE that is better (lower) than that of the previous shallow networks:
deep_seq_net:(58.79) MSE
Save the model for later use:
pipeline.fit(features, target)
pipeline.named_steps['estimator'].model.save('deep_seq_net.h5')
So, we get better results when we increase the depth (the number of layers) of the network. Now, let's see what happens when we widen it, that is, increase the number of neurons (nodes) in each layer. Let's define a deep and wide network to tackle the problem; this time, we increase the number of neurons in each layer to 21 and use the relu and sigmoid activation functions in the hidden layers:
def deep_and_wide_net():
    # create a deep and wide sequential model
    model = Sequential()
    model.add(Dense(21, input_dim=7, kernel_initializer='normal', activation='relu'))
    model.add(Dense(21, activation='relu'))
    model.add(Dense(21, activation='relu'))
    model.add(Dense(21, activation='sigmoid'))
    model.add(Dense(1, kernel_initializer='normal'))
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model
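To get a feel for how much widening grows the model, you can compare the total trainable parameters of the two architectures using Keras's count_params() method. An optional sketch:
# optional: compare model sizes; wider layers mean many more weights to learn
print("deep_seq_net parameters:", deep_seq_net().count_params())
print("deep_and_wide_net parameters:", deep_and_wide_net().count_params())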
The next block of code is used to standardize the variables in the training data and then fit the deep and wide neural net model to the training data:
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('estimator', KerasRegressor(build_fn=deep_and_wide_net, epochs=100, batch_size=50, verbose=0)))
pipeline = Pipeline(estimators)
Now, we need to cross-validate the fit model across the subsets of the data and print the MSE:
results = cross_val_score(pipeline, features, target, cv=kfold)
print("deep_and_wide_model:(%.2f) MSE" % (results.std()))
This time, the MSE is again better than in the previous networks we created. This is a good example of how a deeper and wider network can abstract the problem better:
deep_and_wide_net:(34.43) MSE
Finally, save the network for later use. The saved network model will be used in the next section, where it is served within a REST API:
pipeline.fit(features, target)
pipeline.named_steps['estimator'].model.save('deep_and_wide_net.h5')
So far, we have been able to build a sequential neural network for prediction using various network architectures. As an exercise, try the following:
- Experiment with the shape of the network; play around with the depth and width of the network to see how it impacts the output
- Try out the various activation functions (https://keras.io/activations/); the sketch after this list shows one way to make the activation configurable
- Try out the various initializers, here we have only used the random normal initializer (https://keras.io/initializers/)
- The data we used here is for demonstrating the technique, so try out different use cases for prediction using the preceding technique on other datasets (https://data.world/datasets/prediction)
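As a starting point for these experiments, note that the Keras scikit-learn wrapper forwards keyword arguments that match your build function's signature, so you can make settings such as the activation configurable. The tunable_net function below is a hypothetical variant for illustration, not part of the original code:
def tunable_net(activation='relu'):
    # hypothetical build function with a configurable activation
    model = Sequential()
    model.add(Dense(21, input_dim=7, kernel_initializer='normal', activation=activation))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer=optimizers.SGD(lr=0.01))
    return model

# the wrapper forwards activation='tanh' to tunable_net when it builds the model
estimator = KerasRegressor(build_fn=tunable_net, activation='tanh', epochs=100, batch_size=50, verbose=0)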
We will learn more about optimizers and regularizers, which are other parameters you can use to tune the network, in Chapter 4, Building a Machine Vision Mobile App to Classify Flower Species. The complete code for our ANN model creation is available as a Python notebook named sequence_networks_for_prediction.ipynb.