
Output layer

The number of input neurons equals the number of outputs of the preceding hidden layer (hidden layer 2), and the number of output neurons equals the number of predicted labels. Once again we keep these values small, since we have only a few inputs and features.

Here we use the Softmax activation function, which gives us a probability distribution over the classes (the outputs sum to 1.0), and the cross-entropy loss for binary classification (XENT), since we want to convert the output (a probability) into a discrete class, that is, zero or one:

OutputLayer output_layer = new OutputLayer.Builder(LossFunction.XENT) // XENT for Binary Classification
.weightInit(WeightInit.XAVIER)
.activation(Activation.SOFTMAX)
.nIn(16).nOut(numOutputs)
.build();
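As a sanity check, the nIn(16) above must match the nOut of the hidden layer that feeds into this output layer. A minimal sketch of such a preceding dense layer is shown below; the layer sizes and activation here are illustrative assumptions, not the values defined earlier in the chapter:

DenseLayer hidden_layer_2 = new DenseLayer.Builder()
.weightInit(WeightInit.XAVIER)
.activation(Activation.RELU) // assumed hidden-layer activation
.nIn(32).nOut(16) // nOut(16) must equal the output layer's nIn(16)
.build();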

XENT is used for binary classification with logistic regression. You can find out more about it in the LossFunctions.java class in DL4J.
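Note that XENT is often paired with a single sigmoid output unit when doing plain logistic regression. As a rough sketch of that alternative (not the configuration we use in this project, where two softmax outputs are kept), the output layer could instead be defined as:

OutputLayer sigmoid_output = new OutputLayer.Builder(LossFunction.XENT)
.weightInit(WeightInit.XAVIER)
.activation(Activation.SIGMOID) // single probability of the positive class
.nIn(16).nOut(1)
.build();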

Now we create a MultiLayerConfiguration by specifying a NeuralNetConfiguration before conducting the training. With DL4J, we add layers by calling layer() on the builder returned by NeuralNetConfiguration.Builder().list(), specifying each layer's place in the order of layers (the zero-indexed layer in the following code is the input layer):

MultiLayerConfiguration MLPconf = new NeuralNetConfiguration.Builder().seed(seed)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.weightInit(WeightInit.XAVIER)
.updater(new Adam(0.0001))
.list()
.layer(0, input_layer)
.layer(1, hidden_layer_1)
.layer(2, hidden_layer_2)
.layer(3, output_layer)
.pretrain(false).backprop(true).build(); // no pre-training required

Apart from these, we also specify how to initialize the network's weights. As discussed, we use Xavier for weight initialization and the Stochastic Gradient Descent (SGD) optimization algorithm with Adam as the updater. Finally, we specify that no pre-training is needed (pre-training is typically required for DBNs or stacked autoencoders), and, since an MLP is a feedforward network, we set backpropagation to true.
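With the configuration in place, the next step is to build and initialize the actual network before fitting it to the training data. The following is a minimal sketch of standard DL4J usage, assuming the MLPconf object from above; the listener frequency is an arbitrary choice:

MultiLayerNetwork model = new MultiLayerNetwork(MLPconf); // wrap the configuration in a trainable network
model.init(); // initialize the weights (Xavier, as configured)
model.setListeners(new ScoreIterationListener(100)); // log the training score every 100 iterations

Calling model.fit() on a training DataSet or DataSetIterator would then run SGD with the Adam updater over the configured layers.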