Overall architecture
At this point, the overall architecture contains two main components:
- Chapter 1: Become an Adaptive Thinker: A reinforcement learning program based on the Q action-value function, using a reward matrix that is yet to be calculated. The reward matrix was given in the first chapter, but in real life you will often have to build it from scratch, which can take weeks.
- Chapter 2: A set of six neurons representing the flow of products at a given time at six locations. The output is an availability probability from 0 to 1: the higher the value, the higher the availability; the lower the value, the lower the availability.
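As a reminder of the first component, here is a minimal Q-learning sketch of the kind described in the first chapter. The 6x6 reward matrix below is illustrative only, not the book's actual matrix: -1 marks impossible transitions, 0 marks possible ones, and 100 marks transitions into the goal location.

```python
import numpy as np

# Illustrative 6x6 reward matrix for six locations (A..F); the real one
# must be designed from warehouse data, as discussed in this section.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
], dtype=float)

Q = np.zeros((6, 6))     # Q action-value table, one cell per (state, action)
gamma = 0.8              # discount factor for future rewards

rng = np.random.default_rng(0)
for _ in range(1000):
    s = rng.integers(0, 6)               # random current state
    actions = np.where(R[s] >= 0)[0]     # valid next states from s
    a = rng.choice(actions)              # pick a random valid action
    # Bellman update: immediate reward plus discounted best future value
    Q[s, a] = R[s, a] + gamma * Q[a].max()

print((Q / Q.max() * 100).round())       # normalized Q-table
```

Once trained, the AGV would follow the highest Q-values from its current location toward the goal.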
At this point, there is some real-life information we can draw from these two main functions:
- An AGV is moving in a warehouse, waiting to receive its next location. It will use an MDP to calculate the optimal trajectory for its mission, as shown in the first chapter.
- The AGV uses a reward matrix that was given in the first chapter. In a real-life project, that matrix has to be designed through meetings, reports, and acceptance of the process.
- A system of six neurons, one per location, weighs the real quantities and the probable quantities to produce an availability vector, lv. It is almost ready to provide the necessary reward matrix for the AGV.
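A toy version of the six-neuron availability layer might look like the following. The quantities and weights are invented for illustration; in the real model they come from corporate data and the design process described above.

```python
import numpy as np

# Hypothetical inputs for six locations: quantities already stored
# and probable incoming quantities (both made up for this sketch).
real = np.array([10., 2., 1., 8., 3., 7.])
probable = np.array([1., 1., 4., 2., 6., 0.])

# One neuron per location weighs the two quantities; these weights
# are illustrative, not trained values.
w_real, w_probable = 0.65, 0.35
y = w_real * real + w_probable * probable   # raw load per location

# Availability vector: the lower the load, the more available the
# location, so invert and normalize into [0, 1].
lv = 1 - y / y.max()
print(lv.round(3))   # highest value = most available location
```

The location with the lightest weighted load ends up with the highest availability value, which is exactly the property the reward matrix needs.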
To calculate the input values of the reward matrix in this reinforcement learning warehouse model, a bridge function between lv and the reward matrix R is still missing.
That bridge function is a logistic classifier based on the outputs of the y neurons.
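The bridge can be sketched with the standard logistic (sigmoid) function, which maps each neuron output to a value in (0, 1) that can then be scaled into a row of the reward matrix R. The y values below are placeholders, not outputs of the actual model.

```python
import numpy as np

def logistic(x):
    """Standard logistic (sigmoid) function: 1 / (1 + e^-x)."""
    return 1 / (1 + np.exp(-x))

# Placeholder outputs of the six y neurons (one per location).
y = np.array([0.9, 0.1, 0.4, 0.7, 0.2, 0.8])

# Bridge: squash the neuron outputs into availability probabilities,
# then scale them to reward values usable in the matrix R.
probs = logistic(y)
rewards = (probs * 100).round(1)
print(rewards)
```

Because the logistic function is monotonic, the ranking of the locations is preserved: the most available location still receives the highest reward.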
At this point, the system:
- Took corporate data as input
- Computed the y neurons using weights
- Applied an activation function
The activation function in this model is a logistic classifier, one of the most commonly used.
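The three steps above can be put together in one short end-to-end sketch. Every number here is invented; the point is only the shape of the pipeline: data in, weighted neurons, logistic activation out.

```python
import numpy as np

def logistic(x):
    # Logistic classifier used as the activation function.
    return 1 / (1 + np.exp(-x))

# Step 1: corporate data -- made-up stock counts for six locations.
data = np.array([4., 1., 3., 6., 2., 5.])

# Step 2: y neurons computed with (illustrative) weights.
weights = np.array([0.5, 0.9, 0.6, 0.3, 0.8, 0.4])
y = weights * data

# Step 3: apply the activation function.
out = logistic(y)
print(out.round(3))   # one probability-like value per location
```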