Learn Unity ML-Agents: Fundamentals of Unity Machine Learning

Setting up the Agent

Agents represent the actors that we train to perform some task, or set of task-based commands, in return for some reward. We will cover actors, actions, state, and rewards in more detail when we discuss Reinforcement Learning in Chapter 2, The Bandit and Reinforcement Learning. For now, all we need to do is set the Brain the agent will use. Open up the editor and follow these steps:

Locate the Agent object in the Hierarchy window and select it.

Click the Target icon beside the Brain property on the Simple Agent component and select the Brain object in the scene, as shown in the following screenshot:

Setting the Agent Brain

Click the Gear icon on the Simple Agent component and select Edit Script from the context menu. The agent script is what we use to observe the environment and collect observations. In our current example, we always assume that there is no previous observation.
Enter the highlighted code in the CollectObservations method as follows:

      public override void CollectObservations()
      {
        AddVectorObs(0);
      }

CollectObservations is the method called to set what the Agent observes about the environment. It is called on every agent step or action. We use AddVectorObs to add a single float value of 0 to the agent's observation collection. At this point, we are not using any observations and will assume our bandit provides no visual clues as to which arm to pull.
The agent also needs to evaluate the rewards and when they are collected. We will need to add four slots to our agent, one for each arm, in order to represent the reward when that arm is pulled.
Enter the following code in the SimpleAgent class:

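      public Bandit bandit;  // the bandit whose arms this agent pulls

      public override void AgentAction(float[] vectorAction, string textAction)
      {
        // The first action value is the index of the arm to pull.
        var action = (int)vectorAction[0];
        // Pull that arm and record the reward the bandit returns.
        AddReward(bandit.PullArm(action));
        // Each episode is a single pull, so end it right away.
        Done();
      }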

      public override void AgentReset()
      {
        bandit.Reset();
      }

The code in our AgentAction method just takes the current action and applies it to the Bandit with the PullArm method, passing in the arm to pull. The reward returned from the bandit is added using AddReward. After that, we implement some code in the AgentReset method. This code just resets the Bandit back to its starting state. AgentReset is called when the agent is done or runs out of steps. Notice how we call the Done method after each step; this is because an episode of our bandit consists of only a single state and action.
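The Bandit class itself is not shown in this section, so the following is only a minimal sketch of the pull, reward, and reset cycle described above, written in plain C# without any Unity dependencies. The names SketchBandit, Pulls, and Episode.RunOnce are hypothetical, as is the assumption that each arm pays a fixed reward; the book's actual Bandit may behave differently.

```csharp
// Hypothetical stand-in for the Bandit component (not the book's class).
public class SketchBandit
{
    // Assumption: each arm pays a fixed reward.
    private readonly float[] armRewards;

    // Number of pulls since the last reset.
    public int Pulls { get; private set; }

    public SketchBandit(float[] armRewards)
    {
        this.armRewards = armRewards;
    }

    // Returns the reward for the chosen arm; an out-of-range arm pays 0.
    public float PullArm(int arm)
    {
        Pulls++;
        if (arm < 0 || arm >= armRewards.Length)
        {
            return 0f;
        }
        return armRewards[arm];
    }

    // Puts the bandit back into its starting state.
    public void Reset()
    {
        Pulls = 0;
    }
}

public static class Episode
{
    // Mirrors AgentAction followed by AgentReset: one pull, one reward,
    // then a reset because the episode is already over.
    public static float RunOnce(SketchBandit bandit, float[] vectorAction)
    {
        int action = (int)vectorAction[0];
        float reward = bandit.PullArm(action);
        bandit.Reset();
        return reward;
    }
}
```

Under these assumptions, calling Episode.RunOnce with a four-arm bandit and an action vector whose first value is 3 returns the reward assigned to the fourth arm and leaves the bandit reset for the next episode.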
Add the following code to the SimpleAgent class, just below the code from the previous step:

      public Academy academy;
      public float timeBetweenDecisionsAtInference;
      private float timeSinceDecision;

      public void FixedUpdate()
      {
        WaitTimeInference();
      }

      private void WaitTimeInference()
      {
        if (!academy.GetIsInference())
        {
          RequestDecision();
        }
        else
        {
          if (timeSinceDecision >= timeBetweenDecisionsAtInference)
          {
            timeSinceDecision = 0f;
            RequestDecision();
          }
          else
          {
            timeSinceDecision += Time.fixedDeltaTime;
          }
        }
      }

We need to add the preceding code so that our brain waits long enough to accept Player decisions. The first example we will build uses player input. Don't worry too much about this code, as we only need it to allow for player input. When we develop our Agent Brains, we won't need to add a delay.
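To see what the delay does in practice, the timing logic of WaitTimeInference can be pulled out into a plain C# class and stepped by hand. This is only a sketch: DecisionThrottle, Tick, and CountDecisions are hypothetical names, the isInference flag stands in for academy.GetIsInference(), and the fixedDeltaTime argument stands in for Time.fixedDeltaTime.

```csharp
// Hypothetical, Unity-free model of the WaitTimeInference logic.
public class DecisionThrottle
{
    private readonly float timeBetweenDecisions;
    private float timeSinceDecision;

    public DecisionThrottle(float timeBetweenDecisions)
    {
        this.timeBetweenDecisions = timeBetweenDecisions;
    }

    // Returns true when a decision would be requested on this fixed step.
    public bool Tick(bool isInference, float fixedDeltaTime)
    {
        // Outside inference, a decision is requested on every step.
        if (!isInference)
        {
            return true;
        }

        // During inference, wait until enough time has accumulated.
        if (timeSinceDecision >= timeBetweenDecisions)
        {
            timeSinceDecision = 0f;
            return true;
        }

        timeSinceDecision += fixedDeltaTime;
        return false;
    }

    // Counts how many decisions are requested over a number of fixed steps.
    public static int CountDecisions(bool isInference, float interval,
                                     float fixedDeltaTime, int steps)
    {
        var throttle = new DecisionThrottle(interval);
        int decisions = 0;
        for (int i = 0; i < steps; i++)
        {
            if (throttle.Tick(isInference, fixedDeltaTime))
            {
                decisions++;
            }
        }
        return decisions;
    }
}
```

With an interval of 0.5 and a fixed step of 0.25, CountDecisions(true, 0.5f, 0.25f, 9) yields 3 decisions, because the timer is checked before it is advanced and so fires on every third step. The same call with isInference set to false yields 9, one decision per step.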
Save the script when you are done editing.
Return to the editor and set the properties on the Simple Agent, as shown in the following screenshot:

Setting the Simple Agent properties

We are almost done. The agent is now able to interpret our actions and execute them on the Bandit. Actions are sent to the agent from the Brain. The Brain is responsible for making decisions and we will cover its setup in the next section.