On-policy Monte Carlo control