Q-learning in the checkerboard environment