Testing the performance of your deep Q-learner in Atari games