Understanding gradient descent in word2vec