
Encoder-decoder structure
Neural machine translation models are Recurrent Neural Networks (RNN), arranged in encoder-decoder fashion. The encoder network takes in variable length input sequences through RNN and encodes the sequences into a fixed size vector. The decoder begins with this encoded vector and starts generating translation word by word, until it predicts the end of sentence. The whole architecture is trained end-to-end with input sentence and correct output translation. The major advantage of these systems (apart from the capability to handle variable input size) is that they learn the context of a sentence and predict accordingly, rather than making a word-to-word translation. Neural machine translation can be best seen in action on Google translate in the following screenshot:
