Transformers: Making Sense of the Madness
The Transformer is a seq2seq model proposed by eight Google researchers in 2017 and initially applied to neural machine translation. But before the Transformer, let's look at what was used for seq2seq tasks and its drawbacks.

Recurrent Neural Network (RNN)

Given an input sequence X_1, ..., X_n, an RNN produces an output sequence Y_1, ..., Y_n. At time step t, it takes the current input and the previous hidden state to produce an output and a new hidden state, which in turn serves as input at step t+1.
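The recurrence above can be sketched as a few lines of NumPy. This is a minimal illustration, not a production implementation; the weight names (W_xh, W_hh, W_hy) and the tanh nonlinearity are common conventions I'm assuming here, not something fixed by the text.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step: combine current input with the previous hidden state."""
    h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)  # new hidden state
    y_t = h_t @ W_hy + b_y                           # output at step t
    return y_t, h_t

def rnn_forward(X, h0, params):
    """Process a whole sequence by threading the hidden state step to step."""
    h = h0
    outputs = []
    for x_t in X:                    # X: list/array of input vectors
        y_t, h = rnn_step(x_t, h, *params)
        outputs.append(y_t)
    return outputs, h
```

Note how the hidden state produced at step t is fed back in at step t+1: this sequential dependency is exactly what prevents the loop from being parallelized across time steps, one of the drawbacks the Transformer was designed to remove.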