Study notes for the final course of the NLP Specialization on Coursera

Expected: Serial[
  Serial[
    Serial[
      ShiftRight(1)
    ]
    Embedding_33000_512
    Dropout
    Serial[
      PositionalEncoding
    ]
    Dup_out2
    ReversibleSerial_in2_out2[
      ReversibleHalfResidualDecoderAttn_in2_out2[
        Serial[
          LayerNorm
        ]
        SelfAttention
      ]
      ReversibleSwap_in2_out2
      ReversibleHalfResidualDecoderFF_in2_out2[
        Serial[
          LayerNorm
          Dense_2048
          Dropout
          Serial[
            FastGelu
          ]
          Dense_512
          Dropout
        ]
      ]
      ReversibleSwap_in2_out2
      ReversibleHalfResidualDecoderAttn_in2_out2[
        Serial[
          LayerNorm
        ]
        SelfAttention
      ]
      ReversibleSwap_in2_out2
      ReversibleHalfResidualDecoderFF_in2_out2[
        Serial[
          LayerNorm
          Dense_2048
          Dropout
          Serial[
            FastGelu
          ]
          Dense_512
          Dropout
        ]
      ]
      ReversibleSwap_in2_out2
    ]
    Concatenate_in2
    LayerNorm
    Dropout
    Serial[
      Dense_33000
    ]
  ]
  LogSoftmax
]
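
This printout is the structure of a two-layer Reformer language model. Below is a minimal sketch of how such a model could be constructed, assuming it comes from trax.models.reformer.ReformerLM with the default d_model=512 and d_ff=2048; the vocabulary size 33000 and the layer count are read off the printout, and the wrapper function itself is illustrative:

    import trax
    from trax import layers as tl

    def ReformerLM(vocab_size=33000, n_layers=2, mode='train',
                   attention_type=tl.SelfAttention):
        # Reversible-residual decoder stack; the Dense_2048 / Dense_512
        # feed-forward pair in the printout corresponds to the Trax
        # defaults d_ff=2048 and d_model=512.
        return trax.models.reformer.ReformerLM(
            vocab_size=vocab_size,
            n_layers=n_layers,
            mode=mode,
            attention_type=attention_type,
        )

    # print(ReformerLM()) should reproduce the expected structure above.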

Instructions: Implement the training_loop below to train the neural network above. Here is a list of things you should do:

- You will be using your CrossEntropyLoss loss function with the Adam optimizer; a sketch follows after this list. Please read the Trax documentation to get a full understanding.
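
A minimal sketch of such a training loop, built on trax.supervised.training. The data generators train_gen and eval_gen, the learning-rate schedule, the checkpoint interval, and the output directory are assumptions for illustration, not part of the assignment text:

    import trax
    from trax import layers as tl
    from trax.supervised import training

    def training_loop(ReformerLM, train_gen, eval_gen, output_dir='./model/'):
        # Warmup then inverse-sqrt decay; warmup length and peak rate are assumed.
        lr_schedule = trax.lr.warmup_and_rsqrt_decay(n_warmup_steps=1000,
                                                     max_value=0.01)

        train_task = training.TrainTask(
            labeled_data=train_gen,                # generator of training batches
            loss_layer=tl.CrossEntropyLoss(),      # pairs with the LogSoftmax output
            optimizer=trax.optimizers.Adam(0.01),  # Adam, as the instructions require
            lr_schedule=lr_schedule,
            n_steps_per_checkpoint=10,             # assumed checkpoint interval
        )

        eval_task = training.EvalTask(
            labeled_data=eval_gen,
            metrics=[tl.CrossEntropyLoss(), tl.Accuracy()],
        )

        return training.Loop(ReformerLM(mode='train'),
                             train_task,
                             eval_tasks=[eval_task],
                             output_dir=output_dir)

Calling loop = training_loop(ReformerLM, train_gen, eval_gen) and then loop.run(n_steps=100) runs the training steps and writes checkpoints to output_dir.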