Fast weights transformer

This is an annotated PyTorch implementation of the paper Linear Transformers Are Secretly Fast Weight Memory Systems.

The annotated implementation is accompanied by the training code and a notebook for training a fast weights transformer on the Tiny Shakespeare dataset.
