This is an annotated PyTorch implementation of the paper Linear Transformers Are Secretly Fast Weight Memory Systems.
The annotated implementation is accompanied by training code and a notebook for training a fast weights transformer on the Tiny Shakespeare dataset.