This is a PyTorch implementation of the paper Pay Attention to MLPs.
This paper introduces a Multilayer Perceptron (MLP) based architecture with gating, which the authors name gMLP. The model consists of a stack of gMLP blocks; a sketch of a single block is given below.
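The following is a minimal sketch of one gMLP block with its Spatial Gating Unit, assuming a fixed sequence length and a `(batch, seq_len, d_model)` layout; the class and parameter names here are illustrative and are not necessarily the identifiers used in this implementation.

```python
import torch
from torch import nn


class SpatialGatingUnit(nn.Module):
    """Gates half the channels with a linear projection across sequence positions."""

    def __init__(self, d_ffn: int, seq_len: int):
        super().__init__()
        # Normalize the gate half of the channels
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear projection along the sequence (spatial) dimension
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Initialize weights near zero and bias to one, so the gate starts close to identity
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels into the pass-through half `u` and the gate half `v`
        u, v = x.chunk(2, dim=-1)
        v = self.norm(v)
        # Mix information across sequence positions
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        # Element-wise gating
        return u * v


class GMLPBlock(nn.Module):
    """One gMLP block: norm -> channel projection -> GELU -> spatial gating -> projection, with a residual."""

    def __init__(self, d_model: int, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.activation = nn.GELU()
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.norm(x)
        x = self.activation(self.proj_in(x))
        x = self.sgu(x)
        x = self.proj_out(x)
        return x + shortcut


if __name__ == '__main__':
    # A small stack of gMLP blocks over random token embeddings
    blocks = nn.Sequential(*[GMLPBlock(d_model=128, d_ffn=512, seq_len=64) for _ in range(6)])
    x = torch.randn(2, 64, 128)  # (batch, seq_len, d_model)
    print(blocks(x).shape)       # torch.Size([2, 64, 128])
```

Note that for autoregressive language modeling the spatial projection must be causally masked (e.g. restricted to a lower-triangular weight matrix) so that a position only mixes with earlier positions; the sketch above omits that masking for brevity.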
Here is the training code for an autoregressive language model based on gMLP.