This is a PyTorch implementation of the Group Normalization paper.
Batch Normalization works well for sufficiently large batch sizes but poorly for small ones, because it computes its mean and variance over the batch, and those estimates become noisy when the batch is small. Training large models with large batch sizes is often impossible because of the limited memory of the devices.
This paper introduces Group Normalization, which normalizes a set of features together as a group. It is motivated by the observation that classical features such as SIFT and HOG are group-wise features. The paper proposes dividing the feature channels into groups and normalizing each group separately: for every sample, the mean and variance are computed over the channels in a group and their spatial positions, so the statistics do not depend on the batch size.
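To make the grouping concrete, here is a minimal sketch of the normalization step, assuming an input of shape `[batch, channels, *spatial]` with consecutive channels assigned to the same group. The function name `group_norm` is hypothetical, and the learnable per-channel scale and shift are omitted for brevity.

```python
import torch


def group_norm(x: torch.Tensor, groups: int, eps: float = 1e-5) -> torch.Tensor:
    """Hypothetical helper: normalize x of shape [batch, channels, *spatial] per group.

    Mean and variance are computed per sample and per group, over the channels
    in that group and all spatial positions, so they never mix samples.
    No learnable scale/shift is applied here.
    """
    batch, channels = x.shape[0], x.shape[1]
    assert channels % groups == 0, "channels must be divisible by groups"
    # Consecutive channels form a group: [batch, groups, channels_per_group * spatial]
    x_grouped = x.reshape(batch, groups, -1)
    mean = x_grouped.mean(dim=-1, keepdim=True)
    # Biased variance (divide by N), as normalization layers use
    var = ((x_grouped - mean) ** 2).mean(dim=-1, keepdim=True)
    x_norm = (x_grouped - mean) / torch.sqrt(var + eps)
    # Restore the original shape
    return x_norm.reshape(x.shape)


x = torch.randn(2, 6, 4, 4)  # 2 samples, 6 channels, 4x4 feature maps
y = group_norm(x, groups=3)  # 3 groups of 2 channels each
```

With `groups=1` this normalizes over all channels of a sample (like Layer Normalization), and with `groups` equal to the number of channels it normalizes each channel on its own (like Instance Normalization).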
Here's a CIFAR-10 classification model that uses group normalization.
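As a rough illustration of where group normalization sits in such a network, the sketch below is a small hypothetical CNN for 32×32 RGB CIFAR-10 images; it is not the model referenced above. It uses PyTorch's built-in `nn.GroupNorm` in place of `nn.BatchNorm2d`; the layer sizes and the choice of 8 groups are assumptions.

```python
import torch
from torch import nn


def conv_block(in_channels: int, out_channels: int, groups: int = 8) -> nn.Sequential:
    # Conv -> GroupNorm -> ReLU; the conv bias is dropped because the
    # normalization's own shift makes it redundant
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups=groups, num_channels=out_channels),
        nn.ReLU(inplace=True),
    )


class SmallGNNet(nn.Module):
    """Hypothetical small CNN for 32x32 RGB CIFAR-10 images (10 classes)."""

    def __init__(self, groups: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 32, groups),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            conv_block(32, 64, groups),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
            conv_block(64, 128, groups),
            nn.MaxPool2d(2),  # 8x8 -> 4x4
        )
        self.classifier = nn.Linear(128 * 4 * 4, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))


model = SmallGNNet()
logits = model(torch.randn(2, 3, 32, 32))  # shape [2, 10]
```

Because each sample is normalized on its own, the model's output for an image is the same in training and evaluation mode and does not depend on which other images share its batch.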