Group Normalization

This is a PyTorch implementation of the Group Normalization paper.

Batch Normalization works well when the batch size is large enough, but poorly for small batch sizes, because it normalizes over the batch. Large models often cannot be trained with large batch sizes because of the limited memory capacity of the devices.

This paper introduces Group Normalization, which normalizes a set of features together as a group. This is based on the observation that classical features such as SIFT and HOG are group-wise features. The paper proposes dividing feature channels into groups and then separately normalizing all channels within each group.

Formulation

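All normalization layers can be defined by the following computation.

$$\hat{x}_i = \frac{1}{\sigma_i} (x_i - \mu_i)$$

where $x$ is the tensor representing the batch, and $i$ is the index of a single value. For instance, for 2D images $i = (i_N, i_C, i_H, i_W)$ is a 4-d vector indexing the image within the batch, the feature channel, the vertical coordinate and the horizontal coordinate. $\mu_i$ and $\sigma_i$ are the mean and standard deviation.

$$\mu_i = \frac{1}{m} \sum_{k \in \mathcal{S}_i} x_k \qquad \sigma_i = \sqrt{\frac{1}{m} \sum_{k \in \mathcal{S}_i} (x_k - \mu_i)^2 + \epsilon}$$

$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation are calculated for index $i$. $m$ is the size of the set $\mathcal{S}_i$, which is the same for all $i$.

The definition of $\mathcal{S}_i$ is different for Batch normalization, Layer normalization, and Instance normalization.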

Batch Normalization

$$\mathcal{S}_i = \{k \mid k_C = i_C\}$$

The values that share the same feature channel are normalized together.

Layer Normalization

$$\mathcal{S}_i = \{k \mid k_N = i_N\}$$

The values from the same sample in the batch are normalized together.

Instance Normalization

$$\mathcal{S}_i = \{k \mid k_N = i_N, k_C = i_C\}$$

The values from the same sample and same feature channel are normalized together.
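For a 4-D input of shape [N, C, H, W], the three cases above amount to taking the mean and variance over different axes. The following is a minimal sketch rather than part of the implementation on this page: the helper name `normalize_over` is hypothetical, and the affine parameters and running statistics that real layers use are left out.

```python
import torch


def normalize_over(x: torch.Tensor, dims, eps: float = 1e-5) -> torch.Tensor:
    """Normalize x with the mean and biased variance taken over `dims`."""
    mean = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


torch.manual_seed(0)
x = torch.randn(4, 6, 8, 8)  # [N, C, H, W]

# Batch normalization: values sharing a feature channel (reduce over N, H, W)
x_bn = normalize_over(x, dims=(0, 2, 3))
# Layer normalization: values from the same sample (reduce over C, H, W)
x_ln = normalize_over(x, dims=(1, 2, 3))
# Instance normalization: same sample and same channel (reduce over H, W)
x_in = normalize_over(x, dims=(2, 3))
```

Each result has the same shape as the input; only the set of values that share a mean and standard deviation changes.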

Group Normalization

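$$\mathcal{S}_i = \bigg\{k \;\bigg|\; k_N = i_N, \bigg\lfloor \frac{k_C}{C/G} \bigg\rfloor = \bigg\lfloor \frac{i_C}{C/G} \bigg\rfloor\bigg\}$$

where $G$ is the number of groups and $C$ is the number of channels.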

Group normalization normalizes values of the same sample and the same group of channels together.
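As a rough illustration of the grouping, the sketch below assumes channels are split into consecutive blocks of C / G channels, so that a channel's group index is the integer division of its index by C / G. The sizes are arbitrary and the affine transform is omitted.

```python
import torch

N, C, H, W = 2, 6, 4, 4
G = 2  # number of groups; C must be divisible by G

# Group index of each channel: floor(channel / (C / G))
group_of_channel = [c // (C // G) for c in range(C)]
print(group_of_channel)  # [0, 0, 0, 1, 1, 1]

torch.manual_seed(0)
x = torch.randn(N, C, H, W)

# Consecutive channels share a group, so a reshape gathers each group's values
x_grouped = x.reshape(N, G, -1)  # [N, G, (C / G) * H * W]
mean = x_grouped.mean(dim=-1, keepdim=True)
var = x_grouped.var(dim=-1, unbiased=False, keepdim=True)
x_gn = ((x_grouped - mean) / torch.sqrt(var + 1e-5)).reshape(N, C, H, W)
```

With G = 1 every channel of a sample falls in one group, which matches the layer normalization set; with G = C each channel is its own group, which matches the instance normalization set.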

Here's a CIFAR 10 classification model that uses group normalization.


import torch
from torch import nn

Group Normalization Layer

class GroupNorm(nn.Module):
  • groups is the number of groups the features are divided into
  • channels is the number of features in the input
  • eps is $\epsilon$, used in $\sqrt{\mathrm{Var}[x] + \epsilon}$ for numerical stability
  • affine is whether to scale and shift the normalized value
    def __init__(self, groups: int, channels: int, *,
                 eps: float = 1e-5, affine: bool = True):
        super().__init__()

        assert channels % groups == 0, "Number of channels should be evenly divisible by the number of groups"
        self.groups = groups
        self.channels = channels

        self.eps = eps
        self.affine = affine

Create parameters for $\gamma$ and $\beta$ for scale and shift

        if self.affine:
            self.scale = nn.Parameter(torch.ones(channels))
            self.shift = nn.Parameter(torch.zeros(channels))

x is a tensor of shape [batch_size, channels, *]. * denotes any number of (possibly 0) dimensions. For example, in an image (2D) convolution this will be [batch_size, channels, height, width].

    def forward(self, x: torch.Tensor):

Keep the original shape

        x_shape = x.shape

Get the batch size

        batch_size = x_shape[0]

Sanity check to make sure the number of channels in the input matches the layer

        assert self.channels == x.shape[1]

Reshape into [batch_size, groups, n]

        x = x.view(batch_size, self.groups, -1)

Calculate the mean across the last dimension; i.e. the mean for each sample and channel group, $\mathbb{E}[x_{(i_N, i_G)}]$

        mean = x.mean(dim=[-1], keepdim=True)

Calculate the mean of the squares across the last dimension; i.e. $\mathbb{E}[x^2_{(i_N, i_G)}]$ for each sample and channel group

        mean_x2 = (x ** 2).mean(dim=[-1], keepdim=True)

Variance for each sample and channel group, $\mathrm{Var}[x_{(i_N, i_G)}] = \mathbb{E}[x^2_{(i_N, i_G)}] - \mathbb{E}[x_{(i_N, i_G)}]^2$

        var = mean_x2 - mean ** 2

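Normalize, $\hat{x}_{(i_N, i_G)} = \frac{x_{(i_N, i_G)} - \mathbb{E}[x_{(i_N, i_G)}]}{\sqrt{\mathrm{Var}[x_{(i_N, i_G)}] + \epsilon}}$

        x_norm = (x - mean) / torch.sqrt(var + self.eps)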

Scale and shift channel-wise, $y_{i_C} = \gamma_{i_C} \hat{x}_{i_C} + \beta_{i_C}$

        if self.affine:
            x_norm = x_norm.view(batch_size, self.channels, -1)
            x_norm = self.scale.view(1, -1, 1) * x_norm + self.shift.view(1, -1, 1)

Reshape to original and return

        return x_norm.view(x_shape)
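The variance step in the forward pass relies on the identity Var[x] = E[x^2] - E[x]^2, which gives the biased (population) variance. The sketch below checks that identity against `torch.var` with `unbiased=False`; the tensor shape and tolerance are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 3, 64)  # [batch_size, groups, n]

# Variance from the two means, as in the forward pass
mean = x.mean(dim=-1, keepdim=True)
mean_x2 = (x ** 2).mean(dim=-1, keepdim=True)
var_identity = mean_x2 - mean ** 2

# Reference: biased variance computed directly
var_reference = x.var(dim=-1, unbiased=False, keepdim=True)

print(torch.allclose(var_identity, var_reference, atol=1e-5))
```

In floating point the subtraction can lose precision when the mean is large relative to the spread of the values, so the two results may differ slightly for such inputs.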

Simple test

def _test():
    from labml.logger import inspect

    x = torch.zeros([2, 6, 2, 4])
    inspect(x.shape)
    gn = GroupNorm(2, 6)

    x = gn(x)
    inspect(x.shape)

if __name__ == '__main__':
    _test()
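The test above feeds an all-zero tensor, so it only confirms that the shape is preserved. A stronger check, sketched here under the assumption that PyTorch's built-in `torch.nn.functional.group_norm` is an acceptable reference, compares the numerical output on random input. `group_norm_sketch` is a hypothetical helper that restates the forward steps without the affine transform, so the block runs on its own.

```python
import torch
import torch.nn.functional as F


def group_norm_sketch(x: torch.Tensor, groups: int, eps: float = 1e-5) -> torch.Tensor:
    """Group normalization without scale and shift, following the steps above."""
    x_shape = x.shape
    x = x.reshape(x_shape[0], groups, -1)
    mean = x.mean(dim=-1, keepdim=True)
    var = (x ** 2).mean(dim=-1, keepdim=True) - mean ** 2
    return ((x - mean) / torch.sqrt(var + eps)).reshape(x_shape)


torch.manual_seed(0)
x = torch.randn(2, 6, 2, 4)

ours = group_norm_sketch(x, groups=2)
reference = F.group_norm(x, num_groups=2, eps=1e-5)

# Largest absolute difference between the two outputs
print((ours - reference).abs().max())
```

The difference should be at the level of floating-point rounding; a larger gap would point to a mismatch in how channels are grouped or in how the variance is computed.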