GoogLeNet_BN

Posted 2020-04-08, updated 2021-07-09, category: object classification

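The paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift applies batch normalization to convolutional neural networks. By correcting the distribution of each layer's inputs, it speeds up training. At the end of the paper, batch normalization layers are added to GoogLeNet, which yields better classification results.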

Parameter Analysis

The paper gives the parameter settings of GoogLeNet_BN in a table.

Its changes relative to GoogLeNet are as follows:

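In the Inception modules, each \(5\times 5\) convolution layer is replaced by two \(3\times 3\) convolution layers. This makes the network 9 weight layers deeper, increases the number of parameters by 25%, and increases the computational cost by about 30%
Inception (3c) is added
Inside the Inception modules, either average pooling or max pooling is used
There are no standalone pooling layers between Inception modules; instead, the Inception 3c/4e modules use stride 2 to halve the spatial size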
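To make the first item concrete, the weight count of one \(5\times 5\) convolution can be compared with that of two stacked \(3\times 3\) convolutions. This is only an illustrative sketch: the channel sizes (64 in, 96 out) are borrowed from the double 3x3 branch of Inception (3a), biases are ignored, and the helper name conv_weights is made up for this example.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels


# Assumed sizes, taken from the double 3x3 branch of Inception (3a)
in_ch, out_ch = 64, 96

single_5x5 = conv_weights(5, in_ch, out_ch)
double_3x3 = conv_weights(3, in_ch, out_ch) + conv_weights(3, out_ch, out_ch)

# n stacked stride-1 3x3 convolutions cover a (2n + 1) x (2n + 1) receptive field
receptive_field = 2 * 2 + 1

print(single_5x5, double_3x3, receptive_field)
```

With equal channel counts the stacked pair covers the same \(5\times 5\) receptive field with fewer weights, so the parameter growth reported in the paper presumably reflects the branch widths in its table rather than the substitution by itself.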

In addition, GoogLeNet_BN uses a separable convolution with depth multiplier 8 in the first convolution layer to speed up computation

Our model employed separable convolution with depth multiplier 8 on the first convolutional layer. This reduces the computational cost while increasing the memory consumption at training time
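A rough weight count suggests why a separable first layer is cheaper. The quoted passage does not spell out the layer layout, so the sketch below assumes a depthwise \(7\times 7\) convolution with depth multiplier 8 followed by a \(1\times 1\) pointwise convolution to 64 channels. The kernel size and output width are assumptions borrowed from the conv1 layer of the PyTorch code in this post, and both helper names are hypothetical.

```python
def standard_conv_weights(kernel, in_ch, out_ch):
    """Weights of an ordinary convolution (bias ignored)."""
    return kernel * kernel * in_ch * out_ch


def separable_conv_weights(kernel, in_ch, out_ch, depth_multiplier):
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_ch * depth_multiplier
    pointwise = in_ch * depth_multiplier * out_ch
    return depthwise + pointwise


# Assumed layout: RGB input, 7x7 kernel, 64 output channels, depth multiplier 8
standard = standard_conv_weights(7, 3, 64)
separable = separable_conv_weights(7, 3, 64, depth_multiplier=8)

print(standard, separable)
```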

Note: working through the numbers shows that the output depths listed for Inception (4c/d/e) are wrong; they should be \(608/608/1056\) respectively
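The corrected depths can be reproduced by summing the branch widths of each module. The widths below are the ones passed to the Inception constructor in the PyTorch section of this post. Inception (4e) has no \(1\times 1\) branch, and its pooling branch passes the 608 input channels through unchanged.

```python
# (1x1, 3x3, double 3x3, pooling branch) output channels per module
branches = {
    "4c": (160, 160, 160, 128),
    "4d": (96, 192, 192, 128),
    "4e": (0, 192, 256, 608),  # no 1x1 branch; bare max pooling keeps the 608 input channels
}

# The module output depth is the sum of the concatenated branch outputs
output_depth = {name: sum(widths) for name, widths in branches.items()}

print(output_depth)
```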

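Derivation

Taking the Inception 3(a/b/c) modules as examples, this section works through the modified module step by step.

Assume the input has size \(128\times 192\times 28\times 28\) (batch \(\times\) channels \(\times\) height \(\times\) width)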

Inception (3a)

1x1

Input volume: \(128\times 192\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

3x3

First apply a \(1\times 1\) convolution

Input volume: \(128\times 192\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

Then apply a \(3\times 3\) convolution

Input volume: \(128\times 64\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

double 3x3

First apply a \(1\times 1\) convolution

Input volume: \(128\times 192\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

Apply the first \(3\times 3\) convolution

Input volume: \(128\times 64\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 28\times 28\)

Apply the second \(3\times 3\) convolution

Input volume: \(128\times 96\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 28\times 28\)

avg pooling

First apply average pooling

Input volume: \(128\times 192\times 28\times 28\)
Pooling window \(3\times 3\), stride \(1\), zero padding \(1\)
Output volume: \(128\times 192\times 28\times 28\)

Then apply a \(1\times 1\) convolution

Input volume: \(128\times 192\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(32\)
Output volume: \(128\times 32\times 28\times 28\)

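Concatenation

The four branches above produce output volumes with the same spatial size. They are concatenated along the channel dimension, giving an output volume of \(128\times 256\times 28\times 28\)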
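The shapes derived for Inception (3a) follow from the standard output-size formula \(\lfloor (n + 2p - k)/s \rfloor + 1\). The short check below applies it to every branch and sums the branch widths. The helper name conv_out_size is chosen for this example only.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # input height and width of Inception (3a)

branch_sizes = {
    "1x1": conv_out_size(size, 1),
    "3x3": conv_out_size(conv_out_size(size, 1), 3, padding=1),
    "double 3x3": conv_out_size(
        conv_out_size(conv_out_size(size, 1), 3, padding=1), 3, padding=1
    ),
    "avg pooling": conv_out_size(conv_out_size(size, 3, padding=1), 1),
}

# Concatenation along the channel dimension adds the branch widths
channels = 64 + 64 + 96 + 32

print(branch_sizes, channels)
```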

Inception (3b)

1x1

Input volume: \(128\times 256\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

3x3

First apply a \(1\times 1\) convolution

Input volume: \(128\times 256\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

Then apply a \(3\times 3\) convolution

Input volume: \(128\times 64\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 28\times 28\)

double 3x3

First apply a \(1\times 1\) convolution

Input volume: \(128\times 256\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

Apply the first \(3\times 3\) convolution

Input volume: \(128\times 64\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 28\times 28\)

Apply the second \(3\times 3\) convolution

Input volume: \(128\times 96\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 28\times 28\)

avg pooling

First apply average pooling

Input volume: \(128\times 256\times 28\times 28\)
Pooling window \(3\times 3\), stride \(1\), zero padding \(1\)
Output volume: \(128\times 256\times 28\times 28\)

Then apply a \(1\times 1\) convolution

Input volume: \(128\times 256\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

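Concatenation

The four branches above produce output volumes with the same spatial size. They are concatenated along the channel dimension, giving an output volume of \(128\times 320\times 28\times 28\)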

Inception (3c)

This module uses stride \(2\) to halve the spatial size, so it has no standalone \(1\times 1\) convolution branch

3x3

First apply a \(1\times 1\) convolution

Input volume: \(128\times 320\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(128\)
Output volume: \(128\times 128\times 28\times 28\)

Then apply a \(3\times 3\) convolution

Input volume: \(128\times 128\times 28\times 28\)
Kernel size \(3\times 3\), stride \(2\), zero padding \(1\)
Number of filters: \(160\)
Output volume: \(128\times 160\times 14\times 14\)

double 3x3

First apply a \(1\times 1\) convolution

Input volume: \(128\times 320\times 28\times 28\)
Kernel size \(1\times 1\), stride \(1\), zero padding \(0\)
Number of filters: \(64\)
Output volume: \(128\times 64\times 28\times 28\)

Apply the first \(3\times 3\) convolution

Input volume: \(128\times 64\times 28\times 28\)
Kernel size \(3\times 3\), stride \(1\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 28\times 28\)

Apply the second \(3\times 3\) convolution

Input volume: \(128\times 96\times 28\times 28\)
Kernel size \(3\times 3\), stride \(2\), zero padding \(1\)
Number of filters: \(96\)
Output volume: \(128\times 96\times 14\times 14\)

max pooling

Apply max pooling

Input volume: \(128\times 320\times 28\times 28\)
Pooling window \(3\times 3\), stride \(2\), zero padding \(1\)
Output volume: \(128\times 320\times 14\times 14\)

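Concatenation

The three branches above produce output volumes with the same spatial size. They are concatenated along the channel dimension, giving an output volume of \(128\times 576\times 14\times 14\). (Open question: it is not clear to me what the purpose of stride=2 is here, or whether the parameter table is in error. The current implementation does not use stride=2 inside the module to halve the resolution; it keeps the \(28\times 28\) output and halves it with a separate max pooling layer.)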
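The stride-2 arithmetic for the three branches of Inception (3c) can be checked with the same floor-mode output-size formula, \(\lfloor (n + 2p - k)/s \rfloor + 1\). The helper conv_out_size is a name chosen for this sketch.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # input height and width of Inception (3c)

# 3x3 branch: 1x1 reduction, then a stride-2 3x3 convolution
out_3x3 = conv_out_size(conv_out_size(size, 1), 3, stride=2, padding=1)

# double 3x3 branch: only the second 3x3 convolution uses stride 2
out_double = conv_out_size(
    conv_out_size(conv_out_size(size, 1), 3, padding=1), 3, stride=2, padding=1
)

# pooling branch: stride-2 max pooling with no projection, so 320 channels pass through
out_pool = conv_out_size(size, 3, stride=2, padding=1)

channels = 160 + 96 + 320

print(out_3x3, out_double, out_pool, channels)
```

All three branches land on the same spatial size, which is what makes the channel-wise concatenation possible.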

PyTorch

For the GoogLeNet implementation, see: GoogLeNet
For the full GoogLeNet_BN implementation, see: zjZSTU/GoogLeNet

BasicConv2d

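Apply batch normalization after the convolution

import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2d(nn.Module):
    """Convolution -> batch normalization -> ReLU."""

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        # the convolution bias is dropped because BatchNorm2d has its own shift term
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)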
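What a batch normalization layer computes in training mode can be sketched for a single channel in plain Python. This is a simplified illustration, not the PyTorch implementation: it uses eps=0.001 as in the constructor call above, takes the scale and shift at their initial values of 1 and 0, and leaves out the running statistics used at test time. The function name batch_norm_1ch is made up for this example.

```python
import math


def batch_norm_1ch(values, gamma=1.0, beta=0.0, eps=0.001):
    """Normalize one channel's activations over the mini-batch (training mode)."""
    mean = sum(values) / len(values)
    # biased (population) variance, as used for normalization during training
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm_1ch(activations)

# After normalization the batch has zero mean and variance close to 1
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)

print(normalized, norm_mean, norm_var)
```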

Inception

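The \(1\times 1\) convolution branch may be absent
The \(5\times 5\) convolution is replaced by two \(3\times 3\) convolutions
Max pooling or average pooling is selected through an argument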

class Inception(nn.Module):
    __constants__ = ['branch2', 'branch3', 'branch4']

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, dch3x3red, dch3x3, pool_proj,
                 conv_block=None, stride_num=1, pool_type='max'):
        super(Inception, self).__init__()
        if conv_block is None:
            conv_block = BasicConv2d
        if ch1x1 == 0:
            self.branch1 = None
        else:
            self.branch1 = conv_block(in_channels, ch1x1, kernel_size=1, stride=1, padding=0)

        self.branch2 = nn.Sequential(
            conv_block(in_channels, ch3x3red, kernel_size=1, stride=1, padding=0),
            conv_block(ch3x3red, ch3x3, kernel_size=3, stride=stride_num, padding=1)
        )

        self.branch3 = nn.Sequential(
            conv_block(in_channels, dch3x3red, kernel_size=1, stride=1, padding=0),
            # two stacked 3x3 convolutions replace the 5x5; only the second one may downsample
            conv_block(dch3x3red, dch3x3, kernel_size=3, stride=1, padding=1),
            conv_block(dch3x3, dch3x3, kernel_size=3, stride=stride_num, padding=1),
        )

        if pool_proj != 0:
            if pool_type == 'max':
                self.branch4 = nn.Sequential(
                    nn.MaxPool2d(kernel_size=3, stride=stride_num, padding=1, ceil_mode=True),
                    conv_block(in_channels, pool_proj, kernel_size=1, stride=1, padding=0)
                )
            else:
                # avg pooling
                self.branch4 = nn.Sequential(
                    nn.AvgPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
                    conv_block(in_channels, pool_proj, kernel_size=1, stride=1, padding=0)
                )
        else:
            # only max pooling
            self.branch4 = nn.MaxPool2d(kernel_size=3, stride=stride_num, padding=1, ceil_mode=True)

    def _forward(self, x):
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        if self.branch1 is not None:
            branch1 = self.branch1(x)
            outputs = [branch1, branch2, branch3, branch4]
        else:
            outputs = [branch2, branch3, branch4]
        return outputs

    def forward(self, x):
        outputs = self._forward(x)
        return torch.cat(outputs, 1)
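Because the block concatenates its branches along the channel dimension, its output depth can be predicted from the constructor arguments alone. The sketch below mirrors the branch logic of the class (a pool_proj of 0 means bare pooling, which keeps the input depth) and checks that each module's in_channels matches the previous module's output. The argument tuples are copied from the GoogLeNet_BN definition in this post; the helper inception_out_channels is hypothetical.

```python
def inception_out_channels(in_channels, ch1x1, ch3x3red, ch3x3, dch3x3red, dch3x3, pool_proj):
    """Output channels of the Inception block: sum of the concatenated branches."""
    # without a 1x1 projection the pooling branch passes the input channels through
    pool_out = pool_proj if pool_proj != 0 else in_channels
    return ch1x1 + ch3x3 + dch3x3 + pool_out


configs = [
    ("3a", (192, 64, 64, 64, 64, 96, 32)),
    ("3b", (256, 64, 64, 96, 64, 96, 64)),
    ("3c", (320, 0, 128, 160, 64, 96, 0)),
    ("4a", (576, 224, 64, 96, 96, 128, 128)),
    ("4b", (576, 192, 96, 128, 96, 128, 128)),
    ("4c", (576, 160, 128, 160, 128, 160, 128)),
    ("4d", (608, 96, 128, 192, 160, 192, 128)),
    ("4e", (608, 0, 128, 192, 192, 256, 0)),
    ("5a", (1056, 352, 192, 320, 160, 224, 128)),
    ("5b", (1024, 352, 192, 320, 192, 224, 128)),
]

outputs = {name: inception_out_channels(*args) for name, args in configs}

# A module is consistent if its in_channels equals the previous module's output depth
mismatches = [
    (prev_name, name)
    for (prev_name, _), (name, args) in zip(configs, configs[1:])
    if outputs[prev_name] != args[0]
]

print(outputs, mismatches)
```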

GoogLeNet_BN

class GoogLeNet_BN(nn.Module):
    __constants__ = ['aux_logits', 'transform_input']

    def __init__(self, num_classes=1000, aux_logits=True, transform_input=False, init_weights=True,
                 blocks=None):
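        """
        GoogLeNet_BN implementation
        :param num_classes: number of output classes
        :param aux_logits: whether to use the auxiliary classifiers
        :param transform_input: whether to re-normalize the input channels (see _transform_input)
        :param init_weights: whether to run the custom weight initialization
        :param blocks: optional list of [conv block, inception block, auxiliary classifier block]
        """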
        super(GoogLeNet_BN, self).__init__()
        if blocks is None:
            blocks = [BasicConv2d, Inception, InceptionAux]
        assert len(blocks) == 3
        conv_block = blocks[0]
        inception_block = blocks[1]
        inception_aux_block = blocks[2]

        self.aux_logits = aux_logits
        self.transform_input = transform_input

        self.conv1 = conv_block(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)
        self.conv2 = conv_block(64, 64, kernel_size=1, stride=1, padding=0)
        self.conv3 = conv_block(64, 192, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)

        self.inception3a = inception_block(192, 64, 64, 64, 64, 96, 32, pool_type='avg')
        self.inception3b = inception_block(256, 64, 64, 96, 64, 96, 64, pool_type='avg')
        self.inception3c = inception_block(320, 0, 128, 160, 64, 96, 0, pool_type='max')
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = inception_block(576, 224, 64, 96, 96, 128, 128, pool_type='avg')
        self.inception4b = inception_block(576, 192, 96, 128, 96, 128, 128, pool_type='avg')
        self.inception4c = inception_block(576, 160, 128, 160, 128, 160, 128, pool_type='avg')
        self.inception4d = inception_block(608, 96, 128, 192, 160, 192, 128, pool_type='avg')
        self.inception4e = inception_block(608, 0, 128, 192, 192, 256, 0, pool_type='max')
        self.maxpool4 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.inception5a = inception_block(1056, 352, 192, 320, 160, 224, 128, pool_type='avg')
        self.inception5b = inception_block(1024, 352, 192, 320, 192, 224, 128, pool_type='max')

        if aux_logits:
            # auxiliary classifiers
            # inception (4a) output: 14x14x576
            self.aux1 = inception_aux_block(576, num_classes)
            # inception (4d) output: 14x14x608
            self.aux2 = inception_aux_block(608, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(1024, num_classes)

        if init_weights:
            self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                import scipy.stats as stats
                X = stats.truncnorm(-2, 2, scale=0.01)
                values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype)
                values = values.view(m.weight.size())
                with torch.no_grad():
                    m.weight.copy_(values)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _transform_input(self, x):
        # type: (Tensor) -> Tensor
        if self.transform_input:
            x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
            x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
            x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
            x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
        return x

    def _forward(self, x):
        # type: (Tensor) -> Tuple[Tensor, Optional[Tensor], Optional[Tensor]]
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 320 x 28 x 28
        x = self.inception3c(x)
        # N x 576 x 28 x 28
        x = self.maxpool3(x)
        # N x 576 x 14 x 14
        x = self.inception4a(x)
        # N x 576 x 14 x 14
        aux_defined = self.training and self.aux_logits
        if aux_defined:
            aux1 = self.aux1(x)
        else:
            aux1 = None

        x = self.inception4b(x)
        # N x 576 x 14 x 14
        x = self.inception4c(x)
        # N x 608 x 14 x 14
        x = self.inception4d(x)
        # N x 608 x 14 x 14
        if aux_defined:
            aux2 = self.aux2(x)
        else:
            aux2 = None

        x = self.inception4e(x)
        # N x 1056 x 14 x 14
        x = self.maxpool4(x)
        # N x 1056 x 7 x 7
        x = self.inception5a(x)
        # N x 1024 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        return x, aux2, aux1

    def forward(self, x):
        x = self._transform_input(x)
        x, aux1, aux2 = self._forward(x)
        aux_defined = self.training and self.aux_logits
        if aux_defined:
            # during training, return the outputs of all three classifiers
            return x, aux2, aux1
        else:
            # at test time, only the final classifier is used
            return x
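The spatial sizes noted in the comments of _forward can be traced by hand. The sketch below does so for a 224 input, using floor mode for the stem convolution and ceil mode for the four max pooling layers, which are all created with ceil_mode=True. The adjustment inside pool_out_ceil reflects my understanding of how PyTorch treats a last window that would start outside the input; treat it as an assumption. Both helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output size of a convolution (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out_ceil(size, kernel, stride, padding=0):
    """Output size of a pooling layer with ceil_mode=True."""
    out = math.ceil((size + 2 * padding - kernel) / stride) + 1
    # assumed rule: drop the last window if it would start beyond the padded input
    if (out - 1) * stride >= size + padding:
        out -= 1
    return out


trace = {"input": 224}
trace["conv1"] = conv_out(trace["input"], kernel=7, stride=2, padding=3)
trace["maxpool1"] = pool_out_ceil(trace["conv1"], kernel=3, stride=2)
# conv2 (1x1) and conv3 (3x3, padding 1) keep the spatial size
trace["maxpool2"] = pool_out_ceil(trace["maxpool1"], kernel=3, stride=2)
# inception 3a/3b/3c run with stride 1 in this implementation
trace["maxpool3"] = pool_out_ceil(trace["maxpool2"], kernel=3, stride=2)
# inception 4a to 4e keep the spatial size
trace["maxpool4"] = pool_out_ceil(trace["maxpool3"], kernel=2, stride=2)

print(trace)
```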

Testing

This section compares GoogLeNet_BN with GoogLeNet. For the test code, see test_googlenet_bn.py

Parameter count

[googlenet_bn] param num: 17683640
[googlenet] param num: 13370744
num_googlenet_bn / num_googlenet: 1.32

GoogLeNet_BN has 17.68 million parameters and GoogLeNet has 13.37 million, a ratio of 1.32

Inference time

[googlenet_bn] time: 0.0596
[googlenet] time: 0.0602
time_googlenet / time_googlenet_bn: 1.010

Average time per test image over 100 runs:

GoogLeNet_BN: 0.0596 s
GoogLeNet: 0.0602 s

The two networks take about the same time
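The actual test script is not reproduced in this post. A generic way to average wall-clock time over repeated calls might look like the sketch below, where average_time and dummy_forward are hypothetical stand-ins (dummy_forward replaces a model forward pass so the example runs without PyTorch).

```python
import time


def average_time(fn, repeats=100):
    """Average wall-clock seconds per call of fn over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def dummy_forward():
    # stand-in workload; in a real test this would be a model forward pass
    return sum(i * i for i in range(1000))


avg = average_time(dummy_forward, repeats=100)

print(f"average time per call: {avg:.6f} s")
```

When timing a model on a GPU, the device usually needs to be synchronized before reading the clock, otherwise only the kernel launch time is measured.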

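Training

GoogLeNet_BN and GoogLeNet are trained with the same settings:

Dataset: PASCAL VOC 07+12, 20 classes with 40058 training samples and 12032 test samples
Batch size: 128
Optimizer: Adam with learning rate 1e-3
Step decay: the learning rate decays by 4% every 8 epochs (decay factor 0.96)
Epochs: 100

The results after 100 epochs are as follows: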
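The step decay described above can be written as a closed-form function of the epoch index. The sketch assumes the decay is applied exactly every 8 epochs starting from epoch 0, which is also how torch.optim.lr_scheduler.StepLR behaves with step_size=8 and gamma=0.96. Whether the original training script uses that scheduler is an assumption, and step_decay_lr is a name chosen for this example.

```python
def step_decay_lr(epoch, base_lr=1e-3, step_size=8, gamma=0.96):
    """Learning rate at a given epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 100 training epochs
schedule = [step_decay_lr(epoch) for epoch in range(100)]

print(schedule[0], schedule[8], schedule[99])
```

Over 100 epochs the rate is reduced 12 times, so the final learning rate is the base rate multiplied by 0.96 to the power 12.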

{'train': 40058, 'test': 12032}
Epoch 0/99
----------
train Loss: 4.2452 Acc: 0.2644
test Loss: 2.4459 Acc: 0.3763
Epoch 1/99
----------
...
...
----------
train Loss: 0.9129 Acc: 0.8467
test Loss: 0.9284 Acc: 0.7454
Epoch 98/99
----------
train Loss: 0.8963 Acc: 0.8524
test Loss: 0.9539 Acc: 0.7406
Epoch 99/99
----------
train Loss: 0.8869 Acc: 0.8526
test Loss: 0.9968 Acc: 0.7409
Training complete in 194m 38s
Best test Acc: 0.747839
train googlenet_bn done

Epoch 0/99
----------
train Loss: 4.2141 Acc: 0.2787
test Loss: 2.4076 Acc: 0.3763
Epoch 1/99
----------
train Loss: 3.9860 Acc: 0.3354
test Loss: 2.2959 Acc: 0.3969
Epoch 2/99
----------
...
...
----------
train Loss: 0.9720 Acc: 0.8304
test Loss: 0.9777 Acc: 0.7278
Epoch 98/99
----------
train Loss: 0.9744 Acc: 0.8279
test Loss: 0.9249 Acc: 0.7358
Epoch 99/99
----------
train Loss: 0.9632 Acc: 0.8336
test Loss: 0.9337 Acc: 0.7350
Training complete in 152m 5s
Best test Acc: 0.742852
train googlenet done

After 100 epochs, GoogLeNet_BN reaches a best test accuracy of 74.78%, while GoogLeNet reaches a best test accuracy of 74.29%
