Image-to-Image Translation with Conditional Adversarial Networks: Paper Notes

Original paper

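1. What it does

The model translates one image into another. Example uses include generating a painting from a sketch and turning a winter scene into a spring one.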

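2. Network model

The paper uses a supervised GAN: the generator G maps an input x to G(x), and the discriminator D tries to tell real images from generated ones.

An intuitive introduction to GANs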

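2.1 Notable design choices

2.1.1 Modeling high- and low-frequency information together

Low-frequency content mainly carries the overall shape and outline of an image. High-frequency content mainly carries details such as edges.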

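The objective of the model is:

$$G^* = \arg\min_G \max_D L_{cGAN}(G, D) + \lambda L_{L1}(G)$$

Here $\lambda L_{L1}(G)$ measures the gap between the image produced by G and the real image:

$$L_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]$$

Because this expectation is taken over the whole image, it penalizes global differences, which makes it well suited to modeling low-frequency information.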
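
A minimal sketch of the L1 term and of how it is weighted against the adversarial term, using plain Python lists as stand-in images. The helper names `l1_loss` and `generator_objective` are hypothetical, the loss is averaged per pixel for readability, and `lam=100.0` is only an illustrative value, not one stated in these notes.

```python
def l1_loss(fake, real):
    """Mean absolute difference over all pixels of two equally sized 2D images."""
    total = 0.0
    count = 0
    for fake_row, real_row in zip(fake, real):
        for f, r in zip(fake_row, real_row):
            total += abs(f - r)
            count += 1
    return total / count


def generator_objective(gan_loss, fake, real, lam=100.0):
    """Combine an adversarial loss value with the weighted L1 term."""
    return gan_loss + lam * l1_loss(fake, real)


# Two toy 2x2 "images" that differ in two of their four pixels.
fake = [[0.0, 1.0], [1.0, 0.0]]
real = [[0.0, 0.0], [1.0, 1.0]]
print(l1_loss(fake, real))                   # 0.5
print(generator_objective(0.7, fake, real))  # 0.7 plus 100 times the L1 term
```

Every pixel contributes equally to the average, so the term reacts to overall differences in content rather than to fine local texture.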

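High-frequency information is modeled with PatchGAN. In an ordinary GAN of this kind, the discriminator takes two images as input and returns a single number as its verdict. PatchGAN still takes the same two images as input, but it returns one score per patch, so each score judges only a local region. In practice the image is not cut into patches explicitly: the discriminator is convolutional, and each element of its output map corresponds to one patch of the input, namely that element's receptive field. Scoring many small patches and then computing the loss over them preserves detail better than scoring the whole image at once, which is why this technique models high-frequency information well.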
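
The patch-wise loss can be sketched in plain Python. This assumes a binary cross-entropy loss on raw scores (logits); the open-source code may be configured with a different GAN loss. The names `bce_with_logits` and `patch_loss` and the 2x2 score map are hypothetical.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy for one raw score (logit) and a 0/1 target."""
    # Numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def patch_loss(score_map, target_is_real):
    """Average the per-patch losses over a 2D map of discriminator scores."""
    target = 1.0 if target_is_real else 0.0
    losses = [bce_with_logits(s, target) for row in score_map for s in row]
    return sum(losses) / len(losses)


# A made-up 2x2 score map: three patches look real, one looks fake.
scores = [[3.0, 3.0], [3.0, -3.0]]
print(patch_loss(scores, target_is_real=True))
print(patch_loss(scores, target_is_real=False))
```

A single unconvincing patch raises the loss on its own, which is how local detail errors get penalized even when the image as a whole looks plausible.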

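2.2 Implementation details:

2.2.1 Generator

The generator uses the classic U-Net architecture: UNet

U-Net combines the features from each downsampling stage with the features of the matching upsampling stage. This preserves the low-level information of the image, so the generated image shares low-level structure with the input: for example, when generating a real image from a hand-drawn sketch, the shapes in the sketch and in the real image should be very similar. The generator code is as follows: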

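import functools

import torch
import torch.nn as nn


# Note: UnetSkipConnectionBlock is not shown in these notes; it comes from the same open-source code.
class UnetGenerator(nn.Module):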
    """Create a Unet-based generator"""

    def __init__(self, input_nc, output_nc, num_downs, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False):
        """Construct a Unet generator
        Parameters:
            input_nc (int)  -- the number of channels in input images
            output_nc (int) -- the number of channels in output images
            num_downs (int) -- the number of downsamplings in UNet. For example, if |num_downs| == 7,
                                an image of size 128x128 will become of size 1x1 at the bottleneck
            ngf (int)       -- the number of filters in the first conv layer
            norm_layer      -- normalization layer
            use_dropout (bool) -- whether the intermediate ngf * 8 blocks use dropout

        We construct the U-Net from the innermost layer to the outermost layer.
        It is a recursive process.
        """
        super(UnetGenerator, self).__init__()
        # construct unet structure
        unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=None, norm_layer=norm_layer, innermost=True)  # add the innermost layer
        for i in range(num_downs - 5):  # add intermediate layers with ngf * 8 filters
            unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer, use_dropout=use_dropout)
        # gradually reduce the number of filters from ngf * 8 to ngf
        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        self.model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc, submodule=unet_block, outermost=True, norm_layer=norm_layer)  # add the outermost layer

    def forward(self, input):
        """Standard forward"""
        return self.model(input)
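
To make the recursive construction above easier to follow, here is a small sketch that lists the feature-map shape after each downsampling step. It assumes that every block halves the spatial size and that the second constructor argument is the channel count after the block's downsampling conv; `UnetSkipConnectionBlock` is not shown in these notes, so both points are assumptions. The helper name `unet_encoder_shapes` is hypothetical.

```python
def unet_encoder_shapes(image_size, num_downs, ngf=64):
    """Return (channels, height, width) after each of the num_downs downsamplings."""
    if num_downs < 5:
        raise ValueError("the constructor above needs num_downs >= 5")
    # Outermost block, three widening blocks, then ngf * 8 down to the bottleneck.
    channels = [ngf, ngf * 2, ngf * 4] + [ngf * 8] * (num_downs - 3)
    shapes = []
    size = image_size
    for c in channels:
        size //= 2  # assumed stride-2 downsampling
        shapes.append((c, size, size))
    return shapes


for shape in unet_encoder_shapes(256, 8):
    print(shape)
# The last line printed is (512, 1, 1): a 256x256 input reaches 1x1 after 8 downsamplings.
```

With `num_downs=7` the same bottleneck is reached from a 128x128 input, which matches the example in the docstring.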

2.2.2 Discriminator

The discriminator is an ordinary convolutional network whose output is the verdict for each patch. The code is as follows:

class NLayerDiscriminator(nn.Module):
    """Defines a PatchGAN discriminator"""

    def __init__(self, input_nc, ndf=64, n_layers=3, norm_layer=nn.BatchNorm2d):
        """Construct a PatchGAN discriminator

        Parameters:
            input_nc (int)  -- the number of channels in input images
            ndf (int)       -- the number of filters in the first conv layer
            n_layers (int)  -- the number of stride-2 conv layers; two stride-1 conv layers follow them
            norm_layer      -- normalization layer
        """
        super(NLayerDiscriminator, self).__init__()
        if type(norm_layer) == functools.partial:  # no need to use bias as BatchNorm2d has affine parameters
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d

        kw = 4
        padw = 1
        sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]
        nf_mult = 1
        nf_mult_prev = 1
        for n in range(1, n_layers):  # gradually increase the number of filters
            nf_mult_prev = nf_mult
            nf_mult = min(2**n, 8)
            sequence += [nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=2, padding=padw, bias=use_bias), norm_layer(ndf * nf_mult), nn.LeakyReLU(0.2, True)]

        nf_mult_prev = nf_mult
        nf_mult = min(2**n_layers, 8)
        sequence += [nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, padding=padw, bias=use_bias), norm_layer(ndf * nf_mult), nn.LeakyReLU(0.2, True)]

        sequence += [nn.Conv2d(ndf * nf_mult, 1, kernel_size=kw, stride=1, padding=padw)]  # output 1 channel prediction map
        self.model = nn.Sequential(*sequence)

    def forward(self, input):
        """Standard forward."""
        return self.model(input)
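
The size of the patch that each output score covers follows from the kernel sizes and strides in the code above. The sketch below mirrors that layer list and computes the receptive field and the output map size; the helper names are hypothetical and the 256x256 input size is an assumption chosen for illustration.

```python
def patchgan_layers(n_layers=3, kernel=4, padding=1):
    """List (kernel, stride, padding) per conv layer, mirroring NLayerDiscriminator."""
    layers = [(kernel, 2, padding)]                    # first conv, stride 2
    layers += [(kernel, 2, padding)] * (n_layers - 1)  # the loop over range(1, n_layers)
    layers += [(kernel, 1, padding)] * 2               # stride-1 conv and final 1-channel conv
    return layers


def receptive_field(layers):
    """Side length of the input patch seen by one output element."""
    rf = 1
    for kernel, stride, _ in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(size, layers):
    """Side length of the output map for a square input of the given size."""
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


layers = patchgan_layers(n_layers=3)
print(receptive_field(layers))   # 70
print(output_size(256, layers))  # 30
```

With the default `n_layers=3`, each score therefore judges a 70x70 patch, and a 256x256 input yields a 30x30 map of scores.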

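2.2.3 Loss functions

The two functions below are methods of the model class in the open-source code, excerpted here: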

def backward_D(self):
    """Calculate GAN loss for the discriminator"""
    # Fake; stop backprop to the generator by detaching fake_B
    fake_AB = torch.cat((self.real_A, self.fake_B), 1)  # we use conditional GANs; we need to feed both input and output to the discriminator
    pred_fake = self.netD(fake_AB.detach())
    self.loss_D_fake = self.criterionGAN(pred_fake, False)
    # Real
    real_AB = torch.cat((self.real_A, self.real_B), 1)
    pred_real = self.netD(real_AB)
    self.loss_D_real = self.criterionGAN(pred_real, True)
    # combine loss and calculate gradients
    self.loss_D = (self.loss_D_fake + self.loss_D_real) * 0.5
    self.loss_D.backward()

def backward_G(self):
    """Calculate GAN and L1 loss for the generator"""
    # First, G(A) should fake the discriminator
    fake_AB = torch.cat((self.real_A, self.fake_B), 1)
    pred_fake = self.netD(fake_AB)
    self.loss_G_GAN = self.criterionGAN(pred_fake, True)
    # Second, G(A) = B
    self.loss_G_L1 = self.criterionL1(self.fake_B, self.real_B) * self.opt.lambda_L1
    # combine loss and calculate gradients
    self.loss_G = self.loss_G_GAN + self.loss_G_L1
    self.loss_G.backward()

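loss_D has two parts. The first is the loss on the (real, fake) pair, that is, the real input together with the generated output; it penalizes mistaking a fake for real. The second is the loss on the (real, real) pair; it penalizes judging a real image as fake. The final loss is the average of the two. To emphasize one of the cases, change the weights: for example, if mistaking a fake for real is considered intolerable, give that term a high weight a and the other term the weight 1-a.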
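
The reweighting idea can be sketched as a small helper. The name `weighted_d_loss` is hypothetical and is not part of the open-source code; it only uses arithmetic, so it would work on plain floats as shown here.

```python
def weighted_d_loss(loss_fake, loss_real, a=0.5):
    """Weight the (real, fake) term by a and the (real, real) term by 1 - a."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return a * loss_fake + (1.0 - a) * loss_real


# a = 0.5 reproduces the plain average used in backward_D.
print(weighted_d_loss(0.8, 0.4))
# a = 0.9 makes mistaking a fake for real the dominant cost.
print(weighted_d_loss(0.8, 0.4, a=0.9))
```

Note that changing a also changes the overall balance between the discriminator and the generator, so it may need tuning rather than a single fixed choice.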

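The generator loss also has two parts. The first is the adversarial (GAN) loss of the generator itself. The second is the L1 loss, which measures the difference between the generated image and the real image and is the key to modeling low-frequency information. Why not use both the (real, real) and the (real, fake) losses, as the discriminator does? Because the generator only cares about how its output differs from the real image, that is, the (real, fake) case, and not about whether the discriminator judges the (real, real) case correctly. Seen from another angle, adding the (real, real) loss would not affect the gradients at all, because that loss contains none of the generator's parameters.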
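
The gradient argument can be written out explicitly. This assumes the standard log form of the GAN loss, and the symbol $\theta_G$ for the generator's parameters is notation introduced here, not taken from the notes above:

```latex
\frac{\partial}{\partial \theta_G}\,
  \mathbb{E}_{x,y}\left[\log D(x, y)\right] = 0,
\qquad
\frac{\partial}{\partial \theta_G}\,
  \mathbb{E}_{x,z}\left[\log\bigl(1 - D(x, G_{\theta_G}(x, z))\bigr)\right]
  \neq 0 \ \text{in general}.
```

The first expectation involves only real pairs and the discriminator, so it is constant with respect to $\theta_G$; only the term that passes through G can contribute to the generator's update.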

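3. Training results

3.1 Dataset

The facades dataset is used. It contains pairs of abstract images and the corresponding real images.

3.2 Training parameters

The default parameters of the open-source code are used.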

3.3 Training results

3.4 Results on the test set:

4. Open problems in the paper

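Random noise is added to the input image before it is fed to the generator, in the hope that the same input yields different outputs. In practice this works poorly, because the network learns to ignore the noise. The paper instead tries dropout, applied at random at test time as well as during training. This produces somewhat different images, but the paper notes that the effect is limited: despite the dropout noise, only minor stochasticity is observed in the network's output. Designing conditional GANs that produce highly stochastic output, and thereby capture the full entropy of the conditional distributions they model, is an important question left open by the present work.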
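
The idea of keeping dropout active at test time can be illustrated with a toy example in plain Python. The function `toy_generator` is a hypothetical stand-in with fixed weights, not the U-Net generator, and the dropout rate of 0.5 is an assumption.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


def toy_generator(x, rng=None, p=0.5):
    """A stand-in 'generator': fixed hidden features, optional dropout, then a sum."""
    hidden = [x * w for w in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)]
    if rng is not None:  # dropout stays active, as at test time in the paper
        hidden = dropout(hidden, p, rng)
    return sum(hidden)


rng = random.Random(0)
deterministic = {toy_generator(1.0) for _ in range(5)}
stochastic = {toy_generator(1.0, rng) for _ in range(20)}
print(len(deterministic))  # 1: without dropout the output never changes
print(len(stochastic))     # number of distinct outputs when dropout is active
```

In this toy the random masks are the only source of variation. A trained network can learn to be robust to such masks, which is consistent with the limited stochasticity the paper reports.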