Deep Convolutional Generative Adversarial Network | Python

Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his colleagues in 2014. A GAN is a deep learning model that captures the training data distribution in order to generate new data from that distribution. In other words, it transforms random noise into meaningful samples, producing a wide variety of data that resembles the training dataset. This makes GANs a strong candidate for unsupervised learning.

In an ordinary deep learning problem, we adjust the network weights to reduce a loss function. In a Generative Adversarial Network, two networks are trained against each other instead. The generator tries to spawn 'fake' images that resemble the training images, while the discriminator receives images, both real ones from the dataset and fake ones from the generator, and labels each image as real or fake. The generator is then scored by how well its output fooled the discriminator, and back-propagation updates the weights of each network in turn before the cycle repeats.
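Formally, this two-player game optimizes the minimax objective from Goodfellow's original paper, where D(x) is the discriminator's estimated probability that x is real and G(z) is the image the generator produces from noise z:

min_G max_D E_x[ log D(x) ] + E_z[ log(1 - D(G(z))) ]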

A Deep Convolutional Generative Adversarial Network (DCGAN) is thus an extension of the GAN architecture in which deep convolutional networks are used for both the generator and the discriminator. The DCGAN generator comprises convolutional-transpose layers, ReLU activations, and batch-normalization layers. DCGANs are unsupervised algorithms trained with a supervised loss.

Some of the impressive applications of Generative Adversarial Network are 3D Object Generation, Face Aging, Realistic Photographs, Face Frontal View Generation, and Semantic-Image-to-Photo Translation.

In this tutorial, we will generate handwritten digits from the MNIST dataset using a Deep Convolutional GAN, an architecture that increases the stability of generator training.

To know more about the MNIST dataset, refer here.

Happy Reading!!!

IMPORTING LIBRARIES

The Python libraries necessary to run the project are imported. Note that this tutorial uses the TensorFlow 1.x API; the tensorflow.examples.tutorials.mnist module is not available in TensorFlow 2.

import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

IMPORTING THE DATASET

The MNIST dataset is loaded for the project.

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
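
As a quick sanity check, we can inspect the loaded arrays. With one_hot=True the labels are 10-dimensional vectors, and the standard MNIST split yields 55,000 training and 10,000 test examples:

print(mnist.train.images.shape)   # (55000, 784) - flattened 28x28 images
print(mnist.train.labels.shape)   # (55000, 10)  - one-hot digit labels
print(mnist.test.images.shape)    # (10000, 784)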

DECLARING PARAMETERS

The training and network parameters are declared in this section. The network parameters include image_dim and noise_dimension: image_dim is 784 because each 28x28 image contains 28*28 = 784 pixels, and noise_dimension is the length of the random noise vector fed to the generator.

num_steps = 10000
batch_size = 128
lr_generator = 0.002
lr_discriminator = 0.002
image_dim = 784 
noise_dimension = 100

BUILDING NETWORKS

Processing of the network inputs. istraining is a boolean placeholder that tells batch normalization whether we are at training or inference time.

noise = tf.placeholder(tf.float32, shape = [None, noise_dimension] )
real_image_input = tf.placeholder(tf.float32, shape = [None, 28, 28, 1])
istraining = tf.placeholder(tf.bool)

LEAKY RELU ACTIVATION

Leaky ReLU passes positive inputs through unchanged; for negative inputs, instead of outputting zero as ReLU does, it outputs a small fraction (alpha) of the input. Two notable advantages of Leaky ReLU are:

  • It avoids the "dying ReLU" problem, since negative inputs still receive a small gradient, making the network easier to train
  • It tends to give better results for GAN discriminators
def leakyrelu(x, alpha=0.2):
    # Equivalent to max(alpha * x, x): returns x for x > 0 and alpha * x otherwise
    return 0.5 * (1 + alpha) * x + 0.5 * (1 - alpha) * abs(x)
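
A quick way to verify the algebra is to evaluate the function on a few plain NumPy values (it uses only *, +, and abs, so it works on arrays directly) and compare it with the usual max(alpha*x, x) formulation:

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(leakyrelu(x))             # [-0.4 -0.1  0.   1.   3. ]
print(np.maximum(0.2 * x, x))   # identical result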

GENERATOR NETWORK

Input: Noise; Output: Image

The steps involved in the Generator Network are:

  • The tf.layers functions automatically create the variables and compute their shapes, based on the noise given as input.
  • Reshape the output into a 4-D image array of shape (batch, 7, 7, 128). Here, the first 7 represents the height, the second 7 represents the width, and 128 represents the channels.
  • Transposed convolution (deconvolution) is used to upsample the feature maps. The first transposed convolution converts the shape to (batch, 14, 14, 64) and the second to (batch, 28, 28, 1).
  • Tanh is applied for better stability; its output lies in [-1, 1] and will be rescaled/normalized to [0, 1] at a later stage.
def generator(x, reuse=False):
    with tf.variable_scope('Generator', reuse=reuse):
        x = tf.layers.dense(x, units=7 * 7 * 128)
        x = tf.layers.batch_normalization(x, training=istraining)
        x = tf.nn.relu(x)
        # Reshape into a 4-D tensor: (batch, height, width, channels)
        x = tf.reshape(x, shape=[-1, 7, 7, 128])
        # Upsample 7x7 -> 14x14 (stride 2 with 'same' padding doubles the size)
        x = tf.layers.conv2d_transpose(x, 64, 5, strides=2, padding='same')
        x = tf.layers.batch_normalization(x, training=istraining)
        x = tf.nn.relu(x)
        # Upsample 14x14 -> 28x28 with a single output channel
        x = tf.layers.conv2d_transpose(x, 1, 5, strides=2, padding='same')
        x = tf.nn.tanh(x)
        return x

Note: The istraining placeholder tells each layer whether we are training or not, because batch normalization behaves differently at training time (it uses batch statistics) and at inference time (it uses moving averages).

DISCRIMINATOR NETWORK

Input: Image.

Output: Prediction of Original or Forged Image

The steps involved in the Discriminator Network are:

  • A CNN classifies the input image data.
  • The feature maps are flattened.
  • The final dense layer outputs two classes: real images or fake images.
def discriminator(x, reuse=False):
    with tf.variable_scope('Discriminator', reuse=reuse):
        # Downsample 28x28 -> 14x14
        x = tf.layers.conv2d(x, 64, 5, strides=2, padding='same')
        x = tf.layers.batch_normalization(x, training=istraining)
        x = leakyrelu(x)
        # Downsample 14x14 -> 7x7
        x = tf.layers.conv2d(x, 128, 5, strides=2, padding='same')
        x = tf.layers.batch_normalization(x, training=istraining)
        x = leakyrelu(x)
        # Flatten before the fully connected layers
        x = tf.reshape(x, shape=[-1, 7 * 7 * 128])
        x = tf.layers.dense(x, 1024)
        x = tf.layers.batch_normalization(x, training=istraining)
        x = leakyrelu(x)
        # Two output classes: real or fake
        x = tf.layers.dense(x, 2)
    return x

BUILDING THE NETWORK

Build the generator network by calling generator() with the noise placeholder as input.

gen_sample = generator(noise)
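
As a cheap architecture check, we can print the static shape of the generator output; the batch dimension shows as ? because it is left unspecified:

print(gen_sample.shape)   # (?, 28, 28, 1)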

Build two discriminator networks that share the same weights (note reuse=True on the second call), one for each of the following inputs:

  • The real image input
  • The generated samples
disc_real = discriminator(real_image_input)
disc_fake = discriminator(gen_sample, reuse=True)

Build the stacked generator/discriminator: the generator's output is passed through the discriminator, which is what the generator loss will be computed on.

stacked_gan = discriminator(gen_sample, reuse=True)

Building the loss functions. The discriminator loss is for differentiating between original and fake samples.

1 is the label representing original images and 0 represents forged images.

disc_loss_real = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits = disc_real, labels = tf.ones([batch_size], dtype = tf.int32)))
disc_loss_fake = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits = disc_fake, labels = tf.zeros([batch_size], dtype = tf.int32)))
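
For intuition on the label format: tf.nn.sparse_softmax_cross_entropy_with_logits expects raw two-class logits of shape (batch, 2) and integer class ids rather than one-hot vectors. A tiny standalone illustration (the logit values here are made up):

example_logits = tf.constant([[2.0, 0.1], [0.3, 1.5]])   # shape (batch, 2)
example_labels = tf.constant([1, 0])                     # integer class ids
example_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=example_logits, labels=example_labels)        # per-example losses, shape (batch,)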

Sum up both losses, that is, the loss on original images and the loss on fake images.

disc_loss = disc_loss_real + disc_loss_fake

The generator loss function. The chief role of the generator is to dupe the discriminator, so it is rewarded when the discriminator assigns the label 1 (real) to the fake data the generator created.

gen_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=stacked_gan, labels=tf.ones([batch_size], dtype=tf.int32)))

Build the optimizers, with AdamOptimizer as the optimizing function for both networks. A few important parameters for the optimizer function are:

  • Learning Rate
  • Beta1 (set to 0.5 below; the DCGAN paper recommends a lower value than the Adam default of 0.9 to stabilize GAN training)
  • Beta2
optimizer_gen = tf.train.AdamOptimizer(learning_rate=lr_generator, beta1=0.5, beta2=0.999)
optimizer_disc = tf.train.AdamOptimizer(learning_rate=lr_discriminator, beta1=0.5, beta2=0.999)

Collect the training variables for each optimizer. This matters because, by default, minimize() would update every trainable variable in the graph; here each network must be updated using only its own variables.

gen_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Generator')

The discriminator network variables, fetched with get_collection().

disc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Discriminator')
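
Optionally, print the collected variable names to confirm that the Generator and Discriminator scopes partition the trainable variables as intended (the exact names depend on how TensorFlow auto-names the layers):

for v in gen_vars + disc_vars:
    print(v.name)   # e.g. 'Generator/dense/kernel:0', 'Discriminator/conv2d/kernel:0', ...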

Creating training operations. The UPDATE_OPS collection holds the batch-normalization moving-average updates, which must run alongside each training step.

gen_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope='Generator')

`gen_update_ops` should run before the `minimize` op (backpropagation), which will be ensured by the `control_dependencies`.

with tf.control_dependencies(gen_update_ops):
    train_gen = optimizer_gen.minimize( gen_loss,var_list=gen_vars )
disc_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope='Discriminator')
with tf.control_dependencies(disc_update_ops):
    train_disc = optimizer_disc.minimize( disc_loss,var_list=disc_vars )

Initialize the variables with their default values.

initialize = tf.global_variables_initializer()

INITIALIZING TRAINING

Starting the new TF session and running the initializer.

sess = tf.Session()
sess.run(initialize)

TRAINING

Discriminator and generator training occurs in this section. The steps involved here are:

  • Prepare the input data
  • Fetch the next batch for processing
  • Discriminator training
  • Generator training

Note that, following the print statement below, Loss A represents the Generator Loss and Loss B represents the Discriminator Loss.

for i in range(1, num_steps+1):
    # Prepare the input data: fetch the next batch and reshape it to images
    batch_x, _ = mnist.train.next_batch(batch_size)
    batch_x = np.reshape(batch_x, newshape=[-1, 28, 28, 1])
    # Rescale from [0, 1] to [-1, 1] to match the generator's tanh output
    batch_x = batch_x * 2. - 1.
    # Discriminator training
    z = np.random.uniform(-1., 1., size=[batch_size, noise_dimension])
    _, dl = sess.run([train_disc, disc_loss], feed_dict={real_image_input: batch_x, noise: z, istraining: True})
    # Generator training (with a fresh noise sample)
    z = np.random.uniform(-1., 1., size=[batch_size, noise_dimension])
    _, gl = sess.run([train_gen, gen_loss], feed_dict={noise: z, istraining: True})

    if i % 500 == 0 or i == 1:
        print('Step %i: Loss A: %f, Loss B: %f' % (i, gl, dl))
Step 1: Loss A: 2.590860, Loss B: 0.907586 
Step 500: Loss A: 2.154698, Loss B: 0.895236 
Step 1000: Loss A: 1.430409, Loss B: 0.837684 
Step 1500: Loss A: 1.962198, Loss B: 0.618827 
Step 2000: Loss A: 2.767945, Loss B: 0.378071 
Step 2500: Loss A: 2.370605, Loss B: 0.561247 
Step 3000: Loss A: 3.427798, Loss B: 0.402951 
Step 3500: Loss A: 4.904454, Loss B: 0.554856 
Step 4000: Loss A: 4.045284, Loss B: 0.454970 
Step 4500: Loss A: 4.577699, Loss B: 0.687195 
Step 5000: Loss A: 3.476081, Loss B: 0.210492 
Step 5500: Loss A: 3.898139, Loss B: 0.143352 
Step 6000: Loss A: 4.089877, Loss B: 1.082561 
Step 6500: Loss A: 5.911457, Loss B: 0.154059 
Step 7000: Loss A: 3.594872, Loss B: 0.152970 
Step 7500: Loss A: 6.067883, Loss B: 0.084864 
Step 8000: Loss A: 6.737456, Loss B: 0.402566 
Step 8500: Loss A: 6.630128, Loss B: 0.034838 
Step 9000: Loss A: 6.480587, Loss B: 0.427419 
Step 9500: Loss A: 7.200409, Loss B: 0.124268 
Step 10000: Loss A: 5.479313, Loss B: 0.191389

TESTING

The steps involved in this section are:

  • Feed noise data into the generator network to produce images.
  • The generator output comes from tanh and lies in [-1, 1]; rescale it to [0, 1].
  • Invert the colors for better visuals.
  • Draw the newly created digits on the canvas.
number = 6
# Canvas that will hold a number-by-number grid of 28x28 digits
canvas = np.empty((28 * number, 28 * number))
for a in range(number):
    # Generate one row of images from fresh noise
    z = np.random.uniform(-1., 1., size=[number, noise_dimension])
    g = sess.run(gen_sample, feed_dict={noise: z, istraining: False})
    # Rescale from tanh range [-1, 1] to [0, 1]
    g = (g + 1.) / 2.
    # Invert colors for better visuals
    g = -1 * (g - 1)
    for j in range(number):
        canvas[a * 28:(a + 1) * 28, j * 28:(j + 1) * 28] = g[j].reshape([28, 28])

plt.figure(figsize=(number, number))
plt.imshow(canvas, origin="upper", cmap="gray")
plt.show()

Figure: handwritten digits generated by the trained DCGAN.

FINAL THOUGHTS

DCGANs are an efficient extension of the GAN architecture. In this article, DCGANs have been explained in detail, step by step, with the example of the MNIST dataset. To summarize what we have learned in simple words: we built the two networks, generator and discriminator; we declared the loss functions, optimizers, and training operations, and initialized the variables with their respective default values; following that, we started the TF session, ran the initializer, and then trained and tested the model. As a bonus, the features learned by a DCGAN's discriminator can also be reused for training image classifiers.

GANs produce sample images more quickly than autoregressive networks like WaveNet, NADE, and PixelRNN, because they do not have to generate a sample one entry at a time, sequentially. On the other hand, GANs are comparatively harder to train, requiring plenty of data and careful monitoring, since there is no single accuracy metric to track. They have plenty of real-world uses, such as enhancing image quality, colorizing images, generating faces, and other interesting tasks.

The code for this DCGAN project can be retrieved from here.

To learn more about DCGANs, refer to this research paper.

To learn from more of my blogs, click here.

Hope this article was helpful. Thank you!!!
