Neural Style Transfer with TensorFlow in Python

In this tutorial, we’ll learn about Neural Style Transfer and how to perform it with the VGG19 model using the TensorFlow deep learning module in Python.

Neural style transfer (NST) is an optimization technique that takes two images, a content image (the one you want to edit) and a style reference image, and blends them together so that the resulting image looks like the content image, but “painted” in the style of the style reference image.

For example, we’ll take two images: a content image and a style image.

Now, how would this look if I decided to paint/edit my picture with this style? Something like this:

IMPLEMENTATION 

NOTE:- USE GOOGLE COLAB AND ENABLE THE GPU FROM RUNTIME > CHANGE RUNTIME TYPE.

IMPORT REQUIRED LIBRARIES AND VGG19 MODEL

Below are the required Python libraries that we need to import:

import os
import tensorflow as tf
tf.executing_eagerly()    #TF 2.x runs eagerly by default; this call just confirms it
from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.models import Model

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
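Since the note above recommends a GPU runtime, you can quickly confirm that Colab actually sees one (a small check using the standard TF 2.x API):

print(tf.config.list_physical_devices('GPU'))   #should list at least one GPU device if the runtime is set correctly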

ACCESSING DATA FROM DRIVE

You must have uploaded some content and style images to your Drive. Remember, the images should not be too large; if they are, Google Colab cannot process them and will throw a ResourceExhaustedError. (Recommended size: 224 x 224)

from google.colab import drive
drive.mount('/content/drive')

os.chdir("ENTER PATH")   #EXAMPLE /content/drive/My Drive/Neural Style Transfer

os.listdir()

DEFINING VGG19 MODEL

model = VGG19(
    include_top = False,
    weights = 'imagenet'
)

model.trainable = False
model.summary()

Here, include_top = False because we don’t need the fully connected classification layers at the top; we only want some of the intermediate layers of the VGG19 model. weights = ‘imagenet’ means the model comes pre-trained on the ImageNet dataset, so it already works as a feature extractor for us. Finally, model.trainable = False because we don’t want to update the model parameters; we only need the model for its outputs.

OUTPUT:-

Model: "vgg19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 20,024,384
Trainable params: 0
Non-trainable params: 20,024,384
_________________________________________________________________
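The summary above also shows the layer names (block1_conv1, block5_conv2, and so on). If you want to list them programmatically (useful when choosing the content and style layers later), a quick loop works:

for layer in model.layers:
    print(layer.name)    #prints names like block1_conv1, block5_conv2, etc.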

HELPER FUNCTIONS

#FUNCTION FOR LOADING AND PROCESSING THE IMAGE
def load_and_process_image(path):
    img = load_img(path)
    img = img_to_array(img)
    img = preprocess_input(img)    #converts the array to the format VGG19 expects (BGR channel order, ImageNet means subtracted)
    img = np.expand_dims(img, axis = 0)    #adds a batch dimension; the model expects a 4D tensor as input
    return img
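If your images are larger than the recommended 224 x 224, one option (not part of the original helper, just a suggestion) is to resize them while loading; load_img accepts a target_size argument, so the first line of load_and_process_image could become:

img = load_img(path, target_size = (224, 224))    #resize at load time to keep Colab memory usage low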

#FUNCTION FOR DEPROCESSING AND DISPLAYING THE IMAGE
def deprocess(w):
    '''performs the inverse of the preprocessing step: preprocess_input subtracts the
       ImageNet channel means (103.939, 116.779, 123.68) and converts RGB to BGR, so here
       we add the means back and flip the channels back to RGB'''
    w[:, :, 0] += 103.939
    w[:, :, 1] += 116.779
    w[:, :, 2] += 123.68
    w = w[:, :, ::-1]

    w = np.clip(w, 0, 255).astype('uint8')
    return w

def display_image(image):
    img = np.copy(image)                   #work on a copy so the original array is not modified in place
    if len(img.shape) == 4:
        img = np.squeeze(img, axis = 0)    #drop the batch dimension

    img = deprocess(img)
    
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)
    return

DISPLAY IMAGES

img = load_and_process_image('PATH OF THE IMAGE')   #EXAMPLE /content/drive/My Drive/Neural Style Transfer/Style1.jpg
display_image(img)        

#SIMILARLY, YOU CAN LOAD AND DISPLAY BOTH THE CONTENT AND STYLE IMAGES

OUTPUT:-

CONTENT AND STYLE MODELS

VGG19 has 5 convolutional blocks: the first two blocks have 2 convolutional layers each, the last three have 4 each, and every block ends with a pooling layer (as shown in the summary above). We have to decide which layers to use for extracting style features and which layer to use for content features.

s_layers = [
    'block1_conv1', 
    'block3_conv1', 
    'block5_conv1'
]

c_layer = 'block5_conv2'

# intermediate models
c_model = Model(
    inputs = model.input, 
    outputs = model.get_layer(c_layer).output
)

s_models = [Model(inputs = model.input, outputs = model.get_layer(layer).output) for layer in s_layers]

Now we have to calculate the content cost and the style cost. Later, we will minimize the total cost and get the best image out of it.
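Concretely, the total cost we minimize later is just a weighted sum of the two (this corresponds to the a * c + b * s line in the training loop; C, S and G denote the content, style and generated images):

J_total(G) = a * J_content(C, G) + b * J_style(S, G)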

CONTENT COST

# Content Cost
def content_cost(content, generated):
    a_C = c_model(content)      #activations of the content image at the content layer
    a_G = c_model(generated)    #activations of the generated image at the content layer
    cost = tf.reduce_mean(tf.square(a_C - a_G))
    return cost

GRAM MATRIX

We calculate the Gram matrix of the activations of the style and generated images, and compute the style cost as the mean squared difference between the two Gram matrices. The Gram matrix captures correlations between feature maps, so it matches the overall feature distribution (texture) of the style image rather than preserving where a specific feature appears.

def gram_matrix(X):
    channels = int(X.shape[-1])
    a = tf.reshape(X, [-1, channels])
    n = tf.shape(a)[0]
    gram = tf.matmul(a, a, transpose_a = True)
    return gram / tf.cast(n, tf.float32)
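For intuition: an activation map of shape (1, H, W, C) is flattened to (H*W, C), so the Gram matrix is C x C. A quick sanity check with a random tensor (the shape here is just an illustrative assumption):

dummy = tf.random.normal((1, 32, 32, 64))    #hypothetical activation map with 64 channels
print(gram_matrix(dummy).shape)              #(64, 64)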

STYLE COST

#Style cost
l = 1. / len(s_models)    #each style layer contributes equally to the style cost

def style_cost(style, generated):
    s = 0
    
    for style_model in s_models:
        a_S = style_model(style)
        a_G = style_model(generated)
        GS = gram_matrix(a_S)
        GG = gram_matrix(a_G)
        current_cost = tf.reduce_mean(tf.square(GS - GG))
        s += current_cost * l
    
    return s

TRAINING

#Training
import time

generated_images = []

def training_loop(content_img_path, style_img_path, iterations = 200, a = 15., b = 25.):
    # a weights the content cost, b weights the style cost
    # initialise the generated image from the content image
    content = load_and_process_image(content_img_path)
    style = load_and_process_image(style_img_path)
    generated = tf.Variable(content, dtype = tf.float32)
    
    opt = tf.optimizers.Adam(learning_rate = 7.)
    
    best_cost = 1e12+0.1
    best_image = None
    
    start_time = time.time()
    
    for i in range(iterations):
        
        with tf.GradientTape() as tape:
            c = content_cost(content, generated)
            s = style_cost(style, generated)
            total = a * c + b * s
        
        grads = tape.gradient(total, generated)
        opt.apply_gradients([(grads, generated)])
        
        if total < best_cost:
            best_cost = total
            best_image = generated.numpy()
        
        if i % int(iterations/10) == 0:
            time_taken = time.time() - start_time
            print('Cost at {}: {}. Time elapsed: {}'.format(i, total, time_taken))
            generated_images.append(generated.numpy())
        
    return best_image




final = training_loop('IMG_20200528_122818_725.jpg','Style1.jpg')

OUTPUT:-

Cost at 0: 5982162944.0. Time elapsed: 0.26296472549438477
Cost at 20: 466855200.0. Time elapsed: 4.0034027099609375
Cost at 40: 200161328.0. Time elapsed: 7.76223611831665
Cost at 60: 126009928.0. Time elapsed: 11.563525199890137
Cost at 80: 94797872.0. Time elapsed: 15.384293794631958
Cost at 100: 77722640.0. Time elapsed: 19.239824056625366
Cost at 120: 66518892.0. Time elapsed: 23.08683753013611
Cost at 140: 58410384.0. Time elapsed: 26.898508071899414
Cost at 160: 52121384.0. Time elapsed: 30.679835319519043
Cost at 180: 47068704.0. Time elapsed: 34.441083908081055

PLOT THE RESULT

#plot the result
plt.figure(figsize = (12, 12))

for i in range(10):
    plt.subplot(5, 2, i + 1)
    display_image(generated_images[i])
plt.show()

OUTPUT:-

display_image(final)

OUTPUT:-

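If you want to keep the result, you can deprocess the best image and write it to disk (a small sketch using matplotlib's imsave; the filename is just an example):

result = deprocess(np.squeeze(final.copy(), axis = 0))    #drop the batch dimension and undo the VGG19 preprocessing
plt.imsave('stylized_output.jpg', result)                 #save the final stylized image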
CONCLUSION

In this tutorial, we used a pre-trained VGG19 model as a feature extractor, defined content and style costs from its intermediate activations, and optimized a generated image with Adam so that it keeps the content of one image while taking on the style of another.
