Neural Style Transfer with TensorFlow in Python

In this tutorial, we’ll learn about neural style transfer and how to perform it with the VGG19 model using the TensorFlow deep learning module in Python.
Neural style transfer (NST) is an optimization technique that takes two images, a content image (the one you want to edit) and a style reference image, and blends them so that the resulting image still looks like the content image but appears to be “painted” in the style of the style reference image.
For example, we’ll take two images
Now, how would my picture look if I painted/edited it in this style? Something like this:
IMPLEMENTATION
NOTE: Use Google Colab and enable the GPU from the Runtime menu (Runtime > Change runtime type).
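To verify that the GPU is actually enabled (an optional quick check, assuming a TensorFlow 2.x Colab runtime):

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))   # should list at least one GPU device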
IMPORT REQUIRED LIBRARIES AND VGG19 MODEL
Below are the required Python libraries that we need to import:
import os
import tensorflow as tf
tf.executing_eagerly()   # eager execution is enabled by default in TF 2.x

from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
ACCESSING DATA FROM DRIVE
You should already have uploaded some content and style images to your Google Drive. Keep the images small: if they are too large, they cannot be processed in Google Colab and you will get a ResourceExhaustedError (a size of around 224 x 224 is recommended). A quick way to resize an image is sketched after the Drive-mounting snippet below.
from google.colab import drive
drive.mount('/content/drive')

os.chdir("ENTER PATH")   # EXAMPLE: /content/drive/My Drive/Neural Style Transfer
os.listdir()
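As mentioned above, if an image in your Drive turns out to be too large, here is a minimal resize sketch using Pillow (the file names are only examples):

from PIL import Image

img = Image.open('Style1.jpg')        # example file from your Drive folder
img = img.resize((224, 224))          # the recommended size from above
img.save('Style1_small.jpg')          # hypothetical output name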
DEFINING VGG19 MODEL
model = VGG19(
    include_top = False,
    weights = 'imagenet'
)
model.trainable = False
model.summary()
Here, include_top = False because we don’t need the fully connected classification layers at the top of VGG19; we only want some of its intermediate layers. weights = 'imagenet' loads weights pre-trained on the ImageNet dataset, so the model works as a feature extractor for us. model.trainable = False freezes the parameters: we don’t want to update them, we only need the model’s outputs.
OUTPUT:-
Model: "vgg19" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_2 (InputLayer) [(None, None, None, 3)] 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, None, None, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, None, None, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, None, None, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, None, None, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, None, None, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, None, None, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, None, None, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, None, None, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, None, None, 256) 590080 _________________________________________________________________ block3_conv4 (Conv2D) (None, None, None, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, None, None, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, None, None, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block4_conv4 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, None, None, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block5_conv4 (Conv2D) (None, None, None, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, None, None, 512) 0 ================================================================= Total params: 20,024,384 Trainable params: 0 Non-trainable params: 20,024,384 _________________________________________________________________
HELPER FUNCTIONS
# FUNCTION FOR LOADING AND PROCESSING THE IMAGE
def load_and_process_image(path):
    img = load_img(path)
    img = img_to_array(img)
    img = preprocess_input(img)          # transforms the input image array into the format our model expects
    img = np.expand_dims(img, axis = 0)  # adds a batch dimension, as the model expects a 4D tensor as input
    return img

# FUNCTIONS FOR DEPROCESSING AND DISPLAYING THE IMAGE
def deprocess(w):
    '''Performs the inverse of the preprocessing step. If you look at the documentation
    of preprocess_input, you'll find the values 103.939, 116.779 and 123.68: these are the
    per-channel means it subtracts, so we add them back and flip the channel order back
    from BGR to RGB to nullify the effect of preprocess_input.'''
    w[:, :, 0] += 103.939
    w[:, :, 1] += 116.779
    w[:, :, 2] += 123.68
    w = w[:, :, ::-1]
    w = np.clip(w, 0, 255).astype('uint8')
    return w

def display_image(image):
    if len(image.shape) == 4:
        img = np.squeeze(image, axis = 0)
    img = deprocess(img)
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)
    return
DISPLAY IMAGES
img = load_and_process_image('PATH OF THE IMAGE')   # EXAMPLE: /content/drive/My Drive/Neural Style Transfer/Style1.jpg
display_image(img)

# SIMILARLY, YOU CAN LOAD AND DISPLAY BOTH THE CONTENT AND STYLE IMAGES
OUTPUT:-
CONTENT AND STYLE MODELS
VGG19 has 5 convolutional blocks: the first two blocks contain 2 convolutional layers each, the last three contain 4 each, and every block ends with a pooling layer (see the model summary above). We now have to decide which of these layers to read activations from: which layers capture the style of an image well and which layer captures its content.
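If you want to see which layer names are available before picking, a quick optional inspection of the model defined above:

# print the names of all convolutional layers in VGG19
print([layer.name for layer in model.layers if 'conv' in layer.name])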
s_layers = [
    'block1_conv1',
    'block3_conv1',
    'block5_conv1'
]
c_layer = 'block5_conv2'

# intermediate models
c_model = Model(
    inputs = model.input,
    outputs = model.get_layer(c_layer).output
)
s_models = [Model(inputs = model.input, outputs = model.get_layer(layer).output) for layer in s_layers]
Now we have to calculate the content cost and the style cost. Later we minimize the total cost and keep the best image found along the way.
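To make the objective explicit, the total cost minimized in the training loop below is the usual NST weighted sum (the weights α and β correspond to the arguments a and b in the code):

total cost = α × (content cost) + β × (style cost)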
CONTENT COST
# Content Cost
def content_cost(content, generated):
    a_C = c_model(content)      # activations of the content image
    a_G = c_model(generated)    # activations of the generated image
    cost = tf.reduce_mean(tf.square(a_C - a_G))
    return cost
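As a quick optional sanity check (not part of the original flow), the cost of an image compared against itself should be zero, since the two activation maps are identical:

# hypothetical check: identical images give zero content cost
check = load_and_process_image('PATH OF THE IMAGE')   # any image you already uploaded
print(content_cost(check, check).numpy())             # expected: 0.0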
GRAM MATRIX
We calculate the Gram matrix of the activations of the style image and of the generated image, and compute the style cost as the mean squared difference between the two Gram matrices. The Gram matrix captures how features co-occur (their correlations), so matching it matches the overall distribution of features, i.e. the style, as opposed to preserving any specific feature at a specific location.
def gram_matrix(X):
    channels = int(X.shape[-1])
    a = tf.reshape(X, [-1, channels])
    n = tf.shape(a)[0]
    gram = tf.matmul(a, a, transpose_a = True)
    return gram / tf.cast(n, tf.float32)
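A minimal sketch of what this returns, using a made-up activation tensor (the shapes are only an example):

# hypothetical example: a fake activation map with 64 channels
dummy = tf.random.normal((1, 7, 7, 64))
print(gram_matrix(dummy).shape)   # (64, 64): one correlation value per pair of channels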
STYLE COST
# Style cost
l = 1. / len(s_models)   # equal weight for each style layer

def style_cost(style, generated):
    s = 0
    for style_model in s_models:
        a_S = style_model(style)
        a_G = style_model(generated)
        GS = gram_matrix(a_S)
        GG = gram_matrix(a_G)
        current_cost = tf.reduce_mean(tf.square(GS - GG))
        s += current_cost * l
    return s
TRAINING
# Training
import time

generated_images = []

def training_loop(content_img_path, style_img_path, iterations = 200, a = 15., b = 25.):
    # initialise
    content = load_and_process_image(content_img_path)
    style = load_and_process_image(style_img_path)
    generated = tf.Variable(content, dtype = tf.float32)

    opt = tf.optimizers.Adam(learning_rate = 7.)

    best_cost = 1e12 + 0.1
    best_image = None

    start_time = time.time()

    for i in range(iterations):
        with tf.GradientTape() as tape:
            c = content_cost(content, generated)
            s = style_cost(style, generated)
            total = a * c + b * s

        grads = tape.gradient(total, generated)
        opt.apply_gradients([(grads, generated)])

        if total < best_cost:
            best_cost = total
            best_image = generated.numpy()

        if i % int(iterations / 10) == 0:
            time_taken = time.time() - start_time
            print('Cost at {}: {}. Time elapsed: {}'.format(i, total, time_taken))
            generated_images.append(generated.numpy())

    return best_image

final = training_loop('IMG_20200528_122818_725.jpg', 'Style1.jpg')
OUTPUT:-
Cost at 0: 5982162944.0. Time elapsed: 0.26296472549438477
Cost at 20: 466855200.0. Time elapsed: 4.0034027099609375
Cost at 40: 200161328.0. Time elapsed: 7.76223611831665
Cost at 60: 126009928.0. Time elapsed: 11.563525199890137
Cost at 80: 94797872.0. Time elapsed: 15.384293794631958
Cost at 100: 77722640.0. Time elapsed: 19.239824056625366
Cost at 120: 66518892.0. Time elapsed: 23.08683753013611
Cost at 140: 58410384.0. Time elapsed: 26.898508071899414
Cost at 160: 52121384.0. Time elapsed: 30.679835319519043
Cost at 180: 47068704.0. Time elapsed: 34.441083908081055
PLOT THE RESULT
# plot the result
plt.figure(figsize = (12, 12))
for i in range(10):
    plt.subplot(5, 2, i + 1)
    display_image(generated_images[i])
plt.show()
OUTPUT:-
display_image(final)
OUTPUT:-
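If you also want to save the stylized image to disk (not covered above), here is a minimal sketch that reuses the deprocess helper defined earlier; the output file name is just an example:

# undo the VGG19 preprocessing on a copy, then write the image to disk
final_img = deprocess(np.squeeze(final.copy(), axis = 0))
plt.imsave('stylized_output.jpg', final_img)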
CONCLUSION
In this tutorial, you learned about neural style transfer and how to perform it with TensorFlow in Python. Neural style transfer lets you combine two images to create a new piece of art, and you saw how to implement it using the VGG19 model.
The code for this tutorial is available here
YOU CAN ALSO SEE:-
- Ensemble technique using TensorFlow and Scikit learn
- Gradient Descent Optimization using TensorFlow in Python