Feature disentanglement using PCA

INTRODUCTION

In this tutorial, we will look at how to use feature disentanglement to check how well your model generalizes. In deep learning, generalization is critical, and even more so in classification tasks. Feature disentanglement means that the higher-level features learned for different classes are separated rather than mixed together; by projecting those features into 2D space and plotting them on an x-y plot, we can see whether this is the case.

IMPORT THE PYTHON LIBRARIES

import os
import cv2 as cv
import pathlib
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()

You are probably familiar with most of these libraries already, so there is no need to explain them here.

GETTING THE DATASET

For the dataset, I downloaded around 200 chest X-ray images covering the covid, normal and pneumonia classes. You can get these images easily from public websites. My dataset contains 71 images per class. Just spend a few minutes searching for the images, or you can also do this with just the pneumonia and normal classes.

data_path = 'C:/Windows/System32/ML_PATH/transfer_learning_covid/'
train_dir = os.path.join(data_path,'train')
test_dir = os.path.join(data_path, 'test')

I stored my dataset on my computer, as I used only a Jupyter notebook for training. If you wish to use Colab, upload your dataset to your drive and mount it in your Colab notebook. I stored the training and test directories in two variables.
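If you want to make sure the directories are picked up correctly, a quick sanity check (just a sketch, assuming the usual train/<class-name>/ folder layout) is to count the images in each class folder with the pathlib module we imported earlier.

# Count the images in each class sub-folder of the training directory
for class_dir in sorted(pathlib.Path(train_dir).iterdir()):
    if class_dir.is_dir():
        print(class_dir.name, len(list(class_dir.glob('*'))))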

BUILDING THE MODEL

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.applications import DenseNet201

I am going to use the DenseNet model for my training. So, I imported the DenseNet model in Keras.

train_datagen = ImageDataGenerator(
    rotation_range=20, width_shift_range=0.3, height_shift_range=0.3,
    shear_range=0.2, preprocessing_function=preprocess_input, validation_split=0.1,
    horizontal_flip=True, vertical_flip=True, zoom_range=0.2)

val_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(224, 224), class_mode='categorical',
    subset='validation', batch_size=3, shuffle=False)

We use data augmentation to effectively increase the size of our training dataset; it is implemented with the ImageDataGenerator class in TensorFlow. The flow_from_directory method then returns batches of images from the specified directory rather than loading the full dataset at once. Here we request the validation subset with shuffle=False, so the images come out grouped by class in a fixed order.
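The snippet above only builds the validation generator, which is all we need for extracting features. For completeness, a training generator from the same ImageDataGenerator would look roughly like this (the batch size and shuffle setting here are illustrative, not taken from my original training run):

# Training subset from the same augmented generator (settings are illustrative)
train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(224, 224), class_mode='categorical',
    subset='training', batch_size=3, shuffle=True)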

# Load DenseNet201 pre-trained on ImageNet, without its classification head
base_model = DenseNet201(input_shape=[224, 224, 3], weights='imagenet', include_top=False)
base_model.trainable = False  # freeze the pre-trained layers

x = base_model.output
x1 = keras.layers.GlobalAveragePooling2D()(x)
x2 = keras.layers.Dense(512, activation='relu')(x1)
DenseNet = keras.models.Model(inputs=[base_model.input], outputs=[x2])

We have frozen the inner layers, as training them would take much longer and would also change the pre-trained ImageNet weights. You can read more about the model building in my first post. The link to the post is here.

Note that the output shape is 512 and not 3 because we are going to plot the features, so we need the features themselves and not the output probabilities. The final layer is a Dense layer with 512 units and a ‘ReLU’ activation function.
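If you want to confirm that the freeze worked, a quick check (just a sketch) is to count the trainable parameters; only the kernel and bias of the new Dense layer should be left trainable.

# Sanity check: only the new Dense(512) layer's kernel and bias should be trainable
trainable_params = sum(keras.backend.count_params(w) for w in DenseNet.trainable_weights)
print('Trainable parameters:', trainable_params)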

import pickle

# Load the previously saved weights and biases of the final Dense layer
with open("C:/Windows/System32/ML_PATH/Electrothon 3.0/DenseNet/x2.txt", "rb") as fp:
    weightsAndBiases = pickle.load(fp)

DenseNet.layers[-1].set_weights(weightsAndBiases)

I had already trained the model and saved the weights, so I simply load and reuse them here. If you wish, you can compile and train the model with the fit method and save the weights after training.
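If you would rather reproduce the weights yourself, one way to do it looks roughly like this: attach a temporary 3-way softmax head, train it, and then dump the weights of the 512-unit Dense layer with pickle so they can be reloaded exactly as above. The softmax head, epoch count and file name below are illustrative choices, not my original training script.

# Illustrative training-and-saving flow: temporary softmax head on top of the 512-unit layer
out = keras.layers.Dense(3, activation='softmax')(DenseNet.output)
clf = keras.models.Model(inputs=DenseNet.input, outputs=out)
clf.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# clf.fit(train_generator, validation_data=val_generator, epochs=10)

# Save the Dense(512) layer's weights so they can be reloaded with pickle later
with open('x2.txt', 'wb') as fp:
    pickle.dump(DenseNet.layers[-1].get_weights(), fp)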

DenseNet.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(lr=0.001),metrics=['accuracy'])

# Class index of each of the 21 validation images (7 per class, in directory order: covid, normal, pneumonia)
labels=np.array([0.,0.,0.,0.,0.,0.,0.,1.,1.,1.,1.,1.,1.,1.,2.,2.,2.,2.,2.,2.,2.,])
features=DenseNet.predict(val_generator)

features.shape
(21, 512)

I compiled the model and then extracted the features from the output, giving a 512-dimensional feature vector for each of the 21 validation images. To represent them in 2D space, we need to compress these 512 features into two dimensions in such a way that the information lost in the compression is minimal.

PRINCIPAL COMPONENT ANALYSIS

PCA is a well-known algorithm for representing features in a lower-dimensional space. You can find many articles that explain how PCA works and the principle behind it. In general, it is used to compress features into a compact representation while keeping as much of the original variance as possible.

Here we will feed these 512 features into the algorithm and then plot the 2D output.

from sklearn.decomposition import PCA

pca=PCA(n_components=2)
pca_features=pca.fit_transform(features)

pca.explained_variance_ratio_
array([0.36044383, 0.25491834], dtype=float32)
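The two components together retain about 61.5% of the variance. If you are curious how quickly the remaining variance tapers off, a small sketch like this (the choice of 10 components is arbitrary) prints the cumulative ratio:

# Cumulative variance explained by the first 10 principal components (illustrative)
pca_full = PCA(n_components=10)
pca_full.fit(features)
print(np.cumsum(pca_full.explained_variance_ratio_))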

Now the compressed 2D features are stored in pca_features, and explained_variance_ratio_ tells us how much of the original variance each component retains. Let us represent the output features in a scatter plot.

def plot3clusters(X, title, vtitle, target_names):
    # Colours for the three classes
    colors = ['#A43F98', '#5358E0', '#DE0202']
    lw = 2
    plt.figure(figsize=(9, 7))
    # The validation generator yields the images in class order, 7 per class,
    # so each block of 7 rows in X belongs to one class.
    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        plt.scatter(X[7*i:7*(i+1), 0], X[7*i:7*(i+1), 1],
                    color=color, alpha=1., lw=lw, label=target_name)

    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.title(title)
    plt.xlabel(vtitle + "1")
    plt.ylabel(vtitle + "2")
    plt.show()

target_names = ['covid', 'normal', 'pneumonia']
plot3clusters(pca_features, 'Encoded data latent-space', 'dimension', target_names)

We created a helper function to display the output in a scatter plot. You can find the output plot here.
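If you prefer not to hard-code the slices of 7 images, the labels array we built earlier can drive the colouring directly. Here is a small variant of the plot, for illustration only:

# Variant: colour the points from the labels array instead of fixed slices (illustrative)
plt.figure(figsize=(9, 7))
for i, target_name in enumerate(target_names):
    mask = labels == i
    plt.scatter(pca_features[mask, 0], pca_features[mask, 1], label=target_name)
plt.legend(loc='best')
plt.xlabel('dimension1')
plt.ylabel('dimension2')
plt.title('Encoded data latent-space')
plt.show()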

CONCLUSION

From the plot, you can see that the features of these 3 classes are located in distinct regions. This shows us that the features of the 3 classes are well separated and not entangled, which gives us good evidence that our model is generalisable and can distinguish the 3 classes.

 
