Feature disentanglement using PCA
In this tutorial, we will look at how to use feature disentanglement to check how well your model generalizes. Generalization is critical in deep learning, and especially so in classification tasks. Here, feature disentanglement means projecting the model's higher-level features into 2D space and showing them on an x-y plot, where the features of different classes should fall into separate regions.
IMPORT THE PYTHON LIBRARIES
import os
import cv2 as cv
import pathlib
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import seaborn as sns

sns.set()
You probably know most of these libraries already, so I will not explain them further.
GETTING THE DATASET
For the dataset, I downloaded about 200 images covering three classes: covid, normal and pneumonia. You can find such images easily on public websites. My dataset contains 71 images of each class. Spend a few minutes collecting the images, or, if you prefer, work with just the pneumonia and normal classes.
data_path = 'C:/Windows/System32/ML_PATHtransfer_learning_covid/'
train_dir = os.path.join(data_path, 'train')
test_dir = os.path.join(data_path, 'test')
I stored the dataset on my computer because I trained entirely in a Jupyter notebook. If you prefer Colab, upload the dataset to your Drive and mount it in your Colab notebook. The training and test directories are stored in two variables, train_dir and test_dir.
BUILDING THE MODEL
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.applications import DenseNet201
I am going to use DenseNet for training, so I imported the DenseNet201 model from Keras.
We use data augmentation to increase the effective size of the training dataset. In TensorFlow it can be implemented with the ImageDataGenerator class. Its flow_from_directory method reads images from the specified directory and returns them in batches rather than loading the full dataset at once.
base_model = DenseNet201(input_shape=[224, 224, 3], weights='imagenet', include_top=False)
x = base_model.output
base_model.trainable = False
x1 = keras.layers.GlobalAveragePooling2D()(x)
x2 = keras.layers.Dense(512, activation='relu')(x1)
DenseNet = keras.models.Model(inputs=[base_model.input], outputs=[x2])
We froze the layers of the base model: training them would take much longer and would overwrite the pretrained ImageNet weights. You can read more about building the model in my first post. The link to the post is here.
Note that the output layer has 512 units rather than 3, because we want to plot the features, not the output probabilities. The final layer is a Dense layer with 512 units and a ReLU activation function.
import pickle

with open("C:/Windows/System32/ML_PATH/Electrothon 3.0/DenseNet/x2.txt", "rb") as fp:
    weightsAndBiases = pickle.load(fp)
DenseNet.layers[-1].set_weights(weightsAndBiases)
I had already trained the model and saved its weights, so I load those weights here. If you wish, you can instead compile the model, train it with the fit method, and save the weights after training.
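# learning_rate replaces the deprecated lr argument
DenseNet.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])
# One label per validation image: 7 covid (0), 7 normal (1), 7 pneumonia (2)
labels = np.array([0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2., 2.])
# val_generator is the validation-image generator created with flow_from_directory
features = DenseNet.predict(val_generator)
features.shape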
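I compiled the model and then extracted the features by calling predict, which gives 512 features per image. The labels array assumes the generator yields seven images per class in class order, so that the rows of features line up with the labels. To show the features in 2D, we need to compress the 512 dimensions down to 2 while keeping the compression error as small as possible.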
PRINCIPAL COMPONENT ANALYSIS
PCA is a widely used algorithm for representing features in a lower-dimensional space. It finds the directions along which the data varies the most and projects the data onto them. You can find many articles that explain how PCA works and the principle behind it. It is generally used to compress features into a more compact representation.
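For intuition, here is a minimal NumPy sketch of the principle, not the implementation used in this tutorial (which relies on scikit-learn). It centres the data, takes a singular value decomposition, and projects onto the top two directions. The function name pca_2d and the random array standing in for the 21 x 512 feature matrix are assumptions made purely for illustration.

```python
import numpy as np

def pca_2d(features, n_components=2):
    """Project features onto their top principal components using SVD."""
    # PCA works on mean-centred data.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Squared singular values are proportional to the variance along each direction.
    explained_variance_ratio = (s ** 2) / np.sum(s ** 2)
    return projected, explained_variance_ratio[:n_components]

# Random placeholder with the same shape as 21 images x 512 features.
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(21, 512)).astype(np.float32)

projected, ratio = pca_2d(fake_features)
print(projected.shape)  # (21, 2)
```

The returned ratio plays the same role as explained_variance_ratio_ in scikit-learn: the fraction of the total variance captured by each kept component.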
Here we feed the 512 features to the algorithm and get a 2D output.
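from sklearn.decomposition import PCA

# Compress the 512 features of each image down to 2 principal components
pca = PCA(n_components=2)
pca_features = pca.fit_transform(features)
pca.explained_variance_ratio_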
array([0.36044383, 0.25491834], dtype=float32)
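PCA has compressed the features to 2D and stored them in pca_features, and we also got the explained variance ratio: the two components together retain about 61.5% of the variance in the original 512 features. Let us show the output features in a scatter plot.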
def plot3clusters(X, title, vtitle, target_names):
    # Select the colours of the clusters
    colors = ['#A43F98', '#5358E0', '#DE0202']
    lw = 2
    # Create a single figure (a second plt.figure() call would leave an empty one behind)
    plt.figure(figsize=(9, 7))
    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        # Rows 7*i to 7*(i+1) hold the seven images of class i
        plt.scatter(X[7*i:7*(i+1), 0], X[7*i:7*(i+1), 1], color=color, alpha=1., lw=lw, label=target_name)
    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.title(title)
    plt.xlabel(vtitle + "1")
    plt.ylabel(vtitle + "2")
    plt.show()
target_names = ['covid', 'normal', 'pneumonia']
plot3clusters(pca_features, 'Encoded data latent-space', 'dimension', target_names)
We created a helper function to display the output in a scatter plot. You can find the output plot here.
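The helper picks each class with the slice X[7*i:7*(i+1)], so it only works when there are exactly seven consecutive rows per class. If your classes have different sizes, one option is to select rows with the labels array instead. The sketch below shows the idea; rows_for_class is a hypothetical helper, and the random points are a placeholder for pca_features.

```python
import numpy as np

def rows_for_class(points, labels, class_id):
    """Return the rows of `points` whose label equals `class_id`."""
    return points[labels == class_id]

# Same 21 labels as in the tutorial: seven 0s, seven 1s, seven 2s.
labels = np.repeat([0., 1., 2.], 7)

# Placeholder for pca_features (assumption: 21 rows, 2 columns).
rng = np.random.default_rng(0)
points = rng.normal(size=(21, 2))

for class_id in (0, 1, 2):
    subset = rows_for_class(points, labels, class_id)
    # subset[:, 0] and subset[:, 1] are what you would pass to plt.scatter.
    print(class_id, subset.shape)
```

With the tutorial's ordering, this selects exactly the same rows as the slices, but it keeps working if the class sizes change.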
From the plot, you can see that the features of the 3 classes fall in distinct regions. This shows that the features of the 3 classes are well separated and not entangled. It is encouraging evidence that the model can distinguish the 3 classes and is likely to generalise.
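If you would like a number to go with the visual impression, one simple option is to compare how far apart the class centres are with how spread out each class is. The sketch below is only an illustration and is not part of the original workflow: separation_ratio is a hypothetical helper, and the three synthetic blobs stand in for pca_features. A ratio well above 1 suggests clearly separated clusters.

```python
import numpy as np

def separation_ratio(points, labels):
    """Smallest distance between class centroids divided by the largest class spread."""
    classes = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in classes])
    # Spread of a class: mean distance of its points to the class centroid.
    spreads = np.array([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # Pairwise distances between centroids; keep each pair once.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    min_between = dists[np.triu_indices(len(classes), k=1)].min()
    return min_between / spreads.max()

# Synthetic stand-in for pca_features: three tight blobs of seven points each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 7)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = centers[labels] + rng.normal(scale=0.5, size=(21, 2))

ratio = separation_ratio(points, labels)
print(ratio > 1.0)  # True
```

To apply it to your own results, pass pca_features and an integer or float labels array with one entry per row.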