Ensemble deep learning model for pneumonia dataset

INTRODUCTION

Ensemble learning is a powerful technique for improving the accuracy of deep learning models by combining the predictions of several models. In this tutorial, we will apply ensemble learning to a pneumonia dataset. Please take a look at my previous tutorial to learn how to train a model on the pneumonia dataset using transfer learning. Link for the tutorial is here. We will use Google Colab to run our models. Let’s dive into our tutorial.

OBTAINING THE DATASET

As I explained in my previous tutorial, we will get the pneumonia training dataset from Kaggle. Link to the dataset is here. Download and extract the dataset in Colab. The dataset has over 5000 training images and over 500 validation images. The images are divided into two classes, normal and pneumonia, so this is a binary classification problem.

IMPORT THE LIBRARIES

import os
import cv2 as cv
import pathlib
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()

Refer to my previous tutorial to learn how to use transfer learning on the pneumonia dataset. We will now move directly to the trained models. Once again, the link to my previous tutorial is here.

PRE-TRAINED MODELS

In my previous tutorial, I used the Densenet model to train on the dataset. Ensemble learning needs at least two models, so here we will use 5 models (Densenet, InceptionV3, Resnet, InceptionResnet, VGG19). We will train each of these models on our dataset and combine their results using ensemble learning.

I have already trained the 5 models and downloaded the weights of their final layers to reuse here. You can save full model weights using model.save_weights("file_name.h5"). Since the pre-trained base models already come with imagenet weights, downloading the full model weights would transfer far more data than necessary. On top of each imported base model, I added a 512-unit dense layer and an output layer, so I am going to download the weights of only these two layers.

weightsAndBiases_1 = DenseNet.layers[-1].get_weights()
weightsAndBiases_2 = DenseNet.layers[-2].get_weights()

import pickle
with open("x2.txt", "wb") as fp:
  pickle.dump(weightsAndBiases_2, fp)
with open("pred.txt", "wb") as fp:
  pickle.dump(weightsAndBiases_1, fp)

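Here I first read the weights of the 512-unit dense layer and the output layer. Then, using the pickle library, I saved them to files for download. Despite the .txt extension, these are binary pickle files, which is why they are opened in "wb" mode.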
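For reference, here is a minimal round-trip sketch of this save and load pattern. It assumes that get_weights() on a Dense layer returns a list of two NumPy arrays, [kernel, bias]; the random arrays and the temporary file location below are stand-ins for illustration, not the actual trained weights.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for DenseNet.layers[-1].get_weights(): a Dense layer returns
# [kernel, bias]. The shapes (512, 3) and (3,) mirror a 512 -> 3 output layer.
rng = np.random.default_rng(0)
weights_and_biases = [rng.normal(size=(512, 3)), np.zeros(3)]

# Hypothetical file location; the tutorial itself writes "pred.txt".
path = os.path.join(tempfile.mkdtemp(), "pred.txt")

# pickle writes bytes, so the file must be opened in binary mode ("wb"/"rb").
with open(path, "wb") as fp:
    pickle.dump(weights_and_biases, fp)

with open(path, "rb") as fp:
    restored = pickle.load(fp)

# The restored list has the same structure that layer.set_weights() expects.
round_trip_ok = all(
    np.array_equal(a, b) for a, b in zip(weights_and_biases, restored)
)
print(round_trip_ok)
```

Only unpickle files that you created yourself, since loading a pickle file can execute arbitrary code.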

Densenet

from tensorflow.keras.applications import DenseNet201
base_model1=DenseNet201(input_shape=[224,224,3],weights='imagenet',include_top=False) 

x1=base_model1.output
base_model1.trainable=False
x11=keras.layers.GlobalAveragePooling2D()(x1)
x21=keras.layers.Dense(512,activation='relu')(x11)
preds1=keras.layers.Dense(3,activation='softmax')(x21) 
DenseNet=keras.models.Model(inputs=[base_model1.input],outputs=[preds1]) #specify the inputs and outputs

with open("/content/DenseNet/x2.txt", "rb") as fp:
  weightsAndBiases_21=pickle.load(fp)
with open("/content/DenseNet/pred.txt", "rb") as fp:
  weightsAndBiases_11=pickle.load(fp)
DenseNet.layers[-1].set_weights(weightsAndBiases_11)
DenseNet.layers[-2].set_weights(weightsAndBiases_21)

DenseNet.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])

As you can see, I built the model and loaded the previously downloaded weights into its dense layer and output layer. Now I don’t need to train the model again and can use the test set directly to check its accuracy.

InceptionResnet

from tensorflow.keras.applications import InceptionResNetV2
base_model2=InceptionResNetV2(input_shape=[224,224,3],weights='imagenet',include_top=False) 

x2=base_model2.output
base_model2.trainable=False
x12=keras.layers.GlobalAveragePooling2D()(x2)
x22=keras.layers.Dense(512,activation='relu')(x12)
preds2=keras.layers.Dense(3,activation='softmax')(x22) 
IRNet=keras.models.Model(inputs=[base_model2.input],outputs=[preds2]) #specify the inputs and outputs

with open("/content/IRNet/x2.txt", "rb") as fp:
  weightsAndBiases_22=pickle.load(fp)
with open("/content/IRNet/pred.txt", "rb") as fp:
  weightsAndBiases_12=pickle.load(fp)
IRNet.layers[-1].set_weights(weightsAndBiases_12)
IRNet.layers[-2].set_weights(weightsAndBiases_22)

IRNet.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])

Resnet

from tensorflow.keras.applications import ResNet152V2
base_model3=ResNet152V2(input_shape=[224,224,3],weights='imagenet',include_top=False) 

x3=base_model3.output
base_model3.trainable=False
x13=keras.layers.GlobalAveragePooling2D()(x3)
x23=keras.layers.Dense(512,activation='relu')(x13)
preds3=keras.layers.Dense(3,activation='softmax')(x23) 
ResNet=keras.models.Model(inputs=[base_model3.input],outputs=[preds3]) #specify the inputs and outputs

with open("/content/ResNet/x2.txt", "rb") as fp:
  weightsAndBiases_23=pickle.load(fp)
with open("/content/ResNet/pred.txt", "rb") as fp:
  weightsAndBiases_13=pickle.load(fp)
ResNet.layers[-1].set_weights(weightsAndBiases_13)
ResNet.layers[-2].set_weights(weightsAndBiases_23)

ResNet.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])

Inception

from tensorflow.keras.applications import InceptionV3
base_model4=InceptionV3(input_shape=[224,224,3],weights='imagenet',include_top=False) 

x4=base_model4.output
base_model4.trainable=False
x14=keras.layers.GlobalAveragePooling2D()(x4)
x24=keras.layers.Dense(512,activation='relu')(x14)
preds4=keras.layers.Dense(3,activation='softmax')(x24) 
Inception=keras.models.Model(inputs=[base_model4.input],outputs=[preds4]) #specify the inputs and outputs

with open("/content/Inception/x2.txt", "rb") as fp:
  weightsAndBiases_24=pickle.load(fp)
with open("/content/Inception/pred.txt", "rb") as fp:
  weightsAndBiases_14=pickle.load(fp)
Inception.layers[-1].set_weights(weightsAndBiases_14)
Inception.layers[-2].set_weights(weightsAndBiases_24)

Inception.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])

VGG19

from tensorflow.keras.applications import VGG19
base_model5=VGG19(input_shape=[224,224,3],weights='imagenet',include_top=False) 

x5=base_model5.output
base_model5.trainable=False
x15=keras.layers.GlobalAveragePooling2D()(x5)
x25=keras.layers.Dense(512,activation='relu')(x15)
preds5=keras.layers.Dense(3,activation='softmax')(x25) 
VGG=keras.models.Model(inputs=[base_model5.input],outputs=[preds5]) #specify the inputs and outputs

with open("/content/VGG/x2.txt", "rb") as fp:
  weightsAndBiases_25=pickle.load(fp)
with open("/content/VGG/pred.txt", "rb") as fp:
  weightsAndBiases_15=pickle.load(fp)
VGG.layers[-1].set_weights(weightsAndBiases_15)
VGG.layers[-2].set_weights(weightsAndBiases_25)

VGG.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])

INITIALIZING DATA GENERATOR FOR OUR MODEL

We will preprocess and convert image data into tensor data using ImageDataGenerator function and use flow_from_directory to get data directly from the mentioned directory in specified batches. We will use these functions for our 5 pre-trained models.

Densenet

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.densenet import preprocess_input as dn

dn_train_datagen=ImageDataGenerator(rotation_range=20,width_shift_range=0.3,height_shift_range=0.3,shear_range=0.2,preprocessing_function=dn,validation_split=0.1,horizontal_flip=True,vertical_flip=True,zoom_range=0.2)

#train_generator=train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='categorical',subset='training',shuffle=True)

dn_val_generator=dn_train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='categorical',subset='validation',batch_size=3,shuffle=False)

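The commented-out train_generator needs to be created only once, since all 5 models share the same training data. Each model, however, gets its own validation generator, because each model uses its own preprocessing function. With shuffle=False, every generator reads the images in the same order, so the predictions of the 5 models line up row by row.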

InceptionResnet

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input as ir
# Get your train and test data
ir_train_datagen=ImageDataGenerator(rotation_range=20,width_shift_range=0.3,height_shift_range=0.3,shear_range=0.2,preprocessing_function=ir,validation_split=0.1,horizontal_flip=True,vertical_flip=True,zoom_range=0.2)
ir_val_generator=ir_train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='categorical',subset='validation',batch_size=3,shuffle=False)

Inception

from tensorflow.keras.applications.inception_v3 import preprocess_input as i
# Get your train and test data
i_train_datagen=ImageDataGenerator(rotation_range=20,width_shift_range=0.3,height_shift_range=0.3,shear_range=0.2,preprocessing_function=i,validation_split=0.1,horizontal_flip=True,vertical_flip=True,zoom_range=0.2)
i_val_generator=i_train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='categorical',subset='validation',batch_size=3,shuffle=False)

Resnet

from tensorflow.keras.applications.resnet_v2 import preprocess_input as rn
# Get your train and test data
rn_train_datagen=ImageDataGenerator(rotation_range=20,width_shift_range=0.3,height_shift_range=0.3,shear_range=0.2,preprocessing_function=rn,validation_split=0.1,horizontal_flip=True,vertical_flip=True,zoom_range=0.2)
rn_val_generator=rn_train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='categorical',subset='validation',batch_size=3,shuffle=False)

VGG19

from tensorflow.keras.applications.vgg19 import preprocess_input
# Get your train and test data
vgg_train_datagen=ImageDataGenerator(rotation_range=20,width_shift_range=0.3,height_shift_range=0.3,shear_range=0.2,preprocessing_function=preprocess_input,validation_split=0.1,horizontal_flip=True,vertical_flip=True,zoom_range=0.2)
vgg_val_generator=vgg_train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='categorical',subset='validation',batch_size=3,shuffle=False)

ENSEMBLE LEARNING

As we are solving a classification problem, we will use a voting classifier as our ensemble learning technique. We will use a soft voting classifier and a stack ensemble to test the accuracy of our ensembled models.

SOFT VOTING CLASSIFIER

In a soft voting classifier, the output class is the one with the highest average probability across the models. Suppose there are two classes, A and B, and the prediction of the 1st model is (0.4,0.6) while that of the 2nd model is (0.5,0.5). The prediction of the soft voting classifier is then the average of the two, (0.45,0.55). Class B has the highest probability, so the soft voting classifier outputs class B.
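The arithmetic of this two-model example can be checked with a few lines of NumPy. This is only a sketch of the example above, with the two classes named A and B; it is not the ensemble code used later in the tutorial.

```python
import numpy as np

# Class probabilities from two models for one image, ordered (A, B).
model_1 = np.array([0.4, 0.6])
model_2 = np.array([0.5, 0.5])

# Soft voting: average the probabilities class by class.
avg_prob = np.mean([model_1, model_2], axis=0)

# The predicted class is the index of the largest averaged probability.
class_names = ["A", "B"]
predicted = class_names[int(np.argmax(avg_prob))]
print(avg_prob, predicted)
```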

labels=np.array([0.,0.,0.,0.,0.,0.,0.,1.,1.,1.,1.,1.,1.,1.,0.,0.,1.,0.,0.,0.,0.,])

step_size_validation=dn_val_generator.n//dn_val_generator.batch_size
dn_prob=DenseNet.predict(dn_val_generator,steps=step_size_validation)
ir_prob=IRNet.predict(ir_val_generator,steps=step_size_validation)
i_prob=Inception.predict(i_val_generator,steps=step_size_validation)
rn_prob=ResNet.predict(rn_val_generator,steps=step_size_validation)
vgg_prob=VGG.predict(vgg_val_generator,steps=step_size_validation)

Here we took some test data and labelled it. Next, we fed the data to the 5 models and stored each model’s class probabilities in a separate variable.
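One detail worth checking is the step count. With floor division, any images left over after the last full batch are not predicted, so the number of probability rows can be smaller than the number of images, and it has to match the number of labels. A small arithmetic sketch, using a hypothetical count of 521 images:

```python
import math

# Hypothetical generator size; batch_size=3 matches the generators above.
n_images = 521
batch_size = 3

steps_floor = n_images // batch_size           # as used in the tutorial
steps_ceil = math.ceil(n_images / batch_size)  # also covers the last partial batch

# Number of images that actually get a prediction in each case.
covered_floor = steps_floor * batch_size
covered_ceil = min(steps_ceil * batch_size, n_images)
print(steps_floor, covered_floor, steps_ceil, covered_ceil)
```

If every image should be predicted, rounding the step count up is one option, provided the labels array covers the same images.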

stack=np.dstack((dn_prob,ir_prob,i_prob,rn_prob,vgg_prob))
avg_ensemble_prob=np.mean(stack,axis=-1)
avg_ensemble_pred=np.argmax(avg_ensemble_prob,axis=-1)

Now we average the class probabilities and store the predicted output class in the avg_ensemble_pred variable. Let’s check how our ensemble model performed on all the test images using the ROC curve.
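To make the array shapes concrete, here is a small sketch with randomly generated probabilities in place of dn_prob, ir_prob and the others. The sample count, the random values and the labels are all made up for illustration; only the dstack, mean and argmax steps mirror the code above.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_classes, n_models = 6, 3, 5

# Hypothetical stand-ins for the per-model outputs: one (n_samples, n_classes)
# array of probabilities per model, each row summing to 1.
probs = [rng.dirichlet(np.ones(n_classes), size=n_samples) for _ in range(n_models)]

stack = np.dstack(probs)                  # shape (n_samples, n_classes, n_models)
avg_prob = np.mean(stack, axis=-1)        # average over the model axis
avg_pred = np.argmax(avg_prob, axis=-1)   # one class index per sample

# Made-up labels, only to show how accuracy would be computed.
labels = np.array([0, 1, 0, 1, 1, 0])
accuracy = np.mean(avg_pred == labels)
print(stack.shape, avg_prob.shape, accuracy)
```

Because each model's rows sum to 1, the averaged rows also sum to 1 and can still be read as probabilities.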

from sklearn import metrics

# label_binarize returns a single column for two classes,
# so we one-hot encode the labels directly instead
n_classes = 2
y_true = np.eye(n_classes)[labels.astype(int)]

fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = metrics.roc_curve(y_true[:, i], avg_ensemble_prob[:, i])
    roc_auc[i] = metrics.auc(fpr[i], tpr[i])

# Plot of a ROC curve for a specific class
for i in range(n_classes):
    plt.figure()
    plt.plot(fpr[i], tpr[i], label='ROC curve (area = %0.2f)' % roc_auc[i])
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic curve for class '+str(i))
    plt.legend(loc="lower right")
    plt.show()

Let’s take a look at the ROC curve for the soft voting classifier. The link is here.
As you can see here, the area under the ROC curve (AUC) tells us how well our model separates the classes. The AUC for our test predictions is nearly 1, which indicates that our soft voting classifier predicted almost all the test images correctly.
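To build some intuition for this number: the AUC equals the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The sketch below computes it that way with plain NumPy. The helper name pairwise_auc and the scores are made up for illustration; the tutorial itself relies on metrics.roc_curve and metrics.auc.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive against every negative
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Made-up scores for four images: one positive is ranked below one negative.
auc_partial = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# Made-up scores where every positive outranks every negative.
auc_perfect = pairwise_auc([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])
print(auc_partial, auc_perfect)  # 0.75 1.0
```

An AUC of 1 therefore means that every positive image was scored above every negative one, while 0.5 corresponds to random ranking.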

STACK ENSEMBLE

The problem with the soft voting classifier is that all models are treated equally, irrespective of their individual performance. A variant of the voting classifier called the stack ensemble computes a weighted average of the model probabilities, in which better-performing models are given higher weights and worse-performing models are given lower weights.

In stacking, the individual model predictions are given as input, and the algorithm learns how to combine the input prediction best to make a better output prediction.
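As a simple illustration of the weighted-average idea, the sketch below reuses the two-model example from the soft voting section and gives the first model three times the weight of the second. The weights here are picked by hand purely for illustration; in stacking they are learned from data rather than chosen manually.

```python
import numpy as np

# Same two-model example as in the soft voting section, classes ordered (A, B).
# Each model contributes one (1, 2) array; dstack puts the models on the last axis.
stack = np.dstack(([[0.4, 0.6]], [[0.5, 0.5]]))   # shape (1, 2, 2)

# Hand-picked weights (an assumption for this sketch): trust model 1 more.
weights = np.array([0.75, 0.25])

plain_avg = np.mean(stack, axis=-1)
weighted_avg = np.average(stack, axis=-1, weights=weights)
print(plain_avg, weighted_avg)
```

With these weights the ensemble moves toward the first model's opinion: the probability of class B rises from 0.55 to 0.575.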

stack=stack.reshape((stack.shape[0],stack.shape[1]*stack.shape[2]))

from sklearn.linear_model import LogisticRegression
softmax_reg=LogisticRegression(multi_class='multinomial',solver='lbfgs',C=10)
softmax_reg.fit(stack,labels)

stack_ensemble_prob=softmax_reg.predict_proba(stack)
stack_ensemble_pred=softmax_reg.predict(stack)

Here the stack variable holds the individual model predictions. It is given, together with the class labels, as input to a logistic regression model, which then learns how much weight to give each model’s predictions. Let’s check the model performance using the ROC curve.
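It can help to see what the reshape does to the layout of the features. In the sketch below, each entry of a small made-up array is encoded as 10*class + model, so the order of the flattened columns can be read off directly. The array contents are illustrative only; the reshape call is the same as above.

```python
import numpy as np

n_samples, n_classes, n_models = 2, 3, 5

# Encode each entry as 10*class + model so the layout is easy to read.
one_sample = 10 * np.arange(n_classes)[:, None] + np.arange(n_models)[None, :]
stack = np.stack([one_sample, one_sample + 100])   # shape (2, 3, 5)

# Flatten the class and model axes into one feature axis per sample.
features = stack.reshape((stack.shape[0], stack.shape[1] * stack.shape[2]))
print(features.shape)
print(features[0])
```

The first five columns are the five models' probabilities for class 0, the next five are for class 1, and so on, giving 15 features per image for the logistic regression.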

from sklearn import metrics

# label_binarize returns a single column for two classes,
# so we one-hot encode the labels directly instead
n_classes = 2
y_true = np.eye(n_classes)[labels.astype(int)]

fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = metrics.roc_curve(y_true[:, i], stack_ensemble_prob[:, i])
    roc_auc[i] = metrics.auc(fpr[i], tpr[i])

# Plot of a ROC curve for a specific class
for i in range(n_classes):
    plt.figure()
    plt.plot(fpr[i], tpr[i], label='ROC curve (area = %0.2f)' % roc_auc[i])
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic curve for class '+str(i))
    plt.legend(loc="lower right")
    plt.show()

Let’s take a look at the ROC curve for the stack ensemble. The link is here.
Here the AUC is 1, which indicates that our stacked generalization ensemble model predicted all the test images correctly. Keep in mind that the logistic regression was fitted and evaluated on the same images, so we need more test images to judge its performance reliably.
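One way to get a more trustworthy estimate is to fit the logistic regression on one part of the stacked predictions and compute the AUC on a held-out part. The sketch below shows the idea on synthetic data. The arrays, the 70/30 split and the variable names are assumptions for illustration, not results from the pneumonia models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features = 200, 15   # 15 = 3 classes x 5 models, as in the reshape above

# Synthetic stand-ins for the stacked probabilities and the labels.
labels = rng.integers(0, 2, size=n_samples)
stack = rng.random((n_samples, n_features)) + 0.5 * labels[:, None]

# Hold out 30% of the samples; the meta-model never sees them during fitting.
X_fit, X_eval, y_fit, y_eval = train_test_split(
    stack, labels, test_size=0.3, stratify=labels, random_state=0
)

meta = LogisticRegression(C=10, max_iter=1000)
meta.fit(X_fit, y_fit)

# Score only the held-out samples.
eval_prob = meta.predict_proba(X_eval)[:, 1]
held_out_auc = roc_auc_score(y_eval, eval_prob)
print(round(held_out_auc, 3))
```

The same split could be applied to the real stack and labels arrays once enough labelled images are available.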

CONCLUSION

We have trained 5 different pre-trained models on our pneumonia dataset and used two ensemble learning techniques to combine their predictions. We conclude that the ensembled model is more accurate than each of the individual models. Comparing the two ensemble techniques above, the stack ensemble performed better than the soft voting classifier.
