Transfer learning for training on a pneumonia dataset
In this tutorial, we will learn how to train a model on a dataset using data augmentation and pre-trained models. We will use Google Colab for our training and validation.
Obtaining dataset from Kaggle
Kaggle is one of the best sites for obtaining deep learning datasets. As we are using Google Colab, you need to learn how to download Kaggle datasets directly in Google Colab. The tutorial can be found here. It’s easy, so do take a look at it before proceeding.
Here, we have collected a training dataset that contains chest X-ray images of some pneumonia patients. The training dataset is classified into two categories: normal and pneumonia. Our first objective is to increase the dataset’s size, which can be done using data augmentation.
Importing the required libraries
import os
import cv2 as cv
import pathlib
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import seaborn as sns

sns.set()
As you can see, we imported some of the most popular ML and DL libraries, such as pandas, NumPy, and matplotlib. matplotlib is a helpful tool for plotting graphs, NumPy is used for vector and multidimensional array operations, and pandas for dataframe operations. The os library is for interacting with the operating system (here, the Colab environment); we can access, create, and delete directories and files using it. TensorFlow is well known for performing efficient tensor operations.
Loading the dataset
We obtained the dataset from Kaggle’s chest X-ray pneumonia dataset. The link to the dataset can be found here. Let’s download the dataset and unzip it. Note that we are using Google Colab for our programs as it offers free RAM and GPU for machine learning projects.
!kaggle datasets download -d paultimothymooney/chest-xray-pneumonia
!unzip -qq /content/drive/MyDrive/Kaggle/chest-xray-pneumonia.zip

data_path = '/content/drive/MyDrive/Kaggle/chest_xray'
train_dir = os.path.join(data_path, 'train')
test_dir = os.path.join(data_path, 'test')
Here we set the paths of the train and test datasets so we can access them in later operations.
We have implemented data augmentation to create more training images. Data augmentation is a common technique for obtaining more images, which is particularly useful when we have only a few training images. It performs operations like flipping, rotating, translating, and cropping the input images; using various combinations of these operations, we can obtain many more training images. Data augmentation is important because a neural network is only as good as the data we feed it.
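For intuition, here is a minimal NumPy sketch (not part of the tutorial’s pipeline) showing that a horizontal flip, one of the augmentation operations mentioned above, is just a reversal of the image array’s width axis:

```python
import numpy as np

# A tiny 2x3 single-channel "image"; values stand in for pixel intensities.
image = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Horizontal flip: reverse the width (column) axis.
flipped = image[:, ::-1]
print(flipped)
# [[3 2 1]
#  [6 5 4]]
```

Keras applies such transformations (plus rotations, shifts, and so on) on the fly to every batch, so the model rarely sees exactly the same image twice.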
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.densenet import preprocess_input

train_datagen = ImageDataGenerator(rotation_range=20,
                                   width_shift_range=0.3,
                                   preprocessing_function=preprocess_input,
                                   validation_split=0.1)
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(224, 224),
                                                    class_mode='categorical',
                                                    subset='training',
                                                    shuffle=True)
val_generator = train_datagen.flow_from_directory(train_dir,
                                                  target_size=(224, 224),
                                                  class_mode='categorical',
                                                  subset='validation',
                                                  batch_size=3,
                                                  shuffle=True)
Found 4695 images belonging to 2 classes.
Found 521 images belonging to 2 classes.
Data augmentation in Keras is straightforward and can be implemented in a few lines of code. Here we have used the built-in Keras class ImageDataGenerator to apply real-time data augmentation, pre-process the dataset, and convert images into the tensor data given as input to our pre-trained model. The syntax for ImageDataGenerator can be found here.
Converting images into tensors is the most important step because deep learning models are nothing but stacks of mathematical functions applying complex transformations to the input data. Neural networks take only numbers as input and give numbers as output. Images are nothing but collections of pixel data, where each pixel represents a colour intensity ranging from 0 to 255. The above class also converts images into tensors that contain this pixel data.
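As a small illustration of “images are tensors of pixel intensities”, here is a hedged NumPy sketch using a made-up 2x2 RGB image; note that our pipeline actually applies DenseNet’s preprocess_input rather than this simple 0–1 rescaling:

```python
import numpy as np

# A fake 2x2 RGB image: height x width x channels, intensities in 0-255.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(image.shape)               # (2, 2, 3)
print(image.min(), image.max())  # 0 255

# A common normalization to [0, 1] so the network sees small float inputs.
tensor = image.astype(np.float32) / 255.0
print(tensor.max())              # 1.0
```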
We have also used the flow_from_directory method, which, when applied, returns batches of training data rather than the full dataset from the specified directory. This is useful because feeding the entire dataset at once would not fit in memory and each update would take much longer. The syntax can be found here. The batch size indicates the number of images given to the neural network at the same time.
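As a quick sanity check of the batching arithmetic: the number of full batches per epoch is just the integer division of the dataset size by the batch size. Since we did not pass batch_size to train_generator, Keras’s default of 32 applies:

```python
n_train = 4695   # images found by flow_from_directory above
batch_size = 32  # Keras default when batch_size is not passed

steps_per_epoch = n_train // batch_size
print(steps_per_epoch)  # 146, which matches the "146/146" in the training log
```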
Importing the model
from tensorflow.keras.applications import DenseNet201

base_model = DenseNet201(input_shape=[224, 224, 3],
                         weights='imagenet',
                         include_top=False)
base_model.trainable = False

x = base_model.output
x1 = keras.layers.GlobalAveragePooling2D()(x)
x2 = keras.layers.Dense(512, activation='relu')(x1)
preds = keras.layers.Dense(2, activation='softmax')(x2)

DenseNet = keras.models.Model(inputs=[base_model.input], outputs=[preds])
DenseNet.summary()
Here we have used a pre-trained DenseNet model for our training. We could also use VGG or ResNet as the base model. DenseNet is one of the most widely used models for medical image classification.
The architecture of DenseNet contains many layers, so it’s difficult to describe it fully here. The input data size is (224, 224, 3), where the first two entries represent the image height and width; we can resize the images if the input size doesn’t match. The third entry represents the number of input channels: for RGB images it’s 3, and for grayscale images it’s 1.
We have frozen the inner layers, as training them would take much more time and would change the ImageNet weights. We have not included the top layers of DenseNet (done by setting include_top=False when importing the model) because their output size doesn’t match our needs (we need an output size of 2, for normal and pneumonia). So we have kept the network only up to the final ReLU (activation) layer. This layer’s output is given as input to the global average pooling layer, where the feature-map tensor is collapsed into a single feature vector, which is then given as input to the final dense layers.
The weights used here are ImageNet weights, obtained by training on millions of images. We have used pre-trained models because of their excellent accuracy and their well-built CNN architectures.
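To see what global average pooling does, here is a NumPy sketch assuming DenseNet201’s standard final feature-map shape of (7, 7, 1920) for a 224x224 input; each of the 1920 channels is averaged over the 7x7 spatial grid, producing a single feature vector:

```python
import numpy as np

# Stand-in for DenseNet201's last feature map: 7x7 spatial grid, 1920 channels.
features = np.random.rand(7, 7, 1920)

# Global average pooling: mean over the two spatial axes, one value per channel.
pooled = features.mean(axis=(0, 1))
print(pooled.shape)  # (1920,)
```

This 1920-dimensional vector is what our 512-unit dense layer and the final 2-way softmax operate on.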
Training the model
We compiled the model with the loss function ‘binary_crossentropy’, which is for binary classification; various other loss functions are available for different objectives. We have used the Adam optimizer (Adagrad and SGD are also available, but Adam is usually sufficient for most models); here, lr indicates the learning rate, which is one of the hyperparameters.
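For reference, here is a simplified NumPy version of the binary cross-entropy formula the model minimizes (Keras’s actual implementation differs in details such as clipping and reduction):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Average negative log-likelihood of the true labels under the
    # predicted probabilities; a small eps avoids log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])   # made-up labels for illustration
y_pred = np.array([0.9, 0.1, 0.8])   # made-up model probabilities
print(round(binary_crossentropy(y_true, y_pred), 4))  # 0.1446
```

Confident, correct predictions push this loss toward 0, while confident wrong ones are penalized heavily, which is why it is the standard choice for binary classification.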
We have now called the fit function, which initiates the training. Note that we used train_generator and val_generator, which we created using the flow_from_directory method; the input data flows from the training directory to our model at the specified batch size. Epochs indicate the number of passes over the training data. More information about the compile and fit methods can be found in the official documentation here.

DenseNet.compile(loss='binary_crossentropy',
                 optimizer=keras.optimizers.Adam(lr=0.001),
                 metrics=['accuracy'])

step_size_train = train_generator.n // train_generator.batch_size
step_size_validation = val_generator.n // val_generator.batch_size

DenseNet.fit(train_generator,
             validation_data=val_generator,
             steps_per_epoch=step_size_train,
             validation_steps=step_size_validation,
             epochs=10)
Epoch 1/10
146/146 [==============================] - 119s 760ms/step - loss: 0.2560 - accuracy: 0.9100 - val_loss: 0.1287 - val_accuracy: 0.9441
Epoch 2/10
146/146 [==============================] - 110s 751ms/step - loss: 0.1222 - accuracy: 0.9501 - val_loss: 0.1110 - val_accuracy: 0.9518
Epoch 3/10
146/146 [==============================] - 109s 748ms/step - loss: 0.0748 - accuracy: 0.9751 - val_loss: 0.0964 - val_accuracy: 0.9595
Epoch 4/10
146/146 [==============================] - 109s 746ms/step - loss: 0.0765 - accuracy: 0.9714 - val_loss: 0.1229 - val_accuracy: 0.9480
Epoch 5/10
146/146 [==============================] - 109s 743ms/step - loss: 0.0652 - accuracy: 0.9725 - val_loss: 0.0927 - val_accuracy: 0.9653
Epoch 6/10
146/146 [==============================] - 109s 744ms/step - loss: 0.0747 - accuracy: 0.9716 - val_loss: 0.0892 - val_accuracy: 0.9634
Epoch 7/10
146/146 [==============================] - 108s 738ms/step - loss: 0.0672 - accuracy: 0.9767 - val_loss: 0.1134 - val_accuracy: 0.9576
Epoch 8/10
146/146 [==============================] - 108s 737ms/step - loss: 0.0695 - accuracy: 0.9762 - val_loss: 0.0983 - val_accuracy: 0.9595
Epoch 9/10
146/146 [==============================] - 108s 743ms/step - loss: 0.0566 - accuracy: 0.9792 - val_loss: 0.1409 - val_accuracy: 0.9441
Epoch 10/10
146/146 [==============================] - 108s 739ms/step - loss: 0.0619 - accuracy: 0.9748 - val_loss: 0.0863 - val_accuracy: 0.9595
We trained our model with accuracy as the performance metric (other metrics are also available depending on our needs) and evaluated it against the validation dataset. The training loss decreases steadily across epochs while the validation loss stays low, indicating that our model performs well and is not overfitting badly.
The final validation accuracy is 95.9%, which is great for a small training dataset. Accuracy could be improved further by increasing the number of training images through data augmentation or by tuning the hyperparameters.
In this tutorial, we learned how to train a model on a dataset with few training instances using data augmentation and pre-trained models. Try playing with different pre-trained models and data augmentation parameters to get some interesting validation accuracies.