Transfer learning for training on a pneumonia dataset

This tutorial shows how to train a model on a small dataset using data augmentation and pre-trained models. We will use Google Colab for our training and validation.

Obtaining dataset from Kaggle

Kaggle is one of the best sites for obtaining deep learning datasets. Since we are using Google Colab, you need to know how to download Kaggle datasets directly in Colab. The tutorial can be found here. It’s easy, so do take a look at it before proceeding.

Training dataset

Here, we have a training dataset that contains chest X-ray images of pneumonia patients. The training dataset is classified into two categories: normal and pneumonia. Our first objective is to increase the dataset’s size, which can be done using data augmentation.

Importing the required libraries

import os
import cv2 as cv
import pathlib
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import seaborn as sns

As you can see, we imported some well-known ML and DL libraries such as pandas, NumPy, and matplotlib. matplotlib is a helpful tool for plotting graphs, NumPy handles vector and multidimensional array operations, and pandas handles dataframe operations. The os library is for interacting with the operating system (here, the Colab environment); we can access, create, and delete directories and files with it. TensorFlow is known for performing efficient tensor operations.

Loading the dataset

We obtained the chest X-ray pneumonia dataset from Kaggle. The link to the dataset can be found here. Let’s download the dataset and unzip it. Note that we are using Google Colab for our programs, as it offers free RAM and GPU for machine learning projects.

!kaggle datasets download -d paultimothymooney/chest-xray-pneumonia
!unzip -qq chest-xray-pneumonia.zip -d /content/drive/MyDrive/Kaggle/
data_path = '/content/drive/MyDrive/Kaggle/chest_xray'
train_dir = os.path.join(data_path,'train')
test_dir = os.path.join(data_path, 'test')

Here we have set the paths of the train and test datasets so we can access them in the later steps.
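The Kaggle archive unpacks into class-named subfolders, which is the layout flow_from_directory expects later on. A minimal sketch of that layout (the local demo path is hypothetical; NORMAL and PNEUMONIA are the class folder names in this dataset):

```python
import os

# Recreate the expected folder layout locally so the listing below can run anywhere
data_path = 'demo_chest_xray'  # hypothetical stand-in for the Colab path
for split in ('train', 'test'):
    for label in ('NORMAL', 'PNEUMONIA'):
        os.makedirs(os.path.join(data_path, split, label), exist_ok=True)

train_dir = os.path.join(data_path, 'train')
print(sorted(os.listdir(train_dir)))  # ['NORMAL', 'PNEUMONIA']
```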

Data augmentation

We have implemented data augmentation to create more training images. Data augmentation is a common technique for obtaining more images, which is particularly useful if we have only a few training images. It applies operations like flipping, rotating, translating, and cropping to the input images; using various combinations of these operations, we can obtain many more training images. Data augmentation matters because a neural network is only as good as the data we feed it.
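Conceptually, each of those augmentations is just an array transform. A minimal NumPy sketch of the operations named above, using a random array as a stand-in for an X-ray:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))      # stand-in for one chest X-ray

flipped = np.fliplr(image)             # horizontal flip
rotated = np.rot90(image)              # 90-degree rotation
shifted = np.roll(image, 10, axis=1)   # crude 10-pixel horizontal translation
cropped = image[16:208, 16:208]        # centre crop

# Each transform yields a new, equally valid training image
print(flipped.shape, cropped.shape)  # (224, 224, 3) (192, 192, 3)
```

Keras performs the same kinds of transforms on the fly, with random parameters per batch, so we never have to store the extra images on disk.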

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.densenet import preprocess_input
# Hold out 10% of the training folder for validation; the augmentation
# parameters below are representative choices
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input, horizontal_flip=True, rotation_range=10, zoom_range=0.1, validation_split=0.1)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224,224), class_mode='categorical', subset='training', batch_size=32, shuffle=True)
val_generator = train_datagen.flow_from_directory(train_dir, target_size=(224,224), class_mode='categorical', subset='validation', batch_size=3, shuffle=True)
Found 4695 images belonging to 2 classes. 
Found 521 images belonging to 2 classes.

Data augmentation in Keras is straightforward and can be implemented in a few lines of code. Here we used the Keras built-in class ImageDataGenerator to apply real-time data augmentation, pre-process the dataset, and convert each image into the tensor data given as input to our pre-trained model. The syntax for ImageDataGenerator can be found here.

Converting images into tensors is the most important step because deep learning models are nothing but stacks of mathematical functions and complex transformations applied to the input data; neural networks take only numbers as input and give numbers as output. Images are collections of pixel data, where each pixel represents a colour intensity ranging from 0 to 255. The function above also converts each image into a tensor containing this pixel data.
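As a tiny illustration of that conversion, here is a 2x2 grayscale "image" scaled from 0-255 intensities to the float range a network expects:

```python
import numpy as np

# A 2x2 grayscale "image": each pixel is an intensity from 0 to 255
pixels = np.array([[0, 128],
                   [64, 255]], dtype=np.uint8)

# Scale to floats in [0, 1], the typical numeric input to a network
tensor = pixels.astype('float32') / 255.0
print(tensor.min(), tensor.max())  # 0.0 1.0
```

In this tutorial the scaling is handled by the DenseNet preprocess_input function instead, but the idea is the same: pixels in, numbers out.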


We have also used the flow_from_directory method, which, when applied, returns batches of training data rather than the full dataset from the specified directory. This is useful because feeding the entire dataset at once would make each update take much longer, and full-batch gradients have very low variance compared with the noisier mini-batch updates that help generalization. The syntax can be found here. The batch size indicates the number of images given to the neural network at the same time.
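A quick back-of-the-envelope for batching: the number of steps per epoch is the dataset size divided by the batch size, rounded up (the exact count reported by Keras can differ by one depending on how the generator rounds):

```python
import math

num_images = 4695   # training images found above
batch_size = 32     # assumed training batch size

steps_per_epoch = math.ceil(num_images / batch_size)
print(steps_per_epoch)  # 147
```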

Importing the model

from tensorflow.keras.applications import DenseNet201
base_model= DenseNet201(input_shape=[224,224,3],weights='imagenet',include_top=False)



Here we have used a pre-trained DenseNet model for our training. We could also use VGG or ResNet as the base model. DenseNet is one of the most widely used models for medical image classification.

The architecture of DenseNet contains many layers, so it’s difficult to list them all here. The input size is (224, 224, 3), where the first two entries are the image height and width. We can resize images whose dimensions don’t match these. The third entry is the number of channels: 3 for RGB images and 1 for grayscale images.

We have frozen the inner layers, since training them would take much longer and would overwrite the ImageNet weights. We have not included the top layers of DenseNet (done by setting include_top=False when importing the model) because their output size doesn’t match our needs (we need an output size of 2, for normal and pneumonia). So we keep the network only up to the final ReLU activation layer. This layer’s output is fed into a global average pooling layer, where the tensor data is collapsed into a flat vector, which is then fed into the final dense layer.
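The head described above can be sketched as follows (weights=None is used here only to avoid the large ImageNet download for the illustration; the tutorial itself uses weights='imagenet'):

```python
from tensorflow import keras
from tensorflow.keras.applications import DenseNet201

# Base network without its classification top
base_model = DenseNet201(input_shape=[224, 224, 3], weights=None, include_top=False)
base_model.trainable = False  # freeze the pre-trained layers

# Pool the final feature maps into a flat vector, then classify into 2 classes
x = keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = keras.layers.Dense(2, activation='softmax')(x)  # normal vs pneumonia
model = keras.Model(base_model.input, outputs)
print(model.output_shape)  # (None, 2)
```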

The weights used here are ImageNet weights, obtained by training on millions of images. We use pre-trained models because of their excellent accuracy and their well-built CNN structures.

Training the model

We compiled the model with the loss function ‘binary_crossentropy’, which is for binary classification. Various other loss functions are available for different objectives. We used the Adam optimizer (Adagrad and SGD are also available, but Adam is sufficient for most models; here lr indicates the learning rate, one of the hyperparameters).

We have now called the fit function, which starts the training. Note that we used train_generator and val_generator, which we created with the flow_from_directory function, so input data flows from the training folder to our model at the batch size we specified. Epochs indicate the number of passes over the training data. More information about the compile and fit methods can be found in the official documentation here.
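The compile-and-fit pattern can be run end to end on toy data; the tiny stand-in model and random inputs below are placeholders for the DenseNet model and the image generators:

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in model so the pattern runs quickly without the real dataset
model = keras.Sequential([keras.Input(shape=(8,)),
                          keras.layers.Dense(2, activation='softmax')])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Random features and one-hot binary labels standing in for image batches
x = np.random.rand(32, 8).astype('float32')
y = keras.utils.to_categorical(np.random.randint(0, 2, size=32), 2)
history = model.fit(x, y, validation_split=0.25, epochs=2, verbose=0)
print(sorted(history.history))  # ['accuracy', 'loss', 'val_accuracy', 'val_loss']
```

With the real generators, the call becomes model.fit(train_generator, validation_data=val_generator, epochs=10), which produces the log shown below.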


Epoch 1/10
146/146 [==============================] - 119s 760ms/step - loss: 0.2560 - accuracy: 0.9100 - val_loss: 0.1287 - val_accuracy: 0.9441
Epoch 2/10
146/146 [==============================] - 110s 751ms/step - loss: 0.1222 - accuracy: 0.9501 - val_loss: 0.1110 - val_accuracy: 0.9518
Epoch 3/10
146/146 [==============================] - 109s 748ms/step - loss: 0.0748 - accuracy: 0.9751 - val_loss: 0.0964 - val_accuracy: 0.9595
Epoch 4/10
146/146 [==============================] - 109s 746ms/step - loss: 0.0765 - accuracy: 0.9714 - val_loss: 0.1229 - val_accuracy: 0.9480
Epoch 5/10
146/146 [==============================] - 109s 743ms/step - loss: 0.0652 - accuracy: 0.9725 - val_loss: 0.0927 - val_accuracy: 0.9653
Epoch 6/10
146/146 [==============================] - 109s 744ms/step - loss: 0.0747 - accuracy: 0.9716 - val_loss: 0.0892 - val_accuracy: 0.9634
Epoch 7/10
146/146 [==============================] - 108s 738ms/step - loss: 0.0672 - accuracy: 0.9767 - val_loss: 0.1134 - val_accuracy: 0.9576
Epoch 8/10
146/146 [==============================] - 108s 737ms/step - loss: 0.0695 - accuracy: 0.9762 - val_loss: 0.0983 - val_accuracy: 0.9595
Epoch 9/10
146/146 [==============================] - 108s 743ms/step - loss: 0.0566 - accuracy: 0.9792 - val_loss: 0.1409 - val_accuracy: 0.9441
Epoch 10/10
146/146 [==============================] - 108s 739ms/step - loss: 0.0619 - accuracy: 0.9748 - val_loss: 0.0863 - val_accuracy: 0.9595

We trained our model with accuracy as the performance metric (other metrics are also available based on our needs) and evaluated it against the validation dataset. We can observe an overall decrease in both training and validation loss, indicating that our model performs well and is not overfitting.
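For a visual check, the loss values from the log above can be plotted (the Agg backend is used so the script also runs headless; in Colab the plot renders inline):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt

# Loss values copied from the training log above
loss = [0.2560, 0.1222, 0.0748, 0.0765, 0.0652,
        0.0747, 0.0672, 0.0695, 0.0566, 0.0619]
val_loss = [0.1287, 0.1110, 0.0964, 0.1229, 0.0927,
            0.0892, 0.1134, 0.0983, 0.1409, 0.0863]

epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, label='training loss')
plt.plot(epochs, val_loss, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.savefig('loss_curves.png')
```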

The final validation accuracy is 95.9%, which is great for a small training dataset. Accuracy could be improved further by increasing the number of training images through data augmentation or by hyperparameter tuning.


This tutorial showed how to train a model on a dataset with few training instances using data augmentation and pre-trained models. Try playing with different pre-trained models and data augmentation parameters to get some interesting validation accuracies.

You can also take a look at

  1. Data Augmentation using Keras in Python
  2. Learning how to reduce noises in Images using TensorFlow
