Image classification of Bird species using Keras in Python


This article walks through image classification on a large dataset, step by step, using a bird species dataset. The two main techniques used to improve the quality of the model are data augmentation and transfer learning. For transfer learning we use VGG16, a widely used open-source model pre-trained on ImageNet. It consists of stacks of convolutional layers, each followed by a pooling layer. The pooling layers downsample the feature maps, halving their height and width at each stage.
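
To make the narrowing done by pooling concrete, here is a minimal sketch of 2 x 2 max pooling with stride 2, written in plain Python rather than Keras. The helper name max_pool_2x2 and the toy feature map are made up for illustration. VGG16 applies five such pooling stages, so a side of 224 pixels shrinks to 7.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width assumed)."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

# Toy 4x4 feature map (illustrative values only).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2x2(feature_map)   # the 4x4 input becomes 2x2

# Side length after VGG16's five pooling stages on a 224-pixel input.
side = 224
for _ in range(5):
    side //= 2
print(pooled, side)
```

Each pooling stage keeps only the strongest response in every 2 x 2 window, which is why the base model's output in the summary further down is 7 x 7 in height and width.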

A few of the bird species in the dataset are the African Firefinch, Albatross, American coot, Anhinga, Bald Eagle, Bird of paradise, common loon, eastern bluebird, flamingo, golden ibis, hornbill, Javan magpie, killdeer, king vulture, northern jacana, pelican, puffin, ostrich, robin, roadrunner, and sand martin. A few specifications of this dataset are:

  • A total of 31316 training images, 1125 test images (5 per species), and 1125 validation images (5 per species).
  • All images are 224 x 224 x 3 color images in jpg format (thus, no resizing is required on our side).
  • The images were gathered from internet searches by species name.

We are going to use this dataset to classify bird species with the Keras deep learning API, which ships with TensorFlow, in Python. This is a multi-class image classification task: each image is assigned to exactly one species.

Note: It is always better to preprocess your dataset once and then feed it to the learning algorithm; otherwise the preprocessing is repeated in every epoch.

Google Colaboratory is the preferred platform for this project because it offers a free cloud service with free GPU access. Click here to go to a Google Colaboratory notebook.

To enable the GPU, open the Colab page and go to Edit -> Notebook settings, then select GPU as the hardware accelerator.

Thank you and Stay tuned!!!

INSTALLING DEPENDENCIES

The following Python libraries are required for this project. Google Colaboratory already has all of them installed on its servers, so if you are coding in Colab, skip this step and move on to the next one.

!pip install tensorflow
!pip install keras
!pip install numpy

IMPORTING THE REQUIRED LIBRARIES

The Python libraries are imported depending on the needs of this project.

import keras
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import preprocess_input
from google.colab import files
Using TensorFlow backend.

Keras ships with TensorFlow. It is the deep learning API that will perform the main classification task.

UPLOADING DATASET

The dataset comes from Kaggle, a large data science community with powerful tools and open-source datasets. The code at the end of this section uploads your Kaggle API token (kaggle.json) and downloads the dataset directly from Kaggle into the Colab notebook.

Another method to use the dataset is as follows:

  • Click here to go to the Kaggle site.
  • Press the Download (1 GB) button on the web page.
  • Now, click here to go to Google Colab.
  • Press the Upload to session storage button and upload the downloaded dataset.

files.upload()
!mkdir -p  ~/.kaggle
!cp kaggle.json ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d gpiosenka/100-bird-species
Saving kaggle.json to kaggle.json
Downloading 100-bird-species.zip to /content 
99% 1.27G/1.28G [00:21<00:00, 72.8MB/s] 
100% 1.28G/1.28G [00:21<00:00, 63.2MB/s]

EXTRACTING THE ZIP FILE

Kaggle distributes the dataset as a zip archive, which keeps all the related files together and makes them easier to move from one place to another.

import zipfile

# Extract the downloaded archive into /content; the with-block closes the file automatically
local_zip = '/content/100-bird-species.zip'
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('/content/')
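
Before building the generators, it can help to confirm that the extraction produced one sub-folder per species under each split, since that is the layout flow_from_directory reads. The following is a minimal sketch using only the standard library. The helper count_images is hypothetical, and the /content/train, /content/valid, and /content/test paths are the ones used later in this article.

```python
from pathlib import Path

def count_images(split_dir, suffixes=(".jpg", ".jpeg", ".png")):
    """Return {class_name: image_count} for a folder laid out as split/class/image."""
    split_path = Path(split_dir)
    if not split_path.is_dir():
        return {}                      # missing folder: nothing to count
    counts = {}
    for class_dir in sorted(p for p in split_path.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in suffixes
        )
    return counts

for split in ("train", "valid", "test"):
    per_class = count_images(f"/content/{split}")
    print(split, len(per_class), "classes,", sum(per_class.values()), "images")
```

If a split reports fewer classes than expected, the archive was probably extracted to a different folder than assumed.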

CREATING GENERATORS

Generators load the dataset in batches while the deep learning model trains. Data augmentation is a technique for artificially creating new data from existing training data. It helps by:

  • Increasing the effective size of the dataset
  • Introducing variability into the dataset without the use of additional data

#Creating generator for Training DataSet
train_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        shear_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True)
train_generator = train_datagen.flow_from_directory('/content/train',target_size=(224, 224),batch_size=64,class_mode='categorical')

#Creating generator for Validation DataSet
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
val_generator = val_datagen.flow_from_directory('/content/valid',target_size=(224, 224),batch_size=32,class_mode='categorical')

#Creating generator for Test DataSet
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_generator = test_datagen.flow_from_directory('/content/test',target_size=(224, 224),batch_size=32,class_mode='categorical')
Found 29544 images belonging to 215 classes. 
Found 1075 images belonging to 215 classes. 
Found 1075 images belonging to 215 classes.
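
To illustrate what the training generator does to each image, here is a rough pure-Python sketch of two of its steps: a horizontal flip, and a VGG16-style preprocessing step. It assumes the "caffe"-style behaviour that Keras documents for the VGG16 preprocess_input function, namely reordering RGB to BGR and subtracting the ImageNet channel means without rescaling. The helper names and the tiny one-row image are made up; the real Keras functions work on NumPy arrays.

```python
# ImageNet channel means in BGR order (assumed "caffe"-style preprocessing).
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)

def horizontal_flip(image):
    """Mirror an image (a list of rows of pixels) left to right."""
    return [row[::-1] for row in image]

def vgg16_style_preprocess(pixel_rgb):
    """Reorder one RGB pixel to BGR and subtract the per-channel mean."""
    r, g, b = pixel_rgb
    return tuple(round(v - m, 3) for v, m in zip((b, g, r), IMAGENET_MEAN_BGR))

image = [[(255, 0, 0), (0, 0, 255)]]   # one row: a red pixel, then a blue pixel
flipped = horizontal_flip(image)       # the blue pixel now comes first
red_processed = vgg16_style_preprocess((255, 0, 0))
print(flipped, red_processed)
```

The flip creates a new, equally valid training image of the same bird, while the preprocessing puts pixel values in the range the pre-trained weights were trained on.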

PRE-TRAINED MODEL

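The VGG16 model is loaded with weights pre-trained on ImageNet. In the VGG16 network, the bottom layers, those closest to the input image, are wide (large spatial dimensions, few channels), whereas the top layers are deep (small spatial dimensions, many channels). Setting include_top=False leaves out the original fully connected classifier so that we can attach our own.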

base_model=keras.applications.VGG16(include_top=False, weights="imagenet", input_shape=(224,224,3))
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 3s 0us/step

FREEZING BASE LAYER

Freeze all layers in the base model so that their weights are not updated during training. The advantages of freezing layers are:

  • Reduced training time
  • Backpropagation updates the weights of only the few newly added layers, thus saving computational time.

base_model.trainable = False

ADDING LAYERS

Here, new layers are stacked on top of the last layer of the pre-trained model. The dropout layers help prevent the model from overfitting. The 215 in the last dense layer is the total number of classes in the dataset.

from keras.models import Sequential
from keras.layers import Dense,Flatten,Dropout
model=Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(2048,activation='relu',kernel_initializer='he_normal'))
model.add(Dropout(0.35))
model.add(Dense(2048,activation='relu',kernel_initializer='he_normal'))
model.add(Dropout(0.35))
model.add(Dense(215,activation='softmax',kernel_initializer='glorot_normal'))
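
As an illustration of what Dropout(0.35) does during training, here is a minimal sketch of inverted dropout in plain Python: each activation is zeroed with probability 0.35, and the survivors are scaled by 1 / (1 - 0.35) so that the expected value stays the same. The function name and the fixed seed are assumptions for the example; Keras handles this internally and switches dropout off at inference time.

```python
import random

def inverted_dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)                 # fixed seed so the run is repeatable
activations = [1.0] * 10_000
dropped = inverted_dropout(activations, rate=0.35, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to 0.35
mean_after = sum(dropped) / len(dropped)            # close to 1.0
print(zero_fraction, mean_after)
```

Because a different random subset of units is silenced on every batch, the dense layers cannot rely on any single unit, which is what discourages overfitting.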

SUMMARY OF THE MODEL

The model summary lists every layer with its output shape and parameter count, which makes the architecture of the neural network easier to understand. It shows both the pre-trained base and the new layers added previously.

model.summary()
Model: "sequential_2"
_________________________________________________________________ 
Layer (type)               Output Shape              Param # 
=================================================================
vgg16 (Model)           (None, 7, 7, 512)           14714688 
_________________________________________________________________ 
flatten_2 (Flatten)       (None, 25088)                0 
_________________________________________________________________ 
dense_4 (Dense)           (None, 2048)              51382272 
_________________________________________________________________ 
dropout_3 (Dropout)       (None, 2048)                 0 
_________________________________________________________________ 
dense_5 (Dense)           (None, 2048)              4196352 
_________________________________________________________________ 
dropout_4 (Dropout)       (None, 2048)                 0 
_________________________________________________________________ 
dense_6 (Dense)           (None, 215)                440535 
================================================================= 
Total params: 70,733,847 
Trainable params: 56,019,159 
Non-trainable params: 14,714,688
_________________________________________________________
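
The parameter counts in the summary can be reproduced by hand: a Dense layer with n_in inputs and n_out units has n_in x n_out weights plus n_out biases. A short check, assuming the layer sizes from the model definition above (25088 flattened features, two hidden layers of 2048 units, and a 215-way output):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

flattened = 7 * 7 * 512                  # output of the frozen VGG16 base, flattened
head = [
    dense_params(flattened, 2048),       # first Dense layer
    dense_params(2048, 2048),            # second Dense layer
    dense_params(2048, 215),             # softmax output layer
]
vgg16_base = 14_714_688                  # frozen parameters, taken from the summary

trainable = sum(head)
total = trainable + vgg16_base
print(head, trainable, total)
```

The output layer's 440535 parameters correspond to 215 units, and the trainable total covers only the new layers because the base model is frozen.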

COMPILATION

The compile step defines the loss function, the metrics, and the optimizer together with its learning rate. The parameters passed to the compile function are:

  • The loss function measures how far the model's predictions are from the true labels. Binary cross-entropy is for binary and multi-label classification, whereas categorical cross-entropy is for multi-class classification where each example belongs to a single class. Each bird image belongs to exactly one species, so categorical cross-entropy is the appropriate choice here.
  • The learning rate is a tuning parameter of the optimization algorithm. It determines the size of the step at each iteration while moving toward a minimum of the loss function. Adjust the learning rate between training runs and check which value gives the best result.
  • Optimizers are algorithms or methods that change the attributes of the neural network, such as weights and learning rates, in order to reduce the loss. A suitable optimizer helps the model converge faster.

model.compile(optimizer=keras.optimizers.Adam(1e-4),loss='categorical_crossentropy',metrics=['accuracy'])
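
For intuition about the loss used here, below is a minimal sketch of softmax followed by categorical cross-entropy for a single example with three classes. The scores and the function names are made up for illustration; Keras computes the same quantity averaged over each batch.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t)

probs = softmax([2.0, 1.0, 0.1])
loss_correct = categorical_crossentropy([1, 0, 0], probs)   # true class has the highest score
loss_wrong = categorical_crossentropy([0, 0, 1], probs)     # true class has the lowest score
print(probs, loss_correct, loss_wrong)
```

The loss is small when the model puts most of its probability on the true class and grows quickly when it does not, which is the signal the optimizer follows.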

TRAINING

The model.fit function (model.fit_generator in older Keras versions) does all the training for the model, using parameters such as the number of epochs, the validation data, and the number of worker processes. The batch size is already set by the generators.

history=model.fit(train_generator,epochs=40,validation_data=val_generator,workers=10,use_multiprocessing=True)
Epoch 1/40 462/462 [==============================] - 397s 859ms/step - loss: 9.0988 - accuracy: 0.0749 - val_loss: 2.8479 - val_accuracy: 0.3405 
Epoch 2/40 462/462 [==============================] - 361s 780ms/step - loss: 3.7011 - accuracy: 0.2908 - val_loss: 1.3898 - val_accuracy: 0.6214 
Epoch 3/40 462/462 [==============================] - 365s 791ms/step - loss: 2.5986 - accuracy: 0.4805 - val_loss: 1.2391 - val_accuracy: 0.7647 
Epoch 4/40 462/462 [==============================] - 364s 787ms/step - loss: 2.0046 - accuracy: 0.5929 - val_loss: 0.8035 - val_accuracy: 0.8140 
Epoch 5/40 462/462 [==============================] - 363s 786ms/step - loss: 1.6436 - accuracy: 0.6625 - val_loss: 0.5808 - val_accuracy: 0.8493 
Epoch 6/40 462/462 [==============================] - 365s 789ms/step - loss: 1.4380 - accuracy: 0.7080 - val_loss: 0.2312 - val_accuracy: 0.8781 
Epoch 7/40 462/462 [==============================] - 367s 793ms/step - loss: 1.2481 - accuracy: 0.7436 - val_loss: 0.1301 - val_accuracy: 0.8930 
Epoch 8/40 462/462 [==============================] - 367s 795ms/step - loss: 1.2016 - accuracy: 0.7653 - val_loss: 0.5041 - val_accuracy: 0.8977 
Epoch 9/40 462/462 [==============================] - 364s 788ms/step - loss: 1.0705 - accuracy: 0.7850 - val_loss: 0.2209 - val_accuracy: 0.9060 
Epoch 10/40 462/462 [==============================] - 363s 786ms/step - loss: 1.0148 - accuracy: 0.8008 - val_loss: 0.1227 - val_accuracy: 0.9153 
Epoch 11/40 462/462 [==============================] - 366s 793ms/step - loss: 0.9520 - accuracy: 0.8155 - val_loss: 0.0078 - val_accuracy: 0.9144 
Epoch 12/40 462/462 [==============================] - 370s 801ms/step - loss: 0.8859 - accuracy: 0.8280 - val_loss: 0.2111 - val_accuracy: 0.9181 
Epoch 13/40 462/462 [==============================] - 369s 798ms/step - loss: 0.8242 - accuracy: 0.8424 - val_loss: 0.0025 - val_accuracy: 0.9172 
Epoch 14/40 462/462 [==============================] - 370s 801ms/step - loss: 0.7976 - accuracy: 0.8503 - val_loss: 0.3693 - val_accuracy: 0.9293 
Epoch 15/40 462/462 [==============================] - 370s 801ms/step - loss: 0.7753 - accuracy: 0.8587 - val_loss: 0.3846 - val_accuracy: 0.9191 
Epoch 16/40 462/462 [==============================] - 369s 800ms/step - loss: 0.7194 - accuracy: 0.8680 - val_loss: 0.6372 - val_accuracy: 0.9274 
Epoch 17/40 462/462 [==============================] - 369s 798ms/step - loss: 0.7251 - accuracy: 0.8702 - val_loss: 0.4891 - val_accuracy: 0.9340 
Epoch 18/40 462/462 [==============================] - 369s 800ms/step - loss: 0.6661 - accuracy: 0.8784 - val_loss: 0.0439 - val_accuracy: 0.9284 
Epoch 19/40 462/462 [==============================] - 381s 826ms/step - loss: 0.6404 - accuracy: 0.8857 - val_loss: 0.2181 - val_accuracy: 0.9247 
Epoch 20/40 462/462 [==============================] - 382s 827ms/step - loss: 0.6016 - accuracy: 0.8938 - val_loss: 0.0015 - val_accuracy: 0.9247 
Epoch 21/40 462/462 [==============================] - 381s 824ms/step - loss: 0.6419 - accuracy: 0.8917 - val_loss: 0.4428 - val_accuracy: 0.9284 
Epoch 22/40 462/462 [==============================] - 370s 802ms/step - loss: 0.5791 - accuracy: 0.8995 - val_loss: 0.4855 - val_accuracy: 0.9377 
Epoch 23/40 462/462 [==============================] - 370s 801ms/step - loss: 0.5506 - accuracy: 0.9033 - val_loss: 0.0011 - val_accuracy: 0.9367 
Epoch 24/40 462/462 [==============================] - 374s 809ms/step - loss: 0.5470 - accuracy: 0.9063 - val_loss: 0.0406 - val_accuracy: 0.9414 
Epoch 25/40 462/462 [==============================] - 373s 808ms/step - loss: 0.5218 - accuracy: 0.9119 - val_loss: 3.8196e-04 - val_accuracy: 0.9367 
Epoch 26/40 462/462 [==============================] - 372s 804ms/step - loss: 0.5487 - accuracy: 0.9087 - val_loss: 0.0682 - val_accuracy: 0.9488 
Epoch 27/40 462/462 [==============================] - 372s 805ms/step - loss: 0.5054 - accuracy: 0.9155 - val_loss: 0.4439 - val_accuracy: 0.9386 
Epoch 28/40 462/462 [==============================] - 372s 805ms/step - loss: 0.5257 - accuracy: 0.9180 - val_loss: 0.1204 - val_accuracy: 0.9442 
Epoch 29/40 462/462 [==============================] - 371s 803ms/step - loss: 0.4760 - accuracy: 0.9224 - val_loss: 0.0936 - val_accuracy: 0.9395 
Epoch 30/40 462/462 [==============================] - 368s 798ms/step - loss: 0.4884 - accuracy: 0.9214 - val_loss: 0.0071 - val_accuracy: 0.9330 
Epoch 31/40 462/462 [==============================] - 367s 795ms/step - loss: 0.4349 - accuracy: 0.9284 - val_loss: 2.4745e-04 - val_accuracy: 0.9451 
Epoch 32/40 462/462 [==============================] - 372s 805ms/step - loss: 0.4470 - accuracy: 0.9296 - val_loss: 3.8900e-07 - val_accuracy: 0.9423 
Epoch 33/40 462/462 [==============================] - 372s 805ms/step - loss: 0.4205 - accuracy: 0.9332 - val_loss: 0.2065 - val_accuracy: 0.9451 
Epoch 34/40 462/462 [==============================] - 368s 795ms/step - loss: 0.4177 - accuracy: 0.9347 - val_loss: 0.0225 - val_accuracy: 0.9526 
Epoch 35/40 462/462 [==============================] - 363s 786ms/step - loss: 0.3954 - accuracy: 0.9351 - val_loss: 0.9441 - val_accuracy: 0.9442 
Epoch 36/40 462/462 [==============================] - 367s 793ms/step - loss: 0.4037 - accuracy: 0.9355 - val_loss: 0.1522 - val_accuracy: 0.9414 
Epoch 37/40 462/462 [==============================] - 367s 795ms/step - loss: 0.3815 - accuracy: 0.9394 - val_loss: 1.3292e-04 - val_accuracy: 0.9488 
Epoch 38/40 462/462 [==============================] - 371s 803ms/step - loss: 0.4154 - accuracy: 0.9369 - val_loss: 0.1124 - val_accuracy: 0.9423 
Epoch 39/40 462/462 [==============================] - 374s 810ms/step - loss: 0.3617 - accuracy: 0.9415 - val_loss: 0.5891 - val_accuracy: 0.9507 
Epoch 40/40 462/462 [==============================] - 372s 806ms/step - loss: 0.3659 - accuracy: 0.9434 - val_loss: 0.0092 - val_accuracy: 0.9507
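
The 462/462 counter in the log is the number of batches per epoch. Assuming the generator also yields the final partial batch, this is the image count divided by the batch size, rounded up. A quick check with the numbers reported by the generators:

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

train_steps = steps_per_epoch(29544, 64)   # training generator, batch size 64
eval_steps = steps_per_epoch(1075, 32)     # validation and test generators, batch size 32
print(train_steps, eval_steps)
```

At roughly 370 seconds per epoch, the 40 epochs shown above add up to about four hours of training.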

The 40th epoch has the highest training accuracy, 94.34%, with a validation loss of 0.0092 and a validation accuracy of 95.07%. The validation accuracy peaks slightly earlier, at 95.26% in epoch 34, and the validation loss fluctuates widely from epoch to epoch, so it is a less reliable guide here. Overall, this indicates a well-trained model. The training curves can be studied further by visualizing them with the Matplotlib library.
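
One way to pick out the best epoch programmatically is to scan the lists stored in history.history. The sketch below uses a hand-copied subset of the val_accuracy values from the log above as a stand-in for the real history object; the helper name best_epoch is hypothetical.

```python
def best_epoch(values, higher_is_better=True):
    """Return (1-based position, value) of the best entry in a metric list."""
    pick = max if higher_is_better else min
    index = pick(range(len(values)), key=lambda i: values[i])
    return index + 1, values[index]

# Stand-in for history.history["val_accuracy"]: epochs 31 to 40 from the log.
val_accuracy_tail = [0.9451, 0.9423, 0.9451, 0.9526, 0.9442,
                     0.9414, 0.9488, 0.9423, 0.9507, 0.9507]
offset = 30                                   # the list starts at epoch 31
position, value = best_epoch(val_accuracy_tail)
print("best val_accuracy:", value, "at epoch", position + offset)
```

With the full history, the same helper can be called with higher_is_better=False on the val_loss list.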

VISUALIZATION

Visualization is a technique that makes sense of the numbers produced during training, so that informed decisions can be made about which parameters or hyperparameters of the machine learning model need to change.

import matplotlib.pyplot as plt
#Loss
plt.plot(history.history['loss'],label='loss')
plt.plot(history.history['val_loss'],label='val_loss')
plt.legend()
plt.show()
#Accuracy
plt.plot(history.history['accuracy'],label='acc')
plt.plot(history.history['val_accuracy'],label='val_acc')
plt.legend()
plt.show()

SAVING MODEL

Saving the model is one of the vital steps in machine learning, since it lets the trained model be loaded again later without retraining. To load the saved model, use the keras.models.load_model function. The path below assumes that Google Drive is mounted in the Colab session.

model.save("/content/drive/My Drive/yolov3/birds.h5")

EVALUATION

The evaluate function runs the model on the given input and computes the loss and the metrics specified in the compile step, returning them as output. This gives a clear picture of how the trained model performs on unseen data.

model.evaluate(test_generator,use_multiprocessing=True,workers=10)
34/34 [==============================] - 12s 358ms/step
[8.5635492723668e-06, 0.9655814170837402]
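
The evaluate function returns the loss followed by each metric passed to compile, so the list above is [loss, accuracy]. Below is a small sketch of how an accuracy figure is derived from predictions, using made-up probability rows rather than real model output, followed by a reading of the reported result.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(predicted_probs, true_labels):
    """Fraction of rows whose highest-probability class equals the true label."""
    correct = sum(argmax(p) == y for p, y in zip(predicted_probs, true_labels))
    return correct / len(true_labels)

# Four hypothetical predictions over three classes.
predicted_probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
true_labels = [0, 1, 2, 1]
toy_accuracy = accuracy(predicted_probs, true_labels)   # 3 of 4 rows are correct

# Reading the reported result for the 1075 test images.
test_loss, test_accuracy = [8.5635492723668e-06, 0.9655814170837402]
correct_images = round(test_accuracy * 1075)
print(toy_accuracy, correct_images)
```

In other words, a test accuracy of about 96.56% means that 1038 of the 1075 test images were assigned to the right species.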

FINAL THOUGHTS

It’s no secret that machine learning is the future, and gaining an understanding of it might determine whether you will succeed in it.  In this article, we have discussed in detail various methods used in training models including Transfer learning and Data Augmentation.

With the power of deep learning algorithms, we can create value on top of these huge datasets (31,316 training images, to be precise). Here, I tried to give readers a clear, worked example of how to train a model on a large bird species dataset and classify the species using Keras in Python.

For more information about the basics of Keras, feel free to refer to the Keras documentation.  And for learning more about such projects, view the valueml blog page.

And if you have any queries regarding the article, feel free to drop a line.

