Image classification of Bird species using Keras in Python

In this article, image classification for large datasets is explained step by step with the help of a bird-species dataset. The major techniques used in this project are data augmentation and transfer learning, which improve the quality of our model. The VGG16 pre-trained model used for transfer learning is a very efficient open-source model. It consists of several convolutional layers followed by pooling layers; the pooling layers are responsible for narrowing the feature maps.
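As a quick, minimal sketch of how pooling narrows the feature maps (illustrative only, not part of the project code), a single 2 x 2 max-pooling layer halves the height and width of its input:

# Illustrative sketch: a 2x2 max-pooling layer halves the spatial dimensions.
from keras.models import Sequential
from keras.layers import MaxPooling2D

pool_demo = Sequential([MaxPooling2D(pool_size=(2, 2), input_shape=(224, 224, 3))])
print(pool_demo.output_shape)  # (None, 112, 112, 3): height and width halved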
A few of the bird species in the dataset are the African firefinch, albatross, American coot, anhinga, bald eagle, bird of paradise, common loon, eastern bluebird, flamingo, golden ibis, hornbill, Javan magpie, killdeer, king vulture, northern jacana, pelican, puffin, ostrich, robin, roadrunner, sand martin, etc. A few specifications of this dataset:
- A total of 31316 training images, 1125 test images (5 per species), and 1125 validation images (5 per species).
- All images are 224 x 224 x 3 color images in JPG format (thus, no reformatting is required on our side).
- Images gathered from internet searches by species name.
We are going to use this dataset to classify the different species of birds with the help of the Keras deep learning API (running on TensorFlow) in Python.
Note: It is always better to preprocess your dataset first and then feed it to the learning algorithm; otherwise, the preprocessing will be repeated on every epoch.
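As a minimal sketch of that idea (assuming a TensorFlow 2.x tf.data pipeline rather than the ImageDataGenerator used later, and a hypothetical file pattern), caching after the map step makes the decode/resize work run only once rather than on every epoch:

import tensorflow as tf

def preprocess(path):
    # Decode and resize once; .cache() below stores the result for later epochs.
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.image.resize(image, (224, 224))

# '/content/train/*/*.jpg' is a hypothetical pattern for illustration.
paths = tf.data.Dataset.list_files('/content/train/*/*.jpg')
dataset = paths.map(preprocess).cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)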
Google Colaboratory is the preferred medium for machine learning projects because it provides a free cloud service with free GPU support. Click here to go to the Google Colaboratory notebook.
To enable the GPU, after entering the Colab page go to Edit -> Notebook settings and select GPU.
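To verify that the GPU is actually available (a small check using a standard TensorFlow utility):

import tensorflow as tf

# Prints something like '/device:GPU:0' when a GPU is enabled, '' otherwise.
print(tf.test.gpu_device_name())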
INSTALLING DEPENDENCIES
The following are the Python libraries this project depends on. Google Colaboratory already has all of these dependencies installed on its servers, so if Google Colaboratory is the platform used for coding, ignore this code and move directly to the next step.
!pip install tensorflow
!pip install keras
!pip install numpy
IMPORTING THE REQUIRED LIBRARIES
The Python libraries are imported depending on the needs of this project.
import keras
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import preprocess_input
from google.colab import files
Using TensorFlow backend.
Keras ships together with TensorFlow. It is the deep learning API that is going to perform the main classification task.
UPLOADING DATASET
Datasets are procured from the Kaggle website, a large data-science community with powerful tools and open-source datasets. The code below uploads the Kaggle API token and downloads the dataset directly from the Kaggle website into the Colab notebook.
Another method to use the dataset is as follows:
- Click here to go to the Kaggle site.
- Press the Download (1 GB) button on the web page.
- Now, Click here to go to Google Colab.
- Press the Upload to session storage button and upload the downloaded dataset.
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d gpiosenka/100-bird-species
Saving kaggle.json to kaggle.json
Downloading 100-bird-species.zip to /content
 99% 1.27G/1.28G [00:21<00:00, 72.8MB/s]
100% 1.28G/1.28G [00:21<00:00, 63.2MB/s]
EXTRACTING THE ZIP FILE
Kaggle stores the dataset in zip format to keep all the related files together, which makes moving the files from one place to another easier.
import zipfile

local_zip = '/content/100-bird-species.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/content/')
zip_ref.close()
CREATING GENERATORS
Generators load the dataset in batches while training deep learning models. Data augmentation is a technique for artificially creating new data from the existing training data. It helps in:
- Increasing the size of the dataset
- Introducing variability into the dataset without the use of additional data
# Creating generator for the training set (with augmentation)
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(
    '/content/train', target_size=(224, 224), batch_size=64, class_mode='categorical')

# Creating generator for the validation set
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
val_generator = val_datagen.flow_from_directory(
    '/content/valid', target_size=(224, 224), batch_size=32, class_mode='categorical')

# Creating generator for the test set
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_generator = test_datagen.flow_from_directory(
    '/content/test', target_size=(224, 224), batch_size=32, class_mode='categorical')
Found 29544 images belonging to 215 classes.
Found 1075 images belonging to 215 classes.
Found 1075 images belonging to 215 classes.
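If needed, the mapping from species names to class indices that flow_from_directory built from the folder names can be inspected directly on the generator (class_indices is a standard attribute; the exact names shown in the comment are illustrative):

# Species-name -> class-index mapping built from the folder names.
class_map = train_generator.class_indices
print(len(class_map))               # 215 classes
print(list(class_map.items())[:3])  # e.g. [('AFRICAN FIREFINCH', 0), ...]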
PRE-TRAINED MODEL
The VGG16 model is loaded with weights pre-trained on ImageNet. The bottom layers of the VGG16 network, which are closer to the input image, are wide, whereas the top layers are deep.
base_model=keras.applications.VGG16(include_top=False, weights="imagenet", input_shape=(224,224,3))
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 3s 0us/step
FREEZING BASE LAYER
Freeze all layers in the base model. The advantages of freezing layers are:
- It reduces the training time
- Backpropagation updates the weights of only a couple of layers, thus saving computational time
base_model.trainable = False
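A quick, optional sanity check confirms the freeze took effect (trainable_weights and non_trainable_weights are standard attributes of Keras models):

# After freezing, the base model exposes no trainable weights; VGG16's 13
# convolutional layers (kernel + bias each) are all non-trainable now.
print(len(base_model.trainable_weights))      # 0
print(len(base_model.non_trainable_weights))  # 26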
ADDING LAYERS
In this step, new layers are added on top of the last layer of the pre-trained model. The dropout layers prevent the model from overfitting. The 215 in the last dense layer is the total number of classes in the dataset.
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout

model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(2048, activation='relu', kernel_initializer='he_normal'))
model.add(Dropout(0.35))
model.add(Dense(2048, activation='relu', kernel_initializer='he_normal'))
model.add(Dropout(0.35))
model.add(Dense(215, activation='softmax', kernel_initializer='glorot_normal'))
SUMMARY OF THE MODEL
The summary of the model shows the number of layers and their specifics, i.e., the architecture of the neural network, which makes the network much easier to understand. It displays all the layers, including the pre-trained ones and the new layers added previously.
model.summary()
Model: "sequential_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= vgg16 (Model) (None, 7, 7, 512) 14714688 _________________________________________________________________ flatten_2 (Flatten) (None, 25088) 0 _________________________________________________________________ dense_4 (Dense) (None, 2048) 51382272 _________________________________________________________________ dropout_3 (Dropout) (None, 2048) 0 _________________________________________________________________ dense_5 (Dense) (None, 2048) 4196352 _________________________________________________________________ dropout_4 (Dropout) (None, 2048) 0 _________________________________________________________________ dense_6 (Dense) (None, 225) 440535 ================================================================= Total params: 70,733,847 Trainable params: 56,019,159 Non-trainable params: 14,714,688 _________________________________________________________
COMPILATION
Compile defines the loss function, the metrics, and the optimizer (which carries the learning rate). The parameters defining the compile function are:
- The loss function measures how well the machine learns from a specific algorithm with the given data. Binary cross-entropy is for multi-label classification, whereas categorical cross-entropy is for multi-class classification where each example belongs to a single class. Since every image here shows exactly one species, categorical cross-entropy is the right choice (see the sketch after this list).
- The learning rate is the tuning parameter of the optimization algorithm. It determines the step size at each iteration while moving toward a minimum of the loss function. Adjust the learning rate during training and check for optimal solutions.
- Optimizers are algorithms or methods that change the attributes of the neural network, such as the weights and the learning rate, in order to reduce the loss. They help to get results faster.
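As a minimal sketch of how categorical cross-entropy behaves (plain NumPy, illustrative only): the loss is the negative log of the probability the model assigns to the true class, so a confident correct prediction is penalized far less than an uncertain one.

import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # Negative log-likelihood of the true class (y_true is one-hot).
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0., 1., 0.])                                        # 3-class toy example
print(categorical_crossentropy(y_true, np.array([0.05, 0.90, 0.05])))  # ~0.105 (confident, low loss)
print(categorical_crossentropy(y_true, np.array([0.40, 0.30, 0.30])))  # ~1.204 (uncertain, high loss)

For adjusting the learning rate during training, Keras's ReduceLROnPlateau callback is one standard option.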
model.compile(optimizer=keras.optimizers.Adam(1e-4),loss='categorical_crossentropy',metrics=['accuracy'])
TRAINING
model.fit (or the older model.fit_generator) does all the training of the model, using parameters that include the number of epochs, multiprocessing settings, batch size, etc.
history=model.fit(train_generator,epochs=40,validation_data=val_generator,workers=10,use_multiprocessing=True)
Epoch 1/40
462/462 [==============================] - 397s 859ms/step - loss: 9.0988 - accuracy: 0.0749 - val_loss: 2.8479 - val_accuracy: 0.3405
Epoch 2/40
462/462 [==============================] - 361s 780ms/step - loss: 3.7011 - accuracy: 0.2908 - val_loss: 1.3898 - val_accuracy: 0.6214
Epoch 3/40
462/462 [==============================] - 365s 791ms/step - loss: 2.5986 - accuracy: 0.4805 - val_loss: 1.2391 - val_accuracy: 0.7647
Epoch 4/40
462/462 [==============================] - 364s 787ms/step - loss: 2.0046 - accuracy: 0.5929 - val_loss: 0.8035 - val_accuracy: 0.8140
Epoch 5/40
462/462 [==============================] - 363s 786ms/step - loss: 1.6436 - accuracy: 0.6625 - val_loss: 0.5808 - val_accuracy: 0.8493
Epoch 6/40
462/462 [==============================] - 365s 789ms/step - loss: 1.4380 - accuracy: 0.7080 - val_loss: 0.2312 - val_accuracy: 0.8781
Epoch 7/40
462/462 [==============================] - 367s 793ms/step - loss: 1.2481 - accuracy: 0.7436 - val_loss: 0.1301 - val_accuracy: 0.8930
Epoch 8/40
462/462 [==============================] - 367s 795ms/step - loss: 1.2016 - accuracy: 0.7653 - val_loss: 0.5041 - val_accuracy: 0.8977
Epoch 9/40
462/462 [==============================] - 364s 788ms/step - loss: 1.0705 - accuracy: 0.7850 - val_loss: 0.2209 - val_accuracy: 0.9060
Epoch 10/40
462/462 [==============================] - 363s 786ms/step - loss: 1.0148 - accuracy: 0.8008 - val_loss: 0.1227 - val_accuracy: 0.9153
Epoch 11/40
462/462 [==============================] - 366s 793ms/step - loss: 0.9520 - accuracy: 0.8155 - val_loss: 0.0078 - val_accuracy: 0.9144
Epoch 12/40
462/462 [==============================] - 370s 801ms/step - loss: 0.8859 - accuracy: 0.8280 - val_loss: 0.2111 - val_accuracy: 0.9181
Epoch 13/40
462/462 [==============================] - 369s 798ms/step - loss: 0.8242 - accuracy: 0.8424 - val_loss: 0.0025 - val_accuracy: 0.9172
Epoch 14/40
462/462 [==============================] - 370s 801ms/step - loss: 0.7976 - accuracy: 0.8503 - val_loss: 0.3693 - val_accuracy: 0.9293
Epoch 15/40
462/462 [==============================] - 370s 801ms/step - loss: 0.7753 - accuracy: 0.8587 - val_loss: 0.3846 - val_accuracy: 0.9191
Epoch 16/40
462/462 [==============================] - 369s 800ms/step - loss: 0.7194 - accuracy: 0.8680 - val_loss: 0.6372 - val_accuracy: 0.9274
Epoch 17/40
462/462 [==============================] - 369s 798ms/step - loss: 0.7251 - accuracy: 0.8702 - val_loss: 0.4891 - val_accuracy: 0.9340
Epoch 18/40
462/462 [==============================] - 369s 800ms/step - loss: 0.6661 - accuracy: 0.8784 - val_loss: 0.0439 - val_accuracy: 0.9284
Epoch 19/40
462/462 [==============================] - 381s 826ms/step - loss: 0.6404 - accuracy: 0.8857 - val_loss: 0.2181 - val_accuracy: 0.9247
Epoch 20/40
462/462 [==============================] - 382s 827ms/step - loss: 0.6016 - accuracy: 0.8938 - val_loss: 0.0015 - val_accuracy: 0.9247
Epoch 21/40
462/462 [==============================] - 381s 824ms/step - loss: 0.6419 - accuracy: 0.8917 - val_loss: 0.4428 - val_accuracy: 0.9284
Epoch 22/40
462/462 [==============================] - 370s 802ms/step - loss: 0.5791 - accuracy: 0.8995 - val_loss: 0.4855 - val_accuracy: 0.9377
Epoch 23/40
462/462 [==============================] - 370s 801ms/step - loss: 0.5506 - accuracy: 0.9033 - val_loss: 0.0011 - val_accuracy: 0.9367
Epoch 24/40
462/462 [==============================] - 374s 809ms/step - loss: 0.5470 - accuracy: 0.9063 - val_loss: 0.0406 - val_accuracy: 0.9414
Epoch 25/40
462/462 [==============================] - 373s 808ms/step - loss: 0.5218 - accuracy: 0.9119 - val_loss: 3.8196e-04 - val_accuracy: 0.9367
Epoch 26/40
462/462 [==============================] - 372s 804ms/step - loss: 0.5487 - accuracy: 0.9087 - val_loss: 0.0682 - val_accuracy: 0.9488
Epoch 27/40
462/462 [==============================] - 372s 805ms/step - loss: 0.5054 - accuracy: 0.9155 - val_loss: 0.4439 - val_accuracy: 0.9386
Epoch 28/40
462/462 [==============================] - 372s 805ms/step - loss: 0.5257 - accuracy: 0.9180 - val_loss: 0.1204 - val_accuracy: 0.9442
Epoch 29/40
462/462 [==============================] - 371s 803ms/step - loss: 0.4760 - accuracy: 0.9224 - val_loss: 0.0936 - val_accuracy: 0.9395
Epoch 30/40
462/462 [==============================] - 368s 798ms/step - loss: 0.4884 - accuracy: 0.9214 - val_loss: 0.0071 - val_accuracy: 0.9330
Epoch 31/40
462/462 [==============================] - 367s 795ms/step - loss: 0.4349 - accuracy: 0.9284 - val_loss: 2.4745e-04 - val_accuracy: 0.9451
Epoch 32/40
462/462 [==============================] - 372s 805ms/step - loss: 0.4470 - accuracy: 0.9296 - val_loss: 3.8900e-07 - val_accuracy: 0.9423
Epoch 33/40
462/462 [==============================] - 372s 805ms/step - loss: 0.4205 - accuracy: 0.9332 - val_loss: 0.2065 - val_accuracy: 0.9451
Epoch 34/40
462/462 [==============================] - 368s 795ms/step - loss: 0.4177 - accuracy: 0.9347 - val_loss: 0.0225 - val_accuracy: 0.9526
Epoch 35/40
462/462 [==============================] - 363s 786ms/step - loss: 0.3954 - accuracy: 0.9351 - val_loss: 0.9441 - val_accuracy: 0.9442
Epoch 36/40
462/462 [==============================] - 367s 793ms/step - loss: 0.4037 - accuracy: 0.9355 - val_loss: 0.1522 - val_accuracy: 0.9414
Epoch 37/40
462/462 [==============================] - 367s 795ms/step - loss: 0.3815 - accuracy: 0.9394 - val_loss: 1.3292e-04 - val_accuracy: 0.9488
Epoch 38/40
462/462 [==============================] - 371s 803ms/step - loss: 0.4154 - accuracy: 0.9369 - val_loss: 0.1124 - val_accuracy: 0.9423
Epoch 39/40
462/462 [==============================] - 374s 810ms/step - loss: 0.3617 - accuracy: 0.9415 - val_loss: 0.5891 - val_accuracy: 0.9507
Epoch 40/40
462/462 [==============================] - 372s 806ms/step - loss: 0.3659 - accuracy: 0.9434 - val_loss: 0.0092 - val_accuracy: 0.9507
The final (40th) epoch reaches a training accuracy of 94.34%, a validation loss of 0.0092, and a validation accuracy of 95.07%, which indicates a well-trained model. The training can be analyzed further through visualization with the Matplotlib library.
VISUALIZATION
Visualization is a technique for making sense of the data a model produces, enabling informed decisions about the changes needed to the parameters or hyperparameters that affect the machine learning model.
import matplotlib.pyplot as plt

# Loss
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()
plt.show()

# Accuracy
plt.plot(history.history['accuracy'], label='acc')
plt.plot(history.history['val_accuracy'], label='val_acc')
plt.legend()
plt.show()
SAVING MODEL
Saving the model is one of the vital steps in machine learning, as it allows the trained model to be loaded again later from the local machine. To load the saved model back into a workspace, the keras.models.load_model function can be used.
model.save("/content/drive/My Drive/yolov3/birds.h5")
EVALUATION
The evaluate function runs the trained model on the given input, giving a clear picture of how well the model performs. It computes the loss and the metrics specified in the compile function, and returns the computed values as output.
model.evaluate(test_generator,use_multiprocessing=True,workers=10)
34/34 [==============================] - 12s 358ms/step
[8.5635492723668e-06, 0.9655814170837402]
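Beyond the aggregate loss and accuracy, per-image predictions can be obtained with model.predict; a minimal sketch (reusing the generator's class_indices mapping shown earlier):

import numpy as np

# Take one batch of test images and map each highest-probability index
# back to its species name.
images, labels = next(test_generator)
probs = model.predict(images)
idx_to_name = {v: k for k, v in test_generator.class_indices.items()}
for p in probs[:5]:
    print(idx_to_name[np.argmax(p)])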
FINAL THOUGHTS
It’s no secret that machine learning is the future, and gaining an understanding of it may determine whether you succeed in it. In this article, we have discussed in detail the main methods used in training models, including transfer learning and data augmentation.
With the power of deep learning algorithms, we can create value on top of such huge datasets (31,316 training images, to be precise). Here, I tried to give readers a clear, worked example of how to train on a large bird-species dataset and classify the species using Keras in Python.
For more information about the basics of Keras, feel free to refer to the Keras documentation. And for learning more about such projects, view the valueml blog page.
And if you have any queries regarding the article, feel free to drop a line.