By Manpreet Singh

Image Classification using TensorFlow


Importing the required libraries for the model and for data preprocessing:


#Importing the required Libraries
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import PIL
import os
import pathlib
#Importing the Keras model class and layers from TensorFlow
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers


Fetching the flower dataset and preprocessing it:

# Download the flower_photos dataset and extract it
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

# Loader parameters (base model values; the modified model uses batch_size=64 and 150x150 images)
batch_size = 32
img_height = 180
img_width = 180

# 80/20 train/validation split; this is the training subset
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
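
The matching validation subset and the class list can be loaded the same way; a minimal sketch (the names val_ds, class_names and num_classes are assumed here and reused in later snippets):

# Validation subset of the same 80/20 split, mirroring the training loader
val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

# Class names come from the sub-directory names; flower_photos has 5 classes
class_names = train_ds.class_names
num_classes = len(class_names)
print(class_names)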




Defining the model:


We used the Sequential model from the TensorFlow Keras library to build our network. We stacked multiple convolutional layers with different parameters, each producing an output tensor that feeds the next layer. Conv2D is a 2D convolutional layer that applies a set of filters to its input to produce an output tensor; it is one of the most common layers used in image classification.
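
As a quick illustration (not part of the original model code), a single Conv2D layer with 16 filters applied to a dummy 180x180 RGB image produces one output channel per filter:

# A single Conv2D layer: 16 filters, 3x3 kernel, 'same' padding keeps the spatial size
conv = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')

# One dummy RGB image of size 180x180 (batch of 1)
dummy = tf.random.uniform((1, 180, 180, 3))

print(conv(dummy).shape)   # (1, 180, 180, 16): one output channel per filter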

After normalizing the input with the Rescaling layer, we downsample the output of each convolutional layer using the MaxPooling2D layer, to which we can also add a stride. Max pooling works as follows.


MaxPooling2D slides a 2x2 window over the input matrix and outputs the maximum value within each window, halving the spatial dimensions.
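
A small worked example makes this concrete (illustrative only): 2x2 max pooling with stride 2 applied to a 4x4 matrix keeps the largest value in each 2x2 block.

# 4x4 input reshaped to (batch, height, width, channels) as MaxPooling2D expects
x = tf.constant([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 10., 13., 14.],
                 [11., 12., 15., 16.]])
x = tf.reshape(x, (1, 4, 4, 1))

pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
print(tf.reshape(pool(x), (2, 2)))   # [[ 4.  8.] [12. 16.]]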

After the convolution and pooling stages, we flatten the feature maps using Flatten(). To help prevent overfitting, we use a Dropout layer and set its rate to 0.2. After the dropout, we add dense (fully connected) layers as hidden layers to train the model better; dense layers are deeply connected and use the output of the previous layer as their input, which helps the network learn more efficiently. In the end, we use softmax activation on the output layer and give it as many units as there are classes in our dataset.




model = Sequential([
  # Normalize pixel values from [0, 255] to [0, 1]
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  # Three convolution + max-pooling blocks with 16, 32 and 64 filters
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  # Flatten the feature maps and classify with a dense head
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
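
The compile and training steps are not shown above; a minimal sketch, assuming the Adam optimizer, a sparse categorical cross-entropy loss computed from logits (the final Dense layer has no activation), and the val_ds loader assumed earlier:

# Compile with Adam and a from-logits loss, since the last Dense layer has no softmax
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train: 10 epochs for the base model, 20 for the modified one
history = model.fit(train_ds, validation_data=val_ds, epochs=10)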

Comparing the results of the Base Model and the Modified Model:


Differences between the Base Model and the Modified Model:

  1. The number of layers used: we added extra layers in our modified model to train it further, whereas the base model has only three Conv2D layers and two dense layers.

  2. The number of epochs: we trained the base model for 10 epochs, whereas we trained the modified model for 20 epochs.

Contributions:

The base model looked like this:


model = Sequential([
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])

The base model uses filter sizes of 16, 32 and 64, whereas in our modified model we changed the filters to 32, 64 and 96. Another difference is in the dense layer: we increased the number of units from 128 to 512 and added a softmax activation to the final output layer. We also set the pool size of the MaxPooling2D layers explicitly and added a stride to them. Following is our modified version of the base model, which increased the accuracy from 60% to 64%. Also, while the base model used batch_size=32, we increased it to 64 and reduced the image dimensions from 180x180 to 150x150.


Modified Base Model:

model = Sequential([
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(32, (5, 5), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Conv2D(96, (3, 3), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Flatten(),
  layers.Dense(512, activation='relu'),
  layers.Dense(num_classes, activation='softmax')
])




The next modification to the base model adds data augmentation, which effectively enlarges the training data, and a Dropout layer, which helps avoid overfitting. The referenced model follows after the augmentation sketch below.
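
The data_augmentation block referenced in the model below is not defined here; a minimal sketch of what it could look like, assuming the standard random flip, rotation and zoom layers:

# Assumed data augmentation pipeline: random horizontal flips, small rotations and zooms
data_augmentation = Sequential([
  layers.RandomFlip("horizontal", input_shape=(img_height, img_width, 3)),
  layers.RandomRotation(0.1),
  layers.RandomZoom(0.1),
])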


model = Sequential([
  data_augmentation,
  layers.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])

This model is similar to the previous base model but with augmented data and a Dropout layer. In this version, I applied dropout after the convolutional blocks and ran the training for 15 epochs.

Accuracy: 73%

BASE MODEL + Dropout



Modified Model

After adding a layer, we saw the accuracy increase from 73.4% to 74.8%, a gain of about 1.4 percentage points.


For the final custom model, I added two new convolutional layers and ran the training for 20 epochs to train the model further. The final accuracy achieved was 75.34%.
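
The exact final architecture is not listed; the sketch below is only one plausible reading of this description and of the Challenges section further down (four convolutional blocks with 32, 64, 96 and 96 filters, plus augmentation and dropout), with kernel sizes and layer order assumed rather than confirmed:

# Assumed final architecture: four conv blocks (32, 64, 96, 96 filters) with
# data augmentation and dropout; details not confirmed in the original post
final_model = Sequential([
  data_augmentation,
  layers.Rescaling(1./255),
  layers.Conv2D(32, (5, 5), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Conv2D(96, (3, 3), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Conv2D(96, (3, 3), padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(512, activation='relu'),
  layers.Dense(num_classes, activation='softmax')
])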

My contributions involved changing the hyperparameters, the size of the input images, and the number of epochs used to train the model.





Challenges:

The major challenge was understanding the CNN algorithm and its implementation in Python. Another challenge was understanding the dataset and its directory layout. To overcome these challenges, I went through YouTube tutorials to understand the basic implementation of the algorithm. The next step was to understand the various TensorFlow functions used to train on the dataset; I learnt about implementing different layers and normalising the input using predefined functions. To learn the basic implementation of the CNN algorithm, I took reference from a base model in the official TensorFlow guide. By understanding the basic concepts of that base model and how layers are added and implemented, I got the gist of how CNNs work, and then added multiple layers to the predefined base model, which eventually helped increase the accuracy.

Another major challenge was finding the optimal number of layers by trial and error: I tried different layer sizes and tuned them to get the maximum accuracy. I also learnt about the filters and other parameters of Conv2D and max pooling. Through trial and error, I tested different layer configurations and finally settled on four convolutional layers with different filters and kernels.

Going up to five convolutional layers (one with 32 filters, two with 64, and two with 96) actually decreased the final accuracy, so we reduced the model to four layers by removing one of the 64-filter layers. In this way, we raised the accuracy from 73% to 75.34%.


CNN Algorithm:

A CNN, or Convolutional Neural Network, is an algorithm that takes an input image, assigns learnable weights and biases to features of the image, and uses them to differentiate one image from another. A CNN consists of an input layer (the image itself), convolutional and pooling layers, hidden dense layers, and an output layer.

Working of the CNN algorithm:



A CNN starts with the input layer, followed by a convolutional layer with ReLU activation. A pooling layer then reduces the spatial size of the feature maps by keeping the maximum value in each window. This convolution-plus-pooling pattern is repeated with different filter counts; after this feature-learning stage, the output is flattened, passed through dense layers, and fed to a softmax output layer, and the model is trained on that architecture.
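
To make this flow concrete, the sketch below (illustrative only, with num_classes assumed to be 5 for the flower dataset) builds the modified architecture for a 150x150 RGB input; model.summary() then shows how the spatial size shrinks and the channel count grows through the blocks:

# Illustrative only: inspect how the spatial size shrinks and the channel count
# grows through the conv/pool blocks (assumes num_classes = 5 for flower_photos)
num_classes = 5
demo = Sequential([
  layers.Rescaling(1./255, input_shape=(150, 150, 3)),
  layers.Conv2D(32, (5, 5), padding='same', activation='relu'),   # 150x150x32
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),          # 75x75x32
  layers.Conv2D(64, (3, 3), padding='same', activation='relu'),   # 75x75x64
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),          # 37x37x64
  layers.Conv2D(96, (3, 3), padding='same', activation='relu'),   # 37x37x96
  layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),          # 18x18x96
  layers.Flatten(),
  layers.Dense(512, activation='relu'),
  layers.Dense(num_classes, activation='softmax')
])
demo.summary()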




References:

TensorFlow "Image classification" tutorial: https://www.tensorflow.org/tutorials/images/classification