Keras flatten operation in CNN models in Machine Learning

In this section, we look at the reasons for applying the Keras flattening operation in CNN models with Python. Since CNNs work with images, we will also walk through the steps that lead up to the flattening operation.

Steps leading up to the flattening operation

1. Convolution

We start with an input image and apply several different feature detectors (also called filters) to create feature maps. This forms our convolutional layer. On top of that convolutional layer, we apply the rectified linear unit (ReLU) to increase non-linearity.
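To make the step concrete, here is a minimal NumPy sketch of one feature detector applied to a tiny image, followed by ReLU (the image and kernel values are purely illustrative; in Keras this whole step is a `Conv2D` layer with `activation='relu'`):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid cross-correlation of a 2-D image with one feature detector."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: keeps positive activations, zeroes the rest."""
    return np.maximum(x, 0.0)

# A 5x5 image with a vertical edge, and a 3x3 vertical-edge detector.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(convolve2d(image, kernel))
print(feature_map.shape)  # (3, 3) -- high numbers mark where the edge is
```

The high values in the feature map appear exactly where the filter's pattern matches the image, which is the "spatial structure" the later steps must preserve.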

2. Pooling

Then we apply a pooling layer on top of the convolutional layer: from every single feature map, we create a pooled feature map. The main purpose of pooling is spatial invariance: if a feature is tilted, twisted, or otherwise a bit different from the ideal scenario, we can still extract it. Pooling also significantly reduces the size of the feature maps, which helps the model avoid overfitting the data, while still preserving the main features.
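A small NumPy sketch of 2x2 max pooling shows how the strongest activations survive while the map shrinks (the feature-map values are illustrative; in Keras this is a `MaxPooling2D` layer):

```python
import numpy as np

def max_pool(fm, size=2, stride=2):
    """Max pooling: keep the strongest activation in each window."""
    h, w = fm.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fm[i * stride:i * stride + size,
                        j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# A 4x4 feature map: the high numbers mark where a feature was detected.
fm = np.array([[1, 0, 2, 3],
               [4, 6, 6, 8],
               [3, 1, 1, 0],
               [1, 2, 2, 4]], dtype=float)

pooled = max_pool(fm)
print(pooled)  # [[6. 8.]
               #  [3. 4.]]
```

Note that each high number is kept even if the feature shifts by a pixel inside its window, which is exactly the spatial invariance described above.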

3. Flattening:

Now we have the pooled feature maps. Flattening puts them into one long column, sequentially one after the other, producing one large vector of inputs for an artificial neural network.
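In NumPy terms, flattening is just reading each pooled map row by row and chaining the results into one vector (the two 2x2 maps below are hypothetical; in Keras this is a `Flatten` layer):

```python
import numpy as np

# Two hypothetical 2x2 pooled feature maps from the pooling step.
pooled_maps = [np.array([[6., 8.],
                         [3., 4.]]),
               np.array([[5., 1.],
                         [9., 2.]])]

# Flattening: read each map row by row and chain them into one vector.
input_vector = np.concatenate([m.ravel() for m in pooled_maps])
print(input_vector)  # [6. 8. 3. 4. 5. 1. 9. 2.]
```

Each node of this vector is now an input to the fully connected layers of the network.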


Before proceeding, let us try to answer two crucial questions:

  • Does flattening result in losing the spatial structure of the original image?
The feature maps contain high numbers wherever a specific feature of the input image was detected, so these high numbers encode the spatial structure of the image. The pooling step keeps these high numbers, and the flattening step simply puts all the numbers in the cells of the pooled feature maps into one single vector. So the high numbers, and the spatial information they represent, are preserved in that vector.
  • Why didn’t we directly flatten them without applying the previous steps?
The reason is that if we flattened the input image pixels directly into one huge one-dimensional vector, each node of that vector would describe only a single pixel and carry no information about how that pixel is spatially connected to the pixels around it. After convolution and pooling, however, each feature map corresponds to one specific feature of the image, so each high number in the flattened vector represents information about a specific feature, a specific detail, of the image.
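Putting the three steps together, a minimal Keras sketch of the full pipeline might look like the following (the input shape, filter counts, and layer sizes are illustrative choices, not prescribed by the text):

```python
from tensorflow.keras import Input, layers, models

model = models.Sequential([
    Input(shape=(28, 28, 1)),            # e.g. a 28x28 grayscale image
    # Convolution: 32 feature detectors plus ReLU for non-linearity.
    layers.Conv2D(32, (3, 3), activation="relu"),
    # Pooling: 2x2 max pooling halves each spatial dimension (26 -> 13).
    layers.MaxPooling2D((2, 2)),
    # Flattening: 32 pooled 13x13 maps -> one vector of 32*13*13 = 5408 inputs.
    layers.Flatten(),
    # Fully connected ANN on top of the flattened input vector.
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```

`Flatten` is the bridge between the convolutional part of the network and the fully connected part: everything before it works on 2-D feature maps, everything after it on one long vector.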

