Introduction :

A Convolutional Neural Network (CNN) is a deep learning architecture that extends the Multilayer Perceptron (MLP) and is designed to process data in the form of a matrix (images, sound, …).

Convolutional Neural Networks are used in many fields, but here we focus only on their application to images.

The question now is: what is an image?

An image is just a matrix of pixels.
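
To make this concrete, here is a minimal sketch using NumPy (an assumption, the article does not name a library at this point). The pixel values are made up for illustration; the point is only the shape of the arrays.

```python
import numpy as np

# A 4x4 grayscale image: one intensity value (0-255) per pixel.
gray = np.array(
    [[0, 50, 100, 150],
     [10, 60, 110, 160],
     [20, 70, 120, 170],
     [30, 80, 130, 180]],
    dtype=np.uint8,
)

# A 4x4 RGB image: three values (red, green, blue) per pixel.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # set the top-left pixel to pure red

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)
```

A grayscale image is a single matrix, while an RGB image stacks three such matrices, one per color channel.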

Coding Modes of an Image:


Convolutional Neural Network vs Multilayer Perceptron :

Imagine we have an image classification problem to solve and our only option is a Multilayer Perceptron (a plain neural network). The images are 240 pixels high and 240 pixels wide, and they use RGB.

We would need to build a neural network with 240 * 240 * 3 = 172,800 inputs. That is a very big neural network, and it would be very hard for us to train.
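
The arithmetic can be checked with a few lines of Python. The hidden layer size below is a hypothetical value chosen only to show how quickly the number of weights grows; it does not come from the article.

```python
height, width, channels = 240, 240, 3

# One input neuron per pixel value.
n_inputs = height * width * channels
print(n_inputs)  # 172800

# Hypothetical size of the first hidden layer (illustration only).
hidden_units = 1000

# Every hidden unit has one weight per input, plus one bias.
mlp_first_layer_params = n_inputs * hidden_units + hidden_units
print(mlp_first_layer_params)  # 172801000
```

Under that assumption, the first layer alone already holds more than 172 million trainable parameters.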

Can we find a solution that reduces the size of the images while preserving their characteristics?

This is exactly what a CNN can do.

In General :

CNN = Convolutional Layers + Activation Layers + Pooling Layers + Fully Connected Layers .


Convolutional Neural Network Layers :

Kernels or Filters in The Convolutional layer :

In a convolutional neural network, a kernel is nothing more than a filter used to extract features from images. The kernel is a small matrix that slides over the input data, computes the dot product with each subregion of the input it covers, and writes each result into an output matrix (the feature map). The kernel moves across the input data in steps given by the stride value.

There are many kernels, and each one is responsible for extracting a specific feature.
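
The sliding dot product described above can be sketched in plain NumPy. This is a simplified illustration, not how Keras or PyTorch implement it: it handles a single channel, no padding, and the function name `conv2d` and the kernel values are assumptions made for the example.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            # Element-wise product summed up = dot product of patch and kernel.
            out[i, j] = np.sum(patch * kernel)
    return out

# 6x6 input whose values increase by 1 from left to right.
image = np.arange(36, dtype=float).reshape(6, 6)

# Hypothetical 2x2 kernel that responds to horizontal intensity changes.
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])

feature_map = conv2d(image, kernel, stride=1)
print(feature_map.shape)  # (5, 5)
```

Because the input grows by exactly 1 per column, this kernel gives the same response (-2) at every position; a different kernel would pick up a different feature.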

Convolutional Layers :

The convolutional layer extracts the characteristics of the image by performing this operation on the input image:

The convolutional layer produces an output image whose size is given by this formula:

The convolutional layer needs two parameters to work:

  • Padding : the number of pixels added around the border of an image before it is processed by the kernel of a CNN.
  • Stride : the number of pixels by which the kernel shifts over the input matrix at each step.

Example 1 : Stride = 1 , Padding = 0 :

If we apply our formula (in the picture above), we get the same result.

output width = (input_width - kernel_width + 2 * padding) / stride_width + 1

output height = (input_height - kernel_height + 2 * padding) / stride_height + 1

input Image : 6*6
Kernel Size : 2*2

output width = (6 - 2 + 2 * 0) / 1 + 1 = 5
output height = (6 - 2 + 2 * 0) / 1 + 1 = 5

Example 2 : Stride = 2 , Padding = 0 :

input Image : 6*6
Kernel Size : 2*2

output width = (6 - 2 + 2 * 0) / 2 + 1 = 3
output height = (6 - 2 + 2 * 0) / 2 + 1 = 3

Example 3 : Stride = 2 , Padding = 1 :

input Image : 6*6
Kernel Size : 2*2

output width = (6 - 2 + 2 * 1) / 2 + 1 = 4
output height = (6 - 2 + 2 * 1) / 2 + 1 = 4
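
The three examples can be reproduced with a small helper. The function name `conv_output_size` is made up for this sketch, and the use of integer (floor) division is an assumption for cases where the stride does not divide evenly; in the three examples above the division is exact.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one dimension (width or height)."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# Example 1: stride = 1, padding = 0
print(conv_output_size(6, 2, stride=1, padding=0))  # 5

# Example 2: stride = 2, padding = 0
print(conv_output_size(6, 2, stride=2, padding=0))  # 3

# Example 3: stride = 2, padding = 1
print(conv_output_size(6, 2, stride=2, padding=1))  # 4
```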

All the examples above dealt with 2D convolution. Now let's look at the general case, convolution over a 3D input volume (width × height × depth):

Input image : W1 × H1 × D1 .
Number of filters : K (with size F × F).
Stride : S .
Padding : P .
Output :
W2 = (W1 - F + 2P) / S + 1 .
H2 = (H1 - F + 2P) / S + 1 .
D2 = K .
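
As a rough sketch, the general case applies the same rule as the 2D examples to width and height, and sets the output depth to the number of filters. The function names and the example numbers (a 240 × 240 × 3 input with 8 filters of size 3 × 3) are assumptions for illustration. The parameter count assumes each filter spans the full input depth and has one bias, which is the usual convention.

```python
def conv_layer_output_shape(w1, h1, d1, k, f, s, p):
    """Return (W2, H2, D2) for K filters of size F x F, stride S, padding P."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k  # one output channel per filter; d1 does not affect the output shape
    return w2, h2, d2

def conv_layer_param_count(d1, k, f):
    """Each filter holds F * F * D1 weights plus one bias."""
    return k * (f * f * d1 + 1)

print(conv_layer_output_shape(240, 240, 3, k=8, f=3, s=1, p=0))  # (238, 238, 8)
print(conv_layer_param_count(d1=3, k=8, f=3))                    # 224
```

Note how few parameters this layer needs compared with a fully connected layer on the same input.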

Activation Function in The Convolutional layer :

The activation function most commonly used in CNNs is ReLU (Rectified Linear Unit), and it is defined as follows:

ReLU(z) = max(0, z)
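
A minimal NumPy sketch of this definition (the input values are arbitrary):

```python
import numpy as np

def relu(z):
    """Element-wise max(0, z): negative values become 0, the rest pass through."""
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(z)
print(activated)  # negatives are replaced by 0; 1.5 and 3.0 are unchanged
```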

Pooling Layer :

The pooling layer reduces the size of the image. There are two types of pooling:

  • Max Pooling : keeps the largest value in each window.
  • AVG Pooling : keeps the average of the values in each window.

The output of the pooling layer can be calculated using this formula:

Max Pooling :
AVG Pooling :
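
Both kinds of pooling can be sketched with one small NumPy function. This is an illustration under assumptions: a single channel, non-overlapping 2 × 2 windows with stride 2, and made-up input values. The name `pool2d` is hypothetical.

```python
import numpy as np

def pool2d(image, size=2, stride=2, mode="max"):
    """Reduce each size x size window to its max or its mean."""
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            out[i, j] = reduce_fn(image[top:top + size, left:left + size])
    return out

image = np.array([[1, 3, 2, 4],
                  [5, 7, 6, 8],
                  [9, 2, 1, 0],
                  [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(image, mode="max")  # [[7, 8], [9, 6]]
avg_pooled = pool2d(image, mode="avg")  # [[4, 5], [4.5, 3]]
print(max_pooled)
print(avg_pooled)
```

The 4 × 4 input shrinks to 2 × 2 in both cases; only the way each window is summarized differs.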

Fully Connected Layer :

A fully connected layer can be seen as one layer of a simple neural network: every input is connected to every output unit.
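
In code, a fully connected layer is a matrix multiplication plus a bias, applied to the flattened feature maps. The sketch below uses NumPy with random weights; the sizes (2 × 2 feature maps with 3 channels, 4 output classes) are hypothetical and chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend output of the last pooling layer: 2x2 feature maps, 3 channels.
feature_maps = rng.random((2, 2, 3))

# Flatten to a 1D vector so it can feed a fully connected layer.
x = feature_maps.reshape(-1)  # 2 * 2 * 3 = 12 values

n_classes = 4  # hypothetical number of output units
W = rng.standard_normal((n_classes, x.size))  # one row of weights per output unit
b = np.zeros(n_classes)

scores = W @ x + b  # every input contributes to every output
print(scores.shape)  # (4,)
```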


Different Layers in Keras and PyTorch :

Keras :

Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library.

  • Convolution Layer :
tf.keras.layers.Conv2D(
    filters,
    kernel_size,
    strides=(1, 1),
    padding="valid",
    data_format=None,
    dilation_rate=(1, 1),
    groups=1,
    activation=None,
    use_bias=True,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)
  • Activation Layer :
tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0)
  • Pooling Layer :

    • Max-Pooling :
    tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2), strides=None, padding="valid", data_format=None, **kwargs
    )
    
    • Avg-Pooling :
    tf.keras.layers.AveragePooling2D(
    pool_size=(2, 2), strides=None, padding="valid", data_format=None, **kwargs
    )
    
  • Dropout Layer :
tf.keras.layers.Dropout(rate, noise_shape=None, seed=None, **kwargs)
  • Dense Layer or Fully Connected Layer :
tf.keras.layers.Dense(
    units,
    activation=None,
    use_bias=True,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)
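
The layers listed above can be combined into a small model. This is a sketch under assumptions, not a recommended architecture: the number of filters, the dropout rate, and the 10 output classes are arbitrary, and `Flatten` (not listed above) is added to turn the feature maps into a vector before the dense layer. The input shape reuses the 240 × 240 RGB example from earlier.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(240, 240, 3)),             # height, width, channels
    tf.keras.layers.Conv2D(filters=8, kernel_size=3,
                           activation="relu"),       # -> 238 x 238 x 8
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),  # -> 119 x 119 x 8
    tf.keras.layers.Flatten(),                       # -> 113288 values
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(units=10, activation="softmax"),
])

model.summary()
```

Here the ReLU activation is passed through the `activation` argument of `Conv2D` rather than added as a separate layer.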

PyTorch :

PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab. It is free and open-source software released under the Modified BSD license.

  • Convolution Layer :
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
  • Activation Layer :
torch.nn.ReLU(inplace=False)
  • Pooling Layer :

    • Max-Pooling :
    torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
    
    • Avg-Pooling :
    torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)
    
  • Dropout Layer :
torch.nn.Dropout(p=0.5, inplace=False)
  • Dense Layer or Fully Connected Layer :
torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
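
The same layers can be chained in PyTorch with `nn.Sequential`. As before, this is an illustrative sketch: the filter count, dropout probability, and 10 output classes are arbitrary, and `nn.Flatten` (not listed above) is added to reshape the feature maps before the linear layer. Unlike Keras, `nn.Linear` needs its input size stated explicitly, so it is computed by hand from the output size formula.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3),  # 240x240 -> 238x238
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                              # 238x238 -> 119x119
    nn.Flatten(),                                             # 8 * 119 * 119 values
    nn.Dropout(p=0.5),
    nn.Linear(in_features=8 * 119 * 119, out_features=10),
)

# PyTorch expects inputs shaped (batch, channels, height, width).
batch = torch.randn(1, 3, 240, 240)
logits = model(batch)
print(logits.shape)  # torch.Size([1, 10])
```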

References :