Batch Normalization in Neural Networks

1. Introduction

2. Internal Covariate Shift

Internal covariate shift refers to the phenomenon where the distribution of the activations of a neural network layer changes during training. As the parameters of the earlier layers are updated during the learning process, the subsequent layers receive inputs with varying distributions, which can slow down training and make it harder to optimize the network effectively. Batch Normalization, by normalizing the activations within each mini-batch during training, mitigates the effects of internal covariate shift, leading to faster and more stable convergence.
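The effect described above can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration: the layer sizes, the random data, the names `W_before`, `W_after`, `b_after`, and `normalize`, and the use of a simple rescaling as a stand-in for real training updates. It shows only that the statistics a following layer receives change when earlier parameters change, and that per-batch normalization brings them back to a fixed scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 256 samples with 4 features feed a linear layer of 3 units.
X = rng.normal(size=(256, 4))
W_before = rng.normal(size=(4, 3))

# Stand-in for training updates to the earlier layer: rescaled weights
# and a shifted bias. A real optimizer step would change them less abruptly.
W_after = 3.0 * W_before
b_after = 2.0

def normalize(A, epsilon=1e-5):
    """Normalize each column of A using the statistics of the batch."""
    return (A - A.mean(axis=0)) / np.sqrt(A.var(axis=0) + epsilon)

out_before = X @ W_before            # what the next layer saw before the update
out_after = X @ W_after + b_after    # what it sees after the update

print("std before update:", out_before.std(axis=0))
print("std after update: ", out_after.std(axis=0))
print("std after update, normalized:", normalize(out_after).std(axis=0))
```

In this toy setting the raw outputs become three times as spread out after the update, while the normalized outputs keep a per-unit mean near zero and a standard deviation near 1 in both cases.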

3. History

4. Code

Input

import numpy as np

def batch_normalize(X, epsilon=1e-5):
    # Calculate the per-feature mean and variance of the batch
    batch_mean = np.mean(X, axis=0)
    batch_var = np.var(X, axis=0)

    # Normalize the input using the mean and variance
    X_normalized = (X - batch_mean) / np.sqrt(batch_var + epsilon)
    return X_normalized

# Sample input data (5 data points with 3 features each)
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12],
                 [13, 14, 15]])

# Perform batch normalization on the data
normalized_data = batch_normalize(data)

print("Original data:\n", data)
print("\nNormalized data:\n", normalized_data)

Output

Explanation

The batch_normalize() function first calculates the mean and variance of each feature (column) across the batch. It then normalizes the data by subtracting the mean and dividing by the square root of the variance plus epsilon, which is approximately the standard deviation.

The epsilon parameter is a small constant added to the variance, inside the square root, so that the denominator is never zero. This matters because the variance of a feature is exactly zero whenever every sample in the batch has the same value for that feature, which is more likely when the batch is small.

Running the code prints the original data and the normalized data. Each feature of the normalized data has a mean of zero and a standard deviation of approximately 1 (very slightly below 1 because of epsilon). This can help stabilize the training of a deep neural network that takes the data array as input.
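The claims in this explanation can be checked with a short sketch. The function body repeats the computation of batch_normalize() from this section so that the block runs on its own. The `batch` array is a hypothetical example chosen so that its third feature is constant, and the `np.errstate` context is used here only to silence the warning NumPy would otherwise emit for the 0 / 0 case.

```python
import numpy as np

def batch_normalize(X, epsilon=1e-5):
    """Same computation as the batch_normalize() function in this section."""
    batch_mean = np.mean(X, axis=0)
    batch_var = np.var(X, axis=0)
    return (X - batch_mean) / np.sqrt(batch_var + epsilon)

# Hypothetical batch: the third feature is constant, so its variance is zero.
batch = np.array([[1.0, 10.0, 5.0],
                  [2.0, 20.0, 5.0],
                  [3.0, 30.0, 5.0],
                  [4.0, 40.0, 5.0]])

normalized = batch_normalize(batch)
feature_means = normalized.mean(axis=0)
feature_stds = normalized.std(axis=0)

# Without epsilon the constant feature would produce 0 / 0 (NaN).
with np.errstate(divide="ignore", invalid="ignore"):
    no_epsilon = batch_normalize(batch, epsilon=0.0)

print("Per-feature mean:", feature_means)
print("Per-feature std: ", feature_stds)
print("Constant feature without epsilon:", no_epsilon[:, 2])
```

The first two features end up with a mean of zero and a standard deviation close to 1. The constant feature is mapped to all zeros when epsilon is present and to NaN when it is set to zero, which is the failure the epsilon term guards against.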

5. Class Presentation