Hello everyone! I am Nitisha Sharad and this is my webpage on Batch Norm.

Batch Normalization in Neural Networks

1. Introduction

2. Internal Covariate Shift

Internal covariate shift refers to the phenomenon where the distribution of the activations of a neural network layer changes during training. As the parameters of the earlier layers are updated during the learning process, the subsequent layers receive inputs with varying distributions, which can slow down training and make it harder to optimize the network effectively. Batch Normalization, by normalizing the activations within each mini-batch during training, mitigates the effects of internal covariate shift, leading to faster and more stable convergence.
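As a rough illustration (not taken from the original paper), the short NumPy sketch below shows the effect: when an earlier layer's weights change, the statistics of the activations fed to the next layer drift, while per-batch normalization keeps them near zero mean and unit variance. The weight matrices and the 0.5 perturbation are arbitrary values chosen only for this demonstration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))          # a mini-batch of 64 examples, 10 features

W_before = rng.normal(size=(10, 5))    # earlier layer's weights before an update
W_after = W_before + 0.5               # the same weights after a (large) update

for name, W in [("before update", W_before), ("after update", W_after)]:
    h = X @ W                          # activations the next layer receives
    h_norm = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + 1e-5)
    print(f"{name}: raw mean={h.mean():+.2f}, raw std={h.std():.2f}, "
          f"normalized mean={h_norm.mean():+.2f}, normalized std={h_norm.std():.2f}")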

3. History

Reference: Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.

4. Codes

Input

import numpy as np

def batch_normalize(X, epsilon=1e-5):
    # Calculate the mean and variance of each feature over the batch
    batch_mean = np.mean(X, axis=0)
    batch_var = np.var(X, axis=0)

    # Normalize the input using the mean and variance
    X_normalized = (X - batch_mean) / np.sqrt(batch_var + epsilon)
    return X_normalized

# Sample input data (5 data points with 3 features each)
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12],
                 [13, 14, 15]])

# Perform batch normalization on the data
normalized_data = batch_normalize(data)

print("Original data:\n", data)
print("\nNormalized data:\n", normalized_data)

Output
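Assuming the code above is run as written, the program first prints the original data array. For the normalized data, each column's values lie -6, -3, 0, 3, and 6 away from that column's mean, with a variance of 18 (standard deviation of roughly 4.243), so every column of the normalized output contains approximately -1.414, -0.707, 0, 0.707, and 1.414 (exact printed precision may vary).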

Explanation

The batch_normalize() function first calculates the mean and variance of each feature across the batch. It then normalizes the data by subtracting the mean and dividing by the square root of the variance plus a small epsilon value, and returns the normalized result.

The epsilon parameter is added to the variance to avoid division by zero. This matters because the variance of a feature can be zero, especially in a small batch where all values of that feature are identical; the small epsilon ensures the denominator of the normalization is never zero.

Running the code prints the original data and the normalized data. Each feature of the normalized data is centered around zero with a standard deviation of approximately 1, which helps stabilize the training of a deep neural network that uses the data array as input.
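The batch_normalize() function above only standardizes the data. In the Ioffe and Szegedy formulation, batch normalization also applies a learnable per-feature scale (gamma) and shift (beta), so the network can recover the original representation if that is what works best. Below is a minimal sketch of that extension; the function name batch_normalize_scaled and the fixed gamma/beta values are illustrative assumptions, not part of the code above.

import numpy as np

def batch_normalize_scaled(X, gamma, beta, epsilon=1e-5):
    # Standardize each feature over the batch (same as batch_normalize above)
    batch_mean = np.mean(X, axis=0)
    batch_var = np.var(X, axis=0)
    X_hat = (X - batch_mean) / np.sqrt(batch_var + epsilon)

    # Scale and shift with per-feature parameters gamma and beta.
    # In a real network these would be learned by gradient descent;
    # here they are fixed values purely for illustration.
    return gamma * X_hat + beta

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12],
                 [13, 14, 15]])

gamma = np.ones(3)    # initial scale: identity
beta = np.zeros(3)    # initial shift: none

print(batch_normalize_scaled(data, gamma, beta))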

5. Class Presentation

6. Presentation Explained

7. Tutorial

8. GitHub

This is the GitHub repository for the code.