Long Short Term Memory

Navya Aggarwal

Long Short Term Memory (LSTM) is a type of neural network that is particularly well-suited for handling sequences of data. It has been used to great effect in a variety of applications, from speech recognition to natural language processing.
Exploring Long Short Term Memory (LSTM)
In recent years, Long Short Term Memory (LSTM) has become one of the most popular architectures in deep learning.
What is LSTM?
Long Short Term Memory (LSTM) is a type of recurrent neural network (RNN) that is designed to handle the vanishing gradient problem. This problem occurs when training an RNN on long sequences of data, and can cause the gradients to become very small, making it difficult to learn long-term dependencies.

LSTM overcomes this problem by introducing a series of gating mechanisms that allow the network to selectively remember and forget information. These gates include the forget gate, input gate, and output gate, and work together to control the flow of information through the network.

One of the key advantages of LSTM is that it can process and predict sequences of data while retaining information over many timesteps. This makes it a popular choice for a variety of applications, including speech recognition, natural language processing, and even the life sciences.
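To make "handling sequences of data" concrete, here is a minimal sketch, assuming TensorFlow/Keras is available; the batch size, sequence length, feature count, and layer width are arbitrary choices for illustration, not values taken from this article.

    import numpy as np
    import tensorflow as tf

    # Toy batch: 4 sequences, each 10 timesteps long with 3 features per step.
    x = np.random.rand(4, 10, 3).astype("float32")

    # An LSTM layer with 16 hidden units; return_sequences=True keeps the
    # hidden state at every timestep rather than only the last one.
    lstm = tf.keras.layers.LSTM(16, return_sequences=True)
    outputs = lstm(x)

    print(outputs.shape)  # (4, 10, 16): one 16-dimensional hidden state per timestep

Dropping return_sequences (it defaults to False) would instead return only the final hidden state, of shape (4, 16), which is the usual choice when a whole sequence is summarized into a single prediction.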
Recurrent Neural Networks and LSTM
What are RNNs?
Recurrent Neural Networks (RNNs) are a type of neural network that can process sequential data. They carry a hidden state from one timestep to the next, so each step's output depends on both the current input and what the network has already seen, giving it context and memory.
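In the simplest ("vanilla") RNN this recurrence can be written as h_t = tanh(W_h · h_{t-1} + W_x · x_t + b), where h_t is the hidden state at step t, x_t is the current input, and W_h, W_x, and b are learned parameters (the symbol names here are the conventional ones, not taken from this article).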
Vanishing Gradient Problem
One challenge of RNNs is the vanishing gradient problem: as gradients are propagated back through many timesteps they shrink toward zero, so the earliest inputs contribute almost nothing to learning. This limits the network's ability to learn dependencies over long sequences.
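A rough illustration of the effect: if each backward step through time scaled the gradient by a factor of about 0.9, then after 100 timesteps the signal would be multiplied by 0.9^100 ≈ 2.7 × 10^-5, far too small to drive meaningful weight updates for the earliest inputs (the factor 0.9 is just an assumed value for this example).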
How LSTM Solves It
Long Short-Term Memory (LSTM) is an RNN architecture that introduces a memory cell and three gates (forget, input, and output) to regulate information flow. This allows it to learn dependencies over long sequences and largely sidestep the vanishing gradient problem.
LSTM Architecture and Components
Network architecture
The LSTM architecture consists of a memory cell, an input gate, an output gate, a forget gate, and the activation functions (sigmoid and tanh) that drive them.
Key components
The key components of LSTM are the cell state, the hidden state, and the gates, which together help it preserve information across long sequences.
Neural network
LSTM is a type of neural network that uses gates to selectively pass information, which lets it process sequential data more accurately and efficiently than a plain RNN.
Working Principle of LSTM
Gate functions
Each gate is computed from the current input and the previous hidden state through a matrix multiplication followed by an activation function (a sigmoid for the gates themselves, a tanh for the candidate values).
Forward pass
The forward pass involves a series of calculations that produce a new cell state and hidden state (the output) at each timestep.
Backward pass
Gradients are calculated using backpropagation through time (BPTT) to update the weights and improve performance.
Training process
The LSTM model is trained on a large dataset and then tested on unseen data to validate its performance.
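As a sketch of what training and validating an LSTM can look like in practice, assuming TensorFlow/Keras, here is a minimal example; the data is random and the shapes, layer width, and hyperparameters are placeholder values, not recommendations.

    import numpy as np
    import tensorflow as tf

    # Placeholder data: 1,000 sequences of 20 timesteps with 8 features each,
    # and a single regression target per sequence.
    X = np.random.rand(1000, 20, 8).astype("float32")
    y = np.random.rand(1000, 1).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(20, 8)),  # 32 hidden units
        tf.keras.layers.Dense(1),                       # regression output
    ])
    model.compile(optimizer="adam", loss="mse")

    # Keras performs the forward pass, backpropagation through time, and the
    # weight updates; 20% of the data is held out as unseen validation data.
    model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)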
The Working Principle of LSTM with Diagrams
diagram of LSTM
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that is capable of processing and predicting sequences of data. LSTM models have a unique architecture that includes a series of gates and cells that enable them to selectively remember and forget information.
Working of forget gate:
diagram of LSTM
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at h_{t-1} and x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t-1}. A 1 represents “completely keep this” while a 0 represents “completely get rid of this.”
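In the standard formulation this step is written as f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where σ is the sigmoid function and W_f and b_f are the forget gate's weight matrix and bias.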
Working of input gate:
diagram of LSTM
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, C~_t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.
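In equation form: i_t = σ(W_i · [h_{t-1}, x_t] + b_i) and C~_t = tanh(W_C · [h_{t-1}, x_t] + b_C), where W_i, b_i, W_C, and b_C are the weight matrices and biases of the input gate layer and the candidate (tanh) layer.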
Application of forgotten and remembered data:
diagram of LSTM
It’s now time to update the old cell state, C_{t-1}, into the new cell state C_t. The previous steps already decided what to do; now we just need to actually do it.

We multiply the old state by f_t, forgetting the things we decided to forget earlier. Then we add i_t ∗ C~_t, the new candidate values scaled by how much we decided to update each state value.
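In equation form: C_t = f_t ∗ C_{t-1} + i_t ∗ C~_t.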
Working of output gate:
diagram of LSTM
Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
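In equation form: o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ∗ tanh(C_t), where W_o and b_o are the output gate's weight matrix and bias.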
Summary of working:
  • The forget gate allows LSTM to forget irrelevant information from previous time steps, preventing the cell state from being cluttered with outdated data.
  • The input gate enables the LSTM to update the cell state with relevant new information from the current input and previous hidden state.
  • The cell state (C_t) acts as a memory that retains important information over time, allowing the LSTM to capture long-term dependencies in the data.
  • The output gate controls which parts of the cell state are used to calculate the hidden state (h_t), ensuring that the relevant information is propagated to subsequent layers or time steps.
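Putting the four points above together, here is a minimal NumPy sketch of a single LSTM timestep; the weight and bias names follow the standard notation used in the equations above, and in a real network they would be learned rather than supplied by hand.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
        # One LSTM timestep; each W_* acts on the concatenation [h_{t-1}, x_t].
        z = np.concatenate([h_prev, x_t])
        f_t = sigmoid(W_f @ z + b_f)         # forget gate: what to erase from C_{t-1}
        i_t = sigmoid(W_i @ z + b_i)         # input gate: which values to update
        C_tilde = np.tanh(W_C @ z + b_C)     # candidate values
        C_t = f_t * C_prev + i_t * C_tilde   # new cell state
        o_t = sigmoid(W_o @ z + b_o)         # output gate: what to expose
        h_t = o_t * np.tanh(C_t)             # new hidden state
        return h_t, C_t

Run over a whole sequence, this function is applied once per timestep, with the returned h_t and C_t fed back in as h_prev and C_prev for the next step.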
Types of LSTM:
Bidirectional LSTM (BiLSTM): Uses two LSTMs to process the input sequence in both forward and backward directions. Captures contextual information from both past and future time steps.
Gated Recurrent Unit (GRU): A simplified version of LSTM with two gates (reset and update). Requires fewer parameters and is computationally less expensive while offering similar performance.
Peephole LSTM: Enhances the LSTM cell by allowing the gates to have direct access to the cell state. Introduced to allow gates to consider the cell state directly when making decisions.
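The first two variants are available as standard layers in common deep learning frameworks. The sketch below, assuming TensorFlow/Keras and arbitrary layer sizes, shows how they are typically declared; peephole connections are not part of the stock Keras LSTM layer, so that variant usually requires a custom cell or an add-on package.

    import tensorflow as tf

    # Bidirectional LSTM: wraps an LSTM so the sequence is read in both directions.
    bilstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))

    # GRU: the simplified gated unit with only reset and update gates.
    gru = tf.keras.layers.GRU(64)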
Applications of LSTM in the Life Sciences:
Drug discovery
LSTM has been used to predict the bioactivity of small molecules, identify potential drug targets, and design novel molecules.
Genomics
LSTM models have been used to predict splicing events, gene expression levels, and gene functions from sequence data.
Proteomics
LSTM has been used to predict the stability and structure of proteins, identify binding sites, and classify protein families.
Examples of LSTM in Genomics and Proteomics
RNA sequence prediction
LSTM has been used to predict splice site mutations, and alternative splicing events, improving our understanding of the genetic basis of human diseases.
Protein binding prediction
LSTM has been used to predict protein-ligand binding affinity, which is critical for drug design and optimization.
Protein structure prediction
LSTM has been used to predict the secondary, tertiary, and quaternary structures of proteins, which is useful for modeling protein functions.
Challenges and Limitations of LSTM in the Life Sciences
1) Data scarcity
Data scarcity is a significant challenge, as LSTM requires large amounts of data to perform optimally.
2) Noisy and biased data
Noisy and biased data introduce challenges that need to be overcome using advanced filtering and normalization techniques.
3) Interpretability and explainability
Interpretability and explainability are challenges that need to be addressed due to the complex, black-box nature of LSTM models.
4) Hardware and software requirements
Hardware and software requirements, such as memory, computational power, and compatibility, can create bottlenecks and limit the applicability of LSTM.
Conclusion and Future Directions
Long Short Term Memory models are powerful tools for analyzing complex data in the life sciences, with applications ranging from drug discovery to genomics. While there are still challenges and limitations, there is great potential for future breakthroughs in the field.
The future of LSTM
The future of LSTM in the life sciences is bright: research is ongoing to develop new architectures, build better data sources, and refine modeling techniques.