Explain the LSTM working principle along with equations

Answer:-

Working Principle of LSTM

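The Long Short-Term Memory (LSTM) network is a recurrent neural network (RNN) architecture designed to learn long-term dependencies in sequential data. It mitigates the vanishing gradient problem of standard RNNs by using specialized units called LSTM cells, each of which carries an internal cell state in addition to the usual hidden state.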

Each LSTM cell includes the following key components:

  1. Forget Gate: Determines which information to discard from the cell state.
  2. Input Gate: Controls which new information is added to the cell state.
  3. Output Gate: Decides what information is sent to the output.

Main Characteristics:

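  • Self-Loops with Conditional Weights: Each LSTM cell has an internal recurrence (self-loop) on its cell state. The weight of this loop is not fixed; it is set at every time step by the forget gate, which lets the network keep or discard information depending on context.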
  • Gated Mechanism: The flow of information is regulated by learnable gates.
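
To make the self-loop idea concrete, here is a minimal Python sketch. It assumes a constant forget-gate value and a closed input gate (nothing new is written), which is a simplification made only for illustration; the function name `retained_fraction` is hypothetical and not from the text above.

```python
def retained_fraction(forget_gate: float, steps: int) -> float:
    """Fraction of an initial cell-state value left after `steps` time steps,
    assuming a constant forget gate and no new input written to the cell."""
    state = 1.0
    for _ in range(steps):
        # Self-loop only: s(t) = f * s(t-1), with the input gate closed.
        state = forget_gate * state
    return state


# Compare how quickly the stored value fades for different gate values.
for f in (0.5, 0.9, 0.99):
    print(f"forget gate {f}: fraction kept after 50 steps = {retained_fraction(f, 50):.3g}")
```

A gate value close to 1 keeps the stored value nearly intact over many steps, while smaller values erase it quickly. In a real LSTM the gate value is recomputed at every step from the input and the previous hidden state.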

Equations

  1. Forget Gate:
    f_i^{(t)} = \sigma \left( b_i^f + \sum_j U_{i,j}^f x_j^{(t)} + \sum_j W_{i,j}^f h_j^{(t-1)} \right)
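    The forget gate controls the weight of the self-loop, determining how much of the previous cell state to retain (a value between 0 and 1). Here x^{(t)} is the current input vector, h^{(t-1)} is the previous hidden state, and b^f, U^f, W^f are the biases, input weights, and recurrent weights of the forget gate. The other gates use the same form with their own parameters.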
  2. Cell State Update:
    s_i^{(t)} = f_i^{(t)} s_i^{(t-1)} + g_i^{(t)} \sigma \left( b_i + \sum_j U_{i,j} x_j^{(t)} + \sum_j W_{i,j} h_j^{(t-1)} \right)
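    The internal state s_i^{(t)} is updated by scaling the previous state s_i^{(t-1)} with the forget gate and adding a candidate update scaled by the input gate g_i^{(t)} (defined in item 3). Here b, U, W are the biases, input weights, and recurrent weights of the cell itself. Note that many LSTM formulations use tanh rather than \sigma for the candidate term.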
  3. Input Gate:
    g_i^{(t)} = \sigma \left( b_i^g + \sum_j U_{i,j}^g x_j^{(t)} + \sum_j W_{i,j}^g h_j^{(t-1)} \right)
    The input gate determines the influence of new information on the cell state.
  4. Output Gate:
    q_i^{(t)} = \sigma \left( b_i^o + \sum_j U_{i,j}^o x_j^{(t)} + \sum_j W_{i,j}^o h_j^{(t-1)} \right)
    The output gate decides what part of the cell state contributes to the output.
  5. Hidden State Output:
    h_i^{(t)} = \tanh(s_i^{(t)}) \cdot q_i^{(t)}
    The final output of the LSTM cell is the cell state squashed by tanh and scaled by the output gate, so the gate can shut the output off without erasing the stored state.
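
The five equations above can be sketched as one forward step in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`init_params`, `lstm_step`), the dictionary key suffixes (`f`, `g`, `o`, `c`), and the small random initialization are all choices made here. The candidate term uses the sigmoid to match the cell state equation as written above; swapping in `np.tanh` gives the more common variant.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Small random parameters; the key names are illustrative, not standard."""
    rng = np.random.default_rng(seed)
    params = {}
    # "f" = forget gate, "g" = input gate, "o" = output gate, "c" = cell candidate
    for name in ("f", "g", "o", "c"):
        params["b_" + name] = np.zeros(n_hidden)
        params["U_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        params["W_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    return params


def lstm_step(x, h_prev, s_prev, p):
    """One time step: returns the new hidden state h and cell state s."""

    def affine(name):
        # b + U x(t) + W h(t-1), the shared form inside every sigma(...)
        return p["b_" + name] + p["U_" + name] @ x + p["W_" + name] @ h_prev

    f = sigmoid(affine("f"))          # forget gate
    g = sigmoid(affine("g"))          # input gate
    q = sigmoid(affine("o"))          # output gate
    candidate = sigmoid(affine("c"))  # sigma as in the equation above; tanh is common
    s = f * s_prev + g * candidate    # cell state update
    h = np.tanh(s) * q                # hidden state output
    return h, s


# Usage: run the cell over a short random sequence.
params = init_params(n_in=3, n_hidden=4)
h, s = np.zeros(4), np.zeros(4)
inputs = np.random.default_rng(1).normal(size=(5, 3))
for x_t in inputs:
    h, s = lstm_step(x_t, h, s, params)
print("final hidden state:", h)
```

All products between gates and states are elementwise, which corresponds to the per-unit index i in the equations. The matrix products `U @ x` and `W @ h_prev` implement the sums over j.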

Summary of Components:

  • Forget gate f_i^{(t)} : Decides what to forget.
  • Input gate g_i^{(t)} : Decides what new information to store.
  • Cell state s_i^{(t)} : Maintains long-term memory.
  • Output gate q_i^{(t)} : Regulates the output.
