Introduction
Long Short-Term Memory (LSTM) is a type of artificial neural network widely used in deep learning. LSTM was designed to avoid the vanishing/exploding gradient problem of RNNs, although LSTM networks can still suffer from exploding gradients. Thanks to its special memory capability, LSTM is commonly used for time series forecasting. In this article, I would like to talk about the LSTM architecture and build it with PyTorch.
Architecture
First of all, let’s talk about the architecture of LSTM.
In the LSTM, there are two types of memory. One is the long-term memory, shown as the green line in the figure; the red line shows the short-term memory. The LSTM also has input, output, and forget gates.
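To make the figure easier to follow, here is the general form of what each gate computes. The weight and bias names below are just placeholders, not the actual values on the figure:
forget_gate = sigmoid(w1 * short_memory + w2 * input + b1)
input_gate = sigmoid(w3 * short_memory + w4 * input + b2)
candidate = tanh(w5 * short_memory + w6 * input + b3)
new_long_memory = forget_gate * long_memory + input_gate * candidate
output_gate = sigmoid(w7 * short_memory + w8 * input + b4)
new_short_memory = output_gate * tanh(new_long_memory)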
Calculation
We assume that the current long-term memory value is 2, the short-term memory value is 1, and the input is 1. The initial weights and biases are also written on the figure.
Let's calculate the output.
Forget Gate
(1 x 2.2) + (1 x 2.7) = 4.9
4.9 + 2.3 = 7.2
sigmoid(7.2) ≈ 0.999 ≈ 1
Remember that:
Our forget gate output is approximately 1. We multiply it by the previous long-term memory value of 2, so the long-term memory stays at about 2.
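If you want to verify this arithmetic yourself, a quick check in PyTorch with the same numbers from the figure looks like this:
import torch

short_memory, input_value, long_memory = 1.0, 1.0, 2.0
# forget gate: sigmoid(short * 2.7 + input * 2.2 + 2.3) = sigmoid(7.2)
forget_gate = torch.sigmoid(torch.tensor(short_memory * 2.7 + input_value * 2.2 + 2.3))
print(forget_gate)                 # tensor(0.9993), effectively 1
print(forget_gate * long_memory)   # tensor(1.9985), the long-term memory stays around 2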
Input Gate
Green cell
(1 x 1.1) + (1 x 0.8) = 1.9
1.9 + 0.8 = 2.7
sigmoid(2.7) = 0.937
Yellow cell
(1 x 0.4) + (1 x 1.2) = 1.6
1.6 + 0.9 = 2.5
tanh(2.5) = 0.987
Remember that:
We multiply those two cells together: 0.937 x 0.987 ≈ 0.925. With the forget gate output of roughly 1, this product is added to the long-term memory.
2 + 0.925 ≈ 2.92
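You can check these numbers in PyTorch as well. Note that it is the product of the two cells, about 0.925, that actually gets added to the long-term memory:
import torch

# green cell (input gate) and yellow cell (candidate), using the weights from the figure
input_gate = torch.sigmoid(torch.tensor(1 * 1.1 + 1 * 0.8 + 0.8))   # sigmoid(2.7) ≈ 0.937
candidate = torch.tanh(torch.tensor(1 * 0.4 + 1 * 1.2 + 0.9))        # tanh(2.5) ≈ 0.987
# the forget gate from the previous step was ≈ 0.9993, so the old memory of 2 is almost fully kept
new_long_memory = 0.9993 * 2 + input_gate * candidate
print(new_long_memory)   # ≈ 2.92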
Output Gate
Gray cell
(1 x 0.3) + (1 x 2.9) = 3.2
3.2 + 2.1 = 5.3
sigmoid(5.3) = 0.995
Pink cell
tanh(2.92) ≈ 0.994
We multiply this value by the gray cell output.
0.994 x 0.995 ≈ 0.989
Finally, our new short-term memory value is about 0.989 (roughly 0.9893 with full precision). This calculation then continues with the next input value, again and again.
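Putting the whole forward pass together, a small script with the same weights and initial values as on the figure reproduces the number we will also get from the PyTorch model later:
import torch

long_memory, short_memory, x = torch.tensor(2.), torch.tensor(1.), torch.tensor(1.)

forget_gate = torch.sigmoid(short_memory * 2.7 + x * 2.2 + 2.3)
input_gate = torch.sigmoid(short_memory * 1.1 + x * 0.8 + 0.8)
candidate = torch.tanh(short_memory * 0.4 + x * 1.2 + 0.9)
new_long = forget_gate * long_memory + input_gate * candidate
output_gate = torch.sigmoid(short_memory * 0.3 + x * 2.9 + 2.1)
new_short = output_gate * torch.tanh(new_long)

print(new_short)   # tensor(0.9893)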
This is forward propagation. When the algorithm has calculated the last output, backpropagation does its job: its mission is to decrease the loss.
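As a tiny illustration of what backpropagation does, here is a minimal, self-contained sketch with a single weight and a squared-error loss (the numbers are arbitrary). PyTorch computes the gradient that the optimizer then uses to reduce the loss:
import torch

w = torch.tensor(1.0, requires_grad=True)   # a single trainable weight
x = torch.tensor(2.0)
target = torch.tensor(3.0)

prediction = torch.sigmoid(w * x)
loss = (prediction - target) ** 2            # squared error, same form as the training_step later on
loss.backward()                              # backpropagation computes the gradient
print(w.grad)                                # how the loss changes with respect to w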
Building it by hand
Libraries
First of all, we have to import the necessary libraries. Lightning helps us build and train the architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
import lightning as L
Weights and biases
Now, we have to set up the weights and biases. Before that, I define a class that extends the Lightning module.
class SimpleLSTM(L.LightningModule):
    def __init__(self):
        super().__init__()
        mean = torch.tensor(0.0)
        std = torch.tensor(1.0)
        # Blue cell (forget gate) weights and bias
        self.wbc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wbc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bbc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        # Green cell (input gate) weights and bias
        self.wgc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wgc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bgc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        # Yellow cell (candidate memory) weights and bias
        self.wyc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wyc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.byc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        # Gray cell (output gate) weights and bias
        self.wgrc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wgrc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bgrc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
The “requires_grad=True” attribute means that during backpropagation the parameter values are updated according to the loss; the short sketch below illustrates this. After that, I set the weights and biases to the same values we used in the example above.
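As a minimal illustration (the tensor values here are arbitrary), a parameter created this way accumulates a gradient after a backward pass, which the optimizer then uses to update it:
import torch
import torch.nn as nn

p = nn.Parameter(torch.normal(mean=torch.tensor(0.0), std=torch.tensor(1.0)))  # requires_grad=True by default
loss = (p * 3.0 - 1.0) ** 2
loss.backward()
print(p.grad)   # the gradient an optimizer such as Adam uses to update p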
class SimpleLSTM(L.LightningModule):
    def __init__(self):
        super().__init__()
        # Blue cell (forget gate)
        self.wbc1 = 2.7
        self.wbc2 = 2.2
        self.bbc1 = 2.3
        # Green cell (input gate)
        self.wgc1 = 1.1
        self.wgc2 = 0.8
        self.bgc1 = 0.8
        # Yellow cell (candidate memory)
        self.wyc1 = 0.4
        self.wyc2 = 1.2
        self.byc1 = 0.9
        # Gray cell (output gate)
        self.wgrc1 = 0.3
        self.wgrc2 = 2.9
        self.bgrc1 = 2.1
Cells
I defined input_value, long_memory, and short_memory as arguments. Then I connected all the cells according to the figure.
    def lstm_cells(self, input_value, long_memory, short_memory):
        # Blue cell: forget gate, decides how much long-term memory to keep
        blue_cell = torch.sigmoid((short_memory * self.wbc1) +
                                  (input_value * self.wbc2) +
                                  self.bbc1)
        # Green cell: input gate
        green_cell = torch.sigmoid((short_memory * self.wgc1) +
                                   (input_value * self.wgc2) +
                                   self.bgc1)
        # Yellow cell: candidate long-term memory
        yellow_cell = torch.tanh((short_memory * self.wyc1) +
                                 (input_value * self.wyc2) +
                                 self.byc1)
        updated_long_memory = ((long_memory * blue_cell) +
                               (green_cell * yellow_cell))
        # Gray cell: output gate
        gray_cell = torch.sigmoid((short_memory * self.wgrc1) +
                                  (input_value * self.wgrc2) +
                                  self.bgrc1)
        updated_short_memory = torch.tanh(updated_long_memory) * gray_cell
        return [updated_long_memory, updated_short_memory]
Forward Propagation
I set the initial long-term memory to 2 and the short-term memory to 1, as in the example.
    def forward(self, input):
        long_memory = 2
        short_memory = 1
        value = input[0]
        long_memory, short_memory = self.lstm_cells(value, long_memory, short_memory)
        return short_memory
Prediction
I defined the model and predicted the value.
class SimpleLSTM(L.LightningModule):
    def __init__(self):
        super().__init__()
        # Weights and biases from the worked example
        self.wbc1 = 2.7
        self.wbc2 = 2.2
        self.bbc1 = 2.3
        self.wgc1 = 1.1
        self.wgc2 = 0.8
        self.bgc1 = 0.8
        self.wyc1 = 0.4
        self.wyc2 = 1.2
        self.byc1 = 0.9
        self.wgrc1 = 0.3
        self.wgrc2 = 2.9
        self.bgrc1 = 2.1

    def lstm_cells(self, input_value, long_memory, short_memory):
        blue_cell = torch.sigmoid((short_memory * self.wbc1) +
                                  (input_value * self.wbc2) +
                                  self.bbc1)
        green_cell = torch.sigmoid((short_memory * self.wgc1) +
                                   (input_value * self.wgc2) +
                                   self.bgc1)
        yellow_cell = torch.tanh((short_memory * self.wyc1) +
                                 (input_value * self.wyc2) +
                                 self.byc1)
        updated_long_memory = ((long_memory * blue_cell) +
                               (green_cell * yellow_cell))
        gray_cell = torch.sigmoid((short_memory * self.wgrc1) +
                                  (input_value * self.wgrc2) +
                                  self.bgrc1)
        updated_short_memory = torch.tanh(updated_long_memory) * gray_cell
        return [updated_long_memory, updated_short_memory]

    def forward(self, input):
        long_memory = 2
        short_memory = 1
        value = input[0]
        long_memory, short_memory = self.lstm_cells(value, long_memory, short_memory)
        return short_memory
input = 1.
model = SimpleLSTM()
model(torch.tensor([input]).detach())
-OUTPUT-
Predict is
tensor(0.9893)
We can see that the result, 0.9893, matches the hand calculation above.
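For comparison, PyTorch already ships this exact gating logic in torch.nn.LSTMCell. The sketch below shows the equivalent call; its output will differ from our 0.9893 because the built-in cell initializes its own random weights rather than the values from the figure.
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=1, hidden_size=1)   # one input feature, one hidden unit
x = torch.tensor([[1.0]])                          # input value
h0 = torch.tensor([[1.0]])                         # short-term memory
c0 = torch.tensor([[2.0]])                         # long-term memory
h1, c1 = cell(x, (h0, c0))                         # updated short- and long-term memory
print(h1, c1)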
With backpropagation
When backpropagation is used, the model looks like this. We define Adam as the optimizer, and the model keeps updating the weights and biases until the loss reaches its minimum.
class SimpleLSTM(L.LightningModule):
    def __init__(self):
        super().__init__()
        mean = torch.tensor(0.0)
        std = torch.tensor(1.0)
        # Blue cell (forget gate) weights and bias
        self.wbc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wbc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bbc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        # Green cell (input gate) weights and bias
        self.wgc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wgc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bgc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        # Yellow cell (candidate memory) weights and bias
        self.wyc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wyc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.byc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        # Gray cell (output gate) weights and bias
        self.wgrc1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wgrc2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bgrc1 = nn.Parameter(torch.tensor(0.), requires_grad=True)

    def lstm_cells(self, input_value, long_memory, short_memory):
        blue_cell = torch.sigmoid((short_memory * self.wbc1) +
                                  (input_value * self.wbc2) +
                                  self.bbc1)
        green_cell = torch.sigmoid((short_memory * self.wgc1) +
                                   (input_value * self.wgc2) +
                                   self.bgc1)
        yellow_cell = torch.tanh((short_memory * self.wyc1) +
                                 (input_value * self.wyc2) +
                                 self.byc1)
        updated_long_memory = ((long_memory * blue_cell) +
                               (green_cell * yellow_cell))
        gray_cell = torch.sigmoid((short_memory * self.wgrc1) +
                                  (input_value * self.wgrc2) +
                                  self.bgrc1)
        updated_short_memory = torch.tanh(updated_long_memory) * gray_cell
        return [updated_long_memory, updated_short_memory]

    def forward(self, input):
        long_memory = 0
        short_memory = 0
        # Unroll the cell over a sequence of four input values
        value1 = input[0]
        value2 = input[1]
        value3 = input[2]
        value4 = input[3]
        long_memory, short_memory = self.lstm_cells(value1, long_memory, short_memory)
        long_memory, short_memory = self.lstm_cells(value2, long_memory, short_memory)
        long_memory, short_memory = self.lstm_cells(value3, long_memory, short_memory)
        long_memory, short_memory = self.lstm_cells(value4, long_memory, short_memory)
        return short_memory

    def configure_optimizers(self):
        return Adam(self.parameters())

    def training_step(self, batch, batch_idx):
        input_i, label_i = batch
        output_i = self.forward(input_i[0])
        loss = (output_i - label_i) ** 2
        self.log("train_loss", loss)
        if label_i == 0:
            self.log("out_0", output_i)
        else:
            self.log("out_1", output_i)
        return loss
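To actually run this training loop, you feed the model a dataset and a Lightning Trainer. Below is a minimal sketch that builds on the class and imports above; the two sequences, their labels, and the epoch count are made-up placeholders, just to show how the pieces connect.
from torch.utils.data import DataLoader, TensorDataset

# toy data: two sequences of four values each, labeled 0 and 1 (values here are made up)
inputs = torch.tensor([[0., 0.5, 0.25, 1.],
                       [1., 0.5, 0.25, 1.]])
labels = torch.tensor([0., 1.])
dataloader = DataLoader(TensorDataset(inputs, labels))

model = SimpleLSTM()
trainer = L.Trainer(max_epochs=300)
trainer.fit(model, train_dataloaders=dataloader)
print(model(torch.tensor([0., 0.5, 0.25, 1.])))   # prediction after training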
In this article, we talked about the LSTM architecture and built it with PyTorch. I hope what I wrote was understandable.