Linear regression with PyTorch

Abhishek Koirala
Apr 20, 2020 · 9 min read
Photo by Gordon Johnson on Pixabay

Linear Regression

Linear regression models the relationship between two variables by fitting a linear equation to the observed data. One variable is treated as an explanatory (independent) variable, and the other as a dependent variable. Let us consider two variables X and Y, where we want to relate Y to X using a linear regression model.

x=[[1],[2],[3],[3.5]]
y=[[2],[4],[6],[7]]

Before fitting any linear model, one should first check whether there is a relationship between the variables. A scatter plot is a simple way to do this. Let's see what our scatter plot looks like.
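A quick way to produce such a scatter plot is sketched below (a minimal example using matplotlib and the x and y lists defined above):

import matplotlib.pyplot as plt

# flatten the nested lists and draw one point per (x, y) pair
plt.scatter([row[0] for row in x], [row[0] for row in y])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter plot of x against y')
plt.show()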

Attempting regression isn't meaningful if there is no relationship between the proposed variables. The strength of the association can also be quantified with the correlation between the two variables. Here, we will proceed on the visual evidence that the two variables are related closely enough for us to try to fit a regression line. A regression line is a straight line, so it follows the equation Y=mX+C, where X is the explanatory variable, Y is the dependent variable, m is the slope of the line, and C is the y-intercept[1].

Least Squares

Before diving into code, one must understand the concept of fitting a regression line using least squares. This method finds the best-fitting line for the observed data by minimizing the sum of the squared vertical deviations from each data point to the line[2]. If a point lies exactly on the regression line, its vertical deviation is 0. Because the deviations are squared before being summed, deviations above and below the line cannot cancel each other out. The same idea underlies the loss function used in linear regression, the mean squared error (MSE).
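To make this concrete, here is a minimal NumPy sketch (not part of the original article) that computes the least-squares slope and intercept in closed form, together with the vertical deviations whose squares are being minimized:

import numpy as np

x_arr = np.array([1, 2, 3, 3.5])
y_arr = np.array([2, 4, 6, 7])

# closed-form least-squares estimates of the slope m and intercept c
m = ((x_arr - x_arr.mean()) * (y_arr - y_arr.mean())).sum() / ((x_arr - x_arr.mean()) ** 2).sum()
c = y_arr.mean() - m * x_arr.mean()

deviations = y_arr - (m * x_arr + c)       # vertical deviations from the fitted line
print(m, c, (deviations ** 2).sum())       # sum of squared deviations being minimized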

PyTorch

If you are reading this article, I assume you are familiar with NumPy in Python. PyTorch is a deep-learning library, while NumPy is a general library for scientific computing. One of the main reasons for choosing PyTorch over NumPy is PyTorch's support for GPU acceleration. Using a GPU with PyTorch is straightforward, and for large computations it speeds things up considerably.
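For example, moving computation onto a GPU takes only a line or two. A minimal sketch, assuming a CUDA-capable GPU is available (it falls back to the CPU otherwise):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
a = torch.randn(1000, 1000, device=device)   # tensor created directly on the chosen device
b = torch.randn(1000, 1000).to(device)       # or moved there after creation
c = a @ b                                    # the matrix multiply runs on that device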

I would like to summarize the implementation of linear regression using PyTorch in 7 steps. I assume you have already installed PyTorch on your system.

Step 1: Required imports

import torch
import torch.nn as nn
from torch.autograd import Variable

PyTorch uses a technique called automatic differentiation: a recorder keeps track of the operations that have been performed and then replays them backward to compute the gradients. PyTorch's autograd makes it easy to define computational graphs and take gradients, but raw autograd can be a bit too low-level for defining complex neural networks[3]. That's where nn.Module comes into play: it lets us use the provided building blocks while hiding those low-level details. (In recent PyTorch versions, tensors track gradients directly, so the Variable wrapper imported above is kept only for compatibility with older examples.)
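As a tiny illustration of autograd (a sketch, not from the original article), PyTorch records the operations applied to a tensor that requires gradients and computes the derivative when backward() is called:

import torch

x = torch.tensor(2.0, requires_grad=True)   # autograd will track operations on x
y = x ** 2 + 3 * x                          # the forward computation is recorded
y.backward()                                # replayed backward to get dy/dx
print(x.grad)                               # 2*x + 3 evaluated at x=2, i.e. 7.0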

Step 2: Building a model

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out

Here we have defined our own module for linear regression by subclassing nn.Module. We then define a forward function that receives input tensors and produces output tensors. This is just a one-layer network. As discussed earlier, nn.Module lets us simply use nn.Linear, hiding the implementation details of the linear layer. The layer is created in the initialization function with the number of input and output features, and the input is later passed through it in the forward function to produce the output.
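Under the hood, nn.Linear just stores a weight matrix and a bias vector and computes out = x @ weight.T + bias. A small sketch illustrating this (the variable names here are for illustration only):

import torch
import torch.nn as nn

layer = nn.Linear(1, 1)                     # one input feature, one output feature
x = torch.tensor([[3.0]])
out = layer(x)                              # what the layer computes for us
manual = x @ layer.weight.T + layer.bias    # the same affine transformation by hand
print(torch.allclose(out, manual))          # True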

Step 3: Instantiate the model class

#instantiate model class
input_dim=1
output_dim=1
model=LinearRegressionModel(input_dim, output_dim)

Our model takes a single input feature (x) and produces a single output (y), so input_dim and output_dim are both set to 1. We then instantiate our LinearRegressionModel class with these dimensions.

Step 4: Instantiate loss

#instantiate loss
criterion=nn.MSELoss()

Loss is a mathematical way of measuring how wrong our predictions are. As mentioned above, least squares comes into play here. Mean squared error (MSE) is the least-squares idea turned into a loss: we take the squared differences between the true and predicted values and average them into a single number. We use this single value as the loss between the original values of y and the predicted values of y. Mathematically, the mean squared loss can be defined as:

MSE = (1/n) * Σ (yᵢ - ŷᵢ)², where yᵢ is the true value, ŷᵢ is the predicted value, and n is the number of data points.
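As a sanity check, nn.MSELoss returns the same number as computing the mean of the squared differences by hand (a small sketch, not part of the original article):

import torch
import torch.nn as nn

pred = torch.tensor([2.1, 3.9, 6.2])
target = torch.tensor([2.0, 4.0, 6.0])

criterion = nn.MSELoss()
print(criterion(pred, target))             # built-in mean squared error
print(((pred - target) ** 2).mean())       # the same value computed manually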

Step 5: Instantiate optimizer class

# instantiate optimizer class
learning_rate=0.01
optimizer=torch.optim.SGD(model.parameters(), lr=learning_rate)

The goal of linear regression is to minimize the overall loss. To do that, we tweak the parameters of our model during training. But how do we change those parameters? This is where the optimizer comes into play. It ties together the loss function and the model parameters by updating the model in response to the output of the loss function. The loss function is the guide to the terrain, telling the optimizer whether it is moving in the right or wrong direction. Imagine a hiker trying to get down a mountain blindfolded: it is impossible to know which direction to go in, but the hiker can tell whether they are moving down or up, and by always moving down they will eventually reach the base.

This is the idea behind gradient descent, and in this example we use Stochastic Gradient Descent (SGD) as our optimizer. In SGD, we use only a subset of the training examples to compute the gradients at each step, instead of computing gradients over all training examples on every pass; the subset may be a small batch or even a single random example. This makes SGD fast and efficient when we have a large volume of data.

The learning rate is set to 0.01. The learning rate controls how large each weight update is. If the weights change by too large a margin at each step, we may overshoot and fail to minimize the loss; a smaller learning rate changes the weights gradually, without jumping past the minimum.
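Conceptually, each SGD update scales the gradient of every parameter by the learning rate and subtracts it from that parameter. The hand-rolled sketch below is for illustration only and assumes loss.backward() has already populated the gradients; in the actual code this work happens inside optimizer.step().

# manual equivalent of one plain-SGD step (illustrative sketch, not used in the article's code)
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:           # gradients exist only after loss.backward()
            param -= learning_rate * param.grad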

Step 6: Training our model

import numpy as np

# convert the Python lists defined earlier into NumPy arrays
X_train = np.array(x, dtype=np.float32)
y_train = np.array(y, dtype=np.float32)

epochs = 100
for epoch in range(epochs):
    epoch += 1

    # convert numpy arrays to torch Variables
    inputs = Variable(torch.from_numpy(X_train))
    labels = Variable(torch.from_numpy(y_train))

    # clear gradients w.r.t. parameters
    optimizer.zero_grad()

    # forward pass to get outputs
    outputs = model(inputs.float())

    # calculate loss
    loss = criterion(outputs.float(), labels.float())

    # compute gradients of the loss w.r.t. parameters
    loss.backward()

    # update parameters
    optimizer.step()

    print('epoch {}, loss {}'.format(epoch, loss.item()))

The model will be trained for 100 epochs. The number of epochs is not fixed: training for more epochs keeps reducing the mean squared loss, but at some point the loss saturates and further training barely changes the loss or the predictions. We previously had our values x and y as nested lists; we first convert them to tensors so PyTorch can process them. At the start of every epoch, the gradients are reset so that fresh gradients can be computed for that epoch. The input tensor is fed to the model to obtain the predicted output, the loss is calculated, and the gradient of the mean squared loss is computed. The parameters are then updated using the formula:

parameters = parameters - learning_rate * gradient

Using this, the parameters are updated and the training further moves to the next epoch. Here are my training results for 100 epochs.

epoch 1, loss 36.08680725097656
epoch 2, loss 26.15007781982422
epoch 3, loss 18.950008392333984
epoch 4, loss 13.732902526855469
epoch 5, loss 9.952630996704102
epoch 6, loss 7.21347188949585
epoch 7, loss 5.228693962097168
epoch 8, loss 3.7905349731445312
epoch 9, loss 2.7484498023986816
epoch 10, loss 1.993356466293335
epoch 11, loss 1.4462133646011353
epoch 12, loss 1.0497496128082275
epoch 13, loss 0.7624667882919312
epoch 14, loss 0.5542953014373779
epoch 15, loss 0.40344783663749695
epoch 16, loss 0.2941356897354126
epoch 17, loss 0.21492034196853638
epoch 18, loss 0.1575128585100174
epoch 19, loss 0.11590703576803207
epoch 20, loss 0.08575120568275452
epoch 21, loss 0.06389184296131134
epoch 22, loss 0.048044003546237946
epoch 23, loss 0.03655223548412323
epoch 24, loss 0.02821674942970276
epoch 25, loss 0.02216847985982895
epoch 26, loss 0.01777726039290428
epoch 27, loss 0.014587100595235825
epoch 28, loss 0.01226697489619255
epoch 29, loss 0.010577471926808357
epoch 30, loss 0.00934491865336895
epoch 31, loss 0.00844346173107624
epoch 32, loss 0.007782032713294029
epoch 33, loss 0.007294544950127602
epoch 34, loss 0.00693306140601635
epoch 35, loss 0.006663032807409763
epoch 36, loss 0.006459235213696957
epoch 37, loss 0.006303474307060242
epoch 38, loss 0.006182607728987932
epoch 39, loss 0.006087017245590687
epoch 40, loss 0.006009785458445549
epoch 41, loss 0.005945917218923569
epoch 42, loss 0.005891732405871153
epoch 43, loss 0.005844651721417904
epoch 44, loss 0.005802752450108528
epoch 45, loss 0.0057646180503070354
epoch 46, loss 0.0057292478159070015
epoch 47, loss 0.005695942789316177
epoch 48, loss 0.005664162337779999
epoch 49, loss 0.00563353206962347
epoch 50, loss 0.005603751167654991
epoch 51, loss 0.005574648734182119
epoch 52, loss 0.005546061787754297
epoch 53, loss 0.005517900455743074
epoch 54, loss 0.005490065552294254
epoch 55, loss 0.005462489556521177
epoch 56, loss 0.0054351817816495895
epoch 57, loss 0.005408088210970163
epoch 58, loss 0.005381157621741295
epoch 59, loss 0.005354408174753189
epoch 60, loss 0.005327821243554354
epoch 61, loss 0.005301395431160927
epoch 62, loss 0.005275100935250521
epoch 63, loss 0.005248964764177799
epoch 64, loss 0.005222935229539871
epoch 65, loss 0.005197084974497557
epoch 66, loss 0.005171335302293301
epoch 67, loss 0.005145719274878502
epoch 68, loss 0.005120237357914448
epoch 69, loss 0.0050948867574334145
epoch 70, loss 0.005069647915661335
epoch 71, loss 0.005044546443969011
epoch 72, loss 0.005019560921937227
epoch 73, loss 0.004994711838662624
epoch 74, loss 0.004969998728483915
epoch 75, loss 0.0049453903920948505
epoch 76, loss 0.004920894280076027
epoch 77, loss 0.004896543920040131
epoch 78, loss 0.004872283432632685
epoch 79, loss 0.004848182201385498
epoch 80, loss 0.004824173171073198
epoch 81, loss 0.004800265189260244
epoch 82, loss 0.004776499234139919
epoch 83, loss 0.004752847831696272
epoch 84, loss 0.00472932169213891
epoch 85, loss 0.004705924540758133
epoch 86, loss 0.004682612605392933
epoch 87, loss 0.004659437108784914
epoch 88, loss 0.004636351019144058
epoch 89, loss 0.0046133967116475105
epoch 90, loss 0.004590551368892193
epoch 91, loss 0.004567836411297321
epoch 92, loss 0.004545208998024464
epoch 93, loss 0.004522724077105522
epoch 94, loss 0.004500317387282848
epoch 95, loss 0.004478035029023886
epoch 96, loss 0.00445586396381259
epoch 97, loss 0.004433815833181143
epoch 98, loss 0.004411864560097456
epoch 99, loss 0.004390021786093712
epoch 100, loss 0.00436827028170228

As stated earlier, epochs 1 through 12 see a drastic decline in the loss, but from epoch 12 to epoch 100 the loss declines only slowly. The mean squared loss had essentially converged after about 50 epochs, so the remaining 50 epochs of training were not really necessary. The right number of epochs is found experimentally and depends on the amount and distribution of the data. Let's see how our loss converges in the graph below.

Epoch vs Loss curve
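The curve above can be reproduced by recording the loss at every epoch and plotting it afterwards. A minimal sketch, assuming the loss values were appended to a hypothetical list named losses inside the training loop:

import matplotlib.pyplot as plt

# losses is assumed to hold loss.item() for each epoch, collected during training
plt.plot(range(1, len(losses) + 1), losses)
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.title('Epoch vs Loss')
plt.show()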

We can now compare our true and predicted values.
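The predicted values shown below come from running the trained model on the training inputs, along the lines of this sketch (the exact call is an assumption; any equivalent forward pass works):

# run the trained model on the training inputs and convert the result to a NumPy array
predicted = model(torch.from_numpy(X_train).float()).detach().numpy()
print(y_train)      # true values
print(predicted)    # predicted values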

True values:
array([[2.],
[4.],
[6.],
[7.]])
Predicted values:
array([[1.9737258],
[3.9890819],
[6.004438 ],
[7.012116 ]], dtype=float32)

As you can see, the predicted values are quite close to the actual values. Now, in the last step, we plot the regression line.

Step 7: Plotting the regression line along with actual data

# plot graph
import matplotlib.pyplot as plt

# clear figure
plt.clf()

# get predictions from the trained model (same as computed above)
predicted = model(torch.from_numpy(X_train).float()).detach().numpy()

# plot true data
plt.plot(X_train, y_train, 'go', label='True data', alpha=0.5)

# plot predictions as a dashed line
plt.plot(X_train, predicted, '--', label='Predictions', alpha=0.5)

# legend and plot
plt.legend(loc='best')
plt.show()
Regression line plot with true data

This was a basic implementation of linear regression using PyTorch. The structure of the class and the steps (model building and initialization, loss definition, optimizer definition, training) stay the same for more complex machine learning models too.

I hope this is useful for someone who has just started exploring PyTorch.

References

[1] Yale, "Linear Regression", http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

[2] Anjanita, "My Understanding about Linear Regression — Part I", http://analytics-anjanita.blogspot.com/2010/08/my-understanding-about-linear.html

[3] Wikipedia, "PyTorch", https://en.wikipedia.org/wiki/PyTorch

PyTorch, "Deep Learning with PyTorch: A 60 Minute Blitz", https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

Aakash N S, "PyTorch basics — Linear Regression from scratch", https://www.kaggle.com/aakashns/pytorch-basics-linear-regression-from-scratch

Deep Learning Wizard, Udemy, "Practical Deep Learning with PyTorch", https://www.udemy.com/course/practical-deep-learning-with-pytorch/
