Jupyter Lab Shortcuts
Shift + Tab (Windows) to see a function's parameters and definition
References
https://github.com/ritchieng/the-incredible-pytorch
https://github.com/yunjey/pytorch-tutorial
https://github.com/vahidk/EffectivePyTorch
https://github.com/dsgiitr/d2l-pytorch
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html#module-numpy.doc.broadcasting
import torch
import numpy as np
#Conversion from numpy array to torch tensor and vice versa
tensor_0 = torch.Tensor([1,2])
arr = np.random.rand(2,2)
tensor_1 = torch.from_numpy(arr)
print(type(arr))
print(type(tensor_1))
<class 'numpy.ndarray'>
<class 'torch.Tensor'>
torch_tensor = torch.ones(2,2)
type(torch_tensor)
torch.Tensor
numpy_arr = torch_tensor.numpy()
print(type(numpy_arr))
<class 'numpy.ndarray'>
#CPU vs GPU
tensor_init = torch.rand(2,2) #this is on CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tensor_init.to(device) #to GPU (returns a new tensor; tensor_init itself is not moved in place)
tensor_init.cpu() #back to CPU (also returns a tensor on the CPU)
tensor([[0.6775, 0.7982],
[0.4669, 0.1016]])
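Tensors can also be created directly on a device (a quick illustration; tensor_gpu is just an example name, and this falls back to the CPU when no GPU is available):
tensor_gpu = torch.rand(2, 2, device=device)  #allocated on the GPU if CUDA is available, else on the CPU
print(tensor_gpu.device)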
c = torch.add(tensor_init, torch_tensor)
print(c)
c.add_(c)
tensor([[1.6775, 1.7982],
[1.4669, 1.1016]])
tensor([[3.3550, 3.5964],
[2.9338, 2.2032]])
c=torch.sub(tensor_init,c)
print(c)
c.sub_(c) #only this is in place (_)
print(c.sub(torch.ones(2,2))) #not in place
print(c)
tensor([[0.6775, 0.7982],
[0.4669, 0.1016]])
tensor([[-1., -1.],
[-1., -1.]])
tensor([[0., 0.],
[0., 0.]])
#element wise mul
c=torch.rand(2,1).mul(torch.rand(2,1))
print(c) #similarly div for division, also mul_ and div_
tensor([[0.0195],
[0.0051]])
c.size()
print(c)
print(c.long())
print(c)
tensor([[0.0195],
[0.0051]])
tensor([[0],
[0]])
tensor([[0.0195],
[0.0051]])
a = torch.Tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
a.size()
torch.Size([2, 10])
a.mean(dim=1) #dim in pytorch is same as axis in numpy
tensor([5.5000, 5.5000])
a.std() #np.cov can be used if you want to calculate a covariance matrix
tensor(2.9469)
#view
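A minimal sketch of .view() for reshaping (v is just an illustrative tensor; the returned view shares the same underlying data, and -1 lets PyTorch infer that dimension):
v = torch.arange(6)
print(v.view(2, 3))          #reshape to 2 x 3
print(v.view(-1, 2).shape)   #torch.Size([3, 2]); the -1 dimension is inferred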
If a torch.Tensor has .requires_grad set to True, all operations on that tensor are tracked.
Call .backward() after finishing all operations to compute the gradients; the .grad attribute of each leaf tensor then holds the gradient of the output with respect to that tensor.
Use with torch.no_grad(): when we don't want operations to be tracked.
Each torch.Tensor produced by an operation has a .grad_fn attribute referencing the Function that created it in the computational graph (tensors created directly by the user have grad_fn=None).
Note: if your Tensor is not a scalar, you need to pass a gradient argument to .backward() that is a tensor of matching shape.
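A minimal illustration of torch.no_grad() (a and b are throwaway names): operations executed inside the block are not tracked, so the result does not require grad.
a = torch.ones(2, 2, requires_grad=True)
with torch.no_grad():
    b = a * 2        #not recorded in the graph
print(b.requires_grad)   #False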
from torchviz import make_dot #if using JupyterLab, open a terminal and pip install torchviz (easier there than in the classic notebook)
x = torch.tensor([[1.,2.], [3.,4.]], requires_grad=True)
print(x)
tensor([[1., 2.],
[3., 4.]], requires_grad=True)
Note: see in the next cell the error we get when we try to compute with NumPy on a tensor that requires grad.
This is expected behavior because moving to numpy will break the graph and so no gradient will be computed. If you don’t actually need gradients, then you can explicitly .detach() the Tensor that requires grad to get a tensor with the same content that does not require grad. This other Tensor can then be converted to a numpy array.
y = np.power(np.linalg.norm(x), 2) + 4
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-40-eea598aa356e> in <module>
----> 1 y = np.power(np.linalg.norm(x), 2) + 4
<__array_function__ internals> in norm(*args, **kwargs)
~\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in norm(x, ord, axis, keepdims)
2458
2459 """
-> 2460 x = asarray(x)
2461
2462 if not issubclass(x.dtype.type, (inexact, object_)):
~\Anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
~\Anaconda3\lib\site-packages\torch\tensor.py in __array__(self, dtype)
410 def __array__(self, dtype=None):
411 if dtype is None:
--> 412 return self.numpy()
413 else:
414 return self.numpy().astype(dtype, copy=False)
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
#So how do we circumvent this? If we don't need gradient on this, we can use
y = np.power(np.linalg.norm(x.detach().numpy()), 2) + 4
#But if we do need gradients, we will need to use Pytorch functions instead
y = torch.norm(x).pow(2) + 4
y.requires_grad #Since we computed y using x, y also requires grad now
True
#y was created as a result of an operation, so it has a grad_fn unlike x which was created by us
print(x.grad_fn)
print(y.grad_fn)
None
<AddBackward0 object at 0x00000235754CAA58>
z = y.pow(3).mean()
out = z + 0
print(y)
print(out)
tensor(34., grad_fn=<AddBackward0>)
tensor(39304., grad_fn=<AddBackward0>)
#Let's backpropagate. Because out is a scalar, we can directly do this
out.backward()
print(y.grad)
print(x.grad)
None
tensor([[ 6936., 13872.],
[20808., 27744.]])
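y.grad is None above because y is not a leaf tensor; by default autograd only populates .grad for leaf tensors such as x. If we do want an intermediate gradient, here is a minimal sketch using retain_grad() (rebuilding the same graph with throwaway names y2/out2):
x.grad.zero_()                 #clear the gradients accumulated by the previous backward()
y2 = torch.norm(x).pow(2) + 4
y2.retain_grad()               #ask autograd to keep the gradient for this non-leaf tensor
out2 = y2.pow(3).mean()
out2.backward()
print(y2.grad)                 #tensor(3468.), i.e. 3 * y2**2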
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
y = y * 2
print(y)
tensor([ -810.8762, -1189.0673, -784.6422], grad_fn=<MulBackward0>)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
#Another way to differentiate
x = torch.tensor(2., requires_grad=True)
def u(x):
return x**2
def g(u):
return -u
dgdx = torch.autograd.grad(g(u(x)), x)
print(dgdx)
y = torch.tensor(2., requires_grad=True)
def u_xy(x, y):
return x**2 + y**3 + 2*y
def g_xy(u):
return -u
dgdx = torch.autograd.grad(g_xy(u_xy(x, y)), x)
dgdy = torch.autograd.grad(g_xy(u_xy(x, y)), y)
print(dgdx)
print(dgdy)
(tensor(-4.),)
(tensor(-4.),)
(tensor(-14.),)
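torch.autograd.grad also accepts a sequence of inputs, so both gradients can be computed from a single graph in one call:
dgdx, dgdy = torch.autograd.grad(g_xy(u_xy(x, y)), [x, y])
print(dgdx, dgdy)   #tensor(-4.) tensor(-14.)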
w = torch.randn(3, 1, requires_grad=True) #requires_grad is True as it's a parameter which the user initialised
opt = torch.optim.Adam([w], lr=0.1)
def model(x):
#basis fn
f = torch.stack([x*x, x, torch.ones(x.shape)], 1)
#print(f.shape)
yhat = torch.squeeze(f@w, 1) #why torch.squeeze?
return yhat
def loss(y, yhat):
l = torch.nn.functional.mse_loss(yhat, y).sum()
return l
def generate_data():
x = torch.rand(100)*20 -10
y = 5*x*x + 3
return x, y
def train_step():
x,y = generate_data()
yhat = model(x)
l = loss(y, yhat)
opt.zero_grad()
l.backward()
opt.step()
for _ in range(1000):
train_step()
print(w.detach().numpy())
[[4.9837837e+00]
[2.8237203e-04]
[3.9645624e+00]]
We use torch.squeeze to make yhat have shape [100] (matching y) instead of [100, 1]; otherwise broadcasting inside the loss gives wrong results. torch.squeeze removes dimensions of size 1.
If we drop torch.squeeze, PyTorch warns and the fit goes wrong:
UserWarning: Using a target size (torch.Size([100])) that is different to the input size (torch.Size([100, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
[[ 1.6391281 ]
 [-0.15200512]
 [77.61232   ]]
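A quick shape check of torch.squeeze (t is just an illustrative tensor):
t = torch.rand(100, 1)
print(t.shape, torch.squeeze(t, 1).shape)   #torch.Size([100, 1]) torch.Size([100])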
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or one of them is 1
If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
x = np.arange(4)
xx = x.reshape(4,1)
y = np.ones(5)
z = np.ones((3,4))
x.shape
(4,)
y.shape
(5,)
x + y
ValueError: operands could not be broadcast together with shapes (4,) (5,)
xx.shape
(4, 1)
y.shape
(5,)
(xx + y).shape
(4, 5)
xx + y
array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])
x.shape
(4,)
z.shape
(3, 4)
(x + z).shape
(3, 4)
x + z
array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])
a = torch.rand([5, 3, 5])
b = torch.rand([5, 1, 6])
linear1 = torch.nn.Linear(5, 10)
linear2 = torch.nn.Linear(6, 10)
pa = linear1(a)
print(pa.shape)
pb = linear2(b)
print(pb.shape)
d = torch.nn.functional.relu(pa + pb)
print(d.shape)
torch.Size([5, 3, 10])
torch.Size([5, 1, 10])
torch.Size([5, 3, 10])
Steps: (Taken from Ritchie Ng’s course)
Step 1. Load Dataset
Step 2. Make Dataset Iterable
Step 3. Create Model Class
Step 4. Instantiate Model Class
Step 5. Instantiate Loss Class
Step 6. Instantiate Optimizer Class
Step 7. Train Model
Step 8. Test Model
To train the model (Step 7), we need to follow these steps (a generic sketch of this loop is shown after the list):
Step 7.1. Convert inputs to tensors with grad accumulation capabilities
Step 7.2. Clear Gradient buffers
Step 7.3. Get output given inputs
Step 7.4. Get loss
Step 7.5. Get gradients w.r.t. parameters
Step 7.6 Update params using gradients
Repeat
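A generic sketch of steps 7.1-7.6 (model, criterion, optimizer and train_loader are placeholders here; the concrete versions are built step by step below):
for epoch in range(num_epochs):
    for inputs, labels in train_loader:    #Step 7.1: batches of input/label tensors
        optimizer.zero_grad()              #Step 7.2: clear gradient buffers
        outputs = model(inputs)            #Step 7.3: get output given inputs
        loss = criterion(outputs, labels)  #Step 7.4: get loss
        loss.backward()                    #Step 7.5: gradients w.r.t. parameters
        optimizer.step()                   #Step 7.6: update params using gradients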
We will work with the Iris dataset and classify it using a standard feed-forward network.
References
https://pytorch.org/docs/stable/data.html
https://debuggercafe.com/custom-dataset-and-dataloader-in-pytorch/
https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel
First we need to use the Dataset class to instantiate our dataset. We can do this either with a map-style dataset (subclassing Dataset) or an iterable-style dataset (subclassing IterableDataset). Iterable-style datasets are generally used for real-time/streaming data or when the data is too large for random reads. Here let's use a map-style dataset.
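For contrast, a minimal sketch of an iterable-style dataset (IrisStream is a hypothetical class just to show the shape of the API; we will not use it below):
from torch.utils.data import IterableDataset
import csv

class IrisStream(IterableDataset):
    #hypothetical iterable-style dataset that streams rows from a CSV one at a time
    def __init__(self, csv_file):
        self.csv_file = csv_file
    def __iter__(self):
        with open(self.csv_file) as f:
            for row in csv.DictReader(f):
                yield row   #in practice, convert each row to tensors here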
import pandas as pd
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
#from torchvision import transforms, utils
from sklearn.preprocessing import LabelEncoder
iris = pd.read_csv('iris.csv', header='infer')
iris.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
le = LabelEncoder()
le.fit(iris['species'])
iris['species'] = le.transform(iris['species'])
iris.tail()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |
Let’s stop for a moment and think about what we want. We need to be able to get (x, y) pairs from our data, and for that we inherit from PyTorch's built-in Dataset class.
#Now we need train and test data. But we do need to split the data in a way it's shuffled
from sklearn.model_selection import train_test_split
train, test = train_test_split(iris, test_size=0.1, random_state=5)
train.head(20) #now it's shuffled
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 23 | 5.1 | 3.3 | 1.7 | 0.5 | 0 |
| 123 | 6.3 | 2.7 | 4.9 | 1.8 | 2 |
| 130 | 7.4 | 2.8 | 6.1 | 1.9 | 2 |
| 21 | 5.1 | 3.7 | 1.5 | 0.4 | 0 |
| 12 | 4.8 | 3.0 | 1.4 | 0.1 | 0 |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 | 1 |
| 128 | 6.4 | 2.8 | 5.6 | 2.1 | 2 |
| 48 | 5.3 | 3.7 | 1.5 | 0.2 | 0 |
| 72 | 6.3 | 2.5 | 4.9 | 1.5 | 1 |
| 88 | 5.6 | 3.0 | 4.1 | 1.3 | 1 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 74 | 6.4 | 2.9 | 4.3 | 1.3 | 1 |
| 96 | 5.7 | 2.9 | 4.2 | 1.3 | 1 |
| 63 | 6.1 | 2.9 | 4.7 | 1.4 | 1 |
| 132 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 39 | 5.1 | 3.4 | 1.5 | 0.2 | 0 |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | 1 |
| 79 | 5.7 | 2.6 | 3.5 | 1.0 | 1 |
| 10 | 5.4 | 3.7 | 1.5 | 0.2 | 0 |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | 1 |
class IrisDataset(Dataset):
def __init__(self, inputs, labels, transform=None):
#self.iris_df = pd.read_csv(csv_file)
self.X = inputs
self.Y = labels
self.transform = transform
def __len__(self):
#print(len(self.X))
return len(self.X)
    def __getitem__(self, i):
        #use positional indexing (.iloc) because the shuffled train/test split keeps the original row labels,
        #and return tensors so DataLoader's default collate_fn can batch them
        data = torch.tensor(self.X.iloc[i, :].values, dtype=torch.float32)
        if self.transform:
            data = self.transform(data)
        if self.Y is not None:
            return (data, torch.tensor(self.Y.iloc[i], dtype=torch.long))
        else:
            return data
train_data = IrisDataset(train.iloc[:, 0:4], train.iloc[:, 4])
test_data = IrisDataset(test.iloc[:, 0:4], test.iloc[:, 4])
print(test_data.X)
sepal_length sepal_width petal_length petal_width
82 5.8 2.7 3.9 1.2
134 6.1 2.6 5.6 1.4
114 5.8 2.8 5.1 2.4
42 4.4 3.2 1.3 0.2
109 7.2 3.6 6.1 2.5
57 4.9 2.4 3.3 1.0
1 4.9 3.0 1.4 0.2
70 5.9 3.2 4.8 1.8
25 5.0 3.0 1.6 0.2
84 5.4 3.0 4.5 1.5
66 5.6 3.0 4.5 1.5
133 6.3 2.8 5.1 1.5
102 7.1 3.0 5.9 2.1
107 7.3 2.9 6.3 1.8
26 5.0 3.4 1.6 0.4
Now that we have our dataset, let’s make it iterable using DataLoaders.
We have 135 training examples. We use mini-batch training: first, because we don’t want to update the weights only once per full pass over the data, and second (not really applicable here), because with a very large dataset, full-batch training would have to load all inputs at once and we might not have that much RAM.
See this thread : https://discuss.pytorch.org/t/i-run-out-of-memory-after-a-certain-amount-of-batches-when-training-a-resnet18/1911/7
Make sure to clean your data structures after each iteration otherwise your program will run out of memory.
Now, if our batch size is 5, then 135/5 = 27 iterations per epoch.
An epoch means we have gone over our whole dataset once, so one epoch is 27 iterations here. Let’s say we want 10 epochs.
So 10 x 27 = 270 total iterations.
batch_size = 5
n_iters = 270
num_epochs = n_iters/(len(train_data)/batch_size)
num_epochs = int(num_epochs)
train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)
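As a quick sanity check (assuming IrisDataset returns tensors, as written above), we can pull one batch from the loader:
features, labels = next(iter(train_loader))
print(features.shape)   #torch.Size([5, 4])
print(labels.shape)     #torch.Size([5])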
References:
https://github.com/vahidk/EffectivePyTorch
https://pytorch.org/docs/stable/nn.html
First, a bit about PyTorch’s Modules. According to the PyTorch documentation, torch.nn.Module is a Container (a fancy name for a class here, though it could have been a data structure as well) which acts as the base class for all neural network modules. If we are making a custom neural net, we should subclass it.
“Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes.”
When you subclass the Module class, define the __init__() and forward() methods. This lets PyTorch set up the computational graph and then easily call backward() and compute the gradients.
Side note: “Parameters are essentially tensors with requires_grad set to true. It’s convenient to use parameters because you can simply retrieve them all with module’s parameters() method”
class FeedForwardNeuralNet(torch.nn.Module):
def __init__(self, input_dim, hidden_dim, output_dim):
super(FeedForwardNeuralNet, self).__init__()
self.fc1 = torch.nn.Linear(input_dim, hidden_dim)
self.sigmoid = torch.nn.Sigmoid() #This is sigmoid layer, not nn.Functional.sigmoid - that's different
self.fc2 = torch.nn.Linear(hidden_dim, output_dim)
def forward(self, x):
out = self.fc1(x)
out = self.sigmoid(out)
out = self.fc2(out)
return out
input_dim = 4 #dimension of x, or number of features
hidden_dim = 20
output_dim = 3 #number of classes
#Instantiate our model
model = FeedForwardNeuralNet(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim)
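Following the side note above, the Linear submodules assigned in __init__ are registered automatically, and their parameters can be retrieved with parameters()/named_parameters():
for name, p in model.named_parameters():
    print(name, p.shape, p.requires_grad)
#fc1.weight torch.Size([20, 4]), fc1.bias torch.Size([20]),
#fc2.weight torch.Size([3, 20]), fc2.bias torch.Size([3]) - all with requires_grad=True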
Now let’s decide on a loss criterion. Since it’s a classification problem, we can use cross-entropy to compute the loss between our model’s output softmax distribution and the labels.
The cross-entropy loss does two things at the same time: it applies LogSoftmax to the model outputs and then computes the negative log-likelihood (NLLLoss) against the labels, so we don’t need a separate softmax layer in the model.
See this: https://pytorch.org/docs/master/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss
criterion = torch.nn.CrossEntropyLoss()
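A quick numerical check of the claim above that CrossEntropyLoss combines LogSoftmax and NLLLoss (dummy_logits and dummy_labels are made up purely for illustration):
dummy_logits = torch.randn(5, output_dim)
dummy_labels = torch.randint(0, output_dim, (5,))
loss_a = criterion(dummy_logits, dummy_labels)
loss_b = torch.nn.NLLLoss()(torch.nn.functional.log_softmax(dummy_logits, dim=1), dummy_labels)
print(torch.allclose(loss_a, loss_b))   #True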