Learning PyTorch Systematically: Getting Started

PyTorch environment installation and configuration

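I won't go into the basics such as setting up CUDA. Use Docker, use Docker, use Docker: isn't it better to spend the time you would lose fiddling with environments on actually running code? Most of this post comes from the official PyTorch tutorials. I have lightly modified some parts so it can serve as my study notes.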

```python
# install pytorch (Jupyter cell; drop the leading "!" in a plain shell).
# -U pulls the latest release: torch 1.4.0 / torchvision 0.5.0 when this was written.
!pip install -U torch torchvision -i https://mirrors.aliyun.com/pypi/simple
```

```
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Collecting torch
  Using cached https://mirrors.aliyun.com/pypi/packages/1a/3b/fa92ece1e58a6a48ec598bab327f39d69808133e5b2fb33002ca754e381e/torch-1.4.0-cp37-cp37m-manylinux1_x86_64.whl (753.4 MB)
Collecting torchvision
  Using cached https://mirrors.aliyun.com/pypi/packages/1c/32/cb0e4c43cd717da50258887b088471568990b5a749784c465a8a1962e021/torchvision-0.5.0-cp37-cp37m-manylinux1_x86_64.whl (4.0 MB)
Requirement already satisfied, skipping upgrade: six in /app/anaconda3/lib/python3.7/site-packages (from torchvision) (1.12.0)
Requirement already satisfied, skipping upgrade: pillow>=4.1.1 in /app/anaconda3/lib/python3.7/site-packages (from torchvision) (6.2.0)
Requirement already satisfied, skipping upgrade: numpy in /app/anaconda3/lib/python3.7/site-packages (from torchvision) (1.17.2)
Installing collected packages: torch, torchvision
Successfully installed torch-1.4.0 torchvision-0.5.0
```

```python
# verification
from __future__ import print_function
import torch

x = torch.rand(5, 3)
print(x)
```

```
tensor([[0.3890, 0.3672, 0.2697],
        [0.1633, 0.1091, 0.9061],
        [0.0438, 0.5167, 0.5995],
        [0.0546, 0.0019, 0.8384],
        [0.5708, 0.0217, 0.3954]])
```

```python
# check gpu device
import torch
torch.cuda.is_available()
```

```
True
```
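You can also check more than `torch.cuda.is_available()` in one go. The sketch below collects the installed version and the visible GPUs. The helper name `describe_env` is hypothetical. `torch.__version__`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()` are existing PyTorch calls.

```python
import torch


def describe_env():
    """Hypothetical helper: gather basic facts about the local PyTorch install."""
    gpu_count = torch.cuda.device_count()  # 0 on a CPU-only machine
    return {
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": gpu_count,
        # get_device_name is only called for GPUs that actually exist
        "gpu_names": [torch.cuda.get_device_name(i) for i in range(gpu_count)],
    }


env_info = describe_env()
print(env_info)
```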

What is PyTorch?

Tensors

```python
from __future__ import print_function
import torch

x = torch.empty(5, 3)   # uninitialized: contents are whatever was in memory (zeros here by chance)
print(x)
x = torch.rand(5, 3)
print(x)
x = torch.zeros(5, 3, dtype=torch.long)
print(x)
x = torch.tensor([5.5, 3])
print(x)
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)
x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)
print(x.size())
```

```
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[0.5029, 0.7441, 0.5813],
        [0.1014, 0.4897, 0.2367],
        [0.2384, 0.6276, 0.0321],
        [0.9223, 0.4334, 0.9809],
        [0.1237, 0.3212, 0.0656]])
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
tensor([5.5000, 3.0000])
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 0.5468, -0.4615, -0.0450],
        [ 0.5001, -0.9717, -0.6103],
        [-0.5345,  0.1126, -0.0836],
        [-0.5534,  0.5423, -1.1128],
        [-1.3799,  1.3353, -1.6969]])
torch.Size([5, 3])
```

Operations

```python
y = torch.rand(5, 3)
print(x + y)
```

```
tensor([[ 1.0743, -0.4365,  0.7751],
        [ 1.4214, -0.7803, -0.2535],
        [ 0.3591,  0.7957,  0.0637],
        [-0.3185,  0.5621, -0.9368],
        [-0.7098,  1.5445, -1.5394]])
```

```python
print(torch.add(x, y))
```

```
tensor([[ 1.0743, -0.4365,  0.7751],
        [ 1.4214, -0.7803, -0.2535],
        [ 0.3591,  0.7957,  0.0637],
        [-0.3185,  0.5621, -0.9368],
        [-0.7098,  1.5445, -1.5394]])
```

```python
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)
```

```
tensor([[ 1.0743, -0.4365,  0.7751],
        [ 1.4214, -0.7803, -0.2535],
        [ 0.3591,  0.7957,  0.0637],
        [-0.3185,  0.5621, -0.9368],
        [-0.7098,  1.5445, -1.5394]])
```

```python
# Any operation that mutates a tensor in-place is post-fixed with an _.
# For example: x.copy_(y) and x.t_() will change x.
y.add_(x)
print(y)
print(x[:, 1])
```

```
tensor([[ 1.0743, -0.4365,  0.7751],
        [ 1.4214, -0.7803, -0.2535],
        [ 0.3591,  0.7957,  0.0637],
        [-0.3185,  0.5621, -0.9368],
        [-0.7098,  1.5445, -1.5394]])
tensor([-0.4615, -0.9717,  0.1126,  0.5423,  1.3353])
```

```python
# Resizing: if you want to resize/reshape a tensor, you can use .view():
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())
```

```
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
```

```python
# If you have a one-element tensor, use .item() to get the value as a Python number
x = torch.randn(1)
print(x)
print(x.item())
```

```
tensor([0.9993])
0.999320387840271
```
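There are two details about `.view()` worth knowing. First, the result shares storage with the original tensor. Second, it only works when the tensor's memory layout allows it. The sketch below shows both using standard `torch` calls. `.contiguous()` is one way around the second limitation.

```python
import torch

x = torch.arange(16.0).view(4, 4)   # 4x4 float tensor holding 0..15
flat = x.view(16)                   # same storage, new shape
half = x.view(-1, 8)                # -1 is inferred as 16 / 8 = 2

flat[0] = 100.0                     # writing through the view...
print(x[0, 0])                      # ...is visible in the original: tensor(100.)

# A transposed tensor is not contiguous, so .view() on it raises a RuntimeError
t = x.t()
try:
    t.view(16)
    view_failed = False
except RuntimeError:
    view_failed = True

# .contiguous() copies the data into a compatible layout first
flat_t = t.contiguous().view(16)
print(view_failed, flat_t[:4])
```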

NumPy Bridge

```python
# Converting a Torch Tensor to a NumPy array
a = torch.ones(5)
print(a)
b = a.numpy()
print(b)
a.add_(1)
print(a)
print(b)
```

```
tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]
```

```python
# Converting a NumPy array to a Torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)
```

```
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
```

CUDA Tensors

```python
# x is the one-element tensor from the end of the Operations section
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
```

```
tensor([1.9993], device='cuda:0')
tensor([1.9993], dtype=torch.float64)
```
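The NumPy bridge shares memory in both directions. By contrast, `torch.tensor()` always copies its input. The following sketch contrasts the two. `np.shares_memory` is a standard NumPy function.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with `a`
copied = torch.tensor(a)       # independent copy of the data

a[0] = 7.0                     # mutate the NumPy array in place
print(shared)                  # tensor([7., 1., 1.], dtype=torch.float64)
print(copied)                  # tensor([1., 1., 1.], dtype=torch.float64)

back = shared.numpy()          # CPU tensor -> ndarray, again without copying
print(np.shares_memory(a, back))  # True
```

Copying with `torch.tensor()` is the safer choice when the NumPy array will keep changing and the tensor must not follow it.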

AUTOGRAD: AUTOMATIC DIFFERENTIATION

Central to all neural networks in PyTorch is the autograd package. Let’s first briefly visit this, and then we will go on to train our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

Tensor

torch.Tensor is the central class of the package. If you set its .requires_grad attribute to True, it starts to track all operations on it. When you finish your computation, you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its .grad attribute.
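"Accumulated" means gradients are added to whatever `.grad` already holds, not overwritten. That is why training loops zero the gradients before each backward pass. This is a minimal sketch with a single hand-made leaf tensor `w`:

```python
import torch

w = torch.tensor([2.0, 3.0], requires_grad=True)

loss = (w * w).sum()       # d(loss)/dw = 2w = [4, 6]
loss.backward()
first = w.grad.clone()     # [4., 6.]

loss = (w * w).sum()       # rebuild the graph and backprop a second time
loss.backward()
second = w.grad.clone()    # [8., 12.]: the new gradient was added to the old one

w.grad.zero_()             # manual reset; zero_grad() clears .grad for every parameter
print(first, second, w.grad)
```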

```python
x = torch.ones(2, 2, requires_grad=True)
print(x)
y = x + 2
print(y)
print(y.grad_fn)   # an AddBackward0 object
z = y * y * 3
out = z.mean()
print(z, out)
```

```
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
```

```python
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)   # a SumBackward0 object
```

```
False
True
```

Gradients

```python
out.backward()
print(x.grad)
```

```
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
```

```python
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
```

```
tensor([-1138.8549,   484.4676,   417.7082], grad_fn=<MulBackward0>)
```

```python
# y is not a scalar, so backward() needs a vector v; it computes the vector-Jacobian product
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
```

```
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
```

```python
# Stop autograd from tracking history on Tensors with .requires_grad=True
# either by wrapping the code block in `with torch.no_grad():`
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

# or by using .detach() to get a new Tensor with the same content
# that does not require gradients:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())
```

```
True
True
False
True
False
tensor(True)
```
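Both printed gradients can be checked by hand. In the first case, `out` is the mean of the four entries of `z`. In the second case, the `while` loop only multiplies by 2, so `y` is a scalar multiple $2^k$ of `x`, and `y.backward(v)` returns that scalar times `v`. The exponent $k$ depends on the random `x`. The value below is read off this particular run's output.

```latex
\begin{aligned}
z_i &= 3\,(x_i + 2)^2, \qquad out = \frac{1}{4}\sum_{i=1}^{4} z_i \\
\frac{\partial\, out}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2) \\
\left.\frac{\partial\, out}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2}\cdot 3 = 4.5 \\[6pt]
y &= 2^{k} x \;\Rightarrow\; J = \frac{\partial y}{\partial x} = 2^{k} I, \qquad \texttt{x.grad} = J^{\top} v = 2^{k} v \\
2^{k}\cdot 1.0 &= 1024 \;\Rightarrow\; k = 10: \quad 1024 \cdot 0.1 = 102.4, \quad 1024 \cdot 0.0001 = 0.1024
\end{aligned}
```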

Neural networks

A typical training procedure for a neural network is as follows:

Define the neural network that has some learnable parameters (or weights)
Iterate over a dataset of inputs
Process input through the network
Compute the loss (how far is the output from being correct)
Propagate gradients back into the network’s parameters
Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
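The six steps above can be sketched end to end on a toy problem before moving on to a real network. The data here (points on the line y = 3x + 1 plus a little noise) is hypothetical, and a single `nn.Linear(1, 1)` stands in for "the neural network". Each numbered comment maps to a step in the list.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 plus small noise
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)            # shape (64, 1)
targets = 3 * inputs + 1 + 0.05 * torch.randn_like(inputs)

model = nn.Linear(1, 1)                 # 1. a network with learnable parameters
criterion = nn.MSELoss()
learning_rate = 0.1

for epoch in range(200):                # 2. iterate over the dataset (one full batch here)
    output = model(inputs)              # 3. process the input through the network
    loss = criterion(output, targets)   # 4. compute the loss
    model.zero_grad()                   #    clear gradients left from the previous step
    loss.backward()                     # 5. propagate gradients back to the parameters
    with torch.no_grad():               # 6. weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad

print(model.weight.item(), model.bias.item())   # should end up close to 3 and 1
```

The update runs under `torch.no_grad()` so that the parameter change itself is not recorded by autograd.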

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 is the spatial size left from a 32x32 input
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
```

```
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
```

```python
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
```

```
10
torch.Size([6, 1, 3, 3])
```

```python
# feed a random 32x32 input
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
```

```
tensor([[-0.0082, -0.0266,  0.0843,  0.0188,  0.1456, -0.1081, -0.0937,  0.0086,
         -0.0356,  0.0723]], grad_fn=<AddmmBackward>)
```

```python
# Zero the gradient buffers of all parameters and backprop with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
```
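The `16 * 6 * 6 = 576` input size of `fc1` comes from tracking the spatial size of a 32x32 image through the two conv + pool stages. The sketch below does this with the standard output-size formula (dilation 1, floor rounding). The helper names `conv2d_out` and `lenet_flat_features` are hypothetical.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output size of one spatial dim for Conv2d / MaxPool2d (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_flat_features(image_size=32):
    s = conv2d_out(image_size, kernel=3)    # conv1, 3x3:        32 -> 30
    s = conv2d_out(s, kernel=2, stride=2)   # max_pool2d (2, 2): 30 -> 15
    s = conv2d_out(s, kernel=3)             # conv2, 3x3:        15 -> 13
    s = conv2d_out(s, kernel=2, stride=2)   # max_pool2d 2:      13 -> 6 (floor)
    return 16 * s * s                       # conv2 has 16 output channels


print(lenet_flat_features())    # 576, matching in_features of fc1
print(lenet_flat_features(28))  # 400: a 28x28 input would need a different fc1
```

This is also why the network expects 32x32 inputs. A different image size changes the flattened size, and `fc1` would reject it.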

Loss Function

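```python
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
```

```
tensor(1.0206, grad_fn=<MseLossBackward>)
```

```python
# Walk a few steps backwards through the graph
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear (fc3, an addmm)
# input 0 of addmm is fc3.bias, so this is its AccumulateGrad node rather than the ReLU
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])
```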
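With the default `reduction='mean'`, `nn.MSELoss` is simply the mean of the element-wise squared differences. In the sketch below, random tensors stand in for `net(input)` and the dummy target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for net(input)
target = torch.randn(1, 10)   # dummy target with the same shape

by_module = nn.MSELoss()(output, target)        # reduction='mean' by default
by_hand = ((output - target) ** 2).mean()       # mean of squared differences

print(torch.allclose(by_module, by_hand))       # True
```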

Backprop

```python
net.zero_grad()     # zeroes the gradient buffers of all parameters
# (from PyTorch 2.0 on, zero_grad() sets .grad to None by default, so the first print shows None)

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
```

```
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0241, -0.0161, -0.0086, -0.0032,  0.0125,  0.0005])
```

Update the weights

```python
# plain SGD by hand: weight = weight - learning_rate * gradient
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
```

```python
# using an optimizer from torch.optim
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
```
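Plain `optim.SGD` (no momentum, no weight decay) performs exactly the hand-written rule. The sketch below checks this by applying one step each way to two identical copies of a small hypothetical `nn.Linear` model.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_a = nn.Linear(4, 2)
net_b = copy.deepcopy(net_a)                  # identical starting weights
x, target = torch.randn(8, 4), torch.randn(8, 2)
criterion = nn.MSELoss()
lr = 0.01

# One manual step: weight = weight - learning_rate * gradient
net_a.zero_grad()
criterion(net_a(x), target).backward()
with torch.no_grad():
    for p in net_a.parameters():
        p -= lr * p.grad

# The same step through torch.optim
optimizer = optim.SGD(net_b.parameters(), lr=lr)
optimizer.zero_grad()
criterion(net_b(x), target).backward()
optimizer.step()

same = all(torch.allclose(pa, pb) for pa, pb in zip(net_a.parameters(), net_b.parameters()))
print(same)  # True
```

With `momentum` or `weight_decay` set, the optimizer's update differs from this simple rule.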

Training a classifier

What about data?

When you have to deal with image, text, audio or video data, you can use standard python packages that load data into a numpy array. Then you can convert this array into a torch.*Tensor.

For images, packages such as Pillow and OpenCV are useful
For audio, packages such as scipy and librosa are useful
For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful
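Image loaders typically return an array of shape height x width x channels with `uint8` values in [0, 255], while `nn.Conv2d` expects float tensors laid out as batch x channels x height x width. The sketch below does that conversion by hand, which is what `transforms.ToTensor()` does for such images. The tiny 2x3 "image" is hypothetical. OpenCV's `cv2.imread` returns channels in BGR order, which you may need to reverse first.

```python
import numpy as np
import torch

# Hypothetical 2x3 RGB image, HWC layout, uint8 values in [0, 255]
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 10

tensor = torch.from_numpy(img)      # shares memory, still uint8, HWC
tensor = tensor.permute(2, 0, 1)    # -> CHW, the layout nn.Conv2d expects
tensor = tensor.float().div(255.0)  # -> float32 in [0, 1]
batch = tensor.unsqueeze(0)         # -> NCHW with N = 1

print(batch.shape)  # torch.Size([1, 3, 2, 3])
```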

Training an image classifier

Load and normalize the CIFAR10 training and test datasets using torchvision
Define a Convolutional Neural Network
Define a loss function
Train the network on the training data
Test the network on the test data

Loading and normalizing CIFAR10

```python
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
```

```
Files already downloaded and verified
Files already downloaded and verified
```

```python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np


# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)   # next(); dataiter.next() fails on newer PyTorch releases

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
```

```
  cat   cat  deer  ship
```
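`transforms.Normalize(mean, std)` computes `(input - mean) / std` per channel. With mean = std = 0.5, this maps the [0, 1] output of `ToTensor()` onto [-1, 1], and the `img / 2 + 0.5` line in `imshow` undoes it. A quick numeric check with a few sample pixel values:

```python
import torch

mean, std = 0.5, 0.5
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])   # values as produced by ToTensor()

normalized = (pixels - mean) / std              # what Normalize computes per channel
restored = normalized / 2 + 0.5                 # the "unnormalize" step in imshow

print(normalized)                    # values: -1, -0.5, 0, 1
print(torch.equal(restored, pixels)) # True
```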

Define a Convolutional Neural Network

```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB), 5x5 kernels
        self.pool = nn.MaxPool2d(2, 2)         # shared 2x2 max pooling, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # one score per CIFAR10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
```
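The `16 * 5 * 5` in `fc1` follows from the usual output-size formula for convolution and pooling (dilation 1, floor rounding), applied to a 32x32 CIFAR10 image:

```latex
\begin{aligned}
H_{\text{out}} &= \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1 \\
32 &\xrightarrow{\text{conv1},\;k=5} 28 \xrightarrow{\text{pool},\;k=s=2} 14 \xrightarrow{\text{conv2},\;k=5} 10 \xrightarrow{\text{pool},\;k=s=2} 5 \\
\text{fc1 inputs} &= 16 \times 5 \times 5 = 400
\end{aligned}
```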

Define a Loss function and optimizer

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
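`nn.CrossEntropyLoss` combines `log_softmax` and the negative log-likelihood loss. That is why the network's `forward` returns raw scores with no softmax at the end. The sketch below checks this on random logits. The batch of labels just mirrors the shape the DataLoader yields.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores: batch of 4, 10 classes
labels = torch.tensor([3, 8, 8, 0])   # integer class indices

ce = nn.CrossEntropyLoss()(logits, labels)

# Same value in two steps: log-softmax over classes, then negative log-likelihood
two_step = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# And fully by hand: the average of -log p(correct class)
by_hand = -F.log_softmax(logits, dim=1)[torch.arange(4), labels].mean()

print(torch.allclose(ce, two_step), torch.allclose(ce, by_hand))  # True True
```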

Train the network

```python
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```

```
[1,  2000] loss: 2.203
[1,  4000] loss: 1.875
[1,  6000] loss: 1.680
[1,  8000] loss: 1.563
[1, 10000] loss: 1.480
[1, 12000] loss: 1.474
[2,  2000] loss: 1.397
[2,  4000] loss: 1.365
[2,  6000] loss: 1.350
[2,  8000] loss: 1.321
[2, 10000] loss: 1.302
[2, 12000] loss: 1.300
Finished Training
```

```python
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

dataiter = iter(testloader)
images, labels = next(dataiter)   # dataiter.next() fails on newer PyTorch releases

# print images
imshow(torchvision.utils.make_grid(images))
print(labels)
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
```

```
tensor([3, 8, 8, 0])
GroundTruth:    cat  ship  ship plane
```

```python
net = Net()
net.load_state_dict(torch.load(PATH))

outputs = net(images)
print(outputs)

_, predicted = torch.max(outputs, 1)
print(predicted)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
```

```
tensor([[-8.1609e-01, -7.3227e-01,  1.9178e-01,  1.9166e+00, -9.6811e-01,
          8.3362e-01,  1.1127e+00, -1.4818e+00,  3.7150e-01, -7.1563e-01],
        [ 7.2351e+00,  4.0154e+00,  1.7224e-03, -2.3247e+00, -2.0117e+00,
         -4.3768e+00, -4.1172e+00, -4.6825e+00,  8.7625e+00,  2.0940e+00],
        [ 3.9245e+00,  1.3894e+00,  6.4428e-01, -1.0531e+00, -8.4998e-01,
         -2.2757e+00, -2.7469e+00, -2.2073e+00,  4.4428e+00,  6.6101e-01],
        [ 4.5622e+00, -6.9576e-02,  1.1598e+00, -8.8092e-01,  9.0635e-01,
         -2.1905e+00, -1.8022e+00, -2.2323e+00,  4.0340e+00, -7.8086e-01]],
       grad_fn=<AddmmBackward>)
tensor([3, 8, 8, 0])
Predicted:    cat  ship  ship plane
```

```python
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
```

```
Accuracy of the network on the 10000 test images: 55 %
```

```python
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):  # 4 == batch_size; the 10000 test images split evenly into batches of 4
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
```

```
Accuracy of plane : 66 %
Accuracy of   car : 59 %
Accuracy of  bird : 36 %
Accuracy of   cat : 38 %
Accuracy of  deer : 56 %
Accuracy of   dog : 27 %
Accuracy of  frog : 73 %
Accuracy of horse : 59 %
Accuracy of  ship : 72 %
Accuracy of truck : 63 %
```
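The per-class loop above hard-codes `range(4)`, which is the batch size. It only works because the 10000 test images split evenly into batches of 4. The sketch below is a vectorized alternative that does not depend on the batch size. It uses a hypothetical helper `per_class_accuracy` that counts totals and hits per class with `torch.bincount`.

```python
import torch


def per_class_accuracy(predicted, labels, num_classes=10):
    """Hypothetical helper: per-class accuracy from 1-D tensors of class indices."""
    correct = predicted == labels
    total = torch.bincount(labels, minlength=num_classes)           # samples per true class
    hits = torch.bincount(labels[correct], minlength=num_classes)   # correct ones per class
    # float division: a class that never appears gives 0/0 = nan instead of an error
    return hits.float() / total.float()


# Usage on a hypothetical mini-batch
labels = torch.tensor([0, 0, 1, 2, 2, 2])
predicted = torch.tensor([0, 1, 1, 2, 0, 2])
acc = per_class_accuracy(predicted, labels, num_classes=4)
print(acc)  # class 0: 1/2, class 1: 1/1, class 2: 2/3, class 3: nan (absent)
```

In the test loop you would collect `predicted` and `labels` from every batch (for example with `torch.cat`) and call the helper once at the end.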

Training on GPU

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

net.to(device)
# Build the optimizer after moving the model, as the torch.optim docs advise;
# the old optimizer's momentum buffers were created on the CPU.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader):
        # get the inputs and move them to the same device as the model
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```

```
cuda:0
[1,  2000] loss: 1.195
[1,  4000] loss: 1.204
[1,  6000] loss: 1.204
[1,  8000] loss: 1.188
[1, 10000] loss: 1.207
[1, 12000] loss: 1.228
[2,  2000] loss: 1.191
[2,  4000] loss: 1.209
[2,  6000] loss: 1.202
[2,  8000] loss: 1.209
[2, 10000] loss: 1.214
[2, 12000] loss: 1.203
Finished Training
```

```python
# Note: this reloads the checkpoint saved before GPU training, so the accuracy
# below is that of the earlier CPU-trained weights, not of the GPU-trained `net`.
net = Net()
net.load_state_dict(torch.load(PATH))

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
```

```
Accuracy of the network on the 10000 test images: 55 %
```
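To score the weights that actually live on the GPU, the evaluation loop has to move each batch to the model's device. A device-agnostic sketch follows. The helper name `evaluate` is hypothetical. With the tutorial's objects it would be called as `evaluate(net, testloader, device)`, using the model in memory instead of reloading the checkpoint.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, device):
    """Hypothetical helper: top-1 accuracy (%) of `model` over `loader`, run on `device`."""
    model.to(device)
    model.eval()   # no effect for this Net, but needed for models with dropout/batch norm
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total


# Usage with a hypothetical toy model whose prediction is the index of the larger input
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
toy_model = nn.Linear(2, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))
    toy_model.bias.zero_()
toy_loader = [(torch.tensor([[1.0, 0.0], [0.0, 1.0]]), torch.tensor([0, 0]))]
toy_acc = evaluate(toy_model, toy_loader, device)
print(toy_acc)  # 50.0: the first sample is right, the second is wrong
```

The helper leaves the model in eval mode. Call `model.train()` before resuming training.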

Training on multiple GPUs

```python
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # DataParallel splits each batch along dim 0 across the visible GPUs
    net = nn.DataParallel(net)

net.to(device)
# `net` was re-created from the checkpoint above, so the previous optimizer still points
# at the old model's parameters; build one for the current `net`. The log below was
# recorded without this line, which likely explains why the loss never drops.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(4):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader):
        # get the inputs and move them to the same device as the model
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```

```
Let's use 2 GPUs!
[1,  2000] loss: 1.202
[1,  4000] loss: 1.197
[1,  6000] loss: 1.202
[1,  8000] loss: 1.194
[1, 10000] loss: 1.211
[1, 12000] loss: 1.210
[2,  2000] loss: 1.208
[2,  4000] loss: 1.184
[2,  6000] loss: 1.215
[2,  8000] loss: 1.198
[2, 10000] loss: 1.201
[2, 12000] loss: 1.208
[3,  2000] loss: 1.206
[3,  4000] loss: 1.209
[3,  6000] loss: 1.198
[3,  8000] loss: 1.206
[3, 10000] loss: 1.209
[3, 12000] loss: 1.208
[4,  2000] loss: 1.203
[4,  4000] loss: 1.206
[4,  6000] loss: 1.203
[4,  8000] loss: 1.187
[4, 10000] loss: 1.220
[4, 12000] loss: 1.207
Finished Training
```
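`nn.DataParallel` stores the original model as its `.module` attribute. As a result, every key in the wrapper's `state_dict()` gains a `module.` prefix. Saving `net.module.state_dict()` instead keeps checkpoints loadable into a plain, unwrapped model. The sketch below uses a small hypothetical `nn.Linear`. On a CPU-only machine, `DataParallel` simply wraps the module.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)
wrapped = nn.DataParallel(model)

print(list(wrapped.state_dict().keys()))         # ['module.weight', 'module.bias']
print(list(wrapped.module.state_dict().keys()))  # ['weight', 'bias']

# A checkpoint taken from the inner module loads straight into an unwrapped model
plain = nn.Linear(3, 2)
plain.load_state_dict(wrapped.module.state_dict())
```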
