PyTorch notes

1. tensor

1.1. init

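Created from the base class or function: build the tensor, convert the dtype if needed, and call requires_grad_() last. The dtype is inferred from the data (Python ints give int64, floats give float32), and only floating-point tensors can require gradients.
- torch.Tensor: the class; can be used to initialize an empty (uninitialized) tensor
- torch.tensor: the function; used to initialize a tensor from existing data
a = torch.tensor([1.,2,3], requires_grad=True)
a = torch.FloatTensor([1,2,3]).requires_grad_()
Created by other factory functions: these default to FloatTensor (float32) with requires_grad=False.

a = torch.zeros(2, 3).float()  # shape is illustrative; zeros() already returns float32

Created from SciPy
Converted from NumPy: torch.from_numpy (shares memory with the array), torch.FloatTensor, and so on. A value that passes through NumPy in the middle of a computation cannot be brought back into the graph this way.
Converted from another tensor: likewise, re-wrapping an intermediate tensor with Variable in the middle of a computation does not work.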
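
A minimal sketch of the creation rules above, assuming a CPU-only setup with NumPy installed. The variable names are illustrative.

```python
import numpy as np
import torch

# dtype is inferred from the data: Python ints give int64, floats give float32.
ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1., 2, 3], requires_grad=True)

# Factory functions default to float32 with requires_grad=False.
zeros = torch.zeros(2, 3)

# torch.from_numpy shares memory with the source array.
arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
shared = torch.from_numpy(arr)
arr[0] = 10.0  # the change is visible through the tensor

print(ints.dtype, floats.dtype, zeros.dtype, zeros.requires_grad, shared)
```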

1.2. attributes

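.shape
.size()
.type(): shows which tensor class the tensor belongs to (e.g. torch.FloatTensor)

.shape and .size() return the same torch.Size.
.layout: whether the tensor is dense (torch.strided) or sparse (torch.sparse_coo). A sparse-by-dense product (spmm) gives a dense result; convert a sparse tensor with .to_dense().
.dtype: the element type (float is mainly used for computation, long for indexing; convert with .float() or .type_as(xx)).
.device: where the tensor lives, e.g. cuda (operands must be on the same device; convert with .cuda()).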
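
The attributes above can be inspected as in this small sketch. It falls back to the CPU when no GPU is present; the names idx_f and device are only illustrative.

```python
import torch

x = torch.zeros(2, 3)
idx = torch.tensor([0, 2])          # integer data gives torch.int64 (long)

print(x.shape, x.size())            # both are torch.Size([2, 3])
print(x.type(), x.dtype)            # tensor class name and element type
print(x.layout, x.device)

# Convert the index tensor to the dtype of x before doing arithmetic with it.
idx_f = idx.type_as(x)

# Move to the GPU only if one is available, so all operands share one device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
y = x.to(device) + idx_f.to(device).sum()
```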

1.3. methods

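1.3.1. properties of the tensor itself (max, etc.)

Maximum
- .max()
  - With a dim argument it returns a pair: [0] is the values, [1] is the indices
  - Source: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.clamp
Get a plain Python number
- .item()
Coordinates of nonzero elements
- torch.nonzero()
  - Returns the coordinates of the nonzero elements as a tensor of shape (number of nonzeros, input.dim())
  - Plays the role of np.where(): torch.nonzero(x, as_tuple=True) matches np.where(x), and the default form matches np.argwhere(x)
  - The coordinates are exact, e.g. [[1,2],[2,5]]
Sort
- torch.sort()
  - Returns the sorted values together with their indices; both have the same shape as the input
Number of elements
- torch.numel()
  - Returns the total number of elements in a tensor, i.e. the number of entries in the matrix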
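
A short sketch of these calls on a small example matrix (the values are made up for illustration):

```python
import torch

x = torch.tensor([[0., 3., 1.],
                  [2., 0., 5.]])

values, indices = x.max(dim=1)   # per-row maximum and its column index
largest = x.max().item()         # global maximum as a Python float

coords = torch.nonzero(x)        # shape (num_nonzero, x.dim())
rows, cols = torch.nonzero(x, as_tuple=True)  # same layout as np.where(x != 0)

sorted_vals, order = torch.sort(x, dim=1)     # ascending along each row
count = x.numel()                # total number of elements
```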

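1.3.2. taking things from a tensor (selecting a part)

Select by index
- [index_list,:]
Select by index (same as above)
- torch.index_select(x, 1, indices)
  - Source: https://blog.csdn.net/jacke121/article/details/83044660
Select by mask; the output does not keep the original shape (it is 1-D)
- torch.masked_select(input, mask)
- Create the mask with a comparison such as x > 0, which already gives a boolean tensor (torch.ByteTensor masks are deprecated)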
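
A minimal sketch of the three selection styles above, using a made-up 3x4 tensor:

```python
import torch

x = torch.arange(12.).view(3, 4)
idx = torch.tensor([0, 2])

rows_a = x[idx, :]                        # rows 0 and 2
rows_b = torch.index_select(x, 0, idx)    # same rows via index_select
cols = torch.index_select(x, 1, idx)      # columns 0 and 2

mask = x > 5                              # boolean mask, same shape as x
picked = torch.masked_select(x, mask)     # 1-D result with the selected values
```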

Select or modify by condition

torch.where

Example: set every entry of matrix x that is greater than 5 to 5

import torch
x = torch.linspace(1, 27, steps=27).view(9, 3)  
bbb = torch.where(x > 5, torch.full_like(x, 5), x)

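Select by index, row by row or column by column (the output can be longer than the input along dim)
- torch.gather(xx, 1, torch.LongTensor([index]))
  Take xx with shape (2, 3). With dim=0, out[i][j] = xx[index[i][j]][j]: each entry of index picks a row for its own column, so index has shape (?, 3) (unfortunately the other dimensions have to line up). With dim=1, out[i][j] = xx[i][index[i][j]]: each entry picks a column for its own row, so index has shape (2, ?).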
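
The gather rule is easier to see on a concrete case. This sketch uses a made-up (2, 3) tensor and index tensors that are longer than the input along the gathered dimension:

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])          # shape (2, 3)

# dim=0: out[i][j] = x[index[i][j]][j]; each entry picks a row for column j.
idx0 = torch.tensor([[0, 1, 0],
                     [1, 0, 1],
                     [1, 1, 1]])       # 3 rows: longer than x along dim 0
out0 = torch.gather(x, 0, idx0)

# dim=1: out[i][j] = x[i][index[i][j]]; each entry picks a column for row i.
idx1 = torch.tensor([[2, 0, 0, 1],
                     [1, 1, 2, 0]])    # 4 columns: longer than x along dim 1
out1 = torch.gather(x, 1, idx1)
```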

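1.3.3. changing the tensor itself (permute, resize, etc.)

Permuting dimensions
- .permute(1,0,2)
  - The original is unchanged. The old dim 1 becomes dim 0; likewise 0→1, 2→2
- .transpose(0,1)
  - The original is unchanged as well (the in-place variant is .transpose_()). It can only swap two dimensions
Resizing
- .resize_()
  - The original is modified
- .view()
  - The original keeps its shape, but the view shares its data, so writing to the view changes the original
- .reshape()
  - May or may not share data with the original
  - .reshape() = .contiguous().view() when a plain view is not possible
- .expand_as(a)
  - The original is unchanged; the result is a view made without copying
  - Only dimensions of size 1 can be expanded: the single entry is repeated along that dimension
  - Example: a (4,1) tensor with .expand_as() of a (4,2) tensor
Removing or adding dimensions of size 1
- .squeeze()
- .unsqueeze()
Sort
- .sort()
  - [0] is the sorted values
  - [1] is the indices
Making memory contiguous (used after other transformations)
- .contiguous()
  - view can only be used on a contiguous tensor. If transpose, permute, etc. were applied before view, call contiguous() to get a contiguous copy.
  - One possible explanation: some tensors do not occupy one block of memory in the order their shape implies, while view() relies on that layout; contiguous() rewrites the tensor into contiguous memory.
  - Check with torch.Tensor.is_contiguous().
Rounding
- .ceil_()
  - Rounds up in place (.round_() rounds to the nearest integer)
Clamping a tensor to a range
- torch.clamp(loss, min=0)
  - Forces every element into the given range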
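
A small sketch of which of these operations share data with the original and when contiguous() is needed. The tensors are made up for illustration.

```python
import torch

x = torch.arange(6.).view(2, 3)

t = x.transpose(0, 1)            # a view: x keeps its shape (2, 3)
p = x.permute(1, 0)              # same result as transpose for 2-D input

v = x.view(3, 2)                 # shares storage with x
v[0, 0] = 100.                   # so this write is visible through x and t

# view() needs contiguous memory, and transposing breaks that here.
print(t.is_contiguous())         # False
flat = t.contiguous().view(-1)   # works; t.view(-1) would raise an error
flat2 = t.reshape(-1)            # reshape falls back to a copy in this case

col = torch.tensor([[1.], [2.], [3.], [4.]])   # shape (4, 1)
wide = col.expand_as(torch.zeros(4, 2))        # shape (4, 2), no data copy

clamped = torch.clamp(torch.tensor([-1., 0.5, 2.]), min=0)
```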

1.3.4. filling a tensor

Set everything to 0
- .zero_()
Fill with a value
- .fill_()
Insert tensors at given indices
- index_copy_(dim, index, tensor)
More complex scatter
- .scatter_(dim, index, src)
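
A hedged sketch of the in-place fill operations. The one-hot construction is just one common use of scatter_, picked here as an illustration.

```python
import torch

x = torch.zeros(3, 4)
x.fill_(7.)                                   # every element becomes 7
x.zero_()                                     # back to all zeros

# index_copy_: copy the rows of src into rows 2 and 0 of x (in that order).
src = torch.tensor([[1., 1., 1., 1.],
                    [2., 2., 2., 2.]])
x.index_copy_(0, torch.tensor([2, 0]), src)

# scatter_(dim, index, src) with dim=1: self[i][index[i][j]] = src[i][j].
# Here it builds one-hot rows from class labels.
labels = torch.tensor([[1], [3], [0]])
one_hot = torch.zeros(3, 4).scatter_(1, labels, torch.ones(3, 1))
```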

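1.3.5. getting a new tensor

Copying a tensor
- .copy_() / .clone()
  - clone() is recorded in the computation graph: gradients propagating to the cloned tensor will propagate to the original tensor. x.copy_(src) copies the values of src into x in place. clone() copies the data but stays attached to the graph (a shallow copy as far as autograd is concerned); clone() + detach() gives a fully independent deep copy.
  - Source: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.clamp
Repeating along dimensions
- .repeat()
  - Example: repeat(2,5) repeats 2 times along the first dimension and 5 times along the second
Concatenation
- torch.stack(sequence, dim=0, out=None)
- torch.cat()
  - Both join tensors. sequence is a list of tensors and dim is the dimension to join along. stack is not the same as concatenate: torch's concatenate is torch.cat, which joins along an existing dimension, while stack creates a new dimension and joins along it. cat has an obvious geometric picture; stack does not need one.
Elementwise equality, same shape, 1/0 (True/False)
- .eq() (in place: .eq_())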
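
The difference between clone and detach, and between cat and stack, shows up directly in gradients and shapes. A minimal sketch with made-up tensors:

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)

c = a.clone()                 # new memory, still connected to a in the graph
c.sum().backward()            # the gradient flows through the clone back to a
d = a.clone().detach()        # new memory and cut off from the graph

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[1, 0], [3, 0]])

tiled = x.repeat(2, 5)                # shape (2*2, 2*5) = (4, 10)
joined = torch.cat([x, y], dim=0)     # existing dim grows: (4, 2)
stacked = torch.stack([x, y], dim=0)  # new leading dim: (2, 2, 2)
same = x.eq(y)                        # boolean tensor, same shape as x
```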

1.3.6. interactions between tensors

Matrix multiplication
- torch.mm(tensor, tensor)
  - 2-D only
- torch.matmul()
  - Works on tensors with more dimensions
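
A quick shape check for the two functions, assuming random inputs of illustrative sizes:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
two_d = torch.mm(a, b)                 # (2, 3) x (3, 4) gives (2, 4)

batch_a = torch.randn(5, 2, 3)
batched = torch.matmul(batch_a, b)     # b is broadcast over the batch: (5, 2, 4)
```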

1.3.7. tensorboard

tensorboard
- tensorboardX.SummaryWriter()
- writer.add_scalars()

1.4. transform

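To NumPy:

.numpy() (dense tensors only); take .data (or .detach()) first if the tensor requires grad
To SciPy:

scipy.sparse.coo_matrix(x.numpy())
Pitfalls
- Can be added to a constant
- Cannot be added to a tensor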

1.5. gradient

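Def: the gradient graph resembles a binary tree; only leaf nodes keep .grad
Def: scalar.backward()

Differentiates with respect to every node in the tree. If the output to differentiate is a k-dimensional vector, the argument passed to backward() is a k-dimensional weight vector; the gradient stored at a node has that node's shape.

Example:

If the node x has n dimensions, with weights (1, 2) the result is \(\left[\frac{\partial a}{\partial x_1} + 2\frac{\partial b}{\partial x_1}, \ldots, \frac{\partial a}{\partial x_n} + 2\frac{\partial b}{\partial x_n}\right]\)
res = torch.stack([a, b])

res.backward(torch.FloatTensor([1, 2]))
Def: vector backward()

Equivalent to many scalar backward() calls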
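
A worked instance of the weighted backward call, as a sketch. The functions a and b are made up so that the result can be checked by hand.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
a = (x ** 2).sum()            # da/dx = 2x
b = x.sum()                   # db/dx = 1

res = torch.stack([a, b])     # non-scalar output with k = 2 entries
res.backward(torch.tensor([1., 2.]))   # weights: 1 for a, 2 for b

# x.grad[i] = 1 * da/dx_i + 2 * db/dx_i = 2 * x_i + 2
print(x.grad)
```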
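Def: register_hook
- On a module
  - forward hook: receives the input and the output. In these notes it is read-only with no return value (recent PyTorch versions also accept a returned replacement output).
  - backward hook: receives grad_input and grad_output and may return a new grad_input as a modification. (Note that the legacy hook only attaches to the last operation in the module, and operations are split up; Linear, for instance, is a multiply followed by an add. For a/4 there are two inputs: the gradient of a, and the gradient of 4, which is None.) (The operation does not have to be a torch function; sum and similar reductions work too.) grad_input is the gradient of the final result with respect to the input, and grad_output is the gradient with respect to the output.
- On a tensor: the hook receives the gradient of that node and returns your modified gradient with the same shape.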
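
A minimal sketch of a tensor hook and a module forward hook. The lists seen and shapes are illustrative helpers, not part of any API.

```python
import torch

# Tensor hook: receives the gradient of that node; may return a replacement.
x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3                      # non-leaf, so y.grad is not kept by default
seen = []
y.register_hook(lambda g: seen.append(g.clone()))   # observe only
x.register_hook(lambda g: g * 10)                   # rescale x's gradient
y.sum().backward()

# Module forward hook: called with (module, input, output) after forward().
lin = torch.nn.Linear(2, 1)
shapes = []
handle = lin.register_forward_hook(
    lambda module, inp, out: shapes.append((inp[0].shape, out.shape)))
lin(torch.zeros(4, 2))
handle.remove()                # detach the hook when it is no longer needed
```

The tensor hook is the way to read the gradient of an intermediate value such as y, whose .grad is otherwise discarded.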
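Overall
- For a tensor: use backward() + .grad, which is exact. Alternatively register a hook on the tensor (register_hook), which gives the gradient for any variable that can be differentiated
- For a module: the gradient is taken at its last operation.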

Details:

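On types:
- Only floating-point tensors can be differentiated
- Arithmetic on a LongTensor is risky; do it in float.
On in-place operations:
- Never apply an in-place operation to a leaf node that requires grad
- For a leaf node, requires_grad_() only works as the last step: change the values first, then make the tensor differentiable
  x = torch.tensor([2, 1.]).view(2, 1).requires_grad_()
Never change values in the middle of a computation
- With c = a*b, neither a nor b may be changed afterwards. The same goes for c = z**2 followed by z[0] = z[0]**2: the gradient for z[0] needs the value of z[0], but that value has changed. The gradient is computed for z as a whole from the recorded value of z; the modified value is not recorded separately.
- With c = a*b, if c is changed afterwards, the gradients of a and b are both 0
On calling backward twice: pass retain_graph=True (formerly retain_variables=True) to the first call so that a second backward() can run. Different outputs have different graphs; without it, backward cannot be run twice on the same output.
On accumulation: backward() accumulates into .grad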
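
The accumulation and in-place rules can be checked with a short sketch. The tensors here are made up; the flag failed only records whether the expected error occurred.

```python
import torch

x = torch.tensor([2., 1.]).view(2, 1).requires_grad_()   # leaf, set last
loss = (x ** 2).sum()

loss.backward(retain_graph=True)   # keep the graph for a second pass
first = x.grad.clone()             # 2 * x
loss.backward()                    # gradients accumulate into x.grad
second = x.grad.clone()            # 4 * x

x.grad.zero_()                     # reset before the next backward pass

# Overwriting a value that backward() still needs raises a RuntimeError.
z = torch.tensor([1., 2.], requires_grad=True) * 1   # non-leaf copy
c = z ** 2                         # the backward of ** needs the old z
z[0] = 5.                          # in-place change after z was recorded
try:
    c.sum().backward()
    failed = False
except RuntimeError:
    failed = True
```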

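On copying: .clone() does not share memory but stays in the graph; think of it as a double. .detach() shares memory but is removed from the graph; think of it as taking the value out to use on its own. Sharing memory is rarely needed. .numpy() leaves autograd entirely. Re-wrapping an existing tensor with Variable or a tensor constructor is an odd usage; avoid it where possible.

w = lin.weight
w4 = torch.FloatTensor(lin.weight).requires_grad_()
wv = torch.autograd.Variable(w, requires_grad=True)
w1 = lin.weight.detach().requires_grad_()
## Shares values, not in the graph (starts a new node).

w3 = lin.weight.clone().detach_().requires_grad_()
## Neither shares values nor is in the graph (the safest fresh start).

w2 = lin.weight.clone().requires_grad_()
w5 = torch.zeros_like(lin.weight).copy_(lin.weight).requires_grad_()
## Does not share values (backward updates the grad of the source, not its own) but is in the graph (best avoided).

.data.fill_()
## .data is peculiar: it behaves like detach(), but it allows in-place changes to a leaf variable. Operations that would normally raise an error do not raise here.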
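
Whether a copy shares memory or stays in the graph can be verified directly. A minimal sketch, assuming lin is a small nn.Linear created only for this check:

```python
import torch

lin = torch.nn.Linear(2, 2)

w1 = lin.weight.detach()            # same storage, outside the graph
w3 = lin.weight.clone().detach()    # separate storage, outside the graph
w2 = lin.weight.clone()             # separate storage, inside the graph

shares = w1.data_ptr() == lin.weight.data_ptr()
copied = w3.data_ptr() != lin.weight.data_ptr()

# Gradients that reach w2 flow on to lin.weight, not to a .grad of w2.
w2.sum().backward()
print(shares, copied, w2.is_leaf, lin.weight.grad)
```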

2. parameter/variable

2.1. parameter & variable difference

Note: nn.Parameter is a subclass of torch.Tensor (and, before Variable was merged into Tensor, of autograd.Variable), so most behaviors are the same.

The most important difference is that when an nn.Parameter is assigned as an attribute in an nn.Module's constructor, it is added to the module's parameters, just as nn.Module attributes are registered as submodules.

Example:

import torch

class MyModule(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.variable = torch.autograd.Variable(torch.Tensor([5]))
    self.parameter = torch.nn.Parameter(torch.Tensor([10]))

net = MyModule()
for param in net.parameters():
  print(param)

self.variable does not show up, only self.parameter. So if we create an optimizer with net.parameters() as its first argument and call optimizer.step(), only self.parameter is optimized automatically.

output:

Parameter containing:
tensor([10.], requires_grad=True)

# module

2.2. method

Apply f to every submodule (and to the module itself)

.apply(f)
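
A typical use of apply is weight initialization. This is a sketch; init_weights and the constant 0.5 are hypothetical choices for illustration.

```python
import torch
from torch import nn

def init_weights(module):
    # Called once for every submodule and finally for the root module.
    if isinstance(module, nn.Linear):
        nn.init.constant_(module.weight, 0.5)
        nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))
net.apply(init_weights)
```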

3. nn & F

3.1. nn methods

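Building a pipeline: either assign the submodules as attributes in __init__ or call add_module()
BatchNorm layers
- nn.BatchNorm1d(x.size()[1]).cuda()
  - Call torch.manual_seed(), otherwise results vary from run to run; set it right after the imports
  - BatchNorm1d() = statistics per feature (dim 1), averaged over the remaining dimensions; for an (N, C) input that is the mean over the batch, sum_n a[n][c] / N
- BatchNorm2d() = statistics per channel (dim 1) of an (N, C, H, W) input, averaged over N, H and W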
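
To see which dimensions BatchNorm averages over, the layer output can be compared with a manual computation. A sketch assuming a freshly constructed layer in training mode (weight 1, bias 0):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)            # (N, C, H, W)

bn = nn.BatchNorm2d(x.size(1))         # one mean/variance pair per channel
out = bn(x)                            # training mode: uses batch statistics

# Manual version: average over every dim except the channel dim (dim 1).
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)
```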
LSTM model
- nn.LSTM()
  - Explanation: builds the network from the number of input features input_size, the number of output features hidden_size, and the number of layers num_layers
  - Input format: (seq_len, batch, input_size)
  - h0(num_layers * num_directions, batch, hidden_size) c0(num_layers * num_directions, batch, hidden_size)
  - Output format: output(seq_len, batch, hidden_size * num_directions)
  - hn(num_layers * num_directions, batch, hidden_size) cn(num_layers * num_directions, batch, hidden_size)
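
The shapes listed above can be confirmed with a small bidirectional LSTM. The sizes are arbitrary and chosen only for this sketch.

```python
import torch
from torch import nn

seq_len, batch, input_size, hidden_size, num_layers = 7, 4, 10, 16, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
num_directions = 2                     # because bidirectional=True

x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(num_layers * num_directions, batch, hidden_size)

output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape, hn.shape, cn.shape)
```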
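Conv layers
- nn.Conv1d, input (batch, m, n)
  - Explanation: the kernel slides along n, covering kernel_size columns at a time (the actual kernel has shape [m, kernel_size], so it spans all m input channels)
  - Output vector = [number of positions the kernel can slide to, 1]; filters = how many output vectors are produced
  - Output = [filters, output vector]
- nn.Conv2d, input (batch, c, m, n)
  - Explanation: the kernel slides over m and n with kernel_size = [x, x] (the actual kernel has shape [c, x, x])
  - Output map = [positions along m, positions along n] (note that all c layers are summed together); filters = how many output maps are produced
  - Output = [filters, output map]
- nn.Conv3d, input (batch, m, n, c, d)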
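
A shape check for the 1-D and 2-D cases, as a sketch with arbitrary sizes, default stride 1 and no padding:

```python
import torch
from torch import nn

x1 = torch.randn(8, 6, 20)                       # (batch, channels, length)
conv1 = nn.Conv1d(in_channels=6, out_channels=4, kernel_size=3)
y1 = conv1(x1)                                   # length: 20 - 3 + 1 = 18

x2 = torch.randn(8, 3, 32, 32)                   # (batch, c, height, width)
conv2 = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3)
y2 = conv2(x2)                                   # 32 - 3 + 1 = 30 per side

# Each filter spans all input channels: (filters, in_channels, kernel dims).
print(conv1.weight.shape, y1.shape)
print(conv2.weight.shape, y2.shape)
```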
Putting non-parameters into the state_dict
- nn.Module.register_buffer()
  - Explanation: use it when you want a stateful part of your model that is not a parameter but should still be in your state_dict
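
A minimal sketch of a buffer next to a parameter. RunningMean is a hypothetical module written only for this illustration.

```python
import torch
from torch import nn

class RunningMean(nn.Module):
    # Hypothetical module that stores a mean without learning it.
    def __init__(self, size):
        super().__init__()
        self.register_buffer("mean", torch.zeros(size))   # saved, not trained
        self.scale = nn.Parameter(torch.ones(size))       # saved and trained

    def forward(self, x):
        return (x - self.mean) * self.scale

m = RunningMean(3)
param_names = [name for name, _ in m.named_parameters()]
state_names = sorted(m.state_dict().keys())
```

The buffer appears in state_dict() and moves with .cuda() or .to(), but an optimizer built from m.parameters() never updates it.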

3.2. F methods

Cross entropy
- F.cross_entropy(pred, label)
  - Explanation: pred is [batchsize, labels] and label is [batchsize]. reduction='mean' averages the loss over the batch
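
The shapes and the mean reduction can be checked against a manual computation. A sketch with made-up logits:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[2.0, 0.5, 0.1],
                     [0.2, 1.5, 0.3]])     # [batch_size, num_classes] logits
label = torch.tensor([0, 1])               # [batch_size] class indices

loss = F.cross_entropy(pred, label)        # reduction='mean' by default

# Same value by hand: mean of -log_softmax at the target class.
log_probs = F.log_softmax(pred, dim=1)
manual = -log_probs[torch.arange(2), label].mean()
```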

3.3. loss function

https://blog.csdn.net/zhangxb35/article/details/72464152

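4. torchvision

4.1. detail

4.1.1. path of data

Pretrained weights from torchvision.models are stored under (in older releases; newer ones use the torch.hub cache directory instead)

~/.torch/models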