import torch
from torch import nn
from torch.nn import functional as F

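# A two-layer MLP built with nn.Sequential: 20 inputs -> 256 hidden units (ReLU) -> 10 outputs
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

X = torch.rand(2, 20)  # a minibatch of 2 examples with 20 features each
net(X)  # output shape: (2, 10)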

5.1.1 Custom Block

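In practice, a custom block is defined as a Python class that inherits from nn.Module: __init__ declares the layers, and forward defines how the input flows through them.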

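class MLP(nn.Module):
    # Declare the layers that hold the model parameters. Here we declare two fully connected layers
    def __init__(self):
        # Call the nn.Module constructor so the parent class performs its required
        # initialization, instead of reimplementing it here
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # hidden layer
        self.out = nn.Linear(256, 10)  # output layer

    # Define the forward propagation: hidden -> ReLU -> out
    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))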
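A minimal usage sketch, assuming the same MLP class and a random (2, 20) input like the one at the top of this section. Calling the instance invokes forward, and because the two nn.Linear layers are assigned as attributes, nn.Module registers them as submodules so their parameters can be listed by name.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # hidden layer
        self.out = nn.Linear(256, 10)  # output layer

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
X = torch.rand(2, 20)  # assumed input: 2 examples, 20 features
Y = net(X)  # nn.Module.__call__ dispatches to forward
print(Y.shape)  # torch.Size([2, 10])

# Layers assigned as attributes are registered as submodules automatically
names = [name for name, _ in net.named_parameters()]
print(names)  # ['hidden.weight', 'hidden.bias', 'out.weight', 'out.bias']
```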

5.1.2 Sequential Block

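Here we define a simple Sequential class that provides:
1. Storage for the blocks passed in, so they can be executed in order
2. A forward propagation function that passes the input through those blocks one after another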

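class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        # Register each block as a submodule, keyed by its position ("0", "1", ...)
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def forward(self, X):
        # _modules preserves insertion order, so blocks run in the order they were passed in
        for block in self._modules.values():
            X = block(X)
        return X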
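A minimal sketch of using MySequential, assuming the same layer sizes as the nn.Sequential example at the start of this section. As a sanity check it copies the weights into a built-in nn.Sequential through state_dict (this works because both containers name their children "0", "1", "2") and compares the two outputs.

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)
Y = net(X)
print(Y.shape)  # torch.Size([2, 10])

# Copy the same weights into the built-in container and compare outputs
ref = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
ref.load_state_dict(net.state_dict())
same = torch.allclose(ref(X), Y)
print(same)  # True
```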

for idx, module in enumerate(args): iterates over all the modules passed in and assigns each one an index starting from 0.
For example, suppose three modules are passed in:

MySequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 10))

Then enumerate(args) yields (0, nn.Linear(10, 20)), (1, nn.ReLU()), and (2, nn.Linear(20, 10)) in turn.
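To see what the loop produces, here is a small sketch that builds the three-module example and inspects the keys stored in _modules. Reading the private _modules attribute directly is done only for illustration.

```python
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

modules = (nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 10))
# The (index, module) pairs produced by enumerate
for idx, module in enumerate(modules):
    print(idx, module)

net = MySequential(*modules)
keys = list(net._modules.keys())
print(keys)  # ['0', '1', '2']
```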

self._modules[str(idx)] = module

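self._modules is a built-in container (an OrderedDict) that nn.Module provides for storing submodules.
self._modules[str(idx)] = module registers each module under the keys "0", "1", "2", and so on, in the order the modules were passed in. Registration is also what lets net.parameters() find the parameters of these submodules. Because the keys keep their insertion order, forward then processes the input as follows:
X is first passed to _modules["0"] (i.e. nn.Linear(10, 20)).
The output is passed to _modules["1"] (i.e. nn.ReLU()) for activation.
Finally, it is passed to _modules["2"] (i.e. nn.Linear(20, 10)), which produces the final result.
str(idx): the reason is not hashability (an int is hashable too). PyTorch expects submodule names to be strings, because they are joined into parameter names such as "0.weight", and nn.Module.add_module raises a TypeError for a non-string name.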
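The sketch below contrasts registering blocks in _modules with keeping them in a plain Python list. ListSequential is a hypothetical counter-example, not part of the original text. The last lines check that add_module, the public registration method of nn.Module, rejects an integer name.

```python
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module  # registered as submodules

class ListSequential(nn.Module):
    """Hypothetical counter-example: keeps the blocks in a plain Python list."""
    def __init__(self, *args):
        super().__init__()
        self.blocks = list(args)  # NOT registered with nn.Module

good = MySequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 10))
bad = ListSequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 10))

good_names = [name for name, _ in good.named_parameters()]
print(good_names)  # ['0.weight', '0.bias', '2.weight', '2.bias']

bad_param_count = len(list(bad.parameters()))
print(bad_param_count)  # 0: an optimizer would find nothing to train

# Submodule names must be strings
rejected_int_name = False
try:
    nn.Module().add_module(0, nn.ReLU())
except TypeError as err:
    rejected_int_name = True
    print("TypeError:", err)
```

This is why the str(idx) keys matter: they become the prefixes of the registered parameter names.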

5.1.3 Control Flow in Forward Propagation

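A network can also contain parameters that are never updated, i.e. constant parameters: they take part in the forward computation but stay fixed throughout optimization.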

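class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weight that receives no gradient, so it stays constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # Use the constant weight, followed by ReLU
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the same fully connected layer, i.e. its parameters are shared
        X = self.linear(X)
        # Control flow: keep halving X until the sum of its absolute values is at most 1
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()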
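A short usage sketch, assuming a random (2, 20) input; torch.manual_seed is only there to make the run repeatable. Because the while loop stops once the sum of absolute values is at most 1, the returned scalar also has an absolute value of at most 1.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:  # data-dependent number of iterations
            X /= 2
        return X.sum()

torch.manual_seed(0)  # reproducible run
net = FixedHiddenMLP()
out = net(torch.rand(2, 20))
print(out.shape)  # torch.Size([]): forward returns a scalar
# |X.sum()| <= X.abs().sum() <= 1 once the loop has finished
print(abs(out.item()) <= 1)
```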

self.rand_weight = torch.rand((20, 20), requires_grad=False)

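requires_grad=False means this tensor does not take part in gradient computation, so it is a fixed weight that is never optimized during training. It can be viewed as a "constant" inside the network.
The forward method also shows that ordinary Python control flow works inside a block: the while loop runs a data-dependent number of times, halving X until the sum of its absolute values is at most 1, and the block then returns the scalar X.sum().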
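To check that rand_weight really stays constant, the sketch below runs one backward pass and inspects the gradients. It assumes the same FixedHiddenMLP class. Note that rand_weight is a plain tensor attribute rather than an nn.Parameter, so it is also absent from net.parameters() and would never be handed to an optimizer.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
out = net(torch.rand(2, 20))
out.backward()  # populate .grad for every tensor that requires gradients

print(net.rand_weight.grad is None)  # True: the constant weight gets no gradient
print(net.linear.weight.grad is not None)  # True: the Linear layer is trainable
print(len(list(net.parameters())))  # 2 (linear.weight and linear.bias)
```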


