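This is an RNN implemented with nothing but Python and NumPy, without any deep learning framework. Since there is no automatic differentiation like PyTorch's, the gradients of the loss with respect to the parameters of each layer have to be computed by hand. That requires deriving the formulas, and the derivation is laid out in this README. If the math does not render correctly, see the post "Recurrent Neural Networks" on my blog.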
conda create --name env_for_rnn python=3.9 numpy pandas matplotlib sympy ipykernel scikit-learn
conda activate env_for_rnn
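The input of the network is a sequence of m-dimensional vectors $\boldsymbol{x}^t$.
The state of the network is a sequence of n-dimensional vectors $\boldsymbol{s}^t$.
The output of the network is a sequence of m-dimensional vectors.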
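At time step $t$, the network updates its state from the current input $\boldsymbol{x}^t$ and the previous state $\boldsymbol{s}^{t-1}$.
(The bias terms are omitted in the derivations below.)
At each time step $t$, the network incurs a loss $E^t$.
Denote the pre-activation of the output layer by $\boldsymbol{\xi}^t$, so that $\frac{\partial \boldsymbol{\xi}^t}{\partial \boldsymbol{s}^t} = V$.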
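The notation above can be made concrete with a small forward pass. This is only a sketch under assumptions that the text does not state: the activation $f$ is taken to be `tanh`, the initial state is zero, and the function name `rnn_forward` and the shapes `U: (n, m)`, `W: (n, n)`, `V: (m, n)` are illustrative choices, not necessarily those of the repository.

```python
import numpy as np

def rnn_forward(xs, U, W, V, b_R, f=np.tanh):
    """Run eta^t = U x^t + W s^{t-1} + b^R, s^t = f(eta^t), xi^t = V s^t."""
    s_prev = np.zeros(W.shape[0])      # assumed initial state s^0 = 0
    etas, states, xis = [], [], []
    for x in xs:                       # xs has shape (T, m)
        eta = U @ x + W @ s_prev + b_R # pre-activation of the recurrent layer
        s = f(eta)                     # new state
        xi = V @ s                     # pre-activation of the output layer
        etas.append(eta)
        states.append(s)
        xis.append(xi)
        s_prev = s
    return np.array(etas), np.array(states), np.array(xis)

rng = np.random.default_rng(0)
m, n, T = 3, 4, 5
U = rng.normal(size=(n, m))
W = rng.normal(size=(n, n))
V = rng.normal(size=(m, n))
b_R = np.zeros(n)
xs = rng.normal(size=(T, m))

etas, states, xis = rnn_forward(xs, U, W, V, b_R)
print(etas.shape, states.shape, xis.shape)  # (5, 4) (5, 4) (5, 3)
```

Keeping every $\boldsymbol{\eta}^t$ and $\boldsymbol{s}^t$ around is deliberate: the backward pass needs them to evaluate $f'(\boldsymbol{\eta}^t)$ at each step.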
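Vectorized gradient computation
Denote the pre-activation of the recurrent layer by $\boldsymbol{\eta}^t$, so that $\boldsymbol{s}^t = f(\boldsymbol{\eta}^t)$.
Partial derivative with respect to the matrix U
With the notation above, $\boldsymbol{\eta}^t = U\boldsymbol{x}^t + W\boldsymbol{s}^{t-1} + \boldsymbol{b}^R$.
The derivation below yields a recurrence for $\frac{\partial E^t}{\partial U}$: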
$$\begin{split}
\frac{\partial E^t}{\partial U}
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial \boldsymbol{\eta}^t}{\partial U} \\
(\boldsymbol{\eta}^t = U\boldsymbol{x}^t + W\boldsymbol{s}^{t-1}+\boldsymbol{b}^R)\rightarrow
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\left(
\frac{\partial U\boldsymbol{x}^t}{\partial U} +
\frac{\partial W\boldsymbol{s}^{t-1}}{\partial U}
\right) \\
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\left(
\frac{\partial U\boldsymbol{x}^t}{\partial U} +
W\frac{\partial \boldsymbol{s}^{t-1}}{\partial \boldsymbol{\eta}^{t-1}}
\frac{\partial \boldsymbol{\eta}^{t-1}}{\partial U}
\right) \\
\left(\text{distribute } \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}\right)\rightarrow
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial U\boldsymbol{x}^t}{\partial U} +
\frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial W\boldsymbol{s}^{t-1}}{\partial \boldsymbol{\eta}^{t-1}}
\frac{\partial \boldsymbol{\eta}^{t-1}}{\partial U} \\
\left(\frac{\partial W\boldsymbol{s}^{t-1}}{\partial \boldsymbol{\eta}^{t-1}}=
\frac{\partial \boldsymbol{\eta}^t}{\partial \boldsymbol{\eta}^{t-1}}\right)\rightarrow
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial U\boldsymbol{x}^t}{\partial U} +
\frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial \boldsymbol{\eta}^t}{\partial \boldsymbol{\eta}^{t-1}}
\frac{\partial \boldsymbol{\eta}^{t-1}}{\partial U} \\
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial U\boldsymbol{x}^t}{\partial U} +
\frac{\partial E^t}{\partial \boldsymbol{\eta}^{t-1}}
\frac{\partial \boldsymbol{\eta}^{t-1}}{\partial U}
\end{split}$$
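Unrolling this recurrence down to the first time step (the initial state does not depend on $U$) gives
$$\frac{\partial E^t}{\partial U} = \sum_{k=1}^{t} \frac{\partial E^t}{\partial \boldsymbol{\eta}^k} \frac{\partial U\boldsymbol{x}^k}{\partial U}$$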
Compute
Compute $\frac{\partial E^t}{\partial \boldsymbol{\eta}^t}$
$$\begin{split}
\frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
&= \frac{\partial E^t}{\partial \boldsymbol{\xi}^t}
\frac{\partial \boldsymbol{\xi}^t}{\partial \boldsymbol{s}^t}
\frac{\partial \boldsymbol{s}^t}{\partial \boldsymbol{\eta}^t} \\
&=
\begin{bmatrix}
\frac{\partial E^t}{\partial \xi^t_1}&\frac{\partial E^t}{\partial \xi^t_2}&\cdots&\frac{\partial E^t}{\partial \xi^t_m}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial \xi^t_1}{\partial s^t_1} & \frac{\partial \xi^t_1}{\partial s^t_2} & \cdots & \frac{\partial \xi^t_1}{\partial s^t_n} \\
\frac{\partial \xi^t_2}{\partial s^t_1} & \frac{\partial \xi^t_2}{\partial s^t_2} & \cdots & \frac{\partial \xi^t_2}{\partial s^t_n} \\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial \xi^t_m}{\partial s^t_1} & \frac{\partial \xi^t_m}{\partial s^t_2} & \cdots & \frac{\partial \xi^t_m}{\partial s^t_n}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial s^t_1}{\partial \eta^t_1} & \frac{\partial s^t_1}{\partial \eta^t_2} & \cdots & \frac{\partial s^t_1}{\partial \eta^t_n} \\
\frac{\partial s^t_2}{\partial \eta^t_1} & \frac{\partial s^t_2}{\partial \eta^t_2} & \cdots & \frac{\partial s^t_2}{\partial \eta^t_n} \\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial s^t_n}{\partial \eta^t_1} & \frac{\partial s^t_n}{\partial \eta^t_2} & \cdots & \frac{\partial s^t_n}{\partial \eta^t_n}
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{\partial E^t}{\partial \xi^t_1}&\frac{\partial E^t}{\partial \xi^t_2}&\cdots&\frac{\partial E^t}{\partial \xi^t_m}
\end{bmatrix}
\begin{bmatrix}
v_{11}&v_{12}&\cdots&v_{1n}\\
v_{21}&v_{22}&\cdots&v_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
v_{m1}&v_{m2}&\cdots&v_{mn}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial s^t_1}{\partial \eta^t_1} & 0 & \cdots & 0 \\
0 & \frac{\partial s^t_2}{\partial \eta^t_2} & \cdots & 0 \\
\vdots&\vdots&\ddots&\vdots\\
0 & 0 & \cdots & \frac{\partial s^t_n}{\partial \eta^t_n}
\end{bmatrix} \\
&= \left[
\frac{\partial s^t_1}{\partial \eta^t_1}
\sum_{i=1}^m\left(\frac{\partial E^t}{\partial \xi^t_i}v_{i1}\right)
,\quad
\frac{\partial s^t_2}{\partial \eta^t_2}
\sum_{i=1}^m\left(\frac{\partial E^t}{\partial \xi^t_i}v_{i2}\right)
,\quad \cdots ,\quad
\frac{\partial s^t_n}{\partial \eta^t_n}
\sum_{i=1}^m\left(\frac{\partial E^t}{\partial \xi^t_i}v_{in}\right)
\right] \\
\text{denote as}\rightarrow
&= \begin{bmatrix}
\delta^{tt}_1&\delta^{tt}_2&\cdots&\delta^{tt}_n
\end{bmatrix}
\end{split}$$
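Because the last factor is diagonal, the row vector $\delta^{tt}$ never requires building that matrix: it is the product $\frac{\partial E^t}{\partial \boldsymbol{\xi}^t}V$ scaled elementwise by $f'(\boldsymbol{\eta}^t)$. The sketch below checks this against the explicit matrix product. The helper name `delta_tt`, the choice $f = \tanh$ and the random test values are assumptions made for illustration.

```python
import numpy as np

def tanh_prime(z):
    """Derivative of tanh, used here as an assumed activation f."""
    return 1.0 - np.tanh(z) ** 2

def delta_tt(dE_dxi, V, eta, f_prime):
    """Row vector dE^t/d eta^t = (dE^t/d xi^t @ V) * f'(eta^t), elementwise."""
    return (dE_dxi @ V) * f_prime(eta)

rng = np.random.default_rng(1)
m, n = 3, 4
V = rng.normal(size=(m, n))
eta = rng.normal(size=n)
dE_dxi = rng.normal(size=m)        # stands in for the loss gradient w.r.t. xi^t

fast = delta_tt(dE_dxi, V, eta, tanh_prime)
slow = dE_dxi @ V @ np.diag(tanh_prime(eta))   # literal three-factor product
print(np.allclose(fast, slow))  # True
```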
Compute $\frac{\partial \boldsymbol{\eta}^t}{\partial \boldsymbol{\eta}^{t-1}}$
$$\begin{split}
\frac{\partial \boldsymbol{\eta}^t}{\partial \boldsymbol{\eta}^{t-1}}
&= \frac{\partial W\boldsymbol{s}^{t-1}}{\partial \boldsymbol{\eta}^{t-1}}=
W\frac{\partial \boldsymbol{s}^{t-1}}{\partial \boldsymbol{\eta}^{t-1}}=
W
\begin{bmatrix}
\frac{\partial s^{t-1}_{1}}{\partial \eta^{t-1}_{1}}&
\frac{\partial s^{t-1}_{1}}{\partial \eta^{t-1}_{2}}&
\cdots&
\frac{\partial s^{t-1}_{1}}{\partial \eta^{t-1}_{n}} \\
\frac{\partial s^{t-1}_{2}}{\partial \eta^{t-1}_{1}}&
\frac{\partial s^{t-1}_{2}}{\partial \eta^{t-1}_{2}}&
\cdots&
\frac{\partial s^{t-1}_{2}}{\partial \eta^{t-1}_{n}} \\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial s^{t-1}_{n}}{\partial \eta^{t-1}_{1}}&
\frac{\partial s^{t-1}_{n}}{\partial \eta^{t-1}_{2}}&
\cdots&
\frac{\partial s^{t-1}_{n}}{\partial \eta^{t-1}_{n}}
\end{bmatrix}=
W\begin{bmatrix}
\frac{\partial s^{t-1}_{1}}{\partial \eta^{t-1}_{1}}&0&\cdots&0 \\
0&\frac{\partial s^{t-1}_{2}}{\partial \eta^{t-1}_{2}}&\cdots&0 \\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&\frac{\partial s^{t-1}_{n}}{\partial \eta^{t-1}_{n}}
\end{bmatrix} \\
&= W\begin{bmatrix}
f'(\eta^{t-1}_{1})&0&\cdots&0 \\
0&f'(\eta^{t-1}_{2})&\cdots&0 \\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&f'(\eta^{t-1}_{n})
\end{bmatrix}
\end{split}$$
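A quick way to gain confidence in this Jacobian is to compare it with central finite differences. The sketch below does that for an assumed activation $f = \tanh$. The helper `next_eta` and the vector `c` (standing in for $U\boldsymbol{x}^t + \boldsymbol{b}^R$, which does not depend on $\boldsymbol{\eta}^{t-1}$) are illustrative names.

```python
import numpy as np

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2

def next_eta(eta_prev, W, c):
    """eta^t as a function of eta^{t-1}; c stands for U x^t + b^R."""
    return W @ np.tanh(eta_prev) + c

rng = np.random.default_rng(2)
n = 4
W = rng.normal(size=(n, n))
c = rng.normal(size=n)
eta_prev = rng.normal(size=n)

# Broadcasting scales column j of W by f'(eta_j), i.e. W @ diag(f'(eta^{t-1})).
J = W * tanh_prime(eta_prev)

# Central finite differences, one input coordinate at a time.
eps = 1e-6
J_num = np.empty((n, n))
for j in range(n):
    step = np.zeros(n)
    step[j] = eps
    J_num[:, j] = (next_eta(eta_prev + step, W, c)
                   - next_eta(eta_prev - step, W, c)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-6))  # True

# One backward step: a row vector dE/d eta^t becomes dE/d eta^{t-1}.
delta_t = rng.normal(size=n)
delta_prev = delta_t @ J
```

The last two lines show how the Jacobian is used: multiplying a row vector $\frac{\partial E^t}{\partial \boldsymbol{\eta}^t}$ by it from the right moves the gradient one time step back.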
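In practice we compute $\frac{\partial E^t}{\partial \boldsymbol{\eta}^k}$ step by step, for $k = t, t-1, \dots, 1$.
Compute
Compute
Final result: the gradient with respect to U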
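The pieces derived so far can be assembled into a gradient of $E^t$ with respect to $U$ and checked numerically. This is a sketch under assumptions rather than the repository's implementation: the loss is taken at the last step only and assumed to be $E = \frac{1}{2}\lVert\boldsymbol{\xi} - \boldsymbol{y}\rVert^2$, $f = \tanh$, the initial state is zero, biases are omitted, and all function names are hypothetical.

```python
import numpy as np

def forward(xs, U, W, V):
    """Return all pre-activations eta^k and xi at the last step (biases omitted)."""
    s = np.zeros(W.shape[0])           # assumed initial state s^0 = 0
    etas = []
    for x in xs:
        eta = U @ x + W @ s
        s = np.tanh(eta)
        etas.append(eta)
    return np.array(etas), V @ s

def grad_U(xs, y, U, W, V):
    """Accumulate sum_k outer(dE/d eta^k, x^k), walking back from k = t."""
    etas, xi = forward(xs, U, W, V)
    # delta^{tt}: (dE/d xi) V scaled elementwise by f'(eta^t)
    delta = ((xi - y) @ V) * (1 - np.tanh(etas[-1]) ** 2)
    g = np.zeros_like(U)
    for k in range(len(xs) - 1, -1, -1):
        g += np.outer(delta, xs[k])    # direct contribution of U x^k
        if k > 0:                      # move delta one step back in time
            delta = (delta @ W) * (1 - np.tanh(etas[k - 1]) ** 2)
    return g

rng = np.random.default_rng(3)
m, n, T = 3, 4, 6
U = 0.5 * rng.normal(size=(n, m))
W = 0.5 * rng.normal(size=(n, n))
V = 0.5 * rng.normal(size=(m, n))
xs = rng.normal(size=(T, m))
y = rng.normal(size=m)

def loss():
    return 0.5 * np.sum((forward(xs, U, W, V)[1] - y) ** 2)

# Central finite differences over every entry of U.
eps = 1e-6
numeric = np.zeros_like(U)
for i, j in np.ndindex(U.shape):
    old = U[i, j]
    U[i, j] = old + eps
    up = loss()
    U[i, j] = old - eps
    down = loss()
    U[i, j] = old
    numeric[i, j] = (up - down) / (2 * eps)

analytic = grad_U(xs, y, U, W, V)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

The result has the same shape as `U`, so it can be used directly in a gradient-descent update.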
Compute
Compute
Compute
Final result: the gradient with respect to W
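For $W$ the structure is the same as for $U$, except that each $\frac{\partial E^t}{\partial \boldsymbol{\eta}^k}$ pairs with the previous state $\boldsymbol{s}^{k-1}$ instead of the input $\boldsymbol{x}^k$. The sketch below stores every $\frac{\partial E^t}{\partial \boldsymbol{\eta}^k}$ as a row of a matrix, so the sum of outer products becomes a single matrix product. As before, the squared-error loss at the last step, $f = \tanh$, the zero initial state and the function names are assumptions for illustration.

```python
import numpy as np

def forward(xs, U, W, V):
    """Return pre-activations, states and xi at the last step (biases omitted)."""
    s = np.zeros(W.shape[0])           # assumed initial state s^0 = 0
    etas, states = [], []
    for x in xs:
        eta = U @ x + W @ s
        s = np.tanh(eta)
        etas.append(eta)
        states.append(s)
    return np.array(etas), np.array(states), V @ s

def deltas_last(etas, dE_dxi, V, W):
    """Row k holds dE/d eta at step k (0-based) for a loss at the last step."""
    T, n = etas.shape
    deltas = np.zeros((T, n))
    deltas[-1] = (dE_dxi @ V) * (1 - np.tanh(etas[-1]) ** 2)
    for k in range(T - 1, 0, -1):
        deltas[k - 1] = (deltas[k] @ W) * (1 - np.tanh(etas[k - 1]) ** 2)
    return deltas

rng = np.random.default_rng(4)
m, n, T = 3, 4, 6
U = 0.5 * rng.normal(size=(n, m))
W = 0.5 * rng.normal(size=(n, n))
V = 0.5 * rng.normal(size=(m, n))
xs = rng.normal(size=(T, m))
y = rng.normal(size=m)

etas, states, xi = forward(xs, U, W, V)
deltas = deltas_last(etas, xi - y, V, W)

# Step k pairs with the state of step k-1; the first step sees s^0 = 0
# and therefore contributes nothing.
grad_W = deltas[1:].T @ states[:-1]

def loss():
    return 0.5 * np.sum((forward(xs, U, W, V)[2] - y) ** 2)

eps = 1e-6
numeric = np.zeros_like(W)
for i, j in np.ndindex(W.shape):
    old = W[i, j]
    W[i, j] = old + eps
    up = loss()
    W[i, j] = old - eps
    down = loss()
    W[i, j] = old
    numeric[i, j] = (up - down) / (2 * eps)

print(np.allclose(grad_W, numeric, atol=1e-6))  # True
```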
Bias terms
Compute
Final result
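As a hedged sketch of where the bias derivation leads, assume the bias in question is $\boldsymbol{b}^R$ from $\boldsymbol{\eta}^t = U\boldsymbol{x}^t + W\boldsymbol{s}^{t-1}+\boldsymbol{b}^R$, and write $\boldsymbol{\delta}^{tk}$ for $\frac{\partial E^t}{\partial \boldsymbol{\eta}^k}$ (a label introduced here by analogy with $\delta^{tt}$ above). The recurrence for $U$ then carries over with $\frac{\partial U\boldsymbol{x}^t}{\partial U}$ replaced by $\frac{\partial \boldsymbol{b}^R}{\partial \boldsymbol{b}^R} = I$:

$$\begin{split}
\frac{\partial E^t}{\partial \boldsymbol{b}^R}
&= \frac{\partial E^t}{\partial \boldsymbol{\eta}^t}
\frac{\partial \boldsymbol{b}^R}{\partial \boldsymbol{b}^R} +
\frac{\partial E^t}{\partial \boldsymbol{\eta}^{t-1}}
\frac{\partial \boldsymbol{\eta}^{t-1}}{\partial \boldsymbol{b}^R} \\
&= \sum_{k=1}^{t} \frac{\partial E^t}{\partial \boldsymbol{\eta}^k}
= \sum_{k=1}^{t} \boldsymbol{\delta}^{tk}
\end{split}$$

In NumPy terms, if the row vectors $\boldsymbol{\delta}^{tk}$ are stacked into a `(t, n)` array, this is a sum over axis 0.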