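Neural Network Derivation: Batch Data
Feed a batch of data into a neural network and derive its forward and backward propagation
The TestNet network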
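TestNet is a two-layer neural network with the following structure:
- Input layer: 3 neurons
- Hidden layer: 4 neurons
- Output layer: 2 neurons

- Activation function: ReLU
- Score function: softmax regression
- Cost function: cross-entropy loss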
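Network notation
Standard symbols for the network's computations
Neurons and layers
$L$ denotes the number of network layers (not counting the input layer). The input layer is layer 0, the hidden layer is layer 1 and the output layer is layer 2, so $L=2$ for TestNet.
$n^{(l)}$ denotes the number of neurons in layer $l$ (excluding the bias neuron): $n^{(0)}=3$ for the input layer, $n^{(1)}=4$ for the hidden layer and $n^{(2)}=2$ for the output layer.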
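Weight matrices and biases
$W^{(l)}$ denotes the weight matrix from layer $l-1$ to layer $l$. It has as many rows as layer $l-1$ has neurons and as many columns as layer $l$ has neurons: $W^{(1)}$ is the input-to-hidden weight matrix, of size $3\times 4$, and $W^{(2)}$ is the hidden-to-output weight matrix, of size $4\times 2$.
$W^{(l)}_{i,j}$ denotes the weight from neuron $i$ of layer $l-1$ to neuron $j$ of layer $l$, where $i$ ranges over $1,\dots,n^{(l-1)}$ and $j$ over $1,\dots,n^{(l)}$.
$W^{(l)}_{\cdot,j}$ (the $j$-th column of $W^{(l)}$) denotes the weight vector of neuron $j$ in layer $l$, of size $n^{(l-1)}\times 1$; for example, each $W^{(1)}_{\cdot,j}$ has size $3\times 1$ and each $W^{(2)}_{\cdot,j}$ has size $4\times 1$. $b^{(l)}$ denotes the bias vector of layer $l$: $b^{(1)}$ is the input-to-hidden bias vector, of size $1\times 4$, and $b^{(2)}$ is the hidden-to-output bias vector, of size $1\times 2$.
$b^{(l)}_{j}$ denotes the bias of neuron $j$ in layer $l$; for example, $b^{(1)}_{j}$ is the bias of neuron $j$ in layer 1 (the hidden layer).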
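Neuron input and output vectors
$a^{(l)}$ denotes the output of layer $l$ for the whole batch, $a^{(l)}=[a^{(l)}_{1},a^{(l)}_{2},\dots,a^{(l)}_{m}]^{T}$, where row $a^{(l)}_{i}$ belongs to sample $i$ and $m$ is the batch size. $a^{(0)}$ is the output of the input layer, of size $m\times 3$; $a^{(1)}$ is the output of the hidden layer, of size $m\times 4$; $a^{(2)}$ is the output of the output layer, of size $m\times 2$.
$a^{(l)}_{i,j}$ denotes the output value of unit $j$ in layer $l$ for sample $i$, i.e. the unit's input value after activation.
- For example, $a^{(1)}_{i,3}$ is the output of the 3rd hidden neuron: $a^{(1)}_{i,3}=g(z^{(1)}_{i,3})$
$z^{(l)}$ denotes the input of layer $l$ for the whole batch, $z^{(l)}=[z^{(l)}_{1},z^{(l)}_{2},\dots,z^{(l)}_{m}]^{T}$. $z^{(1)}$ is the input of the hidden layer, of size $m\times 4$; $z^{(2)}$ is the input of the output layer, of size $m\times 2$.
$z^{(l)}_{i,j}$ denotes the input value of unit $j$ in layer $l$ for sample $i$: the weighted sum of row $i$ of the previous layer's output with neuron $j$'s weight vector, plus the neuron's bias.
- For example, $z^{(1)}_{1,2}$ is the input of the 2nd hidden neuron for the 1st sample: $z^{(1)}_{1,2}=b^{(1)}_{2}+a^{(0)}_{1,1}\cdot W^{(1)}_{1,2}+a^{(0)}_{1,2}\cdot W^{(1)}_{2,2}+a^{(0)}_{1,3}\cdot W^{(1)}_{3,2}$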
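Activation function
$g(\cdot)$ denotes the activation function.
Score function and loss function
$h(\cdot)$ denotes the score function and $J(\cdot)$ denotes the cost function.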
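Neuron computation steps
Each neuron computes its result in two steps:
- Input vector $z^{(l)}$ = weighted sum of the previous layer's output vector $a^{(l-1)}$ with the weight matrix $W^{(l)}$, plus the bias vector $b^{(l)}$ (added to every row, i.e. to every sample)
$$
z^{(l)}_{i,j}=a^{(l-1)}_{i}\cdot W^{(l)}_{\cdot,j} + b^{(l)}_{j} \Rightarrow
z^{(l)}=a^{(l-1)}\cdot W^{(l)} + b^{(l)}
$$
- Output vector $a^{(l)}$ = the activation function applied to the input vector $z^{(l)}$
$$
a^{(l)}_{i}=g(z^{(l)}_{i})
\Rightarrow
a^{(l)}=g(z^{(l)})
$$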
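The two steps can be sketched in plain Python for a whole batch at once. This is a minimal illustration, not the author's code: it assumes matrices are stored as lists of rows, that ReLU plays the role of $g$, and that a bias is stored as a $1\times n$ row `[[...]]`. `matmul`, `relu` and `layer_forward` are hypothetical helper names.

```python
def matmul(A, B):
    """(r x k) . (k x c) for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(Z):
    """Element-wise ReLU, used here as the activation g."""
    return [[max(0.0, z) for z in row] for row in Z]


def layer_forward(a_prev, W, b, g=relu):
    """One layer on a batch: z = a_prev . W + b (b added to every row), then a = g(z)."""
    z = [[v + bj for v, bj in zip(row, b[0])] for row in matmul(a_prev, W)]
    return z, g(z)


# Made-up batch of m = 2 samples with 3 features, feeding a layer of 2 neurons.
a0 = [[1.0, 2.0, 3.0],
      [-1.0, 0.0, 1.0]]
W = [[1.0, -1.0],
     [0.0, 1.0],
     [1.0, 0.0]]
b = [[0.5, -2.0]]
z, a = layer_forward(a0, W, b)
print(z)  # [[4.5, -1.0], [0.5, -1.0]]
print(a)  # [[4.5, 0.0], [0.5, 0.0]]
```

Every sample in the batch is handled by the same matrix product; the bias row is simply repeated for each sample.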
Network structure
Input layer
$$
a^{(0)}
=\begin{bmatrix}
a^{(0)}_{1}\\
\vdots\\
a^{(0)}_{m}
\end{bmatrix}
=\begin{bmatrix}
a^{(0)}_{1,1} & a^{(0)}_{1,2} & a^{(0)}_{1,3}\\
\vdots & \vdots & \vdots\\
a^{(0)}_{m,1} & a^{(0)}_{m,2} & a^{(0)}_{m,3}
\end{bmatrix}\in R^{m\times 3}
$$
Hidden layer
$$
W^{(1)}
=\begin{bmatrix}
W^{(1)}_{1,1} & W^{(1)}_{1,2} & W^{(1)}_{1,3} & W^{(1)}_{1,4}\\
W^{(1)}_{2,1} & W^{(1)}_{2,2} & W^{(1)}_{2,3} & W^{(1)}_{2,4}\\
W^{(1)}_{3,1} & W^{(1)}_{3,2} & W^{(1)}_{3,3} & W^{(1)}_{3,4}
\end{bmatrix}
\in R^{3\times 4}
$$
$$
b^{(1)}=[[b^{(1)}_{1},b^{(1)}_{2},b^{(1)}_{3},b^{(1)}_{4}]]\in R^{1\times 4}
$$
$$
z^{(1)}
=\begin{bmatrix}
z^{(1)}_{1,1} & z^{(1)}_{1,2} & z^{(1)}_{1,3} & z^{(1)}_{1,4}\\
\vdots & \vdots & \vdots & \vdots\\
z^{(1)}_{m,1} & z^{(1)}_{m,2} & z^{(1)}_{m,3} & z^{(1)}_{m,4}
\end{bmatrix}\in R^{m\times 4}
$$
$$
a^{(1)}
=\begin{bmatrix}
a^{(1)}_{1,1} & a^{(1)}_{1,2} & a^{(1)}_{1,3} & a^{(1)}_{1,4}\\
\vdots & \vdots & \vdots & \vdots\\
a^{(1)}_{m,1} & a^{(1)}_{m,2} & a^{(1)}_{m,3} & a^{(1)}_{m,4}
\end{bmatrix}\in R^{m\times 4}
$$
Output layer
$$
W^{(2)}
=\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{1,2}\\
W^{(2)}_{2,1} & W^{(2)}_{2,2}\\
W^{(2)}_{3,1} & W^{(2)}_{3,2}\\
W^{(2)}_{4,1} & W^{(2)}_{4,2}
\end{bmatrix}
\in R^{4\times 2}
$$
$$
b^{(2)}=[[b^{(2)}_{1},b^{(2)}_{2}]]\in R^{1\times 2}
$$
$$
z^{(2)}
=\begin{bmatrix}
z^{(2)}_{1,1} & z^{(2)}_{1,2}\\
\vdots & \vdots\\
z^{(2)}_{m,1} & z^{(2)}_{m,2}
\end{bmatrix}\in R^{m\times 2}
$$
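Score values
$$
h(z^{(2)})
=\begin{bmatrix}
p(y_{1}=1) & p(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1) & p(y_{m}=2)
\end{bmatrix}\in R^{m\times 2}
$$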
Loss value
$$
J=-\frac {1}{m}\sum_{i=1}^{m}\sum_{j=1}^{2} 1(y_{i}=j)\cdot \ln p(y_{i}=j)
$$
Forward propagation
Input layer to hidden layer
$$
z^{(1)}_{i,1}=a^{(0)}_{i}\cdot W^{(1)}_{\cdot,1}+b^{(1)}_{1}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,1}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,1}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,1}
+b^{(1)}_{1}
$$
$$
z^{(1)}_{i,2}=a^{(0)}_{i}\cdot W^{(1)}_{\cdot,2}+b^{(1)}_{2}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,2}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,2}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,2}
+b^{(1)}_{2}
$$
$$
z^{(1)}_{i,3}=a^{(0)}_{i}\cdot W^{(1)}_{\cdot,3}+b^{(1)}_{3}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,3}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,3}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,3}
+b^{(1)}_{3}
$$
$$
z^{(1)}_{i,4}=a^{(0)}_{i}\cdot W^{(1)}_{\cdot,4}+b^{(1)}_{4}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,4}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,4}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,4}
+b^{(1)}_{4}
$$
$$
\Rightarrow z^{(1)}_{i}
=[z^{(1)}_{i,1},z^{(1)}_{i,2},z^{(1)}_{i,3},z^{(1)}_{i,4}]
=a^{(0)}_{i}\cdot W^{(1)}+b^{(1)}
$$
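To check that the four per-neuron formulas and the single matrix expression agree, the sketch below computes $z^{(1)}$ for a made-up batch of $m=2$ samples both entry by entry and as $a^{(0)}\cdot W^{(1)}+b^{(1)}$. The numbers and function names are illustrative assumptions; integers are used so the comparison is exact.

```python
def matmul(A, B):
    """(r x k) . (k x c) for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def hidden_input_elementwise(a0, W1, b1):
    """z1[i][j] = a0[i][0]*W1[0][j] + a0[i][1]*W1[1][j] + a0[i][2]*W1[2][j] + b1[0][j] (0-based indices)."""
    m, n_in, n_out = len(a0), len(W1), len(W1[0])
    return [[sum(a0[i][k] * W1[k][j] for k in range(n_in)) + b1[0][j]
             for j in range(n_out)]
            for i in range(m)]


def hidden_input_matrix(a0, W1, b1):
    """Batch form z1 = a0 . W1 + b1, with the 1 x 4 bias row added to every sample."""
    return [[v + bj for v, bj in zip(row, b1[0])] for row in matmul(a0, W1)]


a0 = [[1, 2, 3],
      [0, -1, 2]]           # m x 3
W1 = [[1, 0, -1, 2],
      [0, 1, 1, -1],
      [1, 1, 0, 0]]         # 3 x 4
b1 = [[0, 1, -1, 2]]        # 1 x 4

z_elem = hidden_input_elementwise(a0, W1, b1)
z_mat = hidden_input_matrix(a0, W1, b1)
print(z_mat)  # [[4, 6, 0, 2], [2, 2, -2, 3]]
```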
Hidden layer: from input vector to output vector, applying $relu(x)=\max(0,x)$ to each element
$$
a^{(1)}_{i,1}=relu(z^{(1)}_{i,1}) \\
a^{(1)}_{i,2}=relu(z^{(1)}_{i,2}) \\
a^{(1)}_{i,3}=relu(z^{(1)}_{i,3}) \\
a^{(1)}_{i,4}=relu(z^{(1)}_{i,4})
$$
$$
\Rightarrow
a^{(1)}_{i}=[a^{(1)}_{i,1},a^{(1)}_{i,2},a^{(1)}_{i,3},a^{(1)}_{i,4}]
=relu(z^{(1)}_{i})
$$
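Because ReLU acts on each element separately, the batch form needs no extra bookkeeping: the $m\times 4$ matrix $z^{(1)}$ maps to an $m\times 4$ matrix $a^{(1)}$. In this sketch (made-up values, hypothetical function name) the mask $1(z^{(1)}\geq 0)$ is also returned, on the assumption that an implementation would cache it for the backward pass; the text itself does not prescribe this.

```python
def relu_forward(z):
    """a = relu(z) element-wise, plus the mask 1(z >= 0) kept for backpropagation."""
    a = [[max(0, v) for v in row] for row in z]
    mask = [[1 if v >= 0 else 0 for v in row] for row in z]
    return a, mask


z1 = [[4, 6, 0, 2],
      [2, 2, -2, 3]]        # m x 4 hidden-layer inputs (made up)
a1, mask = relu_forward(z1)
print(a1)    # [[4, 6, 0, 2], [2, 2, 0, 3]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 1]]
```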
Hidden layer to output layer
$$
z^{(2)}_{i,1}=a^{(1)}_{i}\cdot W^{(2)}_{\cdot,1}+b^{(2)}_{1}
=a^{(1)}_{i,1}\cdot W^{(2)}_{1,1}
+a^{(1)}_{i,2}\cdot W^{(2)}_{2,1}
+a^{(1)}_{i,3}\cdot W^{(2)}_{3,1}
+a^{(1)}_{i,4}\cdot W^{(2)}_{4,1}
+b^{(2)}_{1}
$$
$$
z^{(2)}_{i,2}=a^{(1)}_{i}\cdot W^{(2)}_{\cdot,2}+b^{(2)}_{2}
=a^{(1)}_{i,1}\cdot W^{(2)}_{1,2}
+a^{(1)}_{i,2}\cdot W^{(2)}_{2,2}
+a^{(1)}_{i,3}\cdot W^{(2)}_{3,2}
+a^{(1)}_{i,4}\cdot W^{(2)}_{4,2}
+b^{(2)}_{2}
$$
$$
\Rightarrow z^{(2)}_{i}
=[z^{(2)}_{i,1},z^{(2)}_{i,2}]
=a^{(1)}_{i}\cdot W^{(2)}+b^{(2)}
$$
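The hidden-to-output step is the same affine map with different shapes, $(m\times 4)\cdot(4\times 2)+(1\times 2)$. The sketch below adds an explicit shape check, an illustrative design choice rather than part of the derivation; the values of $a^{(1)}$, $W^{(2)}$ and $b^{(2)}$ are made up and `affine` is a hypothetical name.

```python
def affine(a_prev, W, b):
    """z = a_prev . W + b for a batch: (m x n_prev) . (n_prev x n) + (1 x n) -> m x n."""
    if len(a_prev[0]) != len(W) or len(b[0]) != len(W[0]):
        raise ValueError("shape mismatch between a_prev, W and b")
    # zip(*W) yields the columns of W, i.e. the weight vector of each output neuron.
    return [[sum(x * w for x, w in zip(row, col)) + bj
             for col, bj in zip(zip(*W), b[0])]
            for row in a_prev]


a1 = [[4, 6, 0, 2],
      [2, 2, 0, 3]]     # m x 4 hidden-layer outputs (made up)
W2 = [[1, 0],
      [0, 1],
      [1, -1],
      [-1, 1]]          # 4 x 2
b2 = [[1, -1]]          # 1 x 2

z2 = affine(a1, W2, b2)
print(z2)  # [[3, 7], [0, 4]]
```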
Scoring, where $\sum exp(z^{(2)}_{i})$ is shorthand for $exp(z^{(2)}_{i,1})+exp(z^{(2)}_{i,2})$
$$
p(y_{i}=1)=\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})} \\
p(y_{i}=2)=\frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}
$$
$$
\Rightarrow h(z^{(2)}_{i})
=[p(y_{i}=1),p(y_{i}=2)]
=[\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}, \frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}]
$$
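A row-wise softmax over the batch can be sketched as follows. Subtracting each row's maximum before exponentiating is a standard numerical-stability trick that the text does not mention; it leaves the probabilities unchanged because the factor $exp(-\max)$ cancels between numerator and denominator. The function works for any number of classes; the input values are made up.

```python
import math


def softmax_rows(z):
    """h(z): softmax applied to each row (each sample) of an m x C matrix."""
    out = []
    for row in z:
        shift = max(row)  # same result as without the shift, but exp() cannot overflow
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


z2 = [[3.0, 7.0],
      [0.0, 4.0]]
p = softmax_rows(z2)
print(p)  # both rows are approximately [0.018, 0.982]
```

Both rows give the same probabilities because softmax only depends on the differences between a row's entries.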
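Loss value
For each sample, the cross-entropy loss keeps only the log-probability of the true class; the batch loss averages it over the $m$ samples:
$$
J_{i}=-1(y_{i}=1)\cdot \ln p(y_{i}=1)-1(y_{i}=2)\cdot \ln p(y_{i}=2),\qquad
J=\frac {1}{m}\sum_{i=1}^{m} J_{i}
$$
In the backpropagation below, $\frac {\partial J}{\partial z^{(2)}_{i}}$ denotes the gradient of the per-sample loss $J_{i}$; the factor $\frac {1}{m}$ enters when the per-sample terms are summed over the batch for the weight gradients.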
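The batch loss can be computed without building the indicator matrix explicitly: since $1(y_{i}=j)$ is 1 only for the true class, each sample contributes $-\ln$ of its true-class probability. A minimal sketch, assuming 1-based labels as in the text and made-up probabilities; `cross_entropy` is a hypothetical name.

```python
import math


def cross_entropy(p, y):
    """Batch cross-entropy J = (1/m) * sum_i J_i, with J_i = -ln p(y_i).

    p is an m x C list of softmax scores; y holds 1-based class labels, as in the text.
    The indicator 1(y_i = j) keeps only the true-class term, so that entry is indexed directly.
    """
    m = len(p)
    return -sum(math.log(p[i][y[i] - 1]) for i in range(m)) / m


p = [[0.25, 0.75],
     [0.5, 0.5]]
y = [2, 1]
J = cross_entropy(p, y)
print(J)  # -(ln 0.75 + ln 0.5) / 2, about 0.490
```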
Backpropagation
Gradient of the output layer's input vector
$$
\frac {\partial J}{\partial z^{(2)}_{i,1}}=
(-1)\cdot \frac {1(y_{i}=1)}{p(y_{i}=1)}\cdot \frac {\partial p(y_{i}=1)}{\partial z^{(2)}_{i,1}}
+(-1)\cdot \frac {1(y_{i}=2)}{p(y_{i}=2)}\cdot \frac {\partial p(y_{i}=2)}{\partial z^{(2)}_{i,1}}
$$
$$
\frac {\partial p(y_{i}=1)}{\partial z^{(2)}_{i,1}}
=\frac {exp(z^{(2)}_{i,1})\cdot \sum exp(z^{(2)}_{i})-exp(z^{(2)}_{i,1})\cdot exp(z^{(2)}_{i,1})}{(\sum exp(z^{(2)}_{i}))^2}
=\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}
-(\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})})^2
=p(y_{i}=1)-(p(y_{i}=1))^2
$$
$$
\frac {\partial p(y_{i}=2)}{\partial z^{(2)}_{i,1}}
=\frac {-exp(z^{(2)}_{i,2})\cdot exp(z^{(2)}_{i,1})}{(\sum exp(z^{(2)}_{i}))^2}
=(-1)\cdot \frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}\cdot \frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}
=(-1)\cdot p(y_{i}=1)p(y_{i}=2)
$$
$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i,1}}
=(-1)\cdot \frac {1(y_{i}=1)}{p(y_{i}=1)}\cdot (p(y_{i}=1)-(p(y_{i}=1))^2)
+(-1)\cdot \frac {1(y_{i}=2)}{p(y_{i}=2)}\cdot (-1)\cdot p(y_{i}=1)p(y_{i}=2) \\
=(-1)\cdot 1(y_{i}=1)\cdot (1-p(y_{i}=1))
+1(y_{i}=2)\cdot p(y_{i}=1) \\
=p(y_{i}=1)\cdot (1(y_{i}=1)+1(y_{i}=2))-1(y_{i}=1)
=p(y_{i}=1)-1(y_{i}=1)
$$
The last step uses $1(y_{i}=1)+1(y_{i}=2)=1$, since each sample belongs to exactly one class. The same steps give:
$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i,2}}
=p(y_{i}=2)-1(y_{i}=2)
$$
$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i}}
=[p(y_{i}=1)-1(y_{i}=1), p(y_{i}=2)-1(y_{i}=2)]
$$
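The closed form $p(y_{i}=j)-1(y_{i}=j)$ can be sanity-checked numerically against central finite differences of the per-sample loss. The sketch below is an illustration under assumed inputs (one made-up sample with label 1); `grad_z2` and `numeric_grad` are hypothetical names.

```python
import math


def softmax(row):
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sample_loss(z_row, label):
    """Per-sample cross-entropy J_i = -ln p(y_i = label); label is 1-based."""
    return -math.log(softmax(z_row)[label - 1])


def grad_z2(z_row, label):
    """Closed form derived above: dJ_i/dz_{i,j} = p(y_i = j) - 1(y_i = j)."""
    p = softmax(z_row)
    return [p[j] - (1.0 if j == label - 1 else 0.0) for j in range(len(z_row))]


def numeric_grad(z_row, label, eps=1e-6):
    """Central finite differences, used only to cross-check grad_z2."""
    grads = []
    for j in range(len(z_row)):
        up = list(z_row)
        up[j] += eps
        down = list(z_row)
        down[j] -= eps
        grads.append((sample_loss(up, label) - sample_loss(down, label)) / (2 * eps))
    return grads


z_row, label = [0.3, -1.2], 1
analytic = grad_z2(z_row, label)
numeric = numeric_grad(z_row, label)
print(analytic)  # approximately [-0.1824, 0.1824]
print(numeric)
```

The analytic gradient also sums to zero across the two classes, because the probabilities sum to 1 and exactly one indicator is 1.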
Gradient of the output layer's weights
$$
\frac {\partial J}{\partial W^{(2)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,1})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{2,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{2,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,2})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{3,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{3,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,3})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{4,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{4,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,4})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,1})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{2,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{2,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,2})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{3,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{3,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,3})
$$
$$
\frac {\partial J}{\partial W^{(2)}_{4,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{4,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,4})
$$
$$
\Rightarrow \frac {\partial J}{\partial W^{(2)}}
=\begin{bmatrix}
\frac {\partial J}{\partial W^{(2)}_{1,1}} & \frac {\partial J}{\partial W^{(2)}_{1,2}}\\
\frac {\partial J}{\partial W^{(2)}_{2,1}} & \frac {\partial J}{\partial W^{(2)}_{2,2}}\\
\frac {\partial J}{\partial W^{(2)}_{3,1}} & \frac {\partial J}{\partial W^{(2)}_{3,2}}\\
\frac {\partial J}{\partial W^{(2)}_{4,1}} & \frac {\partial J}{\partial W^{(2)}_{4,2}}
\end{bmatrix}
$$
$$
=\begin{bmatrix}
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,1}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,1})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,2}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,2})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,3}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,3})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,4}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,4})
\end{bmatrix}
$$
$$
=\frac {1}{m}\sum_{i=1}^{m}
\begin{bmatrix}
a^{(1)}_{i,1}\\
a^{(1)}_{i,2}\\
a^{(1)}_{i,3}\\
a^{(1)}_{i,4}
\end{bmatrix}
\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
=\frac {1}{m}\sum_{i=1}^{m} ((a^{(1)}_{i})^{T}\cdot \frac {\partial J}{\partial z^{(2)}_{i}})
=\frac {1}{m} (a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}}
=\frac {1}{m} (R^{4\times m}\cdot R^{m\times 2})
=R^{4\times 2}
$$
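The matrix form $\frac {1}{m}(a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}}$ and the entry-by-entry sums give the same $4\times 2$ matrix. The sketch checks this on made-up values, chosen as exact binary fractions so the comparison is exact; each row of the assumed `dz2` is a valid $p-1(y)$ residual (for example $p=[0.25,0.75]$ with label 2) and sums to zero.

```python
def matmul(A, B):
    """(r x k) . (k x c) for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


a1 = [[4.0, 6.0, 0.0, 2.0],
      [2.0, 2.0, 0.0, 3.0]]      # m x 4 hidden-layer outputs (made up)
dz2 = [[0.25, -0.25],
       [0.5, -0.5]]              # m x 2 residuals p(y_i = j) - 1(y_i = j) (made up)
m = len(a1)

# Matrix form: dJ/dW2 = (1/m) (a1)^T . dJ/dz2  -> (4 x m) . (m x 2) = 4 x 2
dW2 = [[v / m for v in row] for row in matmul(transpose(a1), dz2)]

# Entry-wise form: dJ/dW2_{k,j} = (1/m) sum_i dz2_{i,j} * a1_{i,k}
dW2_loop = [[sum(dz2[i][j] * a1[i][k] for i in range(m)) / m for j in range(2)]
            for k in range(4)]
print(dW2)  # [[1.0, -1.0], [1.25, -1.25], [0.0, 0.0], [1.0, -1.0]]
```

The third hidden unit outputs 0 for both samples here, so the third row of the gradient is 0: a hidden unit that is inactive across the whole batch gets no update to its outgoing weights.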
Gradient of the hidden layer's output vector
$$
\frac {\partial J}{\partial a^{(1)}_{i,1}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,1}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,1}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2}
$$
$$
\frac {\partial J}{\partial a^{(1)}_{i,2}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,2}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,2}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2}
$$
$$
\frac {\partial J}{\partial a^{(1)}_{i,3}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,3}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,3}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{3,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{3,2}
$$
$$
\frac {\partial J}{\partial a^{(1)}_{i,4}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,4}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,4}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{4,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{4,2}
$$
$$
\Rightarrow \frac {\partial J}{\partial a^{(1)}_{i}}
=\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix} \\
=\frac {\partial J}{\partial z^{(2)}_{i}}\cdot (W^{(2)})^T
=R^{1\times 2}\cdot R^{2\times 4}
=R^{1\times 4}
$$
$$
\Rightarrow \frac {\partial J}{\partial a^{(1)}}
=\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix} \\
=\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^T
=R^{m\times 2}\cdot R^{2\times 4}
=R^{m\times 4}
$$
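Propagating the residual back through $W^{(2)}$ is a single matrix product. The sketch compares $\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^T$ with the per-entry formula $\frac {\partial J}{\partial a^{(1)}_{i,k}}=\frac {\partial J}{\partial z^{(2)}_{i,1}} W^{(2)}_{k,1}+\frac {\partial J}{\partial z^{(2)}_{i,2}} W^{(2)}_{k,2}$ on made-up values. Unlike the weight gradients, no $\frac {1}{m}$ appears here, since each row stays tied to its own sample.

```python
def matmul(A, B):
    """(r x k) . (k x c) for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


dz2 = [[0.25, -0.25],
       [0.5, -0.5]]          # m x 2 residuals p(y_i = j) - 1(y_i = j) (made up)
W2 = [[1, 0],
      [0, 1],
      [1, -1],
      [-1, 1]]               # 4 x 2 (made up)

# Matrix form: dJ/da1 = dJ/dz2 . (W2)^T  -> (m x 2) . (2 x 4) = m x 4
da1 = matmul(dz2, transpose(W2))

# Entry-wise form, one sample i and one hidden unit k at a time
da1_loop = [[dz2[i][0] * W2[k][0] + dz2[i][1] * W2[k][1] for k in range(4)]
            for i in range(2)]
print(da1)  # [[0.25, -0.25, 0.5, -0.5], [0.5, -0.5, 1.0, -1.0]]
```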
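Gradient of the hidden layer's input vector
The derivative of $relu$ is taken as $1(z\geq 0)$: 1 for $z>0$, 0 for $z<0$, and, by convention, 1 at $z=0$, where $relu$ is not differentiable.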
$$
\frac {\partial J}{\partial z^{(1)}_{i,1}}
=\frac {\partial J}{\partial a^{(1)}_{i,1}}\cdot \frac {\partial a^{(1)}_{i,1}}{\partial z^{(1)}_{i,1}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2})\cdot 1(z^{(1)}_{i,1}\geq 0)
$$
$$
\frac {\partial J}{\partial z^{(1)}_{i,2}}
=\frac {\partial J}{\partial a^{(1)}_{i,2}}\cdot \frac {\partial a^{(1)}_{i,2}}{\partial z^{(1)}_{i,2}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2})\cdot 1(z^{(1)}_{i,2}\geq 0)
$$
$$
\frac {\partial J}{\partial z^{(1)}_{i,3}}
=\frac {\partial J}{\partial a^{(1)}_{i,3}}\cdot \frac {\partial a^{(1)}_{i,3}}{\partial z^{(1)}_{i,3}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{3,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{3,2})\cdot 1(z^{(1)}_{i,3}\geq 0)
$$
$$
\frac {\partial J}{\partial z^{(1)}_{i,4}}
=\frac {\partial J}{\partial a^{(1)}_{i,4}}\cdot \frac {\partial a^{(1)}_{i,4}}{\partial z^{(1)}_{i,4}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{4,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{4,2})\cdot 1(z^{(1)}_{i,4}\geq 0)
$$
$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}_{i}}
=(\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
\frac {\partial a^{(1)}_{i,1}}{\partial z^{(1)}_{i,1}} &
\frac {\partial a^{(1)}_{i,2}}{\partial z^{(1)}_{i,2}} &
\frac {\partial a^{(1)}_{i,3}}{\partial z^{(1)}_{i,3}} &
\frac {\partial a^{(1)}_{i,4}}{\partial z^{(1)}_{i,4}}
\end{bmatrix}\\
=(R^{1\times 2}\cdot R^{2\times 4})\ast R^{1\times 4}
=R^{1\times 4}
$$
Here $\ast$ denotes element-wise (Hadamard) multiplication, not a matrix product.
$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}_{i}}
=(\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
1(z^{(1)}_{i,1}\geq 0) & 1(z^{(1)}_{i,2}\geq 0) & 1(z^{(1)}_{i,3}\geq 0) & 1(z^{(1)}_{i,4}\geq 0)
\end{bmatrix}\\
=(R^{1\times 2}\cdot R^{2\times 4})\ast R^{1\times 4}
=R^{1\times 4}
$$
$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}}
=(\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
1(z^{(1)}_{1,1}\geq 0) & 1(z^{(1)}_{1,2}\geq 0) & 1(z^{(1)}_{1,3}\geq 0) & 1(z^{(1)}_{1,4}\geq 0)\\
\vdots & \vdots & \vdots & \vdots\\
1(z^{(1)}_{m,1}\geq 0) & 1(z^{(1)}_{m,2}\geq 0) & 1(z^{(1)}_{m,3}\geq 0) & 1(z^{(1)}_{m,4}\geq 0)
\end{bmatrix}\\
=\frac {\partial J}{\partial a^{(1)}} \ast 1(z^{(1)}\geq 0)
=(R^{m\times 2}\cdot R^{2\times 4})\ast R^{m\times 4}
=R^{m\times 4}
$$
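The ReLU step is an element-wise product of two $m\times 4$ matrices, which is easy to confuse with a matrix product. A minimal sketch with made-up values, following the text's convention that the derivative at $z=0$ is 1; `hidden_input_grad` is a hypothetical name.

```python
def hidden_input_grad(da1, z1):
    """dJ/dz1 = dJ/da1 * 1(z1 >= 0), an element-wise product of two m x 4 matrices."""
    return [[d * (1 if z >= 0 else 0) for d, z in zip(d_row, z_row)]
            for d_row, z_row in zip(da1, z1)]


z1 = [[4, 6, 0, 2],
      [2, 2, -2, 3]]                 # m x 4 hidden inputs (made up)
da1 = [[0.25, -0.25, 0.5, -0.5],
       [0.5, -0.5, 1.0, -1.0]]       # m x 4 gradient w.r.t. the hidden outputs (made up)
dz1 = hidden_input_grad(da1, z1)
print(dz1)  # [[0.25, -0.25, 0.5, -0.5], [0.5, -0.5, 0.0, -1.0]]
```

Only the entry with a negative input is zeroed; the entry with $z=0$ passes its gradient through because of the $1(z\geq 0)$ convention.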
Gradient of the hidden layer's weights
$$
\frac {\partial J}{\partial W^{(1)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot
\frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2})\cdot 1(z^{(1)}_{i,1}\geq 0)\cdot a^{(0)}_{i,1}
$$
$$
\frac {\partial J}{\partial W^{(1)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot
\frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2})\cdot 1(z^{(1)}_{i,2}\geq 0)\cdot a^{(0)}_{i,1}
$$
$$
\Rightarrow \frac {\partial J}{\partial W^{(1)}_{k,l}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,l}}\cdot
\frac {\partial z^{(1)}_{i,l}}{\partial W^{(1)}_{k,l}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{l,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{l,2})\cdot 1(z^{(1)}_{i,l}\geq 0)\cdot a^{(0)}_{i,k}
$$
$$
\Rightarrow \frac {\partial J}{\partial W^{(1)}}
=\begin{bmatrix}
\frac {\partial J}{\partial W^{(1)}_{1,1}} & \frac {\partial J}{\partial W^{(1)}_{1,2}} & \frac {\partial J}{\partial W^{(1)}_{1,3}} & \frac {\partial J}{\partial W^{(1)}_{1,4}}\\
\frac {\partial J}{\partial W^{(1)}_{2,1}} & \frac {\partial J}{\partial W^{(1)}_{2,2}} & \frac {\partial J}{\partial W^{(1)}_{2,3}} & \frac {\partial J}{\partial W^{(1)}_{2,4}}\\
\frac {\partial J}{\partial W^{(1)}_{3,1}} & \frac {\partial J}{\partial W^{(1)}_{3,2}} & \frac {\partial J}{\partial W^{(1)}_{3,3}} & \frac {\partial J}{\partial W^{(1)}_{3,4}}
\end{bmatrix}\\
=\begin{bmatrix}
\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{1,3}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{1,4}}\\
\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{2,1}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{2,2}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{2,3}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{2,4}}\\
\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{3,1}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{3,2}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{3,3}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{3,4}}
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m} \begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{1,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{1,4}}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{2,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{2,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{2,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{2,4}}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{3,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{3,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{3,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{3,4}}
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m} \begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,1}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,2}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,3}
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m}
\begin{bmatrix}
a^{(0)}_{i,1}\\
a^{(0)}_{i,2}\\
a^{(0)}_{i,3}
\end{bmatrix}
\begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}
\end{bmatrix}
=\frac {1}{m}\sum_{i=1}^{m} (a^{(0)}_{i})^T\cdot \frac {\partial J}{\partial z^{(1)}_{i}}
=\frac {1}{m} (a^{(0)})^T\cdot \frac {\partial J}{\partial z^{(1)}}
=R^{3\times m}\cdot R^{m\times 4}
=R^{3\times 4}
$$
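The whole chain for $\frac {\partial J}{\partial W^{(1)}}$ can be checked end to end with finite differences. The sketch below is self-contained and illustrative only: it assumes a random batch of $m=5$ samples, 1-based labels and uniform random parameters, none of which come from the text. If the derivation is right, the largest difference between the analytic and numerical gradients stays at the level of finite-difference noise.

```python
import math
import random


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def forward(a0, W1, b1, W2, b2):
    """Returns z1 (m x 4) and the softmax scores p (m x 2)."""
    z1 = [[v + b for v, b in zip(row, b1[0])] for row in matmul(a0, W1)]
    a1 = [[max(0.0, v) for v in row] for row in z1]
    z2 = [[v + b for v, b in zip(row, b2[0])] for row in matmul(a1, W2)]
    p = []
    for row in z2:
        shift = max(row)
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        p.append([e / total for e in exps])
    return z1, p


def loss(a0, y, W1, b1, W2, b2):
    _, p = forward(a0, W1, b1, W2, b2)
    return -sum(math.log(p[i][y[i] - 1]) for i in range(len(y))) / len(y)


def grad_W1(a0, y, W1, b1, W2, b2):
    """dJ/dW1 = (1/m) (a0)^T . [(dJ/dz2 . W2^T) * 1(z1 >= 0)], as derived above."""
    z1, p = forward(a0, W1, b1, W2, b2)
    m = len(y)
    dz2 = [[p[i][j] - (1.0 if j == y[i] - 1 else 0.0) for j in range(2)] for i in range(m)]
    da1 = matmul(dz2, transpose(W2))
    dz1 = [[d * (1.0 if z >= 0 else 0.0) for d, z in zip(d_row, z_row)]
           for d_row, z_row in zip(da1, z1)]
    return [[v / m for v in row] for row in matmul(transpose(a0), dz1)]


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
a0, W1, b1 = random_matrix(5, 3), random_matrix(3, 4), random_matrix(1, 4)
W2, b2 = random_matrix(4, 2), random_matrix(1, 2)
y = [1, 2, 2, 1, 2]

analytic = grad_W1(a0, y, W1, b1, W2, b2)
eps = 1e-6
max_err = 0.0
for k in range(3):
    for j in range(4):
        original = W1[k][j]
        W1[k][j] = original + eps
        up = loss(a0, y, W1, b1, W2, b2)
        W1[k][j] = original - eps
        down = loss(a0, y, W1, b1, W2, b2)
        W1[k][j] = original  # restore the exact original value
        max_err = max(max_err, abs((up - down) / (2 * eps) - analytic[k][j]))
print(max_err)  # a tiny number, well below 1e-6
```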
Summary
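The forward pass of TestNet is:
$$
z^{(1)}=a^{(0)}\cdot W^{(1)}+b^{(1)} \\
a^{(1)}=relu(z^{(1)}) \\
z^{(2)}=a^{(1)}\cdot W^{(2)}+b^{(2)} \\
h(z^{(2)})=softmax(z^{(2)}) \\
J=-\frac {1}{m}\sum_{i=1}^{m}\sum_{j=1}^{2} 1(y_{i}=j)\cdot \ln p(y_{i}=j)
$$
The backward pass is:
$$
\frac {\partial J}{\partial z^{(2)}}=h(z^{(2)})-Y \\
\frac {\partial J}{\partial W^{(2)}}=\frac {1}{m} (a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}} \\
\frac {\partial J}{\partial a^{(1)}}=\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^T \\
\frac {\partial J}{\partial z^{(1)}}=\frac {\partial J}{\partial a^{(1)}}\ast 1(z^{(1)}\geq 0) \\
\frac {\partial J}{\partial W^{(1)}}=\frac {1}{m} (a^{(0)})^T\cdot \frac {\partial J}{\partial z^{(1)}}
$$
Here softmax is applied to each row, $Y\in R^{m\times 2}$ is the label indicator matrix with $Y_{i,j}=1(y_{i}=j)$, and $\ast$ is element-wise multiplication.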
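Suppose the batch size is $m$. Referring to the articles on the backpropagation algorithm (反向传导算法) and on the mathematics of neural-network backpropagation (神经网络反向传播的数学原理), define the residual of each layer as the gradient of the loss with respect to that layer's input vector, $\delta^{(l)}=\frac {\partial J}{\partial z^{(l)}}$.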
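Forward propagation steps
Between layers, the computation is the weighted sum of the previous layer's output vector with the weight matrix (plus the bias vector), followed by applying the activation function to the resulting input vector (ReLU in this example).
After the output layer produces its result, the score function is applied to obtain the final output (softmax classification in this example):
$$
h(z^{(L)})
=\begin{bmatrix}
\frac {exp(z^{(L)}_{1,1})}{\sum exp(z^{(L)}_{1})} & \dots & \frac {exp(z^{(L)}_{1,C})}{\sum exp(z^{(L)}_{1})} \\
\vdots & \ddots & \vdots\\
\frac {exp(z^{(L)}_{m,1})}{\sum exp(z^{(L)}_{m})} & \dots & \frac {exp(z^{(L)}_{m,C})}{\sum exp(z^{(L)}_{m})}
\end{bmatrix}
$$
where $C$ is the number of classes.
The loss function then computes the final loss value from this result (cross-entropy loss in this example).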
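Backpropagation steps
- Compute the gradient of the loss with respect to the output layer's input vector (the residual of the last layer)
- Compute the residuals of the hidden layers, working backward from layer $L-1$ to layer 1
- Use the residuals to complete the gradient computation for all learnable parameters (weight matrices and bias vectors)
- Update the weight matrices and bias vectors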
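The text does not specify how the parameters are updated. Assuming plain full-batch gradient descent with a learning rate `lr` (a hypothetical choice, as is the helper name `gradient_descent_step`), the update is $P\leftarrow P-lr\cdot\frac {\partial J}{\partial P}$ for every weight matrix and bias vector. The sketch uses small made-up matrices, not TestNet's actual shapes.

```python
def gradient_descent_step(params, grads, lr):
    """Return new parameters P - lr * dJ/dP for every matrix in `params` (inputs are not modified)."""
    return {
        name: [[w - lr * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(params[name], grads[name])]
        for name in params
    }


# Made-up parameters and gradients; the keys follow the text's W^{(l)} / b^{(l)} naming.
params = {"W2": [[1.0, 2.0], [0.5, -1.0]], "b2": [[0.0, 0.25]]}
grads = {"W2": [[0.5, -0.25], [1.0, 0.0]], "b2": [[0.25, -0.5]]}
updated = gradient_descent_step(params, grads, lr=0.5)
print(updated)  # {'W2': [[0.75, 2.125], [0.0, -1.0]], 'b2': [[-0.125, 0.5]]}
```

Returning new matrices instead of updating in place keeps the old parameters available, for example for comparing losses before and after a step; an in-place update would work equally well.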