Neural Network Derivation with Batched Data

Feed a batch of data into a neural network and derive the forward and backward passes.

The TestNet Network

TestNet is a 2-layer neural network with the following structure:

  • the input layer has 3 neurons
  • the hidden layer has 4 neurons
  • the output layer has 2 neurons

  • the activation function is relu
  • the scoring function is softmax regression
  • the cost function is the cross-entropy loss

Network Notation

This section fixes the notation used for the network's computations.

Neurons and layers

  • $L$ denotes the number of layers (the input layer is not counted)
    • $L=2$, where the input layer is layer 0, the hidden layer is layer 1, and the output layer is layer 2
  • $n^{(l)}$ denotes the number of neurons in layer $l$ (excluding the bias neuron)
    • $n^{(0)}=3$: the input layer has 3 neurons
    • $n^{(1)}=4$: the hidden layer has 4 neurons
    • $n^{(2)}=2$: the output layer has 2 neurons

Weight matrices and biases

  • $W^{(l)}$ denotes the weight matrix from layer $l-1$ to layer $l$; its number of rows is the number of neurons in layer $l-1$ and its number of columns is the number of neurons in layer $l$
    • $W^{(1)}$ is the weight matrix from the input layer to the hidden layer, of size $3\times 4$
    • $W^{(2)}$ is the weight matrix from the hidden layer to the output layer, of size $4\times 2$
  • $W^{(l)}_{i,j}$ denotes the weight from neuron $i$ of layer $l-1$ to neuron $j$ of layer $l$
    • $i$ ranges over $1,\dots,n^{(l-1)}$
    • $j$ ranges over $1,\dots,n^{(l)}$
  • $W^{(l)}_{i,}$ denotes the weight row vector associated with neuron $i$ of layer $l-1$, of size $1\times n^{(l)}$
  • $W^{(l)}_{,j}$ denotes the weight column vector associated with neuron $j$ of layer $l$, of size $n^{(l-1)}\times 1$
  • $b^{(l)}$ denotes the bias vector of layer $l$
    • $b^{(1)}$ is the bias vector from the input layer to the hidden layer, of size $1\times 4$
    • $b^{(2)}$ is the bias vector from the hidden layer to the output layer, of size $1\times 2$
  • $b^{(l)}_{j}$ denotes the bias of neuron $j$ in layer $l$
    • e.g. $b^{(1)}_{2}$ is the bias of the 2nd hidden-layer neuron

Neuron input and output vectors

  • $a^{(l)}$ denotes the output vector of layer $l$: $a^{(l)}=[a^{(l)}_{1},a^{(l)}_{2},\dots,a^{(l)}_{m}]^{T}$, one row per sample

    • $a^{(0)}$ is the output vector of the input layer, of size $m\times 3$
    • $a^{(1)}$ is the output vector of the hidden layer, of size $m\times 4$
    • $a^{(2)}$ is the output vector of the output layer, of size $m\times 2$
  • $a^{(l)}_{j}$ denotes the output value of unit $j$ in layer $l$, i.e. the input value after the activation function

    • e.g. $a^{(1)}_{3}$ is the output of the 3rd hidden unit: $a^{(1)}_{3}=g(z^{(1)}_{3})$
  • $z^{(l)}$ denotes the input vector of layer $l$: $z^{(l)}=[z^{(l)}_{1},z^{(l)}_{2},\dots,z^{(l)}_{m}]^{T}$

    • $z^{(1)}$ is the input vector of the hidden layer, of size $m\times 4$
    • $z^{(2)}$ is the input vector of the output layer, of size $m\times 2$
  • $z^{(l)}_{i,j}$ denotes the input value of unit $j$ in layer $l$ for sample $i$: the weighted sum of the previous layer's output for sample $i$ with the $j$-th weight column of this layer

    • e.g. for sample $i=1$ and unit $j=2$: $z^{(1)}_{1,2}=b^{(1)}_{2}+a^{(0)}_{1,1}\cdot W^{(1)}_{1,2}+a^{(0)}_{1,2}\cdot W^{(1)}_{2,2}+a^{(0)}_{1,3}\cdot W^{(1)}_{3,2}$

Activation function

  • $g(\cdot)$ denotes the activation function

Scoring and loss functions

  • $h(\cdot)$ denotes the scoring function
  • $J$ denotes the cost function

Neuron Execution Steps

A neuron's computation splits into 2 steps:

  1. input vector = weighted sum of the previous layer's output vector with the weight matrix + bias vector

$$
z^{(l)}_{i,j}=a^{(l-1)}_{i}\cdot W^{(l)}_{,j} + b^{(l)}_{j} \Rightarrow
z^{(l)}=a^{(l-1)}\cdot W^{(l)} + b^{(l)}
$$

  2. output vector = activation function applied to the input vector

$$
a^{(l)}_{i}=g(z^{(l)}_{i})
\Rightarrow
a^{(l)}=g(z^{(l)})
$$
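The two steps above can be sketched in NumPy. This is a minimal illustration: the function name `layer_forward` and the random toy batch are ours, not part of TestNet.

```python
import numpy as np

def layer_forward(a_prev, W, b, activation=lambda z: np.maximum(0.0, z)):
    """One layer: z = a_prev @ W + b, then the activation (relu by default)."""
    z = a_prev @ W + b   # z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}
    a = activation(z)    # a^{(l)} = g(z^{(l)})
    return z, a

# a toy batch: 5 samples with 3 features, mapped to 4 hidden units
rng = np.random.default_rng(0)
a0 = rng.standard_normal((5, 3))
W1 = rng.standard_normal((3, 4))
b1 = np.zeros((1, 4))
z1, a1 = layer_forward(a0, W1, b1)
```

The bias `b1` is a `1 x 4` row vector, matching the derivation; NumPy broadcasts it over the batch rows.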

Network Structure

For the input layer:

$$
a^{(0)}
=\begin{bmatrix}
a^{(0)}_{1}\\
\vdots\\
a^{(0)}_{m}
\end{bmatrix}
=\begin{bmatrix}
a^{(0)}_{1,1} & a^{(0)}_{1,2} & a^{(0)}_{1,3}\\
\vdots & \vdots & \vdots\\
a^{(0)}_{m,1} & a^{(0)}_{m,2} & a^{(0)}_{m,3}
\end{bmatrix}\in R^{m\times 3}
$$

For the hidden layer:

$$
W^{(1)}
=\begin{bmatrix}
W^{(1)}_{1,1} & W^{(1)}_{1,2} & W^{(1)}_{1,3} & W^{(1)}_{1,4}\\
W^{(1)}_{2,1} & W^{(1)}_{2,2} & W^{(1)}_{2,3} & W^{(1)}_{2,4}\\
W^{(1)}_{3,1} & W^{(1)}_{3,2} & W^{(1)}_{3,3} & W^{(1)}_{3,4}
\end{bmatrix}
\in R^{3\times 4}
$$

$$
b^{(1)}=[b^{(1)}_{1},b^{(1)}_{2},b^{(1)}_{3},b^{(1)}_{4}]\in R^{1\times 4}
$$

$$
z^{(1)}
=\begin{bmatrix}
z^{(1)}_{1,1} & z^{(1)}_{1,2} & z^{(1)}_{1,3} & z^{(1)}_{1,4}\\
\vdots & \vdots & \vdots & \vdots\\
z^{(1)}_{m,1} & z^{(1)}_{m,2} & z^{(1)}_{m,3} & z^{(1)}_{m,4}
\end{bmatrix}\in R^{m\times 4}
$$

$$
a^{(1)}
=\begin{bmatrix}
a^{(1)}_{1,1} & a^{(1)}_{1,2} & a^{(1)}_{1,3} & a^{(1)}_{1,4}\\
\vdots & \vdots & \vdots & \vdots\\
a^{(1)}_{m,1} & a^{(1)}_{m,2} & a^{(1)}_{m,3} & a^{(1)}_{m,4}
\end{bmatrix}\in R^{m\times 4}
$$

For the output layer:

$$
W^{(2)}
=\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{1,2}\\
W^{(2)}_{2,1} & W^{(2)}_{2,2}\\
W^{(2)}_{3,1} & W^{(2)}_{3,2}\\
W^{(2)}_{4,1} & W^{(2)}_{4,2}
\end{bmatrix}
\in R^{4\times 2}
$$

$$
b^{(2)}=[b^{(2)}_{1},b^{(2)}_{2}]\in R^{1\times 2}
$$

$$
z^{(2)}
=\begin{bmatrix}
z^{(2)}_{1,1} & z^{(2)}_{1,2}\\
\vdots & \vdots\\
z^{(2)}_{m,1} & z^{(2)}_{m,2}
\end{bmatrix}\in R^{m\times 2}
$$
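The shapes above can be checked with a hypothetical NumPy sketch of TestNet's parameters; the batch size `m = 8` and the small random initialization are illustrative assumptions.

```python
import numpy as np

m = 8                                     # batch size (hypothetical)
rng = np.random.default_rng(42)
a0 = rng.standard_normal((m, 3))          # input batch:  R^{m x 3}
W1 = 0.01 * rng.standard_normal((3, 4))   # R^{3 x 4}
b1 = np.zeros((1, 4))                     # R^{1 x 4}
W2 = 0.01 * rng.standard_normal((4, 2))   # R^{4 x 2}
b2 = np.zeros((1, 2))                     # R^{1 x 2}

z1 = a0 @ W1 + b1          # R^{m x 4}; the bias row broadcasts over the batch
a1 = np.maximum(0.0, z1)   # R^{m x 4}
z2 = a1 @ W2 + b2          # R^{m x 2}
```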

Score values: $h(z^{(2)})\in R^{m\times 2}$, one row of class probabilities per sample.

Loss value: $J\in R$, a single scalar for the whole batch.

Forward Propagation

Input layer to hidden layer:

$$
z^{(1)}_{i,1}=a^{(0)}_{i}\cdot W^{(1)}_{,1}+b^{(1)}_{1}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,1}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,1}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,1}
+b^{(1)}_{1}
$$

$$
z^{(1)}_{i,2}=a^{(0)}_{i}\cdot W^{(1)}_{,2}+b^{(1)}_{2}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,2}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,2}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,2}
+b^{(1)}_{2}
$$

$$
z^{(1)}_{i,3}=a^{(0)}_{i}\cdot W^{(1)}_{,3}+b^{(1)}_{3}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,3}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,3}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,3}
+b^{(1)}_{3}
$$

$$
z^{(1)}_{i,4}=a^{(0)}_{i}\cdot W^{(1)}_{,4}+b^{(1)}_{4}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,4}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,4}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,4}
+b^{(1)}_{4}
$$

$$
\Rightarrow z^{(1)}_{i}
=[z^{(1)}_{i,1},z^{(1)}_{i,2},z^{(1)}_{i,3},z^{(1)}_{i,4}]
=a^{(0)}_{i}\cdot W^{(1)}+b^{(1)}
$$

Hidden layer input vector to output vector:

$$
a^{(1)}_{i,1}=relu(z^{(1)}_{i,1}) \\
a^{(1)}_{i,2}=relu(z^{(1)}_{i,2}) \\
a^{(1)}_{i,3}=relu(z^{(1)}_{i,3}) \\
a^{(1)}_{i,4}=relu(z^{(1)}_{i,4})
$$

$$
\Rightarrow
a^{(1)}_{i}=[a^{(1)}_{i,1},a^{(1)}_{i,2},a^{(1)}_{i,3},a^{(1)}_{i,4}]
=relu(z^{(1)}_{i})
$$

Hidden layer to output layer:

$$
z^{(2)}_{i,1}=a^{(1)}_{i}\cdot W^{(2)}_{,1}+b^{(2)}_{1}
=a^{(1)}_{i,1}\cdot W^{(2)}_{1,1}
+a^{(1)}_{i,2}\cdot W^{(2)}_{2,1}
+a^{(1)}_{i,3}\cdot W^{(2)}_{3,1}
+a^{(1)}_{i,4}\cdot W^{(2)}_{4,1}
+b^{(2)}_{1}
$$

$$
z^{(2)}_{i,2}=a^{(1)}_{i}\cdot W^{(2)}_{,2}+b^{(2)}_{2}
=a^{(1)}_{i,1}\cdot W^{(2)}_{1,2}
+a^{(1)}_{i,2}\cdot W^{(2)}_{2,2}
+a^{(1)}_{i,3}\cdot W^{(2)}_{3,2}
+a^{(1)}_{i,4}\cdot W^{(2)}_{4,2}
+b^{(2)}_{2}
$$

$$
\Rightarrow z^{(2)}_{i}
=[z^{(2)}_{i,1},z^{(2)}_{i,2}]
=a^{(1)}_{i}\cdot W^{(2)}+b^{(2)}
$$

Scoring operation:

$$
p(y_{i}=1)=\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})} \\
p(y_{i}=2)=\frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}
$$

$$
\Rightarrow h(z^{(2)}_{i})
=[p(y_{i}=1),p(y_{i}=2)]
=[\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}, \frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}]
$$
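The scoring operation can be sketched as a row-wise softmax. The max-subtraction is a standard numerical-stability trick, not part of the derivation; the toy `z2` values are ours.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids exp overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

z2 = np.array([[2.0, 1.0],
               [0.0, 0.0]])
p = softmax(z2)   # each row is [p(y_i = 1), p(y_i = 2)]
```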

Loss value: with the indicator function $1(\cdot)$, the batch cross-entropy loss is

$$
J=-\frac {1}{m}\sum_{i=1}^{m}\sum_{c=1}^{2} 1(y_{i}=c)\cdot \log p(y_{i}=c)
$$

Backward Propagation

Compute the gradient of the output layer's input vector:

$$
\frac {\partial J}{\partial z^{(2)}_{i,1}}=
(-1)\cdot \frac {1(y_{i}=1)}{p(y_{i}=1)}\cdot \frac {\partial p(y_{i}=1)}{\partial z^{(2)}_{i,1}}
+(-1)\cdot \frac {1(y_{i}=2)}{p(y_{i}=2)}\cdot \frac {\partial p(y_{i}=2)}{\partial z^{(2)}_{i,1}}
$$

$$
\frac {\partial p(y_{i}=1)}{\partial z^{(2)}_{i,1}}
=\frac {exp(z^{(2)}_{i,1})\cdot \sum exp(z^{(2)}_{i})-exp(z^{(2)}_{i,1})\cdot exp(z^{(2)}_{i,1})}{(\sum exp(z^{(2)}_{i}))^2}
=\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}
-(\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})})^2
=p(y_{i}=1)-(p(y_{i}=1))^2
$$

$$
\frac {\partial p(y_{i}=2)}{\partial z^{(2)}_{i,1}}
=\frac {-exp(z^{(2)}_{i,2})\cdot exp(z^{(2)}_{i,1})}{(\sum exp(z^{(2)}_{i}))^2}
=(-1)\cdot \frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}\cdot \frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}
=(-1)\cdot p(y_{i}=1)p(y_{i}=2)
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i,1}}
=(-1)\cdot \frac {1(y_{i}=1)}{p(y_{i}=1)}\cdot (p(y_{i}=1)-(p(y_{i}=1))^2)
+(-1)\cdot \frac {1(y_{i}=2)}{p(y_{i}=2)}\cdot (-1)\cdot p(y_{i}=1)p(y_{i}=2) \\
=(-1)\cdot 1(y_{i}=1)\cdot (1-p(y_{i}=1))
+1(y_{i}=2)\cdot p(y_{i}=1)
=p(y_{i}=1)-1(y_{i}=1)
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i,2}}
=p(y_{i}=2)-1(y_{i}=2)
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i}}
=[p(y_{i}=1)-1(y_{i}=1), p(y_{i}=2)-1(y_{i}=2)]
$$
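The result "probabilities minus the indicator" translates directly to vectorized code. A sketch under illustrative data; labels are 0-based here, which is an implementation convenience rather than part of the derivation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

z2 = np.array([[2.0, 1.0], [0.5, 1.5], [0.0, 0.0]])  # toy scores, m = 3
y = np.array([0, 1, 0])                              # class labels (0-based)

p = softmax(z2)
one_hot = np.eye(2)[y]   # 1(y_i = c) as an m x 2 indicator matrix
dz2 = p - one_hot        # dJ/dz^{(2)} per sample, before the 1/m averaging
```

Each row of `dz2` sums to zero, since both the probabilities and the indicator row sum to 1.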

Compute the gradient of the output layer's weight matrix:

$$
\frac {\partial J}{\partial W^{(2)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,1})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{2,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{2,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,2})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{3,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{3,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,3})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{4,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{4,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,4})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,1})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{2,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{2,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,2})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{3,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{3,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,3})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{4,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{4,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,4})
$$

$$
\Rightarrow \frac {\partial J}{\partial W^{(2)}}
=\begin{bmatrix}
\frac {\partial J}{\partial W^{(2)}_{1,1}} & \frac {\partial J}{\partial W^{(2)}_{1,2}}\\
\frac {\partial J}{\partial W^{(2)}_{2,1}} & \frac {\partial J}{\partial W^{(2)}_{2,2}}\\
\frac {\partial J}{\partial W^{(2)}_{3,1}} & \frac {\partial J}{\partial W^{(2)}_{3,2}}\\
\frac {\partial J}{\partial W^{(2)}_{4,1}} & \frac {\partial J}{\partial W^{(2)}_{4,2}}
\end{bmatrix}
$$

$$
=\begin{bmatrix}
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,1}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,1})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,2}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,2})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,3}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,3})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,4}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,4})
\end{bmatrix}
$$

$$
=\frac {1}{m}\sum_{i=1}^{m}
\begin{bmatrix}
a^{(1)}_{i,1}\\
a^{(1)}_{i,2}\\
a^{(1)}_{i,3}\\
a^{(1)}_{i,4}
\end{bmatrix}
\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\\
=\frac {1}{m}\sum_{i=1}^{m} ((a^{(1)}_{i})^{T}\cdot \frac {\partial J}{\partial z^{(2)}_{i}})
=\frac {1}{m} (a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}}
=\frac {1}{m} (R^{4\times m}\cdot R^{m\times 2})
=R^{4\times 2}
$$
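The per-entry sums collapse into a single matrix product, which is easy to verify numerically. A sketch with stand-in values; the bias gradient `db2` shown here follows the same averaging pattern but is not derived in the text above.

```python
import numpy as np

m = 4
rng = np.random.default_rng(1)
a1 = np.maximum(0.0, rng.standard_normal((m, 4)))  # hidden outputs a^{(1)}
dz2 = rng.standard_normal((m, 2))                  # stand-in for p - 1(y)

dW2 = a1.T @ dz2 / m                   # (1/m) (a^{(1)})^T dJ/dz^{(2)}: R^{4 x 2}
db2 = dz2.mean(axis=0, keepdims=True)  # bias gradient (assumed; same averaging)

# the matrix product reproduces the per-entry sum from the derivation
loop = sum(a1[i, 1] * dz2[i, 0] for i in range(m)) / m
```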

Compute the gradient of the hidden layer's output vector:

$$
\frac {\partial J}{\partial a^{(1)}_{i,1}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,1}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,1}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2}
$$

$$
\frac {\partial J}{\partial a^{(1)}_{i,2}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,2}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,2}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2}
$$

$$
\frac {\partial J}{\partial a^{(1)}_{i,3}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,3}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,3}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{3,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{3,2}
$$

$$
\frac {\partial J}{\partial a^{(1)}_{i,4}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,4}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,4}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{4,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{4,2}
$$

$$
\Rightarrow \frac {\partial J}{\partial a^{(1)}_{i}}
=\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix} \\
=\frac {\partial J}{\partial z^{(2)}_{i}}\cdot (W^{(2)})^T
=R^{1\times 2}\cdot R^{2\times 4}
=R^{1\times 4}
$$

$$
\Rightarrow \frac {\partial J}{\partial a^{(1)}}
=\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix} \\
=\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^T
=R^{m\times 2}\cdot R^{2\times 4}
=R^{m\times 4}
$$
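The batch form is one matrix product against the transposed weight matrix. A sketch with stand-in gradients:

```python
import numpy as np

rng = np.random.default_rng(2)
dz2 = rng.standard_normal((6, 2))  # dJ/dz^{(2)}: R^{m x 2}, m = 6 here
W2 = rng.standard_normal((4, 2))   # R^{4 x 2}

da1 = dz2 @ W2.T                   # dJ/da^{(1)}: R^{m x 2} . R^{2 x 4} = R^{m x 4}
```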

Compute the gradient of the hidden layer's input vector:

$$
\frac {\partial J}{\partial z^{(1)}_{i,1}}
=\frac {\partial J}{\partial a^{(1)}_{i,1}}\cdot \frac {\partial a^{(1)}_{i,1}}{\partial z^{(1)}_{i,1}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2})\cdot 1(z^{(1)}_{i,1}\geq 0)
$$

$$
\frac {\partial J}{\partial z^{(1)}_{i,2}}
=\frac {\partial J}{\partial a^{(1)}_{i,2}}\cdot \frac {\partial a^{(1)}_{i,2}}{\partial z^{(1)}_{i,2}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2})\cdot 1(z^{(1)}_{i,2}\geq 0)
$$

$$
\frac {\partial J}{\partial z^{(1)}_{i,3}}
=\frac {\partial J}{\partial a^{(1)}_{i,3}}\cdot \frac {\partial a^{(1)}_{i,3}}{\partial z^{(1)}_{i,3}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{3,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{3,2})\cdot 1(z^{(1)}_{i,3}\geq 0)
$$

$$
\frac {\partial J}{\partial z^{(1)}_{i,4}}
=\frac {\partial J}{\partial a^{(1)}_{i,4}}\cdot \frac {\partial a^{(1)}_{i,4}}{\partial z^{(1)}_{i,4}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{4,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{4,2})\cdot 1(z^{(1)}_{i,4}\geq 0)
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}_{i}}
=(\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
\frac {\partial a^{(1)}_{i,1}}{\partial z^{(1)}_{i,1}}&
\frac {\partial a^{(1)}_{i,2}}{\partial z^{(1)}_{i,2}}&
\frac {\partial a^{(1)}_{i,3}}{\partial z^{(1)}_{i,3}}&
\frac {\partial a^{(1)}_{i,4}}{\partial z^{(1)}_{i,4}}
\end{bmatrix} \\
=(R^{1\times 2}\cdot R^{2\times 4})\ast R^{1\times 4}
=R^{1\times 4}
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}_{i}}
=(\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
1(z^{(1)}_{i,1}\geq 0) & 1(z^{(1)}_{i,2}\geq 0) & 1(z^{(1)}_{i,3}\geq 0) & 1(z^{(1)}_{i,4}\geq 0)
\end{bmatrix} \\
=(R^{1\times 2}\cdot R^{2\times 4})\ast R^{1\times 4}
=R^{1\times 4}
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}}
=(\frac {\partial J}{\partial z^{(2)}}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
1(z^{(1)}_{1,1}\geq 0) & 1(z^{(1)}_{1,2}\geq 0) & 1(z^{(1)}_{1,3}\geq 0) & 1(z^{(1)}_{1,4}\geq 0)\\
\vdots & \vdots & \vdots & \vdots\\
1(z^{(1)}_{m,1}\geq 0) & 1(z^{(1)}_{m,2}\geq 0) & 1(z^{(1)}_{m,3}\geq 0) & 1(z^{(1)}_{m,4}\geq 0)
\end{bmatrix} \\
=\frac {\partial J}{\partial a^{(1)}} \ast 1(z^{(1)}\geq 0)
=(R^{m\times 2}\cdot R^{2\times 4})\ast R^{m\times 4}
=R^{m\times 4}
$$

Compute the gradient of the hidden layer's weight matrix:

$$
\frac {\partial J}{\partial W^{(1)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot
\frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2})\cdot 1(z^{(1)}_{i,1}\geq 0)\cdot a^{(0)}_{i,1}
$$

$$
\frac {\partial J}{\partial W^{(1)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot
\frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2})\cdot 1(z^{(1)}_{i,2}\geq 0)\cdot a^{(0)}_{i,1}
$$

$$
\Rightarrow \frac {\partial J}{\partial W^{(1)}_{k,l}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,l}}\cdot
\frac {\partial z^{(1)}_{i,l}}{\partial W^{(1)}_{k,l}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{l,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{l,2})\cdot 1(z^{(1)}_{i,l}\geq 0)\cdot a^{(0)}_{i,k}
$$

$$
\Rightarrow \frac {\partial J}{\partial W^{(1)}}
=\begin{bmatrix}
\frac {\partial J}{\partial W^{(1)}_{1,1}} & \frac {\partial J}{\partial W^{(1)}_{1,2}} & \frac {\partial J}{\partial W^{(1)}_{1,3}} & \frac {\partial J}{\partial W^{(1)}_{1,4}}\\
\frac {\partial J}{\partial W^{(1)}_{2,1}} & \frac {\partial J}{\partial W^{(1)}_{2,2}} & \frac {\partial J}{\partial W^{(1)}_{2,3}} & \frac {\partial J}{\partial W^{(1)}_{2,4}}\\
\frac {\partial J}{\partial W^{(1)}_{3,1}} & \frac {\partial J}{\partial W^{(1)}_{3,2}} & \frac {\partial J}{\partial W^{(1)}_{3,3}} & \frac {\partial J}{\partial W^{(1)}_{3,4}}
\end{bmatrix} \\
=\frac {1}{m}\sum_{i=1}^{m} \begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{1,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{1,4}}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{2,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{2,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{2,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{2,4}}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{3,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{3,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{3,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{3,4}}
\end{bmatrix} \\
=\frac {1}{m}\sum_{i=1}^{m} \begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,1}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,2}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,3}
\end{bmatrix} \\
=\frac {1}{m}\sum_{i=1}^{m}
\begin{bmatrix}
a^{(0)}_{i,1}\\
a^{(0)}_{i,2}\\
a^{(0)}_{i,3}
\end{bmatrix}
\begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}
\end{bmatrix}
=\frac {1}{m}\sum_{i=1}^{m} (a^{(0)}_{i})^T\cdot \frac {\partial J}{\partial z^{(1)}_{i}}
=\frac {1}{m} (a^{(0)})^T\cdot \frac {\partial J}{\partial z^{(1)}}
=\frac {1}{m} (R^{3\times m}\cdot R^{m\times 4})
=R^{3\times 4}
$$
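Like the output layer, this gradient is a single transposed matrix product. A sketch with stand-in values; `db1` follows the same averaging pattern as the weights but is not derived in the text above.

```python
import numpy as np

m = 5
rng = np.random.default_rng(3)
a0 = rng.standard_normal((m, 3))   # input batch a^{(0)}
dz1 = rng.standard_normal((m, 4))  # stand-in for dJ/dz^{(1)}

dW1 = a0.T @ dz1 / m                   # (1/m) (a^{(0)})^T dJ/dz^{(1)}: R^{3 x 4}
db1 = dz1.mean(axis=0, keepdims=True)  # bias gradient (assumed; same pattern)
```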

Summary

TestNet's forward pass is:

$$
z^{(1)}=a^{(0)}\cdot W^{(1)}+b^{(1)},\quad
a^{(1)}=relu(z^{(1)}),\quad
z^{(2)}=a^{(1)}\cdot W^{(2)}+b^{(2)},\quad
h(z^{(2)})=softmax(z^{(2)})
$$

The backward pass is:

$$
\frac {\partial J}{\partial z^{(2)}_{i}}=[p(y_{i}=1)-1(y_{i}=1), p(y_{i}=2)-1(y_{i}=2)],\quad
\frac {\partial J}{\partial W^{(2)}}=\frac {1}{m}(a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}} \\
\frac {\partial J}{\partial z^{(1)}}=(\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^{T})\ast 1(z^{(1)}\geq 0),\quad
\frac {\partial J}{\partial W^{(1)}}=\frac {1}{m}(a^{(0)})^{T}\cdot \frac {\partial J}{\partial z^{(1)}}
$$

Assume the batch size is $m$, the data dimension is $n$, the number of layers is $L$ (the input layer is layer $0$), and the number of output classes is $C$.

Following Backpropagation Algorithm and The Mathematical Principles of Neural Network Backpropagation, call each layer's input-vector gradient that layer's residual $\delta^{(l)}$, which captures the layer's contribution to the residual of the final output; the residual of the final output is the gradient of the loss function with respect to the output layer's input vector, $\delta^{(L)}=\frac {\partial J}{\partial z^{(L)}}$.

Forward pass execution steps

  1. Between layers, the computation is the weighted sum of the previous layer's output vector with the weight matrix, followed by activation of the input vector (with relu as the example):

    $$
    z^{(l)}=a^{(l-1)}\cdot W^{(l)}+b^{(l)},\quad
    a^{(l)}=relu(z^{(l)})
    $$

  2. After the output layer produces its result, the scoring function is applied to obtain the final result (with softmax classification as the example):

    $$
    h(z^{(L)})
    =\begin{bmatrix}
    \frac {exp(z^{(L)}_{1,1})}{\sum exp(z^{(L)}_{1})} & \dots & \frac {exp(z^{(L)}_{1,C})}{\sum exp(z^{(L)}_{1})} \\
    \vdots & \vdots & \vdots\\
    \frac {exp(z^{(L)}_{m,1})}{\sum exp(z^{(L)}_{m})} & \dots & \frac {exp(z^{(L)}_{m,C})}{\sum exp(z^{(L)}_{m})}
    \end{bmatrix}
    $$

  3. The loss function evaluates the final loss value from that result (with the cross-entropy loss as the example):

    $$
    J=-\frac {1}{m}\sum_{i=1}^{m}\sum_{c=1}^{C} 1(y_{i}=c)\cdot \log p(y_{i}=c)
    $$

Backward pass execution steps

  1. Compute the gradient of the loss function with respect to the output layer's input vector (the final layer's residual $\delta^{(L)}$)

  2. Compute the residuals of the intermediate hidden layers ($\delta^{(l)}=(\delta^{(l+1)}\cdot (W^{(l+1)})^{T})\ast g'(z^{(l)})$)

  3. Complete the gradient computation for all learnable parameters (weight matrices and bias vectors)

  4. Update the weight matrices and bias vectors

Related Reading