Probability Theory Basics

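Following the article "PCA数学原理" (The Mathematical Principles of PCA), this note summarizes the probability theory background used when deriving PCA.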

Variance and Covariance

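Variance (see the reference on variance and standard deviation) measures how dispersed a set of data is around its mean \(\mu\): the larger the value, the more widely the data are spread.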

\[ Var(X) = D(X) = \frac {1}{N} \sum_{i=1}^{N}(x_{i} - \mu)^{2} \]
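As a quick check of the formula above, here is a minimal sketch in plain Python that computes the population variance (dividing by \(N\), as in the formula). The function name `variance` and the sample data are assumptions made for illustration.

```python
def variance(xs):
    """Population variance: mean of squared deviations from the mean (1/N)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


# Illustrative data (assumed for this sketch): same mean, different spread
narrow = [4.0, 5.0, 6.0]
wide = [0.0, 5.0, 10.0]

print(variance(narrow))  # smaller value: points sit close to the mean
print(variance(wide))    # larger value: points are spread further out
```

Both lists have mean 5, so the difference in the two results comes only from how far the points lie from that mean.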

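Covariance measures how strongly two sets of data vary together. Intuitively, it is the expected value of the product of the two variables' deviations from their respective means.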

\[ Cov(X,Y) = E[(X-E(X))(Y-E(Y))] =\frac {1}{N} \sum_{i=1}^{N}(x_{i} - \mu_{x})(y_{i} - \mu_{y}) \]
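The covariance formula can be sketched the same way. The helper names `mean` and `covariance` and the data below are assumptions for illustration; the sketch uses the \(1/N\) normalization from the formula above.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance: (1/N) * sum((x_i - mu_x) * (y_i - mu_y))."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


# Illustrative data (assumed for this sketch)
x = [1.0, 2.0, 3.0, 4.0]
y_up = [2.0, 4.0, 6.0, 8.0]    # rises as x rises
y_down = [8.0, 6.0, 4.0, 2.0]  # falls as x rises

print(covariance(x, y_up))    # positive: the variables move together
print(covariance(x, y_down))  # negative: the variables move in opposite directions
print(covariance(x, x))       # covariance of a variable with itself is its variance
```

The last call illustrates that \(Cov(X,X) = Var(X)\), which is why variances appear on the diagonal of a covariance matrix.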

Covariance Matrix

Let \(X=(X_{1}, X_{2}, ..., X_{n})^{T}\) be an \(n\)-dimensional random variable. The matrix

\[ C = (c_{ij})_{n\times n} =\begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots & \vdots & \vdots\\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix} \]

is called the covariance matrix of the \(n\)-dimensional random variable \(X\), denoted \(D(X)\), where

\[ c_{ij} = Cov(X_{i}, X_{j}),i,j=1,2,...,n \]

is the covariance of the components \(X_{i}\) and \(X_{j}\) of \(X\).

Taking a two-dimensional random variable \((X_{1}, X_{2})\) as an example, the covariance matrix is

\[ C = \begin{pmatrix} E\{[X_{1} - E(X_{1})]^{2}\} & E\{[X_{1} - E(X_{1})][X_{2} - E(X_{2})]\}\\ E\{[X_{2} - E(X_{2})][X_{1} - E(X_{1})]\} & E\{[X_{2} - E(X_{2})]^{2}\} \end{pmatrix} \]

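Since \(Cov(X_{i}, X_{j}) = Cov(X_{j}, X_{i})\), the covariance matrix is a real symmetric matrix: its entries are real numbers and it equals its own transpose.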

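The diagonal entries \(c_{ii}\) of the covariance matrix \(C\) are the variances of the variables \(X_{i}\), and the off-diagonal entries \(c_{ij}\) are the covariances of \(X_{i}\) and \(X_{j}\).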
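The properties above can be checked on a small data set. The following is a minimal sketch in plain Python, assuming each variable \(X_{i}\) is given as a list of \(N\) observations and using the \(1/N\) normalization from the formulas in this note; the function name `covariance_matrix` and the data are illustrative assumptions.

```python
def covariance_matrix(variables):
    """Return the n x n population covariance matrix as a list of rows.

    `variables` is a list of n equal-length lists; variables[i] holds the
    N observations of X_i.
    """
    n = len(variables)
    n_obs = len(variables[0])
    means = [sum(v) / n_obs for v in variables]
    return [
        [
            sum((a - means[i]) * (b - means[j])
                for a, b in zip(variables[i], variables[j])) / n_obs
            for j in range(n)
        ]
        for i in range(n)
    ]


# Illustrative observations of a two-dimensional variable (assumed data)
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
C = covariance_matrix([x1, x2])

for row in C:
    print(row)

# Symmetry check: c_ij equals c_ji for every pair of indices
is_symmetric = all(C[i][j] == C[j][i] for i in range(2) for j in range(2))
print(is_symmetric)
```

Here `C[0][0]` and `C[1][1]` are the variances of `x1` and `x2`, and `C[0][1]` is their covariance. With NumPy, `numpy.cov(data, bias=True)` computes the same \(1/N\)-normalized matrix when each row of `data` is one variable; by default `numpy.cov` divides by \(N-1\) instead.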

Related Reading