📝 Notes: A Concise Guide to Matrix Calculus, Numerator Layout and Denominator Layout

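Numerator and denominator layouts are a common source of confusion when differentiating with respect to vectors or matrices, in particular over when a result needs a transpose and when it does not. This note gives a concise introduction to the commonly used matrix and vector differentiation rules.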

Simple examples

\(\mathbf{a}^{\top} \mathbf{x}\)

Differentiate with respect to the vector \(\mathbf{x}\). For example, let \(\mathbf{a} = \left[\begin{array}{c} 1 \\ 2 \end{array}\right]\) and \(\mathbf{x} = \left[\begin{array}{c} x_1 \\ x_2 \end{array}\right]\), so that \(\mathbf{a}^{\top} \mathbf{x} = x_1 + 2x_2\).

Then

\[
\begin{aligned}
\frac{\partial \mathbf{a}^{\top}\mathbf{x}}{\partial \mathbf{x}}
&= \left[\begin{array}{c}
\frac{\partial (x_1+ 2x_2)}{\partial x_{1}} \\
\frac{\partial (x_1+ 2x_2)}{\partial x_{2}}
\end{array}\right] \\
&= \left[\begin{array}{c}
1 \\
2
\end{array}\right] \\
&= \mathbf{a} \quad \text{(denominator layout)}
\end{aligned}
\]

Note that the result above is arranged in denominator layout; see the general form in the next section for details.
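The result can be checked numerically. The following is a minimal sketch using central finite differences in plain Python; the helper name `numerical_gradient`, the step size `h`, and the evaluation point are assumptions made for this illustration only.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the point x (a list)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


a = [1.0, 2.0]


def f(x):
    # f(x) = a^T x = x_1 + 2 x_2
    return sum(a_i * x_i for a_i, x_i in zip(a, x))


# The evaluation point is arbitrary: the gradient of a linear function is constant.
grad = numerical_gradient(f, [0.3, -0.7])
print(grad)  # approximately [1.0, 2.0], i.e. the entries of a
```

The list `grad` holds one partial derivative per component of \(\mathbf{x}\), which corresponds to the column arrangement of denominator layout.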

\(\mathbf{A} \mathbf{x}\)

Differentiate with respect to the vector \(\mathbf{x}\). A concrete example makes the derivation easy to follow. Let \(\mathbf{A} = \left[\begin{array}{ll} 1 & 2 \\ 3 & 4 \end{array}\right]\) and \(\mathbf{x} = \left[\begin{array}{c} x_1 \\ x_2 \end{array}\right]\). Then:

\[
\begin{aligned}
\mathbf{A}\mathbf{x}
&= \left[\begin{array}{ll}
1 & 2 \\
3 & 4
\end{array}\right]\left[\begin{array}{l}
x_{1} \\
x_{2}
\end{array}\right] \\
&= \left[\begin{array}{l}
x_{1}+2 x_{2} \\
3 x_{1}+4 x_{2}
\end{array}\right] \\
&= \left[\begin{array}{c}
f_{1} \\
f_{2}
\end{array}\right]
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
&= \left[\begin{array}{ll}
\frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{1}} \\
\frac{\partial f_{1}}{\partial x_{2}} & \frac{\partial f_{2}}{\partial x_{2}}
\end{array}\right] \\
&= \left[\begin{array}{ll}
1 & 3 \\
2 & 4
\end{array}\right] \\
&= \mathbf{A}^{\top} \quad \text{(denominator layout)}
\end{aligned}
\]

Alternatively,

\[
\begin{aligned}
\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
&= \left[\begin{array}{ll}
\frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} \\
\frac{\partial f_{2}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{2}}
\end{array}\right] \\
&= \left[\begin{array}{ll}
1 & 2 \\
3 & 4
\end{array}\right] \\
&= \mathbf{A} \quad \text{(numerator layout)}
\end{aligned}
\]

The two arrangements above show denominator layout and numerator layout, respectively.
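Both arrangements can be reproduced numerically for the same \(\mathbf{A}\). This is a minimal sketch in plain Python using central finite differences; the helper names `numerical_jacobian` and `transpose` and the evaluation point are hypothetical choices for this illustration.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian: entry [i][j] approximates d f_i / d x_j."""
    m = len(f(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1.0, 2.0], [3.0, 4.0]]


def Ax(x):
    # Matrix-vector product A x, one output per row of A.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


numerator = numerical_jacobian(Ax, [0.5, -1.5])  # approximately A
denominator = transpose(numerator)               # approximately A^T
print(numerator)
print(denominator)
```

The only difference between the two layouts is which index runs along the rows, so one result is the transpose of the other.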

General form

A vector can be regarded as a matrix with a single column; by default, vectors are written as column vectors.

Consider the derivative of a vector with respect to a vector, \(\partial \mathbf{y} / \partial \mathbf{x}\), where \(\mathbf{y}=\left[\begin{array}{lll}y_{1} & \cdots & y_{m}\end{array}\right]^{\top}\) and \(\mathbf{x}=\left[\begin{array}{lll}x_{1} & \cdots & x_{n}\end{array}\right]^{\top}\).

Then \(\partial \mathbf{y} / \partial \mathbf{x}\) is a matrix with \(m \times n\) entries. How should these entries be arranged? There are two conventions: numerator layout and denominator layout.

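Numerator layout

In one sentence: arrange the result the way the numerator is arranged. However the numerator is laid out, the derivative is laid out the same way, for example: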

\[
\begin{gathered}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right] \\
\equiv \frac{\partial \mathbf{y}}{\partial \mathbf{x}^{\top}}
\end{gathered}
\]

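In the result above, the elements of the numerator \(\mathbf{y}\), indexed \(1, \dots, m\), run down the rows (one row per \(y_i\)), so \(\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n}\). This form is also known as the Jacobian matrix [3].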

When \(y\) is a scalar and \(\mathbf{x}\) is a vector:

\[
\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{lll}
\frac{\partial y}{\partial x_{1}} & \cdots & \frac{\partial y}{\partial x_{n}}
\end{array}\right] \equiv \frac{\partial y}{\partial \mathbf{x}^{\top}}
\]

This numerator-layout (row vector) arrangement is not commonly used for scalar-by-vector derivatives.

Denominator layout

\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right]
\]

In the result above, the elements of the denominator \(\mathbf{x}\), indexed \(1, \dots, n\), run down the rows (one row per \(x_j\)), so \(\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m}\).

When \(y\) is a scalar and \(\mathbf{x}\) is a vector:

\[
\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{c}
\frac{\partial y}{\partial x_{1}} \\
\vdots \\
\frac{\partial y}{\partial x_{n}}
\end{array}\right]
\]

Scalar-by-vector derivatives are very common, and the result is usually arranged in denominator layout.

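The figure below illustrates how the two layouts arrange the result of a vector derivative [1]:

[Figure: numerator layout versus denominator layout]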

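The two forms are easy to confuse (usually over whether a transpose is needed), so always state which layout you are using! In practice, however, papers rarely say which one they use, and you have to infer it from context. If the author does not say and you do not want to work it out, you can assume a "mixed layout": the vector-by-scalar derivative \(\frac{\partial \mathbf{y}}{\partial x}\) follows numerator layout, and the scalar-by-vector derivative \(\frac{\partial y}{\partial \mathbf{x}}\) follows denominator layout [3].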
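One practical way to infer the layout from context is to compare the shape of a stated result with the dimensions of \(\mathbf{y}\) and \(\mathbf{x}\). The sketch below encodes that check; the function name `infer_layout` and its return labels are hypothetical, introduced only for this illustration.

```python
def infer_layout(result_shape, m, n):
    """Guess the layout of dy/dx from its shape, for y in R^m and x in R^n."""
    is_numerator = result_shape == (m, n)    # rows follow the elements of y
    is_denominator = result_shape == (n, m)  # rows follow the elements of x
    if is_numerator and is_denominator:
        return "ambiguous"  # m == n: the shape alone cannot decide
    if is_numerator:
        return "numerator"
    if is_denominator:
        return "denominator"
    return "inconsistent"


print(infer_layout((3, 2), m=3, n=2))  # numerator
print(infer_layout((2, 3), m=3, n=2))  # denominator
print(infer_layout((2, 2), m=2, n=2))  # ambiguous
```

When \(m = n\) the shape carries no information, and the only way to tell the layouts apart is to check where the transpose appears, for instance whether the derivative of \(\mathbf{A}\mathbf{x}\) is written as \(\mathbf{A}\) or \(\mathbf{A}^{\top}\).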

Taking denominator layout as the example, the commonly used differentiation formulas are:

\[
\begin{aligned}
\frac{\partial \mathbf{x}^{\top} \mathbf{a}}{\partial \mathbf{x}}&=\mathbf{a} \\
\frac{\partial \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}&=\mathbf{A}^{\top} \\
\frac{\partial \mathbf{x}^{\top} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}&=\left(\mathbf{A}+\mathbf{A}^{\top}\right) \mathbf{x} \\
\frac{\partial \mathbf{u}^{\top}}{\partial \mathbf{x}}&= \left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}} \right)^{\top}
\end{aligned}
\]

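A handy rule of thumb: results in denominator layout pick up a transpose (for example \(\mathbf{A}^{\top}\) rather than \(\mathbf{A}\)). Why? Denominator layout arranges the result along the denominator, which is normally a column vector, so the numerator is "forced" to be transposed, and that transpose shows up in the result.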
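To see where the quadratic-form formula comes from, one can expand \(\mathbf{x}^{\top}\mathbf{A}\mathbf{x}\) in components. This is a short worked derivation, assuming \(\mathbf{A} \in \mathbb{R}^{n \times n}\) with entries \(A_{ij}\) (the index notation is introduced here for illustration only):

\[
\begin{aligned}
\mathbf{x}^{\top}\mathbf{A}\mathbf{x} &= \sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij} x_i x_j \\
\frac{\partial\, \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial x_k} &= \sum_{j=1}^{n} A_{kj} x_j + \sum_{i=1}^{n} A_{ik} x_i = (\mathbf{A}\mathbf{x})_k + (\mathbf{A}^{\top}\mathbf{x})_k
\end{aligned}
\]

Stacking the components \(k = 1, \dots, n\) into a column, as denominator layout prescribes, gives \(\left(\mathbf{A}+\mathbf{A}^{\top}\right)\mathbf{x}\). When \(\mathbf{A}\) is symmetric this reduces to \(2\mathbf{A}\mathbf{x}\).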

When \(\mathbf{W}\) is a symmetric matrix, the following formulas hold [2]:

\[
\begin{aligned}
\frac{\partial}{\partial \mathbf{s}}(\mathbf{x}-\mathbf{A} \mathbf{s})^{\top} \mathbf{W}(\mathbf{x}-\mathbf{A} \mathbf{s}) &=-2 \mathbf{A}^{\top} \mathbf{W}(\mathbf{x}-\mathbf{A} \mathbf{s}) \\
\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}-\mathbf{s})^{\top} \mathbf{W}(\mathbf{x}-\mathbf{s}) &=2 \mathbf{W}(\mathbf{x}-\mathbf{s}) \\
\frac{\partial}{\partial \mathbf{s}}(\mathbf{x}-\mathbf{s})^{\top} \mathbf{W}(\mathbf{x}-\mathbf{s}) &=-2 \mathbf{W}(\mathbf{x}-\mathbf{s}) \\
\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}-\mathbf{A} \mathbf{s})^{\top} \mathbf{W}(\mathbf{x}-\mathbf{A} \mathbf{s}) &=2 \mathbf{W}(\mathbf{x}-\mathbf{A} \mathbf{s}) \\
\frac{\partial}{\partial \mathbf{A}}(\mathbf{x}-\mathbf{A} \mathbf{s})^{\top} \mathbf{W}(\mathbf{x}-\mathbf{A} \mathbf{s}) &=-2 \mathbf{W}(\mathbf{x}-\mathbf{A} \mathbf{s}) \mathbf{s}^{\top}
\end{aligned}
\]
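The first of these formulas, the derivative with respect to \(\mathbf{s}\), can be spot-checked numerically. This is a minimal sketch in plain Python; the matrices and vectors are arbitrary values chosen for illustration (with \(\mathbf{W}\) symmetric), and the helper names `matvec`, `transpose`, `residual`, and `cost` are hypothetical.

```python
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Arbitrary illustrative data: A is 3 x 2, W is 3 x 3 and symmetric.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 3.0]]
x = [1.0, -2.0, 0.5]
s = [0.4, -0.1]


def residual(s_vec):
    # r = x - A s
    return [x_i - y_i for x_i, y_i in zip(x, matvec(A, s_vec))]


def cost(s_vec):
    # (x - A s)^T W (x - A s)
    r = residual(s_vec)
    return sum(r_i * wr_i for r_i, wr_i in zip(r, matvec(W, r)))


# Closed form from the table: -2 A^T W (x - A s), a column with one entry per s_k.
closed_form = [-2.0 * g for g in matvec(transpose(A), matvec(W, residual(s)))]

# Central finite differences, one component of s at a time.
h = 1e-6
numeric = []
for k in range(len(s)):
    s_plus, s_minus = list(s), list(s)
    s_plus[k] += h
    s_minus[k] -= h
    numeric.append((cost(s_plus) - cost(s_minus)) / (2 * h))

print(closed_form)
print(numeric)  # should agree with closed_form up to rounding error
```

If \(\mathbf{W}\) were not symmetric, the closed form would need \(\mathbf{W} + \mathbf{W}^{\top}\) in place of \(2\mathbf{W}\), which is why the symmetry assumption matters.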

Summary [3]

Vector-by-vector derivatives

Scalar-by-vector derivatives

Pay particular attention to the product rule:

\[
\begin{aligned}
\frac{\partial \mathbf{u}^{\top} \mathbf{v}}{\partial \mathbf{x}} &= \mathbf{u}^{\top} \frac{\partial \mathbf{v}}{\partial \mathbf{x}}+\mathbf{v}^{\top} \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \quad \text{(numerator layout)} \\
\frac{\partial \mathbf{u}^{\top} \mathbf{v}}{\partial \mathbf{x}} &= \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{v}+\frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{u} \quad \text{(denominator layout)}
\end{aligned}
\]

Here \(\mathbf{u} = \mathbf{u}(\mathbf{x})\) and \(\mathbf{v} = \mathbf{v}(\mathbf{x})\) are vector functions of \(\mathbf{x}\), and \(\mathbf{u}^{\top}\mathbf{v}\) is a scalar.
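As a quick sanity check of the two product rules, take \(\mathbf{u} = \mathbf{v} = \mathbf{x}\), so that \(\mathbf{u}^{\top}\mathbf{v} = \mathbf{x}^{\top}\mathbf{x}\) and \(\partial \mathbf{x} / \partial \mathbf{x} = \mathbf{I}\) in either layout (this particular choice of \(\mathbf{u}\) and \(\mathbf{v}\) is an illustration added here):

\[
\begin{aligned}
\frac{\partial \mathbf{x}^{\top} \mathbf{x}}{\partial \mathbf{x}} &= \mathbf{x}^{\top}\mathbf{I} + \mathbf{x}^{\top}\mathbf{I} = 2\mathbf{x}^{\top} \quad \text{(numerator layout, a row vector)} \\
\frac{\partial \mathbf{x}^{\top} \mathbf{x}}{\partial \mathbf{x}} &= \mathbf{I}\mathbf{x} + \mathbf{I}\mathbf{x} = 2\mathbf{x} \quad \text{(denominator layout, a column vector)}
\end{aligned}
\]

The two results are transposes of each other, and the denominator-layout result agrees with \(\left(\mathbf{A}+\mathbf{A}^{\top}\right)\mathbf{x}\) for \(\mathbf{A} = \mathbf{I}\).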

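Application

Minimize the error \(E\), where \(\mathbf{a}_{i}^{\top}\) is the \(i\)-th row of \(\mathbf{A}\):

\[
E=\sum_{i=1}^{n}\left(\mathbf{a}_{i}^{\top} \mathbf{x}-b_{i}\right)^{2}=\|\mathbf{A} \mathbf{x}-\mathbf{b}\|^{2}
\]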

The derivation proceeds as follows:

\[
\begin{aligned}
E=\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^{2} &=(\mathbf{A}\mathbf{x}-\mathbf{b})^{\top}(\mathbf{A}\mathbf{x}-\mathbf{b}) \\
&=\left(\mathbf{x}^{\top} \mathbf{A}^{\top}-\mathbf{b}^{\top}\right)(\mathbf{A}\mathbf{x}-\mathbf{b}) \\
&=\mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{A}\mathbf{x}-\mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{b}-\mathbf{b}^{\top} \mathbf{A}\mathbf{x}+\mathbf{b}^{\top} \mathbf{b}
\end{aligned}
\]

Differentiating each term (in denominator layout):

\[
\begin{aligned}
\frac{\partial\, \mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} &=(\mathbf{A}^{\top} \mathbf{A} + \mathbf{A}^{\top} \mathbf{A}) \mathbf{x} = 2\mathbf{A}^{\top} \mathbf{A}\mathbf{x} \\
\frac{\partial\, \mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{b}}{\partial \mathbf{x}} &=\mathbf{A}^{\top} \mathbf{b} \\
\frac{\partial\, \mathbf{b}^{\top} \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} &=(\mathbf{b}^{\top} \mathbf{A})^{\top} = \mathbf{A}^{\top}\mathbf{b} \\
\frac{\partial\, \mathbf{b}^{\top} \mathbf{b}}{\partial \mathbf{x}} &= \mathbf{0} \quad \text{(column vector)}
\end{aligned}
\]

Therefore:

\[
\begin{aligned}
\frac{\partial E}{\partial \mathbf{x}} &= 2\mathbf{A}^{\top} \mathbf{A}\mathbf{x} - \mathbf{A}^{\top} \mathbf{b} - \mathbf{A}^{\top}\mathbf{b} + \mathbf{0} \\
&= 2\mathbf{A}^{\top} \mathbf{A}\mathbf{x} - 2\mathbf{A}^{\top} \mathbf{b}
\end{aligned}
\]

Setting \(\frac{\partial E}{\partial \mathbf{x}} = \mathbf{0}\), we obtain:

\[
\begin{aligned}
2\mathbf{A}^{\top} \mathbf{A}\mathbf{x} - 2\mathbf{A}^{\top} \mathbf{b} &= \mathbf{0}\\
\mathbf{A}^{\top} \mathbf{A}\mathbf{x} &= \mathbf{A}^{\top} \mathbf{b} \\
\mathbf{x} &= ( \mathbf{A}^{\top} \mathbf{A})^{-1}\mathbf{A}^{\top} \mathbf{b}
\end{aligned}
\]

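\(\mathbf{x} = ( \mathbf{A}^{\top} \mathbf{A})^{-1}\mathbf{A}^{\top} \mathbf{b}\) is the solution of the linear least-squares problem above, provided that \(\mathbf{A}^{\top}\mathbf{A}\) is invertible.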
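The normal equations can be tried on a tiny problem. The sketch below fits a line \(b \approx x_1 + x_2 t\) through three points in plain Python; the data, the helper names, and the use of Cramer's rule for the \(2 \times 2\) system are assumptions made to keep the example self-contained, not part of the derivation above.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def matvec(M, v):
    return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]


def solve_2x2(M, r):
    """Solve M z = r for a 2 x 2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("A^T A is singular: no unique least-squares solution")
    return [
        (r[0] * M[1][1] - M[0][1] * r[1]) / det,
        (M[0][0] * r[1] - M[1][0] * r[0]) / det,
    ]


# Illustrative data: the points (t, b) = (0, 1), (1, 3), (2, 5) lie on b = 1 + 2 t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # rows are a_i^T = [1, t_i]
b = [1.0, 3.0, 5.0]

At = transpose(A)
AtA = matmul(At, A)      # A^T A
Atb = matvec(At, b)      # A^T b
x = solve_2x2(AtA, Atb)  # solves A^T A x = A^T b

# Gradient 2 A^T A x - 2 A^T b should vanish at the solution.
gradient = [2.0 * g - 2.0 * c for g, c in zip(matvec(AtA, x), Atb)]
print(x)         # intercept and slope, here 1 and 2
print(gradient)  # both entries are zero up to rounding
```

Solving the linear system directly avoids forming the inverse explicitly; for larger or ill-conditioned problems a library routine such as NumPy's `numpy.linalg.lstsq` is usually the more robust choice.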

References

1. Matrix Differentiation, lecture notes for CS5240 Theoretical Foundations in Multimedia, https://www.comp.nus.edu.sg/~cs5240/lecture/matrix-diff.pdf

2. Matrix Cookbook, http://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf

3. Matrix calculus, https://en.jinzhao.wiki/wiki/Matrix_calculus

4. Vector/Matrix Calculus, more notes on matrix differentiation.

5. Matrix Differentiation (and some other stuff), Randal J. Barnes, Department of Civil Engineering, University of Minnesota.

6. Matrix Calculus (an online matrix derivative calculator), http://www.matrixcalculus.org/
