
The Elements of Statistical Learning — Lecture Notes (21): The SMO Algorithm

1. The SVM optimization problem

1) The primal problem

\min\frac{1}{2}\left\Vert w\right\Vert ^{2}+C\sum_{i}\xi_{i}

s.t.\, y_{i}(w'x_{i}+b)\geq1-\xi_{i},\,\xi_{i}\geq0,\,\forall i

2) The loss + penalty (Lagrangian) form

\min\sum_{i}\mathcal{L}(y_{i},w'x_{i}+b)+\lambda\left\Vert w\right\Vert ^{2}

where \mathcal{L}(y_{i},w'x_{i}+b)=l(y_{i}(w'x_{i}+b)) is a margin-based loss.
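Spelled out (my addition; the notes do not write it explicitly), with the standard hinge loss l(z)=\max(0,1-z) this form reads

\min_{w,b}\,\sum_{i}\max\left(0,\,1-y_{i}(w'x_{i}+b)\right)+\lambda\left\Vert w\right\Vert ^{2}

which matches the primal in 1) with \xi_{i}=\max(0,1-y_{i}(w'x_{i}+b)) under the identification \lambda=\frac{1}{2C} (up to overall scaling).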

3) The dual problem

\min-g(\lambda,\mu)=\frac{1}{2}\sum_{i}\sum_{j}(\lambda_{i}y_{i})(\lambda_{j}y_{j})(x_{i}'x_{j})-\sum_{i}\lambda_{i}

s.t.\sum_{i}\lambda_{i}y_{i}=0

0\leq\lambda_{i}\leq C

\mu_{i}\geq0

4) The SVM classifier

(i) w=\sum_{i}\lambda_{i}y_{i}x_{i}

(ii) For each i with 0<\lambda_{i}<C, set b_{i}=y_{i}-w'x_{i}=y_{i}-\sum_{j}\lambda_{j}y_{j}(x_{i}'x_{j}); then \tilde{b}=\mathrm{average}(b_{i}).

(iii) The SVM classifier: \mathrm{sign}(w'x+b)=\mathrm{sign}(\sum_{i=1}^{N}\lambda_{i}y_{i}(x'x_{i})+b)
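As an illustration of 4), here is a minimal numpy sketch (the function names `dual_decision` and `intercept_from_margin_svs` are mine, not from the notes): it evaluates the dual-form decision value \sum_{i}\lambda_{i}y_{i}(x'x_{i})+b and recovers \tilde{b} by averaging b_{i} over the margin support vectors with 0<\lambda_{i}<C.

```python
import numpy as np

def dual_decision(lam, y, X, b, x):
    """Decision value f(x) = sum_i lam_i y_i <x, x_i> + b (dual form)."""
    return np.sum(lam * y * (X @ x)) + b

def intercept_from_margin_svs(lam, y, X, C, tol=1e-8):
    """Average b_i = y_i - w'x_i over the margin SVs (0 < lam_i < C)."""
    w = (lam * y) @ X                       # w = sum_i lam_i y_i x_i
    mask = (lam > tol) & (lam < C - tol)    # margin support vectors
    return np.mean(y[mask] - X[mask] @ w)
```

Classification is then `np.sign(dual_decision(...))`; only the support vectors (nonzero \lambda_{i}) contribute to the sum.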

2. The SMO algorithm

1) Basic idea: iterative descent via coordinate descent

Two multipliers must be updated at a time (changing only one would violate the constraint \sum\lambda_{i}y_{i}=0); the resulting two-variable subproblem can then be solved in closed form.

2) Optimizing over two variables

Pick any \lambda_{1},\lambda_{2} as the variables and hold the remaining \lambda_{3},\ldots fixed as constants.

The expanded quadratic form looks roughly like:

\begin{array}{c|cccc} & \lambda_{1}y_{1}x_{1} & \lambda_{2}y_{2}x_{2} & \cdots & \lambda_{N}y_{N}x_{N}\\\hline\lambda_{1}y_{1}x_{1} &  &  &  & \\\lambda_{2}y_{2}x_{2} &  &  & \vdots & \\\vdots &  & \cdots & y_{i}y_{j}(x_{i}'x_{j})\lambda_{i}\lambda_{j} & \\\lambda_{N}y_{N}x_{N} &  &  &  & \end{array}

Objective = a_{11}\lambda_{1}^{2}+a_{12}\lambda_{1}\lambda_{2}+a_{21}\lambda_{2}\lambda_{1}+a_{22}\lambda_{2}^{2}+b_{1}\lambda_{1}+b_{2}\lambda_{2}+c

so that a_{11}=\frac{1}{2}\left\Vert x_{1}\right\Vert ^{2},a_{22}=\frac{1}{2}\left\Vert x_{2}\right\Vert ^{2},a_{12}=a_{21}=\frac{1}{2}y_{1}y_{2}(x_{1}'x_{2})

Constraints: 0\leq\lambda_{1},\lambda_{2}\leq C (from the dual problem),

\lambda_{1}y_{1}+\lambda_{2}y_{2}+d=0, where d=\sum_{i=3}^{N}\lambda_{i}y_{i} collects the terms that stay fixed.

Reducing to a single variable: \lambda_{2}=-(d+\lambda_{1}y_{1})/y_{2}.

Hence,

  • Objective = e_{0}\lambda_{1}^{2}+e_{1}\lambda_{1}+e_{2}, with unconstrained optimum \bar{\lambda}_{1}=-\frac{e_{1}}{2e_{0}} (assuming e_{0}>0).
  • Constraint L\leq\lambda_{1}\leq H, where L and H denote the lower and upper bounds implied by the box constraint together with the equality constraint. The constrained optimum therefore lies either strictly between L and H or at one of the two endpoints.
  • \min e_{0}\lambda_{1}^{2}+e_{1}\lambda_{1}\,\,\mathrm{s.t.}\,\, L\leq\lambda_{1}\leq H is solved by \lambda_{1}^{*}=\begin{cases}L & \bar{\lambda}_{1}<L\\\bar{\lambda}_{1} & L\leq\bar{\lambda}_{1}\leq H\\H & \bar{\lambda}_{1}>H\end{cases}
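The case analysis above is just clipping the unconstrained minimizer to [L, H]; a tiny sketch (function name mine):

```python
def clipped_quadratic_min(e0, e1, L, H):
    """Minimize e0*lam^2 + e1*lam over [L, H], assuming e0 > 0:
    take the unconstrained optimum lam_bar = -e1/(2*e0), then clip."""
    lam_bar = -e1 / (2.0 * e0)
    return min(max(lam_bar, L), H)
```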

Although many such iterations are needed, each individual step is very cheap.

As for choosing \lambda_{1},\lambda_{2}: for the first variable, pick an i with 0<\lambda_{i}<C for which b_{i}-\bar{b} is largest (the worst violator); for the second, pick the one that makes H-L largest.
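Putting the pieces together, here is a sketch of one SMO pair update (all names mine; instead of deriving e_{0},e_{1} analytically as in Platt's original derivation, this sketch recovers them by evaluating the dual at three feasible points, which determines a quadratic exactly):

```python
import numpy as np

def dual_objective(lam, y, K):
    """(1/2) sum_ij (lam_i y_i)(lam_j y_j) K_ij - sum_i lam_i."""
    u = lam * y
    return 0.5 * u @ K @ u - lam.sum()

def smo_pair_step(lam, y, K, C, i, j):
    """One SMO update of (lam_i, lam_j) with all other lam_k held fixed."""
    gamma = lam[i] * y[i] + lam[j] * y[j]           # conserved by the step
    lam_j_of = lambda t: (gamma - t * y[i]) * y[j]  # eliminate lam_j
    # Feasible interval for lam_i: both coordinates must stay in [0, C].
    lo, hi = sorted([y[i] * gamma, y[i] * (gamma - C * y[j])])
    L, H = max(0.0, lo), min(C, hi)
    if H - L < 1e-12:
        return lam                                  # no room to move
    def g(t):                                       # dual restricted to the line
        v = lam.copy(); v[i] = t; v[j] = lam_j_of(t)
        return dual_objective(v, y, K)
    # The restriction is a quadratic e0*t^2 + e1*t + e2: fit it at L, m, H.
    m, h = (L + H) / 2.0, (H - L) / 2.0
    gL, gm, gH = g(L), g(m), g(H)
    e0 = (gL - 2.0 * gm + gH) / (2.0 * h * h)
    e1 = (gH - gL) / (2.0 * h) - 2.0 * e0 * m
    if e0 > 0:
        t_star = min(max(-e1 / (2.0 * e0), L), H)   # clip to [L, H]
    else:
        t_star = L if gL <= gH else H               # degenerate: best endpoint
    new = lam.copy(); new[i] = t_star; new[j] = lam_j_of(t_star)
    return new
```

Each call preserves \sum\lambda_{i}y_{i} and the box constraints and never increases the dual objective; looping such calls over pairs chosen by the heuristic above until the KKT conditions hold gives the full algorithm.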
