Andrew Ng机器学习公开课笔记 -- 支持向量机



Notes: http://cs229.stanford.edu/notes/cs229-notes3.pdf

For an overview, the article "SVM-支持向量机算法概述" explains the algorithm well and is a good reference.

functional margin

The functional margin of (w, b) with respect to a training example (x(i), y(i)) is γ̂(i) = y(i)(wᵀx(i) + b), where the label y takes values in {-1, 1}.
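As a quick numeric sketch (the hyperplane w, b and the points below are illustrative, not from the notes):

```python
import numpy as np

# Sketch: functional margin of a hyperplane (w, b) on labeled points,
# assuming labels y in {-1, +1}. The values of w, b, X, y are illustrative.
def functional_margin(w, b, X, y):
    """gamma_hat(i) = y(i) * (w^T x(i) + b); return the minimum over the set."""
    return np.min(y * (X @ w + b))

w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.0, 2.0]])
y = np.array([1, -1])
print(functional_margin(w, b, X, y))  # both points have functional margin 2.0
```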

geometric margins

The optimal margin classifier

A function is convex if the line segment between any two points on the graph of the function lies above the graph; equivalently, its epigraph (the set of points on or above the graph of the function) is a convex set.

We want to maximize the geometric margin γ = γ̂/||w||. Since (w, b) can be rescaled freely, we fix the functional margin to γ̂ = 1; maximizing 1/||w|| is then equivalent to minimizing (1/2)||w||², subject to y(i)(wᵀx(i) + b) ≥ 1 for all i.
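A small sketch of the scale-invariance that justifies fixing γ̂ = 1 (the hyperplane and points are illustrative):

```python
import numpy as np

# Sketch: the geometric margin y(i)(w^T x + b)/||w|| is invariant to rescaling
# (w, b), while the functional margin is not -- which is why we may fix the
# functional margin to 1 and minimize (1/2)||w||^2 instead.
def geometric_margin(w, b, X, y):
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

w, b = np.array([1.0, -1.0]), 0.0
X = np.array([[2.0, 0.0], [0.0, 2.0]])
y = np.array([1, -1])
g1 = geometric_margin(w, b, X, y)
g2 = geometric_margin(2 * w, 2 * b, X, y)   # rescaled hyperplane
print(np.isclose(g1, g2))  # True: same geometric margin
```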

Lagrange duality

Since the outer min cannot be +∞ at the optimum (any w violating a constraint drives the inner max to +∞), the minimizing w must satisfy all the constraints.

Suppose f and the gi’s are convex, and the hi’s are affine.

Suppose further that the constraints gi are (strictly) feasible; this means that there exists some w so that gi(w) < 0 for all i.

Under our above assumptions, there must exist w∗, α∗, β∗ so that w∗ is the solution to the primal problem, α∗, β∗ are the solution to the dual problem, and moreover p∗ = d∗ = L(w∗, α∗, β∗). Moreover, w∗, α∗ and β∗ satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are as follows:

Optimal margin classifiers – Continuing

The optimization variable changes from w alone to the pair of variables (w, b).

You should also be able to verify that the conditions required for p∗ =d∗ and the KKT conditions (Equations 3–7) to hold are indeed satisfied in our optimization problem.

As for w, it was covered above: it can be solved for directly from Equation (9).

Kernels

We will also let φ denote the feature mapping

polynomial kernel

The polynomial kernel maps x into a much higher-dimensional feature space, yet even though the feature vectors live in that large space, computing K is still only O(n).
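A sketch verifying this for the simplest case K(x, z) = (xᵀz)², whose implicit feature map φ(x) lists all pairwise products xi·xj:

```python
import numpy as np

# Sketch: for K(x, z) = (x^T z)^2, the implicit feature map is
# phi(x) = (x_i * x_j for all pairs i, j) -- n^2 dimensions -- yet
# evaluating K directly costs only O(n).
def phi(x):
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
k_direct = (x @ z) ** 2          # O(n)
k_feature = phi(x) @ phi(z)      # O(n^2), same value
print(np.isclose(k_direct, k_feature))  # True
```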

Gaussian kernel

(This kernel is called the Gaussian kernel, and corresponds to an infinite dimensional feature mapping φ.)
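A minimal sketch of the Gaussian kernel (the bandwidth σ is a hyperparameter; the points are illustrative):

```python
import numpy as np

# Sketch: the Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
# sigma is a bandwidth hyperparameter chosen by the user.
def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
print(gaussian_kernel(x, x))                             # 1.0 for identical points
print(gaussian_kernel(x, np.array([5.0, 5.0])) < 1e-3)   # True: far points -> ~0
```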

Note that the notation is overloaded here: K denotes both the kernel function K(x, z) and the kernel (Gram) matrix with entries Kij = K(x(i), x(j)).

(By definition, K is positive semidefinite if zᵀKz ≥ 0 for every vector z. An intuitive way to see why a Gram matrix satisfies this: zᵀKz = Σij zi zj φ(x(i))ᵀφ(x(j)) = ||Σi zi φ(x(i))||² ≥ 0, i.e., it is a squared length, which can never be negative.)
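The squared-length argument can also be checked numerically; the random points and kernel choice below are illustrative:

```python
import numpy as np

# Sketch: a valid kernel's Gram matrix K_ij = K(x_i, x_j) is positive
# semidefinite, because z^T K z = || sum_i z_i phi(x_i) ||^2 >= 0.
# We check this numerically for the Gaussian kernel on random points.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                 # Gaussian-kernel Gram matrix
eigvals = np.linalg.eigvalsh(K)       # symmetric -> real eigenvalues
print(np.all(eigvals >= -1e-10))      # True: all eigenvalues nonnegative
```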

For instance, consider the digit recognition problem, in which given an image (16x16 pixels) of a handwritten digit (0-9), we have to figure out
which digit it was.

Ng also discussed another example: when the raw input does not have a fixed dimension (e.g., strings of varying length), how can one still produce a fixed-dimensional representation? See the lecture notes for details.

For instance, this kernel trick can be applied with the perceptron to derive a kernel perceptron algorithm.
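A minimal sketch of that idea (the kernel, data, and training-loop details are illustrative assumptions, not the notes' exact formulation): instead of maintaining w explicitly, store per-example coefficients αi and predict with sign(Σi αi y(i) K(x(i), x)).

```python
import numpy as np

# Sketch of a kernel perceptron: on each mistake, increment that example's
# alpha; predictions use only kernel evaluations, never an explicit w.
def train_kernel_perceptron(X, y, kernel, epochs=10):
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            score = sum(alpha[j] * y[j] * kernel(X[j], X[i])
                        for j in range(len(X)))
            pred = np.sign(score or 1.0)   # break ties toward +1
            if pred != y[i]:
                alpha[i] += 1
    return alpha

linear = lambda a, b: a @ b                # simplest kernel; RBF also works
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
alpha = train_kernel_perceptron(X, y, linear)
pred = [np.sign(sum(alpha[j] * y[j] * linear(X[j], x) for j in range(len(X))))
        for x in X]
print(np.array_equal(pred, y))  # True: training set separated
```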

Regularization and the non-separable case

we reformulate our optimization (using ℓ1 regularization) as follows:

The parameter C controls the relative weighting between the twin goals of making the ||w||2 small (which we saw earlier makes the margin large) and of ensuring that most examples have functional margin at least 1.
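A sketch of evaluating this ℓ1-regularized objective, with the slack ξi written as a hinge loss (the hyperplane, data, and C below are illustrative):

```python
import numpy as np

# Sketch: the soft-margin objective (1/2)||w||^2 + C * sum_i xi_i, where
# xi_i = max(0, 1 - y_i(w^T x_i + b)) measures how badly example i violates
# the functional-margin-at-least-1 requirement.
def soft_margin_objective(w, b, X, y, C):
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * np.sum(slacks)

w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1, -1, -1])   # the second point violates the margin
print(soft_margin_objective(w, b, X, y, C=1.0))  # 0.5*1 + 1*1.5 = 2.0
```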

The SMO algorithm by John Platt

Coordinate ascent (descent)
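Coordinate ascent/descent optimizes one variable at a time with all the others held fixed. A minimal sketch on a convex quadratic (the function is illustrative):

```python
# Sketch of coordinate descent on the convex quadratic
# f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y: repeatedly minimize over one
# variable with the other fixed, using the closed-form 1-D minimizer.
def coordinate_descent(iters=100):
    x, y = 0.0, 0.0
    for _ in range(iters):
        x = 1.0 - y / 2.0      # argmin over x with y fixed: 2(x-1) + y = 0
        y = -2.0 - x / 2.0     # argmin over y with x fixed: 2(y+2) + x = 0
    return x, y

x, y = coordinate_descent()
print(round(x, 6), round(y, 6))  # converges to the joint minimizer (8/3, -10/3)
```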

SMO (sequential minimal optimization)

The key reason that SMO is an efficient algorithm is that the update to αi, αj can be computed very efficiently.

Treating α3, . . . , αm as constants, you should be able to verify that this is just some quadratic function in α2.

There’re a couple more details that are quite easy but that we’ll leave you to read about yourself in Platt’s paper: One is the choice of the heuristics used to select the next αi, αj to update; the other is how to update b as the SMO algorithm is run.
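The heart of the efficient pairwise update can be sketched as follows: with the other alphas fixed, the equality constraint ties αi to αj, so the subproblem is a concave 1-D quadratic with an analytic maximizer, which is then clipped back to the feasible box [L, H] (the coefficients below are illustrative, not derived from real data):

```python
import numpy as np

# Sketch of an SMO inner step: maximize a concave quadratic
# W(x) = a*x^2 + b*x + c (with a < 0) over the box [L, H] in closed form --
# this is why each SMO update is so cheap.
def smo_inner_step(a, b, c, L, H):
    x_star = -b / (2.0 * a)          # unconstrained 1-D maximizer
    return float(np.clip(x_star, L, H))  # clip to the feasible interval

# W(x) = -x^2 + 4x: unconstrained max at x = 2, but the box is [0, 1.5]
print(smo_inner_step(-1.0, 4.0, 0.0, 0.0, 1.5))  # 1.5 (clipped)
```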
