
The Cutting Plane Procedure
- The Cutting Plane Procedure is a non-parametric method
that finds the cutting plane and its normal vector given
the Xi's and the corresponding vector of choices. The
method
- Maximizes Correct Classification of the Choices
- Is iterative and Always Converges
- Is Robust and Almost Always Finds the True Cutting Plane and Normal
Vector
- The Cutting Plane Procedure Consists of the Following Steps:
- First,
given a reasonable guess for the normal vector, the cutting plane
is moved through the space along the normal vector and placed at the point
that maximizes the correct classification of the choices. Conditioned on
the normal vector, the conditional global maximum in classification
is always found.
Intuitively, given b1 and
b2 such that
b12 +
b22 = 1, this
search stage finds the optimal b0.
Second,
the current cutting plane is used to construct a set of points that in turn
is used to change the orientation of the Cutting Plane (and Normal Vector) in the Space.
This is accomplished as follows:
- The Correctly Classified Points are projected onto the surface of the current
cutting plane. The Incorrectly Classified points are left at their current positions
in the space. Let Y represent the matrix of points
constructed from the (in)correctly classified Xi's.
- Place Y at its centroid by subtracting off the
column means from the corresponding entries in each column. Denote this matrix by
Y*. Note that the columns of
Y* sum to zero.
- Find the least squares line through
Y*. Note that this is equivalent to rotating the
cutting line through the space towards the errors.
The least squares line is given by the
Eckart-Young Theorem (1936). Namely,
let the singular value decomposition of Y* be:
Y* =
ULV'
where, in terms of the example discussed above, U is a 12 by 2 matrix,
L is a 2 by 2 diagonal matrix of singular values,
and V is an 2 by 2 matrix such that
U'U = I2 and
V'V = VV' = I2
with the singular values arranged in decreasing sequence on the diagonal
of L.
To find the least squares line simply set the smallest singular value in
L equal to zero and then remultiply; namely
B = ULsV'
where Ls is equivalent to
L except that the smallest singular value is
set equal to zero. B is the least squares line through the points in
Y* and the normal vector to this line is
the corresponding singular vector in V -- in this
case the second column of V. This is the new normal vector.
Intuitively, in this search stage
b0 is set to zero
and new estimates are found for b1 and
b2 that maximize correct
classification.
Third, Go to (a)
