MADlib 1.0 User Documentation
Support vector machines (SVMs) and related kernel methods have been among the most popular and well-studied machine learning techniques of the past 15 years, with a large number of innovations and applications.
In a nutshell, an SVM model \( f(\boldsymbol x) \) takes the form
\[ f(\boldsymbol x) = \sum_i \alpha_i k(\boldsymbol x_i, \boldsymbol x), \]
where each \( \alpha_i \) is a real number, each \( \boldsymbol x_i \) is a data point from the training set (called a support vector), and \( k(\cdot, \cdot) \) is a kernel function that measures how "similar" two objects are. In regression, \( f(\boldsymbol x) \) is the regression function we seek. In classification, \( f(\boldsymbol x) \) serves as the decision function, and the set where \( f(\boldsymbol x) = 0 \) is the decision boundary; for example, in binary classification the predictor can output class 1 for object \( \boldsymbol x \) if \( f(\boldsymbol x) \geq 0 \), and class 2 otherwise.
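The model form above can be sketched in a few lines of Python. This is only an illustration of the formula, not MADlib code: the function names and the toy coefficients are hypothetical, and the dot product stands in for a generic kernel.

```python
def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(alphas, support_vectors, kernel, x):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * kernel(sv, x) for a, sv in zip(alphas, support_vectors))


def classify(alphas, support_vectors, kernel, x):
    """Binary rule from the text: class 1 if f(x) >= 0, class 2 otherwise."""
    return 1 if decision_value(alphas, support_vectors, kernel, x) >= 0 else 2


# Hypothetical toy model: two support vectors with opposite-sign coefficients.
alphas = [1.0, -1.0]
support_vectors = [[1.0, 0.0], [0.0, 1.0]]

print(classify(alphas, support_vectors, dot, [2.0, 0.0]))
print(classify(alphas, support_vectors, dot, [0.0, 3.0]))
```

Any function of two vectors that returns a number can be passed as kernel, which mirrors how the learning functions in this module take a kernel function as an argument.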
In the case when the kernel function \( k(\cdot, \cdot) \) is the standard inner product on vectors, \( f(\boldsymbol x) \) is just an alternative way of writing a linear function
\[ f'(\boldsymbol x) = \langle \boldsymbol w, \boldsymbol x \rangle, \]
where \( \boldsymbol w \) is a weight vector having the same dimension as \( \boldsymbol x \) (explicitly, \( \boldsymbol w = \sum_i \alpha_i \boldsymbol x_i \)). One of the key points of SVMs is that we can use richer kernel functions to efficiently learn linear models in high-dimensional feature spaces, since \( k(\boldsymbol x_i, \boldsymbol x_j) \) can be understood as an efficient way of computing an inner product in the feature space:
\[ k(\boldsymbol x_i, \boldsymbol x_j) = \langle \phi(\boldsymbol x_i), \phi(\boldsymbol x_j) \rangle, \]
where \( \phi(\boldsymbol x) \) projects \( \boldsymbol x \) into a (possibly infinite-dimensional) feature space.
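Both identities can be checked numerically. The sketch below is illustrative only: it assumes a degree-2 polynomial kernel of the form \( (\langle \boldsymbol x, \boldsymbol y \rangle)^2 \) and one possible explicit feature map for it (all pairwise products of coordinates); the helper names are hypothetical.

```python
def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def collapse_to_weights(alphas, support_vectors):
    """With the dot-product kernel, w = sum_i alpha_i * x_i."""
    dim = len(support_vectors[0])
    return [sum(a * sv[j] for a, sv in zip(alphas, support_vectors))
            for j in range(dim)]


def poly2_kernel(x, y):
    """Assumed degree-2 polynomial kernel: (<x, y>)^2."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit feature map for poly2_kernel: all products x_a * x_b."""
    return [a * b for a in x for b in x]


x, y = [1.0, 2.0], [3.0, 4.0]
# The kernel needs one inner product in 2 dimensions; phi needs 4 dimensions.
print(poly2_kernel(x, y), dot(phi(x), phi(y)))
```

The kernel evaluates the feature-space inner product without ever building \( \phi(\boldsymbol x) \), which is what makes very high-dimensional (or infinite-dimensional) feature spaces tractable.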
There are many algorithms for learning kernel machines. This module implements the class of online kernel learning algorithms described in Kivinen et al. [1]. It also includes the Stochastic Gradient Descent (SGD) method [3] for learning linear SVMs with the hinge loss \( l(z) = \max(0, 1-z) \). See the book by Schölkopf and Smola [2] for more detail.
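As a rough illustration of SGD with the hinge loss, the sketch below takes one subgradient step per example on the objective \( \frac{\lambda}{2}\|\boldsymbol w\|^2 + \max(0, 1 - y\langle \boldsymbol w, \boldsymbol x \rangle) \). It is a minimal sketch under stated assumptions, not the module's implementation: labels are assumed to be -1 or +1, there is no intercept, the step size is constant, and the parameter names eta and reg are borrowed from the lsvm_classification() signature only for readability.

```python
def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(z):
    """Hinge loss l(z) = max(0, 1 - z), with z = y * <w, x>."""
    return max(0.0, 1.0 - z)


def sgd_step(w, x, y, eta=0.1, reg=0.001):
    """One subgradient step for a single example (x, y), y in {-1, +1}."""
    margin = y * dot(w, x)              # computed with the current weights
    w = [(1.0 - eta * reg) * wi for wi in w]   # shrink: regularisation term
    if margin < 1.0:                    # hinge loss is active
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train(data, dim, epochs=20, eta=0.1, reg=0.001):
    """Run several passes of SGD over a list of (x, y) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            w = sgd_step(w, x, y, eta, reg)
    return w


# Hypothetical, linearly separable toy data.
data = [([2.0, 0.5], 1), ([1.5, 1.0], 1), ([-2.0, -0.5], -1), ([-1.0, -1.5], -1)]
w = train(data, dim=2)
print(all(y * dot(w, x) > 0 for x, y in data))
```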
The SGD implementation is based on Léon Bottou's SGD package (http://leon.bottou.org/projects/sgd). The methods introduced in [1] are implemented according to their original descriptions, except that we only update the support vector model when we make a significant error. The original algorithms in [1] update the support vector model at every step, even when no error was made, in the name of regularisation. In practice, updating only when necessary is both faster and better from a learning-theoretic point of view, at least in the i.i.d. setting; this has been verified empirically to a certain degree.
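The idea of updating only on a significant error can be sketched as a mistake-driven kernel learner. This is a simplified illustration, not the algorithm of [1] nor the module's code: it omits the regularisation decay and the intercept, and the margin threshold, the Gaussian kernel form, and all names are assumptions made for the example.

```python
import math


def gaussian_kernel(x, y, gamma=2.0):
    """Assumed Gaussian kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_value(model, x, kernel):
    """f(x) = sum_i alpha_i * k(x_i, x) over stored (alpha_i, x_i) pairs."""
    return sum(alpha * kernel(sv, x) for alpha, sv in model)


def train_on_errors(data, kernel, eta=1.0, margin=0.5, epochs=2):
    """Add a support vector only when y * f(x) falls below the margin."""
    model = []
    for _ in range(epochs):
        for x, y in data:
            if y * decision_value(model, x, kernel) < margin:
                model.append((eta * y, x))   # significant error: update
            # otherwise the model is left untouched
    return model


# XOR-style toy data, not separable by a linear function; labels in {-1, +1}.
xor_data = [([0.0, 0.0], 1), ([1.0, 1.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], -1)]
model = train_on_errors(xor_data, gaussian_kernel)
print(len(model))
```

On this data the first pass stores each point once and the second pass makes no updates, so the model stays small; an always-update scheme would keep growing or rescaling the expansion at every step.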
Methods for classification, regression and novelty detection are available. Multiple instances of the algorithms can be executed in parallel on different subsets of the training data. The resultant support vector models can then be combined using standard techniques like averaging or majority voting.
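Combining the outputs of several models can be as simple as the following sketch. It assumes the per-model predictions for one data point have already been collected into a list; the function names are hypothetical and this is not the module's own combination code.

```python
from collections import Counter


def average(values):
    """Combine real-valued predictions (e.g. regression outputs) by averaging."""
    return sum(values) / len(values)


def majority_vote(labels):
    """Combine class predictions by majority vote.

    Ties are broken in favour of the label that appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]


print(average([1.0, 2.0, 3.0]))
print(majority_vote([1, -1, 1]))
```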
Training data points are accessed via a table or a view. The support vector models can also be stored in tables for fast execution.
The training data is expected to have the following form:
{TABLE|VIEW} input_table ( ... id INT, ind FLOAT8[], label FLOAT8, ... )
For novelty detection, the label field is not required.
Regression learning:
SELECT svm_regression( 'input_table', 'model_table', parallel, 'kernel_func', verbose DEFAULT false, eta DEFAULT 0.1, nu DEFAULT 0.005, slambda DEFAULT 0.05 );
Linear classification learning (SGD):
SELECT lsvm_classification( 'input_table', 'model_table', parallel, verbose DEFAULT false, eta DEFAULT 0.1, reg DEFAULT 0.001 );
Kernel classification learning:
SELECT svm_classification( 'input_table', 'model_table', parallel, 'kernel_func', verbose DEFAULT false, eta DEFAULT 0.1, nu DEFAULT 0.005 );
Novelty detection learning:
SELECT svm_novelty_detection( 'input_table', 'model_table', parallel, 'kernel_func', verbose DEFAULT false, eta DEFAULT 0.1, nu DEFAULT 0.005 );
Assuming the model_table parameter takes the value 'model', each learning function produces two output tables: 'model' and 'model_param'. The first contains the support vectors of the model(s) learned. The second contains the parameters of the model(s) learned, including the kernel function used and the value of the intercept, if there is one.
To make a prediction on a single data point x using a single learned model:
SELECT svm_predict('model_table',x);
If the model was produced by the lsvm_classification() function, use the following prediction function instead:
SELECT lsvm_predict('model_table',x);
To make a prediction on a single data point x using multiple learned models:
SELECT svm_predict_combo('model_table',x);
If the models were produced by the lsvm_classification() function, use the following prediction function instead:
SELECT lsvm_predict_combo('model_table',x);
The single-point prediction functions are not intended to be applied across the rows of a table, as in:
SELECT svm_predict('model_table',x) FROM data_table;
Instead, to make predictions on new data points stored in a table using previously learned models, use the function:
SELECT svm_predict_batch('input_table', 'data_col', 'id_col', 'model_table', 'output_table', parallel);
The output_table is created during the function call; an existing table with the same name will be dropped. If the parallel parameter is true, then each data point in the input table will have multiple predicted values, one for each model learned in parallel.
If the models were produced by the lsvm_classification() function, use the following function instead:
SELECT lsvm_predict_batch('input_table', 'data_col', 'id_col', 'model_table','output_table', parallel);
Currently, three kernel functions have been implemented: dot product (svm_dot), polynomial (svm_polynomial) and Gaussian (svm_gaussian) kernels. To use the dot product kernel function, pass 'MADlib.svm_dot' as the kernel_func argument; kernel_func accepts any function that takes two float[] arguments and returns a float. To use the polynomial or Gaussian kernels, a wrapper function is needed, since these kernels require additional input parameters (see online_sv.sql_in for input parameters).
For example, to use the polynomial kernel with degree 2, first create a wrapper function:
CREATE OR REPLACE FUNCTION mykernel(FLOAT[],FLOAT[]) RETURNS FLOAT AS $$ SELECT svm_polynomial($1,$2,2) $$ language sql;
Then call the SVM learning functions with mykernel as the argument to kernel_func:
SELECT svm_regression('my_schema.my_train_data', 'mymodel', false, 'mykernel');
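The wrapper pattern above, fixing the extra parameter so that the result takes exactly two vectors, can be mirrored outside the database for intuition. The Python stand-ins below are hypothetical and are not MADlib functions; the kernel formulas used here, \( (\langle \boldsymbol x, \boldsymbol y \rangle)^d \) for the polynomial kernel and \( \exp(-\gamma \|\boldsymbol x - \boldsymbol y\|^2) \) for the Gaussian kernel, are assumptions and should be checked against online_sv.sql_in.

```python
import math
from functools import partial


def svm_dot_py(x, y):
    """Stand-in for a dot-product kernel."""
    return sum(a * b for a, b in zip(x, y))


def svm_polynomial_py(x, y, degree):
    """Stand-in for a polynomial kernel; assumed form (<x, y>)^degree."""
    return svm_dot_py(x, y) ** degree


def svm_gaussian_py(x, y, gamma):
    """Stand-in for a Gaussian kernel; assumed form exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


# Wrappers that fix the extra parameter, leaving a two-argument kernel,
# in the same spirit as the SQL function mykernel above.
mykernel = partial(svm_polynomial_py, degree=2)
mygaussian = partial(svm_gaussian_py, gamma=0.5)

print(mykernel([1.0, 2.0], [3.0, 4.0]))
print(mygaussian([1.0, 2.0], [1.0, 2.0]))
```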
To drop all tables pertaining to the model, we can use
SELECT svm_drop_model('model_table');
As a general first step, we need to prepare and populate an input table/view with the following structure:
{TABLE|VIEW} input_table ( ... id INT, ind FLOAT8[], label FLOAT8, ... )
Note: The label field is not required for novelty detection.
Example usage for regression:
Example usage for classification:
Example usage for novelty detection:
[1] Jyrki Kivinen, Alexander J. Smola, and Robert C. Williamson: Online Learning with Kernels, IEEE Transactions on Signal Processing, 52(8), 2165-2176, 2004.
[2] Bernhard Schölkopf and Alexander J. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.
[3] Léon Bottou: Large-Scale Machine Learning with Stochastic Gradient Descent, Proceedings of the 19th International Conference on Computational Statistics, Springer, 2010.