Support vector machines (SVMs) and related kernel methods have been among the most popular and well-studied machine learning techniques of the past 15 years, with a large number of innovations and applications.
In a nutshell, an SVM model \( f(\boldsymbol x) \) takes the form
\[ f(\boldsymbol x) = \sum_i \alpha_i k(\boldsymbol x_i, \boldsymbol x), \]
where each \( \alpha_i \) is a real number, each \( \boldsymbol x_i \) is a data point from the training set (called a support vector), and \( k(\cdot, \cdot) \) is a kernel function that measures how "similar" two objects are. In regression, \( f(\boldsymbol x) \) is the regression function we seek. In classification, \( f(\boldsymbol x) \) serves as the decision function, and the set where \( f(\boldsymbol x) = 0 \) is the decision boundary; for example, in binary classification the predictor can output class 1 for object \( \boldsymbol x \) if \( f(\boldsymbol x) \geq 0 \), and class 2 otherwise.
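As an illustration of the kernel expansion and the binary decision rule, here is a minimal Python sketch. The support vectors, coefficients and helper names (dot, decision_function, predict_class) are hypothetical and are not part of this module.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(x: Vector, y: Vector) -> float:
    """Standard inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def decision_function(alphas: Sequence[float], support_vectors: Sequence[Vector],
                      kernel: Kernel, x: Vector) -> float:
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * kernel(sv, x) for a, sv in zip(alphas, support_vectors))


def predict_class(alphas: Sequence[float], support_vectors: Sequence[Vector],
                  kernel: Kernel, x: Vector) -> int:
    """Binary rule: class 1 if f(x) >= 0, class 2 otherwise."""
    return 1 if decision_function(alphas, support_vectors, kernel, x) >= 0 else 2


# Toy model with two support vectors and opposite-sign coefficients.
alphas = [0.5, -0.5]
svs = [[1.0, 1.0], [-1.0, -1.0]]
print(decision_function(alphas, svs, dot, [2.0, 0.0]))  # 2.0
print(predict_class(alphas, svs, dot, [2.0, 0.0]))      # 1
print(predict_class(alphas, svs, dot, [-2.0, 0.0]))     # 2
```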
When the kernel function \( k(\cdot, \cdot) \) is the standard inner product on vectors, \( f(\boldsymbol x) \) is simply an alternative way of writing a linear function
\[ f'(\boldsymbol x) = \langle \boldsymbol w, \boldsymbol x \rangle, \]
where \( \boldsymbol w = \sum_i \alpha_i \boldsymbol x_i \) is a weight vector having the same dimension as \( \boldsymbol x \) (by linearity of the inner product, \( \sum_i \alpha_i \langle \boldsymbol x_i, \boldsymbol x \rangle = \langle \sum_i \alpha_i \boldsymbol x_i, \boldsymbol x \rangle \)). One of the key points of SVMs is that we can use more sophisticated kernel functions to efficiently learn linear models in high-dimensional feature spaces, since \( k(\boldsymbol x_i, \boldsymbol x_j) \) can be understood as an efficient way of computing an inner product in the feature space:
\[ k(\boldsymbol x_i, \boldsymbol x_j) = \langle \phi(\boldsymbol x_i), \phi(\boldsymbol x_j) \rangle, \]
where \( \phi(\boldsymbol x) \) maps \( \boldsymbol x \) into a (possibly infinite-dimensional) feature space.
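Both claims can be checked numerically with a small Python sketch on made-up toy values. With the dot-product kernel, the expansion equals \( \langle \boldsymbol w, \boldsymbol x \rangle \) for \( \boldsymbol w = \sum_i \alpha_i \boldsymbol x_i \). For the homogeneous degree-2 polynomial kernel on 2-dimensional inputs, \( k(\boldsymbol x, \boldsymbol y) = \langle \boldsymbol x, \boldsymbol y \rangle^2 \) equals \( \langle \phi(\boldsymbol x), \phi(\boldsymbol y) \rangle \) with the explicit feature map \( \phi(\boldsymbol x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2) \). The helper names are illustrative only.

```python
import math
from typing import List, Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


# (1) Dot-product kernel: sum_i alpha_i <x_i, x> equals <w, x> with w = sum_i alpha_i x_i.
alphas = [0.5, -2.0, 1.0]
svs = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
x = [2.0, -1.0]
f_kernel = sum(a * dot(sv, x) for a, sv in zip(alphas, svs))
w = [sum(a * sv[j] for a, sv in zip(alphas, svs)) for j in range(len(x))]
print(f_kernel, dot(w, x))  # 9.0 9.0


# (2) Degree-2 homogeneous polynomial kernel in 2-D and its explicit feature map.
def poly2(x: Sequence[float], y: Sequence[float]) -> float:
    return dot(x, y) ** 2


def phi(x: Sequence[float]) -> List[float]:
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


u, v = [1.0, 2.0], [3.0, -1.0]
print(math.isclose(poly2(u, v), dot(phi(u), phi(v))))  # True
```

The kernel evaluates one inner product in 2 dimensions, while the explicit route builds 3-dimensional feature vectors; for higher degrees and dimensions (or the Gaussian kernel, whose feature space is infinite-dimensional) that gap is what makes kernels attractive.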
There are many algorithms for learning kernel machines. This module implements the family of online kernel learning algorithms described in Kivinen et al. [1]. It also includes the Stochastic Gradient Descent (SGD) method [3] for learning linear SVMs with the hinge loss \( l(z) = \max(0, 1-z) \). See Schölkopf and Smola [2] for many more details.
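To make the hinge-loss SGD update concrete, here is a minimal Python sketch of SGD for a linear SVM without an intercept, minimizing \( \frac{\lambda}{2}\|\boldsymbol w\|^2 + \frac{1}{n}\sum_j \max(0, 1 - y_j \langle \boldsymbol w, \boldsymbol x_j \rangle) \). It is a sketch of the technique, not this module's implementation: it uses a constant step size for simplicity (Bottou's package uses a decreasing schedule), and the parameter names eta and reg merely echo those of lsvm_classification, with reg playing the role of \( \lambda \) as an assumption.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]


def sgd_linear_svm(data: Sequence[Example], eta: float = 0.1, reg: float = 0.001,
                   epochs: int = 10, seed: int = 0) -> List[float]:
    """Plain SGD on reg/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)).

    Labels must be +1 or -1; no intercept is learned.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a random order each epoch
        for i in order:
            x, y = data[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step on the regularizer: shrink w.
            w = [(1.0 - eta * reg) * wj for wj in w]
            # The hinge-loss subgradient is -y * x only when the margin is below 1.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


# Toy data: the label is the sign of the first coordinate.
train = [([2.0, 1.0], 1.0), ([1.0, -1.0], 1.0), ([3.0, 0.5], 1.0),
         ([-2.0, 1.0], -1.0), ([-1.0, -1.0], -1.0), ([-3.0, 0.5], -1.0)]
w = sgd_linear_svm(train)
print([1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x, _ in train])
# [1, 1, 1, -1, -1, -1]
```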
The SGD implementation is based on Léon Bottou's SGD package (http://leon.bottou.org/projects/sgd). The methods introduced in [1] are implemented according to their original descriptions, except that we only update the support vector model when we make a significant error. The original algorithms in [1] update the support vector model at every step, even when no error was made, in the name of regularization. In practice, updating only when necessary is faster and, at least in the i.i.d. setting, also better from a learning-theoretic point of view; this has been verified empirically to a certain degree.
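The update-only-on-error variant can be sketched in Python for classification, in the spirit of the NORMA algorithm of [1]: when the margin \( y f(\boldsymbol x) \) falls below a threshold \( \rho \), all existing coefficients are shrunk by \( (1 - \eta\lambda) \) and \( \boldsymbol x \) is added as a support vector with coefficient \( \eta y \); otherwise the model is left untouched. Treating a "significant error" as \( y f(\boldsymbol x) < \rho \), the Gaussian kernel form, and the parameter names eta, lam and rho are assumptions of this sketch, not this module's SQL parameters or code.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def gaussian(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma * ||x - y||^2) (assumed parameterization)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OnlineKernelClassifier:
    """NORMA-style online classifier that updates only on margin errors."""

    def __init__(self, kernel: Callable[[Vector, Vector], float],
                 eta: float = 0.1, lam: float = 0.01, rho: float = 1.0) -> None:
        self.kernel, self.eta, self.lam, self.rho = kernel, eta, lam, rho
        self.alphas: List[float] = []
        self.svs: List[Vector] = []

    def decision(self, x: Vector) -> float:
        """f(x) = sum_i alpha_i * k(x_i, x) over the current support vectors."""
        return sum(a * self.kernel(sv, x) for a, sv in zip(self.alphas, self.svs))

    def update(self, x: Vector, y: float) -> bool:
        """Process one example with y in {+1, -1}; return True if the model changed."""
        if y * self.decision(x) >= self.rho:
            return False  # no significant error: skip the update entirely
        shrink = 1.0 - self.eta * self.lam
        self.alphas = [shrink * a for a in self.alphas]  # regularization step
        self.alphas.append(self.eta * y)  # x becomes a new support vector
        self.svs.append(x)
        return True


# XOR-like toy data, which is not linearly separable.
xor = [([1.0, 1.0], 1.0), ([-1.0, -1.0], 1.0), ([1.0, -1.0], -1.0), ([-1.0, 1.0], -1.0)]
clf = OnlineKernelClassifier(gaussian)
for _ in range(30):
    for x, y in xor:
        clf.update(x, y)
print([1 if clf.decision(x) >= 0 else -1 for x, _ in xor])  # [1, 1, -1, -1]
```

The original NORMA would apply the shrink step on every example; skipping it when the margin is already large enough is exactly the change described in the paragraph above.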
Methods for classification, regression and novelty detection are available. Multiple instances of the algorithms can be executed in parallel on different subsets of the training data. The resultant support vector models can then be combined using standard techniques like averaging or majority voting.
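Combining models learned in parallel can be sketched in Python as averaging for real-valued predictions and majority voting over the signs of the predictions for binary classification. The \( \pm 1 \) label convention and the tie-breaking rule are assumptions of this sketch, not documented behavior of the module.

```python
from collections import Counter
from statistics import mean
from typing import Sequence


def combine_by_average(predictions: Sequence[float]) -> float:
    """Average the real-valued outputs of several models (e.g. for regression)."""
    return mean(predictions)


def combine_by_majority_vote(predictions: Sequence[float]) -> int:
    """Majority vote over sign(f(x)) of several models, with labels +1 / -1.

    Ties go to +1, mirroring the f(x) >= 0 rule; this tie-break is an assumption.
    """
    votes = Counter(1 if p >= 0 else -1 for p in predictions)
    return 1 if votes[1] >= votes[-1] else -1


print(combine_by_average([1.0, 2.0, 3.0]))         # 2.0
print(combine_by_majority_vote([0.3, -2.0, 0.1]))  # 1
```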
Training data points are accessed via a table or a view. The support vector models can also be stored in tables for fast execution.
Regression learning is achieved through the following function:
svm_regression( input_table, model_table, parallel, kernel_func, verbose DEFAULT false, eta DEFAULT 0.1, nu DEFAULT 0.005, slambda DEFAULT 0.05 );
For classification and regression, the training table/view is expected to be of the following form (the array ind must not contain more than 102,400 elements):
{TABLE|VIEW} input_table ( ... id INT, ind FLOAT8[], label FLOAT8, ... )
For novelty detection, the label field is not required.
Classification learning is achieved through the following two functions:
lsvm_classification( input_table, model_table, parallel, verbose DEFAULT false, eta DEFAULT 0.1, reg DEFAULT 0.001 )
svm_classification( input_table, model_table, parallel, kernel_func, verbose DEFAULT false, eta DEFAULT 0.1, nu DEFAULT 0.005 )
Novelty detection is achieved through the following function:
svm_novelty_detection( input_table, model_table, parallel, kernel_func, verbose DEFAULT false, eta DEFAULT 0.1, nu DEFAULT 0.005 )
Assuming the model_table parameter takes on value 'model', each learning function will produce two tables as output: 'model' and 'model_param'. The first contains the support vectors of the model(s) learned. The second contains the parameters of the model(s) learned, which include information like the kernel function used and the value of the intercept, if there is one.
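To make a prediction for a single data point x using a previously learned model, use one of the following functions. The _combo variants are used when multiple models were learned in parallel, and the lsvm_ variants apply to models learned with lsvm_classification:
svm_predict( model_table, x )
lsvm_predict( model_table, x )
svm_predict_combo( model_table, x )
lsvm_predict_combo( model_table, x )
These functions are intended for individual data points, not for queries of the form
SELECT svm_predict( model_table, x ) FROM data_table;
Instead, to make predictions on new data points stored in a table using previously learned models, use the function:
svm_predict_batch( input_table, data_col, id_col, model_table, output_table, parallel );
The output_table is created during the function call; an existing table with the same name will be dropped. If the parallel parameter is true, then each data point in the input table will have multiple predicted values, one for each model learned in parallel. The corresponding function for models learned with lsvm_classification is:
lsvm_predict_batch( input_table, data_col, id_col, model_table, output_table, parallel );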
Currently, three kernel functions have been implemented: dot product (svm_dot), polynomial (svm_polynomial) and Gaussian (svm_gaussian). To use the dot product kernel function, simply pass 'madlib.svm_dot' as the kernel_func argument, which accepts any function that takes two float[] arguments and returns a float. To use the polynomial or Gaussian kernel, a wrapper function is needed, since these kernels require additional input parameters (see online_sv.sql_in for the input parameters).
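For reference, here are Python versions of the three kernel types, together with the idea behind the wrapper: fixing the extra parameter turns a three-argument kernel into a two-argument one that fits the kernel_func contract. The exact parameterizations of svm_polynomial and svm_gaussian are defined in online_sv.sql_in; the forms used here (homogeneous \( \langle \boldsymbol x, \boldsymbol y \rangle^d \) and \( \exp(-\gamma \|\boldsymbol x - \boldsymbol y\|^2) \)) are common textbook choices and are assumptions, as are all function names.

```python
import math
from functools import partial
from typing import Sequence

Vector = Sequence[float]


def dot_kernel(x: Vector, y: Vector) -> float:
    """Dot product kernel <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, degree: int) -> float:
    """Homogeneous polynomial kernel <x, y> ** degree (assumed form)."""
    return dot_kernel(x, y) ** degree


def gaussian_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||x - y||^2) (assumed parameterization)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


# Fixing the extra parameter yields a two-argument kernel, the role a SQL wrapper plays.
poly2_kernel = partial(polynomial_kernel, degree=2)
gauss_kernel = partial(gaussian_kernel, gamma=0.5)

print(dot_kernel([1, 2], [3, 4]))      # 11
print(poly2_kernel([1, 2], [3, 4]))    # 121
print(gauss_kernel([0, 0], [0, 0]))    # 1.0
```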
For example, to use the polynomial kernel with degree 2, first create a wrapper function:
CREATE OR REPLACE FUNCTION mykernel(FLOAT[], FLOAT[]) RETURNS FLOAT AS $$ SELECT madlib.svm_polynomial($1, $2, 2) $$ LANGUAGE sql;
Then call the SVM learning functions with mykernel as the argument to kernel_func, for example:
SELECT madlib.svm_regression( 'my_schema.my_train_data', 'mymodel', false, 'mykernel' );
To drop all tables pertaining to the model, use:
SELECT svm_drop_model( 'model_table' );
As a general first step, prepare and populate an input table/view with the following structure:
TABLE/VIEW my_schema.my_input_table (
    id    INT,       -- point ID
    ind   FLOAT8[],  -- data point
    label FLOAT8     -- label of data point
);
The label field is not required for novelty detection.
Example usage for regression:
Generate 1000 5-dimensional data points labelled using the simple target function
t(x) = if x[5] = 10 then 50 else if x[5] = -10 then 50 else 0;
and store them in the my_schema.my_train_data table as follows:
SELECT madlib.svm_generate_reg_data( 'my_schema.my_train_data', 1000, 5 );
Learn a regression model and store it under the name 'myexp':
SELECT madlib.svm_regression( 'my_schema.my_train_data', 'myexp', false, 'madlib.svm_dot' );
Make predictions with the learned model:
SELECT madlib.svm_predict( 'myexp', '{1,2,4,20,10}' );
SELECT madlib.svm_predict( 'myexp', '{1,2,4,20,-10}' );
Alternatively, learn multiple models in parallel:
SELECT madlib.svm_regression( 'my_schema.my_train_data', 'myexp', true, 'madlib.svm_dot' );
The resulting models can be used for prediction as follows:
SELECT * FROM madlib.svm_predict_combo( 'myexp', '{1,2,4,20,10}' );
To make predictions on data points stored in a table:
CREATE TABLE madlib.svm_reg_test ( id int, ind float8[] );
INSERT INTO madlib.svm_reg_test ( SELECT id, ind FROM my_schema.my_train_data LIMIT 20 );
SELECT madlib.svm_predict_batch( 'madlib.svm_reg_test', 'ind', 'id', 'myexp', 'madlib.svm_reg_output1', false );
SELECT * FROM madlib.svm_reg_output1;
SELECT madlib.svm_predict_batch( 'madlib.svm_reg_test', 'ind', 'id', 'myexp', 'madlib.svm_reg_output2', true );
SELECT * FROM madlib.svm_reg_output2;
Example usage for classification:
Generate 2000 5-dimensional data points labelled using the simple target function
t(x) = if x[1] > 0 and x[2] < 0 then 1 else -1;
and store them in the my_schema.my_train_data table as follows:
SELECT madlib.svm_generate_cls_data( 'my_schema.my_train_data', 2000, 5 );
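The target function is pseudocode whose array indices start at 1, as SQL arrays do. A Python rendering (0-based; the function name t is kept from the pseudocode) shows that the point {10,-2,4,20,10} used in this example's prediction calls has true label 1, so a well-trained model would be expected to return a positive value for it.

```python
from typing import Sequence


def t(x: Sequence[float]) -> int:
    """Classification target: 1 if x[1] > 0 and x[2] < 0 (1-based), else -1."""
    # The pseudocode indexes from 1 like SQL arrays; Python indexes from 0.
    return 1 if x[0] > 0 and x[1] < 0 else -1


print(t([10, -2, 4, 20, 10]))  # 1
print(t([-1, -2, 0, 0, 0]))    # -1
```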
Learn a classification model and store it under the name 'myexpc':
SELECT madlib.svm_classification( 'my_schema.my_train_data', 'myexpc', false, 'madlib.svm_dot' );
Make a prediction with the learned model:
SELECT madlib.svm_predict( 'myexpc', '{10,-2,4,20,10}' );
Learn multiple models in parallel and predict with all of them:
SELECT madlib.svm_classification( 'my_schema.my_train_data', 'myexpc', true, 'madlib.svm_dot' );
SELECT * FROM madlib.svm_predict_combo( 'myexpc', '{10,-2,4,20,10}' );
Learn a linear SVM with SGD and make a prediction:
SELECT madlib.lsvm_classification( 'my_schema.my_train_data', 'myexpc', false );
SELECT madlib.lsvm_predict( 'myexpc', '{10,-2,4,20,10}' );
Learn multiple linear SVMs in parallel and predict with all of them:
SELECT madlib.lsvm_classification( 'my_schema.my_train_data', 'myexpc', true );
SELECT madlib.lsvm_predict_combo( 'myexpc', '{10,-2,4,20,10}' );
Example usage for novelty detection:
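Generate 100 2-dimensional data points and store them in the my_schema.my_train_data table:
SELECT madlib.svm_generate_nd_data( 'my_schema.my_train_data', 100, 2 );
Learn a novelty detection model and apply it to new data points:
SELECT madlib.svm_novelty_detection( 'my_schema.my_train_data', 'myexpnd', false, 'madlib.svm_dot' );
SELECT madlib.svm_predict( 'myexpnd', '{10,-10}' );
SELECT madlib.svm_predict( 'myexpnd', '{-1,-1}' );
Learn multiple models in parallel and predict with all of them:
SELECT madlib.svm_novelty_detection( 'my_schema.my_train_data', 'myexpnd', true, 'madlib.svm_dot' );
SELECT * FROM madlib.svm_predict_combo( 'myexpnd', '{-1,-1}' );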
[1] Jyrki Kivinen, Alexander J. Smola, and Robert C. Williamson: Online Learning with Kernels, IEEE Transactions on Signal Processing, 52(8), 2165-2176, 2004.
[2] Bernhard Schölkopf and Alexander J. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.
[3] Léon Bottou: Large-Scale Machine Learning with Stochastic Gradient Descent, Proceedings of the 19th International Conference on Computational Statistics, Springer, 2010.
See the file online_sv.sql_in for documentation of the SQL functions.