User Documentation
Support Vector Machines
About:

Support vector machines (SVMs) and related kernel methods have been among the most popular and well-studied machine learning techniques of the past 15 years, with an amazing number of innovations and applications.

In a nutshell, an SVM model \( f(\boldsymbol x) \) takes the form of

\[ f(\boldsymbol x) = \sum_i \alpha_i k(\boldsymbol x_i, \boldsymbol x), \]

where each \( \alpha_i \) is a real number, each \( \boldsymbol x_i \) is a data point from the training set (called a support vector), and \( k(\cdot, \cdot) \) is a kernel function that measures how "similar" two objects are. In regression, \( f(\boldsymbol x) \) is the regression function we seek. In classification, \( f(\boldsymbol x) \) serves as the decision boundary; so for example in binary classification, the predictor can output class 1 for object \( \boldsymbol x \) if \( f(\boldsymbol x) \geq 0 \), and class 2 otherwise.
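
For instance, a model with two support vectors and purely illustrative coefficients \( \alpha_1 = 0.8 \) and \( \alpha_2 = -0.5 \) scores a new point \( \boldsymbol x \) as

\[ f(\boldsymbol x) = 0.8\, k(\boldsymbol x_1, \boldsymbol x) - 0.5\, k(\boldsymbol x_2, \boldsymbol x), \]

so the predictor outputs class 1 exactly when \( \boldsymbol x \) is, in the sense of the kernel, sufficiently more similar to \( \boldsymbol x_1 \) than to \( \boldsymbol x_2 \).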

In the case when the kernel function \( k(\cdot, \cdot) \) is the standard inner product on vectors, \( f(\boldsymbol x) \) is just an alternative way of writing a linear function

\[ f'(\boldsymbol x) = \langle \boldsymbol w, \boldsymbol x \rangle, \]

where \( \boldsymbol w \) is a weight vector having the same dimension as \( \boldsymbol x \). One of the key strengths of SVMs is that we can use more sophisticated kernel functions to efficiently learn linear models in high-dimensional feature spaces, since \( k(\boldsymbol x_i, \boldsymbol x_j) \) can be understood as an efficient way of computing an inner product in the feature space:

\[ k(\boldsymbol x_i, \boldsymbol x_j) = \langle \phi(\boldsymbol x_i), \phi(\boldsymbol x_j) \rangle, \]

where \( \phi(\boldsymbol x) \) projects \( \boldsymbol x \) into a (possibly infinite-dimensional) feature space.
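
To make this concrete, consider the standard textbook example (not specific to this module) of the degree-2 polynomial kernel on two-dimensional vectors:

\[ k(\boldsymbol u, \boldsymbol v) = \langle \boldsymbol u, \boldsymbol v \rangle^2 = \langle \phi(\boldsymbol u), \phi(\boldsymbol v) \rangle, \quad \text{where } \phi(\boldsymbol u) = (u_1^2, \sqrt{2}\, u_1 u_2, u_2^2). \]

Evaluating \( k \) directly needs only the two-dimensional inner product and a squaring, while the explicit feature map \( \phi \) lives in three dimensions; for higher degrees and dimensions, the gap grows rapidly.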

There are many algorithms for learning kernel machines. This module implements the class of online learning with kernels algorithms described in Kivinen et al. [1]. It also includes the Stochastic Gradient Descent (SGD) method [3] for learning linear SVMs with the hinge loss \( l(z) = \max(0, 1-z) \). See also the book by Schölkopf and Smola [2] for many more details.
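
Schematically, one SGD step for the regularised hinge loss takes the following textbook form (a sketch only; the exact step-size and regularisation schedule follow [3]):

\[ \boldsymbol w \leftarrow (1 - \eta \lambda)\, \boldsymbol w + \begin{cases} \eta\, y_i \boldsymbol x_i & \text{if } y_i \langle \boldsymbol w, \boldsymbol x_i \rangle < 1, \\ \boldsymbol 0 & \text{otherwise}, \end{cases} \]

where \( \eta \) is the step size, \( \lambda \) the regularisation parameter, and \( (\boldsymbol x_i, y_i) \) the current training example with \( y_i \in \{-1, +1\} \).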

The SGD implementation is based on Léon Bottou's SGD package (http://leon.bottou.org/projects/sgd). The methods introduced in [1] are implemented according to their original descriptions, except that we only update the support vector model when we make a significant error. The original algorithms in [1] update the support vector model at every step, even when no error was made, in the name of regularisation. Updating only when necessary is faster in practice and, at least in the i.i.d. setting, also better from a learning-theoretic point of view; this has been verified empirically to a certain degree.

Methods for classification, regression and novelty detection are available. Multiple instances of the algorithms can be executed in parallel on different subsets of the training data. The resultant support vector models can then be combined using standard techniques like averaging or majority voting.
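
For example, given \( m \) models \( f_1, \dots, f_m \) trained on disjoint subsets of the data, the standard combinations are

\[ \bar f(\boldsymbol x) = \frac{1}{m} \sum_{j=1}^m f_j(\boldsymbol x) \quad \text{(averaging, for regression)}, \qquad \hat y(\boldsymbol x) = \operatorname{sign}\left( \sum_{j=1}^m \operatorname{sign}\big(f_j(\boldsymbol x)\big) \right) \quad \text{(majority voting, for classification)}. \]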

Training data points are accessed via a table or a view. The support vector models can also be stored in tables for fast execution.

Input:
For classification and regression, the training table/view is expected to be of the following form (the array size of ind must not exceed 102,400):
{TABLE|VIEW} input_table (
    ...
    id INT,
    ind FLOAT8[],
    label FLOAT8,
    ...
)
For novelty detection, the label field is not required.
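
If the training data lives in a table with a different layout, a view can adapt it to this format. The following sketch assumes a hypothetical table my_schema.raw_data with columns pid, f1, f2 and y:

CREATE VIEW my_schema.my_input_view AS
SELECT pid AS id,                       -- point ID
       ARRAY[f1, f2]::FLOAT8[] AS ind,  -- assemble the features into an array
       y::FLOAT8 AS label               -- training label
FROM my_schema.raw_data;
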
Usage:
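
The calling patterns below are summarised from the examples in this document; the argument names are descriptive placeholders, and online_sv.sql_in has the authoritative signatures:

    svm_regression(input_table, model_name, parallel, kernel_func)
    svm_classification(input_table, model_name, parallel, kernel_func)
    svm_novelty_detection(input_table, model_name, parallel, kernel_func)
    lsvm_classification(input_table, model_name, parallel)
    svm_predict(model_name, ind)          -- single model
    svm_predict_combo(model_name, ind)    -- models trained with parallel = true
    lsvm_predict(model_name, ind)
    lsvm_predict_combo(model_name, ind)
    svm_predict_batch(input_table, data_col, id_col, model_name, output_table, parallel)
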
Implementation Notes:

Currently, three kernel functions have been implemented: the dot product (svm_dot), polynomial (svm_polynomial) and Gaussian (svm_gaussian) kernels. The kernel_func argument accepts any function that takes two FLOAT8[] arrays and returns a FLOAT8; to use the dot product kernel, simply pass 'MADlib.svm_dot'. To use the polynomial or Gaussian kernels, a wrapper function is needed, since these kernels require additional input parameters (see online_sv.sql_in for input parameters).

For example, to use the polynomial kernel with degree 2, first create a wrapper function:

CREATE OR REPLACE FUNCTION mykernel(FLOAT8[], FLOAT8[]) RETURNS FLOAT8 AS $$
    SELECT svm_polynomial($1, $2, 2)
$$ LANGUAGE sql;

Then call the SVM learning functions with mykernel as the argument to kernel_func.

SELECT svm_regression('my_schema.my_train_data', 'mymodel', false, 'mykernel');
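
A wrapper for the Gaussian kernel can be written in the same way. The sketch below assumes that svm_gaussian takes the two data points plus a kernel-width parameter, as described in online_sv.sql_in; the width value 1.0 is purely illustrative:

CREATE OR REPLACE FUNCTION mygaussian(FLOAT8[], FLOAT8[]) RETURNS FLOAT8 AS $$
    SELECT svm_gaussian($1, $2, 1.0)  -- 1.0 is an illustrative kernel width
$$ LANGUAGE sql;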

To drop all tables pertaining to the model, we can use

SELECT svm_drop_model('model_table');
Examples:

As a general first step, we need to prepare and populate an input table/view with the following structure:

TABLE/VIEW my_schema.my_input_table 
(       
        id    INT,       -- point ID
        ind   FLOAT8[],  -- data point
        label FLOAT8     -- label of data point
);

Note: The label field is not required for novelty detection.

Example usage for regression:

  1. We can randomly generate 1,000 5-dimensional data points labelled by the simple target function
    t(x) = if x[5] = 10 then 50 else if x[5] = -10 then -50 else 0;
    
    and store that in the my_schema.my_train_data table as follows:
    sql> select MADlib.svm_generate_reg_data('my_schema.my_train_data', 1000, 5);
    
  2. We can now learn a regression model and store the resultant model under the name 'myexp'.
    sql> select MADlib.svm_regression('my_schema.my_train_data', 'myexp', false, 'MADlib.svm_dot');
    
  3. We can now start using it to predict the labels of new data points as follows:
    sql> select MADlib.svm_predict('myexp', '{1,2,4,20,10}');
    sql> select MADlib.svm_predict('myexp', '{1,2,4,20,-10}');
    
  4. To learn multiple support vector models, we replace the learning step above by
    sql> select MADlib.svm_regression('my_schema.my_train_data', 'myexp', true, 'MADlib.svm_dot');
    
    The resultant models can be used for prediction as follows:
    sql> select * from MADlib.svm_predict_combo('myexp', '{1,2,4,20,10}');
    
  5. We can also predict the labels of all the data points stored in a table. For example, we can execute the following:
    sql> create table MADlib.svm_reg_test ( id int, ind float8[] );
    sql> insert into MADlib.svm_reg_test (select id, ind from my_schema.my_train_data limit 20);
    sql> select MADlib.svm_predict_batch('MADlib.svm_reg_test', 'ind', 'id', 'myexp', 'MADlib.svm_reg_output1', false); 
    sql> select * from MADlib.svm_reg_output1;
    sql> select MADlib.svm_predict_batch('MADlib.svm_reg_test', 'ind', 'id', 'myexp', 'MADlib.svm_reg_output2', true);
    sql> select * from MADlib.svm_reg_output2;
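
    To compare the batch predictions with the known labels, the output table can be joined back to the training data. This is a sketch only: the output column name "prediction" below is an assumption, so inspect the actual schema of the output table first.
    sql> -- "prediction" is an assumed column name; check MADlib.svm_reg_output1's schema
    sql> select t.id, t.label, o.prediction from my_schema.my_train_data t join MADlib.svm_reg_output1 o on t.id = o.id;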
    

Example usage for classification:

  1. We can randomly generate 2,000 5-dimensional data points labelled by the simple target function
    t(x) = if x[1] > 0 and x[2] < 0 then 1 else -1;
    
    and store that in the my_schema.my_train_data table as follows:
    sql> select MADlib.svm_generate_cls_data('my_schema.my_train_data', 2000, 5);
    
  2. We can now learn a classification model and store the resultant model under the name 'myexpc'.
    sql> select MADlib.svm_classification('my_schema.my_train_data', 'myexpc', false, 'MADlib.svm_dot');
    
  3. We can now start using it to predict the labels of new data points as follows:
    sql> select MADlib.svm_predict('myexpc', '{10,-2,4,20,10}');
    
  4. To learn multiple support vector models, replace the model-building and prediction steps above by
    sql> select MADlib.svm_classification('my_schema.my_train_data', 'myexpc', true, 'MADlib.svm_dot');
    sql> select * from MADlib.svm_predict_combo('myexpc', '{10,-2,4,20,10}');
    
  5. To learn a linear support vector model using SGD, replace the model-building and prediction steps above by
    sql> select MADlib.lsvm_classification('my_schema.my_train_data', 'myexpc', false);
    sql> select MADlib.lsvm_predict('myexpc', '{10,-2,4,20,10}');
    
  6. To learn multiple linear support vector models using SGD, replace the model-building and prediction steps above by
    sql> select MADlib.lsvm_classification('my_schema.my_train_data', 'myexpc', true);
    sql> select MADlib.lsvm_predict_combo('myexpc', '{10,-2,4,20,10}');
    

Example usage for novelty detection:

  1. We can randomly generate 100 2-dimensional data points (the normal cases) and store them in the my_schema.my_train_data table as follows:
    sql> select MADlib.svm_generate_nd_data('my_schema.my_train_data', 100, 2);
    
  2. Learning and predicting using a single novelty detection model can be done as follows:
    sql> select MADlib.svm_novelty_detection('my_schema.my_train_data', 'myexpnd', false, 'MADlib.svm_dot');
    sql> select MADlib.svm_predict('myexpnd', '{10,-10}');  
    sql> select MADlib.svm_predict('myexpnd', '{-1,-1}');  
    
  3. Learning and predicting using multiple models can be done as follows:
    sql> select MADlib.svm_novelty_detection('my_schema.my_train_data', 'myexpnd', true, 'MADlib.svm_dot');
    sql> select * from MADlib.svm_predict_combo('myexpnd', '{10,-10}');  
    sql> select * from MADlib.svm_predict_combo('myexpnd', '{-1,-1}');  
    
Literature:

[1] Jyrki Kivinen, Alexander J. Smola, and Robert C. Williamson: Online Learning with Kernels, IEEE Transactions on Signal Processing, 52(8), 2165-2176, 2004.

[2] Bernhard Schölkopf and Alexander J. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.

[3] Léon Bottou: Large-Scale Machine Learning with Stochastic Gradient Descent, Proceedings of the 19th International Conference on Computational Statistics, Springer, 2010.

See also:
File online_sv.sql_in documenting the SQL functions.