User Documentation
 All Files Functions Groups
SVD Matrix Factorisation
+ Collaboration diagram for SVD Matrix Factorisation:
Warning
This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.
About:

This module implements "partial SVD decomposition" method for representing a sparse matrix using a low-rank approximation. Mathematically, this algorithm seeks to find matrices U and V that, for any given A, minimizes:

\[ ||\boldsymbol A - \boldsymbol UV ||_2 \]

subject to \( rank(\boldsymbol UV) \leq k \), where \( ||\cdot||_2 \) denotes the Frobenius norm and \( k \leq rank(\boldsymbol A)\). If A is \( m \times n \), then U will be \( m \times k \) and V will be \( k \times n \).

This algorithm is not intended to do the full decomposition, or to be used as part of inverse procedure. It effectively computes the SVD of a low-rank approximation of A (preferably sparse), with the singular values absorbed in U and V. Code is based on the write-up as appears at [1], with some modifications.

Input:
The input matrix is expected to be of the following form:
{TABLE|VIEW} input_table (
    col_num INTEGER,
    row_num INTEGER,
    value FLOAT    
)

Input is contained in a table where column number and row number for each cell are sequential; that is to say that if the data was written as a matrix, those values would be the actual row and column numbers and not some random identifiers. All rows and columns must be associated with a value. There should not be any missing row, columns or values.

Usage:
The SVD function is called as follows:
SELECT svdmf_run( 'input_table', 'col_name',
   'row_name', 'value', num_features);
The function returns two tables matrix_u and matrix_v, which represent the matrices U and V in table format.
Examples:
  1. Prepare an input table/view:
    CREATE TABLE svd_test (
    col INT,
    row INT,
    val FLOAT
    );
  2. Populate the input table with some data. e.g.:
    sql> INSERT INTO svd_test SELECT (g.a%1000)+1, g.a/1000+1, random() FROM generate_series(1,1000) AS g(a);
  3. Call svdmf_run() stored procedure, e.g.:
    sql> select madlib.svdmf_run( 'svd_test', 'col', 'row', 'val', 3);
  4. Sample Output:
    INFO: ('Started svdmf_run() with parameters:',)
    INFO: (' * input_matrix = madlib_svdsparse_test.test',)
    INFO: (' * col_name = col_num',)
    INFO: (' * row_name = row_num',)
    INFO: (' * value = val',)
    INFO: (' * num_features = 3',)
    INFO: ('Copying the source data into a temporary table...',)
    INFO: ('Estimating feature: 1',)
    INFO: ('...Iteration 1: residual_error = 33345014611.1, step_size = 4.9997500125e-10, min_improvement = 1.0',)
    INFO: ('...Iteration 2: residual_error = 33345014557.6, step_size = 5.49972501375e-10, min_improvement = 1.0',)
    INFO: ('...Iteration 3: residual_error = 33345014054.3, step_size = 6.04969751512e-10, min_improvement = 1.0',)
    ...
    INFO: ('...Iteration 78: residual_error = 2.02512133868, step_size = 5.78105354457e-10, min_improvement = 1.0',)
    INFO: ('...Iteration 79: residual_error = 0.893810181282, step_size = 6.35915889903e-10, min_improvement = 1.0',)
    INFO: ('...Iteration 80: residual_error = 0.34496773222, step_size = 6.99507478893e-10, min_improvement = 1.0',)
    INFO: ('Swapping residual error matrix...',)
    --------------------------------------------------------------------------------------------
    Finished SVD matrix factorisation for madlib_svdsparse_test.test (row_num, col_num, val).
    Results:
    total error = 0.34496773222
    number of estimated features = 1
    Output:
    table : madlib.matrix_u
    table : madlib.matrix_v
    Time elapsed: 4 minutes 47.86839 seconds.
Literature:

[1] Simon Funk, Netflix Update: Try This at Home, December 11 2006, http://sifter.org/~simon/journal/20061211.html

See Also
File svdmf.sql_in documenting the SQL functions.