Collaboration diagram for SVD Matrix Factorisation:

Warning: This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.

About:

This module implements "partial SVD decomposition" method for representing a sparse matrix using a low-rank approximation. Mathematically, this algorithm seeks to find matrices U and V that, for any given A, minimizes:

\[ ||\boldsymbol A - \boldsymbol UV ||_2 \]

subject to \( rank(\boldsymbol UV) \leq k \), where \( ||\cdot||_2 \) denotes the Frobenius norm and \( k \leq rank(\boldsymbol A)\). If A is \( m \times n \), then U will be \( m \times k \) and V will be \( k \times n \).

This algorithm is not intended to do the full decomposition, or to be used as part of inverse procedure. It effectively computes the SVD of a low-rank approximation of A (preferably sparse), with the singular values absorbed in U and V. Code is based on the write-up as appears at [1], with some modifications.

Input:

The input matrix is expected to be of the following form:

{TABLE|VIEW} input_table (
    col_num INTEGER,
    row_num INTEGER,
    value FLOAT    
)

Input is contained in a table where column number and row number for each cell are sequential; that is to say that if the data was written as a matrix, those values would be the actual row and column numbers and not some random identifiers. All rows and columns must be associated with a value. There should not be any missing row, columns or values.

Usage:

The SVD function is called as follows:

SELECT svdmf_run( 'input_table', 'col_name',
   'row_name', 'value', num_features);

The function returns two tables matrix_u and matrix_v, which represent the matrices U and V in table format.

Examples:

Prepare an input table/view:
CREATE TABLE svd_test (

col INT,

row INT,

val FLOAT

);
Populate the input table with some data. e.g.:
sql> INSERT INTO svd_test SELECT (g.a%1000)+1, g.a/1000+1, random() FROM generate_series(1,1000) AS g(a);
Call svdmf_run() stored procedure, e.g.:
sql> select madlib.svdmf_run( 'svd_test', 'col', 'row', 'val', 3);
Sample Output:
INFO: ('Started svdmf_run() with parameters:',)

INFO: (' * input_matrix = madlib_svdsparse_test.test',)

INFO: (' * col_name = col_num',)

INFO: (' * row_name = row_num',)

INFO: (' * value = val',)

INFO: (' * num_features = 3',)

INFO: ('Copying the source data into a temporary table...',)

INFO: ('Estimating feature: 1',)

INFO: ('...Iteration 1: residual_error = 33345014611.1, step_size = 4.9997500125e-10, min_improvement = 1.0',)

INFO: ('...Iteration 2: residual_error = 33345014557.6, step_size = 5.49972501375e-10, min_improvement = 1.0',)

INFO: ('...Iteration 3: residual_error = 33345014054.3, step_size = 6.04969751512e-10, min_improvement = 1.0',)

...

INFO: ('...Iteration 78: residual_error = 2.02512133868, step_size = 5.78105354457e-10, min_improvement = 1.0',)

INFO: ('...Iteration 79: residual_error = 0.893810181282, step_size = 6.35915889903e-10, min_improvement = 1.0',)

INFO: ('...Iteration 80: residual_error = 0.34496773222, step_size = 6.99507478893e-10, min_improvement = 1.0',)

INFO: ('Swapping residual error matrix...',)

svdmf_run

--------------------------------------------------------------------------------------------

Finished SVD matrix factorisation for madlib_svdsparse_test.test (row_num, col_num, val).

Results:

total error = 0.34496773222

number of estimated features = 1

Output:

table : madlib.matrix_u

table : madlib.matrix_v

Time elapsed: 4 minutes 47.86839 seconds.

Literature:

[1] Simon Funk, Netflix Update: Try This at Home, December 11 2006, http://sifter.org/~simon/journal/20061211.html

See Also: File svdmf.sql_in documenting the SQL functions.