Collaboration diagram for SVD Matrix Factorisation:

About:

This module implements "partial SVD decomposition" method for representing a sparse matrix using a low-rank approximation. Mathematically, this algorithm seeks to find matrices U and V that, for any given A, minimizes:

\[ ||\boldsymbol A - \boldsymbol UV ||_2 \]

subject to \( rank(\boldsymbol UV) \leq k \), where \( ||\cdot||_2 \) denotes the Frobenius norm and \( k \leq rank(\boldsymbol A)\). If A is \( m \times n \), then U will be \( m \times k \) and V will be \( k \times n \).

This algorithm is not intended to do the full decomposition, or to be used as part of inverse procedure. It effectively computes the SVD of a low-rank approximation of A (preferably sparse), with the singular values absorbed in U and V. Code is based on the write-up as appears at [1], with some modifications.

Input:

The input matrix is expected to be of the following form:

{TABLE|VIEW} input_table (
    col_num INTEGER,
    row_num INTEGER,
	value FLOAT    
)

Input is contained in a table where column number and row number for each cell are sequential; that is to say that if the data was written as a matrix, those values would be the actual row and column numbers and not some random identifiers. All rows and columns must be associated with a value. There should not be any missing row, columns or values.

Usage:

The SVD function is called as follows:

SELECT svdmf_run( 'input_table', 'col_name',
   'row_name', 'value', num_features);

The function returns two tables matrix_u and matrix_v, which represent the matrices U and V in table format.

Examples:

Prepare an input table/view:

CREATE TABLE svd_test (
 col INT,
 row INT,
 val FLOAT
);

Populate the input table with some data. e.g.:

sql> INSERT INTO svd_test SELECT (g.a%1000)+1, g.a/1000+1, random() FROM generate_series(1,1000) AS g(a);

Call svdmf_run() stored procedure, e.g.:

sql> select madlib.svdmf_run( 'svd_test', 'col', 'row', 'val', 3);

Sample Output:

INFO:  ('Started svdmf_run() with parameters:',)
INFO:  (' * input_matrix = madlib_svdsparse_test.test',)
INFO:  (' * col_name = col_num',)
INFO:  (' * row_name = row_num',)
INFO:  (' * value = val',)
INFO:  (' * num_features = 3',)
INFO:  ('Copying the source data into a temporary table...',)
INFO:  ('Estimating feature: 1',)
INFO:  ('...Iteration 1: residual_error = 33345014611.1, step_size = 4.9997500125e-10, min_improvement = 1.0',)
INFO:  ('...Iteration 2: residual_error = 33345014557.6, step_size = 5.49972501375e-10, min_improvement = 1.0',)
INFO:  ('...Iteration 3: residual_error = 33345014054.3, step_size = 6.04969751512e-10, min_improvement = 1.0',)
...
INFO:  ('...Iteration 78: residual_error = 2.02512133868, step_size = 5.78105354457e-10, min_improvement = 1.0',)
INFO:  ('...Iteration 79: residual_error = 0.893810181282, step_size = 6.35915889903e-10, min_improvement = 1.0',)
INFO:  ('...Iteration 80: residual_error = 0.34496773222, step_size = 6.99507478893e-10, min_improvement = 1.0',)
INFO:  ('Swapping residual error matrix...',)
                                         svdmf_run                                          
--------------------------------------------------------------------------------------------
 
 Finished SVD matrix factorisation for madlib_svdsparse_test.test (row_num, col_num, val). 
 Results: 
 total error = 0.34496773222
 number of estimated features = 1
 Output:
 table : madlib.matrix_u
 table : madlib.matrix_v
 Time elapsed: 4 minutes 47.86839 seconds.

Literature:

[1] Simon Funk, Netflix Update: Try This at Home, December 11 2006, http://sifter.org/~simon/journal/20061211.html

See also:: File svdmf.sql_in documenting the SQL functions.