MADlib
0.7 A newer version is available
User Documentation
|
This module implements "partial SVD decomposition" method for representing a sparse matrix using a low-rank approximation. Mathematically, this algorithm seeks to find matrices U and V that, for any given A, minimizes:
\[ ||\boldsymbol A - \boldsymbol UV ||_2 \]
subject to \( rank(\boldsymbol UV) \leq k \), where \( ||\cdot||_2 \) denotes the Frobenius norm and \( k \leq rank(\boldsymbol A)\). If A is \( m \times n \), then U will be \( m \times k \) and V will be \( k \times n \).
This algorithm is not intended to do the full decomposition, or to be used as part of inverse procedure. It effectively computes the SVD of a low-rank approximation of A (preferably sparse), with the singular values absorbed in U and V. Code is based on the write-up as appears at [1], with some modifications.
{TABLE|VIEW} input_table ( col_num INTEGER, row_num INTEGER, value FLOAT )
Input is contained in a table where column number and row number for each cell are sequential; that is to say that if the data was written as a matrix, those values would be the actual row and column numbers and not some random identifiers. All rows and columns must be associated with a value. There should not be any missing row, columns or values.
SELECT svdmf_run( 'input_table', 'col_name', 'row_name', 'value', num_features);The function returns two tables
matrix_u
and matrix_v
, which represent the matrices U and V in table format.CREATE TABLE svd_test ( col INT, row INT, val FLOAT );
sql> INSERT INTO svd_test SELECT (g.a%1000)+1, g.a/1000+1, random() FROM generate_series(1,1000) AS g(a);
sql> select madlib.svdmf_run( 'svd_test', 'col', 'row', 'val', 3);
INFO: ('Started svdmf_run() with parameters:',) INFO: (' * input_matrix = madlib_svdsparse_test.test',) INFO: (' * col_name = col_num',) INFO: (' * row_name = row_num',) INFO: (' * value = val',) INFO: (' * num_features = 3',) INFO: ('Copying the source data into a temporary table...',) INFO: ('Estimating feature: 1',) INFO: ('...Iteration 1: residual_error = 33345014611.1, step_size = 4.9997500125e-10, min_improvement = 1.0',) INFO: ('...Iteration 2: residual_error = 33345014557.6, step_size = 5.49972501375e-10, min_improvement = 1.0',) INFO: ('...Iteration 3: residual_error = 33345014054.3, step_size = 6.04969751512e-10, min_improvement = 1.0',) ... INFO: ('...Iteration 78: residual_error = 2.02512133868, step_size = 5.78105354457e-10, min_improvement = 1.0',) INFO: ('...Iteration 79: residual_error = 0.893810181282, step_size = 6.35915889903e-10, min_improvement = 1.0',) INFO: ('...Iteration 80: residual_error = 0.34496773222, step_size = 6.99507478893e-10, min_improvement = 1.0',) INFO: ('Swapping residual error matrix...',) svdmf_run -------------------------------------------------------------------------------------------- Finished SVD matrix factorisation for madlib_svdsparse_test.test (row_num, col_num, val). Results: total error = 0.34496773222 number of estimated features = 1 Output: table : madlib.matrix_u table : madlib.matrix_v Time elapsed: 4 minutes 47.86839 seconds.
[1] Simon Funk, Netflix Update: Try This at Home, December 11 2006, http://sifter.org/~simon/journal/20061211.html