User Documentation
 All Files Functions Groups
Low-rank Matrix Factorization
+ Collaboration diagram for Low-rank Matrix Factorization:
About:

This module implements "factor model" for representing an incomplete matrix using a low-rank approximation [1]. Mathematically, this model seeks to find matrices U and V (also referred as factors) that, for any given incomplete matrix A, minimizes:

\[ \|\boldsymbol A - \boldsymbol UV^{T} \|_2 \]

subject to \(rank(\boldsymbol UV^{T}) \leq r\), where \(\|\cdot\|_2\) denotes the Frobenius norm. Let \(A\) be a \(m \times n\) matrix, then \(U\) will be \(m \times r\) and \(V\) will be \(n \times r\), in dimension, and \(1 \leq r \ll \min(m, n)\). This model is not intended to do the full decomposition, or to be used as part of inverse procedure. This model has been widely used in recommendation systems (e.g., Netflix [2]) and feature selection (e.g., image processing [3]).

Input:

The input matrix is expected to be of the following form:

{TABLE|VIEW} input_table (
    row    INTEGER,
    col    INTEGER,
    value  DOUBLE PRECISION
)

Input is contained in a table that describes an incomplete matrix, by having available entries specified as (row, column, value). The input matrix is expected to be based 1, which means row >= 1, and col >= 1. NULL values are not expected.

Usage:

Please find descriptions of SQL functions in lmf.sql_in

Output factors matrix U and V are in flatten format.

RESULT AS (
        matrix_u    DOUBLE PRECISION[],
        matrix_v    DOUBLE PRECISION[],
        rmse        DOUBLE PRECISION
);

Features correspond to row i is matrix_u[i:i][1:r]. Features correspond to column j is matrix_v[j:j][1:r].

Examples:
  1. Prepare an input table/view:
    CREATE TABLE lmf_data (
    column INT,
    row INT,
    value FLOAT8
    );
  2. Populate the input table with some data. e.g.:
    INSERT INTO lmf_data VALUES (1, 1, 5.0);
    INSERT INTO lmf_data VALUES (3, 100, 1.0);
    INSERT INTO lmf_data VALUES (999, 10000, 2.0);
  3. Call lmf_igd_run() stored procedure, e.g.:
    SELECT madlib.lmf_igd_run(
    'lmf_model', -- result table
    'lmf_data', -- input table
    'row', 'col', 'value', -- table column names
    999, -- row dimension
    10000, -- column dimension
    3, -- rank (number of features)
    0.1, -- stepsize
    2, -- initial value scale factor
    10, -- maximal number of iterations
    1e-9); -- error tolerance
    Example output (the exact result may not be the same):
    NOTICE:
    Finished low-rank matrix factorization using incremental gradient
    DETAIL:
    table : lmf_data (row, col, value)
    Results:
    RMSE = 4.31144557397543e-05
    Output:
    view : SELECT * FROM lmf_model WHERE id = 1
    -------------
    1
    (1 row)
  4. Sanity check of the result. You may need a model id returned and also indicated by the function lmf_igd_run(), assuming 1 here, e.g.:
    SELECT array_dims(matrix_u), array_dims(matrix_v) FROM lmf_model WHERE id = 1;
    Example output:
    array_dims | array_dims
    --------------+----------------
    [1:999][1:3] | [1:10000][1:3]
    (1 row)
  5. Query the result value, e.g.:
    SELECT matrix_u[2:2][1:3] AS row_2_features FROM lmf_model WHERE id = 1;
    Example output (the exact result may not be the same):
    row_2_features
    ----------------------------------------------------------
    {{0.51117920037359,0.169582297094166,0.837417622096837}}
    (1 row)
Literature:

[1] N. Srebro and T. Jaakkola. “Weighted Low-Rank Approximations.” In: ICML. Ed. by T. Fawcett and N. Mishra. AAAI Press, 2003, pp. 720–727. isbn: 1-57735-189-4.

[2] Simon Funk, Netflix Update: Try This at Home, December 11 2006, http://sifter.org/~simon/journal/20061211.html

[3] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. “Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization.” In: NIPS. Ed. by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta. Curran Associates, Inc., 2009, pp. 2080–2088. isbn: 9781615679119.