MADlib
1.0 A newer version is available
User Documentation
|
This module implements "factor model" for representing an incomplete matrix using a low-rank approximation [1]. Mathematically, this model seeks to find matrices U and V (also referred as factors) that, for any given incomplete matrix A, minimizes:
\[ \|\boldsymbol A - \boldsymbol UV^{T} \|_2 \]
subject to \(rank(\boldsymbol UV^{T}) \leq r\), where \(\|\cdot\|_2\) denotes the Frobenius norm. Let \(A\) be a \(m \times n\) matrix, then \(U\) will be \(m \times r\) and \(V\) will be \(n \times r\), in dimension, and \(1 \leq r \ll \min(m, n)\). This model is not intended to do the full decomposition, or to be used as part of inverse procedure. This model has been widely used in recommendation systems (e.g., Netflix [2]) and feature selection (e.g., image processing [3]).
The input matrix is expected to be of the following form:
{TABLE|VIEW} input_table ( row INTEGER, col INTEGER, value DOUBLE PRECISION )
Input is contained in a table that describes an incomplete matrix, by having available entries specified as (row, column, value). The input matrix is expected to be based 1, which means row >= 1, and col >= 1. NULL values are not expected.
Please find descriptions of SQL functions in lmf.sql_in
Output factors matrix U and V are in flatten format.
RESULT AS ( matrix_u DOUBLE PRECISION[], matrix_v DOUBLE PRECISION[], rmse DOUBLE PRECISION );
Features correspond to row i is matrix_u[i:i][1:r]
. Features correspond to column j is matrix_v[j:j][1:r]
.
[1] N. Srebro and T. Jaakkola. “Weighted Low-Rank Approximations.” In: ICML. Ed. by T. Fawcett and N. Mishra. AAAI Press, 2003, pp. 720–727. isbn: 1-57735-189-4.
[2] Simon Funk, Netflix Update: Try This at Home, December 11 2006, http://sifter.org/~simon/journal/20061211.html
[3] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. “Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization.” In: NIPS. Ed. by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta. Curran Associates, Inc., 2009, pp. 2080–2088. isbn: 9781615679119.