User Documentation
Linear-Algebra Operations

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics.

Warning
This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. The interface and implementation are subject to change.
About:

The linalg module consists of useful utility functions for basic linear algebra operations. Several of these functions can be used while implementing new algorithms.

Refer to the file for documentation on each of the utility functions.


See Also
File linalg.sql_in documenting the SQL functions.

Let \(A\) be an \(m \times n\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows:

\[ A = U \Sigma V^T, \]

where \(U\) is an \(m \times n\) orthonormal matrix, \(\Sigma\) is an \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the singular values.
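As a quick numerical sanity check, the factorization can be verified in pure Python for a small matrix whose SVD is known in closed form (this sketch is independent of the SQL interface; the helper functions are illustrative only):

```python
# Verify A = U Sigma V^T for A = [[0, 2], [1, 0]], whose SVD is known in
# closed form: U = I, Sigma = diag(2, 1), V = [[0, 1], [1, 0]].

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*X)]

A     = [[0.0, 2.0], [1.0, 0.0]]
U     = [[1.0, 0.0], [0.0, 1.0]]
Sigma = [[2.0, 0.0], [0.0, 1.0]]
V     = [[0.0, 1.0], [1.0, 0.0]]

# U Sigma V^T reproduces A, and V^T V = I confirms orthonormal columns.
reconstructed = matmul(matmul(U, Sigma), transpose(V))
print(reconstructed)   # [[0.0, 2.0], [1.0, 0.0]]
```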

Input:
The input table can be in one of two forms:
  • Dense matrix representation: the table contains a 'row_id' column that identifies each row, and the remaining columns are assumed to hold the matrix data in one of two possible forms, illustrated for the 2x2 identity matrix below:
    1.      row_id | col1 | col2
           --------+------+------
                 1 |    1 |    0
                 2 |    0 |    1

    2.      row_id | row_vec
           --------+----------
                 1 | {1, 0}
                 2 | {0, 1}
  • Sparse matrix representation: an example representation is given below:
            row_id | col_id | value
           --------+--------+-------
                 1 |      1 |     1
                 2 |      2 |     1

    Here 'row_id' gives the row number, 'col_id' gives the column number, and 'value' is the matrix entry at ['row_id', 'col_id'].
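The two representations encode the same matrix. A small pure-Python sketch (column names mirror the documentation, not a real schema) makes the correspondence explicit:

```python
# Expand a sparse (row_id, col_id, value) triplet list into the dense form.

def sparse_to_dense(triplets, row_dim, col_dim):
    """Expand 1-indexed (row_id, col_id, value) triplets into a dense
    list-of-rows matrix; entries not listed default to 0."""
    dense = [[0] * col_dim for _ in range(row_dim)]
    for row_id, col_id, value in triplets:
        dense[row_id - 1][col_id - 1] = value
    return dense

# Sparse form of the 2x2 identity matrix from the example above:
# only the non-zero entries are stored.
triplets = [(1, 1, 1), (2, 2, 1)]
dense = sparse_to_dense(triplets, 2, 2)
print(dense)   # [[1, 0], [0, 1]]
```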

The output and summary tables

Output for the singular vectors/values will be in three tables: the left singular matrix (e.g. svd_u), the singular values (e.g. svd_s), and the right singular matrix (e.g. svd_v), named using the output_table_prefix.

The result summary table will be in the following format:

rows_used INTEGER,            -- Number of rows used for the SVD calculation
exec_time FLOAT8,             -- Total time for executing SVD
iter INTEGER,                 -- Total number of iterations run
recon_error FLOAT8,           -- Total quality score (i.e., approximation quality) for this set of orthonormal bases
relative_recon_error FLOAT8   -- Relative quality score

In the result summary table, the reconstruction error is computed as \( \sqrt{mean((X - USV^T)_{ij}^2)} \), where the mean is taken over all elements of the matrix. The relative reconstruction error is then computed as the ratio of the reconstruction error to \( \sqrt{mean(X_{ij}^2)} \).
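Both metrics can be worked out by hand for a tiny example. The following pure-Python sketch (independent of the SQL interface) computes them for a rank-1 (k = 1) approximation of \(A = diag(3, 4)\), where keeping only the largest singular value (4) gives the approximation \(diag(0, 4)\):

```python
# recon_error = sqrt(mean((X - USV^T)_ij^2)); relative error divides by
# sqrt(mean(X_ij^2)).
from math import sqrt

A      = [[3.0, 0.0], [0.0, 4.0]]
approx = [[0.0, 0.0], [0.0, 4.0]]   # truncated U S V^T, k = 1

n = sum(len(row) for row in A)      # number of matrix elements
sq_residual = sum((A[i][j] - approx[i][j]) ** 2
                  for i in range(len(A)) for j in range(len(A[0])))
sq_total    = sum(A[i][j] ** 2
                  for i in range(len(A)) for j in range(len(A[0])))

recon_error = sqrt(sq_residual / n)               # sqrt(9 / 4) = 1.5
relative_recon_error = recon_error / sqrt(sq_total / n)   # 1.5 / 2.5 = 0.6
```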

Usage:
Dense matrices:
SELECT {schema_madlib}.svd(
source_table, -- TEXT, Source table name (dense matrix)
output_table_prefix, -- TEXT, Prefix for output tables
row_id, -- TEXT, ID for each row
k, -- INTEGER, Number of singular vectors to compute
nIterations, -- INTEGER, Number of iterations to run (OPTIONAL)
result_summary_table -- TEXT, Table name to store result summary (OPTIONAL)
);
Sparse matrices:
SELECT {schema_madlib}.svd_sparse(
source_table, -- TEXT, Source table name (sparse matrix)
output_table_prefix, -- TEXT, Prefix for output tables
row_id, -- TEXT, ID for each row
col_id, -- TEXT, ID for each column
value, -- TEXT, Non-zero values of the sparse matrix
row_dim, -- INTEGER, Row dimension of sparse matrix
col_dim, -- INTEGER, Col dimension of sparse matrix
k, -- INTEGER, Number of singular vectors to compute
nIterations, -- INTEGER, Number of iterations to run (OPTIONAL)
result_summary_table -- TEXT, Table name to store result summary (OPTIONAL)
);
Block matrices:
SELECT {schema_madlib}.svd_block(
source_table, -- TEXT, Source table name (block matrix)
output_table_prefix, -- TEXT, Prefix for output tables
k, -- INTEGER, Number of singular vectors to compute
nIterations, -- INTEGER, Number of iterations to run (OPTIONAL)
result_summary_table -- TEXT, Table name to store result summary (OPTIONAL)
);
Native implementation for sparse matrix:
SELECT {schema_madlib}.svd_sparse_native(
source_table, -- TEXT, Source table name (sparse matrix)
output_table_prefix, -- TEXT, Prefix for output tables
row_id, -- TEXT, ID for each row
col_id, -- TEXT, ID for each column
value, -- TEXT, Non-zero values of the sparse matrix
row_dim, -- INTEGER, Row dimension of sparse matrix
col_dim, -- INTEGER, Col dimension of sparse matrix
k, -- INTEGER, Number of singular vectors to compute
lanczos_iter, -- INTEGER, Number of iterations to run (OPTIONAL)
result_summary_table -- TEXT, Table name to store result summary (OPTIONAL)
);
Examples:
CREATE TABLE mat (
row_id integer,
row_vec double precision[]
);
-- sample input data
COPY mat (row_id, row_vec) FROM stdin;
1 {691,58,899,163,159,533,604,582,269,390}
0 {396,840,353,446,318,886,15,584,159,383}
3 {462,532,787,265,982,306,600,608,212,885}
2 {293,742,298,75,404,857,941,662,846,2}
5 {327,946,368,943,7,516,272,24,591,204}
4 {304,151,337,387,643,753,603,531,459,652}
7 {458,959,774,376,228,354,300,669,718,565}
6 {877,59,260,302,891,498,710,286,864,675}
9 {882,761,398,688,761,405,125,484,222,873}
8 {824,390,818,844,180,943,424,520,65,913}
11 {492,220,576,289,321,261,173,1,44,241}
10 {528,1,860,18,814,242,314,965,935,809}
13 {350,192,211,633,53,783,30,444,176,932}
12 {415,701,221,503,67,393,479,218,219,916}
15 {739,651,678,577,273,935,661,47,373,618}
14 {909,472,871,695,930,455,398,893,693,838}
\.
DROP TABLE if exists svd_u;
DROP TABLE if exists svd_v;
DROP TABLE if exists svd_s;
-- SVD for dense matrices
SELECT {schema_madlib}.svd('mat', 'svd', 'row_id', 10);
----------------------------------------------------------------
DROP TABLE if exists mat_sparse;
SELECT {schema_madlib}.matrix_sparsify('mat', 'mat_sparse', False);
DROP TABLE if exists svd_u;
DROP TABLE if exists svd_v;
DROP TABLE if exists svd_s;
-- SVD for sparse matrices
SELECT {schema_madlib}.svd_sparse('mat_sparse', 'svd', 'row_id', 'col_id', 'value', 10);
Technical Background
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics.

Let \(A\) be an \(m \times n\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows:

\[ A = U \Sigma V^T, \]

where \(U\) is an \(m \times n\) orthonormal matrix, \(\Sigma\) is an \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the singular values.

It is possible to formulate the problem of computing the singular triplets \((\sigma_i, u_i, v_i)\) of \(A\) as an eigenvalue problem involving a Hermitian matrix related to \(A\). There are two possible ways of achieving this:

  1. With the cross product matrices \(A^TA\) and \(AA^T\)
  2. With the cyclic matrix

    \[ H(A) = \begin{bmatrix} 0 & A\\ A^* & 0 \end{bmatrix} \]

The singular values are the nonnegative square roots of the eigenvalues of the cross product matrix. This approach may imply a severe loss of accuracy in the smallest singular values. The cyclic matrix approach is an alternative that avoids this problem, but at the expense of significantly increasing the cost of the computation.
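This relationship is easy to check numerically for a tiny matrix (a pure-Python sketch; the closed-form 2x2 symmetric eigenvalue formula below is standard linear algebra, not part of MADlib):

```python
# For A = [[1, 1], [0, 1]], the cross product matrix is
# A^T A = [[1, 1], [1, 2]]; its eigenvalues are (3 +/- sqrt(5)) / 2, and
# their nonnegative square roots are the singular values of A.
from math import sqrt

def symmetric_2x2_eigenvalues(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] from the characteristic polynomial
    lambda^2 - (a + c) lambda + (a c - b^2) = 0."""
    tr, det = a + c, a * c - b * b
    disc = sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

lam1, lam2 = symmetric_2x2_eigenvalues(1.0, 1.0, 2.0)
sigma1, sigma2 = sqrt(lam1), sqrt(lam2)   # ~1.6180 and ~0.6180
# Their product equals |det(A)| = 1, as expected for a 2x2 matrix.
```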

Computing the cross product matrix explicitly is not recommended, especially in the case of sparse \(A\). Bidiagonalization was proposed by Golub and Kahan [citation?] as a way of tridiagonalizing the cross product matrix without forming it explicitly.

Consider the following decomposition

\[ A = P B Q^T, \]

where \(P\) and \(Q\) are unitary matrices and \(B\) is an \(m \times n\) upper bidiagonal matrix. Then the tridiagonal matrix \(B^*B\) is unitarily similar to \(A^*A\). Additionally, specific methods exist that compute the singular values of \(B\) without forming \(B^*B\). Therefore, after computing the SVD of \(B\),

\[ B = X\Sigma Y^T, \]

it only remains to compute the SVD of the original matrix with \(U = PX\) and \(V = QY\).
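Chaining the two factorizations makes this last step explicit:

\[ A = P B Q^T = P (X \Sigma Y^T) Q^T = (P X)\, \Sigma\, (Q Y)^T = U \Sigma V^T. \]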