1.20.0
User Documentation for Apache MADlib

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics.

Let \(A\) be a \(mxn\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows:

\[ A = U \Sigma V^T, \]

where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the singular values.

SVD Functions

SVD factorizations are provided for dense and sparse matrices. In addition, a native implementation is provided for very sparse matrices for improved performance.

SVD Function for Dense Matrices

svd( source_table,
     output_table_prefix,
     row_id,
     k,
     n_iterations,
     result_summary_table
);

Arguments

source_table

TEXT. Source table name (dense matrix).

The table contains a row_id column that identifies each row, with numbering starting from 1. The other columns contain the data for the matrix. There are two possible dense formats as illustrated by the 2x2 matrix example below. You can use either of these dense formats:

  1.             row_id     col1     col2
    row1         1           1         0
    row2         2           0         1
        
  2.         row_id     row_vec
    row1        1       {1, 0}
    row2        2       {0, 1}
        
output_table_prefix
TEXT. Prefix for output tables. See Output Tables below for a description of the convention used.
row_id
TEXT. ID for each row.
k
INTEGER. Number of singular values to compute.
n_iterations (optional).
INTEGER. Number of iterations to run.
Note
The number of iterations must be in the range [k, column dimension], where k is number of singular values.
result_summary_table (optional)
TEXT. The name of the table to store the result summary.

SVD Function for Sparse Matrices

Use this function for matrices that are represented in the sparse-matrix format (example below). Note that the input matrix is converted to a dense matrix before the SVD operation, for efficient computation reasons.

svd_sparse( source_table,
            output_table_prefix,
            row_id,
            col_id,
            value,
            row_dim,
            col_dim,
            k,
            n_iterations,
            result_summary_table
          );

Arguments

source_table

TEXT. Source table name (sparse matrix).

A sparse matrix is represented using the row and column indices for each non-zero entry of the matrix. This representation is useful for matrices containing multiple zero elements. Below is an example of a sparse 4x7 matrix with just 6 out of 28 entries being non-zero. The dimensionality of the matrix is inferred using the max value in row and col columns. Note the last entry is included (even though it is 0) to provide the dimensionality of the matrix, indicating that the 4th row and 7th column contain all zeros.

 row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0
(6 rows)

output_table_prefix
TEXT. Prefix for output tables. See Output Tables below for a description of the convention used.
row_id
TEXT. Name of the column containing the row index for each entry in sparse matrix.
col_id
TEXT. Name of the column containing the column index for each entry in sparse matrix.
value
TEXT. Name of column containing the non-zero values of the sparse matrix.
row_dim
INTEGER. Number of rows in matrix.
col_dim
INTEGER. Number of columns in matrix.
k
INTEGER. Number of singular values to compute.
n_iterations (optional)
INTEGER. Number of iterations to run.
Note
The number of iterations must be in the range [k, column dimension], where k is number of singular values.
result_summary_table (optional)
TEXT. The name of the table to store the result summary.

Native Implementation for Sparse Matrices

Use this function for matrices that are represented in the sparse-matrix format (see sparse matrix example above). This function uses the native sparse representation while computing the SVD.

Note
Note that this function should be favored if the matrix is highly sparse, since it computes very sparse matrices efficiently.
svd_sparse_native( source_table,
                   output_table_prefix,
                   row_id,
                   col_id,
                   value,
                   row_dim,
                   col_dim,
                   k,
                   n_iterations,
                   result_summary_table
                 );

Arguments

source_table
TEXT. Source table name (sparse matrix - see example above).
output_table_prefix
TEXT. Prefix for output tables. See Output Tables below for a description of the convention used.
row_id
TEXT. ID for each row.
col_id
TEXT. ID for each column.
value
TEXT. Non-zero values of the sparse matrix.
row_dim
INTEGER. Row dimension of sparse matrix.
col_dim
INTEGER. Col dimension of sparse matrix.
k
INTEGER. Number of singular values to compute.
n_iterations (optional)
INTEGER. Number of iterations to run.
Note
The number of iterations must be in the range [k, column dimension], where k is number of singular values.
result_summary_table (optional)
TEXT. Table name to store result summary.

Output Tables

Output for eigenvectors/values is in the following three tables:

The left and right singular vector tables are of the format:

row_id INTEGER. The ID corresponding to each eigenvalue (in decreasing order).
row_vec FLOAT8[]. Singular vector elements for this row_id. Each array is of size k.

The singular values table is in sparse table format, since only the diagonal elements of the matrix are non-zero:

row_id INTEGER. i for ith eigenvalue.
col_id INTEGER. i for ith eigenvalue (same as row_id).
value FLOAT8. Eigenvalue.

All row_id and col_id in the above tables start from 1.

The result summary table has the following columns:

rows_used INTEGER. Number of rows used for SVD calculation.
exec_time FLOAT8. Total time for executing SVD.
iter INTEGER. Total number of iterations run.
recon_error FLOAT8. Total quality score (i.e. approximation quality) for this set of orthonormal basis.
relative_recon_error FLOAT8. Relative quality score.

In the result summary table, the reconstruction error is computed as \( \sqrt{mean((X - USV^T)_{ij}^2)} \), where the average is over all elements of the matrices. The relative reconstruction error is then computed as ratio of the reconstruction error and \( \sqrt{mean(X_{ij}^2)} \).

Examples
  1. View online help for the SVD function.
    SELECT madlib.svd();
    
  2. Create an input dataset (dense matrix).
    DROP TABLE IF EXISTS mat, mat_sparse, svd_summary_table, svd_u, svd_v, svd_s;
    CREATE TABLE mat (
        row_id integer,
        row_vec double precision[]
    );
    INSERT INTO mat VALUES
    (1,'{396,840,353,446,318,886,15,584,159,383}'),
    (2,'{691,58,899,163,159,533,604,582,269,390}'),
    (3,'{293,742,298,75,404,857,941,662,846,2}'),
    (4,'{462,532,787,265,982,306,600,608,212,885}'),
    (5,'{304,151,337,387,643,753,603,531,459,652}'),
    (6,'{327,946,368,943,7,516,272,24,591,204}'),
    (7,'{877,59,260,302,891,498,710,286,864,675}'),
    (8,'{458,959,774,376,228,354,300,669,718,565}'),
    (9,'{824,390,818,844,180,943,424,520,65,913}'),
    (10,'{882,761,398,688,761,405,125,484,222,873}'),
    (11,'{528,1,860,18,814,242,314,965,935,809}'),
    (12,'{492,220,576,289,321,261,173,1,44,241}'),
    (13,'{415,701,221,503,67,393,479,218,219,916}'),
    (14,'{350,192,211,633,53,783,30,444,176,932}'),
    (15,'{909,472,871,695,930,455,398,893,693,838}'),
    (16,'{739,651,678,577,273,935,661,47,373,618}');
    
  3. Run SVD function for a dense matrix.
    SELECT madlib.svd( 'mat',       -- Input table
                       'svd',       -- Output table prefix
                       'row_id',    -- Column name with row index
                       10,          -- Number of singular values to compute
                       NULL,        -- Use default number of iterations
                       'svd_summary_table'  -- Result summary table
                     );
    
  4. Print out the singular values and the summary table. For the singular values:
    SELECT * FROM svd_s ORDER BY row_id;
    
    Result:
     row_id | col_id |      value
     --------+--------+------------------
          1 |      1 | 6475.67225281804
          2 |      2 | 1875.18065580415
          3 |      3 | 1483.25228429636
          4 |      4 | 1159.72262897427
          5 |      5 | 1033.86092570574
          6 |      6 | 948.437358703966
          7 |      7 | 795.379572772455
          8 |      8 | 709.086240684469
          9 |      9 | 462.473775959371
         10 |     10 | 365.875217945698
         10 |     10 |
    (11 rows)
    
    For the summary table:
    SELECT * FROM svd_summary_table;
    
    Result:
     rows_used | exec_time (ms) | iter |    recon_error    | relative_recon_error
     -----------+----------------+------+-------------------+----------------------
            16 |        1332.47 |   10 | 4.36920148766e-13 |    7.63134130332e-16
    (1 row)
    
  5. Create a sparse matrix by running the matrix_sparsify() utility function on the dense matrix.
    SELECT madlib.matrix_sparsify('mat',
                                  'row=row_id, val=row_vec',
                                  'mat_sparse',
                                  'row=row_id, col=col_id, val=value');
    
  6. Run the SVD function for a sparse matrix.
    SELECT madlib.svd_sparse( 'mat_sparse',   -- Input table
                              'svd',          -- Output table prefix
                              'row_id',       -- Column name with row index
                              'col_id',       -- Column name with column index
                              'value',        -- Matrix cell value
                              16,             -- Number of rows in matrix
                              10,             -- Number of columns in matrix
                              10              -- Number of singular values to compute
                              );
    
  7. Run the SVD function for a very sparse matrix.
    SELECT madlib.svd_sparse_native ( 'mat_sparse',   -- Input table
                              'svd',          -- Output table prefix
                              'row_id',       -- Column name with row index
                              'col_id',       -- Column name with column index
                              'value',        -- Matrix cell value
                              16,             -- Number of rows in matrix
                              10,             -- Number of columns in matrix
                              10              -- Number of singular values to compute
                              );
    
    Technical Background
    In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics. Let \(A\) be a \(m \times n\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows:

    \[ A = U \Sigma V^T, \]

    where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the singular values. It is possible to formulate the problem of computing the singular triplets ( \(\sigma_i, u_i, v_i\)) of \(A\) as an eigenvalue problem involving a Hermitian matrix related to \(A\). There are two possible ways of achieving this: