2.1.0
User Documentation for Apache MADlib
Sparse Linear Systems

The sparse linear systems module implements solution methods for systems of consistent linear equations. Systems of linear equations take the form:

\[ Ax = b \]

where \(x \in \mathbb{R}^{n}\), \(A \in \mathbb{R}^{m \times n} \) and \(b \in \mathbb{R}^{m}\). This module accepts sparse matrix input formats for \(A\) and \(b\). We assume that there are no rows of \(A\) where all elements are zero.

Note
Algorithms with fail if there is an row of the input matrix containing all zeros.

The algorithms implemented in this module can handle large sparse square linear systems. Currently, the algorithms implemented in this module solve the linear system using direct or iterative methods.

Sparse Linear Systems Solution Function
linear_solver_sparse( tbl_source_lhs,
                      tbl_source_rhs,
                      tbl_result,
                      lhs_row_id,
                      lhs_col_id,
                      lhs_value,
                      rhs_row_id,
                      rhs_value,
                      grouping_cols := NULL,
                      optimizer := 'direct',
                      optimizer_params :=
                      'algorithm = llt'
                    )

Arguments

tbl_source_lhs

The name of the table containing the left hand side matrix. For the LHS matrix, the input data is expected to be of the following form:

{TABLE|VIEW} sourceName (
    ...
    row_id FLOAT8,
    col_id FLOAT8,
    value FLOAT8,
    ...
)

Each row represents a single equation. The rhs columns refer to the right hand side of the equations and the lhs columns refer to the multipliers on the variables on the left hand side of the same equations.

tbl_source_rhs

TEXT. The name of the table containing the right hand side vector. For the RHS matrix, the input data is expected to be of the following form:

{TABLE|VIEW} <em>sourceName</em> (
    ...
    <em>row_id</em> FLOAT8,
    <em>value</em> FLOAT8
    ...
)

Each row represents a single equation. The rhs columns refer to the right hand side of the equations while the lhs columns refers to the multipliers on the variables on the left hand side of the same equations.

tbl_result

TEXT. The name of the table where the output is saved. Output is stored in the tabled named by the tbl_result argument. The table contains the following columns. The output contains the following columns:

solution FLOAT8[]. The solution is an array with the variables in the same order as that provided as input in the 'left_hand_side' column name of the 'source_table'
residual_norm FLOAT8. Scaled residual norm, defined as \( \frac{|Ax - b|}{|b|} \). This value is an indication of the accuracy of the solution.
iters INTEGER. Number of iterations required by the algorithm (only applicable for iterative algorithms) . The output is NULL for 'direct' methods.

lhs_row_id
TEXT. The name of the column storing the 'row id' of the equations.
Note
For a system with N equations, the row_id's must be a continuous range of integers from \( 0 \ldots n-1 \).
lhs_col_id

TEXT. The name of the column (in tbl_source_lhs) storing the 'col id' of the equations.

lhs_value

TEXT. The name of the column (in tbl_source_lhs) storing the 'value' of the equations.

rhs_row_id

TEXT. The name of the column (in tbl_source_rhs) storing the 'col id' of the equations.

rhs_value

TEXT. The name of the column (in tbl_source_rhs) storing the 'value' of the equations.

num_vars

INTEGER. The number of variables in the linear system equations.

grouping_col (optional)
TEXT, default: NULL. Group by column names.
Note
The grouping feature is currently not implemented and this parameter is only a placeholder.
optimizer (optional)

TEXT, default: 'direct'. Type of optimizer.

optimizer_params (optional)
TEXT, default: NULL. Optimizer specific parameters.

Optimizer Parameters

For each optimizer, there are specific parameters that can be tuned for better performance.

algorithm (default: ldlt)

There are several algorithms that can be classified as 'direct' methods of solving linear systems. Madlib functions provide various algorithmic options available for users.

The following table provides a guideline on the choice of algorithm based on conditions on the A matrix, speed of the algorithms and numerical stability.

    Algorithm          | Conditions on A  | Speed | Memory
    ----------------------------------------------------------
    llt                | Sym. Pos Def     |  ++   |  ---
    ldlt               | Sym. Pos Def     |  ++   |  ---

    For speed '++' is faster than '+', which is faster than '-'.
    For accuracy '+++' is better than '++'.
    For memory, '-' uses less memory than '--'.

    Note: ldlt is often preferred over llt

There are several algorithms that can be classified as 'iterative' methods of solving linear systems. Madlib functions provide various algorithmic options available for users.

The following table provides a guideline on the choice of algorithm based on conditions on the A matrix, speed of the algorithms and numerical stability.

    Algorithm            | Conditions on A  | Speed | Memory | Convergence
    ----------------------------------------------------------------------
    cg-mem               | Sym. Pos Def     |  +++  |   -    |    ++
    bicgstab-mem         | Square           |  ++   |   -    |    +
    precond-cg-mem       | Sym. Pos Def     |  ++   |   -    |    +++
    precond-bicgstab-mem | Square           |  +    |   -    |    ++

    For memory, '-' uses less memory than '--'.
    For speed, '++' is faster than '+'.

Algorithm Details:

cg-memIn memory conjugate gradient with diagonal preconditioners.
bicgstab-memBi-conjugate gradient (equivalent to performing CG on the least squares formulation of Ax=b) with incomplete LU preconditioners.
precond-cg-memIn memory conjugate gradient with diagonal preconditioners.
bicgstab-memBi-conjugate gradient (equivalent to performing CG on the least squares formulation of Ax=b) with incomplete LU preconditioners.

toler (default: 1e-5)

Termination tolerance (applicable only for iterative methods) which determines the stopping criterion (with respect to residual norm) for iterative methods.

Examples
  1. View online help for the sparse linear systems solver function.
    SELECT madlib.linear_solver_sparse();
    
  2. Create the sample data set.
    DROP TABLE IF EXISTS sparse_linear_systems_lhs;
    CREATE TABLE sparse_linear_systems_lhs (
        rid INTEGER NOT NULL,
        cid  INTEGER,
        val DOUBLE PRECISION
    );
    DROP TABLE IF EXISTS sparse_linear_systems_rhs;
    CREATE TABLE sparse_linear_systems_rhs (
        rid INTEGER NOT NULL,
        val DOUBLE PRECISION
    );
    INSERT INTO sparse_linear_systems_lhs(rid, cid, val) VALUES
    (0, 0, 1),
    (1, 1, 1),
    (2, 2, 1),
    (3, 3, 1);
    INSERT INTO sparse_linear_systems_rhs(rid, val) VALUES
    (0, 10),
    (1, 20),
    (2, 30);
    
  3. Solve the linear systems with default parameters.
    SELECT madlib.linear_solver_sparse( 'sparse_linear_systems_lhs',
                                        'sparse_linear_systems_rhs',
                                        'output_table',
                                        'rid',
                                        'cid',
                                        'val',
                                        'rid',
                                        'val',
                                        4
                                      );
    
  4. View the contents of the output table.
    \x on
    SELECT * FROM output_table;
    
    Result:
    --------------------+-------------------------------------
    solution            | {10,20,30,0}
    residual_norm       | 0
    iters               | NULL
    
  5. Choose a different algorithm than the default algorithm.
    DROP TABLE IF EXISTS output_table;
    SELECT madlib.linear_solver_sparse( 'sparse_linear_systems_lhs',
                                        'sparse_linear_systems_rhs',
                                        'output_table',
                                        'rid',
                                        'cid',
                                        'val',
                                        'rid',
                                        'val',
                                        4,
                                        NULL,
                                        'direct',
                                        'algorithm=llt'
                                      );
    
  6. Choose a different algorithm than the default algorithm.
    DROP TABLE IF EXISTS output_table;
    SELECT madlib.linear_solver_sparse(
                                        'sparse_linear_systems_lhs',
                                        'sparse_linear_systems_rhs',
                                        'output_table',
                                        'rid',
                                        'cid',
                                        'val',
                                        'rid',
                                        'val',
                                        4,
                                        NULL,
                                        'iterative',
                                        'algorithm=cg-mem, toler=1e-5'
                                      );
    

Related Topics
File sparse_linear_sytems.sql_in documenting the SQL functions.