2.1.0
User Documentation for Apache MADlib
Dense Linear Systems

The linear systems module implements solution methods for systems of consistent linear equations. Systems of linear equations take the form:

\[ Ax = b \]

where \(x \in \mathbb{R}^{n}\), \(A \in \mathbb{R}^{m \times n} \) and \(b \in \mathbb{R}^{m}\). We assume that there are no rows of \(A\) where all elements are zero. The algorithms implemented in this module can handle large dense linear systems. Currently, the algorithms implemented in this module solve the linear system by a direct decomposition. Hence, these methods are known as direct method.

Solution Function
linear_solver_dense( tbl_source,
                     tbl_result,
                     row_id,
                     LHS,
                     RHS,
                     grouping_col,
                     optimizer,
                     optimizer_params
                   )
Arguments
tbl_source

TEXT. The name of the table containing the training data. The input data is expected to be of the following form:

{TABLE|VIEW} sourceName (
    ...
    row_id          FLOAT8,
    left_hand_side  FLOAT8[],
    right_hand_side FLOAT8,
    ...
)

Each row represents a single equation. The right_hand_side column refers to the right hand side of the equations while the left_hand_side column refers to the multipliers on the variables on the left hand side of the same equations.

tbl_result

TEXT. The name of the table where the output is saved. The output is stored in the table named by the tbl_result argument. It contains the following columns:

solution FLOAT8[]. The solution variables in the same order as that provided as input in the 'left_hand_side' column name of the source_table
residual_norm FLOAT8. The scaled residual norm, defined as \( \frac{|Ax - b|}{|b|} \). This value is an indication of the accuracy of the solution.
iters INTEGER. Number of iterations required by the algorithm (only applicable for iterative algorithms). The output is NULL for 'direct' methods.

row_id

TEXT. The name of the column storing the 'row id' of the equations.

For a system with N equations, the row_id's must be a continuous range of integers from \( 0 \ldots n-1 \).

LHS

TEXT. The name of the column storing the 'left hand side' of the equations, stored as an array.

RHS

TEXT. The name of the column storing the 'right hand side' of the equations.

grouping_cols (optional)
TEXT, default: NULL. Group by column names. Not currently implemented. Any non-NULL value is ignored.
optimizer (optional)

TEXT, default: 'direct'. The type of optimizer.

optimizer_params (optional)
TEXT, default: NULL. Optimizer specific parameters.

Optimizer Parameters

For each optimizer, there are specific parameters that can be tuned for better performance.

algorithm (default: householderqr)

There are several algorithms that can be classified as 'direct' methods of solving linear systems. MADlib dense linear system solvers provide various algorithmic options for users.

The following table provides a guideline on the choice of algorithm based on conditions on the A matrix, speed of the algorithms and numerical stability.

 Algorithm            | Conditions on A  | Speed | Accuracy
 ----------------------------------------------------------
 householderqr        | None             |  ++   |  +
 partialpivlu         | Invertable       |  ++   |  +
 fullpivlu            | None             |  -    |  +++
 colpivhouseholderqr  | None             |  +    |  ++
 fullpivhouseholderqr | None             |  -    |  +++
 llt                  | Pos. Definite    |  +++  |  +
 ldlt                 | Pos. or Neg Def  |  +++  |  ++

For speed '++' is faster than '+', which is faster than '-'. For accuracy '+++' is better than '++'.

More details about the individual algorithms can be found in the Eigen documentation. Eigen is an open source library for linear algebra.

Examples
  1. View online help for the linear systems solver function.
    SELECT madlib.linear_solver_dense();
    
  2. Create the sample data set.
    CREATE TABLE linear_systems_test_data( id INTEGER NOT NULL,
                                           lhs DOUBLE PRECISION[],
                                           rhs DOUBLE PRECISION
                                         );
    INSERT INTO linear_systems_test_data(id, lhs, rhs)
           VALUES
            (0, ARRAY[1,0,0], 20),
            (1, ARRAY[0,1,0], 15),
            (2, ARRAY[0,0,1], 20);
    
  3. Solve the linear systems with default parameters.
    SELECT madlib.linear_solver_dense( 'linear_systems_test_data',
                                       'output_table',
                                       'id',
                                       'lhs',
                                       'rhs'
                                     );
    
  4. Obtain the output from the output table.
    \x on
    SELECT * FROM output_table;
    
    Result:
    --------------------+-------------------------------------
    solution            | {20,15,20}
    residual_norm       | 0
    iters               | NULL
    
  5. Choose an algorithm different than the default.
    DROP TABLE IF EXISTS result_table;
    SELECT madlib.linear_solver_dense( 'linear_systems_test_data',
                                       'result_table',
                                       'id',
                                       'lhs',
                                       'rhs',
                                       NULL,
                                       'direct',
                                       'algorithm=llt'
                                     );
    

Related Topics
File dense_linear_systems.sql_in documenting the SQL functions