1.17.0 User Documentation for Apache MADlib

## Detailed Description

These modules provide basic mathematical operations on arrays and matrices.

For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory. We provide two forms of distributed representation of a matrix:

• Dense: The matrix is represented as a distributed collection of 1-D arrays, one per row. For example, a 3x10 matrix would be stored as the following table:
 row_id |         row_vec
--------+-------------------------
      1 | {9,6,5,8,5,6,6,3,10,8}
      2 | {8,2,2,6,6,10,2,1,9,9}
      3 | {3,9,9,9,8,6,3,9,5,6}

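• Sparse: The matrix is represented by the row index, column index, and value of each non-zero entry. For example:
 row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0
(7 rows)

The final entry stores an explicit zero at (4, 7). Row 4 and column 7 contain no non-zero values, so this entry is what records that the matrix is 4x7.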
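Both representations carry the same information. The following Python sketch converts between them using the sparse example above. It assumes 1-based indices and that a sparse matrix's dimensions are taken from its largest row_id and col_id; `sparse_to_dense` and `dense_to_sparse` are hypothetical illustration helpers, not MADlib functions.

```python
# Illustrative only: models a table as Python data, not as a distributed SQL table.
# Assumption: matrix dimensions = (max row_id, max col_id), indices start at 1.

def sparse_to_dense(triples):
    """Convert (row_id, col_id, value) triples into {row_id: row_vec}."""
    n_rows = max(r for r, _, _ in triples)
    n_cols = max(c for _, c, _ in triples)
    dense = {r: [0] * n_cols for r in range(1, n_rows + 1)}
    for r, c, v in triples:
        dense[r][c - 1] = v  # shift 1-based col_id to a 0-based list index
    return dense


def dense_to_sparse(dense):
    """Convert {row_id: row_vec} into triples for the non-zero entries."""
    triples = [
        (r, c, v)
        for r, vec in sorted(dense.items())
        for c, v in enumerate(vec, start=1)
        if v != 0
    ]
    # Keep an explicit zero in the bottom-right corner so the dimensions survive.
    n_rows = max(dense)
    n_cols = len(dense[n_rows])
    if dense[n_rows][n_cols - 1] == 0:
        triples.append((n_rows, n_cols, 0))
    return triples


sparse_example = [
    (1, 1, 9), (1, 5, 6), (1, 6, 6),
    (2, 1, 8),
    (3, 1, 3), (3, 2, 9),
    (4, 7, 0),
]

dense_example = sparse_to_dense(sparse_example)
for row_id, row_vec in dense_example.items():
    print(row_id, row_vec)
# 1 [9, 0, 0, 0, 6, 6, 0]
# 2 [8, 0, 0, 0, 0, 0, 0]
# 3 [3, 9, 0, 0, 0, 0, 0]
# 4 [0, 0, 0, 0, 0, 0, 0]
```

Without the explicit (4, 7, 0) entry, the same triples would be read back as a 3x6 matrix, which is why the round trip keeps a zero in the corner.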

All matrix operations work with either form of representation.

In many cases, a matrix function can be decomposed into vector operations applied independently to each row of a matrix (or to corresponding rows of two matrices). These internal vector operations are also exposed directly as Array Operations for greater flexibility. Matrix operations such as matrix_add use the corresponding vector operation (array_add) and add validation and formatting on top of it. More complex functions such as matrix_mult combine several such vector operations with other SQL operations.
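The row-wise decomposition can be sketched in Python, with a dense matrix modeled as a `{row_id: row_vec}` dict. The functions below are hypothetical stand-ins named after the MADlib SQL functions; they are not MADlib's implementation. `matrix_mult` shows one possible formulation in terms of vector operations, not necessarily the one MADlib uses.

```python
# Illustrative stand-ins named after MADlib's SQL functions (not the real code).

def array_add(a, b):
    """Element-wise sum of two equal-length 1-D arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [x + y for x, y in zip(a, b)]


def array_scalar_mult(a, s):
    """Multiply every element of a 1-D array by a scalar."""
    return [x * s for x in a]


def matrix_add(m1, m2):
    """Validate once for the whole matrix, then call array_add per pair of rows."""
    if m1.keys() != m2.keys():
        raise ValueError("matrices must have the same row ids")
    return {r: array_add(m1[r], m2[r]) for r in m1}


def matrix_mult(m1, m2):
    """Row i of the product is the sum over k of m1[i][k] * (row k of m2).

    Unlike matrix_add, each output row needs every row of m2, which in a
    distributed table calls for a join-like step beyond per-row work.
    Assumes m2 has row ids 1..n, where n is the row length of m1.
    """
    n_cols = len(next(iter(m2.values())))
    result = {}
    for i, row in m1.items():
        acc = [0] * n_cols
        for k, coeff in enumerate(row, start=1):
            acc = array_add(acc, array_scalar_mult(m2[k], coeff))
        result[i] = acc
    return result


a = {1: [1, 2], 2: [3, 4]}
b = {1: [5, 6], 2: [7, 8]}
print(matrix_add(a, b))   # {1: [6, 8], 2: [10, 12]}
print(matrix_mult(a, b))  # {1: [19, 22], 2: [43, 50]}
```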

Note that these array functions are only available for the dense representation of a matrix. In general, a single array function invocation operates on just one array (1-dimensional or 2-dimensional) that fits in memory. When such a function is executed on a table of arrays, it is called multiple times, once for each array (or pair of arrays). In contrast, a single matrix function invocation operates on the complete matrix stored as a distributed table.
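The difference in invocation scope can be illustrated with a small Python model, using the dense example matrix from above. The list of `(row_id, row_vec)` tuples stands in for a distributed table, and `apply_per_row`, `array_scalar_mult`, and `matrix_trans` are hypothetical illustration helpers: the per-row function runs once for each row and never sees its neighbors, while a transpose needs every row to build any single output row.

```python
# Hypothetical model: a list of (row_id, row_vec) tuples stands in for a table.
dense_table = [
    (1, [9, 6, 5, 8, 5, 6, 6, 3, 10, 8]),
    (2, [8, 2, 2, 6, 6, 10, 2, 1, 9, 9]),
    (3, [3, 9, 9, 9, 8, 6, 3, 9, 5, 6]),
]

calls = []  # one entry per array-function invocation


def array_scalar_mult(vec, scalar):
    """Array-function scope: sees a single in-memory row."""
    calls.append(len(vec))
    return [x * scalar for x in vec]


def apply_per_row(table, fn, *args):
    """Run an array function independently on every row of the table."""
    return [(row_id, fn(vec, *args)) for row_id, vec in table]


def matrix_trans(table):
    """Matrix-function scope: each output row takes one element from every input row."""
    rows = [vec for _, vec in sorted(table)]
    return [(j + 1, [row[j] for row in rows]) for j in range(len(rows[0]))]


scaled = apply_per_row(dense_table, array_scalar_mult, 2)
print(len(calls))     # 3: one invocation per row
transposed = matrix_trans(dense_table)
print(transposed[0])  # (1, [9, 8, 3])
```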

## Modules

Array Operations
Provides fast array operations supporting other MADlib modules.

Matrix Operations
Provides fast matrix operations supporting other MADlib modules.

Matrix Factorization
Linear algebra methods that factorize a matrix into a product of matrices.

Norms and Distance Functions
Provides utility functions for basic linear algebra operations.

Sparse Vectors
Implements a sparse vector data type that provides compressed storage of vectors that may have many duplicate elements.