User Documentation for Apache MADlib

Detailed Description

These modules provide basic mathematical operations to be run on arrays and matrices.

For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory. We provide two forms of distributed representation of a matrix: dense and sparse.

In many cases, a matrix function can be decomposed into vector operations applied independently to each row of a matrix (or to corresponding rows of two matrices). We also provide access to these internal vector operations (Array Operations) for greater flexibility. Matrix operations like matrix_add use the corresponding vector operation (array_add) and add validation and formatting. Other functions like matrix_mult are more complex and combine such vector operations with other SQL operations.
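
The row-wise decomposition described above can be sketched in a few lines. This is a minimal Python illustration, not MADlib's implementation: the functions below only borrow the names array_add and matrix_add, and the DenseMatrix type (a mapping from row id to row vector) is a hypothetical stand-in for a dense matrix table.

```python
from typing import Dict, List

Row = List[float]
# Hypothetical stand-in for a dense matrix table: row id -> row vector.
DenseMatrix = Dict[int, Row]


def array_add(x: Row, y: Row) -> Row:
    """Vector operation: element-wise sum of two equal-length arrays."""
    return [a + b for a, b in zip(x, y)]


def matrix_add(a: DenseMatrix, b: DenseMatrix) -> DenseMatrix:
    """Matrix operation: validate shapes, then apply array_add row by row."""
    if a.keys() != b.keys():
        raise ValueError("matrices must have the same row ids")
    for row_id in a:
        if len(a[row_id]) != len(b[row_id]):
            raise ValueError(f"row {row_id}: column counts differ")
    # Each row is processed independently of every other row.
    return {row_id: array_add(a[row_id], b[row_id]) for row_id in sorted(a)}


a = {1: [1.0, 2.0], 2: [3.0, 4.0]}
b = {1: [10.0, 20.0], 2: [30.0, 40.0]}
result = matrix_add(a, b)
```

Because each row is handled independently, the per-row work could in principle run wherever that row is stored; only the validation step needs to look at the matrix as a whole.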

Note that these array functions are only available for the dense format representation of the matrix. In general, the scope of a single array function invocation is limited to a single array (1-dimensional or 2-dimensional) that fits in memory. When such a function is executed on a table of arrays, it is called multiple times, once for each array (or pair of arrays). In contrast, the scope of a single matrix function invocation is the complete matrix stored as a distributed table.
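
The difference in scope can be illustrated with a small Python sketch. It is an assumption-laden model rather than MADlib code: the Table type stands in for a table with columns (row_id, row_vec), and array_sum and matrix_transpose are hypothetical Python functions chosen only to contrast a per-array call with a whole-matrix call.

```python
from typing import List, Tuple

# Hypothetical stand-in for a table with columns (row_id, row_vec).
Table = List[Tuple[int, List[float]]]

calls = 0  # counts how many times the array-level function is invoked


def array_sum(x: List[float]) -> float:
    """Array-level function: each call sees exactly one in-memory array."""
    global calls
    calls += 1
    return sum(x)


def matrix_transpose(table: Table) -> Table:
    """Matrix-level function: a single call needs every row of the table."""
    rows = [vec for _, vec in sorted(table)]
    # Column j of the input becomes row j + 1 of the output.
    return [(j + 1, list(col)) for j, col in enumerate(zip(*rows))]


table = [(1, [1.0, 2.0, 3.0]), (2, [4.0, 5.0, 6.0])]

# Applying the array function to a table: one invocation per row.
row_sums = [(row_id, array_sum(vec)) for row_id, vec in table]

# Applying the matrix function: one invocation for the whole matrix.
transposed = matrix_transpose(table)
```

After this runs, the array-level function has been invoked once per row, while the transpose was produced by a single call that read the entire table.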


 Array Operations
 Provides fast array operations supporting other MADlib modules.
 Matrix Operations
 Provides fast matrix operations supporting other MADlib modules.
 Matrix Factorization
 Linear algebra methods that factorize a matrix into a product of matrices.
 Norms and Distance Functions
 Provides utility functions for basic linear algebra operations.
 Sparse Vectors
 Implements a sparse vector data type that provides compressed storage of vectors that may have many duplicate elements.