2.1.0
User Documentation for Apache MADlib

The linalg module consists of useful utility functions for basic linear algebra operations. Several of these functions can be used while implementing new algorithms. These functions operate on vectors (1-D FLOAT8 array) and matrices (2-D FLOAT8 array). Note that other array types may need to be casted into FLOAT8[] before calling the functions.

Refer to the linalg.sql_in file for documentation on each of the utility functions.

Linear Algebra Utility Functions
norm1()

1-norm of a vector, \(\|\vec{a}\|_1\).

norm2()

2-norm of a vector, \(\|\vec{a}\|_2\).

dist_norm1()

1-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_1\).

dist_norm2()

2-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_2\).

dist_pnorm()

Generic p-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_p, p > 0\).

dist_inf_norm()

Infinity-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_\infty\).

squared_dist_norm2()

Squared 2-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_2^2\).

cosine_similarity()

Cosine score between two vectors, \(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|_2 \|\vec{b}\|_2}\).

dist_angle()

Angle between two vectors in an Euclidean space, \(\cos^{-1}(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|_2 \|\vec{b}\|_2})\).

dist_tanimoto()

Tanimoto distance between two vectors. [1]

dist_jaccard()

Jaccard distance between two varchar vectors treated as sets.

get_row()

Return the indexed row of a matrix (2-D array).

get_col()

Return the indexed col of a matrix (2-D array).

avg()

Compute the average of vectors.

normalized_avg()

Compute the normalized average of vectors (unit vector in an Euclidean space).

matrix_agg()

Combine vectors to a matrix.

Vector Norms and Distances

  1. Create a database table with two vector columns and add some data.
    CREATE TABLE two_vectors(
        id  integer,
        a   float8[],
        b   float8[]);
    
    INSERT INTO two_vectors VALUES
    (1, '{3,4}', '{4,5}'),
    (2, '{1,1,0,-4,5,3,4,106,14}', '{1,1,0,6,-3,1,2,92,2}');
    
  2. Invoke norm functions.
    SELECT
        id,
        madlib.norm1(a),
        madlib.norm2(a)
    FROM two_vectors;
    
    Result:
     id | norm1 |      norm2
    ----+-------+------------------
      1 |     7 |                5
      2 |   138 | 107.238052947636
    (2 rows)
    
  3. Invoke distance functions.
    SELECT
        id,
        madlib.dist_norm1(a, b),
        madlib.dist_norm2(a, b),
        madlib.dist_pnorm(a, b, 5) AS norm5,
        madlib.dist_inf_norm(a, b),
        madlib.squared_dist_norm2(a, b) AS sq_dist_norm2,
        madlib.cosine_similarity(a, b),
        madlib.dist_angle(a, b),
        madlib.dist_tanimoto(a, b),
        madlib.dist_jaccard(a::text[], b::text[])
    FROM two_vectors;
    
    Result:
     id | dist_norm1 |    dist_norm2    |      norm5       | dist_inf_norm | sq_dist_norm2 | cosine_similarity |     dist_angle     |   dist_tanimoto    |   dist_jaccard
    ----+------------+------------------+------------------+---------------+---------------+-------------------+--------------------+--------------------+-------------------
      1 |          2 |  1.4142135623731 | 1.14869835499704 |             1 |             2 | 0.999512076087079 | 0.0312398334302684 | 0.0588235294117647 | 0.666666666666667
      2 |         48 | 22.6274169979695 |  15.585086360695 |            14 |           512 | 0.985403348449008 |   0.17106899659286 | 0.0498733684005455 | 0.833333333333333
    (2 rows)
    

Matrix Functions

  1. Create a database table with a matrix column.
    CREATE TABLE matrix(
        id  integer,
        m   float8[]);
    
    INSERT INTO matrix VALUES
    (1, '{{4,5},{3,5},{9,0}}');
    
  2. Invoke matrix functions.
    SELECT
        madlib.get_row(m, 1) AS row_1,
        madlib.get_row(m, 2) AS row_2,
        madlib.get_row(m, 3) AS row_3,
        madlib.get_col(m, 1) AS col_1,
        madlib.get_col(m, 2) AS col_2
    FROM matrix;
    
    Result:
     row_1 | row_2 | row_3 |  col_1  |  col_2
    -------+-------+-------+---------+---------
     {4,5} | {3,5} | {9,0} | {4,3,9} | {5,5,0}
    (1 row)
    

Aggregate Functions

  1. Create a database table with a vector column.
    CREATE TABLE vector(
        id  integer,
        v   float8[]);
    
    INSERT INTO vector VALUES
    (1, '{4,3}'),
    (2, '{8,6}'),
    (3, '{12,9}');
    
  2. Invoke aggregate functions.
    SELECT
        madlib.avg(v),
        madlib.normalized_avg(v),
        madlib.matrix_agg(v)
    FROM vector;
    
    Result:
      avg  | normalized_avg |      matrix_agg
    -------+----------------+----------------------
     {8,6} | {0.8,0.6}      | {{4,3},{8,6},{12,9}}
    (1 row)
    

Literature

[1] http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_similarity_and_distance

Related Topics
File linalg.sql_in documenting the SQL functions.