User Documentation
 All Files Functions Groups
Pearson's correlation
+ Collaboration diagram for Pearson's correlation:
About:

A correlation function is the degree and direction of association of two variables; how well can one random variable be predicted from the other. The coefficient of correlation varies from -1 to 1. Coefficient of 1 implies perfect correlation, 0 means no correlation, and -1 means perfect anti-correlation.

This function provides a cross-correlation matrix for all pairs of numeric columns in a source_table. A Correlation matrix describes correlation among \( M \) variables. It is a square symmetrical \( M \)x \(M \) matrix with the \( (ij) \)th element equal to the correlation coefficient between the \(i\)th and the \(j\)th variable. The diagonal elements (correlations of variables with themselves) are always equal to 1.0.

Usage:

Currently the correlation function can be used in the following way:

SELECT {schema_madlib}.correlation
(
    source_table        TEXT,       -- Source table name (Required)
    output_table        TEXT,       -- Output table name (Required)
    target_cols         TEXT,       -- Comma separated columns for which summary is desired
                                        --   (Default: NULL - produces result for all columns)
)

Output will be a table with N+2 columns and N rows, where N is the number of target columns.

The output table is arranged as a lower-traingular matrix with the upper triangle set to NULL and the diagonal elements set to 1.0. To obtain the result from the 'output_table' in this matrix format ensure to order the elements using the 'column_position', as given in the example below.

sql> SELECT * FROM output_table order by column_position;
Examples:
DROP TABLE IF EXISTS example_data;
CREATE TABLE example_data(
    id SERIAL,
    outlook text,
    temperature float8,
    humidity float8,
    windy text,
    class text) ;

INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('sunny', 85, 85, 'false', E'Don\\'t Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('sunny', 80, 90, 'true', E'Don\\'t Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('overcast', 83, 78, 'false', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('rain', 70, 96, 'false', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('rain', 68, 80, 'false', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('rain', 65, 70, 'true', E'Don\\'t Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('overcast', 64, 65, 'true', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('sunny', 72, 95, 'false', E'Don\\'t Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('sunny', 69, 70, 'false', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('rain', 75, 80, 'false', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('sunny', 75, 70, 'true', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('overcast', 72, 90, 'true', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('overcast', 81, 75, 'false', 'Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('rain', 71, 80, 'true', E'Don\\'t Play');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES(' ', 100, 100, 'true', ' ');
INSERT INTO example_data(outlook, temperature, humidity, windy, class) VALUES('', 110, 100, 'true', '');

SELECT madlib.correlation('example_data', 'example_data_output');
SELECT madlib.correlation('example_data', 'example_data_output', '*');
SELECT madlib.correlation('example_data', 'example_data_output', 'temperature, humidity');

To get the correlation matrix from output table:

    SELECT * from example_data_output order by column_position;