Collaboration diagram for Profile:

Warning: This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.

About:: This module computes a "profile" of a table or view: a predefined set of aggregates to be run on each column of a table.

The following aggregates will be called on every integer column:

min(), max(), avg()
madlib.cmsketch_median()
madlib.cmsketch_depth_histogram()
madlib.cmsketch_width_histogram()

And these on non-integer columns:

madlib.fmsketch_dcount()
madlib.mfvsketch_quick_histogram()
madlib.mfvsketch_top_histogram()

Because the input schema of the table or view is unknown, we need to synthesize SQL to suit. This is done either via the profile or profile_full user defined function.

Usage:

Generate a basic profile information (subset of predefined aggregate functions) for all columns of the input table.
```
SELECT * FROM profile( 'input_table');
```
Generate a full profile information (all predefined aggregate functions) for all columns of the input table.
```
SELECT * FROM profile_full( 'input_table', buckets);
```

Examples:

For basic profile run:

sql> SELECT * FROM profile( 'pg_catalog.pg_tables');

 schema_name | table_name | column_name |       function    | value 
-------------+------------+-------------+-------------------+-------
 pg_catalog  | pg_tables  | *           | COUNT()           | 105
 pg_catalog  | pg_tables  | schemaname  | fmsketch_dcount() | 6
 pg_catalog  | pg_tables  | tablename   | fmsketch_dcount() | 104
 pg_catalog  | pg_tables  | tableowner  | fmsketch_dcount() | 2
 pg_catalog  | pg_tables  | tablespace  | fmsketch_dcount() | 1
 pg_catalog  | pg_tables  | hasindexes  | fmsketch_dcount() | 2
 pg_catalog  | pg_tables  | hasrules    | fmsketch_dcount() | 1
 pg_catalog  | pg_tables  | hastriggers | fmsketch_dcount() | 2
(8 rows)

For full profile run:

sql> SELECT * FROM profile_full( 'pg_catalog.pg_tables', 5);

 schema_name | table_name | column_name |                        function                 |                                               value                                                
-------------+------------+-------------+-------------------------------------------------+----------------------------------------------------------------------------------------------------
 pg_catalog  | pg_tables  | *           | COUNT()                                         | 105
 pg_catalog  | pg_tables  | schemaname  | fmsketch_dcount()                               | 6
 pg_catalog  | pg_tables  | schemaname  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:4]={pg_catalog:68,public:19,information_schema:7,gp_toolkit:5,maddy:5}
 pg_catalog  | pg_tables  | schemaname  | array_collapse(mfvsketch_top_histogram((),5))   | [0:4]={pg_catalog:68,public:19,information_schema:7,gp_toolkit:5,maddy:5}
 pg_catalog  | pg_tables  | tablename   | fmsketch_dcount()                               | 104
 pg_catalog  | pg_tables  | tablename   | array_collapse(mfvsketch_quick_histogram((),5)) | [0:4]={migrationhistory:2,pg_statistic:1,sql_features:1,sql_implementation_info:1,sql_languages:1}
 pg_catalog  | pg_tables  | tablename   | array_collapse(mfvsketch_top_histogram((),5))   | [0:4]={migrationhistory:2,pg_statistic:1,sql_features:1,sql_implementation_info:1,sql_languages:1}
 pg_catalog  | pg_tables  | tableowner  | fmsketch_dcount()                               | 2
 pg_catalog  | pg_tables  | tableowner  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={agorajek:104,alex:1}
 pg_catalog  | pg_tables  | tableowner  | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={agorajek:104,alex:1}
 pg_catalog  | pg_tables  | tablespace  | fmsketch_dcount()                               | 1
 pg_catalog  | pg_tables  | tablespace  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:0]={pg_global:28}
 pg_catalog  | pg_tables  | tablespace  | array_collapse(mfvsketch_top_histogram((),5))   | [0:0]={pg_global:28}
 pg_catalog  | pg_tables  | hasindexes  | fmsketch_dcount()                               | 2
 pg_catalog  | pg_tables  | hasindexes  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={t:59,f:46}
 pg_catalog  | pg_tables  | hasindexes  | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={t:59,f:46}
 pg_catalog  | pg_tables  | hasrules    | fmsketch_dcount()                               | 1
 pg_catalog  | pg_tables  | hasrules    | array_collapse(mfvsketch_quick_histogram((),5)) | [0:0]={f:105}
 pg_catalog  | pg_tables  | hasrules    | array_collapse(mfvsketch_top_histogram((),5))   | [0:0]={f:105}
 pg_catalog  | pg_tables  | hastriggers | fmsketch_dcount()                               | 2
 pg_catalog  | pg_tables  | hastriggers | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={f:102,t:3}
 pg_catalog  | pg_tables  | hastriggers | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={f:102,t:3}
(22 rows)

Implementation Notes:

Because some of the aggregate functions used in profile return multi-dimensional arrays, which are not easily handled in pl/python, we are using array_collapse function to collaps the n-dim arrays to 1-dim arrays. All values of 2 and upper dimensions are separated with ':' character.

See Also: File profile.sql_in documenting SQL functions.