User Documentation
 All Files Functions Groups
+ Collaboration diagram for Profile:
Warning
This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.
About:
This module computes a "profile" of a table or view: a predefined set of aggregates to be run on each column of a table.

The following aggregates will be called on every integer column:

And these on non-integer columns:

Because the input schema of the table or view is unknown, we need to synthesize SQL to suit. This is done either via the profile or profile_full user defined function.

Usage:
  • Generate a basic profile information (subset of predefined aggregate functions) for all columns of the input table.
    SELECT * FROM profile( 'input_table');
  • Generate a full profile information (all predefined aggregate functions) for all columns of the input table.
    SELECT * FROM profile_full( 'input_table', buckets);
Examples:
  • For basic profile run:
    sql> SELECT * FROM profile( 'pg_catalog.pg_tables');
    
     schema_name | table_name | column_name |       function    | value 
    -------------+------------+-------------+-------------------+-------
     pg_catalog  | pg_tables  | *           | COUNT()           | 105
     pg_catalog  | pg_tables  | schemaname  | fmsketch_dcount() | 6
     pg_catalog  | pg_tables  | tablename   | fmsketch_dcount() | 104
     pg_catalog  | pg_tables  | tableowner  | fmsketch_dcount() | 2
     pg_catalog  | pg_tables  | tablespace  | fmsketch_dcount() | 1
     pg_catalog  | pg_tables  | hasindexes  | fmsketch_dcount() | 2
     pg_catalog  | pg_tables  | hasrules    | fmsketch_dcount() | 1
     pg_catalog  | pg_tables  | hastriggers | fmsketch_dcount() | 2
    (8 rows)
    
  • For full profile run:
    sql> SELECT * FROM profile_full( 'pg_catalog.pg_tables', 5);
    
     schema_name | table_name | column_name |                        function                 |                                               value                                                
    -------------+------------+-------------+-------------------------------------------------+----------------------------------------------------------------------------------------------------
     pg_catalog  | pg_tables  | *           | COUNT()                                         | 105
     pg_catalog  | pg_tables  | schemaname  | fmsketch_dcount()                               | 6
     pg_catalog  | pg_tables  | schemaname  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:4]={pg_catalog:68,public:19,information_schema:7,gp_toolkit:5,maddy:5}
     pg_catalog  | pg_tables  | schemaname  | array_collapse(mfvsketch_top_histogram((),5))   | [0:4]={pg_catalog:68,public:19,information_schema:7,gp_toolkit:5,maddy:5}
     pg_catalog  | pg_tables  | tablename   | fmsketch_dcount()                               | 104
     pg_catalog  | pg_tables  | tablename   | array_collapse(mfvsketch_quick_histogram((),5)) | [0:4]={migrationhistory:2,pg_statistic:1,sql_features:1,sql_implementation_info:1,sql_languages:1}
     pg_catalog  | pg_tables  | tablename   | array_collapse(mfvsketch_top_histogram((),5))   | [0:4]={migrationhistory:2,pg_statistic:1,sql_features:1,sql_implementation_info:1,sql_languages:1}
     pg_catalog  | pg_tables  | tableowner  | fmsketch_dcount()                               | 2
     pg_catalog  | pg_tables  | tableowner  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={agorajek:104,alex:1}
     pg_catalog  | pg_tables  | tableowner  | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={agorajek:104,alex:1}
     pg_catalog  | pg_tables  | tablespace  | fmsketch_dcount()                               | 1
     pg_catalog  | pg_tables  | tablespace  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:0]={pg_global:28}
     pg_catalog  | pg_tables  | tablespace  | array_collapse(mfvsketch_top_histogram((),5))   | [0:0]={pg_global:28}
     pg_catalog  | pg_tables  | hasindexes  | fmsketch_dcount()                               | 2
     pg_catalog  | pg_tables  | hasindexes  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={t:59,f:46}
     pg_catalog  | pg_tables  | hasindexes  | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={t:59,f:46}
     pg_catalog  | pg_tables  | hasrules    | fmsketch_dcount()                               | 1
     pg_catalog  | pg_tables  | hasrules    | array_collapse(mfvsketch_quick_histogram((),5)) | [0:0]={f:105}
     pg_catalog  | pg_tables  | hasrules    | array_collapse(mfvsketch_top_histogram((),5))   | [0:0]={f:105}
     pg_catalog  | pg_tables  | hastriggers | fmsketch_dcount()                               | 2
     pg_catalog  | pg_tables  | hastriggers | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={f:102,t:3}
     pg_catalog  | pg_tables  | hastriggers | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={f:102,t:3}
    (22 rows)
    
Implementation Notes:

Because some of the aggregate functions used in profile return multi-dimensional arrays, which are not easily handled in pl/python, we are using array_collapse function to collaps the n-dim arrays to 1-dim arrays. All values of 2 and upper dimensions are separated with ':' character.

See Also
File profile.sql_in documenting SQL functions.