MADlib  1.4.1
User Documentation
 All Files Functions Variables Groups
Warning
This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.

This module computes a "profile" of a table or view: a predefined set of aggregates to be run on each column of a table.

The following aggregates are called on every integer column:

The following aggregates are called on non-integer columns:

Because the input schema of the table or view is unknown, SQL is synthesized to suit the input. This is done either via the profile or profile_full user defined function.

Function Syntax

Generate basic profile information (subset of predefined aggregate functions) for all columns of the input table.

profile( input_table )

Generate full profile information (all predefined aggregate functions) for all columns of the input table.

profile_full( input_table, 
              buckets
            )

Examples
  1. Generate basic profile information.
    SELECT * FROM profile( 'pg_catalog.pg_tables');
    
    Result:
     schema_name | table_name | column_name |       function    | value 
     ------------+------------+-------------+-------------------+-------
     pg_catalog  | pg_tables  | *           | COUNT()           | 105
     pg_catalog  | pg_tables  | schemaname  | fmsketch_dcount() | 6
     pg_catalog  | pg_tables  | tablename   | fmsketch_dcount() | 104
     pg_catalog  | pg_tables  | tableowner  | fmsketch_dcount() | 2
     pg_catalog  | pg_tables  | tablespace  | fmsketch_dcount() | 1
     pg_catalog  | pg_tables  | hasindexes  | fmsketch_dcount() | 2
     pg_catalog  | pg_tables  | hasrules    | fmsketch_dcount() | 1
     pg_catalog  | pg_tables  | hastriggers | fmsketch_dcount() | 2
    (8 rows)
    
  2. Generate full profile information.
    SELECT * FROM profile_full( 'pg_catalog.pg_tables', 
                                 5
                              );
    
    Result:
                              
     schema_name | table_name | column_name |                        function                 |                                               value                                                
     ------------+------------+-------------+-------------------------------------------------+----------------------------------------------------------------------------------------------------
     pg_catalog  | pg_tables  | *           | COUNT()                                         | 105
     pg_catalog  | pg_tables  | schemaname  | fmsketch_dcount()                               | 6
     pg_catalog  | pg_tables  | schemaname  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:4]={pg_catalog:68,public:19,information_schema:7,gp_toolkit:5,maddy:5}
     pg_catalog  | pg_tables  | schemaname  | array_collapse(mfvsketch_top_histogram((),5))   | [0:4]={pg_catalog:68,public:19,information_schema:7,gp_toolkit:5,maddy:5}
     pg_catalog  | pg_tables  | tablename   | fmsketch_dcount()                               | 104
     pg_catalog  | pg_tables  | tablename   | array_collapse(mfvsketch_quick_histogram((),5)) | [0:4]={migrationhistory:2,pg_statistic:1,sql_features:1,sql_implementation_info:1,sql_languages:1}
     pg_catalog  | pg_tables  | tablename   | array_collapse(mfvsketch_top_histogram((),5))   | [0:4]={migrationhistory:2,pg_statistic:1,sql_features:1,sql_implementation_info:1,sql_languages:1}
     pg_catalog  | pg_tables  | tableowner  | fmsketch_dcount()                               | 2
     pg_catalog  | pg_tables  | tableowner  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={agorajek:104,alex:1}
     pg_catalog  | pg_tables  | tableowner  | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={agorajek:104,alex:1}
     pg_catalog  | pg_tables  | tablespace  | fmsketch_dcount()                               | 1
     pg_catalog  | pg_tables  | tablespace  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:0]={pg_global:28}
     pg_catalog  | pg_tables  | tablespace  | array_collapse(mfvsketch_top_histogram((),5))   | [0:0]={pg_global:28}
     pg_catalog  | pg_tables  | hasindexes  | fmsketch_dcount()                               | 2
     pg_catalog  | pg_tables  | hasindexes  | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={t:59,f:46}
     pg_catalog  | pg_tables  | hasindexes  | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={t:59,f:46}
     pg_catalog  | pg_tables  | hasrules    | fmsketch_dcount()                               | 1
     pg_catalog  | pg_tables  | hasrules    | array_collapse(mfvsketch_quick_histogram((),5)) | [0:0]={f:105}
     pg_catalog  | pg_tables  | hasrules    | array_collapse(mfvsketch_top_histogram((),5))   | [0:0]={f:105}
     pg_catalog  | pg_tables  | hastriggers | fmsketch_dcount()                               | 2
     pg_catalog  | pg_tables  | hastriggers | array_collapse(mfvsketch_quick_histogram((),5)) | [0:1]={f:102,t:3}
     pg_catalog  | pg_tables  | hastriggers | array_collapse(mfvsketch_top_histogram((),5))   | [0:1]={f:102,t:3}
    (22 rows)
    

Implementation Notes

Because some of the aggregate functions used in profile return multi-dimensional arrays, which are not easily handled in pl/python, we are using array_collapse function to collaps the n-dim arrays to 1-dim arrays. All values of 2 and upper dimensions are separated with ':' character.

Related Topics
File profile.sql_in documenting SQL functions.