User Documentation
 All Files Functions Groups
MFV (Most Frequent Values)
+ Collaboration diagram for MFV (Most Frequent Values):
Warning
This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.
About:
MFVSketch: Most Frequent Values variant of CountMin sketch, implemented as a UDA.
Usage:
Produces an n-bucket histogram for a column where each bucket counts one of the most frequent values in the column. The output is an array of doubles {value, count} in descending order of frequency; counts are approximated via CountMin sketches. Ties are handled arbitrarily.
SELECT mfvsketch_top_histogram(col_name,n) FROM table_name;
SELECT mfvsketch_top_histogram(col_name,n) FROM table_name;

The MFV frequent-value UDA comes in two different versions:

In PostgreSQL the two UDAs are identical. In Greenplum, the quick version should produce good results unless the number of values requested is very small, or the distribution is very flat.

Examples:
  1. Generate some data
    sql> CREATE TABLE data(class INT, a1 INT);
    sql> INSERT INTO data SELECT 1,1 FROM generate_series(1,10000);
    sql> INSERT INTO data SELECT 1,2 FROM generate_series(1,15000);
    sql> INSERT INTO data SELECT 1,3 FROM generate_series(1,10000);
    sql> INSERT INTO data SELECT 2,5 FROM generate_series(1,1000);
    sql> INSERT INTO data SELECT 2,6 FROM generate_series(1,1000);
    
  2. Produce histogram of 5 bins and return the most frequent value and associated count in each bin:
    sql> SELECT mfvsketch_top_histogram(a1,5) FROM data;
    
                    mfvsketch_top_histogram
    --------------------------------------------------------------
    [0:4][0:1]={{2,15000},{1,10000},{3,10000},{5,1000},{6,1000}}
    (1 row)
    
Literature:
This method is not usually called an MFV sketch in the literature; it is a natural extension of the CountMin sketch.
See Also
File sketch.sql_in documenting the SQL functions.

Module CountMin (Cormode-Muthukrishnan).