User Documentation
 All Files Functions Groups
sketch.sql_in File Reference

SQL functions for sketch-based approximations of descriptive statistics. More...

Go to the source code of this file.

Functions

aggregate int8 fmsketch_dcount (anyelement column)
 Flajolet-Martin's distinct count estimation. More...
 
aggregate text cmsketch (int8 column)
 cmsketch is a UDA that can be run on columns of type int8, or any column that can be cast to an int8. It produces a base64 string representing a CountMin sketch: a large array of counters that is intended to be passed into a UDF like cmsketch_width_histogram described below.
 
int8 cmsketch_count (text sketches64, int8 val)
 cmsketch_count is a scalar UDF to compute the approximate number of occurences of a value in a column summarized by a cmsketch. Takes the results of the cmsketch aggregate as its first argument, and the desired value as the second.
 
int8 cmsketch_rangecount (text sketches64, int8 bot, int8 top)
 cmsketch_rangecount is a scalar UDF to approximate the number of occurrences of values in the range [lo,hi] inclusive, given a cmsketch of a column. Takes the results of the cmsketch aggregate as its first argument, and the desired range boundaries as the second and third.
 
int8 cmsketch_centile (text sketches64, int8 centile, int8 cnt)
 cmsketch_centile is a scalar UDF to compute a centile value from a cmsketch. Takes the results of the cmsketch aggregate as its first argument, a number between 1 and 99 as the desired centile in the second, and the count of the column as the third. Produces a value from the sketched column that is approximately at the centile's position in sorted order.
 
int8 cmsketch_median (text sketches64, int8 cnt)
 cmsketch_median is a scalar UDF to compute a median value from a cmsketch. Takes the results of the cmsketch aggregate as its first argument, and the count as the second. Produces a value from the sketched column that is approximately at the halfway position in sorted order.
 
text cmsketch_width_histogram (text sketches64, int8 themin, int8 themax, int4 nbuckets)
 cmsketch_width_histogram is a scalar UDF that takes three aggregates of a column – cmsketch, min and max– as well as a number of buckets, and produces an n-bucket histogram for the column where each bucket has approximately the same width. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate.
 
text cmsketch_depth_histogram (text sketches64, int4 nbuckets)
 cmsketch_depth_histogram is a UDA that takes a cmsketch and a number of buckets n, and produces an n-bucket histogram for the column where each bucket has approximately the same count. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. Note that an equi-depth histogram is equivalent to a spanning set of equi-spaced centiles.
 
aggregate text[][] mfvsketch_top_histogram (anyelement column, int4 number_of_buckets)
 Produces an n-bucket histogram for a column where each bucket counts one of the most frequent values in the column. The output is an array of doubles {value, count} in descending order of frequency; counts are approximated via CountMin sketches. Ties are handled arbitrarily.
 
aggregate text[][] mfvsketch_quick_histogram (anyelement column, int4 number_of_buckets)
 On Postgres it works the same way as mfvsketch_top_histogram but, in Greenplum it does parallel aggregation to provide a "quick and dirty" answer.
 

Detailed Description

Date
April 2011
See Also
For a brief introduction to sketches, see the module description Sketch-based Estimators

Definition in file sketch.sql_in.

Function Documentation

aggregate int8 fmsketch_dcount ( anyelement  column)
Parameters
columnname

Definition at line 313 of file sketch.sql_in.