2.1.0
User Documentation for Apache MADlib
sketch.sql_in File Reference

SQL functions for sketch-based approximations of descriptive statistics.

Functions

bytea big_or (bytea bitmap1, bytea bitmap2)
 
bytea __fmsketch_trans (bytea bitmaps, anyelement input)
 
int8 __fmsketch_count_distinct (bytea bitmaps)
 
bytea __fmsketch_merge (bytea bitmaps1, bytea bitmaps2)
 
aggregate bytea fmsketch_dcount (anyelement column)
 Flajolet-Martin's distinct count estimation.
 
bytea __cmsketch_int8_trans (bytea bitmaps, int8 input)
 
bytea __cmsketch_int8_trans (bytea bitmaps, int8 input, int8 arg1)
 
bytea __cmsketch_int8_trans (bytea bitmaps, int8 input, int8 arg1, int8 arg2)
 
bytea __cmsketch_int8_trans (bytea bitmaps, int8 input, int8 arg1, int8 arg2, int8 arg3)
 
text __cmsketch_base64_final (bytea sketch)
 
bytea __cmsketch_merge (bytea, bytea)
 
aggregate bytea cmsketch (int8 column)
 cmsketch is a UDA that can be run on columns of type int8, or any column that can be cast to an int8. It produces a base64 string representing a CountMin sketch: a large array of counters that is intended to be passed into a UDF like cmsketch_width_histogram described below.
 
int8 cmsketch_count (text sketches64, int8 val)
 cmsketch_count is a scalar UDF to compute the approximate number of occurrences of a value in a column summarized by a cmsketch. Takes the results of the cmsketch aggregate as its first argument, and the desired value as the second.
 
int8 cmsketch_rangecount (text sketches64, int8 bot, int8 top)
 cmsketch_rangecount is a scalar UDF to approximate the number of occurrences of values in the range [bot, top] inclusive, given a cmsketch of a column. Takes the results of the cmsketch aggregate as its first argument, and the lower and upper range boundaries as the second and third.
 
int8 cmsketch_centile (text sketches64, int8 centile, int8 cnt)
 cmsketch_centile is a scalar UDF to compute a centile value from a cmsketch. Takes the results of the cmsketch aggregate as its first argument, a number between 1 and 99 giving the desired centile as the second, and the row count of the column as the third. Produces a value from the sketched column that is approximately at the centile's position in sorted order.
 
int8 cmsketch_median (text sketches64, int8 cnt)
 cmsketch_median is a scalar UDF to compute a median value from a cmsketch. Takes the results of the cmsketch aggregate as its first argument, and the count as the second. Produces a value from the sketched column that is approximately at the halfway position in sorted order.
 
text cmsketch_width_histogram (text sketches64, int8 themin, int8 themax, int4 nbuckets)
 cmsketch_width_histogram is a scalar UDF that takes three aggregates of a column (cmsketch, min and max) as well as a number of buckets n, and produces an n-bucket histogram for the column where each bucket has approximately the same width. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate.
 
text cmsketch_depth_histogram (text sketches64, int4 nbuckets)
 cmsketch_depth_histogram is a scalar UDF that takes a cmsketch and a number of buckets n, and produces an n-bucket histogram for the column where each bucket has approximately the same count. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. Note that an equi-depth histogram is equivalent to a spanning set of equi-spaced centiles.
 
bytea __mfvsketch_trans (bytea, anyelement, int4)
 
text [][] __mfvsketch_final (bytea)
 
bytea __mfvsketch_merge (bytea, bytea)
 
integer __sketch_rightmost_one (bytea, integer, integer)
 
integer __sketch_leftmost_zero (bytea, integer, integer)
 
bytea __sketch_array_set_bit_in_place (bytea, integer, integer, integer, integer)
 
aggregate text [][] mfvsketch_top_histogram (anyelement column, int4 number_of_buckets)
 Produces an n-bucket histogram for a column where each bucket counts one of the most frequent values in the column. The output is an array of pairs {value, count} in descending order of frequency; counts are approximated via CountMin sketches. Ties are handled arbitrarily.
 
aggregate bytea mfvsketch_quick_histogram (anyelement column, int4 number_of_buckets)
 On Postgres it works the same way as mfvsketch_top_histogram, but on Greenplum it uses parallel aggregation to provide a "quick and dirty" answer.
 

Detailed Description

Date
April 2011
See also
For a brief introduction to sketches, see the module description Cardinality Estimators.

Function Documentation

◆ __cmsketch_base64_final()

text __cmsketch_base64_final ( bytea  sketch)

◆ __cmsketch_int8_trans() [1/4]

bytea __cmsketch_int8_trans ( bytea  bitmaps,
int8  input 
)

◆ __cmsketch_int8_trans() [2/4]

bytea __cmsketch_int8_trans ( bytea  bitmaps,
int8  input,
int8  arg1 
)

◆ __cmsketch_int8_trans() [3/4]

bytea __cmsketch_int8_trans ( bytea  bitmaps,
int8  input,
int8  arg1,
int8  arg2 
)

◆ __cmsketch_int8_trans() [4/4]

bytea __cmsketch_int8_trans ( bytea  bitmaps,
int8  input,
int8  arg1,
int8  arg2,
int8  arg3 
)

◆ __cmsketch_merge()

bytea __cmsketch_merge ( bytea  ,
bytea   
)

◆ __fmsketch_count_distinct()

int8 __fmsketch_count_distinct ( bytea  bitmaps)

◆ __fmsketch_merge()

bytea __fmsketch_merge ( bytea  bitmaps1,
bytea  bitmaps2 
)

◆ __fmsketch_trans()

bytea __fmsketch_trans ( bytea  bitmaps,
anyelement  input 
)

◆ __mfvsketch_final()

text [][] __mfvsketch_final ( bytea  )

◆ __mfvsketch_merge()

bytea __mfvsketch_merge ( bytea  ,
bytea   
)

◆ __mfvsketch_trans()

bytea __mfvsketch_trans ( bytea  ,
anyelement  ,
int4   
)

◆ __sketch_array_set_bit_in_place()

bytea __sketch_array_set_bit_in_place ( bytea  ,
integer  ,
integer  ,
integer  ,
integer   
)

◆ __sketch_leftmost_zero()

integer __sketch_leftmost_zero ( bytea  ,
integer  ,
integer   
)

◆ __sketch_rightmost_one()

integer __sketch_rightmost_one ( bytea  ,
integer  ,
integer   
)

◆ big_or()

bytea big_or ( bytea  bitmap1,
bytea  bitmap2 
)

◆ cmsketch()

aggregate bytea cmsketch ( int8  column)

◆ cmsketch_centile()

int8 cmsketch_centile ( text  sketches64,
int8  centile,
int8  cnt 
)
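
A minimal usage sketch for cmsketch_centile, based only on the signature above. It assumes MADlib is installed in a schema named madlib; the generate_series input and the column alias v are hypothetical. The third argument is the row count of the sketched column, supplied here with count(*).

```sql
-- Hypothetical input: the integers 1..1000 as an int8 column named v.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.cmsketch_centile(
           madlib.cmsketch(v),  -- CountMin sketch of the column
           90,                  -- desired centile, between 1 and 99
           count(*)             -- number of rows that were sketched
       ) AS approx_90th_centile
FROM (SELECT i::int8 AS v FROM generate_series(1, 1000) AS g(i)) AS t;
```

The returned value is an approximation taken from the sketch, so it may differ from the exact centile.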

◆ cmsketch_count()

int8 cmsketch_count ( text  sketches64,
int8  val 
)
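
A minimal usage sketch for cmsketch_count: the query builds a sketch with the cmsketch aggregate and passes it directly to the scalar function. The madlib schema name, the generate_series input and the alias v are assumptions made for illustration.

```sql
-- Hypothetical input: 1000 rows whose values cycle through 0..9.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.cmsketch_count(
           madlib.cmsketch(v),  -- CountMin sketch of the column
           3                    -- value whose frequency is estimated
       ) AS approx_count_of_3
FROM (SELECT (i % 10)::int8 AS v FROM generate_series(1, 1000) AS g(i)) AS t;
```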

◆ cmsketch_depth_histogram()

text cmsketch_depth_histogram ( text  sketches64,
int4  nbuckets 
)
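
A minimal usage sketch for cmsketch_depth_histogram, assuming MADlib is installed in a schema named madlib. The input data and the alias v are hypothetical.

```sql
-- Hypothetical input: the integers 1..1000 as an int8 column named v.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.cmsketch_depth_histogram(
           madlib.cmsketch(v),  -- CountMin sketch of the column
           4                    -- number of buckets
       ) AS equi_depth_histogram
FROM (SELECT i::int8 AS v FROM generate_series(1, 1000) AS g(i)) AS t;
```

The result is a single text value describing the buckets, so it usually needs to be parsed before further processing.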

◆ cmsketch_median()

int8 cmsketch_median ( text  sketches64,
int8  cnt 
)

◆ cmsketch_rangecount()

int8 cmsketch_rangecount ( text  sketches64,
int8  bot,
int8  top 
)
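
A minimal usage sketch for cmsketch_rangecount, assuming MADlib is installed in a schema named madlib. The input data and the alias v are hypothetical.

```sql
-- Hypothetical input: 1000 rows whose values cycle through 0..9.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.cmsketch_rangecount(
           madlib.cmsketch(v),  -- CountMin sketch of the column
           3,                   -- bot: lower bound of the range
           6                    -- top: upper bound of the range
       ) AS approx_count_3_to_6
FROM (SELECT (i % 10)::int8 AS v FROM generate_series(1, 1000) AS g(i)) AS t;
```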

◆ cmsketch_width_histogram()

text cmsketch_width_histogram ( text  sketches64,
int8  themin,
int8  themax,
int4  nbuckets 
)
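
A minimal usage sketch for cmsketch_width_histogram. The three aggregates (cmsketch, min and max) are computed in the same query and passed in as arguments. The madlib schema name, the input data and the alias v are assumptions made for illustration.

```sql
-- Hypothetical input: the integers 1..1000 as an int8 column named v.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.cmsketch_width_histogram(
           madlib.cmsketch(v),  -- CountMin sketch of the column
           min(v),              -- themin: smallest value in the column
           max(v),              -- themax: largest value in the column
           4                    -- nbuckets: number of equal-width buckets
       ) AS equi_width_histogram
FROM (SELECT i::int8 AS v FROM generate_series(1, 1000) AS g(i)) AS t;
```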

◆ fmsketch_dcount()

aggregate bytea fmsketch_dcount ( anyelement  column)
Parameters
column	Name of the column whose distinct values are counted.
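
A minimal usage sketch for fmsketch_dcount, assuming MADlib is installed in a schema named madlib. The input data and the alias v are hypothetical; the result is an estimate and may differ from an exact COUNT(DISTINCT v).

```sql
-- Hypothetical input: 1000 rows containing 50 distinct values.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.fmsketch_dcount(v) AS approx_distinct,
       count(DISTINCT v)        AS exact_distinct  -- shown for comparison
FROM (SELECT i % 50 AS v FROM generate_series(1, 1000) AS g(i)) AS t;
```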

◆ mfvsketch_quick_histogram()

aggregate bytea mfvsketch_quick_histogram ( anyelement  column,
int4  number_of_buckets 
)

◆ mfvsketch_top_histogram()

aggregate text [][] mfvsketch_top_histogram ( anyelement  column,
int4  number_of_buckets 
)
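
A minimal usage sketch for mfvsketch_top_histogram, assuming MADlib is installed in a schema named madlib. The inline VALUES list and the alias word are hypothetical.

```sql
-- Hypothetical input: a small text column with repeated values.
-- Assumes the MADlib functions live in the "madlib" schema.
SELECT madlib.mfvsketch_top_histogram(
           word,  -- column to summarize
           2      -- number_of_buckets: how many frequent values to report
       ) AS top_values
FROM (VALUES ('a'::text), ('b'), ('a'), ('c'), ('a'), ('b')) AS t(word);
```

Because ties are handled arbitrarily, values with equal frequency may appear in either order.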