MADlib
1.1 A newer version is available
User Documentation
|
SQL functions for sketch-based approximations of descriptive statistics. More...
Go to the source code of this file.
Functions | |
aggregate int8 | fmsketch_dcount (anyelement column) |
Flajolet-Martin's distinct count estimation. More... | |
aggregate text | cmsketch (int8 column) |
cmsketch is a UDA that can be run on columns of type int8, or any column that can be cast to an int8. It produces a base64 string representing a CountMin sketch: a large array of counters that is intended to be passed into a UDF like cmsketch_width_histogram described below. | |
int8 | cmsketch_count (text sketches64, int8 val) |
cmsketch_count is a scalar UDF to compute the approximate number of occurences of a value in a column summarized by a cmsketch. Takes the results of the cmsketch aggregate as its first argument, and the desired value as the second. | |
int8 | cmsketch_rangecount (text sketches64, int8 bot, int8 top) |
cmsketch_rangecount is a scalar UDF to approximate the number of occurrences of values in the range [lo,hi] inclusive, given a cmsketch of a column. Takes the results of the cmsketch aggregate as its first argument, and the desired range boundaries as the second and third. | |
int8 | cmsketch_centile (text sketches64, int8 centile, int8 cnt) |
cmsketch_centile is a scalar UDF to compute a centile value from a cmsketch. Takes the results of the cmsketch aggregate as its first argument, a number between 1 and 99 as the desired centile in the second, and the count of the column as the third. Produces a value from the sketched column that is approximately at the centile's position in sorted order. | |
int8 | cmsketch_median (text sketches64, int8 cnt) |
cmsketch_median is a scalar UDF to compute a median value from a cmsketch. Takes the results of the cmsketch aggregate as its first argument, and the count as the second. Produces a value from the sketched column that is approximately at the halfway position in sorted order. | |
text | cmsketch_width_histogram (text sketches64, int8 themin, int8 themax, int4 nbuckets) |
cmsketch_width_histogram is a scalar UDF that takes three aggregates of a column – cmsketch, min and max– as well as a number of buckets, and produces an n-bucket histogram for the column where each bucket has approximately the same width. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. | |
text | cmsketch_depth_histogram (text sketches64, int4 nbuckets) |
cmsketch_depth_histogram is a UDA that takes a cmsketch and a number of buckets n, and produces an n-bucket histogram for the column where each bucket has approximately the same count. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. Note that an equi-depth histogram is equivalent to a spanning set of equi-spaced centiles. | |
aggregate text[][] | mfvsketch_top_histogram (anyelement column, int4 number_of_buckets) |
Produces an n-bucket histogram for a column where each bucket counts one of the most frequent values in the column. The output is an array of doubles {value, count} in descending order of frequency; counts are approximated via CountMin sketches. Ties are handled arbitrarily. | |
aggregate text[][] | mfvsketch_quick_histogram (anyelement column, int4 number_of_buckets) |
On Postgres it works the same way as mfvsketch_top_histogram but, in Greenplum it does parallel aggregation to provide a "quick and dirty" answer. | |
Definition in file sketch.sql_in.
aggregate int8 fmsketch_dcount | ( | anyelement | column) |
column | name |
Definition at line 313 of file sketch.sql_in.