SQL functions for sketch-based approximations of descriptive statistics. More...

Functions
aggregate int8	fmsketch_dcount (anyelement column)
	Flajolet-Martin's distinct count estimation. More...

aggregate text	cmsketch (int8 column)
	`cmsketch` is a UDA that can be run on columns of type int8, or any column that can be cast to an int8. It produces a base64 string representing a CountMin sketch: a large array of counters that is intended to be passed into a UDF like `cmsketch_width_histogram` described below.

int8	cmsketch_count (text sketches64, int8 val)
	`cmsketch_count` is a scalar UDF to compute the approximate number of occurences of a value in a column summarized by a cmsketch. Takes the results of the `cmsketch` aggregate as its first argument, and the desired value as the second.

int8	cmsketch_rangecount (text sketches64, int8 bot, int8 top)
	`cmsketch_rangecount` is a scalar UDF to approximate the number of occurrences of values in the range `[lo,hi]` inclusive, given a cmsketch of a column. Takes the results of the `cmsketch` aggregate as its first argument, and the desired range boundaries as the second and third.

int8	cmsketch_centile (text sketches64, int8 centile, int8 cnt)
	`cmsketch_centile` is a scalar UDF to compute a centile value from a cmsketch. Takes the results of the `cmsketch` aggregate as its first argument, a number between 1 and 99 as the desired centile in the second, and the count of the column as the third. Produces a value from the sketched column that is approximately at the centile's position in sorted order.

int8	cmsketch_median (text sketches64, int8 cnt)
	`cmsketch_median` is a scalar UDF to compute a median value from a cmsketch. Takes the results of the `cmsketch` aggregate as its first argument, and the count as the second. Produces a value from the sketched column that is approximately at the halfway position in sorted order.

text	cmsketch_width_histogram (text sketches64, int8 themin, int8 themax, int4 nbuckets)
	`cmsketch_width_histogram` is a scalar UDF that takes three aggregates of a column – cmsketch, min and max– as well as a number of buckets, and produces an n-bucket histogram for the column where each bucket has approximately the same width. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate.

text	cmsketch_depth_histogram (text sketches64, int4 nbuckets)
	`cmsketch_depth_histogram` is a UDA that takes a cmsketch and a number of buckets n, and produces an n-bucket histogram for the column where each bucket has approximately the same count. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. Note that an equi-depth histogram is equivalent to a spanning set of equi-spaced centiles.

aggregate text[][]	mfvsketch_top_histogram (anyelement column, int4 number_of_buckets)
	Produces an n-bucket histogram for a column where each bucket counts one of the most frequent values in the column. The output is an array of doubles {value, count} in descending order of frequency; counts are approximated via CountMin sketches. Ties are handled arbitrarily.

aggregate text[][]	mfvsketch_quick_histogram (anyelement column, int4 number_of_buckets)
	On Postgres it works the same way as mfvsketch_top_histogram but, in Greenplum it does parallel aggregation to provide a "quick and dirty" answer.

Detailed Description

Date: April 2011

See Also: For a brief introduction to sketches, see the module description Sketch-based Estimators

Definition in file sketch.sql_in.

Function Documentation

aggregate int8 fmsketch_dcount ( anyelement column)

Parameters

column name

Definition at line 313 of file sketch.sql_in.

Functions

Detailed Description

Function Documentation