MADlib 0.7 User Documentation
Modules:
- CountMin (Cormode-Muthukrishnan)
- FM (Flajolet-Martin)
- MFV (Most Frequent Values)
Sketches (sometimes called "synopsis data structures") are small, randomized, in-memory data structures that capture statistical properties of a large set of values (e.g., a column of a table). A sketch can be built in a single pass over the data and then used to approximate a variety of descriptive statistics.
We implement sketches as SQL User-Defined Aggregates (UDAs). Because they are single-pass, small-space and parallelized, a single query can use many sketches to gather summary statistics on many columns of a table efficiently.
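To illustrate why a sketch fits the aggregate model, here is a minimal Python sketch of a Count-Min structure with the three steps an aggregate typically needs: a per-row update, a merge of partial states, and a final read-out. This is a toy written for this page, not MADlib's implementation; the class name `CountMinSketch`, the `depth` and `width` defaults, and the SHA-256 based hashing are all assumptions chosen for clarity.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min sketch: `depth` rows of `width` counters (illustrative only)."""

    def __init__(self, depth=4, width=256):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _bucket(self, row, value):
        # One hash function per row, derived by salting the value with the row index.
        digest = hashlib.sha256(f"{row}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, value):
        # Per-row update: called once for every input value (single pass).
        for r in range(self.depth):
            self.rows[r][self._bucket(r, value)] += 1

    def merge(self, other):
        # Merge step: sketches built on disjoint partitions combine by
        # element-wise addition, which is what makes parallel evaluation possible.
        assert (self.depth, self.width) == (other.depth, other.width)
        for r in range(self.depth):
            for c in range(self.width):
                self.rows[r][c] += other.rows[r][c]
        return self

    def count(self, value):
        # Read-out: the minimum counter over all rows. Collisions can only
        # inflate a counter, so the result never underestimates the true count.
        return min(self.rows[r][self._bucket(r, value)] for r in range(self.depth))


# Two hypothetical data segments, sketched independently and then merged.
segment_a = ["x", "y", "x", "z"]
segment_b = ["x", "y", "w"]
sk_a, sk_b = CountMinSketch(), CountMinSketch()
for v in segment_a:
    sk_a.add(v)
for v in segment_b:
    sk_b.add(v)
merged = sk_a.merge(sk_b)
print(merged.count("x"))  # an upper bound on the true count of "x", which is 3
```

The merged state is identical to the one a single pass over both segments would produce, so the order in which partitions are processed does not matter. The memory used depends only on `depth` and `width`, not on the number of rows scanned.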
This module currently implements user-defined aggregates based on three main sketch methods:
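- Flajolet-Martin (FM) sketches, which approximate COUNT(DISTINCT).
- Count-Min (CM) sketches, which can approximate a number of descriptive statistics, including:
  - COUNT(*) of rows whose column value matches a given value in a set
  - COUNT(*) of rows whose column value falls in a range (*)
- Most Frequent Values (MFV) sketches, which report the most frequently occurring values in a column.

Note: Features marked with a star (*) only work for discrete types that can be cast to int8.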