User Documentation
Sketch-based Estimators

Modules

 CountMin (Cormode-Muthukrishnan)
 
 FM (Flajolet-Martin)
 
 MFV (Most Frequent Values)
 

Detailed Description

Warning
This MADlib method is still in early-stage development. There may be some issues that will be addressed in a future version. The interface and implementation are subject to change.
About:

Sketches (sometimes called "synopsis data structures") are small randomized in-memory data structures that capture statistical properties of a large set of values (e.g. a column of a table). Sketches can be formed in a single pass of the data, and used to approximate a variety of descriptive statistics.
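
To make the idea concrete, here is a minimal Count-Min sketch in Python. It is an illustration of the general technique only, not MADlib's implementation: the class name, the depth and width parameters, and the SHA-256 based hashing are all assumptions chosen for readability.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch (hypothetical; not MADlib code)."""

    def __init__(self, depth=4, width=256):
        self.depth = depth   # number of hash rows (assumed value)
        self.width = width   # counters per row (assumed value)
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, value):
        # Derive one hash per row by salting the value with the row index.
        digest = hashlib.sha256(f"{row}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, value):
        # Each value increments exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, value)] += 1

    def estimate(self, value):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._bucket(row, value)]
            for row in range(self.depth)
        )


# A stand-in for "a column of a table".
column = ["a"] * 50 + ["b"] * 30 + ["c"] * 20

sketch = CountMinSketch()
for v in column:  # single pass over the data
    sketch.add(v)
```

The sketch occupies a fixed depth × width table regardless of how many values are scanned, and `sketch.estimate("a")` never undercounts the true frequency.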

We implement sketches as SQL User-Defined Aggregates (UDAs). Because they are single-pass, small-space and parallelized, a single query can use many sketches to gather summary statistics on many columns of a table efficiently.
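
One common reason sketches parallelize well is that partial sketches built on separate data segments can be merged into the sketch of the whole. The following Python example shows this for a single-bitmap Flajolet-Martin sketch. It is a simplified illustration under stated assumptions, not MADlib's code: the function names, the SHA-256 hash, and the use of one bitmap (rather than many averaged bitmaps) are all hypothetical choices.

```python
import hashlib


def _hash64(value):
    # 64-bit hash derived from SHA-256 (an assumption for this example).
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def fm_build(values):
    """Build a one-bitmap Flajolet-Martin sketch in a single pass."""
    bitmap = 0
    for v in values:
        h = _hash64(v)
        # Index of the least-significant 1 bit (63 if the hash is zero).
        rho = (h & -h).bit_length() - 1 if h else 63
        bitmap |= 1 << rho
    return bitmap


def fm_merge(a, b):
    # Bitwise OR combines two partial sketches into the sketch of the union.
    return a | b


def fm_estimate(bitmap):
    # R is the index of the lowest unset bit; 0.77351 is the standard
    # Flajolet-Martin correction constant.
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / 0.77351


# Two overlapping "segments" of the same column, sketched independently.
segment_1 = range(0, 500)
segment_2 = range(250, 1000)

merged = fm_merge(fm_build(segment_1), fm_build(segment_2))
whole = fm_build(range(0, 1000))
```

Here `merged` and `whole` are identical bitmaps, so the distinct-count estimate does not depend on how the rows were split across workers. A single bitmap gives only a coarse estimate; practical implementations typically average over many bitmaps.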

This module currently implements user-defined aggregates based on three main sketch methods: CountMin (Cormode-Muthukrishnan), FM (Flajolet-Martin), and MFV (Most Frequent Values).

Note: Features marked with a star (*) only work for discrete types that can be cast to int8.

Implementation Notes:
The sketch methods consist of a number of SQL UDAs (user-defined aggregates) and UDFs (user-defined functions), to be used directly in SQL queries.