User Documentation
 All Files Functions Groups
FM (Flajolet-Martin)
+ Collaboration diagram for FM (Flajolet-Martin):
Warning
This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change.
About:
Flajolet-Martin's distinct count estimation implemented as a user-defined aggregate.
Usage:
  • Get the number of distinct values in a designated column.
    SELECT fmsketch_dcount(col_name) FROM table_name;
Implementation Notes:
fmsketch_dcount can be run on a column of any type. It returns an approximation to the number of distinct values (a la COUNT(DISTINCT x)), but faster and approximate. Like any aggregate, it can be combined with a GROUP BY clause to do distinct counts per group.
Examples:
  1. Generate some data:
    sql> CREATE TABLE data(class INT, a1 INT);
    sql> INSERT INTO data SELECT 1,1 FROM generate_series(1,10000);
    sql> INSERT INTO data SELECT 1,2 FROM generate_series(1,15000);
    sql> INSERT INTO data SELECT 1,3 FROM generate_series(1,10000);
    sql> INSERT INTO data SELECT 2,5 FROM generate_series(1,1000);
    sql> INSERT INTO data SELECT 2,6 FROM generate_series(1,1000);
    
  2. Find distinct number of values for each class
    sql> SELECT class,fmsketch_dcount(a1) FROM data GROUP BY data.class;
    class | fmsketch_dcount
    -------+-----------------
        2 |               2
        1 |               3
    (2 rows)
    
Literature:
[1] P. Flajolet and N.G. Martin. Probabilistic counting algorithms for data base applications, Journal of Computer and System Sciences 31(2), pp 182-209, 1985. http://algo.inria.fr/flajolet/Publications/FlMa85.pdf
See Also
File sketch.sql_in documenting the SQL function.