User Documentation
bayes.sql_in File Reference

SQL functions for naive Bayes.

Functions

void create_nb_prepared_data_tables (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, integer numAttrs, varchar featureProbsDestName, varchar classPriorsDestName)
 Precompute all class priors and feature probabilities.
void create_nb_classify_view (varchar featureProbsSource, varchar classPriorsSource, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 Create a view with columns (key, nb_classification).
void create_nb_probs_view (varchar featureProbsSource, varchar classPriorsSource, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 Create a view with columns (key, class, nb_prob).
void create_nb_classify_fn (varchar featureProbsSource, varchar classPriorsSource, integer numAttrs, varchar destName)
 Create a SQL function mapping arrays of attribute values to the Naive Bayes classification.

Detailed Description

Date:
January 2011
See also:
For a brief introduction to Naive Bayes Classification, see the module description Naive Bayes Classification.

Definition in file bayes.sql_in.


Function Documentation

void create_nb_classify_fn ( varchar  featureProbsSource,
varchar  classPriorsSource,
integer  numAttrs,
varchar  destName 
)

The created SQL function is bound to the given feature probabilities and class priors. Its declaration will be:

FUNCTION destName (attributes INTEGER[], smoothingFactor DOUBLE PRECISION) RETURNS INTEGER[]

The return type is INTEGER[] because the Naive Bayes classification might be ambiguous, in which case all of the most likely candidates are returned.

Parameters:
featureProbsSource  Name of table with precomputed feature probabilities, as created with create_nb_prepared_data_tables()
classPriorsSource   Name of table with precomputed class priors, as created with create_nb_prepared_data_tables()
numAttrs            Number of attributes to use for classification
destName            Name of the function to create
Note:
Like create_nb_classify_view and create_nb_probs_view, create_nb_classify_fn can also be called in an ad-hoc fashion. See Naive Bayes Classification for instructions.
Usage:
  1. Create classification function:
    SELECT create_nb_classify_fn(
        'featureProbsSource', 'classPriorsSource',
        numAttrs, 'destName'
    );
  2. Run classification function:
    SELECT destName(attributes, smoothingFactor);
Note:
On Greenplum, the generated SQL function can only be called on the master.
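
As a minimal sketch of the two usage steps above with concrete names: the tables nb_feature_probs and nb_class_priors and the function name nb_classify_fn are hypothetical, the two tables are assumed to have been created beforehand with create_nb_prepared_data_tables(), and the functions are assumed to be reachable on the search path. The smoothing factor of 1.0 is only an illustrative choice.

```sql
-- Hypothetical names throughout; nb_feature_probs and nb_class_priors are
-- assumed to exist already (created by create_nb_prepared_data_tables()).
SELECT create_nb_classify_fn(
    'nb_feature_probs', 'nb_class_priors',
    3, 'nb_classify_fn'
);

-- Classify a single attribute vector of length 3.
-- The result is an INTEGER[]; it holds more than one class if there is a tie.
SELECT nb_classify_fn(ARRAY[1, 2, 3], 1.0);
```

Because the result is an array, callers that need a single class have to decide how to handle ties themselves, for example by taking the first element.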

Definition at line 581 of file bayes.sql_in.

void create_nb_classify_view ( varchar  featureProbsSource,
varchar  classPriorsSource,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)

The created relation will be

{TABLE|VIEW} destName (key, nb_classification)

where nb_classification is an array containing the most likely class(es) of the record in classifySource identified by key.

Parameters:
featureProbsSource  Name of table with precomputed feature probabilities, as created with create_nb_prepared_data_tables()
classPriorsSource   Name of table with precomputed class priors, as created with create_nb_prepared_data_tables()
classifySource      Name of the relation that contains data to be classified
classifyKeyColumn   Name of column in classifySource that can serve as unique identifier (the key of the source relation)
classifyAttrColumn  Name of attributes-array column in classifySource
numAttrs            Number of attributes to use for classification
destName            Name of the view to create
Note:
create_nb_classify_view can be called in an ad-hoc fashion. See Naive Bayes Classification for instructions.
Usage:
  1. Create Naive Bayes classifications view:
    SELECT create_nb_classify_view(
        'featureProbsName', 'classPriorsName',
        'classifySource', 'classifyKeyColumn', 'classifyAttrColumn',
        numAttrs, 'destName'
    );
  2. Show Naive Bayes classifications:
    SELECT * FROM destName;
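
The following sketch fills in the placeholders above with hypothetical names (nb_to_classify, nb_feature_probs, nb_class_priors, nb_classify_view) and assumes the two precomputed tables already exist and that the functions are reachable on the search path. It also shows one way to find records whose classification is ambiguous, relying only on nb_classification being an array as described above.

```sql
-- Hypothetical relation holding the records to classify.
CREATE TABLE nb_to_classify (
    id INTEGER,
    attributes INTEGER[]
);
INSERT INTO nb_to_classify VALUES
    (1, ARRAY[1, 2, 3]),
    (2, ARRAY[0, 2, 1]);

-- Create the classification view from the precomputed tables.
SELECT create_nb_classify_view(
    'nb_feature_probs', 'nb_class_priors',
    'nb_to_classify', 'id', 'attributes',
    3, 'nb_classify_view'
);

-- List only the records for which more than one class is most likely.
SELECT key, nb_classification
FROM nb_classify_view
WHERE array_upper(nb_classification, 1) > 1;
```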

Definition at line 447 of file bayes.sql_in.

void create_nb_prepared_data_tables ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
integer  numAttrs,
varchar  featureProbsDestName,
varchar  classPriorsDestName 
)

Feature probabilities are stored in a table of format

TABLE featureProbsDestName (
    class INTEGER,
    attr INTEGER,
    value INTEGER,
    cnt INTEGER,
    attr_cnt INTEGER
)

Class priors are stored in a table of format

TABLE classPriorsDestName (
    class INTEGER,
    class_cnt INTEGER,
    all_cnt INTEGER
)
Parameters:
trainingSource        Name of relation containing the training data
trainingClassColumn   Name of class column in training data
trainingAttrColumn    Name of attributes-array column in training data
numAttrs              Number of attributes to use for classification
featureProbsDestName  Name of feature-probabilities table to create
classPriorsDestName   Name of class-priors table to create
Usage:
Precompute feature probabilities and class priors:
SELECT create_nb_prepared_data_tables(
    'trainingSource', 'trainingClassColumn', 'trainingAttrColumn',
    numAttrs, 'featureProbsName', 'classPriorsName'
);
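
A minimal end-to-end sketch of this step, assuming a hypothetical training table nb_training with integer-coded classes and attributes (all table and column names here are illustrative, and the functions are assumed to be reachable on the search path). The final query assumes, as the column names suggest, that class_cnt is the number of training rows in a class and all_cnt is the total number of training rows.

```sql
-- Hypothetical training data: one class label and three attributes per row.
CREATE TABLE nb_training (
    class INTEGER,
    attributes INTEGER[]
);
INSERT INTO nb_training VALUES
    (1, ARRAY[1, 2, 3]),
    (1, ARRAY[1, 2, 1]),
    (2, ARRAY[0, 2, 2]);

-- Precompute the feature-probabilities and class-priors tables.
SELECT create_nb_prepared_data_tables(
    'nb_training', 'class', 'attributes',
    3, 'nb_feature_probs', 'nb_class_priors'
);

-- Inspect the empirical class priors derived from the stored counts
-- (assumes class_cnt / all_cnt is the fraction of rows in each class).
SELECT class, class_cnt::FLOAT8 / all_cnt AS prior
FROM nb_class_priors
ORDER BY class;
```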

Definition at line 398 of file bayes.sql_in.

void create_nb_probs_view ( varchar  featureProbsSource,
varchar  classPriorsSource,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)

The created view will be of the following form:

VIEW destName (
    key ANYTYPE,
    class INTEGER,
    nb_prob FLOAT8
)

where nb_prob is the Naive-Bayes probability that class is the true class of the record in classifySource identified by key.

Parameters:
featureProbsSource  Name of table with precomputed feature probabilities, as created with create_nb_prepared_data_tables()
classPriorsSource   Name of table with precomputed class priors, as created with create_nb_prepared_data_tables()
classifySource      Name of the relation that contains data to be classified
classifyKeyColumn   Name of column in classifySource that can serve as unique identifier (the key of the source relation)
classifyAttrColumn  Name of attributes-array column in classifySource
numAttrs            Number of attributes to use for classification
destName            Name of the view to create
Note:
create_nb_probs_view can be called in an ad-hoc fashion. See Naive Bayes Classification for instructions.
Usage:
  1. Create Naive Bayes probabilities view:
    SELECT create_nb_probs_view(
        'featureProbsName', 'classPriorsName',
        'classifySource', 'classifyKeyColumn', 'classifyAttrColumn',
        numAttrs, 'destName'
    );
  2. Show Naive Bayes probabilities:
    SELECT * FROM destName;
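
As an illustration with hypothetical names (nb_to_classify with columns id and attributes, nb_feature_probs, nb_class_priors, nb_probs_view), the view can be created and then queried so that each record's classes appear from most to least probable. The source relation and the two precomputed tables are assumed to exist already, and the functions are assumed to be reachable on the search path.

```sql
-- Create the probabilities view; all relation names are hypothetical.
SELECT create_nb_probs_view(
    'nb_feature_probs', 'nb_class_priors',
    'nb_to_classify', 'id', 'attributes',
    3, 'nb_probs_view'
);

-- For every key, list the candidate classes ordered by descending probability.
SELECT key, class, nb_prob
FROM nb_probs_view
ORDER BY key, nb_prob DESC;
```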

Definition at line 514 of file bayes.sql_in.