1 /* ----------------------------------------------------------------------- *//**
2  *
3  * @file bayes.sql_in
4  *
5  * @brief SQL functions for naive Bayes
6  * @date January 2011
7  *
8  * @sa For a brief introduction to Naive Bayes Classification, see the module
9  * description \ref grp_bayes.
10  *
11  *//* ----------------------------------------------------------------------- */
12 
13 m4_include(`SQLCommon.m4')
14 
15 /**
16 @addtogroup grp_bayes
17 
18 \warning <em> This MADlib method is still in early-stage development. There may
19 be issues that will be addressed in a future version. The interface and
20 implementation are subject to change. </em>
21 
22 @about
23 
24 Naive Bayes refers to a stochastic model where all independent variables
25 \f$ a_1, \dots, a_n \f$ (often referred to as attributes in this context)
26 independently contribute to the probability that a data point belongs to a
27 certain class \f$ c \f$. In detail, \b Bayes' theorem states that
28 \f[
29  \Pr(C = c \mid A_1 = a_1, \dots, A_n = a_n)
30  = \frac{\Pr(C = c) \cdot \Pr(A_1 = a_1, \dots, A_n = a_n \mid C = c)}
31  {\Pr(A_1 = a_1, \dots, A_n = a_n)}
32  \,,
33 \f]
34 and the \b naive assumption is that
35 \f[
36  \Pr(A_1 = a_1, \dots, A_n = a_n \mid C = c)
37  = \prod_{i=1}^n \Pr(A_i = a_i \mid C = c)
38  \,.
39 \f]
40 Naive Bayes classification estimates feature probabilities and class priors
41 using maximum likelihood or Laplace smoothing. These parameters are then
42 used to classify new data.
43 
44 A Naive Bayes classifier computes the following formula:
45 \f[
46  \text{classify}(a_1, ..., a_n)
47  = \arg\max_c \left\{
48  \Pr(C = c) \cdot \prod_{i=1}^n \Pr(A_i = a_i \mid C = c)
49  \right\}
50 \f]
51 where \f$ c \f$ ranges over all classes in the training data and probabilities
52 are estimated as relative frequencies from the training set.
53 There are different ways to estimate the feature probabilities
54 \f$ P(A_i = a \mid C = c) \f$. The maximum likelihood estimate takes the
55 relative frequencies. That is:
56 \f[
57  P(A_i = a \mid C = c) = \frac{\#(c,i,a)}{\#c}
58 \f]
59 where
60 - \f$ \#(c,i,a) \f$ denotes the # of training samples where attribute \f$ i \f$
61  is \f$ a \f$ and class is \f$ c \f$
62 - \f$ \#c \f$ denotes the # of training samples where class is \f$ c \f$.
63 
64 Since maximum likelihood sometimes results in estimates of "0", you might
65 want to use a "smoothed" estimate. To do this, you add a number of "virtual"
66 samples and assume that these samples are evenly distributed among
67 the values assumed by attribute \f$ i \f$ (that is, the set of all values
68 observed for attribute \f$ i \f$ for any class):
69 
70 \f[
71  P(A_i = a \mid C = c) = \frac{\#(c,i,a) + s}{\#c + s \cdot \#i}
72 \f]
73 where
74 - \f$ \#i \f$ denotes the # of distinct values for attribute \f$ i \f$ (for all
75  classes)
76 - \f$ s \geq 0 \f$ denotes the smoothing factor.
77 
78 The case \f$ s = 1 \f$ is known as "Laplace smoothing". The case \f$ s = 0 \f$
79 trivially reduces to maximum-likelihood estimates.
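As an illustration (in Python rather than SQL, and not part of MADlib), the smoothed estimator can be sketched as follows; `samples` is a hypothetical list of `(class, attributes)` pairs:

```python
def feature_prob(samples, attr_idx, value, cls, s=1.0):
    """Smoothed estimate of P(A_i = value | C = cls).

    samples: list of (class, attributes) pairs.
    s: smoothing factor; s=0 gives the maximum-likelihood estimate,
       s=1 gives Laplace smoothing.
    """
    # #(c,i,a): samples of class cls where attribute attr_idx equals value
    c_i_a = sum(1 for c, attrs in samples if c == cls and attrs[attr_idx] == value)
    # #c: samples of class cls
    c_cnt = sum(1 for c, _ in samples if c == cls)
    # #i: distinct values of attribute attr_idx across all classes
    i_cnt = len({attrs[attr_idx] for _, attrs in samples})
    return (c_i_a + s) / (c_cnt + s * i_cnt)
```

With `s=0` the function reduces to the relative frequency \f$ \#(c,i,a) / \#c \f$.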
80 
81 \b Note:
82 (1) The probabilities computed on PostgreSQL and on Greenplum Database
83 may differ slightly due to the nature of floating-point
84 computation. Usually this is not important. However, if a data point has
85 \f[
86 P(C=c_i \mid A) \approx P(C=c_j \mid A)
87 \f]
88 for two classes, this data point might be classified into different classes on
89 PostgreSQL and on Greenplum. This leads to differences in classification
90 between PostgreSQL and Greenplum for some data sets, but it should not
91 affect the quality of the results.
92 
93 (2) When two classes have equal and highest probability among all classes,
94 the classification result is an array of these two classes, but the order
95 of the two classes is random.
96 
97 (3) The current implementation of Naive Bayes classification is only suitable
98 for discrete (categorical) attributes.
99 
100 For continuous data, a common assumption, especially for small datasets,
101 is that the continuous values associated with each class are distributed
102 according to a Gaussian distribution,
103 and then the probabilities \f$ P(A_i = a \mid C=c) \f$ can be estimated.
104 Another common technique for handling continuous values, which is better for
105 large data sets, is to use binning to discretize the values, and convert the
106 continuous data into categorical bins. These approaches are not currently
107 implemented but are planned for future releases.
108 
109 (4) One can still provide floating point data to the naive Bayes
110 classification function. Floating point numbers can be used as symbolic
111 substitutions for categorical data. The classification works best if
112 there are sufficient data points for each floating-point attribute. However,
113 if floating point numbers are used as continuous data, no warning is raised and
114 the result may not be as expected.
115 
116 @input
117 
118 The <b>training data</b> is expected to be of the following form:
119 <pre>{TABLE|VIEW} <em>trainingSource</em> (
120  ...
121  <em>trainingClassColumn</em> INTEGER,
122  <em>trainingAttrColumn</em> INTEGER[],
123  ...
124 )</pre>
125 
126 The <b>data to classify</b> is expected to be of the following form:
127 <pre>{TABLE|VIEW} <em>classifySource</em> (
128  ...
129  <em>classifyKeyColumn</em> ANYTYPE,
130  <em>classifyAttrColumn</em> INTEGER[],
131  ...
132 )</pre>
133 
134 @usage
135 
136 - Precompute feature probabilities and class priors:
137  <pre>SELECT \ref create_nb_prepared_data_tables(
138  '<em>trainingSource</em>', '<em>trainingClassColumn</em>', '<em>trainingAttrColumn</em>',
139  <em>numAttrs</em>, '<em>featureProbsName</em>', '<em>classPriorsName</em>'
140  );</pre>
141  This creates table <em>featureProbsName</em> for storing feature
142  probabilities and table <em>classPriorsName</em> for storing the class priors.
143 - Perform Naive Bayes classification:
144  <pre>SELECT \ref create_nb_classify_view(
145  '<em>featureProbsName</em>', '<em>classPriorsName</em>',
146  '<em>classifySource</em>', '<em>classifyKeyColumn</em>', '<em>classifyAttrColumn</em>',
147  <em>numAttrs</em>, '<em>destName</em>'
148  );</pre>
149  This creates the view <tt><em>destName</em></tt> mapping
150  <em>classifyKeyColumn</em> to the Naive Bayes classification:
151  <pre>key | nb_classification
152 ----+------------------
153 ...</pre>
154 - Compute Naive Bayes probabilities:
155  <pre>SELECT \ref create_nb_probs_view(
156  '<em>featureProbsName</em>', '<em>classPriorsName</em>',
157  '<em>classifySource</em>', '<em>classifyKeyColumn</em>', '<em>classifyAttrColumn</em>',
158  <em>numAttrs</em>, '<em>destName</em>'
159 );</pre>
160  This creates the view <tt><em>destName</em></tt> mapping
161  <em>classifyKeyColumn</em> and every single class to the Naive Bayes
162  probability:
163  <pre>key | class | nb_prob
164 ----+-------+--------
165 ...</pre>
166 - Ad-hoc execution (no precomputation):
167  Functions \ref create_nb_classify_view and
168  \ref create_nb_probs_view can be used in an ad-hoc fashion without the above
169  precomputation step. In this case, replace the function arguments
170  <pre>'<em>featureProbsName</em>', '<em>classPriorsName</em>'</pre>
171  with
172  <pre>'<em>trainingSource</em>', '<em>trainingClassColumn</em>', '<em>trainingAttrColumn</em>'</pre>
173 
174 @examp
175 
176 The following is an extremely simplified example of the precomputation
177 workflow described under @usage above, which can be verified by hand.
178 
179 -# The training and the classification data:
180 \verbatim
181 sql> SELECT * FROM training;
182  id | class | attributes
183 ----+-------+------------
184  1 | 1 | {1,2,3}
185  2 | 1 | {1,2,1}
186  3 | 1 | {1,4,3}
187  4 | 2 | {1,2,2}
188  5 | 2 | {0,2,2}
189  6 | 2 | {0,1,3}
190 (6 rows)
191 
192 sql> SELECT * FROM toclassify;
193  id | attributes
194 ----+------------
195  1 | {0,2,1}
196  2 | {1,2,3}
197 (2 rows)
198 \endverbatim
199 -# Precompute feature probabilities and class priors
200 \verbatim
201 sql> SELECT madlib.create_nb_prepared_data_tables(
202 'training', 'class', 'attributes', 3, 'nb_feature_probs', 'nb_class_priors');
203 \endverbatim
204 -# Optionally check the contents of the precomputed tables:
205 \verbatim
206 sql> SELECT * FROM nb_class_priors;
207  class | class_cnt | all_cnt
208 -------+-----------+---------
209  1 | 3 | 6
210  2 | 3 | 6
211 (2 rows)
212 
213 sql> SELECT * FROM nb_feature_probs;
214  class | attr | value | cnt | attr_cnt
215 -------+------+-------+-----+----------
216  1 | 1 | 0 | 0 | 2
217  1 | 1 | 1 | 3 | 2
218  1 | 2 | 1 | 0 | 3
219  1 | 2 | 2 | 2 | 3
220 ...
221 \endverbatim
222 -# Create the view with Naive Bayes classification and check the results:
223 \verbatim
224 sql> SELECT madlib.create_nb_classify_view (
225 'nb_feature_probs', 'nb_class_priors', 'toclassify', 'id', 'attributes', 3, 'nb_classify_view_fast');
226 
227 sql> SELECT * FROM nb_classify_view_fast;
228  key | nb_classification
229 -----+-------------------
230  1 | {2}
231  2 | {1}
232 (2 rows)
233 \endverbatim
234 -# Look at the probabilities for each class (note that we use Laplace smoothing):
235 \verbatim
236 sql> SELECT madlib.create_nb_probs_view (
237 'nb_feature_probs', 'nb_class_priors', 'toclassify', 'id', 'attributes', 3, 'nb_probs_view_fast');
238 
239 sql> SELECT * FROM nb_probs_view_fast;
240  key | class | nb_prob
241 -----+-------+---------
242  1 | 1 | 0.4
243  1 | 2 | 0.6
244  2 | 1 | 0.75
245  2 | 2 | 0.25
246 (4 rows)
247 \endverbatim
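These figures can be checked by hand. A short Python sketch (an illustration, not part of MADlib) that recomputes the Laplace-smoothed, normalized probabilities from the example training set:

```python
# Example training set from above: (class, attributes) pairs.
TRAIN = [
    (1, (1, 2, 3)), (1, (1, 2, 1)), (1, (1, 4, 3)),
    (2, (1, 2, 2)), (2, (0, 2, 2)), (2, (0, 1, 3)),
]

def unnormalized(attrs, cls, s=1.0):
    """P(C=cls) times the product of smoothed P(A_i = attrs[i] | C=cls)."""
    c_cnt = sum(1 for c, _ in TRAIN if c == cls)
    p = c_cnt / len(TRAIN)  # class prior
    for i, a in enumerate(attrs):
        c_i_a = sum(1 for c, t in TRAIN if c == cls and t[i] == a)
        i_cnt = len({t[i] for _, t in TRAIN})  # distinct values of attribute i
        p *= (c_i_a + s) / (c_cnt + s * i_cnt)
    return p

def nb_prob(attrs, cls):
    """Normalized probability, as reported in nb_probs_view_fast."""
    classes = {c for c, _ in TRAIN}
    return unnormalized(attrs, cls) / sum(unnormalized(attrs, c) for c in classes)
```

For example, `nb_prob((0, 2, 1), 2)` evaluates to 0.6 and `nb_prob((1, 2, 3), 1)` to 0.75, matching the view output above.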
248 
249 @literature
250 
251 [1] Tom Mitchell: Machine Learning, McGraw Hill, 1997. Book chapter
252  <em>Generative and Discriminative Classifiers: Naive Bayes and Logistic
253  Regression</em>, available at: http://www.cs.cmu.edu/~tom/NewChapters.html
254 
255 [2] Wikipedia, Naive Bayes classifier,
256  http://en.wikipedia.org/wiki/Naive_Bayes_classifier
257 
258 @sa File bayes.sql_in documenting the SQL functions.
259 
260 @internal
261 @sa namespace bayes (documenting the implementation in Python)
262 @endinternal
263 
264 */
265 
266 -- Begin of argmax definition
267 
268 CREATE TYPE MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE AS (
269  args INTEGER[],
270  value DOUBLE PRECISION
271 );
272 
273 CREATE FUNCTION MADLIB_SCHEMA.argmax_transition(
274  oldmax MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE,
275  newkey INTEGER,
276  newvalue DOUBLE PRECISION)
277 RETURNS MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE AS
278 $$
279  SELECT CASE WHEN $3 < $1.value OR $2 IS NULL OR ($3 IS NULL AND NOT $1.value IS NULL) THEN $1
280  WHEN $3 = $1.value OR ($3 IS NULL AND $1.value IS NULL AND NOT $1.args IS NULL)
281  THEN ($1.args || $2, $3)::MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE
282  ELSE (array[$2], $3)::MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE
283  END
284 $$
285 LANGUAGE sql IMMUTABLE;
286 
287 CREATE FUNCTION MADLIB_SCHEMA.argmax_combine(
288  max1 MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE,
289  max2 MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE)
290 RETURNS MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE AS
291 $$
292  -- If SQL guaranteed short-circuit evaluation, the following could become
293  -- shorter. Unfortunately, this is not the case.
294  -- Section 6.3.3.3 of ISO/IEC 9075-1:2008 Framework (SQL/Framework):
295  --
296  -- "However, it is implementation-dependent whether expressions are
297  -- actually evaluated left to right, particularly when operands or
298  -- operators might cause conditions to be raised or if the results of the
299  -- expressions can be determined without completely evaluating all parts
300  -- of the expression."
301  --
302  -- Again, we rely on the optimizer to do its job.
303  SELECT CASE WHEN $1 IS NULL THEN $2
304  WHEN $2 IS NULL THEN $1
305  WHEN ($1.value = $2.value) OR ($1.value IS NULL AND $2.value IS NULL)
306  THEN ($1.args || $2.args, $1.value)::MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE
307  WHEN $1.value IS NULL OR $1.value < $2.value THEN $2
308  ELSE $1
309  END
310 $$
311 LANGUAGE sql IMMUTABLE;
312 
313 CREATE FUNCTION MADLIB_SCHEMA.argmax_final(
314  finalstate MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE)
315 RETURNS INTEGER[] AS
316 $$
317  SELECT $1.args
318 $$
319 LANGUAGE sql IMMUTABLE;
320 
321 /**
322  * @internal
323  * @brief Argmax: Return the key of the row for which value is maximal
324  *
325  * The "index set" of the argmax function is of type INTEGER and we range over
326  * DOUBLE PRECISION values. It is not required that all keys are distinct.
327  *
328  * @note
329  * argmax does not exploit indexes; its running time is always
330  * \f$ \Theta(n) \f$, so it should only be used where a full scan is acceptable.
331  *
332  * @implementation
333  * The implementation is in SQL, with a flavor of functional programming.
334  * The hope is that the optimizer does a good job here.
335  */
336 CREATE AGGREGATE MADLIB_SCHEMA.argmax(/*+ key */ INTEGER, /*+ value */ DOUBLE PRECISION) (
337  SFUNC=MADLIB_SCHEMA.argmax_transition,
338  STYPE=MADLIB_SCHEMA.ARGS_AND_VALUE_DOUBLE,
339  m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.argmax_combine,')
340  FINALFUNC=MADLIB_SCHEMA.argmax_final
341 );
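A minimal Python sketch of the aggregate's semantics (an illustration only; the SQL NULL cases handled by the transition function above are omitted):

```python
def argmax(pairs):
    """Collect every key whose value equals the maximum, mirroring the
    tie-collecting behavior of MADLIB_SCHEMA.argmax."""
    best_keys, best_val = [], None
    for key, value in pairs:
        if best_val is None or value > best_val:
            # New maximum: restart the key list
            best_keys, best_val = [key], value
        elif value == best_val:
            # Tie: append the key, preserving all maximizers
            best_keys.append(key)
    return best_keys
```

This is why the Naive Bayes classification column is an array: when two classes tie for the highest probability, both keys are returned.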
342 
343 
344 /**
345  * @brief Precompute all class priors and feature probabilities
346  *
347  * Feature probabilities are stored in a table of format
348  * <pre>TABLE <em>featureProbsDestName</em> (
349  * class INTEGER,
350  * attr INTEGER,
351  * value INTEGER,
352  * cnt INTEGER,
353  * attr_cnt INTEGER
354  *)</pre>
355  *
356  * Class priors are stored in a table of format
357  * <pre>TABLE <em>classPriorsDestName</em> (
358  * class INTEGER,
359  * class_cnt INTEGER,
360  * all_cnt INTEGER
361  *)</pre>
362  *
363  * @param trainingSource Name of relation containing the training data
364  * @param trainingClassColumn Name of class column in training data
365  * @param trainingAttrColumn Name of attributes-array column in training data
366  * @param numAttrs Number of attributes to use for classification
367  * @param featureProbsDestName Name of feature-probabilities table to create
368  * @param classPriorsDestName Name of class-priors table to create
369  *
370  * @usage
371  * Precompute feature probabilities and class priors:
372  * <pre>SELECT \ref create_nb_prepared_data_tables(
373  * '<em>trainingSource</em>', '<em>trainingClassColumn</em>', '<em>trainingAttrColumn</em>',
374  * <em>numAttrs</em>, '<em>featureProbsName</em>', '<em>classPriorsName</em>'
375  *);</pre>
376  *
377  * @internal
378  * @sa This function is a wrapper for bayes::create_prepared_data().
379  */
380 CREATE FUNCTION MADLIB_SCHEMA.create_nb_prepared_data_tables(
381  "trainingSource" VARCHAR,
382  "trainingClassColumn" VARCHAR,
383  "trainingAttrColumn" VARCHAR,
384  "numAttrs" INTEGER,
385  "featureProbsDestName" VARCHAR,
386  "classPriorsDestName" VARCHAR)
387 RETURNS VOID
388 AS $$PythonFunction(bayes, bayes, create_prepared_data_table)$$
389 LANGUAGE plpythonu VOLATILE;
390 
391 /**
392  * @brief Create a view with columns <tt>(key, nb_classification)</tt>
393  *
394  * The created relation will be
395  *
396  * <tt>{TABLE|VIEW} <em>destName</em> (key, nb_classification)</tt>
397  *
398  * where \c nb_classification is an array containing the most likely
399  * class(es) of the record in \em classifySource identified by \c key.
400  *
401  * @param featureProbsSource Name of table with precomputed feature
402  * probabilities, as created with create_nb_prepared_data_tables()
403  * @param classPriorsSource Name of table with precomputed class priors, as
404  * created with create_nb_prepared_data_tables()
405  * @param classifySource Name of the relation that contains data to be classified
406  * @param classifyKeyColumn Name of column in \em classifySource that can
407  * serve as unique identifier (the key of the source relation)
408  * @param classifyAttrColumn Name of attributes-array column in \em classifySource
409  * @param numAttrs Number of attributes to use for classification
410  * @param destName Name of the view to create
411  *
412  * @note \c create_nb_classify_view can be called in an ad-hoc fashion. See
413  * \ref grp_bayes for instructions.
414  *
415  * @usage
416  * -# Create Naive Bayes classifications view:
417  * <pre>SELECT \ref create_nb_classify_view(
418  * '<em>featureProbsName</em>', '<em>classPriorsName</em>',
419  * '<em>classifySource</em>', '<em>classifyKeyColumn</em>', '<em>classifyAttrColumn</em>',
420  * <em>numAttrs</em>, '<em>destName</em>'
421  *);</pre>
422  * -# Show Naive Bayes classifications:
423  * <pre>SELECT * FROM <em>destName</em>;</pre>
424  *
425  * @internal
426  * @sa This function is a wrapper for bayes::create_classification(). See there
427  * for details.
428  */
429 CREATE FUNCTION MADLIB_SCHEMA.create_nb_classify_view(
430  "featureProbsSource" VARCHAR,
431  "classPriorsSource" VARCHAR,
432  "classifySource" VARCHAR,
433  "classifyKeyColumn" VARCHAR,
434  "classifyAttrColumn" VARCHAR,
435  "numAttrs" INTEGER,
436  "destName" VARCHAR)
437 RETURNS VOID
438 AS $$PythonFunction(bayes, bayes, create_classification_view)$$
439 LANGUAGE plpythonu VOLATILE;
440 
441 CREATE FUNCTION MADLIB_SCHEMA.create_nb_classify_view(
442  "trainingSource" VARCHAR,
443  "trainingClassColumn" VARCHAR,
444  "trainingAttrColumn" VARCHAR,
445  "classifySource" VARCHAR,
446  "classifyKeyColumn" VARCHAR,
447  "classifyAttrColumn" VARCHAR,
448  "numAttrs" INTEGER,
449  "destName" VARCHAR)
450 RETURNS VOID
451 AS $$PythonFunction(bayes, bayes, create_classification_view)$$
452 LANGUAGE plpythonu VOLATILE;
453 
454 
455 /**
456  * @brief Create view with columns <tt>(key, class, nb_prob)</tt>
457  *
458  * The created view will be of the following form:
459  *
460  * <pre>VIEW <em>destName</em> (
461  * key ANYTYPE,
462  * class INTEGER,
463  * nb_prob FLOAT8
464  *)</pre>
465  *
466  * where \c nb_prob is the Naive-Bayes probability that \c class is the true
467  * class of the record in \em classifySource identified by \c key.
468  *
469  * @param featureProbsSource Name of table with precomputed feature
470  * probabilities, as created with create_nb_prepared_data_tables()
471  * @param classPriorsSource Name of table with precomputed class priors, as
472  * created with create_nb_prepared_data_tables()
473  * @param classifySource Name of the relation that contains data to be classified
474  * @param classifyKeyColumn Name of column in \em classifySource that can
475  * serve as unique identifier (the key of the source relation)
476  * @param classifyAttrColumn Name of attributes-array column in \em classifySource
477  * @param numAttrs Number of attributes to use for classification
478  * @param destName Name of the view to create
479  *
480  * @note \c create_nb_probs_view can be called in an ad-hoc fashion. See
481  * \ref grp_bayes for instructions.
482  *
483  * @usage
484  * -# Create Naive Bayes probabilities view:
485  * <pre>SELECT \ref create_nb_probs_view(
486  * '<em>featureProbsName</em>', '<em>classPriorsName</em>',
487  * '<em>classifySource</em>', '<em>classifyKeyColumn</em>', '<em>classifyAttrColumn</em>',
488  * <em>numAttrs</em>, '<em>destName</em>'
489  *);</pre>
490  * -# Show Naive Bayes probabilities:
491  * <pre>SELECT * FROM <em>destName</em>;</pre>
492  *
493  * @internal
494  * @sa This function is a wrapper for bayes::create_bayes_probabilities().
495  */
496 CREATE FUNCTION MADLIB_SCHEMA.create_nb_probs_view(
497  "featureProbsSource" VARCHAR,
498  "classPriorsSource" VARCHAR,
499  "classifySource" VARCHAR,
500  "classifyKeyColumn" VARCHAR,
501  "classifyAttrColumn" VARCHAR,
502  "numAttrs" INTEGER,
503  "destName" VARCHAR)
504 RETURNS VOID
505 AS $$PythonFunction(bayes, bayes, create_bayes_probabilities_view)$$
506 LANGUAGE plpythonu VOLATILE;
507 
508 CREATE FUNCTION MADLIB_SCHEMA.create_nb_probs_view(
509  "trainingSource" VARCHAR,
510  "trainingClassColumn" VARCHAR,
511  "trainingAttrColumn" VARCHAR,
512  "classifySource" VARCHAR,
513  "classifyKeyColumn" VARCHAR,
514  "classifyAttrColumn" VARCHAR,
515  "numAttrs" INTEGER,
516  "destName" VARCHAR)
517 RETURNS VOID
518 AS $$PythonFunction(bayes, bayes, create_bayes_probabilities_view)$$
519 LANGUAGE plpythonu VOLATILE;
520 
521 
522 /**
523  * @brief Create a SQL function mapping arrays of attribute values to the Naive
524  * Bayes classification.
525  *
526  * The created SQL function is bound to the given feature probabilities and
527  * class priors. Its declaration will be:
528  *
529  * <tt>
530  * FUNCTION <em>destName</em> (attributes INTEGER[], smoothingFactor DOUBLE PRECISION)
531  * RETURNS INTEGER[]</tt>
532  *
533  * The return type is \c INTEGER[] because the Naive Bayes classification might
534  * be ambiguous (in which case all of the most likely candidates are returned).
535  *
536  * @param featureProbsSource Name of table with precomputed feature
537  * probabilities, as created with create_nb_prepared_data_tables()
538  * @param classPriorsSource Name of table with precomputed class priors, as
539  * created with create_nb_prepared_data_tables()
540  * @param numAttrs Number of attributes to use for classification
541  * @param destName Name of the function to create
542  *
543  * @note
544  * Just like \ref create_nb_classify_view and \ref create_nb_probs_view,
545  * also \c create_nb_classify_fn can be called in an ad-hoc fashion. See
546  * \ref grp_bayes for instructions.
547  *
548  * @usage
549  * -# Create classification function:
550  * <pre>SELECT create_nb_classify_fn(
551  * '<em>featureProbsSource</em>', '<em>classPriorsSource</em>',
552  * <em>numAttrs</em>, '<em>destName</em>'
553  *);</pre>
554  * -# Run classification function:
555  * <pre>SELECT <em>destName</em>(<em>attributes</em>, <em>smoothingFactor</em>);</pre>
556  *
557  * @note
558  * On Greenplum, the generated SQL function can only be called on the master.
559  *
560  * @internal
561  * @sa This function is a wrapper for bayes::create_classification_function().
562  */
563 CREATE FUNCTION MADLIB_SCHEMA.create_nb_classify_fn(
564  "featureProbsSource" VARCHAR,
565  "classPriorsSource" VARCHAR,
566  "numAttrs" INTEGER,
567  "destName" VARCHAR)
568 RETURNS VOID
569 AS $$PythonFunction(bayes, bayes, create_classification_function)$$
570 LANGUAGE plpythonu VOLATILE;
571 
572 CREATE FUNCTION MADLIB_SCHEMA.create_nb_classify_fn(
573  "trainingSource" VARCHAR,
574  "trainingClassColumn" VARCHAR,
575  "trainingAttrColumn" VARCHAR,
576  "numAttrs" INTEGER,
577  "destName" VARCHAR)
578 RETURNS VOID
579 AS $$PythonFunction(bayes, bayes, create_classification_function)$$
580 LANGUAGE plpythonu VOLATILE;