User Documentation
 All Files Functions Groups
robust.sql_in
Go to the documentation of this file.
1 /* ----------------------------------------------------------------------- *//**
2  *
3  * @file robust.sql_in
4  *
5  * @brief SQL functions for linear regression
6  * @date January 2011
7  *
8  * @sa Calculates robust statistics for various regression models.
9  *
10  *//* ----------------------------------------------------------------------- */
11 
12 m4_include(`SQLCommon.m4')
13 
14 
15 /**
16 @addtogroup grp_robust
17 
18 @about
19 When doing regression analysis, we are sometimes interested in the variance of the computed coefficients \f$ \boldsymbol c \f$. While the built-in regression functions provide variance estimates, we may prefer a <i> robust </i> variance estimate.
20 
21 The robust variance calculation can be expressed in a sandwich formation, which is the form
22 \f[
23  S( \boldsymbol c) = B( \boldsymbol c) M( \boldsymbol c) B( \boldsymbol c)
24 \f]
25 where \f$ B( \boldsymbol c)\f$ and \f$ M( \boldsymbol c)\f$ are matrices. The \f$ B( \boldsymbol c) \f$ matrix,
26 also known as the bread, is relatively straight forward, and can be computed as
27 \f[
28 B( \boldsymbol c) = n\left(\sum_i^n -H(y_i, x_i, \boldsymbol c) \right)^{-1}
29 \f]
30 where \f$ H \f$ is the hessian matrix.
31 
32 The \f$ M( \boldsymbol c)\f$ matrix has several variation, each with different robustness properties.
33 The form implemented here is the Huber-White sandwich operator, which takes the form
34 \f[
35 M_{H} =\frac{1}{n} \sum_i^n \psi(y_i,x_i, \boldsymbol c)^T \psi(y_i,x_i, \boldsymbol c).
36 \f]
37 
38 
39 The above method for calculating robust variance (Huber-White estimates) is implemented for linear regression, logistic regression, and multinomial logistic regression. It is useful in calculating variances in a dataset with potentially noisy outliers. The Huber-White implemented here is identical to the "HC0" sandwich operator in the R module "sandwich".
40 
41 
42 The interface for robust linear, logistic, and multinomial logistic regression are similar, differing only in the optimal parameters. Calling the help and usage functions is identical across all three robust regressions.
43 
44 When multinomial logistic regression is computed before the multinomial robust
45 regression, it uses a default reference category of zero and the regression coefficients are included in the output table. The regression
46 coefficients in the output are in the same order as multinomial logistic
47 regression function, which is described below.
48 For a problem with
49 \f$ K \f$ dependent variables \f$ (1, ..., K) \f$ and \f$ J \f$ categories \f$ (0, ..., J-1)
50 \f$, let \f$ {m_{k,j}} \f$ denote the coefficient for dependent variable \f$ k
51 \f$ and category \f$ j \f$ . The output is \f$ {m_{k_1, j_0}, m_{k_1, j_1}
52 \ldots m_{k_1, j_{J-1}}, m_{k_2, j_0}, m_{k_2, j_1} \ldots m_{k_K, j_{J-1}}} \f$.
53 The order is NOT CONSISTENT with the multinomial regression marginal effect
54 calculation with function <em>marginal_mlogregr</em>. This is deliberate
55 because the interfaces of all multinomial regressions (robust, clustered, ...)
56 will be moved to match that used in marginal.
57 
58 
59 @input
60 
61 The training data is expected to be of the following form:
62 <pre>{TABLE|VIEW} <em>sourceName</em> (
63  <em>outputTable</em> VARCHAR,
64  <em>regressionType </em> VARCHAR,
65  <em>dependentVariable</em> VARCHAR,
66  <em>independentVariable</em> VARCHAR
67 )</pre>
68 
69 @usage
70 
71 
72 <b> The Full Interface</b>
73 
74 \warning The \b 'groupingCol' and \b 'print_warnings' input parameter for \e robust_variance_mlogregr
75 is a placeholder in the Madlib V1.0. These input parameters will be implemented in a
76 future release.
77 
78 <pre>
79 SELECT \ref madlib.robust_variance_linregr(
80  <em>'source_table'</em>, -- name of input table, VARCHAR
81  <em>'out_table'</em>, -- name of output table, VARCHAR
82  <em>'dependent_varname'</em>, -- dependent variable, VARCHAR
83  <em>'independent_varname'</em>, -- independent variables, VARCHAR
84  <em>'grouping_cols'</em> -- [OPTIONAL] grouping variables, VARCHAR
85 );
86 </pre>
87 OR
88 <pre>
89 SELECT \ref madlib.robust_variance_logregr(
90  <em>'source_table'</em>, -- name of input table, VARCHAR
91  <em>'out_table'</em>, -- name of output table, VARCHAR
92  <em>'dependent_varname'</em>, -- dependent variable, VARCHAR
93  <em>'independent_varname'</em>, -- independent variables, VARCHAR
94  <em>'grouping_cols'</em>, -- [OPTIONAL] grouping variables, VARCHAR
95  <em>max_iter</em>, -- [OPTIONAL] Integer identifying the maximum iterations used by the logistic regression solver. Default is 20, Integer
96  <em>'optimizer'</em>, -- [OPTIONAL] String identifying the optimizer used in the logistic regression. See the documentation in the logistic regression for the available options. Default is irls. VARCHAR
97  <em>tolerance</em>, -- [OPTIONAL] Float identifying the tolerance of the logistic regression optimizer. Default is 0.0001. DOUBLE PRECISION
98  <em>print_warnings</em> -- [OPTIONAL] Boolean specifying if the regression fit should print any warning messages. Default is false. BOOLEAN
99 );
100 </pre>
101 OR
102 <pre>
103 SELECT \ref madlib.robust_variance_mlogregr(
104  <em>'source_table'</em>, -- name of input table, VARCHAR
105  <em>'out_table'</em>, -- name of output table, VARCHAR
106  <em>'dependent_varname'</em>, -- dependent variable, VARCHAR
107  <em>'independent_varname'</em>, -- independent variables, VARCHAR
108  <em>ref_category</em>, -- [OPTIONAL] Integer specifying the reference category. Default is 0.
109  <em>'grouping_cols'</em>, -- [OPTIONAL] grouping variables, VARCHAR. Default is NULL.
110  <em>max_iter</em>, -- [OPTIONAL] Integer identifying the maximum iterations used by the logistic regression solver. Default is 20.
111  <em>'optimizer'</em>, -- [OPTIONAL] String identifying the optimizer used in the multinomial logistic regression. See the documentation in the multinomial logistic regression for the available options. Default is irls.
112  <em>tolerance</em>, -- [OPTIONAL] Float identifying the tolerance of the multinomial logistic regression optimizer. Default is 0.0001.
113  <em>print_warnings</em> -- [OPTIONAL] Boolean specifying if the regression fit should print any warning messages. Default is false.
114 );
115 </pre>
116 Here the <em>'independent_varname'</em> can be the name of a column, which contains
117 array of numeric values. It can also have a format of string 'array[1, x1, x2, x3]',
118 where <em>x1</em>, <em>x2</em> and <em>x3</em> are all column names.
119 
120 Output is stored in the <em>out_table</em>:
121 <pre>
122 [ coef | std_err | (z/t)-stats | p_values |
123 +------+---------+-------------+----------+
124 </pre>
125 
126 
127 @examp
128 
129 
130 -# For function summary information. Run
131 @verbatim
132 sql> select robust_variance_{linregr OR logregr OR mlogregr}('help');
133 OR
134 sql> select robust_variance_{linregr OR logregr OR mlogregr}();
135 OR
136 sql> select robust_variance_{linregr OR logregr OR mlogregr}('?');
137 @endverbatim
138 
139 -# For function usage information.
140 @verbatim
141 sql> select robust_variance_{linregr OR logregr OR mlogregr}('usage');
142 @endverbatim
143 -# Create the sample data set:
144 @verbatim
145 sql> SELECT * FROM data;
146  id | second_attack | treatment | trait_anxiety
147 ----+---------------+-----------+---------------
148  1 | 1 | 1 | 70
149  3 | 1 | 1 | 50
150  5 | 1 | 0 | 40
151  7 | 1 | 0 | 75
152  9 | 1 | 0 | 70
153  11 | 0 | 1 | 65
154  13 | 0 | 1 | 45
155  15 | 0 | 1 | 40
156  17 | 0 | 0 | 55
157 ...
158 @endverbatim
159 -# Run the logistic regression function and then compute the robust logistic variance of the regression:
160 @verbatim
161 sql> select robust_variance_logregr('patients', 'newTable', 'second_attack', 'ARRAY[1, treatment, trait_anxiety]');
162 sql> select * from newTable;
163 coef | {11.962748350258,1.37269168529894,0.00285507335100035}
164 std_err | {3.45872062333141,1.17161925782182,0.053432886418388}
165 z_stats | {-1.839833462942,-0.874094587942144,2.22793348156965}
166 p_values | {0.0657926909738772,0.382066744586027,0.0258849510756295}
167 @endverbatim
168 
169 
170 @literature
171 
172 
173 [1] vce(cluster) function in STATA: http://www.stata.com/help.cgi?vce_option
174 
175 [2] clustered estimators in R: http://people.su.se/~ma/clustering.pdf
176 
177 [3] Achim Zeileis: Object-oriented Computation of Sandwich Estimators.
178  Research Report Series / Department of Statistics and Mathematics, 37.
179  Department of Statistics and Mathematics, WU Vienna University of Economics and Business, Vienna.
180  http://cran.r-project.org/web/packages/sandwich/vignettes/sandwich-OOP.pdf
181 
182 @sa File robust.sql_in documenting the SQL functions.
183 
184 @internal
185 @sa Namespace \ref madlib::modules::regress
186  documenting the implementation in C++
187 @endinternal
188 */
189 
190 /**
191  * @internal
192  * @bDoes error checking on the input for the robust variance regressions for the mandatory inputs.
193  *
194  */
195 
196 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_input_checking(
197  source_table VARCHAR -- name of input table
198  , out_table VARCHAR -- name of output table
199  , dependent_varname VARCHAR -- name of dependent variable
200  , independent_varname VARCHAR -- name of independent variable
201 )
202 RETURNS VOID AS $$
203 BEGIN
204  IF (source_table IS NULL OR source_table = '') THEN
205  RAISE EXCEPTION 'Invalid input table name given.';
206  END IF;
207  IF (NOT MADLIB_SCHEMA.__table_exists(source_table)) THEN
208  RAISE EXCEPTION 'Input table name does not exist.';
209  END IF;
210  IF (out_table IS NULL OR out_table = '') THEN
211  RAISE EXCEPTION 'Invalid output table name given.';
212  END IF;
213  IF (MADLIB_SCHEMA.__table_exists(out_table)) THEN
214  RAISE EXCEPTION 'Output table name already exists. Drop the table before calling the function.';
215  END IF;
216  IF (independent_varname IS NULL OR independent_varname = '') THEN
217  RAISE EXCEPTION 'Invalid independent variable name given.';
218  END IF;
219  IF (dependent_varname IS NULL OR dependent_varname = '') THEN
220  RAISE EXCEPTION 'Invalid dependent variable name given.';
221  END IF;
222 END;
223 $$ LANGUAGE plpgsql VOLATILE;
224 
225 --------------------------- Robust Linear Regression ----------------------------------
226 
227 CREATE TYPE MADLIB_SCHEMA.robust_linregr_result AS (
228  coef DOUBLE PRECISION[],
229  std_err DOUBLE PRECISION[],
230  t_stats DOUBLE PRECISION[],
231  p_values DOUBLE PRECISION[]
232 );
233 
234 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_linregr_transition(
235  state MADLIB_SCHEMA.bytea8,
236  y DOUBLE PRECISION,
237  x DOUBLE PRECISION[],
238  coef DOUBLE PRECISION[])
239 RETURNS MADLIB_SCHEMA.bytea8
240 AS 'MODULE_PATHNAME'
241 LANGUAGE C
242 IMMUTABLE STRICT;
243 
244 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_linregr_merge_states(
245  state1 MADLIB_SCHEMA.bytea8,
246  state2 MADLIB_SCHEMA.bytea8)
247 RETURNS MADLIB_SCHEMA.bytea8
248 AS 'MODULE_PATHNAME'
249 LANGUAGE C
250 IMMUTABLE STRICT;
251 
252 -- Final functions
253 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_linregr_final(
254  state MADLIB_SCHEMA.bytea8)
255 RETURNS MADLIB_SCHEMA.robust_linregr_result
256 AS 'MODULE_PATHNAME'
257 LANGUAGE C IMMUTABLE STRICT;
258 
259 /**
260  * @brief Compute robust regression diagnostic statistics for linear regression.
261  *
262  * @param dependentVariable Column containing the dependent variable
263  * @param independentVariables Column containing the array of independent variables
264  * @param coef Column containing the array of the OLS coefficients (as obtained by linregr)
265  *
266  * @par
267  * To include an intercept in the model, set one coordinate in the
268  * <tt>independentVariables</tt> array to 1.
269  *
270  * @return A composite value:
271  * - <tt>std_err FLOAT8[]</tt> - Array of huber-white standard errors,
272  * \f$ \mathit{se}(c_1), \dots, \mathit{se}(c_k) \f$
273  * - <tt>t_stats FLOAT8[]</tt> - Array of t-statistics, \f$ \boldsymbol t \f$
274  * - <tt>p_values FLOAT8[]</tt> - Array of p-values, \f$ \boldsymbol p \f$
275  *
276  * @usage
277  * - Get all the diagnostic statistics:\n
278  *
279  * <pre> SELECT (robust_linregr(<em>dependentVariable</em>,
280  * <em>independentVariables</em>, coef)).*
281  * FROM (
282  * SELECT linregr(<em>dependentVariable</em>, <em>independentVariables</em>).coef
283  * ) AS ols_coef, <em>sourceName</em> as src;
284  * </pre>
285  * - Get a subset of the output columns, e.g., only the condition number
286  * and the array of p-values \f$ \boldsymbol p \f$:
287  * <pre>SELECT (lr).robust_condition_no, (lr).robust_p_values
288  *FROM (
289  * </pre>
290  * <pre> SELECT (robust_linregr(<em>dependentVariable</em>,
291  * <em>independentVariables</em>, coef)).*
292  * FROM (
293  * SELECT linregr(<em>dependentVariable</em>, <em>independentVariables</em>).coef
294  * ) AS ols_coef, <em>sourceName</em> as src
295  *) AS subq;</pre>
296  */
297 
298 CREATE AGGREGATE MADLIB_SCHEMA.robust_linregr(
299  /*+ "dependentVariable" */ DOUBLE PRECISION,
300  /*+ "independentVariables" */ DOUBLE PRECISION[],
301  /*+ "coef" */ DOUBLE PRECISION[]) (
302  SFUNC=MADLIB_SCHEMA.robust_linregr_transition,
303  STYPE=MADLIB_SCHEMA.bytea8,
304  FINALFUNC=MADLIB_SCHEMA.robust_linregr_final,
305  m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.robust_linregr_merge_states,')
306  INITCOND=''
307 );
308 
309 --------------------------- INTERNAL ---------------------------------------
310 /**
311  * @brief Return robust linear regression estimates given a set of coefficients
312 **/
313 CREATE FUNCTION MADLIB_SCHEMA.__internal_get_robust_linregr_result(
314  source_table VARCHAR -- name of input table
315  , dependent_varname VARCHAR -- name of dependent variable
316  , independent_varname VARCHAR -- name of independent variable
317  , linregr_coeffs DOUBLE PRECISION[] -- coeffs from linear regression
318 )
319 RETURNS MADLIB_SCHEMA.robust_linregr_result AS $$
320 DECLARE
321 robust_value MADLIB_SCHEMA.robust_linregr_result;
322 BEGIN
323  EXECUTE '
324  SELECT (MADLIB_SCHEMA.robust_linregr('
325  || dependent_varname || ' , '
326  || independent_varname || ' , '
327  || 'ARRAY[' || array_to_string(linregr_coeffs, ',') || '])
328  ).* FROM ' || source_table
329  INTO robust_value;
330  RETURN robust_value;
331 END
332 $$ LANGUAGE plpgsql VOLATILE;
333 
334 /**
335  * @brief Return insert string for robust linear regression
336 **/
337 
338 CREATE FUNCTION MADLIB_SCHEMA.__internal_get_robust_linregr_insert_string(
339  robust_lin_rst MADLIB_SCHEMA.robust_linregr_result
340  , linregr_coeffs DOUBLE PRECISION[] -- coeffs from linear regression
341  , out_table TEXT
342 )
343 RETURNS VARCHAR AS $$
344 DECLARE
345  insert_string VARCHAR;
346 BEGIN
347  insert_string := 'INSERT INTO ' || out_table || ' VALUES (';
348  insert_string := insert_string ||
349  CASE
350  WHEN linregr_coeffs is NULL
351  THEN '''{}'','
352  ELSE 'ARRAY[' || array_to_string(linregr_coeffs, ',') || '], '
353  END ||
354  CASE
355  WHEN (robust_lin_rst).std_err is NULL
356  THEN '''{}'','
357  ELSE 'ARRAY[' || array_to_string((robust_lin_rst).std_err, ',') || '], '
358  END ||
359  CASE
360  WHEN (robust_lin_rst).t_stats is NULL
361  THEN '''{}'','
362  ELSE 'ARRAY[' || array_to_string((robust_lin_rst).t_stats, ',') || '], '
363  END ||
364  CASE
365  WHEN (robust_lin_rst).p_values is NULL
366  THEN '''{}'','
367  ELSE 'ARRAY[' || array_to_string((robust_lin_rst).p_values, ',') || ']'
368  END;
369  RETURN insert_string;
370 END;
371 $$ LANGUAGE plpgsql VOLATILE;
372 
373 --------------------------- Interface ----------------------------------
374 
375 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_linregr(
376  usage_string VARCHAR -- usage string
377 )
378 RETURNS VARCHAR AS $$
379 DECLARE
380 insert_string VARCHAR;
381 BEGIN
382  IF (usage_string = '' OR usage_string = 'help' OR usage_string = '?') THEN
383  insert_string := '' ||
384  E'Summary \n' ||
385  E'-----------------------------------------------------------------------------------------\n' ||
386  E' Functionality: Calculate Huber-White robust statistics for linear regression\n' ||
387  E' The function first runs the regression to calculate the \n' ||
388  E' coefficients and uses them to calculate the robust statistics \n' ||
389  E' The functions robust_variance_linregr() and robust_variance_logregr() have nearly identical interfaces\n' ||
390  E' SELECT {schema_madlib}.robust_variance_linregr(''source_table'' \n' ||
391  E' ,''output_table'' \n' ||
392  E' ,''dependent_variable'' \n' ||
393  E' ,''independent_variable'' \n' ||
394  E' ,''group_cols'' \n' ||
395  E' );\n' ||
396  E'For more details on function usage: \n' ||
397  E'SELECT {schema_madlib}.robust_variance_linregr(''usage'') \n';
398  ElSIF (usage_string = 'usage') THEN
399  insert_string := '' ||
400  E'Usage\n' ||
401  E'-----------------------------------------------------------------------------------------\n' ||
402  E' To use this function \n' ||
403  E' SELECT {schema_madlib}.robust_variance_linregr( \n' ||
404  E' ''source_table'', -- Name of data table \n' ||
405  E' ''output_table'', -- Name of result table \n' ||
406  E' ''dependent_variable'', -- Name of column for dependent variables\n' ||
407  E' ''independent_variable'', -- Name of column for independent variables\n' ||
408  E' (can be any SQL expression that evaluates to an array) \n' ||
409  E' ''group_cols'', -- [OPTIONAL] Comma separated string with columns to group by. Default is NULL. \n' ||
410  E' );\n' ||
411  E'\n' ||
412  E'Output:\n' ||
413  E'-----------------------------------------------------------------------------------------\n' ||
414  E' The output table (''output_table'' above) has the following columns\n' ||
415  E' ''coef'' DOUBLE PRECISION[], -- Coefficients of regression \n' ||
416  E' ''std_err'' DOUBLE PRECISION[], -- Huber-White standard errors\n' ||
417  E' ''stats'' DOUBLE PRECISION[], -- T-stats of the standard errors\n' ||
418  E' ''p_values'' DOUBLE PRECISION[], -- p-values of the standard errors\n' ||
419  E'\n' ||
420  E'';
421  ELSE
422  insert_string := 'No such option. Run SELECT {schema_madlib}.robust_variance_linregr()';
423  END IF;
424  RETURN insert_string;
425 END;
426 $$ LANGUAGE plpgsql VOLATILE;
427 
428 
429 
430 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_linregr()
431 RETURNS VARCHAR AS $$
432 BEGIN
433  RETURN MADLIB_SCHEMA.robust_variance_linregr('');
434 END;
435 $$ LANGUAGE plpgsql VOLATILE;
436 
437 /**
438  * @brief Robust linear regression with default fit regression behaviour & no grouping
439  **/
440 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_linregr(
441  source_table VARCHAR -- name of input table
442  , out_table VARCHAR -- name of output table
443  , dependent_variable VARCHAR -- name of dependent variable
444  , independent_variable VARCHAR -- name of independent variable
445  )
446 RETURNS VOID AS $$
447 BEGIN
448  PERFORM MADLIB_SCHEMA.robust_variance_linregr(
449  source_table,
450  out_table,
451  dependent_variable,
452  independent_variable,
453  NULL);
454 END;
455 $$ LANGUAGE plpgsql VOLATILE;
456 
457 --------------------------- Robust Linear Regression ----------------------------------
458 
459 /**
460  * @brief Robust linear regression function subcall
461  **/
462 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_linregr(
463  source_table VARCHAR -- name of input table
464  , out_table VARCHAR -- name of output table
465  , dependent_varname VARCHAR -- name of dependent variable
466  , input_independent_varname VARCHAR -- name of independent variable
467  , input_group_cols VARCHAR -- grouping columns
468  )
469 RETURNS VOID AS $$
470 DECLARE
471  insert_string VARCHAR;
472  group_cols VARCHAR[];
473  regr_coef DOUBLE PRECISION[];
474  robust_lin_rst MADLIB_SCHEMA.robust_linregr_result;
475  table_creation_string VARCHAR;
476  group_string VARCHAR;
477  where_group_string VARCHAR;
478  actual_table_name VARCHAR;
479  input_table_name VARCHAR[];
480  group_array_length INTEGER;
481  each_group INTEGER;
482  robust_linregr_fitting_rst VARCHAR;
483  temp_table VARCHAR;
484  col_data_type VARCHAR;
485  schema_name VARCHAR;
486  sql VARCHAR;
487  old_msg_level TEXT;
488  independent_varname VARCHAR;
489 BEGIN
490  EXECUTE 'SELECT setting FROM pg_settings WHERE name=''client_min_messages''' INTO old_msg_level;
491  EXECUTE 'SET client_min_messages TO warning';
492 
493  PERFORM MADLIB_SCHEMA.robust_input_checking(source_table, out_table, dependent_varname, input_independent_varname);
494  independent_varname := input_independent_varname;
495 
496  --This code should be added back in should '*' support be implemented
497  /*IF (input_independent_varname = '*') THEN
498  independent_varname := MADLIB_SCHEMA.__internal_get_col_names_except_dep_variable(source_table, input_independent_varname);
499  ELSE
500  independent_varname := input_independent_varname;
501  END IF;*/
502 
503 
504  IF (input_group_cols IS NULL)
505  THEN
506  -------------------------------------------------------------------------
507  -- No grouping columns: Run regular robust variance estimates
508  -------------------------------------------------------------------------
509  -- Step 1: Create output table with appropriate column names
510  EXECUTE '
511  CREATE TABLE ' || out_table || ' (
512  coef DOUBLE PRECISION[],
513  std_err DOUBLE PRECISION[],
514  t_stats DOUBLE PRECISION[],
515  p_values DOUBLE PRECISION[])';
516  -- Step 2: Run the regression if the coefficients are not provided
517  EXECUTE '
518  SELECT (MADLIB_SCHEMA.linregr('
519  || dependent_varname || ' , '
520  || independent_varname || ')
521  ).coef FROM ' || source_table
522  INTO regr_coef;
523 
524  -- Step 3: Perform the robust linear regression
525  robust_lin_rst := MADLIB_SCHEMA.__internal_get_robust_linregr_result(
526  source_table, dependent_varname, independent_varname, regr_coef);
527 
528  -- Step 4: Insert into output table & cast infinity and nan
529  insert_string := MADLIB_SCHEMA.__internal_get_robust_linregr_insert_string(
530  robust_lin_rst, regr_coef, out_table);
531  insert_string := REGEXP_REPLACE(insert_string, 'Infinity',
532  '''Infinity''::double precision', 'gi');
533  insert_string := REGEXP_REPLACE(insert_string, 'NaN',
534  '''NaN''::double precision', 'gi');
535  EXECUTE insert_string || ')';
536 
537  ELSE
538  -------------------------------------------------------------------------
539  -- Grouping columns: Run robust variance estimates for each group
540  -------------------------------------------------------------------------
541 
542  -- Step 1: Prepare the group columns
543  -------------------------------------------------------------------------
544  group_cols = MADLIB_SCHEMA._string_to_array(input_group_cols);
545  group_array_length = array_upper(group_cols, 1);
546 
547  input_table_name = regexp_split_to_array(source_table, E'\\.');
548  IF array_upper(input_table_name, 1) = 1 THEN
549  actual_table_name = input_table_name[1];
550  schema_name := current_schema();
551  ELSIF array_upper(input_table_name, 1) = 2 THEN
552  actual_table_name = input_table_name[2];
553  schema_name = input_table_name[1];
554  ELSE
555  RAISE EXCEPTION 'Incorrect input source table name provided';
556  END IF;
557 
558  -- Check that each grouping column exists
559  FOR each_group in 1 .. group_array_length
560  LOOP
561  if NOT MADLIB_SCHEMA.check_if_col_exists(source_table,
562  group_cols[each_group]) THEN
563  RAISE EXCEPTION 'Grouping column % does not exist',
564  group_cols[each_group];
565  END IF;
566  END LOOP;
567 
568  table_creation_string := 'CREATE TABLE ' || out_table || '(';
569  FOR each_group in 1 .. group_array_length
570  LOOP
571  -- create a string that makes list of
572  EXECUTE 'SELECT data_type FROM information_schema.columns
573  WHERE
574  table_schema = ''' || schema_name || '''
575  AND table_name = ''' || actual_table_name || '''
576  AND column_name= ''' || group_cols[each_group] || ''''
577  INTO col_data_type;
578 
579  table_creation_string := table_creation_string
580  || group_cols[each_group]
581  || ' ' || col_data_type || ',';
582  END LOOP;
583 
584  -- finish creating the output table
585  EXECUTE table_creation_string || '
586  coef DOUBLE PRECISION[],
587  std_err DOUBLE PRECISION[],
588  t_stats DOUBLE PRECISION[],
589  p_values DOUBLE PRECISION[])';
590 
591  -- Step 2: Perform the linear regression on groups
592  -------------------------------------------------------------------------
593  temp_table := MADLIB_SCHEMA.__unique_string();
594  PERFORM MADLIB_SCHEMA.linregr_train(source_table,
595  temp_table,
596  dependent_varname,
597  independent_varname,
598  input_group_cols);
599 
600  -- Note: We need to alter the names on the output tables for linregr_train
601  -- so that they don't clash with the robust_variance_linregr output tables
602  -- This name changing is required to construct queries when the regressor
603  -- is also being grouped by. I don't know why anyone would do that but
604  -- it should be a mathematically valid operation.
605  FOR each_group in 1 .. group_array_length
606  LOOP
607  EXECUTE 'ALTER TABLE ' || temp_table || ' RENAME COLUMN ' ||
608  group_cols[each_group] || ' TO lin_regr_'
609  || group_cols[each_group];
610  END LOOP;
611 
612  -- Step 3: Prepare the grouping strings
613  group_string := '';
614  FOR each_group in 1 .. (group_array_length-1)
615  LOOP
616  group_string := group_string ||
617  actual_table_name || '.' ||
618  group_cols[each_group] || ',';
619  END LOOP;
620  group_string := group_string ||
621  actual_table_name || '.' ||
622  group_cols[group_array_length];
623 
624  -- Where clause string
625  where_group_string := '';
626  FOR each_group in 1 .. (group_array_length-1)
627  LOOP
628  where_group_string := where_group_string || '(' ||
629  temp_table || '.lin_regr_' ||
630  group_cols[each_group] || ' = ' ||
631  actual_table_name || '.' ||
632  group_cols[each_group] || ') AND ';
633  END LOOP;
634  where_group_string := where_group_string || '(' ||
635  temp_table || '.lin_regr_' ||
636  group_cols[group_array_length] || ' = ' ||
637  actual_table_name || '.' ||
638  group_cols[group_array_length] || ')';
639 
640  -- Step 4: Put robust statistics into the out table
641  EXECUTE
642  'INSERT INTO ' || out_table || '
643  SELECT ' || group_string || ', (result).coef, (result).std_err,
644  (result).t_stats, (result).p_values
645  FROM (
646  SELECT
647  '|| group_string ||',
648  MADLIB_SCHEMA.robust_linregr('||
649  dependent_varname ||','||
650  independent_varname || ', coef) AS result' ||
651  ' FROM '|| source_table || ',' || temp_table ||
652  ' WHERE '|| where_group_string ||
653  ' GROUP BY '|| group_string || ')' || actual_table_name;
654  EXECUTE 'DROP TABLE IF EXISTS ' || temp_table;
655  END IF;
656  --Restore message settings
657  EXECUTE 'SET client_min_messages TO '|| old_msg_level;
658 END;
659 $$ LANGUAGE plpgsql VOLATILE;
660 
661 
662 --------------------------- ROBUST LOGISTIC REGRESSION ---------------------------------------
663 
664 /**
665  * @internal
666  * @bCompute the results of the Huber-White sandwich estimator
667  *
668  */
669 
670 --DROP TYPE IF EXISTS MADLIB_SCHEMA.robust_logregr_result;
671 CREATE TYPE MADLIB_SCHEMA.robust_logregr_result AS (
672  coef DOUBLE PRECISION[],
673  std_err DOUBLE PRECISION[],
674  z_stats DOUBLE PRECISION[],
675  p_values DOUBLE PRECISION[]
676 );
677 
678 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_logregr_step_transition(
679  state DOUBLE PRECISION[],
680  y BOOLEAN,
681  x DOUBLE PRECISION[],
682  coef DOUBLE PRECISION[])
683 RETURNS DOUBLE PRECISION[]
684 AS 'MODULE_PATHNAME'
685 LANGUAGE C IMMUTABLE;
686 
687 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_logregr_step_merge_states(
688  state1 DOUBLE PRECISION[],
689  state2 DOUBLE PRECISION[])
690 RETURNS DOUBLE PRECISION[]
691 AS 'MODULE_PATHNAME'
692 LANGUAGE C IMMUTABLE STRICT;
693 
694 
695 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_logregr_step_final(
696  state DOUBLE PRECISION[])
697 RETURNS MADLIB_SCHEMA.robust_logregr_result
698 AS 'MODULE_PATHNAME'
699 LANGUAGE C IMMUTABLE STRICT;
700 
701 
702 /**
703  * @brief Compute robust regression diagnostic statistics for logistic regression.
704  *
705  * @param dependentVariable Column containing the dependent variable
706  * @param independentVariables Column containing the array of independent variables
707  * @param coef Column containing the array of the coefficients (as obtained by logregr)
708  *
709  * @par
710  * To include an intercept in the model, set one coordinate in the
711  * <tt>independentVariables</tt> array to 1.
712  *
713  * @return A composite value:
714  * - <tt>coef FLOAT8[] </tt> - The coefficients for the regression
715  * - <tt>std_err FLOAT8[]</tt> - Array of huber-white standard errors,
716  * \f$ \mathit{se}(c_1), \dots, \mathit{se}(c_k) \f$
717  * - <tt>z_stats FLOAT8[]</tt> - Array of Wald z-statistics, \f$ \boldsymbol t \f$
718  * - <tt>p_values FLOAT8[]</tt> - Array of p-values, \f$ \boldsymbol p \f$
719  *
720  * @usage
721  * - Get all the diagnostic statistics:\n
722  *
723  * <pre> SELECT robust_logregr(<em>dependentVariable</em>,
724  * <em>independentVariables</em>, coef)
725  * FROM <em>dataTable</em>;
726  * </pre>
727  */
728 
729 CREATE AGGREGATE MADLIB_SCHEMA.robust_logregr(
730  /*+ "dependentVariable" */ BOOLEAN,
731  /*+ "independentVariables" */ DOUBLE PRECISION[],
732  /*+ "coef" */ DOUBLE PRECISION[] ) (
733 
734  STYPE=DOUBLE PRECISION[],
735  SFUNC=MADLIB_SCHEMA.robust_logregr_step_transition,
736  m4_ifdef(`GREENPLUM',`prefunc=MADLIB_SCHEMA.robust_logregr_step_merge_states,')
737  FINALFUNC=MADLIB_SCHEMA.robust_logregr_step_final,
738  INITCOND='{0,0,0,0,0.0}'
739 );
740 
741 --------------------------- INTERNAL ---------------------------------------
742 /**
743  * @brief Return robust logistic regression estimates given a set of coefficients
744 **/
745 CREATE FUNCTION MADLIB_SCHEMA.__internal_get_robust_logregr_result(
746  source_table VARCHAR -- name of input table
747  , dependent_varname VARCHAR -- name of dependent variable
748  , independent_varname VARCHAR -- name of independent variable
749  , logregr_coeffs DOUBLE PRECISION[] -- coeffs from logear regression
750 )
751 RETURNS MADLIB_SCHEMA.robust_logregr_result AS $$
752 DECLARE
753 robust_value MADLIB_SCHEMA.robust_logregr_result;
754 BEGIN
755  EXECUTE 'SELECT (MADLIB_SCHEMA.robust_logregr(('|| dependent_varname || ' )::BOOLEAN, '|| independent_varname || ' , '|| 'ARRAY[' || array_to_string(logregr_coeffs, ',') || '])).* FROM ' || source_table
756  INTO robust_value;
757  RETURN robust_value;
758 END
759 $$ LANGUAGE plpgsql VOLATILE;
760 
761 /**
762  * @brief Return insert string for robust logistic regression
763 **/
764 
765 CREATE FUNCTION MADLIB_SCHEMA.__internal_get_robust_logregr_insert_string(
766  robust_log_rst MADLIB_SCHEMA.robust_logregr_result,
767  out_table TEXT
768 )
769 RETURNS VARCHAR AS $$
770 DECLARE
771  insert_string VARCHAR;
772 BEGIN
773  insert_string := 'INSERT INTO ' || out_table || ' VALUES (';
774  insert_string := insert_string ||
775  CASE
776  WHEN (robust_log_rst).coef is NULL
777  THEN '''{}'','
778  ELSE 'ARRAY[' || array_to_string((robust_log_rst).coef, ',') || '], '
779  END ||
780  CASE
781  WHEN (robust_log_rst).std_err is NULL
782  THEN '''{}'','
783  ELSE 'ARRAY[' || array_to_string((robust_log_rst).std_err, ',') || '], '
784  END ||
785  CASE
786  WHEN (robust_log_rst).z_stats is NULL
787  THEN '''{}'','
788  ELSE 'ARRAY[' || array_to_string((robust_log_rst).z_stats, ',') || '], '
789  END ||
790  CASE
791  WHEN (robust_log_rst).p_values is NULL
792  THEN '''{}'','
793  ELSE 'ARRAY[' || array_to_string((robust_log_rst).p_values, ',') || '] '
794  END;
795  RETURN insert_string;
796 END;
797 $$ LANGUAGE plpgsql VOLATILE;
798 
799 --------------------------- Interface ----------------------------------
800 
801 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
802  usage_string VARCHAR -- usage string
803 )
804 RETURNS VARCHAR AS $$
805 DECLARE
806 insert_string VARCHAR;
807 BEGIN
808  IF (usage_string = '' OR usage_string = 'help' OR usage_string = '?') THEN
809  insert_string := '' ||
810  E'Summary \n' ||
811  E'-----------------------------------------------------------------------------------------\n' ||
812  E' Functionality: Calculate Huber-White robust statistics for logistic regression\n' ||
813  E' The function first runs the regression to calculate the \n' ||
814  E' coefficients and uses them to calculate the robust statistics \n' ||
815  E' The functions robust_variance_linregr() and robust_variance_logregr() have nearly identical interfaces\n' ||
816  E' SELECT {schema_madlib}.robust_variance_logregr(''source_table'' \n' ||
817  E' ,''output_table'' \n' ||
818  E' ,''dependent_variable'' \n' ||
819  E' ,''independent_variable'' \n' ||
820  E' ,''group_cols'' \n' ||
821  E' ,''max_iter'' \n' ||
822  E' ,''optimizer'' \n' ||
823  E' ,''tolerance'' \n' ||
824  E' ,''print_warnings'' \n' ||
825  E' );\n' ||
826  E'For more details on function usage: \n' ||
827  E'SELECT {schema_madlib}.robust_variance_logregr(''usage'') \n';
828  ElSIF (usage_string = 'usage') THEN
829  insert_string := '' ||
830  E'Usage\n' ||
831  E'-----------------------------------------------------------------------------------------\n' ||
832  E' To use this function \n' ||
833  E' SELECT {schema_madlib}.robust_variance_logregr( \n' ||
834  E' ''source_table'', -- Name of data table \n' ||
835  E' ''output_table'', -- Name of result table \n' ||
836  E' ''dependent_variable'', -- Name of column for dependent variables\n' ||
837  E' ''independent_variable'', -- Name of column for independent variables\n' ||
838  E' (can be any SQL expression that evaluates to an array) \n' ||
839  E' ''group_cols'', -- [OPTIONAL] Comma separated string with columns to group by. Default is NULL. \n' ||
840  E' ''max_iter'', -- [OPTIONAL] The number of iterations used by the logistic regression solver. Default is 20. \n' ||
841  E' ''optimizer'', -- [OPTIONAL] Name of the optimizer used in the logistic regression. Default is irls. \n' ||
842  E' ''tolerance'', -- [OPTIONAL] The tolerance of the logistic regression optimizer. Default is 0.0001. \n' ||
843  E' ''print_warnings'', -- [OPTIONAL] Should the optimizer print warning messages to the screen. Default is FALSE. \n' ||
844 
845  E' );\n' ||
846  E'\n' ||
847  E'Output:\n' ||
848  E'-----------------------------------------------------------------------------------------\n' ||
849  E' The output table (''output_table'' above) has the following columns\n' ||
850  E' ''coef'' DOUBLE PRECISION[], -- Coefficients of regression \n' ||
851  E' ''std_err'' DOUBLE PRECISION[], -- Huber-White standard errors\n' ||
852  E' ''stats'' DOUBLE PRECISION[], -- Z-stats of the standard errors\n' ||
853  E' ''p_values'' DOUBLE PRECISION[], -- p-values of the standard errors\n' ||
854  E'\n' ||
855  E'';
856  ELSE
857  insert_string := 'No such option. Run SELECT {schema_madlib}.robust_variance_logregr()';
858  END IF;
859  RETURN insert_string;
860 END;
861 $$ LANGUAGE plpgsql VOLATILE;
862 
863 
864 
865 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr()
866 RETURNS VARCHAR AS $$
867 BEGIN
868  RETURN MADLIB_SCHEMA.robust_variance_logregr('');
869 END;
870 $$ LANGUAGE plpgsql VOLATILE;
871 /**
872  *@brief The robust logistic regression function.
873  *
874  * @param source_table String identifying the input table
875  * @param out_table String identifying the output table to be created
876  * @param dependent_varname Column containing the dependent variable
877  * @param independent_varname Column containing the array of independent variables
878  * @param input_group_cols Columns to group by.
879  * @param max_iter Integer identifying the maximum iterations used by the logistic regression solver. Default is 20.
880  * @param optimizer String identifying the optimizer used in the logistic regression. See the documentation in the logistic regression for the available options. Default is irls.
881  * @param tolerance Float identifying the tolerance of the logistic regression optimizer. Default is 0.0001.
882  * @param print_warnings Boolean specifying if the regression fit should print any warning messages. Default is false.
883  * @par
884  * To include an intercept in the model, set one coordinate in the
885  * <tt>independent_varname</tt> array to 1.
886  *
887  * @return A composite value:
888  * - <tt>std_err FLOAT8[]</tt> - Array of huber-white standard errors,
889  * \f$ \mathit{se}(c_1), \dots, \mathit{se}(c_k) \f$
890  * - <tt>t_stats FLOAT8[]</tt> - Array of t-statistics, \f$ \boldsymbol t \f$
891  * - <tt>p_values FLOAT8[]</tt> - Array of p-values, \f$ \boldsymbol p \f$
892  *
893  * @usage
894  * For function summary information. Run
895  * sql> select robust_variance_logregr('help');
896  * OR
897  * sql> select robust_variance_logregr();
898  * OR
899  * sql> select robust_variance_logregr('?');
900  * For function usage information. Run
901  * sql> select robust_variance_logregr('usage');
902  * - Compute the coefficients, and the get the robust diagnostic statistics:
903  * <pre>
904  * select robust_variance_logregr(source_table, out_table, regression_type, dependentVariable, independentVariables, NULL );
905  * </pre>
906  *
907  * - If the coefficients are already known, they can be provided directly
908  * <pre>select robust_variance_logregr(source_table, out_table, regression_type, dependentVariable, independentVariables, coef );
909  </pre>
910  */
911 
912 
913 
914 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
915  source_table VARCHAR, -- name of input table
916  out_table VARCHAR, -- name of output table
917  dependent_varname VARCHAR, -- name of dependent variable
918  input_independent_varname VARCHAR, -- name of independent variable
919  input_group_cols VARCHAR, -- grouping columns
920  max_iter INTEGER, -- maximum of iterations in the optimizer
921  optimizer VARCHAR, -- the optimizer used
922  tolerance DOUBLE PRECISION,
923  print_warnings BOOLEAN
924  )
925 RETURNS VOID AS $$
926 DECLARE
927  insert_string VARCHAR;
928  group_cols VARCHAR[];
929  regr_coef DOUBLE PRECISION[];
930  table_creation_string VARCHAR;
931  group_string VARCHAR;
932  where_group_string VARCHAR;
933  actual_table_name VARCHAR;
934  input_table_name VARCHAR[];
935  group_array_length INTEGER;
936  each_group INTEGER;
937  robust_logregr_fitting_rst VARCHAR;
938  temp_table VARCHAR;
939  col_data_type VARCHAR;
940  schema_name VARCHAR;
941  sql VARCHAR;
942  tempTableName VARCHAR;
943  robust_log_rst MADLIB_SCHEMA.robust_logregr_result;
944  old_msg_level TEXT;
945  independent_varname VARCHAR;
946  print_warningsString VARCHAR;
947 BEGIN
948  EXECUTE 'SELECT setting FROM pg_settings WHERE name=''client_min_messages''' INTO old_msg_level;
949  EXECUTE 'SET client_min_messages TO warning';
950 
951  PERFORM MADLIB_SCHEMA.robust_input_checking(source_table, out_table, dependent_varname, input_independent_varname);
952 
953  independent_varname := input_independent_varname;
954 
955  --This code should be added back in should '*' support be implemented
956  --IF (input_independent_varname = '*') THEN
957  -- independent_varname := MADLIB_SCHEMA.__internal_get_col_names_except_dep_variable(source_table, input_independent_varname);
958 -- ELSE
959 -- independent_varname := input_independent_varname;
960 -- END IF;
961  IF (input_group_cols IS NULL)
962  THEN
963  -- create output table with appropriate column names
964  EXECUTE '
965  CREATE TABLE ' || out_table || ' (
966  coef DOUBLE PRECISION[],
967  std_err DOUBLE PRECISION[],
968  z_stats DOUBLE PRECISION[],
969  p_values DOUBLE PRECISION[])';
970  -- Run the regression if the coefficients are not provided
971  print_warningsString := 'FALSE';
972  IF(print_warnings IS TRUE)
973  THEN
974  print_warningsString := 'TRUE';
975  END IF;
976  tempTableName := MADLIB_SCHEMA.__unique_string();
977  EXECUTE '
978  SELECT MADLIB_SCHEMA.logregr_train('
979  || ' '''||source_table||''' ' || ','
980  || ' '''||tempTableName||''' ' || ','
981  || ' '''||dependent_varname ||''' ' || ' , '
982  || ' '''||independent_varname||''' '||' , '
983  || 'NULL'|| ' , '
984  ||max_iter || ' , '
985  || ' '''||optimizer|| ''' '|| ' , '
986  || tolerance||' , '
987  || print_warningsString||
988  ')';
989  EXECUTE 'SELECT coef from ' || tempTableName INTO regr_coef;
990 
991  EXECUTE 'DROP TABLE IF EXISTS ' || tempTableName;
992 
993 
994  -- compute robust variance calculation
995  robust_log_rst := MADLIB_SCHEMA.__internal_get_robust_logregr_result(
996  source_table, dependent_varname, independent_varname, regr_coef);
997  insert_string := MADLIB_SCHEMA.__internal_get_robust_logregr_insert_string(
998  robust_log_rst, out_table);
999  -- Ensure Infinity and NaN are cast properly
1000  insert_string := REGEXP_REPLACE(insert_string, 'Infinity',
1001  '''Infinity''::double precision', 'gi');
1002  insert_string := REGEXP_REPLACE(insert_string, 'NaN',
1003  '''NaN''::double precision', 'gi');
1004  -- complete the sql string and execute
1005  EXECUTE insert_string || ')';
1006 
1007  ELSE
1008  -------------------------------------------------------------------------
1009  -- Grouping columns: Run robust variance estimates for each group
1010  -------------------------------------------------------------------------
1011 
1012  -- Step 1: Prepare the group columns
1013  -------------------------------------------------------------------------
1014  group_cols = MADLIB_SCHEMA._string_to_array(input_group_cols);
1015  group_array_length = array_upper(group_cols, 1);
1016 
1017  input_table_name = regexp_split_to_array(source_table, E'\\.');
1018  IF array_upper(input_table_name, 1) = 1 THEN
1019  actual_table_name = input_table_name[1];
1020  schema_name := current_schema();
1021  ELSIF array_upper(input_table_name, 1) = 2 THEN
1022  actual_table_name = input_table_name[2];
1023  schema_name = input_table_name[1];
1024  ELSE
1025  RAISE EXCEPTION 'Incorrect input source table name provided';
1026  END IF;
1027 
1028  -- Check that each grouping column exists
1029  FOR each_group in 1 .. group_array_length
1030  LOOP
1031  if NOT MADLIB_SCHEMA.check_if_col_exists(source_table,
1032  group_cols[each_group]) THEN
1033  RAISE EXCEPTION 'Grouping column % does not exist',
1034  group_cols[each_group];
1035  END IF;
1036  END LOOP;
1037 
1038  table_creation_string := 'CREATE TABLE ' || out_table || '(';
1039  FOR each_group in 1 .. group_array_length
1040  LOOP
1041  -- create a string that makes list of
1042  EXECUTE 'SELECT data_type FROM information_schema.columns
1043  WHERE
1044  table_schema = ''' || schema_name || '''
1045  AND table_name = ''' || actual_table_name || '''
1046  AND column_name= ''' || group_cols[each_group] || ''''
1047  INTO col_data_type;
1048 
1049  table_creation_string := table_creation_string
1050  || group_cols[each_group]
1051  || ' ' || col_data_type || ',';
1052  END LOOP;
1053 
1054  -- finish creating the output table
1055  EXECUTE table_creation_string || '
1056  coef DOUBLE PRECISION[],
1057  std_err DOUBLE PRECISION[],
1058  z_stats DOUBLE PRECISION[],
1059  p_values DOUBLE PRECISION[])';
1060 
1061  -- Step 2: Perform the regression on groups
1062  -------------------------------------------------------------------------
1063  temp_table := MADLIB_SCHEMA.__unique_string();
1064  PERFORM MADLIB_SCHEMA.logregr_train(source_table,
1065  temp_table,
1066  dependent_varname,
1067  independent_varname,
1068  input_group_cols,
1069  max_iter,
1070  optimizer,
1071  tolerance,
1072  print_warnings);
1073 
1074  -- Note: We need to alter the names on the output tables for logregr_train
1075  -- so that they don't clash with the robust_variance_logregr output tables
1076  -- This name changing is required to construct queries when the regressor
1077  -- is also being grouped by. I don't know why anyone would do that but
1078  -- it should be a mathematically valid operation.
1079  FOR each_group in 1 .. group_array_length
1080  LOOP
1081  EXECUTE 'ALTER TABLE ' || temp_table || ' RENAME COLUMN ' ||
1082  group_cols[each_group] || ' TO log_regr_'
1083  || group_cols[each_group];
1084  END LOOP;
1085 
1086  -- Step 3: Prepare the grouping strings
1087  group_string := '';
1088  FOR each_group in 1 .. (group_array_length-1)
1089  LOOP
1090  group_string := group_string ||
1091  actual_table_name || '.' ||
1092  group_cols[each_group] || ',';
1093  END LOOP;
1094  group_string := group_string ||
1095  actual_table_name || '.' ||
1096  group_cols[group_array_length];
1097 
1098  -- Where clause string
1099  where_group_string := '';
1100  FOR each_group in 1 .. (group_array_length-1)
1101  LOOP
1102  where_group_string := where_group_string || '(' ||
1103  temp_table || '.log_regr_' ||
1104  group_cols[each_group] || ' = ' ||
1105  actual_table_name || '.' ||
1106  group_cols[each_group] || ') AND ';
1107  END LOOP;
1108  where_group_string := where_group_string || '(' ||
1109  temp_table || '.log_regr_' ||
1110  group_cols[group_array_length] || ' = ' ||
1111  actual_table_name || '.' ||
1112  group_cols[group_array_length] || ')';
1113 
1114  -- Step 4: Robust statistics into the out table
1115  EXECUTE
1116  'INSERT INTO ' || out_table || '
1117  SELECT ' || group_string || ', (result).coef, (result).std_err,
1118  (result).z_stats, (result).p_values
1119  FROM (
1120  SELECT
1121  '|| group_string ||',
1122  MADLIB_SCHEMA.robust_logregr('||
1123  dependent_varname ||','||
1124  independent_varname || ', coef) AS result' ||
1125  ' FROM '|| source_table || ',' || temp_table ||
1126  ' WHERE '|| where_group_string ||
1127  ' GROUP BY '|| group_string || ')' || actual_table_name;
1128  EXECUTE 'DROP TABLE IF EXISTS ' || temp_table;
1129  END IF;
1130  --Restore message settings
1131  EXECUTE 'SET client_min_messages TO '|| old_msg_level;
1132 
1133 END;
1134 $$ LANGUAGE plpgsql VOLATILE;
1135 
1136 /**
1137  * @brief Robust logistic function subcall
1138  **/
1139 
1140 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
1141  source_table VARCHAR, -- name of input table
1142  out_table VARCHAR, -- name of output table
1143  dependent_variable VARCHAR, -- name of dependent variable
1144  independent_variable VARCHAR, -- name of independent variable
1145  input_group_cols VARCHAR -- grouping columns
1146  )
1147 RETURNS VOID AS $$
1148 BEGIN
1149  PERFORM MADLIB_SCHEMA.robust_variance_logregr(
1150  source_table,
1151  out_table,
1152  dependent_variable,
1153  independent_variable,
1154  input_group_cols,
1155  20,
1156  'irls',
1157  0.0001,
1158  FALSE);
1159 END;
1160 $$ LANGUAGE plpgsql VOLATILE;
1161 /**
1162  * @brief Robust logistic regression with default fit regression behavior, and no grouping,
1163  **/
1164 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
1165  source_table VARCHAR -- name of input table
1166  ,out_table VARCHAR -- name of output table
1167  ,dependent_variable VARCHAR -- name of dependent variable
1168  ,independent_variable VARCHAR -- name of independent variable
1169  )
1170 RETURNS VOID AS $$
1171 BEGIN
1172  PERFORM MADLIB_SCHEMA.robust_variance_logregr(
1173  source_table,
1174  out_table,
1175  dependent_variable,
1176  independent_variable,
1177  NULL,
1178  20,
1179  'irls',
1180  0.0001,
1181  FALSE);
1182 END;
1183 $$ LANGUAGE plpgsql VOLATILE;
1184 
1185 
1186 
1187 
1188 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
1189  source_table VARCHAR -- name of input table
1190  , out_table VARCHAR -- name of output table
1191  , dependent_variable VARCHAR -- name of dependent variable
1192  , independent_variable VARCHAR -- name of independent variable
1193  , input_group_cols VARCHAR -- grouping columns
1194  , max_iter INTEGER -- max number of iterations
1195  )
1196 RETURNS VOID AS $$
1197 BEGIN
1198  PERFORM MADLIB_SCHEMA.robust_variance_logregr(
1199  source_table,
1200  out_table,
1201  dependent_variable,
1202  independent_variable,
1203  input_group_cols,
1204  max_iter,
1205  'irls',
1206  0.0001,
1207  FALSE);
1208 END;
1209 $$ LANGUAGE plpgsql VOLATILE;
1210 
1211 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
1212  source_table VARCHAR -- name of input table
1213  , out_table VARCHAR -- name of output table
1214  , dependent_variable VARCHAR -- name of dependent variable
1215  , independent_variable VARCHAR -- name of independent variable
1216  , input_group_cols VARCHAR -- grouping columns
1217  , max_iter INTEGER -- max number of iterations
1218  , optimizer VARCHAR -- The optimizer used in the robust regression
1219  )
1220 RETURNS VOID AS $$
1221 BEGIN
1222  PERFORM MADLIB_SCHEMA.robust_variance_logregr(
1223  source_table,
1224  out_table,
1225  dependent_variable,
1226  independent_variable,
1227  input_group_cols,
1228  max_iter,
1229  optimizer,
1230  0.0001,
1231  FALSE);
1232 END;
1233 $$ LANGUAGE plpgsql VOLATILE;
1234 
1235 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_logregr(
1236  source_table VARCHAR -- name of input table
1237  , out_table VARCHAR -- name of output table
1238  , dependent_variable VARCHAR -- name of dependent variable
1239  , independent_variable VARCHAR -- name of independent variable
1240  , input_group_cols VARCHAR -- grouping columns
1241  , max_iter INTEGER -- max number of iterations
1242  , optimizer VARCHAR -- The optimizer used in the robust regression
1243  , tolerance DOUBLE PRECISION -- The tolerance of the optimizer
1244  )
1245 RETURNS VOID AS $$
1246 BEGIN
1247  PERFORM MADLIB_SCHEMA.robust_variance_logregr(
1248  source_table,
1249  out_table,
1250  dependent_variable,
1251  independent_variable,
1252  input_group_cols,
1253  max_iter,
1254  optimizer,
1255  tolerance,
1256  FALSE);
1257 END;
1258 $$ LANGUAGE plpgsql VOLATILE;
1259 
1260 
1261 
1262 
1263 
1264 
1265 --------------------------- ROBUST MULTINOMIAL LOGISTIC REGRESSION -------------------------
1266 
1267 CREATE TYPE MADLIB_SCHEMA.robust_mlogregr_result AS (
1268  ref_category INTEGER,
1269  coef DOUBLE PRECISION[],
1270  std_err DOUBLE PRECISION[],
1271  z_stats DOUBLE PRECISION[],
1272  p_values DOUBLE PRECISION[]
1273 );
1274 
1275 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mlogregr_robust_step_transition
1276 (
1277  state DOUBLE PRECISION[],
1278  y INTEGER,
1279  numCategories INTEGER,
1280  ref_category INTEGER,
1281  x DOUBLE PRECISION[],
1282  coef DOUBLE PRECISION[]
1283 )
1284 RETURNS DOUBLE PRECISION[]
1285 AS 'MODULE_PATHNAME'
1286 LANGUAGE C IMMUTABLE;
1287 
1288 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mlogregr_robust_step_merge_states
1289 (
1290  state1 DOUBLE PRECISION[],
1291  state2 DOUBLE PRECISION[]
1292 )
1293 RETURNS DOUBLE PRECISION[]
1294 AS 'MODULE_PATHNAME'
1295 LANGUAGE C IMMUTABLE STRICT;
1296 
1297 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mlogregr_robust_step_final
1298 (
1299  state DOUBLE PRECISION[]
1300 )
1301 RETURNS MADLIB_SCHEMA.robust_mlogregr_result
1302 AS 'MODULE_PATHNAME'
1303 LANGUAGE C IMMUTABLE STRICT;
1304 
1305 
1306 /**
1307  * @brief Compute robust regression diagnostic statistics for multinomial logistic regression.
1308  *
1309  * @param source_table name of input table, VARCHAR
1310  * @param out_table name of output table, VARCHAR
1311  * @param dependent_varname dependent variable, VARCHAR
1312  * @param independent_varname independent variables, VARCHAR
1313  * @param ref_category Integer specifying the reference category. Default is 0.
1314  * @param grouping_cols grouping variables, VARCHAR. Default is NULL. Currently a placeholder.
1315  * @param max_iter Integer identifying the maximum iterations used by the logistic regression solver. Default is 20.
1316  * @param optimizer String identifying the optimizer used in the multinomial logistic regression. See the documentation in the multinomial logistic regression for the available options. Default is 'irls'.
1317  * @param tolerance Float identifying the tolerance of the multinomial logistic regression optimizer. Default is 0.0001.
1318  * @param print_warnings Boolean specifying if the multinomial logistic regression solver should print any warnings. Currently a placeholder.
1319  *
1320  *
1321  * @par
1322  * To include an intercept in the model, set one coordinate in the
1323  * <tt>independentVariables</tt> array to 1.
1324  *
1325  * @usage
1326  * <pre>
1327  * SELECT madlib.robust_variance_mlogregr(
1328  * <em>'source_table'</em>, -- name of input table, VARCHAR
1329  * <em>'out_table'</em>, -- name of output table, VARCHAR
1330  * <em>'dependent_varname'</em>, -- dependent variable, VARCHAR
1331  * <em>'independent_varname'</em>, -- independent variables, VARCHAR
1332  * <em>ref_category</em>, -- [OPTIONAL] Integer specifying the reference category. Default is 0.
1333  * <em>'grouping_cols'</em>, -- [OPTIONAL] grouping variables, VARCHAR. Default is NULL.
1334  * <em>max_iter</em>, -- [OPTIONAL] Integer identifying the maximum iterations used by the logistic regression solver. Default is 20.
1335  * <em>'optimizer'</em>, -- [OPTIONAL] String identifying the optimizer used in the multinomial logistic regression. See the documentation in the multinomial logistic regression for the available options. Default is irls.
1336  * <em>tolerance</em>, -- [OPTIONAL] Float identifying the tolerance of the multinomial logistic regression optimizer. Default is 0.0001.
1337  * <em>print_warnings</em> -- [OPTIONAL] Boolean specifying if the regression fit should print any warning messages. Default is false.
1338  * );
1339  * </pre>
1340  *
1341  * @return A composite value:
1342  * - <tt>ref_category INTEGER</tt> - The reference category
1343  * - <tt>coef FLOAT8[] </tt> - The coefficients for the regression
1344  * - <tt>std_err FLOAT8[]</tt> - Array of huber-white standard errors,
1345  * - <tt>z_stats FLOAT8[]</tt> - Array of Wald z-statistics,
1346  * - <tt>p_values FLOAT8[]</tt> - Array of p-values,
1347  *
1348  */
1349 
1350 
1351 
1352 
1353  CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1354  source_table VARCHAR, -- name of input table
1355  out_table VARCHAR, -- name of output table
1356  dependent_varname VARCHAR, -- name of dependent variable
1357  independent_varname VARCHAR, -- name of independent variable
1358  ref_category INTEGER, -- reference category (default 0)
1359  input_group_cols VARCHAR, -- grouping columns (PLACEHOLDER) (default NULL)
1360  max_iter INTEGER, -- max number of itertions (default 20)
1361  optimizer VARCHAR, -- optimizer ['irls', 'newton'] (default 'irls')
1362  tolerance DOUBLE PRECISION, -- tolerance (default 0.0001)
1363  print_warnings BOOLEAN -- PLACEHOLDER (default False)
1364 )
1365 RETURNS VOID AS $$
1366 DECLARE
1367  num_category INTEGER;
1368  regr_coef DOUBLE PRECISION[];
1369  robust_value MADLIB_SCHEMA.robust_mlogregr_result;
1370  insert_string VARCHAR;
1371  old_msg_level VARCHAR;
1372 BEGIN
1373  EXECUTE 'SELECT setting FROM pg_settings WHERE name=''client_min_messages''' INTO old_msg_level;
1374  EXECUTE 'SET client_min_messages TO warning';
1375 
1376  PERFORM MADLIB_SCHEMA.robust_input_checking(source_table, out_table, dependent_varname, independent_varname);
1377  -- Run the regression if the coefficients are not provided
1378  -- Will do further parameter checking in the regression function
1379  EXECUTE
1380  $sql$SELECT coef FROM MADLIB_SCHEMA.mlogregr('$sql$
1381  || source_table || $sql$', '$sql$
1382  || dependent_varname || $sql$', '$sql$
1383  || independent_varname || $sql$', $sql$
1384  || max_iter || $sql$, '$sql$
1385  || optimizer || $sql$', $sql$
1386  || tolerance || ','
1387  || ref_category || ')'
1388  INTO regr_coef;
1389 
1390  EXECUTE
1391  'SELECT count(DISTINCT ' || dependent_varname|| ') FROM ' ||
1392  textin(regclassout(source_table))
1393  INTO num_category;
1394 
1395  -- compute robust variance calculation
1396  --robust_log_rst := MADLIB_SCHEMA.__internal_get_robust_mlogregr_result(
1397  -- source_table, dependent_varname, num_category, ref_category,
1398  -- independent_varname, regr_coef);
1399  EXECUTE
1400  'SELECT (MADLIB_SCHEMA.robust_mlogregr('
1401  || dependent_varname || ', '
1402  || num_category || ', '
1403  || ref_category || ', '
1404  || independent_varname || ', '
1405  || 'ARRAY[' || array_to_string(regr_coef, ',') ||
1406  '])).* FROM ' || source_table
1407  INTO robust_value;
1408  insert_string :=
1409  MADLIB_SCHEMA.__internal_get_robust_mlogregr_insert_string(
1410  robust_value, out_table);
1411 
1412  -- Ensure Infinity and NaN are cast properly
1413  insert_string := REGEXP_REPLACE(
1414  insert_string, 'Infinity', '''Infinity''::double precision', 'gi');
1415  insert_string := REGEXP_REPLACE(
1416  insert_string, 'NaN', '''NaN''::double precision', 'gi');
1417 
1418  -- create output table with appropriate column names
1419  EXECUTE
1420  'CREATE TABLE ' || out_table || ' (
1421  ref_category INTEGER,
1422  coef DOUBLE PRECISION[],
1423  std_err DOUBLE PRECISION[],
1424  z_stats DOUBLE PRECISION[],
1425  p_values DOUBLE PRECISION[])';
1426 
1427  -- complete the sql string and execute
1428  EXECUTE insert_string || ')';
1429 
1430  --Restore message settings
1431  EXECUTE 'SET client_min_messages TO '|| old_msg_level;
1432 END;
1433 $$ LANGUAGE plpgsql VOLATILE;
1434 
1435 
1436  /**
1437  * @brief Robust multinomial logistic function subcall
1438  * @param dependentVariable Column containing the dependent variable
1439  * @param numCategories Number of categories in the dependent variable
1440  * @param ref_category Reference category for modeling
1441  * @param independentVariables Column containing the array of independent variables
1442  * @param coef Column containing the array of the coefficients (as obtained by mlogregr)
1443  * @usage
1444  * <pre>
1445  * SELECT robust_mlogregr(
1446  * <em>dependentVariable</em>,
1447  * <em>numCategories</em>,
1448  * <em>ref_category</em>,
1449  * <em>independentVariables</em>,
1450  * <em>coef</em>)
1451  * FROM <em>dataTable</em>;
1452  * </pre>
1453  * @return A composite value:
1454  * - <tt>ref_category INTEGER</tt> - The reference category
1455  * - <tt>coef FLOAT8[] </tt> - The coefficients for the regression
1456  * - <tt>std_err FLOAT8[]</tt> - Array of huber-white standard errors,
1457  * \f$ \mathit{se}(c_1), \dots, \mathit{se}(c_k) \f$
1458  * - <tt>z_stats FLOAT8[]</tt> - Array of Wald z-statistics, \f$ \boldsymbol t \f$
1459  * - <tt>p_values FLOAT8[]</tt> - Array of p-values, \f$ \boldsymbol p \f$
1460  *
1461  */
1462 
1463 CREATE AGGREGATE MADLIB_SCHEMA.robust_mlogregr(
1464  /*+ "dependentVariable" */ INTEGER,
1465  /*+ "numCategories" */ INTEGER,
1466  /*+ "ref_category" */ INTEGER,
1467  /*+ "independentVariables" */ DOUBLE PRECISION[],
1468  /*+ "coef" */ DOUBLE PRECISION[]
1469 )
1470 (
1471  STYPE=DOUBLE PRECISION[],
1472  SFUNC=MADLIB_SCHEMA.mlogregr_robust_step_transition,
1473  m4_ifdef(`GREENPLUM',`prefunc=MADLIB_SCHEMA.mlogregr_robust_step_merge_states,')
1474  FINALFUNC=MADLIB_SCHEMA.mlogregr_robust_step_final,
1475  INITCOND='{0,0,0,0,0}'
1476 );
1477 
1478 
1479 
1480 
1481 --------------------------- INTERNAL ---------------------------------------
1482 /**
1483  * @brief Return insert string for robust multinomial logistic regression
1484 **/
1485 CREATE FUNCTION MADLIB_SCHEMA.__internal_get_robust_mlogregr_insert_string(
1486  robust_mlog_rst MADLIB_SCHEMA.robust_mlogregr_result,
1487  out_table TEXT
1488 )
1489 RETURNS VARCHAR AS $$
1490 DECLARE
1491  insert_string VARCHAR;
1492 BEGIN
1493  insert_string := 'INSERT INTO ' || out_table || ' VALUES (';
1494  insert_string := insert_string ||
1495  (robust_mlog_rst).ref_category || ', ' ||
1496  CASE
1497  WHEN (robust_mlog_rst).coef is NULL
1498  THEN '''{}'','
1499  ELSE 'ARRAY[' || array_to_string((robust_mlog_rst).coef, ',') || '], '
1500  END ||
1501  CASE
1502  WHEN (robust_mlog_rst).std_err is NULL
1503  THEN '''{}'','
1504  ELSE 'ARRAY[' || array_to_string((robust_mlog_rst).std_err, ',') || '], '
1505  END ||
1506  CASE
1507  WHEN (robust_mlog_rst).z_stats is NULL
1508  THEN '''{}'','
1509  ELSE 'ARRAY[' || array_to_string((robust_mlog_rst).z_stats, ',') || '], '
1510  END ||
1511  CASE
1512  WHEN (robust_mlog_rst).p_values is NULL
1513  THEN '''{}'','
1514  ELSE 'ARRAY[' || array_to_string((robust_mlog_rst).p_values, ',') || '] '
1515  END;
1516  RETURN insert_string;
1517 END;
1518 $$ LANGUAGE plpgsql VOLATILE;
1519 
1520 --------------------------- Interface ----------------------------------
1521 
1522 
1523 
1524 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1525  usage_string VARCHAR -- usage string
1526 )
1527 RETURNS VARCHAR AS $$
1528 DECLARE
1529 insert_string VARCHAR;
1530 BEGIN
1531  IF (usage_string = '' OR usage_string = 'help' OR usage_string = '?') THEN
1532  insert_string := '' ||
1533  E'Summary \n' ||
1534  E'-----------------------------------------------------------------------------------------\n' ||
1535  E' Functionality: Calculate Huber-White robust statistics for multinomial logistic regression\n' ||
1536  E' The funciton first runs the regression to calculate the \n' ||
1537  E' coefficients and uses them to calculate the robust statistics \n' ||
1538  E' SELECT {schema_madlib}.robust_variance_mlogregr(''source_table'' \n' ||
1539  E' ,''output_table'' \n' ||
1540  E' ,''dependent_variable'' \n' ||
1541  E' ,''independent_variable'' \n' ||
1542  E' ,''reference_category'' \n' ||
1543  E' ,''group_cols'' \n' ||
1544  E' ,''max_iter'' \n' ||
1545  E' ,''optimizer'' \n' ||
1546  E' ,''tolerance'' \n' ||
1547  E' ,''verbose'' \n' ||
1548  E' );\n' ||
1549  E'For more details on function usage: \n' ||
1550  E'SELECT {schema_madlib}.robust_variance_mlogregr(''usage'') \n';
1551  ElSIF (usage_string = 'usage') THEN
1552  insert_string := '' ||
1553  E'Usage\n' ||
1554  E'-----------------------------------------------------------------------------------------\n' ||
1555  E' To use this function \n' ||
1556  E' SELECT {schema_madlib}.robust_variance_mlogregr( \n' ||
1557  E' ''source_table'', -- Name of data table\n' ||
1558  E' ''output_table'', -- Name of result table\n' ||
1559  E' ''dependent_variable'', -- Name of column for dependent variables\n' ||
1560  E' ''independent_variable'', -- Name of column for independent variables\n' ||
1561  E' (can be any SQL expression that evaluates to an array) \n' ||
1562  E' ''reference_category'', -- [OPTIONAL] Reference category. Default is 0. \n' ||
1563  E' ''group_cols'', -- [OPTIONAL] Comma separated string with columns to group by. Default is NULL. \n' ||
1564  E' ''max_iter'', -- [OPTIONAL] The number of iterations used by the logistic regression solver. Default is 20. \n' ||
1565  E' ''optimizer'', -- [OPTIONAL] Name of the optimizer used in the logistic regression. Default is irls. \n' ||
1566  E' ''tolerance'', -- [OPTIONAL] The tolerance of the logistic regression optimizer. Default is 0.0001. \n' ||
1567  E' ''verbose'' -- [OPTIONAL] Should the optimizer print warning messages to the screen. Default is FALSE. \n' ||
1568  E' );\n' ||
1569  E'\n' ||
1570  E'Output:\n' ||
1571  E'-----------------------------------------------------------------------------------------\n' ||
1572  E' The output table (''output_table'' above) has the following columns\n' ||
1573  E' ''ref_category" INTEGER, -- Reference category\n' ||
1574  E' ''coef'' DOUBLE PRECISION[], -- Coefficients of regression \n' ||
1575  E' ''std_err'' DOUBLE PRECISION[], -- Huber-White standard errors\n' ||
1576  E' ''z_stats'' DOUBLE PRECISION[], -- Z-stats of the standard errors\n' ||
1577  E' ''p_values'' DOUBLE PRECISION[], -- p-values of the standard errors\n' ||
1578  E'\n' ||
1579  E'';
1580  ELSE
1581  insert_string := 'No such option. Run SELECT {schema_madlib}.robust_variance_mlogregr()';
1582  END IF;
1583  RETURN insert_string;
1584 END;
1585 $$ LANGUAGE plpgsql VOLATILE;
1586 
1587 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr()
1588 RETURNS VARCHAR AS $$
1589 BEGIN
1590  RETURN MADLIB_SCHEMA.robust_variance_mlogregr('');
1591 END;
1592 $$ LANGUAGE plpgsql VOLATILE;
1593 
1594 --------------------------- Robust Multinomial Logistic Regression ----------------------------------
1595 
1596 
1597 
1598 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1599  source_table VARCHAR,
1600  out_table VARCHAR,
1601  dependent_varname VARCHAR,
1602  independent_varname VARCHAR,
1603  ref_category INTEGER,
1604  input_group_cols VARCHAR,
1605  max_iter INTEGER,
1606  optimizer VARCHAR,
1607  tolerance DOUBLE PRECISION
1608 )
1609 RETURNS VOID AS $$
1610  SELECT MADLIB_SCHEMA.robust_variance_mlogregr($1, $2, $3, $4, $5, $6, $7,
1611  $8, $9, FALSE);
1612 $$ LANGUAGE sql;
1613 
1614 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1615  source_table VARCHAR,
1616  out_table VARCHAR,
1617  dependent_varname VARCHAR,
1618  independent_varname VARCHAR,
1619  ref_category INTEGER,
1620  input_group_cols VARCHAR,
1621  max_iter INTEGER,
1622  optimizer VARCHAR
1623 )
1624 RETURNS VOID AS $$
1625  SELECT MADLIB_SCHEMA.robust_variance_mlogregr($1, $2, $3, $4, $5, $6, $7,
1626  $8, 0.0001, FALSE);
1627 $$ LANGUAGE sql;
1628 
1629 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1630  source_table VARCHAR,
1631  out_table VARCHAR,
1632  dependent_varname VARCHAR,
1633  independent_varname VARCHAR,
1634  ref_category INTEGER,
1635  input_group_cols VARCHAR,
1636  max_iter INTEGER
1637 )
1638 RETURNS VOID AS $$
1639  SELECT MADLIB_SCHEMA.robust_variance_mlogregr($1, $2, $3, $4, $5, $6, $7,
1640  'irls', 0.0001, FALSE);
1641 $$ LANGUAGE sql;
1642 
1643 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1644  source_table VARCHAR,
1645  out_table VARCHAR,
1646  dependent_varname VARCHAR,
1647  independent_varname VARCHAR,
1648  ref_category INTEGER,
1649  input_group_cols VARCHAR
1650 )
1651 RETURNS VOID AS $$
1652  SELECT MADLIB_SCHEMA.robust_variance_mlogregr($1, $2, $3, $4, $5, $6, 20,
1653  'irls', 0.0001, FALSE);
1654 $$ LANGUAGE sql;
1655 
1656 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1657  source_table VARCHAR,
1658  out_table VARCHAR,
1659  dependent_varname VARCHAR,
1660  independent_varname VARCHAR,
1661  ref_category INTEGER
1662 )
1663 RETURNS VOID AS $$
1664  SELECT MADLIB_SCHEMA.robust_variance_mlogregr($1, $2, $3, $4, $5, NULL, 20,
1665  'irls', 0.0001, FALSE);
1666 $$ LANGUAGE sql;
1667 
1668 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.robust_variance_mlogregr(
1669  source_table VARCHAR,
1670  out_table VARCHAR,
1671  dependent_varname VARCHAR,
1672  independent_varname VARCHAR
1673 )
1674 RETURNS VOID AS $$
1675  SELECT MADLIB_SCHEMA.robust_variance_mlogregr($1, $2, $3, $4, 0, NULL, 20,
1676  'irls', 0.0001, FALSE);
1677 $$ LANGUAGE sql;