User Documentation
 All Files Functions Groups
logistic.sql_in File Reference

SQL functions for logistic regression. More...

Go to the source code of this file.

Functions

void logregr_train (varchar tbl_source, varchar tbl_output, varchar dep_col, varchar ind_col, varchar grouping_col, integer max_iter, varchar optimizer, float8 tolerance, boolean verbose)
 Compute logistic-regression coefficients and diagnostic statistics. More...
 
float8 logistic (float8 x)
 Evaluate the usual logistic function in an under-/overflow-safe way. More...
 

Detailed Description

Date
January 2011
See Also
For a brief introduction to logistic regression, see the module description Logistic Regression.

Definition in file logistic.sql_in.

Function Documentation

float8 logistic ( float8  x)
Parameters
x
Returns
\( \frac{1}{1 + \exp(-x)} \)

Evaluating this expression directly can lead to under- or overflows. This function performs the evaluation in a safe manner, making use of the following observations:

In order for the outcome of \( \exp(x) \) to be within the range of the minimum positive double-precision number (i.e., \( 2^{-1074} \)) and the maximum positive double-precision number (i.e., \( (1 + (1 - 2^{52})) * 2^{1023}) \), \( x \) has to be within the natural logarithm of these numbers, so roughly in between -744 and 709. However, \( 1 + \exp(x) \) will just evaluate to 1 if \( \exp(x) \) is less than the machine epsilon (i.e., \( 2^{-52} \)) or, equivalently, if \( x \) is less than the natural logarithm of that; i.e., in any case if \( x \) is less than -37. Note that taking the reciprocal of the largest double-precision number will not cause an underflow. Hence, no further checks are necessary.

Definition at line 717 of file logistic.sql_in.

void logregr_train ( varchar  tbl_source,
varchar  tbl_output,
varchar  dep_col,
varchar  ind_col,
varchar  grouping_col,
integer  max_iter,
varchar  optimizer,
float8  tolerance,
boolean  verbose 
)

To include an intercept in the model, set one coordinate in the independentVariables array to 1.

Parameters
tbl_sourceName of the source relation containing the training data
tbl_outputName of the output relation to store the model results
              Columns of the output relation are as follows:
               - <tt>coef FLOAT8[]</tt> - Array of coefficients, \form#73
               - <tt>log_likelihood FLOAT8</tt> - Log-likelihood \form#74
               - <tt>std_err FLOAT8[]</tt> - Array of standard errors,
\( \mathit{se}(c_1), \dots, \mathit{se}(c_k) \)
  • z_stats FLOAT8[] - Array of Wald z-statistics, \( \boldsymbol z \)
  • p_values FLOAT8[] - Array of Wald p-values, \( \boldsymbol p \)
  • odds_ratios FLOAT8[]: Array of odds ratios, \( \mathit{odds}(c_1), \dots, \mathit{odds}(c_k) \)
  • condition_no FLOAT8 - The condition number of matrix \( X^T A X \) during the iteration immediately preceding convergence (i.e., \( A \) is computed using the coefficients of the previous iteration)
dep_colName of the dependent column (of type BOOLEAN)
ind_colName of the independent column (of type DOUBLE PRECISION[])
grouping_colComma delimited list of column names to group-by
max_iterThe maximum number of iterations
optimizerThe optimizer to use (either 'irls'/'newton' for iteratively reweighted least squares or 'cg' for conjugent gradient)
toleranceThe difference between log-likelihood values in successive iterations that should indicate convergence. This value should be non-negative and a zero value here disables the convergence criterion, and execution will only stop after maxNumIterations iterations.
verboseIf true, any error or warning message will be printed to the console (irrespective of the 'client_min_messages' set by server). If false, no error/warning message is printed to console.
Usage:
  • Get vector of coefficients \( \boldsymbol c \) and all diagnostic statistics:
    SELECT logregr_train('sourceName', 'outName'
              'dependentVariable', 'independentVariables');
             SELECT * from outName;
       
  • Get vector of coefficients \( \boldsymbol c \):
    SELECT coef from outName;
  • Get a subset of the output columns, e.g., only the array of coefficients \( \boldsymbol c \), the log-likelihood of determination \( l(\boldsymbol c) \), and the array of p-values \( \boldsymbol p \):
    SELECT coef, log_likelihood, p_values FROM outName;
Note
This function starts an iterative algorithm. It is not an aggregate function. Source, output, and column names have to be passed as strings (due to limitations of the SQL syntax).

Definition at line 614 of file logistic.sql_in.