/* ----------------------------------------------------------------------- *//**
 *
 * @file logistic.sql_in
 *
 * @brief SQL functions for logistic regression
 * @date January 2011
 *
 * @sa For a brief introduction to logistic regression, see the
 *     module description \ref grp_logreg.
 *
 *//* ----------------------------------------------------------------------- */

m4_include(`SQLCommon.m4') --'

/**
@addtogroup grp_logreg

@about

(Binomial) Logistic regression refers to a stochastic model in which the
conditional mean of the dependent dichotomous variable (usually denoted
\f$ Y \in \{ 0,1 \} \f$) is the logistic function of an affine function of the
vector of independent variables (usually denoted \f$ \boldsymbol x \f$). That
is,
\f[
    E[Y \mid \boldsymbol x] = \sigma(\boldsymbol c^T \boldsymbol x)
\f]
for some unknown vector of coefficients \f$ \boldsymbol c \f$ and where
\f$ \sigma(x) = \frac{1}{1 + \exp(-x)} \f$ is the logistic function. Logistic
regression finds the vector of coefficients \f$ \boldsymbol c \f$ that
maximizes the likelihood of the observations.

Let
- \f$ \boldsymbol y \in \{ 0,1 \}^n \f$ denote the vector of the \f$ n \f$
  observed values of the dependent variable,
- \f$ X \in \mathbf R^{n \times k} \f$ denote the design matrix with \f$ k \f$
  columns and \f$ n \f$ rows, containing all observed vectors of independent
  variables \f$ \boldsymbol x_i \f$ as rows.
By definition,
\f[
    P[Y = y_i \mid \boldsymbol x_i]
    =   \sigma((-1)^{1 - y_i} \cdot \boldsymbol c^T \boldsymbol x_i)
    \,.
\f]
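As a sanity check of the sign convention, evaluating the definition at both
values of \f$ y_i \f$ recovers the two familiar cases:
\f[
    P[Y = 1 \mid \boldsymbol x_i] = \sigma(\boldsymbol c^T \boldsymbol x_i),
    \qquad
    P[Y = 0 \mid \boldsymbol x_i] = \sigma(-\boldsymbol c^T \boldsymbol x_i)
    = 1 - \sigma(\boldsymbol c^T \boldsymbol x_i)
    \,.
\f]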
Maximizing the likelihood
\f$ \prod_{i=1}^n \Pr(Y = y_i \mid \boldsymbol x_i) \f$
is equivalent to maximizing the log-likelihood
\f$ \sum_{i=1}^n \log \Pr(Y = y_i \mid \boldsymbol x_i) \f$, which simplifies to
\f[
    l(\boldsymbol c) =
        -\sum_{i=1}^n \log(1 + \exp((-1)^{y_i}
            \cdot \boldsymbol c^T \boldsymbol x_i))
    \,.
\f]
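The simplification is the identity
\f$ \log \sigma(z) = -\log(1 + \exp(-z)) \f$ applied term by term:
\f[
    \log \Pr(Y = y_i \mid \boldsymbol x_i)
    = \log \sigma((-1)^{1 - y_i} \cdot \boldsymbol c^T \boldsymbol x_i)
    = -\log(1 + \exp((-1)^{y_i} \cdot \boldsymbol c^T \boldsymbol x_i))
    \,.
\f]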
The Hessian of this objective is \f$ H = -X^T A X \f$ where
\f$ A = \text{diag}(a_1, \dots, a_n) \f$ is the diagonal matrix with
\f[
    a_i = \sigma(\boldsymbol c^T \boldsymbol x_i)
          \cdot
          \sigma(-\boldsymbol c^T \boldsymbol x_i)
    \,.
\f]
Since \f$ H \f$ is negative semi-definite, \f$ l(\boldsymbol c) \f$ is concave,
so maximizing it is a convex optimization problem.
There are many techniques for solving convex optimization problems. Currently,
logistic regression in MADlib can use one of three algorithms:
- Iteratively reweighted least squares
- A conjugate-gradient approach, also known as the Fletcher-Reeves method in
  the literature, where we use the Hestenes-Stiefel rule for calculating the
  step size.
- Incremental gradient descent, also known as incremental gradient methods or
  stochastic gradient descent in the literature.

We estimate the standard error for coefficient \f$ i \f$ as
\f[
    \mathit{se}(c_i) = \sqrt{\left( (X^T A X)^{-1} \right)_{ii}}
    \,.
\f]
The Wald z-statistic is
\f[
    z_i = \frac{c_i}{\mathit{se}(c_i)}
    \,.
\f]

The Wald \f$ p \f$-value for coefficient \f$ i \f$ gives the probability (under
the assumptions inherent in the Wald test) of seeing a value at least as extreme
as the one observed, provided that the null hypothesis (\f$ c_i = 0 \f$) is
true. Letting \f$ F \f$ denote the cumulative distribution function of a
standard normal distribution, the Wald \f$ p \f$-value for coefficient \f$ i \f$
is therefore
\f[
    p_i = \Pr(|Z| \geq |z_i|) = 2 \cdot (1 - F( |z_i| ))
\f]
where \f$ Z \f$ is a standard normally distributed random variable.

The odds ratio for coefficient \f$ i \f$ is estimated as \f$ \exp(c_i) \f$.
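For instance, plugging in the third coefficient of the example run shown below
(\f$ c_3 \approx -0.2373 \f$, \f$ \mathit{se}(c_3) \approx 0.2945 \f$) gives
\f[
    z_3 = \frac{-0.2373}{0.2945} \approx -0.806,
    \qquad
    p_3 = 2 \cdot (1 - F(0.806)) \approx 0.420,
    \qquad
    \exp(c_3) \approx 0.789
    \,,
\f]
matching the <tt>z_stats</tt>, <tt>p_values</tt>, and <tt>odds_ratios</tt>
entries reported there.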

The condition number is computed as \f$ \kappa(X^T A X) \f$ during the iteration
immediately <em>preceding</em> convergence (i.e., \f$ A \f$ is computed using
the coefficients of the previous iteration). A large condition number (say, more
than 1000) indicates the presence of significant multicollinearity.
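Since \f$ X^T A X \f$ is symmetric and (in the non-degenerate case) positive
definite, its condition number in the spectral norm is simply the ratio of its
extreme eigenvalues:
\f[
    \kappa(X^T A X) = \frac{\lambda_{\max}(X^T A X)}{\lambda_{\min}(X^T A X)}
    \,.
\f]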
@input

The training data for logistic regression is
expected to be of the following form:\n
<pre>{TABLE|VIEW} <em>sourceName</em> (
    ...
    <em>dependentVariable</em> BOOLEAN,
    <em>independentVariables</em> FLOAT8[],
    ...
)</pre>
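
A minimal sketch of a conforming table with two rows of data (all names here
are hypothetical, and the leading 1 in each array provides an intercept term):
@verbatim
-- Hypothetical example; table and column names are illustrative only.
CREATE TABLE patients (
    id            SERIAL,
    second_attack BOOLEAN,  -- dependentVariable
    features      FLOAT8[]  -- independentVariables; features[1] = 1 for intercept
);

INSERT INTO patients (second_attack, features) VALUES
    (TRUE,  ARRAY[1, 60.0, 1.0]),
    (FALSE, ARRAY[1, 52.0, 0.0]);
@endverbatim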

@usage
Logistic regression training offers several options for what it can return:
- Get the vector of coefficients \f$ \boldsymbol c \f$ and all diagnostic
  statistics:\n
  <pre>SELECT \ref logregr_train(
    '<em>sourceName</em>', '<em>outName</em>', '<em>dependentVariable</em>',
    '<em>independentVariables</em>'[, '<em>grouping_columns</em>'
    [, <em>numberOfIterations</em> [, '<em>optimizer</em>' [, <em>precision</em>
    [, <em>verbose</em> ]]]]]
);</pre>
  Output table:
  <pre>coef | log_likelihood | std_err | z_stats | p_values | odds_ratios | condition_no | num_iterations
-----+----------------+---------+---------+----------+-------------+--------------+---------------
 ...
</pre>
- Get the vector of coefficients \f$ \boldsymbol c \f$:\n
  <pre>SELECT coef FROM outName;</pre>
- Get a subset of the output columns, e.g., only the array of coefficients
  \f$ \boldsymbol c \f$, the log-likelihood
  \f$ l(\boldsymbol c) \f$, and the array of p-values \f$ \boldsymbol p \f$:
  <pre>SELECT coef, log_likelihood, p_values FROM outName;</pre>
- By default, the option <em>verbose</em> is False. If it is set to True,
  warning messages will be output to the SQL client for groups that failed.

@examp
-# Create the sample data set:
@verbatim
sql> SELECT * FROM data;
                     r1                      | val
---------------------------------------------+-----
 {1,3.01789340097457,0.454183579888195}      | t
 {1,-2.59380532894284,0.602678326424211}     | f
 {1,-1.30643094424158,0.151587064377964}     | t
 {1,3.60722299199551,0.963550757616758}      | t
 {1,-1.52197745628655,0.0782248834148049}    | t
 {1,-4.8746574902907,0.345104880165309}      | f
...
@endverbatim
-# Run the logistic regression function:
@verbatim
sql> \x on
Expanded display is on.
sql> SELECT logregr_train('data', 'out_tbl', 'val', 'r1', Null, 100, 'irls', 0.001);
sql> SELECT * FROM out_tbl;
coef           | {5.59049410898112,2.11077546770772,-0.237276684606453}
log_likelihood | -467.214718489873
std_err        | {0.318943457652178,0.101518723785383,0.294509929481773}
z_stats        | {17.5281667482197,20.7919819024719,-0.805666162169712}
p_values       | {8.73403463417837e-69,5.11539430631541e-96,0.420435365338518}
odds_ratios    | {267.867942976278,8.2546400100702,0.788773016471171}
condition_no   | 179.186118573205
num_iterations | 9
@endverbatim
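
As a hypothetical follow-up, the fitted coefficients can be turned into
predicted probabilities with the helper function \ref logistic defined at the
end of this file (the <tt>madlib</tt> schema name is an assumption here):
@verbatim
sql> SELECT madlib.logistic(  coef[1]*r1[1]
                            + coef[2]*r1[2]
                            + coef[3]*r1[3]) AS predicted_prob
     FROM data, out_tbl;
@endverbatim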

@literature

A somewhat random selection of nice write-ups, with valuable pointers into
further literature.

[1] Cosma Shalizi: Statistics 36-350: Data Mining, Lecture Notes, 18 November
    2009, http://www.stat.cmu.edu/~cshalizi/350/lectures/26/lecture-26.pdf

[2] Thomas P. Minka: A comparison of numerical optimizers for logistic
    regression, 2003 (revised Mar 26, 2007),
    http://research.microsoft.com/en-us/um/people/minka/papers/logreg/minka-logreg.pdf

[3] Paul Komarek, Andrew W. Moore: Making Logistic Regression A Core Data Mining
    Tool With TR-IRLS, IEEE International Conference on Data Mining 2005,
    pp. 685-688, http://komarix.org/ac/papers/tr-irls.short.pdf

[4] D. P. Bertsekas: Incremental gradient, subgradient, and proximal methods for
    convex optimization: a survey, Technical report, Laboratory for Information
    and Decision Systems, 2010,
    http://web.mit.edu/dimitrib/www/Incremental_Survey_LIDS.pdf

[5] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro: Robust stochastic
    approximation approach to stochastic programming, SIAM Journal on
    Optimization, 19(4), 2009,
    http://www2.isye.gatech.edu/~nemirovs/SIOPT_RSA_2009.pdf

@sa File logistic.sql_in (documenting the SQL functions)

@internal
@sa Namespace logistic (documenting the driver/outer loop implemented in
    Python), Namespace
    \ref madlib::modules::regress documenting the implementation in C++
@endinternal
*/

DROP TYPE IF EXISTS MADLIB_SCHEMA.__logregr_result;
CREATE TYPE MADLIB_SCHEMA.__logregr_result AS (
    coef            DOUBLE PRECISION[],
    log_likelihood  DOUBLE PRECISION,
    std_err         DOUBLE PRECISION[],
    z_stats         DOUBLE PRECISION[],
    p_values        DOUBLE PRECISION[],
    odds_ratios     DOUBLE PRECISION[],
    condition_no    DOUBLE PRECISION,
    status          INTEGER,
    num_iterations  INTEGER
);

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_cg_step_transition(
    DOUBLE PRECISION[],
    BOOLEAN,
    DOUBLE PRECISION[],
    DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_cg_step_transition'
LANGUAGE C IMMUTABLE;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_irls_step_transition(
    DOUBLE PRECISION[],
    BOOLEAN,
    DOUBLE PRECISION[],
    DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_irls_step_transition'
LANGUAGE C IMMUTABLE;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_igd_step_transition(
    DOUBLE PRECISION[],
    BOOLEAN,
    DOUBLE PRECISION[],
    DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_igd_step_transition'
LANGUAGE C IMMUTABLE;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_cg_step_merge_states(
    state1 DOUBLE PRECISION[],
    state2 DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_cg_step_merge_states'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_irls_step_merge_states(
    state1 DOUBLE PRECISION[],
    state2 DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_irls_step_merge_states'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_igd_step_merge_states(
    state1 DOUBLE PRECISION[],
    state2 DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_igd_step_merge_states'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_cg_step_final(
    state DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_cg_step_final'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_irls_step_final(
    state DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_irls_step_final'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_igd_step_final(
    state DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION[]
AS 'MODULE_PATHNAME', 'logregr_igd_step_final'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

/**
 * @internal
 * @brief Perform one iteration of the conjugate-gradient method for computing
 *        logistic regression
 */
CREATE AGGREGATE MADLIB_SCHEMA.__logregr_cg_step(
    /*+ y */ BOOLEAN,
    /*+ x */ DOUBLE PRECISION[],
    /*+ previous_state */ DOUBLE PRECISION[]) (

    STYPE=DOUBLE PRECISION[],
    SFUNC=MADLIB_SCHEMA.__logregr_cg_step_transition,
    m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.__logregr_cg_step_merge_states,')
    FINALFUNC=MADLIB_SCHEMA.__logregr_cg_step_final,
    INITCOND='{0,0,0,0,0,0}'
);
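
-- Illustrative sketch (not part of the module): the Python driver issues one
-- aggregate call per outer-loop iteration, feeding back the state array that
-- the previous pass returned. Table and column names here are hypothetical:
--
--   SELECT MADLIB_SCHEMA.__logregr_cg_step(
--       dependent_col,   -- BOOLEAN response
--       independent_col, -- FLOAT8[] features
--       _previous_state  -- state returned by the preceding iteration
--   ) FROM source_table;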


/**
 * @internal
 * @brief Perform one iteration of the iteratively-reweighted-least-squares
 *        method for computing logistic regression
 */
CREATE AGGREGATE MADLIB_SCHEMA.__logregr_irls_step(
    /*+ y */ BOOLEAN,
    /*+ x */ DOUBLE PRECISION[],
    /*+ previous_state */ DOUBLE PRECISION[]) (

    STYPE=DOUBLE PRECISION[],
    SFUNC=MADLIB_SCHEMA.__logregr_irls_step_transition,
    m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.__logregr_irls_step_merge_states,')
    FINALFUNC=MADLIB_SCHEMA.__logregr_irls_step_final,
    INITCOND='{0,0,0,0}'
);

------------------------------------------------------------------------

/**
 * @internal
 * @brief Perform one iteration of the incremental gradient
 *        method for computing logistic regression
 */
CREATE AGGREGATE MADLIB_SCHEMA.__logregr_igd_step(
    /*+ y */ BOOLEAN,
    /*+ x */ DOUBLE PRECISION[],
    /*+ previous_state */ DOUBLE PRECISION[]) (

    STYPE=DOUBLE PRECISION[],
    SFUNC=MADLIB_SCHEMA.__logregr_igd_step_transition,
    m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.__logregr_igd_step_merge_states,')
    FINALFUNC=MADLIB_SCHEMA.__logregr_igd_step_final,
    INITCOND='{0,0,0,0,0}'
);

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_cg_step_distance(
    /*+ state1 */ DOUBLE PRECISION[],
    /*+ state2 */ DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION AS
'MODULE_PATHNAME', 'internal_logregr_cg_step_distance'
LANGUAGE C IMMUTABLE STRICT;

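-- Illustrative sketch (an assumption about the Python driver, not verified
-- here): the distance between successive states is compared against the
-- 'tolerance' argument of logregr_train to decide convergence, e.g.
--
--   SELECT MADLIB_SCHEMA.__logregr_cg_step_distance(_old_state, _new_state)
--          < tolerance;
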
------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_cg_result(
    /*+ state */ DOUBLE PRECISION[])
RETURNS MADLIB_SCHEMA.__logregr_result AS
'MODULE_PATHNAME', 'internal_logregr_cg_result'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_irls_step_distance(
    /*+ state1 */ DOUBLE PRECISION[],
    /*+ state2 */ DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION AS
'MODULE_PATHNAME', 'internal_logregr_irls_step_distance'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_irls_result(
    /*+ state */ DOUBLE PRECISION[])
RETURNS MADLIB_SCHEMA.__logregr_result AS
'MODULE_PATHNAME', 'internal_logregr_irls_result'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_igd_step_distance(
    /*+ state1 */ DOUBLE PRECISION[],
    /*+ state2 */ DOUBLE PRECISION[])
RETURNS DOUBLE PRECISION AS
'MODULE_PATHNAME', 'internal_logregr_igd_step_distance'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__logregr_igd_result(
    /*+ state */ DOUBLE PRECISION[])
RETURNS MADLIB_SCHEMA.__logregr_result AS
'MODULE_PATHNAME', 'internal_logregr_igd_result'
LANGUAGE C IMMUTABLE STRICT;

------------------------------------------------------------------------

/**
 * @brief Compute logistic-regression coefficients and diagnostic statistics
 *
 * To include an intercept in the model, set one coordinate in the
 * <tt>independentVariables</tt> array to 1.
 *
 * @param tbl_source Name of the source relation containing the training data
 * @param tbl_output Name of the output relation to store the model results
 *
 *        Columns of the output relation are as follows:
 *        - <tt>coef FLOAT8[]</tt> - Array of coefficients, \f$ \boldsymbol c \f$
 *        - <tt>log_likelihood FLOAT8</tt> - Log-likelihood \f$ l(\boldsymbol c) \f$
 *        - <tt>std_err FLOAT8[]</tt> - Array of standard errors,
 *          \f$ \mathit{se}(c_1), \dots, \mathit{se}(c_k) \f$
 *        - <tt>z_stats FLOAT8[]</tt> - Array of Wald z-statistics, \f$ \boldsymbol z \f$
 *        - <tt>p_values FLOAT8[]</tt> - Array of Wald p-values, \f$ \boldsymbol p \f$
 *        - <tt>odds_ratios FLOAT8[]</tt> - Array of odds ratios,
 *          \f$ \mathit{odds}(c_1), \dots, \mathit{odds}(c_k) \f$
 *        - <tt>condition_no FLOAT8</tt> - The condition number of the
 *          matrix \f$ X^T A X \f$ during the iteration
 *          immediately <em>preceding</em> convergence
 *          (i.e., \f$ A \f$ is computed using the coefficients
 *          of the previous iteration)
 * @param dep_col Name of the dependent column (of type BOOLEAN)
 * @param ind_col Name of the independent column (of type DOUBLE
 *        PRECISION[])
 * @param grouping_col Comma-delimited list of column names to group by
 * @param max_iter The maximum number of iterations
 * @param optimizer The optimizer to use (<tt>'irls'</tt>/<tt>'newton'</tt> for
 *        iteratively reweighted least squares, <tt>'cg'</tt> for conjugate
 *        gradient, or <tt>'igd'</tt> for incremental gradient descent)
 * @param tolerance The difference between log-likelihood values in successive
 *        iterations that should indicate convergence. This value should be
 *        non-negative. A zero value disables the convergence criterion, so
 *        execution stops only after \c max_iter iterations.
 * @param verbose If true, any error or warning message will be printed to the
 *        console (irrespective of the 'client_min_messages' setting of the
 *        server). If false, no error/warning message is printed to the console.
 *
 * @usage
 *  - Get the vector of coefficients \f$ \boldsymbol c \f$ and all diagnostic
 *    statistics:\n
 *    <pre>SELECT logregr_train('<em>sourceName</em>', '<em>outName</em>',
 *    '<em>dependentVariable</em>', '<em>independentVariables</em>');
 *    SELECT * FROM outName;
 *    </pre>
 *  - Get the vector of coefficients \f$ \boldsymbol c \f$:\n
 *    <pre>SELECT coef FROM outName;</pre>
 *  - Get a subset of the output columns, e.g., only the array of coefficients
 *    \f$ \boldsymbol c \f$, the log-likelihood
 *    \f$ l(\boldsymbol c) \f$, and the array of p-values \f$ \boldsymbol p \f$:
 *    <pre>SELECT coef, log_likelihood, p_values FROM outName;</pre>
 *
 * @note This function starts an iterative algorithm. It is not an aggregate
 *       function. Source, output, and column names have to be passed as strings
 *       (due to limitations of the SQL syntax).
 *
 * @internal
 * @sa This function is a wrapper for logistic::compute_logregr(), which
 *     sets the default values.
 */
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.logregr_train (
    tbl_source      VARCHAR,
    tbl_output      VARCHAR,
    dep_col         VARCHAR,
    ind_col         VARCHAR,
    grouping_col    VARCHAR,
    max_iter        INTEGER,
    optimizer       VARCHAR,
    tolerance       DOUBLE PRECISION,
    verbose         BOOLEAN
) RETURNS VOID AS $$
PythonFunction(regress, logistic, logregr_train)
$$ LANGUAGE plpythonu;
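
-- Hypothetical call with grouping (table and column names are illustrative):
--
--   SELECT MADLIB_SCHEMA.logregr_train(
--       'patients', 'patients_logregr', 'second_attack', 'features',
--       'gender', 20, 'irls', 0.0001, FALSE);
--
-- This fits one model per distinct value of the grouping column 'gender'.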

------------------------------------------------------------------------

CREATE FUNCTION MADLIB_SCHEMA.logregr_train (
    tbl_source      VARCHAR,
    tbl_output      VARCHAR,
    dep_col         VARCHAR,
    ind_col         VARCHAR)
RETURNS VOID AS $$
    SELECT MADLIB_SCHEMA.logregr_train($1, $2, $3, $4, NULL::VARCHAR, 20, 'irls', 0.0001, False);
$$ LANGUAGE sql VOLATILE;

------------------------------------------------------------------------

CREATE FUNCTION MADLIB_SCHEMA.logregr_train (
    tbl_source      VARCHAR,
    tbl_output      VARCHAR,
    dep_col         VARCHAR,
    ind_col         VARCHAR,
    grouping_col    VARCHAR)
RETURNS VOID AS $$
    SELECT MADLIB_SCHEMA.logregr_train($1, $2, $3, $4, $5, 20, 'irls', 0.0001, False);
$$ LANGUAGE sql VOLATILE;

------------------------------------------------------------------------

CREATE FUNCTION MADLIB_SCHEMA.logregr_train (
    tbl_source      VARCHAR,
    tbl_output      VARCHAR,
    dep_col         VARCHAR,
    ind_col         VARCHAR,
    grouping_col    VARCHAR,
    max_iter        INTEGER)
RETURNS VOID AS $$
    SELECT MADLIB_SCHEMA.logregr_train($1, $2, $3, $4, $5, $6, 'irls', 0.0001, False);
$$ LANGUAGE sql VOLATILE;

------------------------------------------------------------------------

CREATE FUNCTION MADLIB_SCHEMA.logregr_train (
    tbl_source      VARCHAR,
    tbl_output      VARCHAR,
    dep_col         VARCHAR,
    ind_col         VARCHAR,
    grouping_col    VARCHAR,
    max_iter        INTEGER,
    optimizer       VARCHAR)
RETURNS VOID AS $$
    SELECT MADLIB_SCHEMA.logregr_train($1, $2, $3, $4, $5, $6, $7, 0.0001, False);
$$ LANGUAGE sql VOLATILE;

------------------------------------------------------------------------

CREATE FUNCTION MADLIB_SCHEMA.logregr_train (
    tbl_source      VARCHAR,
    tbl_output      VARCHAR,
    dep_col         VARCHAR,
    ind_col         VARCHAR,
    grouping_col    VARCHAR,
    max_iter        INTEGER,
    optimizer       VARCHAR,
    tolerance       DOUBLE PRECISION)
RETURNS VOID AS $$
    SELECT MADLIB_SCHEMA.logregr_train($1, $2, $3, $4, $5, $6, $7, $8, False);
$$ LANGUAGE sql VOLATILE;


------------------------------------------------------------------------

/**
 * @brief Evaluate the usual logistic function in an under-/overflow-safe way
 *
 * @param x
 * @returns \f$ \frac{1}{1 + \exp(-x)} \f$
 *
 * Evaluating this expression directly can lead to under- or overflows.
 * This function performs the evaluation in a safe manner, making use of the
 * following observations:
 *
 * In order for the outcome of \f$ \exp(x) \f$ to be within the range of the
 * minimum positive double-precision number (i.e., \f$ 2^{-1074} \f$) and the
 * maximum positive double-precision number (i.e.,
 * \f$ (1 + (1 - 2^{-52})) \cdot 2^{1023} \f$), \f$ x \f$ has to be within the
 * natural logarithm of these numbers, so roughly in between -744 and 709.
 * However, \f$ 1 + \exp(x) \f$ will just evaluate to 1 if \f$ \exp(x) \f$ is
 * less than the machine epsilon (i.e., \f$ 2^{-52} \f$) or, equivalently, if
 * \f$ x \f$ is less than the natural logarithm of that; i.e., in any case if
 * \f$ x \f$ is less than -37.
 * Note that taking the reciprocal of the largest double-precision number will
 * not cause an underflow. Hence, no further checks are necessary.
 */
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.logistic(x DOUBLE PRECISION)
RETURNS DOUBLE PRECISION
LANGUAGE sql
AS $$
    SELECT CASE
        -- -x < -37, i.e., x > 37: exp(-x) is below machine epsilon,
        -- so the result rounds to 1
        WHEN -$1 < -37 THEN 1
        -- -x > 709, i.e., x < -709: exp(-x) would overflow;
        -- the result is 0 to double precision
        WHEN -$1 > 709 THEN 0
        ELSE 1 / (1 + exp(-$1))
        END;
$$;
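
-- Example of the safe evaluation at extreme arguments (expected: 0, 0.5, 1):
--
--   SELECT MADLIB_SCHEMA.logistic(x)
--   FROM (VALUES (-1000::DOUBLE PRECISION), (0), (1000)) AS t(x);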