User Documentation
robust.sql_in File Reference

SQL functions for robust variance statistics in linear and logistic regression.

Functions

aggregate robust_linregr_result robust_linregr (float8 dependentVariable, float8[] independentVariables, float8[] coef)
 Compute robust regression diagnostic statistics for linear regression.
 
CREATE FUNCTION MADlib __internal_get_robust_linregr_result (source_table VARCHAR--name of input table, dependent_varname VARCHAR--name of dependent variable, independent_varname VARCHAR--name of independent variable, linregr_coeffs DOUBLE PRECISION[]--coeffs from linear regression) RETURNS MADlib.robust_linregr_result AS $$DECLARE robust_value MADlib.robust_linregr_result
 Return robust linear regression estimates given a set of coefficients.
 
CREATE FUNCTION MADlib __internal_get_robust_linregr_insert_string (robust_lin_rst MADlib.robust_linregr_result, linregr_coeffs DOUBLE PRECISION[]--coeffs from linear regression, out_table TEXT) RETURNS VARCHAR AS $$DECLARE insert_string VARCHAR
 Return insert string for robust linear regression.
 
CREATE OR REPLACE FUNCTION MADlib robust_variance_linregr (source_table VARCHAR--name of input table, out_table VARCHAR--name of output table, dependent_variable VARCHAR--name of dependent variable, independent_variable VARCHAR--name of independent variable) RETURNS VOID AS $$BEGIN PERFORM MADlib.robust_variance_linregr(source_table
 Robust linear regression with default fit regression behavior and no grouping.
 
Robust Linear Regression
CREATE OR REPLACE FUNCTION MADlib robust_variance_linregr (source_table VARCHAR--name of input table, out_table VARCHAR--name of output table, dependent_varname VARCHAR--name of dependent variable, input_independent_varname VARCHAR--name of independent variable, input_group_cols VARCHAR--grouping columns) RETURNS VOID AS $$DECLARE insert_string VARCHAR
 Robust linear regression function subcall.
 
aggregate robust_logregr_result robust_logregr (boolean dependentVariable, float8[] independentVariables, float8[] coef)
 Compute robust regression diagnostic statistics for logistic regression.
 
CREATE FUNCTION MADlib __internal_get_robust_logregr_result (source_table VARCHAR--name of input table, dependent_varname VARCHAR--name of dependent variable, independent_varname VARCHAR--name of independent variable, logregr_coeffs DOUBLE PRECISION[]--coeffs from logistic regression) RETURNS MADlib.robust_logregr_result AS $$DECLARE robust_value MADlib.robust_logregr_result
 Return robust logistic regression estimates given a set of coefficients.
 
CREATE FUNCTION MADlib __internal_get_robust_logregr_insert_string (robust_log_rst MADlib.robust_logregr_result, out_table TEXT) RETURNS VARCHAR AS $$DECLARE insert_string VARCHAR
 Return insert string for robust logistic regression.
 
void robust_variance_logregr (varchar source_table, varchar out_table, varchar dependent_varname, varchar input_independent_varname, varchar input_group_cols, integer max_iter, varchar optimizer, float8 tolerance, boolean print_warnings)
 The robust logistic regression function.
 
CREATE OR REPLACE FUNCTION MADlib robust_variance_logregr (source_table VARCHAR,--name of input table out_table VARCHAR,--name of output table dependent_variable VARCHAR,--name of dependent variable independent_variable VARCHAR,--name of independent variable input_group_cols VARCHAR--grouping columns) RETURNS VOID AS $$BEGIN PERFORM MADlib.robust_variance_logregr(source_table
 Robust logistic function subcall.
 
CREATE OR REPLACE FUNCTION MADlib robust_variance_logregr (source_table VARCHAR--name of input table, out_table VARCHAR--name of output table, dependent_variable VARCHAR--name of dependent variable, independent_variable VARCHAR--name of independent variable) RETURNS VOID AS $$BEGIN PERFORM MADlib.robust_variance_logregr(source_table
 Robust logistic regression with default fit regression behavior and no grouping.
 

Detailed Description

Date
January 2011
See Also
Calculates robust statistics for various regression models.

Definition in file robust.sql_in.

Function Documentation

aggregate robust_linregr_result robust_linregr ( float8  dependentVariable,
float8[]  independentVariables,
float8[]  coef 
)
Parameters
dependentVariable - Column containing the dependent variable
independentVariables - Column containing the array of independent variables
coef - Column containing the array of the OLS coefficients (as obtained by linregr)
To include an intercept in the model, set one coordinate in the independentVariables array to 1.
Returns
A composite value:
  • std_err FLOAT8[] - Array of Huber-White standard errors, \( \mathit{se}(c_1), \dots, \mathit{se}(c_k) \)
  • t_stats FLOAT8[] - Array of t-statistics, \( \boldsymbol t \)
  • p_values FLOAT8[] - Array of p-values, \( \boldsymbol p \)
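As a sketch of what these fields represent, assuming the standard Huber-White (HC0) sandwich form, which this page does not spell out: let \( x_i \) be the independentVariables array of row \( i \), \( y_i \) the dependent variable, \( X \) the matrix whose rows are \( x_i^T \), and \( r_i = y_i - x_i^T c \) the residual under the supplied coefficients \( c \). The symbols \( X \), \( r_i \), \( \widehat{V} \), \( n \) and \( k \) are introduced here for illustration only.
\[ \widehat{V} = (X^T X)^{-1} \left( \sum_{i=1}^{n} r_i^2 \, x_i x_i^T \right) (X^T X)^{-1}, \qquad \mathit{se}(c_j) = \sqrt{\widehat{V}_{jj}}, \qquad t_j = \frac{c_j}{\mathit{se}(c_j)} \]
Under this reading, each p-value is the two-sided tail probability of \( |t_j| \) under a reference t-distribution. The degrees of freedom (commonly \( n - k \)) are an assumption here, not something this page states.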
Usage:
  • Get all the diagnostic statistics:
 SELECT (robust_linregr(dependentVariable,
    independentVariables, coef)).*
    FROM (
    SELECT (linregr(dependentVariable, independentVariables)).coef
    FROM sourceName
    ) AS ols_coef, sourceName AS src;
 
  • Get a subset of the output columns, e.g., only the array of standard errors and the array of p-values \( \boldsymbol p \):
    SELECT (lr).std_err, (lr).p_values
    FROM (
     SELECT robust_linregr(dependentVariable,
        independentVariables, coef) AS lr
        FROM (
        SELECT (linregr(dependentVariable, independentVariables)).coef
        FROM sourceName
        ) AS ols_coef, sourceName AS src
    ) AS subq;

Definition at line 320 of file robust.sql_in.

aggregate robust_logregr_result robust_logregr ( boolean  dependentVariable,
float8[]  independentVariables,
float8[]  coef 
)
Parameters
dependentVariable - Column containing the dependent variable
independentVariables - Column containing the array of independent variables
coef - Column containing the array of the coefficients (as obtained by logregr)
To include an intercept in the model, set one coordinate in the independentVariables array to 1.
Returns
A composite value:
  • coef FLOAT8[] - The coefficients for the regression
  • std_err FLOAT8[] - Array of Huber-White standard errors, \( \mathit{se}(c_1), \dots, \mathit{se}(c_k) \)
  • z_stats FLOAT8[] - Array of Wald z-statistics, \( \boldsymbol z \)
  • p_values FLOAT8[] - Array of p-values, \( \boldsymbol p \)
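As a sketch of how these quantities relate, assuming the usual sandwich estimator for logistic regression (this page does not give the formula): let \( x_i \) be the independentVariables array of row \( i \), \( y_i \in \{0, 1\} \) the dependent variable, and \( \pi_i = 1 / (1 + e^{-x_i^T c}) \) the fitted probability under the supplied coefficients \( c \). The symbols \( \pi_i \), \( \widehat{V} \), \( n \) and \( \Phi \) (the standard normal distribution function) are introduced here for illustration only.
\[ \widehat{V} = \left( \sum_{i=1}^{n} \pi_i (1 - \pi_i) \, x_i x_i^T \right)^{-1} \left( \sum_{i=1}^{n} (y_i - \pi_i)^2 \, x_i x_i^T \right) \left( \sum_{i=1}^{n} \pi_i (1 - \pi_i) \, x_i x_i^T \right)^{-1} \]
\[ \mathit{se}(c_j) = \sqrt{\widehat{V}_{jj}}, \qquad z_j = \frac{c_j}{\mathit{se}(c_j)}, \qquad p_j = 2 \left( 1 - \Phi(|z_j|) \right) \]
The two-sided normal form of \( p_j \) is the conventional choice for Wald z-statistics and is assumed here rather than taken from the source.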
Usage:
  • Get all the diagnostic statistics:
 SELECT robust_logregr(dependentVariable,
 independentVariables, coef)
 FROM dataTable;

Definition at line 751 of file robust.sql_in.

void robust_variance_logregr ( varchar  source_table,
varchar  out_table,
varchar  dependent_varname,
varchar  input_independent_varname,
varchar  input_group_cols,
integer  max_iter,
varchar  optimizer,
float8  tolerance,
boolean  print_warnings 
)
Parameters
source_table - String identifying the input table
out_table - String identifying the output table to be created
dependent_varname - Column containing the dependent variable
input_independent_varname - Column containing the array of independent variables
input_group_cols - Columns to group by.
max_iter - Integer identifying the maximum iterations used by the logistic regression solver. Default is 20.
optimizer - String identifying the optimizer used in the logistic regression. See the logistic regression documentation for the available options. Default is irls.
tolerance - Float identifying the tolerance of the logistic regression optimizer. Default is 0.0001.
print_warnings - Boolean specifying if the regression fit should print any warning messages. Default is false.
To include an intercept in the model, set one coordinate in the input_independent_varname array to 1.
Returns
A composite value:
  • std_err FLOAT8[] - Array of Huber-White standard errors, \( \mathit{se}(c_1), \dots, \mathit{se}(c_k) \)
  • t_stats FLOAT8[] - Array of t-statistics, \( \boldsymbol t \)
  • p_values FLOAT8[] - Array of p-values, \( \boldsymbol p \)
Usage:
For function summary information, run:
 sql> select robust_variance_logregr('help');
 OR
 sql> select robust_variance_logregr();
 OR
 sql> select robust_variance_logregr('?');
For function usage information, run:
 sql> select robust_variance_logregr('usage');
  • Compute the coefficients, and then get the robust diagnostic statistics:
       select robust_variance_logregr(source_table, out_table, regression_type, dependentVariable, independentVariables, NULL );
      
  • If the coefficients are already known, they can be provided directly
    select robust_variance_logregr(source_table, out_table, regression_type, dependentVariable, independentVariables, coef );
    

Definition at line 936 of file robust.sql_in.