2.1.0
User Documentation for Apache MADlib
Define Custom Functions
Warning
For security reasons there are controls on custom functions in MADlib. You must be a superuser to create custom functions because they could theoretically allow execution of any untrusted Python code. Regular users with MADlib USAGE permission can use existing custom functions but cannot create new ones or update existing ones. See references [1] and [2] for information on privileges in Greenplum and PostgreSQL.

This function loads custom Python functions into a table for use by deep learning algorithms.

Custom functions can be useful if, for example, you need loss functions or metrics that are not built into the standard libraries. The functions to be loaded must be in the form of serialized Python objects created using Dill, which extends Python's pickle module to the majority of the built-in Python types [3].

Custom functions can also be used to return top k categorical accuracy in the case that you want a different k value than the default from tensorflow.keras. This module includes a helper function to create the custom function automatically for a specified k.

There is also a utility function to delete a function from the table.

Note
Do not specify a schema for the argument 'object_table' containing the Python objects, because the 'object_table' is automatically put in the MADlib schema. Also, any subsequent SQL queries on this table by regular users must specify '<madlib_schema>.object_table' in the usual way.

Load Function
load_custom_function(
    object_table,
    object,
    name,
    description
    )

Arguments

object_table

VARCHAR. Table to load serialized Python objects. If this table does not exist, it will be created. If this table already exists, a new row is inserted into the existing table. Do not specify schema as part of the object table name, since it will be put in the MADlib schema automatically.

object

BYTEA. PostgreSQL binary data type of the Python object. Object must be created with the Dill package for serializing Python objects.

Note
The Dill package must be installed on all segments of the database cluster [3].
name

TEXT. Name of the object. Must be unique identifier in the table, since this name is used when passing the object to Keras.

description (optional)

TEXT, default: NULL. Free text string to provide a description, if desired.

Output table
The output table contains the following columns:

id SERIAL. Object ID.
name TEXT PRIMARY KEY. Name of the object.
description TEXT. Description of the object (free text).
object BYTEA. Serialized Python object stored as a PostgreSQL binary data type.

Delete Function

Delete by id:

delete_custom_function(
    object_table,
    id
)

Or alternatively by name:

delete_custom_function(
    object_table,
    name
)

Arguments

object_table
VARCHAR. Table containing Python object to be deleted. Do not specify schema as part of the object table name.
id
INTEGER. The id of the object to be deleted.
name
TEXT. Name of the object to be deleted.

Top k Accuracy Function

Create and load a custom function for a specific k into the custom functions table. The Keras accuracy parameter 'top_k_categorical_accuracy' returns top 5 accuracy by default [4]. If you want a different top k value, use this helper function to create a custom Python function to compute the top k accuracy that you specify.

load_top_k_accuracy_function(
    object_table,
    k
    )

Arguments

object_table

VARCHAR. Table to load serialized Python objects. If this table does not exist, it will be created. If this table already exists, a new row is inserted into the existing table. Do not specify schema as part of the object table name, since it will be put in the MADlib schema automatically.

k

INTEGER. k value for the top k accuracy that you want.

Output table
The output table contains the following columns:

id SERIAL. Object ID.
name TEXT PRIMARY KEY. Name of the object. Generated with the following pattern: top_(k)_accuracy.
description TEXT. Description of the object.
object BYTEA. Serialized Python object stored as a PostgreSQL binary data type.

Examples
  1. Load object using psycopg2. Psycopg is a PostgreSQL database adapter for the Python programming language. Note need to use the psycopg2.Binary() method to pass as bytes.
    # import database connector psycopg2 and create connection cursor
    import psycopg2 as p2
    conn = p2.connect('postgresql://gpadmin:8000/madlib')
    cur = conn.cursor()
    # import Dill and define functions
    import dill
    # custom loss
    def squared_error(y_true, y_pred):
        import keras.backend as K
        return K.square(y_pred - y_true)
    pb_squared_error=dill.dumps(squared_error)
    # custom metric
    def rmse(y_true, y_pred):
        import keras.backend as K
        return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
    pb_rmse=dill.dumps(rmse)
    # call load function
    cur.execute("DROP TABLE IF EXISTS madlib.custom_function_table")
    cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
    cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
    conn.commit()
    
    List table to see objects:
    SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;
    
     id |     name      |      description
    ----+---------------+------------------------
      1 | squared_error | squared error
      2 | rmse          | root mean square error
    
  2. Load object using a PL/Python function. First define the objects:
    CREATE OR REPLACE FUNCTION custom_function_squared_error()
    RETURNS BYTEA AS
    $$
    import dill
    def squared_error(y_true, y_pred):
        import keras.backend as K
        return K.square(y_pred - y_true)
    pb_squared_error=dill.dumps(squared_error)
    return pb_squared_error
    $$ language plpython3u;
    CREATE OR REPLACE FUNCTION custom_function_rmse()
    RETURNS BYTEA AS
    $$
    import dill
    def rmse(y_true, y_pred):
        import keras.backend as K
        return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
    pb_rmse=dill.dumps(rmse)
    return pb_rmse
    $$ language plpython3u;
    
    Now call loader:
    DROP TABLE IF EXISTS madlib.custom_function_table;
    SELECT madlib.load_custom_function('custom_function_table',
                                       custom_function_squared_error(),
                                       'squared_error',
                                       'squared error');
    SELECT madlib.load_custom_function('custom_function_table',
                                       custom_function_rmse(),
                                       'rmse',
                                       'root mean square error');
    
  3. Delete an object by id:
    SELECT madlib.delete_custom_function( 'custom_function_table', 1);
    SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;
    
     id | name |      description
    ----+------+------------------------
      2 | rmse | root mean square error
    
    Delete an object by name:
    SELECT madlib.delete_custom_function( 'custom_function_table', 'rmse');
    
    If all objects are deleted from the table using this function, the table itself will be dropped.
  4. Load top 3 accuracy function followed by a top 10 accuracy function:
    DROP TABLE IF EXISTS custom_function_table;
    SELECT madlib.load_top_k_accuracy_function('custom_function_table',
                                               3);
    SELECT madlib.load_top_k_accuracy_function('custom_function_table',
                                               10);
    SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;
    
     id |      name       |       description
    ----+-----------------+-------------------------
      1 | top_3_accuracy  | returns top_3_accuracy
      2 | top_10_accuracy | returns top_10_accuracy
    
Literature

[1] https://gpdb.docs.pivotal.io/latest/admin_guide/roles_privs.html

[2] https://www.postgresql.org/docs/current/ddl-priv.html

[3] Python catalog for Dill package https://pypi.org/project/dill/

[4] https://keras.io/api/metrics/accuracy_metrics/#topkcategoricalaccuracy-class

Related Topics

See madlib_keras_custom_function.sql_in