pmml( model_table, name_spec )
Arguments
model_table
VARCHAR. The name of the table containing the model.
Output
XML. The output of this function is a standard PMML document, some examples of which are covered in the next section.
Users typically export the resulting PMML to a file so that external software can use it. The following method can be used. Note: psql must run in unaligned table output mode (the '-A' flag), and inside the psql client both '\t' and '\o' should be used:
> # under bash
> psql -A my_database
# -- in psql now
# \t
# \o test.pmml -- export to a file
# select madlib.pmml('tree_out');
# \o
# \t
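The '-A' and '\t' settings matter because any psql decoration (column header, alignment padding, row-count footer) that leaks into the file makes it invalid XML. As a minimal sketch of a sanity check on the exported file, the following uses only the Python standard library; the helper name `describe_pmml` is hypothetical and is not part of MADlib.

```python
import xml.etree.ElementTree as ET


def describe_pmml(path):
    """Parse an exported file and return (root element name, PMML version).

    Hypothetical helper, not part of MADlib. ET.parse raises
    xml.etree.ElementTree.ParseError when the file is not well-formed XML,
    for example when a psql row-count footer follows the document.
    """
    root = ET.parse(path).getroot()
    # ElementTree reports namespaced tags as '{namespace}Name'; keep only 'Name'.
    local_name = root.tag.split("}")[-1]
    return local_name, root.get("version")
```

For a file exported as described above, `describe_pmml("test.pmml")` is expected to return a root element name of `PMML` together with the version string found in the document.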
CREATE TABLE patients(
    id integer NOT NULL,
    second_attack integer,
    treatment integer,
    trait_anxiety integer);
INSERT INTO patients(id, second_attack, treatment, trait_anxiety) VALUES
( 1, 1, 1, 70),
( 3, 1, 1, 50),
( 5, 1, 0, 40),
( 7, 1, 0, 75),
( 9, 1, 0, 70),
(11, 0, 1, 65),
(13, 0, 1, 45),
(15, 0, 1, 40),
(17, 0, 0, 55),
(19, 0, 0, 50),
( 2, 1, 1, 80),
( 4, 1, 0, 60),
( 6, 1, 0, 65),
( 8, 1, 0, 80),
(10, 1, 0, 60),
(12, 0, 1, 50),
(14, 0, 1, 35),
(16, 0, 1, 50),
(18, 0, 0, 45),
(20, 0, 0, 60);
SELECT madlib.logregr_train(
    'patients',
    'patients_logregr',
    'second_attack',
    'ARRAY[1, treatment, trait_anxiety]');
SELECT madlib.pmml('patients_logregr');
Result:
<?xml version="1.0" standalone="yes"?>
<PMML version="4.1" xmlns="http://www.dmg.org/pmml-v4-1.html">
  <Header copyright="redacted for this example">
    <Extension extender="MADlib" name="user" value="gpadmin"/>
    <Application name="MADlib" version="1.7"/>
    <Timestamp>
      2014-06-13 17:30:14.527899 PDT
    </Timestamp>
  </Header>
  <DataDictionary numberOfFields="4">
    <DataField dataType="boolean" name="second_attack_pmml_prediction" optype="categorical"/>
    <DataField dataType="double" name="1" optype="continuous"/>
    <DataField dataType="double" name="treatment" optype="continuous"/>
    <DataField dataType="double" name="trait_anxiety" optype="continuous"/>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="softmax">
    <MiningSchema>
      <MiningField name="second_attack_pmml_prediction" usageType="predicted"/>
      <MiningField name="1"/>
      <MiningField name="treatment"/>
      <MiningField name="trait_anxiety"/>
    </MiningSchema>
    <RegressionTable intercept="0.0" targetCategory="True">
      <NumericPredictor coefficient="-6.36346994178" name="1"/>
      <NumericPredictor coefficient="-1.02410605239" name="treatment"/>
      <NumericPredictor coefficient="0.119044916669" name="trait_anxiety"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="False"/>
  </RegressionModel>
</PMML>
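To make the structure of this document concrete, here is a minimal sketch of how an external consumer might turn the two RegressionTable elements into a probability. It assumes the usual reading of softmax normalization: each table yields a linear score, and the scores are exponentiated and normalized. The coefficients are copied from the output above; the function name and the sample patient are made up for illustration, and the field named "1" is assumed to be supplied by the caller as the constant 1.

```python
import math

# Coefficients copied from the RegressionTable for the "True" category above.
# The intercept attribute is 0.0; the constant term is carried by the field "1".
COEFFICIENTS_TRUE = {
    "1": -6.36346994178,
    "treatment": -1.02410605239,
    "trait_anxiety": 0.119044916669,
}


def probability_true(row):
    """Illustrative scoring: softmax over the "True" and "False" tables."""
    y_true = 0.0 + sum(coef * row[name] for name, coef in COEFFICIENTS_TRUE.items())
    y_false = 0.0  # the "False" table has intercept 0.0 and no predictors
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(y_true, y_false)
    e_true = math.exp(y_true - m)
    e_false = math.exp(y_false - m)
    return e_true / (e_true + e_false)


# Hypothetical patient: treated, trait_anxiety of 70.
patient = {"1": 1.0, "treatment": 1.0, "trait_anxiety": 70.0}
p = probability_true(patient)
```

Because the "False" table always scores 0, the softmax here reduces to the logistic function applied to the "True" score, which is what a logistic regression model is expected to produce.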
Alternatively, the above can also be invoked as below if custom names are needed for fields in the Data Dictionary:
SELECT madlib.pmml('patients_logregr', 'out_attack~1+in_trait_anxiety+in_treatment');
Note: If the second argument of the 'pmml' function is not specified, the default suffix "_pmml_prediction" is automatically appended to the name of the column to be predicted. This helps avoid name conflicts.
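The naming rules can be summarized in a small sketch. It only mirrors what the examples on this page show: no name_spec gives the "_pmml_prediction" suffix, a formula-style string lists the target before '~' and the predictors separated by '+', and an array (as in the grouping example) lists the target first. The function `resolve_field_names` is hypothetical and does not reproduce MADlib's actual parsing.

```python
def resolve_field_names(dependent, predictors, name_spec=None):
    """Illustrative only: return (target_name, predictor_names).

    dependent  -- name of the predicted column in the training call
    predictors -- predictor names in model order
    name_spec  -- None, a 'target~a+b+c' string, or a list [target, a, b, c]
    """
    if name_spec is None:
        # Default: keep predictor names, suffix the target to avoid conflicts.
        return dependent + "_pmml_prediction", list(predictors)
    if isinstance(name_spec, str):
        target, _, rhs = name_spec.partition("~")
        return target.strip(), [part.strip() for part in rhs.split("+")]
    # Array form: first entry is the target, the rest are predictors.
    return name_spec[0], list(name_spec[1:])


default_names = resolve_field_names(
    "second_attack", ["1", "treatment", "trait_anxiety"])
custom_names = resolve_field_names(
    "second_attack", ["1", "treatment", "trait_anxiety"],
    "out_attack~1+in_trait_anxiety+in_treatment")
```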
The following example demonstrates grouping columns in the model table for the same dataset as the previous example.
SELECT madlib.logregr_train(
    'patients',
    'patients_logregr_grouping',
    'second_attack',
    'ARRAY[1, trait_anxiety]',
    'treatment');
SELECT madlib.pmml('patients_logregr_grouping', ARRAY['second_attack','1','in_trait_anxiety']);
Result:
<?xml version="1.0" standalone="yes"?>
<PMML version="4.1" xmlns="http://www.dmg.org/pmml-v4-1.html">
  <Header copyright="redacted for this example">
    <Extension extender="MADlib" name="user" value="gpadmin"/>
    <Application name="MADlib" version="1.7"/>
    <Timestamp>
      2014-06-13 17:37:55.786307 PDT
    </Timestamp>
  </Header>
  <DataDictionary numberOfFields="4">
    <DataField dataType="boolean" name="second_attack" optype="categorical"/>
    <DataField dataType="double" name="1" optype="continuous"/>
    <DataField dataType="double" name="in_trait_anxiety" optype="continuous"/>
    <DataField dataType="string" name="treatment" optype="categorical"/>
  </DataDictionary>
  <MiningModel functionName="classification">
    <MiningSchema>
      <MiningField name="second_attack" usageType="predicted"/>
      <MiningField name="1"/>
      <MiningField name="in_trait_anxiety"/>
      <MiningField name="treatment"/>
    </MiningSchema>
    <Segmentation multipleModelMethod="selectFirst">
      <Segment>
        <SimplePredicate field="treatment" operator="equal" value="1"/>
        <RegressionModel functionName="classification" normalizationMethod="softmax">
          <MiningSchema>
            <MiningField name="second_attack" usageType="predicted"/>
            <MiningField name="1"/>
            <MiningField name="in_trait_anxiety"/>
          </MiningSchema>
          <RegressionTable intercept="0.0" targetCategory="True">
            <NumericPredictor coefficient="-8.02068430057" name="1"/>
            <NumericPredictor coefficient="0.130090428526" name="in_trait_anxiety"/>
          </RegressionTable>
          <RegressionTable intercept="0.0" targetCategory="False"/>
        </RegressionModel>
      </Segment>
      <Segment>
        <SimplePredicate field="treatment" operator="equal" value="0"/>
        <RegressionModel functionName="classification" normalizationMethod="softmax">
          <MiningSchema>
            <MiningField name="second_attack" usageType="predicted"/>
            <MiningField name="1"/>
            <MiningField name="in_trait_anxiety"/>
          </MiningSchema>
          <RegressionTable intercept="0.0" targetCategory="True">
            <NumericPredictor coefficient="-5.75043192191" name="1"/>
            <NumericPredictor coefficient="0.108282446319" name="in_trait_anxiety"/>
          </RegressionTable>
          <RegressionTable intercept="0.0" targetCategory="False"/>
        </RegressionModel>
      </Segment>
    </Segmentation>
  </MiningModel>
</PMML>
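With grouping, each group becomes a Segment guarded by a SimplePredicate, and the selectFirst method means the first segment whose predicate matches is the one that scores the row. A minimal sketch of that selection logic follows, with the coefficients copied from the output above; the data layout and the function name `score` are illustrative assumptions, not part of MADlib or of any PMML library.

```python
import math

# (predicate, coefficients of the "True" RegressionTable) per Segment, in document order.
SEGMENTS = [
    ({"field": "treatment", "value": "1"},
     {"1": -8.02068430057, "in_trait_anxiety": 0.130090428526}),
    ({"field": "treatment", "value": "0"},
     {"1": -5.75043192191, "in_trait_anxiety": 0.108282446319}),
]


def score(row):
    """Illustrative selectFirst: use the first segment whose predicate matches."""
    for predicate, coefficients in SEGMENTS:
        # The grouping field is declared as a string in the DataDictionary,
        # so compare on the string form of the value.
        if str(row[predicate["field"]]) == predicate["value"]:
            y = sum(coef * row[name] for name, coef in coefficients.items())
            # Softmax against a "False" score of 0 is the logistic function.
            return 1.0 / (1.0 + math.exp(-y))
    return None  # no segment matched


treated = score({"treatment": 1, "1": 1.0, "in_trait_anxiety": 70.0})
untreated = score({"treatment": 0, "1": 1.0, "in_trait_anxiety": 70.0})
```

Returning `None` when no segment matches is a choice made for this sketch; how a real PMML consumer reports an unmatched row depends on that consumer.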
Note: MADlib currently supports PMML export for Linear Regression, Logistic Regression, Generalized Linear Regression Model, Multinomial Logistic Regression, Ordinal Linear Regression, Decision Tree and Random Forests.
In Ordinal Regression, the signs of the feature coefficients in the PMML export differ from those in the default output model table from ordinal(). This is due to a difference in model settings.
MADlib follows the PMML v4.1 standard. For more details about PMML, see http://www.dmg.org/v4-1/GeneralStructure.html.
File table_to_pmml.sql_in documenting the PMML export functions.