MADlib
1.1
User Documentation
C4.5 APIs and main controller written in PL/pgSQL.
Functions | |
c45_train_result | c45_train (text split_criterion, text training_table_name, text result_tree_table_name, text validation_table_name, text continuous_feature_names, text feature_col_names, text id_col_name, text class_col_name, float confidence_level, text how2handle_missing_value, int max_tree_depth, float node_prune_threshold, float node_split_threshold, int verbosity) |
Long form of C4.5 training, with every parameter specified. | |
c45_train_result | c45_train (text split_criterion, text training_table_name, text result_tree_table_name, text validation_table_name, text continuous_feature_names, text feature_col_names, text id_col_name, text class_col_name, float confidence_level, text how2handle_missing_value) |
Short form of C4.5 training that takes the first ten parameters of the long form. | |
c45_train_result | c45_train (text split_criterion, text training_table_name, text result_tree_table_name) |
Shortest form of C4.5 training, taking only the split criterion, the training table, and the result tree table. | |
set< text > | c45_genrule (text tree_table_name, int verbosity) |
Display the trained decision tree model as rules. | |
set< text > | c45_genrule (text tree_table_name) |
Display the trained decision tree model as rules. | |
set< text > | c45_display (text tree_table, int max_depth) |
Display the trained decision tree model in a human-readable format, down to max_depth levels. | |
set< text > | c45_display (text tree_table) |
Display the whole trained decision tree model in a human-readable format. | |
c45_classify_result | c45_classify (text tree_table_name, text classification_table_name, text result_table_name, int verbosity) |
Classify a dataset using a trained decision tree model. The classification result is stored in a table defined as: CREATE TABLE classification_result ( id INT|BIGINT, class SUPPORTED_DATA_TYPE, prob FLOAT ); | |
c45_classify_result | c45_classify (text tree_table_name, text classification_table_name, text result_table_name) |
Classify a dataset using a trained decision tree model, running in quiet mode. The result table has the same definition as for the form with a verbosity parameter. | |
float8 | c45_score (text tree_table_name, text scoring_table_name, int verbosity) |
Check the accuracy of the decision tree model. | |
float8 | c45_score (text tree_table_name, text scoring_table_name) |
Check the accuracy of the decision tree model. | |
boolean | c45_clean (text result_tree_table_name) |
Clean up the trained tree table and any related tables. | |
Definition in file c45.sql_in.
c45_classify_result c45_classify (text tree_table_name, text classification_table_name, text result_table_name, int verbosity)
tree_table_name | The name of the trained tree. |
classification_table_name | The name of the table/view with the source data. |
result_table_name | The name of the result table. |
verbosity | A value greater than 0 runs the function in verbose mode. |
Definition at line 1020 of file c45.sql_in.
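A minimal sketch of a call to the four-argument form follows. It assumes MADlib is installed in a schema named madlib, that a tree was already trained into a table called golf_tree, and that golf_new holds the rows to classify. All of these names are illustrative and do not come from this page.

```sql
-- Assumed names (not from this page): schema "madlib", tree table "golf_tree",
-- input table "golf_new", output table "golf_classified".
SELECT *
FROM madlib.c45_classify(
    'golf_tree',        -- tree_table_name
    'golf_new',         -- classification_table_name
    'golf_classified',  -- result_table_name
    1                   -- verbosity: a value > 0 enables verbose mode
);

-- Inspect the predictions. The columns follow the documented result table
-- definition: id, class, prob.
SELECT id, class, prob
FROM golf_classified
ORDER BY id;
```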
c45_classify_result c45_classify (text tree_table_name, text classification_table_name, text result_table_name)
The classification result is stored in a table defined as:
CREATE TABLE classification_result ( id INT|BIGINT, class SUPPORTED_DATA_TYPE, prob FLOAT );
tree_table_name | The name of the trained tree. |
classification_table_name | The name of the table/view with the source data. |
result_table_name | The name of the result table. |
Definition at line 1131 of file c45.sql_in.
boolean c45_clean (text result_tree_table_name)
result_tree_table_name | The name of the table containing the tree's information. |
Definition at line 1225 of file c45.sql_in.
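As a brief illustration, the call below removes a previously trained tree. The schema name madlib and the table name golf_tree are assumptions made for the example.

```sql
-- Assumed names (not from this page): schema "madlib", tree table "golf_tree".
-- Returns a boolean and removes the tree table and its related tables.
SELECT madlib.c45_clean('golf_tree');
```

Running this before retraining into the same result table name is one plausible use, though this page does not say whether c45_train requires it.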
set<text> c45_display (text tree_table, int max_depth)
tree_table | The name of the table containing the tree's information. |
max_depth | The maximum depth to display. If NULL, all levels are shown. |
Definition at line 936 of file c45.sql_in.
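The following sketch shows both behaviours of max_depth described above. The schema name madlib and the tree table golf_tree are illustrative assumptions.

```sql
-- Assumed names (not from this page): schema "madlib", tree table "golf_tree".

-- Show only the top three levels of the tree.
SELECT * FROM madlib.c45_display('golf_tree', 3);

-- A NULL max_depth shows every level; the cast makes the argument type explicit.
SELECT * FROM madlib.c45_display('golf_tree', NULL::int);
```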
set<text> c45_display (text tree_table)
tree_table | The name of the table containing the tree's information. |
Definition at line 985 of file c45.sql_in.
set<text> c45_genrule (text tree_table_name, int verbosity)
tree_table_name | The name of the table containing the tree's information. |
verbosity | A value of 1 or greater runs the function in verbose mode. |
Definition at line 620 of file c45.sql_in.
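A short sketch of generating rules from a trained tree, assuming the schema is named madlib and the tree table is golf_tree (both hypothetical):

```sql
-- Assumed names (not from this page): schema "madlib", tree table "golf_tree".
-- Each returned text row is one line of the rule listing; verbosity 0 keeps it quiet.
SELECT * FROM madlib.c45_genrule('golf_tree', 0);
```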
set<text> c45_genrule (text tree_table_name)
tree_table_name | The name of the table containing the tree's information. |
Definition at line 904 of file c45.sql_in.
float8 c45_score (text tree_table_name, text scoring_table_name, int verbosity)
tree_table_name | The name of the trained tree. |
scoring_table_name | The name of the table/view with the source data. |
verbosity | A value greater than 0 runs the function in verbose mode. |
Definition at line 1166 of file c45.sql_in.
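A minimal sketch of scoring a tree against held-out data follows. The schema madlib, the tree table golf_tree, and the scoring table golf_holdout are assumed names. The scoring table is also assumed to carry the same id, feature, and class columns as the training table, which this page does not state explicitly.

```sql
-- Assumed names (not from this page): schema "madlib", tree table "golf_tree",
-- scoring table "golf_holdout" with labeled rows.
SELECT madlib.c45_score(
    'golf_tree',     -- tree_table_name
    'golf_holdout',  -- scoring_table_name
    0                -- verbosity: 0 keeps the run quiet
) AS accuracy;
```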
float8 c45_score (text tree_table_name, text scoring_table_name)
tree_table_name | The name of the trained tree. |
scoring_table_name | The name of the table/view with the source data. |
Definition at line 1196 of file c45.sql_in.
c45_train_result c45_train (text split_criterion, text training_table_name, text result_tree_table_name, text validation_table_name, text continuous_feature_names, text feature_col_names, text id_col_name, text class_col_name, float confidence_level, text how2handle_missing_value, int max_tree_depth, float node_prune_threshold, float node_split_threshold, int verbosity)
split_criterion | The name of the split criterion to use for tree construction. Valid values are 'infogain', 'gainratio', and 'gini'. It can't be NULL. Information gain (infogain) and the Gini index (gini) are biased toward multivalued attributes. Gain ratio (gainratio) adjusts for this bias, but it tends to prefer unbalanced splits in which one partition is much smaller than the others. |
training_table_name | The name of the table/view with the source data. |
result_tree_table_name | The name of the table where the resulting DT will be kept. |
validation_table_name | The name of the table/view that contains the validation set used for tree pruning. The default is NULL, in which case we will not do tree pruning. |
continuous_feature_names | A comma-separated list of the names of features whose values are continuous. The default is null, which means there are no continuous features in the training table. |
feature_col_names | A comma-separated list of the names of table columns, each of which defines a feature. The default value is null, which means all the columns in the training table, except columns named ‘id’ and ‘class’, will be used as features. |
id_col_name | The name of the column containing an ID for each record. |
class_col_name | The name of the column containing the labeled class. |
confidence_level | A statistical confidence interval of the resubstitution error. |
how2handle_missing_value | How to handle missing values. Valid values are 'explicit' and 'ignore'. |
max_tree_depth | The maximum number of levels in the resulting decision tree, used to avoid overgrown trees. |
node_prune_threshold | The minimum percentage of the number of records required in a child node. It can't be NULL and must lie in [0.0, 1.0]. This threshold applies only to non-root nodes. If its value is 1, the trained tree has only one node (the root node); if its value is 0, no nodes are pruned by this parameter. |
node_split_threshold | The minimum percentage of the number of records required in a node for a further split to be possible. It can't be NULL and must lie in [0.0, 1.0]. If its value is 1, the trained tree has only two levels, since only the root node can grow; if its value is 0, trees can grow extensively. |
verbosity | A value greater than 0 runs the function in verbose mode. |
Definition at line 369 of file c45.sql_in.
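The parameter list above can be sketched as a single call. The schema name madlib, the table golf_data, its column names, and every numeric value are illustrative assumptions. In particular, the confidence_level value is only a placeholder, since this page does not give its valid range.

```sql
-- Assumed names (not from this page): schema "madlib", training table "golf_data"
-- with columns id, outlook, temperature, humidity, windy, class.
SELECT *
FROM madlib.c45_train(
    'infogain',                            -- split_criterion: 'infogain', 'gainratio', or 'gini'
    'golf_data',                           -- training_table_name
    'golf_tree',                           -- result_tree_table_name
    NULL,                                  -- validation_table_name: NULL means no pruning
    'temperature,humidity',                -- continuous_feature_names
    'outlook,temperature,humidity,windy',  -- feature_col_names
    'id',                                  -- id_col_name
    'class',                               -- class_col_name
    100,                                   -- confidence_level (placeholder value)
    'explicit',                            -- how2handle_missing_value: 'explicit' or 'ignore'
    5,                                     -- max_tree_depth
    0.001,                                 -- node_prune_threshold, in [0.0, 1.0]
    0.001,                                 -- node_split_threshold, in [0.0, 1.0]
    0                                      -- verbosity: 0 keeps the run quiet
);
```

Small non-zero thresholds such as these let the tree grow almost freely while still satisfying the non-NULL requirement on both parameters.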
c45_train_result c45_train (text split_criterion, text training_table_name, text result_tree_table_name, text validation_table_name, text continuous_feature_names, text feature_col_names, text id_col_name, text class_col_name, float confidence_level, text how2handle_missing_value)
split_criterion | The name of the split criterion to use for tree construction. Valid values are 'infogain', 'gainratio', and 'gini'. |
training_table_name | The name of the table/view with the source data. |
result_tree_table_name | The name of the table where the resulting DT will be kept. |
validation_table_name | The name of the table/view that contains the validation set used for tree pruning. The default is NULL, in which case we will not do tree pruning. |
continuous_feature_names | A comma-separated list of the names of features whose values are continuous. The default is null, which means there are no continuous features in the training table. |
feature_col_names | A comma-separated list of the names of table columns, each of which defines a feature. The default value is null, which means all the columns in the training table, except columns named ‘id’ and ‘class’, will be used as features. |
id_col_name | The name of the column containing an ID for each record. |
class_col_name | The name of the column containing the labeled class. |
confidence_level | A statistical confidence interval of the resubstitution error. |
how2handle_missing_value | How to handle missing values. Valid values are 'explicit' and 'ignore'. |
Definition at line 516 of file c45.sql_in.
c45_train_result c45_train (text split_criterion, text training_table_name, text result_tree_table_name)
split_criterion | The name of the split criterion to use for tree construction. Valid values are 'infogain', 'gainratio', and 'gini'. |
training_table_name | The name of the table/view with the source data. |
result_tree_table_name | The name of the table where the resulting DT will be kept. |
Definition at line 582 of file c45.sql_in.
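A sketch of the three-argument form, assuming a schema named madlib and a hypothetical training table golf_data. The table is assumed to use columns named id and class for the record ID and label, which the feature_col_names description in the longer forms suggests are the default names.

```sql
-- Assumed names (not from this page): schema "madlib", training table "golf_data",
-- result table "golf_tree_short".
SELECT *
FROM madlib.c45_train(
    'gainratio',       -- split_criterion
    'golf_data',       -- training_table_name
    'golf_tree_short'  -- result_tree_table_name
);
```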