C4.5 APIs and main controller written in PL/pgSQL.

Functions
c45_train_result	c45_train (text split_criterion, text training_table_name, text result_tree_table_name, text validation_table_name, text continuous_feature_names, text feature_col_names, text id_col_name, text class_col_name, float confidence_level, text how2handle_missing_value, int max_tree_depth, float node_prune_threshold, float node_split_threshold, int verbosity)
	Long-form training API, in which every parameter is specified explicitly.

c45_train_result	c45_train (text split_criterion, text training_table_name, text result_tree_table_name, text validation_table_name, text continuous_feature_names, text feature_col_names, text id_col_name, text class_col_name, float confidence_level, text how2handle_missing_value)
	C4.5 training algorithm in short form: the tree-growth thresholds and verbosity take default values.

c45_train_result	c45_train (text split_criterion, text training_table_name, text result_tree_table_name)
	C4.5 training algorithm in its shortest form: only the split criterion, training table, and result table are given.

set< text >	c45_genrule (text tree_table_name, int verbosity)
	Display the trained decision tree model as rules.

set< text >	c45_genrule (text tree_table_name)
	Display the trained decision tree model as rules.

set< text >	c45_display (text tree_table, int max_depth)
	Display the trained decision tree model in a human-readable format.

set< text >	c45_display (text tree_table)
	Display the whole trained decision tree model in a human-readable format.

c45_classify_result	c45_classify (text tree_table_name, text classification_table_name, text result_table_name, int verbosity)
	Classify a dataset using a trained decision tree model. The classification result is stored in a table defined as: CREATE TABLE classification_result ( id INT|BIGINT, class SUPPORTED_DATA_TYPE, prob FLOAT );

c45_classify_result	c45_classify (text tree_table_name, text classification_table_name, text result_table_name)
	Classify a dataset using a trained decision tree model. This form runs in quiet mode. The classification result is stored in a table with the columns id, class, and prob.

float8	c45_score (text tree_table_name, text scoring_table_name, int verbosity)
	Check the accuracy of the decision tree model.

float8	c45_score (text tree_table_name, text scoring_table_name)
	Check the accuracy of the decision tree model.

boolean	c45_clean (text result_tree_table_name)
	Clean up the trained tree table and any related tables.

Detailed Description

Date: April 5, 2012

See Also: For a brief introduction to decision trees, see the module description Decision Tree.

Definition in file c45.sql_in.

Function Documentation

c45_classify_result c45_classify	(	text	tree_table_name,
		text	classification_table_name,
		text	result_table_name,
		int	verbosity
	)

Parameters

tree_table_name	The name of the trained tree.
classification_table_name	The name of the table/view with the source data.
result_table_name	The name of the result table.
verbosity	> 0 means this function runs in verbose mode.

Returns: A c45_classify_result object.
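
A minimal usage sketch of the four-argument form. The madlib schema qualifier and the table names trained_tree, golf_test, and classification_result are assumptions for illustration; only the argument order comes from the signature above.

```sql
-- Hypothetical names: trained_tree (model produced by c45_train),
-- golf_test (rows to classify), classification_result (output table).
-- The madlib schema prefix is an assumption about the installation.
SELECT *
FROM madlib.c45_classify(
    'trained_tree',            -- tree_table_name
    'golf_test',               -- classification_table_name
    'classification_result',   -- result_table_name
    1                          -- verbosity: > 0 runs in verbose mode
);
```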

Definition at line 1020 of file c45.sql_in.

c45_classify_result c45_classify	(	text	tree_table_name,
		text	classification_table_name,
		text	result_table_name
	)

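Classify a dataset using a trained decision tree model. This form runs in quiet mode. The classification result is stored in a table defined as:

     CREATE TABLE classification_result
     (
         id        INT|BIGINT,
         class     SUPPORTED_DATA_TYPE,
         prob      FLOAT
     );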

Parameters

tree_table_name	The name of the trained tree.
classification_table_name	The name of the table/view with the source data.
result_table_name	The name of the result table.

Returns: A c45_classify_result object.
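
As a sketch of how the quiet-mode form and its result table fit together, the statements below classify a table and then read back the predictions. The madlib schema qualifier and all table names are hypothetical; the id, class, and prob columns are taken from the table definition above.

```sql
-- Classify without verbose output (three-argument form).
SELECT *
FROM madlib.c45_classify(
    'trained_tree',            -- tree_table_name (hypothetical)
    'golf_test',               -- classification_table_name (hypothetical)
    'classification_result'    -- result_table_name (hypothetical)
);

-- Read the predicted class and its probability for each record.
SELECT id, class, prob
FROM classification_result
ORDER BY id;
```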

Definition at line 1131 of file c45.sql_in.

boolean c45_clean ( text result_tree_table_name)

Parameters

result_tree_table_name	The name of the table containing the tree's information.

Returns: The status of the cleanup operation.
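
A one-line sketch of a cleanup call, assuming the functions live in a schema named madlib and that a tree was previously trained into a table named trained_tree (both names are assumptions):

```sql
-- Drops the tree table and its related tables; returns a boolean status.
SELECT madlib.c45_clean('trained_tree') AS cleaned;
```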

Definition at line 1225 of file c45.sql_in.

set<text> c45_display	(	text	tree_table,
		int	max_depth
	)

Parameters

tree_table	The name of the table containing the tree's information.
max_depth	The maximum depth to display. If NULL, all levels are shown.

Returns: The text representing the tree in a human-readable format.
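
The effect of max_depth can be sketched with two calls. The madlib schema qualifier and the table name trained_tree are assumptions for illustration.

```sql
-- Show only the top two levels of the tree.
SELECT * FROM madlib.c45_display('trained_tree', 2);

-- A NULL max_depth shows every level; the cast keeps the
-- argument type explicit for overload resolution.
SELECT * FROM madlib.c45_display('trained_tree', NULL::int);
```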

Definition at line 936 of file c45.sql_in.

set<text> c45_display ( text tree_table)

Parameters

tree_table	The name of the table containing the tree's information.

Returns: The text representing the tree in a human-readable format.

Definition at line 985 of file c45.sql_in.

set<text> c45_genrule	(	text	tree_table_name,
		int	verbosity
	)

Parameters

tree_table_name	The name of the table containing the tree's information.
verbosity	>= 1 means this function runs in verbose mode.

Returns: The rule representation text for a decision tree.

Definition at line 620 of file c45.sql_in.

set<text> c45_genrule ( text tree_table_name)

Parameters

tree_table_name	The name of the table containing the tree's information.

Returns: The rule representation text for a decision tree.

Definition at line 904 of file c45.sql_in.

float8 c45_score	(	text	tree_table_name,
		text	scoring_table_name,
		int	verbosity
	)

Parameters

tree_table_name	The name of the trained tree.
scoring_table_name	The name of the table/view with the source data.
verbosity	> 0 means this function runs in verbose mode.

Returns: The estimated accuracy information.
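
A minimal scoring sketch. The madlib schema qualifier and the table names are hypothetical, and it is an assumption here that the scoring table carries the true class labels alongside the features, since accuracy cannot be computed without them.

```sql
-- golf_validation is a hypothetical labeled table with the same
-- columns as the training data. The call returns a float8 value.
SELECT madlib.c45_score(
    'trained_tree',       -- tree_table_name
    'golf_validation',    -- scoring_table_name
    0                     -- verbosity: 0 keeps the call quiet
) AS accuracy;
```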

Definition at line 1166 of file c45.sql_in.

float8 c45_score	(	text	tree_table_name,
		text	scoring_table_name
	)

Parameters

tree_table_name	The name of the trained tree.
scoring_table_name	The name of the table/view with the source data.

Returns: The estimated accuracy information.

Definition at line 1196 of file c45.sql_in.

c45_train_result c45_train	(	text	split_criterion,
		text	training_table_name,
		text	result_tree_table_name,
		text	validation_table_name,
		text	continuous_feature_names,
		text	feature_col_names,
		text	id_col_name,
		text	class_col_name,
		float	confidence_level,
		text	how2handle_missing_value,
		int	max_tree_depth,
		float	node_prune_threshold,
		float	node_split_threshold,
		int	verbosity
	)

Parameters

split_criterion	The name of the split criterion used for tree construction. Valid values are ‘infogain’, ‘gainratio’, and ‘gini’. It cannot be NULL. Information gain (infogain) and Gini index (gini) are biased toward multivalued attributes. Gain ratio (gainratio) adjusts for this bias, but it tends to prefer unbalanced splits in which one partition is much smaller than the others.
training_table_name	The name of the table/view with the source data.
result_tree_table_name	The name of the table where the resulting DT will be kept.
validation_table_name	The name of the table/view that contains the validation set used for tree pruning. The default is NULL, in which case no tree pruning is performed.
continuous_feature_names	A comma-separated list of the names of features whose values are continuous. The default is null, which means there are no continuous features in the training table.
feature_col_names	A comma-separated list of the names of table columns, each of which defines a feature. The default value is null, which means all the columns in the training table, except columns named ‘id’ and ‘class’, will be used as features.
id_col_name	The name of the column containing an ID for each record.
class_col_name	The name of the column containing the labeled class.
confidence_level	A statistical confidence interval of the resubstitution error.
how2handle_missing_value	How missing values are handled. Valid values are 'explicit' and 'ignore'.
max_tree_depth	Specifies the maximum number of levels in the result DT to avoid overgrown DTs.
node_prune_threshold	The minimum percentage of the number of records required in a child node. It cannot be NULL. Its range is [0.0, 1.0]. This threshold applies only to non-root nodes. Therefore, if its value is 1, the trained tree has only one node (the root node); if its value is 0, no nodes are pruned by this parameter.
node_split_threshold	The minimum percentage of the number of records required in a node for a further split to be possible. It cannot be NULL. Its range is [0.0, 1.0]. If its value is 1, the trained tree has only two levels, since only the root node can grow; if its value is 0, trees can grow extensively.
verbosity	> 0 means this function runs in verbose mode.
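
The bias remarks in the split_criterion description can be read against the usual textbook definitions of the three measures, given below. These are the standard formulas, not a statement of how c45.sql_in computes them. Here S is the set of records at a node, p_c is the proportion of records in S with class c, and S_v is the subset of S for which attribute A takes value v.

```latex
\begin{align*}
H(S) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{InfoGain}(S, A) &= H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S, A) &= \frac{\mathrm{InfoGain}(S, A)}{\mathrm{SplitInfo}(S, A)} \\
\mathrm{Gini}(S) &= 1 - \sum_{c} p_c^{2} \\
\mathrm{GiniGain}(S, A) &= \mathrm{Gini}(S) - \sum_{v} \frac{|S_v|}{|S|}\, \mathrm{Gini}(S_v)
\end{align*}
```

SplitInfo grows with the number of partitions, so dividing by it penalizes multivalued attributes. It also becomes small when one partition holds nearly all records, which inflates the ratio and explains the preference for unbalanced splits.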

Returns: A c45_train_result object.
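
A sketch of a long-form call with every argument spelled out. The madlib schema qualifier, the table names, and the feature column names (outlook, temperature, humidity, windy) are hypothetical; the argument order follows the signature above.

```sql
-- All names below are illustrative assumptions, not part of the API.
SELECT *
FROM madlib.c45_train(
    'infogain',                            -- split_criterion
    'golf_data',                           -- training_table_name
    'trained_tree',                        -- result_tree_table_name
    NULL,                                  -- validation_table_name: NULL means no pruning
    'temperature,humidity',                -- continuous_feature_names
    'outlook,temperature,humidity,windy',  -- feature_col_names
    'id',                                  -- id_col_name
    'class',                               -- class_col_name
    25,                                    -- confidence_level
    'explicit',                            -- how2handle_missing_value
    5,                                     -- max_tree_depth
    0.001,                                 -- node_prune_threshold, in [0.0, 1.0]
    0.01,                                  -- node_split_threshold, in [0.0, 1.0]
    1                                      -- verbosity: > 0 runs in verbose mode
);
```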

Definition at line 369 of file c45.sql_in.

c45_train_result c45_train	(	text	split_criterion,
		text	training_table_name,
		text	result_tree_table_name,
		text	validation_table_name,
		text	continuous_feature_names,
		text	feature_col_names,
		text	id_col_name,
		text	class_col_name,
		float	confidence_level,
		text	how2handle_missing_value
	)

Parameters

split_criterion	The name of the split criterion used for tree construction. Valid values are ‘infogain’, ‘gainratio’, and ‘gini’.
training_table_name	The name of the table/view with the source data.
result_tree_table_name	The name of the table where the resulting DT will be kept.
validation_table_name	The name of the table/view that contains the validation set used for tree pruning. The default is NULL, in which case no tree pruning is performed.
continuous_feature_names	A comma-separated list of the names of features whose values are continuous. The default is null, which means there are no continuous features in the training table.
feature_col_names	A comma-separated list of the names of table columns, each of which defines a feature. The default value is null, which means all the columns in the training table, except columns named ‘id’ and ‘class’, will be used as features.
id_col_name	The name of the column containing an ID for each record.
class_col_name	The name of the column containing the labeled class.
confidence_level	A statistical confidence interval of the resubstitution error.
how2handle_missing_value	How missing values are handled. Valid values are 'explicit' and 'ignore'.

Returns: A c45_train_result object.

Note

This calls the long form of C45 with the following default parameters:

max_tree_depth := 10
node_prune_threshold := 0.001
node_split_threshold := 0.01
verbosity := 0
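
The following sketch shows a ten-argument call that supplies a validation table so that pruning takes place, with the four defaulted parameters noted in comments. The madlib schema qualifier and all table and column names are assumptions for illustration.

```sql
-- Hypothetical tables: golf_data (training), golf_validation (pruning).
SELECT *
FROM madlib.c45_train(
    'gainratio',               -- split_criterion
    'golf_data',               -- training_table_name
    'pruned_tree',             -- result_tree_table_name
    'golf_validation',         -- validation_table_name: enables pruning
    'temperature,humidity',    -- continuous_feature_names
    NULL,                      -- feature_col_names: NULL uses all non-id, non-class columns
    'id',                      -- id_col_name
    'class',                   -- class_col_name
    25,                        -- confidence_level
    'ignore'                   -- how2handle_missing_value
);
-- Per the note above, the long form is invoked with max tree depth 10,
-- node_prune_threshold 0.001, node_split_threshold 0.01, verbosity 0.
```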

Definition at line 516 of file c45.sql_in.

c45_train_result c45_train	(	text	split_criterion,
		text	training_table_name,
		text	result_tree_table_name
	)

Parameters

split_criterion	The name of the split criterion used for tree construction. Valid values are ‘infogain’, ‘gainratio’, and ‘gini’.
training_table_name	The name of the table/view with the source data.
result_tree_table_name	The name of the table where the resulting DT will be kept.

Returns: A c45_train_result object.

Note

This calls the above short form of C45 with the following default parameters:

validation_table_name := NULL
continuous_feature_names := NULL
id_col_name := 'id'
class_col_name := 'class'
confidence_level := 25
how2handle_missing_value := 'explicit'
max_tree_depth := 10
node_prune_threshold := 0.001
node_split_threshold := 0.01
verbosity := 0
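
Because this form defaults the ID and class columns to 'id' and 'class' and declares no continuous features, a training table for it can be sketched as follows. The madlib schema qualifier, the table names, and the sample rows are all made up for illustration.

```sql
-- A tiny training table with discrete features only; the column names
-- id and class match the defaults listed above.
CREATE TABLE golf_data (
    id      INT,
    outlook TEXT,
    windy   TEXT,
    class   TEXT
);

INSERT INTO golf_data VALUES
    (1, 'sunny',    'false', 'no'),
    (2, 'sunny',    'true',  'no'),
    (3, 'overcast', 'false', 'yes'),
    (4, 'rain',     'false', 'yes'),
    (5, 'rain',     'true',  'no'),
    (6, 'overcast', 'true',  'yes');

-- Train with every optional parameter left at its default.
SELECT *
FROM madlib.c45_train(
    'infogain',       -- split_criterion
    'golf_data',      -- training_table_name
    'trained_tree'    -- result_tree_table_name
);
```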

Definition at line 582 of file c45.sql_in.
