MADlib
1.1
User Documentation
SQL functions for POS/NER feature extraction.
Functions |
void | crf_train_fgen (text segmenttbl, text regextbl, text dictionary, text featuretbl, text featureset) |
This function extracts POS/NER features from the training data. |
void | crf_test_fgen (text segmenttbl, text dictionary, text labeltbl, text regextbl, text featuretbl, text viterbi_mtbl, text viterbi_rtbl) |
This function extracts POS/NER features from the testing data. |
Definition in file crf_feature_gen.sql_in.
void crf_test_fgen (text segmenttbl, text dictionary, text labeltbl, text regextbl, text featuretbl, text viterbi_mtbl, text viterbi_rtbl)
This feature extraction function produces two factor tables: the "m table" (viterbi_mtbl) and the "r table" (viterbi_rtbl). Together, the two tables are used to calculate the best label sequence for each sentence.
startFeature is treated as a special edge feature that runs from the beginning of the sentence to the first token. Likewise, endFeature is treated as a special edge feature that runs from the last token to the end of the sentence. The m table therefore encodes edgeFeature, startFeature, and endFeature. If the label space contains 45 labels, numbered 0 to 44, the m factor array is laid out as follows:
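row | 0 | 1 | 2 | 3 | 4 | 5 | ... | 44 |
startFeature -1 | a | a | a | a | a | a | ... | a |
edgeFeature 0 | a | a | a | a | a | a | ... | a |
edgeFeature 1 | a | a | a | a | a | a | ... | a |
... |
edgeFeature 44 | a | a | a | a | a | a | ... | a |
endFeature 45 | a | a | a | a | a | a | ... | a |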
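The row numbering in the m factor array (-1 for startFeature, 0 to 44 for edgeFeature, 45 for endFeature) can be mapped to ordinary array offsets. The following Python sketch illustrates one way to do that. The helper names `m_row` and `m_index` are hypothetical, and the row-major flat storage is an assumption made for illustration; this page does not state how MADlib stores the array internally.

```python
NUM_LABELS = 45  # label space 0..44, as in the example above


def m_row(prev_label: int, num_labels: int = NUM_LABELS) -> int:
    """Map a row id from the layout above to a 0-based row index.

    -1 (startFeature) -> 0, label i (edgeFeature) -> i + 1,
    num_labels (endFeature) -> num_labels + 1.
    """
    if not -1 <= prev_label <= num_labels:
        raise ValueError(f"row id {prev_label} is outside -1..{num_labels}")
    return prev_label + 1


def m_index(prev_label: int, label: int, num_labels: int = NUM_LABELS) -> int:
    """Offset of (row, column) in a flat array. ASSUMPTION: row-major order."""
    if not 0 <= label < num_labels:
        raise ValueError(f"label {label} is outside 0..{num_labels - 1}")
    return m_row(prev_label, num_labels) * num_labels + label


# One start row, one row per label, and one end row.
M_SIZE = (NUM_LABELS + 2) * NUM_LABELS
```

Under these assumptions the array holds 47 rows of 45 entries each, and `m_index(-1, 0)` addresses the first entry of the startFeature row.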
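The r factor array has one row per token and one column per label:
row | 0 | 1 | 2 | 3 | 4 | ... | 44 |
token1 | a | a | a | a | a | ... | a |
token2 | a | a | a | a | a | ... | a |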
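To illustrate how the two factor arrays can yield a best label sequence, here is a minimal Viterbi sketch in Python. It is not MADlib's implementation. It assumes that factor scores combine additively (for example, log-space scores), that `m` is a list of rows ordered start, edge 0 to edge L-1, end, and that `r` holds one row per token. The function name `viterbi` and the plain-list representation are hypothetical.

```python
def viterbi(m, r):
    """Return the highest-scoring label sequence for one sentence.

    m: (L + 2) rows x L columns; row 0 is startFeature, rows 1..L are
       edgeFeature (previous label -> current label), row L + 1 is endFeature.
    r: T rows x L columns; r[t][y] is the score of label y at token t.
    ASSUMPTION: scores are additive.
    """
    num_labels = len(m) - 2
    if not r:
        return []
    start, end = m[0], m[num_labels + 1]
    edges = m[1:num_labels + 1]

    # score[y]: best score of any label path ending in y at the current token
    score = [start[y] + r[0][y] for y in range(num_labels)]
    back = []  # back[t - 1][y]: best previous label for y at token t
    for t in range(1, len(r)):
        prev, score, ptr = score, [], []
        for y in range(num_labels):
            best = max(range(num_labels), key=lambda p: prev[p] + edges[p][y])
            ptr.append(best)
            score.append(prev[best] + edges[best][y] + r[t][y])
        back.append(ptr)

    # Add the end factor, then follow the back-pointers to recover the path.
    last = max(range(num_labels), key=lambda y: score[y] + end[y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

For a toy label space of two labels, `m` has four rows (start, edge 0, edge 1, end) and `r` has one row per token; `viterbi(m, r)` then returns a list with one label index per token.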
segmenttbl | Name of table containing all the tokenized testing sentences. |
dictionary | Name of table containing the dictionary. |
labeltbl | Name of table containing the label space used in POS or other NLP tasks. |
regextbl | Name of table containing all the regular expressions to capture regex features. |
viterbi_mtbl | Name of table to store the m factors. |
viterbi_rtbl | Name of table to store the r factors. |
Definition at line 231 of file crf_feature_gen.sql_in.
void crf_train_fgen (text segmenttbl, text regextbl, text dictionary, text featuretbl, text featureset)
segmenttbl | Name of table containing all the tokenized training sentences. |
regextbl | Name of table containing all the regular expressions to capture regex features. |
dictionary | Name of table containing the dictionary. |
featuretbl | Name of table to store the features generated from the training dataset. |
featureset | Name of table to store the unique feature set generated from the training dataset. |
Definition at line 46 of file crf_feature_gen.sql_in.