The second method we can use for training is known as Support Vector Machine (SVM) classification. SVM is a type of machine learning algorithm derived from statistical learning theory. One useful property of SVM classification is that it can learn from a very small sample set.
Using the SVM classifier is much the same as using the Decision Tree classifier, with the following differences.
The preference used in the call to CTX_CLS.TRAIN should be of type SVM_CLASSIFIER instead of RULE_CLASSIFIER. (If you do not want to modify any attributes, you can use the predefined preference CTXSYS.SVM_CLASSIFIER.)
The CONTEXT index on the table does not have to be populated; that is, you can use the NOPOPULATE keyword. The classifier uses the index only to find the source of the text, by means of datastore and filter preferences, and to determine how to process the text, through lexer and sectioner preferences.
The table for the generated rules must have (as a minimum) these columns:
cat_id number, type number, rule blob;
As you can see, the generated rule is written into a BLOB column. It is therefore opaque to the user, and unlike Decision Tree classification rules, it cannot be edited or modified. The trade-off here is that you often get considerably better accuracy with SVM than with Decision Tree classification.
With SVM classification, allocated memory has to be large enough to load the SVM model; otherwise, the application built on SVM will incur an out-of-memory error. Here is how to calculate the memory allocation:
Minimum memory request (in bytes) = number of unique categories x number of features (for example, the value of the MAX_FEATURES attribute) x 8
If necessary to meet the minimum memory requirement, do one of the following:
Increase SGA memory (if in shared server mode).
Increase PGA memory (if in dedicated server mode).
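The memory formula above can be worked through with a short calculation. The following Python sketch is illustrative only: the function name min_svm_memory_bytes and the sample figures (50 categories, MAX_FEATURES of 3000) are assumptions made for this example and do not come from Oracle Text itself.

```python
# Sketch of the minimum-memory formula for loading an SVM model:
#   bytes = unique categories x number of features x 8
# The function name and the sample inputs are hypothetical.

BYTES_PER_VALUE = 8  # the "x 8" factor in the formula


def min_svm_memory_bytes(num_categories: int, num_features: int) -> int:
    """Return the minimum memory request, in bytes, for the SVM model."""
    if num_categories < 0 or num_features < 0:
        raise ValueError("counts must be non-negative")
    return num_categories * num_features * BYTES_PER_VALUE


if __name__ == "__main__":
    # Assumed example: 50 unique categories, MAX_FEATURES set to 3000.
    required = min_svm_memory_bytes(50, 3000)
    print(f"Minimum memory request: {required} bytes")
```

With these assumed inputs the result is 50 x 3000 x 8 = 1,200,000 bytes. Compare that figure against the SGA or PGA memory available to the session, depending on the server mode.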