With supervised classification, you use the CTX_CLS.TRAIN procedure to automate the rule-writing step. CTX_CLS.TRAIN uses a training set of sample documents to deduce classification rules. This is the major advantage over rule-based classification, in which you must write the classification rules yourself.
However, before you can run the CTX_CLS.TRAIN procedure, you must manually create categories and assign each document in the sample training set to a category.
See Also: Oracle Text Reference for more information on CTX_CLS.TRAIN
After the rules are generated, you index them to create a CTXRULE index. You can then use the MATCHES operator to classify an incoming stream of new documents.
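The workflow described above can be sketched in SQL and PL/SQL roughly as follows. This is a minimal sketch rather than a definitive implementation: every table, column, index, and preference name (doc, category, rules, docx, rules_idx, my_classifier) is hypothetical, and the argument order shown for CTX_CLS.TRAIN should be verified against Oracle Text Reference for your release.

```sql
-- Hypothetical training set: sample documents and their manual category assignments.
CREATE TABLE doc (id NUMBER PRIMARY KEY, text VARCHAR2(4000));
CREATE TABLE category (doc_id NUMBER, cat_id NUMBER, cat_name VARCHAR2(100));

-- Hypothetical result table that receives the generated rules.
CREATE TABLE rules (
  rule_cat_id     NUMBER,
  rule_text       VARCHAR2(4000),
  rule_confidence NUMBER
);

-- CONTEXT index on the training documents. NOPOPULATE is assumed to be
-- sufficient here because training only needs the index definition.
CREATE INDEX docx ON doc (text)
  INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('NOPOPULATE');

BEGIN
  -- Decision Tree classifier preference (assumed name: my_classifier).
  CTX_DDL.CREATE_PREFERENCE('my_classifier', 'RULE_CLASSIFIER');

  CTX_CLS.TRAIN(
    'docx',             -- index on the training documents
    'id',               -- document ID column in doc
    'category',         -- table mapping documents to categories
    'doc_id',           -- document ID column in category
    'cat_id',           -- category ID column in category
    'rules',            -- result table for generated rules
    'rule_cat_id',      -- category ID column in rules
    'rule_text',        -- generated rule (query) column in rules
    'rule_confidence',  -- confidence column in rules
    'my_classifier'     -- classifier preference
  );
END;
/

-- Index the generated rules, then classify a new document with MATCHES.
CREATE INDEX rules_idx ON rules (rule_text) INDEXTYPE IS CTXSYS.CTXRULE;

SELECT rule_cat_id
FROM   rules
WHERE  MATCHES(rule_text, 'text of an incoming document') > 0;
```

The final query returns the category IDs whose rules match the supplied document text. In practice the literal would be replaced by a bind variable holding each incoming document.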
You may choose between two different classification algorithms for supervised classification:
Decision Tree Supervised Classification
The advantage of Decision Tree classification is that the generated rules are easy to inspect (and modify). See "Decision Tree Supervised Classification Example".
SVM-Based Supervised Classification
This method uses the Support Vector Machine (SVM) algorithm for creating rules. The advantage of SVM-based classification is that it is often more accurate than Decision Tree classification. The disadvantage is that it generates binary rules, so the rules themselves are opaque. See "SVM-Based Supervised Classification Example".
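As a hedged illustration of how the SVM variant differs in practice, the sketch below trains with an SVM_CLASSIFIER preference and stores the rules in a BLOB column, which is why they cannot be read directly. It assumes that hypothetical training tables doc(id, text) and category(doc_id, cat_id) already exist, along with a CONTEXT index named docx on doc.text. The names svm_rules, svm_rules_idx, and my_svm_classifier are also hypothetical, and the shorter CTX_CLS.TRAIN argument list should be checked against Oracle Text Reference.

```sql
-- Hypothetical result table; SVM rules are stored in binary form.
CREATE TABLE svm_rules (
  cat_id NUMBER,
  type   NUMBER(3),
  rule   BLOB
);

BEGIN
  -- SVM classifier preference (assumed name: my_svm_classifier).
  CTX_DDL.CREATE_PREFERENCE('my_svm_classifier', 'SVM_CLASSIFIER');

  CTX_CLS.TRAIN(
    'docx',              -- index on the training documents
    'id',                -- document ID column in doc
    'category',          -- table mapping documents to categories
    'doc_id',            -- document ID column in category
    'cat_id',            -- category ID column in category
    'svm_rules',         -- result table for generated rules
    'my_svm_classifier'  -- classifier preference
  );
END;
/

-- The CTXRULE index is told which classifier produced the rules.
CREATE INDEX svm_rules_idx ON svm_rules (rule)
  INDEXTYPE IS CTXSYS.CTXRULE
  PARAMETERS ('CLASSIFIER my_svm_classifier');

-- The threshold of 50 is an arbitrary example value, not a recommendation.
SELECT cat_id
FROM   svm_rules
WHERE  MATCHES(rule, 'text of an incoming document') > 50;
```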