Documents are classified according to predefined rules, each of which selects documents for a category. For instance, the query rule 'presidential elections' might select documents for a category about politics.
Oracle Text provides several types of classification. One type is simple, or rule-based, classification, discussed here, in which you create both the document categories and the rules for categorizing documents. With supervised classification, Oracle Text derives the rules from a set of training documents that you provide. With clustering, Oracle Text does all the work for you, deriving both rules and categories.
See "Overview of Document Classification" for more information on classification.
To create a simple classification application for document content using Oracle Text, you create rules. Rules are essentially a table of queries that categorize document content. You index these rules in a CTXRULE index. To classify an incoming stream of text, use the MATCHES operator in the WHERE clause of a SELECT statement. See Figure 2-2 for the general flow of a classification application.
Figure 2-2 Overview of a Document Classification Application
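
The flow described above can be sketched in SQL as follows. This is a minimal illustration, not a complete application: the table name news_rules, its columns, the index name news_rules_idx, and the sample rules and document text are all hypothetical and chosen only for this example. It assumes the schema has the privileges needed to create Oracle Text indexes.

```sql
-- Hypothetical rule table: one row per query rule, each tied to a category.
CREATE TABLE news_rules (
  rule_id   NUMBER PRIMARY KEY,
  category  VARCHAR2(30),
  rule_text VARCHAR2(200)
);

-- Sample rules. The first reuses the 'presidential elections' query from the text above.
INSERT INTO news_rules VALUES (1, 'politics', 'presidential elections');
INSERT INTO news_rules VALUES (2, 'sports',   'football OR basketball');
COMMIT;

-- Index the rules (not the documents) with the CTXRULE index type.
CREATE INDEX news_rules_idx ON news_rules (rule_text)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one incoming document: MATCHES returns a value greater than 0
-- for every rule that the document text satisfies.
SELECT category
FROM   news_rules
WHERE  MATCHES(rule_text,
               'Turnout in the presidential elections was higher than expected.') > 0;
```

In an application, the literal document text in the final query would typically be replaced by a bind variable or a column holding the incoming text, so that each new document is routed to the categories whose rules it matches.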