The CTX_CLS.TRAIN
procedure requires an input training document set. A training set is a set of documents that have already been assigned a category.
Perform the following steps to create the category rules.
Create and populate a training document table
Create and load a table of training documents. This example uses a simple set; three concern fast food and three concern computers.
create table docs ( doc_id number primary key, doc_text clob); insert into docs values (1, 'MacTavishes is a fast-food chain specializing in burgers, fries and - shakes. Burgers are clearly their most important line.'); insert into docs values (2, 'Burger Prince are an up-market chain of burger shops, who sell burgers - and fries in competition with the likes of MacTavishes.'); insert into docs values (3, 'Shakes 2 Go are a new venture in the low-cost restaurant arena, specializing in semi-liquid frozen fruit-flavored vegetable oil products.'); insert into docs values (4, 'TCP/IP network engineers generally need to know about routers, firewalls, hosts, patch cables networking etc'); insert into docs values (5, 'Firewalls are used to protect a network from attack by remote hosts, generally across TCP/IP');
Create category tables, category descriptions and ids
----------------------------------------------------------------------------
-- Create category tables -- Note that "category_descriptions" isn't really needed for this demo - -- it just provides a descriptive name for the category numbers in -- doc_categories ---------------------------------------------------------------------------- create table category_descriptions ( cd_category number, cd_description varchar2(80)); create table doc_categories ( dc_category number, dc_doc_id number, primary key (dc_category, dc_doc_id)) organization index; -- descriptons for categories insert into category_descriptions values (1, 'fast food'); insert into category_descriptions values (2, 'computer networking');
Assign each document to a category
In this case, the fast food documents all go into category 1, and the computer documents into category 2.
insert into doc_categories values (1, 1); insert into doc_categories values (1, 2); insert into doc_categories values (1, 3); insert into doc_categories values (2, 4); insert into doc_categories values (2, 5);
Create a CONTEXT index to be used by CTX_CLS.TRAIN
Create an Oracle Text preference for the index. This enables us to experiment with the effects of turning themes on and off:
exec ctx_ddl.create_preference('my_lex', 'basic_lexer'); exec ctx_ddl.set_attribute ('my_lex', 'index_themes', 'no'); exec ctx_ddl.set_attribute ('my_lex', 'index_text', 'yes'); create index docsindex on docs(doc_text) indextype is ctxsys.context parameters ('lexer my_lex');
Create the rules table
Create the table that will be populated by the generated rules.
create table rules( rule_cat_id number, rule_text varchar2(4000), rule_confidence number );
Call CTX_CLS.TRAIN procedure to generate category rules
Now call the CTX_CLS.TRAIN
procedure to generate some rules. Note all the arguments are the names of tables, columns or indexes previously created in this example. The rules
table now contains the rules, which you can view.
begin ctx_cls.train( index_name => 'docsindex', docid => 'doc_id', cattab => 'doc_categories', catdocid => 'dc_doc_id', catid => 'dc_category', restab => 'rules', rescatid => 'rule_cat_id', resquery => 'rule_text', resconfid => 'rule_confidence' ); end; /
Fetch the generated rules, viewed by category
Fetch the generated rules. For convenience's sake, the rules
table is joined with category_descriptions
so we can see to which category each rule applies:
select cd_description, rule_confidence, rule_text from rules, category_descriptions where cd_category = rule_cat_id;