Create the Category Rules

The CTX_CLS.TRAIN procedure requires an input training document set. A training set is a set of documents that have already been assigned a category.

Perform the following steps to create the category rules.

  1. Create and populate a training document table

    Create and load a table of training documents. This example uses a simple set; three concern fast food and three concern computers.

    create table docs (
      doc_id number primary key,
      doc_text   clob);
    
    insert into docs values
    (1, 'MacTavishes is a fast-food chain specializing in burgers, fries and -
    shakes. Burgers are clearly their most important line.');
    insert into docs values
    (2, 'Burger Prince are an up-market chain of burger shops, who sell burgers -
    and fries in competition with the likes of MacTavishes.');
    insert into docs values
    (3, 'Shakes 2 Go are a new venture in the low-cost restaurant arena, 
    specializing in semi-liquid frozen fruit-flavored vegetable oil products.');
    insert into docs values
    (4, 'TCP/IP network engineers generally need to know about routers, 
    firewalls, hosts, patch cables networking etc');
    insert into docs values
    (5, 'Firewalls are used to protect a network from attack by remote hosts,
     generally across TCP/IP');
    
  2. Create category tables, category descriptions and ids

    ----------------------------------------------------------------------------
    
    -- Create category tables
    -- Note that "category_descriptions" isn't really needed for this demo -
    -- it just provides a descriptive name for the category numbers in
    -- doc_categories
    ----------------------------------------------------------------------------
    
    create table category_descriptions (
      cd_category    number,
      cd_description varchar2(80));
    
    create table doc_categories (
      dc_category    number,
      dc_doc_id      number,
      primary key (dc_category, dc_doc_id)) 
      organization index;
    
    -- descriptons for categories
    
    insert into category_descriptions values (1, 'fast food');
    insert into category_descriptions values (2, 'computer networking');
    
  3. Assign each document to a category

    In this case, the fast food documents all go into category 1, and the computer documents into category 2.

    insert into doc_categories values (1, 1);
    insert into doc_categories values (1, 2);
    insert into doc_categories values (1, 3);
    insert into doc_categories values (2, 4);
    insert into doc_categories values (2, 5);
    
  4. Create a CONTEXT index to be used by CTX_CLS.TRAIN

    Create an Oracle Text preference for the index. This enables us to experiment with the effects of turning themes on and off:

    exec ctx_ddl.create_preference('my_lex', 'basic_lexer');
    exec ctx_ddl.set_attribute    ('my_lex', 'index_themes', 'no');
    exec ctx_ddl.set_attribute    ('my_lex', 'index_text',   'yes');
    
    create index docsindex on docs(doc_text) indextype is ctxsys.context
    parameters ('lexer my_lex');
    
  5. Create the rules table

    Create the table that will be populated by the generated rules.

    create table rules(
      rule_cat_id     number,
      rule_text       varchar2(4000),
      rule_confidence number
    );
    
  6. Call CTX_CLS.TRAIN procedure to generate category rules

    Now call the CTX_CLS.TRAIN procedure to generate some rules. Note all the arguments are the names of tables, columns or indexes previously created in this example. The rules table now contains the rules, which you can view.

    begin
      ctx_cls.train(
        index_name => 'docsindex',
        docid      => 'doc_id',
        cattab     => 'doc_categories',
        catdocid   => 'dc_doc_id',
        catid      => 'dc_category',
        restab     => 'rules',
        rescatid   => 'rule_cat_id',
        resquery   => 'rule_text',
        resconfid  => 'rule_confidence'
      );
    end;
    /
    
  7. Fetch the generated rules, viewed by category

    Fetch the generated rules. For convenience's sake, the rules table is joined with category_descriptions so we can see to which category each rule applies:

    select cd_description, rule_confidence, rule_text from rules, 
    category_descriptions where cd_category = rule_cat_id;