SVM-Based Supervised Classification Example

This example uses SVM-based classification and follows essentially the same steps as the Decision Tree example. The two examples differ as follows:

  • In this example, we set the SVM_CLASSIFIER preference with CTX_DDL.CREATE_PREFERENCE rather than setting it in CTX_CLS.TRAIN. (You can do it either way.)

  • In this example, our category table includes category descriptions, unlike the category table in the Decision Tree example. (You can do it either way.)

  • CTX_CLS.TRAIN takes fewer arguments than in the Decision Tree example, because the SVM rules are stored in binary form and are opaque to the user.

Perform the following steps to create an SVM-based supervised classification:

  1. Create and populate the training document table

    create table doc (id number primary key, text varchar2(2000));
    insert into doc values(1,'1 2 3 4 5 6');
    insert into doc values(2,'3 4 7 8 9 0');
    insert into doc values(3,'a b c d e f');
    insert into doc values(4,'g h i j k l m n o p q r');
    insert into doc values(5,'g h i j k s t u v w x y z');
    
  2. Create and populate the category table

    create table testcategory (
            doc_id number, 
            cat_id number, 
            cat_name varchar2(100)
             );
    insert into testcategory values (1,1,'number');
    insert into testcategory values (2,1,'number');
    insert into testcategory values (3,2,'letter');
    insert into testcategory values (4,2,'letter');
    insert into testcategory values (5,2,'letter');
    
  3. Create the CONTEXT index on the document table

    In this case, we create the index without populating it.

    create index docx on doc(text) indextype is ctxsys.context 
           parameters('nopopulate');
    
  4. Set the SVM_CLASSIFIER preference

    This can also be done in CTX_CLS.TRAIN.

    exec ctx_ddl.create_preference('my_classifier','SVM_CLASSIFIER'); 
    exec ctx_ddl.set_attribute('my_classifier','MAX_FEATURES','100');
    
  5. Create the result (rule) table

    create table restab (
      cat_id number,
      type number(3) not null,
      rule blob
     );
    
  6. Perform the training

    exec ctx_cls.train('docx', 'id','testcategory','doc_id','cat_id',
         'restab','my_classifier');
    
  7. Create a CTXRULE index on the result (rule) table

    exec ctx_ddl.create_preference('my_filter','NULL_FILTER');
    create index restabx on restab (rule) 
           indextype is ctxsys.ctxrule 
           parameters ('filter my_filter classifier my_classifier');
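
Before classifying, it can be useful to confirm that training produced rules. The query below is a minimal sketch that assumes the tables from the steps above; the column alias rule_bytes is ours, and the stored rules are binary and not meant to be read directly.

    -- If training succeeded, restab should contain rows for the trained
    -- categories. DBMS_LOB.GETLENGTH returns the size of each binary rule.
    select cat_id, type, dbms_lob.getlength(rule) as rule_bytes
      from restab
     order by cat_id;

An empty result most likely means that step 6 did not run or failed, in which case the MATCHES queries that follow would return no rows.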
    

Now we can classify two unknown documents:

    select cat_id, match_score(1) from restab
           where matches(rule, '4 5 6',1)>50;

    select cat_id, match_score(1) from restab
           where matches(rule, 'f h j',1)>50;
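
The same check can be wrapped in a PL/SQL block that classifies several documents in one pass and reports the category name from testcategory. This is only a sketch: it assumes the rule table and CTXRULE index above exist, the type and variable names (doc_list, docs, v_doc, v_name) are ours, and the threshold of 50 simply mirrors the queries above.

    set serveroutput on
    declare
      -- Hypothetical helper type and variables, not part of the example schema.
      type doc_list is table of varchar2(2000);
      docs   doc_list := doc_list('4 5 6', 'f h j');
      v_doc  varchar2(2000);
      v_name varchar2(100);
    begin
      for i in 1 .. docs.count loop
        v_doc := docs(i);
        -- Find every category whose rule matches this document.
        for r in (select cat_id, match_score(1) as score
                    from restab
                   where matches(rule, v_doc, 1) > 50) loop
          -- testcategory has one row per training document, so collapse
          -- the repeated names for this category to a single value.
          select max(cat_name) into v_name
            from testcategory
           where cat_id = r.cat_id;
          dbms_output.put_line(v_doc || ' -> ' || v_name ||
                               ' (cat_id ' || r.cat_id ||
                               ', score ' || r.score || ')');
        end loop;
      end loop;
    end;
    /

Copying each document into a scalar variable before the query keeps the MATCHES call in the same shape as the standalone queries, with the literal replaced by a bind variable.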

When you are finished, drop the example tables and preferences:

    drop table doc;
    drop table testcategory;
    drop table restab;
    exec ctx_ddl.drop_preference('my_classifier');
    exec ctx_ddl.drop_preference('my_filter');