The following example uses SVM-based classification. It uses essentially the same steps as the Decision Tree example. Some differences between the examples are as follows:
In this example, we set the SVM_CLASSIFIER preference with CTX_DDL.CREATE_PREFERENCE rather than setting it in CTX_CLS.TRAIN. (You can do it either way.)
In this example, our category table includes category descriptions, unlike the category table in the Decision Tree example. (You can do it either way.)
CTX_CLS.TRAIN takes fewer arguments than in the Decision Tree example, as rules are opaque to the user.
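To make the difference in argument count concrete, the two call shapes can be set side by side. This is only a sketch for comparison and is not meant to be run at this point: the first call is the SVM form used later in this example, while the second shows the general shape of the Decision Tree form. Its table and column names (`category`, `rules`, `rule_cat_id`, `rule_query`, `rule_conf`) are hypothetical placeholders, not names taken from the Decision Tree example.

```sql
-- SVM form: index, document id column, category table and its two columns,
-- result table, classifier preference.
exec ctx_cls.train('docx', 'id', 'testcategory', 'doc_id', 'cat_id', 'restab', 'my_classifier');

-- Decision Tree form (all names hypothetical): the call also names the result
-- table columns for the category id, the readable rule query, and its confidence.
exec ctx_cls.train('docx', 'id', 'category', 'doc_id', 'cat_id', 'rules', 'rule_cat_id', 'rule_query', 'rule_conf');
```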
Perform the following steps to create an SVM-based supervised classification.
Create and populate the training document table
create table doc (id number primary key, text varchar2(2000));
insert into doc values(1,'1 2 3 4 5 6');
insert into doc values(2,'3 4 7 8 9 0');
insert into doc values(3,'a b c d e f');
insert into doc values(4,'g h i j k l m n o p q r');
insert into doc values(5,'g h i j k s t u v w x y z');
Create and populate the category table
create table testcategory (
  doc_id number,
  cat_id number,
  cat_name varchar2(100)
);
insert into testcategory values (1,1,'number');
insert into testcategory values (2,1,'number');
insert into testcategory values (3,2,'letter');
insert into testcategory values (4,2,'letter');
insert into testcategory values (5,2,'letter');
Create the CONTEXT index on the document table
In this case, we create the index without populating.
create index docx on doc(text) indextype is ctxsys.context parameters('nopopulate');
Set the SVM_CLASSIFIER preference
This can also be done in CTX_CLS.TRAIN.
exec ctx_ddl.create_preference('my_classifier','SVM_CLASSIFIER');
exec ctx_ddl.set_attribute('my_classifier','MAX_FEATURES','100');
Create the result (rule) table
create table restab ( cat_id number, type number(3) not null, rule blob );
Perform the training
exec ctx_cls.train('docx', 'id','testcategory','doc_id','cat_id', 'restab','my_classifier');
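After training, a quick look at the result table can confirm that rules were produced. The query below is an optional check, not part of the original procedure. It assumes the `restab` table from the previous steps and uses the standard `DBMS_LOB.GETLENGTH` function to report each rule's size, since the SVM rules are stored as binary data and cannot be read directly.

```sql
-- One row per generated rule: category, rule type, and size of the opaque rule blob.
select cat_id, type, dbms_lob.getlength(rule) as rule_bytes
from restab
order by cat_id;
```

If this returns no rows, training produced no rules, and the CTXRULE index created in the next step would have nothing to match against.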
Create a CTXRULE index on the rules table
exec ctx_ddl.create_preference('my_filter','NULL_FILTER');
create index restabx on restab (rule) indextype is ctxsys.ctxrule
  parameters ('filter my_filter classifier my_classifier');
Now we can classify two unknown documents:
select cat_id, match_score(1) from restab where matches(rule, '4 5 6',1)>50;
select cat_id, match_score(1) from restab where matches(rule, 'f h j',1)>50;

drop table doc;
drop table testcategory;
drop table restab;
exec ctx_ddl.drop_preference('my_classifier');
exec ctx_ddl.drop_preference('my_filter');