You can add your custom thesaurus to a branch in the existing knowledge base. The knowledge base is a hierarchical tree of concepts used for theme indexing, ABOUT queries, and deriving themes for document services.
When you augment the existing knowledge base with your new thesaurus, you query with the ABOUT operator, which implicitly expands to synonyms and narrower terms. You do not query with the thesaurus operators.
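To augment the existing knowledge base with your custom thesaurus, follow these steps:
1. Link the new terms to existing terms in the knowledge base.
2. Load the thesaurus with ctxload.
3. Compile the loaded thesaurus into the knowledge base with ctxkbtc.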
Compiling your custom thesaurus with the existing knowledge base before indexing enables faster and simpler queries with the ABOUT operator. Document services can also take full advantage of the customized information for creating theme summaries and gists.
Use of the ABOUT operator requires a theme component in the index, which uses slightly more disk space. You must also define the thesaurus before indexing your documents. If you make any change to the thesaurus, you must recompile the thesaurus and re-index your documents.
When adding terms to the knowledge base, Oracle recommends that new terms be linked to one of the categories in the knowledge base for best results in theme proving.
See Oracle Text Reference for more information about the supplied English knowledge base.
If new terms are kept completely separate from existing categories, fewer themes from the new terms are proven. This results in poor precision and recall for ABOUT queries, as well as poor-quality gists and theme highlighting.
You link new terms to existing terms by making an existing term the broader term for the new terms.
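Before you load the thesaurus, you can check that it follows the linking rule above. One way is to list its root terms, meaning terms that never receive a broader term.

The shell sketch below assumes the indented ctxload layout: each head term sits on its own unindented line, and its relationship lines are indented beneath it. `list_root_terms` is a hypothetical helper, not an Oracle tool. It considers only BT/BTG/BTP/BTI and NT/NTG/NTP/NTI lines and ignores SYN, RT, and other relationships. Matching is exact, so case matters.

Every term it prints should be an existing knowledge-base category, such as health and medicine. Any other term it prints is a new term that is not yet linked.

```shell
# Hypothetical helper: print the root terms of a thesaurus file, i.e. terms that
# never get a broader term. Assumes the indented ctxload layout: a head term on
# an unindented line, its relationship lines indented beneath it.
# Only BT/BTG/BTP/BTI and NT/NTG/NTP/NTI lines are considered; SYN, RT and other
# relationships are ignored. Matching is exact (case-sensitive).
list_root_terms() {
  awk '
    /^[ \t]*$/ { next }                      # skip blank lines
    /^[^ \t]/ {                              # unindented line: a new head term
      head = $0
      sub(/[ \t]+$/, "", head)
      term[head] = 1
      next
    }
    head != "" {                             # indented line: a relationship of head
      line = $0
      sub(/^[ \t]+/, "", line)
      sub(/[ \t]+$/, "", line)
      kw = line
      sub(/[ \t].*$/, "", kw)                # first word, e.g. NT or BT
      target = line
      sub(/^[^ \t]+[ \t]+/, "", target)      # rest of the line: the related term
      if (target == line) next               # keyword with no term: ignore
      if (kw ~ /^BT[GPI]?$/) {               # head has a broader term
        term[target] = 1
        broader[head] = 1
      } else if (kw ~ /^NT[GPI]?$/) {        # target has head as its broader term
        term[target] = 1
        broader[target] = 1
      }
    }
    END {
      for (t in term)
        if (!(t in broader)) print t
    }
  ' "$1" | LC_ALL=C sort
}
```

For example, running `list_root_terms med.thes` before the link entries are added would print the four medthes top terms. Once they are linked, it should print only health and medicine.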
For example, suppose you purchase a medical thesaurus, medthes, containing a hierarchy of medical terms. The four top terms in the thesaurus are as follows:
Anesthesia and Analgesia
Anti-Allergic and Respiratory System Agents
Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
Antineoplastic and Immunosuppressive Agents
To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:
health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents
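You can also add the same link entries to med.thes from a script. The sketch below makes several assumptions:

- `append_kb_links` is a hypothetical helper, not an Oracle tool.
- med.thes uses the indented ctxload layout, with each head term on its own line and its relationship lines indented beneath it.
- The position of an entry within the file does not matter to ctxload.

The helper skips the append if a health and medicine head entry is already present.

```shell
# Hypothetical helper: append the entries that place the four medthes top terms
# under the existing "health and medicine" category. Assumes the indented
# ctxload layout and that entry order within the file does not matter.
append_kb_links() {
  thes_file=$1
  # Do nothing if a "health and medicine" head entry already exists,
  # so running the helper twice does not duplicate the entries.
  if grep -qx 'health and medicine' "$thes_file" 2>/dev/null; then
    return 0
  fi
  # If the file does not end with a newline, add one so the new head term
  # starts on its own line.
  if [ -s "$thes_file" ] && [ -n "$(tail -c 1 "$thes_file")" ]; then
    echo >> "$thes_file"
  fi
  cat >> "$thes_file" <<'EOF'
health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents
EOF
}
```

The link entries must spell each top term exactly as it appears in the thesaurus, because the thesaurus is loaded as case-sensitive.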
Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:
ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys
When you enter the ctxload command line, you are prompted for the user password. For best security practices, never enter the password at the command line. Alternatively, you can omit the -user option and let ctxload prompt you for both the username and the password.
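As an alternative to ctxload, you can load a thesaurus from a CLOB with the CTX_THES.IMPORT_THESAURUS procedure. The following example creates a case-sensitive thesaurus named mythesaurus and imports the thesaurus content in myclob into the Oracle Text thesaurus tables: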
declare
  myclob clob;
begin
  myclob := to_clob('peking SYN beijing BT capital country NT beijing tokyo');
  -- the third argument 'Y' creates a case-sensitive thesaurus
  ctx_thes.import_thesaurus('mythesaurus', myclob, 'Y');
end;
/
The thesaurus content to be imported (myclob in this example) must use the same format as a ctxload thesaurus file. If the format is not correct, IMPORT_THESAURUS raises an exception.
To link the loaded thesaurus medthes to the knowledge base, use ctxkbtc as follows:
ctxkbtc -user ctxsys -name medthes
When you enter the ctxkbtc command line, you are prompted for the user password. As with ctxload, for best security practices, do not enter the password at the command line.
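The load and compile steps must run in that order, so a wrapper can stop before ctxkbtc if ctxload fails. The sketch below uses only the options shown in this section, and `load_and_link_medthes` is a hypothetical name. The `&&` chaining assumes that ctxload exits with a non-zero status when a load fails. Both commands still prompt for the ctxsys password, so the password never appears on the command line.

```shell
# Hypothetical wrapper: load medthes with ctxload, then compile it into the
# knowledge base with ctxkbtc. Assumes ctxload returns a non-zero exit status
# on failure. Both commands prompt for the ctxsys password.
load_and_link_medthes() {
  thes_file=${1:-med.thes}
  if [ ! -r "$thes_file" ]; then
    echo "cannot read thesaurus file: $thes_file" >&2
    return 1
  fi
  # Compile only if the load succeeded.
  ctxload -thes -thescase y -name medthes -file "$thes_file" -user ctxsys &&
    ctxkbtc -user ctxsys -name medthes
}
```

For example, `load_and_link_medthes med.thes` loads and compiles the file in the current directory. Remember to index (or re-index) your documents after the thesaurus is compiled.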