Augmenting Knowledge Base with Custom Thesaurus

You can add your custom thesaurus to a branch in the existing knowledge base. The knowledge base is a hierarchical tree of concepts used for theme indexing, ABOUT queries, and deriving themes for document services.

When you augment the existing knowledge base with your new thesaurus, you query with the ABOUT operator, which implicitly expands to synonyms and narrower terms. You do not query with the thesaurus operators.

To augment the existing knowledge base with your custom thesaurus, follow these steps:

  1. Create your custom thesaurus, linking new terms to existing knowledge base terms. See "Defining Terms in a Thesaurus" and "Linking New Terms to Existing Terms".
  2. Load the thesaurus by one of the following methods:
       • The ctxload utility. See "Loading a Thesaurus with ctxload".
       • The PL/SQL procedure CTX_THES.IMPORT_THESAURUS. See "Loading a Thesaurus with PL/SQL procedure CTX_THES.IMPORT_THESAURUS".
  3. Compile the loaded thesaurus with the ctxkbtc compiler. See "Compiling a Loaded Thesaurus".
  4. Index your documents. By default, the system creates a theme component in your index.
  5. Use the ABOUT operator to query. For example, to find all documents that are related to the term politics, including any synonyms or narrower terms defined in the knowledge base, enter the query:
    'about(politics)'
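
In a SQL statement, the ABOUT expression is passed to the CONTAINS operator. The following is a minimal sketch, not a definitive form; the table docs and its columns id and text are hypothetical names chosen for illustration.

```sql
-- Assumption: docs(id, text) is a hypothetical table whose text column
-- has a CONTEXT index with a theme component.
-- The label 1 ties SCORE(1) to this CONTAINS call.
select score(1) as relevance, id
  from docs
 where contains(text, 'about(politics)', 1) > 0
 order by score(1) desc;
```

Ordering by SCORE returns the documents most strongly associated with the theme first.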
    

Advantage

Compiling your custom thesaurus with the existing knowledge base before indexing enables faster and simpler queries with the ABOUT operator. Document services can also take full advantage of the customized information for creating theme summaries and Gists.
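
As an illustration of a document service that uses the theme information, the following sketch lists the themes of a single document with CTX_DOC.THEMES, using the in-memory result table. The index name docs_idx and the document key 1 are assumptions made for this example only.

```sql
set serveroutput on

declare
  -- In-memory table of theme/weight records returned by CTX_DOC.THEMES.
  the_themes ctx_doc.theme_tab;
begin
  -- Assumption: docs_idx is a hypothetical index with a theme component,
  -- and '1' is the primary key value of an indexed document.
  ctx_doc.themes('docs_idx', '1', the_themes);
  for i in 1 .. the_themes.count loop
    dbms_output.put_line(the_themes(i).theme || ': ' || the_themes(i).weight);
  end loop;
end;
/
```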

Limitations

Use of the ABOUT operator requires a theme component in the index, which requires slightly more disk space. You must also define the thesaurus before indexing your documents. If you make any change to the thesaurus, you must recompile your thesaurus and re-index your documents.
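
Whether themes are indexed is controlled by the lexer preference. The following sketch enables theme indexing explicitly instead of relying on the default, and shows one possible way to re-index after the thesaurus has been recompiled. The names my_lexer, docs_idx, docs, and text are hypothetical.

```sql
-- Assumption: my_lexer, docs_idx, docs, and text are illustrative names.
begin
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  -- Request a theme component explicitly.
  ctx_ddl.set_attribute('my_lexer', 'index_themes', 'YES');
end;
/

create index docs_idx on docs(text)
  indextype is ctxsys.context
  parameters ('lexer my_lexer');

-- After changing and recompiling the thesaurus, one way to re-index
-- is to drop the index and run the CREATE INDEX statement again:
-- drop index docs_idx;
```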

Linking New Terms to Existing Terms

When adding terms to the knowledge base, Oracle recommends that new terms be linked to one of the categories in the knowledge base for best results in theme proving.

See Also:

Oracle Text Reference for more information about the supplied English knowledge base

If new terms are kept completely separate from existing categories, fewer themes from new terms will be proven. The result is poor precision and recall with ABOUT queries, as well as poor quality of Gists and theme highlighting.

You link new terms to existing terms by making an existing term the broader term for the new terms.

Example: Linking New Terms to Existing Terms

You purchase a medical thesaurus medthes containing a hierarchy of medical terms. The four top terms in the thesaurus are as follows:

  • Anesthesia and Analgesia

  • Anti-Allergic and Respiratory System Agents

  • Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators

  • Antineoplastic and Immunosuppressive Agents

To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:

health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents

Loading a Thesaurus with ctxload

Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:

ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys

When you enter the ctxload command line, you are prompted for the user password. As a security best practice, never enter the password on the command line. Alternatively, you can omit the -user argument and let ctxload prompt you for both the user name and the password.
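
After the load completes, one way to confirm that the linking entries were read is to ask the thesaurus for the narrower terms of health and medicine. The following sketch assumes the medthes thesaurus from this example and uses the CTX_THES.NT function, which returns the expansion as a string. The variable name terms is illustrative.

```sql
set serveroutput on

declare
  terms varchar2(4000);
begin
  -- Expand one level of narrower terms under the linking term.
  terms := ctx_thes.nt('health and medicine', 1, 'medthes');
  dbms_output.put_line(terms);
end;
/
```

If the four top terms of the medical thesaurus appear in the output, the new terms are linked under the existing branch.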

Loading a Thesaurus with PL/SQL procedure CTX_THES.IMPORT_THESAURUS

The following example creates a case-sensitive thesaurus named mythesaurus and imports the thesaurus content present in myclob into the Oracle Text thesaurus tables:

declare 
 myclob clob; 
begin 
 myclob := to_clob('peking SYN beijing BT capital country NT beijing tokyo');
 ctx_thes.import_thesaurus('mythesaurus', myclob, 'Y');
end;

The format of the thesaurus to be imported (myclob in this example) should be the same as used by the ctxload utility. If the format of the thesaurus to be imported is not correct, then IMPORT_THESAURUS raises an exception.
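
Because a badly formatted import raises an exception, it can be useful to report the error before it propagates. The following sketch wraps the same call in a handler; the message text and the choice to re-raise are illustrative decisions, not part of the procedure itself.

```sql
set serveroutput on

declare
  myclob clob;
begin
  myclob := to_clob('peking SYN beijing BT capital country NT beijing tokyo');
  ctx_thes.import_thesaurus('mythesaurus', myclob, 'Y');
exception
  when others then
    -- Report the error text, then re-raise so the caller still sees the failure.
    dbms_output.put_line('Thesaurus import failed: ' || sqlerrm);
    raise;
end;
/
```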

Compiling a Loaded Thesaurus

To link the loaded thesaurus medthes to the knowledge base, use ctxkbtc as follows:

ctxkbtc -user ctxsys -name medthes 

When you enter the ctxkbtc command line, you are prompted for the user password. As with ctxload, for best security practices, do not enter the password at the command line.

WARNING:

In order to ensure sound security practices, Oracle recommends that you enter the password for ctxload and ctxkbtc using the interactive mode, which prompts you for the user password. Oracle strongly recommends that you do not enter a password on the command line.