Optimizing the Index

Frequent index synchronization ultimately causes fragmentation of your CONTEXT index. Index fragmentation can adversely affect query response time. You should, therefore, allow time to optimize your CONTEXT index in order to reduce fragmentation and index size and to ensure optimal query performance. To understand index optimization, you must understand the structure of the index and what happens when it is synchronized.

This section contains the following topics.

CONTEXT Index Structure

The CONTEXT index is an inverted index where each word contains the list of documents that contain that word. For example, after a single initial indexing operation, the word DOG might have an entry as follows:

DOG DOC1 DOC3 DOC5

Index Fragmentation

When new documents are added to the base table, the index is synchronized by adding new rows. Thus, if you add a new document (for example, DOC 7) with the word dog to the base table and synchronize the index, you now have:

DOG DOC1 DOC3 DOC5
DOG DOC7

Subsequent DML will also create new rows as follows:

DOG DOC1 DOC3 DOC5
DOG DOC7
DOG DOC9
DOG DOC11

Adding new documents and synchronizing the index causes index fragmentation. In particular, background DML, which synchronizes the index frequently, generally produces more fragmentation than synchronizing in batch mode.

Less frequent batch processing results in longer document lists, reducing the number of rows in the index and thus reducing fragmentation.

You can reduce index fragmentation by optimizing the index in either FULL or FAST mode with CTX_DDL.OPTIMIZE_INDEX.

Document Invalidation and Garbage Collection

When documents are removed from the base table, Oracle Text marks the document as removed but does not immediately alter the index.

Because the old information takes up space and can cause extra overhead at query time, you must remove the old information from the index by optimizing it in FULL mode. This is called garbage collection. Optimizing in FULL mode for garbage collection is necessary when you have frequent updates or deletes to the base table.

Single Token Optimization

In addition to optimizing the entire index, you can optimize single tokens. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced.

For example, you can specify that only the token DOG be optimized in the index, if you know that this token is updated and queried frequently.

An optimized token can improve query response time for the token.

To optimize an index in token mode, use CTX_DDL.OPTIMIZE_INDEX.

Viewing Index Fragmentation and Garbage Data

With the CTX_REPORT.INDEX_STATS procedure, you can create a statistical report on your index. The report includes information on optimal row fragmentation, a list of most fragmented tokens, and the amount of garbage data in your index. Although this report might take a long time to run for large indexes, it can help you decide whether to optimize your index.

See Also:

Oracle Text Reference to learn more about using the CTX_REPORT.INDEX_STATS procedure

Example: Optimizing the Index

To optimize an index, Oracle recommends that you use CTX_DDL.OPTIMIZE_INDEX.

See Also:

Oracle Text Reference for the CTX_DDL.OPTIMIZE_INDEX statement syntax and examples