Frequent index synchronization ultimately causes fragmentation of your CONTEXT
index. Index fragmentation can adversely affect query response time. You should, therefore, allow time to optimize your CONTEXT
index in order to reduce fragmentation and index size and to ensure optimal query performance. To understand index optimization, you must understand the structure of the index and what happens when it is synchronized.
This section contains the following topics.
The CONTEXT
index is an inverted index where each word contains the list of documents that contain that word. For example, after a single initial indexing operation, the word DOG might have an entry as follows:
DOG DOC1 DOC3 DOC5
When new documents are added to the base table, the index is synchronized by adding new rows. Thus, if you add a new document (for example, DOC 7) with the word dog to the base table and synchronize the index, you now have:
DOG DOC1 DOC3 DOC5 DOG DOC7
Subsequent DML will also create new rows as follows:
DOG DOC1 DOC3 DOC5 DOG DOC7 DOG DOC9 DOG DOC11
Adding new documents and synchronizing the index causes index fragmentation. In particular, background DML, which synchronizes the index frequently, generally produces more fragmentation than synchronizing in batch mode.
Less frequent batch processing results in longer document lists, reducing the number of rows in the index and thus reducing fragmentation.
You can reduce index fragmentation by optimizing the index in either FULL
or FAST
mode with CTX_DDL.OPTIMIZE_INDEX
.
When documents are removed from the base table, Oracle Text marks the document as removed but does not immediately alter the index.
Because the old information takes up space and can cause extra overhead at query time, you must remove the old information from the index by optimizing it in FULL
mode. This is called garbage collection. Optimizing in FULL
mode for garbage collection is necessary when you have frequent updates or deletes to the base table.
In addition to optimizing the entire index, you can optimize single tokens. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced.
For example, you can specify that only the token DOG be optimized in the index, if you know that this token is updated and queried frequently.
An optimized token can improve query response time for the token.
To optimize an index in token mode, use CTX_DDL.OPTIMIZE_INDEX
.
With the CTX_REPORT.INDEX_STATS
procedure, you can create a statistical report on your index. The report includes information on optimal row fragmentation, a list of most fragmented tokens, and the amount of garbage data in your index. Although this report might take a long time to run for large indexes, it can help you decide whether to optimize your index.
Oracle Text Reference to learn more about using the CTX_REPORT.INDEX_STATS
procedure
To optimize an index, Oracle recommends that you use CTX_DDL.OPTIMIZE_INDEX
.
Oracle Text Reference for the CTX_DDL.OPTIMIZE_INDEX
statement syntax and examples