Frequently Asked Questions About Indexing Performance

This section answers some of the frequently asked questions about indexing performance.

How long should indexing take?

Answer: Indexing text is a resource-intensive process. The speed of indexing will depend on the power of the hardware involved. Indexing speed depends on CPU and I/O capacity. Given sufficient I/O capacity to read in the original data and write out index entries, then CPU will be the limiting factor.

Tests with Intel x86 (Core 2 architecture, 2.5GHz) CPUs have shown that Oracle Text can index around 100GB of text per CPU core, per day. This would be expected to increase as CPU clock speeds increase and/or CPU architectures become more efficient.

Other factors such as your document format, location of your data, and the calls to user-defined datastores, filters, and lexers can have an impact on your indexing speed.

Which index memory settings should I use?

Answer: You can set your index memory with the system parameters DEFAULT_INDEX_MEMORY and MAX_INDEX_MEMORY. You can also set your index memory at run time with the CREATE INDEX memory parameter in the parameter string.

You should aim to set the DEFAULT_INDEX_MEMORY value as high as possible, without causing paging.

You can also improve Indexing performance by increasing the SORT_AREA_SIZE system parameter.

Oracle recommends that you use a large index memory setting. Large settings, even up to hundreds of megabytes, can improve the speed of indexing, and reduce the fragmentation of the final indexes. However, if you set the index memory setting too high, then memory paging can occur that will reduce indexing speed.

With parallel indexing, each stream requires its own index memory. When dealing with very large tables, you can tune your database system global area (SGA) differently for indexing and retrieval. For querying, you want to get as much information cached in the system global area's (SGA) block buffer cache as possible. So you should allocate a large amount of memory to the block buffer cache. But this will not make any difference to indexing, so you would be better off reducing the size of the SGA to make more room for a large index memory settings during indexing.

You set the size of SGA in your Oracle Database initialization file.

See Also:

How much disk overhead will indexing require?

Answer: The overhead, the amount of space needed for the index tables, varies between about 50% of the original text volume and 200%. Generally, the larger the total amount of text, the smaller the overhead, but many small records will use more overhead than fewer large records. Also, clean data (such as published text) will require less overhead than dirty data such as e-mails or discussion notes, because the dirty data is likely to include many unique words from mis-spellings and abbreviations.

A text-only index is smaller than a combined text and theme index. A prefix and substring index makes the index significantly larger.

How does the format of my data affect indexing?

Answer: You can expect much lower storage overhead for formatted documents such as Microsoft Word files because such documents tend to be very large compared to the actual text held in them. So 1GB of Word documents might only require 50MB of index space, whereas 1GB of plain text might require 500MB, because there is ten times as much plain text in the latter set.

Indexing time is less clear-cut. Although the reduction in the amount of text to be indexed will have an obvious effect, you must balance this out against the cost of filtering the documents with the AUTO_FILTER filter or other user-defined filters.

Can parallel indexing improve performance?

Answer: Parallel indexing can improve index performance when you have a large amount of data, and have multiple CPUs.

You use the PARALLEL keyword when creating the index:

CREATE INDEX index_name ON table_name (column_name) 
INDEXTYPE IS ctxsys.context PARAMETERS ('...') PARALLEL 3;

This will create the index with up to three separate indexing processes depending on your resources.

Parallel indexing can also be used to create local partitioned indexes on partitioned tables. However, indexing performance only improves when you have multiple CPUs.

Note:

Using PARALLEL to create a local partitioned index enables parallel queries. (Creating a non-partitioned index in parallel does not turn on parallel query processing.)

Parallel querying degrades query throughput especially on heavily loaded systems. Because of this, Oracle recommends that you disable parallel querying after parallel indexing. To do so, use ALTER INDEX NOPARALLEL.

How can I improve index performance for creating local partitioned index?

Answer: When you have multiple CPUs, you can improve indexing performance by creating a local index in parallel. There are two ways to index in parallel:

You can create a local partitioned index in parallel in two ways:

  • Use the PARALLEL clause with the LOCAL clause in CREATE INDEX.In this case, the maximum parallel degree is limited to the number of partitions you have.

  • Create an unusable index first, then run the DBMS_PCLXUTIL.BUILD_PART_INDEX utility. This method can result in a higher degree of parallelism, especially if you have more CPUs than partitions.

The following is an example for the second method. In this example, the base table has three partitions. We create a local partitioned unusable index first, the run the DBMS_PCLUTIL.BUILD_PART_INDEX, which builds the 3 partitions in parallel (inter-partition parallelism). Also inside each partition, index creation is done in parallel (intra-partition parallelism) with a parallel degree of 2.

create index tdrbip02bx on tdrbip02b(text) 
indextype is ctxsys.context local (partition tdrbip02bx1, 
                                   partition tdrbip02bx2, 
                                   partition tdrbip02bx3) 
unusable; 

exec dbms_pclxutil.build_part_index(3,2,'TDRBIP02B','TDRBIP02BX',TRUE); 

How can I tell how much indexing has completed?

Answer: You can use the CTX_OUTPUT.START_LOG procedure to log output from the indexing process. Filename will normally be written to $ORACLE_HOME/ctx/log, but you can change the directory using the LOG_DIRECTORY parameter in CTX_ADM.SET_PARAMETER.

See Also:

Oracle Text Reference to learn more about using this procedure