Document Services Procedures Performance and Forward Index

Oracle Text uses an inverted index to find the documents that contain a word, and then displays the results by calculating a snippet from each matching document. To calculate the snippet, each document returned as part of the search result is re-indexed in memory. When a document is very large, this re-indexing slows down the search operation considerably.

The forward index overcomes the performance problem of very large documents. The forward index uses a mapping table $O that refers to the token offsets in the inverted index table $I. Each token offset is translated into the character offset in the original document, and the text surrounding the character offset is then used to generate the text snippet.

Because the forward index does not re-index the documents in memory while calculating the snippet, it provides a considerable performance improvement over the inverted index when searching for a word in very large documents.

The forward index improves the performance of the following procedures of Oracle Text's CTX_DOC package:

  • CTX_DOC.SNIPPET

  • CTX_DOC.HIGHLIGHT

  • CTX_DOC.MARKUP
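
As an illustration of the kind of call that benefits, the following is a minimal sketch of generating snippets with CTX_DOC.SNIPPET. It assumes a table named docs with a text column txt and a CONTEXT index named idx on that column; these names and the query term 'hello' are illustrative only. The key type is set to ROWID explicitly so that the sketch does not depend on a primary key being defined.

set serveroutput on
declare
  v_snippet varchar2(4000);
begin
  -- Identify documents by rowid rather than by primary key
  ctx_doc.set_key_type('ROWID');
  -- docs, txt, and idx are assumed names
  for r in (select rowidtochar(rowid) as rid
              from docs
             where contains(txt, 'hello') > 0) loop
    v_snippet := ctx_doc.snippet('idx', r.rid, 'hello');
    dbms_output.put_line(v_snippet);
  end loop;
end;
/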

Enabling Forward Index

The following example enables the forward index feature by setting the forward_index attribute value of the BASIC_STORAGE storage type to TRUE.

exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore','forward_index','TRUE');
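
The preference takes effect only when an index is created with it. The following sketch assumes a table named docs with a text column txt; the table, column, and index names are illustrative and not part of the feature itself.

-- Assumed table docs(txt); idx_fwd is a hypothetical index name
create index idx_fwd on docs(txt) indextype is ctxsys.context
    parameters('storage mystore');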

Using Forward Index with Snippets

In some cases, snippets generated when the forward_index option is used may differ slightly from the snippets generated when it is not used. These differences are generally minimal and do not affect snippet quality. They typically consist of a few extra white spaces or newline characters in the snippet.

Using Forward Index with Save Copy

To use the forward index effectively, copies of all the documents should be stored in the $D table, either in the plain text format or the filtered format, depending upon the CTX_DOC package procedure you use. For example, store the document in plain text format when using the SNIPPET procedure, and store it in the filtered format when using the MARKUP or the HIGHLIGHT procedure.

You should use the Save Copy feature of Oracle Text to store the copies of the documents in the $D table. The Save Copy feature can be implemented in Oracle Text by either using the save_copy basic storage attribute or using the save_copy column index parameter.

  • Using the save_copy basic storage attribute:

    The following example sets the save_copy attribute value of the BASIC_STORAGE storage type to PLAINTEXT. An index created with this preference saves a plain text copy of each document into the $D table when the document is indexed.

    exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
    exec ctx_ddl.set_attribute('mystore','save_copy','PLAINTEXT');
     
    
  • Using the save_copy column index parameter:

    The following example uses the save_copy column index parameter to save the copy of a text document into the $D table.

    create table docs(
      id       number,
      txt      varchar2(64),
      save     varchar2(9)    -- wide enough for 'PLAINTEXT' (9 characters)
    );
    
    insert into docs values(1, 'hello world', 'PLAINTEXT');
    
    create index idx on docs(txt) indextype is ctxsys.context
        parameters('save_copy column save');
    

    The create index statement creates the $D table and copies document 1, that is, "hello world", into the $D table.

Note:

You can specify one of the following values for the save_copy storage attribute or for the column named by the save_copy column index parameter: PLAINTEXT, FILTERED, or NONE.

  • Specifying PLAINTEXT saves the copy of the document in a plain text format in the $D index table. The plain text format is defined as the output format of the sectioner. Specify this value when using the SNIPPET procedure.

  • Specifying FILTERED saves the copy of the document in a filtered format in the $D index table. The filtered format is defined as the output format of the filter. Specify this value when using the MARKUP procedure or the HIGHLIGHT procedure.

  • Specifying NONE does not save the copy of the document in the $D index table. Specify this value for any of the following scenarios:

    • when the SNIPPET, MARKUP, and HIGHLIGHT procedures are not used.

    • when the indexed column is either VARCHAR2 or CLOB.
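
The two features can be combined in a single storage preference. The following is a minimal sketch for an index intended for the MARKUP or HIGHLIGHT procedure. It assumes a hypothetical table bin_docs with a BLOB column doc that holds formatted documents; the preference name mystore_markup and the index name idx_markup are likewise illustrative.

-- Hypothetical table holding formatted (binary) documents
create table bin_docs(
  id       number primary key,
  doc      blob
);

exec ctx_ddl.create_preference('mystore_markup', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore_markup','forward_index','TRUE');
exec ctx_ddl.set_attribute('mystore_markup','save_copy','FILTERED');

create index idx_markup on bin_docs(doc) indextype is ctxsys.context
    parameters('storage mystore_markup');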

Using Forward Index without using Save Copy

There are certain scenarios where you can still take advantage of the forward index performance enhancement without saving the copies of all the documents in the $D table, that is, without using the Save Copy feature. These scenarios are as follows:

  • The set of documents contains only HTML and plain text: All the documents should be stored in the base table using either the DIRECT_DATASTORE or the MULTI_COLUMN_DATASTORE datastore type.

  • The set of documents contains HTML, plain text, and binary documents: All the documents should be stored in the base table using the DIRECT_DATASTORE datastore type, and only the binary documents should be stored in the $D table in the filtered format.
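
The first scenario might be set up as in the following sketch, which enables the forward index but leaves save_copy unset so that no copies are written to the $D table. The table html_docs, its CLOB column txt, and the preference and index names are assumptions made for illustration.

-- Hypothetical table holding HTML and plain text documents
create table html_docs(
  id       number,
  txt      clob
);

exec ctx_ddl.create_preference('myds', 'DIRECT_DATASTORE');
exec ctx_ddl.create_preference('fwdstore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('fwdstore','forward_index','TRUE');

create index idx_html on html_docs(txt) indextype is ctxsys.context
    parameters('datastore myds storage fwdstore');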

Using Save Copy without using Forward Index

The Save Copy feature improves the performance of the following procedures of the CTX_DOC package, even if the forward index feature is not enabled:

  • CTX_DOC.FILTER

  • CTX_DOC.GIST

  • CTX_DOC.THEMES

  • CTX_DOC.TOKENS

See Also:

Oracle Text Reference for information about the forward_index attribute of the BASIC_STORAGE storage type