Oracle Text uses an inverted index to search for a word in a document, and then displays the results by calculating a snippet from that document. To calculate the snippet, each document returned in the search result is re-indexed, so the search operation slows down considerably when a document is very large.
The forward index overcomes the performance problem of very large documents. The forward index uses a mapping table, $O, that refers to the token offsets in the inverted index table, $I. Each token offset is translated into the character offset in the original document, and the text surrounding the character offset is then used to generate the text snippet.
Because the forward index does not use in-memory indexing of the documents while calculating the snippet, it provides a considerable performance improvement over the inverted index when searching for a word in very large documents.
The forward index improves the performance of the following procedures of the Oracle Text CTX_DOC package:
CTX_DOC.SNIPPET
CTX_DOC.HIGHLIGHT
CTX_DOC.MARKUP
Enabling Forward Index
The following example enables the forward index feature by setting the forward_index attribute value of the BASIC_STORAGE storage type to TRUE.
exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore','forward_index','TRUE');
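A storage preference only takes effect once an index is created with it. The following is a minimal sketch of that step, assuming a hypothetical table fwd_docs with a text column txt and a hypothetical index name fwd_idx. None of these names come from the example above.

```sql
-- Hypothetical table used only for illustration.
create table fwd_docs (id number primary key, txt clob);

-- Reference the storage preference so the index is built
-- with the forward index enabled.
create index fwd_idx on fwd_docs(txt)
  indextype is ctxsys.context
  parameters('storage mystore');
```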
Using Forward Index with Snippets
In some cases, snippets generated with the forward_index option differ slightly from the snippets generated without it. These differences are generally minimal and do not affect snippet quality: typically a few extra white spaces or a newline appear as part of the snippet when forward_index is used.
Using Forward Index with Save Copy
To use the forward index effectively, copies of all the documents should be stored in the $D table, either in the plain text format or the filtered format, depending upon the CTX_DOC package procedure you use. For example, store the document in plain text format when using the SNIPPET procedure, and store it in the filtered format when using the MARKUP or the HIGHLIGHT procedure.
You should use the Save Copy feature of Oracle Text to store the copies of the documents in the $D table. The Save Copy feature can be implemented in Oracle Text by using either the save_copy basic storage attribute or the save_copy column index parameter.
Using the save_copy basic storage attribute:
The following example sets the save_copy attribute value of the BASIC_STORAGE storage type to PLAINTEXT. This saves a copy of each text document into the $D table when the document is indexed.
exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore','save_copy','PLAINTEXT');
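Because forward_index and save_copy are both attributes of BASIC_STORAGE, they can be set on the same preference. The sketch below combines them for use with the SNIPPET procedure. The preference name fwd_store, the table big_docs, its column txt, and the index name big_docs_idx are hypothetical and chosen only for illustration.

```sql
-- One storage preference carrying both attributes.
exec ctx_ddl.create_preference('fwd_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('fwd_store', 'forward_index', 'TRUE');
exec ctx_ddl.set_attribute('fwd_store', 'save_copy', 'PLAINTEXT');

-- Assumes a table big_docs(txt) already exists.
create index big_docs_idx on big_docs(txt)
  indextype is ctxsys.context
  parameters('storage fwd_store');
```

To target MARKUP or HIGHLIGHT instead, the save_copy value would be FILTERED rather than PLAINTEXT.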
Using the save_copy column index parameter:
The following example uses the save_copy column index parameter to save a copy of a text document into the $D table. The save column is declared wide enough to hold the longest value, PLAINTEXT.
create table docs( id number, txt varchar2(64), save varchar2(9) );
insert into docs values(1, 'hello world', 'PLAINTEXT');
create index idx on docs(txt) indextype is ctxsys.context parameters('save_copy column save');
The create index statement creates the $D table and copies document 1, that is, "hello world", into the $D table.
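Once the copy is stored, a snippet can be requested through CTX_DOC.SNIPPET. The following is a hedged sketch against the docs table and idx index from the example above. Because that table has no primary key, the sketch assumes rows are identified by rowid and switches the key type accordingly. The query term hello is illustrative.

```sql
set serveroutput on

declare
  snip varchar2(4000);
begin
  -- Identify documents by rowid, since docs has no primary key.
  ctx_doc.set_key_type('ROWID');

  for r in (select rowid as rid
              from docs
             where contains(txt, 'hello') > 0) loop
    -- Arguments: index name, document key, text query.
    snip := ctx_doc.snippet('idx', rowidtochar(r.rid), 'hello');
    dbms_output.put_line(snip);
  end loop;
end;
/
```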
You can specify one of the following values for the save_copy attribute or column parameter: PLAINTEXT, FILTERED, or NONE.
Specifying PLAINTEXT saves the copy of the document in a plain text format in the $D index table. The plain text format is defined as the output format of the sectioner. Specify this value when using the SNIPPET procedure.
Specifying FILTERED saves the copy of the document in a filtered format in the $D index table. The filtered format is defined as the output format of the filter. Specify this value when using the MARKUP procedure or the HIGHLIGHT procedure.
Specifying NONE does not save the copy of the document in the $D index table. Specify this value for any of the following scenarios:
When the SNIPPET, MARKUP, or HIGHLIGHT procedure is not used.
When the indexed column is either VARCHAR2 or CLOB.
Using Forward Index without using Save Copy
In certain scenarios, you can still take advantage of the forward index performance enhancement without saving copies of all the documents in the $D table, that is, without using the Save Copy feature. These scenarios are as follows:
The set of documents contains HTML and plain text: All the documents should be stored in the base table using either the DIRECT_DATASTORE or the MULTI_COLUMN_DATASTORE datastore type.
The set of documents contains HTML, plain text, and binary: All the documents should be stored in the base table using the DIRECT_DATASTORE datastore type, and only the binary documents should be stored in the $D table in the filtered format.
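The first scenario can be sketched as follows: documents live in the base table through DIRECT_DATASTORE, the forward index is enabled, and no copy is saved. The preference names html_ds and html_store, the table html_docs, its column txt, and the index name html_docs_idx are hypothetical.

```sql
-- Documents are read directly from the indexed column.
exec ctx_ddl.create_preference('html_ds', 'DIRECT_DATASTORE');

-- Forward index on, Save Copy off.
exec ctx_ddl.create_preference('html_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('html_store', 'forward_index', 'TRUE');
exec ctx_ddl.set_attribute('html_store', 'save_copy', 'NONE');

-- Assumes a table html_docs(txt) already exists.
create index html_docs_idx on html_docs(txt)
  indextype is ctxsys.context
  parameters('datastore html_ds storage html_store');
```

For the second scenario, one possible approach is the save_copy column index parameter, with FILTERED on the rows that hold binary documents and NONE on the rest. That pairing is an assumption for illustration and is not stated in the text above.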
Using Save Copy without using Forward Index
The Save Copy feature improves the performance of the following procedures of the CTX_DOC package, even if the forward index feature is not enabled:
CTX_DOC.FILTER
CTX_DOC.GIST
CTX_DOC.THEMES
CTX_DOC.TOKENS
See also: Oracle Text Reference for information about the forward_index parameter clause of the BASIC_STORAGE indexing type.