Index the HTML files by creating a CONTEXT index on the text column as follows. Because you are indexing HTML, this example uses the NULL_FILTER preference type for no filtering and the HTML_SECTION_GROUP section group type:
CREATE INDEX idx_docs ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');
Use the NULL_FILTER because you do not need to filter HTML documents during indexing. However, if you index PDF, Microsoft Word, or other formatted documents, then use CTXSYS.AUTO_FILTER (the default) as your FILTER preference.
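For comparison, the formatted-document case might look like the following minimal sketch. It assumes a hypothetical table bin_docs with a BLOB column named doc that holds PDF or Word files; neither name comes from the example above.

```sql
-- Hypothetical table and column: bin_docs(doc BLOB) holding PDF/Word files.
-- AUTO_FILTER extracts the text from binary formats before indexing.
CREATE INDEX idx_bin_docs ON bin_docs(doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER');
```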
This example also uses the HTML_SECTION_GROUP section group, which is recommended for indexing HTML documents. Using HTML_SECTION_GROUP enables you to search within specific HTML tags and eliminates unwanted markup, such as font information, from the index.
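To search within a specific tag, the tag generally has to be mapped to a named section first. The sketch below is one way to do that, offered as an alternative to the CREATE INDEX statement shown earlier rather than an addition to it. The section group name my_html_group and the section name heading are hypothetical, and the sketch assumes the user has EXECUTE privilege on CTX_DDL.

```sql
-- Hypothetical names: my_html_group (section group), heading (zone section).
-- Create a section group of type HTML_SECTION_GROUP and map <H1> to "heading".
BEGIN
  ctx_ddl.create_section_group('my_html_group', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('my_html_group', 'heading', 'H1');
END;
/

-- Use this in place of the earlier CREATE INDEX statement; if idx_docs
-- already exists, drop it first (DROP INDEX idx_docs).
CREATE INDEX idx_docs ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.NULL_FILTER SECTION GROUP my_html_group');

-- Count documents in which "oracle" appears inside an <H1> element.
SELECT COUNT(*)
  FROM docs
 WHERE CONTAINS(text, 'oracle WITHIN heading') > 0;
```

The WITHIN operator restricts the match to text inside the named section, so occurrences of the word elsewhere in the document do not count.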