HTML has internal structure in the form of tagged text, which you can use for section searching. For example, you can define a section called headings for the <H1> tag. This enables you to search for terms only within these tags across your document set.
To query, you use the WITHIN operator. Oracle Text returns all documents that contain your query term within the headings section. Thus, if you wanted to find all documents that contain the word oracle within headings, enter the following query:
'oracle within headings'
The following code defines a section group called htmgroup of type HTML_SECTION_GROUP. It then creates a zone section in htmgroup called heading, identified by the <H1> tag:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
end;
You can then index your documents as follows:
create index myindex on docs(htmlfile)
  indextype is ctxsys.context
  parameters('filter ctxsys.null_filter section group htmgroup');
After indexing with section group htmgroup, you can query within the heading section by issuing a query as follows:
'Oracle WITHIN heading'
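The query string above is the text query argument of the CONTAINS operator. A minimal sketch of a complete statement follows, assuming the docs table and myindex index from the example above; the id column is hypothetical and stands in for whatever key the table actually has.

```sql
-- Assumes docs(htmlfile) is indexed by myindex with section group htmgroup.
-- The id column is a hypothetical key column used only for illustration.
SELECT SCORE(1) AS relevance, id
FROM docs
WHERE CONTAINS(htmlfile, 'Oracle WITHIN heading', 1) > 0
ORDER BY SCORE(1) DESC;
```

The label 1 ties SCORE(1) to the matching CONTAINS call, so rows come back ordered by relevance for the term found inside the heading section.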
With HTML documents you can also create sections for NAME/CONTENT pairs in <META> tags. When you do so you can limit your searches to text within CONTENT.
Example: Creating Sections for <META> Tags
Consider an HTML document that has a META tag as follows:
<META NAME="author" CONTENT="ken">
To create a zone section that indexes all CONTENT attributes for the META tag whose NAME value is author:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'author', 'meta@author');
end;
After indexing with section group htmgroup, you can query the document as follows:
'ken WITHIN author'
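As a rough sketch, the same query string can be issued through CONTAINS. This assumes the documents live in the docs table with the htmlfile column used in the earlier index example, and that the index was created with this section group; the id column is hypothetical.

```sql
-- Assumes docs(htmlfile) is indexed with section group htmgroup,
-- which defines the author section on meta@author.
-- The id column is a hypothetical key column used only for illustration.
SELECT id
FROM docs
WHERE CONTAINS(htmlfile, 'ken WITHIN author') > 0;
```

Only documents whose META tag with NAME="author" has ken in its CONTENT attribute should match; occurrences of ken elsewhere in the document fall outside the author section.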