An MDATA section is used to reference user-defined metadata for a document. Using MDATA sections can speed up mixed queries. There is no limit to the number of MDATA sections that can be returned in a query.
Consider the case in which you want to query by both text content and document type (magazine, newspaper, or novel). You could create an index with a column for text and a column for the document type, and then perform a mixed query of the following form. This example searches for all novels containing the phrase Adam Thorpe (author of the novel Ulverton):
SELECT id FROM documents WHERE doctype = 'novel' AND CONTAINS(text, 'Adam Thorpe')>0;
However, it is usually faster to incorporate the attribute (in this case, the document type) into a field section, rather than use a separate column, and then use a single CONTAINS query:
SELECT id FROM documents WHERE CONTAINS(text, 'Adam Thorpe AND novel WITHIN doctype')>0;
There are two drawbacks to this approach:
Each time the attribute is updated, the entire text document must be re-indexed, resulting in increased index fragmentation and slower rates of processing DML.
Field sections tokenize the section value. This has several effects. Special characters in metadata, such as decimal points or currency characters, are not easily searchable; value searching (searching for Thurston Howell but not Thurston Howell, Jr.) is difficult; multi-word values are queried by phrase, which is slower than single-token searching; and multi-word values do not show up in browse-words, making author browsing or subject browsing impossible.
For these reasons, using MDATA sections instead of field sections may be worthwhile. MDATA sections are indexed like field sections, but metadata values can be added to and removed from documents without the need to re-index the document text. Unlike field sections, MDATA values are not tokenized. Additionally, MDATA section indexing generally takes up less disk space than field section indexing.
Use CTX_DDL.ADD_MDATA_SECTION to add an MDATA section to a section group. This example adds an MDATA section called AUTHOR and gives it the value Soseki Natsume (author of the novel Kokoro).
ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
ctx_ddl.add_mdata_section('htmgroup', 'author', 'Soseki Natsume');
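For the section group to take effect, an index has to be created that refers to it. The following is a minimal sketch, not taken from the original example: the table DOCUMENTS with a TEXT column is borrowed from the queries above, and the index name documents_text_idx is a hypothetical name chosen for illustration.

```sql
-- Assumption: a table DOCUMENTS(id, text) exists, as in the earlier queries.
-- Assumption: documents_text_idx is a hypothetical index name.
-- Create a CONTEXT index that uses the 'htmgroup' section group,
-- so that its AUTHOR MDATA section is recognized during indexing.
CREATE INDEX documents_text_idx ON documents(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SECTION GROUP htmgroup');
```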
MDATA values can be changed with CTX_DDL.ADD_MDATA and removed with CTX_DDL.REMOVE_MDATA. Also, MDATA sections can have multiple values. Only the owner of the index is allowed to call CTX_DDL.ADD_MDATA and CTX_DDL.REMOVE_MDATA.
Neither CTX_DDL.ADD_MDATA nor CTX_DDL.REMOVE_MDATA is supported for CTXCAT and CTXRULE indexes.
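As an illustration of adding and removing values, the sketch below assumes the four-argument form of these procedures (index name, section name, value, rowid); check the CTX_DDL chapter of Oracle Text Reference for the exact signatures in your release. The index name documents_text_idx and the row with id = 1 are hypothetical.

```sql
-- Assumptions: documents_text_idx is a hypothetical CONTEXT index on
-- DOCUMENTS(text) whose section group defines an AUTHOR MDATA section,
-- and a row with id = 1 exists.
DECLARE
  v_rowid VARCHAR2(18);
BEGIN
  -- Look up the rowid of the document whose metadata is being changed.
  SELECT ROWIDTOCHAR(rowid) INTO v_rowid FROM documents WHERE id = 1;

  -- Attach a second AUTHOR value; MDATA sections can hold multiple values.
  ctx_ddl.add_mdata('documents_text_idx', 'author', 'Natsume Soseki', v_rowid);

  -- Remove a value that no longer applies.
  ctx_ddl.remove_mdata('documents_text_idx', 'author', 'Soseki Natsume', v_rowid);

  -- Neither procedure commits, so the caller commits explicitly.
  COMMIT;
END;
/
```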
MDATA values are not passed through a lexer. Instead, all values undergo a simplified normalization as follows:
Leading and trailing whitespace on the value is removed.
The value is truncated to 64 bytes.
The value is indexed as a single value; if the value consists of multiple words, it is not broken up.
Case is preserved. If the document is dynamically generated, you can implement case-insensitivity by uppercasing MDATA values and making sure to search only in uppercase.
After a document has had MDATA metadata added to it, you can query for that metadata using the MDATA CONTAINS query operator:
SELECT id FROM documents WHERE CONTAINS(text, 'Tokyo and MDATA(author, Soseki Natsume)')>0;
This query succeeds only if an AUTHOR tag has the exact value Soseki Natsume (after simplified normalization). Soseki or Natsume Soseki will not work.
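Because case is preserved and the match is exact, the uppercasing convention mentioned earlier has to be applied on the query side as well. The following is a sketch under the assumption that every AUTHOR value was stored in uppercase when the documents were generated; the literal author name stands in for whatever value an application would supply.

```sql
-- Assumption: all AUTHOR MDATA values were uppercased before indexing.
-- The search value is uppercased too, so the exact match succeeds
-- regardless of how the caller typed the name.
SELECT id
  FROM documents
 WHERE CONTAINS(text,
                'MDATA(author, ' || UPPER('Soseki Natsume') || ')') > 0;
```

An application that builds the query string from user input would also need to handle characters that are special to the CONTAINS query syntax; that is outside the scope of this sketch.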
Other things to note about MDATA:
MDATA values are not highlightable, will not appear in the output of CTX_DOC.TOKENS, and will not show up when FILTER PLAINTEXT is enabled.
MDATA sections must be unique within section groups. You cannot have an MDATA section named FOO and a zone or field section of the same name in the same section group.
Like field sections, MDATA sections cannot overlap or nest. An MDATA section is implicitly closed by the first tag encountered. For instance, in this example:
<AUTHOR>Dickens <B>Shelley</B> Keats</AUTHOR>
The <B> tag closes the AUTHOR MDATA section; as a result, this document has an AUTHOR of 'Dickens', but not of 'Shelley' or 'Keats'.
To prevent race conditions, each call to ADD_MDATA and REMOVE_MDATA locks out other calls on that rowid for that index for all values and sections. However, since ADD_MDATA and REMOVE_MDATA do not commit, it is possible for an application to deadlock when calling them both. It is the application's responsibility to prevent deadlocking.
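One common way to reduce the risk of deadlock is to have every session touch rowids in the same order and commit soon afterward, so that locks are held only briefly. The sketch below shows that general lock-ordering idea; it is not a technique prescribed by Oracle Text. The index name documents_text_idx and the DOCTYPE MDATA section are hypothetical, and the four-argument call form is assumed.

```sql
-- Assumptions: documents_text_idx is a hypothetical index whose section
-- group defines a DOCTYPE MDATA section; DOCUMENTS has a doctype column.
BEGIN
  -- Visit rows in a fixed order (by rowid) so that concurrent sessions
  -- running this block request the per-rowid locks in the same sequence.
  FOR r IN (SELECT ROWIDTOCHAR(rowid) AS rid
              FROM documents
             WHERE doctype = 'novel'
             ORDER BY rowid)
  LOOP
    ctx_ddl.add_mdata('documents_text_idx', 'doctype', 'novel', r.rid);
  END LOOP;

  -- Commit promptly; the locks taken by ADD_MDATA persist until the
  -- transaction ends because the procedure itself does not commit.
  COMMIT;
END;
/
```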
See the CONTAINS query operators chapter of Oracle Text Reference for information on the MDATA operator.
See the CTX_DDL package chapter of Oracle Text Reference for information on adding and removing MDATA sections.