Language-Specific Features

You can enable the following language-specific features at index time:

Indexing Themes

For English and French, you can index document theme information. A document theme is a concept that is sufficiently developed in the document. Themes can be queried with the ABOUT operator.

You can index theme information in other languages provided you have loaded and compiled a knowledge base for the language.

By default themes are indexed in English and French. You can enable and disable theme indexing with the index_themes attribute of the BASIC_LEXER preference type.

See Also:

Base-Letter Conversion for Characters with Diacritical Marks

Some languages contain characters with diacritical marks such as tildes, umlauts, and accents. When your indexing operation converts words containing diacritical marks to their base letter form, queries need not contain diacritical marks to score matches. For example, in Spanish with a base-letter index, a query of energía matches energía and energia in the index.

However, with base-letter indexing disabled, a query of energía matches only energía.

You can enable and disable base-letter indexing for your language with the base_letter attribute of the BASIC_LEXER preference type.

See Also:

Oracle Text Reference to learn more about the BASIC_LEXER

Alternate Spelling

Languages such as German, Danish, and Swedish contain words that have more than one accepted spelling. For instance, in German, ae can be substituted for ä. The ae character pair is known as the alternate form.

By default, Oracle Text indexes words in their alternate forms for these languages. Query terms are also converted to their alternate forms. The result is that these words can be queried with either spelling.

You can enable and disable alternate spelling for your language using the alternate_spelling attribute in the BASIC_LEXER preference type.

See Also:

Oracle Text Reference to learn more about the BASIC_LEXER

Composite Words

German and Dutch text contain composite words. By default, Oracle Text creates composite indexes for these languages. The result is that a query on a term returns words that contain the term as a sub-composite.

For example, in German, a query on the term Bahnhof (train station) returns documents that contain Bahnhof or any word containing Bahnhof as a sub-composite, such as Hauptbahnhof, Nordbahnhof, or Ostbahnhof.

You can enable and disable the creation of composite indexes with the composite attribute of the BASIC_LEXER preference.

See Also:

Oracle Text Reference to learn more about the BASIC_LEXER

Korean, Japanese, and Chinese Indexing

Index these languages with specific lexers:


Table 3-3 Lexers for Asian Languages

Language Lexer

Korean

KOREAN_MORPH_LEXER

Japanese

JAPANESE_LEXER, JAPANESE_VGRAM_LEXER

Chinese

CHINESE_LEXER,CHINESE_VGRAM_LEXER


These lexers have their own sets of attributes to control indexing.

See Also:

Oracle Text Reference to learn more about these lexers