You can enable the following language-specific features at index time:
For English and French, you can index document theme information. A document theme is a concept that is sufficiently developed in the document. Themes can be queried with the ABOUT operator.
You can index theme information in other languages provided you have loaded and compiled a knowledge base for the language.
By default, themes are indexed in English and French. You can enable and disable theme indexing with the index_themes attribute of the BASIC_LEXER preference type.
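As a minimal sketch of the index_themes setting described above, the following creates a lexer preference with theme indexing enabled, builds an index with it, and runs an ABOUT query. The preference name my_theme_lexer, the table docs_theme with columns id and text, and the index name are hypothetical and only serve the illustration.

```sql
-- Hypothetical names: my_theme_lexer, docs_theme (id, text), docs_theme_idx.
BEGIN
  -- Create a BASIC_LEXER preference and turn theme indexing on.
  ctx_ddl.create_preference('my_theme_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_theme_lexer', 'index_themes', 'YES');
END;
/

-- Build a CONTEXT index that uses the preference.
CREATE INDEX docs_theme_idx ON docs_theme (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER my_theme_lexer');

-- Query by theme rather than by literal term.
SELECT id
  FROM docs_theme
 WHERE CONTAINS(text, 'ABOUT(politics)') > 0;
```

Setting index_themes to NO instead disables theme indexing for indexes built with that preference.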
See Oracle Text Reference to learn more about the BASIC_LEXER preference type.
Some languages contain characters with diacritical marks such as tildes, umlauts, and accents. When your indexing operation converts words containing diacritical marks to their base letter form, queries need not contain diacritical marks to score matches. For example, in Spanish with a base-letter index, a query of energía matches energía and energia in the index.
However, with base-letter indexing disabled, a query of energía matches only energía.
You can enable and disable base-letter indexing for your language with the base_letter attribute of the BASIC_LEXER preference type.
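The base_letter setting might be applied as in the sketch below, which assumes a hypothetical table docs_es with columns id and text holding Spanish documents; the preference and index names are also made up for the example.

```sql
-- Hypothetical names: my_base_lexer, docs_es (id, text), docs_es_idx.
BEGIN
  -- Create a BASIC_LEXER preference with base-letter conversion enabled.
  ctx_ddl.create_preference('my_base_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_base_lexer', 'base_letter', 'YES');
END;
/

CREATE INDEX docs_es_idx ON docs_es (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER my_base_lexer');

-- With base-letter indexing, the query term does not need the accent
-- to match documents that contain the accented spelling.
SELECT id
  FROM docs_es
 WHERE CONTAINS(text, 'energia') > 0;
```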
See Oracle Text Reference to learn more about the BASIC_LEXER preference type.
Languages such as German, Danish, and Swedish contain words that have more than one accepted spelling. For instance, in German, ae can be substituted for ä. The ae character pair is known as the alternate form.
By default, Oracle Text indexes words in their alternate forms for these languages. Query terms are also converted to their alternate forms. The result is that these words can be queried with either spelling.
You can enable and disable alternate spelling for your language using the alternate_spelling attribute in the BASIC_LEXER preference type.
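A short sketch of setting alternate_spelling explicitly for German follows. The table docs_de_alt (id, text) and the preference and index names are hypothetical.

```sql
-- Hypothetical names: my_alt_lexer, docs_de_alt (id, text), docs_de_alt_idx.
BEGIN
  -- Create a BASIC_LEXER preference that uses German alternate spelling.
  ctx_ddl.create_preference('my_alt_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_alt_lexer', 'alternate_spelling', 'GERMAN');
END;
/

CREATE INDEX docs_de_alt_idx ON docs_de_alt (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER my_alt_lexer');
```

Setting the attribute to NONE disables alternate spelling for indexes built with the preference.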
See Oracle Text Reference to learn more about the BASIC_LEXER preference type.
German and Dutch text contains composite words. By default, Oracle Text creates composite indexes for these languages. The result is that a query on a term returns words that contain the term as a sub-composite.
For example, in German, a query on the term Bahnhof (train station) returns documents that contain Bahnhof or any word containing Bahnhof as a sub-composite, such as Hauptbahnhof, Nordbahnhof, or Ostbahnhof.
You can enable and disable the creation of composite indexes with the composite attribute of the BASIC_LEXER preference type.
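The composite setting for the German example above could look like the following sketch. The table docs_de_comp (id, text) and the preference and index names are hypothetical.

```sql
-- Hypothetical names: my_comp_lexer, docs_de_comp (id, text), docs_de_comp_idx.
BEGIN
  -- Create a BASIC_LEXER preference with German composite word indexing.
  ctx_ddl.create_preference('my_comp_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_comp_lexer', 'composite', 'GERMAN');
END;
/

CREATE INDEX docs_de_comp_idx ON docs_de_comp (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER my_comp_lexer');

-- A query on the sub-composite is intended to also find compound words
-- such as Hauptbahnhof.
SELECT id
  FROM docs_de_comp
 WHERE CONTAINS(text, 'Bahnhof') > 0;
```

The same attribute accepts DUTCH for Dutch text.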
See Oracle Text Reference to learn more about the BASIC_LEXER preference type.
Index Korean, Japanese, and Chinese text with the language-specific lexers listed below:
Table 3-3 Lexers for Asian Languages
Language | Lexer |
---|---|
Korean | KOREAN_MORPH_LEXER |
Japanese | JAPANESE_LEXER, JAPANESE_VGRAM_LEXER |
Chinese | CHINESE_LEXER, CHINESE_VGRAM_LEXER |
These lexers have their own sets of attributes to control indexing.
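As an illustration of using a language-specific lexer, the sketch below creates a preference of type JAPANESE_LEXER, one of the Oracle Text lexer types for Japanese, and builds an index with it. The table docs_ja (id, text) and the preference and index names are hypothetical, and no lexer attributes are set, so the lexer's defaults apply.

```sql
-- Hypothetical names: my_ja_lexer, docs_ja (id, text), docs_ja_idx.
BEGIN
  -- Create a preference based on a language-specific lexer type.
  ctx_ddl.create_preference('my_ja_lexer', 'JAPANESE_LEXER');
END;
/

CREATE INDEX docs_ja_idx ON docs_ja (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER my_ja_lexer');
```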
See Oracle Text Reference to learn more about these lexers.