Document Language

Oracle Text can index most languages. By default, Oracle Text assumes that the text to index is in the language you specified when you set up your database. Depending on the language of your documents, use one of the following lexer types:

Use the BASIC_LEXER preference type to index whitespace-delimited languages such as English, French, German, and Spanish. For some of these languages, you can enable alternate spelling, composite word indexing, and base letter conversion.
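As an illustration, the following sketch creates a BASIC_LEXER preference for German text and turns on the three optional features mentioned above. The preference name my_german_lexer, the table docs, and the column text are hypothetical; check the attribute values against Oracle Text Reference for your release.

```sql
-- Sketch only: my_german_lexer, docs, and text are hypothetical names.
BEGIN
  ctx_ddl.create_preference('my_german_lexer', 'BASIC_LEXER');
  -- Base letter conversion: index accented characters as their base letters.
  ctx_ddl.set_attribute('my_german_lexer', 'base_letter', 'YES');
  -- Composite word indexing for German compound words.
  ctx_ddl.set_attribute('my_german_lexer', 'composite', 'GERMAN');
  -- Alternate spelling, for example "ae" as an alternative to "ä".
  ctx_ddl.set_attribute('my_german_lexer', 'alternate_spelling', 'GERMAN');
END;
/

-- Reference the preference when creating the index.
CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_german_lexer');
```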

Use the MULTI_LEXER preference type to index tables that contain documents in different languages, such as English, German, and Japanese.

Use the USER_LEXER preference type to create your own lexer for indexing a particular language.

Use the WORLD_LEXER preference type to index tables that contain documents in different languages when you want Oracle Text to detect the languages in each document automatically.
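A minimal sketch of a WORLD_LEXER setup follows. The names my_world_lexer, docs, and text are hypothetical. No language column is specified, because this lexer type detects the language itself.

```sql
-- Sketch only: my_world_lexer, docs, and text are hypothetical names.
BEGIN
  ctx_ddl.create_preference('my_world_lexer', 'WORLD_LEXER');
END;
/

CREATE INDEX docs_world_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_world_lexer');
```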

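You can also use other lexer types that are designed specifically to tokenize and index Japanese, Chinese, and Korean, such as JAPANESE_VGRAM_LEXER, JAPANESE_LEXER, CHINESE_VGRAM_LEXER, CHINESE_LEXER, and KOREAN_MORPH_LEXER.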

See Also:

Oracle Text Reference to learn more about indexing languages and lexer types

Language Features Outside BASIC_LEXER

With BASIC_LEXER and the Japanese, Chinese, and Korean lexers, Oracle Text provides a lexing solution for most languages. For other languages, you can create your own lexing solution using the user-defined lexer interface. This interface enables you to create a PL/SQL or Java procedure to process your documents during indexing and querying.

You can also use the user-defined lexer to create your own theme lexing solution or linguistic processing engine.
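The following sketch shows how a user-defined lexer might be registered through a USER_LEXER preference. The procedure names my_index_proc and my_query_proc are hypothetical and are assumed to exist already with the signatures described in Oracle Text Reference; their bodies, and any ownership requirements for your release, are not shown here.

```sql
-- Sketch only: my_user_lexer, my_index_proc, and my_query_proc are
-- hypothetical names. The two procedures must already exist.
BEGIN
  ctx_ddl.create_preference('my_user_lexer', 'USER_LEXER');
  -- Procedure that tokenizes documents during indexing.
  ctx_ddl.set_attribute('my_user_lexer', 'INDEX_PROCEDURE', 'my_index_proc');
  -- Data type in which document text is passed to the index procedure.
  ctx_ddl.set_attribute('my_user_lexer', 'INPUT_TYPE', 'CLOB');
  -- Procedure that tokenizes query terms.
  ctx_ddl.set_attribute('my_user_lexer', 'QUERY_PROCEDURE', 'my_query_proc');
END;
/
```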

See Also:

Oracle Text Reference to learn more about the user-defined lexer

Indexing Multi-language Columns

Oracle Text can index text columns that contain documents in different languages, such as a column that contains documents written in English, German, and Japanese. To index a multi-language column, your text table needs a language column that identifies the language of the document in each row. Use the MULTI_LEXER preference type, which applies a language-specific lexer to each row based on that column.
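The setup described above can be sketched as follows. The table globaldoc, its columns, and all preference names are hypothetical. English is assumed here as the default language for rows whose language value matches no other sub-lexer.

```sql
-- Sketch only: all table, column, and preference names are hypothetical.
CREATE TABLE globaldoc (
  doc_id NUMBER PRIMARY KEY,
  lang   VARCHAR2(3),   -- language column, for example 'eng', 'ger', 'jpn'
  text   CLOB
);

BEGIN
  -- One lexer preference per language.
  ctx_ddl.create_preference('english_lexer', 'BASIC_LEXER');
  ctx_ddl.create_preference('german_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('german_lexer', 'composite', 'GERMAN');
  ctx_ddl.create_preference('japanese_lexer', 'JAPANESE_VGRAM_LEXER');

  -- The multi-lexer maps language values to the sub-lexers.
  ctx_ddl.create_preference('global_lexer', 'MULTI_LEXER');
  ctx_ddl.add_sub_lexer('global_lexer', 'default', 'english_lexer');
  ctx_ddl.add_sub_lexer('global_lexer', 'german', 'german_lexer', 'ger');
  ctx_ddl.add_sub_lexer('global_lexer', 'japanese', 'japanese_lexer', 'jpn');
END;
/

-- Name the language column when creating the index.
CREATE INDEX globalx ON globaldoc(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER global_lexer LANGUAGE COLUMN lang');
```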

You can also incorporate a multi-language stoplist when you index multi-language columns.
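As a sketch, a multi-language stoplist can be created with the MULTI_STOPLIST type, adding each stopword together with its language. The stoplist name multistop and the sample stopwords are illustrative only.

```sql
-- Sketch only: multistop and the stopwords are illustrative.
BEGIN
  ctx_ddl.create_stoplist('multistop', 'MULTI_STOPLIST');
  ctx_ddl.add_stopword('multistop', 'Die', 'german');
  ctx_ddl.add_stopword('multistop', 'Or', 'english');
END;
/
```

To use the stoplist, add STOPLIST multistop to the PARAMETERS string of the CREATE INDEX statement, alongside the LEXER and LANGUAGE COLUMN settings for your multi-lexer.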