Format and Character Set Columns

If your documents are of mixed formats or of mixed character sets, you can create the following additional columns:

  • A format column that records the format of each row (TEXT or BINARY) to guide filtering during indexing. You can also set the format column to IGNORE to exclude a row from indexing. This is useful for bypassing rows whose data is incompatible with text indexing, such as images.

  • A character set column to record the document character set for each row.

When you create your index, you must specify the name of the format column or character set column in the PARAMETERS clause of CREATE INDEX.
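
The following is a minimal sketch of how such columns might be declared and named at index creation. The table name docs, the column names fmt and cset, and the index name docs_idx are hypothetical. The example also assumes a release in which the CTXSYS.AUTO_FILTER filter is available; adapt the filter to your installation.

```sql
-- Hypothetical table: the names docs, fmt, and cset are illustrative only.
CREATE TABLE docs (
  id    NUMBER PRIMARY KEY,
  fmt   VARCHAR2(10),   -- format column: TEXT, BINARY, or IGNORE
  cset  VARCHAR2(30),   -- character set column, e.g. JA16EUC
  text  BLOB            -- the document itself
);

-- Name the format and character set columns in the PARAMETERS clause.
-- AUTO_FILTER is an assumption here; substitute the filter your release uses.
CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER FORMAT COLUMN fmt CHARSET COLUMN cset');
```

With this layout, a row that holds an image could be stored with fmt set to IGNORE so that it is skipped during indexing.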

For all rows containing the keyword AUTO or AUTOMATIC in the character set or language column, Oracle Text applies statistical techniques to determine the character set or language of the document, respectively, and adjusts document indexing accordingly.
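
As an illustration only, the rows below show one way the AUTO keyword might be stored alongside explicitly labeled rows. The table mixed_docs and its columns cset and lang are hypothetical, and the sketch assumes that an index over the table names those columns in its PARAMETERS clause as described above.

```sql
-- Hypothetical table with a character set column and a language column.
CREATE TABLE mixed_docs (
  id    NUMBER PRIMARY KEY,
  cset  VARCHAR2(30),   -- character set column
  lang  VARCHAR2(30),   -- language column
  text  BLOB
);

-- Row with known metadata: character set and language are stated explicitly.
INSERT INTO mixed_docs (id, cset, lang, text)
  VALUES (1, 'WE8ISO8859P1', 'ENGLISH',
          UTL_RAW.CAST_TO_RAW('A document whose origin is known'));

-- Row of unknown origin: AUTO asks Oracle Text to detect both values.
INSERT INTO mixed_docs (id, cset, lang, text)
  VALUES (2, 'AUTO', 'AUTO',
          UTL_RAW.CAST_TO_RAW('A document whose origin is unknown'));

COMMIT;
```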