The following sections describe the globalization support behavior of Data Pump Export and Import with respect to character set conversion of user data and data definition language (DDL).
The Export utility always exports user data, including Unicode data, in the character set of the export system. (Character sets are specified at database creation.) If the character set of the source database is different than the character set of the import database, then a single conversion is performed to automatically convert the data to the character set of the Import system.
If the export character set has a different sorting order than the import character set, then tables that are partitioned on character columns may yield unpredictable results. For example, consider the following table definition, which is produced on a database having an ASCII character set:
CREATE TABLE partlist ( part VARCHAR2(10), partno NUMBER(2) ) PARTITION BY RANGE (part) ( PARTITION part_low VALUES LESS THAN ('Z') TABLESPACE tbs_1, PARTITION part_mid VALUES LESS THAN ('z') TABLESPACE tbs_2, PARTITION part_high VALUES LESS THAN (MAXVALUE) TABLESPACE tbs_3 );
This partitioning scheme makes sense because z
comes after Z
in ASCII character sets.
When this table is imported into a database based upon an EBCDIC character set, all of the rows in the part_mid
partition will migrate to the part_low
partition because z
comes before Z
in EBCDIC character sets. To obtain the desired results, the owner of partlist
must repartition the table following the import.
The Export utility writes dump files using the database character set of the export system.
When the dump file is imported, a character set conversion is required for DDL only if the import system's database character set is different from the export system's database character set.
To minimize data loss due to character set conversions, ensure that the import database character set is a superset of the export database character set.
If the system on which the import occurs uses a 7-bit character set, and you import an 8-bit character set dump file, then some 8-bit characters may be converted to 7-bit equivalents. An indication that this has happened is when accented characters lose the accent mark.
To avoid this unwanted conversion, ensure that the export database and the import database use the same character set.
During character set conversion, any characters in the export file that have no equivalent in the import database character set are replaced with a default character. The import database character set defines the default character.
If the import system has to use replacement characters while converting DDL, then a warning message is displayed and the system attempts to load the converted DDL.
If the import system has to use replacement characters while converting user data, then the default behavior is to load the converted data. However, it is possible to instruct the import system to reject rows of user data that were converted using replacement characters. See the Import "DATA_OPTIONS" parameter for details.
To guarantee 100% conversion, the import database character set must be a superset (or equivalent) of the character set used to generate the export file.
When the database character set of the export system differs from that of the import system, the import system displays informational messages at the start of the job that show what the database character set is.
When the import database character set is not a superset of the character set used to generate the export file, the import system displays a warning that possible data loss may occur due to character set conversions.