What Happens During Execution of a Data Pump Job?

Data Pump jobs use a master table, a master process, and worker processes to perform the work and keep track of progress.

Coordination of a Job

For every Data Pump Export job and Data Pump Import job, a master process is created. The master process controls the entire job, including communicating with the clients, creating and controlling a pool of worker processes, and performing logging operations.

Tracking Progress Within a Job

While the data and metadata are being transferred, a master table is used to track the progress within a job. The master table is implemented as a user table within the database. The specific function of the master table for export and import jobs is as follows:

  • For export jobs, the master table records the location of database objects within a dump file set. Export builds and maintains the master table for the duration of the job. At the end of an export job, the content of the master table is written to a file in the dump file set.

  • For import jobs, the master table is loaded from the dump file set and is used to control the sequence of operations for locating objects that need to be imported into the target database.

The master table is created in the schema of the current user performing the export or import operation. Therefore, that user must have the CREATE TABLE system privilege and a sufficient tablespace quota for creation of the master table. The name of the master table is the same as the name of the job that created it. Therefore, you cannot explicitly give a Data Pump job the same name as a preexisting table or view.

For all operations, the information in the master table is used to restart a job. (Note that transportable jobs are not restartable.)

The master table is either retained or dropped, depending on the circumstances, as follows:

  • Upon successful job completion, the master table is dropped. You can override this by setting the Data Pump KEEP_MASTER=YES parameter for the job.

  • The master table is automatically retained for jobs that do not complete successfully.

  • If a job is stopped using the STOP_JOB interactive command, then the master table is retained for use in restarting the job.

  • If a job is killed using the KILL_JOB interactive command, then the master table is dropped and the job cannot be restarted.

  • If a job terminates unexpectedly, then the master table is retained. You can delete it if you do not intend to restart the job.

  • If a job stops before it starts running (that is, before any database objects have been copied), then the master table is dropped.

See Also:

"JOB_NAME" for more information about how job names are formed

Filtering Data and Metadata During a Job

Within the master table, specific objects are assigned attributes such as name or owning schema. Objects also belong to a class of objects (such as TABLE, INDEX, or DIRECTORY). The class of an object is called its object type. You can use the EXCLUDE and INCLUDE parameters to restrict the types of objects that are exported and imported. The objects can be based upon the name of the object or the name of the schema that owns the object. You can also specify data-specific filters to restrict the rows that are exported and imported.

Transforming Metadata During a Job

When you are moving data from one database to another, it is often useful to perform transformations on the metadata for remapping storage between tablespaces or redefining the owner of a particular set of objects. This is done using the following Data Pump Import parameters: REMAP_DATAFILE, REMAP_SCHEMA, REMAP_TABLE,REMAP_TABLESPACE, TRANSFORM, and PARTITION_OPTIONS.

Maximizing Job Performance

Data Pump can employ multiple worker processes, running in parallel, to increase job performance. Use the PARALLEL parameter to set a degree of parallelism that takes maximum advantage of current conditions. For example, to limit the effect of a job on a production system, the database administrator (DBA) might want to restrict the parallelism. The degree of parallelism can be reset at any time during a job. For example, PARALLEL could be set to 2 during production hours to restrict a particular job to only two degrees of parallelism, and during nonproduction hours it could be reset to 8. The parallelism setting is enforced by the master process, which allocates work to be executed to worker processes that perform the data and metadata processing within an operation. These worker processes operate in parallel. For recommendations on setting the degree of parallelism, see the Export PARALLEL and Import PARALLEL parameter descriptions.

Note:

The ability to adjust the degree of parallelism is available only in the Enterprise Edition of Oracle Database.

Loading and Unloading of Data

The worker processes unload and load metadata and table data. During import they also rebuild indexes. Some of these operations may be done in parallel: unloading and loading table data, rebuilding indexes, and loading package bodies. All other operations are done serially. Worker processes are created as needed until the number of worker processes equals the value supplied for the PARALLEL command-line parameter. The number of active worker processes can be reset throughout the life of a job. Worker processes can be started on different nodes in an Oracle Real Application Clusters (Oracle RAC) environment.

Note:

The value of PARALLEL is restricted to 1 in the Standard Edition of Oracle Database.

When a worker process is assigned the task of loading or unloading a very large table or partition, it may choose to use the external tables access method to make maximum use of parallel execution. In such a case, the worker process becomes a parallel execution coordinator. The actual loading and unloading work is divided among some number of parallel I/O execution processes (sometimes called slaves) allocated from a pool of available processes in an Oracle RAC environment.

See Also: