Parallel DML combined with the updatable join views facility provides an efficient solution for refreshing the tables of a data warehouse system. To refresh tables is to update them with the differential data generated from the OLTP production system.
In the following example, assume a refresh of a table named customers
that has columns c_key
, c_name
, and c_addr
. The differential data contains either new rows or rows that have been updated since the last refresh of the data warehouse. In this example, the updated data is shipped from the production system to the data warehouse system by means of ASCII files. These files must be loaded into a temporary table, named diff_customer
, before starting the refresh process. You can use SQL*Loader with both the parallel and direct options to efficiently perform this task. You can use the APPEND
hint when loading in parallel as well.
After diff_customer
is loaded, the refresh process can be started. It can be performed in two phases or by merging in parallel, as demonstrated in the following:
The following statement is a straightforward SQL implementation of the update using subqueries:
UPDATE customers SET(c_name, c_addr) = (SELECT c_name, c_addr FROM diff_customer WHERE diff_customer.c_key = customer.c_key) WHERE c_key IN(SELECT c_key FROM diff_customer);
Unfortunately, the two subqueries in this statement affect performance.
An alternative is to rewrite this query using updatable join views. To rewrite the query, you must first add a primary key constraint to the diff_customer
table to ensure that the modified columns map to a key-preserved table:
CREATE UNIQUE INDEX diff_pkey_ind ON diff_customer(c_key) PARALLEL NOLOGGING; ALTER TABLE diff_customer ADD PRIMARY KEY (c_key);
You can then update the customers
table with the following SQL statement:
UPDATE /*+ PARALLEL(cust_joinview) */ (SELECT /*+ PARALLEL(customers) PARALLEL(diff_customer) */ CUSTOMER.c_name AS c_name CUSTOMER.c_addr AS c_addr, diff_customer.c_name AS c_newname, diff_customer.c_addr AS c_newaddr FROM diff_customer WHERE customers.c_key = diff_customer.c_key) cust_joinview SET c_name = c_newname, c_addr = c_newaddr;
The underlying scans feeding the join view cust_joinview
are done in parallel. You can then parallelize the update to further improve performance, but only if the customers
table is partitioned.
The last phase of the refresh process consists of inserting the new rows from the diff_customer
temporary table to the customers
table. Unlike the update case, you cannot avoid having a subquery in the INSERT
statement:
INSERT /*+PARALLEL(customers)*/ INTO customers SELECT * FROM diff_customer s);
However, you can guarantee that the subquery is transformed into an anti-hash join by using the HASH_AJ
hint. Doing so enables you to use parallel INSERT
to execute the preceding statement efficiently. Parallel INSERT
is applicable even if the table is not partitioned.
You can combine update and insert operations into one statement, commonly known as a merge. The following statement achieves the same result as all of the statements in "Optimizing Performance for Updating the Table in Parallel" and "Efficiently Inserting the New Rows into the Table in Parallel":
MERGE INTO customers USING diff_customer ON (diff_customer.c_key = customer.c_key) WHEN MATCHED THEN UPDATE SET (c_name, c_addr) = (SELECT c_name, c_addr FROM diff_customer WHERE diff_customer.c_key = customers.c_key) WHEN NOT MATCHED THEN INSERT VALUES (diff_customer.c_key,diff_customer.c_data);