13 Managing Optimizer Statistics: Advanced Topics

This chapter explains advanced concepts and tasks relating to optimizer statistics management, including extended statistics.

This chapter contains the following topics:

Controlling Dynamic Statistics

By default, when optimizer statistics are missing, stale, or insufficient, dynamic statistics automatically run recursive SQL during parsing to scan a small random sample of table blocks.

This section contains the following topics:

See Also:

"Dynamic Statistics"

About Dynamic Statistics Levels

The dynamic statistics level controls both when the database gathers dynamic statistics, and the size of the sample that the optimizer uses to gather the statistics. Set the dynamic statistics level using either the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter (dynamic statistics were called dynamic sampling in releases earlier than Oracle Database 12c) or a statement hint.

Dynamic statistics are enabled in the database by default. Table 13-1 describes the levels. The default level is 2.

Table 13-1 Dynamic Statistics Levels

Level	When the Optimizer Uses Dynamic Statistics	Sample Size (Blocks)
0	Do not use dynamic statistics	n/a
1	Use dynamic statistics for all tables that do not have statistics, but only if the following criteria are met: There is at least 1 nonpartitioned table in the query that does not have statistics. This table has no indexes. This table has more blocks than the number of blocks that would be used for dynamic statistics of this table.	32
2	Use dynamic statistics if at least one table in the statement has no statistics. This is the default setting.	64
3	Use dynamic statistics if any of the following conditions is true: The statement meets level 2 criteria. The statement has one or more expressions used in the `WHERE` clause predicates, for example, `WHERE SUBSTR(CUSTLASTNAME,1,3)`.	64
4	Use dynamic statistics if any of the following conditions is true: The statement meets level 3 criteria. The statement uses complex predicates (an `OR` or `AND` operator between multiple predicates on the same table).	64
5	Use dynamic statistics if the statement meets level 4 criteria.	128
6	Use dynamic statistics if the statement meets level 4 criteria.	256
7	Use dynamic statistics if the statement meets level 4 criteria.	512
8	Use dynamic statistics if the statement meets level 4 criteria.	1024
9	Use dynamic statistics if the statement meets level 4 criteria.	4086
10	Use dynamic statistics if the statement meets level 4 criteria.	All blocks
11	Use dynamic statistics automatically when the optimizer deems it necessary. The resulting statistics are persistent in the statistics repository, making them available to other queries.	Automatically determined

See Also:

Oracle Database Reference to learn about the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter

Setting Dynamic Statistics Levels Manually

When setting the level for dynamic statistics, the best practice is to use ALTER SESSION to set the value for the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter. Determining a systemwide setting that would be beneficial to all SQL statements can be difficult.

Assumptions

This tutorial assumes the following:

You want correct selectivity estimates for the following query, which has WHERE clause predicates on two correlated columns:
```
  SELECT *
  FROM   sh.customers
  WHERE  cust_city='Los Angeles'
  AND    cust_state_province='CA';
```
The preceding query uses serial processing.
The sh.customers table contains 932 rows that meet the conditions in the query.
You have gathered statistics on the sh.customers table.
You created an index on the cust_city and cust_state_province columns.
The OPTIMIZER_DYNAMIC_SAMPLING initialization parameter is set to the default level of 2.

To set the dynamic statistics level manually:

Connect SQL*Plus to the database with the appropriate privileges, and then explain the execution plan as follows:

EXPLAIN PLAN FOR
  SELECT *
  FROM   sh.customers
  WHERE  cust_city='Los Angeles'
  AND    cust_state_province='CA';

Query the plan as follows:

SET LINESIZE 130
SET PAGESIZE 0
SELECT * 
FROM   TABLE(DBMS_XPLAN.DISPLAY);

The output appears below (the example has been reformatted to fit on the page):

-------------------------------------------------------------------------------
|Id| Operation                   | Name             |Rows|Bytes|Cost | Time   |
-------------------------------------------------------------------------------
| 0| SELECT STATEMENT            |                   | 53| 9593|53(0)|00:00:01|
| 1|  TABLE ACCESS BY INDEX ROWID|CUSTOMERS          | 53| 9593|53(0)|00:00:01|
|*2|   INDEX RANGE SCAN          |CUST_CITY_STATE_IND| 53| 9593| 3(0)|00:00:01|
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CUST_CITY"='Los Angeles' AND "CUST_STATE_PROVINCE"='CA')

The columns in the WHERE clause have a real-world correlation, but the optimizer is not aware that Los Angeles is in California and assumes both predicates reduce the number of rows returned. Thus, the table contains 932 rows that meet the conditions, but the optimizer estimates 53, as shown in bold.

If the database had used dynamic statistics for this plan, then the Note section of the plan output would have indicated this fact. The optimizer did not use dynamic statistics because the statement executed serially, standard statistics exist, and the parameter OPTIMIZER_DYNAMIC_SAMPLING is set to the default of 2.

Set the dynamic statistics level to 4 in the session using the following statement:
```
ALTER SESSION SET OPTIMIZER_DYNAMIC_SAMPLING=4;
```

Explain the plan again:

EXPLAIN PLAN FOR
  SELECT *
  FROM   sh.customers
  WHERE  cust_city='Los Angeles'
  AND    cust_state_province='CA';

The new plan shows a more accurate estimate of the number of rows, as shown by the value 932 in bold:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2008213504
 
---------------------------------------------------------------------------
| Id  | Operation         | Name      |Rows | Bytes |Cost (%CPU)|Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 932 |   271K|   406 (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS | 932 |   271K|   406 (1)| 00:00:05 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("CUST_CITY"='Los Angeles' AND "CUST_STATE_PROVINCE"='CA')
 
Note
-----
   - dynamic statistics used for this statement (level=4)

The note at the bottom of the plan indicates that the sampling level is 4. The additional dynamic statistics made the optimizer aware of the real-world relationship between the cust_city and cust_state_province columns, thereby enabling it to produce a more accurate estimate for the number of rows: 932 rather than 53.

See Also:

Oracle Database SQL Language Reference to learn about setting sampling levels with the DYNAMIC_SAMPLING hint
Oracle Database PL/SQL Packages and Types Reference to learn about the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter

Disabling Dynamic Statistics

In general, the best practice is not to incur the cost of dynamic statistics for queries whose compile times must be as fast as possible, for example, unrepeated OLTP queries. You can disable the feature by setting the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter.

To disable dynamic statistics at the session level:

Connect SQL*Plus to the database with the appropriate privileges.
Set the dynamic statistics level to 0.

For example, run the following statement:
```
ALTER SESSION SET OPTIMIZER_DYNAMIC_SAMPLING=0;
```

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter

Publishing Pending Optimizer Statistics

By default, the database automatically publishes statistics when the statistics collection ends. Alternatively, you can use pending statistics to save the statistics and not publish them immediately after the collection. This technique is useful for testing queries in a session with pending statistics. When the test results are satisfactory, you can publish the statistics to make them available for the entire database.

About Pending Optimizer Statistics

The database stores pending statistics in the data dictionary just as for published statistics. By default, the optimizer uses published statistics. You can change the default behavior by setting the OPTIMIZER_USE_PENDING_STATISTICS initialization parameter to true (the default is false).

The top part of Figure 13-1 shows the optimizer gathering statistics for the sh.customers table and storing them in the data dictionary with pending status. The bottom part of the diagram shows the optimizer using only published statistics to process a query of sh.customers.

Figure 13-1 Published and Pending Statistics

Description of "Figure 13-1 Published and Pending Statistics"

In some cases, the optimizer can use a combination of published and pending statistics. For example, the database stores both published and pending statistics for the customers table. For the orders table, the database stores only published statistics. If OPTIMIZER_USE_PENDING_STATS = true, then the optimizer uses pending statistics for customers and published statistics for orders. If OPTIMIZER_USE_PENDING_STATS = false, then the optimizer uses published statistics for customers and orders.

See Also:

Oracle Database Reference to learn about the OPTIMIZER_USE_PENDING_STATISTICS initialization parameter

User Interfaces for Publishing Optimizer Statistics

You can use the DBMS_STATS package to perform operations relating to publishing statistics. Table 13-2 lists the relevant program units.

Table 13-2 DBMS_STATS Program Units Relevant for Publishing Optimizer Statistics

Program Unit	Description
`GET_PREFS`	Check whether the statistics are automatically published as soon as `DBMS_STATS` gathers them. For the parameter `PUBLISH`, `true` indicates that the statistics must be published when the database gathers them, whereas `false` indicates that the database must keep the statistics pending.
`SET_TABLE_PREFS`	Set the `PUBLISH` setting to `true` or `false` at the table level.
`SET_SCHEMA_PREFS`	Set the `PUBLISH` setting to `true` or `false` at the schema level.
`PUBLISH_PENDING_STATS`	Publish valid pending statistics for all objects or only specified objects.
`DELETE_PENDING_STATS`	Delete pending statistics.
`EXPORT_PENDING_STATS`	Export pending statistics.

The initialization parameter OPTIMIZER_USE_PENDING_STATISTICS determines whether the database uses pending statistics when they are available. The default value is false, which means that the optimizer uses only published statistics. Set to true to specify that the optimizer uses any existing pending statistics instead. The best practice is to set this parameter at the session level rather than at the database level.

You can use access information about published statistics from data dictionary views. Table 13-3 lists relevant views.

Table 13-3 Views Relevant for Publishing Optimizer Statistics

View	Description
`USER_TAB_STATISTICS`	Displays optimizer statistics for the tables accessible to the current user.
`USER_TAB_COL_STATISTICS`	Displays column statistics and histogram information extracted from `ALL_TAB_COLUMNS`.
`USER_PART_COL_STATISTICS`	Displays column statistics and histogram information for the table partitions owned by the current user.
`USER_SUBPART_COL_STATISTICS`	Describes column statistics and histogram information for subpartitions of partitioned objects owned by the current user.
`USER_IND_STATISTICS`	Displays optimizer statistics for the indexes accessible to the current user.
`USER_TAB_PENDING_STATS`	Describes pending statistics for tables, partitions, and subpartitions accessible to the current user.
`USER_COL_PENDING_STATS`	Describes the pending statistics of the columns accessible to the current user.
`USER_IND_PENDING_STATS`	Describes the pending statistics for tables, partitions, and subpartitions accessible to the current user collected using the `DBMS_STATS` package.

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS package
Oracle Database Reference to learn about USER_TAB_PENDING_STATS and related views

Managing Published and Pending Statistics

This section explains how to use DBMS_STATS program units to change the publishing behavior of optimizer statistics, and also to export and delete these statistics.

Assumptions

This tutorial assumes the following:

You want to change the preferences for the sh.customers and sh.sales tables so that newly collected statistics have pending status.
You want the current session to use pending statistics.
You want to gather and publish pending statistics on the sh.customers table.
You gather the pending statistics on the sh.sales table, but decide to delete them without publishing them.
You want to change the preferences for the sh.customers and sh.sales tables so that newly collected statistics are published.

To manage published and pending statistics:

Start SQL*Plus and connect to the database as user sh.
Query the global optimizer statistics publishing setting.

Run the following query (sample output included):
```
sh@PROD> SELECT DBMS_STATS.GET_PREFS('PUBLISH') PUBLISH FROM DUAL;

PUBLISH
-------
TRUE
```
The value true indicates that the database publishes statistics as it gathers them. Every table uses this value unless a specific table preference has been set.

When using GET_PREFS, you can also specify a schema and table name. The function returns a table preference if it is set. Otherwise, the function returns the global preference.
Query the pending statistics.

For example, run the following query (sample output included):
```
sh@PROD> SELECT * FROM USER_TAB_PENDING_STATS;
 
no rows selected
```
This example shows that the database currently stores no pending statistics for the sh schema.
Change the publishing preferences for the sh.customers table.

For example, execute the following procedure so that statistics are marked as pending:
```
BEGIN
  DBMS_STATS.SET_TABLE_PREFS('sh', 'customers', 'publish', 'false');
END;
/
```
Subsequently, when you gather statistics on the customers table, the database does not automatically publish statistics when the gather job completes. Instead, the database stores the newly gathered statistics in the USER_TAB_PENDING_STATS table.
Gather statistics for sh.customers.

For example, run the following program:
```
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('sh','customers');
END;
/
```

Query the pending statistics.

For example, run the following query (sample output included):

sh@PROD> SELECT TABLE_NAME, NUM_ROWS FROM USER_TAB_PENDING_STATS;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
CUSTOMERS                           55500

This example shows that the database now stores pending statistics for the sh.customers table.

Instruct the optimizer to use the pending statistics in this session.

Set the initialization parameter OPTIMIZER_USE_PENDING_STATISTICS to true as shown:
```
ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = true;
```
Run a workload.

The following example changes the email addresses of all customers named Bruce Chalmers:
```
UPDATE  sh.customers 
  SET   cust_email='ChalmersB@company.com' 
  WHERE cust_first_name = 'Bruce' 
  AND   cust_last_name = 'Chalmers';
COMMIT;
```
The optimizer uses the pending statistics instead of the published statistics when compiling all SQL statements in this session.
Publish the pending statistics for sh.customers.

For example, execute the following program:
```
BEGIN
  DBMS_STATS.PUBLISH_PENDING_STATS('SH','CUSTOMERS');
END;
/
```
Change the publishing preferences for the sh.sales table.

For example, execute the following program:
```
BEGIN
  DBMS_STATS.SET_TABLE_PREFS('sh', 'sales', 'publish', 'false');
END;
/
```
Subsequently, when you gather statistics on the sh.sales table, the database does not automatically publish statistics when the gather job completes. Instead, the database stores the statistics in the USER_TAB_PENDING_STATS table.
Gather statistics for sh.sales.

For example, run the following program:
```
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('sh','sales');
END;
/
```
Delete the pending statistics for sh.sales.

Assume you change your mind and now want to delete pending statistics for sh.sales. Run the following program:
```
BEGIN
  DBMS_STATS.DELETE_PENDING_STATS('sh','sales');
END;
/
```
Change the publishing preferences for the sh.customers and sh.sales tables back to their default setting.

For example, execute the following program:
```
BEGIN
  DBMS_STATS.SET_TABLE_PREFS('sh', 'customers', 'publish', null);
  DBMS_STATS.SET_TABLE_PREFS('sh', 'sales', 'publish', null);
END;
/
```

Managing Extended Statistics

DBMS_STATS enables you to collect extended statistics, which are statistics that can improve cardinality estimates when multiple predicates exist on different columns of a table, or when predicates use expressions. An extension is either a column group or an expression.

Oracle Database supports the following types of extended statistics:

Column group statistics

This type of extended statistics can improve cardinality estimates when multiple columns from the same table occur together in a SQL statement. An example might be a car make and model, or a city and state. See "Managing Column Group Statistics".
Expression statistics

This type of extended statistics improves optimizer estimates when predicates use expressions, for example, built-in or user-defined functions. An example might be the UPPER function applied to an employee last name. See "Managing Expression Statistics".

Note:

You cannot create extended statistics on virtual columns. See Oracle Database SQL Language Reference for a list of restrictions on virtual columns.

Managing Column Group Statistics

A column group is a set of columns that is treated as a unit. Essentially, a column group is a virtual column. By gathering statistics on a column group, the optimizer can more accurately determine the cardinality estimate when a query groups these columns together.

The following sections provide an overview of column group statistics, and explain how to manage them manually:

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS package

About Statistics on Column Groups

Individual column statistics are useful for determining the selectivity of a single predicate in a WHERE clause. However, when the WHERE clause includes multiple predicates on different columns from the same table, individual column statistics do not show the relationship between the columns. This is the problem solved by a column group.

The optimizer calculates the selectivity of the predicates independently, and then combines them. However, if a correlation between the individual columns exists, then the optimizer cannot take it into account when determining a cardinality estimate, which it creates by multiplying the selectivity of each table predicate by the number of rows.

The following graphic contrasts two ways of gathering statistics on the cust_state_province and country_id columns of the sh.customers table. The diagram shows DBMS_STATS collecting statistics on each column individually and on the group. The column group has a system-generated name.

Figure 13-2 Column Group Statistics

Description of "Figure 13-2 Column Group Statistics"

Note:

The optimizer uses column group statistics for equality predicates, inlist predicates, and for estimating the GROUP BY cardinality.

Why Column Group Statistics Are Needed: Example

The following query of the DBA_TAB_COL_STATISTICS table shows information about statistics that have been gathered on the columns cust_state_province and country_id from the sh.customers table:

COL COLUMN_NAME FORMAT a20
COL NDV FORMAT 999

SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM   DBA_TAB_COL_STATISTICS
WHERE  OWNER = 'SH'
AND    TABLE_NAME = 'CUSTOMERS'
AND    COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');

Sample output is as follows:

COLUMN_NAME                 NDV HISTOGRAM
-------------------- ---------- ---------------
CUST_STATE_PROVINCE         145 FREQUENCY
COUNTRY_ID                   19 FREQUENCY

As shown in the following query, 3341 customers reside in California:

SELECT COUNT(*)
FROM   sh.customers 
WHERE  cust_state_province = 'CA';

 COUNT(*)
----------
    3341

Consider an explain plan for a query of customers in the state CA and in the country with ID 52790 (USA):

EXPLAIN PLAN FOR
  SELECT *
  FROM   sh.customers
  WHERE  cust_state_province = 'CA'
  AND    country_id=52790;
 
Explained.
 
sys@PROD> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1683234692
 
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   128 | 24192 |   442   (7)| 00:00:06 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   128 | 24192 |   442   (7)| 00:00:06 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
 
   1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)
 
13 rows selected.

Based on the single-column statistics for the country_id and cust_state_province columns, the optimizer estimates that the query of California customers in the USA will return 128 rows. In fact, 3341 customers reside in California, but the optimizer does not know that California is in the USA, and so greatly underestimates cardinality by assuming that both predicates reduce the number of returned rows.

You can make the optimizer aware of the real-world relationship between values in country_id and cust_state_province by gathering column group statistics. These statistics enable the optimizer to give a more accurate cardinality estimate.

See Also:

User Interface for Column Group Statistics

Table 13-4 lists the DBMS_STATS program units that are relevant for detecting and creating column groups.

Table 13-4 DBMS_STATS Column Group Program Units

Program Unit Description

Program Unit	Description
`SEED_COL_USAGE`	Iterates over the SQL statements in the specified workload, compiles them, and then seeds column usage information for the columns that appear in these statements. To determine the appropriate column groups, the database must observe a representative workload. You do not need to run the queries themselves during the monitoring period. Instead, you can run `EXPLAIN PLAN` for some longer-running queries in your workload to ensure that the database is recording column group information for these queries.
`REPORT_COL_USAGE`	Generates a report that lists the columns that were seen in filter predicates, join predicates, and `GROUP BY` clauses in the workload. You can use this function to review column usage information recorded for a specific table.
`CREATE_EXTENDED_STATS`	Creates extensions, which are either column groups or expressions. The database gathers statistics for the extension when either a user-generated or automatic statistics gathering job gathers statistics for the table.

SEED_COL_USAGE

Iterates over the SQL statements in the specified workload, compiles them, and then seeds column usage information for the columns that appear in these statements.

To determine the appropriate column groups, the database must observe a representative workload. You do not need to run the queries themselves during the monitoring period. Instead, you can run EXPLAIN PLAN for some longer-running queries in your workload to ensure that the database is recording column group information for these queries.

REPORT_COL_USAGE

Generates a report that lists the columns that were seen in filter predicates, join predicates, and GROUP BY clauses in the workload.

You can use this function to review column usage information recorded for a specific table.

CREATE_EXTENDED_STATS

Creates extensions, which are either column groups or expressions. The database gathers statistics for the extension when either a user-generated or automatic statistics gathering job gathers statistics for the table.

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS package

Detecting Useful Column Groups for a Specific Workload

You can use DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE to determine which column groups are required for a table based on a specified workload. This technique is useful when you do not know which extended statistics to create. This technique does not work for expression statistics.

Note:

You can seed column usage from a SQL tuning set (see Managing SQL Tuning Sets ).

Assumptions

This tutorial assumes the following:

Cardinality estimates have been incorrect for queries of the sh.customers_test table (created from the customers table) that use predicates referencing the columns country_id and cust_state_province.
You want the database to monitor your workload for 5 minutes (300 seconds).
You want the database to determine which column groups are needed automatically.

To detect column groups:

Connect SQL*Plus to the database as user sh, and then create the customers_test table and gather statistics for it:

CONNECT SH/SH
DROP TABLE customers_test;
CREATE TABLE customers_test AS SELECT * FROM customer;
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'customers_test');

Enable workload monitoring.

In a different SQL*Plus session, connect as SYS and run the following PL/SQL program to enable monitoring for 300 seconds:
```
BEGIN
  DBMS_STATS.SEED_COL_USAGE(null,null,300);
END;
/
```

As user sh, run explain plans for two queries in the workload.

The following examples show the explain plans for two queries on the customers_test table:

EXPLAIN PLAN FOR
  SELECT *
  FROM   customers_test
  WHERE  cust_city = 'Los Angeles'
  AND    cust_state_province = 'CA'
  AND    country_id = 52790;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
 
EXPLAIN PLAN FOR
  SELECT   country_id, cust_state_province, count(cust_city)
  FROM     customers_test
  GROUP BY country_id, cust_state_province;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

Sample output appears below:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 4115398853
 
----------------------------------------------------
| Id  | Operation         | Name           | Rows  |
----------------------------------------------------
|   0 | SELECT STATEMENT  |                |     1 |
|   1 |  TABLE ACCESS FULL| CUSTOMERS_TEST |     1 |
----------------------------------------------------
 
8 rows selected.
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3050654408
 
-----------------------------------------------------
| Id  | Operation          | Name           | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT   |                |  1949 |
|   1 |  HASH GROUP BY     |                |  1949 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------
 
9 rows selected.

The first plan shows a cardinality of 1 row for a query that returns 932 rows. The second plan shows a cardinality of 1949 rows for a query that returns 145 rows.

Optionally, review the column usage information recorded for the table.

Call the DBMS_STATS.REPORT_COL_USAGE function to generate a report:

SET LONG 100000
SET LINES 120
SET PAGES 0
SELECT DBMS_STATS.REPORT_COL_USAGE(user, 'customers_test')
FROM   DUAL;

The report appears below:

LEGEND:
.......
 
EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...........................................................................
 
###########################################################################
 
COLUMN USAGE REPORT FOR SH.CUSTOMERS_TEST
.........................................
 
1. COUNTRY_ID                          : EQ
2. CUST_CITY                           : EQ
3. CUST_STATE_PROVINCE                 : EQ
4. (CUST_CITY, CUST_STATE_PROVINCE,
    COUNTRY_ID)                        : FILTER
5. (CUST_STATE_PROVINCE, COUNTRY_ID)   : GROUP_BY
###########################################################################

In the preceding report, the first three columns were used in equality predicates in the first monitored query:

...
WHERE  cust_city = 'Los Angeles'
AND    cust_state_province = 'CA'
AND    country_id = 52790;

All three columns appeared in the same WHERE clause, so the report shows them as a group filter. In the second query, two columns appeared in the GROUP BY clause, so the report labels them as GROUP_BY. The sets of columns in the FILTER and GROUP_BY report are candidates for column groups.

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS package

Creating Column Groups Detected During Workload Monitoring

As explained in Table 13-4, you can use the DBMS_STATS.CREATE_EXTENDED_STATS function to create column groups that were detected previously by executing DBMS_STATS.SEED_COL_USAGE.

Assumptions

This tutorial assumes that you have performed the steps in "Detecting Useful Column Groups for a Specific Workload".

To create column groups:

Create column groups for the customers_test table based on the usage information captured during the monitoring window.

For example, run the following query:

SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'customers_test') FROM DUAL;

Sample output appears below:

###########################################################################
EXTENSIONS FOR SH.CUSTOMERS_TEST
................................
1. (CUST_CITY, CUST_STATE_PROVINCE,
    COUNTRY_ID)                     :SYS_STUMZ$C3AIHLPBROI#SKA58H_N created
2. (CUST_STATE_PROVINCE, COUNTRY_ID):SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ created
###########################################################################

The database created two column groups for customers_test: one column group for the filter predicate and one group for the GROUP BY operation.

Regather table statistics.

Run GATHER_TABLE_STATS to regather the statistics for customers_test:
```
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'customers_test');
```
As user sh, run explain plans for two queries in the workload.

Check the USER_TAB_COL_STATISTICS view to determine which additional statistics were created by the database:
```
SELECT COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'CUSTOMERS_TEST'
ORDER BY 1;
```
Partial sample output appears below:
```
CUST_CITY                               620 HEIGHT BALANCED
...
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_          145 NONE
SYS_STUMZ$C3AIHLPBROI#SKA58H_N          620 HEIGHT BALANCED
```
This example shows the two column group names returned from the DBMS_STATS.CREATE_EXTENDED_STATS function. The column group created on CUST_CITY, CUST_STATE_PROVINCE, and COUNTRY_ID has a height-balanced histogram.

Explain the plans again.

The following examples show the explain plans for two queries on the customers_test table:

EXPLAIN PLAN FOR
  SELECT *
  FROM   customers_test
  WHERE  cust_city = 'Los Angeles'
  AND    cust_state_province = 'CA'
  AND    country_id = 52790;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
 
EXPLAIN PLAN FOR
  SELECT   country_id, cust_state_province, count(cust_city)
  FROM     customers_test
  GROUP BY country_id, cust_state_province;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

The new plans show more accurate cardinality estimates:

----------------------------------------------------
| Id  | Operation         | Name           | Rows  |
----------------------------------------------------
|   0 | SELECT STATEMENT  |                |  1093 |
|   1 |  TABLE ACCESS FULL| CUSTOMERS_TEST |  1093 |
----------------------------------------------------
 
8 rows selected.
 
Plan hash value: 3050654408
 
-----------------------------------------------------
| Id  | Operation          | Name           | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT   |                |   145 |
|   1 |  HASH GROUP BY     |                |   145 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------
9 rows selected.

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS package

Creating and Gathering Statistics on Column Groups Manually

In some cases, you may know the column group that you want to create. The METHOD_OPT argument of the DBMS_STATS.GATHER_TABLE_STATS function can create and gather statistics on a column group automatically. You can create a new column group by specifying the group of columns using FOR COLUMNS.

Assumptions

This tutorial assumes the following:

You want to create a column group for the cust_state_province and country_id columns in the customers table in sh schema.
You want to gather statistics (including histograms) on the entire table and the new column group.

To create a column group and gather statistics for this group:

Start SQL*Plus and connect to the database as the sh user.

Create the column group and gather statistics.

For example, execute the following PL/SQL program:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS( 'sh','customers',
  METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
                'FOR COLUMNS SIZE SKEWONLY (cust_state_province,country_id)' );
END;
/

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.GATHER_TABLE_STATS procedure

Displaying Column Group Information

To obtain the name of a column group, use the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function or a database view. You can also use views to obtain information such as the number of distinct values, and whether the column group has a histogram.

Assumptions

This tutorial assumes the following:

You created a column group for the cust_state_province and country_id columns in the customers table in sh schema.
You want to determine the column group name, the number of distinct values, and whether a histogram has been created for a column group.

To monitor a column group:

Start SQL*Plus and connect to the database as the sh user.

To determine the column group name, do one of the following.

Execute the SHOW_EXTENDED_STATS_NAME function.

For example, run the following PL/SQL program:

SELECT SYS.DBMS_STATS.SHOW_EXTENDED_STATS_NAME( 'sh','customers',
       '(cust_state_province,country_id)' ) col_group_name 
FROM   DUAL;

The output is similar to the following:

COL_GROUP_NAME
----------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_

Query the USER_STAT_EXTENSIONS view.

For example, run the following query:

SELECT EXTENSION_NAME, EXTENSION 
FROM   USER_STAT_EXTENSIONS 
WHERE  TABLE_NAME='CUSTOMERS';

EXTENSION_NAME                     EXTENSION
-----------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_     ("CUST_STATE_PROVINCE","COUNTRY_ID")

Query the number of distinct values and find whether a histogram has been created for a column group.

For example, run the following query:

SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM   USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE  e.EXTENSION_NAME=t.COLUMN_NAME
AND    e.TABLE_NAME=t.TABLE_NAME
AND    t.TABLE_NAME='CUSTOMERS';

COL_GROUP                             NUM_DISTINCT        HISTOGRAM
-------------------------------------------------------------------
("COUNTRY_ID","CUST_STATE_PROVINCE")  145                 FREQUENCY

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function

Dropping a Column Group

Use the DBMS_STATS.DROP_EXTENDED_STATS function to delete a column group from a table.

Assumptions

This tutorial assumes the following:

You created a column group for the cust_state_province and country_id columns in the customers table in sh schema.
You want to drop the column group.

To drop a column group:

Start SQL*Plus and connect to the database as the sh user.

Drop the column group.

For example, the following PL/SQL program deletes a column group from the customers table:

BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS( 'sh', 'customers', 
                                  '(cust_state_province, country_id)' );
END;
/

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.DROP_EXTENDED_STATS function

Managing Expression Statistics

The type of extended statistics known as expression statistics improve optimizer estimates when a WHERE clause has predicates that use expressions.

This section contains the following topics:

About Expression Statistics

When an expression is applied to a column in the WHERE clause in the form(function(col)=constant), the optimizer has no way of knowing how this SQL function affects the cardinality of the predicate unless a function-based index had been created. Starting in Oracle Database 11g, you can gather expression statistics on the expression(function(col) itself.

Figure 13-3 shows the optimizer using statistics to generate a plan for a query that uses a function. The top shows the optimizer checking statistics for the column. The bottom shows the optimizer checking statistics corresponding to the expression used in the query. The expression statistics yield more accurate estimates.

Figure 13-3 Expression Statistics

Description of "Figure 13-3 Expression Statistics"

As shown in Figure 13-3, when expression statistics are not available, the optimizer can produce suboptimal plans.

See Also:

Oracle Database SQL Language Reference to learn about SQL functions

When Expression Statistics Are Useful: Example

The following query of the sh.customers table shows that 3341 customers are in the state of California:

sys@PROD> SELECT COUNT(*) FROM sh.customers WHERE cust_state_province='CA';
 
  COUNT(*)
----------
      3341

Consider the plan for the same query with the LOWER() function applied:

sys@PROD> EXPLAIN PLAN FOR
  2  SELECT * FROM sh.customers WHERE LOWER(cust_state_province)='ca';
Explained.

sys@PROD> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   555 |   108K|   406   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   555 |   108K|   406   (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(LOWER("CUST_STATE_PROVINCE")='ca')

Because no expression statistics exist for LOWER(cust_state_province)='ca', the optimizer estimate is significantly off. You can use DBMS_STATS procedures to correct these estimates.

Creating Expression Statistics

You can use DBMS_STATS to create statistics for a user-specified expression. You have the option of using either of the following program units:

GATHER_TABLE_STATS procedure
CREATE_EXTENDED_STATISTICS function followed by the GATHER_TABLE_STATS procedure

Assumptions

This tutorial assumes the following:

Selectivity estimates are inaccurate for queries of sh.customers that use the UPPER(cust_state_province) function.
You want to gather statistics on the UPPER(cust_state_province) expression.

To create expression statistics:

Start SQL*Plus and connect to the database as the sh user.

Gather table statistics.

For example, run the following command, specifying the function in the method_opt argument:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS( 
    'sh'
,   'customers'
,   method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS
                  (LOWER(cust_state_province)) SIZE SKEWONLY' 
);
END;

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.GATHER_TABLE_STATS procedure

Displaying Expression Statistics

You can use the database view DBA_STAT_EXTENSIONS and the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function to obtain information about expression statistics. You can also use views to obtain information such as the number of distinct values, and whether the column group has a histogram.

Assumptions

This tutorial assumes the following:

You created extended statistics for the LOWER(cust_state_province) expression.
You want to determine the column group name, the number of distinct values, and whether a histogram has been created for a column group.

To monitor expression statistics:

Start SQL*Plus and connect to the database as the sh user.

Query the name and definition of the statistics extension.

For example, run the following query:

COL EXTENSION_NAME FORMAT a30
COL EXTENSION FORMAT a35

SELECT EXTENSION_NAME, EXTENSION
FROM   USER_STAT_EXTENSIONS
WHERE  TABLE_NAME='CUSTOMERS';

Sample output appears as follows:

EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STUBPHJSBRKOIK9O2YV3W8HOUE (LOWER("CUST_STATE_PROVINCE"))

Query the number of distinct values and find whether a histogram has been created for the expression.

For example, run the following query:

SELECT e.EXTENSION expression, t.NUM_DISTINCT, t.HISTOGRAM
FROM   USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE  e.EXTENSION_NAME=t.COLUMN_NAME
AND    e.TABLE_NAME=t.TABLE_NAME
AND    t.TABLE_NAME='CUSTOMERS';

EXPRESSION                            NUM_DISTINCT        HISTOGRAM
-------------------------------------------------------------------
(LOWER("CUST_STATE_PROVINCE"))        145                 FREQUENCY

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.SHOW_EXTENDED_STATS_NAME procedure
Oracle Database Reference to learn about the DBA_STAT_EXTENSIONS view

Dropping Expression Statistics

Use the DBMS_STATS.DROP_EXTENDED_STATS function to delete a column group from a table.

Assumptions

This tutorial assumes the following:

You created extended statistics for the LOWER(cust_state_province) expression.
You want to drop the expression statistics.

To drop expression statistics:

Start SQL*Plus and connect to the database as the sh user.

Drop the column group.

For example, the following PL/SQL program deletes a column group from the customers table:

BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS(
    'sh'
,   'customers'
,   '(LOWER(cust_state_province))'
);
END;
/

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.DROP_EXTENDED_STATS procedure

Locking and Unlocking Optimizer Statistics

You can lock statistics to prevent them from changing. After statistics are locked, you cannot make modifications to the statistics until the statistics have been unlocked.

Locking procedures are useful in a static environment when you want to guarantee that the statistics and resulting plan never change. For example, you may want to prevent new statistics from being gathered on a table or schema by the DBMS_STATS_JOB process, such as highly volatile tables.

When you lock statistics on a table, all dependent statistics are locked. The locked statistics include table statistics, column statistics, histograms, and dependent index statistics. To overwrite statistics even when they are locked, you can set the value of the FORCE argument in various DBMS_STATS procedures, for example, DELETE_*_STATS and RESTORE_*_STATS, to true.

Locking Statistics

The DBMS_STATS package provides two procedures for locking statistics: LOCK_SCHEMA_STATS and LOCK_TABLE_STATS.

Assumptions

This tutorial assumes the following:

You gathered statistics on the oe.orders table and on the hr schema.
You want to prevent the oe.orders table statistics and hr schema statistics from changing.

To lock statistics:

Start SQL*Plus and connect to the database as the oe user.
Lock the statistics on oe.orders.

For example, execute the following PL/SQL program:
```
BEGIN
  DBMS_STATS.LOCK_TABLE_STATS('OE','ORDERS');
END;
/
```
Connect to the database as the hr user.
Lock the statistics in the hr schema.

For example, execute the following PL/SQL program:
```
BEGIN
  DBMS_STATS.LOCK_SCHEMA_STATS('HR');
END;
/
```

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.LOCK_TABLE_STATS procedure

Unlocking Statistics

The DBMS_STATS package provides two procedures for unlocking statistics: UNLOCK_SCHEMA_STATS and UNLOCK_TABLE_STATS.

Assumptions

This tutorial assumes the following:

You locked statistics on the oe.orders table and on the hr schema.
You want to unlock these statistics.

To unlock statistics:

Start SQL*Plus and connect to the database as the oe user.
Unlock the statistics on oe.orders.

For example, execute the following PL/SQL program:
```
BEGIN
  DBMS_STATS.UNLOCK_TABLE_STATS('OE','ORDERS');
END;
/
```
Connect to the database as the hr user.
Unlock the statistics in the hr schema.

For example, execute the following PL/SQL program:
```
BEGIN
  DBMS_STATS.UNLOCK_SCHEMA_STATS('HR');
END;
/
```

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.UNLOCK_TABLE_STATS procedure

Restoring Optimizer Statistics

This topic contains the following topics:

About Restore Operations for Optimizer Statistics

Whenever statistics in the data dictionary are modified, the database automatically saves old versions of statistics. If newly collected statistics lead to suboptimal execution plans, then you may want to revert to the previous statistics. In this way, restoring optimizer statistics can aid in troubleshooting suboptimal plans.

Figure 13-4 illustrates a timeline for restoring statistics. In the graphic, statistics collection occurs on August 10 and August 20. On August 24, the DBA determines that the current statistics may be causing the optimizer to generate suboptimal plans. On August 25, the administrator restores the statistics collected on August 10.

Figure 13-4 Restoring Optimizer Statistics

Description of "Figure 13-4 Restoring Optimizer Statistics"

Guidelines for Restoring Optimizer Statistics

Restoring statistics is similar to importing and exporting statistics. In general, restore statistics instead of exporting them in the following situations:

You want to recover older versions of the statistics. For example, you want to restore the optimizer behavior to an earlier date.
You want the database to manage the retention and purging of statistics histories.

Export statistics rather than restoring them in the following situations:

You want to experiment with multiple sets of statistics and change the values back and forth.
You want to move the statistics from one database to another database. For example, moving statistics from a production system to a test system.
You want to preserve a known set of statistics for a longer period than the desired retention date for restoring statistics.

See Also:

Oracle Database PL/SQL Packages and Types Reference for an overview of the procedures for restoring and importing statistics

Restrictions for Restoring Optimizer Statistics

When restoring previous versions of statistics, the following limitations apply:

DBMS_STATS.RESTORE_*_STATS procedures cannot restore user-defined statistics.
Old versions of statistics are not stored when the ANALYZE command has been used for collecting statistics.
When you drop a table, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures is lost. Without this data, these features do not function properly. To remove all rows from a table, and to restore these statistics with DBMS_STATS, use TRUNCATE instead of dropping and re-creating the same table.

Restoring Optimizer Statistics Using DBMS_STATS

You can restore statistics using the DBMS_STATS.RESTORE_*_STATS procedures. The procedures listed in Table 13-5 accept a timestamp as an argument and restore statistics as of the specified time (as_of_timestamp).

Table 13-5 DBMS_STATS Restore Procedures

Procedure	Description
`RESTORE_DICTIONARY_STATS`	Restores statistics of all dictionary tables (tables of `SYS`, `SYSTEM`, and `RDBMS` component schemas) as of a specified timestamp.
`RESTORE_FIXED_OBJECTS_STATS`	Restores statistics of all fixed tables as of a specified timestamp.
`RESTORE_SCHEMA_STATS`	Restores statistics of all tables of a schema as of a specified timestamp.
`RESTORE_SYSTEM_STATS`	Restores system statistics as of a specified timestamp.
`RESTORE_TABLE_STATS`	Restores statistics of a table as of a specified timestamp. The procedure also restores statistics of associated indexes and columns. If the table statistics were locked at the specified timestamp, then the procedure locks the statistics.

Dictionary views display the time of statistics modifications. You can use the following views to determine the time stamp to be use for the restore operation:

The DBA_OPTSTAT_OPERATIONS view contain history of statistics operations performed at schema and database level using DBMS_STATS.
The DBA_TAB_STATS_HISTORY views contains a history of table statistics modifications.

Assumptions

This tutorial assumes the following:

After the most recent statistics collection for the oe.orders table, the optimizer began choosing suboptimal plans for queries of this table.
You want to restore the statistics from before the most recent statistics collection to see if the plans improve.

To restore optimizer statistics:

Start SQL*Plus and connect to the database with administrator privileges.

Query the statistics history for oe.orders.

For example, run the following query:

COL TABLE_NAME FORMAT a10
SELECT TABLE_NAME,
       TO_CHAR(STATS_UPDATE_TIME,'YYYY-MM-DD:HH24:MI:SS') AS STATS_MOD_TIME
FROM   DBA_TAB_STATS_HISTORY 
WHERE  TABLE_NAME='ORDERS'
AND    OWNER='OE'
ORDER BY STATS_UPDATE_TIME DESC;

Sample output is as follows:

TABLE_NAME STATS_MOD_TIME
---------- -------------------
ORDERS     2012-08-20:11:36:38
ORDERS     2012-08-10:11:06:20

Restore the optimizer statistics to the previous modification time.

For example, restore the oe.orders table statistics to August 10, 2012:
```
BEGIN
  DBMS_STATS.RESTORE_TABLE_STATS( 'OE','ORDERS', 
               TO_TIMESTAMP('2012-08-10:11:06:20','YYYY-MM-DD:HH24:MI:SS') );
END;
/
```
You can specify any date between 8/10 and 8/20 because DBMS_STATS restores statistics as of the specified time.

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn more about the DBMS_STATS.RESTORE_TABLE_STATS procedure

Managing Optimizer Statistics Retention

By default, the database retains optimizer statistics for 31 days, after which time the statistics are scheduled for purging. You can use the DBMS_STATS package to determine the retention period, change the period, and manually purge old statistics.

This section contains the following topics:

Obtaining Optimizer Statistics History

You can use DBMS_STATS procedures to obtain historical information for optimizer statistics. This information is useful when you want to determine how long the database retains optimizer statistics, and how far back these statistics can be restored.

You can use the following procedure to obtain information about the optimizer statistics history:

GET_STATS_HISTORY_RETENTION

This function can retrieve the current statistics history retention value.
GET_STATS_HISTORY_AVAILABILITY

This function retrieves the oldest time stamp when statistics history is available. Users cannot restore statistics to a time stamp older than the oldest time stamp.

To obtain optimizer statistics history information:

Start SQL*Plus and connect to the database with the necessary privileges.

Execute the following PL/SQL program:

DECLARE
  v_stats_retn  NUMBER;
  v_stats_date  DATE;
BEGIN
  v_stats_retn := DBMS_STATS.GET_STATS_HISTORY_RETENTION;
  DBMS_OUTPUT.PUT_LINE('The retention setting is ' || v_stats_retn || '.');
  v_stats_date := DBMS_STATS.GET_STATS_HISTORY_AVAILABILITY;
  DBMS_OUTPUT.PUT_LINE('Earliest restore date is ' || v_stats_date || '.');
END;
/

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.GET_STATS_HISTORY_RETENTION procedure

Changing the Optimizer Statistics Retention Period

By default, the database retains optimizer statistics for 31 days. You can configure the retention period using the DBMS_STATS.ALTER_STATS_HISTORY_RETENTION procedure.

Prerequisites

To run this procedure, you must have either the SYSDBA privilege, or both the ANALYZE ANY DICTIONARY and ANALYZE ANY system privileges.

Assumptions

This tutorial assumes the following:

The current retention period for optimizer statistics is 31 days.
You run queries annually as part of an annual report. To keep the statistics history for more than 365 days so that you have access to last year's plan (in case a suboptimal plan occurs now), you set the retention period to 366 days.
You want to create a PL/SQL procedure set_opt_stats_retention that you can use to change the optimizer statistics retention period.

To change the optimizer statistics retention period:

Start SQL*Plus and connect to the database with the necessary privileges.

Create a procedure that changes the retention period.

For example, create the following procedure:

CREATE OR REPLACE PROCEDURE set_opt_stats_retention
  ( p_stats_retn   IN NUMBER )
IS
  v_stats_retn NUMBER;
BEGIN
  v_stats_retn := DBMS_STATS.GET_STATS_HISTORY_RETENTION;
  DBMS_OUTPUT.PUT_LINE('Old retention setting is ' ||v_stats_retn|| '.');
  DBMS_STATS.ALTER_STATS_HISTORY_RETENTION(p_stats_retn);
  v_stats_retn := DBMS_STATS.GET_STATS_HISTORY_RETENTION;
  DBMS_OUTPUT.PUT_LINE('New retention setting is ' ||v_stats_retn|| '.');
END;
/

Change the retention period to 366 days.

For example, execute the procedure that you created in the previous step (sample output included):

SQL> EXECUTE set_opt_stats_retention(366)

The old retention setting is 31.
The new retention setting is 366.
 
PL/SQL procedure successfully completed.

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.ALTER_STATS_HISTORY_RETENTION procedure

Purging Optimizer Statistics

Automatic purging is enabled when the STATISTICS_LEVEL initialization parameter is set to TYPICAL or ALL. The database purges all history older than the older of (current time - the ALTER_STATS_HISTORY_RETENTION setting) and (time of the most recent statistics gathering - 1).

You can purge old statistics manually using the PURGE_STATS procedure. If you do not specify an argument, then this procedure uses the automatic purging policy. If you specify the before_timestamp parameter, then the database purges statistics saved before the specified timestamp.

Prerequisites

To run this procedure, you must have either the SYSDBA privilege, or both the ANALYZE ANY DICTIONARY and ANALYZE ANY system privileges.

Assumptions

This tutorial assumes that you want to purge statistics more than one week old.

To purge optimizer statistics:

Start SQL*Plus and connect to the database with the necessary privileges.
Execute the DBMS_STATS.PURGE_STATS procedure.

For example, execute the procedure as follows:
```
EXEC DBMS_STATS.PURGE_STATS( SYSDATE-7 );
```

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.PURGE_STATS procedure

Importing and Exporting Optimizer Statistics

You can export and import optimizer statistics from the data dictionary to user-defined statistics tables. You can also copy statistics from one database to another database.

Importing and exporting are especially useful for testing an application using production statistics. You use DBMS_STATS to export schema statistics from a production database to a test database so that developers can tune execution plans in a realistic environment before deploying applications.

About Transporting Optimizer Statistics

When you transport optimizer statistics between databases, you must use DBMS_STATS to copy the statistics to and from a staging table, and tools to make the table contents accessible to the destination database. Figure 13-5 illustrates the process using Oracle Data Pump and ftp.

Figure 13-5 Transporting Optimizer Statistics

Description of "Figure 13-5 Transporting Optimizer Statistics"

As shown in Figure 13-5, the basic steps are as follows:

In the production database, copy the statistics from the data dictionary to a staging table using DBMS_STATS.EXPORT_SCHEMA_STATS.
Export the statistics from the staging table to a .dmp file using Oracle Data Pump.
Transfer the .dmp file from the production host to the test host using a transfer tool such as ftp.
In the test database, import the statistics from the .dmp file to a staging table using Oracle Data Pump.
Copy the statistics from the staging table to the data dictionary using DBMS_STATS.IMPORT_SCHEMA_STATS.

Transporting Optimizer Statistics to a Test Database

This section explains how to transport schema statistics from a production database to a test database.

Prerequisites and Restrictions

When preparing to export optimizer statistics, note the following:

Before exporting statistics, you must create a table to hold the statistics. The procedure DBMS_STATS.CREATE_STAT_TABLE creates the statistics table.
The optimizer does not use statistics stored in a user-owned table. The only statistics used by the optimizer are the statistics stored in the data dictionary. To make the optimizer use statistics in user-defined tables, import these statistics into the data dictionary using the DBMS_STATS import procedure.
The Data Pump Export and Import utilities export and import optimizer statistics from the database along with the table. When a column has system-generated names, Original Export (exp) does not export statistics with the data, but this restriction does not apply to Data Pump Export.

Note:

Exporting and importing statistics using DBMS_STATS is a distinct operation from using Data Pump Export and Import.

Assumptions

This tutorial assumes the following:

You want to generate representative sh schema statistics on a production database and use DBMS_STATS to import them into a test database.
Administrative user dba1 exists on both production and test databases.
You intend to create table opt_stats to store the schema statistics.
You intend to use Oracle Data Pump to export and import table opt_stats.

To generate schema statistics and import them into a separate database:

On the production host, start SQL*Plus and connect to the production database as administrator dba1.
Create a table to hold the production statistics.

For example, execute the following PL/SQL program to create user statistics table opt_stats:
```
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE ( 
    ownname => 'dba1'
,   stattab => 'opt_stats'
);
END;
/
```
Gather schema statistics.

For example, manually gather schema statistics as follows:
```
-- generate representative workload
EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SH');
```
Use DBMS_STATS to export the statistics.

For example, retrieve schema statistics and store them in the opt_stats table created previously:
```
BEGIN
  DBMS_STATS.EXPORT_SCHEMA_STATS (
    ownname => 'dba1'
,   stattab => 'opt_stats'
);
END;
/
```
Use Oracle Data Pump to export the contents of the statistics table.

For example, run the expdp command at the operating schema prompt:
```
expdp dba1 DIRECTORY=dpump_dir1 DUMPFILE=stat.dmp TABLES=opt_stats
```
Transfer the dump file to the test database host.
Log in to the test host, and then use Oracle Data Pump to import the contents of the statistics table.

For example, run the impdp command at the operating schema prompt:
```
impdp dba1 DIRECTORY=dpump_dir1 DUMPFILE=stat.dmp TABLES=opt_stats 
```
On the test host, start SQL*Plus and connect to the test database as administrator dba1.
Use DBMS_STATS to import statistics from the user statistics table and store them in the data dictionary.

The following PL/SQL program imports schema statistics from table opt_stats into the data dictionary:
```
BEGIN
  DBMS_STATS.IMPORT_SCHEMA_STATS( 
    ownname => 'dba1'
,   stattab => 'opt_stats' 
);
END;
/
```

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.CREATE_STAT_TABLE function
Oracle Database PL/SQL Packages and Types Reference for an overview of the statistics transfer functions
Oracle Database Utilities to learn about Oracle Data Pump

Running Statistics Gathering Functions in Reporting Mode

You can run the DBMS_STATS statistics gathering procedures in reporting mode. In this case, the optimizer does not actually gather statistics, but reports objects that would be processed if you were to use a specified statistics gathering function.

Table 13-6 lists the DBMS_STATS.REPORT_GATHER_*_STATS functions. For all functions, the input parameters are the same as for the corresponding GATHER_*_STATS procedure, with the following additional parameters: detail_level and format. Supported formats are XML, HTML, and TEXT. See Oracle Database PL/SQL Packages and Types Reference for complete syntax and semantics for the reporting mode functions.

Table 13-6 DBMS_STATS Reporting Mode Functions

Function	Description
`REPORT_GATHER_TABLE_STATS`	Runs `GATHER_TABLE_STATS` in reporting mode. The procedure does not collect statistics, but reports all objects that would be affected by invoking `GATHER_TABLE_STATS`.
`REPORT_GATHER_SCHEMA_STATS`	Runs `GATHER_SCHEMA_STATS` in reporting mode. The procedure does not actually collect statistics, but reports all objects that would be affected by invoking `GATHER_SCHEMA_STATS`.
`REPORT_GATHER_DICTIONARY_STATS`	Runs `GATHER_DICTIONARY_STATS` in reporting mode. The procedure does not actually collect statistics, but reports all objects that would be affected by invoking `GATHER_DICTIONARY_STATS`.
`REPORT_GATHER_DATABASE_STATS`	Runs `GATHER_DATABASE_STATS` in reporting mode. The procedure does not actually collect statistics, but reports all objects that would be affected by invoking `GATHER_DATABASE_STATS`.
`REPORT_GATHER_FIXED_OBJ_STATS`	Runs `GATHER_FIXED_OBJ_STATS` in reporting mode. The procedure does not actually collect statistics, but reports all objects that would be affected by invoking `GATHER_FIXED_OBJ_STATS`.
`REPORT_GATHER_AUTO_STATS`	Runs the automatic statistics gather job in reporting mode. The procedure does not actually collect statistics, but reports all objects that would be affected by running the job.

Assumptions

This tutorial assumes that you want to generate an HTML report of the objects that would be affected by running GATHER_SCHEMA_STATS on the oe schema.

To report on objects affected by running GATHER_SCHEMA_STATS:

Start SQL*Plus and connect to the database with administrator privileges.

Run the DBMS_STATS.REPORT_GATHER_SCHEMA_STATS function.

For example, run the following commands in SQL*Plus:

SET LINES 200 PAGES 0
SET LONG 100000
COLUMN REPORT FORMAT A200

VARIABLE my_report CLOB;
BEGIN
  :my_report :=DBMS_STATS.REPORT_GATHER_SCHEMA_STATS(
    ownname      => 'OE'       , 
    detail_level => 'TYPICAL'  ,
    format       => 'HTML'     );
END;
/

The following graphic shows a partial example report:

Description of the illustration report_gs_oe.png

See Also:

Oracle Database PL/SQL Packages and Types Reference to learn more about DBMS_STATS

Reporting on Past Statistics Gathering Operations

You can use DBMS_STATS functions to report on a specific statistics gathering operation or on operations that occurred during a specified time. This section shows the command-line interface. To learn about the Cloud Control interface, see "Graphical Interface for Optimizer Statistics Management".

Table 13-7 lists the functions. See Oracle Database PL/SQL Packages and Types Reference for complete syntax and semantics for the functions that report on statistics operations.

Table 13-7 DBMS_STATS Reporting Functions

Function	Description
`REPORT_STATS_OPERATIONS`	Generates a report of all statistics operations that occurred between two points in time. You can narrow the scope of the report to include only automatic statistics gathering runs. You can also provide a set of pluggable database (PDB) IDs so that the database reports only statistics operations from the specified PDBs.
`REPORT_SINGLE_STATS_OPERATION`	Generates a report of the specified operation. Optionally, you can specify a particular PDB ID in a container database (CDB).

Assumptions

This tutorial assumes that you want to generate HTML reports of the following:

All statistics gathering operations within the last day
The most recent statistics gathering operation

To report on all operations in the past day:

Start SQL*Plus and connect to the database with administrator privileges.

Run the DBMS_STATS.REPORT_STATS_OPERATIONS function.

For example, run the following commands:

SET LINES 200 PAGES 0
SET LONG 100000
COLUMN REPORT FORMAT A200

VARIABLE my_report CLOB;
BEGIN
  :my_report := DBMS_STATS.REPORT_STATS_OPERATIONS (
     since        => SYSDATE-1
,    until        => SYSDATE 
,    detail_level => 'TYPICAL' 
,    format       => 'HTML'      
);
END;
/

The following graphic shows a sample report:

Description of the illustration reporting_past_ops.gif

Run the DBMS_STATS.REPORT_SINGLE_STATS_OPERATION function for an individual operation.

For example, run the following program to generate a report of operation 848:
```
BEGIN
  :my_report :=DBMS_STATS.REPORT_SINGLE_STATS_OPERATION (
     OPID    => 848
,    FORMAT  => 'HTML'
);
END;
```

The following graphic shows a sample report:

Description of the illustration reporting_past_op.gif

See Also:

"Graphical Interface for Optimizer Statistics Management" to learn about the Cloud Control GUI for statistics management
Oracle Database PL/SQL Packages and Types Reference to learn more about DBMS_STATS

Managing SQL Plan Directives

As explained in "SQL Plan Directives", the database automatically manages SQL plan directives. If the directives are not used in 53 weeks, then the database automatically purges them.

You can use DBMS_SPD procedures and functions to manage directives manually.

Table 13-8 lists some of the more commonly used procedures and functions. See Oracle Database PL/SQL Packages and Types Reference for complete syntax and semantics for the DBMS_SPD package.

Table 13-8 DBMS_SPD Procedures

Procedure	Description
`FLUSH_SQL_PLAN_DIRECTIVE`	Forces the database to write directives from memory to persistent storage in the `SYSAUX` tablespace.
`DROP_SQL_PLAN_DIRECTIVE`	Drops a SQL plan directive.

Prerequisites

You must have the Administer SQL Management Object privilege to execute the DBMS_SPD APIs.

Assumptions

This tutorial assumes that you want to do the following:

Write all directives for the sh schema to persistent storage.
Delete all directives for the sh schema.

To write and then delete all sh schema plan directives:

Start SQL*Plus and connect to the database with administrator privileges.
Force the database to write the SQL plan directives to disk.

For example, execute the following DBMS_SPD program:
```
BEGIN 
  DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
END;
/
```
Query the data dictionary for information about existing directives in the sh schema.

Example 13-1 queries the data dictionary for information about the directive.
Delete the existing SQL plan directive for the sh schema.

The following PL/SQL program unit deletes the SQL plan directive with the ID 1484026771529551585:
```
BEGIN
  DBMS_SPD.DROP_SQL_PLAN_DIRECTIVE ( directive_id => 1484026771529551585 );
END;
/
```

See Also:

Oracle Database Reference to learn about DBA_SQL_PLAN_DIRECTIVES
Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE procedure

Example 13-1 Display Directives for sh Schema

SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, 
       o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
AND    o.OWNER IN ('SH')
ORDER BY 1,2,3,4,5;
 
DIR_ID              OWN OBJECT_NA COL_NAME   OBJECT  TYPE     STATE      REASON
------------------- --- --------- ---------- ------- -------- ---------- ------------
1484026771529551585  SH CUSTOMERS COUNTRY_ID  COLUMN DYNAMIC_ SUPERSEDED SINGLE TABLE  
                                                     SAMPLING            CARDINALITY 
                                                                         MISESTIMATE
1484026771529551585  SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_ SUPERSEDED SINGLE TABLE
                                  PROVINCE           SAMPLING            CARDINALITY 
                                                                         MISESTIMATE
1484026771529551585  SH CUSTOMERS              TABLE DYNAMIC_ SUPERSEDED SINGLE TABLE 
                                                     SAMPLING            CARDINALITY 
                                                                         MISESTIMATE