This chapter explains basic concepts relating to optimizer statistics.
This chapter includes the following topics:
Oracle Database optimizer statistics describe details about the database and its objects. The optimizer cost model relies on statistics collected about the objects involved in a query, and the database and host where the query runs. Statistics are critical to the optimizer's ability to pick the best execution plan for a SQL statement.
Optimizer statistics include the following:
Table statistics
Number of rows
Number of blocks
Average row length
Column statistics
Number of distinct values (NDV) in a column
Number of nulls in a column
Data distribution (histogram)
Extended statistics
Index statistics
Number of leaf blocks
Number of levels
Index clustering factor
System statistics
I/O performance and utilization
CPU performance and utilization
As shown in Figure 10-1, the database stores optimizer statistics for tables, columns, indexes, and the system in the data dictionary. You can access these statistics using data dictionary views.
Note:
The optimizer statistics are different from the performance statistics visible through V$
views.
The optimizer collects statistics on different types of database objects and characteristics of the database environment. This section contains the following topics:
In Oracle Database, table statistics include information about rows and blocks. The optimizer uses these statistics to determine the cost of table scans and table joins. DBMS_STATS
can gather statistics for both permanent and temporary tables.
The database tracks all relevant statistics about permanent tables. DBMS_STATS.GATHER_TABLE_STATS
commits before gathering statistics on permanent tables. For example, table statistics stored in DBA_TAB_STATISTICS
track the following:
Number of rows and average row length
The database uses the row count stored in DBA_TAB_STATISTICS
when determining cardinality.
Number of data blocks
The optimizer uses the number of data blocks with the DB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter to determine the base table access cost.
Example 10-1 Table Statistics
This example queries some table statistics for the sh.customers
table.
sys@PROD> SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, LAST_ANALYZED 2 FROM DBA_TAB_STATISTICS 3 WHERE OWNER='SH' 4 AND TABLE_NAME='CUSTOMERS'; NUM_ROWS AVG_ROW_LEN BLOCKS LAST_ANAL ---------- ----------- ---------- --------- 55500 181 1486 14-JUN-10
See Also:
Oracle Database Reference for a description of the DBA_TAB_STATISTICS
view and the DB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter
Column statistics track information about column values and data distribution. The optimizer uses these statistics to generate accurate cardinality estimates and make better decisions about index usage, join orders, join methods, and so on.
For example, index statistics in DBA_TAB_COL_STATISTICS
track the following:
Number of distinct values (NDV)
Number of nulls
High and low values
Histogram-related information (see "Histograms ")
The optimizer can use extended statistics, which are a special type of column statistics. These statistics are useful for informing the optimizer of logical relationships among columns.
See Also:
Oracle Database Reference for a description of the DBA_TAB_COL_STATISTICS
view
The index statistics include information about the number of index levels, the number of index blocks, and the relationship between the index and the data blocks. The optimizer uses these statistics to determine the cost of index scans.
Index statistics stored in the DBA_IND_STATISTICS
view track the following:
Levels
The BLEVEL
column shows the number of blocks required to go from the root block to a leaf block. A B-tree index has two types of blocks: branch blocks for searching and leaf blocks that store values. See Oracle Database Concepts for a conceptual overview of B-tree indexes.
Distinct keys
This columns tracks the number of distinct indexed values. If a unique constraint is defined, and if no NOT NULL
constraint is defined, then this value equals the number of non-null values.
Average number of leaf blocks for each distinct indexed key
Average number of data blocks pointed to by each distinct indexed key
See Also:
Oracle Database Reference for a description of the DBA_IND_STATISTICS
view
Example 10-2 Index Statistics
This example queries some index statistics for the cust_lname_ix
and customers_pk
indexes on the sh.customers
table (sample output included):
SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS AS "LEAFBLK", DISTINCT_KEYS AS "DIST_KEY", AVG_LEAF_BLOCKS_PER_KEY AS "LEAFBLK_PER_KEY", AVG_DATA_BLOCKS_PER_KEY AS "DATABLK_PER_KEY" FROM DBA_IND_STATISTICS WHERE OWNER = 'SH' AND INDEX_NAME IN ('CUST_LNAME_IX','CUSTOMERS_PK'); INDEX_NAME BLEVEL LEAFBLK DIST_KEY LEAFBLK_PER_KEY DATABLK_PER_KEY -------------- ------ ------- -------- --------------- --------------- CUSTOMERS_PK 1 115 55500 1 1 CUST_LNAME_IX 1 141 908 1 10
For a B-tree index, the index clustering factor measures the physical grouping of rows in relation to an index value, such as last name. The index clustering factor helps the optimizer decide whether an index scan or full table scan is more efficient for certain queries). A low clustering factor indicates an efficient index scan.
A clustering factor that is close to the number of blocks in a table indicates that the rows are physically ordered in the table blocks by the index key. If the database performs a full table scan, then the database tends to retrieve the rows as they are stored on disk sorted by the index key. A clustering factor that is close to the number of rows indicates that the rows are scattered randomly across the database blocks in relation to the index key. If the database performs a full table scan, then the database would not retrieve rows in any sorted order by this index key.
The clustering factor is a property of a specific index, not a table (see Oracle Database Concepts for an overview). If multiple indexes exist on a table, then the clustering factor for one index might be small while the factor for another index is large. An attempt to reorganize the table to improve the clustering factor for one index may degrade the clustering factor of the other index.
Example 10-3 Index Clustering Factor
This example shows how the optimizer uses the index clustering factor to determine whether using an index is more effective than a full table scan.
Start SQL*Plus and connect to a database as sh
, and then query the number of rows and blocks in the sh.customers
table (sample output included):
SELECT table_name, num_rows, blocks FROM user_tables WHERE table_name='CUSTOMERS'; TABLE_NAME NUM_ROWS BLOCKS ------------------------------ ---------- ---------- CUSTOMERS 55500 1486
Create an index on the customers.cust_last_name
column.
For example, execute the following statement:
CREATE INDEX CUSTOMERS_LAST_NAME_IDX ON customers(cust_last_name);
Query the index clustering factor of the newly created index.
The following query shows that the customers_last_name_idx
index has a high clustering factor because the clustering factor is significantly more than the number of blocks in the table:
SELECT index_name, blevel, leaf_blocks, clustering_factor
FROM user_indexes
WHERE table_name='CUSTOMERS'
AND index_name= 'CUSTOMERS_LAST_NAME_IDX';
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
CUSTOMERS_LAST_NAME_IDX 1 141 9859
Create a new copy of the customers
table, with rows ordered by cust_last_name
.
For example, execute the following statements:
DROP TABLE customers3 PURGE; CREATE TABLE customers3 AS SELECT * FROM customers ORDER BY cust_last_name;
Gather statistics on the customers3
table.
For example, execute the GATHER_TABLE_STATS
procedure as follows:
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'CUSTOMERS3');
Query the number of rows and blocks in the customers3
table .
For example, enter the following query (sample output included):
SELECT TABLE_NAME, NUM_ROWS, BLOCKS FROM USER_TABLES WHERE TABLE_NAME='CUSTOMERS3'; TABLE_NAME NUM_ROWS BLOCKS ------------------------------ ---------- ---------- CUSTOMERS3 55500 1485
Create an index on the cust_last_name
column of customers3
.
For example, execute the following statement:
CREATE INDEX CUSTOMERS3_LAST_NAME_IDX ON customers3(cust_last_name);
Query the index clustering factor of the customers3_last_name_idx
index.
The following query shows that the customers3_last_name_idx
index has a lower clustering factor:
SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR
FROM USER_INDEXES
WHERE TABLE_NAME = 'CUSTOMERS3'
AND INDEX_NAME = 'CUSTOMERS3_LAST_NAME_IDX';
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
CUSTOMERS3_LAST_NAME_IDX 1 141 1455
The table customers3
has the same data as the original customers
table, but the index on customers3
has a much lower clustering factor because the data in the table is ordered by the cust_last_name
. The clustering factor is now about 10 times the number of blocks instead of 70 times.
Query the customers
table.
For example, execute the following query (sample output included):
SELECT cust_first_name, cust_last_name FROM customers WHERE cust_last_name BETWEEN 'Puleo' AND 'Quinn'; CUST_FIRST_NAME CUST_LAST_NAME -------------------- ---------------------------------------- Vida Puleo Harriett Quinlan Madeleine Quinn Caresse Puleo
Display the cursor for the query.
For example, execute the following query (partial sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); ------------------------------------------------------------------------------- | Id | Operation | Name | Rows |Bytes|Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | | | 405 (100)| | |* 1| TABLE ACCESS STORAGE FULL| CUSTOMERS | 2335|35025| 405 (1)|00:00:01| -------------------------------------------------------------------------------
The preceding plan shows that the optimizer did not use the index on the original customers
tables.
Query the customers3
table.
For example, execute the following query (sample output included):
SELECT cust_first_name, cust_last_name FROM customers3 WHERE cust_last_name BETWEEN 'Puleo' AND 'Quinn'; CUST_FIRST_NAME CUST_LAST_NAME -------------------- ---------------------------------------- Vida Puleo Harriett Quinlan Madeleine Quinn Caresse Puleo
Display the cursor for the query.
For example, execute the following query (partial sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); --------------------------------------------------------------------------------------- |Id| Operation | Name |Rows|Bytes|Cost(%CPU)| Time| --------------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | | |69(100)| | | 1| TABLE ACCESS BY INDEX ROWID|CUSTOMERS3 |2335|35025|69(0) |00:00:01| |*2| INDEX RANGE SCAN |CUSTOMERS3_LAST_NAME_IDX|2335| |7(0) |00:00:01| ---------------------------------------------------------------------------------------
The result set is the same, but the optimizer chooses the index. The plan cost is much less than the cost of the plan used on the original customers
table.
Query customers
with a hint that forces the optimizer to use the index.
For example, execute the following query (partial sample output included):
SELECT /*+ index (Customers CUSTOMERS_LAST_NAME_IDX) */ cust_first_name, cust_last_name FROM customers WHERE cust_last_name BETWEEN 'Puleo' and 'Quinn'; CUST_FIRST_NAME CUST_LAST_NAME -------------------- ---------------------------------------- Vida Puleo Caresse Puleo Harriett Quinlan Madeleine Quinn
Display the cursor for the query.
For example, execute the following query (partial sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); ----------------------------------------------------------------------------------------- | Id | Operation | Name |Rows|Bytes|Cost(%CPU)| Time | ----------------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | | | 422(100) | | | 1| TABLE ACCESS BY INDEX ROWID|CUSTOMERS |335 |35025| 422(0) |00:00:01| |*2| INDEX RANGE SCAN |CUSTOMERS_LAST_NAME_IDX|2335| | 7(0) |00:00:01| -----------------------------------------------------------------------------------------
The preceding plan shows that the cost of using the index on customers
is higher than the cost of a full table scan. Thus, using an index does not necessarily improve performance. The index clustering factor is a measure of whether an index scan is more effective than a full table scan.
To illustrate how the index clustering factor can influence the cost of table access, consider the following scenario:
A table contains 9 rows that are stored in 3 data blocks.
The col1
column currently stores the values A
, B
, and C
.
A nonunique index named col1_idx
exists on col1
for this table.
Example 10-4 Collocated Data
Assume that the rows are stored in the data blocks as follows:
Block 1 Block 2 Block 3 ------- ------- ------- A A A B B B C C C
In this example, the index clustering factor for col1_idx
is low. The rows that have the same indexed column values for col1
are in the same data blocks in the table. Thus, the cost of using an index range scan to return all rows with value A
is low because only one block in the table must be read.
Example 10-5 Scattered Data
Assume that the same rows are scattered across the data blocks as follows:
Block 1 Block 2 Block 3 ------- ------- ------- A B C A C B B A C
In this example, the index clustering factor for col1_idx
is higher. The database must read all three blocks in the table to retrieve all rows with the value A
in col1
.
See Also:
Oracle Database Reference for a description of the DBA_INDEXES
view
A global temporary table is a special table that stores intermediate session-private data for a specific duration. The ON COMMIT
clause of CREATE GLOBAL TEMPORARY TABLE
indicates whether the table is transaction-specific (DELETE ROWS
) or session-specific (PRESERVE ROWS
). Thus, a temporary table holds intermediate result sets for the duration of either a transaction or a session.
When you create a global temporary table, you create a definition that is visible to all sessions. No physical storage is allocated. When a session first puts data into the table, the database allocates storage space. The data in the temporary table is only visible to the current session.
In releases before Oracle Database 12c, the database did not maintain statistics for global temporary tables and non-global temporary tables differently. The database maintained one version of the statistics shared by all sessions, even though data in different sessions could differ.
Starting in Oracle Database 12c, you can set the table-level preference GLOBAL_TEMP_TABLE_STATS
to make statistics on a global temporary table shared or session-specific. When set to session-specific, you can gather statistics for a global temporary table in one session, and then use the statistics for this session only. Meanwhile, users can continue to maintain a shared version of the statistics. During optimization, the optimizer first checks whether a global temporary table has session-specific statistics. If yes, the optimizer uses them. Otherwise, the optimizer uses shared statistics if they exist.
Session-specific statistics have the following characteristics:
Dictionary views that track statistics show both the shared statistics and the session-specific statistics in the current session.
The views are DBA_TAB_STATISTICS
, DBA_IND_STATISTICS
, DBA_TAB_HISTOGRAMS
, and DBA_TAB_COL_STATISTICS
(each view has a corresponding USER_
and ALL_
version). The SCOPE
column shows whether statistics are session-specific or shared.
Other sessions do not share the cursor using the session-specific statistics.
Different sessions can share the cursor using shared statistics, as in releases earlier than Oracle Database 12c. The same session can share the cursor using session-specific statistics.
Pending statistics are not supported for session-specific statistics.
When the GLOBAL_TEMP_TABLE_STATS
preference is set to SESSION
, by default GATHER_TABLE_STATS
immediately invalidates previous cursors compiled in the same session. However, this procedure does not invalidate cursors compiled in other sessions.
DBMS_STATS
commits changes to session-specific global temporary tables, but not to transaction-specific global temporary tables. Before Oracle Database 12c, running DBMS_STATS.GATHER_TABLE_STATS
on a transaction-specific temporary table (ON COMMIT DELETE ROWS
) would delete all rows, making the statistics show the table as empty. Starting in Oracle Database 12c, the following procedures do not commit for transaction-specific temporary tables, so that rows in these tables are not deleted:
GATHER_TABLE_STATS
DELETE_TABLE_STATS
DELETE_COLUMN_STATS
DELETE_INDEX_STATS
SET_TABLE_STATS
SET_COLUMN_STATS
SET_INDEX_STATS
GET_TABLE_STATS
GET_COLUMN_STATS
GET_INDEX_STATS
The preceding program units observe the GLOBAL_TEMP_TABLE_STATS
preference. For example, if the table preference is set to SESSION
, then SET_TABLE_STATS
sets the session statistics, and GATHER_TABLE_STATS
preserves all rows in a transaction-specific temporary table. If the table preference is set to SHARED
, then SET_TABLE_STATS
sets the shared statistics, and GATHER_TABLE_STATS
deletes all rows from a transaction-specific temporary table.
See Also:
Oracle Database Concepts to learn about global temporary tables
Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.GATHER_TABLE_STATS
procedure
The system statistics describe hardware characteristics such as I/O and CPU performance and utilization. System statistics enable the query optimizer to more accurately estimate I/O and CPU costs when choosing execution plans.
The database does not invalidate previously parsed SQL statements when updating system statistics. The database parses all new SQL statements using new statistics.
See Also:
The extensible optimizer enables authors of user-defined functions and indexes to create statistics collection, selectivity, and cost functions for the optimizer to use when choosing a execution plan. The optimizer cost model is extended to integrate information supplied by the user to assess CPU and the I/O cost.
Statistics types act as interfaces for user-defined functions that influence the choice of execution plan by the optimizer. However, to use a statistics type, the optimizer requires a mechanism to bind the type to a database object such as a column, standalone function, object type, index, indextype, or package. The SQL statement ASSOCIATE STATISTICS
creates this association.
Functions for user-defined statistics are relevant for columns that use both standard SQL data types and object types, and for domain indexes. When you associate a statistics type with a column or domain index, the database calls the statistics collection method in the statistics type whenever DBMS_STATS
gathers statistics for database objects.
See Also:
Oracle Database Data Cartridge Developer's Guide to learn about the extensible optimizer and user-defined statistics
Oracle Database provides several mechanisms to gather statistics. This section contains the following topics:
The DBMS_STATS
PL/SQL package collects and manages optimizer statistics. This package enables you to control what and how statistics are collected, including the degree of parallelism for statistics collection, sampling methods, granularity of statistics collection in partitioned tables, and so on.
Note:
Do not use the COMPUTE
and ESTIMATE
clauses of the ANALYZE
statement to collect optimizer statistics. These clauses have been deprecated. Instead, use DBMS_STATS
.
Statistics gathered with the DBMS_STATS
package are required for the creation of accurate execution plans. For example, table statistics gathered by DBMS_STATS
include the number of rows, number of blocks, and average row length.
By default, Oracle Database uses automatic optimizer statistics collection. In this case, the database automatically runs DBMS_STATS
to collect optimizer statistics for all schema objects for which statistics are missing or stale. The process eliminates many manual tasks associated with managing the optimizer, and significantly reduces the risks of generating suboptimal execution plans because of missing or stale statistics. You can also update and manage optimizer statistics by manually executing DBMS_STATS
.
See Also:
Oracle Database Administrator's Guide to learn more about automated maintenance tasks
Oracle Database PL/SQL Packages and Types Reference to learn about DBMS_STATS
By default, when optimizer statistics are missing, stale, or insufficient, the database automatically gathers dynamic statistics during a parse. The database uses recursive SQL to scan a small random sample of table blocks.
Note:
Dynamic statistics augment statistics rather than providing an alternative to them.
Dynamic statistics can supplement statistics such as table and index block counts, table and join cardinalities (estimated number of rows), join column statistics, and GROUP BY
statistics. This information helps the optimizer improve plans by making better estimates for predicate selectivity.
Dynamic statistics are beneficial in the following situations:
An execution plan is suboptimal because of complex predicates.
The sampling time is a small fraction of total execution time for the query.
The query is executed many times so that the sampling time is amortized.
Starting in Oracle Database 12c, the database can gather table statistics automatically during the following types of bulk load operations:
CREATE TABLE AS SELECT
INSERT INTO ... SELECT
into an empty table using a direct path insert
Note:
By default, a parallel insert uses a direct path insert. You can force a direct path insert by using the /*+APPEND */
hint.
This section contains the following topics:
See Also:
Oracle Database Data Warehousing Guide to learn more about bulk loads
Data warehouses typically load large amounts of data into the database. For example, a sales data warehouse might load sales data nightly.
In releases earlier than Oracle Database 12c, to avoid the possibility of a suboptimal plan caused by stale statistics, you needed to gather statistics manually after a bulk load. The ability to gather statistics automatically during bulk loads has the following benefits:
Improved performance
Gathering statistics during the load avoids an additional table scan to gather table statistics.
Improved manageability
No user intervention is required to gather statistics after a bulk load.
When inserting rows into an empty partitioned table, the database gathers global statistics during the insert. For example, if you run INSERT INTO sales SELECT
, and if sales
is an empty partitioned table, then the database gathers global statistics for sales
, but does not gather partition-level statistics.
If you insert rows into a empty partitioned table using extended syntax, and if the specified partition or subpartition is empty, then the database gathers the statistics on the specified partition or subpartition during the insert. No global level statistics are gathered. For example, if you run INSERT INTO sales PARTITION (sales_q4_2000) SELECT
, and if partition sales_q4_2000
is empty before the insert (other partitions need not be empty), then the database gathers statistics during the insert. Moreover, if the INCREMENTAL
preference is enabled for sales
, then the database also gathers synopses for sales_q4_2000
. Statistics are immediately available after the INSERT
statement. However, if you roll back the transaction, then the database automatically deletes statistics gathered during the bulk load.
See Also:
While gathering online statistics, the database does not gather index statistics or create histograms. If these statistics are required, then Oracle recommends running DBMS_STATS.GATHER_TABLE_STATS
with the options
parameter set to GATHER AUTO
after the bulk load. For example, the following command gathers statistics for the bulk-loaded sh_ctas
table:
EXEC DBMS_STATS.GATHER_TABLE_STATS( user, 'SH_CTAS', options => 'GATHER AUTO' );
The preceding example only gathers missing or stale statistics. The database does not gather table and basic column statistics collected during the bulk load.
Note:
You can set the table preference options
to GATHER AUTO
on the tables that you plan to bulk load. In this way, you need not explicitly set the options
parameter when running GATHER_TABLE_STATS
.
See Also:
Currently, statistics gathering does not occur automatically for bulk loads when any of the following conditions apply to the target table, partition, or subpartition:
It is not empty, and you perform an INSERT INTO ... SELECT
.
In this case, an OPTIMIZER STATISTICS GATHERING
row source appears in the plan, but this row source is only a pass-through. The database does not actually gather optimizer statistics.
Note:
The DBA_TAB_COL_STATISTICS.NOTES
column is set to STATS_ON_LOAD
by a bulk load into an empty table. However, subsequent bulk loads into the non-empty table do not reset the NOTES
column. One method for determining whether the database gathered statistics is to execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO
, and then query USER_TAB_MODIFICATIONS.INSERTS
. If the query returns a row indicating the number of rows loaded, then the statistics were not gathered automatically during the most recent bulk load.
It is in an Oracle-owned schema such as SYS
.
It is a nested table.
It is an index-organized table (IOT).
It is an external table.
It is a global temporary table defined as ON COMMIT DELETE ROWS
.
It has virtual columns.
It has a PUBLISH
preference set to FALSE
.
/? Bug 16922369 ?/
Its statistics are locked.
It is partitioned, INCREMENTAL
is set to true
, and extended syntax is not used.
For example, assume that you execute DBMS_STATS.SET_TABLE_PREFS(null, 'sales', incremental', 'true')
. In this case, the database does not gather statistics for INSERT INTO sales SELECT
, even when sales
is empty. However, the database does gather statistics automatically for INSERT INTO sales PARTITION
(sales_q4_2000) SELECT
.
It is loaded using a multitable insert statement.
See Also:
By default, the database gathers statistics during bulk loads. You can disable the feature at the statement level by using the NO_GATHER_OPTIMIZER_STATISTICS
hint, and enable the feature at the statement level by using the GATHER_OPTIMIZER_STATISTICS
hint. For example, the following statement disables online statistics gathering for bulk loads:
CREATE TABLE employees2 AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS */* FROM employees
See Also:
Oracle Database SQL Language Reference to learn about the GATHER_OPTIMIZER_STATISTICS
and NO_GATHER_OPTIMIZER_STATISTICS
hints
The database collects optimizer statistics at various times and from various sources.
This section contains the following topics:
The database uses the following sources:
DBMS_STATS
execution, automatic or manual
This PL/SQL package is the primary means of gathering optimizer statistics. See Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.GATHER_TABLE_STATS
procedure.
SQL compilation
During SQL compilation, the database can augment the statistics previously gathered by DBMS_STATS
. In this stage, the database runs additional queries to obtain more accurate information on how many rows in the tables satisfy the WHERE
clause predicates in the SQL statement (see "When the Database Samples Data").
SQL execution
During execution, the database can further augment previously gathered statistics. In this stage, Oracle Database collects the number of rows produced by every row source during the execution of a SQL statement. At the end of execution, the optimizer determines whether the estimated number of rows is inaccurate enough to warrant reparsing at the next statement execution. If the cursor is marked for reparsing, then the optimizer uses actual row counts from the previous execution instead of estimates.
SQL profiles
A SQL profile is a collection of auxiliary statistics on a query. The profile stores these supplemental statistics in the data dictionary. The optimizer uses SQL profiles during optimization to determine the most optimal plan (see "About SQL Profiles").
The database stores optimizer statistics in the data dictionary and updates or replaces them as needed. You can query statistics in data dictionary views.
A SQL plan directive is additional information and instructions that the optimizer can use to generate a more optimal plan. For example, a SQL plan directive can instruct the optimizer to record a missing extension.
During SQL execution, if a cardinality misestimate occurs, then the database creates SQL plan directives. During SQL compilation, the optimizer examines the query corresponding to the directive to determine whether missing extensions or histograms exist (see "Managing Extended Statistics"). The optimizer records any missing extensions. Subsequent DBMS_STATS
calls collect statistics for the extensions.
The optimizer uses dynamic statistics whenever it does not have sufficient statistics corresponding to the directive. For example, the optimizer gathers dynamic statistics until the creation of column group statistics, and also after this point when misestimates occur. Currently, the optimizer monitors only column groups. The optimizer does not create an extension on expressions.
SQL plan directives are not tied to a specific SQL statement or SQL ID. The optimizer can use directives for statements that are nearly identical because directives are defined on a query expression. For example, directives can help the optimizer with queries that use similar patterns, such as queries that are identical except for a select list item.
The database automatically manages SQL plan directives. The database initially creates directives in the shared pool. The database periodically writes the directives to the SYSAUX
tablespace. You can manage directives with the APIs available in the DBMS_SPD
package.
See Also:
This example shows how the database automatically creates and uses SQL plan directives for SQL statements.
Assumptions
You plan to run queries against the sh
schema, and you have privileges on this schema and on data dictionary and V$
views.
To see how the database uses a SQL plan directive:
Query the sh.customers
table.
SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
The gather_plan_statistics
hint shows the actual number of rows returned from each operation in the plan. Thus, you can compare the optimizer estimates with the actual number of rows returned.
Query the plan for the preceding query.
The following example shows the execution plan (sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID b74nw722wjvy3, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ * from customers where CUST_STATE_PROVINCE='CA' and country_id='US' Plan hash value: 1683234692 -------------------------------------------------------------------------------------------- | Id| Operation | Name | Starts |E-Rows| A-Rows | A-Time |Buffers | Reads | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 17 | 14 | |*1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 8 | 29 |00:00:00.01 | 17 | 14 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
The actual number of rows (A-Rows
) returned by each operation in the plan varies greatly from the estimates (E-Rows
). This statement is a candidate for automatic reoptimization (see "Automatic Reoptimization").
Check whether the customers
query can be reoptimized.
The following statement queries the V$SQL.IS_REOPTIMIZABLE
value (sample output included):
SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE FROM V$SQL WHERE SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%'; SQL_ID CHILD_NUMBER SQL_TEXT I ------------- ------------ ----------- - b74nw722wjvy3 0 select /*+g Y ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US'
The IS_REOPTIMIZABLE
column is marked Y
, so the database will perform a hard parse of the customers
query on the next execution. The optimizer uses the execution statistics from this initial execution to determine the plan. The database persists the information learned from reoptimization as a SQL plan directive.
Display the directives for the sh
schema.
The following example uses DBMS_SPD
to write the SQL plan directives to disk, and then shows the directives for the sh
schema only:
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE; SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER AS "OWN", o.OBJECT_NAME AS "OBJECT", o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID AND o.OWNER IN ('SH') ORDER BY 1,2,3,4,5; DIR_ID OWN OBJECT COL_NAME OBJECT TYPE STATE REASON ------------------- --- --------- ----------- ------ ---------------- ------ ------------ 1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE PROVINCE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE
Initially, the database stores SQL plan directives in memory, and then writes them to disk every 15 minutes. Thus, the preceding example calls DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE
to force the database to write the directives to the SYSAUX
tablespace.
Monitor directives using the views DBA_SQL_PLAN_DIRECTIVES
and DBA_SQL_PLAN_DIR_OBJECTS
. Three entries appear in the views, one for the customers
table itself, and one for each of the correlated columns. Because the customers
query has the IS_REOPTIMIZABLE
value of Y
, if you reexecute the statement, then the database will hard parse it again, and then generate a plan based on the previous execution statistics.
Query the customers
table again.
For example, enter the following statement:
SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
Query the plan in the cursor.
The following example shows the execution plan (sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID b74nw722wjvy3, child number 1
-------------------------------------
select /*+gather_plan_statistics*/ * from customers where
CUST_STATE_PROVINCE='CA' and country_id='US'
Plan hash value: 1683234692
---------------------------------------------------------------------------
|Id | Operation |Name |Start|E-Rows|A-Rows| A-Time |Buffers|
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | | 1| | 29|00:00:00.01| 17|
|* 1| TABLE ACCESS FULL|CUSTOMERS| 1| 29| 29|00:00:00.01| 17|
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
Note
-----
- cardinality feedback used for this statement
The Note
section indicates that the database used reoptimization for this statement. The estimated number of rows (E-Rows
) is now correct. The SQL plan directive has not been used yet.
Query the cursors for the customers
query.
For example, run the following query (sample output included):
SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE FROM V$SQL WHERE SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%'; SQL_ID CHILD_NUMBER SQL_TEXT I ------------- ------------ ----------- - b74nw722wjvy3 0 select /*+g Y ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US' b74nw722wjvy3 1 select /*+g N ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US'
A new plan exists for the customers
query, and also a new child cursor.
Confirm that a SQL plan directive exists and is usable for other statements.
For example, run the following query, which is similar but not identical to the original customers
query (the state is MA
instead of CA
):
SELECT /*+gather_plan_statistics*/ CUST_EMAIL FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='MA' AND COUNTRY_ID='US';
Query the plan in the cursor.
The following statement queries the cursor (sample output included).:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 3tk6hj3nkcs2u, child number 0 ------------------------------------- Select /*+gather_plan_statistics*/ cust_email From customers Where cust_state_province='MA' And country_id='US' Plan hash value: 1683234692 --------------------------------------------------------------------------- |Id | Operation | Name |Starts|E-Rows|A-Rows| A-Time |Buffers| --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01| 16 | |*1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 2 | 2 |00:00:00.01| 16 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='MA' AND "COUNTRY_ID"='US')) Note ----- - dynamic sampling used for this statement (level=2) - 1 Sql Plan Directive used for this statement
The Note
section of the plan shows that the optimizer used the SQL directive for this statement, and also used dynamic statistics.
See Also:
Oracle Database Reference to learn about DBA_SQL_PLAN_DIRECTIVES
, V$SQL
, and other database views
Oracle Database Reference to learn about DBMS_SPD
This example is a continuation of "How the Optimizer Uses SQL Plan Directives: Example". The example shows how the database uses a SQL plan directive until the optimizer verifies that an extension exists and the statistics are applicable. At this point, the directive changes its status to SUPERSEDED
. Subsequent compilations use the statistics instead of the directive.
Assumptions
This example assumes you have already followed the steps in "How the Optimizer Uses SQL Plan Directives: Example".
To see how the optimizer uses an extension and SQL plan directive:
Gather statistics for the sh.customers
table.
For example, execute the following PL/SQL program:
BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS'); END; /
Check whether an extension exists on the customers
table.
For example, execute the following query (sample output included):
SELECT TABLE_NAME, EXTENSION_NAME, EXTENSION FROM DBA_STAT_EXTENSIONS WHERE OWNER='SH' AND TABLE_NAME='CUSTOMERS'; TABLE_NAM EXTENSION_NAME EXTENSION --------- ------------------------------ ----------------------- CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE", "COUNTRY_ID")
The preceding output indicates that a column group extension exists on the cust_state_province
and country_id
columns.
Query the state of the SQL plan directive.
Example 10-6 queries the data dictionary for information about the directive.
Although column group statistics exist, the directive has a state of USABLE
because the database has not yet recompiled the statement. During the next compilation, the optimizer verifies that the statistics are applicable. If they are applicable, then the status of the directive changes to SUPERSEDED
. Subsequent compilations use the statistics instead of the directive.
Query the sh.customers
table.
SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
Query the plan in the cursor.
Example 10-7 shows the execution plan (sample output included).
The Note
section shows that the optimizer used the directive and not the extended statistics. During the compilation, the database verified the extended statistics.
Query the state of the SQL plan directive.
Example 10-8 queries the data dictionary for information about the directive.
The state of the directive, which has changed to SUPERSEDED
, indicates that the corresponding column or groups have an extension or histogram, or that another SQL plan directive exists that can be used for the directive.
Query the sh.customers
table again, using a slightly different form of the statement.
For example, run the following query:
SELECT /*+gather_plan_statistics*/ /* force reparse */ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
If the cursor is in the shared SQL area, then the database typically shares the cursor. To force a reparse, this step changes the SQL text slightly by adding a comment.
Query the plan in the cursor.
Example 10-9 shows the execution plan (sample output included).
The absence of a Note
shows that the optimizer used the extended statistics instead of the SQL plan directive. If the directive is not used for 53 weeks, then the database automatically purges it.
See Also:
Oracle Database Reference to learn about DBA_SQL_PLAN_DIRECTIVES
, V$SQL
, and other database views
Oracle Database Reference to learn about DBMS_SPD
Example 10-6 Display Directives for sh Schema
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE; SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID AND o.OWNER IN ('SH') ORDER BY 1,2,3,4,5; DIR_ID OWN OBJECT_NA COL_NAME OBJECT TYPE STATE REASON ------------------- --- --------- ---------- ------- ---------------- ------ ------------ 1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE PROVINCE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE
Example 10-7 Execution Plan
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID b74nw722wjvy3, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ * from customers where
CUST_STATE_PROVINCE='CA' and country_id='US'
Plan hash value: 1683234692
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 16 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 29 | 29 |00:00:00.01 | 16 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
Note
-----
- dynamic sampling used for this statement (level=2)
- 1 Sql Plan Directive used for this statement
Example 10-8 Display Directives for sh Schema
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE; SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID AND o.OWNER IN ('SH') ORDER BY 1,2,3,4,5; DIR_ID OWN OBJECT_NA COL_NAME OBJECT TYPE STATE REASON ------------------- --- --------- ---------- ------ -------- --------- ------------ 1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_ SUPERSEDED SINGLE TABLE SAMPLING CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_ SUPERSEDED SINGLE TABLE PROVINCE SAMPLING CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_ SUPERSEDED SINGLE TABLE SAMPLING CARDINALITY MISESTIMATE
Example 10-9 Execution Plan
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID b74nw722wjvy3, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ * from customers where CUST_STATE_PROVINCE='CA' and country_id='US' Plan hash value: 1683234692 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 17 | |* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 29 | 29 |00:00:00.01 | 17 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US')) 19 rows selected.
In releases earlier than Oracle Database 12c, the database always gathered dynamic statistics (formerly called dynamic sampling) during optimization, and only when a table in the query had no statistics. Starting in Oracle Database 12c, the optimizer automatically decides whether dynamic statistics are useful and which statistics level to use for all SQL statements.
The primary factor in the decision to use dynamic statistics is whether available statistics are sufficient to generate an optimal plan. If statistics are insufficient, then the optimizer uses dynamic statistics.
Automatic dynamic statistics are enabled when any of the following conditions is true:
The OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter uses its default value, which means that it is not explicitly set.
The dynamic statistics level is set to 11
either through the OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter or a SQL hint (see "Controlling Dynamic Statistics").
In general, the optimizer uses default statistics rather than dynamic statistics to compute statistics needed during optimizations on tables, indexes, and columns. The optimizer decides whether to use dynamic statistics based on several factors. For example, the database uses automatic dynamic statistics in the following situations:
The SQL statement uses parallel execution.
A SQL plan directive exists.
The SQL statement is known to the database, which means that it was captured in SQL Plan Management or Automatic Workload Repository, or is currently in the shared SQL area.
Figure 10-2 illustrates the process of gathering dynamic statistics.
As shown in Figure 10-2, the optimizer automatically gathers dynamic statistics in the following cases:
Missing statistics
When tables in a query have no statistics, the optimizer gathers basic statistics on these tables before optimization. Statistics can be missing because the application creates new objects without a follow-up call to DBMS_STATS
to gather statistics, or because statistics were locked on an object before statistics were gathered.
In this case, the statistics are not as high-quality or as complete as the statistics gathered using the DBMS_STATS
package. This trade-off is made to limit the impact on the compile time of the statement.
Stale statistics
Statistics gathered by DBMS_STATS
can become out-of-date. Typically, statistics are stale when 10% or more of the rows in the table have changed since the last time statistics were gathered.
For an example of the problem posed by stale statistics, consider a sales
table that includes the sales date. After an application inserts new rows, the maximum statistics on the sales date column becomes stale because new rows have a higher sales date than the maximum value seen during the last statistics gathering. For any query that fetches the most recently added sales data, the optimizer assumes that table access will return very few or no rows, which leads to the selection of a suboptimal access path to the sales table (for example, the index on the sales date column), a suboptimal join method (typically a cartesian product), or an inefficient join order. This is commonly known as the out-of-range condition: the value specified in the predicate on the sales date column is outside the column statistics value domain.
Insufficient statistics
Statistics can be insufficient whenever the optimizer estimates the selectivity of predicates (filter or join) or the GROUP BY
clause without taking into account correlation between columns, skew in the column data distribution, statistics on expressions, and so on.
Extended statistics help the optimizer obtain accurate quality cardinality estimates for complex predicate expressions (see "About Statistics on Column Groups"). The optimizer can use dynamic statistics to compensate for the lack of extended statistics or when it cannot use extended statistics, for example, for non-equality predicates.
Note:
The database does not use dynamic statistics for queries that contain the AS OF
clause.
See Also:
Oracle Database Reference to learn about the OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter
At the beginning of optimization, when deciding whether a table is a candidate for dynamic statistics, the optimizer checks for the existence of persistent SQL plan directives on the table (see Figure 10-2). For each directive, the optimizer registers a statistics expression that the optimizer computes when it must determine the selectivity of a predicate involving the table.
When sampling is necessary, the database must determine the sample size (see Figure 10-2). Starting in Oracle Database 12c, if the OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter is not explicitly set to a value other than 11
, then the optimizer automatically decides whether to use dynamic statistics and which level to use.
In Figure 10-2, the database issues a recursive SQL statement to scan a small random sample of the table blocks. The database applies the relevant single-table predicates and joins to estimate predicate selectivities.
The database persists the results of dynamic statistics as sharable statistics. The database can share the results during the SQL compilation of one query with recompilations of the same query. The database can also reuse the results for queries that have the same patterns. If no rows have been inserted, deleted, or updated in the table being sampled, then the use of dynamic statistics is repeatable.
See Also:
"Controlling Dynamic Statistics" to learn how to set the dynamic statistics level
Oracle Database Reference for details about the OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter