11g Pending and Published Statistics
- Posted by Gavin Soorma
- On September 18, 2009
- 0 Comments
Prior to Oracle 11g, the default behaviour was to make statistics available for the optimizer to use as soon as they were gathered.
While statistics are required to enable the optimizer to generate optimal execution plans, sometimes just by gathering fresh statistics tried and trusted execution plans can abruptly change and thereby adversely affect application performance.
In Oracle 11g we can now ‘defer’ the publication of statistics until they have been tried and tested and once we have confirmed that the execution plans are correct and optimal. Statistics remain in the pending state until they are published and the parameter optimizer_use_pending_statistics (default value FALSE) which when set at the session level will enables us to test the pending statistics independently of other database sessions.
Let us look at a test case using the SALES table in the SH schema.
If we see the query below, it appears that the data for the column CHANNEL_ID is skewed where majority of the rows have the value 3 and a very small majority have the value 9.
SQL> select channel_id,count(*) from sales
2 group by channel_id order by 2;
CHANNEL_ID COUNT(*)
———- ———-
9 2074
4 118416
2 258025
3 540328
But both the queries below are performing a full table scan of the SALES table when ideally it should be performing an index scan when the value 9 is used in the WHERE clause considering it accounts for a very small proportion of the rows in the SALES table.
SELECT
S.cust_id
,S.prod_id
,SUM(S.amount_sold)
FROM sh.sales S
WHERE
channel_id=9
GROUP BY S.cust_id, S.prod_id
ORDER BY S.cust_id, S.prod_id
/
SELECT
S.cust_id
,S.prod_id
,SUM(S.amount_sold)
FROM sh.sales S
WHERE
channel_id=3
GROUP BY S.cust_id, S.prod_id
ORDER BY S.cust_id, S.prod_id
/
As earlier mentioned, statistics for a table are published immediately by default, so we use the DBMS_STATS package to change this default behaviour.
SQL> exec dbms_stats.set_table_prefs('SH','SALES','PUBLISH','FALSE'); PL/SQL procedure successfully completed. SQL> select dbms_stats.get_prefs('PUBLISH', 'SH', 'SALES' ) FROM DUAL; DBMS_STATS.GET_PREFS('PUBLISH','SH','SALES') ---------------------------------------------------------------------------------------------- FALSE
Since we observed that the data in the table is skewed, we will gather histograms which may help the optimizer make more informed decisions when generating an execution plan.
Note that since we have disabled the publishing of statistics for the table SALES, even though we have analyzed the table, the LAST_ANALYZED column shows that the table has not been recently analyzed.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES',METHOD_OPT => 'FOR ALL COLUMNS SIZE AUTO'); PL/SQL procedure successfully completed. SQL> SELECT LAST_ANALYZED FROM DBA_TABLES WHERE TABLE_NAME='SALES'; LAST_ANAL --------- 18-AUG-2009
However, we we query the DBA_TAB_PENDING_STATS view, it shows that the SALES table which is partitioned has been analyzed today.
SQL> SELECT TABLE_NAME,PARTITION_NAME ,LAST_ANALYZED FROM DBA_TAB_PENDING_STATS; TABLE_NAME PARTITION_NAME LAST_ANAL ------------------------------ ------------------------------ --------- SALES 17-SEP-09 SALES SALES_Q3_1998 17-SEP-09 SALES SALES_Q3_1999 17-SEP-09 SALES SALES_Q3_2000 17-SEP-09 SALES SALES_Q3_2001 17-SEP-09 SALES SALES_Q3_2002 17-SEP-09 SALES SALES_Q3_2003 17-SEP-09 SALES SALES_Q4_1998 17-SEP-09 SALES SALES_Q4_1999 17-SEP-09 SALES SALES_Q4_2000 17-SEP-09 SALES SALES_Q4_2001 17-SEP-09 SALES SALES_Q4_2002 17-SEP-09 SALES SALES_Q4_2003 17-SEP-09 SALES SALES_1995 17-SEP-09 SALES SALES_1996 17-SEP-09 SALES SALES_H1_1997 17-SEP-09 SALES SALES_H2_1997 17-SEP-09 SALES SALES_Q1_1998 17-SEP-09 SALES SALES_Q1_1999 17-SEP-09 SALES SALES_Q1_2000 17-SEP-09 SALES SALES_Q1_2001 17-SEP-09 SALES SALES_Q1_2002 17-SEP-09 SALES SALES_Q1_2003 17-SEP-09 SALES SALES_Q2_1998 17-SEP-09 SALES SALES_Q2_1999 17-SEP-09 SALES SALES_Q2_2000 17-SEP-09 SALES SALES_Q2_2001 17-SEP-09 SALES SALES_Q2_2002 17-SEP-09 SALES SALES_Q2_2003 17-SEP-09
We would like to see if the query execution plan has changed or improved after the recent gathering of statistics. We can use the parameter optimizer_use_pending_statistics for this by setting the value to TRUE (default is FALSE)
SQL> alter session set optimizer_use_pending_statistics=TRUE;
Session altered.
We now find that the optimizer is performing an index scan when the value 9 is used as a predicate value for CHANNEL_ID while it performs a full table scan when the value 3 is used which is a much more optimal plan than the original one before gathering fresh statistics.
explain plan for SELECT S.cust_id ,S.prod_id ,SUM(S.amount_sold) FROM sh.sales S WHERE channel_id=3 GROUP BY S.cust_id, S.prod_id ORDER BY S.cust_id, S.prod_id / 2 3 4 5 6 7 8 9 10 11 12 Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------- Plan hash value: 3701591983 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4992 | 79872 | 2649 (4)| 00:00:32 | | | | 1 | SORT GROUP BY | | 4992 | 79872 | 2649 (4)| 00:00:32 | | | | 2 | PARTITION RANGE ALL| | 541K| 8461K| 2601 (2)| 00:00:32 | 1 | 28 | |* 3 | TABLE ACCESS FULL | SALES | 541K| 8461K| 2601 (2)| 00:00:32 | 1 | 28 | ---------------------------------------------------------------------------------------------- SQL> explain plan for SELECT S.cust_id ,S.prod_id ,SUM(S.amount_sold) FROM sh.sales S WHERE channel_id=9 GROUP BY S.cust_id, S.prod_id ORDER BY S.cust_id, S.prod_id / 2 3 4 5 6 7 8 9 10 11 12 Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------- Plan hash value: 3360497850 -------------------------------------------------------------------------------------------------------- - | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- - | 0 | SELECT STATEMENT | | 913 | 14608 | 315 (1)| 00:00:04 | | 1 | SORT GROUP BY | | 913 | 14608 | 315 (1)| 00:00:04 | | 2 | PARTITION RANGE ALL | | 1423 | 22768 | 314 (0)| 00:00:04 | | 3 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES | 1423 | 22768 | 314 (0)| 00:00:04 | | 4 | BITMAP CONVERSION TO ROWIDS | | | | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_CHANNEL_BIX | | | | | --------------------------------------------------------------------------------------------------------
Since we have now confirmed that the plan is acceptable and the new statistics can be used by the optimizer, we PUBLISH the statistics now for the SALES table. Note that the LAST_ANALYZED column is also updated and the DBA_TAB_PENDING_STATS shows that there are no more pending statistics which need to be published.
SQL> EXEC DBMS_STATS.PUBLISH_PENDING_STATS ('SH','SALES'); PL/SQL procedure successfully completed. SQL> SELECT LAST_ANALYZED FROM DBA_TABLES WHERE TABLE_NAME='SALES'; LAST_ANAL --------- 17-SEP-09 SELECT COUNT(*) FROM DBA_TAB_PENDING_STATS; COUNT(*) ---------- 0 SQL> alter session set optimizer_use_pending_statistics=FALSE; Session altered.
0 Comments