News
Perth, Australia
+ (61) 417713124
prosolutions@gavinsoorma.com

11g Pending and Published Statistics

  • Posted by Gavin Soorma
  • On September 18, 2009
  • 0 Comments

Prior to Oracle 11g, the default behaviour was to make statistics available for the optimizer to use as soon as they were gathered.

While statistics are required to enable the optimizer to generate optimal execution plans, sometimes just by gathering fresh statistics tried and trusted execution plans can abruptly change and thereby adversely affect application performance.

In Oracle 11g we can now ‘defer’ the publication of statistics until they have been tried and tested and once we have confirmed that the execution plans are correct and optimal. Statistics remain in the pending state until they are published and the parameter optimizer_use_pending_statistics (default value FALSE) which when set at the session level will enables us to test the pending statistics independently of other database sessions.

Let us look at a test case using the SALES table in the SH schema.

If we see the query below, it appears that the data for the column CHANNEL_ID is skewed where majority of the rows have the value 3 and a very small majority have the value 9.

SQL> select channel_id,count(*) from sales
2 group by channel_id order by 2;

CHANNEL_ID COUNT(*)
———- ———-
9 2074
4 118416
2 258025
3 540328

But both the queries below are performing a full table scan of the SALES table when ideally it should be performing an index scan when the value 9 is used in the WHERE clause considering it accounts for a very small proportion of the rows in the SALES table.

SELECT
S.cust_id
,S.prod_id
,SUM(S.amount_sold)
FROM sh.sales S
WHERE
channel_id=9
GROUP BY S.cust_id, S.prod_id
ORDER BY S.cust_id, S.prod_id
/

SELECT
S.cust_id
,S.prod_id
,SUM(S.amount_sold)
FROM sh.sales S
WHERE
channel_id=3
GROUP BY S.cust_id, S.prod_id
ORDER BY S.cust_id, S.prod_id
/

As earlier mentioned, statistics for a table are published immediately by default, so we use the DBMS_STATS package to change this default behaviour.

SQL> exec dbms_stats.set_table_prefs('SH','SALES','PUBLISH','FALSE');

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_prefs('PUBLISH', 'SH', 'SALES' ) FROM DUAL;

DBMS_STATS.GET_PREFS('PUBLISH','SH','SALES')
----------------------------------------------------------------------------------------------
FALSE

Since we observed that the data in the table is skewed, we will gather histograms which may help the optimizer make more informed decisions when generating an execution plan.

Note that since we have disabled the publishing of statistics for the table SALES, even though we have analyzed the table, the LAST_ANALYZED column shows that the table has not been recently analyzed.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES',METHOD_OPT => 'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.


SQL> SELECT LAST_ANALYZED FROM DBA_TABLES WHERE TABLE_NAME='SALES';

LAST_ANAL
---------
18-AUG-2009

However, we we query the DBA_TAB_PENDING_STATS view, it shows that the SALES table which is partitioned has been analyzed today.

SQL> SELECT TABLE_NAME,PARTITION_NAME ,LAST_ANALYZED  FROM DBA_TAB_PENDING_STATS;

TABLE_NAME                     PARTITION_NAME                 LAST_ANAL
------------------------------ ------------------------------ ---------
SALES                                                         17-SEP-09
SALES                          SALES_Q3_1998                  17-SEP-09
SALES                          SALES_Q3_1999                  17-SEP-09
SALES                          SALES_Q3_2000                  17-SEP-09
SALES                          SALES_Q3_2001                  17-SEP-09
SALES                          SALES_Q3_2002                  17-SEP-09
SALES                          SALES_Q3_2003                  17-SEP-09
SALES                          SALES_Q4_1998                  17-SEP-09
SALES                          SALES_Q4_1999                  17-SEP-09
SALES                          SALES_Q4_2000                  17-SEP-09
SALES                          SALES_Q4_2001                  17-SEP-09
SALES                          SALES_Q4_2002                  17-SEP-09
SALES                          SALES_Q4_2003                  17-SEP-09
SALES                          SALES_1995                     17-SEP-09
SALES                          SALES_1996                     17-SEP-09
SALES                          SALES_H1_1997                  17-SEP-09
SALES                          SALES_H2_1997                  17-SEP-09
SALES                          SALES_Q1_1998                  17-SEP-09
SALES                          SALES_Q1_1999                  17-SEP-09
SALES                          SALES_Q1_2000                  17-SEP-09
SALES                          SALES_Q1_2001                  17-SEP-09
SALES                          SALES_Q1_2002                  17-SEP-09
SALES                          SALES_Q1_2003                  17-SEP-09
SALES                          SALES_Q2_1998                  17-SEP-09
SALES                          SALES_Q2_1999                  17-SEP-09
SALES                          SALES_Q2_2000                  17-SEP-09
SALES                          SALES_Q2_2001                  17-SEP-09
SALES                          SALES_Q2_2002                  17-SEP-09
SALES                          SALES_Q2_2003                  17-SEP-09

We would like to see if the query execution plan has changed or improved after the recent gathering of statistics. We can use the parameter optimizer_use_pending_statistics for this by setting the value to TRUE (default is FALSE)

SQL> alter session set optimizer_use_pending_statistics=TRUE;

Session altered.

We now find that the optimizer is performing an index scan when the value 9 is used as a predicate value for CHANNEL_ID while it performs a full table scan when the value 3 is used which is a much more optimal plan than the original one before gathering fresh statistics.

explain plan
for
SELECT
     S.cust_id
    ,S.prod_id
    ,SUM(S.amount_sold)
  FROM sh.sales S
 WHERE
channel_id=3
 GROUP BY S.cust_id, S.prod_id
 ORDER BY S.cust_id, S.prod_id
/  2    3    4    5    6    7    8    9   10   11   12

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 3701591983

----------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |  4992 | 79872 |  2649   (4)| 00:00:32 |       |       |
|   1 |  SORT GROUP BY       |       |  4992 | 79872 |  2649   (4)| 00:00:32 |       |       |
|   2 |   PARTITION RANGE ALL|       |   541K|  8461K|  2601   (2)| 00:00:32 |     1 |    28 |
|*  3 |    TABLE ACCESS FULL | SALES |   541K|  8461K|  2601   (2)| 00:00:32 |     1 |    28 |
----------------------------------------------------------------------------------------------

SQL> explain plan
for
SELECT
     S.cust_id
    ,S.prod_id
    ,SUM(S.amount_sold)
  FROM sh.sales S
 WHERE
channel_id=9
 GROUP BY S.cust_id, S.prod_id
 ORDER BY S.cust_id, S.prod_id
/
  2    3    4    5    6    7    8    9   10   11   12
Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
Plan hash value: 3360497850

--------------------------------------------------------------------------------------------------------
-

| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time
|

-------------------------------------------------------------------------------------------------------
-

|   0 | SELECT STATEMENT                    |                   |   913 | 14608 |   315   (1)| 00:00:04
|

|   1 |  SORT GROUP BY                      |                   |   913 | 14608 |   315   (1)| 00:00:04
|

|   2 |   PARTITION RANGE ALL               |                   |  1423 | 22768 |   314   (0)| 00:00:04
|

|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| SALES             |  1423 | 22768 |   314   (0)| 00:00:04
|

|   4 |     BITMAP CONVERSION TO ROWIDS     |                   |       |       |            |
|

|*  5 |      BITMAP INDEX SINGLE VALUE      | SALES_CHANNEL_BIX |       |       |            |
|

--------------------------------------------------------------------------------------------------------

Since we have now confirmed that the plan is acceptable and the new statistics can be used by the optimizer, we PUBLISH the statistics now for the SALES table. Note that the LAST_ANALYZED column is also updated and the DBA_TAB_PENDING_STATS shows that there are no more pending statistics which need to be published.

SQL> EXEC DBMS_STATS.PUBLISH_PENDING_STATS ('SH','SALES');

PL/SQL procedure successfully completed.

SQL> SELECT LAST_ANALYZED FROM DBA_TABLES WHERE TABLE_NAME='SALES';

LAST_ANAL
---------
17-SEP-09


SELECT COUNT(*) FROM DBA_TAB_PENDING_STATS;

  COUNT(*)
----------
         0

SQL>  alter session set optimizer_use_pending_statistics=FALSE;

Session altered.

 

0 Comments

Leave Reply

Your email address will not be published. Required fields are marked *