Using the AWR History Tables to compare performance – Part 1
- Posted by Gavin Soorma
- On April 1, 2010
- 1 Comments
Using the DBA_HIST AWR history tables, we can compare the top wait events between different days or different time periods.
Let us assume that we find that batch jobs which are executed at night particularly between 1 and 2 AM are experiencing performance issues.
In this case we are comparing performance of a particular database on the 20th and 21st of July for the time periods 01:00 to 02:00.
SQL>
1 select snap_id,to_char(BEGIN_INTERVAL_TIME,’DD-MON-YY HH24:MI:SS’) “Runtime”
2 from dba_hist_snapshot
3* where trunc(BEGIN_INTERVAL_TIME)=’21-JUL-2009′ order by snap_id;
SNAP_ID Runtime
———- ——————
10781 21-JUL-09 00:00:14
10782 21-JUL-09 01:01:01
10783 21-JUL-09 02:00:27
10784 21-JUL-09 03:00:42
……..
Since the snapshots are collected every hour, for the same time period on the previous day we substract 24 – so the snap_ids for the 20th of July are 10758 and 10759.
select * from
(select event, waits “Waits”, time “Wait Time (s)”, pct*100 “Percent of Total”, waitclass “Wait Class”
from (select e.event_name event, e.total_waits – nvl(b.total_waits,0) waits,
(e.time_waited_micro – nvl(b.time_waited_micro,0))/1000000 time
, (e.time_waited_micro – nvl(b.time_waited_micro,0))/(select sum(e1.time_waited_micro – nvl(b1.time_waited_micro,0))
from dba_hist_system_event b1 , dba_hist_system_event e1
where b1.snap_id(+) = b.snap_id and e1.snap_id = e.snap_id and b1.dbid(+) = b.dbid
and e1.dbid = e.dbid and b1.instance_number(+) = b.instance_number
and e1.instance_number = e.instance_number
and b1.event_id(+) = e1.event_id
and e1.total_waits > nvl(b1.total_waits,0)
and e1.wait_class <> ‘Idle’
) pct
, e.wait_class waitclass
from
dba_hist_system_event b ,
dba_hist_system_event e
where b.snap_id(+) = &pBgnSnap
and e.snap_id = &pEndSnap
and b.event_id(+) = e.event_id
and e.total_waits > nvl(b.total_waits,0)
and e.wait_class <> ‘Idle’
order by waits desc
)
where rownum < 11)
;
Note: we are filtering the rows to display only the Top 10 Wait Events
SnapID’s 10758 and 10759 (20th July)
EVENT Waits Wait Time (s) Percent of Total Wait Class ---------------------------------------- ---------- ------------- ---------------- -------------------- control file sequential read 4803994 79.589894 2.4698686 System I/O PX qref latch 1451898 3.587572 .111331112 Other db file sequential read 57463 1914.86419 59.4229078 User I/O SQL*Net message to client 52040 .077309 .002399087 Network db file scattered read 9169 223.497921 6.9356858 User I/O log file parallel write 6692 371.933451 11.5420025 System I/O SQL*Net break/reset to client 6610 3.75458 .116513778 Application log file sync 3175 211.309884 6.55746127 Commit direct path read 2077 1.349465 .041877191 User I/O control file parallel write 1857 112.049356 3.4771649 System I/O 10 rows selected.
SnapId’s 10782 and 10783 (21st July)
EVENT Waits Wait Time (s) Percent of Total Wait Class ---------------------------------------- ---------- ------------- ---------------- -------------------- control file sequential read 6006299 101.375645 .512584025 System I/O SQL*Net message to client 1045461 1.16951 .005913374 Network db file sequential read 312287 7147.10849 36.1378085 User I/O log file sync 86597 8044.46297 40.6750873 Commit log file parallel write 62614 3257.61227 16.4714119 System I/O SQL*Net more data from client 25708 6.923053 .03500492 Network SQL*Net break/reset to client 24824 361.693308 1.82882399 Application db file scattered read 18289 178.619015 .903148421 User I/O SQL*Net more data to client 14721 .531947 .002689675 Network db file parallel write 3128 310.442873 1.56968725 System I/O 10 rows selected.
Looking at this output for the same time period on two days, we find that on the 21st of July the top wait events seem to be all I/O related and if we see the wait event “log file sync” on the second day is significantly higher than the first day. The wait event “db file sequential read” is also significantly higher on the second day as well as compared to the first day.
We can use this information to quickly triage the problem and make the following checks:
Has any thing changed on the storage front especially where the redo log or archive log files are located?
Has there been any new indexes created or modified in some way?
Have any changes been made to the init.ora parameters especially those related to the CBO?
Has the volume of data which is being processed changed significantly between the two days?
1 Comments