Saturday 5 December 2009

Measurement Error Trap In Trace File (event 10046)


Some time ago I had an interesting case which I can use to clearly describe how one can be caught in a measurement error trap.

But let us start at the beginning with this response time analysis:


Response Time Component Time % Elap AvgEla
---------------------------------------- ----------- ------- ---------
CPU service 3934.97s 48.39% 0.000716
un-accounted for time 1363.01s 16.76%
db file sequential read 1122.00s 13.80% 0.032253
gc buffer busy 451.73s 5.56% 0.011746
log buffer space 451.64s 5.55% 0.123974
buffer busy waits 176.79s 2.17% 0.029579
gc cr block 2-way 156.49s 1.92% 0.003287
gc cr grant 2-way 100.20s 1.23% 0.006541
latch: cache buffers chains 98.92s 1.22% 0.005708
gc current grant 2-way 69.68s 0.86% 0.006728
latch: library cache 30.10s 0.37% 0.010030
row cache lock 28.95s 0.36% 0.018727
gc current block 2-way 26.72s 0.33% 0.003828
gc cr block busy 19.35s 0.24% 0.006802
gc current grant busy 15.30s 0.19% 0.004999
latch: row cache objects 14.28s 0.18% 0.006165
gc cr block 3-way 11.73s 0.14% 0.002952
gc current block 3-way 11.34s 0.14% 0.003440
log file sync 10.71s 0.13% 0.315066
enq: SQ - contention 9.14s 0.11% 0.060911
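
The wait portion of a profile like this is produced by aggregating the WAIT lines of the raw 10046 trace files (the CPU service figure comes from the c= fields of the PARSE/EXEC/FETCH lines). As a minimal sketch of the idea in Python, assuming the 10g trace format where ela= is reported in microseconds and a hypothetical trace file name:

import re
from collections import defaultdict

# In a 10g+ raw trace, wait lines look like:
#   WAIT #3: nam='db file sequential read' ela= 32253 file#=6 block#=... tim=...
# where ela= is the elapsed wait in microseconds.
wait_line = re.compile(r"WAIT #\d+: nam='([^']+)' ela= (\d+)")

totals = defaultdict(lambda: [0.0, 0])        # event -> [total seconds, count]

with open("orcl1_ora_12345.trc") as trace:    # hypothetical trace file name
    for line in trace:
        m = wait_line.match(line)
        if m:
            event, ela_us = m.group(1), int(m.group(2))
            totals[event][0] += ela_us / 1e6
            totals[event][1] += 1

for event, (secs, count) in sorted(totals.items(), key=lambda kv: -kv[1][0]):
    print(f"{event:40s} {secs:10.2f}s  avg {secs / count:.6f}s")

The AvgEla column in the profile above is simply the total time divided by the number of waits for each event, in seconds.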

My first thought was that there was an I/O problem, as the average single block I/O took 32 milliseconds.
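
That figure follows directly from the profile above: 1122.00 s of db file sequential read time at an average of 0.032253 s per wait corresponds to roughly 1122.00 / 0.032253 ≈ 34,800 single block reads, each apparently taking 32 ms.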

After digging for a while, I produced the following graphs, which represent the I/O timing.


[Graphs: single block read times over time for Instance 1 and for the other two instances]

Let's first look at some facts:
- 3-node RAC
- Same storage shared by all instances
- The single block read time for Instance 1 was substantially different from the read times of the other instances during off hours
- 5 batch jobs were running during off hours
- The timings for Instance 1 were obviously not correct

Since I was performing the analysis on a 3-node RAC, I could check the timings on the other two nodes, and as you can see from the second graph, they were quite different.

Now for the explanation.

The output from strace showed:


gettimeofday({1159440978, 931945}, NULL) = 0
pread(14, "\6\242\0\0\375\23\0\2+\254.\0\0\0\1\6\0054\0\0\1\0\5\0"..., 8192, 455057408) = 8192
gettimeofday({1159440978, 944159}, NULL) = 0
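
Such output can be captured by attaching strace to the Oracle shadow process, e.g. strace -p <spid>; the -T option, which appends the time actually spent inside each system call, makes a handy cross-check, because that figure does not include any run-queue delay after the call returns. In the sample above, the two gettimeofday() calls bracket the pread() by 944159 - 931945 = 12214 microseconds, so this particular read was reported as a wait of about 12.2 ms.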



Oracle records the time just before performing a system call (here pread). When the system call completes, Oracle records the current time again and reports the difference as the wait time. Unfortunately, due to the high CPU load the process had to wait quite a while at the operating system level to get back onto a CPU before it could read the current time, and therefore the reported elapsed time was heavily exaggerated.
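
To make the mechanism concrete, here is a minimal Python sketch of the same gettimeofday/pread/gettimeofday pattern (the datafile path is hypothetical; the offset is taken from the strace output above):

import os
import time

BLOCK_SIZE = 8192
OFFSET = 455057408                      # offset from the strace output above

fd = os.open("/u01/oradata/users01.dbf", os.O_RDONLY)  # hypothetical datafile

t0 = time.monotonic()                   # Oracle: gettimeofday() before the call
buf = os.pread(fd, BLOCK_SIZE, OFFSET)  # Oracle: pread() of one 8 KB block
t1 = time.monotonic()                   # Oracle: gettimeofday() after the call
os.close(fd)

# t1 - t0 is what gets reported as the wait time, but it also includes any
# time the process spent in the run queue waiting to get back onto a CPU
# after the read itself had already completed.
print(f"reported wait: {(t1 - t0) * 1000:.3f} ms for {len(buf)} bytes")

On an idle machine the two timestamps bracket the read tightly; on a CPU-saturated machine the second timestamp may be taken many milliseconds after the read has actually completed, and all of that scheduling delay is charged to the I/O wait event.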


Conclusion: Waiting in the run queue for CPU inflates all wait times recorded by the process.

It is always good practice to confirm findings with a different method. In this case I could also use operating system tools (for example iostat or sar) to measure the I/O timings. The same situation can occur with any other kind of wait event, but there is not always a possibility to check it independently.

Finally, there was not really an I/O problem; the system was simply quite CPU bound. Of course, an average single block I/O time above 5-10 ms indicates that we were probably also experiencing an I/O bottleneck. The customer later replaced the disk storage with faster storage, together with the hardware used for the RAC.

2 comments:

  1. "wainting in runque for CPU" is also called CPU starving.

    ReplyDelete
  2. Bernard,
    Thanks for the nice suggestion. Right now I'm trying to lose an additional 10 kg, so I really know what starving means :-)

    ReplyDelete