Saturday, 5 December 2009

Measurement Error Trap In Trace File (event 10046)

Some time ago I had an interesting case which I can use to clearly describe how one can be caught in measurement error trap.

But let us start at the beginning with this response time analysis:

Response Time Component Time % Elap AvgEla
---------------------------------------- ----------- ------- ---------
CPU service 3934.97s 48.39% 0.000716
un-accounted for time 1363.01s 16.76%
db file sequential read 1122.00s 13.80% 0.032253
gc buffer busy 451.73s 5.56% 0.011746
log buffer space 451.64s 5.55% 0.123974
buffer busy waits 176.79s 2.17% 0.029579
gc cr block 2-way 156.49s 1.92% 0.003287
gc cr grant 2-way 100.20s 1.23% 0.006541
latch: cache buffers chains 98.92s 1.22% 0.005708
gc current grant 2-way 69.68s 0.86% 0.006728
latch: library cache 30.10s 0.37% 0.010030
row cache lock 28.95s 0.36% 0.018727
gc current block 2-way 26.72s 0.33% 0.003828
gc cr block busy 19.35s 0.24% 0.006802
gc current grant busy 15.30s 0.19% 0.004999
latch: row cache objects 14.28s 0.18% 0.006165
gc cr block 3-way 11.73s 0.14% 0.002952
gc current block 3-way 11.34s 0.14% 0.003440
log file sync 10.71s 0.13% 0.315066
enq: SQ - contention 9.14s 0.11% 0.060911

My first thought was that there is an I/O problem as the average single block I/O took 32 milliseconds.

After digging for a while I have produced the following graphs which represent the I/O timing.

Let's first look at some facts:
- 3 node RAC
- Same storage
- Single block read time for Instance 1 was substantially different from read times for other instances during off hours
- 5 batch jobs during off hours
- The timings for Instance 1 are obviously not correct

I used the fact that I was performing the analysis on a 3 node RAC to check what are the timings on other two nodes and as you can see from the second graph, they were quite different.

Here is now the explanation:

The output from strace showed:

gettimeofday({1159440978, 931945}, NULL) = 0
pread(14, "\6\242\0\0\375\23\0\2+\254.\0\0\0\1\6\0054\0\0\1\0\5\0"..., 8192, 455 057408) = 8192
gettimeofday({1159440978, 944159}, NULL) = 0

Oracle records time just before performing a system call (pread). When the system call completes Oracle again records the current time and the difference reports as a wait time. Unfortunately due to high CPU load the process was for quite a while waiting at the system level to get on CPU and only then was able to read the current time and therefore the reported elapsed time was quite exaggerated.

Conclusion: Waiting in runque for CPU exaggerates all wait times of the process.

It is always a good practice to confirm findings with a different method. In this case I could use also the operating system tools to measure I/O timings. The same situation one can have on any other kind of wait event but there is not always a possibility to check it independently.

Finally, there was not really an I/O problem but the system was quite CPU bound. Of course the average single block I/O time over 5 - 10 ms shows that probably we are experiencing also I/O bottleneck. The customer later on replaced disk storage with a faster one together with the HW used for RAC.


  1. "wainting in runque for CPU" is also called CPU starving.

  2. Bernard,
    Thanks for the nice suggestion. Right now I'm trying to loose additional 10 kg and I really know what means starving :-)