Tuesday 8 December 2009

CBO oddities in determining selectivity on NVARCHAR2 data type

Some time ago I was involved in solving of a performance problem which was really weird - a simple select on a indexed column with unique value always resulted in a full table scan. The key column was of type NVARCHAR2. To better understand the behavior I created a simple reproducible test case.

Let us start with table and index creation.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

SQL> show parameter semant

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
nls_length_semantics string CHAR

SQL> create table x2 (id varchar2(32), idn nvarchar2(32), id16 nvarchar2(16), id20 nvarchar2(20));

Table created.

SQL> insert into x2 select lpad(rownum,32,'0'), lpad(rownum,32,'0'),lpad(rownum,16,'0'),lpad(rownum,20,'0')
from dual connect by level <= 20000; 20000 rows created. SQL> commit;

Commit complete.

SQL> create index x2_idn on x2(idn);

Index created.

SQL> analyze table x2 compute statistics;

Table analyzed.

SQL> explain plan for select * from x2 where idn=U'00000000000000000000000000000009';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 156302112

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 3281K| 136 (0)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| X2 | 20000 | 3281K| 136 (0)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("IDN"=U'00000000000000000000000000000009')


We have 20,000 rows with unique values in IDN column in the table, we have statistics and still the CBO produces an execution plan with FULL TABLE SCAN and with a very strange estimate of 20,000 rows reurned by this query.
Therefore our first question should be: What is wrong with the statistics? A simple query on USER_TAB_COLS shows the following results:

SQL> select column_name,num_distinct, low_value,high_value from user_tab_cols where table_name='X2';

COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE
------------------------------ ------------ ---------------------------------------------------------------- ----------------------------------------------------------------
ID 20000 3030303030303030303030303030303030303030303030303030303030303031 3030303030303030303030303030303030303030303030303030303230303030
IDN 1 0030003000300030003000300030003000300030003000300030003000300030 0030003000300030003000300030003000300030003000300030003000300030
ID16 20000 0030003000300030003000300030003000300030003000300030003000300031 0030003000300030003000300030003000300030003000320030003000300030
ID20 3 0030003000300030003000300030003000300030003000300030003000300030 0030003000300030003000300030003000300030003000300030003000300032

To display raw values I used function developed by Greg Rahn which gives the following result:

SQL> column lo_val format a32
SQL> column hi_val format a32
SQL> select column_name,
2 num_distinct,
3 display_raw(low_value,data_type) lo_val,
4 display_raw(high_value,data_type) hi_val
5 from user_tab_cols
6 where table_name='X2';

COLUMN_NAME NUM_DISTINCT LO_VAL HI_VAL
------------------------------ ------------ -------------------------------- --------------------------------
ID 20000 00000000000000000000000000000001 00000000000000000000000000020000
IDN 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID16 20000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
ID20 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2

We can see that the NUM_DISTINCT column has quite strange values. Reason for this is that Oracle analyzes only first 32 bytes (and not characters) of the string. The default character set for NVARCHAR2 type is AL16UTF16 - a 2 byte character set.
The column IDN was defined as NVARCHAR2(32) and was actually completely filled so every row stores 64 bytes in this column. Here is the proof:


SQL> select dump(idn) from x2 where rownum = 1;

DUMP(IDN)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Typ=1 Len=64: 0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,56,0,51

Due to considering only first 32 bytes analyze comes back with a strange result that there is only one distinct value, what is of course true, as all distinct digits are beyond first 32 bytes.

Let us continue and use DBMS_STATS.GATHER_TABLE_STATS for analysis.

SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user ,
4 tabname=> 'X2',
5 cascade=> TRUE,
6 estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE,
7 degree=> DBMS_STATS.AUTO_DEGREE,
8 no_invalidate=> FALSE,
9 granularity=> 'ALL',
10 method_opt=> 'FOR ALL COLUMNS SIZE AUTO');
11 end;
12 /

SQL> select column_name,num_distinct,
2 display_raw(low_value,data_type) lo_val,
3 display_raw(high_value,data_type) hi_val
4 from user_tab_cols where table_name='X2'
5 /

COLUMN_NAME NUM_DISTINCT LO_VAL HI_VAL
------------------------------ ------------ -------------------------------- --------------------------------
ID 19820 00000000000000000000000000000001 00000000000000000000000000020000
IDN 20000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID16 20000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
ID20 19942 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2

DBMS_STATS produces much more reliable cardinalities but the low/high values are still showing only 0 for IDN column. Here there is no difference between ANALYZE and DBMS_STATS as it can't be. Oracle looks only at first 32 bytes and that is the fact. Therefore the CBO will be not able to properly determine the cardinality for the range predicates. We will look closer this in the continuation of this post.
This fact might become dangerous also for all databases which use multi-byte character set as default character set. Actually Oracle will always look only at first 32 bytes and in case that the column will contain many many multi-byte characters this can become pretty odd. All this is something new and becomes more critical in now days when a lot of databases are migrated from single-byte character set to a multi-byte character sets. For sure migrating to any flavor of UTF-8/AL32UTF8 character set represent such a thread.

Conclusions about statistics for NVARCHAR2 data type
When I realized what is the reason behind for such behavior I suggested to my customer to switch back to rule based optimizer what solved the problem. The other possibility would be to change all NVARCHAR2 columns to VARCHAR2 but the customer accept this.
First, but the most important conclusion would be not to use NVARCHAR2 data type for key columns, but rather VARCHAR2. Or paraphrasing Mogens Norgard: "We don't use NVARCHAR2 data type!" :-)

*Range cardinalities on columns of NVARCHAR2 data type*
Reason for this part of post is my involvement in one thread on OTN where a lady from Austria, my neighbor country, was complaining about the "strange" execution plans and consequently also "un-wanted" performance. In one mail she explained me that they are experiencing performance problems since they have upgraded from Oracle9i to 10g. She also said that at the same time they moved from VARCHAR2 data type to NVARCHAR2 datatype.
My first impression was that this might be the same situation as I have already explained above. But after digging deeply in the problem, I realized that the problem lies in completely mis-estimated selectivity and consequently also the cardinality when NVARCHAR2 data type column is in question.

To be able to study the case I prepared an extremely simple reproducible test case. Let us start with the setup of the test environment:


SQL> create table x1 (cv varchar2(10), cn nvarchar2(10));

Table created.

SQL> insert into x1 select lpad(rownum,8,'0'), lpad(rownum,8,'0')
from dual connect by level <= 20000; 20000 rows created. SQL> commit;

Commit complete.

SQL> select cv,cn from x1 where rownum <= 10; CV CN ---------- ---------- 00000469 00000469 00000470 00000470 00000471 00000471 00000472 00000472 00000473 00000473 00000474 00000474 00000475 00000475 00000476 00000476 00000477 00000477 00000478 00000478 10 rows selected. SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user ,
4 tabname=> 'X1',
5 cascade=> TRUE,
6 estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE,
7 degree=> DBMS_STATS.AUTO_DEGREE,
8 no_invalidate=> FALSE,
9 granularity=> 'ALL',
10 method_opt=> 'FOR ALL COLUMNS SIZE 1');
11* end;
SQL> /

PL/SQL procedure successfully completed.


Let us run our first test SQL. I have used bind variables named :a1, :a2 of type VARCHAR2 and :n1, :n2 of type NVARCHAR2. I have event written a simple anonymous PL/SQL block where I selected the values for bind variables from the same table in order to be sure that the binding is done properly. While I was running the tests I enabled events 10053 and 10046.


SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
SQL_ID 1fj17ram77n5w, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 20 */ *
from x1
where cn >= to_nchar(:a1)
and cn <= to_nchar(:a2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.08 | 102 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 1 | 9999 |00:00:00.03 | 102 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SYS_OP_C2C(:A1)<=SYS_OP_C2C(:A2)) 2 - filter(("CN"<=SYS_OP_C2C(:A2) AND "CN">=SYS_OP_C2C(:A1)))


PARSING IN CURSOR #12 len=111 dep=0 uid=88 oct=3 lid=88 tim=254514943459 hv=2792607932 ad='23f7056c' sqlid='1fj17ram77n5w'
select /*+ gather_plan_statistics */ /* run 20 */ *
from x1
where cn >= to_nchar(:a1)
and cn <= to_nchar(:a2) END OF STMT PARSE #12:c=0,e=734,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254514943452 BINDS #12: Bind#0 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=64 off=0 kxsbbbfp=0fec0abc bln=32 avl=08 flg=05 value="00000000" Bind#1 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=0 off=32 kxsbbbfp=0fec0adc bln=32 avl=08 flg=01 value="00009999"

I got a hard parse (mis=1), optimizer peeked at the values of bind variables and the estimated cardinality was 1 while the actual one was 9999. From 10046 trace we can clearly see the values of bind variables used by optimizer for the optimization. I used VARCHAR2 type of columns as bind variables but used TO_NCHAR function to convert to NVARCHAR2 data type. The optimizer internally rewrote the SQL so that instead of TO_NCHAR used internal function SYS_OP_C2C which performs conversion from one character set to another.


SQL> select SYS_OP_C2C('Š') from dual;

S
-
S


For next run I used NVARCHAR2 type bind variables and here is the result:

SQL_ID d1bn5vyb6dcgn, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 15 */ *
from x1
where cn >= to_nchar(:n1)
and cn <= to_nchar(:n2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.08 | 759 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 50 | 9999 |00:00:00.04 | 759 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:N1<=:N2) 2 - filter(("CN">=:N1 AND "CN"<=:N2)) PARSING IN CURSOR #15 len=111 dep=0 uid=88 oct=3 lid=88 tim=254434564390 hv=2523312628 ad='2c8d2200' sqlid='d1bn5vyb6dcgn' select /*+ gather_plan_statistics */ /* run 15 */ * from x1 where cn >= to_nchar(:n1)
and cn <= to_nchar(:n2) END OF STMT PARSE #15:c=0,e=746,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254434564383 BINDS #15: Bind#0 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=256 off=0 kxsbbbfp=0febbee0 bln=128 avl=16 flg=05 value="" Bind#1 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=0 off=128 kxsbbbfp=0febbf60 bln=128 avl=16 flg=01 value=""

Here we got the estimated number of rows 50 what is exactly the default use by the CBO for ranges - selectivity is 0.0025. So the CBO simply decided to throw in the default selectivity. The values of bind variables are empty ("") - CBO just ignored them during the optimization phase but they were of course respected by the runtime engine.


SQL_ID cx88wv0q05tst, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 14 */ *
from x1
where substr(cn,1,10) >= to_nchar(:a1)
and substr(cn,1,10) <= to_nchar(:a2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.09 | 759 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 50 | 9999 |00:00:00.05 | 759 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SYS_OP_C2C(:A1)<=SYS_OP_C2C(:A2)) 2 - filter((SUBSTR("CN",1,10)>=SYS_OP_C2C(:A1) AND
SUBSTR("CN",1,10)<=SYS_OP_C2C(:A2))) PARSING IN CURSOR #12 len=137 dep=0 uid=88 oct=3 lid=88 tim=254227180414 hv=2415463600 ad='23f7e900' sqlid='f5wapff7zk35h' select /*+ gather_plan_statistics */ /* run 16 */ * from x1 where substr(cn,1,10) >= to_nchar(:a1)
and substr(cn,1,10) <= to_nchar(:a2) END OF STMT PARSE #12:c=0,e=767,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254227180407 BINDS #12: Bind#0 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=64 off=0 kxsbbbfp=0ff75ac0 bln=32 avl=08 flg=05 value="00000000" Bind#1 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=0 off=32 kxsbbbfp=0ff75ae0 bln=32 avl=08 flg=01 value="00009999"

In this case I added used SUBSTR function on the column and VARCHAR2 type bind variables and again the CBO threw in the default for range 0.0025, but we can see the values of bind variables.


SQL_ID auyacfyb6bj9r, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 14 */ *
from x1
where substr(cn,1,10) >= to_nchar(:n1)
and substr(cn,1,10) <= to_nchar(:n2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.09 | 759 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 50 | 9999 |00:00:00.05 | 759 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:N1<=:N2) 2 - filter((SUBSTR("CN",1,10)>=:N1 AND SUBSTR("CN",1,10)<=:N2)) PARSING IN CURSOR #16 len=137 dep=0 uid=88 oct=3 lid=88 tim=254203929826 hv=1230940210 ad='23f78074' sqlid='13u552p4pxa1k' select /*+ gather_plan_statistics */ /* run 15 */ * from x1 where substr(cn,1,10) >= to_nchar(:n1)
and substr(cn,1,10) <= to_nchar(:n2) END OF STMT PARSE #16:c=0,e=770,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254203929819 BINDS #16: Bind#0 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=256 off=0 kxsbbbfp=10460bd0 bln=128 avl=16 flg=05 value="" Bind#1 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=0 off=128 kxsbbbfp=10460c50 bln=128 avl=16 flg=01 value=""

In the CBO trace the inforamation was:

*******************************************
Peeked values of the binds in SQL statement
*******************************************
----- Bind Info (kkscoacd) -----
Bind#0
oacdty=01 mxl=32(20) mxlc=10 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=02 csi=2000 siz=64 off=0
kxsbbbfp=0f1f5a1c bln=32 avl=16 flg=05
value=""
Bind#1
oacdty=01 mxl=32(20) mxlc=10 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=02 csi=2000 siz=0 off=32
kxsbbbfp=0f1f5a3c bln=32 avl=16 flg=01
value=""

In the last case I just changed from VARCHAR2 to NVARCHAR2 bind variables and the result was expected - the default selectivity and the values of bind variables were again "".

Every time, when the CBO used binds for estimating selectivity I could see the following section after the execution plan:

Peeked Binds
============
Bind variable information
position=1
datatype(code)=1
datatype(string)=VARCHAR2(32)
char set id=170
char format=1
max length=32
value=00000000
Bind variable information
position=2
datatype(code)=1
datatype(string)=VARCHAR2(32)
char set id=170
char format=1
max length=32
value=00009999
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.1.0.6')
DB_VERSION('11.1.0.6')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "X1"@"SEL$1")
END_OUTLINE_DATA
*/

And finally one case with VARCHAR2 data type.

SQL_ID bznnx6zk4hm11, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 22 */ *
from x1 where cv >=:a1
and cv <= :a2 Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.07 | 102 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 9999 | 9999 |00:00:00.02 | 102 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:A1<=:A2) 2 - filter(("CV"<=:A2 AND "CV">=:A1))


PARSING IN CURSOR #13 len=89 dep=0 uid=88 oct=3 lid=88 tim=265619391820 hv=3829943329 ad='2e0c3568' sqlid='bznnx6zk4hm11'
select /*+ gather_plan_statistics */ /* run 22 */ *
from x1
where cv >= :a1
and cv <= :a2 END OF STMT PARSE #13:c=0,e=762,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=265619391813 BINDS #13: Bind#0 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=64 off=0 kxsbbbfp=0fec0abc bln=32 avl=08 flg=05 value="00000000" Bind#1 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=0 off=32 kxsbbbfp=0fec0adc bln=32 avl=08 flg=01 value="00009999"

In this last case I was querying the VARCHAR2 column and using bind variables of VARCHAR2 type and this time the optimizer estimated cardinality correctly. Of course it performed peeking at the values of bind variables.

Randolf Geist responded to my post on OTN and he suggested to confuse the CBO by using values like 'A0000000' to 'A00009999'. According to his observations also the selectivity for range predicates on VARCHAR2 data types is quite off after using this trick. Building a histogram returns things back to normal on both data types.
He pointed out that the CBO is obviously so smart to spot stored digits in VARCHAR2 columns and do good estimations by treating them as numbers while this is not true for the VARCHAR2 data type.

CONCLUSION
It is obvious that the CBO is not capable to perforem same estimates for ranges on NVARCHAR2 column as it is capable to do on VARCHAR2 column when numbers are stored as strings.
But the most important conclusion to be made is that one should use *appropriate* data types.

Of course that are many possibilities that the CBO might behave differently (like using DYNAMIC_SAMPLING, ...). So I am looking forward for the contributions of others, not just Randolf Geist.

This case was used to raise SR #3-1193039441: CBO and nvarchar2 --> A new Bug: 9220704 has been filed with development.

2 comments:

  1. Joze,
    as suggested we opened an SR on Oracle Support and it has now been declared as bug:
    SR #3-1193039441: file 12-14 LT] CBO and nvarchar2 --> A new Bug: 9220704 has been filed with development.

    Kind regards, Eva

    ReplyDelete
  2. Eva,

    Thank you very much for this information which I will add to the post as people usually don't read the discussion.

    ReplyDelete