Index Pitfalls, Part 4: How NULL Values Affect Indexes

First, let's create some test data:

SQL> create table t(x int, y int);

Table created

Note that I create a unique composite index on table t here:

SQL> create unique index t_idx on t(x,y);

Index created

SQL> insert into t values(1,1);

1 row inserted

SQL> insert into t values(1,NULL);

1 row inserted

SQL> insert into t values(NULL,1);

1 row inserted

SQL> insert into t values(NULL,NULL);

1 row inserted

SQL> commit;

Commit complete

Now let's analyze the index:

SQL> analyze index t_idx validate structure;

Index analyzed

SQL> select name,lf_rows from index_stats;

NAME                              LF_ROWS

------------------------------ ----------

T_IDX                                   3

SQL>

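As we can see, the index currently holds only 3 entries.

Note that we inserted and committed four rows above.

So we arrive at the first conclusion:

An Oracle B-tree index does not store rows in which every indexed column is NULL.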
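
As a quick cross-check, the following sketch counts the rows of t and the rows that have at least one non-NULL key column. It assumes the table holds exactly the four rows inserted above; the column aliases are made up for illustration.

```sql
-- table_rows: every row in t
-- indexable_rows: rows with at least one non-NULL key column,
-- i.e. the rows that can receive an entry in t_idx
select count(*) as table_rows,
       count(case when x is not null or y is not null then 1 end) as indexable_rows
  from t;
```

With the four rows above, this should report 4 and 3, matching the LF_ROWS value of 3 from index_stats.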

Let's keep inserting data, this time a few more rows that are entirely NULL:

SQL> insert into t values(NULL,NULL);

1 row inserted

SQL> insert into t values(NULL,NULL);

1 row inserted

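Surprisingly, these inserts do not violate the unique index we defined earlier (unique on t(x,y)).

So here we reach another conclusion:

For uniqueness checks, Oracle does not treat (NULL,NULL) as equal to (NULL,NULL).

Strictly speaking, comparing NULL with any value, including another NULL, evaluates to UNKNOWN rather than TRUE, so two NULLs are never considered equal. And because all-NULL keys are not stored in the index, there is nothing for the uniqueness check to collide with.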
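
The three-valued logic behind this can be seen directly. The sketch below is only an illustration using dual; the alias and the result strings are arbitrary.

```sql
-- Neither NULL = NULL nor NULL <> NULL evaluates to TRUE,
-- so the CASE falls through to the ELSE branch.
select case
         when null = null  then 'equal'
         when null <> null then 'not equal'
         else 'unknown'
       end as null_vs_null
  from dual;
```

The query returns 'unknown', which is why an equality test can never report two NULLs as duplicates.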

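The inserts below, however, do violate the unique constraint (DEMO.T_IDX). This is easy to understand: these keys are not entirely NULL, that is, they are not (NULL,NULL), and only rows whose keys are entirely NULL are treated as distinct from one another: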

SQL> insert into t values(1,null);

insert into t values(1,null)

ORA-00001: unique constraint (DEMO.T_IDX) violated

SQL> insert into t values(null,1);

insert into t values(null,1)

ORA-00001: unique constraint (DEMO.T_IDX) violated

SQL>

Consider the following examples:

SQL> select x,y,count(*) from t group by x,y;

    X        Y   COUNT(*)

----- -------- ----------

                        3

             1          1

    1                   1

    1        1          1

Executed in 0.03 seconds

SQL> select x,y,count(*) from t where x is null and y is null group by x,y;

   X       Y   COUNT(*)

---- ------- ----------

                      3

Executed in 0.01 seconds

SQL>

SQL> select x,y,count(*) from t group by x,y having count(*)>1;

     X                    Y   COUNT(*)

------ -------------------- ----------

                                     3

Executed in 0.02 seconds

SQL>

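As we can see, there are three rows that are entirely NULL. From this we can draw another conclusion:

In a GROUP BY clause, Oracle treats rows that are entirely NULL as the same row.

In other words, for GROUP BY, Oracle considers (NULL,NULL)=(NULL,NULL).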
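
For contrast, an ordinary equality comparison does not match NULL keys at all. The following sketch, which assumes t holds the six rows inserted so far, self-joins t on both columns; the table aliases a and b and the column alias are illustrative.

```sql
-- Pairs of rows whose x and y values compare equal.
-- Rows containing a NULL in x or y can never satisfy the join condition.
select count(*) as matching_pairs
  from t a, t b
 where a.x = b.x
   and a.y = b.y;
```

Only (1,1) matches itself here, so the count should be 1, even though GROUP BY places the three (NULL,NULL) rows in a single group.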

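The statement below filters on the leading column of the composite index (x,y). Queries like this normally use the index, so let's see what happens: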

select * from t where x is null;

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

--------------------------------------------------------------------

| Id | Operation            | Name       | Rows | Bytes | Cost |

--------------------------------------------------------------------

|   0 | SELECT STATEMENT     |             |       |       |       |

|* 1 | TABLE ACCESS FULL   | T           |       |       |       |

--------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("T"."X" IS NULL)

Note: rule based optimization

14 rows selected

Executed in 0.06 seconds

The query above did not use the index. For comparison, here is a query that does not involve NULL:

select * from t where x=1;

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

--------------------------------------------------------------------

| Id | Operation            | Name       | Rows | Bytes | Cost |

--------------------------------------------------------------------

|   0 | SELECT STATEMENT     |             |       |       |       |

|* 1 | INDEX RANGE SCAN    | T_IDX       |       |       |       |

--------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - access("T"."X"=1)

Note: rule based optimization

14 rows selected

Executed in 0.04 seconds

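This query (where x=1) uses the composite index t_idx(x,y), as we hoped. From this we can draw a conclusion:

With IS NULL and IS NOT NULL conditions, Oracle (here under the rule-based optimizer) generally does not use the index, because the index does not store keys that are entirely NULL (see the discussion above).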

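So how should we write conditions that involve NULL?

First, avoid NULL tests on the leading column wherever possible. Consider the following examples: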

select * from t where x=1 and y is null;

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

--------------------------------------------------------------------

| Id | Operation            | Name       | Rows | Bytes | Cost |

--------------------------------------------------------------------

|   0 | SELECT STATEMENT     |             |       |       |       |

|* 1 | INDEX RANGE SCAN    | T_IDX       |       |       |       |

--------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - access("T"."X"=1)

       filter("T"."Y" IS NULL)

Note: rule based optimization

15 rows selected

select * from t where x is null and y=1;

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

--------------------------------------------------------------------

| Id | Operation            | Name       | Rows | Bytes | Cost |

--------------------------------------------------------------------

|   0 | SELECT STATEMENT     |             |       |       |       |

|* 1 | TABLE ACCESS FULL   | T           |       |       |       |

--------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("T"."Y"=1 AND "T"."X" IS NULL)

Note: rule based optimization

14 rows selected
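
If a query must test x for NULL while comparing y to a value, one option is a second index that leads with y. This is a sketch rather than something tested in this article: the index name t_yx_idx is hypothetical, and whether the optimizer picks the index depends on your Oracle version and optimizer settings.

```sql
-- Hypothetical second index with y as the leading column.
-- A key with y = 1 and x NULL is stored, because y is not NULL.
create index t_yx_idx on t(y, x);

-- y = 1 can now serve as the access condition on the leading column,
-- with "x is null" applied as a filter, mirroring the
-- "x=1 and y is null" case above.
select * from t where x is null and y = 1;
```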

Another workaround is to declare every column NOT NULL when creating the table, and to give columns a DEFAULT value where necessary, for example:

SQL> create table lunar(

  2    c1 varchar2(10) default 'empty'

  3      constraint c1_notnull not null,

  4    c2 number(10) default 0

  5      constraint c2_notnull not null,

  6    c3 date default to_date('20990101','yyyymmdd')

  7      constraint c3_notnull not null);

Table created.

Elapsed: 00:00:00.00

SQL> insert into lunar(c1) values('first');

1 row created.

Elapsed: 00:00:00.00

SQL> insert into lunar(c2) values(99);

1 row created.

Elapsed: 00:00:00.00

SQL> insert into lunar(c3) values(sysdate);

1 row created.

Elapsed: 00:00:00.00

SQL> insert into lunar(c1,c3) values('ok',sysdate);

1 row created.

Elapsed: 00:00:00.00

SQL> insert into lunar(c2,c1) values(999,'hello');

1 row created.

Elapsed: 00:00:00.00

SQL> commit;

Commit complete.

Elapsed: 00:00:00.00

SQL> select * from lunar;

C1                 C2 C3

---------- ---------- ----------

first               0 01-JAN-99

empty              99 01-JAN-99

empty               0 19-OCT-04

ok                  0 19-OCT-04

hello             999 01-JAN-99

Elapsed: 00:00:00.00

SQL> select c1,c2,to_char(c3,'yyyy-mm-dd') from lunar;

C1                 C2 TO_CHAR(C3

---------- ---------- ----------

first               0 2099-01-01

empty              99 2099-01-01

empty               0 2004-10-19

ok                  0 2004-10-19

hello             999 2099-01-01

Elapsed: 00:00:00.00

SQL>

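We can then use these columns like any other column, and with sensible indexes on them, the application's query performance should improve noticeably.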
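
As a rough sketch of that last step, an index on c1 lets a query look for the placeholder value with an ordinary equality predicate instead of IS NULL. The index name lunar_c1_idx is made up for illustration, and the resulting plan will depend on data volume and optimizer settings.

```sql
-- Hypothetical index on a column that can never be NULL.
create index lunar_c1_idx on lunar(c1);

-- Rows inserted without a c1 value carry the default 'empty',
-- so "missing" values are found with an equality predicate.
select * from lunar where c1 = 'empty';
```

With the five rows shown above, this returns the two rows whose c1 was left to its default.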
