A Hidden Hazard in Data Warehouse Design: Scalar Subqueries

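First, let us understand what a scalar subquery is. A subquery that sits after SELECT and before FROM is called a scalar subquery, for example: select num1, cal, (select name from t2 where t2.id = t1.id) from t1; This example is only meant to make the meaning of "scalar" easy to grasp. More formally, a scalar subquery is a subquery that returns a single column and at most one row, so it can be used wherever a single-value expression is allowed.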
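To make the definition concrete, here is a minimal sketch in Oracle SQL. The tables t1 and t2 follow the inline example above, but their column types and sample rows are assumptions made purely for illustration.

```sql
-- Hypothetical demo tables; column types and data are illustrative only.
CREATE TABLE t1 (id NUMBER, num1 NUMBER, cal NUMBER);
CREATE TABLE t2 (id NUMBER PRIMARY KEY, name VARCHAR2(30));

INSERT INTO t1 VALUES (1, 10, 100);
INSERT INTO t1 VALUES (2, 20, 200);
INSERT INTO t1 VALUES (3, 30, 300);  -- deliberately has no matching row in t2
INSERT INTO t2 VALUES (1, 'alpha');
INSERT INTO t2 VALUES (2, 'beta');

-- The subquery in the select list is evaluated against t2 for the rows of t1.
-- For id = 3 it finds nothing, so NAME comes back as NULL.
SELECT num1,
       cal,
       (SELECT name FROM t2 WHERE t2.id = t1.id) AS name
  FROM t1;
```

Because t2.id is a primary key here, the subquery can never return more than one row. Without that guarantee, a lookup that matched several rows would fail with ORA-01427 (single-row subquery returns more than one row).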

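The drawback of a scalar subquery is obvious: the driving table is always the outer table t1, and each row returned from t1 passes its value to t2 to fetch the result, much like a nested loop whose direction cannot be changed. So if t1 is large (or will grow large over time), it causes serious performance problems. [Batch jobs in a data warehouse should not use scalar subqueries.]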

Today this SQL ran for 5 hours without returning a result... (I have to admire that patience; I usually wait ten minutes at most.)

 

SELECT /*+ NO_USE_HASH(C,B)*/
 C.CCCC_AAAA_NO,
 C.PRIM_ACCT,
 ACCOUNT_SYSTEM,
 CUSTOMER_TYPE,
 CUSTOMER_STATUS,
 CREATE_DT,
 HOME_BRANCH_NO,
 COMPANY_SIZE,
 NOTICE_IND,
 NOTICE_CUST_NO,
 STMT_FREQUENCY,
 STMT_CYCLE,
 STMT_DAY,
 ID_NO,
 ID_TYPE,
 SHORT_NAME,
 EMAIL_ADD1,
 EMAIL_ADD2,
 CREDIT_RANKING,
 TITLE_CODE,
 NAME1,
 ADD1,
 POSTCODE,
 PHONE_NO_RES,
 PHONE_RES_EXT,
 PHONE_NO_BUS,
 PHONE_BUS_EXT,
 FAX_NO,
 TELEX_NO,
 PCODE_RGSTER,
 REGSTR_ADD1,
 REGSTR_ADD2,
 PHONE_RGSTR_NO,
 PHONE_RGSTR_EXT,
 BIRTH_DATE_1,
 SEX_CODE,
 EMPLOYER_NAME,
 EMPLOYED_FROM,
 EMPLOYER_ADDR,
 OCCUP_DESCRIP,
 OCCUPATION_CODE,
 INCOME,
 INCOME_WMY,
 COMPANY_NO,
 BUSINESS_NO,
 LICENCE_NO,
 BOSS_NAME,
 BOSS_BDAY,
 BUS_RGSTR_DATE,
 CAPITAL_AMT,
 CONTACT_REL_1,
 PHONE_NO_1,
 ADD2,
 ADD3,
 ADD4,
 MOBILE_NO,
 FXSP_TYPE,
 INDUSTRY_CODE,
 BUS_SECTOR_CODE,
 CUST_SUB_TYPE,
 DEP_STMT_TYPE,
 ID_ISSUE_DATE,
 ID_EXP_DATE,
 REGISTRY_ADD,
 ID_ISSUE_PLAC,
 LST_MNT_DATE,
 B.BRANCH_NO
  FROM CUSM_M C
 INNER JOIN (SELECT
             DISTINCT CCCCCCCC_NO,
                      (SELECT SJJGM
                         FROM JGDY H
                        WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO
               FROM CB_ACCT) B ON C.CCCC_AAAA_NO = B.CCCCCCCC_NO;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 2079508004
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |    18M|    14G|       |  1793K  (1)| 00:04:41 |
|*  1 |  INDEX SKIP SCAN              | JJJJ_IDX3 |     1 |     8 |       |     1   (0)| 00:00:01 |
|   2 |  MERGE JOIN                   |           |    18M|    14G|       |  1793K  (1)| 00:04:41 |
|   3 |   SORT JOIN                   |           |    18M|   397M|       |   147K  (1)| 00:00:24 |
|   4 |    VIEW                       |           |    18M|   397M|       |   147K  (1)| 00:00:24 |
|   5 |     HASH UNIQUE               |           |    18M|   380M|  1107M|   147K  (1)| 00:00:24 |
|   6 |      TABLE ACCESS STORAGE FULL| CC_ACCTT  |    36M|   760M|       | 71431   (1)| 00:00:12 |
|*  7 |   SORT JOIN                   |           |    19M|    14G|    35G|  1645K  (1)| 00:04:18 |
|   8 |    TABLE ACCESS STORAGE FULL  | CUSM_M    |    19M|    14G|       |   306K  (1)| 00:00:48 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("H"."JGM"=:B1)
       filter("H"."JGM"=:B1)
   7 - access("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
       filter("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")

Both the SQL text and the plan make it easy to spot the scalar subquery.

 

1. It sits after SELECT and before FROM. In this SQL the scalar subquery is hidden inside an inline view:

(SELECT
             DISTINCT CCCCCCCC_NO,
                      (SELECT SJJGM
                         FROM JGDY H
                        WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO
               FROM CB_ACCT) B

2. In the plan, the steps with Id=1 and Id=2 have the same indentation, yet they have no join operation as a parent node:

|*  1 |  INDEX SKIP SCAN              | JJJJ_IDX3 |     1 |     8 |       |     1   (0)| 00:00:01 |
|   2 |  MERGE JOIN                   |           |    18M|    14G|       |  1793K  (1)| 00:04:41 |

 

Either of these two signs is enough to tell that the SQL contains a scalar subquery. If the SQL is very long, just look at the plan.
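If you want to check a statement of your own, the plan can be produced the same way as in this post. The sketch below uses the hypothetical t1/t2 tables from the opening example rather than the real batch tables.

```sql
-- Ask the optimizer for a plan without running the statement.
EXPLAIN PLAN FOR
SELECT num1,
       cal,
       (SELECT name FROM t2 WHERE t2.id = t1.id) AS name
  FROM t1;

-- Show the plan that was just stored in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

When the scalar subquery is executed as written, the access to t2 appears as a child of SELECT STATEMENT at the same depth as the access to t1, with a bind-style predicate such as :B1 in the predicate section. Depending on the database version, the optimizer may transform the subquery into a join, in which case this pattern does not show up.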

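Whether a scalar subquery causes a performance problem depends mainly on how many rows the main (outer) table returns. We all know that tables used in data warehouse batch jobs are never small, but let us check anyway for form's sake.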

I posted the script for this in an earlier blog post: http://blog.csdn.net/skybig1988/article/details/71125223. You can also write your own; it is very simple.

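The row counts are large (the plan above estimates about 36M rows for the account table and 19M for CUSM_M), far beyond the rough limit of 10,000 rows, so a scalar subquery is not appropriate here.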
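This is not the script from the linked post, only a minimal sketch of one way to do such a check. It assumes the tables belong to the current schema and that optimizer statistics have been gathered recently, since NUM_ROWS in USER_TABLES reflects the last statistics run rather than an exact live count.

```sql
-- Approximate row counts from the data dictionary (as of the last stats gathering).
SELECT table_name,
       num_rows,
       last_analyzed
  FROM user_tables
 WHERE table_name IN ('CUSM_M', 'CB_ACCT', 'JGDY')
 ORDER BY num_rows DESC NULLS LAST;
```

If LAST_ANALYZED is old or NULL, a plain SELECT COUNT(*) on the table gives the exact figure at the cost of a full scan.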

The only fix for a scalar subquery is to rewrite it. [A scalar subquery can be rewritten as an equivalent outer join.]

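The rewrite here is very simple. With more complex scalar subqueries, such as those involving aggregates, non-equi conditions, or hierarchical queries, take great care that the statement is equivalent before and after the rewrite.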
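The aggregate case is a common place where the equivalence breaks. The sketch below uses hypothetical tables parent_t(id) and child_t(parent_id), which are not part of the original SQL, to show the difference.

```sql
-- Scalar form: COUNT(*) over an empty set is 0,
-- so parents with no children get CNT = 0.
SELECT p.id,
       (SELECT COUNT(*) FROM child_t c WHERE c.parent_id = p.id) AS cnt
  FROM parent_t p;

-- Outer-join form: parents with no children get NULL from the grouped view,
-- so NVL is needed to return 0 and keep the two statements equivalent.
SELECT p.id,
       NVL(g.cnt, 0) AS cnt
  FROM parent_t p
  LEFT JOIN (SELECT parent_id, COUNT(*) AS cnt
               FROM child_t
              GROUP BY parent_id) g
    ON g.parent_id = p.id;
```

For a plain lookup, the thing to verify is that the join column of the lookup table is unique. The scalar form raises ORA-01427 when it matches more than one row, whereas an outer join silently multiplies the rows instead.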

 

SELECT /*+ NO_USE_HASH(C,B)*/
 C.CCCC_AAAA_NO,
 C.PRIM_ACCT,
 ACCOUNT_SYSTEM,
 CUSTOMER_TYPE,
 CUSTOMER_STATUS,
 CREATE_DT,
 HOME_BRANCH_NO,
 COMPANY_SIZE,
 NOTICE_IND,
 NOTICE_CUST_NO,
 STMT_FREQUENCY,
 STMT_CYCLE,
 STMT_DAY,
 ID_NO,
 ID_TYPE,
 SHORT_NAME,
 EMAIL_ADD1,
 EMAIL_ADD2,
 CREDIT_RANKING,
 TITLE_CODE,
 NAME1,
 ADD1,
 POSTCODE,
 PHONE_NO_RES,
 PHONE_RES_EXT,
 PHONE_NO_BUS,
 PHONE_BUS_EXT,
 FAX_NO,
 TELEX_NO,
 PCODE_RGSTER,
 REGSTR_ADD1,
 REGSTR_ADD2,
 PHONE_RGSTR_NO,
 PHONE_RGSTR_EXT,
 BIRTH_DATE_1,
 SEX_CODE,
 EMPLOYER_NAME,
 EMPLOYED_FROM,
 EMPLOYER_ADDR,
 OCCUP_DESCRIP,
 OCCUPATION_CODE,
 INCOME,
 INCOME_WMY,
 COMPANY_NO,
 BUSINESS_NO,
 LICENCE_NO,
 BOSS_NAME,
 BOSS_BDAY,
 BUS_RGSTR_DATE,
 CAPITAL_AMT,
 CONTACT_REL_1,
 PHONE_NO_1,
 ADD2,
 ADD3,
 ADD4,
 MOBILE_NO,
 FXSP_TYPE,
 INDUSTRY_CODE,
 BUS_SECTOR_CODE,
 CUST_SUB_TYPE,
 DEP_STMT_TYPE,
 ID_ISSUE_DATE,
 ID_EXP_DATE,
 REGISTRY_ADD,
 ID_ISSUE_PLAC,
 LST_MNT_DATE,
 B.BRANCH_NO
  FROM CUSM_M C
 INNER JOIN (SELECT DISTINCT CCCCCCCC_NO,
                      sjjgm BRANCH_NO
               FROM CB_ACCT  LEFT JOIN jgdy  ON  cb_acct.branch_no=jgm
               ) B ON C.CCCC_AAAA_NO = B.CCCCCCCC_NO;
Plan hash value: 2285049241
 
----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |    36M|    28G|       |  1834K  (1)| 00:04:47 |
|   1 |  MERGE JOIN                    |           |    36M|    28G|       |  1834K  (1)| 00:04:47 |
|   2 |   SORT JOIN                    |           |    36M|   829M|       |   188K  (1)| 00:00:30 |
|   3 |    VIEW                        |           |    36M|   829M|       |   188K  (1)| 00:00:30 |
|   4 |     HASH UNIQUE                |           |    36M|  1037M|  1384M|   188K  (1)| 00:00:30 |
|*  5 |      HASH JOIN RIGHT OUTER     |           |    36M|  1037M|       | 71501   (1)| 00:00:12 |
|   6 |       INDEX FULL SCAN          | JJJJ_IDX3 |  1241 |  9928 |       |     1   (0)| 00:00:01 |
|   7 |       TABLE ACCESS STORAGE FULL| CC_ACCTT  |    36M|   760M|       | 71431   (1)| 00:00:12 |
|*  8 |   SORT JOIN                    |           |    19M|    14G|    35G|  1645K  (1)| 00:04:18 |
|   9 |    TABLE ACCESS STORAGE FULL   | CUSM_M    |    19M|    14G|       |   306K  (1)| 00:00:48 |
----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))
   8 - access("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
       filter("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")

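After the rewrite the scalar subquery is gone from the plan, and the SQL finished in 7 minutes. However, this SQL contains no non-equi join, so a MERGE JOIN makes no sense here. A hash join is clearly the best choice.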

 

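I never understood the purpose of the /*+ NO_USE_HASH(C,B)*/ hint at the top of the SQL. The developer eventually explained that it was there to make the SQL use nested loops, because NL is supposedly faster. All I could do was laugh. The hint only forbids a hash join; it does not force nested loops, and as the plans above show, the optimizer chose a sort merge join instead.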

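Here is a brief guide to choosing among NL, HASH, and SMJ in everyday work:

Nested loops:
     Check how many rows the SQL statement returns: if the result is very large, NL is usually the wrong choice.
     Check how many rows the driving table returns: generally no more than 10,000, ideally under 1,000 (this depends on server performance; on a powerful server the threshold may exceed 200,000).
     Check whether the join column of the driven table is covered by an index (it must be).
     When you see DISTINCT, GROUP BY, or SUM(), NL is usually not appropriate, since people normally aggregate only when there is a lot of data. NL is still fine when the data volume is small.

Hash join: usable only for equi-joins.

Sort merge join: in practice its main use is non-equi joins.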
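In most cases the better move is to remove hints and let the optimizer pick the join method. Where steering is really needed, the guidelines above map onto Oracle's join hints roughly as sketched below. The tables small_dim and big_fact and their columns are hypothetical, and hints are requests that the optimizer may not be able to honor.

```sql
-- Hypothetical tables: small_dim d (dim_id, lo), big_fact f (dim_id, amount).

-- Equi-join over large inputs: request a hash join.
SELECT /*+ LEADING(d) USE_HASH(f) */ f.amount, d.dim_id
  FROM small_dim d
  JOIN big_fact f ON f.dim_id = d.dim_id;

-- Very few driving rows and an index on big_fact(dim_id): request nested loops.
SELECT /*+ LEADING(d) USE_NL(f) */ f.amount, d.dim_id
  FROM small_dim d
  JOIN big_fact f ON f.dim_id = d.dim_id
 WHERE d.dim_id = 42;

-- Non-equi join: a hash join is not possible, so sort merge is a candidate.
SELECT /*+ LEADING(d) USE_MERGE(f) */ f.amount, d.dim_id
  FROM small_dim d
  JOIN big_fact f ON f.amount >= d.lo;
```

Note that USE_NL is the hint that asks for nested loops; NO_USE_HASH merely rules out one method and leaves the rest to the optimizer.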

After removing /*+ NO_USE_HASH(C,B)*/, the SQL returned its result in 30 seconds.

 

SELECT
 C.CCCC_AAAA_NO,
 C.PRIM_ACCT,
 ACCOUNT_SYSTEM,
 CUSTOMER_TYPE,
 CUSTOMER_STATUS,
 CREATE_DT,
 HOME_BRANCH_NO,
 COMPANY_SIZE,
 NOTICE_IND,
 NOTICE_CUST_NO,
 STMT_FREQUENCY,
 STMT_CYCLE,
 STMT_DAY,
 ID_NO,
 ID_TYPE,
 SHORT_NAME,
 EMAIL_ADD1,
 EMAIL_ADD2,
 CREDIT_RANKING,
 TITLE_CODE,
 NAME1,
 ADD1,
 POSTCODE,
 PHONE_NO_RES,
 PHONE_RES_EXT,
 PHONE_NO_BUS,
 PHONE_BUS_EXT,
 FAX_NO,
 TELEX_NO,
 PCODE_RGSTER,
 REGSTR_ADD1,
 REGSTR_ADD2,
 PHONE_RGSTR_NO,
 PHONE_RGSTR_EXT,
 BIRTH_DATE_1,
 SEX_CODE,
 EMPLOYER_NAME,
 EMPLOYED_FROM,
 EMPLOYER_ADDR,
 OCCUP_DESCRIP,
 OCCUPATION_CODE,
 INCOME,
 INCOME_WMY,
 COMPANY_NO,
 BUSINESS_NO,
 LICENCE_NO,
 BOSS_NAME,
 BOSS_BDAY,
 BUS_RGSTR_DATE,
 CAPITAL_AMT,
 CONTACT_REL_1,
 PHONE_NO_1,
 ADD2,
 ADD3,
 ADD4,
 MOBILE_NO,
 FXSP_TYPE,
 INDUSTRY_CODE,
 BUS_SECTOR_CODE,
 CUST_SUB_TYPE,
 DEP_STMT_TYPE,
 ID_ISSUE_DATE,
 ID_EXP_DATE,
 REGISTRY_ADD,
 ID_ISSUE_PLAC,
 LST_MNT_DATE,
 B.BRANCH_NO
  FROM CUSM_M C
 INNER JOIN (SELECT DISTINCT CCCCCCCC_NO,
                      sjjgm BRANCH_NO
               FROM CB_ACCT  LEFT JOIN jgdy  ON  cb_acct.branch_no=jgm
               ) B ON C.CCCC_AAAA_NO = B.CCCCCCCC_NO

Plan hash value: 967350049
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |    36M|    28G|       |  1059K  (1)| 00:02:46 |
|*  1 |  HASH JOIN                    |           |    36M|    28G|  1244M|  1059K  (1)| 00:02:46 |
|   2 |   VIEW                        |           |    36M|   829M|       |   188K  (1)| 00:00:30 |
|   3 |    HASH UNIQUE                |           |    36M|  1037M|  1384M|   188K  (1)| 00:00:30 |
|*  4 |     HASH JOIN RIGHT OUTER     |           |    36M|  1037M|       | 71501   (1)| 00:00:12 |
|   5 |      INDEX FULL SCAN          | JJJJ_IDX3 |  1241 |  9928 |       |     1   (0)| 00:00:01 |
|   6 |      TABLE ACCESS STORAGE FULL| CC_ACCTT  |    36M|   760M|       | 71431   (1)| 00:00:12 |
|   7 |   TABLE ACCESS STORAGE FULL   | CUSM_M    |    19M|    14G|       |   306K  (1)| 00:00:48 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
   4 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))

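This SQL can actually be optimized further. The INDEX FULL SCAN at Id=5 uses single-block reads; switching that step to a full table scan can make it 100+ times faster, and on top of that comes the Exadata machine's own full scan optimization (TABLE ACCESS STORAGE FULL), so the gain would be even larger.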
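One possible way to request that change, shown only as an untested sketch, is a FULL hint on JGDY inside the inline view. Whether the optimizer honors it, and how much it helps on a lookup table this small, would need to be confirmed against the actual plan.

```sql
-- Inline view only; the outer SELECT list stays exactly as in the statement above.
-- FULL(JGDY) asks for a full table scan of JGDY instead of the index full scan.
SELECT /*+ FULL(JGDY) */
       DISTINCT CCCCCCCC_NO,
       SJJGM BRANCH_NO
  FROM CB_ACCT
  LEFT JOIN JGDY ON CB_ACCT.BRANCH_NO = JGM;
```

After adding the hint, regenerate the plan and check that the Id=5 step has changed to a full table scan before measuring the run time.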
 

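As shown above, the scalar subquery is a very dangerous construct. While the outer table returns only a small amount of data, it causes no performance problem at all, but the hazard has already been planted. As the outer table grows, performance slowly degrades, and once the volume passes a critical threshold the drop is dramatic. In a data warehouse, therefore, use outer joins instead of scalar subqueries so that you do not plant hazards in your programs.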

 

 

 
