[20150709]慎用标量子查询.txt
--花了几天的时间调试sql语句,最终发现是标量子查询在作怪,原始的语句太复杂,我拿scott做一个例子来说明问题。
1.建立测试环境:
SCOTT@test> @ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.3.0 Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
create view v_deptemp1 as select emp.*,dname,loc from emp,dept where dept.deptno=emp.deptno ;
create view v_deptemp2 as select emp.*,(select dname from dept where dept.deptno=emp.deptno) dname ,(select loc from dept where dept.deptno=emp.deptno )loc from emp ;
--第2种写法就是标量子查询。
2.测试1:
create index i_dept_dname on scott.dept (dname);
create index i_emp_deptno on scott.emp (deptno)
SCOTT@test> select * from v_deptemp1 where dname='ACCOUNTING' ;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------- -------------
7782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450 10 ACCOUNTING NEW YORK
7839 KING PRESIDENT 1981-11-17 00:00:00 5000 10 ACCOUNTING NEW YORK
7934 MILLER CLERK 7782 1982-01-23 00:00:00 1300 10 ACCOUNTING NEW YORK
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 09tbznuqjnxa4, child number 0
-------------------------------------
select * from v_deptemp1 where dname='ACCOUNTING'
Plan hash value: 2863776355
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 5 | 295 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | I_DEPT_DNAME | 1 | | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I_EMP_DEPTNO | 5 | | 0 (0)| |
| 6 | TABLE ACCESS BY INDEX ROWID | EMP | 5 | 195 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F5BB74E1
3 - SEL$F5BB74E1 / DEPT@SEL$2
4 - SEL$F5BB74E1 / DEPT@SEL$2
5 - SEL$F5BB74E1 / EMP@SEL$2
6 - SEL$F5BB74E1 / EMP@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("DNAME"='ACCOUNTING')
5 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
--oracle 很好的分析确定索引,并且对emp表也可以使用i_emp_deptno。
--看看第2种情况:
SCOTT@test> select * from v_deptemp2 where dname='ACCOUNTING' ;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------- -------------
7782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450 10 ACCOUNTING NEW YORK
7839 KING PRESIDENT 1981-11-17 00:00:00 5000 10 ACCOUNTING NEW YORK
7934 MILLER CLERK 7782 1982-01-23 00:00:00 1300 10 ACCOUNTING NEW YORK
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 45acjyf0f8x54, child number 0
-------------------------------------
select * from v_deptemp2 where dname='ACCOUNTING'
Plan hash value: 2698003519
-------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 13 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 11 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
|* 5 | VIEW | V_DEPTEMP2 | 14 | 1456 | 3 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMP | 14 | 546 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$3 / DEPT@SEL$3
2 - SEL$3 / DEPT@SEL$3
3 - SEL$4 / DEPT@SEL$4
4 - SEL$4 / DEPT@SEL$4
5 - SEL$2 / V_DEPTEMP2@SEL$1
6 - SEL$2 / EMP@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPT"."DEPTNO"=:B1)
4 - access("DEPT"."DEPTNO"=:B1)
5 - filter("DNAME"='ACCOUNTING')
--这样无论如何都不会使用emp的i_emp_deptno。当然我实际上的情况实际上开发想避免dname 为null的情况。
--因为标量子查询如果查询dname为null不是他所需要的。
3.测试2:
--再建立一个表tx
create table tx ( id number,name varchar2(20));
insert into tx values (7654,'MARTIN');
commit;
SCOTT@test> select * from v_deptemp1 v ,tx where tx.id=v.empno and dname is not null;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ID NAME
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------- ------------- ---------- ------------
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250 1400 30 SALES CHICAGO 7654 MARTIN
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID gskw0uw49rw0k, child number 0
-------------------------------------
select * from v_deptemp1 v ,tx where tx.id=v.empno and dname is not null
Plan hash value: 2230440165
------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5 (100)| |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 70 | 5 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 50 | 4 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | TX | 1 | 11 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_EMP_EMPNO | 1 | | 0 (0)| |
|* 7 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
|* 8 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 20 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F5BB74E1
4 - SEL$F5BB74E1 / TX@SEL$1
5 - SEL$F5BB74E1 / EMP@SEL$2
6 - SEL$F5BB74E1 / EMP@SEL$2
7 - SEL$F5BB74E1 / DEPT@SEL$2
8 - SEL$F5BB74E1 / DEPT@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("TX"."ID"="EMP"."EMPNO")
7 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
8 - filter("DNAME" IS NOT NULL)
--而如果换成v_deptemp2
SCOTT@test> select * from v_deptemp2 v ,tx where tx.id=v.empno and dname is not null;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ID NAME
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------- ------------- ---------- --------
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250 1400 30 SALES CHICAGO 7654 MARTIN
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 7b8950at9688x, child number 0
-------------------------------------
select * from v_deptemp2 v ,tx where tx.id=v.empno and dname is not null
Plan hash value: 2887512270
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 7 (100)| | | | |
| 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 13 | 1 (0)| 00:00:01 | | | |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 11 | 1 (0)| 00:00:01 | | | |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| | | | |
|* 5 | HASH JOIN | | 1 | 106 | 7 (15)| 00:00:01 | 1156K| 1156K| 383K (0)|
| 6 | TABLE ACCESS FULL | TX | 1 | 11 | 3 (0)| 00:00:01 | | | |
|* 7 | VIEW | V_DEPTEMP2 | 14 | 1330 | 3 (0)| 00:00:01 | | | |
| 8 | TABLE ACCESS FULL | EMP | 14 | 546 | 3 (0)| 00:00:01 | | | |
----------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$3 / DEPT@SEL$3
2 - SEL$3 / DEPT@SEL$3
3 - SEL$4 / DEPT@SEL$4
4 - SEL$4 / DEPT@SEL$4
5 - SEL$1
6 - SEL$1 / TX@SEL$1
7 - SEL$2 / V@SEL$1
8 - SEL$2 / EMP@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPT"."DEPTNO"=:B1)
4 - access("DEPT"."DEPTNO"=:B1)
5 - access("TX"."ID"="V"."EMPNO")
7 - filter("DNAME" IS NOT NULL)
--emp始终选择的是全表扫描。
4.继续测试:
--当把dname is not null删除后,结果呢?
SCOTT@test> select * from v_deptemp2 v ,tx where tx.id=v.empno ;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ID NAME
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------- ------------- ---------- ---------
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250 1400 30 SALES CHICAGO 7654 MARTIN
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 1sd5umnwjt823, child number 0
-------------------------------------
select * from v_deptemp2 v ,tx where tx.id=v.empno
Plan hash value: 3698375752
----------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 13 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
| 3 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 11 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
| 5 | NESTED LOOPS | | | | | |
| 6 | NESTED LOOPS | | 1 | 50 | 4 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | TX | 1 | 11 | 3 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | PK_EMP_EMPNO | 1 | | 0 (0)| |
| 9 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$3 / DEPT@SEL$3
2 - SEL$3 / DEPT@SEL$3
3 - SEL$4 / DEPT@SEL$4
4 - SEL$4 / DEPT@SEL$4
5 - SEL$F5BB74E1
7 - SEL$F5BB74E1 / TX@SEL$1
8 - SEL$F5BB74E1 / EMP@SEL$2
9 - SEL$F5BB74E1 / EMP@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPT"."DEPTNO"=:B1)
4 - access("DEPT"."DEPTNO"=:B1)
8 - access("TX"."ID"="EMP"."EMPNO")
--可以发现没有这个条件(dname is not null),emp可以很好的选择索引。
--实际我们的视图定义有多处使用标量子查询,当然我只要把在where出现的改写成不使用标量子查询。
--如果不愿意修改还有1个简单的改写方法,就是:
SCOTT@test> select * from (select * from v_deptemp2 v ,tx where tx.id=v.empno ) where dname is not null ;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ID NAME
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------- ------------- ---------- --------
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250 1400 30 SALES CHICAGO 7654 MARTIN
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 8yy4n9t1dqhmm, child number 0
-------------------------------------
select * from (select * from v_deptemp2 v ,tx where tx.id=v.empno )
where dname is not null
Plan hash value: 3563298867
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 13 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
| 3 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 11 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
|* 5 | VIEW | | 1 | 129 | 4 (0)| 00:00:01 |
| 6 | NESTED LOOPS | | | | | |
| 7 | NESTED LOOPS | | 1 | 50 | 4 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | TX | 1 | 11 | 3 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | PK_EMP_EMPNO | 1 | | 0 (0)| |
| 10 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$4 / DEPT@SEL$4
2 - SEL$4 / DEPT@SEL$4
3 - SEL$5 / DEPT@SEL$5
4 - SEL$5 / DEPT@SEL$5
5 - SEL$335DD26A / from$_subquery$_001@SEL$1
6 - SEL$335DD26A
8 - SEL$335DD26A / TX@SEL$2
9 - SEL$335DD26A / EMP@SEL$3
10 - SEL$335DD26A / EMP@SEL$3
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPT"."DEPTNO"=:B1)
4 - access("DEPT"."DEPTNO"=:B1)
5 - filter("DNAME" IS NOT NULL)
9 - access("TX"."ID"="EMP"."EMPNO")
--这样改写后emp可以使用索引!看来oracle cbo 还是不够成熟,无法识别这种情况。
5.也许有人问v_deptemp1 与 v_deptemp2是不等价的。继续测试:
SCOTT@test> create or replace view v_deptemp3 as select emp.*,dname,loc from emp,dept where dept.deptno(+)=emp.deptno ;
View created.
alter table emp modify constraint fk_deptno disable novalidate;
--取消主外键约束。
SCOTT@test> update emp set deptno=90 where empno=7654;
1 row updated.
SCOTT@test> commit ;
Commit complete.
SCOTT@test> select * from v_deptemp3 v ,tx where tx.id=v.empno and dname is not null;
no rows selected
SCOTT@test> @dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID bwytb3gt4vtx1, child number 0
-------------------------------------
select * from v_deptemp3 v ,tx where tx.id=v.empno and dname is not null
Plan hash value: 2230440165
------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5 (100)| |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 70 | 5 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 50 | 4 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | TX | 1 | 11 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_EMP_EMPNO | 1 | | 0 (0)| |
|* 7 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| |
|* 8 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 20 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$EB0D93E2
4 - SEL$EB0D93E2 / TX@SEL$1
5 - SEL$EB0D93E2 / EMP@SEL$2
6 - SEL$EB0D93E2 / EMP@SEL$2
7 - SEL$EB0D93E2 / DEPT@SEL$2
8 - SEL$EB0D93E2 / DEPT@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("TX"."ID"="EMP"."EMPNO")
7 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
8 - filter("DNAME" IS NOT NULL)
--就是这样定义,oracle 一样选择合理的执行方式。
总结:
1.不要大量不顾各种情况的使用标量子查询,实际上标量子查询存在许多缺陷,仅仅能查询1个字段,仅仅返回1行。
2.并且导致内部嵌套连接,而不能采用hash join等连接方式。假如emp表1万行,这样查询dept表要有1万次(我上面执行2次标量子查询,也就是2万次)。
即使dept有索引,逻辑读的数量不可小看.
3.另外一个原因我觉得可能是数据结构问题,这种情况如果太遵守范式,数据结构设计耦合太紧,开发很容易选择标量子查询的写法。
例如:1个表有6个字段有科室代码字段,这样要显示科室名称,这样from中写6次科室表,做连接6次查询,这样许多开发更愿意选择标量子查询写sql语句。
4.总之要开发合理选择这种写法,不能到处滥用。