[20130809]12c Clustering Factor.txt

简介: [20130809]12c Clustering Factor.txt以前在11G以前,如果使用assm,表的CF经常会很大,即使你插入的像顺序号这样的字段,由于多个会话同时操作,插入的数据分布的不同的块中,以顺序号为索引的CF也会变得很大,甚至接近记录的数量。
[20130809]12c Clustering Factor.txt

以前在11G以前,如果使用assm,表的CF经常会很大,即使你插入的像顺序号这样的字段,由于多个会话同时操作,
插入的数据分布的不同的块中,以顺序号为索引的CF也会变得很大,甚至接近记录的数量。这个在《基于成本的优化》里面也有介绍。

但是在12g可以设置一个参数改善这种情况,做一些测试看看。 参考了Richard Foote大师的blog:
http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/

1.测试环境:

SQL> @ver
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production

SQL> create table bowie (id number, name varchar2(30));
Table created.

SQL> create sequence bowie_seq order;
Sequence created.

CREATE OR REPLACE PROCEDURE bowie_proc AS
BEGIN
   FOR i IN 1..100000 LOOP
       INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');
       COMMIT;
   END LOOP;
END;
/

select table_name, i.tablespace_name, segment_space_managementement
from dba_tables i, dba_tablespaces t   where i.tablespace_name = t.tablespace_name and table_name='BOWIE';

TABLE_NAME TABLESPACE_NAME                SEGMEN
---------- ------------------------------ ------
BOWIE      USERS                          AUTO

2.建立过程,调用job,注意bowie_proc;后面有分号。

CREATE OR REPLACE PROCEDURE do_bowie_proc
AS
   v_jobno   NUMBER := 0;
BEGIN
   FOR i IN 1 .. 4
   LOOP
      DBMS_JOB.submit (v_jobno, 'bowie_proc;', SYSDATE);
   END LOOP;

   COMMIT;
END;
/

SQL> exec do_bowie_proc
PL/SQL procedure successfully completed.

SQL> SELECT COUNT (*)  FROM dba_jobs_running WHERE ROWNUM
  COUNT(*)
----------
         0

--等待job结束。

SQL> select count(*) from bowie ;
  COUNT(*)
----------
    400000


3.建立索引:
SQL> create index bowie_id_i on bowie(id);
Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- ------------------------------ ---------- ---------- -----------------
BOWIE      BOWIE_ID_I                           1630     400000            352954

--可以发现CLUSTERING_FACTOR=352954,非常接近NUM_ROWS=40000.按照这样讲数据非常离散。

SQL> set autot traceonly
SQL> select * from bowie where id between 42 and 540;
499 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |   500 | 10000 |   445   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |   500 | 10000 |   445   (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1425  consistent gets
          0  physical reads
          0  redo size
       5480  bytes sent via SQL*Net to client
        566  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        499  rows processed

-- 我们查询返回499条,占500/400000=0.00125,很少的一部分oracle选择的是全表扫描而非使用索引。
-- 注:我测试执行计划改变的边界select * from bowie where id between 42 and 539;使用索引。
-- 也许你可能情况有点不同,但是误差不会太大。

SQL> select * from bowie where id between 42 and 539;
498 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |   499 |  9980 |   445   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |   499 |  9980 |   445   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |   499 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        435  consistent gets
          0  physical reads
          0  redo size
      12925  bytes sent via SQL*Net to client
        566  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        498  rows processed

--逻辑读435,明显比全表扫描少,全表扫描逻辑读1425.另外注意执行计划12c叫 TABLE ACCESS BY INDEX ROWID BATCHED,
--这里的BATCHED表示什么? 自己不是很清楚。

http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/

    Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference
that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would
already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering.
The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the
collection of index statistics, it will not increment the CF if the table block being referenced by the current index
entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the
appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not
increment the CF if the current table block has already been accessed "x" index entries previously.

--我不翻译,避免错误!大概意思是可以定义一个参数TABLE_CACHED_BLOCKS,值是cache的数量,缺省是1.范围是1-255.
--文章的作者执行的过程是3次。而我是4次。
--我的感觉oracle应该采用一个百分比更合适一些,或者2者都支持,
--也许理解有误差,英文实在太差!!

column cascade format a30
column degree format a10
column estimate_percent format a30
column method_opt format a30
column no_invalidate format a30
column granularity format a10
column publish format a10
column INCREMENTAL format a10
column STALE_PERCENT format a10
column AUTOSTATS_TARGET format a10
column TABLE_CACHED_BLOCKS format a10

SELECT DBMS_STATS.get_param ('CASCADE') CASCADE,
       DBMS_STATS.get_param ('DEGREE') DEGREE,
       DBMS_STATS.get_param ('ESTIMATE_PERCENT') estimate_percent,
       DBMS_STATS.get_param ('METHOD_OPT') method_opt,
       DBMS_STATS.get_param ('NO_INVALIDATE') no_invalidate,
       DBMS_STATS.get_param ('GRANULARITY') granularity,
       DBMS_STATS.get_param ('PUBLISH') publish,
       DBMS_STATS.get_param ('INCREMENTAL') incremental,
       DBMS_STATS.get_param ('STALE_PERCENT') stale_percent,
       DBMS_STATS.get_param ('AUTOSTATS_TARGET') autostats_target,
       DBMS_STATS.get_param ('TABLE_CACHED_BLOCKS') TABLE_CACHED_BLOCKS
  FROM DUAL;


CASCADE                  DEGREE  ESTIMATE_PERCENT             METHOD_OPT                 NO_INVALIDATE               GRANULARIT PUBLISH    INCREMENTA STALE_PERC AUTOSTATS_ TABLE_CACH
------------------------ ------- ---------------------------- -------------------------- --------------------------- ---------- ---------- ---------- ---------- ---------- ----------
DBMS_STATS.AUTO_CASCADE  NULL    DBMS_STATS.AUTO_SAMPLE_SIZE  FOR ALL COLUMNS SIZE AUTO  DBMS_STATS.AUTO_INVALIDATE  AUTO       TRUE       FALSE      10         AUTO       1

--TABLE_CACHED_BLOCKS缺省=1.我的测试表bowie占用的块1630,估计要缓存255块才行,而不是作者的42.
--exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);
PL/SQL procedure successfully completed.

--EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

SELECT t.table_name,
       i.index_name,
       t.blocks,
       t.num_rows,
       i.clustering_factor
  FROM user_tables t, user_indexes i
 WHERE t.table_name = i.table_name AND i.index_name = 'BOWIE_ID_I';

TABLE_NAME INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- ------------------------------ ---------- ---------- -----------------
BOWIE      BOWIE_ID_I                           1630     400000              1383

--CLUSTERING_FACTOR=1383,结果比blocks还要小。说明数据很集中。
--btw,如果设置TABLE_CACHED_BLOCKS=42,结果也一样,不知道为什么。

--清除以前的执行计划。
SQL> alter system flush shared_pool;
System altered.

SQL> set autot traceonly;
SQL> select * from bowie where id between 42 and 540;
499 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |   500 | 10000 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |   500 | 10000 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |   500 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        436  consistent gets
          0  physical reads
          0  redo size
      12949  bytes sent via SQL*Net to client
        566  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        499  rows processed

--取1%的记录看看。
SQL> select * from bowie where id between 42 and 40041;
40000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            | 40001 |   781K|   231   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      | 40001 |   781K|   231   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I | 40001 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      35438  consistent gets
          0  physical reads
          0  redo size
    1028834  bytes sent via SQL*Net to client
       2732  bytes received via SQL*Net from client
        201  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      40000  rows processed

--很明显这样CF设置的太小了。


4.处于好奇,测试不同的TABLE_CACHED_BLOCKS的情况:
我发现
TABLE_CACHED_BLOCKS=2,CF=275559
TABLE_CACHED_BLOCKS=3,CF=1561
--TABLE_CACHED_BLOCKS=3后下降也太快一点,不知道是否存在什么bug。

我以前一般是使用设置参数命令,根据业务并行的情况,使用原CF/并发执行的用户,一般是4-6.
例子:
EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);


SQL> EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);
PL/SQL procedure successfully completed.

SQL> alter system flush shared_pool;
System altered.

SQL> select * from bowie where id between 42 and 2025;
1984 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |  1985 | 39700 |   445   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |  1985 | 39700 |   445   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |  1985 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1574  consistent gets
          0  physical reads
          0  redo size
      49932  bytes sent via SQL*Net to client
        642  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1984  rows processed

--感觉这样合理一些。

目录
相关文章
|
3月前
|
机器学习/深度学习 人工智能
【CatBoost报错解决】CatBoostError: Bad value for num feature[non default doc idx=0,feature idx=19]=
【CatBoost报错解决】CatBoostError: Bad value for num feature[non default doc idx=0,feature idx=19]=
成功解决_catboost.CatBoostError: Invalid cat_features[4] = 8 value: index must be < 8.
成功解决_catboost.CatBoostError: Invalid cat_features[4] = 8 value: index must be < 8.
|
C++
Data Structures and Algorithms (English) - 6-9 Sort Three Distinct Keys(20 分)
Data Structures and Algorithms (English) - 6-9 Sort Three Distinct Keys(20 分)
88 0
Data Structures and Algorithms (English) - 6-10 Sort Three Distinct Keys(30 分)
Data Structures and Algorithms (English) - 6-10 Sort Three Distinct Keys(30 分)
91 0
Data Structures and Algorithms (English) - 6-11 Shortest Path [2](25 分)
Data Structures and Algorithms (English) - 6-11 Shortest Path [2](25 分)
103 0
Data Structures and Algorithms (English) - 6-17 Shortest Path [4](25 分)
Data Structures and Algorithms (English) - 6-17 Shortest Path [4](25 分)
90 0
Data Structures and Algorithms (English) - 6-16 Shortest Path [3](25 分)
Data Structures and Algorithms (English) - 6-16 Shortest Path [3](25 分)
83 0
Data Structures and Algorithms (English) - 6-11 Shortest Path [1](25 分)
Data Structures and Algorithms (English) - 6-11 Shortest Path [1](25 分)
90 0
Data Structures and Algorithms (English) - 6-13 Topological Sort(25 分)
Data Structures and Algorithms (English) - 6-13 Topological Sort(25 分)
86 0
|
存储 缓存 Oracle
TPCH 深入剖析 - part1 Hidden Messages and Lessons Learned from an Influential Benchmark
TPC-H可以说是世界上最为流行的OLAP workload的benchmark程序,无论你看什么样的论文或技术文章,只要是和query processing相关的,大多会在evaluation时使用TPC-H作为评估工具。而如果你从事query optimization/query execution的工作,则怎么都会和TPC-H打上交道,即使是TP型的数据库系统。
308 0
TPCH 深入剖析 - part1 Hidden Messages and Lessons Learned from an Influential Benchmark