Optimizer Statistics Concepts
The optimizer estimates the cost for the access, determines the cost for possible plans, and then picks the execution plan with the lowest cost. We can categorize stats collections as per Object Types. We will discuss Index Clustering Factor as we can see in above screenshot Clustering factor is type of INDEX STATISTICS. Index Clustering Factor The index clustering factor measures the physical grouping of rows in relation to an index value, such as last name. The index clustering factor helps the optimizer decide whether an index scan or full table scan is more efficient for certain queries.A low clustering factor indicates an efficient index scan. A clustering factor that is close to the number of blocks in a table indicates that the rows are physically ordered in the table blocks by the index key. A clustering factor that is close to the number of rows indicates that the rows are scattered randomly across the database blocks in relation to the index key. The clustering factor is a property of a specific index, not a table. If multiple indexes exist on a table, then the clustering factor for one index might be small while the factor for another index is large. An attempt to reorganize the table to improve the clustering factor for one index may degrade the clustering factor of the other index. So in other terms we can say that Clustering factor is represantation of number by which Index key value is Snchronized with table Data in manner or order. It's calclute number of I/Os which will be required to access datablocks if entry accessed using Index key value. Clustering factor is the relationship between the order of the table and the order of the index. Indexes are always stored in key value sequence. Tables are stored in the order of the inserts…generally. Low Clustering Factor The minimum possible clustering factor is equal to the number of distinct blocks identified through the index’s list of rowid’s. An index with a low clustering_factor is closely aligned with the table and related rows reside together inside each data block, making indexes very desirable for optimal access. Low Clustering Factor Index -> as we can see for emp_age=80 only Block1 has to be scanned in Table Block and hence resulting Lower Index Clustering Factor. And will take Single I/O High Clustering Factor The maximum clustering factor is the number of entries in the index i.e. each rowid points to a different block in the table. An index with a high clustering factor is out-of-sequence with the rows in the table and large index range scans will consume lots of I/O. High Clustering Factor Index -> as we can see for emp_age=80, Block1 & Block3 & Block4 has to be scanned in Table Block and hence resulting Higher Index Clustering Factor. And will take three I/Os.
Demo of Clustering Factor
For this demo we will be creating 2 tables among which first table will be ordered entry and second table will be having random allocation in table blocks. => Creating Tables with ordered record SQL> create table low_clustering (l_id number, l_txt char(10)); begin for i in 1..200 loop insert into low_clustering select i, lpad('M', 10, 'M') from dba_objects where rownum < 50; end loop; end; / Table created. PL/SQL procedure successfully completed. SQL> commit; Commit complete. SQL> select count(*) from low_clustering; COUNT(*) ---------- 9800 => Creating Tables with Random ordered record SQL> create table high_clustering as select * from low_clustering order by dbms_random.random; Table created. SQL> select count(*) from high_clustering; COUNT(*) ---------- 9800 => Gather stats for both tables SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'LOW_CLUSTERING'); PL/SQL procedure successfully completed. EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'HIGH_CLUSTERING'); SQL> PL/SQL procedure successfully completed. => Check block and Number of records for TABLES SQL> SELECT TABLE_NAME, NUM_ROWS, BLOCKS FROM USER_TABLES WHERE TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING'); TABLE_NAME NUM_ROWS BLOCKS ------------------------------------------------------- HIGH_CLUSTERING 9800 31 LOW_CLUSTERING 9800 28 => Creating Indexes for both tables SQL> CREATE INDEX LOW_CLS_IDX ON LOW_CLUSTERING(L_ID); Index created. SQL> CREATE INDEX HIGH_CLS_IDX ON HIGH_CLUSTERING(L_ID); Index created. => Check Clustering Factor for Both indexes as below. SQL> SELECT INDEX_NAME, TABLE_NAME,BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING'); 2 3 INDEX_NAME TABLE_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ ---------------------------------------- ---------- ---------- HIGH_CLS_IDX HIGH_CLUSTERING 1 20 4560 LOW_CLS_IDX LOW_CLUSTERING 1 20 35 SQL> As we can see Clustering factor for LOW_CLS_IDX is very low as compared to HIGH_CLS_IDX. Clustering Factor for LOW_CLS_IDX => As clustering Factor 35 is near to Block count of TABLE LOW_CLUSTERING ,hence it is good candidate for Index Scan, Clustering Factor for HIGH_CLS_IDX => As clustering Factor is 4560 which is compararitvle very close to NUMBER of ROWS in table and hence it is candidate for Full table scan, Let we test above statements as below, => Using LOW_CLUSTERING table SQL> explain plan for select * from LOW_CLUSTERING where l_id=1; Explained. SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); PLAN_TABLE_OUTPUT --------- Plan hash value: 1092861473 ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 49 | 686 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| LOW_CLUSTERING | 49 | 686 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | LOW_CLS_IDX | 49 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("L_ID"=1) 14 rows selected. => Using HIGH_CLUSTERING table SQL> explain plan for select * from HIGH_CLUSTERING where l_id=1; Explained. SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); PLAN_TABLE_OUTPUT ------------ Plan hash value: 3779316751 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 49 | 686 | 10 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| HIGH_CLUSTERING | 49 | 686 | 10 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("L_ID"=1) 13 rows selected. As we can see both table is having same data but due to Index Clustering Factor ,Query may go with FULL TABLE SCAN even there is index on that table. Let we see cost of HIGH_CLS_IDX and why this not picked by optimizer. SQL> SELECT /*+ index(HIGH_CLUSTERING HIGH_CLS_IDX) */ * FROM HIGH_CLUSTERING WHERE L_ID=1; SQL> select * from table(dbms_xplan.display_cursor()); PLAN_TABLE_OUTPUT ----------- SQL_ID gku74xupndzbg, child number 0 ------------------------------------- SELECT /*+ index(HIGH_CLUSTERING HIGH_CLS_IDX) */ * FROM HIGH_CLUSTERING WHERE L_ID=1 Plan hash value: 2701133531 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 24 (100)| | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| HIGH_CLUSTERING | 49 | 686 | 24 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | HIGH_CLS_IDX | 49 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("L_ID"=1) 20 rows selected. SQL> So as we can see due to High cost in selection on INDEX , HIGH_CLS_IDX got dithed by optimizer.
Solution for High CF
There are multiple solutions available for reducing High CF. I will be covering creation of Base table using ORDERD BY cluase for the same as below. => Create new table using ORDER BY on Index Key value as below. SQL> create table HIGH_clustering1 as select * from low_clustering order by L_ID; Table created. => Create Index for the same and gethering stats SQL> CREATE INDEX HIGH_CLS_IDX1 ON HIGH_CLUSTERING1(L_ID); Index created. SQL> exec dbms_stats.gather_table_stats(USER, 'HIGH_CLUSTERING1',estimate_percent => 100, method_opt=> 'for all indexed columns size 254'); PL/SQL procedure successfully completed. => Verifying Details as below. SELECT INDEX_NAME, TABLE_NAME,BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING','HIGH_CLUSTERING1'); INDEX_NAME TABLE_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ---------------------------------------- ---------------------------------------- ---------- ----------- ----------------- HIGH_CLS_IDX HIGH_CLUSTERING 1 20 4560 HIGH_CLS_IDX1 HIGH_CLUSTERING1 1 20 27 LOW_CLS_IDX LOW_CLUSTERING 1 20 35 SQL> SQL> SELECT TABLE_NAME, NUM_ROWS, BLOCKS FROM USER_TABLES WHERE TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING','HIGH_CLUSTERING1'); 2 3 TABLE_NAME NUM_ROWS BLOCKS ---------------------------------------- ---------- ---------- HIGH_CLUSTERING 9800 31 HIGH_CLUSTERING1 9800 31 LOW_CLUSTERING 9800 28 => Selecting Values and checking plan selection. SQL> SELECT * FROM ABHI_TEST.HIGH_CLUSTERING1 WHERE L_ID=1; SQL> select * from table(dbms_xplan.display_cursor()); PLAN_TABLE_OUTPUT -------- SQL_ID 0d3rybv1fdb4v, child number 0 ------------------------------------- SELECT * FROM ABHI_TEST.HIGH_CLUSTERING1 WHERE L_ID=1 Plan hash value: 2167766801 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 2 (100)| | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| HIGH_CLUSTERING1 | 49 | 686 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | HIGH_CLS_IDX1 | 49 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("L_ID"=1) 19 rows selected. Hence we can see Index getting selected by optimzer due to lower Clustering Factor as result of Ordered by cluase on Index key value.
© 2021 Ace2Oracle. All Rights Reserved | Developed By IBOX444