Category - TUNING

The Index Clustering Factor - Demystified.

Optimizer Statistics Concepts

The optimizer estimates the cost for the access, determines the cost for possible plans, and then picks the execution plan with the lowest cost.

We can categorize stats collections as per Object Types.



We will discuss Index Clustering Factor as we can see in above screenshot Clustering factor is type of INDEX STATISTICS.

Index Clustering Factor

The index clustering factor measures the physical grouping of rows in relation to an index value, such as last name.

The index clustering factor helps the optimizer decide whether an index scan or full table scan is more efficient for certain queries.A low clustering factor indicates an efficient index scan.

A clustering factor that is close to the number of blocks in a table indicates that the rows are physically ordered in the table blocks by the index key.
A clustering factor that is close to the number of rows indicates that the rows are scattered randomly across the database blocks in relation to the index key.

The clustering factor is a property of a specific index, not a table.
If multiple indexes exist on a table, then the clustering factor for one index might be small while the factor for another index is large.
An attempt to reorganize the table to improve the clustering factor for one index may degrade the clustering factor of the other index.

So in other terms we can say that Clustering factor is represantation of  number by which Index key value is Snchronized with table Data in manner or order.
It's calclute number of I/Os which will be required to access datablocks if entry accessed using Index key value.

Clustering factor is the relationship between the order of the table and the order of the index.
Indexes are always stored in key value sequence. Tables are stored in the order of the inserts…generally.

Low Clustering Factor 
The minimum possible clustering factor is equal to the number of distinct  blocks identified through the index’s list of rowid’s.  
An index with a low clustering_factor is closely aligned with the table and related rows reside together inside each data block, making indexes very desirable for optimal access.


Low Clustering Factor Index -> as we can see for emp_age=80 only Block1 has to be scanned in Table Block and hence resulting Lower Index Clustering Factor.
                                                          And will take Single I/O
High Clustering Factor
The maximum clustering factor is the number of entries in the index i.e. each rowid points to a different block in the table.  
An index with a high clustering factor is out-of-sequence with the rows in the table and large index range scans will consume lots of I/O.


High Clustering Factor Index -> as we can see for emp_age=80, Block1 & Block3 & Block4  has to be scanned in Table Block and hence resulting Higher Index Clustering Factor.
                                                        And will take three I/Os.

Demo of Clustering Factor

For this demo we will be creating 2 tables among which first table will be ordered entry and second table will be having random allocation in table blocks.

=> Creating Tables with ordered record 
SQL> create table low_clustering (l_id number, l_txt char(10));
           begin
           for i in 1..200 loop
               insert into low_clustering select i, lpad('M', 10, 'M')
               from    dba_objects where rownum < 50;
           end loop;
          end;
          /
Table created.

PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> select count(*) from low_clustering;
  COUNT(*)
----------
      9800
=> Creating Tables with Random ordered record 
SQL> create table high_clustering  as select * from low_clustering  order by dbms_random.random;
Table created.

SQL> select count(*) from high_clustering;

  COUNT(*)
----------
      9800
=> Gather stats for both tables
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'LOW_CLUSTERING');
PL/SQL procedure successfully completed.

EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'HIGH_CLUSTERING');
SQL> 
PL/SQL procedure successfully completed.

=> Check block and Number of records for TABLES

SQL> SELECT    TABLE_NAME, NUM_ROWS, BLOCKS
FROM      USER_TABLES
WHERE     TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING');  

TABLE_NAME			NUM_ROWS	BLOCKS
-------------------------------------------------------
HIGH_CLUSTERING 	9800	    31
LOW_CLUSTERING		9800	    28

=> Creating Indexes for both tables

SQL> CREATE INDEX LOW_CLS_IDX ON LOW_CLUSTERING(L_ID);
Index created.
SQL> CREATE INDEX HIGH_CLS_IDX ON HIGH_CLUSTERING(L_ID);
Index created.

=> Check Clustering Factor for Both indexes as below.

SQL> SELECT INDEX_NAME, TABLE_NAME,BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR
FROM   USER_INDEXES
WHERE  TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING');  2    3  

INDEX_NAME		       TABLE_NAME				    BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------------------------------------- ---------- ----------
HIGH_CLS_IDX		       HIGH_CLUSTERING					 1	    20		    4560
LOW_CLS_IDX		       LOW_CLUSTERING					 1	    20		      35

SQL> 
As we can see Clustering factor for LOW_CLS_IDX is very low as compared to HIGH_CLS_IDX.

Clustering Factor for LOW_CLS_IDX  => As clustering Factor 35 is near to Block count of TABLE LOW_CLUSTERING ,hence it is good candidate for Index Scan,

Clustering Factor for HIGH_CLS_IDX => As clustering Factor is 4560 which is compararitvle very close to NUMBER of ROWS in table and hence it is candidate for Full table scan,

Let we test above statements as below,

=> Using LOW_CLUSTERING table

SQL> explain plan for select * from LOW_CLUSTERING where l_id=1;
Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
---------
Plan hash value: 1092861473
------------------------------------------------------------------------------------------------------
| Id  | Operation			    | Name	     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		    |		     |	  49 |	 686 |	   2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LOW_CLUSTERING |	  49 |	 686 |	   2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN		    | LOW_CLS_IDX    |	  49 |	     |	   1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("L_ID"=1)

14 rows selected.

=> Using HIGH_CLUSTERING table
SQL>  explain plan for select * from HIGH_CLUSTERING  where l_id=1;
Explained.

SQL>  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
------------
Plan hash value: 3779316751
-------------------------------------------------------------------------------------
| Id  | Operation	  | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |		    |	 49 |	686 |	 10   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| HIGH_CLUSTERING |	 49 |	686 |	 10   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("L_ID"=1)
13 rows selected.

As we can see both table is having same data but due to Index Clustering Factor ,Query may go with FULL TABLE SCAN even there is index on that table.
Let we see cost of HIGH_CLS_IDX and why this not picked by optimizer.

SQL> SELECT /*+ index(HIGH_CLUSTERING HIGH_CLS_IDX) */ * FROM HIGH_CLUSTERING WHERE L_ID=1;

SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
-----------
SQL_ID	gku74xupndzbg, child number 0
-------------------------------------
SELECT /*+ index(HIGH_CLUSTERING HIGH_CLS_IDX) */ * FROM
HIGH_CLUSTERING WHERE L_ID=1

Plan hash value: 2701133531
-------------------------------------------------------------------------------------------------------
| Id  | Operation			    | Name	      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		    |		      |       |       |    24 (100)|	      |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| HIGH_CLUSTERING |    49 |   686 |    24	(0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN		    | HIGH_CLS_IDX    |    49 |       |     1	(0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("L_ID"=1)
20 rows selected.
SQL> 

So as we can see due to High cost in selection on INDEX , HIGH_CLS_IDX got dithed by optimizer.

Solution for High CF

There are multiple solutions available for reducing High CF. I will be covering creation of Base table using ORDERD BY cluase for the same as below.

=> Create new table using ORDER BY on Index Key value as below.

SQL> create table HIGH_clustering1  as select * from low_clustering  order by L_ID;
Table created.

=> Create Index for the same and gethering stats
SQL> CREATE INDEX HIGH_CLS_IDX1 ON HIGH_CLUSTERING1(L_ID);
Index created.

SQL> exec dbms_stats.gather_table_stats(USER, 'HIGH_CLUSTERING1',estimate_percent => 100, method_opt=> 'for all indexed columns size 254');
PL/SQL procedure successfully completed.

=> Verifying Details as below.
SELECT INDEX_NAME, TABLE_NAME,BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR
FROM   USER_INDEXES
WHERE  TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING','HIGH_CLUSTERING1');

INDEX_NAME				 TABLE_NAME				      BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
---------------------------------------- ---------------------------------------- ---------- ----------- -----------------
HIGH_CLS_IDX				 HIGH_CLUSTERING				   1	      20	      4560
HIGH_CLS_IDX1				 HIGH_CLUSTERING1				   1	      20		27
LOW_CLS_IDX				 LOW_CLUSTERING 				   1	      20		35

SQL> 

SQL> SELECT    TABLE_NAME, NUM_ROWS, BLOCKS
FROM      USER_TABLES
WHERE     TABLE_NAME IN ('LOW_CLUSTERING','HIGH_CLUSTERING','HIGH_CLUSTERING1');
  2    3  
TABLE_NAME				   NUM_ROWS	BLOCKS
---------------------------------------- ---------- ----------
HIGH_CLUSTERING 			       9800	    31
HIGH_CLUSTERING1			       9800	    31
LOW_CLUSTERING				       9800	    28

=> Selecting Values and checking plan selection.

SQL> SELECT  * FROM ABHI_TEST.HIGH_CLUSTERING1 WHERE L_ID=1;

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------
SQL_ID	0d3rybv1fdb4v, child number 0
-------------------------------------
SELECT	* FROM ABHI_TEST.HIGH_CLUSTERING1 WHERE L_ID=1
Plan hash value: 2167766801
--------------------------------------------------------------------------------------------------------
| Id  | Operation			    | Name	       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		    |		       |       |       |     2 (100)|	       |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| HIGH_CLUSTERING1 |    49 |   686 |     2	 (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN		    | HIGH_CLS_IDX1    |    49 |       |     1	 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("L_ID"=1)
19 rows selected.
Hence we can see Index getting selected by optimzer due to lower Clustering Factor as result of Ordered by cluase on Index key value.