Category - TUNING

Role of Selectivity in Index Scan and Full Scan

The Selectivity - Why INDEX is Unselective.

The Selectivity plays major role in Selection on INDEX during Query Optimization.
The selectivity represents fraction of rows in row-set results.
The row-set could be table / views / joins from other tables.
Selectivity is tightly coupled with PREDICATES as like where name=’abhi’.
Selectivity ranges from 0.0 to 1.0.
A selectivity of 0.0 means that no rows are selected from a row set, whereas a selectivity of 1.0 means that all rows are selected.
A predicate becomes more selective as the value approaches 0.0 and less selective (or more unselective) as the value approaches 1.0.

 

Selective and Unselective Indexes

An index could become Selective if

There are less Duplicate data in INDEXED columns.
Data uniqueness is pretty good
There are very less count of number for each distinct value in Index.

An Index could become Unselective if

There are large Duplicate data in INDEXES columns.
Data uniqueness is very bad
There are very large counts of numbers for each distinct value in INDEX.

INDEX UNSELECTIVITY CAUSE FULL TABLE SCAN

The Optimizer may choose FULL table scan on basis of INDEX selectivity. If Index in unselective it will choose FULL TABLE SCAN.

DEMOSTRATION

As below example states that there are total 2500 Rows in table EMP_NAMES and rows distribution as per screenshot.

SQL> select emp_lname,count(*) from emp_names group by emp_lname;

EMP_LNAME    COUNT(*)
---------- ----------
REDDY		  300
DIWAN		 1500
TIWARI		  500
SINGH		 4200

There is index on column emp_lname as in screenshot.

SQL> SET LINE 500 PAGES 500
SQL> COL INDEX_NAME FOR A30
SQL> COL COLUMN_NAME FOR A20
SQL> COL TABLE_NAME FOR A20
SQL> select index_name,column_name,table_name from dba_ind_columns where table_name='EMP_NAMES';

INDEX_NAME		       COLUMN_NAME	    TABLE_NAME
------------------------------ -------------------- --------------------
IDX_SELECTIVE		    EMP_LNAME	    EMP_NAMES

Let we Predict SELECTIVITY with below SQL for predicates.
SQL> @selectivity_check.sql

EMP_LNAME  COUNT_OF_EMP_LNAME TOTAL_RECORDS SELECTIVITY_RATIO SELECTIVITY
---------- ------------------ ------------- ----------------- -----------
REDDY			  300	       6500	   .046153846 SELECTIVE
DIWAN			 1500	       6500	   .230769231 UNSELECTIVE
TIWARI			  500	       6500	   .076923077 SELECTIVE
SINGH			 4200	       6500	   .646153846 UNSELECTIVE

Now we will select data for each and every values and test whether INDEX is being used or not as per SELECTIVITY.

AS we select EMP_LNAME=“DIWAN” which is having SELECTIVITY of 0.2 which is NEAR to UNSELECTIVITY ,hence INDEX is  NOT being used.

SQL> EXPLAIN PLAN FOR SELECT * FROM EMP_NAMES WHERE EMP_LNAME='DIWAN';

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
--------------
Plan hash value: 2618431616
-------------------------------------------------------------------------------
| Id  | Operation	  | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	      |  1497 | 14970 |     7	(0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP_NAMES |  1497 | 14970 |     7	(0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("EMP_LNAME"='DIWAN')

13 rows selected.

Now as we used predicate EMP_LNAME=’REDDY’ which is having selectivity ratio of .04 ,hence INDEX is being used for that selection as per below screenshots.

SQL> EXPLAIN PLAN FOR SELECT * FROM EMP_NAMES WHERE EMP_LNAME='REDDY';
Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
----------
Plan hash value: 4062229589

-----------------------------------------------------------------------------------------------------
| Id  | Operation			    | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		    |		    |	284 |  2840 |	  3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMP_NAMES     |	284 |  2840 |	  3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN		    | IDX_SELECTIVE |	286 |	    |	  2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EMP_LNAME"='REDDY')

14 rows selected.

After using predicate EMP_LNAME=’SINGH’ results as below screenshot.

SQL>  EXPLAIN PLAN FOR SELECT * FROM EMP_NAMES WHERE EMP_LNAME='SINGH';

Explained.

SQL>  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
----------
Plan hash value: 2618431616
-------------------------------------------------------------------------------
| Id  | Operation	  | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	      |  4167 | 41670 |     7	(0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP_NAMES |  4167 | 41670 |     7	(0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("EMP_LNAME"='SINGH')

13 rows selected.

Below is selectivity script ** this is only for testing and demo **

SQL> !cat selectivity_check.sql

Select distinct EMP_LNAME,count(EMP_LNAME) COUNT_OF_EMP_LNAME,(SELECT COUNT(*) FROM EMP_NAMES) TOTAL_RECORDS ,
Count(EMP_LNAME)/(select count(*) from emp_names) SELECTIVITY_RATIO ,
CASE WHEN Count(EMP_LNAME)/(select count(*) from emp_names) > .1 
THEN 'UNSELECTIVE' ELSE 'SELECTIVE' END AS SELECTIVITY FROM emp_names group by EMP_LNAME;