Category - TUNING

Online Statistics Gathering

Online Statistics Gathering

In some circumstances, DDL and DML operations automatically trigger online statistics gathering.Few Operations are mentioned as below

=> Online Statistics Gathering during CTAS (Create Table As Select)

CTAS
The database can gather table statistics automatically during the following types of bulk loads CREATE TABLE AS SELECT  as we can see below.

SQL> SELECT COUNT(*) FROM EMP;

COUNT(*) 
---------- 
9003 

SQL> create table stats_test as select * from emp;
Table created.

SQL> SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='STATS_TEST';

  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- --------------------
      9003 31-MAR-22 STATS_TEST
SQL> 

=> Online Statistics Gathering during INSERT INTO ...SELECT

Direct Path Insert
The database can gather table statistics automatically during the following types of bulk loads: INSERT INTO ... SELECT using a direct path insert.

By default, a parallel insert uses a direct path insert. You can force a direct path insert by using the /*+APPEND*/ hint. 

SQL>  create table online_stats as select * from EMP where 1=2;
Table created.

SQL> SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='ONLINE_STATS';
  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- --------------------
	 0 31-MAR-22 ONLINE_STATS

SQL> insert  /*+ APPEND */  INTO online_stats SELECT * FROM EMP;
9003 rows created.

SQL> COMMIT;
Commit complete.

SQL> SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='ONLINE_STATS';
  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- --------------------
      9003 31-MAR-22 ONLINE_STATS
SQL> 

Now let we test without using APPEND hints.

SQL>  drop table online_stats;
Table dropped.

SQL> create table online_stats as select * from EMP where 1=2;
Table created.

SQL> SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='ONLINE_STATS';
  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- --------------------
	 0 31-MAR-22 ONLINE_STATS

SQL> insert  INTO online_stats SELECT * FROM EMP;
9003 rows created.

SQL> COMMIT;
Commit complete.

SQL>  SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='ONLINE_STATS';
  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- --------------------
	 0 31-MAR-22 ONLINE_STATS

SQL> 
As we can see without APPEND hints NUM_ROWS are not updated.

=> Online Global Statistics When working with Partitioned Tables

When inserting rows into a partitioned table, the database gathers global statistics during the insert.
However, the database does not gather partition-level statistics.

SQL> select count(*) from TEST;
  COUNT(*)
----------
       101
SQL>
SQL> CREATE TABLE global_stat
(id            NUMBER)
PARTITION BY RANGE (ID)
(PARTITION less_than_50 VALUES LESS THAN (51),
 PARTITION less_than_100 VALUES LESS THAN (101));
Table created.

SQL> INSERT /*+ APPEND */ INTO global_stat SELECT ID FROM  test;
101 rows created.

SQL> commit;
Commit complete.

SQL>  select table_name,last_analyzed,NUM_ROWS from dba_tables where table_name='GLOBAL_STAT';
TABLE_NAME           LAST_ANAL   NUM_ROWS
-------------------- --------- ----------
GLOBAL_STAT          05-APR-22        101

SQL>
COLUMN table_name FORMAT A20
COLUMN partition_name FORMAT A20

SELECT table_name,
       partition_name,
       num_rows
FROM   dba_tab_partitions
WHERE  table_name = 'GLOBAL_STAT'
ORDER BY partition_name;

TABLE_NAME           PARTITION_NAME         NUM_ROWS
-------------------- -------------------- ----------
GLOBAL_STAT          LESS_THAN_100
GLOBAL_STAT          LESS_THAN_50

When insert rows into a specific partition or subpartition on a partitioned table,the database gathers statistics on the partition 
during the insert.However, the database does not gather global statistics.

SQL> select count(*) from TEST;
  COUNT(*)
----------
       101
       
SQL> CREATE TABLE only_partition_stat
(id            NUMBER)
PARTITION BY RANGE (ID)
(PARTITION less_than_50 VALUES LESS THAN (51),
 PARTITION less_than_100 VALUES LESS THAN (101)); 
Table created.

SQL>
SQL> select table_name,last_analyzed,NUM_ROWS from dba_tables where table_name='ONLY_PARTITION_STAT';
TABLE_NAME           LAST_ANAL   NUM_ROWS
-------------------- --------- ----------
ONLY_PARTITION_STAT

SQL> INSERT /*+ APPEND */ INTO ONLY_PARTITION_STAT PARTITION (less_than_50) SELECT ID FROM  test WHERE ID < 51;
51 rows created.

SQL> COMMIT;
Commit complete.

SQL>  select table_name,last_analyzed,NUM_ROWS from dba_tables where table_name='ONLY_PARTITION_STAT';
TABLE_NAME           LAST_ANAL   NUM_ROWS
-------------------- --------- ----------
ONLY_PARTITION_STAT

SQL> SELECT table_name,
       partition_name,
       num_rows
FROM   dba_tab_partitions
WHERE  table_name = 'ONLY_PARTITION_STAT'
ORDER BY partition_name; 
TABLE_NAME           PARTITION_NAME         NUM_ROWS
-------------------- -------------------- ----------
ONLY_PARTITION_STAT  LESS_THAN_100
ONLY_PARTITION_STAT  LESS_THAN_50                 51

=> Hints for Online stat gathering 

By default, the database gathers statistics during bulk loads as we have seen above.

We can enable the feature at the statement level by using the GATHER_OPTIMIZER_STATISTICS hint. 
We can disable the feature at the statement level by using the NO_GATHER_OPTIMIZER_STATISTICS hint. 

SQL> CREATE TABLE TEST1 AS SELECT * FROM TEST WHERE 1=2;
Table created.

SQL>
SQL> SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='TEST1';

  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- ----------------------------------------
                     TEST1

SQL> insert  /*+ APPEND */  INTO test1 SELECT /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * FROM test;
101 rows created.

SQL> commit;
Commit complete.

SQL>  SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='TEST1';
 NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- ----------------------------------------
                     TEST1
SQL>

=> Restrictions for Online Statistics Gathering for Bulk Loads

-> Specifically, bulk loads do not gather statistics automatically when any of the following conditions applies to the target table, 
partition, or subpartition:
-> The object contains data. Bulk loads only gather online statistics automatically when the object is empty.
-> It is in an Oracle-owned schema such as SYS.
-> It is one of the following types of tables: nested table, index-organized table (IOT), external table, or global temporary table 
defined as ON COMMIT DELETE ROWS.
-> Its statistics are locked.
-> It is loaded using a multitable INSERT statement.

Real-Time Statistics Gathering (New in 19c and Availalble on Exadata)

As we have seen above if we are using Direct Insert and CTAS in that case Bulk Load operation will collect statistics.
But for Conventional Load no statistics are being updated .
So to correct this Oracle Database came up with real-time statistics gathering during conventional DML operations 
and this is valid for Exadata flavors only  as per Licensing Information.

Oracle Database 19c introduces real-time statistics, which extend online support to conventional DML statements. 
Because statistics can go stale between DBMS_STATS jobs, real-time statistics help the optimizer generate more optimal plans.
Whereas bulk load operations gather all necessary statistics, real-time statistics augment rather than replace traditional statistics. 
For this reason, you must continue to gather statistics regularly using DBMS_STATS, preferably using the AutoTask job.

Enabling Exa Feature for TESTING Purpose

As stated earlier this is Exa Feature let we convert Our Test database to support Exadata Feature as below.

SQL> alter system set "_exadata_feature_on"=true scope=spfile;
System altered.

SQL> shutdown immediate;

Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup;

ORACLE instance started.
Total System Global Area 3388995104 bytes
Fixed Size            8902176 bytes
Variable Size          687865856 bytes
Database Buffers     2684354560 bytes
Redo Buffers            7872512 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
[oracle@ace2oracledb ~]$

Real-Time stats Testing and Setups

Step 1=> Create table as below.
SQL> CREATE TABLE ONLINE_REAL_STATS (
  emp_id           NUMBER,
  emp_desc  VARCHAR2(50)
); 

Table created.

Note -> Before getting Real-Time Statistics normal statistics is reuqired. So we will first try to insert data with APPEND hint to populate statistics.
Step 2=> Insert Sample record using APPEND HINTS .

SQL> INSERT /*+ APPEND */  INTO REAL_STATS
SELECT level+1000 AS emp_id,'Description of ' || level AS emp_desc
FROM   dual
CONNECT BY level <= 1000;
1000 rows created.

SQL> commit;
Commit complete.

SQL> SELECT NUM_ROWS,last_analyzed,TABLE_NAME from dba_Tables where table_name='REAL_STATS';
  NUM_ROWS LAST_ANAL TABLE_NAME
---------- --------- -------------------------------------------------
      1000 22-MAY-22 REAL_STATS

SQL> SELECT NUM_ROWS, BLOCKS, NOTES 
FROM   USER_TAB_STATISTICS
WHERE  TABLE_NAME = 'REAL_STATS';
  NUM_ROWS     BLOCKS NOTES
---------- ---------- -------------------------
      1000	    7

SQL> SET PAGESIZE 5000
SET LINESIZE 200
COL COLUMN_NAME FORMAT a13 
COL LOW_VALUE FORMAT a14
COL HIGH_VALUE FORMAT a14
COL NOTES FORMAT a25
COL PARTITION_NAME FORMAT a13

SELECT COLUMN_NAME, LOW_VALUE, HIGH_VALUE, SAMPLE_SIZE, NOTES
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'REAL_STATS'
ORDER BY 1, 5;

COLUMN_NAME   LOW_VALUE      HIGH_VALUE     SAMPLE_SIZE NOTES
------------- -------------- -------------- ----------- -------------------------
EMP_DESC      44657363726970 44657363726970	   1000 STATS_ON_LOAD
	      74696F6E206F66 74696F6E206F66
	      2031	     20393939
EMP_ID	      C20B02	     C215		   1000 STATS_ON_LOAD

SQL> 

Step 3=> As we can see Normal Statistics are now available let we  Insert Sample record now as below .

SQL> INSERT  INTO REAL_STATS
SELECT level+100 AS emp_id,
       'Description of ' || level AS emp_desc
FROM   dual
CONNECT BY level <= 100; 

100 rows created.

SQL> commit;
Commit complete.

SQL> 

Step 4=> Check Statistics post above operation , And we can see in Note Section "STATS_ON_CONVENTIONAL_DML"

SQL> SET PAGESIZE 5000
SET LINESIZE 200
COL COLUMN_NAME FORMAT a13 
COL LOW_VALUE FORMAT a14
COL HIGH_VALUE FORMAT a14
COL NOTES FORMAT a25
COL PARTITION_NAME FORMAT a13

SELECT COLUMN_NAME, LOW_VALUE, HIGH_VALUE, SAMPLE_SIZE, NOTES
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'REAL_STATS'
ORDER BY 1, 5;

SELECT NUM_ROWS, BLOCKS, NOTES 
FROM   USER_TAB_STATISTICS
WHERE  TABLE_NAME = 'REAL_STATS';

SQL> SQL> SQL> SQL> SQL> SQL> SQL> SQL>   2    3    4  
COLUMN_NAME   LOW_VALUE      HIGH_VALUE     SAMPLE_SIZE NOTES
------------- -------------- -------------- ----------- -------------------------
EMP_DESC      44657363726970 44657363726970	      3 STATS_ON_CONVENTIONAL_DML
	      74696F6E206F66 74696F6E206F66
	      2031	     20393939
EMP_DESC      44657363726970 44657363726970	   1000 STATS_ON_LOAD
	      74696F6E206F66 74696F6E206F66
	      2031	     20393939
EMP_ID	      C20217	     C215		      3 STATS_ON_CONVENTIONAL_DML
EMP_ID	      C20B02	     C215		   1000 STATS_ON_LOAD


Step 5=> If we fetch records from table and check estimation by Optimizer we can see 
         Estimation is Not correct as we still did not Flushed stats.

SQL> select count(*) from REAL_STATS;

  COUNT(*)
----------
      1100

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
SQL_ID	fdjt9jym54d10, child number 0
-------------------------------------
select count(*) from REAL_STATS
Plan hash value: 1309114426
-------------------------------------------------------------------------
| Id  | Operation	   | Name	| Rows	| Cost (%CPU)| Time	|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |		|	|     4 (100)|		|
|   1 |  SORT AGGREGATE    |		|     1 |	     |		|
|   2 |   TABLE ACCESS FULL| REAL_STATS |  1000 |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------
14 rows selected.

Step 6=>Let we FLush Statistics and retry to execute above query .
         We can see Estimation is correct as below.

SQL> select /*Testing*/ count(*) from REAL_STATS;

  COUNT(*)
----------
      1100
SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------
SQL_ID	gqpps182p8wad, child number 0
-------------------------------------
select /*Testing*/ count(*) from REAL_STATS
Plan hash value: 1309114426

-------------------------------------------------------------------------
| Id  | Operation	   | Name	| Rows	| Cost (%CPU)| Time	|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |		|	|     4 (100)|		|
|   1 |  SORT AGGREGATE    |		|     1 |	     |		|
|   2 |   TABLE ACCESS FULL| REAL_STATS |  1100 |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------
Note
-----
   - dynamic statistics used: statistics for conventional DML
18 rows selected.
SQL> 

Step 7=>  Check if REAL-TIME STATISTICS being GATHERED on CONVENTIONAL DML
       Let we insert few more records and see explain plan as below.
       And as we can see REAL-TIME STATISTICS being GATHERED on CONVENTIONAL DML

SQL> INSERT  INTO REAL_STATS
SELECT level+200 AS emp_id,
       'Description of ' || level AS emp_desc
FROM   dual
CONNECT BY level <= 200;

200 rows created.
SQL> COMMIT; 
Commit complete.

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------
SQL_ID	3tb0htwqmk30z, child number 0
-------------------------------------
INSERT	INTO REAL_STATS SELECT level+200 AS emp_id,	   'Description
of ' || level AS emp_desc FROM	 dual CONNECT BY level <= 200
Plan hash value: 1236776825

---------------------------------------------------------------------------------------
| Id  | Operation			 | Name       | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT		 |	      |       |     2 (100)|	      |
|   1 |  LOAD TABLE CONVENTIONAL	 | REAL_STATS |       | 	   |	      |
|   2 |   OPTIMIZER STATISTICS GATHERING |	      |     1 |     2	(0)| 00:00:01 |
|   3 |    CONNECT BY WITHOUT FILTERING  |	      |       | 	   |	      |
|   4 |     FAST DUAL			 |	      |     1 |     2	(0)| 00:00:01 |
---------------------------------------------------------------------------------------
Note
-----
   - dynamic statistics used: statistics for conventional DML
21 rows selected.

SQL> 

Disbaling Exa Feature for TESTING Purpose
SQL> alter system reset "_exadata_feature_on" scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup;
ORACLE instance started.

Total System Global Area 3388995104 bytes
Fixed Size		    8902176 bytes
Variable Size		  687865856 bytes
Database Buffers	 2684354560 bytes
Redo Buffers		    7872512 bytes
Database mounted.
Database opened.