If you find that SYSAUX is growing and its size is too big then, besides figuring out why this has happened (a bug, a disabled purge job, or structural problems in objects), you need to find the affected objects and purge them manually.

The cause of the problem

Two components of Server Manageability (SM) that reside in the SYSAUX tablespace can cause this problem: the Automatic Workload Repository (AWR) and Optimizer Statistics History (OPTSTAT). Both of these components have retention periods associated with their data, and the MMON process should run nightly, as part of the scheduled maintenance tasks, to purge data that exceeds these retention periods. From version 11g onwards, the MMON purging process has been constrained to a time-limited window for each of the purges; if this window is exceeded, the purging stops and an ORA-12751 error is written to an m000 trace file.

For the AWR data, held in tables with names beginning with WRH$, the probable cause is that a number of these tables are partitioned. New partitions are created for them as part of the MMON process. Unfortunately, partition splitting seems to be the final task in the purge process, so when the later partitions are not split they end up containing more data, and partition pruning within the purge process becomes less effective. The second part of the AWR data is the WRM$ tables, which actually hold metadata. In my practice, even if they are big, they are easily fixable directly by Oracle, provided the data in their child WRH$ tables has been fixed first.

For the OPTSTAT data, held in tables with names beginning with WRI$, the problem is also most likely related to the volume of data held in the tables. The WRI$ tables hold historical statistical data for all segments in the database for as long as specified by the stats history retention period.
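Before diving into the objects themselves, it is worth checking which retention periods are actually in force. A quick sketch using the standard Oracle interfaces (dbms_stats.get_stats_history_retention and the dba_hist_wr_control view):

```sql
-- Current optimizer statistics history retention, in days (OPTSTAT)
select dbms_stats.get_stats_history_retention from dual;

-- Current AWR snapshot interval and retention (AWR)
select snap_interval, retention
  from dba_hist_wr_control;
```

If either retention is much longer than you really need, the nightly MMON purge window has that much more data to chew through.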
Thus, if the database contains a large number of tables with a long retention period – say 30 days – then the purge process will have issues trying to purge all of the old statistics within the specified window.

Analyzing the problem

One way to analyze the problem is to use the original Oracle script:

@?/rdbms/admin/awrinfo

For this approach you must have access to the DB server, which is not always possible. Also, this is a fixed script, and not so easy to modify without a deeper understanding of what it does (if you need to adapt it to your needs). So I would use a different approach – a small, modified script:

break on report
compute sum of MB on report

select occupant_desc, space_usage_kbytes/1024 MB
  from v$sysaux_occupants
 where space_usage_kbytes > 0
 order by space_usage_kbytes;

The result is:

OCCUPANT_DESC                                                            MB
---------------------------------------------------------------- ----------
Automated Maintenance Tasks                                               0
Oracle Streams                                                            1
Logical Standby                                                           1
OLAP API History Tables                                                   1
Analytical Workspace Object Table                                         1
PL/SQL Identifier Collection                                              2
Transaction Layer - SCN to TIME mapping                                   3
Unified Job Scheduler                                                    11
LogMiner                                                                 12
Server Manageability - Other Components                                  16
Server Manageability - Advisor Framework                                304
SQL Management Base Schema                                           10,113
Server Manageability - Optimizer Statistics History                  14,179
Server Manageability - Automatic Workload Repository                361,724
                                                                 ----------
sum                                                                 386,369

With this easy script you can focus directly on where to search for solutions.
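It can also help to compare the occupants' sum against the physical size of the tablespace, to see how much of SYSAUX the occupants actually account for. A sketch against the standard dba_data_files view:

```sql
-- Total allocated size of the SYSAUX tablespace datafiles, in MB
select sum(bytes)/1024/1024 MB
  from dba_data_files
 where tablespace_name = 'SYSAUX';
```

A large gap between this figure and the occupants' sum usually points at free (but not yet released) space inside the datafiles.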
However, another simple "top 10 SYS objects by size" script can show you more details at the object level:

col MB for 999G990
col blocks for 9999999999
col segment_name for a30
col partition_name for a30
col segment_type for a20
col tablespace_name for a20

select *
  from (select bytes/1024/1024 MB, blocks, s.segment_name, s.partition_name,
               s.segment_type, s.tablespace_name
          from dba_segments s
         where owner = 'SYS'
         order by bytes desc)
 where rownum <= 10;

Next, find the highest snap_id that is older than the retention period:

select max(snap_id) from sys.WRM$_SNAPSHOT where begin_interval_time <= SYSDATE - 35;

MAX(SNAP_ID)
------------
      259459

So everything below snap_id=259459 is obsolete data and should be removed. (Please be aware that "should" does not actually indicate that those data "can" be removed!) Now let us see how many records are below that snap_id in table sys.WRH$_SQLTEXT:

select /*+ FULL(t) PARALLEL(t, 4) */ count(*)
  from sys.WRH$_SQLTEXT t
 where snap_id < 259459;

The official way to purge such data is:

exec dbms_workload_repository.drop_snapshot_range(low_snap_id => xxxx, high_snap_id => zzzz);

where you indicate a range of xxxx and zzzz, which you can easily retrieve by query:

select min(snap_id), max(snap_id)
  from sys.WRM$_SNAPSHOT
 where begin_interval_time <= SYSDATE - 35;

When that call cannot get through the data volume, my approach is to trace the Oracle purge job and create a script per WRH$ table that preserves the records which must be kept – those still within the retention period and those belonging to baselines (tab.snap_id >= b.start_snap_id AND tab.snap_id <= b.end_snap_id) – and truncates the rest. The master script ends with a post-purge part:

column a_snap_id new_value v_snap_id
select min(snap_id) a_snap_id from sys.WRM$_SNAPSHOT where begin_interval_time >= (trunc(sysdate)-30);
select &&v_snap_id snap_id from dual;

COL a_dbid new_value v_dbid;
SELECT TO_CHAR(dbid) a_dbid FROM gv$database where inst_id=1;
select &&v_dbid dbid from dual;

exec dbms_workload_repository.drop_snapshot_range(low_snap_id => 1, high_snap_id => &&v_snap_id);

select dbms_stats.get_stats_history_availability from dual;
exec dbms_stats.purge_stats(sysdate-30);
select dbms_stats.get_stats_history_availability from dual;

exec dbms_stats.gather_table_stats('SYS','WRM$_DATABASE_INSTANCE');
exec dbms_stats.gather_table_stats('SYS','WRM$_SNAPSHOT');
exec dbms_stats.gather_table_stats('SYS','WRM$_SNAPSHOT_DETAILS');

…which is important to run at the end. And this part fixes the WRM$ tables, directly by Oracle! The log of the execute action is in the master.lst file.
I will show just the end of the script:

17:06:34 SQL> exec dbms_stats.gather_table_stats('SYS','WRH$_TABLESPACE'); --last fix statement of last script

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.23
17:06:34 SQL>
17:06:34 SQL>
17:06:34 SQL> PROMPT Post purge tasks ...
Post purge tasks ...
17:06:34 SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31

Elapsed: 00:00:00.41
17:06:34 SQL>
17:06:34 SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
12-MAY-15 12.57.04.400279000 AM +02:00

Elapsed: 00:00:00.02
17:06:34 SQL>
17:06:34 SQL> column a_snap_id new_value v_snap_id
17:06:34 SQL> select min(snap_id) a_snap_id from sys.WRM$_SNAPSHOT where begin_interval_time>=(trunc(sysdate)-30);

 A_SNAP_ID
----------
    259129

Elapsed: 00:00:00.04
17:06:34 SQL> select &&v_snap_id snap_id from dual;

   SNAP_ID
----------
    259129

Elapsed: 00:00:00.01
17:06:34 SQL>
17:06:34 SQL> COL a_dbid new_value v_dbid;
17:06:34 SQL> SELECT TO_CHAR(dbid) a_dbid FROM gv$database where inst_id=1;

A_DBID
----------------------------------------
928736751

Elapsed: 00:00:00.02
17:06:35 SQL> select &&v_dbid dbid from dual;

      DBID
----------
 928736751

Elapsed: 00:00:00.02
17:06:35 SQL>
17:06:35 SQL> exec dbms_workload_repository.drop_snapshot_range(low_snap_id => 1, high_snap_id=>&&v_snap_id);

PL/SQL procedure successfully completed.

Elapsed: 00:09:31.26
17:16:06 SQL>
17:16:06 SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
24-AUG-17 12.57.04.400279000 AM +02:00

Elapsed: 00:00:00.02
17:16:06 SQL>
17:16:06 SQL> exec dbms_stats.purge_stats(sysdate-31);

PL/SQL procedure successfully completed.
Elapsed: 00:04:37.31
17:20:43 SQL>
17:20:43 SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
24-AUG-17 05.16.06.000000000 PM +02:00

Elapsed: 00:00:00.02
17:20:43 SQL>
17:20:43 SQL> exec dbms_stats.gather_table_stats('SYS','WRM$_DATABASE_INSTANCE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.51
17:20:44 SQL>
17:20:44 SQL> exec dbms_stats.gather_table_stats('SYS','WRM$_SNAPSHOT');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.04
17:20:45 SQL>
17:20:45 SQL> exec dbms_stats.gather_table_stats('SYS','WRM$_SNAPSHOT_DETAILS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:12.85
17:20:58 SQL>

As you can see, in my case the whole action lasted around 10 hours. This depends on the size of your data as well as the speed of your database. The whole master script can be run multiple times as a whole execution (all 115 scripts), but not in parallel. Ideally, the script should be started after the Oracle automatic purge job has finished.

Final checking

After I ran this on my DB, the situation with the SYSAUX occupants was as follows:

OCCUPANT_DESC                                                            MB
---------------------------------------------------------------- ----------
Automated Maintenance Tasks                                               0
Oracle Streams                                                            1
Logical Standby                                                           1
OLAP API History Tables                                                   1
Analytical Workspace Object Table                                         1
PL/SQL Identifier Collection                                              2
Transaction Layer - SCN to TIME mapping                                   3
Unified Job Scheduler                                                    11
LogMiner                                                                 12
Server Manageability - Other Components                                  16
Server Manageability - Advisor Framework                                304
SQL Management Base Schema                                           10,113
Server Manageability - Automatic Workload Repository                 13,738
Server Manageability - Optimizer Statistics History                  14,179
                                                                 ----------
sum                                                                  38,384

The space used by AWR dropped from roughly 370 GB to 13.7 GB, which is about 3.7% of the original size.
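Once the occupants are back to a sane size, it can be worth keeping them there by making sure the retentions match what you actually need. A hedged sketch using the documented Oracle APIs (the values are examples, not recommendations):

```sql
-- Lower the optimizer statistics history retention to 10 days (example value)
exec dbms_stats.alter_stats_history_retention(10);

-- Keep 30 days of AWR snapshots at 60-minute intervals
-- (both parameters are expressed in minutes; 43200 = 30 days)
exec dbms_workload_repository.modify_snapshot_settings(retention => 43200, interval => 60);
```

Shorter retentions give the nightly MMON purge less data to process, making it far more likely to finish inside its time-limited window.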
Graphically it looks like this (before and after the purge):

In terms of number of records (before and after): 1,176,879,395 records versus 44,627,169 records, which is about 3.8% of the original records! Imagine the overhead on the database of deleting 1.1 billion records in a classic delete action. The horror! As you can see, 269,394 MB of space has been recovered and returned to the system.

Purge job

After all is done and fixed, check that the purge job is present and enabled. If it is not, you can create it with a simple script, which is necessary to ensure that the data is purged automatically:

BEGIN
  sys.dbms_scheduler.create_job(
    job_name        => '"SYS"."PURGE_OPTIMIZER_STATS"',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin dbms_stats.purge_stats(sysdate-3); end;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=6;BYMINUTE=0;BYSECOND=0',
    start_date      => systimestamp at time zone 'Europe/Paris',
    job_class       => '"DEFAULT_JOB_CLASS"',
    comments        => 'job to purge old optimizer stats',
    auto_drop       => FALSE,
    enabled         => TRUE);
END;
/

Last but not least

I must admit that in my purge solution there is a short period during which some new AWR data will not be saved. If you look at the code:

create table WRH$_RSRC_PLAN_2 tablespace SOME_TABLESPACE as
( SELECT *
    FROM sys.WRH$_RSRC_PLAN tab
   WHERE (tab.dbid = &&v_dbid AND tab.snap_id >= &&v_snap_id)
      OR EXISTS (SELECT 1
                   FROM sys.wrm$_baseline b
                  WHERE b.dbid = &&v_dbid
                    AND tab.snap_id >= b.start_snap_id
                    AND tab.snap_id <= b.end_snap_id));

truncate table sys.WRH$_RSRC_PLAN drop storage;

select /*+ PARALLEL(t, 4) */ count(*) from WRH$_RSRC_PLAN_2 t;

So, from the moment the script starts to create the WRH$_2 replica table to the moment it truncates the original one, any records newly inserted into the original table will be lost with the truncate.
This is not a problem from my point of view: the window is not long, all other data are still being inserted (into the other WRH$ tables), and if something small is missing from AWR it is not such a big problem – it will be purged in 30 days anyway. To be 100% sure that there will be no AWR activity in the database while you perform the maintenance, Oracle suggests doing it with the database opened in restricted mode, and then returning the DB to its normal open state afterwards:

shutdown immediate;
startup restrict;
@master.sql
shutdown immediate;
startup;

But this is general advice and, IMHO, more oriented to segments that have child/parent records, which was not the case here. Another tip is to stop AWR (the database can stay online, no need to restart it), but in my opinion this is not really necessary either. Don't touch any partitions, regardless of whether or not they are empty, because Oracle will drop them (if needed) by itself.

So, the general approach for any environment that cannot be handled normally with the Oracle purge call is:
1) trace the Oracle purge job
2) create scripts, in order, from the trace results (avoiding the WRM$ tables)
3) use truncate in the scripts while preserving the records that are needed (retention period and baselines)
4) run all the scripts sequentially as the SYS user
5) enable or start the Oracle automatic purge job, which should fix the remaining unfixed records

If you have questions or enhancements to this discussion, I'm open to talk at any time.

Doc references

How to Purge WRH$_SQL_PLAN Table in AWR Repository, Occupying Large Space in SYSAUX Tablespace (Doc ID 1478615.1)
High Storage Consumption for LOBs in SYSAUX Tablespace (Doc ID 396502.1)

Hope this helps someone.

Cheers!
Damir Vadas