Introduction

In my last article, we explored the architecture of GIMR in 12c. This article describes the various options available to manage and maintain the Grid Infrastructure Management Repository (GIMR).

Oracle provides a command-line utility called OCLUMON (Oracle Cluster Monitor), which is part of the CHM (Cluster Health Monitor) component and can be used to perform miscellaneous administrative tasks such as changing the debug levels of logs, changing the repository size/retention, querying the repository path, and so on. Apart from the OCLUMON utility, we have a set of SRVCTL commands which can be used to perform various administrative tasks on the management repository resources. In the upcoming sections, we are going to explore both the OCLUMON and SRVCTL utilities for administering the GIMR repository and its resources.

How to find repository version

Cluster Health Monitor (CHM) is the primary component that collects Clusterware diagnostic data and persists that data in the repository database (MGMTDB). Oracle provides a utility called OCLUMON, which can be used to manage the CHM components as well as their associated diagnostic repository. We can use the following command to find the version of the OCLUMON utility, which in turn tells us the version of CHM and its repository.

---// command to find OCLUMON version //---
$GRID_HOME/bin/oclumon version

Example:

---// checking CHM version //---
myracserver1 {/home/oracle}: oclumon version
Cluster Health Monitor (OS), Version 12.1.0.2.0 - Production Copyright 2007, 2014 Oracle. All rights reserved.

How to find repository location

CHM persists the diagnostic data in the management repository database (MGMTDB), which consists of a set of datafiles.
We can use the following OCLUMON command to locate the database file (datafile) in the MGMTDB database which is associated with the GIMR repository.

---// command to find repository path //---
$GRID_HOME/bin/oclumon manage -get reppath

Example:

---// locating GIMR repository path //---
myracserver2 {/home/oracle}: oclumon manage -get reppath
CHM Repository Path = /data/clusterfiles/_MGMTDB/datafile/o1_mf_sysmgmtd__374325064041_.dbf
myracserver2 {/home/oracle}:

From this output, we can also verify that this file actually belongs to the pluggable database (PDB) created during MGMTDB database creation.

---// validating repository path against MGMTDB //---
SQL> select con_id,name,open_mode
  2  from v$pdbs
  3  where con_id=
  4  (
  5  select con_id
  6  from v$datafile
  7  where name='/data/clusterfiles/_MGMTDB/datafile/o1_mf_sysmgmtd__374325064041_.dbf'
  8  );

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         3 MY_RAC_CLUSTER                 READ WRITE

How to find repository size/retention

The diagnostic data in the GIMR repository database is retained based on the size/retention defined for the repository. Once the size/retention threshold is reached, the diagnostic data is overwritten. We can use the following OCLUMON command to find the current size of the GIMR repository.

---// command to find repository size/retention //---
$GRID_HOME/bin/oclumon manage -get repsize

Example:

---// finding repository retention/size //---
myracserver2 {/home/oracle}: oclumon manage -get repsize
CHM Repository Size = 136320 seconds
myracserver2 {/home/oracle}:

Here is the catch. OCLUMON never shows the size of the repository in terms of storage units (KB/MB/GB); rather, it displays the size of the repository in terms of duration (in seconds). This duration indicates the retention time of the repository data. OCLUMON basically queries the size of the repository, determines how long it can retain data for the current repository size, and displays that information to the user.
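Since the repsize figure is reported in seconds, it can be handy to translate it into more familiar units. A minimal sketch using plain shell arithmetic (the 136320-second value is the one reported above):

```shell
# Convert the retention reported by 'oclumon manage -get repsize'
# (in seconds) into days and hours using integer arithmetic.
retention=136320
echo "$((retention / 86400)) day(s) $((retention % 86400 / 3600)) hour(s)"
```

So the default 136320-second retention works out to a little over a day and a half of diagnostic history.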
To know the actual size of the repository, we can query the database directly as shown below.

---// query MGMTDB database to find repository size //---
SQL> alter session set container=MY_RAC_CLUSTER;

Session altered.

SQL> show con_name

CON_NAME
------------------------------
MY_RAC_CLUSTER

SQL> select TABLESPACE_NAME,FILE_NAME,BYTES/1024/1024 Size_MB,MAXBYTES/1024/1024 Max_MB,AUTOEXTENSIBLE
  2  from dba_data_files
  3  where file_name='/data/clusterfiles/_MGMTDB/datafile/o1_mf_sysmgmtd__374325064041_.dbf';

TABLESPACE_NAME  FILE_NAME                                                                 SIZE_MB  MAX_MB AUT
---------------- ------------------------------------------------------------------------ -------- ------- ---
SYSMGMTDATA      /data/clusterfiles/_MGMTDB/datafile/o1_mf_sysmgmtd__374325064041_.dbf        2048       0 NO

Note: Replace the container name with your cluster name and file_name with the output of reppath.

We can see that our repository is 2 GB in size and the datafile associated with the repository is not AUTOEXTENSIBLE.

Observation: By default, Oracle creates the repository with a size of 2 GB (136320 seconds of retention) for a two-node cluster, regardless of space availability on the underlying file system.

How to change repository size

We may want to retain the diagnostic data for a specific number of days. In that case, we can increase (change) the repository size to accommodate more diagnostic data using the following OCLUMON command.

---// command to change repository size //---
$GRID_HOME/bin/oclumon manage -repos changerepossize <size_in_MB>

Example:

---// changing repository size //---
myracserver2 {/home/oracle}: oclumon manage -repos changerepossize 2200
The Cluster Health Monitor repository was successfully resized. The new retention is 146400 seconds.
myracserver2 {/home/oracle}:

This command acts in a dual mode: it first resizes the repository to the specified size (in MB) and then recalculates the retention of the repository based on the new repository size.
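From the two data points in this section (2048 MB yields 136320 seconds, 2200 MB yields 146400 seconds), the retention appears to scale roughly linearly with repository size for a given cluster, at about 66.5 seconds per MB on this two-node cluster. This ratio is only an observation from the outputs shown, not a documented formula, but it lets us sketch a quick estimate:

```shell
# Rough retention estimate for this two-node cluster, assuming the
# observed ratio of 136320 seconds per 2048 MB (about 66.5 s/MB).
# This is an extrapolation from the outputs above, not a documented formula.
size_mb=2200
est_retention=$((size_mb * 136320 / 2048))
echo "${est_retention} seconds"
```

The estimate lands within a few dozen seconds of the 146400-second figure OCLUMON actually reported after the resize, which suggests the recalculation is close to linear for a fixed node count.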
As we can see here, since we increased the size of the repository from 2048 MB (the default) to 2200 MB, Oracle recalculated the retention against the new size and increased it from 136320 seconds (the default) to 146400 seconds. We can also validate the retention following a resize operation.

---// validating new repository size/retention //---
myracserver2 {/home/oracle}: oclumon manage -get repsize
CHM Repository Size = 146400 seconds
myracserver2 {/home/oracle}:

Internals of a repository resize operation

What did Oracle do to the MGMTDB database during the resize operation? Well, here is what it did.

---// impact of size change in the repository database //---
SQL> select TABLESPACE_NAME,FILE_NAME,BYTES/1024/1024 Size_MB,MAXBYTES/1024/1024 Max_MB,AUTOEXTENSIBLE
  2  from dba_data_files
  3  where file_name='/data/clusterfiles/_MGMTDB/datafile/o1_mf_sysmgmtd__374325064041_.dbf';

TABLESPACE_NAME  FILE_NAME                                                                 SIZE_MB  MAX_MB AUT
---------------- ------------------------------------------------------------------------ -------- ------- ---
SYSMGMTDATA      /data/clusterfiles/_MGMTDB/datafile/o1_mf_sysmgmtd__374325064041_.dbf        2200       0 NO

Oracle resized the datafile in the database internally. We can also verify this by viewing the MGMTDB database alert log file.

How to change repository retention

Technically, there is no command available to directly change the retention of the data stored in the repository. However, there is an alternative way to do it. We can use the OCLUMON utility to check whether a desired retention can be set for the repository using the following command.

---// command to check if a specific retention can be set //---
$GRID_HOME/bin/oclumon manage -repos checkretentiontime <time_in_seconds>

Example:

---// checking if retention 260000 secs can be set //---
myracserver2 {/home/oracle}: oclumon manage -repos checkretentiontime 260000
The Cluster Health Monitor repository is too small for the desired retention.
Please first resize the repository to 3908 MB

You have probably figured it out already. I wanted to change the retention of the repository to 260000 seconds, so I used the command "oclumon manage -repos checkretentiontime 260000" to see whether that retention could be set. Oracle came back and asked me to increase the size of the repository to 3908 MB in order to be able to set that retention.

Here is the simple interpretation. Changing the repository retention period is a two-phase process:

1. Use checkretentiontime to find how much more space needs to be added to the repository to satisfy the desired retention.
2. Use changerepossize to change the size of the repository in order to meet the desired retention.

If the desired retention is less than the current retention, checkretentiontime will show an output like the one below.

---// checking if retention 136320 secs can be set //---
myracserver2 {/home/oracle}: oclumon manage -repos checkretentiontime 136320
The Cluster Health Monitor repository can support the desired retention for 2 hosts

How to purge repository data

There is no need to manually purge the repository, as this is automatically taken care of by the cluster logger service (ologgerd) based on the repository size and retention setup. However, if desired, we can simulate a purge of the repository by decreasing the repository size with the OCLUMON changerepossize command, as shown below.

---// trick to manually purge repository data //---
myracserver2 {/home/oracle}: oclumon manage -repos changerepossize 100
Warning: Entire data in Cluster Health Monitor repository will be deleted. Do you want to continue (Yes/No)? No
Operation aborted on user request

What we tried to do here is decrease the size of the GIMR repository, which in turn deletes all the data stored in the repository. Once the data is purged, we can revert the repository size to the required value.
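The two-phase retention change described above can also be scripted by parsing the suggested size out of the checkretentiontime message and feeding it to changerepossize. A minimal sketch, assuming the message format shown above (the exact wording may vary between versions, so treat the pattern as an assumption):

```shell
# Sketch of phase 1: extract the repository size (in MB) that
# checkretentiontime suggests. The message text below is the one
# shown in the example above; real output wording may vary by version.
msg="The Cluster Health Monitor repository is too small for the desired retention. Please first resize the repository to 3908 MB"
needed_mb=$(echo "$msg" | grep -Eo '[0-9]+ MB' | awk '{print $1}')
echo "$needed_mb"
# Phase 2 would then be:
#   oclumon manage -repos changerepossize "$needed_mb"
```

In practice you would capture `msg` from the live checkretentiontime run and skip phase 2 entirely when the output says the repository can already support the desired retention.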
How to locate the cluster logger service

We know that the cluster logger service (ologgerd) of the Cluster Health Monitor (CHM) component is responsible for persisting the diagnostic data collected by the system monitor service (osysmond) in the repository (MGMTDB). There is one cluster logger service (ologgerd) running per 32 nodes in a cluster. We can use the following OCLUMON commands to query where the cluster logger services (ologgerd) are running.

---// commands to locate cluster logger services //---
$GRID_HOME/bin/oclumon manage -get alllogger -details   (lists all logger services available in the cluster)
$GRID_HOME/bin/oclumon manage -get mylogger             (lists the logger service for the current cluster node)

Example:

---// listing all logger services in the cluster //---
myracserver2 {/home/oracle}: oclumon manage -get alllogger -details

Logger = myracserver2
Nodes = myracserver1,myracserver2

In this particular example, I have only one cluster logger service (ologgerd) running for my cluster, on node myracserver2, and it is logging diagnostic data for nodes myracserver1 and myracserver2.

How to change the logging level

We know that Cluster Health Monitor (CHM) monitors real-time operating system and Clusterware metrics and logs them in the GIMR repository database. By default, the CHM logging level is set to 1, which collects basic diagnostic data. At times we may need to change the CHM logging level to collect extended diagnostic data.
That can be done using the following OCLUMON command.

---// command to change CHM logging levels //---
$GRID_HOME/bin/oclumon debug [log daemon module:log_level]

The supported daemons, along with their respective modules and log levels, are listed below.

DAEMON     MODULE                              LOG LEVEL
osysmond   CRFMOND, CRFM, allcomp              0, 1, 2, 3
ologgerd   CRFLOGD, CRFLDREP, CRFM, allcomp    0, 1, 2, 3
client     OCLUMON, CRFM, allcomp              0, 1, 2, 3
all        allcomp                             0, 1, 2, 3

Example: The following command sets the logging level of the cluster logger service (ologgerd) to level 3.

---// changing CHM loggerd logging to level 3 //---
myracserver2 {/home/oracle}: oclumon debug log ologgerd CRFLOGD:3

Manage repository resources with SRVCTL commands

With the introduction of GIMR, two additional resources, ora.mgmtdb and ora.MGMTLSNR, are added to the Clusterware stack. Oracle provides a dedicated set of SRVCTL commands to monitor and manage these two new Clusterware resources. Following is the new set of SRVCTL commands specific to the GIMR resources (MGMTDB and MGMTLSNR).

---// list of srvctl commands available to operate on GIMR resources //---
myracserver2 {/home/oracle}: srvctl -h | grep -i mgmt | sort | awk -F ":" '{print $2}'
srvctl add mgmtdb [-domain ]
srvctl add mgmtlsnr [-endpoints "[TCP
srvctl config mgmtdb [-verbose] [-all]
srvctl config mgmtlsnr [-all]
srvctl disable mgmtdb [-node ]
srvctl disable mgmtlsnr [-node ]
srvctl enable mgmtdb [-node ]
srvctl enable mgmtlsnr [-node ]
srvctl getenv mgmtdb [-envs "[,...]"]
srvctl getenv mgmtlsnr [ -envs "[,...]"]
srvctl modify mgmtdb [-pwfile ] [-spfile ]
srvctl modify mgmtlsnr -endpoints "[TCP
srvctl relocate mgmtdb [-node ]
srvctl remove mgmtdb [-force] [-noprompt] [-verbose]
srvctl remove mgmtlsnr [-force]
srvctl setenv mgmtdb {-envs "=[,...]" | -env ""}
srvctl setenv mgmtlsnr { -envs "=[,...]" | -env "="}
srvctl start mgmtdb [-startoption ] [-node ]
srvctl start mgmtlsnr [-node ]
srvctl status mgmtdb [-verbose]
srvctl status mgmtlsnr [-verbose]
srvctl stop mgmtdb [-stopoption ] [-force]
srvctl stop mgmtlsnr [-node ] [-force]
srvctl unsetenv mgmtdb -envs "[,..]"
srvctl unsetenv mgmtlsnr -envs "[,...]"
srvctl update mgmtdb -startoption
myracserver2 {/home/oracle}:

Let's go through a few examples to familiarize ourselves with this new set of commands. We can use the SRVCTL STATUS command to find the current status of the repository database and listener, as shown below.

---// checking MGMTDB status //---
myracserver2 {/home/oracle}: srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node myracserver2

---// checking MGMTLSNR status //---
myracserver2 {/home/oracle}: srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): myracserver2

We can use the SRVCTL CONFIG commands to find out the current configuration of the repository database and listener, as shown below.

---// finding configuration of MGMTDB //---
myracserver2 {/home/oracle}: srvctl config mgmtdb
Database unique name: _mgmtdb
Database name:
Oracle home:
Oracle user: oracle
Spfile: /data/clusterfiles/_mgmtdb/spfile-MGMTDB.ora
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Type: Management
PDB name: my_rac_cluster
PDB service: my_rac_cluster
Cluster name: my-rac-cluster
Database instance: -MGMTDB

---// finding configuration of MGMTLSNR //---
myracserver2 {/home/oracle}: srvctl config MGMTLSNR
Name: MGMTLSNR
Type: Management Listener
Owner: oracle
Home:
End points: TCP:1521
Management listener is enabled.
Management listener is individually enabled on nodes:
Management listener is individually disabled on nodes:

Note: It is not recommended to modify the default configuration of MGMTDB. However, we may choose to modify the default configuration of MGMTLSNR to change the listener port (by default it listens on port 1521), as shown below.
---// change listener port for MGMTLSNR //---
myracserver2 {/home/oracle}: srvctl modify MGMTLSNR -endpoints "TCP:1540"

---// validate new MGMTLSNR configuration //---
myracserver2 {/home/oracle}: srvctl config MGMTLSNR
Name: MGMTLSNR
Type: Management Listener
Owner: oracle
Home:
End points: TCP:1540
Management listener is enabled.
Management listener is individually enabled on nodes:
Management listener is individually disabled on nodes:

Similarly, we can use the other commands in the set: SRVCTL MODIFY to change MGMTDB and MGMTLSNR properties, SRVCTL SETENV to set a specific environment for MGMTDB and MGMTLSNR, SRVCTL DISABLE to disable the MGMTDB and MGMTLSNR resources, SRVCTL REMOVE to remove MGMTDB and MGMTLSNR from the Clusterware stack, and so on.

How to perform a manual failover (relocation) of repository resources

The management repository resources (ora.mgmtdb and ora.MGMTLSNR) are entirely managed by the Clusterware stack, which takes care of failing over the repository database resources to another available node when the hosting node fails. However, we can also manually fail over these resources to other cluster nodes when desired. We can make use of the SRVCTL RELOCATE MGMTDB command to relocate the repository database resources from one cluster node to another, as shown below.
---// command to relocate repository resources //---
srvctl relocate mgmtdb -node <target_node>

Example:

---// we have a two-node cluster with nodes myracserver1 and myracserver2 //---
myracserver2 {/home/oracle}: olsnodes
myracserver1
myracserver2

---// repository database resources are running on myracserver2 //---
myracserver2 {/home/oracle}: srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node myracserver2

myracserver2 {/home/oracle}: srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): myracserver2

---// relocating repository database resources to myracserver1 //---
myracserver2 {/home/oracle}: srvctl relocate mgmtdb -node myracserver1

---// validate the repository resources are relocated //---
myracserver2 {/home/oracle}: srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node myracserver1

myracserver2 {/home/oracle}: srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): myracserver1

Relocating the repository database MGMTDB also results in automatic relocation of the repository database listener, as seen in the previous example. This type of manual relocation can be very useful during planned maintenance of the hosting cluster node.

Conclusion

In this article, we explored the various options available to administer and manage the Grid Infrastructure Management Repository, and we have also seen a few tricks that can be used to alter the repository attributes/characteristics based on specific requirements. Oracle provides a rich set of commands to monitor and manage the repository and its associated Clusterware components.