Introduction

In today's article, I am going to discuss yet another cool feature introduced with Oracle Database 12c. This feature gives data the ability to make itself invisible when desired. You might have already figured out the context of this article. Yes, I am talking about the In Database Archiving (row archival) feature introduced with Oracle Database 12c, which lets us archive data within the same database table without needing to move it to a different archive store.

What is In Database Archiving (row archival)?

Archiving is generally defined as the process of moving INACTIVE data to a different storage device for long term retention. In Database Archiving (Row Archival) lets us mark this INACTIVE data as ARCHIVED within the same table without actually moving it to a separate storage device, which means the data can still be present in the same database table without being visible to application queries.

This feature is typically useful for applications that need to mark data as deleted/inactive (archived) without physically deleting it (moving it to separate storage). Prior to Oracle Database 12c, this type of requirement was met by defining an additional column in the database table with specific flags indicating that a particular table record is archived (deleted), and then making the necessary adjustments in application queries to check this flag while querying data.

Row Archival also provides additional benefits, such as compression and keeping archived data in lower tier storage units, apart from archiving the data in the same table. In today's article we will explore the basic row archival feature. We will discuss the additional benefits in a separate article.

How to enable row archival

In Database Archiving is defined at the table level by means of a new clause called ROW ARCHIVAL. Including this clause in a table definition indicates that the table records are enabled for archiving. A table can either be created with row archival enabled by means of the CREATE TABLE statement, or can later be enabled for row archival by means of the ALTER TABLE command.

When we enable a table for row archival, Oracle creates an additional (HIDDEN) column named ORA_ARCHIVE_STATE for that table. This column (ORA_ARCHIVE_STATE) controls whether a table record is ACTIVE or ARCHIVED. By default the column ORA_ARCHIVE_STATE is set to a value of 0 for each table record, which indicates the data is ACTIVE.

Example: Let's quickly go through an example of enabling row archival for a database table.

----//
----// Creating table with row archival enabled //----
----//
SQL> create table test_data_archival
  2  (
  3    id number not null,
  4    name varchar(20) not null,
  5    join_date date not null
  6  )
  7  row archival;

Table created.

----//
----// Populate the table with some data //----
----//
SQL> insert into test_data_archival
  2  select rownum, rpad('X',15,'X'), sysdate
  3  from dual connect by rownum <= 500;

500 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       500

In this example we have created a table (test_data_archival) with In Database Archiving (row archival) enabled and populated it with some dummy data (500 records with IDs ranging from 1 to 500). We can also enable row archival for existing tables by specifying the ROW ARCHIVAL clause with the ALTER TABLE statement, as shown below.
----//
----// Enabling row archival for existing tables //----
----//
SQL> alter table test_data_arch row archival;

Table altered.

Note: Trying to enable row archival for a table which is already enabled for row archival will result in errors similar to the following.

----//
SQL> alter table test_data_archival row archival;
alter table test_data_archival row archival
*
ERROR at line 1:
ORA-38396: table is already enabled for the ILM feature

Validating row archival

We can't identify whether a table is enabled for row archival just by describing (DESC) the table, as the output is the same for a regular table and for a table with row archival enabled. Since the column ORA_ARCHIVE_STATE (which controls In Database Archiving) is hidden, it is not displayed by the DESC command.

----//
----// DESC command doesn't indicate if row archival is enabled or not //----
----//
SQL> desc test_data_archival
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER
 NAME                                      NOT NULL VARCHAR2(20)
 JOIN_DATE                                 NOT NULL DATE

However, we can query the DBA/USER/ALL_TAB_COLS views to validate whether a table is enabled for row archival. If a table has the ORA_ARCHIVE_STATE hidden column listed in these views, then the table is enabled for row archival.

----//
----// query DBA_TAB_COLS to check if we have the HIDDEN column //----
----// ORA_ARCHIVE_STATE available for the database table //----
----//
SQL> select owner,table_name,column_id,column_name,hidden_column
  2  from dba_tab_cols where table_name='TEST_DATA_ARCHIVAL' order by column_id;

OWNER      TABLE_NAME            COLUMN_ID COLUMN_NAME          HID
---------- -------------------- ---------- -------------------- ---
MYAPP      TEST_DATA_ARCHIVAL            1 ID                   NO
                                         2 NAME                 NO
                                         3 JOIN_DATE            NO
                                           ORA_ARCHIVE_STATE    YES ---> This column indicates the table is enabled for row archival

Another way to check if a table is defined for row archival is to check the table metadata. If the table metadata contains the clause "ILM ENABLE LIFECYCLE MANAGEMENT", then the table is enabled for row archival. However, this is only applicable to Oracle Database 12c release 12.1.0.2.

----//
----// query table metadata to validate row archival enabled or not //----
----//
SQL> select dbms_metadata.get_ddl('TABLE','TEST_DATA_ARCHIVAL') ddl from dual;

DDL
--------------------------------------------------------------------------------
  CREATE TABLE "MYAPP"."TEST_DATA_ARCHIVAL"
   (    "ID" NUMBER NOT NULL ENABLE,
        "NAME" VARCHAR2(20) NOT NULL ENABLE,
        "JOIN_DATE" DATE NOT NULL ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "APPDATA"
 ILM ENABLE LIFECYCLE MANAGEMENT ---> This clause indicates that the table is enabled for row archival

ILM (Information Lifecycle Management) is an Oracle Database feature that helps manage data by storing it in different storage and compression tiers based on an organization's business and performance needs. Row Archival is an ILM feature, and hence the table definition carries a clause indicating the same. For more details about ILM, please refer to the Oracle documentation.

Archiving table data

As mentioned earlier, by default Oracle populates the ORA_ARCHIVE_STATE column with a value of 0 (zero), which indicates the table data is in the ACTIVE state. This can be verified as follows.
----//
----// validate default value for ORA_ARCHIVE_STATE column //----
----//
SQL> select ora_archive_state,count(*) from test_data_archival group by ora_archive_state;

ORA_ARCHIVE_STATE      COUNT(*)
-------------------- ----------
0                           500

We had populated 500 records in our table and we can see that all of these records have the value 0 (zero) in the row archival column ORA_ARCHIVE_STATE. This means all of these records are ACTIVE and application queries can access them.

To mark a table record as ARCHIVED, we need to update the row archival column ORA_ARCHIVE_STATE for that record to the value 1. This is done through a call to the DBMS_ILM.ARCHIVESTATENAME function. The syntax for marking a table record as ARCHIVED is as follows.

----//
----// Syntax for archiving data using row archival feature //----
----//
UPDATE table_name SET ORA_ARCHIVE_STATE=DBMS_ILM.ARCHIVESTATENAME(1) WHERE column_predicates

Example: Let's say we want to archive the records with ID 100 and 200 in the TEST_DATA_ARCHIVAL table. This can be done as follows.

----//
----// Querying records before archiving //----
----//
SQL> select * from test_data_archival where id in (100,200);

        ID NAME                 JOIN_DATE
---------- -------------------- ---------
       100 XXXXXXXXXXXXXXX      19-SEP-15
       200 XXXXXXXXXXXXXXX      19-SEP-15

----//
----// Archive records with ID 100 and 200 using row archival //----
----//
SQL> update test_data_archival
  2  set ora_archive_state=dbms_ilm.archivestatename(1)
  3  where id in (100,200);

2 rows updated.

SQL> commit;

Commit complete.

----//
----// Querying records after archiving //----
----//
SQL> select * from test_data_archival where id in (100,200);

no rows selected

----//
----// Row count also excludes the archived records //----
----//
SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       498

As we can see, we were able to query the table records before archiving them using the row archival feature. However, the records became invisible once we archived them. These archived records are still present in the database and we can view them if desired, as explained in the next section.

Note: We can also use a plain update statement like "UPDATE table_name SET ORA_ARCHIVE_STATE=1 WHERE column_predicates" to mark the data as ARCHIVED. In fact, we can set ORA_ARCHIVE_STATE to anything other than 0 (zero) to indicate that a table record is ARCHIVED. However, the ILM package only recognizes the value 1 (one) as INACTIVE and 0 (zero) as ACTIVE, so setting ORA_ARCHIVE_STATE to other values may impact ILM functionality.
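If you are curious what DBMS_ILM.ARCHIVESTATENAME actually returns before using it in an UPDATE, a quick query like the one below can help. This is a minimal sketch based on the mapping described above (0 maps to the ACTIVE state 0, any non-zero input maps to the ARCHIVED state 1); the literal 5 is just an arbitrary non-zero example.

----//
----// sketch: inspect the values returned by DBMS_ILM.ARCHIVESTATENAME //----
----//
select dbms_ilm.archivestatename(0) as state_for_0,
       dbms_ilm.archivestatename(1) as state_for_1,
       dbms_ilm.archivestatename(5) as state_for_any_nonzero
from   dual;

The first column should come back as 0 and the other two as 1, matching the convention used by the ORA_ARCHIVE_STATE column.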
Viewing archived records

Once we archive table records using the row archival feature (by means of the DBMS_ILM.ARCHIVESTATENAME function), the records are no longer visible to application queries. However, there is a way to view those archived records. We can enable a database session to view the archived rows by setting the parameter ROW ARCHIVAL VISIBILITY to the value ALL, as shown below.

----//
----// Enable database session to view archived records //----
----//
SQL> alter session set ROW ARCHIVAL VISIBILITY=ALL;

Session altered.

----//
----// Query the archived records //----
----//
SQL> select * from test_data_archival where id in (100,200);

        ID NAME                 JOIN_DATE
---------- -------------------- ---------
       100 XXXXXXXXXXXXXXX      19-SEP-15
       200 XXXXXXXXXXXXXXX      19-SEP-15

----//
----// Row count includes archived records too //----
----//
SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       500

We can set the same session parameter ROW ARCHIVAL VISIBILITY to the value ACTIVE to prevent a database session from viewing archived records, as shown below.

----//
----// with session's visibility to archived records set to ALL //----
----//
SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       500

----//
----// change session's visibility for archived records to ACTIVE //----
----//
SQL> alter session set ROW ARCHIVAL VISIBILITY=ACTIVE;

Session altered.

SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       498

Restoring archived data

In Database Archiving (row archival) makes it very easy to restore archived data back to its original state. Since the data is archived within the same database table (it is just marked as ARCHIVED), we only need to change the state of the archived record back to ACTIVE by setting the row archival column ORA_ARCHIVE_STATE back to the value 0 (zero). This can be done by calling the DBMS_ILM.ARCHIVESTATENAME function. However, before the archived data can be marked as ACTIVE (restored), the session needs visibility of the archived data. This is why the restoration of archived data is a two phase process, as listed below.

Change ROW ARCHIVAL VISIBILITY to ALL
Restore (mark data as ACTIVE) by updating it through the DBMS_ILM.ARCHIVESTATENAME function using the following syntax

----//
----// syntax for restoring archived data from row archival //---
----//
UPDATE table_name SET ORA_ARCHIVE_STATE=DBMS_ILM.ARCHIVESTATENAME(0) WHERE column_predicates

Example: In the following example, I am restoring the archived record having ID 100 in table TEST_DATA_ARCHIVAL.

----//
----// restoring archived record without ROW ARCHIVAL VISIBILITY is not permitted //---
----//
SQL> update test_data_archival
  2  set ora_archive_state=dbms_ilm.archivestatename(0)
  3  where id=100;

0 rows updated.

----//
----// change ROW ARCHIVAL VISIBILITY to ALL //----
----//
SQL> alter session set ROW ARCHIVAL VISIBILITY=ALL;

Session altered.

----//
----// restore (mark record as ACTIVE) archived record with ID=100 //----
----//
SQL> update test_data_archival
  2  set ora_archive_state=dbms_ilm.archivestatename(0)
  3  where id=100;

1 row updated.

SQL> commit;

Commit complete.

----//
----// validate that we can query the record with ROW ARCHIVAL VISIBILITY set to ACTIVE //----
----//
SQL> alter session set ROW ARCHIVAL VISIBILITY=ACTIVE;

Session altered.

SQL> select * from test_data_archival where id=100;

        ID NAME                 JOIN_DATE
---------- -------------------- ---------
       100 XXXXXXXXXXXXXXX      19-SEP-15
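The example above restores a single archived record. If the goal is to bring back everything that has been archived, the same pattern works with a predicate on ORA_ARCHIVE_STATE itself. The following is a minimal sketch of that idea; it assumes the session has ROW ARCHIVAL VISIBILITY set to ALL (otherwise the UPDATE simply does not see the archived rows) and that the default '0'/non-'0' convention is in use.

----//
----// sketch: restore ALL archived records in a single statement //----
----//
alter session set ROW ARCHIVAL VISIBILITY=ALL;

update test_data_archival
   set ora_archive_state = dbms_ilm.archivestatename(0)
 where ora_archive_state != '0';

commit;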
Disabling Row Archival

We can disable Row Archival for a table using the NO ROW ARCHIVAL clause of the ALTER TABLE statement. The syntax is as follows.

----//
----// syntax for disabling Row Archival for a table //----
----//
ALTER TABLE table_name NO ROW ARCHIVAL;

Example: In the following example, I am disabling Row Archival for table TEST_DATA_ARCHIVAL.

----//
----// Record count with row archival enabled //----
----//
SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       499

----//
----// disable row archival for table //----
----//
SQL> alter table test_data_archival no row archival;

Table altered.

----//
----// Check if the hidden column ORA_ARCHIVE_STATE exists //----
----//
SQL> select owner,table_name,column_id,column_name,hidden_column,default_on_null
  2  from dba_tab_cols where table_name='TEST_DATA_ARCHIVAL' order by column_id;

OWNER      TABLE_NAME            COLUMN_ID COLUMN_NAME          HID DEF
---------- -------------------- ---------- -------------------- --- ---
MYAPP      TEST_DATA_ARCHIVAL            1 ID                   NO  NO
                                         2 NAME                 NO  NO
                                         3 JOIN_DATE            NO  NO

----//
----// Record count after disabling Row Archival //----
----//
SQL> select count(*) from test_data_archival;

  COUNT(*)
----------
       500

When we disable Row Archival for a table, the hidden column ORA_ARCHIVE_STATE is dropped automatically, which in turn restores all the table records to the ACTIVE state and makes them visible to application queries.

Copying a table (CTAS) with Row Archival enabled

When we create a copy of a row archival enabled table with a CTAS statement, the resulting table is not created with row archival enabled. Therefore all the table records become ACTIVE in the resulting table, as shown below.

----//
----// Check count of records in table TEST_DATA_ARCH //----
----//
SQL> select count(*) from TEST_DATA_ARCH;

  COUNT(*)
----------
       500

----//
----// Archive a few records in the table TEST_DATA_ARCH //----
----//
SQL> update TEST_DATA_ARCH set ORA_ARCHIVE_STATE=1 where id <= 100;

100 rows updated.

SQL> commit;

Commit complete.

----//
----// Check the count of records after row archival //-----
----//
SQL> select count(*) from TEST_DATA_ARCH;

  COUNT(*)
----------
       400

----//
----// Create a new table from TEST_DATA_ARCH using CTAS //----
----//
SQL> CREATE TABLE TEST_DATA_ARCH_COPY1
  2  AS SELECT * FROM TEST_DATA_ARCH;

Table created.

----//
----// Check the count of records on the resulting table //----
----//
SQL> select count(*) from TEST_DATA_ARCH_COPY1;

  COUNT(*)
----------
       500

As we can see, even though we had row archival enabled on our source table (TEST_DATA_ARCH), it did not propagate to the resulting table when we created it using a CREATE TABLE AS SELECT statement.

Conclusion

We have explored the row archival (In Database Archiving) feature of Oracle Database 12c and how it can be used as a local archive store for INACTIVE data, rather than moving the data to a remote archive store. This feature will be very useful for the specific set of applications that need to mark data as ARCHIVED within the database itself, so that the data is not visible to application queries yet is ready to be restored when desired. Row archival also speeds up the archiving process, as we do not have to run expensive select/insert/delete statements to archive table records. We will explore a few other aspects (benefits and considerations) of this new feature in an upcoming article. Till then stay tuned...
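As an addendum to the CTAS behaviour shown above: if a copy of a row archival table should keep both the archived rows and their archived state, one possible workaround is to create the target table with ROW ARCHIVAL enabled and copy the ORA_ARCHIVE_STATE column explicitly while the session can see archived rows. The sketch below only illustrates the idea; the target table name TEST_DATA_ARCH_COPY2 is hypothetical and the column list is assumed to mirror the demo tables used in this article.

----//
----// sketch: copy a row archival table and preserve the archived state //----
----//
alter session set ROW ARCHIVAL VISIBILITY=ALL;

create table TEST_DATA_ARCH_COPY2
(
  id        number not null,
  name      varchar2(20) not null,
  join_date date not null
)
row archival;

-- name the hidden column explicitly so the archived state is carried over
insert into TEST_DATA_ARCH_COPY2 (id, name, join_date, ora_archive_state)
select id, name, join_date, ora_archive_state
from   TEST_DATA_ARCH;

commit;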
Wiki Page: Oracle 12c: Data invisibility (archiving) with In Database Archival
Wiki Page: Oracle 12c: Invisibility is now extended to table columns
Introduction Oracle Database 12c has brought a never ending list of new features and today I would like to talk about another new feature from this list. Oracle had introduced invisible indexes in Oracle 11g (11.1) which gave us the power to create an index in INVISIBLE mode and then evaluate its functioning before exposing it to database queries. Oracle has extended that feature of invisibility one step further with the introduction of Oracle database 12c. We can now even create table columns in the INVISIBLE mode, preventing them being exposed to database queries unless explicitly mentioned. Lets walk through this feature and explore, what it has to offer. Making columns invisible We can define a table column in invisible mode either while creating the table using CREATE TABLE statement or later using ALTER TABLE statement. The syntax for defining a column for both of these cases are as follows: ----// ----// syntax to define invisible column with CREATE TABLE //---- ----// CREATE TABLE table_name ( column_name data_type INVISIBLE column_properties ) ----// ----// syntax to make an existing column invisible //---- ----// ALTER TABLE table_name MODIFY column_name INVISIBLE In the following example, I am creating a table called TEST_TAB_INV with two invisible columns with column name CONTACT and ADDRESS respectively. ----// ----// creating table TEST_TAB_INV with two invisible columns //---- ----// SQL> create table TEST_TAB_INV 2 ( 3 id number not null, 4 name varchar2(15) not null, 5 join_date date not null, 6 contact number invisible not null, ----// invisible column, but defined as mandatory //---- 7 address varchar(200) invisible ----// invisible column, defined as optional //---- 8 ); Table created. SQL> alter table TEST_TAB_INV add constraint PK_TEST_TAB_INV primary key (id); Table altered. SQL> As you can observe, I have defined one of the invisible column (CONTACT) as MANADATORY using the NOT NULL option, while defined the other one (ADDRESS) as optional. The intention behind creating two different type of invisible columns is to test the behaviour of this new feature in case of MANADATORY and OPTIONAL column values. Listing invisible columns In general we use the DESCRIBE command to list the columns defined for a table. Lets see, what DESC command shows when we create a table with invisible columns. ----// ----// DESC command doesn't show invisible columns by default //---- ----// SQL> desc TEST_TAB_INV Name Null? Type ----------------------------------------- -------- ---------------------------- ID NOT NULL NUMBER NAME NOT NULL VARCHAR2(15) JOIN_DATE NOT NULL DATE The DESC[RIBE] command is not showing the invisible columns that we had defined during table creation. This is the default behaviour of invisible columns and we need to set COLINVISIBLE to ON to be able to view the invisible columns using DESC command as show below ----// ----// set COLINVISIBLE to ON to be able to list invisible columns with DESC command //---- ----// SQL> SET COLINVISIBLE ON ----// ----// DESC now lists the invisible columns as well //---- ----// SQL> desc TEST_TAB_INV Name Null? Type ----------------------------------------- -------- ---------------------------- ID NOT NULL NUMBER NAME NOT NULL VARCHAR2(15) JOIN_DATE NOT NULL DATE CONTACT (INVISIBLE) NOT NULL NUMBER ADDRESS (INVISIBLE) VARCHAR2(200) We can alternatively query DBA/ALL/USER_TAB_COLS views to find the invisible columns defined for a table as shown below. 
If a column is marked as YES for the hidden_column property, it is treated as a invisible column. ----// ----// querying invisible columns from dictionary views //---- ----// SQL> select table_name,column_name,column_id,hidden_column from dba_tab_cols where table_name='TEST_TAB_INV'; TABLE_NAME COLUMN_NAME COLUMN_ID HID ------------------------- -------------------- ---------- --- TEST_TAB_INV ID 1 NO NAME 2 NO JOIN_DATE 3 NO CONTACT YES ADDRESS YES As we can observe, Oracle has not allocated any COLUMN_ID for the invisible columns and that is why invisible columns doesn't qualify for column ordering. However, Oracle keeps track of the invisible columns using an internal ID as shown below. ----// ----// Oracle maintains only internal column IDs for invisible columns //---- ----// SQL> select table_name,column_name,column_id,internal_column_id,hidden_column from dba_tab_cols where table_name='TEST_TAB_INV'; TABLE_NAME COLUMN_NAME COLUMN_ID INTERNAL_COLUMN_ID HID ----------------- -------------------- ---------- ------------------ --- TEST_TAB_INV ID 1 1 NO TEST_TAB_INV NAME 2 2 NO TEST_TAB_INV JOIN_DATE 3 3 NO TEST_TAB_INV CONTACT 4 YES TEST_TAB_INV ADDRESS 5 YES Inserting records without column reference Lets try to insert a record in the table TEST_TAB_INV that we had created earlier without referring the column names. In the following example, I am not passing values for the invisible columns CONTACT and ADDRESS. ----// ----// insert record without column_list when one of the invisible column is defined as mandatory //---- ----// However, value is not passed for mandatory invisible column //--- ----// SQL> insert into TEST_TAB_INV values (1,'abbas',sysdate); insert into TEST_TAB_INV values (1,'abbas',sysdate) * ERROR at line 1: ORA-01400: cannot insert NULL into ("MYAPP"."TEST_TAB_INV"."CONTACT") Oracle did not allow me to insert a record, as the column CONTACT was defined as a mandatory column (NOT NULL) even though it was defined as invisible. Ok, lets pass a value for CONTACT column too. ----// ----// insert record without column_list when one of the invisible column is defined as mandatory //---- ----// and value passed for the mandatory invisible column //---- ----// SQL> insert into TEST_TAB_INV values (1,'abbas',sysdate,999999999); insert into TEST_TAB_INV values (1,'abbas',sysdate,999999999) * ERROR at line 1: ORA-00913: too many values We are still not allowed to insert a record. Lets pass the values for all the table columns ----// ----// insert record without column_list but values passed for all columns (visible and invisible) //---- ----// SQL> insert into TEST_TAB_INV values (2,'fazal',sysdate,888888888,'bangalore'); insert into TEST_TAB_INV values (2,'fazal',sysdate,888888888,'bangalore') * ERROR at line 1: ORA-00913: too many values We are still not allowed to insert a record. The reason is, when we try to insert a record without explicitly referring the table columns; Oracle only considers the columns that are visible by default. In the first case of insert statement, we had passed values for all the visible columns. However, since the invisible column CONTACT was defined as mandatory; Oracle did not allow us to insert that record and threw the error ORA-01400: cannot insert NULL into ("MYAPP"."TEST_TAB_INV"."CONTACT") In the second and third case of insert statements, although we had passed additional values for CONTACT and ADDRESS columns; Oracle did not recognize those columns (as those are invisible) and threw the error ORA-00913: too many values . 
This error indicates that Oracle was expecting less number of column values than what is supplied in the insert statement. Lets change the invisible column CONTACT from mandatory (NOT NULL) to optional (NULL) and check if we are allowed to insert a record without column reference. ----// ----// making all the invisible columns as optional //---- ----// SQL> alter table TEST_TAB_INV modify CONTACT NULL; Table altered. SQL> set COLINVISIBLE ON SQL> desc TEST_TAB_INV Name Null? Type ----------------------------------- -------- ------------------------ ID NOT NULL NUMBER NAME NOT NULL VARCHAR2(15) JOIN_DATE NOT NULL DATE CONTACT (INVISIBLE) NUMBER ---> Invisible and optional (NULL) ADDRESS (INVISIBLE) VARCHAR2(200) ---> Invisible and optional (NULL) Now, lets insert a record without column reference and without passing any values for invisible columns ----// ----// insert record without column_list when all invisible columns are optional //---- ----// SQL> insert into TEST_TAB_INV values (1,'john',sysdate); 1 row created. SQL> commit; Commit complete. Yes, we are now allowed to insert record without column reference. This was possible as all of the invisible columns (CONTACT and ADDRESS) are now allowed to have NULL. Inserting records with column reference When we insert records in a table by referring the table columns, we are allowed to insert data in the invisible columns as well as shown below. ----// ----// insert record in to invisible columns with explicit column reference //---- ----// SQL> insert into TEST_TAB_INV (id,name,join_date,contact) values (2,'mike',sysdate,999999999); 1 row created. ----// ----// insert record in to invisible columns with explicit column reference //---- ----// SQL> insert into TEST_TAB_INV (id,name,join_date,contact,address) values (3,'peter',sysdate,888888888,'bangalore'); 1 row created. SQL> commit; Commit complete. As we can see, even though if a column is defined as invisible; we would still be allowed to populate it with data provided the column is explicitly referred in the insert statements. Query table having invisible columns When we select without column reference (SELECT * FROM) from a table having invisible columns, Oracle only returns the result from the visible columns as show below. ----// ----// select from table having invisible columns, without column reference //---- ----// SQL> select * from TEST_TAB_INV; ID NAME JOIN_DATE ---------- --------------- --------- 1 john 24-SEP-15 2 mike 24-SEP-15 3 peter 24-SEP-15 Oracle internally transforms this query to include only the visible columns #/---- #/---- Oracle transformed the select query to exclude invisible columns -----/ #/---- Final query after transformations:******* UNPARSED QUERY IS ******* SELECT "TEST_TAB_INV"."ID" "ID","TEST_TAB_INV"."NAME" "NAME","TEST_TAB_INV"."JOIN_DATE" "JOIN_DATE" FROM "MYAPP"."TEST_TAB_INV" "TEST_TAB_INV" kkoqbc: optimizing query block SEL$1 (#0) : call(in-use=1136, alloc=16344), compile(in-use=67704, alloc=70816), execution(in-use=2784, alloc=4032) kkoqbc-subheap (create addr=0x2b89a1d1fb78) **************** QUERY BLOCK TEXT **************** select * from TEST_TAB_INV --------------------- However, we can still query the data from invisible columns by explicitly referring the column names in the SELECT clause as show below. 
----// ----// selecting data from invisible with explicit column reference //---- ----// SQL> select id,name,join_date,contact,address from TEST_TAB_INV; ID NAME JOIN_DATE CONTACT ADDRESS ---------- --------------- --------- ---------- -------------------- 1 john 24-SEP-15 2 mike 24-SEP-15 999999999 3 peter 24-SEP-15 888888888 bangalore Statistics on Invisible columns Oracle maintains statistics for all the table columns even if a column is defined as invisible as shown below. Invisible columns also qualify for all type of statistics (histograms, extended statistics, etc.) ----// ----// collecting statistics for table with invisible columns //---- ----// SQL> exec dbms_stats.gather_table_stats('MYAPP','TEST_TAB_INV'); PL/SQL procedure successfully completed. ----// ----// Oracle maintains statistics for invisible columns as well //---- ----// SQL> select owner,table_name,column_name,num_distinct,density,last_analyzed 2 from dba_tab_col_statistics where table_name='TEST_TAB_INV'; OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY LAST_ANAL ---------- -------------------- -------------------- ------------ ---------- --------- MYAPP TEST_TAB_INV ADDRESS 1 1 24-SEP-15 MYAPP TEST_TAB_INV CONTACT 2 .5 24-SEP-15 MYAPP TEST_TAB_INV JOIN_DATE 3 .333333333 24-SEP-15 MYAPP TEST_TAB_INV NAME 3 .333333333 24-SEP-15 MYAPP TEST_TAB_INV ID 3 .333333333 24-SEP-15 Making columns visible We can convert a invisible to visible by modifying the column property using ALTER TABLE statement. The syntax for making a column visible is ----// ----// Syntax for changing a column from INVISIBLE to VISIBLE //---- ----// ALTER TABLE table_name MODIFY column_name VISIBLE; Lets make the column CONTACT visible in our table TEST_TAB_INV and observer what changes the operation brings along. ----// ----// changing column CONTACT in table TEST_TAB_INV to VISIBLE //---- ----// SQL> alter table TEST_TAB_INV modify CONTACT visible; Table altered. ----// ----// DESC command now lists the changed column //---- ----// SQL> desc TEST_TAB_INV Name Null? Type ----------------------------------- -------- ------------------------ ID NOT NULL NUMBER NAME NOT NULL VARCHAR2(15) JOIN_DATE NOT NULL DATE CONTACT NOT NULL NUMBER SQL> SET COLINVISIBLE ON SQL> desc TEST_TAB_INV Name Null? Type ----------------------------------- -------- ------------------------ ID NOT NULL NUMBER NAME NOT NULL VARCHAR2(15) JOIN_DATE NOT NULL DATE CONTACT NOT NULL NUMBER ADDRESS (INVISIBLE) VARCHAR2(200) When we make a column visible, it gets listed with the DESCRIBE command. Further, the column is assigned a column ID as well as the column is marked as NOT HIDDEN which can be verified from DBA/ALL/USER_TAB_COLS view as shown below. ----// ----// column changed to visible, is allocated a column ID //---- ----// and marked as NO for hidden_column flag //---- ----// SQL> select table_name,column_name,column_id,hidden_column from dba_tab_cols where table_name='TEST_TAB_INV'; TABLE_NAME COLUMN_NAME COLUMN_ID HID ------------------------- -------------------- ---------- --- TEST_TAB_INV ADDRESS YES CONTACT 4 NO JOIN_DATE 3 NO NAME 2 NO ID 1 NO As we can observe, when we change a invisible column to visible, it is placed as the last column in the visible column list. Since the column CONTACT is now made visible, it is exposed to SELECT queries (without column reference) as shown below. 
----// ----// new visible column is now exposed to SELECT queries (without column reference) //---- ----// SQL> select * from TEST_TAB_INV; ID NAME JOIN_DATE CONTACT ---------- -------------------- --------- ---------- 1 abbas 21-SEP-15 999999999 2 fazal 21-SEP-15 888888888 Indexing Invisible columns We are also allowed to create index on invisible columns the same way we create index for a generic column. ----// ----// creating index on invisible columns //---- ----// SQL> create index idx_TEST_TAB_INV on TEST_TAB_INV (name,contact,address); Index created. Lets check if Oracle is able to use that index. ----// ----// checking if index would be used by optimizer //---- ----/ / SQL> explain plan for select * from TEST_TAB_INV where address='bangalore'; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------- Plan hash value: 3483268732 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 21 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST_TAB_INV | 1 | 21 | 2 (0)| 00:00:01 | |* 2 | INDEX SKIP SCAN | IDX_TEST_TAB_INV | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ADDRESS"='bangalore') filter("ADDRESS"='bangalore') 15 rows selected. SQL> As we can observe, Oracle can utilize an index defined on invisible columns. From the above example, we can also conclude that invisible columns can also be used as query predicates Conclusion We have explored Oracle 12c's new feature of defining a column in Invisible mode. Following are the conclusions derived from the observations. Invisible column's are not returned while using SELECT * FROM TABLE statement Data can be still queried from invisible column, provided the column names are explicitly referred in the SELECT clause Records can be inserted in table having invisible columns with INSERT INTO table_name VALUES statement, provided none of the invisible columns are defined as mandatory (NOT NULL) Data can be populated in to invisible columns provided the invisible columns are explicitly referred in the insert statement like INSERT INTO table_name (column_list) VALUES Oracle maintains statistics on invisible columns Invisible columns can be be indexed as well as used as query predicates Invisible columns are not allocated a column ID and are tracked by an internal ID When a invisible column is made visible, it is placed as the last visible column and gets a column ID in that order or in other words.... Invisible column inherits all the properties of that of a visible column with just one exception that it is not visible unless referenced explicitly. Invisible columns can be useful to test the impact of column addition on the application, before actually exposing the column to application queries. Invisible columns can also be used as a trick to change column ordering for tables, we shall explore that area in an upcoming article Reference http://docs.oracle.com/database/121/ADMIN/tables.htm#ADMIN14217
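Building on the conclusion above, perhaps the most common practical use of invisible columns is adding a brand new column to an existing table in INVISIBLE mode, so that existing SELECT * queries and positional INSERT statements keep working while the application is gradually adapted. The following is a minimal sketch against the TEST_TAB_INV table used in this article; the column name REFERRAL_CODE and the sample values are hypothetical.

----//
----// sketch: add a new column in INVISIBLE mode and expose it later //----
----//
-- add the column invisibly; existing "select *" and positional inserts are unaffected
alter table TEST_TAB_INV add (referral_code varchar2(30) invisible);

-- the new column can only be used by statements that name it explicitly
insert into TEST_TAB_INV (id, name, join_date, referral_code)
values (4, 'sara', sysdate, 'REF-001');

-- once the application is ready, expose the column (it becomes the last visible column)
alter table TEST_TAB_INV modify (referral_code visible);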
Wiki Page: Oracle 12c: Correct column positioning with invisible columns
Introduction In one of my last article, I had discussed about Invisible Columns in Oracle database 12c. I had also mentioned that, invisible columns can be used as a method to change ordering of columns in a table. In today's article, I will discuss about the concept of changing order (position) of table columns with the help of invisible columns. In my earlier post, we have seen, when we add a invisible column or make a column invisible, it is not allocated a column ID (column position) unless it is made visible. Further, when we change a invisible column to visible, it is allocated a column ID (column position) and is placed (positioned) as the last column in the respective table. We can use this fact, to come up with a trick that can be helpful for changing column ordering in a given table. Let's go through a simple example to understand the trick and it's effectiveness. Change column order with invisible column As part of our demonstration, I have created the following table with four columns COL1 , COL3 , COL4 and COL2 respectively in the noted order. ---// ---// Create table for demonstration //--- ---// SQL> create table TEST_TAB_INV_ORDR 2 ( 3 COL1 number, 4 COL3 number, 5 COL4 number, 6 COL2 number 7 ); Table created. ---// ---// desc table to verify column positioning //--- ---// SQL> desc TEST_TAB_INV_ORDR Name Null? Type ----------------------------------------- -------- ---------------------------- COL1 NUMBER COL3 NUMBER COL4 NUMBER COL2 NUMBER Now, consider we had actually created the table with an incorrect column ordering and the columns should have been positioned in the order COL1, COL2, COL3 and COL4. We will use this example to understand, how the invisible column feature can be utilized to correct the column position within a table. So far, we know the fact that a invisible column doesn't have a column position within a given table and is tracked internally be a internal ID. This means, when we change a visible column to invisible, the position allocated to that column will lost and once we make the column visible again, the column would be positioned as the last visible column. Let's utilize this fact as a foundation to build our trick. Here is the trick In the first step, we will make all the table columns invisible except the intended first table column. This will cause all the other columns to loose their column position within the given table . At this point, we will have the first column already positioned in the first place in the table and all the other columns in the invisible state with no assigned column position. In the next step, we will start changing the invisible columns to visible. However, we shall make them visible in the order in which we want them to be positioned within the table. This is due to the fact that, when we change an invisible column to visible, it is positioned as the last visible column. Let's work on our example, to have a better understanding of the trick outlined above. In our example, the table TEST_TAB_INV_ORDR has columns positioned as COL1, COL3, COL4 and COL2. We want the columns to be positioned as COL1, COL2 , COL3 and COL4. Let's make all the columns invisible except COL1, which we want to be positioned as first column in the table. ---// ---// making all columns invisible except COL1 //--- ---// SQL> alter table TEST_TAB_INV_ORDR modify COL3 invisible; Table altered. SQL> alter table TEST_TAB_INV_ORDR modify COL4 invisible; Table altered. SQL> alter table TEST_TAB_INV_ORDR modify COL2 invisible; Table altered. 
---// ---// verify column position post invisible operation //--- ---// COL1 is left visible and is placed as first column //--- ---// SQL> set COLINVISIBLE ON SQL> desc TEST_TAB_INV_ORDR Name Null? Type ----------------------------------------- -------- ---------------------------- COL1 NUMBER COL3 (INVISIBLE) NUMBER COL4 (INVISIBLE) NUMBER COL2 (INVISIBLE) NUMBER As we can observe from above output, we have the column COL1 already positioned as first column in the table and all the other columns are in invisible state. As a next step of correcting the column ordering, lets start changing the invisible columns to visible. Remember, we want the columns to be ordered as COL1, COL2, COL3 and COL4. As we know, the moment we change invisible column to visible, it will be positioned as the last visible column within the table; we can start making the columns visible in the order COL2, COL3 and COL4. Let's walk through step by step of this process for a better insight. COL1 is already positioned as first column, we want COL2 to be positioned as second column in the table. Lets change the COL2 from invisible to visible as shown below. ---// ---// making COL2 visible to position it as second column //--- ---// SQL> alter table TEST_TAB_INV_ORDR modify COL2 visible; Table altered. ---// ---// verfiy column order post visible operation //--- ---// SQL> desc TEST_TAB_INV_ORDR Name Null? Type ----------------------------------------- -------- ---------------------------- COL1 NUMBER COL2 NUMBER COL3 (INVISIBLE) NUMBER COL4 (INVISIBLE) NUMBER The moment we changed COL2 to visible, it got positioned within the table as the last visible column. At this point, we have COL1 and COL2 correctly positioned as first and second column respectively. Lets change COL3 from invisible to visible for positioning it as the third column within the table as shown below. ---// ---// making COL3 visible to position it as third column //--- ---// SQL> alter table TEST_TAB_INV_ORDR modify COL3 visible; Table altered. ---// ---// verfiy column order post visible operation //--- ---// SQL> desc TEST_TAB_INV_ORDR Name Null? Type ----------------------------------------- -------- ---------------------------- COL1 NUMBER COL2 NUMBER COL3 NUMBER COL4 (INVISIBLE) NUMBER Now, we have COL1, COL2 and COL3 correctly positioned as first, second and third column respectively. Lets change COL4 from invisible to visible for positioning it as the fourth (last) column within the table as shown below. ---// ---// making COL4 visible to position it as fourth column //--- ---// SQL> alter table TEST_TAB_INV_ORDR modify COL4 visible; Table altered. ---// ---// verfiy column order post visible operation //--- ---// SQL> desc TEST_TAB_INV_ORDR Name Null? Type ----------------------------------------- -------- ---------------------------- COL1 NUMBER COL2 NUMBER COL3 NUMBER COL4 NUMBER Now, we have all the columns positioned correctly within the table. Simple isn't it! Here is a recap of the trick, that we used to correct column positioning within the table Leave the intended first column as visible and change all the other columns to invisible Start changing the invisible columns to visible in the order in which we want them to be positioned within the table. Why do it manually? In the previous section, we have seen how we can utilize invisible columns as a trick to correct column positioning within a given table. 
I have come up with a PL/SQL script (procedure) which converts this trick into a simple algorithm and can be used for correcting column positioning within a given table. Here is the PL/SQL procedure that I have written based on the trick stated in the previous section. You can refer in-line comments for a brief idea about it's logic. ---// ---// PL/SQL procedure to correct column positing using invisible columns //--- ---// create or replace procedure change_col_order (o_column_list varchar2, e_tab_name varchar2, t_owner varchar2) is --- Custom column separator --- TB constant varchar2(1):=CHR(9); --- exception to handle non existence columns -- col_not_found EXCEPTION; --- exception to handle column count mismatch --- col_count_mismatch EXCEPTION; --- flag to check column existence ---- col_e number; --- variable to hold column count from dba_tab_cols --- col_count_p number; --- variable to hold column count from user given list --- col_count_o number; --- variable to hold first column name --- col_start varchar2(200); --- Creating a cursor of column names from the given column list --- cursor col_l is select regexp_substr(o_column_list,'[^,]+', 1, level) column_name from dual connect by regexp_substr(o_column_list,'[^,]+', 1, level) is not null; col_rec col_l%ROWTYPE; begin select substr(o_column_list,1,instr(o_column_list,',',1) -1) into col_start from dual; --- fetching column count from user given column list --- select count(*) into col_count_p from dual connect by regexp_substr(o_column_list,'[^,]+', 1, level) is not null; --- fetching column count from dba_tab_cols --- select count(*) into col_count_o from dba_tab_cols where owner=t_owner and table_name=e_tab_name and hidden_column='NO'; --- validating column counts --- if col_count_p != col_count_o then raise col_count_mismatch; end if; --- checking column existence --- for col_rec in col_l LOOP select count(*) into col_e from dba_tab_cols where owner=t_owner and table_name=e_tab_name and column_name=col_rec.column_name; if col_e = 0 then raise col_not_found; end if; END LOOP; --- printing current column order --- dbms_output.put_line(TB); dbms_output.put_line('Current column order for table '||t_owner||'.'||e_tab_name||' is:'); for c_rec in (select column_name,data_type from dba_tab_cols where owner=t_owner and table_name=e_tab_name order by column_id ) LOOP dbms_output.put_line(c_rec.column_name||'('||c_rec.data_type||')'); END LOOP; --- making all columns invisible except the starting column --- for col_rec in col_l LOOP if col_rec.column_name != col_start then execute immediate 'alter table '||t_owner||'.'||e_tab_name||' modify '||col_rec.column_name||' invisible'; end if; END LOOP; --- making columns visible to match the required ordering --- for col_rec in col_l LOOP if col_rec.column_name != col_start then execute immediate 'alter table '||t_owner||'.'||e_tab_name||' modify '||col_rec.column_name||' visible'; end if; END LOOP; --- printing current column order --- dbms_output.put_line(TB); dbms_output.put_line('New column order for table '||t_owner||'.'||e_tab_name||' is:'); for c_rec in (select column_name,data_type from dba_tab_cols where owner=t_owner and table_name=e_tab_name order by column_id ) LOOP dbms_output.put_line(c_rec.column_name||'('||c_rec.data_type||')'); END LOOP; EXCEPTION WHEN col_not_found THEN dbms_output.put_line('ORA-100002: column does not exist'); WHEN col_count_mismatch THEN dbms_output.put_line('ORA-100001: mismatch in column counts'); end; / ---// ---// End of procedure change_col_order //--- 
---// Lets go through a demonstration to understand how the custom procedure works. The procedure takes three arguments (all strings within single quotes). The first argument is a comma separated list of column names (in the order in which we want the columns to be positioned), the second argument is the name of table for which the columns needs to be re-ordered and the third argument is the schema name to which the table belongs to. ---// ---// changing column positioning using change_col_order procedure //--- ---// SQL> set serveroutput on SQL> exec change_col_order('COL1,COL2,COL3,COL4','TEST_TAB_INV_ORDR','MYAPP'); Current column order for table MYAPP.TEST_TAB_INV_ORDR is: COL4(NUMBER) COL3(NUMBER) COL2(NUMBER) COL1(NUMBER) New column order for table MYAPP.TEST_TAB_INV_ORDR is: COL1(NUMBER) COL2(NUMBER) COL3(NUMBER) COL4(NUMBER) PL/SQL procedure successfully completed. SQL> As we can observe from the above output, the procedure reads the arguments, displays current column positioning (order) and then applies the algorithm (based on invisible column feature) before listing the final corrected column positioning (order). Conclusion In this article, we have explored; how we can utilize the 12c invisible columns feature to correct the positioning of columns within a given table. We have also explored the customized PL/SQL script which can be implemented to automate this trick and can be used as an alternative to the manual approach.
Wiki Page: Oracle 12c: Optimizing In Database Archival (Row Archival)
Introduction In my last article, I had discussed about the Oracle 12c new feature In Database Archival . We had explored this new Oracle database feature and how it can be used to archive data within the same database table. We had also familiarized ourselves with the methods available to query and restore the archived data. In the previous article , we have seen how we can archive data within the same table by means of a new table clause ROW ARCHIVAL . We have also seen, once we set a table for row archival, a new table column with name ORA_ARCHIVE_STATE is introduced by Oracle and is used to control whether a particular record (row) within the table is archived or not. A value of 0 for the column ORA_ARCHIVE_STATE indicates the record is ACTIVE and a non-zero value indicates the record being ARCHIVED and by default all the records are in ACTIVE state. Today's article is primarily focused on optimizing the utilization of In Database Archival. In today's article, I would be discussing two important aspects of this new Oracle Database 12c archival feature. The first discussion emphasizes on optimizing the space utilization for the archived data and the second discussion is related to optimizing the query performance when querying the table enabled with row archival. Space optimization for In Database Archiving With "In Database Archival", the data is archived within the same table. It is physically present in the same database and just the logical representation is altered at the query (optimizer) level when we query the data, by means of the control column ORA_ARCHIVE_STATE . This means, the archived data still occupies the same amount of space (unless compressed) within the database. Now, consider if the table is on a Tier - 1 storage device; we are wasting a substantial amount of cost just to maintain the archived data. Wouldn't it be great, if we can store those archived records on a lower level storage device and able to compress those records to further cut down the cost involved in the space allocation. Guess what! this is possible with "In Database Archival" as it provides an option to optimize the space utilization by allowing us partition the table records based on its state. This means we can partition a table on the control column ORA_ARCHIVE_STATE to direct the archived data to be stored on a different storage unit (tablespace), which also enables us to apply compression just on the archived data to further trim down the space utilization for the archived data. Lets quickly go through a simple demonstration to understand these abilities. Demonstration Assumptions: Tablespace APPDATA is located on a Tier-1 storage Tablespace ARCHDATA is located on a Tier-2 storage Goal: I would like to create a table TEST_DATA_ARCH_PART with ROW ARCHIVAL being enabled. I would want the ACTIVE data to be stored on Tier-1 storage in a NOCOMPRESS format and the ARCHIVED data to be stored on Tier-2 storage in COMPRESSED format. This is to ensure that, we are utilizing the database space to its optimal level. Lets create our table TEST_DATA_ARCH_PART with ROW ARCHIVAL enabled. 
----//
----// Creating table with ROW ARCHIVAL //----
----//
SQL> create table TEST_DATA_ARCH_PART
  2  (
  3    id number,
  4    name varchar(15),
  5    join_date date
  6  )
  7  ROW ARCHIVAL
  8  partition by list (ORA_ARCHIVE_STATE)   ---// partitioned on record state //---
  9  (
 10    partition P_ACTIVE values(0) tablespace APPDATA,   ---// ACTIVE records //---
 11    partition P_ARCHIVED values(default) tablespace ARCHDATA ROW STORE COMPRESS ADVANCED   ---// ARCHIVED records //---
 12  );

Table created.

----//
----// Defining primary key for the table //----
----//
SQL> alter table TEST_DATA_ARCH_PART add constraint PK_TEST_DATA_ARCH_PART primary key (ID);

Table altered.

In the above example, we have created the table TEST_DATA_ARCH_PART with ROW ARCHIVAL enabled. We have partitioned the table on the record state (ORA_ARCHIVE_STATE) to store the ACTIVE data (P_ACTIVE) on Tier-1 storage (APPDATA) and the ARCHIVED data (P_ARCHIVED) on Tier-2 storage (ARCHDATA). We have further enabled COMPRESSION for all the ARCHIVED records. Let's populate our table with some data.

----//
----// populating table with data //----
----//
SQL> insert /*+ APPEND */ into TEST_DATA_ARCH_PART
  2  select rownum, rpad('X',15,'X'), sysdate
  3  from dual connect by rownum <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

We have populated our table with 1000000 records and all the records are in ACTIVE state by default. We can validate that by querying the table as follows.

----//
----// validating table records //----
----//
SQL> select count(*) from TEST_DATA_ARCH_PART;

  COUNT(*)
----------
   1000000

SQL> select count(*) from TEST_DATA_ARCH_PART partition (p_active);

  COUNT(*)
----------
   1000000

SQL> select count(*) from TEST_DATA_ARCH_PART partition (p_archived);

  COUNT(*)
----------
         0

----//
----// validating active records are on Tier-1 storage device (APPDATA) //----
----//
SQL> select owner,segment_name as "Table Name",tablespace_name,sum(bytes)/1024/1024 Size_MB
  2  from dba_segments where segment_name='TEST_DATA_ARCH_PART' group by owner,segment_name,tablespace_name;

OWNER         Table Name                TABLESPACE_NAME         SIZE_MB
------------- ------------------------- -------------------- ----------
MYAPP         TEST_DATA_ARCH_PART       APPDATA                      40

As we can see, all of our table records are in ACTIVE state and are thus stored on the Tier-1 storage device (APPDATA). Let's archive some records from our table as shown below.

----//
----// attempt to archive records by setting ORA_ARCHIVE_STATE to 1 //----
----//
SQL> update TEST_DATA_ARCH_PART
  2  set ORA_ARCHIVE_STATE=1 where id <= ... ;

The update fails with ORA-14402 (updating partition key column would cause a row to move). Since the table is list partitioned on ORA_ARCHIVE_STATE, archiving a record means moving it from the P_ACTIVE partition to the P_ARCHIVED partition, and that requires row movement, which is disabled by default. Let's check and enable row movement for the table.

----//
----// checking row movement setting for the table //----
----//
SQL> select table_name,row_movement from dba_tables where table_name='TEST_DATA_ARCH_PART';

TABLE_NAME                ROW_MOVE
------------------------- --------
TEST_DATA_ARCH_PART       DISABLED

----//
----// Enabling row movement for table //----
----//
SQL> alter table TEST_DATA_ARCH_PART enable row movement;

Table altered.

SQL> select table_name,row_movement from dba_tables where table_name='TEST_DATA_ARCH_PART';

TABLE_NAME                ROW_MOVE
------------------------- --------
TEST_DATA_ARCH_PART       ENABLED

Let's try again to archive the table records by setting the control column ORA_ARCHIVE_STATE to the value 1, as shown below.

----//
----// archiving all table records by setting ORA_ARCHIVE_STATE to 1 //----
----//
SQL> update TEST_DATA_ARCH_PART
  2  set ORA_ARCHIVE_STATE=1;

1000000 rows updated.

SQL> commit;

Commit complete.

As expected, we are now allowed to ARCHIVE the table records. The archived records are stored in a lower storage tier by means of tablespace ARCHDATA and are compressed to further trim down the space utilization for archived data.
We can validate this fact as shown below.

----//
----// No active records present in the table //----
----//
SQL> select count(*) from TEST_DATA_ARCH_PART;

  COUNT(*)
----------
         0

----//
----// Enable archive record visibility //----
----//
SQL> alter session set row archival visibility=all;

Session altered.

SQL> select count(*) from TEST_DATA_ARCH_PART;

  COUNT(*)
----------
   1000000

----//
----// No records present in the ACTIVE partition //----
----//
SQL> select count(*) from TEST_DATA_ARCH_PART partition (p_active);

  COUNT(*)
----------
         0

----//
----// records are now moved to ARCHIVED partition //----
----//
SQL> select count(*) from TEST_DATA_ARCH_PART partition (p_archived);

  COUNT(*)
----------
   1000000

Let's check how much space is consumed by the ARCHIVED records by querying the database segments, as shown below.

SQL> select owner,segment_name as "Table Name",tablespace_name,sum(bytes)/1024/1024 Size_MB
  2  from dba_segments where segment_name='TEST_DATA_ARCH_PART' group by owner,segment_name,tablespace_name;

OWNER         Table Name                TABLESPACE_NAME         SIZE_MB
------------- ------------------------- -------------------- ----------
MYAPP         TEST_DATA_ARCH_PART       ARCHDATA                     16
MYAPP         TEST_DATA_ARCH_PART       APPDATA                      40

We can see that the archived records have been moved in compressed format to the Tier-2 tablespace ARCHDATA. However, the space in the Tier-1 tablespace APPDATA is not yet released. We need to reclaim this space manually, as shown below.

----//
----// reclaiming unused space from the table //----
----//
SQL> alter table TEST_DATA_ARCH_PART shrink space;

Table altered.

SQL> select owner,segment_name as "Table Name",tablespace_name,sum(bytes)/1024/1024 Size_MB
  2  from dba_segments where segment_name='TEST_DATA_ARCH_PART' group by owner,segment_name,tablespace_name;

OWNER         Table Name                TABLESPACE_NAME         SIZE_MB
------------- ------------------------- -------------------- ----------
MYAPP         TEST_DATA_ARCH_PART       ARCHDATA                12.9375
MYAPP         TEST_DATA_ARCH_PART       APPDATA                  .1875

As expected, all the records are ARCHIVED and stored in COMPRESSED format (size ~13 MB) on the lower level storage device (ARCHDATA), and the unused space has been reclaimed from the Tier-1 storage APPDATA. This type of setup and utilization of In Database Archiving (Row Archival) helps us optimize the space required to store archived data within the same table (database).

Note: We may consider using DBMS_REDEFINITION as an alternative to the SHRINK command for reorganizing the table and reclaiming space ONLINE.

Query Optimization for In Database Archiving

In Database Archiving may lead to a plan change or performance degradation when data is queried from the tables. This is because the query is transformed to add a filter condition that excludes the ARCHIVED records from the query result. Let's quickly go through a simple demonstration to illustrate this.

Demonstration

I am using the same table as in the previous example. As part of the last demonstration, we had archived all the table records. Let's populate the table with some ACTIVE records.

----//
----// populating table with ACTIVE records //----
----//
SQL> insert /*+ APPEND */ into TEST_DATA_ARCH_PART
  2  select rownum+1e6, rpad('X',15,'X'), sysdate
  3  from dual connect by rownum <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

We have populated the table with 1000000 ACTIVE records. Let's validate the records from the table.
----// ----// ACTIVE records from the table //---- ----// SQL> select count(*) from TEST_DATA_ARCH_PART; COUNT(*) ---------- 1000000 ----// ----// enabling row archival visibility //---- ----// SQL> alter session set ROW ARCHIVAL VISIBILITY=ALL; Session altered. ----// ----// Total records from the table //---- ----// SQL> select count(*) from TEST_DATA_ARCH_PART; COUNT(*) ---------- 2000000 SQL> select count(*) from TEST_DATA_ARCH_PART partition (p_active); COUNT(*) ---------- 1000000 SQL> select count(*) from TEST_DATA_ARCH_PART partition (p_archived); COUNT(*) ---------- 1000000 At this point, we have 2000000 records in the table, out of which 1000000 are ACTIVE and 1000000 are in ARCHIVED state. Let's query few records from the table and see how the SQL optimizer handles it. In the following example, I am querying records with ID ranging between 999000 and 1000005. The query should return only 4 records as the first 1000000 records are in ARCHIVED state. ----// ----// disabling row archival visibility //---- ----// SQL> alter session set ROW ARCHIVAL VISIBILITY=ACTIVE; Session altered. ----// ----// selecting records from table //---- ----// SQL> select /*+ gather_plan_statistics */ * from TEST_DATA_ARCH_PART 2 where id > 999000 and id 999000 AND "TEST_DATA_ARCH_PART"."ID" 999000 Now, lets take a look at the execution plan of this query. ----// ----// Query plan from the optimizer //---- ----// SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------------------------------------------- SQL_ID 3vqgzvvmj3wb9, child number 0 ------------------------------------- select /*+ gather_plan_statistics */ * from TEST_DATA_ARCH_PART where id > 999000 and id 999000 AND "ID" create index TEST_DATA_ARCH_PART_PK on TEST_DATA_ARCH_PART (ID, ORA_ARCHIVE_STATE); Index created. ----// ----// Disabling and dropping the exiting primary key //---- ----// SQL> alter table TEST_DATA_ARCH_PART disable constraint PK_TEST_DATA_ARCH_PART; Table altered. SQL> alter table TEST_DATA_ARCH_PART drop constraint PK_TEST_DATA_ARCH_PART; Table altered. ----// ----// Creating primary key using the new Index //---- ----/ / SQL> alter table TEST_DATA_ARCH_PART add constraint PK_TEST_DATA_ARCH_PART primary key (ID) using index TEST_DATA_ARCH_PART_PK; Table altered. We have modified the primary key index to include ORA_ARCHIVE_STATE in the index definition. Let's check, how the optimizer now handles the SQL query. ----// ----// Query records from table //---- ----// SQL> select /*+ gather_plan_statistics */ * from TEST_DATA_ARCH_PART 2 where id > 999000 and id select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------- SQL_ID 2zgf279wu9291, child number 0 ------------------------------------- select /*+ gather_plan_statistics */ * from TEST_DATA_ARCH_PART where id > 999000 and id 999000 AND "TEST_DATA_ARCH_PART"."ORA_ARCHIVE_STATE"='0' AND "ID"<1000005) filter("TEST_DATA_ARCH_PART"."ORA_ARCHIVE_STATE"='0') Note ----- - dynamic statistics used: dynamic sampling (level=2) 25 rows selected. As we can see, SQL optimizer is now filtering at the access level. Now, it is fetching only 4 records rather than 1004 records when compared to the earlier execution plan. 
The modified index has helped the optimizer eliminate unnecessary I/O while fetching the records. Conclusion When configuring In Database Archiving, we should consider partitioning the table on the ORA_ARCHIVE_STATE column to optimize the space utilization for ARCHIVED records (a minimal sketch of such a table definition is shown after the reference below). Don't forget to enable ROW MOVEMENT on the table for this archiving setup to work. Optionally, we may also need to reclaim, on a periodic basis, the unused space left over by data movement between the ACTIVE and ARCHIVED partitions. We should also consider appending the ORA_ARCHIVE_STATE column to all of the table's indexes to address any performance degradation resulting from In Database Archiving when querying records from the tables. Reference Potential SQL Performance Degradation When Using "In Database Row Archiving" (Doc ID 1579790.1)
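To recap the setup recommended above, here is a minimal sketch of a row-archival table partitioned on the ORA_ARCHIVE_STATE column with row movement enabled. The tablespace names APPDATA and ARCHDATA are taken from the example; the partition names, the DEFAULT partition for archived rows and the compression clause are illustrative assumptions, so adjust them (and the clause ordering, if your release complains) to your environment:

----//
----// sketch: row archival table partitioned on ORA_ARCHIVE_STATE //----
----//
SQL> create table test_data_arch_part
  2  (
  3    id        number       not null,
  4    name      varchar2(20) not null,
  5    join_date date         not null
  6  )
  7  row archival
  8  partition by list (ora_archive_state)
  9  (
 10    partition p_active   values ('0')     tablespace APPDATA,
 11    partition p_archived values (default) tablespace ARCHDATA row store compress advanced
 12  )
 13  enable row movement;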
Wiki Page: Oracle 12c: No more resource busy wait (ORA-0054) error while dropping index
Introduction Prior to Oracle Database 12c, dropping an index was always an EXCLUSIVE (offline) operation, which required locking the base table in exclusive mode. This sometimes causes a resource busy error when the base table is already locked by DML operations whose transactions are not yet committed. Further, while the table is locked in exclusive mode for the index drop operation, no DML is allowed on the base table until the index drop operation is completed. This may not be a problem for small indexes. However, when we are dropping a huge index, it will block all DML on the base table for a longer duration, which is often not desirable. Oracle 12c overcomes this limitation: when the ONLINE clause is specified, dropping an index no longer requires an exclusive lock on the base table. With Oracle Database 12c, we have the option of dropping an index ONLINE, which allows DML on the base table while the drop index operation is running. Let's go through a quick demonstration to validate this new feature. Drop Index in Oracle 11g (Offline) Let's create a table in an Oracle 11g database for the demonstration ----// ----// query database version //---- ----// SQL> select version from v$instance; VERSION ----------------- 11.2.0.1.0 ----// ----// create table T_DROP_IDX_11G for demonstration //---- ----// SQL> create table T_DROP_IDX_11G 2 ( 3 id number, 4 name varchar(15), 5 join_date date 6 ); Table created. ----// ----// populate table T_DROP_IDX_11G with dummy data //---- ----// SQL> insert /*+ APPEND */ into T_DROP_IDX_11G 2 select rownum, rpad('X',15,'X'), sysdate 3 from dual connect by rownum <= 1000000; 1000000 rows created. SQL> commit; Commit complete. Let's create an index on this table ----// ----// create index IDX_T_DROP_IDX_11G on table T_DROP_IDX_11G //---- ----// SQL> create index IDX_T_DROP_IDX_11G on T_DROP_IDX_11G (id, name); Index created. Now, let's perform DML (update a record) on this table without committing the transaction ----// ----// update a record in table T_DROP_IDX_11G //---- ----// SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 20 SQL> update T_DROP_IDX_11G set name='ABBAS' where id=100; 1 row updated. ----// ----// leave the transaction uncommitted in this session //---- ----// If we query the v$locked_object view, we can see the base table is locked in row exclusive mode (mode=3) by the previous update operation, which we haven't yet committed. ----// ----// query v$locked_object to check the locked object //---- ----// SQL> select object_id,session_id,locked_mode from v$locked_object; OBJECT_ID SESSION_ID LOCKED_MODE ---------- ---------- ----------- 73451 20 3 SQL> select object_name,object_type from dba_objects where object_id=73451; OBJECT_NAME OBJECT_TYPE ------------------------- ------------------- T_DROP_IDX_11G TABLE Now, from another session, let's try to drop the index (IDX_T_DROP_IDX_11G) that we had created on this table (T_DROP_IDX_11G).
----// ----// try to drop the index IDX_T_DROP_IDX_11G from another session //---- ----// 01:17:07 SQL> drop index IDX_T_DROP_IDX_11G; drop index IDX_T_DROP_IDX_11G * ERROR at line 1: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired 01:17:10 SQL> drop index IDX_T_DROP_IDX_11G; drop index IDX_T_DROP_IDX_11G * ERROR at line 1: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired 01:17:12 SQL> drop index IDX_T_DROP_IDX_11G; drop index IDX_T_DROP_IDX_11G * ERROR at line 1: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired The index drop operation fails with the resource busy error. This is because when we try to drop an index, Oracle tries to acquire an exclusive lock on the base table, and if it fails to acquire that exclusive lock, it throws this resource busy error. If we check the 10704 trace for this drop index operation, we can see that Oracle tried to acquire an exclusive lock (mode=6) on table T_DROP_IDX_11G and failed (ksqgtl: RETURNS 51) with the resource busy error (err=54) #----// #----// lock trace for the drop index operation //---- #----// PARSING IN CURSOR #3 len=69 dep=1 uid=85 oct=26 lid=85 tim=1443857735568850 hv=114407125 ad='8ab5b250' sqlid='04zx1kw3d3dqp' LOCK TABLE FOR INDEX "IDX_T_DROP_IDX_11G" IN EXCLUSIVE MODE NOWAIT END OF STMT PARSE #3:c=5999,e=5962,p=0,cr=59,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1443857735568850 ksqgtl *** TM-00011eeb-00000000 mode=6 flags=0x401 timeout=0 *** ksqgtl: xcb=0x8f92aa68, ktcdix=2147483647, topxcb=0x8f92aa68 ktcipt(topxcb)=0x0 ksucti: init session DID from txn DID: ksqgtl: ksqlkdid: 0001-0017-000000FD *** ksudidTrace: ksqgtl ktcmydid(): 0001-0017-000000FD ksusesdi: 0000-0000-00000000 ksusetxn: 0001-0017-000000FD ksqcmi: TM,11eeb,0 mode=6 timeout=0 ksqcmi: returns 51 ksqgtl: RETURNS 51 ksqrcl: returns 0 EXEC #3:c=0,e=67,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=1443857735568937 ERROR #3:err=54 tim=1443857735568943 CLOSE #3:c=0,e=3,dep=1,type=0,tim=1443857735568967 In this trace file, the object ID is represented in hexadecimal (00011eeb), which can be mapped to the object ID as follows. ----// ----// finding object details based on hexadecimal object ID //---- ----// SQL> select object_id,to_char(object_id,'0XXXXXXX') object_hex,object_name,object_type 2 from dba_objects where object_name='T_DROP_IDX_11G'; OBJECT_ID OBJECT_HE OBJECT_NAME OBJECT_TYPE ---------- --------- ------------------------- ------------------- 73451 00011EEB T_DROP_IDX_11G TABLE We will not be allowed to drop the index unless Oracle acquires an exclusive lock (mode=6) on the base table. We can commit/rollback the transaction in the first session (sid=20), which will release the row exclusive lock (mode=3) from the table, allow Oracle to acquire an exclusive lock on the base table and in turn process the DROP INDEX operation. Drop Index in Oracle 12c (Online) Now let's see how the drop index operation behaves in Oracle 12c. Let's quickly create a table for our demonstration ----// ----// query database version //---- ----// SQL> select version from v$instance; VERSION ----------------- 12.1.0.2.0 ----// ----// create table T_DROP_IDX_12C for demonstration //---- ----// SQL> create table T_DROP_IDX_12C 2 ( 3 id number, 4 name varchar(15), 5 join_date date 6 ); Table created. ----// ----// populate table T_DROP_IDX_12C with dummy data //---- ----// SQL> insert /*+ APPEND */ into T_DROP_IDX_12C 2 select rownum, rpad('X',15,'X'), sysdate 3 from dual connect by rownum <= 1000000; 1000000 rows created. SQL> commit; Commit complete.
Let's create an index on this table. ----// ----// create index IDX_T_DROP_IDX_12C on table T_DROP_IDX_12C //---- ----// SQL> create index IDX_T_DROP_IDX_12C on T_DROP_IDX_12C (id, name); Index created. Now let's perform some DML on this table and leave it uncommitted. ----// ----// perform a few DML statements on the table T_DROP_IDX_12C //---- ----// SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 20 SQL> insert into T_DROP_IDX_12C values (1000001,'Abbas',sysdate); 1 row created. SQL> update T_DROP_IDX_12C set name='Fazal' where id=100; 1 row updated. ----// ----// leave the transactions uncommitted in this session //---- ----// If we query the v$locked_object view, we can see the base table is locked in row exclusive mode (mode=3) by the previous DML operations, which we haven't yet committed. ----// ----// query v$locked_object to check the locked object //---- ----// SQL> select object_id,session_id,locked_mode from v$locked_object; OBJECT_ID SESSION_ID LOCKED_MODE ---------- ---------- ----------- 20254 20 3 SQL> select object_name,object_type from dba_objects where object_id=20254; OBJECT_NAME OBJECT_TYPE ------------------------- ----------------------- T_DROP_IDX_12C TABLE Now, from another session, let's try to drop the index (IDX_T_DROP_IDX_12C) that we had created on this table (T_DROP_IDX_12C). ----// ----// try to drop the index IDX_T_DROP_IDX_12C from another session //---- ----// SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 127 SQL> drop index IDX_T_DROP_IDX_12C; drop index IDX_T_DROP_IDX_12C * ERROR at line 1: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired We are still getting the same resource busy error that we received in Oracle Database 11g. This is because the normal DROP INDEX operation still tries to acquire an exclusive (mode=6) lock on the base table while dropping an index. Here comes the new feature, the ONLINE option of DROP INDEX. With Oracle 12c, we can add the ONLINE clause to the DROP INDEX command. Let's try to drop the index ONLINE (we still haven't committed the DML in the other session). ----// ----// try to drop (ONLINE) the index IDX_T_DROP_IDX_12C from another session //---- ----// SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 127 SQL> drop index IDX_T_DROP_IDX_12C online; ----// ----// drop index hangs here //---- ----// We no longer get the resource busy error (ORA-00054) here. However, the drop index operation just hangs, as it is waiting for the DML operations to commit and release the lock (enqueue) acquired at row level.
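If you want to reproduce the lock trace that is analysed next, event 10704 (the lock tracing event referenced at the end of this article) can be enabled in the session that issues the DROP INDEX. A minimal sketch; the trace level is an assumption (higher levels print more detail), and the trace file is written to the session's trace directory:

----//
----// enabling the 10704 lock trace before issuing the drop (a sketch) //----
----//
SQL> alter session set events '10704 trace name context forever, level 10';
Session altered.

SQL> drop index IDX_T_DROP_IDX_12C online;
-- hangs here while the uncommitted transactions hold their row locks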
If we review the 10704 lock trace, we can see Oracle has acquired a shared lock (mode=2) on the base table and is waiting to acquire a shared transactional lock (TX: enqueue) which is currently blocked by the row exclusive lock mode from the first session (sid=20) #----// #----// lock trace for the drop index online operation //---- #----// PARSING IN CURSOR #47298660689800 len=69 dep=1 uid=63 oct=26 lid=63 tim=1443861257910293 hv=412402270 ad='7f878298' sqlid='6kxaujwc99hky' LOCK TABLE FOR INDEX "IDX_T_DROP_IDX_12C" IN ROW SHARE MODE NOWAIT END OF STMT PARSE #47298660689800:c=4999,e=6590,p=1,cr=8,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1443861257910293 ksqgtl *** TM-00004F1E-00000000-00000003-00000000 mode=2 flags=0x400 timeout=0 *** ksqgtl: xcb=0x88034da8, ktcdix=2147483647, topxcb=0x88034da8 ktcipt(topxcb)=0x0 ksucti: init session DID from txn DID: 0001-0029-00000094 ksqgtl: ksqlkdid: 0001-0029-00000094 *** ksudidTrace: ksqgtl ktcmydid(): 0001-0029-00000094 ksusesdi: 0000-0000-00000000 ksusetxn: 0001-0029-00000094 ksqgtl: RETURNS 0 EXEC #47298660689800:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=1443861257910358 CLOSE #47298660689800:c=0,e=1,dep=1,type=0,tim=1443861257910374 .. .. ( output trimmed) .. #----// #----// waiting to acquire a shared transactional lock //---- #----// ksqgtl *** TX-0001001E-00000439-00000000-00000000 mode=4 flags=0x10001 timeout=21474836 *** ksqgtl: xcb=0x88034da8, ktcdix=2147483647, topxcb=0x88034da8 ktcipt(topxcb)=0x0 ksucti: init session DID from txn DID: 0001-0029-00000094 ksqgtl: ksqlkdid: 0001-0029-00000094 *** ksudidTrace: ksqgtl ktcmydid(): 0001-0029-00000094 ksusesdi: 0000-0000-00000000 ksusetxn: 0001-0029-00000094 ksqcmi: TX-0001001E-00000439-00000000-00000000 mode=4 timeout=21474836 We can also verify from dba_waiters and v$lock views that the DROP INDEX ONLINE operation is waiting to acquire a transactional shared lock (TX:enqueue) which is blocked by the row exclusive (mode=3) lock and in turn by exclusive transactional lock (TX:mode=6) from the first session (sid=20) ----// ----// query dba_waiters to check who is holding the transactional lock on base table //---- ----// SQL> select waiting_session,holding_session,lock_type,mode_held,mode_requested from dba_waiters; WAITING_SESSION HOLDING_SESSION LOCK_TYPE MODE_HELD MODE_REQUE --------------- --------------- ------------ ------------ ---------- 127 20 Transaction Exclusive Share ----// ----// query v$lock to find out the lock mode held by the holding session //---- ----// SQL> select * from v$lock where sid=20; ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK CON_ID ---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 000000008A67B980 000000008A67B9F8 20 AE 133 0 4 0 8500 0 3 00000000885703C8 0000000088570448 20 TX 327706 994 6 0 123 1 0 00002B78ACA00EA8 00002B78ACA00F10 20 TM 20254 0 3 0 123 0 3 ----// ----// query v$lock to find out the lock mode requested by waiting session //--- ----// SQL> select * from v$lock where sid=127; ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK CON_ID ---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 000000008A67F840 000000008A67F8B8 127 AE 133 0 4 0 845 0 3 000000008857EE28 000000008857EEA8 127 TX 524294 982 6 0 4 0 0 000000008A67F628 000000008A67F6A0 127 TX 327706 994 0 4 4 0 0 00002B78AD08AA08 00002B78AD08AA70 127 TM 20254 0 2 0 4 0 3 000000008A67F228 000000008A67F2A0 127 OD 20275 0 6 0 4 0 3 
000000008A67E5F8 000000008A67E670 127 OD 20254 0 4 0 4 0 3 Although the drop index (online) operation hangs (waiting for DMLs to release row exclusive locks and exclusive TX lock), it will not block any new DMLs which get executed against the base table. We can confirm this by running a new DML from a new session as show below. ----// ----// perform new DML when drop index (ONLINE) is running ----// SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 23 SQL> delete from T_DROP_IDX_12C where id=300; 1 row deleted. ----// ----// new DML are able to acquire RX (mode=3) lock on the table //---- ----// SQL> select object_id,session_id,locked_mode from v$locked_object; OBJECT_ID SESSION_ID LOCKED_MODE ---------- ---------- ----------- 20254 20 3 --> lock held by first session where we performed DML and left uncommitted 20254 23 3 --> lock held by this session to perform delete operation 20254 127 2 --> session from where we have executed drop index (hangs and waiting for DMLs to commit) SQL> commit; Commit complete. ----// ----// lock released by current session upon commit //---- ----// SQL> select object_id,session_id,locked_mode from v$locked_object; OBJECT_ID SESSION_ID LOCKED_MODE ---------- ---------- ----------- 20254 20 3 20254 127 2 As we can see, even through DROP INDEX ONLINE hangs waiting for DMLs to commit; it doesn't block any new DML on the base table. The DROP INDEX ONLINE operation will eventually get completed once the pending transactions are committed. Lets commit the uncommitted transactions from our first session (sid=20). ----// ----// commit pending transactions from first session //---- ----// SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 20 SQL> commit; Commit complete. Lets check the status of DROP INDEX ONLINE operation (which was hung on other session) ----// ----// check the status of hanged drop index operation //---- ----// SQL> drop index IDX_T_DROP_IDX_12C online; Index dropped. SQL> SELECT sys_context('USERENV', 'SID') SID FROM DUAL; SID ---------- 127 The moment, pending transactions are committed the DROP INDEX ONLINE operation was resumed and completed automatically as the Row Exclusive (RX:mode=3) locks and TX locks (TX:mode=6) were released from the table (rows) and the DROP INDEX ONLINE was able to acquire a shared transactional lock (mode=4) on the table rows. We can also verify from lock trace that DROP INDEX ONLINE operation was able to acquire (ksqgtl: RETURNS 0) a shared transactional lock (TX:mode=4) once the DMLs were committed in first session (sid=20) #----// #----// drop index acquired shared transactional lock upon commit of pending DML on base table //---- #----// ksqgtl *** TX-0001001E-00000439-00000000-00000000 mode=4 flags=0x10001 timeout=21474836 *** ksqgtl: xcb=0x88034da8, ktcdix=2147483647, topxcb=0x88034da8 ktcipt(topxcb)=0x0 ksucti: init session DID from txn DID: 0001-0029-00000094 ksqgtl: ksqlkdid: 0001-0029-00000094 *** ksudidTrace: ksqgtl ktcmydid(): 0001-0029-00000094 ksusesdi: 0000-0000-00000000 ksusetxn: 0001-0029-00000094 ksqcmi: TX-0001001E-00000439-00000000-00000000 mode=4 timeout=21474836 ksqcmi: returns 0 *** 2015-10-03 14:30:01.164 ksqgtl: RETURNS 0 Conclusion Oracle has made significant improvements in the locking mechanism involved with the DROP INDEX operation by introducing the ONLINE feature, which now just needs a shared lock to be acquired on the base table to start with drop operation; allowing DMLs to be executed against the base table during the index drop operation. 
An online index drop operation can start without acquiring an exclusive lock on the base table. However, the drop operation will not complete until all outstanding transactions against the base table are committed (or rolled back) and the drop operation is able to acquire a shared transactional (TX:mode=4) lock. Reference I have used the term lock mode and used tracing to identify the locks at different places throughout this article. You can refer to the following article by Franck Pachot for a good overview of the lock modes, what those values mean and how to trace the locks. http://blog.dbi-services.com/investigating-oracle-lock-issues-with-event-10704/
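As a side note not covered in this article: on both 11g and 12c, the immediate ORA-00054 error from a plain (non-ONLINE) DROP INDEX can also be softened by letting the DDL wait for a limited time instead of failing right away, using the DDL_LOCK_TIMEOUT parameter. A minimal sketch; the 60-second value is just an example:

----//
----// optional: let non-ONLINE DDL wait for locks instead of failing immediately //----
----//
SQL> alter session set ddl_lock_timeout = 60;
Session altered.

SQL> drop index IDX_T_DROP_IDX_12C;
-- waits up to 60 seconds for the blocking transactions before raising ORA-00054

Unlike DROP INDEX ONLINE, this still takes an exclusive table lock once it gets through; it only changes how long Oracle waits for that lock.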
Wiki Page: Sophisticated Incremental Statistics gathering feature in 12c
Overview In a typical data warehousing environment, huge partitioned tables are very common, and gathering statistics on such tables is a challenging task. For partitioned tables there are two types of statistics: global and partition-level statistics. Gathering global statistics is an expensive and resource-consuming operation, as it scans the whole table. Hence, people often reduce estimate_percent down to less than 1 percent. This does help in reducing the time taken to gather stats, but it may not be sufficient to represent the data distribution. Gathering partition-level statistics is not so expensive, as it works only on the partitions where data has changed. Traditionally, statistics are gathered in two phases: scan the complete table to gather global statistics, then scan only the partitions where data has changed. Obviously, some global stats can be derived from partition-level stats; for example, the number of rows at the table level is just the sum of the number of rows across all partitions. But global stats like NDV (Number of Distinct Values), which is very important in calculating cardinality, can't be derived so easily. The only way to derive them is by scanning the whole table. This is why Oracle introduced the Incremental Statistics gathering feature in 11g; this feature not only reduces the time it takes to gather global stats but also increases the statistics accuracy. It avoids scanning the whole table when computing global statistics and derives them from partition-level statistics. But since NDV can't be derived from partition-level statistics, it creates a synopsis for each column at the individual partition level. This synopsis maintains detailed information about the distinct values in each partition for every column. After implementing the Incremental Statistics feature, when a new partition is added to the table, Oracle gathers partition-level statistics along with its synopsis and then merges all the partition synopses to create a global synopsis; at the end, global statistics are derived by using the partition-level statistics and the global synopsis. These synopsis data are stored in the WRI$_OPTSTAT_SYNOPSIS$ and WRI$_OPTSTAT_SYNOPSIS_HEAD$ tables residing in the SYSAUX tablespace. The WRI$_OPTSTAT_SYNOPSIS$ table can grow enormously, as an individual synopsis entry is created for each hash, proportional to the distinct values existing at the table, partition and column level. The WRI$_OPTSTAT_SYNOPSIS_HEAD$ table has one record for every table, partition, and column. In the 11.1 release, gathering incremental statistics could take a long time for wide tables with many partitions, due to the delete statement working on the WRI$_OPTSTAT_SYNOPSIS$ table. In 11.2 this issue was resolved by Range-Hash partitioning the WRI$_OPTSTAT_SYNOPSIS$ table. SQL> select OWNER,TABLE_NAME,PARTITIONING_TYPE,SUBPARTITIONING_TYPE from dba_part_tables where TABLE_NAME='WRI$_OPTSTAT_SYNOPSIS$'; OWNER TABLE_NAME PARTITIONING_TYPE SUBPARTITIONING_TYPE --------------- ------------------------------ --------------------------- --------------------------- SYS WRI$_OPTSTAT_SYNOPSIS$ RANGE HASH In 12c, the WRI$_OPTSTAT_SYNOPSIS$ table has been changed to List-Hash partitioning to reduce data movement compared to the previous partitioning strategy.
SQL> select OWNER,TABLE_NAME,PARTITIONING_TYPE,SUBPARTITIONING_TYPE from dba_part_tables where TABLE_NAME='WRI$_OPTSTAT_SYNOPSIS$'; OWNER TABLE_NAME PARTITIONING_TYPE SUBPARTITIONING_TYPE --------------- ------------------------------ --------------------------- --------------------------- SYS WRI$_OPTSTAT_SYNOPSIS$ LIST HASH NOTE: In 10.2.0.4 we can use 'APPROX_GLOBAL AND PARTITION' for the GRANULARITY parameter of the GATHER_TABLE_STATS procedures to gather statistics in incremental way, but drawback is about unavailability of NDV for non-partitioning columns and number of distinct keys of the index at the global level. This method derives all other global statistics accurately, hence it reduces the frequency of deriving global statistics but doesn't completely resolves the overhead of NDV. Implementation To enable Incremental Statistics feature for each table we need to ensure that The INCREMENTAL value for the partitioned table is true.(Default is FALSE) The PUBLISH value for the partitioned table is true.(Default is TRUE) The user specifies AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT and AUTO for GRANULARITY when gathering statistics on the table.(Default is ESTIMATE_PERCENT=>AUTO_SAMPLE_SIZE and GRANULARITY=>AUTO) Its challenging task when we try to initially enable Incremental Statistics and start gathering global statistics, as it will create synopsis for all the partitions and this will takes huge amount of time/resources due to large partitioned tables. Sometimes it may takes days together to create it. Lot of planning and controlled approach is required to enable Incremental Statistics on large partitioned tables. Different deviated approach has to be considered to create initial synopsis, as we can't blindly use GRANULARITY=>AUTO as per the documentation. Trick is to create synopsis for each partition in a controlled manner by using GRANULARITY=>PARTITION and then gather global statistics. For example, I have a monthly partitioned table ORDERS_DEMO in OE schema to which Incremental Statistics has to be enabled in controlled manner as shown below. 1. Set up Incremental Statistics feature mandatory parameters. SQL> exec dbms_stats.set_table_prefs('OE','ORDERS','INCREMENTAL','TRUE'); PL/SQL procedure successfully completed. SQL> SELECT dbms_stats.get_prefs('INCREMENTAL','OE','ORDERS_DEMO') "INCREMENTAL" FROM dual; INCREMENTAL ------------------------ FALSE SQL> SELECT dbms_stats.get_prefs('PUBLISH','OE','ORDERS_DEMO') "PUBLISH" FROM dual; PUBLISH ------------------------ TRUE SQL> SELECT dbms_stats.get_prefs('ESTIMATE_PERCENT','OE','ORDERS_DEMO') "ESTIMATE_PERCENT" FROM dual; ESTIMATE_PERCENT ------------------------ DBMS_STATS.AUTO_SAMPLE_SIZE SQL> SELECT dbms_stats.get_prefs('GRANULARITY','OE','ORDERS_DEMO') "GRANULARITY" FROM dual; GRANULARITY ------------------------ AUTO 2. Create synopsis for each partition in a controlled manner. SQL> exec dbms_stats.gather_table_stats('OE','ORDERS_DEMO',partname=>'ORDERS_OCT_2015',ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE, granularity=>'PARTITION'); PL/SQL procedure successfully completed. 3. Check synopsis creation time. 
SELECT o.name "Table Name", p.subname "Part", c.name "Column", h.analyzetime "Synopsis Creation Time" FROM WRI$_OPTSTAT_SYNOPSIS_HEAD$ h, OBJ$ o, USER$ u, COL$ c, ( ( SELECT TABPART$.bo# BO#, TABPART$.obj# OBJ# FROM TABPART$ tabpart$ ) UNION ALL ( SELECT TABCOMPART$.bo# BO#, TABCOMPART$.obj# OBJ# FROM TABCOMPART$ tabcompart$ ) ) tp, OBJ$ p WHERE u.name = 'OE' AND o.name = 'ORDERS_DEMO' AND tp.obj# = p.obj# AND h.bo# = tp.bo# AND h.group# = tp.obj# * 2 AND h.bo# = c.obj#(+) AND h.intcol# = c.intcol#(+) AND o.owner# = u.user# AND h.bo# = o.obj# ORDER BY 4,1,2,3 / Table Name Part Column Synopsis Creation Time -------------------- -------------------- ------------------------- ------------------------------ ORDERS_DEMO ORDERS_SEP_2015 CUSTOMER_ID 2015-11-17-01:00:25 ORDERS_DEMO ORDER_DATE 2015-11-17-01:00:25 ORDERS_DEMO ORDER_ID 2015-11-17-01:00:25 ORDERS_DEMO ORDER_MODE 2015-11-17-01:00:25 ORDERS_DEMO ORDER_STATUS 2015-11-17-01:00:25 4. Gradually build Synopsis for remaining partitions As you saw Synopsis gets created for each column in a table. Same way build synopsis for all the remaining partitions and in the end gather global statistics which will use previously created synopsis and partition level statistics. exec dbms_stats.gather_table_stats('OE','ORDERS_DEMO',partname=>'ORDERS_OCT_2015',ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE, granularity=>'PARTITION'); exec dbms_stats.gather_table_stats('OE','ORDERS_DEMO',partname=>'ORDERS_NOV_2015',ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE, granularity=>'PARTITION'); exec dbms_stats.gather_table_stats('OE','ORDERS_DEMO',partname=>'ORDERS_DEC_2015',ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE, granularity=>'PARTITION'); 5. Verify timing information of partition level statistics gathering. SELECT partition_name, to_char( last_analyzed, 'DD-MON-YYYY, HH24:MI:SS' ) last_analyze, num_rows FROM DBA_TAB_PARTITIONS WHERE table_name = 'ORDERS_DEMO' ORDER BY partition_position; PARTITION_NAME LAST_ANALYZE NUM_ROWS ---------------------------------------- ---------------------------------------- ---------- ORDERS_SEP_2015 17-NOV-2015, 01:00:25 180 ORDERS_OCT_2015 17-NOV-2015, 02:28:15 186 ORDERS_NOV_2015 17-NOV-2015, 03:35:19 180 ORDERS_DEC_2015 17-NOV-2015, 05:01:12 0 6. Gather global statistics using Incremental Statistics feature. exec dbms_stats.gather_table_stats('OE','ORDERS_DEMO'); 7. Ensure that partition level statistics have not been re-gathered and synopsis are intact. SELECT partition_name, to_char( last_analyzed, 'DD-MON-YYYY, HH24:MI:SS' ) last_analyze, num_rows FROM DBA_TAB_PARTITIONS WHERE table_name = 'ORDERS_DEMO' ORDER BY partition_position; PARTITION_NAME LAST_ANALYZE NUM_ROWS ---------------------------------------- ---------------------------------------- ---------- ORDERS_SEP_2015 17-NOV-2015, 01:00:25 180 ORDERS_OCT_2015 17-NOV-2015, 02:28:15 186 ORDERS_NOV_2015 17-NOV-2015, 03:35:19 180 ORDERS_DEC_2015 17-NOV-2015, 05:01:12 0 8. Check if we have actually gathered global statistics by using Incremental Statistics gathering feature. SELECT o.name, decode( bitand( h.spare2, 8 ), 8, 'yes', 'no' ) incremental FROM HIST_HEAD$ h, OBJ$ o WHERE h.obj# = o.obj# AND o.name = 'ORDERS_DEMO' AND o.subname IS NULL; NAME INCREMENTAL ---------------- ---------------- ORDERS_DEMO yes With this approach we can create synopsis on each partition in a controlled manner and then derive global statistics efficiently by using synopsis created before hand. 
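When the table has many partitions, the per-partition gathers shown in step 4 can be scripted instead of being issued one by one. A minimal sketch, assuming the OE.ORDERS_DEMO table from this example and a session with access to DBA_TAB_PARTITIONS:

-- gather partition-level statistics (and synopses) for every partition, one at a time
BEGIN
  FOR p IN (SELECT partition_name
              FROM dba_tab_partitions
             WHERE table_owner = 'OE'
               AND table_name  = 'ORDERS_DEMO'
             ORDER BY partition_position)
  LOOP
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'OE',
      tabname          => 'ORDERS_DEMO',
      partname         => p.partition_name,
      estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
      granularity      => 'PARTITION');
  END LOOP;
END;
/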
Hence on large partitioned tables its better to follow this approach and avoid enabling Incremental Statistics feature directly by setting GRANULARITY=>AUTO as per the documentation. Drastic enhancements in 12c In 12c Incremental Statistics has been enhanced tremendously when compared to 11g. Now in 12c we have greater control and flexibility over the behavior of Incremental Statistics feature. Let walk through some of the important enhancements in detail. Control over STALENESS of partition statistics In 11g release if DML occurs on any partition then partition level statistics of those partitions are meant to be stale, and thus it will result in re-gathering partition statistics which are stale before deriving global statistics through Incremental gathering. This overhead has obligated many people to not use Incremental Statistics feature as even single row modification would result in staleness of partition level statistics. In 12c we can control staleness of partition level statistics by using statistics preference INCREMENTAL_STALENESS along with value USE_STALE_PERCENT, this value defines percentage of rows modified due to DML activity for a partition/sub-partition statistics to be become stale. By default it is defined as 10%, so if more than 10% of the rows are modified in a partition/sub-partition then partition/sub-partition statistics are considered to be stale. This way in 12c we can avoid overhead of collecting partition statistics whenever there is data change(even single row) in partition/sub-partition during Incremental gathering of global statistics. To set USE_STALE_PERCENT for table ORDER_DEMO. BEGIN DBMS_STATS.SET_TABLE_PREFS ( ownname => 'OE', tabname => 'ORDERS_DEMO', pname => 'INCREMENTAL_STALENESS', pvalue => 'USE_STALE_PERCENT'); END; / To modify the default 10% stale percent to 20%. BEGIN DBMS_STATS.SET_TABLE_PREFS ( ownname => 'OE', tabname => 'ORDERS_DEMO', pname => 'STALE_PERCENT', pvalue => 20); END; / Control over LOCKED partition statistics In 11g release if statistics of partitions are locked and if any data gets modified in such partition then the only way to gather global statistics is by scanning full table, its due to the fact that partition level statistics can't be gathered as they are locked. Its very common in warehousing databases to have partitions meant to archive the data and modification of data on such partitions is rare, maintaining global statistics on such type of tables was challenging even after implementing Incremental Statistics feature. In 12c we can instruct to not consider locked partition or subpartition statistics as stale regardless of DML changes(No matter how many rows are modified, it also ignores STALE_PERCENT preference) by setting statistics preference INCREMENTAL_STALENESS to value USE_LOCKED_STATS. This way in 12c we can maintain global statistics incrementally by using existing locked partition level statistics and ignoring the fact that locked partition level statistics are stale. To set USE_LOCKED_STATS for table ORDER_DEMO. BEGIN DBMS_STATS.SET_TABLE_PREFS ( ownname => 'OE', tabname => 'ORDERS_DEMO', pname => 'INCREMENTAL_STALENESS', pvalue => 'USE_LOCKED_STATS'); END; / We can also set both USE_STALE_PERCENT and USE_LOCKED_STATS for table ORDER_DEMO. 
BEGIN DBMS_STATS.SET_TABLE_PREFS ( ownname => 'OE', tabname => 'ORDERS_DEMO', pname => 'INCREMENTAL_STALENESS', pvalue => 'USE_STALE_PERCENT,USE_LOCKED_STATS'); END; / NOTE: If preference INCREMENTAL_STALENESS is unset then by default it behaves similar to 11g - where even if single row is modified within the partition/sub-partition then Incremental gathering will gather partition/sub-partition statistics before deriving global statistics. Also even if single row is modified within the partition/sub-partition whose statistics are locked then Incremental gathering will perform full table scan to derive global statistics. Incremental Statistics during partition maintenance In 11g release if we perform partition maintenance operation then Incremental Statistics gathering will gather impacted partition statistics(synopsis) before deriving global statistics. Say for example, if we perform partition exchange with a table having up to date statistics(no synopsis) then Incremental gathering will ignore these table level stats and gather them once again after completion of partition exchange operation. Practically partition exchange is just a matter of updating dictionary information, but Incremental gathering is not able to leverage it and thus creates overhead of gathering partition level statistics(synopsis) for this new exchanged partition. There is no way to avoid this overhead in warehousing environment where loading table through partition exchange operation is very common. In 12c we can create synopsis on a non-partitioned table which is going to be exchanged with the partitioned table where global statistics are maintained incrementally. The synopsis of non-partitioned table will allow to maintain incremental statistics as part of a partition exchange operation without having to explicitly gathering statistics on the partition after the exchange. For example, I want to exchange non-partitioned staging table ORDERS_STAGING with partitioned table ORDERS_DEMO along with synopsis to maintain incremental statistics as part of the partition exchange operation Partitioned table : ORDERS_DEMO Non-partitioned table : ORDERS_STAGING 1. Enable synopsis creation for non-partitioned table ORDERS_STAGING BEGIN -- Enable Incremental feature DBMS_STATS.SET_TABLE_PREFS ( ownname => 'OE', tabname => 'ORDERS_STAGING', pname => 'INCREMENTAL', pvalue => 'TRUE'); -- Set synopsis creation at table level DBMS_STATS.SET_TABLE_PREFS ( ownname => 'OE', tabname => 'ORDERS_STAGING', pname => 'INCREMENTAL_LEVEL', pvalue => 'TABLE'); END; / As you saw I have set INCREMENTAL_LEVEL preference to value 'TABLE', this is a new introduction of preference in 12c to gather table-level synopsis. 2. Create synopsis for non-partitioned staging table BEGIN DBMS_STATS.GATHER_TABLE_STATS ( ownname => 'OE', tabname => 'ORDERS_STAGING'); END; / 3. 
Check and confirm synopsis creation of staging table ORDERS_STAGING SELECT o.name "Table Name", p.subname "Part", c.name "Column", h.analyzetime "Synopsis Creation Time" FROM WRI$_OPTSTAT_SYNOPSIS_HEAD$ h, OBJ$ o, USER$ u, COL$ c, ( ( SELECT TABPART$.bo# BO#, TABPART$.obj# OBJ# FROM TABPART$ tabpart$ ) UNION ALL ( SELECT TABCOMPART$.bo# BO#, TABCOMPART$.obj# OBJ# FROM TABCOMPART$ tabcompart$ ) ) tp, OBJ$ p WHERE u.name = 'OE' AND o.name = 'ORDERS_STAGING' AND tp.obj# = p.obj# AND h.bo# = tp.bo# AND h.group# = tp.obj# * 2 AND h.bo# = c.obj#(+) AND h.intcol# = c.intcol#(+) AND o.owner# = u.user# AND h.bo# = o.obj# ORDER BY 4,1,2,3 / Table Name Part Column Synopsis Creation Time -------------------- -------------------- ------------------------- ------------------------------ ORDERS_STAGING CUSTOMER_ID 2015-11-23-04:01:46 ORDERS_STAGING ORDER_DATE 2015-11-23-04:01:46 ORDERS_STAGING ORDER_ID 2015-11-23-04:01:46 ORDERS_STAGING ORDER_MODE 2015-11-23-04:01:46 ORDERS_STAGING ORDER_STATUS 2015-11-23-04:01:46 4. Perform partition exchange ALTER TABLE ORDER_DEMO EXCHANGE PARTITION ORDERS_DEC_2015 WITH TABLE ORDERS_STAGING; 5. Check and confirm that partition level synopsis(ORDERS_DEC_2015) has been exchanged instead of re-gathering it SELECT o.name "Table Name", p.subname "Part", c.name "Column", h.analyzetime "Synopsis Creation Time" FROM WRI$_OPTSTAT_SYNOPSIS_HEAD$ h, OBJ$ o, USER$ u, COL$ c, ( ( SELECT TABPART$.bo# BO#, TABPART$.obj# OBJ# FROM TABPART$ tabpart$ ) UNION ALL ( SELECT TABCOMPART$.bo# BO#, TABCOMPART$.obj# OBJ# FROM TABCOMPART$ tabcompart$ ) ) tp, OBJ$ p WHERE u.name = 'OE' AND o.name = 'ORDERS_DEMO' AND p.subname in ('ORDERS_DEC_2015','ORDERS_NOV_2015','ORDERS_OCT_2015') AND tp.obj# = p.obj# AND h.bo# = tp.bo# AND h.group# = tp.obj# * 2 AND h.bo# = c.obj#(+) AND h.intcol# = c.intcol#(+) AND o.owner# = u.user# AND h.bo# = o.obj# ORDER BY 4,1,2,3 / Table Name Part Column Synopsis Creation Time -------------------- -------------------- ------------------------- ------------------------------ ORDERS_DEMO ORDERS_SEP_2015 CUSTOMER_ID 2015-11-17-01:00:25 ORDERS_DEMO ORDER_DATE 2015-11-17-01:00:25 ORDERS_DEMO ORDER_ID 2015-11-17-01:00:25 ORDERS_DEMO ORDER_MODE 2015-11-17-01:00:25 ORDERS_DEMO ORDER_STATUS 2015-11-17-01:00:25 ORDERS_DEMO ORDERS_OCT_2015 CUSTOMER_ID 2015-11-17-02:28:15 ORDERS_DEMO ORDER_DATE 2015-11-17-02:28:15 ORDERS_DEMO ORDER_ID 2015-11-17-02:28:15 ORDERS_DEMO ORDER_MODE 2015-11-17-02:28:15 ORDERS_DEMO ORDER_STATUS 2015-11-17-02:28:15 ORDERS_DEMO ORDERS_NOV_2015 CUSTOMER_ID 2015-11-17-03:35:19 ORDERS_DEMO ORDER_DATE 2015-11-17-03:35:19 ORDERS_DEMO ORDER_ID 2015-11-17-03:35:19 ORDERS_DEMO ORDER_MODE 2015-11-17-03:35:19 ORDERS_DEMO ORDER_STATUS 2015-11-17-03:35:19 ORDERS_DEMO ORDERS_DEC_2015 CUSTOMER_ID 2015-11-23-04:01:46 ORDERS_DEMO ORDER_DATE 2015-11-23-04:01:46 ORDERS_DEMO ORDER_ID 2015-11-23-04:01:46 ORDERS_DEMO ORDER_MODE 2015-11-23-04:01:46 ORDERS_DEMO ORDER_STATUS 2015-11-23-04:01:46 By comparing creation time(2015-11-23-04:01:46) of Synopsis we can conclude that it has been copied from ORDERS_STAGING to ORDERS_DEMO partition ORDERS_DEC_2015 due to partition exchange operation. This way we can easily swap the synopsis from staging table to partitioned table along with partition exchange operation and avoid overhead of re-gathering partition level stats and synopsis. Conclusion Alongside Histograms can also take advantage of this feature as they can be derived/aggregated from partition level histogram to build table level histogram. 
If there is a change in method_opt while gathering stats that results in a new histogram on a partition, then the whole table has to be scanned using a small sample to build the table-level histogram. Otherwise, deriving global histograms from partition-level histograms saves a huge amount of the time and resources required to build them, as each histogram adds another burden to dbms_stats processing. It is important to keep in mind that histogram data is not stored in the synopsis, but a global histogram can still be derived from partition-level histograms by leveraging the Incremental Statistics gathering feature. In the case of indexes, Oracle does not use the incremental strategy; to gather higher-level statistics for partitioned indexes it scans the complete index with a lower sample size, and the time required is directly proportional to the index size. Using Incremental Statistics gathering in 11g was painful due to the limited control over it, but in 12c we have much greater control over its behavior. The enhancements in 12c have made this feature prominent in warehousing environments by saving a huge amount of resources when gathering expensive global statistics on partitioned tables.
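For quick reference, the 12c preferences discussed in this article can be set together in a single block. A minimal sketch using the example tables and the values shown above; pick only the preferences that match your staleness and partition-exchange requirements:

BEGIN
  -- partitioned table: incremental stats, tolerate stale and locked partition stats
  DBMS_STATS.SET_TABLE_PREFS('OE','ORDERS_DEMO','INCREMENTAL','TRUE');
  DBMS_STATS.SET_TABLE_PREFS('OE','ORDERS_DEMO','INCREMENTAL_STALENESS','USE_STALE_PERCENT,USE_LOCKED_STATS');
  DBMS_STATS.SET_TABLE_PREFS('OE','ORDERS_DEMO','STALE_PERCENT','20');
  -- staging table used for partition exchange: build a table-level synopsis
  DBMS_STATS.SET_TABLE_PREFS('OE','ORDERS_STAGING','INCREMENTAL','TRUE');
  DBMS_STATS.SET_TABLE_PREFS('OE','ORDERS_STAGING','INCREMENTAL_LEVEL','TABLE');
END;
/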
Wiki Page: Streaming MySQL Table Data to Oracle NoSQL Database with Flume
Consider the requirement that data is added to MySQL Database from the MySQL Command Line Interface (CLI) and a copy of the data is to be made available for another application or user in Oracle NoSQL Database. Or, the MySQL Database table data is to be backed up in Oracle NoSQL Database. Integrating Flume with MySQL as a source and Oracle NoSQL Database as a sink would copy a MySQL table to Oracle NoSQL Database. In this tutorial we shall stream MySQL table data to Oracle NoSQL Database using Flume. This tutorial has the following sections. Installing MySQL Database Installing Oracle NoSQL Database Setting the Environment Creating a Database Table in MySQL Configuring Flume Running a Flume Agent Streaming Data, not just Bulk Transferring Data Installing MySQL Database First, install MySQL Database, which is to be used as the Flume source. Create a directory to install MySQL and set its permissions to global (777). mkdir /mysql chmod -R 777 /mysql cd /mysql Download and extract the MySQL Database tar.gz file. tar zxvf mysql-5.6.22-linux-glibc2.5-i686.tar.gz Create the mysql group and add the mysql user to the group, if not already added. >groupadd mysql >useradd -r -g mysql mysql Create a symlink for MySQL Database installation directory. >ln -s /mysql/mysql-5.6.19-linux-glibc2.5-i686 mysql >cd mysql Set the current directory owner and group to mysql and install the MySQL Database. chown -R mysql . chgrp -R mysql . scripts/mysql_install_db --user=mysql Change the current directory owner to root and change the data directory owner to mysql . chown -R root . chown -R mysql data Start the MySQL Database. mysqld_safe --user=mysql & By default the root user does not require a password. Set a password for the root user to mysql with the following command. >mysqladmin -u root -p password Installing Oracle NoSQL Database Download and extract the Oracle NoSQL Database tar.gz file. wget http://download.oracle.com/otn-pub/otn_software/nosql-database/kv-ce-3.2.5.tar.gz tar -xvf kv-ce-3.2.5.tar.gz Create a lightweight Oracle NoSQL Database store called kvstore with the following command. java -jar /flume/kv-3.2.5/lib/kvstore.jar kvlite The kvstore gets created with host as localhost.oraclelinux and port as 5000. Setting the Environment We need to install the following software to run Flume. -Flume 1.4 -Hadoop 2.0.0 -flume-ng-sql-source plugin -Java 7 Create a directory /flume to install Flume and set its permissions to global (777). mkdir /flume chmod -R 777 /flume cd /flume Download and extract the Java gz file. tar zxvf jdk-7u55-linux-i586.gz Download and extract the CDH 4.6 Hadoop 2.0.0 tar.gz file. wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz Create symlinks for Hadoop conf and bin directories. ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/bin ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/conf Download and extract the CDH 4.6 Flume 1.4 tar.gz file. wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz Download the source code for flume-ng-sql-source from https://github.com/keedio/flume-ng-sql-source . Compile and package the plugin into a jar file with the following command. >mvn package The flume-ng-sql-source-0.8.jar jar gets generated in the target directory. Copy the flume-ng-sql-source-0.8.jar jar to the Flume lib directory. 
cp flume-ng-sql-source-0.8.jar /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib Also copy the MySQL JDBC Jar file to the Flume lib directory. cp mysql-connector-java-5.1.31-bin.jar /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib Set the environment variables for Hadoop, Flume, MySQL Database, and Java. vi ~/.bashrc export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0 export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf export JAVA_HOME=/flume/jdk1.7.0_55 export MYSQL_HOME=/mysql/mysql-5.6.19-linux-glibc2.5-i686 export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0 export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1 export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$FLUME_HOME/lib/* export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$MYSQL_HOME/bin export CLASSPATH=$HADOOP_CLASSPATH export HADOOP_NAMENODE_USER=flume export HADOOP_DATANODE_USER=flume Create a directory sql-source/lib in the $FLUME_HOME/plugins.d directory and copy the flume-ng-sql-source-0.8.jar file to the directory. mkdir -p $FLUME_HOME/plugins.d/sql-source/lib cp /media/sf_VMShared/flume/mysql/flume-ng-sql-source-0.8.jar $FLUME_HOME/plugins.d/sql-source/lib Set the configuration properties for Hadoop in the /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop/ core-site.xml file. fs.defaultFS hdfs://10.0.2.15:8020 hadoop.tmp.dir file:///var/lib/hadoop-0.20/cache Create the directory specified as the Hadoop tmp directory. mkdir -p /var/lib/hadoop-0.20/cache chmod -R 777 /var/lib/hadoop-0.20/cache Set the HDFS configuration properties in the hdfs-site.xml file. dfs.permissions.superusergroup hadoop dfs.namenode.name.dir file:///data/1/dfs/nn dfs.replication 1 dfs.permissions false Create the directory specified as the NameNode storage directory. mkdir -p /data/1/dfs/nn chmod -R 777 /data/1/dfs/nn Format and start NameNode. Also start DataNode. hdfs namenode -format hdfs namenode hdfs datanode To copy Flume into HDFS create /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib in HDFS and set its permissions to global. hdfs dfs -mkdir -p /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib hdfs dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib Put the Flume lib jars to HDFS. hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib Creating a Database Table in MySQL In this section create a MySQL Database table from which data is to be streamed to Oracle NoSQL Database. Login to the MySQL CLI and select the test database. >mysql –u root –p >use test Create a table called wlslog . CREATE TABLE wlslog (id INTEGER PRIMARY KEY, time_stamp VARCHAR2(4000), category VARCHAR2(4000), type VARCHAR2(4000), servername VARCHAR2(4000), code VARCHAR2(4000), msg VARCHAR2(4000)); Add 9 rows of data to the wlslog table. 
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(1,'Apr-8-2014-7:06:16-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STANDBY'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(2,'Apr-8-2014-7:06:17-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STARTING'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(3,'Apr-8-2014-7:06:18-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to ADMIN'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(4,'Apr-8-2014-7:06:19-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to RESUMING'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(5,'Apr-8-2014-7:06:20-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000361','Started WebLogic AdminServer'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(6,'Apr-8-2014-7:06:21-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to RUNNING'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(7,'Apr-8-2014-7:06:22-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(8,'Apr-8-2014-7:06:23-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode'); INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(9,'Apr-8-2014-7:06:24-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode'); Configuring Flume Next, configure Flume using the flume.conf file, which should be in the $FLUME_HOME/conf directory. Property Description Value agent.channels.ch1.type Sets the channel type memory agent.sources.sql-source.channels Sets the channel on the source ch1 agent.channels Sets the channel name ch1 agent.sinks Sets the sink name noSqlDbSink agent.sinks.noSqlDbSink.channel Sets the channel on the sink ch1 agent.sources Sets the source name sql-source agent.sources.sql-source.type Sets the source type org.apache.flume.source.SQLSource agent.sources.sql-source.connection.url Sets the connection url for the source jdbc:mysql://localhost:3306/test agent.sources.sql-source.user Sets the MySQL database user root agent.sources.sql-source.password Sets the MySQL database password mysql agent.sources.sql-source.table Sets the MySQL table name wlslog agent.sources.sql-source.database Sets the MySQL Database name test agent.sources.sql-source.columns.to.select Sets the columns to select to all columns. * agent.sources.sql-source.incremental.column.name Sets the column name whose value is to be incremented in selecting rows to transfer id agent.sources.sql-source.incremental.value Sets the initial incremental column value. A value of 0 transfers all rows. 
0 agent.sources.sql-source.run.query.delay Sets the query delay in ms 10000 agent.sources.sql-source.status.file.path Sets the directory for the status file /var/lib/flume agent.sources.sql-source.status.file.name Sets the status file name sql-source.status agent.sinks.noSqlDbSink.type Sets the sink type class com.gvenzl.flumekvstore.sink.NoSQLDBSink agent.sinks.noSqlDbSink.kvHost Sets the sink host localhost agent.sinks.noSqlDbSink.kvPort Sets the sink port 5000 agent.sinks.noSqlDbSink.kvStoreName Sets the KV Store name kvstore agent.sinks.noSqlDbSink.durability Sets the durability level WRITE_NO_SYNC agent.sinks.noSqlDbSink.keyPolicy Sets the key policy generate agent.sinks.noSqlDbSink.keyType Sets the key type random agent.sinks.noSqlDbSink.keyPrefix Sets the key prefix k_ agent.sinks.noSqlDbSink.batchSize Sets the batch size 10 agent.channels.ch1.capacity Sets the channel capacity 100000 The flume.conf file is listed: agent.channels.ch1.type = memory agent.sources.sql-source.channels = ch1 agent.channels = ch1 agent.sinks = noSqlDbSink agent.sinks.noSqlDbSink.channel = ch1 agent.sources = sql-source agent.sources.sql-source.type = org.apache.flume.source.SQLSource # URL to connect to database (currently only mysql is supported) agent.sources.sql-source.connection.url = jdbc:mysql://localhost:3306/test # Database connection properties agent.sources.sql-source.user = root agent.sources.sql-source.password = mysql agent.sources.sql-source.table = wlslog agent.sources.sql-source.database = test agent.sources.sql-source.columns.to.select = * # Increment column properties agent.sources.sql-source.incremental.column.name = id # Increment value is from you want to start taking data from tables (0 will import entire table) agent.sources.sql-source.incremental.value = 0 # Query delay, each configured milisecond the query will be sent agent.sources.sql-source.run.query.delay=10000 # Status file is used to save last readed row agent.sources.sql-source.status.file.path = /var/lib/flume agent.sources.sql-source.status.file.name = sql-source.status agent.sinks.noSqlDbSink.type = com.gvenzl.flumekvstore.sink.NoSQLDBSink agent.sinks.noSqlDbSink.kvHost = localhost agent.sinks.noSqlDbSink.kvPort = 5000 agent.sinks.noSqlDbSink.kvStoreName = kvstore agent.sinks.noSqlDbSink.durability = WRITE_NO_SYNC agent.sinks.noSqlDbSink.keyPolicy = generate agent.sinks.noSqlDbSink.keyType = random agent.sinks.noSqlDbSink.keyPrefix = k_ agent.sinks.noSqlDbSink.batchSize = 10 agent.channels.ch1.capacity = 100000 Create the directory and file for the SQL source status. mkdir -p /var/lib/flume chmod -R 777 /var/lib/flume cd /var/lib/flume vi sql-source.status :wq We also need to create the Flume env file from the template. cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh Running a Flume Agent Before running the Flume agent the following should have been configured/started. -Flume configuration file flume.conf -HDFS -Oracle NoSQL Database -MySQL Database Run the Flume agent with the following command. flume-ng agent --conf ./conf/ -f $FLUME_HOME/conf/flume.conf -n agent -Dflume.root.logger=INFO,console Flume agent gets started. The source, channel and sink get started and a connection with Oracle NoSQL Database gets established to stream MySQL table wlslog data with a SQL query that selects all rows. Subsequently the Flume agent continues to run with a SQL query with id>9 in the WHERE clause as uptil id 9 have already been transferred. 
A more detailed output from the Flume agent is listed: -Djava.library.path=:/usr/java/packages/lib/i386:/lib:/usr/lib org.apache.flume.node.Application -f /flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf -n agent 15/01/19 19:42:15 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting 15/01/19 19:42:15 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Added sinks: noSqlDbSink Agent: agent 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink 15/01/19 19:42:15 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent] 15/01/19 19:42:15 INFO node.AbstractConfigurationProvider: Creating channels 15/01/19 19:42:15 INFO channel.DefaultChannelFactory: Creating instance of channel ch1 type memory 15/01/19 19:42:15 INFO node.AbstractConfigurationProvider: Created channel ch1 15/01/19 19:42:15 INFO source.DefaultSourceFactory: Creating instance of source sql-source, type org.apache.flume.source.SQLSource 15/01/19 19:42:15 INFO source.SQLSource: Reading and processing configuration values for source sql-source 15/01/19 19:42:15 INFO source.SQLSource: Establishing connection to database test for source sql-source 15/01/19 19:42:16 INFO source.SQLSource: Source sql-source Connected to test 15/01/19 19:42:16 INFO sink.DefaultSinkFactory: Creating instance of sink: noSqlDbSink, type: com.gvenzl.flumekvstore.sink.NoSQLDBSink 15/01/19 19:42:16 INFO sink.NoSQLDBSink: Configuration settings: 15/01/19 19:42:16 INFO sink.NoSQLDBSink: kvHost: localhost 15/01/19 19:42:16 INFO sink.NoSQLDBSink: kvPort: 5000 15/01/19 19:42:16 INFO sink.NoSQLDBSink: kvStoreName: kvstore 15/01/19 19:42:16 INFO sink.NoSQLDBSink: durability: WRITE_NO_SYNC 15/01/19 19:42:16 INFO sink.NoSQLDBSink: keyPolicy: generate 15/01/19 19:42:16 INFO sink.NoSQLDBSink: keyType: random 15/01/19 19:42:16 INFO sink.NoSQLDBSink: keyPrefix: k_ 15/01/19 19:42:16 INFO sink.NoSQLDBSink: batchSize: 10 15/01/19 19:42:16 INFO node.AbstractConfigurationProvider: Channel ch1 connected to [sql-source, noSqlDbSink] 15/01/19 19:42:16 INFO node.Application: Starting new configuration:{ sourceRunners:{sql-source=PollableSourceRunner: { source:org.apache.flume.source.SQLSource{name:sql-source,state:IDLE} counterGroup:{ name:null counters:{} } }} sinkRunners:{noSqlDbSink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4473c counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} } 15/01/19 19:42:16 INFO node.Application: Starting Channel ch1 15/01/19 19:42:17 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean. 
15/01/19 19:42:17 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: ch1 started 15/01/19 19:42:17 INFO node.Application: Starting Sink noSqlDbSink 15/01/19 19:42:17 INFO node.Application: Starting Source sql-source 15/01/19 19:42:17 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed 15/01/19 19:42:17 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id; 15/01/19 19:42:17 INFO sink.NoSQLDBSink: Connection to KV store established 15/01/19 19:42:27 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed 15/01/19 19:42:27 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id; 15/01/19 19:42:37 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed 15/01/19 19:42:37 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id; 15/01/19 19:42:47 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed 15/01/19 19:42:47 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id; 15/01/19 19:42:57 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed 15/01/19 19:42:57 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id; To find if the MySQL Table data has been transferred to Oracle NoSQL Database start the Oracle NoSQL Database CLI and connect to the kvstore with the following commands. java -Xmx256m -Xms256m -jar /flume/kv-3.2.5/lib/kvstore.jar runadmin -port 5000 -host localhost connect store –host localhost –port 5000 –name kvstore Run the following command to select all key/value pairs in Oracle NoSQL Database store kvstore with the following command. get kv –all The 9 rows transferred from MySQL table get listed. Streaming Data, not just Transferring Bulk Transferring Data A lot of the bulk data transfer tools such as Sqoop transfer bulk data but terminate after having transferred the available data. Flume does not just transfer data but streams data, implying that after the available data has been transferred the Flume agent continues to run and if more data becomes available to transfer the data is transferred as and when the data becomes available. If the MySQL table wlslog was created after starting the Flume agent the MySQL table data would got streamed to Oracle NoSQL Database. For example, add another row of data to MySQL table wlslog with the following SQL statement. INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(10,'Apr-8-2014-7:06:25-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode'); A new row gets added to MySQL table wlslog . As indicated by the Flume output the Flume agent streams the new row of data to Oracle NoSQL Database. Run the get kv –all query in the Oracle NoSQL Database CLI. 10 rows of data get listed instead of the 9 rows listed previously. The Flume agent updates the SQL query from id>9 to id>10 in the WHERE clause and continues to run. In this tutorial we streamed MySQL table data to Oracle NoSQL Database using Flume.
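As a final sanity check, the incremental query pattern that appears in the agent log can be run directly from the MySQL CLI to see what the source will pick up next. A sketch; the literal 10 assumes the status file currently records 10 as the last transferred id:

-- mirrors the query pattern issued by the flume-ng-sql-source plugin
SELECT * FROM wlslog WHERE id > 10 ORDER BY id;
-- returns no rows until rows with a higher id are inserted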
Wiki Page: Streaming Oracle Database Logs to HDFS with Flume
Apache Flume is a service based on streaming data flow for collecting, aggregating and moving large quantities of log data. A unit of data flow in Flume is called an event. Flume is made of three components: source, channel and sink. The source, channel and sink are collectively hosted by a Flume agent, which is a JVM process. Data flow originates in the source, which could have received the data from an external source, and is stored in the channel before being consumed by the sink. Different types of sources, channels and sinks are supported including Avro source, Thrift source, Exec source, JMS source, spooling directory source, sequence generator source, syslog source, HTTP source, scribe source and custom source. Different types of channels are supported including memory channel, JDBC channel, file channel and custom channel. The different types of sinks supported include HDFS sink, logger sink, Avro sink, Thrift sink, HBase sink, ElasticSearch sink, and custom sink. In this tutorial we shall stream Oracle Database Alert log to HDFS using Flume. Setting the Environment Finding the Log Directory Configuring Flume Running the Flume Agent Streaming a Complete Log File Exception when Processing Event Batch Setting the Environment Oracle Linux 6.5 installed on Oracle VirtualBox 4.3 is used. We need to download and install the following software. Oracle Database 11g Java 7 Flume 1.4 Hadoop 2.0.0 Create a directory /flume to install the software and set the directory’s permissions. mkdir /flume chmod -R 777 /flume cd /flume Download Java 7 .gz file and extract the file to the /flume directory. tar zxvf jdk-7u55-linux-i586.tar.gz Download and extract Hadoop 2.0.0 hadoop-2.0.0-cdh4.6.0.tar.gz file. wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz Create symlinks for bin and conf directories. ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/conf Download and install Flume 1.4.0. wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz Set the environment variables for Hadoop, Java, Flume, and Oracle in the bash shell file. vi ~/.bashrc export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0 export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf export JAVA_HOME=/flume/jdk1.7.0_55 export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1 export ORACLE_SID=ORCL export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0 export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2 export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/lib/*: export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$ORACLE_HOME/bin:$ FLUME_HOME/bin export CLASSPATH=$HADOOP_CLASSPATH Set the configuration properties fs.defaultFS and hadoop.tmp.dir in the /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop/core-site.xml file. fs.defaultFS hdfs://10.0.2.15:8020 hadoop.tmp.dir file:///var/lib/hadoop-0.20/cache Remove any previously created Hadoop temporary directory and create the directory again and set its permissions. 
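The core-site.xml entries above are shown as flattened name/value pairs; in the file itself they sit inside the standard Hadoop XML wrapper, roughly as in the sketch below (the <configuration> and <property> elements are the standard Hadoop configuration format and are assumed here; only the two name/value pairs come from the text). The hdfs-site.xml properties set in the next step follow the same structure.
<configuration>
  <property>
    <!-- NameNode URI used by the Hadoop shell commands and the Flume HDFS sink -->
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.2.15:8020</value>
  </property>
  <property>
    <!-- Hadoop temporary directory created in the next step -->
    <name>hadoop.tmp.dir</name>
    <value>file:///var/lib/hadoop-0.20/cache</value>
  </property>
</configuration>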
rm -rf /var/lib/hadoop-0.20/cache
mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777 /var/lib/hadoop-0.20/cache
Set the configuration properties dfs.permissions.superusergroup, dfs.namenode.name.dir, dfs.replication, and dfs.permissions in hdfs-site.xml.
dfs.permissions.superusergroup hadoop
dfs.namenode.name.dir file:///data/1/dfs/nn
dfs.replication 1
dfs.permissions false
Remove any previously created NameNode storage directory, create the directory again and set its permissions.
rm -rf /data/1/dfs/nn
mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn
Format the NameNode and start the NameNode and DataNode (HDFS).
hadoop namenode -format
hadoop namenode
hadoop datanode
We need to copy Flume to HDFS for it to be available in the runtime classpath. Create a directory in HDFS with the same directory structure as the Flume lib directory and set its permissions to global (777).
hadoop dfs -mkdir /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
hadoop dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
Put the Flume lib directory jars in the HDFS.
hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
Copy the Flume env template file to the flume-env.sh file.
cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh
Finding the Log Directory
We shall be streaming data from one of the Oracle Database trace files, the Oracle alert log, to HDFS using Flume. To find the directory location of the trace files run a SELECT query on the v$diag_info view.
select * from v$diag_info
The trace files directory gets listed. The Oracle alert log is also generated in a separate directory, which also gets listed. Change directory (cd) to the trace files directory /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace and list the files in the trace directory with the ls -l command. The trace files, including the alert_ORCL.log file, get listed. We shall be streaming data from the alert_ORCL.log file with Flume.
Configuring Flume
The Flume agent, which hosts the sources, channels and sinks, is configured in the flume.conf file. From the $FLUME_CONF directory run the ls -l command to list the configuration files, which include the Flume configuration file template flume-conf.properties.template. Copy the Flume configuration properties template file to flume.conf, creating a new file flume.conf.
cp conf/flume-conf.properties.template conf/flume.conf
We shall configure the following properties in flume.conf for a Flume agent called agent1; each entry below gives the configuration property, its description, and the value set.
agent1.channels: The Flume agent channels. We shall be using only one channel, called ch1 (the channel name is arbitrary). Value: agent1.channels = ch1
agent1.sources: The Flume agent sources. We shall be using one source of type exec called exec1 (the source name is arbitrary). Value: agent1.sources = exec1
agent1.sinks: The Flume agent sinks. We shall be using one sink of type hdfs called HDFS (the sink name is arbitrary). Value: agent1.sinks = HDFS
agent1.channels.ch1.type: The channel type is memory. Value: agent1.channels.ch1.type = memory
agent1.sources.exec1.channels: Define the flow by binding the source to the channel. Value: agent1.sources.exec1.channels = ch1
agent1.sources.exec1.type: Specify the source type as exec. Value: agent1.sources.exec1.type = exec
agent1.sources.exec1.command: Runs the specified Unix command and produces data on stdout.
Commonly used commands are the Unix commands cat and tail, for copying a complete log file or the last KB of a log file to stdout. We shall be demonstrating both of these commands. Value: agent1.sources.exec1.command = tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log or agent1.sources.exec1.command = cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
agent1.sinks.HDFS.channel: Define the flow by binding the sink to the channel. Value: agent1.sinks.HDFS.channel = ch1
agent1.sinks.HDFS.type: Specify the sink type as hdfs. Value: agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path: Specify the sink path in HDFS. HDFS has two connotations in the example Flume agent: HDFS is the name of the sink (the sink name may be set to a different value), and the sink type is also hdfs. Value: agent1.sinks.HDFS.hdfs.path = hdfs://10.0.2.15:8020/flume
agent1.sinks.HDFS.hdfs.file.Type: The HDFS file type. Value: agent1.sinks.HDFS.hdfs.file.Type = DataStream
The flume.conf file is listed:
agent1.channels.ch1.type = memory
agent1.sources.exec1.channels = ch1
agent1.sources.exec1.type = exec
agent1.sources.exec1.command = tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
agent1.sinks.HDFS.channel = ch1
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://10.0.2.15:8020/flume
agent1.sinks.HDFS.hdfs.file.Type = DataStream
agent1.channels = ch1
agent1.sources = exec1
agent1.sinks = HDFS
Running the Flume Agent
In this section we shall run the Flume agent to stream the last KB of the alert_ORCL.log file to HDFS using the tail command. Run the Flume agent using the flume-ng shell script in the bin directory. Specify the agent name using the -n option, the configuration directory using the --conf option and the configuration file using the -f option. Specify the Flume logger with -Dflume.root.logger=INFO,console to log at INFO level to the console. Run the following command to run the Flume agent.
>cd /flume/apache-flume-1.4.0-cdh4.6.0-bin
>bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -Dflume.root.logger=INFO,console
The Flume agent gets started and streams the Oracle alert log file to HDFS. The Flume agent runs the following procedure.
1. Start the configuration provider.
2. Add sink (HDFS) to agent agent1.
3. Create an instance of channel ch1 of type memory.
4. Create an instance of source exec1 of type exec.
5. Create an instance of sink HDFS of type hdfs.
6. Connect channel ch1 to source and sink [exec1, HDFS].
7. Start channel ch1.
8. Start sink HDFS.
9. Start source exec1.
10. Create the FlumeData file.
A more detailed output from the Flume agent is as follows.
[root@localhost apache-flume-1.4.0-cdh4.6.0-bin]# bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -Dflume.root.logger=INFO,console Info: Sourcing environment configuration script /flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume-env.sh Info: Including Hadoop libraries found via (/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin/hadoop) for HDFS access Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-api-1.6.1.jar from classpath Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar from classpath + exec /flume/jdk1.7.0_55/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf:/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce/lib-examples:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/contrib/capacity-scheduler/*.jar' -Djava.library.path=:/flume/hadoop-2.0.0-cdh4.6.0/lib/native org.apache.flume.node.Application -f conf/flume.conf -n agent1 2014-11-17 11:56:23,219 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting 2014-11-17 11:56:23,242 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:conf/flume.conf 2014-11-17 11:56:23,272 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 11:56:23,278 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 11:56:23,281 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 11:56:23,282 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 11:56:23,283 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: HDFS Agent: agent1 2014-11-17 11:56:23,396 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent1] 2014-11-17 11:56:23,397 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels 2014-11-17 11:56:23,449 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel ch1 type memory 2014-11-17 11:56:23,538 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel ch1 2014-11-17 11:56:23,540 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source exec1, type exec 2014-11-17 11:56:23,581 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: HDFS, type: hdfs 2014-11-17 11:56:24,183 (conf-file-poller-0) [WARN - 
org.apache.hadoop.util.NativeCodeLoader. (NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-11-17 11:56:24,518 (conf-file-poller-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:523)] Hadoop Security enabled: false 2014-11-17 11:56:24,533 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel ch1 connected to [exec1, HDFS] 2014-11-17 11:56:24,591 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{exec1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec1,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@14ced4e counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} } 2014-11-17 11:56:24,618 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch1 2014-11-17 11:56:24,817 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean. 2014-11-17 11:56:24,819 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: ch1 started 2014-11-17 11:56:24,820 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS 2014-11-17 11:56:24,822 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source exec1 2014-11-17 11:56:24,823 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:163)] Exec source starting with command:tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log 2014-11-17 11:56:24,837 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean. 2014-11-17 11:56:24,838 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: HDFS started 2014-11-17 11:56:24,864 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: exec1: Successfully registered new MBean. 
2014-11-17 11:56:24,868 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: exec1 started 2014-11-17 11:56:28,873 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSSequenceFile.configure(HDFSSequenceFile.java:63)] writeFormat = Writable, UseRawLocalFileSystem = false 2014-11-17 11:56:28,982 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870.tmp 2014-11-17 11:57:01,321 (hdfs-HDFS-call-runner-3) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870 2014-11-17 11:57:01,339 (hdfs-HDFS-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:377)] Writer callback called. Run the following command to list the files in the /flume directory, which is the directory of the Flume sink. hadoop fs -ls hdfs://10.0.2.15:8020/flume The FlumeData file is one of the files listed. Run the following command to find the disk usage of Flume generated data. hadoop dfs -du /flume The FlumeData file disk usage gets listed. Run the following command to output the FlumeData file to the stdout . hadoop dfs -cat hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870 The FlumeData file gets output to stdout . Run the following command to copy the FlumeData file to local filesystem and subsequently open the FlumeData file. hadoop dfs -copyToLocal hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870 /flume The FlumeData file gets displayed. Streaming a Complete Log File In this section we shall stream the complete alert log file alert_ORCL.log using the following configuration property in flume.conf . agent1.sources.exec1.command = cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace /alert_ORCL.log The complete alert log file gets generated and multiple FlumeData files get generated. 
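The multiple FlumeData files are a consequence of the HDFS sink's default roll settings (by default the sink rolls to a new file every 30 seconds, every 1024 bytes, or every 10 events). If fewer, larger files are preferred, the standard roll properties could be raised in flume.conf; the sketch below uses illustrative values that are not part of the original configuration. The detailed output from this run of the Flume agent is listed next.
# illustrative roll settings; 0 disables the time- and event-count-based criteria
agent1.sinks.HDFS.hdfs.rollInterval = 0
agent1.sinks.HDFS.hdfs.rollCount = 0
agent1.sinks.HDFS.hdfs.rollSize = 134217728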
[root@localhost apache-flume-1.4.0-cdh4.6.0-bin]# bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -Dflume.root.logger=INFO,console Info: Sourcing environment configuration script /flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume-env.sh Info: Including Hadoop libraries found via (/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin/hadoop) for HDFS access Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-api-1.6.1.jar from classpath Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar from classpath + exec /flume/jdk1.7.0_55/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf:/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/contrib/capacity-scheduler/*.jar' -Djava.library.path=:/flume/hadoop-2.0.0-cdh4.6.0/lib/native org.apache.flume.node.Application -f conf/flume.conf -n agent1 2014-11-17 12:17:03,058 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting 2014-11-17 12:17:03,073 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:conf/flume.conf 2014-11-17 12:17:03,110 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 12:17:03,122 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 12:17:03,124 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 12:17:03,125 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS 2014-11-17 12:17:03,125 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: HDFS Agent: agent1 2014-11-17 12:17:03,292 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent1] 2014-11-17 12:17:03,296 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels 2014-11-17 12:17:03,347 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel ch1 type memory 2014-11-17 12:17:03,385 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel ch1 2014-11-17 12:17:03,388 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source exec1, type exec 2014-11-17 12:17:03,429 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: HDFS, type: hdfs 2014-11-17 12:17:04,060 (conf-file-poller-0) [WARN - org.apache.hadoop.util.NativeCodeLoader. 
(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-11-17 12:17:04,387 (conf-file-poller-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:523)] Hadoop Security enabled: false 2014-11-17 12:17:04,406 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel ch1 connected to [exec1, HDFS] 2014-11-17 12:17:04,467 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{exec1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec1,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@238a4d counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} } 2014-11-17 12:17:04,492 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch1 2014-11-17 12:17:04,809 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean. 2014-11-17 12:17:04,811 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: ch1 started 2014-11-17 12:17:04,813 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS 2014-11-17 12:17:04,814 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source exec1 2014-11-17 12:17:04,816 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:163)] Exec source starting with command:cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log 2014-11-17 12:17:04,832 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean. 2014-11-17 12:17:04,833 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: HDFS started 2014-11-17 12:17:04,849 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: exec1: Successfully registered new MBean. 
2014-11-17 12:17:04,851 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: exec1 started 2014-11-17 12:17:04,969 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSSequenceFile.configure(HDFSSequenceFile.java:63)] writeFormat = Writable, UseRawLocalFileSystem = false 2014-11-17 12:17:05,087 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624953.tmp 2014-11-17 12:17:07,536 (hdfs-HDFS-call-runner-2) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624953.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624953 2014-11-17 12:17:07,630 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624954.tmp 2014-11-17 12:17:07,814 (hdfs-HDFS-call-runner-6) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624954.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624954 2014-11-17 12:17:07,920 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624955.tmp 2014-11-17 12:17:08,574 (hdfs-HDFS-call-runner-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624955.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624955 2014-11-17 12:17:08,709 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624956.tmp 2014-11-17 12:17:09,335 (hdfs-HDFS-call-runner-4) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624956.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624956 2014-11-17 12:17:09,513 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624957.tmp 2014-11-17 12:17:09,851 (hdfs-HDFS-call-runner-8) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624957.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624957 2014-11-17 12:17:09,992 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624958.tmp 2014-11-17 12:17:10,268 (hdfs-HDFS-call-runner-2) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624958.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624958 2014-11-17 12:17:10,377 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624959.tmp 2014-11-17 12:17:11,091 (hdfs-HDFS-call-runner-6) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming 
hdfs://10.0.2.15:8020/flume/FlumeData.1416244624959.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624959 2014-11-17 12:17:11,175 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624960.tmp 2014-11-17 12:17:11,476 (hdfs-HDFS-call-runner-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624960.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624960 2014-11-17 12:17:11,580 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624961.tmp 2014-11-17 12:17:11,936 (hdfs-HDFS-call-runner-4) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624961.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624961 2014-11-17 12:17:12,021 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624962.tmp 2014-11-17 12:17:12,254 (hdfs-HDFS-call-runner-8) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624962.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624962 2014-11-17 12:17:12,343 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624963.tmp 2014-11-17 12:17:42,598 (hdfs-HDFS-call-runner-3) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624963.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624963 2014-11-17 12:17:42,605 (hdfs-HDFS-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:377)] Writer callback called. List the FlumeData files with the hadoop fs -ls hdfs://10.0.2.15:8020/flume command as before. Output one of the FlumeData files to the stdout . If all the FlumeData files are required to be deleted run the following command. hadoop fs -rm hdfs://10.0.2.15:8020/flume/FlumeData.* All FlumeData files get deleted. Exception when Processing Event Batch When processing a large file such as the alert_ORCL with the cat command the Flume agent might fail in putting the event batch on the channel and generate the following exception. 
2014-11-17 12:17:07,943 (pool-3-thread-1) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:347)] Failed while running command: cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log org.apache.flume.ChannelException: Unable to put batch on required channel: Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired Sinks are likely not keeping up with sources, or the buffer size is too tight 2014-11-17 12:17:07,998 (timedFlushExecService17-0) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable$1.run(ExecSource.java:322)] Exception occured when processing event batch org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: ch1} [INFO - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:370)] Command [cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log] exited with 141 The Flume agent still could run to completion after an interruption. The exception is generated because the default queue size of 100 is not enough. Increase the default queue size with the following configuration property in flume.conf . agent1.channels.ch1.capacity = 100000 In this tutorial we streamed Oracle Database log file data to HDFS using Flume.
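With that change, the memory channel section of flume.conf would look roughly like the sketch below; the capacity value comes from the text above, while transactionCapacity is an additional standard memory-channel setting shown purely as an assumption.
agent1.channels.ch1.type = memory
# raised from the default of 100 to avoid the ChannelException above
agent1.channels.ch1.capacity = 100000
# assumed companion setting; must not exceed capacity
agent1.channels.ch1.transactionCapacity = 1000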
Wiki Page: Streaming Oracle Database Logs to HBase with Flume
In the previous tutorial we discussed streaming Oracle logs to HDFS using Flume. Flume supports various types of sources and sinks including the HBase database as a sink. In this tutorial we shall discuss streaming Oracle log file to HBase. This tutorial has the following sections. Setting the Environment Starting HDFS Starting HBase Configuring Flume Agent for HBase Running the Flume Agent Scanning HBase Table ChannelException Setting the Environment We have used the same environment as in the streaming to HDFS. Oracle Database 11g is installed on Oracle Linux 6.5 on VirtualBox 4.3. We need to download and install the following software. Oracle Database 11g HBase Java 7 Flume 1.4 Hadoop 2.0.0 First, create a directory to install the software and set its permissions. mkdir /flume chmod -R 777 /flume cd /flume Create the hadoop group and add the hbase user to the hadoop group. >groupadd hadoop >useradd –g hadoop hbase Download and install Java 7. >tar zxvf jdk-7u55-linux-i586.tar.gz Download and install CDH 4.6 Hadoop 2.0.0. >wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz >tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz Create symlinks for Hadoop bin and conf files. >ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin-mapreduce1 /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/bin >ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/conf Download and install CDH 4.6 Flume 1.4.9. wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz Download and install CDH 4.6 HBase 0.94.15. wget http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.15-cdh4.6.0.tar.gz tar -xvf hbase-0.94.15-cdh4.6.0.tar.gz Set permissions of the Flume root directory to global. chmod 777 -R /flume/apache-flume-1.4.0-cdh4.6.0-bin Set the environment variables for Oracle Database, Java, HBase, Flume, and Hadoop in the bash shell file. vi ~/.bashrc export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0 export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf export HBASE_HOME=/flume/hbase-0.94.15-cdh4.6.0 export HBASE_CONF=/flume/hbase-0.94.15-cdh4.6.0/conf export JAVA_HOME=/flume/jdk1.7.0_55 export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1 export ORACLE_SID=ORCL export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0 export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1 export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HBASE_CONF:$HBASE_HOME/lib/* export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$ORACLE_HOME/bin:$FLUME_HOME/bin:$HBASE_HOME/bin export CLASSPATH=$HADOOP_CLASSPATH Starting HDFS In this section we shall configure and start HDFS. Cd to the Hadoop configuration directory. cd /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop Set the NameNode URI ( fs.defaultFS ) and the Hadoop temporary directory ( hadoop.tmp.dir ) configuration properties in the core-site.xml file. fs.defaultFS hdfs://10.0.2.15:8020 hadoop.tmp.dir file:///var/lib/hadoop-0.20/cache Remove any previously created temporary directory and create the directory again and set its permissions to global. 
rm -rf /var/lib/hadoop-0.20/cache mkdir -p /var/lib/hadoop-0.20/cache chmod -R 777 /var/lib/hadoop-0.20/cache Set the NameNode storage directory ( dfs.namenode.name.dir ), superusergroup ( dfs.permissions.superusergroup ), replication factor ( dfs.replication ), the upper bound on the number of files the DataNode is able to serve concurrently ( dfs.datanode.max.xcievers ), and permission checking ( dfs.permissions ) configuration properties in the hdfs-site.xml . dfs.permissions.superusergroup hadoop dfs.namenode.name.dir file:///data/1/dfs/nn dfs.replication 1 dfs.permissions false dfs.datanode.max.xcievers 4096 Remove any previously created NameNode storage directory and create a new directory and set its permissions to global. rm -rf /data/1/dfs/nn mkdir -p /data/1/dfs/nn chmod -R 777 /data/1/dfs/nn Format and start the NameNode. hadoop namenode -format hadoop namenode Start the DataNode. hadoop datanode We need to copy the Flume lib directory jars to the HDFS to be available to the runtime. Create a directory in HDFS with the same directory structure as the Flume lib directory and set its permissions to global. hadoop dfs -mkdir /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib hadoop dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib Put the Flume lib directory jars to the HDFS. hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib Create the Flume configuration file flume.conf from the template. Also create the Flume env file flume-env.sh from the template. cp $FLUME_HOME/conf/ flume-conf.properties.template $FLUME_HOME/conf/flume.conf cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh We shall set the configuration properties for Flume in a subsequent section, but first we shall install HBase. Starting HBase In this section we shall configure and start HBase. HBase configuration is discussed in detail in another tutorial ( http://www.toadworld.com/platforms/oracle/w/wiki/10976.loading-hbase-table-data-into-an-oracle-database-with-oracle-loader-for-hadoop.aspx ). Set the HBase configuration in the /flume/hbase-0.94.15-cdh4.6.0/conf/hbase-site.xml configuration file as follows. hbase.rootdir hdfs://10.0.2.15:8020/hbase hbase.zookeeper.property.dataDir /zookeeper hbase.zookeeper.property.clientPort 2182 hbase.zookeeper.quorum localhost hbase.regionserver.port 60020 hbase.master.port 60000 Create the Zookeeper data directory and set its permissions. mkdir -p /zookeeper chmod -R 700 /zookeeper As root user create the HBase root directory in HDFS /hbase and set its permissions to global (777). root>hdfs dfs -mkdir /hbase hdfs dfs -chmod -R 777 /hbase As root user increase the maximum number of file handles in the /etc/security/limits.conf file. Set the following ulimit for hdfs and hbase users. hdfs - nofile 32768 hbase - nofile 32768 Start the HBase nodes Zookeeper, Master and Regionserver. hbase-daemon.sh start zookeeper hbase-daemon.sh start master hbase-daemon.sh start regionserver The jps command should list the HDFS and HBase nodes as started. Start the HBase shell with the following command. hbase shell Create a table ( flume ) and a column family ( orcllog ) with the following command. create 'flume' , 'orcllog' The HBase table gets created. Configuring Flume Agent for HBase In this section we shall set the Flume agent configuration in the flume.conf file. We shall configure the following properties in flume.conf for a Flume agent called hbase-agent . 
Configuration Property Description Value hbase-agent.channels The Flume agent channels. We shall be using only channel called ch1 (the channel name is arbitrary). hbase-agent.channels=ch1 hbase-agent.sources The Flume agent sources. We shall be using one source of type exec called tail (the source name is arbitrary). hbase-agent.sources=tail hbase-agent.sinks The Flume agent sinks. We shall be using one sink of type HBaseSink called sink1 (the sink name is arbitrary). hbase-agent.sinks=sink1 hbase-agent.channels.ch1.type The channel type is memory. hbase-agent.channels.ch1.type=memory hbase-agent.sources.tail.channels Define the flow by binding the source to the channel. hbase-agent.sources.tail.channels=ch1 hbase-agent.sources.tail.type Specify the source type as exec. hbase-agent.sources.tail.type=exec hbase-agent.sources.tail.command Runs the specified Unix command and produce data on stdout. Commonly used commands are the HDFS shell commands cat and tail for copying a complete log file or the last KB of a log file to stdout. We shall be demonstrating both of these commands. hbase-agent.sources.tail.command = tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace /alert_ORCL.log or hbase-agent.sources.tail.command = cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace /alert_ORCL.log hbase-agent.sinks.sink1.channel Define the flow by binding the sink to the channel. hbase-agent.sinks.sink1.channel=ch1 hbase-agent.sinks.sink1.type Specify the sink type as HbaseSink or AsyncHbaseSink hbase-agent.sinks.sink1.type=org.apache.flume.sink.hbase. HbaseSink hbase-agent.sinks.sink1.table Specify the HBase table name. hbase-agent.sinks.sink1.table=flume hbase-agent.sinks.sink1.columnFamily Specify the HBase table column family hbase-agent.sinks.sink1.columnFamily =orcllog hbase-agent.sinks.sink1.column Specify the HBase table column family column. ?? hbase-agent.sinks.sink1.column=c1 hbase-agent.sinks.sink1.serializer Specify the HBase event serializer class. The serializer converts a Flume event into one or more puts and or increments. hbase-agent.sinks.sink1.serializer= org.apache.flume.sink.hbase. SimpleHbaseEventSerializer hbase-agent.sinks.sink1.serializer. payloadColumn A parameter to the serializer. Specifies the payload column, the column into which the payload data is stored. hbase-agent.sinks.sink1.serializer. payloadColumn =coll hbase-agent.sinks.sink1.serializer. keyType A parameter to the serializer. Specifies the key type. hbase-agent.sinks.sink1.serializer. keyType = timestamp hbase-agent.sinks.sink1.serializer. incrementColumn A parameter to the serializer. Specifies the column to be incremented. The SimpleHbaseEventSerializer may optionally be set to increment a column in HBase. hbase-agent.sinks.sink1.serializer. incrementColumn=coll hbase-agent.sinks.sink1.serializer. rowPrefix A parameter to the serializer. Specifies the row prefix to be used. hbase-agent.sinks.sink1. serializer.rowPrefix=1 hbase-agent.sinks.sink1.serializer.suffix A parameter to the serializer. One of the following values may be set: uuid random timestamp hbase-agent.sinks.sink1. 
serializer.suffix=timestamp The flume.conf file is listed: hbase-agent.sources=tail hbase-agent.sinks=sink1 hbase-agent.channels=ch1 hbase-agent.sources.tail.type=exec hbase-agent.sources.tail.command=tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace /alert_ORCL.log hbase-agent.sources.tail.channels=ch1 hbase-agent.sinks.sink1.type=org.apache.flume.sink.hbase.HBaseSink hbase-agent.sinks.sink1.channel=ch1 hbase-agent.sinks.sink1.table=flume hbase-agent.sinks.sink1.columnFamily=orcllog hbase-agent.sinks.sink1.column=c1 hbase-agent.sinks.sink1.serializer= org.apache.flume.sink.hbase.SimpleHbaseEventSerializer hbase-agent.sinks.sink1.serializer.payloadColumn=coll hbase-agent.sinks.sink1.serializer.keyType = timestamp hbase-agent.sinks.sink1.serializer.incrementColumn=coll hbase-agent.sinks.sink1.serializer.rowPrefix=1 hbase-agent.sinks.sink1.serializer.suffix=timestamp hbase-agent.channels.ch1.type=memory The alternative source exec command is as follows. hbase-agent.sources.tail.command=cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace /alert_ORCL.log Running the Flume Agent In this section we shall run the Flume agent to stream the last KB in the alert_ORCL.log file to HBase using the tail command. We shall also stream the complete alert log file alert_ORCL using the cat command. Run the Flume agent using the flume-ng shell script in which specify the agent name using the –n option, the configuration directory using the –conf option and the configuration file using the –f option. Specify the Flume logger Dflume.root.logger as INFO,console to log at INFO level to the console. Run the following command to run the Flume agent hbase-agent . flume-ng agent --conf $FLUME_HOME/conf/ -f $FLUME_HOME/conf/flume.conf -n hbase-agent -Dflume.root.logger=INFO,console HBase libraries get included for HBase access. The source and sink get started. The flume log output provides more detail of the Fume agent command. 
05 Dec 2014 22:20:57,147 INFO [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61) - Configuration provider starting 05 Dec 2014 22:20:57,194 INFO [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133) - Reloading configuration file:/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf 05 Dec 2014 22:20:57,214 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:sink1 (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140) - Post-validation flume configuration contains configuration for agents: [hbase-agent] 05 Dec 2014 22:20:57,502 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150) - Creating channels 05 Dec 2014 22:20:57,529 INFO [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40) - Creating instance of channel ch1 type memory 05 Dec 2014 22:20:57,543 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205) - Created channel ch1 05 Dec 2014 22:20:57,545 INFO [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39) - Creating instance of source tail, type exec 05 Dec 2014 22:20:57,570 INFO [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40) - Creating instance of sink: sink1, type: org.apache.flume.sink.hbase.HBaseSink 05 Dec 2014 22:20:58,218 INFO [conf-file-poller-0] (org.apache.flume.sink.hbase.HBaseSink.configure:218) - The write to WAL option is set to: true 05 Dec 2014 22:20:58,223 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.getConfiguration:119) - Channel ch1 connected to [tail, sink1] 05 Dec 2014 22:20:58,238 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:138) - Starting new configuration:{ sourceRunners:{tail=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:tail,state:IDLE} }} sinkRunners:{sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@a21d88 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} } 05 Dec 2014 22:20:58,240 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:145) - Starting Channel ch1 05 Dec 2014 22:20:58,372 INFO [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:119) - Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean. 05 Dec 2014 22:20:58,373 INFO [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:95) - Component type: CHANNEL, name: ch1 started 05 Dec 2014 22:20:58,373 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:173) - Starting Sink sink1 05 Dec 2014 22:20:58,375 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184) - Starting Source tail 05 Dec 2014 22:20:58,376 INFO [lifecycleSupervisor-1-3] (org.apache.flume.source.ExecSource.start:163) - Exec source starting with command:tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace 05 Dec 2014 22:20:58,396 INFO [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:119) - Monitored counter group for type: SOURCE, name: tail: Successfully registered new MBean. 
05 Dec 2014 22:20:58,397 INFO [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:95) - Component type: SOURCE, name: tail started
Scanning HBase Table
In this section we shall scan the HBase table after each run of the Flume agent: once after running the tail -F command and once after running the cat command. Run the following command in the HBase shell to scan the HBase table flume.
scan 'flume'
The Oracle log file data streamed into HBase gets listed. Run the scan 'flume' command again after running the Flume agent with the cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log command. More rows get listed as the complete Oracle log file is streamed.
ChannelException
If the channel capacity gets exceeded while the Flume agent is streaming events, the following exception may be generated.
java.lang.InterruptedException
org.apache.flume.ChannelException: Unable to put batch on required channel: Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired Sinks are likely not keeping up with sources, or the buffer size is too tight
A subsequent scan of the HBase table would list fewer rows than would have been streamed if the complete log file had been streamed without an exception. To avoid the exception, increase the default queue size with the following configuration property in flume.conf.
hbase-agent.channels.ch1.capacity = 100000
In this tutorial we streamed Oracle Database logs to HBase using Flume.
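As a quick way to check the streamed rows without listing the whole table, the HBase shell also supports limited scans and row counts; the two commands below are standard HBase shell usage and are not part of the original tutorial.
scan 'flume', {LIMIT => 5}   # show only the first 5 rows streamed from the alert log
count 'flume'                # report the total number of rows written by the Flume agent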
Wiki Page: Why dirty buffers are flushed to disk whenever object is dropped ??
Sometimes when we drop objects it may take a brief period of time to complete the operation. One of the many activities performed by Oracle during this time is to flush all the dirty buffers related to the object to disk. We can check this behavior by tracing the session: in 11.1 "enq: RO - fast object reuse" is the wait event that indicates an object level checkpoint being performed to flush the dirty blocks related to the dropped object; in 11.2 the event is "enq: CR - block range reuse ckpt".
But why do we really need the blocks related to a dropped object on the disk? Why can't Oracle just discard the cached blocks in the buffer cache related to dropped objects and reuse them? This seems unusual initially, but it's really interesting to find out the rationale behind it, and it leads to many interesting observations. Let me try to explain the rationale by simulating a few cases.
Case 1:- (Drop table without purge)
What happens if we try to drop a TABLE when other sessions are using it at the same time?
Session 1: Create the table TAB1.
SQL> create table TAB1 as
  2  select rownum id, rpad('x',100) c1, rpad('y',200) c2
  3  from dba_objects
  4  /
Table created.
Session 2: Start a long-running query accessing table TAB1.
SQL> begin
  2  for i in (select c1 from TAB1)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
Session 1: Drop table TAB1 and flush the buffer cache.
SQL> drop table TAB1;
Table dropped.
SQL> alter system flush buffer_cache;
System altered.
Session 2: The query keeps running by re-reading the blocks from the disk.
SQL> begin
  2  for i in (select c1 from TAB1)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
.. ....
SQL> PL/SQL procedure successfully completed.
As you can see, while session 2 was retrieving the rows from the table TAB1, session 1 was able to drop the table at the same time. Interestingly, session 2 was able to retrieve all the records from the table even after the table was dropped.
Case 1:- (Drop table with purge)
What happens if we try to drop a TABLE with the purge option when other sessions are using it at the same time?
Session 1: Create the table TAB1.
SQL> create table TAB1 as
  2  select rownum id, rpad('x',100) c1, rpad('y',200) c2
  3  from dba_objects
  4  /
Table created.
Session 2: Start a long-running query accessing table TAB1.
SQL> begin
  2  for i in (select c1 from TAB1)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
Session 1: Drop table TAB1 with the purge option and flush the buffer cache.
SQL> drop table TAB1 purge;
Table dropped.
Session 2: The query crashes with error ORA-08103.
SQL> begin
  2  for i in (select c1 from TAB1)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
begin
*
ERROR at line 1:
ORA-08103: object no longer exists
ORA-06512: at line 2
As you can see, while session 2 was retrieving the rows from the table TAB1, session 1 was able to drop the table with the purge option at the same time. Interestingly, session 2 crashed with ORA-08103.
Case 2:- (Drop index)
What happens when we try to DROP/REBUILD an index when other sessions are accessing it at the same time?
Session 1: Create the table TAB1 and index IDX1.
SQL> create table TAB1 as
  2  select rownum id, rpad('x',100) c1, rpad('y',200) c2
  3  from dba_objects
  4  /
Table created.
SQL> create index IDX1 on TAB1(id);
Index created.
Session 2: Start a long-running query which uses index IDX1.
SQL> begin
  2  for i in (select /*+ index(TAB1) */ c1 from TAB1 where id > 0)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
Session 1: Drop index IDX1 and flush the buffer cache.
SQL> drop index IDX1;
Index dropped.
SQL> alter system flush buffer_cache;
System altered.
Session 2: The query keeps running by re-reading the blocks from the disk.
SQL> begin
  2  for i in (select /*+ index(TAB1) */ c1 from TAB1 where id > 0)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
... ....
SQL> PL/SQL procedure successfully completed.
Session 2 was able to walk through the index while, at the same time, session 1 was able to drop the index. In the case of rebuilding the index, the results are the same no matter how we rebuild the index (online or offline).
Case 2:- (Drop index and create a new segment to reuse the dropped index space)
What happens when we try to DROP/REBUILD an index and immediately create a new table segment TAB2 when other sessions are accessing it at the same time?
Session 1: Create the table TAB1 and index IDX1.
SQL> create table TAB1 as
  2  select rownum id, rpad('x',100) c1, rpad('y',200) c2
  3  from dba_objects
  4  /
Table created.
SQL> create index IDX1 on TAB1(id);
Index created.
Session 2: Start a long-running query which uses index IDX1.
SQL> begin
  2  for i in (select /*+ index(TAB1) */ c1 from TAB1 where id > 0)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
Session 1: Drop index IDX1 and create a new segment so that the dropped index space is reused.
SQL> drop index IDX1;
Index dropped.
SQL> create table TAB2 as select * from TAB1;
Session 2: The query crashes with an error.
SQL> begin
  2  for i in (select /*+ index(TAB1) */ c1 from TAB1 where id > 0)
  3  loop
  4  dbms_lock.sleep(0.01);
  5  end loop;
  6  end;
  7  /
begin
*
ERROR at line 1:
ORA-08103: object no longer exists
ORA-06512: at line 2
In this case session 2 crashes with error ORA-08103 after index IDX1 is dropped and the new table TAB2 is created. The purpose of creating the new table TAB2 is to overwrite the space used by the dropped index IDX1.
Caveat of testing:-
We have to be careful when we perform these tests: in the above cases, if the table segment has fewer than 11 extents, the select statement will not fail even if the segment is dropped using the purge option. A select statement usually revisits the segment header to get the extent map information after processing every 11 extents. So if the segment has more than 11 extents and it is dropped using the purge option, the purge updates the header contents, and when the select revisits the segment header to get the next set of extent details after reading 11 extents, it fails. Dumping the TAB1 segment header block reveals the details of the actions performed by the purge option. Below is the snippet of the TAB1 segment header block dump.
Before drop:
--------------------------------------------------------
Segment Type: 1 nl2: 1 blksz: 8192 fbsz: 0 L2 Array start offset: 0x00001434 First Level 3 BMB: 0x00000000 L2 Hint for inserts: 0x010002f9 Last Level 1 BMB: 0x01005201 Last Level II BMB: 0x010002f9 Last Level III BMB: 0x00000000 Map Header:: next 0x00000000 #extents: 21 obj#: 18566 flag: 0x10000000
After drop without Purge:
--------------------------------------------------------
Segment Type: 1 nl2: 1 blksz: 8192 fbsz: 0 L2 Array start offset: 0x00001434 First Level 3 BMB: 0x00000000 L2 Hint for inserts: 0x010002f9 Last Level 1 BMB: 0x01005201 Last Level II BMB: 0x010002f9 Last Level III BMB: 0x00000000 Map Header:: next 0x00000000 #extents: 21 obj#: 18566 flag: 0x10000000
After drop with Purge:
--------------------------------------------------------
Segment Type: 1 nl2: 1 blksz: 8192 fbsz: 0 L2 Array start offset: 0x00001434 First Level 3 BMB: 0x00000000 L2 Hint for inserts: 0x010002f9 Last Level 1 BMB: 0x01005201 Last Level II BMB: 0x010002f9 Last Level III BMB: 0x00000000 Map Header:: next 0x00000000 #extents: 1 obj#: 18566 flag: 0x12000000
The drop without the purge option did not touch the segment header, but when the table is dropped using the purge option the segment header is updated to show only one extent. In Case 2, when I dropped the index the select query kept running without any issues, but when I created the new segment TAB2 immediately after dropping index IDX1 the select query crashed, as the blocks were reused/overwritten by the new segment. The results are the same in the case of an index rebuild; it doesn't matter whether the index is rebuilt online or offline.
CROSS DDL CONSISTENCY:-
This feature was implemented long ago, in Oracle 8i, when the partition-exchange feature was introduced. The intention behind the implementation was to allow sessions to keep selecting data without being interrupted while we perform partition maintenance tasks. This is why all the dirty blocks related to dropped objects are flushed to disk: any session that is already accessing the dropped object can still adhere to the consistency property of the Oracle database, and Oracle supports this by flushing to disk the dirty blocks those sessions need to read. It is important to remember that whenever a query performs an index lookup and finds that the index block doesn't contain the data it is expecting for the related object id, the query will crash with the error "ORA-01410: invalid rowid". It is also important to note that this feature is not applicable to the TRUNCATE statement. In the simulated cases above, when I created a new table immediately after dropping the TABLE/INDEX, session 2 was not able to read the dropped blocks from the disk, as those blocks were overwritten/used by the newly created table TAB2. When "drop ... purge" is used, all the dirty data blocks are flushed from the buffer cache and the segment header block is updated, which breaks the "Cross DDL Consistency" feature: since the query re-reads the segment header block after every few scans, it gets to a point where it discovers that the table has been dropped, and the query crashes. This feature has been leveraged in 12c for the new out-of-place materialized view complete refresh feature.
Conclusion:-
Occasionally, sessions running queries in a production database may crash without any clue with the error messages below. In such scenarios we can try to debug and find out whether it is due to the "CROSS-DDL CONSISTENCY" feature being defeated, for the reasons explained above.
ORA-08103: object no longer exists ORA-01410: invalid rowid
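The caveat above hinges on how many extents TAB1 has and on reading its segment header block dump. As a minimal sketch of how to check both (assuming DBA privileges and the TAB1 table from the test case; SQL*Plus will prompt for the file and block numbers, and the dump layout varies by version):

-- Check the extent count: with fewer than roughly 11 extents the test may not reproduce
select segment_name, extents, header_file, header_block
from   dba_segments
where  segment_name = 'TAB1';

-- Dump the segment header block reported above
alter system dump datafile &header_file block &header_block;

-- The dump is written to the session's trace file
select value from v$diag_info where name = 'Default Trace File';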
Blog Post: Help Celebrate One Million Answers at the PL/SQL Challenge!
Sometime in the next couple of months, someone will submit the 1,000,000th answer on the PL/SQL Challenge. And Chris Saxon, member of the AskTom Answer Team, SQL Wizard and Database Design Quizmaster, assures me that he will write a query to identify exactly who that person is (will be). That's a very nice milestone, so we figured it would be a good time to celebrate. We are still sorting out precisely what we will do, but for sure, we will want to feature stories from our players: How the PL/SQL Challenge has helped you in your career High points (or low points) from your activity on the site (I have low points, which I will share; hopefully none of you do!) Interesting stories from the last six years of answering and commenting on quizzes Whatever else comes to mind! You are welcome to post your stories here or on the PL/SQL Challenge blog, or email them directly to steven.feuerstein@oracle.com. I will be collating them for publication on the site. I would also be very pleased to do some video interviews of players, reviewers, etc. After all, we do live in the Age of YouTube. So if you are open to or interested in that, let me know. Thanks! Steven Feuerstein
Blog Post: Retaining Previous Agent Images, the Why and the How
I appreciate killing two birds with one stone. I'm all about efficiency, and if I can satisfy more than one task with a single productive process, then I'm going to do it. Today, I'm about to: Show you why you should have a backup copy of previous agent software and how to do this. Create a documented process to restore previous images of an agent to a target host. Create the content section for the Collaborate HOL on Gold Images and make it reproducible. Create a customer demonstration of Gold Image AND publish a blog post on how to do it all. I have a pristine Enterprise Manager 13c environment that I'm working in. To “pollute” it with a 12.1.0.5 or earlier agent seems against what anyone would want to do in a real-world EM, but there may very well be reasons for having to do so: A plugin or bug in the EM13c agent requires a previous agent version to be deployed. A customer wants to see a demo of the EM13c gold agent image, and this would require a host being monitored by an older, 12c agent. Retaining Previous Agent Copies It would appear to be a simple process. Let's say you have the older version of the agent you wish to deploy in your software repository. You can access the software versions in your software library by clicking on Setup, Extensibility, Self-Update. Agent Software is the first in our list, so it's already highlighted; otherwise, click in the center of the row, where there's no link, and then click on Actions and Open to access the details on what Agent Software you have downloaded to your Software Library. If you scroll down, considering all the versions of the agent that are available, you can see that the 12.1.0.5 agent for Linux is already in the software library. If we try to deploy it from Cloud Control, we notice that no version is offered, only platform, which means the latest, 13.1.0.0.0, will be deployed. But what if we want to deploy an earlier one? Silent Deploy of an Agent The Enterprise Manager Command Line Interface (EMCLI) offers us a lot more control over what we can request, so let's try to use the agent from the command line. Log into the CLI from the OMS host (or another host with EMCLI installed): [oracle@em12 bin]$ ./emcli login -username=sysman Enter password : Login successful First get the information about the agents that are stored in the software library: [oracle@em12 bin]$ ./emcli get_supportedplatforms Error: The command name "get_supportedplatforms" is not a recognized command. Run the "help" command for a list of recognized commands. You may also need to run the "sync" command to synchronize with the current OMS. [oracle@em12 bin]$ ./emcli get_supported_platforms ----------------------------------------------- Version = 12.1.0.5.0 Platform = Linux x86-64 ----------------------------------------------- Version = 13.1.0.0.0 Platform = Linux x86-64 ----------------------------------------------- Platforms list displayed successfully. I already have the 13.1.0.0.0 version. I want to export the 12.1.0.5.0 to a zip file to be deployed elsewhere: [oracle@em12 bin]$ ./emcli get_agentimage -destination=/home/oracle/125 -platform="Platform = Linux x86-64" -version=12.1.0.5.0 ERROR:You cannot retrieve an agent image lower than 13.1.0.0.0. Only retrieving an agent image of 13.1.0.0.0 or higher is supported by this command. OK, so much for that idea! So what have we learned here? Use this process to “export” a copy of your previous version of the agent software BEFORE upgrading Enterprise Manager to a new version.
Now, lucky for me, I have multiple EM environments and had an EM 12.1.0.5 to export the agent software from, using the steps that I outlined above. I've SCP'd it over to the EM13c environment to deploy from, and will retain that copy for future endeavors. But remember, we just took care of task number one on our list: Show you why you should have a backup copy of previous agent software and how to do this. Silent Deploy of Previous Agent Software If we look in our folder, we can see our zip file: [oracle@osclxc ~]$ ls 12.1.0.5.0_AgentCore_226.zip p20299023_121020_Linux-x86-64.zip 20299023 p6880880_121010_Linux-x86-64.zip I've already copied it over to the folder I'll deploy from: scp 12.1.0.5.0_AgentCore_226.zip oracle@host3.oracle.com:/home/oracle/. Now I need to unzip it and update the entries in the response file (agent.rsp): OMS_HOST=OMShostname.oracle.com EM_UPLOAD_PORT=4890 (You can look this up in the EM console if you don't know this information.) AGENT_INSTANCE_HOME=/u01/app/oracle/product/agent12c AGENT_PORT=3872 b_startAgent=true ORACLE_HOSTNAME=host.oracle.com s_agentHomeName= Now run the shell script, including the argument to ignore the version prerequisite, along with our response file: $./agentDeploy.sh -ignorePrereqs AGENT_BASE_DIR=/u01/app/oracle/product RESPONSE_FILE=/home/oracle/agent.rsp The script should deploy the agent successfully, which will result in the following output at the end of the run: Agent Configuration completed successfully The following configuration scripts need to be executed as the "root" user. #!/bin/sh #Root script to run /u01/app/oracle/core/12.1.0.5.0/root.sh To execute the configuration scripts: 1. Open a terminal window 2. Log in as "root" 3. Run the scripts Agent Deployment Successful. Check that an upload is possible and verify the status: [oracle@fs3 bin]$ ./emctl status agent Oracle Enterprise Manager Cloud Control 12c Release 5 Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved. --------------------------------------------------------------- Agent Version : 12.1.0.5.0 OMS Version : 13.1.0.0.0 Protocol Version : 12.1.0.1.0 Agent Home : /u01/app/oracle/product/agent12c Agent Log Directory : /u01/app/oracle/product/agent12c/sysman/log Agent Binaries : /u01/app/oracle/product/core/12.1.0.5.0 Agent Process ID : 2698 Parent Process ID : 2630 You should see your host in your EM13c environment now. OK, that takes care of task number two: 2. Create a documented process to restore previous images of an agent to a target host. Using a Gold Agent Image From here, we can then demonstrate the EM13c Gold Agent Image effectively. Click on Setup, Manage Cloud Control, Gold Agent Image. Now, I've already created a Gold Agent Image in this post. Now it's time to manage subscriptions; you'll see a link for this at the center of the page, on the right side. Click on it, and then we need to subscribe hosts by clicking on “Subscribe” and adding them to the list (by using the shift or ctrl key, you can choose more than one at a time). As you can see, I've added all my agents to the Gold Agent Image as subscriptions, and it will now go through, check the versions, and add them to be managed by the Gold Agent Image. This includes my new host on the 12.1.0.5.0 agent. Keep in mind that a blackout is part of this process for each of these agents as they are added, so be aware of this step as you refresh and monitor the additions.
Once the added host(s) update to show that they're now available for update, click on the agent you wish to update (you can even choose one that's already on the current version…) and click on Update, Current Version. This will use the current-version gold image that it's subscribed to and deploy it via an EM job. The job will run for a period of time as it checks everything out, deploys the software and updates the agent, including a blackout so as not to alarm everyone as you work on this task. Once complete, the agent will be upgraded to the same release as the gold agent image you created! Well, with that step, I believe I've taken care of the next three items on my list! If you'd like to know more about Gold Agent Images, outside of the scenic route I took you on today, check out the Oracle documentation. Tags: Agent Gold Image, em13c, Enterprise Manager Copyright © DBA Kevlar [ Retaining Previous Agent Images, the Why and the How ], All Right Reserved. 2016.
Blog Post: Creating a Bar Chart in Oracle APEX 5.0
We know that charts are very necessary when it comes to presenting reports in our APEX applications, since they let the user see the data in a more visual way, and that is why today I want to show you how simple it is to create a chart in APEX. First of all, we create a blank page in our application. Then we create a region of type Chart, which we will call Gráfico. 1) In the Series we set: a) Name: Demo b) Type: Bar 2) In Source: select null link , d.dname etiqueta , COUNT(e.empno) Empleados from emp e , dept d where e.deptno = d.deptno group by d.dname, d.deptno In the Chart Attributes: 1) Title: Empleados por Departamentos 2) In Layout: a) Width: 800 b) Height: 500 3) In Series Color: a) Scheme: Aspecto 1 b) Level: Series c) Hatch Pattern: No 4) In X Axis: a) Title: Departamentos b) Size: 12 5) In Y Axis: a) Title: Empleados b) Size: 12 6) In Legend: a) Show: Right b) Title: Leyenda c) Item Orientation: Vertical 7) Save and run the page. If we want to show the bars in different colors, in "Series Color" we can use Level equal to Point instead of Level equal to Series (but we will need to remove the chart legend, since it will not show the series name). Creating a Link to a Report We create a blank page and then an Interactive Report region containing the following SQL source query: select e.empno, e.ename, e.job, e.sal, e.comm, e.deptno from emp e, dept d where e.deptno = d.deptno Then we create a Hidden page item that will hold the department number, and we will call it P2_DEPTNO. We go back to Page 1 (where we have the chart) and edit the SQL query of the series, replacing it with the following: select 'f?p=&APP_ID.:2:&APP_SESSION.::NO:RIR:IREQ_DEPTNO:'||d.deptno link , d.dname etiqueta , COUNT(e.empno) Empleados from emp e , dept d where e.deptno = d.deptno group by d.dname, d.deptno What we have added is the link, passing in the URL the department-number filter for the interactive report using IREQ_DEPTNO. When we click on a column of the chart, for example the one for the RESEARCH department, which has 5 employees, we can see that it filters the interactive report, showing the employees of the selected department. In another post I will explain how we can create filters for our interactive reports from the URL. See the demo of the example HERE. See you soon!
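The chart link above filters the interactive report through the IREQ_DEPTNO URL syntax even though the post also creates the hidden item P2_DEPTNO. As a hedged alternative sketch (assuming the same EMP/DEPT tables and the hidden item P2_DEPTNO on page 2), the link can set the item directly and the report query can filter on it as a bind variable:

-- Chart series source: the link sets P2_DEPTNO on page 2 to the clicked department
select 'f?p=&APP_ID.:2:&APP_SESSION.::NO::P2_DEPTNO:'||d.deptno link
     , d.dname etiqueta
     , count(e.empno) empleados
from   emp e, dept d
where  e.deptno = d.deptno
group  by d.dname, d.deptno

-- Interactive report source on page 2: filter on the hidden item
select e.empno, e.ename, e.job, e.sal, e.comm, e.deptno
from   emp e
where  e.deptno = :P2_DEPTNO

Filtering with the bind variable keeps the restriction out of the interactive report's search bar, whereas the IREQ_ approach shows it as a user-removable report filter.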
Blog Post: Configuring RStudio Server for Oracle R Enterprise
In this blog post I will show you the configurations that are necessary for RStudio Server to work with Oracle R Enterprise on your Oracle Database server. In theory, if you have just installed ORE and then RStudio Server, everything should work, but if you encounter any issues then check out the following. Before I get started, make sure to check out my previous blog posts on installing RStudio Server. The first blog post was installing and configuring RStudio Server on the Oracle BigDataLite VM. This is an automated install. The second blog post was a step-by-step guide to installing RStudio Server on your (Oracle) Linux Database Server and how to open the port on the VM using VirtualBox. Right. Let's get back to configuring it to work with Oracle R Enterprise. The following assumes you have completed the second blog post mentioned above. 1. Edit the rserver.conf file. Add in the values and locations for RHOME and ORACLE_HOME: sudo vi /etc/rstudio/rserver.conf rsession-ld-library-path=RHOME/lib:ORACLE_HOME/lib 2. Edit the .Renviron file. Add in the values for ORACLE_HOME, ORACLE_HOSTNAME and ORACLE_SID: cd /home/oracle sudo vi .Renviron ORACLE_HOME=ORACLE_HOME ORACLE_HOSTNAME=ORACLE_HOSTNAME ORACLE_SID=ORACLE_SID export ORACLE_HOME export ORACLE_HOSTNAME export ORACLE_SID 3. To access the Oracle R Distribution, add the following to the /usr/lib/rstudio-server/R/modules/SessionHelp.R file for the version of Oracle R Distribution you installed prior to installing Oracle R Enterprise: .rs.addFunction( "httpdPortIsFunction", function() { getRversion() >= "3.2" }) You are all done now with all the installations and configurations.
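Once the configuration files are in place, a quick sanity check from the database side can confirm that the Oracle R Enterprise server components are actually installed before you test a connection from RStudio. This is only a sketch and assumes a standard ORE server install, which creates the RQSYS repository schema:

-- RQSYS is the repository schema created by a standard ORE server installation
select username, account_status
from   dba_users
where  username = 'RQSYS';

-- A rough inventory of the ORE server objects
select object_type, count(*)
from   dba_objects
where  owner = 'RQSYS'
group  by object_type;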
Blog Post: Adding a Database to the Raspberry Pi Zero
This morning, after someone added SQL Developer to a Raspberry Pi 3, Jeff Smith pinged me and the question was posed, as it often is: can you run Oracle Database server on a Raspberry Pi (RPI)? The answer is no, you can't, as there are no binaries for the ARM processor platform (and nothing you can compile yourself), so it's nothing to do with power (although it is low powered for a very good reason…) but really due to the processor type. Hopefully that'll stop those of you building out massive RPI clusters in hopes of creating your own home servers with these inexpensive computers. With this said, you can run a database, so to show how easy this is, I'll show you how to install and work with an installation of SQLite on a Raspberry Pi Zero, the version that's less than half the size of a credit card. Other people have a beer and take the evening off. Me? I have a beer and start installing stuff on single board computers… Install SQLite Installation is easy. Ensure you have the latest update, so if you haven't run your update in a while, run that first from the command line. Remember you must have root or sudo privileges to perform these tasks: $sudo apt-get update Once complete and you know you're up to date, then simply get and install SQLite: $sudo apt-get install sqlite3 Create a Database Let's create a database: $sqlite3 mydb.db This creates the database using the mydb.db file as the logical container for it. Note: If you need help at any time, you can type in .help or .show from the sqlite prompt and it will display information similar to man pages in Linux. It's very helpful and user friendly. If you're out there comparing features, about to complain about all the ways that SQLite is NOT Oracle, well, you can stop right here. On the support page for SQLite is the quote: Small. Fast. Reliable. Choose Any Three. SQLite isn't trying to be Oracle, but if you need a database and you'd like to put one on an RPI, this is the one to use that is small, fast and reliable. Working with SQLite Of course there are some syntax differences, and SQLite has most standard SQL syntax at its disposal. Remembering to type BEGIN to start explicit transactions and to COMMIT after each one is commonly the biggest challenge. This includes data dictionary objects (aka DDL). As this isn't a full RDBMS client/server database, there aren't any roles or privileges that reside outside of the OS-level privileges to the database file. This database works very well in support of RPI projects, and that is what I'm hoping to demonstrate here. So let's start by just creating a table and adding a few rows. begin; create table tbl1(col1_id text, date_c1 date, time_c1 time, range_1 numeric); commit; Now you can insert rows into your new table: begin; insert into tbl1 values('user1', date('now'), time('now'), 12); insert into tbl1 values('user2', date('now'), time('now'), 7); insert into tbl1 values('user3', date('now'), time('now'), 20); commit; You can then select from your new table and the row(s) will be returned, separated by the pipe symbol: select * from tbl1 where col1_id='user2'; user2|2016-03-22|00:12:237 So there you have it. Database software installed- check. Database created- check. Object created and rows added- check. Table queried- check. All done in less than five minutes. If you'd like to learn more, you can check out SQLite's home page. Tags: Raspberry Pi, Rpi, SQLite
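Following on from the example above, here is a small sketch against the same tbl1 table showing how the sqlite3 shell's dot-commands make the pipe-separated output easier to read, plus an ordinary aggregate and an index:

-- sqlite3 shell dot-commands: add column headers and columnar output
.headers on
.mode column

-- A standard SQL aggregate over the sample rows inserted earlier
select col1_id, count(*) as readings, avg(range_1) as avg_range
from   tbl1
group  by col1_id;

-- Indexes are created like any other DDL statement
create index idx_tbl1_col1 on tbl1(col1_id);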
Copyright © DBA Kevlar [ Adding a Database to the Raspberry Pi Zero ], All Right Reserved. 2016.
Blog Post: The Case of the Confusing CASE
This odd little piece of code was featured in the weekly PL/SQL Challenge quiz 12 March - 18 March 2016. What do you think will be displayed after executing the following block? DECLARE my_flag BOOLEAN; BEGIN CASE my_flag WHEN my_flag IS NULL THEN DBMS_OUTPUT.PUT_LINE ('my_flag is NULL'); WHEN TRUE THEN DBMS_OUTPUT.PUT_LINE ('my_flag is TRUE'); ELSE DBMS_OUTPUT.PUT_LINE ('my_flag is FALSE'); END CASE; END; / At first glance (if you are like me), you would say "my_flag is NULL", right? After all, my_flag is initialized to NULL when declared, and I don't change the value. But, lo and behold, you will see: my_flag is FALSE Curious, right? So what's going on? Well, we have a very confused and confusing piece of code: I have written a simple CASE (which is of the form CASE expression WHEN ...), but then my WHEN clauses follow a typical searched CASE format (CASE WHEN expr1 ... WHEN expr2 ...). Because it is a simple CASE, each WHEN expression is compared for equality with the selector my_flag, which is NULL; both comparisons therefore evaluate to NULL rather than TRUE, no WHEN branch matches, and the ELSE branch runs. CASE is a really wonderful feature in PL/SQL (and many other languages, of course), but you need to make sure you use it properly.
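If the intent was to report NULL, TRUE and FALSE correctly, a searched CASE (no selector after the CASE keyword) is one straightforward way to write it; a minimal sketch:

DECLARE
   my_flag BOOLEAN;
BEGIN
   -- Searched CASE: each WHEN is evaluated as a standalone boolean condition,
   -- so the IS NULL test runs directly instead of being compared for equality
   -- against a NULL selector.
   CASE
      WHEN my_flag IS NULL THEN
         DBMS_OUTPUT.PUT_LINE ('my_flag is NULL');
      WHEN my_flag THEN
         DBMS_OUTPUT.PUT_LINE ('my_flag is TRUE');
      ELSE
         DBMS_OUTPUT.PUT_LINE ('my_flag is FALSE');
   END CASE;
END;
/

With the selector removed, the block displays "my_flag is NULL" for the declared-but-unassigned variable.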
Blog Post: Oracle and the Levenshtein Distance
Last week I attended a training course on ACL, a data analysis tool widely used by auditors and accountants. When the topic of functions came up, the instructor presented a function called LEVDIST() that calculates the distance between two character strings. At that point the question arose: what is the distance between two character strings? How is it measured? That is how I learned that the LEVDIST function owes its name to the so-called Levenshtein distance, also known as the edit distance or the distance between words. The Levenshtein distance is the minimum number of operations required to transform one character string into another. It is named after the mathematician and scientist Vladimir Levenshtein, who developed, among other things, the algorithm that calculates this distance. The algorithm is widely used in programs that need to determine how similar two character strings are; spell checkers are one example. At that point I wondered whether any version of the algorithm had been written in PL/SQL. To my surprise, I came across the UTL_MATCH package, included in the Oracle database starting with version 10g. The UTL_MATCH package provides four functions: EDIT_DISTANCE. Calculates the number of changes needed to transform one string into another. EDIT_DISTANCE_SIMILARITY. Calculates the number of changes needed to transform one string into another and returns a value between 0 (no match) and 100 (perfect match). JARO_WINKLER. Calculates the degree of similarity between two strings. JARO_WINKLER_SIMILARITY. Calculates the degree of similarity between two strings and returns a value between 0 (no match) and 100 (perfect match). The Jaro-Winkler distance is also a measure of similarity between two strings. It is usually used with short strings, such as people's names. Let's look at some usage examples that make it easier to understand how the EDIT_DISTANCE function works: SQL> select utl_match.edit_distance('hola', 'hola') levdist from dual; LEVDIST ---------- 0 SQL> select utl_match.edit_distance('hola', 'hole') levdist from dual; LEVDIST ---------- 1 SQL> select utl_match.edit_distance('hola', 'bole') levdist from dual; LEVDIST ---------- 2 SQL> select utl_match.edit_distance('hola', 'bote') levdist from dual; LEVDIST ---------- 3 SQL> select utl_match.edit_distance('hola', 'gato') levdist from dual; LEVDIST ---------- 4 SQL> See you!
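For comparison with the EDIT_DISTANCE examples above, here is a short sketch of the two similarity functions the post lists; the exact scores depend on the database version, so none are shown here:

-- Both return 0 (no match) to 100 (perfect match)
select utl_match.edit_distance_similarity('hola', 'bole') edit_sim,
       utl_match.jaro_winkler_similarity('hola', 'bole')  jw_sim
from   dual;

-- Jaro-Winkler is typically used for short strings such as names
select utl_match.jaro_winkler_similarity('MARTHA', 'MARHTA') jw_names
from   dual;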
Blog Post: AWR Warehouse Fails on Upload- No New Snapshots
This issue can be seen in either EM12c or EM13c AWR Warehouse environments. It occurs when there is an outage on the AWR Warehouse and/or the source database that is to upload to it. The first indication of the problem is when databases appear not to have uploaded once the environments are back up and running. The best way to see an upload from beginning to end is to highlight the database you want to load manually (click in the center of the row; if you click on the database name, you'll be taken from the AWR Warehouse to the source database's performance home page). Click on Actions, Upload Snapshots Now. A job will be submitted and you'll be aware of it by a notification at the top of the console. Click on View Job Details and you'll be taken to the job that will run all steps of the AWR Warehouse ETL: Inspect which snapshots are required by comparing the metadata table against what is in the source database. Perform a datapump export of those snapshots from the AWR schema and update the metadata tables. Perform an agent to agent push of the file from the source database server to the AWR Warehouse server. Run the datapump import of the database data into the AWR Warehouse repository, partitioning by DBID, snapshot ID or a combination of both. Update support tables in the Warehouse showing status and success. Now note the steps where metadata and successes are updated. We're now inspecting the job that we're currently running to update our tables, but instead of success, we see the following in the job logs: We can clearly see that the extract (the ETL step on the source database that datapumps the AWR data out) has failed. Scrolling down to the Output, we can see the detailed log and the error that was returned on this initial step: ORA-20137: NO NEW SNAPSHOTS TO EXTRACT. Per the source database in step 1, where it compares the database snapshot information to the metadata table, no new snapshots were returned for extraction. The problem is that we know, on the AWR Warehouse side (seen in the alerts in section 3 of the console), there are snapshots that haven't been uploaded in a timely manner. How to Troubleshoot First, let's verify what the AWR Warehouse believes is the last and latest snapshot that was loaded to the warehouse via the ETL. Log into the AWR Warehouse via SQL*Plus or SQLDeveloper and run the following query, using the CAW_DBID_MAPPING table, which resides in the DBSNMP schema: SQL> select target_name, new_dbid from caw_dbid_mapping; TARGET_NAME -------------------------------------------------------------------------------- NEW_DBID ---------- DNT.oracle.com 3695123233 cawr 1054384982 emrep 4106115278 And what's the max snapshot that I have for the database DNT, the one in question? SQL> select max(dhs.snap_id) from dba_hist_snapshot dhs, caw_dbid_mapping cdm 2 where dhs.dbid=cdm.new_dbid 3 and cdm.target_name='DNT.oracle.com'; MAX(DHS.SNAP_ID) ---------------- 501 The Source These next steps require querying the source database, as we've already verified the latest snapshot in the AWR Warehouse, and the error occurred on the source environment, along with where it failed at that step in the ETL process. Log into the database using SQL*Plus or another query tool. We will again need privileges to the DBSNMP schema and the DBA_HIST views.
SQL> select table_name from dba_tables where owner='DBSNMP' and table_name like 'CAW%'; TABLE_NAME -------------------------------------------------------------------------------- CAW_EXTRACT_PROPERTIES CAW_EXTRACT_METADATA These are the two tables that hold information about the AWR Warehouse ETL process in the source database. There are a number of ways we could inspect the extract data, but the first thing we'll do is get the last load information from the metadata table, which will tell us what the last extracted snapshot range was: SQL> select begin_snap_id, end_snap_id, start_time, end_time, filename from caw_extract_metadata where extract_id=(select max(extract_id) from caw_extract_metadata); 502 524 23-MAR-16 10.43.14.024255 AM 23-MAR-16 10.44.27.319536 AM 1_2EB95980AB33561DE053057AA8C04903_3695123233_502_524.dmp So we can see that, per the metadata table, the ETL BELIEVES it has already loaded the snapshots from 502-524. We'll now query the PROPERTIES table that tells us where our dump files are EXTRACTED TO: SQL> select * from caw_extract_properties 2 where property_name='dump_dir_1'; dump_dir_1 /u01/app/oracle/product/agent12c/agent_inst ls /u01/app/oracle/product/agent12c/agent_inst/*.dmp 1_2EB95980AB33561DE053057AA8C04903_3695123233_502_524.dmp So here is our problem. We have a dump file that was created, but the agent to agent push and the load to the AWR Warehouse never happened. As the METADATA table in the source database was already updated with rows for these snapshots, the extract now considers them done and finds nothing new to load. Steps to Correct: Clean up the dump file from the datapump directory, update the METADATA table, and rerun the job. cd /u01/app/oracle/product/agent12c/agent_inst rm 1_2EB95980AB33561DE053057AA8C04903_3695123233_502_524.dmp Note: You can also choose to rename the extension of the file if you wish to retain it until you are comfortable that everything is successfully loading, but be aware of size constraints in your $AGENT_HOME directory; I've seen issues due to space constraints. Log into the database and remove the latest row update in the metadata table: select extract_id from caw_extract_metadata where begin_snap_id=502 and end_snap_id=524; 101 delete from caw_extract_metadata where extract_id=101; 1 row deleted. commit; Log into your AWR Warehouse dashboard and run the manual Upload Snapshots Now for the database again. Tags: AWR Warehouse, EM12c, em13c, Enterprise Manager Copyright © DBA Kevlar [ AWR Warehouse Fails on Upload- No New Snapshots ], All Right Reserved. 2016.
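Once the dump file is removed, the metadata row deleted, and the manual upload rerun, a quick verification pass (a sketch using the same tables queried in this post; snapshot numbers will differ in your environment) can confirm the fix took:

-- On the source database: the newest extract row should now cover the
-- snapshot range that previously failed to upload
select extract_id, begin_snap_id, end_snap_id, filename
from   caw_extract_metadata
where  extract_id = (select max(extract_id) from caw_extract_metadata);

-- On the AWR Warehouse: the max loaded snapshot for the target should have
-- moved past the old high-water mark (501 in this example)
select max(dhs.snap_id)
from   dba_hist_snapshot dhs, caw_dbid_mapping cdm
where  dhs.dbid = cdm.new_dbid
and    cdm.target_name = 'DNT.oracle.com';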
Blog Post: Gold Agent Image
The Gold Agent Image is going to simplify agent management in EM13c, something a lot of folks are going to appreciate. The first step to using this new feature is to create an image to be used as your gold agent standard. This should be the newest, most up-to-date and patched agent that you would like your other agents to match. Managing Gold Images You can access this feature via your Cloud Control console from the Setup menu, Manage Cloud Control, Gold Agent Images. If it's the first time you're accessing this, you'll want to click on the Manage All Images button in the middle, right-hand side to begin. The first thing you'll do is click on Create, and this will begin the steps to build out the shell for your gold image. The naming convention requires underscores between words and can accept periods, which is great for keeping release versions straight. Type in a description, choose the Platform, which pulls from your software library, and then click Submit. You've now created your first Gold Agent Image for the platform you chose from the drop down before clicking Submit. The Gold Agent Dashboard Now let's return to Gold Agent Images by clicking on the link on the left-hand side of the above screen. As this environment only has one agent to update, it matches what I have in production and says everything is on the gold agent image. You may want to know where to go from here: there are a number of ways to manage and use Gold Agent Images for provisioning. I've covered much of it in this post. You may be less than enthusiastic about all this clicking in the user interface. We can avoid that by incorporating the Enterprise Manager Command Line Interface (EMCLI) into the mix. The following commands can be issued from any host with the EMCLI installed. Subscribing and Provisioning Via the EMCLI The syntax to subscribe two hosts to the existing Gold Agent Image from my example above would be: $ /bin/emcli subscribe_agents -image_name="AgentLinux131000" -agents="host1.us.oracle.com:1832,host2.us.oracle.com:1832" Or if the agents belong to an Admin group, then I could deploy the Gold Agent Image to all the agents in a group by running the following command from the EMCLI on the OMS host: $ /bin/emcli subscribe_agents -image_name="AgentLinux131000" -groups="Admin_dev1,Admin_prod1" The syntax to provision the new gold agent image to one or more hosts is: /bin/emcli update_agents -gold_image_series="Agent13100" -version_name="V1" -agents="host1.us.oracle.com:1832,host2…" Statuses of provisioning jobs can be checked via the EMCLI, as can other tasks. Please see Oracle's documentation for more cool ways to use the command line with the Gold Agent Image feature! Tags: em13c, Enterprise Manager, Gold Agent Image Copyright © DBA Kevlar [ Gold Agent Image ], All Right Reserved. 2016.
Blog Post: Secrets of the DESCRIBE Command
The DESCRIBE command is no stranger to anyone who has ever used SQL*Plus. The DESCRIBE command lets us obtain the definition of a table, view, synonym, function or procedure. Although it is a very simple and popular command, there are a few secrets that many people don't know: it is possible to change the behavior of the DESCRIBE command through a series of settings. Let's take a look... To find out which behaviors we can modify, we can run the SHOW DESCRIBE command: SQL> show describe describe DEPTH 2 LINENUM OFF INDENT ON We can see that three settings can be changed: 1) DEPTH 2) LINENUM 3) INDENT Let's start by trying the simplest of all: LINENUM SQL> desc equipos Name Null? Type ------------------- -------- ---------------------------- CODIGO NOT NULL NUMBER NOMBRE NOT NULL VARCHAR2(30) TIPO VARCHAR2(15) SQL> set describe linenum on SQL> desc equipos Name Null? Type ------------------ -------- ---------------------------- 1 CODIGO NOT NULL NUMBER 2 NOMBRE NOT NULL VARCHAR2(30) 3 TIPO VARCHAR2(15) SQL> set describe linenum off DEPTH and INDENT are very useful when we work with object types. To illustrate their use, I created two types: PERSONA_TYPE and DOMICILIO_TYPE: SQL> show describe describe DEPTH 1 LINENUM OFF INDENT ON SQL> desc persona_type Name Null? Type ------------------------ -------- --------------------------- CODIGO NUMBER NOMBRE VARCHAR2(20) TELEFONO VARCHAR2(20) DOMICILIO DOMICILIO_TYPE SQL> desc domicilio_type Name Null? Type ----------------------- -------- ---------------------------- CALLE VARCHAR2(20) CIUDAD VARCHAR2(20) PROVINCIA VARCHAR2(20) CODIGO_POSTAL VARCHAR2(10) SQL> Let's see how DEPTH and INDENT make it easier to visualize these structures just by changing a setting: SQL> set describe depth 2 SQL> desc persona_type Name Null? Type -------------------------- -------- ---------------------------- CODIGO NUMBER NOMBRE VARCHAR2(20) TELEFONO VARCHAR2(20) DOMICILIO DOMICILIO_TYPE CALLE VARCHAR2(20) CIUDAD VARCHAR2(20) PROVINCIA VARCHAR2(20) CODIGO_POSTAL VARCHAR2(10) SQL> Fortunately, the default value of INDENT is ON. If I set it to OFF, the display becomes a bit harder to read: SQL> set describe indent off SQL> desc persona_type Name Null? Type --------------------------- -------- ---------------------------- CODIGO NUMBER NOMBRE VARCHAR2(20) TELEFONO VARCHAR2(20) DOMICILIO DOMICILIO_TYPE CALLE VARCHAR2(20) CIUDAD VARCHAR2(20) PROVINCIA VARCHAR2(20) CODIGO_POSTAL VARCHAR2(10) SQL> See you!
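Building on the examples above (and assuming the same PERSONA_TYPE and DOMICILIO_TYPE types), the three settings can be combined in a single SET DESCRIBE command, and DEPTH also accepts ALL to expand every level of nesting rather than a fixed number:

-- Combine all three options in one command
SET DESCRIBE DEPTH ALL LINENUM ON INDENT ON
SHOW DESCRIBE
DESC persona_type

With DEPTH ALL, deeply nested attributes are expanded without having to guess the right numeric depth, which is handy when object types contain other object types several levels down.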