In an earlier article we discussed Storing and Querying XML in Oracle NoSQL Database with OXH 3.0.0 . In this article we shall store XML as Avro records in Oracle NoSQL Database and subsequently access the collection of Avro records. We shall load the Avro record values into Oracle Database 11g. Avro is a schema (JSON schemas) based data serialization system with data being stored in a binary format container file with support for varied data structures. Oracle XML Query for Hadoop’s Oracle NoSQL Database adapter provides support for storing and accessing Avro records using the following built-in functions. Function Signature Description kv:collection-avroxml declare %kv:collection("avroxml") function kv:collection-avroxml($parent-key as xs:string?) as element()* external; Accesses a collection of values in Oracle NoSQL Database with each value being parsed as an Avro record and returned as an XML document object. kv:get-avroxml declare %kv:get("avroxml") function kv:get-avroxml($key as xs:string) as element()? external; Gets the value associated with a specified key. The value is parsed as an Avro record and returned as an XML element. Custom functions for accessing a collection of Avro values and getting a specific Avro value are also supported. While a built-in function for storing Avro records is not provided the following custom function for storing Avro is supported. declare %kv:put("avroxml") function local:myFunctionName($key as xs:string, $xml as node()) external; The following annotations may be used with the preceding custom function. Annotation Description %kv:put("method") Stores XML as an Avro record. %avro:schema-kv("schema-name") Specifies the fully qualified name of a schema in the Oracle NoSQL Database catalog to be used to validate an Avro record stored in the database. This article has the following sections. Setting the Environment Adding Avro Schema to Oracle NoSQL Database Adding Text File to HDFS Storing XML as Avro in Oracle NoSQL Database Querying Avro in Oracle NoSQL Database Setting the Environment The environment is the same as in the preceding tutorials on OXH’s Oracle NoSQL Database adapter. Set the following env variables in the bash shell and the hadoop-env.sh . export HADOOP_PREFIX=/oxh/hadoop-2.0.0-cdh4.6.0 export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1 export ORACLE_SID=ORCL export KVHOME=/oranosql/kv-3.0.5 export OLH_HOME=/oxh/oraloader-3.0.0-h2 export OXH_HOME=/oxh/oxh-3.0.0-cdh4.6.0 export HADOOP_MAPRED_HOME=/oxh/hadoop-2.0.0-cdh4.6.0/bin export HADOOP_HOME=/oxh/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2 export HADOOP_CLASSPATH=$HADOOP_HOME/lib/*:$OLH_HOME/jlib/*:$OXH_HOME/lib/*:$KVHOME/lib/* export PATH=$PATH:$HADOOP_HOME:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME:$ORACLE_HOME/bin Format the NameNode and start the HDFS , NameNode and DataNode with the following commands. hadoop namenode -format hadoop namenode hadoop datanode Create an Oracle Database 11g table OE.WLSSERVER with the following SQL script. CREATE TABLE wlsserver (timestamp VARCHAR2(55), category VARCHAR2(15), type VARCHAR2(55), servername VARCHAR2(15), code VARCHAR2(15), msg VARCHAR2(255)) The structure of the database table is listed with the DESC OE.WLSSERVER command in SQL Plus. Adding Avro Schema to Oracle NoSQL Database The Avro records in Oracle NoSQL Database are based on a JSON format schema. Create the following schema wlslog.avsc in the /olh directory to store Avro records. { "type": "record", "name": "WLSLogRec", "namespace": "wls.log", "fields" : [ {"name": "timestamp", "type": "string","default" : "NONE"}, {"name": "category", "type": "string","default" : "NONE" }, {"name": "type", "type": "string","default" : "NONE"}, {"name": "servername", "type": "string","default" : "NONE"}, {"name": "code", "type": "string","default" : "NONE"}, {"name": "msg", "type": "string","default" : "NONE"} ] } Create a lightweight Oracle NoSQL Database instance with the following command. java -jar /oranosql/kv-3.0.5/lib/kvstore.jar kvlite A Oracle NoSQL Database store called kvstore gets created. Start the Oracle NoSQL Database command line interface (CLI) with the following command. java -Xmx256m -Xms256m -jar /oranosql/kv-3.0.5/lib/kvstore.jar runadmin -port 5000 -host localhost The kv> prompt gets displayed. Run the following command in the kv> prompt to add the wlslog.avsc schema to the Oracle NoSQL Database catalog . ddl add-schema -file wlslog.avsc The schema gets added. Adding Text File to HDFS We shall creating Avro records in Oracle NoSQL Database from XML elements. The data for the XML elements is obtained from a text file. Create the following text file wlslog.txt , which has WebLogic server log with ‘,’ separated data for one Avro record on each line. Apr-8-2014-7:06:16-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to STANDBY Apr-8-2014-7:06:17-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to STARTING Apr-8-2014-7:06:18-PM-PDT,Notice,Log Management,AdminServer,BEA-170027,The Server has established connection with the Domain level Diagnostic Service successfully Apr-8-2014-7:06:19-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to ADMIN Apr-8-2014-7:06:20-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to RESUMING Apr-8-2014-7:06:21-PM-PDT,Notice,Server,AdminServer,BEA-002613,Channel Default is now listening on fe80:0:0:0:0:5efe:c0a8:147:7001 for protocols iiop t3ldap snmp http Apr-8-2014-7:06:22-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000331,Started WebLogic Admin Server AdminServer for domain base_domain running in Development Mode Apr-8-2014-7:06:23-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to RUNNING Apr-8-2014-7:06:24-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000360,Server started in RUNNING mode Store the wlslog.txt file in the HDFS with the following command. hadoop dfs -put wlslog.txt hdfs://10.0.2.15:8020/wls Storing XML as Avro in Oracle NoSQL Database In this section we shall create Avro records from XML. Create a query script txt_oranosql.xq in which import the oxh:kv and oxh:text modules. The %kv:put("avroxml") annotation specifies that a Avro record is to be stored created from XML. The %avro:schema-kv annotation specifies the schema to be used as wls.log.WLSLogRec . Specify a custom function for storing an avroxml record. Access the text file in HDFS from which XML elements are to be created using the text:collection($uris) function. Tokenize the text file and create the key String from the first split prepended with “/logentry/avro”. Use the custom function to store a key value pair for each line in the text file with the value for the record created as an XML element with sub-elements for timestamp , category , type , servername , code and msg . The query script txt_oranosql.xq is listed: import module "oxh:text"; import module "oxh:kv"; declare %kv:put("avroxml") %avro:schema-kv("wls.log.WLSLogRec") function local:put-logentry($key as xs:string, $value as node()) external; for $line in text:collection("/wls/wlslog.txt") let $split := fn:tokenize($line, ",") let $timestamp := $split[1] let $key := "/logentry/avro/" || $timestamp return local:put-logentry( $key, {$split[1]} {$split[2]} {$split[3]} {$split[4]} {$split[5]} {$split[6]} ) We also need to provide the connection parameters for accessing Oracle NoSQL Database, either on the command line with –Dproperty=value format or in a XML configuration file. Create the following XML configuration file oxh_config.xml in which specify the oracle.kv.hosts property for the Oracle NoSQL Database host:port pairs and the oracle.kv.kvstore property for the KV store name. oracle.kv.hosts localhost:5000 oracle.kv.kvstore kvstore Next, run the following hadoop command to process the query script with the oxh_config.xml configuration file specified with the –conf option. hadoop jar $OXH_HOME/lib/oxh.jar -conf oxh_config.xml txt_oranosql.xq Oracle XQuery for Hadoop 3.0.0 gets started. The query script gets processed by a MapReduce application and Avro records get stored in Oracle NoSQL Database. The output from the hadoop command is listed: [root@localhost oxh]# hadoop jar $OXH_HOME/lib/oxh.jar -conf oxh_config.xml txt_oranosql.xq 14/05/24 16:15:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/05/24 16:15:28 INFO hadoop.xquery: OXH: Oracle XQuery for Hadoop 3.0.0 (build 3.0.0-cdh4.6.0-mr1 @mr2). Copyright (c) 2014, Oracle. All rights reserved. 14/05/24 16:15:29 INFO hadoop.xquery: Executing query "txt_oranosql.xq". Output path: "hdfs://10.0.2.15:8020/tmp/oxh-root/output" 14/05/24 16:15:36 INFO hadoop.xquery: Submitting map-reduce job "oxh:txt_oranosql.xq#0" id="7cd3f7c6-751a-4a92-83b1-7a90a531c864.0", inputs=[hdfs://10.0.2.15:8020/wls/wlslog.txt], output=hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/7cd3f7c6-751a-4a92-83b1-7a90a531c864.0 14/05/24 16:15:37 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 14/05/24 16:15:37 INFO input.FileInputFormat: Total input paths to process : 1 14/05/24 16:15:38 INFO mapreduce.JobSubmitter: number of splits:1 14/05/24 16:15:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1710401567_0001 14/05/24 16:15:49 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 14/05/24 16:15:49 INFO hadoop.xquery: Waiting for map-reduce job oxh:txt_oranosql.xq#0 14/05/24 16:15:49 INFO mapreduce.Job: Running job: job_local1710401567_0001 14/05/24 16:15:49 INFO mapred.LocalJobRunner: OutputCommitter set in config null 14/05/24 16:15:49 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter 14/05/24 16:15:49 INFO mapred.LocalJobRunner: Waiting for map tasks 14/05/24 16:15:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1710401567_0001_m_000000_0 14/05/24 16:15:49 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:15:49 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/wls/wlslog.txt:0+1120 14/05/24 16:15:50 INFO mapreduce.Job: Job job_local1710401567_0001 running in uber mode : false 14/05/24 16:15:50 INFO mapreduce.Job: map 0% reduce 0% 14/05/24 16:15:51 INFO mapred.LocalJobRunner: 14/05/24 16:15:51 INFO mapred.Task: Task:attempt_local1710401567_0001_m_000000_0 is done. And is in the process of committing 14/05/24 16:15:51 INFO mapred.LocalJobRunner: 14/05/24 16:15:51 INFO mapred.Task: Task attempt_local1710401567_0001_m_000000_0 is allowed to commit now 14/05/24 16:15:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1710401567_0001_m_000000_0' to hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/7cd3f7c6-751a-4a92-83b1-7a90a531c864.0/_temporary/0/task_local1710401567_0001_m_000000 14/05/24 16:15:51 INFO mapred.LocalJobRunner: map 14/05/24 16:15:51 INFO mapred.Task: Task 'attempt_local1710401567_0001_m_000000_0' done. 14/05/24 16:15:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1710401567_0001_m_000000_0 14/05/24 16:15:51 INFO mapred.LocalJobRunner: Map task executor complete. 14/05/24 16:15:52 INFO mapreduce.Job: map 100% reduce 0% 14/05/24 16:15:52 INFO mapreduce.Job: Job job_local1710401567_0001 completed successfully 14/05/24 16:15:52 INFO mapreduce.Job: Counters: 23 File System Counters FILE: Number of bytes read=12538 FILE: Number of bytes written=19161802 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=18763377 HDFS: Number of bytes written=1714 HDFS: Number of read operations=206 HDFS: Number of large read operations=0 HDFS: Number of write operations=4 Map-Reduce Framework Map input records=9 Map output records=0 Input split bytes=101 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=241 CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 Total committed heap usage (bytes)=28319744 File Input Format Counters Bytes Read=1120 File Output Format Counters Bytes Written=0 14/05/24 16:15:52 INFO hadoop.xquery: Submitting map-reduce job "oxh:txt_oranosql.xq#1" id="7cd3f7c6-751a-4a92-83b1-7a90a531c864.1", inputs=[hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/7cd3f7c6-751a-4a92-83b1-7a90a531c864.0/OXHI0xputlogentry-m-00000] 14/05/24 16:15:52 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 14/05/24 16:15:52 INFO input.FileInputFormat: Total input paths to process : 1 14/05/24 16:15:52 INFO mapreduce.JobSubmitter: number of splits:1 14/05/24 16:15:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1515682078_0002 14/05/24 16:16:01 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 14/05/24 16:16:01 INFO hadoop.xquery: Waiting for map-reduce job oxh:txt_oranosql.xq#1 14/05/24 16:16:01 INFO mapreduce.Job: Running job: job_local1515682078_0002 14/05/24 16:16:01 INFO mapred.LocalJobRunner: OutputCommitter set in config null 14/05/24 16:16:01 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.NullOutputFormat$2 14/05/24 16:16:01 INFO mapred.LocalJobRunner: Waiting for map tasks 14/05/24 16:16:01 INFO mapred.LocalJobRunner: Starting task: attempt_local1515682078_0002_m_000000_0 14/05/24 16:16:01 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:16:01 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/7cd3f7c6-751a-4a92-83b1-7a90a531c864.0/OXHI0xputlogentry-m-00000:0+1714 14/05/24 16:16:01 INFO mapred.LocalJobRunner: 14/05/24 16:16:01 INFO mapred.Task: Task:attempt_local1515682078_0002_m_000000_0 is done. And is in the process of committing 14/05/24 16:16:01 INFO mapred.LocalJobRunner: map 14/05/24 16:16:01 INFO mapred.Task: Task 'attempt_local1515682078_0002_m_000000_0' done. 14/05/24 16:16:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local1515682078_0002_m_000000_0 14/05/24 16:16:01 INFO mapred.LocalJobRunner: Map task executor complete. 14/05/24 16:16:02 INFO mapreduce.Job: Job job_local1515682078_0002 running in uber mode : false 14/05/24 16:16:02 INFO mapreduce.Job: map 100% reduce 0% 14/05/24 16:16:02 INFO mapreduce.Job: Job job_local1515682078_0002 completed successfully 14/05/24 16:16:02 INFO mapreduce.Job: Counters: 23 File System Counters FILE: Number of bytes read=25147 FILE: Number of bytes written=38319701 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=37527348 HDFS: Number of bytes written=1714 HDFS: Number of read operations=420 HDFS: Number of large read operations=0 HDFS: Number of write operations=7 Map-Reduce Framework Map input records=9 Map output records=0 Input split bytes=172 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=0 CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 Total committed heap usage (bytes)=28319744 File Input Format Counters Bytes Read=1714 File Output Format Counters Bytes Written=0 14/05/24 16:16:02 INFO hadoop.xquery: Finished executing "txt_oranosql.xq". Output path: "hdfs://10.0.2.15:8020/tmp/oxh-root/output" Querying Avro in Oracle NoSQL Database In this section we shall access the Oracle NoSQL Database Avro values and load the data fetched into Oracle Database. Create an query script oranosql_oradb.xq in which specify a custom function for loading data into Oracle Database. Specify a FLOWR expression in which invoke the kv:collection-avroxml function to access the Avro records in Oracle NoSQL Database. Specify the key as /logentry/avro , the same as what was used to store the Avro records. In the let clauses create variables for a logentry’s timestamp , category , type , servername , code and msg . Load the data into Oracle Database using the Oracle Database custom function. import module "oxh:text"; import module "oxh:kv"; declare %oracle:put %oracle-property:targetTable('wlsserver') %oracle-property:connection.user('OE') %oracle-property:connection.password('OE') %oracle-property:connection.url('jdbc:oracle:thin:@localhost:1521:orcl') function local:myPut($c1, $c2, $c3, $c4, $c5, $c6) external; for $logentry in kv:collection-avroxml("/logentry/avro") let $timestamp := $logentry/timestamp let $category := $logentry/category let $type := $logentry/type let $servername := $logentry/servername let $code := $logentry/code let $msg := $logentry/msg return local:myPut($timestamp, $category, $type, $servername, $code, $msg) Run the following hadoop command with the XML configuration file specified using the –option to process the query script. hadoop jar $OXH_HOME/lib/oxh.jar -conf oxh_config.xml oranosql_oradb.xq Oracle XQuery for Hadoop gets started. A MapReduce application runs to process the query script and load the data from Oracle NoSQL Database into Oracle Database. The output from the hadoop command to run OXH to process the query script is listed: 14/05/24 16:19:50 INFO mapred.LocalJobRunner: Finishing task: attempt_local1665208517_0001_m_000006_0 14/05/24 16:19:50 INFO mapred.LocalJobRunner: Starting task: attempt_local1665208517_0001_m_000007_0 14/05/24 16:19:50 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:19:50 INFO mapred.MapTask: Processing split: oracle.kv.hadoop.KVInputSplit@1966e9 14/05/24 16:19:50 INFO mapred.LocalJobRunner: 14/05/24 16:19:50 INFO mapred.Task: Task:attempt_local1665208517_0001_m_000007_0 is done. And is in the process of committing 14/05/24 16:19:50 INFO mapred.LocalJobRunner: map 14/05/24 16:19:50 INFO mapred.Task: Task 'attempt_local1665208517_0001_m_000007_0' done. 14/05/24 16:19:50 INFO mapred.LocalJobRunner: Finishing task: attempt_local1665208517_0001_m_000007_0 14/05/24 16:19:50 INFO mapred.LocalJobRunner: Starting task: attempt_local1665208517_0001_m_000008_0 14/05/24 16:19:50 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:19:50 INFO mapred.MapTask: Processing split: oracle.kv.hadoop.KVInputSplit@d0daba 14/05/24 16:19:51 INFO mapreduce.Job: map 100% reduce 0% 14/05/24 16:19:51 INFO mapred.LocalJobRunner: 14/05/24 16:19:51 INFO mapred.Task: Task:attempt_local1665208517_0001_m_000008_0 is done. And is in the process of committing 14/05/24 16:19:51 INFO mapred.LocalJobRunner: map 14/05/24 16:19:51 INFO mapred.Task: Task 'attempt_local1665208517_0001_m_000008_0' done. 14/05/24 16:19:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1665208517_0001_m_000008_0 14/05/24 16:19:51 INFO mapred.LocalJobRunner: Starting task: attempt_local1665208517_0001_m_000009_0 14/05/24 16:19:51 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:19:51 INFO mapred.MapTask: Processing split: oracle.kv.hadoop.KVInputSplit@1754758 14/05/24 16:19:52 INFO mapred.LocalJobRunner: 14/05/24 16:19:52 INFO mapred.Task: Task:attempt_local1665208517_0001_m_000009_0 is done. And is in the process of committing 14/05/24 16:19:52 INFO mapred.LocalJobRunner: map 14/05/24 16:19:52 INFO mapred.Task: Task 'attempt_local1665208517_0001_m_000009_0' done. 14/05/24 16:19:52 INFO mapred.LocalJobRunner: Finishing task: attempt_local1665208517_0001_m_000009_0 14/05/24 16:19:52 INFO mapred.LocalJobRunner: Map task executor complete. 14/05/24 16:19:53 INFO mapreduce.Job: Job job_local1665208517_0001 completed successfully 14/05/24 16:19:53 INFO mapreduce.Job: Counters: 23 File System Counters FILE: Number of bytes read=207036 FILE: Number of bytes written=249857990 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=245352030 HDFS: Number of bytes written=20177 HDFS: Number of read operations=2985 HDFS: Number of large read operations=0 HDFS: Number of write operations=90 Map-Reduce Framework Map input records=9 Map output records=0 Input split bytes=1681 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=439 CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 Total committed heap usage (bytes)=324362240 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0 14/05/24 16:19:53 INFO hadoop.xquery: Starting "oracle.hadoop.loader.OraLoader" tool, with map-reduce job "oxh:oranosql_oradb.xq#1", inputs=[hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00000.avro, hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00001.avro, hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00003.avro, hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00005.avro] ... 1more, output=hdfs://10.0.2.15:8020/tmp/oxh-root/output/myPut 14/05/24 16:19:54 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.0.0 - Production Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved. 14/05/24 16:19:54 INFO loader.OraLoader: Built-Against: hadoop-2.2.0-cdh5.0.0-beta-2 hive-0.12.0-cdh5.0.0-beta-2 avro-1.7.3 jackson-1.8.8 14/05/24 16:19:56 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: WLSSERVER is not partitioned 14/05/24 16:19:56 INFO loader.OraLoader: oracle.hadoop.loader.enableSorting disabled, no sorting key provided 14/05/24 16:19:56 INFO loader.OraLoader: Reduce tasks set to 0 because of no partitioning or sorting. Loading will be done in the map phase. 14/05/24 16:19:56 INFO output.DBOutputFormat: Setting map tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat 14/05/24 16:19:56 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 14/05/24 16:19:56 WARN loader.OraLoader: Sampler error: the number of reduce tasks must be greater than one; the configured value is 0 . Job will continue without sampled information. 14/05/24 16:19:56 INFO loader.OraLoader: Sampling time=0D:0h:0m:0s:112ms (112 ms) 14/05/24 16:19:56 INFO loader.OraLoader: Submitting OraLoader job oxh:oranosql_oradb.xq#1 14/05/24 16:19:56 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 14/05/24 16:19:58 INFO input.FileInputFormat: Total input paths to process : 5 14/05/24 16:19:58 INFO mapreduce.JobSubmitter: number of splits:5 14/05/24 16:19:58 WARN conf.Configuration: mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files 14/05/24 16:19:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local114456139_0002 14/05/24 16:20:10 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 14/05/24 16:20:10 INFO mapred.LocalJobRunner: OutputCommitter set in config null 14/05/24 16:20:10 INFO mapred.LocalJobRunner: OutputCommitter is oracle.hadoop.loader.lib.output.DBOutputCommitter 14/05/24 16:20:10 INFO mapred.LocalJobRunner: Waiting for map tasks 14/05/24 16:20:10 INFO mapred.LocalJobRunner: Starting task: attempt_local114456139_0002_m_000000_0 14/05/24 16:20:10 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:20:10 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00003.avro:0+675 14/05/24 16:20:11 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100 14/05/24 16:20:11 INFO output.DBOutputFormat: conf prop: loadByPartition: false 14/05/24 16:20:11 INFO loader.OraLoader: map 0% reduce 0% 14/05/24 16:20:11 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSSERVER" ("TIMESTAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?) 14/05/24 16:20:11 INFO mapred.LocalJobRunner: 14/05/24 16:20:12 INFO mapred.Task: Task:attempt_local114456139_0002_m_000000_0 is done. And is in the process of committing 14/05/24 16:20:12 INFO mapred.LocalJobRunner: 14/05/24 16:20:12 INFO mapred.Task: Task attempt_local114456139_0002_m_000000_0 is allowed to commit now 14/05/24 16:20:12 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local114456139_0002_m_000000_0 14/05/24 16:20:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local114456139_0002_m_000000_0' to hdfs://10.0.2.15:8020/tmp/oxh-root/output/myPut/_temporary/0/task_local114456139_0002_m_000000 14/05/24 16:20:13 INFO mapred.LocalJobRunner: map 14/05/24 16:20:13 INFO mapred.Task: Task 'attempt_local114456139_0002_m_000000_0' done. 14/05/24 16:20:13 INFO mapred.LocalJobRunner: Finishing task: attempt_local114456139_0002_m_000000_0 14/05/24 16:20:13 INFO mapred.LocalJobRunner: Starting task: attempt_local114456139_0002_m_000001_0 14/05/24 16:20:13 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:20:13 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00001.avro:0+629 14/05/24 16:20:13 INFO loader.OraLoader: map 100% reduce 0% 14/05/24 16:20:13 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100 14/05/24 16:20:13 INFO output.DBOutputFormat: conf prop: loadByPartition: false 14/05/24 16:20:13 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSSERVER" ("TIMESTAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?) 14/05/24 16:20:13 INFO mapred.LocalJobRunner: 14/05/24 16:20:14 INFO mapred.Task: Task:attempt_local114456139_0002_m_000001_0 is done. And is in the process of committing 14/05/24 16:20:14 INFO mapred.LocalJobRunner: 14/05/24 16:20:14 INFO mapred.Task: Task attempt_local114456139_0002_m_000001_0 is allowed to commit now 14/05/24 16:20:14 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local114456139_0002_m_000001_0 14/05/24 16:20:14 INFO output.FileOutputCommitter: Saved output of task 'attempt_local114456139_0002_m_000001_0' to hdfs://10.0.2.15:8020/tmp/oxh-root/output/myPut/_temporary/0/task_local114456139_0002_m_000001 14/05/24 16:20:14 INFO mapred.LocalJobRunner: map 14/05/24 16:20:14 INFO mapred.Task: Task 'attempt_local114456139_0002_m_000001_0' done. 14/05/24 16:20:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local114456139_0002_m_000001_0 14/05/24 16:20:14 INFO mapred.LocalJobRunner: Starting task: attempt_local114456139_0002_m_000002_0 14/05/24 16:20:14 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:20:14 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00005.avro:0+627 14/05/24 16:20:15 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100 14/05/24 16:20:15 INFO output.DBOutputFormat: conf prop: loadByPartition: false 14/05/24 16:20:15 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSSERVER" ("TIMESTAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?) 14/05/24 16:20:15 INFO mapred.LocalJobRunner: 14/05/24 16:20:15 INFO mapred.Task: Task:attempt_local114456139_0002_m_000002_0 is done. And is in the process of committing 14/05/24 16:20:15 INFO mapred.LocalJobRunner: 14/05/24 16:20:15 INFO mapred.Task: Task attempt_local114456139_0002_m_000002_0 is allowed to commit now 14/05/24 16:20:15 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local114456139_0002_m_000002_0 14/05/24 16:20:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local114456139_0002_m_000002_0' to hdfs://10.0.2.15:8020/tmp/oxh-root/output/myPut/_temporary/0/task_local114456139_0002_m_000002 14/05/24 16:20:15 INFO mapred.LocalJobRunner: map 14/05/24 16:20:15 INFO mapred.Task: Task 'attempt_local114456139_0002_m_000002_0' done. 14/05/24 16:20:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local114456139_0002_m_000002_0 14/05/24 16:20:15 INFO mapred.LocalJobRunner: Starting task: attempt_local114456139_0002_m_000003_0 14/05/24 16:20:15 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:20:15 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00006.avro:0+519 14/05/24 16:20:16 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100 14/05/24 16:20:16 INFO output.DBOutputFormat: conf prop: loadByPartition: false 14/05/24 16:20:17 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSSERVER" ("TIMESTAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?) 14/05/24 16:20:17 INFO mapred.LocalJobRunner: 14/05/24 16:20:17 INFO mapred.Task: Task:attempt_local114456139_0002_m_000003_0 is done. And is in the process of committing 14/05/24 16:20:17 INFO mapred.LocalJobRunner: 14/05/24 16:20:17 INFO mapred.Task: Task attempt_local114456139_0002_m_000003_0 is allowed to commit now 14/05/24 16:20:17 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local114456139_0002_m_000003_0 14/05/24 16:20:17 INFO output.FileOutputCommitter: Saved output of task 'attempt_local114456139_0002_m_000003_0' to hdfs://10.0.2.15:8020/tmp/oxh-root/output/myPut/_temporary/0/task_local114456139_0002_m_000003 14/05/24 16:20:17 INFO mapred.LocalJobRunner: map 14/05/24 16:20:17 INFO mapred.Task: Task 'attempt_local114456139_0002_m_000003_0' done. 14/05/24 16:20:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local114456139_0002_m_000003_0 14/05/24 16:20:17 INFO mapred.LocalJobRunner: Starting task: attempt_local114456139_0002_m_000004_0 14/05/24 16:20:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 14/05/24 16:20:17 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/tmp/oxh-root/scratch/4f74761e-94aa-40f1-9427-765c49e39df9.0/OXHI0xmyPut-m-00000.avro:0+458 14/05/24 16:20:18 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100 14/05/24 16:20:18 INFO output.DBOutputFormat: conf prop: loadByPartition: false 14/05/24 16:20:18 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSSERVER" ("TIMESTAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?) 14/05/24 16:20:18 INFO mapred.LocalJobRunner: 14/05/24 16:20:18 INFO mapred.Task: Task:attempt_local114456139_0002_m_000004_0 is done. And is in the process of committing 14/05/24 16:20:18 INFO mapred.LocalJobRunner: 14/05/24 16:20:18 INFO mapred.Task: Task attempt_local114456139_0002_m_000004_0 is allowed to commit now 14/05/24 16:20:18 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local114456139_0002_m_000004_0 14/05/24 16:20:18 INFO output.FileOutputCommitter: Saved output of task 'attempt_local114456139_0002_m_000004_0' to hdfs://10.0.2.15:8020/tmp/oxh-root/output/myPut/_temporary/0/task_local114456139_0002_m_000004 14/05/24 16:20:18 INFO mapred.LocalJobRunner: map 14/05/24 16:20:18 INFO mapred.Task: Task 'attempt_local114456139_0002_m_000004_0' done. 14/05/24 16:20:18 INFO mapred.LocalJobRunner: Finishing task: attempt_local114456139_0002_m_000004_0 14/05/24 16:20:18 INFO mapred.LocalJobRunner: Map task executor complete. 14/05/24 16:20:20 INFO loader.OraLoader: Job complete: oxh:oranosql_oradb.xq#1 (job_local114456139_0002) 14/05/24 16:20:20 INFO loader.OraLoader: Counters: 23 File System Counters FILE: Number of bytes read=3412234 FILE: Number of bytes written=232935635 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=225273001 HDFS: Number of bytes written=99008 HDFS: Number of read operations=3110 HDFS: Number of large read operations=0 HDFS: Number of write operations=180 Map-Reduce Framework Map input records=9 Map output records=9 Input split bytes=855 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=77 CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 Total committed heap usage (bytes)=162181120 File Input Format Counters Bytes Read=5816 File Output Format Counters Bytes Written=8058 14/05/24 16:20:20 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 14/05/24 16:20:20 INFO hadoop.xquery: Finished executing "oranosql_oradb.xq". Output path: "hdfs://10.0.2.15:8020/tmp/oxh-root/output" Subsequently, run a SELECT SQL statement in SQL Plus to list the data loaded into the OE.WLSSERVER table. The 9 rows of data loaded into the OE.WLSSERVER table from Oracle NoSQL Database gets listed. The output from the SQL query is listed: SQL> SELECT * FROM OE.WLSSERVER; TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:20-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to RESUMING TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:23-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to RUNNING TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:24-PM-PDT Notice WebLogicServer AdminServer BEA-000360 Server started in RUNNING mode TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:19-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to ADMIN TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:21-PM-PDT Notice Server AdminServer BEA-002613 Channel Default is now listening on fe80:0:0:0:0:5efe:c0a8:147:7001 for protocol s iiop t3ldap snmp http TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:16-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to STANDBY TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Apr-8-2014-7:06:18-PM-PDT Notice Log Management AdminServer BEA-170027 The Server has established connection with the Domain level Diagnostic Service s TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- uccessfully Apr-8-2014-7:06:22-PM-PDT Notice WebLogicServer AdminServer BEA-000331 TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- Started WebLogic Admin Server AdminServer for domain base_domain running in Deve lopment Mode Apr-8-2014-7:06:17-PM-PDT Notice WebLogicServer AdminServer TIMESTAMP CATEGORY ------------------------------------------------------- --------------- TYPE SERVERNAME ------------------------------------------------------- --------------- CODE --------------- MSG -------------------------------------------------------------------------------- BEA-000365 Server state changed to STARTING 9 rows selected. SQL> In this article we stored Avro records in Oracle NoSQL Database with the data being stored as XML. Subsequently we queried Oracle NoSQL Database to access the Avro record values and load the data fetched into Oracle Database 11g, all with Oracle XQuery for Hadoop.
↧