
Wiki Page: Using Oracle Database with CDH 5.2 Sqoop 1.4.5

Written by Deepak Vhora

In an earlier tutorial on using Oracle Database with Sqoop, earlier versions of the software were used: Apache Sqoop 1.4.1 (incubating), Oracle Database 10g Express Edition, Apache Hadoop 1.0.0, Apache Hive 0.9.0, and Apache HBase 0.94.1. In this tutorial Oracle Database 11g is used with later versions: CDH 5.2 Sqoop 1.4.5 (sqoop-1.4.5-cdh5.2.0), Hadoop 2.5.0 (hadoop-2.5.0-cdh5.2.0), Hive 0.13.1 (hive-0.13.1-cdh5.2.0), and HBase 0.98.6 (hbase-0.98.6-cdh5.2.0). This tutorial has the following sections.

Setting the Environment
Creating an Oracle Database Table
Importing into HDFS
Exporting from HDFS
Importing into HBase
Importing into Hive

Setting the Environment

The following software is required for this tutorial.

-Oracle Database 11g
-Sqoop 1.4.5 (sqoop-1.4.5-cdh5.2.0)
-Hadoop 2.5.0 (hadoop-2.5.0-cdh5.2.0)
-Hive 0.13.1 (hive-0.13.1-cdh5.2.0)
-HBase 0.98.6 (hbase-0.98.6-cdh5.2.0)
-Java 7

Create a directory /sqoop in which to install the software and set its permissions.

mkdir /sqoop
chmod -R 777 /sqoop
cd /sqoop

Add the hadoop group and add the hbase user to the hadoop group.

groupadd hadoop
useradd -g hadoop hbase

Download and extract the Java 7 gz file.

tar zxvf jdk-7u55-linux-i586.gz

Download and extract the Hadoop 2.5.0 tar.gz file.

wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.2.0.tar.gz
tar -xvf hadoop-2.5.0-cdh5.2.0.tar.gz

Create symlinks for the Hadoop conf and bin directories.

ln -s /sqoop/hadoop-2.5.0-cdh5.2.0/bin-mapreduce1 /sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1/bin
ln -s /sqoop/hadoop-2.5.0-cdh5.2.0/etc/hadoop /sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1/conf

Download and extract the Sqoop 1.4.5 tar.gz file.

wget http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop-1.4.5-cdh5.2.0.tar.gz
tar -xvf sqoop-1.4.5-cdh5.2.0.tar.gz

Copy the Oracle JDBC jar file to the Sqoop lib directory.

cp ojdbc6.jar /sqoop/sqoop-1.4.5-cdh5.2.0/lib

Download and extract the Hive 0.13.1 tar.gz file.

wget http://archive-primary.cloudera.com/cdh5/cdh/5/hive-0.13.1-cdh5.2.0.tar.gz
tar -xvf hive-0.13.1-cdh5.2.0.tar.gz

Create a hive-site.xml configuration file from the template file.

cp /sqoop/hive-0.13.1-cdh5.2.0/conf/hive-default.xml.template /sqoop/hive-0.13.1-cdh5.2.0/conf/hive-site.xml

Set the following configuration properties in the /sqoop/hive-0.13.1-cdh5.2.0/conf/hive-site.xml file. The host IP address specified in the hive.metastore.warehouse.dir could be different.

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://10.0.2.15:8020/user/hive/warehouse</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:10000</value>
</property>

Download and extract the HBase 0.98.6 tar.gz file.

wget http://archive-primary.cloudera.com/cdh5/cdh/5/hbase-0.98.6-cdh5.2.0.tar.gz
tar -xvf hbase-0.98.6-cdh5.2.0.tar.gz

Set the following configuration properties in the /sqoop/hbase-0.98.6-cdh5.2.0/conf/hbase-site.xml file. The IP address specified in the hbase.rootdir could be different.

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://10.0.2.15:8020/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/zookeeper</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>60020</value>
</property>
<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>

Create the directory specified in the hbase.zookeeper.property.dataDir property and set its permissions.

mkdir -p /zookeeper
chmod -R 700 /zookeeper

As root user, increase the maximum number of file handles in the /etc/security/limits.conf file.

hdfs - nofile 32768
hbase - nofile 32768
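
Before proceeding, it can help to confirm that the Oracle JDBC driver is actually in the Sqoop lib directory; a missing ojdbc6.jar is a common cause of "Could not load db driver class" errors when Sqoop runs. This quick check is an addition, not part of the original setup steps.

ls /sqoop/sqoop-1.4.5-cdh5.2.0/lib | grep ojdbc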
Set the environment variables for Oracle Database, Sqoop, Hadoop, Hive, HBase and Java in the ~/.bashrc file.

vi ~/.bashrc

export HADOOP_PREFIX=/sqoop/hadoop-2.5.0-cdh5.2.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export HIVE_HOME=/sqoop/hive-0.13.1-cdh5.2.0
export HBASE_HOME=/sqoop/hbase-0.98.6-cdh5.2.0
export SQOOP_HOME=/sqoop/sqoop-1.4.5-cdh5.2.0
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export JAVA_HOME=/sqoop/jdk1.7.0_55
export HADOOP_MAPRED_HOME=/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1
export HADOOP_HOME=/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$SQOOP_HOME/lib/*:$HBASE_HOME/lib/*:$HIVE_HOME/lib/*
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$SQOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ORACLE_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH
export HADOOP_NAMENODE_USER=sqoop
export HADOOP_DATANODE_USER=sqoop

Set the following Hadoop core properties in the /sqoop/hadoop-2.5.0-cdh5.2.0/etc/hadoop/core-site.xml configuration file. The IP address specified in the fs.defaultFS property could be different.

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://10.0.2.15:8020</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-0.20/cache</value>
</property>

Create the directory specified in the hadoop.tmp.dir property and set its permissions.

mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777 /var/lib/hadoop-0.20/cache

Set the following HDFS configuration properties in the /sqoop/hadoop-2.5.0-cdh5.2.0/etc/hadoop/hdfs-site.xml file.

<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/nn</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

Create the NameNode storage directory and set its permissions.

mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn

Format the NameNode and start the NameNode and the DataNode.

hadoop namenode -format
hadoop namenode
hadoop datanode

Create the HDFS directory specified in the hive.metastore.warehouse.dir property in the hive-site.xml file and set its permissions.

hadoop dfs -mkdir -p hdfs://10.0.2.15:8020/user/hive/warehouse
hadoop dfs -chmod -R 777 hdfs://10.0.2.15:8020/user/hive/warehouse

Create the HDFS directory specified in the hbase.rootdir property in the hbase-site.xml file and set its permissions.

hadoop dfs -mkdir /hbase
hadoop dfs -chmod -R 777 /hbase

The Sqoop lib jars need to be copied to HDFS to be available on the runtime classpath. Create an HDFS directory path for the Sqoop lib jars, set the directory path permissions, and put the Sqoop lib jars into HDFS.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/sqoop/sqoop-1.4.5-cdh5.2.0/lib
hadoop dfs -chmod -R 777 hdfs://10.0.2.15:8020/sqoop/sqoop-1.4.5-cdh5.2.0/lib
hdfs dfs -put /sqoop/sqoop-1.4.5-cdh5.2.0/lib/* hdfs://10.0.2.15:8020/sqoop/sqoop-1.4.5-cdh5.2.0/lib

Similarly, create an HDFS directory path for the Hive lib jars, set the directory path permissions, and put the Hive lib jars into HDFS.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/sqoop/hive-0.13.1-cdh5.2.0/lib
hadoop dfs -chmod -R 777 hdfs://10.0.2.15:8020/sqoop/hive-0.13.1-cdh5.2.0/lib
hdfs dfs -put /sqoop/hive-0.13.1-cdh5.2.0/lib/* hdfs://10.0.2.15:8020/sqoop/hive-0.13.1-cdh5.2.0/lib

Similarly, create an HDFS directory path for the HBase lib jars, set the directory path permissions, and put the HBase lib jars into HDFS.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/sqoop/hbase-0.98.6-cdh5.2.0/lib
hadoop dfs -chmod -R 777 hdfs://10.0.2.15:8020/sqoop/hbase-0.98.6-cdh5.2.0/lib
hdfs dfs -put /sqoop/hbase-0.98.6-cdh5.2.0/lib/* hdfs://10.0.2.15:8020/sqoop/hbase-0.98.6-cdh5.2.0/lib
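
As a quick sanity check, not in the original tutorial, the uploaded jars can be listed to confirm they landed in HDFS; the address assumes the fs.defaultFS value set above.

hdfs dfs -ls hdfs://10.0.2.15:8020/sqoop/sqoop-1.4.5-cdh5.2.0/lib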
Start the HBase Master, RegionServer and ZooKeeper nodes.

hbase-daemon.sh start master
hbase-daemon.sh start regionserver
hbase-daemon.sh start zookeeper

Creating an Oracle Database Table

In this section we shall create the Oracle Database table that is to be used to import/export with Sqoop. In SQL*Plus connect as schema OE. Create a database table wlslog.

CONNECT OE/OE;
CREATE TABLE OE.wlslog (time_stamp VARCHAR2(4000), category VARCHAR2(4000), type VARCHAR2(4000), servername VARCHAR2(4000), code VARCHAR2(4000), msg VARCHAR2(4000));

Run the following SQL script to add data to the wlslog table.

INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:16-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STANDBY');
INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:17-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STARTING');
INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:18-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to ADMIN');
INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:19-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to RESUMING');
INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:20-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000361','Started WebLogic AdminServer');
INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:21-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to RUNNING');
INSERT INTO wlslog(time_stamp,category,type,servername,code,msg) VALUES('Apr-8-2014-7:06:22-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode');

As the output in SQL*Plus indicates, the database table wlslog gets created. Create another table, WLSLOG_COPY, with the same structure as the wlslog table, to be used to export from HDFS.

CREATE TABLE WLSLOG_COPY(time_stamp VARCHAR2(4000), category VARCHAR2(4000), type VARCHAR2(4000), servername VARCHAR2(4000), code VARCHAR2(4000), msg VARCHAR2(4000));

The WLSLOG_COPY table gets created. Do not add data to the table, as data is to be exported to it from HDFS.

Importing into HDFS

In this section Sqoop is used to import Oracle Database table data into HDFS. Run the following sqoop import command to import into HDFS.

sqoop import --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --password "OE" --username "OE" --table "wlslog" --columns "time_stamp,category,type,servername,code,msg" --split-by "time_stamp" --target-dir "/oradb/import" --verbose

The sqoop import command arguments are as follows.

--connect: Sets the connection URL for Oracle Database ("jdbc:oracle:thin:@localhost:1521:ORCL")
--username: Sets the username to connect to Oracle Database ("OE")
--password: Sets the password for Oracle Database ("OE")
--table: Sets the Oracle Database table name ("wlslog")
--columns: Sets the Oracle Database table columns ("time_stamp,category,type,servername,code,msg")
--split-by: Sets the column used to split the import, typically the primary key ("time_stamp")
--target-dir: Sets the HDFS directory to import into ("/oradb/import")
--verbose: Sets verbose output
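
A side note, not part of the original tutorial: passing the password with --password exposes it on the command line and in shell history. Sqoop can instead prompt for the password interactively with the -P option, along the lines of the following sketch (note that the target directory must not already exist when the import runs).

sqoop import --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --username "OE" -P --table "wlslog" --split-by "time_stamp" --target-dir "/oradb/import"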
A MapReduce job runs to import the Oracle Database table data into HDFS. A more detailed output from the sqoop import command is as follows.

15/04/03 11:09:17 INFO mapred.LocalJobRunner:
15/04/03 11:09:18 INFO mapred.JobClient: map 100% reduce 0%
15/04/03 11:09:22 INFO mapred.Task: Task:attempt_local1162911152_0001_m_000000_0 is done. And is in the process of commiting
15/04/03 11:09:22 INFO mapred.LocalJobRunner:
15/04/03 11:09:22 INFO mapred.Task: Task attempt_local1162911152_0001_m_000000_0 is allowed to commit now
15/04/03 11:09:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1162911152_0001_m_000000_0' to /oradb/import
15/04/03 11:09:24 INFO mapred.LocalJobRunner:
15/04/03 11:09:24 INFO mapred.Task: Task 'attempt_local1162911152_0001_m_000000_0' done.
15/04/03 11:09:24 INFO mapred.LocalJobRunner: Finishing task: attempt_local1162911152_0001_m_000000_0
15/04/03 11:09:24 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/03 11:09:25 INFO mapred.JobClient: Job complete: job_local1162911152_0001
15/04/03 11:09:26 INFO mapred.JobClient: Counters: 18
15/04/03 11:09:26 INFO mapred.JobClient: File System Counters
15/04/03 11:09:26 INFO mapred.JobClient: FILE: Number of bytes read=21673941
15/04/03 11:09:26 INFO mapred.JobClient: FILE: Number of bytes written=21996421
15/04/03 11:09:26 INFO mapred.JobClient: FILE: Number of read operations=0
15/04/03 11:09:26 INFO mapred.JobClient: FILE: Number of large read operations=0
15/04/03 11:09:26 INFO mapred.JobClient: FILE: Number of write operations=0
15/04/03 11:09:26 INFO mapred.JobClient: HDFS: Number of bytes read=0
15/04/03 11:09:26 INFO mapred.JobClient: HDFS: Number of bytes written=717
15/04/03 11:09:26 INFO mapred.JobClient: HDFS: Number of read operations=1
15/04/03 11:09:26 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/04/03 11:09:26 INFO mapred.JobClient: HDFS: Number of write operations=2
15/04/03 11:09:26 INFO mapred.JobClient: Map-Reduce Framework
15/04/03 11:09:26 INFO mapred.JobClient: Map input records=7
15/04/03 11:09:26 INFO mapred.JobClient: Map output records=7
15/04/03 11:09:26 INFO mapred.JobClient: Input split bytes=87
15/04/03 11:09:26 INFO mapred.JobClient: Spilled Records=0
15/04/03 11:09:26 INFO mapred.JobClient: CPU time spent (ms)=0
15/04/03 11:09:26 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/04/03 11:09:26 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/04/03 11:09:26 INFO mapred.JobClient: Total committed heap usage (bytes)=180756480
15/04/03 11:09:26 INFO mapreduce.ImportJobBase: Transferred 717 bytes in 182.2559 seconds (3.934 bytes/sec)
15/04/03 11:09:26 INFO mapreduce.ImportJobBase: Retrieved 7 records.
15/04/03 11:09:26 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3d4817

Exporting from HDFS

Having imported into HDFS, in this section we shall export the imported data back into Oracle Database using the sqoop export tool. Run the following sqoop export command to export to Oracle Database.

sqoop export --connect "jdbc:oracle:thin:@127.0.0.1:1521:ORCL" --hadoop-home "/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1" --password "OE" --username "OE" --export-dir "/oradb/import" --table "WLSLOG_COPY" --verbose
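
Optionally, before running the export, the imported data can be inspected directly in HDFS. This check is not part of the original tutorial; part-m-00000 is the output file written by the single completed map task, as the import and export logs indicate.

hadoop dfs -cat /oradb/import/part-m-00000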
The sqoop export command arguments are as follows.

--connect: Sets the connection URL for Oracle Database ("jdbc:oracle:thin:@localhost:1521:ORCL")
--username: Sets the username to connect to Oracle Database ("OE")
--password: Sets the password for Oracle Database ("OE")
--table: Sets the Oracle Database table name to export to ("WLSLOG_COPY")
--hadoop-home: Sets the Hadoop home directory ("/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1")
--export-dir: Sets the HDFS directory to export from; should be the same as the directory imported into ("/oradb/import")
--verbose: Sets verbose output

A MapReduce job runs to export HDFS data into Oracle Database. A more detailed output from the sqoop export command is as follows.

[root@localhost sqoop]# sqoop export --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --hadoop-home "/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1" --password "OE" --username "OE" --export-dir "/oradb/import" --table "WLSLOG_COPY" --verbose
15/04/03 11:13:03 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@localhost:1521
15/04/03 11:13:03 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
15/04/03 11:13:03 INFO manager.SqlManager: Using default fetchSize of 1000
15/04/03 11:13:03 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@1101fa5
15/04/03 11:13:03 INFO tool.CodeGenTool: Beginning code generation
15/04/03 11:13:04 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 11:13:04 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 11:13:05 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@localhost:1521:ORCL, using username: OE
15/04/03 11:13:05 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
15/04/03 11:13:10 INFO manager.OracleManager: Time zone has been set to GMT
15/04/03 11:13:11 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
15/04/03 11:13:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 11:13:15 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 11:13:15 DEBUG orm.ClassWriter: selected columns:
15/04/03 11:13:15 DEBUG orm.ClassWriter: TIME_STAMP
15/04/03 11:13:15 DEBUG orm.ClassWriter: CATEGORY
15/04/03 11:13:15 DEBUG orm.ClassWriter: TYPE
15/04/03 11:13:15 DEBUG orm.ClassWriter: SERVERNAME
15/04/03 11:13:15 DEBUG orm.ClassWriter: CODE
15/04/03 11:13:15 DEBUG orm.ClassWriter: MSG
15/04/03 11:14:00 INFO mapreduce.ExportJobBase: Beginning export of WLSLOG_COPY
15/04/03 11:14:00 DEBUG util.ClassLoaderStack: Checking for existing class: WLSLOG_COPY
15/04/03 11:14:52 DEBUG mapreduce.JobBase: Using InputFormat: class org.apache.sqoop.mapreduce.ExportInputFormat
15/04/03 11:14:54 DEBUG db.DBConfiguration: Securing password into job credentials store
15/04/03 11:14:54 DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 11:15:23 INFO input.FileInputFormat: Total input paths to process : 1
15/04/03 11:15:23 DEBUG mapreduce.ExportInputFormat: Target numMapTasks=4
15/04/03 11:15:23 DEBUG mapreduce.ExportInputFormat: Total input bytes=717
15/04/03 11:15:23 DEBUG mapreduce.ExportInputFormat: maxSplitSize=179
15/04/03 11:15:23 INFO input.FileInputFormat: Total input paths to process : 1
15/04/03 11:15:25 DEBUG mapreduce.ExportInputFormat: Generated splits:
15/04/03 11:15:25 DEBUG mapreduce.ExportInputFormat: Paths:/oradb/import/part-m-00000:0+179 Locations:localhost:;
15/04/03 11:15:25 DEBUG mapreduce.ExportInputFormat: Paths:/oradb/import/part-m-00000:179+179 Locations:localhost:;
15/04/03 11:15:25 DEBUG mapreduce.ExportInputFormat: Paths:/oradb/import/part-m-00000:358+179 Locations:localhost:;
15/04/03 11:15:25 DEBUG mapreduce.ExportInputFormat: Paths:/oradb/import/part-m-00000:537+90,/oradb/import/part-m-00000:627+90 Locations:localhost:;
15/04/03 11:16:35 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/03 11:16:35 INFO mapred.JobClient: Running job: job_local596048800_0001
15/04/03 11:16:35 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
15/04/03 11:16:36 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/03 11:16:36 INFO mapred.LocalJobRunner: Starting task: attempt_local596048800_0001_m_000000_0
15/04/03 11:16:37 INFO mapred.JobClient: map 0% reduce 0%
15/04/03 11:16:38 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/04/03 11:16:40 INFO util.ProcessTree: setsid exited with exit code 0
15/04/03 11:16:41 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@9487e9
15/04/03 11:16:41 INFO mapred.MapTask: Processing split: Paths:/oradb/import/part-m-00000:537+90,/oradb/import/part-m-00000:627+90
15/04/03 11:16:41 DEBUG mapreduce.CombineShimRecordReader: ChildSplit operates on: hdfs://10.0.2.15:8020/oradb/import/part-m-00000
15/04/03 11:16:41 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 11:16:46 DEBUG mapreduce.CombineShimRecordReader: ChildSplit operates on: hdfs://10.0.2.15:8020/oradb/import/part-m-00000
15/04/03 11:16:46 INFO mapred.LocalJobRunner:
15/04/03 11:16:48 DEBUG mapreduce.AsyncSqlOutputFormat: Committing transaction of 1 statements
15/04/03 11:16:48 INFO mapred.Task: Task:attempt_local596048800_0001_m_000000_0 is done. And is in the process of commiting
15/04/03 11:16:49 INFO mapred.LocalJobRunner:
15/04/03 11:16:49 INFO mapred.Task: Task 'attempt_local596048800_0001_m_000000_0' done.
15/04/03 11:16:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local596048800_0001_m_000000_0
15/04/03 11:16:49 INFO mapred.LocalJobRunner: Starting task: attempt_local596048800_0001_m_000001_0
15/04/03 11:16:49 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/04/03 11:16:49 INFO mapred.JobClient: map 25% reduce 0%
15/04/03 11:16:49 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@318c80
15/04/03 11:16:49 INFO mapred.MapTask: Processing split: Paths:/oradb/import/part-m-00000:0+179
15/04/03 11:16:49 DEBUG mapreduce.CombineShimRecordReader: ChildSplit operates on: hdfs://10.0.2.15:8020/oradb/import/part-m-00000
15/04/03 11:16:49 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 11:16:53 INFO mapred.LocalJobRunner:
15/04/03 11:16:53 DEBUG mapreduce.AsyncSqlOutputFormat: Committing transaction of 1 statements
15/04/03 11:16:53 INFO mapred.Task: Task:attempt_local596048800_0001_m_000001_0 is done. And is in the process of commiting
15/04/03 11:16:53 INFO mapred.LocalJobRunner:
15/04/03 11:16:53 INFO mapred.Task: Task 'attempt_local596048800_0001_m_000001_0' done.
15/04/03 11:16:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local596048800_0001_m_000001_0
15/04/03 11:16:53 INFO mapred.LocalJobRunner: Starting task: attempt_local596048800_0001_m_000002_0
15/04/03 11:16:53 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/04/03 11:16:53 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@19d4b58
15/04/03 11:16:53 INFO mapred.MapTask: Processing split: Paths:/oradb/import/part-m-00000:179+179
15/04/03 11:16:53 DEBUG mapreduce.CombineShimRecordReader: ChildSplit operates on: hdfs://10.0.2.15:8020/oradb/import/part-m-00000
15/04/03 11:16:53 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 11:16:54 INFO mapred.JobClient: map 50% reduce 0%
15/04/03 11:16:58 DEBUG mapreduce.AutoProgressMapper: Progress thread shutdown detected.
15/04/03 11:16:58 INFO mapred.LocalJobRunner:
15/04/03 11:16:58 DEBUG mapreduce.AsyncSqlOutputFormat: Committing transaction of 1 statements
15/04/03 11:16:58 INFO mapred.Task: Task:attempt_local596048800_0001_m_000002_0 is done. And is in the process of commiting
15/04/03 11:16:58 INFO mapred.LocalJobRunner:
15/04/03 11:16:58 INFO mapred.Task: Task 'attempt_local596048800_0001_m_000002_0' done.
15/04/03 11:16:58 INFO mapred.LocalJobRunner: Finishing task: attempt_local596048800_0001_m_000002_0
15/04/03 11:16:58 INFO mapred.LocalJobRunner: Starting task: attempt_local596048800_0001_m_000003_0
15/04/03 11:16:58 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/04/03 11:16:58 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1059ca1
15/04/03 11:16:58 INFO mapred.MapTask: Processing split: Paths:/oradb/import/part-m-00000:358+179
15/04/03 11:16:58 DEBUG mapreduce.CombineShimRecordReader: ChildSplit operates on: hdfs://10.0.2.15:8020/oradb/import/part-m-00000
15/04/03 11:16:58 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 11:16:59 INFO mapred.JobClient: map 75% reduce 0%
15/04/03 11:17:02 INFO mapred.LocalJobRunner:
15/04/03 11:17:02 DEBUG mapreduce.AsyncSqlOutputFormat: Committing transaction of 1 statements
15/04/03 11:17:03 INFO mapred.Task: Task:attempt_local596048800_0001_m_000003_0 is done. And is in the process of commiting
15/04/03 11:17:03 INFO mapred.LocalJobRunner:
15/04/03 11:17:03 INFO mapred.Task: Task 'attempt_local596048800_0001_m_000003_0' done.
15/04/03 11:17:03 INFO mapred.LocalJobRunner: Finishing task: attempt_local596048800_0001_m_000003_0
15/04/03 11:17:03 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/03 11:17:03 INFO mapred.JobClient: map 100% reduce 0%
15/04/03 11:17:03 INFO mapred.JobClient: Job complete: job_local596048800_0001
15/04/03 11:17:04 INFO mapred.JobClient: Counters: 18
15/04/03 11:17:04 INFO mapred.JobClient: File System Counters
15/04/03 11:17:04 INFO mapred.JobClient: FILE: Number of bytes read=86701670
15/04/03 11:17:04 INFO mapred.JobClient: FILE: Number of bytes written=87982780
15/04/03 11:17:04 INFO mapred.JobClient: FILE: Number of read operations=0
15/04/03 11:17:04 INFO mapred.JobClient: FILE: Number of large read operations=0
15/04/03 11:17:04 INFO mapred.JobClient: FILE: Number of write operations=0
15/04/03 11:17:04 INFO mapred.JobClient: HDFS: Number of bytes read=4720
15/04/03 11:17:04 INFO mapred.JobClient: HDFS: Number of bytes written=0
15/04/03 11:17:04 INFO mapred.JobClient: HDFS: Number of read operations=78
15/04/03 11:17:04 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/04/03 11:17:04 INFO mapred.JobClient: HDFS: Number of write operations=0
15/04/03 11:17:04 INFO mapred.JobClient: Map-Reduce Framework
15/04/03 11:17:04 INFO mapred.JobClient: Map input records=7
15/04/03 11:17:04 INFO mapred.JobClient: Map output records=7
15/04/03 11:17:04 INFO mapred.JobClient: Input split bytes=576
15/04/03 11:17:04 INFO mapred.JobClient: Spilled Records=0
15/04/03 11:17:04 INFO mapred.JobClient: CPU time spent (ms)=0
15/04/03 11:17:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/04/03 11:17:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/04/03 11:17:04 INFO mapred.JobClient: Total committed heap usage (bytes)=454574080
15/04/03 11:17:04 INFO mapreduce.ExportJobBase: Transferred 4.6094 KB in 128.7649 seconds (36.656 bytes/sec)
15/04/03 11:17:04 INFO mapreduce.ExportJobBase: Exported 7 records.
15/04/03 11:17:04 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3d4817

Run a SELECT statement in SQL*Plus to list the exported data. The 7 rows of data exported to the WLSLOG_COPY table get listed.

Importing into HBase

In this section Sqoop is used to import Oracle Database table data into HBase. Run the following sqoop import command to import into HBase.

sqoop import --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --hadoop-home "/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1" --password "OE" --username "OE" --hbase-create-table --hbase-table "WLS_LOG" --column-family "wls" --table "wlslog" --verbose

The sqoop import command arguments are as follows.

--connect: Sets the connection URL for Oracle Database ("jdbc:oracle:thin:@localhost:1521:ORCL")
--username: Sets the username to connect to Oracle Database ("OE")
--password: Sets the password for Oracle Database ("OE")
--table: Sets the Oracle Database table name ("wlslog")
--hadoop-home: Sets the Hadoop home directory ("/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1")
--hbase-create-table: Creates the HBase table
--hbase-table: Sets the HBase table name ("WLS_LOG")
--column-family: Sets the HBase column family ("wls")
--verbose: Sets verbose output
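
After the import completes (detailed output below), the imported rows can be verified from the HBase shell with the count command; this check is an addition to the original steps.

hbase shell
count 'WLS_LOG'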
A MapReduce job runs to import Oracle Database data into HBase. A more detailed output from the sqoop import command is as follows.

[root@localhost sqoop]# sqoop import --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --hadoop-home "/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1" --password "OE" --username "OE" --hbase-create-table --hbase-table "WLS_LOG" --column-family "wls" --table "WLSLOG" --verbose
15/04/03 13:56:26 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
15/04/03 13:56:26 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@localhost:1521
15/04/03 13:56:26 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
15/04/03 13:56:26 INFO manager.SqlManager: Using default fetchSize of 1000
15/04/03 13:56:26 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@704f33
15/04/03 13:56:26 INFO tool.CodeGenTool: Beginning code generation
15/04/03 13:56:26 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM WLSLOG t WHERE 1=0
15/04/03 13:56:26 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM WLSLOG t WHERE 1=0
15/04/03 13:56:28 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@localhost:1521:ORCL, using username: OE
15/04/03 13:56:28 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
15/04/03 13:57:09 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
15/04/03 13:57:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM WLSLOG t WHERE 1=0
15/04/03 13:57:22 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:57:22 DEBUG orm.ClassWriter: selected columns:
15/04/03 13:57:22 DEBUG orm.ClassWriter: TIME_STAMP
15/04/03 13:57:22 DEBUG orm.ClassWriter: CATEGORY
15/04/03 13:57:22 DEBUG orm.ClassWriter: TYPE
15/04/03 13:57:22 DEBUG orm.ClassWriter: SERVERNAME
15/04/03 13:57:22 DEBUG orm.ClassWriter: CODE
15/04/03 13:57:22 DEBUG orm.ClassWriter: MSG
15/04/03 13:58:46 DEBUG db.DBConfiguration: Securing password into job credentials store
15/04/03 13:58:46 DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:58:46 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:58:46 DEBUG mapreduce.DataDrivenImportJob: Using table class: WLSLOG
15/04/03 13:58:46 DEBUG mapreduce.DataDrivenImportJob: Using InputFormat: class com.cloudera.sqoop.mapreduce.db.OracleDataDrivenDBInputFormat
15/04/03 13:58:47 DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:java.compiler=
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.39-400.247.1.el6uek.i686
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:user.name=root
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Client environment:user.dir=/sqoop
15/04/03 13:59:07 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x8fffea, quorum=localhost:2181, baseZNode=/hbase
15/04/03 13:59:09 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/10.0.2.15:2181. Will not attempt to authenticate using SASL (unknown error)
15/04/03 13:59:10 INFO zookeeper.ClientCnxn: Socket connection established to localhost/10.0.2.15:2181, initiating session
15/04/03 13:59:11 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/10.0.2.15:2181, sessionid = 0x14c806c5f420006, negotiated timeout = 40000
15/04/03 13:59:47 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
15/04/03 13:59:54 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x8fffea connecting to ZooKeeper ensemble=localhost:2181
15/04/03 13:59:54 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x8fffea, quorum=localhost:2181, baseZNode=/hbase
15/04/03 13:59:54 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/10.0.2.15:2181. Will not attempt to authenticate using SASL (unknown error)
15/04/03 13:59:54 INFO zookeeper.ClientCnxn: Socket connection established to localhost/10.0.2.15:2181, initiating session
15/04/03 13:59:55 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/10.0.2.15:2181, sessionid = 0x14c806c5f420007, negotiated timeout = 40000
15/04/03 14:00:07 INFO zookeeper.ZooKeeper: Session: 0x14c806c5f420007 closed
15/04/03 14:00:07 INFO mapreduce.HBaseImportJob: Creating missing HBase table WLS_LOG
15/04/03 14:00:07 INFO zookeeper.ClientCnxn: EventThread shut down
15/04/03 14:00:14 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x8fffea connecting to ZooKeeper ensemble=localhost:2181
15/04/03 14:00:14 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x8fffea, quorum=localhost:2181, baseZNode=/hbase
15/04/03 14:00:14 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
15/04/03 14:00:15 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
15/04/03 14:00:15 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14c806c5f420008, negotiated timeout = 40000
15/04/03 14:00:15 INFO zookeeper.ClientCnxn: EventThread shut down
15/04/03 14:00:15 INFO zookeeper.ZooKeeper: Session: 0x14c806c5f420008 closed
15/04/03 14:00:18 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/04/03 14:00:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/04/03 14:00:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/04/03 14:00:44 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 14:00:49 INFO db.DBInputFormat: Using read commited transaction isolation
15/04/03 14:00:49 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
15/04/03 14:02:39 INFO mapred.JobClient: Running job: job_local1040061811_0001
15/04/03 14:02:39 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/03 14:02:39 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
15/04/03 14:02:40 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/03 14:02:40 INFO mapred.LocalJobRunner: Starting task: attempt_local1040061811_0001_m_000000_0
15/04/03 14:02:40 INFO mapred.JobClient: map 0% reduce 0%
15/04/03 14:02:46 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 14:02:50 INFO db.DBInputFormat: Using read commited transaction isolation
15/04/03 14:02:50 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
15/04/03 14:02:51 INFO db.OracleDBRecordReader: Time zone has been set to GMT
15/04/03 14:02:53 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
15/04/03 14:02:53 DEBUG db.DataDrivenDBRecordReader: Using query: SELECT TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG FROM WLSLOG WHERE ( 1=1 ) AND ( 1=1 )
15/04/03 14:02:53 DEBUG db.DBRecordReader: Using fetchSize for next query: 1000
15/04/03 14:02:53 INFO db.DBRecordReader: Executing query: SELECT TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG FROM WLSLOG WHERE ( 1=1 ) AND ( 1=1 )
15/04/03 14:03:01 DEBUG mapreduce.AutoProgressMapper: Instructing auto-progress thread to quit.
15/04/03 14:03:01 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
15/04/03 14:03:01 DEBUG mapreduce.AutoProgressMapper: Waiting for progress thread shutdown...
15/04/03 14:03:01 DEBUG mapreduce.AutoProgressMapper: Progress thread shutdown detected.
15/04/03 14:03:01 INFO mapred.LocalJobRunner:
15/04/03 14:03:06 INFO mapred.LocalJobRunner:
15/04/03 14:03:07 INFO mapred.Task: Task:attempt_local1040061811_0001_m_000000_0 is done. And is in the process of commiting
15/04/03 14:03:07 INFO mapred.JobClient: map 100% reduce 0%
15/04/03 14:03:07 INFO mapred.LocalJobRunner:
15/04/03 14:03:07 INFO mapred.Task: Task 'attempt_local1040061811_0001_m_000000_0' done.
15/04/03 14:03:07 INFO mapred.LocalJobRunner: Finishing task: attempt_local1040061811_0001_m_000000_0
15/04/03 14:03:07 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/03 14:03:08 INFO mapred.JobClient: Job complete: job_local1040061811_0001
15/04/03 14:03:08 INFO mapred.JobClient: Counters: 18
15/04/03 14:03:08 INFO mapred.JobClient: File System Counters
15/04/03 14:03:08 INFO mapred.JobClient: FILE: Number of bytes read=39829434
15/04/03 14:03:08 INFO mapred.JobClient: FILE: Number of bytes written=40338352
15/04/03 14:03:08 INFO mapred.JobClient: FILE: Number of read operations=0
15/04/03 14:03:08 INFO mapred.JobClient: FILE: Number of large read operations=0
15/04/03 14:03:08 INFO mapred.JobClient: FILE: Number of write operations=0
15/04/03 14:03:08 INFO mapred.JobClient: HDFS: Number of bytes read=0
15/04/03 14:03:08 INFO mapred.JobClient: HDFS: Number of bytes written=0
15/04/03 14:03:08 INFO mapred.JobClient: HDFS: Number of read operations=0
15/04/03 14:03:08 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/04/03 14:03:08 INFO mapred.JobClient: HDFS: Number of write operations=0
15/04/03 14:03:08 INFO mapred.JobClient: Map-Reduce Framework
15/04/03 14:03:08 INFO mapred.JobClient: Map input records=7
15/04/03 14:03:08 INFO mapred.JobClient: Map output records=7
15/04/03 14:03:08 INFO mapred.JobClient: Input split bytes=87
15/04/03 14:03:08 INFO mapred.JobClient: Spilled Records=0
15/04/03 14:03:08 INFO mapred.JobClient: CPU time spent (ms)=0
15/04/03 14:03:08 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/04/03 14:03:08 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/04/03 14:03:08 INFO mapred.JobClient: Total committed heap usage (bytes)=180756480
15/04/03 14:03:08 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 171.3972 seconds (0 bytes/sec)
15/04/03 14:03:09 INFO mapreduce.ImportJobBase: Retrieved 7 records.
15/04/03 14:03:09 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3d4817

Start the HBase shell.

hbase shell

Run the scan command to list the data imported into the WLS_LOG table.

scan "WLS_LOG"

The scan command lists the HBase table data. The 7 rows of data imported into HBase get listed.

Importing into Hive

In this section Sqoop is used to import Oracle Database table data into Hive. Run the following sqoop import command to import into Hive.

sqoop import --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --hadoop-home "/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1" --password "OE" --username "OE" --hive-import --create-hive-table --hive-table "WLSLOG" --table "WLSLOG_COPY" --split-by "time_stamp" --verbose

The sqoop import command arguments are as follows.

--connect: Sets the connection URL for Oracle Database ("jdbc:oracle:thin:@localhost:1521:ORCL")
--username: Sets the username to connect to Oracle Database ("OE")
--password: Sets the password for Oracle Database ("OE")
--table: Sets the Oracle Database table name to import from ("WLSLOG_COPY")
--hadoop-home: Sets the Hadoop home directory ("/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1")
--hive-import: Imports into Hive
--create-hive-table: Sets to create the Hive table
--hive-table: Sets the Hive table name ("WLSLOG")
--split-by: Sets the column used to split the import, typically the primary key ("time_stamp")
--verbose: Sets verbose output
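
Once the job finishes (detailed output below), the resulting table can be inspected from the Hive shell; an optional check that is not in the original tutorial. The table name wls_log is taken from the load statements in the log output further below.

hive
SHOW TABLES;
DESCRIBE wls_log;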
A MapReduce job runs to import Oracle Database table data into Hive. A more detailed output from the sqoop import command is as follows.

[root@localhost sqoop]# sqoop import --connect "jdbc:oracle:thin:@localhost:1521:ORCL" --hadoop-home "/sqoop/hadoop-2.5.0-cdh5.2.0/share/hadoop/mapreduce1" --password "OE" --username "OE" --hive-import --create-hive-table --hive-table "WLSLOG" --table "WLSLOG_COPY" --split-by "time_stamp" --verbose
15/04/03 13:20:42 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
15/04/03 13:20:42 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@localhost:1521
15/04/03 13:20:43 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
15/04/03 13:20:43 INFO manager.SqlManager: Using default fetchSize of 1000
15/04/03 13:20:43 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@9ed26e
15/04/03 13:20:44 INFO tool.CodeGenTool: Beginning code generation
15/04/03 13:20:44 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 13:20:44 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 13:20:51 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@localhost:1521:ORCL, using username: OE
15/04/03 13:20:51 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
15/04/03 13:21:18 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
15/04/03 13:21:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 13:21:30 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:21:30 DEBUG orm.ClassWriter: selected columns:
15/04/03 13:21:30 DEBUG orm.ClassWriter: TIME_STAMP
15/04/03 13:21:30 DEBUG orm.ClassWriter: CATEGORY
15/04/03 13:21:30 DEBUG orm.ClassWriter: TYPE
15/04/03 13:21:30 DEBUG orm.ClassWriter: SERVERNAME
15/04/03 13:21:30 DEBUG orm.ClassWriter: CODE
15/04/03 13:21:30 DEBUG orm.ClassWriter: MSG
15/04/03 13:21:31 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-root/compile/6235c3beba4d629be2f91c2c832c8033/WLSLOG_COPY.java
15/04/03 13:21:31 DEBUG orm.ClassWriter: Table name: WLSLOG_COPY
15/04/03 13:21:31 DEBUG orm.ClassWriter: Columns: TIME_STAMP:12, CATEGORY:12, TYPE:12, SERVERNAME:12, CODE:12, MSG:12,
15/04/03 13:21:52 INFO mapreduce.ImportJobBase: Beginning import of WLSLOG_COPY
15/04/03 13:21:53 DEBUG util.ClassLoaderStack: Checking for existing class: WLSLOG_COPY
15/04/03 13:22:04 DEBUG db.DBConfiguration: Securing password into job credentials store
15/04/03 13:22:04 DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:22:04 INFO manager.OracleManager: Time zone has been set to GMT
15/04/03 13:22:04 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:22:05 DEBUG mapreduce.DataDrivenImportJob: Using table class: WLSLOG_COPY
15/04/03 13:22:05 DEBUG mapreduce.DataDrivenImportJob: Using InputFormat: class com.cloudera.sqoop.mapreduce.db.OracleDataDrivenDBInputFormat
15/04/03 13:24:39 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/03 13:24:39 INFO mapred.JobClient: Running job: job_local846992281_0001
15/04/03 13:24:40 INFO mapred.JobClient: map 0% reduce 0%
15/04/03 13:24:40 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/04/03 13:24:42 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/03 13:24:42 INFO mapred.LocalJobRunner: Starting task: attempt_local846992281_0001_m_000000_0
15/04/03 13:24:43 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/04/03 13:24:45 INFO util.ProcessTree: setsid exited with exit code 0
15/04/03 13:24:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e1a108
15/04/03 13:24:46 DEBUG db.DBConfiguration: Fetching password from job credentials store
15/04/03 13:24:50 INFO db.DBInputFormat: Using read commited transaction isolation
15/04/03 13:24:50 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
15/04/03 13:24:50 INFO db.OracleDBRecordReader: Time zone has been set to GMT
15/04/03 13:24:53 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
15/04/03 13:24:53 DEBUG db.DataDrivenDBRecordReader: Using query: SELECT TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG FROM WLSLOG_COPY WHERE ( 1=1 ) AND ( 1=1 )
15/04/03 13:24:53 DEBUG db.DBRecordReader: Using fetchSize for next query: 1000
15/04/03 13:24:53 INFO db.DBRecordReader: Executing query: SELECT TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG FROM WLSLOG_COPY WHERE ( 1=1 ) AND ( 1=1 )
15/04/03 13:25:01 INFO mapred.LocalJobRunner:
15/04/03 13:25:06 INFO mapred.LocalJobRunner:
15/04/03 13:25:07 INFO mapred.JobClient: map 100% reduce 0%
15/04/03 13:25:12 INFO mapred.Task: Task:attempt_local846992281_0001_m_000000_0 is done. And is in the process of commiting
15/04/03 13:25:12 INFO mapred.LocalJobRunner:
15/04/03 13:25:12 INFO mapred.Task: Task attempt_local846992281_0001_m_000000_0 is allowed to commit now
15/04/03 13:25:14 INFO output.FileOutputCommitter: Saved output of task 'attempt_local846992281_0001_m_000000_0' to WLSLOG_COPY
15/04/03 13:25:14 INFO mapred.LocalJobRunner:
15/04/03 13:25:14 INFO mapred.Task: Task 'attempt_local846992281_0001_m_000000_0' done.
15/04/03 13:25:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local846992281_0001_m_000000_0
15/04/03 13:25:14 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/03 13:25:15 INFO mapred.JobClient: Job complete: job_local846992281_0001
15/04/03 13:25:15 INFO mapred.JobClient: Counters: 18
15/04/03 13:25:15 INFO mapred.JobClient: File System Counters
15/04/03 13:25:15 INFO mapred.JobClient: FILE: Number of bytes read=21673967
15/04/03 13:25:15 INFO mapred.JobClient: FILE: Number of bytes written=21996158
15/04/03 13:25:15 INFO mapred.JobClient: FILE: Number of read operations=0
15/04/03 13:25:15 INFO mapred.JobClient: FILE: Number of large read operations=0
15/04/03 13:25:15 INFO mapred.JobClient: FILE: Number of write operations=0
15/04/03 13:25:15 INFO mapred.JobClient: HDFS: Number of bytes read=0
15/04/03 13:25:15 INFO mapred.JobClient: HDFS: Number of bytes written=717
15/04/03 13:25:15 INFO mapred.JobClient: HDFS: Number of read operations=1
15/04/03 13:25:15 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/04/03 13:25:15 INFO mapred.JobClient: HDFS: Number of write operations=2
15/04/03 13:25:15 INFO mapred.JobClient: Map-Reduce Framework
15/04/03 13:25:15 INFO mapred.JobClient: Map input records=7
15/04/03 13:25:15 INFO mapred.JobClient: Map output records=7
15/04/03 13:25:16 INFO mapred.JobClient: Input split bytes=87
15/04/03 13:25:16 INFO mapred.JobClient: Spilled Records=0
15/04/03 13:25:16 INFO mapred.JobClient: CPU time spent (ms)=0
15/04/03 13:25:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/04/03 13:25:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/04/03 13:25:16 INFO mapred.JobClient: Total committed heap usage (bytes)=180756480
15/04/03 13:25:16 INFO mapreduce.ImportJobBase: Transferred 717 bytes in 182.8413 seconds (3.9214 bytes/sec)
15/04/03 13:25:16 INFO mapreduce.ImportJobBase: Retrieved 7 records.
15/04/03 13:25:16 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3d4817
15/04/03 13:25:16 DEBUG hive.HiveImport: Hive.inputTable: WLSLOG_COPY
15/04/03 13:25:16 DEBUG hive.HiveImport: Hive.outputTable: WLS_LOG
15/04/03 13:25:16 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 13:25:16 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 13:25:16 DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:25:18 INFO manager.OracleManager: Time zone has been set to GMT
15/04/03 13:25:18 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
15/04/03 13:25:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM WLSLOG_COPY t WHERE 1=0
15/04/03 13:25:21 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@localhost:1521:ORCL/OE
15/04/03 13:25:21 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE `WLS_LOG` ( `TIME_STAMP` STRING, `CATEGORY` STRING, `TYPE` STRING, `SERVERNAME` STRING, `CODE` STRING, `MSG` STRING) COMMENT 'Imported by sqoop on 2015/04/03 13:25:21' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
15/04/03 13:25:21 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://10.0.2.15:8020/user/root/WLSLOG_COPY' INTO TABLE `WLS_LOG`
15/04/03 13:25:21 INFO hive.HiveImport: Loading uploaded data into Hive
15/04/03 13:25:23 DEBUG hive.HiveImport: Using in-process Hive instance.
15/04/03 13:25:23 DEBUG util.SubprocessSecurityManager: Installing subprocess security manager
Logging initialized using configuration in jar:file:/sqoop/hive-0.13.1-cdh5.2.0/lib/hive-common-0.13.1-cdh5.2.0.jar!/hive-log4j.properties
OK
Time taken: 75.724 seconds
Loading data to table default.wls_log
Table default.wls_log stats: [numFiles=1, numRows=0, totalSize=717, rawDataSize=0]
OK
Time taken: 36.523 seconds

Start the Hive Thrift Server.

hive --service hiveserver

Start the Hive shell.

hive

Run the following SELECT statement in the Hive shell to list the imported data.

SELECT * FROM default.wls_log;

The 7 rows of data imported from Oracle Database get listed. In this tutorial we used Sqoop 1.4.5 with Oracle Database 11g.
