
Wiki Page: Loading MySQL Table Data into Oracle Database

Written by Deepak Vohra

MySQL and Oracle Database are two of the most commonly used relational databases. To load data from MySQL database into Oracle Database, some of the options available are:

-Use Sqoop
-Use SQL Developer
-Use Oracle Loader for Hadoop

We discussed the Sqoop option in an earlier tutorial. In this tutorial we discuss the Oracle Loader for Hadoop option. Oracle Loader for Hadoop (OLH) is a tool for loading data from different data sources into Oracle Database. OLH supports the Avro, delimited text file, Hive, and Oracle NoSQL Database input formats, but it does not support a JDBC input format. Although Oracle Loader for Hadoop does not support JDBC input directly, in this tutorial we shall create a Hive external table over a MySQL database and use Oracle Loader for Hadoop to load the Hive table data into Oracle Database. This tutorial has the following sections:

-Setting the Environment
-Creating Database Tables
-Starting HDFS
-Creating a Hive External Table
-Configuring Oracle Loader for Hadoop
-Running the Oracle Loader for Hadoop
-Querying Oracle Database Table

Setting the Environment

We require the following software, installed on Oracle Linux 6.6:

-MySQL 5.x Database
-Hive JDBC Storage Handler
-Oracle Loader for Hadoop 3.0.0
-CDH 4.6 Hadoop 2.0.0
-CDH 4.6 Hive 0.10.0
-Java 7

Create a directory called /hive and set its permissions to global (777).

mkdir /hive
chmod -R 777 /hive
cd /hive

Download, install, and configure MySQL Database, the Hive JDBC Storage Handler, CDH 4.6 Hadoop 2.0.0, CDH 4.6 Hive 0.10.0, and Java 7 as discussed in an earlier tutorial on creating a Hive external table over MySQL Database.

Download Oracle Loader for Hadoop Release 3.0.0 (oraloader-3.0.0.x86_64.zip) from http://www.oracle.com/technetwork/database/database-technologies/bdc/big-data-connectors/downloads/index.html and unzip the file to a directory. Two files get extracted: oraloader-3.0.0-h1.x86_64.zip and oraloader-3.0.0-h2.x86_64.zip. The oraloader-3.0.0-h2.x86_64.zip file is for CDH4 and CDH5.
As we are using CDH 4.6, extract oraloader-3.0.0-h2.x86_64.zip.

unzip oraloader-3.0.0-h2.x86_64.zip

Copy the Hive JDBC Storage Handler jar to the Oracle Loader for Hadoop jlib directory.

cp /hive/hive-0.10.0-cdh4.6.0/hive-jdbc-storage-handler/hive-jdbc-storage-handler-1.1.1-cdh4.3.0-SNAPSHOT-dist.jar /hive/oraloader-3.0.0-h2/jlib

Set the environment variables for MySQL Database, Oracle Database, Oracle Loader for Hadoop, Hadoop, Hive and Java in the bash shell.

vi ~/.bashrc

export HADOOP_PREFIX=/hive/hadoop-2.0.0-cdh4.6.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export OLH_HOME=/hive/oraloader-3.0.0-h2
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export HIVE_HOME=/hive/hive-0.10.0-cdh4.6.0
export HIVE_CONF=$HIVE_HOME/conf
export JAVA_HOME=/hive/jdk1.7.0_55
export MYSQL_HOME=/mysql/mysql-5.6.19-linux-glibc2.5-i686
export HADOOP_MAPRED_HOME=/hive/hadoop-2.0.0-cdh4.6.0/bin
export HADOOP_HOME=/hive/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HIVE_HOME/lib/*:$HIVE_CONF:$HADOOP_CONF:$OLH_HOME/jlib/*
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME:$HIVE_HOME/bin:$MYSQL_HOME/bin:$ORACLE_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH

Creating Database Tables

Create a MySQL database table called wlslog and insert the table data, as discussed in an earlier tutorial on creating a Hive external table over MySQL Database. A SELECT query lists the wlslog table data.

Create an Oracle Database table called OE.WLSLOG to load the data into. The OE.WLSLOG table may have to be dropped first if it was already created for another application.

CREATE TABLE OE.wlslog (time_stamp VARCHAR2(255), category VARCHAR2(255), type VARCHAR2(255), servername VARCHAR2(255), code VARCHAR2(255), msg VARCHAR2(255));

Starting HDFS

Format the NameNode and start the NameNode. Also start the DataNode.
hadoop namenode -format
hadoop namenode
hadoop datanode

Create the Hive warehouse directory in HDFS, in which the Hive tables are to be stored.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/user/warehouse
hadoop dfs -chmod -R g+w hdfs://10.0.2.15:8020/user/warehouse

Create a directory structure in HDFS for the Hive lib jars, set permissions on the directory, and put the Hive lib jars into HDFS.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/hive/hive-0.10.0-cdh4.6.0/lib
hadoop dfs -chmod -R g+w hdfs://10.0.2.15:8020/hive/hive-0.10.0-cdh4.6.0/lib
hadoop dfs -put /hive/hive-0.10.0-cdh4.6.0/lib/* hdfs://10.0.2.15:8020/hive/hive-0.10.0-cdh4.6.0/lib

Also create a directory structure for Oracle Loader for Hadoop in HDFS and put the OLH jars into HDFS.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/hive/oraloader-3.0.0-h2/jlib
hadoop dfs -chmod -R g+w hdfs://10.0.2.15:8020/hive/oraloader-3.0.0-h2/jlib
hadoop dfs -put /hive/oraloader-3.0.0-h2/jlib/* hdfs://10.0.2.15:8020/hive/oraloader-3.0.0-h2/jlib

Creating a Hive External Table

Create a Hive external table using the CREATE EXTERNAL TABLE command. First, start the Hive Thrift server.

hive --service hiveserver

Start the Hive shell.

hive

Add the Hive JDBC Storage Handler jar file to the Hive shell classpath with the ADD JAR command.

hive>ADD JAR /hive/hive-0.10.0-cdh4.6.0/hive-jdbc-storage-handler/hive-jdbc-storage-handler-1.1.1-cdh4.3.0-SNAPSHOT-dist.jar;

Create the Hive external table called wlslog in the default database with the CREATE EXTERNAL TABLE command in the Hive shell.
hive>CREATE EXTERNAL TABLE wlslog(time_stamp STRING, category STRING, type STRING, servername STRING, code STRING, msg STRING)
STORED BY 'com.qubitproducts.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "qubit.sql.database.type" = "MySQL",
  "qubit.sql.jdbc.url" = "jdbc:mysql://localhost:3306/test?user=root&password=",
  "qubit.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
  "qubit.sql.query" = "SELECT time_stamp,category,type,servername,code,msg FROM wlslog",
  "qubit.sql.column.mapping" = "time_stamp=time_stamp,category=category,type=type,servername=servername,code=code,msg=msg"
);

Query the Hive external table using a SELECT query in the Hive shell to list the Hive table data.

Configuring Oracle Loader for Hadoop

Oracle Loader for Hadoop gets its parameters from a configuration file. Create a configuration file called OraLoadConf.xml. The input format is specified with the mapreduce.inputformat.class property, which for Hive input is set to oracle.hadoop.loader.lib.input.HiveToAvroInputFormat. If Hive is the input format, the following properties are also required to be specified:

Property                                       Description              Value
oracle.hadoop.loader.input.hive.databaseName   The Hive database name.  default
oracle.hadoop.loader.input.hive.tableName      The Hive table name.     wlslog

The output format class, set with mapreduce.job.outputformat.class, is oracle.hadoop.loader.lib.output.JDBCOutputFormat, as data is to be loaded into an Oracle Database table. The target database table is specified with the oracle.hadoop.loader.loaderMap.targetTable property, set to OE.WLSLOG. The following properties specify the connection parameters:

Property                                   Description                                             Oracle Database setting
oracle.hadoop.loader.connection.url        The connection URL used to connect to Oracle Database.  jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}
TCPPORT                                    The port number to connect to.                          1521
HOST                                       The host name.                                          localhost
SID                                        The Oracle Database service name.                       ORCL
oracle.hadoop.loader.connection.user       The user name or schema name.                           OE
oracle.hadoop.loader.connection.password   Password.                                               OE

The OraLoadConf.xml configuration file is listed below.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.input.hive.databaseName</name>
    <value>default</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.input.hive.tableName</name>
    <value>wlslog</value>
  </property>
  <property>
    <name>mapreduce.job.outputformat.class</name>
    <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.outputdir</name>
    <value>oraloadout</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.loaderMap.targetTable</name>
    <value>OE.WLSLOG</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}</value>
  </property>
  <property>
    <name>TCPPORT</name>
    <value>1521</value>
  </property>
  <property>
    <name>HOST</name>
    <value>localhost</value>
  </property>
  <property>
    <name>SID</name>
    <value>ORCL</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>OE</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>OE</value>
  </property>
</configuration>

Running the Oracle Loader for Hadoop

Run Oracle Loader for Hadoop with the following command, in which the configuration file is specified with the -conf option.

hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadConf.xml -libjars $OLH_HOME/jlib/oraloader.jar

A MapReduce job runs to load the 7 rows of data from the MySQL database into Oracle Database. A more detailed output from OLH is as follows.

[root@localhost hive]# hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadConf.xml -libjars $OLH_HOME/jlib/oraloader.jar
Oracle Loader for Hadoop Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
15/09/02 19:46:03 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
15/09/02 19:48:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1709085000_0001
15/09/02 19:48:56 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/09/02 19:48:56 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/09/02 19:48:57 INFO mapred.LocalJobRunner: OutputCommitter is oracle.hadoop.loader.lib.output.DBOutputCommitter
15/09/02 19:48:57 INFO mapred.LocalJobRunner: Waiting for map tasks
15/09/02 19:48:58 INFO mapred.LocalJobRunner: Starting task: attempt_local1709085000_0001_m_000000_0
15/09/02 19:48:58 INFO loader.OraLoader: map 0% reduce 0%
15/09/02 19:49:00 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/09/02 19:49:00 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/user/hive/warehouse/wlslog:0+0
15/09/02 19:49:05 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100
15/09/02 19:49:05 INFO output.DBOutputFormat: conf prop: loadByPartition: false
15/09/02 19:49:17 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSLOG" ("TIME_STAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?)
15/09/02 19:49:18 INFO mapred.LocalJobRunner:
15/09/02 19:49:20 INFO mapred.LocalJobRunner: map
15/09/02 19:49:21 INFO loader.OraLoader: map 100% reduce 0%
15/09/02 19:49:23 INFO mapred.LocalJobRunner: map
15/09/02 19:49:56 INFO mapred.Task: Task:attempt_local1709085000_0001_m_000000_0 is done. And is in the process of committing
15/09/02 19:49:56 INFO mapred.LocalJobRunner: map
15/09/02 19:49:56 INFO mapred.Task: Task attempt_local1709085000_0001_m_000000_0 is allowed to commit now
15/09/02 19:49:57 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local1709085000_0001_m_000000_0
15/09/02 19:49:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1709085000_0001_m_000000_0' to hdfs://10.0.2.15:8020/user/root/oraloadout/_temporary/0/task_local1709085000_0001_m_000000
15/09/02 19:49:57 INFO mapred.LocalJobRunner: map
15/09/02 19:49:57 INFO mapred.Task: Task 'attempt_local1709085000_0001_m_000000_0' done.
15/09/02 19:49:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local1709085000_0001_m_000000_0
15/09/02 19:49:57 INFO mapred.LocalJobRunner: Map task executor complete.
15/09/02 19:49:59 INFO loader.OraLoader: Job complete: OraLoader (job_local1709085000_0001)
15/09/02 19:49:59 INFO loader.OraLoader: Counters: 23
File System Counters
  FILE: Number of bytes read=10414783
  FILE: Number of bytes written=11377856
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=10422607
  HDFS: Number of bytes written=9769395
  HDFS: Number of read operations=234
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=36
Map-Reduce Framework
  Map input records=7
  Map output records=7
  Input split bytes=3270
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=2218
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=23531520
File Input Format Counters
  Bytes Read=0
File Output Format Counters
  Bytes Written=1622
[root@localhost hive]#

Querying Oracle Database Table

Subsequently, run a SELECT query in the Oracle Database SQL*Plus shell.

SELECT * FROM OE.WLSLOG;

The SELECT command is shown in the SQL*Plus shell.
The 7 rows of data loaded into Oracle Database get listed. The complete output from the SELECT query is listed below.

SQL> select * from OE.WLSLOG;

TIME_STAMP
--------------------------------------------------------------------------------
CATEGORY
--------------------------------------------------------------------------------
TYPE
--------------------------------------------------------------------------------
SERVERNAME
--------------------------------------------------------------------------------
CODE
--------------------------------------------------------------------------------
MSG
--------------------------------------------------------------------------------
Apr-8-2014-7:06:16-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000365
Server state changed to STANDBY

Apr-8-2014-7:06:17-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000365
Server state changed to STARTING

Apr-8-2014-7:06:18-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000365
Server state changed to ADMIN

Apr-8-2014-7:06:19-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000365
Server state changed to RESUMING

Apr-8-2014-7:06:20-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000361
Started WebLogic AdminServer

Apr-8-2014-7:06:21-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000365
Server state changed to RUNNING

Apr-8-2014-7:06:22-PM-PDT
Notice
WebLogicServer
AdminServer
BEA-000360
Server started in RUNNING mode

7 rows selected.

SQL>

In this tutorial we loaded MySQL database table data into Oracle Database with Oracle Loader for Hadoop.
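As a closing note, the OraLoadConf.xml file is a flat list of Hadoop-style name/value properties, so for repeated loads it can be generated programmatically rather than edited by hand. Below is a minimal sketch in Python: the property names and values are the ones used in this tutorial, but the build_olh_conf helper is only illustrative, not part of Oracle Loader for Hadoop.

```python
# Generate the OraLoadConf.xml used in this tutorial programmatically.
# build_olh_conf is a hypothetical helper, not part of OLH itself.
import xml.etree.ElementTree as ET

def build_olh_conf(props):
    """Return a Hadoop-style <configuration> element for a dict of properties."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return conf

# Property names and values as listed in OraLoadConf.xml above.
props = {
    "mapreduce.inputformat.class":
        "oracle.hadoop.loader.lib.input.HiveToAvroInputFormat",
    "oracle.hadoop.loader.input.hive.databaseName": "default",
    "oracle.hadoop.loader.input.hive.tableName": "wlslog",
    "mapreduce.job.outputformat.class":
        "oracle.hadoop.loader.lib.output.JDBCOutputFormat",
    "mapreduce.output.fileoutputformat.outputdir": "oraloadout",
    "oracle.hadoop.loader.loaderMap.targetTable": "OE.WLSLOG",
    "oracle.hadoop.loader.connection.url":
        "jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}",
    "TCPPORT": "1521",
    "HOST": "localhost",
    "SID": "ORCL",
    "oracle.hadoop.loader.connection.user": "OE",
    "oracle.hadoop.loader.connection.password": "OE",
}

xml_text = ET.tostring(build_olh_conf(props), encoding="unicode")
with open("OraLoadConf.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write(xml_text)
```

The generated file can then be passed to the loader with -conf OraLoadConf.xml exactly as in the run command above; changing the target table or connection settings becomes a one-line edit to the dict.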
