
Wiki Page: Streaming MySQL Table Data to Oracle NoSQL Database with Flume

Consider the requirement that data added to MySQL Database from the MySQL Command Line Interface (CLI) is to be made available to another application or user in Oracle NoSQL Database, or that MySQL Database table data is to be backed up in Oracle NoSQL Database. Integrating Flume with MySQL as a source and Oracle NoSQL Database as a sink copies a MySQL table to Oracle NoSQL Database. In this tutorial we stream MySQL table data to Oracle NoSQL Database using Flume. The tutorial has the following sections.

Installing MySQL Database
Installing Oracle NoSQL Database
Setting the Environment
Creating a Database Table in MySQL
Configuring Flume
Running a Flume Agent
Streaming Data, not just Bulk Transferring Data

Installing MySQL Database

First, install MySQL Database, which is to be used as the Flume source. Create a directory to install MySQL and set its permissions to global (777).

mkdir /mysql
chmod -R 777 /mysql
cd /mysql

Download and extract the MySQL Database tar.gz file.

tar zxvf mysql-5.6.19-linux-glibc2.5-i686.tar.gz

Create the mysql group and add the mysql user to the group, if not already added.

groupadd mysql
useradd -r -g mysql mysql

Create a symlink for the MySQL Database installation directory.

ln -s /mysql/mysql-5.6.19-linux-glibc2.5-i686 mysql
cd mysql

Set the current directory owner and group to mysql and install the MySQL Database.

chown -R mysql .
chgrp -R mysql .
scripts/mysql_install_db --user=mysql

Change the current directory owner to root and the data directory owner to mysql.

chown -R root .
chown -R mysql data

Start the MySQL Database.

mysqld_safe --user=mysql &

By default the root user does not require a password. Set the root user's password to mysql with the following command.

mysqladmin -u root password mysql

Installing Oracle NoSQL Database

Download and extract the Oracle NoSQL Database tar.gz file.
wget http://download.oracle.com/otn-pub/otn_software/nosql-database/kv-ce-3.2.5.tar.gz
tar -xvf kv-ce-3.2.5.tar.gz

Create a lightweight Oracle NoSQL Database store called kvstore with the following command.

java -jar /flume/kv-3.2.5/lib/kvstore.jar kvlite

The kvstore gets created with localhost.oraclelinux as the host and 5000 as the port.

Setting the Environment

We need to install the following software to run Flume.

-Flume 1.4
-Hadoop 2.0.0
-flume-ng-sql-source plugin
-Java 7

Create a directory /flume to install Flume and set its permissions to global (777).

mkdir /flume
chmod -R 777 /flume
cd /flume

Download and extract the Java gz file.

tar zxvf jdk-7u55-linux-i586.gz

Download and extract the CDH 4.6 Hadoop 2.0.0 tar.gz file.

wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz
tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz

Create symlinks for the Hadoop conf and bin directories.

ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/bin
ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/conf

Download and extract the CDH 4.6 Flume 1.4 tar.gz file.

wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz
tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz

Download the source code for flume-ng-sql-source from https://github.com/keedio/flume-ng-sql-source. Compile and package the plugin into a jar file with the following command.

mvn package

The flume-ng-sql-source-0.8.jar jar gets generated in the target directory. Copy it to the Flume lib directory.

cp flume-ng-sql-source-0.8.jar /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Also copy the MySQL JDBC jar file to the Flume lib directory.

cp mysql-connector-java-5.1.31-bin.jar /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Set the environment variables for Hadoop, Flume, MySQL Database, and Java.
vi ~/.bashrc

export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin
export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf
export JAVA_HOME=/flume/jdk1.7.0_55
export MYSQL_HOME=/mysql/mysql-5.6.19-linux-glibc2.5-i686
export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$FLUME_HOME/lib/*
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$MYSQL_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH
export HADOOP_NAMENODE_USER=flume
export HADOOP_DATANODE_USER=flume

Create a directory sql-source/lib in the $FLUME_HOME/plugins.d directory and copy the flume-ng-sql-source-0.8.jar file to it.

mkdir -p $FLUME_HOME/plugins.d/sql-source/lib
cp /media/sf_VMShared/flume/mysql/flume-ng-sql-source-0.8.jar $FLUME_HOME/plugins.d/sql-source/lib

Set the configuration properties for Hadoop in the /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop/core-site.xml file.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.2.15:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///var/lib/hadoop-0.20/cache</value>
  </property>
</configuration>

Create the directory specified as the Hadoop tmp directory.

mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777 /var/lib/hadoop-0.20/cache

Set the HDFS configuration properties in the hdfs-site.xml file.

<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Create the directory specified as the NameNode storage directory.

mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn

Format the NameNode, then start the NameNode and the DataNode.

hdfs namenode -format
hdfs namenode
hdfs datanode

To copy the Flume jars into HDFS, create the /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib directory in HDFS and set its permissions to global.

hdfs dfs -mkdir -p /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
hdfs dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Put the Flume lib jars into HDFS.
hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Creating a Database Table in MySQL

In this section create the MySQL Database table from which data is to be streamed to Oracle NoSQL Database. Log in to the MySQL CLI and select the test database.

mysql -u root -p
use test

Create a table called wlslog.

CREATE TABLE wlslog (id INTEGER PRIMARY KEY, time_stamp VARCHAR(4000), category VARCHAR(4000), type VARCHAR(4000), servername VARCHAR(4000), code VARCHAR(4000), msg VARCHAR(4000));

Add 9 rows of data to the wlslog table.

INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(1,'Apr-8-2014-7:06:16-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STANDBY');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(2,'Apr-8-2014-7:06:17-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STARTING');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(3,'Apr-8-2014-7:06:18-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to ADMIN');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(4,'Apr-8-2014-7:06:19-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to RESUMING');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(5,'Apr-8-2014-7:06:20-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000361','Started WebLogic AdminServer');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(6,'Apr-8-2014-7:06:21-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to RUNNING');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(7,'Apr-8-2014-7:06:22-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(8,'Apr-8-2014-7:06:23-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode');
INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(9,'Apr-8-2014-7:06:24-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode');

Configuring Flume

Next, configure Flume in the flume.conf file, which should be in the $FLUME_HOME/conf directory. The configuration properties are described below.

agent.channels.ch1.type: Sets the channel type. Value: memory
agent.sources.sql-source.channels: Sets the channel on the source. Value: ch1
agent.channels: Sets the channel name. Value: ch1
agent.sinks: Sets the sink name. Value: noSqlDbSink
agent.sinks.noSqlDbSink.channel: Sets the channel on the sink. Value: ch1
agent.sources: Sets the source name. Value: sql-source
agent.sources.sql-source.type: Sets the source type. Value: org.apache.flume.source.SQLSource
agent.sources.sql-source.connection.url: Sets the connection URL for the source. Value: jdbc:mysql://localhost:3306/test
agent.sources.sql-source.user: Sets the MySQL database user. Value: root
agent.sources.sql-source.password: Sets the MySQL database password. Value: mysql
agent.sources.sql-source.table: Sets the MySQL table name. Value: wlslog
agent.sources.sql-source.database: Sets the MySQL database name. Value: test
agent.sources.sql-source.columns.to.select: Sets the columns to select, here all columns. Value: *
agent.sources.sql-source.incremental.column.name: Sets the column whose value is incremented in selecting rows to transfer. Value: id
agent.sources.sql-source.incremental.value: Sets the initial incremental column value; a value of 0 transfers all rows. Value: 0
agent.sources.sql-source.run.query.delay: Sets the query delay in ms. Value: 10000
agent.sources.sql-source.status.file.path: Sets the directory for the status file. Value: /var/lib/flume
agent.sources.sql-source.status.file.name: Sets the status file name. Value: sql-source.status
agent.sinks.noSqlDbSink.type: Sets the sink type class. Value: com.gvenzl.flumekvstore.sink.NoSQLDBSink
agent.sinks.noSqlDbSink.kvHost: Sets the sink host. Value: localhost
agent.sinks.noSqlDbSink.kvPort: Sets the sink port. Value: 5000
agent.sinks.noSqlDbSink.kvStoreName: Sets the KV store name. Value: kvstore
agent.sinks.noSqlDbSink.durability: Sets the durability level. Value: WRITE_NO_SYNC
agent.sinks.noSqlDbSink.keyPolicy: Sets the key policy. Value: generate
agent.sinks.noSqlDbSink.keyType: Sets the key type. Value: random
agent.sinks.noSqlDbSink.keyPrefix: Sets the key prefix. Value: k_
agent.sinks.noSqlDbSink.batchSize: Sets the batch size. Value: 10
agent.channels.ch1.capacity: Sets the channel capacity. Value: 100000

The flume.conf file is listed:

agent.channels.ch1.type = memory
agent.sources.sql-source.channels = ch1
agent.channels = ch1
agent.sinks = noSqlDbSink
agent.sinks.noSqlDbSink.channel = ch1
agent.sources = sql-source
agent.sources.sql-source.type = org.apache.flume.source.SQLSource
# URL to connect to database (currently only mysql is supported)
agent.sources.sql-source.connection.url = jdbc:mysql://localhost:3306/test
# Database connection properties
agent.sources.sql-source.user = root
agent.sources.sql-source.password = mysql
agent.sources.sql-source.table = wlslog
agent.sources.sql-source.database = test
agent.sources.sql-source.columns.to.select = *
# Increment column properties
agent.sources.sql-source.incremental.column.name = id
# Incremental value from which to start taking data (0 will import the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay: the query will be sent every configured millisecond interval
agent.sources.sql-source.run.query.delay=10000
# Status file is used to save the last read row
agent.sources.sql-source.status.file.path = /var/lib/flume
agent.sources.sql-source.status.file.name = sql-source.status
agent.sinks.noSqlDbSink.type = com.gvenzl.flumekvstore.sink.NoSQLDBSink
agent.sinks.noSqlDbSink.kvHost = localhost
agent.sinks.noSqlDbSink.kvPort = 5000
agent.sinks.noSqlDbSink.kvStoreName = kvstore
agent.sinks.noSqlDbSink.durability = WRITE_NO_SYNC
agent.sinks.noSqlDbSink.keyPolicy = generate
agent.sinks.noSqlDbSink.keyType = random
agent.sinks.noSqlDbSink.keyPrefix = k_
agent.sinks.noSqlDbSink.batchSize = 10
agent.channels.ch1.capacity = 100000

Create the directory and file for the SQL source status.

mkdir -p /var/lib/flume
chmod -R 777 /var/lib/flume
cd /var/lib/flume
vi sql-source.status
:wq

We also need to create the Flume env file from the template.

cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh

Running a Flume Agent

Before running the Flume agent the following should have been configured/started.

-Flume configuration file flume.conf
-HDFS
-Oracle NoSQL Database
-MySQL Database

Run the Flume agent with the following command.

flume-ng agent --conf ./conf/ -f $FLUME_HOME/conf/flume.conf -n agent -Dflume.root.logger=INFO,console

The Flume agent gets started. The source, channel, and sink get started, and a connection with Oracle NoSQL Database gets established to stream the MySQL table wlslog with a SQL query that selects all rows. Subsequently the Flume agent continues to run with a SQL query that has id>9 in the WHERE clause, as rows up to id 9 have already been transferred.
A more detailed output from the Flume agent is listed:

-Djava.library.path=:/usr/java/packages/lib/i386:/lib:/usr/lib org.apache.flume.node.Application -f /flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf -n agent
15/01/19 19:42:15 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
15/01/19 19:42:15 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Added sinks: noSqlDbSink Agent: agent
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Processing:noSqlDbSink
15/01/19 19:42:15 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]
15/01/19 19:42:15 INFO node.AbstractConfigurationProvider: Creating channels
15/01/19 19:42:15 INFO channel.DefaultChannelFactory: Creating instance of channel ch1 type memory
15/01/19 19:42:15 INFO node.AbstractConfigurationProvider: Created channel ch1
15/01/19 19:42:15 INFO source.DefaultSourceFactory: Creating instance of source sql-source, type org.apache.flume.source.SQLSource
15/01/19 19:42:15 INFO source.SQLSource: Reading and processing configuration values for source sql-source
15/01/19 19:42:15 INFO source.SQLSource: Establishing connection to database test for source sql-source
15/01/19 19:42:16 INFO source.SQLSource: Source sql-source Connected to test
15/01/19 19:42:16 INFO sink.DefaultSinkFactory: Creating instance of sink: noSqlDbSink, type: com.gvenzl.flumekvstore.sink.NoSQLDBSink
15/01/19 19:42:16 INFO sink.NoSQLDBSink: Configuration settings:
15/01/19 19:42:16 INFO sink.NoSQLDBSink: kvHost: localhost
15/01/19 19:42:16 INFO sink.NoSQLDBSink: kvPort: 5000
15/01/19 19:42:16 INFO sink.NoSQLDBSink: kvStoreName: kvstore
15/01/19 19:42:16 INFO sink.NoSQLDBSink: durability: WRITE_NO_SYNC
15/01/19 19:42:16 INFO sink.NoSQLDBSink: keyPolicy: generate
15/01/19 19:42:16 INFO sink.NoSQLDBSink: keyType: random
15/01/19 19:42:16 INFO sink.NoSQLDBSink: keyPrefix: k_
15/01/19 19:42:16 INFO sink.NoSQLDBSink: batchSize: 10
15/01/19 19:42:16 INFO node.AbstractConfigurationProvider: Channel ch1 connected to [sql-source, noSqlDbSink]
15/01/19 19:42:16 INFO node.Application: Starting new configuration:{ sourceRunners:{sql-source=PollableSourceRunner: { source:org.apache.flume.source.SQLSource{name:sql-source,state:IDLE} counterGroup:{ name:null counters:{} } }} sinkRunners:{noSqlDbSink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4473c counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
15/01/19 19:42:16 INFO node.Application: Starting Channel ch1
15/01/19 19:42:17 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
15/01/19 19:42:17 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: ch1 started
15/01/19 19:42:17 INFO node.Application: Starting Sink noSqlDbSink
15/01/19 19:42:17 INFO node.Application: Starting Source sql-source
15/01/19 19:42:17 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed
15/01/19 19:42:17 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id;
15/01/19 19:42:17 INFO sink.NoSQLDBSink: Connection to KV store established
15/01/19 19:42:27 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed
15/01/19 19:42:27 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id;
15/01/19 19:42:37 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed
15/01/19 19:42:37 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id;
15/01/19 19:42:47 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed
15/01/19 19:42:47 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id;
15/01/19 19:42:57 INFO source.SQLSourceUtils: /var/lib/flume/sql-source.status correctly formed
15/01/19 19:42:57 INFO source.SQLSource: Query: SELECT * FROM wlslog WHERE id>9 ORDER BY id;

To verify that the MySQL table data has been transferred to Oracle NoSQL Database, start the Oracle NoSQL Database CLI and connect to the kvstore with the following commands.

java -Xmx256m -Xms256m -jar /flume/kv-3.2.5/lib/kvstore.jar runadmin -port 5000 -host localhost
connect store -host localhost -port 5000 -name kvstore

Select all key/value pairs in the kvstore with the following command.

get kv -all

The 9 rows transferred from the MySQL table get listed.

Streaming Data, not just Bulk Transferring Data

Bulk data transfer tools such as Sqoop transfer the available data and then terminate.
Flume, in contrast, streams data: after the available data has been transferred the Flume agent continues to run, and when more data becomes available it is transferred as well. Even if the MySQL table wlslog had been created after the Flume agent was started, its data would still get streamed to Oracle NoSQL Database. For example, add another row of data to the MySQL table wlslog with the following SQL statement.

INSERT INTO wlslog(id,time_stamp,category,type,servername,code,msg) VALUES(10,'Apr-8-2014-7:06:25-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000360','Server started in RUNNING mode');

A new row gets added to the MySQL table wlslog. As indicated by the Flume output, the Flume agent streams the new row to Oracle NoSQL Database. Run the get kv -all query in the Oracle NoSQL Database CLI again. 10 rows of data get listed instead of the 9 rows listed previously. The Flume agent updates the WHERE clause of its SQL query from id>9 to id>10 and continues to run.

In this tutorial we streamed MySQL table data to Oracle NoSQL Database using Flume.
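The streaming behavior described above can be summarized with a toy Python simulation (an illustration only, not Flume or plugin code): each polling cycle emits only rows whose id exceeds the last id seen, so a row inserted while the agent keeps running is picked up by a later poll.

```python
# Toy simulation (not Flume code) of streaming vs. bulk transfer:
# each poll emits only rows whose id exceeds the last one seen.

def poll(rows, last_id):
    """Return (new_rows, new_last_id) for one polling cycle."""
    new_rows = [r for r in rows if r["id"] > last_id]
    return new_rows, (new_rows[-1]["id"] if new_rows else last_id)

# Nine rows are present when the agent starts; the initial poll
# transfers all of them.
table = [{"id": i, "msg": "..."} for i in range(1, 10)]
batch1, last = poll(table, 0)

# A tenth row is inserted while the agent keeps running; the next
# poll picks up only the new row.
table.append({"id": 10, "msg": "Server started in RUNNING mode"})
batch2, last = poll(table, last)
```

A bulk tool would stop after the first batch; the simulated agent keeps polling and transfers the tenth row in its second batch, mirroring the id>9 to id>10 progression seen in the Flume log.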
