Streaming Oracle Database Logs to HBase with Flume

In the previous tutorial we discussed streaming Oracle logs to HDFS using Flume. Flume supports various types of sources and sinks, including the HBase database as a sink. In this tutorial we shall discuss streaming an Oracle log file to HBase. This tutorial has the following sections.

Setting the Environment
Starting HDFS
Starting HBase
Configuring Flume Agent for HBase
Running the Flume Agent
Scanning HBase Table
ChannelException

Setting the Environment

We use the same environment as in the streaming-to-HDFS tutorial: Oracle Database 11g installed on Oracle Linux 6.5 on VirtualBox 4.3. We need to download and install the following software.

Oracle Database 11g
HBase
Java 7
Flume 1.4
Hadoop 2.0.0

First, create a directory to install the software in and set its permissions.

mkdir /flume
chmod -R 777 /flume
cd /flume

Create the hadoop group and add the hbase user to the hadoop group.

groupadd hadoop
useradd -g hadoop hbase

Download and install Java 7.

tar zxvf jdk-7u55-linux-i586.tar.gz

Download and install CDH 4.6 Hadoop 2.0.0.

wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz
tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz

Create symlinks for the Hadoop bin and conf directories.

ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin-mapreduce1 /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/bin
ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/conf

Download and install CDH 4.6 Flume 1.4.0.

wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz
tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz

Download and install CDH 4.6 HBase 0.94.15.

wget http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.15-cdh4.6.0.tar.gz
tar -xvf hbase-0.94.15-cdh4.6.0.tar.gz

Set the permissions of the Flume root directory to global.

chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin

Set the environment variables for Oracle Database, Java, HBase, Flume, and Hadoop in the bash shell file.

vi ~/.bashrc
export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin
export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf
export HBASE_HOME=/flume/hbase-0.94.15-cdh4.6.0
export HBASE_CONF=/flume/hbase-0.94.15-cdh4.6.0/conf
export JAVA_HOME=/flume/jdk1.7.0_55
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HBASE_CONF:$HBASE_HOME/lib/*
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$ORACLE_HOME/bin:$FLUME_HOME/bin:$HBASE_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH
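As a quick sanity check, not part of the original steps, you can reload the shell file and confirm that each tool resolves; the commands below assume the installation paths used above.

source ~/.bashrc
# each of the following should print a version banner
$JAVA_HOME/bin/java -version
hadoop version
hbase version
flume-ng version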
Starting HDFS

In this section we shall configure and start HDFS. Change to the Hadoop configuration directory.

cd /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop

Set the NameNode URI (fs.defaultFS) and the Hadoop temporary directory (hadoop.tmp.dir) configuration properties in the core-site.xml file.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.2.15:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///var/lib/hadoop-0.20/cache</value>
  </property>
</configuration>

Remove any previously created temporary directory, create the directory again, and set its permissions to global.

rm -rf /var/lib/hadoop-0.20/cache
mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777 /var/lib/hadoop-0.20/cache

Set the NameNode storage directory (dfs.namenode.name.dir), superusergroup (dfs.permissions.superusergroup), replication factor (dfs.replication), the upper bound on the number of files the DataNode is able to serve concurrently (dfs.datanode.max.xcievers), and permission checking (dfs.permissions) configuration properties in the hdfs-site.xml file.

<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>

Remove any previously created NameNode storage directory, create a new directory, and set its permissions to global.

rm -rf /data/1/dfs/nn
mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn

Format and start the NameNode.

hadoop namenode -format
hadoop namenode

Start the DataNode.

hadoop datanode

We need to copy the Flume lib directory jars to HDFS to make them available at runtime. Create a directory in HDFS with the same directory structure as the Flume lib directory and set its permissions to global.

hadoop dfs -mkdir /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
hadoop dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Put the Flume lib directory jars into HDFS.

hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Create the Flume configuration file flume.conf from the template. Also create the Flume environment file flume-env.sh from its template.

cp $FLUME_HOME/conf/flume-conf.properties.template $FLUME_HOME/conf/flume.conf
cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh
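Before moving on it can help to confirm that HDFS is up and the jars were copied; the following optional checks, an addition to the tutorial, use standard HDFS commands.

# report live DataNodes and capacity
hdfs dfsadmin -report
# list a few of the copied Flume jars
hdfs dfs -ls /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib | head -5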
We shall set the configuration properties for Flume in a subsequent section, but first we shall install HBase.

Starting HBase

In this section we shall configure and start HBase. HBase configuration is discussed in detail in another tutorial (http://www.toadworld.com/platforms/oracle/w/wiki/10976.loading-hbase-table-data-into-an-oracle-database-with-oracle-loader-for-hadoop.aspx). Set the HBase configuration in the /flume/hbase-0.94.15-cdh4.6.0/conf/hbase-site.xml configuration file as follows.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://10.0.2.15:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2182</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
</configuration>

Create the Zookeeper data directory and set its permissions.

mkdir -p /zookeeper
chmod -R 700 /zookeeper

As the root user, create the HBase root directory in HDFS (/hbase) and set its permissions to global (777).

hdfs dfs -mkdir /hbase
hdfs dfs -chmod -R 777 /hbase

As the root user, increase the maximum number of file handles in the /etc/security/limits.conf file. Set the following ulimits for the hdfs and hbase users.

hdfs - nofile 32768
hbase - nofile 32768

Start the HBase nodes: Zookeeper, Master, and RegionServer.

hbase-daemon.sh start zookeeper
hbase-daemon.sh start master
hbase-daemon.sh start regionserver

The jps command should list the HDFS and HBase nodes as started. Start the HBase shell with the following command.

hbase shell

Create a table (flume) with a column family (orcllog) with the following command.

create 'flume', 'orcllog'

The HBase table gets created.
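Optionally, a quick round trip in the HBase shell, not part of the original steps, verifies that the table and its column family accept writes before Flume is involved; the row key testrow is arbitrary.

list
describe 'flume'
# write, read back, and remove a test cell in the orcllog column family
put 'flume', 'testrow', 'orcllog:coll', 'test value'
get 'flume', 'testrow'
deleteall 'flume', 'testrow'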
Configuring Flume Agent for HBase

In this section we shall set the Flume agent configuration in the flume.conf file. We shall configure the following properties in flume.conf for a Flume agent called hbase-agent; each property is shown with its description below it.

hbase-agent.channels=ch1
  The Flume agent channels. We shall be using a single channel called ch1 (the channel name is arbitrary).

hbase-agent.sources=tail
  The Flume agent sources. We shall be using one source of type exec called tail (the source name is arbitrary).

hbase-agent.sinks=sink1
  The Flume agent sinks. We shall be using one sink of type HBaseSink called sink1 (the sink name is arbitrary).

hbase-agent.channels.ch1.type=memory
  The channel type is memory.

hbase-agent.sources.tail.channels=ch1
  Defines the flow by binding the source to the channel.

hbase-agent.sources.tail.type=exec
  Specifies the source type as exec.

hbase-agent.sources.tail.command=tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
(or hbase-agent.sources.tail.command=cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log)
  Runs the specified Unix command and produces data on stdout. Commonly used commands are cat and tail, for copying a complete log file or the last KB of a log file to stdout. We shall demonstrate both.

hbase-agent.sinks.sink1.channel=ch1
  Defines the flow by binding the sink to the channel.

hbase-agent.sinks.sink1.type=org.apache.flume.sink.hbase.HBaseSink
  Specifies the sink type as HBaseSink (AsyncHBaseSink may be used instead).

hbase-agent.sinks.sink1.table=flume
  Specifies the HBase table name.

hbase-agent.sinks.sink1.columnFamily=orcllog
  Specifies the HBase table column family.

hbase-agent.sinks.sink1.column=c1
  Specifies the column within the column family.

hbase-agent.sinks.sink1.serializer=org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
  Specifies the HBase event serializer class. The serializer converts a Flume event into one or more puts and/or increments.

hbase-agent.sinks.sink1.serializer.payloadColumn=coll
  A parameter to the serializer. Specifies the payload column, the column into which the payload data is stored.

hbase-agent.sinks.sink1.serializer.keyType=timestamp
  A parameter to the serializer. Specifies the key type.

hbase-agent.sinks.sink1.serializer.incrementColumn=coll
  A parameter to the serializer. Specifies the column to be incremented. The SimpleHbaseEventSerializer may optionally be set to increment a column in HBase.

hbase-agent.sinks.sink1.serializer.rowPrefix=1
  A parameter to the serializer. Specifies the row key prefix to be used.

hbase-agent.sinks.sink1.serializer.suffix=timestamp
  A parameter to the serializer. One of the following values may be set: uuid, random, timestamp.

The complete flume.conf file is listed:

hbase-agent.sources=tail
hbase-agent.sinks=sink1
hbase-agent.channels=ch1
hbase-agent.sources.tail.type=exec
hbase-agent.sources.tail.command=tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
hbase-agent.sources.tail.channels=ch1
hbase-agent.sinks.sink1.type=org.apache.flume.sink.hbase.HBaseSink
hbase-agent.sinks.sink1.channel=ch1
hbase-agent.sinks.sink1.table=flume
hbase-agent.sinks.sink1.columnFamily=orcllog
hbase-agent.sinks.sink1.column=c1
hbase-agent.sinks.sink1.serializer=org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
hbase-agent.sinks.sink1.serializer.payloadColumn=coll
hbase-agent.sinks.sink1.serializer.keyType=timestamp
hbase-agent.sinks.sink1.serializer.incrementColumn=coll
hbase-agent.sinks.sink1.serializer.rowPrefix=1
hbase-agent.sinks.sink1.serializer.suffix=timestamp
hbase-agent.channels.ch1.type=memory

The alternative source exec command is as follows.

hbase-agent.sources.tail.command=cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log

Running the Flume Agent

In this section we shall run the Flume agent to stream the last KB of the alert_ORCL.log file to HBase using the tail command. We shall also stream the complete alert log file alert_ORCL.log using the cat command. Run the Flume agent using the flume-ng shell script, specifying the agent name with the -n option, the configuration directory with the --conf option, and the configuration file with the -f option. Specify the Flume logger option -Dflume.root.logger as INFO,console to log at the INFO level to the console. Run the following command to run the Flume agent hbase-agent.

flume-ng agent --conf $FLUME_HOME/conf/ -f $FLUME_HOME/conf/flume.conf -n hbase-agent -Dflume.root.logger=INFO,console

HBase libraries get included for HBase access. The source and sink get started.
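The command above runs the agent in the foreground. If the agent should keep running after the terminal session ends, one common alternative, an addition rather than part of the original tutorial, is to start it with nohup and capture the console output in a file; the log path /tmp/flume-hbase-agent.log is arbitrary.

# run the agent in the background, detached from the terminal
nohup flume-ng agent --conf $FLUME_HOME/conf/ -f $FLUME_HOME/conf/flume.conf -n hbase-agent -Dflume.root.logger=INFO,console > /tmp/flume-hbase-agent.log 2>&1 &
# follow the agent output
tail -f /tmp/flume-hbase-agent.log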
The Flume log output provides more detail of the Flume agent run:

05 Dec 2014 22:20:57,147 INFO [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61) - Configuration provider starting
05 Dec 2014 22:20:57,194 INFO [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133) - Reloading configuration file:/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf
05 Dec 2014 22:20:57,214 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:sink1
(org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140) - Post-validation flume configuration contains configuration for agents: [hbase-agent]
05 Dec 2014 22:20:57,502 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150) - Creating channels
05 Dec 2014 22:20:57,529 INFO [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40) - Creating instance of channel ch1 type memory
05 Dec 2014 22:20:57,543 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205) - Created channel ch1
05 Dec 2014 22:20:57,545 INFO [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39) - Creating instance of source tail, type exec
05 Dec 2014 22:20:57,570 INFO [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40) - Creating instance of sink: sink1, type: org.apache.flume.sink.hbase.HBaseSink
05 Dec 2014 22:20:58,218 INFO [conf-file-poller-0] (org.apache.flume.sink.hbase.HBaseSink.configure:218) - The write to WAL option is set to: true
05 Dec 2014 22:20:58,223 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.getConfiguration:119) - Channel ch1 connected to [tail, sink1]
05 Dec 2014 22:20:58,238 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:138) - Starting new configuration:{ sourceRunners:{tail=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:tail,state:IDLE} }} sinkRunners:{sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@a21d88 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
05 Dec 2014 22:20:58,240 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:145) - Starting Channel ch1
05 Dec 2014 22:20:58,372 INFO [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:119) - Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
05 Dec 2014 22:20:58,373 INFO [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:95) - Component type: CHANNEL, name: ch1 started
05 Dec 2014 22:20:58,373 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:173) - Starting Sink sink1
05 Dec 2014 22:20:58,375 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184) - Starting Source tail
05 Dec 2014 22:20:58,376 INFO [lifecycleSupervisor-1-3] (org.apache.flume.source.ExecSource.start:163) - Exec source starting with command:tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace
05 Dec 2014 22:20:58,396 INFO [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:119) - Monitored counter group for type: SOURCE, name: tail: Successfully registered new MBean.
05 Dec 2014 22:20:58,397 INFO [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:95) - Component type: SOURCE, name: tail started

Scanning HBase Table

In this section we shall scan the HBase table after each run of the Flume agent: after running the tail -F command, and again after running the cat command. Run the following command in the HBase shell to scan the HBase table flume.

scan 'flume'

The Oracle log file data streamed into HBase gets listed. Run the scan 'flume' command again after running the Flume agent with the cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log command. More rows get listed, as the complete Oracle log file is streamed.
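For a complete alert log the scan output can run to many screens. The HBase shell's LIMIT option and count command, standard shell features added here as an optional check, keep the output manageable.

# show only the first 10 rows
scan 'flume', {LIMIT => 10}
# report the total number of rows streamed
count 'flume'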
ChannelException

If the channel capacity gets exceeded while the Flume agent is streaming events, an exception such as the following may be generated.

java.lang.InterruptedException
org.apache.flume.ChannelException: Unable to put batch on required channel:
Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight

A subsequent scan of the HBase table would list fewer rows than if the complete log file had been streamed without an exception. To avoid the exception, increase the default channel capacity with the following configuration property in flume.conf.

hbase-agent.channels.ch1.capacity = 100000

In this tutorial we streamed Oracle Database logs to HBase using Flume.
