Apache Flume is a service for collecting, aggregating, and moving large quantities of streaming log data. A unit of data flow in Flume is called an event. Flume has three main components: source, channel, and sink, which are hosted together by a Flume agent, a JVM process. Data flow originates in the source, which may receive the data from an external client, is buffered in the channel, and is consumed by the sink. Supported source types include the Avro source, Thrift source, Exec source, JMS source, spooling directory source, sequence generator source, syslog source, HTTP source, Scribe source, and custom sources. Supported channel types include the memory channel, JDBC channel, file channel, and custom channels. Supported sink types include the HDFS sink, logger sink, Avro sink, Thrift sink, HBase sink, ElasticSearch sink, and custom sinks. In this tutorial we stream the Oracle Database alert log to HDFS using Flume.

Setting the Environment
Finding the Log Directory
Configuring Flume
Running the Flume Agent
Streaming a Complete Log File
Exception when Processing Event Batch

Setting the Environment

Oracle Linux 6.5 installed on Oracle VirtualBox 4.3 is used. Download and install the following software:

Oracle Database 11g
Java 7
Flume 1.4
Hadoop 2.0.0

Create a directory /flume in which to install the software and set the directory's permissions:

mkdir /flume
chmod -R 777 /flume
cd /flume

Download the Java 7 tar.gz file and extract it to the /flume directory:

tar zxvf jdk-7u55-linux-i586.tar.gz

Download and extract the Hadoop 2.0.0 hadoop-2.0.0-cdh4.6.0.tar.gz file:

wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz
tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz

Create symlinks for the bin and conf directories:

ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin
ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/conf

Download and extract Flume 1.4.0:

wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz
tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz

Set the environment variables for Hadoop, Java, Flume, and Oracle in the bash shell file:

vi ~/.bashrc

export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin
export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf
export JAVA_HOME=/flume/jdk1.7.0_55
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/lib/*
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$ORACLE_HOME/bin:$FLUME_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH

Set the configuration properties fs.defaultFS and hadoop.tmp.dir in the /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop/core-site.xml file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.2.15:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///var/lib/hadoop-0.20/cache</value>
  </property>
</configuration>
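Before continuing it is worth confirming that the environment variables and downloads are in place. A quick sanity check, assuming the paths set above (the version strings shown in the comments depend on the exact archives used):

source ~/.bashrc
java -version        # expected to report 1.7.0_55
hadoop version       # expected to report 2.0.0-cdh4.6.0
echo $FLUME_HOME     # /flume/apache-flume-1.4.0-cdh4.6.0-bin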
Remove any previously created Hadoop temporary directory, recreate it, and set its permissions:

rm -rf /var/lib/hadoop-0.20/cache
mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777 /var/lib/hadoop-0.20/cache

Set the configuration properties dfs.permissions.superusergroup, dfs.namenode.name.dir, dfs.replication, and dfs.permissions in hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Remove any previously created NameNode storage directory, recreate it, and set its permissions:

rm -rf /data/1/dfs/nn
mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn

Format the NameNode and start the NameNode and DataNode (HDFS):

hadoop namenode -format
hadoop namenode
hadoop datanode

Copy the Flume jars to HDFS so that they are available on the runtime classpath. Create a directory in HDFS with the same directory structure as the Flume lib directory and set its permissions to global (777):

hadoop dfs -mkdir /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
hadoop dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Put the Flume lib directory jars into HDFS:

hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib

Copy the Flume environment template file to the flume-env.sh file:

cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh

Finding the Log Directory

We shall stream data from one of the Oracle Database trace files, the Oracle alert log, to HDFS using Flume. To find the directory location of the trace files, run a SELECT query on the v$diag_info view:

select * from v$diag_info;

The trace file directory gets listed. The Oracle alert log is also generated in a separate directory, which gets listed as well. Change directory (cd) to the trace file directory /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace and list the files with the ls -l command. The trace files, including the alert_ORCL.log file, get listed. We shall stream data from the alert_ORCL.log file with Flume.
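If only the trace directory path is needed, v$diag_info can be filtered on its NAME column instead of scanning the full listing. A minimal sketch (in Oracle Database 11g the 'Diag Trace' row holds the trace directory):

select value from v$diag_info where name = 'Diag Trace';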
Configuring Flume

The Flume agent, which hosts the sources, channels, and sinks, is configured in the flume.conf file. From the $FLUME_CONF directory run the ls -l command to list the configuration files, which include the Flume configuration template flume-conf.properties.template. Copy the template to a new file named flume.conf:

cp conf/flume-conf.properties.template conf/flume.conf

We shall configure the following properties in flume.conf for a Flume agent called agent1.

agent1.channels
The Flume agent channels. We use a single channel called ch1 (the channel name is arbitrary).
agent1.channels = ch1

agent1.sources
The Flume agent sources. We use one source of type exec called exec1 (the source name is arbitrary).
agent1.sources = exec1

agent1.sinks
The Flume agent sinks. We use one sink of type hdfs called HDFS (the sink name is arbitrary).
agent1.sinks = HDFS

agent1.channels.ch1.type
The channel type, which is memory.
agent1.channels.ch1.type = memory

agent1.sources.exec1.channels
Defines the flow by binding the source to the channel.
agent1.sources.exec1.channels = ch1

agent1.sources.exec1.type
The source type, which is exec.
agent1.sources.exec1.type = exec

agent1.sources.exec1.command
Runs the specified Unix command and produces data on stdout. Commonly used commands are cat and tail, which copy a complete log file or the tail end of a log file to stdout. We demonstrate both.
agent1.sources.exec1.command = tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
or
agent1.sources.exec1.command = cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log

agent1.sinks.HDFS.channel
Defines the flow by binding the sink to the channel.
agent1.sinks.HDFS.channel = ch1

agent1.sinks.HDFS.type
The sink type, which is hdfs. Note that HDFS has two connotations in this example: HDFS is the name of the sink (the name could be any value) and hdfs is also the sink type.
agent1.sinks.HDFS.type = hdfs

agent1.sinks.HDFS.hdfs.path
The sink path in HDFS.
agent1.sinks.HDFS.hdfs.path = hdfs://10.0.2.15:8020/flume

agent1.sinks.HDFS.hdfs.fileType
The file format used by the HDFS sink.
agent1.sinks.HDFS.hdfs.fileType = DataStream

The resulting flume.conf file is listed:

agent1.channels.ch1.type = memory
agent1.sources.exec1.channels = ch1
agent1.sources.exec1.type = exec
agent1.sources.exec1.command = tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
agent1.sinks.HDFS.channel = ch1
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://10.0.2.15:8020/flume
agent1.sinks.HDFS.hdfs.fileType = DataStream
agent1.channels = ch1
agent1.sources = exec1
agent1.sinks = HDFS

Running the Flume Agent

In this section we run the Flume agent to stream the tail of the alert_ORCL.log file, and any lines subsequently appended to it, to HDFS using the tail -F command. Run the Flume agent using the flume-ng shell script in the bin directory. Specify the agent name with the -n option, the configuration directory with the --conf option, and the configuration file with the -f option. Set the Flume root logger with -Dflume.root.logger=INFO,console to log at INFO level to the console. Run the following commands to start the Flume agent:

cd /flume/apache-flume-1.4.0-cdh4.6.0-bin
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -Dflume.root.logger=INFO,console

The Flume agent gets started and streams the Oracle alert log file to HDFS. The Flume agent runs through the following procedure:

1. Start the configuration provider.
2. Add sink (HDFS) to agent agent1.
3. Create an instance of channel ch1 of type memory.
4. Create an instance of source exec1 of type exec.
5. Create an instance of sink HDFS of type hdfs.
6. Connect channel ch1 to source and sink [exec1, HDFS].
7. Start channel ch1.
8. Start sink HDFS.
9. Start source exec1.
10. Create the FlumeData file.
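The agent's JVM settings come from the flume-env.sh file copied earlier; the launch command in the output below shows that the agent runs with a small default heap (-Xmx20m). A minimal sketch of raising it in flume-env.sh, assuming a busier log stream (the heap values are illustrative, not required by this tutorial):

# $FLUME_HOME/conf/flume-env.sh
# Give the agent JVM a larger heap than the 20 MB default launch setting
export JAVA_OPTS="-Xms100m -Xmx512m"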
A more detailed output from the Flume agent is as follows:

[root@localhost apache-flume-1.4.0-cdh4.6.0-bin]# bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin/hadoop) for HDFS access
Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar from classpath
+ exec /flume/jdk1.7.0_55/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf:/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce/lib-examples:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/contrib/capacity-scheduler/*.jar' -Djava.library.path=:/flume/hadoop-2.0.0-cdh4.6.0/lib/native org.apache.flume.node.Application -f conf/flume.conf -n agent1
2014-11-17 11:56:23,219 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2014-11-17 11:56:23,242 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:conf/flume.conf
2014-11-17 11:56:23,272 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 11:56:23,278 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 11:56:23,281 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 11:56:23,282 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 11:56:23,283 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: HDFS Agent: agent1
2014-11-17 11:56:23,396 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent1]
2014-11-17 11:56:23,397 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels
2014-11-17 11:56:23,449 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel ch1 type memory
2014-11-17 11:56:23,538 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel ch1
2014-11-17 11:56:23,540 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source exec1, type exec
2014-11-17 11:56:23,581 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: HDFS, type: hdfs
2014-11-17 11:56:24,183 (conf-file-poller-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-11-17 11:56:24,518 (conf-file-poller-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:523)] Hadoop Security enabled: false
2014-11-17 11:56:24,533 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel ch1 connected to [exec1, HDFS]
2014-11-17 11:56:24,591 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{exec1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec1,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@14ced4e counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
2014-11-17 11:56:24,618 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch1
2014-11-17 11:56:24,817 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
2014-11-17 11:56:24,819 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: ch1 started
2014-11-17 11:56:24,820 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS
2014-11-17 11:56:24,822 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source exec1
2014-11-17 11:56:24,823 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:163)] Exec source starting with command:tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
2014-11-17 11:56:24,837 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
2014-11-17 11:56:24,838 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: HDFS started
2014-11-17 11:56:24,864 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: exec1: Successfully registered new MBean.
2014-11-17 11:56:24,868 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: exec1 started
2014-11-17 11:56:28,873 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSSequenceFile.configure(HDFSSequenceFile.java:63)] writeFormat = Writable, UseRawLocalFileSystem = false
2014-11-17 11:56:28,982 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870.tmp
2014-11-17 11:57:01,321 (hdfs-HDFS-call-runner-3) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870
2014-11-17 11:57:01,339 (hdfs-HDFS-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:377)] Writer callback called.

Run the following command to list the files in the /flume directory, which is the directory of the Flume sink:

hadoop fs -ls hdfs://10.0.2.15:8020/flume

The FlumeData file is one of the files listed. Run the following command to find the disk usage of the Flume-generated data:

hadoop dfs -du /flume

The FlumeData file disk usage gets listed. Run the following command to output the FlumeData file to stdout:

hadoop dfs -cat hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870

The FlumeData file gets output to stdout. Run the following command to copy the FlumeData file to the local filesystem, after which it can be opened locally:

hadoop dfs -copyToLocal hdfs://10.0.2.15:8020/flume/FlumeData.1416243388870 /flume

The copied FlumeData file gets displayed.

Streaming a Complete Log File

In this section we stream the complete alert log file alert_ORCL.log by changing the following configuration property in flume.conf:

agent1.sources.exec1.command = cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log

The complete alert log file gets streamed to HDFS and, as the output below shows, multiple FlumeData files get generated.
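Each FlumeData file corresponds to one file roll by the HDFS sink. With Flume's default settings the sink rolls a file after 10 events (hdfs.rollCount), 30 seconds (hdfs.rollInterval), or roughly 1 KB of data (hdfs.rollSize), whichever comes first, which is why streaming the whole alert log with cat produces many small files. A hedged sketch of relaxing the roll settings in flume.conf (the values are illustrative; 0 disables that particular trigger):

agent1.sinks.HDFS.hdfs.rollCount = 0
agent1.sinks.HDFS.hdfs.rollSize = 0
agent1.sinks.HDFS.hdfs.rollInterval = 60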
[root@localhost apache-flume-1.4.0-cdh4.6.0-bin]# bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin/hadoop) for HDFS access
Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar from classpath
+ exec /flume/jdk1.7.0_55/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf:/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib:/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/contrib/capacity-scheduler/*.jar' -Djava.library.path=:/flume/hadoop-2.0.0-cdh4.6.0/lib/native org.apache.flume.node.Application -f conf/flume.conf -n agent1
2014-11-17 12:17:03,058 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2014-11-17 12:17:03,073 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:conf/flume.conf
2014-11-17 12:17:03,110 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 12:17:03,122 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 12:17:03,124 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 12:17:03,125 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2014-11-17 12:17:03,125 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: HDFS Agent: agent1
2014-11-17 12:17:03,292 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent1]
2014-11-17 12:17:03,296 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels
2014-11-17 12:17:03,347 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel ch1 type memory
2014-11-17 12:17:03,385 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel ch1
2014-11-17 12:17:03,388 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source exec1, type exec
2014-11-17 12:17:03,429 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: HDFS, type: hdfs
2014-11-17 12:17:04,060 (conf-file-poller-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-11-17 12:17:04,387 (conf-file-poller-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:523)] Hadoop Security enabled: false
2014-11-17 12:17:04,406 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel ch1 connected to [exec1, HDFS]
2014-11-17 12:17:04,467 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{exec1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec1,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@238a4d counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
2014-11-17 12:17:04,492 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch1
2014-11-17 12:17:04,809 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
2014-11-17 12:17:04,811 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: ch1 started
2014-11-17 12:17:04,813 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS
2014-11-17 12:17:04,814 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source exec1
2014-11-17 12:17:04,816 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:163)] Exec source starting with command:cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
2014-11-17 12:17:04,832 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
2014-11-17 12:17:04,833 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: HDFS started
2014-11-17 12:17:04,849 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: exec1: Successfully registered new MBean.
2014-11-17 12:17:04,851 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: exec1 started
2014-11-17 12:17:04,969 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSSequenceFile.configure(HDFSSequenceFile.java:63)] writeFormat = Writable, UseRawLocalFileSystem = false
2014-11-17 12:17:05,087 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624953.tmp
2014-11-17 12:17:07,536 (hdfs-HDFS-call-runner-2) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624953.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624953
2014-11-17 12:17:07,630 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624954.tmp
2014-11-17 12:17:07,814 (hdfs-HDFS-call-runner-6) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624954.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624954
2014-11-17 12:17:07,920 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624955.tmp
2014-11-17 12:17:08,574 (hdfs-HDFS-call-runner-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624955.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624955
2014-11-17 12:17:08,709 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624956.tmp
2014-11-17 12:17:09,335 (hdfs-HDFS-call-runner-4) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624956.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624956
2014-11-17 12:17:09,513 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624957.tmp
2014-11-17 12:17:09,851 (hdfs-HDFS-call-runner-8) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624957.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624957
2014-11-17 12:17:09,992 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624958.tmp
2014-11-17 12:17:10,268 (hdfs-HDFS-call-runner-2) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624958.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624958
2014-11-17 12:17:10,377 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624959.tmp
2014-11-17 12:17:11,091 (hdfs-HDFS-call-runner-6) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624959.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624959
2014-11-17 12:17:11,175 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624960.tmp
2014-11-17 12:17:11,476 (hdfs-HDFS-call-runner-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624960.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624960
2014-11-17 12:17:11,580 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624961.tmp
2014-11-17 12:17:11,936 (hdfs-HDFS-call-runner-4) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624961.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624961
2014-11-17 12:17:12,021 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624962.tmp
2014-11-17 12:17:12,254 (hdfs-HDFS-call-runner-8) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624962.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624962
2014-11-17 12:17:12,343 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://10.0.2.15:8020/flume/FlumeData.1416244624963.tmp
2014-11-17 12:17:42,598 (hdfs-HDFS-call-runner-3) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:540)] Renaming hdfs://10.0.2.15:8020/flume/FlumeData.1416244624963.tmp to hdfs://10.0.2.15:8020/flume/FlumeData.1416244624963
2014-11-17 12:17:42,605 (hdfs-HDFS-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:377)] Writer callback called.

List the FlumeData files with the hadoop fs -ls hdfs://10.0.2.15:8020/flume command as before, and output any one of the FlumeData files to stdout with the hadoop dfs -cat command. If all the FlumeData files are to be deleted, run the following command:

hadoop fs -rm hdfs://10.0.2.15:8020/flume/FlumeData.*

All FlumeData files get deleted.
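As an aside, the exec source provides no delivery guarantees, so for ingesting complete, closed log files the spooling directory source mentioned in the introduction is often the more robust choice. A minimal sketch, assuming the hypothetical source name spool1 and spool directory shown (files dropped into the directory must be complete and must not be modified afterwards):

agent1.sources = spool1
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /home/oracle/flume-spool
agent1.sources.spool1.channels = ch1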
Exception when Processing Event Batch

When processing a large file such as alert_ORCL.log with the cat command, the Flume agent might fail to put the event batch on the channel and generate the following exception:

2014-11-17 12:17:07,943 (pool-3-thread-1) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:347)] Failed while running command: cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
org.apache.flume.ChannelException: Unable to put batch on required channel:
Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
2014-11-17 12:17:07,998 (timedFlushExecService17-0) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable$1.run(ExecSource.java:322)] Exception occured when processing event batch
org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: ch1}
[INFO - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:370)] Command [cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log] exited with 141

The Flume agent may still run to completion after such an interruption. The exception is generated because the memory channel's default queue capacity of 100 events is not enough. Increase the capacity with the following configuration property in flume.conf:

agent1.channels.ch1.capacity = 100000
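If events still back up, the memory channel's per-transaction limit can be raised along with the capacity. A hedged sketch (the values are illustrative; in Flume 1.4 both properties default to 100):

agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 1000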
In this tutorial we streamed Oracle Database alert log data to HDFS using Flume.