Written by Deepak Vohra

A typical messaging system such as Apache Kafka consists of a message producer and a message consumer. Apache Kafka was introduced in an earlier tutorial. Apache Kafka was developed at LinkedIn, which uses Kafka to move data between systems; with such widespread use, LinkedIn needs to move large quantities of data quickly and reliably. Kafka provides resiliency and reliability while sustaining high throughput. More than 800 billion messages, amounting to 175 terabytes of data, are sent through LinkedIn's Kafka deployment each day, and more than 650 terabytes of data are consumed each day. While LinkedIn handles millions of messages per second, a relatively small-scale deployment may not require as much throughput; the number of Kafka brokers and clusters may be scaled according to the data volume requirements.

One common use case of Apache Kafka is to build a Stream Data Platform, which serves two main purposes: data integration and stream processing. Data integration involves collecting streams of events and storing them in a data store such as a relational database or HDFS. Stream processing is the continuous, real-time processing of data streams. A Stream Data Platform can be used for several different purposes. Consider the use case in which a user produces messages that are to be streamed to Oracle Database. While Apache Kafka provides the publish/subscribe messaging system for such a use case, Apache Flume can be used to stream the messages to an Oracle sink. The following sequence is used to stream Kafka messages to Oracle Database:

- Start Oracle Database.
- Create an Oracle Database table to receive Kafka messages.
- Start the Kafka ZooKeeper.
- Start the Kafka server.
- Create a Kafka topic to send messages to from a Kafka producer.
- Create another Kafka topic for a Kafka channel to be used by Apache Flume.
- Start a Kafka producer.
- Configure an Apache Flume agent with a source of type Kafka, a channel of type Kafka, and a sink of type JDBC (Oracle Database).
- Start the Apache Flume agent.
- Send messages from the Kafka producer.
- Kafka messages get streamed to the Oracle Database table.

The sequence of streaming messages from the Kafka producer to Oracle Database is shown in the following illustration.

This tutorial has the following sections:

- Setting the Environment
- Creating an Oracle Database Table
- Starting Kafka
- Configuring an Apache Flume Agent
- Starting the Flume Agent
- Producing Messages at the Kafka Producer
- Querying the Oracle Database Table

Setting the Environment

The following software is required for this tutorial:

- Oracle Database
- Apache Flume 1.6
- Apache Kafka
- Stratio JDBC Sink
- Oracle JDBC Driver Jar
- Jooq
- Apache Maven
- Java 7

Create a directory to install the software (except Oracle Database) and set its permissions to global (777):

mkdir /flume
chmod -R 777 /flume
cd /flume

Download and extract the Apache Kafka tar file:

wget http://apache.mirror.iweb.ca/kafka/0.8.2.1/kafka_2.10-0.8.2.1.tgz
tar -xvf kafka_2.10-0.8.2.1.tgz

Download and extract the Apache Flume tar file. The Apache Flume version must be 1.6 for Kafka support.

wget http://archive.apache.org/dist/flume/stable/apache-flume-1.6.0-bin.tar.gz
tar -xvf apache-flume-1.6.0-bin.tar.gz

Copy the Kafka jars to the Flume classpath:

cp /flume/kafka_2.10-0.8.2.1/libs/* /flume/apache-flume-1.6.0-bin/lib

Set the environment variables for Oracle Database, Flume, Kafka, Maven, and Java in ~/.bashrc:
vi ~/.bashrc

export MAVEN_HOME=/flume/apache-maven-3.3.3-bin
export FLUME_HOME=/flume/apache-flume-1.6.0-bin
export KAFKA_HOME=/flume/kafka_2.10-0.8.2.1
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export FLUME_CONF=$FLUME_HOME/conf
export JAVA_HOME=/flume/jdk1.7.0_55
export PATH=/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:$FLUME_HOME/bin:$JAVA_HOME/bin:$MAVEN_HOME/bin:$ORACLE_HOME/bin:$KAFKA_HOME/bin
export CLASSPATH=$FLUME_HOME/lib/*

Download, compile, and package the Stratio JDBC sink, and copy the generated jar to the Flume lib directory:

cp stratio-jdbc-sink-0.5.0-SNAPSHOT.jar $FLUME_HOME/lib

Copy the Oracle JDBC driver jar to the Flume lib directory:

cp ojdbc6.jar $FLUME_HOME/lib

Copy the Jooq jar to the Flume lib directory:

cp jooq-3.6.2.jar $FLUME_HOME/lib

Creating an Oracle Database Table

Start SQL*Plus and create an Oracle Database table to store the Kafka messages streamed to it. Run the following SQL statement to create a table called kafkamsg:

CREATE TABLE kafkamsg(msg VARCHAR(4000));

The Oracle Database table gets created.

Starting Kafka

Apache Kafka comprises the following main components:

- ZooKeeper server
- Kafka server
- Kafka topic(s)
- Kafka producer
- Kafka consumer

Start the Kafka ZooKeeper:

cd /flume/kafka_2.10-0.8.2.1
zookeeper-server-start.sh config/zookeeper.properties

The ZooKeeper server gets started. Start the Kafka server:

cd /flume/kafka_2.10-0.8.2.1
kafka-server-start.sh config/server.properties

The Kafka server gets started. We need to create two Kafka topics:

- Topic kafka-orcldb, to produce messages to be streamed to Oracle Database
- Topic kafkachannel, for the Flume channel of type Kafka

Run the following commands to create the two Kafka topics:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka-orcldb
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafkachannel

The two Kafka topics get created. We only need to start the Kafka producer, not a Kafka consumer, because the messages are streamed to Oracle Database by Apache Flume rather than consumed at a consumer. Run the console producer with the following command to produce messages on topic kafka-orcldb:

kafka-console-producer.sh --broker-list localhost:9092 --topic kafka-orcldb

The Kafka producer gets started.

Configuring an Apache Flume Agent

Apache Flume uses a configuration file to configure the Flume source, channel, and sink, which should be of the following types:

- Flume source of type Kafka
- Flume channel of type Kafka
- Flume sink of type JDBC

The Flume configuration properties are discussed in the following list, each entry giving the property, its description, and its value:

- agent.sources: sets the Flume source (kafkaSrc)
- agent.channels: sets the Flume channel (channel1)
- agent.sinks: sets the Flume sink (jdbcSink)
- agent.channels.channel1.type: sets the channel type (org.apache.flume.channel.kafka.KafkaChannel)
- agent.channels.channel1.brokerList: sets the channel broker list (localhost:9092)
- agent.channels.channel1.topic: sets the Kafka channel topic (kafkachannel)
- agent.channels.channel1.zookeeperConnect: sets the Kafka channel ZooKeeper host:port (localhost:2181)
- agent.channels.channel1.capacity: sets the channel capacity (10000)
- agent.channels.channel1.transactionCapacity: sets the channel transaction capacity (1000)
- agent.sources.kafkaSrc.type: sets the source type (org.apache.flume.source.kafka.KafkaSource)
- agent.sources.kafkaSrc.channels: sets the channel on the source (channel1)
- agent.sources.kafkaSrc.zookeeperConnect: sets the source ZooKeeper host:port (localhost:2181)
- agent.sources.kafkaSrc.topic: sets the Kafka source topic (kafka-orcldb)
- agent.sinks.jdbcSink.type: sets the sink type (com.stratio.ingestion.sink.jdbc.JDBCSink)
- agent.sinks.jdbcSink.connectionString: sets the connection URI for Oracle Database (jdbc:oracle:thin:@127.0.0.1:1521:ORCL)
- agent.sinks.jdbcSink.username: sets the Oracle Database username (OE)
- agent.sinks.jdbcSink.password: sets the Oracle Database password (OE)
- agent.sinks.jdbcSink.batchSize: sets the batch size (10)
- agent.sinks.jdbcSink.channel: sets the channel on the sink (channel1)
- agent.sinks.jdbcSink.sqlDialect: sets the SQL dialect; an Oracle-specific dialect is not provided, but the DERBY dialect can be used (DERBY)
- agent.sinks.jdbcSink.driver: sets the Oracle Database JDBC driver class (oracle.jdbc.OracleDriver)
- agent.sinks.jdbcSink.sql: sets the custom SQL to add data to Oracle Database (INSERT INTO kafkamsg(msg) VALUES(${body:varchar}))

The flume.conf is listed:

agent.sources=kafkaSrc
agent.channels=channel1
agent.sinks=jdbcSink
agent.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent.channels.channel1.brokerList=localhost:9092
agent.channels.channel1.topic=kafkachannel
agent.channels.channel1.zookeeperConnect=localhost:2181
agent.channels.channel1.capacity=10000
agent.channels.channel1.transactionCapacity=1000
agent.sources.kafkaSrc.type=org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSrc.channels=channel1
agent.sources.kafkaSrc.zookeeperConnect=localhost:2181
agent.sources.kafkaSrc.topic=kafka-orcldb
agent.sinks.jdbcSink.type=com.stratio.ingestion.sink.jdbc.JDBCSink
agent.sinks.jdbcSink.connectionString=jdbc:oracle:thin:@127.0.0.1:1521:ORCL
agent.sinks.jdbcSink.username=OE
agent.sinks.jdbcSink.password=OE
agent.sinks.jdbcSink.batchSize=10
agent.sinks.jdbcSink.channel=channel1
agent.sinks.jdbcSink.sqlDialect=DERBY
agent.sinks.jdbcSink.driver=oracle.jdbc.OracleDriver
agent.sinks.jdbcSink.sql=INSERT INTO kafkamsg(msg) VALUES(${body:varchar})

Copy the Flume configuration file to the Flume conf directory:

cp flume.conf $FLUME_HOME/conf/flume.conf

Starting the Flume Agent

Next, run the Flume agent with the following command:

flume-ng agent --classpath --conf $FLUME_CONF/ -f $FLUME_CONF/flume.conf -n agent -Dflume.root.logger=INFO,console

The Flume agent gets started.

Producing Messages at the Kafka Producer

We already started a Kafka producer. Send messages at the Kafka producer console: type a message and press Enter to send it. An empty line sent is also considered a message. In the following illustration, three messages have been produced, the second being an empty message, which is also streamed.

Querying the Oracle Database Table

The messages produced at the Kafka producer are streamed to Oracle Database by the Flume agent. In SQL*Plus, run a SQL query such as SELECT msg FROM kafkamsg; to list the messages. The three messages, including the empty message, get listed. Messages produced at the Kafka producer are streamed as they are produced: send more messages at the Kafka producer, and they get streamed to Oracle Database and listed with the same SQL query.

In this tutorial we used Apache Flume to stream Kafka messages to Oracle Database.
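What the JDBC sink does with each Kafka message can be sketched in miniature. The following Python sketch is an illustration only: it uses the standard library's sqlite3 module as a stand-in for Oracle Database (the actual sink uses JDBC and Jooq against Oracle), binding each message body into a parameterized INSERT in the same way the configured INSERT INTO kafkamsg(msg) VALUES(${body:varchar}) statement binds the event body.

```python
# Miniature sketch of the sink behavior; sqlite3 stands in for Oracle Database.
# This is not the Stratio sink's implementation, only an illustration of the
# per-message flow: message body -> parameterized INSERT -> table row.
import sqlite3

conn = sqlite3.connect(":memory:")
# Mirrors the kafkamsg table created in SQL*Plus earlier.
conn.execute("CREATE TABLE kafkamsg (msg VARCHAR(4000))")

# Messages as they might arrive from the kafka-orcldb topic; the empty string
# models the empty message sent from the console producer, which is also streamed.
messages = ["message1", "", "message3"]

# Equivalent of the configured sink SQL:
#   INSERT INTO kafkamsg(msg) VALUES(${body:varchar})
for body in messages:
    conn.execute("INSERT INTO kafkamsg(msg) VALUES (?)", (body,))
conn.commit()

# Equivalent of the SQL*Plus query used to list the streamed messages.
rows = [row[0] for row in conn.execute("SELECT msg FROM kafkamsg")]
print(rows)  # ['message1', '', 'message3']
```

The empty message arriving as an empty row mirrors what is observed when querying the kafkamsg table in SQL*Plus.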