Wiki Page: Using MySQL Database as Apache Tajo Catalog Store

Written by Deepak Vohra

Apache Tajo is a distributed data warehouse system for Apache Hadoop. It is designed to store large data sets in HDFS and other data sources, and it provides a SQL query processing engine with support for SQL standards. Tajo supports various file formats as data sources, including CSV, JSON, RCFile, SequenceFile and Parquet. A Tajo shell is provided to run SQL queries. By default Apache Tajo uses Apache Derby to store persistent data in a Catalog Store. Tajo also supports other databases for the Catalog Store, among them MySQL, PostgreSQL, Oracle, and MariaDB. In this tutorial we shall configure Apache Tajo to use MySQL Database for the Catalog Store.

Setting the Environment

The following software is required for this tutorial.

-Apache Tajo
-Apache Hadoop (2.3.0 to 2.6.0)
-Java 6 or 7
-MySQL Database
-MySQL Connector/J JAR

Create a directory (/tajo) to install the software and set its permissions to global (777).

    mkdir /tajo
    chmod -R 777 /tajo
    cd /tajo

Download and install Java 7.

    tar -xvf jdk-7u55-linux-i586.gz

Download the MySQL Connector/J JDBC driver JAR file mysql-connector-java-5.1.31-bin.jar (or a later version) from http://dev.mysql.com/downloads/connector/j/ .

Download and extract the Hadoop 2.3.0 tar file.

    wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.1.tar.gz
    tar -xvf hadoop-2.3.0-cdh5.0.1.tar.gz

Download and extract the Tajo 0.10.1 tar file.

    wget http://apache.mirror.gtcomm.net/tajo/tajo-0.10.1/tajo-0.10.1.tar.gz
    tar -xvf tajo-0.10.1.tar.gz

Start the MySQL Database server.

    bin/mysqld_safe --user=mysql &

The MySQL Database server starts.

Creating a Tajo User and Database in MySQL Database

We need to create a MySQL user and database for Apache Tajo. Start the MySQL shell as root.

    bin/mysql -u root

Create a user called ‘tajo’ and set its password.

    mysql> create user 'tajo'@'localhost' identified by 'tajo';

Create a database called ‘tajo’.
    mysql> create database tajo;

Grant all permissions on the ‘tajo’ database to the ‘tajo’ user.

    mysql> grant all on tajo.* to 'tajo'@'localhost';

The output from the preceding commands is shown in the MySQL command-line shell.

Installing Apache Hadoop

Apache Hadoop 2.3.0 or later (up to 2.6) is required to run Apache Tajo. We downloaded the Hadoop 2.3.0 tar file earlier. In this section we shall configure a single-node Hadoop cluster. Create symlinks for the Hadoop bin directory and the conf directory.

    ln -s /tajo/hadoop-2.3.0-cdh5.0.1/bin-mapreduce1 /tajo/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce1/bin
    ln -s /tajo/hadoop-2.3.0-cdh5.0.1/etc/hadoop /tajo/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce1/conf

Next, set the Hadoop core configuration properties in the core-site.xml configuration file.

    cd /tajo/hadoop-2.3.0-cdh5.0.1/etc/hadoop
    vi core-site.xml

Set the fs.defaultFS and hadoop.tmp.dir properties as follows. The fs.defaultFS property specifies the NameNode URI and the hadoop.tmp.dir property specifies the Hadoop temporary directory.

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://10.0.2.15:8020/</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>file:///var/lib/hadoop-0.20/cache</value>
    </property>

Create the Hadoop temporary directory and set its permissions to global (777).

    mkdir -p /var/lib/hadoop-0.20/cache
    chmod -R 777 /var/lib/hadoop-0.20/cache

Configure the HDFS properties in hdfs-site.xml. Set the dfs.permissions.superusergroup, dfs.namenode.name.dir, dfs.replication and dfs.permissions properties. The dfs.namenode.name.dir property specifies the directory or directories in which the NameNode stores the fsimage. The dfs.permissions.superusergroup property is set to hadoop as the superuser group. Replication is set to 1 because we are using a single-node Hadoop cluster. Set dfs.permissions to false to turn permissions checking off.

    vi hdfs-site.xml

    <property>
      <name>dfs.permissions.superusergroup</name>
      <value>hadoop</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///data/1/dfs/nn</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>

Create the NameNode storage directory and set its permissions to global (777).
    mkdir -p /data/1/dfs/nn
    chmod -R 777 /data/1/dfs/nn

Setting the Environment Variables

Set the environment variables for MySQL Database, Apache Tajo, Apache Hadoop and Java in the bash shell. The HADOOP_NAMENODE_USER and HADOOP_DATANODE_USER variables must also be set to run HDFS. Create the hadoop user in the hadoop group if it does not already exist.

    useradd -g hadoop hadoop

The Hadoop classpath should include the jars in the $HADOOP_PREFIX/share/hadoop/common/lib and $HADOOP_PREFIX/share/hadoop/yarn directories. The bash shell settings are as follows.

    vi ~/.bashrc

    export MYSQL_HOME=/mysql/mysql-5.6.19-linux-glibc2.5-i686
    export TAJO_HOME=/tajo/tajo-0.10.1
    export TAJO_CONF=$TAJO_HOME/conf
    export HADOOP_PREFIX=/tajo/hadoop-2.3.0-cdh5.0.1
    export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
    export JAVA_HOME=/tajo/jdk1.7.0_55
    export HADOOP_MAPRED_HOME=/tajo/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce1
    export HADOOP_HOME=/tajo/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce1
    export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HADOOP_PREFIX/share/hadoop/common/lib/*:$TAJO_HOME/lib/*:$TAJO_CONF:$HADOOP_PREFIX/share/hadoop/yarn/*
    export PATH=/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME:$MYSQL_HOME/bin
    export CLASSPATH=$HADOOP_CLASSPATH
    export HADOOP_NAMENODE_USER=hadoop
    export HADOOP_DATANODE_USER=hadoop

Configuring Apache Tajo

The Apache Tajo shell scripts to start and stop Tajo and to run the Tajo shell are in the $TAJO_HOME/bin directory, and the configuration files are in the $TAJO_CONF directory. We won’t need all the bin scripts, but we do need to modify some of the scripts and configuration files in the $TAJO_CONF directory. In the tajo-env.sh shell script set the HADOOP_HOME, JAVA_HOME and TAJO_CLASSPATH variables. The TAJO_CLASSPATH should include the MySQL JDBC jar file.
    vi $TAJO_CONF/tajo-env.sh

    export HADOOP_HOME=/tajo/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce1
    export JAVA_HOME=/tajo/jdk1.7.0_55
    export TAJO_CLASSPATH=/tajo/mysql-connector-java-5.1.31-bin.jar:$CLASSPATH

Apache Tajo can be configured to run in two modes: local mode and distributed mode. The default is local mode. To run in distributed mode, which makes use of HDFS, we need to configure the tajo-site.xml file in the $TAJO_CONF directory. Create the tajo-site.xml file from the template.

    cp $TAJO_CONF/tajo-site.xml.template $TAJO_CONF/tajo-site.xml

Two sets of properties are provided for Tajo: TajoMaster settings and Worker settings. Uncomment the sections for the TajoMaster and Worker settings. Set the Tajo root directory in the tajo.rootdir property to hdfs://10.0.2.15:8020/tajo. For the other TajoMaster settings replace hostname with 127.0.0.1. Keep the Worker settings at their defaults.

    vi $TAJO_CONF/tajo-site.xml

    <property>
      <name>tajo.rootdir</name>
      <value>hdfs://10.0.2.15:8020/tajo</value>
      <description>Base directory including system directories.</description>
    </property>
    <property>
      <name>tajo.master.umbilical-rpc.address</name>
      <value>127.0.0.1:26001</value>
      <description>TajoMaster binding address between master and workers.</description>
    </property>
    <property>
      <name>tajo.master.client-rpc.address</name>
      <value>127.0.0.1:26002</value>
      <description>TajoMaster binding address between master and clients.</description>
    </property>
    <property>
      <name>tajo.resource-tracker.rpc.address</name>
      <value>127.0.0.1:26003</value>
      <description>TajoMaster binding address between master and workers.</description>
    </property>
    <property>
      <name>tajo.catalog.client-rpc.address</name>
      <value>127.0.0.1:26005</value>
      <description>CatalogServer binding address between catalog server and workers.</description>
    </property>
    <property>
      <name>tajo.worker.resource.cpu-cores</name>
      <value>1</value>
      <description>Number of CPU cores</description>
    </property>
    <property>
      <name>tajo.worker.resource.memory-mb</name>
      <value>1024</value>
      <description>Available memory size (MB)</description>
    </property>
    <property>
      <name>tajo.worker.resource.disks</name>
      <value>1</value>
      <description>Available disk capacity (usually number of disks)</description>
    </property>
    <property>
      <name>tajo.worker.tmpdir.locations</name>
      <value>/tmp/tajo-${user.name}/tmpdir</value>
      <description>A base for other temporary directories.</description>
    </property>

Because we are using a non-default database as the Catalog Store, we need to provide a catalog-site.xml configuration file. Create a catalog-site.xml from the template.
    cp $TAJO_CONF/catalog-site.xml.template $TAJO_CONF/catalog-site.xml

Uncomment the configuration properties in the JDBC Common Settings section and in the JDBC Store Section > MySQL Catalog Store Driver section. Set tajo.catalog.jdbc.connection.id and tajo.catalog.jdbc.connection.password to the MySQL user and password we created earlier in the MySQL shell; both were set to ‘tajo’. For the MySQL Catalog Store Driver, tajo.catalog.store.class is set to org.apache.tajo.catalog.store.MySQLStore by default. Set tajo.catalog.jdbc.uri to the connection URL for MySQL Database. With the JDBC sections uncommented, catalog-site.xml contains the following properties.

    vi $TAJO_CONF/catalog-site.xml

    <property>
      <name>tajo.catalog.jdbc.connection.id</name>
      <value>tajo</value>
    </property>
    <property>
      <name>tajo.catalog.jdbc.connection.password</name>
      <value>tajo</value>
    </property>
    <property>
      <name>tajo.catalog.store.class</name>
      <value>org.apache.tajo.catalog.store.MySQLStore</value>
    </property>
    <property>
      <name>tajo.catalog.jdbc.uri</name>
      <value>jdbc:mysql://localhost:3306/tajo?createDatabaseIfNotExist=true</value>
    </property>

Starting HDFS

Next, we shall start HDFS and put a data source file (a CSV file) in HDFS. We shall subsequently create a Tajo table located on the CSV file. Store the following listing as wlslog.csv.

    catalog1,Apr-8-2014-7:06:16-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to STANDBY
    catalog2,Apr-8-2014-7:06:17-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to STARTING
    catalog3,Apr-8-2014-7:06:18-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to ADMIN
    catalog4,Apr-8-2014-7:06:19-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to RESUMING
    catalog5,Apr-8-2014-7:06:20-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000361,Started WebLogic AdminServer
    catalog6,Apr-8-2014-7:06:21-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000365,Server state changed to RUNNING
    catalog7,Apr-8-2014-7:06:22-PM-PDT,Notice,WebLogicServer,AdminServer,BEA-000360,Server started in RUNNING mode

Format the NameNode and start HDFS (NameNode and DataNode).
    hadoop namenode -format
    hadoop namenode
    hadoop datanode

Create a directory in HDFS for the wlslog.csv data file and make it group-writable.

    hadoop dfs -mkdir hdfs://localhost:8020/wlslog
    hadoop dfs -chmod -R g+w hdfs://localhost:8020/wlslog

Put the wlslog.csv file in HDFS.

    hadoop dfs -put wlslog.csv hdfs://localhost:8020/wlslog

Also create an HDFS directory for the Tajo Master and make it group-writable. This is the directory configured in the tajo.rootdir property in tajo-site.xml.

    hadoop dfs -mkdir hdfs://localhost:8020/tajo
    hadoop dfs -chmod -R g+w hdfs://localhost:8020/tajo

Starting Tajo

Start Tajo with the following command.

    $TAJO_HOME/bin/start-tajo.sh

The TajoMaster and Worker start. You are prompted for the root password before the Worker starts.

Creating a Table in Apache Tajo

Having configured and started Tajo, we shall next create a table using the Tajo shell. Start the Tajo shell.

    $TAJO_HOME/bin/tsql

The shell connects to the default database. Create an external table called wlslog located on the wlslog.csv file in HDFS.

    tajo> create external table wlslog (id text,time_stamp text,category text,type text,servername text,code text,msg text) using text with ('text.delimiter'=',') location 'hdfs://localhost:8020/wlslog';

The wlslog table gets created. Query the wlslog table.

    tajo> select * from wlslog;

The data stored in the wlslog table gets listed.

In this tutorial we used MySQL Database as the Catalog Store for Apache Tajo.
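
As an optional check on the sample data used above, the following sketch counts the lines in wlslog.csv whose number of comma-separated fields differs from the seven columns declared for the wlslog table. The function name count_bad_lines and the variables csv_file and expected_fields are illustrative and are not part of Tajo or Hadoop. The sketch assumes awk is installed and that the file is in the current directory unless a path is passed as the first argument.

```shell
# Sketch only: verify that wlslog.csv matches the 7-column table definition.
# csv_file, expected_fields and count_bad_lines are hypothetical names.
csv_file="${1:-wlslog.csv}"
expected_fields=7

count_bad_lines() {
  # $1 = file, $2 = expected number of comma-separated fields.
  # Prints how many non-empty lines have a different field count.
  awk -F',' -v n="$2" 'NF > 0 && NF != n { bad++ } END { print bad + 0 }' "$1"
}

if [ -f "$csv_file" ]; then
  bad=$(count_bad_lines "$csv_file" "$expected_fields")
  echo "lines with unexpected field count: $bad"
fi
```

A count other than 0 would suggest that a row contains a stray comma or a missing field, which would misalign the columns when the table is queried.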
