In the preceding article on Oracle SQL Connector for HDFS (OSCH) we created an external table over a delimited text file in HDFS. In this article we shall create an external table with an Apache Hive external table as the data source.

Setting the Environment

Oracle Database 11g is installed on Oracle Linux 6, which runs on Oracle VirtualBox 4.3.10. Oracle SQL Connector for HDFS and CDH 4.6 are also installed on Oracle Linux 6. We also need to install Hive. Download the CDH 4.6 Hive 0.10.0 archive hive-0.10.0-cdh4.6.0.tar.gz and untar it into the /osch directory.

wget http://archive.cloudera.com/cdh4/cdh/4/hive-0.10.0-cdh4.6.0.tar.gz
tar -xvf hive-0.10.0-cdh4.6.0.tar.gz

Add the hive group and the hive user in the hive group.

groupadd hive
useradd -g hive hive

Set the environment variables for Oracle Database 11g, Java 7, OSCH and Hive in hadoop-env.sh. The HIVE_CONF_DIR must also be added to the HADOOP_CLASSPATH.

vi hadoop-env.sh

export HADOOP_PREFIX=/osch/hadoop-2.0.0-cdh4.6.0
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export OSCH_HOME=/osch/orahdfs-3.0.0
export HIVE_HOME=/osch/hive-0.10.0-cdh4.6.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HADOOP_MAPRED_HOME=/osch/hadoop-2.0.0-cdh4.6.0/bin
export HADOOP_HOME=/osch/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2
export HADOOP_CLASSPATH=$HADOOP_HOME:$HADOOP_HOME/lib/*:$OSCH_HOME/jlib/*:$HIVE_HOME/lib/*:$HIVE_CONF_DIR
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME:$ORACLE_HOME/bin:$OSCH_HOME/bin:$HIVE_HOME/bin

Set the Hive environment variables in hive-env.sh, which is in the HIVE_CONF_DIR directory.

vi hive-env.sh

export HADOOP_HOME=/osch/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2
export HIVE_HOME=/osch/hive-0.10.0-cdh4.6.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=$HIVE_HOME/lib/*:$HIVE_CONF_DIR

As in the preceding article, set the PATH environment variable in the preprocessor script orahdfs-3.0.0/bin/hdfs_stream. Create the directory objects and set permissions as discussed in the preceding article. Start the HDFS NameNode and DataNode, and add the catalog.txt file to HDFS. Sketches of each of these setup steps follow.
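The hdfs_stream script does not inherit the shell environment, so its PATH must include the directory of the hadoop command. A minimal sketch of the edit, assuming the CDH location used in this article:

# In orahdfs-3.0.0/bin/hdfs_stream: extend the PATH so the script can
# locate the hadoop command (the CDH bin path below is this article's setup)
export PATH=/usr/bin:/bin:/osch/hadoop-2.0.0-cdh4.6.0/bin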
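The external table DDL generated later in this article references two directory objects: OSCH_EXTTAB_DIR for the location files and OSCH_BIN_PATH for the hdfs_stream preprocessor. A sketch of creating them in SQL*Plus as a DBA user, with /osch/exttab as a hypothetical file system directory for the location files:

-- Directory object for the external table location files
-- (/osch/exttab is an illustrative path; use the directory from the preceding article)
CREATE OR REPLACE DIRECTORY OSCH_EXTTAB_DIR AS '/osch/exttab';
-- Directory object for the hdfs_stream preprocessor script
CREATE OR REPLACE DIRECTORY OSCH_BIN_PATH AS '/osch/orahdfs-3.0.0/bin';
-- Grant the OE schema access to both directories
GRANT READ, WRITE ON DIRECTORY OSCH_EXTTAB_DIR TO OE;
GRANT READ, EXECUTE ON DIRECTORY OSCH_BIN_PATH TO OE;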
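Starting the daemons and copying the data file might look as follows, a sketch for the single-node CDH 4.6 setup used here; adjust the paths to your installation:

# Start the HDFS daemons (single-node setup)
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
# Create the data directory in HDFS and copy the data file into it
hadoop dfs -mkdir hdfs://10.0.2.15:8020/catalog
hadoop dfs -put catalog.txt hdfs://10.0.2.15:8020/catalog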
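The catalog.txt file is the same comma-delimited file used in the preceding article; as the query results later in this article show, it contains rows such as:

1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench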
Creating a Hive Table

The Hive database and table are created in the Hive warehouse directory, which is /user/hive/warehouse by default and is set with the hive.metastore.warehouse.dir property in the Hive configuration file hive-site.xml. Specify the HDFS path of the Hive warehouse directory in hive-site.xml.

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://10.0.2.15:8020/user/hive/warehouse</value>
</property>

Create the /user/hive/warehouse directory in HDFS and set its permissions and ownership.

hadoop dfs -mkdir hdfs://10.0.2.15:8020/user/hive/warehouse
hadoop dfs -chmod -R g+w hdfs://10.0.2.15:8020/user/hive/warehouse
hadoop dfs -chown -R hive:hive hdfs://10.0.2.15:8020/user/hive/warehouse

Start the Hive client and create the Hive database with the following command.

hive> CREATE DATABASE catalog;

Create a Hive external table, with its location set to the HDFS path of the catalog.txt file, with the following command.

hive> CREATE EXTERNAL TABLE catalog.catalog(
  CATALOGID INT,
  JOURNAL STRING,
  PUBLISHER STRING,
  EDITION STRING,
  TITLE STRING,
  AUTHOR STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://10.0.2.15:8020/catalog';

A Hive database and table get created.
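As a quick check, not required for the rest of the article, the new table can be queried from the Hive client to confirm that it reads catalog.txt:

hive> DESCRIBE catalog.catalog;
hive> SELECT * FROM catalog.catalog;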
Creating the Configuration File for the ExternalTable Tool

The configuration properties for the OSCH ExternalTable tool are specified in an XML configuration file or with -D on the command line. We shall use the same XML configuration file, catalog_hdfs.xml, but some of the properties are different for Hive. For Hive input the following configuration properties are required.

oracle.hadoop.exttab.tableName: the schema-qualified name of the external table; the schema defaults to the user name if omitted. Value used: OE.CATALOG_EXT
oracle.hadoop.exttab.defaultDirectory: the Oracle Database directory object for the external table. Value used: OSCH_EXTTAB_DIR, created earlier
oracle.hadoop.exttab.sourceType: the source type for OSCH. Value used: hive
oracle.hadoop.exttab.hive.tableName: the Hive table name, without a database prefix. Value used: catalog
oracle.hadoop.exttab.hive.databaseName: the Hive database name. Value used: catalog
oracle.hadoop.connection.url: the connection URL for Oracle Database. Value used: jdbc:oracle:thin:@localhost:1521:orcl
oracle.hadoop.connection.user: the Oracle Database user. Value used: OE

The configuration file catalog_hdfs.xml, with all the required properties and some optional properties, is listed below; copy the file to the /osch directory.

<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>OE.CATALOG_EXT</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.locationFileCount</name>
    <value>4</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.defaultDirectory</name>
    <value>OSCH_EXTTAB_DIR</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.columnNames</name>
    <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.colMap.CATALOGID.columnType</name>
    <value>NUMBER</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.colMap.JOURNAL.columnType</name>
    <value>VARCHAR2</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.colMap.PUBLISHER.columnType</name>
    <value>VARCHAR2</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.colMap.EDITION.columnType</name>
    <value>VARCHAR2</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.colMap.TITLE.columnType</name>
    <value>VARCHAR2</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.colMap.AUTHOR.columnType</name>
    <value>VARCHAR2</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.sourceType</name>
    <value>hive</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.tableName</name>
    <value>catalog</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.databaseName</name>
    <value>catalog</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@localhost:1521:orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.user</name>
    <value>OE</value>
  </property>
</configuration>

Creating the External Table

Next, run the ExternalTable tool to create an external table for the Hive table catalog.catalog. Run the -createTable command to create the external table and populate its location file with the URIs of the Hive table's data files.

hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -createTable

OSCH starts and prompts for the Oracle Database schema password. Specify the OE schema password and press Enter. The -createTable command creates an external table in the OE schema, including a location file with the URI of the data file in HDFS. The output from the command is listed:

[root@localhost osch]# hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -createTable
Oracle SQL Connector for HDFS Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]
14/05/13 19:12:01 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/05/13 19:12:01 INFO metastore.ObjectStore: ObjectStore, initialize called
14/05/13 19:12:02 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/05/13 19:12:08 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/05/13 19:12:09 INFO metastore.ObjectStore: Initialized ObjectStore
14/05/13 19:12:12 INFO metastore.HiveMetaStore: 0: get_table : db=catalog tbl=catalog
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/osch/hadoop-2.0.0-cdh4.6.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/osch/hive-0.10.0-cdh4.6.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/05/13 19:12:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/13 19:12:12 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=catalog tbl=catalog
14/05/13 19:12:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/05/13 19:12:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
The create table command succeeded.

CREATE TABLE "OE"."CATALOG_EXT"
(
 "CATALOGID" NUMBER,
 "JOURNAL" VARCHAR2(4000),
 "PUBLISHER" VARCHAR2(4000),
 "EDITION" VARCHAR2(4000),
 "TITLE" VARCHAR2(4000),
 "AUTHOR" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY "OSCH_EXTTAB_DIR"
   ACCESS PARAMETERS
   (
     RECORDS DELIMITED BY 0X'0A'
     CHARACTERSET AL32UTF8
     PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
     FIELDS TERMINATED BY 0X'2C'
     MISSING FIELD VALUES ARE NULL
     (
       "CATALOGID" CHAR NULLIF "CATALOGID"=0X'5C4E',
       "JOURNAL" CHAR(4000) NULLIF "JOURNAL"=0X'5C4E',
       "PUBLISHER" CHAR(4000) NULLIF "PUBLISHER"=0X'5C4E',
       "EDITION" CHAR(4000) NULLIF "EDITION"=0X'5C4E',
       "TITLE" CHAR(4000) NULLIF "TITLE"=0X'5C4E',
       "AUTHOR" CHAR(4000) NULLIF "AUTHOR"=0X'5C4E'
     )
   )
   LOCATION
   (
     'osch-20140513071215-4941-1'
   )
) PARALLEL REJECT LIMIT UNLIMITED;

The following location files were created.
osch-20140513071215-4941-1 contains 1 URI, 266 bytes
     266 hdfs://10.0.2.15:8020/catalog/catalog.txt

An external table OE.CATALOG_EXT gets created. The data file URI must be an hdfs:// URI. Run a DESC statement in SQL*Plus to list the description of the external table. Create an Oracle Database table OE.CATALOG from the external table OE.CATALOG_EXT, and run a SELECT statement in SQL*Plus to list the Hive data, as sketched below.
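A minimal sketch of the SQL*Plus session for these steps, connected as the OE user; the exact statements are not part of the tool output, so this is illustrative:

-- Describe the structure of the external table
SQL> DESC OE.CATALOG_EXT

-- Copy the external table rows into a regular database table
SQL> CREATE TABLE OE.CATALOG AS SELECT * FROM OE.CATALOG_EXT;

-- List the Hive data
SQL> SELECT * FROM OE.CATALOG;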
The complete output from the SELECT statement query is listed:

SQL> SELECT * FROM OE.CATALOG;

 CATALOGID
----------
JOURNAL
--------------------------------------------------------------------------------
PUBLISHER
--------------------------------------------------------------------------------
EDITION
--------------------------------------------------------------------------------
TITLE
--------------------------------------------------------------------------------
AUTHOR
--------------------------------------------------------------------------------
         1
Oracle Magazine
Oracle Publishing
Nov-Dec 2004
Database Resource Manager
Kimberly Floss

         2
Oracle Magazine
Oracle Publishing
Nov-Dec 2004
From ADF UIX to JSF
Jonas Jacobi

         3
Oracle Magazine
Oracle Publishing
March-April 2005
Starting with Oracle ADF
Steve Muench

Describing the Tables

Run the -describe command using the ExternalTable tool to describe the external table OE.CATALOG_EXT. The output from the -describe command is listed:

[root@localhost osch]# hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -describe
Oracle SQL Connector for HDFS Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]
The described object is "OE"."CATALOG_EXT"

CREATE TABLE "OE"."CATALOG_EXT"
(
 "CATALOGID" NUMBER,
 "JOURNAL" VARCHAR2(4000),
 "PUBLISHER" VARCHAR2(4000),
 "EDITION" VARCHAR2(4000),
 "TITLE" VARCHAR2(4000),
 "AUTHOR" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY "OSCH_EXTTAB_DIR"
   ACCESS PARAMETERS
   (
     RECORDS DELIMITED BY 0X'0A'
     CHARACTERSET AL32UTF8
     PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
     FIELDS TERMINATED BY 0X'2C'
     MISSING FIELD VALUES ARE NULL
     (
       "CATALOGID" CHAR NULLIF "CATALOGID"=0X'5C4E',
       "JOURNAL" CHAR(4000) NULLIF "JOURNAL"=0X'5C4E',
       "PUBLISHER" CHAR(4000) NULLIF "PUBLISHER"=0X'5C4E',
       "EDITION" CHAR(4000) NULLIF "EDITION"=0X'5C4E',
       "TITLE" CHAR(4000) NULLIF "TITLE"=0X'5C4E',
       "AUTHOR" CHAR(4000) NULLIF "AUTHOR"=0X'5C4E'
     )
   )
   LOCATION
   (
     'osch-20140513071215-4941-1'
   )
) REJECT LIMIT UNLIMITED PARALLEL

Listing Location Files for external table: [CATALOG_EXT]
osch-20140513071215-4941-1 contains 1 URI, 266 bytes
     266 hdfs://10.0.2.15:8020/catalog/catalog.txt

Listing the Location Files

Run the -listLocations command to list the location files for the external table OE.CATALOG_EXT.

hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -listLocations

The output from the command lists one location file with an HDFS data file URI.

Getting the DDL

Run the -getDDL command to get the DDL for the external table.

hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -getDDL

The output from the -getDDL command is listed:

[root@localhost osch]# hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -getDDL
Oracle SQL Connector for HDFS Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]

CREATE TABLE "OE"."CATALOG_EXT"
(
 "CATALOGID" NUMBER,
 "JOURNAL" VARCHAR2(4000),
 "PUBLISHER" VARCHAR2(4000),
 "EDITION" VARCHAR2(4000),
 "TITLE" VARCHAR2(4000),
 "AUTHOR" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY "OSCH_EXTTAB_DIR"
   ACCESS PARAMETERS
   (
     RECORDS DELIMITED BY 0X'0A'
     CHARACTERSET AL32UTF8
     PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
     FIELDS TERMINATED BY 0X'2C'
     MISSING FIELD VALUES ARE NULL
     (
       "CATALOGID" CHAR NULLIF "CATALOGID"=0X'5C4E',
       "JOURNAL" CHAR(4000) NULLIF "JOURNAL"=0X'5C4E',
       "PUBLISHER" CHAR(4000) NULLIF "PUBLISHER"=0X'5C4E',
       "EDITION" CHAR(4000) NULLIF "EDITION"=0X'5C4E',
       "TITLE" CHAR(4000) NULLIF "TITLE"=0X'5C4E',
       "AUTHOR" CHAR(4000) NULLIF "AUTHOR"=0X'5C4E'
     )
   )
   LOCATION
   (
     'osch-20140513071215-4941-1'
   )
) REJECT LIMIT UNLIMITED PARALLEL

Publishing the Hive Table Data

The -createTable command backs the external table with the Hive table data as it exists at creation time. If the location files need to be updated, for example because the Hive table data has changed, run the -publish command to generate new location files and delete the previous ones. Run the -publish command.

hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -publish

The OE.CATALOG_EXT external table gets altered and a new location file gets created.

[root@localhost osch]# hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -publish
Oracle SQL Connector for HDFS Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]
14/05/13 19:21:04 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/05/13 19:21:05 INFO metastore.ObjectStore: ObjectStore, initialize called
14/05/13 19:21:05 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/05/13 19:21:09 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/05/13 19:21:09 INFO metastore.ObjectStore: Initialized ObjectStore
14/05/13 19:21:12 INFO metastore.HiveMetaStore: 0: get_table : db=catalog tbl=catalog
14/05/13 19:21:12 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=catalog tbl=catalog
14/05/13 19:21:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/05/13 19:21:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
The publish command succeeded.

ALTER TABLE "OE"."CATALOG_EXT"
LOCATION
(
  'osch-20140513072115-9620-1'
);

The following location files were created.
osch-20140513072115-9620-1 contains 1 URI, 266 bytes
     266 hdfs://10.0.2.15:8020/catalog/catalog.txt

The following location files were deleted.
osch-20140513071215-4941-1 was deleted.
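To illustrate when -publish is useful: if another data file were added to the Hive table's HDFS location, a hypothetical catalog2.txt in this sketch, re-running -publish would pick it up and regenerate the location files.

# Add another data file to the Hive table location (catalog2.txt is illustrative)
hadoop dfs -put catalog2.txt hdfs://10.0.2.15:8020/catalog
# Regenerate the location files for the external table
hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -Doracle.hadoop.exttab.printStackTrace=true -conf catalog_hdfs.xml -publish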
In this article we created an Oracle Database external table with an Apache Hive table as the data source.