Simba Apache Hive ODBC Driver Quickstart Guide

Revised: September 10, 2014

Contents

Purpose
Do you need Hive?
Do you need sample data in Hive?
Install the Simba Apache Hive ODBC Driver
Connect from Excel
Connect from Tableau
Troubleshooting
    Architecture Mismatch Problems

For assistance at any point in this installation process, please contact Simba for free Engineering Level Support at: [email protected].

Purpose

This document is intended for users of the Simba Apache Hive ODBC Driver. The following sections outline how to set up your Windows environment quickly so that you can evaluate and use the driver.

Use the following flow chart to determine which section of the guide to start with.

Do you need Hive?

How to use Hive?

To use Hive, you need to have a Hadoop installation already set up.

a. Download Hadoop from the Apache Hadoop web site: https://hadoop.apache.org/releases.html

b. Follow the install guides at https://hadoop.apache.org/docs/current/.

How to download Apache Hive for free:

a. Visit https://hive.apache.org/releases.html

b. Follow the install guides at: https://cwiki.apache.org/confluence/display/Hive/GettingStarted.

Download options for preconfigured distributions:

Hortonworks: http://hortonworks.com/products/hortonworks-sandbox/

Cloudera: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html

MapR: http://www.mapr.com/doc/display/MapR/Quick+Start+-+Test+Drive+MapR+on+a+Virtual+Machine

Amazon EMR: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html

Confirm that Apache Hive is installed and running:

Step 1: Start Hadoop, open the command line interface on the machine that Hadoop is running on, and type the following command:

jps

This lists the running Java processes, which include the Hadoop daemons. You should see output similar to the following:

Note: You may also see a SecondaryNameNode item listed if you are configured to use secondary name nodes.

Step 2: To confirm that Hive is running, run the following command:

netstat -nl | grep <port>

where <port> is the port that Hive was started on. You should see output similar to the following:

This indicates that Hive is listening on an open port. Note: 10000 is the default port for Hive.
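
The netstat check above runs on the Hive machine itself. As a complementary check from the Windows client, the following is a minimal sketch (not part of the driver or of Hive) that tests whether a TCP connection to the Hive port can be opened. The function name is illustrative, and a successful connection only shows that something is listening on that port, not that it is Hive.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, is_port_open("<hive host>", 10000) should return True when Hive is reachable on its default port, where <hive host> is the IP address or hostname of your Hive server. A False result from the client combined with a positive netstat result on the server may point to a firewall or network issue rather than a Hive problem.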

Do you need sample data in Hive?

Note: The following steps import some trial data if you do not already have data to use. The Hadoop and Hive configuration used here is on Linux; however, the steps should be the same if hosted on Windows.

How to get a sample data set for Hive?

Step 1:

Download the sample data set from:

http://www.simba.com/wp-content/uploads/2013/10/FAA_Hive.zip.

Note: This is a modified version of the FAA data set. The original is available here: http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time

Step 2:

Extract the downloaded zip file.

Note: The HiveSchema.txt file contains the Hive CREATE TABLE commands for each of the CSV files in the data set, in case you wish to load tables other than the Airline table shown here.

Step 3:

Start Hadoop and Hive as directed in the respective user guides.

Step 4:

Run the Hive shell.

Step 5:

In the shell, create a table to hold the data from the Airline.csv file by typing the following:

create table Airline(
    UNIQUE_CARRIER string,
    AIRLINE_ID int,
    CARRIER string,
    TAIL_NUM string,
    FL_NUM string
) row format delimited fields terminated by ',' stored as textfile;

Step 6:

Import the data by running the following command:

load data local inpath '<path>/Airline.csv' into table Airline;

where <path> is the full path to the Airline.csv file that was extracted from the zip file.

Step 7:

Confirm that the data has been imported into Hive by running the following command:

select * from Airline;
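
If the query in Step 7 returns unexpected NULL values, one possible cause is a row whose number of comma-separated fields does not match the five columns declared in Step 5. The following is a minimal sketch, assuming a UTF-8 text file, that reports such rows before you run the load in Step 6. It splits on every comma, which mirrors the "fields terminated by ','" clause rather than CSV quoting rules. The function name and the constant are illustrative and are not part of Hive.

```python
EXPECTED_COLUMNS = 5  # UNIQUE_CARRIER, AIRLINE_ID, CARRIER, TAIL_NUM, FL_NUM


def find_bad_rows(csv_path, expected=EXPECTED_COLUMNS):
    """Return (line_number, field_count) for lines with an unexpected field count."""
    bad = []
    with open(csv_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Split on raw commas, the same delimiter the table definition uses.
            field_count = len(line.rstrip("\r\n").split(","))
            if field_count != expected:
                bad.append((line_number, field_count))
    return bad
```

An empty list means every line has exactly five fields. Any entries in the list identify lines worth inspecting by hand before loading.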

Install the Simba Apache Hive ODBC Driver

How to get the Simba Apache Hive Driver?

Step 1: Download the driver from: http://www.simba.com/connectors/apache-hadoop-hive-odbc

Choose either the 32-bit or 64-bit version as appropriate. Follow the steps in the install guide (http://www.simba.com/wp-content/uploads/2013/05/Simba-ODBC-Driver-for-Hive-Install-Guide1.pdf) to install the driver on your machine.

Step 2: You should receive an email with a license key attached. Place this license key alongside the driver DLL in the installation directory. By default, the license directories are:

C:\Program Files\Simba Hive ODBC Driver\lib

32-bit driver on 64-bit Windows: C:\Program Files (x86)\Simba Hive ODBC Driver\lib

Note: If you use Outlook, you may need to save the license file locally before placing it in the above directory to avoid permission problems.
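
The choice between the two default directories above can be written as a small lookup. This is a sketch only: the two paths are the defaults quoted in this guide, the function name is illustrative, and a custom installation location would not be covered.

```python
from pathlib import PureWindowsPath

# Default license directories quoted from this guide.
NATIVE_LIB = PureWindowsPath(r"C:\Program Files\Simba Hive ODBC Driver\lib")
WOW64_LIB = PureWindowsPath(r"C:\Program Files (x86)\Simba Hive ODBC Driver\lib")


def default_license_dir(driver_bits: int, windows_bits: int) -> PureWindowsPath:
    """Pick the default license directory for a driver/Windows bitness pair."""
    if driver_bits not in (32, 64) or windows_bits not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    if driver_bits > windows_bits:
        raise ValueError("a 64-bit driver cannot run on 32-bit Windows")
    # Only a 32-bit driver on 64-bit Windows uses the (x86) directory.
    if driver_bits == 32 and windows_bits == 64:
        return WOW64_LIB
    return NATIVE_LIB
```

For example, default_license_dir(32, 64) returns the "Program Files (x86)" directory, while every other valid combination returns the "Program Files" directory.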

How to Configure the DSN (Data Source Name)?

Step 1: Open the ODBC Administrator.

Note: Use the ODBC Administrator that matches the bitness of the driver you are using. See http://www.simba.com/wp-content/uploads/2010/10/HOW-TO-32-bit-vs-64-bit-ODBC-Data-Source-Administrator.pdf for information.
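
As background to the note above: on 64-bit Windows, the 64-bit ODBC Administrator is normally found at System32\odbcad32.exe under the Windows directory, and the 32-bit one at SysWOW64\odbcad32.exe. These are standard Windows locations and do not come from this guide. The sketch below encodes that rule; the function name and the default Windows directory are assumptions for illustration.

```python
from pathlib import PureWindowsPath


def odbc_administrator_path(driver_bits, windows_bits, windir=r"C:\Windows"):
    """Return the usual odbcad32.exe location for the given driver bitness."""
    root = PureWindowsPath(windir)
    # On 64-bit Windows, 32-bit system tools live under SysWOW64.
    if driver_bits == 32 and windows_bits == 64:
        return root / "SysWOW64" / "odbcad32.exe"
    # Matching bitness (32 on 32, or 64 on 64) uses System32.
    return root / "System32" / "odbcad32.exe"
```

Launching the executable returned for your driver's bitness opens the Administrator in which that driver and its DSNs are visible.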

Step 2:

Choose the System DSN tab.

Step 3:

Choose the “Sample Simba Hive DSN” and press “Configure…”

Step 4:

Change the Host to the IP address or hostname of your Hive server.

Change the Port to the port that Hive is running on. Note: 10000 is the default port for Hive.

Change the Database to “faa” to access the “airline” table that was created earlier, or to the name of the database that contains your data.

The Username is only relevant if Hive is configured to use authentication. If it is, check “Enable authentication” under the “Advanced Options” section.

Step 5:

Press the “Test” button to confirm that your configuration is correct.
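
Once the test succeeds, the same DSN can also be exercised from a script. The following is a minimal sketch, assuming the third-party pyodbc package is installed and that the Python interpreter has the same bitness as the driver. DSN, UID and PWD are standard ODBC connection-string keywords; the function names are illustrative, and the table name is inserted into the query as-is, so pass only names you trust.

```python
def build_connection_string(dsn="Sample Simba Hive DSN", uid=None, pwd=None):
    """Build a DSN-based ODBC connection string; UID and PWD are optional."""
    parts = [f"DSN={dsn}"]
    if uid:
        parts.append(f"UID={uid}")
    if pwd:
        parts.append(f"PWD={pwd}")
    return ";".join(parts)


def preview_table(table="airline", limit=5, conn_str=None):
    """Fetch a few rows through the DSN. Requires the pyodbc package."""
    import pyodbc  # imported lazily so the helper above works without it

    conn_str = conn_str or build_connection_string()
    # Open in autocommit mode, since Hive connections typically do not
    # support ODBC transactions.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
        return cursor.fetchall()
    finally:
        conn.close()
```

Calling preview_table() with the defaults queries the “airline” table through the “Sample Simba Hive DSN”. If the call fails with an architecture mismatch error, see the Troubleshooting section.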

Connect from Excel

How do I connect and make a basic query with the Simba Apache Hive Driver from Excel?

Note: The version of Excel used here is Excel 2010 32-bit; however, the driver works with any version and bitness of Excel, as long as the bitness of the driver matches the bitness of Excel.

Step 1:

Open Excel.

Step 2:

Choose the Data tab, then choose “From Other Data Sources” and select “From Data Connection Wizard”.

Step 3:

Choose “ODBC DSN” from the list and press “Next >”.

DSN stands for Data Source Name, which is what you set up when installing and configuring the driver. Essentially, it is a preconfigured, stored set of connection settings that lets you easily connect a driver to the data source.

Step 4:

Choose the “Sample Simba Hive DSN” (or the DSN that you have created and configured) from the list and press “Next >”.

a. If you see the following image, the connection to the driver did not succeed.

b. Pressing the “Test Connection” button displays the following dialog.

c. Issue Diagnosis: Bitness is incorrectly matched.

You must match the bitness of the application with the bitness of the driver to connect correctly: use 32-bit Excel with a 32-bit driver, or 64-bit Excel with a 64-bit driver. See Architecture Mismatch Problems in the Troubleshooting section for more information.

Step 5:

Choose the table that you wish to query and press the “Finish” button.

Step 6:

Choose the location for your returned data. Leave it as “=$A$1” and press the “OK” button.

Step 7:

Wait while data is returned.

Congratulations, your data is now available from Excel.

Connect from Tableau

How do I connect and make a basic query with the Simba Hive Driver from Tableau?

Note: The version of Tableau used here is 8.0; however, the driver should work without problems in Tableau 7 as well.

Step 1:

Open Tableau.

Step 2:

Select “Connect to data”.

Step 3:

Select the “Other Databases (ODBC)” option at the bottom of the list.

Note: The Simba Apache Hive ODBC Driver will work with all distributions of Hadoop and Hive.

To use the Cloudera distribution:

Download the “Cloudera Hadoop” ODBC driver for use with Tableau at: http://go.cloudera.com/odbc-driver-hive-impala.html.

To use the Hortonworks distribution:

Download the “Hortonworks Hadoop Hive” ODBC driver for use with Tableau at: http://hortonworks.com/download/download-archives/

To use the MapR distribution:

Download the “MapR Hadoop Hive” ODBC driver for use with Tableau at: http://www.mapr.com/doc/display/MapR/Hive+ODBC+Connector

To use the Amazon EMR distribution:

Download the “Amazon EMR” ODBC driver for use with Tableau at: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-bi-tools.html

Step 4:

Select the “Sample Simba Hive DSN” (or the DSN that you have created and configured) from the DSN drop-down and press the “Connect” button.

a. You may see the following dialog if there is an error connecting to the driver.

b. Issue Diagnosis: Bitness is incorrectly matched.

If the connection fails, confirm that the bitness of the driver that you are using matches the bitness of Tableau: the 32-bit version of Tableau requires the 32-bit version of the driver, and the 64-bit version of Tableau requires the 64-bit version of the driver. For further details on diagnosing the issue, see Architecture Mismatch Problems in the Troubleshooting section.

Step 5:

Select the “Single Table” option and press the magnifying glass to open the list of tables.

Step 6:

Choose one of the tables to query. If the “Load” link is shown, press it to load all of the tables to select from. Press the “Select” button when done.

Step 7:

Press OK to load the selected table.

Step 9:

If a warning comes up, press OK. This warning is displayed because Tableau sees only a generic ODBC driver and attempts to determine the driver's capabilities on its own. The warning does not affect how the driver operates in Tableau.

Step 10:

Choose to connect live so that Tableau does not import all of the data and you work directly on the data in Hadoop.

Step 11:

The table is loaded into Tableau with the columns listed as dimensions or measures, depending on data type. From here, you can create reports on the table you selected.

Congratulations, your data is now available for visualization in Tableau.

Troubleshooting

Architecture Mismatch Problems

If you encounter an error message similar to “The specified DSN contains an architecture mismatch between the Driver and Application”, then the bitness of the application does not match the bitness of the driver. You are likely connecting a 32-bit application to a 64-bit driver, or vice versa. Please ensure that the bitness of your application matches the bitness of the driver that you are trying to use.
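
The same rule applies to any ODBC client, not only Excel and Tableau. As an illustration, the following sketch reports the bitness of the running Python interpreter, which is what must match the driver if you connect from a Python script. It relies only on the standard library; the function names are illustrative.

```python
import struct
import sys


def process_bits() -> int:
    """Return 32 or 64, based on the pointer size of this interpreter."""
    # "P" is the struct format code for a native pointer.
    return struct.calcsize("P") * 8


def matches_driver(driver_bits: int) -> bool:
    """Return True if this process has the same bitness as the driver."""
    return process_bits() == driver_bits
```

For example, matches_driver(32) returns False in a 64-bit interpreter, which is the situation that produces the architecture mismatch error with a 32-bit driver.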

To determine the bitness of Excel:

Excel 2007 and earlier

These versions of Excel are strictly 32-bit.

Excel 2010

Step 1: Click on the “File” tab.

Step 2: Click on the “Help” item on the left-hand side.

Step 3: Look at the version that is displayed on the Help page. If Excel is 32-bit, it shows “(32-bit)” (as pictured); if it is 64-bit, it shows “(64-bit)”.

Excel 2013

Step 1: Click on the “FILE” tab.

Step 2: Click on the “Account” item on the left-hand side.

Step 3: Click on the “About Excel” button on the right side.

Step 4: Check the version string in the pop-up dialog. If Excel is 32-bit, the string contains “32-bit”; if it is 64-bit, the string contains “64-bit”.

To determine the bitness of Tableau:

In Tableau, click the Help menu, and then click About Tableau.

The About Tableau dialog displays the bitness of your Tableau application in the top right corner, next to the version number.