Simba Apache HBase ODBC Driver Quickstart Guide

Revised: October 21, 2013

Contents

Purpose
Do you need HBase?
Do you need sample data in HBase?
Install the Simba Apache HBase ODBC Driver
Connect from Excel
Connect from Tableau
Troubleshooting
    Architecture Mismatch Problems

For assistance at any point in this installation process, please contact Simba for free Engineering Level Support at: [email protected].

Purpose

This document is intended for users of the Simba Apache HBase ODBC Driver. The following sections outline how to set up your Windows environment quickly so that you can evaluate and use the driver.

Use the following flow chart to determine which section of the guide to start with.

Do you need HBase?

How to Use HBase?

To use HBase, you need to have a Hadoop installation already set up.

a. Download Hadoop from the Apache Hadoop web site: https://hadoop.apache.org/releases.html
b. Follow the install guides at https://hadoop.apache.org/docs/current/.

How to Download Apache HBase for free:

a. Visit https://www.apache.org/dyn/closer.cgi/hbase/.
b. Follow the install guides at: https://hbase.apache.org/book/quickstart.html

Download Options for pre-setup distributions:

• Hortonworks: http://hortonworks.com/products/hortonworks-sandbox/
• Cloudera: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
• MapR: http://www.mapr.com/doc/display/MapR/Quick+Start+-+Test+Drive+MapR+on+a+Virtual+Machine

Confirm Apache HBase is installed and running:

Step 1: Start Hadoop, open the command line interface on the machine that Hadoop is running on, and type the following command:

jps

This lists the running Java processes, including the Hadoop daemons. You should see output similar to the following:

Note: You may also see a SecondaryNameNode item listed if you are configured to use secondary name nodes.

Step 2: To start HBase, use the following command:

start-hbase.sh

Step 3: After this, start the REST service with the following command:

hbase-daemons.sh start rest

Note: The default port is 8080.

• To change this, use -p <port> in the start command.
• To confirm that the HBase REST service is running, run the following command:

netstat -nl | grep <port>

where <port> is the port that the REST service was started on. You should see output similar to the following:

This indicates that HBase is indeed listening on an open port.
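The same check can also be made from the Windows machine that will run the driver, which additionally confirms that the port is reachable over the network. The following is a minimal sketch in Python and is not part of HBase or of the driver; the function name is_port_open is illustrative only.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, is_port_open("hbase-server.example.com", 8080) should return True once the REST service is up, where the host name is a placeholder and 8080 is the default port noted above. A successful connection only shows that something is listening on the port, not that it is the HBase REST service.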

Do you need sample data in HBase?

The following steps will import some trial data if you do not already have data to use.

How to get a sample data set for HBase?

Step 1: Download the sample data set from: http://www.simba.com/wp-content/uploads/2013/10/FAA_HBase.zip

Note: This is a modified version of the FAA data set, the original of which is available here: http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time

Step 2: Extract the zip file that was downloaded.

Step 3: Start Hadoop and HBase as directed in the respective user guides.

Step 4: Create a directory for the data files in Hadoop with the following command:

hadoop fs -mkdir /user/data/faa

Step 5: Copy over the Airline.csv file to the Hadoop file system using the following command:

hadoop fs -copyFromLocal Airline.csv /user/data/faa/Airline.csv

Step 6: Next, create the table that will hold the imported data with the following commands:

hbase shell
create 'Airline', 'c'
quit

Step 7: Import the data from the Hadoop file system CSV file with the following command:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,c:unique_carrier,c:airline_id,c:carrier,c:tail_num,c:fl_num Airline /user/data/faa/Airline.csv
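The column list in this command maps CSV fields to HBase columns by position: HBASE_ROW_KEY marks the field used as the row key, and every other entry names a column in the 'c' family created in Step 6. As an illustration of that mapping, the sketch below assembles the same command from a list of qualifiers. It is written in Python under the assumption that you want to script the import; build_importtsv_command is a hypothetical helper and is not part of HBase.

```python
def build_importtsv_command(table, hdfs_path, qualifiers, family="c", separator=","):
    """Build the argument list for the ImportTsv job shown in Step 7."""
    # The first CSV field becomes the row key; each remaining field is
    # stored as family:qualifier, in the same order as in the file.
    columns = ["HBASE_ROW_KEY"] + [f"{family}:{name}" for name in qualifiers]
    return [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        f"-Dimporttsv.separator={separator}",
        "-Dimporttsv.columns=" + ",".join(columns),
        table,
        hdfs_path,
    ]
```

Passing the returned list to subprocess.run(cmd, check=True) on the machine where the hbase command is on the PATH would run the import. Because the arguments are passed as a list rather than through a shell, the separator needs no extra quoting.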

Step 8: The import should run and you should see a finished screen similar to the below:

Step 9: To confirm that the data was imported correctly, open the HBase shell again and type the following commands:

t = get_table 'Airline'
t.scan

This scans through the table you created, showing the rows that you have imported. At the end of the scan, you should see something like this:

Install the Simba Apache HBase ODBC Driver

How to Get the Simba Apache HBase Driver?

Step 1: Download from: http://www.simba.com/connectors/HBase-odbc

Choose either the 32-bit or the 64-bit version as appropriate. Follow the steps in the install guide (http://www.simba.com/wp-content/uploads/2013/05/Simba-HBase-ODBC-Driver-User-Guide.pdf) to install the driver on your machine.

Step 2: You should receive an email with a license key attached. This license key should be placed alongside the driver DLL in the installation directory. By default, the directories for the licenses are:

• C:\Program Files\Simba Apache HBase ODBC Driver\lib
• 32-bit driver on 64-bit Windows: C:\Program Files (x86)\Simba Apache HBase ODBC Driver\lib

Note: If you use Outlook, you may need to save the license file locally before placing it in the above directory to avoid permission problems.
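The choice between the two default directories can be expressed as a small lookup. This is an illustrative Python sketch only: it assumes the driver was installed to its default location and that the first directory applies whenever the driver bitness matches the Windows bitness. The function name default_license_dir is hypothetical.

```python
from pathlib import PureWindowsPath

DRIVER_FOLDER = "Simba Apache HBase ODBC Driver"


def default_license_dir(driver_bits: int, windows_bits: int) -> PureWindowsPath:
    """Return the default license directory for a driver/Windows combination."""
    if driver_bits not in (32, 64) or windows_bits not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    if driver_bits == 64 and windows_bits == 32:
        raise ValueError("a 64-bit driver cannot be installed on 32-bit Windows")
    # Only a 32-bit driver on 64-bit Windows lands under "Program Files (x86)".
    if driver_bits == 32 and windows_bits == 64:
        root = "Program Files (x86)"
    else:
        root = "Program Files"
    return PureWindowsPath("C:/") / root / DRIVER_FOLDER / "lib"
```

The license file saved from the email can then be copied into the returned directory, for example with shutil.copy, from a session that has permission to write there.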

How to Configure the DSN (Data Source Name)?

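Step 1: Open the ODBC Administrator.

Note: Use the ODBC Administrator that matches the bitness of the driver you are using. On 64-bit Windows, the 64-bit administrator is C:\Windows\System32\odbcad32.exe and the 32-bit administrator is C:\Windows\SysWOW64\odbcad32.exe. See http://www.simba.com/wp-content/uploads/2010/10/HOW-TO-32-bit-vs-64-bit-ODBC-Data-Source-Administrator.pdf for more information.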

Step 2: Choose the System DSN tab.

Step 3: Choose the “Sample Simba HBase DSN” and press “Configure…”

Step 4:

• Change the Host to the IP address or hostname of your HBase server.
• Change the Port to the port that the HBase REST service is running on.

Note: 8080 is the default port for the HBase REST service.

Step 5: Press the “Test” button to confirm that your configuration is correct. You should see a dialog that indicates a successful test and lists some of the tables in HBase.

Step 6: Press the “OK” button to save the configuration.
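Once the DSN is saved, any ODBC application can use it by name. As a quick check outside Excel or Tableau, the sketch below lists the tables that the driver reports. It is a minimal example rather than a supported workflow, and it assumes Python with the third-party pyodbc package installed; the helper names are illustrative, and the Python interpreter must have the same bitness as the driver.

```python
def build_connection_string(dsn: str) -> str:
    """Return an ODBC connection string that refers to a configured DSN."""
    return f"DSN={dsn}"


def list_tables(dsn: str) -> list:
    """Connect through the DSN and return the table names the driver reports."""
    import pyodbc  # third-party package (assumption): pip install pyodbc

    conn = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        # cursor.tables() wraps the ODBC SQLTables catalog call.
        return [row.table_name for row in cursor.tables()]
    finally:
        conn.close()
```

Calling list_tables with the name of the DSN configured above should return a list that includes the Airline table if the sample data was imported.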

Connect from Excel

How do I connect and make a basic query from Excel with the Simba Apache HBase ODBC Driver?

Note: The version of Excel used here is Excel 2010 32-bit; however, the driver will work with any version and bitness of Excel, as long as the bitness of the driver matches that of Excel.

Step 1: Open Excel.

Step 2: Choose the Data tab, then choose “From Other Data Sources” and select “From Data Connection Wizard”.

Step 3: Choose “ODBC DSN” from the list and press “Next >”. DSN stands for Data Source Name, which is what was configured when installing and configuring the driver. Essentially, it’s a preconfigured and stored set of connection settings which allow you to easily connect a driver to the data source.

Step 4: Choose the “Sample Simba Apache HBase DSN” (or the DSN that you have created and configured) from the list and press “Next >”.

a. If you see the following image, the connection to the driver did not succeed.

b. Pressing the “Test Connection” button will give you the following dialog.

c. Issue Diagnosis: Bitness is incorrectly matched. The bitness of the application must match the bitness of the driver in order to connect: use 32-bit Excel with a 32-bit driver, or 64-bit Excel with a 64-bit driver. See Architecture Mismatch Problems in the Troubleshooting section for more information.

Step 5: Choose the table that you wish to query and press the “Finish” button.

Step 6: Choose the location for your returned data. Leave it as “=$A$1” and press the “OK” button.

Step 7: Wait while data is returned.

Congratulations, your data is now available from Excel.

Connect from Tableau

How do I connect and make a basic query from Tableau with the Simba Apache HBase ODBC Driver?

Note: The version of Tableau used here is 8.0; however, the driver should also work without problems in Tableau 7.

Step 1: Open Tableau.

Step 2: Select “Connect to data”.

Step 3: Choose “Other Databases (ODBC)” at the bottom of the list.

Step 4: Select the “Sample Simba Apache HBase DSN” (or the DSN that you have created and configured) from the DSN drop-down and press the “Connect” button.

a. You may see the following dialog if there is an error connecting to the driver.

b. Issue Diagnosis: The bitness is incorrect because the 64-bit driver is being used.

At the time of this writing, Tableau is only available as a 32-bit application, so you must use a 32-bit driver. You can confirm the bitness of the driver by pressing the “Show Details” button and ensuring that the driver DLL name is SimbaHBaseODBC32.dll. See Architecture Mismatch Problems in the Troubleshooting section for more information.

Step 5: Select the “Single Table” option and press the magnifying glass to open the list of tables.

Step 6: Choose one of the tables to query and press the “Select” button.

Step 7: Press OK to load the selected table.

Step 8: If a warning comes up, press OK. This warning is displayed because Tableau sees only a generic ODBC driver and attempts to determine its capabilities on its own; it does not affect how the driver operates in Tableau.

Step 9: Choose to connect live so that Tableau does not import all of the data and you work directly on the data in HBase.

Step 10: The table will be loaded into Tableau with the columns listed as dimensions and measures depending on data type. From here, you can create reports on the table you selected.

Congratulations, your data is now available for visualization in Tableau.

Troubleshooting

Architecture Mismatch Problems

If you encounter an error message similar to “The specified DSN contains an architecture mismatch between the Driver and Application”, then the bitness of the application does not match the bitness of the driver. You are likely connecting a 32-bit application to a 64-bit driver, or vice versa. Please ensure that the bitness of your application matches the bitness of the driver that you are trying to use.

To determine the bitness of Excel:

Excel 2007 and earlier

These versions of Excel are strictly 32-bit.

Excel 2010

Step 1: Click on the “File” tab.

Step 2: Click on the “Help” item on the left-hand side.

Step 3: Look at the version that is displayed on the help page. If Excel is 32-bit it will show “(32-bit)” (as pictured) while if it is 64-bit it will show “(64-bit)”.

Excel 2013

Step 1: Click on the “FILE” tab.

Step 2: Click on the “Account” item on the left-hand side.

Step 3: Click on the “About Excel” button on the right side.

Step 4: Check the version string in the pop-up dialog. If Excel is 32-bit, the string will contain “32-bit”; if it is 64-bit, the string will contain “64-bit”.

To determine the bitness of Tableau:

At the time of writing, all versions of Tableau are 32-bit.
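To determine the bitness of a driver DLL or an application executable directly, you can read the Machine field in the file's PE header. The sketch below is one possible way to do this in Python and is not a Simba tool; the function names are illustrative, and it assumes that the PE header lies within the first 4096 bytes of the file, which is typical but not guaranteed.

```python
import struct

# Machine values from the PE/COFF file header: x86 and x64.
_MACHINE_BITS = {0x014C: 32, 0x8664: 64}


def pe_bitness(data: bytes) -> int:
    """Return 32 or 64 given the leading bytes of a Windows EXE or DLL."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    if len(data) < pe_offset + 6:
        raise ValueError("header truncated before the Machine field")
    # The 2-byte Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    if machine not in _MACHINE_BITS:
        raise ValueError(f"unrecognized machine type 0x{machine:04X}")
    return _MACHINE_BITS[machine]


def file_bitness(path: str) -> int:
    """Read the start of the file at path and report its bitness."""
    with open(path, "rb") as handle:
        return pe_bitness(handle.read(4096))
```

Running file_bitness on the driver DLL in the installation directory and on the application's executable shows whether the two match; if they differ, the architecture mismatch error described above is the expected result.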