HBase_Lab3
IBM Software
Using HBase for Real-time Access to your Big Data
Using administrative and advanced features for schema creation and data retrieval
Copyright IBM Corporation, 2013
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Using administrative and advanced features for schema creation and data retrieval
3.1 Creating and modifying schemas using the HBaseAdmin API
3.2 Loading a data set into HBase
    3.2.1 Loading the data into the HDFS
    3.2.2 Importing the data into HBase
3.3 Creating and Using Filters to retrieve your data
3.4 Working with Counters
3.5 Summary
Using administrative and advanced features for schema creation and data retrieval
In this lab, you will create and update schemas using the HBaseAdmin API, which lets you create the tables and column families for your data programmatically.
Then you will use the Java API to take advantage of Filters and Counters. As part of the lab setup, you will also see how to load a sample data set using the ImportTsv tool.
After completing this hands-on lab, you will be able to:
Create and Modify HBase tables and schemas using the HBaseAdmin API
Load data into HBase using the ImportTsv tool
Apply Filters to your Scan or Get operations to narrow the data returned from HBase
Use Counters for statistics collection
This lab assumes some familiarity with the Eclipse environment. The lab solution is included with the files that you downloaded in the VM image.
Allow 60 to 90 minutes to complete this section of the lab.
This version of the lab was designed using the InfoSphere BigInsights 2.1 Quick Start Edition. Throughout this lab you will be using the following account login information:
                        Username   Password
VM image setup screen   root       password
Linux                   biadmin    biadmin
3.1 Creating and modifying schemas using the HBaseAdmin API
You have been using tables and column families by creating them directly from the shell. What if you need to create them programmatically? Using the HBaseAdmin API will allow you to do so.
Solutions for this part of the exercise can be found in Lab_Files/LabSolutions.
__1. Start Eclipse and go to the default workspace. The files from Exercise 2 will most likely still be open in your workspace if you did not close them earlier. Close them all now.
__2. Create a new package named hbase.exercise3. Then import the partially completed classes from the lab files under Lab_Files/Exercise3 into the workspace. You should end up with four imported files.
__3. Create some tables using HBaseAdmin. Open up HBase_SchemaTester.java. Fill in the code needed.
__a. HBaseAdmin admin = new HBaseAdmin(conf);
__b. HTableDescriptor desc = new HTableDescriptor(tableName);
__c. HColumnDescriptor colFamilyDesc = new HColumnDescriptor(columnFamily);
__d. desc.addFamily(colFamilyDesc);
__e. admin.createTable(desc);
__f. Uncomment the code that tests whether the table has been created. This completes the method for creating the table.
__4. Run the program to see that your table gets created. Be sure to uncomment the line that tests whether the table is available if you haven't done so in the step above. The output should be true.
You can also go to the HBase shell and run the command list or describe 'tableFromJava' to see it.
__5. Once your table has been created, you will need to comment out the code that you just wrote so that you can run the next set of code to modify an existing table. Go ahead and comment out these lines.
You can leave the other parts of the method intact as you will still need them to modify the table.
__6. Now you need to create the code to modify the table. You will type in the code below the code you just commented out.
__a. HTableDescriptor htd1 = admin.getTableDescriptor(tableName);
__b. long oldMaxFileSize = htd1.getMaxFileSize();
__c. HColumnDescriptor colFamilyDesc2 = new HColumnDescriptor(Bytes.toBytes("cf2"));
__d. htd1.addFamily(colFamilyDesc2);
__e. htd1.setMaxFileSize(1024 * 1024 * 1024L);
__f. admin.disableTable(tableName);
__g. admin.modifyTable(tableName, htd1);
__h. admin.enableTable(tableName);
__i. HTableDescriptor htd2 = admin.getTableDescriptor(tableName);
__j. Uncomment the System.out.println() calls to test your class.
__7. When you are done, run the program. The series of System.out.println() outputs should show the modifications to the schema.
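The value passed to setMaxFileSize() in step 6e is a byte count. The small sketch below only works through that arithmetic in plain Java; it makes no HBase calls, and the class and helper names are hypothetical. It also shows why the L suffix matters once the product no longer fits in an int.

```java
// Arithmetic behind setMaxFileSize(1024 * 1024 * 1024L): plain Java, no HBase calls.
// Class and method names are hypothetical, chosen for this illustration only.
class MaxFileSizeArithmetic {
    // 1024 * 1024 is evaluated as an int (1,048,576); multiplying by 1024L widens to long.
    static final long ONE_GIB = 1024 * 1024 * 1024L;

    // n GiB in bytes, computed in long arithmetic throughout.
    static long gib(long n) {
        return n * ONE_GIB;
    }

    // The same product in pure int arithmetic; it wraps around once the
    // mathematical result exceeds Integer.MAX_VALUE.
    static int gibAsInt(int n) {
        return n * 1024 * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(ONE_GIB);      // 1 GiB in bytes
        System.out.println(gib(4));       // 4 GiB in bytes, still correct as a long
        System.out.println(gibAsInt(4));  // wrapped int result, not 4 GiB
    }
}
```

With the value used in the lab, 1 GiB still fits in an int, so the suffix is a safeguard rather than a necessity; it becomes essential for sizes of 2 GiB and above.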
3.2 Loading a data set into HBase
3.2.1 Loading the data into the HDFS
To prepare for this lab, you will load a sample data set into HBase using BigInsights. The data comes from the GSDB database, a rich and realistic database that contains sample data for the Great Outdoors company, a fictional outdoor equipment retailer. For simplicity, we will use only one table from this database.
First we will load that file into HDFS. Then we will import it into an HBase table.
You should have downloaded the lab files for this exercise already. If you did not, go to the Big Data University course page for instructions on getting your lab files. You will need the SLS_SALES_FACT.txt file.
__8. Double click the BigInsights
__9. Navigate to the Files tab, go to the biadmin directory under users, and click the create new directory icon on the menu bar.
__10. Name the new folder exercise3.
__11. Select the exercise3 directory and then click the Upload icon.
__12. Click Browse and locate the file to upload: Lab_Files/SLS_SALES_FACT.txt
__13. Click Open and then OK to upload the file. Once the upload has been completed, you will see the file.
3.2.2 Importing the data into HBase
__14. Next, create the table sales_fact with a single column family that stores only one version of each value. Do this with the following HBase shell command:
create 'sales_fact', {NAME => 'cf', VERSIONS => 1}
__15. Once the table has been created, exit the HBase shell.
__16. We are now going to use the ImportTsv tool to load the data into the sales_fact table.
Remember that the columns in this data set are representative of typical column names in a traditional RDBMS. In HBase, you will not want to name your columns this way; instead, use column names that are as short as possible, because the column name is stored with every cell.
__17. Run this command to add the columns and their respective values:
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp \
  -Dimporttsv.skip.bad.lines=false \
  'sales_fact' hdfs://bivm:9000/user/biadmin/exercise3/SLS_SALES_FACT.txt
__18. Once it is done, count the rows in the table by using this command in the HBase shell:
count 'sales_fact'
You have imported 440 rows into the sales_fact table.
The data has been loaded. Go to the next section to work with the data set.
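To make the role of the importtsv.columns option more concrete, the following plain-Java sketch pairs one tab-separated line with a column specification: the field in the HBASE_ROW_KEY position becomes the row key and every other field becomes a family:qualifier cell. This is an illustration only, not the ImportTsv implementation. The class name, the shortened three-column specification, and the sample line are all made up, and the strict field-count check is a choice made for this sketch rather than a statement about how the real tool handles bad lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: pairs one tab-separated line with an importtsv.columns
// specification. It does not call HBase and is not the ImportTsv source code.
class TsvColumnMappingSketch {
    static final String ROW_KEY_MARKER = "HBASE_ROW_KEY";

    // Returns "rowkey" -> key plus "family:qualifier" -> value, in column order.
    static Map<String, String> mapLine(String columnSpec, String line) {
        String[] columns = columnSpec.split(",");
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (fields.length != columns.length) {
            // Strictness chosen for this sketch; the real tool has its own bad-line rules.
            throw new IllegalArgumentException(
                "expected " + columns.length + " fields, got " + fields.length);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            String name = ROW_KEY_MARKER.equals(columns[i]) ? "rowkey" : columns[i];
            cells.put(name, fields[i]);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Shortened three-column spec and a made-up line, both hypothetical.
        String spec = "HBASE_ROW_KEY,cf:q,cf:up";
        String line = "row-0001\t12\t136.90";
        System.out.println(mapLine(spec, line));
    }
}
```

The command used in the lab follows the same pattern with eighteen entries: HBASE_ROW_KEY followed by seventeen cf: columns, one per field in SLS_SALES_FACT.txt.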
3.3 Creating and Using Filters to retrieve your data
__19. We will work with the Filters first. Open up AccessObject.java and go to the getInfo() method. In this method, you will create a scanner on the sales_fact table. Restrict the scan to only two columns, which is enough for our purposes. You will create the Filter later; for now, write the code to add those two columns and set the filter:
__a. scan.addColumn(COLUMN_FAMILY, UNIT_PRICE);
__b. scan.addColumn(COLUMN_FAMILY, QUANTITY);
__c. scan.setFilter(filter);
__20. Now open up HBase_FilterTester.java. In here, you will write the code to create five different filters.
__a. Filter f1 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("20070920")));
__b. Filter f2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("20050920")));
__c. Filter f3 = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*2006."));
__d. Filter f4 = new QualifierFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("q")));
__e. Filter f5 = new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("136.90")));
__21. Once you have written the five filters, uncomment the ao.getInfo(f1) call and run the program with each of the filters in turn, checking the results to validate your filters. You may also change the comparison operators or the comparators to see different results.
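When validating filters f1 through f3, it can help to reason about the comparisons outside HBase first. The sketch below is a plain-Java model, not the HBase classes themselves: it compares row keys as unsigned bytes in lexicographic order and applies the regular expression with "find" semantics, which is an assumption about how RegexStringComparator matches. The class name and the sample row keys are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Plain-Java model of the comparisons behind the row filters; no HBase classes are used.
class RowFilterComparisonSketch {

    // Lexicographic comparison of unsigned bytes, the ordering HBase uses for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Models RowFilter(LESS_OR_EQUAL, BinaryComparator(bound)): keep rows with key <= bound.
    static boolean lessOrEqual(String rowKey, String bound) {
        return compareBytes(rowKey.getBytes(StandardCharsets.UTF_8),
                            bound.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    // Models RowFilter(EQUAL, RegexStringComparator(regex)), assuming "find" semantics.
    static boolean regexMatches(String rowKey, String regex) {
        return Pattern.compile(regex).matcher(rowKey).find();
    }

    public static void main(String[] args) {
        // Made-up row keys; the real keys come from the imported data set.
        String[] hypotheticalKeys = {"20050101", "20060115", "20070920", "20071231"};
        for (String key : hypotheticalKeys) {
            System.out.println(key
                + " f1=" + lessOrEqual(key, "20070920")
                + " f2=" + lessOrEqual(key, "20050920")
                + " f3=" + regexMatches(key, ".*2006."));
        }
    }
}
```

Because the comparison works on bytes, values are ordered as strings rather than as numbers. In this model, for example, "9" sorts after "136.90", which is worth keeping in mind when changing the operator on the ValueFilter in f5.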
3.4 Working with Counters
__22. Next you will work with Counters. Go back to the AccessObject.java class. It has a method, performIncrement(), that increments counters. Write the appropriate code to complete this method:
__a. Increment increment1 = new Increment(Bytes.toBytes(rowkey));
__b. increment1.addColumn(COLUMN_FAMILY, Bytes.toBytes("ViewCount"), viewCountValue);
__c. increment1.addColumn(COLUMN_FAMILY, Bytes.toBytes("AnotherCount"), anotherCountValue);
__d. Result result1 = sales_fact.increment(increment1);
__e. Uncomment the last section of the code to complete the method.
__23. Once you are done with AccessObject.java, open up HBase_CounterTester.java:
We are just picking a random row to which we add the two counter columns that were defined earlier: ViewCount and AnotherCount. The two values to increment by are 1 and 20. Run HBase_CounterTester to see the counter results. Run it multiple times to see the counters increment.
__24. Run it with negative values to see the counters decrease, or run it with 0 to get the current value.
__25. You are done with this lab exercise. Go ahead and save and close your Eclipse and any other windows or terminals that you may have open.
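The counter behavior observed in steps 22 through 24 can be summarized with a small in-memory model. The sketch below is plain Java and does not call HBase; it only mimics the semantics seen in the lab, where a positive amount increases a counter, a negative amount decreases it, and an amount of 0 returns the current value. The class name and the row key are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory model of counter semantics; it stands in for, and does not call, HBase.
class CounterSemanticsSketch {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Adds delta to the named counter and returns the new value.
    // A missing counter starts at 0; a delta of 0 simply reads the current value.
    long increment(String rowKey, String qualifier, long delta) {
        String cell = rowKey + "/" + qualifier;
        return counters.computeIfAbsent(cell, k -> new AtomicLong()).addAndGet(delta);
    }

    public static void main(String[] args) {
        CounterSemanticsSketch table = new CounterSemanticsSketch();
        String row = "hypothetical-row"; // made-up row key
        System.out.println(table.increment(row, "ViewCount", 1));
        System.out.println(table.increment(row, "AnotherCount", 20));
        System.out.println(table.increment(row, "ViewCount", 1));
        System.out.println(table.increment(row, "AnotherCount", -5));
        System.out.println(table.increment(row, "ViewCount", 0));
    }
}
```

Each call returns the value after the update, which mirrors how the lab reads the new counter values back from the Result of the increment() call.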
3.5 Summary
Excellent work! After using tables for a few lessons already, you have now seen how to create the tables programmatically using the HBaseAdmin API for your applications. It's also worth mentioning again that you could have created the tables using HBase shell commands, but in an application you would use some sort of client API.
You should now be familiar with some of the Filters available for restricting the data that is returned from HBase. You should also understand how Counters work. Filters are essential to HBase because they allow you to narrow down large sets of data (we're dealing with Big Data, so you'll have LOTS of data) to get to what you need quickly. Counters help you manage and collect the statistics of your table.
Also, as part of the lab setup, you saw how to use the ImportTsv tool to import some data into HBase. First you had to load the data into the HDFS. Then you ran the tool, specifying the column family and column names for the data in the text file.
Copyright IBM Corporation 2013.
The information contained in these materials is provided for
informational purposes only, and is provided AS IS without warranty
of any kind, express or implied. IBM shall not be responsible for any
damages arising out of the use of, or otherwise related to, these
materials. Nothing contained in these materials is intended to, nor
shall have the effect of, creating any warranties or representations
from IBM or its suppliers or licensors, or altering the terms and
conditions of the applicable license agreement governing the use of
IBM software. References in these materials to IBM products,
programs, or services do not imply that they will be available in all
countries in which IBM operates. This information is based on
current IBM product plans and strategy, which are subject to change
by IBM without notice. Product release dates and/or capabilities
referenced in these materials may change at any time at IBM's sole
discretion based on market opportunities or other factors, and are not
intended to be a commitment to future product or feature availability
in any way.
IBM, the IBM logo and ibm.com are trademarks of International
Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is
available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml.