Integration with Hadoop
PolyBase in SQL 2016
Adastra
Pavel Stejskal, Consultant
linkedin.com/in/pavelstejskal
20.4.2016
Integration with Hadoop using PolyBase
Diagram of the Microsoft BI and analytics tools:
• Excel + Power BI add-ins (Power Query, Power Pivot, Power View, Power Map)
• SharePoint (Power Pivot Gallery, Power View)
• Excel Data Mining
• Power BI Desktop, Power BI Portal
• Azure ML
• Power BI Mobile App
• Analytics Platform System (APS)
PolyBase allows you to use Transact-SQL (T-SQL) statements to access data stored in Hadoop or Azure Blob Storage and query it in an ad-hoc fashion.
PolyBase
What is PolyBase and where does it belong?
Diagram: PolyBase connects relational data (SQL Server) with non-relational data and integrates with standard BI tools. Non-relational sources:
• On-premises solution: Hadoop cluster (Hortonworks / Cloudera)
• Cloud solution: Hadoop cluster (Hortonworks / Cloudera) or Azure Blob Storage
How to start with PolyBase - requirements
• Hardware
– Server for SQL Server (SMP architecture)
– Hadoop cluster (MPP architecture)
– Fast network between SQL Server and Hadoop
• Software
– MS SQL 2016 – RDBMS
– Hadoop distribution (Hortonworks or Cloudera)
• For a cloud solution
– Hadoop in the cloud
– Azure Blob Storage
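Once the hardware and software are in place, the PolyBase feature has to be installed with SQL Server 2016 (its setup also requires Oracle Java SE Runtime Environment 7 update 51 or later), and the instance has to be told which Hadoop distribution it will talk to. A minimal sketch; the value 7 is an assumption for a Hortonworks HDP 2.x cluster or Azure Blob Storage (Cloudera CDH 5.x on Linux uses 6), so check Microsoft's PolyBase connectivity configuration table for your exact version:

```sql
-- Check that the PolyBase feature is installed on this instance (1 = installed).
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
GO

-- Tell PolyBase which Hadoop distribution to expect.
-- Assumed value 7: Hortonworks HDP 2.x or Azure Blob Storage (6 would be Cloudera CDH 5.x on Linux).
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE;
GO

-- The setting takes effect only after restarting SQL Server
-- and the PolyBase Engine and PolyBase Data Movement services.
```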
PolyBase for SQL Server 2016 – how it works
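Diagram: inside MS SQL 2016, a T-SQL query is handled by the SQL Server engine together with the PolyBase engine and the PolyBase DMS*. The DMS performs the data transfer between SQL Server and the Hadoop cluster (a NameNode and several DataNodes), so a DB table and an external table can be joined directly, without ETL.
* DMS = Data Movement Service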
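The "direct JOIN without ETL" idea can be sketched as one T-SQL statement that joins an ordinary database table with the ClickStream external table defined in step 3 of the next part. A minimal sketch: dbo.WebPages and its columns are hypothetical and stand for any local relational table.

```sql
-- dbo.WebPages is a hypothetical local table: url varchar(50), category varchar(50).
-- ClickStream is the external table over the HDFS file (see "3. External table").
SELECT p.category,
       COUNT(*) AS page_views
FROM dbo.WebPages AS p
JOIN ClickStream AS cs          -- external table: rows are read from Hadoop at query time
    ON cs.url = p.url
WHERE cs.event_date >= '2016-01-01'
GROUP BY p.category;
```

No copy of the Hadoop data is staged in SQL Server beforehand; the DMS streams the external rows into the query while it runs.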
Three basic PolyBase objects
1. External data source
CREATE EXTERNAL DATA SOURCE HadoopHDP2 WITH (
TYPE = HADOOP,
LOCATION ='hdfs://10.xxx.xx.xxx:xxxx',
RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx',
CREDENTIAL = HadoopUser1 -- only needed for Kerberos-secured Hadoop
);
2. External file format
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR ='|',
USE_TYPE_DEFAULT = TRUE)
);
3. External table
CREATE EXTERNAL TABLE ClickStream (
url varchar(50),
event_date date,
user_IP varchar(50)
)
WITH (
LOCATION='/webdata/employee.tbl', -- path in HDFS
DATA_SOURCE = HadoopHDP2,
FILE_FORMAT = TextFileFormat
);
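Step 1 points CREDENTIAL at HadoopUser1, which must already exist as a database-scoped credential before CREATE EXTERNAL DATA SOURCE runs. A minimal sketch, assuming a Kerberos-secured cluster; the password, user name and secret are placeholders, and CREATE MASTER KEY is only needed if the database has no master key yet:

```sql
-- Run in the user database that will hold the external objects.
-- A database master key is required to encrypt the credential's secret (skip if one exists).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
GO

-- Placeholder identity and secret: the Hadoop (Kerberos) user PolyBase connects as.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH IDENTITY = '<hadoop user name>', SECRET = '<hadoop password>';
GO
```

For Kerberos, PolyBase on the SQL Server side also needs the cluster's security settings in its Hadoop configuration files (for example core-site.xml); the exact files and properties depend on the cluster.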
External table
• Adding a shape to semi-structured data:
– File format: “|” as delimiter
– Defined column types
– Table for T-SQL queries
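With that shape in place, the external table is queried like any other table. A minimal sketch over the ClickStream table from step 3; the date literal is only an example, and the FORCE EXTERNALPUSHDOWN hint (which asks PolyBase to push the computation to the Hadoop cluster) only works when the external data source sets RESOURCE_MANAGER_LOCATION, as step 1 does:

```sql
-- Page views per URL for a single day, read directly from the HDFS file.
SELECT url,
       COUNT(*) AS page_views
FROM ClickStream
WHERE event_date = '2016-04-20'     -- example filter on the typed date column
GROUP BY url
OPTION (FORCE EXTERNALPUSHDOWN);    -- push work to Hadoop; DISABLE EXTERNALPUSHDOWN does the opposite
```

Without the hint, the optimizer decides on its own whether pushing the work to Hadoop is worth the job startup cost.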
Demo
Sqoop vs. PolyBase
Diagram: Sqoop vs. PolyBase. Each panel shows SQL (2 TB) and a Hadoop cluster (100 TB), with data volume on the axis; query labels: T-SQL query, Hive SQL.
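Where Sqoop is a separate tool that copies data between Hadoop and a relational database, PolyBase can express the same kind of import as an ordinary T-SQL statement over the external table. A minimal sketch; dbo.ClickStream_2016 is a hypothetical target table that SELECT ... INTO creates:

```sql
-- Copy one year of click-stream data from HDFS into a new local SQL Server table.
SELECT url,
       event_date,
       user_IP
INTO dbo.ClickStream_2016           -- hypothetical target; created by SELECT ... INTO
FROM ClickStream                    -- external table over the HDFS file
WHERE event_date >= '2016-01-01'
  AND event_date <  '2017-01-01';
```

The trade-off the slide points at is data volume: with PolyBase the 100 TB can stay in Hadoop and be queried in place, and only the slice that is actually needed is copied into SQL Server.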
ADASTRA CZECH REPUBLIC
Adastra, s.r.o.
Karolinská 654/2, 186 00 Praha 8
Tel.: +420 271 733 303
www.adastra.cz
ADASTRA GROUP North America
8500 Leslie St.
Markham, Ontario, L3T 7M8
Tel: +1 905 881 7946
Restrictions for public release and use: This document may contain confidential information. As such, it may not be copied or transferred without Adastra’s prior consent.
Important: All brands and product names given in this documentation are or may be registered trademarks of their owners. © 2016 Adastra, all rights reserved.
Thank you!