HDP2.5 Updates

Post on 21-Jan-2017


Transcript of HDP2.5 Updates

1 © Hortonworks Inc. 2011–2016. All Rights Reserved

Hortonworks Data Platform Updates
Yuta Imai, Hortonworks

Hortonworks Data Platform

Hortonworks Data Platform: Release Strategy

Extended Services: More frequent releases of Spark, Hive, Ambari and other Apache Data Access projects

Hadoop Core: Longer release arcs for core Apache Hadoop components: HDFS, YARN and MapReduce

[Figure: release timelines for Extended Services and Hadoop Core across 2016 and 2017]

HORTONWORKS DATA PLATFORM

HDP 2.3 is Apache Hadoop; not “based on” Hadoop

[Figure: "Ongoing Innovation in Apache" component version matrix. Components are grouped under DATA MGMT, DATA ACCESS, GOVERNANCE & INTEGRATION, OPERATIONS and SECURITY: Hadoop & YARN, Pig, Hive, Tez, Spark, HBase, Phoenix, Accumulo, Storm, Solr, Slider, Zeppelin, Falcon, Atlas, Sqoop, Flume, Kafka, Ambari, Cloudbreak, ZooKeeper, Oozie, Knox and Ranger. One row of component versions per release: HDP 2.0 (Oct 2013), HDP 2.1 (April 2014), HDP 2.2 (Dec 2014), HDP 2.3 (Oct 2015), HDP 2.4 (Mar 2016) and HDP 2.5* (2H 2016).]

* HDP 2.5: Shows current Apache branches being used. Final component version subject to change based on Apache release process.

** Spark 1.6.2 + Spark 2.0: HDP 2.5 supports installation of both Spark 1.6.2 and Spark 2.0. Spark 2.0 is Technical Preview within HDP 2.5.

*** Hive 2.1 is Technical Preview within HDP 2.5.

Hortonworks Data Platform 2.5 Key Highlights

•  Interactive Query in Seconds: Hive with LLAP (Technical Preview)
•  Enterprise Spark at Scale: Apache Zeppelin Notebook for Spark
•  Real-Time Applications: Storm and HBase/Phoenix
•  Streamlined Operations: Apache Ambari
•  Dynamic Security: Apache Atlas + Ranger Integration
•  Hortonworks Data Cloud (Technical Preview)
•  Hortonworks HDB (Apache HAWQ)

Interactive Query in Seconds: Hive with LLAP (Technical Preview)

LLAP

Hive 2 with LLAP: Enable Interactive Query in Seconds

Developer Productivity: Interactive query in seconds

Ease of Use and Adoption: 100% compatible with Hive SQL

Enterprise Readiness: Linear scaling at terabyte volumes of data

Streamlined Operations: LLAP integration with Ambari with automated dashboards

Why LLAP?

•  People like Hive
•  Disk->Mem is getting further away
    –  Cloud Storage isn’t co-located
    –  Disks are connected to the CPU via network
•  Security landscape is changing
    –  Cells & Columns are the new security boundary, not files
    –  Safely masking columns needs a process boundary
•  Concurrency, Performance & Scale are at conflict
    –  Concurrency at 100k queries/hour
    –  Latencies at 2-5 seconds/query
    –  Petabyte scale warehouses (with terabytes of “hot” data)

[Figure: a node running an LLAP process with query fragments and a cache, reading from HDFS]

What is LLAP?

•  Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries)
•  Concurrent queries without specialized YARN queue setup
•  Multi-threaded execution of vectorized operator pipelines
•  Asynchronous IO and efficient in-memory caching
•  Relational view of the data available through the API
•  High performance scans, execution code pushdown
•  Centralized data security

[Figure: a node running an LLAP process with query fragments and a cache, reading from HDFS]

Hive 2 with LLAP: Architecture Overview

[Figure: architecture diagram. ODBC/JDBC SQL queries enter through HiveServer2 (Query Endpoint); Query Coordinators; a YARN cluster of LLAP Daemons, each with Query Executors and an In-Memory Cache (Shared Across All Users); Deep Storage on HDFS and compatible stores: S3, WASB, Isilon.]

MR vs Tez vs Tez+LLAP

[Figure: three execution diagrams built from map (M), reduce (R) and Tez (T) tasks:
•  Map – Reduce: Intermediate results in HDFS
•  Tez: Optimized Pipeline
•  Tez with LLAP: Resident process on Nodes, with an in-memory columnar cache in front of HDFS
Map tasks read HDFS.]

So…

[Figure, built up over three slides: task graphs of M, R and T tasks with their AM, shown for Tez, then Tez with LLAP (auto), then Tez with LLAP (all).]

Hive 2 with LLAP: Preliminary Numbers

[Chart: Hive 2.0 and LLAP: TPC-DS at 10 TB Scale, 18 Nodes. Series: Hive 2.0 - Tez and LLAP. Vertical axis from 0 to 80. Queries: q3, q7, q12, q13, q19, q21, q26, q27, q42, q43, q45, q52, q55, q60, q73, q84, q89, q91, q98.]

Min query time: Query 55: 2.38s


ACID

Key Features: EDW Offload

•  ACID GA for Streaming and SQL:
    –  50+ stabilization fixes.
    –  Tested at multi-terabyte scale with simultaneous ingest, delete and query.
•  Better BI Tool Compatibility through Expanded OLAP Capabilities:
    –  Multi partition-by, multi order-by.
    –  Order by UDF/UDAF.
    –  Null order specification (nulls first or nulls last).
•  Faster ETL with More Scalable Partition Loads:
    –  2x faster dynamic partition loads.
•  Procedural Extensions (Tech Preview):
    –  Procedural structures: loops, if/else.
    –  Determine min/max partition values.
    –  Copy data from external sources like FTP.
    –  Simplifies ETL / data load processes.

HCatalog Stream Mutation API

[Figure: a Table on HDFS divided into Buckets, each Bucket holding ORC files]


SQL Compliance

Apache Hive: Journey to SQL:2011 Analytics

Data Types
•  Numeric: FLOAT/DOUBLE, DECIMAL, INT/TINYINT/SMALLINT/BIGINT, BOOLEAN
•  String: CHAR/VARCHAR, STRING, BINARY
•  Date, Time: DATE, TIMESTAMP, Interval Types
•  Complex Types: ARRAY, MAP, STRUCT, UNION

SQL Features
•  Core SQL Features: Date, Time and Arithmetical Functions; INNER, OUTER, CROSS and SEMI Joins; Derived Table Subqueries; Correlated + Uncorrelated Subqueries; UNION ALL; UDFs, UDAFs, UDTFs; Common Table Expressions; UNION DISTINCT
•  Advanced Analytics: OLAP and Windowing Functions; CUBE and Grouping Sets
•  Nested Data Analytics: Nested Data Traversal; Lateral Views
•  ACID Transactions: INSERT/UPDATE/DELETE

File Formats
•  Columnar: ORCFile, Parquet
•  Text: CSV, Logfile
•  Nested/Complex: Avro, JSON, XML
•  Custom Formats
•  Other Features: XPath Analytics

Futures
•  Procedural Extensions (PL/SQL)
•  Primary Key / Foreign Key
•  Non-Equijoin
•  Scalable Cross Product
•  Enhanced OLAP
•  ACID MERGE
•  Multi Subquery
•  Comparison to sub-select
•  INTERSECT and EXCEPT

Legend: Existing; Projected: HDP 2.5; Projected: HDP 3.0

Track Hive SQL Complete: HIVE-13554

Enterprise Spark at Scale

Apache Zeppelin GA: The Data Science Notebook

Web-based data science notebook

Interactive data ingestion and data exploration

Easy sharing and collaboration

Secure with single sign-on and encryption

Apache Spark 2.0 (Technical Preview)

Structuring Spark: DataFrames, Datasets and Streaming

Interactive data ingestion and data exploration

Easy sharing and collaboration

Secure with single sign-on and encryption

Dynamic Security Policies: Apache Atlas and Ranger Integration

Apache Atlas + Ranger - Powerful Together

Dynamic Masking and Row Level Filtering

Source table:

Dept  SSN        CCNo              Name      DOB        MRN         PolicyID
01    232323233  4539067047629850  John Doe  9/12/1969  8233054331  nj23j424
02    333287465  5391304868205600  Jane Doe  9/13/1969  3736885376  cadsd984

Ranger Policy Enforcement

Marketing group sees CC and SSN as masked values and MRN is nullified:

Dept  SSN        CCNo              MRN   Name
01    xxxxx3233  4539xxxxxxxxxxxx  null  John Doe
02    xxxxx7465  5391xxxxxxxxxxxx  null  Jane Doe

Dept employee only sees data specific to that department:

Dept  SSN        Name      MRN
01    232323233  John Doe  8233054331
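The masking and row-filtering behaviour in the example above can be sketched in a few lines of plain Python. This is only an illustration of the two transformations, not Ranger's policy engine or API: the helper names (`show_last4`, `show_first4`, `marketing_view`, `dept_view`) are hypothetical, and the sample rows are the ones shown on the slide.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Sample rows copied from the slide.
ROWS: List[Row] = [
    {"Dept": "01", "SSN": "232323233", "CCNo": "4539067047629850",
     "Name": "John Doe", "DOB": "9/12/1969", "MRN": "8233054331",
     "PolicyID": "nj23j424"},
    {"Dept": "02", "SSN": "333287465", "CCNo": "5391304868205600",
     "Name": "Jane Doe", "DOB": "9/13/1969", "MRN": "3736885376",
     "PolicyID": "cadsd984"},
]


def show_last4(value: str) -> str:
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4(value: str) -> str:
    """Mask everything except the first four characters."""
    return value[:4] + "x" * (len(value) - 4)


def marketing_view(rows: List[Row]) -> List[Row]:
    """Column masking: SSN and CCNo are masked, MRN is nullified."""
    return [
        {
            "Dept": r["Dept"],
            "SSN": show_last4(r["SSN"]),
            "CCNo": show_first4(r["CCNo"]),
            "MRN": None,
            "Name": r["Name"],
        }
        for r in rows
    ]


def dept_view(rows: List[Row], dept: str) -> List[Row]:
    """Row-level filter: keep only the rows of the caller's department."""
    columns = ("Dept", "SSN", "Name", "MRN")
    return [{c: r[c] for c in columns} for r in rows if r["Dept"] == dept]
```

Calling `marketing_view(ROWS)` reproduces the masked table, and `dept_view(ROWS, "01")` reproduces the single-row department view. In a real deployment these rules would be declared as policies and enforced at query time rather than written in application code.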

Expanded Native Connector: Dataset Lineage

[Figure: diagram linking Sqoop, the Teradata Connector, Apache Kafka, a Custom Activity Reporter, an RDBMS and the Metadata Repository]

Apache Atlas Enables Business Catalog for Ease of Use

•  Organize data assets along business terms
    –  Authoritative: Hierarchical business Taxonomy Creation
    –  Agile modeling: Model Conceptual, Logical, Physical assets
    –  Definition and assignment of tags like PII (Personally Identifiable Information)
•  Comprehensive features for compliance
    –  Multiple user profiles including Data Steward and Business Analysts
    –  Object auditing to track “Who did it”
    –  Metadata Versioning to track “what did they do”

Key Benefits:
•  Easy way to create business Taxonomy
•  Useful for multiple user types including Data Steward and Business Analysts
•  Comprehensive features for compliance

Business Catalog: Model and explore metadata via the new Business Catalog in Apache Atlas

Data Steward

Real Time Applications powered by Storm and HBase/Phoenix

What’s New in Storm

Developer Productivity: Sliding and tumbling windowing support

Developer Productivity: New connectors for search and NoSQL Database

Enterprise Readiness: Automatic back pressure

Streamlined Operations: Resource aware scheduling and Storm view for Ambari
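The difference between the sliding and tumbling windows mentioned above can be shown with a small, framework-free Python sketch. It does not use Storm's windowing API; the function names and the integer-timestamp event format are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, payload)


def tumbling_windows(events: Sequence[Event], size: int) -> Dict[int, List[str]]:
    """Fixed, non-overlapping windows: each event lands in exactly one window."""
    windows: Dict[int, List[str]] = {}
    for ts, payload in events:
        start = (ts // size) * size  # start of the window containing ts
        windows.setdefault(start, []).append(payload)
    return dict(sorted(windows.items()))


def sliding_windows(events: Sequence[Event], size: int, slide: int) -> Dict[int, List[str]]:
    """Windows of length `size` that start every `slide` seconds.

    When slide < size the windows overlap, so one event can appear in
    several of them. Only windows starting at t >= 0 are kept.
    """
    windows: Dict[int, List[str]] = {}
    for ts, payload in events:
        first = max((ts - size) // slide + 1, 0)  # earliest window still covering ts
        last = ts // slide                        # latest window that has started by ts
        for k in range(first, last + 1):
            windows.setdefault(k * slide, []).append(payload)
    return dict(sorted(windows.items()))
```

With events at seconds 1, 4, 7 and 12, a 10-second tumbling window puts the first three events in the window starting at 0 and the last in the window starting at 10, while a 10-second window sliding every 5 seconds also produces a window starting at 5 that contains the events at 7 and 12.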

What’s New in HBase and Phoenix

Developer Productivity: Phoenix and Hive Integration to run HBase queries in Hive

Enterprise Readiness: Incremental Backup/Restore

Enterprise Readiness: Performance boost for high-scale loads

Developer Productivity: Ad Hoc Analytics with connector to any ODBC BI tool

Streamlined Operations: Apache Ambari

Streamlined Operations Phase 1: Advanced Metrics Visualization & Dashboarding

[Figure: Ambari Metrics System, Ambari and Grafana]

Goal: Quickly understand cluster health metrics and key performance indicators

⬢  Capabilities
    –  Centralized Dashboarding focusing on component Health & Performance
    –  Ad-Hoc Graph Creation
⬢  Pre-Built Dashboards
    –  HDFS
    –  YARN
    –  HBase
⬢  Core Technologies
    –  Ambari Metrics System
    –  Grafana

Ambari now includes pre-built dashboards for visualizing the most important cluster health metrics.

Streamlined Operations Phase 2: Consolidated Cluster Activity Reporting

Goal: Quickly visualize and report on how business users and tenants are using the cluster: top 10 queues, users, most time-consuming jobs

⬢  Capabilities
    –  Top K Activity Reporting
    –  Chargeback
⬢  Services Covered
    –  YARN
    –  MapReduce
    –  Hive/Tez
    –  Spark
    –  HDFS
⬢  Core Technologies
    –  Hortonworks SmartSense
    –  Apache Zeppelin

[Figure: SmartSense, Ambari, Ambari Metrics System and Zeppelin]

Activity Explorer: Cluster Utilization Reporting

Preview: Streamlined Operations Investments

Phase 3: Centralized & Contextual Log Search

[Figure: Solr, Ambari and Log Search]

Goal: When issues arise, be able to quickly find issues across all HDP components

⬢  Capabilities
    –  Rapid Search of all HDP component logs
    –  Search across time ranges, log levels, and for keywords
⬢  Core Technologies:
    –  Apache Ambari
    –  Apache Solr
    –  Apache Ambari Log Search

Hortonworks Data Cloud

Abstract: Governance and Security in Cloud

Today’s transportation marketplace is competitive and quickly evolving. Often, unexpected regulations can pose a serious risk to operations and the bottom line. With Hortonworks Data Cloud (HDC), we’ll show how to gain agility in adapting to new challenges that can turn problems into opportunities.

•  Quickly provision a new analytic cloud environment
•  Classify and tag assets to find and understand your data
•  Security and audit services to meet compliance requirements


Learn More

http://hortonworks.github.io/hdp-aws/index.html

Hortonworks HDB Powered by Apache HAWQ

What is HDB / Apache HAWQ?

Hadoop-native SQL query engine and advanced analytics MPP database that offers high-performance interactive ANSI SQL query execution and machine learning for Data Analysts & Data Scientists who want to find insights from large/complex datasets.

Hortonworks HDB Powered By Apache HAWQ

1.  Interactive query performance
    •  Query performance in seconds
    •  Compatible with any ANSI SQL compliant BI Tool
    •  Larger number of concurrent users
2.  MADlib big data Machine Learning in SQL for data scientists and data analysts
    •  Classification e.g. predict loan default
    •  Regression e.g. predict value of a sale
    •  Clustering e.g. marketing campaign segmentation, …
3.  Data federation using HAWQ Extension Framework
    •  SQL queries against other data sources

[Figure: BI Tool X, BI Tool Y and BI Tool Z, labelled SQL-89, SQL-92 and SQL-2003, shown with Hortonworks HDB and the Hortonworks Data Platform (HDP)]

HDB/HAWQ Advantages

Advanced Analytics Performance: Exceptional MPP performance, low latency, high scalability, ACID reliability, fault tolerance

Most Complete Language Compliance: Higher degree of SQL compatibility, SQL-92, 99, 2003, OLAP, leverage existing SQL skills

Best-in-class Query Optimizer: Maximize performance and do advanced queries with confidence

Elastic Architecture for Scalability: Scale-up/down or scale-in/out, expand/shrink clusters on the fly

Tightly integrated w/MADlib Machine Learning: Advanced MPP analytics, data science at scale, directly on Hadoop data

New in HDF 2.0

New Features of HDF 2.0

•  Enterprise productivity via streamlined operations
    –  Ambari Integration of Apache NiFi, Kafka, Storm
    –  Apache Ranger authorization
    –  Modernized, more intuitive UI
    –  Multi-tenancy of dataflows
•  170+ processors, 30% more than in Apache NiFi 1.0
•  Edge intelligence with Apache MiNiFi
•  Increased security options with Apache Kafka 0.10
•  10X streaming analytics performance, windowing and productivity tools with Apache Storm 1.0

Ambari Integration

Comprehensive Storm-Ambari Views

Multi-tenant Authorization

Read Permission

Multi-tenant Authorization

NO Read Permission (talk about levels, where you can assign permissions)

HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2

[Figure: processor cloud: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, RouteContent, RouteContext, RouteText, ControlRate, DistributeLoad, GenerateTableFetch, JoltTransformJSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch, HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT]

All Apache project logos are trademarks of the ASF and the respective projects.

Edge Intelligence with Apache MiNiFi

Key Features

•  Guaranteed delivery
•  Data buffering
    ‒  Backpressure
    ‒  Pressure release
•  Prioritized queuing
•  Flow specific QoS
    ‒  Latency vs. throughput
    ‒  Loss tolerance
•  Data provenance
•  Recovery / recording a rolling log of fine-grained history
•  Designed for extension
•  Small Footprint (~40 MB)
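The data-buffering items above (backpressure and pressure release) can be illustrated with a minimal Python sketch. It is not MiNiFi or NiFi code: the `BoundedBuffer` class, its parameters and the interpretation of "pressure release" as ageing out items older than a threshold are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BoundedBuffer:
    """A FIFO buffer with a capacity limit and a maximum item age."""

    def __init__(self, capacity: int, max_age: float) -> None:
        self.capacity = capacity  # backpressure threshold (number of items)
        self.max_age = max_age    # pressure-release threshold (seconds)
        self._items: Deque[Tuple[float, str]] = deque()  # (arrival time, item)

    def _expire(self, now: float) -> None:
        # Pressure release: drop items that have waited longer than max_age.
        while self._items and now - self._items[0][0] > self.max_age:
            self._items.popleft()

    def offer(self, item: str, now: float) -> bool:
        """Try to enqueue; False tells the producer to slow down (backpressure)."""
        self._expire(now)
        if len(self._items) >= self.capacity:
            return False
        self._items.append((now, item))
        return True

    def poll(self, now: float) -> Optional[str]:
        """Dequeue the oldest item that has not expired, or None if empty."""
        self._expire(now)
        if not self._items:
            return None
        return self._items.popleft()[1]
```

A producer that receives `False` from `offer` would pause or retry later instead of overwhelming the consumer, while the age limit keeps a stalled consumer from holding stale data indefinitely. The clock is passed in explicitly so the behaviour is easy to test.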

New Stream Processing Features HDF 2.0

Grouped on the slide under Developer Productivity, Enterprise Readiness and Operational Simplicity:

•  New Storm Connectors
•  Storm-Kafka Spout using new client APIs
•  Storm Distributed Log Search
•  Storm Dynamic Worker Profiling
•  Kafka Grafana Integration
•  Storm Grafana Integration
•  Improved Nimbus HA
•  Storm Automatic Back Pressure
•  Storm Distributed cache
•  Storm Windowing and State Management
•  Storm Performance improvements
•  Improved Kafka SASL
•  Storm Topology Event inspector
•  Storm Resource Aware Scheduling
•  Storm Dynamic Log Levels
•  Pacemaker Storm Daemon
•  Kafka Rack Awareness

Thank You