HDP2.5 Updates
Transcript of HDP2.5 Updates
1 © Hortonworks Inc. 2011–2016. All Rights Reserved
Hortonworks Data Platform Updates
Yuta Imai, Hortonworks
2
Hortonworks Data Platform
3
Hortonworks Data Platform: Release Strategy
Extended Services: More frequent releases of Spark, Hive, Ambari and other Apache Data Access projects
Hadoop Core: Longer release arcs for core Apache Hadoop components: HDFS, YARN and MapReduce
[Figure: release timelines for Extended Services and Hadoop Core across 2016 and 2017]
4
HORTONWORKS DATA PLATFORM
HDP 2.3 is Apache Hadoop; not “based on” Hadoop
Ongoing Innovation in Apache
Categories: DATA MGMT, DATA ACCESS, GOVERNANCE & INTEGRATION, OPERATIONS, SECURITY
Components: Hadoop & YARN, Flume, Oozie, Pig, Hive, Tez, Sqoop, Cloudbreak, Ambari, Slider, Kafka, Knox, Solr, ZooKeeper, Spark, Falcon, Ranger, HBase, Atlas, Accumulo, Storm, Phoenix, Zeppelin
Releases: HDP 2.0 (Oct 2013), HDP 2.1 (April 2014), HDP 2.2 (Dec 2014), HDP 2.3 (Oct 2015), HDP 2.4 (Mar 2016), HDP 2.5* (2H 2016)
[Figure: matrix of component versions for each HDP release. The HDP 2.5 row lists Hive as 1.2.1 + 2.1*** and Spark as 1.6.2 + 2.0**.]
* HDP 2.5 – Shows current Apache branches being used. Final component version subject to change based on Apache release process.
** Spark 1.6.2 + Spark 2.0 – HDP 2.5 supports installation of both Spark 1.6.2 and Spark 2.0. Spark 2.0 is Technical Preview within HDP 2.5.
*** Hive 2.1 is Technical Preview within HDP 2.5.
5
Hortonworks Data Platform 2.5 Key Highlights
• Interactive Query in Seconds: Hive with LLAP (Technical Preview)
• Enterprise Spark at Scale: Apache Zeppelin Notebook for Spark
• Real-Time Applications: Storm and HBase/Phoenix
• Streamlined Operations: Apache Ambari
• Dynamic Security: Apache Atlas + Ranger Integration
• Hortonworks Data Cloud (Technical Preview)
• Hortonworks HDB (Apache HAWQ)
6
Interactive Query in Seconds
Hive with LLAP (Technical Preview)
7
LLAP
8
Hive 2 with LLAP: Enable Interactive Query in Seconds
Developer Productivity: Interactive query in seconds
Ease of Use and Adoption: 100% compatible with Hive SQL
Enterprise Readiness: Linear scaling at terabyte volumes of data
Streamlined Operations: LLAP integration with Ambari, with automated dashboards
9
Why LLAP?
• People like Hive
• Disk -> Mem is getting further away
  – Cloud storage isn’t co-located
  – Disks are connected to the CPU via network
• Security landscape is changing
  – Cells & columns are the new security boundary, not files
  – Safely masking columns needs a process boundary
• Concurrency, performance & scale are in conflict
  – Concurrency at 100k queries/hour
  – Latencies at 2-5 seconds/query
  – Petabyte scale warehouses (with terabytes of “hot” data)
[Figure: a node running an LLAP process with a cache and query fragments, alongside HDFS]
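The concurrency and latency targets on this slide imply a rough number of queries in flight at any moment. The sketch below applies Little's law (average in-flight work = arrival rate × time in system); using Little's law here is an assumption made for illustration, not something the slide states, and the function name is hypothetical.

```python
def in_flight_queries(queries_per_hour: float, latency_seconds: float) -> float:
    """Little's law: average concurrent queries = arrival rate * time in system."""
    arrival_rate_per_second = queries_per_hour / 3600.0
    return arrival_rate_per_second * latency_seconds


# Slide targets: 100k queries/hour at 2-5 seconds per query.
for latency in (2, 5):
    estimate = in_flight_queries(100_000, latency)
    print(f"{latency}s latency -> about {estimate:.0f} queries in flight")
```

Under these assumptions the cluster would need to sustain on the order of tens to low hundreds of simultaneous queries, which is the scale at which per-query container startup becomes a noticeable cost.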
10
What is LLAP?
• Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries)
• Concurrent queries without specialized YARN queue setup
• Multi-threaded execution of vectorized operator pipelines
• Asynchronous IO and efficient in-memory caching
• Relational view of the data available through the API
• High performance scans, execution code pushdown
• Centralized data security
[Figure: a node running an LLAP process with a cache and query fragments, alongside HDFS]
11
Hive 2 with LLAP: Architecture Overview
[Figure: ODBC/JDBC SQL queries arrive at HiveServer2 (query endpoint), which works with a set of query coordinators. A YARN cluster hosts several LLAP daemons, each with query executors and an in-memory cache (shared across all users). Deep storage is HDFS and compatible stores: S3, WASB, Isilon.]
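The idea of one in-memory cache shared across all users can be illustrated with a toy model. The sketch below is not the LLAP cache: the class name, the (file, column) cache key and the LRU eviction policy are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict


class SharedColumnCache:
    """Toy LRU cache keyed by (file, column); a stand-in, not the real LLAP cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, file: str, column: str, load_from_storage):
        key = (file, column)
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = load_from_storage(file, column)  # slow path: deep storage
        self._entries[key] = data
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data


def fake_storage_read(file: str, column: str) -> str:
    """Hypothetical loader standing in for a read from HDFS or S3."""
    return f"{file}:{column}"


cache = SharedColumnCache(capacity=2)
cache.read("sales.orc", "amount", fake_storage_read)  # first user: miss
cache.read("sales.orc", "amount", fake_storage_read)  # second user: hit
print(cache.hits, cache.misses)
```

Because the cache lives in a long-running daemon rather than in a per-query container, the second user's query benefits from data the first user's query already loaded.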
12
MR vs Tez vs Tez+LLAP
• Map – Reduce: intermediate results in HDFS
• Tez: optimized pipeline
• Tez with LLAP: resident process on nodes
[Figure: three task graphs side by side, one per engine. Map tasks read HDFS; the Tez with LLAP graph adds an in-memory columnar cache next to HDFS.]
13
So…
[Figure: task graph of map (M) and reduce (R) tasks labelled "Tez"]
14
So…
[Figure: two task graphs, each with an AM, labelled "Tez" and "Tez with LLAP (auto)"]
15
So…
[Figure: three task graphs, each with an AM, labelled "Tez", "Tez with LLAP (auto)" and "Tez with LLAP (all)"]
16
Hive 2 with LLAP: Preliminary Numbers
[Figure: bar chart "Hive 2.0 and LLAP: TPC-DS at 10 TB Scale, 18 Nodes", comparing Hive 2.0 on Tez with LLAP. Vertical axis from 0 to 80; queries q3, q7, q12, q13, q19, q21, q26, q27, q42, q43, q45, q52, q55, q60, q73, q84, q89, q91, q98.]
Min query time: Query 55: 2.38 s
17
ACID
18
Key Features: EDW Offload
→ ACID GA for Streaming and SQL:
  – 50+ stabilization fixes.
  – Tested at multi-terabyte scale with simultaneous ingest, delete and query.
→ Better BI Tool Compatibility through Expanded OLAP Capabilities:
  – Multi partition-by, multi order-by.
  – Order by UDF/UDAF.
  – Null order specification (nulls first or nulls last).
→ Faster ETL with More Scalable Partition Loads:
  – 2x faster dynamic partition loads.
→ Procedural Extensions (Tech Preview):
  – Procedural structures: loops, if/else.
  – Determine min/max partition values.
  – Copy data from external sources like FTP.
  – Simplifies ETL / data load processes.
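The null order specification mentioned above controls where NULL values land in a sorted result. The sketch below mimics those semantics in plain Python, with None standing in for SQL NULL; the function name is hypothetical and this is an illustration of the behavior, not Hive code.

```python
from typing import List, Optional, Sequence


def order_by(values: Sequence[Optional[int]], nulls_first: bool) -> List[Optional[int]]:
    """Sort ascending, placing None (SQL NULL) either first or last."""

    def sort_key(value):
        is_null = value is None
        if nulls_first:
            null_rank = 0 if is_null else 1
        else:
            null_rank = 1 if is_null else 0
        # The first tuple element positions the NULLs; the second orders the rest.
        return (null_rank, 0 if is_null else value)

    return sorted(values, key=sort_key)


print(order_by([3, None, 1], nulls_first=True))   # [None, 1, 3]
print(order_by([3, None, 1], nulls_first=False))  # [1, 3, None]
```

BI tools often generate an explicit null ordering in their SQL, which is why supporting both placements matters for compatibility.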
19
HCatalog Stream Mutation API
[Figure: a table in HDFS divided into buckets, with ORC files in each bucket]
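The table, bucket and ORC layout in this figure rests on routing each record to a bucket by its key. A minimal sketch of that routing follows, assuming a hash-modulo scheme; crc32 is used only because it is deterministic across runs, and Hive's own hash function differs. All names here are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(key: str, num_buckets: int) -> int:
    """Pick a bucket by hashing the key. crc32 is a stand-in, not Hive's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def route(records, num_buckets: int):
    """Group (key, payload) records by bucket id, as a writer per bucket would see them."""
    buckets = defaultdict(list)
    for key, payload in records:
        buckets[bucket_for(key, num_buckets)].append((key, payload))
    return buckets


records = [("user1", "insert"), ("user2", "insert"), ("user1", "update")]
for bucket_id, rows in sorted(route(records, num_buckets=4).items()):
    print(bucket_id, rows)
```

Routing by key keeps every mutation for a given record in the same bucket, so an update or delete can be applied next to the row it changes.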
20
SQL Compliance
21
Apache Hive: Journey to SQL:2011 Analytics

Data Types
  Numeric: FLOAT/DOUBLE, DECIMAL, INT/TINYINT/SMALLINT/BIGINT, BOOLEAN
  String: CHAR/VARCHAR, STRING, BINARY
  Date, Time: DATE, TIMESTAMP, Interval Types
  Complex Types: ARRAY, MAP, STRUCT, UNION

SQL Features
  Core SQL Features: Date, Time and Arithmetical Functions; INNER, OUTER, CROSS and SEMI Joins; Derived Table Subqueries; Correlated + Uncorrelated Subqueries; UNION ALL; UDFs, UDAFs, UDTFs; Common Table Expressions; UNION DISTINCT
  Advanced Analytics: OLAP and Windowing Functions; CUBE and Grouping Sets
  Nested Data Analytics: Nested Data Traversal; Lateral Views
  ACID Transactions: INSERT/UPDATE/DELETE

File Formats
  Columnar: ORCFile, Parquet
  Text: CSV, Logfile
  Nested/Complex: Avro, JSON, XML
  Custom Formats
  Other Features: XPath Analytics

Futures
  Procedural Extensions (PL/SQL); Primary Key / Foreign Key; Non-Equijoin; Scalable Cross Product; Enhanced OLAP; ACID MERGE; Multi Subquery; Comparison to sub-select; INTERSECT and EXCEPT

Legend: Existing; Projected: HDP 2.5; Projected: HDP 3.0
Track Hive SQL Complete: HIVE-13554
22
Enterprise Spark at Scale
23
Apache Zeppelin GA: The Data Science Notebook
Web-based data science notebook
Interactive data ingestion and data exploration
Easy sharing and collaboration
Secure with single sign-on and encryption
24
25
Apache Spark 2.0 (Technical Preview)
Structuring Spark: DataFrames, Datasets and Streaming
Interactive data ingestion and data exploration
Easy sharing and collaboration
Secure with single sign-on and encryption
26
Dynamic Security Policies
Apache Atlas and Ranger Integration
27
Apache Atlas + Ranger - Powerful Together
28
Dynamic Masking and Row Level Filtering

Source table:
Dept  SSN        CCNo              Name      DOB        MRN         PolicyID
01    232323233  4539067047629850  John Doe  9/12/1969  8233054331  nj23j424
02    333287465  5391304868205600  Jane Doe  9/13/1969  3736885376  cadsd984

Ranger Policy Enforcement

Marketing group sees CC and SSN as masked values, and MRN is nullified:
Dept  SSN        CCNo              MRN   Name
01    xxxxx3233  4539xxxxxxxxxxxx  null  John Doe
02    xxxxx7465  5391xxxxxxxxxxxx  null  Jane Doe

Dept employee only sees data specific to that department:
Dept  SSN        Name      MRN
01    232323233  John Doe  8233054331
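The two views on this slide can be reproduced with a few lines of Python. This is a sketch of the effect of the policies, not of how Ranger evaluates them: the helper names are hypothetical, and the masking rules (show last 4 of SSN, show first 4 of the card number, nullify MRN) are read off the slide's example output.

```python
RECORDS = [
    {"Dept": "01", "SSN": "232323233", "CCNo": "4539067047629850", "Name": "John Doe",
     "DOB": "9/12/1969", "MRN": "8233054331", "PolicyID": "nj23j424"},
    {"Dept": "02", "SSN": "333287465", "CCNo": "5391304868205600", "Name": "Jane Doe",
     "DOB": "9/13/1969", "MRN": "3736885376", "PolicyID": "cadsd984"},
]


def show_last(value: str, n: int) -> str:
    """Mask everything except the last n characters."""
    return "x" * (len(value) - n) + value[-n:]


def show_first(value: str, n: int) -> str:
    """Mask everything except the first n characters."""
    return value[:n] + "x" * (len(value) - n)


def marketing_view(rows):
    """Column masking: SSN and card number partially masked, MRN nullified."""
    return [
        {"Dept": r["Dept"], "SSN": show_last(r["SSN"], 4),
         "CCNo": show_first(r["CCNo"], 4), "MRN": None, "Name": r["Name"]}
        for r in rows
    ]


def dept_employee_view(rows, dept: str):
    """Row-level filtering: only rows for the caller's department, unmasked."""
    return [{k: r[k] for k in ("Dept", "SSN", "Name", "MRN")} for r in rows if r["Dept"] == dept]


for row in marketing_view(RECORDS):
    print(row)
print(dept_employee_view(RECORDS, "01"))
```

Both views are derived from the same table at query time, so no masked copy of the data has to be maintained.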
29
Expanded Native Connector: Dataset Lineage
[Figure: dataset lineage diagram with the labels Sqoop, Teradata Connector, Apache Kafka, RDBMS, Custom Activity Reporter and Metadata Repository]
30
Apache Atlas Enables Business Catalog for Ease of Use
→ Organize data assets along business terms
  – Authoritative: Hierarchical business taxonomy creation
  – Agile modeling: Model conceptual, logical, physical assets
  – Definition and assignment of tags like PII (Personally Identifiable Information)
→ Comprehensive features for compliance
  – Multiple user profiles including Data Steward and Business Analysts
  – Object auditing to track “who did it”
  – Metadata versioning to track “what did they do”
Key Benefits:
  – Easy way to create business taxonomy
  – Useful for multiple user types including Data Steward and Business Analysts
  – Comprehensive features for compliance
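A hierarchical business taxonomy with tags such as PII can be pictured as two lookups: business term to assets, and asset to tags. The sketch below uses plain dictionaries with hypothetical term paths and asset names; it illustrates the concept only and does not use the Atlas API.

```python
# Hypothetical taxonomy: business term path -> data assets filed under it.
TAXONOMY = {
    "Finance/Customers": ["customers.ssn", "customers.ccno", "customers.dept"],
    "Finance/Policies": ["policies.policy_id"],
}

# Hypothetical tag assignments on individual assets.
TAGS = {
    "customers.ssn": {"PII"},
    "customers.ccno": {"PII"},
    "customers.dept": set(),
    "policies.policy_id": set(),
}


def assets_under(taxonomy, term_prefix: str):
    """All assets filed under a business term or any of its child terms."""
    found = []
    for term, assets in taxonomy.items():
        if term == term_prefix or term.startswith(term_prefix + "/"):
            found.extend(assets)
    return sorted(found)


def tagged(assets, tags, tag: str):
    """Subset of assets carrying the given tag."""
    return [a for a in assets if tag in tags.get(a, set())]


finance_assets = assets_under(TAXONOMY, "Finance")
print(tagged(finance_assets, TAGS, "PII"))
```

A tag found this way is the kind of attribute a security policy can key on, instead of listing individual columns.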
31
Business Catalog
Model and explore metadata via the new Business Catalog in Apache Atlas
[Figure label: Data Steward]
32
Real Time Applications powered by Storm and HBase/Phoenix
33
What’s New in Storm
Developer Productivity: Sliding and tumbling windowing support
Developer Productivity: New connectors for search and NoSQL databases
Enterprise Readiness: Automatic backpressure
Streamlined Operations: Resource aware scheduling and Storm view for Ambari
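The difference between tumbling and sliding windows is easiest to see on a small list of events. The sketch below uses count-based windows over an in-memory list, which is a simplification chosen for illustration; it is not the Storm windowing API, and the function names are hypothetical.

```python
def tumbling_windows(events, size: int):
    """Non-overlapping windows: each event belongs to exactly one window."""
    return [events[i:i + size] for i in range(0, len(events), size)]


def sliding_windows(events, size: int, slide: int):
    """Windows of `size` events that advance by `slide` events, so they may overlap."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, slide)]


events = [1, 2, 3, 4, 5, 6]
print(tumbling_windows(events, 3))    # [[1, 2, 3], [4, 5, 6]]
print(sliding_windows(events, 3, 1))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

A tumbling window is the special case of a sliding window whose slide equals its size.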
34
What’s New in HBase and Phoenix
Developer Productivity: Phoenix and Hive integration to run HBase queries in Hive
Enterprise Readiness: Incremental backup/restore
Enterprise Readiness: Performance boost for high-scale loads
Developer Productivity: Ad hoc analytics with connector to any ODBC BI tool
35
Streamlined Operations
Apache Ambari
36
Streamlined Operations
Phase 1: Advanced Metrics Visualization & Dashboarding
Goal: Quickly understand cluster health metrics and key performance indicators
⬢ Capabilities
  – Centralized dashboarding focusing on component health & performance
  – Ad-hoc graph creation
⬢ Pre-Built Dashboards
  – HDFS
  – YARN
  – HBase
⬢ Core Technologies
  – Ambari Metrics System
  – Grafana
[Figure: Ambari, Ambari Metrics System, Grafana]
37
Ambari now includes pre-built dashboards for visualizing the most important cluster health metrics.
38
Streamlined Operations
Phase 2: Consolidated Cluster Activity Reporting
Goal: Quickly visualize and report on how business users and tenants are using the cluster: top 10 queues, users, most time-consuming jobs
⬢ Capabilities
  – Top K activity reporting
  – Chargeback
⬢ Services Covered
  – YARN
  – MapReduce
  – Hive/Tez
  – Spark
  – HDFS
⬢ Core Technologies
  – Hortonworks SmartSense
  – Apache Zeppelin
[Figure: SmartSense, Ambari, Ambari Metrics System, Zeppelin]
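Top K activity reporting and chargeback both reduce to aggregating resource usage by user or queue. The sketch below shows that aggregation over a handful of made-up job records; the field names, the sample data and the flat per-CPU-second rate are assumptions for illustration, not a SmartSense schema.

```python
from collections import Counter

# Hypothetical job records.
JOBS = [
    {"user": "alice", "queue": "etl", "cpu_seconds": 1200},
    {"user": "bob", "queue": "adhoc", "cpu_seconds": 300},
    {"user": "alice", "queue": "adhoc", "cpu_seconds": 500},
    {"user": "carol", "queue": "etl", "cpu_seconds": 900},
]


def top_k(jobs, field: str, k: int):
    """Sum cpu_seconds per value of `field` and return the k largest totals."""
    totals = Counter()
    for job in jobs:
        totals[job[field]] += job["cpu_seconds"]
    return totals.most_common(k)


def chargeback(jobs, field: str, rate_per_cpu_second: int):
    """Bill each value of `field` in proportion to the CPU seconds it used."""
    return {name: total * rate_per_cpu_second for name, total in top_k(jobs, field, len(jobs))}


print(top_k(JOBS, "user", 2))   # [('alice', 1700), ('carol', 900)]
print(top_k(JOBS, "queue", 1))  # [('etl', 2100)]
print(chargeback(JOBS, "queue", rate_per_cpu_second=2))
```

The same function serves the "top 10 queues" and "top 10 users" reports by changing only the grouping field.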
39
Activity Explorer: Cluster Utilization Reporting
40
Preview: Streamlined Operations Investments
Phase 3: Centralized & Contextual Log Search
Goal: When issues arise, be able to quickly find issues across all HDP components
⬢ Capabilities
  – Rapid search of all HDP component logs
  – Search across time ranges, log levels, and for keywords
⬢ Core Technologies:
  – Apache Ambari
  – Apache Solr
  – Apache Ambari Log Search
[Figure: Solr, Ambari, Log Search]
41
Hortonworks Data Cloud
42
Abstract: Governance and Security in Cloud
Today’s transportation marketplace is competitive and quickly evolving. Often, unexpected regulations can pose a serious risk to operations and the bottom line. With Hortonworks Data Cloud (HDC), we’ll show how to gain agility in adapting to new challenges that can turn problems into opportunities.
• Quickly provision a new analytic cloud environment
• Classify and tag assets to find and understand your data
• Security and audit services to meet compliance requirements
43
44
Learn More
http://hortonworks.github.io/hdp-aws/index.html
45
Hortonworks HDB Powered by Apache HAWQ
46
What is HDB / Apache HAWQ?
Hadoop-native SQL query engine and advanced analytics MPP database that offers high-performance interactive ANSI SQL query execution and machine learning for Data Analysts & Data Scientists who want to find insights from large/complex datasets.
[Logo: Hortonworks HDB powered by Apache HAWQ]
47
Hortonworks HDB Powered By Apache HAWQ
1. Interactive query performance
   • Query performance in seconds
   • Compatible with any ANSI SQL compliant BI tool
   • Larger number of concurrent users
2. MADlib big data machine learning in SQL for data scientists and data analysts
   • Classification, e.g. predict loan default
   • Regression, e.g. predict value of a sale
   • Clustering, e.g. marketing campaign segmentation, …
3. Data federation using HAWQ Extension Framework
   • SQL queries against other data sources
[Figure: BI Tool X, BI Tool Y and BI Tool Z, Hortonworks HDB (SQL-89, SQL-92, SQL-2003), and HDP (Hortonworks Data Platform)]
48
HDB/HAWQ Advantages
Advanced Analytics Performance: Exceptional MPP performance, low latency, high scalability, ACID reliability, fault tolerance
Most Complete Language Compliance: Higher degree of SQL compatibility, SQL-92, 99, 2003, OLAP, leverage existing SQL skills
Best-in-class Query Optimizer: Maximize performance and do advanced queries with confidence
Elastic Architecture for Scalability: Scale-up/down or scale-in/out, expand/shrink clusters on the fly
Tightly integrated w/ MADlib Machine Learning: Advanced MPP analytics, data science at scale, directly on Hadoop data
49
New in HDF 2.0
50
New Features of HDF 2.0
→ Enterprise productivity via streamlined operations
  – Ambari integration of Apache NiFi, Kafka, Storm
  – Apache Ranger authorization
  – Modernized, more intuitive UI
  – Multi-tenancy of dataflows
→ 170+ processors, 30% more than in Apache NiFi 1.0
→ Edge intelligence with Apache MiNiFi
→ Increased security options with Apache Kafka 0.10
→ 10X streaming analytics performance, windowing and productivity tools with Apache Storm 1.0
51
Ambari Integration
52
Comprehensive Storm-Ambari Views
53
Multi-tenant Authorization
Read Permission
54
Multi-tenant Authorization
NO Read Permission (talk about levels, where you can assign permissions)
55
HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2
[Figure: grid of processor names and protocol logos]
Processor names: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch
Formats and protocols: HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, HTML, Image, AMQP, MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
56
Edge Intelligence with Apache MiNiFi
Key Features
→ Guaranteed delivery
→ Data buffering
  ‒ Backpressure
  ‒ Pressure release
→ Prioritized queuing
→ Flow specific QoS
  ‒ Latency vs. throughput
  ‒ Loss tolerance
→ Data provenance
→ Recovery / recording a rolling log of fine-grained history
→ Designed for extension
→ Small footprint (~40 MB)
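Prioritized queuing and backpressure can be combined in one small data structure. The sketch below is a toy model under stated assumptions: a lower number means higher priority, and backpressure is modeled as refusing new items once a threshold is reached. The class and method names are hypothetical and this is not MiNiFi code.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Toy queue: delivers the highest-priority item first and refuses input when full."""

    def __init__(self, backpressure_threshold: int):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def offer(self, priority: int, item) -> bool:
        """Try to enqueue; False signals backpressure so the producer can slow down."""
        if len(self._heap) >= self.backpressure_threshold:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Remove and return the highest-priority item, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = PrioritizedQueue(backpressure_threshold=2)
print(queue.offer(5, "routine reading"))  # True
print(queue.offer(1, "alarm"))            # True
print(queue.offer(3, "status"))           # False: threshold reached
print(queue.poll())                       # alarm
```

On a constrained edge device, refusing input at a threshold bounds memory use, while the priority order decides which data leaves first when the link recovers.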
57
New Stream Processing Features: HDF 2.0
Column headings: Developer Productivity, Enterprise Readiness, Operational Simplicity
→ New Storm connectors
→ Storm-Kafka spout using new client APIs
→ Storm distributed log search
→ Storm dynamic worker profiling
→ Kafka Grafana integration
→ Storm Grafana integration
→ Improved Nimbus HA
→ Storm automatic back pressure
→ Storm distributed cache
→ Storm windowing and state management
→ Storm performance improvements
→ Improved Kafka SASL
→ Storm topology event inspector
→ Storm resource aware scheduling
→ Storm dynamic log levels
→ Pacemaker Storm daemon
→ Kafka rack awareness
58
Thank You