Modernizing Business Intelligence and Analytics (cdn.govexec.com/media/ctd_analytic_db_final.pdf)
Modernizing Business Intelligence and Analytics
Justin Erickson, Senior Director, Product Management
© Cloudera, Inc. All rights reserved.
Agenda
• What benefits can I achieve from modernizing my analytic DB?
• When and how do I migrate from current systems?
• How does it work in the cloud?
Key Applications
• EDW Optimization: Use your EDW more efficiently by offloading workloads to Hadoop
• Data Preparation: Fast, flexible ETL over large data volumes, so data is always ready for your business
• Self-Service BI & Exploration: Fastest time-to-insights with a modern analytic database designed with Hadoop’s flexibility and agility
Cloudera’s Analytic Database
• Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
• Hue: Intelligent SQL editor
• Navigator: Audit, lineage, encryption, key management, & policy lifecycles
• BI Partners: Integration with the leading BI tools
• Impala: Interactive query engine for BI & SQL analytics
• Hive-on-Spark: Large-scale ETL & batch processing engine
• Kudu: Data Storage for Fast & Changing Data
Multi-Storage, Multi-Environment
Key Benefits
An analytic database designed for Hadoop
• High-Performance BI and SQL Analytics
• Flexibility for Data and Use Case Variety
• Cost-effective Scale for Today and Tomorrow
• Go Beyond SQL with an Open Architecture
Analytic DB Anatomy
Built for self-service and hybrid cloud
Anatomy of an Analytic Database
Cloudera: Decoupled by Design
Monolithic Analytic Database:
• Query Engine
• Storage Engine
• Catalog
Modern Analytic Database:
• Query Engine (Impala)
• Catalog (HMS)
• Storage (Kudu)
• Storage (S3)
• Storage (HDFS)
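The decoupled layout above can be illustrated with a small toy model. This is only a sketch: the class names (`Catalog`, `QueryEngine`, `InMemoryStore`) and the row format are hypothetical and are not Impala, HMS, Kudu, S3, or HDFS APIs. It shows compute that holds no data, resolving tables through a shared catalog and reading from whichever storage backend holds them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol


class Storage(Protocol):
    """Anything that can return rows for a path (stand-in for Kudu, S3, HDFS)."""

    def scan(self, path: str) -> Iterable[dict]: ...


@dataclass
class InMemoryStore:
    """Hypothetical backend that keeps rows in a dict keyed by path."""
    data: Dict[str, List[dict]] = field(default_factory=dict)

    def scan(self, path: str) -> Iterable[dict]:
        return iter(self.data.get(path, []))


@dataclass
class Catalog:
    """Shared metadata: table name -> (storage name, path)."""
    tables: Dict[str, tuple] = field(default_factory=dict)

    def register(self, table: str, storage: str, path: str) -> None:
        self.tables[table] = (storage, path)


@dataclass
class QueryEngine:
    """Compute layer: holds no data, only references to the catalog and backends."""
    catalog: Catalog
    backends: Dict[str, Storage]

    def select(self, table: str, predicate: Callable[[dict], bool]) -> List[dict]:
        storage_name, path = self.catalog.tables[table]
        return [row for row in self.backends[storage_name].scan(path) if predicate(row)]


# Two backends, one catalog, and two independent engines sharing the same data.
object_store = InMemoryStore({"sales/2016": [{"region": "east", "amount": 10},
                                             {"region": "west", "amount": 7}]})
local_store = InMemoryStore({"events": [{"region": "east", "amount": 3}]})
catalog = Catalog()
catalog.register("sales", "object", "sales/2016")
catalog.register("events", "local", "events")
backends = {"object": object_store, "local": local_store}

bi_engine = QueryEngine(catalog, backends)
etl_engine = QueryEngine(catalog, backends)  # a second "cluster" over the same shared data

east_sales = bi_engine.select("sales", lambda r: r["region"] == "east")
all_events = etl_engine.select("events", lambda r: True)
```

Because the engines own no storage, adding or removing one does not move any data, which is the property the monolithic layout lacks.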
Pain Points
Traditional Monolithic Analytic Databases
Limited to SQL only
• Maintain data copies for non-SQL
Rigid Data Model
• Tightly coupled storage and compute
Static Sizing
• Major maintenance to add capacity/nodes
Poorly Designed for Cloud
• No elasticity or integration with object storage
Diagram labels: Compute, Store
Benefits of Cloudera’s Modern Approach
Cloud-Native & On-Premise
Go Beyond SQL
• Open Architecture: Open formats and open storage
• Shared data across SQL and non-SQL workloads
Data Flexibility
• Faster, more agile data acquisition
• Data portability: Open formats and open storage
Cost-Effective Scalability
• Elastic scale on-prem or in the cloud
• Cloud-native pay-per-use and transience
• Proven at big data scale
Hybrid
• Runs across multi-cloud & on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.
Diagram label: Shared Data
EDW Optimization
Expand the Value of Your Data Warehousing Landscape
Motivations for Optimizing the EDW
• Cost containment for existing workloads
• Limited budget for expansion
• Unable to take on new workloads
• Unable to keep up with changing business needs
• Difficulty handling both fixed-SLA reports and self-service exploration
• Growing importance of self-service BI, advanced analytics, and cloud
Existing EDW Landscape
• Data Sources
• ETL/Staging
• EDW
• Archive
• Data Marts
• Canned Reports
• Dashboards/Analytic Applications
• Non-SQL Workloads
• Self-Service BI/Ad Hoc
Optimizing the EDW with Cloudera
• Cost-Effective Scale
  • Say yes to more without the risk
• Go Beyond SQL
  • Exploration, advanced analytics, and more all in one platform
• Modernize the Data Warehouse Landscape
  • Maximize the EDW while enabling iterative, self-service access/BI
  • Well-suited for on-prem, cloud, and hybrid deployments
Customer results:
• 90% less per TB vs. RDBMS and 75% less vs. Netezza
• Augmented its Oracle EDW with a multi-tenant Cloudera system, with its BI tool configured to allow users to pull reports from both
• Media research firm: Saved tens of millions by offloading DBMS to Cloudera in the cloud
Modern Data Warehouse Environment
• Data Sources
• EDW
• Analytic Database
• Operational Database
• Data Science & Engineering
• Shared Data Layer
• Modern Data Platform
• Fixed Reports
• Dashboards/Analytic Applications
• Non-SQL Workloads
• Self-Service BI/Ad Hoc
• Flexible Reporting
Navigator Optimizer
Built to help you through the optimization process
Stages:
• Evaluate: Evaluate the need for offload
• Plan: Impact analysis, prioritized plan
• Offload: Offload each workload
• Optimize: Optimize performance
Activities shown in the diagram: Identify Use Cases; Objectives; Validate ROI, Cost; Initial POC; Estimate Effort; Risk Analysis; Impact Analysis; Prioritized Plan; Schema Design; Test & Validate; Fine Tuning Data Model on Hadoop; Optimize Queries for Performance
Diagram labels: Workload Visibility, Offload Actions
Workload Visibility
Get insights into what’s happening today
Evaluate Queries
• Top queries
• Query duplication
• Query complexity
• Common access patterns
Evaluate Data Access
• Top tables, top columns
• Usage-based ER diagram
• All tables/columns in use
Evaluate POC
• Identify initial workload piece for PoC
• Get partitioning key suggestions
Stage: Evaluate
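The kind of query and data-access evaluation listed above can be approximated on a small scale. The sketch below is an illustration under stated assumptions, not how Navigator Optimizer works: `query_log` is a made-up sample, and `normalize` and `tables_used` are hypothetical helpers that use naive regular expressions rather than a SQL parser.

```python
import re
from collections import Counter

# Hypothetical sample of a query log; a real assessment would read the warehouse's own logs.
query_log = [
    "SELECT id, total FROM orders WHERE id = 42",
    "select id, total from orders where id = 97",
    "SELECT c.name FROM customers c JOIN orders o ON c.id = o.customer_id",
    "SELECT COUNT(*) FROM orders",
]


def normalize(sql: str) -> str:
    """Lower-case, collapse whitespace, and mask literals so near-duplicates match."""
    sql = re.sub(r"\s+", " ", sql.strip().lower())
    sql = re.sub(r"'[^']*'", "?", sql)      # string literals
    return re.sub(r"\b\d+\b", "?", sql)     # numeric literals


def tables_used(sql: str) -> list:
    """Naive table extraction: identifiers that follow FROM or JOIN."""
    return re.findall(r"\b(?:from|join)\s+([a-z_][\w.]*)", sql.lower())


top_queries = Counter(normalize(q) for q in query_log)      # query shapes by frequency
top_tables = Counter(t for q in query_log for t in tables_used(q))
duplicates = {q: n for q, n in top_queries.items() if n > 1}
```

Here the first two log entries collapse into one query shape, and `top_tables` ranks `orders` first, which is the sort of signal that could guide the choice of an initial PoC workload.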
Impact Analysis & Prioritized Plan
Understand what it takes to offload
Impact Analysis
• Focus efforts by identifying duplication
• Workload risk assessment based on complexity and best practices
• Understand query compatibility
Prioritized Plan
• Estimate effort
• Identify easiest pieces to start for fast success
• Prioritize workloads for offload
Stage: Plan
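One way to picture a prioritized plan is a simple scoring pass over candidate workloads. The sketch below is purely illustrative: the `Workload` fields, the 1 to 5 complexity scale, and the weights in `offload_score` are all assumptions made for this example, not values from the presentation or from any Cloudera tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    complexity: int         # 1 (simple) to 5 (complex); hypothetical scale
    compatible: float       # fraction of queries expected to run unchanged, 0.0-1.0
    duplicate_share: float  # fraction of queries that duplicate others, 0.0-1.0


def offload_score(w: Workload) -> float:
    """Higher score = easier first candidate. The weights are illustrative assumptions."""
    ease = (5 - w.complexity) / 4  # 1.0 for the simplest workloads, 0.0 for the hardest
    return round(0.5 * w.compatible + 0.3 * ease + 0.2 * w.duplicate_share, 3)


workloads = [
    Workload("nightly_reports", complexity=2, compatible=0.9, duplicate_share=0.4),
    Workload("ad_hoc_marketing", complexity=4, compatible=0.6, duplicate_share=0.1),
    Workload("archive_scans", complexity=1, compatible=1.0, duplicate_share=0.7),
]

# Easiest pieces first, so early offloads are likely to succeed quickly.
plan = sorted(workloads, key=offload_score, reverse=True)
plan_names = [w.name for w in plan]
```

With these made-up inputs the simple, fully compatible workload is ranked first and the complex ad hoc workload last; different weights would produce a different order.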
Predictable Offload
Remove the guesswork
Understand offload requirements
• Determine most common workload patterns
• Develop data-/usage-driven offload strategy
Actionable recommendations
• Complexity assessment for riskier areas
• Focus efforts by identifying duplication
• Design recommendations for best results
Stage: Offload
Optimizing within Hadoop
Maintain peak performance
Understand usage and keep up with data needs
• Understand most common usage patterns
• Identify optimization opportunities
• Proactively adjust data models
Performance optimizations
• Best practice guidance for Hive and Impala
• Query performance optimization
• Increase platform adoption
Stage: Optimize
Built for hybrid cloud
What’s Driving Analytics to the Cloud?
Big data deployments in cloud are accelerating:
● Executive Mandate: Minimize on-prem data center footprint
● Increased Agility: End-user self-service
● Elasticity: Optimize infrastructure usage
● Lower Overall TCO
Most Organizations Are or Will Be Hybrid Cloud
• 76% will embrace hybrid cloud (Gartner [1])
• 82% will have a multi-cloud strategy (RightScale [2])
• 50% will “repatriate” at least one public cloud workload back to private cloud or on-prem for cost reasons (451 Research [3])
• 50% of Cloudera’s cloud customers run a hybrid environment
Why is this a critical strategy?
• Portability & Cost
• Functionality
• Data Gravity
[1] Gartner, Market Trends: Cloud Adoption Trends Favor Public Cloud With a Hybrid Twist, 2015
[2] RightScale, 2016 State of the Cloud Report
[3] 451 Research: AWS Lambda: new and exciting, old and rehashed, more vendor lock-in (or all the above)?, November 22, 2016
Cost-Efficiencies & Flexibility in the Cloud
Primary Analytic Database Patterns
ETL: Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Object storage centric
▪ Cloud-native deployment
BI/Analytics: New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Long-running clusters
▪ Object storage or local storage
▪ Lift-and-shift deployment
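The trade-off between transient and long-running clusters comes down to simple arithmetic. The sketch below assumes a pay-per-hour rate for transient clusters and a lower always-on rate for long-running ones; the function names, the 730-hour month, and both prices are hypothetical and are not taken from the presentation or from any provider's price list.

```python
def monthly_cost_transient(nodes: int, hours_used: float, on_demand_rate: float) -> float:
    """Usage-based: pay only for the hours the cluster actually runs."""
    return nodes * hours_used * on_demand_rate


def monthly_cost_persistent(nodes: int, reserved_rate: float,
                            hours_in_month: float = 730.0) -> float:
    """Node-based: pay for every hour of the month, at an assumed lower rate."""
    return nodes * hours_in_month * reserved_rate


def break_even_hours(on_demand_rate: float, reserved_rate: float,
                     hours_in_month: float = 730.0) -> float:
    """Monthly usage above which the always-on cluster becomes the cheaper option."""
    return hours_in_month * reserved_rate / on_demand_rate


# Hypothetical prices per node-hour, chosen only to make the arithmetic easy to follow.
ON_DEMAND, RESERVED = 1.00, 0.60

etl_cost = monthly_cost_transient(nodes=10, hours_used=120, on_demand_rate=ON_DEMAND)
bi_cost = monthly_cost_persistent(nodes=10, reserved_rate=RESERVED)
threshold = break_even_hours(ON_DEMAND, RESERVED)
```

Under these assumed rates, a 10-node ETL cluster that runs 120 hours a month costs far less as a transient cluster, while a workload running more than `threshold` hours a month would be cheaper on a long-running cluster.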
Additive Benefits in the Cloud
Extending core performance, flexibility, scalability, and open architecture benefits
Add Use Cases, Analytics, and Data On-Demand
• Avoid the IT backlog with instant access to all data
• On-demand clusters query directly on shared object storage
Predictable Results Whenever You Want
• Consistent query performance, even during peak times
• Multi-tenancy via isolated clusters on shared data
Just-in-Time Resources
• Real-time capacity for your needs, as they change
• Elastically grow/shrink your cluster via decoupled architecture
Contention-Free ETL
• ETL anytime without impacting other workloads or risking SLAs
• Separate ETL clusters as needed on shared data
BI/Analytics in the Cloud
Three Architecture Options to Optimize Price/Performance
Transient BI (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user
Persistent BI (regular usage), the default choice
Persistent clusters for BI anytime
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent BI with Local Storage (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data
Diagram labels: Object Storage; Transient Cluster; Persistent Cluster; HDFS and/or Kudu
Persistent BI on Object Storage
Best for elasticity (and speed vs. transient)
● This is usually the best choice
● Best when workloads are:
  o Flexible and changing
  o Frequent during most working days
  o Not scheduled for fixed hours
● Benefits include:
  o Predictable results readily available
  o Full multi-tenant isolation
  o Common data in shared object storage
  o Grow/shrink for TCO efficiency
● Tradeoffs:
  o Per-node performance of object storage (use more, cheaper nodes)
Persistent BI (regular usage), the default choice
Persistent clusters for ready availability
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Diagram labels: Object Storage; Shared HMS DB; Persistent Cluster
Persistent BI with Locally-Attached Storage
Best performance for consistent workloads
● Best when workloads are:
  o Regular and consistent
  o Consistently querying common data
  o Tight SLAs for performance
  o Fast-changing data (that needs Kudu)
  o Running without object storage (e.g., Azure, GCE)
● Benefits include:
  o Faster performance per node on local data
  o Ability to query object storage for rest of data
● Tradeoffs:
  o Less elastic than object-storage-based clusters
  o Less isolation for multi-tenant workloads using same HDFS data
  o Cost if there are off-peak hours
Persistent BI with HDFS (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared HDFS data
Diagram labels: Object Storage; Persistent Cluster; Local HMS DB; HDFS and/or Kudu
Transient BI on Object Storage
Best TCO for infrequent usage
● Best when workloads are:
  o Infrequent or scheduled
● Benefits include:
  o Lowest TCO with clusters only when needed
  o Full multi-tenant isolation
  o Common data in shared object storage
● Tradeoffs:
  o Delay to spin up clusters when needed
  o Capability of BI users to spin up clusters
  o Per-node performance of object storage (use more, cheaper nodes)
Transient BI (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user
Diagram labels: Object Storage; Cloudera Director; Shared HMS DB; Transient Cluster
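The "best when" criteria for the three architectures can be condensed into a small decision helper. This is a sketch under stated assumptions: `WorkloadProfile`, its fields, and the order in which `choose_architecture` checks them are hypothetical simplifications of the slides, not an official sizing rule.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    infrequent_or_scheduled: bool = False   # runs rarely or at fixed times
    tight_slas: bool = False                # strict performance targets
    fast_changing_data: bool = False        # data that needs Kudu
    object_storage_available: bool = True   # shared object store exists in this environment


def choose_architecture(p: WorkloadProfile) -> str:
    """Map workload traits to one of the three options; the rule order is an assumption."""
    if not p.object_storage_available or p.fast_changing_data or p.tight_slas:
        return "Persistent BI with locally-attached storage"
    if p.infrequent_or_scheduled:
        return "Transient BI on object storage"
    # The slides describe this option as usually the best choice.
    return "Persistent BI on object storage"


default_choice = choose_architecture(WorkloadProfile())
monthly_report = choose_architecture(WorkloadProfile(infrequent_or_scheduled=True))
fast_data = choose_architecture(WorkloadProfile(fast_changing_data=True))
```

Real deployments would also weigh the tradeoffs listed on each slide, such as spin-up delay and off-peak cost, which this helper deliberately leaves out.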
Thank you
Justin Erickson