Todd - London 2 - Bringing Your Data Together in the Cloud

@SnowflakeDB #CloudAnalytics17 LONDON


Page 1: Todd - London 2 - Bringing Your Data Together in the Cloud · Created Date: 6/13/2017 7:17:36 PM


Page 2

Bringing Your Data Together in the Cloud

Todd Beauchene
Global Alliances Architect, Snowflake Computing

Page 3

“Data! Data! Data! I can't make bricks without clay.” - Sherlock Holmes

Page 4

Agenda

• Cloud Data Ecosystem
• Data Sources
• Methodologies
• Data Integration Solutions
• Conclusion

Page 5

Cloud Data Ecosystem

[Diagram components: Data Sources (Corporate, Web, Mobile, IoT), Enterprise apps, Data Integration, Data Warehouse, Business Intelligence & Analytics]

Page 6

Data Sources

Page 7

Data Sources

On-Premises
• Typically backed by a local transactional database
• All data lives within the firewall
• Customer has full access to all data and systems

Cloud
• Typically backed by a cloud database (e.g. RDS)
• Can run in customer VPC
• Typically offers fewer options than on-premises

SaaS
• Typically data is only available via API
• Outside of customer firewall or VPC
• Customer has very little control over handling of data
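Because SaaS data is typically reachable only through an API, extraction usually means walking a paginated endpoint. The sketch below assumes a cursor-based API; `fetch_page`, `fake_fetch`, `FAKE_PAGES`, and the record shape are hypothetical stand-ins for a real vendor client, which would make authenticated HTTP requests instead.

```python
from typing import Callable, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (records, next_cursor); next_cursor is None on the last page.
PageFetcher = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def extract_all(fetch_page: PageFetcher) -> list[dict]:
    """Follow the API's cursors until no further pages remain."""
    rows: list[dict] = []
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(cursor)
        rows.extend(records)
        if cursor is None:
            return rows


# Hypothetical in-memory stand-in for a SaaS API: three pages chained by cursors.
FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], "p3"),
    "p3": ([{"id": 4}], None),
}


def fake_fetch(cursor: Optional[str]) -> tuple[list[dict], Optional[str]]:
    return FAKE_PAGES[cursor]


if __name__ == "__main__":
    print(extract_all(fake_fetch))
```

Since the customer has little control over how the SaaS provider handles the data, the extracted records are usually landed as-is and shaped later in the warehouse.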

Page 8

Real World Example: Consolidated Dashboard

Challenges
• Long-term project with high-level goals
• Diverse data sources
• Different refresh cycles
• Inconsistent results

Solutions
• Agile project with focused, short-term goals
• Dedicated schema in EDW
• Daily ETL process
• Data quality checks within ETL
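Data quality checks within ETL can be as simple as validating each batch before it is published. A minimal sketch, assuming rows arrive as dictionaries and the source system reports an expected row count; the function name and the particular checks (row-count reconciliation, required columns, unique keys) are illustrative, not the project's actual rules:

```python
from collections import Counter


def check_quality(rows: list[dict], key: str, required: list[str],
                  expected_count: int) -> list[str]:
    """Return a list of failures; an empty list means the batch passes."""
    failures: list[str] = []
    # Reconcile the row count against the count reported by the source.
    if len(rows) != expected_count:
        failures.append(f"row count {len(rows)} != expected {expected_count}")
    # Required columns must be present and non-null.
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                failures.append(f"row {i}: missing {col}")
    # Business keys must be unique within the batch.
    counts = Counter(row.get(key) for row in rows)
    dupes = sorted((k for k, n in counts.items() if n > 1), key=str)
    if dupes:
        failures.append(f"duplicate {key}: {dupes}")
    return failures


if __name__ == "__main__":
    batch = [{"id": 1, "amount": 10}, {"id": 1, "amount": None}]
    for problem in check_quality(batch, "id", ["id", "amount"], expected_count=3):
        print(problem)
```

A load step could then refuse to publish the batch to the dashboard schema whenever the list is non-empty, which addresses the "inconsistent results" challenge at the source rather than in the dashboard.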

Page 9

Methodologies

Page 10

Methodologies

Bulk Loading - Truncate and Load
• Runs at regular intervals
• The full data set is loaded during each run, and existing data is purged
• Least efficient option, but very simple to manage
• High data volumes every run
• More commonly used for dimension tables
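A minimal truncate-and-load sketch, using Python's built-in `sqlite3` module as a stand-in for the warehouse. SQLite has no `TRUNCATE` statement, so `DELETE` without a `WHERE` clause purges the table; running the purge and reload in one transaction means readers never see a half-empty dimension. The `dim_region` table is hypothetical.

```python
import sqlite3


def truncate_and_load(conn: sqlite3.Connection, table: str,
                      rows: list[tuple]) -> None:
    """Replace the entire contents of `table` with `rows`.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(f"DELETE FROM {table}")  # purge existing data
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_region (id INTEGER, name TEXT)")
    truncate_and_load(conn, "dim_region", [(1, "EMEA"), (2, "APAC")])
    # The next run reloads the full data set rather than applying changes.
    truncate_and_load(conn, "dim_region", [(1, "EMEA"), (2, "APAC"), (3, "AMER")])
    print(conn.execute("SELECT COUNT(*) FROM dim_region").fetchone()[0])  # 3
```

The simplicity is the point: there is no change tracking at all, at the cost of moving the whole data set on every run.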

Daily Differentials
• Runs during the nightly ETL window
• Requires change data capture (CDC) to identify changed rows
• Generally consists of a series of steps, each depending on the previous steps
• Must include logic to handle slowly changing dimensions
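One common way to handle slowly changing dimensions in a daily differential is Type 2 versioning: expire the current row and insert a new one. The slide does not say which type is used; this sketch assumes Type 2 and assumes the CDC step has already produced a list of `(customer_id, city)` changes. The `dim_customer` table and its `valid_from`/`valid_to`/`is_current` columns are hypothetical, and `sqlite3` stands in for the EDW.

```python
import sqlite3


def apply_scd2(conn: sqlite3.Connection, changes: list[tuple[int, str]],
               load_date: str) -> None:
    """Apply CDC output to dim_customer as Type 2 slowly changing dimension rows."""
    with conn:
        for customer_id, city in changes:
            current = conn.execute(
                "SELECT city FROM dim_customer "
                "WHERE customer_id = ? AND is_current = 1",
                (customer_id,),
            ).fetchone()
            if current is not None and current[0] == city:
                continue  # tracked attribute unchanged: no new version
            if current is not None:
                # Expire the existing version instead of overwriting it.
                conn.execute(
                    "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
                    "WHERE customer_id = ? AND is_current = 1",
                    (load_date, customer_id),
                )
            # Insert the new current version (also handles brand-new keys).
            conn.execute(
                "INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)",
                (customer_id, city, load_date),
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, city TEXT, "
                 "valid_from TEXT, valid_to TEXT, is_current INTEGER)")
    apply_scd2(conn, [(1, "London"), (2, "Paris")], "2017-06-01")
    apply_scd2(conn, [(1, "Leeds")], "2017-06-02")
    for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_id, valid_from"):
        print(row)
```

Each nightly run depends on the previous one having been applied, which is why these steps are usually chained in order.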

Page 11

Methodologies

Insert-only - Date-based
• Extracts data by date range to eliminate the need for CDC
• Simplified processing
• Commonly used for fact tables
• Changes to data from previous periods require deleting and reloading all data for the given range
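The date-based pattern can be sketched as a delete-then-insert over a half-open date range, run in a single transaction. This sketch assumes ISO-8601 date strings (which compare correctly as text) and a hypothetical `fact_sales` table in `sqlite3`:

```python
import sqlite3


def reload_date_range(conn: sqlite3.Connection, rows: list[tuple[str, str, float]],
                      start: str, end: str) -> None:
    """Replace all fact_sales rows with start <= sale_date < end by `rows`."""
    # Guard against rows that would land outside the range being replaced.
    if any(not (start <= sale_date < end) for sale_date, _, _ in rows):
        raise ValueError("row outside the reload range")
    with conn:  # delete and insert commit together
        conn.execute(
            "DELETE FROM fact_sales WHERE sale_date >= ? AND sale_date < ?",
            (start, end),
        )
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (sale_date TEXT, store TEXT, amount REAL)")
    reload_date_range(conn, [("2017-06-01", "LON", 100.0), ("2017-06-01", "MAN", 50.0)],
                      "2017-06-01", "2017-06-02")
    reload_date_range(conn, [("2017-06-02", "LON", 80.0)], "2017-06-02", "2017-06-03")
    # A late correction to 1 June: reload that whole day, leave 2 June alone.
    reload_date_range(conn, [("2017-06-01", "LON", 120.0)], "2017-06-01", "2017-06-02")
    print(conn.execute("SELECT * FROM fact_sales ORDER BY sale_date").fetchall())
```

No row-level change tracking is needed: a correction to any day is handled by replacing that day wholesale.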

Database Replication
• Generally runs in near real time
• Requires a tool that is tightly integrated with the source database
• Schemas must match between source and destination
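Because replication requires matching schemas, a pre-flight comparison can catch drift before the replication tool fails. A sketch using SQLite's `PRAGMA table_info`, assuming both source and target are reachable as `sqlite3` connections; a real setup would query each database's own catalog instead, and `schema_drift` is a hypothetical helper, not part of any replication tool.

```python
import sqlite3


def table_schema(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2].upper())
            for row in conn.execute(f"PRAGMA table_info({table})")]


def schema_drift(source: sqlite3.Connection, target: sqlite3.Connection,
                 table: str) -> list[str]:
    """Describe differences that would break replication; empty list = match."""
    src, dst = table_schema(source, table), table_schema(target, table)
    src_types, dst_types = dict(src), dict(dst)
    problems: list[str] = []
    for name, col_type in src:
        if name not in dst_types:
            problems.append(f"missing in target: {name}")
        elif dst_types[name] != col_type:
            problems.append(f"type mismatch on {name}: {col_type} vs {dst_types[name]}")
    problems += [f"extra in target: {name}" for name, _ in dst if name not in src_types]
    if not problems and [n for n, _ in src] != [n for n, _ in dst]:
        problems.append("column order differs")
    return problems


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
    target.execute("CREATE TABLE orders (id INTEGER, amount TEXT)")
    print(schema_drift(source, target, "orders"))
```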

Page 12

Methodologies

Batch Processing
• Generally used when data is being pushed from the source
• Batch frequency depends on the volume and velocity of the data
• Requires an automated process to load batches into the data warehouse
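The automated batch-loading process could look like this sketch: scan a landing directory for pushed CSV files and load each file exactly once, recording it in a load-history table in the same transaction as its rows. The directory layout, the `events` and `load_history` tables, and the `event_id,value` CSV header are all assumptions.

```python
import csv
import sqlite3
from pathlib import Path


def load_new_batches(conn: sqlite3.Connection, landing_dir: Path) -> int:
    """Load each CSV in landing_dir that has not been loaded before.

    Returns the number of files loaded on this run.
    """
    loaded = {name for (name,) in conn.execute("SELECT file_name FROM load_history")}
    count = 0
    for path in sorted(landing_dir.glob("*.csv")):
        if path.name in loaded:
            continue  # already loaded by an earlier run
        with path.open(newline="") as f:
            rows = [(r["event_id"], r["value"]) for r in csv.DictReader(f)]
        with conn:  # a file's rows and its history entry commit together
            conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
            conn.execute("INSERT INTO load_history VALUES (?)", (path.name,))
        count += 1
    return count


if __name__ == "__main__":
    import tempfile

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_id INTEGER, value INTEGER)")
    conn.execute("CREATE TABLE load_history (file_name TEXT PRIMARY KEY)")
    with tempfile.TemporaryDirectory() as d:
        landing = Path(d)
        (landing / "batch_001.csv").write_text("event_id,value\n1,10\n2,20\n")
        print(load_new_batches(conn, landing))  # 1
        print(load_new_batches(conn, landing))  # 0: nothing new
```

Recording the file name in the same transaction as its rows means a rerun after a failure neither skips nor duplicates a batch, so the job can simply be scheduled at whatever frequency the data volume calls for.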

Streaming
• Generally used for high-volume data
• Event-based rather than row-based
• Often requires micro-batching of data for loading into a relational database
• Raw data must usually be transformed to support analytics
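Micro-batching can be sketched as a buffer that flushes when it reaches a row limit or an age limit, whichever comes first. The `MicroBatcher` class and its parameters are hypothetical; the clock is injectable so the behaviour can be tested, and a production version would also flush on a timer rather than only when a new event arrives.

```python
import time
from typing import Callable


class MicroBatcher:
    """Buffer streaming events and hand them off in micro-batches.

    A batch is flushed when it holds max_rows events or when its oldest
    event is at least max_seconds old (checked as each event arrives).
    """

    def __init__(self, flush: Callable[[list[dict]], None], max_rows: int = 1000,
                 max_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self._clock = clock
        self._buffer: list[dict] = []
        self._opened_at = 0.0

    def add(self, event: dict) -> None:
        if not self._buffer:
            self._opened_at = self._clock()  # age is measured from the first event
        self._buffer.append(event)
        too_big = len(self._buffer) >= self.max_rows
        too_old = self._clock() - self._opened_at >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send any buffered events downstream, e.g. to a bulk-load step."""
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []  # new list, so the flushed batch is not mutated


if __name__ == "__main__":
    batches: list[list[dict]] = []
    batcher = MicroBatcher(batches.append, max_rows=2)
    for i in range(5):
        batcher.add({"event_id": i})
    batcher.flush()  # drain the remainder, e.g. on shutdown
    print([len(b) for b in batches])  # [2, 2, 1]
```

The `flush` callback is where the raw events would be transformed into rows and bulk-loaded, so the relational database sees a few large inserts instead of one insert per event.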

Page 13

Data Integration Solutions

Page 14

Data Integration Solutions

Custom Code
• Flexible but complex
• Leverages in-database processing
• Challenging to manage and maintain

ETL
• Simplified data transformation with no code
• Built-in dependency and error handling
• Reduces data volumes within EDW

ELT
• Leverages the benefits of ETL while shifting data processing to the EDW
• Requires tight integration between the data integration tool and the EDW
• Raw and transformed data in one place
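The ELT pattern can be sketched with `sqlite3` standing in for the EDW: load the raw rows untouched, then transform them with SQL inside the database, leaving raw and transformed tables side by side. The `raw_orders`/`orders` tables, the sample rows, and the cleansing rules are illustrative assumptions.

```python
import sqlite3

# Raw extract as delivered by the source: everything is text, formatting varies.
RAW_ROWS = [("1", " emea ", "100.50"), ("2", "apac", "n/a"), ("3", "Emea", "75")]


def run_elt(conn: sqlite3.Connection) -> None:
    # Load step: land the raw data in the warehouse untouched.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
    with conn:
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)
    # Transform step: cleanse and type the data with SQL inside the database.
    with conn:
        conn.execute("""
            CREATE TABLE orders AS
            SELECT CAST(order_id AS INTEGER) AS order_id,
                   UPPER(TRIM(region))       AS region,
                   CAST(amount AS REAL)      AS amount
            FROM raw_orders
            WHERE amount GLOB '[0-9]*'  -- crude check: amount starts with a digit
        """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_elt(conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Because `raw_orders` is kept, the transform can be corrected and rerun later without going back to the source, which is the practical payoff of having raw and transformed data in one place.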

Page 15

Data Integration Solutions

On-Premises
• Customer owns hardware and software installation/configuration
• No need to deal with the firewall to access local sources

Cloud
• Customer owns software installation/configuration, but not hardware
• Can run in customer VPC to provide direct access to data within the VPC or behind the firewall

SaaS
• Fully managed by service provider
• Configurable options vary by solution
• Must find secure ways to access data not stored inside the firewall

Page 16

Conclusion

Page 17

Cloud Data Warehousing Best Practices
• Leverage the scalable compute layer to do the bulk of the data processing
• Isolate load and transform jobs from queries to prevent resource contention
• Eliminate physical data marts by leveraging a scalable data platform
• QA is key: make sure all changes made to data integration tasks are tested before they roll to production
• When migrating, it is important to convert one source at a time