Big-Data Analytics Architecture for Businesses: Open ... · Big Data Processing Big Data...

Big-Data Analytics Architecture for Businesses: Open-source Perspective. Mert Onuralp Gökalp, Kerem Kayabay, Mohamed Zaki. Cambridge Service Alliance (CSA), Institute for Manufacturing, University of Cambridge. January 2018.


Page 1

Big-Data Analytics Architecture for Businesses: Open-source Perspective

Mert Onuralp Gökalp, Kerem Kayabay, Mohamed Zaki

Cambridge Service Alliance (CSA), Institute for Manufacturing, University of Cambridge

January 2018

Page 2

Why study open-source tools for Big Data?

• Open-source tools have become the standard Big Data processing platforms*
• The gap: Study the open-source tools considering both managerial and technical perspectives
• Some questions
  • Why do people prefer commercial solutions rather than open-source?
  • Do we need commercial Big Data solutions?

Page 3

The Big Tools Era

• Many tools continue to emerge at a fast pace to deal with big data
  • Characteristics: Volume, speed, diversity
  • Problems: Processing, storage, manipulation, aggregation, visualization
• Some tools only aim to analyse data in a certain domain
  • Internet of Things, Edge Computing
• Just by reviewing open-source tools, we came across 6500 such tools and filtered them down to 241

Page 4

Open-source big data tools

• Typically supported by companies that provide services over the Internet
  • Google, Yahoo, Twitter, LinkedIn
• To provide better services to their users and third-party customers
• The tools are made available to the IT industry as open-source tools
• They have become the standard big data processing platforms

Page 5

Some example tools

Big Data Processing | Big Data Characteristic | Tools and Technologies
Batch processing | Volume | Hadoop, Spark, Flink
Stream processing | Velocity | Storm, Samza, S4, Spark Streaming, Flink Streaming

Big Data Storage | Big Data Characteristic | Tools and Technologies
NoSQL | Variety | MongoDB, Cassandra, HBase, Redis
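The groupings on this slide can be captured as a small lookup from a big data characteristic to the example tools listed for it. The sketch below is only an illustration: the tool names and categories come from the slide, while the dictionary layout and the function name `tools_for` are assumptions introduced here.

```python
from __future__ import annotations

# Example tools as grouped on the slide. The dictionary structure and the
# helper name below are illustrative assumptions, not part of the study.
EXAMPLE_TOOLS = {
    "volume": {
        "category": "Batch processing",
        "tools": ["Hadoop", "Spark", "Flink"],
    },
    "velocity": {
        "category": "Stream processing",
        "tools": ["Storm", "Samza", "S4", "Spark Streaming", "Flink Streaming"],
    },
    "variety": {
        "category": "NoSQL storage",
        "tools": ["MongoDB", "Cassandra", "HBase", "Redis"],
    },
}


def tools_for(characteristic: str) -> list[str]:
    """Return the slide's example tools for a characteristic, or [] if unknown."""
    entry = EXAMPLE_TOOLS.get(characteristic.strip().lower())
    return list(entry["tools"]) if entry else []


if __name__ == "__main__":
    for name in ("Volume", "Velocity", "Variety"):
        print(name, "->", ", ".join(tools_for(name)))
```

A lookup like this is only a starting shortlist; the later slides argue that the final choice also depends on domain and firm-specific factors.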

Page 6

Choosing the right toolset

• Choices depend on the characteristics of data and domain of operation
• Businesses incur costs trying to adopt new technologies: technical debt
  • Training the workforce
  • Changing existing source code to run on newer versions
  • Changing the underlying toolset

Page 7

Why do firms outsource?

• Tracking the developments in this domain is hard
  • Most of the tools are unknown to the business world
• To avoid lagging behind the hype, BUT commercial solution providers
  • Rely on a subset of available open-source tools
  • Do not have the domain-specific expertise
  • Do not solve technical and soft challenges

Page 8

Aim of this study

• Systematically review the open-source tools in the big data domain
• Establish a method for tracking the developments for the open-source tools
• Build a reference open-source big data analytics architecture
• Analyse firms to give directions to businesses using the proposed architecture

Page 9

Tool selection process

Page 10

Some figures

Page 11

Open-source Architecture

Page 12

Distribution of Open-Source Tools

Page 13

How to choose a big data tool?

• We need to come up with criteria
• We can look at (113) real-world use cases, solution briefs, white papers & blog posts from a wide range of industries
  • Telecommunication, healthcare, banking & finance, manufacturing, transportation, energy
• Secondary data sets support the proposed big-data reference architecture

Page 14

Secondary use-case company distribution

Page 15

How to choose a big data tool?

• Timing requirement: Batch vs stream processing
• Data size: In-memory vs on-disk processing
• Platform independency: Interoperability of a big data tool
• Data storage model: Graph-based, key-value-based, document-based, time-series-based
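One way to make these four criteria concrete is to record them as a structured requirements profile before comparing tools. The following is a minimal sketch under stated assumptions: the type names (`Timing`, `StorageModel`, `Requirements`) and the `checklist` helper are hypothetical and are not taken from the study; only the four criteria and their options come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    """Timing requirement: batch vs stream processing."""
    BATCH = "batch"
    STREAM = "stream"


class StorageModel(Enum):
    """Data storage models named on the slide."""
    GRAPH = "graph-based"
    KEY_VALUE = "key-value-based"
    DOCUMENT = "document-based"
    TIME_SERIES = "time-series-based"


@dataclass(frozen=True)
class Requirements:
    """Hypothetical profile of a use case against the four criteria."""
    timing: Timing
    fits_in_memory: bool          # data size: in-memory vs on-disk processing
    needs_interoperability: bool  # platform independency
    storage_model: StorageModel


def checklist(req: Requirements) -> list[str]:
    """Render the profile as one line per criterion."""
    return [
        f"timing requirement: {req.timing.value} processing",
        f"data size: {'in-memory' if req.fits_in_memory else 'on-disk'} processing",
        f"platform independency: {'required' if req.needs_interoperability else 'not required'}",
        f"data storage model: {req.storage_model.value}",
    ]


if __name__ == "__main__":
    sensor_feed = Requirements(Timing.STREAM, True, False, StorageModel.TIME_SERIES)
    print("\n".join(checklist(sensor_feed)))
```

Writing the profile down first keeps the comparison between candidate tools tied to the use case rather than to tool popularity.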

Page 16

Problems of architecture development in big data

• Choosing the best tool
  • Abundance of tools
  • No single best tool
  • Maturity of a tool
• Domain-specific challenges
  • The gap between domain-specific knowledge and data science
• Firm-specific soft challenges

Page 17

Problems of architecture development in big data

• Firm-specific soft challenges
  • Managerial skills deep-rooted in an organization
  • Lack of data-driven organizational culture
  • Customers may not be able to perceive the value of big data

Page 18

To sum up

• Newer tools never cease to emerge in this domain
• We can foresee where the industry will focus research efforts
• Organizations should try to build their own big data architecture
  • Rely on open-source tools instead of imposed commercial solutions

Page 19

To sum up

• Organizations should try to build their own big data architecture
  • It is rewarding
  • Capture domain-specific knowledge
  • The process would build a data-driven culture and develop the right managerial skills
  • Better decision-making

Page 20

Thank you!

• Q&A

Page 21

Forthcoming Webinars

Date (14:30 hr BST) | Topic | Invited speaker
2018 Jan 15th | Big Data Analytics Architecture for Business | Mert / Kareem / Mohamed
Feb 12th | Digital Business Transformation and Strategy: What do we know so far? | Mariam Helmy Ismail Abdelaal
Mar 12th | Does buyers' dependence translate into financial performance? An empirical analysis of manufacturer-service provider relationships | Ornella Benedettini