1052SCBDA03 Social Computing and Big Data...


Transcript of 1052SCBDA03 Social Computing and Big Data...

Page 1: 1052SCBDA03 Social Computing and Big Data Analyticsmail.tku.edu.tw/myday/teaching/1052/SCBDA/1052SCBDA03_Social_Computing... · Spark, Mesos, Akka, Cassandra and Kafka (大數據處理平台SMACK:

Social Computing and Big Data Analytics

社群運算與大數據分析

1052SCBDA03 MIS MBA (M2226) (8606)

Wed, 8, 9 (15:10-17:00) (B505)

Min-Yuh Day 戴敏育

Assistant Professor 專任助理教授

Dept. of Information Management, Tamkang University 淡江大學 資訊管理學系

http://mail.tku.edu.tw/myday/

2017-03-01

Tamkang University

巨量資料基礎: MapReduce典範、Hadoop與Spark生態系統

(Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem)

Page 2:

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
1    2017/02/15   Course Orientation for Social Computing and Big Data Analytics (社群運算與大數據分析課程介紹)
2    2017/02/22   Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data (資料科學與大數據分析:探索、分析、視覺化與呈現資料)
3    2017/03/01   Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem (大數據基礎:MapReduce典範、Hadoop與Spark生態系統)

Page 3:

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
4    2017/03/08   Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka (大數據處理平台SMACK: Spark, Mesos, Akka, Cassandra, Kafka)
5    2017/03/15   Big Data Analytics with Numpy in Python (Python Numpy 大數據分析)
6    2017/03/22   Finance Big Data Analytics with Pandas in Python (Python Pandas 財務大數據分析)
7    2017/03/29   Text Mining Techniques and Natural Language Processing (文字探勘分析技術與自然語言處理)
8    2017/04/05   Off-campus study (教學行政觀摩日)

Page 4:

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
9    2017/04/12   Social Media Marketing Analytics (社群媒體行銷分析)
10   2017/04/19   期中報告 (Midterm Project Report)
11   2017/04/26   Deep Learning with Theano and Keras in Python (Python Theano 和 Keras 深度學習)
12   2017/05/03   Deep Learning with Google TensorFlow (Google TensorFlow 深度學習)
13   2017/05/10   Sentiment Analysis on Social Media with Deep Learning (深度學習社群媒體情感分析)

Page 5:

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
14   2017/05/17   Social Network Analysis (社會網絡分析)
15   2017/05/24   Measurements of Social Network (社會網絡量測)
16   2017/05/31   Tools of Social Network Analysis (社會網絡分析工具)
17   2017/06/07   Final Project Presentation I (期末報告 I)
18   2017/06/14   Final Project Presentation II (期末報告 II)

Page 6:

2017/03/01

巨量資料基礎: MapReduce典範、Hadoop與Spark生態系統

(Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem)

Page 7:

Architecture of Big Data Analytics

[Figure: four-stage architecture diagram. Labels recovered from the slide:]

Big Data Sources: internal; external; multiple formats; multiple locations; multiple applications

Big Data Transformation: middleware; extract, transform, load; data warehouse; traditional format (CSV, tables)

Big Data Platforms & Tools: Hadoop, MapReduce, Pig, Hive, Jaql, Zookeeper, HBase, Cassandra, Oozie, Avro, Mahout, others

Big Data Analytics Applications: queries, reports, OLAP, data mining

Flow labels: Raw Data; Transformed Data; Big Data Analytics

Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications

Page 8:

Business Intelligence (BI) Infrastructure

Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.

Page 9:

SAS® Within the HADOOP ECOSYSTEM

[Figure: layered diagram. Labels recovered from the slide:]

Layers: User Interface; Metadata; Data Access; Data Processing; File System

Users: SAS® User; Next-Gen SAS® User

Components: SAS® Enterprise Guide® (EG); SAS® Enterprise Miner™ (EM); SAS® Visual Analytics (VA); SAS® Data Integration; SAS Metadata; Base SAS & SAS/ACCESS® to Hadoop™; In-Memory Data Access; Impala; Hive; Pig; MapReduce; SAS Embedded Process Accelerators; MPI Based; SAS® LASR™ Analytic Server; SAS® High-Performance Analytic Procedures; SAS® In-Memory Statistics for Hadoop; HDFS

Source: Deepak Ramanathan (2014), SAS Modernization architectures - Big Data Analytics

Page 10:

Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem

Page 11:

Source: https://www.thalesgroup.com/en/worldwide/big-data/big-data-big-analytics-visual-analytics-what-does-it-all-mean

Page 12:

MapReduce Paradigm

Page 13:

MapReduce Paradigm

[Figure: Big Data is processed by map tasks Map0, Map1, Map2 and Map3, then by reduce tasks Reduce0, Reduce1, Reduce2 and Reduce3, producing Output Data. Other labels on the slide: Map; Reduce; MapReduce Data.]
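The figure's layout of several map tasks feeding several reduce tasks can be illustrated in plain Python. This is only a sketch under assumptions: the thread pool, the `partition` function, the sample words, and the "sum of word lengths per first letter" job are all hypothetical choices made for illustration, not anything taken from the slide or from Hadoop itself.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_REDUCERS = 4  # mirrors Reduce0..Reduce3 in the figure


def partition(key: str) -> int:
    """Choose the reduce task that owns a key (hypothetical stable hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS


def map_task(chunk):
    """Map: emit (key, value) pairs; here (first letter, word length)."""
    return [(word[0].lower(), len(word)) for word in chunk]


def reduce_task(bucket):
    """Reduce: sum the values collected for each key in one partition."""
    return {key: sum(values) for key, values in bucket.items()}


def map_reduce(chunks):
    # Run one map task per input chunk (Map0..Map3).
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        mapped = list(pool.map(map_task, chunks))

    # Route every pair to the reduce partition that owns its key.
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for pairs in mapped:
        for key, value in pairs:
            buckets[partition(key)][key].append(value)

    # Run one reduce task per partition and merge the outputs.
    output = {}
    for bucket in buckets:
        output.update(reduce_task(bucket))
    return output


# Illustrative input only: four chunks, one per map task.
chunks = [["hadoop", "hive"], ["spark", "storm"], ["hbase", "pig"], ["kafka", "mesos"]]
result = map_reduce(chunks)
```

Because each key is routed to exactly one partition, the reduce tasks never need to coordinate with each other, which is what lets them run independently.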

Page 14:

MapReduce Word Count

Input:

Dog Love Cat
Bird Love Bird
Dog Bird Cat

Source: https://www.edureka.co/blog/mapreduce-tutorial/

Page 15:

MapReduce Word Count

Input:

Dog Love Cat
Bird Love Bird
Dog Bird Cat

Output:

Bird, 3
Cat, 2
Dog, 2
Love, 2

Source: https://www.edureka.co/blog/mapreduce-tutorial/

Page 16:

MapReduce Word Count

Input: Dog Love Cat Bird Love Bird Dog Bird Cat

Split:
Dog Love Cat
Bird Love Bird
Dog Bird Cat

Map:
Dog,1  Love,1  Cat,1
Bird,1  Love,1  Bird,1
Dog,1  Bird,1  Cat,1

Shuffle:
Bird,(1,1,1)
Cat,(1,1)
Dog,(1,1)
Love,(1,1)

Reduce:
Bird,3
Cat,2
Dog,2
Love,2

Output: Bird,3  Cat,2  Dog,2  Love,2

Source: https://www.edureka.co/blog/mapreduce-tutorial/
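The Split, Map, Shuffle and Reduce stages of the word-count slide can be traced in a few lines of plain Python. This is a single-process sketch for following the data, not Hadoop code; the function names are hypothetical, and the input is the slide's own nine words.

```python
from collections import defaultdict

text = "Dog Love Cat\nBird Love Bird\nDog Bird Cat"


def split_phase(data):
    """Split: one record per input line."""
    return data.splitlines()


def map_phase(record):
    """Map: emit (word, 1) for every word in a record."""
    return [(word, 1) for word in record.split()]


def shuffle_phase(mapped_records):
    """Shuffle: group the emitted 1s by word, sorted by key."""
    groups = defaultdict(list)
    for pairs in mapped_records:
        for word, one in pairs:
            groups[word].append(one)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the grouped values for each word."""
    return {word: sum(ones) for word, ones in groups.items()}


records = split_phase(text)
mapped = [map_phase(r) for r in records]
shuffled = shuffle_phase(mapped)
counts = reduce_phase(shuffled)

for word, n in counts.items():
    print(f"{word},{n}")
```

Printing `shuffled` before the reduce step shows the intermediate groups such as `'Bird': [1, 1, 1]`, which correspond to the Shuffle column of the slide.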

Page 17:

Hadoop Ecosystem

Page 18:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

Source: http://hadoop.apache.org/

Page 19:

MapReduce: Processing

HDFS: Storage

Source: http://hadoop.apache.org/

Page 20:

Big Data with Hadoop Architecture

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 21:

Big Data with Hadoop Architecture

Logical Architecture

Processing: MapReduce

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 22:

Big Data with Hadoop Architecture

Logical Architecture

Storage: HDFS

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 23:

Big Data with Hadoop Architecture

Process Flow

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 24:

Big Data with Hadoop Architecture

Hadoop Cluster

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 25:

Hadoop Ecosystem

Source: https://savvycomsoftware.com/what-you-need-to-know-about-hadoop-and-its-ecosystem/

Page 26:

Hadoop Ecosystem

Source: Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing

Page 27:

HDP (Hortonworks Data Platform)

A Complete Enterprise Hadoop Data Platform

Source: http://hortonworks.com/hdp/

Page 28:

Apache Hadoop Hortonworks Data Platform

Source: http://hortonworks.com/hdp/

Page 29:

Hadoop and Data Analytics Tools

Source: http://hortonworks.com/hdp/

Page 30:

Hadoop 1 → Hadoop 2

Source: http://hortonworks.com/hadoop/tez/

Page 31:

Big Data Solution

Figure labels: EG, EM, VA

Source: http://www.newera-technologies.com/big-data-solution.html

Page 32:

Traditional ETL Architecture

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 33:

Offload ETL with Hadoop (Big Data Architecture)

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 34:

Spark Ecosystem

Page 35:

Apache Spark is a fast and general engine for large-scale data processing.

Lightning-fast cluster computing

Source: http://spark.apache.org/

Page 36:

Logistic regression in Hadoop and Spark

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

Source: http://spark.apache.org/

Page 37:

Ease of Use

• Write applications quickly in Java, Scala, Python, R.

Source: http://spark.apache.org/

Page 38:

Word count in Spark's Python API

text_file = spark.textFile("hdfs://...")  # assumes `spark` is a SparkContext
text_file.flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

Source: http://spark.apache.org/

Page 39:

Spark and Hadoop

Source: http://spark.apache.org/

Page 40:

Spark Ecosystem

Source: http://spark.apache.org/

Page 41:

Spark Ecosystem

Source: https://databricks.com/spark/about

Page 42:

Spark Ecosystem

[Figure: Spark and related components. Labels recovered from the slide:]

Spark modules: GraphX (graph); Spark SQL; MLlib (machine learning); Spark Streaming

Related systems: Kafka; Flume; H2O; Hive; Cassandra; Titan; HBase; HDFS

Source: Mike Frampton (2015), Mastering Apache Spark, Packt Publishing

Page 43:

SMACK Stack

• Spark – fast and general engine for distributed, large-scale data processing

• Mesos – cluster resource management system that provides efficient resource isolation and sharing across distributed applications

• Akka – a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM

• Cassandra – distributed, highly available database designed to handle large amounts of data across multiple datacenters

• Kafka – a high-throughput, low-latency distributed messaging system designed for handling real-time data feeds

Source: Anton Kirillov (2015), Data processing platforms architectures with Spark, Mesos, Akka, Cassandra and Kafka, Big Data AW Meetup

Page 44:

Hadoop vs. Spark

[Figure: two iteration pipelines, each starting from Input and running Iter. 1 and Iter. 2. Labeled steps on the slide: HDFS read, HDFS write.]

Source: Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing
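One way to read the Hadoop vs. Spark comparison is that chained MapReduce jobs write intermediate results to HDFS and read them back for the next iteration, while Spark can keep them in memory. The toy model below only counts storage operations under that assumption. A local temporary file stands in for HDFS, and `step`, `run_with_disk` and `run_in_memory` are hypothetical names; it is not a benchmark and uses neither Hadoop nor Spark.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a hypothetical iterative job: halve every value."""
    return [v / 2 for v in values]


def run_with_disk(values, iterations, workdir):
    """MapReduce-style chaining: reload and persist data around every iteration."""
    io_ops = 0
    path = os.path.join(workdir, "data.json")
    with open(path, "w") as f:      # the initial input is placed in storage
        json.dump(values, f)
    for _ in range(iterations):
        with open(path) as f:       # stands in for an HDFS read
            current = json.load(f)
        io_ops += 1
        current = step(current)
        with open(path, "w") as f:  # stands in for an HDFS write
            json.dump(current, f)
        io_ops += 1
    with open(path) as f:           # fetch the final result (not counted)
        return json.load(f), io_ops


def run_in_memory(values, iterations):
    """Spark-style chaining: read the input once, keep intermediates in memory."""
    io_ops = 1                      # single read of the input
    current = list(values)
    for _ in range(iterations):
        current = step(current)
    return current, io_ops


data = [8.0, 16.0, 32.0]
with tempfile.TemporaryDirectory() as tmp:
    disk_result, disk_io = run_with_disk(data, iterations=2, workdir=tmp)
mem_result, mem_io = run_in_memory(data, iterations=2)
```

Both runs compute the same values; in this model the disk-based run's storage operations grow with the number of iterations, while the in-memory run's count stays constant.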

Page 45:

Hadoop Distribution

• Apache Hadoop – http://hadoop.apache.org/

• Amazon Elastic MapReduce (EMR) – https://aws.amazon.com/emr/

• Cloudera CDH – https://www.cloudera.com/downloads.html

• Hortonworks Sandbox – https://hortonworks.com/products/sandbox/

Page 46:

Steps to Install Hadoop on a Personal Computer (Windows / OS X)

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 47:

Hadoop: Linux-Based Software

[Figure: four elements, each labeled LINUX.]

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 48:

Appliance

Hadoop

Linux

Virtual Machine (VirtualBox / VMware)

Personal Computer (Windows / OS X)

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 49:

Connection to Hadoop

Hadoop

Linux

Virtual Machine (VirtualBox / VMware)

Personal Computer (Windows / OS X)

Browser: access from host

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 50:

Steps to Install Hadoop on a Personal Computer (Windows / OS X)

Step 1. Download and Install VirtualBox

Step 2. Download Appliance

Step 3. Import Appliance

Step 4. Configure Virtual Machine (VM)

Step 5. Start Virtual Machine (VM)

Step 6. Test Connection From Host

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5
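Step 6 (Test Connection From Host) is normally done by opening a browser on the host, but the same check can be scripted. The sketch below is one possible approach using only the Python standard library. The address `127.0.0.1` and port `8888` are assumptions for illustration: the real values depend on how port forwarding is configured for the imported appliance.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical values: adjust to the VM's forwarded address and port.
    SANDBOX_HOST = "127.0.0.1"
    SANDBOX_PORT = 8888
    status = "reachable" if can_connect(SANDBOX_HOST, SANDBOX_PORT) else "not reachable"
    print(f"{SANDBOX_HOST}:{SANDBOX_PORT} is {status}")
```

A successful TCP connection only shows that something is listening on the forwarded port; loading the page in a browser remains the real end-to-end test.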

Page 51:

VirtualBox

https://www.virtualbox.org/

Page 52:

Steps to Install Hadoop on a Personal Computer (Windows / OS X)

Step 1. Download and Install VirtualBox

Step 2. Download Appliance

Step 3. Import Appliance

Step 4. Configure Virtual Machine (VM)

Step 5. Start Virtual Machine (VM)

Step 6. Test Connection From Host

Hortonworks Sandbox

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 53:

Hortonworks Sandbox

The easiest way to get started with Enterprise Hadoop

http://hortonworks.com/products/hortonworks-sandbox/#install

Page 54:

Get started on Hadoop with these tutorials based on the Hortonworks Sandbox

http://hortonworks.com/tutorials/

Page 55:

Apache Hadoop

http://hadoop.apache.org/

Page 56:

Apache Hadoop

http://hadoop.apache.org/releases.html#Download

Page 57:

Apache Hadoop YARN

Source: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

Page 58:

Apache Spark

http://spark.apache.org/

Page 59:

Source: http://mattturck.com/2016/02/01/big-data-landscape/

Page 60:

References

• EMC Education Services (2015), Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, Wiley

• Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing

• Mike Frampton (2015), Mastering Apache Spark, Packt Publishing

• Deepak Ramanathan (2014), SAS Modernization architectures - Big Data Analytics, http://www.slideshare.net/deepakramanathan/sas-modernization-architectures-big-data-analytics