Big data – a brief overview


Page 1: Big data – a brief overview

Big Data – A Brief Overview

Petabytes, Hadoop, Analytics, Collaborative business intelligence, Data scientists, In-Memory Databases, NoSQL platforms

Page 2: Big data – a brief overview

Big Data

• What is it?
• Where does it come from?
• How do we process it?
• What do we do with it?
• Who are the players?
• What are the opportunities?

Page 3: Big data – a brief overview

What Is Big Data?

Like the term "cloud", it is a bit nebulous

Page 4: Big data – a brief overview
Page 5: Big data – a brief overview

Attributes of Big Data

• Volume
• Velocity (streaming)
• Variety

Page 6: Big data – a brief overview

Where Does It Come From?

It Depends

Page 7: Big data – a brief overview

Key Drivers

Spread of cloud computing, mobile computing and social media technologies, financial transactions

Page 8: Big data – a brief overview

Sources of Big Data

• Chatter from social networks
• Web server logs
• Traffic flow sensors
• Satellite imagery
• Broadcast audio streams
• Banking transactions
• MP3s of rock music
• The content of web pages
• Scans of government documents
• GPS trails
• Telemetry from automobiles
• Financial market data
• …

Page 9: Big data – a brief overview
Page 10: Big data – a brief overview
Page 11: Big data – a brief overview

How Do We Process It?

Page 12: Big data – a brief overview

Source: http://radar.oreilly.com

Process Pipeline

Page 13: Big data – a brief overview

Hadoop

A distributed processing framework based on MapReduce
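
To make the map/reduce idea concrete, here is a minimal single-machine sketch of the classic word-count job in plain Python. It does not use Hadoop or its API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are illustrative assumptions. On a real cluster the framework runs the map and reduce steps in parallel across many machines and performs the shuffle itself.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input records
counts = word_count(["big data is big", "data about data"])
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread across machines without coordination.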

Page 14: Big data – a brief overview

Pig

A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
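
Programs in such a language are typically written as a sequence of named data transformations. The Python sketch below imitates that filter, group, aggregate style on a small in-memory list. It is not Pig Latin and does not use Pig; the log records and variable names are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical web-log records: (user, url, http_status)
records = [
    ("alice", "/home", 200),
    ("bob", "/home", 404),
    ("alice", "/cart", 200),
    ("carol", "/home", 200),
]

# Step 1 (filter): keep only successful requests
ok = [r for r in records if r[2] == 200]

# Step 2 (group): collect rows by URL; groupby needs input sorted on the key
ok_sorted = sorted(ok, key=itemgetter(1))
grouped = {url: list(rows) for url, rows in groupby(ok_sorted, key=itemgetter(1))}

# Step 3 (aggregate): count rows in each group
hits_per_url = {url: len(rows) for url, rows in grouped.items()}
print(hits_per_url)
```

Each step names an intermediate data set, which is what lets an execution engine plan the whole flow before running it.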

Page 15: Big data – a brief overview

Mahout

A machine learning library with algorithms for clustering, classification and batch-based collaborative filtering that are implemented on top of Apache Hadoop.
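
As an illustration of collaborative filtering, the sketch below scores items a user does not yet have by how often they co-occur with items the user already has. This is a simplified, single-machine version of the item co-occurrence idea, not Mahout's API; the purchase history and the `recommend` function are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase history: user -> set of items
history = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C", "D"},
    "u4": {"A", "D"},
}

def cooccurrence(history):
    """Count, for every ordered item pair, how many users have both items."""
    counts = Counter()
    for items in history.values():
        for a, b in permutations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

def recommend(user, history, top_n=2):
    """Rank items the user lacks by total co-occurrence with items the user has."""
    co = cooccurrence(history)
    owned = history[user]
    all_items = set().union(*history.values())
    scores = {
        candidate: sum(co[(item, candidate)] for item in owned)
        for candidate in all_items - owned
    }
    # Highest score first; ties broken by item name for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

print(recommend("u2", history))
```

Counting co-occurrences is itself a batch job over all users, which is why this family of algorithms fits a Hadoop-style pipeline.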

Page 16: Big data – a brief overview

Hive

Data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage.
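
Hive queries are written in a SQL-like language. The sketch below shows the kind of aggregate query a warehouse answers, using Python's built-in `sqlite3` module purely as a local stand-in. It does not connect to Hive, and the `page_views` table and its rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a data warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")],
)

# Declarative aggregate: views per URL, most viewed first
rows = conn.execute(
    "SELECT url, COUNT(*) AS views "
    "FROM page_views GROUP BY url ORDER BY views DESC, url"
).fetchall()
conn.close()
print(rows)
```

The appeal is that the analyst states what to compute and leaves the distributed execution to the engine.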

Page 17: Big data – a brief overview

Pegasus

A peta-scale graph mining system that runs in a parallel, distributed manner on top of Hadoop
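
A typical graph mining task is ranking nodes with PageRank, which can be computed by repeatedly passing rank along edges. The sketch below is a small single-machine version in plain Python, assuming a damping factor of 0.85 and a toy four-node graph. It is not Pegasus code; a peta-scale system would distribute each iteration across a cluster.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iteratively redistribute rank along edges until it settles."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical graph: a -> b -> c -> a, plus d -> c
ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])
print(max(ranks, key=ranks.get))
```

Each iteration is one pass over all edges, so the work per round grows with the size of the graph and parallelizes naturally.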

Page 18: Big data – a brief overview

Sqoop

A tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.

Page 19: Big data – a brief overview

Flume

A distributed service for collecting, aggregating, and moving large amounts of log data to HDFS.

Page 20: Big data – a brief overview

Yahoo S4

S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

Page 21: Big data – a brief overview

Twitter Storm

Storm can be used to process a stream of new data and update databases in real time.
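
The difference from batch processing is that results are updated as each event arrives. The sketch below mimics that with a Python generator as the event source and a `Counter` as a stand-in for the database. Storm calls its sources "spouts" and its processing steps "bolts", but the function names and tweets here are hypothetical and no Storm API is used.

```python
from collections import Counter

def tweet_spout(tweets):
    """Stand-in for a stream source: yields one event at a time."""
    for tweet in tweets:
        yield tweet

def hashtag_bolt(tweet):
    """Processing step: extract lower-cased hashtags from one event."""
    return [word.lower() for word in tweet.split() if word.startswith("#")]

def run_topology(stream, store):
    """Update the store after every event and yield a snapshot of it."""
    for tweet in stream:
        for tag in hashtag_bolt(tweet):
            store[tag] += 1
        yield dict(store)

# Hypothetical events
tweets = ["loving #BigData", "#hadoop and #bigdata", "no tags here"]
snapshots = list(run_topology(tweet_spout(tweets), Counter()))
print(snapshots[-1])
```

The store is correct after every event, so a reader of the database never has to wait for a batch job to finish.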

Page 22: Big data – a brief overview

Trends

Funding, Companies, Applications, Jobs, IPOs

Page 23: Big data – a brief overview

Funding & IPO

• Cloudera (commercial Hadoop) has raised more than $75 million
• MapR (Cloudera competitor) has raised more than $25 million
• 10gen (maker of MongoDB) has raised $32 million
• DataStax (products based on Apache Cassandra) has raised $11 million
• Splunk raised about $230 million through its IPO

Page 24: Big data – a brief overview
Page 25: Big data – a brief overview
Page 26: Big data – a brief overview

Big Data Application Domains

• Healthcare
• The public sector
• Retail
• Manufacturing
• Personal-location data
• Finance

Page 27: Big data – a brief overview

A Few Examples

Page 28: Big data – a brief overview
Page 29: Big data – a brief overview
Page 30: Big data – a brief overview

PayPal Tracking Architecture

Page 31: Big data – a brief overview

Market and Market Segments

Research Data and Predictions

Page 32: Big data – a brief overview
Page 33: Big data – a brief overview

http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues

Page 34: Big data – a brief overview

The market for big data tools is projected to rise from $9 billion to $86 billion by 2020

Page 35: Big data – a brief overview

http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues

Page 36: Big data – a brief overview
Page 37: Big data – a brief overview

Future of Big Data

• More Powerful and Expressive Tools for Analysis
• Streaming Data Processing (Storm from Twitter and S4 from Yahoo)
• Rise of Data Market Places (InfoChimps, Azure Marketplace)
• Development of Data Science Workflows and Tools (Chorus, The Guardian, New York Times)
• Increased Understanding of Analysis and Visualization

http://www.evolven.com/blog/big-data-predictions.html

Page 38: Big data – a brief overview

http://www.evolven.com/blog/big-data-predictions.html

Page 39: Big data – a brief overview

Opportunities

Page 40: Big data – a brief overview

Skills Gap

• Statistics
• Operations Research
• Math
• Programming
• So-called "Data Hacking"

Page 41: Big data – a brief overview
Page 42: Big data – a brief overview