Big Data - Gerami

Mohammad Reza Gerami

description

Big Data

Transcript of Big Data - Gerami

Page 1: Big Data - Gerami


1

Page 2: Big Data - Gerami

2

Page 3: Big Data - Gerami

3

Page 4: Big Data - Gerami

• ‘Big Data’ is similar to ‘small data’, but bigger

• …but bigger data requires different approaches:

  • Techniques, tools and architecture

• …with the aim of solving new problems

  • …or old problems in a better way

4

Page 5: Big Data - Gerami

5

Page 6: Big Data - Gerami

Characteristics of Big Data: 1-Scale (Volume)

• Data Volume

Exponential increase in collected/generated data

6

Page 7: Big Data - Gerami

Big Data in Today’s Business and Technology Environment

2.7 zettabytes of data exist in the digital universe today. (Source)

235 terabytes of data had been collected by the U.S. Library of Congress as of April 2011. (Source)

The Obama administration is investing $200 million in big data research projects. (Source)

IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day. (Source)

Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data. (Source)

Akamai analyzes 75 million events per day to better target advertisements. (Source)

94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data. (Source)

7

Page 8: Big Data - Gerami

Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data. (Source)

More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide. (Source)

Decoding the human genome originally took 10 years to process; now it can be achieved in one week. (Source)

In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day. (Source)

The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique

8

Page 9: Big Data - Gerami

The Rapid Growth of Unstructured Data

YouTube users upload 48 hours of new video every minute of the day. (Source)

571 new websites are created every minute of the day. (Source)

Brands and organizations on Facebook receive 34,722 Likes every minute of the day. (Source)

100 terabytes of data are uploaded daily to Facebook. (Source)

According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day, and has more than 465 million accounts. (Source)

30 billion pieces of content are shared on Facebook every month. (Source)

Data production will be 44 times greater in 2020 than it was in 2009. (Source)

9
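The per-minute figures on this slide are easier to compare once scaled to a full day, and the "44 times greater" claim can be restated as a yearly rate. The sketch below is only illustrative arithmetic on the quoted numbers; the function names are made up for this example, and the growth calculation assumes a constant compound rate from 2009 to 2020, which the source does not state.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Per-minute figures as quoted on the slide.
per_minute = {
    "hours of YouTube video uploaded": 48,
    "new websites created": 571,
    "Likes for brands and organizations on Facebook": 34_722,
}


def per_day(rate_per_minute: int) -> int:
    """Scale a per-minute rate to a full day."""
    return rate_per_minute * MINUTES_PER_DAY


def implied_annual_growth(total_factor: float, years: int) -> float:
    """Yearly multiplier that compounds to total_factor over `years` years.

    Assumes constant compound growth (an assumption, not a sourced fact).
    """
    return total_factor ** (1 / years)


for label, rate in per_minute.items():
    print(f"{label}: {per_day(rate):,} per day")

# 44x between 2009 and 2020 spans 11 years.
growth = implied_annual_growth(44, 2020 - 2009)
print(f"implied growth: roughly {(growth - 1) * 100:.0f}% per year")
```

Under these assumptions, 34,722 Likes per minute works out to just under 50 million per day, and a 44-fold increase over 11 years corresponds to roughly 41% growth per year.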

Page 10: Big Data - Gerami

The Rapid Growth of Unstructured Data

In late 2011, IDC Digital Universe published a report indicating that some 1.8 zettabytes of data would be created that year. (Source)

In other words, the amount of data in the world today is equal to:

Every person in the US tweeting three tweets per minute for 26,976 years.

Every person in the world having more than 215 million high-resolution MRI scans a day.

More than 200 billion HD movies, which would take a person 47 million years to watch.

10

Page 12: Big Data - Gerami

Social media and networks (all of us are generating data)

Scientific instruments (collecting all sorts of data)

Mobile devices (tracking all objects all the time)

Sensor technology and networks (measuring all kinds of data)

12

Page 13: Big Data - Gerami

• No single standard definition…

Big Data

13

Page 14: Big Data - Gerami

14

Page 15: Big Data - Gerami

15

Page 16: Big Data - Gerami

What to do with these data?

16

Page 17: Big Data - Gerami

How much data?

640K ought to be enough for anybody.

17

Page 18: Big Data - Gerami

Why Big Data

• Key enablers of the emergence and growth of Big Data are:

  – Increase of storage capacities

  – Increase of processing power

  – Availability of data

  – Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone
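To put "2.5 quintillion bytes" into the units used elsewhere in this deck, a small conversion helper is enough. This is a sketch using decimal (SI) prefixes, where each step is a factor of 1,000; the helper name `humanize` is hypothetical. The last lines work out what "90% created in the last two years" implies if growth were constant, which is an assumption rather than something the slide states.

```python
# Decimal (SI) prefixes: each step up is a factor of 1,000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps the value < 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


daily = 2.5e18  # 2.5 quintillion bytes per day, as quoted on the slide
print(humanize(daily))        # 2.5 EB
print(humanize(20_000e12))    # Google's 20,000 TB per day -> 20 PB

# If 90% of all data is less than two years old, the total is 10x what it
# was two years ago. Assuming constant growth, that is sqrt(10) per year.
yearly_factor = 10 ** 0.5
print(f"about {yearly_factor:.2f}x per year")
```

So the daily figure is 2.5 exabytes, and the "90% in two years" claim corresponds to the world's data roughly tripling every year under the constant-growth assumption.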

18

Page 19: Big Data - Gerami

Big Data Analytics

• Examining large amounts of data

• Extracting appropriate information

• Identification of hidden patterns, unknown correlations

• Competitive advantage

• Better business decisions: strategic and operational

• Effective marketing, customer satisfaction, increased revenue

19

Page 20: Big Data - Gerami

Applications for Big Data Analytics

Homeland Security

Finance

Smarter Healthcare

Multi-channel sales

Telecom

Manufacturing

Traffic Control

Trading Analytics

Fraud and Risk

Log Analysis

Search Quality

Retail: Churn, NBO

20

Page 21: Big Data - Gerami

Healthcare

• 80% of medical data is unstructured and is clinically relevant

• Data resides in multiple places, such as individual EMRs, lab and imaging systems, physician notes, medical correspondence, claims, etc.

• Leveraging Big Data

  • Build sustainable healthcare systems

  • Collaborate to improve care and outcomes

  • Increase access to healthcare

21

Page 22: Big Data - Gerami

Market Size

Source: Wikibon, Taming Big Data

By 2015, 4.4 million IT jobs in Big Data; 1.9 million of them in the US alone

22

Page 23: Big Data - Gerami

Potential Talent Pool - Big Data

India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.

23

Page 24: Big Data - Gerami

24

Page 25: Big Data - Gerami

Future of Big Data

25

Page 26: Big Data - Gerami

Big Data Analytics Technologies

NoSQL: non-relational or at least non-SQL database solutions such as HBase (also a part of the Hadoop ecosystem), Cassandra, MongoDB, Riak, CouchDB, and many others.

Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and a whole host of others.
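The MapReduce model mentioned above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and is not Hadoop's API; the function names are made up for this illustration. It only shows the three conceptual phases (map, shuffle, reduce) that a framework such as Hadoop would distribute across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases over an iterable of text lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big Data analytics"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle run in parallel on different machines, which is what makes the model suit scale-out clusters.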

26

Page 27: Big Data - Gerami

Main Big Data Technologies

Hadoop, NoSQL Databases, Analytic Databases

Hadoop

• Low cost, reliable scale-out architecture

• Distributed computing

• Proven success in Fortune 500 companies

• Exploding interest

NoSQL Databases

• Huge horizontal scaling and high availability

• Highly optimized for retrieval and appending

• Types

  • Document stores

  • Key-value stores

  • Graph databases
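The three NoSQL types listed above differ mainly in the shape of the data they store. As a rough sketch, each model can be mimicked with plain Python structures; all records and function names here are hypothetical, and real products (for example the databases named earlier in this deck) add persistence, distribution and query languages on top.

```python
# Key-value store: an opaque value looked up by a single key.
kv_store = {"session:42": "user=alice;cart=3"}

# Document store: self-describing nested records that can be queried by field.
documents = [
    {"_id": 1, "name": "alice", "tags": ["retail", "mobile"]},
    {"_id": 2, "name": "bob", "tags": ["telecom"]},
]


def find_by_tag(docs: list[dict], tag: str) -> list[dict]:
    """Return every document whose 'tags' field contains the given tag."""
    return [doc for doc in docs if tag in doc.get("tags", [])]


# Graph database: nodes plus explicit relationships between them.
follows = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}


def friends_of_friends(graph: dict[str, set[str]], start: str) -> set[str]:
    """Nodes exactly two hops from start, excluding start and its direct links."""
    direct = graph.get(start, set())
    two_hops = {fof for friend in direct for fof in graph.get(friend, set())}
    return two_hops - direct - {start}


print(kv_store["session:42"])
print(find_by_tag(documents, "retail"))
print(friends_of_friends(follows, "alice"))
```

The key-value lookup needs the exact key, the document query inspects fields inside each record, and the graph query follows relationships, which is the kind of access each store type is optimized for.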

Analytic RDBMS

• Optimized for bulk-load and fast aggregate query workloads

• Types

  • Column-oriented

  • MPP

  • In-memory
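Why a column-oriented layout helps aggregate queries can be shown with a toy example. The sketch below stores the same hypothetical sales records row-wise and column-wise; it is an in-memory illustration only, not how any particular analytic database is implemented.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"store": "A", "item": "tv", "amount": 300.0},
    {"store": "A", "item": "cable", "amount": 20.0},
    {"store": "B", "item": "radio", "amount": 80.0},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    """Pivot row records into one list per column."""
    return {key: [record[key] for record in records] for key in records[0]}


def total_from_rows(records: list[dict]) -> float:
    """Aggregate by visiting every full record and picking out one field."""
    return sum(record["amount"] for record in records)


def total_from_columns(columns: dict[str, list]) -> float:
    """Aggregate by scanning only the single column the query needs."""
    return sum(columns["amount"])


columns = to_columns(rows)
print(total_from_rows(rows), total_from_columns(columns))
```

Both functions return the same total, but the column version touches only the `amount` values. On disk, that means an aggregate query can skip the bytes of every other column, which is one reason column stores suit the fast aggregate workloads listed above.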

27

Page 28: Big Data - Gerami

Thank you

More info:

www.aryatadbir.com

28

Page 29: Big Data - Gerami

29