Peter Elleby - Big Data, Big Noise, Big Hope - No Miracles

Presentation by Peter Elleby of Greenlight at our Big Data breakfast conference.

Page 1

Peter Elleby

Greenlight

‘Big Data, Big Noise, Big Hope – No Miracles’

Page 2

Big Data, Big Noise, Big Hope – No Miracles

27/06/2013

Page 3

Big Data - Volume, Velocity, Variety

An American creates about 4 lb of rubbish every day.

If the rest of the world produced as much, this would be roughly 10 million tons daily, or about 4 billion tons annually.
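As a quick check on the scale, the annual total follows from the daily figure by multiplying by 365. The sketch below takes the slide's 10M tons per day as given rather than re-deriving it from population figures; it comes to about 3.65 billion tons a year.

```python
# Daily figure quoted on the slide (tons per day); taken as given, not re-derived.
DAILY_TONS = 10_000_000
DAYS_PER_YEAR = 365

annual_tons = DAILY_TONS * DAYS_PER_YEAR
print(f"{annual_tons:,} tons per year")  # 3,650,000,000 tons per year
```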

Page 4

How do you define “Big Data”?

Applications involving collections of data of a size that makes them impossible to process in a cost-effective manner using traditional database management tools and data processing applications.

Page 5

Traditional Data Management and Data Processing

Aspect       OLTP                  OLAP
Application  Operational           Decision Support
Horizon      Days & Weeks          Months & Years
Refresh      Immediate             Periodic
Data Model   Entity-Relationship   Multi-Dimensional
Schema       Normalized            Star (de-normalized)
Emphasis     Update                Retrieval
Space        Small                 Large (History)

Page 6

Core Big Data Strategies

• Distribution of Data
  • Network of Lower Cost Devices

• Compression of Data
  • Using Processing Power to Reduce Bandwidth Requirements

• Representation of Data
  • Focus on Algorithm rather than Data Model

• Change of Emphasis
  • From Completeness to Relevancy
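The compression strategy above trades CPU time for bytes on the wire. A minimal sketch of that trade-off, using Python's standard zlib module; the record layout and values are hypothetical and chosen only because repetitive data compresses well:

```python
import zlib

# Hypothetical payload: repetitive log-style records (not taken from the talk).
records = [f"2013-06-27,keyword-{i % 50},position={i % 10}\n" for i in range(10_000)]
raw = "".join(records).encode("utf-8")

# Spend processing power once to shrink what has to be sent or stored.
compressed = zlib.compress(raw, 6)
restored = zlib.decompress(compressed)

print(f"raw bytes:        {len(raw):,}")
print(f"compressed bytes: {len(compressed):,}")
print(f"ratio:            {len(raw) / len(compressed):.1f}x")
```

The ratio depends entirely on how repetitive the data is, so real workloads should be measured rather than assumed to behave like this sample.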

Page 7

Big Data Application - Hydra

Page 8

Big Data Application Characteristics - Hydra

• Time Series Data
  • Storage of State versus Events

• Data Aggregation
  • Statistical Significance

• Dynamic Clustering
  • Ontologies of Keywords and Phrases

• Data Refinement
  • Statistical Process Control and Regression Modelling
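Statistical process control, mentioned above, can be illustrated with a simple control chart: estimate limits from a baseline period and flag later readings that fall outside them. This is a generic sketch and not a description of how Hydra works; the function names, the three-sigma rule, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) limits as mean +/- sigmas * sample std deviation."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread

def out_of_control(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical readings: a stable baseline, then new points to check.
baseline = [102, 98, 101, 99, 100, 103, 97, 100, 101, 99]
new_points = [100, 104, 96, 125, 99]

lower, upper = control_limits(baseline)
flagged = out_of_control(new_points, lower, upper)
print(f"limits: {lower:.2f} .. {upper:.2f}")
print(flagged)  # [(3, 125)]
```

Points inside the limits are treated as ordinary noise; only the reading of 125 would be passed on for closer inspection.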

Page 9

Brewer's Theorem (the CAP Theorem)

The CAP theorem states that any networked shared-data system can have at most two of three desirable properties:

• consistency (C), equivalent to having a single up-to-date copy of the data
• high availability (A) of that data (for updates)
• tolerance to network partitions (P)

“sacrifice consistency to gain faster responses in a more scalable manner”
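The trade-off in that quote can be made concrete with a toy model. The sketch below is an illustration only: `TwoNodeStore`, its `mode` flag, and the "CP"/"AP" labels are hypothetical names, and a real system would replicate over a network rather than in one process. During a partition, the CP store refuses writes to stay consistent, while the AP store keeps accepting them and lets the replicas diverge.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None

class TwoNodeStore:
    """Toy two-replica store; mode is 'CP' or 'AP' (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, replica, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by giving up availability.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # Stay available by giving up consistency: accept locally only.
            replica.value = value
            return
        # No partition: both replicas are updated together.
        self.a.value = self.b.value = value

    def read(self, replica):
        return replica.value

ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")
print(ap.read(ap.a), ap.read(ap.b))  # v2 v1 (replica b is stale)

cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
    cp_refused = False
except RuntimeError as err:
    cp_refused = True
    print("CP store refused the write:", err)
```

The AP store answers immediately but a reader on replica b sees old data until the partition heals; whether that is acceptable depends on the application.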

Page 10

A Practical Everyday Example

[Diagram with elements labelled S1, S2, ..., SN]

Page 11

The Takeaways

• The Aims of your Application determine whether you are dealing with Big Data

• The frameworks or technologies best suited to achieve your goals are determined by your application