Think Big - How to Design a Big Data Information Architecture

Grab some coffee and enjoy the pre-show banter before the top of the hour!

“Think Big: How to Design a Big Data Information Architecture” Exploratory Webcast | January 22, 2014

Guests

Robin Bloor, Chief Analyst, The Bloor Group, @robinbloor

Eric Kavanagh, CEO, The Bloor Group, @eric_kavanagh

Big Data Information Architecture

Exploratory Webcast: January 22, 2014

Roundtable Webcast: April 9, 2014

Findings Webcast: June 25, 2014

#BigDataArch

Big Data Information Architecture

In Three Segments

PART ONE: The Big Data Curve?

PART TWO: Technology Disruption

PART THREE: Data Flow

Part 1: The Big Data Curve

The Visible “Big Data” Trend

•  Corporate data volumes grow at about 55% per annum, i.e. exponentially

•  Data has been growing at this rate for perhaps 40 years

•  There is nothing new about big data. It clings to an established exponential trend
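To make the compounding concrete, here is a minimal sketch that takes the slide's two figures (about 55% per annum, for perhaps 40 years) at face value. The function names are illustrative and the constant-rate assumption is a simplification, not something the deck states.

```python
import math

ANNUAL_GROWTH = 0.55  # from the slide: about 55% per annum
YEARS = 40            # from the slide: "maybe 40 years"


def growth_factor(rate: float, years: float) -> float:
    """Total multiplication of volume after compounding `rate` for `years`."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for volume to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    print(f"doubling time: {doubling_time(ANNUAL_GROWTH):.2f} years")
    print(f"growth over {YEARS} years: {growth_factor(ANNUAL_GROWTH, YEARS):.2e}x")
```

At a constant 55% per annum, volumes double roughly every 1.6 years, which is why a long-running trend can still feel like a sudden flood.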

The Invisible Trend: Moore’s Law Cubed

•  The biggest databases are new databases

•  They grow at the cube of Moore’s Law

•  Moore’s Law = 10x every 6 years

•  VLDB: 1000x every 6 years
   –  1991/2 megabytes
   –  1997/8 gigabytes
   –  2003/4 terabytes
   –  2009/10 petabytes
   –  2015/16 exabytes
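The arithmetic behind "Moore's Law cubed" can be sketched as follows. The 10x and 6-year figures come from the slide; the helper names and the choice of 1991 as the starting year are assumptions made for illustration.

```python
PERIOD_YEARS = 6
MOORE_FACTOR = 10                 # slide: Moore's Law = 10x every 6 years
VLDB_FACTOR = MOORE_FACTOR ** 3   # the cube: 1000x every 6 years

# Each unit is 1000x the previous one, matching one 6-year VLDB step.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]


def annual_factor(factor_per_period: float, period_years: float) -> float:
    """Convert a per-period growth factor into the equivalent yearly factor."""
    return factor_per_period ** (1.0 / period_years)


def vldb_timeline(start_year: int = 1991):
    """Return (year, unit) pairs, stepping one unit every PERIOD_YEARS."""
    return [(start_year + i * PERIOD_YEARS, unit) for i, unit in enumerate(UNITS)]


if __name__ == "__main__":
    print(f"Moore's Law per year: {annual_factor(MOORE_FACTOR, PERIOD_YEARS):.2f}x")
    print(f"VLDB growth per year: {annual_factor(VLDB_FACTOR, PERIOD_YEARS):.2f}x")
    for year, unit in vldb_timeline():
        print(year, unit)
```

Expressed per year, 10x every 6 years is about 1.47x, while 1000x every 6 years is about 3.16x, so the largest databases more than triple in size annually on this model.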

Technology Evolution (Bloor Curve)

[Figure: the Bloor Curve, with regions labelled “The Area Of As-Yet-Unrealized Applications” and “Application Migration”. Source: The Bloor Group]

The Traditional Force of Disruption

•  Software architectures change: centralized, C/S, 3-tier/web, SOA, etc.

•  Applications migrate according to latencies

•  Dominant applications and software brands can die via “The innovator’s dilemma”

•  Wholly new applications appear because of lower latencies, e.g., VMs, CEP


This Curve is Compromised


Two DISRUPTIVE forces have changed the curve: PARALLELISM and The CLOUD

It’s not really about Big Data???

It’s about

Part 2: Technology Disruption

It’s Over for Spinning Disk

•  SSD is now on the Moore’s Law curve

•  Disk is not and never was (in respect of seek time)

•  All traditional databases were engineered for spinning disk and not for scale-out

•  This explains the new DBMS products…

In-Memory Disruption

•  Memory may gradually become the primary store for data (this impacts data flows)

•  Almost all applications are poorly built for this

•  Memory is an accelerator – as is CPU cache. This is becoming a factor

The Memory Cascade

•  On-chip speed v RAM
   –  L1 (32K) = 100x
   –  L2 (256K) = 30x
   –  L3 (8-20 MB) = 8.6x

•  RAM v SSD
   –  RAM = 300x

•  SSD v Disk
   –  SSD = 10x

Note: Vector instructions and data compression
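Chaining the ratios above gives a rough sense of how far apart the ends of the cascade are. This is a minimal sketch that assumes the slide's ratios simply multiply, which is a simplification: real speedups depend on access patterns, and the names used here are illustrative.

```python
# Ratios taken from the slide; each is "faster tier v slower tier".
CACHE_VS_RAM = {"L1": 100.0, "L2": 30.0, "L3": 8.6}
RAM_VS_SSD = 300.0
SSD_VS_DISK = 10.0


def speed_vs_disk() -> dict:
    """Express every tier's speed as a multiple of spinning disk."""
    ssd = SSD_VS_DISK
    ram = RAM_VS_SSD * ssd
    tiers = {"Disk": 1.0, "SSD": ssd, "RAM": ram}
    for name, ratio in CACHE_VS_RAM.items():
        tiers[name] = ratio * ram  # cache ratios are quoted against RAM
    return tiers


if __name__ == "__main__":
    for name, factor in sorted(speed_vs_disk().items(), key=lambda kv: kv[1]):
        print(f"{name:>4}: {factor:,.0f}x disk")
```

On these figures RAM is about 3,000 times faster than disk and L1 cache about 300,000 times faster, which is the scale of gap that makes data placement an architectural concern.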

Tech Revolutions

TECH REVOLUTION        ARCHITECTURE
Computer               Batch
On-line                Centralized
PC                     Client/server
Internet               Multi-tier
Mobile                 Service Orientation
Internet of things     Event Driven/Big Data

Event Driven/Big Data Architecture?

The Open Source Picture

•  The R Language
   –  Over 1 million users

•  Hadoop and its Ecosystem
   –  Reduced latency for analytics

•  Machine Learning Algorithms
   –  Raw power

None of these are engineered for performance

Part 3: Data Flow

What Is A Data Scientist?

•  Project manager

•  Qualified statistician

•  Domain business expert

•  Experienced data architect

•  Software engineer

(IT’S A TEAM)

A Process, Not an Activity

•  Data analytics is a multi-disciplinary, end-to-end process

•  Until recently it was a walled garden, but the walls have been torn down by:
   –  Data availability
   –  Scalable technology
   –  Open source tools

The CRITICAL Workload Issue

•  Previously, we viewed database workloads as an I/O optimization problem

•  With analytics, the workload is a highly variable mix of I/O and calculation

•  No databases were built precisely for this – not even Big Data databases

Take Note

You can know more about a BUSINESS from its data than by any other means

The Biological System

•  Our human control system works at different speeds:
   –  Almost instant reflex
   –  Swift response
   –  Considered response

•  Organizations will gradually implement similar control systems

•  This suggests a data-flow-based architecture

The Corporate Biological System

•  This division into two different data flows is already occurring

•  Currently we can distinguish between:
   –  Real-time/business-time applications
   –  Analytical applications

•  We should build specific architectures for this
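One way to picture the two data flows is a router that hands each event to a fast real-time handler while also retaining it for later analysis. This is only a sketch under assumed names (`Event`, `DataFlowRouter` and the payment rule are all hypothetical), not an architecture the deck prescribes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Event:
    kind: str
    payload: Dict[str, Any]


@dataclass
class DataFlowRouter:
    """Feeds every event into both a real-time flow and an analytical flow."""
    realtime_handlers: Dict[str, Callable[[Event], Any]] = field(default_factory=dict)
    analytical_log: List[Event] = field(default_factory=list)

    def on(self, kind: str, handler: Callable[[Event], Any]) -> None:
        """Register the real-time ("reflex") handler for one kind of event."""
        self.realtime_handlers[kind] = handler

    def publish(self, event: Event) -> Optional[Any]:
        # Analytical flow: keep every event for later, slower analysis.
        self.analytical_log.append(event)
        # Real-time flow: respond immediately if a handler is registered.
        handler = self.realtime_handlers.get(event.kind)
        return handler(event) if handler else None


if __name__ == "__main__":
    router = DataFlowRouter()
    router.on("payment", lambda e: "block" if e.payload["amount"] > 10_000 else "allow")
    print(router.publish(Event("payment", {"amount": 25_000})))
    print(len(router.analytical_log))
```

The design choice worth noting is that the real-time path never reads from the analytical store, so each flow can be tuned for its own latency.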

Some Architectural Principles

•  The new atom of data is the event

•  SUSO: scale up before scale out

•  Take the processing to the data, if you can

•  Hadoop is a component, not a solution
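"Take the processing to the data" can be illustrated by computing a small summary where each partition lives and moving only that summary. The sketch below simulates partitions as in-memory lists; in a real system each `local_summary` call would run on the node holding the data. All names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def local_summary(partition: Iterable[float]) -> Tuple[int, float]:
    """Runs next to the data: reduce a partition to (count, sum)."""
    count, total = 0, 0.0
    for value in partition:
        count += 1
        total += value
    return count, total


def global_mean(partitions: Sequence[Iterable[float]]) -> float:
    """Combine per-partition summaries; raw rows never leave their partition."""
    summaries: List[Tuple[int, float]] = [local_summary(p) for p in partitions]
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    if count == 0:
        raise ValueError("no data in any partition")
    return total / count


if __name__ == "__main__":
    print(global_mean([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))
```

Only two numbers per partition cross the network here, regardless of how many rows each partition holds.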

In Conclusion

PART ONE: The Big Data Curve?

PART TWO: Technology Disruption

PART THREE: Data Flow

Questions?

#BigDataArch or USE THE Q&A

THANK YOU!

REGISTER FOR BDIA WEBCASTS AT: http://insideanalysis.com/research/big-data-information-architecture
