Think Big - How to Design a Big Data Information Architecture
Grab some coffee and enjoy the pre-show banter before the top of the hour!
“Think Big: How to Design a Big Data Information Architecture” Exploratory Webcast | January 22, 2014
Guests
Robin Bloor, Chief Analyst, The Bloor Group, @robinbloor
Eric Kavanagh, CEO, The Bloor Group, @eric_kavanagh
Big Data Information Architecture
Exploratory Webcast: January 22, 2014
Roundtable Webcast: April 9, 2014
Findings Webcast: June 25, 2014
#BigDataArch
Big Data Information Architecture in Three Segments
PART ONE: The Big Data Curve?
PART TWO: Technology Disruption
PART THREE: Data Flow
Part 1: The Big Data Curve
The Visible “Big Data” Trend
• Corporate data volumes grow exponentially, at about 55% per annum
• Data has been growing at this rate for, maybe, 40 years
• There is nothing new about big data. It clings to an established exponential trend
The Invisible Trend: Moore’s Law Cubed
• The biggest databases are new databases
• They grow at the cube of Moore’s Law
• Moore’s Law = 10x every 6 years
• VLDB: 1000x every 6 years
  - 1991/2: megabytes
  - 1997/8: gigabytes
  - 2003/4: terabytes
  - 2009/10: petabytes
  - 2015/16: exabytes
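The growth figures quoted in Part 1 can be put on a common per-year footing. The sketch below is only arithmetic on the slide numbers (55% per annum, 10x and 1000x per 6 years); the helper names are illustrative and not from the webcast.

```python
import math

def annual_factor(multiple: float, years: float) -> float:
    """Per-year growth factor that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years)

def compound(annual: float, years: float) -> float:
    """Total growth after `years` at a per-year factor `annual`."""
    return annual ** years

# Figures taken from the slides.
CORPORATE_ANNUAL = 1.55       # "about 55% per annum"
MOORE_6Y = 10.0               # Moore's Law: 10x every 6 years
VLDB_6Y = MOORE_6Y ** 3       # "cubed": 1000x every 6 years

moore_annual = annual_factor(MOORE_6Y, 6)      # per-year factor for Moore's Law
vldb_annual = annual_factor(VLDB_6Y, 6)        # per-year factor for the largest databases
corporate_6y = compound(CORPORATE_ANNUAL, 6)   # corporate data growth over 6 years
doubling_years = math.log(2) / math.log(CORPORATE_ANNUAL)

# The VLDB ladder from the slide: one unit prefix (1000x) per 6-year step.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]
ladder = [(1991 + 6 * step, unit) for step, unit in enumerate(UNITS)]

if __name__ == "__main__":
    print(f"Moore's Law per year:      {moore_annual:.2f}x")
    print(f"VLDB per year:             {vldb_annual:.2f}x")
    print(f"Corporate data in 6 years: {corporate_6y:.1f}x")
    print(f"Corporate doubling time:   {doubling_years:.2f} years")
    for year, unit in ladder:
        print(f"{year}/{(year + 1) % 100}: {unit}")
```

On these numbers, ordinary corporate data growth compounds to roughly the same order as Moore's Law over six years, while the largest databases grow about a hundred times faster over the same period.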
Technology Evolution (Bloor Curve)
[Figure: the Bloor Curve, with regions labeled “The Area of As-Yet-Unrealized Applications” and “Application Migration”. Source: The Bloor Group]
The Traditional Force of Disruption
• Software architectures change: centralized, client/server, 3-tier/web, SOA, etc.
• Applications migrate according to latencies
• Dominant applications and software brands can die via “The innovator’s dilemma”
• Wholly new applications appear because of lower latencies, e.g., virtual machines (VMs) and complex event processing (CEP)
[Figure: the Bloor Curve. Source: The Bloor Group]
This Curve is Compromised
[Figure: the Bloor Curve. Source: The Bloor Group]
Two DISRUPTIVE forces have changed the curve: PARALLELISM and the CLOUD
It’s not really about Big Data??? It’s about
Part 2: Technology Disruption
It’s Over for Spinning Disk
• SSD is now on the Moore’s Law curve
• Disk is not and never was (in respect of seek time)
• All traditional databases were engineered for spinning disk and not for scale-out
• This explains the new DBMS products…
In-Memory Disruption
• Memory may gradually become the primary store for data (this impacts data flows)
• Almost all applications are poorly built for this
• Memory is an accelerator, as is CPU cache. This is becoming a factor
The Memory Cascade
• On-chip speed v RAM
  - L1 (32 KB) = 100x
  - L2 (256 KB) = 30x
  - L3 (8-20 MB) = 8.6x
• RAM v SSD
  - RAM = 300x
• SSD v Disk
  - SSD = 10x
Note: Vector instructions and data compression
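To see what the cascade implies end to end, the pairwise ratios can be chained into a single "speed-up over spinning disk" figure per tier. This is a rough sketch: multiplying the quoted ratios together is an assumption made here for illustration, and the function name is hypothetical. Real latencies depend on access pattern and hardware.

```python
# Pairwise speed ratios as quoted on the slide.
CACHE_VS_RAM = {"L1": 100.0, "L2": 30.0, "L3": 8.6}   # on-chip cache v RAM
RAM_VS_SSD = 300.0                                     # RAM v SSD
SSD_VS_DISK = 10.0                                     # SSD v spinning disk

def speedup_vs_disk() -> dict:
    """Chain the pairwise ratios into one speed-up figure per tier, relative to disk.

    Assumption: the ratios compose multiplicatively (L1 v disk = L1 v RAM
    times RAM v SSD times SSD v disk).
    """
    ram_vs_disk = RAM_VS_SSD * SSD_VS_DISK
    table = {"Disk": 1.0, "SSD": SSD_VS_DISK, "RAM": ram_vs_disk}
    for level, vs_ram in CACHE_VS_RAM.items():
        table[level] = vs_ram * ram_vs_disk
    return table

if __name__ == "__main__":
    # Print tiers from slowest to fastest.
    for tier, factor in sorted(speedup_vs_disk().items(), key=lambda kv: kv[1]):
        print(f"{tier:>4}: {factor:>10,.0f}x faster than disk")
```

Under that assumption the gap between the fastest cache and spinning disk spans more than five orders of magnitude, which is why software engineered around disk seek times fits the newer tiers poorly.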
Tech Revolutions
TECH REVOLUTION       ARCHITECTURE
Computer              Batch
On-line               Centralized
PC                    Client/server
Internet              Multi-tier
Mobile                Service Orientation
Internet of things    Event Driven/Big Data
Event Driven/Big Data Architecture?
The Open Source Picture
• The R Language
  - Over 1 million users
• Hadoop and its Ecosystem
  - Reduced latency for analytics
• Machine Learning Algorithms
  - Raw power
None of these are engineered for performance
Part 3: Data Flow
What Is A Data Scientist?
• Project manager
• Qualified statistician
• Domain business expert
• Experienced data architect
• Software engineer
(IT’S A TEAM)
A Process, Not an Activity
• Data analytics is a multi-disciplinary, end-to-end process
• Until recently it was a walled garden. But recently the walls were torn down by…
  - Data availability
  - Scalable technology
  - Open source tools
The CRITICAL Workload Issue
• Previously, we viewed database workloads as an I/O optimization problem
• With analytics, the workload is a highly variable mix of I/O and calculation
• No databases were built precisely for this, not even Big Data databases
Take Note
You can know more about a BUSINESS from its data than by any other means
The Biological System
• Our human control system works at different speeds:
  - Almost instant reflex
  - Swift response
  - Considered response
• Organizations will gradually implement similar control systems
• This suggests a data-flow-based architecture
The Corporate Biological System
• Right now this division into two different data flows is already occurring
• Currently we can distinguish between:
  - Real-time/Business time applications
  - Analytical applications
• We should build specific architectures for this
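One way to picture the two data flows is a router that sends every event to an analytical store while also triggering an immediate "reflex" for the urgent ones. The sketch below is illustrative only: `Event`, `TwoSpeedRouter`, and the urgency rule are hypothetical names chosen here, not part of the webcast or of any product.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List

@dataclass(frozen=True)
class Event:
    """A single immutable fact: something happened at a source."""
    source: str
    kind: str
    value: float

@dataclass
class TwoSpeedRouter:
    """Hypothetical router: urgent events trigger a reflex, all events feed analytics."""
    is_urgent: Callable[[Event], bool]
    reflex_log: List[str] = field(default_factory=list)
    analytical_store: Deque[Event] = field(default_factory=deque)

    def handle(self, event: Event) -> None:
        if self.is_urgent(event):
            # Real-time/business-time flow: react immediately.
            self.reflex_log.append(f"alert:{event.source}:{event.kind}")
        # Analytical flow: keep every event for later, considered analysis.
        self.analytical_store.append(event)

    def average(self, kind: str) -> float:
        """A stand-in for analytical work done over the stored events."""
        values = [e.value for e in self.analytical_store if e.kind == kind]
        return sum(values) / len(values) if values else 0.0

if __name__ == "__main__":
    router = TwoSpeedRouter(
        is_urgent=lambda e: e.kind == "temperature" and e.value > 90.0
    )
    for event in (
        Event("sensor-1", "temperature", 70.0),
        Event("sensor-1", "temperature", 95.0),
        Event("till-7", "sale", 12.5),
    ):
        router.handle(event)
    print(router.reflex_log)
    print(router.average("temperature"))
```

The design choice worth noting is that the fast path does not replace the slow one: the same events serve both flows, but each flow can be built and scaled with its own latency target.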
Some Architectural Principles
• The new atom of data is the event
• SUSO: scale up before scale out
• Take the processing to the data, if you can
• Hadoop is a component, not a solution
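The "take the processing to the data" principle can be illustrated with a toy comparison: computing a mean over partitioned data either by shipping every row to one place or by shipping only a small summary from each partition. This is a minimal sketch with in-memory lists standing in for remote partitions; the function names are hypothetical and partitions are assumed to be non-empty.

```python
from typing import List, Tuple

Partition = List[float]

def local_summary(partition: Partition) -> Tuple[float, int]:
    """Runs where the data lives; only (sum, count) leaves the partition."""
    return sum(partition), len(partition)

def mean_by_moving_data(partitions: List[Partition]) -> Tuple[float, int]:
    """Ship every row to one place, then compute. Returns (mean, values moved)."""
    rows = [x for partition in partitions for x in partition]
    return sum(rows) / len(rows), len(rows)

def mean_by_moving_processing(partitions: List[Partition]) -> Tuple[float, int]:
    """Ship the computation to each partition. Returns (mean, values moved)."""
    summaries = [local_summary(partition) for partition in partitions]
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    # Each partition sends back two numbers, however many rows it holds.
    return total / count, 2 * len(summaries)

if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
    ]
    print(mean_by_moving_data(data))
    print(mean_by_moving_processing(data))
```

Both approaches give the same answer; the difference is that the traffic of the second grows with the number of partitions rather than the number of rows, which matters more as volumes climb.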
In Conclusion
PART ONE: The Big Data Curve?
PART TWO: Technology Disruption
PART THREE: Data Flow
Questions?
#BigDataArch or USE THE Q&A
THANK YOU!
REGISTER FOR BDIA WEBCASTS AT: http://insideanalysis.com/research/big-data-information-architecture