Data Technology Landscape Is Rapidly Evolving

© 2012 Unisys Corporation. All rights reserved.


Relational hegemony is over
– Disruptive data technologies abound
– Open source, new data models, NoSQL systems
– One size no longer fits all

Focus expanded from write- to read-intensive applications

Old constraints are falling away
– Big memory, big storage, big CPU farms, big interconnect
– Virtual machines everywhere
– New applications with massive data volumes (social networking, BI)
– Less restrictive transaction models promote scalability


Mike Stonebraker: “It’s time for a complete rewrite”

UC Berkeley, MIT; Ingres, Postgres, Illustra, Streambase, Vertica, VoltDB, and more

[Diagram: OLTP and Analytics; 40-odd years]


Hadoop Mimics Google as Big Data Store


Layer | Google | Apache Software Foundation (Hadoop)
Distributed File System | Google File System | Hadoop Distributed File System
Data Access Technique | Map/Reduce | Map/Reduce
Table-like Data Model | BigTable | HBase
Applications | Megastore, Google App Engine | Pig Latin, Hive, Zookeeper, Vendor Analytics
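To make the "table-like data model" row concrete, here is a toy, in-memory Python sketch of the model BigTable introduced and HBase follows: a sparse, sorted map from (row key, column, timestamp) to a value, where columns are named family:qualifier. The class SparseTable and its put/get/scan methods are hypothetical illustrations, not the HBase or BigTable client API, and nothing here is distributed or persistent.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


class SparseTable:
    """Toy sketch of a BigTable/HBase-style data model (hypothetical class,
    not a real client API): a sparse map from
    (row key, "family:qualifier" column, timestamp) to a value."""

    def __init__(self) -> None:
        # row key -> column -> [(timestamp, value), ...], newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        cells = self._rows[row].setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # keep newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if it was never written."""
        if row not in self._rows or column not in self._rows[row]:
            return None  # absent cells store nothing: the table is sparse
        return self._rows[row][column][0][1]

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows with start_row <= key < stop_row in sorted key order,
        each with the newest value of every column it has."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {col: cells[0][1] for col, cells in self._rows[row].items()}
```

Row keys such as reversed domain names (com.example.www) keep related rows adjacent in the sort order, which is what makes range scans cheap in the real systems.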

Your Data Everywhere


Data ‘sharded’ across nodes

How HDFS and GFS Work

“Shared Nothing” Data Nodes
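A minimal sketch of how a file might be sharded across shared-nothing data nodes: it is cut into fixed-size blocks and each block is copied to several nodes. HDFS and GFS both work along these lines (large blocks or chunks, three replicas by default), but the round-robin placement below is a simplifying assumption; real HDFS placement is rack-aware. The function place_blocks and the node names are hypothetical. The returned mapping plays the role of the metadata the HDFS NameNode (the GFS master) keeps, while each data node only stores and serves its own blocks.

```python
from typing import Dict, List, Tuple


def place_blocks(file_size: int, block_size: int, data_nodes: List[str],
                 replication: int = 3) -> Dict[int, Tuple[int, int, List[str]]]:
    """Split a file into fixed-size blocks and give each block `replication`
    distinct replicas, assigned round-robin (a simplification; real HDFS
    placement is rack-aware). Returns block id -> (offset, length, nodes)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 1 <= replication <= len(data_nodes):
        raise ValueError("need 1 <= replication <= number of data nodes")
    n = len(data_nodes)
    num_blocks = -(-file_size // block_size)  # ceiling division; last block may be short
    placement: Dict[int, Tuple[int, int, List[str]]] = {}
    for block_id in range(num_blocks):
        offset = block_id * block_size
        length = min(block_size, file_size - offset)
        # Consecutive nodes starting at block_id % n: distinct because replication <= n.
        replicas = [data_nodes[(block_id + r) % n] for r in range(replication)]
        placement[block_id] = (offset, length, replicas)
    return placement
```

For example, place_blocks(200 * 1024 * 1024, 64 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]) yields four blocks (64 MB was the HDFS default block size at the time; later versions use 128 MB); block 3 starts at 192 MB, is 8 MB long, and is stored on dn4, dn1 and dn2.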



Map/Reduce Algorithm

    void map(String name, String document):
        // name: document name
        // document: document contents
        for each word w in document:
            EmitIntermediate(w, "1");

    void reduce(String word, Iterator wordCounts):
        // word: a word
        // wordCounts: list of aggregated counts
        int sum = 0;
        for each pc in wordCounts:
            sum += ParseInt(pc);
        Emit(word, AsString(sum));
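The pseudocode above translates almost line for line into Python. In this sketch, map_fn, reduce_fn and run_word_count are hypothetical names; splitting on whitespace stands in for real tokenization, and a single-process driver stands in for the framework that would schedule map and reduce tasks across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(name: str, document: str) -> Iterator[Tuple[str, str]]:
    """Map: emit an intermediate (word, "1") pair for each word in the document."""
    # `name` (the document name) is unused, exactly as in the pseudocode.
    for word in document.split():
        yield word, "1"  # EmitIntermediate(w, "1")


def reduce_fn(word: str, word_counts: Iterable[str]) -> Tuple[str, str]:
    """Reduce: sum the string-encoded counts for one word and emit (word, total)."""
    total = 0
    for count in word_counts:
        total += int(count)  # ParseInt(pc)
    return word, str(total)  # Emit(word, AsString(sum))


def run_word_count(documents: Dict[str, str]) -> Dict[str, str]:
    """Single-process stand-in for the framework: map, group by word, reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)  # collect the intermediate values per key
    return dict(reduce_fn(word, counts) for word, counts in sorted(groups.items()))


if __name__ == "__main__":
    docs = {"doc1": "the cat sat", "doc2": "the dog sat"}
    print(run_word_count(docs))  # prints {'cat': '1', 'dog': '1', 'sat': '2', 'the': '2'}
```

Keeping counts as strings mirrors the pseudocode's ParseInt/AsString round trip; a real job would normally emit integers directly.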

A programming pattern
– Inspired by functional programming languages
– For large-scale parallel applications

Parallel Algorithm
– Map prepares input data as <key, value> pairs, here <word, count>
– A merge (or combine) phase gathers the relevant <word, count> pairs, arranging them by word
– Reduce sums the counts for each word and constructs the final result
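The three phases above can be sketched in Python as follows, assuming threads stand in for cluster nodes and each string in splits stands in for one input split. Hadoop distinguishes a combiner (an optional, local pre-aggregation on the map side) from the shuffle that groups keys for the reducers; the sketch shows both, routing each word to a reducer partition by hashing, as Hadoop's default partitioner does. All function names are hypothetical, and because of Python's GIL the point here is the structure, not a speedup.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def map_split(split: str) -> Dict[str, int]:
    """Map one input split to <word, count> pairs.

    Counter acts as a local combiner: each word is emitted once per split
    with its partial count instead of once per occurrence with count 1.
    """
    return dict(Counter(split.split()))


def shuffle(mapped: List[Dict[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Merge phase: send every word to one reducer partition chosen by hashing."""
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for word, count in pairs.items():
            partitions[hash(word) % num_reducers][word].append(count)
    return partitions


def reduce_partition(partition: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the partial counts for each word in one partition."""
    return {word: sum(counts) for word, counts in partition.items()}


def word_count(splits: List[str], num_reducers: int = 2) -> Dict[str, int]:
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_split, splits))              # map tasks, one per split
        partitions = shuffle(mapped, num_reducers)              # group counts by word
        reduced = list(pool.map(reduce_partition, partitions))  # reduce tasks, one per partition
    result: Dict[str, int] = {}
    for part in reduced:
        result.update(part)  # partitions hold disjoint words, so nothing is overwritten
    return result
```

Because every occurrence of a word hashes to the same partition, each reducer sees all of that word's partial counts and no cross-reducer coordination is needed.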

Optimized for unstructured data
– Minimal metadata stored in the distributed file system
– Knowledge of the data resides in the map and reduce programs

Parts of the algorithm are patented by Google
– US Patent #7,650,331
– Filed June 18, 2004; granted January 19, 2010
– Licensed to Hadoop in April 2010

Standard example is word counting
