Data Technology Landscape Is Rapidly Evolving
© 2012 Unisys Corporation. All rights reserved. Unisys Corporation. Proprietary and Confidential.
Data Technology Landscape Is Rapidly Evolving
Relational hegemony is over
– Disruptive data technologies abound
– Open source, new data models, NoSQL systems
– One size no longer fits all
Focus expanded from write- to read-intensive applications
Old constraints are falling away
– Big memory, big storage, big CPU farms, big interconnect
– Virtual machines everywhere
– New applications with massive data volumes (social networking, BI)
– Less restrictive transaction models promote scalability
Mike Stonebraker
“It’s time for a complete rewrite”
UC Berkeley, MIT, Ingres, Postgres, Illustra, StreamBase, Vertica, VoltDB, and more
[Figure labels: OLTP, Analytics; 40-odd years]
Hadoop Mimics Google as Big Data Store
Layer                     Google                          Apache Software Foundation (Hadoop)
Distributed File System   Google File System              Hadoop Distributed File System
Data Access Technique     Map/Reduce                      Map/Reduce
Table-like Data Model     BigTable                        HBase
Applications              Megastore, Google App Engine    Pig Latin, Hive, Zookeeper, Vendor Analytics
Your Data Everywhere
How HDFS and GFS Work
“Shared Nothing” Data Nodes
Data ‘sharded’ across nodes
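To make the sharding idea concrete, here is a minimal sketch of splitting a file into fixed-size blocks and spreading replicas of each block over independent nodes. It is an illustration only: the function names, node names, tiny block size, and round-robin placement are assumptions for the example, not the actual placement policy of HDFS or GFS.

```python
def shard_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical policy for illustration; real systems also weigh
    rack layout, free space, and load.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement


# 100 bytes stand in for a large file; 32-byte blocks stand in for multi-megabyte ones.
data = b"abcdefghij" * 10
blocks = shard_blocks(data, block_size=32)
placement = place_blocks(len(blocks), ["node-a", "node-b", "node-c", "node-d"], replication=3)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Because each node holds its blocks on its own local storage, losing one node still leaves other copies of every block it held, and work on different blocks can proceed on different nodes at the same time.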
Map/Reduce Algorithm
void map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        EmitIntermediate(w, "1");

void reduce(String word, Iterator wordCounts):
    // word: a word
    // wordCounts: list of aggregated counts
    int sum = 0;
    for each pc in wordCounts:
        sum += ParseInt(pc);
    Emit(word, AsString(sum));
A programming pattern
– Inspired by functional programming languages
– For large-scale parallel applications
Parallel Algorithm
– Map preps input data into <key, value> pairs, here <word, count>
– Merge (or Combine) phase gathers relevant <word, count> pairs, arranging them by word
– Reduce sums counts for each word, constructs final result
Optimized for unstructured data
– Minimum metadata stored in distributed file system
– Data knowledge resides in map and reduce programs
Parts of the algorithm are patented by Google
– US Patent #7,650,331
– Filed June 18, 2004, granted January 19, 2010
– Licensed to Hadoop in April 2010
Standard example is word counting
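The word-counting example can be tried on one machine with a short Python sketch of the three phases. This is an illustration under stated assumptions, not Hadoop's or Google's API: the names map_fn, shuffle, reduce_fn, and word_count are made up for the example, counts are plain integers rather than the strings used in the pseudocode, and everything runs sequentially in one process rather than in parallel across nodes.

```python
from collections import defaultdict


def map_fn(name: str, document: str):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Merge/Combine: group the emitted counts by word."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_fn(word: str, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents: dict[str, str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for name, doc in documents.items() for pair in map_fn(name, doc))
    grouped = shuffle(pairs)
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


docs = {"doc1": "big data big storage", "doc2": "big memory"}
result = word_count(docs)
print(result)  # {'big': 3, 'data': 1, 'memory': 1, 'storage': 1}
```

Note that the input is treated as plain text with no schema: the knowledge that a document consists of whitespace-separated words lives entirely in map_fn, which mirrors the point that data knowledge resides in the map and reduce programs.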