Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB


Transcript of Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

Page 1: Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

Stonebraker Live! Navigating the Database Universe

VoltDB presents

Page 2

SCOTT JARR

Co-founder and Chief Strategy Officer

Page 3

Agenda

• The (proper) design of DBMSs

– Presented by Dr. Michael Stonebraker, Co-founder

• The database universe

– Presented by Scott Jarr, Co-founder and Chief Strategy Officer

• Introducing VoltDB 3.0

– Presented by Mark Hydar, VP of Market Technology and Strategy

Page 4

We Believe…

• “Big Data” is a rare, transformative market

• Velocity is becoming the cornerstone

• Specialized databases (working together) are the answer

• Products must provide tangible customer value… Fast

Page 5

THE (PROPER) DESIGN OF THE DBMS

Dr. Michael Stonebraker

Page 6

Lessons from 40 Years of Database Design

1. Get the user interaction right

– Bet on a small number of easy-to-understand constructs

– Plus standards

2. Get the implementation right

– Bet on a small number of easy-to-understand constructs

3. One size does not fit all

– At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”

– Winston Churchill

Page 7

#1: Get the User Interaction Right

Historical Lesson: RDBMS vs. CODASYL vs. OODB

Winner: RDBMS

• Simple data model (tables)

• Simple access language (SQL)

• ACID (transactions)

• Standards (SQL)

Loser: CODASYL

• Complicated data model (records; participate in “sets”; a set has one owner and, perhaps, many members, etc.)

• Messy access language (a sea of “cursors”, some but not all of which move on every command; navigation programming)

Loser: OODBs

• Complex data model (hierarchical records, pointers, sets, arrays, etc.)

• Complex access language (navigation through this sea)

• No standards

Page 8

Interaction Takeaway: Simple Is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and made people productive (transportable skills)

Page 9

#2: Get the Implementation Right

Historical Winners

• Leverage a few simple ideas: Early relational implementations

– System R storage system dropped links

– Views (protection, schema modification, performance)

– Cost-based optimizer

• Leverage a few simple ideas: Postgres

– User-defined data types and functions (adopted by almost everybody)

– Rules/triggers

– No-overwrite storage

• Leverage a few simple ideas: Vertica

– Store data by column

– Compressed up the ging gong

– Parallel load without compromising ACID

Page 10

#3: One Size Does NOT Fit All

• OSFA (one size fits all) is an old technology with hundreds of bags hanging off it

• It breaks 100% of the time when under load

• Load = size or speed or complexity

• Load is increasing at a startling rate

• Purpose-built systems will beat it by 10x to 100x

• History has not been completely written yet… but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system… A factor of 50 is nothing to sneeze at.”

– My Top 10 Assertions About Data Warehouses, 2010

Page 11

Example: VoltDB

• Get the interface right

– SQL

– ACID

• Implementation: Leverage a few simple ideas

– Main memory

– Stored procedures

– Deterministic scheduling

• Specialization

– OLTP focus allowed for the above implementation choices

Page 12

Proving the Theory

• Challenge: OLTP performance

– TPC-C CPU cycles

– On the Shore DBMS prototype

– Elephants (the legacy commercial DBMSs) should be similar

TPC-C CPU cycles on Shore:

• Recovery: 24%

• Latching: 24%

• Buffer pool: 24%

• Locking: 24%

• Useful work: 4%
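The chart implies a bound worth spelling out. A minimal Python sketch using only the percentages above (the function and key names are illustrative): if just 4% of cycles are useful work, eliminating all four overheads caps the speedup at about 1/0.04 = 25x, while eliminating any single overhead buys at most about 1/0.76 ≈ 1.3x. That is the arithmetic case for removing all of them at once rather than tuning one.

```python
# Shares of TPC-C CPU cycles on the Shore prototype, as given on the slide.
BREAKDOWN = {
    "recovery": 0.24,
    "latching": 0.24,
    "buffer_pool": 0.24,
    "locking": 0.24,
    "useful_work": 0.04,
}


def max_speedup(breakdown, removed):
    """Amdahl-style upper bound on speedup if the `removed` components cost nothing."""
    remaining = sum(share for name, share in breakdown.items() if name not in removed)
    return 1.0 / remaining


if __name__ == "__main__":
    overheads = {"recovery", "latching", "buffer_pool", "locking"}
    print(f"remove latching only: {max_speedup(BREAKDOWN, {'latching'}):.2f}x")
    print(f"remove all overheads: {max_speedup(BREAKDOWN, overheads):.1f}x")
```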

Page 13

Single Threaded

• Gets rid of the latching problem

• What about Multicore?

– Divide the memory on an N-core node so it looks like N single-core nodes

– Which are single threaded…
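A minimal Python sketch of the partitioning idea, assuming a simple key/value workload: each Partition owns a disjoint slice of the data and is touched by exactly one worker thread, so its dictionary needs no latch. The class names and the CRC32 routing are illustrative assumptions, not VoltDB's implementation, and because of CPython's GIL this models the ownership rule rather than demonstrating a multicore speedup.

```python
import queue
import threading
import zlib


class Partition:
    """One single-threaded 'virtual node': it owns its data and runs work serially."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            task = self.inbox.get()
            if task is None:            # shutdown signal
                break
            key, delta = task
            # Only this worker thread ever touches self.data, so no latch is needed.
            self.data[key] = self.data.get(key, 0) + delta


class PartitionedStore:
    """Splits one N-core node into N single-threaded partitions by key."""

    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def _route(self, key):
        # A stable hash sends a given key to the same partition every time.
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]

    def increment(self, key, delta=1):
        self._route(key).inbox.put((key, delta))

    def shutdown(self):
        for p in self.partitions:
            p.inbox.put(None)
        for p in self.partitions:
            p.worker.join()

    def get(self, key):
        # Call only after shutdown(); a live read would add a second thread.
        return self._route(key).data.get(key, 0)


if __name__ == "__main__":
    store = PartitionedStore(n_partitions=4)
    for i in range(1000):
        store.increment(f"key{i % 10}")
    store.shutdown()
    print(store.get("key3"))  # 100
```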

Page 14

Implementation Construct #1: Main Memory

• Main memory format for data

– Disk format gets you buffer pool overhead

• What happens if data doesn’t fit?

– Return to disk-buffer pool architecture (slow)

– Anti-caching

• Main memory format for data

• When memory fills up, bundle together elderly (least recently used) tuples and write them out

• Run a transaction in “sleuth mode”: find the required records, move them to main memory, and pin them

• Run the transaction normally
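A toy Python model of the anti-caching steps, assuming the transaction's key set is known up front (the slide's "sleuth mode" discovers it by running the transaction; that discovery pass is simplified away here). The block layout, the OrderedDict recency order, and all names are illustrative assumptions, and a dict stands in for disk.

```python
from collections import OrderedDict


class AntiCache:
    """Toy anti-cache: main memory is the primary home of the data; cold tuples
    are bundled into blocks and written out only when memory fills up."""

    def __init__(self, capacity, block_size):
        self.capacity = capacity        # max tuples kept in memory
        self.block_size = block_size    # tuples bundled per evicted block
        self.memory = OrderedDict()     # key -> value, least recently used first
        self.disk = {}                  # block_id -> {key: value}; stands in for disk
        self.evicted = {}               # key -> block_id of the block holding it
        self.pinned = set()
        self.next_block = 0

    def put(self, key, value):
        self._fetch(key)                # never keep two copies of one tuple
        self.memory[key] = value
        self.memory.move_to_end(key)    # now the most recently used
        self._evict_if_full()

    def _evict_if_full(self):
        while len(self.memory) > self.capacity:
            # Bundle the coldest ("elderly") unpinned tuples and write them out together.
            victims = [k for k in self.memory if k not in self.pinned][: self.block_size]
            if not victims:
                break                   # everything left is pinned
            block_id = self.next_block
            self.next_block += 1
            self.disk[block_id] = {k: self.memory.pop(k) for k in victims}
            for k in victims:
                self.evicted[k] = block_id

    def _fetch(self, key):
        block_id = self.evicted.get(key)
        if block_id is None:
            return                      # already in memory (or never stored)
        for k, v in self.disk.pop(block_id).items():
            del self.evicted[k]
            self.memory[k] = v

    def run_transaction(self, keys, body):
        # Pass 1 ("sleuth mode"): bring every required tuple into memory and pin it.
        # Assumes every key was stored earlier.
        for k in keys:
            self._fetch(k)
        self.pinned.update(keys)
        try:
            for k in keys:
                self.memory.move_to_end(k)
            # Pass 2: run the transaction normally; it touches main memory only.
            return body({k: self.memory[k] for k in keys})
        finally:
            self.pinned.difference_update(keys)
            self._evict_if_full()
```

Evicting whole blocks amortizes the write cost, and pinning guarantees that the second pass never stalls on disk.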

Page 15

Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive

– Do it once per transaction

– Not once per command

– Or even once per cursor move

• Ad-hoc queries supported

– Turn them into dynamic stored procedures
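A small Python model of why the round trip matters, assuming a hypothetical funds-transfer transaction against a hypothetical accounts table. The connection class only counts round trips, and the procedure names are made up for illustration, so the numbers show the shape of the argument rather than real latencies.

```python
class CountingConnection:
    """Hypothetical client connection that only counts network round trips."""

    def __init__(self):
        self.round_trips = 0

    def call(self, request):
        # A real client would block here until the server replies.
        self.round_trips += 1


def transfer_client_side(conn, src, dst, amount):
    """Classic client-driven transaction: one round trip per command."""
    conn.call("BEGIN")
    conn.call(("SELECT balance FROM accounts WHERE id = ?", src))
    conn.call(("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, src))
    conn.call(("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, dst))
    conn.call("COMMIT")


def transfer_stored_procedure(conn, src, dst, amount):
    """The same logic runs inside the DBMS; the client sends one request."""
    conn.call(("Transfer", src, dst, amount))


def ad_hoc_batch(conn, statements):
    """Ad hoc SQL wrapped as a one-off ("dynamic") procedure: still one round trip."""
    conn.call(("AdHocBatch", tuple(statements)))


if __name__ == "__main__":
    for name, run in [("client-side", transfer_client_side),
                      ("stored procedure", transfer_stored_procedure)]:
        conn = CountingConnection()
        run(conn, src=1, dst=2, amount=10)
        print(f"{name}: {conn.round_trips} round trips")
```

With, say, a 0.5 ms network round trip, the client-side version spends at least 2.5 ms per transfer on the wire, against 0.5 ms for the stored procedure.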

Page 16

Implementation Construct #3: Deterministic Scheduling

• Transactions are ordered and run to completion

– No locking

• Active-active replication (HA)

– Run transaction at all replicas – in the same pre-determined order

• What about a cluster-wide power failure?

– Asynchronous checkpointing

– With a command log

– Wildly faster than data logging
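A minimal Python sketch of these ideas together, assuming a toy account workload: a sequencer fixes one global order, every replica applies each command serially to completion (so replicas stay identical without locks), and recovery reloads the last checkpoint and replays the command log from that point. The class names and the deposit/withdraw procedures are hypothetical; the essential requirement is that each procedure is deterministic.

```python
import copy


def apply(state, command):
    """A deterministic 'stored procedure': same state + same command -> same result."""
    proc, key, amount = command
    if proc == "deposit":
        state[key] = state.get(key, 0) + amount
    elif proc == "withdraw":
        if state.get(key, 0) >= amount:     # the rule depends only on the state
            state[key] = state.get(key, 0) - amount
    else:
        raise ValueError(f"unknown procedure: {proc!r}")


class Replica:
    def __init__(self):
        self.state = {}


class Sequencer:
    """Puts commands in one global order; every replica runs each to completion in
    that order, so no locks are needed and replicas never diverge."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.command_log = []          # logs commands (procedure + args), not data
        self.checkpoint = ({}, 0)      # (state snapshot, log position it covers)

    def submit(self, command):
        self.command_log.append(command)   # log the command before executing it
        for replica in self.replicas:
            apply(replica.state, command)

    def take_checkpoint(self):
        # Synchronous here for simplicity; the slide describes asynchronous checkpoints.
        self.checkpoint = (copy.deepcopy(self.replicas[0].state), len(self.command_log))


def recover(checkpoint, command_log):
    """After a cluster-wide failure: reload the snapshot, then replay later commands."""
    snapshot, position = checkpoint
    state = copy.deepcopy(snapshot)
    for command in command_log[position:]:
        apply(state, command)
    return state
```

Each command-log entry is just a procedure name plus arguments, whereas data logging must record every changed row, which is where the speed difference comes from.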

Page 17

Result of Design Principles: VoltDB Example

• Good interface decisions – made developers more productive

– SQL & ACID

• Leveraging a few simple implementation ideas – made VoltDB wicked fast

– Main memory

– Stored procedures

– Deterministic scheduling

Page 18

Proving the Theory

• Answer: OLTP performance

– 3 million transactions per second

– 7x Cassandra

– 15 million SQL statements per second

– 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”

– The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007

Page 19

THE DATABASE UNIVERSE

Scott Jarr

Page 20

Technology Meets the Market

Believe

– “Big Data” is a rare, transformative market

– Velocity is becoming the cornerstone

– Specialized databases (working together) are the answer

– Products must provide tangible customer value… Fast

Observations

– Noisy, crowded and new – kinda like Christmas shopping at the mall

– Everyone wants to understand where the pieces fit

– Analysts build maps based on technology, NOT use cases

What we need is…

Page 21

Data Value Chain

Stages along the Age of Data axis:

• Interactive (milliseconds): Place trade, Serve ad, Enrich stream, Examine packet, Approve trans.

• Real-time Analytics (hundredths of seconds): Calculate risk, Leaderboard, Aggregate, Count

• Record Lookup (second(s)): Retrieve click stream, Show orders

• Historical Analytics (minutes): Backtest algo, BI, Daily reports

• Exploratory Analytics (hours): Algo discovery, Log analysis, Fraud pattern match

Page 22

Data Value Chain

[Chart: the same five stages (Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics) along the Age of Data axis, overlaid with two Data Value curves: Value of Individual Data Item and Aggregate Data Value]

Page 23

The Database Universe

[Chart: axis labels Application Complexity and Data Value; axis extremes Simple/Slow/Small and Fast/Complex/Large; Transactional and Analytic regions spanning Interactive, Real-time Analytics, Record Lookup, Historical Analytics, and Exploratory Analytics; curves for Value of Individual Data Item and Aggregate Data Value; plotted system: Traditional RDBMS]

Page 24

The Database Universe

[Chart: axis labels Application Complexity and Data Value; axis extremes Simple/Slow/Small and Fast/Complex/Large; Transactional and Analytic regions spanning Interactive, Real-time Analytics, Record Lookup, Historical Analytics, and Exploratory Analytics; curves for Value of Individual Data Item and Aggregate Data Value; plotted systems: Traditional RDBMS, NewSQL, NoSQL, Data Warehouse, and Hadoop, etc.; also labeled: Velocity]

Page 25

Closed-loop Big Data

[Diagram: Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics, fed by logins, sensors, impressions, orders, authorizations, clicks, and trades]

Page 26

Closed-loop Big Data

• Make the most informed decision every time there is an interaction

• Real-time decisions are informed by operational analytics and past knowledge

[Diagram: Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics, fed by logins, sensors, impressions, orders, authorizations, clicks, and trades, with a Knowledge label]

Page 27

The Velocity Use Case

What does it look like?

– High-throughput, relentless data feeds

– Fast decisions on high-value data

– Real-time, operational analytics provide immediate visibility

What’s the big deal?

– Batch visibility becomes real-time visibility = immediate business impact

– Decisions made at the time of the event = higher-impact decisions with immediate returns

– Ability to ingest and manage massive amounts of data = business differentiation and disruption

Page 28

HELLO 3.0!

Mark Hydar

Page 29

Introducing VoltDB 3.0

• Available now!

– Both commercial and open source offerings

– www.voltdb.com/downloads

• Key improvements

– Even faster

– Easier to build high-velocity applications

– Expanded reach across developers and applications

– Extensible to integrate with existing data infrastructure

Page 30

Latency and Throughput, 50-50 Read/Write Workload

[Chart: VoltDB 3.0 vs. v2.8.4.1, key/value 50/50 read/write workload, 3-node K=1 cluster; x-axis: TPS (0 to 200,000); y-axis: Latency (ms) (0 to 16)]

Page 31

Read/Write Workload Latency/Throughput

[Chart: VoltDB 3.0, key/value workloads at 10% read/90% write, 50% read/50% write, and 90% read/10% write, 3-node K=1 cluster; x-axis: TPS (0 to 350,000); y-axis: Avg. Latency (ms) (0 to 9)]

Page 32

Faster: Ad Hoc SQL Performance

• Conversational SQL

• Thousands to 10,000+ ad hoc SQL transactions/second

• Single- or multiple-statement (batch) SQL transactions

Page 33

Easier Development: New SQL Support

• SQL LIKE and NOT LIKE

• UNION

• Column Functions

• Counting function (leaderboard ranking queries)

• Ability to define indexes using column functions
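A leaderboard ranking query reduces to counting: a player's rank is one plus the number of players with a strictly higher score, i.e. a query of the shape SELECT COUNT(*) FROM scores WHERE score > ? (the table and column names are hypothetical). The Python sketch below uses a sorted list and binary search as a stand-in for an ordered index; it illustrates the counting idea and is not a description of VoltDB internals.

```python
import bisect


class Leaderboard:
    """Rank = 1 + number of strictly higher scores, found by binary search."""

    def __init__(self):
        self.scores = {}            # player -> current score
        self.sorted_scores = []     # every current score, ascending

    def set_score(self, player, score):
        old = self.scores.get(player)
        if old is not None:
            # Remove the player's previous score before inserting the new one.
            self.sorted_scores.pop(bisect.bisect_left(self.sorted_scores, old))
        self.scores[player] = score
        bisect.insort(self.sorted_scores, score)

    def rank(self, player):
        score = self.scores[player]
        higher = len(self.sorted_scores) - bisect.bisect_right(self.sorted_scores, score)
        return higher + 1           # ties share a rank


if __name__ == "__main__":
    board = Leaderboard()
    for player, score in [("ann", 10), ("bo", 30), ("cy", 20), ("di", 30)]:
        board.set_score(player, score)
    print({p: board.rank(p) for p in board.scores})
```

List insertion and removal are O(n) here; a tree index that keeps subtree counts can answer the same count in O(log n).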

Page 34

Easier Development: JSON Support

• JSON values stored in a varchar column

• Field() column function

• Indexing on JSON elements

CREATE INDEX session_site_moderator
    ON user_session_table (field(json_data, 'site'),
                           field(json_data, 'moderator'), username);

• New JSON sample in kit
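To make the index definition concrete, here is a hypothetical Python emulation of what field(json_data, 'site') computes for a single row: a top-level value extracted from the JSON text stored in the varchar column. Returning None for a missing field and the JSON text form for non-string values are assumptions for this sketch, not a statement of VoltDB's exact semantics; the dictionary grouping is a simplification of the ordered index, which also supports range and prefix scans.

```python
import json
from collections import defaultdict


def field(json_data, name):
    """Hypothetical emulation of field(json_data, name) for one row: return the
    top-level value as text, or None if the document or the field is missing."""
    if json_data is None:
        return None
    doc = json.loads(json_data)
    if not isinstance(doc, dict) or name not in doc:
        return None
    value = doc[name]
    # Strings come back unchanged; numbers, booleans, etc. as their JSON text.
    return value if isinstance(value, str) else json.dumps(value)


def build_index(rows):
    """Group usernames by the same expressions as session_site_moderator.
    A dict is a simplification: the real index is ordered and supports range scans."""
    index = defaultdict(list)
    for row in rows:
        key = (field(row["json_data"], "site"), field(row["json_data"], "moderator"))
        index[key].append(row["username"])
    for usernames in index.values():
        usernames.sort()        # username is the third index column, so keep it ordered
    return index


if __name__ == "__main__":
    rows = [
        {"username": "bob", "json_data": '{"site": "VoltDB", "moderator": true}'},
        {"username": "alice", "json_data": '{"site": "VoltDB", "moderator": true}'},
        {"username": "carol", "json_data": '{"site": "other"}'},
    ]
    print(dict(build_index(rows)))
```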

Page 35

Easier Development: Online Operations

• Ability to re-join a failed node to the cluster with no impact on existing operations

• Online schema update

• No service window

Page 36

Easier Development: Streamlined Development

• Elimination of project.xml

• VoltDB-specific configuration now defined in DDL

• Defaulting of deployment.xml

• New Volt Compiler CLI:

voltdb compile


Page 37

Expanded Reach: Cloud-Friendly

• Reduce impact of variable node performance and latency

• Elimination of strict NTP configuration

• Scales to a large number of nodes

Page 38

Integration: High-Performance Export

• Parallelized export

• New connectors: JDBC, Netezza, Vertica

Page 39

Integration: Client Library Updates

• New PHP Client

• Node.js client v1.0

• Go Client

• Coming soon: updated Erlang client


http://golang.org

Page 40

Other Notable New Features

• Explain command

• CSV loader utility

• CSV snapshots

• New Administration CLI: voltadmin

– voltadmin save

– voltadmin restore

– voltadmin pause

– voltadmin resume

– voltadmin shutdown

Page 41

More Samples Available for Download

http://voltdb.com/community/volt-labs.php

Page 42

Volt University

• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly

• Curriculum and supporting material range from beginner to advanced

• Three types of instruction:

– Volt University Online

– Volt University Classroom

– Volt Vanguard Certification


Page 43

Summary: VoltDB v3.0 Features

• Even faster

• Easier to build high-velocity applications

• Expanded reach across developers and applications

• Extensible to integrate with existing data infrastructure

• Volt Labs

• Volt University

VoltDB v3.0

Page 44

DOWNLOAD 3.0 at

www.voltdb.com

Imagine the Possibilities

Page 45

More Information?

E-mail [email protected]

Visit our forums: http://community.voltdb.com/forum

Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index

Follow @VoltDB on Twitter

Page 46

QUESTIONS?

Page 47

THANK YOU