Post on 15-Jul-2015
Stonebraker Live!
Navigating the Database Universe
VoltDB presents
SCOTT JARR
Co-founder and Chief Strategy Officer
Agenda
• The (proper) design of DBMSs
– Presented by Dr. Michael Stonebraker, Co-founder
• The database universe
– Presented by Scott Jarr, Co-founder and Chief Strategy Officer
• Introducing VoltDB 3.0
– Presented by Mark Hydar, VP of Market Technology and Strategy
We Believe…
• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value... Fast
THE (PROPER) DESIGN OF THE DBMS
Dr. Michael Stonebraker
Lessons from 40 Years of Database Design
1. Get the user interaction right
– Bet on a small number of easy-to-understand constructs
– Plus standards
2. Get the implementation right
– Bet on a small number of easy-to-understand constructs
3. One size does not fit all
– At least not if you want fast, big or complex
“Those who don’t learn from history are destined to repeat it.”
- Winston Churchill
#1: Get the User Interaction Right
Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)
Loser: CODASYL
• Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)
• Messy access language (sea of “cursors”; some -- but not all -- move on every command, navigation programming)
Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation, through this sea)
• No standards
Historical Lesson: RDBMS vs. CODASYL vs. OODB
Interaction Take Away − Simple is Good
• ACID was easy for people to understand
• SQL provided a standard, high-level language and made people productive (transportable skills)
#2: Get the Implementation Right
• Leverage a few simple ideas: Early relational implementations
– System R storage system dropped links
– Views (protection, schema modification, performance)
– Cost-based optimizer
• Leverage a few simple ideas: Postgres
– User-defined data types and functions (adopted by most everybody)
– Rules/triggers
– No-overwrite storage
• Leverage a few simple ideas: Vertica
– Store data by column
– Compressed up the ging gong
– Parallel load without compromising ACID
Historical Winners
#3: One Size Does NOT Fit All
• OSFA is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built will exceed by 10x to 100x
• History has not been completely written yet… but let’s look at VoltDB as an example
“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system… A factor of 50 is nothing to sneeze at.”
- My Top 10 Assertions About Data Warehouses, 2010
Example: VoltDB
• Get the interface right
– SQL
– ACID
• Implementation: Leverage a few simple ideas
– Main memory
– Stored procedures
– Deterministic scheduling
• Specialization
– OLTP focus allowed for above implementation choices
Proving the Theory
• Challenge: OLTP performance
– TPC-C CPU cycles
– On the Shore DBMS prototype
– Elephants should be similar
– Recovery: 24%
– Latching: 24%
– Buffer Pool: 24%
– Locking: 24%
– Useful Work: 4%
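The breakdown above implies a rough ceiling on what removing overhead could buy. A minimal arithmetic sketch in Python, assuming the slide's percentages are shares of total CPU cycles and that a component could be eliminated entirely (an idealization, not a measured result; the function name is made up for illustration):

```python
# Shares of TPC-C CPU cycles as listed on the slide (percent of total).
breakdown = {
    "recovery": 24,
    "latching": 24,
    "buffer_pool": 24,
    "locking": 24,
    "useful_work": 4,
}

def speedup_if_removed(shares, removed):
    """Idealized speedup if the named components cost zero cycles."""
    total = sum(shares.values())
    remaining = total - sum(shares[name] for name in removed)
    return total / remaining

# Removing a single overhead (here latching) leaves 76% of the cycles.
latching_only = speedup_if_removed(breakdown, ["latching"])

# Removing all four overheads leaves only the 4% of useful work.
all_overheads = speedup_if_removed(
    breakdown, ["recovery", "latching", "buffer_pool", "locking"]
)

print(f"latching only: {latching_only:.2f}x, all four: {all_overheads:.0f}x")
```

Under these assumptions, no single fix helps much; the large gain only appears when all four overheads go away together.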
Single Threaded
• Gets rid of the latching problem
• What about Multicore?
– Divide the memory on an N-core node so it looks like N single-core nodes
– Which are single threaded…
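The multicore idea above can be sketched as follows. This is an illustrative Python model, not VoltDB code: the `Partition` class, the `route` helper, and the modulo routing rule are all assumptions made for the example.

```python
from collections import deque

class Partition:
    """One slice of memory owned by exactly one worker (hypothetical model)."""
    def __init__(self):
        self.data = {}        # rows private to this partition
        self.queue = deque()  # work waiting to run, in arrival order

    def submit(self, task):
        self.queue.append(task)

    def run_all(self):
        # One task at a time, run to completion. No latches are needed
        # because nothing else ever touches self.data.
        while self.queue:
            self.queue.popleft()(self.data)

def route(key, partitions):
    """Pick the owning partition; modulo on an integer key is an assumed rule."""
    return partitions[key % len(partitions)]

# An N-core node modeled as N independent single-threaded partitions.
N_CORES = 4
partitions = [Partition() for _ in range(N_CORES)]

def put(key, value):
    route(key, partitions).submit(lambda data: data.__setitem__(key, value))

for account in range(8):
    put(account, account * 100)
for part in partitions:
    part.run_all()

print([sorted(p.data) for p in partitions])
```

Each key lives in exactly one partition, so work on different partitions never contends for the same data.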
Implementation Construct #1: Main Memory
• Main memory format for data
– Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
– Return to disk-buffer pool architecture (slow)
– Anti-caching
• Main memory format for data
• When memory fills up, then bundle together elderly tuples and write them out
• Run a transaction in “sleuth mode”; find the required records and move to main memory (and pin)
• Run the transaction normally
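The anti-caching steps above can be modeled in a few lines. This is a toy Python sketch under stated assumptions: the `AntiCache` class, its LRU eviction order, the `capacity` and `block_size` parameters, and the dictionary standing in for disk are all hypothetical, not VoltDB internals.

```python
from collections import OrderedDict

class AntiCache:
    """Toy model: hot tuples in memory, cold tuples evicted to a 'disk' dict."""
    def __init__(self, capacity, block_size):
        self.memory = OrderedDict()  # key -> tuple, least recently used first
        self.disk = {}               # stand-in for evicted on-disk blocks
        self.pinned = set()
        self.capacity = capacity
        self.block_size = block_size

    def _evict(self):
        # Bundle the oldest unpinned tuples together and write them out.
        victims = [k for k in self.memory if k not in self.pinned]
        victims = victims[: self.block_size]
        for k in victims:
            self.disk[k] = self.memory.pop(k)
        return len(victims)

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        if len(self.memory) > self.capacity:
            self._evict()

    def run_transaction(self, keys, body):
        # Pass 1 ("sleuth mode"): find the required records, fetch and pin them.
        for k in keys:
            if k in self.disk:
                self.memory[k] = self.disk.pop(k)
            if k in self.memory:
                self.memory.move_to_end(k)
                self.pinned.add(k)
        try:
            # Pass 2: run the transaction normally against main memory.
            return body(self.memory)
        finally:
            self.pinned.difference_update(keys)
            while len(self.memory) > self.capacity and self._evict():
                pass

store = AntiCache(capacity=3, block_size=2)
for i in range(5):
    store.put(i, f"row{i}")

result = store.run_transaction([0, 4], lambda mem: (mem[0], mem[4]))
print(result, sorted(store.disk))
```

The transaction body only ever reads main memory; the fetch-and-pin pass is what guarantees the records are there.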
Implementation Construct #2: Stored Procedures
• Round trip to the DBMS is expensive
– Do it once per transaction
– Not once per command
– Or even once per cursor move
• Ad-hoc queries supported
– Turn them into dynamic stored procedures
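The round-trip argument above can be made concrete with a small count. In this Python sketch, `CountingServer` is a hypothetical in-process stand-in for a DBMS, and each call to `execute` or `call` is assumed to model one client/server round trip.

```python
class CountingServer:
    """Hypothetical stand-in for a DBMS that counts client round trips."""
    def __init__(self):
        self.balances = {"alice": 100, "bob": 50}
        self.round_trips = 0
        self.procedures = {}

    def execute(self, op, *args):
        # One call models one round trip for one command.
        self.round_trips += 1
        if op == "read":
            return self.balances[args[0]]
        if op == "write":
            self.balances[args[0]] = args[1]
            return None
        raise ValueError(op)

    def register(self, name, fn):
        self.procedures[name] = fn

    def call(self, name, *args):
        # One round trip for the whole transaction; the logic runs server-side.
        self.round_trips += 1
        return self.procedures[name](self.balances, *args)

def transfer(balances, src, dst, amount):
    balances[src] -= amount
    balances[dst] += amount

# Command-at-a-time: four round trips for one transfer.
chatty = CountingServer()
a = chatty.execute("read", "alice")
b = chatty.execute("read", "bob")
chatty.execute("write", "alice", a - 10)
chatty.execute("write", "bob", b + 10)

# Stored procedure: one round trip for the same transfer.
batched = CountingServer()
batched.register("transfer", transfer)
batched.call("transfer", "alice", "bob", 10)

print(chatty.round_trips, batched.round_trips)
```

Both servers end in the same state; only the number of round trips differs.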
Implementation Construct #3: Deterministic Scheduling
• Transactions are ordered and run to completion
– No locking
• Active-active replication (HA)
– Run transaction at all replicas – in the same pre-determined order
• What about a cluster-wide power failure?
– Asynchronous checkpointing
– With a command log
– Wildly faster than data logging
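The points above can be illustrated with a small model of ordered execution, replication, and command-log recovery. This Python sketch makes several assumptions: the procedure names, the `Replica` class, and the in-memory `command_log` list are hypothetical, and the checkpoint is taken synchronously for simplicity, whereas the slide describes asynchronous checkpointing.

```python
import copy

def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount

def transfer(state, src, dst, amount):
    # Deterministic: the outcome depends only on the state and the arguments.
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount

PROCEDURES = {"deposit": deposit, "transfer": transfer}

class Replica:
    def __init__(self, state=None):
        self.state = state if state is not None else {}

    def apply(self, command):
        name, args = command
        PROCEDURES[name](self.state, *args)  # runs to completion, no locks

# The agreed, pre-determined transaction order.
ordered_commands = [
    ("deposit", ("alice", 100)),
    ("transfer", ("alice", "bob", 30)),
    ("transfer", ("bob", "carol", 50)),  # no effect: bob has only 30
    ("deposit", ("carol", 5)),
]

# Active-active: every replica runs every transaction in the same order.
replicas = [Replica(), Replica()]
command_log = []
checkpoint, checkpoint_pos = {}, 0
for i, cmd in enumerate(ordered_commands):
    command_log.append(cmd)  # log the command, not the data it changes
    for r in replicas:
        r.apply(cmd)
    if i == 1:  # a checkpoint taken part-way through the run
        checkpoint = copy.deepcopy(replicas[0].state)
        checkpoint_pos = len(command_log)

# Recovery after a cluster-wide failure: load checkpoint, replay the log tail.
recovered = Replica(copy.deepcopy(checkpoint))
for cmd in command_log[checkpoint_pos:]:
    recovered.apply(cmd)

print(recovered.state)
```

Because each procedure is deterministic, replaying the logged commands in order reproduces the same state without recording any row-level changes.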
Result of Design Principles: VoltDB Example
• Good interface decisions – made developers more productive
– SQL & ACID
• Leveraging a few simple implementation ideas – made VoltDB wicked fast
– Main memory
– Stored procedures
– Deterministic scheduling
Proving the Theory
• Answer: OLTP performance
– 3 million transactions per second
– 7x Cassandra
– 15 million SQL statements per second
– 100,000+ transactions per commodity server
“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
- The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
THE DATABASE UNIVERSE
Scott Jarr
Technology Meets the Market
Believe
– “Big Data” is a rare, transformative market
– Velocity is becoming the cornerstone
– Specialized databases (working together) are the answer
– Products must provide tangible customer value… Fast
Observations
– Noisy, crowded and new – kinda like Christmas shopping at the mall
– Everyone wants to understand where the pieces fit
– Analysts build maps on technology NOT use cases
What we need is…
Data Value Chain
Stages, ordered by age of data:
• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve trans.
• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record Lookup (second(s)): retrieve click stream, show orders
• Historical Analytics (minutes): backtest algo, BI, daily reports
• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match
Figure: Data Value Chain. Axes: Data Value vs. Age of Data. Curves: Value of Individual Data Item; Aggregate Data Value.
The Database Universe
Figure: The Database Universe. Vertical axis: Application Complexity (Simple / Slow / Small to Fast / Complex / Large). Horizontal axis: Data Value (Value of Individual Data Item to Aggregate Data Value), spanning Transactional to Analytic across Interactive, Real-time Analytics, Record Lookup, Historical Analytics, and Exploratory Analytics. Labels shown: Traditional RDBMS, Data Warehouse, Hadoop etc., NoSQL, NewSQL, Velocity.
Closed-loop Big Data
Figure labels: data feeds (logins, sensors, impressions, orders, authorizations, clicks, trades); layers: Interactive & Real-time Analytics, Historical Reports & Analytics, Exploratory Analytics.
Closed-loop Big Data
• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge
Figure: the diagram above, with an added Knowledge label.
The Velocity Use Case
What’s it look like?
– High throughput, relentless data feeds
– Fast decisions on high-value data
– Real-time, operational analytics present immediate visibility
What’s the big deal?
– Batch visibility converts to real time = immediate business impact
– Decisions made at time of event = higher impact decisions with immediate returns
– Ability to ingest and manage massive amounts of data = business differentiation and disruption
HELLO 3.0!
Mark Hydar
Introducing VoltDB 3.0
• Available now!
– Both commercial and open source offerings
– www.voltdb.com/downloads
• Key improvements
– Even faster
– Easier to build high-velocity applications
– Expanded reach across developers and applications
– Extensible to integrate with existing data infrastructure
Latency and Throughput, 50-50 Read/Write Workload
Figure: Latency (ms) vs. TPS, VoltDB 3.0 vs. v2.8.4.1. Key/Value 50/50 read/write workload, 3 Node, K=1 Cluster.
Read/Write Workload Latency/Throughput
Figure: Avg. Latency (ms) vs. TPS for VoltDB 3.0 at 10% read/90% write, 50% read/50% write, and 90% read/10% write. Key/Value various read/write workload, 3 Node, K=1 Cluster.
Faster: Ad Hoc SQL Performance
• Conversational SQL
• Thousands to 10,000+ ad hoc SQL transactions/second
• Single or multiple (batch) SQL statement transactions
Easier Development: New SQL Support
• SQL LIKE and NOT LIKE
• UNION
• Column Functions
• Counting function (leaderboard ranking queries)
• Ability to define index using column functions
Easier Development: JSON Support
• JSON values stored in a varchar column
• Field() column function
• Indexing on JSON elements
CREATE INDEX session_site_moderator
ON user_session_table (field(json_data, 'site'),
field(json_data, 'moderator'), username);
• New JSON sample in kit
Easier Development: Online Operations
• Ability to re-join a failed node to cluster with no impact to
existing operations
• Online schema update
• No service window
Easier Development: Streamlined Development
• Elimination of project.xml
• VoltDB-specific configuration now defined in DDL
• Defaulting of deployment.xml
• New Volt Compiler CLI:
voltdb compile
Expanded Reach: Cloud-Friendly
• Reduce impact of variable node performance and latency
• Elimination of strict NTP configuration
• Scales to a large number of nodes
Integration: High-Performance Export
• Parallelized export
• New connectors: JDBC, Netezza, Vertica
Integration: Client Library Updates
• New PHP Client
• Node.js client v1.0
• Go Client
• Coming soon: updated Erlang client
http://golang.org
Other Notable New Features
• Explain command
• CSV loader utility
• CSV snapshots
• New Administration CLI: voltadmin
– voltadmin save
– voltadmin restore
– voltadmin pause
– voltadmin resume
– voltadmin shutdown
More Samples Available for Download
http://voltdb.com/community/volt-labs.php
Volt University
• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly
• Curriculum and supporting material range from beginner to advanced
• Three types of instruction:
– Volt University Online
– Volt University Classroom
– Volt Vanguard Certification
Summary: VoltDB v3.0 Features
• Even faster
• Easier to build high-velocity applications
• Expanded reach across developers and applications
• Extensible to integrate with existing data infrastructure
• Volt Labs
• Volt University
VoltDB v3.0
DOWNLOAD 3.0 at www.voltdb.com
Imagine the Possibilities
More Information?
E-mail info@voltdb.com
Visit our forums: http://community.voltdb.com/forum
Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index
Follow @VoltDB on Twitter
QUESTIONS?
THANK YOU