drsql.org
How In-Memory Affects Database Design
Louis Davidson, Certified Nerd
Slide 2
Who am I?
Been in IT for over 19 years
Microsoft MVP for 10 years
Corporate Data Architect
Written five books on database design
  OK, so they were all versions of the same book
  They at least had slightly different titles each time
Basically: I love Database Design
Slide 3
Questions are Welcome
Please limit questions to ones I know the answer to.
Slide 4
Attention: There Is Homework (lots of it)
I can't teach you everything about In-Memory in 1 hour
The code will be available, but it is still very rudimentary
It will get you started, but it is only the tip of the iceberg
Do lots of thinkin' and testin' before divin' in
Slide 5
Introduction: What exactly is In-Memory OLTP in SQL Server 2014?
A totally new, revamped engine for data storage, co-located in the same database with the existing engine
Obviously Enterprise Only
Purpose-built for certain scenarios
Terminology can be confusing
  Existing tables: Home is On-Disk, but ideally cached In-Memory
  In-Memory tables: Home is In-Memory, but backed up by On-Disk structures
If you have enough RAM, On-Disk tables are also in memory
  But the implementation is very, very different
In-Memory is both very easy and very difficult to use
Slide 6
Design Basics (And no, I am not stalling for time due to lack of material)
Designing and Coding is Like the Chicken and the Egg
  Design is what you do before coding
  Coding patterns can greatly affect design
  Engine implementation can greatly affect design and coding patterns
We will discuss how In-Memory technologies affect the entire design/development lifecycle
[Cartoon captions: "As if" / "Children" / "I was first" / "Relics"]
Slide 7
Design Basics - Separate your design mind into three phases
1. Logical (Overall data requirements in a data model format)
2. Physical Implementation Choice (Indexes, Physical Structures, etc.)
3. Physical (Relational Code)
Before the engine choice, I always suggested doing 3 before 2
We will look at each of these phases and how In-Mem may affect your design of each output
Slide 8
Logical Design (Though Not Everyone's Is)
This is the easiest part of the presentation
You still need to model
  Entities and Attributes
  Uniqueness Conditions
  General Predicates
As I see it, nothing changes
Slide 9
Logical Data Model
Slide 10
Physical Implementation Overview
[Architecture diagram of SQL Server.exe. Labels recovered from the figure:]
  Client App
  TDS Handler and Session Management
  Parser, Catalog, Algebrizer, Optimizer
  Proc/Plan cache for ad-hoc T-SQL and SPs
  Interpreter for T-SQL, query plans, expressions
  Access Methods
  Buffer Pool for Tables & Indexes
  Data Filegroup
  Transaction Log
  Hekaton Compiler
  Natively Compiled SPs and Schema
  Engine for Memory-optimized Tables & Indexes
  Query Interop
  Memory-optimized Table Filegroup
[Key: Hekaton Component; Existing SQL Component; Generated .dll]
[Callouts:]
  10-30x more efficient
  Reduced log bandwidth & contention. Log latency remains
  Checkpoints are background sequential IO
  No improvements in communication stack, parameter passing, result set generation
Slide 11
Physical Implementation (Or DBA stuff that I only slightly care about)
Everything is different, and I am not here to cover these details
In-Mem data structures coexist in the database alongside On-Disk ones
Data is housed in RAM, and backed up in Data/Delta Files and the Transaction Log
  Data and delta files are stored as filestream storage
  The transaction log is the same one you are used to
Tables and Indexes are extremely coupled
MVCC (Multi-Version Concurrency Control) is used for all isolation
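To make the storage point concrete: before any In-Mem table can be created, the database needs a filegroup for memory-optimized data, which is built on filestream storage. A minimal sketch follows, assuming a hypothetical database named InMemDemo and a hypothetical folder C:\Data (neither comes from the demo code).

```sql
-- Hypothetical database name and path; adjust for your environment.
ALTER DATABASE InMemDemo
    ADD FILEGROUP InMemDemo_mod CONTAINS MEMORY_OPTIMIZED_DATA;

-- The "file" is really a folder (a filestream container) that will hold
-- the on-disk files backing the In-Mem tables.
ALTER DATABASE InMemDemo
    ADD FILE (NAME = N'InMemDemo_mod_container',
              FILENAME = N'C:\Data\InMemDemo_mod_container')
    TO FILEGROUP InMemDemo_mod;
```

The transaction log needs no change; In-Mem tables write to the same log as everything else in the database.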
Slide 12
Physical Design (No, let's not get physical)
Your physical design will almost certainly be affected
So much changes, even just changing the table structure
In this section, we will discuss:
  Creating storage objects
    Table Creation
    Index Creation (which is technically part of the table creation)
    Altering a Table's Structure
  Accessing (Modifying/Creating) data
    Using Normal T-SQL (Interop)
    Using Compiled Code (Native)
    Using a Hybrid Approach
  No Locks, No Latches, No Waiting
Slide 13
Creating Storage Objects - Tables
The syntax is the same as on-disk, with a few additional settings
You have durability choices
  In-Mem Table: SCHEMA_ONLY or SCHEMA_AND_DATA
  Database level for transactions: Delayed (also for on-disk tables)
    Basically Asynchronous Log Writes
    Aaron Bertrand has a great article on this here: http://sqlperformance.com/2014/04/io-subsystem/delayed-durability-in-sql-server-2014
You also have less to work with...
  Row size limited to 8060 bytes (Enforced at Create Time)
  Not all datatypes allowed (LOB types, CLR, sql_variant, datetimeoffset, rowversion)
  No check constraints
  No foreign keys
  Limited unique constraints (just one unique index per table)
  Every durable (SCHEMA_AND_DATA) table must have a primary key
Note: There are memory optimized temporary tables too. See Kendra Little's article here: http://www.brentozar.com/archive/2014/04/table-variables-good-temp-tables-sql-2014/
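As a rough illustration of the two durability settings, here is a sketch of a durable In-Mem table and a transaction that commits with delayed durability. The table and column names echo the Customers.Customer table used later in the deck, but the column types, the bucket count, and the sample values are assumptions, and the Customers schema is assumed to exist already.

```sql
-- Hypothetical column types and bucket count.
CREATE TABLE Customers.Customer
(
    CustomerId     int NOT NULL
        CONSTRAINT PKCustomer PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 131072),
    CustomerNumber nchar(10) NOT NULL,
    EmailAddress   nvarchar(200) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); -- or SCHEMA_ONLY
GO

-- Delayed durability is allowed at the database level, then requested per transaction.
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED;
GO

-- Autocommit insert: fully durable.
INSERT INTO Customers.Customer (CustomerId, CustomerNumber, EmailAddress)
VALUES (1, N'0000000001', N'someone@example.com');

-- Explicit transaction that commits without waiting for the log write.
BEGIN TRANSACTION;
    UPDATE Customers.Customer WITH (SNAPSHOT)
    SET    EmailAddress = N'someone.else@example.com'
    WHERE  CustomerId = 1;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
```

With SCHEMA_ONLY the table definition survives a restart but the rows do not, which is why the primary key requirement applies only to durable tables.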
Slide 14
Dealing with Unsupported Datatypes
Say you have a table with 10 columns, but 1 is not allowed in an In-Memory table
First: Ask yourself if the table really fits the criteria (we aren't done covering them)
Second: If so, consider vertically partitioning
  CREATE TABLE In_Mem (KeyValue, Column1, Column2, Column3)
  CREATE TABLE On_Disk (KeyValue, Column4)
It is likely that uses of disallowed types wouldn't be good for the OLTP aspects of the table in any case.
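The In_Mem / On_Disk split above can be sketched more fully as follows. The column types are assumptions (Column4 stands in for the disallowed column, here a LOB), and the dbo schema and bucket count are placeholders.

```sql
-- Supported columns go in the memory-optimized table.
CREATE TABLE dbo.In_Mem
(
    KeyValue int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Column1  int NOT NULL,
    Column2  int NOT NULL,
    Column3  int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- The disallowed column lives in an ordinary on-disk table with the same key.
CREATE TABLE dbo.On_Disk
(
    KeyValue int NOT NULL PRIMARY KEY,
    Column4  varchar(max) NULL
);
GO

-- Interop query that stitches the halves back together when Column4 is needed.
SELECT im.KeyValue, im.Column1, im.Column2, im.Column3, od.Column4
FROM   dbo.In_Mem AS im
       LEFT JOIN dbo.On_Disk AS od
           ON od.KeyValue = im.KeyValue;
```

Since foreign keys are not available on In-Mem tables, nothing enforces that the two tables stay in step; that has to be handled in code.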
Slide 15
Creating Storage Objects - Index creation
Syntax is inline with CREATE TABLE
Indexes are linked directly to the table
  8 indexes max per table due to internals
  Only one unique index allowed
Indexes are never persisted, but are rebuilt on restart
String index columns must use a binary collation (case AND accent sensitive)
Two types
  Hash
    Ideal for single row lookups
    Fixed size, you choose the number of hash buckets (approx 1-2 * # of unique values: http://msdn.microsoft.com/en-us/library/dn494956.aspx)
  Bw-Tree
    Best for range searches
    Very similar to a B-Tree index as you (hopefully) know it, but optimized for MVCC and pointer connection to table
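A sketch of both index types declared inline, followed by a query that shows how well a hash index's bucket count fits the data. The table dbo.IndexDemo, its columns, and the bucket count are hypothetical; sys.dm_db_xtp_hash_index_stats is the dynamic management view SQL Server 2014 provides for this.

```sql
-- Hypothetical table: one hash index (the primary key) and two range indexes.
CREATE TABLE dbo.IndexDemo
(
    IndexDemoId int NOT NULL
        CONSTRAINT PKIndexDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Name        nvarchar(100) COLLATE Latin1_General_100_BIN2 NOT NULL, -- binary collation for an indexed string
    CreatedDate datetime2(0) NOT NULL,
    INDEX IXName        NONCLUSTERED (Name),        -- range index
    INDEX IXCreatedDate NONCLUSTERED (CreatedDate)  -- range index
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- After loading data: many empty buckets suggest the count is too high,
-- long chains suggest it is too low (or the key has many duplicates).
SELECT OBJECT_NAME(hs.object_id) AS TableName,
       i.name                    AS IndexName,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM   sys.dm_db_xtp_hash_index_stats AS hs
       JOIN sys.indexes AS i
           ON  i.object_id = hs.object_id
           AND i.index_id  = hs.index_id;
```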
Slide 16
A Taste of the Physical Structures
A table with two hash indexes
From Kalen's white paper: http://t.co/T6zToWc6y6
Slide 17
Do you want to know more?
For more in-depth coverage, check Kalen Delaney's white paper: http://t.co/T6zToWc6y6
Or for even deeper (nerdier?) versions:
  Hekaton: SQL Server's Memory-Optimized OLTP Engine (http://research.microsoft.com/apps/pubs/default.aspx?id=193594)
  The Bw-Tree: A B-tree for New Hardware Platforms (http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf)
Books Online: http://technet.microsoft.com/en-us/library/dn133186.aspx
Slide 18
Creating Storage Objects - Altering a Table
This is the second easiest slide in the deck
No alterations allowed - Strictly Drop and Recreate
You can rename a table, which makes this at least easier
[Graphic: ALTER]
Slide 19
DEMO - CREATING TABLES
Slide 20
Accessing the Data - Using Normal T-SQL (Interop)
Using typical interpreted T-SQL
Most T-SQL will work with no change (you may need to add isolation level hints)
A few exceptions
  TRUNCATE TABLE - This one is really annoying :)
  MERGE (In-Mem table cannot be the target)
  Cross Database Transactions (other than tempdb)
  Locking Hints
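A short sketch of what the isolation level hints look like in interop code, assuming a memory-optimized table Customers.Customer with CustomerId and EmailAddress columns already exists (the column list is an assumption).

```sql
-- Option 1: inside an explicit transaction, hint each memory-optimized table.
BEGIN TRANSACTION;
    SELECT CustomerId, EmailAddress
    FROM   Customers.Customer WITH (SNAPSHOT)
    WHERE  CustomerId = 1;
COMMIT TRANSACTION;
GO

-- Option 2: let the database elevate such access to SNAPSHOT automatically,
-- so existing code does not need the hint.
ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
GO

-- TRUNCATE TABLE is not available, so emptying a table means a DELETE.
DELETE FROM Customers.Customer;
```

Single autocommit statements generally run without the hint; it is explicit transactions that need one option or the other.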
Slide 21
Accessing the Data using Compiled Code (Native)
Instead of being interpreted, the stored procedure is compiled to machine code
Limited syntax (Like programming with both hands tied behind your back)
The documentation lists what is allowed, not what isn't: http://msdn.microsoft.com/en-us/library/dn452279.aspx
Some really, extremely annoying ones:
  SUBSTRING supported; LEFT, RIGHT, not so much
  No Subqueries
  OR, NOT, IN not supported in WHERE clause
  Can't use on-disk objects (tables, sequences, views, etc.)
So you may have to write some "interesting" code
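For a feel of the syntax, here is a minimal sketch of a natively compiled procedure. It assumes a memory-optimized table Customers.Customer with CustomerId int and CustomerNumber nchar(10) columns; the procedure name and column types are hypothetical.

```sql
CREATE PROCEDURE Customers.Customer$GetNumberPrefix
    @CustomerId int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- LEFT(CustomerNumber, 3) is not available here, so SUBSTRING does the same job.
    SELECT CustomerId,
           SUBSTRING(CustomerNumber, 1, 3) AS CustomerNumberPrefix
    FROM   Customers.Customer
    WHERE  CustomerId = @CustomerId;
END;
GO

EXEC Customers.Customer$GetNumberPrefix @CustomerId = 1;
```

The WITH options and the ATOMIC block are mandatory in SQL Server 2014; the whole body runs as one transaction at the stated isolation level.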
Slide 22
Accessing Data Using a Hybrid Approach
Native code is very fast but very limited
Use Native code where it makes sense, and not where it doesn't
Example: Creating a sequential value
  In the demo code I used RAND() to create CustomerNumbers and SalesOrderNumbers
  Using a SEQUENCE is far more straightforward
  So I made one interpreted procedure that uses the SEQUENCE outside of native code, then calls the native procedure
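The shape of that hybrid can be sketched as below. This is not the demo code itself: the object names, column types, and the way the customer number is formatted are all assumptions, and a memory-optimized table Customers.Customer (CustomerId, CustomerNumber, EmailAddress) is assumed to exist.

```sql
-- Sequences are on-disk objects, so they must stay outside native code.
CREATE SEQUENCE Customers.CustomerId_SEQUENCE AS int START WITH 1 INCREMENT BY 1;
GO

-- Native procedure: does only the part native code is good at.
CREATE PROCEDURE Customers.Customer$InsertNative
    @CustomerId int, @CustomerNumber nchar(10), @EmailAddress nvarchar(200)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO Customers.Customer (CustomerId, CustomerNumber, EmailAddress)
    VALUES (@CustomerId, @CustomerNumber, @EmailAddress);
END;
GO

-- Interpreted wrapper: fetches the sequence value, then calls the native procedure.
CREATE PROCEDURE Customers.Customer$Insert
    @EmailAddress nvarchar(200)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @CustomerId int = NEXT VALUE FOR Customers.CustomerId_SEQUENCE;
    -- Zero-padded number built with RIGHT, which is fine in interpreted code.
    DECLARE @CustomerNumber nchar(10) =
        RIGHT(REPLICATE(N'0', 10) + CAST(@CustomerId AS nvarchar(10)), 10);

    EXEC Customers.Customer$InsertNative
         @CustomerId     = @CustomerId,
         @CustomerNumber = @CustomerNumber,
         @EmailAddress   = @EmailAddress;
END;
```

The trade-off is that the wrapper pays the interpreted-code cost on every call, so the hybrid lands somewhere between pure interop and pure native.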
Slide 23
Accessing the Data - No Locks, No Latches, No Waiting
On-Disk structures use latches and locks to implement isolation
In-Mem uses optimistic MVCC
You have 3 isolation levels: SNAPSHOT, REPEATABLE READ, SERIALIZABLE
  Evaluated before, or when, the transaction is committed
  This makes data integrity checking "interesting"
Essential difference: your code now must handle errors
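Handling errors mostly means retrying. A minimal retry sketch in interop T-SQL follows, assuming a memory-optimized table Customers.Customer with CustomerId and EmailAddress columns; the retry count and the sample values are arbitrary. The error numbers are the ones SQL Server raises for write conflicts (41302), validation failures (41305, 41325), and commit dependency failures (41301).

```sql
DECLARE @RetriesLeft int = 5;

WHILE @RetriesLeft > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
            UPDATE Customers.Customer WITH (SNAPSHOT)
            SET    EmailAddress = N'new.address@example.com'
            WHERE  CustomerId = 1;
        COMMIT TRANSACTION;

        SET @RetriesLeft = 0;   -- success, leave the loop
    END TRY
    BEGIN CATCH
        -- The failed transaction cannot be committed, so roll it back first.
        IF XACT_STATE() <> 0
            ROLLBACK TRANSACTION;

        SET @RetriesLeft -= 1;

        -- Retry only the optimistic-concurrency errors; rethrow anything else,
        -- and rethrow once the retries are used up.
        IF ERROR_NUMBER() NOT IN (41301, 41302, 41305, 41325) OR @RetriesLeft = 0
        BEGIN
            ;THROW;
        END;
    END CATCH;
END;
```

In practice a short delay between attempts can help when the same rows are hot, at the cost of added latency.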
Slide 24
Concurrency is the #1 difference you will deal with
Scenario 1: 2 Connections - Update Every Row In 1 Million Rows
Any Isolation Level
On-Disk
  Either: 1 connection blocks the other
  Or: Deadlock
In-Mem
  One connection will fail, saying: "The row you are trying to update has been updated since this transaction started", EVEN if the other transaction never commits.
Slide 25
Another slide on Concurrency (Because if I had presented it concurrently with the other one, you wouldn't have liked that)
Scenario 2: 1 Connection Updates All Rows, Another Reads All Rows (In an explicit transaction)
On-Disk
  Either: 1 connection blocks the other
  Or: Deadlock
In-Mem
  Both queries execute immediately
  In SNAPSHOT isolation the reader will always succeed
  In REPEATABLE READ or SERIALIZABLE
    Reader commits its transaction BEFORE the updater commits: Success
    Reader commits its transaction AFTER the updater commits: Fails
Slide 26
The Difficulty of Data Integrity
With on-disk structures, we used constraints for most issues (Uniqueness, Foreign Key, Simple Predicates)
With in-memory code, we have to implement them in stored procedures
  Uniqueness on > 1 column set suffers from timing (If N connections are inserting the same data... MVCC will let them)
  Foreign Key can't reliably be done because:
    In SNAPSHOT isolation level, the row may have been deleted while you check
    In higher levels, the transaction will fail if the row has been updated
  Check constraint style work can be done in stored procedures for the most part.
Slide 27
Problem: How to Implement Uniqueness on > 1 Column Set: INDEXED VIEW?
CREATE VIEW Customers.Customers$UniquenessEnforcement
WITH SCHEMABINDING
AS
SELECT customerId, emailAddress, customerNumber
FROM customers.Customer
GO
CREATE UNIQUE CLUSTERED INDEX emailAddress ON Customers.Customers$UniquenessEnforcement (emailAddress)
GO

Msg 10794, Level 16, State 12, Line 8
The operation 'CREATE INDEX' is not supported with memory optimized tables.
Slide 28
Problem: How to Implement Uniqueness on > 1 Column Set: Multiple Tables?
Wow, that seems messy
And what about duplicate customerId values in the two subordinate tables?
Slide 29
Problem: How to Implement Uniqueness on > 1 Column Set: Simple code
You can't... exactly. But what if EVERY caller has to go through the following block:
  DECLARE @CustomerId INT
  SELECT @CustomerId = CustomerId
  FROM Customers.Customer
  WHERE EmailAddress = @EmailAddress
  IF @CustomerId IS NULL
    -- Do your insert
This will stop MOST duplication, but not all. Two inserters can check at the same time, and with no blocks, app locks, or constraints even available, you may get duplicates.
Remember the term: Optimistic Concurrency Control
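Wrapped in a procedure, the check-then-insert block might look like the sketch below, paired with a query to find any duplicates that slip through. The procedure name and parameter types are hypothetical, and a memory-optimized table Customers.Customer (CustomerId, CustomerNumber, EmailAddress) is assumed to exist. The procedure is meant to be called outside an explicit transaction, so each statement autocommits and needs no isolation hint.

```sql
CREATE PROCEDURE Customers.Customer$InsertIfNewEmail
    @CustomerId int, @CustomerNumber nchar(10), @EmailAddress nvarchar(200)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @ExistingCustomerId int;

    SELECT @ExistingCustomerId = CustomerId
    FROM   Customers.Customer
    WHERE  EmailAddress = @EmailAddress;

    -- Window of opportunity: another session can insert the same address
    -- between the check above and the insert below.
    IF @ExistingCustomerId IS NULL
        INSERT INTO Customers.Customer (CustomerId, CustomerNumber, EmailAddress)
        VALUES (@CustomerId, @CustomerNumber, @EmailAddress);
END;
GO

-- After-the-fact check for duplicates that got past the procedure.
SELECT EmailAddress, COUNT(*) AS NumberOfRows
FROM   Customers.Customer
GROUP BY EmailAddress
HAVING COUNT(*) > 1;
```

Running the duplicate check on a schedule does not prevent the problem, but it does bound how long a duplicate can go unnoticed.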
Slide 30
When Should You Make Tables In-Memory - Microsoft's Advice
From http://msdn.microsoft.com/en-us/library/dn133186.aspx

Implementation Scenario: High data insertion rate from multiple concurrent connections.
  Details: Primarily append-only store. Unable to keep up with the insert workload.
  Benefits of In-Memory OLTP: Eliminate contention. Reduce logging.

Implementation Scenario: Read performance and scale with periodic batch inserts and updates.
  Details: High performance read operations, especially when each server request has multiple read operations to perform. Unable to meet scale-up requirements.
  Benefits of In-Memory OLTP: Eliminate contention when new data arrives. Lower latency data retrieval. Minimize code execution time.

Implementation Scenario: Intensive business logic processing in the database server.
  Details: Insert, update, and delete workload. Intensive computation inside stored procedures. Read and write contention.
  Benefits of In-Memory OLTP: Eliminate contention. Minimize code execution time for reduced latency and improved throughput.

Implementation Scenario: Low latency.
  Details: Require low latency business transactions which typical database solutions cannot achieve.
  Benefits of In-Memory OLTP: Eliminate contention. Minimize code execution time. Low latency code execution. Efficient data retrieval.

Implementation Scenario: Session state management.
  Details: Frequent insert, update and point lookups. High scale load from numerous stateless web servers.
  Benefits of In-Memory OLTP: Eliminate contention. Efficient data retrieval. Optional IO reduction or removal, when using non-durable tables.
Slide 31
When Should You Make Tables In-Memory - Louis's Advice
More or less the same as Microsoft's really (duh!)
Things to factor in
  High concurrency needs/Low chance of collisions
  Minimal uniqueness protection requirements
  Minimal data integrity concerns (minimal key update/deletes)
  Limited searching of data (binary comparisons only)
  Limited need for transaction isolation/Short transactions
Basically, the hot tables in a strict OLTP workload...
Slide 32
The Choices I made
Louis has improved his methods for estimating performance, but your mileage will still vary. Louis's tests are designed to reflect only certain usage conditions and user behavior, but several factors may affect your mileage significantly:
  How & Where You Put Your Logs
  Computer Condition & Maintenance
  CPU Variations
  Programmer Coding Variations
  Hard Disk Break-In
Therefore, Louis's performance ratings are a minimally useful tool for comparing the performance of different strategies but may not accurately predict the average performance you will get.
I seriously suggest you test the heck out of the technologies yourself using my code, your code, and anyone else's code you can to make sure you are getting the best performance possible.
Slide 33
Model Choices: Logical Model
Slide 34
Model Choices: Physical Model
Slide 35
Model Choices: Tables to Make In-Mem (First Try)
Slide 36
Model Choices: Tables to Make In-Mem (Final)
Slide 37
The Grand Illusion (So you think your life is complete confusion)
Performance gains are not exactly what you may expect, even when they are massive
In my examples (which you have seen), I discovered when loading 20000 rows (10 connections of 2000 each)
(Captured using Adam Machanic's SQLQueryStress tool: http://www.datamanipulation.net/SQLQueryStress/)
  A. On-Disk Tables with FK, Instead Of Trigger - 0.0472 seconds per row - Total Time 1:12
  B. On-Disk Tables withOUT FK, Instead Of Trigger - 0.0271 seconds per row - Total Time 0:51
  C. In-Mem Tables using Interop code - 0.0202 seconds per row - Total Time 0:44
  D. In-Mem Tables with Native Code - 0.0050 seconds per row - Total Time 0:31
  E. In-Mem Tables, Native Code, SCHEMA_ONLY - 0.0003 seconds per row - Total Time 0:30
  F. In-Mem Tables (not CustomerAddress), Hybrid code - 0.0163 seconds per row - Total Time 0:55
But should it be a lot better? Don't forget the overhead... (And SQLQueryStress has extra for gathering stats)
Slide 38
Contact info
Louis Davidson - [email protected]
Website: http://drsql.org
Slide 39
Demo: As Much Code Review As We Have Time For!