drsql.org: How In-Memory Affects Database Design, Louis Davidson, Certified Nerd

  • Slide 1
  • How In-Memory Affects Database Design, Louis Davidson, Certified Nerd
  • Slide 2
  • Who am I? Been in IT for over 19 years. Microsoft MVP for 10 years. Corporate Data Architect. Written five books on database design (OK, so they were all versions of the same book; they at least had slightly different titles each time). Basically: I love database design.
  • Slide 3
  • Questions Are Welcome. Please limit questions to ones I know the answer to.
  • Slide 4
  • Attention: There Is Homework (Lots of It). I can't teach you everything about In-Memory in 1 hour. The code will be available, but it is still very rudimentary: it will get you started, but it is only the tip of the iceberg. Do lots of thinkin' and testin' before divin' in.
  • Slide 5
  • Introduction: What exactly is In-Memory OLTP in SQL Server 2014? A totally new, revamped engine for data storage, co-located in the same database with the existing engine. Obviously Enterprise only. Purpose-built for certain scenarios. The terminology can be confusing: existing (On-Disk) tables have their home on disk but are ideally cached in memory, while In-Memory tables have their home in memory but are backed up by on-disk structures. If you have enough RAM, On-Disk tables are also in memory, but the implementation is very, very different. In-Memory is both very easy, and very difficult, to use.
  • Slide 6
  • Design Basics (And no, I am not stalling for time due to lack of material). Designing and coding are like the chicken and the egg: design is what you do before coding, yet coding patterns can greatly affect design, and the engine implementation can greatly affect both design and coding patterns. We will discuss how In-Memory technologies affect the entire design/development lifecycle.
  • Slide 7
  • Design Basics: Separate your design mind into three phases: 1. Logical (overall data requirements in a data model format); 2. Physical implementation choice (indexes, physical structures, etc.); 3. Physical (relational code). Before the engine choice, I always suggested doing 3 before 2. We will look at each of these phases and how In-Memory may affect your design of each output.
  • Slide 8
  • Logical Design (Though Not Everyone's Is). This is the easiest part of the presentation. You still need to model entities and attributes, uniqueness conditions, and general predicates. As I see it, nothing changes.
  • Slide 9
  • Logical Data Model
  • Slide 10
  • Physical Implementation Overview (architecture diagram of SQL Server.exe, with a Client App outside it). Existing SQL components: TDS Handler and Session Management; Parser, Catalog, Algebrizer, Optimizer; Interpreter for T-SQL, query plans, expressions; Access Methods; Proc/Plan cache for ad-hoc T-SQL and SPs; Buffer Pool for Tables & Indexes; Data Filegroup; Transaction Log. Hekaton components: Hekaton Compiler; Natively Compiled SPs and Schema (generated .dll); Engine for Memory_optimized Tables & Indexes; Query Interop; Memory-optimized Table Filegroup. Callouts: 10-30x more efficient; reduced log bandwidth & contention, but log latency remains; checkpoints are background sequential IO; no improvements in communication stack, parameter passing, result set generation.
  • Slide 11
  • Physical Implementation (Or DBA Stuff That I Only Slightly Care About). Everything is different, and I am not here to cover these details. In-Mem data structures coexist in the database alongside On-Disk ones. Data is housed in RAM, and backed up in data and delta files and the transaction log. The data and delta files are stored as filestream storage. The transaction log is the same one you are used to. Tables and indexes are extremely coupled. MVCC (Multi-Version Concurrency Control) is used for all isolation levels.
  • Slide 12
  • Physical Design (No, Let's Not Get Physical). Your physical design will almost certainly be affected; so much changes, even in just the table structure. In this section, we will discuss: creating storage objects (table creation; index creation, which is technically part of table creation; altering a table's structure) and accessing (modifying/creating) data (using normal T-SQL (Interop); using compiled code (Native); using a hybrid approach; no locks, no latches, no waiting).
  • Slide 13
  • Creating Storage Objects - Tables. The syntax is the same as on-disk, with a few additional settings. You have durability choices: per In-Mem table, SCHEMA_ONLY or SCHEMA_AND_DATA; at the database level, for transactions, Delayed durability (also available for on-disk tables), which is basically asynchronous log writes. Aaron Bertrand has a great article on this here: http://sqlperformance.com/2014/04/io-subsystem/delayed-durability-in-sql-server-2014. You also have less to work with: row size is limited to 8060 bytes (enforced at create time); not all datatypes are allowed (LOB types, CLR, sql_variant, datetimeoffset, rowversion); no check constraints; no foreign keys; limited unique constraints (just one unique index per table); and every durable (SCHEMA_AND_DATA) table must have a primary key. Note: there are memory-optimized temporary tables too; see Kendra Little's article here: http://www.brentozar.com/archive/2014/04/table-variables-good-temp-tables-sql-2014/
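To make the extra settings concrete, here is a minimal sketch, assuming SQL Server 2014 (Enterprise or Developer edition) and a database that already has a MEMORY_OPTIMIZED_DATA filegroup with a container. The table dbo.OrderStatusDemo and its columns are hypothetical. The script creates a durable table, allows delayed durability at the database level, and opts one transaction into it.

```sql
-- Hypothetical durable memory-optimized table (SCHEMA_AND_DATA keeps rows across restarts).
CREATE TABLE dbo.OrderStatusDemo
(
    -- Every SCHEMA_AND_DATA table must have a primary key.
    OrderStatusId int NOT NULL
        CONSTRAINT PKOrderStatusDemo PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 1024),
    -- LOB types such as nvarchar(max) are not allowed, so use a bounded length.
    StatusName  nvarchar(30) NOT NULL,
    LastUpdated datetime2(3) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Delayed durability is a database-level option; ALLOWED lets individual commits opt in.
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED;
GO
BEGIN TRANSACTION;
-- Inside an explicit transaction, give the memory-optimized table an isolation hint.
INSERT INTO dbo.OrderStatusDemo WITH (SNAPSHOT) (OrderStatusId, StatusName, LastUpdated)
VALUES (1, N'Open', SYSDATETIME());
-- The commit returns before its log record is hardened (asynchronous log write).
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
GO
```

Changing DURABILITY to SCHEMA_ONLY would keep the table definition across a restart but not its rows.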
  • Slide 14
  • Dealing with Unsupported Datatypes. Say you have a table with 10 columns, but 1 is not allowed in an In-Memory table. First: ask yourself if the table really fits the criteria (which we aren't done covering). Second: if so, consider vertically partitioning: CREATE TABLE In_Mem (KeyValue, Column1, Column2, Column3) and CREATE TABLE On_Disk (KeyValue, Column4). It is likely that uses of disallowed types wouldn't be good for the OLTP aspects of the table in any case.
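The vertical partitioning idea, sketched in T-SQL under the same assumptions (SQL Server 2014 and an existing MEMORY_OPTIMIZED_DATA filegroup). All object names are hypothetical, and nvarchar(max) stands in for "the 1 column that is not allowed".

```sql
-- Hot, OLTP-facing columns go in the memory-optimized half.
CREATE TABLE dbo.ProductDemo_InMem
(
    ProductId int NOT NULL
        CONSTRAINT PKProductDemo_InMem PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 131072),
    ProductName nvarchar(50)  NOT NULL,
    UnitPrice   decimal(10,2) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- The disallowed LOB column stays in an ordinary on-disk table with the same key.
CREATE TABLE dbo.ProductDemo_OnDisk
(
    ProductId   int NOT NULL CONSTRAINT PKProductDemo_OnDisk PRIMARY KEY,
    Description nvarchar(max) NULL
);
GO
-- Interop queries can stitch the halves back together. No FOREIGN KEY is possible
-- between them, so keeping the key values in sync is up to your code.
CREATE VIEW dbo.ProductDemo
AS
SELECT im.ProductId, im.ProductName, im.UnitPrice, od.Description
FROM   dbo.ProductDemo_InMem AS im
       LEFT OUTER JOIN dbo.ProductDemo_OnDisk AS od
           ON od.ProductId = im.ProductId;
GO
```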
  • Slide 15
  • Creating Storage Objects - Index Creation. The syntax is inline with CREATE TABLE, and indexes are linked directly to the table. 8 indexes max per table, due to internals. Only one unique index allowed. Indexes are never persisted, but are rebuilt on restart. String index columns must use a binary collation (case AND accent sensitive). Two types: Hash, ideal for single-row lookups; fixed size, you choose the number of hash buckets (approx. 1-2 times the number of unique values; see http://msdn.microsoft.com/en-us/library/dn494956.aspx). Bw-tree, best for range searches; very similar to a B-tree index as you (hopefully) know it, but optimized for MVCC and pointer connection to the table.
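A sketch of both index types declared inline, using SQL Server 2014 syntax and a hypothetical dbo.CustomerDemo table (a MEMORY_OPTIMIZED_DATA filegroup is assumed). The bucket counts are illustrative values sized for roughly a million to two million distinct keys, not recommendations.

```sql
CREATE TABLE dbo.CustomerDemo
(
    -- Hash index for single-row lookups. SQL Server rounds BUCKET_COUNT up to
    -- the next power of two, so 2000000 becomes 2097152.
    CustomerId int NOT NULL
        CONSTRAINT PKCustomerDemo PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 2000000),
    -- String index key columns need a BIN2 (binary) collation in SQL Server 2014.
    EmailAddress   nvarchar(100) COLLATE Latin1_General_100_BIN2 NOT NULL,
    CustomerNumber char(10)      COLLATE Latin1_General_100_BIN2 NOT NULL,

    -- Range (Bw-tree) index: suited to range searches and ordered scans.
    INDEX EmailAddressRange NONCLUSTERED (EmailAddress),
    -- A second hash index; it cannot be UNIQUE, since the primary key is the one unique index.
    INDEX CustomerNumberHash HASH (CustomerNumber) WITH (BUCKET_COUNT = 2000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
```

Because indexes are part of the table definition, adding or changing one later means dropping and recreating the table.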
  • Slide 16
  • A Taste of the Physical Structures: a table with two hash indexes (diagram from Kalen Delaney's white paper: http://t.co/T6zToWc6y6)
  • Slide 17
  • Do you want to know more? For more in-depth coverage, check Kalen Delaney's white paper: http://t.co/T6zToWc6y6. Or, for even deeper (nerdier?) versions: "Hekaton: SQL Server's Memory-Optimized OLTP Engine" (http://research.microsoft.com/apps/pubs/default.aspx?id=193594) or "The Bw-Tree: A B-tree for New Hardware Platforms" (http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf). Books Online: http://technet.microsoft.com/en-us/library/dn133186.aspx
  • Slide 18
  • Creating Storage Objects - Altering a Table. This is the second easiest slide in the deck: no alterations are allowed, strictly drop and recreate. You can rename a table, which makes this at least easier.
  • Slide 19
  • DEMO - CREATING TABLES
  • Slide 20
  • Accessing the Data - Using Normal T-SQL (Interop). With typical interpreted T-SQL, most code will work with no change (you may need to add isolation level hints). A few exceptions: TRUNCATE TABLE (this one is really annoying :)); MERGE (an In-Mem table cannot be the target); cross-database transactions (other than tempdb); locking hints.
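A small interop sketch, assuming SQL Server 2014 and a hypothetical dbo.InteropDemo table. It shows that autocommit statements run unchanged, where an isolation level hint becomes necessary, and the DELETE workaround for the missing TRUNCATE TABLE.

```sql
CREATE TABLE dbo.InteropDemo
(
    InteropDemoId int NOT NULL
        CONSTRAINT PKInteropDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Value int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Autocommit statements work with no change.
INSERT INTO dbo.InteropDemo (InteropDemoId, Value) VALUES (1, 10);
SELECT Value FROM dbo.InteropDemo WHERE InteropDemoId = 1;

-- READ COMMITTED access to a memory-optimized table is only supported for autocommit
-- transactions, so an explicit transaction needs an isolation level hint.
BEGIN TRANSACTION;
UPDATE dbo.InteropDemo WITH (SNAPSHOT)
SET    Value = Value + 1
WHERE  InteropDemoId = 1;
COMMIT TRANSACTION;

-- TRUNCATE TABLE is not supported on memory-optimized tables; DELETE empties it instead.
DELETE FROM dbo.InteropDemo;
GO
```

If adding hints everywhere is impractical, the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON makes the engine treat those accesses as SNAPSHOT automatically.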
  • Slide 21
  • Accessing the Data Using Compiled Code (Native). Instead of being interpreted, the stored procedure is compiled to machine code. Limited syntax (like programming with both hands tied behind your back). The allowed syntax is documented as a list of what is available, not what isn't: http://msdn.microsoft.com/en-us/library/dn452279.aspx. Some really, extremely annoying ones: SUBSTRING is supported; LEFT and RIGHT, not so much. No subqueries. OR, NOT, and IN are not supported in the WHERE clause. You can't use on-disk objects (tables, sequences, views, etc.). So you may have to write some "interesting" code.
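A minimal natively compiled procedure in SQL Server 2014 syntax, with a hypothetical dbo.ProductCodeDemo table. It shows the required NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS and BEGIN ATOMIC options, plus the SUBSTRING stand-in for LEFT.

```sql
CREATE TABLE dbo.ProductCodeDemo
(
    ProductCodeId int NOT NULL
        CONSTRAINT PKProductCodeDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Code nvarchar(20) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
CREATE PROCEDURE dbo.ProductCodeDemo$GetPrefix
    @ProductCodeId int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
-- The whole body is one atomic block with an explicit isolation level and language.
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- LEFT(Code, 3) is not available in native code; SUBSTRING does the same job.
    SELECT SUBSTRING(Code, 1, 3) AS CodePrefix
    FROM   dbo.ProductCodeDemo          -- objects must be schema-qualified
    WHERE  ProductCodeId = @ProductCodeId;
END;
GO
```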
  • Slide 22
  • Accessing Data Using a Hybrid Approach. Native code is very fast but very limited, so use native code where it makes sense, and not where it doesn't. Example: creating a sequential value. In the demo code I used RAND() to create CustomerNumbers and SalesOrderNumbers; using a SEQUENCE is far more straightforward. So I made one interpreted procedure that gets the value from the SEQUENCE outside of native code, then calls the native procedure.
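The hybrid pattern might look like the sketch below (SQL Server 2014; every object name is hypothetical and this is not the demo's actual code). The interpreted wrapper reads the SEQUENCE, which native code cannot touch, and hands the value to the native insert.

```sql
CREATE SEQUENCE dbo.SalesOrderNumberSequence AS int START WITH 1 INCREMENT BY 1;
GO
CREATE TABLE dbo.SalesOrderDemo
(
    SalesOrderNumber int NOT NULL
        CONSTRAINT PKSalesOrderDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    CustomerId int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Native part: only touches the memory-optimized table.
CREATE PROCEDURE dbo.SalesOrderDemo$InsertNative
    @SalesOrderNumber int, @CustomerId int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO dbo.SalesOrderDemo (SalesOrderNumber, CustomerId)
    VALUES (@SalesOrderNumber, @CustomerId);
END;
GO
-- Interpreted wrapper: uses the on-disk SEQUENCE, then calls the native procedure.
CREATE PROCEDURE dbo.SalesOrderDemo$Insert
    @CustomerId int
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @SalesOrderNumber int;
    SET @SalesOrderNumber = NEXT VALUE FOR dbo.SalesOrderNumberSequence;
    EXEC dbo.SalesOrderDemo$InsertNative
         @SalesOrderNumber = @SalesOrderNumber, @CustomerId = @CustomerId;
END;
GO
EXEC dbo.SalesOrderDemo$Insert @CustomerId = 42;
```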
  • Slide 23
  • Accessing the Data - No Locks, No Latches, No Waiting. On-Disk structures use latches and locks to implement isolation; In-Mem uses optimistic MVCC. You have 3 isolation levels: SNAPSHOT, REPEATABLE READ, SERIALIZABLE. Conflicts are evaluated before, or when, the transaction is committed. This makes data integrity checking "interesting". The essential difference: your code now must handle errors.
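Because conflicts surface as errors rather than waits, a retry loop is the usual shape of "handling errors". A minimal sketch, assuming SQL Server 2014 and a hypothetical dbo.CounterDemo table. The retried error numbers are the In-Memory OLTP ones: 41302 (update conflict), 41305 (REPEATABLE READ validation failure), 41325 (SERIALIZABLE validation failure) and 41301 (commit dependency failure). The limit of 5 attempts is arbitrary.

```sql
CREATE TABLE dbo.CounterDemo
(
    CounterId int NOT NULL
        CONSTRAINT PKCounterDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    CounterValue int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
INSERT INTO dbo.CounterDemo (CounterId, CounterValue) VALUES (1, 0);
GO
CREATE PROCEDURE dbo.CounterDemo$Increment
    @CounterId int
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @RetriesLeft int = 5;      -- arbitrary attempt limit
    WHILE @RetriesLeft > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;
            UPDATE dbo.CounterDemo WITH (SNAPSHOT)
            SET    CounterValue = CounterValue + 1
            WHERE  CounterId = @CounterId;
            COMMIT TRANSACTION;
            RETURN 0;                  -- success: leave the loop
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
            IF ERROR_NUMBER() IN (41302, 41305, 41325, 41301) AND @RetriesLeft > 1
                SET @RetriesLeft -= 1; -- conflict or validation failure: try again
            ELSE
                THROW;                 -- not retryable, or out of attempts: re-raise
        END CATCH;
    END;
END;
GO
EXEC dbo.CounterDemo$Increment @CounterId = 1;
```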
  • Slide 24
  • Concurrency Is the #1 Difference You Will Deal With. Scenario 1: 2 connections each update every row in a 1-million-row table, in any isolation level. On-Disk: either 1 connection blocks the other, or they deadlock. In-Mem: one connection will fail, saying the row you are trying to update has been updated since this transaction started, EVEN if the other transaction never commits.
  • Slide 25
  • Another Slide on Concurrency (because if I had presented it concurrently with the other one, you wouldn't have liked that). Scenario 2: 1 connection updates all rows, another reads all rows (in an explicit transaction). On-Disk: either 1 connection blocks the other, or they deadlock. In-Mem: both queries execute immediately. In SNAPSHOT isolation, the reader will always succeed. In REPEATABLE READ or SERIALIZABLE, if the reader commits its transaction BEFORE the updater commits, it succeeds; if it commits AFTER the updater commits, it fails.
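Both scenarios can be reproduced on a small scale. The sketch below is meant to be run piece by piece in two query windows (connection A and connection B) in the order shown, not as a single script; dbo.ConcurrencyDemo is a hypothetical one-row table standing in for the million-row one.

```sql
-- Setup, run once.
CREATE TABLE dbo.ConcurrencyDemo
(
    ConcurrencyDemoId int NOT NULL
        CONSTRAINT PKConcurrencyDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Value int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
INSERT INTO dbo.ConcurrencyDemo (ConcurrencyDemoId, Value) VALUES (1, 0);
GO

-- Scenario 1, connection A: update, then leave the transaction open.
BEGIN TRANSACTION;
UPDATE dbo.ConcurrencyDemo WITH (SNAPSHOT) SET Value = Value + 1 WHERE ConcurrencyDemoId = 1;

-- Scenario 1, connection B: fails at once with error 41302 instead of blocking,
-- even though A has not committed.
UPDATE dbo.ConcurrencyDemo SET Value = Value + 1 WHERE ConcurrencyDemoId = 1;

-- Scenario 1, connection A: finish up.
COMMIT TRANSACTION;

-- Scenario 2, connection A: read under REPEATABLE READ, leave the transaction open.
BEGIN TRANSACTION;
SELECT Value FROM dbo.ConcurrencyDemo WITH (REPEATABLEREAD) WHERE ConcurrencyDemoId = 1;

-- Scenario 2, connection B: the update succeeds immediately (autocommit).
UPDATE dbo.ConcurrencyDemo SET Value = Value + 1 WHERE ConcurrencyDemoId = 1;

-- Scenario 2, connection A: commit-time validation fails with error 41305, because the
-- row it read changed before it committed. With WITH (SNAPSHOT) instead, it would succeed.
COMMIT TRANSACTION;
```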
  • Slide 26
  • The Difficulty of Data Integrity. With on-disk structures, we used constraints for most issues (uniqueness, foreign keys, simple predicates). With In-Mem code, we have to implement these in stored procedures. Uniqueness on > 1 column set suffers from timing (if N connections are inserting the same data, MVCC will let them). Foreign keys can't reliably be done because in SNAPSHOT isolation, the row may have been deleted while you check, and in higher levels, the transaction will fail if the row has been updated. Check-constraint-style work can be done in stored procedures, for the most part.
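Check-constraint-style validation moved into a natively compiled procedure could look like this sketch (SQL Server 2014; the table, procedure, and rule are hypothetical). The rule is only enforced for callers that go through the procedure.

```sql
CREATE TABLE dbo.OrderLineDemo
(
    OrderLineId int NOT NULL
        CONSTRAINT PKOrderLineDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Quantity int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
CREATE PROCEDURE dbo.OrderLineDemo$Insert
    @OrderLineId int, @Quantity int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- Stands in for CHECK (Quantity > 0), which memory-optimized tables do not support.
    IF @Quantity <= 0
        THROW 50000, N'Quantity must be greater than zero.', 1;

    INSERT INTO dbo.OrderLineDemo (OrderLineId, Quantity)
    VALUES (@OrderLineId, @Quantity);
END;
GO
EXEC dbo.OrderLineDemo$Insert @OrderLineId = 1, @Quantity = 5;
```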
  • Slide 27
  • Problem: How to Implement Uniqueness on > 1 Column Set: INDEXED VIEW?
        CREATE VIEW Customers.Customers$UniquenessEnforcement
        WITH SCHEMABINDING
        AS
        SELECT customerId, emailAddress, customerNumber
        FROM customers.Customer
        GO
        CREATE UNIQUE CLUSTERED INDEX emailAddress
            ON Customers.Customers$UniquenessEnforcement (emailAddress)
        GO
    Result:
        Msg 10794, Level 16, State 12, Line 8
        The operation 'CREATE INDEX' is not supported with memory optimized tables.
  • Slide 28
  • Problem: How to Implement Uniqueness on > 1 Column Set: Multiple Tables? Wow, that seems messy. And what about duplicate customerId values in the two subordinate tables?
  • Slide 29
  • Problem: How to Implement Uniqueness on > 1 Column Set: Simple Code. You can't, exactly. But what if EVERY caller has to go through the following block:
        DECLARE @CustomerId int;
        SELECT @CustomerId = CustomerId
        FROM   Customers.Customer
        WHERE  EmailAddress = @EmailAddress;
        IF @CustomerId IS NULL
            -- do your insert here
    This will stop MOST duplication, but not all. Two inserters can check at the same time, and with no blocks, app locks, or constraints even available, you may get duplicates. Remember the term: Optimistic Concurrency Control.
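Packaged as a natively compiled procedure, the check-then-insert block might look like the sketch below (SQL Server 2014; dbo.CustomerEmailDemo and its procedure are hypothetical stand-ins for Customers.Customer). Running the check and the insert in one atomic block narrows the window, but under SNAPSHOT isolation two concurrent callers can still both find no match, so this reduces duplicates rather than guaranteeing their absence.

```sql
CREATE TABLE dbo.CustomerEmailDemo
(
    CustomerId int NOT NULL
        CONSTRAINT PKCustomerEmailDemo PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    EmailAddress nvarchar(100) COLLATE Latin1_General_100_BIN2 NOT NULL,
    -- Non-unique hash index so the duplicate check is a point lookup.
    INDEX EmailAddressHash HASH (EmailAddress) WITH (BUCKET_COUNT = 1048576)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
CREATE PROCEDURE dbo.CustomerEmailDemo$Insert
    @CustomerId int, @EmailAddress nvarchar(100)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    DECLARE @ExistingId int;                 -- stays NULL when no match is found
    SELECT @ExistingId = CustomerId
    FROM   dbo.CustomerEmailDemo
    WHERE  EmailAddress = @EmailAddress;

    IF @ExistingId IS NULL
    BEGIN
        INSERT INTO dbo.CustomerEmailDemo (CustomerId, EmailAddress)
        VALUES (@CustomerId, @EmailAddress);
    END
    ELSE
    BEGIN
        THROW 50001, N'EmailAddress is already in use.', 1;
    END;
END;
GO
EXEC dbo.CustomerEmailDemo$Insert @CustomerId = 1, @EmailAddress = N'someone@example.com';
```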
  • Slide 30
  • When Should You Make Tables In-Memory - Microsoft's Advice (from http://msdn.microsoft.com/en-us/library/dn133186.aspx). Implementation scenarios, and the benefits of In-Memory OLTP for each:
      1. High data insertion rate from multiple concurrent connections: primarily append-only store; unable to keep up with the insert workload.
         Benefits: eliminate contention; reduce logging.
      2. Read performance and scale with periodic batch inserts and updates: high performance read operations, especially when each server request has multiple read operations to perform; unable to meet scale-up requirements.
         Benefits: eliminate contention when new data arrives; lower latency data retrieval; minimize code execution time.
      3. Intensive business logic processing in the database server: insert, update, and delete workload; intensive computation inside stored procedures; read and write contention.
         Benefits: eliminate contention; minimize code execution time for reduced latency and improved throughput.
      4. Low latency: require low latency business transactions which typical database solutions cannot achieve.
         Benefits: eliminate contention; minimize code execution time; low latency code execution; efficient data retrieval.
      5. Session state management: frequent insert, update and point lookups; high scale load from numerous stateless web servers.
         Benefits: eliminate contention; efficient data retrieval; optional IO reduction or removal, when using non-durable tables.
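The session state scenario is where the non-durable option pays off. A minimal sketch, assuming SQL Server 2014 and a hypothetical dbo.SessionStateDemo table: SCHEMA_ONLY means no data is logged or checkpointed, and the rows vanish on restart, which is acceptable for session data. The 20-minute expiry is an arbitrary example value.

```sql
CREATE TABLE dbo.SessionStateDemo
(
    -- Point lookups by session id suit a hash index.
    SessionId uniqueidentifier NOT NULL
        CONSTRAINT PKSessionStateDemo PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 1048576),
    SessionData  varbinary(7000) NOT NULL,   -- keeps the row under the 8060-byte limit
    LastAccessed datetime2(3)    NOT NULL,
    -- Range index so expiring old sessions is a range search, not a full scan.
    INDEX LastAccessedRange NONCLUSTERED (LastAccessed)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO
-- Periodic cleanup of sessions idle for more than 20 minutes (autocommit interop).
DELETE FROM dbo.SessionStateDemo
WHERE  LastAccessed < DATEADD(MINUTE, -20, SYSDATETIME());
GO
```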
  • Slide 31
  • When Should You Make Tables In-Memory - Louis's Advice. More or less the same as Microsoft's, really (duh!). Things to factor in: high concurrency needs/low chance of collisions; minimal uniqueness protection requirements; minimal data integrity concerns (minimal key updates/deletes); limited searching of data (binary comparisons only); limited need for transaction isolation/short transactions. Basically, the hot tables in a strict OLTP workload...
  • Slide 32
  • The Choices I Made. Louis has improved his methods for estimating performance, but your mileage will still vary. Louis's tests are designed to reflect only certain usage conditions and user behavior, and several factors may affect your mileage significantly: how & where you put your logs; computer condition & maintenance; CPU variations; programmer coding variations; hard disk break-in. Therefore, Louis's performance ratings are a minimally useful tool for comparing the performance of different strategies, but may not accurately predict the average performance you will get. I seriously suggest you test the heck out of the technologies yourself, using my code, your code, and anyone else's code you can, to make sure you are getting the best performance possible.
  • Slide 33
  • Model Choices: Logical Model
  • Slide 34
  • Model Choices: Physical Model
  • Slide 35
  • Model Choices: Tables to Make In-Mem (First Try)
  • Slide 36
  • Model Choices: Tables to Make In-Mem (Final)
  • Slide 37
  • The Grand Illusion (So You Think Your Life Is Complete Confusion). Performance gains are not exactly what you may expect, even when they are massive. In my examples (which you have seen), this is what I found when loading 20,000 rows (10 connections of 2,000 rows each), captured using Adam Machanic's SQLQueryStress tool (http://www.datamanipulation.net/SQLQueryStress/):
      A. On-Disk tables with FK, INSTEAD OF trigger: 0.0472 seconds per row; total time 1:12
      B. On-Disk tables withOUT FK, INSTEAD OF trigger: 0.0271 seconds per row; total time 0:51
      C. In-Mem tables using Interop code: 0.0202 seconds per row; total time 0:44
      D. In-Mem tables with Native code: 0.0050 seconds per row; total time 0:31
      E. In-Mem tables, Native code, SCHEMA_ONLY: 0.0003 seconds per row; total time 0:30
      F. In-Mem tables (not CustomerAddress), Hybrid code: 0.0163 seconds per row; total time 0:55
    But should it be a lot better? Don't forget the overhead... (and SQLQueryStress adds extra overhead for gathering stats).
  • Slide 38
  • Contact info: Louis Davidson. Website: http://drsql.org
  • Slide 39
  • Demo: As Much Code Review As We Have Time For!