© 2012 Microsoft SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN Michael Rys, Microsoft Corp....
-
Upload
florence-shields -
Category
Documents
-
view
217 -
download
1
Transcript of © 2012 Microsoft SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN Michael Rys, Microsoft Corp....
© 2012 Microsoft
SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN
Michael Rys, Microsoft Corp.@SQLServerMike
Strata 2012 Conference, March 2012
AGENDA
• Scaling out your business is important!• NoSQL Paradigms and NoSQL Platforms• SQL learns from NoSQL (with a demo of SQL Azure Federations)
• NoSQL learns from SQL• Scalable Data Processing Platform of the Future
Attract Individual Consumers:- Provide
interesting service
- Provide mobility
- Provide social
Monetize Individual:- Upsell service
- VIP- Speed- Extra
Capabilities
Monetize the Social:- Improve individual
experience- Re-sell Aggregate
Data (e.g., Advertisers)
THE WEB 2.0 BUSINESS ARCHITECTURE
SOCIAL NETWORKING: THE BUSINESS PROBLEM• 100s of million of users
• 10s of million of users concurrently
• Terabytes to petabytes of data• Structured and unstructured
• Required (eventual) data consistency across users• E.g. show your updated state in your friends’ profile pages
SOLUTION• Shard/Partition user data across
hundreds to thousands of SQL Databases• Propagate data changes from one DB to
other DBs using reliable, async Message Service• Managing routes from each DB to every other
DB would be too complex• Global Transactions would hinder scale and
availability
• Provide a caching layer for performance• And also used for
oClean-up state (e.g. on account close)
oDeploy business logic (stored procedures)
EXAMPLE ARCHITECTURE
1-1000
3001-4000
2001-3000 1001-
2000
4001-5000
5001-6000
I change my status
userId=1024
Web TierData Tier
ServiceDispatch
er
AsyncMessage
AsyncMessage
TX2
My DBgets updatedTX
1AsyncMessage
TX3
TX4
TX5
MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS• Patterns
• Sharding and reliable messaging• Sharding and fan/out query layer• Caching layer
• Customer Examples• Social Networking: Facebook, MySpace, etc• Online electronic stores (cannot give names )• Travel reservation systems (e.g. Choice International)• MSN Casual Gaming• etc.
LESSONS LEARNED FROM THESE SCENARIOS• Require high availability• Be able to scale out:
• Functional and Data Partitioning Architecture• Provide scale-out processing:
o Function shippingo Fanout and Map/Reduce processing
• Be able to deal with failures:o Quorumo Retrieso Eventual Consistency (similar to Read-consistent Snapshot Isolation)
• Be able to quickly grow and change:• Elastic scale• Flexible, open schema• Multi-version schema support
Move better support for these patterns into the Data Platform!
WHAT IS NOSQL ABOUT?• NoSQL = operational and developer agility at low CapEx and OpEx!
• Low Cost• Free Open Source Stores• Scale CapEx cost below customer growth rate• Web friendly developer model and tool chain
• Processing Paradigms• High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency)• Scale-out (Sharding, Map-Reduce, Elasticity)• Performance (tuned for specific workloads, Caching, co-located compute with partitioned state)• Tunable/Eventual Consistency
• Data Model Paradigms• Data first: Flexible Schema• Low-impedance mismatch between programming and data model:
o Key-Documents and Objects (BLOBS, JSON, XML, POJO)o Key-Wide Sparse Column Setso Graphs (e.g., RDF)
• Range from devices, over OLTP Web 2.0 applications to BigData Analytics
DATA MODELSData Model Example Stores (apologies to the ones I did not
list)
Simple Key-Value Pairs Memcache, Redis, Dynamo, Voldermort, LevelDB, Azure Caching
Wide Sparse Column Sets HyperTable, Big Table, Cassandra, HBASE, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
BLOBs Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
JSON Documents MongoDB, CouchBase, Riak, RavenDB
Graph Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
Objects and XML Documents
Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
Extended Relational Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure
WHAT CAN SQL LEARN FROM NOSQL?
• Low CapEx, Low OpEx
• Built-in tunable High-Availability
• Data scale-out (Sharding)
• Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
• Flexible Data Models• JSON (& XML) support
• Sparse columns/Column sets
• Integrate with BigData Analytics (e.g., Hadoop)
Many Relational Database Systems are incorporating these learning!
EXAMPLE: SQL AZURE FEDERATIONS
• Provides Data Partitioning/Sharding at the Data Platform• Enables applications to build elastic scale-out applications• Provides non-blocking SPLIT/DROP for shards (MERGE to
come later)• Auto-connect to right shard based on sharding keyvalue• Provides SPLIT resilient query mode
SQL AZURE FEDERATION CONCEPTS
16
Federation “Orders_Fed”
ShardedApplication
Azure DB with Federation Root
Federation Directories, Federation Users, Federation Distributions, …
Federation- Represents the data being sharded
Federation Root- Database that logically houses federations,
contains federation meta data Federation Key
- Value that determines the routing of a piece of data (defines a Federation Distribution)
Atomic Unit- All rows with the same federation key value:
always together! Federation Member (aka Shard)
- A physical container for a set of federated tables for a specific key range and reference tables
Federated Table- Table that contains only atomic units for the
member’s key range Reference Table
- Non-sharded table
Member: PK [min, 100)
Member: PK [100, 488)
Member: PK [488, max)
(Federation Key: CustomerID)
AUPK=
5
AUPK=25
AUPK=35
AUPK=105
AUPK=235
AUPK=365
AUPK=555
AUPK=254
5
AUPK=356
5C
on
nectio
n
Gate
way
17
DEMOMAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS
• Sharded GamesInfo table using SQL Azure
Federations
• Use a C# library that does implement a
Map/Reduce processor on top SQL Azure
Federations
• Mapper and Reducer are specified using SQL
WHAT CAN NOSQL LEARN FROM SQL?
• Flexible data is good, but:• Provide optional schema in data platform to help with constraints and optimizations
• Procedural Scale-Out processing is good, but:• Develop a declarative language suited for and across the data models (e.g., coSQL)
• Standardize suitable abstractions and languages
• Eventual Consistency is good, but:• Provide users the choice
• Simple Queries are good, but:• Provide me with secondary indexes
• it will be more efficient to join between two collections of JSON documents in the query engine than in the Application layer
Many NoSQL Database Systems are starting to incorporate these learnings!
Attract Individual Consumers:- Provide
interesting service
- Provide mobility
- Provide social
Monetize Individual:- Upsell service
- VIP- Speed- Extra
Capabilities
Monetize the Social:- Improve individual
experience- Re-sell Aggregate
Data (e.g., Advertisers)
THE WEB 2.0 BUSINESS ARCHITECTURE
OLTP Workloads
Highly AvailableHigh ScaleHigh Flexibility
mostly touching 1 to low number of shards
Dynamic OLAP Workloads
3Vs (Volume, Velocity, Variety) Exploratory
Scale-out queries, often using eventual consistent scale-out frameworks like Hadoop
SCALE-OUT DATA PLATFORM ARCHITECTURE
Traditional OLAP Workloadsknown schemaData warehouse, “Star joins”
Copy
Query
SQL or NoSQL Store
22
CALL TO ACTION
• Familiarize yourself with the NoSQL genes in the Microsoft Online Platform• Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com
• Engage with us throughout Strata
• Download slides with additional information and related resources: http://.../....
Presentation Speaker Date and TimeDo We Have the Tools We Need to Navigate the New World of Data?
Dave Campbell 2/29 9:00am PST
Onsite Interview *Tim O’Reilly, Dave
Campbell2/29 10:15am PST
Unleash Insights on All Data With Microsoft Big Data
Alexander Stojanovic 2/29 11:30am PSTOffice Hours (Q&A session) Dave Campbell 2/29 1:30pm PSTHadoop + Javascript: What We Learned
Asad Khan 2/29 2:20pm PSTDemocratizing BI at Microsoft: 40,000 Users and Counting
Kirkland Barrett 3/1 10:40am PSTData Marketplaces For Your Extended Enterprise
Piyush Lumba 3/1 2:20pm PST
RELATED RESOURCES• Scale-Out with SQL Databases
• http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale / • Windows Gaming Experience Case Study: http://
www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310 • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql• http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations
• NoSQL and the Windows Azure Platform• Whitepaper:
http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
• SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
• Contact me• @SQLServerMike• http://sqlblog.com/blogs/michael_rys/default.aspx