Survey of the Microsoft Azure Data Landscape
![Page 1: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/1.jpg)
Survey of the Microsoft Azure
Data Landscape
Ike Ellis, Partner, Crafting Bytes
![Page 2: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/2.jpg)
2
Please silence cell phones
![Page 3: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/3.jpg)
3
Explore Everything PASS Has to Offer
FREE ONLINE WEBINAR EVENTS
FREE 1-DAY LOCAL TRAINING EVENTS
LOCAL USER GROUPS AROUND THE WORLD
ONLINE SPECIAL INTEREST USER GROUPS
BUSINESS ANALYTICS TRAINING
VOLUNTEERING OPPORTUNITIES
PASS COMMUNITY NEWSLETTER
BA INSIGHTS NEWSLETTER
FREE ONLINE RESOURCES
![Page 4: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/4.jpg)
Agenda
• Azure Blob Storage
• Azure Table Storage
• Azure DocumentDB
• Azure SQL Database
• Azure SQL in a VM
• Azure SQL Data Warehouse
• Azure Data Lake
• Lots of other things supported: Postgres, MySQL, MongoDB, Redis
4
![Page 5: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/5.jpg)
Topic Agenda
• What is it?
• How is it used?
• What are the competitors?
• DEMO!
5
![Page 6: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/6.jpg)
6
Azure Blob Storage
![Page 7: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/7.jpg)
7
Azure Blob Storage
• Blobs are files (PDFs, JPGs, DOCs, etc.)
• Highly durable, massively scalable
• More than 40 trillion stored objects
• 3.5+ million requests/second
• Exposed via REST APIs
• Use them in .NET, C++, Java, Node.js, Android…
• AzCopy, PowerShell
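Because blobs are exposed via REST, each one is addressable by a plain HTTPS URL. The following is a minimal sketch of how such a URL is put together, assuming the standard public endpoint pattern `https://<account>.blob.core.windows.net/<container>/<blob>`. The account, container, and blob names are hypothetical, and the helper `blob_url` is illustrative rather than part of any SDK.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS address of a blob (illustrative helper, not an SDK call)."""
    # Blob names may contain '/' to mimic folders, so leave it unescaped
    # and percent-encode everything else that is unsafe in a URL path.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Hypothetical names, used only for illustration.
print(blob_url("myaccount", "invoices", "2015/invoice 42.pdf"))
```

A private blob would additionally need an authorization header or a shared access signature on the request; that part is left out of this sketch.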
![Page 8: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/8.jpg)
8
Blob Storage Fault Tolerance & Scalability
![Page 9: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/9.jpg)
9
What kind of blobs can I have?
• Share files with clients
• Off-load static content from web servers (invoices, contracts, resumes)
• Azure Websites – Platform as a Service – no files on a webserver
• SQL BAK files
• VM hard drives
![Page 10: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/10.jpg)
10
Competitors?
• On-premises SANs and arrays
• Amazon S3
![Page 11: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/11.jpg)
Azure Blob Storage Demo
![Page 12: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/12.jpg)
12
Azure Table Storage
• Much of it similar to Azure Blob Storage
• Same scalability & redundancy
• Affordable price
• Very, very fast
• NoSQL key-value pair solution
• Quick data retrieval, little configuration
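To make the key-value idea concrete, here is a small in-memory sketch of how Table Storage addresses data: every entity is identified by a `PartitionKey` plus a `RowKey`, and entities in the same table need not share the same properties. The class `ToyTable` and the sample entities are hypothetical stand-ins written for this illustration; they are not the Azure SDK.

```python
from typing import Dict, List, Optional, Tuple

Entity = Dict[str, object]


class ToyTable:
    """In-memory stand-in for a table keyed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Entity] = {}

    def insert(self, entity: Entity) -> None:
        # The two keys together form the unique identity of an entity.
        key = (str(entity["PartitionKey"]), str(entity["RowKey"]))
        self._rows[key] = entity

    def get(self, partition_key: str, row_key: str) -> Optional[Entity]:
        # A point lookup by both keys is the fast path.
        return self._rows.get((partition_key, row_key))

    def query_partition(self, partition_key: str) -> List[Entity]:
        # Return all entities in one partition, ordered by RowKey.
        keys = sorted(k for k in self._rows if k[0] == partition_key)
        return [self._rows[k] for k in keys]


table = ToyTable()
# Entities are schemaless: the second one carries an extra property.
table.insert({"PartitionKey": "customers-WA", "RowKey": "001", "name": "Thomas"})
table.insert({"PartitionKey": "customers-WA", "RowKey": "002", "name": "Ann", "vip": True})
table.insert({"PartitionKey": "customers-CA", "RowKey": "001", "name": "Bob"})

print(table.get("customers-WA", "002"))
print(len(table.query_partition("customers-WA")))
```

Choosing the partition key is the main design decision: lookups that supply both keys are the cheapest, and queries scoped to one partition come next.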
![Page 13: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/13.jpg)
13
![Page 14: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/14.jpg)
14
![Page 15: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/15.jpg)
15
Competitors
• Amazon DynamoDB
![Page 16: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/16.jpg)
Azure Table Storage Demo
![Page 17: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/17.jpg)
JSON Document
• Standard for passing data between a server and a web application
• Replacement for XML
• Hierarchical
• Terse
• Simple data types
![Page 18: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/18.jpg)
Modeling in DocumentDB

    {
      "id": "1",
      "firstName": "Thomas",
      "lastName": "Andersen",
      "addresses": [
        {
          "line1": "100 Some Street",
          "line2": "Unit 1",
          "city": "Seattle",
          "state": "WA",
          "zip": 98012
        }
      ],
      "contactDetails": [
        {"email": "[email protected]"},
        {"phone": "+1 555 555-5555", "extension": 5555}
      ]
    }

Reading is one operation
Writing is one operation
No assembly or disassembly of the object across tables
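The point about single-operation reads and writes can be sketched with Python's standard `json` module: the whole person, including nested addresses and contact details, travels as one document. The email value below is a hypothetical placeholder, and the snippet only parses and serializes locally; it does not call DocumentDB.

```python
import json

# One self-contained document; the email address is a placeholder.
raw = """
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    {"line1": "100 Some Street", "line2": "Unit 1",
     "city": "Seattle", "state": "WA", "zip": 98012}
  ],
  "contactDetails": [
    {"email": "thomas@example.com"},
    {"phone": "+1 555 555-5555", "extension": 5555}
  ]
}
"""

# Reading: one parse yields the full object graph, no joins needed.
person = json.loads(raw)
print(person["addresses"][0]["city"])

# Writing: one serialization call produces the whole document again.
serialized = json.dumps(person)
print(len(person["contactDetails"]))
```

In a relational model the same data would typically span a person table, an address table, and a contact table that have to be joined on read and split on write.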
![Page 19: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/19.jpg)
Query Playground
http://www.documentdb.com/sql/demo
![Page 20: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/20.jpg)
20
Competitors
• MongoDB
• Amazon DynamoDB
![Page 21: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/21.jpg)
22
![Page 22: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/22.jpg)
Azure SQL Database
• Platform as a Service
• All data is backed up for you
• Point-in-time restore
• Can be geo-redundant
• Scalable both in performance and in data size
• Up to 1 TB
• Not feature complete compared with SQL Server in a VM
![Page 23: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/23.jpg)
24
Database Replicas
[Diagram: a single logical database (DB) backed by multiple physical replicas (Replica 1, Replica 2, Replica 3), with a single primary.]
![Page 24: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/24.jpg)
25
Azure SQL Database Unsupported Features
https://azure.microsoft.com/en-us/documentation/articles/sql-database-transact-sql-information/
![Page 25: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/25.jpg)
26
You can also make it scale up!
![Page 26: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/26.jpg)
27
Competitors
• Amazon RDS
![Page 27: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/27.jpg)
Azure SQL Database Demo
![Page 28: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/28.jpg)
29
Azure SQL Server in a VM
• You manage backups
• You create fault tolerant options
• You manage disk space
• You manage patching
• You don’t manage hardware failure
• You don’t manage purchasing hardware
• You don’t manage networking infrastructure
![Page 29: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/29.jpg)
30
Performance Considerations
• Use Premium Storage.
• Use a VM size of DS3 or higher for SQL Enterprise edition and DS2 or higher for SQL Standard edition.
• Use a minimum of 2 P30 disks (1 for log files; 1 for data files and TempDB).
• Keep the storage account and SQL Server VM in the same region.
• Disable Azure geo-redundant storage (geo-replication) on the storage account.
• Avoid using operating system or temporary disks for database storage or logging.
![Page 30: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/30.jpg)
31
Backups & Fault Tolerance
• Back up to Azure Blob Storage
• Use AlwaysOn Availability Groups and Windows Server Failover Clustering (WSFC) for fault tolerance
• Can use mirroring or log shipping, too
• Can also mix in on-premises servers
![Page 31: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/31.jpg)
32
Competitors
• Amazon EC2 – VMs in the cloud
![Page 32: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/32.jpg)
33
Azure SQL Data Warehouse
• Elastic Massively Parallel Processing (MPP) system
• Use T-SQL to query across relational and non-relational data
• Up to petabyte volumes of data
• Scale compute separately from data
• When paused, you only pay for storage
• Deploys in seconds
![Page 33: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/33.jpg)
34
Azure SQL Data Warehouse
• Supports 32 concurrent queries
• Used for fanning out queries over multiple machines for processing/aggregation/analytics
• Performance becomes far more predictable than with just straight SQL Server
• Not used in OLTP environments
![Page 34: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/34.jpg)
35
What is a DWU (Data Warehouse Unit)?
• A unit of scale that determines how much compute hardware is allocated, and therefore how well queries perform
• Done in increments of 100 (mostly)
• How many DWUs?
• Start small
• Monitor
• Change as needed; it’s instant
ALTER DATABASE MySQLDW MODIFY (SERVICE_OBJECTIVE = 'DW1000') ;
![Page 35: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/35.jpg)
36
Partitioning Data
Two choices:
• Distribute data based on hashing values from a single column
  • Good if clusters of tables will be joined and are related
• Distribute data evenly but randomly (round-robin)
  • Fail-safe method
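The two distribution choices can be compared with a small simulation. This is a sketch under stated assumptions: it spreads rows over 60 distributions (the number SQL Data Warehouse uses), but the MD5-based hash is only a deterministic stand-in for the service's internal hash function, and the sample `orders` rows are made up for illustration.

```python
import hashlib

DISTRIBUTIONS = 60  # SQL Data Warehouse spreads each table over 60 distributions


def hash_distribution(key: object) -> int:
    """Map a column value to a distribution (MD5 is a stand-in hash)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def round_robin(rows: list) -> list:
    """Assign rows to distributions in turn, ignoring their content."""
    return [(i % DISTRIBUTIONS, row) for i, row in enumerate(rows)]


# Hypothetical fact rows keyed by customer.
orders = [{"order_id": o, "customer_id": c}
          for o, c in enumerate([7, 7, 42, 7, 42, 99])]

# Hash distribution: equal keys always land in the same distribution,
# so a join on customer_id needs no data movement between nodes.
hashed = [(hash_distribution(r["customer_id"]), r) for r in orders]
homes_of_7 = {d for d, r in hashed if r["customer_id"] == 7}
print(len(homes_of_7))

# Round-robin: perfectly even, but related rows are scattered.
spread = [d for d, _ in round_robin(orders)]
print(spread)
```

Hash distribution pays off when related tables share the distribution column, while a skewed column can overload one distribution; round-robin avoids skew at the cost of moving data at join time.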
![Page 36: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/36.jpg)
37
![Page 37: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/37.jpg)
38
Non-supported data types
• geometry, use a varbinary type
• geography, use a varbinary type
• hierarchyid, CLR type not native
• image, text, ntext when text based use varchar/nvarchar (smaller the better)
• nvarchar(max), use varchar(4000) or smaller for better performance
• numeric, use decimal
• sql_variant, split column into several strongly typed columns
• sysname, use nvarchar(128)
• table, convert to temporary tables
• timestamp, re-work code to use datetime2 and CURRENT_TIMESTAMP function
• varchar(max), use varchar(8000) or smaller for better performance
• uniqueidentifier, use varbinary(8)
• user defined types, convert back to their native types where possible
• xml, use a varchar(8000) or smaller for better performance - split across columns if needed
![Page 38: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/38.jpg)
39
Unsupported Features
• primary keys
• foreign keys
• check constraints
• unique constraints
• unique indexes
• computed columns
• sparse columns
• user-defined types
• indexed views
• identities
• sequences
• triggers
• synonyms
![Page 39: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/39.jpg)
40
Competitors
• Amazon Redshift
![Page 40: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/40.jpg)
Azure SQL Data Warehouse Demo
![Page 41: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/41.jpg)
42
Azure Data Lake
• HDFS for the cloud
• Can use tools like Spark, Storm, Flume, Sqoop, Kafka, etc.
• No fixed limits on account size or file size
![Page 42: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/42.jpg)
43
What is a generic data lake?
• An enterprise-wide repository of every type of data, collected in a single place
• Data is stored prior to any formal definition of requirements or schema, so every type of data can be kept without discrimination. Organizations can then use Hadoop or advanced analytics to find patterns in the data.
• Serves as a repository for lower-cost data preparation prior to moving curated data into a data warehouse.
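Storing data before any schema is defined means the schema is applied when the data is read. The sketch below illustrates that idea with a plain dictionary standing in for the lake; the file names, contents, and the two reader functions are all hypothetical, and nothing here uses Azure Data Lake itself.

```python
import csv
import io
import json

# The "lake": raw files kept exactly as they arrived, in mixed formats.
raw_files = {
    "clicks.csv": "user,page\nann,/home\nbob,/pricing\n",
    "signups.json": '{"user": "ann", "plan": "free"}\n{"user": "cy", "plan": "pro"}\n',
}


def read_clicks(text: str) -> list:
    # Schema is imposed at read time: treat the text as CSV with a header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_signups(text: str) -> list:
    # Schema is imposed at read time: one JSON object per line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


clicks = read_clicks(raw_files["clicks.csv"])
signups = read_signups(raw_files["signups.json"])

# Data preparation: keep only clicks from users who signed up. A curated
# result like this is what would be moved on into a data warehouse.
signed_up = {s["user"] for s in signups}
curated = [c for c in clicks if c["user"] in signed_up]
print(curated)
```

The raw files stay untouched, so a different question later can apply a different schema to the same data.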
![Page 43: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/43.jpg)
44
Products
• Azure Data Lake Store – built on HDFS
• Azure Data Lake Analytics – built on YARN; introduces U-SQL
![Page 44: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/44.jpg)
45
Competitors
• A lot of Hadoop implementations, but nothing really quite like it
![Page 45: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/45.jpg)
46
More data options…
• MongoDB
• PostgreSQL
• Redis
• MySQL
• Oracle
![Page 46: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/46.jpg)
47
Session Evaluations
ways to access
Go to passSummit.com
Download the GuideBook App and search: PASS Summit 2015
Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
Submit by 5pm Friday November 6th to WIN prizes
Your feedback is important and valuable.
![Page 47: Survey of the Microsoft Azure Data Landscape](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f0e9551a28ab305c8b45f7/html5/thumbnails/47.jpg)
Ike Ellis
Crafting Bytes
• Small San Diego Software Studio
• Modern web, mobile, Azure, SQL Server
• Looking for future teammates!
Book: Developing Azure Solutions
Podcast Guest: Talk Python to Me – Dec 2015; .NET Rocks – Sept 2015
• www.craftingbytes.com
• blog.ikeellis.com
• www.ikeellis.com
• SDTIG – www.sdtig.com
Ike Ellis, MVP@[email protected]