04-DocumentDB-SQL

Post on 23-Dec-2015

Cloud Application Development

STORE DATA IN AZURE

s.l. dr. ing. Daniel Iercan

DocumentDB

What is DocumentDB?

schema free

+

non-trivial queries

+

transactional processing

What is DocumentDB?

•NoSQL document database

•JavaScript and JSON

•Rich query (SQL-like) and transactions over schema-free JSON data

• JSON documents are indexed automatically

• can be queried

What is DocumentDB?

•Reliable and configurable performance

• SSD storage

• Tune and trade off consistency: strong, bounded staleness, session, eventual

•Data automatically replicated

•RESTful API

What is DocumentDB?

•Fully managed by Azure

•Elastically scale throughput and storage (database units)

•Open by design (JavaScript, JSON, RESTful HTTP)

Demo

configure DocumentDB in Azure portal

DocumentDB resources

Database account and administrative quota

•100 databases

•500,000 users

•2,000,000 permissions

Elastic collections

•One database can contain any number of collections (limited by the number of capacity units bought)

•Transaction domain for documents

•Scope for document storage and query execution

What is a Capacity Unit (CU)

•a way to buy resources (CPU, RAM, IO, storage)

•1 CU = 3 elastic collections, 10 GB of SSD, 2000 request units

What is a request unit (RU)

• request unit = CPU + memory + IO

•measured as rate per second

What is a request unit (RU)

document size 1 KB, 10 properties, session consistency, all documents indexed

DATABASE OPERATION | NUMBER OF OPERATIONS PER SECOND PER CU
Reading a single document | 2000
Inserting/Replacing/Deleting a single document | 500
Querying a collection with a simple predicate and returning a single document | 1000
Stored procedure with 50 document inserts | 20
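The figures above imply a rough per-operation cost: if one CU provides 2000 request units per second, an operation that can run N times per second per CU costs about 2000 / N RU. The sketch below works through that arithmetic. The function names and the sample workload are hypothetical, and the derived costs only hold for the document profile stated on the slide (1 KB, 10 properties, session consistency, everything indexed).

```python
import math

RU_PER_CU = 2000  # request units per second provided by one capacity unit (from the slide)

# Operations per second per CU, copied from the table above.
OPS_PER_SECOND_PER_CU = {
    "read": 2000,
    "write": 500,             # insert / replace / delete of a single document
    "query": 1000,            # simple predicate, single document returned
    "sproc_50_inserts": 20,   # stored procedure with 50 document inserts
}


def ru_cost(operation):
    """Approximate RU cost of one operation, derived as RU_PER_CU / ops-per-second."""
    return RU_PER_CU / OPS_PER_SECOND_PER_CU[operation]


def capacity_units_needed(workload):
    """Return the CUs needed for a workload given as {operation: operations per second}."""
    total_ru = sum(ru_cost(op) * rate for op, rate in workload.items())
    return max(1, math.ceil(total_ru / RU_PER_CU))


# Hypothetical workload: 1500 reads, 200 writes and 100 queries per second.
workload = {"read": 1500, "write": 200, "query": 100}
print({op: ru_cost(op) for op in OPS_PER_SECOND_PER_CU})
print(capacity_units_needed(workload))
```

Under these assumptions a read costs about 1 RU, a write about 4 RU and a simple query about 2 RU, so the sample workload needs 2500 RU per second and therefore two capacity units.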

Developing Against Azure DocumentDB

•RESTful API

•.NET (LINQ provider)

•Node.js

•JavaScript

•Python

•Support for CRUD operation and SQL syntax

DEMO

Overview of .NET Client

DocumentDB resource model

Querying

SQL-like syntax (subset of ANSI SQL)

User Defined Functions (JavaScript) can be used in queries

Transactions and JavaScript executions

•Stored procedures (SPs)

•Triggers

•User defined functions (UDFs)

• JavaScript replaces T-SQL

• JavaScript logic executes in ACID transactions (snapshot isolation)

•The entire transaction is aborted if the JavaScript code throws an exception
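The abort-on-exception behaviour can be modelled in a few lines. The sketch below is only a conceptual model in Python, not the DocumentDB API (real stored procedures and triggers are JavaScript running inside the database). The `Collection` class and both script functions are hypothetical; the script works on a private copy of the data, which is committed only if it finishes without raising.

```python
import copy


class Collection:
    """Toy in-memory collection used to model all-or-nothing execution."""

    def __init__(self):
        self.documents = {}

    def run_transaction(self, script):
        """Run script(documents) on a working copy; commit only if it does not raise."""
        working_copy = copy.deepcopy(self.documents)
        script(working_copy)            # an exception here skips the commit below
        self.documents = working_copy   # commit all changes at once


def insert_two(docs):
    docs["a"] = {"id": "a"}
    docs["b"] = {"id": "b"}


def insert_then_fail(docs):
    docs["c"] = {"id": "c"}             # this insert is discarded with the rest
    raise ValueError("simulated script failure")


collection = Collection()
collection.run_transaction(insert_two)
try:
    collection.run_transaction(insert_then_fail)
except ValueError:
    pass  # the whole transaction was aborted

print(sorted(collection.documents))
```

After both runs the collection holds only documents "a" and "b": the insert of "c" happened before the exception, yet it never became visible.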

Demo

- create SP

- create trigger

- create UDF

http://azure.microsoft.com/en-gb/documentation/articles/documentdb-resources/

Documents

• JSON objects

• Free schema

• Stored in collections

• Can be inserted, replaced, deleted, read, enumerated and queried

Attachments and Media

• binary blobs/media

• can be stored in DocumentDB or externally:

• an attachment is a special JSON document that captures the metadata of the media stored in a remote media store

• attachments stored in DocumentDB have a _media property that points to the resource URI

• attachments in DocumentDB are garbage collected automatically

• for media stored externally, the developer has to manage its lifetime

Users

• Logical names for grouping permissions

• Implement multi-tenancy (one user for each actual application user)

• Shard data:

• each user maps to a database

• each user maps to a collection

• all documents for a user are stored in one collection

• documents from different users are stored in various collections

Permissions

• administrative resources vs. application resources

• master key vs. resource key

• master key – access to everything

• resource key – granular access to specific resources

Optimistic concurrency

ETag and If-Match header attributes
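The ETag / If-Match mechanism can be illustrated with a small in-memory model. This is a sketch of the general technique, not the DocumentDB client API: the `Store` class, its method names and the `PreconditionFailed` exception are all hypothetical. Every write produces a new ETag, and a replace succeeds only when the caller presents the ETag of the version it read.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the supplied ETag is stale (HTTP 412 in REST terms)."""


class Store:
    """Toy document store that versions each document with an ETag."""

    def __init__(self):
        self._docs = {}  # document id -> (etag, body)

    def create(self, doc_id, body):
        etag = uuid.uuid4().hex
        self._docs[doc_id] = (etag, dict(body))
        return etag

    def read(self, doc_id):
        return self._docs[doc_id]

    def replace(self, doc_id, body, if_match):
        current_etag, _ = self._docs[doc_id]
        if if_match != current_etag:
            raise PreconditionFailed(doc_id)  # someone else wrote first
        new_etag = uuid.uuid4().hex
        self._docs[doc_id] = (new_etag, dict(body))
        return new_etag


store = Store()
store.create("order-1", {"status": "new"})

# Two writers read the same version of the document.
etag_a, _ = store.read("order-1")
etag_b, _ = store.read("order-1")

store.replace("order-1", {"status": "paid"}, if_match=etag_a)  # first writer wins
try:
    store.replace("order-1", {"status": "cancelled"}, if_match=etag_b)
    conflict = False
except PreconditionFailed:
    conflict = True  # second writer must re-read and retry

print(conflict, store.read("order-1")[1])
```

The second writer is rejected rather than silently overwriting the first update. No lock is held between the read and the write, which is why the approach is called optimistic.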

DocumentDB tuning performance

Configuring Indexing Policy of a Collection

• Choose whether the collection automatically indexes all of the documents or not

• Choose whether to include or exclude specific paths or patterns in your documents from the index

• Choose between synchronous (consistent) and asynchronous (lazy) index updates

Configure consistency

trade-offs between consistency, availability and latency

•Strong

•Bounded staleness – total ordering of writes and maximum staleness

•Session – read your own writes

•Eventual

References

http://azure.microsoft.com/en-us/services/documentdb/

http://azure.microsoft.com/en-us/documentation/services/documentdb/

http://azure.microsoft.com/en-us/documentation/articles/documentdb-introduction/

http://azure.microsoft.com/en-gb/documentation/articles/documentdb-resources/

http://azure.microsoft.com/en-gb/documentation/articles/documentdb-interactions-with-resources/

SQL in the Cloud

Need for scalability

As data grows, performance degrades

SQL Partitioning (horizontal scaling)

• Master/Slave: one (master) SQL server for write operations (create, update, delete), and one or more SQL servers for read operations

• master can be a bottleneck

• replication is near-real-time

• master is a single point of failure

• Cluster Computing: multiple servers that act as nodes and use a centralized shared disk facility

• all nodes can be used for reads, only one for writes; if the node that does the writes fails, another one takes its place (the shared disk can be a bottleneck, writes do not scale)

• advanced clustering uses real-time memory replication so that all nodes can do writes (network traffic between nodes can be a bottleneck, in addition to the shared disk)
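The master/slave arrangement described above amounts to a routing decision in the data access layer. The sketch below is one minimal way to express it, assuming statements can be classified by their first keyword; the `ReplicatedRouter` class and the server names are hypothetical.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads over the replicas (round-robin)."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master for reads when no replica is configured.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, statement):
        """Return the server that should execute the given SQL statement."""
        verb = statement.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._replicas)
        return self.master  # INSERT, UPDATE, DELETE, DDL, ...


router = ReplicatedRouter("master", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders"))
print(router.route("select * from customers"))
print(router.route("UPDATE orders SET status = 'paid' WHERE id = 1"))
```

Because replication is only near-real-time, a read routed to a replica immediately after a write may still see the old value. An application that needs to read its own writes would have to send those reads to the master as well.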

SQL Partitioning (horizontal scaling)

• Table Partitioning

• data in single large tables can be split across multiple disks to improve I/O; partitioning can be done both horizontally (by rows) and vertically (by columns); join operations become an issue

• Federated Tables

• tables can be accessed across multiple servers (complex to administer, good for reporting but not for general read/write transactions); the choice of federation key is very important

SQL Partitioning (horizontal scaling)

• Sharding (Shared-Nothing)

• Independent servers (CPU, memory and disk)

• Smaller databases are easier to manage and maintain, faster, and cheaper

• Challenges: reliability, distributed queries, avoidance of cross-shard joins, auto-increment key management, support for multiple shard schemes (session-based, transaction-based, statement-based), determining the optimum method for sharding data (by primary key, by modulus of a key, or by maintaining a master shard index table)
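Two of the sharding methods named above, modulus of a key and a master shard index table, can be sketched as follows. Class names and shard names are hypothetical, and integer keys are assumed for the modulus variant.

```python
class ModulusSharder:
    """Pick the shard as key modulo the number of shards (integer keys assumed)."""

    def __init__(self, shards):
        self.shards = list(shards)

    def shard_for(self, key):
        return self.shards[key % len(self.shards)]


class IndexTableSharder:
    """Master shard index table: an explicit key -> shard mapping."""

    def __init__(self, default_shard):
        self.index = {}
        self.default_shard = default_shard  # used for keys not yet in the table

    def assign(self, key, shard):
        self.index[key] = shard

    def shard_for(self, key):
        return self.index.get(key, self.default_shard)


by_modulus = ModulusSharder(["shard-0", "shard-1", "shard-2"])
print(by_modulus.shard_for(7))

by_index = IndexTableSharder(default_shard="shard-0")
by_index.assign(7, "shard-2")   # e.g. a large customer moved to its own shard
print(by_index.shard_for(7), by_index.shard_for(8))
```

The modulus method needs no lookup, but changing the number of shards remaps most keys. The index table costs an extra lookup and becomes a component that must itself stay available, but it lets individual keys be moved between shards.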

Ways to use SQL in the Cloud

• SQL as a service

• dedicated SQL VM

SQL Azure

• Old – federation

• New – sharding

Multi-tenancy

• a single instance of the software runs on a server, serving multiple tenants

• has to ensure data separation

SQL Azure Database is

Get started quickly

Ready to get started?

Provision Your Server

Server defined

Service head that contains databases

Connect via automatically generated FQDN (xxx.database.windows.net)

Initially contains only a master database

Provision servers interactively

Log on to Windows Azure Management Portal

Create a SQL Azure server

Specify admin login credentials

Add firewall rules and enable service access

Automate server provisioning

Use Windows Azure Platform PowerShell cmdlets (or use REST API directly)

wappowershell.codeplex.com

Build Your Database

Use familiar technologies

Supports Transact-SQL

Supports popular languages

.NET Framework (C#, Visual Basic, F#) via ADO.NET

C / C++ via ODBC

Java via Microsoft JDBC provider

PHP via Microsoft PHP provider

Supports popular frameworks

OData (REST data access)

Entity Framework

WCF Data Services

NHibernate

Supports popular tools

SQL Server Management Studio (2008 R2 and later)

SQL Server command-line utilities (SQLCMD, BCP)

CA Erwin® Data Modeler

Embarcadero Technologies DBArtisan®

Differences in comparison to SQL Server

Focus on logical vs. physical administration

Database and log files automatically placed

Three high-availability replicas maintained for every database

Databases are fully contained

Tables require a clustered index

Maximum database size is 50 GB

Unsupported SQL Server features

BACKUP / RESTORE

USE command, linked servers, distributed transactions, distributed views, distributed queries, four-part names

Service Broker

Common Language Runtime (CLR)

SQL Agent

Database

Thin client database development

Rich client database development

Database

Data-tier Application Framework (DAC Fx)

How to get the latest DAC Fx

Database

Interactive approach for dacpac v1 and v2

Interactive approach for bacpac v2

Upgrading a dacpac or bacpac

Secure Your Database

Server identity and access control

SQL authentication supported

Integrated authentication not supported

Connect to master to administer logins and create / drop databases

The admin login (configured during service provisioning) is like sa

The admin login has full rights on the server (and all databases) and should only be used for administration

Manage logins with CREATE / ALTER / DROP LOGIN commands

Membership in the loginmanager server role grants CREATE / ALTER / DROP LOGIN privileges

Membership in the dbmanager server role grants CREATE / DROP DATABASE privileges

Database identity and access control

Logins must have an associated user account to connect to a database

The admin login is automatically associated with a special user known as dbo (database owner)

The dbo has full rights in the database and should only be used for administration

Manage users with CREATE / ALTER / DROP USER commands

Add users to system or user-defined database roles to grant privileges via sp_addrolemember

Organize database objects into schema containers based upon common access control requirements

Grant privileges to schema containers instead of individual objects for better productivity

Connect Your Application

Connecting to SQL Azure

TDS (Tabular Data Stream) protocol over TCP/IP supported

SSL required

Use firewall rules to connect from outside Microsoft data center

ASP.NET example:

<connectionStrings>
  <add name="AdventureWorks"
       connectionString="Data Source=[server].database.windows.net;Integrated Security=False;Initial Catalog=ProductsDb;User Id=[login];Password=[password];Encrypt=true;"
       providerName="System.Data.SqlClient" />
</connectionStrings>

Special considerations

Legacy tools and providers may require a special format for the login: [login]@[server]

Idle connections are terminated after 30 minutes

Long-running transactions are terminated after 24 hours

DoS guard terminates suspect connections with no error message

Failover events terminate connections

Throttling may cause errors

Use connection pooling and implement retry logic to handle transient failures

Updates incur extra latency because of the HA replicas

No cross-database dependencies; result sets from different databases must be combined in the application tier
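The advice to implement retry logic for transient failures (terminated idle connections, failover events, throttling) can be sketched as a small wrapper with exponential backoff. This is a generic illustration, not a SQL Azure client API: `TransientError`, `with_retry` and `flaky_query` are hypothetical names, and a real application would map the driver's own error codes onto the transient category.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a dropped connection or throttling error."""


def with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff with jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)


calls = {"count": 0}


def flaky_query():
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("connection terminated")
    return "ok"


# The sleep function is replaced here so the example finishes instantly.
result = with_retry(flaky_query, sleep=lambda seconds: None)
print(result, calls["count"])
```

Only errors classified as transient are retried. Anything else, such as a syntax error or a permission failure, propagates immediately, since repeating it would not help.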

Demo

SQL db from laboratory enrol.

NoSQL vs. SQL

Performance

certain types of queries can be slow

for business applications SQL is most likely the better choice

for fetching small pieces of information under high traffic and concurrency, NoSQL is better

Business Intelligence

works best with SQL

NoSQL (wide-column stores) works well with “BIG DATA”

References

http://codefutures.com/database-sharding/

http://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-scale-documentation-map/