SQL Server Warehousing (Fast Track 4.0 & PDW) Stephen Strong, Matthew Winter DBI331.

46
SQL Server Warehousing (Fast Track 4.0 & PDW) Stephen Strong, Matthew Winter DBI331

Transcript of SQL Server Warehousing (Fast Track 4.0 & PDW) Stephen Strong, Matthew Winter DBI331.

SQL Server Warehousing (Fast Track 4.0 & PDW)Stephen Strong, Matthew Winter

DBI331

Agenda

Fast Track Data Warehouse SQL Server 2012The ProblemBalanced ArchitecturePerformance GainsLessons Learnt

SQL Server Parallel Data WarehouseWhat is Parallel Data WarehousePDW Architecture & ComponentsAU3.0 EnhancementsRoadmap

Q & A

The Problem

OLTP – Tune your application (DW less control)Business Case - BudgetTime to value

Solution

Step 1: Buy new shiny SAN arrayStep 2: Buy big serverStep 3: Install SQL Server 2012Step 4: Bulk load 30TB DW with SSISStep 5: Grant access

Users

30TB

SOLUTION COMPLETE

Why that “solution” doesn’t work with DW

Infrastructure CostLarge scans of huge fact tablesAudience Survey – SAN throughput

DATA WAREHOUSING OFFERINGSWHERE FAST TRACK FITS IN

Self Build Business Data

Warehouse

Fast Track Reference

Architectures

Parallel Data Warehouse Half Rack

Parallel Data Warehouse Full Rack

Effort to Build Variable Very low Moderate Very low Very low

Capacity Variable 5TB 14TB – 95TB 80TB 500TB +

Concurrency Variable Light Light – high Very high Very high

Query Complexity

Variable Medium Medium – high Very high Very high

Architecture SMP SMP SMP MPP MPP

Form factor Software only

Appliance Reference Architecture

Appliance Appliance

Fast Track DW Reference Architectures

Software:• SQL Server 2012 Enterprise• Windows Server 2008 R2

Hardware System Design:• Tight specifications for

servers, storage and networking

• Resource Balanced & Validated

• Latest generation Servers & Storage

Database Configuration:• Workload Specific• Database Architecture• SQL Server settings• Windows Server settings• Performance Guidance

Workload specific Software & System designs that help Customers

accelerate their SQL Server 2012 projects.

Key benefits

Still Windows Server & SQL Server for admins & mgmtCan select from a range of HW vendorsEngineered by expert from HW vendor and MicrosoftPre-tested for predictable resultsFaster implementationGreat performance/price ratioLow SQL Server license count

Under the covers – the “secret” sauce

Sequential ReadsParallel reads across multiple disk setsBalanced architectureDifficult to achieve in large server / share SAN environmentDatabase layout matches hardware layoutMatching throughput for each componentLUNs mapped to Storage Processors, network paths, CPUs

Server

Windows Server 2008 R2

SQL Server 2012

HBA

Fib

er

Sw

itch

HBA

Storage Enclosure

Storage Processor 3GB/sec

Storage Processor 3GB/sec

RAID10

RAID10

RAID10

RAID10

Storage Enclosure

Storage Processor 3GB/sec

Storage Processor 3GB/sec

RAID10

RAID10

RAID10

RAID10

Storage Enclosure

Storage Processor 3GB/sec

Storage Processor 3GB/sec

RAID10

RAID10

RAID10

RAID10

CPU

CPU

Storage Enclosure

Storage Processor 3GB/sec

Storage Processor 3GB/sec

RAID10

RAID10

RAID10

RAID10

HBA

HBA

400MB/sec each

1.6GB/secper port

1.6GB/sec each6.4GB/sec total

6.4GB/sec totalPCIe 3.0

2 socket x8 core16-cores total5.8GB/sec total

Performance Enhancements over Fast Track 3.0

Take DL380 FT 3.0 hardware – 20TB solutionSQL Server 2008 R2 – 2.4GB/sec benchmark queriesSQL Server 2012 – 5GB/sec with ColumnStore IndexesNo hardware changesNo Solid State Drives10x – 100x query improvement with ColumnStore

SQL Server Configuration

Trace FlagsDB file placementTempdb: 8-12 files maxIndex light – optimise for sequential scan throughputCapacity is based on Page and ColumnStore compressionSupport for AlwaysOn Failover Clustering

Fast Track Integration

Fast TrackSQL Server

2012

ETL(SSIS)

Source Systems

Source Systems

Source Systems

Cubes(SSAS)

Excel

SSRS

SP2010

PowerView

3rd Party

Lessons Learnt

Use SQLIO for validationAim for consistent MB/sec above published #sTake care with DB file placementOn 30TB+ configuration pay attention to SP-LUN mappingsSimulate FT LUN layout in Dev environmentTake advantage of ColumnStore for extra performanceSet MAXDOP to number of physical coresUse Resource Governor to limit memory/query to 10-15%

Agenda

Fast Track Data Warehouse SQL Server 2012The ProblemBalanced ArchitecturePerformance GainsLessons Learnt

SQL Server Parallel Data WarehouseWhat is Parallel Data WarehousePDW Architecture & ComponentsAU3.0 EnhancementsRoadmap

Q & A

What is Parallel Data WarehouseScalable• Scales from 10’s to 100’s of TB of Data• Implemented as Massively Parallel Processing (MPP)

System

Standards Based• Leverages Commodity Hardware• Speaks SQL Server Language (T-SQL)

Flexible• Offers Hardware of Choice (HP/Dell)• Supports Multiple Hardware Architectures

Cost Effective• Low Price/TB Ratio

SQL Server vs. PDWSQL Server• Symmetric Multiprocessing

(SMP) System• ‘Shared Everything’

Architecture• High Contention for System

Resources (e.g. I/O, CPU)• Extremely High Concurrency

for Some Workloads• Limited Scalability (up to

95TB)• Requires Building High

Availability• Rich Functionality for OLTP

and Data Warehousing

Parallel Data Warehouse• Massively Parallel Processing (MPP)

System• ‘Shared Nothing’ Architecture• Balanced Resource Utilization (e.g.

I/O, CPU, Memory)• Higher Concurrency for Very

Complex Workloads• Excellent Scalability (up to 1PB)• Built-in High Availability and

Redundancy• Tuned for Data Warehousing

Control Rack Data Rack

PDW ArchitectureCompute Nodes Storage Nodes

Spare Compute Node

Du

al

Fib

er

Ch

an

nel

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Du

al

Infi

nib

an

d

Control Nodes

Active / Passive

Landing Zone

Backup Node

SQL

Management Servers

Client Drivers

ETL Load Interface

Corporate Backup Solution

Support / Patching

SQL

SQL

Control Node

Client Drivers SQL

Control Node and Client DriversClient Connections Always Go Through The Control Node

Contains No Persistent User Data

PDW “Secret Sauce”• Processes SQL requests• Prepares execution plan• Orchestrates distributed execution

Local SQL Server• Final Query Plan Processing• Result Aggregation

Drivers • TDS Protocol (SQL Native 10)• Wire Protocol (SeQuel Link)• ODBC, OLE-DB, ADO.NET and JDBC Client Drivers for both 32-bit and 64-bit

Management Node

Support / Patching

Management Node

Runs the Windows Domain Controller (Active Directory)

Used for deploying patches to all nodes in the appliance

Holds images in case a node needs reimaging

Landing Zone

ETL Load Interface

Landing ZoneProvides High-Capacity Storage for Data Files from ETL processes

Integration Services available on the Landing Zone

Connected to Internal Network

Available as Sandbox for other Applications and Scripts that Run on Internal Network

Source Landing Zone Files Data Loader Compute

Nodes

DWLOADERor

SSIS

Backup Node

Corp. Backup Solution

Backup Node

Coordinated Backup across Nodes

Database-Level Backup• Full or Differential• Metadata Backup

Can Restore to a Larger Appliance

Optional Item – 1 Size per Configuration• Up to 654TB of Capacity• Orderable in Different Capacities

“PDW Node”

Storage nodeCompute Node

Compute Node

Each MPP Node is a Highly Tuned SMP Node with Standard Interfaces

Dedicated Hardware, Database, and Storage

Running SQL Server 2008 Enterprise Edition

SQL as Primary Interface

Compute Node

Enterprise-ClassDBMS

TempDBWorkspace

Dual Multi-CoreProcessors

DUAL 4Gb FC Dual InfiniBand

CPU

CPU

RAM

Compute Node Hardware

Pre-Configured for Each SQL Server Instance on Each Compute Node.• Drives Configured as RAID 1 to Avoid Appliance Failover for a Single-Drive Failure• Configured with 3 LUNs (i.e. 3 RAID 1 Pairs)

TempDB Used for the Following Purposes:• Sort-Work Area for Data Loading into Clustered Index Tables• Spill Area for Hash Joins Not Fitting into Memory• Temporary PDW Tables

Storage Node

Dual 8 GB FiberStorage Processor

Dual 8 GB FiberStorage Processor

Data/Log Drives (RAID 1) - LFF

Hot Spare

Dual Fiber Channel Controllers• Active/Active• Provides Fault Tolerance

Storage Node – Drive Array (User Databases)

LUN 9

LUN 9

LUN 9

LUN 9

LUN 9

LUN1 LUN2 LUN3 LUN4 LUN5 LUN6 LUN7 LUN8

Log

User

Data

base

Small Form Factor Drives (SFF) (24) – 8 RAID 1 groups containing a single LUNLarge Form Factor Drives (LFF) (11) – 4 RAID 1 groups containing 2 LUNs each

LUN 9

LUN1 LUN2 LUN3 LUN4 LUN5 LUN6 LUN7 LUN8

Hot Spare

Hot Spare(SFF)

SFF – 6 disk in a RAID 1 + 0 groupLFF – 2 disk in a RAID 1 group

SFF - 2 Hot SparesLFF – 1 Hot Spare

Storage Node – Physical File Layout

LUN1 LUN2 LUN3 LUN4 LUN5 LUN6 LUN7 LUN8

User

Data

base

UDB Replicated FG

UDBRepl.ndf

FG UDB_Dist_A

UDBDist.ndf

UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf

FG UDB_Dist_B

UDBDist.ndf

FG UDB_Dist_C

UDBDist.ndf

FG UDB_Dist_D

UDBDist.ndf

FG UDB_Dist_E

UDBDist.ndf

FG UDB_Dist_F

UDBDist.ndf

FG UDB_Dist_G

UDBDist.ndf

FG UDB_Dist_H

UDBDist.ndf

8 Distribution Filegroups

1 Replicated Filegroup with 8 Files

1 Log Filegroup

LUN 9

LUN 9

LUN 9

LUN 9

LUN 9

Log

LUN 9

PDW Benefits – Massive Parallel Processing

Control Rack Data Rack

Compute Nodes Storage Nodes

Spare Compute Node

Du

al

Fib

er

Ch

an

nel

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Du

al

Infi

nib

an

d

Control Nodes

Active / Passive

Landing Zone

Backup Node

SQL

Management Node

SQL

SQL

Query 1

Query 1 is Submitted to SQL Server via the Control Node.

?????????

Query is Executed on All 10 Nodes.

Results are Sent Back to the Client.

?

Customer SuccessesData Volume • 80 TB data warehouse analyzing data from

exchanges• Existing system based on SQL SMP farm • 2 different clusters of 6 servers each

Requirement • Linear scalability with additional hardware• Support hourly loads with SSIS – 300GB/day• BI Integration: SSRS, SSAS and PowerPivot

AU3 Feedback• SP and increased T-SQL support was great• Migrating SMP SSRS to PDW was painless• 142x for scan heavy queries & no summary tables• Enabled queries that do not run on existing

system

Reports

Dashboards

Scorecards

Portal

ETL

PDWOperationa

l DB’s

CUSTOMER EXAMPLE:Stock Exchange in the US

Customer SuccessesData Volume • Currently 11 TB data warehouse analyzing data from

transactional and clickstream sources (2 years data)• Business need to expand to 7 year data window, i.e. ~39

TB.

Requirement • Scalability - growing data volume does not affect

performance• Performance and ad-hoc analysis for interactive

querying by users• BI Integration with Microsoft BI stack - SSAS and SSRS

AU3 Feedback• SSAS cubes worked ‘out-of-box’• Performance an order of magnitude faster than existing

system (~30x on an expanded data set)

CUSTOMER EXAMPLE:Major Retailer in the

US

PDW

Nielsen OLTP system

SSAS

Click-Stream

Demo

Matthew WinterArchitectPDW & Appliance CoE

Parallel Data WarehouseLoading DataBreaking the Marketing Numbers

What’s New in Appliance Update 3.0 (AU3)

Best-in-class Manageability

10X Performance

More BI Tools,Any Language

Low Entry Cost

New in AU3: Breakthrough Performance

Support More Users with High Concurrency

Breakthrough Performance

Faster Query Performance Drives Greater Productivity

Faster Throughput for Mixed Workloads

Improvement in Data Movement in the appliance

6X

10X

Reduced cost of Upgrading from Scale Up SQL Server to MPP

New in PDW AU3: Best-in-Class Manageability

Best-in-Class Manageability

Reduced operational cost through unified management of appliances and other SQL instances

PDW Integration with

Cost

Order From Your Preferred Vendor

New in PDW AU3: Low Entry Cost

Low Entry Cost

New Lower Cost Half Rack Solution

HP

DELL

Use PDW on Data in Any Language

New in PDW AU3: More BI Tools, Any Language

More BI Tools,Any Language

Greater Support for Non-Microsoft BI & ETL Tools

International Data Support

SASHadoopInformaticaMicrostrategySAP Business Objects

In Review

Session Objectives • Scaling Microsoft SQL Server to manage the trend to analyse and report

on increasing data volumes.

Key Takeaways• The Fast Track Reference Architecture provides a cost-effective

approach to accelerate your data warehouse strategy. Providing a guide to building a reliable, scalable solution that will help you reduce your risk and save time.

• Parallel Data Warehouse is a preconfigured appliance based upon the proven SQL Server database technology. PDW is capable of scaling out as your data volumes grow, reducing the maintenance overhead to a minimum and provides the raw performance an Enterprise Data Warehouse requires.

Q & A

Data Warehousing with SQL Server (Fast Track & PDW)Stephen Strong, Matthew Winter

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to

be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS

PRESENTATION.