Exchange 2013 High Availability and Site Resilience (Session 1/2, Part One)

Note: this session is in English. It is in two parts; this is the first part. The session is presented by Scott Schnoll, Senior Content Developer at Microsoft Corp. and a true Exchange guru. Messaging is a highly critical part of the information system: it MUST NOT go down. To that end, Exchange 2013 includes the very latest fault-tolerance and high-availability technologies. Scott Schnoll will explain the mechanics from the inside! This session gives you access to the state of the art on Exchange. It is THE session to attend to discover how high availability works in Exchange 2013. Speaker: Scott Schnoll (Microsoft)

Infrastructure, communication & collaboration

Exchange Server 2013

High Availability and Site Resilience (1/2)

Scott Schnoll
Senior Content Developer

Microsoft Corporation

[email protected]
http://aka.ms/Schnoll

Twitter: @Schnoll

#mstechdays Infrastructure, communication & collaboration

Agenda – Part 1

• Database Availability Group Internals
• Witness Server

Agenda – Part 2 (16:30-17:15, room 253)

• Dynamic quorum
• DAG member maintenance

DATABASE AVAILABILITY GROUP INTERNALS

DAG Internals

• Microsoft Exchange Replication service
• Active Manager Client component
• Microsoft DAG Management service
• Cluster service and components
• Windows Crimson Channel

DAG Replication Service

• Introduced in Exchange 2007 RTM
  – Microsoft Exchange Replication service | MSExchangeRepl
  – MSExchangeRepl.exe
  – Required on all Mailbox servers (not just DAG members)
  – Communicates with Active Directory and other DAG members

DAG Replication Service

• Includes 16 components
  – Active Directory lookup
  – Copy status lookup
  – Replay core manager
  – Seed manager
  – Autoreseed manager
  – Disk reclaimer manager
  – Replay RPC server wrapper
  – Remote data provider wrapper
  – Active Manager
  – Active Manager RPC server wrapper
  – Failure item manager
  – TPR API manager
  – Support API manager
  – Server locator manager
  – Health state tracker
  – VSS Writer

Active Manager Client Component

• Runs inside client access and transport services
  – Microsoft Exchange RPC Client Access – MSExchangeRPC
  – Microsoft Exchange Transport – MSExchangeTransport
  – Microsoft Exchange Frontend Transport – MSExchangeFrontendTransport
  – Client Access Front End (CAFÉ) components

Active Manager Client Component

• When connecting clients or routing messages, client access and transport services query Active Directory and Active Manager to find the location of the active copy of a mailbox
  – DAG members with Standby Active Manager (SAM) role respond to these queries

DAG Management Service

• Introduced in Exchange 2013 CU2
  – Microsoft Exchange DAG Management service | MSExchangeDagMgmt
  – MSExchangeDagMgmt.exe
  – Runs on all Mailbox servers (not just DAG members)
  – Communicates with Active Directory and other DAG members
• Includes 4 components
  – Active Directory lookup
  – Copy status lookup
  – Monitoring
  – Tracer instance

DAG Management Service

• Created for two primary reasons:
  – so the Replication service can have more focused functionality
  – so Managed Availability actions can kill lower-priority activities
• Logs events in same place as Replication service
• Other functions will move to this service
  – AutoReseed, Disk reclaimer, Dynamic replay lag playdown
  – Future AutoDAG copy layout and mobility features

Cluster service

• Introduced in NT Server Enterprise Edition (1997)
  – Cluster Service | ClusSvc
  – Clussvc.exe
• Exchange DAGs use several cluster components
  – Membership and node management
  – Networks and heartbeating
  – Quorum
  – Cluster registry

Cluster service

• Quorum is required in order to mount databases
• Quorum is based on votes, not membership
• Voting can be rigged
  – Votes can be taken away manually or dynamically
• Exchange manages quorum model, not quorum
  – Exchange management of quorum model based on nodes, not votes
  – Removing votes requires manual configuration of quorum model
  – Exchange will make incorrect quorum model management decisions if votes are manually removed at the cluster level
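
The point that quorum is based on votes rather than membership can be illustrated with a minimal sketch. It assumes a plain strict-majority rule; the function name `has_quorum` and its parameters are hypothetical and are not part of Exchange or the Cluster service, and dynamic quorum is not modeled.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when the online votes form a strict majority.

    Illustrative only: the inputs are vote counts, not member counts.
    """
    if total_votes <= 0 or not 0 <= votes_online <= total_votes:
        raise ValueError("vote counts out of range")
    return votes_online > total_votes // 2


# Four members where one vote was removed leaves three votes in total,
# so two online voters are a majority.
print(has_quorum(votes_online=2, total_votes=3))

# With all four votes intact, the same two voters are not a majority.
print(has_quorum(votes_online=2, total_votes=4))
```

The same two surviving members keep or lose quorum depending only on how many votes exist, which is why removing votes behind Exchange's back changes the outcome.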

Cluster registry

• Active Manager stores database / server information in the cluster registry for DAG members
  – Registry changes are replicated immediately to all DAG members
• Stored information is used as part of Best Copy and Server Selection (BCSS)

Cluster registry

• ActiveServer
  – Name of server where database is currently mounted or expected to be mounted when mount operation completes
• LastMountServer
  – Name of server where database was last successfully mounted
• LastMountedTime
  – Date and time stamp of the last time database was mounted

Cluster registry

• MountStatus
  – Current mount status for database (mounted / dismounted)
• IsAdminDismounted
  – Designates whether current dismounted status is the result of administrator action (true / false)
• IsAutomaticActionsAllowed
  – Designates whether the database can be automatically activated (true / false)
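
As a rough illustration, the six values listed on the two Cluster registry slides could be pictured as one record per database. The class `DatabaseState`, the helper `may_activate_automatically`, the server name `MBX1`, and the rule the helper applies are all assumptions made for this sketch; this is not the actual registry layout or the BCSS algorithm.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatabaseState:
    """Hypothetical container for the six values named on the slides."""
    active_server: str
    last_mount_server: str
    last_mounted_time: datetime
    mount_status: str  # "mounted" or "dismounted"
    is_admin_dismounted: bool
    is_automatic_actions_allowed: bool


def may_activate_automatically(state: DatabaseState) -> bool:
    """Assumed rule: skip databases an administrator dismounted or blocked."""
    return state.is_automatic_actions_allowed and not state.is_admin_dismounted


# A database that an administrator dismounted on purpose.
state = DatabaseState(
    active_server="MBX1",
    last_mount_server="MBX1",
    last_mounted_time=datetime(2014, 2, 12, 15, 0),
    mount_status="dismounted",
    is_admin_dismounted=True,
    is_automatic_actions_allowed=True,
)
print(may_activate_automatically(state))
```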

Crimson Channel

• Applications and Services logs
  – Area of Windows Server event log used by applications for logging and internal communication
  – These logs store events from a single application or component rather than events that might have system-wide impact
  – This is referred to as an application's crimson channel
• Exchange 2013 has a crimson channel with multiple areas
  – ActiveMonitoring
  – HighAvailability
  – MailboxDatabaseFailureItems
  – ManagedAvailability
  – PushNotifications
  – Troubleshooters

WITNESS SERVER AND WITNESS SERVER PLACEMENT

Witness Server

• A server that participates in a failover cluster with an even number of members
  – Is not a member of the cluster/DAG
  – Does not contain a copy of quorum data
• File share on this server is represented by File Share Witness resource in cluster core resource group
  – Uses IsAlive check for availability
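
Why the witness matters only for an even number of members can be seen from a small vote count. This is a sketch under stated assumptions: every member holds exactly one vote, the witness adds one vote to an even-sized cluster, and dynamic quorum is ignored. The function names are hypothetical.

```python
def total_votes(member_count: int) -> int:
    """An even-sized cluster gets one extra vote from the witness."""
    if member_count < 1:
        raise ValueError("a cluster needs at least one member")
    return member_count + 1 if member_count % 2 == 0 else member_count


def votes_needed(member_count: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes(member_count) // 2 + 1


# Print members, total votes, and votes needed for a few cluster sizes.
for members in (2, 3, 4):
    print(members, total_votes(members), votes_needed(members))
```

With four members and a witness there are five votes, so a two-two split is broken by whichever side holds the witness.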

Witness Server

• File Share Witness Resource Behavior
  – If server or share are not available, cluster resources are failed and moved to another node
  – If FSW resource does not come back online, it remains in a Failed state, with restart attempts every 60 minutes
  – If witness server needed for quorum, and resource cannot be brought online, quorum will be lost
    • Single restart attempt for FSW resource in Failed state
    • No restart attempt for FSW resource in Offline state

Witness Server

• When witness server is needed to maintain quorum, one of the nodes locks the witness.log file on witness server
  – Node that locks witness.log file is called the locking node
  – If enough nodes are in contact with the locking node to constitute a majority, quorum is maintained
  – Nodes that can’t communicate with locking node lose quorum
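
The locking-node rule can be sketched as a partition check. The sketch assumes one vote per member plus one witness vote that counts for the group holding the lock; `nodes_keeping_quorum` and the node names are hypothetical, and this is not how the Cluster service is implemented.

```python
def nodes_keeping_quorum(members: set, locking_node: str, reachable: set) -> set:
    """Return the nodes that keep quorum after a network partition.

    `reachable` holds the members that can still talk to the locking node.
    """
    group = (set(reachable) & set(members)) | {locking_node}
    total_votes = len(members) + 1  # one vote per member plus the witness
    group_votes = len(group) + 1    # this group holds the witness lock
    return group if group_votes > total_votes // 2 else set()


members = {"A", "B", "C", "D"}

# A two-two split: A holds the lock and can still reach B.
print(sorted(nodes_keeping_quorum(members, "A", {"B"})))

# A is isolated: even with the witness vote it has only 2 of 5 votes.
print(sorted(nodes_keeping_quorum(members, "A", set())))
```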

Witness Server

• Attempts to lock witness.log file occur in a specific order
  – Node that owns cluster core resource group tries immediately
  – Nodes not owning cluster core resources wait 6 seconds before trying
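
The ordering above amounts to giving the core resource owner a head start. A minimal sketch follows; the 6-second value comes from the slide, while the function names and the server names `MBX1` to `MBX3` are made up for illustration.

```python
CHALLENGER_DELAY_SECONDS = 6  # from the slide: non-owners wait 6 seconds


def lock_attempt_delay(node: str, core_resource_owner: str) -> int:
    """Seconds a node waits before it tries to lock witness.log."""
    return 0 if node == core_resource_owner else CHALLENGER_DELAY_SECONDS


def attempt_order(nodes: list, core_resource_owner: str) -> list:
    """Sort nodes by when they try the lock; ties keep their input order."""
    return sorted(nodes, key=lambda n: lock_attempt_delay(n, core_resource_owner))


print(attempt_order(["MBX1", "MBX2", "MBX3"], core_resource_owner="MBX2"))
```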

Witness Server

[Animated diagram: nodes and witness server, with a timer counting from 0 to 16 seconds]

• Initial state: node owning the Cluster Core Resources at Sequence #: 20; other node at Sequence #: 20
• Cluster state change – node owning cluster core resources locks FSW – updates sequence number (Lock witness.log, Sequence #: 21)
• Challenging node attempts witness lock. Lock already exists – sequence # higher, challenge not successful.
• All nodes available. FSW lock released. Changes replicated, sequence numbers in sync (Sequence #: 22).

Witness Server

[Animated diagram: nodes and witness server, with a timer counting from 0 to 16 seconds]

• Initial state: node owning the Cluster Core Resources at Sequence #: 20
• Cluster state change – node owning cluster core resources unavailable.
• Challenging node attempts witness lock. No lock exists, lock successful, sequence number updated (Cluster Core Resources, Lock witness.log, Sequence #: 21).
• All nodes available. FSW lock released. Changes replicated, sequence numbers in sync (Sequence #: 22).
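
The two lock scenarios can be replayed with a toy model. Everything here is a simplification built only from the slide captions: the `Witness` class and `try_lock` function are hypothetical, the rule that locking increments the sequence number by one is an assumption, and the case of an existing lock with a lower sequence number is not described in the slides (this sketch simply refuses it).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Witness:
    """Toy stand-in for the state kept alongside witness.log."""
    sequence: int
    locked_by: Optional[str] = None


def try_lock(witness: Witness, node: str, node_sequence: int) -> bool:
    """Attempt to take the witness lock; return True on success."""
    if witness.locked_by is not None:
        # First scenario: a lock already exists (with a higher sequence
        # number), so the challenge is not successful.
        return False
    witness.locked_by = node
    witness.sequence = node_sequence + 1  # assumed: locking bumps the number
    return True


# Scenario 1: the core resource owner locks first, the challenger loses.
w = Witness(sequence=20)
print(try_lock(w, "owner", 20), try_lock(w, "challenger", 20), w.sequence)

# Scenario 2: the owner is unavailable, so the challenger finds no lock.
w = Witness(sequence=20)
print(try_lock(w, "challenger", 20), w.locked_by, w.sequence)
```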

Witness Server Placement

• Exchange 2010 guidance
  – “We recommend that you use a Hub Transport server running on Exchange Server 2010 in the Active Directory site containing the DAG. This allows the witness server/directory to remain under the control and visibility of an Exchange administrator.”
  – “If your DAG is extended to multiple datacenters, we recommend deploying the witness server in the datacenter considered to be the primary datacenter.”

Witness Server Placement

• Exchange 2013 guidance more complicated due to options introduced by architectural changes
• Exchange 2013 includes support for new DAG configuration options
  – A third location, such as a third physical datacenter or branch office
• Ultimately, the placement of a DAG’s witness server depends on your business requirements and the options available to you

Witness Server Placement

Deployment scenario: Single DAG deployed in one datacenter
Placement recommendation: Locate witness server in the same datacenter as DAG members

Deployment scenario: Single DAG deployed across two datacenters; no additional locations available
Placement recommendation: Locate witness server in primary datacenter

Deployment scenario: Multiple DAGs deployed in one datacenter
Placement recommendation: Locate witness server in the same datacenter as DAG members. Additional options include:
• Using the same witness server for multiple DAGs
• Using a DAG member to act as a witness server for a different DAG

Deployment scenario: Multiple DAGs deployed across two datacenters
Placement recommendation: Locate witness server in the same datacenter as DAG members. Additional options include:
• Using the same witness server for multiple DAGs
• Using a DAG member to act as a witness server for a different DAG

Deployment scenario: Single or Multiple DAGs deployed across more than two datacenters
Placement recommendation: Locate the witness server in the datacenter where you want the majority of quorum votes to exist

Witness Server Placement

• If your organization has a 3rd location, a witness server can be deployed there for automatic database failover between two other sites
  – The witness server location must have network infrastructure and connectivity that is isolated from network failures that affect the two datacenters with DAG members
• For all DAGs, the availability of the witness server should be on the Exchange administrator’s radar

Witness Server Placement

• IaaS providers and cloud providers are not supported for use as a witness server
  – This includes Windows Azure, which does not yet support the required underlying network configuration to allow an Azure file server VM to act as a witness server in a multi-datacenter deployment
  – More info at http://aka.ms/DAGAzure

Related Content

• Exchange 2013 High Availability and Site Resilience (Session 2/2, Part Two) – 12/02/14 – 16:30-17:15, room 253
• Exchange 2013 Sizing and Performance – 12/02/14 – 17:45-18:30, room 252B

QUESTIONS?

Thank You!