Citrix XenDesktop: Dealing with Failure - SYN408
SYN408: XenDesktop 7.6 Architecture: Dealing with Failure
Tom Gamull – Ericsson Consulting Manager
Citrix Synergy – May 2015
@magicalyak
Prevent Failures to Begin With
• Failures are bad events
• Today's technology should be bulletproof
• Is 99.999% uptime the new normal?
Our Thinking Is Broken
Customer: "I can't get to my desktop."
Support/Admin: "The desktops aren't working because storage failed."
CIO/Boss: "We need to ensure storage never fails."
Solution
• Upgrade to a redundant SAN
• Somehow believe replication can occur without penalty (the sales guy promised)
• Storage stays up!
Netflix Chaos Monkey
2010: Netflix moves to AWS
2011: US-East outage – Netflix posts lessons learned: "The best way to avoid failure is to fail constantly."
Since 2013: Chaos Monkey runs in production, except on holidays and weekends
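The Chaos Monkey idea above can be sketched in a few lines: pick a random instance from the fleet and kill it, but only during the business week, so engineers are on hand and forced to design for failure. A minimal sketch, assuming a hypothetical `terminate()` stub in place of a real cloud API and made-up instance names:

```python
import random
from datetime import datetime

# Hypothetical inventory; in practice this comes from a cloud API.
INSTANCES = ["xa-worker-01", "xa-worker-02", "xa-worker-03", "pvs-01"]

def terminate(instance):
    # Placeholder for a real termination call (e.g. a cloud SDK).
    print(f"terminating {instance}")

def chaos_monkey(instances, now=None):
    """Kill one random instance, but skip weekends
    (Netflix also skips holidays)."""
    now = now or datetime.now()
    if now.weekday() >= 5:  # Saturday/Sunday: stand down
        return None
    victim = random.choice(instances)
    terminate(victim)
    return victim
```

Run on a schedule (cron or a service), this keeps the failure path exercised constantly instead of only during real outages.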
Before You Buy More Stuff – Try This
• How do you respond to events today?
• How long to identify them?
• How long to solve them?
• Mean Time Before Failure (MTBF) is a legacy metric
• Focus on Mean Time to Resolution (MTTR), or cycle time
MTBF vs. MTTR
Before You Buy More Stuff – Try This
• How are you rolling out Citrix or changes? AUTOMATE!
• Rule: if you do it twice, it should be automated
• Focus on reducing cycle time:
  time(what is wrong) + time(how to fix it) + time(implement fix) = cycle time
• Immutable servers: servers are rebuilt from scratch for changes
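The cycle-time formula above is just a sum of the three delays, which makes it easy to see where automation pays off: shrinking any term shrinks the whole cycle. A trivial illustration (the minute values are made up, not benchmarks):

```python
def cycle_time(detect, diagnose, implement):
    """time(what is wrong) + time(how to fix it) + time(implement fix)."""
    return detect + diagnose + implement

# Illustrative minutes only: 10 to detect, 30 to diagnose, 20 to roll a fix.
manual = cycle_time(10, 30, 20)
# Automating the rollout (immutable rebuild) mostly attacks the last term.
automated = cycle_time(10, 30, 5)
print(manual, automated)  # → 60 45
```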
Survive Failure – Architecture
• Does Citrix still work if:
  • Your storage fails (SAN, local, whatever)?
  • Your database fails?
  • NetScaler fails?
• What can your users handle?
  • Most can handle getting logged off if they can log in again
  • Most can NOT handle:
    • Application hangs
    • Print failures
    • Being unable to log in or connect
(Image source: theoatmeal.com)
User Profiles and Folders
• Redirect folders as much as possible
  • This is where the data people actually use lives (My Docs, Downloads, etc.)
• Profiles
  • Profiles should be as light as possible
  • Can you use mandatory profile settings?
• Replicate profiles across 2 data centers
  • Profiles will not survive DFS-R without corruption (except one-way replication)
  • Active/passive only (not active/active)
  • Split users so some are active in one data center and passive in the other
• Use cloud storage
  • Hack OneDrive for My Docs – https://office365drivemap.codeplex.com/
Storage / DB
• Use redundancy in the software, not the hardware
• PVS fails over on the fly (though not for CIFS/SMB!)
• Local disk with PVS is better than an expensive SAN (and likely performs better, especially with local SSD)
[Diagram: local disk on server – Whiptail_61, Whiptail_62]
[Diagram: mirror-aware vs. standalone databases]
• Primary database – APS-DCXA1SQL01
• Mirror database – APS-DCXA2SQL02
• Witness (no database) – APS-DCXDCSQL03
PVS HA/DR Components
• SQL database (highly available)
• 2× PVS servers
• 2× vDisk stores
• DHCP – can be split on 2008 R2/2012
• TFTP – can be load balanced with a hardware load balancer
• 2 different locations
• Mirror = storage resilient; cluster = server resilient
Network
• Multiple sites = NetScaler GSLB
  • Active/passive is easiest to set up
• All components should be load balanced if possible
  • Even TFTP – double up on every component
• No standalone NetScalers in production
  • Use an HA/failover pair
  • They share the VIP but have separate IP info (so the VIP floats)
  • 1 NS + hypervisor != a pair
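The floating-VIP behavior of an HA pair can be modeled as a health-check decision: the shared VIP belongs to whichever node is alive, primary first, so clients always target one address. A toy Python model – the node names and `is_alive` stub are assumptions, not a NetScaler API:

```python
def choose_vip_owner(nodes, is_alive):
    """Return the first healthy node in preference order (primary first);
    the VIP 'floats' to it."""
    for node in nodes:
        if is_alive(node):
            return node
    return None  # both members of the pair are down

pair = ["ns-primary", "ns-secondary"]

# Normal operation: the primary holds the VIP.
print(choose_vip_owner(pair, lambda n: True))               # → ns-primary
# Primary fails: the VIP floats to the secondary.
print(choose_vip_owner(pair, lambda n: n != "ns-primary"))  # → ns-secondary
```

This is also why "1 NS + hypervisor" is not a pair: there is no second node for the VIP to float to.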
[Diagram: NetScaler GSLB – load balancers in Zone US-East1 and Zone US-West1 sharing a VIP]
BLUE/GREEN
[Diagram: a load balancer in front of a blue stack (App v1.0 ×2 on Db v1.0) and a green stack (App v1.1 ×2 on Db v1.1)]
Limiting Downtime
Blue/green:
• Like active/passive
• Don't use DNS for this – you can't trust TTL
• When to use: any database/schema upgrade, or when a restore from backup is too large/slow
Canary:
• Like active/active, but with a purpose – the canary in the coal mine
• See if someone screams! Then go live to production
• Limiting risk: back up your data; all nodes use the production database; route new connections to the new nodes
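The canary routing above can be sketched as a weighted choice at the load balancer: only a small share of new connections lands on the new version, while everything else stays on the old nodes. A minimal sketch with made-up node names (`canary_fraction` and the `rnd` hook are illustration parameters, not a real LB API):

```python
import random

def route(new_nodes, old_nodes, canary_fraction=0.1, rnd=random.random):
    """Send a small share of NEW connections to the canary nodes;
    the rest go to the current production nodes."""
    pool = new_nodes if rnd() < canary_fraction else old_nodes
    return random.choice(pool)

old = ["app-v1.0-a", "app-v1.0-b"]
new = ["app-v1.1-a"]

# Forcing the draw for illustration:
print(route(new, old, rnd=lambda: 0.05))  # → app-v1.1-a (canary hit)
print(route(new, old, rnd=lambda: 0.50))  # → one of the v1.0 nodes
```

If the canary screams, drain it and route everything back to `old`; no DNS change, no TTL gamble.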
CANARY
[Diagram: a load balancer routing to two App v1.0 nodes and one canary App v1.1 node, all on Db v1.0]
Atlanta Public Schools Citrix Delivery Overview
Architect: Thomas Gamull – Company: Presidio – Date: 3/17/2014
[Diagram: two MPX 11500 NetScalers sit between the external and internal firewalls, serving external and internal users – 24,000 zero clients across the school districts, plus printers. Behind them, Citrix PVS, SCVMM clusters (XA1, XA2, XDC), App-V publish/report servers, a SQL mirror, and file/print servers holding profiles and user data. Three delivery blocks, each with 2 Delivery Controllers, 2 Provisioning Servers, and an SCVMM server, publish 2008 R2 desktops and applications and Windows 7 desktops; shared services include license servers, an App-V cluster, and StoreFront.]
CLL Data Center – 8,000 Concurrent Desktops for Students
[Diagram: XENAPP1 – hosts APS-DCXA1HOST01 and APS-DCXA1HOST02 in the APS-DCXA1 management cluster. vSwitches: vSS-iSCSI-B; vSS-PVS-XAPP1-B (10.90.68.0/23, VLAN 68); vSS-XAPP1-A (10.90.72.0/23, VLAN 72); vSS-Servers-A. VMs: APS-DCXA1PVS01, SF01, DDC01, VMM01, WDM01, SQL01, APPV01, plus PVS02, SF02, DDC02.]
Rack Layout
[Diagram: paired NetScalers and top-of-rack switches above rows of compute blades, rack-mount compute with local disk storage, and paired iSCSI/FC storage]
Storage is always in pairs if needed
• Prefer multiple smaller arrays over a monolithic SAN
• Let the app/software do the work
Network redundancy is important
• Load balancers can remove switch dependencies
• Leverage common NIC cabling
Server choice can vary
• Blades are dense but lack local disk
• Rack mounts are often very flexible
• Without automation you will have scaling problems
"Je n'ai fait celle-ci plus longue que parce que je n'ai pas eu le loisir de la faire plus courte." – Blaise Pascal, Provincial Letters: Letter XVI, 1657
English translation: "I have made this letter longer only because I have not had the leisure to make it shorter."