Bringing your datacenter to the cloud: a different way of thinking
Ir. Koenraad Meuleman, Technology Expert, Team Lead VDI & Citrix Technologies
Intro
• Although 'the cloud' is slowly maturing, there is still a lot of 'fog' around the concept, not least because the term covers many different things. Just think of the different possibilities of SaaS, PaaS or IaaS solutions.
• Today we focus on Infrastructure as a Service (IaaS). In a highly simplified model, we could say that this is "just another external datacenter".
• We go deeper into why this is or is not a correct approach, and how Microsoft Windows Server 2016 can be used to address a number of 'inconveniences'.
• Along the way, we cover real-life lessons learned from the deployment of a large Citrix XenApp 'server-based shared desktop' project in Azure.
Content
• Part 1: Citrix XenApp 'server-based shared desktop' POC in Azure
• Part 2: Citrix XenApp 'server-based shared desktop' project in Azure – IaaS: architecting a datacenter in Azure, a different way of thinking
Citrix XenApp ‘server-based shared desktop’ POC in Azure
• Current situation:
▶ +/- 3000 end-users with thin clients (peak)
- Day use (8 am – 6 pm): +/- 2000 internal users
- Day use (7 am – 8 pm): +/- 800 external users
- Night use: +/- 100 external users
▶ Highly available Citrix XenApp v6.5 with +/- 150 XenApp Workers
- 2 datacenters, fully redundant
▶ XenApp Workers provisioned with Citrix Provisioning Services (PVS)
- Nightly XenApp server reboot -> reprovisioning (2 am)
▶ Windows Server 2008 R2
Citrix XenApp ‘server-based shared desktop’ POC in Azure
• Desired new situation (February 2016):
▶ Windows Server 2012 R2
▶ XenApp v7.6 or higher
▶ Rapid server reprovisioning
▶ Cost reduction by using modern techniques
▶ Easy management
▶ High availability
- Using cloud (Azure) 'datacenters' (hybrid; the back-end is out of scope)
- Using fail-over to the cloud
- Using 2 on-prem datacenters
© 2016 Citrix | Confidential
Citrix XenDesktop Hybrid Flavours

[Diagram: management responsibilities per layer, across four deployment flavours.]

Users: manage users, manage user groups, manage entitlements
Resources: install & configure apps, set policies, patch & update
Access & Control: configure broker, configure user storefront, monitor & troubleshoot
Hardware: purchase hardware, rack & stack, network & orchestration

Flavours: On-Premises and Cloud hosted (IT managed); Citrix Workspace Cloud (cloud) and Citrix Service Provider (via a Citrix partner; partner managed).
XenDesktop – On-Premises | Cloud Hosted XenDesktop | Citrix Workspace Cloud | Citrix Service Provider

[Diagram, repeated for each of the four flavours: the FMA layer model with a User Layer, an Access Layer (StoreFront, NetScaler Gateway over SSL), a Resource Layer (Delivery Groups with Windows apps, Windows desktops, Linux desktops and Remote PC Access), a Control Layer (Delivery Controller, Director, Studio, SQL Database, Active Directory, License Server) and a Hardware Layer split into Access & Control Hosts and Resource Hosts.]

Underlying hypervisors: the flavours differ in which hypervisors run the Resource Hosts versus the Access & Control Hosts.
Citrix XenApp ‘server-based shared desktop’ POC in Azure
• POC goal: provide additional information to make well-founded decisions.
▶ Desired new situation (February 2016):
- Windows Server 2012 R2
- XenApp v7.6 or higher
- Rapid server reprovisioning
- Cost reduction by using modern techniques
- Easy management
- High availability
▹ Using cloud (Azure) 'datacenters' (hybrid; back-end out of scope)
▹ Using fail-over to the cloud
▹ Using 2 on-prem datacenters
Rapid server reprovisioning for POC in Azure
• Citrix Provisioning Services (PVS)
▶ Not an option, as it needs PXE boot and Azure does not support that
▶ Not supported on Azure by Citrix, even if we could trick it somehow
• Citrix Machine Creation Services (MCS)
▶ No support for Azure initially (February 2016)
▶ Only support for 'Classic' Azure (Summer 2016)
▶ Now (February 2017) support for Azure Resource Manager (ARM)
- Missing user-defined naming convention support
- Missing user-defined (security) roles support
- Realdolmen, as a member of the Citrix Partner Technical Expert Council, works with the Citrix product team to include these features
• PowerShell
▶ In only a few weeks, Realdolmen's consultants wrote the scripts needed to successfully conclude this part of the POC:
- Automatically create virtual machines (VMs) from a 'golden master'
- Domain-join these virtual machines
- Enable extensive logging and monitoring of the VMs from within Azure
- Start VMs at a predefined time
- Stop VMs from a predefined time onwards, but only when not in use
- Other
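The 'stop only when not in use' rule amounts to simple decision logic. A minimal sketch (illustrative only, not the actual Realdolmen PowerShell script; the worker names and the 19:00 cut-off are hypothetical):

```python
from datetime import time

def vms_to_stop(now, vm_sessions, shutdown_after=time(19, 0)):
    """Return the workers that may be deallocated: we are past the
    predefined shutdown time AND the VM carries no active sessions."""
    if now < shutdown_after:
        return []
    return sorted(vm for vm, sessions in vm_sessions.items() if sessions == 0)

# At 20:00, only the empty workers qualify for shutdown.
candidates = vms_to_stop(time(20, 0),
                         {"xa-worker-01": 3, "xa-worker-02": 0, "xa-worker-03": 0})
# candidates == ["xa-worker-02", "xa-worker-03"]
```

The actual scripts would then call the Azure APIs to deallocate each candidate, so that compute billing stops.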
High Availability: on-prem, cloud or hybrid ?
• Most important decision factor: cost calculation
• Known on-prem cost
▶ After a fast on-prem POC (March 2016):
- WS 2012 R2
- XenApp 7.8
• How to calculate on-Azure cost ?
▶ Data transfer cost (in and) out of Azure
- Tests showed an average of 100 kbps per XenApp user (-> fixed)
▶ Storage cost
- Depends on the number of virtual hard disks needed
- Depends on the number of VMs needed
- Relatively low influence on the total cost
▶ Compute cost (VMs)
- Depends on the number of VMs needed
- Depends on the time the VMs are allocated (active)
How to calculate the minimal compute cost ?
• Created different Azure VMs with different sizes
• Used LoadRunner (and a representative in-house test scenario) to put load on the VMs
• Checked the VMs' resource usage (memory, CPU, disk I/O)
• Determined two values per machine size:
▶ U75: number of users when one of the resources reaches 75% utilization
▶ Umax: maximum number of users possible
How to calculate the minimal compute cost ?
• Calculated the "cost per user per hour" per machine size
• We ran the same tests on WS2016 and got almost the same numbers for U75 and Umax!
Size             Cores  Memory  Cost/h (€)  U75  Umax  Cost/U75/h  Cost/Umax/h
Standard D1 v2       1     3.5      0.1088    5    14      0.0218       0.0078
Standard D2 v2       2       7      0.2176   22    38      0.0099       0.0057
Standard D11 v2      2      14      0.253    29    39      0.0087       0.0065
Standard D12 v2      4      28      0.506    49    50      0.0103       0.0101
Standard D13 v2      8      56      0.9108   75    80      0.0121       0.0114
Standard D14 v2     16     112      1.6394  125   140      0.0131       0.0117
Standard D15 v2     20     140      2.0492  150   170      0.0137       0.0121
WS2016 results:

Size             Cores  Memory  Cost/h (€)  U75  Umax  Cost/U75/h  Cost/Umax/h
Standard D1 v2       1     3.5      0.1088    5    15      0.0218       0.0073
Standard D11 v2      2      14      0.253    30    42      0.0084       0.0060
Standard D12 v2      4      28      0.506    38    50      0.0133       0.0101
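The cost-per-user columns follow directly from dividing the hourly VM price by the measured user counts; a quick sketch using a few rows from the WS2012 R2 table:

```python
# VM size -> (hourly cost in EUR, U75, Umax), taken from the load tests above
sizes = {
    "Standard D1 v2":  (0.1088,   5,  14),
    "Standard D11 v2": (0.2530,  29,  39),
    "Standard D15 v2": (2.0492, 150, 170),
}

def cost_per_user(cost_per_hour, users):
    """EUR per user per hour, rounded as in the table."""
    return round(cost_per_hour / users, 4)

for name, (cost_h, u75, umax) in sizes.items():
    print(name, cost_per_user(cost_h, u75), cost_per_user(cost_h, umax))
# D11 v2 is the sweet spot: 0.0087 EUR/U75/h and 0.0065 EUR/Umax/h
```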
How to calculate the minimal transfer cost ?
• Data transfer cost
▶ # users is fixed -> total cost is fixed
▶ ~1750 €/month

Est. outbound traffic per session (kbps): 96
Est. outbound traffic (TB): 21.93
Bandwidth required (Mbps): 237.09
Outbound traffic price per GB: €0.07
Outbound traffic price: €1,606
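The conversion from per-session bandwidth to a monthly bill is straightforward; a sketch of the arithmetic (the 500,000 session-hours per month is an illustrative assumption, not a figure from the measurements):

```python
def monthly_transfer_cost(session_hours, kbps_per_session, eur_per_gb=0.07):
    """Outbound traffic: session-hours x per-session rate -> GB -> EUR."""
    gigabytes = session_hours * 3600 * kbps_per_session * 1000 / 8 / 1e9
    return gigabytes, round(gigabytes * eur_per_gb, 2)

gb, cost = monthly_transfer_cost(500_000, 96)
# 500,000 session-hours at 96 kbps -> 21,600 GB -> EUR 1,512,
# the same order of magnitude as the ~21.93 TB / EUR 1,606 above
```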
How to calculate the minimal storage cost ?
• Storage cost
▶ Fewer VMs -> fewer disks -> lower cost (€/U/h)
▶ The largest VMs are expected to run about 150 users
▶ We need 2 disks per server (OS disk & programs disk), 3000 users
- D11_v2: 30 users/VM -> 100 VMs -> 200 disks
- D15_v2: 150 users/VM -> 20 VMs -> 40 disks
▶ -> Larger VMs are cheaper when it comes to storage
▶ But…
- 'Blue Screen Of Death' impact ?
- The shutdown dilemma
How to calculate the minimal compute cost ?
• The shutdown dilemma
▶ In the morning, we can turn on servers as we need them
▶ In the evening, we can only turn off servers once they are 'empty'
▶ The Citrix XenApp Delivery Controller distributes the load at logon
- The server with the lowest load is chosen
▶ -> All servers remain on as long as there are more users than servers
▶ -> Big servers (D15_v2, 150 users)
- will always remain on
- are very expensive per hour (> 2 €/h)
▶ -> Very small servers would become cost-effective
How to calculate the minimal compute cost ?
• The VDI dilemma
▶ How about 1 user per VM ?
▶ Start the VM at logon, shut down the VM after logoff
▶ Very cost-effective in terms of 'pay for what you use'
▶ VDI cost per user per hour ?
▶ Compared to Server-Based Shared Desktop (SBSD)
▶ A VDI solution is about 10x more expensive than an SBSD solution
▶ VDI is only interesting when the average user works less than 24/10 = 2.4 h/d
▶ This might (will) change in the future (2018 ?)
- Citrix & Microsoft joint effort: nested virtualization
Size            Cores  Memory  Cost/h (€)  U75  Cost/U75/h
A0 Basic            1    0.75      0.0152    1      0.0152
A1 Basic            1    1.75      0.0632    1      0.0632
A2 Basic            2     3.5      0.1265    1      0.1265
A0 Standard         1    0.75      0.0169    1      0.0169
A1 Standard         1    1.75      0.0759    1      0.0759
A2 Standard         2     3.5      0.1518    1      0.1518
Standard D1 v2      1     3.5      0.1088    1      0.1088
Size             Cores  Memory  Cost/h (€)  U75  Umax  Cost/U75/h  Cost/Umax/h
Standard D11 v2      2      14      0.253    29    39      0.0087       0.0065
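The '2.4 h/d' break-even follows from comparing a per-user VDI rate against an SBSD environment whose servers run around the clock. A sketch of that comparison, using the A1 Basic and D11 v2 (Umax) rates from the tables above:

```python
def breakeven_hours_per_day(sbsd_eur_user_h, vdi_eur_user_h, hours_on=24):
    """Hours/day below which pay-per-use VDI beats always-on SBSD."""
    return hours_on * sbsd_eur_user_h / vdi_eur_user_h

h = breakeven_hours_per_day(0.0060, 0.0632)
# ~2.3 h/day with these exact rates, i.e. the ~10x / 2.4 h/d rule of thumb above
```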
How to calculate the minimal compute cost ?
• The startup/shutdown dilemma (part 2)
▶ If relatively small machines (30 users per machine) are the optimal choice in terms of cost, how do we minimize runtime ?
▶ Based on the current Citrix license usage, we have historic data on the number of users per 15 minutes for the past year.
[Chart: "License Usage for 1 week", 0 to ~3000 concurrent users.]
How to calculate the minimal compute cost ?
• The startup/shutdown dilemma (part 2)
• Working day
• Sundays & holidays: < 300 users
[Chart: "Licenses used for 1 day", hourly from 00:00 to 23:00, 0 to ~3000 concurrent users.]
How to calculate the minimal compute cost ?
▶ Excel emulations:
- Dynamic holiday function: construction of a usage prediction function (2nd, 3rd or 4th degree) based on the usage evolution over the previous 15 min, 30 min, 45 min or 1 h period
- -> Failed due to overshoot in the morning
[Charts: "Servers needed" (0–140) and "Users forecasted" (-500 to 3500) for the prediction-function approach.]
How to calculate the minimal compute cost ?
▶ Excel emulations:
- Static trapped function based on historic data -> Failed due to seasonal changes:
▹ Year-end higher
▹ Summertime lower
▹ Others (changing Easter vacations, …)
How to calculate the minimal compute cost ?
▶ Excel emulations:
- One-trap approach:
▹ 7:00 – 19:00: 100 servers (30 users each) running
▹ 19:00 – 7:00: 10 servers running
▹ The one-trap approach is cheapest ! (~3750 €/month for XenApp Workers)
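The arithmetic behind such a trap schedule can be sketched as follows, using the D11_v2 rate of €0.253/h. This simplified sketch ignores weekends, holidays and actual billing details, so it deliberately does not reproduce the ~3750 €/month figure; it only shows how a schedule translates into a monthly compute cost:

```python
def monthly_schedule_cost(day_vms, night_vms, eur_per_vm_h,
                          day_hours=12, days=30):
    """One-trap schedule: day_vms run 07:00-19:00, night_vms the other 12 h."""
    daily = (day_vms * day_hours + night_vms * (24 - day_hours)) * eur_per_vm_h
    return round(daily * days, 2)

cost = monthly_schedule_cost(100, 10, 0.253)
```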
[Chart: "VMs startup schedule" over two days (00:00–24:00), 0–120 VMs, split into Group 0, Group 1 and Group 2.]
How to calculate the minimal compute cost ?
▶ New evolution: Citrix Smart Scale
- Supported for on-prem XenApp/XenDesktop virtual power management
- For Azure: support to be announced ?
▹ Realdolmen PTEC provides input to Citrix
High Availability: on-prem, cloud or hybrid ? (recap)
• Most important decision factor: cost calculation
• Known on-prem cost
▶ Fast on-prem POC tests:
- WS 2012 R2
- XenApp 7.8
• How to calculate on-Azure cost ?
▶ Data transfer cost in and out of Azure
- Tests showed an average of 100 kbps per XenApp user (-> fixed)
- ~1750 €/month
▶ Storage cost
- Depends on the number of virtual hard disks needed
- Depends on the number of VMs needed
- Relatively low influence on the total cost
- < 500 €/month
▶ Compute cost (VMs)
- Depends on the number of VMs needed
- Depends on the time the VMs are allocated (active)
- ~3750 €/month
High Availability: on-prem, cloud or hybrid ? (cont)
• Most important decision factor: cost calculation
• Known on-prem cost
• Known (estimated) Azure cost
• The estimated Azure cost is cheaper than the on-prem cost !
• Decision taken: Citrix XenApp 'server-based shared desktop' project in Azure !
Part 2:
Citrix XenApp ‘server-based shared desktop’ project in Azure – IaaS:
Architecting a datacenter in Azure, a different way of thinking
Azure Availability: a deciding factor
• SLA for virtual machines:
▶ "For any Single Instance Virtual Machine using premium storage for all disks, we guarantee you will have Virtual Machine Connectivity of at least 99.9%."
▶ 99.9% availability = up to ~45 min/month of unplanned downtime without penalty
▶ Single-instance virtual machines NOT using (more expensive) premium storage are not covered by the SLA (no guarantee).
Azure Availability: a deciding factor
• SLA for virtual machines:
▶ "For all Virtual Machines that have two or more instances deployed in the same Availability Set, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.95% of the time."
▶ 'Availability Set': a container for similar virtual machines
- Web servers
- Citrix XenApp/XenDesktop Workers
- Cluster nodes (e.g. file server clusters)
- Active Directory Domain Controllers
- HA node pairs (firewalls, load balancers, …)
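The downtime figures behind those SLA percentages are simple arithmetic; a quick sketch (a 30-day month is assumed, which is why the '45 min' on the previous slide comes out slightly lower here):

```python
def max_downtime_minutes(sla_pct, days=30):
    """Unpenalized downtime per month at a given SLA percentage."""
    return round((100 - sla_pct) / 100 * days * 24 * 60, 1)

single_instance = max_downtime_minutes(99.9)    # premium-storage single VM
availability_set = max_downtime_minutes(99.95)  # two+ VMs in an Availability Set
# ~43.2 and ~21.6 minutes respectively
```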
How to increase service availability ?
• If we want to improve service availability, we have to understand the influencing factors:
▶ Hardware failures
▶ Hypervisor patching
▶ Hardware or software upgrades
▶ Storage (service) failures
Hardware failures
• Hardware:
▶ Heavy competition between
- cloud providers (Azure, AWS, Google, …)
- on-premises datacenters
▶ Datacenter (containers) constructed at the lowest possible price
▶ No server redundancy (power, fan, NIC, memory, CPU)
▶ -> If you need redundancy, deploy over multiple servers
▶ -> Servers are grouped in (up to) 3 'fault domains'
• By design, servers in different fault domains should not stop functioning at the same time.
▶ Power distribution EU: 3 phases + N -> 3 fault domains
▶ Power distribution US: 2 phases + N -> 2 fault domains
Hypervisor Patching
• Software:
▶ Open-source hypervisors: Xen
▶ In-house developed hypervisors: Microsoft Hyper-V
▶ Open-source or in-house management and billing tools
▶ -> This software needs to be patched and upgraded regularly
▶ -> Servers are grouped in 20 'patching domains'
• By design, servers in different patching domains should not be patched at the same time.
▶ Original patching implementation:
- VM pause, server restart, VM resume -> up to 15 min downtime
▶ Current patching implementation:
- VM pause, VM resume on a different hypervisor -> seconds of downtime
Hardware or software upgrades
• Hardware in active hardware containers is not repaired or replaced
• Hardware containers are repaired when efficiency becomes too low
• -> Active VMs are stopped and restarted elsewhere
• Similar for software upgrades
▶ Hyper-V 2012 R2 -> Hyper-V 2016
• Tip: deallocate (stop) and reallocate (start) your VMs regularly
• Restart without warning:
▶ Single-instance VM on standard storage
• Planned downtime (announced):
▶ Single-instance VM on premium storage
▶ VMs deployed in an Availability Set
• VMs can only be added to an Availability Set (AS) at creation time
• -> Always create VMs in an AS, even when initially single-instance
Storage Service Failure
• Standard storage:
▶ Standard storage accounts
▶ Always a minimum of 3 copies of the data (6 for geo-redundant)
▶ Software-defined storage on standard disks
▶ Max 500 TB, 20,000 IOPS per storage account (SA)
▶ Standard VMs have 500 IOPS disks -> max 40 disks per SA
• Premium storage:
▶ Premium storage accounts
▶ Software-defined storage on SSD disks
▶ Up to 5,000 IOPS per disk
▶ VMs with up to 64 TB, 80,000 IOPS and 2,000 MBps throughput possible
• Security on storage account blobs and files:
▶ Access keys
▶ No (Azure) Active Directory integration possible
• https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
Windows Server 2016 Datacenter Edition: Storage Spaces Direct on Scale-Out File Server clusters
• Shared-nothing cluster
• Minimum of 4 nodes needed for Storage Spaces Direct
• Additional cloud witness on an Azure storage account blob
• Highly available cluster
▶ 5 witnesses
▶ 3 is enough for a majority (half + 1)
▶ 2 storage nodes can be down without losing the cluster
• Storage tiering possible (disks + SSD)
• Performance limited by the 200 Mbps network
• -> Possible need for Accelerated Networking
▶ Only available on Azure D15_v2 (1500 €/month -> 6000 €/cluster)
Architecture: Original Thought
[Diagram: two on-prem datacenters (On-prem A and On-prem B, each with an AD DC, file server, back-end processing, mail server, proxy server and database server) connected over the WAN to one Azure 'datacenter' holding the XenApp Workers, a XenApp Delivery Controller, an AD DC and a file server. Users reach the XenApp StoreFronts via load balancers over the Internet.]
To accommodate hardware failures
• We need to deploy every component in a highly available way
• Hence we need every component at least twice
• Availability Sets (and possibly load balancers) are needed
On Azure, you need Azure access
• To create licensed Windows VMs
• To domain-join VMs
• To provide the Azure management tools with VM health info
• To access Azure blob storage
• Optionally:
▶ Fast access to Exchange Online and Office 365
• 'Azure access' means 'Internet access', unless you implement routing tables and/or firewalls
Architecture: What Is Needed
[Diagram: the original architecture extended with what is actually needed in Azure: redundant AD DCs, file servers, a second XenApp Delivery Controller and proxy servers in the Azure 'datacenter', plus direct Internet access from the Azure virtual network alongside the WAN connection to both on-prem datacenters and their StoreFronts/load balancers.]
Microsoft Hypervisor Patch Week
• During MS Hypervisor patch week, we will lose in term 5% of our XenApp Workers while Microsoft is patching that patching-domain
• MS reduced downtime from 15 minutes to a ‘10 seconds freeze’• We do not know the order of patching nor the exact time• This is still unacceptable in a 24 x 7 customer facing environment
• Solution:▶ When patching Europe-West, we move to Europe-North▶ When patching Europe-North, we move to Europe-West
Architecture: Highly Available
[Diagram: the highly available architecture: two Azure regions (Europe West and Europe North), each with XenApp Workers, a XenApp Delivery Controller, AD DCs, file servers and proxy servers, connected over the WAN and the Internet to both on-prem datacenters and their StoreFronts/load balancers.]
Dual Region Location Implementation
• Cost reduction:
▶ We do not need to run machines that we don't actively need
▶ Exception: Active Directory Domain Controllers (and firewalls)
• Negative impact:
▶ We will need data replication between file server clusters in different locations
▶ Microsoft Windows Server 2016 provides techniques for this
Naming Convention
• You have to create lots and lots and lots of 'objects'
• Formally define a naming convention
▶ Extend your existing naming convention to include the new object types
▶ Avoid 'long' names
▶ Use names that are 'meaningful' for humans
▶ Some object names (e.g. storage accounts)
- can only contain lowercase letters and digits
- no uppercase letters, no dashes
▶ Do not put object properties in the object name (e.g. -VM, -NIC)
- If you do put object properties in the object name, make sure they cannot change
▹ e.g. object location: NE, WE
▹ storage type: s(tandard), p(remium)
▹ storage redundancy: lrs, grs
• Stick to your naming convention
▶ Or change it and start all over again (most objects cannot be renamed)
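A convention is easier to stick to when it is enforced. A minimal sketch of a validator for the storage-account constraint mentioned above (the `<project><location><type><redundancy>` pattern is a hypothetical example, not this project's actual convention):

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only
STORAGE_ACCOUNT_NAME = re.compile(r"^[a-z0-9]{3,24}$")

def valid_storage_account_name(name):
    return bool(STORAGE_ACCOUNT_NAME.match(name))

# Hypothetical convention: <project><location><type><redundancy>,
# e.g. "xapp" + "we" (West Europe) + "p" (premium) + "lrs"
print(valid_storage_account_name("xappweplrs"))   # True
print(valid_storage_account_name("XApp-WE-LRS"))  # False: uppercase and dashes
```

A similar check per object type can be run before any deployment script creates resources.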
Azure ARM Resource Groups
• Resource groups
▶ Bundle resources together
▶ Security boundary
▶ RG Networking -> the networking team is contributor, others have read access
▶ RG Common -> supporting services (AD, DNS, AV, SCOM, …)
▶ An RG per project
Connectivity
• On-prem to cloud connectivity
▶ VPNs
▶ ExpressRoute
▶ Direct Internet access (& DirectAccess)
• Bandwidth and latency
▶ Best effort (VPN)
▶ Guaranteed bandwidth and latency (ExpressRoute)
• Availability
▶ How about single points of failure ?
- Virtual firewalls
▹ Check Point Firewall-1 cluster
- Load balancers
▹ Citrix NetScaler cluster
- Others
• Service guarantee (CoS, QoS)
▶ Firewall to virtual firewall
▶ Network bandwidth optimizer to virtual network bandwidth optimizer
Azure in short
• Azure is a worthy datacenter extension / replacement
• A lot is still happening in the cloud (new / preview)
• The feature set is huge; start with what you need and know
▶ Then expand and explore – and optimize
• Cloud architecture requires a different (cheaper) approach
• If you don't go to the cloud (yet), use cloud principles on-prem
• Microsoft Windows Server 2016: built for the cloud
▶ private & public !
• Citrix NetScaler can help you control your cloud traffic
• Citrix XenApp/XenDesktop brings your users to your cloud workloads for better performance
• The cloud is now !