Virtualization in the Real World
Transcript of Virtualization in the Real World
Virtualization in the Real World: A customer experience
Session 9214
Speaker: Mike ([email protected])
Fidelity Investments
One Destiny Way MZ CC2O
Westlake, TX 76262
Agenda
• zSeries & S/390 Linux
• The zSeries Linux Implementation Formula
• Unix versus z/VM & Linux
• Infrastructure Reduction
• Grid on zSeries
• Support Model
• Practical Examples
• TCO Model
• Workload Management
zSeries & S/390 Linux
The zSeries Linux Implementation Formula

PIT = ((((RIT * PT)^NITP)^M²)^V³)^NTW * ISVP^TABNYD

where:
• PIT - Project Implementation Time
• RIT - Real Implementation Time
• PT - Project Time
• NITP - Needlessly Involved Technical People
• M - Managers
• V - Vice Presidents
• NTW - Number of Turf Wars
• ISVP - ISV Products
• TABNYD - Talked About But Not Yet Delivered
Unix versus z/VM & Linux
Discrete compared to virtualized with z/VM

[Diagram: a typical Open environment with discrete UNIX, AIX and Windows application servers, each cabled to the network and to z/OS, compared with Linux on z/VM, where Linux application guests share z/VM processors, memory and channels and are connected by virtual cables (IEEE VLAN) and shared disks, with a single cable to the network.]
Workload Management
Manageability of the Virtual Environment

[Diagram: the z/VM CP hypervisor running Linux guests and CMS, with VM Operator (REXX) automation, virtual consoles collected through SCIF, and CP Monitor data feeding the Performance Toolkit.]

• Virtual consoles
• Single Console Image Facility (SCIF)
• PROP (CA) or VM/Oper (CA)
• Performance Toolkit
• Standard VM monitor data
• MICS and/or Merrill's MXG
• Integrate with z/OS data
• RMF LPAR reporting
• RMF for Linux
Unix versus z/VM & Linux

[Diagram: the zSeries hardware layer, the z/VM virtualization layer with automation and management, and six virtual servers, each a stack of Linux OS, middleware and application.]

On Demand is here today! Dynamic addition of resources is possible for certain resources and is expanding rapidly in the zSeries infrastructure.
Unix versus z/VM & Linux
Discrete compared to virtualized with VMware

[Diagram: the same typical Open environment of discrete UNIX, AIX and Windows servers, compared with Windows and Linux on VMware: blade frames each running VMware (CPU, memory, I/O) hosting Windows and Linux application guests connected by virtual cables, with a cable to z/OS.]
Unix versus z/VM & LinuxUnix versus z/VM & Linux
What are some differences?
Virtualization with 3 decades of IBM software and hardware Virtualization with 3 decades of IBM software and hardware experience behind it.experience behind it.
• Instruction based VirtualizationInstruction based Virtualization• End-to-End Error RecoveryEnd-to-End Error Recovery• Workload ManagementWorkload Management• Dynamic pathing to DiskDynamic pathing to Disk• Hipersockets between LPARs Hipersockets between LPARs • On Demand Infrastructure On Demand Infrastructure • Simplified Administration, Simplified Administration,
Monitoring and AutomationMonitoring and Automation• Infrastructure simplificationInfrastructure simplification• Shared Segments & Disk SharingShared Segments & Disk Sharing• Maintenance and UpkeepMaintenance and Upkeep zSerieszSeries
zSeries Hardware layerzSeries Hardware layerz/VM Virtualization layerz/VM Virtualization layer
Unix versus z/VM & Linux
Virtualization Considerations for Mainframe Users

[Table: each consideration rated for z/VM]
• End-to-end error recovery
• Autonomic workload management
• HiperSocket connectivity to other LPARs
• Disk I/O subsystem dynamic pathing
• Environment requires a high degree of sharing
• Centralized administration and capacity management
• Dynamic "On Demand" resource allocation
• Linux automation capability
• Dynamic provisioning (creating and manipulating guests on the fly)
• Total Cost of Ownership (infrastructure for power/cooling)
• Total Cost of Acquisition (initial cost for a small implementation)
• Virtualization of Windows
Infrastructure Reduction
Typical Server Environment: What are the Problems? What is Missing?

[Diagram: firewalls between the Internet and intranet, with discrete web, eMail, application, database, security and backup servers on each side.]
Infrastructure Reduction
What are the problems with the distributed infrastructure?

• Something will always be broken or malfunctioning
• Something in this infrastructure needs upgrade
  – Hardware/software upgrade
  – Upgrade (technology exchange) is very disruptive
  – No provision for dynamic upgrades
• The majority of this infrastructure will be underutilized
  – But when processing spikes occur, there will always be a bottleneck somewhere
  – Unknown SPOFs (single points of failure)
• End-to-end management is difficult to impossible
  – Monitoring, management and control do not span silos
  – Administration is difficult and requires too many levels of interaction to solve problems
• No real way to achieve significant infrastructure and administrative cost reduction
• Automation is difficult, so autonomic computing and disaster recovery are nearly impossible to achieve
Infrastructure Reduction
What's Missing? Support Infrastructure!

[Diagram: the same typical server environment, with firewalls between the Internet and intranet and discrete web, eMail, application, database, security and backup servers.]

This configuration contains 50+ levels of infrastructure.
Infrastructure Reduction
Reduce Infrastructure with Linux on zSeries

[Diagram: the distributed servers consolidated onto two z9xx machines. Each runs z/VM (backup server, application server, DB2 Connect and file server guests) and z/OS (DB2, CICS, MQSeries, tape management), with tape attached and firewalls to the Internet and intranet.]

In this configuration, 14+ levels of infrastructure have been eliminated.
Infrastructure Reduction
Consolidation on zSeries: What are the Benefits?

• Virtualization simplifies the infrastructure
• Common software provides for simpler upgrades, and hardware can be transparently upgraded
• Administration and management are simplified
• Real cost savings can be achieved because levels are moved from real to virtual
• Resources can be better utilized
• On Demand dynamic addition of resources
• Better automation and autonomic computing
• Disaster recovery actually possible
Grid on zSeries
Workload Manager and Parallel Sysplex

• JES2 MAS
  – Jobs processed where resources are available
• CICS MRO
  – Function shipping throughout the sysplex based on available resources
  – Transactions routed based on available resources or transaction affinity
• DB2 Data Sharing
  – Data sharing allows any CICS region to access data as though it were local
• VSAM Record Level Sharing
  – Allows access to VSAM from individual regions across a sysplex rather than from file-owning regions
• On Demand resource addition

[Diagram: networks feeding CICS regions, Java workloads, DB2 and VSAM across a Parallel Sysplex under Workload Manager.]
Grid on zSeries
WebSphere: A Grid?

• Sysplex WebSphere grid
  – Servers dynamically added and quiesced
  – Resources balanced across the sysplex
• WebSphere Application Server
  – Can take advantage of z/OS security, crypto and zAAP features
• Workload Manager
  – Dynamic management of WAS application servers
  – Workloads prioritized and balanced
  – Running hardware at 100% with heterogeneous workloads
• On Demand resource addition
  – Activate standard processors, zAAPs, IFLs and memory dynamically
  – Deactivate resources dynamically

[Diagram: servlet, Java and EJB workloads across a Parallel Sysplex under Workload Manager.]
Grid on zSeries
DB2 Data Sharing: A Grid?

• WebSphere & CICS
  – CICS Web Server
  – J2EE, Java transactions
  – Business transformation logic
• DB2 Data Sharing
  – Enterprise Java Beans
  – Stored procedures
  – DB2 Connect
• VSAM Record Level Sharing
  – Sysplex-wide sharing of VSAM files
  – Web-enabled VSAM connectors
• On Demand resource addition
  – Add resources manually or automatically
  – Scale up and/or out

[Diagram: CICS and Java regions sharing DB2 and VSAM across a Parallel Sysplex under Workload Manager.]
Grid on zSeries
Data Grid Exploitation with zSeries

We could do this, but our application groups would have to recode all of our applications to fit this model. Eventually this will happen, but not in the short term.

[Diagram: Open Grid Services Architecture. A client passes through a gatekeeper and security infrastructure to a resource manager and job manager, which dispatch processes against a resource library. The hosting environment provides a grid service container with user-defined, base and system-level services, built on an OGSI spec implementation, a Web service engine, security infrastructure, and security and RSL administration.]
Grid on zSeries
Data Grid Exploitation with zSeries Linux & DB2 Connect

[Diagram: two configurations, each pairing a z/VM system with z/OS systems in a sysplex. DB2 Connect guests 1..n on z/VM 1 and z/VM 2 handle compute-intensive processing and reach DB2 data-sharing members on z/OS 1 and z/OS 2 over HiperSockets, with OSA adapters to the network.]

A compute environment taking advantage of the zSeries data grid to provide a high-speed connection to DB2 data on the zSeries sysplex. Low network latency and high data rates can be achieved with HiperSockets. An example of this configuration appears in "Practical Examples".
Support Model
How we do zSeries Linux installation & support

[Diagram: zSeries hardware running z/VM and virtual guests (WAS, MQ, Java and database servers). Test/Dev/QA is supported by mainframe z/OS support; production by UNIX Technical Support; middleware & DBMS by the DBMS support group; hardware and storage by Mainframe Hardware & Storage Management.]

OS support is with IBM for levels 1 & 2; level 3 support is with Red Hat.
Support Model
Server Creation

[Diagram: z/VM (ZVMx) LPARs running Linux guests (MQSeries, DB2 Connect, WAS 5) alongside z/OS (CPUx), with your new Linux server created on request.]

Servers can be provisioned through "Server Central". Once the request is received, it takes about half an hour to create the server, and in many cases the server can be completely provisioned in less than one day. Test/Dev/QA is supported by the z/OS support group. Production is supported by the UNIX Technical Support group. Middleware & DBMS are supported by Open Systems DBMS support.
Support Model
Dev – Test – QA

[Diagram: the zSeries Test/QA environment: a z/VM system hosting Linux guests (Java, C++/FTP, MQSeries, DB2 Connect, WAS 5) alongside z/OS, spread across a TestPlex and a QA Plex.]
Support Model
Production

[Diagram: the zSeries production environment: production zOS/zVM systems at Site 1 and Site 2, each hosting Linux guests (Java, C++/FTP, MQSeries, DB2 Connect, WAS 5) alongside z/OS, plus other plexes at each site.]
Practical Examples
DB2 Connect: Old Configuration

[Diagram: AIX servers (C1, C2, C3) at Site 1 and Site 2 running DB2 Connect, reaching DBMS subsystems on z/OS 1 and z/OS 2 through IP and site distribution layers.]

• AIX servers in a high-availability multi-site configuration, resulting in unused capacity
• Maintenance is difficult to schedule because all Connects share the same DB2 binaries
• Multiple network hops increase latency, resulting in higher response times
• Memory configuration is limited to the total memory available on the hardware
Practical Examples
DB2 Connect: New Configuration

[Diagram: DB2 Connect guests on z/VM 1 and z/VM 2, co-resident with DBMS subsystems on z/OS 1 and z/OS 2, with IP distribution in front.]

• Shares hardware in a continuous-availability configuration
• Maintenance can be easily scheduled because each instance has its own DB2 binaries
• One network hop reduces network latency to near zero
• Memory can be customized for each server guest
Practical Examples
WAS 5.1.0 applications: CSC HostBridge & EOS

• High-availability configuration
• Mainframe-centric applications with low utilization
• One network hop reduces network latency to near zero (except in failover)
• Both HostBridge and EOS are running on a single guest to leverage server costs
Practical Examples
DB2 Connect/Java

[Diagram: a high-availability failover configuration between Merrimack (z/VM 1, z/OS 1, DBMS) and Dallas (z/VM 1, z/OS 1, DBMS), with IP distribution in front.]

• High-availability configuration
• Maintenance can be easily scheduled because each instance has its own DB2 binaries
• One network hop reduces network latency to near zero (except in failover)
• Low-utilization servers allow for consolidation, simplification and low network latency
Practical Examples
SNA Elimination: Current Environment

[Diagram: remote SNA sites crossing the WAN through load balancing, channel-attached CIP routers and 37xx controllers to reach SNA applications and TN3270 on z9xx mainframes via OSA/e, TCP/IP and Enterprise Extender (EE).]

SNA environments will be around for some time and have evolved to become a complex infrastructure. SNA over IP requires many levels of infrastructure. DLSw and EE gateway technologies are not always compatible, and when a problem occurs, diagnosis is very difficult.
Practical Examples
SNA Elimination: Future Environment

[Diagram: remote TN3270 and SNA traffic carried over TCP/IP, through OSA/e into the zSeries Linux Communications Server and Communication Controller, reaching SNA applications on z/OS.]

The zSeries Linux Communications Server, Communications Controller, and SSL server provide the ability to collapse the SNA infrastructure back into the mainframe platform, eliminating the need for distributed SNA appliance technology, which is reaching end-of-life status over the next 12-24 months.
“We project that improving UNIX/Intel workload management will drive average utilization rates from the 15% to 20% to 40% to 50% within three years. When the significant Intel/zSeries annual price/performance improvement gap is overlaid on these projections, it becomes clear that any business case for mainframe Linux will evaporate by 2005/06, in the face of the Linux on Intel juggernaut.”
(Meta Group, “Mainframe Linux Server Consolidation: The Near-Term Business Case”, Delta 2107 Mar 03)
TCO versus TCA!
"Action Item: Investigate all options to consolidate. Closely evaluate the migration costs, all assumptions (including staffing efficiency and over-provisioning for peak workloads), availability requirements and alternative mechanisms for reducing TCO. Those who dismissed Linux on the zSeries two years ago may wish to revisit it because IBM has made progress."
(Gartner, "The IBM Mainframe: 40 Years, Now What?", 30 November – 2 December 2004)

TCO versus TCA!
TCO versus TCA!

• Long-term costs versus initial cost!
  – How long before a hardware push/pull is required?
• Total infrastructure costs versus server hardware cost!
  – How many infrastructure resources does the server require?
• How much capacity will go unused?
  – Low utilization equals poor ROI
  – Utilization only during certain time frames
• Downtime does have a cost!
  – Server outages should include appropriate resolution costs
  – Business outages do cost real dollars
• Ongoing maintenance, monitoring and capacity planning cost real dollars!
  – What real networking, monitoring, admin and capacity planning costs are visible to the project?
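The "low utilization equals poor ROI" point is just arithmetic: what matters is the cost of the capacity you actually use, not the sticker price. A minimal sketch, with all prices and utilization figures invented for illustration:

```python
# Illustrative TCO arithmetic: cost per unit of *used* capacity.
# All prices and utilization figures below are made up for illustration;
# they are not from the presentation.

def cost_per_used_unit(total_cost, capacity_units, utilization):
    """Cost of each capacity unit actually consumed over the server's life."""
    return total_cost / (capacity_units * utilization)

# A cheap server running at 10% utilization...
cheap = cost_per_used_unit(total_cost=5_000, capacity_units=100, utilization=0.10)
# ...versus a pricier shared platform running at 85% utilization.
shared = cost_per_used_unit(total_cost=30_000, capacity_units=800, utilization=0.85)

print(f"cheap box:  ${cheap:,.2f} per used unit")   # $500.00
print(f"shared box: ${shared:,.2f} per used unit")  # $44.12
```

The cheaper box costs less to acquire (TCA) but far more per unit of work it actually does (TCO), which is the distinction the slide is driving at.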
TCO versus TCA!

• Benchmarks are not real workloads!
  – Benchmarks don't represent real production workloads
• One-to-one hardware comparisons don't work!
  – Single-application hardware comparison, e.g. blade versus IFL $$$
• Sharing not considered as part of the model!
  – Workload sharing is becoming a necessity in all environments
  – 24x7 utilization
• Downtime not considered as part of the model!
  – Outages should include appropriate resolution costs
• Infrastructure reduction not considered!
  – Networking, monitoring, admin and capacity planning cost $$$
• On Demand versus excess capacity is a reality on zSeries!
  – Add and remove resources dynamically
  – No unused infrastructure for capacity is required
TCO versus TCA!
One-to-One Comparisons are Misleading

[Diagram: the cost comparison is done against a single Intel 3 GHz server at 40% utilization versus a zSeries with 2 IFLs at 20%. The actual implementation deploys four Intel 3 GHz servers at 10% utilization each, plus dedicated Intel test and QA servers, against zSeries 2-IFL configurations at 10% whose test and QA guests share the same resources.]

The comparison is done on one box, but the deployment is implemented in the standard high-availability configuration, which is much more costly.
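The slide's point can be made numerically. A sketch, assuming purely hypothetical unit prices; the topology (four production boxes plus dedicated test and QA servers on the distributed side, shared test/QA on the zSeries side) follows the diagram:

```python
# Sketch of why one-to-one comparisons mislead. Unit prices are hypothetical;
# the deployment shapes follow the slide's diagram.

intel_box = 8_000        # hypothetical cost of one 3 GHz server
zseries_2ifl = 90_000    # hypothetical cost of one 2-IFL configuration

# Paper comparison: one box against one box.
paper = {"intel": 1 * intel_box, "zseries": 1 * zseries_2ifl}

# Actual implementation: four production boxes plus dedicated test and QA
# servers on the distributed side; the zSeries side deploys two 2-IFL
# configurations whose test/QA guests share the same footprint.
actual = {
    "intel": (4 + 1 + 1) * intel_box,   # 4 prod + test + qa
    "zseries": 2 * zseries_2ifl,        # two sites, test/qa share the IFLs
}

# The apparent price gap narrows from 11.25x on paper to 3.75x as deployed.
print(paper)   # {'intel': 8000, 'zseries': 90000}
print(actual)  # {'intel': 48000, 'zseries': 180000}
```

The exact ratios depend entirely on the invented prices; the structural point is that the distributed side multiplies servers (and their infrastructure) while the virtualized side absorbs test/QA into capacity it already owns.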
TCO versus TCA!
What are the Opportunities?

So far, all of the testing has focused on "primary shift" projects. This only takes advantage of a window of resources available on zSeries Linux, leaving more than 60% of the resources available for other application deployments.

[Diagram: a 24-hour timeline. WAS, Oracle, UDB, Java/DB2 Connect and Web portal workloads run between 8:00 and 5:00; offshore development, extracts and reporting, and other exploitation of the unused timeframe fill the hours between 5:00 and 8:00. The great area of opportunity lies between the end and start of primary shift.]
TCO versus TCA!

[Diagram: prime-shift applications (app servers, DBMS servers, web servers) and non-prime-hours applications (web servers, report extracts, database servers) consolidated onto a z9xx running z/VM (app server, DBMS server, database server and report extract guests) and z/OS (CICS, MQSeries, database server) with tape attached.]

zSeries is designed for sharing, so scaling can be accomplished both vertically and horizontally.
TCO versus TCA!

Heterogeneous workloads can reduce costs. Use the workload management capability of z/VM to allow production peaks to utilize the test and development resources.

• Provision Test/Dev
  – Build as many test/development guests as you can to fill unused resources
  – Set the priority of the test/dev guests low
• Provision Production
  – Build production guests with the intent of satisfying peaks by stealing resources from test/dev
  – Set the priority of the production guests high
• Configure the LPAR with sufficient resources to run both

[Diagram: a z/VM LPAR alongside z/OS, hosting low-priority test guests (WAS 5, Java, DB2 Connect) and high-priority production guests (WAS 5, DB2 Connect).]
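The provisioning scheme above can be sketched as a toy proportional-share allocator: production guests carry a high share and test/dev a low one, so a production peak automatically drains cycles from test/dev. This mirrors z/VM's relative SHARE mechanism in spirit only; the share values and demand figures are invented:

```python
# Toy proportional-share CPU allocator in the spirit of z/VM relative SHARE.
# Share values and demand figures are invented for illustration.

def allocate(capacity, guests):
    """Give each guest CPU in proportion to its share, capped at its demand;
    redistribute leftover capacity among still-hungry guests."""
    alloc = {name: 0.0 for name, _, _ in guests}
    remaining = capacity
    active = list(guests)
    while remaining > 1e-9 and active:
        total_share = sum(share for _, share, _ in active)
        still_hungry, spent = [], 0.0
        for name, share, demand in active:
            give = min(remaining * share / total_share, demand - alloc[name])
            alloc[name] += give
            spent += give
            if alloc[name] < demand - 1e-9:
                still_hungry.append((name, share, demand))
        if spent < 1e-9:
            break
        remaining -= spent
        active = still_hungry
    return alloc

guests = [
    ("prod-was", 1000, 60),    # (name, relative share, CPU demand)
    ("prod-db2c", 1000, 50),
    ("test-1", 100, 40),
    ("test-2", 100, 40),
]
# Total demand (190) exceeds capacity (100): the high-share production
# guests take the bulk of the box while the test guests are squeezed.
print(allocate(100, guests))
```

With these invented numbers, the two production guests end up with roughly 45 units each and the test guests under 5 each; drop production demand and the same code hands the slack back to test/dev, which is the behavior the bullets describe.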
TCO versus TCA!
Current Unix Life Cycle Strategy: The Boundless Proliferation Loop!

[Diagram: an old UNIX server (hardware, software, disk) cabled to the network beside the new UNIX server (hardware, software, disk) built to replace it.]

• Provision the server
  – Floor space, power and hardware
  – OS, network and middleware
• Test the configuration
• Install the application
• QA the configuration
• Run parallel to validate the application
• Cut over to production
• Decommission the old server
TCO versus TCA!
zSeries Linux Life Cycle Strategy: Ending the loop with zSeries Linux!

[Diagram: the old and new environments are both Linux application guests on z/VM (processors, memory, channels) with virtual cables and shared disks and a single cable to the network; replacing a server means provisioning a new guest on the same footprint rather than standing up new hardware.]
TCO versus TCA!
What works in zSeries Linux!

• Test & Dev
• Low utilization
• MQ concentration
• DB2 Connect
• Continuous availability
• eMail server
• z/OS data access
• I/O bound
• DB2 & Oracle app server
• File server
• z/OS utility processing
TCO versus TCA!
How do you decide what works?

[Table: each workload characteristic rated for zLinux fit]
• Needs access to mainframe data or applications
• DB2 Connect & MQSeries concentration
• Test & development
• Low CPU utilization
• High I/O activity
• Infrastructure simplification/reduction
• Non-primary-shift workloads
• Time to market
• Dynamic "On Demand" resource allocation
• CPU-intensive workloads (where CPU is not I/O related)
• Mainframe reliability requirements
• Scalability beyond 4 CPUs
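A checklist like this can be folded into a simple scorer. The positive traits mirror the "What works in zSeries Linux" list; treating CPU-intensive work and beyond-4-CPU scalability as negatives is an assumption drawn from the TCO discussion, not a rule stated in the deck:

```python
# Checklist scorer for "does this workload fit zSeries Linux?".
# Positive traits mirror the deck's "What works in zSeries Linux" slide;
# treating CPU-intensive work and >4-CPU scalability as negatives is an
# assumption, not a stated rule.

GOOD_FIT = {
    "mainframe data access", "db2 connect", "mq concentration",
    "test & dev", "low cpu utilization", "high i/o activity",
    "infrastructure reduction", "non-primary-shift", "on demand",
    "mainframe reliability",
}
POOR_FIT = {"cpu intensive", "scales beyond 4 cpus"}

def fit_score(traits):
    """Count matched good-fit traits minus matched poor-fit traits."""
    traits = {t.lower() for t in traits}
    return len(traits & GOOD_FIT) - len(traits & POOR_FIT)

print(fit_score(["DB2 Connect", "Low CPU utilization", "High I/O activity"]))  # 3
print(fit_score(["CPU intensive", "Scales beyond 4 CPUs"]))                    # -2
```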
44
Manageability of the Virtual Environment
• Virtual Consoles
• Single Console Image Facility (SCIF)
• PROP (CA) or VM/Oper (CA)
• Performance Toolkit
• Standard VM monitor data
• MICS and/or Merrill's MXG
• Integrate with z/OS data
• RMF LPAR reporting
• RMF for Linux
[Diagram: Linux guests and CMS service machines (VM Oper in REXX, Performance Toolkit) running on the z/VM CP hypervisor; virtual consoles are consolidated through SCIF, and CP Monitor data feeds the Performance Toolkit.]
Workload Management
45
[Diagram: Linux guests running Java, C++/FTP, MQSeries, DB2 Connect, and WAS 5 workloads under z/VM (ZVMx) on IFLs with memory and disk, connected through OSA and HiperSockets to z/OS (CPUx) CICS/DBMS regions (PROD, QPRD, RPRD, CPRD) on shared zSeries hardware. Guest counts: z/VM #1 (z900, local) Test 25 / Prod 10; z/VM #2 (z990, local) Test 22 / Prod 12; remote z990 Test 0 / Prod 2; remote z990 Test 0 / Prod 0.]
Workload Management
46
[Chart: average % CPU per 15-minute interval (left axis, 0–35) and number of z/VM guests (right axis, 0–35) on the z900; series: ZVM, Test, Prod, NumTestGuests, NumProdGuests, TotalNumGuests.]
All z/VM accounting data was pulled for the week of 2005/01/31 through 2005/02/04. Only records between 09:00 and 17:00 EST were included, summarized into 15-minute intervals. The chart reflects the average CPU utilization for the week, normalized to 100%.
z/VM #1 Average CPU Usage for 01/31-02/04
Workload Management
47
[Chart: ZVM4 Paging 01/31-02/04. Demand paging rate (left axis, 0–30) and guest size in megabytes (right axis, 750–805); series: Prod, Test, ZVM Total, AvgProdGuestSize, AvgTestGuestSize.]
All z/VM accounting data was pulled for the week of 2005/01/31 through 2005/02/04. Only records between 09:00 and 17:00 EST were included, summarized into 15-minute intervals. The chart reflects the average server size for production and test as well as the demand paging rate for each.
z/VM #1 Average Paging for 01/31-02/04
Workload Management
48
[Chart: ZVM5 Average CPU Usage 01/31-02/04 on the z990. Average % CPU per 15-minute interval (left axis, 0–35) and number of z/VM guests (right axis, 0–35); series: ZVM, Test, Prod, NumTestGuests, NumProdGuests, TotalNumGuests.]
All z/VM accounting data was pulled for the week of 2005/01/31 through 2005/02/04. Only records between 09:00 and 17:00 EST were included, summarized into 15-minute intervals. The chart reflects the average CPU utilization for the week, normalized to 100%.
z/VM #2 Average CPU Usage for 01/31-02/04
Workload Management
49
[Chart: ZVM5 Paging 01/31-02/04. Demand paging rate (left axis, 0–16) and guest size in megabytes (right axis, 0–700); series: Prod, Test, ZVM Total, AvgProdGuestSize, AvgTestGuestSize.]
All z/VM accounting data was pulled for the week of 2005/01/31 through 2005/02/04. Only records between 09:00 and 17:00 EST were included, summarized into 15-minute intervals. The chart reflects the average server size for production and test as well as the demand paging rate for each.
z/VM #2 Average Paging for 01/31-02/04
Workload Management
50
Based on the current usage patterns of the ZVM5 infrastructure, the average production utilization is ~0.5% and that of test is ~0.25%. The chart below shows how that utilization would scale across 8+ CPUs in one or more environments. This would allow for ~0.5 hours of CPU utilization for each production guest.**
[Chart: z990 Scaling 20/80 - Prod .5% CPU / Test .25% CPU. Number of test guests (0–3500) versus number of processors (1–11); series: z990 Prod Scaling, z990 Test Scaling.]
**SMP scaling factor not included in this chart.
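The guest-packing arithmetic behind charts like this can be sketched in a few lines. This is a back-of-the-envelope model, not the deck's actual spreadsheet: it assumes one processor represents 100% of capacity, the slide's 20/80 production/test mix, and the stated averages of 0.5% CPU per production guest and 0.25% per test guest, and, like the chart, it ignores SMP scaling overhead.

```python
def guests_per_cpus(cpus, prod_util=0.5, test_util=0.25, prod_mix=0.2):
    """How many guests fit in `cpus` processors at the given average
    per-guest utilizations, holding a fixed production/test mix (20/80)."""
    capacity = cpus * 100.0                      # total % CPU available
    # Average cost of one guest drawn from the 20/80 mix, in % CPU.
    avg_cost = prod_mix * prod_util + (1 - prod_mix) * test_util
    total_guests = capacity / avg_cost
    return int(total_guests * prod_mix), int(total_guests * (1 - prod_mix))

prod, test = guests_per_cpus(8)
print(prod, test)  # 533 production and 2133 test guests on 8 processors
```

Swapping in the next slide's estimates (10% production, 0.5% test) shows why the heavier assumption needs roughly four times the processors for far fewer guests.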
Workload Management
51
Assuming estimated usage of 10% for production guests and 0.5% for test guests, the chart below shows how that utilization would scale across 31+ CPUs in one or more environments. This would allow for ~2.5 hours of CPU utilization for each production guest.**
[Chart: z990 Scaling 20/80 - Prod 10% CPU / Test .5% CPU. Number of test guests (0–1600) versus number of processors (3–40); series: z990 Prod Scaling, z990 Test Scaling.]
**SMP scaling factor not included in this chart.
Workload Management
52
[Flow: a single Linux guest runs both the chat server and the chat client. Start server (1–16 JVMs) → start client → exit test? No: repeat; Yes: collect results.]
Each user added increases the thread count by 16.
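The benchmark itself was Java-based; purely as an illustration, the same loop structure can be sketched in Python. This is a hypothetical stand-in, not the original harness: a threaded echo server plays the chat server, and each simulated user contributes 16 client threads that count round-trip messages over a fixed interval.

```python
import socket
import threading
import time

HOST = "127.0.0.1"
DURATION = 0.5                  # seconds per measurement interval
THREADS_PER_USER = 16           # each user added increases the thread count by 16

def serve(listener):
    """Chat-server stand-in: echo every message back to its sender."""
    def handle(conn):
        with conn:
            try:
                while data := conn.recv(64):
                    conn.sendall(data)
            except OSError:
                pass            # client hung up mid-exchange
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:         # listener closed: benchmark is over
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

def client(port, counts, i):
    """One client thread: exchange messages until the interval ends."""
    deadline = time.monotonic() + DURATION
    with socket.create_connection((HOST, port)) as s:
        n = 0
        while time.monotonic() < deadline:
            s.sendall(b"msg")
            s.recv(64)          # wait for the echo
            n += 1
    counts[i] = n

def benchmark(users=1):
    """Return total messages per second across users * 16 client threads."""
    listener = socket.socket()
    listener.bind((HOST, 0))
    listener.listen()
    port = listener.getsockname()[1]
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    counts = [0] * (users * THREADS_PER_USER)
    threads = [threading.Thread(target=client, args=(port, counts, i))
               for i in range(len(counts))]
    for t in threads: t.start()
    for t in threads: t.join()
    listener.close()
    return sum(counts) / DURATION

if __name__ == "__main__":
    print(f"{benchmark(users=1):.0f} messages per second")
```

Raising `users` reproduces the "each user adds 16 threads" step; the reported figure is the aggregate message rate, analogous to the messages-per-second charts that follow.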
Workload Management
53
[Flow: four Linux guests (zLinux 1–4), each running the chat client/server pair with QUICKDSP and a relative share of 400. Start server (1–16 JVMs) → start client → exit test? No: repeat; Yes: collect results.]
Test was run on all four guests simultaneously to simulate multiple high priority workloads. Each guest was set at a relative share of 400 with quick dispatch.
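The "relative share of 400 and quick dispatch" settings live in each guest's CP directory entry, or can be set dynamically. A sketch of the relevant directory statements, with a placeholder user ID, password, and storage sizes:

```
USER LNXCHAT1 XXXXXXXX 512M 1G G
  OPTION QUICKDSP
  SHARE RELATIVE 400
```

At run time the equivalent privileged commands are CP SET SHARE userid RELATIVE 400 and CP SET QUICKDSP userid ON.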
Workload Management
54
Average number of messages per second
[Chart: messages per second (0–20,000) over a ten-minute run (0:00–0:10); series: ldal5004, ldal5005, ldal5006, ldal5007.]
Workload Management
55
Average number of messages per second
[Chart: messages per second (0–7,000); series: ldal5004, ldal5005, ldal5006, ldal5007.]
Workload Management
56
[Flow: four Linux guests (zLinux 1–4), each running the chat client/server pair with QUICKDSP and relative shares of 400, 300, 200, and 100 respectively. Start server (16 JVMs) → start client → exit test? No: repeat; Yes: collect results.]
Test was run on all four guests simultaneously to simulate multiple high priority workloads. The guests were set at relative shares of 400, 300, 200, and 100, all with quick dispatch.
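When every guest is demanding CPU, a relative share buys a proportional slice of the processor: each guest's expected fraction is its share divided by the sum of competing shares. A minimal sketch of that arithmetic (not of CP's scheduler itself):

```python
def entitlements(shares):
    """Expected CPU fraction per guest when all guests are CPU-bound:
    each relative share divided by the sum of competing relative shares."""
    total = sum(shares.values())
    return {guest: share / total for guest, share in shares.items()}

split = entitlements({"zLinux1": 400, "zLinux2": 300, "zLinux3": 200, "zLinux4": 100})
print(split)  # zLinux1 -> 0.4, zLinux2 -> 0.3, zLinux3 -> 0.2, zLinux4 -> 0.1
```

This is only the idealized entitlement under full contention; actual dispatching also depends on QUICKDSP and on whether every guest is in fact CPU-bound.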
Workload Management
57
[Chart: transactions per second (0–18,000) over time (0:01–0:36); series: zLinux 1, zLinux 2, zLinux 3, zLinux 4.]
Workload Management
58
[Flow: a single Linux guest running the chat client/server with a relative share of 400. Start server (16 JVMs) → start client → exit test? No: repeat; Yes: collect results.]
Test was run on one guest with 4 and 8 CPUs. 1–16 JVMs were started, and each test was run with 2–16 threads per JVM.
Workload Management
59
Benchmark results in the Poughkeepsie benchmark center showed scalability to 8 processors and beyond for a single guest. The chart below shows that as the number of JVMs and the number of threads per JVM increase, throughput scales dramatically until processor capacity reaches 100 percent, as indicated by the red shade.
[Chart: 4 CP Scaling. Messages per second (0–100,000, left axis) and number of CPUs (0–16, right axis) versus NumJVMs (1, 2, 4, 8, 16); series: 2Thread, 4Thread, 8Thread, 16Thread, NumCPUs.]
Preliminary results
Workload Management
60
Benchmark results in the Poughkeepsie benchmark center showed scalability to 8 processors and beyond for a single guest. The chart below shows that as the number of JVMs and the number of threads per JVM increase, throughput scales dramatically until processor capacity reaches 100 percent, as indicated by the red shade.
[Chart: 8 CP Scaling. Messages per second (0–180,000, left axis) and number of CPUs (0–16, right axis) versus NumJVMs (1, 2, 4, 8, 16); series: 2Thread, 4Thread, 8Thread, 16Thread, NumCPUs.]
Preliminary results
Workload Management
61
[Flow: twenty Linux guests running the chat client/server; zLinux 1–3 at relative share 400, zLinux 4–6 at 200, zLinux 7–20 at 100. Start server (16 JVMs) → start client → exit test? No: repeat; Yes: collect results.]
Test was run on all twenty guests simultaneously to simulate multiple high priority workloads. Guests 1–3 had a relative share of 400, guests 4–6 a share of 200, and the remaining guests a share of 100.
Workload Management
62
Benchmark results in the Poughkeepsie benchmark center showed scalability to 8 processors and beyond for multiple guests. The chart below contrasts a single guest (orange) running 1–16 JVMs with 2 threads per JVM against 20 guests with varying workloads.
[Chart: 8 CP Workload Scaling Across 20 Guests. Messages per second (0–160,000, left axis) and % CPU utilization (0–120, right axis); series: High, Medium, Low, NumCPUs, % CPU, 2ThreadDedicated.]
Preliminary results
Workload Management
63
Benchmark results in the Poughkeepsie benchmark center showed scalability to 16 processors and beyond for multiple guests. The chart below contrasts a single guest (orange) running 1–16 JVMs with 2 threads per JVM against 20 guests with varying workloads.
[Chart: 16 CP Workload Management. Messages per second (0–250,000, left axis) and % CPU utilization (0–100, right axis) versus measurement interval (1–28); series: High, Medium, Low, Best 8CP 2Thread Run, NumCPUs, % CPU.]
Preliminary results
Workload Management
64
Benchmark results in the Poughkeepsie benchmark center showed scalability to 16 processors and beyond for multiple guests. The chart below contrasts varying workloads across 20 guests with 8 and 16 CPUs.
8 vs 16 CPU Scalability
[Chart: messages per second (0–250,000) versus measurement interval (1–28); series: 16 CP Scalability, 8 CP Scalability, with logarithmic trendlines for each.]
Preliminary results
Workload Management
65
Virtualization in the Real World - A customer experience
66
Virtualization in the Real World - A customer experience