RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka For additional information visit www.totalsitesolutions.com


Page 1:

RAMPS©

Reliability, Availability, Maintainability, Predictability, Scalability

Presented by Joe Soroka

For additional information visit www.totalsitesolutions.com

Page 2:

RELIABILITY
AVAILABILITY
MAINTAINABILITY
PREDICTABILITY
SCALABILITY

While budgets may be tighter, the requirement for maximum uptime has not gone away.

The design of your facility is only one piece of the pie that will affect your site’s uptime.

It is important to be aware of how Reliability, Availability, Maintainability, Predictability and Scalability all affect your site’s uptime.

Page 3:

RELIABILITY

Reliability is the ability of a system to perform and maintain its functions in routine circumstances, as well as in hostile or unexpected circumstances.

Page 4:

Reliability

What is reliability?
• Weibull
• Markov Reward modeling

Modeling
• IEEE Gold Book
• Procedures: accurate, confirmed/tested

Equipment selection
• Generator
• UPS systems
• EPO systems
• Switchgear
• Monitoring systems

Page 5:

Reliability

• Reliability
• Reliability modeling
• Equipment
• Commissioning
• Operations & maintenance

Page 6:

Reliability

• Bathtub curve of reliability
  – Infant mortality
    • Burn-in/load testing
    • Commissioning
  – Useful life
    • Proper maintenance
  – End of life
    • Identify and replace prior to entering this period
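The three regions of the bathtub curve are commonly modeled with a Weibull hazard (failure-rate) function. The sketch below is not from the presentation; the function names and the shape and scale values are illustrative assumptions. A shape below 1 gives a falling failure rate (infant mortality), a shape of 1 gives a constant rate (useful life), and a shape above 1 gives a rising rate (end of life).

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_trend(shape, t_early=100.0, t_late=5_000.0, scale=10_000.0):
    """Compare the failure rate early and late in life (hours are assumed units)."""
    early = weibull_hazard(t_early, shape, scale)
    late = weibull_hazard(t_late, shape, scale)
    if math.isclose(early, late):
        return "constant"
    return "rising" if late > early else "falling"


# Assumed, illustrative shape parameters for each bathtub region.
for phase, shape in [("infant mortality", 0.5),
                     ("useful life", 1.0),
                     ("end of life", 3.0)]:
    print(f"{phase}: failure rate is {hazard_trend(shape)}")
```

Burn-in and commissioning aim to get equipment past the falling-rate region before it carries critical load; planned replacement aims to retire it before the rising-rate region.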

Page 7:

Reliability

• The reliability of a series system is no greater than that of its weakest component
• In a complex system you need to identify and quantify the importance of each component in the system
• A reliability block diagram is a graphical representation of the components of the system and how they are related in terms of reliability
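For a series arrangement, where every block must work for the system to work, the system reliability is the product of the component reliabilities, which can never exceed the weakest one. The sketch below assumes independent failures; the component names and reliability values are hypothetical and only illustrate the arithmetic.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a series chain, assuming independent component failures."""
    values = list(component_reliabilities)
    if not values or any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("provide at least one reliability between 0 and 1")
    return prod(values)


# Hypothetical single power path; values are illustrative, not measured data.
power_path = {
    "generator": 0.990,
    "ATS": 0.995,
    "UPS": 0.980,
    "PDU": 0.999,
}

system = series_reliability(power_path.values())
weakest = min(power_path, key=power_path.get)
print(f"system reliability: {system:.4f}")
print(f"weakest component:  {weakest} ({power_path[weakest]:.3f})")
```

The system figure comes out below the weakest component's figure, which is why a reliability block diagram is useful for finding where improvement pays off most.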

Page 8:

Reliability

• Many of the reliability design ideas share a common philosophy with those recommended for availability
• This is because there is a very close relationship between reliability and availability
• While reliability is about how long a system runs between failures, availability is about the ability of a system to tolerate failures and how long it remains accessible to its users
• When a system's components and services are highly reliable, they cause fewer failures from which to recover and thereby help increase availability
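One common way to express this relationship is steady-state availability, A = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. This formula is a standard textbook relation rather than something stated on the slide, and the numbers below are assumed for illustration.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Assumed figures: improving reliability (longer MTBF) or
# maintainability (shorter MTTR) both raise availability.
baseline = steady_state_availability(mtbf_hours=999.0, mttr_hours=1.0)
more_reliable = steady_state_availability(mtbf_hours=9_999.0, mttr_hours=1.0)
faster_repair = steady_state_availability(mtbf_hours=999.0, mttr_hours=0.5)

print(f"baseline:      {baseline:.5f}")
print(f"more reliable: {more_reliable:.5f}")
print(f"faster repair: {faster_repair:.5f}")
```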

Page 9:

Reliability
Equipment

• Major manufacturers
  – Past experiences
  – Local maintenance support
  – Parts distribution centers
• Fine line between leading edge and bleeding edge
• Formal submittal review meetings

Page 10:

Reliability
Equipment

• Generator isolation valves
• ATS bypass
• TVSS indicators and alarms
• Lightning protection
• EPO systems
  – Wiring
  – Control relays
  – Covers
  – Diagrams
  – Testing
  – Day 2 changes

Page 11:

Reliability
Equipment

• Generators
  – Redundant batteries
  – Battery monitoring
  – Fuel level monitoring
  – Water jacket heater isolation valves
  – Silicone heater hoses
  – Coolant level pre-alarms, both cores
  – Water separators (Racor filters) with alarms
  – Engine diagnostic link

Page 12:

Reliability
Equipment

• UPS systems
  – Dual input
  – Maintenance bypass cabinet
  – Advanced monitoring
  – Battery monitoring
  – Redundant battery strings for VRLAs
  – Site-specific procedures

Page 13:

Reliability
Equipment

• Automatic Transfer Switches (ATS)
  – Maintenance bypass or wrap-around breakers
  – Phase sync monitoring
  – Pause Neutral/dual solenoids
  – Monitoring
• Transient Voltage Surge Suppression (TVSS)
  – Monitoring
  – Indication of operation
  – Surge counter

Page 14:

Reliability
Equipment

• EPO systems
  – Wiring in conduit and not in open plenum
  – Control relay coils should not be energized until activation
  – Secondary covers installed over the EPO buttons
  – Detailed and accurate schematic diagrams
  – System should be designed so it can be tested
  – System should be capable of making day 2 changes without risk
  – Part of an engineered drawing and not a cloud saying “by others”

Page 15:

Reliability
Equipment

• Thermal runaway
  – Increased heat density
    • Reduces time to thermal runaway
    • Increases the need for a reliable HVAC system
    • Specialized HVAC systems
    • Possibly switching from emergency to UPS power
    • Long UPS battery runtimes may be unclear
• Rack layout, equipment airflow direction
  – Cold/hot aisle
  – Enclosed hot aisles
• Rack type
  – Doors
  – Vents
  – Fans

Page 16:

Reliability
Equipment

• Water storage
  – Chilled water
    • In the event of a power outage or temporary chiller failure, do you have the capability to ride through?
  – Makeup water
    • How reliable is the city water supply?
    • Do you have diverse sources?
    • Water storage tanks
    • Well
    • Other water sources
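A first-pass answer to the chilled water ride-through question can be estimated from the thermal energy a storage tank can absorb: energy = volume × density × specific heat × usable temperature rise, divided by the cooling load. The sketch below is an idealized estimate, not a design method from the presentation. It assumes the whole tank volume is usable, the pumps and air handlers stay powered, and standard water properties apply; the tank size, temperature rise and load are hypothetical.

```python
WATER_DENSITY_KG_PER_M3 = 1000.0       # approximate, near chilled-water temperatures
WATER_SPECIFIC_HEAT_KJ_PER_KG_K = 4.186


def ride_through_minutes(tank_volume_m3, usable_delta_t_k, cooling_load_kw):
    """Idealized minutes of cooling available from stored chilled water."""
    if tank_volume_m3 <= 0 or usable_delta_t_k <= 0 or cooling_load_kw <= 0:
        raise ValueError("all inputs must be positive")
    stored_kj = (tank_volume_m3 * WATER_DENSITY_KG_PER_M3
                 * WATER_SPECIFIC_HEAT_KJ_PER_KG_K * usable_delta_t_k)
    seconds = stored_kj / cooling_load_kw  # kJ / (kJ/s)
    return seconds / 60.0


# Hypothetical site: 40 m^3 tank, 6 K usable temperature rise, 500 kW load.
minutes = ride_through_minutes(40.0, 6.0, 500.0)
print(f"estimated ride-through: {minutes:.1f} minutes")
```

Real ride-through is shorter than this ideal figure because of mixing in the tank and piping losses, so the result is best treated as an upper bound to compare against chiller restart time.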

Page 17:

Reliability
Commissioning

With each project being unique, there is a need to determine how much commissioning is appropriate for the project. Factors that influence this decision include:

• Building’s mission-criticality
• Facility’s use or purpose
• Complexity of the building’s systems
• Building type and size
• Project type, whether existing building system or retrofit, or both
• Building tenant or occupant demographics
• System reliability requirements
• Owner’s objective in commissioning the building: IAQ, system reliability and/or energy efficiency
• Project budget

Page 18:

Reliability
Operation and Maintenance

• Use a pilot/copilot approach
  – Commercial airplanes do not fly with just one pilot. Why would you?
• Standardize as much as possible
  – Standard procedures
  – Standard process
• Use a Computerized Maintenance Management System (CMMS)
  – Timely reports and schedules
  – Accurate information
  – Archive past performance
  – Instant access to information

Page 19:

AVAILABILITY

Availability is the ability of a system to tolerate failures.

It refers to the time that a system is available to its users.

This means the process continues to be served through the failure and that, ideally, the failure is transparent to the user.

Page 20:

Availability

• Availability
• Design
• Resources
• Procedures

Page 21:

Availability

• Availability is typically expressed by the number of nines
• Downtime per year

Availability   # of nines   Downtime
90%            1 nine       36.5 days/year
99%            2 nines      3.65 days/year
99.9%          3 nines      8.76 hours/year
99.99%         4 nines      52 minutes/year
99.999%        5 nines      5 minutes/year
99.9999%       6 nines      31 seconds/year
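The downtime figures in the table follow directly from the availability percentage: downtime per year = (1 − availability) × hours in a year. The short sketch below reproduces that arithmetic; it assumes a 365-day year (8,760 hours), and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Expected downtime per year, in hours, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


for percent in (90.0, 99.0, 99.9, 99.99, 99.999, 99.9999):
    hours = downtime_hours_per_year(percent)
    print(f"{percent}%: {hours:.4f} hours/year "
          f"({hours * 60:.2f} minutes, {hours * 3600:.1f} seconds)")
```

The table rounds the last three rows; the unrounded values are roughly 52.6 minutes, 5.3 minutes and 31.5 seconds per year.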

Page 22:

Availability

• Failures can be attributed to the following causes:
• Design failures
  – This class of failures takes place due to inherent design flaws in the system. In a well-designed system, this class of failures should make a very small contribution to the total number of failures
• Infant mortality
  – This class of failures causes newly manufactured hardware to fail. This type of failure can be attributed to manufacturing problems such as poor soldering, leaking capacitors, etc.
  – These failures should not be present in systems leaving the factory, as these faults will show up in proper factory system burn-in tests

Page 23:

Availability

• Random failures
  – Random failures can occur during the entire life cycle of a system. These failures can lead to system failures. Redundancy is provided to recover from this class of failure
• Wear out
  – Once a hardware module has reached the end of its useful life, degradation of component characteristics will cause hardware modules to fail. These types of faults can be weeded out by preventive maintenance and routing of hardware

Page 24:

Availability
Design

• Designing systems with sufficient levels of redundancy
• Eliminating single points of failure
• Availability design guidelines
  – Consult your engineer
  – TIA standard: TIA-942
  – Uptime Institute: Tier definitions
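The effect of redundancy on availability can be illustrated with a k-out-of-n model: the system is up when at least k of n identical units are up. The sketch below is a simplified illustration, not a method from the presentation or from the cited standards. It assumes identical units that fail independently, which real systems only approximate, and the 0.99 unit availability is an assumed value.

```python
from math import comb


def k_of_n_availability(k, n, unit_availability):
    """Probability that at least k of n independent, identical units are up."""
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit availability must be between 0 and 1")
    a = unit_availability
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(k, n + 1))


UNIT = 0.99  # assumed availability of a single UPS module

print(f"N   (2 of 2): {k_of_n_availability(2, 2, UNIT):.6f}")
print(f"N+1 (2 of 3): {k_of_n_availability(2, 3, UNIT):.6f}")
print(f"2N  (2 of 4): {k_of_n_availability(2, 4, UNIT):.6f}")
```

Adding one redundant unit removes the single point of failure in this model. Shared components such as common switchgear or controls would break the independence assumption and need to be modeled separately.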

Page 25:

Availability
Design

• System design should have multiple paths
  – Active or passive, depending upon the site reliability requirements
  – If redundant paths need to be value-engineered (VE’d) out to meet the project budget, consider adding the breaker or valve now; later, when the budget allows, add the actual feed
  – By adding the breaker or valve up front you will be able to install temporary cable or piping when an emergency arises

Page 26:

Availability
Design

• When performing maintenance that decreases system redundancy, move the reduction of availability away from the critical load and toward the utility as much as possible
  – For example, if you have a system-plus-system design and you are going to take the UPS out of service for maintenance, do not just open the UPS system and let the downstream dual-cord devices and static transfer switches handle the loss of redundancy
  – Place the UPS in maintenance bypass to continually feed the second source with stable power
  – Better yet, place the UPS on generators or an alternate UPS supply to avoid sending unprotected utility power to the critical load

Page 27:

Availability
Resources

• Technical resources
  – Operation staff
  – Response staff
  – Maintenance & repair staff
• Parts
  – Onsite spares
  – Manufacturer spares
  – Vendor spares
  – Supply houses

Page 28:

Availability
Resources: Operation Staff

• Operation staff
  – Whether you are using in-house or contracted staff, it is important to ensure they have the proper resources
    • Proper access to the facility
    • If using a key card system, what happens when the card readers lose power? Who has the keys?
    • Do you have all of your operation staff’s phone numbers?
      – Cell numbers and home numbers
      – Company and personal emails

Page 29:

Availability
Resources: Response Staff

• Emergency response
  – Types of emergency responses
    • Additional operation staff
    • Electrical, mechanical & plumbing contractors
    • General construction
    • Testing and repair firms
    • Fire and security
    • Hazardous material spill
  – List of suppliers and vendors
    • Emergency contact information
    • Alternate contact information
  – Contracts in place to execute after-hours support
  – Meet them before an emergency arises; have them at the site for lunch

Page 30:

Availability
Resources: Maintenance & Repair Staff

• Do you have the necessary contracts in place?
• Is there maintenance your operation staff can perform in house?
• Do you have alternate contact numbers for your maintenance providers?
• Do they have proper access to the facility?
• Do you have a second string waiting on the sidelines in case of an emergency?

Page 31:

Availability
Resources: Parts

• Parts and supplies
  – Define and assess critical parts
  – Stock critical parts onsite
    • Have an annual budget for spare parts that increases a little each year
  – Verify that your vendors and contractors have spare parts handy
  – Identify supply houses and suppliers that have parts you need
  – Have after-hours phone number(s) to get parts from supply houses
  – Have contracts in place and make sure they are active

Page 32:

Availability
Procedures

• Operation
• Maintenance
• Emergency
• Troubleshooting

Page 33:

Availability
Procedures: Operation

• Operation procedures
  – Have detailed procedures that are specific to your developed site
  – Procedures should be tested and verified
  – Procedures should be inventoried and updated regularly
  – Operating procedures should be placed at the point of use and not locked up in the building manager’s office

Page 34:

Availability
Procedures: Maintenance

• Maintenance procedures
  – Have detailed procedures for maintenance
  – Ask your maintenance provider to furnish all of the required maintenance procedures prior to performing maintenance, so you can review and comment on them
  – Use detailed procedures during your maintenance activities
  – Review procedures after the maintenance has been completed

Page 35:

Availability
Procedures: Emergency

• Emergency procedures
  – In case of an emergency, where are your procedures?
  – Can you access them?
  – Are they at multiple locations?
  – During an emergency is not the time to try to figure out how to restore a system
  – Perform dry runs on the procedures at least once a year
  – Update and change, as required

Page 36:

Availability
Procedures: Troubleshooting

• Manuals
  – Available
  – Correct
• Drawings
  – Available and complete
  – As-builts
• Develop troubleshooting flow diagrams

Page 37:

MAINTAINABILITY

Maintainability is defined as the probability of performing a successful repair action or preventative maintenance within a given time.

In other words, maintainability measures the ease and speed with which a system can be restored to operational status.
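Because maintainability is defined here as a probability of completing a repair within a given time, it can be written as a function M(t). A common simplification, assumed here and not stated in the presentation, is that repair times are exponentially distributed, which gives M(t) = 1 − exp(−t / MTTR), where MTTR is the mean time to repair. The MTTR values below are hypothetical.

```python
import math


def maintainability(t_hours, mttr_hours):
    """Probability a repair completes within t_hours (exponential repair model)."""
    if t_hours < 0 or mttr_hours <= 0:
        raise ValueError("t must be non-negative and MTTR positive")
    return 1.0 - math.exp(-t_hours / mttr_hours)


# Hypothetical comparison: chance of restoring service within a 4-hour window
# for a site with good access, spares and procedures versus one without.
for label, mttr in [("well-prepared site (MTTR 2 h)", 2.0),
                    ("poorly prepared site (MTTR 8 h)", 8.0)]:
    print(f"{label}: {maintainability(4.0, mttr):.1%} repaired within 4 h")
```

Under this model, the design, staffing, parts and procedure measures covered in this section all act by shortening MTTR.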

Page 38:

Maintainability

• Design
• Equipment
• Staff
• Location
• Maintenance program
• Training
• Coordination
• Maintenance windows

Page 39:

Maintainability
Design

• Goals of maintainability
  – Maximize efficiency and accuracy of on-line replacement of system components
  – Facilitate and minimize troubleshooting time at each level of maintenance activity
  – Allow test, checkout, troubleshooting and repair procedures to be unit-specific and structured to aid in identification of faulty units, then sub-units
  – Reduce downtime
  – Provide easy access to malfunctioning components
  – Allow for a high degree of standardization
  – Minimize time and cost of maintenance training
  – Simplify new equipment design and shorten design time by using previously developed, standard building blocks

Page 40:

Maintainability
Design

• Equipment access
• Labeling
• Minimize troubleshooting time
  – Monitoring
  – Procedures
  – Standardization
  – Test and service points

Page 41:

Maintainability
Design: Equipment Accessibility

• Accessibility refers to the relative ease with which a system can be accessed
  – Sufficient clearance to use the tools needed to complete the tasks
  – Adequate space to permit convenient removal and replacement of components
  – Adequate visual exposure to the task area
  – Adequate safety and working clearances
  – Adequate space for required rigging equipment
  – Adequate hallway, corner and door clearances back to the loading dock

Page 42:

Maintainability
Design: Ease of Removal and Replacement

• Equipment rooms should be designed so that rapid, safe and easy removal and replacement of malfunctioning components can be accomplished by one technician, when possible

Page 43:

Maintainability
Design: Labeling

• Labeling should:
  – Identify a specific device
  – Identify the purpose or function of a specific device
  – Present critical information
  – Present safety information
  – Be legible
  – Use contrasting colors
• Ensure that your labeling is controlled to maintain its accuracy and standardization
• Periodic inspections and examinations

Page 44:

Maintainability
Design: Minimize Troubleshooting Time

• Comprehensive monitoring
• Procedures
• Standardization
• Test and service points

Page 45:

Maintainability
Design: Monitoring

• Monitoring capabilities
  – Event notification
  – Event reconstruction
  – Event mitigation
  – Determine maintenance frequencies
  – Allow for accurate and efficient communication of events

Page 46:

Maintainability
Design: Monitoring

• What type of monitoring system do I need?
  – No monitoring
    • Not recommended for any mission-critical facility
  – Remote Alarm Status Panel (RASP)
    • No trending or time stamping
    • Gives visual and audible notification
    • Usually for one device or system
  – Monitoring with dry contacts
    • Limited number of points
    • Limited time stamping
    • Status is either on or off
  – Serial interfaces
    • Comprehensive data
    • Data points with values rather than on/off
    • Flexible and expandable
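The difference between dry-contact and serial monitoring can be shown with the data each one yields. The sketch below is a hypothetical data model, not any vendor's protocol or API; the class names, point names and threshold are all assumptions. A dry contact carries a single on/off state, while a serial point carries a time-stamped value that can be trended and checked against a pre-alarm threshold before a hard alarm trips.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DryContactReading:
    point: str
    closed: bool  # the only information available: on or off


@dataclass(frozen=True)
class SerialReading:
    point: str
    value: float
    unit: str
    timestamp: datetime


def pre_alarms(readings, warn_below):
    """Return value-based readings that have dropped below a warning threshold."""
    return [r for r in readings if r.value < warn_below]


# Hypothetical generator fuel-level history, in percent of tank capacity.
history = [
    SerialReading("gen1.fuel_level", 80.0, "%", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    SerialReading("gen1.fuel_level", 55.0, "%", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    SerialReading("gen1.fuel_level", 28.0, "%", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
low_fuel_switch = DryContactReading("gen1.low_fuel", closed=False)

for reading in pre_alarms(history, warn_below=30.0):
    print(f"pre-alarm: {reading.point} at {reading.value}{reading.unit} "
          f"({reading.timestamp:%Y-%m-%d})")
print(f"dry contact state: {'ALARM' if low_fuel_switch.closed else 'normal'}")
```

In this example the value-based point flags the falling fuel level while the dry contact still reports normal, which is the practical benefit of data points with values rather than on/off.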

Page 47:

Maintainability
Design: Procedures

• Emergency Operating Procedures (EOP)
  – Developed for failure modes
  – Readily available for use: locate at point-of-service
  – Should be developed and tested during the commissioning phase
  – Detailed: switch level
  – Update any changes discovered
• Method Operating Procedure (MOP)
  – Developed for all operations
  – Detailed: switch level
  – Have back-out procedures included
  – Use with pilot/copilot approach
  – Update any changes discovered
  – Should be developed and tested during the commissioning phase

Page 48:

Maintainability
Design: Procedures

• Troubleshooting procedures
  – Troubleshooting flow charts
  – Restoration procedures
• Maintenance procedures
  – Detailed procedures
  – Include measurement points for future trending
  – Used and completed during maintenance

Page 49:

Maintainability
Design: Procedures

• Common procedure error traps
  – In-field decisions
  – Vague instructions
  – Undefined or uncommon terms
  – Burdensome or complex instructions
  – Multiple actions
  – Inconsistent statements or actions
  – Misleading or missing critical information
  – Interfacing with external procedures
  – Lack of ownership
  – Lack of quality assurance review

Page 50:

Maintainability
Design: Standardization

• Standardization ensures consistency and comparability of knowledge and parts
  – Acronyms
    • Reduce confusion
  – Manufacturers
    • Reduced spare part counts
    • Familiarization with operations and maintenance
  – Layouts
    • Reduce confusion
    • Increase ease-of-use
  – Labeling
    • Reduce confusion

Page 51:

Maintainability
Design: Test and Service Points

• Test points provide a means for conveniently and safely determining the operational status of equipment and isolating malfunctions
• Test points, strategically placed, make signals available to the technician for checking, adjusting or troubleshooting
• Service points provide a means for lubricating, filling, draining, charging and similar functions

Page 52:

Maintainability
Design: Test and Service Points

• General principles for test and service points
  – Avoiding need for frequent testing and service
  – Standardization
  – Test and service point compatibility
  – Labeling dangerous test and service compatibility
  – Distinctively different connectors and fittings
  – Location of test, service and adjustment points

Page 53:

Maintainability
Equipment

• Ordering the right accessories with your equipment can make a big difference when it comes to the maintainability of your equipment
• When ordering equipment or reviewing design documents, solicit input from the operations and maintenance staff involved
• It’s much cheaper to order it right the first time than to upgrade it later in the field

Page 54:

Maintainability
Equipment: Generators

• Water separators for fuel
• Radiator water level
• Isolation valves on water jacket heaters
• Generator-mounted circuit breakers
• Battery cables
• Battery monitoring
• Fuel-level monitor

Page 55:

Maintainability
Equipment: Switchgear

• Annual infrared thermal scanning
• Protective relays
• Breaker testing
• PLC code
  – Hard copy
  – Up-loadable copy
• Beware of small UPS systems
• Station batteries
• Internal cleaning
• Mimic bus

Page 56: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

MaintainabilityMaintainability

• Maintenance bypass– Order it with a maintenance bypass or

design the system to have a manually operated breaker bypass to wrap around the ATS to both sources

Equipment

Automatic Transfer SwitchesAutomatic Transfer Switches


Page 57: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• AC filter capacitors

– 3-5 years

• DC filter capacitors

– 3-5 years

• Transfer circuits

– Capture the transfer between UPS and bypass

• Procedures

– Detail PM procedures

– Capture before and after readings

• Calibration/maintenance

– Capture details

– Don’t just do a “dust and clean” PM

Equipment

UPS Systems


Page 58: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• VLA (flooded)

– Vented lead acid

– Quarterly maintenance

• VRLA (sealed)

– Valve-regulated lead acid

– Semi-annual maintenance

• Float voltage

• Room temperature

• Proper maintenance

• Water as required

• Battery monitoring

• Batteries are found in

– UPS systems

– Generators

– Switchgear

– PLCs and breakers

– Telecom equipment

Equipment

Batteries


Page 59: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Shutdown alarms

– Identify and understand them

• EPO circuits

– If used, is it maintainable?

• Monitoring

– Main

– Sub-panels

– Branch circuit breakers

• Snap-in vs. bolt-in breakers

– Use bolt-in breakers only

• Transformers

– K-rated

Equipment

PDUs


Page 60: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Permanently installed load banks

• Generator testing

– Annual load test

– Troubleshooting

• UPS system testing

– Annual load test

– Troubleshooting

• Paralleling gear

– Set-up and calibration

– Troubleshooting

Equipment

Load Banks


Page 61: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• The alternate water source needs to be capable of supplying water so that the primary water source can be taken out of service for maintenance

• Usage metering should be on each water source

• Types of alternate water source

– City water

– Wells

– Storage tanks

Equipment

Water Source


Page 62: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Alignment

─ Will reduce wear and tear on shafts, bearings and seals

─ Reduce vibration

─ Decrease current draw

• Bearings

─ Accessible grease fittings

─ Grease as required

• Infrared thermal scanning

─ Motor problems

─ Alignment issues

Equipment

Pumps


Page 63: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Temperature and humidity set points

– Should be set the same

• Humidifiers

– Have replacements for bulbs and canisters

• Filters

– Use a pre-filter in dirty locations

– Make sure your dirty-filter Differential Pressure (DP) switch is set correctly

• Alignment

– Proper alignment will reduce wear on the shaft and bearings

• Bearings

– Grease when required

– Infrared thermal heat scan

• Refrigerant leaks can activate fire alarms

Equipment

CRAH/CRAC


Page 64: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Dispatched service

– Verify your vendor’s qualifications as a company

– Request resumes of the people performing work at your site

– Review their technical aptitude

– Verify your vendor’s training programs

• Onsite operation and maintenance staff

– Verify that they are managed correctly (in-house or contracted)

– Verify your staff’s resumes and qualifications

– Review their technical aptitude

– Verify training programs

Staff


Page 65: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• The location of, and access to, valuable resources is important when situations arise

– 3:00 am Sunday morning is not the time to try to locate the fuses required to get your site up and running

• There are various resources you should consider before the need arises:

– Equipment

– Technicians

– Parts

– Procedures

– Manuals

– Drawings

Location


Page 66: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• It is important that your operation and maintenance staff is adequate and regularly trained

• When an emergency occurs, they should have the confidence and experience to complete the task at hand

– Available training methods:

  • Self-paced

  • Classroom

  • Web-based

  • Manufacturer’s training

  • On-the-job training

  • Procedure development

  • Training module development

  • Test beds

  • Simulators

Training


Page 67: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Work activities – it is important to closely coordinate maintenance activities, to maintain a reliable, efficient and safe working environment

• During outage windows we have the tendency to plan too many activities at once. Make sure you don’t have too many people working in the same space at once

Coordination


Page 68: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Pay particular attention to the planning of your maintenance activities

– CRAC units: refrigerant leaks will activate the fire systems; make sure you disable the fire system* prior to charging a system

– Under-floor cleaning: can activate the fire alarm system; make sure you deactivate the fire alarm system* before you start to clean under the floor

– There are other maintenance activities and tests that could mistakenly set off the fire alarm system

*When you disable a fire alarm system, make sure you follow the procedures required by OSHA, NFPA, local authorities, your company and your insurance underwriter. This could include, but is not limited to, additional fire extinguishers, posting a fire watch, notification, special procedures, and tagging

Coordination


Page 69: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• Maintenance activities

– If you are planning to transfer your UPS to a generator maintenance bypass to perform maintenance on the UPS, PM the generator first

– If you are planning to perform an open transfer to the building electrical system, inspect your UPS batteries first

– Be aware of maintenance activities on building-wide systems that can affect the data center’s:

  • Chillers

  • Pumps

  • Electrical service

Coordination


Page 70: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

• Maintenance windows

• Downtime vs. reduced reliability

• Reduction in reliability

• Design system to have various maintenance capabilities

• Move away from critical loads and towards utility

Maintainability

Maintenance Windows

“Make sure you plan your maintenance windows carefully between IT and Facilities.”


Page 71: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability

• IT maintenance windows are often loaded with IT tasks and therefore are not completely available for facilities tasks

• Need to clearly define the true window for facility maintenance

– Maintenance window is midnight to 6 am

– IT takes an hour to shut down and an hour to start up

– Real outage is limited to 1 am to 5 am
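
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `facility_window` and the calendar date are assumptions made for the sketch, not part of the presentation.

```python
from datetime import datetime, timedelta

def facility_window(start, end, it_shutdown, it_startup):
    """Return the (start, end) of the time actually usable for facility work.

    The IT shutdown time is taken off the front of the agreed window and the
    IT start-up time is taken off the back.
    """
    usable_start = start + it_shutdown
    usable_end = end - it_startup
    if usable_end <= usable_start:
        raise ValueError("IT shutdown and start-up consume the whole window")
    return usable_start, usable_end

# Agreed window: midnight to 6 am (the date itself is arbitrary)
window_start = datetime(2024, 1, 7, 0, 0)
window_end = datetime(2024, 1, 7, 6, 0)

usable_start, usable_end = facility_window(
    window_start, window_end,
    it_shutdown=timedelta(hours=1),
    it_startup=timedelta(hours=1),
)
usable_hours = (usable_end - usable_start) / timedelta(hours=1)
print(usable_start.strftime("%H:%M"), usable_end.strftime("%H:%M"), usable_hours)
```

A six-hour window on paper becomes four usable hours, which is the figure facilities planning should be based on.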

Maintenance Windows


Page 72: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability is the ability to detect the onset of a system failure before it happens

Predictive analysis can be performed by:

• Reviewing PM data

• Conducting failure analysis

• Monitoring systems

• Trending

• Advanced diagnostics

PREDICTABILITY

Page 73: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Reviewing PM data

– PM should not only be a time to complete preventative maintenance tasks, but should also be used as a diagnostic tool

– Use detailed PM guides and complete them so they can be reviewed later

– Review your PM task list and add additional items that can be used to perform predictive analysis

– Record before and after data. This is important to set baselines and conduct trending


Page 74: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Conducting failure analysis

– Event occurs

– Complete an incident report

  • Incident report should only contain facts of what happened during the event

– Stabilize the system

– Repair the system

  • Take accurate and specific notes

  • Take before and after readings

  • Document


Page 75: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Conduct root cause analysis

– It is not necessary to prevent the first, or root, cause from happening

– It is merely necessary to break the chain of events at any point so that the final failure cannot occur

• Recommendations

– Make recommendations to prevent future failures

– Implement those changes in the failed system and other similar systems

– Where the fault leads back to an initial design problem, redesign is necessary

– Where the fault leads back to equipment failure, develop ways to improve the component wear, quality and life

– Where the fault leads back to a failure of procedures, it is necessary either to address the procedural weakness or to install a method to protect against the damage caused by the procedural failure


Page 76: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Monitoring systems

– Install a monitoring system

– Monitor as much as you can, as long as you do something with the points you select

– Know what you are monitoring and what affects the points

– Develop your point list to assist you in predictive analysis

– Comprehensive monitoring systems will provide you with the best information


Page 77: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Trending

– Once your monitoring system is installed, select key points to trend

– Use your trends to develop replacement and PM intervals

– Items you can trend:

  • Temperatures

  • Pressure

  • Flow rates

  • Usage (time, consumption)

  • Load
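
One simple way to turn a trended point into a replacement or PM interval is to fit a straight line to the readings and project when it will cross an alarm limit. The sketch below is a minimal illustration under stated assumptions: the readings, the 75 °C limit, and the function names are all made up for the example, and a straight-line fit is only one of many possible trending methods.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def time_until_limit(limit, slope, intercept, now):
    """Projected time from `now` until the trend line reaches `limit`.

    Returns None when the trend is flat or falling (no crossing expected).
    """
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - now)

# Hypothetical quarterly bearing temperature readings (month, degrees C)
months = [0, 3, 6, 9, 12]
bearing_temp_c = [60.0, 61.5, 63.0, 64.5, 66.0]

slope, intercept = linear_trend(months, bearing_temp_c)
remaining = time_until_limit(75.0, slope, intercept, now=months[-1])
print(f"rising {slope:.2f} C/month, about {remaining:.0f} months to the 75 C limit")
```

With these made-up readings the temperature climbs half a degree per month, so the limit is projected about 18 months out; the PM or replacement would be scheduled comfortably inside that time.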


Page 78: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Advanced diagnostic techniques

– Infrared thermal imaging

– Oil analysis

– Coolant analysis

– Fuel analysis

– Ultrasonic analysis

– Power quality testing

– Battery impedance testing

– Vibration testing

– Motor analysis

– Eddy current analysis

– Laser alignment

– Balancing


Page 79: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Uses for an IR camera

– Belt tension

– Pump alignment

– Bearings

– Electrical connections

– Turbo chargers

– Roof leaks

– Poor insulation

– Room seals


Page 80: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Infrared thermography

– Is the process of developing visual images that represent variations in the IR spectrum

– Any object that is above absolute zero emits IR energy

– IR spectrum is between 2.0 and 15 microns

– IR spectrum falls outside the range of the human eye

– IR cameras detect the temperature changes that can potentially mean the presence of conditions or stressors that act to decrease the design life of the equipment

– The IR camera can have many uses in a data center

Unless you are the Predator, you will need to use an IR camera
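
As a rough check on why a 2.0 to 15 micron band suits equipment inspection, Wien's displacement law (standard physics, not taken from the slides) gives the wavelength at which a warm object radiates most strongly. The sketch below assumes ideal black-body behavior; the function name and the example temperatures are illustrative only.

```python
# Wien's displacement constant, expressed in micron-kelvin
WIEN_B_UM_K = 2897.77

def peak_wavelength_um(temp_c):
    """Peak emission wavelength (microns) of an ideal black body at temp_c."""
    temp_k = temp_c + 273.15
    return WIEN_B_UM_K / temp_k

# Illustrative surface temperatures: room temperature, a warm connection,
# and a badly overheated connection
for temp_c in (25.0, 60.0, 150.0):
    print(f"{temp_c:6.1f} C -> peak near {peak_wavelength_um(temp_c):.1f} microns")
```

At room temperature the peak falls near 9.7 microns, and it moves toward shorter wavelengths as the surface gets hotter, so typical equipment temperatures sit inside the band quoted above.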


Page 81: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

[Images: Fuse Connection; Overloaded Breaker; Loose Cable; Defective Breaker]


Page 82: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

[Images: Pump Alignment; Water Under Roof; Tank Level; Missing Insulation]


Page 83: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Oil analysis

– Oil analysis is used to define three basic machine conditions

  • Condition of the oil: lubricant viscosity, acidity, etc.

  • Lubrication system condition: have physical boundaries been violated (e.g. fuel in oil)?

  • Machine condition, by looking for wear particles


Page 84: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Oil analysis

– Oil condition is most easily determined by measuring the viscosity, acid number and base number

– Additional tests can determine the presence and/or effectiveness of oil additives such as anti-wear additives, antioxidants, corrosion inhibitors, and anti-foam agents

– Component wear can be determined by measuring the amount of wear metals such as iron, copper, chromium, aluminum, lead, tin and nickel, and can identify when a particular part is wearing

– Contamination is determined by measuring water content, specific gravity, and the level of silicon. Change in specific gravity typically indicates presence of other oil or fuel contamination


Page 85: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

Metal | Engines | Gears
Iron | Cylinder heads, rings, gears, crankshafts | Gears, bearings
Chrome | Rings, liners, exhaust valves | Roller bearings
Aluminum | Pistons, thrust bearings, turbo bearings, main bearings | Pump, thrust washers
Nickel | Valve plating, steel alloy from crankshaft, camshafts | Steel alloy from roller bearings
Copper | Lube coolers, main and rod bearings, bushings, turbo bearings | Bushings, thrust plates
Lead | Main and rod bearings, bushings, lead solder | Bushings, grease contamination
Tin | Piston flashing, bearing overlays, bronze alloy | Bearing cage metal
Silver | Wrist pin bushings, silver solder from lube coolers | Silver solder from lube coolers
Titanium | Gas turbine bearings, hubs, turbine blades | N/A
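
A wear-metal table like the one above lends itself to a simple lookup when reading a lab report. The sketch below is one possible way to encode it; the dictionary layout and the function name `likely_sources` are assumptions for illustration, and a real report would also need the lab's own alarm limits to decide which metals count as elevated.

```python
# Wear-metal sources transcribed from the table above.
# Keys are lowercase metal names; an empty list stands for "N/A".
WEAR_METAL_SOURCES = {
    "iron": {"engines": ["cylinder heads", "rings", "gears", "crankshafts"],
             "gears": ["gears", "bearings"]},
    "chrome": {"engines": ["rings", "liners", "exhaust valves"],
               "gears": ["roller bearings"]},
    "aluminum": {"engines": ["pistons", "thrust bearings", "turbo bearings",
                             "main bearings"],
                 "gears": ["pump", "thrust washers"]},
    "nickel": {"engines": ["valve plating", "steel alloy from crankshaft",
                           "camshafts"],
               "gears": ["steel alloy from roller bearings"]},
    "copper": {"engines": ["lube coolers", "main and rod bearings", "bushings",
                           "turbo bearings"],
               "gears": ["bushings", "thrust plates"]},
    "lead": {"engines": ["main and rod bearings", "bushings", "lead solder"],
             "gears": ["bushings", "grease contamination"]},
    "tin": {"engines": ["piston flashing", "bearing overlays", "bronze alloy"],
            "gears": ["bearing cage metal"]},
    "silver": {"engines": ["wrist pin bushings",
                           "silver solder from lube coolers"],
               "gears": ["silver solder from lube coolers"]},
    "titanium": {"engines": ["gas turbine bearings", "hubs", "turbine blades"],
                 "gears": []},
}

def likely_sources(elevated_metals, machine_type):
    """Map each elevated metal to the components the table associates with it.

    machine_type is "engines" or "gears". Metals missing from the table map
    to None so that they stand out in the result.
    """
    result = {}
    for metal in elevated_metals:
        entry = WEAR_METAL_SOURCES.get(metal.lower())
        result[metal] = None if entry is None else entry[machine_type]
    return result

# Hypothetical report: iron and copper flagged as elevated in an engine sample
report = likely_sources(["Iron", "Copper"], "engines")
for metal, parts in report.items():
    print(metal, "->", ", ".join(parts))
```

Components that appear under more than one flagged metal are natural first candidates for inspection.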


Page 86: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Coolant analysis

– Regular coolant testing and routine maintenance can help you achieve maximum system efficiency and save you time and money through less downtime

– A cooling system is subject to pitting, corrosion, cavitation, erosion and electrolysis

– Although coolants are formulated to help prevent these problems from occurring, coolant analysis will identify if they are present and determine if the coolant you're using is providing adequate protection


Page 87: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Fuel analysis

– Fuel analysis can point to solutions for filter plugging, loss of power or poor injector performance

– Testing bulk fuel storage tanks can verify compliance with required supplier specifications


Page 88: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Ultrasonic inspection

– Ultrasonic, or ultrasound, refers to sound waves from 20 kHz to 100 kHz, which cannot be heard by humans

– Unlike IR, ultrasound travels a short distance from the source

– Ultrasonic detectors can be used to detect component wear, fluid leaks, vacuum leaks and steam trap failures

– Even though such a leak may not be audible to the human ear, the ultrasound will still be detectable with the appropriate tool


Page 89: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Pressure and vacuum leaks can occur in various locations

– Compressed air

– Heat exchangers

– Boilers

– Condensers

– Tanks

– Pipes

– Valves

– Steam traps

• Ultrasonic inspections can detect these small leaks


Page 90: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Mechanical systems suffer from wear through constant operation, and ultrasonic inspection can detect wear in these systems

• Mechanical applications

– Bearings

– Lack of lubrication

– Pumps

– Motors

– Gear/gearboxes

– Fans

– Compressors


Page 91: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Mechanical devices are not the only devices that emit ultrasonic sound. Electrical equipment will also generate ultrasonic waves if arcing, tracking or corona are present

• Electrical applications

– Arcing, tracking and corona

– Switchgear

– Transformer

– Insulators

– Circuit breakers


Page 92: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Power quality testing

– Hardware and software are frequently blamed for all types of problems that may actually originate from within your building’s electrical distribution system: poor power quality

– In many cases, the number one indication that you have a power quality problem is intermittent, unexplained failures of technology equipment or processes

– Responding service technicians may complete a work report with the words “no trouble found”


Page 93: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Impedance testing

– A substitute for performing a full load test

– The internal resistance of a cell can be determined by how that cell responds to a momentary load

– The instantaneous voltage drop and the load current applied are used to calculate the resistance

– Most cell testers can check the impedance with the battery online or offline
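
The calculation described above is Ohm's law applied to the momentary voltage drop. The sketch below shows it with made-up readings; the function names, the cell voltages, and the baseline value are assumptions for illustration, and acceptable rise above baseline should come from the battery manufacturer or your own trending.

```python
def internal_resistance_milliohm(resting_volts, loaded_volts, load_amps):
    """Cell internal resistance in milliohms: R = (V_rest - V_load) / I."""
    if load_amps <= 0:
        raise ValueError("load current must be positive")
    return (resting_volts - loaded_volts) / load_amps * 1000.0

def percent_above_baseline(reading, baseline):
    """How far a reading has risen above its recorded baseline, in percent."""
    return (reading - baseline) / baseline * 100.0

# Hypothetical cell: 2.25 V at rest, 2.20 V under a momentary 50 A load
resistance = internal_resistance_milliohm(2.25, 2.20, 50.0)

# Hypothetical baseline recorded when the cell was new: 0.80 milliohm
rise = percent_above_baseline(resistance, 0.80)
print(f"{resistance:.2f} milliohm, {rise:.0f}% above baseline")
```

Comparing each cell against its own baseline ties this test back to the before and after readings recommended for PM data.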


Page 94: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Vibration analysis

– The level and frequency of the vibration of rotating machinery are not distinguishable to the human touch

– Can be used to discover and diagnose a wide range of problems related to rotating equipment


Page 95: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Vibration monitoring can detect:

– Unbalance

– Eccentric rotors

– Misalignment

– Mechanical looseness or weakness

• Types of systems that vibration analysis should be performed on:

– Generators

– Cooling tower fans

– Chillers

– Pumps

– CRAH/CRAC

– Air handlers


Page 96: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Tests used to perform motor analysis

– Infrared

– Vibration analysis

– Surge comparison

– Motor current signature comparison

• Motor faults or conditions that can be detected

– Winding short circuits

– Open coils

– Improper torque settings

– As well as other mechanical problems


Page 97: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Types of motor analysis

– Surge comparison testing identifies insulation deterioration by applying a high-frequency transient surge to equal parts of a winding and comparing the resulting voltage waveforms

– Motor Current Signature Analysis (MCSA) provides a non-intrusive method of detecting mechanical and electrical problems


Page 98: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Eddy current analysis

– Detects surface and subsurface defects

– Detects variations in alloy, heat treatments, hardness, structure and other physical metallurgical conditions

– Should be done on chillers each year when the tubes are being cleaned


Page 99: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Alignment inspection

– Shafts and pumps should be properly aligned, which is best accomplished by using laser alignment

– When machines are improperly aligned, there are added loads on the bearings and couplings, which can result in early and unplanned failures


Page 100: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability

• Balance

– Reduces wear and tear on bearings, shafts and motors

– Imbalance can be detected with the use of infrared cameras and vibration meters

– Requires balancing equipment to verify and correct balancing


Page 101: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

SCALABILITY

Scalability is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged without impact to operations

For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added

Page 102: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• What do we want… a flexible, scalable, reliable, highly performing, and highly available computer infrastructure that adapts to a wide range of continuously evolving and challenging demands


Page 103: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

• Requirements analysis

• Basis of Design (BOD)

• Design

– Modular approach

– Avoid excessive equipment

– Pay as you go

• Expansion techniques

What does it take?

Scalability


Page 104: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Good planning and decisions are the foundation of a highly scalable facility

• At no point in the lifecycle of a mission-critical facility can you have greater impact on scalability than during the design phase

• Start with a Requirements Analysis (RA) of your data center needs

• Use the results of your RA to develop a Basis of Design (BOD)

• The RA and BOD are living documents and you need to update them as changes occur


Page 105: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

• Requirements analysis

– Growth modeling takes the hardware platform requirements and turns them into space, power and cooling requirements

– Considers both current and future technology impacts on space, power and cooling

– Typically done for 3+ year planning

– This leads to the critical infrastructure’s BOD
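
A very small growth model of this kind can be sketched as follows. Everything numeric here is a placeholder: the rack count, per-rack power, floor area per rack, and growth rate are hypothetical inputs, and the sketch assumes all IT power ends up as heat to be removed (1 ton of refrigeration is about 3.517 kW). A real requirements analysis would add lighting, people, distribution losses, and redundancy.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW

def project_requirements(racks, kw_per_rack, sqft_per_rack, annual_growth, years):
    """Project space, power and cooling needs for each planning year.

    Assumes the rack count grows by a fixed fraction each year and that all
    IT power is rejected as heat.
    """
    rows = []
    for year in range(years + 1):
        rack_count = racks * (1 + annual_growth) ** year
        power_kw = rack_count * kw_per_rack
        rows.append({
            "year": year,
            "racks": rack_count,
            "space_sqft": rack_count * sqft_per_rack,
            "power_kw": power_kw,
            "cooling_tons": power_kw / KW_PER_TON,
        })
    return rows

# Hypothetical starting point: 100 racks at 4 kW and 25 sq ft each, 10% growth
plan = project_requirements(100, 4.0, 25.0, 0.10, years=3)
for row in plan:
    print(f"year {row['year']}: {row['racks']:.0f} racks, "
          f"{row['space_sqft']:.0f} sq ft, {row['power_kw']:.0f} kW, "
          f"{row['cooling_tons']:.0f} tons")
```

Rerunning the model with different technology assumptions (for example a higher kW per rack) shows how sensitive the Basis of Design is to them.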

Scalability

Requirements Analysis


Page 106: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

• Roadmap to a reliable and quality-designed site

• More often than not, the BOD is lacking in detail

• Defines the requirements of the site

• Defines the reliability, availability, maintainability, scalability and operational parameters

• Should be updated regularly

Scalability

Basis of Design


Page 107: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Designing with scalability in mind

• Scalability

– Reduced initial cost

– Reduced time to install equipment

– Reduces the need to purchase large systems

– Not an advantage for fast-growing facilities

• Modular design can be more precisely matched to reflect:

– Lower capital investment (“pay as you go” approach)

– Budget/capital constraints

– Controlled growth

– Unanticipated growth


Page 108: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Equipment rooms

– When possible, design equipment rooms with space for expansion

– Design hallways, corridors and doors to allow access for new equipment

– Conserve wall space for future panels and equipment


Page 109: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Switchgear

– Expansion breakers

– Expansion cells

– Be aware of bussing configuration; use fully-rated bus throughout

– Use larger frame breakers with adjustable trips

– Have expansion in your Programmable Logic Controller (PLC)

  • Have access to programming codes

  • Have current backup


Page 110: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• UPS systems

– Size parallel cabinet and static switch for full build-out

– If modules are upgradeable, size feeders to full build-out

– If equipped with sync control cabinet, size for full build-out

• Remember

– When you start to add more than 3 modules in parallel, the redundancy begins to drop
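
One way to see this effect is a simple probability sketch. It is not the presenter's model: it assumes an N+1 arrangement in which the system survives as long as no more than one module is down, that modules fail independently, and that each module has the same (made-up) probability of being available.

```python
def n_plus_one_reliability(modules_needed, module_reliability):
    """Probability that an N+1 parallel system can still carry the load.

    modules_needed is N (the number required for the load); the system has
    N + 1 modules and survives if at most one of them is down.
    """
    total = modules_needed + 1
    p = module_reliability
    all_up = p ** total
    exactly_one_down = total * p ** (total - 1) * (1 - p)
    return all_up + exactly_one_down

# Hypothetical per-module reliability of 0.99
results = [n_plus_one_reliability(n, 0.99) for n in range(1, 6)]
for n, r in zip(range(1, 6), results):
    print(f"{n}+1 modules: {r:.6f}")
```

Under these assumptions each added module lowers the chance that a single spare is enough, because there are more modules that can fail while the spare count stays at one.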


Page 111: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Critical distribution
– Dual main input
  • Allow for the possibility of a second source to supply load during cutover or expansion activities
  • Could be used to connect temporary equipment for emergencies
  • Load bank testing
– Spare breakers
  • Allow for additional PDU and expected new load
  • Up-frame the breaker so that larger loads may be added
    – e.g. use 400A frame breakers with 225A rating plugs to power PDUs
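
The up-framing example can be put into numbers. The sketch below compares the continuous load available on the installed 225A rating plug with what the 400A frame could carry after a plug change. The 80% continuous-load figure is an assumption used here as a common planning value for standard-rated breakers; it is not stated on the slide and should be confirmed against the actual breaker rating and applicable code.

```python
def continuous_amps(rating_amps: int, continuous_pct: int = 80) -> float:
    """Continuous load a breaker rating supports at the assumed percentage."""
    return rating_amps * continuous_pct / 100


def upframe_headroom(frame_amps: int, plug_amps: int, continuous_pct: int = 80) -> float:
    """Extra continuous amps gained by re-plugging up to the full frame size."""
    if plug_amps > frame_amps:
        raise ValueError("rating plug cannot exceed the breaker frame size")
    return continuous_amps(frame_amps, continuous_pct) - continuous_amps(
        plug_amps, continuous_pct
    )


if __name__ == "__main__":
    # Frame and plug sizes from the slide; 80% is an assumed planning figure.
    print("Today (225A plug):", continuous_amps(225), "A continuous")
    print("Full frame (400A):", continuous_amps(400), "A continuous")
    print("Growth headroom:  ", upframe_headroom(400, 225), "A")
```

The point of the design choice is that the growth comes from changing a rating plug rather than replacing the breaker, provided the feeder and bus were also sized for the larger load.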

Page 112: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Power Distribution Units (PDUs)
– Typically you run out of circuits before capacity
– Install junction box below floor to allow for additional power whips. Bottom plates usually do not have enough knock-outs
– Order PDUs with additional 225A sub-fed breakers to support additional Remote Power Panels (RPPs)
– Consider in-row PDUs to save space
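
The “circuits before capacity” point can be checked with simple arithmetic. The sketch below is a rough planning aid, not a sizing method from the presentation: the pole count, kVA rating, poles per circuit and average load per circuit are all hypothetical inputs chosen for illustration.

```python
def pdu_limit(poles_total: int, kva_capacity: float,
              poles_per_circuit: int, kva_per_circuit: float):
    """Return (max circuits, limiting factor) for a PDU.

    The limiting factor is "circuits" when pole positions run out first,
    "capacity" when the kVA rating runs out first, or "both" on a tie.
    """
    by_poles = poles_total // poles_per_circuit
    by_kva = int(kva_capacity // kva_per_circuit)
    if by_poles < by_kva:
        return by_poles, "circuits"
    if by_kva < by_poles:
        return by_kva, "capacity"
    return by_poles, "both"


if __name__ == "__main__":
    # Hypothetical PDU: 84 pole positions, 150 kVA, 2-pole circuits
    # averaging 1.5 kVA each.
    count, limit = pdu_limit(84, 150.0, 2, 1.5)
    print(f"{count} circuits, limited by {limit}")
```

When the result is limited by circuits, the unused kVA is what sub-fed breakers and Remote Power Panels are meant to reach.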

Page 113: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• EPO systems
– Plan on the fact that the EPO system will have items added and removed from it
– EPO should be an engineered device and not a cloud stating “by others”
– System should be documented
– Should have an Active, Test and Off mode of operation
– Installed with isolation relays
– Centrally located in an EPO control cabinet with room for expansion

Page 114: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Chilled water systems
– When possible, up-size piping
– Have additional valves installed under the floor so you can add CRAH units as needed
– Have valves installed for additional pumps and chillers
– Have a valve connection that can be easily hooked up to a temporary chiller

Page 115: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Monitoring systems
– Make sure that the system is expandable
– Some systems are not upgradable, while others require adding another module to the communication trunk
– Make sure you will not be locked in with an uncooperative manufacturer
– Have access to the programming function and required passwords

Page 116: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability

• Expansion techniques
– Implementation of new systems while the facility is in “production” is a business reality
– The need for hot cutovers is occurring more often. For safety reasons, a hot cutover should be a last resort
– With proper upfront planning, the need for hot taps and cutovers can be reduced or eliminated

Page 117: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

UPTIME

Uptime (Ŷ) is a measure of the time a system has been “up”, running and available. The term came into use to describe the opposite of downtime, the times when a system is not operational

ρ = Reliability

ά = Availability

ц = Maintainability

∏ = Predictability

∑ = Scalability

Page 118: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Reliability (ρ) is the ability of a system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances
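
Reliability is usually quantified as the probability of running for a given time without failure. The sketch below uses the Weibull model named earlier in the presentation; the function name and the shape and scale values are illustrative assumptions, not data from the slides.

```python
import math


def weibull_reliability(t_hours: float, eta_hours: float, beta: float) -> float:
    """R(t) = exp(-(t / eta) ** beta): probability of no failure up to time t.

    eta is the scale (characteristic life) and beta is the shape:
    beta < 1 suggests early-life failures, beta = 1 a constant failure
    rate, and beta > 1 wear-out.
    """
    if t_hours < 0 or eta_hours <= 0 or beta <= 0:
        raise ValueError("t must be >= 0, and eta and beta must be > 0")
    return math.exp(-((t_hours / eta_hours) ** beta))


if __name__ == "__main__":
    # Hypothetical component: characteristic life 50,000 h, wear-out shape 1.5.
    for years in (1, 3, 5):
        t = years * 8760
        print(f"{years} yr: R = {weibull_reliability(t, 50_000, 1.5):.4f}")
```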

Page 119: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Availability (ά) is the ability of a system to tolerate failures

It refers to the time that a system is available to its users

This means the process continues to be served through a failure and that, ideally, the failure is transparent to the user
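
Availability is commonly expressed as a fraction of time. The sketch below uses the standard steady-state relation, availability = MTBF / (MTBF + MTTR), which is a general reliability-engineering formula rather than something stated on this slide; the MTBF and MTTR figures are hypothetical.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and repair time."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be > 0 and MTTR must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails on average every 9,999 h, takes 1 h to restore.
    a = availability(9_999, 1)
    print(f"Availability: {a:.4%}")
    print(f"Expected downtime: {annual_downtime_hours(a):.3f} h/yr")
```

The same relation shows why repair time matters as much as failure rate: halving MTTR improves availability about as much as doubling MTBF.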

Page 120: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Maintainability (ц) is defined as the probability of performing a successful repair action or preventative maintenance within a given time

In other words, maintainability measures the ease and speed with which a system can be restored to operational status
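
The probability definition above can be sketched numerically. The example assumes repair times are exponentially distributed around a mean time to repair (MTTR), which gives M(t) = 1 - exp(-t / MTTR). That distribution is a simplifying assumption made here for illustration; real repair data often fits other distributions better.

```python
import math


def maintainability(t_hours: float, mttr_hours: float) -> float:
    """Probability that a repair is completed within t hours.

    Assumes exponentially distributed repair times with mean mttr_hours.
    """
    if t_hours < 0 or mttr_hours <= 0:
        raise ValueError("t must be >= 0 and MTTR must be > 0")
    return 1.0 - math.exp(-t_hours / mttr_hours)


if __name__ == "__main__":
    # Hypothetical MTTR of 4 hours.
    for window in (1, 4, 8, 24):
        print(f"Repair done within {window:>2} h: {maintainability(window, 4):.1%}")
```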

Page 121: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Predictability (∏) is the ability to detect the onset of a system failure before it happens

Predictive analysis can be performed by:

– Reviewing PM data
– Conducting failure analysis
– Monitoring systems
– Trending
– Advanced diagnostics
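
Trending, one of the methods listed above, can be as simple as fitting a straight line to periodic readings and estimating when the line crosses an alarm limit. The sketch below is a minimal least-squares example; the readings, the threshold and the function names are hypothetical, and a linear trend is itself an assumption that will not suit every failure mode.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for paired readings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired readings")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("readings must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_threshold(xs, ys, threshold):
    """Time at which the fitted trend reaches the threshold, or None if not rising."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


if __name__ == "__main__":
    # Hypothetical quarterly PM readings of a slowly rising measurement.
    quarters = [0, 1, 2, 3]
    readings = [10.0, 12.0, 14.0, 16.0]
    print("Alarm limit of 20 reached at quarter", time_to_threshold(quarters, readings, 20.0))
```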

Page 122: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

Scalability (∑) is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged

For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added

Page 123: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

UPTIME

ρ * ά * ц * ∏ * ∑ = Ŷ
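
The product form implies that one weak factor drags down the whole result. The sketch below treats each RAMPS factor as a score between 0 and 1; that scoring scale is an assumption made for illustration, since the slide gives the formula but not how each factor is measured.

```python
from math import prod


def uptime_score(reliability: float, availability: float, maintainability: float,
                 predictability: float, scalability: float) -> float:
    """Multiply the five RAMPS factors, each assumed to be a score in [0, 1]."""
    factors = (reliability, availability, maintainability,
               predictability, scalability)
    for f in factors:
        if not 0.0 <= f <= 1.0:
            raise ValueError("each factor must be between 0 and 1")
    return prod(factors)


if __name__ == "__main__":
    # Hypothetical scores: strong everywhere except maintainability.
    print(f"Balanced site: {uptime_score(0.9, 0.9, 0.9, 0.9, 0.9):.3f}")
    print(f"One weak link: {uptime_score(0.99, 0.99, 0.5, 0.99, 0.99):.3f}")
```

Because the factors multiply, the result can never exceed the lowest individual score, which is the argument for looking beyond facility design alone.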

Page 124: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

RELIABILITY

AVAILABILITY

MAINTAINABILITY

PREDICTABILITY

SCALABILITY

Be sure to look at more than just the design of your facility…

don’t miss a step. Use RAMPS to achieve maximum uptime!

Page 125: RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka Presented by Joe Soroka For additional information.

RAMPS©

Reliability, Availability, Maintainability, Predictability, Scalability

Presented by Joe Soroka

For additional information visit www.totalsitesolutions.com