Designing Requirements


Oracle System Performance Group Technical Paper, January 3, 1996

Designing Your System to Meet Your Requirements

Cary V. Millsap, Oracle Corporation

January 3, 1996

The technical architect of an application system is responsible for building a system that meets the goals of the end users. To succeed, this person must combine the right hardware components with suitable application architectures, always obeying complex functional, operational, and economic constraints. This job is complicated. A noticeable lack of good tools, methods, and experience has made it all the more difficult. Yet an inadequate technical architecture will doom an application.

Oracle’s success in the mainframe downsizing market has begun to produce tools, methods, and experience that tremendously reduce technical architecture risk. This paper will identify factors critical to the success of a technical architecture design project. We will discuss successful methods used at the most demanding relational database projects in the world, and we will tell you how to use those methods to make your application succeed.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the Oracle Corporation copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of Oracle Corporation. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1996 Oracle Corporation. No Oracle part number has yet been assigned.


Contents

1. INTRODUCTION
2. SPECIFYING YOUR REQUIREMENTS
   2.1 Service Level Agreement (SLA)
   2.2 Making Your SLA
3. UNDERSTANDING YOUR TECHNOLOGY
   3.1 Functional Requirements
   3.2 Operational Constraints
   3.3 Economic Needs
4. DESIGNING YOUR SYSTEM
   4.1 Disk
   4.2 Memory
   4.3 CPU
   4.4 Network
5. CONCLUSIONS
6. REFERENCES

1. Introduction

Just about every vendor in the world has the words “satisfied customer” somewhere in its mission statement. The Oracle Server-based system you paid for must satisfy you. But what is satisfaction?

Satisfaction is something you usually just know “when you see it.” However, with expensive things, like military aircraft or computer systems, “I’ll know it when I see it” usually isn’t a solid enough basis upon which to build a business relationship. Vendors and customers tend to see satisfaction differently when there’s a lot of money involved.

Whatever satisfaction is, at the very least it’s an absence of problems, and problem is a word that science has a grip on. A problem is a perceived difference between expectation and reality.¹ Satisfaction, then, is an acceptably small perceived difference between expectation and reality.

There’s one troublesome word in there. Perceived is one of those subjective words that engineers try not to use, and that you don’t usually find in contracts. Perception is the problem when you think your system is slow and your vendor thinks it is fast. Your computer system is important enough to your business that you can’t tolerate subjective measures of its success.

¹ Thank you, Dr. Ray Quiet, for giving me this tool and several others in a fall 1981 episode of introductory sociology.

2. Specifying Your Requirements

The way to fix this problem is to form a customer-vendor agreement up front on what specifically constitutes satisfaction. The more measurable your specifications are, the less subjective your estimates of satisfaction will be. To measure satisfaction objectively, you must do the following:

1. First, identify the important attributes of your system. Perhaps for your system to succeed, it must process 50 on-line orders per minute at peak times. Perhaps your business could not tolerate an unplanned system outage of more than four hours. You must identify your system’s critical success factors (CSFs). You probably already have an idea about what they are.

2. Second, document your expectations in precise detail for each CSF. By specifying your expectations in detail, you will be able to compare these expectations with your measurements of reality.

3. Third, specify your tolerance for deviation from these expectations. People are always more tolerant in some areas than others, so a blanket “±10%” clause, for example, probably isn’t good enough.

4. Fourth, measure reality for each CSF. The most reliable measures of reality are actual hands-on measurements of a real-live system. However, if we haven’t built our system yet, we must rely on mathematical models, prototypes, and measurements of existing systems resembling the one we hope to build.

5. Fifth, measure the reality-expectation difference for each CSF (step 4 minus step 2). This step is simpler if you’ve stated your expectations in the same units reported by the system management tools that produce your measurements of reality.

6. And last, compare your reality-expectation difference (step 5) to your satisfaction tolerance (step 3). If your system meets all of your requirements, then you are in good shape. Otherwise, you have to tune something, relax a requirement, or buy more stuff.

[Figure 1 diagram, labeled: expectation, tolerance, satisfaction, dissatisfaction, reality.]

Figure 1. You can envision satisfaction as an area of tolerance around your expectation. If reality is outside that satisfaction boundary, then you’re not satisfied. To measure satisfaction, you must know four things: your expectation, your reality, the distance between those two points, and your tolerance of that distance.
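The six steps can be sketched as a small check over a list of CSFs. This is a minimal illustration, not a tool from the paper. The `CSF` class, its field names, the tolerances and the measurements are all hypothetical; only the 50-orders-per-minute and four-hour-outage targets come from step 1. Exceeding a throughput target is rarely a problem, so the sketch applies each tolerance only in the unfavorable direction.

```python
from dataclasses import dataclass


@dataclass
class CSF:
    """One critical success factor (hypothetical structure for illustration)."""
    name: str
    unit: str
    expectation: float      # step 2: documented expectation
    tolerance: float        # step 3: allowed deviation in the unfavorable direction
    higher_is_better: bool  # True for throughput, False for outage length
    reality: float          # step 4: measured (or modeled) value

    def difference(self) -> float:
        """Step 5: reality minus expectation, in the CSF's own units."""
        return self.reality - self.expectation

    def satisfied(self) -> bool:
        """Step 6: is the unfavorable part of the difference within tolerance?"""
        d = self.difference()
        shortfall = -d if self.higher_is_better else d
        return shortfall <= self.tolerance


# Hypothetical CSFs and measurements
csfs = [
    CSF("peak order entry", "orders/min", 50.0, 2.0, True, 48.5),
    CSF("longest unplanned outage", "hours", 4.0, 0.0, False, 5.0),
]

for csf in csfs:
    status = "satisfied" if csf.satisfied() else "NOT satisfied"
    print(f"{csf.name}: difference {csf.difference():+.1f} {csf.unit}, {status}")
```

With these invented numbers, the throughput CSF falls within its tolerance but the outage CSF does not. That failure leads to the paper's three choices: tune, relax a requirement, or buy more.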

2.1 Service Level Agreement (SLA)

A service level requirement is a formally specified end-user expectation about your system. A service level agreement, or SLA, is a collection of service level requirements that have been negotiated and mutually agreed upon by your information providers and your information consumers (Figure 2).

A useful SLA is a contract with three attributes:

• Structure: Your SLA mustn’t leave anything out. No service level requirement that will be important either to the service organization or to its customers may be omitted.

• Precision: To measure satisfaction objectively, each service level requirement should be expressed in the same units that will be used to measure the real system.

• Feasibility: The collection of requirements must obey the physical laws of nature. Service level requirements must not contradict one another or lie outside the constraints of your technology.

2.1.1 Structure

Structure allows both the service organization and its customers to manage risk. Most risk comes in categories that the designers of a system just never considered. For example, Oracle customers who never consider their system uptime and downtime requirements before they buy their hardware later have to rely on luck to meet their “availability requirements.”

Your computer system requirements and constraints come from three directions:

• Functional requirements: End users specify functional requirements. These include specification of the kinds of tasks the system must perform, how long those tasks can take, and at what times the system must be on-line to perform them.

• Operational constraints: Your system managers specify operational constraints like the schedule of planned downtime, the methods for requesting repair of unexpected functional or operational errors, and limits on throughput required to meet user response time demands.

• Economic needs: When the functional requirements and the operational constraints conflict, your economic needs determine the trade-offs. Economic needs are usually negotiated at the executive level, where company strategy and spending authority meet [Boar].

[Figure 2 diagram, labeled: hardware vendors; applications vendors or developers; DBMS vendor; system management tools vendors; information services department; end-user departments (OE, sales, GL, marketing).]

Figure 2. The relationships among providers and consumers of information make your information services department the arbitrator among the groups involved in your system. One function of this department will be to understand user pain and then adjust your technology cost-benefit trade-offs to remove that pain without creating intolerable pain elsewhere. If your users do not understand the costs of technology benefits, then the job is all the more difficult. The SLA construction process benefits your business by educating you and your users early about those trade-offs.

2.1.2 Precision

Your SLA will be a valuable system design tool if you specify your expectations in units that you can measure and compare to vendor spec sheets. For example, “50 orders per minute” at your company doesn’t equal “50 orders per minute” at my company, because we use different application setup options and we have different numbers of lines per order. Hence there’s no spec sheet that will reliably tell you how big a computer to buy to process “n orders a minute.” However, if you can convert your “n orders per minute” to disk read and write calls per second, bytes of memory per user, and Pentium CPU milliseconds per order, then you have information to help design a system.
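Once you have measured per-order resource costs, the conversion is simple arithmetic. The sketch below uses hypothetical per-order figures (30 reads, 12 writes, 150 CPU ms), and the class and function names are illustrative only. As the paper stresses, real per-order numbers must come from measuring your own application with your own setup options.

```python
from dataclasses import dataclass


@dataclass
class PerOrderCost:
    """Resource cost of one order, measured on your application (hypothetical values)."""
    disk_reads: float   # read calls per order
    disk_writes: float  # write calls per order
    cpu_ms: float       # CPU milliseconds per order on the measured CPU model


def resource_rates(orders_per_minute: float, cost: PerOrderCost) -> dict:
    """Translate a business rate into hardware-level rates."""
    orders_per_second = orders_per_minute / 60.0
    return {
        "disk_reads_per_s": cost.disk_reads * orders_per_second,
        "disk_writes_per_s": cost.disk_writes * orders_per_second,
        "cpu_ms_per_s": cost.cpu_ms * orders_per_second,
        # Fraction of one CPU of that model kept busy by order processing alone
        "cpu_utilization": cost.cpu_ms * orders_per_second / 1000.0,
    }


rates = resource_rates(60, PerOrderCost(disk_reads=30, disk_writes=12, cpu_ms=150))
for name, value in rates.items():
    print(f"{name}: {value:g}")
```

These are the units a disk or CPU spec sheet can be compared against, which "orders per minute" alone never can be.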

2.1.3 Feasibility

The final necessary attribute of an SLA is feasibility. Building a feasible SLA requires commitment of both the information providers and the information consumers to engage in the cost-benefit analysis.

Without technology advice, users tend to over-constrain their requirements. For example, users commonly specify without flinching that their database system must have full “7 x 24” availability (up 7 days a week, 24 hours a day). They don’t realize that even 99.8% of true 7 x 24 access would require geographically replicated multi-way clusters of perfectly designed applications. It would also require three or more very smart and very expensive system administrators working 8-hour shifts through the night at each site, using very expensive custom-designed monitoring tools. They don’t realize that sacrificing just one down weekend night each week would save probably 95% of the cost.
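The arithmetic behind this example is worth seeing. A 99.8% requirement leaves less than a day of total downtime per year. A weekly maintenance window, here assumed to be 8 hours long, budgets hundreds of hours a year for planned work. The function names and the 8-hour window are illustrative assumptions. The sketch does not model the cost savings the paper estimates.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years
HOURS_PER_WEEK = 24 * 7    # 168 hours


def allowed_downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Total downtime permitted in a period at a given availability fraction."""
    return (1.0 - availability) * period_hours


def availability_with_weekly_window(window_hours: float) -> float:
    """Availability if the system is deliberately down `window_hours` every week."""
    return 1.0 - window_hours / HOURS_PER_WEEK


print(f"99.8% allows {allowed_downtime_hours(0.998):.1f} hours of downtime a year")
window = 8.0  # hypothetical length of one weekend-night maintenance window
print(f"An {window:g}-hour weekly window gives {availability_with_weekly_window(window):.1%} availability")
print(f"and about {window * 52:g} planned hours a year for maintenance")
```

The gap between roughly 17.5 hours and roughly 416 hours a year for repairs and maintenance is the operational slack that the weekly window buys.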

On the other hand, when you put technologists in a room without users, they tend to over-engineer systems. I once sat in a four-hour user-free meeting in which a bunch of us smart guys added over a million dollars to a proposed system’s price. We were trying to meet a strict end-user throughput requirement to run several batch programs in a peak time window. After three and a half hours, our user manager joined us and said that if that’s what it would cost, he’d run the batch load at night. End of meeting.

Your business and your technology will impose constraints upon your system. Creating a feasible SLA requires time from the people who know how to negotiate the trade-offs among those constraints.

2.2 Making Your SLA

SLA construction is a negotiation process. The negotiation is not so much among groups of people as between one team of people and the immutable conservation principles of physical law that require a cost for every benefit. In the process, you try to sacrifice only those things that you don’t really need so that you can afford all those things you really do need. People who buy groceries have to do this every day. So do computer system architects.

2.2.1 Participants

The SLA is a binding contract between information providers (like your information services department and your application development departments) and information consumers (like your end users’ departments). To create an SLA that works, you need dedicated participation from the following people:

• System architect: The project leader for SLA construction is usually an experienced system architect, either from your staff or from an outside consulting firm. This leader is responsible for managing the SLA construction project or sub-project.

• Operations manager(s): These participants must have the authority to make cost-benefit compromise decisions for the entire information services department. They must understand the economic implications of user requirements upon the operational complexity of the system.

• End-user manager(s): These participants must have the authority to make cost-benefit compromise decisions for the entire end-user community. They must understand the economic implications of system operational restrictions on the business.

• Oracle Server technician(s): These participants must understand the power and limitations of the Oracle Server system upon which your applications are built. They must understand the economic implications of your proposed throughput, availability, volume, load, and data locality requirements on the server.

• Application technician(s): These participants must understand the technical architecture of the applications being considered. These could be vendors, end users, or people from your own applications development department. They must understand the economic implications of your requirements upon the design (or redesign) of the applications.

• Hardware technician(s): You may or may not require dedicated hardware specialists for the duration of the process. Throughout the SLA construction project, however, you will require information about the CPU, disk, memory, bus, and network capacities of any hardware being considered.

Building an SLA will require dedicated time from some of your company’s busiest and most important people. Your return on investment will be the information you need to optimize your trade-off decisions as you design your system. A good SLA gives you the specific, realistic specifications and tolerances you need to measure your information services department’s success.

2.2.2 Format

As you build your SLA, remember the give-and-take, trade-off orientation of a cost-benefit analysis. For each category of service provided to your end users, ask the questions:

• What must the end users expect of the information services department?

• What must the information services departmentexpect of the end users?

Figure 3 worksheet:

Technical response commitment: Fast responsiveness
  What the end-users must expect:
    • Lowest possible cost by intelligent use of resources
    • Response times guaranteed to fall below negotiated limits, as a function of required throughput
  What the IS department must expect:
    • Funding for sufficient high-performance hardware
    • Resource consumption rates guaranteed to fall below negotiated limits, as a function of response time requirements

Technical response commitment: High availability
  What the end-users must expect:
    • Uptime performance guaranteed to meet or exceed negotiated limits, as a function of business need
    • Downtime guaranteed to fall below negotiated limits, as a function of business need
  What the IS department must expect:
    • Realistic uptime requirements that allow for scheduled maintenance
    • Funding for fault-resilient hardware, software, procedures, and people
    • Realistic downtime requirements that allow for unscheduled repairs
    • Transaction durations guaranteed to fall below a negotiated limit

Figure 3. A worksheet like the one shown here can help you construct your SLA. The format may remind you of a T-account worksheet if you have some accounting in your background. The principal idea of this format is to encourage the matching of complementary expectations among the departments involved. You may need another worksheet like this to help construct the service level requirements between your IS department and your application development department, another for your IS department and your hardware vendors, and so on.


The worksheet in Figure 3 shows some of the trade-offs that should be communicated in the SLA.

There is a comprehensive SLA document description and outline in [Kern and Johnson]. The technical topics you will see in this paper fall predominantly into the “Technical Response Commitments” section of that outline.

3. Understanding Your Technology

To create a feasible SLA, you must understand your technology. Is it feasible to “require” that your system be as fast when a thousand users are logged in as when you’re the only one on the system? Maybe, maybe not. Is it feasible to “require” that your Oracle Server system never be down longer than an hour when one of your disks crashes? Maybe, maybe not. The following sections will help you understand the factors involved in determining whether requirements like these are feasible for your system.

3.1 Functional Requirements

People who build or buy software generally understand very well that some of the most important selection criteria have to do with whether the application does the things it’s supposed to do. An accounts payable system has to write checks properly, or you wouldn’t even consider it. An electronic mail system has to handle unexpected system outages without losing messages, or you wouldn’t consider it.

We will not discuss these kinds of functional constraints in this paper. Instead, we will focus on performance. Performance is a very important functional requirement that is often overlooked, because forecasting it is a difficult technical challenge.

3.1.1 Performance

An on-line user measures interactive performance as response time. This is the time required to see your information after pressing the “enter query” key, or the time it takes to regain control of your application after pressing the “commit” key. A user has different response time expectations for different application functions. For example, interactive validation of an Oracle Accounts Payable accounting flexfield should happen almost instantaneously. A General Ledger trial balance report, by contrast, may perform acceptably well if it prints within a half hour of when you submit it.

Total response time for any system component is the sum of two parts. The first is the time consumed by that component to execute your request (called the service time). The second is the time spent waiting for that component’s attention (called queueing delay). For example, if you request to read a byte on a disk drive that is busy serving another request, your request must wait for that request to finish. Your total response time is the time spent doing your work plus the time spent waiting. The waiting gets worse as the popularity of the resource increases.


The branch of mathematics called queueing theory gives us some of the tools we need to compute expected wait times. Sophisticated formulas in excellent texts like [Jain] and [Menascé et al.] can forecast your response times if you can feed them your service times and your throughput requirements. Oracle consultants use these formulas with our system measurement tools in our system design and capacity planning projects (Figure 4).
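The simplest textbook queueing model makes the idea concrete. This is a single-server M/M/1 queue, which assumes Poisson arrivals and exponentially distributed service times. It is a swapped-in illustration, not the model Oracle consultants use in Figure 4. The 20 ms service time is hypothetical. The model shows how response time R = S / (1 - U) climbs as utilization U = (arrival rate × S) grows. Inverting the formula shows how much throughput a component can carry before it breaks a response time requirement.

```python
def mm1_response_time(service_time: float, arrival_rate: float) -> float:
    """Mean M/M/1 response time R = S / (1 - U), where U = arrival_rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("utilization >= 100%: the queue grows without bound")
    return service_time / (1.0 - utilization)


def max_arrival_rate(service_time: float, required_response: float) -> float:
    """Highest arrival rate whose M/M/1 response time still meets the requirement."""
    if required_response < service_time:
        return 0.0  # requirement is tighter than the unloaded service time
    return (1.0 - service_time / required_response) / service_time


S = 0.020  # hypothetical unloaded service time: 20 ms per disk call
for rate in (10, 25, 40, 45):
    r = mm1_response_time(S, rate)
    print(f"{rate:>3} calls/s: response {r * 1000:5.1f} ms "
          f"(queueing delay {(r - S) * 1000:5.1f} ms)")

print(f"Max rate for a 50 ms requirement: {max_arrival_rate(S, 0.050):.1f} calls/s")
```

In this model, response time doubles at 50% utilization and grows tenfold at 90%, which is why capacity plans keep busy components well short of saturation.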

An application end-user response time is the sum of several component response times, including:

• presentation management: CPU service time, I/O service time, and queueing delays (for multi-user clients)
• network: service time and queueing delay
• transaction processing monitor: service time and queueing delay
• database server: CPU service time and queueing delay; I/O service time and queueing delay

Understanding this list will help you create feasible performance requirements in your SLA.
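The list above amounts to a response time budget. This minimal sketch uses invented timings for each component to show how the pieces add up and where the time goes. All names and numbers are hypothetical. In practice, each figure would come from measurement or from a queueing model.

```python
# Hypothetical timings (seconds) for one interactive request:
# component -> (service time, queueing delay)
components = {
    "presentation management": (0.050, 0.000),
    "network": (0.030, 0.010),
    "transaction processing monitor": (0.005, 0.005),
    "database server CPU": (0.120, 0.060),
    "database server I/O": (0.200, 0.120),
}


def end_user_response_time(parts: dict) -> float:
    """Sum of each component's service time plus its queueing delay."""
    return sum(service + queueing for service, queueing in parts.values())


def largest_contributor(parts: dict) -> str:
    """Component with the largest service-plus-queueing total."""
    return max(parts, key=lambda name: sum(parts[name]))


requirement = 1.0  # hypothetical SLA limit, seconds
total = end_user_response_time(components)
verdict = "meets" if total <= requirement else "misses"
print(f"total {total:.3f} s against a {requirement:.1f} s requirement: {verdict} it")
print(f"largest contributor: {largest_contributor(components)}")
```

Breaking the total down this way shows which component to tune or upgrade first when a requirement is missed.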

• Presentation management: Will your end users’ PCs have fast enough CPUs and I/O subsystems to run your application front end? The easiest way to predict this is to test and measure a prototype application.

• Network: Will your network give you fast enough communication between the application front end and the database server? Will your network have sufficient throughput capacity to meet your response time requirements as concurrency hits peak levels?

• Transaction processing monitor: Using a TP monitor will increase the number of instructions that your system must execute for each transaction (your code path). Will the resulting service times grow your total response times beyond your tolerances? You may have to make up the difference by buying faster network or database server hardware.

• Database server: Will your database server be fast enough to meet your response time requirements? Different CPU architectures are optimized for different requirements. For example, fast single-processor systems execute large single-threaded transactions quickly, but symmetric multiprocessing (SMP) computers generally handle higher OLTP throughput. What about I/O? Will your disk drives be fast enough to meet single-user response time requirements? Will you have enough controllers and drives that performance won’t degrade under peak loads?

Figure 4. This is a screen shot from the queueing model that Oracle consultants use to forecast response time as a function of throughput. Model parameters include the unloaded service time and the response time requirement for the component being modeled.

Application design trade-offs pervade each of these areas. You will save money on client hardware if you use a simpler character-based interface instead of a graphical user interface (GUI), but your best economic benefit may be to spend money to improve user productivity. You may be able to save money on network hardware by configuring your application to generate less network traffic, but your best economic benefit may be to ensure fast, reliable remote access to data. You may be able to save money on database server hardware by tuning your application to reduce code path length and I/O call counts, but new hardware may cost less than tuning your application.

If you’ve ever tried to compress a water balloon with your hands, then you have a good mental image of how you can squeeze here or you can squeeze there, but you can’t squeeze everywhere at once. You have to live within your technology’s constraints. You can’t compress water, and you can’t get more out of your system than its peak capacity. You’ve got to decide what your goals are, understand your trade-offs, and invest intelligently.

3.2 Operational Constraints

It’s easy to list the things you want; it’s harder to list what you can afford. Operational realities of technology constrain your functional requirements. Accurately documented operational constraints will keep the functional requirements in your SLA from raging beyond what your information services department can actually provide.

To identify operational constraints accurately, you need experienced people who understand the complex operational requirements of Oracle applications. In this section we will discuss some of the operational constraints you must know to construct a feasible SLA.

3.2.1 Availability

An operations manager measures system availability with two quantities, MTTF and MTTR. Mean time to failure (MTTF) is a measure of how long a system or component is on-line and ready for action. Mean time to repair (MTTR) is a measure of how long it takes to repair and restart a system or component once it goes down. MTTF is average uptime, and MTTR is average downtime.
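The two quantities combine into the standard steady-state availability ratio, MTTF / (MTTF + MTTR). The sketch below uses hypothetical numbers. It shows that, for a fixed MTTF, shortening MTTR raises availability directly. That is why the recovery-process analysis later in this section matters.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time up, MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


MTTF = 4000.0  # hypothetical mean time to failure, hours
for mttr in (24.0, 8.0, 1.0):  # hypothetical repair times, hours
    print(f"MTTF {MTTF:,.0f} h, MTTR {mttr:g} h -> availability {availability(MTTF, mttr):.4%}")
```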

To specify structured, precise, and feasible availability requirements, you must understand all the ways your system can fail [Gray and Reuter].² For example:

• environment: fire, flood, network, electrical failure
• hardware: physical device failure (CPU, disk, etc.)
• maintenance: hardware repair errors
• operations: configuration, administration errors
• software: O/S, DBMS, and application bugs
• process: data entry error, labor disputes, etc.

For each of the faults that can bring your system down, you must understand the process required to recover from that fault before you can create a feasible downtime requirement. And you must understand the MTTF for each of your system’s components before you can create a feasible uptime requirement.

Let’s walk through the recovery process for an example unplanned outage. Let’s explore disk drive failure. How often can you expect this to occur, and what is the expected impact of each outage?

The MTTF of a non-RAID disk farm is easy to calculate once your hardware vendor tells you the MTTF rating of each of the disks. If each drive had a 200,000-hour MTTF,³ then a 50-drive system would have a 4,000-hour MTTF, roughly five and a half months.⁴ This means you could expect about two unplanned disk crashes a year without RAID. MTTF for the same 50 disks protected by a RAID level 5 configuration would be over 3,000 years [Chen et al.].

² You should not embark upon designing a fault-tolerant transaction processing system without owning and studying the Gray and Reuter book.
³ Ask your disk drive vendor about the MTTF for specific models. These figures vary dramatically for different disk drives.
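The non-RAID arithmetic can be checked in a few lines. This sketch assumes independent failures with constant failure rates. Under that assumption, a farm that goes down when any single drive fails has an MTTF equal to the per-drive MTTF divided by the number of drives. This is the standard result for a series system. The sketch does not attempt the RAID level 5 figure, which depends on parity-group size and repair time.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def series_mttf(component_mttf_hours: float, count: int) -> float:
    """MTTF of `count` identical components when any single failure is an outage."""
    return component_mttf_hours / count


farm_mttf = series_mttf(200_000, 50)            # 4,000 hours
months = farm_mttf / (HOURS_PER_YEAR / 12)      # about 5.5 months
crashes_per_year = HOURS_PER_YEAR / farm_mttf   # about 2.2

print(f"farm MTTF {farm_mttf:,.0f} h = {months:.1f} months; "
      f"{crashes_per_year:.2f} crashes/year expected")
```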

Now let’s calculate disk crash MTTR.⁵ This requires an understanding of the recovery process. Let’s assume that you’ve wisely invested in the operational overhead of scheduled hot backups that will allow you to recover your database to your users’ most recent commit. The recovery process then looks roughly like the following:

1. Shut down the Oracle application and the Oracle Server.
2. If the disk can be repaired, repair the disk. Otherwise, salvage all the files you can from the corrupt disk, replace the disk drive, and replace the salvaged files.
3. Restore from tape the most recent hot backup of the Oracle database files that were corrupted.
4. Restore all of the archived redo log files generated since that hot backup.
5. Initiate the Oracle Server startup process.
6. Wait for Oracle to roll forward through the archived redo logs.
7. Wait for Oracle to roll back all pending uncommitted transactions.
8. Notify your users that they may resume their work.

By practicing this process on a prototype system, you will learn how long each of these tasks will take in your environment. You will also learn how to perform some of these tasks simultaneously to save time. Disk failure MTTR will be the elapsed duration from the start of the first task to the finish of the last.
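Computing MTTR from a practiced recovery is a small critical-path calculation, because tasks that can overlap don't add to the elapsed time. In the sketch below, every duration and dependency is hypothetical. So is the assumption that archived redo logs can be restored while the failed drive is being replaced. The code also assumes the task graph has no cycles.

```python
# Hypothetical recovery plan: task -> (duration in hours, prerequisite tasks)
TASKS = {
    "shut down application and server": (0.25, []),
    "repair or replace disk": (2.0, ["shut down application and server"]),
    "restore archived redo logs": (1.0, ["shut down application and server"]),
    "restore hot backup files": (3.0, ["repair or replace disk"]),
    "start Oracle Server": (0.25, ["restore hot backup files", "restore archived redo logs"]),
    "roll forward": (1.5, ["start Oracle Server"]),
    "roll back uncommitted work": (0.5, ["roll forward"]),
    "notify users": (0.25, ["roll back uncommitted work"]),
}


def earliest_finish_times(tasks: dict) -> dict:
    """Earliest finish of each task if it starts as soon as its prerequisites finish."""
    finish = {}

    def visit(name):
        if name not in finish:
            duration, prereqs = tasks[name]
            start = max((visit(p) for p in prereqs), default=0.0)
            finish[name] = start + duration
        return finish[name]

    for name in tasks:
        visit(name)
    return finish


def mttr_estimate(tasks: dict) -> float:
    """Elapsed time from the start of the first task to the finish of the last."""
    return max(earliest_finish_times(tasks).values())


serial = sum(duration for duration, _ in TASKS.values())
print(f"one task at a time: {serial:g} h; with overlap: {mttr_estimate(TASKS):g} h")
```

A project planning tool performs the same calculation. The code just makes the rule explicit: MTTR is the longest dependency chain, not the sum of all tasks.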

⁴ See Gray and Reuter’s equation 3.5.
⁵ The Oracle market trend today is to use mirrored disks, which, as we’ve seen, have extraordinarily high MTTF ratings. Some RAID system managers consider the reliability to be so good that they don’t even formulate a disk recovery plan. Although RAID reliability is excellent, RAID systems can still crash and require Oracle Server media recovery. So even if you will use RAID, you should still analyze your disk corruption MTTR.

Note that each task in the recovery process outlined above bears risk. For example, what if one of the tapes you need for restoration is corrupt? It happens. When it does, you must use an alternate recovery plan that has a much higher MTTR, because it requires users to redo work they thought they’d finished. Using tape reliability (MTTF) statistics, you can calculate how often you can expect this event to happen to you. The economic impact of this possibility may motivate you to multiplex your tape backups. You must analyze each step in your recovery process for possible surprises like this.
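How much does multiplexing tape backups help? Assume, for simplicity, that tape failures are independent. The chance that a recovery hits at least one unreadable tape is then 1 - (1 - p)^n for n needed tapes, where p is the chance that any one tape is unreadable. Keeping k independent copies of each tape replaces p with p^k. The per-tape failure probability and tape count below are invented. The real inputs are your own tape reliability statistics.

```python
def p_restore_fails(p_bad_tape: float, tapes_needed: int, copies: int = 1) -> float:
    """Probability that at least one needed tape is unreadable in every copy.

    Assumes tape failures are independent and every copy has the same failure probability.
    """
    p_all_copies_bad = p_bad_tape ** copies
    return 1.0 - (1.0 - p_all_copies_bad) ** tapes_needed


p = 0.01     # hypothetical chance that any one tape is unreadable
needed = 20  # hypothetical number of backup and archived-log tapes for one recovery
print(f"single copy: {p_restore_fails(p, needed):.2%}")
print(f"two copies:  {p_restore_fails(p, needed, copies=2):.2%}")
```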

As you study the recovery process for the disk failure event, you’ll notice that MTTR depends heavily upon economic trade-off decisions. You can reduce MTTR for any event by incurring cost somewhere else. Depending on your desired level of investment, your MTTR for disk crash outages might be anything from minutes to hours.

In our disk crash example, you could reduce MTTR by reducing the elapsed time of the roll-forward step: simply take hot backups more frequently. What are the costs? You would have to buy more tapes; you’d have to construct more complex tape management procedures; you’d pay increased backup process labor costs; and you would endure marginally degraded database performance more often during the more frequent backups.

Your information providers’ real job is to use ingenuity to find ways to reduce the cost of your benefits. For example, you can get many of the benefits of more frequent hot backups by backing up your most heavily updated Oracle tablespaces more often than you back up your infrequently updated ones.

For each type of system failure you can encounter, have your vendors help you compute MTTF. To compute MTTR, create a miniature project plan for the repair process using a project planning tool. After you quantify your reliability (MTTF) and service interruption (MTTR) statistics, you can quantify the economic impact of the requirements that you document in your SLA. As you implement your system, test your estimates and refine your SLA to reflect the actual facts of your experience.
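Once each failure type has an MTTF and an MTTR, expected outage hours per year follow directly. They equal failures per year (hours in a year divided by MTTF) times MTTR. Multiplying by a cost per outage hour turns the SLA's availability requirements into money. The failure types, MTTF and MTTR figures, and dollar cost below are all hypothetical placeholders.

```python
HOURS_PER_YEAR = 24 * 365


def expected_annual_outage(mttf_hours: float, mttr_hours: float) -> tuple:
    """Return (expected failures per year, expected outage hours per year)."""
    failures = HOURS_PER_YEAR / mttf_hours
    return failures, failures * mttr_hours


# Hypothetical failure types: name -> (MTTF hours, MTTR hours)
failure_types = {
    "disk crash, non-RAID": (4_000.0, 8.0),
    "O/S or DBMS software fault": (20_000.0, 2.0),
}
cost_per_outage_hour = 10_000.0  # hypothetical business cost, dollars

total_hours = 0.0
for name, (mttf, mttr) in failure_types.items():
    failures, hours = expected_annual_outage(mttf, mttr)
    total_hours += hours
    print(f"{name}: {failures:.2f} failures/yr, {hours:.1f} outage h/yr")
print(f"expected annual outage cost: ${total_hours * cost_per_outage_hour:,.0f}")
```

Comparing this figure with the price of a shorter MTTR or a longer MTTF is the cost-benefit negotiation the SLA process calls for.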

3.3 Economic Needs

Your users want 7 x 24 system access. Your operations staff say no way unless you’re willing to spend several million dollars each year on the system. Does your business need real 7 x 24 access? If your system is a general ledger for a company with offices in two mainland USA time zones, then probably not. If your system manages over 10 million reservations per day for a global airline with customers calling from every time zone on the planet, then definitely yes.

Economic need (your company’s unique mixture of investment aggressiveness and cost consciousness) determines the right answer to trade-off decisions. Do you go with a less expensive disk system with a 6-month MTTF, or do you buy a more expensive system with a 3,000-year MTTF? What do you do if your system is too slow? Do you tune your software, upgrade your hardware, or negotiate relaxed functional requirements? These are all economic decisions, and the ingenuity you need to make them will come from understanding your technology.

Understand your technology before you aspire to make optimal economic decisions about your system.

The best way to understand your technology is to get hands-on experience with it as quickly as possible. Don’t assume that you understand your system’s response time performance until you’ve tested it under varying throughput loads. Don’t dare assume that you understand your system’s recovery processes until you’ve studied and tested them at your site. You wouldn’t assume that a custom-built or prepackaged application does what you need without studying it. Don’t treat your other functional and operational specifications any less seriously.

4. Designing Your System

After you have specified your cost-benefit balanced requirements in your preliminary SLA, you are ready to begin choosing hardware. The following sections will orient you to many of the issues you must consider when selecting your system components.

4.1 Disk

Specifying disk hardware is not just an exercise in figuring out how much data you will need to store. In addition to data volume requirements, you must consider:

• How many I/O calls per second will your application generate? You must design your system to have enough disk drives and controllers that your I/O subsystem won’t cause unacceptable delays as you increase the load on your system. Measure your application or a prototype to learn your throughput requirements. Ask your hardware vendor about your disk drive and controller specifications. You must buy enough devices to keep individual component utilizations at acceptably low levels [Jain].

• How much hardware fault resilience is required to meet your uptime and downtime requirements? Can you withstand the failure rate of unmirrored disks, or do you need to purchase a more expensive RAID configuration?
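A first cut at the drive count follows from the utilization law. Each call keeps a drive busy for its service time. Dividing the total busy time per second by an acceptable per-drive utilization therefore gives the number of drives. The sketch assumes perfectly even load across drives, which real systems achieve only with deliberate file placement. The workload figures and the 50% ceiling are hypothetical, and controllers need the same treatment.

```python
import math


def drives_needed(io_calls_per_s: float, service_ms: float, max_utilization: float) -> int:
    """Minimum number of drives that keeps average per-drive utilization at or below the target.

    Assumes the I/O load is spread evenly across all drives.
    """
    busy_drive_seconds_per_s = io_calls_per_s * service_ms / 1000.0
    return math.ceil(busy_drive_seconds_per_s / max_utilization)


# Hypothetical workload: 600 I/O calls/s at 10 ms per call, 50% utilization ceiling
print(drives_needed(600, 10, 0.5))
```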

4.2 Memory

Techniques for measuring memory consumption are well documented [Loukides]. Although memory is one of the easiest system components to specify accurately, a few factors make the task tricky in spots:

• All operating systems give you the ability to measure the size of a program’s text, data, and stack areas. UNIX, for example, gives you the size command. However, most well-designed, complex applications allocate memory dynamically at run time, in proportion to the memory needs of a particular user’s invocation.6 You cannot forecast dynamic memory consumption using a static command like size; you must measure this memory usage operationally on a system in use. If your system doesn’t exist yet, then you must measure using a prototype, or you’re confined to making educated guesses.

• It may be difficult to get an accurate report of actual memory usage. For example, many implementations of the UNIX ps command do not report memory consumption accurately. Most vendors have tools unique to their systems for measuring memory consumption, but you often have to ask specifically for these tools before you get to use them.

6 If you’re a C programmer, we’re talking here about programs that do a lot of malloc() function calls.


• You must understand your operating system’s implementation of memory sharing for program text. Your hardware vendor or your Oracle consultant can help you understand how to count memory accurately without double-counting shared segments.
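
To illustrate the double-counting problem, here is a minimal Python sketch that estimates memory for many concurrent users two ways: by naively summing per-process totals, and by counting shared program text once plus private memory (data, stack, and dynamically allocated memory) once per process. The per-process figures are hypothetical; in practice they would come from measuring a running system or prototype. It assumes every user process runs the same executable and that your operating system shares its text, and it leaves out the operating system itself and any shared memory segments used by the database server, which you would add separately.

```python
from dataclasses import dataclass


@dataclass
class ProcessProfile:
    """Hypothetical per-process memory figures, in megabytes."""
    text_mb: float     # program text, shared by all processes running the same executable
    private_mb: float  # data + stack + dynamic (malloc) memory measured under load


def naive_total_mb(profile: ProcessProfile, processes: int) -> float:
    """Summing per-process totals: shared text gets counted once per process."""
    return processes * (profile.text_mb + profile.private_mb)


def shared_aware_total_mb(profile: ProcessProfile, processes: int) -> float:
    """Shared text counted once; private memory counted once per process."""
    return profile.text_mb + processes * profile.private_mb


# Hypothetical: 200 users, each running its own copy of the application.
app = ProcessProfile(text_mb=4.0, private_mb=1.5)
print(f"naive estimate: {naive_total_mb(app, 200):.1f} MB")
print(f"shared-aware:   {shared_aware_total_mb(app, 200):.1f} MB")
```

The gap between the two estimates grows with both the user count and the size of the shared text, which is why an accurate count matters most on large multi-user systems.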

4.3 CPU

CPU is the most difficult system component to size. The root of the difficulty is that no useful transaction unit exists for CPU specifications. With disk drives, we can compare milliseconds-per-call response time specs and bytes-per-second throughput specs directly to your application’s requirements. With networks, we have milliseconds-per-transmission response time specs and bits-per-second throughput specs. But with CPUs, all we have are benchmark statistics that allow you to compare one CPU to another. These numbers don’t reliably tell you what kind of CPU you need to buy for your unique application.

Computing a system’s CPU response time and throughput capacities is expensive. The lowest-risk method is to measure a system’s performance while it is running your application with your setup and your configuration. You can reduce your cost for this information with prototypes and models if you know your technology well enough to minimize the risk of missing something important.
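
One common way to turn such a measurement into a CPU requirement is the service demand law: if the CPUs are busy a fraction U of the time while the system completes X transactions per second, each transaction consumes about D = U × (number of CPUs) / X CPU-seconds. The Python sketch below applies it. The measured values and the 70% utilization ceiling are hypothetical, and the result only holds for CPUs of the same type as the measured system, under the assumption that CPU work spreads evenly across processors.

```python
import math


def cpu_demand_per_txn(utilization: float, cpus: int, txn_per_sec: float) -> float:
    """Service demand law: CPU-seconds consumed per transaction.

    utilization is the average busy fraction across all CPUs (0..1), so the
    total CPU-seconds of work done per second is utilization * cpus.
    """
    return utilization * cpus / txn_per_sec


def cpus_needed(demand_sec: float, target_txn_per_sec: float, max_util: float) -> int:
    """CPUs of the same type as the measured system needed to serve the
    target rate while keeping average utilization at or below max_util."""
    return math.ceil(demand_sec * target_txn_per_sec / max_util)


# Hypothetical prototype measurement: 4 CPUs, 60% busy, 40 transactions/s.
d = cpu_demand_per_txn(0.60, 4, 40.0)
print(f"CPU demand per transaction: {d * 1000:.0f} ms")
print(f"CPUs for 150 txn/s at <= 70% busy: {cpus_needed(d, 150.0, 0.70)}")
```

The utilization ceiling is there because response times rise steeply as utilization approaches 100%; sizing to an average of 100% busy leaves no room for peaks.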

Guessing is a high-risk method that has worked successfully for 75% or more of our customers. Guessing works for you if you can be certain that the machine you buy will easily outperform your requirements. If your requirements are sufficiently easy to meet, it may cost less to buy an overpowered computer than to figure out exactly how much power you really need. But if your requirements are tough, guessing won’t be good enough.

Prototype testing yields the best cost-benefit balance for reducing risk at sites that need the most from their technology. Several tools and services exist to help you accomplish this testing, including:

• Measurement and recording tools let you see the service and delay times of your application’s resource requests. Many vendors supply these tools. Oracle consultants use almost all of these tools in addition to several produced by Oracle’s System Performance Group.

• Remote terminal emulators let you simulate user loads without having to have several hundred of your closest friends come use your system while you measure it.

• Queueing models allow you to predict the response time degradation effects of adding load onto a system. A portion of Oracle’s queueing model tool is shown in Figure 4.

• Discrete event simulators forecast the behavior of your system given different configurations.

• And several service providers, including Oracle, can bring technology experience to your project to help you get the information you need at the lowest long-term cost.
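
For a flavor of what a queueing model computes, the Python sketch below uses the textbook M/M/m model (random arrivals, exponentially distributed service times, m identical servers such as CPUs) and the Erlang C formula to estimate mean response time as load increases. This is a generic model chosen for illustration, not Oracle’s queueing model tool, and the 60 ms service time and four-CPU configuration are hypothetical.

```python
import math


def erlang_c(m: int, offered_load: float) -> float:
    """Probability that an arriving request must wait in an M/M/m queue.

    offered_load a = arrival_rate * service_time; requires a < m.
    """
    if offered_load >= m:
        raise ValueError("saturated: offered load must be below the server count")
    # Sum of a^k / k! for k = 0 .. m-1
    summation = sum(offered_load ** k / math.factorial(k) for k in range(m))
    top = offered_load ** m / math.factorial(m) * (m / (m - offered_load))
    return top / (summation + top)


def mmm_response_time(arrival_rate: float, service_time: float, m: int) -> float:
    """Mean response time (queueing delay plus service) for an M/M/m system."""
    a = arrival_rate * service_time
    mean_wait = erlang_c(m, a) * service_time / (m - a)
    return service_time + mean_wait


# Hypothetical: 4 CPUs, 60 ms of CPU service per transaction.
for rate in (20, 40, 50, 60, 65):
    r = mmm_response_time(rate, 0.060, 4)
    print(f"{rate:3d} txn/s  utilization {rate * 0.060 / 4:4.0%}  response {r * 1000:6.1f} ms")
```

With one server the formula reduces to the familiar M/M/1 result, response time = service time / (1 − utilization), which makes plain why response time climbs sharply near full utilization.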

In addition to performance considerations, you must also consider computer node fault resilience. You must analyze your CPU system MTTF and MTTR statistics, as we’ve done in this paper for disk drives, to determine whether your business will require redundant computer nodes to minimize the impact of unplanned CPU outages.
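
The same MTTF and MTTR arithmetic applies to computer nodes. The minimal Python sketch below computes steady-state availability of one node as MTTF / (MTTF + MTTR) and treats a redundant pair as unavailable only when both nodes are down at once. It assumes independent node failures and that either node can carry the full load alone, assumptions that shared components and failover software often violate; the MTTF and MTTR figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time a single node is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(single: float, nodes: int) -> float:
    """Up if at least one of `nodes` independent, identical nodes is up."""
    return 1.0 - (1.0 - single) ** nodes


# Hypothetical node: fails on average every 5,000 hours, 8 hours to repair.
a1 = availability(5_000, 8)
a2 = redundant_availability(a1, 2)
for label, a in (("single node", a1), ("two nodes", a2)):
    downtime = (1 - a) * HOURS_PER_YEAR
    print(f"{label}: availability {a:.6f}, about {downtime:.2f} h/yr unplanned downtime")
```

Failover time is itself downtime and is not modeled here, so treat the two-node figure as an upper bound on what redundancy can deliver.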

4.4 Network

Specifying network hardware is much like specifying disk hardware. You must consider:

• How fast must the network’s response time be to supply satisfactory round-trip response times to applications making network requests?

• How many transmissions will your application require for each database transaction? You must design your network’s throughput capacity to meet your response time requirements as concurrency hits peak levels.

• How much additional network hardware will you need to purchase to meet your network availability goals? Answering this requires an MTTF and MTTR analysis similar to the analysis for disk shown earlier.
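
A rough first estimate of network time per database transaction multiplies the number of round trips by the cost of each one. The Python sketch below assumes each round trip costs a fixed round-trip latency plus serialization time (bytes exchanged × 8 divided by link bandwidth), and separately checks link utilization at a peak transaction rate. It ignores queueing delay on the link and protocol overhead, and every number in it is hypothetical.

```python
def network_time_per_txn(round_trips: int, rtt_ms: float,
                         bytes_per_trip: int, bandwidth_bps: float) -> float:
    """Milliseconds of network time per transaction, excluding queueing delay.

    Each round trip pays the fixed round-trip latency plus the time to put
    its bytes on the wire (8 bits per byte divided by link bandwidth).
    """
    serialization_ms = bytes_per_trip * 8 / bandwidth_bps * 1000.0
    return round_trips * (rtt_ms + serialization_ms)


def link_utilization(txn_per_sec: float, round_trips: int,
                     bytes_per_trip: int, bandwidth_bps: float) -> float:
    """Fraction of link capacity consumed at the given transaction rate."""
    return txn_per_sec * round_trips * bytes_per_trip * 8 / bandwidth_bps


# Hypothetical: 12 round trips per transaction, about 2,000 bytes each,
# on a 10 Mbit/s LAN with 1 ms round-trip latency; peak of 30 txn/s.
per_txn = network_time_per_txn(12, 1.0, 2_000, 10e6)
peak_util = link_utilization(30.0, 12, 2_000, 10e6)
print(f"network time per transaction: {per_txn:.1f} ms")
print(f"link utilization at 30 txn/s: {peak_util:.0%}")
```

Both results scale with the number of round trips, which is why the transmissions-per-transaction question above matters as much as raw bandwidth.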

Designing your network will require the services of an experienced network engineer to meet all of your networking goals. Understanding the Oracle application and database server requirements upon the network will require Oracle-specific experience.

5. Conclusions

In this paper, we’ve developed the following line of reasoning:

• Your computer system is much too important to risk measuring your satisfaction subjectively.

• To measure your satisfaction objectively, you must specify your expectations and tolerances, and then continually measure them against reality as expectations and reality evolve.

• To understand your expectations, you should write them down in a document called your service level agreement (SLA).

• Your expectations must be structured, precise, and feasible, or you’ll encounter an unrelenting barrage of nasty and expensive surprises.

• To create structured, precise, feasible expectations, your company’s best people must participate.

• You must understand your technology before you aspire to make optimal economic decisions about your system.

In addition to these principles, we’ve discussed specific technical issues, including:

• You have seen an approach to structuring your SLA for Oracle applications so that you will not overlook categories of technical response commitments.

• You have seen how to evaluate the feasibility of Oracle system performance and availability requirements.

• You have seen how to begin the selection process for CPU, disk, memory, and network hardware to suit your expectations.

• You have seen references to further reading from which you can learn more about the topics presented here.

6. References

BOAR, B. The Art of Strategic Planning for Information Technology. John Wiley & Sons, New York NY, 1993.

CHEN, P.; LEE, E.; GIBSON, G.; KATZ, R.; PATTERSON, D. “RAID: high-performance, reliable secondary storage” in ACM Computing Surveys, Vol. 26 No. 2, June 1994.

GRAY, J.; REUTER, A. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, San Francisco CA, 1993.

JAIN, R. The Art of Computer Systems Performance Analysis. John Wiley & Sons, New York NY, 1991.

KERN, H.; JOHNSON, R. Rightsizing the New Enterprise. PTR Prentice Hall, Englewood Cliffs NJ, 1994.

LOUKIDES, M. System Performance Tuning. O’Reilly & Associates, Sebastopol CA, 1991.

MENASCÉ, D.; ALMEIDA, V.; DOWDY, L. Capacity Planning and Performance Modeling. PTR Prentice Hall, Englewood Cliffs NJ, 1994.

About the Author

Cary Millsap is the director of Oracle’s System Performance Group, a part of the Oracle Services Advanced Technologies group. The team is responsible for building new tools and capabilities like the ones described in this paper for Oracle and its customers. The System Performance Group provides system design, capacity planning, and performance management services to customers worldwide.

Since joining Oracle in 1989, Mr. Millsap has worked with over 100 Oracle customers. He has published several papers, has developed and taught internationally acclaimed courses, and is the author of Oracle’s Optimal Flexible Architecture (OFA) standard.