
PERFORM YOUR SOFTWARE | CROWDSOURCING CASE | CLOUD WHIZ RANDY HAYES

VOLUME 6 • ISSUE 10 • OCTOBER 2009 • $8.95 • www.stpcollaborative.com

Six Sigma In Five Easy Pieces

Stop Leaks in Your Custom Interfaces

Threading: The Good, The Bad And the Ugly

Simple Rules For Safety-Critical Code, page 12


contents
VOLUME 6 • ISSUE 10 • OCTOBER 2009

12 MAIN FEATURE
Power of 10: Rules for Safety-Critical Code
In life-critical software, undiscovered bugs can be fatal. These simple rules can improve the quality and reliability of any application. By Gerard Holzmann and Michael McDougall

18 Stop Interface Leaks
Connecting new systems to an existing backend? Here's a look at common points of failure and ways to fix or prevent them. By Nels Hoenig

22 Three Sides of Threading
Threading is a great way to improve application performance. But it can also bring it to a screeching halt. Learn how to do one without the other and how to test for both. By Claire Cates

30 Six Sigma in Five Pieces
Six Sigma methods are designed to spot tiny variations in quality before they get big. The techniques also apply to software quality. Here's how. By Jon Quigley and Kim Pries

DEPARTMENTS

6 Editorial
Dead-ends in a user interface are annoying for the user, and should be avoided at all costs. Find out about a major service provider that missed a few biggies.

7 Out of the Box
News of the testing industry and products of interest to testers.

8 ST&PediaIs software develop-ment and testing aperformance art? Andif it were, how mightthe lingo cross over? By Matt Heusser & Chris McMahon

10 CEO Views
Software performance guru Randy Hayes talks about his humble beginnings, ties to Microsoft's founders, cloud technologies and the founding of his own company, Capacity Calibration. By Andrew Muns

34 Case Study
The phenomenon of crowdsourcing is helping IT departments get over the hump without bleeding jobs. By Joel Shore

4 • Software Test & Performance

Page 5: Stop Leaks in Your Custom Interfaces - Excelon Development · founders, cloud tech-nologies and the founding of his own company, Capacity Calibration. By Andrew Muns 34 Case Study

www.stpcollaborative.com • 5

A Publication of Redwood Collaborative Media

GET MORE ONLINE AT

Software Test & Performance (ISSN- #1548-3460) is published monthly by Redwood Collaborative Media, 105 Maxess Avenue, Suite 207, Melville, NY, 11747.Periodicals postage paid at Huntington, NY and additional mailing offices. The price of a one year subscription is US $69 for postal addresses within North America;$119 elsewhere. Software Test & Performance is a registered trademark of Redwood Collaborative Media. All contents copyrighted 2009 Redwood CollaborativeMedia. All rights reserved. POSTMASTER: Send changes of address to Software Test & Performance, 105 Maxess Road, Suite 207, Melville, NY 11747. To contactSoftware Test & Performance subscriber services, please send an e-mail to [email protected] or call 631-393-6051 ext. 200.

Discussion topic: Do offshore testing operations build stakeholder value? Visit our community forum to take part in the discussion. stpcollaborative.com/sabourin

The Test & QA Report offerscommentary and insights on thelatest topics in testing. Readwhy cloud computing is the new SOA.stpcollaborative.com/eseminars

Don't miss STPCon 2009, Oct. 19-23 in Cambridge, Mass., with in-depth training courses, one-day workshops and five specialty tracks, all taught by industry leaders. stpcollaborative.com/conferences

Forrester analyst MARGO VISITACION discusses how implementing a dynamic approach to coverage enables organizations to trim costs. Webinar archive available now! tinyurl.com/0916-MKS

Learn about new products, search by solution type and access product descriptions, white papers, online demos, podcasts, videos, members-only offers and more. Comment on and rate your favorite products & services. stpcollaborative.com/resources

CONTRIBUTORS

GERARD HOLZMANN joined Bell Labs Research in 1980 and moved to NASA/JPL in 2003.

JON QUIGLEY is a test group manager at Volvo and a private product test and development consultant.

NELS HOENIG is an expert QA engineer, project leader, business analyst and app support specialist.

CLAIRE CATES has been with SAS Institute for 23 years, and for the past 10 has worked on software performance.

MICHAEL MCDOUGALL researched visualization techniques to achieve high-quality software.

At Stoneridge, KIM PRIES is responsible for hardware-in-the-loop software testing and automated test equipment.


THE DEVELOPMENT TEAM AT A MAJOR cable company missed some major use cases when testing the software in its set-top boxes.

For two weeks in September, I visited the lovely Pacific-coast state of Washington. And during that time I had the occasion to use numerous applications, including that of my portable GPS device, the high-tech toaster in the rental house and the cable box.

Yes, I admit it: I watched some TV. And it wasn't long before I discovered several serious flaws in the navigation software used in the set-top box provided by Comcast. The box didn't crash or start charging my credit card, but it was at times annoying. I would classify the flaw as a dead-end: a place to which navigation led the user (via the handheld remote control) and from which there was no escape except to start over.

The bug seemed to be confined to the navigation through selections for on-demand programming. For example, let's say I selected to watch a movie, from a list that also included TV shows and music videos. From the next list of genres, I then chose to see a comedy. After the list of comedies was displayed, I decided I wanted an action movie instead. But there was no back button; the only way out was to cancel and start all over. I had found a UI dead-end!

My guess is that developers and testers assumed that every trip down the on-demand tree would lead to the selection of a program. They never thought of a use case in which someone would change their mind. It was curious, because other areas of the interface did offer the option of going back. For example, when viewing the description and details of a program, icons are available to view that program (an eyeball) or go back (a curved arrow) to the program listings grid.

Contrast that with the set-top box interface offered by DIRECTV, which as of September 12 became my television service provider. I didn't know this until after I signed up, but its software and its interface are superior. It's intuitive, logical and concise; it was designed with the user more in mind.

Here's just one example, and it's hard to believe more set-top box designers haven't thought of this use case. One of the features I've heard is somewhat unusual among such services is the ability to customize the program guide to include only channels that you're able to get. What's more, DIRECTV also lets you further customize the guide to include only the channels you would ever want to watch. For me, that eliminates about 300 channels right off the bat. They don't appear in the guide and they don't come up when I surf, either of which would result in a "false choice" if selected.

I could go on, but I think you get the point. When performing your UI and acceptance testing, it's critical to put yourself in the shoes of the user, and not simply test features that you already know about. If so, then you're simply repeating your function tests, which a script can do better.

6 • Software Test & Performance

Ed Notes

A Use-Case Black Hole On Demand

“The flaw was a dead-end; a place where navigation led from which there was no escape.”

VOLUME 6 • ISSUE 10 • OCTOBER 2009

President Andrew Muns

Chairman Ron Muns

105 Maxess Road, Suite 207, Melville, NY 11747 • +1-631-393-6051 • +1-631-393-6057 fax • www.stpcollaborative.com

Cover Art by The Design Diva

Redwood Collaborative Media

Editor: Edward J. Correia, [email protected]

Contributing Editors: Joel Shore, Matt Heusser, Chris McMahon

Copy Editor: Michele [email protected]

Art Director: LuAnn T. [email protected]

Publisher: Andrew [email protected]

Associate Publisher: David [email protected]

Director of Operations: Kristin [email protected]

Chief Marketing Officer: Jennifer [email protected]

Marketing Coordinator: Teresa [email protected]

Reprints: Lisa [email protected]

Membership/Customer Service: [email protected] x 200

Circulation and List Services: Lisa [email protected]


out of the box

SpringSource Sprouts a Cloud Foundry
SpringSource has introduced Cloud Foundry, a self-service, pay-as-you-go, public cloud deployment platform for Java Web applications. The tool is designed to help developers deploy and manage Spring, Grails and Java applications within a public cloud environment. tinyurl.com/CloudFoundry

Embed With Windows 7
Microsoft released the Windows 7-based Windows Embedded Standard 2011 (formerly code-named "Quebec") Community Technology Preview to OEMs and developers of specialized devices worldwide. The release is a componentized form of Windows 7 that OEMs can use in products for industrial automation, entertainment and consumer electronics. tinyurl.com/embedWindows7

Parasoft Seeks Harmony with Concerto
Parasoft released Concerto, a product designed to integrate and facilitate the software development lifecycle through end-to-end process visibility and control. It connects distributed components such as requirements, defect tracking and source control management to facilitate natural human workflow. tinyurl.com/ParasoftConcerto

Cover Your Assets with FVP
Arbour Group announced a Financial Verification Package (FVP) that checks for the presence of controls for effective corporate financial governance. FVP consists of a planning template, control process definitions and specifications, and test scripts. A reporting capability summarizes the scope of testing and identifies remedial actions. tinyurl.com/ArbourFVP

Original Marketing With AppLabs
Original Software and AppLabs entered a marketing agreement under which AppLabs will market Original's quality management suite and Original will position AppLabs' QA and third-party validation services into its own customer base. tinyurl.com/OriginalRelease • tinyurl.com/AppLabsRelease

Check the Vector, Tasking
Vector Software announced support for the Tasking C166 compiler on Infineon chips. Previous versions of VectorCAST supported the Tasking compiler on 56k chips; the new release supports several chips based on the C166 architecture, including the C167, XC2287 and XC2364. tinyurl.com/VectorTasking (pdf)

LDRA Now Likes Rose
LDRA integrated its tool suite with IBM Rational's Rose RealTime model-driven development environment. This integration introduces LDRA's code quality analysis and code coverage techniques into the domain of model-driven development using Unified Modeling Language. tinyurl.com/LDRARose (pdf)

VMLogix Manages EC2
VMLogix announced availability of LabManager Cloud Edition. The product supports software teams running virtual labs within the Amazon Elastic Compute Cloud (EC2) public cloud, and offers a pay-as-you-go enterprise-class virtual lab solution. tinyurl.com/VMLogixEC2

Microsoft Tool Says if Logo's a No-Go
Microsoft released beta 2 of the Windows 7 Software Logo Toolkit, an automated tool to validate the compliance of applications with Windows 7 Software Logo requirements. tinyurl.com/MSLogoTool

Easing Parallel, Wireless Sensor Jobs
National Instruments LabVIEW 2009 eases development of parallel hardware architectures, according to the company, and new virtualization technology leverages multicore systems with compiler improvements and enhancements in field-programmable gate array (FPGA) design. It also supports deployment of code to wireless sensor networks and WLAN, WiMAX, GPS and MIMO standards. tinyurl.com/labview2009

Monitis Installs a SOAP Dispenser
Monitis has made its cloud-based SOAP testing service available for on-demand Web services and application load testing. This update to its WebLoadTester testing suite is intended for SOA applications, and now simulates heavy traffic in real time, allowing webmasters to determine how performance and integration can be optimized. tinyurl.com/webloadtester

Gomez Fluffs Out Cloud-based Tool
Gomez has enhanced its Web-based load testing, performance management and Web cross-browser testing solutions with the ability to test rich Internet application transactions. A new diagnostics dashboard is designed to ease discovery of performance-issue root causes. Gomez also expanded its family of multi-browser testing agents and increased its high-volume, cloud-generated load capability. tinyurl.com/gomezSaaS

Send product announcements to [email protected]


Compiled by Joel Shore


TOM DEMARCO, AUTHOR OF "Controlling Software Projects: Management, Measurement, and Estimation" (Yourdon Press, 1982), who's also the person responsible for the phrase "You can't control what you can't measure," wrote a remarkable article for IEEE Software in July of this year. In it he claims that the goals of software engineering are not relevant for today's software development environments, and that they were never relevant at any point in the 40-year history of software engineering.

We welcome DeMarco's statement, and have long suspected that software engineering was, if not irrelevant, certainly a wildly incomplete and inaccurate approach to creating software. So it stands to reason that systems designed to make the handling of physical objects (bridges, cars) efficient would map poorly to software, where we manipulate only algorithms and concepts. If we are free to discount software engineering, we can open the door to more radical and experimental approaches to describe the work of creating software.

Chris has been saying for some time that software development, done in a certain style by a certain kind of highly skilled team, is in fact a form of artistic performance, like music or theater. Matt views the art more broadly, likening development as a creative endeavor to activities such as writing and mathematics.

This month, we define a set of words from the performing arts as they might apply to a software development project. You may not agree with their applicability, but we hope that you'll be intrigued.

Repertoire
Everything we could potentially include in the release. Similar to a project backlog, a repertoire demands more analysis and introspection from the team than a backlog usually receives. "I don't think we can add 'Giant Steps' to the repertoire unless we improve our playing a lot."

Set List
The list of features we're working on in preparation for the next performance. Essentially the same as the story cards in play for an iteration, a set list implies a certain order and a certain harmony or theme for all the features being developed. If you prefer a more disciplined description, consider a scene list, as a theatrical performance would have.

Performance
Release to production. This is where the audience gets to appreciate the talent and skill of the performers by experiencing the software. Again, we think artistic software development must conform to a certain style, and performance is a key element of that style. It is crucially important for an artistic software team to commit themselves to releasing a set list at a particular time, over and over, without missing the dates and always with very high quality. Consider a concert tour or the run of a play. Probably the purest example of software performance would be Software as a Service (SaaS) applications, where the audience is always live and always committed to the product. Regression bugs or broken features in SaaS applications are not acceptable if we are to keep our audience.

Audience
Software development has always struggled to define exactly what a "user" is. Instead, we should treat our users as an audience. We have had audiences for thousands of years; we know what they are and how they behave. They are the people who pay us to attend our performances. It is our job to make them want to attend our performances over and over and over.

Instruments
The tools we use to perform our work. Some instruments demand specialized knowledge and skill to operate. Some instruments are shared. For instance, everyone who gets a degree in music is required to be proficient on piano. Every member of the team needs to be proficient with the source control system. "I wish we had a better instrument for automating our UI tests."

Rehearsal
The activity leading up to performance, including analysis, spiking, coding, testing, packaging, documentation and so on. We rehearse in order to put on a successful performance. "Our early rehearsals were pretty rough, but it all came together once we figured out what order the set list should be in for the performance." The word "rehearsal" is particularly nice because it reminds us that the work is only important when it is performed. As Steve Jobs famously says, "Real artists ship."

8 • Software Test & Performance OCTOBER 2009

Performing the Software!

st& pedia

Matt Heusser and Chris McMahon are career software developers, testers and bloggers. Matt also works at Socialtext, where he performs testing and quality assurance for the company's Web-based collaboration software.

The encyclopedia for software testing professionals

“We define words from the performing arts as they apply to software development.”



Pace
Rehearsals, performances and features themselves tend to have a natural, consistent pace. We know how many features we need to put on a good performance, we know not to overload our audience with too many features at a time, and we know how long it takes to create any particular feature. Increasing the pace for any of these elements runs a risk of annoying the audience. Just as in a musical performance, increasing the pace of software development can reduce the quality of the performance.

Aesthetics
Ultimately all of our judgments about quality, whether of art or software, are aesthetic judgments. We measure code coverage not because code coverage is an intrinsic quality of good software, but because we appreciate the value of code coverage in an aesthetic sense. It pleases us to have a high amount of code coverage because we judge software with a high percentage of code coverage to be good. If we acknowledge that evaluating the quality of our work is essentially an aesthetic act, we are free to bring to bear tools from psychology, philosophy, literary criticism, etc. in our effort to do excellent work. Again quoting Jobs, this time with an opinion about Microsoft: "They have no taste."

Chops
The original sense of the term referred to the lips of horn players, as in "to lick one's chops." Over time the notion of chops has come to mean a high degree of technical skill, experience and discipline on one's instrument, even if one plays a non-wind instrument such as guitar or piano. "We should bring in Jon on this; he has the performance testing chops we need."

Crew
It is worth noting that in rehearsal, the whole team moves forward together. But the audience tends to see only the performers on the stage. Linus Torvalds and Guido van Rossum are the ones in the spotlight, but their success depended on an army of testers, developers, critics and so on. The people running the sound and the lights are as creative and expert as the people under the spotlights.

We group music, drama and dance together and call them "performing arts" because the process by which they are executed, and the language used to describe that process, are similar for each. If we can use the language of the performing arts to describe accurately the process of software development, it seems likely that software development bears a close relation to the performing arts. Human beings have been performing for thousands of years, and we have sophisticated tools and concepts that, because of our institutional bias for engineering concepts, no one has ever thought to bring to bear on software development. We're excited at the prospect of bringing it about.


Index to Advertisers

Advertiser                                   URL                                     Page
Hewlett-Packard                              www.hp.com/go/alm                       36
Keynote                                      www.keynote.com/loadtesting             2
STP Collaborative                            www.stpcollaborative.com                17, 35
STP Collaborative Resources Directory        www.stpcollaborative.com/resources      9
STPCon Fall 2009                             www.stpcon.com                          3
STPCon Fall 2009 Exhibitor Opportunities     www.stpcon.com                          16


RANDY HAYES IS CO-FOUNDER and CEO of Capacity Calibration, a software company that specializes in cloud-based Web load testing.

Incorporated in 2008, Capacity Calibration develops and markets test automation solutions for medium and large companies. Its flagship product is CapCal, a Web performance testing tool intended to help reduce the effort of script creation and maintenance. CapCal is designed for developers, QA and IT teams and integrates with functional testing tools.

Regular readers of Software Test & Performance might recognize the name of Randy's sister Linda Hayes, who's a frequent contributor to the magazine. Their career paths have crossed on multiple occasions.

Andrew Muns: Let's start by talking about your background. How did you get involved in the software testing industry?

Randy Hayes: Well, I am from Albuquerque, New Mexico, and during the [19]70s there were some exciting things happening there.

One of them was a little company called MITS [Micro Instrumentation and Telemetry Systems], which had a computer called the Altair. A pair of guys named Bill Gates and Paul Allen were living there at the time and working with this company.

It was shortly after they left and moved to Seattle that I got a job at MITS writing diagnostic test programs. That was my beginning in testing.

I began as an entrepreneur in 1985 when my sister [Worksoft CTO] Linda [Hayes] and I started a company called Autotester, which introduced the first automated testing tool for the PC.

At the time, I assume there were automated testing tools for mainframe environments?

Yes, there were, but they were all script-based, nothing like what we had. We used a tape recorder analogy, so you record "tapes" instead of programming.

People without any programming experience could build them all day long; it was great. But we raised a lot of venture capital to fund that company, and the investors held the majority interest. At some point we had a falling out with them, so we chose to move on. In the subsequent years it finally tanked, I'm sorry to say.

So you learned about venture capital the hard way, I guess.

Yes indeed, although it doesn't always go that way. After I left Autotester, I joined Linda and a couple of others to do a project for Fidelity Investments in Boston that evolved into a company called Worksoft, which is now focused on the SAP market and doing quite well.

Finally, CapCal was started [as Distributed Computing] in March of 2000, when I raised a couple of million dollars, hired a team of really good developers and spent almost two years creating the first version of it.

What was it that led you to the business idea behind CapCal?

I was hired as a consultant by a dotcom company to evaluate load testing tools and tell them which one they should buy. I evaluated all of them and came back to say I wouldn't buy any of them. That was a bad idea from a consulting point of view.

What was lacking in the tools that you evaluated at the time that led you to believe that you could do a better job?

Well, the Internet. Period. These tools were all designed for client-server. They were antiques. CapCal was designed from the ground up for the Internet, to be distributed and service-oriented. And in fact, we started it with a program that people could download, and when their computers were used in a test, we would pay them.

What has been the evolution of the company?

10 • Software Test & Performance OCTOBER 2009

CEO VIEWS

“Many companies will pay for performance testing, but it has to be in their budget.”

RESUMÉ
JOB: Founder, Chief Architect, Capacity Calibration (CapCal)
SLOGAN: "Calibrating scalability and performance from the cloud."
PRIORS: Worksoft senior software consultant and engineer, Distributed Computing solution architect, 7th Level senior software engineer

By Andrew Muns

CapCal's Randy Hayes Sees Clouds in the Forecast


When we talk about performance testing at CapCal, we are really talking about three different offerings. One of them is what we call CloudBurst. This is a service on the Amazon cloud that offers on-demand, self-service, pay-as-you-go load testing.

The second is a set of integrated performance testing services in which we capture what functional test tools do and use these inputs for performance testing.

Lastly, we have an agile testing tool that conducts nightly performance tests to allow testers to compare their application's performance after each iteration in the development lab. But CloudBurst is the way that the majority of the people will purchase and use the product.

What has been the biggest growth driver so far?

The most significant drivers have been small- to medium-sized businesses that need these services but have been priced out of the market. There are many companies that will pay for performance testing, but it has to be in their budget, and the ability to use the Amazon cloud puts this in the price range of smaller shops by allowing pay-as-you-go pricing.

How are your customers able to simulate various types of users interacting in different ways at different times?

They can record different profiles, or sessions, and those can be mixed and matched so you can construct a test that has a variety of different behaviors.

How do you think about the competitive landscape of this business and how is it changing?

There has been an onslaught of entrants to the space and there will be more coming, because it is such a natural use of cloud computing. What we want to do is to bring together all forms of test automation and make them available as a service on the cloud. So if someone wants to run their functional tests in parallel, we can do that. Nobody is quite there yet, but I think that if you can deliver all of it as a service on the cloud, people will eventually rush for it. It is so easy…and a lot cheaper.

For load testing, the advantage of using the cloud seems obvious, but with regard to functional testing, what is the primary benefit?

Well, let's say that you have a functional regression test and let's say it runs for four hours. If you could run it in 10 minutes, then you just shaved a lot of time off your development cycle.

Do you feel like the popularity of agile methods makes a lot of these tools even more important as teams work on the basis of short iterations?

Definitely. Performance and scalability testing need to be part of that cycle. They aren't in most cases, but they need to be. We have a way to do that and we are hoping that people will get on board with this. I think it will happen when the tools become easy and affordable enough.

Do you think the efficiency of the agile methodology itself has increased as a result of some of the just-in-time tools that now exist? Is technology an enabler of agile just like agile is to technology?

Oh wow, that is a good question. The way I think of it is that test automation in the beginning meant getting rid of all this mind-numbing repetitive work. We were replacing the person and doing it much, much faster. But now, with cloud computing, we don't just replace the person, we get rid of the labs that don't need to be taking up space because they are just doing things like testing. We think that people will gravitate quickly to the cloud for test automation of all kinds.

When people do Web load testing with CapCal, do applications typically outperform or underperform expectations?

They are typically surprised that it was not as good as they expected.

Do you guys load test your own site with your own product?

Yes, we do, and it is yielding some very interesting results. We have come up with a way to deploy CapCal on the cloud and scale automatically when it needs to. We only have one instance running most of the time, but when it starts to get weighed down at all, it spins up another one, and so on.

Looking ahead, what do you think the next big shift will be that testers will have to adapt to?

As people start to have test tools that are as sophisticated as the apps they are testing and as easy to use, very large numbers of people who don't currently use tools will start to do so, especially as these become more affordable.

This in itself is a sea change, so I am not worried about what happens after that, because it is going to take a while for testers to catch up with tools already being introduced.

Where does Web load testing stand on the spectrum from being an essential expense to a luxury expense?

Determining the value of performance testing is just a matter of thinking what it would cost per hour of downtime. Customers should ask what this cost is relative to the cost of the tool, and the higher that number is, the more negligent you are being if you don't do performance testing. So I encourage people to do this math.


“To determine the value of performance testing, think what it would cost per hour of downtime.”


When Lives Are at Stake, It's Best to Err on the Side of Caution. Analyze Your Code and Its Safety Using These 10 Simple Rules


By Gerard J. Holzmann and Michael McDougall

In manned space flight, system failure tragically has led to many deaths. NASA's space shuttle orbiters are controlled by five on-board computers: four running identical software that constitute the primary system, and a fifth that runs a separate backup flight system.

When software failure can lead to injury or death, it is considered safety-critical. It's present in planes, cars, medical devices, spacecraft and other systems. The "Power of Ten" coding rules were introduced in 2006 as a minimal set of rules for writing safety-critical code in C. Unlike other rules, the Power of Ten were specifically designed to leverage the power of static analysis tools. They were developed at NASA's Jet Propulsion Laboratory (JPL), but they can be applied fruitfully to any safety-critical software written in C. They have been integrated into JPL's new institutional coding standard for flight software. For example, all flight software written for the upcoming Mars Science Laboratory mission to Mars, which is due to launch in the fall of 2011, will comply with this new coding standard.

The rules may at first sight seem overly strict, but recall that they are intended to guard the development of safety-critical systems. The rules can be compared to the use of seat belts in cars: they are perhaps a little constraining at first, but easily justified by the reduction in the risk of death or injury that they bring.

The recent leap-year bug in the Zune 30 MP3 player is a nice example of a violation of Power of Ten Rule 2 (see page 17), and it had significant consequences. The rule requires that all loops have a verifiable bound. This is particularly important in embedded systems, where runaway code can be dire. The loop that the Zune 30 used to process dates did not properly handle the 366th day of a leap year. Nor did the loop have a failsafe limit to catch infinite execution. As a result, Zunes were unusable on Dec. 31, 2008, the 366th day of that year. Any Zune powered up on that day would be caught in an endless loop until the battery drained. While the Zune is not a safety-critical device, the same failure in an airplane or car, or even in a cell phone (the user would be unable to call 911), would be a different story.

Figure 1 shows a report from a static analysis tool, warning that the loop in the Zune driver code is unbounded. The loop is unbounded in the case where the 'days' variable is equal to 366.
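For readers who want to see the failure mode in code, the sketch below is a simplified reconstruction of the kind of year-counting loop that was widely reported in the Zune clock driver, not the verbatim Microsoft source. The bounded variant shows one way a Rule 2-style failsafe could be added; the cap MAX_YEAR_ITER and both function names are invented for the illustration.

    static int is_leap_year(int year)
    {
        return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    }

    /* Simplified reconstruction of the reported date loop. 'days' counts
     * days since Jan. 1, 1980. When days == 366 in a leap year, neither
     * branch changes 'days', so the loop never terminates. */
    static int year_from_days_unbounded(int days)
    {
        int year = 1980;
        while (days > 365) {        /* Rule 2 violation: no provable bound */
            if (is_leap_year(year)) {
                if (days > 366) {
                    days -= 366;
                    year++;
                }
                /* days == 366 makes no progress: infinite loop */
            } else {
                days -= 365;
                year++;
            }
        }
        return year;
    }

    /* A Rule 2-style repair: a fixed iteration cap (an illustrative value)
     * guarantees termination even if the date arithmetic is wrong, and the
     * caller can detect the failure through 'ok'. */
    #define MAX_YEAR_ITER 1000

    static int year_from_days_bounded(int days, int *ok)
    {
        int year = 1980;
        int i;

        for (i = 0; i < MAX_YEAR_ITER && days > 365; i++) {
            int len = is_leap_year(year) ? 366 : 365;
            if (days > len) {
                days -= len;
                year++;
            } else {
                break;              /* day 366 of a leap year: done */
            }
        }
        *ok = (i < MAX_YEAR_ITER);  /* false only if the cap was hit */
        return year;
    }

Note that the cap does not repair the calendar logic by itself; it converts an infinite loop into a detectable fault, which is precisely the failsafe the Zune loop lacked.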

WHY A NEW CODING GUIDELINE?
Existing coding guidelines, like those by MISRA and the Joint Strike Fighter project, list hundreds of rules with voluminous supporting documentation. While such guidelines can be admirably comprehensive, many software teams will balk at having to learn and apply so many rules. In practice, this means that most rules will be ignored. A project team could decide to monitor a small subset of the rules, but it can be difficult to determine which of the hundreds of rules would be most effective in terms of software quality improvements.

Another factor inhibiting adoption is a lack of generality. Guidelines often mix well-established best practices, which could apply to many projects, with stylistic rules (such as regulating the use of white space) that are peculiar to a project or company. Rules may also deal with APIs or domains that are not broadly relevant. For example, rules about working with Windows DLLs are not relevant to a Unix project, while rules about managing page tables are irrelevant outside an OS kernel.

The Power of Ten rules were in part designed to overcome these problems. The rules are easy to memorize and apply when writing and reviewing code, yet they capture the essence of writing code in a safety-critical context.

Gerard J. Holzmann is a fellow at the Jet Propulsion Laboratory and a Faculty Associate in Computer Science at the California Institute of Technology. Michael McDougall is a senior scientist at GrammaTech, which develops software analysis tools.


The rules do not define project-specific constraints on names or code layout; nor do they refer to particular APIs or platform-specific problems. Short, powerful and widely applicable, the Power of Ten rules are a great starting point for embedded developers looking for a core set of coding rules.

REACTION TO THE POWER OF TEN
The Power of Ten rules were first published in IEEE's Computer magazine in 2006, with a brief summary posted at spinroot.com/p10. They have been discussed extensively on blogs and at workshops and tradeshows. The topic tends to generate much enthusiasm and some controversy. Frequently, the set of rules is strongly endorsed, but almost everyone will find one rule unworkable. Curiously, there appears to be no consensus on which rule should be cast from the set: Everyone will pick a different one. This probably means that the set is reasonably well defined. It's hard to reach complete agreement on any attempt at standardization, and coding rules are, of course, no exception.

THE RULES AT JPL
Like any software development tool, the Power of Ten rules needed to be tested in the field. The initial testing was carried out by a small JPL team developing a set of mission-critical flight software modules. The team used a combination of manual reviews and automatic checking with a prototype version of a code analysis tool developed by GrammaTech. These early experiments showed that it was readily possible to develop software under the constraints of the rules and the schedule constraints of a real flight project.

The relatively effortless adoption of the Power of Ten rules led to the creation of a new formal "JPL Institutional Coding Standard" that combines the Power of Ten rules with a small number of additional rules that are more specific to the spacecraft context. The JPL Coding Standard applies to all new flight software development at JPL. The next Mars rover will indeed be running software that complies with the Power of Ten rules. It has been a significant challenge to reach this point, because this next rover will also run more code than all previous missions to Mars combined (a few million lines of C).

Many of the additional rules in the JPL Standard constrain how tasks in a real-time system may interact; complex task interaction has been a source of many spacecraft bugs, including the one that almost led to the loss of the Mars Pathfinder lander in 1997. In some cases, the JPL Coding Standard puts additional restrictions on which features of C can be used; preprocessor features are especially constrained. The ability to customize the rules (by defining the limit on function length or the minimal number of assertions per function of a given minimal size, for example) allows for quick experimentation, which, in turn, makes it easy to find an acceptable and enforceable level of thoroughness.

Similar to the CMMI standard, the JPL Institutional Coding Standard divides its coding rules into six separate levels of compliance.

FIG. 1: UNBOUNDED ZUNE CODE

HOW THE RULES TOOK FLIGHT AT JPL

The JPL standard was developed over a period of about four years. The initial step was to ask all key developers for their opinion of the MISRA-C 2004 coding guidelines. We also asked the developers to identify what they believed to be the 10 most important rules from that set, as well as the 10 least important rules. This revealed considerable consensus among the developers, but, curiously, we found little or no correlation between the set of rules that most developers considered to be the most critical and the actual coding practices followed by those same developers in their own software development.

Sometimes we need help sticking to what we know is the right thing to do. Until the Institutional Coding Standard was adopted, every project and mission defined its own standard, always slightly differently from other projects, and always with hundreds of rules but no verification that the code complied with those rules. JPL management strongly supported the move toward a single unified standard with verifiable rules, but the support that really counted was that of the software developers themselves.

Many of the key developers were part of the process that led to the adoption of the new coding standard, and virtually all have embraced it as a valuable tool in the effort to increase code quality and reduce risk. What's perhaps most unexpected is that the new Coding Standard has evoked a sense of enthusiasm among managers and developers. For a standard that's meant to restrict what one can do, this is perhaps the best of all possible outcomes.


Different projects can adopt different levels of compliance depending on how mission-critical the project is and the cost of bringing any legacy code into full compliance. Highly critical code should be compliant at level 6.

The introduction of different levels of compliance was an important factor in getting critical support at JPL (among both management and developers) for the adoption of the new standard. Quality improvement is never an "all or nothing" proposition. It's possible to make significant improvements with a more gradual approach, easing the transition toward broader improvements of the software development process.

ENFORCING THE RULES
A coding guideline has little value if it's not enforced, but manually reviewing potentially millions of lines of code to check compliance with the rules would be tedious, if not impossible. Automatic tools can check compliance quickly and thoroughly, while freeing software reviewers to focus on higher-level design issues. The Power of Ten rules were designed with automatic analysis in mind. The use of static analysis tools to check compliance with the Power of Ten rules is today a required part of the flight software development process at JPL.

In order to support automatic enforcement of the Power of Ten, JPL's Laboratory for Reliable Software worked with GrammaTech, which develops and markets analysis tools, to define the rules precisely enough to be checked mechanically. GrammaTech then developed a prototype (based on the CodeSonar static analysis tool) for checking the rules. The extended prototype was deployed at JPL for use by engineers coding under the Power of Ten rules.

As expected, early use revealed some cases where the tool's judgment deviated from what its users expected (for example, flagging loops as potentially unbounded when they used safe and commonly understood idioms). But after some revisions, the prototype became the primary tool used to ensure compliance with the rules. A handful of static analysis tools can be customized to enforce an organization's specific coding rules, and, out of the box, many can find common errors such as null-pointer dereferences and buffer overflows.

JPL uses a proprietary code reviewing tool called Scrub to track compliance with the coding rules as an integral part of a peer code review process. Analysis tools perform automatic checks on every build of flight code, and the results are collated and presented to reviewers via Scrub. JPL expects to make the Scrub tool available to the public so that others can try the tool with their own static analysis tools. JPL also plans to release some of the non-commercial analysis scripts that feed results into Scrub, so users without access to the commercial analyzers still will be able to check for some undesirable patterns.

EVOLUTION OF THE RULES
The public response to the Power of Ten rules and the experience of applying the rules to flight software development led to some clarifications and fine-tuning of the original rules. For example, initially Rule 9 forbade the use of function pointers under any circumstances, but this was too restrictive and provided a significant hurdle in the use of legacy code.


TABLE 1: RULES TO THE POWER OF TEN

1. Restrict to simple control flow constructs. For instance, do not use goto statements or setjmp or longjmp constructs, and do not use direct or indirect recursion.

2. Give all loops a fixed upper bound. It must be possible for a checking tool to prove statically that a preset upper bound on the number of iterations of a loop cannot be exceeded. If the loop bound cannot be proven statically, the rule is considered violated.

3. Do not use dynamic memory allocation after initialization. This forbids the use of malloc, sbrk, alloca and all variants after thread or process initialization.

4. Limit functions to no more than N lines of text. In principle, a function shouldn't be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.

5. Use a minimum of N assertions for every function of more than M lines. Assertions are used to check for anomalous conditions that should never happen in an execution. Assertions must be side-effect free and are best defined as Boolean tests. For safety-critical code, typical values for N and M are N=2 and M=20. For less critical applications, the value of M could be increased, but to be meaningful it should always be smaller than the maximum function length. (See Rule 4.)

6. Declare data objects at the smallest possible level of scope. Data objects used in only one file should be declared file static. Data objects used in only one function should be declared local static.

7. Check the return value of all non-void functions, and check the validity of all function parameters. The return values of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.

8. Limit the use of the preprocessor to file inclusion and simple macros. The C preprocessor can exhibit surprisingly complex behavior that's best avoided. Token pasting, variable argument lists (ellipses) and recursive macro calls are not permitted. All macros must expand into complete syntactic units. The use of conditional compilation directives should be restricted to the prevention of duplicate file inclusion in header files.

9. Limit the use of pointers. Use no more than N levels of dereferencing (star operators) per expression. A strict value for N is 1, but in some cases using N=2 can be justified. Pointer dereference operations may not be hidden in macro definitions or inside typedef declarations. The use of function pointers should be restricted to simple cases.

10. Compile with all warnings enabled, in pedantic mode, and use one or more modern static source code analyzers. All code must compile without warnings and must be checked on each build with at least one (but preferably more) good static source code analyzer. It should pass the analyses with zero warnings.
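To make a few of these rules concrete, here is a rough sketch (ours, not an excerpt from the JPL standard) of how Rules 2, 4, 5 and 7 might look together in one short function. The c_assert macro and every name in it are invented for the illustration; a real project would use its organization's own assertion facility.

    #include <stddef.h>
    #include <stdio.h>

    #define MAX_READINGS 64   /* fixed sizing in the spirit of Rules 2 and 3 */

    static void report_failure(const char *expr, const char *file, int line)
    {
        (void)fprintf(stderr, "assertion failed: %s (%s:%d)\n",
                      expr, file, line);
    }

    /* Rule 5: a side-effect-free Boolean assertion that reports the failure
     * rather than aborting; a sketch, not JPL's actual macro. */
    #define c_assert(e) ((e) ? 1 : (report_failure(#e, __FILE__, __LINE__), 0))

    /* Rule 4: short, single-page function. Rule 7: parameters are validated
     * on entry, and callers must check the returned status. */
    static int sum_readings(const int readings[], size_t n, long *sum)
    {
        long total = 0;
        size_t i;

        if (!c_assert(readings != NULL) || !c_assert(sum != NULL)) {
            return -1;
        }
        if (!c_assert(n <= MAX_READINGS)) {
            return -1;                /* Rule 2: the loop bound is enforced */
        }
        for (i = 0; i < n; i++) {     /* statically bounded loop */
            total += readings[i];
        }
        *sum = total;
        return 0;
    }

Under Rule 7, any caller of sum_readings must itself check the returned status; under Rule 10, a file like this would be compiled with all warnings enabled (for example, gcc -Wall -Wextra -pedantic) and run through a static analyzer on every build.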


Rule 9 has since been relaxed somewhat to allow function pointers, as long as a human or a static analyzer can always determine which function or functions are actually pointed to. Rule 5 initially specified a minimum density of two assertions per function, but this was too severe for small procedures that had minimal functionality, so it was generalized somewhat.
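One common pattern that satisfies the relaxed rule is a const dispatch table: the set of possible targets is fixed at compile time, so a reviewer or an analyzer can enumerate every function the pointer could ever call. This is our illustration with invented names, not JPL code.

    typedef void (*cmd_handler)(void);

    static void cmd_reset(void)  { /* ... */ }
    static void cmd_status(void) { /* ... */ }

    /* The pointer can only ever hold one of the handlers listed here,
     * so the target set is tractable by inspection. */
    static const cmd_handler handlers[] = { cmd_reset, cmd_status };

    static void dispatch(unsigned int cmd)
    {
        if (cmd < sizeof handlers / sizeof handlers[0]) {
            handlers[cmd]();   /* target set fixed at compile time */
        }
    }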

Discussion of the rules has evolved as well. The Power of Ten website (http://spinroot.com/p10) now has an extended discussion of the intent and precise meaning of each rule, its rationale and tips for the rules' application. This content continues to evolve to address comments by others and to clarify the rules.

TEN IN OTHER CONTEXTS
Even though the Power of Ten rules were designed for safety-critical software written in C, their aim of reducing coding errors by making source code easier to understand and analyze is relevant to any project in which software quality is important.

If your project is written in a language other than C, many of the rules can still be applied with minimal modification. For example, Rule 2 ("Give all loops a fixed upper bound") can be applied to virtually all currently used programming languages.

If your project doesn't warrant such strict guidelines, keep the general principles in mind. Perhaps limiting your functions to 60 lines is too constraining for your project, but the overall goal of keeping functions short and easy to understand should still apply. Complex pointer manipulation may be warranted in projects that are not safety-critical, but such projects can still benefit if they keep in mind the spirit of Rule 9: Pointers are a common source of confusion, so limit levels of indirection when possible. In general, the idea that making source code easier to understand and analyze will yield better software is appropriate wherever software quality matters, even if the greatest risk is a paper cut.

16 • Software Test & Performance OCTOBER 2009

LEARNING MORE

The Power of Ten rules were introduced in "The Power of 10: Rules for Developing Safety-Critical Code," IEEE Computer, June 2006, pages 93-95, and are discussed in more detail at http://spinroot.com/p10/.

An overview of commercial and free static analysis tools is available at http://spinroot.com/static/.

The Motor Industry Software Reliability Association (MISRA) has coding guidelines for C and C++. They can be purchased at http://www.misra.org.uk/.

The JSF coding standard for C++ is available at http://www.jsf.mil/downloads/documents/JSF_AV_C%2B%2B_Coding_Standards_Rev_C.doc.

The JPL Institutional Coding Standard is maintained by JPL's Laboratory for Reliable Software: http://eis.jpl.nasa.gov/lars/. A copy of the coding standard is available upon request.

FOR MORE ABOUT SOFTWARE QUALITY IN MANNED SPACE FLIGHT, join us at STPCon 2009, where retired astronaut Colonel Mike Mullane will present "The Dangers of the Normalization of Deviance: A Lesson in Leadership for the Test & QA Profession." His keynote describes how such deviance led to the loss of the Space Shuttle Challenger. Register now! www.stpcon.com


Stop The Leaks

Prevent ETL Errors When Building or Maintaining New Front-ends

By Nels Hoenig

Nels Hoenig is a software quality engineer with Unicon Consulting, a software development and IT services consultancy.

Companies often use interfaces to link new solutions or front-end interfaces to existing systems. Some of the key benefits of doing this: 1) To minimize cost: When an existing application is already serving a business need, the cost of adding on to increase access is lower than the cost of rebuilding. 2) To preserve data: The business data is already on an existing system, and migration is impossible, costly or error prone. 3) To reduce risk: The current solution is known and its flaws are already understood.

However, the desire to access data for new and different purposes means more demand for services from IT. To satisfy user demand and seize on new opportunities quickly, IT adds new applications to interface with existing systems. This often involves a process known as extract, transform and load (ETL).

But whenever bits travel from the old system to the new and back again, as in ETL-based systems, there can be multiple potential points of failure. In this article, I look at the various points in the interface process and identify key considerations and test points. I've also included some hints to make you look good and give your project a better chance for success (always a worthy goal).

I. EXTRACTION

How's the data?
When extracting "old" data for use in new systems, I often come across the "ugly stuff" that's been under the rug and ignored. Referential integrity problems can include multiple copies of a parent record with slight variations in the name or other field, inactive master records with active details, or detail records with missing parents. And just as often, those lingering problems are passed along to members of the new system project.

You certainly don't want (and might not be able) to move them to the new solution. This is a great chance to get the house in order, and there might even be a cleansing process already in place that's forgotten or just not being used. If not, you should create a method to clean up those "bad" records.

There are several data cleansing solutions available to help. While the exact process is beyond the scope of this article, there are some techniques that can help you determine the severity of the problem and whether you can build your own solution to correct it.

How's the accuracy?
An interface typically includes taking data from an existing system by some type of query. Once you have the data, you need to test it from macro and micro perspectives; a sketch of automating the macro checks follows the two lists below.

Micro: Using a single record that's known to be good, ask questions similar to the following:

• Are all the fields present and in the correct order?
• Are there any leading zeros, incorrect capitalization, etc.?
• Are the values in the fields correct, and are they rounding where appropriate?
• Are special characters (if used) being extracted and handled correctly?

Macro: With a group of records, analyze the following:

• Do you have the expected number of rows?
• Does a summary of given fields match the expected value?
• When variances exist, can they be tied to a specific record?
• Are there valid reasons for the variances?
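As promised, here is a minimal sketch of the macro checks, assuming the extract lands in a pipe-delimited flat file with a numeric amount in the third column; the file name, layout and column position are all assumptions made for the illustration.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Macro-level extract check: row count plus a control total over one
     * numeric field, to be compared against the source system's figures. */
    int main(void)
    {
        FILE *f = fopen("extract.txt", "r");   /* hypothetical extract file */
        char line[1024];
        long rows = 0;
        double amount_total = 0.0;

        if (f == NULL) {
            perror("extract.txt");
            return EXIT_FAILURE;
        }
        while (fgets(line, sizeof line, f) != NULL) {
            char *field = strtok(line, "|");
            int col;

            rows++;
            for (col = 1; field != NULL && col < 3; col++) {
                field = strtok(NULL, "|");     /* walk to the amount column */
            }
            if (field != NULL) {
                amount_total += atof(field);   /* build the control total */
            }
        }
        (void)fclose(f);

        printf("rows=%ld amount_total=%.2f\n", rows, amount_total);
        return EXIT_SUCCESS;
    }

A mismatch in either figure points you toward micro-level investigation of individual records; agreement in both builds confidence that the extract is complete.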

II. TRANSFORMATION

Transformation occurs (or is required) when one system uses one set of values for a specific meaning and another system uses or stores those values differently. A good example is in assigning a status level such as Active, Inactive or Pending.



One system might store Active status as a string of characters while another uses an integer value (1, 2, 3).

If the old and new systems don't share the same rules and values, middleware will be needed to perform transformations. For transformation to function correctly, all source and destination data must be properly mapped before beginning any transformation process.
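That status example might reduce to a simple lookup table in the middleware. The sketch below is illustrative only: the string-to-integer pairings are invented, and real codes would come from the destination system's data dictionary.

    #include <stddef.h>
    #include <string.h>

    enum { STATUS_UNKNOWN = -1 };   /* sentinel for unmapped values */

    /* Source-system status strings mapped to the destination system's
     * integer codes; the pairings are illustrative assumptions. */
    static const struct {
        const char *name;
        int code;
    } status_map[] = {
        { "Active",   1 },
        { "Inactive", 2 },
        { "Pending",  3 },
    };

    static int transform_status(const char *name)
    {
        size_t i;

        for (i = 0; i < sizeof status_map / sizeof status_map[0]; i++) {
            if (strcmp(name, status_map[i].name) == 0) {
                return status_map[i].code;
            }
        }
        return STATUS_UNKNOWN;  /* unmapped: route to the error bucket */
    }

Any value the table does not recognize should land in the error bucket discussed below rather than being silently dropped.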

Did all records get transformed?
You should be able to compare counts of records submitted, processed and generated as output. If counts don't match, the first place to look is the "error bucket," which catches records that produce an error during transformation. Also check for merged records; sometimes multiple transactional records are combined into a single record on the generated side.

Since it's difficult to examine all the different scenarios in a detail format, you should once again use micro and macro analysis. If your data contains all the variations, your summary can quickly tell you if you have a problem to investigate.
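The count comparison itself can be stated as a one-line invariant. In this sketch the field names are invented; the point is that every submitted record must be accounted for as normal output, a merge into another record, or an entry in the error bucket.

    /* Reconcile transformation counts; field names are illustrative. */
    struct etl_counts {
        long submitted;     /* records read from the source extract */
        long output;        /* records written to the destination feed */
        long merged_away;   /* inputs combined into another output record */
        long errored;       /* records captured by the error bucket */
    };

    /* Returns nonzero when every submitted record is accounted for. */
    static int counts_reconcile(const struct etl_counts *c)
    {
        return c->submitted == c->output + c->merged_away + c->errored;
    }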

How's the output format?
Are the generated records in the correct format? Do they contain the expected values? This is the same set of tests you did at the point of extraction, but by checking again here, you build further confidence in the process and eliminate a common source of defects.

III. LOAD

At the end stage of the ETL process, we need to ensure once again that all records were processed, that correct data is in the system and that any errors have been identified.

Were all records loaded?
As in the transformation step, you know what the input is supposed to be. But since you might be processing transactional records, it's possible that you'll need to know the "before" value in certain fields, such as those that hold a running total, or the amount of expected change from one iteration of a record to the next.

So in this step, it's possible that you'll be comparing a predicted total to an actual total. Many systems also keep a transaction log. If yours does, use it to compare record counts and look for any count mismatch between the middleware and the log file.

Did the data load correctly?
It's not enough to make sure all submitted records were processed; you also need to ensure that the records were processed correctly. This is another place where micro and macro testing can really make the process go faster.

Does the solution meet the business need?
I was once part of a project to build a global interface that management really loved. But when we rolled it out after eight months, the users were less than impressed. They saw no value in the solution and would not log in and use it. It was a great idea; it just wasn't anything people wanted to use, and it should have been killed much earlier in the cycle.

The question of suitability to task is usually handled by the business analyst. But it's important for the development team to make sure the question gets asked of real business users, and as early in the process as possible.

Data types
There are two basic types of data in play here: base data and transaction data. Base data includes database-style records such as those of customers and products against which transactions are conducted. Transactional data are records that represent transactions performed against base records—for example, sales figures, open orders and overdue invoices.

Base data usually persists for long periods of time, while transaction data is transient. The data type is important to the tester because it determines the types of tests to perform and the output to expect from those tests. When testing interfaces, it's important to note that the same data could be base in one system and transactional in another. Take, for example, an interface between your sales system and your shipping company. While your sales order is transactional to you, that same order (and resulting shipment) is treated as a persistent object that's tracked and stored as a deliverable by the shipper, with transactions performed against it. In other words, that sales order becomes part of the shipper's database.

When developing your tests, consider how the data is perceived in the applications being interfaced. Base data should have only a single insert, with updates applied to a given base record based on transactions. Transactional records should be updating some fields in the related base records, but not creating new base records. Base transactions may include transactional data but may just be doing an insert or delete of the base record.

Owning the data
When dealing with interfaced solutions, there's always the concept of master and slave data sources. This is important, as only one system should be considered the "master" database—the ultimate source for key record items. You may have multiple slave applications, each using and processing a set of records for a specific purpose.

While it's possible for a slave application such as a Web site to create a new customer record, the master application or database must own the processes of checking for duplicates and assigning a unique identifier in the database used to recognize the record, regardless of where it's used. Again, it's important to understand which system is the master, especially if you're tying together multiple systems or adding a new system to an existing group of interfaces.

Sending back the results
When sending back transactional or new, unique records to the master application, many of the same tests already described must be considered and used. In most cases, the data is primarily transactional in nature and smaller in volume.

Hint: Transformation issues can be found early by using SQL to validate all the records to be interfaced.

Hint: Beware of rounding; it can be a real factor, particularly when analyzing transactional records.

Interface failure points
Gone are the days when your network defined your IT footprint. Software as a Service (SaaS), cloud computing and other factors in the marketplace now mean the application(s) you interface with might not be under your roof, under your control or even in the same country as your data center. You need to plan for and include connectivity, performance and recovery elements such as these in your testing plan:

• Loss of the connection before transmission
  > Your interfaces are likely on a schedule. What happens if the schedule can't be followed?
• Loss of the connection during transmission
  > For any number of reasons, the connection can be lost. What is the process to resend, and how do you make sure records are not processed more than once?
• Garbled records during transmission
  > The Internet has safeguards, but failing to address this issue invites a nasty surprise.
• Loss of the connection after transmission
  > Your transfer is completed, but the connection is lost before you get acknowledgement of success. Does the system resend, or can the acknowledgment come later?

Test your interface testing tools
No matter how good the cook or how closely a recipe is followed, an inaccurate measuring cup will yield a subpar dish. If the tool you're using to validate and verify your test results is flawed, you run the risk of leaving bugs unfound or fixes applied incorrectly.

Existing tools
When testing interfaces, you may need to use existing reports included in the target and source solutions, SQL queries to examine the data at different places in the process and/or third-party reporting tools for an easy macro/micro perspective.

One of the first things you should do is test the testing tools and reports themselves to ensure that they're accurate. On more than one occasion, the report I was using to validate test results was flawed, yielding incorrect results.

Even if you're building new SQL queries or reports to validate results, test the tool before testing the data. This testing ensures correct results from your tool, which may include or exclude some of the data in your sample set. It's too easy to write a test where the expected results include all the data in the sample set and pass the test without any negative testing to ensure the selection is correct.

If you're working with large amounts of data, be aware that the structure of your query can impact the time needed to get the results. A poor join clause will take a long time to run and may yield inaccurate results. The tables you're using may include indexes to improve performance, and understanding those indexes can make a real difference in response time.

Hint: An "order by" clause always adds processing time and should be avoided when working with large blocks of data.

A secret I've learned is to get advice from the developers whose code I'm testing. They know the tables and systems involved and will enjoy reviewing your code for a change.

Hint: Interacting with developers can be an invaluable team-building step, and you get better SQL to boot.

Performance testing
Just like any other software application, interfaces need to be tested for performance. With interfaces, you're typically moving large volumes of data through small windows of processing opportunity, so bottlenecks can really tie things up.

Determine expected volume
Often in my experience, the business need for creating the interface is clearly understood, but the actual expected data volume has never been estimated. Your testing should include the expected volume, as well as worst-case scenario volume tests.

A project I recently worked on included product updates being sent to a warehouse system each night. On occasion, it was necessary to reload the entire product master to the warehouse. Pre-deployment testing showed that it took more than 30 hours to create the master file and an additional six hours to send it across the interface. In this case, spotting the issue prior to going live was a critical success factor.

Expected connection speed
If the application you're interfacing with is remote from your data center, your testing should consider Internet connection speed and fluctuations due to traffic congestion. You also need to plan for unexpected events, such as natural disasters, media events and other occurrences that can sap Web performance. Testing should include peak and off-peak hours to measure the impact of network latency.

Similarly, if you're interfacing with an e-commerce Web site hosted off site and your business cycle is linked to a holiday season, it's good to plan ahead. Remember, Black Friday is one of the heaviest days for Internet transaction volume, and it can also be a heavy day for bricks-and-mortar stores to support, so account for that in your testing.

When building and testing interfaces, it's important to keep an eye on all the potential trouble spots. After all, you're connecting two unlike entities that might not have been intended to communicate. Those disparate systems are being connected via an unpredictable medium, and the information being displayed may contain years' worth of data-entry errors. What's more, you're using code that was developed by an imperfect human being.

If you know where the weaknesses are, you'll know just where to look when something goes wrong.




Threaded Apps Can Be Like The Wild West. Testing Them Doesn't Have to Be.

By Claire Cates

Claire Cates is principal developer and performance analyst at SAS.

Processors are getting faster, but not at the rate they had been in past years. Today, chip makers are doubling up on processor cores, so a good way to increase the speed of an application is to take advantage of the multi-core and multi-processor machines that have quickly become the norm. To accomplish this, an application must be multi-threaded. Unfortunately, threading, if not managed properly, can cause programs to slow down or to stop entirely.

WHY USE THREADS? (THE GOOD)
To understand a thread, it's helpful to first understand how a process is handled by the system. A process doesn't share resources with other processes running on a system and is therefore independent. For example, a word processor and a spreadsheet running on the same computer are two processes. The two processes do not share or interact with each other. Alternatively, threads within one process share that process' memory and other resources. A thread does have its own stack and a copy of the registers, including the program counter, and each thread within a process executes independently of other threads.

On a multi-core or multi-processor machine, each thread (or process) can run on a different processor and can therefore complete more work in a given amount of time. Threads can also run on a single-processor machine, of course, but only one thread may execute at a time. When an application has more threads than processors, the operating system will context-switch between the different threads to give each thread a chance to execute. Having more threads than processors isn't necessarily bad, and in many instances can be beneficial. Furthermore, a developer often doesn't know what class of machine the application will be executed on, so an application should be written to take advantage of more cores if they're available.

There are four main reasons why an application should be threaded:
1. To improve application responsiveness
2. To separate disjoint tasks within an application
3. To improve application throughput
4. To improve performance via parallelism

Many related events in the world can occur asynchronously; similarly, components within a software application may also be able to execute asynchronously. For example, an interactive application might process keyboard or mouse input in one thread and execute CPU-bound tasks in a second, independent thread. Unlike legacy systems in which the keyboard locks when a user selects a task, today's systems allow continuous input while other tasks are processing. Threading this type of application can be beneficial even on a single-processor machine simply because the number of context switches (when the OS must let another thread run) will normally be low and therefore the delays will not be noticed by the user. The user gets the advantage of instantaneous interaction with the GUI without compromising machine performance.
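A sketch of that split using POSIX threads (the CPU-bound task is a stand-in, and a real interactive application would run its event loop where this example reads a line of input):

    #include <pthread.h>
    #include <stdio.h>

    /* Stand-in for a long, CPU-bound task. */
    static void *cpu_bound_task(void *arg)
    {
        long sum = 0;
        for (long i = 0; i < 100000000L; i++)
            sum += i;
        printf("worker finished: %ld\n", sum);
        return NULL;
    }

    int main(void)
    {
        pthread_t worker;
        pthread_create(&worker, NULL, cpu_bound_task, NULL);

        /* The main thread stays free to handle input; a GUI would
           run its event loop here instead of a single fgets. */
        char line[128];
        if (fgets(line, sizeof line, stdin))
            printf("still responsive: %s", line);

        pthread_join(worker, NULL);
        return 0;
    }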

Even on single-processor machines, applications often have disjoint tasks that are being performed. These tasks might best be handled by individual threads. One example of separating disjoint tasks within an application could be one that's processing work but also waiting on a socket to accept more data. One thread simply listens on the socket while the other thread performs the data processing. Another example might be an application that's I/O-intensive yet must process the data that's read. One thread could be reading data from the disk while another is CPU-bound and is processing the data that has been read. On multi-core machines, the CPU-bound portion of the application could be further threaded to improve performance.

An example of improving application throughput might be a Web site that supports multiple concurrent users, none of whom must wait until another user has finished; all users can access the Web site simultaneously. Each user would be executing in a separate thread so that the work they perform doesn't interfere with the work of others. Again, this type of threading can be beneficial even if there are minimal cores, especially since the majority of the user's time would be spent reading the Web page that was downloaded and not executing in the server searching for and downloading the page (i.e., while one user is reading the page, another user could be using the processor to download a page).

Improvements in performance via parallelism can best be shown by looking at how to build a wooden table.

There are four main steps to building the table:
1. Cut the table top and sand it. Assume that this takes 15 minutes.




2. Turn the four legs in order to shape and smooth them. Assume this takes 30 minutes for all four legs.
3. Assemble the table. Assume this takes 10 minutes.
4. Stain the table. Assume this takes 10 minutes.

One table will take one person 65 minutes to build. Four tables would take this same person 260 minutes to build. Now let's assume we have four people and only one set of equipment. One person can run the saw and sander, one person can turn the legs, one person can assemble the table and the fourth person can stain the table. Having four people on this project would cut the time to 140 minutes. Figure 1 shows each of the four table-building tasks, divided into five-minute blocks of time and with a different color for each of the four tables. As you can see, the four tables can be built faster, but three of the employees are idle for more than half the time.

You could teach employee 1 not only to cut and sand the table tops but also to assemble and stain. This would not reduce the total time, but it would save two salaries; therefore, it would be less expensive to produce the tables (see Figure 2).

Alternatively, if you had four sets of tools and four employees, each trained on all skills, you would quadruple output—producing four tables instead of one in 65 minutes—but this would make the tables more expensive because you'd be paying for more tools, training and salaries (Figure 3).

Next, we might consider the longest job in the system, the turning of the legs, and see how we could speed up this process. If we hired one more employee capable of turning the legs, and we bought an extra lathe, we could cut the time for building four tables to 80 minutes while keeping most employees busy most of the time (Figure 4).

The same reasoning can be used with threading. Common types of program parallelism are data parallelism, task parallelism and pipeline parallelism.

Data parallelism refers to distributing the data being processed across multiple threads. Each thread is performing the same function, but on a different set of data. The final result can be achieved when all threads processing the data have completed. In the above example, the turning of the legs is the portion of the activity that was data-parallelized.

Task parallelism refers to disjoint tasks being run on multiple threads. As long as the one task does not depend on the results of the other task, the two tasks can be run concurrently. In the example above, cutting and sanding the tops and turning the legs can be done at the same time.

Pipeline parallelism comes into play when one task has a dependency on the output of a previous task, as in an assembly line. Each thread is dedicated to one particular task. Thread A starts, and when it finishes computing iteration 1, the results are sent to Thread B. Thread B then starts its process on iteration 1. At the same time, Thread A starts a new computation—iteration 2. This process continues until all iterations are completed. The serial portion of this code occurs in Thread A, iteration 1, and in Thread B, on the last iteration. In the above example, the assembling and staining must occur after the wood has been processed, but the workers staining and assembling can be working as the wood for the next table is being processed.

In the table-making example, the employees are equivalent to the different threads that can execute on a machine; the tasks these employees perform are equivalent to the code that's executed by the thread. Adding threads to the system can speed up an application, but it won't cut the total time by the number of threads.

Furthermore, if the number of processors on the machine is less than the number of threads, threads will have to share that processor and the execution time will increase.

Many common computerscience tasks can benefit fromparallelization.

Finally, you can have multi-level parallelism. For example, a customer database could have a thread that's simply reading the customer data, several threads that are currently sorting the data after it's been read and another thread that produces the reports from the final sorted data. There could be other threads accepting customer input and processing those orders and another thread that simply updates an inventory database. In this case, all versions of the parallelism are incorporated into the system. Threading models of an application can become quite complex.

FIG. 1: MORE THREADS, MORE IDLENESS

FIG. 2: TWO THREADS, NO WAITING

[Adding threads can speed up an application, but it won't cut the total time by the number of threads.]


HOW MUCH FASTER CAN THREADING MAKE AN APPLICATION?
Unfortunately, not all portions of an application can be threaded. The portions that can be threaded must represent a significant part of the application's execution time in order to improve the performance of the system.

If the execution paths are drawn, a threaded application might look something like the illustration in Figure 5. Notice that part of the application path remains serialized and runs on only one thread. Other parts of the application can be threaded, but these areas should be threaded only if they're hot spots or if they take up a majority of the execution time. Remember, even having an infinite number of processors won't decrease the execution time of the serial portions of an application. It's the serial portion of the application that ultimately limits the performance improvements. The execution path shown in Figure 5 is simple; in many applications, some of the threads spawned to perform a task spawn other threads. An execution path picture of a complex application can be daunting.

Other criteria that should be considered when threading an application:

• If an application is I/O-bound, threading the code might not make the application faster, since a shared disk itself is serialized. Yet moving the I/O to its own thread while other processing occurs in alternate threads can increase performance. Furthermore, if the data is spread across multiple disks, it may then be advantageous to use multiple I/O threads as long as each thread is working off an individual disk.
• If an application is memory-bound, threading the code will probably only speed up the threaded parts by about 50 percent. The memory cache can throttle down the performance if the cache is constantly being updated.
• CPU-intensive applications are the best candidates for threading.

IF THREADS ARE SO GOOD, WHY AREN'T MORE APPS THREADED? (THE BAD)

Many of today's applications are still not threaded. Why? Because writing threaded code is difficult, and it makes development and debug cycles longer.

When writing code that will take advantage of parallelization, you should consider several things:

• Which areas of the application should be threaded
• How best to thread those sections
• Serialization of shared resources
• Threaded applications are harder to debug

WHAT TO THREAD
The first thing a developer needs to determine is what (if any) part(s) of the application should be threaded. The best way to accomplish this is to run the application under a code profiler to determine any hotspots in the system. Once the hotspots are identified, ask these questions about your application:

1. Does the application allow for multiple concurrent users? If so, the execution of each user's code path could be in a unique thread, but you may need to limit the number of users (see The Ugly).

2. Are there certain steps in the application that must be performed in order? If so, each thread could be a step, but remember from the table-building example that the longest step (turning the legs) will determine the overall execution time for the application. For instance, an application could:
   a. Read data
   b. Do computation on each element read
   c. Print a report
If so, you could create one thread that reads the data and stores it in a buffer, while a second thread looks at the data and starts processing. Once those threads have completed, a third thread can print a report.

3. Does your application process large amounts of data, and can portions of that data be processed independently? If so, create multiple threads to process portions of the data. An example of this would be sorting portions of data in multiple threads and then merging the data after each thread has finished.

4. Is there a GUI associated with the application? If so, you may consider placing the code that handles the user input in a separate thread. This allows the user to enter data in one thread, while the application is processing the data in another and possibly updating the display in another. This type of threading model can be beneficial even on a single-processor machine since it makes the user feel like the system is performing multiple tasks simultaneously.

FIG. 3: FOCUS ON DIFFICULTY
FIG. 4: FASTEST, MOST EXPENSIVE

5. Are there multiple disjoint tasks that can run independently of each other? If so, these tasks can be run concurrently.

Looking at the code in the application may also help in the process of determining what to parallelize.

A coding example of task parallelism:

    < start up code >
    Call AnalyzeA( handle, A );
    Call AnalyzeB( handle, B );
    Call AnalyzeC( handle, C );
    < termination code >

Since AnalyzeA, AnalyzeB and AnalyzeC are independent, these three actions could be performed in parallel.
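One hedged way to realize that with POSIX threads is sketched below. AnalyzeA, AnalyzeB, AnalyzeC and handle are the pseudocode's placeholders; the small job struct exists only to fit pthread's single-argument start routine, and the pattern is safe only if the three analyses don't update shared state through the handle:

    #include <pthread.h>

    /* Empty stand-ins for the pseudocode's three analyses. */
    static void AnalyzeA(void *handle, void *data) { (void)handle; (void)data; }
    static void AnalyzeB(void *handle, void *data) { (void)handle; (void)data; }
    static void AnalyzeC(void *handle, void *data) { (void)handle; (void)data; }

    struct job { void (*fn)(void *, void *); void *handle; void *data; };

    static void *run_job(void *arg)
    {
        struct job *j = arg;
        j->fn(j->handle, j->data);
        return NULL;
    }

    int main(void)
    {
        void *handle = NULL, *A = NULL, *B = NULL, *C = NULL;
        struct job jobs[3] = {
            { AnalyzeA, handle, A },
            { AnalyzeB, handle, B },
            { AnalyzeC, handle, C },
        };
        pthread_t t[3];
        for (int i = 0; i < 3; i++)
            pthread_create(&t[i], NULL, run_job, &jobs[i]);
        for (int i = 0; i < 3; i++)   /* wait for all three to finish */
            pthread_join(t[i], NULL);
        return 0;
    }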

A coding example of data parallelism:

    for ( i = 1; i < n; i++ )
        Process_data( array[ i ] );

Since Process_data is working on different areas of the array, each iteration of the loop could be executed in a separate thread.
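If the compiler supports OpenMP (an assumption; pthreads with explicit chunking works too), a single pragma expresses exactly that, provided Process_data touches only its own element:

    #include <stdio.h>

    #define N 1000000
    static double array[N];

    /* Stand-in for Process_data: independent work per element. */
    static void Process_data(double *x) { *x = *x * 2.0 + 1.0; }

    int main(void)
    {
        /* Compile with -fopenmp (gcc) to distribute the independent
           iterations across a thread pool; without it, the pragma is
           ignored and the loop still runs correctly, just serially. */
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            Process_data(&array[i]);

        printf("%f\n", array[0]);
        return 0;
    }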

A coding example of pipeline parallelism:

    for ( i = 1; i < n; i++ ) {
        Get_next_chunk( data, i );
        Output = Analyze( data );
        Update_Database( Output );
    }

The data must first be obtained, then analyzed, then updated. One thread could simply get each data block. The next thread analyzes that data block, and the output from the analysis is updated in the database.

Complex applications can include a combination of multiple techniques; therefore, do not limit the types of parallelism you look for as you search your code. The thread diagram (Figure 5) for multi-level parallelized systems can become complex, but if the application is programmed correctly, it can provide the users with fast, efficient functionality.

After determining areas in the code that are candidates for parallelization, look at the amount of code to be executed and compute how much time the application is likely to spend in that section. Does that amount of time greatly exceed the amount of time it takes to create and destroy a thread along with the time needed to communicate between threads? Creating and destroying a thread can be expensive (in CPU cycles) and overwhelm any parallel savings. If the amount of time spent in the code to be threaded is minimal, threading that section of code could actually slow the application down. Picking the portions of the code where threading improves performance is essential.

Once you've determined that it would be beneficial to thread a section of code, the developer needs to determine the optimum number of threads to create. This in itself can be complicated. How many other threads are running in the application? Should the developer create more threads than the number of processors? Too many threads can lead to thread thrashing (excessive context-switching). Adding threads in this instance will only slow the system down. The number of threads that can run at any one time is limited by the number of processors. Furthermore, memory and I/O caches can be destroyed, possibly causing unnecessary delays when the thread once again gets a chance to execute. An application should have more threads than processors only if some of the threads don't execute often and are normally in a wait state. A developer should also realize that the application may be sharing the machine with other applications, so the application may not want to use all CPUs or other resources.


TABLE 1: PARALLELISM THREADING EXAMPLES

Many common computer science tasks can benefit from parallelization.

Example 1 (Data): An application that must perform a sort on a large amount of data could invoke threads, each sorting a portion of the data. Then the sorted lists can be merged to produce the final sorted list.

Example 2 (Data): Adding two matrices is faster with two or more threads. One thread could be adding the top portion of the matrices, while the other thread is adding the lower portions of the matrices.

Example 3 (Data): Using threads to help render graphics can be advantageous. When working with images, each thread can take a portion of the image. When working with ray tracing, each thread can work on computing the ray tracing for a portion of the picture.

Example 4 (Task): An application that has its data already in memory could be performing a record-by-record print in one thread while performing computations on the data in a separate thread.

Example 5 (Pipeline): One thread could be accepting the input for a customer's order. Another thread could process the order. The order can't be processed until the customer's order has been entered.

[Creating and destroying a thread can be expensive (in CPU cycles) and overwhelm any parallel savings.]


The number of threads in an application should be flexible. It's best in many circumstances if the number of threads is decided at run time. If an application creates a fixed number of threads and the data being processed is small, the overhead of thread creation, deletion and communication can nullify any benefits sought from threading. If the application is written to determine the number of threads based on the data, the application can use the threads to gain performance and not affect overall performance if the data load is light. In light data load instances, it might be best for the application to not create any threads and to perform all work in a single thread.

After determining what to thread and how many threads should be created for optimum performance, the developer needs to determine how the threads will communicate. Will threads communicate between worker threads, or will communication be handled by a manager thread, which then talks with the worker threads? Determining the best way to thread depends on what is being threaded and the threading model used.

SERIALIZATION
Since threads run asynchronously, no two runs of the application can guarantee that the two threads will be executed in the same order. Threaded code must therefore serialize any access to changeable shared resources in order to maintain the consistency of the shared data. If two or more threads are accessing updatable shared data, program crashes or incorrect results can occur. A data race or race condition occurs when multiple threads access a shared resource and the value of the shared resource is dependent on the execution order of the two threads.

Allocated memory, stack variables and global variables are the most common examples of a shared resource, but a developer must also consider files or other I/O devices as possible shared resources.

A crash can occur if two threads are executing at the same time and one thread is in the middle of removing an item from a linked list while another thread is searching that list.

In the following sample code, Thread A is removing an item from a linked list, where the variable item points to the node to be deleted:

    Temp_item = item->prev;
    item->next->prev = Temp_item;
    Temp_item->next = item->next;
    Free( item );

Thread B is searching through the linked list:

    For( t2i = list_head; t2i; t2i = t2i->next )
        Total += t2i->count;

1. Thread B is executing, and t2i is set to an address (let's say 0x12345678).
2. Thread B is put into a wait state by the operating system.
3. Thread A starts executing, and item happens to have the value 0x12345678.
4. If Thread A is able to complete its execution before Thread B is allowed to continue, then when Thread B continues executing, the value in t2i would've been deleted. The application more than likely would crash.
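One hedged fix is to have both threads take a single list-wide mutex, so the unlink and the traversal can never interleave. A coarse-grained sketch (the node layout is assumed from the fragments above):

    #include <pthread.h>
    #include <stdlib.h>

    struct node { struct node *prev, *next; long count; };

    static struct node *list_head;
    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Thread A: unlink and free a node under the list-wide lock. */
    static void remove_item(struct node *item)
    {
        pthread_mutex_lock(&list_lock);
        if (item->prev) item->prev->next = item->next;
        else            list_head = item->next;
        if (item->next) item->next->prev = item->prev;
        pthread_mutex_unlock(&list_lock);
        free(item);   /* safe: no other thread can still reach it */
    }

    /* Thread B: traverse under the same lock, so no node it visits
       can be freed out from under it. */
    static long total_count(void)
    {
        long total = 0;
        pthread_mutex_lock(&list_lock);
        for (struct node *p = list_head; p; p = p->next)
            total += p->count;
        pthread_mutex_unlock(&list_lock);
        return total;
    }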

Another reason access to shared memory needs to be serialized is to avoid a data race condition, during which results are unpredictable.

If both Thread A and Thread B are sharing the variable cnt, and each thread at some time needs to increment cnt, a race condition could occur if the increment was not properly serialized.

To increment a value:
1. The contents of cnt are loaded into a register.
2. The register is incremented.
3. The contents of the register are stored back in cnt.

The result of cnt after both threads have run could be different depending on the order of execution.

In both cases, let's assume cnt is 0 before the process starts:

CASE 1
  Thread A: Register = 0; Register = 1 (+1); cnt = 1
  Thread B: Register = 1; Register = 2 (+1); cnt = 2

CASE 2
  Thread A: Register = 0; Register = 1 (+1); cnt = 1
  Thread B: Register = 0; Register = 1 (+1); cnt = 1

The value of cnt would be 2 when Case 1 finished but would be 1 in Case 2, because the entire process of incrementing the value was not serialized. An application should not produce results that are unpredictable. To avoid these scenarios, a developer should try to minimize the shared data between threads. If that's not possible, all shared accesses must be serialized, which is by no means an easy task. Every line of code that's executed by any thread must be checked to determine which shared resources can be written to. This is especially complicated with code-heavy threads. It can be further complicated if the code calls subroutines or functions that are not written by the developer but use common shared resources. Such scenarios could include:

• An operating system call, which is not necessarily "thread-safe."
• Another developer's code elsewhere in the company. Does that developer know that his code is being called by multiple threads?
• Third-party software. Is it thread-safe?

If the code called by a threaded application is not thread-safe, no matter how well-developed the developer's own threads are, the application will still not perform correctly. It's essential for the developer to ensure that all executed code is thread-safe.

FIG. 5: DIVIDE AND PERFORM
FIG. 6: SERIAL/PARALLEL SPLIT


So, what does thread-safe mean? To make code thread-safe, the developer must serialize access to shared resources. Before a developer starts looking at how to serialize non-thread-safe code, he should look to see if an alternative local to the thread is available. Could a copy of the data be given to each thread, with the changes later updated in the master copy of the data? If so, this might be considered. If not, the developer needs to determine how to serialize the code that accesses the shared data.

The mechanism to perform the synchronization can be any one of the following:
1. Atomics
2. Regular locks
3. Read/write (RW) locks
4. Spin locks

An atomic operation is completed entirely before another operation is allowed to commence. A developer could replace the increment in the earlier example with an atomic increment to avoid the problem. In the atomic increment, the process of the increment and storing the value into memory cannot be interrupted.
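In C11, for example, declaring the counter atomic makes the whole read-modify-write indivisible (a minimal sketch; older toolchains expose equivalent compiler builtins):

    #include <stdatomic.h>

    static atomic_long cnt;   /* the shared counter from the example */

    /* Any number of threads may call this concurrently: the load,
       increment and store happen as one uninterruptible operation,
       so no update can be lost. */
    static void bump(void)
    {
        atomic_fetch_add(&cnt, 1);
    }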

A regular lock (mutex or semaphore) can be used to control access to a critical section. A critical section is a section of code that accesses a shared resource that mustn't be accessed by any other thread while the critical section is being executed. A lock can be used to bracket the critical section as follows:
• Obtain lock.
• Critical-section code uses a shared resource.
• Release lock.

Every place in the code where the shared resource is being used must be bracketed by a lock get/release. Such a lock prevents another thread from executing a piece of code that's similarly locked until the first thread releases the lock. The thread trying to obtain the lock will be put into a wait state until the thread holding the lock releases it. Therefore, the critical sections surrounding the shared resource will be guaranteed to execute serially and without interruption.
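With POSIX threads, the bracket around the shared counter from the earlier example might look like this minimal sketch:

    #include <pthread.h>

    static long cnt;   /* shared resource */
    static pthread_mutex_t cnt_lock = PTHREAD_MUTEX_INITIALIZER;

    static void bump(void)
    {
        pthread_mutex_lock(&cnt_lock);    /* obtain lock */
        cnt++;                            /* critical section */
        pthread_mutex_unlock(&cnt_lock);  /* release lock */
    }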

A read/write lock is similar to a regular lock, but the lock can be obtained either in read or write mode. Each of multiple threads can obtain a read lock without any of the threads being put into a wait state. It's only when a thread tries to obtain a write lock that a thread or threads will be put into a wait state.

A read/write lock might be useful for an in-memory database. If the majority of the access to the database is to read the data, then the data values are not changing, and all threads can grab data without having to be serialized. But when the data needs to be updated, that write operation (thread) must request a write lock and will be put into a wait state until all currently held read locks are released. The write lock will then be obtained and held during the update, during which time all threads requesting a read or write lock using the same lock will be put into a wait state. Once the update has completed, the lock is released and any waiting threads will be allowed to proceed. A read/write lock is most efficient if the number of reads far exceeds the number of writes.
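A POSIX sketch of that in-memory database pattern (lookup and update are invented names standing in for real query and update routines):

    #include <pthread.h>

    static pthread_rwlock_t db_lock = PTHREAD_RWLOCK_INITIALIZER;
    static long db_value;   /* stand-in for the shared database */

    /* Many readers may hold the lock at the same time. */
    static long lookup(void)
    {
        pthread_rwlock_rdlock(&db_lock);
        long v = db_value;
        pthread_rwlock_unlock(&db_lock);
        return v;
    }

    /* A writer waits for the readers to drain, then runs alone. */
    static void update(long v)
    {
        pthread_rwlock_wrlock(&db_lock);
        db_value = v;
        pthread_rwlock_unlock(&db_lock);
    }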

Finally, a developer could use a spin lock, in which a thread waits in a loop until the resource becomes available. If the amount of time to wait for the resource is typically short, spin locks are cheaper because the thread doesn't have to go into a wait state and possibly be removed from a processor or incur the overhead of executing a context switch. But if the length of wait time is great or not known, a spin lock should not be used. A spin lock doesn't release the processor for other threads to use, but instead keeps the processor at 100 percent utilization. Translation: most of the time the processor is busy doing nothing.
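The classic implementation busy-waits on an atomic flag. A minimal C11 sketch (in practice a platform primitive such as POSIX's pthread_spin_lock would usually be preferred):

    #include <stdatomic.h>

    static atomic_flag flag = ATOMIC_FLAG_INIT;

    static void spin_lock(void)
    {
        /* Loop until test-and-set finds the flag previously clear;
           the waiting thread never sleeps, it just burns cycles. */
        while (atomic_flag_test_and_set(&flag))
            ;
    }

    static void spin_unlock(void)
    {
        atomic_flag_clear(&flag);
    }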

When deciding how much code should be locked, there are two important things to consider. First, if the critical section of code is too large, other threads currently trying to access the lock will be put into a wait state and become idle.

Suddenly the threaded portion of your code becomes serialized. Also, the application may not be a candidate for threading since there is little natural concurrency when large critical sections are present. On the other hand, if the locks are obtained too often, the overhead of obtaining and releasing the locks will nullify the benefit of threading.

[A read/write lock is most efficient if the number of reads far exceeds the number of writes.]

As a developer is analyzing code, it's a good practice to move coding statements that require bracketing by a lock close to other statements also requiring the same lock to reduce the number of lock accesses.

For instance, let's say code in an application is represented as the blocks in Figure 6, where the green represents code that can always run in parallel and red represents code that must be serialized. If the developer puts a lock obtain and release around each red block, the locking overhead could be detrimental to performance. Instead, if the developer places some of the code that must be serialized with other code that must be serialized, then the reduction in the number of lock instances will increase performance.

TESTING THREADED CODE
Once the threaded application has been written, testing it brings in another set of problems. It's particularly important to check performance on several different classes of machines. What happens when the application is run on a single-processor machine? A dual-processor or multi-core machine under a different operating system? Is the performance acceptable on all classes of machines and under all operating systems?

And since multi-threaded code paths are non-deterministic and can therefore produce inconsistent results and execution times, the tester also needs to execute the application multiple times on multiple machines with the same data. Also, errors and performance problems may not be noticed until the system is load-tested and deployed at a customer site.

DEBUGGING THREADED CODE
If a problem is detected and the code must be debugged, other problems occur. Because the developer can't guarantee that the code will be executed in the same sequence, the developer may not be able to recreate the problem. This is especially true when trying to debug areas of the code that were not serialized properly.

A threaded application should be written so that it can be executed serially if need be, and with a single thread. If the tester or developer determines that there's a problem with the application's results, the first thing that should be done is to run the single-threaded version of the application and see if it produces correct results. If it doesn't, this single-threaded application will be much easier to debug than the multi-threaded version.

If the errors occur only on the multi-threaded version, the developer will need to use a debugger that handles multiple threads so that the state of each thread can be examined. Unfortunately, the debugger itself often perturbs the system enough to prevent the errors from being reproduced while using the debugger. Almost every form of debugging—from instrumenting the code to putting in printf statements—can perturb the system enough to prevent errors from being reproduced.

Test organizations are also advised to consider the use of third-party threading tools such as Intel's Thread Checker for threading issues, or a code profiler such as IBM Rational Quantify, Intel's VTune or Intel's Thread Profiler to determine performance problems. Finally, the developer needs patience, since the failures and performance problems may be hard to reproduce and may be even harder to fix.


[A threaded application should be written so that it can be executed serially if need be, and with a single thread.]


READ THE FULL STORY (including The Ugly part) online at www.tinyurl.com/TheUgly


Tracking Quality Variations With Six Sigma Methods

By Jon Quigley and Kim Pries

Jon Quigley is the manager of the verification and test group at Volvo. Kim Pries is director of product integrity and reliability for Stoneridge, which designs and manufactures electronic automotive components.

By most definitions, Six Sigma is a collection of techniques for reducing quality variation in a product, with a strong emphasis on statistical methods of analysis and some optimization techniques.

Originally conceived and implemented by Motorola for manufacturing, the Six Sigma approach makes use of some problem-solving tools that are as valid for software testing and performance as they are for hardware design and production.

The use of complex statistics is not a requirement for Six Sigma projects and never has been. The concepts of improvement and control are most significant and, indeed, represent a hallmark of the Six Sigma approach (see Table 1).

Another hallmark of the Six Sigma approach is pragmatic project selection, which is related to the effect of the project on the bottom line. This line of attack is particularly important when dealing with software, since a software failure will exist in every copy of a given release and is sometimes also buried in past releases. Hence, the existence of a significant software issue, particularly in safety-related embedded software, will sometimes force an industrial recall. This situation can be exacerbated by variations in customer or field configurations, which often escape detection until the number of customers with the wayward software is quite large.

This is the first in a series of articles in which we'll show how powerful a simple problem-solving approach can be and demonstrate Six Sigma's relevance to software testing and performance. This article explains the "define" step in great detail.

CUSTOMER
In general, we develop software for customers and not for ourselves. A customer might be someone from another department within your company or someone purchasing software at the retail level. Although we do develop software for our own use, the impact severity is typically less and defect containment is comparatively easy.

When a software problem presents itself, often the best immediate response is to listen to the voice of the customer. This includes asking questions to identify what the problem is and is not. What, from the customer's perspective, are the Critical to Quality (CTQ) attributes that are failing? The problem might not always be an obvious software issue. In essence, we begin a diagnosis of the problem by listing all the known factors of the problem as reported by the customer (or patient) in order of statistical probability (if known). This systematic approach is of benefit because it allows us to do the following:

• Unambiguously comprehend the situation
• Develop a rational forecast of probable behavior
• Immediately eliminate any regulatory violations (safety issues) with emergency action
• Plan for an intervention
• Develop a work-around until a good reparation time arises
• Determine future actions based on fact and not "gut" or "hope"

Sometimes a major issue will be the nebulous nature of customer reporting and logistics (getting data from a customer's customer, for example). Regardless, we should solicit whatever customer data we can, since the customers are the ones experiencing the symptoms and are therefore the best input source.

ADMINISTRATION
Our Six Sigma Define approach has several components that we'll describe. None of the documents need be a burden. But they represent a systematic approach to planning and capturing an issue as it evolves between customer and supplier.

TEAM CHARTER
As with any project, the scope of the work has to be clear to achieve an objective. We accomplish this goal with a team charter. The team charter is a concise document used to summarize information for the solution team in the case where our problem is significant enough to require a team to dig for an answer. This information includes the definition of measurable success criteria, including time (number of hours) or desired closure date.

Identification of the appropriate team skills needed to define the problem is essential to solving the problem. The team charter will provide the names and roles of the team members, which can also include customer and supplier members. For example, a supplier could be a compiler or operating system manufacturer; a customer could be a downstream commercial buyer. Sometimes we're selling more of a commodity version of software, in which case our customer may simply be an aggregate of unhappy end users. We use human resources to get things done. Experience suggests that people who execute and deliver are frequently "tapped" for this sort of teamwork, thereby increasing their workload. Given the propensity for overdemand of key talent, it's worth mentioning as a warning. The team leader will need to be aware of workload problems as he selects his team for problem resolution (assuming a full team is necessary).

The team charter should also provide a general summary of the situation as it's known at the time we write the charter, which is most often the start of the problem-solving project. More detail will follow about how to describe or define the situation. In order to solve our problem, we'll also need resources: information, money, individuals, a place to work and perhaps some equipment and special skills.

Our charter also will include simple guidelines for team behavior and for meetings. Meeting information should include a list of required and optional attendees, location, time and technology needs (to communicate with remote team members, for example). Additionally, there should be an agenda and a recording of minutes. We can include a contact list in the charter that will show e-mail and phone numbers for team members.

A preliminary plan is useful, but it doesn't have to be elaborate. The goal is to solve the problem, not write a burdensome document. Some charters may include operational definitions to ensure that the team is in sync with regard to the meaning of critical terms.

The charter might also include a description of probable measurement needs, a time line and a brief assessment of financial needs (although these may have been picked up in the resources section of the charter). Some more ambitious charters may spell out a goal, but in problem situations the goal may be so obvious as to need no further clarification. If we do create a goal, we'll spell out what is to be accomplished, define a measurable objective and indicate some kind of deadline.

Also contained in the charter might be the communications needs for the activity.

TABLE 2: FAILURE MODES TO CAUSES

Failure mode                    Cause
Output                          Input
Observable behavior             Stimulus may not be readily observable
Response                        Mechanism for a failure mode
By-product of cause             Antecedent of the failure mode
Does not meet a requirement     May not meet a requirement

TABLE 1: FIVE EASY PIECES

The most common approach includes these steps:
• Define
  > Spell out the problem and the scope of the work
  > Indicate customer impact
• Measure
  > Gather data about the problem or situation
  > Ensure the measurement system records meaningful results
• Analyze
  > Take the data and recast it as information
  > Use statistical tools when appropriate
• Improve
  > Optimize or, at worst, improve
  > Frequently use an implementation and deployment of "best practices"
• Control
  > Establish procedures that provide for monitoring after improvement
  > Use a feedback system to truly control the improvement



This can also be handled within a separate project communications plan if the activity has a large distribution. Connecting communications needs to the responsibility assignments aids in team mapping. Distributed teams across geographies or organizations make this communications need even greater.

Of course, we expect that this level of documentation would be used for dealing with serious problems or large-scale customer issues. Routine problem-solving will not need a Six Sigma approach. Some people may deem such a documentation requirement heavy-handed, but our experience with the U.S. Department of Defense and software development for commercial vehicle products suggests that more documentation is better. Just be sure to keep it relevant to scope, team communication, customer communication, resource allocation and basic problem-solving.

THE PROBLEM STATEMENT
The problem statement itself, often part of the charter, should describe current behaviors of the software, as well as the impact on the customer. This impact should include dollars at stake due to software nonconformance. Situations of lesser significance, such as annoyances, may not require a team effort or a charter to solve. But more significant problems should be accompanied by some kind of record for future use and to capture lessons learned.

It's also critical that we define the problem scope. A good place to start might be with Rudyard Kipling's poem: "I keep six honest serving-men (They taught me all I knew); Their names are What and Why and When and How and Where and Who." We might also ask "how much" to determine how widespread the problem may be.

We can break the problem down by location if we're analyzing product (embedded) software. An example would be:
• System
  o Subsystem
    o Assembly
      + Subassembly
      + Component
        # Piece part

We can also build taxonomies based on classes and objects, data structure types and functions. We use any or all of these hierarchical formats to help us isolate where the problem resides. In essence, we're showing where the problem is and, at the same time, discovering where it does not occur. By so doing, we can use the process of elimination to remove components and subsystems that are not relevant to the issue at hand.

Our next step is to assess the failure modes. We can organize the failure modes in the following structure:

1. Complete failure: Software doesn't initialize, for example.
2. Partial failure: Software initializes but resets.
3. Aperiodic failure: Software occasionally or intermittently fails for no apparent reason, and antecedent causes are difficult to determine.
4. Periodic failure: Software resets regularly.
5. Failure over time: Software becomes corrupt over some quantum of time.
6. Too much failure: Unspecified features cause problems.
7. Too little failure: Software does not apply some of the required capabilities.

We'll refer to all of these failure modes when we make use of one of the defect taxonomies described later in this article. We also need to understand that failure modes are not the equivalent of causes. Table 2 shows a comparison between failure modes and causes.

TOOLS
The Six Sigma approach provides some easy-to-use tools that are helpful during the Define phase of the corrective action. One of these tools is the supplier-input-process-output-customer (SIPOC) diagram (see Figure 1). A simpler but also useful version of the SIPOC diagram is the input-process-output diagram, or IPO. The arrows on the left represent factors that enter a process; the arrows on the right represent responses that exit a process. Multiple IPOs can be strung together analogously with a flowchart, with some outputs becoming inputs to other processes. Note that although we can model software with these diagrams, IPOs are mostly used to model processes. This process perspective can be helpful if the problem originates from a specific development process (or lack thereof). In the case of software performance, we would take a look at the inputs that stimulate a function and observe the reaction of the responses to systematic modifications of the factors. Although these models appear simple, developing them takes substantial thought in order to eliminate redundant or meaningless inputs and outputs.

Rayleigh plots allow us to use statistical analysis to determine if we are ready for release. These plots can also be applied after release to model issues discovered in the field. In essence, they're Weibull plots with a shape factor of two, which provides for the distinctive "front-loaded" appearance of the plot (see Figure 2). Rayleigh plots can also be used to model level of effort for workload analyses. We should remember that any Weibull plot is, by definition, also a three-parameter extreme value type III plot.

FIG. 1: THE SIPOC EPOCH
FIG. 2: WEIBULL IS NO BULL
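For reference (this is standard distribution theory, not anything specific to the article): the two-parameter Weibull density with scale λ and shape k is

    f(t) = (k/λ) (t/λ)^(k−1) e^(−(t/λ)^k),   t ≥ 0,

and setting the shape to k = 2 gives the Rayleigh density

    f(t) = (2t/λ²) e^(−(t/λ)²),

which rises from zero, peaks at t = λ/√2 and then decays, which is the "front-loaded" defect-discovery shape described above.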

A time series or run chart can be useful to see if problems—especially intermittent problems—arise after a particular time. This situation may be indicative of the effect of a new noise condition or may indicate a deficient release of the software. It's also possible to perform Weibull analysis on this kind of situation to determine the probability of more issues, assuming they are related to time in the field (post-release duration).

When we enter into the measurement and analysis phases, Six Sigma practitioners frequently use a graphical device called an Ishikawa diagram (also known as a fishbone diagram or a cause-and-effect diagram). Sometimes, the Ishikawa diagram makes sense during the "Define" phase, when we need some systematic qualitative analyses regarding the problem.

PROBLEM IDENTIFICATION
When discovering software issues, we can classify them more systematically by using an existing taxonomy for software defects. Taxonomy is nearly always a tree structure that represents a cognitive hierarchy, the most famous of which is probably the Linnaean taxonomy of living things (kingdom, phylum, class, order, family, genus, species). Two sources of pre-existing taxonomies are the IEEE standard and the one defined by Boris Beizer in "Software Testing Techniques" (International Thomson Computer Press, 1990).

The IEEE approach is documented in IEEE Std 1044-1993, IEEE Standard Classification for Software Anomalies, and in a companion guide, IEEE Std 1044.1-1995, IEEE Guide to Classification for Software Anomalies. The IEEE approach is composed of multiple tables that represent a more relational approach to anomaly classification rather than a true taxonomy. Additionally, the standard proposes a process composed of:

• Recognition
• Investigation
• Action
• Disposition

Each of those consists of these tasks:

• Recording
• Classifying
• Identifying impact

Although the IEEE approach is less of a tree and more of a tabular design, it represents a systematic, rational approach to knowledge management of software defects. (For another approach, see “Beizer’s Bugs.”)
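Taken together, the four activities and the three tasks form a simple matrix. Here is a minimal sketch of that skeleton in Python, assuming nothing about the standard’s actual data schema; the Anomaly fields are illustrative only.

from dataclasses import dataclass, field

PHASES = ("recognition", "investigation", "action", "disposition")
TASKS = ("record", "classify", "identify impact")

@dataclass
class Anomaly:
    summary: str
    log: list = field(default_factory=list)

    def advance(self, phase: str) -> None:
        """Perform the three standard tasks for one phase of the process."""
        assert phase in PHASES
        for task in TASKS:
            self.log.append((phase, task))  # e.g. ("recognition", "record")

bug = Anomaly("crash on empty input file")
for phase in PHASES:
    bug.advance(phase)
print(len(bug.log))  # 4 phases x 3 tasks = 12 log entries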

When the “Define” phase is complete, and if we’ve done our work well, we should be ready for the “Measure” phase of the problem-solving algorithm and will know what needs to be measured. The Measure phase will be covered in the next installment of this series. Sometimes we learn so much from the data-gathering during the Measure phase that we must revisit Define, refine the problem statement and secure specialized resources.

We should also understand the impact of the software anomaly both on our own development/testing teams and on those of the customer. In many cases, it will be possible to put a monetary value on the crisis. If we’re repairing a specific software failure, the monetary approach will be easy. If we’re mutating the software development process itself, we’ll be faced with the issue that afflicts many process improvement efforts: the inability to assess the cost of a suboptimal process.
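As a back-of-the-envelope illustration (every figure below is hypothetical), the monetary value of a single escaped defect might be tallied like this:

# Cost of one escaped defect; all figures are hypothetical.
hours_to_diagnose = 6
hours_to_fix = 4
hours_to_retest = 8
loaded_rate = 95.0            # dollars per engineering hour
customer_credits = 1200.0     # concessions issued for the outage

internal_cost = (hours_to_diagnose + hours_to_fix + hours_to_retest) * loaded_rate
total_cost = internal_cost + customer_credits
print(total_cost)             # 2910.0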

We should also have a schedule for fixes, whether for the software itself or for the development process. And we should produce a preliminary schedule for testing of the change to ensure timely provision of resources. But that’s a process for another day.

BEIZER’S BUGS

Boris Beizer presents his solution in the second edition of “Software Testing Techniques” (International Thomson Computer Press, 1990) and describes the expected taxonomic hierarchy in an appendix. He also provides a broad overview of statistics related to defect classification.

Beizer created his own classification system because he felt the IEEE approach was incomplete. The taxonomy provides a numerical hierarchy for “bugs” and a description of each of the many line items; his top-level categories appear in the outline below, which provides a quick flavor of the taxonomy. While some people have questioned Beizer’s relevance to modern software testing approaches, his taxonomy represents a convenient format for setting up the “Measure” phase of the Six Sigma project (covered in the next installment of this series).

This taxonomy is not just a rhetorical or theoretical exercise. It’s possible to use this list to understand the origin of software problems. A software organization can use this list to categorize the problems found in the field from a historical perspective, then use it to extrapolate future conditions. The figure shows a distribution of faults along this taxonomy from data in Beizer’s study. If an organization’s distribution of the problems is known, it’s possible to use this information in evaluating the reported problems. More important, the list and historical information will provide clues to any underlying systemic issues in the software or embedded development process that may need to be addressed.

1. Functional requirements
   1. Logic
   2. Completeness
   3. Presentation, documentation
   4. Changes
2. Functionality as implemented
   1. Correctness
   2. Completeness
   3. Domain bugs
3. Structural bugs
   1. Control flow and sequencing
   2. Processing
4. Data
   1. Definition, structure, declaration
   2. Access and handling
5. Implementation
   1. Coding and typographical
   2. Standards violation
   3. Documentation
6. Integration
   1. Internal interfaces
   2. External interfaces and timing
7. System, software architecture
   1. Operating system call, use bug
   2. Software architecture
   3. Recovery and accountability
   4. Performance
   5. Incorrect diagnostic exception
   6. Partitions, overlays
   7. Sysgen/environment
8. Test definition and execution
   1. Test design bugs
   2. Test execution bugs
   3. Test documentation
   4. Test case completeness
   5. Other test design/execution bugs
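One practical use of the outline is to encode the top-level categories and tally field reports against them. A minimal sketch follows, with hypothetical counts rather than Beizer’s published statistics:

from collections import Counter

# Top-level codes from the outline above; report counts are hypothetical.
CATEGORIES = {
    1: "functional requirements",
    2: "functionality as implemented",
    3: "structural bugs",
    4: "data",
    5: "implementation",
    6: "integration",
    7: "system, software architecture",
    8: "test definition and execution",
}

# Each field report is classified to a top-level code as it arrives.
reports = [1, 2, 2, 3, 4, 4, 4, 6, 7, 2, 5, 4]

for code, count in Counter(reports).most_common():
    print(f"{CATEGORIES[code]}: {count}")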


CASE STUDY

Crowds Help Keep IT Lean
By Joel Shore

THE NON-PROFIT WEB SERVICES Interoperability Organization (ws-i.org) is one of the leading forces driving efforts to establish best practices for Web services interoperability across multiple platforms, operating systems and programming languages. Chris Ferris, an IBM distinguished engineer and CTO of Industry Standards in the Software Group Standards Strategy organization, currently represents IBM on the WS-I Basic Profile Working Group and is its former chairman.

“In today’s ultra-competitive market for Web services offerings, it is often difficult to tell where the standards support leaves off and where the proprietary features kick in. Additionally, a vendor’s tooling often does not draw a clear separation,” he says. What to do? When interoperability is of paramount importance, Ferris, not surprisingly, recommends leveraging WS-I provided tools to help find that line “between the standard and the proprietary and make an informed decision as to whether or not to exploit the proprietary based on your project’s requirements.”

But it’s not just about the tools. It’s about deadlines, budgets, package customization and too few in-house testers to get it all done accurately, thoroughly and on time.

That brings us to Darren Smith, application manager for eCommerce and Customer Facing Technologies at Ferguson (ferguson.com), the largest plumbing wholesaler in North America. Though the corporation’s IT department exceeds 200, many aspects of Web app development, from the artistic design of Web pages to testing underlying logic, are handled increasingly through crowdsourcing, specifically TopCoder (TopCoder.com).

“Some of our favorite designers of Flash apps are in Indonesia. We have no idea who they are; we know only their ‘handle,’ and they sign a non-disclosure agreement,” says Smith. Ferguson puts a design out to competition, gets back 30 to 40, picks the top three, and eventually chooses one. “I can’t get that kind of variety in-house.” The one staff designer (“awesome,” says Smith) tweaks winning designs to assure a complete fit into the existing infrastructure. A key app has to be easily understood and navigable by non-techie plumbing contractors, “the guys who make a living from their white panel truck,” who key in orders at night to replenish the parts they used on jobs the previous day. Orders are filled at a local Ferguson supply house overnight, ready for pickup as the contractor heads out for the next day’s jobs.

Similarly, QA testing has become a lot like a track meet, with quick sprints and leaps over hurdles (“bug races”) leading to quick turnaround and prize money. With one QA tester on his staff, delivery timelines lagged. By introducing bug hunts, those lags became “turn and burn” sprints. “After regression testing and identifying bugs, we use the bug races to get them fixed in a 24-hour turnaround,” says Smith.

Cultural change never is easy, and the switch to crowdsourcing is no different. At Ferguson, the entire lifecycle of Web app projects, from idea to rollout, often had to be completed in little more than a month. Smith says the transition “took 18 months for our people to buy into it.” And the company is still tweaking its processes.

“At first, there is a lot of resistance in having others find bugs, but you have to come to see this as ‘I’m not losing control, I’m getting help,’” says Smith.

Even after getting projects back from TopCoder following bug races and competitions, systems still aren’t quite ready for deployment. The remaining issues tend to be environmental, often a necessary byproduct of security concerns. “There may still be a process in that we have to take the code and fit it into our environment.” In its first few experiments with TopCoder crowdsourcing and testing, Ferguson’s IT testers expected that the debugged code could be slid into production, a perfect environmental fit. That expectation, Smith discovered, was unrealistic.

Typically, this transition takes up to two weeks. “When it comes to Web services, we create stubs and the bug race competitors test against those. What they don’t test against is actual data.” Due to data types that might be different from those supplied to the crowdsourcer, the service may fail. “These are things you find only when you work with actual live data.”
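The failure mode Smith describes is easy to reproduce in miniature. In this hedged sketch (the service, field names and types are all hypothetical), code that passes against a stub fails the moment the live backend returns the same fields as a different data type:

# A stub returns clean numeric fields, so the computation passes.
def order_total(payload: dict) -> float:
    return payload["quantity"] * payload["unit_price"]

stub_payload = {"quantity": 3, "unit_price": 9.99}
print(order_total(stub_payload))            # 29.97 -- passes against the stub

# The live backend serializes the same fields as strings.
live_payload = {"quantity": "3", "unit_price": "9.99"}
try:
    order_total(live_payload)
except TypeError:
    print("found only when you work with actual live data")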

With crowdsourcing, Ferguson, already running lean, did not have to downsize its IT staff. “The power of the crowdsource helps us maintain capacity.” And in the end, it boils down to those self-employed plumbers and their rolling inventory. “The difference,” Smith says, “is happy customers.”


Joel Shore is a 20-year industry veteran and has authored numerous books on personal computing. He owns and operates Reference Guide, a technical product reviewing and documentation consultancy in Southboro, Mass.

