JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 22, NUMBER 4 (2001)

Distributed Computer Architectures for Combat Systems

Mark E. Schmid and Douglas G. Crowe
Advances in computer hardware, networks, and support software have brought distributed computing to the forefront of today's Navy combat systems. An effort known as the High Performance Distributed Computing (HiPer-D) Program was established in 1991 to pursue advantageous applications of this technology to the Navy's premier Aegis cruiser and destroyer combatants. This article provides an overview of the program accomplishments and presents a brief discussion of some of the intercomputer communications issues that were paramount to the program's success. Most significant was the credibility built for distributed computing within the Aegis community. This was accomplished through many demonstrations of combat system functions that exhibited capacity, timeliness, and reliability attributes necessary to meet stringent Aegis system requirements.
INTRODUCTION

In the 1950s, the Navy embarked on its course of developing standardized computers for operational use. The AN/UYK-7, AN/UYK-20, AN/UYK-43, and AN/UYK-44 are still the primary combat system computers deployed today (Fig. 1). They function at the heart of the Navy's premier surface ships, merging information from sensor systems, providing displays and warnings to the crew, and controlling the use of weapons. These computers are wholly Navy developed. The processing units, interfaces, operating system, and support software (e.g., compilers) were all built by the Navy and its contractors. The AN/UYK-43 is the last in the line of Navy standard computers (oddly enough going into production after its smaller sibling, the AN/UYK-44). Described in 1981 as "the new large scale computer . . . expected to be in service for the next 25 years,"1 it will actually see a service life beyond that.
The Navy Standard Computer Program was successful at reducing logistics, maintenance, and a host of other costs related to the proliferation of computer variants; however, in the 1980s, the Navy faced a substantial increase in the cost of developing its own machines. Pushed by leaps in the commercial computing industry, everything associated with Navy Standard Computer development, from instruction set design to programming language capabilities, was becoming more complex and more expensive.
By 1990, it was clear that a fundamental shift was occurring. The shrinking cost and dramatically growing performance of computers were diminishing the significance of the computer itself. The transformation of centralized mainframe computing to more local and responsive minicomputers was bringing about the proliferation of even lower-cost desktop computers. Microprocessors had indeed achieved the performance of their mainframe predecessors just 10 years earlier, but this was only half of the shift! Spurred by new processing capabilities, system designers were freed to tackle increasingly complex problems with correspondingly more intricate solutions. As a new generation of complex software took shape, the balance of computer hardware and software investments in Navy systems cascaded rapidly to the software side.
Harnessing the collective power of inexpensive mainstream commercial computers for large, complex systems was clearly becoming prominent in Navy combat system design. The highly successful demonstration in 1990 of the Cooperative Engagement Capability (CEC) prototype, built with over 20 Motorola commercial microprocessors, was a milestone in distributed computing for Navy combat system applications. Developments in both Aegis and non-Aegis combat systems focused thereafter on applying the capabilities of distributed computing to virtually all of the Navy's prototype efforts. Distributed computing offered the potential for providing substantially more powerful and extensible computing, the ability to replicate critical functions for reliability, and the simplification of software development and testing through the isolation of functions in their own computing environments. Indeed, when the Navy's Aegis Baseline 7 and Ship Self-Defense System (SSDS) Mk 2 Mod 1 are delivered (expected in 2002), a commercial distributed computing infrastructure will have been established for all new builds and upgrades of the Navy's primary surface combatants (carriers, cruisers, and destroyers).
Distributed computing for combat systems, however, is not without its difficulties. Requirements for systems such as Aegis are stringent (response times, capacities, and reliability) and voluminous (dozens of independent sensor systems, weapon systems, operator interfaces, and networks to be managed), and the prior large body of design knowledge has been focused on Navy Standard Computer implementation. (As an example, the Aegis Combat System had a small number of AN/UYK-7 [later AN/UYK-43] computers. Each represented an Aegis "element" such as the Weapon Control System, Radar Control System, or Command and Decision System. Within an element, interactions among subfunctions were carried out primarily via shared memory segments. These do not map well to a distributed computing environment because the distributed nodes do not share any physical memory.)
Among the most fundamental issues facing this new generation of distributed computing design was the pervasive need for "track" data to be available "everywhere," at all the computing nodes. Tracks comprise the combat system's best assessment of the locations, movements, and characteristics of the objects visible to its sensors. They are the focal point of both automated and manual decision processes as well as the subject of interactions with other members of the battle group's networks. Ship sensors provide new information on each track periodically, sometimes at high rate. While the reported information is relatively small (e.g., a new position and velocity), the frequency of the reports and the large number of objects can combine to yield a demanding load. The number of track data consumers can easily amount to dozens and place the overall system communication burden in the region of 50 to 100,000 messages per second. (Figure 2 shows a scan with many tracks feeding messages forward.)
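The load cited above is simple fan-out arithmetic: every sensor report on every track is delivered to every consumer. A back-of-the-envelope sketch (the track counts, report rates, and consumer counts below are illustrative assumptions, not figures from the program):

```python
# Rough model of combat-system track traffic: each sensor report on each
# track fans out to every consumer of track data.  All numbers here are
# illustrative assumptions, not measurements from the article.
def message_load(tracks, reports_per_track_per_s, consumers):
    """Total messages/second the distribution layer must carry."""
    return tracks * reports_per_track_per_s * consumers

# e.g., 200 tracks reported twice a second, fanned out to 24 consumers:
print(message_load(200, 2, 24))  # 9600 messages/s
```

Even modest per-track rates multiply quickly once dozens of consumers are attached, which is why the small-message throughput of the communications layer dominates the design.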
Although commercial machines and networks have advanced admirably in their ability to communicate data across networks, the combat system applications have some unusual characteristics that make their needs substantially different from the commercial mainstream. The track information that comprises the bulk of the combat system communications consists of many small (a few hundred bytes) messages. These messages can be independently significant to the downstream consumers, with varying requirements to arrive "quickly" at particular destinations. Commercial systems are not tuned to handle large numbers of small messages; they focus on handling the needs of the mass market, which are typically less frequent and larger sets of data.
HIGH PERFORMANCE DISTRIBUTED COMPUTING
The HiPer-D Program originated as a joint Defense Advanced Research Projects Agency (DARPA)/Aegis (PMS-400) endeavor, with three participating technical organizations: APL, Naval Surface Warfare Center Dahlgren Division (NSWCDD), and Lockheed Martin Corporation. It was inspired by a future shipboard computing vision2 and the potential for synergy between it and a wave of emerging DARPA research products in distributed computing. The program commenced in June 1991 and has been an ongoing research and development effort designed to reduce risk due to the introduction of distributed computing technologies, methods, and tools into the Aegis Weapon System.

Figure 1. Navy family computers.
Many of the significant developments of HiPer-D have been showcased in demonstrations. Since 1994, when the first major HiPer-D demonstration was held, there have been seven demonstrations that provided exposure of concepts and techniques to Aegis and other interested communities. The first HiPer-D integrated demonstration (Fig. 3) focused primarily on the potential for applying massively parallel processor (MPP) technology to combat system applications. It brought together a collection of over 50 programs running on 34 processors (including 16 processors on the Intel Paragon MPP) as a cooperative effort of APL and NSWCDD. The functionality included a complete sensor-to-weapon path through the Aegis Weapon System, with doctrine-controlled automatic and semi-automatic system responses and an operator interface that provided essential tactical displays and decision interfaces. The major anomaly relative to the actual Aegis system was the use of core elements from CEC (rather than Aegis) to provide the primary sensor integration and tracking.
By many measures, this first demonstration was a significant success. Major portions of weapon system functionality had been repartitioned to execute on a large collection of processors, with many functions constructed to take advantage of replication for reliability and scaling. The Paragon MPP, however, suffered severe performance problems connecting with computers outside its MPP communication mesh. Key external network functions were on the order of 20 times slower than on Sun workstations of the same era. The resulting impact on track capacity (less than 100) drove later exploration away from the MPP approach.
Figure 2. Information originates from sensor views of each object. Each report is small, but the quantity can be very large (because of high sensor report rates and potentially large numbers of objects). The reports flow throughout to be integrated and modified, and to enable rapid reaction to change.

Figure 3. First HiPer-D demonstration physical/block diagram.

The next-generation demonstration had two significant differences: (1) it focused on the use of networked workstations to achieve the desired computing capacity, and (2) it strove for a stronger grounding in the existing Aegis requirements set. The primary impact of these differences was to switch from a CEC-based ship tracking approach to an Aegis-derived tracking approach. Additional fidelity was added to the doctrine processing, and the rapid self-defense response capability referred to as "auto-special" was added to provide a context in which to evaluate stringent end-to-end timing requirements. The 1995 HiPer-D demonstration3–5 represented a significant milestone in assessing the feasibility of moving commercial distributed computing capabilities into the Aegis mission-critical systems' domain. Track capacity was improved an order of magnitude over the original demonstration; a variety of replication techniques providing fault tolerance and scalability were exhibited; and critical Aegis end-to-end timelines were met.
Subsequent demonstrations in 1996 to 1998 built upon the success of the 1995 demonstration, adding increased functionality; extending system capacity; exploring new network technologies, processors, and operating systems; and providing an environment in which to examine design approaches that better matched the new distributed paradigm. Indeed, a fairly complete software architecture named "Amalthea"4 was created to explore the feasibility of application-transparent scalable, fault-tolerant distributed systems. It was inspired by the Delta-4 architecture done in the late 1980s by ESPRIT (European Strategic Programme for Research and Development in Information Technology)5 and was used to construct a set of experimental, reliable applications. In addition to architectural exploration, the application domain itself was addressed. Even though the top-level system specification continued to convey a valid view of system performance requirements, the decomposition of those top-level requirements into the Aegis element program performance specifications had been crafted 20 years earlier with a computing capacity and architecture vastly different from the current distributed environment.
The command and decision element, from which much of the HiPer-D functionality originated, seemed like a particularly fertile area for significant change. An example of this was the tendency of requirements to be specified in terms of a periodic review. In the case of doctrine-based review for automatic actions, the real need was to ensure that a track qualifying for action was identified within a certain amount of time, not necessarily that it be done periodically. In the HiPer-D architecture, it is simple to establish a client (receiver of track data) that will examine track reports for the doctrine-specified action upon data arrival. From a responsiveness standpoint, this far exceeds the performance of the periodic review specified in the element requirements and yet is a very straightforward approach for the high-capacity and independent computing environment of the distributed system.
The 1999 and 2000 demonstrations6,7 continued the pattern, bringing in new functionality and a significant new paradigm for scalable fault-tolerant servers that use new network switch technology (see next section). This phase of development also addressed a need for instrumenting the distributed system, which had been identified early in the program but met with a less-than-satisfactory solution.
Distributed systems apply the concept of "work in parallel," where each working unit has its own independent resources (processor and memory). Although this eases the contention between functions running in parallel, it elevates the complexity of test and analysis to a new level. Collection and integration of the information required to verify performance or diagnose problems require sophisticated data gathering daemons to be resident on each processor, and analysis capabilities that can combine the individual processor-extracted data streams must be created. Early in the HiPer-D effort, a German research product called "JEWEL"8 was found to provide such capabilities with displays that could be viewed in real time. The real-time monitoring/analysis capability of HiPer-D was not only one of its most prominent demonstration features, but some would claim also one of the most significant reasons that the annual integration of the complex systems was consistently completed on time. JEWEL, however, never achieved status as a supported commercial product and was somewhat cumbersome from the perspective of large system development. For example, it had an inflexible interface for reporting data, allowing only a limited number of integers to be reported on each call. Data other than integers could be reported, but only by convention with the ultimate data interpreter (graphing program).
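The instrumentation pattern described above (a low-overhead reporter resident with each process, plus an analyzer that combines the per-processor streams) can be sketched as follows. The record layout and function names are our invention for illustration; they are not JEWEL's or JEDSI's API:

```python
# Sketch of the instrumentation pattern: each node's reporter emits
# timestamped samples, and a central analyzer merges the per-node streams
# into one time-ordered stream.  Names and record layout are illustrative.
import heapq

def reporter(node, samples):
    """One node's stream: (timestamp, node, metric, value) records."""
    return [(t, node, metric, value) for t, metric, value in samples]

def merge_streams(*streams):
    """Combine per-node streams (each already time-sorted) for analysis."""
    return list(heapq.merge(*streams))

a = reporter("node-a", [(0.1, "latency_ms", 81), (0.3, "latency_ms", 79)])
b = reporter("node-b", [(0.2, "queue_len", 4)])
records = merge_streams(a, b)
for record in records:
    print(record)
```

A real-time version would stream these records over the network to live displays rather than batch-merging lists, but the merge-by-timestamp step is the same.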
Beginning in 1999, a real-time instrumentation toolkit called Java Enhanced Distributed System Instrumentation (JEDSI) was developed and employed as a replacement for JEWEL. It followed the JEWEL model of providing very low overhead extraction of data from the processes of interest but eliminated the interface issues that made it awkward for large-scale systems. It also capitalized on the use of commercially available Java-based graphics packages and communications support to provide an extensive library of real-time analysis and performance visualization tools.
Current and planned initiatives in HiPer-D are moving toward multi-unit operations. Functionality for integrating data from Navy tactical data links has been added to the system, and the integration of real CEC components is planned for late 2001. Such a capability will provide the necessary foundation to extend the exploration of distributed capabilities beyond the individual unit.
EVOLUTION OF COMMUNICATIONS IN HiPer-D
The first phase of HiPer-D, culminating in the system shown in Fig. 3, had a target computing environment that consisted of an MPP as the primary computing resource, surrounded by networked computers that modeled the sensor inputs and workstations that provided graphical operator interfaces. The MPP was an Intel Paragon (Fig. 4). It had a novel internal interconnect and an operating system (Mach) that had message passing as an integral function. Mach was the first widely used (and, for a brief period, commercially supported) operating system that was built as a "distributed operating system," intended from the outset to efficiently support the communication and coordination requirements of distributed computing.
Despite these impressive capabilities, an important principle had been established within the DARPA community that rendered even these capabilities incomplete for the objective of replication for fault tolerance and scalability. Consider the computing arrangement in Fig. 5. A simple service, the allocation of track numbers to clients who need to uniquely label tracks, is replicated for reliability. The replica listens to the requests and performs the same simple assignment algorithm, but does not reply to the requester as the primary server does. This simple arrangement can be problematic if the communications do not behave in specific ways. If one of the clients' requests is not seen by the backup (i.e., the message is not delivered), the backup copy will be offset 1 from the primary. If the order of multiple clients' requests is interchanged between the primary and backup servers, the backup server will have an erroneous record of numbers assigned to clients. Furthermore, at such time as the primary copy is declared to have failed, the backup will not know which client requests have been serviced and which have not.
Communication attributes required to achieve the desired behavior in this server example include reliable delivery, ordered delivery, and atomic membership change. Reliable delivery assures that a message gets to all destinations (or none at all) and addresses the problem of the servers getting "out of synch." Ordered delivery means that, within a defined group (specifying both members and messages), all messages defined as part of the group are received in the same order. This eliminates the problem that can occur when the primary and replica see messages in a different order. Atomic membership change means that, within a defined group, any member entering or leaving the group (including failure) is seen to do so at the same point of the message stream among all group members. In our simple server example above, this allows a new replica to be established with a proper understanding of how much "history" of previous primary activity must be supplied, and how much can be picked up from the current client request stream.
A convenient model for providing these communication attributes is "process group communications." In this model, the attributes above are guaranteed, i.e., all messages are delivered to all members or none, all messages are delivered to all group members in the same order, and any members that leave or join the group are seen to do so at the same point within the group's message stream. Early HiPer-D work used a toolkit called the Isis Distributed Toolkit9 that served as the foundation for process group communications. Initially developed at Cornell University, it was introduced as a commercial product by a small company named Isis Distributed Systems.

Figure 4. Paragon.
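Why these guarantees keep replicas consistent can be shown with a toy model. The sketch below is not the Isis API; it simulates reliable, totally ordered delivery inside a single process to show that a track number server and its shadow stay in lockstep when every request reaches both, in the same order:

```python
# Minimal illustration of process-group-style delivery: every message is
# delivered to every member, in one agreed order.  With those guarantees,
# a primary and its shadow apply the same assignment algorithm and remain
# identical.  (In-process toy model, not a networked implementation.)
class TrackNumberServer:
    def __init__(self):
        self.next_number = 1
        self.assigned = {}          # client -> list of numbers granted

    def handle(self, client):
        self.assigned.setdefault(client, []).append(self.next_number)
        self.next_number += 1

class Group:
    """Delivers every multicast to every member, in the same order."""
    def __init__(self, members):
        self.members = members

    def multicast(self, client_request):
        for member in self.members:   # same message, same order, to all
            member.handle(client_request)

primary, shadow = TrackNumberServer(), TrackNumberServer()
group = Group([primary, shadow])
for request in ["client-A", "client-B", "client-A"]:
    group.multicast(request)

# Because delivery is reliable and ordered, the shadow matches the
# primary exactly and could take over on failure.
assert shadow.next_number == primary.next_number == 4
assert shadow.assigned == primary.assigned
```

Dropping one request from the shadow's stream, or reordering two clients' requests, breaks both assertions, which is precisely the flawed case of Fig. 5.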
Figure 5. Simple replicated track number service operations.
The cornerstone element of the HiPer-D communications chain was an application called the Radar Track Server (RTS). Its task was to service a large set of (potentially replicated) clients, providing them with track data at their desired rate and staleness. Fed from the stream of track data emerging from the primary radar sensor and its subsequent track filter, the RTS allows each consumer of the track information to specify desired update rates for the tracks and further specify latency or staleness associated with the delivery of track information. This was a significant, new architectural strategy. In prior designs, track reports were either delivered en masse at the rate produced by the sensor or delivered on a periodic basis. (Usually, the same period was applied to all tracks and all users within the system, which meant a compromise of some sort.) These new server capabilities brought the ability to selectively apply varying performance characteristics (update rate and latency) based on each client's critical evaluation of each individual track.
The RTS matches a client's requested rate for a track by filtering out unnecessary reports that occur during the target report intervals (Fig. 6). For instance, if the sensor provides an update every second but a client requests updates every 2 s, the RTS will send only alternate updates. Of course, this is client- and track-specific. In the previous example, should a track of more extreme interest appear, the client could direct the RTS to report that specific track at the full 1-s report rate. (There was also successful experimentation in use of the RTS to feed back consumer needs to the sensor to enable optimization of sensor resources.) If a consumer's desired report rate is not an integral multiple of the sensor rate, the RTS maintains an "achieved report rate" and decides whether to send an update based on whether its delivery will make the achieved report rate closer to or further away from the requested report rate.
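The "achieved report rate" rule lends itself to a short sketch. The class below is our reconstruction of the decision logic described above, not RTS source code: forward a report only if sending it brings the client's achieved rate closer to the requested rate than dropping it would.

```python
# Reconstruction of the RTS thinning rule: for each sensor report, compare
# the achieved rate with and without this delivery, and forward only when
# delivery moves the achieved rate closer to the client's request.
class RateFilter:
    def __init__(self, requested_rate_hz):
        self.requested = requested_rate_hz
        self.sent = 0

    def offer(self, elapsed_s):
        """Called for each sensor report; returns True to forward it."""
        if elapsed_s <= 0:
            return False
        rate_if_sent = (self.sent + 1) / elapsed_s
        rate_if_dropped = self.sent / elapsed_s
        send = (abs(rate_if_sent - self.requested)
                < abs(rate_if_dropped - self.requested))
        if send:
            self.sent += 1
        return send

# 1 Hz sensor, client requests ~1/3 Hz: every third report is forwarded.
f = RateFilter(1 / 3)
decisions = [f.offer(t) for t in range(1, 10)]   # reports at t = 1..9 s
print(decisions)
```

For an integral ratio (e.g., a 2-s request against a 1-s sensor) this degenerates to the alternate-update behavior in the text; for non-integral ratios it converges on the requested rate over time.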
One lesson learned very early in the HiPer-D effort was that, in network communications, the capacity to deliver messages benefits immensely from buffering and delivery of collections of messages rather than individual transmission of each message. Of course, this introduces latency to the delivery of the messages that are forced to wait for others to fill up the buffer. The RTS uses a combination of buffering and timing mechanisms that allows the buffer to fill for a period not to exceed the client's latency specification. Upon reaching a full buffer condition or the specified maximum latency, the buffer, even if it contains only a single message, is then delivered to the client.
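The fill-or-timeout batching described above can be sketched as follows. Class and method names are our invention; the real RTS presumably ties the staleness check to its event loop rather than an explicit `tick` call:

```python
# Sketch of per-client batching: accumulate messages and flush when the
# buffer fills OR the oldest buffered message has waited as long as the
# client's latency bound -- whichever comes first.  (Reconstruction.)
class ClientBuffer:
    def __init__(self, capacity, max_latency_s, deliver):
        self.capacity, self.max_latency = capacity, max_latency_s
        self.deliver = deliver            # callback taking a list of messages
        self.buf, self.oldest = [], None

    def add(self, msg, now):
        if not self.buf:
            self.oldest = now             # start the staleness clock
        self.buf.append(msg)
        if len(self.buf) >= self.capacity:
            self.flush()

    def tick(self, now):
        """Timer hook: flush if the oldest buffered message is too stale."""
        if self.buf and now - self.oldest >= self.max_latency:
            self.flush()

    def flush(self):
        self.deliver(self.buf)            # even a single message is delivered
        self.buf, self.oldest = [], None

batches = []
cb = ClientBuffer(capacity=3, max_latency_s=0.1, deliver=batches.append)
cb.add("t1", now=0.00)
cb.add("t2", now=0.02)
cb.tick(now=0.05)                  # not stale yet -> no flush
cb.tick(now=0.12)                  # oldest is 0.12 s old -> flush, though not full
print(batches)                     # [['t1', 't2']]
```

The capacity bound caps per-delivery cost while the latency bound caps staleness, which is the trade the text describes.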
Figure 6. Radar track server (clients A, B, and C each specify a track set, rate, and latency; the RTS filters out unneeded reports into per-client buffers and drives a time-out of the maximum wait time on each client buffer).
The criticality of the RTS to the system and the anticipated load from the large number of clients led to its immediate identification as a prime candidate for applying replication to ensure reliability and scalability to high load levels. All these factors made the RTS a rich subject for exploration of process group communications. Figure 7 depicts the basic set of process group communications–based interactions among a set of cooperating servers providing track data to a set of clients. The process group communications delivery properties (combined with the particular server design) allowed the server function to be replicated for both improved capacity and reliability. It also allowed clients to be replicated for either or both purposes.
The first Paragon-centered demonstration included the use of the process group communications paradigm and the RTS. Even though the internal Paragon communications mesh proved to be very capable, the system as a whole suffered from a severe bottleneck in moving track data into and out of the Paragon. The 16-node Paragon mesh was shown capable of 8 pairs of simultaneous communications streams of 1000 messages a second (with no buffering!). Compared to the roughly 300 messages per second being achieved on desktops of that period, the combination of Mach messaging features and Paragon mesh was impressive. However, as described earlier, the Paragon's Ethernet-based external connectivity was flawed. Apparently, in the niche of Paragon customers, Ethernet interface performance was not of substantial concern. Serving the "supercomputer market," the Paragon was more attuned to high-performance parallel input devices (primarily disk) and high-end graphics engines that had HiPPIs (High-Performance Parallel Interfaces).
The migration from Paragon MPP hardware, with its mesh-based communications, to the network-based architecture of subsequent demonstrations proved to be smooth for process group communications. The elimination of poor-performance Paragon Ethernet input/output allowed the network-based architecture to achieve a realistic capacity of 600 tracks while meeting subsecond timelines for high-profile, time-critical functions. The RTS scalable design was able to accommodate 9 distinct clients (with different track interests and report characteristics), some of them replicated, for a 14-client total.
Although the network architecture and process group communications were major successes, the Isis Distributed Toolkit mentioned earlier was encountering real-world business practices. When Isis Distributed Systems was acquired by Stratus Computer, the event was initially hailed as a positive maturation of the technology into a solid product that could be even better marketed and supported. However, in 1997, Stratus pulled Isis from the market and terminated support.
This was unsettling. Even though there was no need to panic (the existing Isis capabilities and licenses would continue to serve the immediate needs), the demise of Isis signaled the indifference of the commercial world to the type of distributed computing that Isis supported (and was found to be so useful). APL eventually employed another research product, "Spread," developed by Dr. Yair Amir of The Johns Hopkins University.10 It provides much of the same process group communications support present in Isis, is still used by HiPer-D today, and has just recently been "productized" by Spread Concepts LLC, Bethesda, MD. The absence of any large market for commercial process group communications, however, is an important reminder that, although systems that maximize the use of commercial products are pursued, the requirements of the Navy's deployed combat systems and those of the commercial world will not always coincide.
In 1998, an interesting new capability in network switches, combined with some ingenuity, provided a new approach for developing fault-tolerant and scalable server applications. Recent advances in network switch technology allowed for fast, application-aware switching (OSI Level 4). With an ability to sense whether a server application was "alive," a smart switch could intelligently balance incoming client connections among multiple servers and route new connections away from failed ones.
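The switch behavior described here amounts to liveness-aware round-robin connection assignment. A toy software model of that policy (real Level 4 switches implement this in the device itself, and the class names below are illustrative) might look like:

```python
# Toy model of Level 4 switch routing: new client connections rotate
# across servers, skipping any that fail the liveness check.
class Server:
    def __init__(self, name):
        self.name, self.alive = name, True

class L4Switch:
    def __init__(self, servers):
        self.servers = servers
        self.rr = 0                     # round-robin cursor

    def route(self):
        """Pick a live server for a new client connection."""
        for _ in range(len(self.servers)):
            server = self.servers[self.rr % len(self.servers)]
            self.rr += 1
            if server.alive:
                return server
        raise RuntimeError("no live servers")

s1, s2, s3 = Server("RTS1"), Server("RTS2"), Server("RTS3")
switch = L4Switch([s1, s2, s3])
before = [switch.route().name for _ in range(3)]
s2.alive = False                        # server failure detected
after = [switch.route().name for _ in range(3)]
print(before, after)   # ['RTS1', 'RTS2', 'RTS3'] ['RTS1', 'RTS3', 'RTS1']
```

Once a connection is assigned, it stays pinned to its server; only new connections are steered, which is why the client-side recovery step described below is still needed after a failure.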
The RTS was chosen as a proving ground for this new technology. Because of its complexity, the RTS was the one remaining HiPer-D function that employed Isis. It also had the potential to benefit substantially (performance-wise) by eliminating the burden of inter-RTS coordination. The inter-RTS coordination used features of Isis to provide a mechanism for exchange of state information (e.g., clients and their requested track characteristics) between server replicas. This coordination added complexity to the RTS and the underlying communications and consumed processing time, network bandwidth, and memory resources.
The switch-based RTS continues to support the major functionality of the original RTS but does so with a much simpler design and smaller source code base. The guiding principle in the design of the new RTS was simplicity. Not only would this prove easier to implement, it would also provide fewer places to introduce latency, thus improving performance. Each RTS replica operates independently of all others. In fact, the servers do not share any state information among them and are oblivious to the existence of any other servers. The implementation of a design that does not require knowledge of other servers was a dramatic change from the original RTS design, which had required extensive coordination among the servers to ensure that all clients were being properly served, particularly under the aberrant conditions of a client or server failure.
Figure 7. Basic RTS operations (different groups and message content).

Figure 8. Switch-based RTS block diagram.

Clients of the switch-based RTS server set view it as a single virtual entity (Fig. 8). Each time a new client requests service from the "virtual RTS," the Level 4 switch selects one of the RTS servers to handle the request. The selection is currently based on a rotating assignment policy but could be based on other loading measures that are fed to the switch. Once a client is established with a particular server, that server handles all its needs. In the event of server failure, a client-side library function establishes connection with a new server and restores all the previously established services. In addition to replication of the server, the Level 4 switch itself can be replicated to provide redundant paths to the servers. This accommodates failure of the switch itself as well as severance of a network cable.
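Because the servers share no state, the client-side recovery library carries the burden of restoring service: it remembers every registration and replays it after reconnecting through the virtual address. The class, the `register` call, and the connection stub below are assumptions for illustration, not the HiPer-D library's API:

```python
# Sketch of client-side failover for stateless server replicas: remember
# each track-service registration and replay it on a fresh connection
# after a server failure.  (Illustrative names, not the real library.)
class ResilientRTSClient:
    def __init__(self, connect):
        self.connect = connect        # opens a connection via the virtual RTS
        self.requests = []            # (track_set, rate_s, latency_s) history
        self.conn = connect()

    def request_tracks(self, track_set, rate_s, latency_s):
        self.requests.append((track_set, rate_s, latency_s))
        self.conn.register(track_set, rate_s, latency_s)

    def on_failure(self):
        """Reconnect (the switch picks a live server) and restore services."""
        self.conn = self.connect()
        for req in self.requests:
            self.conn.register(*req)

class StubConn:
    """Stand-in for a server connection; records registrations it receives."""
    def __init__(self, log):
        self.log = log
    def register(self, *req):
        self.log.append(req)

log = []
client = ResilientRTSClient(lambda: StubConn(log))
client.request_tracks("1-100", rate_s=2.0, latency_s=0.1)
client.on_failure()               # the replacement server sees the same request
print(log)
```

Keeping the recovery logic in the client is what lets each server replica stay oblivious to its peers.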
Figure 9a is a time plot of a typical end-to-end response for the combat system demonstration application that uses the RTS. From sensor origination through track filtering and RTS delivery to the client, average latencies are approximately 80 ms. The 80-ms latency is a nominal level of performance that is intended for tracks and processing purposes where low latency is not of high concern. By tuning the processing functions and clients, latencies in the 20- to 30-ms range can be achieved but at the expense of capacity. This is a clear engineering trade-off situation, where the proper balance can be adjusted for a particular application (using command line parameters).

A server software failure was forced in the center of the run shown in Fig. 9a, with little perceptible impact. Figure 9b shows the spikes in maximum latency that occur as a result of physical disconnection of the server from the switch. Physical reconfigurations, however, take 2 to 4 s to complete.

Figure 9. Nominal performance with recovery (a) and performance with cable pull diagram (b).
The switch-based RTS is a significant simplification, reducing the source code from 12,000 to 5,000 lines; however, two of its performance attributes are of potential concern to clients:

1. Delivery of all data to a client is not strictly guaranteed. When a server fails, messages that transpire between the server failure and the client's reestablishment with a new server are not eligible for delivery to that client. However, when the client reconnects, it will immediately receive the freshly arriving track updates. In the case of periodically reported radar data, this is sufficient for most needs.

2. Failures of the switch itself, the cabling, or server hardware incur a relatively large delay to recovery. This is a more serious potential problem for clients because the reestablishment of service takes multiple seconds. Initially, much of this time was used to detect the fault. A "heartbeat" message was added to the client-server protocol to support subsecond fault detection, but the mechanisms in the Level 4 switch for dismantling the failed connection and establishing a new one could not be accelerated without greater vendor assistance than has been available. This seemed to be a classic case of not representing a strong enough market to capture the vendor's attention. Had this been a higher-profile effort, with more potential sales on the line, it is our perception that this problem could have been addressed more satisfactorily.
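The heartbeat added in item 2 can be sketched as a simple timeout monitor on the client side. The 0.5-s window below is an illustrative value, not the figure used in the demonstrations:

```python
# Sketch of client-side heartbeat fault detection: the server is declared
# failed when no heartbeat has arrived within the timeout window.
class HeartbeatMonitor:
    def __init__(self, timeout_s=0.5):          # illustrative subsecond window
        self.timeout = timeout_s
        self.last_beat = None

    def beat(self, now):
        """Record a heartbeat received from the server."""
        self.last_beat = now

    def server_failed(self, now):
        """True once the silence exceeds the timeout."""
        return (self.last_beat is not None
                and now - self.last_beat > self.timeout)

hb = HeartbeatMonitor(timeout_s=0.5)
hb.beat(now=0.0)
hb.beat(now=0.2)
still_ok = hb.server_failed(now=0.4)   # 0.2 s of silence -> healthy
failed = hb.server_failed(now=0.9)     # 0.7 s of silence -> declare failure
print(still_ok, failed)
```

As the text notes, fast detection is only half the recovery time; the switch still has to tear down the dead connection and establish a new one, which the client cannot accelerate.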
The use of Level 4 network switch technology provides two major advantages. First, server implementation is much simpler. The new RTS architecture is similar in design to a replicated commercial Internet server. The complex server-to-server coordination protocols required to manage server and client entries, exits, and failures have been eliminated. Second, the communications layer used between the server and its clients is the same "socket" service used by major Internet components like browsers and Web servers. The use of standard commercial communications greatly reduces the dependency of the software system on specific products. Therefore, porting the software to new (both processor and network) environments will be straightforward, lowering the life-cycle and hardware upgrade costs.
CONCLUSIONForemostamongtheaccomplishmentsoftheHiPer-D
Programisthelonglistofproof-of-conceptdemonstra-tionstotheAegissponsoranddesignagent,LockheedMartin:
• The sufficiency of distributed computing for Aegis requirements
• The ability to replicate services for reliability and scalability
• The ability to employ heterogeneous computing environments (e.g., Sun and PC)
• The feasibility and advantages of a track data server
• A single track numbering scheme and server for allocating/maintaining track numbers
• The feasibility of standard network approaches like Ethernet, FDDI (Fiber Distributed Data Interface), and ATM (asynchronous transfer mode)
• Strategies for control of the complex distributed system and reconfiguration for reliability or scaling
• The value of real-time instrumentation
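One of the demonstrations above, a single server for allocating and maintaining track numbers, can be sketched in miniature as follows. The class name, number range, and free-list reuse policy are purely illustrative assumptions, not the HiPer-D design:

```python
import threading

class TrackNumberServer:
    """Toy sketch of a centralized track-number allocator.

    A single authority hands out unique numbers and reclaims released
    ones, so every node in the distributed system sees one consistent
    numbering scheme.
    """
    def __init__(self, start=1, limit=4096):
        self._lock = threading.Lock()   # serialize concurrent requests
        self._next = start
        self._limit = limit
        self._free = []                 # released numbers, reused first

    def allocate(self):
        with self._lock:
            if self._free:
                return self._free.pop()
            if self._next >= self._limit:
                raise RuntimeError("track number space exhausted")
            n = self._next
            self._next += 1
            return n

    def release(self, n):
        with self._lock:
            self._free.append(n)
```

Centralizing allocation behind one lock (or, in a distributed setting, one service) is what guarantees that two sensors never label different tracks with the same number.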
The relationship between these accomplishments and the production Aegis baselines is primarily indirect. While it would be improper to claim "responsibility" for any of the advances that the Lockheed Martin engineering team has been able to infuse into ongoing Aegis developments, it is clear that the HiPer-D efforts have emboldened the program to advance rapidly. Currently, three major Aegis baselines are under development, each with increasingly higher reliance on distributed computing technology. The most recent, referred to as Aegis Open Architecture, will fully capitalize on its distributed architecture, involving a complete re-architecting of the system for a commercial-off-the-shelf distributed computing infrastructure.
In the specific area of communications, HiPer-D has continued to follow the advance of technology in its pursuit of services that can effectively meet Aegis tactical requirements. Robust solutions with satisfyingly high levels of performance have been demonstrated. The problem remains, however, that the life of commercial products is somewhat fragile. The bottom-line need for profit in commercial offerings constrains the set of research products that reach the commercial market and puts their longevity in question. The commercial world, particularly the computing industry, is quite dynamic. Significant products with apparently promising futures and industry support can arrive and disappear in a very short time. An interesting example of this is FDDI network technology. Arriving in the early 1990s as an alternative to Ethernet that elevated capacity to the "next level" (100 Mbit), and garnering the support of virtually all computer manufacturers, it has all but disappeared. However, it is this same fast-paced change that is enabling new system concepts and architectural approaches to be realized.
The challenge, well recognized by Lockheed Martin and integrated into its strategy for Aegis Open Architecture, is to build the complex application system on a structured foundation that isolates the application from the dynamic computing and communication environment.
REFERENCES

1 Wander, K. R., and Sleight, T. P., "Definition of New Navy Large Scale Computer," in JHU/APL Developments in Science and Technology, DST-9, Laurel, MD (1981).
2 Zitzman, L. H., Falatko, S. M., and Papach, J. L., "Combat System Architecture Concepts for Future Combat Systems," Naval Eng. J. 102(3), 43–62 (May 1990).
3 HiPer-D Testbed 1 Experiment Demonstration, F2D-95-3-27, JHU/APL, Laurel, MD (16 May 1995).
4 Amalthea Software Architecture, F2D-93-3-064, JHU/APL, Laurel, MD (10 Nov 1993).
5 Powell, D. (ed.), Delta-4: A Generic Architecture for Dependable Distributed Computing, ESPRIT Project 818/2252, Springer-Verlag (1991).
6 High Performance Distributed Computing Program 1999 Demonstration Report, ADS-01-007, JHU/APL, Laurel, MD (Mar 2001).
7 High Performance Distributed Computing Program (HiPer-D) 2000 Demonstration Report, ADS-01-020, JHU/APL, Laurel, MD (Feb 2001).
8 Gergeleit, M., Klemm, L., et al., Jewel CDRS (Data Collection and Reduction System) Architectural Design, Arbeitspapiere der GMD, No. 707 (Nov 1992).
9 Birman, K. P., and Van Renesse, R. (eds.), Reliable Distributed Computing with the Isis Toolkit, IEEE Computer Society Press, Los Alamitos, CA (Jun 1994).
10 Stanton, J. R., A Users Guide to Spread Version 0.1, Center for Networking and Distributed Systems, The Johns Hopkins University, Baltimore, MD (Nov 2000).
THE AUTHORS
MARK E. SCHMID received a B.S. from the University of Rochester in 1978 and an M.S. from the University of Maryland in 1984, both in electrical engineering. He joined APL in 1978 in ADSD's Advanced Systems Development Group, where he is currently an Assistant Group Supervisor. Mr. Schmid's technical pursuits have focused on the application of emergent computer hardware and software technology to the complex environment of surface Navy real-time systems. In addition to the HiPer-D effort, his work includes pursuit of solutions to Naval and Joint Force system interoperability problems. His e-mail address is mark.schmid@jhuapl.edu.
DOUGLAS G. CROWE is a member of the APL Senior Professional Staff and Supervisor of the A2D-003 Section in ADSD's Advanced Systems Development Group. He has B.S. degrees in computer science and electrical engineering, both from The Pennsylvania State University, as well as an M.A.S. from The Johns Hopkins University. Mr. Crowe has been the lead for groups working on the development of automatic test equipment for airborne avionics, training simulators, and air defense C3I systems. He joined APL in 1996 and is currently the lead for APL's [email protected].