Computational Resiliency

14 Feb 2001 OASIS PI Meeting. Computational Resiliency. Steve J. Chapin, Susan Older (Syracuse University); Gregg Irvin (Mobium Enterprises).


Page 1: Computational Resiliency

14 Feb 2001 OASIS PI Meeting

Computational Resiliency

Steve J. Chapin, Susan Older
Syracuse University

Gregg Irvin
Mobium Enterprises

Page 2: Computational Resiliency

Recap: What is Computational Resiliency?

The ability to sustain application operation and dynamically restore the level of assurance during an attack.

Application-centric self defense, built on replication, migration, functionality mutation, and camouflage.

Page 3: Computational Resiliency

Computational Resiliency

[Diagram: Mission Critical Application -> Attack -> Result of Attack: Degraded Application trying to perform Mission Critical Function -> Computational Resiliency: Techniques applied to correct situation -> Degraded Application sufficiently Improved by Resiliency to perform Mission Critical Function]

Page 4: Computational Resiliency

Example of CRLib

[Network diagram. Labels: 16 2x Pentium (three times), 16 Alpha, Intel 8x SMP (twice), SGI Origin, 3Com Superstack 3300 (four times), Firewall, "The Net". Regions: “Safe Zone” (OASIS protection) and “The Wild” (limited protection).]

Page 5: Computational Resiliency

The Players

Rocky & Bullwinkle: our heroes, both air and ground forces.
Dudley: representative of allied power.
Boris & Natasha: Directed by shadowy figure (Fearless Leader). Mission: big trouble for Moose and Squirrel.
Snidely: attempting to disrupt Dudley’s jobs.

Page 6: Computational Resiliency

The Benign State

[Network diagram as on Page 4, annotated with: Dudley’s job (low priority), Bullwinkle’s job, Rocky’s job.]

Page 7: Computational Resiliency

The Attacks

[Network diagram as on Page 4.]

Snidely: blocked at firewall. Dudley does nothing.

Page 8: Computational Resiliency

The Attacks

[Network diagram as on Page 4.]

Natasha attacks Rocky; caught by IDS.

Page 9: Computational Resiliency

The Attacks

[Network diagram as on Page 4.]

Rocky’s job migrates back into safe zone; Dudley must give up resources.

Page 10: Computational Resiliency

The Attacks

[Network diagram as on Page 4.]

Boris attacks Bullwinkle’s job. Some attacks succeed.

Page 11: Computational Resiliency

The Attacks

[Network diagram as on Page 4.]

Bullwinkle’s job employs camouflage, decoys, and migration.

Page 12: Computational Resiliency

Multi-Faceted Approach

Strong theoretical basis
  reason about conformance to policy
Computational resiliency library
  dynamic application management
System software support
  scheduling/policy frameworks

Page 13: Computational Resiliency

Computational Resiliency Library

Group messaging
  group contains multiple nodes
  all nodes receive all messages to group
Replication/recovery with migration
  liveness check at synchronization points
  application readiness restored via node creation and migration
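
The group-messaging rule on this slide (every node in a group receives every message sent to the group) can be illustrated with a minimal sketch. The `Node` and `Group` classes and the `deliver` method are hypothetical names chosen for illustration; they are not the CRLib API, which the slides do not show.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica in a group (hypothetical type, not from CRLib)."""
    name: str
    inbox: list = field(default_factory=list)


@dataclass
class Group:
    """A set of replicated nodes that share one logical task."""
    name: str
    nodes: list = field(default_factory=list)

    def deliver(self, message):
        # Every node in the group receives every message sent to the group.
        for node in self.nodes:
            node.inbox.append(message)


group = Group("Group 1", [Node("a"), Node("b"), Node("c")])
group.deliver("step-1")
```

Because all replicas see the same message stream, any surviving node holds what it needs to continue the task if another replica is lost.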

Page 14: Computational Resiliency

Groups and Messaging

[Diagram: Group 1, Group 2, Group 3. Legend: node, channel.]

One group per cooperating task in a distributed computation.

Page 15: Computational Resiliency

Group Messaging Detail

[Diagram: Group 1, Group 2.]

In actuality, each member of Group 1 has a channel to each member of Group 2.
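
A small sketch of that full mesh, assuming channels can be modeled as (sender, receiver) pairs. The node names and the `channels` helper are made up for illustration.

```python
from itertools import product


def channels(senders, receivers):
    """Return one (sender, receiver) channel for every pair across two groups."""
    return list(product(senders, receivers))


group1 = ["g1-n0", "g1-n1", "g1-n2"]   # hypothetical node names
group2 = ["g2-n0", "g2-n1"]
links = channels(group1, group2)
```

With 3 nodes in one group and 2 in the other, the mesh contains 3 x 2 = 6 channels, so the channel count grows with the product of the two replication levels.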

Page 16: Computational Resiliency

Mapping of Nodes to Processors (channels not shown)

[Diagram labels: Group, Processor.]

Nodes of group mapped across processors
Multiple nodes as threads in a single process
One or more processes per processor
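
One way to spread the nodes of a group across processors is round-robin placement, sketched below. The slides do not say which placement policy CRLib uses, so `place_round_robin` and the node and processor names are assumptions made purely for illustration.

```python
def place_round_robin(nodes, processors):
    """Assign nodes to processors in turn so replicas land on different machines."""
    placement = {p: [] for p in processors}
    for i, node in enumerate(nodes):
        placement[processors[i % len(processors)]].append(node)
    return placement


placement = place_round_robin(["n0", "n1", "n2", "n3"], ["p0", "p1", "p2"])
```

Here `p0` hosts two nodes (which could run as threads in one process) while `p1` and `p2` host one each, so losing any single processor leaves at least two replicas alive.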

Page 17: Computational Resiliency

Periodic Liveness Check

Done at user-defined synchronization points in the computation
All group members send ping messages to all others in the same group
Local Group Leader (1 per group) elected (responsible for restoring intra-group replication level)
LGLs elect Global Group Leader (responsible for inter-group coordination)
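
The steps above can be sketched as follows. The slides do not state the election rule, so this sketch assumes the simplest one (the live member with the lowest identifier wins); `ping_round`, `elect_leader`, and the node identifiers are hypothetical.

```python
def ping_round(members, alive):
    """Each live member pings all others in its group.

    Returns, for every live member, the set of peers that answered.
    """
    return {
        m: {p for p in members if p != m and p in alive}
        for m in members
        if m in alive
    }


def elect_leader(candidates):
    """Assumed rule: the lowest identifier among the candidates wins."""
    return min(candidates)


groups = {
    "G1": ["G1.0", "G1.1", "G1.2"],
    "G2": ["G2.0", "G2.1", "G2.2"],
}
alive = {"G1.1", "G1.2", "G2.0", "G2.1", "G2.2"}  # G1.0 was taken out

lgls = {}
for name, members in groups.items():
    replies = ping_round(members, alive)
    lgls[name] = elect_leader(replies.keys())  # one Local Group Leader per group

ggl = elect_leader(lgls.values())  # LGLs elect the Global Group Leader
```

In this run the failed node `G1.0` never takes part, so `G1.1` becomes the leader of its group and also the Global Group Leader.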

Page 18: Computational Resiliency

Periodic Liveness Check II

LGLs determine local status by fiat, restore replication level, and report to GGL
  create new threads via cloning LGL
  consensus option is in place but currently unused
GGL reports results of LGL actions to other LGLs.
LGL and GGL return to normal duty
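
Restoring the replication level by cloning the LGL might look like the sketch below. The `Replica` type, the `restore_replication` function, and the use of a deep copy to stand in for thread cloning are assumptions for illustration only.

```python
import copy
from dataclasses import dataclass


@dataclass
class Replica:
    """A group member with its application state (hypothetical type)."""
    ident: int
    state: dict


def restore_replication(live, target, leader):
    """Clone the leader's state until the group is back at `target` replicas."""
    restored = list(live)
    next_id = max(r.ident for r in restored) + 1
    while len(restored) < target:
        # The new replica starts from a private copy of the leader's state.
        restored.append(Replica(next_id, copy.deepcopy(leader.state)))
        next_id += 1
    return restored


live = [Replica(0, {"step": 7}), Replica(2, {"step": 7})]  # replica 1 was lost
restored = restore_replication(live, target=3, leader=live[0])
```

Because the leader decides by fiat, no agreement round is needed before cloning; the trade-off is that a compromised leader would propagate its own state.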

Page 19: Computational Resiliency

Simple Application

Page 20: Computational Resiliency

Simple Application After Process Taken Out by Attacker

Page 21: Computational Resiliency

Application After Second Processor Lost

Page 22: Computational Resiliency

Current Issues

Exploring through in-house red teaming and modeling
Efficiency of basic mechanisms
  multiplicative communication load
  additive computation load
Efficacy of basic mechanisms
  Window of attack between liveness checks
  Attack during liveness check
  agreement algorithms
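
A rough cost model for the two efficiency items, assuming every replica in the sending group sends over its channel to every replica in the receiving group (consistent with the full mesh described on Page 15) and every replica repeats the same computation. The function names and the numbers are illustrative, not measurements.

```python
def physical_messages(r_src, r_dst, logical_messages=1):
    """Communication load: one logical message fans out to r_src * r_dst sends."""
    return r_src * r_dst * logical_messages


def total_compute(work_per_node, replication):
    """Computation load: each added replica adds one more copy of the work."""
    return work_per_node * replication


msgs = physical_messages(r_src=3, r_dst=3, logical_messages=10)
work = total_compute(work_per_node=100, replication=3)
```

Under these assumptions, tripling replication on both sides multiplies message traffic by nine but computation only by three, which is why communication is the first efficiency concern.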

Page 23: Computational Resiliency

Next Steps

Additional policy choices
  agreement protocols
  replication/recovery methods
  message passing schemes
  [Brace annotation: not necessarily orthogonal choices]
Tool for user policy expression
  state-dependent policy specified via “chinese menu” approach
  logical predicates, state transitions
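
A state-dependent policy built from logical predicates and state transitions could be written as a small table, as in this sketch. The state names, observation fields, and action names are all invented for illustration; the slides do not define a policy language.

```python
# Each state maps to an ordered list of (predicate, action, next_state) rules.
POLICY = {
    "benign": [
        (lambda obs: obs["failed_nodes"] > 0, "restore_replication", "under_attack"),
    ],
    "under_attack": [
        (lambda obs: obs["in_wild"], "migrate_to_safe_zone", "under_attack"),
        (lambda obs: obs["failed_nodes"] == 0, "none", "benign"),
    ],
}


def step(state, obs):
    """Fire the first rule whose predicate holds; otherwise stay in place."""
    for predicate, action, next_state in POLICY[state]:
        if predicate(obs):
            return action, next_state
    return "none", state


first = step("benign", {"failed_nodes": 1, "in_wild": False})
second = step("under_attack", {"failed_nodes": 0, "in_wild": True})
third = step("benign", {"failed_nodes": 0, "in_wild": False})
```

A menu-style tool would let the user pick predicates and actions from fixed lists and assemble a table like `POLICY` without writing code.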

Page 24: Computational Resiliency

Next Steps

-calculus-based formal model for core library behavior
Split/merge for groups
  all nodes in a group must be identical
  basis for load balancing, functionality mutation
First demo at summer PI meeting, 2001

Page 25: Computational Resiliency

Schedule

Timeline: 6/00, 12/00, 6/01, 12/01, 6/02, 12/02, 6/03, 12/03

Tasks: Basic -calc; Formal equivalence; Policy/Protocol Analysis; Basic CRLib

Page 26: Computational Resiliency

Schedule II

Timeline: 6/00, 12/00, 6/01, 12/01, 6/02, 12/02, 6/03, 12/03

Tasks: Funct. Mut.; Policy Frameworks; Camouflage; Schedulers; Hard. Apps.; Integration; Demos

Page 27: Computational Resiliency

Open Issues

Cost/benefit analysis of CR
  how much protection do we provide if the attacker knows what we’re trying to do?
  How much is performance affected by message load, active replication, etc.
Potential integration with other OASIS
  complementary with system-hardening technology (e.g., Dependable Intrusion Tolerance)