
VM Introspection for Cognitive Immunity (VICI)

Komoku, Inc.
Tuesday 18 December 2007

Talk: Tim Fraser [email protected]
Demo: Matt Evenson [email protected]

Agenda

1. Project status update.
2. New repair strategies.
3. New control architecture.
4. Summary and conclusions.

Copyright (C) 2007 Komoku, Inc.

The VICI approach

VICI XEN KERNEL

1. Run diagnostics

2. Attempt repair

3. Evaluate repair

4. Learn


VICI detects kernel-modifying rootkits and repairs the infected kernel.

GOAL

1. Self-diagnosis:
• < 50% false negative rate
• < 10% false positive rate

2. Self-healing:
• Repair within 250ms of infection.

3. Cognitive immunity:
• Learn from repeated attacks:
• Escalate to optimize response time.
• De-escalate to reduce harm.

Project timeline, goals, and progress

Phase 1 prototype: Basic diagnostics and repairs.

Phase 2 prototype: Add advanced repairs, Brooks-style control architecture, learning for (de)escalation.

Phase 3 (final) prototype: Increase Surgical layer coverage for Red Team exercises.

Timeline: Q1 to Q6, with dates marked at Jun 07, Dec 07, and Jun 08.


Progress towards goals


On schedule. Deliverables as proposed.
Some insight, experience gained. More expected from Red Team exercise.

GOAL

1. Self-diagnosis: < 50% false negative rate, < 10% false positive rate.
2. Self-healing: Repair within 250ms of infection.
3. Cognitive immunity: Learn from repeated attacks: escalate to optimize response time, de-escalate to reduce harm.

STATUS

o Five effective strategies.
o Additional one discarded.
o CP, Reboot effective but slow.
o Need to increase coverage of most basic “Surgical” strategy.
o Need to see how much we can cram into 250ms.
o Escalation, De-escalation works.
o Ready for testing.

What’s new?

VICI XEN KERNEL

1. Run diagnostics

2. Attempt repair

3. Evaluate repair

4. Learn

New repair strategies: Core War, Hitman, Checkpoint, Reboot.

New control architecture to map diagnoses to repairs.

Agent learns current threat sophistication level and adjusts how it chooses responses.


Learning the present threat level

• Agent gets “angry” when repairs fail repeatedly.

• Angry Agent switches to more extreme repair strategies.

• Extreme repairs may defeat clever rootkits, but they may also destroy useful kernel state ( == cost).

• Successful repairs make Agent calm down, back down from extreme repairs.

• This escalation and de-escalation makes the Agent learn and adjust to the current level of attack sophistication.

VICI Agent Repair strategy VM kernel


:-) Surgical

:-| Core War

:-( Hitman

>:-( Checkpoint

>:-O Reboot

Part 2: new repair strategies


Surgical repair on basic Ttysnoop

[Figure: user app, system call vector, rootkit, and kernel text, shown infected and then repaired by the surgical repair.]

Surgical repair is simple and does not cause collateral damage.


Core War on Ttysnoop w/snoopd

[Figure: user app, system call vector, rootkit, and kernel text, shown infected, after an ineffective surgical repair, and repaired by Core War.]

Core War repair leaves bad control flow but renders rootkit harmless.


How Core War works

[Figure: the sys_read entry in the system call table points to Ttysnoop's fake sys_read(), which calls the real sys_read(), prints the data if it is a password, and returns to the caller.]

1. Core War drops in code to jump to the real function at the top of the fake routine.
• The same two-instruction code snippet works for everyone.
• Leave the stack the same, jump to the real function’s start address.

2. Core War writes NOPs from that point down to the beginning of the stack cleanup and return code.
• Only threads that already went through the rootkit before the repair return through these NOPs.
• Threads that arrive after the repair jump to the real function and never return to the rootkit.
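The two Core War steps can be illustrated on a byte buffer standing in for the fake routine. This is only a sketch under stated assumptions: the slide does not give the actual two-instruction snippet, so the `mov eax, imm32` / `jmp eax` pair below is a hypothetical choice for 32-bit x86 that assumes `eax` may be clobbered at function entry. The function name `core_war_patch` and the `cleanup_offset` parameter are also illustrative, not taken from VICI.

```python
import struct

NOP = 0x90  # x86 single-byte no-op


def core_war_patch(code: bytearray, real_addr: int, cleanup_offset: int) -> None:
    """Neuter a fake routine whose bytes are held in `code`.

    real_addr: 32-bit start address of the genuine function.
    cleanup_offset: offset at which the routine's stack cleanup and
    return code begins (assumed to be known from prior analysis).
    """
    # Hypothetical snippet: mov eax, imm32 (B8 imm32) ; jmp eax (FF E0).
    # Neither instruction touches the stack, so the real function sees
    # exactly the frame the original caller set up.
    snippet = b"\xB8" + struct.pack("<I", real_addr) + b"\xFF\xE0"
    if cleanup_offset < len(snippet) or cleanup_offset > len(code):
        raise ValueError("routine too short to patch")

    # Step 1: new arrivals jump straight to the real function.
    code[:len(snippet)] = snippet

    # Step 2: NOP-fill down to the cleanup code, so threads already
    # inside the rootkit slide through to the return sequence.
    for i in range(len(snippet), cleanup_offset):
        code[i] = NOP
```

The cleanup and return bytes are left untouched on purpose: threads that entered the rootkit before the repair still need them to unwind correctly.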

Hitman on Ttysnoop w/strongd

[Figure: user app, system call vector, rootkit, and kernel text, shown infected, after an ineffective Core War repair, and repaired by Hitman combined with Core War.]

Hitman repair kills the rootkit kernel threads that defeat other repairs.


How Hitman works

I. Identify rootkit start and end addrs
II. For each process: top of per-process kernel stack
III. Kill processes

System call table: 0xc7891011, 0xc4560004, 0xd00d0bad, 0xc1230080

Top of per-process kernel stack: 0x56780000, 0xd00d1234, 0x00001234, 0x91011121

This could be a stored return address. Write invalid instruction here to kill process.

Ttysnoop start: 0xd00d0000 end: 0xd00e0000

If rootkit not in modules list, use 4KB page that contains bad address for start and end.

Plan: Lay mines on path used by rootkit helpers, not on path used by good processes.

ttysnoop: fake read helper routine
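The address-range check behind steps I and II can be sketched with the example values from the slide. The function names, the process IDs in the usage note, and the treatment of the range as half-open are assumptions made for illustration. Writing the invalid instruction (step III) is left out because it needs access to guest memory.

```python
PAGE_SIZE = 0x1000  # 4KB page, used when the rootkit is not in the module list


def page_bounds(bad_addr):
    """Fallback range: the 4KB page that contains the bad address."""
    start = bad_addr & ~(PAGE_SIZE - 1)
    return start, start + PAGE_SIZE


def suspect_return_addrs(stack_words, start, end):
    """Stack words that fall inside the rootkit's address range.

    Each hit could be a stored return address into rootkit code.
    """
    return [w for w in stack_words if start <= w < end]


def plan_kills(stacks_by_pid, start, end):
    """Map each process to the rootkit addresses found on its kernel stack."""
    plan = {}
    for pid, words in stacks_by_pid.items():
        hits = suspect_return_addrs(words, start, end)
        if hits:
            plan[pid] = hits
    return plan
```

With the slide's stack words and the Ttysnoop range 0xd00d0000 to 0xd00e0000, only 0xd00d1234 is flagged. A process whose stack holds no address in that range is left alone, which is the point of laying mines only on paths the rootkit helpers use.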

Checkpoint and Reboot repairs

[Timeline: reboot, then checkpoints 1, 2, 3, with infection times X, Y, Z.]

Problem: Xen takes ~6 seconds to restore a CP.

Need more complex control to avoid attacks that prevent progress?

Typical case: Attack at time Z. VICI restores CP 3. Some loss of state.

Possible stealthy case? Infect at Y using some stealthy method VICI misses. Remains dormant until Z, VICI now detects. VICI restores CP 3, 2, 1 to reach uninfected CP.

Worst case: Infect at X, dormant until Z. Need to reboot. Massive state loss.
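The three cases above amount to walking back through the checkpoints, newest first, until one passes diagnostics. The sketch below assumes a hypothetical `is_clean` callback standing in for a restore followed by a diagnostic run, and uses the ~6 second restore time from the slide to estimate the cost. It does not call any Xen interface.

```python
RESTORE_SECONDS = 6  # approximate Xen restore time quoted on the slide


def roll_back(checkpoints, is_clean):
    """Try checkpoints newest first; `checkpoints` is ordered oldest first.

    Returns (checkpoint or None, checkpoints tried, estimated seconds).
    A result of None means every checkpoint was infected and the
    caller has to fall back to a reboot.
    """
    tried = []
    for cp in reversed(checkpoints):
        tried.append(cp)
        if is_clean(cp):
            return cp, tried, len(tried) * RESTORE_SECONDS
    return None, tried, len(tried) * RESTORE_SECONDS
```

In the stealthy case, where checkpoints 2 and 3 are already infected, the walk costs three restores before reaching checkpoint 1, which shows how quickly the per-restore delay adds up.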


Part 3: new control scheme


Brooks control scheme for robots

Level 0: avoid collisions

Code: Sonar -> Variable: Distance measurements -> Code: Be scared of nearby objects -> Variable: Direction to flee in -> Code: Motor controller

Key insight: the world is its own best representation.

Brooks development method:

1. Start with an initial level for the simplest behavior.

2. Test robot in real world until you get it right.

3. Add more levels. Life-like behavior emerges from composition of levels.

Brooks control scheme for robots

• Higher levels can read, overwrite lower levels’ variables to use, modify their behavior.

• Lower levels cannot know about higher levels.

Level 0: avoid collisions

Code: Sonar -> Variable: Distance measurements -> Code: Be scared of nearby objects -> Variable: Direction to flee in -> Code: Motor controller

Level 1: explore

Code: Pick a random direction -> Variable: Direction to wander in -> Code: Combine wander with object avoidance -> Variable: Direction to travel in
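The layering can be made concrete with a toy version of the two robot levels. Everything below is an illustrative assumption rather than Brooks's code: headings are degrees, `scare_radius` is an invented parameter, and "combine wander with object avoidance" is simplified to letting avoidance win whenever level 0 has set a flee direction.

```python
def level0_flee(distances, scare_radius=1.0):
    """Level 0: turn sonar distances into a direction to flee in.

    `distances` maps a sonar heading in degrees to a measured distance.
    Returns None when nothing is close enough to be scary.
    """
    near = {h: d for h, d in distances.items() if d < scare_radius}
    if not near:
        return None
    closest = min(near, key=near.get)
    return (closest + 180) % 360  # head directly away from the closest object


def level1_travel(wander_heading, flee_heading):
    """Level 1: reads level 0's variable and combines it with wandering."""
    return flee_heading if flee_heading is not None else wander_heading
```

Level 0 never refers to level 1, so it still works if the higher level is removed. Level 1 only reads the variable that level 0 produces.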


Brooks control scheme for VICI

• Higher levels can read, overwrite lower levels’ variables to use, modify their behavior.

• Lower levels cannot know about higher levels.

Level 0: surgical repair

Code (Diagnostic): hash, value comparisons -> Variable: Lists of tampered tables, text, … -> Code (Control): if it’s bad, it needs fixing -> Variable: Lists of tables, text to fix -> Code (Repair): write back good values

Level 1: core war

Code (Diagnostic): Identify individual bad pointers -> Variable: List of bad function pointers -> Code (Control): On repeated lvl 0 failure, do Core War -> Variable: Rootkit functions to neuter -> Code (Repair): Neuter rootkit code


Escalation and De-escalation

• Core War repair runs when Surgical repair fails once.
• “Fails once” = Surgical detects a problem on two consecutive cycles.
• Hitman follows Core War, then Checkpoint, then Reboot.

• Escalation optimizes response for time when faced with repeated attack.
• De-escalation backs down from expensive repairs when cheap ones work again.

[Figure: two timelines of repair cycles, each with rows labeled HITMAN, CORE WAR, and SURGICAL.]

Escalation = Immediate Hitman 10X. ~3 secs delay avoided. (In demo, Agent sleeps to make this.)

De-escalation = After 10 of these… drop down to 10 of these, so long as it works.


Screenshot from demo


Scrolling display tracks VICI Agent’s “anger” level as Agent runs.

Red bars are cycles where VICI detected attacks.

Green bars are cycles where VICI detected no attacks.

Bar height indicates anger level.

VICI layers = directed acyclic graph


ktables ktext mtext registers entropy packet

Surgical 1

Core War 2

Hitman 3

Checkpoint 4

Reboot 5

Part 4: Summary and conclusion


Insights, experience so far


1. The 250ms time bound limits what you can do and how you can do it.
• Komoku Monitoring Engine’s scripting language too slow, checks too numerous.
• Solution: VICI Agent entirely C-based, fewer checks.

2. Xen source code availability is critical for research; otherwise not best choice.
• Checkpoint and restore is slow.
• Can’t checkpoint HVM machines without killing VM.
• Perhaps better: small custom hypervisor
- No fancy inter-domain communication interface
- No general-purpose OS in domain 0.

3. Brooks architecture aids incremental development as advertised, but…
• discourages use of strong interfaces and abstraction for complexity control if followed literally.

Tasks completed and remaining


Phase 1 (Goal: basic diagnosis & repair.)
Tasks: Surgical repairs: ktables, ktext, entropy, …
Malware for tests: Rootsim

Phase 2 (Goal: alternate repairs and learning.)
Tasks: Non-surgical repairs: Core War, Hitman, Checkpoint, Reboot
Tasks: Control architecture
Tasks: Learning
Malware for tests: Ttysnoop with snoopd and strongd

Phase 3 (Goal: meet SRS2 requirements.)
Tasks: Increase Coverage
Red Team Exercises

Summary of accomplishments

• Demonstrated automated detection:

+ Effective against 6 categories of attack derived from real-world rootkits and current research.

- 250ms limit is apt to limit coverage.

• Demonstrated surgical, core war, hitman, checkpoint, reboot repairs:

+ Provides effective self-healing in our tests.

- Checkpoint, reboot repairs take too long (~6 seconds).

• Demonstrated control scheme for escalation and de-escalation:

+ Needs no complex internal representation of what a rootkit is.

+ Agent learns, reacts to current threat sophistication level.


Extra slides


What is a kernel-modifying rootkit?

[Figure: user apps, jump table, rootkit, kernel text, frequently changing kernel data, and registers.]

• Adversaries install kernel-modifying rootkits after they have gained full administrative control over a machine.

• The rootkit makes the kernel lie, hiding the adversary’s presence from the real admins.
  • Hide processes, files.
  • Some rootkits also provide backdoors, TTY sniffers.

• How do rootkits modify the kernel’s behavior?
  • Replace jump table function pointers with pointers to rootkit code.
  • Modify kernel text (instructions).
  • Modify other kernel data structures (example: process table links).
  • Modify CPU registers.


Surgical Repair

[Figure: user apps, jump table, rootkit, kernel text, frequently changing kernel data, and registers. The Diagnostic column shows two MD5 hash checks; the Repair column shows three overwrites.]

Surgical repair essentially writes back proper values. Our coverage is presently poor.
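A minimal sketch of the hash-then-overwrite idea follows. It assumes guest memory is available as a `bytearray` and that a known-good snapshot was taken earlier. The region names, offsets, and function names are hypothetical, and this is not VICI's C implementation.

```python
import hashlib


def snapshot(memory, regions):
    """Record known-good bytes and an MD5 digest per region.

    `regions` maps a name to an (offset, length) pair.
    """
    good = {}
    for name, (off, length) in regions.items():
        data = bytes(memory[off:off + length])
        good[name] = (off, data, hashlib.md5(data).digest())
    return good


def diagnose(memory, good):
    """Diagnostic: names of regions whose current MD5 differs from the snapshot."""
    return [
        name
        for name, (off, data, digest) in good.items()
        if hashlib.md5(bytes(memory[off:off + len(data)])).digest() != digest
    ]


def surgical_repair(memory, good, tampered):
    """Repair: write the known-good bytes back over each tampered region."""
    for name in tampered:
        off, data, _ = good[name]
        memory[off:off + len(data)] = data
```

A typical cycle would call `diagnose`, pass the result to `surgical_repair`, and run `diagnose` again on the next cycle to evaluate whether the repair held.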

Learning in the Brooks architecture

Level 1: core war

Code (Diagnostic): Identify individual bad pointers -> Variable: List of bad function pointers -> Code (Control): On repeated lvl 0 failure, do Core War -> Variable: Rootkit functions to neuter -> Code (Repair): Neuter rootkit code

Control state (feedback can change these):
angry = 3
threshold = 1
delta = 1

The algorithm is fixed:
on level 0 failure: angry += delta
on level 0 success: angry = 0
on angry >= threshold: do repair

Wiring is fixed, too.

Each level has its own separate feedback function. There is no global feedback function.
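The fixed per-level algorithm is small enough to write out directly. The sketch below follows the three rules on the slide. The class name `Level`, the method name `feedback`, and the starting value of `angry` are assumptions.

```python
class Level:
    """Control state and feedback rule for one repair level."""

    def __init__(self, name, threshold=1, delta=1):
        self.name = name
        self.angry = 0  # assumed starting value; the slide shows a snapshot of 3
        self.threshold = threshold
        self.delta = delta

    def feedback(self, lower_failed):
        """Apply one cycle of feedback; return True if this level should repair.

        lower_failed: True when the level below failed on this cycle.
        """
        if lower_failed:
            self.angry += self.delta   # on failure: angry += delta
        else:
            self.angry = 0             # on success: angry = 0
        return self.angry >= self.threshold
```

With `threshold = 1` and `delta = 1`, a level fires on the first failure of the level below it. Raising `threshold` would make that level slower to escalate, and each level keeps its own state rather than sharing a global one.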


Assumptions

1. In a real deployment:
A. The Domain 0 OS would be hardened. Ours isn’t.
B. Xen would be hardened. Ours isn’t.

(Actually, a less featureful custom hypervisor without a general-purpose Domain 0 OS would probably be better than Xen + Debian GNU/Linux.)

2. In a real product, VICI would learn what a healthy kernel looks like by examining installation media or some non-deployed gold-standard healthy kernel. (Useful in a product but not interesting code for research.)

Instead, we assume a grace period after boot during which we can snapshot the virtualized kernel in a known-good state.

3. User-mode rootkits aren’t interesting anymore. We care only about kernel-modifying rootkits.

4. An adversary can easily gain administrative control of the victim OS.


What’s a rootkit and what’s not

Rootkits make persistent modifications to the kernel in order to allow the adversary to maintain a clandestine presence on the system for days, weeks, or months.

A rootkit must have at least some useful functionality: hiding processes, files, modules, or sniffing TTYs.

It must modify the kernel’s responses to all requests for relevant services made by all processes, with the possible exception of a small set of processes operated exclusively by the adversary. Alternately, in the case of TTY sniffers, it must monitor the requests rather than modify the responses.

It is easy to add and immediately remove a kernel modification in order to avoid detection. However, that by itself is not sufficient to make a rootkit. A rootkit needs persistent modifications that operate synchronously with user requests, for example, to tamper with the results of the sys_read system call whenever any user process calls sys_read. Still, some clever rootkits make a very small set of persistent changes along strategic control-flow paths that allow them to set up and remove additional temporary changes.

A rootkit must have some means for remote control over the network (perhaps a backdoor) and/or a means for exfiltrating data over the network.