VM Introspection for Cognitive Immunity (VICI)
Komoku, Inc.
Tuesday 18 December 2007
Talk: Tim Fraser [email protected]: Matt Evenson [email protected]
Agenda
1. Project status update.
2. New repair strategies.
3. New control architecture.
4. Summary and conclusions.
Copyright (C) 2007 Komoku, Inc. 2
The VICI approach
[Figure: VICI, Xen, and kernel. VICI repeats the following cycle:]
1. Run diagnostics
2. Attempt repair
3. Evaluate repair
4. Learn
VICI detects kernel-modifying rootkits and repairs the infected kernel.
GOAL
1. Self-diagnosis:
< 50% false negative rate
< 10% false positive rate
2. Self-healing:
Repair within 250ms of infection.
3. Cognitive immunity:
Learn from repeated attacks:
Escalate to optimize response time.
De-escalate to reduce harm.
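The self-diagnosis goal is stated as two error rates. As a reminder of how such rates are usually computed, here is a minimal sketch using the standard definitions: false negative rate = missed attacks / all attacks, and false positive rate = false alarms / all clean cycles. The function names and any counts passed in are illustrative assumptions, not project data.

```python
FNR_GOAL = 0.50  # goal from the slide: < 50% false negative rate
FPR_GOAL = 0.10  # goal from the slide: < 10% false positive rate


def false_negative_rate(missed_attacks: int, detected_attacks: int) -> float:
    """Fraction of real attacks that the diagnostics missed."""
    total = missed_attacks + detected_attacks
    return missed_attacks / total if total else 0.0


def false_positive_rate(false_alarms: int, clean_passes: int) -> float:
    """Fraction of clean cycles that were wrongly flagged as attacks."""
    total = false_alarms + clean_passes
    return false_alarms / total if total else 0.0


def meets_diagnosis_goal(missed: int, detected: int,
                         false_alarms: int, clean_passes: int) -> bool:
    """True when both rates are strictly below the stated goals."""
    return (false_negative_rate(missed, detected) < FNR_GOAL
            and false_positive_rate(false_alarms, clean_passes) < FPR_GOAL)
```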
Project timeline, goals, and progress
Phase 1 prototype: Basic diagnostics and repairs.
Phase 2 prototype: Add advanced repairs, Brooks-style control architecture, learning for (de)escalation.
Phase 3 (final) prototype: Increase Surgical layer coverage for Red Team exercises.
[Timeline axis: Q1 to Q6, with date marks at Jun 07, Dec 07, and Jun 08.]
Progress towards goals
On schedule. Deliverables as proposed.
Some insight, experience gained. More expected from Red Team exercise.
GOAL
1. Self-diagnosis: < 50% false negative rate, < 10% false positive rate.
2. Self-healing: Repair within 250ms of infection.
3. Cognitive immunity: Learn from repeated attacks: Escalate to optimize response time. De-escalate to reduce harm.
STATUS
o Five effective strategies.
o Additional one discarded.
o CP, Reboot effective but slow.
o Need to increase coverage of most basic “Surgical” strategy.
o Need to see how much we can cram into 250ms.
o Escalation, De-escalation works.
o Ready for testing.
What’s new?
[Figure: VICI, Xen, and kernel, annotated with what is new.]
1. Run diagnostics
2. Attempt repair
3. Evaluate repair
4. Learn
New repair strategies: Core War, Hitman, Checkpoint, Reboot.
New control architecture to map diagnoses to repairs.
Agent learns current threat sophistication level and adjusts how it chooses responses.
Learning the present threat level
• Agent gets “angry” when repairs fail repeatedly.
• Angry Agent switches to more extreme repair strategies.
• Extreme repairs may defeat clever rootkits, but they may also destroy useful kernel state (that is the cost).
• Successful repairs make the Agent calm down and back down from extreme repairs.
• This escalation and de-escalation lets the Agent learn and adjust to the current level of attack sophistication.
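The bullets above can be sketched as a walk up and down a ladder of repair strategies. The ordering (Surgical, Core War, Hitman, Checkpoint, Reboot) comes from the deck; moving exactly one rung per cycle in either direction is a simplifying assumption made for this sketch, and the names `LADDER`, `next_level`, and `run` are hypothetical.

```python
# Repair strategies from mildest to most extreme, as ordered in the deck.
LADDER = ["Surgical", "Core War", "Hitman", "Checkpoint", "Reboot"]


def next_level(level: int, repair_succeeded: bool) -> int:
    """Calm down one rung after a success, get angrier one rung after a failure.

    The one-rung step size is an assumption of this sketch.
    """
    if repair_succeeded:
        return max(level - 1, 0)
    return min(level + 1, len(LADDER) - 1)


def run(outcomes):
    """Return the strategy used on each cycle for a sequence of repair outcomes."""
    level = 0
    used = []
    for succeeded in outcomes:
        used.append(LADDER[level])
        level = next_level(level, succeeded)
    return used
```

For example, two failed repairs followed by two successes climb to Hitman and then back down through Core War.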
[Figure columns: VICI Agent, Repair strategy, VM kernel. Agent mood and matching repair strategy, from calm to angriest:]
:-) Surgical
:-| Core War
:-( Hitman
>:-( Checkpoint
>:-O Reboot
Part 2: new repair strategies
Surgical repair on basic Ttysnoop
[Figure: user app, system call vector, rootkit, and kernel text, shown infected and then repaired by the surgical repair.]
Surgical repair is simple and does not cause collateral damage.
Core War on Ttysnoop w/snoopd
[Figure: user app, system call vector, rootkit, and kernel text in three states: infected; surgical repair ineffective; repaired by core war.]
Core War repair leaves bad control flow but renders rootkit harmless.
How Core War works
[Figure: the system call table's sys_read entry points to Ttysnoop's fake sys_read(), which calls the real sys_read(), prints if it sees a password, and returns to the caller.]
1. Core War drops in code to jump to the real function at the top of the fake routine.
• Same two-instruction code snippet works for everyone:
• Leave stack the same, jump to the real function’s start address.
2. Core War writes NOPs from that point down to the beginning of the stack cleanup and return code.
• Only threads that already went through the rootkit before the repair return through these NOPs.
• Threads that arrive after the repair jump to the real function and never return to the rootkit.
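The two steps above can be illustrated on a byte buffer standing in for kernel memory. The slide does not say which two instructions Core War uses; this sketch assumes the 32-bit x86 pair `push imm32` / `ret`, which transfers control to the real function and leaves the stack pointer where it started. The function name, parameters, and offsets are hypothetical.

```python
import struct

PUSH_IMM32 = 0x68  # x86 opcode: push a 32-bit immediate
RET = 0xC3         # x86 opcode: ret (pops the pushed address and jumps to it)
NOP = 0x90         # x86 opcode: nop


def core_war_patch(memory: bytearray, fake_start: int,
                   cleanup_start: int, real_addr: int) -> None:
    """Neuter a rootkit's fake routine in place.

    fake_start:    offset of the first byte of the fake routine
    cleanup_start: offset where its stack cleanup and return code begins
    real_addr:     32-bit address of the real kernel function
    """
    # Step 1: two-instruction snippet at the top of the fake routine
    # (assumed encoding; the deck does not name the instructions).
    jump = bytes([PUSH_IMM32]) + struct.pack("<I", real_addr) + bytes([RET])
    if fake_start + len(jump) > cleanup_start:
        raise ValueError("fake routine too short to hold the jump snippet")
    memory[fake_start:fake_start + len(jump)] = jump

    # Step 2: NOPs from there down to the cleanup and return code, so that
    # threads already inside the rootkit fall through to a clean return.
    for offset in range(fake_start + len(jump), cleanup_start):
        memory[offset] = NOP
```

New callers hit the snippet and never execute the rootkit body; the untouched cleanup code at `cleanup_start` still serves threads that entered before the patch.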
Hitman on Ttysnoop w/strongd
[Figure: user app, system call vector, rootkit, and kernel text in three states: infected; core war repair ineffective; repaired by hitman and core war.]
Hitman repair kills the rootkit kernel threads that defeat other repairs.
How Hitman works
I. Identify rootkit start and end addrs
System call table: 0xc7891011, 0xc4560004, 0xd00d0bad, 0xc1230080
Ttysnoop start: 0xd00d0000 end: 0xd00e0000
If rootkit not in modules list, use 4KB page that contains bad address for start and end.
II. For each process
Top of per-process kernel stack: 0x56780000, 0xd00d1234, 0x00001234, 0x91011121
III. Kill processes
This could be a stored return address. Write invalid instruction here to kill process.
[Figure label: ttysnoop: fake read helper routine]
Plan: Lay mines on path used by rootkit helpers, not on path used by good processes.
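Steps I and II can be sketched with the example values from the slide. This is only an illustration of the address arithmetic: `rootkit_range` and `find_mines` are hypothetical names, the module list is modeled as plain tuples, and actually writing an invalid instruction into guest memory (step III) is left out.

```python
PAGE_SIZE = 4096  # the 4KB page used as a fallback range


def rootkit_range(bad_addr, modules):
    """Return (start, end) of the region that contains bad_addr.

    modules: list of (name, start, end) tuples standing in for the kernel's
    module list. If no module contains the address, fall back to the 4KB
    page around it, as the slide suggests.
    """
    for _name, start, end in modules:
        if start <= bad_addr < end:
            return start, end
    page = bad_addr & ~(PAGE_SIZE - 1)
    return page, page + PAGE_SIZE


def find_mines(stack_words, start, end):
    """Return stack words inside [start, end): possible stored return
    addresses that lead back into the rootkit."""
    return [word for word in stack_words if start <= word < end]


# Example values taken from the slide.
modules = [("ttysnoop", 0xD00D0000, 0xD00E0000)]
start, end = rootkit_range(0xD00D0BAD, modules)
stack = [0x56780000, 0xD00D1234, 0x00001234, 0x91011121]
targets = find_mines(stack, start, end)
```

With the slide's numbers, only 0xd00d1234 falls inside the Ttysnoop range, so that is the one location where a mine would be laid.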
Checkpoint and Reboot repairs
[Timeline: reboot, then checkpoints 1, 2, 3; attack times marked X, Y, Z.]
Problem: Xen takes ~6 seconds to restore a CP. Some loss of state. Need more complex control to avoid attacks that prevent progress?
Typical case: Attack at time Z. VICI restores CP 3.
Possible stealthy case? Infect at Y using some stealthy method VICI misses. Remains dormant until Z, VICI now detects. VICI restores CP 3, 2, 1 to reach uninfected CP.
Worst case: Infect at X, dormant until Z. Need to reboot. Massive state loss.
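The three cases above amount to trying checkpoints newest-first and rebooting when none is clean. A minimal sketch, assuming a hypothetical `is_infected` callback that stands for running the diagnostics against a restored checkpoint, and using the ~6 second restore time quoted on the slide as a fixed cost:

```python
RESTORE_SECONDS = 6  # approximate Xen restore time quoted on the slide


def roll_back(checkpoints, is_infected):
    """Try checkpoints newest-first; return (action, seconds spent restoring).

    checkpoints: checkpoint labels, oldest first, e.g. [1, 2, 3]
    is_infected: hypothetical callback; True if the restored VM still
                 fails diagnostics
    """
    spent = 0
    for cp in reversed(checkpoints):
        spent += RESTORE_SECONDS
        if not is_infected(cp):
            return f"restored CP {cp}", spent
    # No clean checkpoint left: the worst case on the slide.
    return "reboot", spent
```

In the typical case the first restore succeeds after about 6 seconds; in the stealthy case with checkpoints 2 and 3 infected, three restores are needed before CP 1 is reached.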
Part 3: new control scheme
Brooks control scheme for robots
[Diagram columns alternate: Code, Variable, Code, Variable, Code]
Level 0: avoid collisions
Sonar (code) -> Distance measurements (variable) -> Be scared of nearby objects (code) -> Direction to flee in (variable) -> Motor controller (code)
Key insight: the world is its own best representation.
Brooks development method:
1. Start with an initial level for the simplest behavior.
2. Test robot in real world until you get it right.
3. Add more levels. Life-like behavior emerges from composition of levels.
Brooks control scheme for robots
• Higher levels can read, overwrite lower levels’ variables to use, modify their behavior.
• Lower levels cannot know about higher levels.
[Diagram columns alternate: Code, Variable, Code, Variable, Code]
Level 0: avoid collisions
Sonar (code) -> Distance measurements (variable) -> Be scared of nearby objects (code) -> Direction to flee in (variable) -> Motor controller (code)
Level 1: explore
Pick a random direction (code) -> Direction to wander in (variable) -> Combine wander with object avoidance (code) -> Direction to travel in (variable)
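The two-level robot above can be sketched with shared variables that each level reads and writes. This is an illustration of the layering idea only, not Brooks's code: the `Robot` class, attribute names, headings in degrees, and the `too_close` threshold are all assumptions. Level 0 works on its own; level 1 reads level 0's flee direction and overwrites the travel direction only when there is nothing to flee from.

```python
import random


class Robot:
    """Shared variables; each level reads and writes plain attributes."""

    def __init__(self):
        self.distances = {}           # sonar: heading in degrees -> distance
        self.flee_direction = None    # written by level 0
        self.wander_direction = None  # written by level 1
        self.travel_direction = None  # what the motor controller follows


def level0_avoid(robot, too_close=1.0):
    """Level 0: flee from the nearest object that is too close, if any.

    Knows nothing about level 1.
    """
    near = {h: d for h, d in robot.distances.items() if d < too_close}
    if near:
        nearest = min(near, key=near.get)
        robot.flee_direction = (nearest + 180) % 360
    else:
        robot.flee_direction = None
    robot.travel_direction = robot.flee_direction


def level1_explore(robot, rng=random):
    """Level 1: wander, but let level 0's flee direction win when it is set."""
    robot.wander_direction = rng.randrange(360)
    if robot.flee_direction is None:
        robot.travel_direction = robot.wander_direction
```

Removing `level1_explore` leaves a robot that still avoids collisions, which is the incremental property the development method relies on.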
Brooks control scheme for VICI
• Higher levels can read, overwrite lower levels’ variables to use, modify their behavior.
• Lower levels cannot know about higher levels.
[Diagram columns alternate: Code, Variable, Code, Variable, Code]
Level 0: surgical repair
Diagnostic: hash, value comparisons (code) -> Lists of tampered tables, text, … (variable) -> Control: if it’s bad, it needs fixing (code) -> Lists of tables, text to fix (variable) -> Repair: write back good values (code)
Level 1: core war
Diagnostic: Identify individual bad pointers (code) -> List of bad function pointers (variable) -> Control: On repeated lvl 0 failure, do Core War (code) -> Rootkit functions to neuter (variable) -> Repair: Neuter rootkit code (code)
Escalation and De-escalation
• Core War repair runs when Surgical repair fails once.
• “Fails once” = Surgical detects a problem on two consecutive cycles.
• Hitman follows Core War, then Checkpoint, then Reboot.
• Escalation optimizes response for time when faced with repeated attack.
• De-escalation backs down from expensive repairs when cheap ones work again.
[Figure: two timelines, each with rows HITMAN, CORE WAR, SURGICAL.]
Escalation = Immediate Hitman 10X. (In demo, Agent sleeps to make this ~3 secs; delay avoided.)
De-escalation = After 10 of these… drop down to 10 of these, so long as it works.
Screenshot from demo
Scrolling display tracks VICI Agent’s “anger” level as Agent runs.
Red bars are cycles where VICI detected attacks.
Green bars are cycles where VICI detected no attacks.
Bar height indicates anger level.
VICI layers = directed acyclic graph
Diagnostics: ktables, ktext, mtext, registers, entropy, packet
Surgical 1
Core War 2
Hitman 3
Checkpoint 4
Reboot 5
Part 4: Summary and conclusion
Insights, experience so far
1. The 250ms time bound limits what you can do and how you can do it.
• Komoku Monitoring Engine’s scripting language too slow, checks too numerous.
• Solution: VICI Agent entirely C-based, fewer checks.
2. Xen source code availability is critical for research; otherwise not best choice.
• Checkpoint and restore is slow.
• Can’t checkpoint HVM machines without killing VM.
• Perhaps better: small custom hypervisor
- No fancy inter-domain communication interface
- No general-purpose OS in domain 0.
3. Brooks architecture aids incremental development as advertised, but…
• discourages use of strong interfaces and abstraction for complexity control if followed literally.
Tasks completed and remaining
Phase 1 (Goal: basic diagnosis & repair.)
Tasks: Surgical repairs: ktables, ktext, entropy, …
Malware for tests: Rootsim
Phase 2 (Goal: alternate repairs and learning.)
Tasks: Non-surgical repairs: Core War, Hitman, Checkpoint, Reboot; Control architecture; Learning
Malware for tests: Ttysnoop with snoopd and strongd
Phase 3 (Goal: meet SRS2 requirements.)
Tasks: Increase Coverage; Red Team Exercises
Summary of accomplishments
• Demonstrated automated detection:
+ Effective against 6 categories of attack derived from real-world rootkits and current research.
- 250ms limit is apt to limit coverage.
• Demonstrated surgical, core war, hitman, checkpoint, reboot repairs:
+ Provides effective self-healing in our tests.
- Checkpoint, reboot repairs take too long (~6 seconds).
• Demonstrated control scheme for escalation and de-escalation:
+ Needs no complex internal representation of what a rootkit is.
+ Agent learns, reacts to current threat sophistication level.
Extra slides
What is a kernel-modifying rootkit?
[Figure: user apps, jump table, rootkit, kernel text, frequently changing kernel data, registers.]
• Adversaries install kernel-modifying rootkits after they have gained full administrative control over a machine.
• The rootkit makes the kernel lie, hiding the adversary’s presence from the real admins.
- Hide processes, files.
- Some rootkits also provide backdoors, TTY sniffers.
• How do rootkits modify the kernel’s behavior?
- Replace jump table function pointers with pointers to rootkit code.
- Modify kernel text (instructions).
- Modify other kernel data structures (example: process table links).
- Modify CPU registers.
Surgical Repair
[Figure: user apps, jump table, rootkit, kernel text, frequently changing kernel data, registers. Diagnostic column: MD5 hash (two groups). Repair column: overwrite (three targets).]
Surgical repair essentially writes back proper values. Our coverage is presently poor.
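A minimal sketch of the hash-then-overwrite idea, using byte buffers in place of guest kernel memory. It assumes a known-good snapshot taken earlier (the deck's grace period after boot); the function names and region names are hypothetical, and reading or writing a real VM's memory through the hypervisor is not shown.

```python
import hashlib


def snapshot(regions):
    """Record a good copy and MD5 digest of each region (name -> bytearray)."""
    return {name: (bytes(data), hashlib.md5(data).hexdigest())
            for name, data in regions.items()}


def surgical_repair(regions, baseline):
    """Overwrite every region whose MD5 no longer matches the baseline.

    Returns the names of the regions that were repaired.
    """
    repaired = []
    for name, (good_copy, digest) in baseline.items():
        if hashlib.md5(regions[name]).hexdigest() != digest:
            regions[name][:] = good_copy  # write back the proper values
            repaired.append(name)
    return repaired
```

A second call right after a repair returns an empty list, which is the kind of signal an evaluate-repair step could use to judge that the repair held.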
Learning in the Brooks architecture
[Diagram columns alternate: Code, Variable, Code, Variable, Code]
Level 1: core war
Diagnostic: Identify individual bad pointers (code) -> List of bad function pointers (variable) -> Control: On repeated lvl 0 failure, do Core War (code) -> Rootkit functions to neuter (variable) -> Repair: Neuter rootkit code (code)
Control state (feedback can change these):
angry = 3
threshold = 1
delta = 1
The algorithm is fixed:
on level 0 failure: angry += delta
on level 0 success: angry = 0
on angry >= threshold: do repair
Wiring is fixed, too.
Each level has its own separate feedback function. There is no global feedback function.
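The fixed algorithm on this slide translates almost line for line into code. The sketch below keeps the slide's variable names (`angry`, `threshold`, `delta`); the class and method names are hypothetical, starting from `angry = 0` is an assumption, and the per-level feedback function that may adjust `threshold` and `delta` is not modeled because the deck does not describe it.

```python
class LevelControl:
    """Control state for one level, for example Level 1: core war."""

    def __init__(self, threshold=1, delta=1):
        self.angry = 0              # starting calm is an assumption
        self.threshold = threshold  # feedback may change this
        self.delta = delta          # feedback may change this

    def observe_lower_level(self, succeeded: bool) -> bool:
        """Update anger from the level below; return True if this level
        should run its repair on this cycle."""
        if succeeded:
            self.angry = 0            # on level 0 success: angry = 0
        else:
            self.angry += self.delta  # on level 0 failure: angry += delta
        return self.angry >= self.threshold
```

Because each level owns its own instance, a higher level can use a larger `threshold` so that it only fires after more consecutive failures of the level below.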
Assumptions
1. In a real deployment:
A. The Domain 0 OS would be hardened. Ours isn’t.
B. Xen would be hardened. Ours isn’t.
(Actually, a less featureful custom hypervisor without a general-purpose Domain 0 OS would probably be better than Xen + Debian GNU/Linux.)
2. In a real product, VICI would learn what a healthy kernel looks like by examining installation media or some non-deployed gold-standard healthy kernel. (Useful in a product but not interesting code for research.)
Instead, we assume a grace period after boot during which we can snapshot the virtualized kernel in a known-good state.
3. User-mode rootkits aren’t interesting anymore. We care only about kernel-modifying rootkits.
4. An adversary can easily gain administrative control of the victim OS.
What’s a rootkit and what’s not
Rootkits make persistent modifications to the kernel in order to allow the adversary to maintain a clandestine presence on the system for days, weeks, or months.
A rootkit must have at least some useful functionality: hiding processes, files, modules, or sniffing TTYs.
It must modify the kernel’s responses to all requests for relevant services made by all processes, with the possible exception of a small set of processes operated exclusively by the adversary. Alternately, in the case of TTY sniffers, it must monitor the requests rather than modify the responses.
It is easy to add and immediately remove a kernel modification in order to avoid detection. However, that by itself is not sufficient to make a rootkit. A rootkit needs persistent modifications that operate synchronously with user requests, for example, to tamper with the results of the sys_read system call whenever any user process calls sys_read. Still, some clever rootkits make a very small set of persistent changes along strategic control-flow paths that allow them to set up and remove additional temporary changes.
A rootkit must have some means for remote control over the network (perhaps a backdoor) and/or a means for exfiltrating data over the network.