April 29, 2006 DynAMOS -- SMTPS '06 1

On-the-Fly Kernel Updates for High-Performance Computing Clusters

Kristis Makris, Arizona State University

Kyung Dong Ryu, IBM T.J. Watson Research Center

Motivation

• Updating the kernel in high-performance computing (HPC) clusters requires downtime
  • Revenue loss in pay-per-use, time-sharing clusters
  • Disruption of long-lived parallel tasks
  • Process migration may not be possible
• Postponing updates has its price
  • Unpatched kernel security holes
  • Missed kernel specialization opportunities
  • Adaptive selection of which kernel subsystem to use; virtualization cannot help
• Parallel computing needs
  • Safe, unobtrusive updates (no system restart)
  • Temporary, reversible specialization of some nodes
  • A portable updating system (i386 + PowerPC)

Solution: Dynamic Kernel Updates

Approaches

• Adaptable OS
  • Specially crafted, like K42, VINO, Synthetix
  • Require OS and application restructuring
• Dynamic code instrumentation
  • Zero kernel source modification (KernInst, GILK)
  • Basic block code interposition
  • Currently limited:
    • No procedure replacement
    • No autonomous kernel adaptability
    • No safe, complete subsystem update guarantees

Dynamic Updates Classification

Updating changes in:

• Userspace requirements
  • A security fix breaks existing applications that rely on the defect
• Kernel external requirements
  • Function signature changes (API changes)
• Kernel internal requirements
  • Global variables used by a function group (e.g., enlarging the copy buffer used in pipefs)

Updating needs:

• State tracking
  • Enlarge the copy buffer only for 2 processes: must adaptively enlarge the buffer and use newer functions
• State transfer
  • Copy data from the old buffer to the new one
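The state-transfer step can be sketched in user-space C. This is a minimal sketch under stated assumptions: `struct copy_buf` and `enlarge_copy_buf` are hypothetical names, the buffer is modeled as a simple linear byte array rather than the actual pipefs structures, and a kernel version would need the kernel allocator and locking around the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-buffer state; NOT the real pipefs structures. */
struct copy_buf {
    char  *data;
    size_t size;  /* capacity in bytes */
    size_t len;   /* bytes currently buffered */
};

/* State transfer: allocate a larger buffer, copy the pending bytes from
 * the old buffer into it, then release the old storage.
 * Returns 0 on success; returns -1 (leaving the old buffer untouched)
 * if the new size cannot hold the buffered data or allocation fails. */
static int enlarge_copy_buf(struct copy_buf *b, size_t new_size)
{
    if (new_size < b->len)
        return -1;                    /* would lose buffered data */
    char *fresh = malloc(new_size);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, b->data, b->len);   /* copy data from old buffer to new */
    free(b->data);
    b->data = fresh;
    b->size = new_size;
    return 0;
}
```

For the state-tracking half, the updated functions that know about the larger capacity would only be selected for the processes that need them, while other processes keep using the original functions.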

Dynamic Update Types

• No safe update point needed
  • Update a read-only global variable (e.g., the maximum number of open files)
  • Add a new variable used only by a single function
• Safe update point needed
  • Update the uid of an inode (guarded by a semaphore)
  • Add a new variable used by a function group (must update atomically)
• Non-quiescent resources
  • Update the kernel scheduler to use a different policy
• Datatype updates
  • Update functions that use the old datatype to use the new datatype
  • Maintain a shadow data structure that holds only the new fields, and update only the functions that use the new fields
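A minimal sketch of the shadow-data-structure approach, assuming a small fixed-size table searched linearly. `struct item_v1`, `struct item_shadow`, `shadow_of`, `get_new_flags`, and the added field `new_flags` are all hypothetical. Existing objects keep their original layout; only updated functions look up the shadow record that carries the new field.

```c
#include <stddef.h>

/* Original datatype, left unchanged in memory (hypothetical). */
struct item_v1 {
    unsigned long id;
    unsigned int  uid;
};

/* Shadow record holding only the field(s) added by the update. */
struct item_shadow {
    const struct item_v1 *owner;  /* object this shadow extends */
    unsigned int new_flags;       /* hypothetical new field */
};

#define MAX_SHADOWS 64
static struct item_shadow shadows[MAX_SHADOWS];
static size_t nshadows;

/* Find the shadow for obj, creating it lazily on first use.
 * Returns NULL when the table is full. */
static struct item_shadow *shadow_of(const struct item_v1 *obj)
{
    for (size_t i = 0; i < nshadows; i++)
        if (shadows[i].owner == obj)
            return &shadows[i];
    if (nshadows == MAX_SHADOWS)
        return NULL;
    shadows[nshadows].owner = obj;
    shadows[nshadows].new_flags = 0;  /* default for objects created before the update */
    return &shadows[nshadows++];
}

/* Only updated functions read the new field; unmodified functions keep
 * using struct item_v1 exactly as before. */
static unsigned int get_new_flags(const struct item_v1 *obj)
{
    struct item_shadow *s = shadow_of(obj);
    return s ? s->new_flags : 0;
}
```

A real implementation would presumably use a hash table, synchronize concurrent lookups, and release a shadow record when its object is freed.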

DynAMOS System Architecture

• Distribute updates to cluster nodes
• Process updating requests from the control station with the framework
• Prepare updates to be applied
• Coordinate safe activation/removal
• Currently implemented for i386 uniprocessor Linux kernels 2.2-2.6

Execution Flow Redirection (1)

• Install a trampoline at the beginning of the original function
  • Disable local processor interrupts
  • Flush the I-cache
  • Use an indirect jump (jmp *)
  • Don’t modify page permissions
• Divert execution to a redirection handler
• The original function can no longer be directly executed
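The trampoline itself is only a few bytes. The sketch below assumes the i386 memory-indirect form `jmp *addr32` (opcode `0xFF`, ModR/M `0x25`), whose operand is the address of a slot holding the redirection handler's address; the slide does not say which indirect form DynAMOS emits. It patches a plain byte array standing in for the start of the original function, and `encode_indirect_jmp` and `install_trampoline` are hypothetical names. Interrupt disabling and the I-cache flush are only noted in a comment.

```c
#include <stdint.h>
#include <string.h>

#define TRAMPOLINE_LEN 6

/* Encode the i386 instruction "jmp *(slot_addr)": opcode 0xFF with
 * ModR/M 0x25 (/4, absolute disp32), followed by the little-endian
 * 32-bit address of a memory slot holding the handler's address. */
static void encode_indirect_jmp(uint8_t out[TRAMPOLINE_LEN], uint32_t slot_addr)
{
    out[0] = 0xFF;
    out[1] = 0x25;
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(slot_addr >> (8 * i));
}

/* Simulated installation into a byte image of the original function:
 * save the bytes about to be overwritten (so the update can be undone),
 * then write the trampoline. In a kernel this would run with local
 * interrupts disabled and be followed by an I-cache flush. */
static void install_trampoline(uint8_t *func, uint8_t saved[TRAMPOLINE_LEN],
                               uint32_t slot_addr)
{
    uint8_t tramp[TRAMPOLINE_LEN];
    encode_indirect_jmp(tramp, slot_addr);
    memcpy(saved, func, TRAMPOLINE_LEN);
    memcpy(func, tramp, TRAMPOLINE_LEN);
}
```

The overwritten bytes must cover whole instructions; for example, the common prologue `push %ebp; mov %esp,%ebp; sub $8,%esp` occupies exactly 6 bytes.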

Execution Flow Redirection (2)

• Create a separate redirection handler for each function
  • Customized from a template
• Clone and relocate the original function image
• Choose between active function versions with an adaptation handler
  • Can execute different versions of a function in different process contexts
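The dispatch logic of a redirection handler and an adaptation handler can be modeled with function pointers. This is an illustration under assumptions only: real redirection handlers are machine-code stubs customized from a template, and a kernel would consult the current process instead of taking a `current_pid` argument. The function names, the doubling behavior of the updated version, and the PID-set policy are hypothetical; the policy mirrors the idea of running an updated function for only a few processes.

```c
#include <stddef.h>

typedef int (*func_version)(int arg);

/* Two active versions of the same (hypothetical) function. */
static int func_v1(int arg) { return arg; }      /* relocated clone of the original */
static int func_v2(int arg) { return arg * 2; }  /* updated version */

/* Hypothetical adaptation policy: only these processes use v2. */
static const int v2_pids[] = { 101, 202 };

/* Adaptation handler: decide which version the given process runs. */
static func_version adaptation_handler(int current_pid)
{
    for (size_t i = 0; i < sizeof v2_pids / sizeof v2_pids[0]; i++)
        if (v2_pids[i] == current_pid)
            return func_v2;
    return func_v1;
}

/* Redirection handler: the trampoline's target. It asks the
 * adaptation handler for a version and calls it. */
static int redirect_func(int current_pid, int arg)
{
    return adaptation_handler(current_pid)(arg);
}
```

Because the selection happens on every call, a process outside the policy set keeps running the original code even while other processes use the update.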

Function Cloning Benefits

• Unaltered stack when the newer function is executed
  • No processor state saved on the stack
• Autonomous kernel determination of update timeliness
  • Using the adaptation handler
• Function-level instrumented applications
  • Basic blocks can be bypassed
  • Modifications are developed as functions in the original source language

Function Relocation

• Adjust relative branch instructions
• Replace ret instructions with jumps back to the redirection handler
• Safely detect
  • Backward branches: point to code overwritten by the trampoline
  • Outbound branches: jump to code outside the function image
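Relocating a cloned function largely comes down to recomputing rel32 displacements. The sketch below assumes 32-bit x86 branches with rel32 operands and treats the first 6 bytes as the trampoline region; `classify_branch` and `relocate_disp` are hypothetical names. Branches whose targets stay inside the image move together with the code, so their displacement is unchanged; outbound branches are adjusted by `old_base - new_base` so they still reach the same absolute target. Rewriting `ret` instructions is not modeled here.

```c
#include <stdint.h>

#define TRAMPOLINE_LEN 6

enum branch_kind {
    BRANCH_INTERNAL,                /* target inside the image, past the trampoline */
    BRANCH_BACKWARD_INTO_TRAMPOLINE,/* target in the bytes the trampoline overwrote */
    BRANCH_OUTBOUND                 /* target outside the function image */
};

/* Classify a rel32 branch at offset `off` (instruction length `len`)
 * inside a function image of `size` bytes. */
static enum branch_kind classify_branch(uint32_t off, uint32_t len,
                                        int32_t disp, uint32_t size)
{
    int64_t target = (int64_t)off + len + disp;  /* offset from function start */
    if (target < 0 || target >= (int64_t)size)
        return BRANCH_OUTBOUND;
    if (target < TRAMPOLINE_LEN)
        return BRANCH_BACKWARD_INTO_TRAMPOLINE;
    return BRANCH_INTERNAL;
}

/* Displacement to write into the clone at new_base for a branch copied
 * from old_base. Targets inside the image (including the trampoline
 * region, which the clone holds an unmodified copy of) keep their
 * displacement; outbound targets are rebased to the same absolute address.
 * The classification is still available to the caller for flagging. */
static int32_t relocate_disp(uint32_t old_base, uint32_t new_base,
                             uint32_t off, uint32_t len,
                             int32_t disp, uint32_t size)
{
    if (classify_branch(off, len, disp, size) != BRANCH_OUTBOUND)
        return disp;
    return (int32_t)((int64_t)disp + (int64_t)old_base - (int64_t)new_base);
}
```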

Applying Security Patches

• Openwall hardening changes for Linux 2.4.22
• Permission check when writing to named pipes
  • Updated the open_namei function
  • No safe update point needed
• Permission check when following a symbolic link
  • Updated the open_namei and vfs_link functions
  • Had to update the inline function do_follow_link, used by link_path_walk
  • No need to update the functions atomically
• Confirmed unauthorized access was denied
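The slide does not list the checks themselves. As an assumption based on Openwall's published restricted-links and restricted-FIFOs hardening for sticky, world-writable directories such as /tmp, the policy can be sketched as two predicates over a simplified owner/mode model. `struct obj_attr`, the helper names, and the mode constants are hypothetical; privileged-user exemptions and the hard-link rule are deliberately omitted, so this is not the patch's actual code.

```c
#include <stdbool.h>

/* Simplified owner/mode model (hypothetical, not kernel structures). */
struct obj_attr {
    unsigned int uid;
    unsigned int mode;
};

#define MODE_STICKY      01000u  /* sticky bit */
#define MODE_OTHER_WRITE 00002u  /* world-writable bit */

/* Is the directory sticky and world-writable, like /tmp? */
static bool sticky_world_writable(const struct obj_attr *dir)
{
    return (dir->mode & MODE_STICKY) && (dir->mode & MODE_OTHER_WRITE);
}

/* Symlink rule: in such a directory, follow a link only if the follower
 * owns it or the link's owner also owns the directory. */
static bool may_follow_link(const struct obj_attr *dir,
                            const struct obj_attr *link, unsigned int follower)
{
    if (!sticky_world_writable(dir))
        return true;
    return link->uid == follower || link->uid == dir->uid;
}

/* FIFO rule: same shape, applied to opening a FIFO for writing. */
static bool may_write_fifo(const struct obj_attr *dir,
                           const struct obj_attr *fifo, unsigned int writer)
{
    if (!sticky_world_writable(dir))
        return true;
    return fifo->uid == writer || fifo->uid == dir->uid;
}
```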

Applying Unobtrusive Fine-grained Cycle Stealing

• Linger-Longer system for Linux 2.2.19
  • Introduces a guest priority
  • New scheduling policy
• Updated the schedule function in a 4-node cluster
• Confirmed guest processes were not consuming CPU time when host processes were active
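A guest priority can be illustrated with a toy selection function: a runnable host task always wins over any guest task, so guests only get CPU time when no host task is runnable. This is a hypothetical sketch, not the Linger-Longer or Linux 2.2 `schedule` code; `struct task`, `pick_next`, and the per-class `goodness` score (loosely modeled on Linux 2.2's goodness value) are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

enum task_class { TASK_HOST, TASK_GUEST };

struct task {
    int             pid;
    enum task_class cls;
    bool            runnable;
    int             goodness;  /* hypothetical score used within a class */
};

/* Pick the next task: any runnable host task beats every guest task;
 * within the same class the highest goodness wins. Returns NULL when
 * nothing is runnable (the CPU would go idle). */
static const struct task *pick_next(const struct task *tasks, size_t n)
{
    const struct task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const struct task *t = &tasks[i];
        if (!t->runnable)
            continue;
        if (best == NULL
            || (t->cls == TASK_HOST && best->cls == TASK_GUEST)
            || (t->cls == best->cls && t->goodness > best->goodness))
            best = t;
    }
    return best;
}
```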

Applying Adaptive Memory Paging for Efficient Gang-Scheduling

• Various adaptive memory paging policies for Linux 2.2.19 on a 4-node cluster
• Required modifications in kswapd, swap_out, rw_swap_page, swapin_readahead, filemap_nopage
• kswapd is a kernel thread that never exits
  • The beginning of the function is never called again
  • The thread sleeps by calling interruptible_sleep_on
  • Insert interruptible_sleep_on_v2, forcing kswapd to exit
  • Start kswapd_v2
• Confirmed job switching time was reduced

Overhead

• 29k memory footprint
• < 1 ns trampoline installation time
• 20 ns redirection handler overhead
• 2.3 secs update on a 2 GHz P4 (adaptive paging)
• 1-8% overhead (due to the indirect jump)

Related Work

• Cluster management systems
  • Do not support dynamic kernel updates
• K42
  • Specially designed with hot-swappable capabilities
  • Requires quiescence for all updates
• Hicks’ system
  • User-level software updates; requires recompilation
• KernInst, GILK, ATOM, EEL
  • Do not facilitate adaptive execution
  • Do not replace complete subsystems

Ongoing and Additional Work

• Ensure safe update reversal
  • Confirm quiescence in the stack and program counter
• Update datatypes
  • Maintain a shadow data structure of new fields
• Apply EPCKPT kernel-assisted checkpointing
• Adaptively enlarge the pipefs buffer
• Apply Superpages support
• Apply Scalable TCP for high-speed WANs
• Automatically produce updates given a patch file
• Apply MOSIX
• Upgrade the Linux kernel

Conclusion

• Dynamic kernel updates
  • Dynamic code instrumentation
  • Commodity operating system
• Function cloning for adaptive execution
  • Multiple function versions can run concurrently
• Safe updates of non-quiescent subsystems
  • Scheduler, kernel threads
• Demonstrated updates
  • Adaptive memory paging for efficient gang-scheduling
  • Unobtrusive fine-grain cycle stealing
  • Public security fixes
• Small memory footprint, 1-8% overhead

Questions?