Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a...


Nabla containers: a new approach to container isolation

Brandon Lum, Ricardo Koller, Dan Williams, Sahil Suneja

IBM Research
https://nabla-containers.github.io

KubeCon China 2018

Containers are not securely isolated

- What does this exactly mean?
- Why are VMs considered secure but not containers?
- How do we improve container isolation?

Overview

• Threat model: isolation
• Isolation through surface reduction
• Our approach: Nabla
• Measuring isolation
• Nabla vs. VMs?

What does it mean to be isolated?

• Containers that are co-located should not be able to access data of another
• Scenarios:
  • Horizontal attacks from vulnerable services
  • Container-native multi-tenant cloud

[Figure: containers on a shared kernel: an attacker container next to Service A, which holds a secret.]

Container Isolation Reality

• Containers == namespaced processes → kernel exploits mostly work
  • Sep 2018: CVE-2018-14634
  • Dirty COW (CVE-2016-5195)
  • Many more (CVE database); 2018: code exec. (3), mem. corruption (8)
• Horizontal attack possible via shared privileged component (kernel)

[Figure: containers on a shared kernel: the attacker exploits the kernel via syscalls to reach the secret held by Service A.]

Dirty COW

• Dirty COW exploit sketch:
  • mmap a page
  • Create a thread that invokes madvise
  • Create a thread that reads/writes procfs
  • Triggers a race condition in the kernel's memory management code

    // FROM: https://dirtycow.ninja/

    map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, f, 0);
    printf("mmap %zx\n\n", (uintptr_t) map);

    /* You have to do it on two threads. */
    pthread_create(&pth1, NULL, madviseThread, argv[1]);      // madvise
    pthread_create(&pth2, NULL, procselfmemThread, argv[2]);  // R/W procfs

    /* You have to wait for the threads to finish. */
    pthread_join(pth1, NULL);
    pthread_join(pth2, NULL);
    return 0;


Kernel Footprint

[Figure: an application reaching kernel functions (FS, disk) through more than 300 syscalls.]

• Exploits target vulnerable parts of the kernel via syscalls.
• If we restrict the number of syscalls:
  → Fewer reachable kernel functions
  → Fewer potential vulnerabilities
  → Fewer possible exploits
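The reasoning above can be illustrated with a toy model. The sketch below assumes a hypothetical table that maps each syscall to the kernel functions it can reach; the table contents and the helper name `reachable_functions` are invented for illustration and are not measured data.

```python
# Toy model: the kernel code an application can reach is the union of the
# functions reachable from each syscall it is allowed to invoke.
# HYPOTHETICAL data: this table is made up for illustration only.
REACHABLE = {
    "read": {"vfs_read", "rw_verify_area"},
    "write": {"vfs_write", "rw_verify_area"},
    "mmap": {"do_mmap", "vm_area_alloc"},
    "madvise": {"do_madvise", "zap_page_range"},
}


def reachable_functions(allowed_syscalls):
    """Return the set of kernel functions reachable via the allowed syscalls."""
    functions = set()
    for name in allowed_syscalls:
        functions |= REACHABLE.get(name, set())
    return functions


full = reachable_functions(REACHABLE)                # every syscall allowed
restricted = reachable_functions({"read", "write"})  # restricted policy
print(len(full), len(restricted))
```

In this toy table, dropping `mmap` and `madvise` from the allowed set also removes every function that only those two syscalls can reach, which is the surface-reduction argument in miniature.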

Docker Default Seccomp Policy

[Figure: an application reaching the kernel (FS, disk) through ~280 syscalls; seccomp (whitelisting policy) filters out 44 syscalls. Greyed boxes are unreachable functions.]

• Docker default seccomp policy
  • Disables around 44 system calls out of 300+.
• Generic seccomp policies are hard to create such that they are secure
• Syscall profiling is mostly heuristic based
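To make the idea of an allowlist policy concrete, here is a minimal sketch of how such a profile can be evaluated. It assumes a cut-down, hypothetical profile written in the JSON shape used by Docker seccomp profiles (`defaultAction` plus a list of `syscalls` rules); the three allowed names are arbitrary, and argument filters, architectures and capability conditions are ignored.

```python
# Simplified evaluation of an allowlist-style seccomp profile.
# ASSUMPTION: PROFILE_JSON is a hypothetical, cut-down example, not the
# real Docker default profile.
import json

PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""


def action_for(profile, syscall):
    """Return the action of the first rule naming the syscall, else the default."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"]
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)
print(action_for(profile, "read"))    # SCMP_ACT_ALLOW
print(action_for(profile, "keyctl"))  # SCMP_ACT_ERRNO
```

Anything not named in a rule falls through to the default action, so the hard part is deciding which names belong on the list for a given application.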

Nabla

[Figure: the application runs on a LibOS that provides the original 300+ syscall interface*; seccomp lets only 7 syscalls through to the kernel (FS, disk).]

• Deterministic and generic seccomp policy
• Only 7 syscalls!
• Uses LibOS techniques

Nabla

• Taking unikernel ideas and putting them into containers
• Using tools/technologies from the rumprun and solo5 communities
• Modify unikernel to work as a process

“Unikernels as Processes” (ACM SoCC ’18)
(https://dl.acm.org/citation.cfm?id=3267845)

Making and running a Nabla

• Build app. with custom build process*
• The Nabla runtime, runnc, loads the nabla binaries and sets up seccomp profiles

[Figure: the build process* turns an application (>300 syscalls) into a Nabla binary. Under the container runtime, runc runs regular applications, while runnc runs each application on a LibOS behind a 7-syscall seccomp policy.]

* Current limitation of the build process; we are investigating ways to remove the need for a custom build process.

Demo

strace/ftrace measurements (low is good)

[Figure: an application reaching the kernel (FS, disk) through >300 syscalls.]

ftrace measures the number of boxes (kernel functions) touched.

strace measures the syscalls invoked.
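As a rough illustration of the strace side of this measurement, the sketch below counts the distinct syscalls that appear in strace-style output. The sample lines are illustrative, not captured from a Nabla run, and the helper name `unique_syscalls` is hypothetical.

```python
import re

# ILLUSTRATIVE strace-style lines; not output captured from a real run.
SAMPLE = """\
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
read(3, "127.0.0.1 localhost\\n", 4096) = 20
read(3, "", 4096) = 0
close(3) = 0
+++ exited with 0 +++
"""

# A syscall line starts with the syscall name followed by "(";
# an optional "[pid N] " prefix appears when child processes are followed.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def unique_syscalls(trace_text):
    """Return the set of syscall names that appear in the trace text."""
    names = set()
    for line in trace_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


print(sorted(unique_syscalls(SAMPLE)))
```

Repeated calls to the same syscall count once, since the metric of interest is how much of the interface is exercised rather than how often.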

ftrace measurements (lower is better)

[Figure: ftrace measurements comparing Kata-containers (VMs) and Nabla.]

What does this say about our isolation vs. VMs?

Have we surpassed VM isolation?

• We explored and contested this idea in our paper:
  “Say Goodbye to Virtualization for a Safer Cloud” (USENIX HotCloud 2018)
  (https://www.usenix.org/conference/hotcloud18/presentation/williams)
• Maybe… But several questions:
  • Implementation-specific comparisons? KVM vs. other hypervisors
  • Hardware-inclusive threat model (Spectre/Meltdown, etc.)
  • Other metrics

What’s Next?

• We want to engage the community:
  • Development work for runnc / nabla-base-build / nabla-demo-apps
  • Remove the need to rebuild nabla containers (support for dynamic linking of the LibOS)
  • Create new images and more language support for applications
• Chime in on improving security analysis/metrics
  • https://github.com/nabla-containers/nabla-measurements

Thank You!

https://nabla-containers.github.io

Brandon Lum (@lumjjb) – [email protected]

#NablaContainers

Backup

ftrace measurements (lower is better)

[Figure: an application reaching the kernel (FS, disk) through >300 syscalls.]

Measuring the number of boxes touched.

Throughput (higher is better)

Demo

[Figure: container runtime stack: Kubelet, CRI, cri-containerd, containerd, CNI / CNI plugin, and the runc and runnc runtimes. Annotations: image pull from the image registry (OCI image spec); run container (OCI runtime spec); other config from the podSpec, i.e. mounts, security, etc.]

Inside a Nabla container

• Unmodified user code (e.g., Node.js, redis, nginx, etc.)
• Rumprun library OS
  • Unmodified NetBSD code + some glue
  • Runs on thin Solo5 unikernel interface
• Nabla Tender
  • Setup of seccomp policy
  • Translates Solo5 calls to system calls

[Figure: layers of a Nabla container: Application, Libc, Rumprun glue, NetBSD (FS, TCP/IP), Solo5, 𝛁 Tender; shown alongside the original container.]

Backup: Containers vs. VMs

Overview

• Threat model: isolation
• What makes VMs isolated?
• Nabla: how do we get those isolation properties without overhead?

Disclaimer: In this talk, we are doing a 1:1 comparison. Defense in depth is a valid discussion with a different set of trade-offs.

Containers / VMs

[Figure: side-by-side comparison: a compromised (☠) process on the host kernel, and a compromised (☠) guest OS on the hypervisor (+ host kernel (root)).]

High level – syscalls: file system interface, socket interface, etc.

Low level – VT: block dev. interface, TAP interface, etc.

Containers / VMs

[Figure: for each side, the guest (application process or guest OS), the interface, and the infrastructure beneath it (FS, disk).]

A LOT more exploitable code in the infrastructure!!!


Lower level interface

Less code

Fewer vulnerabilities

Stronger isolation


Kernel functions accessed by applications

• Compared to standard containers:
  • 5-6x fewer kernel functions accessed
  • 8-14x fewer syscalls
• About half the number of kernel functions accessed as VMs!

[Figure: bar chart of unique kernel functions accessed (y-axis, 0 to 1600) for nginx, nginx-large, node-express, redis-get and redis-set; series: process (container), ukvm (VM), nabla.]

Accessible kernel functions under Nabla policy

[Figure: plot of unique kernel functions (y-axis, 0 to 700) for the accept, nabla and block series, with an inset for the low range.]

• Trinity kernel fuzz tester used to try to access as much of the kernel as possible
• Nabla policy reduces the amount of accessible kernel functions by 98%

Unikernel isolation comes from the interface

• Direct mapping between 10 hypercalls and system call/resource pairs
  • 6 for I/O
    • Network: packet level
    • Storage: block level
  • vs. >350 syscalls

Hypercall   System call     Resource
walltime    clock_gettime
puts        write           stdout
poll        ppoll           net_fd
blkinfo
blkwrite    pwrite64        blk_fd
blkread     pread64         blk_fd
netinfo
netwrite    write           net_fd
netread     read            net_fd
halt        exit_group
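The mapping on this slide can be written down as a small lookup table. The sketch below is one reading of the slide's table, assuming that blkinfo and netinfo have no associated system call (they are mapped to `None` here); `is_allowed` is a hypothetical helper for illustration, not the Nabla tender's actual code.

```python
# Hypercall -> (system call, resource) pairs as read from the slide.
# ASSUMPTION: blkinfo and netinfo are listed without a system call, so they
# map to None; a resource of None means the slide names no resource.
HYPERCALLS = {
    "walltime": ("clock_gettime", None),
    "puts": ("write", "stdout"),
    "poll": ("ppoll", "net_fd"),
    "blkinfo": None,
    "blkwrite": ("pwrite64", "blk_fd"),
    "blkread": ("pread64", "blk_fd"),
    "netinfo": None,
    "netwrite": ("write", "net_fd"),
    "netread": ("read", "net_fd"),
    "halt": ("exit_group", None),
}

ALLOWED_PAIRS = {pair for pair in HYPERCALLS.values() if pair is not None}
ALLOWED_SYSCALLS = {syscall for syscall, _ in ALLOWED_PAIRS}


def is_allowed(syscall, resource=None):
    """True only if this exact (syscall, resource) pair is in the table."""
    return (syscall, resource) in ALLOWED_PAIRS


print(len(HYPERCALLS), len(ALLOWED_SYSCALLS))  # 10 7
```

Checking the pair rather than the syscall name alone is the point of the slide: `write` is permitted on stdout and the network descriptor, but not on the block device.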

SOCC

Implementation: nabla 𝛁

• Extended Solo5 unikernel ecosystem and ukvm
• Prototype supports:
  • MirageOS
  • IncludeOS
  • Rumprun
• https://github.com/solo5/solo5

Measuring isolation: common applications

[Figure: bar chart of unique kernel functions accessed (y-axis, 0 to 1600) for nginx, nginx-large, node-express, redis-get and redis-set; series: process, ukvm, nabla.]

• Code reachable through the interface is a metric for attack surface
• Used kernel ftrace
• Results:
  • Processes: 5-6x more than nabla
  • VMs: 2-3x more than nabla
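A minimal sketch of how such a count could be derived from an ftrace function trace is shown below. The sample lines only imitate the layout of the ftrace "function" tracer (task-pid, CPU, flags, timestamp, function, caller) and are not real measurements; `unique_kernel_functions` and `times_more` are hypothetical helper names.

```python
import re

# ILLUSTRATIVE lines in the layout of the ftrace "function" tracer;
# not data from the talk.
SAMPLE_TRACE = """\
# tracer: function
           nginx-1977  [000] .... 17284.993652: vfs_read <-ksys_read
           nginx-1977  [000] .... 17284.993653: rw_verify_area <-vfs_read
           nginx-1977  [000] .... 17284.993660: vfs_read <-ksys_read
"""

# The traced function is the token between the timestamp colon and "<-caller".
FUNC_RE = re.compile(r":\s+(\S+)\s+<-\S+\s*$")


def unique_kernel_functions(trace_text):
    """Return the set of distinct kernel functions seen in the trace."""
    functions = set()
    for line in trace_text.splitlines():
        if line.startswith("#"):  # skip header comments
            continue
        match = FUNC_RE.search(line)
        if match:
            functions.add(match.group(1))
    return functions


def times_more(baseline, other):
    """How many times more unique functions `other` touches than `baseline`."""
    return len(other) / len(baseline)


print(sorted(unique_kernel_functions(SAMPLE_TRACE)))
```

Comparing the resulting sets for a process, a ukvm guest and a nabla run of the same workload would give ratios of the kind reported in the results.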

Measuring isolation: fuzz testing

[Figure: plot of unique kernel functions (y-axis, 0 to 700) for the accept, nabla and block series, with an inset for the low range.]

• Used kernel ftrace
• Used the trinity system call fuzzer to try to access more of the kernel
• Results:
  • Nabla policy reduces by 98% over a “normal” process

Measuring performance: throughput

[Figure: bar chart of normalized throughput (y-axis, 80% to 200%, one bar labelled 245) for py_tornado, py_chameleon, node_fib, mirage_HTTP, py_2to3, node_express, nginx_large, redis_get, redis_set, includeos_TCP, nginx and includeos_UDP, grouped into "no I/O" and "with I/O"; series: ukvm, nabla, QEMU/KVM.]

• Applications include:
  • Web servers
  • Python benchmarks
  • Redis
  • etc.
• Results:
  • 101%-245% higher throughput than ukvm

Measuring performance: CPU utilization

[Figure: three panels plotted against requests/sec (0 to 20000) for nabla and ukvm: (a) CPU %, (b) VM exits/ms, (c) IPC (ins/cycle).]

• vmexits have an effect on instructions per cycle
• Experiment with MirageOS web server
• Results:
  • 12% reduction in CPU utilization over ukvm

Measuring performance: startup time

[Figure: startup time against number of cores (2 to 16) for two workloads, Hello world and HTTP POST; series: QEMU/KVM, ukvm, nabla, process.]

• Startup time is important for serverless, NFV
• Results:
  • ukvm has 30-370% higher latency than nabla
  • Mostly due to avoiding KVM overheads