Post on 06-May-2018
See Docker from the Perspective of Linux Process
Allen Sun@DaoCloud Hangzhou Docker Meetup
2015.03.14
Agenda
1. Prerequisite
Linux Process (do_fork / copy_process)
Namespaces
2. How Docker deals with processes
dockerinit, ENTRYPOINT, CMD
syscall: fork()
Process A calls fork().
Parent (original PID): Process A continues, then calls wait().
Child (new PID): Process B calls execve() and executes a different program!
Process B calls exit() and becomes a ZOMBIE; SIGCHLD notifies the parent, whose wait() cleans up.
Reference: http://www.lynx.com/the-fork-call-posix-processes-and-parent-child-relationships
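The fork / exec / wait cycle above can be tried from Go. This is a minimal sketch, not anything from Docker: Go does not expose a bare fork(), so `exec.Command` performs the fork and exec together, and the helper name `runChild` and the `sh -c "exit 3"` child are assumptions chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runChild starts a child process (fork + exec under the hood), waits for it,
// and returns the child's PID and exit status.
func runChild(name string, args ...string) (int, int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return 0, 0, err
	}
	pid := cmd.Process.Pid // the child has its own, new PID

	// Wait reaps the child, so it does not linger as a zombie.
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return pid, exitErr.ExitCode(), nil
	}
	if err != nil {
		return pid, 0, err
	}
	return pid, 0, nil
}

func main() {
	pid, status, err := runChild("sh", "-c", "exit 3")
	if err != nil {
		fmt.Println("could not run child:", err)
		return
	}
	fmt.Printf("child pid=%d exited with status %d\n", pid, status)
}
```

The parent keeps its original PID throughout; only the child runs a different program.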
do_fork
do_fork:
1. copy_process
2. determine PID
3. wake_up_new_task
4. wait_for_completion
copy_process:
1. check flags
2. dup and init task_struct
3. check resource limit
4. copy/share process details: copy_semundo, copy_namespaces, ……
5. set IDs, task relationships, etc.
……
Reference: Mauerer W. Professional Linux Kernel Architecture [M], Figure 2-7 and Figure 2-8. John Wiley & Sons, 2010.
task_struct and namespaces
struct task_struct { … struct nsproxy *nsproxy; … };
struct nsproxy { … struct uts_namespace *uts_ns; struct mnt_namespace *mnt_ns; struct net *net_ns; … };
Each pointer in nsproxy refers to the matching struct uts_namespace, struct mnt_namespace, or struct net.
nsproxy holds pointers to five kinds of namespace for a process:
1. uts_namespace
2. mnt_namespace
3. pid_namespace
4. ipc_namespace
5. net
user_namespace is not in nsproxy!
Based on Linux kernel 3.13
What is in namespaces?
Based on Linux kernel 3.13

    struct pid_namespace {
        …
        struct task_struct *child_reaper;
        …
        int level;
        struct pid_namespace *parent;
    };

    struct mnt_namespace {
        atomic_t count;
        struct mount *root;
        struct list_head list;
        ……
    };

    struct uts_namespace {
        struct kref kref;
        struct new_utsname name;
        struct user_namespace *user_ns;
        ……
    };

    struct new_utsname {
        char sysname[..];
        char nodename[..];
        char release[..];
        char version[..];
        char machine[..];
        char domainname[..];
    };

……
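To make the task_struct / nsproxy relationship concrete, here is a toy model in Go of what copy_namespaces decides: with no CLONE_NEW* flag the child shares the parent's nsproxy, otherwise it gets a new nsproxy in which only the flagged namespaces are fresh. This is a sketch under assumptions, not kernel code: the Go types and the function `copyNamespaces` are hypothetical stand-ins, only three namespaces are modeled, and the flag values are the Linux CLONE_NEWNS, CLONE_NEWUTS and CLONE_NEWNET constants.

```go
package main

import "fmt"

// Values of the Linux CLONE_NEWNS, CLONE_NEWUTS and CLONE_NEWNET flags.
const (
	cloneNewNS  = 0x00020000
	cloneNewUTS = 0x04000000
	cloneNewNet = 0x40000000
)

// Toy stand-ins for the kernel structs (hypothetical, heavily simplified).
type utsNamespace struct{ nodename string }
type mntNamespace struct{ root string }
type netNamespace struct{ name string }

type nsproxy struct {
	uts *utsNamespace
	mnt *mntNamespace
	net *netNamespace
}

type task struct {
	pid     int
	nsproxy *nsproxy
}

// copyNamespaces returns the nsproxy a forked child should use.
func copyNamespaces(flags uint, parent *task) *nsproxy {
	if flags&(cloneNewNS|cloneNewUTS|cloneNewNet) == 0 {
		return parent.nsproxy // plain fork: share everything
	}
	child := *parent.nsproxy // start from the parent's pointers
	if flags&cloneNewUTS != 0 {
		uts := *parent.nsproxy.uts // new UTS namespace starts as a copy
		child.uts = &uts
	}
	if flags&cloneNewNS != 0 {
		mnt := *parent.nsproxy.mnt // new mount namespace starts as a copy
		child.mnt = &mnt
	}
	if flags&cloneNewNet != 0 {
		child.net = &netNamespace{name: "new"} // new net namespace starts empty
	}
	return &child
}

func main() {
	host := &task{pid: 1, nsproxy: &nsproxy{
		uts: &utsNamespace{nodename: "host"},
		mnt: &mntNamespace{root: "/"},
		net: &netNamespace{name: "init_net"},
	}}
	plain := &task{pid: 2, nsproxy: copyNamespaces(0, host)}
	container := &task{pid: 3, nsproxy: copyNamespaces(cloneNewUTS|cloneNewNet, host)}
	container.nsproxy.uts.nodename = "container"

	fmt.Println("plain fork shares nsproxy:", plain.nsproxy == host.nsproxy)
	fmt.Println("flagged fork shares nsproxy:", container.nsproxy == host.nsproxy)
	fmt.Println("host nodename is still:", host.nsproxy.uts.nodename)
}
```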
Docker? Where is Docker?
Docker Client -> Docker Daemon -> Docker Container, Docker Container, ……
fork! do_fork -> copy_process -> copy_namespaces, then do_execve
A Docker container is born simply by forking a process and then exec'ing a program in it!
Difference (Docker's fork vs normal fork)
Special flags passed to do_fork() (through the clone() syscall):
flag name        Linux kernel version
CLONE_NEWNS      2.4.19
CLONE_NEWUTS     2.6.19
CLONE_NEWIPC     2.6.19
CLONE_NEWPID     2.6.24
CLONE_NEWNET     2.6.29
CLONE_NEWUSER    3.8
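As an illustration of "fork with special flags", the Go standard library lets a caller pass CLONE_NEW* flags through `syscall.SysProcAttr.Cloneflags`. The sketch below is not Docker's code: the helper name `newNamespacedCommand` and the choice of flags and child command are assumptions, it builds only on Linux, and actually starting the child normally requires root, so the example reports the error instead of assuming success.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newNamespacedCommand prepares (but does not start) a command whose child
// process will be created with the given CLONE_NEW* flags.
func newNamespacedCommand(flags uintptr, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: flags}
	return cmd
}

func main() {
	flags := uintptr(syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS)
	cmd := newNamespacedCommand(flags, "sh", "-c", "echo pid seen by the child: $$")

	out, err := cmd.CombinedOutput()
	if err != nil {
		// Without sufficient privileges the clone call is rejected.
		fmt.Println("clone with namespace flags failed (usually needs root):", err)
		return
	}
	fmt.Print(string(out))
}
```

A normal fork corresponds to leaving `Cloneflags` at zero; the only difference is the flag bits.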
Namespaces in Docker

    func init() {
        namespaceList = Namespaces{
            {Key: "NEWNS", Value: syscall.CLONE_NEWNS, File: "mnt"},
            {Key: "NEWUTS", Value: syscall.CLONE_NEWUTS, File: "uts"},
            {Key: "NEWIPC", Value: syscall.CLONE_NEWIPC, File: "ipc"},
            {Key: "NEWUSER", Value: syscall.CLONE_NEWUSER, File: "user"},
            {Key: "NEWPID", Value: syscall.CLONE_NEWPID, File: "pid"},
            {Key: "NEWNET", Value: syscall.CLONE_NEWNET, File: "net"},
        }
    }

Based on libcontainer v1.2.0
USER_NAMESPACE: not fully implemented in Docker
NET_NAMESPACE: not used in network modes "host" and "other container"
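The File names in the list above ("mnt", "uts", ...) match the entries Linux exposes under /proc/<pid>/ns, so the namespaces of any process can be inspected from user space. The following is a small sketch, assuming a Linux system where those entries are symbolic links with targets such as `uts:[4026531838]`; the helper `parseNsLink` is a hypothetical name, not part of libcontainer.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseNsLink splits a link target such as "uts:[4026531838]" into the
// namespace kind and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	open := strings.Index(target, ":[")
	if open < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(target[open+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:open], inode, nil
}

func main() {
	// Same file names as in the namespace list above.
	for _, file := range []string{"mnt", "uts", "ipc", "user", "pid", "net"} {
		target, err := os.Readlink(filepath.Join("/proc/self/ns", file))
		if err != nil {
			fmt.Printf("%-4s unavailable: %v\n", file, err)
			continue
		}
		kind, inode, err := parseNsLink(target)
		if err != nil {
			fmt.Printf("%-4s unparsable: %v\n", file, err)
			continue
		}
		fmt.Printf("%-4s inode %d\n", kind, inode)
	}
}
```

Two processes that print the same inode for a given kind are in the same namespace of that kind.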
What to Fork?
Docker Client -> Docker Daemon -> (fork with flags!) -> ? ? …… Docker Container
Fork a Docker Container?
Docker Container == Process(es)?
What Process to Fork?
Whatever! It is a process, after all.
The process has only been forked, not exec'ed yet.
The result looks like this:
task_struct ready
namespaces ready
other resources ready
The process is still static; no new program is running yet.
Then exec! Exec what?
Have you ever heard of dockerinit, ENTRYPOINT or CMD in Docker?
name          description
dockerinit    The init program that runs first inside the new namespaces to set up the mount and net namespaces and other things.
ENTRYPOINT    An ENTRYPOINT allows you to configure a container that will run as an executable.
CMD           The main purpose of a CMD is to provide defaults for an executing container.
Reference: https://docs.docker.com/reference/builder
Dockerinit, ENTRYPOINT, CMD
Docker Daemon process -> fork (new namespaces) -> exec: 1. dockerinit (init namespaces) 2. ENTRYPOINT 3. CMD
dockerinit, ENTRYPOINT and CMD run one after another as the only process (same PID).
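The "same PID" point can be checked with a small experiment: a process that calls exec is replaced by the new program but keeps its PID. The sketch below uses two shells as stand-ins for dockerinit and ENTRYPOINT; the helper name `pidsAcrossExec` and the shell commands are assumptions for illustration, and it relies on `sh` being available.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pidsAcrossExec runs a shell that prints its PID and then uses exec to
// replace itself with a second shell, which prints its PID as well.
func pidsAcrossExec() (string, string, error) {
	out, err := exec.Command("sh", "-c", `echo $$; exec sh -c 'echo $$'`).Output()
	if err != nil {
		return "", "", err
	}
	pids := strings.Fields(string(out))
	if len(pids) != 2 {
		return "", "", fmt.Errorf("expected two PIDs, got %q", out)
	}
	return pids[0], pids[1], nil
}

func main() {
	before, after, err := pidsAcrossExec()
	if err != nil {
		fmt.Println("experiment failed:", err)
		return
	}
	fmt.Printf("PID before exec: %s, PID after exec: %s\n", before, after)
}
```

Only the fork in the first step creates a new PID; every later exec reuses it.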
Docker Daemon and dockerinit
Docker Daemon (parent) <- syncPipe -> dockerinit (child)
Usage: coordinate the sequencing of Docker Daemon and dockerinit.
dockerinit blocks as long as there is nothing to read from syncPipe.
Why?
How to coordinate? Docker Daemon (parent) steps, with the state of dockerinit (child):
1. Create Command: the executable in the container (dockerinit)
2. Create syncPipe
3. Pass pipe to child
4. command.Start(): fork and exec the command. dockerinit: fork, new PID! syncPipe (nothing), blocked
5. SetupCgroups. dockerinit: syncPipe (nothing), blocked, controlled by cgroup
6. Init network. dockerinit: syncPipe (nothing), blocked, controlled by cgroup
7. Sync with child. dockerinit: syncPipe (has networkState), read from syncPipe
Based on libcontainer v1.2.0
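The parent-side steps above can be sketched with a plain pipe: the child blocks reading the pipe until the parent has finished its setup and writes the state. This is a simplified illustration, not libcontainer's actual syncPipe code: the function name `runWithSyncPipe`, the `setup` callback and the `cat <&3` child standing in for dockerinit are all assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// runWithSyncPipe starts a child that blocks on a pipe, runs setup while the
// child is blocked, then sends state through the pipe and returns what the
// child read.
func runWithSyncPipe(state string, setup func()) (string, error) {
	r, w, err := os.Pipe() // create the sync pipe
	if err != nil {
		return "", err
	}

	// The child stands in for dockerinit: it blocks until data arrives on fd 3.
	cmd := exec.Command("sh", "-c", "cat <&3")
	cmd.ExtraFiles = []*os.File{r} // pass the pipe: ExtraFiles[0] is fd 3 in the child
	var out bytes.Buffer
	cmd.Stdout = &out

	if err := cmd.Start(); err != nil { // fork and exec the command
		r.Close()
		w.Close()
		return "", err
	}
	r.Close() // the parent keeps only the write end

	setup() // cgroups and network setup would happen here, child still blocked

	_, writeErr := w.WriteString(state) // sync with child: send the state
	w.Close()                           // EOF tells the child the state is complete

	if err := cmd.Wait(); err != nil {
		return "", err
	}
	if writeErr != nil {
		return "", writeErr
	}
	return out.String(), nil
}

func main() {
	got, err := runWithSyncPipe("networkState", func() {
		fmt.Println("parent: setting up while the child is blocked")
	})
	if err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println("child read:", got)
}
```

Because the child cannot proceed before the write, everything done in `setup` is guaranteed to be in place by the time the child continues.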
How to coordinate? dockerinit (child) steps after reading from syncPipe:
1. SetupNetwork
2. SetupRoute
3. Init Mount ns: set up devices, mount points and fs
4. Apply apparmor
5. execv ENTRYPOINT: exec, same PID!
x. execv CMD: exec, same PID! Finally, YOUR APP!
Docker Daemon: 8. command.Wait()
Based on libcontainer v1.2.0
Docker Container
Docker Daemon process -> fork (new namespaces) -> exec: 1. dockerinit (init namespaces) 2. ENTRYPOINT 3. CMD (your application)
the only process (same PID), cgroups applied
Docker Container: process, process, process, process
Why Coordinate?
1. Docker Daemon needs to synchronize with dockerinit: dockerinit is blocked so that none of its children can escape from cgroups.
2. Namespaces cannot be switched inside the Go runtime: dockerinit is blocked until Docker Daemon transfers the network details that will be used to set up the network interface in the new net namespace.
Q&A
THANK YOU!
Email: allen.sun@daocloud.io
Weibo: @莲子弗如清
WeChat: shlallen