COURSE DETAILS
A process as seen by the kernel :
• process table details
• task structure
• inode
• max no. of processes
• processes' management
• pid allocation
• related system calls : fork, clone, exec, wait
File system calls :
• how open files are maintained by a process
• effect of fork and exec on the files and the task structure with respect to opened files
• open and close calls and their effect in the kernel
• discussion on system calls like fcntl, flock, dup
Threads :
• creation
• communication
• maintenance by the kernel
Signals and their implementation in kernel :
• kill
• signal
• pause
• alarm
IPC calls related to :
• shared memory
• pipes and FIFOs
• semaphores
• message queues
Data structures maintained by the kernel :
• system calls responsible for creation and destruction
Memory management :
• swapping and demand paging
• page tables
• physical and virtual memory
• stress will be given to the effects on the shared memory
Device drivers :
• registration
• loading and unloading of modules
• mapping to file system functions
• interrupt handling and bottom halves
Socket programming
LINUX …
A Unix clone
Written from scratch by Linus Torvalds with assistance from a loosely - knit team of hackers across the Net.
Aims towards POSIX compliance.
INTRODUCTION TO LINUX KERNEL
Some of the salient features of the OS :
• Multi-user and multi-processor; each user can execute several processes.
• Machine architecture is hidden from the user, thus making it an easy environment for programming.
• Uses a hierarchical file system that allows easy maintenance and easy implementation.
• Uses a consistent format for files, byte streams etc., making application programs easier to write.
• Supports multiple executable formats (like a.out, ELF, Java).
SYSTEM OVERVIEW
User Applications
O/S Services
Linux Kernel
Hardware
User Applications - being used by a general user.
O/S Services - various services like vi, sh, who etc.
Linux Kernel - the set of system calls and the various services the system provides to the applications.
Hardware - the underlying hardware.
Linux kernel forms the heart of the OS and shall be the area discussed.
The following system calls are used by the programmer :
fork, clone : to create a new process
exec : to run a different executable in the same process
exit : to end the execution of a process.
wait : to wait for a child process to complete execution.
The fork system call :
pid_t fork(void)
The fork system call creates a copy of the process that executes the system call.
The process executing the call is the parent process.
The newly created process is the child process.
The fork system call is called once (in the parent) but it returns twice (once in the parent and once in the child).
In the parent it returns the pid of the child.
In the child it returns with value 0.
Process Description Horizon 12
In case of failure it returns –1.
The child is the copy of the parent, i.e., the same program starts executing as two different processes.
The address space of the parent is duplicated.
Parent and the child share the following : text region, opened files, pipes etc and have a unique copy of the data region and the stack.
Except for Process 0 all processes have a single immediate parent.
The files that were open in the parent process before fork are shared by the child process after fork.
The child process has a new and unique pid.
The child process has its own copies of the parent’s file descriptors.
The child's memory pages are shared with the parent and copied only when one of them writes to a page (copy-on-write).
File locks and pending signals are not inherited.
The execution in the child continues from the point of fork and continues till the main function ends.
To obtain details on the complete state of the newly created child process, run info fork.
The following code shows how the child process shares the opened files of the parent process.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

int main(void)
{
    int cpid, fd;
    char ch[10];

    fd = open("TEST", O_RDONLY);
    printf("fd :: %d\n", fd);
    read(fd, ch, 5);
    printf("char read :: %c, %c\n", ch[3], ch[4]);
    if ((cpid = fork())) {
        /* parent : waits for the child, then reads from where the child stopped */
        wait4(cpid, NULL, 0, NULL);
        printf("par :: wait over\n");
        read(fd, ch, 5);
        printf("char read :: %c, %c\n", ch[0], ch[1]);
        close(fd);
    } else {
        /* child : reads through the same open file, moving the shared offset */
        printf("i m chld\n");
        read(fd, ch, 5);
        printf("char read :: %c, %c\n", ch[0], ch[1]);
        sleep(3);
        printf("chld exiting\n");
        close(fd);
    }
    return 0;
}
The clone system call :
int clone(int (*fn)(void *), void *child_stack, int flags, void *arg)
This creates a new process.
Depending on the flags passed, the parent and child processes can share the same memory pages, table of file descriptors and table of signal handlers.
The child process starts execution at the function fn(arg) passed as argument to the system call.
The child process terminates when this function fn ends and the integer returned by fn is the exit code for the child process.
The child process may also terminate explicitly by calling exit( ) or after receiving a fatal signal.
The main use of clone is to implement threads – multiple threads of control in a program that run concurrently in a shared memory space.
The clone() system call leaves the allocation of the child's stack up to the programmer. The first thing to be done is allocating space for the stack of the new child thread, for example with malloc().
NOTE : clone should not be used if you are writing portable code. It is a Linux-only system call. If you want portable threads, use a POSIX threads implementation such as LinuxThreads.
Example : the following shows that the Linux system call 'clone' can be used to create a thread.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int create_thread(int stack_size, int (*start_routine)(void *), void *arg)
{
    char *child_stack;
    int pid;

    if (!stack_size)
        stack_size = 16384;
    /* allocate some memory for a new stack */
    if (!(child_stack = malloc(stack_size)))
        return -1;
    /* stacks grow downwards on 99% of linux implementations so point to the end of it */
    pid = clone(start_routine, child_stack + stack_size,
                CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, arg);
    if (pid < 0)
        free(child_stack);   /* failed so free the new stack */
    return pid;
}

int thread(void *arg)
{
    (void)arg;
    while (1) {
        printf("thread\n");
        sleep(200);
    }
    return 0;
}

int main(void)
{
    create_thread(0, thread, NULL);
    while (1) {
        printf("main\n");
        sleep(200);
    }
    return 0;
}
The exec system call :
int execve(const char *filename, char *const argv[], char *const envp[])
This executes a new program.
It replaces the current program of the executing process with the new program.
It does not change the pid or the parent’s pid of the executing process.
Any signals that were set to terminate the original program will terminate the new program also and the signals that were ignored in the original program will be ignored by the new program also.
Any signals that were set to be caught in the original program are handled as per the default action in the new program. This is because the new program does not contain the signal handler function that was defined in the old program.
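Example :
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *args[] = {"cp", "new.c", "dd.c", NULL};

    execvp("cp", args);
    printf("exec fail\n");   /* reached only if execvp returns, i.e. on failure */
    return 1;
}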
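Example :
Calling program (compile to a.out) :
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *args[3];

    args[0] = "./run";
    args[1] = "hello";
    args[2] = NULL;
    printf("before exec\n");
    execvp("./run", args);
    printf("exec failed\n");   /* reached only if execvp returns */
    return 1;
}
Program being exec( )ed (compile to run) :
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc > 1)
        printf("string sent is %s\n", argv[1]);
    return 0;
}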
The wait system call :
pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage)
This suspends execution of the current process until :
• a child as specified by the pid argument has exited
OR
• an unignored signal is delivered.
If no child as specified by pid exists at the time the call is made, then the function returns immediately with an error.
It frees up the system resources that were used by the child.
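Example : The following code shows use of fork and wait :
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

int main(void)
{
    int i, cpid;

    if ((cpid = fork())) {
        /* parent : blocks until the child has exited */
        wait4(cpid, NULL, 0, NULL);
        printf("parent finished waiting\n");
    } else {
        for (i = 0; i < 10; i++)
            printf("i am child\n");
    }
    return 0;
}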
The following have been taken from man wait(2) :
A child that terminates, but has not been waited for, becomes a "zombie".
The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child.
As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.
If a parent process terminates, then its "zombie" children are adopted by the init process, which automatically performs a wait to remove the zombies.
The wait set of system calls are used to wait for state changes in a child of the calling process, and obtain information about the child whose state has changed.
Child's state change means :
- child terminated
- child stopped by a signal
- child resumed by a signal.
If a child has terminated, then when the parent wait( )s, the system releases the resources associated with the child.
Until a wait( ) is performed by the parent, the terminated child remains in a "zombie" state.
If a child has already changed state, these calls return immediately. Else they block the caller until either a child changes state or a signal handler interrupts the call.
Process priority :
The nicer a process, the less CPU it will try to take from other processes. Thus the higher the nice value, the lower the priority of the process.
The nice() function is used to modify the nice value of the calling process.
Only the superuser may specify a negative increment, i.e. a priority increase.
Example :
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    printf("%d\n", getpriority(PRIO_PROCESS, 0));
    nice(-2);   /* negative increment : needs superuser rights */
    printf("%d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}
Execute the binary as a superuser. E.g., on Ubuntu : sudo ./a.out
nice() becomes useful when several processes are demanding more resources than the CPU can provide. In that case a higher priority process will get a larger chunk of the CPU time than a lower priority process.
If the CPU can deliver more resources than the processes are requesting, then even the lowest priority process can get up to 99% of the CPU.
Nice value and static priority :
Conventional process's static priority = (120 + nice value)
So the user can use the nice/renice commands to change the nice value in order to change a conventional process's priority.
By default, a conventional process starts with a nice value of 0, which equals static priority 120.
Scheduling priority depends on scheduling class.
Scheduling classes :
- SCHED_FIFO : a First-In, First-Out real-time process
- SCHED_RR : a Round Robin real-time process
- SCHED_NORMAL : a conventional, time-shared process
Most processes are SCHED_NORMAL.
Using the ps command, the scheduling class can be obtained as follows :
ps -o class,cmd
Interpretation :
TS    SCHED_OTHER (SCHED_NORMAL)
FF    SCHED_FIFO
RR    SCHED_RR
Scheduling priorities :
For a real-time process (SCHED_FIFO/SCHED_RR) :
- it is the real-time priority
- ranging from 1 (lowest priority) to 99 (highest priority).
For a conventional process (SCHED_NORMAL) :
- it is the dynamic priority, which depends on :
  • the static priority, ranging from 100 (highest priority) to 139 (lowest priority), that is, (120 + (-20)) to (120 + 19)
  AND
  • a bonus, ranging from 0 to 10, which is set by the scheduler and depends on the past history of the process; it is related to the average sleep time of the process.
The ps command can be used to know the real-time priority and the dynamic priority of the processes as follows :
ps -o class,rtprio,pri,nice,cmd
The sched_setscheduler( ) function can be used to set the scheduling policy and the associated parameters for the process.
For conventional processes, the policy could be :
SCHED_OTHER   the standard round-robin time-sharing policy
SCHED_BATCH   for "batch" style execution of processes
SCHED_IDLE    for running very low priority background jobs
For real-time processes, the policy could be :
SCHED_FIFO    a first-in, first-out policy
SCHED_RR      a round-robin policy
In this case the sched_priority can be set to indicate real-time priority.
General Features :
A process as an entity has the following features :
It is the execution of a program.
It consists of a pattern of bytes (interpreted by the CPU as machine instructions) : the text, data and stack.
It executes by following a strict sequence of instructions that is self-contained.
It can read and write its own data and stack but cannot access the data and stack of other processes.
It has a life – spent partly in user mode and partly in system mode.
It is uniquely known by its process id ( pid). The pid remains the same throughout the life of the process.
It is a dynamic entity constantly changing as the machine code instructions are executed by the CPU.
It can communicate with the other processes running on the system
It is allocated a special structure by the kernel (This is the structure task_struct).
It holds an entry in the task table
On Redhat 6.0 the limit on the total number of processes is 512, and the limit on the number of tasks per user is half of that.
Since Redhat 6.1, using the standard redhat supplied kernel, the total number of process is 2560, and the max per user is 2048.
(If you need to increase the limits, you will need to modify the /usr/src/linux/include/tasks.h file. The parameters to change are NR_TASKS and MAX_TASKS_PER_USER.) In all cases, the maximum value for these parameters is 4092.
The program executed by a process during its life can be changed.
Every process has a parent process. The parent of a process may change during the life of a process.
There exist some special processes in the system which are known as daemon processes.
Daemon Processes :
They execute in the background without an associated terminal or login shell.
They are started once when the system is initialized.
Their lifetime is the entire time that the system is operating, usually they do not die or get restarted later.
They spend most of their time waiting for some event to occur at which time they perform their service.
They frequently spawn other processes to handle service requests.
There exist some special conditions that must be taken care of while writing a daemon.
Rules while writing Daemons :
• All unnecessary file descriptors must be closed.
  A daemon would not require, for example, stdin, stdout, stderr.
  These file descriptors are inherited by the daemon from its parent by default.
• The current working directory must be changed to "/".
  The current working directory is also inherited by a process from its parent.
  If the working directory of the daemon is on a mounted filesystem then that filesystem cannot be unmounted as long as the daemon is running.
  An exception to this rule are the daemons which execute a chdir to the directory where they do all the work.
• The daemon must do a fork and have the parent exit, allowing the actual daemon to run in the child process.
  This is necessary so as to dissociate the daemon from a terminal. Thus it need not be started as a background process.
• The daemon must be dissociated from its process group.
  This is required so that the daemon does not receive the signals sent to the entire process group.
• The daemon must execute umask(0) to reset the file mode creation mask.
  A process inherits this mask from its parent process.
  Why the need ? If the daemon created a file with mode 0660, so that only the user and the group could read and write the file, but the umask value was 060, the group read and write permissions would be turned off.
Process States :
STATES IN WHICH A PROCESS COULD BE :
1. The process is executing in USER MODE.
2. The process is executing in SYSTEM (or KERNEL) MODE.
3. The process is READY TO RUN, waiting for the CPU.
4. The process is SLEEPING and is residing in the main memory.
5. The process is ready to run but the swapper must swap the process into main memory before it could be scheduled.
6. The process is sleeping and the swapper has swapped it to secondary storage to make space for other processes.
7. The process is returning from kernel to user mode when the kernel preempts it and does a context switch to schedule another process.
8. The process is newly CREATED. It is in a transition state : it is neither ready to run nor sleeping.
9. The process has exited and is in a ZOMBIE STATE.
[Figure : process state transition diagram. States : 1 User Running, 2 Kernel Running, 3 Ready to Run in Memory, 4 Asleep in Memory, 5 Ready to Run, Swapped, 6 Sleep, Swapped, 7 Preempted, 8 Created, 9 Zombie. Transition labels : fork, enough memory, not enough memory (swapping), sys call / interrupt, return, interrupt / interrupt return, return to user, preempt, reschedule process, sleep, wakeup, swap out, swap in, exit.]
Process State Transition Details :
A process is CREATED when a fork( ) is done.
A process switches from the USER MODE to KERNEL MODE when a system call is executed.
A process executing in KERNEL MODE can be preempted only when it is about to return to the USER MODE.
During the execution of a system call, if a process has to wait for some system resource, e.g., for a disk I/O, the process goes into the SLEEP state.
This waiting / sleeping always occurs when the process is in the kernel mode.
Apart from this self giving up of the CPU by a process (while waiting for a system resource), Linux also uses pre-emptive scheduling.
Pre-emptive scheduling :
- a process executes for a small amount of time
- after this time another process is picked up to run
- the original process waits till the scheduler again selects it.
A runnable process is one waiting only for CPU to run.
The SCHEDULER selects the most deserving process to run out of all the runnable processes in the system.
Some Important Constants
HZ : no. of clock ticks received by the system in 1 second = 100
TASK_SIZE : user process space size = 0xC0000000
NR_TASKS : maximum number of processes in the system = 512 (in version 7) / 4098 (in version 9)
PID_MAX : maximum pid allocated to a process = 0x8000
Process State Related Constants :
TASK_RUNNING :
- process is runnable
- process may or may not be running
- many processes could be in TASK_RUNNING
- only one process is running at any given time (for a single CPU) and it is marked by the global variable current
TASK_INTERRUPTIBLE :
- process is sleeping
- process can be interrupted
TASK_UNINTERRUPTIBLE :
- process is sleeping
- process cannot be interrupted
TASK_ZOMBIE :
- process has finished execution
- the parent of the process has not yet executed wait
Scheduling Policy Related Constants :
SCHED_FIFO : first in first out scheduling policy
SCHED_RR : round robin scheduling policy
SCHED_OTHER : any other scheduling policy
Signal Processing Related Constants :
NSIG : total number of signals = 32 (in version 7), 64 (in version 9)
The task_struct structure (contents) :
state : TASK_RUNNING, TASK_INTERRUPTIBLE etc.
priority :
- priority given by the scheduler to the process.
- it is the amount of time for which the process will run when allowed.
- affected by the nice system call (a large nice value means a low priority; the priority can be increased only by the superuser, others can use it only to decrease the priority of processes).
time_slice :
- it is the amount of time that the process is allowed to run for.
- it is set to priority when the process is first run and is decremented on each clock tick.
unsigned long rt_priority : relative priority of a real time process
tarray_ptr : pointer to the task[ ] array linkage, used to release the task slot when the process dies
policy : scheduling policy (SCHED_FIFO etc.)
tty : terminal to which the task is associated. If 0, implies no terminal, e.g., for a daemon process
p_opptr : original parent
p_pptr : parent
p_cptr : youngest child
p_ysptr : younger sibling
p_osptr : older sibling
fs : filesystem information
files : information on all the open files of the process.
mm : memory management information
pid
pgrp
sig : holds the action to be taken on the various trapped signals.
signal : holds the signal number information of a received signal
pdeath_signal :
• the signal sent to the task when its original parent dies.
• it is set by using the prctl system call; the value is cleared upon a fork.
Related Global Data
task [ NR_TASKS ] :
- this task vector is an array of pointers to every task_struct structure in the system
- task [ 0 ] is the idle task which gets called when no other task can run. It cannot be killed and never sleeps. Its state field is never used.
init_task is the run queue which contains pointers to only those tasks which are in the TASK_RUNNING state.
pidhash [ PIDHASH_SZ ] : maintains a hash table of pids of the tasks
What Happens During fork/clone
A new task_struct data structure is created from the system's physical memory.
The new task_struct is entered in the task vector.
The contents of the old process's task_struct are copied into the cloned task_struct.
The pid for the new task is obtained. Pids keep increasing till the maximum limit is reached, after which the kernel again starts allocating (released) pids from the beginning. For user processes the pids start from value 300 onwards. (All the lower pids are reserved for the daemon processes.)
The new process shares the resources of the parent process :
- process's files and file system information
- signal handlers
- virtual memory
All pending unhandled signals (inherited from the parent) are deleted for the child.
The start_time for the new process is set to jiffies (number of clock ticks since the system started).
The dynamic priority of the parent task (held in counter) is shared between parent and child tasks so that the total amount of dynamic priorities in the system doesn't change.
The following relationship related processing takes place :
- The parent pointers in the new task are set to the current task.
- The child pointer of the new task is set to NULL.
- The new process has no younger sibling.
- The parent's current youngest child becomes the older sibling of the new process.
- The new process becomes the parent's youngest child.
The new process is added to the run queue and its state is marked as TASK_RUNNING.
- At this point the parent completes its part of fork( ).
- The child task is later scheduled by the normal scheduling algorithm. It then returns from the fork( ) function.
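[Figure : Relationship between the parent task and the various child tasks. The parent's p_cptr points to its youngest child (the one created by the 4th fork); the children created by the 1st to 4th fork are chained through p_osptr (towards older siblings) and p_ysptr (towards younger siblings), with NULL at both ends of the chain.]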
[Figure : task_struct for parent and task_struct for child after fork. Each has its own kernel stack, files_struct (open files), fs_struct (current directory and root) and mm_struct, with its own data (parent data / child data) and user stack; the text is shared.]
Difference Between fork and clone
Both the system calls basically have the same implementation within the kernel.
The following differences are there between the two system calls :
clone allows the child process to share parts of its execution context with its parent, e.g., memory space, file descriptor table, signal handler table and the pid.
The clone system call is thus providing support for creating threads managed by the kernel while fork creates a new task.
[Figure : the mm and files fields of the parent process and of the child process, and the virtual memory areas, file descriptor table and inode they refer to.]
What Happens During an exit
A task terminates by executing the exit( ) system call.
A process may invoke exit( ) as follows :
- explicitly : the startup routine calls exit( ) when the program returns from the main function.
- implicitly : the kernel may invoke exit( ) internally for a process on receipt of an uncaught signal.
An exiting process enters the zombie state.
An exiting process relinquishes its resources.
The idle task task [ 0 ] with pid = 0 can not be killed
During a process exit the following takes place :
- All task related memory is freed.
- All the open files of the task are closed by setting the file descriptor array to NULL.
- The task state is set to TASK_ZOMBIE.
- All the closest relatives are informed of the death of the current task. It takes care of the following :
  • for all the child tasks of the dying task, the parent is set to the child_reaper (the init process).
  • the parent task is notified of the death; the exit_signal of the dying task (as stored in the task_struct of the dying task) is sent to the parent.
  • if the parent has issued a wait for the child (about to die), then the parent is woken up.
- Finally the scheduler is invoked to schedule another task.
During exit the dying process releases the memory it had acquired except for the task structure and its entries in the task table and the pid hash table.
The wait system call
This system call must be executed by the parent process for every child in order to ensure that the child process does not eat up the system resources as Zombies.
During execution of wait either of the following is possible :
- the parent finds that there is a child that has already finished execution, i.e., there is a child in zombie state
OR
- all the children are currently executing.
In the first case the parent returns immediately, releasing the system resources held by this zombie child.
In the latter case the parent goes in an interruptible sleep and would be woken up when the child exits or some other signal is delivered to this parent process.
Release of system resources of the child implies the following :
- the slot occupied by the process in the task table is released and marked as free
- the task entry is removed from the pid hash table
- the child process is removed from the run queue
- the child process is removed from its siblings list (this will affect the older and younger sibling pointers maintained by both the immediate siblings of this child process)
- the task structure held by the process itself is released
What Happens During an exec
During an exec a new executable has to be loaded in place of the old one on the same process.
This happens by first flushing out the old executable. This includes :
- A private copy of the signal table is created. (During a fork, i.e. creation of a process, the child shares the signal table of the parent.)
- The unhandled signals of the old executing program are cleared up.
- The signal table is cleared up.
The old mmap stuff is released. The virtual memory maps of the old executable are set to NULL and the page tables are cleared up.
Those file descriptors in the file descriptor table are cleared for which the process has set the close-on-exec flag (This can be done through the fcntl system call).
The new executable is then loaded. For this the page tables are updated for the process and the new virtual memory map is generated.
Processing During System Startup
The kernel goes through the various initializations required for the different parts of the kernel.
Then when the kernel is ready to ‘move to user mode’ it starts the init process (details covered later) which has a pid = 1.
Then for pid = 0, the kernel executes the idle task.
- This is now Process number 0, the idle task, which keeps running in an infinite loop.
- Whenever there's nothing else to do, the scheduler will run this idle process.
The ‘init process’ execution starts as follows :
- starts the following daemon processes :
  • bdflush
  • kpiod
  • kswapd
- opens /dev/console in O_RDWR mode. As a result of this, fd = 0 (stdin) gets associated with the process.
- dup( )s fd = 0 twice. Thus fd = 1 (stdout) and fd = 2 (stderr) also get associated with the process.
- finally the /sbin/init program is exec( )ed.
Now that the init program has been exec()ed, the kernel has no direct control on the program flow.
Now the kernel proceeds with providing scheduling services amongst the others to the alive processes.
The flow of control can now be seen from the ‘process relationships fig’
For each terminal to be activated, init fork( )s a copy of itself.
Each of these children exec( ) the getty program.
After a user logs on to one of the terminals, the getty exec( )s the login program.
The login then fork( )s and exec( )s the /bin/sh
Now when a user command is entered, the sh first fork( )s a copy of itself and then exec( )s the program corresponding to the user command.
Exercises :
1) Write a program that behaves as follows :
The program maintains a file "login" which contains the following information in the specified format -
usr1 logged in at hh:mm:ss on DD' MM YY
usr1 logged out at hh:mm:ss on DD' MM YY
total time taken is hh:mm:ss
usr2 logged in at hh:mm:ss on DD' MM YY
usr2 logged out at hh:mm:ss on DD' MM YY
total time taken is hh:mm:ss
Initially the program waits at the following prompt -
login :
Whenever a user enters a name, the program proceeds. Whatever command the user enters (the user enters the command specifying the correct path) is executed by the program, and then the program waits for the next user command. This continues till the user enters exit to exit.
On exit the log out information is stored in the "login" file and the program then waits for the next login. The program itself terminates when the user name is entered as exit.
2) Implement the system function.
3) Write a program in which a process creates 10 child processes. Each of the child processes is started by passing the pid of its immediate older sibling; this child process displays the pid of its immediate older sibling. All the children should continue to run till the last child is created. The parent should wait for all its children to die out.