02 fundamentals

Fundamentals

CIS 552

Fundamentals

Low-level I/O (read/write using system calls)

Opening/Creating filesReading & Writing filesMoving around in files file descriptors

Program Creation and Execution exec family of system callsusing fork() to create processes exiting processesprocess termination

Low-level I/O

Unix System Calls can be used to: Open/Create Files Read & Write files Move around in files

System calls work with file descriptors A file descriptor is an int that represents a file/device File descriptors are allocated by the system

Limited number of file descriptors available 0 is stdin, 1 is stdout, 2 is stderr; these are automatically allocated

and opened when a program begins

Opening/Creating Files

open(<file/device name>,<how to open>[,<mode>]) <file/device name> is a C-string, exactly as in the C++ fstream member

function open() <how to open> is an int, usually specified with a constant defined in

fcntl.h. The values may be combined using a logical OR ( | ), allowing every possible way of opening a file (examples on next slide)

Detail on this appears on the next slide <mode> is an optional third argument that is specified only if the file is

created. It is a three digit octal number representing the permissions to be set on the file.

a file descriptor (of type int) is returned on success; otherwise –1 is returned.

Specifying How a File is to be OpenedFile Access

You must specify how the file is to be accessed. Exactly one of these values must appear in the 2nd argument of open():

O_RDONLY The file is opened for reading only. If the file doesn’t exist, and the create

option (next slide) isn’t specified, open() will fail (note that it is not ordinary to open a file to be read with the create option specified)

O_WRONLY The file is to be opened for writing only. If the file doesn’t exist, and the

create option is not specified, or if the file does exist, and the create option is specified along with the fail if file exists option, open() will fail.

O_RDWR The file is opened for full access. Errors specified above can occur,

however. If none of these are specified, open() can succeed, but nothing can be

done to the file. (See open2.c in the demo directory)

Specifying How a File is to be OpenedFile Opening Options

The following are used to further specify how a file is opened. They are combined with the access and possibly each other:O_APPEND

all access is at end of fileO_CREAT

create file if it doesn’t exist (does not cause failure if file exists). Use a third argument to specify mode.

O_TRUNC Destroy the existing contents of the file, if it exists.

O_EXCL When combined with O_CREAT, causes open() to fail if the file

already exists.

Specifying How a File is to be OpenedExercises/Examples

Open a file named abc.txt for writing, only if it exists. If it does, the present contents should be preserved, and writing should occur after the present contents.

Give an open() command equivalent to the C++ command Dest.open(“abc.txt”), where Dest is an ofstream

Give an open command that opens a file abc.txt for writing only if it doesn’t already exist.

Give command(s), to open a file abc.txt for writing. If abc.txt exists, the user should be prompted, and then the file opened for writing only if the user answers affirmatively to a prompt to overwrite the file. Otherwise, the file is created.

Repeat the previous exercise, using C++

Low-level Reading and Writing

Using low-level (system) calls, I/O can only be accomplished via a buffer, whose size is determined by the user.

small buffer requires more device accesses, slowing program; some systems override with their own buffer

Commands: read(<file descriptor>,<address of buffer>,<# bytes to read>) write(<file descriptor>,<address of buffer>,<# bytes to read>)

value returned is # bytes read; 0 => eof (read only), -1 => error

Moving Around in A File Low-level access via File Descriptor

lseek(<file descriptor>,<amount>,<relativity>)where <file descriptor> is a file descriptor of an open file <amount> is the number of bytes to move the file descriptor... <relativity> is the point from which the operation takes place

SEEK_SET (0) is start of file SEEK_CUR (1) is current position SEEK_END (2) is end of file

if <amount> is positive with this option, there will be a hole in the file.

the return value is the new offset in the file

Exercises

Give an lseek() command to find the present offset in a file How is the present offset in a file found when the file is

accessed via a file pointer? Give a loop to write the values 1-100 in the same spot of a

text file. Why is the seek command basically worthless in a text file,

except to go to the beginning or end of the file?

Program Creation and Execution

exec family of system calls using fork() to create processes process termination waiting for processes

exec Family of System Calls

Execute a program using one of the exec calls Six exec functions Can provide a list and have function build argv[] vector or provide the

vector itself Can specify path to program or use path variable to find it Can use present environment, or specify new environment

environment defined by setenv items. Type set at Unix prompt to see environment

called program occupies environment of caller (takes its pid, as new process isn’t created). Only replaces text, data, heap, and stack segments of caller of exec

execs are the only function calls that don’t return

execl & execv

execl give pathname to file (incl. name) as 1st argument Follow with list of command line arguments; executable 1st

this list is used to build the argv vector passed to the called program executable name is in argv[0]

last argument must be null char *

execv give pathname to file (incl. name) as 1st argument Follow with null-terminated argv vector

execlp & execvp

execlp give filename as 1st argument; path environment variable is

used to find executable Follow with list of command line arguments; executable 1st

this list is used to build the argv vector passed to the called program

last argument must be null char *

execvp give filename as 1st argument; path used Follow with null-terminated argv vector

Exercise

Suppose a program X is executed whose purpose is to call another program Y whose name and command-line arguments are passed in to X as command-line arguments.

1. Write an execlp and an execvp command, as they would appear in X, to execute Y.

2. What problem could arise if we choose to use execl or execv?

Process Creationfork & vfork

Processes can only be created using a fork command

fork is called once, but returns twice parent calls fork, value is returned to each of parent and

child

created process is called the child existing process is the parent Process identifier (pid) is used to distinguish

between parent and child Every process has a unique pid in the system

fork

Called once, returns twice; return type is pid_t, unsigned int type for pids

Parent receives child’s pid as return value of fork

child receives 0 as return value from fork Order of execution depends on scheduling algorithm of system.

Order can be controlled; important if there is dependence between parent and child

Child and parent execute in their own environments Child gets copy of parent’s environment (including data space) as it is at

time of fork If parent has file open, child gets copy of file pointers or descriptors If parent’s stdout (cout) is redirected before fork, child output to stdout is also

redirected

Child Process’ Inheritance

user id, group id controlling terminal current working directory root directory file mode creation mask signal mask (later) and dispositions close-on-exec flag for open file descriptors environment attached shared memory segments resource limits

Items That DistinguishParent From Child

Return value from fork unique pid different parent pid (ppid)

child’s ppid is the parent’s pid parent’s pid stays same

certain time values reset in child parent’s file locks aren’t inherited pending alarms are not inherited pending signals in parent aren’t inherited

vfork

Maintained for compatability, vfork is not often used Some systems implement it as a fork

Child executes first. Parent doesn’t resume until child either exits (via call of exit() or _exit()) or uses exec to start another program.

The child is a separate process, but it shares the parent’s data space. vforkdemo.c shows that this is the case

vforkdemo.c#include <stdio.h>#include <sys/types.h>#include <unistd.h>

int glob ;intmain(void){ int var; pid_t pid; glob = 10; var = 20; printf("before vfork\n");

if (vfork()== 0 ) { sleep(1); glob++; var++; printf("child glob= %d var= %d\n", glob, var); _exit(0); // what if this is just exit()? } else { printf("parent glob= %d var= %d\n", glob, var); }}

•Parent output is same as child’s

•If separate data spaces, parent wouldn’t experience ++ operations

When a process terminates...

Its parent is informed via a SIGCHLD signal If its parent doesn’t “collect” it, it becomes a zombie

process (next slide) If its parent has already terminated, it is inherited by the init

process (pid 1), which “collects” it using a wait operation (2 slides hence).

All processes must have an identifiable (by pid) parent

It is able to provide minimal information to its parent when it comes to “collect” it

Zombie Process

A zombie is a process that has terminated, but hasn’t been “collected” by its parent

The system frees most of the resources associated with this terminated process, e.g. closing open files and freeing memory,and minimally maintains: process id CPU time used termination status

wait operations

A terminated process’ information is “collected” via a call of a wait operation by its parent

Operations: wait

blocks the calling process until a child process terminates, returning the child’s pid. If caller has no children, wait immediately returns –1 (error)

the termination status (return value) of the child may be obtained via the argument

waitpid permits a caller to wait for a particular child, identified by its pid can be configured to not block if the child is active

Process Termination

Normal:

1. main contains return statement

2. call exit

3. call _exit

Abnormal:

1. Process aborts itself

2. Process receives a signal (e.g. division by 0)

exit() vs._exit()

exit() causes a process to terminate. stdio funtion fclose is applied to all open streams the argument is the exit status

exit status becomes termination status when exit() calls _exit()

_exit is a Unix specific command the process terminates immediately without any cleanup

see vfork examples

If a process doesn’t explicitly specify exit status via a call to exit() or _exit(), the process’ termination status will be undefined

Exercises

1. What can happen if a process calls exit while writing to a buffered output stream?

2. Write code guaranteed to create a zombie process.

3. Write code that will cause a process to be inherited by the init process.

02 fundamentals

Education

Transcript of 02 fundamentals