Linux Internals Training - Minh, Inc · Day 1 Morning Lecture - Introduction to Linux GNU...


Overview
- Understanding virtual memory, process concepts, IPC, and the EXT2 file system
- Understanding shell programming
- Understanding the boot process
- Understanding cross compilation and installing Linux on embedded hardware
- Understanding application development for embedded systems

Duration
Five days, 40 hours (8 hours a day); 50% lecture, 50% practical labs.

Trainer
http://www.linkedin.com/in/pravinkumarsinha

Audience
- Professional software developers
- People supporting embedded and medium-scale products

Prerequisite
Knowledge of C programming is required; all examples are provided in the C programming language.
C training slides can be browsed at http://www.minhinc.com/training/c/advance-c-slides.php
The PDF document can be downloaded from http://www.minhinc.com/training/advance-c-slides.pdf

Setup
- Ubuntu 16.0x LTS
- Raspberry Pi 3

Linux Internals Training, 5-day session

© www.minhinc.com p1

Day 1 Morning

Lecture - Introduction to Linux
- GNU Project/GPL Licensing
- Evolution of Linux & Development Model
- Device Identities in Linux - Partitioning Scheme
- /dev files
- Major and minor device numbers
- mknod system call

Lecture - Introduction to Kernel
- History of Linux
- Types of Kernel
- The Linux kernel
- Kernel Architecture

Lecture - Shell commands & Shell
- Basic Shell commands
- Bash Shell Essentials
  - Introduction
  - Process
  - Redirection
  - Shell Programming
  - Programming Commands
  - Advance Shell Programming
    - Function
    - Array
    - I/O Redirection and file descriptor
    - Local and Global variables
    - Conditional Execution
- Creating Makefiles

Lecture
The lecture session is a presentation of the course content by the trainer. Any source code example related to the topic will be demonstrated, including execution of the binaries. The complete lecture material can be downloaded from http://www.minhinc.com/training/advance-li-slides.pdf

Labs
The lecture session is a presentation of the course content by the trainer. Any source code example related to the topic will be demonstrated, including execution of the binaries.

Day 1 Afternoon

Lab
- mknod
- Write a Makefile to compile a file
- Create a static library using a Makefile
- Create a dynamic library using a Makefile
- Write an application using the static library and the dynamic library generated

Day 2 Morning

Lecture - Creating Libraries
- Creating Static Library
  - Using Static Library
- Creating Shared Library
  - Using Shared Library

Lecture - The Boot Process
- BIOS Level
- Boot Loader
- Setup
- startup_32 functions
- The start_kernel() function

Day 2 Afternoon

Lab
- Cross compiling kernel
- Board Support packages

Day 3 Morning

Lecture - The File System
- Virtual File system & its role
- Files associated with a process
- proc file system
- System Calls

Lecture - Process Management
- Process Defined
- Process Descriptor Structures in the kernel
- Process States
- Process Scheduling
- Process Creation
- System calls related to process management

Lecture - Memory Management
- Defining and Creating secondary memory areas
- Memory allocation & deallocation calls: malloc, calloc, alloca, free
- Demand Paging defined
- Process Organization in Memory
- Address Translation and page fault handling
- Virtual Memory Management

Day 3 Afternoon

Lab
- Implement late binding
- Create hard link
- Create soft link
- Write a program to enumerate the stat structure for both the hard link and the soft link. Illustrate which field is different
- Create a child process and validate that all open descriptors are copied to the child process as well.
  - Seek in the file from the parent and check that the child's descriptor offset has moved too.

Day 4 Morning

Lecture - Multi Thread Programming
- Creating multiple threads
- Parent synchronization with other Thread

Lecture - Inter process communication
- Pipes
- FIFOs
- Signals
- System-V IPCs
  - Message queues
  - Shared memory
  - Semaphores

Lecture - Sockets
- An Overview
- System calls related to TCP and UDP socket

Day 4 Afternoon

Lab
- Write a multi threaded application and check if global variables are shared.
  - Protect them using semaphores

Day 5 Morning

Lecture - Network Programming
- TCP Server Client Programming
- UDP Server Client Programming
- Netlink socket interface

Lecture - Programming and Debugging tools
- strace: Tracing System calls
- ltrace: Tracing Library calls
- Tools used to detect memory access errors and memory leakage in Linux: mtrace
- Using gdb and ddd utilities
- Core dump Analysis etc.

Lecture - Device Driver Introduction
- Introduction
- Kernel modules
- Character device drivers
- Block device drivers
- Hardware and Interrupt Handling

Day 5 Afternoon

Lab

Linux Internals Essentials - Training Course

Minh, Inc.

DISCLAIMER

Text of this document is written in Bembo Std Otf (13 pt) font.
Code parts are written in Consolas (10 pt) font.

This training material is provided through Minh, Inc., B'lore, India.
Pdf version of this document is available at http://www.minhinc.com/training/advance-li-slides.pdf
For suggestion(s) or complaint(s) write to us at [email protected]

Document modified on Sep-30-2019

Document contains 100 pages.

Day 1 Morning

1. Introduction to Linux

GNU Project/GPL Licensing

a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any changes.
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)

Evolution of Linux & Development Model

* 1991: The Linux kernel is publicly announced on 25 August by the 21-year-old Finnish student Linus Benedict Torvalds.[13]
* 1992: The Linux kernel is relicensed under the GNU GPL. The first Linux distributions are created.
* 1993: Over 100 developers work on the Linux kernel. With their assistance the kernel is adapted to the GNU environment, which creates a large spectrum of application types for Linux. The oldest currently (as of 2015) existing Linux distribution, Slackware, is released for the first time. Later in the same year, the Debian project is established. Today it is the largest community distribution.
* 1994: Torvalds judges all components of the kernel to be fully matured: he releases version 1.0 of Linux. The XFree86 project contributes a graphical user interface (GUI). Commercial Linux distribution makers Red Hat and SUSE publish version 1.0 of their Linux distributions.
* 1995: Linux is ported to the DEC Alpha and to the Sun SPARC. Over the following years it is ported to an ever greater number of platforms.
* 1996: Version 2.0 of the Linux kernel is released. The kernel can now serve several processors at the same time using symmetric multiprocessing (SMP), and thereby becomes a serious alternative for many companies.
* 1998: Many major companies such as IBM, Compaq and Oracle announce their support for Linux. The Cathedral and the Bazaar is first published as an essay (later as a book), resulting in Netscape publicly releasing the source code to its Netscape Communicator web browser suite. Netscape's actions and crediting of the essay[50] bring Linux's open source development model to the attention of the popular technical press. In addition, a group of programmers begins developing the graphical user interface KDE.
* 1999: A group of developers begins work on the graphical environment GNOME, destined to become a free replacement for KDE, which at the time depends on the then proprietary Qt toolkit. During the year IBM announces an extensive project for the support of Linux.
* 2000: Dell announces that it is now the No. 2 provider of Linux-based systems worldwide and the first major manufacturer to offer Linux across its full product line.
* 2002: The media reports that "Microsoft killed Dell Linux".[52]
* 2004: The XFree86 team splits up and joins with the existing X standards body to form the X.Org Foundation, which results in a substantially faster development of the X server for Linux.
* 2005: The project openSUSE begins a free distribution from Novell's community. Also the project OpenOffice.org introduces version 2.0, which starts supporting OASIS OpenDocument standards.
* 2006: Oracle releases its own distribution of Red Hat Enterprise Linux. Novell and Microsoft announce cooperation for better interoperability and mutual patent protection.
* 2007: Dell starts distributing laptops with Ubuntu pre-installed on them.
* 2009: Red Hat's market capitalization equals Sun's, interpreted as a symbolic moment for the "Linux-based economy".[53]
* 2011: Version 3.0 of the Linux kernel is released.
* 2012: The aggregate Linux server market revenue exceeds that of the rest of the Unix market.[54]
* 2013: Google's Linux-based Android claims 75% of the smartphone market share, in terms of the number of phones shipped.[55]
* 2014: Ubuntu claims 22,000,000 users.[56]
* 2015: Version 4.0 of the Linux kernel is released.

Device Identities in Linux - Partitioning Schema

Devices come in two flavours:
- A character device represents a hardware device that reads or writes a serial stream of data bytes. Examples are serial and parallel ports, tape drives, terminal devices, and sound cards.
- A block device represents a hardware device that reads or writes data in fixed-size blocks. Unlike a character device, a block device provides random access to the data stored on the device. A disk drive is an example of a block device.

Linux identifies devices using two numbers: the major device number and the minor device number.

The major device number generally identifies a driver, whereas the minor number identifies an individual device controlled by that driver, so the actual device is identified by the major:minor combination. A device can be a master or a slave: masters are identified with minor numbers 1, 2, 3, ... and slaves with 65, 66, 67, ... (as with the partitions of the master and slave disks on a legacy IDE channel).

For each device there is a device file, or device entry, in the file system. Commands such as cp, rm, and mv work on a device file as on a regular file; data is transferred to and from the actual device through the device driver. Use mknod to create the file entry for a device.

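$ mknod ./lp0 c 6 0
lp0 - path to the device file
c - character device (b for block device)
6 - major device number (driver ID)
0 - minor (master) device number

$ ls -l lp0
crw-r----- 1 root root 6, 0 Mar 7 17:03 lp0

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major(), minor() */

int main(void)
{
    struct stat st;

    if (stat("lp0", &st) != 0) { /* read the device file's metadata */
        perror("stat");
        return 1;
    }
    /* S_ISCHR/S_ISBLK test the file type bits in st_mode */
    printf("file type: %s\n", S_ISCHR(st.st_mode) ? "character" :
           S_ISBLK(st.st_mode) ? "block" : "other");
    /* st_rdev holds the device number of a device special file */
    printf("major device number: %u\n", major(st.st_rdev));
    printf("minor device number: %u\n", minor(st.st_rdev));
    return 0;
}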
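The same major:minor information can also be read from the shell without creating a device file. The sketch below is an illustration rather than part of the course material: it assumes the GNU coreutils stat command with its -c format option, and it inspects /dev/null only because that device exists on practically every Linux system. The variable names are made up for the example.

```shell
# /dev/null is used because it exists on practically every Linux system.
dev_path=/dev/null

# GNU stat format sequences: %F = file type, %t = major number (hex),
# %T = minor number (hex).
dev_type=$(stat -c '%F' "$dev_path")
dev_major=$(stat -c '%t' "$dev_path")
dev_minor=$(stat -c '%T' "$dev_path")

echo "$dev_path: $dev_type, major $dev_major, minor $dev_minor"
```

Note that stat reports the numbers in hexadecimal, while ls -l prints them in decimal.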

2. Introduction to Kernel

History
- UNIX: 1969 Thompson & Ritchie, AT&T Bell Labs.
- BSD: 1978 Berkeley Software Distribution.
- Commercial Vendors: Sun, HP, IBM, SGI, DEC.
- GNU: 1984 Richard Stallman, FSF.
- POSIX: 1986 IEEE Portable Operating System unIX.
- Minix: 1987 Andy Tanenbaum.
- SVR4: 1989 AT&T and Sun.
- Linux: 1991 Linus Torvalds, Intel 386 (i386).
- Open Source: GPL.

Linux Features
- UNIX-like operating system
- Features:
  - Preemptive multitasking.
  - Virtual memory (protected memory, paging).
  - Shared libraries.
  - Demand loading, dynamic kernel modules.
  - Shared copy-on-write executables.
  - TCP/IP networking.
  - SMP support.
  - Open source.

What's a Kernel?
- AKA: executive, system monitor.
- Controls and mediates access to hardware.
- Implements and supports fundamental abstractions:
  - Processes, files, devices etc.
- Schedules / allocates system resources:
  - Memory, CPU, disk, descriptors, etc.
- Enforces security and protection.
- Responds to user requests for service (system calls).
- Etc...

Kernel Design Goals
- Performance: efficiency, speed.
  - Utilize resources to capacity with low overhead.
- Stability: robustness, resilience.
  - Uptime, graceful degradation.
- Capability: features, flexibility, compatibility.
- Security, protection.
  - Protect users from each other & system from bad users.
- Portability.
- Extensibility.

Kernel Modules

Types of Kernel
- Monolithic.
- Layered.
- Modularized.
- Micro-kernel.
- Virtual machine.

A monolithic kernel is a kernel where all services (file system, VFS, device drivers, etc.) as well as core functionality (scheduling, memory allocation, etc.) form a tight-knit group sharing the same address space. This is the direct opposite of a microkernel.

In a monolithic kernel architecture the entire operating system runs in kernel space, in supervisor mode. Unlike other architectures, the monolithic kernel alone defines a high-level virtual interface over the computer hardware, with a set of primitives or system calls that implement all operating system services such as process management, concurrency, and memory management, plus one or more device drivers as modules.

A microkernel prefers an approach where core functionality is isolated from system services and device drivers (which are basically just system services). For instance, VFS (virtual file system) and block device file systems (e.g. minixfs) are separate processes that run outside the kernel's address space, using IPC to communicate with the kernel, other services, and user processes. In short, what is a module in Linux is a service in a microkernel, that is, an isolated process.

Recent versions of Windows on the other hand use a Hybrid kernel.

A hybrid kernel is a kernel architecture based on combining aspects of microkernel and monolithic kernel architectures used in computer operating systems. The category is controversial due to the similarity to monolithic kernel; the term has been dismissed by some as simple marketing. The traditional kernel categories are monolithic kernels and microkernels (with nanokernels and exokernels seen as more extreme versions of microkernels).

The Linux Kernel-Monolithic


Linux Source Tree

linux/arch
- Subdirectories for each current port.
- Each contains kernel, lib, mm, boot and other directories whose contents override code stubs in architecture-independent code.
- lib directory contains highly optimized common utility routines such as memcpy, checksums, etc.
- arch directory as of 2.4:
  - alpha, arm, i386, ia64, m68k, mips, mips64.
  - ppc, s390, sh, sparc, sparc64.

linux/drivers
- Largest amount of code in the kernel tree (~1.5M).
- device, bus, platform and general directories.
- drivers/char - n_tty.c is the default line discipline.
- drivers/block - elevator.c, genhd.c, linear.c, ll_rw_blk.c, raidN.c.
- drivers/net - specific drivers and general routines Space.c and net_init.c.
- drivers/scsi - scsi_*.c files are generic; sd.c (disk), sr.c (CD-ROM), st.c (tape), sg.c (generic).
- General:
  - cdrom, ide, isdn, parport, pcmcia, pnp, sound, telephony, video.
- Buses - fc4, i2c, nubus, pci, sbus, tc, usb.
- Platforms - acorn, macintosh, s390, sgi.

linux/fs
- Contains:
  - virtual filesystem (VFS) framework.
  - subdirectories for actual filesystems.
- vfs-related files:
  - exec.c, binfmt_*.c - files for mapping new process images.
  - devices.c, blk_dev.c - device registration, block device support.
  - super.c, filesystems.c.
  - inode.c, dcache.c, namei.c, buffer.c, file_table.c.
  - open.c, read_write.c, select.c, pipe.c, fifo.c.
  - fcntl.c, ioctl.c, locks.c, dquot.c, stat.c.

linux/include
- include/asm-*:
  - Architecture-dependent include subdirectories.
- include/linux:
  - Header info needed both by the kernel and user apps.
  - Usually linked to /usr/include/linux.
  - Kernel-only portions guarded by #ifdefs:
    #ifdef __KERNEL__
    /* kernel stuff */
    #endif
- Other directories:
  - math-emu, net, pcmcia, scsi, video.

linux/init
- Just two files: version.c, main.c.
- version.c - contains the version banner that prints at boot.
- main.c - architecture-independent boot code.
- start_kernel is the primary entry point.

linux/ipc
- System V IPC facilities.
- If disabled at compile-time, util.c exports stubs that simply return -ENOSYS.
- One file for each facility:
  - sem.c - semaphores.
  - shm.c - shared memory.
  - msg.c - message queues.

linux/kernel
- The core kernel code.
- sched.c - "the main kernel file":
  - scheduler, wait queues, timers, alarms, task queues.
- Process control:
  - fork.c, exec.c, signal.c, exit.c etc...
- Kernel module support:
  - kmod.c, ksyms.c, module.c.
- Other operations:
  - time.c, resource.c, dma.c, softirq.c, itimer.c.
  - printk.c, info.c, panic.c, sysctl.c, sys.c.

linux/lib
- kernel code cannot call standard C library routines.
- Files:
  - brlock.c - "Big Reader" spinlocks.
  - cmdline.c - kernel command line parsing routines.
  - errno.c - global definition of errno.
  - inflate.c - "gunzip" part of gzip.c used during boot.
  - string.c - portable string code.
    - Usually replaced by optimized, architecture-dependent routines.
  - vsprintf.c - libc replacement.

linux/mm
- Paging and swapping:
  - swap.c, swapfile.c (paging devices), swap_state.c (cache).
  - vmscan.c - paging policies, kswapd.
  - page_io.c - low-level page transfer.
- Allocation and deallocation:
  - slab.c - slab allocator.
  - page_alloc.c - page-based allocator.
  - vmalloc.c - kernel virtual-memory allocator.
- Memory mapping:
  - memory.c - paging, fault-handling, page table code.
  - filemap.c - file mapping.
  - mmap.c, mremap.c, mlock.c, mprotect.c.

linux/scripts
- Scripts for:
  - Menu-based kernel configuration.
  - Kernel patching.
  - Generating kernel documentation.

3. Shell commands & Shell

Shell structure
Shell scripting has four components:
1) Kernel
2) Shell Process
3) Command Process
4) Redirectors, Pipes, Filters etc.

The kernel does:
- I/O management
- Process management
- File management
- Memory management

 --------           ---------------           ----------
 | User |  ------>  | Linux Shell |  ------>  | Kernel |
 --------           ---------------           ----------
                           |
                           V
                  -------------------
                  | command process |
                  -------------------

Shells

NOTE: To find your shell, type the following command:
$ echo $SHELL

Linux Common Commands

$ date --help
$ ls --help | more
Syntax: command-name --help
Syntax: man command-name
Syntax: info command-name
$ man ls
$ info bash

NOTE: In MS-DOS, you get help by using the /? switch or by typing the help command:
C:\> dir /?
C:\> date /?
C:\> help time
C:\> help date
C:\> help

Linux Command
$ date
$ who
$ pwd
$ ls
$ cat > myfile
$ more myfile
$ mv sales
$ ln Page1 Book1
$ rm myfile
$ rm -rf oldfiles
$ chmod u+x,g+wx,o+x myscript
$ mail
$ who am i
$ logout
$ mail ashish
$ wc myfile
$ grep fox
$ sort myfile
$ tail -n +5 myfile
$ cmp myfile
$ pr myfile

Process
A process is a running instance of a program (for example, a command given by the user) that performs some job. When a process starts, Linux assigns it a number called the PID (process ID). PIDs are positive integers; the upper bound is not fixed at 65535 but is configurable through /proc/sys/kernel/pid_max (commonly 32768 by default).
$ ls -lR
is a command, or request, to list the files in your current directory and all of its subdirectories.

Why Process required
Linux is a multi-user, multitasking OS, which means you can run more than one process at the same time if you wish. For example, to find how many files you have on your system you may give a command like
$ ls / -R | wc -l
This command takes a long time to search all the files on your system, so you can run it in the background by giving the command
$ ls / -R | wc -l &
The ampersand (&) at the end of the command tells the shell to start the command (ls / -R | wc -l), run it in the background, and accept the next command immediately. An instance of a running command is called a process, and the number printed by the shell is called the process ID (PID). This PID can be used to refer to that specific running process.
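As a minimal sketch of the background-job mechanics described above, the snippet below uses sleep 1 as a stand-in for a slow command such as the recursive ls. The variable names bg_pid and bg_status are made up for the example; $! and wait are standard POSIX shell features.

```shell
# Start a short background job; 'sleep 1' stands in for a slow command.
sleep 1 &
bg_pid=$!            # $! holds the PID of the most recent background job
echo "started background job with PID $bg_pid"

# The shell is free to run other commands while the job is running.
echo "doing other work in the foreground"

# wait blocks until that PID finishes and returns the job's exit status.
wait "$bg_pid"
bg_status=$?
echo "job $bg_pid finished with status $bg_status"
```

The PID saved from $! is the same kind of number the interactive shell prints when a command is started with &.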

Redirection of Standard output/input or Input - Output redirection
(1) > Redirector Symbol (Truncate to zero and write)
Syntax: Linux-command > filename
$ ls > myfiles
(2) >> Redirector Symbol (Append)
Syntax: Linux-command >> filename
$ date >> myfiles
(3) < Redirector Symbol
Syntax: Linux-command < filename
Takes the input for Linux-command from a file instead of the keyboard. For example, to take the input for the cat command from a file, give
$ cat < myfiles
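The three redirectors can be tried safely in a scratch directory. This is only a sketch: the directory comes from mktemp (assumed to be available, as it is on typical Linux systems), and the variable names are illustrative.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  > myfiles     # > creates the file or truncates it to zero length
echo "second" > myfiles     # the earlier content is replaced
echo "third" >> myfiles     # >> appends to the end of the file

line_count=$(wc -l < myfiles)   # < feeds the file to wc on standard input
contents=$(cat < myfiles)
echo "myfiles has $line_count lines:"
echo "$contents"
```

Because the second echo truncates the file, only "second" and "third" remain at the end.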

Pipes
A pipe is a way to connect the output of one program to the input of another program without any temporary file.

A pipe is a temporary in-memory buffer through which the output of one command is passed as the input of the next command. Pipes are used to run two or more commands from the same command line.
Syntax: command1 | command2
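To make the "no temporary file" point concrete, the sketch below computes the same result twice, once through an intermediate file and once through a pipe. The directory contents and variable names are invented for the illustration, and mktemp is assumed to be available.

```shell
# Build a small directory with known contents.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.log"

# With a temporary file: ls must finish before grep can start.
tmpfile=$(mktemp)
ls "$demo_dir" > "$tmpfile"
count_with_file=$(grep -c 'txt' "$tmpfile")
rm -f "$tmpfile"

# With a pipe: same result, no intermediate file to create or clean up.
count_with_pipe=$(ls "$demo_dir" | grep -c 'txt')

echo "via file: $count_with_file, via pipe: $count_with_pipe"
```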

Filter
A filter is a command that takes its input from a pipe (or standard input), transforms or restricts it, and writes the result to standard output.
$ tail -n +20 < hotel.txt | head -n 30 > hlist
Here head is a filter that takes its input from the tail command (tail starts selecting from line number 20 of the given file, hotel.txt) and passes on at most 30 of those lines; the output of head is redirected to the file 'hlist'.
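The line selection performed by this tail/head combination can be checked without a real hotel.txt. In the sketch below, seq (assumed to be available, as on typical Linux systems) stands in for the file by printing the numbers 1 to 100, one per line, so each line's content equals its line number.

```shell
# seq stands in for hotel.txt: it prints the numbers 1..100, one per line.
selected=$(seq 1 100 | tail -n +20 | head -n 30)

# Inspect what survived the two filters.
first_line=$(echo "$selected" | head -n 1)
last_line=$(echo "$selected" | tail -n 1)
selected_count=$(echo "$selected" | wc -l)

echo "kept $selected_count lines, from $first_line to $last_line"
```

Starting at line 20 and keeping 30 lines yields lines 20 through 49.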

Introduction to Shell Programming
A shell program is a series of Linux commands.

Variables in Linux
To process data, the shell process keeps values in variables. There are two types:

1) System variables - created and maintained by Linux itself. These variables are written in CAPITAL LETTERS.
2) User defined variables (UDV) - created and maintained by the user. By convention these variables are written in lower-case letters.

$ echo $USERNAME
$ echo $HOME
Caution: Do not modify system variables; doing so can sometimes create problems.

User Defined Variable

Syntax: variablename=value
NOTE: Here 'value' is assigned to the given 'variablename'. The value must be on the right side of the = sign, and there must be no spaces around the = sign. For example:
$ no=10 # this is ok
$ 10=no # Error, NOT ok: the value must be on the right side of the = sign
To define a variable called 'vech' having the value Bus:
$ vech=Bus
To define a variable called n having the value 10:
$ n=10

You can define a NULL variable as follows (a NULL variable is a variable that has no value at the time of definition). For example:
$ vech=
$ vech=""
Try to print its value with $ echo $vech. Nothing will be shown, because the variable has no value, i.e. it is a NULL variable.

To print or access a variable use the following syntax
Syntax: $variablename
For example, to print the contents of the variable 'vech':
$ echo $vech
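The definitions above can be combined into a short script. This is a sketch only; the names empty and empty_length are made up for the example, and ${#name} is the standard POSIX expansion for the length of a value.

```shell
vech=Bus          # no spaces around '='
n=10
empty=            # a "NULL" variable: defined, but with an empty value

echo "vech is $vech and n is $n"
echo "empty is '$empty'"

# ${#name} expands to the length of the value; 0 confirms the variable is empty.
empty_length=${#empty}
echo "length of empty: $empty_length"
```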

How to Run Shell Scripts
(1) Use the chmod command as follows to give execute permission to the script
Syntax: chmod +x shell-script-name
OR
Syntax: chmod 777 shell-script-name
(Mode 777 also lets every user modify the script, so chmod +x is usually the safer choice.)
(2) Run the script as
Syntax: ./your-shell-program-name
For example:
$ ./first
OR
Syntax: /bin/sh your-shell-program-name
For example:
$ bash first
$ /bin/sh first

To run the script by name, either give its complete path or add its directory to the PATH variable.
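The steps can be walked through end to end with a throwaway script. The sketch below assumes mktemp is available and that the temporary directory allows execution; the script name first follows the text, while script_dir and the out_* variables are invented for the illustration.

```shell
# Create a tiny script called 'first' in a scratch directory.
script_dir=$(mktemp -d)
cat > "$script_dir/first" <<'EOF'
#!/bin/sh
echo "Hello from first"
EOF

chmod +x "$script_dir/first"              # (1) give execute permission
out_direct=$("$script_dir/first")         # (2) run it by giving its path
out_sh=$(/bin/sh "$script_dir/first")     # or hand it to an interpreter

PATH="$script_dir:$PATH"                  # with its directory on PATH...
out_path=$(first)                         # ...the bare name is enough

echo "$out_direct / $out_sh / $out_path"
```

Passing the file to /bin/sh works even without execute permission, because the interpreter only needs to read the script.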

Commands Related with Shell Programming
(1) echo [options] [string, variables...]
Displays text or the value of variables on screen.
Options
-n  Do not output the trailing new line.
-e  Enable interpretation of the following backslash-escaped characters in the strings:
\a  alert (bell)
\b  backspace
\c  suppress trailing new line
\n  new line
\r  carriage return
\t  horizontal tab
\\  backslash
For example:
$ echo -e "An apple a day keeps away \a\t\tdoctor"

(2) More about Quotes
There are three types of quotes:
" i.e. Double quotes
' i.e. Single quotes
` i.e. Back quote
1. "Double quotes" - Characters enclosed in double quotes lose their special meaning (except \, $, and `).
2. 'Single quotes' - Text enclosed in single quotes remains unchanged.
3. `Back quote` - Executes the enclosed command and substitutes its output.
For example:
$ echo "Today is date"
This cannot print the message with today's date; it prints the word date literally.
$ echo "Today is `date`"
Now it prints today's date, as in: Today is Tue Jan ....
Note that the `date` statement uses back quotes (see also the Shell Arithmetic NOTE).
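The three quoting styles can be compared side by side. In this sketch the variable names are illustrative, and the $(...) form is shown only as the standard POSIX equivalent of back quotes; it is not mentioned in the course text.

```shell
name=Linux

double="Kernel: $name"          # double quotes: $name is expanded
single='Kernel: $name'          # single quotes: the text is kept literally
backquoted="Year: `date +%Y`"   # back quotes: the command's output is substituted
modern="Year: $(date +%Y)"      # $(...) is the equivalent POSIX form

echo "$double"
echo "$single"
echo "$backquoted"
```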

(3) Shell Arithmetic
Use expr to perform arithmetic operations. For e.g.
$ expr 1 + 3
$ expr 2 - 1
$ expr 10 / 2
$ expr 20 % 3    # remainder, read as 20 mod 3; the remainder is 2
$ expr 10 \* 3   # multiplication; use \* and not * since * is a wild card
$ echo `expr 6 + 3`
For the last statement note the following points:
1) Before the expr keyword we used the ` (back quote) sign, not the ' (single quote) sign. The back quote is generally found on the key under tilde (~) on PC keyboards, or above the TAB key.
2) The expr expression also ends with ` i.e. back quote.
3) Here expr 6 + 3 is evaluated to 9, then the echo command prints 9 as the sum.
4) If you use double quotes or single quotes, it will NOT work. For e.g.
$ echo "expr 6 + 3"   # It will print expr 6 + 3
$ echo 'expr 6 + 3'   # It will print expr 6 + 3
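Each expr call above starts an external program. As a minimal sketch, assuming a POSIX shell such as bash or dash, the same sums can also be computed with the built-in $(( )) arithmetic expansion. The function names add_expr and add_builtin are illustrative only.

```shell
#!/bin/sh
# add_expr A B: sum two integers with the external expr command.
# $( ) is command substitution, equivalent to the back quotes used above.
add_expr() {
    echo "$(expr "$1" + "$2")"
}

# add_builtin A B: the same sum with built-in arithmetic expansion.
add_builtin() {
    echo "$(( $1 + $2 ))"
}

add_expr 6 3
add_builtin 6 3
echo "$(( 10 * 3 ))"    # inside $(( )) the * needs no backslash
```

Arithmetic expansion avoids both the extra process and the need to escape * from the shell.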

Command Line arguments
$ myshell foo bar
Here $0 is the script name (myshell), $1 is foo, $2 is bar, $# is the number of arguments (2) and $* expands to all arguments. Inside a function, $1, $2, $# and $* refer to the function's arguments instead.
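The positional variables behave the same way for scripts and for functions. The sketch below, with the hypothetical function name show_args, prints the argument count and then each argument on its own line.

```shell
#!/bin/sh
# show_args ARG...: print how many arguments were given, then each one.
show_args() {
    echo "count: $#"
    for arg in "$@"; do      # "$@" keeps arguments containing spaces intact
        echo "arg: $arg"
    done
}

show_args foo bar
```

Quoting "$@" matters: show_args foo "bar baz" still reports two arguments.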

Exit Status
By default in Linux, when a command or shell script is executed it returns one of two kinds of value:
- If the return value is zero (0), the command was successful.
- If the return value is nonzero (>0), the command was not successful or some error occurred while executing it.
This value is known as the Exit Status of that command. To determine the exit status use the $? variable of the shell. For e.g.
$ rm unknow1file
rm: cannot remove 'unknow1file': No such file or directory
If you then give the command
$ echo $?
it will print a nonzero value (>0) to indicate the error. Now give the commands
$ ls
$ echo $?
It will print 0 to indicate the command was successful.
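In a script the exit status is usually captured right after the command, because the next command overwrites $?. A minimal sketch, where run_quietly is a made-up helper name:

```shell
#!/bin/sh
# run_quietly CMD ARG...: run a command with its output discarded,
# print its exit status and return that same status to the caller.
run_quietly() {
    status=0
    "$@" > /dev/null 2>&1 || status=$?   # remember the status before it is lost
    echo "exit status: $status"
    return "$status"
}

run_quietly true
run_quietly false || echo "the command failed"
```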

Day 1 Morning

3. Shell commands & Shell
Basic Shell commands
Bash Shell Essentials
- Introduction
- Process
- Shell Programming
- Programming commands
- Advance Shell Programming
  - Function
  - Array
  - I/O Redirection and file descriptor
  - Local and Global variables
  - Conditional Execution
Creating Makefiles

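if-then-fi for decision making in shell scripts
First see how bc evaluates expressions. Start it with
$ bc
then type 5 + 2 as follows:
5 + 2
7
7 is the response of bc, i.e. the sum of 5 + 2. You can even try
5 - 2
5 / 2
Now see what happens if you type 5 > 2 as follows:
5 > 2
1
bc prints 1 when a comparison is true and 0 when it is false (for example, 5 < 2 prints 0). Note that the shell uses the opposite convention for exit status: 0 means true (success) and nonzero means false.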

Syntax:
if condition
then
    command1 if condition is true or if exit status of condition is 0 (zero)
    ...
    ...
fi

$ cat > showfile
#!/bin/sh
#
# Script to print file
#
if cat $1
then
    echo -e "\n\nFile $1, found and successfully echoed"
fi

test command or [ expr ]
The test command or [ expr ] is used to see if an expression is true. If it is true it returns zero (0), otherwise it returns nonzero (>0) for false.
Syntax: test expression OR [ expression ]
Now write a script that determines whether the given argument is a positive number. Write the script as follows:
$ cat > ispositive
#!/bin/sh
#
# Script to see whether argument is positive
#
if test $1 -gt 0
then
    echo "$1 number is positive"
fi

Or

$ cat > ispositive
#!/bin/sh
#
# Script to see whether argument is positive
#
if [ $1 -gt 0 ]
then
    echo "$1 number is positive"
fi

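test or [ expr ] works with
1. Integers (numbers without a decimal point)
2. File types
3. Character strings

For mathematics use the following operators in shell scripts:
-eq  is equal to                  if [ 5 -eq 6 ]
-ne  is not equal to              if [ 5 -ne 6 ]
-lt  is less than                 if [ 5 -lt 6 ]
-le  is less than or equal to     if [ 5 -le 6 ]
-gt  is greater than              if [ 5 -gt 6 ]
-ge  is greater than or equal to  if [ 5 -ge 6 ]
NOTE: == is equal, != is not equal.

For string comparisons use:
string1 = string2    string1 is equal to string2
string1 != string2   string1 is not equal to string2
-n string1           string1 is not NULL
-z string1           string1 is NULL

The shell can also test for file and directory types:
-s file   file exists and is not empty
-f file   file exists and is a regular file
-d dir    directory exists
-w file   file is writable
-r file   file is readable
-x file   file is executable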
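The three kinds of test (integer, string, file) can be combined in one function. The sketch below is illustrative only: describe is a made-up name, and the integer check relies on -eq failing for non-numeric text.

```shell
#!/bin/sh
# describe VALUE: classify one argument using string, integer and file tests.
describe() {
    if [ -z "$1" ]; then                        # string test: empty?
        echo "empty string"
    elif [ "$1" -eq "$1" ] 2>/dev/null; then    # succeeds only for integers
        if [ "$1" -gt 0 ]; then
            echo "positive integer"
        else
            echo "zero or negative integer"
        fi
    elif [ -d "$1" ]; then                      # file test: directory?
        echo "directory"
    elif [ -f "$1" ]; then                      # file test: regular file?
        echo "regular file"
    else
        echo "other string"
    fi
}

describe 5
describe -3
describe /
describe hello
```

Quoting "$1" everywhere keeps the tests well-formed when the argument is empty or contains spaces.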

if...else...fi
If the given condition is true then command1 is executed, otherwise command2 is executed.
Syntax:
if condition
then
    command1 if condition is true or if exit status of condition is 0 (zero)
    ...
else
    command2 if condition is false or if exit status of condition is >0 (nonzero)
    ...
fi

$ cat > isnump_n
#!/bin/sh
#
# Script to see whether argument is positive or negative
#
if [ $# -eq 0 ]
then
    echo "$0 : You must give/supply one integer"
    exit 1
fi
if test $1 -gt 0
then
    echo "$1 number is positive"
else
    echo "$1 number is negative"
fi

Multilevel if-then-else
Syntax:
if condition
then
    condition is zero (true - 0)
    execute all commands up to elif statement
elif condition1
then
    condition1 is zero (true - 0)
    execute all commands up to elif statement
elif condition2
then
    condition2 is zero (true - 0)
    execute all commands up to else statement
else
    None of the above condition, condition1, condition2 are true (i.e. all of the above nonzero or false)
    execute all commands up to fi
fi

for loop
Syntax:
for { variable name } in { list }
do
    execute once for each item in the list until the list is finished
    (and repeat all statements between do and done)
done

Suppose,
$ cat > testfor
for i in 1 2 3 4 5
do
    echo "Welcome $i times"
done
Run it as,
$ chmod +x testfor
$ ./testfor

while loop
Syntax:
while [ condition ]
do
    command1
    command2
    command3
    ..
    ....
done

$ cat > nt1
#!/bin/sh
# Script to test while statement
if [ $# -eq 0 ]
then
    echo "Error - Number missing from command line argument"
    echo "Syntax : $0 number"
    echo " Use to print multiplication table for given number"
    exit 1
fi
n=$1
i=1
while [ $i -le 10 ]
do
    echo "$n * $i = `expr $i \* $n`"
    i=`expr $i + 1`
done
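The same multiplication table can be written without calling expr on every iteration. This is a sketch using POSIX arithmetic expansion; the function name table is an assumption for illustration.

```shell
#!/bin/sh
# table N: print the multiplication table of N from 1 to 10.
table() {
    n=$1
    i=1
    while [ "$i" -le 10 ]; do
        echo "$n * $i = $(( n * i ))"   # arithmetic done by the shell itself
        i=$(( i + 1 ))
    done
}

table 7
```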

The case Statement
The case statement is a good alternative to the multilevel if-then-else-fi statement. It enables you to match several values against one variable, and it is easier to read and write.
Syntax:
case $variable-name in
    pattern1) command
              ..
              command;;
    pattern2) command
              ..
              command;;
    patternN) command
              ..
              command;;
    *)        command
              ..
              command;;
esac

The $variable-name is compared against the patterns until a match is found. The shell then executes all the statements up to the two semicolons that are next to each other. The default is *) and it is executed if no match is found. For e.g. create the script as follows:
$ cat > car
#
# if no vehicle name is given
# i.e. -z "$1" is true: $1 is not defined or is NULL
#
if [ -z "$1" ]
then
    rental="*** Unknown vehicle ***"
elif [ -n "$1" ]
then
    # otherwise make first arg as rental
    rental=$1
fi
case $rental in
    "car")     echo "For $rental Rs.20 per k/m";;
    "van")     echo "For $rental Rs.10 per k/m";;
    "jeep")    echo "For $rental Rs.5 per k/m";;
    "bicycle") echo "For $rental 20 paisa per k/m";;
    *)         echo "Sorry, I can not get a $rental for you";;
esac

Save it by pressing CTRL+D, then run it:
$ chmod +x car
$ ./car van
$ ./car car
$ ./car Maruti-800
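Case patterns are shell patterns, so one branch can list alternatives with | or use wild cards. A small sketch, with the hypothetical function name vehicle_kind:

```shell
#!/bin/sh
# vehicle_kind NAME: classify a vehicle name with case patterns.
vehicle_kind() {
    case "$1" in
        car|van|jeep) echo "motor vehicle" ;;    # alternatives separated by |
        bi*)          echo "cycle" ;;            # wild card: anything starting with bi
        "")           echo "no vehicle given" ;;
        *)            echo "unknown: $1" ;;      # default branch
    esac
}

vehicle_kind van
vehicle_kind bicycle
vehicle_kind Maruti-800
```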

The read Statement
Used to get input from the keyboard and store it in variables.
Syntax: read variable1 variable2 ... variableN
Create the script as
$ cat > sayH
#
# Script to read your name from keyboard
#
echo "Your first name please:"
read fname
echo "Hello $fname, Lets be friend!"
Run it as follows
$ chmod +x sayH
$ ./sayH

Filename Shorthand or meta Characters (i.e. wild cards)

*, ? and [...] are such shorthand characters.

*  Matches any string or group of characters. For e.g.
   $ ls *       will show all files
   $ ls a*      will show all files whose name starts with the letter 'a'
   $ ls *.c     will show all files having extension .c
   $ ls ut*.c   will show all files having extension .c whose name begins with 'ut'
?  Matches any single character. For e.g.
   $ ls ?       will show all files whose name is one character long
   $ ls fo?     will show all files whose name is 3 characters long and begins with 'fo'
[...]  Matches any one of the enclosed characters. For e.g.
   $ ls [abc]*  will show all files beginning with the letter a, b or c
[..-..]  A pair of characters separated by a minus sign denotes a range. For e.g.
   $ ls /bin/[a-c]*  will show all file names beginning with the letter a, b or c, like
   /bin/arch        /bin/awk       /bin/bsh     /bin/chmod         /bin/cp
   /bin/ash         /bin/basename  /bin/cat     /bin/chown         /bin/cpio
   /bin/ash.static  /bin/bash      /bin/chgrp   /bin/consolechars  /bin/csh

But
$ ls /bin/[!a-o]*
$ ls /bin/[^a-o]*
If the first character following the [ is a ! or a ^, any character not enclosed is matched, i.e. these commands show only the file names that do not begin with a letter from a to o.
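To try the wild cards safely, the sketch below creates a few empty files in a temporary directory and expands a pattern there. The helper name match and the file names are made up for the demonstration.

```shell
#!/bin/sh
# match PATTERN: expand a wild card pattern inside a scratch directory
# that contains the files a.c, ut1.c, foo, b and zed.
match() {
    dir=$(mktemp -d) || return 1
    touch "$dir/a.c" "$dir/ut1.c" "$dir/foo" "$dir/b" "$dir/zed"
    ( cd "$dir" && echo $1 )   # $1 is deliberately unquoted so the shell expands it
    rm -rf "$dir"
}

match '*.c'
match '?'
match 'fo?'
match '[!a-o]*'
```

The patterns are passed in single quotes so that they are expanded inside the scratch directory, not in the directory the script was started from.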

command1;command2
To run two commands with one command line. For e.g.
$ date;who
will print today's date followed by the users who are currently logged in.

Advance Shell Programming

/dev/null - Use to send unwanted output of a program
Syntax: command > /dev/null
For e.g.
$ ls > /dev/null
The output of this command is not shown on screen; it is sent to this special file. The /dev directory contains other device files. The files in this directory mostly represent peripheral devices such as disks (like floppy disk), sound cards, line printers etc.

Local and Global Shell variables (export command)
Normally all our variables are local. A local variable can be used in the same shell only; if you load another copy of the shell (by typing /bin/bash at the $ prompt) then the new shell ignores all of the old shell's variables. Consider the following example:
$ vech=Bus
$ echo $vech
Bus
$ /bin/bash
$ echo $vech

NOTE: an empty line is printed.
$ vech=Car
$ echo $vech
Car
$ exit
$ echo $vech
Bus

To make a variable global, i.e. visible to shells started from the current shell, use the export command.
Syntax: export variable1 variable2 ... variableN
For e.g.
$ vech=Bus
$ echo $vech
Bus
$ export vech
$ /bin/bash
$ echo $vech
Bus
$ exit
$ echo $vech
Bus
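The effect of export can be checked non-interactively by asking a child shell what it sees. A minimal sketch; child_sees is a hypothetical helper and <unset> is just a marker string.

```shell
#!/bin/sh
# child_sees: start a child shell and print its view of the variable vech.
child_sees() {
    sh -c 'echo "${vech:-<unset>}"'
}

vech=Bus        # local to this shell
child_sees      # the child shell does not see it
export vech     # now part of the environment passed to child processes
child_sees      # the child shell sees Bus
```

Exporting copies the value into the child's environment; changes made in the child never travel back to the parent shell.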

Conditional execution i.e. && and ||
The control operators are && (read as AND) and || (read as OR). An AND list has the
Syntax: command1 && command2
Here command2 is executed if, and only if, command1 returns an exit status of zero. An OR list has the
Syntax: command1 || command2
Here command2 is executed if, and only if, command1 returns a non-zero exit status. You can use both as follows:
command1 && command2 (if exit status is zero) || command3 (if exit status is non-zero)
Here if command1 is executed successfully then the shell runs command2, and if command1 is not successful then command3 is executed. For e.g.
$ rm myf && echo File is removed successfully || echo File is not removed
If the file (myf) is removed successfully (exit status is zero) then the "echo File is removed successfully" statement is executed, otherwise the "echo File is not removed" statement is executed (since the exit status is non-zero).
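One caveat with the combined form: command3 also runs when command1 succeeds but command2 fails, so it is not a full replacement for if/else. The sketch below shows the pitfall and an if/else version of the rm example; remove_file is an illustrative name.

```shell
#!/bin/sh
# remove_file FILE: report whether rm succeeded, using if/else so that
# exactly one of the two messages is printed.
remove_file() {
    if rm "$1" 2>/dev/null; then
        echo "File is removed successfully"
    else
        echo "File is not removed"
    fi
}

# Pitfall: command1 (true) succeeds, command2 (false) fails, so command3 runs too.
true && false || echo "command3 ran although command1 succeeded"
```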

Functions
A function is a series of instructions/commands that performs a particular activity in the shell. To define a function use the following
Syntax:
function-name ( )
{
    command1
    command2
    .....
    commandN
    return
}

Where function-name is the name of your function, which executes these commands. A return statement terminates the function. For e.g. type SayHello() at the $ prompt as follows:
$ SayHello()
{
    echo "Hello $LOGNAME, Have nice computing"
    return
}
$ SayHello
Hello xxxxx, Have nice computing
To make the function available at every login, add its definition to /etc/bashrc (as root) or to ~/.bashrc.
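Functions take arguments through $1, $2, ... and report success or failure through their exit status. A short sketch with the hypothetical functions greet and is_positive:

```shell
#!/bin/sh
# greet NAME: use the first argument, or a default when none is given.
greet() {
    echo "Hello ${1:-stranger}, have nice computing"
}

# is_positive N: the status of the last command becomes the function's
# status, so the function can be used directly as an if condition.
is_positive() {
    [ "$1" -gt 0 ]
}

greet Minh
if is_positive 5; then echo "5 is positive"; fi
if ! is_positive -2; then echo "-2 is not positive"; fi
```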

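I/O Redirection and file descriptors
$ cat > myf
This is my file
^D
The above command sends the output of the cat command to the file myf. Every process starts with three open file descriptors: 0 (stdin), 1 (stdout) and 2 (stderr). Redirection can send stdout and stderr to a file and can make stdin read from a file. In the session below, > alone redirects only stdout, so the error message still reaches the terminal; adding 2>&1 sends stderr to the same file.

[sc@localhost ~]$ rm > tmp1
rm: missing operand
Try 'rm --help' for more information.
[sc@localhost ~]$ cat tmp1
[sc@localhost ~]$ rm > tmp1 2>&1
[sc@localhost ~]$ cat tmp1
rm: missing operand
Try 'rm --help' for more information.
[sc@localhost ~]$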
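Redirections are processed from left to right, so the position of 2>&1 matters. The sketch below uses a made-up function noisy that writes one line to each stream and captures the streams in different combinations.

```shell
#!/bin/sh
# noisy: write one line to stdout (fd 1) and one line to stderr (fd 2).
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

out_only=$(noisy 2>/dev/null)        # stderr discarded, stdout captured
both=$(noisy 2>&1)                   # stderr joins stdout, both captured
err_only=$(noisy 2>&1 >/dev/null)    # stderr takes the old stdout, then stdout is discarded

echo "out_only: $out_only"
echo "err_only: $err_only"
```

In the last line 2>&1 is applied first, while fd 1 still points at the capture, which is why only the stderr text survives.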

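Array
Arrays are defined as
ar=(one two three)

${ar[0]} ${ar[1]} ...    # individual elements; the index starts at 0

${ar[*]} or ${ar[@]}     # for the whole list

${#ar[*]}                # for the number of elements

Looping over an explicit list, a brace range or an array:
for i in 2 4 5 6; do
    ...
done

for i in {1..6}; do
    ...
done

for i in "${ar[@]}"; do
    ...
done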
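A runnable sketch of the array operations, assuming bash (arrays are not available in plain /bin/sh, which is dash on Ubuntu). The helper join_by is hypothetical.

```shell
#!/bin/bash
ar=(one two three)

# join_by SEP ITEM...: join the items with a one-character separator.
join_by() {
    local IFS="$1"
    shift
    echo "$*"
}

echo "first element: ${ar[0]}"
echo "element count: ${#ar[@]}"
ar+=("four and a half")        # append one element that contains spaces
for item in "${ar[@]}"; do     # quoting keeps each element as one word
    echo "item: $item"
done
join_by , "${ar[@]}"
```

Without the double quotes around ${ar[@]}, the last element would be split into four separate words.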

Creating Makefiles

Constituents of a makefile
* Rules
* Variables
* Directives
  - Inclusion of another makefile
  - Conditional directives
* Comments
  - Text that follows the # symbol is treated as a comment
  - To include # literally, prefix it with \

Rules
Syntax
target1 [target2] : [prerequisite1] [prerequisite2]
<TAB>command-1
<TAB>command-2

* Explicit rule
  - Explicitly specifies the prerequisites for a specific target
* Implicit rules
  - Take advantage of the knowledge make has about known patterns of files (e.g., .c, .cpp, .o, .s)
  - Further classified into pattern rules & suffix rules

Variables
Predefined
o Some commonly used variables predefined by GNU make:
  CC, CFLAGS, CPPFLAGS, LDFLAGS, $@, $^, $<

$@  name of the target
$<  name of the first prerequisite
$^  names of all prerequisites

Using automatic variables:
foo: foo1.o foo2.o
<TAB>gcc -o $@ $^
foo1.o: foo1.c foo1.h
<TAB>gcc -c $<
foo2.o: foo2.c foo2.h
<TAB>gcc -c $<

The rules for foo and foo2.o written out in full:
foo: foo1.o foo2.o
<TAB>gcc -o foo foo1.o foo2.o
foo2.o: foo2.c foo2.h foo1.h
<TAB>gcc -c foo2.c

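User defined
ABC := 10   # simple assignment: the right-hand side is expanded once, when the line is read
ABC = 10    # recursive assignment: the right-hand side is expanded each time ABC is used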

Command line variables
Variables can be defined or redefined from the command line:
$ make
$ make VAR1=abc VAR2=xyz

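Use the override directive to make a makefile assignment take precedence over an undesirable command line redefinition of a variable. ex.
override VAR1=dummy
VAR2=
all:
<TAB>echo VAR1 = $(VAR1)
<TAB>echo VAR2 = $(VAR2)
With this makefile, make VAR1=abc VAR2=xyz reports VAR1 = dummy and VAR2 = xyz.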

Conditional assignment
ARCH ?= x86    # assigns only if ARCH is not already defined
Append
SRC += x.c

Implicit rules

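Wildcard
foo: *.o
<TAB>gcc -o $@ $^
X (not recommended: *.o matches only object files that already exist, so on a clean tree make has nothing valid to build from; derive the object list from the sources with the wildcard function instead)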

Functions
General syntax
$(function-name arg1[,argn])

SRC := x.c y.c z.c
* String functions
  - $(subst search-str,replace-str,text)
    OBJS := $(subst .c,.o,$(SRC))
  - $(patsubst search-pat,replace-pat,text)
    OBJS := $(patsubst %.c,obj/%.o,x.c y.c z.c)
* Warning function
  - Very useful for debugging
  - Can be placed anywhere in a makefile
    $(warning TARGET not defined)
    outputs in the format <filename>:<linenum>: TARGET not defined
* Shell function
  - Can be used to invoke any external program
    today := $(shell date)

Wildcard function
SRCS := $(wildcard *.c)
OBJS := $(subst .c,.o,$(SRCS))

foo: $(OBJS)
<TAB>cc -o $@ $^
foo2.o: foo2.c foo2.h foo1.h
<TAB>gcc -c $<
foo1.o: foo1.c foo1.h
<TAB>gcc -c $<

Pattern rule
foo: foo1.o foo2.o
<TAB>g++ -o $@ $^
foo2.o: foo2.h foo1.h
foo1.o: foo1.h

# pattern rule for .cpp to .o
%.o: %.cpp
<TAB>g++ -c $<

More advanced
%.o: %.c
<TAB>$(COMPILE.c) $(OUTPUT_OPTION) $<
where
COMPILE.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
CC = cc
OUTPUT_OPTION = -o $@

Conditionals
conditional-directive
    text-if-true
endif

conditional-directive
    text-if-true
else
    text-if-false
endif

Conditional directives
- ifeq
- ifneq
- ifdef variable-name
- ifndef variable-name

Day 2 Morning

4. Creating Libraries
Creating Static Library
- Using Static Library
Creating Shared Library
- Using Shared Library

Refer
http://www.minhinc.com/training/cpp/advance-cpp-slides.php#chap1_7

Dynamic Loading and Unloading
This functionality is available under Linux by using the dlopen function.
dlopen ("libtest.so", RTLD_LAZY)

The second parameter is a flag that indicates how to bind symbols in the shared library. Include the <dlfcn.h> header file and link with the -ldl option to pick up the libdl library.

void* handle = dlopen ("libtest.so", RTLD_LAZY);
void (*test)() = dlsym (handle, "my_function");
(*test)();
dlclose (handle);

Both dlopen and dlsym return NULL if they do not succeed. In that event, you can call dlerror (with no parameters) to obtain a human-readable error message describing the problem.

C++ file linking to C shared library
If you're writing the code in your shared library in C++, you will probably want to declare those functions and variables that you plan to access elsewhere with the extern "C" linkage specifier.
extern "C" void foo ();
This prevents the C++ compiler from mangling the function name, which would change the function's name from foo to a different, funny-looking name that encodes extra information about the function. A C compiler will not mangle names; it will use whichever name you give to your function or variable.

Day 2 Morning

5. The Boot Process
The Boot Process
BIOS Level
- Boot Loader - Setup
- startup_32 functions
The start_kernel() function

Linux Boot flow

Booting Sequence

1. Turn on
2. CPU jumps to the address of the BIOS (0xFFFF0)
3. BIOS runs POST (Power-On Self Test)
4. Find bootable devices
5. Load and execute the boot sector from the MBR
6. Load the OS

BIOS refers to the software code run by a computer when it is first powered on. The BIOS is a program embedded on a chip; its primary function is to recognize and control the various devices that make up the computer.

MBR, Master Boot Record
- The OS is booted from a hard disk, where the Master Boot Record (MBR) contains the primary boot loader
- The MBR is a 512-byte sector, located in the first sector on the disk (sector 1 of cylinder 0, head 0)
- After the MBR is loaded into RAM, the BIOS yields control to it.


MBR, Master Boot Record
- The first 446 bytes are the primary boot loader, which contains both executable code and error message text
- The next sixty-four bytes are the partition table, which contains a record for each of four partitions
- The MBR ends with two bytes that are defined as the magic number (0xAA55). The magic number serves as a validation check of the MBR

Extract MBR, Master Boot Record
# dd if=/dev/hda of=mbr.bin bs=512 count=1
# od -xa mbr.bin
(On current systems the first disk is usually /dev/sda rather than /dev/hda.)
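The magic number check can be practised without touching a real disk. The sketch below builds a 512-byte stand-in image (zeros plus the signature) and reads back its last two bytes; both function names and the fake image are assumptions for illustration, not part of any boot loader tool.

```shell
#!/bin/sh
# make_fake_mbr FILE: write 510 zero bytes followed by the bytes 0x55 0xAA.
make_fake_mbr() {
    dd if=/dev/zero of="$1" bs=510 count=1 2>/dev/null
    printf '\125\252' >> "$1"      # octal escapes for 0x55 and 0xAA
}

# mbr_signature FILE: print bytes 510 and 511 of the sector as hex digits.
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

img=$(mktemp)
make_fake_mbr "$img"
echo "signature bytes: $(mbr_signature "$img")"
rm -f "$img"
```

On disk the bytes appear in the order 55 AA; read as a little-endian 16-bit value they give the magic number 0xAA55.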

Boot Loader
- The boot loader (or kernel loader) loads the compressed kernel zImage file; once the image has been decompressed, control reaches the kernel start_kernel() function with the boot arguments.
- Optionally it also loads an initial RAM disk.
- GRUB and LILO are the most popular Linux boot loaders.

List of Boot loaders
bootman, GRUB, LILO, NTLDR, XOSL, BootX, loadlin, Gujin, Boot Camp, Syslinux, GAG


GRUB Boot Loader
- GRUB is an operating system independent boot loader
- A multi-boot software package from GNU
- Flexible command line interface
- File system access
- Supports multiple executable formats
- Supports diskless systems
- Can download an OS from the network

GRUB Boot Process
1. The BIOS finds a bootable device (hard disk) and transfers control to the master boot record.
2. The MBR contains GRUB Stage 1. Given the small size of the MBR, Stage 1 just loads the next stage of GRUB.
3. GRUB Stage 1.5 is located in the first 30 kilobytes of hard disk immediately following the MBR. Stage 1.5 loads Stage 2.
4. GRUB Stage 2 receives control, and displays to the user the GRUB boot menu (where the user can manually specify the boot parameters).
5. GRUB loads the user-selected (or default) kernel into memory and passes control on to the kernel.

GRUB Config File

LILO: LInux LOader
- A versatile boot manager that supports:
  - Choice of Linux kernels.
  - Boot time kernel parameters.
  - Booting non-Linux kernels.
  - A variety of configurations.
- Characteristics:
  - Lives in MBR or partition boot sector.
  - Has no knowledge of filesystem structure so...
  - Builds a sector "map file" (block map) to find kernel.
- /sbin/lilo - "map installer".
- /etc/lilo.conf is the lilo configuration file.

LILO Boot Loader

lilo.conf


Kernel Booting, Init process
The kernel executes the init program, creating the init process (pid 1).
- Init is the root/parent of all processes executing on Linux
- The first process that init starts is the script /etc/rc.d/rc.sysinit
- Based on the appropriate run-level, scripts are executed to start various processes to run the system and make it functional
- Init is responsible for starting system processes as defined in the /etc/inittab file
- Init typically will start multiple instances of "getty", which wait for console logins and spawn the user's shell process
- Upon shutdown, init controls the sequence and processes for shutdown

Process ID  Description
0           The Scheduler
1           The init process
2           kflushd
3           kupdate
4           kpiod
5           kswapd
6           mdrecoveryd

Linux files structure

Day 3 Morning

6. The File System
The File System
Virtual File system & its role
Files associated with a process
proc file system
System calls

The File System

Filesystems are containers of files, that are stored, probably in a directory tree, together with attributes, like size, owner, creation date and the like. A filesystem has a type. It defines how things are arranged on the disk. For example, one has the types minix, ext2, reiserfs, iso9660, vfat, hfs.

Linux File System Layout

Inode and direntry

$mkdir testdir


Inode
An (in-core) inode contains the metadata of a file: its serial number, its protection (mode), its owner, its size, the dates of last access, creation and last modification, etc. It also points to the superblock of the filesystem the file is in, the methods for this file, and the dentries (names) for this file.

struct inode {
    unsigned long i_ino;
    umode_t i_mode;
    uid_t i_uid;
    gid_t i_gid;
    kdev_t i_rdev;
    loff_t i_size;
    struct timespec i_atime;
    struct timespec i_ctime;
    struct timespec i_mtime;
    struct super_block *i_sb;
    struct inode_operations *i_op;
    struct address_space *i_mapping;
    struct list_head i_dentry;
    ...
};

User space stat structure provides similar interface

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int stat (const char *path, struct stat *buf);
int fstat (int fd, struct stat *buf);
int lstat (const char *path, struct stat *buf);

struct stat {
    dev_t st_dev;         /* ID of device containing file */
    ino_t st_ino;         /* inode number */
    mode_t st_mode;       /* permissions */
    nlink_t st_nlink;     /* number of hard links */
    uid_t st_uid;         /* user ID of owner */
    gid_t st_gid;         /* group ID of owner */
    dev_t st_rdev;        /* device ID (if special file) */
    off_t st_size;        /* total size in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t st_blocks;   /* number of blocks allocated */
    time_t st_atime;      /* last access time */
    time_t st_mtime;      /* last modification time */
    time_t st_ctime;      /* last status change time */
};

lstat() is identical to stat(), except that if pathname is a symbolic link, then it returns information about the link itself, not the file that it refers to.

fstat() is identical to stat(), except that the file about which information is to be retrieved is specified by the file descriptor fd.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    struct stat sb;
    int ret;

    if (argc < 2) {
        fprintf (stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    ret = stat (argv[1], &sb);
    if (ret) {
        perror ("stat");
        return 1;
    }
    /* st_size is an off_t; cast it to match the %ld conversion */
    printf ("%s is %ld bytes\n", argv[1], (long) sb.st_size);
    return 0;
}
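The same fields can be inspected from the shell. This sketch assumes the GNU coreutils stat command (as on Ubuntu) and its -c format option; the function names are illustrative.

```shell
#!/bin/sh
# file_size FILE: print the size in bytes, like sb.st_size in the C program.
file_size() {
    stat -c '%s' "$1"
}

# file_summary FILE: inode number, hard link count and file type in words.
file_summary() {
    stat -c 'inode=%i links=%h type=%F' "$1"
}

file_summary /
```

Here %i, %h and %s correspond to st_ino, st_nlink and st_size, and %F is derived from the file type bits of st_mode.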

The following mask values are defined for the file type of the st_mode field:

S_IFMT    0170000  bit mask for the file type bit field
S_IFSOCK  0140000  socket
S_IFLNK   0120000  symbolic link
S_IFREG   0100000  regular file
S_IFBLK   0060000  block device
S_IFDIR   0040000  directory
S_IFCHR   0020000  character device
S_IFIFO   0010000  FIFO

Thus, to test for a regular file (for example), one could write:
stat(pathname, &sb);
if ((sb.st_mode & S_IFMT) == S_IFREG) {
    /* Handle regular file */
}

#include "apue.h"

int
main(int argc, char *argv[])
{
    int i;
    struct stat buf;
    char *ptr;

    for (i = 1; i < argc; i++) {
        printf("%s: ", argv[i]);
        if (lstat(argv[i], &buf) < 0) {
            err_ret("lstat error");
            continue;
        }
        if (S_ISREG(buf.st_mode))
            ptr = "regular";
        else if (S_ISDIR(buf.st_mode))
            ptr = "directory";
        else if (S_ISCHR(buf.st_mode))
            ptr = "character special";
        else if (S_ISBLK(buf.st_mode))
            ptr = "block special";
        else if (S_ISFIFO(buf.st_mode))
            ptr = "fifo";
        else if (S_ISLNK(buf.st_mode))
            ptr = "symbolic link";
        else if (S_ISSOCK(buf.st_mode))
            ptr = "socket";
        else
            ptr = "** unknown mode **";
        printf("%s\n", ptr);
    }
    exit(0);
}

Printing all fields

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    struct stat fst;
    struct tm *Time;
    int fd;

    fd = open("testfile", O_RDONLY);
    if (fd < 0 || fstat(fd, &fst) < 0) {
        perror("testfile");
        return 1;
    }
    /* the casts make each argument match its printf conversion */
    printf("Listing the details of the file\n");
    printf("The inode no of the file is %lu\n", (unsigned long) fst.st_ino);
    printf("The device ID of the file is %lu\n", (unsigned long) fst.st_dev);
    printf("The block size of the file system is %ld\n", (long) fst.st_blksize);
    printf("The user ID is %u\n", (unsigned) fst.st_uid);
    printf("The group ID is %u\n", (unsigned) fst.st_gid);
    printf("Access time is %ld\n", (long) fst.st_atime);
    printf("Status change time is %ld\n", (long) fst.st_ctime);
    printf("Modification time is %ld\n", (long) fst.st_mtime);

    Time = localtime(&fst.st_atime);
    printf("day  : %d\n", Time->tm_mday);
    printf("month: %d\n", Time->tm_mon + 1);      /* tm_mon counts from 0 */
    printf("year : %d\n", Time->tm_year + 1900);  /* tm_year counts from 1900 */
    printf("hour : %d\n", Time->tm_hour);
    printf("min  : %d\n", Time->tm_min);
    return 0;
}

Permissions
While the stat calls can be used to obtain the permission values for a given file, two other system calls set those values:
#include <sys/types.h>
#include <sys/stat.h>
int chmod (const char *path, mode_t mode);
int fchmod (int fd, mode_t mode);

Example chmod
int ret;
/*
 * Set 'map.png' in the current directory to
 * owner-readable and -writable. This is the
 * same as 'chmod 600 ./map.png'.
 */
ret = chmod ("./map.png", S_IRUSR | S_IWUSR);
if (ret)
    perror ("chmod");


Ownership
In the stat structure, the st_uid and st_gid fields provide the file's owner and group, respectively. Three system calls allow a user to change those two values:
#include <sys/types.h>
#include <unistd.h>
int chown (const char *path, uid_t owner, gid_t group);
int lchown (const char *path, uid_t owner, gid_t group);
int fchown (int fd, uid_t owner, gid_t group);

struct group *gr;
int ret;
/*
 * getgrnam() returns information on a group
 * given its name.
 */
gr = getgrnam ("officers");
if (!gr) {
    /* likely an invalid group */
    perror ("getgrnam");
    return 1;
}
/* set manifest.txt's group to 'officers' */
ret = chown ("manifest.txt", -1, gr->gr_gid);
if (ret)
    perror ("chown");

Reading a Directory's Contents
A directory stream is represented by a DIR object, obtained with opendir():

#include <sys/types.h>
#include <dirent.h>
DIR * opendir (const char *name);

To obtain the file descriptor behind a given directory stream:
#define _BSD_SOURCE /* or _SVID_SOURCE */
#include <sys/types.h>
#include <dirent.h>
int dirfd (DIR *dir);

Reading from a directory stream
Once you have created a directory stream with opendir(), your program can begin reading entries from the directory. To do this, use readdir(), which returns entries one by one from a given DIR object:

#include <sys/types.h>
#include <dirent.h>
struct dirent * readdir (DIR *dir);

A successful call to readdir() returns the next entry in the directory represented by dir. The dirent structure represents a directory entry. Defined in <dirent.h>, on Linux, its definition is:

struct dirent {
    ino_t d_ino;              /* inode number */
    off_t d_off;              /* offset to the next dirent */
    unsigned short d_reclen;  /* length of this record */
    unsigned char d_type;     /* type of file */
    char d_name[256];         /* filename */
};

Applications successively invoke readdir(), obtaining each file in the directory, until they find the file they are searching for or until the entire directory is read, at which time readdir() returns NULL.

To close the DIR*
int closedir (DIR *dir);

/*
 * find_file_in_dir - searches the directory 'path' for a
 * file named 'file'.
 *
 * Returns 0 if 'file' exists in 'path' and a nonzero
 * value otherwise.
 */
int find_file_in_dir (const char *path, const char *file)
{
    struct dirent *entry;
    int ret = 1;
    DIR *dir;

    dir = opendir (path);
    if (!dir) {
        /* readdir() must not be called with a NULL stream */
        perror ("opendir");
        return 1;
    }

    errno = 0;
    while ((entry = readdir (dir)) != NULL) {
        if (strcmp (entry->d_name, file) == 0) {
            ret = 0;
            break;
        }
    }
    /* NULL with errno set means an error, not end of directory */
    if (errno && !entry)
        perror ("readdir");

    closedir (dir);
    return ret;
}

System calls for reading directory contents
The previously discussed functions for reading the contents of directories are standardized by POSIX and provided by the C library. Internally, these functions use one of two system calls, readdir() and getdents(), which are provided here for completeness:

#include <unistd.h>
#include <linux/types.h>
#include <linux/dirent.h>
#include <errno.h>

/*
 * Not defined for user space: need to
 * use the _syscall3() macro to access.
 */
int readdir (unsigned int fd, struct dirent *dirp, unsigned int count);
int getdents (unsigned int fd, struct dirent *dirp, unsigned int count);

Links
A link is essentially just a name in a list (a directory) that points at an inode, so there would appear to be no reason why multiple links to the same inode could not exist. That is, a single inode (and thus a single file) could be referenced from, say, both /etc/customs and /var/run/ledger.

Hard Links
Files can have 0, 1, or many links. Most files have a link count of 1 (that is, they are pointed at by a single directory entry), but some files have 2 or even more links. These links are called hard links.

The link() system call, one of the original Unix system calls, and now standardized by POSIX, creates a new link for an existing file:

#include <unistd.h>

int link (const char *oldpath, const char *newpath);

int ret;

/*
 * create a new directory entry,
 * '/home/kidd/privateer', that points at
 * the same inode as '/home/kidd/pirate'
 */
ret = link ("/home/kidd/pirate", "/home/kidd/privateer");
if (ret)
    perror ("link");

Symbolic Links
Symbolic links, also known as symlinks or soft links, are similar to hard links in that both point at files in the filesystem. The symbolic link differs, however, in that it is not merely an additional directory entry, but a special type of file altogether. This special file contains the pathname for a different file, called the symbolic link's target. At runtime, on the fly, the kernel substitutes this pathname for the symbolic link's pathname (unless using the various l versions of system calls, such as lstat(), which operate on the link itself, and not the target).

Soft links, unlike hard links, can span filesystems. They can also point at a file that does not exist; such a link is called a dangling symlink.

#include <unistd.h>

int symlink (const char *oldpath, const char *newpath);

int ret;

/*
 * create a symbolic link,
 * '/home/kidd/privateer', that
 * points at '/home/kidd/pirate'
 */
ret = symlink ("/home/kidd/pirate", "/home/kidd/privateer");
if (ret)
    perror ("symlink");

Unlinking
The converse to linking is unlinking, the removal of pathnames from the filesystem. A single system call, unlink(), handles this task:

#include <unistd.h>

int unlink (const char *pathname);

VFS, Virtual File Systems

- The Linux kernel implements the concept of Virtual File System (VFS, originally Virtual Filesystem Switch), so that it is (to a large degree) possible to separate actual "low-level" filesystem code from the rest of the kernel.
- The VFS is more of an interface than an actual complete file system.
- An important role of the VFS is to perform what is called "Standard Actions". For example, the function lseek() is not actually implemented by any file system, as the function of lseek() is provided by a "standard action" of VFS.
- Two important native filesystems in the Linux environment are ext2 and the proc file system.

Four main objects in VFS API: superblock, dentries, inodes, files
- The kernel keeps track of files using in-core inodes ("index nodes"), usually derived by the low-level filesystem from on-disk inodes.
- A file may have several names, and there is a layer of dentries ("directory entries") that represent pathnames, speeding up the lookup operation.
- Several processes may have the same file open for reading or writing, and file structures contain the required information such as the current file position.
- Access to a filesystem starts by mounting it. This operation takes a filesystem type (like ext2, vfat, iso9660, nfs) and a device and produces the in-core superblock that contains the information required for operations on the filesystem; a third ingredient, the mount point, specifies what pathname refers to the root of the filesystem.

Auxiliary objects
- We have filesystem types, used to connect the name of the filesystem to the routines for setting it up (at mount time) or tearing it down (at umount time).
- A struct vfsmount represents a subtree in the big file hierarchy, basically a pair (device, mountpoint).
- A struct nameidata represents the result of a lookup.
- A struct address_space gives the mapping between the blocks in a file and blocks on disk. It is needed for I/O.

Filesystem type registration
A filesystem type announces itself to the kernel with register_filesystem(), passing a struct of type struct file_system_type. Here is the 2.2.17 version:

struct file_system_type {
    const char *name;
    int fs_flags;
    struct super_block *(*read_super) (struct super_block *, void *, int);
    struct file_system_type *next;
};

The call register_filesystem() hangs this struct in the chain with head file_systems, and unregister_filesystem() removes it again. Accesses to this chain are protected by the spinlock file_systems_lock. There are no other writers. The main reader is of course the mount() system call (via get_fs_type()). Other readers are get_filesystem_list(), used for /proc/filesystems, and the sysfs system call. The code is in fs/filesystems.c.

static struct file_system_type tue_fs_type = {
    .owner    = THIS_MODULE,
    .name     = "tue",
    .get_sb   = tue_get_sb,
    .kill_sb  = kill_block_super,
    .fs_flags = FS_REQUIRES_DEV,
};

static int __init init_tue_fs(void)
{
    return register_filesystem(&tue_fs_type);
}

static void __exit exit_tue_fs(void)
{
    unregister_filesystem(&tue_fs_type);
}

struct file_system_type

struct file_system_type {
    const char *name;
    int fs_flags;
    struct super_block *(*get_sb)(struct file_system_type *,
                                  int, char *, void *, struct vfsmount *);
    void (*kill_sb) (struct super_block *);
    struct module *owner;
    struct file_system_type *next;
    struct list_head fs_supers;
    struct lock_class_key s_lock_key;
    struct lock_class_key s_umount_key;
};

(In 2.4 there was no kill_sb() , and the role of get_sb() was taken by read_super() . The final parameter of get_sb() and the lock_class_key fields are present since 2.6.18.)

name
Here the filesystem type gives its name ("tue"), so that the kernel can find it when someone does mount -t tue /dev/foo /dir

get_sb
At mount time the kernel calls the fstype->get_sb() routine that initializes things and sets up a superblock. Typically this is a 1-line routine that calls one of get_sb_bdev, get_sb_single, get_sb_nodev, get_sb_pseudo.

kill_sb
At umount time the kernel calls the fstype->kill_sb() routine to clean up. Typically one of kill_block_super, kill_anon_super, kill_litter_super.

Example of the use of owner - sysfs
There exists a strange SYSV system call sysfs that will return (i) a sequence number given a filesystem type, (ii) a filesystem type given a sequence number, and (iii) the total number of filesystem types registered now. This call is not supported by libc or glibc.

These sequence numbers are rather meaningless since they may change at any moment. But this means that one can get a snapshot of the list of filesystem types without looking at /proc/filesystems. For example, the program

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * The 3-arg version of sysfs(). The _syscall3() macro is no longer
 * provided by current kernel headers, so go through syscall(2).
 */
static int sysfs(int option, unsigned int fsindex, char *buf)
{
    return syscall(SYS_sysfs, option, fsindex, buf);
}

/* the 1-arg version of sysfs() */
static int sysfs1(int i)
{
    return sysfs(i, 0, NULL);
}

int main(void)
{
    int i, tot;
    char buf[100];  /* how long is a filesystem type name?? */

    tot = sysfs1(3);
    if (tot == -1) {
        perror("sysfs(3)");
        exit(1);
    }
    for (i = 0; i < tot; i++) {
        if (sysfs(2, i, buf)) {
            perror("sysfs(2)");
            exit(1);
        }
        printf("%2d: %s\n", i, buf);
    }
    return 0;
}

might give output like

 0: ext2
 1: minix
 2: romfs
 3: msdos
 4: vfat
 5: proc
 6: nfs
 7: smbfs
 8: iso9660

Mounting
The mount system call attaches a filesystem to the big file hierarchy at some indicated point. Ingredients needed:
(i) a device that carries the filesystem (disk, partition, floppy, CDROM, SmartMedia card, ...),
(ii) a directory where the filesystem on that device must be attached,
(iii) a filesystem type.

The code for sys_mount() is found in fs/namespace.c and fs/super.c. The connection with the filesystem type name is made in do_kern_mount():

struct file_system_type *type = get_fs_type(fstype);
struct super_block *sb;

if (!type)
    return ERR_PTR(-ENODEV);
sb = type->get_sb(type, flags, name, data);

and this is the only call of the get_sb() routine.

The code for sys_umount() is found in fs/namespace.c and fs/super.c. The counterpart of the just quoted code is the cleanup in deactivate_super():

fs->kill_sb(s);

and this is the only call of the kill_sb() routine.

The superblock
The superblock gives global information on a filesystem: the device on which it lives, its block size, its type, the dentry of the root of the filesystem, the methods it has, etc.

struct super_block {
    dev_t s_dev;
    unsigned long s_blocksize;
    struct file_system_type *s_type;
    struct super_operations *s_op;
    struct dentry *s_root;
    ...
}

struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb);
    void (*destroy_inode)(struct inode *);
    void (*read_inode) (struct inode *);
    void (*dirty_inode) (struct inode *);
    void (*write_inode) (struct inode *, int);
    void (*put_inode) (struct inode *);
    void (*drop_inode) (struct inode *);
    void (*delete_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    void (*write_super) (struct super_block *);
    int (*sync_fs)(struct super_block *sb, int wait);
    void (*write_super_lockfs) (struct super_block *);
    void (*unlockfs) (struct super_block *);
    int (*statfs) (struct super_block *, struct statfs *);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*clear_inode) (struct inode *);
    void (*umount_begin) (struct super_block *);
    int (*show_options)(struct seq_file *, struct vfsmount *);
};

This is enough to get started: the dentry of the root directory tells us the inode of this root directory (and in particular its i_ino), and sb->s_op->read_inode(inode) will read this inode from disk. Now inode->i_op->lookup() allows us to find names in the root directory, etc.

Each superblock is on six lists, with links through the fields s_list, s_dirty, s_io, s_anon, s_files, s_instances, respectively.

The super_blocks list
All superblocks are collected in a list super_blocks with links in the fields s_list. This list is protected by the spinlock sb_lock. The main use is in super.c:get_super() or user_get_super() to find the superblock for a given block device. (Both routines are identical, except that one takes a bdev, the other a dev_t.) This list is also used in various places where all superblocks must be sync'ed or all dirty inodes must be written out.

The fs_supers list
All superblocks of a given type are collected in a list headed by the fs_supers field of the struct file_system_type, with links in the fields s_instances. This list is also protected by the spinlock sb_lock.

The file list
All open files belonging to a given superblock are chained in a list headed by the s_files field of the superblock, with links in the fields f_list of the files. These lists are protected by the spinlock files_lock. This list is used for example in fs_may_remount_ro() to check that there are no files currently open for writing.

The list of anonymous dentries
Normally, all dentries are connected to root. However, when NFS filehandles are used this need not be the case. Dentries that are roots of subtrees potentially unconnected to root are chained in a list headed by the s_anon field of the superblock, with links in the fields d_hash. These lists are protected by the spinlock dcache_lock. They are grown in dcache.c:d_alloc_anon() and shrunk in super.c:generic_shutdown_super().

The inode lists s_dirty, s_io
Lists of inodes to be written out. These lists are headed at the s_dirty (resp. s_io) field of the superblock, with links in the fields i_list. These lists are protected by the spinlock inode_lock. See fs/fs-writeback.c.

Inodes
An (in-core) inode contains the metadata of a file: its serial number, its protection (mode), its owner, its size, the times of last access, last status change and last modification, etc. It also points to the superblock of the filesystem the file is in, the methods for this file, and the dentries (names) for this file.

struct inode {
    unsigned long i_ino;
    umode_t i_mode;
    uid_t i_uid;
    gid_t i_gid;
    kdev_t i_rdev;
    loff_t i_size;
    struct timespec i_atime;
    struct timespec i_ctime;
    struct timespec i_mtime;
    struct super_block *i_sb;
    struct inode_operations *i_op;
    struct address_space *i_mapping;
    struct list_head i_dentry;
    ...
}

struct inode_operations {
    int (*create) (struct inode *, struct dentry *, int);
    struct dentry * (*lookup) (struct inode *, struct dentry *);
    int (*link) (struct dentry *, struct inode *, struct dentry *);
    int (*unlink) (struct inode *, struct dentry *);
    int (*symlink) (struct inode *, struct dentry *, const char *);
    int (*mkdir) (struct inode *, struct dentry *, int);
    int (*rmdir) (struct inode *, struct dentry *);
    int (*mknod) (struct inode *, struct dentry *, int, dev_t);
    int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *);
    int (*readlink) (struct dentry *, char *, int);
    int (*follow_link) (struct dentry *, struct nameidata *);
    void (*truncate) (struct inode *);
    int (*permission) (struct inode *, int);
    int (*setattr) (struct dentry *, struct iattr *);
    int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
    int (*setxattr) (struct dentry *, const char *, const void *, size_t, int);
    ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
    ssize_t (*listxattr) (struct dentry *, char *, size_t);
    int (*removexattr) (struct dentry *, const char *);
};

Each inode is on four lists, with links through the fields i_hash , i_list , i_dentry , i_devices .

Dentries
The dentries encode the filesystem tree structure, the names of the files. Thus, the main parts of a dentry are the inode (if any) that belongs to it, the name (the final part of the pathname), and the parent (the name of the containing directory). There are also the superblock, the methods, a list of subdirectories, etc.

struct dentry {
    struct inode *d_inode;
    struct dentry *d_parent;
    struct qstr d_name;
    struct super_block *d_sb;
    struct dentry_operations *d_op;
    struct list_head d_subdirs;
    ...
}

struct dentry_operations {
    int (*d_revalidate)(struct dentry *, int);
    int (*d_hash) (struct dentry *, struct qstr *);
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
    int (*d_delete)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
};

Each dentry is on five lists, with links through the fields d_hash , d_lru , d_child , d_subdirs , d_alias .

Files
File structures represent open files, that is, an inode together with a current (reading/writing) offset. The offset can be set by the lseek() system call. Note that instead of a pointer to the inode we have a pointer to the dentry; that means that the name used to open a file is known. In particular, system calls like getcwd() are possible.

struct file {
    struct dentry *f_dentry;
    struct vfsmount *f_vfsmnt;
    struct file_operations *f_op;
    mode_t f_mode;
    loff_t f_pos;
    struct fown_struct f_owner;
    unsigned int f_uid, f_gid;
    unsigned long f_version;
    ...
}

Here the f_owner field gives the owner to use for async I/O signals.

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, char *, size_t, loff_t);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    ssize_t (*aio_write) (struct kiocb *, const char *, size_t, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
    ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
    ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long,
                                       unsigned long, unsigned long, unsigned long);
};

Each file is in two lists, with links through the fields f_list , f_ep_links .

f_list
The list with links through f_list was discussed above. It is the list of all files belonging to a given superblock. There is a second use: the tty driver collects all files that are opened instances of a tty in a list headed by tty->tty_files with links through the file field f_list. Conversely, these files point back at the tty via their field private_data. (This field private_data is also used elsewhere. For example, the proc code uses it to attach a struct seq_file to a file.)

The event poll list
All event poll items belonging to a given file are collected in a list with head f_ep_links, protected by the file field f_ep_lock. (For event poll stuff, see epoll_ctl(2).)

struct vfsmount
A struct vfsmount describes a mount. The definition lives in mount.h:

struct vfsmount {
    struct list_head mnt_hash;
    struct vfsmount *mnt_parent;    /* fs we are mounted on */
    struct dentry *mnt_mountpoint;  /* dentry of mountpoint */
    struct dentry *mnt_root;        /* root of the mounted tree */
    struct super_block *mnt_sb;     /* pointer to superblock */
    struct list_head mnt_mounts;    /* list of children, anchored here */
    struct list_head mnt_child;     /* and going through their mnt_child */
    atomic_t mnt_count;
    int mnt_flags;
    char *mnt_devname;              /* Name of device e.g. /dev/dsk/hda1 */
    struct list_head mnt_list;
};

fs_struct
A struct fs_struct determines the interpretation of pathnames referred to by a process (and also, somewhat illogically, contains the umask). The typical reference is current->fs. The definition lives in fs_struct.h:

struct fs_struct {
    atomic_t count;
    rwlock_t lock;
    int umask;
    struct dentry * root, * pwd, * altroot;
    struct vfsmount * rootmnt, * pwdmnt, * altrootmnt;
};

Semantics of root and pwd are clear. Remains to discuss altroot .

Files associated with a process
There are two normal cases for handling the descriptors after a fork.
1. The parent waits for the child to complete. In this case, the parent does not need to do anything with its descriptors. When the child terminates, any of the shared descriptors that the child read from or wrote to will have their file offsets updated accordingly.
2. Both the parent and the child go their own ways. Here, after the fork, the parent closes the descriptors that it doesn't need, and the child does the same thing. This way, neither interferes with the other's open descriptors. This scenario is often found with network servers.

Besides the open files, numerous other properties of the parent are inherited by the child:
* Real user ID, real group ID, effective user ID, and effective group ID
* Supplementary group IDs
* Process group ID
* Session ID
* Controlling terminal
* The set-user-ID and set-group-ID flags
* Current working directory
* Root directory
* File mode creation mask
* Signal mask and dispositions
* The close-on-exec flag for any open file descriptors
* Environment
* Attached shared memory segments
* Memory mappings
* Resource limits

The differences between the parent and child are:
* The return values from fork are different.
* The process IDs are different.
* The two processes have different parent process IDs: the parent process ID of the child is the parent; the parent process ID of the parent doesn't change.
* The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0.
* File locks set by the parent are not inherited by the child.
* Pending alarms are cleared for the child.
* The set of pending signals for the child is set to the empty set.

proc file system
/proc is a window into the running Linux kernel. Files in the /proc file system don't correspond to actual files on a physical device. Instead, they are magic objects that behave like files but provide access to parameters, data structures, and statistics in the kernel. The "contents" of these files are not always fixed blocks of data, as ordinary file contents are. Instead, they are generated on the fly by the Linux kernel when you read from the file. You can also change the configuration of the running kernel by writing to certain files in the /proc file system.

Let's look at an example:

% ls -l /proc/version
-r--r--r-- 1 root root 0 Jan 17 18:09 /proc/version

The size is 0 because the contents are generated by the kernel.

$ mount
none on /proc type proc (rw)

The device name none reveals that this is not a file system on disk.

Extracting Information from /proc

#include <stdio.h>
#include <string.h>

/* Returns the clock speed of the system's CPU in MHz, as reported by
   /proc/cpuinfo. On a multiprocessor machine, returns the speed of
   the first CPU. On error returns zero. */
float get_cpu_clock_speed (void)
{
    FILE* fp;
    char buffer[1024];
    size_t bytes_read;
    char* match;
    float clock_speed;

    /* Read the beginning of /proc/cpuinfo into the buffer. Only the
       first CPU's entry is needed, so a truncated read is fine. */
    fp = fopen ("/proc/cpuinfo", "r");
    if (fp == NULL)
        return 0;
    bytes_read = fread (buffer, 1, sizeof (buffer) - 1, fp);
    fclose (fp);
    /* Bail if read failed. */
    if (bytes_read == 0)
        return 0;
    /* NUL-terminate the text. */
    buffer[bytes_read] = '\0';
    /* Locate the line that starts with "cpu MHz". */
    match = strstr (buffer, "cpu MHz");
    if (match == NULL)
        return 0;
    /* Parse the line to extract the clock speed. */
    if (sscanf (match, "cpu MHz : %f", &clock_speed) != 1)
        return 0;
    return clock_speed;
}

int main (void)
{
    printf ("CPU clock speed: %4.0f MHz\n", get_cpu_clock_speed ());
    return 0;
}

Various directories and files in /proc
1) /proc/<number>          # one directory per running process
2) /proc/self              # the current process
3) /proc/cpuinfo
4) /proc/devices
5) /proc/pci               # summary of devices connected to the PCI bus
6) /proc/tty/driver/serial # serial ports
7) /proc/sys/kernel        # kernel information
8) /proc/meminfo           # system's memory usage
9) /proc/filesystems       # filesystem types known to the kernel
10) /proc/mounts           # all mounted filesystems

System calls

1. fcntl Record Locking

#include <fcntl.h>

int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
int fcntl(int fd, int cmd, struct flock *lock);

Returns: depends on cmd if OK (see following), -1 on error

For record locking, cmd is F_GETLK, F_SETLK or F_SETLKW.

struct flock {
    short l_type;    /* F_RDLCK, F_WRLCK, or F_UNLCK */
    short l_whence;  /* SEEK_SET, SEEK_CUR, or SEEK_END */
    off_t l_start;   /* offset in bytes, relative to l_whence */
    off_t l_len;     /* length, in bytes; 0 means lock to EOF */
    pid_t l_pid;     /* returned with F_GETLK */
};

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd, pid, retval;
    struct flock lockc, lockp;

    fd = open("testlock", O_WRONLY);
    if (fd == -1) {
        perror("open testlock");
        return 1;
    }

    /* Parent is locking the file */
    lockp.l_type = F_WRLCK;
    lockp.l_whence = SEEK_SET;
    lockp.l_start = 10;
    lockp.l_len = 15;
    if ((retval = fcntl(fd, F_SETLK, &lockp)) == -1)
        perror("parent write lock");
    printf("retval is %d\n", retval);

    if ((pid = fork()) == 0) {
        /* Child is locking a different region of the file */
        lockc.l_type = F_WRLCK;
        lockc.l_whence = SEEK_SET;
        lockc.l_start = 40;
        lockc.l_len = 55;
        if ((retval = fcntl(fd, F_SETLK, &lockc)) == -1)
            perror("Child write lock");
        printf("retval is %d\n", retval);
        printf("Child Process over\n");
    } else {
        sleep(3);
        /* Parent is unlocking the file */
        lockp.l_type = F_UNLCK;
        lockp.l_whence = SEEK_SET;
        lockp.l_start = 10;
        lockp.l_len = 15;
        if ((retval = fcntl(fd, F_SETLK, &lockp)) == -1)
            perror("parent unlock");
        printf("Parent Process over\n");
    }
    return 0;
}

In the next program both processes request a READ lock on the same region, so both requests succeed. Try the same with a WRITE lock.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd, pid, retval;
    struct flock lockc, lockp;

    fd = open("testlock", O_RDONLY);
    if (fd == -1) {
        perror("open testlock");
        return 1;
    }

    /* Parent is locking the file */
    lockp.l_type = F_RDLCK;
    lockp.l_whence = SEEK_SET;
    lockp.l_start = 10;
    lockp.l_len = 15;
    if ((retval = fcntl(fd, F_SETLK, &lockp)) == -1)
        perror("parent read lock");
    printf("Parent retval is %d\n", retval);

    /* Child starts here */
    if ((pid = fork()) == 0) {
        /* F_GETLK needs a filled-in request: would a write lock conflict? */
        lockc.l_type = F_WRLCK;
        lockc.l_whence = SEEK_SET;
        lockc.l_start = 10;
        lockc.l_len = 15;
        if ((retval = fcntl(fd, F_GETLK, &lockc)) == -1)
            perror("child get lock");
        printf("retval is %d\n", retval);
        printf("process %d has locked this section\n", (int) lockc.l_pid);
        printf("lock type %d\n", lockc.l_type);
        printf("whence %d\n", lockc.l_whence);
        printf("start %ld\n", (long) lockc.l_start);
        printf("length is %ld\n", (long) lockc.l_len);

        /* Child is locking the file */
        lockc.l_type = F_RDLCK;
        lockc.l_whence = SEEK_SET;
        lockc.l_start = 10;
        lockc.l_len = 15;
        if ((retval = fcntl(fd, F_SETLK, &lockc)) == -1)
            perror("Child read lock");
        printf("Child retval is %d\n", retval);
        printf("Child Process over\n");
    } else {
        sleep(3);
        printf("Parent Process over\n");
    }
    return 0;
}

2. lockf

SYNOPSIS
#include <unistd.h>

int lockf(int fd, int cmd, off_t len);

lockf() applies, tests or removes a POSIX lock on an open file.

DEADLOCK: in the following program the child blocks in lockf(F_LOCK) on the region the parent holds, while the parent blocks in wait() for the child. Avoid the deadlock by using F_TLOCK in the child's lockf() call.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd;
    pid_t pid;

    if ((fd = open("locktest", O_RDWR | O_CREAT, 0666)) == -1) {
        perror("open file locktest");
        return 1;
    }

    if (lockf(fd, F_LOCK, 10) == -1)
        perror("lockf failed");

    if ((pid = fork()) == 0) {
        /* child blocks here: deadlock! */
        if (lockf(fd, F_LOCK, 10) == -1)
            perror("lockf failed");
        puts("The child process over");
    } else {
        wait(0);
        printf("Process %d is over\n", getpid());
    }
    return 0;
}

3. access

#include <unistd.h>

int access(const char *pathname, int mode);

access() checks whether the process would be allowed to read, write or test for existence of the file (or other file system object) whose name is pathname. If pathname is a symbolic link, the permissions of the file referred to by this symbolic link are tested.

mode is a mask consisting of one or more of R_OK, W_OK, X_OK and F_OK. R_OK, W_OK and X_OK request checking whether the file exists and has read, write and execute permissions, respectively. F_OK just requests checking for the existence of the file.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    char* path;
    int ret;

    if (argc < 2) {
        fprintf(stderr, "usage: %s pathname\n", argv[0]);
        return 1;
    }
    path = argv[1];

    ret = access(path, F_OK);   /* check that the file exists */
    if (ret == 0)
        printf("%s file exists\n", path);
    else
        perror(path);
    return 0;
}

4. creat

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);

5. dup, dup2

#include <unistd.h>

int dup(int oldfd);
int dup2(int oldfd, int newfd);

dup() and dup2() create a copy of the file descriptor oldfd.

After a successful return from dup() or dup2(), the old and new file descriptors may be used interchangeably. They refer to the same open file description and thus share file offset and file status flags; for example, if the file offset is modified by using lseek(2) on one of the descriptors, the offset is also changed for the other.

The two descriptors do not share file descriptor flags (the close-on-exec flag). The close-on-exec flag (FD_CLOEXEC; see fcntl(2)) for the duplicate descriptor is off.

dup() uses the lowest-numbered unused descriptor for the new descriptor.

dup2() makes newfd be the copy of oldfd, closing newfd first if necessary.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    int fd, newfd;

    if ((fd = creat("testfile", 0666)) == -1) {
        perror("Creat failed");
        exit(1);
    }
    printf("Descriptor is %d\n", fd);
    newfd = dup2(fd, 5);    /* try with stdout */
    printf("New Descriptor is %d\n", newfd);
    printf("The PID is %d\n", getpid());
    for (;;)
        ;   /* spin, so the open descriptors can be inspected from outside */
    close(fd);
    close(newfd);
}

Using fcntl to create a copy

#include <stdio.h>
#include <fcntl.h>

int main(void)
{
    int fd, fd1, newfd;

    fd = open("temp", O_RDWR | O_CREAT, 0666);
    printf("The file descriptor is %d\n", fd);

    fd1 = open("temp1", O_RDWR | O_CREAT, 0666);
    /* F_DUPFD: lowest-numbered free descriptor >= the third argument */
    newfd = fcntl(fd, F_DUPFD, 0);
    printf("The file descriptor is %d\n", newfd);
    return 0;
}

6. mmap

#include <sys/mman.h>

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);

The mmap() function asks to map length bytes starting at offset offset from the file (or other object) specified by the file descriptor fd into memory, preferably at address start. This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by mmap().

The prot argument describes the desired memory protection (and must not conflict with the open mode of the file). It is either PROT_NONE or is the bitwise OR of one or more of the other PROT_* flags.

PROT_EXEC   Pages may be executed.
PROT_READ   Pages may be read.
PROT_WRITE  Pages may be written.
PROT_NONE   Pages may not be accessed.

The flags parameter specifies the type of the mapped object, mapping options and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has bits

MAP_FIXED    Do not select a different address than the one specified. If the memory region specified by start and length overlaps pages of any existing mapping, the overlapped part of the existing mapping is discarded.
MAP_SHARED   Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file.
MAP_PRIVATE  Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.


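#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int fd;
    void *addr;

    if ((fd = open(argv[1], O_RDWR | O_CREAT, 0777)) < 0) {
        perror("open");
        exit(1);
    }

    lseek(fd, 5, SEEK_SET);
    write(fd, "", 1);           /* extend the file so the mapped range is backed by data */

    /* Mapping the file to memory */
    addr = mmap(0, 5, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
    close(fd);                  /* the mapping stays valid after close() */

    memcpy(addr, "hello", 5);   /* stores go to the file through the mapping */
    munmap(addr, 5);
    return 0;
}

/* GPIO access through /dev/mem on the Raspberry Pi */
//#define BCM2708_PERI_BASE 0x20000000  /* Pi 1 */
#define BCM2708_PERI_BASE 0x3F000000    /* Pi 2 and Pi 3 */
#define GPIO_BASE (BCM2708_PERI_BASE + 0x200000) /* GPIO controller */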

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE (4*1024)
#define BLOCK_SIZE (4*1024)

int mem_fd;
void *gpio_map;

// I/O access
volatile unsigned *gpio;

// GPIO setup macros. Always use INP_GPIO(x) before using OUT_GPIO(x) or SET_GPIO_ALT(x,y)
#define INP_GPIO(g) *(gpio+((g)/10)) &= ~(7<<(((g)%10)*3))
#define OUT_GPIO(g) *(gpio+((g)/10)) |= (1<<(((g)%10)*3))
#define SET_GPIO_ALT(g,a) *(gpio+(((g)/10))) |= (((a)<=3?(a)+4:(a)==4?3:2)<<(((g)%10)*3))

//#define GPIO_SET *(gpio+7)  // sets bits which are 1, ignores bits which are 0
//#define GPIO_CLR *(gpio+10) // clears bits which are 1, ignores bits which are 0
// Temporarily hard-wired to pin 4 (bit 4 = 0x10). The set and clear registers
// act only on the bits written as 1, so a plain store is enough.
#define GPIO_SET *(gpio+7) = 0x10   // sets bits which are 1, ignores bits which are 0
#define GPIO_CLR *(gpio+10) = 0x10  // clears bits which are 1, ignores bits which are 0

#define GPIO_READ(g) (*(gpio+13) & (1<<(g)))  // non-zero if pin g is high

void setup_io();

int main(int argc, char **argv)
{
    int rep;

    // Set up gpio pointer for direct register access
    setup_io();

    // Set GPIO pin 4 as output (the original used pin 7)
    // INP_GPIO(7);
    INP_GPIO(4); // must use INP_GPIO before we can use OUT_GPIO
    // OUT_GPIO(7);
    OUT_GPIO(4);

    // Flash LED on and off 10 times
    for (rep = 0; rep < 10; rep++) {
        // GPIO_SET = (1 << 7);
        printf("setting\n");
        GPIO_SET;
        sleep(1);
        // GPIO_CLR = (1 << 7);
        printf("resetting\n");
        GPIO_CLR;
        sleep(1);
    }
    return 0;
} // main

// Set up a memory region to access GPIO
void setup_io()
{
    /* open /dev/mem */
    if ((mem_fd = open("/dev/mem", O_RDWR|O_SYNC)) < 0) {
        printf("can't open /dev/mem\n");
        exit(-1);
    }

    /* mmap GPIO */
    gpio_map = mmap(
        NULL,                 // Any address in our space will do
        BLOCK_SIZE,           // Map length
        PROT_READ|PROT_WRITE, // Enable reading & writing to mapped memory
        MAP_SHARED,           // Shared with other processes
        mem_fd,               // File to map
        GPIO_BASE             // Offset to GPIO peripheral
    );

    close(mem_fd); // No need to keep mem_fd open after mmap

    if (gpio_map == MAP_FAILED) {
        perror("mmap"); // errno is set by mmap
        exit(-1);
    }

    // Always use volatile pointer!
    gpio = (volatile unsigned *)gpio_map;
} // setup_io()


7. mount

mount [-lhV]
mount -a [-fFnrsvw] [-t vfstype] [-O optlist]
mount [-fnrsvw] [-o options [,...]] device | dir
mount [-fnrsvw] [-t vfstype] [-o options] device dir

Mount a file system

All files accessible in a Unix system are arranged in one big tree, the file hierarchy, rooted at /. These files can be spread out over several devices. The mount command serves to attach the file system found on some device to the big file tree. Conversely, the umount(8) command will detach it again.

The standard form of the mount command is

mount -t type device dir

#include <sys/mount.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int ret, fd;

    /* mount() returns 0 on success and -1 on error; it requires root privileges */
    ret = mount("/dev/fd0", "/mnt/floppy", "ext2", MS_NOSUID, NULL);
    if (ret == -1) {
        perror("mount");
        return 1;
    }
    printf("Floppy mounted successfully\n");

    printf("Changing directory to floppy\n");
    chdir("/mnt/floppy");

    printf("Creating a file test_file in floppy\n");
    fd = creat("test_file", 0644);
    if (fd != -1)
        printf("File creation successful\n");
    return 0;
}

8. readv, writev

#include <sys/uio.h>

ssize_t readv(int fd, const struct iovec *vector, int count);
ssize_t writev(int fd, const struct iovec *vector, int count);

readv, writev - read or write data into multiple buffers

The readv() function reads count blocks from the file associated with the file descriptor fd into the multiple buffers described by vector.

The writev() function writes at most count blocks described by vector to the file associated with the file descriptor fd.

The pointer vector points to a struct iovec defined in <sys/uio.h> as

struct iovec {
    void  *iov_base;   /* Starting address */
    size_t iov_len;    /* Number of bytes */
};

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

struct emp {
    char name[25];
    int age;
    float sal;
} obj[2], Emp[2] = { {"Hello", 10, 123.345}, {"World", 20, 234.567} };

int main(void)
{
    struct iovec readiovobj, ioobj;
    int fd;
    ssize_t retval;

    ioobj.iov_base = Emp;
    ioobj.iov_len = sizeof(Emp);
    printf("%zu\n", ioobj.iov_len);

    fd = open("temp", O_CREAT | O_RDWR, 0666);
    retval = writev(fd, &ioobj, 1);      /* write both records from one buffer */
    printf("%zd\n", retval);

    lseek(fd, 0, SEEK_SET);              /* rewind before reading back */
    readiovobj.iov_base = obj;
    readiovobj.iov_len = sizeof(Emp);
    retval = readv(fd, &readiovobj, 1);
    printf("%zd\n", retval);
    return 0;
}

9. pread, pwrite

#define _XOPEN_SOURCE 500

#include <unistd.h>

ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);

pread, pwrite - read from or write to a file descriptor at a given offset

pread() reads up to count bytes from file descriptor fd at offset offset (from the start of the file) into the buffer starting at buf. The file offset is not changed.

pwrite() writes up to count bytes from the buffer starting at buf to the file descriptor fd at offset offset. The file offset is not changed.

The file referenced by fd must be capable of seeking.

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1, fd2;
    ssize_t n;
    char ch[1024];

    if ((fd1 = open("/etc/passwd", O_RDONLY)) == -1) {
        perror("Unable to open source");
        exit(1);
    }
    n = pread(fd1, ch, 100, 100);       /* up to 100 bytes starting at offset 100 */
    if (n > 0)
        printf("%.*s\n", (int)n, ch);   /* ch is not NUL-terminated */
    close(fd1);

    if ((fd2 = open("newfile", O_WRONLY | O_CREAT, 0666)) == -1) {
        perror("Unable to open target");
        exit(1);
    }
    /* Both writes go to offset 500, so the second overwrites the first */
    pwrite(fd2, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", 40, 500);
    pwrite(fd2, "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY", 40, 500);
    close(fd2);
    return 0;
}


Day 3 Morning

7. Process Management
* Process Defined
* Process Descriptor Structures in the kernel
* Process States
* Process Scheduling
* Process Creation
* System calls related to process management


- A program is a file in the file system.
- A process is object code in execution: an active, alive, running program.
- Processes are more than just object code; they consist of data, resources, state, and a virtualized computer.
- A process uses many resources, such as memory space, CPU time, and files, during its lifetime.
- A process contains threads, belongs to a process group, and has a parent process. A process group belongs to a session. A session may have a controlling terminal (tty), and at most one process group, the foreground process group, is attached to that terminal at a time. The remaining process groups of the session are background process groups.
- A process is a program that the kernel schedules onto a processor for execution. The main thread of a process is the actual entity that gets scheduled on the CPU. The kernel maintains a separate copy of the registers and various other data structures for each process.
- In a multiprocessing environment, the register values saved in the context of a process are loaded into the actual registers when its execution resumes.
- A process is an entry in the task vector, and is an instance of task_struct.

Process Structure
* Every process is represented by a task_struct data structure.
* This structure is quite large and complex.
* Whenever a new process is created, the kernel creates a new task_struct, and the complete process information is maintained in that structure.
* When a process is terminated, the corresponding structure is removed.
* The structures are linked together in a doubly linked list.


* Solaris uses proc structure to manage processes.

task_struct task[256];

struct task_struct {
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;
    atomic_t usage;
    unsigned int flags;     /* per process flags, defined below */
    unsigned int ptrace;

#ifdef CONFIG_SMP
    struct llist_node wake_entry;
    int on_cpu;
    struct task_struct *last_wakee;
    unsigned long wakee_flips;
    unsigned long wakee_flip_decay_ts;

    int wake_cpu;
#endif
    int on_rq;

    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    const struct sched_class *sched_class;
    struct sched_entity se;
    struct sched_rt_entity rt;

#ifdef CONFIG_CGROUP_SCHED
    struct task_group *sched_task_group;
#endif

#ifdef CONFIG_PREEMPT_NOTIFIERS
    /* list of struct preempt_notifier: */
    struct hlist_head preempt_notifiers;
#endif

    /*
     * fpu_counter contains the number of consecutive context switches
     * that the FPU is used. If this is over a threshold, the lazy fpu
     * saving becomes unlazy to save the trap. This is an unsigned char
     * so that after 256 times the counter wraps and the behavior turns
     * lazy again; this to deal with bursty apps that only use FPU for
     * a short time
     */
    unsigned char fpu_counter;
#ifdef CONFIG_BLK_DEV_IO_TRACE
    unsigned int btrace_seq;
#endif
    unsigned int policy;
    int nr_cpus_allowed;
    cpumask_t cpus_allowed;

#ifdef CONFIG_PREEMPT_RCU
    int rcu_read_lock_nesting;
    char rcu_read_unlock_special;
    struct list_head rcu_node_entry;
#endif /* #ifdef CONFIG_PREEMPT_RCU */
#ifdef CONFIG_TREE_PREEMPT_RCU
    struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
#ifdef CONFIG_RCU_BOOST
    struct rt_mutex *rcu_boost_mutex;
#endif /* #ifdef CONFIG_RCU_BOOST */

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
    struct sched_info sched_info;
#endif

    struct list_head tasks;
#ifdef CONFIG_SMP
    struct plist_node pushable_tasks;
#endif

    struct mm_struct *mm, *active_mm;
#ifdef CONFIG_COMPAT_BRK
    unsigned brk_randomized:1;
#endif
#if defined(SPLIT_RSS_COUNTING)
    struct task_rss_stat rss_stat;
#endif
    /* task state */
    int exit_state;
    int exit_code, exit_signal;
    int pdeath_signal;      /* The signal sent when the parent dies */
    unsigned int jobctl;    /* JOBCTL_*, siglock protected */
    ....

In order to run UNIX, the computer hardware must provide two modes of execution:
- kernel mode
- user mode

Some computers have more than two execution modes.
- For example, Intel x86 processors provide four privilege levels (rings 0 to 3).

Each process has virtual address space; references to virtual memory are translated to physical memory locations using set of address translation maps.

Process States


Scheduling (Kernel perspective)
* The kernel keeps track of a process's creation time as well as the CPU time that it consumes during its lifetime.
* The system clock is a combination of software and hardware setup.
* It is independent of CPU frequency.
* The unit of a clock tick is the jiffy. The system's interactive response depends on the clock frequency.
- For example, a jiffy may be 10 ms (100 Hz) or 1 ms (1000 Hz), depending on the implementation.
* On each clock tick, the kernel updates the amount of time that the current process has spent in system mode and in user mode.
* Linux also supports process-specific interval timers: processes can use system calls to set up timers that send signals to themselves when the timers expire. These timers can be single-shot or periodic.

Process Scheduling
* The job of a scheduler is to select the most deserving process to run out of all of the runnable processes in the run queue.
* Implement fair scheduling to avoid starvation.
* Implement a suitable scheduling policy.
* Update the state of the processes on every clock tick (jiffy).

Policy
* FIFO, Round Robin, Shortest Job First, FILO, priority based, etc.
* Priority - a higher priority process will be allowed to run.
* Pre-emptive and non-preemptive scheduling.
* rt_priority - many UNIX variants support a real-time scheduling priority range.

Priority Range
Scheduling priorities (in a typical UNIX system) have integer values between 0 and 127, with smaller numbers meaning higher priorities.
* For Solaris: 0 to 169
* For Linux: 0 to 139

Process Scheduling: Linux
* The Linux kernel implements two separate priority ranges.
* The first is the nice value, a number from -20 to 19 with a default of zero. Larger nice values correspond to a lower priority.
* A process with a nice value of -20 receives the maximum time slice, whereas a process with a nice value of 19 receives the minimum time slice.
* Time slice: minimum 10 ms, default 150 ms, maximum 300 ms.
* The second range is the real-time priority.
* By default, it ranges from zero to 99.
* All real-time processes are at a higher priority than normal processes.
* Linux implements real-time priorities in accordance with POSIX.

* Linux provides two real-time scheduling policies, SCHED_FIFO and SCHED_RR.
* The normal, non-real-time scheduling policy is SCHED_OTHER.
* SCHED_FIFO works without time slices, so a process can run until it blocks, explicitly yields the processor, or is preempted by a higher-priority real-time process.
* SCHED_RR is identical to SCHED_FIFO except that each process can only run until it exhausts a predetermined time slice.

Scheduler System Calls
nice()                    Set a process's nice value
sched_setscheduler()      Set a process's scheduling policy
sched_getscheduler()      Get a process's scheduling policy
sched_setparam()          Set a process's real-time priority
sched_getparam()          Get a process's real-time priority
sched_get_priority_max()  Get the maximum real-time priority
sched_get_priority_min()  Get the minimum real-time priority
sched_rr_get_interval()   Get a process's timeslice value

Process Creation
A parent process creates child processes, which in turn create other processes, forming a tree of processes.

Resource sharing
* Parent and children share all resources.
* Children share a subset of the parent's resources.
* Parent and child share no resources.

Execution
* Parent and children execute concurrently.
* Parent waits until children terminate.

Address space
* Child is a duplicate of the parent.
* Child has a program loaded into it.

fork()
* pid_t fork(void); creates a new process.
* All statements after the fork() system call in a program are executed by two processes: the original process that called fork(), plus the new process that is created by fork().

int main(void)
{
    printf("Hello fork: %d\n", fork());
}

- Hello fork: 0 (printed by the child)
- Hello fork: x, with x > 0 (printed by the parent; x is the child's PID)
- Hello fork: -1 (printed once, only if fork() failed)

Parent and Child

if (!fork()) {
    /* child code */
} else {
    /* parent code */
    wait(0);    /* or */ waitpid(pid, ....);
}

Zombie State and Orphan Process
* When a child process exits, it has to give its exit status to the parent process.
* If the parent process is busy or suspended and has not yet called wait(), the child cannot be removed completely: its process table entry is kept until the status has been collected.
* This state is called zombie.
* If the parent exits before the child, the child becomes an orphan process; the init process adopts it and takes care of it.

Copy on Write (COW)
* Instead of copying the address space of the parent, UNIX uses the COW technique for economical use of memory pages.
* The parent's address space is not copied; it is shared by both the parent and the child process, with the memory pages marked as write-protected.
* If the parent or the child wants to modify a page, the kernel makes a private copy of that page for the writing process.
* Advantage: the kernel can defer or avoid copying the parent's address space.

execl
To run a new program in a process, you use one of the "exec" family of calls (such as "execl") and specify the following:
* the pathname of the program to run
* the name of the program
* each parameter to the program
* (char *)0 or NULL as the last parameter to specify the end of the parameter list

exec Family
int execl(const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg, ..., char *const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);

All the above library functions internally call the execve system call.

int execve(const char *filename, char *const argv[], char *const envp[]);


The user context consists of the portions accessible to the process while running in user mode.

Text Portion
* The text portion of a process contains the actual machine instructions that are executed by the hardware.
* When a program is executed by the OS, the text portion is read into memory from its disk file, unless the OS supports shared text and a copy of the program is already being executed.

Data Portion
* The data portion contains the program's data. It is possible for this to be divided into 3 pieces.
* Initialized read-only data contains elements that are initialized by the program and are read-only while the process is executing.
* Initialized read-write data contains data elements that are initialized by the program and may have their values modified during execution of the process.
* Uninitialized data contains data elements that are not initialized by the program but are set to zero before execution starts.

Stack Portion
* The heap is used while a process is running to allocate more data space dynamically to the process.
* The stack is used dynamically while the process is running to contain the stack frames that are used by many programming languages.
* The stack frames contain the return address linkage for each function call and also the data elements required by a function.
* A gap is shown between heap and stack to indicate that many operating systems leave some room between these 2 portions, so that both can grow dynamically.

Kernel Context
* The kernel context of a process is maintained by, and accessible only to, the kernel. This area contains information that the kernel needs to keep track of the process and to stop and restart the process while other processes are allowed to execute.

Daemon Process

Introduction
* Daemon processes start during system startup.
- Mostly started by the initialization script /etc/rc
* They frequently spawn other processes to handle service requests.
* They wait for an event to occur.
* They perform some specified task on a periodic basis (cron job).
* They perform the requested service and wait.
- Example: print server

Characteristics
* Execute as background processes
* Orphan processes
* No controlling terminal
* Run with superuser privileges
* Process group leaders
* Session leaders


How to daemonize
1. Call umask to set the file mode creation mask to a known value, usually 0.
2. Call fork and have the parent exit. The child inherits the process group ID of the parent but gets a new process ID, so we're guaranteed that the child is not a process group leader. This is a prerequisite for the call to setsid that is done next.
3. Call setsid to create a new session. Three things happen: the process (a) becomes the leader of a new session, (b) becomes the leader of a new process group, and (c) is disassociated from its controlling terminal.
4. Change the current working directory to the root directory. The current working directory inherited from the parent could be on a mounted file system, which could then not be unmounted while the daemon runs.
5. Unneeded file descriptors should be closed. This prevents the daemon from holding open any descriptors that it may have inherited from its parent (which could be a shell or some other process).
6. Some daemons open file descriptors 0, 1, and 2 to /dev/null so that any library routines that try to read from standard input or write to standard output or standard error will have no effect.

$ ps -axj    # lists all processes; daemons have no controlling terminal (TTY column shows ?)

#include "apue.h"       /* helper header from the APUE book; provides err_quit() */
#include <syslog.h>
#include <fcntl.h>
#include <sys/resource.h>

void
daemonize(const char *cmd)
{
    int i, fd0, fd1, fd2;
    pid_t pid;
    struct rlimit rl;
    struct sigaction sa;

    /*
     * Clear file creation mask.
     */
    umask(0);

    /*
     * Get maximum number of file descriptors.
     */
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        err_quit("%s: can't get file limit", cmd);

    /*
     * Become a session leader to lose controlling TTY.
     */
    if ((pid = fork()) < 0)
        err_quit("%s: can't fork", cmd);
    else if (pid != 0) /* parent */
        exit(0);
    setsid();

    /*
     * Ensure future opens won't allocate controlling TTYs.
     */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        err_quit("%s: can't ignore SIGHUP", cmd);
    if ((pid = fork()) < 0)
        err_quit("%s: can't fork", cmd);
    else if (pid != 0) /* parent */
        exit(0);

    /*
     * Change the current working directory to the root so
     * we won't prevent file systems from being unmounted.
     */
    if (chdir("/") < 0)
        err_quit("%s: can't change directory to /", cmd);

    /*
     * Close all open file descriptors.
     */
    if (rl.rlim_max == RLIM_INFINITY)
        rl.rlim_max = 1024;
    for (i = 0; i < rl.rlim_max; i++)
        close(i);

    /*
     * Attach file descriptors 0, 1, and 2 to /dev/null.
     */
    fd0 = open("/dev/null", O_RDWR);
    fd1 = dup(0);
    fd2 = dup(0);

    /*
     * Initialize the log file.
     */
    openlog(cmd, LOG_CONS, LOG_DAEMON);
    if (fd0 != 0 || fd1 != 1 || fd2 != 2) {
        syslog(LOG_ERR, "unexpected file descriptors %d %d %d",
               fd0, fd1, fd2);
        exit(1);
    }
}

1. wait, waitpid

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);

wait, waitpid - wait for process to change state


A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal. In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state.

If a child has already changed state, then these calls return immediately. Otherwise they block until either a child changes state or a signal handler interrupts the call (assuming that system calls are not automatically restarted using the SA_RESTART flag of sigaction(2)).

The call wait(&status) is equivalent to:

waitpid(-1, &status, 0);

The value of pid can be:
< -1  meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1    meaning wait for any child process.
0     meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0   meaning wait for the child whose process ID is equal to the value of pid.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int i = 0;
    pid_t pid;

    printf("Ready to fork\n");
    pid = fork();
    if (pid == 0) {
        printf("Child starts\n");
        for (i = 0; i < 1000; i++)
            printf("%d\t", i);
        printf("Child ends\n");
        // sleep(30);  // uncomment this to get an orphaned child process
    } else {
        wait(0);       // comment this out and sleep instead to get a zombie child process
        printf("Parent process\n");
    }
    return 0;
}

2. exec

#include <unistd.h>

extern char **environ;

int execl(const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg, ..., char * const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);

execl, execlp, execle, execv, execvp - execute a file

execl

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid == 0) {
        printf("Exec starts\n");
        execl("/bin/ls", "ls", "-l", (char *)0);
        printf("Execl did not work\n");    /* reached only if execl() fails */
    } else {
        wait(0);
        printf("Parent: ls completed in child\n");
    }
    return 0;
}

execv

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *temp[4];

    temp[0] = "ls";
    temp[1] = "-l";
    temp[2] = (char *)0;

    execv("/bin/ls", temp);
    printf("This will not print\n");       /* unless execv() fails */
    return 1;
}


Day 3 Morning

8. Memory Management
* Defining and Creating secondary memory areas
* Memory allocation & deallocation system calls malloc, calloc, alloca, free
* Virtual Memory Management
* Address Translation and page fault handling
* Demand Paging defined
* Process Organization in Memory

Factors to be considered while designing secondary memory

Latency, Throughput and Bandwidth
Latency - Amount of time for a single operation to execute.
Throughput - Rate at which operations get executed, normally expressed as operations/second. In sequential processing, throughput = 1 / latency.
Bandwidth - Total rate at which data moves between processor and memory. It is the product of throughput and data width.

Pipelining, Parallelism and Pre-charging
Memory systems can be pipelined much as processors are pipelined, allowing operations to overlap execution to improve throughput.
Many memory technologies require a certain delay (idle time) between operations to pre-charge circuitry for the next access.
Attaching multiple memories to the processor's memory bus allows parallelism. This increases the rate at which memory is accessed without increasing the pin count of the processor.

Two kinds of systems support parallelism: replicated and banked.
Replicated memory provides multiple copies of the entire memory. A store needs to write into all copies, so stores are more expensive than loads.
Banked memory - Data is divided or interleaved across memories.

Example:
What is the bandwidth of a memory system with a latency of 40 ns that transfers 1 byte per operation and is pipelined to allow 4 operations to overlap execution (assume no pipelining overhead)?

Dividing latency 40 ns by number of overlapped operations ( 4 ) gives a rate of 1 operation per 10 ns as the throughput of the memory system. At 1 byte of data per operation, this gives a bandwidth of 100 Mbyte/sec.


Levels in the Memory Hierarchy

Cache:
1. Generally implemented using SRAM.
2. Uses hardware to keep track of the addresses stored in it.
3. Tends to be small (capacity).
4. Small block sizes (32 to 128 bytes).

Main Memory:
1. Generally implemented using DRAM.
2. Uses software to keep track of addresses.
3. Larger capacity (a few MB to several gigabytes).
4. Larger block sizes (several kilobytes).

Virtual Memory:
1. Implemented using disks.
2. Contains all of the data in the memory system.

Some terminology...
Hit: When an address is found at a given level of the hierarchy.
Miss: When an address is NOT found at a given level of the hierarchy.
Hit Rate: % of references that reach a given level and result in hits.
Miss Rate: % of references that reach a given level and result in misses.
Note: Hit Rate + Miss Rate = 100% ALWAYS.

When a miss occurs, a BLOCK of data is brought in from a lower level into the current level of the hierarchy. As time progresses, the current level may fill up and run out of free space. A block must be removed to accommodate the new block. This is called eviction or replacement. The method used to decide which block to remove is called the replacement policy.

To simplify evicting data blocks, many memory systems maintain a property called inclusion. The presence of an address at a given level of a memory hierarchy GUARANTEES that the address is present in ALL LOWER LEVELS of the memory system.

Computing average access times in a memory hierarchy...

If we know the hit rate and access time (time to complete a request that hits) for each level in the hierarchy, we can compute the average access time of the memory hierarchy. For each level in the hierarchy, the average access time is

(T_hit x P_hit) + (T_miss x P_miss)

where
T_hit  = time to resolve requests that hit in the level
P_hit  = hit rate of the level, expressed as a probability
T_miss = average access time of the level below this one
P_miss = miss rate of the level

Since the hit rate of the lowest level is 100%, we start at the bottom and compute the average access time of each level upwards in the hierarchy.


Example:
A memory system contains a cache, a DRAM and a virtual store. The access time of the cache is 5 ns with a hit rate of 80%, whereas the access time of the DRAM is 100 ns with a 99.5% hit rate. The access time of the virtual store is 10 ms. What is the average access time of the hierarchy?

We start at the bottom and work upwards:
The hit rate of the virtual store is always 100%.
Average access time for requests that reach the DRAM
= ( 100 ns x 0.995 ) + ( 10 ms x 0.005 ) = 99.5 ns + 50,000 ns = 50,099.5 ns
Average access time for requests that reach the cache ( which is ALL REQUESTS )
= ( 5 ns x 0.80 ) + ( 50,099.5 ns x 0.20 ) = 4 ns + 10,019.9 ns = 10,023.9 ns, or about 10,024 ns
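The bottom-up calculation can also be written as a short C routine. This is only a sketch: the struct mem_level type and the function name average_access_ns are made up for illustration and do not come from the course material.

```c
#include <stddef.h>

/* One level of the hierarchy; levels are listed from fastest (index 0)
 * to slowest (last index). Hypothetical type, for illustration only. */
struct mem_level {
    double access_ns;  /* time to resolve a request that hits in this level */
    double hit_rate;   /* probability of a hit, between 0.0 and 1.0 */
};

/* Average access time seen by requests entering level 0.
 * The last level is treated as always hitting, so its hit_rate is ignored. */
double average_access_ns(const struct mem_level *levels, size_t count)
{
    if (count == 0)
        return 0.0;

    /* Start at the bottom: its average access time is its own access time. */
    double avg = levels[count - 1].access_ns;

    /* Walk upwards: T_avg = T_hit * P_hit + T_miss * P_miss,
     * where T_miss is the average of the level below. */
    for (size_t i = count - 1; i-- > 0; ) {
        avg = levels[i].access_ns * levels[i].hit_rate
            + avg * (1.0 - levels[i].hit_rate);
    }
    return avg;
}
```

Called with the three levels of the example ( 5 ns at 80%, 100 ns at 99.5%, 10,000,000 ns ), the routine returns roughly 10,023.9 ns, matching the hand calculation.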

SRAM and DRAM Chips

These have the same basic structure ( shown in next slide ).

Data is stored in a rectangular array of bit cells, each holding 1 bit. To read data from the array, half of the address to be read ( generally the high order bits ) is fed into a decoder. The decoder asserts ( drives high ) the word line corresponding to the value of its input bits, which causes all of the bit cells in the corresponding row to drive their values onto the bit lines that they are connected to.

The other half of the address is then used as an input to a multiplexer that selects the appropriate bit line and drives its output onto the output pins of the chip.

To store data on the chip, the same process is used, except the value to be written is driven on the appropriate bit line and written into the selected bit cell.


Day 3 Morning

8. Memory Management
Defining and Creating secondary memory areas
Memory allocation & deallocation system calls malloc, calloc, alloca, free
Virtual Memory Management
Address Translation and page fault handling
Demand Paging defined
Process Organization in Memory


#include <stdlib.h>

void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);

The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().

The free() function frees the memory space pointed to by ptr, which must have been returned by a previous call to malloc(), calloc(), or realloc().

The calloc() function allocates memory for an array of nmemb elements of size bytes each and returns a pointer to the allocated memory. The memory is set to zero. If nmemb or size is 0, then calloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().

The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes.
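As a small illustration of these calls working together, the sketch below grows a heap array with realloc() without losing the original block on failure. The helper name grow_int_array is hypothetical and chosen only for this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Grow a heap-allocated int array from old_count to new_count elements.
 * Returns the (possibly moved) array, or NULL on failure. On failure the
 * original block is untouched and still owned by the caller.
 * new_count must be nonzero. */
int *grow_int_array(int *arr, size_t old_count, size_t new_count)
{
    /* Reject zero and sizes whose byte count would overflow size_t. */
    if (new_count == 0 || new_count > SIZE_MAX / sizeof *arr)
        return NULL;

    /* Assign to a temporary so arr is not lost if realloc() fails. */
    int *tmp = realloc(arr, new_count * sizeof *arr);
    if (tmp == NULL)
        return NULL;

    /* realloc() leaves any added tail uninitialized, so zero it here. */
    if (new_count > old_count)
        memset(tmp + old_count, 0, (new_count - old_count) * sizeof *tmp);

    return tmp;
}
```

A typical use is to allocate the first block with calloc(), call grow_int_array() when more room is needed, and release the final pointer with a single free().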

#include <alloca.h>

void *alloca(size_t size);

DESCRIPTION
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
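A minimal sketch of alloca() used for scratch space follows. The function is_palindrome is an invented example, not part of the course code. Note that alloca() gives no error indication, so a request that is too large simply overruns the stack; it is only suitable for small, bounded sizes.

```c
#include <alloca.h>
#include <string.h>

/* Returns 1 if s reads the same forwards and backwards, 0 otherwise.
 * The reversed copy lives in the caller's stack frame and disappears
 * automatically when this function returns; no free() is needed. */
int is_palindrome(const char *s)
{
    size_t n = strlen(s);
    char *rev = alloca(n + 1);      /* stack scratch buffer */

    for (size_t i = 0; i < n; i++)
        rev[i] = s[n - 1 - i];
    rev[n] = '\0';

    return strcmp(s, rev) == 0;
}
```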

Virtual Memory

Each program has a virtual address space, which is the set of addresses that the program uses for load and store operations.

The physical address space is the set of addresses used to reference locations in main memory.

The virtual address space is divided into pages some of which reside inside a page frame ( slots in main memory ) while others reside on the disk. Pages are always aligned on a multiple of the page size so that the addresses never overlap.

The terms virtual page and physical page are used to describe a page of data in the virtual and physical address spaces respectively.

Pages that have been loaded into main memory are said to have been mapped.

Virtual memory allows a computer to act as if its main memory were much larger than it actually is.


When a program references a virtual address, it cannot tell, except by timing the latency of the operation, whether the virtual address was resident in the main memory or whether it had to be fetched from disk.

This makes it possible for the computer to shuffle pages in and out of the main memory exactly like data is brought in and out of the cache.

Address Translation

Programs running on systems with Virtual Memory use Virtual Addresses as the arguments to load and store instructions.

The main memory uses Physical Addresses to record locations where data is actually stored.

Whenever a program uses a Virtual Address, this must be converted into a Physical Address and this process is known as Address Translation.

When a program accesses a memory location, the O.S accesses a Page Table, which is a data structure that contains the mapping of the virtual address to the physical address.

If the virtual page is mapped ( present in memory ) then the physical address is retrieved and the operation proceeds.

If the virtual page is NOT mapped, then a page fault occurs and the O.S fetches the page from the hard disk, loading it into a page frame, and updating the page table with the new translation. Once the page has been read into memory from disk, and the page table updated, the physical address of the page can be determined and the memory reference completed.

If all the page frames already contain data, one of them must be evicted to the disk to make room for the incoming data. The replacement policies used to select the page that is evicted are similar to the ones for set-associative caches.
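The fault-and-evict sequence can be sketched as a toy simulation. Everything here is an assumption made for illustration: the table sizes, the names frame_for_page and handle_page_fault, and the simple FIFO choice of victim. A real kernel also moves the page contents to and from disk, which this sketch leaves out.

```c
#define NUM_VPAGES 8   /* virtual pages in the toy address space */
#define NUM_FRAMES 2   /* physical page frames available */

struct pte {
    int valid;         /* 1 if the virtual page is mapped */
    unsigned ppn;      /* frame number, meaningful only when valid */
};

static struct pte page_table[NUM_VPAGES];
static int frame_owner[NUM_FRAMES] = { -1, -1 };  /* VPN held by each frame */
static unsigned next_victim;                      /* FIFO replacement pointer */
static unsigned fault_count;

/* Stand-in for the OS page fault handler: choose a frame, unmap whatever
 * page was in it, and record the new translation in the page table. */
static void handle_page_fault(unsigned vpn)
{
    unsigned frame = next_victim;
    next_victim = (next_victim + 1) % NUM_FRAMES;

    if (frame_owner[frame] >= 0)
        page_table[frame_owner[frame]].valid = 0;  /* evicted page */

    frame_owner[frame] = (int)vpn;
    page_table[vpn].ppn = frame;
    page_table[vpn].valid = 1;
    fault_count++;
}

/* Return the frame holding virtual page vpn (vpn < NUM_VPAGES),
 * faulting the page in first if it is not mapped. */
unsigned frame_for_page(unsigned vpn)
{
    if (!page_table[vpn].valid)
        handle_page_fault(vpn);
    return page_table[vpn].ppn;
}
```

With two frames, touching pages 1, 2 and 3 in turn causes three faults, and the third fault evicts page 1, so a later access to page 1 faults again.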


Because both virtual and physical pages are always aligned on a multiple of their size, the page table does not need to keep track of the full virtual or physical address of a page that is mapped. Instead virtual addresses are divided into a Virtual Page Number or VPN and a set of bits that describe an offset from the start of the virtual page to the virtual address. Similarly, the physical pages are divided into Physical Page Numbers or PPN and an offset from the start of the physical page to the physical address.

The virtual and physical pages in a given system are generally the same size, so the number of bits ( log2 of the page size ) used for the offset is the same in virtual and physical addresses.

The VPN and PPN may be of different lengths. For example, on 64-bit systems, the virtual addresses are generally much longer than physical addresses.

The page table is accessed using the virtual page frame number as an offset.

Virtual page frame 5 would be the 6th element of the table (0 is the first element).

To translate a virtual address into a physical one, the processor must first work out the virtual address's page frame number and the offset within that virtual page. By making the page size a power of 2 this can be easily done by masking and shifting. Assuming a page size of 0x2000 bytes ( which is decimal 8192 ) and an address of 0x2194 in process Y's virtual address space, the processor would translate that address into offset 0x194 in virtual page frame number 1.
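The masking and shifting can be sketched in C as below. The type vaddr_parts and the function split_vaddr are illustrative names only; real processors do this in hardware.

```c
#include <stdint.h>

/* The two halves of a virtual address (hypothetical type). */
struct vaddr_parts {
    uint64_t vpn;     /* virtual page frame number */
    uint64_t offset;  /* byte offset inside the page */
};

/* Split vaddr for a given page size. page_size must be a power of two. */
struct vaddr_parts split_vaddr(uint64_t vaddr, uint64_t page_size)
{
    struct vaddr_parts p;
    unsigned shift = 0;

    /* shift = log2(page_size), the number of offset bits */
    while ((page_size >> shift) > 1)
        shift++;

    p.vpn = vaddr >> shift;               /* shifting drops the offset bits */
    p.offset = vaddr & (page_size - 1);   /* masking keeps only the offset bits */
    return p;
}
```

For the values in the text, split_vaddr(0x2194, 0x2000) gives page frame number 1 and offset 0x194.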


V: Valid, if set this PTE is valid.
FOE: "Fault on Execute". Whenever an attempt to execute instructions in this page occurs, the processor reports a page fault and passes control to the operating system.
FOW: "Fault on Write", as above but page fault on an attempt to write to this page.
FOR: "Fault on Read", as above but page fault on an attempt to read from this page.
ASM: Address Space Match. This is used when the operating system wishes to clear only some of the entries from the Translation Buffer.
KRE: Code running in kernel mode can read this page.
URE: Code running in user mode can read this page.
GH: Granularity hint used when mapping an entire block with a single Translation Buffer entry rather than many.
KWE: Code running in kernel mode can write to this page.
UWE: Code running in user mode can write to this page.
page frame number: For PTEs with the V bit set, this field contains the physical Page Frame Number for this PTE. For invalid PTEs, if this field is not zero, it contains information about where the page is in the swap file.

The following two bits are defined and used by Linux:
_PAGE_DIRTY: if set, the page needs to be written out to the swap file.
_PAGE_ACCESSED: Used by Linux to mark a page as having been accessed.

TLB, Translation Lookaside Buffers

A major disadvantage of using page tables is that a page table must be accessed for every memory reference. On a system with a single-level page table, this doubles the number of memory accesses, since each load or store operation requires one memory reference to access the appropriate page table and one to perform the actual load/store. This greatly increases the latency of a memory reference.

The problem is even greater with multi-level page tables, because multiple references are required to traverse the page table. To reduce this penalty, CPUs that incorporate virtual memory use Translation Lookaside Buffers ( TLBs ) that act as caches for the page table. Whenever a program performs a memory reference, the virtual address is sent to the TLB to determine if it contains a translation for that address. If so, the TLB returns the physical address and the memory reference continues.

If not, a TLB miss occurs and the system searches the page table for a translation. Some systems provide hardware support for a TLB miss while others require the OS to access the page table through software.

TLB misses versus Page Faults

In a system that supports TLBs, 3 possible cases exist:


1. Hit in the TLB: The TLB contains the physical address and it is returned immediately.
2. TLB miss, but page mapped: In this case the system accesses the page table in memory to find the translation for the virtual address, copies that translation into the TLB, and completes the memory reference.
3. TLB miss and page not mapped: The system accesses the page table and finds that the page is not mapped. This results in a page fault. The O.S loads the page's data from disk in the same manner as a virtual memory system that does not contain a TLB.
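The three cases can be modelled with a toy lookup routine. This is a sketch under stated assumptions: the TLB is direct-mapped with four entries purely to keep the code short, and the names lookup, tlb and page_table are invented for the example.

```c
#include <stdint.h>

#define TLB_ENTRIES 4
#define NUM_PAGES   16

enum outcome { TLB_HIT, TLB_MISS_MAPPED, PAGE_FAULT };

struct tlb_entry { int valid; uint32_t vpn; uint32_t ppn; };
struct pt_entry  { int valid; uint32_t ppn; };

static struct tlb_entry tlb[TLB_ENTRIES];
static struct pt_entry  page_table[NUM_PAGES];

/* Look up virtual page vpn (vpn < NUM_PAGES). On TLB_HIT or
 * TLB_MISS_MAPPED the frame number is stored in *ppn. */
enum outcome lookup(uint32_t vpn, uint32_t *ppn)
{
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

    /* Case 1: the TLB already holds this translation. */
    if (e->valid && e->vpn == vpn) {
        *ppn = e->ppn;
        return TLB_HIT;
    }

    /* Case 2: TLB miss, but the page table has a mapping.
     * Copy it into the TLB so the next access hits. */
    if (page_table[vpn].valid) {
        e->valid = 1;
        e->vpn = vpn;
        e->ppn = page_table[vpn].ppn;
        *ppn = e->ppn;
        return TLB_MISS_MAPPED;
    }

    /* Case 3: not mapped at all; the OS would now load the page. */
    return PAGE_FAULT;
}
```

The first access to a mapped page reports TLB_MISS_MAPPED and a repeat access reports TLB_HIT, unless another page that falls in the same TLB slot has displaced the entry in between.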

TLB misses and page faults are handled very differently by the O.S because of the difference in the amount of time it takes to resolve each event.

TLB misses generally take a short time to resolve if the page is mapped, normally a few hundred cycles, so the user program can simply wait for completion.

TLB misses that result in a page fault can take a few milliseconds which is the amount of time slice generally given to a process. Therefore, a page fault can trigger a context switch through invoking the scheduler while the page fault is being resolved.

TLB Entry

TLBs are organized similarly to caches, with an associativity and a number of sets. While cache sizes are typically described in bytes, TLB sizes are given as the number of entries or translations they contain, since the amount of space taken up by each entry is mostly irrelevant to the performance of the system.

Thus a 128-entry, 4-way set-associative TLB would have 32 sets, each containing 4 entries.

The TLB entry contains the VPN of the page that it is a translation for, which is compared to the VPN of the address of a memory reference to determine if a hit has occurred.

Like a cache's tag array entry, bits of the VPN used to select an entry in the TLB are omitted to save space. All the bits of the PPN are stored however, since they may differ from the corresponding bits in the VPN.
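The arithmetic behind sets and stored VPN bits can be sketched as follows. The function names are invented for the example, and using the low-order VPN bits as the set index is an assumption borrowed from the usual cache layout; actual hardware may index differently.

```c
/* Number of sets in a set-associative TLB. */
unsigned tlb_num_sets(unsigned entries, unsigned ways)
{
    return entries / ways;
}

/* Set selected by a VPN, assuming the low-order VPN bits form the index. */
unsigned long tlb_set_index(unsigned long vpn, unsigned num_sets)
{
    return vpn % num_sets;
}

/* The remaining VPN bits, which are what each entry has to store
 * and compare; the index bits are implied by the set itself. */
unsigned long tlb_tag(unsigned long vpn, unsigned num_sets)
{
    return vpn / num_sets;
}
```

For the 128-entry, 4-way example, tlb_num_sets(128, 4) is 32, so five VPN bits pick the set and only the higher bits need to be kept in each entry.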

Demand Paging

As there is much less physical memory than virtual memory, the operating system must be careful that it does not use the physical memory inefficiently. One way to save physical memory is to only load virtual pages that are currently being used by the executing program.

This technique of only loading virtual pages into memory as they are accessed is known as demand paging.


When a process attempts to access a virtual address that is not currently in memory, the processor cannot find a page table entry for the virtual page referenced. For example, in the previous figure there is no entry in process X's page table for virtual page frame number 2, so if process X attempts to read from an address within virtual page frame number 2 the processor cannot translate the address into a physical one. At this point the processor notifies the operating system that a page fault has occurred.

If the faulting virtual address is invalid this means that the process has attempted to access a virtual address that it should not have. Maybe the application has gone wrong in some way, for example writing to random addresses in memory. In this case the operating system will terminate it, protecting the other processes in the system from this rogue process.

If the faulting virtual address was valid but the page that it refers to is not currently in memory, the operating system must bring the appropriate page into memory from the image on disk.

The fetched page is written into a free physical page frame and an entry for the virtual page frame number is added to the process's page table. The process is then restarted at the machine instruction where the memory fault occurred. This time the virtual memory access is made, the processor can make the virtual to physical address translation, and so the process continues to run.

Linux uses demand paging to load executable images into a process's virtual memory. Whenever a command is executed, the file containing it is opened and its contents are mapped into the process's virtual memory. This is done by modifying the data structures describing this process's memory map and is known as memory mapping.

However, only the first part of the image is actually brought into physical memory. The rest of the image is left on disk. As the image executes, it generates page faults and Linux uses the process's memory map to determine which parts of the image to bring into memory for execution.


Process Address Space


Day 4 Morning

9. Multi Thread Programming
Creating multiple threads
Parent synchronization with other Thread
System calls

Introduction
* A thread is a sequential flow of control through a program.
* If a process is defined as a program in execution, then a thread is defined as a function in execution.
* When a thread is created, it executes a specified function.
* Two types of threading:
- Single threading
- Multi threading

POSIX Thread
The created threads within a process share:
instructions of the process
process address space and data
open file descriptors
current working directory, uid and gid

Each created thread maintains its own:
thread identification number (tid)
pc, sp, set of registers
stack
signal mask ( the signal handlers themselves are shared by the whole process )
priority of the thread
scheduling policy

Advantages of Threads:
Compared with processes, threads take less time for:
* Creation of a new thread
* Termination of a thread
Communication between threads is also easier.

There are two broad categories of thread implementation:
1. User level threads (ULT)
2. Kernel level threads (KLT, also called kernel-supported threads or lightweight processes)

Thread management
With user level threads, thread management is done by the application and the kernel is not aware of the existence of threads.
* The thread library contains code for creating and destroying threads, passing messages and data between threads, scheduling thread execution, and saving and restoring thread contexts.
* All threads of the application belong to a single process managed by the kernel.
* All the activity takes place in user space and within a single process. The kernel continues to schedule the process as a unit and assigns a single execution state to that process.

ULT
Advantages:
* Thread switching does not require kernel mode.
* Scheduling can be application specific.
* Can run on any OS.
Disadvantages:
* When a thread executes a blocking system call, not only is that thread blocked, but all the threads within the process are blocked.

KLT
Kernel Level Threads:
* Thread management is done by the kernel
- Advantage: If one thread in a process is blocked, the kernel can schedule another thread of the same process.


- Disadvantage: Transfer of control from one thread to another within the same process requires a mode switch to the kernel

Advantages of Multi Threading
Improve application responsiveness
Use multiprocessors more efficiently
Improve program structure
Use fewer system resources
Specific applications in uniprocessor machines

Applications
A file server on a LAN
Graphical User Interfaces (GUIs)
Web applications

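The parent waits in pthread_join() for the child thread to finish.

Hello Thread Example

#include <stdio.h>
#include <pthread.h>

/* A thread start routine must take and return void * */
void *thread_function(void *arg)
{
    (void)arg;
    printf("Hello POSIX Thread\n");
    /* pthread_t is an opaque type; printing it as unsigned long works on Linux/glibc */
    printf("Thread id: %lu\n", (unsigned long)pthread_self());
    return NULL;
}

int main(void)
{
    pthread_t mythread;

    pthread_create(&mythread, NULL, thread_function, NULL);
    pthread_join(mythread, NULL);
    return 0;
}

$ cc thread.c -lpthread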

1. pthread_create

#include <pthread.h>

int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg);

The pthread_create() function shall create a new thread, with attributes specified by attr, within a process. If attr is NULL, the default attributes shall be used. If the attributes specified by attr are modified later, the thread's attributes shall not be affected. Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>

void *thread_fun(void *arg);

char message[] = "hello world";

int main(void)
{
    int res;
    pthread_t a_thread;
    void *thread_result;

    res = pthread_create(&a_thread, NULL, thread_fun, (void *)message);
    if (res != 0) {
        /* pthread functions return the error number instead of setting errno */
        fprintf(stderr, "unable to create thread: %s\n", strerror(res));
        exit(1);
    }
    printf("waiting for thread to finish\n");
    /* Thread joining, catch exit value from the thread */
    res = pthread_join(a_thread, &thread_result);
    if (res != 0) {
        fprintf(stderr, "unable to join thread: %s\n", strerror(res));
        exit(1);
    }
    printf("thread joined, it returned %s\n", (char *)thread_result);
    printf("Message is now %s\n", message);
    exit(0);
}

void *thread_fun(void *arg)
{
    printf("thread fun, arg is %s\n", (char *)arg);
    sleep(3);
    strcpy(message, "bye");
    /* exit with return value */
    pthread_exit("thank you");
}

2. pthread_key_create

#include <pthread.h>

int pthread_key_create(pthread_key_t *key, void (*destructor)(void*));

pthread_key_create - thread-specific data key creation

The pthread_key_create() function shall create a thread-specific data key visible to all threads in the process. Key values provided by pthread_key_create() are opaque objects used to locate thread-specific data. Although the same key value may be used by different threads, the values bound to the key by pthread_setspecific() are maintained on a per-thread basis and persist for the life of the calling thread.

Upon key creation, the value NULL shall be associated with the new key in all active threads. Upon thread creation, the value NULL shall be associated with all defined keys in the new thread.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* The key used to associate a log file pointer with each thread. */
static pthread_key_t thread_log_key;

/* Write MESSAGE to the log file for the current thread. */
void write_to_thread_log(const char *message)
{
    FILE *thread_log = (FILE *)pthread_getspecific(thread_log_key);
    if (thread_log != NULL)
        fprintf(thread_log, "%s\n", message);
}

/* Close the log file pointer THREAD_LOG. Called only for non-NULL values. */
void close_thread_log(void *thread_log)
{
    fclose((FILE *)thread_log);
}

void *thread_function(void *args)
{
    char thread_log_filename[40];
    FILE *thread_log;

    (void)args;
    /* Generate the filename for this thread's log file.
       pthread_t is opaque; the cast to unsigned long works on Linux/glibc. */
    snprintf(thread_log_filename, sizeof thread_log_filename,
             "thread%lu.log", (unsigned long)pthread_self());
    /* Open the log file. */
    thread_log = fopen(thread_log_filename, "w");
    if (thread_log == NULL)
        return NULL;
    /* Store the file pointer in thread-specific data under thread_log_key. */
    pthread_setspecific(thread_log_key, thread_log);
    write_to_thread_log("Thread starting.");
    /* Do work here... */
    return NULL;
}

int main(void)
{
    int i;
    pthread_t threads[5];

    /* Create a key to associate thread log file pointers in
       thread-specific data. Use close_thread_log to clean up the file
       pointers. */
    pthread_key_create(&thread_log_key, close_thread_log);

    /* Create threads to do the work. */
    for (i = 0; i < 5; ++i)
        pthread_create(&(threads[i]), NULL, thread_function, NULL);

    /* Wait for all threads to finish. */
    for (i = 0; i < 5; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}

3. pthread_mutex_init

#include <pthread.h>

int pthread_mutex_destroy(pthread_mutex_t *mutex);
int pthread_mutex_init(pthread_mutex_t *restrict mutex, const pthread_mutexattr_t *restrict attr);
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

The pthread_mutex_destroy() function shall destroy the mutex object referenced by mutex; the mutex object becomes, in effect, uninitialized. An implementation may cause pthread_mutex_destroy() to set the object referenced by mutex to an invalid value. A destroyed mutex object can be reinitialized using pthread_mutex_init(); the results of otherwise referencing the object after it has been destroyed are undefined.

It shall be safe to destroy an initialized mutex that is unlocked. Attempting to destroy a locked mutex results in undefined behavior.


The pthread_mutex_init() function shall initialize the mutex referenced by mutex with attributes specified by attr. If attr is NULL, the default mutex attributes are used; the effect shall be the same as passing the address of a default mutex attributes object. Upon successful initialization, the state of the mutex becomes initialized and unlocked.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void *thread_fun(void *arg);

pthread_mutex_t work_mutex;

char work_area[1024];
int time_to_exit = 0;

int main(void)
{
    int res;
    pthread_t a_thread;
    void *thread_result;

    res = pthread_mutex_init(&work_mutex, NULL);  /* initialize mutex, default attributes */
    if (res != 0) {
        fprintf(stderr, "mutex initialization failed\n");
        exit(1);
    }
    res = pthread_create(&a_thread, NULL, thread_fun, NULL);
    if (res != 0) {
        fprintf(stderr, "unable to create thread\n");
        exit(1);
    }
    pthread_mutex_lock(&work_mutex);  /* main thread holds the lock while it fills work_area */
    printf("input some text, enter end to finish\n");
    while (!time_to_exit) {
        if (fgets(work_area, 1024, stdin) == NULL)
            strcpy(work_area, "end");  /* treat end of input like "end" */
        /* unlock so the waiting worker thread can process the input */
        pthread_mutex_unlock(&work_mutex);
        while (1) {
            pthread_mutex_lock(&work_mutex);  /* lock again and check whether the worker is done */
            if (work_area[0] != '\0') {
                pthread_mutex_unlock(&work_mutex);
                sleep(1);
            } else
                break;
        }
    }
    pthread_mutex_unlock(&work_mutex);
    printf("waiting for thread to finish\n");
    res = pthread_join(a_thread, &thread_result);
    printf("thread joined, it returned %s\n", (char *)thread_result);
    pthread_mutex_destroy(&work_mutex);
    exit(0);
}

void *thread_fun(void *arg)
{
    (void)arg;
    sleep(1);  /* let the main thread read some data first */
    pthread_mutex_lock(&work_mutex);
    while (strncmp("end", work_area, 3) != 0) {
        /* strlen() returns size_t, so print it with %zu */
        printf("you entered %zu characters\n", strlen(work_area) - 1);
        work_area[0] = '\0';  /* mark the input as consumed */
        pthread_mutex_unlock(&work_mutex);
        sleep(1);  /* let the main thread do its job */
        pthread_mutex_lock(&work_mutex);
        while (work_area[0] == '\0') {
            pthread_mutex_unlock(&work_mutex);
            sleep(1);
            pthread_mutex_lock(&work_mutex);
        }
    }
    time_to_exit = 1;
    work_area[0] = '\0';
    pthread_mutex_unlock(&work_mutex);
    pthread_exit("thank you");
}
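For comparison, here is a shorter sketch of the most common use of a mutex: protecting a shared counter that several threads update. The names worker and run_workers and the constants are invented for this example. It uses the static initializer PTHREAD_MUTEX_INITIALIZER instead of pthread_mutex_init().

```c
#include <pthread.h>

#define NUM_WORKERS 4
#define INCREMENTS  100000

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter;   /* shared by all worker threads */

/* Each worker adds INCREMENTS to the shared counter, one step at a time. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_mutex);
        counter++;     /* critical section: read, add, write back */
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Start NUM_WORKERS threads, wait for all of them, and return the final
 * counter value, or -1 if a thread could not be created. */
long run_workers(void)
{
    pthread_t tids[NUM_WORKERS];
    int started = 0;

    counter = 0;
    for (; started < NUM_WORKERS; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == NUM_WORKERS ? counter : -1;
}
```

Because every increment happens with the mutex held, no update is lost and the result is always NUM_WORKERS x INCREMENTS. Without the lock and unlock calls the total would usually come out lower.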


Day 4 Morning

10. Inter process communication
Pipes
FIFOs
Signals
System-V IPCs
 - Message queues
 - Shared memory
 - Semaphore

Persistence of various ipcs

Unnamed Pipe or Pipe

On the command line a pipe is represented as "|".
* It can be used in the shell to link two or more commands
- For example ls -Rl | wc
* The two ends of a pipe are represented as a pair of descriptors.
* A pipe is used to communicate between related processes (common ancestor). Normally, a pipe is created by a process, that process calls fork, and the pipe is used between the parent and the child.
* Half duplex
* Data is passed in order.
* A pipe uses a circular buffer of limited capacity inside the kernel ( 64 KiB by default on current Linux ).
* The read and write system calls are blocking calls: read blocks while the pipe is empty and write blocks while it is full.

#include <unistd.h>

int fd[2];
int pipe(int fd[2]);


One Way Communication between parent and child

* Create a pipe.
* Call fork.
* Parent can send data and child can read the data or vice versa.
* Unused ends (descriptors) should be closed.

For data flowing from parent to child, the parent closes the read end of the pipe (fd[0]), and the child closes the write end (fd[1]).
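These steps can be sketched as a single function in which the parent writes and the child reads. The name send_to_child is hypothetical, and the child reporting success through its exit status is just a convenient way to make the result visible to the parent.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg (shorter than 256 bytes) from parent to child over a pipe.
 * The child exits with status 0 if it received exactly msg.
 * Returns the child's exit status, or -1 on error. */
int send_to_child(const char *msg)
{
    int fd[2];
    size_t len = strlen(msg);

    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: reader */
        char buf[256];
        size_t got = 0;
        ssize_t n;

        close(fd[1]);               /* child never writes */
        while (got < sizeof buf &&
               (n = read(fd[0], buf + got, sizeof buf - got)) > 0)
            got += (size_t)n;       /* read until end of file */
        close(fd[0]);
        _exit(got == len && memcmp(buf, msg, len) == 0 ? 0 : 1);
    }

    /* parent: writer */
    close(fd[0]);                   /* parent never reads */
    ssize_t written = write(fd[1], msg, len);
    close(fd[1]);                   /* closing the last write end gives the child EOF */

    int status;
    if (waitpid(pid, &status, 0) < 0 || written != (ssize_t)len)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing the unused ends matters: if the child kept its copy of the write end open, its read loop would never see end of file.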


Two way Communication
* Create two pipes, say fd1 and fd2.
* Each process holds four descriptors (fd1[0], fd1[1], fd2[0], fd2[1]).
* Parent closes the read end of fd1 and the write end of fd2
- close(fd1[0]); close(fd2[1]);
* Child closes the read end of fd2 and the write end of fd1
- close(fd2[0]); close(fd1[1]);

Pipe : Advantages & Disadvantages
Advantages:
* Simplest form of IPC
* Persistence in process level
* Can be used in shell

Disadvantages:
* Cannot be used to communicate between unrelated processes

popen and pclose Functions

The function popen does a fork and exec to execute the cmdstring and returns a standard I/O file pointer.
If type is "r", the file pointer is connected to the standard output of cmdstring.
If type is "w", the file pointer is connected to the standard input of cmdstring.

#include <stdio.h>

FILE *popen(const char *cmdstring, const char *type);
Returns: file pointer if OK, NULL on error

int pclose(FILE *fp);
Returns: termination status of cmdstring, or -1 on error

Result of fp = popen(cmdstring, "r")

SIMPLEX PIPE


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pipefd[2], n;
    char buff[100];

    if (pipe(pipefd) < 0) {  /* create a pipe with two descriptors */
        perror("failed in opening pipe");
        exit(1);
    }
    printf("read fd = %d, write fd = %d\n", pipefd[0], pipefd[1]);

    /* write into the pipe's write descriptor (17 characters plus the terminating NUL) */
    if (write(pipefd[1], "hello world.....!", 18) != 18)
        perror("failed in writing pipe");

    /* read from the pipe's read descriptor */
    if ((n = read(pipefd[0], buff, sizeof(buff))) < 0) {
        perror("failed in reading pipe");
        exit(1);
    }

    write(1, buff, n);  /* write to the stdout */
    exit(0);
}

DUPLEX PIPE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

#define MAXBUF 1024

/* client: send a file name to the server, then print whatever comes back */
void client(int readfd, int writefd) {
    char buff[MAXBUF];
    ssize_t n;

    puts("Enter file name");
    if (scanf("%1023s", buff) != 1)
        return;
    n = strlen(buff);
    if (write(writefd, buff, n) != n)
        perror("client: write error");
    while ((n = read(readfd, buff, MAXBUF)) > 0) {
        if (write(1, buff, n) != n)
            perror("client: write error");
    }
    if (n < 0)
        perror("client: read error");
}

/* server: read a file name, then send the file contents back */
void server(int readfd, int writefd) {
    char buff[MAXBUF];
    ssize_t n;
    int fd;

    if ((n = read(readfd, buff, MAXBUF - 1)) <= 0) {
        perror("server: read error");
        return;
    }
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        perror("server: open error");
        return;
    }
    while ((n = read(fd, buff, MAXBUF)) > 0)
        if (write(writefd, buff, n) != n)
            perror("server: write error");
    if (n < 0)
        perror("server: read error");
    close(fd);
}

int main(void) {
    int pipefd1[2], pipefd2[2];
    pid_t childpid;

    if (pipe(pipefd1) < 0 || pipe(pipefd2) < 0) {
        perror("failed to open pipes");
        exit(1);
    }
    if ((childpid = fork()) < 0) {
        perror("can't fork");
        exit(1);
    } else if (childpid > 0) { /* parent process */
        close(pipefd1[0]); /* read1 */
        close(pipefd2[1]); /* write2 */
        client(pipefd2[0], pipefd1[1]);
        close(pipefd1[1]); /* close before waiting so the child cannot block on us */
        close(pipefd2[0]);
        while (wait(NULL) != childpid)
            ;
    } else { /* child process */
        close(pipefd1[1]); /* write1 */
        close(pipefd2[0]); /* read2 */
        server(pipefd1[0], pipefd2[1]);
        close(pipefd1[0]);
        close(pipefd2[1]);
    }
    exit(0);
}

FIFO: Introduction
* A FIFO works much like a pipe: it is half duplex, and data passes through a circular kernel buffer in first-in, first-out order. The FIFO file itself holds no data.
* A FIFO is created on a file system as a special file (a named pipe).
* It can be used to communicate between unrelated processes.
* It can be reused.
* It persists until the file is deleted.

FIFO Creation
* A FIFO can be created in a shell with the mknod or mkfifo command.
  - mknod myfifo p
  - mkfifo -m a=rw myfifo
* In a C program, the mknod system call or the mkfifo library function can be used.
  - int mkfifo(const char *file_name, mode_t mode);
  - int mknod(const char *file_name, mode_t mode, dev_t dev);
* Example: mknod("./MYFIFO", S_IFIFO|0666, 0);

Using FIFO
* Once a FIFO is created, either from a shell or through a program, the usual file system calls (open, read, write, select, close, etc.) are used to access it.
* For example, process 1 may open a FIFO in write-only mode and write some data.
* Process 2 may open the FIFO in read-only mode, read the data, and display it on the monitor.

FIFO: Disadvantages
* Data cannot be broadcast to multiple receivers.
* If there are multiple receivers, there is no way to direct data to a specific reader, or vice versa.
* Cannot be used across a network.
* Less secure than a pipe, since any process with valid access permission can access the data.
* Cannot store data.
* No message boundaries. Data is treated as a stream of bytes.

/* FIFO client/server: the parent (client) asks for a file, the child (server) sends it back */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/wait.h>

#define FIFO1 "/tmp/fifo1" /* FIFOs can also be created in the */
#define FIFO2 "/tmp/fifo2" /* user's home directory */

#define MAXBUF 1024

void client(int readfd, int writefd) {
    char buff[MAXBUF];
    ssize_t n;

    puts("Enter file name");
    if (scanf("%1023s", buff) != 1) /* reading file name */
        return;
    n = strlen(buff);
    if (write(writefd, buff, n) != n) /* writing file name into fifo */
        perror("client: write error");
    while ((n = read(readfd, buff, MAXBUF)) > 0)
        if (write(1, buff, n) != n)
            perror("client: write error");
    if (n < 0)
        perror("client: read error");
}

void server(int readfd, int writefd) {
    char buff[MAXBUF];
    ssize_t n;
    int fd;

    if ((n = read(readfd, buff, MAXBUF - 1)) <= 0) {
        perror("server: read error");
        return;
    }
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        perror("server: open error");
        return;
    }
    while ((n = read(fd, buff, MAXBUF)) > 0)
        if (write(writefd, buff, n) != n)
            perror("server: write error");
    if (n < 0)
        perror("server: read error");
    close(fd);
}

int main(void) {
    int readfd, writefd;
    pid_t pid;

    /* FIFOs are created with read and write permission for everyone (subject to the umask) */
    if (mkfifo(FIFO1, 0666) < 0) {
        perror("Fifo1 failed");
        exit(1);
    }
    if (mkfifo(FIFO2, 0666) < 0) {
        perror("Fifo2 failed");
        unlink(FIFO1);
        exit(2);
    }
    if ((pid = fork()) == 0) {
        readfd = open(FIFO1, O_RDONLY);  /* child opens fifo1 for read */
        writefd = open(FIFO2, O_WRONLY); /* child opens fifo2 for write */
        /* child process calls server function */
        server(readfd, writefd);
        exit(0);
    }
    writefd = open(FIFO1, O_WRONLY);
    readfd = open(FIFO2, O_RDONLY);
    /* parent becomes the client process */
    client(readfd, writefd);

    close(readfd);
    close(writefd);
    /* parent waits for the child (server) to exit */
    waitpid(pid, NULL, 0);

    unlink(FIFO1); /* removing the FIFOs from /tmp */
    unlink(FIFO2);
    exit(0);
}

System calls are the usual communication channel between a user-space program and the kernel. Signals are a different channel, used both between user processes and from the kernel to a user process.

Sending Signals
A program can signal a different program using the kill() system call, with prototype
int kill(pid_t pid, int sig);
This sends the signal with number sig to the process with process ID pid. Signal numbers are small positive integers.

Receiving signals
typedef void (*sighandler_t)(int);
sighandler_t signal(int sig, sighandler_t handler);
signal() installs handler for sig and returns the previous handler, or SIG_ERR on error.

Signal     Value      Action   Comment
-------------------------------------------------
SIGHUP     1          Term     Hangup detected on controlling terminal or death of controlling process
SIGINT     2          Term     Interrupt from keyboard
SIGQUIT    3          Core     Quit from keyboard
SIGILL     4          Core     Illegal Instruction
SIGABRT    6          Core     Abort signal from abort(3)
SIGFPE     8          Core     Floating point exception
SIGKILL    9          Term     Kill signal
SIGSEGV    11         Core     Invalid memory reference
SIGPIPE    13         Term     Broken pipe: write to pipe with no readers
SIGALRM    14         Term     Timer signal from alarm(2)
SIGTERM    15         Term     Termination signal
SIGUSR1    30,10,16   Term     User-defined signal 1
SIGUSR2    31,12,17   Term     User-defined signal 2
SIGCHLD    20,17,18   Ign      Child stopped or terminated
SIGCONT    19,18,25   Cont     Continue if stopped
SIGSTOP    17,19,23   Stop     Stop process
SIGTSTP    18,20,24   Stop     Stop typed at terminal
SIGTTIN    21,21,26   Stop     Terminal input for background process
SIGTTOU    22,22,27   Stop     Terminal output for background process

Where three values are given, the first is usually valid for alpha and sparc, the middle one for x86, arm, and most other architectures, and the last one for mips. A dash means the signal is absent on that architecture.

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Signals not in the POSIX.1-1990 standard but described in SUSv2 and POSIX.1-2001.

Signal      Value      Action   Comment
--------------------------------------------------------------
SIGBUS      10,7,10    Core     Bus error (bad memory access)
SIGPOLL                Term     Pollable event (Sys V). Synonym for SIGIO
SIGPROF     27,27,29   Term     Profiling timer expired
SIGSYS      12,31,12   Core     Bad argument to routine (SVr4)
SIGTRAP     5          Core     Trace/breakpoint trap
SIGURG      16,23,21   Ign      Urgent condition on socket (4.2BSD)
SIGVTALRM   26,26,28   Term     Virtual alarm clock (4.2BSD)
SIGXCPU     24,24,30   Core     CPU time limit exceeded (4.2BSD)
SIGXFSZ     25,25,31   Core     File size limit exceeded (4.2BSD)

Various other signals.

Signal      Value      Action   Comment
-------------------------------------------------
SIGIOT      6          Core     IOT trap. A synonym for SIGABRT
SIGEMT      7,-,7      Term
SIGSTKFLT   -,16,-     Term     Stack fault on coprocessor (unused)
SIGIO       23,29,22   Term     I/O now possible (4.2BSD)
SIGCLD      -,-,18     Ign      A synonym for SIGCHLD
SIGPWR      29,30,19   Term     Power failure (System V)
SIGINFO     29,-,-              A synonym for SIGPWR
SIGLOST     -,-,-      Term     File lock lost (unused)
SIGWINCH    28,28,20   Ign      Window resize signal (4.3BSD, Sun)
SIGUNUSED   -,31,-     Core     Synonymous with SIGSYS


Blocking signals
Each process has a list (bitmask) of currently blocked signals. When a signal is blocked, it is not delivered (that is, no signal handling routine is called), but remains pending.
The sigprocmask() system call changes the list of blocked signals. See sigprocmask(2).
The sigpending() system call reveals which signals are (blocked and) pending.
The sigsuspend() system call temporarily replaces the signal mask and suspends the calling process until a signal that is not blocked by that mask is delivered.
When a signal is blocked, it remains pending, even when the process would otherwise ignore it.

wait and SIGCHLD
Whenever something happens to a child (it exits, crashes, traps, stops, or continues), and in particular when it dies, the parent is sent a SIGCHLD signal.

The parent can use the system call wait() or waitpid() (there are a few variations) to learn about the status of its stopped or deceased children, for example from its SIGCHLD handler. In the case of a deceased child, as soon as a status has been reported, the zombie vanishes.

If the parent is not interested, it can say so explicitly (before the fork) using

signal(SIGCHLD, SIG_IGN);

or

struct sigaction act;
act.sa_handler = something;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_NOCLDWAIT;
sigaction(SIGCHLD, &act, NULL);

and as a result it will not hear about deceased children, and children will not be transformed into zombies. The default action for SIGCHLD is also to ignore the signal, but under the default disposition a terminated child still becomes a zombie until the parent waits for it.
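For a parent that does want the status, reaping a child with waitpid() can be sketched as below. The helper name reap_child_exit_code is hypothetical, and the sketch assumes SIGCHLD has its default disposition (not SIG_IGN), since otherwise there is no status to collect.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then reap it.
 * Returns the exit status reported by waitpid(), or -1 on error. */
int reap_child_exit_code(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child terminates immediately */

    /* Until this call returns, the dead child remains a zombie. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```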

Returning from a signal handler
When the program was interrupted by a signal, its status (including all integer and floating point registers) was saved, to be restored just before execution continues at the point of interruption.
This means that the return from the signal handler is more complicated than an arbitrary procedure return: the saved state must be restored.
To this end, the kernel arranges that the return from the signal handler causes a jump to a short code sequence (the signal trampoline) that invokes the sigreturn() system call, which restores the saved state.

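#include <stdio.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t last_signal = 0;

void sig_fun(int);

int main(void) {
    struct sigaction signalact;

    signalact.sa_handler = sig_fun;
    sigemptyset(&signalact.sa_mask);
    signalact.sa_flags = 0;
    sigaction(SIGINT, &signalact, NULL);

    while (1) {
        printf("hello world\n");
        if (last_signal) { /* report outside the handler: printf is not async-signal-safe */
            printf("Hi, I got signal: %d\n", (int)last_signal);
            last_signal = 0;
        }
        sleep(1); /* returns early when a signal arrives */
    }
}

void sig_fun(int signo) {
    last_signal = signo; /* only record the signal here */
}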

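SIGCHLD

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

static volatile sig_atomic_t child_done = 0;

void sig_init(int);

int main(void) {
    pid_t pid;

    signal(SIGCHLD, sig_init); /* install the handler before fork so the signal cannot be missed */
    if ((pid = fork()) < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        sleep(1); /* child exits after one second */
    } else {
        while (!child_done)
            sleep(1); /* interrupted early when SIGCHLD arrives */
        printf("child terminated\n");
        printf("parent exiting\n");
    }
    return 0;
}

void sig_init(int signo) {
    (void)signo;
    child_done = 1; /* only set a flag; printing happens in main */
}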

SIGUSR1 and SIGUSR2

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

static volatile sig_atomic_t got_usr1 = 0, got_usr2 = 0;

static void sighandler(int);

int main(void) {
    pid_t parentpid, childpid;
    int status;
    sigset_t block, old;
    struct sigaction act;

    /* prepare the sighandler routine to catch SIGUSR1 and SIGUSR2;
       installing it before fork means neither signal can arrive unhandled */
    act.sa_handler = sighandler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGUSR1, &act, NULL) == -1 || sigaction(SIGUSR2, &act, NULL) == -1) {
        perror("Unable to create handler for SIGUSR1/SIGUSR2");
        exit(1);
    }

    /* block both signals; sigsuspend() below unblocks them atomically while waiting */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGUSR2);
    sigprocmask(SIG_BLOCK, &block, &old);

    parentpid = getpid();
    if ((childpid = fork()) < 0) {
        perror("fork");
        exit(1);
    }
    if (childpid == 0) {
        kill(parentpid, SIGUSR1); /* raise the SIGUSR1 signal */
        printf("Child: waiting for signal\n");
        while (!got_usr2)
            sigsuspend(&old); /* child waits for SIGUSR2 */
        printf("Child: received SIGUSR2, done %d\n", (int)getpid());
        exit(0);
    }
    while (!got_usr1)
        sigsuspend(&old); /* parent waits for SIGUSR1 */
    printf("Parent: received SIGUSR1\n");
    kill(childpid, SIGUSR2); /* raise the SIGUSR2 signal */
    printf("Parent: waiting for child to terminate.....\n");
    wait(&status); /* parent waiting for the child termination */
    printf("Parent: done %d\n", (int)getpid());
    return 0;
}

static void sighandler(int signo) {
    switch (signo) {
    case SIGUSR1: /* incoming SIGUSR1 signal */
        got_usr1 = 1;
        break;
    case SIGUSR2: /* incoming SIGUSR2 signal */
        got_usr2 = 1;
        break;
    }
}

System V IPC: Introduction
* Sys V IPC is implemented as a single unit.
* System V IPC provides three mechanisms:
  - Message Queues
  - Shared Memory
  - Semaphores
* IPC objects persist until they are explicitly deleted or the system is rebooted.

Common Attributes
Each IPC object has the following attributes:
* key
* id
* Owner
* Permission
* Size
  - Message queue - used bytes, number of messages
  - Shared memory - size, number of attaches, status
  - Semaphore - number of semaphores in a set
The ipc_perm structure holds the common attributes of the resources.

System Limitations
$ ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 16
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384

Get a Key
* If we wish to communicate between different processes using an IPC resource, the first step is to agree on a shared unique identifier (the key).
* The simplest form of the identifier is a number. The ftok library function generates this number from the path of an existing file and a project id.
* Apart from the creator, other processes that want to communicate with the creator process must agree on the key value.
* Syntax: key_t ftok(const char *filename, int id);
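The agreement on a key can be illustrated with a small sketch: two independent ftok() calls with the same path and project id give the same key, so unrelated processes only need to share those two inputs. The helper name ftok_keys_agree is hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>

/* Hypothetical helper: returns 1 if two independent ftok() calls with the
 * same path and project id yield the same valid key, 0 otherwise. */
int ftok_keys_agree(const char *path, int proj_id)
{
    key_t creator_key = ftok(path, proj_id);
    key_t peer_key = ftok(path, proj_id);

    if (creator_key == (key_t)-1 || peer_key == (key_t)-1)
        return 0;               /* path missing or not accessible */
    return creator_key == peer_key;
}
```

The path must name an existing, accessible file, because ftok() derives the key from the file's metadata rather than from the string itself.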

Get an id
The syntax for a get function is:
int xxxget(key_t key, int xxxflg); (xxx may be msg, shm or sem)
shmget and semget take one more argument before the flags: the segment size and the number of semaphores, respectively.
If successful, it returns an identifier; otherwise -1 for error.
The key can be generated in three different ways:
- from the ftok library function
- by choosing some static positive integer value
- by using the IPC_PRIVATE macro
Flags commonly used with this function are IPC_CREAT and IPC_EXCL.

Control an Object
The syntax for the control function is:
int xxxctl(int xxxid, int cmd, struct xxxid_ds *buffer); (xxx may be msg or shm; semctl also takes a semaphore number and a union semun argument)
If successful, the xxxctl function returns zero; otherwise it returns -1.
The command argument may be
- IPC_STAT
- IPC_SET
- IPC_RMID

Message Queues
* A message queue overcomes FIFO limitations: it stores data and preserves message boundaries.
* Create a message queue.
* Send message(s) to the queue.
* Any process that has permission to access the queue can retrieve message(s).
* Remove the message queue.

Each queue has the following msqid_ds structure associated with it:

struct msqid_ds {
    struct ipc_perm msg_perm;
    msgqnum_t msg_qnum;    /* # of messages on queue */
    msglen_t msg_qbytes;   /* max # of bytes on queue */
    pid_t msg_lspid;       /* pid of last msgsnd() */
    pid_t msg_lrpid;       /* pid of last msgrcv() */
    time_t msg_stime;      /* last-msgsnd() time */
    time_t msg_rtime;      /* last-msgrcv() time */
    time_t msg_ctime;      /* last-change time */
    ....
};

msgget
* int msgget(key_t key, int msgflg);
* The first argument, key, can be the return value of the ftok function or IPC_PRIVATE.
* To create a message queue, pass IPC_CREAT ORed with the access permissions as the msgflg argument.
* Ex: msgid = msgget(key, IPC_CREAT | 0744);
      msgid = msgget(key, 0);

msgsnd
* The syntax of the function is:
  int msgsnd(int msqid, struct msgbuf *msgp, size_t msgsz, int msgflg);
* Arguments:
  - message queue ID
  - address of the message structure
  - size of the message text (not counting mtype)
  - message flag: 0 or IPC_NOWAIT

struct mymesg {
    long mtype;       /* positive message type */
    char mtext[512];  /* message data, of length nbytes */
};

msgrcv
Syntax of the function is:
ssize_t msgrcv(int msqid, struct msgbuf *msgp, size_t msgsz, long msgtype, int msgflg);

The msgtype argument is used to retrieve a particular message:
* 0 - retrieve the first message on the queue (FIFO order)
* +ve - retrieve the first message whose type equals msgtype
* -ve - retrieve the first message with the lowest type that is <= the absolute value of msgtype
On success, msgrcv returns the number of bytes actually copied into the message text.

Destroying a Message Queue
There are many ways:
* From the command line, using one of
  - $ ipcrm msg msqid
  - $ ipcrm -q msqid
  - $ ipcrm -Q msgkey
* Using a system call
  - msgctl(msgid, IPC_RMID, 0);

Message Queue: Pseudo Code
key = ftok(".", 'a');
msqid = msgget(key, IPC_CREAT|0666);
msgsnd(msqid, &struct, sizeof(struct), 0);
msgrcv(msqid, &struct, sizeof(struct), mtype, 0);
msgctl(msqid, IPC_RMID, NULL);
$ ipcrm msg msqid

Limitations
* Message queues are effective if a small amount of data is transferred.
* Very expensive for large transfers.
* During message sending and receiving, the message is copied from the user buffer into a kernel buffer and vice versa.
* So each message transfer involves two data copy operations, which results in poor system performance.
* A message in a queue cannot be reused.

Message send tests.c

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

struct message {
    long mtype;
    char mtext[50];
};

int main(void) {
    struct message m1;
    int msgid;

    if ((msgid = msgget(1, 0666 | IPC_CREAT)) == -1) {
        perror("msgget");
        exit(1);
    }
    m1.mtype = getpid(); /* the sender's pid serves as the message type */
    printf("Process id of the current process is: %ld\n", (long)getpid());
    printf("Enter the message you want to send to the queue\n");
    if (fgets(m1.mtext, sizeof(m1.mtext), stdin) == NULL)
        exit(1);
    if (msgsnd(msgid, &m1, sizeof(m1.mtext), 0) == -1) {
        perror("msgsnd");
        exit(1);
    }
    printf("Message successfully sent\n");
    return 0;
}

Message receive testr.c

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

struct message {
    long mtype;
    char mtext[50];
};

int main(void) {
    struct message m1;
    int msgid;

    if ((msgid = msgget(1, 0666 | IPC_CREAT)) == -1) {
        perror("msgget");
        exit(1);
    }
    /* msgtype 0 takes the first message on the queue; MSG_NOERROR truncates oversized messages */
    if (msgrcv(msgid, &m1, sizeof(m1.mtext), 0, MSG_NOERROR) == -1) {
        perror("msgrcv");
        exit(1);
    }
    printf("Message received from the process whose pid is: %ld\n", m1.mtype);
    printf("And the message is: %s", m1.mtext);
    return 0;
}

Message control testc.c

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    int msgid;

    if ((msgid = msgget(1, 0)) == -1) {
        perror("msgget");
        exit(1);
    }
    if (msgctl(msgid, IPC_RMID, 0) == -1) {
        perror("msgctl");
        exit(1);
    }
    printf("Message queue successfully deleted\n");
    return 0;
}
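The msgtype selection rules of msgrcv can be tried out in a single process. The sketch below is an assumption-laden illustration, not part of the original examples: the helper receive_by_type and the struct demo_msg are hypothetical, the queue is created with IPC_PRIVATE so it cannot clash with other queues, and IPC_NOWAIT makes msgrcv fail instead of blocking when no message matches.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct demo_msg {
    long mtype;
    char mtext[32];
};

/* Hypothetical helper: queue "first" with type 1 and "second" with type 2,
 * then ask msgrcv() for `wanted`. Copies the received text into out and
 * returns the type of the message received, or -1 on error. */
long receive_by_type(long wanted, char *out, size_t outsz)
{
    struct demo_msg m;
    long got = -1;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (qid == -1)
        return -1;

    m.mtype = 1;
    strcpy(m.mtext, "first");
    if (msgsnd(qid, &m, sizeof(m.mtext), 0) == -1)
        goto cleanup;

    m.mtype = 2;
    strcpy(m.mtext, "second");
    if (msgsnd(qid, &m, sizeof(m.mtext), 0) == -1)
        goto cleanup;

    /* IPC_NOWAIT: fail instead of blocking if nothing matches */
    if (msgrcv(qid, &m, sizeof(m.mtext), wanted, IPC_NOWAIT) == -1)
        goto cleanup;

    strncpy(out, m.mtext, outsz - 1);
    out[outsz - 1] = '\0';
    got = m.mtype;

cleanup:
    msgctl(qid, IPC_RMID, NULL);    /* always remove the queue */
    return got;
}
```

Calling it with 0 returns the oldest message, with 2 the message of exactly that type, and with -2 the message with the lowest type not greater than 2.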


Shared Memory
* Very flexible and easy to use.
* The fastest IPC mechanism.
* Shared memory is used to provide access to
  - Global variables
  - Shared libraries
  - Word processors
  - Multi-player gaming environments
  - HTTP daemons
  - Other programs written in languages like Perl, C, etc.

Shared Memory: Data Structures
The data structures used in shared memory are
* shmid_ds
* ipc_perm
* shminfo
* shm_info
* shmid_kernel

ipc_perm Structure
struct ipc_perm {
    __key_t __key;              /* Key */
    __uid_t uid;                /* Owner's user ID */
    __gid_t gid;                /* Owner's group ID */
    __uid_t cuid;               /* Creator's user ID */
    __gid_t cgid;               /* Creator's group ID */
    unsigned short int mode;    /* r/w permission */
    unsigned short int __seq;   /* Sequence number */
};

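shmid_ds
struct shmid_ds {
    struct ipc_perm shm_perm;   /* ownership and permissions */
    size_t shm_segsz;           /* size of segment in bytes */
    __time_t shm_atime;         /* last attach time */
    __time_t shm_dtime;         /* last detach time */
    __time_t shm_ctime;         /* last change time */
    __pid_t shm_cpid;           /* pid of creator */
    __pid_t shm_lpid;           /* pid of last shmat()/shmdt() */
    shmatt_t shm_nattch;        /* number of current attaches */
};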

Steps to Access Shared Memory
The steps involved are:
* Creating shared memory
* Connecting to the memory & obtaining a pointer to the memory
* Reading/Writing & changing access mode to the memory
* Detaching from memory
* Deleting the shared segment

shmat
* Used to attach the created shared memory segment onto a process address space.
* void *shmat(int shmid, const void *shmaddr, int shmflg);
* Example: data = shmat(shmid, (void *)0, 0);
* A pointer is returned on successful execution of the system call, and the process can read or write to the segment using the pointer. On error, shmat returns (void *)-1.

Reading / Writing to Shared Memory
* Reading or writing to a shared memory is the easiest part.
* The data is written to the shared memory as we do with normal memory, using pointers.
* Eg. Read: printf("SHM contents : %s\n", data);
* Eg. Write: printf("Enter a String : "); scanf(" %[^\n]", data);

shmdt and shmctl
* An attached shared memory segment is detached by calling shmdt with the address returned by shmat.
* Syntax: int shmdt(const void *shmaddr);
* To remove shared memory, call: shmctl(shmid, IPC_RMID, NULL);
* These functions return -1 on error and 0 on successful execution.

Shared Memory: Pseudo Code
* shmid = shmget(key, 1024, IPC_CREAT|0744);
* void *shmat(int shmid, const void *shmaddr, int shmflg); if the shm is read only pass SHM_RDONLY, else 0
* data = shmat(shmid, (void *)0, 0);
* int shmdt(const void *shmaddr);
* int shmctl(shmid, IPC_RMID, NULL);

Limitations
* Data can only be read or written in place. Append is not allowed.
* Race condition - Since many processes can access the shared memory, any modification done by one process in the address space is visible to all other processes. Since the address space is a shared resource, the developer should implement a proper locking mechanism to prevent race conditions in the shared memory.

Shared memory create

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int shmid;
    size_t pos;
    char *msg;

    if ((shmid = shmget(110, 1024, IPC_CREAT | 0666)) == -1) {
        perror("shmget");
        exit(1);
    }

    msg = shmat(shmid, 0, 0);
    if (msg == (char *)-1) {
        perror("shmat");
        exit(1);
    }

    printf("Enter the data you want to write into shared memory\n");
    if (fgets(msg, 1024 - 5, stdin) == NULL) /* leave room for "World" */
        exit(1);
    pos = strlen(msg);
    if (pos > 0 && msg[pos - 1] == '\n')
        pos--; /* overwrite the trailing newline */
    strcpy(msg + pos, "World");
    printf("Data successfully written\n");

    shmdt(msg);
    return 0;
}

Shared memory read

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int shmid;
    char *msg;

    if ((shmid = shmget(110, 1024, 0666)) == -1) { /* get the id of the existing segment */
        perror("shmget");
        exit(1);
    }

    msg = shmat(shmid, 0, 0);
    if (msg == (char *)-1) {
        perror("shmat");
        exit(1);
    }
    printf("Data written in the shared memory is: %s\n", msg);

    shmdt(msg); /* detach the segment from this process */
    return 0;
}

Shared memory control

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int shmid;

    if ((shmid = shmget(110, 0, 0)) == -1) { /* 110 is the key */
        perror("shmget");
        exit(1);
    }

    if (shmctl(shmid, IPC_RMID, 0) == -1) {
        perror("shmctl");
        exit(1);
    }
    printf("Shared memory successfully removed\n");
    return 0;
}
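The create, attach, write, read, detach, and remove steps can also be exercised in one process pair. The following is a minimal sketch under stated assumptions: the helper share_int_with_child is hypothetical, the segment is created with IPC_PRIVATE instead of a fixed key, and the parent avoids the race condition simply by waiting for the child before reading.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child stores `value` in a private shared memory
 * segment; the parent reads it back after the child exits.
 * Returns the value seen by the parent, or -1 on error. */
int share_int_with_child(int value)
{
    int result = -1;
    int *shared;
    pid_t pid;
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);

    if (shmid == -1)
        return -1;

    shared = shmat(shmid, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    pid = fork();               /* the attachment is inherited by the child */
    if (pid == 0) {
        *shared = value;        /* child writes through the shared pointer */
        shmdt(shared);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *shared;       /* waiting first avoids a race with the child */

    shmdt(shared);
    shmctl(shmid, IPC_RMID, NULL);
    return result;
}
```

With two processes that run concurrently, waiting is not an option, and a semaphore or similar lock would be needed around the shared data.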


Semaphores
* If a process wants to use the shared object, it will "lock" it by asking the semaphore to decrement the counter.
* Depending upon the current value of the counter, the semaphore will either be able to carry out this operation, or will have to wait until the operation becomes possible.
* If the current value of the counter is > 0, the decrement operation is possible. Otherwise, the process has to wait.

System V IPC: Semaphores
* System V provides a semaphore set, which can include a number of semaphores. It is up to the user to decide the number of semaphores in the set.
* Each semaphore in the set can be a binary or a counting semaphore. Each semaphore can be used to control access to one resource by changing the value of the semaphore count.

Semaphore: Initialization
union semun {
    int val;                     /* value for SETVAL */
    struct semid_ds *buf;        /* buffer for IPC_STAT, IPC_SET */
    unsigned short int *array;   /* array for GETALL, SETALL */
};
union semun arg;
semid = semget(key, 1, IPC_CREAT | 0644);
arg.val = 1; /* 1 for binary else > 1 for Counting Semaphore */
semctl(semid, 0, SETVAL, arg);

Semaphore: Implementation
struct sembuf {
    unsigned short sem_num; /* semaphore number: 0 means first */
    short sem_op;           /* semaphore operation: lock or unlock */
    short sem_flg;          /* operation flags : 0, SEM_UNDO, IPC_NOWAIT */
};
struct sembuf buf = {0, -1, 0}; /* (-1 + previous value) */
semid = semget(key, 1, 0);
semop(semid, &buf, 1); /* locked */
/* ----- Critical section ----- */
buf.sem_op = 1;
semop(semid, &buf, 1); /* unlocked */
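The initialization and lock/unlock fragments above can be combined into a small self-checking sketch. The helper trace_binary_semaphore is hypothetical; it assumes Linux (where the program must define union semun itself) and uses IPC_PRIVATE so the semaphore set is private to the caller.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller must define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create a binary semaphore, lock and unlock it, and
 * record its value at each step in values[0..2].
 * Returns 0 on success, -1 on error. */
int trace_binary_semaphore(int values[3])
{
    union semun arg;
    struct sembuf op = {0, -1, 0};
    int rc = -1;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;

    arg.val = 1;                            /* binary semaphore: unlocked */
    if (semctl(semid, 0, SETVAL, arg) == -1)
        goto cleanup;
    values[0] = semctl(semid, 0, GETVAL);

    if (semop(semid, &op, 1) == -1)         /* lock: 1 -> 0 */
        goto cleanup;
    values[1] = semctl(semid, 0, GETVAL);

    op.sem_op = 1;
    if (semop(semid, &op, 1) == -1)         /* unlock: 0 -> 1 */
        goto cleanup;
    values[2] = semctl(semid, 0, GETVAL);
    rc = 0;

cleanup:
    semctl(semid, 0, IPC_RMID);             /* remove the semaphore set */
    return rc;
}
```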

#include <sys/types.h>
#include <sys/sem.h>
#include <sys/ipc.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
    struct seminfo *__buf;
};

void *th_fun(void *);

union semun u;
key_t key;
int pid, sid;
struct sembuf su, sl;

int main(void) {
    pthread_t t1, t2, t3, t4;

    key = ftok("semaphore.c", 100); /* semaphore.c must exist in the current directory */
    if (key == (key_t)-1) {
        perror("ftok");
        exit(1);
    }
    sid = semget(key, 1, IPC_CREAT | 0666);
    if (sid == -1) {
        perror("semget");
        exit(1);
    }
    printf("semaphore created by %d\n", (int)getpid());
    u.val = 2; /* at most two threads hold the semaphore at a time */
    semctl(sid, 0, SETVAL, u);
    printf("Semaphore initialized to %d\n", u.val);

    pid = getpid();

    sl.sem_num = 0;
    sl.sem_op = -1; /* lock */
    sl.sem_flg = SEM_UNDO;
    su = sl;
    su.sem_op = 1; /* unlock */

    pthread_create(&t1, NULL, th_fun, "Thread One");
    pthread_create(&t2, NULL, th_fun, "Thread two");
    pthread_create(&t3, NULL, th_fun, "Thread three");
    pthread_create(&t4, NULL, th_fun, "Thread four");

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    pthread_join(t4, NULL);

    semctl(sid, 0, IPC_RMID);
    printf("Semaphore removed\n");
    return 0;
}

void *th_fun(void *p) {
    char *str = p;
    int i = 0;

    printf("%s is trying to lock semaphore %d\n", str, pid);
    if (semop(sid, &sl, 1) == 0)
        printf("%s succeeded in lock %d\n", str, pid);

    while (++i < 3) {
        printf("%s resource use here %d\n", str, pid);
        sleep(6);
    }
    semop(sid, &su, 1);
    printf("%s unlock and bye %d\n", str, pid);
    return NULL;
}


Day 4 Morning

11. Sockets
* An Overview
* System calls related to
  - TCP
  - UDP

A socket is an abstraction of a communication endpoint. Just as they would use file descriptors to access files, applications use socket descriptors to access sockets. Socket descriptors are implemented as file descriptors in the UNIX System. Indeed, many of the functions that deal with file descriptors, such as read and write, will work with a socket descriptor.

To create a socket, we call the socket function.

#include <sys/socket.h>
int socket(int domain, int type, int protocol);
    Returns: file (socket) descriptor if OK, -1 on error

Domain

Type

Protocol


The socket() call is similar to the open() system call. The file descriptor functions behave as follows when used with a socket:
* close - deallocates the socket
* dup, dup2 - duplicates the file descriptor as normal
* fchdir - fails with errno set to ENOTDIR
* fchmod - unspecified
* fchown - implementation defined
* fcntl - some commands supported, including F_DUPFD, F_DUPFD_CLOEXEC, F_GETFD, F_GETFL, F_GETOWN, F_SETFD, F_SETFL, and F_SETOWN
* fdatasync, fsync - implementation defined
* fstat - some stat structure members supported, but how is left up to the implementation
* ftruncate - unspecified
* ioctl - some commands work, depending on the underlying device driver
* lseek - implementation defined (usually fails with errno set to ESPIPE)
* mmap - unspecified
* poll - works as expected
* pread and pwrite - fail with errno set to ESPIPE
* read and readv - equivalent to recv without any flags
* select - works as expected
* write and writev - equivalent to send without any flags

#include <sys/socket.h>
int shutdown(int sockfd, int how);

If how is SHUT_RD, then reading from the socket is disabled. If how is SHUT_WR, then we can't use the socket for transmitting data. We can use SHUT_RDWR to disable both data transmission and reception.

Given that we can close a socket, why is shutdown needed? There are several reasons. First, close will deallocate the network endpoint only when the last active reference is closed. If we duplicate the socket (with dup, for example), the socket won't be deallocated until we close the last file descriptor referring to it. The shutdown function allows us to deactivate a socket independently of the number of active file descriptors referencing it. Second, it is sometimes convenient to shut a socket down in one direction only. For example, we can shut a socket down for writing if we want the process we are communicating with to be able to tell when we are done transmitting data, while still allowing us to use the socket to receive data sent to us by the process.
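The one-direction shutdown described above can be sketched with a pair of connected sockets. This is only an illustration: the helper half_close_demo is hypothetical, and socketpair() (not covered in these slides) is used purely to obtain two connected stream sockets without any network setup.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: half-close one end of a connected socket pair and
 * check that the peer sees end-of-file while data can still flow back.
 * Returns 0 if both checks pass, -1 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (shutdown(sv[0], SHUT_WR) == -1)     /* sv[0] will send no more data */
        goto cleanup;

    /* Peer: read() returns 0 (end-of-file) instead of blocking. */
    if (read(sv[1], buf, sizeof(buf)) != 0)
        goto cleanup;

    /* The other direction still works. */
    if (write(sv[1], "ok", 2) != 2)
        goto cleanup;
    if (read(sv[0], buf, sizeof(buf)) != 2 || memcmp(buf, "ok", 2) != 0)
        goto cleanup;
    rc = 0;

cleanup:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```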

Byte Ordering
The TCP/IP protocol suite uses big-endian byte order.

#include <arpa/inet.h>
uint32_t htonl(uint32_t hostint32);
    Returns: 32-bit integer in network byte order
uint16_t htons(uint16_t hostint16);
    Returns: 16-bit integer in network byte order
uint32_t ntohl(uint32_t netint32);
    Returns: 32-bit integer in host byte order
uint16_t ntohs(uint16_t netint16);
    Returns: 16-bit integer in host byte order
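A short sketch makes the effect of htons visible. The helper port_wire_bytes is hypothetical; it copies out the two bytes of the converted value in memory order, which is the order they would travel on the wire.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store `port` in network byte order and expose the
 * two bytes in the order they sit in memory (and go onto the wire). */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);

    memcpy(out, &net, sizeof(net));
}
```

For port 8080 (0x1F90) the bytes come out as 0x1F then 0x90 on any host, whether the host itself is little-endian or big-endian.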


struct sockaddr_in {
    sa_family_t sin_family;    /* address family */
    in_port_t sin_port;        /* port number */
    struct in_addr sin_addr;   /* IPv4 address */
};

inet_ntop - network to presentation
inet_pton - presentation to network

#include <arpa/inet.h>
const char *inet_ntop(int domain, const void *restrict addr, char *restrict str, socklen_t size);
    Returns: pointer to address string on success, NULL on error
int inet_pton(int domain, const char *restrict str, void *restrict addr);
    Returns: 1 on success, 0 if the format is invalid, or -1 on error

Address Look Up
To iterate over the host entries configured on the machine:

#include <netdb.h>
struct hostent *gethostent(void);
    Returns: pointer if OK, NULL on error
void sethostent(int stayopen);
void endhostent(void);

struct hostent {
    char *h_name;
    char **h_aliases;
    int h_addrtype;
    int h_length;
    char **h_addr_list;
    ...
};

DNS
gethostbyname() and gethostbyaddr() are obsolete (getaddrinfo() and getnameinfo() replace them). Network names and numbers can be obtained with a similar set of interfaces:

#include <netdb.h>
struct netent *getnetbyaddr(uint32_t net, int type);
struct netent *getnetbyname(const char *name);
struct netent *getnetent(void);
    All return: pointer if OK, NULL on error

void setnetent(int stayopen);
void endnetent(void);

The netent structure contains at least the following fields:

struct netent {
    char *n_name;       /* network name */
    char **n_aliases;   /* alternate network name array pointer */
    int n_addrtype;     /* address type */
    uint32_t n_net;     /* network number */
    ...
};

We can map between protocol names and numbers with the following functions.

#include <netdb.h>struct protoent *getprotobyname(const char *name);struct protoent *getprotobynumber(int proto);struct protoent *getprotoent(void);

All return: pointer if OK, NULL on error

void setprotoent(int stayopen);void endprotoent(void);

The protoent structure as defined by POSIX.1 has at least the following members:

struct protoent {char *p_name; /* protocol name */char **p_aliases; /* pointer to alternate protocol name array */int p_proto;/* protocol number */..};
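A protocol lookup can be sketched as follows. The helper name protocol_number is hypothetical, and the result depends on the protocols database of the machine (commonly /etc/protocols), so a missing entry is reported as -1 rather than assumed away.

```c
#include <netdb.h>

/* Returns the protocol number registered for name (for example "tcp"),
 * or -1 if the protocols database has no such entry. */
int protocol_number(const char *name)
{
    struct protoent *pe = getprotobyname(name);
    return pe ? pe->p_proto : -1;
}
```

The returned value is what would be passed as the third argument of socket() when a specific protocol is requested.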


Services are represented by the port number portion of the address. Each service is offered on a unique, well-known port number. We can map a service name to a port number with getservbyname, map a port number to a service name with getservbyport, or scan the services database sequentially with getservent.

#include <netdb.h>

struct servent *getservbyname(const char *name, const char *proto);
struct servent *getservbyport(int port, const char *proto);
struct servent *getservent(void);

All return: pointer if OK, NULL on error

void setservent(int stayopen);
void endservent(void);

The servent structure is defined to have at least the following members:

struct servent {
    char  *s_name;
    char **s_aliases;
    int    s_port;
    char  *s_proto;
    ...
};

#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *addr, socklen_t len);

Returns: 0 if OK, -1 on error

* sockfd - the socket file descriptor returned by socket().
* addr - a pointer to a struct sockaddr that contains the IP address and port number.
* len - the size of the address structure, e.g. sizeof (struct sockaddr_in).

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

* sockfd - the socket file descriptor returned by socket().
* serv_addr - a pointer to a struct sockaddr containing the destination port and IP address.
* addrlen - the size of the address structure, e.g. sizeof (struct sockaddr_in).

int listen(int sockfd, int backlog);

* sockfd - the socket file descriptor returned by socket().
* backlog - the number of connections allowed on the incoming queue.
* backlog should not be zero, because a server always expects connections from clients.
* The listen function converts an unconnected socket into a passive socket.
* A successful listen() tells the kernel to accept incoming connection requests directed to this socket.
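The socket(), bind() and listen() sequence can be sketched as one helper. open_listener is a hypothetical name; passing port 0 is an assumption made here so the kernel picks a free port, which is then read back with getsockname().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket, binds it to INADDR_ANY:port and makes it passive.
 * If bound_port is not NULL it receives the port actually in use.
 * Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short port, int backlog, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, backlog) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    if (bound_port != NULL)
        *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A server would call open_listener(1034, 5, NULL) once and then loop on accept() with the returned descriptor.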

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

* sockfd - the socket file descriptor returned by socket().
* addr - a pointer to a struct sockaddr_in in which information about the incoming connection, such as the IP address and port number, is stored.
* addrlen - a local socklen_t variable that should be set to sizeof (struct sockaddr_in) before its address is passed to accept().

close (sockfd);

* The close system call prevents any more reads and writes to the socket. The remote end will receive an error if it then attempts to read or write the socket.

int shutdown (int sockfd, int how);

* sockfd - file descriptor of the socket to be shut down.
* how - 0 (SHUT_RD): further receives are disallowed; 1 (SHUT_WR): further sends are disallowed; 2 (SHUT_RDWR): further sends and receives are disallowed.
* The shutdown system call gives more control than close (sockfd) over how the socket descriptor is closed.

Typical server code

struct sockaddr_in serv, cli;
socklen_t clilen = sizeof (cli);
sd = socket (AF_INET, SOCK_STREAM, 0);
serv.sin_family = AF_INET;
serv.sin_addr.s_addr = htonl (INADDR_ANY);
serv.sin_port = htons (portno);
bind (sd, (struct sockaddr *) &serv, sizeof (serv));
listen (sd, 5);
nsd = accept (sd, (struct sockaddr *) &cli, &clilen);
read / write (nsd, ....);

Typical client code

struct sockaddr_in serv;
sd = socket (AF_INET, SOCK_STREAM, 0);
serv.sin_family = AF_INET;
serv.sin_addr.s_addr = inet_addr ("ser ip");
serv.sin_port = htons (portno);
connect (sd, (struct sockaddr *) &serv, sizeof (serv));
read / write (sd, ....);

Iterative Server
One client request at a time.

while (1) {
    nsd = accept (sd, &cli, ...);
    read/write (nsd, ...);
    close (nsd);
}

Concurrent Server
Many client requests can be serviced concurrently.

while (1) {
    nsd = accept (sd, &cli, ....);
    if (!fork ()) {
        close (sd);
        read/write (nsd, .....);
        exit (0);
    } else
        close (nsd);
}

/* This is a program which illustrates the concurrent server by creating a child process */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MYPORT 1034

int main()
{
    int pid, sd, nsd, dat;
    char message[40];
    socklen_t length;
    struct sockaddr_in server, client;

    if ((sd = socket(PF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }
    memset(&server, 0, sizeof(server));
    server.sin_port = htons(MYPORT);
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = inet_addr("192.168.2.20");
    if (bind(sd, (struct sockaddr *)&server, sizeof(server)) == -1) {
        perror("bind");
        exit(1);
    }
    if (listen(sd, 1) == -1) {
        perror("listen");
        exit(1);
    }

    printf("Waiting for connection.............\n");
    while (1) {
        length = sizeof(client);    /* must be set before every accept() */
        if ((nsd = accept(sd, (struct sockaddr *)&client, &length)) == -1) {
            perror("accept");
            exit(1);
        }
        /* A child process is created to serve each accepted connection */
        pid = fork();
        if (pid == 0) {
            close(sd);              /* the child does not accept connections */
            printf("Got connection from client:%s\n", inet_ntoa(client.sin_addr));
            if ((dat = recv(nsd, message, sizeof(message) - 1, 0)) == -1) {
                perror("recv");
                exit(1);
            }
            message[dat] = '\0';
            printf("Data received is : %s\n", message);
            printf("Enter the data you want to send to client\n");
            fgets(message, sizeof(message), stdin);
            send(nsd, message, strlen(message), 0);
            close(nsd);
            exit(0);
        }
        /* The parent closes its copy and goes back to accept() */
        close(nsd);
    }
}
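The difference between close() and shutdown() can be sketched without a network. The helper below is hypothetical and, as an assumption for the sake of a self-contained example, uses a connected AF_UNIX socket pair instead of a TCP connection. It half-closes the sending direction with shutdown(fd, SHUT_WR) (how = 1) and shows that the peer then reads the pending data followed by end-of-file.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends a short message over one end of a socket pair, shuts down the
 * sending direction, and counts what the other end reads until EOF.
 * Returns the number of bytes received, or -1 on error. */
long send_then_half_close(const char *msg, size_t len)
{
    int sv[2];
    char buf[64];
    long total = 0;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (write(sv[0], msg, len) != (ssize_t)len ||
        shutdown(sv[0], SHUT_WR) == -1) {       /* no further sends on sv[0] */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* read() returns 0 once the queued data is drained, because the
     * writer has shut down its sending side. */
    while ((n = read(sv[1], buf, sizeof buf)) > 0)
        total += n;

    close(sv[0]);
    close(sv[1]);
    return n == 0 ? total : -1;
}
```

After SHUT_WR the descriptor sv[0] is still open and could still receive data, which is what close() cannot offer.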


Both the client and the server use sendto() and recvfrom():

#include <sys/socket.h>

ssize_t sendto(int sockfd, const void *buf, size_t nbytes, int flags, const struct sockaddr *destaddr, socklen_t destlen);

Returns: number of bytes sent if OK, -1 on error

ssize_t recvfrom(int sockfd, void *restrict buf, size_t len, int flags, struct sockaddr *restrict addr, socklen_t *restrict addrlen);

Returns: length of message in bytes, 0 if no messages are available and peer has done an orderly shutdown, or -1 on error


Day 5 Morning

12. Network Programming
TCP Server Client Programming
UDP Server Client Programming

Tcpclient

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PORT 1034

struct sockaddr_in server;

int main()
{
    int n, sd;
    char msg[40];

    if ((sd = socket(PF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }
    server.sin_family = AF_INET;
    server.sin_port = htons(PORT);
    //server.sin_addr.s_addr=inet_addr("192.168.1.2");
    server.sin_addr.s_addr = inet_addr("127.0.0.1");
    if (connect(sd, (struct sockaddr *)&server, sizeof(server)) == -1) {
        perror("connect");
        exit(1);
    }

    printf("Enter the message you want to send to server\n");
    fgets(msg, sizeof(msg), stdin);
    send(sd, msg, strlen(msg), 0);          /* send only the typed bytes */
    printf("Waiting for message from server..............\n");
    /* leave room for the terminating '\0' */
    if ((n = recv(sd, msg, sizeof(msg) - 1, 0)) == -1) {
        perror("recv");
        exit(1);
    }
    msg[n] = '\0';

    printf("Message received from server is:%s", msg);
    close(sd);
    return 0;
}

Tcpserver

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MYPORT 1034

int main()
{
    int sd, nsd, dat, yes = 1;
    char message[40];
    struct sockaddr_in server, client;
    socklen_t length = sizeof(client);      /* must be set before accept() */

    if ((sd = socket(PF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }
    memset(&server, 0, sizeof(server));
    server.sin_port = htons(MYPORT);
    server.sin_family = AF_INET;
    //server.sin_addr.s_addr=inet_addr("192.168.1.2");
    server.sin_addr.s_addr = inet_addr("127.0.0.1");

    /* allow the port to be reused right after the server restarts */
    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1) {
        perror("setsockopt");
        exit(1);
    }
    if (bind(sd, (struct sockaddr *)&server, sizeof(server)) == -1) {
        perror("bind");
        exit(1);
    }

    if (listen(sd, 5) == -1) {
        perror("listen");
        exit(1);
    }
    printf("Waiting for connection.............\n");
    if ((nsd = accept(sd, (struct sockaddr *)&client, &length)) == -1) {
        perror("accept");
        exit(1);
    }
    printf("Got connection from client:%s\n", inet_ntoa(client.sin_addr));

    /* leave room for the terminating '\0' */
    if ((dat = recv(nsd, message, sizeof(message) - 1, 0)) == -1) {
        perror("recv");
        exit(1);
    }
    message[dat] = '\0';
    printf("Data received is : %s\n", message);
    printf("Enter the data you want to send to client\n");
    fgets(message, sizeof(message), stdin);
    send(nsd, message, strlen(message), 0);
    close(nsd);
    close(sd);
    return 0;
}

udpclient

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define PORT 1034

int main()
{
    int n, sd;
    socklen_t length;
    char msg[40];
    /* client holds the destination (server) address;
       server receives the address of whoever replies */
    struct sockaddr_in server, client;

    if ((sd = socket(PF_INET, SOCK_DGRAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }

    memset(&client, 0, sizeof(client));
    client.sin_family = AF_INET;
    client.sin_port = htons(PORT);
    //client.sin_addr.s_addr=inet_addr("192.168.1.2");
    client.sin_addr.s_addr = inet_addr("127.0.0.1");

    printf("Enter the message you want to send to server\n");
    fgets(msg, sizeof(msg), stdin);
    if (sendto(sd, msg, strlen(msg), 0, (struct sockaddr *)&client, sizeof(client)) == -1) {
        perror("sendto");
        exit(1);
    }
    printf("Waiting for message from server..............\n");
    length = sizeof(server);
    /* leave room for the terminating '\0' */
    if ((n = recvfrom(sd, msg, sizeof(msg) - 1, 0, (struct sockaddr *)&server, &length)) == -1) {
        perror("recvfrom");
        exit(1);
    }
    msg[n] = '\0';

    printf("Message received from server is:%s", msg);
    close(sd);
    return 0;
}

udpserver

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MYPORT 1034

int main()
{
    int sd, dat;
    socklen_t length;
    char message[40];
    struct sockaddr_in server, client;

    if ((sd = socket(PF_INET, SOCK_DGRAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }
    memset(&server, 0, sizeof(server));
    server.sin_port = htons(MYPORT);
    server.sin_family = AF_INET;
    //server.sin_addr.s_addr=inet_addr("192.168.1.2");
    server.sin_addr.s_addr = inet_addr("127.0.0.1");

    if (bind(sd, (struct sockaddr *)&server, sizeof(server)) == -1) {
        perror("bind");
        exit(1);
    }

    length = sizeof(client);
    /* leave room for the terminating '\0' */
    if ((dat = recvfrom(sd, message, sizeof(message) - 1, 0, (struct sockaddr *)&client, &length)) == -1) {
        perror("recvfrom");
        exit(1);
    }

    /* UDP is connectionless: this is the sender of the datagram */
    printf("Got datagram from client:%s\n", inet_ntoa(client.sin_addr));

    message[dat] = '\0';

    printf("Data received is : %s\n", message);
    printf("Enter the data you want to send to client\n");
    fgets(message, sizeof(message), stdin);
    sendto(sd, message, strlen(message), 0, (struct sockaddr *)&client, length);
    close(sd);
    return 0;
}

netlink - Communication between kernel and userspace (PF_NETLINK)

#include <asm/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

netlink_socket = socket(PF_NETLINK, socket_type, netlink_family);

Netlink is used to transfer information between kernel and userspace processes. It consists of a standard sockets-based interface for userspace processes and an internal kernel API for kernel modules.

Netlink is a datagram-oriented service. Both SOCK_RAW and SOCK_DGRAM are valid values for socket_type. However, the netlink protocol does not distinguish between datagram and raw sockets.

netlink_family selects the kernel module or netlink group to communicate with. Two of the currently assigned netlink families are:


NETLINK_ROUTE Receives routing and link updates and may be used to modify the routing tables (both IPv4 and IPv6), IP addresses, link parameters, neighbour setups, queueing disciplines, traffic classes and packet classifiers

NETLINK_W1 Messages from 1-wire subsystem.

The following example creates a NETLINK_ROUTE netlink socket which listens to the RTMGRP_LINK (network interface create/delete/up/down events) and RTMGRP_IPV4_IFADDR (IPv4 addresses add/delete events) multicast groups.

struct sockaddr_nl sa;

memset (&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
bind(fd, (struct sockaddr*)&sa, sizeof(sa));

The next example demonstrates how to send a netlink message to the kernel (pid 0). Note that the application must take care of message sequence numbers in order to reliably track acknowledgements.

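struct nlmsghdr *nh;    /* The nlmsghdr with payload to send; must already point to a prepared message. */
struct sockaddr_nl sa;
struct iovec iov;
struct msghdr msg;

memset (&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
nh->nlmsg_pid = 0;
nh->nlmsg_seq = ++sequence_number;
/* Request an ack from kernel by setting NLM_F_ACK. */
nh->nlmsg_flags |= NLM_F_ACK;

/* Describe the message buffer and the destination address. */
iov.iov_base = (void *) nh;
iov.iov_len = nh->nlmsg_len;
memset (&msg, 0, sizeof(msg));
msg.msg_name = (void *) &sa;
msg.msg_namelen = sizeof(sa);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;

sendmsg (fd, &msg, 0);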

The last example reads a netlink message.

int len;
char buf[4096];
struct iovec iov = { buf, sizeof(buf) };
struct sockaddr_nl sa;
struct msghdr msg = { (void *)&sa, sizeof(sa), &iov, 1, NULL, 0, 0 };
struct nlmsghdr *nh;

len = recvmsg (fd, &msg, 0);

for (nh = (struct nlmsghdr *) buf; NLMSG_OK (nh, len); nh = NLMSG_NEXT (nh, len)) {
    /* The end of multipart message. */
    if (nh->nlmsg_type == NLMSG_DONE)
        return;

    if (nh->nlmsg_type == NLMSG_ERROR)
        /* Do some error handling. */
        ...

    /* Continue with parsing payload. */
    ...
}
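The NLMSG_OK / NLMSG_NEXT walk can be tried without a kernel conversation by building a small multipart buffer by hand. This is only a sketch: count_netlink_messages and put_netlink_message are hypothetical helpers, and the buffer passed in is assumed to be suitably aligned for struct nlmsghdr.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>

/* Counts the messages in buf that come before the NLMSG_DONE marker. */
int count_netlink_messages(void *buf, int len)
{
    struct nlmsghdr *nh;
    int count = 0;

    for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            break;                      /* end of multipart message */
        count++;
    }
    return count;
}

/* Writes one message header followed by payload_len zero bytes at
 * buf + offset and returns the offset of the next aligned message. */
int put_netlink_message(char *buf, int offset, unsigned short type, int payload_len)
{
    struct nlmsghdr nh;

    memset(&nh, 0, sizeof nh);
    nh.nlmsg_len = NLMSG_LENGTH(payload_len);   /* header + payload */
    nh.nlmsg_type = type;
    memset(buf + offset, 0, NLMSG_SPACE(payload_len));
    memcpy(buf + offset, &nh, sizeof nh);
    return offset + NLMSG_SPACE(payload_len);   /* aligned total size */
}
```

NLMSG_NEXT decrements the remaining length as a side effect, which is why the loop passes the same len variable to both macros.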


Day 5 Morning

13. Programming and Debugging tools
strace - Tracing system calls
ltrace - Tracing library calls
Tools used to detect memory access errors and memory leakage in Linux
mtrace

Tracing Processes
* strace command
  - trace system calls and signals
  - strace runs until the given command exits
  - It is a useful tool for diagnostics, instruction and debugging
* ptrace system call
  - Process trace

Strace

# strace -c -e trace=file mkfifo -m 0744 myfifo
execve("/usr/bin/mkfifo", ["mkfifo", "-m", "0744", "myfifo"]) = 0

% time     seconds     us/call     calls           syscall
------ ----------- ----------- --------- --------- ----------------
 47.62    0.000020          20         1           mknod
 33.33    0.000014           4         4           open
 11.90    0.000005           5         1           chmod
  7.14    0.000003           1         3           fstat
------ ----------- ----------- --------- --------- ----------------
100.00    0.000042                     9

1. Trace the Execution of an Executable

$ strace ls
execve("/bin/ls", ["ls"], [/* 21 vars */]) = 0
brk(0)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78c7000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=65354, ...}) = 0
...
...

2. Trace Specific System Calls in an Executable Using Option -e

$ strace -e open ls
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libselinux.so.1", O_RDONLY) = 3
open("/lib/librt.so.1", O_RDONLY) = 3
open("/lib/libacl.so.1", O_RDONLY) = 3
open("/lib/libc.so.6", O_RDONLY) = 3
open("/lib/libdl.so.2", O_RDONLY) = 3
open("/lib/libpthread.so.0", O_RDONLY) = 3
open("/lib/libattr.so.1", O_RDONLY) = 3
open("/proc/filesystems", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3

3. Execute Strace on a Running Linux Process Using Option -p

$ strace -p 1725 -o output.txt
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf

4. Print Relative Time for System Calls Using Option -r

With the -r option, strace prints a relative timestamp on entry to each system call, as shown below.

$ strace -r ls
0.000000 execve("/bin/ls", ["ls"], [/* 37 vars */]) = 0
0.000846 brk(0) = 0x8418000
0.000143 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
0.000163 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb787b000
0.000119 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
0.000123 open("/etc/ld.so.cache", O_RDONLY) = 3
0.000099 fstat64(3, {st_mode=S_IFREG|0644, st_size=67188, ...}) = 0
0.000155 mmap2(NULL, 67188, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb786a000
...


'ltrace' is another Linux utility similar to 'strace'. However, ltrace lists all the library calls made by an executable or a running process.

This tool is very useful for debugging user-space applications to determine which library call is failing.

It also reports signals delivered to the traced process, such as the one raised by a segmentation fault.

Consider the following code:

#include <stdio.h>
#include <unistd.h>

int main()
{
    FILE *fp = fopen("rfile.txt", "w+");
    fprintf(fp+1, "Invalid Write");    /* deliberate bug: fp+1 is not a valid FILE pointer */
    fclose(fp);
    return 0;
}

Let's compile and run it.

x@ubuntu:~/source$ gcc file.c -Wall -o file
x@ubuntu:~/source$ ./file
Segmentation fault (core dumped)

That is a segmentation fault. Let's use ltrace to debug and see what is happening.

x@ubuntu:~/source$ ltrace ./file
__libc_start_main(0x8048454, 1, 0xbfc19db4, 0x80484c0, 0x8048530 <unfinished ...>
fopen("rfile.txt", "w+") = 0x9160008
fwrite("Invalid Write", 1, 14, 0x916009c <unfinished ...>
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

Mtrace (memory trace). Follow these steps to use it.

1. Call mtrace() When Your Program Starts

#include <stdio.h>
#include <stdlib.h>
#include <mcheck.h>

int main()
{
    char *string;

    mtrace();
    string = malloc(100 * sizeof(char));
    return 0;
}

2. Compile Program with Debugging Options

$ gcc -g -o mtrace_test mtrace_test.c

3. Set MALLOC_TRACE

For bash:
export MALLOC_TRACE="mtrace.out"

For C shell, it would be:
setenv MALLOC_TRACE mtrace.out

4. Run The Program Once

5. View The Data

mtrace <prog name> <output log file name>
mtrace mtrace_test mtrace.out

Assuming the C code at the beginning was the code in mtrace_test.c, the following output would be produced:

Memory not freed:
-----------------
           Address     Size     Caller
0x0000000000501460     0x64  at /array/home/dcurrie/test/mtrace/mtrace_test.c:11
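For comparison, a variant that should leave nothing for mtrace to report can be sketched as below. The function name traced_alloc_and_free is hypothetical; the sketch assumes glibc, where mtrace() and muntrace() are declared in mcheck.h and do nothing unless MALLOC_TRACE is set.

```c
#include <mcheck.h>
#include <stdlib.h>
#include <string.h>

/* Allocates n bytes between mtrace() and muntrace() and releases them
 * again. Returns 0 on success, -1 if the allocation failed. */
int traced_alloc_and_free(size_t n)
{
    char *string;

    mtrace();                   /* start logging malloc/free calls */
    string = malloc(n);
    if (string == NULL) {
        muntrace();
        return -1;
    }
    memset(string, 0, n);
    free(string);               /* the matching free that the example above omits */
    muntrace();                 /* stop logging */
    return 0;
}
```

Every allocation logged between the two calls has a matching free, so the "Memory not freed" section should have no entry for this function.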

Valgrind
Finding Memory Leaks With Valgrind

example.c

#include <stdlib.h>

int main()
{
    char *x = malloc(100); /* or, in C++, "char *x = new char[100]" */
    x[10] = 'a';
    return 0;
}

$ gcc example.c -o example

$ valgrind --tool=memcheck --leak-check=yes ./example
==2116== 100 bytes in 1 blocks are definitely lost in loss record 1 of 1
==2116==    at 0x1B900DD0: malloc (vg_replace_malloc.c:131)
==2116==    by 0x804840F: main (in /home/cprogram/example1)

Finding Invalid Pointer Use With Valgrind

If the block is allocated with malloc(10) instead, the write to x[10] falls one byte past its end. Running

$ valgrind --tool=memcheck --leak-check=yes ./example

results in the following warning

==9814== Invalid write of size 1
==9814==    at 0x804841E: main (example2.c:6)
==9814==  Address 0x1BA3607A is 0 bytes after a block of size 10 alloc'd
==9814==    at 0x1B900DD0: malloc (vg_replace_malloc.c:131)
==9814==    by 0x804840F: main (example2.c:5)
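A version of the example that avoids both reports, the leak and the invalid write, might look like the sketch below. write_last_byte is a hypothetical helper used only for illustration.

```c
#include <stdlib.h>

/* Allocates size bytes, stores value at the last valid index, reads it
 * back and frees the block. Returns the stored character, or 0 if size
 * is 0 or the allocation failed. */
char write_last_byte(size_t size, char value)
{
    char *x;
    char result;

    if (size == 0)
        return 0;
    x = malloc(size);
    if (x == NULL)
        return 0;
    x[size - 1] = value;        /* last valid index is size - 1, not size */
    result = x[size - 1];
    free(x);                    /* released, so nothing is "definitely lost" */
    return result;
}
```

A block of size 10 has valid indexes 0 to 9, which is why the write to x[10] in the warning above is flagged as "0 bytes after" the block.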


Day 5 Morning

14. Device Driver Introduction
Introduction
Kernel modules
Character device drivers
Block device drivers
Hardware and Interrupt Handling
