Raoul Kernel Slides
-
Upload
nobin-mathew -
Category
Documents
-
view
229 -
download
0
Transcript of Raoul Kernel Slides
-
7/29/2019 Raoul Kernel Slides
1/25
Linux Kernel Networking
Raoul Rivas
-
7/29/2019 Raoul Kernel Slides
2/25
Kernel vs Application Programming
No memory protection
We share memory withdevices, scheduler
Sometimes no preemption
Can hog the CPU
Concurrency is difficult
No libraries
Printf, fopen No security descriptors
In Linux no access to files
Direct access to hardware
Memory Protection
Segmentation Fault
Preemption
Scheduling isn't ourresponsibility
Signals (Control-C)
Libraries
Security Descriptors
In Linux everything is a filedescriptor
Access to hardware as files
-
7/29/2019 Raoul Kernel Slides
3/25
Outline
User Space and Kernel Space Running Context in the Kernel
Locking
Deferring Work
Linux Network Architecture
Sockets, Families and Protocols
Packet Creation
Fragmentation and Routing
Data Link Layer and Packet Scheduling
High Performance Networking
-
7/29/2019 Raoul Kernel Slides
4/25
System Calls
A system call is an interrupt
syscall(number,arguments)
The kernel runs in adifferent address space
Data must be copied backand forth
copy_to_user(),copy_from_user()
Never directly dereferenceany pointer from user space
Kernel Space
User Space
Syscalltable
write(ptr, size);
ptr
syscall(WRITE, ptr, size)
sys_write()
Copy_from_user()INT0x80
0xFFFF50
0x011075
-
7/29/2019 Raoul Kernel Slides
5/25
Context
Context: Entity whom the kernel is running code on behalf of
Process context and Kernel Context are preemptible
Interrupts cannot sleep and should be small
They are all concurrent
Process context and Kernel context have a PID:
Struct task_struct* current
Kernel Context Process
Context
Interrupt
Context
Preemptible Yes Yes No
PID Itself Application PID No
Can Sleep? Yes Yes No
Example Kernel Thread System Call Timer Interrupt
-
7/29/2019 Raoul Kernel Slides
6/25
Race Conditions
Process context, Kernel Context and Interruptsrun concurrently
How to protect critical zones from race
conditions? Spinlocks
Mutex
Semaphores Reader-Writer Locks (Mutex, Semaphores)
Reader-Writer Spinlocks
-
7/29/2019 Raoul Kernel Slides
7/25
Inside Locking Primitives Spinlock
//spinlock_lock:disable_interrupts();while(locked==true);
//critical region//spinlock_unlock:enable_interrupts();locked=false;
Mutex
//mutex_lock:If (locked==true){
Enqueue(this);Yield();
}locked=true;
//critical region//mutex_unlock:If !isEmpty(waitqueue){wakeup(Dequeue());}Else locked=false;
We can't sleep while thespinlock is locked!
We can't use a mutex inan interrupt because
interrupts can't sleep!
THE MUTEX SLEEPSTHE SPINLOCK SPINS...
-
7/29/2019 Raoul Kernel Slides
8/25
When to use what?
Mutex SpinlockShort Lock Time
Long Lock Time
Interrupt Context
Sleeping
Usually functions that handle memory, user space ordevices and scheduling sleep
Kmalloc, printk, copy_to_user, schedule
wake_up_process does not sleep
-
7/29/2019 Raoul Kernel Slides
9/25
Linux Kernel Modules
Extensibility
Ideally you don't want topatch but build a kernelmodule
Separate Compilation
Runtime-Linkage
Entry and Exit Functions
Run in Process Context LKM Hello-World
#define MODULE
#define LINUX#define __KERNEL__
#include #include #include
static int __init myinit(void){printk(KERN_ALERT "Hello,
world\n");Return 0;
}
static void __exit myexit(void){printk(KERN_ALERT "Goodbye,
world\n");}
module_init(myinit);module_exit(myexit);MODULE_LICENSE("GPL");
-
7/29/2019 Raoul Kernel Slides
10/25
The Kernel Loop
The Linux kernel uses the concept ofjiffies to measure time
Inside the kernel there is a loop tomeasure time and preempt tasks
A jiffy is the period at which the timerin this loop is triggered
Varies from system to system 100Hz, 250 Hz, 1000 Hz.
Use the variable HZ to get thevalue.
The schedule function is thefunction that preempts tasks
schedule()
Timer1/HZ
add_timer(1 jiffy)jiffies++
scheduler_tick()
tick_periodic:
-
7/29/2019 Raoul Kernel Slides
11/25
Deferring Work / Two Halves
Kernel Timers are used to createtimed events
They use jiffies to measure time
Timers are interrupts
We can't do much in them!
Solution: Divide the work in twoparts
Use the timer handler to signal athread. (TOP HALF)
Let the kernel thread do thereal job. (BOTTOM HALF)
Timer
Timer Handler:wake_up(thread);
Thread:While(1){
Do work();Schedule();
}
Interruptcontext
Kernel
context
TOP HALF
BOTTOMHALF
-
7/29/2019 Raoul Kernel Slides
12/25
Linux Kernel Map
-
7/29/2019 Raoul Kernel Slides
13/25
Linux Network ArchitectureSocket Access
INET UNIX
VFS
Socket Splice
Protocol Families
NFS SMB iSCSI
Network Storage
UDP TCP
Protocols
IP
802.11ethernet
Network Interface
Network Device Driver
File Access
Logical Filesystem
EXT4
-
7/29/2019 Raoul Kernel Slides
14/25
Socket Access
Contains the system callfunctions like socket, connect,accept, bind
Implements the POSIX
socket interface Independent of protocols or
socket types
Responsible of mapping socket
data structures to integerhandlers
Calls the underlying layerfunctions
sys_socket()sock_create
sys_socket
socketIntegerhandler
Socketcreate
Handlertable
-
7/29/2019 Raoul Kernel Slides
15/25
Protocol Families
Implements different socketfamilies INET, UNIX
Extensible through the useof pointers to functions and
modules. Allocates memory for the
socket
Calls net_proto_familiy
create for familiy specificinitilization
*pf
inet_create
net_proto_family
AF_LOCAL
AF_UNIX
-
7/29/2019 Raoul Kernel Slides
16/25
Socket Splice
Unix uses the abstraction of Files as first classobjects
Linux supports to send entire files between file
descriptors. A descriptor can be a socket
Also Unix supports Network File Systems
NFS, Samba, Coda, Andrew The socket splice is responsible of handling
these abstractions
-
7/29/2019 Raoul Kernel Slides
17/25
Protocols
Families have multipleprotocols
INET: TCP, UDP
Protocol functions are
stored in proto_ops
Some functions are notused in that protocol so theypoint to dummies
Some functions are thesame across manyprotocols and can beshared
inet_bind
inet_listen
inet_stream_connect
socket inet_stream_ops
proto_ops
inet_bind
NULL
inet_dgram_connect
inet_dgram_ops
-
7/29/2019 Raoul Kernel Slides
18/25
Packet Creation
At the sending function, thebuffer is packetized.
Packets are represented bythe sk_buff data structure
Contains pointers the:
transport layer header
Link-layer header
Received Timestamp Device we received it
Some fields can be NULL
tcp_send_msg
tcp_transmit_skb
ip_queue_xmit
Struct sk_buf
char*
Struct sk_buf
TCP Header
-
7/29/2019 Raoul Kernel Slides
19/25
Fragmentation and Routing
Fragmentation is performedinside ip_fragment
If the packet does not havea route it is filled in by
ip_route_output_flow There are routing
mechanisms used
Route Cache
Forwarding InformationBase
Slow Routing
ip_fragment
FIB
Slow routing
ip_route_output_flow
Route cache
forward dev_queue_xmit(queue packet)
NY
N
N
N
Y
Y
Y
ip_forward(packet forwarding)
-
7/29/2019 Raoul Kernel Slides
20/25
Data Link Layer The Data Link Layer is
responsible of packetscheduling
The dev_queue_xmit isresponsible of enqueing
packets for transmission inthe qdisc of the device
Then in process context it istried to send
If the device is busy weschedule the send for alater time
The dev_hard_start_xmit isresponsible for sending tothe device
Dev_queue_xmit(sk_buf)
Dev qdisc enqueue
dev_hard_start_xmit()
Dev qdisc dequeue
-
7/29/2019 Raoul Kernel Slides
21/25
Case Study: iNET
INET is an EDF (EarliestDeadline First) packetscheduler
Each Packet has a deadline
specified in the TOS field We implemented it as a
Linux Kernel Module
We implement a packet
scheduler at the qdisc level. Replace qdisc enqueue and
dequeue functions
Enqueued packets are put
in a heap sorted by deadline
enqueue(sk_buf)
dequeue(sk_buf)
HW
Deadline
heap
-
7/29/2019 Raoul Kernel Slides
22/25
High-Performance Network Stacks Minimize copying
Zero copy technique
Page remapping
Use good data structures
Inet v0.1 used a list instead of a heap
Optimize the common case
Branch optimization
Avoid process migration orcache misses
Avoid dynamic assignment of interrupts to different CPUs
Combine Operations within the same layer to minimizepasses to the data
Checksum + data copying
-
7/29/2019 Raoul Kernel Slides
23/25
High-Performance Network Stacks
Cache/Reuse as much as you can
Headers, SLAB allocator
Hierarchical Design + Information Hiding
Data encapsulation Separation of concerns
Interrupt Moderation/Mitigation
Receive packets in timed intervals only (e.g. ATM)
Packet Mitigation
Similar but at the packet level
-
7/29/2019 Raoul Kernel Slides
24/25
Conclusion
The Linux kernel has 3 main contexts: Kernel, Process andInterrupt.
Use spinlock for interrupt context and mutexes if you plan tosleep holding the lock
Implement a module avoid patching the kernel main tree
To defer work implement two halves. Timers + Threads
Socket families are implemented through pointers tofunctions (net_proto_family and proto_ops)
Packets are represented by the sk_buf structure
Packet scheduling is done at the qdisc level in the Link Layer
-
7/29/2019 Raoul Kernel Slides
25/25
References Linux Kernel Map http://www.makelinux.net/kernel_map
A. Chimata, Path of a Packet in the Linux Kernel Stack, Universityof Kansas, 2005
Linux Kernel Cross Reference Source
R. Love, Linux Kernel Development , 2nd Edition, Novell Press,
2006 H. Nguyen, R. Rivas, iDSRT: Integrated Dynamic Soft Realtime
Architecture for Critical Infrastructure Data Delivery over WAN,Qshine 2009
M. Hassan and R. Jain, High Performance TCP/IP Networking:Concepts, Issues, and Solutions, Prentice-Hall, 2003
K. Ilhwan, Timer-Based Interrupt Mitigation for High PerformancePacket Processing, HPC, 2001
Anand V., TCPIP Network Stack Performance in Linux Kernel 2.4
and 2.5, IBM Linux Technology Center
http://www.makelinux.net/kernel_maphttp://www.makelinux.net/kernel_map