QNX Neutrino RTOS System Architecture For release 6.3.0 or later 2006, QNX Software Systems GmbH & Co. KG.

QNX Neutrino RTOS

System Architecture

For release 6.3.0 or later

2006, QNX Software Systems GmbH & Co. KG.

1996–2006, QNX Software Systems GmbH & Co. KG. All rights reserved.

Published under license by:

QNX Software Systems International Corporation
175 Terence Matthews Crescent
Kanata, Ontario
K2M 1W8
Canada
Voice: +1 613 591-0931
Fax: +1 613 591-3579
Email: [email protected]
Web: http://www.qnx.com/

Publishing history

October 1996 First edition

May 1999 Second edition

August 2001 Third edition

January 2003 Fourth edition

June 2004 Fifth edition

February 2006 Sixth edition

Electronic edition published 2006

Technical support options

To obtain technical support for any QNX product, visit the Technical Support section in the Services area on our website (www.qnx.com). You’ll find a wide range of support options, including our free web-based Developer Support Center.

QNX, Neutrino, Photon, Photon microGUI, Momentics, and “Build a More Reliable World” are trademarks, registered in certain jurisdictions, of QNX Software Systems GmbH & Co. KG and are used under license by QNX Software Systems International Corporation. All other trademarks belong to their respective owners.

Printed in Canada.

Part Number: 002533

Contents

About This Guide

Typographical conventions

Note to Windows users

What you’ll find in this guide

Chapter 1: The Philosophy of QNX Neutrino

Design goals

An embeddable POSIX OS?

Product scaling

Why POSIX for embedded systems?

Why QNX Neutrino for embedded systems?

Microkernel architecture

The OS as a team of processes

A true kernel

System processes

Interprocess communication

QNX Neutrino as a message-passing operating system

Network distribution of kernels

Single-computer model

Flexible networking

Chapter 2: The QNX Neutrino Microkernel

Introduction

The implementation of QNX Neutrino

POSIX realtime and thread extensions

System services

Threads and processes

Thread attributes

Thread scheduling

When scheduling decisions are made

Scheduling priority

Scheduling algorithms

IPC issues

Thread complexity issues

Synchronization services

Mutual exclusion locks

Condition variables

Barriers

Sleepon locks

Reader/writer locks

Semaphores

Synchronization via scheduling algorithm

Synchronization via message passing

Synchronization via atomic operations

Synchronization services implementation

QNX Neutrino IPC

Synchronous message passing

Message copying

Simple messages

Channels and connections

Message-passing API

Robust implementations with Send/Receive/Reply

Events

Signals

Summary of signals

POSIX message queues

Shared memory

Pipes and FIFOs

Clock and timer services

Time correction

Timers

Interrupt handling

Interrupt latency

Scheduling latency

Nested interrupts

Interrupt calls

Chapter 3: The Instrumented Microkernel

Introduction

Instrumentation at a glance

Event control

Modes of emission

Ring buffer

Data interpretation

System analysis with the IDE

Proactive tracing

Chapter 4: Multicore Processing

Introduction

Asymmetric multiprocessing (AMP)

Symmetric multiprocessing (SMP)

Neutrino’s microkernel approach

Booting an x86 SMP system

Booting a PowerPC or MIPS SMP system

How the SMP microkernel works

Critical sections

Bound multiprocessing (BMP)

A viable migration strategy

Choosing between AMP, SMP, and BMP

Chapter 5: Process Manager

Introduction

Process management

Process primitives

Process loading

Memory management

Memory Management Units (MMUs)

Memory protection at run time

Quality control

Full-protection model

Pathname management

Domains of authority

Resolving pathnames

Symbolic prefixes

File descriptor namespace

Chapter 6: Dynamic Linking

Shared objects

Statically linked

Dynamically linked

Augmenting code at runtime

How shared objects are used

ELF format

ELF without COFF

The process

Runtime linker

Loading a shared library at runtime

Symbol name resolution

Chapter 7: Resource Managers

Introduction

What is a resource manager?

Why write a resource manager?

The types of resource managers

Communication via native IPC

Resource manager architecture

Message types

The resource manager shared library

Summary

Chapter 8: Filesystems

Introduction

Filesystems and pathname resolution

Filesystem classes

Filesystems as shared libraries

io-blk

Filesystem limitations

Image filesystem

RAM “filesystem”

Embedded transaction filesystem (ETFS)

Inside a transaction

Types of storage media

Reliability features

QNX 4 filesystem

DOS Filesystem

CD-ROM filesystem

FFS3 filesystem

Customization

Organization

Features

Utilities

System calls

NFS filesystem

CIFS filesystem

Linux Ext2 filesystem

Virtual filesystems

Package filesystem

Inflator

Chapter 9: Character I/O

Introduction

Driver/io-char communication

Device control

Input modes

Device subsystem performance

Console devices

Terminal emulation

Serial devices

Parallel devices

Pseudo terminal devices (ptys)

Chapter 10: Networking Architecture

Introduction

Network manager (io-net)

Filter module

Converter module

Protocol module

Driver module

Loading and unloading a driver

Network DDK

Chapter 11: Native Networking (Qnet)

QNX Neutrino distributed

Name resolution and lookup

File descriptor (connection ID)

Behind a simple open()

Global Name Service (GNS)

Network naming

Quality of Service (QoS) and multiple paths

QoS policies

Specifying QoS policies

Symbolic links

Examples

Local networks

Remote networks

Custom device drivers

Chapter 12: TCP/IP Networking

Introduction

Stack configurations

Structure of the TCP/IP manager

Socket API

Database routines

Multiple stacks

SCTP

IP filtering and NAT

NTP

Dynamic host configuration

AutoIP

PPP over Ethernet

/etc/autoconnect

SNMP support

Embedded web server

CGI method

SSI method

Chapter 13: High Availability

What is High Availability?

An OS for HA

Custom hardware support

Client library

Recovery example

High Availability Manager

HAM and the Guardian

HAM hierarchy

Publishing autonomously detected conditions

Subscribing to autonomously published conditions

HAM as a “filesystem”

Multistage recovery

HAM API

Chapter 14: Power Management

Power management for embedded systems

Application-driven framework

Framework components

Libraries and BSPs

Power manager

Power-managed objects

Power-aware applications

Persistent storage services

CPU power management

Chapter 15: Adaptive Partitioning

What are partitions?

Why adaptive?

Benefits of adaptive partitioning

Engineering your product’s performance

Dealing with design complexity

Providing security

Debugging

Adaptive partitioning scheduler

Chapter 16: The Photon microGUI

A graphical microkernel

The Photon event space

Regions

Events

Graphics drivers

Multiple graphics drivers

Color model

Font support

Stroke-based fonts

Unicode multilingual support

UTF-8 encoding

Animation support

Multimedia support

Plugin architecture

Media Player

Printing support

The Photon Window Manager

Widget library

Fundamental widgets

Container widgets

Advanced widgets

Convenience functions

Driver development kits

Summary

Glossary

Index

List of Figures

Conventional executives offer no memory protection.

In a monolithic OS, system processes have no protection.

A microkernel provides complete memory protection.

The QNX Neutrino architecture.

The QNX Neutrino microkernel.

QNX Neutrino preemption details.

Sparse matrix (tid,key) to value mapping.

Possible thread states.

The ready queue.

Thread A blocks, Thread B runs.

FIFO scheduling.

Round-robin scheduling.

A thread’s budget is replenished periodically.

Thread drops in priority until its budget is replenished.

Thread oscillates between high and low priority.

Changes of state for a client thread in a send-receive-reply transaction.

Changes of state for a server thread in a send-receive-reply transaction.

A multipart transfer.

Scatter/gather of a read of 1454 bytes.

Connections map elegantly into file descriptors.

Pulses pack a small payload.

Threads should always send up to higher-level threads.

A higher-level thread can “send” a pulse event.

The client sends a sigevent to the server.

Signal delivery.

Arguments to mmap().

Interrupt handler simply terminates.

Interrupt handler terminates, returning an event.

Stacked interrupts.

Instrumentation at a glance.

The IDE helps you visualize system activity.

Virtual address mapping (on an x86).

Full protection VM (on an x86).

The SCOID and FD map to an OCB of an I/O Manager.

Two processes open the same file.

A process opens a file twice.

Object file format: linking view and execution view.

Process memory layout on an x86.

A resource manager is responsible for three data structures.

Multiple clients opening various devices.

Encapsulation.

QNX Neutrino filesystem layering.

ETFS is a filesystem composed entirely of transactions.

The io-char module is implemented as a library.

Device I/O in QNX Neutrino.

Conditions for satisfying an input request.

Pseudo-ttys.

The io-net process.

A simple network where the client and server reside on separate machines.

A client-server message pass across the network.

Qnet and a single network.

Qnet and physically separate networks.

The npm-tcpip suite and dependents.

Embedded web server.

The QNX PM framework facilitates application-specific designs.

Static partitions guarantee that processes get the resources specified by the system designer.

Without adaptive partitioning, a DOS attack on one process can starve other critical functions.

With adaptive partitioning, a DOS attack is contained.

Photon regions.

Opaque regions are clipped out.

About This Guide

Typographical conventions

Throughout this manual, we use certain typographical conventions to distinguish technical terms. In general, the conventions we use conform to those found in IEEE POSIX publications. The following table summarizes our conventions:

Reference                  Example
Code examples              if( stream == NULL )
Command options            -lR
Commands                   make
Environment variables      PATH
File and pathnames         /dev/null
Function names             exit()
Keyboard chords            Ctrl – Alt – Delete
Keyboard input             something you type
Keyboard keys              Enter
Program output             login:
Programming constants      NULL
Programming data types     unsigned short
Programming literals       0xFF, "message string"
Variable names             stdin
User-interface components  Cancel

We format single-step instructions like this:

➤ To reload the current page, press Ctrl – R.

We use an arrow (→) in directions for accessing menu items, like this:

You’ll find the Other... menu item under Perspective→Show View.

We use notes, cautions, and warnings to highlight important messages:

☞ Notes point out something important or useful.

CAUTION: Cautions tell you about commands or procedures that may have unwanted or undesirable side effects.

WARNING: Warnings tell you about commands or procedures that could be dangerous to your files, your hardware, or even yourself.

Note to Windows users

In our documentation, we use a forward slash (/) as a delimiter in all pathnames, including those pointing to Windows files.

We also generally follow POSIX/UNIX filesystem conventions.

What you’ll find in this guide

The System Architecture guide accompanies the QNX Neutrino realtime OS and is intended for both application developers and end-users.

The guide describes the philosophy of QNX Neutrino and the architecture used to robustly implement the OS. It covers message-passing services, followed by the details of the microkernel, the process manager, resource managers, the Photon microGUI, and other aspects of QNX Neutrino.

Note that certain features of the OS as described in this guide may still be under development for a given release.

For the latest news and information on any QNX product, visit our website (www.qnx.com). You’ll find links to many useful areas: software downloads, featured articles by developers, newsgroups, technical support options, and more.

The following table may help you find information quickly:

To find out about:                                                  Go to:
OS design goals; message-passing IPC                                The Philosophy of QNX Neutrino
System services                                                     The QNX Neutrino Microkernel
System event monitoring                                             The Instrumented Microkernel
Working on a system with more than one processor                    Multicore Processing
Memory management, pathname management, etc.                        Process Manager
Shared objects                                                      Dynamic Linking
Device drivers                                                      Resource Managers
Image, RAM, QNX 4, DOS, CD-ROM, Flash, NFS, CIFS, Ext2 filesystems  Filesystems
Serial and parallel devices                                         Character I/O
Network subsystem                                                   Networking Architecture
Native QNX Neutrino networking                                      Native Networking (Qnet)
TCP/IP implementation                                               TCP/IP Networking
Fault recovery                                                      High Availability
Power-aware systems                                                 Power Management
Sharing resources among competing processes                         Adaptive Partitioning
Graphical environment                                               The Photon microGUI
Terms used in QNX documentation                                     Glossary

Chapter 1

The Philosophy of QNX Neutrino

In this chapter...

Design goals

Why QNX Neutrino for embedded systems?

Microkernel architecture

Interprocess communication

Network distribution of kernels

Design goals

The primary goal of QNX Neutrino is to deliver the open systems POSIX API in a robust, scalable form suitable for a wide range of systems, from tiny, resource-constrained embedded systems to high-end distributed computing environments. The OS supports several processor families, including x86, ARM, XScale, PowerPC, MIPS, and SH-4.

For mission-critical applications, a robust architecture is also fundamental, so the OS makes flexible and complete use of MMU hardware.

Of course, simply setting out these goals doesn’t guarantee results. We invite you to read through this System Architecture guide to get a feel for our implementation approach and the design tradeoffs chosen to achieve these goals. When you reach the end of this guide, we think you’ll agree that QNX Neutrino is the first OS product of its kind to truly deliver open systems standards, wide scalability, and high reliability.

An embeddable POSIX OS?

According to a prevailing myth, if you scratch a POSIX operating system, you’ll find UNIX beneath the surface! A POSIX OS is therefore too large and unsuitable for embedded systems.

The fact, however, is that POSIX is not UNIX. Although the POSIX standards are rooted in existing UNIX practice, the POSIX working groups explicitly defined the standards in terms of “interface, not implementation.”

Thanks to the precise specification within the standards, as well as the availability of POSIX test suites, nontraditional OS architectures can provide a POSIX API without adopting the traditional UNIX kernel. Compare any two POSIX systems and they’ll look very much alike: they’ll have many of the same functions, utilities, etc. But when it comes to performance or reliability, they may be as different as night and day. Architecture makes the difference.

Despite its decidedly non-UNIX architecture, QNX Neutrino implements the standard POSIX API. By adopting a microkernel architecture, the OS delivers this API in a form easily scaled down for realtime embedded systems or incrementally scaled up as required.

Product scaling

Since you can readily scale a microkernel OS simply by including or omitting the particular processes that provide the functionality required, you can use a single microkernel OS for a much wider range of applications than a realtime executive.

Product development often takes the form of creating a “product line,” with successive models providing greater functionality. Rather than be forced to change operating systems for each version of the product, developers using a microkernel OS can easily scale the system as needed by adding filesystems, networking, graphical user interfaces, and other technologies.

Some of the advantages to this scalable approach include:

• portable application code (between product-line members)

• common tools used to develop the entire product line

• portable skill sets of development staff

• reduced time-to-market.

Why POSIX for embedded systems?

A common problem with realtime application development is that each realtime OS tends to come equipped with its own proprietary API. In the absence of industry standards, this isn’t an unusual state for a competitive marketplace to evolve into, since surveys of the realtime marketplace regularly show heavy use of in-house proprietary operating systems. POSIX represents a chance to unify this marketplace.

Among the many POSIX standards, those of most interest to embedded systems developers are:

• 1003.1: defines the API for process management, device I/O, filesystem I/O, and basic IPC. This encompasses what might be described as the base functionality of a UNIX OS, serving as a useful standard for many applications. From a C-language programming perspective, ANSI X3J11 C is assumed as a starting point, and then the various aspects of managing processes, files, and tty devices are detailed beyond what ANSI C specifies.

• Realtime Extensions: defines a set of realtime extensions to the base 1003.1 standard. These extensions consist of semaphores, prioritized process scheduling, realtime extensions to signals, high-resolution timer control, enhanced IPC primitives, synchronous and asynchronous I/O, and a recommendation for realtime contiguous file support.

• Threads: further extends the POSIX environment to include the creation and management of multiple threads of execution within a given address space.

• Additional Realtime Extensions: defines further extensions to the realtime standard. Facilities such as attaching interrupt handlers are described.

• Application Environment Profiles: defines several AEPs (Realtime AEP, Embedded Systems AEP, etc.) of the POSIX environment to suit different embedded capability sets. These profiles represent embedded OSs with/without filesystems and other capabilities.

For an up-to-date status of the many POSIX drafts/standards documents, see the PASC (Portable Applications Standards Committee of the IEEE Computer Society) report at http://pasc.opengroup.org/standing/sd11.html.
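
As a minimal sketch of what the Threads standard listed above provides, the following C fragment starts several threads inside one address space and has them update a shared counter under a mutex. The names run_workers, worker, and shared_counter, and the limits of 16 threads and 1000 increments per thread, are illustrative assumptions rather than anything defined by this guide; only the pthread_*() calls come from POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16
#define INCREMENTS_PER_WORKER 1000

/* State shared by every thread in the process's single address space. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

/* Thread entry point: add to the shared counter, one locked step at a time. */
static void *worker(void *arg)
{
    struct shared_counter *c = arg;

    assert(c != NULL);
    for (int i = 0; i < INCREMENTS_PER_WORKER; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Start n threads, wait for all of them, and return the final count.
 * Returns -1 if n is out of range or a thread could not be created. */
long run_workers(int n)
{
    pthread_t tid[MAX_WORKERS];
    struct shared_counter c;
    int started = 0;
    long result;

    if (n < 1 || n > MAX_WORKERS)
        return -1;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;
    c.value = 0;

    while (started < n &&
           pthread_create(&tid[started], NULL, worker, &c) == 0)
        started++;

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    result = (started == n) ? c.value : -1;
    pthread_mutex_destroy(&c.lock);
    return result;
}
```

Calling run_workers(4) should return 4000, because the mutex serializes every increment. Since the sketch uses only standard POSIX calls, the same source is expected to build on any conforming system that supports threads.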

Apart from any “bandwagon” motive for adopting industry standards, there are several specific advantages to applying the POSIX standard to the embedded realtime marketplace.

Multiple OS sources

Hardware manufacturers are loath to choose a single-sourced hardware component because of the risks implied if that source discontinues production. For the same reason, manufacturers shouldn’t be tied to a single-sourced, proprietary OS simply because their application source code isn’t portable to other OSs.

By building applications to the POSIX standards, developers can use OSs from multiple vendors. Application source code can be readily ported from platform to platform and from OS to OS, provided that developers avoid using OS-specific extensions.
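
One way to stay within the portable subset is to ask the system which POSIX revision and options it provides, rather than assuming them. The sketch below does this with the standard sysconf() call. The function names are hypothetical, and the 199506L threshold (the revision that incorporated the threads extensions) is an assumption chosen purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 interfaces */
#include <assert.h>
#include <unistd.h>

/* POSIX revision reported by the running system (for example 200809L),
 * or -1 if it cannot be determined. */
long posix_version(void)
{
    return sysconf(_SC_VERSION);
}

/* Nonzero if the option named by an _SC_* constant is available. */
int posix_option_available(int sc_name)
{
    return sysconf(sc_name) > 0;
}

/* Fail early in debug builds if the assumed baseline is missing. */
void require_posix_baseline(void)
{
    assert(posix_version() >= 199506L);          /* revision with threads */
    assert(posix_option_available(_SC_THREADS));
}
```

A program that calls require_posix_baseline() at startup and otherwise restricts itself to interfaces declared under _POSIX_C_SOURCE is less likely to pick up an OS-specific extension by accident.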

Portability of development staff

Using a common API for embedded development, programmers experienced with one realtime OS can directly apply their skill sets to other projects involving other processors and operating systems. In addition, programmers with UNIX or POSIX experience can easily work on embedded realtime systems, since the nonrealtime portion of the realtime OS’s API is already familiar territory.

Development environment: native and cross development

With the addition of interface hardware similar to the target runtime system, a workstation running a POSIX OS can become a functional superset of the embedded system. As a result, the application can be conveniently developed on the self-hosted desktop system.

Even in a cross-hosted development environment, the API remains essentially the same. Regardless of the particular host (QNX Neutrino, Linux, Windows,...) or the target (x86, ARM, MIPS, PowerPC,...), the programmer doesn’t need to worry about platform-specific endian, alignment, or I/O issues.

Why QNX Neutrino for embedded systems?

The main responsibility of an operating system is to manage a computer’s resources. All activities in the system (scheduling application programs, writing files to disk, sending data across a network, and so on) should function together as seamlessly and transparently as possible.

Some environments call for more rigorous resource management and scheduling than others. Realtime applications, for instance, depend on the OS to handle multiple events and to ensure that the system responds to those events within predictable time limits. The more responsive the OS, the more “time” a realtime application has to meet its deadlines.

QNX Neutrino is ideal for embedded realtime applications. It can be scaled to very small sizes and provides multitasking, threads, priority-driven preemptive scheduling, and fast context-switching, all essential ingredients of an embedded realtime system. Moreover, the OS delivers these capabilities with a POSIX-standard API; there’s no need to forgo standards in order to achieve a small system.
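
A POSIX-standard API exposes priority-driven scheduling through a handful of standard calls. The following sketch queries the priority range of a scheduling policy and prepares thread attributes that request FIFO scheduling. The helper names priority_range and init_fifo_attr are illustrative assumptions, and the numeric range returned is system-specific.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Report the valid priority range for a scheduling policy such as
 * SCHED_FIFO or SCHED_RR. Returns 0 on success, -1 on failure. */
int priority_range(int policy, int *lowest, int *highest)
{
    assert(lowest != NULL && highest != NULL);

    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1;
    *lowest = lo;
    *highest = hi;
    return 0;
}

/* Prepare thread attributes that request FIFO scheduling at the given
 * priority. Returns 0 on success or an error number on failure. */
int init_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int rc = pthread_attr_init(attr);

    if (rc != 0)
        return rc;
    /* Use the settings below instead of inheriting the creator's policy. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(attr, &param);
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

Passing the prepared attributes to pthread_create() asks for a FIFO-scheduled thread. On some systems that request needs extra privileges, so a real application should check the value pthread_create() returns.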

QNX Neutrino is also remarkably flexible. Developers can easily customize the OS to meet the needs of their applications. From a “bare-bones” configuration of a microkernel with a few small modules to a full-blown network-wide system equipped to serve hundreds of users, you’re free to set up your system to use only those resources you require to tackle the job at hand.

QNX Neutrino achieves its unique degree of efficiency, modularity, and simplicity through two fundamental principles:

• microkernel architecture

• message-based interprocess communication

Microkernel architecture

Buzzwords often fall in and out of fashion. Vendors tend to enthusiastically apply the buzzwords of the day to their products, whether the terms actually fit or not.

The term “microkernel” has become fashionable. Although many new operating systems are said to be “microkernels” (or even “nanokernels”), the term may not mean very much without a clear definition.

Let’s try to define the term. A microkernel OS is structured as a tiny kernel that provides the minimal services used by a team of optional cooperating processes, which in turn provide the higher-level OS functionality. The microkernel itself lacks filesystems and many other services normally expected of an OS; those services are provided by optional processes.

The real goal in designing a microkernel OS is not simply to “make it small.” A microkernel OS embodies a fundamental change in the approach to delivering OS functionality. Modularity is the key, size is but a side effect. To call any kernel a “microkernel” simply because it happens to be small would miss the point entirely.

Since the IPC services provided by the microkernel are used to “glue” the OS itself together, the performance and flexibility of those services govern the performance of the resulting OS. With the exception of those IPC services, a microkernel is roughly comparable to a realtime executive, both in terms of the services provided and in their realtime performance.

The microkernel differs from an executive in how the IPC services are used to extend the functionality of the kernel with additional, service-providing processes. Since the OS is implemented as a team of cooperating processes managed by the microkernel, user-written processes can serve both as applications and as processes that extend the underlying OS functionality for industry-specific applications. The OS itself becomes “open” and easily extensible. Moreover, user-written extensions to the OS won’t affect the fundamental reliability of the core OS.
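
The idea of a service-providing process can be illustrated without any QNX-specific calls. The sketch below is only an analogy built from portable POSIX primitives (fork() and socketpair()), not the native message-passing API that the microkernel provides: a client sends a request to a separate server process and blocks until the reply arrives. All function names and the ack: reply format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read until end-of-stream or until the buffer is full; NUL-terminate. */
static ssize_t read_text(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        used += (size_t)n;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}

/* Server side: receive one request, build a reply, send it back. */
static void serve_one_request(int fd)
{
    char req[64], rep[80];

    if (read_text(fd, req, sizeof req) < 0)
        return;
    snprintf(rep, sizeof rep, "ack:%s", req);
    if (write(fd, rep, strlen(rep)) < 0)
        return;
}

/* Client side: send a request and block until the reply arrives.
 * Returns 0 on success, -1 on failure. */
int ask_service(const char *request, char *reply, size_t cap)
{
    int sv[2];
    int rc = -1;

    assert(request != NULL && reply != NULL && cap > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: the service-providing process */
        close(sv[0]);
        serve_one_request(sv[1]);
        _exit(0);
    }

    close(sv[1]);                   /* parent: the client */
    if (write(sv[0], request, strlen(request)) >= 0) {
        shutdown(sv[0], SHUT_WR);   /* tell the server the request is complete */
        if (read_text(sv[0], reply, cap) >= 0)
            rc = 0;
    }
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return rc;
}
```

Because the server runs as its own process, a fault inside serve_one_request() would end that process without corrupting the client, which is the property the paragraph above attributes to user-written OS extensions.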

A difficulty for many realtime executives implementing the POSIX 1003.1 standard is that their runtime environment is typically a single-process, multiple-threaded model, with unprotected memory between threads. Such an environment is only a subset of the multi-process model that POSIX assumes; it cannot support the fork() function. In contrast, QNX Neutrino fully utilizes an MMU to deliver the complete POSIX process model in a protected environment.

As the following diagrams show, a true microkernel offers complete memory protection, not only for user applications, but also for OS components (device drivers, filesystems, etc.):

[Figure: Conventional executives offer no memory protection. Labels: Kernel space; Applications; Device drivers; Filesystem; TCP/IP stack.]

[Figure: In a monolithic OS, system processes have no protection. Labels: User space; Applications; Kernel space; Filesystem; TCP/IP stack; Device drivers.]

[Figure: A microkernel provides complete memory protection. Labels: User space; Applications; Device drivers; Filesystem; TCP/IP stack; Microkernel.]

The first version of the QNX OS was shipped in 1981. With each successive product revision, we have applied the experience from previous product generations to the latest incarnation: QNX Neutrino, our most capable, scalable OS to date. We believe that this time-tested experience is what enables the QNX Neutrino OS to deliver the functionality it does using the limited resources it consumes.

The OS as a team of processes

The QNX Neutrino OS consists of a small microkernel managing a group of cooperating processes. As the following illustration shows, the structure looks more like a team than a hierarchy, as several “players” of equal rank interact with each other through the coordinating kernel.

[Figure: The QNX Neutrino architecture. Labels: Software bus; Neutrino microkernel; Process manager; QNX 4 file manager; DOS file manager; Flash file manager; CD-ROM file manager; NFS file manager; CIFS file manager; Photon GUI manager; Character manager; Mqueue manager; Qnet network manager; TCP/IP manager; Application.]

QNX Neutrino acts as a kind of “software bus” that lets you dynamically plug in/out OS modules whenever they’re needed.

A true kernel

The kernel is the heart of any operating system. In some systems, the “kernel” comprises so many functions that for all intents and purposes it is the entire operating system!

But our microkernel is truly a kernel. First of all, like the kernel of a realtime executive, it’s very small. Secondly, it’s dedicated to only a few fundamental services:

• thread services via POSIX thread-creation primitives

• signal services via POSIX signal primitives

• message-passing services: the microkernel handles the routing of all messages between all threads throughout the entire system

• synchronization services via POSIX thread-synchronization primitives

• scheduling services: the microkernel schedules threads for execution using the various POSIX realtime scheduling algorithms

• timer services: the microkernel provides the rich set of POSIX timer services

• process management services: the microkernel and the process manager together form a unit (called procnto). The process manager portion is responsible for managing processes, memory, and the pathname space.

Unlike threads, the microkernel itself is never scheduled for execution. The processor executes code in the microkernel only as the result of an explicit kernel call, an exception, or in response to a hardware interrupt.

System processes

All OS services, except those provided by the mandatory microkernel/process manager module (procnto), are handled via standard processes. A richly configured system could include the following:

• filesystem managers

• character device managers

• graphical user interface (Photon)

• native network manager

• TCP/IP

System processes vs user-written processes

System processes are essentially indistinguishable from any user-written program: they use the same public API and kernel services available to any (suitably privileged) user process.

It is this architecture that gives QNX Neutrino unparalleled extensibility. Since most OS services are provided by standard system processes, it’s very simple to augment the OS itself: just write new programs to provide new OS services.

In fact, the boundary between the operating system and the application can become very blurred. The only real difference between system services and applications is that OS services manage resources for clients.

Suppose you’ve written a database server. How should such a process be classified?

Just as a filesystem accepts requests (via messages) to open files and read or write data, so too would a database server. While the requests to the database server may be more sophisticated, both servers are very much the same in that they provide an API (implemented by messages) that clients use to access a resource. Both are independent processes that can be written by an end-user and started and stopped on an as-needed basis.

A database server might be considered a system process at one installation, and an application at another. It really doesn’t matter! The important point is that the OS allows such processes to be implemented cleanly, with no need for modifications to the standard components of the OS itself. For developers creating custom embedded systems, this provides the flexibility to extend the OS in directions that are uniquely useful to their applications, without needing access to OS source code.

Device drivers

Device drivers allow the OS and application programs to make use of the underlying hardware in a generic way (e.g. a disk drive, a network interface). While most OSs require device drivers to be tightly bound into the OS itself, device drivers for QNX Neutrino can be started and stopped as standard processes. As a result, adding device drivers doesn’t affect any other part of the OS; drivers can be developed and debugged like any other application.

Interprocess communication

When several threads run concurrently, as in typical realtime multitasking environments, the OS must provide mechanisms to allow them to communicate with each other.

Interprocess communication (IPC) is the key to designing an application as a set of cooperating processes in which each process handles one well-defined part of the whole.

The OS provides a simple but powerful set of IPC capabilities that greatly simplify the job of developing applications made up of cooperating processes.

QNX Neutrino as a message-passing operating system

QNX was the first commercial operating system of its kind to make use of message passing as the fundamental means of IPC. The OS owes much of its power, simplicity, and elegance to the complete integration of the message-passing method throughout the entire system.

In QNX Neutrino, a message is a parcel of bytes passed from one process to another. The OS attaches no special meaning to the content of a message: the data in a message has meaning for the sender of the message and for its receiver, but for no one else.

Message passing not only allows processes to pass data to each other, but also provides a means of synchronizing the execution of several processes. As they send, receive, and reply to messages, processes undergo various “changes of state” that affect when, and for how long, they may run. Knowing their states and priorities, the microkernel can schedule all processes as efficiently as possible to make the most of available CPU resources. This single, consistent method (message passing) is thus constantly operative throughout the entire system.

Realtime and other mission-critical applications generally require a dependable form of IPC, because the processes that make up such applications are so strongly interrelated. The discipline imposed by QNX Neutrino’s message-passing design helps bring order and greater reliability to applications.
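The way a send/receive/reply exchange both moves data and synchronizes two threads can be illustrated with a portable simulation. The following is only a sketch built on POSIX mutexes and condition variables; it is not the QNX Neutrino kernel interface, and the names channel_init(), channel_send(), channel_receive(), channel_reply(), and demo_round_trip() are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-slot rendezvous channel; NOT the QNX Neutrino API. */
enum slot_state { SLOT_EMPTY, SLOT_SENT, SLOT_RECEIVED, SLOT_REPLIED };

struct channel {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    enum slot_state state;
    int request;
    int reply;
};

void channel_init(struct channel *ch)
{
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->changed, NULL);
    ch->state = SLOT_EMPTY;
}

/* Client side: deposit a request, then block until the server has replied. */
int channel_send(struct channel *ch, int request)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->state != SLOT_EMPTY)          /* wait for a free slot */
        pthread_cond_wait(&ch->changed, &ch->lock);
    ch->request = request;
    ch->state = SLOT_SENT;
    pthread_cond_broadcast(&ch->changed);
    while (ch->state != SLOT_REPLIED)        /* blocked until the reply arrives */
        pthread_cond_wait(&ch->changed, &ch->lock);
    int reply = ch->reply;
    ch->state = SLOT_EMPTY;                  /* release the slot */
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
    return reply;
}

/* Server side: block until a request has been sent, then take it. */
int channel_receive(struct channel *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->state != SLOT_SENT)
        pthread_cond_wait(&ch->changed, &ch->lock);
    int request = ch->request;
    ch->state = SLOT_RECEIVED;
    pthread_mutex_unlock(&ch->lock);
    return request;
}

/* Server side: hand the answer back and wake the blocked client. */
void channel_reply(struct channel *ch, int reply)
{
    pthread_mutex_lock(&ch->lock);
    assert(ch->state == SLOT_RECEIVED);
    ch->reply = reply;
    ch->state = SLOT_REPLIED;
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
}

/* Example server: receives one request and replies with twice its value. */
static void *doubling_server(void *arg)
{
    struct channel *ch = arg;
    int request = channel_receive(ch);
    channel_reply(ch, request * 2);
    return NULL;
}

/* Runs one complete exchange; returns the reply, or -1 if no thread started. */
int demo_round_trip(int value)
{
    struct channel ch;
    pthread_t server;
    channel_init(&ch);
    if (pthread_create(&server, NULL, doubling_server, &ch) != 0)
        return -1;
    int reply = channel_send(&ch, value);
    pthread_join(server, NULL);
    pthread_cond_destroy(&ch.changed);
    pthread_mutex_destroy(&ch.lock);
    return reply;
}
```

In this sketch the client cannot run again until the server has both received the request and replied to it, so the exchange itself orders the two threads with no additional lock in the application code.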

Network distribution of kernels

In its simplest form, local area networking provides a mechanism for sharing files and peripheral devices among several interconnected computers. QNX Neutrino goes far beyond this simple concept and integrates the entire network into a single, homogeneous set of resources.

Any thread on any machine in the network can directly make use of any resource on any other machine. From the application’s perspective, there’s no difference between a local and a remote resource; no special facilities need to be built into applications to allow them to make use of remote resources.

Users may access files anywhere on the network, take advantage of any peripheral device, and run applications on any machine on the network (provided they have the appropriate authority). Processes can communicate in the same manner anywhere throughout the entire network. Again, the OS’s all-pervasive message-passing IPC accounts for such fluid, transparent networking.

Single-computer model

QNX Neutrino is designed from the ground up as a network-wide operating system. In some ways, a native QNX Neutrino network feels more like a mainframe computer than a set of individual micros. Users are simply aware of a large set of resources available for use by any application. But unlike a mainframe, QNX Neutrino provides a highly responsive environment, since the appropriate amount of computing power can be made available at each node to meet the needs of each user.

In a mission-critical environment, for example, applications that control realtime I/O devices may require more performance than other, less critical, applications, such as a web browser. The network is responsive enough to support both types of applications at the same time: the OS lets you focus computing power on the devices in your hard realtime system where and when it’s needed, without sacrificing concurrent connectivity to the desktop. Moreover, critical aspects of realtime computing, such as priority inheritance, function seamlessly across a QNX Neutrino network, regardless of the physical media employed (switch fabric, serial, etc.).

Flexible networking

QNX Neutrino networks can be put together using various hardware and industry-standard protocols. Since these are completely transparent to application programs and users, new network architectures can be introduced at any time without disturbing the OS.

Each node in the network is assigned a unique name that becomes its identifier. This name is the only visible means to determine whether the OS is running as a network or as a standalone operating system.

This degree of transparency is yet another example of the distinctive power of QNX Neutrino’s message-passing architecture. In many systems, important functions such as networking, IPC, or even message passing are built on top of the OS, rather than integrated directly into its core. The result is often an awkward, inefficient “double standard” interface, whereby communication between processes is one thing, while penetrating the private interface of a mysterious monolithic kernel is another matter altogether.

In contrast to monolithic systems, QNX Neutrino is grounded on the principle that effective communication is the key to effective operation. Message passing thus forms the cornerstone of our microkernel architecture and enhances the efficiency of all transactions among all processes throughout the entire system, whether across a PC backplane or across a mile of twisted pair.

Chapter 2

The QNX Neutrino Microkernel

In this chapter...

Introduction
The implementation of QNX Neutrino
System services
Threads and processes
Thread scheduling
Synchronization services
QNX Neutrino IPC
Clock and timer services
Interrupt handling

Introduction

The QNX Neutrino microkernel, procnto, implements the core POSIX features used in embedded realtime systems, along with the fundamental QNX Neutrino message-passing services. The POSIX features that aren’t implemented in the microkernel (file and device I/O, for example) are provided by optional processes and shared libraries.

Successive QNX microkernels have seen a reduction in the code required to implement a given kernel call. The object definitions at the lowest layer in the kernel code have become more specific, allowing greater code reuse (such as folding various forms of POSIX signals, realtime signals, and QNX pulses into common data structures and code to manipulate those structures).

At its lowest level, the microkernel contains a few fundamental objects and the highly tuned routines that manipulate them. The OS is built from this foundation.

[Figure: The QNX Neutrino microkernel. Column headings: Interface; Microkernel; Objects. Labels: Thread; Sched; Synch; Signal; Message; Channel; Clock; Timer; Interrupt; Dispatch; Vector; Connection; Pulse.]

Some developers have assumed that our microkernel is implemented entirely in assembly code for size or performance reasons. In fact, our implementation is coded primarily in C; size and performance goals are achieved through successively refined algorithms and data structures, rather than via assembly-level peep-hole optimizations.

The implementation of QNX Neutrino

Historically, the “application pressure” on QNX operating systems has been from both ends of the computing spectrum, from memory-limited embedded systems all the way up to high-end SMP (symmetrical multiprocessing) machines with gigabytes of physical memory. Accordingly, the design goals for QNX Neutrino accommodate both seemingly exclusive sets of functionality. Pursuing these goals is intended to extend the reach of systems well beyond what other OS implementations could address.

POSIX realtime and thread extensions

Since QNX Neutrino implements the majority of the realtime and thread services directly in the microkernel, these services are available even without the presence of additional OS modules.

In addition, some of the profiles defined by POSIX suggest that these services be present without necessarily requiring a process model. In order to accommodate this, the OS provides direct support for threads, but relies on its process manager portion to extend this functionality to processes containing multiple threads.

Note that many realtime executives and kernels provide only a nonmemory-protected threaded model, with no process model and/or protected memory model at all. Without a process model, full POSIX compliance cannot be achieved.

System services

The QNX Neutrino microkernel has kernel calls to support the following:

• threads

• message passing

• signals

• clocks

• timers

• interrupt handlers

• semaphores

• mutual exclusion locks (mutexes)

• condition variables (condvars)

• barriers.

The entire OS is built upon these calls. The OS is fully preemptible, even while passing messages between processes; it resumes the message pass where it left off before preemption.

The minimal complexity of the microkernel helps place an upper bound on the longest nonpreemptible code path through the kernel, while the small code size makes addressing complex multiprocessor issues a tractable problem. Services were chosen for inclusion in the microkernel on the basis of having a short execution path. Operations requiring significant work (e.g. process loading) were assigned to external processes/threads, where the effort to enter the context of that thread would be insignificant compared to the work done within the thread to service the request.

Rigorous application of this rule to dividing the functionality between the kernel and external processes destroys the myth that a microkernel OS must incur higher runtime overhead than a monolithic kernel OS. Given the work done between context switches (implicit in a message pass), and the very quick context-switch times that result from the simplified kernel, the time spent performing context switches becomes “lost in the noise” of the work done to service the requests communicated by the message passing between the processes that make up the OS.

The following diagram shows the preemption details for the non-SMP kernel (x86 implementation).

[Figure: QNX Neutrino preemption details. Labels: Entry (SYSCALL); Exit (SYSEXIT); Interrupts off; Interrupts on, no preemption; Interrupts on, full preemption; Lockdown; Kernel operations, which may include a message pass; interval durations marked “usecs” and “usecs to msecs”.]

Interrupts are disabled, or preemption is held off, for only very brief intervals (typically on the order of hundreds of nanoseconds).

Threads and processes

When building an application (realtime, embedded, graphical, or otherwise), the developer may want several algorithms within the application to execute concurrently. This concurrency is achieved by using the POSIX thread model, which defines a process as containing one or more threads of execution.

A thread can be thought of as the minimum “unit of execution,” the unit of scheduling and execution in the microkernel. A process, on the other hand, can be thought of as a “container” for threads, defining the “address space” within which threads will execute. A process will always contain at least one thread.

Depending on the nature of the application, threads might execute independently with no need to communicate between the algorithms (unlikely), or they may need to be tightly coupled, with high-bandwidth communications and tight synchronization. To assist in this communication and synchronization, QNX Neutrino provides a rich variety of IPC and synchronization services.

The following pthread_*() (POSIX Threads) library calls don’t involve any microkernel thread calls:

pthread_attr_destroy()
pthread_attr_getdetachstate()
pthread_attr_getinheritsched()
pthread_attr_getschedparam()
pthread_attr_getschedpolicy()
pthread_attr_getscope()
pthread_attr_getstackaddr()
pthread_attr_getstacksize()
pthread_attr_init()
pthread_attr_setdetachstate()
pthread_attr_setinheritsched()
pthread_attr_setschedparam()
pthread_attr_setschedpolicy()
pthread_attr_setscope()
pthread_attr_setstackaddr()
pthread_attr_setstacksize()
pthread_cleanup_pop()
pthread_cleanup_push()
pthread_equal()
pthread_getspecific()
pthread_setspecific()
pthread_testcancel()
pthread_key_create()
pthread_key_delete()
pthread_once()
pthread_self()
pthread_setcancelstate()
pthread_setcanceltype()
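The attribute calls in this list operate only on a pthread_attr_t object held in the caller’s own memory, which is consistent with their needing no thread-related kernel call. The following is a minimal sketch using standard POSIX calls; the helper names make_detached_attr() and attr_is_detached() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: fill in an attribute object that describes a
 * detached thread with a caller-chosen stack size. Returns 0 on success,
 * or the error number from the first call that fails. */
int make_detached_attr(pthread_attr_t *attr, size_t stack_size)
{
    assert(attr != NULL);

    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_attr_setstacksize(attr, stack_size);

    if (rc != 0)
        pthread_attr_destroy(attr);   /* don't leave a half-built object */
    return rc;
}

/* Hypothetical helper: read the detach state back out of the object. */
int attr_is_detached(const pthread_attr_t *attr)
{
    int state = PTHREAD_CREATE_JOINABLE;
    pthread_attr_getdetachstate(attr, &state);
    return state == PTHREAD_CREATE_DETACHED;
}
```

Nothing here creates a thread; the object only takes effect when it is later passed to pthread_create(), and it should be released with pthread_attr_destroy() once it is no longer needed.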

The following table lists the POSIX thread calls that have a corresponding microkernel thread call, allowing you to choose either interface:

POSIX call                Microkernel call     Description
pthread_create()          ThreadCreate()       Create a new thread of execution.
pthread_exit()            ThreadDestroy()      Destroy a thread.
pthread_detach()          ThreadDetach()       Detach a thread so it doesn’t need to be joined.
pthread_join()            ThreadJoin()         Join a thread waiting for its exit status.
pthread_cancel()          ThreadCancel()       Cancel a thread at the next cancellation point.
N/A                       ThreadCtl()          Change a thread’s Neutrino-specific thread characteristics.
pthread_mutex_init()      SyncTypeCreate()     Create a mutex.
pthread_mutex_destroy()   SyncDestroy()        Destroy a mutex.
pthread_mutex_lock()      SyncMutexLock()      Lock a mutex.
pthread_mutex_trylock()   SyncMutexLock()      Conditionally lock a mutex.
pthread_mutex_unlock()    SyncMutexUnlock()    Unlock a mutex.
pthread_cond_init()       SyncTypeCreate()     Create a condition variable.
pthread_cond_destroy()    SyncDestroy()        Destroy a condition variable.
pthread_cond_wait()       SyncCondvarWait()    Wait on a condition variable.
pthread_cond_signal()     SyncCondvarSignal()  Signal a condition variable.
pthread_cond_broadcast()  SyncCondvarSignal()  Broadcast a condition variable.
pthread_getschedparam()   SchedGet()           Get scheduling parameters and policy of thread.
pthread_setschedparam()   SchedSet()           Set scheduling parameters and policy of thread.
pthread_sigmask()         SignalProcmask()     Examine or set a thread’s signal mask.
pthread_kill()            SignalKill()         Send a signal to a specific thread.
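As an illustration of the POSIX column of the table above, the following sketch creates several threads, has each one update a shared counter under a mutex, and then joins them. It uses only standard POSIX calls; the names worker(), run_workers(), NUM_WORKERS, and INCREMENTS are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4       /* illustrative values only */
#define INCREMENTS  10000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker adds INCREMENTS to the shared counter, one step at a time,
 * taking the mutex around every update. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for all that started, and returns the total. */
long run_workers(void)
{
    pthread_t tids[NUM_WORKERS];
    int started = 0;

    counter = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return counter;   /* safe to read: every worker has been joined */
}
```

Because every increment happens with the mutex held, the result is the same no matter how the threads are interleaved by the scheduler.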

The OS can be configured to provide a mix of threads and processes (as defined by POSIX). Each process is MMU-protected from every other process, and each process may contain one or more threads that share the process’s address space.

The environment you choose affects not only the concurrency capabilities of the application, but also the IPC and synchronization services the application might make use of.

Even though the common term “IPC” refers to communicating processes, we use it here to describe the communication between threads, whether they’re within the same process or separate processes.

Thread attributes

Although threads within a process share everything within the process’s address space, each thread still has some “private” data. In some cases, this private data is protected within the kernel (e.g. the tid or thread ID), while other private data resides unprotected in the process’s address space (e.g. each thread has a stack for its own use). Some of the more noteworthy thread-private resources are:

tid Each thread is identified by an integer thread ID, starting at 1. The tid is unique within the thread’s process.

register set Each thread has its own instruction pointer (IP), stack pointer (SP), and other processor-specific register context.

stack Each thread executes on its own stack, stored within the address space of its process.

signal mask Each thread has its own signal mask.

thread local storage

A thread has a system-defined data area called “thread local storage” (TLS). The TLS is used to store “per-thread” information (such as tid, pid, stack base, errno, and thread-specific key/data bindings). The TLS doesn’t need to be accessed directly by a user application. A thread can have user-defined data associated with a thread-specific data key.

cancellation handlers

Callback functions that are executed when the thread terminates.

Thread-specific data, implemented in the pthread library and stored in the TLS, provides a mechanism for associating a process-global integer key with a unique per-thread data value. To use thread-specific data, you first create a new key and then bind a unique data value to the key (per thread). The data value may, for example, be an integer or a pointer to a dynamically allocated data structure. Subsequently, the key can return the bound data value per thread.

A typical application of thread-specific data is for a thread-safe function that needs to maintain a context for each calling thread.

[Figure: Sparse matrix (tid,key) to value mapping. Axes: key; tid.]

You use the following functions to create and manipulate this data:

Function                Description
pthread_key_create()    Create a data key with destructor function
pthread_key_delete()    Destroy a data key
pthread_setspecific()   Bind a data value to a data key
pthread_getspecific()   Return the data value bound to a data key
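A thread-safe function that keeps a context for each calling thread might be sketched as follows. This is only an illustration built from the standard POSIX calls listed above; next_call_number(), calls_seen_by_new_thread(), and the other names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

/* Runs exactly once per process: create the key and register free() as
 * the destructor applied to a thread's value when that thread exits. */
static void make_key(void)
{
    int rc = pthread_key_create(&ctx_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Returns how many times the CALLING thread has invoked this function.
 * The counter is per-thread data bound to ctx_key, so no lock is needed.
 * Returns -1 if the counter cannot be allocated. */
int next_call_number(void)
{
    pthread_once(&ctx_once, make_key);

    int *count = pthread_getspecific(ctx_key);
    if (count == NULL) {                       /* first call on this thread */
        count = calloc(1, sizeof *count);
        if (count == NULL)
            return -1;
        pthread_setspecific(ctx_key, count);
    }
    return ++*count;
}

/* Thread body: call the function twice and report the second result. */
static void *call_twice(void *arg)
{
    int *last = arg;
    next_call_number();
    *last = next_call_number();
    return NULL;
}

/* Shows that a new thread starts with its own, independent counter. */
int calls_seen_by_new_thread(void)
{
    pthread_t t;
    int last = 0;
    if (pthread_create(&t, NULL, call_twice, &last) != 0)
        return -1;
    pthread_join(t, NULL);
    return last;
}
```

Each thread sees only the value it bound to the key itself, which corresponds to one row of the sparse (tid,key) mapping described above.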

Thread life cycle

The number of threads within a process can vary widely, with threads being created and destroyed dynamically. Thread creation (pthread_create()) involves allocating and initializing the necessary resources within the process’s address space (e.g. thread stack) and starting the execution of the thread at some function in the address space.

Thread termination (pthread_exit(), pthread_cancel()) involves stopping the thread and reclaiming the thread’s resources. As a thread executes, its state can generally be described as either “ready” or “blocked.” More specifically, it can be one of the following:

[Figure: Possible thread states. States shown: CONDVAR; DEAD; INTERRUPT; JOIN; MUTEX; NANOSLEEP; NET_REPLY; NET_SEND; READY; RECEIVE; REPLY; RUNNING; SEM; SEND; SIGSUSPEND; SIGWAITINFO; STACK; STOPPED; WAITCTX; WAITPAGE; WAITTHREAD.]

CONDVAR The thread is blocked on a condition variable (e.g.it calledpthread cond wait()).

DEAD The thread has terminated and is waiting for a joinby another thread.

INTERRUPT The thread is blocked waiting for an interrupt (i.e.it calledInterruptWait()).

November 2, 2006 Chapter 2 � The QNX Neutrino Microkernel 29

Page 52: QNX Neutrino RTOSbetas.qnx.com/download/download/14695/sys_arch.pdf · QNX, Neutrino, Photon, Photon microGUI, Momentics, ... 14 Power Management 301 Power management for embedded

Threads and processes 2006, QNX Software Systems GmbH & Co. KG.

JOIN The thread is blocked waiting to join anotherthread (e.g. it calledpthread join()).

MUTEX The thread is blocked on a mutual exclusion lock(e.g. it calledpthread mutex lock()).

NANOSLEEP The thread is sleeping for a short time interval(e.g. it callednanosleep()).

NET REPLY The thread is waiting for a reply to be deliveredacross the network (i.e. it calledMsgReply*()).

NET SEND The thread is waiting for a pulse or signal to bedelivered across the network (i.e. it calledMsgSendPulse(), MsgDeliverEvent(), orSignalKill()).

READY The thread is waiting to be executed while theprocessor executes another thread of equal orhigher priority.

RECEIVE The thread is blocked on a message receive (e.g. itcalledMsgReceive()).

REPLY The thread is blocked on a message reply (i.e. itcalledMsgSend(), and the server received themessage).

RUNNING The thread is being executed by a processor.

SEM The thread is waiting for a semaphore to be posted(i.e. it calledSyncSemWait()).

SEND The thread is blocked on a message send (e.g. itcalledMsgSend(), but the server hasn’t yetreceived the message).

SIGSUSPEND The thread is blocked waiting for a signal (i.e. itcalledsigsuspend()).

SIGWAITINFO The thread is blocked waiting for a signal (i.e. itcalledsigwaitinfo()).

30 Chapter 2 � The QNX Neutrino Microkernel November 2, 2006

Page 53: QNX Neutrino RTOSbetas.qnx.com/download/download/14695/sys_arch.pdf · QNX, Neutrino, Photon, Photon microGUI, Momentics, ... 14 Power Management 301 Power management for embedded

2006, QNX Software Systems GmbH & Co. KG. Thread scheduling

STACK The thread is waiting for the virtual address space to be allocated for the thread’s stack (parent will have called ThreadCreate()).

STOPPED The thread is blocked waiting for a SIGCONT signal.

WAITCTX The thread is waiting for a noninteger (e.g. floating point) context to become available for use.

WAITPAGE The thread is waiting for physical memory to be allocated for a virtual address.

WAITTHREAD The thread is waiting for a child thread to finish creating itself (i.e. it called ThreadCreate()).

Thread scheduling

When scheduling decisions are made

The execution of a running thread is temporarily suspended whenever the microkernel is entered as the result of a kernel call, exception, or hardware interrupt. A scheduling decision is made whenever the execution state of any thread changes, regardless of which processes the threads might reside within. Threads are scheduled globally across all processes.

Normally, the execution of the suspended thread will resume, but the scheduler will perform a context switch from one thread to another whenever the running thread:

• is blocked

• is preempted

• yields.


When thread is blocked

The running thread blocks when it must wait for some event to occur (response to an IPC request, wait on a mutex, etc.). The blocked thread is removed from the ready queue and the highest-priority ready thread is then run. When the blocked thread is subsequently unblocked, it is placed on the end of the ready queue for that priority level.

When thread is preempted

The running thread is preempted when a higher-priority thread is placed on the ready queue (it becomes READY, as the result of its block condition being resolved). The preempted thread remains at the beginning of the ready queue for that priority and the higher-priority thread runs.

When thread yields

The running thread voluntarily yields the processor (sched_yield()) and is placed on the end of the ready queue for that priority. The highest-priority thread then runs (which may still be the thread that just yielded).

Scheduling priority

Every thread is assigned a priority. The scheduler selects the next thread to run by looking at the priority assigned to every thread that is READY (i.e. capable of using the CPU). The thread with the highest priority is selected to run.

The following diagram shows the ready queue for six threads (A-F) that are READY. All other threads (G-Z) are BLOCKED. Thread A is currently running. Threads A, B, and C are at the highest priority, so they’ll share the processor based on the running thread’s scheduling algorithm.


[Figure: The ready queue. Labels: Priority (0, 5, 10, 255), Active (thread A), ready threads B through F, Blocked (threads G through Z), Idle.]

The OS supports a total of 256 scheduling priority levels. A non-root thread can set its priority to a level from 1 to 63 (the highest priority), independent of the scheduling policy. Only root threads (i.e. those whose effective uid is 0) are allowed to set priorities above 63. The special idle thread (in the process manager) has priority 0 and is always ready to run. A thread inherits the priority of its parent thread by default.

You can change the allowed priority range for non-root processes with the procnto -P option:

procnto -P priority

Here’s a summary of the ranges:


Priority level              Owner
0                           Idle thread
1 through priority − 1      Non-root or root
priority through 255        root

Note that in order to prevent priority inversion, the kernel may temporarily boost a thread’s priority.

The threads on the ready queue are ordered by priority. The ready queue is actually implemented as 256 separate queues, one for each priority. Most of the time, threads are queued in FIFO order in the queue of their priority. (The exception is a server thread that’s coming out of a RECEIVE-blocked state with a message from a client; the server thread is inserted at the head of the queue for that priority; that is, the order is LIFO, not FIFO.) The first thread in the highest-priority queue is selected to run.

Scheduling algorithms

To meet the needs of various applications, QNX Neutrino provides these scheduling algorithms:

• FIFO scheduling

• round-robin scheduling

• sporadic scheduling.

Each thread in the system may run using any method. The methods are effective on a per-thread basis, not on a global basis for all threads and processes on a node.

Remember that the FIFO and round-robin scheduling algorithms apply only when two or more threads that share the same priority are READY (i.e. the threads are directly competing with each other). The sporadic method, however, employs a “budget” for a thread’s execution. In all cases, if a higher-priority thread becomes READY, it immediately preempts all lower-priority threads.


In the following diagram, three threads of equal priority are READY. If Thread A blocks, Thread B will run.

[Figure: Thread A blocks, Thread B runs. At priority 10, Thread A moves to Blocked, Thread B becomes Active, and Thread C stays on the ready queue.]

Although a thread inherits its scheduling algorithm from its parent process, the thread can request to change the algorithm applied by the kernel.

FIFO scheduling

In FIFO scheduling, a thread selected to run continues executing until it:

• voluntarily relinquishes control (e.g. it blocks)

• is preempted by a higher-priority thread.


[Figure: FIFO scheduling. Thread A is Active at priority 10; Threads B and C wait on the ready queue at the same priority.]

Round-robin scheduling

In round-robin scheduling, a thread selected to run continues executing until it:

• voluntarily relinquishes control

• is preempted by a higher-priority thread

• consumes its timeslice.

As the following diagram shows, Thread A ran until it consumed its timeslice; the next READY thread (Thread B) now runs:

[Figure: Round-robin scheduling. Thread B is Active at priority 10; Threads C and A wait on the ready queue at the same priority.]


A timeslice is the unit of time assigned to every process. Once it consumes its timeslice, a thread is preempted and the next READY thread at the same priority level is given control. A timeslice is 4 × the clock period. (For more information, see the entry for ClockPeriod() in the Library Reference.)

Apart from time slicing, round-robin scheduling is identical to FIFO scheduling.

Sporadic scheduling

The sporadic scheduling algorithm is generally used to provide a capped limit on the execution time of a thread within a given period of time. This behavior is essential when Rate Monotonic Analysis (RMA) is being performed on a system that services both periodic and aperiodic events. Essentially, this algorithm allows a thread to service aperiodic events without jeopardizing the hard deadlines of other threads or processes in the system.

As in FIFO scheduling, a thread using sporadic scheduling continues executing until it blocks or is preempted by a higher-priority thread. And as in adaptive scheduling, a thread using sporadic scheduling will drop in priority, but with sporadic scheduling you have much more precise control over the thread’s behavior.

Under sporadic scheduling, a thread’s priority can oscillate dynamically between a foreground or normal priority and a background or low priority. Using the following parameters, you can control the conditions of this sporadic shift:

Initial budget (C)

The amount of time a thread is allowed to execute at its normal priority (N) before being dropped to its low priority (L).

Low priority (L)

The priority level to which the thread will drop. The thread executes at this lower priority (L) while in the background, and runs at normal priority (N) while in the foreground.


Replenishment period (T)

The period of time during which a thread is allowed to consume its execution budget. To schedule replenishment operations, the POSIX implementation also uses this value as the offset from the time the thread becomes READY.

Max number of pending replenishments

This value limits the number of replenishment operations that can take place, thereby bounding the amount of system overhead consumed by the sporadic scheduling policy.

In a poorly configured system, a thread’s execution budget may become eroded because of too much blocking, i.e. it won’t receive enough replenishments.

As the following diagram shows, the sporadic scheduling policy establishes a thread’s initial execution budget (C), which is consumed by the thread as it runs and is replenished periodically (for the amount T). When a thread blocks, the amount of the execution budget that’s been consumed (R) is arranged to be replenished at some later time (e.g. at 40 msec) after the thread first became ready to run.

[Figure: A thread’s budget is replenished periodically. Time axis: 0 msec, 40 msec, 80 msec. The consumed part R of budget C is replenished one period T after the thread became ready.]


At its normal priority N, a thread will execute for the amount of time defined by its initial execution budget C. As soon as this time is exhausted, the priority of the thread will drop to its low priority L until the replenishment operation occurs.

Assume, for example, a system where the thread never blocks and is never preempted:

[Figure: Thread drops in priority until its budget is replenished. Time axis: 0 msec, 40 msec, 80 msec. The thread runs at priority N, exhausts its budget and drops to priority L, where it may or may not run; at the end of period T the budget is replenished and the priority is restored.]

Here the thread will drop to its low-priority (background) level, where it may or may not get a chance to run depending on the priority of other threads in the system.

Once the replenishment occurs, the thread’s priority is raised to its original level. This guarantees that within a properly configured system, the thread will be given the opportunity every period T to run for a maximum execution time C. This ensures that a thread running at priority N will consume only the fraction C/T of the system’s resources (e.g. 25% for a 10-msec budget in a 40-msec period).

When a thread blocks multiple times, then several replenishment operations may be started and occur at different times. This could mean that the thread’s execution budget will total C within a period T; however, the execution budget may not be contiguous during that period.


[Figure: Thread oscillates between high and low priority. Time axis: 0 msec, 40 msec, 80 msec; priority levels N and L; two overlapping periods T; numbered points 1 through 6.]

In the diagram above, the thread has a budget (C) of 10 msec to be consumed within each 40-msec replenishment period (T).

1. The initial run of the thread is blocked after 3 msec, so a replenishment operation of 3 msec is scheduled to begin at the 40-msec mark, i.e. when its first replenishment period has elapsed.

2. The thread gets an opportunity to run again at 6 msec, which marks the start of another replenishment period (T). The thread still has 7 msec remaining in its budget.

3. The thread runs without blocking for 7 msec, thereby exhausting its budget, and then drops to low priority L, where it may or may not be able to execute. A replenishment of 7 msec is scheduled to occur at 46 msec (40 + 6), i.e. when T has elapsed.

4. The thread has 3 msec of its budget replenished at 40 msec (see Step 1) and is therefore boosted back to its normal priority.

5. The thread consumes the 3 msec of its budget and then is dropped back to the low priority.

6. The thread has 7 msec of its budget replenished at 46 msec (see Step 3) and is boosted back to its normal priority.


And so on. The thread will continue to oscillate between its two priority levels, servicing aperiodic events in your system in a controlled, predictable manner.

Manipulating priority and scheduling algorithms

A thread’s priority can vary during its execution, either from direct manipulation by the thread itself or from the kernel adjusting the thread’s priority as it receives a message from a higher-priority thread.

In addition to priority, you can also select the scheduling algorithm that the kernel will use for the thread. Here are the POSIX calls for performing these manipulations, along with the microkernel calls used by these library routines:

POSIX call              Microkernel call    Description
sched_getparam()        SchedGet()          Get scheduling priority
sched_setparam()        SchedSet()          Set scheduling priority
sched_getscheduler()    SchedGet()          Get scheduling policy
sched_setscheduler()    SchedSet()          Set scheduling policy

IPC issues

Since all the threads in a process have unhindered access to the shared data space, wouldn’t this execution model “trivially” solve all of our IPC problems? Can’t we just communicate the data through shared memory and dispense with any other execution models and IPC mechanisms?

If only it were that simple!

One issue is that the access of individual threads to common data must be synchronized. Having one thread read inconsistent data because another thread is part way through modifying it is a recipe for disaster. For example, if one thread is updating a linked list, no other threads can be allowed to traverse or modify the list until the first thread has finished. A code passage that must execute “serially” (i.e. by only one thread at a time) in this manner is termed a “critical section.” The program would fail (intermittently, depending on how frequently a “collision” occurred) with irreparably damaged links unless some synchronization mechanism ensured serial access.

Mutexes, semaphores, and condvars are examples of synchronization tools that can be used to address this problem. These tools are described later in this section.

Although synchronization services can be used to allow threads to cooperate, shared memory per se can’t address a number of IPC issues. For example, although threads can communicate through the common data space, this works only if all the threads communicating are within a single process. What if our application needs to communicate a query to a database server? We need to pass the details of our query to the database server, but the thread we need to communicate with lies within a database server process and the address space of that server isn’t addressable to us.

The OS takes care of the network-distributed IPC issue because the one interface (message passing) operates in both the local and network-remote cases, and can be used to access all OS services. Since messages can be exactly sized, and since most messages tend to be quite tiny (e.g. the error status on a write request, or a tiny read request), the data moved around the network can be far less with message passing than with network-distributed shared memory, which would tend to copy 4K pages around.

Thread complexity issues

Although threads are very appropriate for some system designs, it’s important to respect the Pandora’s box of complexities their use unleashes. In some ways, it’s ironic that while MMU-protected multitasking has become common, computing fashion has made popular the use of multiple threads in an unprotected address space. This not only makes debugging difficult, but also hampers the generation of reliable code.


Threads were initially introduced to UNIX systems as a “light-weight” concurrency mechanism to address the problem of slow context switches between “heavy weight” processes. Although this is a worthwhile goal, an obvious question arises: Why are process-to-process context switches slow in the first place?

Architecturally, the OS addresses the context-switch performance issue first. In fact, threads and processes provide nearly identical context-switch performance numbers. QNX Neutrino’s process-switch times are faster than UNIX thread-switch times. As a result, QNX Neutrino threads don’t need to be used to solve the IPC performance problem; instead, they’re a tool for achieving greater concurrency within application and server processes.

Without resorting to threads, fast process-to-process context switching makes it reasonable to structure an application as a team of cooperating processes sharing an explicitly allocated shared-memory region. An application thus exposes itself to bugs in the cooperating processes only so far as the effects of those bugs on the contents of the shared-memory region. The private memory of the process is still protected from the other processes. In the purely threaded model, the private data of all threads (including their stacks) is openly accessible, vulnerable to stray pointer errors in any thread in the process.

Nevertheless, threads can also provide concurrency advantages that a pure process model cannot address. For example, a filesystem server process that executes requests on behalf of many clients (where each request takes significant time to complete) definitely benefits from having multiple threads of execution. If one client process requests a block from disk, while another client requests a block already in cache, the filesystem process can utilize a pool of threads to concurrently service client requests, rather than remain “busy” until the disk block is read for the first request.

As requests arrive, each thread is able to respond directly from the buffer cache or to block and wait for disk I/O without increasing the response latency seen by other client processes. The filesystem server can “precreate” a team of threads, ready to respond in turn to client requests as they arrive. Although this complicates the architecture of the filesystem manager, the gains in concurrency are significant.

Synchronization services

QNX Neutrino provides the POSIX-standard thread-level synchronization primitives, some of which are useful even between threads in different processes. The synchronization services include at least the following:

Synchronization service    Supported between processes    Supported across a QNX LAN
Mutexes                    Yes                            No
Condvars                   Yes                            No
Barriers                   No                             No
Sleepon locks              No                             No
Reader/writer locks        Yes                            No
Semaphores                 Yes                            Yes (named only)
FIFO scheduling            Yes                            No
Send/Receive/Reply         Yes                            Yes
Atomic operations          Yes                            No


The above synchronization primitives are implemented directly by the kernel, except for:

• barriers, sleepon locks, and reader/writer locks (which are built from mutexes and condvars)

• atomic operations (which are either implemented directly by the processor or emulated in the kernel).

Mutual exclusion locks

Mutual exclusion locks, or mutexes, are the simplest of the synchronization services. A mutex is used to ensure exclusive access to data shared between threads. It is typically acquired (pthread_mutex_lock()) and released (pthread_mutex_unlock()) around the code that accesses the shared data (usually a critical section).

Only one thread may have the mutex locked at any given time. Threads attempting to lock an already locked mutex will block until the mutex is later unlocked. When the owning thread unlocks the mutex, the highest-priority thread waiting to lock the mutex will unblock and become the new owner of the mutex. In this way, threads will sequence through a critical region in priority order.

On most processors, acquisition of a mutex doesn’t require entry to the kernel for a free mutex. What allows this is the use of the compare-and-swap opcode on x86 processors and the load/store conditional opcodes on most RISC processors.

Entry to the kernel is done at acquisition time only if the mutex is already held so that the thread can go on a blocked list; kernel entry is done on exit if other threads are waiting to be unblocked on that mutex. This allows acquisition and release of an uncontested critical section or resource to be very quick, incurring work by the OS only to resolve contention.

A nonblocking lock function (pthread_mutex_trylock()) can be used to test whether the mutex is currently locked or not. For best performance, the execution time of the critical section should be small and of bounded duration. A condvar should be used if the thread may block within the critical section.

Priority inheritance

If a thread with a higher priority than the mutex owner attempts to lock a mutex, then the effective priority of the current owner will be increased to that of the higher-priority blocked thread waiting for the mutex. The owner will return to its real priority when it unlocks the mutex. This scheme not only ensures that the higher-priority thread will be blocked waiting for the mutex for the shortest possible time, but also solves the classic priority-inversion problem.

The attributes of the mutex can also be modified (using pthread_mutexattr_setrecursive()) to allow a mutex to be recursively locked by the same thread. This can be useful to allow a thread to call a routine that might attempt to lock a mutex that the thread already happens to have locked.

Recursive mutexes are non-POSIX services; they don’t work with condvars.

Condition variables

A condition variable, or condvar, is used to block a thread within a critical section until some condition is satisfied. The condition can be arbitrarily complex and is independent of the condvar. However, the condvar must always be used with a mutex lock in order to implement a monitor.

A condvar supports three operations:

• wait (pthread_cond_wait())

• signal (pthread_cond_signal())

• broadcast (pthread_cond_broadcast()).


Note that there’s no connection between a condvar signal and a POSIX signal.

Here’s a typical example of how a condvar can be used:

    pthread_mutex_lock( &m );
    . . .
    while (!arbitrary_condition) {
        pthread_cond_wait( &cv, &m );
    }
    . . .
    pthread_mutex_unlock( &m );

In this code sample, the mutex is acquired before the condition is tested. This ensures that only this thread has access to the arbitrary condition being examined. While the condition is false (i.e. not yet satisfied), the code sample will block on the wait call until some other thread performs a signal or broadcast on the condvar.

The while loop is required for two reasons. First of all, POSIX cannot guarantee that false wakeups will not occur (e.g. on multiprocessor systems). Second, when another thread has made a modification to the condition, we need to retest to ensure that the modification matches our criteria. The associated mutex is unlocked atomically by pthread_cond_wait() when the waiting thread is blocked to allow another thread to enter the critical section.

A thread that performs a signal will unblock the highest-priority thread queued on the condvar, while a broadcast will unblock all threads queued on the condvar. The associated mutex is locked atomically by the highest-priority unblocked thread; the thread must then unlock the mutex after proceeding through the critical section.

A version of the condvar wait operation allows a timeout to be specified (pthread_cond_timedwait()). The waiting thread can then be unblocked when the timeout expires.


Barriers

A barrier is a synchronization mechanism that lets you “corral” several cooperating threads (e.g. in a matrix computation), forcing them to wait at a specific point until all have finished before any one thread can continue.

Unlike the pthread_join() function, where you’d wait for the threads to terminate, in the case of a barrier you’re waiting for the threads to rendezvous at a certain point. When the specified number of threads arrive at the barrier, we unblock all of them so they can continue to run.

You first create a barrier with pthread_barrier_init():

    #include <pthread.h>

    int
    pthread_barrier_init (pthread_barrier_t *barrier,
                          const pthread_barrierattr_t *attr,
                          unsigned int count);

This creates a barrier object at the passed address (a pointer to the barrier object is in barrier), with the attributes as specified by attr. The count argument holds the number of threads that must call pthread_barrier_wait().

Once the barrier is created, each thread will call pthread_barrier_wait() to indicate that it has completed:

    #include <pthread.h>

    int pthread_barrier_wait (pthread_barrier_t *barrier);

When a thread calls pthread_barrier_wait(), it blocks until the number of threads specified initially in the pthread_barrier_init() function have called pthread_barrier_wait() (and blocked also). When the correct number of threads have called pthread_barrier_wait(), all those threads will unblock at the same time.

Here’s an example:


    /*
     *  barrier1.c
     */

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>         // for sleep()
    #include <pthread.h>
    #include <sys/neutrino.h>

    pthread_barrier_t   barrier; // barrier synchronization object

    void *thread1 (void *not_used);
    void *thread2 (void *not_used);

    int
    main (void) // ignore arguments
    {
        time_t  now;

        // create a barrier object with a count of 3
        pthread_barrier_init (&barrier, NULL, 3);

        // start up two threads, thread1 and thread2
        pthread_create (NULL, NULL, thread1, NULL);
        pthread_create (NULL, NULL, thread2, NULL);

        // at this point, thread1 and thread2 are running

        // now wait for completion
        time (&now);
        printf ("main() waiting for barrier at %s", ctime (&now));
        pthread_barrier_wait (&barrier);

        // after this point, all three threads have completed.
        time (&now);
        printf ("barrier in main() done at %s", ctime (&now));

        // exit only this thread so the workers can finish printing
        pthread_exit (NULL);
        return 0;               // not reached
    }

    void *
    thread1 (void *not_used)
    {
        time_t  now;

        time (&now);
        printf ("thread1 starting at %s", ctime (&now));

        // do the computation
        // let's just do a sleep here...
        sleep (20);
        pthread_barrier_wait (&barrier);

        // after this point, all three threads have completed.
        time (&now);
        printf ("barrier in thread1() done at %s", ctime (&now));
        return NULL;
    }

    void *
    thread2 (void *not_used)
    {
        time_t  now;

        time (&now);
        printf ("thread2 starting at %s", ctime (&now));

        // do the computation
        // let's just do a sleep here...
        sleep (40);
        pthread_barrier_wait (&barrier);

        // after this point, all three threads have completed.
        time (&now);
        printf ("barrier in thread2() done at %s", ctime (&now));
        return NULL;
    }

The main thread created the barrier object and initialized it with a count of the total number of threads that must be synchronized to the barrier before the threads may carry on. In the example above, we used a count of 3: one for the main() thread, one for thread1(), and one for thread2().

Then we start thread1() and thread2(). To simplify this example, we have the threads sleep to cause a delay, as if computations were occurring. To synchronize, the main thread simply blocks itself on the barrier, knowing that the barrier will unblock only after the two worker threads have joined it as well.

In this release, the following barrier functions are included:

Function                            Description
pthread_barrierattr_getpshared()    Get the value of a barrier’s process-shared attribute
pthread_barrierattr_destroy()       Destroy a barrier’s attributes object
pthread_barrierattr_init()          Initialize a barrier’s attributes object
pthread_barrierattr_setpshared()    Set the value of a barrier’s process-shared attribute
pthread_barrier_destroy()           Destroy a barrier
pthread_barrier_init()              Initialize a barrier
pthread_barrier_wait()              Synchronize participating threads at the barrier


Sleepon locks

Sleepon locks are very similar to condvars, with a few subtle differences. Like condvars, sleepon locks (pthread_sleepon_lock()) can be used to block until a condition becomes true (like a memory location changing value). But unlike condvars, which must be allocated for each condition to be checked, sleepon locks multiplex their functionality over a single mutex and dynamically allocated condvar, regardless of the number of conditions being checked. The maximum number of condvars ends up being equal to the maximum number of blocked threads. These locks are patterned after the sleepon locks commonly used within the UNIX kernel.

Reader/writer locks

More formally known as “Multiple readers, single writer locks,” these locks are used when the access pattern for a data structure consists of many threads reading the data, and (at most) one thread writing the data. These locks are more expensive than mutexes, but can be useful for this data access pattern.

This lock works by allowing all the threads that request a read-access lock (pthread_rwlock_rdlock()) to succeed in their request. But when a thread wishing to write asks for the lock (pthread_rwlock_wrlock()), the request is held off, with the writer blocked, until all the current reading threads release their reading locks (pthread_rwlock_unlock()).

Multiple writing threads can queue (in priority order) waiting for their chance to write the protected data structure, and all the blocked writer threads will get to run before reading threads are allowed access again. The priorities of the reading threads are not considered.

There are also calls (pthread_rwlock_tryrdlock() and pthread_rwlock_trywrlock()) that let a thread attempt to acquire the requested lock without blocking. These calls return with a successful lock or a status indicating that the lock couldn’t be granted immediately.


Reader/writer locks aren’t implemented directly within the kernel, but are instead built from the mutex and condvar services provided by the kernel.

Semaphores

Semaphores are another common form of synchronization that allows threads to “post” (sem_post()) and “wait” (sem_wait()) on a semaphore to control when threads wake or sleep. The post operation increments the semaphore; the wait operation decrements it.

If you wait on a semaphore that is positive, you will not block. Waiting on a nonpositive semaphore will block until some other thread executes a post. It is valid to post one or more times before a wait. This use will allow one or more threads to execute the wait without blocking.

A significant difference between semaphores and other synchronization primitives is that semaphores are “async safe” and can be manipulated by signal handlers. If the desired effect is to have a signal handler wake a thread, semaphores are the right choice.

Note that in general, mutexes are much faster than semaphores, which always require a kernel entry.

Another useful property of semaphores is that they were defined to operate between processes. Although our mutexes work between processes, the POSIX thread standard considers this an optional capability and as such may not be portable across systems. For synchronization between threads in a single process, mutexes will be more efficient than semaphores.

As a useful variation, a named semaphore service is also available. It uses a resource manager and as such allows semaphores to be used between processes on different machines connected by a network.


Note that named semaphores are slower than the unnamed variety.

Since semaphores, like condition variables, can legally return a nonzero value because of a false wakeup, correct usage requires a loop:

    while (sem_wait(&s) && (errno == EINTR)) { do_nothing(); }
    do_critical_region();   /* Semaphore was decremented */
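To make the post-before-wait behavior and the retry loop concrete, here is a minimal sketch using an unnamed POSIX semaphore. The helper names sem_wait_retry() and post_then_wait() are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <semaphore.h>

/* Wait on a semaphore, retrying when a signal interrupts the wait.
 * Returns 0 once the semaphore has been decremented, -1 on any other error. */
int sem_wait_retry(sem_t *s)
{
    while (sem_wait(s) == -1) {
        if (errno != EINTR)
            return -1;          /* a real error, not an interrupted wait */
    }
    return 0;
}

/* Hypothetical demo: post `posts` times, then wait the same number of
 * times. Returns the number of waits that completed, or -1 on error. */
int post_then_wait(unsigned posts)
{
    sem_t s;
    int completed = 0;

    if (sem_init(&s, 0, 0) == -1)   /* unnamed, process-private, initial count 0 */
        return -1;

    for (unsigned i = 0; i < posts; i++)
        sem_post(&s);               /* posting before any wait is valid */

    for (unsigned i = 0; i < posts; i++) {
        if (sem_wait_retry(&s) == 0)
            completed++;            /* count was positive, so this did not block */
    }

    /* The count is back at zero: sem_trywait() reports EAGAIN instead of blocking. */
    if (sem_trywait(&s) == 0 || errno != EAGAIN)
        completed = -1;

    sem_destroy(&s);
    return completed;
}
```

Because every wait is matched by an earlier post, none of the waits in post_then_wait() should block.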

Synchronization via scheduling algorithm

By selecting the POSIX FIFO scheduling algorithm, we can guarantee that no two threads of the same priority execute the critical section concurrently on a non-SMP system. The FIFO scheduling algorithm dictates that all FIFO-scheduled threads in the system at the same priority will run, when scheduled, until they voluntarily release the processor to another thread.

This “release” can also occur when the thread blocks as part of requesting the service of another process, or when a signal occurs. The critical region must therefore be carefully coded and documented so that later maintenance of the code doesn’t violate this condition.

In addition, higher-priority threads in that (or any other) process could still preempt these FIFO-scheduled threads. So, all the threads that could “collide” within the critical section must be FIFO-scheduled at the same priority. Having enforced this condition, the threads can then casually access this shared memory without having to first make explicit synchronization calls.

CAUTION: This exclusive-access relationship doesn’t apply in multiprocessor systems, since each CPU could run a thread simultaneously through the region that would otherwise be serially scheduled on a single-processor machine.
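One way to request FIFO scheduling at a common priority is through POSIX thread attributes. The sketch below only prepares the attributes; the helper name fifo_attr_init() and the priority value used when calling it are assumptions for illustration, and creating a thread with these attributes may require suitable privileges on some systems.

```c
#include <pthread.h>
#include <sched.h>

/* Prepare thread attributes that request SCHED_FIFO at `priority`.
 * Returns 0 on success or an errno value from the failing call. */
int fifo_attr_init(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int rc;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;

    /* Without this, a new thread inherits its creator's policy and priority. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return rc;

    if ((rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0)
        return rc;

    /* Every thread that can enter the critical section must use the same value. */
    return pthread_attr_setschedparam(attr, &param);
}
```

Passing the same attributes object to pthread_create() for each cooperating thread keeps them all FIFO-scheduled at one priority, which is the condition the text relies on.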


Synchronization via message passing

Our Send/Receive/Reply message-passing IPC services (described later) implement an implicit synchronization by their blocking nature. These IPC services can, in many instances, render other synchronization services unnecessary. They are also the only synchronization and IPC primitives (other than named semaphores, which are built on top of messaging) that can be used across the network.

Synchronization via atomic operations

In some cases, you may want to perform a short operation (such as incrementing a variable) with the guarantee that the operation will perform atomically, i.e. the operation won’t be preempted by another thread or ISR (Interrupt Service Routine).

Under QNX Neutrino, we provide atomic operations for:

• adding a value

• subtracting a value

• clearing bits

• setting bits

• toggling (complementing) bits.

These atomic operations are available by including the C header file <atomic.h>.

Although you can use these atomic operations just about anywhere, you’ll find them particularly useful in these two cases:

• between an ISR and a thread

• between two threads (SMP or single-processor).

Since an ISR can preempt a thread at any given point, the only way that the thread would be able to protect itself would be to disable interrupts. Since you should avoid disabling interrupts in a realtime system, we recommend that you use the atomic operations provided with QNX Neutrino.

On an SMP system, multiple threads can and do run concurrently. Again, we run into the same situation as with interrupts above: you should use the atomic operations where applicable to eliminate the need to disable and reenable interrupts.
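The five kinds of operation listed earlier can be sketched with the C11 <stdatomic.h> interface. This is a portable stand-in chosen for illustration, not the QNX <atomic.h> API itself; the function names five_ops() and count_with_two_threads() are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_uint counter;

/* Thread body: add 1 to the shared counter *arg times. */
static void *add_many(void *arg)
{
    unsigned n = *(unsigned *)arg;

    for (unsigned i = 0; i < n; i++)
        atomic_fetch_add(&counter, 1u);   /* indivisible read-modify-write */
    return NULL;
}

/* Two threads each add 1 `n` times with no mutex; returns the final count. */
unsigned count_with_two_threads(unsigned n)
{
    pthread_t t1, t2;

    atomic_store(&counter, 0u);
    pthread_create(&t1, NULL, add_many, &n);
    pthread_create(&t2, NULL, add_many, &n);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&counter);
}

/* Apply each of the five operation kinds once to a single word. */
unsigned five_ops(unsigned start)
{
    atomic_uint v;

    atomic_init(&v, start);
    atomic_fetch_add(&v, 5u);      /* add a value */
    atomic_fetch_sub(&v, 1u);      /* subtract a value */
    atomic_fetch_and(&v, ~0x1u);   /* clear bits */
    atomic_fetch_or(&v, 0x8u);     /* set bits */
    atomic_fetch_xor(&v, 0x2u);    /* toggle bits */
    return atomic_load(&v);
}
```

Because each increment is a single atomic read-modify-write, no update is lost even when the two threads run concurrently on an SMP system.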

Synchronization services implementation

The following table lists the various microkernel calls and the higher-level POSIX calls constructed from them:

Microkernel call     POSIX call                                                      Description

SyncTypeCreate()     pthread_mutex_init(), pthread_cond_init(), sem_init()           Create object for mutex, condvars, and semaphore.
SyncDestroy()        pthread_mutex_destroy(), pthread_cond_destroy(), sem_destroy()  Destroy synchronization object.
SyncCondvarWait()    pthread_cond_wait(), pthread_cond_timedwait()                   Block on a condvar.
SyncCondvarSignal()  pthread_cond_broadcast(), pthread_cond_signal()                 Wake up condvar-blocked threads.
SyncMutexLock()      pthread_mutex_lock(), pthread_mutex_trylock()                   Lock a mutex.
SyncMutexUnlock()    pthread_mutex_unlock()                                          Unlock a mutex.
SyncSemPost()        sem_post()                                                      Post a semaphore.
SyncSemWait()        sem_wait(), sem_trywait()                                       Wait on a semaphore.


QNX Neutrino IPC

IPC plays a fundamental role in the transformation of QNX Neutrino from an embedded realtime kernel into a full-scale POSIX operating system. As various service-providing processes are added to the microkernel, IPC is the “glue” that connects those components into a cohesive whole.

Although message passing is the primary form of IPC in QNX Neutrino, several other forms are available as well. Unless otherwise noted, those other forms of IPC are built over our native message passing. The strategy is to create a simple, robust IPC service that can be tuned for performance through a simplified code path in the microkernel; more “feature cluttered” IPC services can then be implemented from these.

Benchmarks comparing higher-level IPC services (like pipes and FIFOs implemented over our messaging) with their monolithic kernel counterparts show comparable performance.

QNX Neutrino offers at least the following forms of IPC:

Service:                Implemented in:

Message-passing         Kernel
Signals                 Kernel
POSIX message queues    External process
Shared memory           Process manager
Pipes                   External process
FIFOs                   External process

The designer can select these services on the basis of bandwidth requirements, the need for queuing, network transparency, etc. The tradeoff can be complex, but the flexibility is useful.

As part of the engineering effort that went into defining the QNX Neutrino microkernel, the focus on message passing as the fundamental IPC primitive was deliberate. As a form of IPC, message passing (as implemented in MsgSend(), MsgReceive(), and MsgReply()) is synchronous and copies data. Let’s explore these two attributes in more detail.

Synchronous message passing

A thread that does a MsgSend() to another thread (which could be within another process) will be blocked until the target thread does a MsgReceive(), processes the message, and executes a MsgReply(). If a thread executes a MsgReceive() without a previously sent message pending, it will block until another thread executes a MsgSend().

In Neutrino, a server thread typically loops, waiting to receive a message from a client thread. As described earlier, a thread (whether a server or a client) is in the READY state if it can use the CPU. It might not actually be getting any CPU time because of its and other threads’ priority and scheduling algorithm, but the thread isn’t blocked.

Let’s look first at the client thread:

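(Figure: state diagram for the client thread with the states READY, SEND blocked, and REPLY blocked. Transitions are labeled “Client does a MsgSend()”, “Server does a MsgReceive()”, and “Server does a MsgReply() or MsgError()”; the legend distinguishes actions of this thread from actions of the other thread.)

Changes of state for a client thread in a send-receive-reply transaction.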

• If the client thread calls MsgSend(), and the server thread hasn’t yet called MsgReceive(), then the client thread becomes SEND blocked. Once the server thread calls MsgReceive(), the kernel changes the client thread’s state to be REPLY blocked, which means that the server thread has received the message and now must reply. When the server thread calls MsgReply(), the client thread becomes READY.

• If the client thread calls MsgSend(), and the server thread is already blocked on the MsgReceive(), then the client thread immediately becomes REPLY blocked, skipping the SEND-blocked state completely.

• If the server thread fails, exits, or disappears, the client thread becomes READY, with MsgSend() indicating an error.

Next, let’s consider the server thread:

(Figure: state diagram for the server thread with the states READY and RECEIVE blocked. Transitions are labeled “Server does a MsgReceive()”, “Client does a MsgSend()”, and “Server does a MsgReply() or MsgError()”; the legend distinguishes actions of this thread from actions of the other thread.)

Changes of state for a server thread in a send-receive-reply transaction.

• If the server thread calls MsgReceive(), and no other thread has sent to it, then the server thread becomes RECEIVE blocked. When another thread sends to it, the server thread becomes READY.

• If the server thread calls MsgReceive(), and another thread has already sent to it, then MsgReceive() returns immediately with the message. In this case, the server thread doesn’t block.

• If the server thread calls MsgReply(), it doesn’t become blocked.

This inherent blocking synchronizes the execution of the sending thread, since the act of requesting that the data be sent also causes the sending thread to be blocked and the receiving thread to be scheduled for execution. This happens without requiring explicit work by the kernel to determine which thread to run next (as would be the case with most other forms of IPC). Execution and data move directly from one context to another.
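The client and server transitions listed above can be summarized as two small transition functions. This is only a model of the described behavior, written in plain C; the enum and function names are hypothetical and nothing here calls the kernel.

```c
/* Thread states named in the text. */
enum thread_state { READY, SEND_BLOCKED, REPLY_BLOCKED, RECEIVE_BLOCKED };

/* Events as seen by a client thread (hypothetical names). */
enum client_event {
    CLI_SEND_SERVER_BUSY,     /* MsgSend() before the server calls MsgReceive() */
    CLI_SEND_SERVER_WAITING,  /* MsgSend() while the server is RECEIVE blocked */
    CLI_SERVER_RECEIVES,      /* server calls MsgReceive() */
    CLI_SERVER_REPLIES,       /* server calls MsgReply() or MsgError() */
    CLI_SERVER_GONE           /* server fails, exits, or disappears */
};

enum thread_state client_next(enum thread_state s, enum client_event e)
{
    switch (e) {
    case CLI_SEND_SERVER_BUSY:
        return s == READY ? SEND_BLOCKED : s;
    case CLI_SEND_SERVER_WAITING:
        return s == READY ? REPLY_BLOCKED : s;   /* skips SEND blocked */
    case CLI_SERVER_RECEIVES:
        return s == SEND_BLOCKED ? REPLY_BLOCKED : s;
    case CLI_SERVER_REPLIES:
        return s == REPLY_BLOCKED ? READY : s;
    case CLI_SERVER_GONE:
        return READY;                            /* MsgSend() returns an error */
    }
    return s;
}

/* Events as seen by a server thread (hypothetical names). */
enum server_event {
    SRV_RECEIVE_NO_MESSAGE,       /* MsgReceive() with nothing sent yet */
    SRV_RECEIVE_MESSAGE_PENDING,  /* MsgReceive() with a message already sent */
    SRV_CLIENT_SENDS,             /* another thread calls MsgSend() */
    SRV_REPLY                     /* server calls MsgReply() */
};

enum thread_state server_next(enum thread_state s, enum server_event e)
{
    switch (e) {
    case SRV_RECEIVE_NO_MESSAGE:
        return s == READY ? RECEIVE_BLOCKED : s;
    case SRV_CLIENT_SENDS:
        return s == RECEIVE_BLOCKED ? READY : s;
    case SRV_RECEIVE_MESSAGE_PENDING:  /* returns immediately with the message */
    case SRV_REPLY:                    /* replying never blocks the server */
        return s;
    }
    return s;
}
```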

Data-queuing capabilities are omitted from these messaging primitives because queueing could be implemented when needed within the receiving thread. The sending thread is often prepared to wait for a response; queueing is unnecessary overhead and complexity (i.e. it slows down the nonqueued case). As a result, the sending thread doesn’t need to make a separate, explicit blocking call to wait for a response (as it would if some other IPC form had been used).

While the send and receive operations are blocking and synchronous, MsgReply() (or MsgError()) doesn’t block. Since the client thread is already blocked waiting for the reply, no additional synchronization is required, so a blocking MsgReply() isn’t needed. This allows a server to reply to a client and continue processing while the kernel and/or networking code asynchronously passes the reply data to the sending thread and marks it ready for execution. Since most servers will tend to do some processing to prepare to receive the next request (at which point they block again), this works out well.

Note that in a network, a reply may not complete as “immediately” as in a local message pass. For more information on network message passing, see the chapter on Qnet networking in this book.


MsgReply() vs MsgError()

The MsgReply() function is used to return a status and zero or more bytes to the client. MsgError(), on the other hand, is used to return only a status to the client. Both functions will unblock the client from its MsgSend().

Message copying

Since our messaging services copy a message directly from the address space of one thread to another without intermediate buffering, the message-delivery performance approaches the memory bandwidth of the underlying hardware. The kernel attaches no special meaning to the content of a message; the data in a message has meaning only as mutually defined by sender and receiver. However, “well-defined” message types are also provided so that user-written processes or threads can augment or substitute for system-supplied services.

The messaging primitives support multipart transfers, so that a message delivered from the address space of one thread to another needn’t pre-exist in a single, contiguous buffer. Instead, both the sending and receiving threads can specify a vector table that indicates where the sending and receiving message fragments reside in memory. Note that the size of the various parts can be different for the sender and receiver.

Multipart transfers allow messages that have a header block separate from the data block to be sent without performance-consuming copying of the data to create a contiguous message. In addition, if the underlying data structure is a ring buffer, specifying a three-part message will allow a header and two disjoint ranges within the ring buffer to be sent as a single atomic message. A hardware equivalent of this concept would be that of a scatter/gather DMA facility.
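The ring-buffer case can be sketched with the POSIX struct iovec and writev()/readv() over a pipe. These calls are used here only as a portable stand-in for Neutrino’s IOV-based message calls; the msg_header layout, RING_SIZE, and the function names are assumptions made for this illustration.

```c
#include <sys/uio.h>
#include <unistd.h>

#define RING_SIZE 8

/* Hypothetical header describing the payload that follows it. */
struct msg_header {
    unsigned payload_len;
};

/* Send `len` bytes that start at ring index `start` and may wrap around,
 * preceded by a header, as one three-part gather write.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_ring(int fd, const char *ring, size_t start, size_t len)
{
    struct msg_header hdr = { .payload_len = (unsigned)len };
    struct iovec iov[3];
    size_t first;

    if (start >= RING_SIZE || len > RING_SIZE)
        return -1;

    first = RING_SIZE - start;           /* bytes available before the wrap point */
    if (first > len)
        first = len;

    iov[0].iov_base = &hdr;                   iov[0].iov_len = sizeof hdr;
    iov[1].iov_base = (void *)(ring + start); iov[1].iov_len = first;
    iov[2].iov_base = (void *)ring;           iov[2].iov_len = len - first;

    return writev(fd, iov, 3);           /* no contiguous copy is built first */
}

/* Round trip through a pipe. The receiver uses only two parts (header and
 * one contiguous buffer). Returns the payload bytes received, or -1. */
int ring_round_trip(char *out, size_t out_size)
{
    /* The payload "ABCDEF" starts at index 4 and wraps around to index 0. */
    const char ring[RING_SIZE] = { 'E', 'F', '-', '-', 'A', 'B', 'C', 'D' };
    struct msg_header hdr = { 0 };
    struct iovec riov[2];
    int fds[2];
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;

    n = send_ring(fds[1], ring, 4, 6);
    if (n != -1) {
        riov[0].iov_base = &hdr; riov[0].iov_len = sizeof hdr;
        riov[1].iov_base = out;  riov[1].iov_len = out_size;
        n = readv(fds[0], riov, 2);
    }
    close(fds[0]);
    close(fds[1]);

    if (n < (ssize_t)sizeof hdr)
        return -1;
    return (int)(n - (ssize_t)sizeof hdr);
}
```

The sender describes the message in three parts while the receiver describes it in two, which mirrors the point that the part sizes needn’t match on the two sides.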


(Figure: an IOV, shown as a table of Len/Addr entries, whose entries point to Part 1, Part 2, and Part 3 of the message data. Each IOV may have any number of parts; each part may be 0 to 4G.)

A multipart transfer.

The multipart transfers are also used extensively by filesystems. On a read, the data is copied directly from the filesystem cache into the application using a message with n parts for the data. Each part points into the cache and compensates for the fact that cache blocks aren’t contiguous in memory with a read starting or ending within a block.

For example, with a cache block size of 512 bytes, a read of 1454 bytes can be satisfied with a 5-part message:


(Figure: a five-part IOV of Len/Addr entries. Part 0 is a 16-byte header; parts 1 through 4 point into the filesystem cache with lengths of 400, 512, 512, and 30 bytes.)

Scatter/gather of a read of 1454 bytes.
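The arithmetic behind the data parts of such a message can be sketched as follows. The function name split_read() is hypothetical, and the starting offset of 112 bytes into a block used in the usage note is an assumption inferred from the 400-byte first part (512 - 400 = 112).

```c
#include <stddef.h>

/* Split a read of `len` bytes that begins at byte `offset` into per-block
 * lengths for a cache with `block`-byte blocks. Stores at most `max_parts`
 * lengths in `parts` and returns the number of data parts needed. */
size_t split_read(size_t offset, size_t len, size_t block,
                  size_t *parts, size_t max_parts)
{
    size_t n = 0;
    size_t in_block;

    if (block == 0)
        return 0;                          /* no valid block size, nothing to split */

    in_block = offset % block;             /* where the read starts inside its block */

    while (len > 0) {
        size_t chunk = block - in_block;   /* bytes left in the current block */

        if (chunk > len)
            chunk = len;
        if (n < max_parts)
            parts[n] = chunk;
        n++;
        len -= chunk;
        in_block = 0;                      /* later blocks are read from their start */
    }
    return n;
}
```

Calling split_read(112, 1454, 512, parts, 8) yields four data parts of 400, 512, 512, and 30 bytes; together with the header part, that gives the five-part message described above.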

Since message data is explicitly copied between address spaces (rather than by doing page table manipulations), messages can be easily allocated on the stack instead of from a special pool of page-aligned memory for MMU “page flipping.” As a result, many of the library routines that implement the API between client and server processes can be trivially expressed, without elaborate IPC-specific memory allocation calls.

For example, the code used by a client thread to request that the filesystem manager execute lseek on its behalf is implemented as follows:

#include <unistd.h>
#include <errno.h>
#include <sys/iomsg.h>

off64_t lseek64(int fd, off64_t offset, int whence) {
    io_lseek_t msg;
    off64_t off;

    msg.i.type = _IO_LSEEK;
    msg.i.combine_len = sizeof msg.i;
    msg.i.offset = offset;
    msg.i.whence = whence;
    msg.i.zero = 0;
    if(MsgSend(fd, &msg.i, sizeof msg.i, &off, sizeof off) == -1) {
        return -1;
    }
    return off;
}

off64_t tell64(int fd) {
    return lseek64(fd, 0, SEEK_CUR);
}

off_t lseek(int fd, off_t offset, int whence) {
    return lseek64(fd, offset, whence);
}

off_t tell(int fd) {
    return lseek64(fd, 0, SEEK_CUR);
}

This code essentially builds a message structure on the stack, populates it with various constants and passed parameters from the calling thread, and sends it to the filesystem manager associated with fd. The reply indicates the success or failure of the operation.

This implementation doesn’t prevent the kernel from detecting large message transfers and choosing to implement “page flipping” for those cases. Since most messages passed are quite tiny, copying messages is often faster than manipulating MMU page tables. For bulk data transfer, shared memory between processes (with message-passing or the other synchronization primitives for notification) is also a viable option.


Simple messages

For simple single-part messages, the OS provides functions that take a pointer directly to a buffer without the need for an IOV (input/output vector). In this case, the number of parts is replaced by the size of the message directly pointed to. In the case of the message send primitive, which takes a send and a reply buffer, this introduces four variations:

Function      Send message  Reply message

MsgSend()     Simple        Simple
MsgSendsv()   Simple        IOV
MsgSendvs()   IOV           Simple
MsgSendv()    IOV           IOV

The other messaging primitives that take a direct message simply drop the trailing “v” in their names:

IOV                   Simple direct

MsgReceivev()         MsgReceive()
MsgReceivePulsev()    MsgReceivePulse()
MsgReplyv()           MsgReply()
MsgReadv()            MsgRead()
MsgWritev()           MsgWrite()

Channels and connections

In QNX Neutrino, message passing is directed towards channels and connections, rather than targeted directly from thread to thread. A thread that wishes to receive messages first creates a channel; another thread that wishes to send a message to that thread must first make a connection by “attaching” to that channel.

Channels are required by the message kernel calls and are used by servers to MsgReceive() messages on. Connections are created by client threads to “connect” to the channels made available by servers. Once connections are established, clients can MsgSend() messages over them. If a number of threads in a process all attach to the same channel, then the connections all map to the same kernel object for efficiency. Channels and connections are named within a process by a small integer identifier. Client connections map directly into file descriptors.

Architecturally, this is a key point. By having client connections map directly into FDs, we have eliminated yet another layer of translation. We don’t need to “figure out” where to send a message based on the file descriptor (e.g. via a read(fd) call). Instead, we can simply send a message directly to the “file descriptor” (i.e. connection ID).

Function            Description

ChannelCreate()     Create a channel to receive messages on.
ChannelDestroy()    Destroy a channel.
ConnectAttach()     Create a connection to send messages on.
ConnectDetach()     Detach a connection.


(Figure: a client holding connections to channels owned by two servers.)

Connections map elegantly into file descriptors.

A process acting as a server would implement an event loop to receive and process messages as follows:

chid = ChannelCreate(flags);
SETIOV(&iov, &msg, sizeof(msg));
for(;;) {
    rcv_id = MsgReceivev( chid, &iov, parts, &info );

    switch( msg.type ) {
        /* Perform message processing here */
    }

    /* MsgReplyv() takes a status argument after the receive ID */
    MsgReplyv( rcv_id, EOK, &iov, rparts );
}

This loop allows the thread to receive messages from any thread that has a connection to the channel.

The channel has several lists of messages associated with it:

Receive   A LIFO queue of threads waiting for messages.

Send      A priority FIFO queue of threads that have sent messages that haven’t yet been received.

Reply     An unordered list of threads that have sent messages that have been received, but not yet replied to.

While in any of these lists, the waiting thread is blocked (i.e. RECEIVE-, SEND-, or REPLY-blocked). Multiple threads and multiple clients may wait on one channel.
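To illustrate what “priority FIFO” ordering means for the Send list, here is a small array-based model: higher-priority senders are served first, and senders of equal priority are served in arrival order. It is a sketch of the ordering rule only, not the kernel’s data structure; all names and the MAX_WAITERS limit are hypothetical.

```c
#include <stddef.h>

#define MAX_WAITERS 16

struct waiter {
    int thread_id;   /* hypothetical identifier for the sending thread */
    int priority;    /* larger value means higher priority */
};

struct send_queue {
    struct waiter w[MAX_WAITERS];
    size_t count;
};

/* Insert behind every waiter of equal or higher priority.
 * Returns 0 on success, or -1 if the queue is full. */
int send_queue_push(struct send_queue *q, int thread_id, int priority)
{
    size_t pos = 0;

    if (q->count == MAX_WAITERS)
        return -1;

    while (pos < q->count && q->w[pos].priority >= priority)
        pos++;                              /* keeps FIFO order within a priority */

    for (size_t i = q->count; i > pos; i--)
        q->w[i] = q->w[i - 1];              /* shift lower-priority waiters back */

    q->w[pos].thread_id = thread_id;
    q->w[pos].priority = priority;
    q->count++;
    return 0;
}

/* Remove the waiter at the head and return its thread id, or -1 if empty. */
int send_queue_pop(struct send_queue *q)
{
    int id;

    if (q->count == 0)
        return -1;

    id = q->w[0].thread_id;
    for (size_t i = 1; i < q->count; i++)
        q->w[i - 1] = q->w[i];
    q->count--;
    return id;
}
```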

Pulses

In addition to the synchronous Send/Receive/Reply services, the OS also supports fixed-size, nonblocking messages. These are referred to as pulses and carry a small payload (four bytes of data plus a single byte code).

Pulses pack a relatively small payload: eight bits of code and 32 bits of data. Pulses are often used as a notification mechanism within interrupt handlers. They also allow servers to signal clients without blocking on them.

(Figure: a pulse consists of an 8-bit code and a 32-bit value.)

Pulses pack a small payload.

Priority inheritance

A server process receives messages in priority order. As the threads within the server receive requests, they then inherit the priority of the sending thread (but not the scheduling algorithm). As a result, the relative priorities of the threads requesting work of the server are preserved, and the server work will be executed at the appropriate priority. This message-driven priority inheritance avoids priority-inversion problems.


Message-passing API

The message-passing API consists of the following functions:

Function            Description

MsgSend()           Send a message and block until reply.
MsgReceive()        Wait for a message.
MsgReceivePulse()   Wait for a tiny, nonblocking message (pulse).
MsgReply()          Reply to a message.
MsgError()          Reply only with an error status. No message bytes are transferred.
MsgRead()           Read additional data from a received message.
MsgWrite()          Write additional data to a reply message.
MsgInfo()           Obtain info on a received message.
MsgSendPulse()      Send a tiny, nonblocking message (pulse).
MsgDeliverEvent()   Deliver an event to a client.
MsgKeyData()        Key a message to allow security checks.

Robust implementations with Send/Receive/Reply

Architecting a QNX Neutrino application as a team of cooperating threads and processes via Send/Receive/Reply results in a system that uses synchronous notification. IPC thus occurs at specified transitions within the system, rather than asynchronously.

A significant problem with asynchronous systems is that event notification requires signal handlers to be run. Asynchronous IPC can make it difficult to thoroughly test the operation of the system and make sure that no matter when the signal handler runs, processing will continue as intended. Applications often try to avoid this scenario by relying on a “window” explicitly opened and shut, during which signals will be tolerated.

With a synchronous, nonqueued system architecture built around Send/Receive/Reply, robust application architectures can be very readily implemented and delivered.

Avoiding deadlock situations is another difficult problem when constructing applications from various combinations of queued IPC, shared memory, and miscellaneous synchronization primitives. For example, suppose thread A doesn’t release mutex 1 until thread B releases mutex 2. Unfortunately, if thread B is in the state of not releasing mutex 2 until thread A releases mutex 1, a standoff results. Simulation tools are often invoked in order to ensure that deadlock won’t occur as the system runs.

The Send/Receive/Reply IPC primitives allow the construction of deadlock-free systems with the observation of only these simple rules:

1 Never have two threads send to each other.

2 Always arrange your threads in a hierarchy, with sends going up the tree.

The first rule is an obvious avoidance of the standoff situation, but the second rule requires further explanation. The team of cooperating threads and processes is arranged as follows:


(Figure: a hierarchy of threads with A at the top, B and C below it, and D, E, and F at the bottom; the MsgSend() arrows point upward.)

Threads should always send up to higher-level threads.

Here the threads at any given level in the hierarchy never send to each other, but send only upwards instead.

One example of this might be a client application that sends to a database server process, which in turn sends to a filesystem process. Since the sending threads block and wait for the target thread to reply, and since the target thread isn’t send-blocked on the sending thread, deadlock can’t happen.
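The two rules can be checked mechanically for a planned design. The sketch below assumes each thread is assigned a level (0 at the top of the hierarchy) and each intended MsgSend() is recorded as an edge; the structure and function names are hypothetical.

```c
#include <stddef.h>

struct send_edge {
    int from;   /* index of the sending thread */
    int to;     /* index of the thread it sends to */
};

/* level[i] is the level of thread i in the hierarchy; 0 is the top.
 * Returns 1 if every send goes to a strictly higher level (a smaller
 * level number), otherwise 0. */
int sends_go_up(const int *level, const struct send_edge *edges, size_t n_edges)
{
    for (size_t i = 0; i < n_edges; i++) {
        if (level[edges[i].to] >= level[edges[i].from])
            return 0;   /* sideways or downward send: potential deadlock */
    }
    return 1;
}
```

If every send strictly decreases the level number, no cycle of sends can form, so two threads can never end up sending to each other, directly or through intermediaries.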

But how does a higher-level thread notify a lower-level thread that it has the results of a previously requested operation? (Assume the lower-level thread didn’t want to wait for the replied results when it last sent.)

QNX Neutrino provides a very flexible architecture with the MsgDeliverEvent() kernel call to deliver nonblocking events. All of the common asynchronous services can be implemented with this. For example, the server-side of the select() call is an API that an application can use to allow a thread to wait for an I/O event to complete on a set of file descriptors. In addition to an asynchronous notification mechanism being needed as a “back channel” for notifications from higher-level threads to lower-level threads, we can also build a reliable notification system for timers, hardware interrupts, and other event sources around this.


[Figure: threads A and B, with arrows labeled MsgSend() and MsgSendPulse() or MsgDeliverEvent().]

A higher-level thread can “send” a pulse event.

A related issue is the problem of how a higher-level thread can request work of a lower-level thread without sending to it, which would risk deadlock. The lower-level thread is present only to serve as a “worker thread” for the higher-level thread, doing work on request. The lower-level thread would send in order to “report for work,” but the higher-level thread wouldn’t reply then. It would defer the reply until the higher-level thread had work to be done, and it would reply (which is a nonblocking operation) with the data describing the work. In effect, the reply is being used to initiate work, not the send, which neatly side-steps rule #1.

Events

A significant advance in the kernel design for QNX Neutrino is the event-handling subsystem. POSIX and its realtime extensions define a number of asynchronous notification methods (e.g. UNIX signals that don’t queue or pass data, POSIX realtime signals that may queue and pass data, etc.).

The kernel also defines additional, QNX-specific notification techniques such as pulses. Implementing all of these event mechanisms could have consumed significant code space, so our implementation strategy was to build all of these notification methods over a single, rich, event subsystem.

A benefit of this approach is that capabilities exclusive to one notification technique can become available to others. For example, an application can apply the same queueing services of POSIX realtime signals to UNIX signals. This can simplify the robust implementation of signal handlers within applications.

The events encountered by an executing thread can come from any of three sources:

• a MsgDeliverEvent() kernel call invoked by a thread

• an interrupt handler

• the expiry of a timer.

The event itself can be any of a number of different types: QNX Neutrino pulses, interrupts, various forms of signals, and forced “unblock” events. “Unblock” is a means by which a thread can be released from a deliberately blocked state without any explicit event actually being delivered.

Given this multiplicity of event types, and applications needing the ability to request whichever asynchronous notification technique best suits their needs, it would be awkward to require that server processes (the higher-level threads from the previous section) carry code to support all these options.

Instead, the client thread can give a data structure, or “cookie,” to the server to hang on to until later. When the server needs to notify the client thread, it will invoke MsgDeliverEvent() and the microkernel will set the event type encoded within the cookie upon the client thread.

[Figure: a Server and a Client, with arrows labeled MsgSend() (carrying a sigevent), MsgReply(), and MsgDeliverEvent().]

The client sends a sigevent to the server.


I/O notification

The ionotify() function is a means by which a client thread can request asynchronous event delivery. Many of the POSIX asynchronous services (e.g. mq_notify() and the client side of select()) are built on top of it. When performing I/O on a file descriptor (fd), the thread may choose to wait for an I/O event to complete (for the write() case), or for data to arrive (for the read() case). Rather than have the thread block on the resource manager process that’s servicing the read/write request, ionotify() can allow the client thread to post an event to the resource manager that the client thread would like to receive when the indicated I/O condition occurs. Waiting in this manner allows the thread to continue executing and responding to event sources other than just the single I/O request.

The select() call is implemented using I/O notification and allows a thread to block and wait for a mix of I/O events on multiple fd’s while continuing to respond to other forms of IPC.

Here are the conditions upon which the requested event can be delivered:

_NOTIFY_COND_OUTPUT: there’s room in the output buffer for more data.

_NOTIFY_COND_INPUT: a resource-manager-defined amount of data is available to read.

_NOTIFY_OUT_OF_BAND: resource-manager-defined “out of band” data is available.
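As a rough illustration of waiting on several file descriptors at once, the sketch below uses the standard select() call on two descriptors. It is plain POSIX code rather than anything QNX-specific, and it doesn’t show ionotify() itself; the helper name wait_for_ready() is an assumption made for this example.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until one of two descriptors is readable.
 * Returns the readable descriptor, or -1 on error. If both are ready,
 * fd_a is reported first. */
int wait_for_ready(int fd_a, int fd_b)
{
    fd_set readable;
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;

    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);
    FD_SET(fd_b, &readable);

    /* NULL timeout: wait indefinitely for input on either descriptor. */
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) == -1)
        return -1;

    if (FD_ISSET(fd_a, &readable))
        return fd_a;
    if (FD_ISSET(fd_b, &readable))
        return fd_b;
    return -1;
}
```

A caller might create two pipes, write a byte into the second one, and expect wait_for_ready() to return the read end of that pipe.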

Signals

The OS supports the 32 standard POSIX signals (as in UNIX) as well as the POSIX realtime signals, both numbered from a kernel-implemented set of 64 signals with uniform functionality. While the POSIX standard defines realtime signals as differing from UNIX-style signals (in that they may contain four bytes of data and a byte code and may be queued for delivery), this functionality can be explicitly selected or deselected on a per-signal basis, allowing this converged implementation to still comply with the standard.


Incidentally, the UNIX-style signals can select POSIX realtime signal queuing, if the application wants it. QNX Neutrino also extends the signal-delivery mechanisms of POSIX by allowing signals to be targeted at specific threads, rather than simply at the process containing the threads. Since signals are an asynchronous event, they’re also implemented with the event-delivery mechanisms.

Microkernel call    POSIX call                                     Description
SignalKill()        kill(), pthread_kill(), raise(), sigqueue()    Set a signal on a process group, process, or thread.
SignalAction()      sigaction()                                    Define action to take on receipt of a signal.
SignalProcmask()    sigprocmask(), pthread_sigmask()               Change signal blocked mask of a thread.
SignalSuspend()     sigsuspend(), pause()                          Block until a signal invokes a signal handler.
SignalWaitinfo()    sigwaitinfo()                                  Wait for signal and return info on it.

The original POSIX specification defined signal operation on processes only. In a multithreaded process, the following rules are followed:

• The signal actions are maintained at the process level. If a thread ignores or catches a signal, it affects all threads within the process.

• The signal mask is maintained at the thread level. If a thread blocks a signal, it affects only that thread.

• An un-ignored signal targeted at a thread will be delivered to that thread alone.

• An un-ignored signal targeted at a process is delivered to the first thread that doesn’t have the signal blocked. If all threads have the signal blocked, the signal will be queued on the process until any thread ignores or unblocks the signal. If ignored, the signal on the process will be removed. If unblocked, the signal will be moved from the process to the thread that unblocked it.

When a signal is targeted at a process with a large number of threads, the thread table must be scanned, looking for a thread with the signal unblocked. Standard practice for most multithreaded processes is to mask the signal in all threads but one, which is dedicated to handling them. To increase the efficiency of process-signal delivery, the kernel will cache the last thread that accepted a signal and will always attempt to deliver the signal to it first.


[Figure: a process with its signal-ignore mask, its signal queue (signals 64...1), and a thread vector; each thread has its own signal-blocked mask and signal queue. Labels: “Signals delivered to process but blocked by all threads,” “Signals queued to this thread,” and “Signals delivered to this thread.”]

Signal delivery.

The POSIX standard includes the concept of queued realtime signals. QNX Neutrino supports optional queuing of any signal, not just realtime signals. The queuing can be specified on a signal-by-signal basis within a process. Each signal can have an associated 8-bit code and a 32-bit value.

This is very similar to message pulses described earlier. The kernel takes advantage of this similarity and uses common code for managing both signals and pulses. The signal number is mapped to a pulse priority using SIGMAX - signo. As a result, signals are delivered in priority order with lower signal numbers having higher priority. This conforms with the POSIX standard, which states that existing signals have priority over the new realtime signals.
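To make the mapping concrete, here is a tiny sketch of the arithmetic. It assumes SIGMAX is 64, in line with the 64 kernel signals described in the text; the macro and function names are invented for this example and aren’t kernel interfaces.

```c
/* Assumption: SIGMAX is 64, matching the 64 kernel signals described
 * in the text; the macro name below is made up for this sketch. */
#define SKETCH_SIGMAX 64

/* Pulse priority used for a given signal number, per SIGMAX - signo. */
int signal_pulse_priority(int signo)
{
    return SKETCH_SIGMAX - signo;
}
```

Under that assumption, signal 1 maps to priority 63 and signal 64 maps to priority 0, so a traditional signal such as number 9 always outranks a realtime signal such as number 41.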


Special signals

As mentioned earlier, the OS defines a total of 64 signals. Their range is as follows:

Signal range    Description
1 ... 56        56 POSIX signals (including traditional UNIX signals)
41 ... 56       16 POSIX realtime signals (SIGRTMIN to SIGRTMAX)
57 ... 64       Eight special-purpose QNX Neutrino signals

The eight special signals cannot be ignored or caught. An attempt to call the signal() or sigaction() functions or the SignalAction() kernel call to change them will fail with an error of EINVAL.

In addition, these signals are always blocked and have signal queuing enabled. An attempt to unblock these signals via the sigprocmask() function or SignalProcmask() kernel call will be quietly ignored.

A regular signal can be programmed to this behavior using the following standard signal calls. The special signals save the programmer from writing this code and protect the signal from accidental changes to this behavior.

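    sigset_t set;                   /* an object, not an uninitialized pointer */
    struct sigaction action;

    /* Block the signal so that it is accepted only synchronously. */
    sigemptyset(&set);
    sigaddset(&set, signo);
    sigprocmask(SIG_BLOCK, &set, NULL);

    /* Keep the default action, but request queuing with SA_SIGINFO. */
    action.sa_handler = SIG_DFL;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    sigaction(signo, &action, NULL);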

This configuration makes these signals suitable for synchronous notification using the sigwaitinfo() function or SignalWaitinfo() kernel call. The following code will block until the eighth special signal is received:

    sigset_t set;                   /* an object, not an uninitialized pointer */
    siginfo_t info;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMAX + 8);
    sigwaitinfo(&set, &info);
    printf("Received signal %d with code %d and value %d\n",
           info.si_signo,
           info.si_code,
           info.si_value.sival_int);

Since the signals are always blocked, the program cannot be interrupted or killed if the special signal is delivered outside of the sigwaitinfo() function. Since signal queuing is always enabled, signals won’t be lost; they’ll be queued for the next sigwaitinfo() call.

These signals were designed to solve a common IPC requirement where a server wishes to notify a client that it has information available for the client. The server will use the MsgDeliverEvent() call to notify the client. There are two reasonable choices for the event within the notification: pulses or signals.

A pulse is the preferred method for a client that may also be a server to other clients. In this case, the client will have created a channel for receiving messages and can also receive the pulse.

This won’t be true for most simple clients. In order to receive a pulse, a simple client would be forced to create a channel for this express purpose. A signal can be used in place of a pulse if the signal is configured to be synchronous (i.e. the signal is blocked) and queued, which is exactly how the special signals are configured. The client would replace the MsgReceive() call used to wait for a pulse on a channel with a simple sigwaitinfo() call to wait for the signal.

This signal mechanism is used by Photon to wait for events and by the select() function to wait for I/O from multiple servers. Of the eight special signals, the first two have been given special names for this use.

    #define SIGSELECT (SIGRTMAX + 1)
    #define SIGPHOTON (SIGRTMAX + 2)

Summary of signals

Signal       Description
SIGABRT      Abnormal termination signal such as issued by the abort() function.
SIGALRM      Timeout signal such as issued by the alarm() function.
SIGBUS       Indicates a memory parity error (QNX-specific interpretation). Note that if a second fault occurs while your process is in a signal handler for this fault, the process will be terminated.
SIGCHLD      Child process terminated. The default action is to ignore the signal.
SIGCONT      Continue if HELD. The default action is to ignore the signal if the process isn’t HELD.
SIGDEADLK    Mutex deadlock occurred. If you haven’t called SyncMutexEvent(), and if the conditions that would cause the kernel to deliver the event occur, then the kernel delivers a SIGDEADLK instead.
SIGEMT       EMT instruction (emulator trap).
SIGFPE       Erroneous arithmetic operation (integer or floating point), such as division by zero or an operation resulting in overflow. Note that if a second fault occurs while your process is in a signal handler for this fault, the process will be terminated.
SIGHUP       Death of session leader, or hangup detected on controlling terminal.
SIGILL       Detection of an invalid hardware instruction. Note that if a second fault occurs while your process is in a signal handler for this fault, the process will be terminated.
SIGINT       Interactive attention signal (Break).
SIGIOT       IOT instruction (not generated on x86 hardware).
SIGKILL      Termination signal; should be used only for emergency situations. This signal cannot be caught or ignored.
SIGPIPE      Attempt to write on a pipe with no readers.
SIGPOLL      Pollable event occurred.
SIGQUIT      Interactive termination signal.
SIGSEGV      Detection of an invalid memory reference. Note that if a second fault occurs while your process is in a signal handler for this fault, the process will be terminated.
SIGSTOP      Stop process (the default). This signal cannot be caught or ignored.
SIGSYS       Bad argument to system call.
SIGTERM      Termination signal.
SIGTRAP      Unsupported software interrupt.
SIGTSTP      Stop signal generated from keyboard.
SIGTTIN      Background read attempted from control terminal.
SIGTTOU      Background write attempted to control terminal.
SIGURG       Urgent condition present on socket.
SIGUSR1      Reserved as application-defined signal 1.
SIGUSR2      Reserved as application-defined signal 2.
SIGWINCH     Window size changed.

POSIX message queues

POSIX defines a set of nonblocking message-passing facilities known as message queues. Like pipes, message queues are named objects that operate with “readers” and “writers.” As a priority queue of discrete messages, a message queue has more structure than a pipe and offers applications more control over communications.

To use POSIX message queues in QNX Neutrino, the message queue server must be running. QNX Neutrino has two implementations of message queues:

• a “traditional” implementation that uses the mqueue resource manager (see the Resource Managers chapter in this book)

• an alternate implementation that uses the mq server and asynchronous messages.

For more information about these implementations, see the Utilities Reference.

Unlike our inherent message-passing primitives, the POSIX message queues reside outside the kernel.

Why use POSIX message queues?

POSIX message queues provide a familiar interface for many realtime programmers. They are similar to the “mailboxes” found in many realtime executives.

There’s a fundamental difference between our messages and POSIX message queues. Our messages block: they copy their data directly between the address spaces of the processes sending the messages. POSIX message queues, on the other hand, implement a store-and-forward design in which the sender need not block and may have many outstanding messages queued. POSIX message queues exist independently of the processes that use them. You would likely use message queues in a design where a number of named queues will be operated on by a variety of processes over time.

For raw performance, POSIX message queues will be slower than QNX Neutrino native messages for transferring data. However, the flexibility of queues may make this small performance penalty worth the cost.

File-like interface

Message queues resemble files, at least as far as their interface is concerned. You open a message queue with mq_open(), close it with mq_close(), and destroy it with mq_unlink(). And to put data into (“write”) and take it out of (“read”) a message queue, you use mq_send() and mq_receive().

For strict POSIX conformance, you should create message queues that start with a single slash (/) and contain no other slashes. But note that we extend the POSIX standard by supporting pathnames that may contain multiple slashes. This allows, for example, a company to place all its message queues under its company name and distribute a product with increased confidence that a queue name will not conflict with that of another company.

In QNX Neutrino, all message queues created will appear in the filename space under the directory:

• /dev/mqueue if you’re using the traditional (mqueue) implementation

• /dev/mq if you’re using the alternate (mq) implementation.

For example, with the traditional implementation:


mq_open() name:    Pathname of message queue:
/data              /dev/mqueue/data
/acme/data         /dev/mqueue/acme/data
/qnx/data          /dev/mqueue/qnx/data

You can display all message queues in the system using the ls command as follows:

ls -Rl /dev/mqueue

The size printed is the number of messages waiting.

Message-queue functions

POSIX message queues are managed via the following functions:

Function        Description
mq_open()       Open a message queue.
mq_close()      Close a message queue.
mq_unlink()     Remove a message queue.
mq_send()       Add a message to the message queue.
mq_receive()    Receive a message from the message queue.
mq_notify()     Tell the calling process that a message is available on a message queue.
mq_setattr()    Set message queue attributes.
mq_getattr()    Get message queue attributes.
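A short sketch may help show how these functions fit together. It uses only standard POSIX calls; the queue name, the sizes in mq_attr, and the function name mq_priority_demo() are assumptions chosen for the example. On QNX Neutrino it would also need one of the message queue servers to be running, as noted earlier.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: send two messages at different priorities and
 * read back the first one delivered. Copies it into 'out' and returns
 * its priority, or -1 on error. */
int mq_priority_demo(char *out, size_t outlen)
{
    char name[64];
    char buf[64];
    struct mq_attr attr;
    unsigned int prio = 0;
    ssize_t n;
    mqd_t mq;

    /* Queue name for this sketch only: one leading slash, no others. */
    snprintf(name, sizeof name, "/sketch_mq_%ld", (long)getpid());

    memset(&attr, 0, sizeof attr);
    attr.mq_maxmsg = 4;             /* at most 4 queued messages */
    attr.mq_msgsize = sizeof buf;   /* each up to 64 bytes */

    mq = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, &attr);
    if (mq == (mqd_t)-1)
        return -1;

    /* The sender doesn't block here: the queue stores both messages. */
    if (mq_send(mq, "routine", strlen("routine") + 1, 1) == -1 ||
        mq_send(mq, "urgent", strlen("urgent") + 1, 5) == -1) {
        mq_close(mq);
        mq_unlink(name);
        return -1;
    }

    /* The highest-priority message comes out first. */
    n = mq_receive(mq, buf, sizeof buf, &prio);

    mq_close(mq);
    mq_unlink(name);

    if (n < 0 || (size_t)n > outlen)
        return -1;
    memcpy(out, buf, (size_t)n);
    return (int)prio;
}
```

Although “routine” was sent first, the receiver gets “urgent” (priority 5) first, which is the priority-queue behavior that distinguishes a message queue from a pipe.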


Shared memory

Shared memory offers the highest bandwidth IPC available. Once a shared-memory object is created, processes with access to the object can use pointers to directly read and write into it. This means that access to shared memory is in itself unsynchronized. If a process is updating an area of shared memory, care must be taken to prevent another process from reading or updating the same area. Even in the simple case of a read, the other process may get information that is in flux and inconsistent.

To solve these problems, shared memory is often used in conjunction with one of the synchronization primitives to make updates atomic between processes. If the granularity of updates is small, then the synchronization primitives themselves will limit the inherently high bandwidth of using shared memory. Shared memory is therefore most efficient when used for updating large amounts of data as a block.

Both semaphores and mutexes are suitable synchronization primitives for use with shared memory. Semaphores were introduced with the POSIX realtime standard for interprocess synchronization. Mutexes were introduced with the POSIX threads standard for thread synchronization. Mutexes may also be used between threads in different processes. POSIX considers this an optional capability; we support it. In general, mutexes are more efficient than semaphores.
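As a minimal sketch of a mutex used between processes, the code below places a pthread mutex and a counter in a shared mapping and lets a parent and child update the counter under the lock. It relies on the optional PTHREAD_PROCESS_SHARED attribute mentioned above. The structure, the function name, and the use of an anonymous shared mapping inherited across fork() are assumptions made to keep the example short.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANON on glibc */
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

/* Hypothetical demo: parent and child each add 'n' to a counter in
 * shared memory under a process-shared mutex. Returns the final count,
 * or -1 on error. */
long shared_counter_demo(long n)
{
    struct shared_counter *sc;
    pthread_mutexattr_t attr;
    pid_t pid;
    long i, result;

    /* Anonymous shared mapping: inherited by the child across fork(). */
    sc = mmap(NULL, sizeof *sc, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANON, -1, 0);
    if (sc == MAP_FAILED)
        return -1;

    /* Optional POSIX capability: a mutex usable between processes. */
    pthread_mutexattr_init(&attr);
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutex_init(&sc->lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        munmap(sc, sizeof *sc);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);
    sc->value = 0;

    pid = fork();
    if (pid == -1) {
        munmap(sc, sizeof *sc);
        return -1;
    }

    /* Both processes run this loop; the lock makes each update atomic. */
    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&sc->lock);
        sc->value++;
        pthread_mutex_unlock(&sc->lock);
    }

    if (pid == 0)
        _exit(0);               /* the child is done */

    waitpid(pid, NULL, 0);
    result = sc->value;
    pthread_mutex_destroy(&sc->lock);
    munmap(sc, sizeof *sc);
    return result;
}
```

Locking around every single increment is exactly the small-granularity case the text warns about; a real design would usually update a larger block of data per lock acquisition.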

Shared memory with message passing

Shared memory and message passing can be combined to provide IPC that offers:

• very high performance (shared memory)

• synchronization (message passing)

• network transparency (message passing).

Using message passing, a client sends a request to a server and blocks. The server receives the messages in priority order from clients, processes them, and replies when it can satisfy a request. At this point, the client is unblocked and continues. The very act of sending messages provides natural synchronization between the client and the server. Rather than copy all the data through the message pass, the message can contain a reference to a shared-memory region, so the server could read or write the data directly. This is best explained with a simple example.

Let’s assume a graphics server accepts draw image requests from clients and renders them into a frame buffer on a graphics card. Using message passing alone, the client would send a message containing the image data to the server. This would result in a copy of the image data from the client’s address space to the server’s address space. The server would then render the image and issue a short reply.

If the client didn’t send the image data inline with the message, but instead sent a reference to a shared-memory region that contained the image data, then the server could access the client’s data directly.

Since the client is blocked on the server as a result of sending it a message, the server knows that the data in shared memory is stable and will not change until the server replies. This combination of message passing and shared memory achieves natural synchronization and very high performance.

This model of operation can also be reversed: the server can generate data and give it to a client. For example, suppose a client sends a message to a server that will read video data directly from a CD-ROM into a shared memory buffer provided by the client. The client will be blocked on the server while the shared memory is being changed. When the server replies and the client continues, the shared memory will be stable for the client to access. This type of design can be pipelined using more than one shared-memory region.

Simple shared memory can’t be used between processes on different computers connected via a network. Message passing, on the other hand, is network transparent. A server could use shared memory for local clients and full message passing of the data for remote clients. This allows you to provide a high-performance server that is also network transparent.


In practice, the message-passing primitives are more than fast enough for the majority of IPC needs. The added complexity of a combined approach need only be considered for special applications with very high bandwidth.

Creating a shared-memory object

Multiple threads within a process share the memory of that process. To share memory between processes, you must first create a shared-memory region and then map that region into your process’s address space. Shared-memory regions are created and manipulated using the following calls:

Function        Description
shm_open()      Open (or create) a shared-memory region.
close()         Close a shared-memory region.
mmap()          Map a shared-memory region into a process’s address space.
munmap()        Unmap a shared-memory region from a process’s address space.
mprotect()      Change protections on a shared-memory region.
msync()         Synchronize memory with physical storage.
shm_ctl()       Give special attributes to a shared-memory object.
shm_unlink()    Remove a shared-memory region.

POSIX shared memory is implemented in QNX Neutrino via the process manager (procnto). The above calls are implemented as messages to procnto (see the Process Manager chapter in this book).

The shm_open() function takes the same arguments as open() and returns a file descriptor to the object. As with a regular file, this function lets you create a new shared-memory object or open an existing shared-memory object.


When a new shared-memory object is created, the size of the object is set to zero. To set the size, you use the ftruncate() or shm_ctl() function. Note that ftruncate() is the very same function used to set the size of a file.
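The create-then-size sequence might look like the following sketch, which uses only standard POSIX calls. The object name and the helper name create_sized_shm() are assumptions for the example; the object is removed again before the function returns.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a shared-memory object, give it a size,
 * and report the size the system recorded (or -1 on error). */
long create_sized_shm(long size)
{
    char name[64];
    struct stat st;
    long reported = -1;
    int fd;

    /* Object name for this sketch only. */
    snprintf(name, sizeof name, "/sketch_shm_%ld", (long)getpid());

    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return -1;

    /* A new object starts at size zero; ftruncate() sets its size. */
    if (fstat(fd, &st) == 0 && st.st_size == 0 &&
        ftruncate(fd, (off_t)size) == 0 &&
        fstat(fd, &st) == 0)
        reported = (long)st.st_size;

    close(fd);
    shm_unlink(name);
    return reported;
}
```

Calling create_sized_shm(4096) would be expected to report 4096 once ftruncate() has succeeded.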

mmap()

Once you have a file descriptor to a shared-memory object, you use the mmap() function to map the object, or part of it, into your process’s address space. The mmap() function is the cornerstone of memory management within QNX Neutrino and deserves a detailed discussion of its capabilities.

The mmap() function is defined as follows:

    void * mmap(void *where_i_want_it, size_t length,
                int memory_protections, int mapping_flags, int fd,
                off_t offset_within_shared_memory);

In simple terms this says: “Map in length bytes of shared memory at offset_within_shared_memory in the shared-memory object associated with fd.”

The mmap() function will try to place the memory at the address where_i_want_it in your address space. The memory will be given the protections specified by memory_protections and the mapping will be done according to the mapping_flags.

The three arguments fd, offset_within_shared_memory, and length define a portion of a particular shared object to be mapped in. It’s common to map in an entire shared object, in which case the offset will be zero and the length will be the size of the shared object in bytes. On an Intel processor, the length will be a multiple of the page size, which is 4096 bytes.


[Figure: mmap(addr, len, prot, flags, fd, offset) maps len bytes, starting at offset in the shared memory object, to len bytes at addr in the process address space.]

Arguments to mmap().

The return value of mmap() will be the address in your process’s address space where the object was mapped. The argument where_i_want_it is used as a hint by the system to where you want the object placed. If possible, the object will be placed at the address requested. Most applications specify an address of zero, which gives the system free rein to place the object where it wishes.
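Putting shm_open(), ftruncate(), and mmap() together, a minimal sketch of sharing data between two processes could look like this. It passes an address of zero (NULL) so the system picks the location. The object name, the 4096-byte length, and the function name are assumptions, and the child inherits the mapping through fork() rather than opening the object by name, purely to keep the example compact.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_LEN 4096   /* assumed length for this sketch */

/* Hypothetical demo: a child writes a string into a shared-memory
 * object and the parent reads it back through the shared mapping.
 * Returns 0 on success with the text copied to 'out'. */
int shm_roundtrip(char *out, size_t outlen)
{
    char name[64];
    char *addr;
    pid_t pid;
    int fd;

    snprintf(name, sizeof name, "/sketch_map_%ld", (long)getpid());
    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, SKETCH_LEN) == -1) {
        close(fd);
        shm_unlink(name);
        return -1;
    }

    /* Address 0 (NULL): let the system choose where to map. */
    addr = mmap(NULL, SKETCH_LEN, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    close(fd);            /* the mapping stays valid after close() */
    shm_unlink(name);     /* and after the name is removed */
    if (addr == MAP_FAILED)
        return -1;

    pid = fork();
    if (pid == 0) {
        strcpy(addr, "hello from child");
        _exit(0);
    }
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        munmap(addr, SKETCH_LEN);
        return -1;
    }

    /* waitpid() is the only synchronization used in this sketch. */
    strncpy(out, addr, outlen - 1);
    out[outlen - 1] = '\0';
    munmap(addr, SKETCH_LEN);
    return 0;
}
```

Two unrelated processes would instead each call shm_open() with an agreed name and map the object themselves; the addresses they get back need not match.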

The following protection types may be specified for memory_protections:

Manifest        Description
PROT_EXEC       Memory may be executed.
PROT_NOCACHE    Memory should not be cached.
PROT_NONE       No access allowed.
PROT_READ       Memory may be read.
PROT_WRITE      Memory may be written.

The PROT_NOCACHE manifest should be used when a shared-memory region is used to gain access to dual-ported memory that may be modified by hardware (e.g. a video frame buffer or a memory-mapped network or communications board). Without this manifest, the processor may return “stale” data from a previously cached read.

The mapping_flags determine how the memory is mapped. These flags are broken down into two parts. The first part is a type and must be specified as one of the following:

Map type       Description
MAP_SHARED     The mapping is shared by the calling processes.
MAP_PRIVATE    The mapping is private to the calling process. It allocates system RAM and makes a copy of the object.

The MAP_SHARED type is the one to use for setting up shared memory between processes; MAP_PRIVATE has more specialized uses.

A number of flags may be ORed into the above type to further define the mapping. These are described in detail in the mmap() entry in the Library Reference. A few of the more interesting flags are:


Map type modifier    Description
MAP_ANON             You commonly use this with MAP_PRIVATE; you must set the fd parameter to NOFD. You can use MAP_ANON as the basis for a page-level memory allocator. By default, the allocated memory is zero-filled; see “Initializing allocated memory,” below.
MAP_FIXED            Map object to the address specified by where_i_want_it. If a shared-memory region contains pointers within it, then you may need to force the region at the same address in all processes that map it. This can be avoided by using offsets within the region in place of direct pointers.
MAP_PHYS             This flag indicates that you wish to deal with physical memory. The fd parameter should be set to NOFD. When used with MAP_SHARED, the offset_within_shared_memory specifies the exact physical address to map (e.g. for video frame buffers). If used with MAP_ANON then physically contiguous memory is allocated (e.g. for a DMA buffer). MAP_NOX64K and MAP_BELOW16M are used to further define the MAP_ANON allocated memory and address limitations present in some forms of DMA.
MAP_NOX64K           Used with MAP_PHYS | MAP_ANON. The allocated memory area will not cross a 64K boundary. This is required for the old 16-bit PC DMA.
MAP_BELOW16M         Used with MAP_PHYS | MAP_ANON. The allocated memory area will reside in physical memory below 16M. This is necessary when using DMA with ISA bus devices.
MAP_NOINIT           Relax the POSIX requirement to zero the allocated memory; see “Initializing allocated memory,” below.

Using the mapping flags described above, a process can easily share memory with other processes:

    /* Map in a shared memory region */
    fd = shm_open("datapoints", O_RDWR, 0);  /* mode is ignored without O_CREAT */
    addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Or share memory with hardware such as video memory:

    /* Map in VGA display memory */
    addr = mmap(0, 65536, PROT_READ|PROT_WRITE,
                MAP_PHYS|MAP_SHARED, NOFD, 0xa0000);

Or allocate a DMA buffer for a bus-mastering PCI network card:

    /* Allocate a physically contiguous buffer */
    addr = mmap(0, 262144, PROT_READ|PROT_WRITE|PROT_NOCACHE,
                MAP_PHYS|MAP_ANON, NOFD, 0);

You can unmap all or part of a shared-memory object from your address space using munmap(). This primitive isn’t restricted to unmapping shared memory; it can be used to unmap any region of memory within your process. When used in conjunction with the MAP_ANON flag to mmap(), you can easily implement a private page-level allocator/deallocator.

You can change the protections on a mapped region of memory using mprotect(). Like munmap(), mprotect() isn’t restricted to shared-memory regions; it can change the protection on any region of memory within your process.


Initializing allocated memory

POSIX requires that mmap() zero any memory that it allocates. It can take a while to initialize the memory, so QNX Neutrino provides a way to relax the POSIX requirement. This allows for faster starting, but can be a security problem.

Note: This feature was added in the QNX Neutrino Core OS 6.3.2.

Avoiding initializing the memory requires the cooperation of the process doing the unmapping and the one doing the mapping:

• The munmap_flags() function is a non-POSIX function that’s similar to munmap() but lets you control what happens when the memory is next mapped:

      int munmap_flags( void *addr, size_t len,
                        unsigned flags );

  Pass one of the following for the flags argument:

  0
      Behave like munmap().

  UNMAP_INIT_REQUIRED
      POSIX initialization of the page to all zeroes is required the next time the underlying physical memory is allocated.

  UNMAP_INIT_OPTIONAL
      Initialization of the underlying physical memory to zeroes on its next allocation is optional.

• If you specify the MAP_NOINIT flag to mmap(), and the physical memory being mapped was previously unmapped with UNMAP_INIT_OPTIONAL, the POSIX requirement that the memory be zeroed is relaxed.

By default, the kernel initializes the memory, but you can control this by using the -m option to procnto. The argument to this option is a string that lets you enable or disable aspects of the memory manager:

i     munmap() acts as if UNMAP_INIT_REQUIRED were specified.

~i    munmap() acts as if UNMAP_INIT_OPTIONAL were specified.

Note again that munmap_flags() with a flags argument of 0 behaves the same as munmap() does.

Pipes and FIFOs

To use pipes or FIFOs in QNX Neutrino, the pipe resource manager (pipe) must be running.

Pipes

A pipe is an unnamed file that serves as an I/O channel between two or more cooperating processes: one process writes into the pipe, the other reads from the pipe. The pipe manager takes care of buffering the data. The buffer size is defined as PIPE_BUF in the <limits.h> file; the pathconf() function returns the value of this limit. A pipe is removed once both of its ends have closed.

Pipes are normally used when two processes want to run in parallel, with data moving from one process to the other in a single direction. (If bidirectional communication is required, messages should be used instead.)

A typical application for a pipe is connecting the output of one program to the input of another program. This connection is often made by the shell. For example:

    ls | more

directs the standard output from the ls utility through a pipe to the standard input of the more utility.


If you want to:                       Use the:

Create pipes from within the shell    pipe symbol (“|”)

Create pipes from within programs     pipe() or popen() functions
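For the programmatic case, the following sketch shows pipe() in its simplest form. The helper names pipe_roundtrip() and pipe_buf_limit() are hypothetical, and for brevity a single process holds both ends; in practice the read and write ends normally belong to different cooperating processes. The second helper queries the buffer limit with fpathconf(), the descriptor-based counterpart of pathconf().

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send a short message through a pipe and read it
 * back into out (capacity outlen). Returns the number of bytes read,
 * or -1 on error. Keep msg short: with no separate reader, a write
 * larger than the pipe's buffer would block. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;

    if (write(fds[1], msg, strlen(msg)) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* reader sees end-of-file after the data */

    n = read(fds[0], out, outlen);
    close(fds[0]);
    return n;
}

/* Hypothetical helper: report the PIPE_BUF limit for a freshly created
 * pipe, or -1 on error. */
long pipe_buf_limit(void)
{
    int fds[2];
    long limit;

    if (pipe(fds) == -1)
        return -1;
    limit = fpathconf(fds[0], _PC_PIPE_BUF);
    close(fds[0]);
    close(fds[1]);
    return limit;
}
```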

FIFOs

FIFOs are essentially the same as pipes, except that FIFOs are named permanent files that are stored in filesystem directories.

If you want to:                       Use the:

Create FIFOs from within the shell    mkfifo utility

Create FIFOs from within programs     mkfifo() function

Remove FIFOs from within the shell    rm utility

Remove FIFOs from within programs     remove() or unlink() function
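The programmatic entries in the table can be exercised with a short sketch such as the one below. The helper name and the path you pass in are hypothetical; the code creates a FIFO with mkfifo(), confirms with stat() that the directory entry really is a FIFO, and removes it with unlink().

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at path, check its type, remove it.
 * Returns 1 if the node was a FIFO, 0 if it was not, -1 on error. */
int fifo_create_check_remove(const char *path)
{
    struct stat st;
    int is_fifo;

    if (mkfifo(path, 0600) == -1)       /* owner read/write only */
        return -1;

    if (stat(path, &st) == -1) {
        unlink(path);
        return -1;
    }
    is_fifo = S_ISFIFO(st.st_mode) ? 1 : 0;

    if (unlink(path) == -1)             /* the name disappears from the directory */
        return -1;
    return is_fifo;
}
```

Because a FIFO is a named file, mkfifo() fails if the path already exists; a real program would decide whether to reuse or replace an existing entry.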

Clock and timer services

Clock services are used to maintain the time of day, which is in turn used by the kernel timer calls to implement interval timers.

Valid dates on a QNX Neutrino system range from January 1970 to January 2554, but note that POSIX-compliant code may be limited to the year 2038. The internal date and time representation reaches its maximum value in 2554. If your system must operate past 2554 and there’s no way for the system to be upgraded or modified in the field, you’ll have to take special care with system dates (contact us for help with this).
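The two year limits can be checked with a little integer arithmetic. The sketch below assumes that the internal representation is an unsigned 64-bit count of nanoseconds since 1970 and that the 2038 limit comes from code storing seconds in a signed 32-bit value; both are assumptions made for illustration, as is the use of the mean Gregorian year (365.2425 days). The function names are hypothetical.

```c
#include <stdint.h>

/* Convert a span in seconds to whole mean Gregorian years.
 * 365.2425 days is scaled by 10000 to stay in integer arithmetic. */
static uint64_t seconds_to_years(uint64_t seconds)
{
    uint64_t days = seconds / 86400ULL;
    return days * 10000ULL / 3652425ULL;
}

/* Year in which an unsigned 64-bit nanosecond count since 1970 runs out. */
unsigned limit_year_64bit_ns(void)
{
    return 1970u + (unsigned)seconds_to_years(UINT64_MAX / 1000000000ULL);
}

/* Year in which a signed 32-bit second count since 1970 runs out. */
unsigned limit_year_32bit_s(void)
{
    return 1970u + (unsigned)seconds_to_years((uint64_t)INT32_MAX);
}
```

Under these assumptions the first function yields 2554 and the second 2038, matching the figures quoted above.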

The ClockTime() kernel call allows you to get or set the system clock specified by an ID (CLOCK_REALTIME), which maintains the system time. Once set, the system time increments by some number of nanoseconds based on the resolution of the system clock. This resolution can be queried or changed using the ClockPeriod() call.

Within the system page, an in-memory data structure, there’s a 64-bit field (nsec) that holds the number of nanoseconds since the system was booted. The nsec field is always monotonically increasing and is never affected by setting the current time of day via ClockTime() or ClockAdjust().

The ClockCycles() function returns the current value of a free-running 64-bit cycle counter. This is implemented on each processor as a high-performance mechanism for timing short intervals. For example, on Intel x86 processors, an opcode that reads the processor’s time-stamp counter is used. On a Pentium processor, this counter increments on each clock cycle. A 100 MHz Pentium would have a cycle time of 1/100,000,000 seconds (10 nanoseconds). Other CPU architectures have similar instructions.

On processors that don’t implement such an instruction in hardware (e.g. a 386), the kernel will emulate one. This will provide a lower time resolution than if the instruction is provided (838.095345 nanoseconds on an IBM PC-compatible system).

In all cases, the SYSPAGE_ENTRY(qtime)->cycles_per_sec field gives the number of ClockCycles() increments in one second.
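To turn a difference between two counter readings into time, divide by the increments-per-second value. The sketch below takes both numbers as plain parameters so that it runs anywhere; on a QNX Neutrino target they would presumably come from ClockCycles() and the cycles_per_sec field described above. The function name cycles_to_ns() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds.
 * Whole seconds and the remainder are handled separately so that the
 * multiplication by 1e9 cannot overflow for any counter rate below
 * roughly 18 GHz. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t cycles_per_sec)
{
    uint64_t whole_sec = cycles / cycles_per_sec;
    uint64_t rem       = cycles % cycles_per_sec;

    return whole_sec * 1000000000ULL
         + rem * 1000000000ULL / cycles_per_sec;
}
```

For the 100 MHz example in the text, cycles_to_ns(1, 100000000) gives 10, i.e. one cycle lasts 10 nanoseconds.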

The ClockPeriod() function allows a thread to set the system timer to some multiple of nanoseconds; the OS kernel will do the best it can to satisfy the precision of the request with the hardware available.

The interval selected is always rounded down to an integral multiple of the precision of the underlying hardware timer. Of course, setting it to an extremely low value can result in a significant portion of CPU performance being consumed servicing timer interrupts.


Microkernel call    POSIX call                                         Description

ClockTime()         clock_gettime(), clock_settime()                   Get or set the time of day (using a 64-bit value in nanoseconds ranging from 1970 to 2554).

ClockAdjust()       N/A                                                Apply small time adjustments to synchronize clocks.

ClockCycles()       N/A                                                Read a 64-bit free-running high-precision counter.

ClockPeriod()       clock_getres()                                     Get or set the period of the clock.

ClockId()           clock_getcpuclockid(), pthread_getcpuclockid()     Return an integer that’s passed to ClockTime() as a clockid_t.

Time correction

In order to facilitate applying time corrections without having the system experience abrupt “steps” in time (or even having time jump backwards), the ClockAdjust() call provides the option to specify an interval over which the time correction is to be applied. This has the effect of speeding or retarding time over a specified interval until the system has synchronized to the indicated current time. This service can be used to implement network-coordinated time averaging between multiple nodes on a network.

Timers

QNX Neutrino directly provides the full set of POSIX timer functionality. Since these timers are quick to create and manipulate, they’re an inexpensive resource in the kernel.

The POSIX timer model is quite rich, providing the ability to have the timer expire on:

• an absolute date

• a relative date (i.e. n nanoseconds from now)

• cyclical (i.e. every n nanoseconds).

The cyclical mode is very significant, because the most common use of timers tends to be as a periodic source of events to “kick” a thread into life to do some processing and then go back to sleep until the next event. If the thread had to re-program the timer for every event, there would be the danger that time would slip unless the thread was programming an absolute date. Worse, if the thread doesn’t get to run on the timer event because a higher-priority thread is running, the date next programmed into the timer could be one that has already elapsed!

The cyclical mode circumvents these problems by requiring that the thread set the timer once and then simply respond to the resulting periodic source of events.
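The slip described above is easy to quantify with a small simulation. This is only an arithmetic model, not kernel code: it assumes the thread needs a fixed delay_ns after each wakeup before it re-arms a relative one-shot timer, and compares that with a cyclical timer that is programmed once. Both function names are hypothetical.

```c
#include <stdint.h>

/* Model: time (ns) of the n-th expiry when the thread re-arms a
 * relative one-shot timer after every event, delay_ns after waking. */
uint64_t nth_expiry_relative(unsigned n, uint64_t period_ns, uint64_t delay_ns)
{
    uint64_t now = 0;
    uint64_t expiry = 0;

    for (unsigned i = 0; i < n; ++i) {
        expiry = now + period_ns;   /* armed relative to the current time */
        now = expiry + delay_ns;    /* thread wakes, works, re-arms late */
    }
    return expiry;
}

/* Model: a cyclical timer is set once and fires on exact multiples. */
uint64_t nth_expiry_cyclical(unsigned n, uint64_t period_ns)
{
    return (uint64_t)n * period_ns;
}
```

With a 1 ms period and a 50 µs re-arming delay, the 1000th relative expiry lands at 1,049,950,000 ns instead of 1,000,000,000 ns: almost 50 ms of accumulated slip in one second.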

Since timers are another source of events in the OS, they also make use of its event-delivery system. As a result, the application can request that any of the Neutrino-supported events be delivered to the application upon occurrence of a timeout.

An often-needed timeout service provided by the OS is the ability to specify the maximum time the application is prepared to wait for any given kernel call or request to complete. A problem with using generic OS timer services in a preemptive realtime OS is that in the interval between the specification of the timeout and the request for the service, a higher-priority process might have been scheduled to run and preempted long enough that the specified timeout will have expired before the service is even requested. The application will then end up requesting the service with an already lapsed timeout in effect (i.e. no timeout). This timing window can result in “hung” processes, inexplicable delays in data transmission protocols, and other problems.

    alarm(...);
    ...
    ...     ← Alarm fires here
    ...
    blocking_call();

Our solution is a form of timeout request atomic to the service request itself. One approach might have been to provide an optional timeout parameter on every available service request, but this would overly complicate service requests with a passed parameter that would often go unused.

QNX Neutrino provides a TimerTimeout() kernel call that allows an application to specify a list of blocking states for which to start a specified timeout. Later, when the application makes a request of the kernel, the kernel will atomically enable the previously configured timeout if the application is about to block on one of the specified states.

Since the OS has a very small number of blocking states, this mechanism works very concisely. At the conclusion of either the service request or the timeout, the timer will be disabled and control will be given back to the application.

    TimerTimeout(...);
    ...
    ...
    ...
    blocking_call();
    ...     ← Timer atomically armed within kernel

Microkernel call    POSIX call                  Description

TimerAlarm()        alarm()                     Set a process alarm.

TimerCreate()       timer_create()              Create an interval timer.

TimerDestroy()      timer_delete()              Destroy an interval timer.

TimerInfo()         timer_gettime()             Get time remaining on an interval timer.

TimerInfo()         timer_getoverrun()          Get number of overruns on an interval timer.

TimerSettime()      timer_settime()             Start an interval timer.

TimerTimeout()      sleep(), nanosleep(), sigtimedwait(), pthread_cond_timedwait(), pthread_mutex_trylock()    Arm a kernel timeout for any blocking state.

Interrupt handling

No matter how much we wish it were so, computers are not infinitely fast. In a realtime system, it’s absolutely crucial that CPU cycles aren’t unnecessarily spent. It’s also crucial to minimize the time from the occurrence of an external event to the actual execution of code within the thread responsible for reacting to that event. This time is referred to as latency.

The two forms of latency that most concern us are interrupt latency and scheduling latency.

Latency times can vary significantly, depending on the speed of the processor and other factors. For more information, visit our website (www.qnx.com).

Interrupt latency

Interrupt latency is the time from the assertion of a hardware interrupt until the first instruction of the device driver’s interrupt handler is executed. The OS leaves interrupts fully enabled almost all the time, so that interrupt latency is typically insignificant. But certain critical sections of code do require that interrupts be temporarily disabled. The maximum such disable time usually defines the worst-case interrupt latency; in QNX Neutrino this is very small.

The following diagrams illustrate the case where a hardware interrupt is processed by an established interrupt handler. The interrupt handler either will simply return, or it will return and cause an event to be delivered.

Figure: Interrupt handler simply terminates. (Timeline: interrupt occurs; after Til the interrupt handler runs; after Tint the interrupt handler finishes; after Tiret the interrupted process continues execution. Til = interrupt latency, Tint = interrupt processing time, Tiret = interrupt termination time.)

The interrupt latency (Til) in the above diagram represents the minimum latency: that which occurs when interrupts were fully enabled at the time the interrupt occurred. Worst-case interrupt latency will be this time plus the longest time in which the OS, or the running system process, disables CPU interrupts.

Scheduling latency

In some cases, the low-level hardware interrupt handler must schedule a higher-level thread to run. In this scenario, the interrupt handler will return and indicate that an event is to be delivered. This introduces a second form of latency, scheduling latency, which must be accounted for.

Scheduling latency is the time between the last instruction of the user’s interrupt handler and the execution of the first instruction of a driver thread. This usually means the time it takes to save the context of the currently executing thread and restore the context of the required driver thread. Although larger than interrupt latency, this time is also kept small in a QNX Neutrino system.

Figure: Interrupt handler terminates, returning an event. (Timeline: interrupt occurs; after Til the interrupt handler runs; after Tint the interrupt handler finishes, triggering a sigevent; after Tsl the driver thread runs. Til = interrupt latency, Tint = interrupt processing time, Tsl = scheduling latency.)

It’s important to note that most interrupts terminate without delivering an event. In a large number of cases, the interrupt handler can take care of all hardware-related issues. Delivering an event to wake up a higher-level driver thread occurs only when a significant event occurs. For example, the interrupt handler for a serial device driver would feed one byte of data to the hardware upon each received transmit interrupt, and would trigger the higher-level thread within (devc-ser*) only when the output buffer is nearly empty.

Nested interrupts

QNX Neutrino fully supports nested interrupts. The previous scenarios describe the simplest (and most common) situation where only one interrupt occurs. Worst-case timing considerations for unmasked interrupts must take into account the time for all interrupts currently being processed, because a higher priority, unmasked interrupt will preempt an existing interrupt.

In the following diagram, Thread A is running. Interrupt IRQx causes interrupt handler Intx to run, which is preempted by IRQy and its handler Inty. Inty returns an event causing Thread B to run; Intx returns an event causing Thread C to run.

Figure: Stacked interrupts. (Timeline showing Thread A, IRQx with its handler Intx, IRQy with its handler Inty, Thread B, and Thread C.)

Interrupt calls

The interrupt-handling API includes the following kernel calls:

Function                  Description

InterruptAttach()         Attach a local function to an interrupt vector.

InterruptAttachEvent()    Generate an event on an interrupt, which will ready a thread. No user interrupt handler runs. This is the preferred call.

InterruptDetach()         Detach from an interrupt using the ID returned by InterruptAttach() or InterruptAttachEvent().

InterruptWait()           Wait for an interrupt.

InterruptEnable()         Enable hardware interrupts.

InterruptDisable()        Disable hardware interrupts.

InterruptMask()           Mask a hardware interrupt.

InterruptUnmask()         Unmask a hardware interrupt.

InterruptLock()           Guard a critical section of code between an interrupt handler and a thread. A spinlock is used to make this code SMP-safe. This function is a superset of InterruptDisable() and should be used in its place.

InterruptUnlock()         Remove an SMP-safe lock on a critical section of code.

Using this API, a suitably privileged user-level thread can call InterruptAttach() or InterruptAttachEvent(), passing a hardware interrupt number and the address of a function in the thread’s address space to be called when the interrupt occurs. QNX Neutrino allows multiple ISRs to be attached to each hardware interrupt number; unmasked interrupts can be serviced during the execution of running interrupt handlers.

For more information on InterruptLock() and InterruptUnlock(), see “Critical sections” in the chapter on Multicore Processing in this guide.

The following code sample shows how to attach an ISR to the hardware timer interrupt on the PC (which the OS also uses for the system clock). Since the kernel’s timer ISR is already dealing with clearing the source of the interrupt, this ISR can simply increment a counter variable in the thread’s data space and return to the kernel:

    #include <stdio.h>
    #include <sys/neutrino.h>
    #include <sys/syspage.h>

    struct sigevent event;
    volatile unsigned counter;

    const struct sigevent *handler( void *area, int id ) {
        // Wake up the thread every 100th interrupt
        if ( ++counter == 100 ) {
            counter = 0;
            return( &event );
        }
        else
            return( NULL );
    }

    int main() {
        int i;
        int id;

        // Request I/O privileges
        ThreadCtl( _NTO_TCTL_IO, 0 );

        // Initialize event structure
        event.sigev_notify = SIGEV_INTR;

        // Attach ISR vector
        id=InterruptAttach( SYSPAGE_ENTRY(qtime)->intr, &handler,
                            NULL, 0, 0 );

        for( i = 0; i < 10; ++i ) {
            // Wait for ISR to wake us up
            InterruptWait( 0, NULL );
            printf( "100 events\n" );
        }

        // Disconnect the ISR handler
        InterruptDetach(id);
        return 0;
    }

With this approach, appropriately privileged user-level threads can dynamically attach (and detach) interrupt handlers to (and from) hardware interrupt vectors at run time. These threads can be debugged using regular source-level debug tools; the ISR itself can be debugged by calling it at the thread level and source-level stepping through it or by using the InterruptAttachEvent() call.


When the hardware interrupt occurs, the processor will enter the interrupt redirector in the microkernel. This code pushes the registers for the context of the currently running thread into the appropriate thread table entry and sets the processor context such that the ISR has access to the code and data that are part of the thread the ISR is contained within. This allows the ISR to use the buffers and code in the user-level thread to resolve the interrupt and, if higher-level work by the thread is required, to queue an event to the thread the ISR is part of, which can then work on the data the ISR has placed into thread-owned buffers.

Since it runs with the memory-mapping of the thread containing it, the ISR can directly manipulate devices mapped into the thread’s address space, or directly perform I/O instructions. As a result, device drivers that manipulate hardware don’t need to be linked into the kernel.

The interrupt redirector code in the microkernel will call each ISR attached to that hardware interrupt. If the value returned indicates that a process is to be passed an event of some sort, the kernel will queue the event. When the last ISR has been called for that vector, the kernel interrupt handler will finish manipulating the interrupt control hardware and then “return from interrupt.”

This interrupt return won’t necessarily be into the context of the thread that was interrupted. If the queued event caused a higher-priority thread to become READY, the microkernel will then interrupt-return into the context of the now-READY thread instead.

This approach provides a well-bounded interval from the occurrence of the interrupt to the execution of the first instruction of the user-level ISR (measured as interrupt latency), and from the last instruction of the ISR to the first instruction of the thread readied by the ISR (measured as thread or process scheduling latency).

The worst-case interrupt latency is well-bounded, because the OS disables interrupts only for a couple of opcodes in a few critical regions. Those intervals when interrupts are disabled have deterministic runtimes, because they’re not data dependent.


The microkernel’s interrupt redirector executes only a few instructions before calling the user’s ISR. As a result, process preemption for hardware interrupts or kernel calls is equally quick and exercises essentially the same code path.

While the ISR is executing, it has full hardware access (since it’s part of a privileged thread), but can’t issue other kernel calls. The ISR is intended to respond to the hardware interrupt in as few microseconds as possible, do the minimum amount of work to satisfy the interrupt (read the byte from the UART, etc.), and if necessary, cause a thread to be scheduled at some user-specified priority to do further work.

Worst-case interrupt latency is directly computable for a given hardware priority from the kernel-imposed interrupt latency and the maximum ISR runtime for each interrupt higher in hardware priority than the ISR in question. Since hardware interrupt priorities can be reassigned, the most important interrupt in the system can be made the highest priority.
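The computation described above amounts to a simple sum. The following sketch is one way to write it down; the struct isr_info type, its field names, and the convention that a larger number means a higher hardware priority are all assumptions made for illustration, and the runtimes in any real system would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one attached ISR. */
struct isr_info {
    unsigned hw_priority;       /* larger = higher priority (assumed) */
    uint32_t max_runtime_ns;    /* measured worst-case ISR runtime */
};

/* Worst-case latency for isrs[target]: the kernel-imposed latency plus
 * the maximum runtime of every ISR with a higher hardware priority. */
uint64_t worst_case_latency_ns(const struct isr_info *isrs, size_t count,
                               size_t target, uint32_t kernel_latency_ns)
{
    uint64_t total = kernel_latency_ns;

    for (size_t i = 0; i < count; ++i) {
        if (isrs[i].hw_priority > isrs[target].hw_priority)
            total += isrs[i].max_runtime_ns;
    }
    return total;
}
```

For the highest-priority interrupt the sum contains only the kernel-imposed term, which is why reassigning hardware priorities lets you give the most important interrupt the tightest bound.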

Note also that by using the InterruptAttachEvent() call, no user ISR is run. Instead, a user-specified event is generated on each and every interrupt; the event will typically cause a waiting thread to be scheduled to run and do the work. The interrupt is automatically masked when the event is generated and then explicitly unmasked by the thread that handles the device at the appropriate time.

Both InterruptMask() and InterruptUnmask() are counting functions. For example, if InterruptMask() is called ten times, then InterruptUnmask() must also be called ten times.
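The counting behavior can be pictured with a tiny model. This is not the kernel's implementation and none of these names belong to the QNX API; it simply tracks a per-interrupt mask count and treats the interrupt as deliverable only when the count is back at zero.

```c
/* Hypothetical model of one interrupt line's mask state. */
struct irq_line {
    unsigned mask_count;    /* number of outstanding mask requests */
};

/* Model of a mask request: each call adds one level. */
unsigned irq_mask(struct irq_line *line)
{
    return ++line->mask_count;
}

/* Model of an unmask request: each call removes one level, never
 * dropping below zero. */
unsigned irq_unmask(struct irq_line *line)
{
    if (line->mask_count > 0)
        --line->mask_count;
    return line->mask_count;
}

/* The interrupt can be delivered only when no mask level remains. */
int irq_is_deliverable(const struct irq_line *line)
{
    return line->mask_count == 0;
}
```

In this model, ten mask requests followed by nine unmask requests still leave the interrupt masked; only the tenth unmask makes it deliverable again.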

Thus the work generated by hardware interrupts can be performed at OS-scheduled priorities rather than hardware-defined priorities. Since the interrupt source won’t re-interrupt until serviced, the effect of interrupts on the runtime of critical code regions for hard-deadline scheduling can be controlled.

In addition to hardware interrupts, various “events” within the microkernel can also be “hooked” by user processes and threads. When one of these events occurs, the kernel can upcall into the indicated function in the user thread to perform some specific processing for this event. For example, whenever the idle thread in the system is called, a user thread can have the kernel upcall into the thread so that hardware-specific low-power modes can be readily implemented.

Microkernel call        Description

InterruptHookIdle()     When the kernel has no active thread to schedule, it will run the idle thread, which can upcall to a user handler. This handler can perform hardware-specific power-management operations.

InterruptHookTrace()    This function attaches a pseudo interrupt handler that can receive trace events from the instrumented kernel.


Chapter 3

The Instrumented Microkernel

In this chapter...

Introduction
Instrumentation at a glance
Event control
Data interpretation
Proactive tracing


Introduction

An instrumented version of the QNX Neutrino microkernel (procnto-instr) is equipped with a sophisticated tracing and profiling mechanism that lets you monitor your system’s execution in real time. The procnto-instr module works on both single-CPU and SMP systems.

The procnto-instr module uses very little overhead and gives exceptionally good performance: it’s typically about 98% as fast as the noninstrumented kernel (when it isn’t logging). The additional amount of code (about 30K on an x86 system) in the instrumented kernel is a relatively small price to pay for the added power and flexibility of this useful tool. Depending on the footprint requirements of your final system, you may choose to use this special kernel as a development/prototyping tool or as the actual kernel in your final product.

The instrumented module is nonintrusive: you don’t have to modify a program’s source code in order to monitor how that program interacts with the kernel. You can trace as many or as few interactions (e.g. kernel calls, state changes, and other system activities) as you want between the kernel and any running thread or process in your system. You can even monitor interrupts. In this context, all such activities are known as events.

Instrumentation at a glance

Here are the essential tasks involved in kernel instrumentation:

1 The instrumented microkernel (procnto-instr) emits trace events as a result of various system activities. These events are automatically copied to a set of buffers grouped into a circular linked list.

2 As soon as the number of events inside a buffer reaches the high-water mark, the kernel notifies a data-capture utility.

3 The data-capture utility then writes the trace events from the buffer to an output device (e.g. a serial port, an event file, etc.).

4 A data-interpretation facility then interprets the events and presents this data to the user.

[Figure: Instrumentation at a glance. Labeled elements: process and thread activity, instrumented microkernel, event buffers, data-capture utility, data filter/interpretation.]
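
The first three steps above can be modeled in a few lines of portable C. This is only a sketch under stated assumptions, not the procnto-instr implementation: the buffer count, the buffer capacity, the high-water value, and every name here (trace_ring, emit_event, drain_buffer) are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUFFERS  4   /* hypothetical: buffers in the circular list */
#define BUF_CAPACITY 8   /* hypothetical: events per buffer */
#define HIGH_WATER   6   /* hypothetical: notify the capture side at this count */

typedef struct trace_buffer {
    uint32_t events[BUF_CAPACITY];
    size_t count;
    struct trace_buffer *next;   /* link to the next buffer; the last links to the first */
} trace_buffer;

typedef struct {
    trace_buffer bufs[NUM_BUFFERS];
    trace_buffer *current;       /* buffer currently being filled */
    int notifications;           /* times the capture side was notified */
} trace_ring;

/* Link the buffers into a circular list and start at the first one. */
void ring_init(trace_ring *r) {
    for (size_t i = 0; i < NUM_BUFFERS; i++) {
        r->bufs[i].count = 0;
        r->bufs[i].next = &r->bufs[(i + 1) % NUM_BUFFERS];
    }
    r->current = &r->bufs[0];
    r->notifications = 0;
}

/* Steps 1 and 2: copy an event into the current buffer. When the buffer
 * reaches the high-water mark, record a notification, advance to the next
 * buffer, and return the buffer that is ready to be drained (else NULL).
 * The model assumes the capture side drains a buffer before the ring
 * wraps around to it again. */
trace_buffer *emit_event(trace_ring *r, uint32_t event) {
    trace_buffer *b = r->current;
    assert(b->count < BUF_CAPACITY);
    b->events[b->count++] = event;
    if (b->count >= HIGH_WATER) {
        r->notifications++;
        r->current = b->next;
        return b;
    }
    return NULL;
}

/* Step 3: the capture side copies the events out and resets the buffer.
 * Returns the number of events copied. */
size_t drain_buffer(trace_buffer *b, uint32_t *out, size_t out_len) {
    size_t n = b->count < out_len ? b->count : out_len;
    for (size_t i = 0; i < n; i++) {
        out[i] = b->events[i];
    }
    b->count = 0;
    return n;
}
```

Setting the high-water mark below the buffer capacity leaves headroom, so a buffer can be handed off before it is completely full; the real kernel's thresholds and notification mechanism may differ.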

Event control

Given the large number of activities occurring in a live system, the number of events that the kernel emits can be overwhelming (in terms of the amount of data, the processing requirements, and the resources needed to store it). But you can easily control the amount of data emitted. Specifically, you can:

• control the initial conditions that trigger event emissions

• apply predefined kernel filters to dynamically control emissions

• implement your own event handlers for even more filtering.

Once the data has been collected by the data-capture utility (tracelogger), it can then be analyzed. You can analyze the data in real time or offline after the relevant events have been gathered. The System Analysis tool within the IDE presents this data graphically so you can “see” what’s going on in your system.

Modes of emission

Apart from applying the various filters to control the event stream, you can also specify one of two modes the kernel can use to emit events:

fast mode
    Emits only the most pertinent information (e.g. only two kernel call arguments) about an event.

wide mode
    Generates more information (e.g. all kernel call arguments) for the same event.

The trade-off here is one of speed vs knowledge: fast mode delivers less data, while wide mode packs much more information for each event. Either way, you can easily tune your system, because these modes work on a per-event basis.

As an example of the difference between the fast and wide emission modes, let’s look at the kinds of information we might see for a MsgSendv() call entry:

Fast mode data          Number of bytes for the event
Connection ID           4 bytes
Message data            4 bytes (the first 4 bytes usually comprise the header)
Total emitted:          8 bytes

Wide mode data          Number of bytes for the event
Connection ID           4 bytes
# of parts to send      4 bytes
# of parts to receive   4 bytes
Message data            4 bytes (the first 4 bytes usually comprise the header)
Message data            4 bytes
Message data            4 bytes
Total emitted:          24 bytes
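
The totals in the two tables are simply the sums of the 4-byte fields each mode emits. The following sketch reproduces that arithmetic; the type and function names (emit_mode, event_field, msgsendv_entry_bytes) are hypothetical and don't correspond to any kernel data structure.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EMIT_FAST, EMIT_WIDE } emit_mode;   /* hypothetical names */

/* One field of an emitted MsgSendv() entry event, as listed in the tables. */
typedef struct {
    const char *name;
    size_t bytes;
} event_field;

static const event_field fast_fields[] = {
    { "connection ID", 4 },
    { "message data (first 4 bytes)", 4 },
};

static const event_field wide_fields[] = {
    { "connection ID", 4 },
    { "# of parts to send", 4 },
    { "# of parts to receive", 4 },
    { "message data (first 4 bytes)", 4 },
    { "message data", 4 },
    { "message data", 4 },
};

/* Sum the field sizes for the chosen emission mode. */
size_t msgsendv_entry_bytes(emit_mode mode) {
    assert(mode == EMIT_FAST || mode == EMIT_WIDE);
    const event_field *fields = (mode == EMIT_FAST) ? fast_fields : wide_fields;
    size_t n = (mode == EMIT_FAST)
        ? sizeof fast_fields / sizeof fast_fields[0]
        : sizeof wide_fields / sizeof wide_fields[0];
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += fields[i].bytes;
    }
    return total;
}
```

For this one call, wide mode emits three times as much data as fast mode (24 bytes against 8), which is the speed vs knowledge trade-off in concrete terms.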

Ring buffer

Rather than always emit events to an external device, the kernel can keep all of the trace events in an internal circular buffer.

This buffer can be programmatically dumped to an external device on demand when a certain triggering condition is met, making this a very powerful tool for identifying elusive bugs that crop up under certain runtime conditions.
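
The idea of a buffer that keeps only the most recent events until a trigger fires can be sketched as follows. This is a generic circular buffer in portable C, assuming the oldest event is overwritten once the buffer is full; the capacity and the names (event_ring, ring_put, ring_dump) are hypothetical, and the kernel's actual buffer layout isn't described in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4   /* hypothetical capacity; a real buffer would be far larger */

typedef struct {
    uint32_t slots[RING_SLOTS];
    size_t head;    /* index of the next slot to write */
    size_t count;   /* number of valid events, at most RING_SLOTS */
} event_ring;

void ring_reset(event_ring *r) {
    r->head = 0;
    r->count = 0;
}

/* Store an event, overwriting the oldest one once the ring is full. */
void ring_put(event_ring *r, uint32_t event) {
    r->slots[r->head] = event;
    r->head = (r->head + 1) % RING_SLOTS;
    if (r->count < RING_SLOTS) {
        r->count++;
    }
}

/* When the triggering condition is met, copy the surviving events to
 * `out`, oldest first. Returns the number of events copied. */
size_t ring_dump(const event_ring *r, uint32_t *out, size_t out_len) {
    assert(out != NULL);
    size_t start = (r->head + RING_SLOTS - r->count) % RING_SLOTS;
    size_t n = r->count < out_len ? r->count : out_len;
    for (size_t i = 0; i < n; i++) {
        out[i] = r->slots[(start + i) % RING_SLOTS];
    }
    return n;
}
```

Because nothing leaves the buffer until the dump, the cost during normal operation is a memory write per event, and the dump captures the history leading up to the condition of interest.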

Data interpretation

The data of an event includes a high-precision timestamp as well as the ID number of the CPU on which the event was generated. This information helps you easily diagnose difficult timing problems, which are more likely to occur on multiprocessor systems.

The event format also includes the CPU platform (e.g. x86, PowerPC, etc.) and endian type, which facilitates remote analysis (whether in real time or offline). Using a data interpreter, you can view the data output in various ways, such as:

• a timestamp-based linear presentation of the entire system

• a “running” view of only the active threads/processes

• a state-based view of events per process/thread.

The linear output from the data interpreter might look something like this:

TRACEPRINTER version 0.94
-- HEADER FILE INFORMATION --
TRACE_FILE_NAME:: /dev/shmem/tracebuffer
TRACE_DATE:: Fri Jun 8 13:14:40 2001
TRACE_VER_MAJOR:: 0
TRACE_VER_MINOR:: 96
TRACE_LITTLE_ENDIAN:: TRUE
TRACE_ENCODING:: 16 byte events
TRACE_BOOT_DATE:: Fri Jun 8 04:31:05 2001
TRACE_CYCLES_PER_SEC:: 400181900
TRACE_CPU_NUM:: 4
TRACE_SYSNAME:: QNX
TRACE_NODENAME:: x86quad.gp.qa
TRACE_SYS_RELEASE:: 6.1.0
TRACE_SYS_VERSION:: 2001/06/04-14:07:56
TRACE_MACHINE:: x86pc
TRACE_SYSPAGE_LEN:: 2440
-- KERNEL EVENTS --
t:0x1310da15 CPU:01 CONTROL :TIME msb:0x0000000f, lsb(offset):0x1310d81c
t:0x1310e89d CPU:01 PROCESS :PROCCREATE_NAME
    ppid:0
    pid:1
    name:./procnto-smp-instr
t:0x1310eee4 CPU:00 THREAD :THCREATE pid:1 tid:1
t:0x1310f052 CPU:00 THREAD :THRUNNING pid:1 tid:1
t:0x1310f144 CPU:01 THREAD :THCREATE pid:1 tid:2
t:0x1310f201 CPU:01 THREAD :THREADY pid:1 tid:2
t:0x1310f32f CPU:02 THREAD :THCREATE pid:1 tid:3
t:0x1310f3ec CPU:02 THREAD :THREADY pid:1 tid:3
t:0x1310f52d CPU:03 THREAD :THCREATE pid:1 tid:4
t:0x1310f5ea CPU:03 THREAD :THRUNNING pid:1 tid:4
t:0x1310f731 CPU:02 THREAD :THCREATE pid:1 tid:5
.
.
.

To help you fine-tune your interpretation of the event data stream, we provide a library (traceparser) so you can write your own custom event interpreters.
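
As a rough illustration of what a custom interpreter might do with the linear text output shown above, the following sketch pulls the timestamp, CPU number, event class, and event name out of one line, and converts a cycle count to microseconds using the cycles-per-second value from the header. It deliberately doesn't use the traceparser library, whose API isn't described here; it assumes only the line layout of the sample output, and the names trace_line, parse_trace_line, and cycles_to_usec are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long timestamp;   /* value after "t:", in clock cycles */
    unsigned cpu;              /* value after "CPU:" */
    char class_name[16];       /* e.g. "THREAD" */
    char event_name[32];       /* e.g. "THCREATE" */
} trace_line;

/* Parse one "t:0x... CPU:nn CLASS :EVENT ..." line.
 * Returns 1 on success, 0 if the line doesn't have that shape. */
int parse_trace_line(const char *line, trace_line *out) {
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    int n = sscanf(line, "t:%lx CPU:%u %15[^: ] :%31s",
                   &out->timestamp, &out->cpu,
                   out->class_name, out->event_name);
    return n == 4;
}

/* Convert a cycle count to microseconds, given the cycles-per-second
 * value reported in the trace header. */
double cycles_to_usec(unsigned long cycles, unsigned long cycles_per_sec) {
    return (double)cycles * 1e6 / (double)cycles_per_sec;
}
```

With the sample output's rate of 400181900 cycles per second, the 366 cycles between the first THCREATE (t:0x1310eee4) and the following THRUNNING (t:0x1310f052) work out to a little under one microsecond.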

System analysis with the IDE

The IDE module of the System Analysis Toolkit (SAT) can serve as a comprehensive instrumentation control and post-processing visualization tool.

From within the IDE, developers can configure all trace events and modes, and then transfer log files automatically to a remote system for analysis. As a visualization tool, the IDE provides a rich set of event and process filters designed to help developers quickly prune down massive event sets in order to see only those events of interest.

[Figure: The IDE helps you visualize system activity.]

Proactive tracing

While the instrumented kernel provides an excellent unobtrusive method for instrumenting and monitoring processes, threads, and the state of your system in general, you can also have your applications proactively influence the event-collection process.

Using the TraceEvent() library call, applications themselves can inject custom events into the trace stream. This facility is especially useful when building large, tightly coupled, multicomponent systems.

For example, the following simple call would inject the integer values of eventcode, first, and second into the event stream:

TraceEvent(_NTO_TRACE_INSERTSUSEREVENT, eventcode, first, second);

You can also inject a string (e.g. “My Event”) into the event stream, as shown in the following code:

#include <stdio.h>
#include <unistd.h>        /* getpid() */
#include <sys/neutrino.h>  /* TraceEvent() */
#include <sys/trace.h>

/* Code to associate with emitted events */
#define MYEVENTCODE 12

int main(int argc, char **argv) {
    printf("My pid is %d \n", getpid());

    /* Inject two integer events (26, 1975) */
    TraceEvent(_NTO_TRACE_INSERTSUSEREVENT, MYEVENTCODE,
               26, 1975);

    /* Inject a string event (My Event) */
    TraceEvent(_NTO_TRACE_INSERTUSRSTREVENT, MYEVENTCODE,
               "My Event");

    return 0;
}

The output, as gathered by the traceprinter data interpreter, would then look something like this:

.
.
.
t:0x38ea737e CPU:00 USREVENT:EVENT:12, d0:26 d1:1975
...
t:0x38ea7cb0 CPU:00 USREVENT:EVENT:12 STR:"My Event"

Note that 12 was specified as the trace user eventcode for these events.

Chapter 4

Multicore Processing

In this chapter...
Introduction
Asymmetric multiprocessing (AMP)
Symmetric multiprocessing (SMP)
Bound multiprocessing (BMP)
Choosing between AMP, SMP, and BMP

Introduction

“Two heads are better than one” goes the old saying, and the same is true for computer systems, where two (or more) processors can greatly improve performance. Multiprocessing systems can be in these forms:

Discrete or traditional
    A system that has separate physical processors hooked up in multiprocessing mode over a board-level bus.

Multicore
    A chip that has one physical processor with multiple CPUs interconnected over a chip-level bus.

    Multicore processors deliver greater computing power through concurrency, offer greater system density, and run at lower clock speeds than uniprocessor chips. Multicore processors also reduce thermal dissipation, power consumption, and board area (and hence the cost of the system).

Multiprocessing includes several operating modes:

Asymmetric multiprocessing (AMP)
    A separate OS, or a separate instantiation of the same OS, runs on each CPU.

Symmetric multiprocessing (SMP)
    A single instantiation of an OS manages all CPUs simultaneously, and applications can float to any of them.

Bound multiprocessing (BMP)
    A single instantiation of an OS manages all CPUs simultaneously, but each application is locked to a specific CPU.

In order to use symmetric or bound multiprocessing, you need to get the Multicore Technology Development Kit (TDK).

Asymmetric multiprocessing (AMP)

Asymmetric multiprocessing provides an execution environment that’s similar to conventional uniprocessor systems. It offers a relatively straightforward path for porting legacy code and provides a direct mechanism for controlling how the CPUs are used. In most cases, it lets you work with standard debugging tools and techniques.

AMP can be:

• homogeneous: each CPU runs the same type and version of the OS

• heterogeneous: each CPU runs either a different OS or a different version of the same OS

Neutrino’s distributed programming model lets you make the best use of the multiple CPUs in a homogeneous environment. Applications running on one CPU can communicate transparently with applications and system services (e.g. device drivers, protocol stacks) on other CPUs, without the high CPU utilization imposed by traditional forms of interprocessor communication.

In heterogeneous systems, you must either implement a proprietary communications scheme or choose two OSs that share a common infrastructure (likely IP based) for interprocessor communications. To help avoid resource conflicts, the OSs should also provide standardized mechanisms for accessing shared hardware components.

With AMP, you decide how the shared hardware resources used by applications are divided up between the CPUs. Normally, this resource allocation occurs statically during boot time and includes physical memory allocation, peripheral usage, and interrupt handling. While the system could allocate the resources dynamically, doing so would entail complex coordination between the CPUs.

In an AMP system, a process always runs on the same CPU, even when other CPUs run idle. As a result, one CPU can end up being under- or overutilized. To address the problem, the system could allow applications to migrate dynamically from one CPU to another. Doing so, however, can involve complex checkpointing of state information or a possible service interruption as the application is stopped on one CPU and restarted on another. Also, such migration is difficult, if not impossible, if the CPUs run different OSs.

Symmetric multiprocessing (SMP)

Allocating resources in a multicore design can be difficult, especially when multiple software components are unaware of how other components are employing those resources.

Symmetric multiprocessing addresses the issue by running only one copy of Neutrino on all of the system’s CPUs. Because the OS has insight into all system elements at all times, it can allocate resources on the multiple CPUs with little or no input from the application designer. Moreover, Neutrino provides built-in standardized primitives, such as pthread_mutex_lock(), pthread_mutex_unlock(), pthread_spin_lock(), and pthread_spin_unlock(), that let multiple applications share these resources safely and easily.
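
The mutex primitives named above are standard POSIX calls, so the usual pattern applies: bracket every access to the shared data with a lock and an unlock. The sketch below shows that pattern for a shared counter; the shared_counter type and its helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* A counter that several threads may update; the mutex serializes access. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} shared_counter;

#define SHARED_COUNTER_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

/* Add `delta` under the lock. Returns 0 on success, or the error
 * number reported by the pthread call that failed. */
int counter_add(shared_counter *c, long delta) {
    assert(c != NULL);
    int rc = pthread_mutex_lock(&c->lock);
    if (rc != 0) {
        return rc;
    }
    c->value += delta;
    return pthread_mutex_unlock(&c->lock);
}

/* Read the current value under the lock. */
long counter_get(shared_counter *c) {
    assert(c != NULL);
    long v;
    pthread_mutex_lock(&c->lock);
    v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}
```

The same code is correct whether the calling threads share one CPU or run truly concurrently on several, which is why such primitives carry over to SMP unchanged.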

By running only one copy of Neutrino, SMP can dynamically allocate resources to specific applications rather than to CPUs, thereby enabling greater utilization of available processing power. It also lets system tracing tools gather operating statistics and application interactions for the multiprocessing system as a whole, giving you valuable insight into how to optimize and debug applications.

For instance, the System Profiler in the QNX Momentics development suite can track thread migration from one CPU to another, as well as OS primitive usage, scheduling events, application-to-application messaging, and other events, all with high-resolution timestamping. Application synchronization also becomes much easier since you use standard OS primitives rather than complex IPC mechanisms.

Neutrino lets the threads of execution within an application run concurrently on any CPU, making the entire computing power of the chip available to applications at all times. Neutrino’s preemption and thread prioritization capabilities help you ensure that CPU cycles go to the application that needs them the most.

Neutrino’s microkernel approach

SMP is typically associated with high-end operating systems such as Unix and Windows NT running on high-end servers. These large monolithic systems tend to be quite complex, the result of many person-years of development. Since these large kernels contain the bulk of all OS services, the changes to support SMP are extensive, usually requiring large numbers of modifications and the use of specialized spinlocks throughout the code.

QNX Neutrino, on the other hand, contains a very small microkernel surrounded by processes that act as resource managers, providing services such as filesystems, character I/O, and networking. By modifying the microkernel alone, all other OS services will gain full advantage of SMP without the need for coding changes. If these service-providing processes are multithreaded, their many threads will be scheduled among the available processors. Even a single-threaded server would benefit from an SMP system, because its thread would be scheduled on the available processors beside other servers and client processes.

As a testament to this microkernel approach, the SMP-enabled QNX Neutrino kernel/process manager adds only a few kilobytes of additional code. The SMP versions are designed for these main processor families:

• PowerPC (e.g. procnto-600-smp)

• MIPS (procnto-smp)

• x86 (procnto-smp)

The x86 version can boot on any system that conforms to the Intel MultiProcessor Specification (MP Spec) with up to eight Pentium (or better) processors. QNX Neutrino also supports Intel’s Hyper-Threading Technology found in P4 and Xeon processors.

The procnto-smp manager will also function on a single non-SMP system. With the cost of building a dual-processor Pentium motherboard very nearly the same as that for a single-processor motherboard, it’s possible to deliver cost-effective solutions that can be scaled in the field by the simple addition of a second CPU. The fact that the OS itself is only a few kilobytes larger also allows SMP to be seriously considered for small CPU-intensive embedded systems, not just high-end servers.

The PowerPC and MIPS versions of the SMP-enabled kernel deliver full SMP support (e.g. cache-coherency, interprocessor interrupts, etc.) on appropriate PPC and MIPS hardware. The PPC version supports any SMP system with 7xx or 74xx series processors, as in such reference design platforms as the Motorola MVP or the Marvell EV-64260-2XMPC7450 SMP Development System. The MIPS version supports such systems as the dual-core Broadcom BCM1250 processor.

Booting an x86 SMP system

The microkernel itself contains very little hardware- or system-specific code. The code that determines the capabilities of the system is isolated in a startup program, which is responsible for initializing the system, determining available memory, etc. Information gathered is placed into a memory table available to the microkernel and to all processes (on a read-only basis).

The startup-bios program is designed to work on systems compatible with the Intel MP Spec (version 1.4 or later). This startup program is responsible for:

• determining the number of processors

• determining the address of the local and I/O APIC

• initializing each additional processor

After a reset, only one processor will be executing the reset code. This processor is called the boot processor (BP). For each additional processor found, the BP running the startup-bios code will:

• initialize the processor

• switch it to 32-bit protected mode

• allocate the processor its own page directory

• set the processor spinning with interrupts disabled, waiting to be released by the kernel

Booting a PowerPC or MIPS SMP system

On a PPC or MIPS SMP system, the boot sequence is similar to that of an x86, but a specific startup program (e.g. startup-mvp, startup-bcm1250) will be used instead. Specifically, the PPC-specific startup is responsible for:

• determining the number of processors

• initializing each additional processor

• initializing the IRQ, IPI, system controller, etc.

For each additional processor found, the startup code will:

• initialize the processor

• initialize the MMU

• initialize the caches

• set the processor spinning with interrupts disabled, waiting to be released by the kernel

How the SMP microkernel works

Once the additional processors have been released and are running, all processors are considered peers for the scheduling of threads.

Scheduling

The scheduling algorithm follows the same rules as on a uniprocessor system. That is, the highest-priority thread will be running on the available processor. If a new thread becomes ready to run as the highest-priority thread in the system, it will be dispatched to the appropriate processor. If more than one processor is selected as a potential target, then the microkernel will try to dispatch the thread to the processor where it last ran. This affinity is used as an attempt to reduce thread migration from one processor to another, which can affect cache performance.

In an SMP system, the scheduler has some flexibility in deciding exactly how to schedule low-priority threads, with an eye towards optimizing cache usage and minimizing thread migration. In any case, the realtime scheduling rules that were in place on a uniprocessor system are guaranteed to be upheld on an SMP system.
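
The tie-breaking rule described above can be illustrated with a small model. It rests on assumptions that this chapter doesn't spell out: that the “appropriate processor” is one running the lowest-priority thread, that a higher number means a higher priority, and that a newly ready thread displaces another only if its priority is strictly higher. The function name pick_target_cpu and its parameters are hypothetical; this is not the kernel’s dispatch code.

```c
#include <assert.h>
#include <stddef.h>

/* Choose a processor for a newly ready thread.
 *   running_prio[i]  priority of the thread now running on CPU i
 *   new_prio         priority of the thread that just became ready
 *   last_cpu         CPU the new thread last ran on, or -1 if none
 * Returns the chosen CPU index, or -1 if no CPU is running a
 * lower-priority thread (the new thread simply stays ready). */
int pick_target_cpu(const int *running_prio, size_t ncpus,
                    int new_prio, int last_cpu) {
    assert(running_prio != NULL && ncpus > 0);

    /* Find the lowest priority currently running anywhere. */
    int lowest = running_prio[0];
    for (size_t i = 1; i < ncpus; i++) {
        if (running_prio[i] < lowest) {
            lowest = running_prio[i];
        }
    }
    if (new_prio <= lowest) {
        return -1;
    }

    /* Several CPUs may tie at the lowest priority. Prefer the one the
     * thread last ran on, to limit migration and keep its cache warm. */
    if (last_cpu >= 0 && (size_t)last_cpu < ncpus &&
        running_prio[last_cpu] == lowest) {
        return last_cpu;
    }
    for (size_t i = 0; i < ncpus; i++) {
        if (running_prio[i] == lowest) {
            return (int)i;
        }
    }
    return -1;   /* not reached: some CPU always holds the lowest priority */
}
```

With running priorities {10, 5, 5, 12} and a new priority-20 thread, CPUs 1 and 2 are both candidates; the model picks CPU 2 if the thread last ran there, and otherwise falls back to the first candidate.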

Kernel locking

In a uniprocessor system, only one thread is allowed to execute within the microkernel at a time. Most kernel operations are short in duration (typically a few microseconds on a Pentium-class processor). The microkernel is also designed to be completely preemptible and restartable for those operations that take more time. This design keeps the microkernel lean and fast without the need for large numbers of fine-grained locks. It is interesting to note that placing many locks in the main code path through a kernel will noticeably slow the kernel down. Each lock typically involves processor bus transactions, which can cause processor stalls.

In an SMP system, QNX Neutrino maintains this philosophy of only one thread in a preemptible and restartable kernel. The microkernel may be entered on any processor, but only one processor will be granted access at a time.

For most systems, the time spent in the microkernel represents only a small fraction of the processor’s workload. Therefore, while conflicts will occur, they should be more the exception than the norm. This is especially true for a microkernel where traditional OS services like filesystems are separate processes and not part of the kernel itself.

Interprocessor interrupts (IPIs)

The processors communicate with each other through IPIs (interprocessor interrupts). IPIs can effectively schedule and control threads over multiple processors. For example, an IPI to another processor is often needed when:

• a higher-priority thread becomes ready

• a thread running on another processor is hit with a signal

• a thread running on another processor is canceled

• a thread running on another processor is destroyed

Critical sections

To control access to data structures that are shared between them, threads and processes use the standard POSIX primitives of mutexes, condvars, and semaphores. These work without change in an SMP system.

Many realtime systems also need to protect access to shared data structures between an interrupt handler and the thread that owns the handler. The traditional POSIX primitives used between threads aren’t available for use by an interrupt handler. There are two solutions here:

• One is to remove all work from the interrupt handler and do all the work at thread time instead. Given our fast thread scheduling, this is a very viable solution.

• In a uniprocessor system running QNX Neutrino, an interrupt handler may preempt a thread, but a thread will never preempt an interrupt handler. This allows the thread to protect itself from the interrupt handler by disabling and enabling interrupts for very brief periods of time.

The thread on a non-SMP system protects itself with code of the form:

InterruptDisable()
// critical section
InterruptEnable()

Or:

InterruptMask(intr)
// critical section
InterruptUnmask(intr)

Unfortunately, this code will fail on an SMP system since the thread may be running on one processor while the interrupt handler is concurrently running on another processor!

One solution would be to lock the thread to a particular processor (see “Bound Multiprocessing (BMP),” later in this chapter).

A better solution would be to use a new exclusion lock available to both the thread and the interrupt handler. This is provided by the following primitives, which work on both uniprocessor and SMP machines:

InterruptLock(intrspin_t* spinlock)
    Attempt to acquire a spinlock, a variable shared between the interrupt handler and thread. The code will spin in a tight loop until the lock is acquired. After disabling interrupts, the code will acquire the lock (if it was acquired by a thread). The lock must be released as soon as possible (typically within a few lines of C code without any loops).

InterruptUnlock(intrspin_t* spinlock)
    Release a lock and reenable interrupts.

On a non-SMP system, there’s no need for a spinlock.

For more information, see the Multicore TDK User’s Guide.

Bound multiprocessing (BMP)

Bound multiprocessing provides the scheduling control of an asymmetric multiprocessing model, while preserving the hardware abstraction and management of symmetric multiprocessing. BMP is similar to SMP, but you can specify which processors a thread can run on. You can use both SMP and BMP on the same system, allowing some threads to migrate from one processor to another, while other threads are restricted to one or more processors.

As with SMP, a single copy of the OS maintains an overall view of all system resources, allowing them to be dynamically allocated and shared among applications. But, during application initialization, a setting determined by the system designer forces all of an application’s threads to execute only on a specified CPU.

Compared to full, floating SMP operation, this approach offers several advantages:

• It eliminates the cache thrashing that can reduce performance in an SMP system by allowing applications that share the same data set to run exclusively on the same CPU.

• It offers simpler application debugging than SMP since all execution threads within an application run on a single CPU.

• It helps legacy applications that use poor techniques for synchronizing shared data to run correctly, again by letting them run on a single CPU.

With BMP, an application locked to one CPU can’t use other CPUs, even if they’re idle. However, Neutrino lets you dynamically change the designated CPU, without having to checkpoint, and then stop and restart the application.

QNX Neutrino supports the concept of hard processor affinity through a runmask. Each bit that’s set in the runmask represents a processor that a thread can run on. By default, a thread’s runmask is set to all ones, allowing it to run on any processor. A value of 0x01 would allow a thread to execute only on the first processor.
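
The bit-per-processor encoding can be shown with a few helper functions. This sketch assumes a 32-bit mask with bit 0 standing for the first processor; the runmask_t typedef and the helper names are hypothetical and aren't part of the Neutrino API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t runmask_t;   /* hypothetical: one bit per processor, up to 32 */

/* Mask that allows only the given processor (0 = first processor). */
runmask_t runmask_for_cpu(unsigned cpu) {
    assert(cpu < 32);
    return (runmask_t)1u << cpu;
}

/* True if the mask lets a thread run on the given processor. */
bool runmask_allows(runmask_t mask, unsigned cpu) {
    assert(cpu < 32);
    return ((mask >> cpu) & 1u) != 0;
}

/* Number of processors the mask allows. */
unsigned runmask_count(runmask_t mask) {
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

In this encoding, 0x01 permits only the first processor, 0x05 permits the first and third, and an all-ones mask permits every processor.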

By default, a process’s or thread’s children don’t inherit the runmask; there’s a separate inherit mask.

By careful use of these masks, a systems designer can further optimize the runtime performance of a system (e.g. by relegating nonrealtime processes to a specific processor). In general, however, this shouldn’t be necessary, because our realtime scheduler will always preempt a lower-priority thread immediately when a higher-priority thread becomes ready. Processor locking will likely affect only the efficiency of the cache, since threads can be prevented from migrating.

You can specify the runmask for a new thread or process by:

• setting the runmask member of the inheritance structure and specifying the SPAWN_EXPLICIT_CPU flag when you call spawn()

Or:

• using the -C or -R option to the on utility when you launch a program. This also sets the process’s inherit mask to the same value.

You can change the runmask for an existing thread or process by:

• using the _NTO_TCTL_RUNMASK or _NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT command to the ThreadCtl() kernel call

Or:

• using the -C or -R option to the slay utility. If you also use the -i option, slay sets the inherit mask to the same value.

For more information, see the Multicore TDK User’s Guide.

A viable migration strategy

As a midway point between AMP and SMP, BMP offers a viable migration strategy if you wish to move towards full SMP, but you’re concerned that your existing code may operate incorrectly in a truly concurrent execution model.

You can port legacy code to a multicore system and initially bind it to a single CPU to ensure correct operation. By judiciously binding applications (and possibly single threads) to specific CPUs, you can isolate potential concurrency issues down to the application and thread level. Resolving these issues will allow the application to run fully concurrently, thereby maximizing the performance gains provided by the multiple processors.

Choosing between AMP, SMP, and BMP

The choice between AMP, SMP, and BMP depends on the problem you’re trying to solve:

• AMP works well with legacy applications, but has limited scalability beyond two CPUs.

• SMP offers transparent resource management, but software that hasn’t been properly designed for concurrency might have problems.

• BMP offers many of the same benefits as SMP, but guarantees that uniprocessor applications will behave correctly, greatly simplifying the migration of legacy software.

As the following table illustrates, the flexibility to choose from any of these models lets you strike the optimal balance between performance, scalability, and ease of migration.

Feature                                          SMP                    BMP                    AMP
Seamless resource sharing                        Yes                    Yes                    -
Scalable beyond dual CPU                         Yes                    Yes                    Limited
Legacy application operation                     In most cases          Yes                    Yes
Mixed OS environment (e.g. Neutrino and Linux)   -                      -                      Yes
Dedicated processor by function                  -                      Yes                    Yes
Intercore messaging                              Fast (OS primitives)   Fast (OS primitives)   Slower (application)
Thread synchronization between CPUs              Yes                    Yes                    -
Load balancing                                   Yes                    Yes                    -
System-wide debugging and optimization           Yes                    Yes                    -

Chapter 5

Process Manager

In this chapter...
Introduction
Process management
Memory management
Pathname management

Introduction

In QNX Neutrino, the microkernel is paired with the Process Manager in a single module (procnto). This module is required for all runtime systems.

The process manager is capable of creating multiple POSIX processes (each of which may contain multiple POSIX threads). Its main areas of responsibility include:

• process management: manages process creation, destruction, and process attributes such as user ID (uid) and group ID (gid).

• memory management: manages a range of memory-protection capabilities, shared libraries, and interprocess POSIX shared-memory primitives.

• pathname management: manages the pathname space into which resource managers may attach.

User processes can access microkernel functions directly via kernel calls and process manager functions by sending messages to procnto. Note that a user process sends a message by invoking the MsgSend*() kernel call.

It’s important to note that threads executing within procnto invoke the microkernel in exactly the same way as threads in other processes. The fact that the process manager code and the microkernel share the same process address space doesn’t imply a “special” or “private” interface. All threads in the system share the same consistent kernel interface and all perform a privilege switch when invoking the microkernel.

Process management

The first responsibility of procnto is to dynamically create new processes. These processes will then depend on procnto’s other responsibilities of memory management and pathname management.

Process management consists of both process creation and destruction as well as the management of process attributes such as process IDs, process groups, user IDs, etc.

Process primitives

There are four process primitives:

spawn()    POSIX
fork()     POSIX
vfork()    UNIX BSD extension
exec*()    POSIX

spawn()

The spawn() call was introduced by POSIX. The function creates a child process by directly specifying an executable to load. To those familiar with UNIX systems, the call is modeled after a fork() followed by an exec*(). However, it operates much more efficiently in that there’s no need to duplicate address spaces as in a fork(), only to destroy and replace it when the exec*() is called.

One of the main advantages of using the fork()-then-exec*() method of creating a child process is the flexibility in changing the default environment inherited by the new child process. This is done in the forked child just before the exec*(). For example, the following simple shell command would close and reopen the standard output before exec*()’ing:

ls >file
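What the shell does for that command can be sketched with standard POSIX calls. This is a minimal illustration, not the shell's actual source; the function name run_redirected() and the exit codes 126 and 127 are choices made for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its standard output redirected to outfile, the way
 * a shell handles "cmd >file". Returns the child's exit status, or -1
 * on error. */
int run_redirected(char *const argv[], const char *outfile)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: adjust the inherited environment before exec*(). */
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) {
            _exit(126);
        }
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) == -1) {
                _exit(126);
            }
            close(fd);
        }
        execvp(argv[0], argv);
        _exit(127);              /* reached only if execvp() failed */
    }

    /* Parent: wait for the child to finish. */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling run_redirected() with the argument vector {"ls", NULL} and the name "file" would correspond to the shell command above.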

The POSIX spawn() function gives control over four classes of environment inheritance, which are often adjusted when creating a new child process:

• file descriptors

• process group ID

• signal mask

• ignored signals.
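To show how file-descriptor inheritance can be adjusted without an intermediate fork(), here is a sketch that uses the standard posix_spawnp() interface. Note the assumption: this is the POSIX posix_spawn family, not the QNX spawn() call with its inheritance structure, and the function name spawn_redirected() is made up for the example.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Launch argv[0] (searched for in PATH) with standard output sent to
 * outfile. Returns the child's exit status, or -1 on error. */
int spawn_redirected(char *const argv[], const char *outfile)
{
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int status;

    if (posix_spawn_file_actions_init(&actions) != 0) {
        return -1;
    }
    /* File-descriptor class: open outfile as descriptor 1 in the child. */
    if (posix_spawn_file_actions_addopen(&actions, 1, outfile,
                                         O_WRONLY | O_CREAT | O_TRUNC,
                                         0644) != 0) {
        posix_spawn_file_actions_destroy(&actions);
        return -1;
    }

    int rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (rc != 0) {
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

The other three classes (process group, signal mask, and signal defaults) are handled in the POSIX interface through a posix_spawnattr_t object, which this sketch leaves as NULL.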

Our implementation also allows you to change:

• node to create process on

• scheduling policy

• scheduling parameters (priority)

• maximum stack size.

There are two forms of the POSIX spawn() function:

spawn()     Spawn with explicitly specified path.

spawnp()    Search the current PATH and invoke spawn() with the first matching executable.

There’s also a set of non-POSIX convenience functions that are built on top of spawn() and spawnp() as follows:

spawnl()      Spawn with command line provided as inline arguments.

spawnle()     spawnl() with explicitly passed environment variables.

spawnlp()     spawnl() that follows the command search path.

spawnlpe()    spawnlp() with explicitly passed environment variables.

spawnv()      Spawn with command line pointed to by an array of pointers.

spawnve()     spawnv() with explicitly passed environment variables.

spawnvp()     spawnv() that follows the command search path.

spawnvpe()    spawnvp() with explicitly passed environment variables.

Using spawn() is the preferred way to create a new child process.

When a process is spawn()’ed, the child process inherits the following attributes of its parent:

• Process group ID (unless SPAWN_SETGROUP is set in inherit.flags)

• Session membership

• Real user ID and real group ID

• Supplementary group IDs

• Priority and scheduling policy

• Current working directory and root directory

• File creation mask

• Signal mask (unless SPAWN_SETSIGMASK is set in inherit.flags)

• Signal actions specified as SIG_DFL

• Signal actions specified as SIG_IGN (except the ones modified by inherit.sigdefault when SPAWN_SETSIGDEF is set in inherit.flags).

The child process has several differences from the parent process:

• Signals set to be caught by the parent process are set to the default action (SIG_DFL).

• The child process’s tms_utime, tms_stime, tms_cutime, and tms_cstime are tracked separately from the parent’s.

• The number of seconds left until a SIGALRM signal would be generated is set to zero for the child process.

140 Chapter 5 � Process Manager November 2, 2006

Page 163: QNX Neutrino RTOSbetas.qnx.com/download/download/14695/sys_arch.pdf · QNX, Neutrino, Photon, Photon microGUI, Momentics, ... 14 Power Management 301 Power management for embedded

2006, QNX Software Systems GmbH & Co. KG. Process management

• The set of pending signals for the child process is empty.

• File locks set by the parent aren’t inherited.

• Per-process timers created by the parent aren’t inherited.

• Memory locks and mappings set by the parent aren’t inherited.

If the child process is spawned on a remote node, the process group ID and the session membership aren’t set; the child process is put into a new session and a new process group.

The child process can access the parent process’s environment by using the environ global variable (found in <unistd.h>).

For more information, see the spawn() function in the Library Reference.

fork()

The fork() function creates a new child process by sharing the same code as the calling process and duplicating the calling process’s data to give the child process an exact copy. Most process resources are inherited. The following lists some resources that are explicitly not inherited:

• process ID

• parent process ID

• file locks

• pending signals and alarms

• timers.
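The "exact copy" of the data can be seen in a small sketch. The function name fork_copy_demo() is hypothetical; the code shows only that a change made by the child to its duplicated data isn't visible to the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's value of counter after a forked child has
 * modified its own copy, or -1 on error. */
int fork_copy_demo(void)
{
    int counter = 10;
    pid_t pid = fork();

    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        counter += 5;                  /* changes the child's copy only */
        _exit(counter == 15 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return counter;                    /* the parent's copy is untouched */
}
```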

The fork() function is typically used for one of two reasons:

• To create a new instance of the current execution environment.

• To create a new process running a different program.

When fork() is used to create a new instance of the current execution environment (in effect, a new thread of control), common data is placed in an explicitly created shared memory region. Prior to the POSIX thread standard, this was the only way to accomplish this. With POSIX threads, this use of fork() is better accomplished by creating threads within a single process using pthread_create().

When creating a new process running a different program, the call to fork() is soon followed by a call to one of the exec*() functions. This too is better accomplished by a single call to the POSIX spawn() function, which combines both operations with far greater efficiency.

Since QNX Neutrino provides better POSIX solutions than using fork(), its use is probably best suited for porting existing code and for writing portable code that must run on a UNIX system that doesn’t support the POSIX pthread_create() or spawn() API.

Note that fork() should be called from a process containing only a single thread.

vfork()

The vfork() function (which should also be called only from a single-threaded process) is useful when the purpose of fork() would have been to create a new system context for a call to one of the exec*() functions. The vfork() function differs from fork() in that the child doesn’t get a copy of the calling process’s data. Instead, it borrows the calling process’s memory and thread of control until a call to one of the exec*() functions is made. The calling process is suspended while the child is using its resources.

The vfork() child can’t return from the procedure that called vfork(), since the eventual return from the parent vfork() would then return to a stack frame that no longer existed.
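In practice this means the child does nothing except call an exec*() function or _exit(). A minimal sketch, with the hypothetical name vfork_exec():

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] in a vfork()'ed child. Returns its exit status, or -1
 * on error. */
int vfork_exec(char *const argv[])
{
    pid_t pid = vfork();

    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* The child is borrowing the parent's memory and stack: it
         * must not return from this function or modify any data. */
        execvp(argv[0], argv);
        _exit(127);              /* reached only if execvp() failed */
    }

    /* The parent resumes here once the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```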

exec*()

The exec*() family of functions replaces the current process with a new process, loaded from an executable file. Since the calling process is replaced, there can be no successful return.
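One consequence is that any code after an exec*() call runs only if the call failed. The sketch below uses a common POSIX idiom, a close-on-exec pipe, to report that failure back to the parent; the function name exec_result() is made up for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls execv(path, argv). Returns 0 if the exec
 * succeeded, the child's errno value if execv() returned, or -1 if
 * the pipe or fork couldn't be set up. */
int exec_result(const char *path, char *const argv[])
{
    int fds[2];

    if (pipe(fds) == -1) {
        return -1;
    }
    /* The write end closes by itself when an exec succeeds. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);
        /* Only reached if the process image was NOT replaced. */
        int err = errno;
        if (write(fds[1], &err, sizeof err) != (ssize_t)sizeof err) {
            _exit(126);
        }
        _exit(127);
    }

    close(fds[1]);
    int err = 0;
    ssize_t n = read(fds[0], &err, sizeof err);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    if (n == (ssize_t)sizeof err) {
        return err;              /* execv() came back with an error */
    }
    return n == 0 ? 0 : -1;      /* end-of-file: the exec succeeded */
}
```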

142 Chapter 5 � Process Manager November 2, 2006

Page 165: QNX Neutrino RTOSbetas.qnx.com/download/download/14695/sys_arch.pdf · QNX, Neutrino, Photon, Photon microGUI, Momentics, ... 14 Power Management 301 Power management for embedded

2006, QNX Software Systems GmbH & Co. KG. Memory management

The following exec*() functions are defined:

execl()      Exec with command line provided as inline arguments.

execle()     execl() with explicitly passed environment variables.

execlp()     execl() that follows the command search path.

execlpe()    execlp() with explicitly passed environment variables.

execv()      execl() with command line pointed to by an array of pointers.

execve()     execv() with explicitly passed environment variables.

execvp()     execv() that follows the command search path.

execvpe()    execvp() with explicitly passed environment variables.

The exec*() functions usually follow a fork() or vfork() in order to load a new child process. This is better achieved by using the new POSIX spawn() call.

Process loading

Processes loaded from a filesystem using either the exec*() or spawn() calls are in ELF format. If the filesystem is on a block-oriented device, the code and data are loaded into main memory.

If the filesystem is memory mapped (e.g. ROM/Flash image), the code needn’t be loaded into RAM, but may be executed in place. This approach makes all RAM available for data and stack, leaving the code in ROM or Flash. In all cases, if the same process is loaded more than once, its code will be shared.

Memory management

While some realtime kernels or executives provide support for memory protection in the development environment, few provide protected memory support for the runtime configuration, citing penalties in memory and performance as reasons. But with memory protection becoming common on many embedded processors, the benefits of memory protection far outweigh the very small penalties in performance for enabling it.

The key advantage gained by adding memory protection to embedded applications, especially for mission-critical systems, is improved robustness.

With memory protection, if one of the processes executing in a multitasking environment attempts to access memory that hasn’t been explicitly declared or allocated for the type of access attempted, the MMU hardware can notify the OS, which can then abort the thread (at the failing/offending instruction).

This “protects” process address spaces from each other, preventing coding errors in a thread in one process from “damaging” memory used by threads in other processes or even in the OS. This protection is useful both for development and for the installed runtime system, because it makes postmortem analysis possible.

During development, common coding errors (e.g. stray pointers and indexing beyond array bounds) can result in one process/thread accidentally overwriting the data space of another process. If the overwrite touches memory that isn’t referenced again until much later, you can spend hours debugging (often using in-circuit emulators and logic analyzers) in an attempt to find the “guilty party.”

With an MMU enabled, the OS can abort the process the instant the memory-access violation occurs, providing immediate feedback to the programmer instead of mysteriously crashing the system some time later. The OS can then provide the location of the errant instruction in the failed process, or position a symbolic debugger directly to this instruction.

Memory Management Units (MMUs)

A typical MMU operates by dividing physical memory into a number of 4-KB pages. The hardware within the processor then makes use of a set of page tables stored in system memory that define the mapping of virtual addresses (i.e. the memory addresses used within the application program) to the addresses emitted by the CPU to access physical memory.

While the thread executes, the page tables managed by the OS control how the memory addresses that the thread is using are “mapped” onto the physical memory attached to the processor.

[Figure: a page directory (entries 0 to 1023) selects page tables (entries 0 to 1023), which map the 0 to 4G linear space onto physical memory.]

Virtual address mapping (on an x86).
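The numbers in the figure follow from the classic two-level x86 scheme with 4-KB pages (non-PAE): 1024 directory entries times 1024 table entries times 4 KB gives 4 GB. As a sketch, a 32-bit linear address splits into three fields as shown below; the struct and function names are illustrative only.

```c
#include <stdint.h>

/* The three fields of a 32-bit x86 linear address with 4-KB pages. */
struct x86_vaddr {
    uint32_t dir_index;    /* bits 31..22: entry in the page directory */
    uint32_t table_index;  /* bits 21..12: entry in the page table     */
    uint32_t offset;       /* bits 11..0:  byte within the 4-KB page   */
};

struct x86_vaddr split_vaddr(uint32_t addr)
{
    struct x86_vaddr v;

    v.dir_index   = (addr >> 22) & 0x3FFu;
    v.table_index = (addr >> 12) & 0x3FFu;
    v.offset      = addr & 0xFFFu;
    return v;
}
```

Each index ranges from 0 to 1023, matching the entry counts in the figure.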

For a large address space with many processes and threads, the number of page-table entries needed to describe these mappings can be significant, more than can be stored within the processor. To maintain performance, the processor caches frequently used portions of the external page tables within a TLB (translation look-aside buffer).

The servicing of “misses” on the TLB cache is part of the overhead imposed by enabling the MMU. Our OS uses various clever page-table arrangements to minimize this overhead.

Associated with these page tables are bits that define the attributes of each page of memory. Pages can be marked as read-only, read-write, etc. Typically, the memory of an executing process would be described with read-only pages for code and read-write for the data and stack.

When the OS performs a context switch (i.e. suspends the execution of one thread and resumes another), it will manipulate the MMU to use a potentially different set of page tables for the newly resumed thread. If the OS is switching between threads within a single process, no MMU manipulations are necessary.

When the new thread resumes execution, any addresses generated as the thread runs are mapped to physical memory through the assigned page tables. If the thread tries to use an address not mapped to it or to use an address in a way that violates the defined attributes (e.g. writing to a read-only page), the CPU will receive a “fault” (similar to a divide-by-zero error), typically implemented as a special type of interrupt.

By examining the instruction pointer pushed on the stack by the interrupt, the OS can determine the address of the instruction that caused the memory-access fault within the thread/process and can act accordingly.

Memory protection at run time

While memory protection is useful during development, it can also provide greater reliability for embedded systems installed in the field. Many embedded systems already employ a hardware “watchdog timer” to detect if the software or hardware has “lost its mind,” but this approach lacks the finesse of an MMU-assisted watchdog.

Hardware watchdog timers are usually implemented as a retriggerable monostable timer attached to the processor reset line. If the system software doesn’t strobe the hardware timer regularly, the timer will expire and force a processor reset. Typically, some component of the system software will check for system integrity and strobe the timer hardware to indicate the system is “sane.”

146 Chapter 5 � Process Manager November 2, 2006

Page 169: QNX Neutrino RTOSbetas.qnx.com/download/download/14695/sys_arch.pdf · QNX, Neutrino, Photon, Photon microGUI, Momentics, ... 14 Power Management 301 Power management for embedded

2006, QNX Software Systems GmbH & Co. KG. Memory management

Although this approach enables recovery from a lockup related to a software or hardware glitch, it results in a complete system restart and perhaps significant “downtime” while this restart occurs.

Software watchdog

When an intermittent software error occurs in a memory-protected system, the OS can catch the event and pass control to a user-written thread instead of the memory dump facilities. This thread can make an intelligent decision about how best to recover from the failure, instead of forcing a full reset as the hardware watchdog timer would do. The software watchdog could:

• Abort the process that failed due to a memory access violation and simply restart that process without shutting down the rest of the system.

• Abort the failed process and any related processes, initialize the hardware to a “safe” state, and then restart the related processes in a coordinated manner.

• If the failure is very critical, perform a coordinated shutdown of the entire system and sound an audible alarm.
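The first of these strategies can be sketched with portable POSIX calls. This is not the QNX mechanism itself, only an illustration of the idea: a supervising process notices that a worker was killed by a signal and restarts just that worker. The names supervise() and flaky_worker() are made up for the example.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example worker: dies with a memory-access signal on its first two
 * attempts, then succeeds. It stands in for a real process with an
 * intermittent fault. */
int flaky_worker(int attempt)
{
    if (attempt < 2) {
        raise(SIGSEGV);
    }
    return 0;
}

/* Run worker in a child process; if it is killed by a signal, restart
 * it, up to max_restarts times. Returns the number of restarts that
 * were needed, or -1 if the worker never completed. */
int supervise(int (*worker)(int), int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid == -1) {
            return -1;
        }
        if (pid == 0) {
            _exit(worker(attempt));
        }

        int status;
        if (waitpid(pid, &status, 0) == -1) {
            return -1;
        }
        if (WIFEXITED(status)) {
            return attempt;      /* the worker finished on its own */
        }
        /* Otherwise the worker was aborted by a signal. A fuller
         * watchdog could record WTERMSIG(status) or archive a dump
         * file here before restarting. */
    }
    return -1;
}
```

With the example worker, supervise(flaky_worker, 5) restarts the worker twice while the supervising process itself keeps running throughout.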

The important distinction here is that we retain intelligent, programmed control of the embedded system, even though various processes and threads within the control software may have failed for various reasons. A hardware watchdog timer is still of use to recover from hardware “latch-ups,” but for software failures we now have much better control.

While performing some variation of these recovery strategies, the system can also collect information about the nature of the software failure. For example, if the embedded system contains or has access to some mass storage (Flash memory, hard drive, a network link to another computer with disk storage), the software watchdog can generate a chronologically archived sequence of dump files. These dump files could then be used for postmortem diagnostics.

Embedded control systems often employ these “partial restart” approaches to surviving intermittent software failures without the operators experiencing any system “downtime” or even being aware of these quick-recovery software failures. Since the dump files are available, the developers of the software can detect and correct software problems without having to deal with the emergencies that result when critical systems fail at inconvenient times. If we compare this to the hardware watchdog timer approach and the prolonged interruptions in service that result, it’s obvious what our preference is!

Postmortem dump-file analysis is especially important for mission-critical embedded systems. Whenever a critical system fails in the field, significant effort should be made to identify the cause of the failure so that a “fix” can be engineered and applied to other systems before they experience similar failures.

Dump files give programmers the information they need to fix the problem; without them, programmers may have little more to go on than a customer’s cryptic complaint that “the system crashed.”

Quality control

By dividing embedded software into a team of cooperating, memory-protected processes (containing threads), we can readily treat these processes as “components” to be used again in new projects. Because of the explicitly defined (and hardware-enforced) interfaces, these processes can be integrated into applications with confidence that they won’t disrupt the system’s overall reliability. In addition, because the exact binary image (not just the source code) of the process is being reused, we can better control changes and instabilities that might have resulted from recompilation of source code, relinking, new versions of development tools, header files, library routines, etc.

Since the binary image of the process is reused (with its behavior perhaps modified by command-line options), the confidence we have in that binary module from acquired experience in the field more easily carries over to new applications than if the binary image of the process were changed.

As much as we strive to produce error-free code for the systems we deploy, the reality of software-intensive embedded systems is that programming errors will end up in released products. Rather than pretend these bugs don’t exist (until the customer calls to report them), we should adopt a “mission-critical” mindset. Systems should be designed to be tolerant of, and able to recover from, software faults. Making use of the memory protection delivered by integrated MMUs in the embedded systems we build is a good step in that direction.

Full-protection model

Our full-protection model relocates all code in the image into a new virtual space, enabling the MMU hardware and setting up the initial page-table mappings. This allows procnto to start in a correct, MMU-enabled environment. The process manager will then take over this environment, changing the mapping tables as needed by the processes it starts.

Private virtual memory

In the full-protection model, each process is given its own private virtual memory, which spans 2 or 3.5 gigabytes (depending on the CPU). This is accomplished by using the CPU’s MMU. The performance cost for a process switch and a message pass will increase due to the increased complexity of obtaining addressability between two completely private address spaces.

Private memory space starts at 0 on x86, SH-4, ARM, and MIPS processors, but not on the PowerPC, where the space from 0 to 1G is reserved for system processes.

[Figure: user processes 1, 2, and 3 each have a private address space from 0 to 3.5G; the system process (procnto) occupies 3.5G to 4G.]

Full protection VM (on an x86).

The memory cost per process may increase by 4 KB to 8 KB for each process’s page tables. Note that this memory model supports the POSIX fork() call.

Pathname management

Domains of authority

I/O resources are not built into the microkernel, but are instead provided by resource manager processes that may be started dynamically at runtime. The procnto manager allows resource managers, through a standard API, to adopt a subset of the pathname space as a “domain of authority” to administer. As other resource managers adopt their respective domains of authority, procnto becomes responsible for maintaining a pathname tree to track the processes that own portions of the pathname space. An adopted pathname is sometimes referred to as a “prefix” because it prefixes any pathnames that lie beneath it. The adopted pathname is also called a mountpoint, because that’s where a server mounts into the pathname.

This approach to pathname space management is what allows QNX Neutrino to preserve the POSIX semantics for device and file access, while making the presence of those services optional for small embedded systems.

At startup, procnto populates the pathname space with the following pathname prefixes:

Prefix         Description
/              Root of the filesystem.
/proc/boot/    Some of the files from the boot image presented as a flat filesystem.
/proc/         The running processes, each represented by their process ID (PID).
/dev/zero      A device that always returns zero. Used for allocating zero-filled pages using the mmap() function.
/dev/mem       A device that represents all physical memory.
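As an illustration of the /dev/zero entry, the following sketch maps zero-filled pages through it using standard POSIX calls. The function name map_zero_filled() is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of zero-filled, private, read-write memory backed by
 * /dev/zero. Returns NULL on failure; release with munmap(). */
void *map_zero_filled(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1) {
        return NULL;
    }

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */

    return mem == MAP_FAILED ? NULL : mem;
}
```

Because the mapping is private, writes to the returned memory affect only the calling process.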

Resolving pathnames

When a process opens a file, the POSIX-compliant open() library routine first sends the pathname to procnto, where the pathname is compared against the prefix tree to determine which resource managers should be sent the open() message.

The prefix tree may contain identical or partially overlapping regions of authority: multiple servers can register the same prefix. If the regions are identical, the order of resolution can be specified. If the regions are overlapping, the longest match wins. For example, suppose we have these prefixes registered:

/            QNX 4 disk-based filesystem (fs-qnx4.so)
/dev/ser1    Serial device manager (devc-ser*)
/dev/ser2    Serial device manager (devc-ser*)
/dev/hd0     Raw disk volume (devb-eide)

The filesystem manager has registered a prefix for a mounted QNX 4 filesystem (i.e. /). The block device driver has registered a prefix for a block special file that represents an entire physical hard drive (i.e. /dev/hd0). The serial device manager has registered two prefixes for the two PC serial ports.

The following table illustrates the longest-match rule for pathname resolution:

This pathname:       matches:      and resolves to:
/dev/ser1            /dev/ser1     devc-ser*
/dev/ser2            /dev/ser2     devc-ser*
/dev/ser             /             fs-qnx4.so
/dev/hd0             /dev/hd0      devb-eide
/usr/jhsmith/test    /             fs-qnx4.so
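The longest-match rule can be modeled in a few lines. This is a sketch of the rule only, not of procnto's internal prefix tree; it assumes that a prefix matches only on whole pathname components, and the names prefix_table and resolve_manager() are invented for the example.

```c
#include <stddef.h>
#include <string.h>

struct prefix_entry {
    const char *prefix;    /* registered prefix (mountpoint)  */
    const char *manager;   /* resource manager that owns it   */
};

/* The prefixes registered in the example. */
static const struct prefix_entry prefix_table[] = {
    { "/",         "fs-qnx4.so" },
    { "/dev/ser1", "devc-ser*"  },
    { "/dev/ser2", "devc-ser*"  },
    { "/dev/hd0",  "devb-eide"  },
};

/* Does prefix cover path, ending on a component boundary? */
static int prefix_matches(const char *prefix, const char *path)
{
    size_t n = strlen(prefix);

    if (strncmp(prefix, path, n) != 0) {
        return 0;
    }
    if (n > 0 && prefix[n - 1] == '/') {
        return 1;                           /* e.g. the root prefix "/" */
    }
    return path[n] == '\0' || path[n] == '/';
}

/* Return the manager whose prefix is the longest match for path, or
 * NULL if no prefix matches. */
const char *resolve_manager(const char *path)
{
    const struct prefix_entry *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < sizeof prefix_table / sizeof prefix_table[0]; i++) {
        size_t n = strlen(prefix_table[i].prefix);
        if (prefix_matches(prefix_table[i].prefix, path) &&
            (best == NULL || n > best_len)) {
            best = &prefix_table[i];
            best_len = n;
        }
    }
    return best ? best->manager : NULL;
}
```

Here /dev/ser falls through to the root prefix because neither /dev/ser1 nor /dev/ser2 is a prefix of it.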

Consider another example involving three servers:

Server A    A QNX 4 filesystem. Its mountpoint is /. It contains the files bin/true and bin/false.

Server B    A flash filesystem. Its mountpoint is /bin. It contains the files ls and echo.

Server C    A single device that generates numbers. Its mountpoint is /dev/random.

152 Chapter 5 � Process Manager November 2, 2006

Page 175: QNX Neutrino RTOSbetas.qnx.com/download/download/14695/sys_arch.pdf · QNX, Neutrino, Photon, Photon microGUI, Momentics, ... 14 Power Management 301 Power management for embedded

2006, QNX Software Systems GmbH & Co. KG. Pathname management

At this point, the process manager’s internal mount table would look like this:

Mountpoint     Server
/              Server A (QNX 4 filesystem)
/bin           Server B (flash filesystem)
/dev/random    Server C (device)

Of course, each “Server” name is actually an abbreviation for the nd,pid,chid for that particular server channel.

Single-device mountpoints

Now suppose a client wants to send a message to Server C. The client’s code might look like this:

    int fd;
    fd = open("/dev/random", ...);
    read(fd, ...);
    close(fd);

In this case, the C library will ask the process manager for the servers that could potentially handle the path /dev/random. The process manager would return a list of servers:

• Server C (most likely; longest path match)

• Server A (least likely; shortest path match)

From this information, the library will then contact each server in turn and send it an open message, including the component of the path that the server should validate:

1 Server C receives a null path, since the request came in on the same path as the mountpoint.

2 Server A receives the path dev/random, since its mountpoint was /.

As soon as one server positively acknowledges the request, the library won’t contact the remaining servers. This means Server A is contacted only if Server C denies the request.
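The ordering of candidate servers, and the part of the path each one is asked to validate, can be modeled as follows. This is a sketch of the behavior described here, not the C library's implementation; the names mount_table, find_candidates(), and remainder_after() are invented, and matching on whole path components is an assumption.

```c
#include <stddef.h>
#include <string.h>

struct mount     { const char *mountpoint; const char *server; };
struct candidate { const char *server;     const char *rest;   };

/* The mount table from the three-server example. */
static const struct mount mount_table[] = {
    { "/",           "Server A" },
    { "/bin",        "Server B" },
    { "/dev/random", "Server C" },
};
#define MOUNT_COUNT (sizeof mount_table / sizeof mount_table[0])

/* If mountpoint covers path, return the part of path that the server
 * is asked to validate (possibly an empty string); otherwise NULL. */
static const char *remainder_after(const char *mountpoint, const char *path)
{
    size_t n = strlen(mountpoint);

    if (strncmp(mountpoint, path, n) != 0) {
        return NULL;
    }
    if (n > 0 && mountpoint[n - 1] == '/') {
        return path + n;                /* the root mountpoint "/" */
    }
    if (path[n] == '\0') {
        return path + n;                /* exact match: null path  */
    }
    return path[n] == '/' ? path + n + 1 : NULL;
}

/* Fill out[] with every server that could handle path, longest
 * mountpoint first. Returns the number of candidates. */
size_t find_candidates(const char *path, struct candidate out[MOUNT_COUNT])
{
    size_t lens[MOUNT_COUNT];
    size_t count = 0;

    for (size_t i = 0; i < MOUNT_COUNT; i++) {
        const char *rest = remainder_after(mount_table[i].mountpoint, path);
        if (rest == NULL) {
            continue;
        }
        /* Insertion sort keeps the longest mountpoint at the front. */
        size_t len = strlen(mount_table[i].mountpoint);
        size_t pos = count;
        while (pos > 0 && lens[pos - 1] < len) {
            out[pos] = out[pos - 1];
            lens[pos] = lens[pos - 1];
            pos--;
        }
        out[pos].server = mount_table[i].server;
        out[pos].rest = rest;
        lens[pos] = len;
        count++;
    }
    return count;
}
```

A client library following this model would send the open message to out[0] first and move on to the next candidate only if that server denied the request.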

This process is fairly straightforward with single device entries, where the first server is generally the server that will handle the request. Where it becomes interesting is in the case of unioned filesystem mountpoints.

Unioned filesystem mountpoints

Let’s assume we have two servers set up as before:

Server A    A QNX 4 filesystem. Its mountpoint is /. It contains the files bin/true and bin/false.

Server B    A flash filesystem. Its mountpoint is /bin. It contains the files ls and echo.

Note that each server has a /bin directory, but with different contents.

Once both servers are mounted, you would see the following due to the unioning of the mountpoints:

/             Server A
/bin          Servers A and B
/bin/echo     Server B
/bin/false    Server A
/bin/ls       Server B
/bin/true     Server A

What’s happening here is that the resolution for the path /bin takes place as before, but rather than limit the return to just one connection ID, all the servers are contacted and asked about their handling for the path:


DIR *dirp;
dirp = opendir("/bin");
closedir(dirp);

which results in:

1 Server B receives a null path, since the request came in on the same path as the mountpoint.

2 Server A receives the path "bin", since its mountpoint was "/".

The result now is that we have a collection of file descriptors to servers that handle the path /bin (in this case two servers); the actual directory name entries are read in turn when a readdir() is called. If any of the names in the directory are accessed with a regular open, then the normal resolution procedure takes place and only one server is accessed.
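To illustrate reading entries "in turn" from several servers, here is a minimal sketch in C. It isn't the C library's implementation: struct dir_listing, struct union_dir, and union_readdir() are hypothetical names, each listing simply stands in for what one server would return, and the sketch makes no attempt to filter duplicate names.

```c
#include <stddef.h>

/* Hypothetical stand-in for the entries one server reports. */
struct dir_listing {
    const char *const *names;
    size_t count;
};

/* Cursor over a unioned directory: one listing per server that
 * accepted the request, consumed one after another. */
struct union_dir {
    const struct dir_listing *parts;
    size_t nparts;
    size_t part;  /* which server's listing is being read */
    size_t pos;   /* position within that listing */
};

/* Return the next entry name, or NULL once every listing is exhausted. */
const char *union_readdir(struct union_dir *d)
{
    while (d->part < d->nparts) {
        const struct dir_listing *p = &d->parts[d->part];
        if (d->pos < p->count)
            return p->names[d->pos++];
        d->part++;   /* move on to the next server */
        d->pos = 0;
    }
    return NULL;
}
```

For the /bin example, a cursor holding Server B's listing (ls, echo) followed by Server A's (true, false) would hand back all four names before returning NULL.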

Why overlay mountpoints?

This overlaying of mountpoints is a very handy feature when doing field updates, servicing, etc. It also makes for a more unified system, where pathnames result in connections to servers regardless of what services they’re providing, thus resulting in a more unified API.

Symbolic prefixes

We’ve discussed prefixes that map to a resource manager. A second form of prefix, known as a symbolic prefix, is a simple string substitution for a matched prefix.

You create symbolic prefixes using the POSIX ln (link) command. This command is typically used to create hard links on a filesystem, or symbolic links when you specify the -s option. If you also specify the -P option, then a symbolic link is created in the in-memory prefix space of procnto.


Command                                Description
ln -s existing_file symbolic_link      Create a filesystem symbolic link.
ln -Ps existing_file symbolic_link     Create a prefix tree symbolic link.

Note that a prefix tree symbolic link will always take precedence over a filesystem symbolic link.

For example, assume you’re running on a machine that doesn’t have a local filesystem. However, there’s a filesystem on another node (say neutron) that you wish to access as “/bin”. You accomplish this using the following symbolic prefix:

ln -Ps /net/neutron/bin /bin

This will cause /bin to be mapped into /net/neutron/bin. For example, /bin/ls will be replaced with the following:

/net/neutron/bin/ls

This new pathname will again be applied against the prefix tree, but this time the prefix matched will be /net, which will point to npm-qnet. The npm-qnet resource manager will then resolve the neutron component, and redirect further resolution requests to the node called neutron. On node neutron, the rest of the pathname (i.e. /bin/ls) will be resolved against the prefix space on that node. This will resolve to the filesystem manager on node neutron, where the open() request will be directed. With just a few characters, this symbolic prefix has allowed us to access a remote filesystem as though it were local.
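The string substitution performed for a symbolic prefix can be sketched as follows. This is an illustration only; apply_symbolic_prefix() is a hypothetical helper, not a procnto interface, and it models a single substitution rather than the repeated resolution against the prefix tree described above.

```c
#include <stdio.h>
#include <string.h>

/* Replace a matched prefix of path with target, writing the result to out.
 * Returns 1 if a substitution was made, 0 if the prefix doesn't match,
 * and -1 if the result doesn't fit in outsz bytes. */
int apply_symbolic_prefix(const char *path, const char *prefix,
                          const char *target, char *out, size_t outsz)
{
    size_t n = strlen(prefix);

    if (strncmp(path, prefix, n) != 0)
        return 0;
    if (path[n] != '\0' && path[n] != '/')
        return 0;                    /* "/bin" must not match "/binary" */

    int len = snprintf(out, outsz, "%s%s", target, path + n);
    if (len < 0 || (size_t)len >= outsz)
        return -1;
    return 1;
}
```

Called with the path /bin/ls, the prefix /bin, and the target /net/neutron/bin, the sketch produces /net/neutron/bin/ls, which would then be resolved again from the top of the prefix tree.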

It’s not necessary to run a local filesystem process to perform the redirection. A diskless workstation’s prefix tree might look something like this:


[Figure: prefix tree of a diskless workstation. Pathname nodes: /, dev, ser1, ser2, console. Managers: devc-ser..., devc-con, npm-qnet.so.]

With this prefix tree, local devices such as /dev/ser1 or /dev/console will be routed to the local character device manager, while requests for other pathnames will be routed to the remote filesystem.

Creating special device names

You can also use symbolic prefixes to create special device names. For example, if your modem was on /dev/ser1, you could create a symbolic prefix of /dev/modem as follows:

ln -Ps /dev/ser1 /dev/modem

Any request to open /dev/modem will be replaced with /dev/ser1. This mapping would allow the modem to be changed to a different serial port simply by changing the symbolic prefix and without affecting any applications.

Relative pathnames

Pathnames need not start with a slash. In such cases, the path is considered relative to the current working directory. The OS maintains the current working directory as a character string. Relative pathnames are always converted to full network pathnames by prepending the current working directory string to the relative pathname.

Note that different behaviors result when your current working directory starts with a slash versus starting with a network root.

A note about cd

In some traditional UNIX systems, the cd (change directory) command modifies the pathname given to it if that pathname contains symbolic links. As a result, the pathname of the new current working directory (which you can display with pwd) may differ from the one given to cd.

In QNX Neutrino, however, cd doesn’t modify the pathname, aside from collapsing .. references. For example:

cd /usr/home/dan/test/../doc

would result in a current working directory of /usr/home/dan/doc, even if some of the elements in the pathname were symbolic links.
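Collapsing .. references purely as a string operation, without consulting the filesystem, might look like the sketch below. The function collapse_dotdot() is a hypothetical helper written for this illustration; it also drops "." components and repeated slashes, which the text above doesn't discuss.

```c
#include <stddef.h>
#include <string.h>

/* Collapse ".." components of an absolute path lexically; symbolic
 * links are never examined. out must hold strlen(path) + 2 bytes. */
void collapse_dotdot(const char *path, char *out)
{
    size_t len = 0;               /* length of the result built so far */
    const char *p = path;

    while (*p != '\0') {
        while (*p == '/')
            p++;                  /* skip separators */
        const char *start = p;
        while (*p != '\0' && *p != '/')
            p++;
        size_t n = (size_t)(p - start);

        if (n == 0 || (n == 1 && start[0] == '.'))
            continue;             /* nothing to add */
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            while (len > 0 && out[len - 1] != '/')
                len--;            /* drop the last component... */
            if (len > 0)
                len--;            /* ...and the slash before it */
            continue;
        }
        out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
    }
    if (len == 0)
        out[len++] = '/';         /* everything collapsed to the root */
    out[len] = '\0';
}
```

Given /usr/home/dan/test/../doc, the sketch writes /usr/home/dan/doc, matching the cd example, regardless of whether test is a symbolic link.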

For more information about symbolic links and .. references, see “QNX 4 filesystem” in the Working with Filesystems chapter of the QNX Neutrino User’s Guide.

File descriptor namespace

Once an I/O resource has been opened, a different namespace comes into play. The open() returns an integer referred to as a file descriptor (FD), which is used to direct all further I/O requests to that resource manager.

Unlike the pathname space, the file descriptor namespace is completely local to each process. The resource manager uses the combination of a SCOID (server connection ID) and FD (file descriptor/connection ID) to identify the control structure associated with the previous open() call. This structure is referred to as an open control block (OCB) and is contained within the resource manager.

The following diagram shows an I/O manager taking some SCOID, FD pairs and mapping them to OCBs.


[Figure: The SCOID and FD map to an OCB of an I/O Manager. Diagram labels: system space (SCOID, FD, sparse array) and I/O Manager space (open control blocks).]

Open control blocks

The open control block (OCB) contains active information about the open resource. For example, the filesystem keeps the current seek point within the file here. Each open() creates a new OCB. Therefore, if a process opens the same file twice, any calls to lseek() using one FD will not affect the seek point of the other FD. The same is true for different processes opening the same file.
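The independence of two separate opens can be observed from the client side with standard POSIX calls. The following is a small sketch; the helper name offset_of_second_open() is invented for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path twice, seek on the first descriptor, and report the offset
 * seen through the second one. Returns -1 on error. */
off_t offset_of_second_open(const char *path, off_t seek_to)
{
    int fd1 = open(path, O_RDONLY);
    if (fd1 == -1)
        return -1;
    int fd2 = open(path, O_RDONLY);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    off_t result = -1;
    if (lseek(fd1, seek_to, SEEK_SET) != -1)
        result = lseek(fd2, 0, SEEK_CUR);   /* query without moving */

    close(fd1);
    close(fd2);
    return result;
}
```

Because each open() has its own seek point, the second descriptor should still report offset 0 after the first one has been moved.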

The following diagram shows two processes, in which one opens the same file twice, and the other opens it once. There are no shared FDs.


[Figure: Two processes open the same file. Diagram labels: Process A and Process B with their file descriptors, and the server process with separate open control blocks for /tmp/file.]

Note: FDs are a process resource, not a thread resource.

Several file descriptors in one or more processes can refer to the same OCB. This is accomplished by two means:

• A process may use the dup(), dup2(), or fcntl() functions to create a duplicate file descriptor that refers to the same OCB.

• When a new process is created via vfork(), fork(), or spawn(), all open file descriptors are by default inherited by the new process; these inherited descriptors refer to the same OCBs as the corresponding file descriptors in the parent process.

When several FDs refer to the same OCB, then any change in the state of the OCB is immediately seen by all processes that have file descriptors linked to the same OCB.


For example, if one process uses the lseek() function to change the position of the seek point, then reading or writing takes place from the new position no matter which linked file descriptor is used.
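The shared seek point can be demonstrated with standard POSIX calls. In this sketch, the helper names offset_after_dup_seek() and offset_after_child_seek() are made up for illustration; the first uses dup() and the second uses fork() so that a child moves the seek point the parent later observes.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Seek through a dup()'ed descriptor, then report the offset seen
 * through the original descriptor. Returns -1 on error. */
off_t offset_after_dup_seek(const char *path, off_t seek_to)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    int copy = dup(fd);
    if (copy == -1) {
        close(fd);
        return -1;
    }

    off_t result = -1;
    if (lseek(copy, seek_to, SEEK_SET) != -1)
        result = lseek(fd, 0, SEEK_CUR);

    close(copy);
    close(fd);
    return result;
}

/* Let a fork()'ed child seek on the inherited descriptor, then report
 * the offset the parent sees. Returns -1 on error. */
off_t offset_after_child_seek(const char *path, off_t seek_to)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: move the shared seek point and exit immediately. */
        _exit(lseek(fd, seek_to, SEEK_SET) == -1 ? 1 : 0);
    }

    int status = 0;
    off_t result = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = lseek(fd, 0, SEEK_CUR);

    close(fd);
    return result;
}
```

In both cases the reported offset should equal seek_to, since the duplicated or inherited descriptor refers to the same seek point as the original.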

The following diagram shows two processes in which one opens a file twice, then does a dup() to get a third FD. The process then creates a child that inherits all open files.

[Figure: A process opens a file twice. Diagram labels: a parent process and a child process, each with file descriptors 0, 1, and 2, and the server process with the open control blocks for /tmp/file.]

You can prevent a file descriptor from being inherited when you spawn() or exec*() by calling the fcntl() function and setting the FD_CLOEXEC flag.
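Setting the flag takes two fcntl() calls if you want to preserve any other descriptor flags. A minimal sketch, with set_cloexec() as a helper name chosen for this example:

```c
#include <fcntl.h>

/* Mark fd close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

The flag belongs to the individual file descriptor, so a dup()'ed copy of the same descriptor is unaffected and must be marked separately if it shouldn't be inherited either.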


Chapter 6

Dynamic Linking

In this chapter...

Shared objects
How shared objects are used


Shared objects

In a typical system, a number of programs will be running. Each program relies on a number of functions, some of which will be “standard” C library functions, like printf(), malloc(), write(), etc.

If every program uses the standard C library, it follows that each program would normally have a unique copy of this particular library present within it. Unfortunately, this results in wasted resources. Since the C library is common, it makes more sense to have each program reference the common instance of that library, instead of having each program contain a copy of the library. This approach yields several advantages, not the least of which is the savings in terms of total system memory required.

Statically linked

The term statically linked means that the program and the particular library that it’s linked against are combined together by the linker at linktime. This means that the binding between the program and the particular library is fixed and known at linktime, well in advance of the program ever running. It also means that we can’t change this binding, unless we relink the program with a new version of the library.

You might consider linking a program statically in cases where you aren’t sure whether the correct version of a library will be available at runtime, or if you are testing a new version of a library that you don’t yet want to install as shared.

Programs that are linked statically are linked against archives of objects (libraries) that typically have the extension of .a. An example of such a collection of objects is the standard C library, libc.a.

Dynamically linked

The term dynamically linked means that the program and the particular library it references are not combined together by the linker at linktime. Instead, the linker places information into the executable that tells the loader which shared object module the code is in and which runtime linker should be used to find and bind the references. This means that the binding between the program and the shared object is done at runtime: before the program starts, the appropriate shared objects are found and bound.

This type of program is called a partially bound executable, because it isn’t fully resolved: the linker, at linktime, didn’t cause all the referenced symbols in the program to be associated with specific code from the library. Instead, the linker simply said: “This program calls some functions within a particular shared object, so I’ll just make a note of which shared object these functions are in, and continue on.” Effectively, this defers the binding until runtime.

Programs that are linked dynamically are linked against shared objects that have the extension .so. An example of such an object is the shared object version of the standard C library, libc.so.

You use a command-line option to the compiler driver qcc to tell the tool chain whether you’re linking statically or dynamically. This command-line option then determines the extension used (either .a or .so).

Augmenting code at runtime

Taking this one step further, a program may not know which functions it needs to call until it’s running. While this may seem a little strange initially (after all, how could a program not know what functions it’s going to call?), it really can be a very powerful feature. Here’s why.

Consider a “generic” disk driver. It starts, probes the hardware, and detects a hard disk. The driver would then dynamically load the io-blk code to handle the disk blocks, because it found a block-oriented device. Now that the driver has access to the disk at the block level, it finds two partitions present on the disk: a DOS partition and a QNX 4 partition. Rather than force the disk driver to contain filesystem drivers for all possible partition types it may encounter, we kept it simple: it doesn’t have any filesystem drivers! At runtime, it detects the two partitions and then knows that it should load the fs-dos.so and fs-qnx4.so filesystem code to handle those partitions.


By deferring the decision of which functions to call, we’ve enhanced the flexibility of the disk driver (and also reduced its size).
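The decision the driver defers to runtime can be pictured as a lookup from what it finds on the disk to the module it should load. The sketch below is an assumption-laden illustration, not the driver's real logic: fs_module_for() is a hypothetical function, and the type codes are the conventional MBR partition type values for FAT and QNX 4 partitions, which this document doesn't itself list.

```c
#include <stddef.h>

/* Map an MBR partition type byte to the filesystem module that would
 * handle it, or NULL if this sketch doesn't know the type. */
const char *fs_module_for(unsigned char partition_type)
{
    switch (partition_type) {
    case 0x01: case 0x04: case 0x06:    /* FAT12 and FAT16 variants */
    case 0x0B: case 0x0C: case 0x0E:    /* FAT32 and LBA variants */
        return "fs-dos.so";
    case 0x4D: case 0x4E: case 0x4F:    /* QNX 4 partitions */
        return "fs-qnx4.so";
    default:
        return NULL;
    }
}
```

A driver built this way would pass the returned name to a runtime loading call such as dlopen() only for the partition types it actually encounters, so unused filesystem code never occupies memory.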

How shared objects are used

To understand how a program makes use of shared objects, let’s first see the format of an executable and then examine the steps that occur when the program starts.

ELF format

QNX Neutrino uses the ELF (Executable and Linking Format) binary format, which is currently used in SVR4 Unix systems. ELF not only simplifies the task of making shared libraries, but also enhances dynamic loading of modules at runtime.

In the following diagram, we show two views of an ELF file: the linking view and the execution view. The linking view, which is used when the program or library is linked, deals with sections within an object file. Sections contain the bulk of the object file information: data, instructions, relocation information, symbols, debugging information, etc. The execution view, which is used when the program runs, deals with segments.

At linktime, the program or library is built by merging together sections with similar attributes into segments. Typically, all the executable and read-only data sections are combined into a single “text” segment, while the data and “BSS”s are combined into the “data” segment. These segments are called load segments, because they need to be loaded in memory at process creation. Other sections such as symbol information and debugging sections are merged into other, nonload segments.


Linking view                        Execution view
ELF header                          ELF header
Program header table (optional)     Program header table
Section 1                           Segment 1
...                                 Segment 2
Section n                           ...
...
Section header table                Section header table (optional)

Object file format: linking view and execution view.

ELF without COFF

Most implementations of ELF loaders are derived from COFF (Common Object File Format) loaders; they use the linking view of the ELF objects at load time. This is inefficient because the program loader must load the executable using sections. A typical program could contain a large number of sections, each of which would have to be located in the program and loaded into memory separately.

QNX Neutrino, however, doesn’t rely at all on the COFF technique of loading sections. When developing our ELF implementation, we worked directly from the ELF spec and kept efficiency paramount. The ELF loader uses the “execution view” of the program. By using the execution view, the task of the loader is greatly simplified: all it has to do is copy to memory the load segments (usually two) of the program or library. As a result, process creation and library loading operations are much faster.
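To get a feel for the execution view, you can walk an ELF file's program header table and count the load segments. The sketch below is not the QNX loader; it's a small reader that assumes the ELF definitions are available from <elf.h> (true on Linux; other systems may keep them in a different header) and that the file uses the host's byte order. The function name count_load_segments() is made up for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Count the PT_LOAD entries in the program header table of a 32-bit or
 * 64-bit ELF file in native byte order. Returns -1 on error. */
int count_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    unsigned char ident[EI_NIDENT];
    int count = -1;

    if (fread(ident, 1, sizeof ident, f) != sizeof ident ||
        memcmp(ident, ELFMAG, SELFMAG) != 0)
        goto done;                       /* not an ELF file */
    rewind(f);

    if (ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            goto done;
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;               /* assumes e_phentsize >= sizeof ph */
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;
                goto done;
            }
            if (ph.p_type == PT_LOAD)
                count++;
        }
    } else if (ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            goto done;
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf32_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf32_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;
                goto done;
            }
            if (ph.p_type == PT_LOAD)
                count++;
        }
    }

done:
    fclose(f);
    return count;
}
```

Only the program headers are consulted; the section header table is never read, which is the point of loading from the execution view. The number of load segments you see depends on the toolchain that produced the file.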

The process

The diagram below shows the memory layout of a typical process. The process load segments (corresponding to “text” and “data” in the diagram) are loaded at the process’s base address. The main stack is located just below and grows downwards. Any additional threads that are created will have their own stacks, located below the main stack. Each of the stacks is separated by a guard page to detect stack overflows. The heap is located above the process and grows upwards.

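[Figure: Process memory layout on an x86. Diagram labels, from the top of the address space (0xFFFFFFFF) down to 0: reserved, shared libraries, shared memory, heap, data, text (at the process base address), and stacks separated by a guard page.]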

In the middle of the process’s address space, a large region is reserved for shared objects. Shared libraries are located at the top of the address space and grow downwards.

When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program’s ELF header. If the program header indicates that the executable was linked against a shared library, the process manager will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code. The process manager will load this shared library in memory and will then pass control to the runtime linker code in this library.

Runtime linker

The runtime linker is invoked when a program that was linked against a shared object is started or when a program requests that a shared object be dynamically loaded. The runtime linker is contained within the C runtime library.

The runtime linker performs several tasks when loading a shared library (.so file):

1 If the requested shared library isn’t already loaded in memory, the runtime linker loads it. If the shared library name is fully qualified (i.e. begins with a slash), it’s loaded directly from the specified location. If it can’t be found there, no further searches are performed. If it’s not a fully qualified pathname, the runtime linker searches for it in the directories specified by LD_LIBRARY_PATH only if the program isn’t marked as setuid.

2 If the shared library still isn’t found, and if the executable’s dynamic section contains a DT_RPATH tag, then the path specified by DT_RPATH is searched next.

3 If the shared library still isn’t found, then the runtime linker searches the default library search path as specified by the LD_LIBRARY_PATH environment variable to procnto. If none has been specified, then the default library path is set to the image filesystem’s path.

4 Once the requested shared library is found, it’s loaded into memory. For ELF shared libraries, this is a very efficient operation: the runtime linker simply needs to use the mmap() call twice to map the two load segments into memory.


5 The shared library is then added to the internal list of all libraries that the process has loaded. The runtime linker maintains this list.

6 The runtime linker then decodes the dynamic section of the shared object.

This dynamic section provides information to the linker about other libraries that this library was linked against. It also gives information about the relocations that need to be applied and the external symbols that need to be resolved. The runtime linker will first load any other required shared libraries (which may themselves reference other shared libraries). It will then process the relocations for each library. Some of these relocations are local to the library, while others require the runtime linker to resolve a global symbol. In the latter case, the runtime linker will search through the list of libraries for this symbol. In ELF files, hash tables are used for the symbol lookup, so they’re very fast. The order in which libraries are searched for symbols is very important, as we’ll see in the section on “Symbol name resolution” below.

Once all relocations have been applied, any initialization functions that have been registered in the shared library’s init section are called. This is used in some implementations of C++ to call global constructors.
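The search order in steps 1 to 3 above can be summarized in code. The following sketch only assembles the ordered list of directories; it doesn't touch the filesystem, and it isn't the runtime linker's implementation. The function build_search_path() and all of its parameter names are hypothetical, with default_path standing in for the default library search path of step 3.

```c
#include <stdio.h>
#include <string.h>

/* Build the colon-separated list of directories that would be searched
 * for a library name, in order. Returns the number of path lists joined
 * (0 for a fully qualified name, which is never searched for), or -1 if
 * out is too small. */
int build_search_path(const char *name, int is_setuid,
                      const char *ld_library_path, const char *rpath,
                      const char *default_path, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;
    out[0] = '\0';

    if (name[0] == '/')
        return 0;                    /* loaded directly, no searching */

    const char *parts[3];
    int n = 0;
    if (!is_setuid && ld_library_path != NULL && *ld_library_path != '\0')
        parts[n++] = ld_library_path;    /* step 1: skipped for setuid */
    if (rpath != NULL && *rpath != '\0')
        parts[n++] = rpath;              /* step 2: DT_RPATH */
    if (default_path != NULL && *default_path != '\0')
        parts[n++] = default_path;       /* step 3: default search path */

    size_t used = 0;
    for (int i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, "%s%s",
                         i > 0 ? ":" : "", parts[i]);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;
        used += (size_t)w;
    }
    return n;
}
```

Note how a setuid program drops the first entry entirely, while a name that begins with a slash bypasses the search altogether.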

Loading a shared library at runtime

A process can load a shared library at runtime by using the dlopen() call, which instructs the runtime linker to load this library. Once the library is loaded, the program can call any function within that library by using the dlsym() call to determine its address.

Remember: shared libraries are available only to processes that are dynamically linked.

The program can also determine the symbol associated with a given address by using the dladdr() call. Finally, when the process no longer needs the shared library, it can call dlclose() to unload the library from memory.
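The dlopen(), dlsym(), and dlclose() sequence can be sketched as follows. The wrapper call_int_function() is a name invented for this example, and it assumes the looked-up symbol really has the signature int f(int); passing a symbol of any other type would be an error the sketch cannot detect. On some systems you may need to link against a separate dl library.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Load a library, look up a function of type int f(int), call it with
 * arg, and unload the library again. A NULL library name asks for the
 * symbols already visible to the program. Returns 0 and stores the
 * function's return value in *result on success, -1 on failure. */
int call_int_function(const char *library, const char *symbol,
                      int arg, int *result)
{
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return -1;

    int (*fn)(int);
    /* Convert the object pointer from dlsym() to a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);
    dlclose(handle);
    return 0;
}
```

For instance, looking up toupper with a NULL library name in a dynamically linked program and calling it with 'a' should store 'A' in the result. A library name that can't be found makes the call fail without invoking anything.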

Symbol name resolution

When the runtime linker loads a shared library, the symbols within that library have to be resolved. The order and the scope of the symbol resolution are important. If a shared library calls a function that happens to exist by the same name in several libraries that the program has loaded, the order in which these libraries are searched for this symbol is critical. This is why the OS defines several options that can be used when loading libraries.

All the objects (executables and libraries) that have global scope are stored on an internal list (the global list). Any global-scope object, by default, makes available all of its symbols to any shared library that gets loaded. The global list initially contains the executable and any libraries that are loaded at the program’s startup.

By default, when a new shared library is loaded by using the dlopen() call, symbols within that library are resolved by searching in this order through:

1 The shared library.

2 The global list.

3 Any dependent objects that the shared library references (i.e. any other libraries that the shared library was linked against).

The runtime linker’s scoping behavior can be changed in two ways when dlopen()’ing a shared library:

• When the program loads a new library, it may instruct the runtime linker to place the library’s symbols on the global list by passing the RTLD_GLOBAL flag to the dlopen() call. This will make the library’s symbols available to any libraries that are subsequently loaded.

• The list of objects that are searched when resolving the symbols within the shared library can be modified. If the RTLD_GROUP flag is passed to dlopen(), then only objects that the library directly references will be searched for symbols. If the RTLD_WORLD flag is passed, only the objects on the global list will be searched.


Chapter 7

Resource Managers

In this chapter...

Introduction
What is a resource manager?
Resource manager architecture
Summary


Introduction

To give QNX Neutrino a great degree of flexibility, to minimize the runtime memory requirements of the final system, and to cope with the wide variety of devices that may be found in a custom embedded system, the OS allows user-written processes to act as resource managers that can be started and stopped dynamically.

Resource managers are typically responsible for presenting an interface to various types of devices. This may involve managing actual hardware devices (like serial ports, parallel ports, network cards, and disk drives) or virtual devices (like /dev/null, a network filesystem, and pseudo-ttys).

In other operating systems, this functionality is traditionally associated with device drivers. But unlike device drivers, resource managers don’t require any special arrangements with the kernel. In fact, a resource manager looks just like any other user-level program.

What is a resource manager?

Since QNX Neutrino is a distributed, microkernel OS with virtually all nonkernel functionality provided by user-installable programs, a clean and well-defined interface is required between client programs and resource managers. All resource manager functions are documented; there’s no “magic” or private interface between the kernel and a resource manager.

In fact, a resource manager is basically a user-level server program that accepts messages from other programs and, optionally, communicates with hardware. Again, the power and flexibility of our native IPC services allow the resource manager to be decoupled from the OS.

The binding between the resource manager and the client programs that use the associated resource is done through a flexible mechanism called pathname space mapping.

In pathname space mapping, an association is made between a pathname and a resource manager. The resource manager sets up this pathname space mapping by informing the process manager that it is the one responsible for handling requests at (or below, in the case of filesystems) a certain mountpoint. This allows the process manager to associate services (i.e. functions provided by resource managers) with pathnames.

For example, a serial port may be managed by a resource manager called devc-ser*, but the actual resource may be called /dev/ser1 in the pathname space. Therefore, when a program requests serial port services, it typically does so by opening a serial port, in this case /dev/ser1.
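From the client's point of view, requesting a service through the pathname space is ordinary POSIX file I/O. Here's a minimal sketch; send_to_resource() is a helper name chosen for this example, and the client neither knows nor cares which resource manager is behind the pathname.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open a resource by pathname, write len bytes to it, and close it.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_to_resource(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, data, len);
    close(fd);
    return written;
}
```

A program could call this with /dev/ser1 to send bytes out the serial port; the same call works unchanged on any other pathname that a resource manager has registered.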

Why write a resource manager?

Here are a few reasons why you may want to write a resource manager:

• The client API is POSIX.

The API for communicating with the resource manager is, for the most part, POSIX. All C programmers are familiar with the open(), read(), and write() functions. Training costs are minimized, and so is the need to document the interface to your server.

• You can reduce the number of interface types.

If you have many server processes, writing each server as a resource manager keeps the number of different interfaces that clients need to use to a minimum.

For example, suppose you have a team of programmers building your overall application, and each programmer is writing one or more servers for that application. These programmers may work directly for your company, or they may belong to partner companies who are developing add-on hardware for your modular platform.

If the servers are resource managers, then the interface to all of those servers is the POSIX functions: open(), read(), write(), and whatever else makes sense. For control-type messages that don’t fit into a read/write model, there’s devctl() (although devctl() isn’t POSIX).


� Command-line utilities can communicate with resource managers.

Since the API for communicating with a resource manager is thePOSIX set of functions, and since standard POSIX utilities use thisAPI, the utilities can be used for communicating with the resourcemanagers.

For instance, the tiny TCP/IP protocol module containsresource-manager code that registers the name/proc/ipstats.If you open this name and read from it, the resource manager coderesponds with a body of text that describes the statistics for IP.

The cat utility takes the name of a file and opens the file, reads from it, and displays whatever it reads to standard output (typically the screen). As a result, you can type:

    cat /proc/ipstats

The resource manager code in the TCP/IP protocol module responds with text such as:

    Ttcpip Sep 5 2000 08:56:16

    verbosity level 0
    ip checksum errors: 0
    udp checksum errors: 0
    tcp checksum errors: 0

    packets sent: 82
    packets received: 82

    lo0 : addr 127.0.0.1 netmask 255.0.0.0 up

    DST: 127.0.0.0 NETMASK: 255.0.0.0 GATEWAY: lo0

    TCP 127.0.0.1.1227 > 127.0.0.1.6000 ESTABLISHED snd 0 rcv 0
    TCP 127.0.0.1.6000 > 127.0.0.1.1227 ESTABLISHED snd 0 rcv 0
    TCP 0.0.0.0.6000 LISTEN

You could also use command-line utilities for a robot-arm driver. The driver could register the name /dev/robot/arm/angle, and any writes to this device are interpreted as the angle to set the robot arm to. To test the driver from the command line, you’d type:

    echo 87 >/dev/robot/arm/angle

The echo utility opens /dev/robot/arm/angle and writes the string (“87”) to it. The driver handles the write by setting the robot arm to 87 degrees. Note that this was accomplished without writing a special tester program.


Another example would be names such as /dev/robot/registers/r1, r2, and so on. Reading from these names returns the contents of the corresponding registers; writing to these names sets the corresponding registers to the given values.

Even if all of your other IPC is done via some non-POSIX API, it’s still worth having one thread written as a resource manager to respond to reads and writes like those shown above.

The types of resource managers

Depending on how much work you want to do yourself in order to present a proper POSIX filesystem to the client, you can break resource managers into two types:

• Device resource managers

• Filesystem resource managers

Device resource managers

Device resource managers create only single-file entries in the filesystem, each of which is registered with the process manager. Each name usually represents a single device. These resource managers typically rely on the resource-manager library to do most of the work in presenting a POSIX device to the user.

For example, a serial port driver registers names such as /dev/ser1 and /dev/ser2. When the user does ls -l /dev, the library does the necessary handling to respond to the resulting _IO_STAT messages with the proper information. The person who writes the serial port driver can concentrate instead on the details of managing the serial port hardware.

Filesystem resource managers

Filesystem resource managers register a mountpoint with the process manager. A mountpoint is the portion of the path that’s registered with the process manager. The remaining parts of the path are managed by the filesystem resource manager. For example, when a filesystem resource manager attaches a mountpoint at /mount, and the path /mount/home/thomasf is examined:

/mount/          Identifies the mountpoint that’s managed by the process manager.

home/thomasf     Identifies the remaining part that’s to be managed by the filesystem resource manager.

Here are some examples of using filesystem resource managers:

• flash filesystem drivers (although a flash driver toolkit is available that takes care of these details)

• a tar filesystem process that presents the contents of a tar file as a filesystem that the user can cd into and ls from

• a mailbox-management process that registers the name /mailboxes and manages individual mailboxes that look like directories, and files that contain the actual messages.

Communication via native IPC

Once a resource manager has established its pathname prefix, it will receive messages whenever any client program tries to do an open(), read(), write(), etc. on that pathname. For example, after devc-ser* has taken over the pathname /dev/ser1, and a client program executes:

    fd = open ("/dev/ser1", O_RDONLY);

the client’s C library will construct an io_open message, which it then sends to the devc-ser* resource manager via IPC.

Some time later, when the client program executes:

    read (fd, buf, BUFSIZ);

the client’s C library constructs an io_read message, which is then sent to the resource manager.

A key point is that all communications between the client program and the resource manager are done through native IPC messaging. This allows for a number of unique features:

• A well-defined interface to application programs. In a development environment, this allows a very clean division of labor for the implementation of the client side and the resource manager side.

• A simple interface to the resource manager. Since all interactions with the resource manager go through native IPC, and there are no special “back door” hooks or arrangements with the OS, the writer of a resource manager can focus on the task at hand, rather than worry about all the special considerations needed in other operating systems.

• Free network transparency. Since the underlying native IPC messaging mechanism is inherently network-distributed without any additional effort required by the client or server (resource manager), programs can seamlessly access resources on other nodes in the network without even being aware that they’re going over a network.

All QNX Neutrino device drivers and filesystems are implemented as resource managers. This means that everything that a “native” QNX Neutrino device driver or filesystem can do, a user-written resource manager can do as well.

Consider FTP filesystems, for instance. Here a resource manager would take over a portion of the pathname space (e.g. /ftp) and allow users to cd into FTP sites to get files. For example, cd /ftp/rtfm.mit.edu/pub would connect to the FTP site rtfm.mit.edu and change directory to /pub. After that point, the user could open, edit, or copy files.

Application-specific filesystems would be another example of a user-written resource manager. Given an application that makes extensive use of disk-based files, a custom tailored filesystem can be written that works with that application and delivers superior performance.

The possibilities for custom resource managers are limited only by the application developer’s imagination.

Resource manager architecture

Here is the heart of a resource manager:

    initialize the dispatch interface
    register the pathname with the process manager
    DO forever
        receive a message
        SWITCH on the type of message
            CASE io_open:
                perform io_open processing
                ENDCASE
            CASE io_read:
                perform io_read processing
                ENDCASE
            CASE io_write:
                perform io_write processing
                ENDCASE
            .   // etc. handle all other messages
            .   // that may occur, performing
            .   // processing as appropriate
        ENDSWITCH
    ENDDO

The architecture contains three parts:

1 A channel is created so that client programs can connect to the resource manager to send it messages.

2 The pathname (or pathnames) that the resource manager is going to be responsible for is registered with the process manager, so that it can resolve open requests for that particular pathname to this resource manager.

3 Messages are received and processed.

This message-processing structure (the switch/case, above) is required for each and every resource manager. However, we provide a set of convenient library functions to handle this functionality (and other key functionality as well).

Message types

Architecturally, there are two categories of messages that a resource manager will receive:

• connect messages

• I/O messages.

A connect message is issued by the client to perform an operation based on a pathname (e.g. an io_open message). This may involve performing operations such as permission checks (does the client have the correct permission to open this device?) and setting up a context for that request.

An I/O message is one that relies upon this context (created between the client and the resource manager) to perform subsequent processing of I/O messages (e.g. io_read).

There are good reasons for this design. It would be inefficient to pass the full pathname for each and every read() request, for example. The io_open handler can also perform tasks that we want done only once (e.g. permission checks), rather than with each I/O message. Also, when the read() has read 4096 bytes from a disk file, there may be another 20 megabytes still waiting to be read. Therefore, the read() function would need to have some context information telling it the position within the file it’s reading from.

The resource manager shared library

In a custom embedded system, part of the design effort may be spent writing a resource manager, because there may not be an off-the-shelf driver available for the custom hardware component in the system.

Our resource manager shared library makes this task relatively simple.

Automatic default message handling

If there are functions that the resource manager doesn’t want to handle for some reason (e.g. a digital-to-analog converter doesn’t support a function such as lseek(), or the software doesn’t require it), the shared library will conveniently supply default actions.

There are two levels of default actions:

• The first level simply returns ENOSYS to the client application, informing it that that particular function is not supported.

• The second level (i.e. the iofunc_*() shared library) allows a resource manager to automatically handle various functions.

For more information on default actions, see the section on “Second-level default message handling” in this chapter.

open(), dup(), and close()

Another convenient service that the resource manager shared library provides is the automatic handling of dup() messages.

Suppose that the client program executed code that eventually ended up performing:

    fd = open ("/dev/device", O_RDONLY);
    ...
    fd2 = dup (fd);
    ...
    fd3 = dup (fd);
    ...
    close (fd3);
    ...
    close (fd2);
    ...
    close (fd);

The client would generate an io_open message for the first open(), and then two io_dup messages for the two dup() calls. Then, when the client executed the close() calls, three io_close messages would be generated.

Since the dup() functions generate duplicates of the file descriptors, new context information should not be allocated for each one. When the io_close messages arrive, because no new context has been allocated for each dup(), no release of the memory by each io_close message should occur either! (If it did, the first close would wipe out the context.)

The resource manager shared library provides default handlers that keep track of the open(), dup(), and close() messages and perform work only for the last close (i.e. the third io_close message in the example above).

Multiple thread handling

One of the salient features of QNX Neutrino is the ability to use threads. By using multiple threads, a resource manager can be structured so that several threads are waiting for messages and then simultaneously handling them.

This thread management is another convenient function provided by the resource manager shared library. Besides keeping track of both the number of threads created and the number of threads waiting, the library also takes care of maintaining the optimal number of threads.

Dispatch functions

The OS provides a set of dispatch_* functions that:

• allow a common blocking point for managers and clients that need to support multiple message types (e.g. a resource manager could handle its own private message range)

• provide a flexible interface for message types that isn’t tied to the resource manager (for clean handling of private messages and pulse codes)

• decouple the blocking and handler code from threads. You can implement the resource manager event loop in your main code. This decoupling also makes for easier debugging, because you can put a breakpoint between the block function and the handler function.

For more information, see the Writing a Resource Manager chapter in the Programmer’s Guide.

Combine messages

In order to conserve network bandwidth and to provide support for atomic operations, the OS supports combine messages. A combine message is constructed by the client’s C library and consists of a number of I/O and/or connect messages packaged together into one.

For example, the function readblock() allows a thread to atomically perform an lseek() and read() operation. This is done in the client library by combining the io_lseek and io_read messages into one. When the resource manager shared library receives the message, it will process both the io_lseek and io_read messages, effectively making that readblock() function behave atomically.

Combine messages are also useful for the stat() function. A stat() call can be implemented in the client’s library as an open(), fstat(), and close(). Instead of generating three separate messages (one for each of the component functions), the library puts them together into one contiguous combine message. This boosts performance, especially over a networked connection, and also simplifies the resource manager, which doesn’t need a connect function to handle stat().

The resource manager shared library takes care of the issues associated with breaking out the individual components of the combine message and passing them to the various handler functions supplied. Again, this minimizes the effort associated with writing a resource manager.

Second-level default message handling

Since a large number of the messages received by a resource manager deal with a common set of attributes, the OS provides another level of default handling. This second level, called the iofunc_*() shared library, allows a resource manager to handle functions like stat(), chmod(), chown(), lseek(), etc. automatically, without the programmer having to write additional code. As an added benefit, these iofunc_*() default handlers implement the POSIX semantics for the messages, again offloading work from the programmer.

Three main structures need to be considered:

• context

• attributes structure

• mount structure

[Figure: context structure, attribute structure, and (optional) mount structure.]

A resource manager is responsible for three data structures.

The first data structure, the context, has already been discussed (see the section on “Message types”). It holds data used on a per-open basis, such as the current position into a file (the lseek() offset).

Since a resource manager may be responsible for more than one device (e.g. devc-ser* may be responsible for /dev/ser1, /dev/ser2, /dev/ser3, etc.), the attributes structure holds data on a per-device basis. The attributes structure contains such items as the user and group ID of the owner of the device, the last modification time, etc.

For filesystem (block I/O device) managers, one more structure is used. This is the mount structure, which contains data items that are global to the entire mount device.

When a number of client programs have opened various devices on a particular resource, the data structures may look like this:

[Figure: client processes A, B, and C; a channel; and, inside the resource manager (resmgr), contexts A, B, and C, two attribute structures, and an optional mount structure.]

Multiple clients opening various devices.

The iofunc_*() default functions operate on the assumption that the programmer has used the default definitions for the context block and the attributes structures. This is a safe assumption for two reasons:

1 The default context and attribute structures contain sufficient information for most applications.

2 If the default structures don’t hold enough information, they can be encapsulated within the structures that you’ve defined.

By definition, the default structures must be the first members of their respective superstructures, allowing clean and simple access to the requisite base members by the iofunc_*() default functions:

[Figure: an attribute superstructure, addressed through an (iofunc_attr_t *) pointer, with the default members first and the extensions after them.]

Encapsulation.

The library contains iofunc_*() default handlers for these client functions:

chmod()        lock()
chown()        lseek()
close()        mmap()
devctl()       open()
fpathconf()    pathconf()
fseek()        stat()
fstat()        utime()

Summary

By supporting pathname space mapping, by having a well-defined interface to resource managers, and by providing a set of libraries for common resource manager functions, QNX Neutrino offers the developer unprecedented flexibility and simplicity in developing “drivers” for new hardware, a critical feature for many embedded systems.


Chapter 8

Filesystems

In this chapter...

Introduction
Filesystem classes
Image filesystem
RAM “filesystem”
Embedded transaction filesystem (ETFS)
QNX 4 filesystem
DOS Filesystem
CD-ROM filesystem
FFS3 filesystem
NFS filesystem
CIFS filesystem
Linux Ext2 filesystem
Virtual filesystems


Introduction

QNX Neutrino provides a rich variety of filesystems. Like most service-providing processes in the OS, these filesystems execute outside the kernel; applications use them by communicating via messages generated by the shared-library implementation of the POSIX API.

Most of these filesystems are resource managers as described in this book. Each filesystem adopts a portion of the pathname space (called a mountpoint) and provides filesystem services through the standard POSIX API (open(), close(), read(), write(), lseek(), etc.). Filesystem resource managers take over a mountpoint and manage the directory structure below it. They also check the individual pathname components for permissions and for access authorizations.

This implementation means that:

• Filesystems may be started and stopped dynamically.

• Multiple filesystems may run concurrently.

• Applications are presented with a single unified pathname space and interface, regardless of the configuration and number of underlying filesystems.

• A filesystem running on one node is transparently accessible from any other node.

Filesystems and pathname resolution

You can seamlessly locate and connect to any service or filesystem that’s been registered with the process manager. When a filesystem resource manager registers a mountpoint, the process manager creates an entry in the internal mount table for that mountpoint and its corresponding server ID (i.e. the nd, pid, chid identifiers).

This table effectively joins multiple filesystem directories into what users perceive as a single directory. The process manager handles the mountpoint portion of the pathname; the individual filesystem resource managers take care of the remaining parts of the pathname. Filesystems can be registered (i.e. mounted) in any order.

When a pathname is resolved, the process manager contacts all the filesystem resource managers that can handle some component of that path. The result is a collection of file descriptors that can resolve the pathname.

If the pathname represents a directory, the process manager asks all the filesystems that can resolve the pathname for a listing of files in that directory when readdir() is called. If the pathname isn’t a directory, then the first filesystem that resolves the pathname is accessed.

For more information on pathname resolution, see the section “Pathname management” in the chapter on the Process Manager in this guide.

Filesystem classes

The many filesystems available can be categorized into the following classes:

Image      A special filesystem that presents the modules in the image and is always present. Note that the procnto process automatically provides an image filesystem and a RAM filesystem.

Block      Traditional filesystems that operate on block devices like hard disks and CD-ROM drives. This includes the QNX 4, DOS, and CD-ROM filesystems.

Flash      Nonblock-oriented filesystems designed explicitly for the characteristics of flash memory devices. For NOR devices, use the FFS3 filesystem; for NAND, use ETFS.

Network    Filesystems that provide network file access to the filesystems on remote host computers. This includes the NFS and CIFS (SMB) filesystems.

Virtual    QNX Neutrino provides the following virtual filesystems:

• Package Filesystem, which presents custom “views” of selected files and directories to a client

• Inflator, a resource manager that sits in front of other filesystems and uncompresses files that were previously compressed (using the deflate utility).

Filesystems as shared libraries

Since it’s common to run many filesystems under QNX Neutrino, they have been designed as a family of drivers and shared libraries to maximize code reuse. This means the cost of adding an additional filesystem is typically smaller than might otherwise be expected.

Once an initial filesystem is running, the incremental memory cost for additional filesystems is minimal, since only the code to implement the new filesystem protocol would be added to the system.

The various filesystems are layered as follows:

[Figure: layering diagram showing procnto, devf-*, io-cam, fs-cifs, fs-nfs2, io-net, devb-*.so, fs-qnx4.so, fs-dos.so, fs-cd.so, io-blk.so, cam-cdrom.so, cam-disk.so, npm-tcpip.so, and devn-*.so.]

QNX Neutrino filesystem layering.

As shown in this diagram, the filesystems, disk drivers, and io-blk are implemented as shared libraries (essentially passive blocks of code resident in memory), while io-cam is the active executable that calls into the libraries. In operation, the io-cam process starts first and invokes the block-level shared library (io-blk.so) as well as the disk driver shared libraries (devb-*). The filesystem shared libraries may be dynamically loaded later to provide filesystem interfaces and services.

A “filesystem” shared library implements a filesystem protocol or “personality” on a set of blocks on a physical disk device. The filesystems aren’t built into the OS kernel; rather, they’re dynamic entities that can be loaded or unloaded on demand.


For example, a removable storage device (PCCard flash card, floppy disk, removable cartridge disk, etc.) may be inserted at any time, with any of a number of filesystems stored on it. While the hardware the driver interfaces to is unlikely to change dynamically, the on-disk data structure could vary widely. The dynamic nature of the filesystem copes with this very naturally.

io-blk

Most of the filesystem shared libraries ride on top of the Block I/O module (io-blk.so). This module also acts as a resource manager and exports a block-special file for each physical device. For a system with two hard disks the default files would be:

/dev/hd0     First hard disk

/dev/hd1     Second hard disk

These files represent each raw disk and may be accessed using all the normal POSIX file primitives (open(), close(), read(), write(), lseek(), etc.). Although the io-blk module can support a 64-bit offset on seek, the driver interface is 32-bit, allowing access to 2-terabyte disks.

Builtin RAM disk

The io-blk module supports an internal RAM-disk device that can be created via a command-line option (blk ramdisk=size). Since this RAM disk is internal to the io-blk module (rather than created and maintained by an additional device driver such as devb-ram), performance is significantly better than that of a dedicated RAM-disk driver.

By incorporating the RAM-disk device directly at the io-blk layer, the device’s data memory parallels the main cache, so I/O operations to this device can bypass the buffer cache, eliminating a memory copy yet still retaining coherency. Contrast this with a driver-level implementation (e.g. devb-ram) where transparently presenting the RAM as a block device involves additional memory copies and duplicates data in the buffer cache. Inter-DLL callouts are also eliminated. In addition, there are benefits in terms of installation footprint for systems that have a hard disk and also want a RAM disk: only the single driver is needed.

Partitions

QNX Neutrino complies with the de facto industry standard for partitioning a disk. This allows a number of filesystems to share the same physical disk. Each partition is also represented as a block-special file, with the partition type appended to the filename of the disk it’s located on. In the above “two-disk” example, if the first disk had a QNX partition and a DOS partition, while the second disk had only a QNX partition, then the default files would be:

/dev/hd0       First hard disk

/dev/hd0t6     DOS partition on first hard disk

/dev/hd0t79    QNX partition on first hard disk

/dev/hd1       Second hard disk

/dev/hd1t79    QNX partition on second hard disk

The following list shows some typical assigned partition types:

Type   Filesystem

1      DOS (12-bit FAT)
4      DOS (16-bit FAT; partitions <32M)
5      DOS Extended Partition (enumerated but not presented)
6      DOS 4.0 (16-bit FAT; partitions ≥32M)
7      OS/2 HPFS
7      Previous QNX version 2 (pre-1988)
8      QNX 1.x and 2.x (“qny”)
9      QNX 1.x and 2.x (“qnz”)
11     DOS 32-bit FAT; partitions up to 2047G
12     Same as Type 11, but uses Logical Block Address Int 13h extensions
14     Same as Type 6, but uses Logical Block Address Int 13h extensions
15     Same as Type 5, but uses Logical Block Address Int 13h extensions
77     QNX POSIX partition (secondary)
78     QNX POSIX partition (secondary)
79     QNX POSIX partition
99     UNIX
131    Linux (Ext2)

Buffer cache

The io-blk shared library implements a buffer cache that all filesystems inherit. The buffer cache attempts to store frequently accessed filesystem blocks in order to minimize the number of times a system has to perform a physical I/O to the disk.

Read operations are synchronous; write operations are usually asynchronous. When an application writes to a file, the data enters the cache, and the filesystem manager immediately replies to the client process to indicate that the data has been written. The data is then written to the disk.

Critical filesystem blocks such as bitmap blocks, directory blocks, extent blocks, and inode blocks are written immediately and synchronously to disk.


Applications can modify write behavior on a file-by-file basis. For example, a database application can cause all writes for a given file to be performed synchronously. This would ensure a high level of file integrity in the face of potential hardware or power problems that might otherwise leave a database in an inconsistent state.
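One common way for an application to request synchronous writes on a single file is the POSIX O_SYNC open flag. The sketch below assumes that mechanism; the text above doesn't say which interface a given application should use, and write_synchronously() is a hypothetical helper name.

```python
import os
import tempfile


def write_synchronously(path: str, data: bytes) -> int:
    """Write `data` to `path`, asking for the data to reach the device
    before the call returns."""
    # O_SYNC is the POSIX flag for synchronized writes; where the platform
    # doesn't define it, fall back to 0 and rely on fsync() below.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # explicit flush, in case O_SYNC was unavailable
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_synchronously(os.path.join(d, "db.dat"), b"record")
```

The cost of this choice is latency: every write waits for the device, which is why it's normally applied only to files whose integrity matters most.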

Filesystem limitations

POSIX defines the set of services a filesystem must provide. However, not all filesystems are capable of delivering all those services.

The following chart highlights some of the POSIX-defined capabilities and indicates if the filesystems support them:


Capability             Image  RAM  ETFS  QNX 4  DOS     CD-ROM  FFS3  NFS     CIFS    Ext2
Access date            no     yes  yes   yes    yes(a)  yes(b)  no    yes     no      yes
Modification date      no     yes  yes   yes    yes     yes(b)  yes   yes     yes     yes
Status change date     no     yes  yes   yes    no      yes(b)  yes   yes     no      yes
Filename length        255    255  91    48(c)  8.3(d)  32(e)   255   -(f)    -(f)    255
User permissions       yes    yes  yes   yes    no      yes(b)  yes   yes(f)  yes(f)  yes
Group permissions      yes    yes  yes   yes    no      yes(b)  yes   yes(f)  yes(f)  yes
Other permissions      yes    yes  yes   yes    no      yes(b)  yes   yes(f)  yes(f)  yes
Directories            no     no   yes   yes    yes     yes     yes   yes     yes     yes
Hard links             no     no   no    yes    no      no      no    yes(f)  no      yes
Soft links             no     no   yes   yes    no      yes(b)  yes   yes(f)  no      yes
Decompression on read  no     no   no    no     no      no      yes   no      no      no

(a) VFAT or FAT32 (e.g. Windows 95).

(b) With Rock Ridge extensions.

(c) 505 if .longfilenames is enabled; otherwise, 48.

(d) 255-character filename lengths used by VFAT or FAT32 (e.g. Windows 95).

(e) 128 with Joliet extensions; 255 with Rock Ridge extensions.

(f) Limited by the remote filesystem.

Image filesystem

Every QNX Neutrino system image provides a simple read-only filesystem that presents the set of files built into the OS image.

Since this image may include both executables and data files, this filesystem is sufficient for many embedded systems. If additional filesystems are required, they would be placed as modules within the image where they can be started as needed.

RAM “filesystem”

Every QNX system also provides a simple RAM-based “filesystem” that allows read/write files to be placed under /dev/shmem.

This RAM filesystem finds the most use in tiny embedded systems where persistent storage across reboots isn’t required, yet where a small, fast, temporary-storage filesystem with limited features is called for.

The filesystem comes for free with procnto and doesn’t require any setup. You can simply create files under /dev/shmem and grow them to any size (depending on RAM resources).

Although the RAM filesystem itself doesn’t support hard or soft links or directories, you can create a link to it by using process-manager links. For example, you could create a link to a RAM-based /tmp directory:

ln -sP /dev/shmem /tmp

This tells procnto to create a process manager link to /dev/shmem known as “/tmp.” Application programs can then open files under /tmp as if it were a normal filesystem.

In order to minimize the size of the RAM filesystem code inside the process manager, this filesystem specifically doesn’t include “big filesystem” features such as file locking and directory creation.


Embedded transaction filesystem (ETFS)

ETFS implements a high-reliability filesystem for use with embedded solid-state memory devices, particularly NAND flash memory. The filesystem supports a fully hierarchical directory structure with POSIX semantics as shown in the table above.

ETFS is a filesystem composed entirely of transactions. Every write operation, whether of user data or filesystem metadata, consists of a transaction. A transaction either succeeds or is treated as if it never occurred.

Transactions never overwrite live data. A write in the middle of a file or a directory update always writes to a new unused area. In this way, if the operation fails part way through (due to a crash or power failure), the old data is still intact.

Some log-based filesystems also operate under the principle that live data is never overwritten. But ETFS takes this to the extreme by turning everything into a log of transactions. The filesystem hierarchy is built on the fly by processing the log of transactions in the device. This scan occurs at startup, but is designed such that only a small subset of the data is read and CRC-checked, resulting in faster startup times without sacrificing reliability.

Transactions are position-independent in the device and may occur in any order. You could read the transactions from one device and write them in a different order to another device. This is important because it allows bulk programming of devices containing bad blocks that may be at arbitrary locations.

This design is well-suited for NAND flash memory. NAND flash is shipped with factory-marked bad blocks that may occur in any location.


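[Figure: the device holds a sequence of transactions, each consisting of a header followed by data. Header fields: FID, Offset, Size, Sequence, CRCs, ECCs, Other. The figure marks fields as either required (device-independent) or optional (device-dependent).]

ETFS is a filesystem composed entirely of transactions.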

Inside a transaction

Each transaction consists of a header followed by data. The header contains the following:

FID       A unique file ID that identifies which file the transaction belongs to.

Offset    The offset of the data portion within the file.

Size      The size of the data portion.

Sequence  A monotonically increasing number (to enable time ordering).

CRCs      Data integrity checks (for NAND, NOR, SRAM).

ECCs      Error correction (for NAND).

Other     Reserved for future expansion.
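To see why position independence and time ordering work together, consider the following toy model in Python. It isn't the ETFS on-media format: the Transaction class, make_txn(), and replay() are hypothetical names, a single CRC-32 over the data stands in for the header's CRCs, and any transaction that fails its check is simply skipped.

```python
import random
import zlib
from dataclasses import dataclass


@dataclass
class Transaction:
    fid: int       # file the transaction belongs to
    offset: int    # offset of the data portion within the file
    data: bytes    # data portion; its length plays the role of Size
    sequence: int  # monotonically increasing number
    crc: int       # integrity check over the data portion


def make_txn(fid: int, offset: int, data: bytes, sequence: int) -> Transaction:
    return Transaction(fid, offset, data, sequence, zlib.crc32(data))


def replay(log):
    """Rebuild file contents from transactions found in any physical order."""
    files = {}
    # Skip transactions whose data no longer matches the stored CRC.
    valid = [t for t in log if zlib.crc32(t.data) == t.crc]
    # The sequence number, not the position in the log, gives the time order.
    for t in sorted(valid, key=lambda t: t.sequence):
        buf = files.setdefault(t.fid, bytearray())
        end = t.offset + len(t.data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))
        buf[t.offset:end] = t.data
    return {fid: bytes(buf) for fid, buf in files.items()}


if __name__ == "__main__":
    log = [make_txn(1, 0, b"hello world", 1), make_txn(1, 6, b"QNX!!", 2)]
    random.shuffle(log)   # physical order doesn't matter
    print(replay(log))    # {1: b'hello QNX!!'}
```

Because the result depends only on the sequence numbers, the same set of transactions can be written to another device in a different physical order, skipping bad blocks wherever they happen to be.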


Types of storage media

Although best for NAND devices, ETFS also supports other types of embedded storage media through the use of driver classes as follows:

Class         CRC  ECC  Wear-leveling erase  Wear-leveling read  Cluster size
NAND 512+16   Yes  Yes  Yes                  Yes                 1K
NAND 2048+64  Yes  Yes  Yes                  Yes                 2K
RAM           No   No   No                   No                  1K
SRAM          Yes  No   No                   No                  1K
NOR           Yes  No   Yes                  No                  1K

Although ETFS can support NOR flash, we recommend instead the FFS3 filesystem (devf-*), which is designed explicitly for NOR flash devices.

Reliability features

ETFS is designed to survive across a power failure, even during an active flash write or block erase. The following features contribute to its reliability:

• dynamic wear-leveling

• static wear-leveling

• CRC error detection

• ECC error correction

• read degradation monitoring with automatic refresh

• transaction rollback

• atomic file operations

• automatic file defragmentation.

Dynamic wear-leveling

Flash memory allows a limited number of erase cycles on a flash block before the block will fail. This number can be as low as 100,000. ETFS tracks the number of erases on each block. When selecting a block to use, ETFS attempts to spread the erase cycles evenly over the device, dramatically increasing its life. The difference can be extreme: from usage scenarios of failure within a few days without wear-leveling to over 40 years with wear-leveling.
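The effect of spreading erase cycles can be shown with a small simulation. This is a sketch of the general idea only, not ETFS’s actual selection policy; pick_block() and simulate() are hypothetical names, and each write is assumed to cost exactly one erase.

```python
def pick_block(erase_counts, free_blocks):
    """Return the free block that has been erased the fewest times
    (lowest block number wins a tie)."""
    return min(free_blocks, key=lambda b: (erase_counts[b], b))


def simulate(num_blocks, writes, wear_level=True):
    """Erase and rewrite one block per write; return per-block erase counts."""
    erase_counts = [0] * num_blocks
    for _ in range(writes):
        if wear_level:
            block = pick_block(erase_counts, range(num_blocks))
        else:
            block = 0  # naive policy: always reuse the same block
        erase_counts[block] += 1
    return erase_counts
```

With 4 blocks and 100 writes, the wear-leveled run erases every block 25 times, while the naive run erases block 0 all 100 times. The most-worn block determines when the device starts to fail, so the leveled device lasts roughly as many times longer as it has blocks to share the load.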

Static wear-leveling

Filesystems often consist of a large number of static files that are read but not written. These files will occupy flash blocks that have no reason to be erased. If the majority of the files in flash are static, this will cause the remaining blocks containing dynamic data to wear at a dramatically increased rate.

ETFS notices these under-worked static blocks and forces them into service by copying their data to an over-worked block. This solves two problems: It gives the over-worked block a rest, since it now contains static data, and it forces the under-worked static block into the dynamic pool of blocks.
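The swap described above can be modeled as follows. The trigger condition (a fixed gap in erase counts, passed as threshold) is an assumption made for this sketch; the text doesn’t say how ETFS decides that a block is under-worked. The function names are hypothetical.

```python
def static_wear_level(erase_counts, static_blocks, threshold):
    """Pick a (rested, worn) pair of blocks whose roles should be swapped.

    `static_blocks` is the set of block numbers holding static data.
    Returns None while the wear gap is still below `threshold`.
    """
    dynamic = [b for b in range(len(erase_counts)) if b not in static_blocks]
    if not static_blocks or not dynamic:
        return None
    rested = min(static_blocks, key=lambda b: erase_counts[b])  # under-worked
    worn = max(dynamic, key=lambda b: erase_counts[b])          # over-worked
    if erase_counts[worn] - erase_counts[rested] < threshold:
        return None
    return rested, worn


def swap_roles(static_blocks, rested, worn):
    """After the static data moves from `rested` to `worn`, the worn block
    holds static data and the rested block joins the dynamic pool."""
    return (static_blocks - {rested}) | {worn}
```

For erase counts [2, 900, 3, 850] with blocks 0 and 2 static and a threshold of 500, the sketch selects block 0 as the rested block and block 1 as the worn one.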

CRC error detection

Each transaction is protected by a cyclic redundancy check (CRC). This ensures quick detection of corrupted data, and forms the basis for the rollback operation of damaged or incomplete transactions at startup. The CRC can detect multiple bit errors that may occur during a power failure.

ECC error correction

On a CRC error, ETFS can apply error correction coding (ECC) to attempt to recover the data. This is suitable for NAND flash memory, in which single-bit errors may occur during normal usage. An ECC error is a warning signal that the flash block the error occurred in may be getting weak, i.e. losing charge.


ETFS will mark the weak block for a refresh operation, which copies the data to a new flash block and erases the weak block. The erase recharges the flash block.

Read degradation monitoring with automatic refresh

Each read operation within a NAND flash block weakens the charge maintaining the data bits. Most devices support about 100,000 reads before there’s danger of losing a bit. The ECC will recover a single-bit error, but may not be able to recover multi-bit errors.

ETFS solves this by tracking reads and marking blocks for refresh before the 100,000 read limit is reached.
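A per-block read counter is enough to model this behavior. In the sketch below, the 100,000 figure comes from the text, but the point at which a block is marked (REFRESH_AT) and the ReadMonitor class are assumptions made for illustration.

```python
READ_LIMIT = 100_000   # approximate read limit quoted above
REFRESH_AT = 90_000    # hypothetical safety margin below the limit


class ReadMonitor:
    def __init__(self, num_blocks):
        self.reads = [0] * num_blocks
        self.pending_refresh = set()

    def record_read(self, block):
        """Count one read and mark the block once it nears the limit."""
        self.reads[block] += 1
        if self.reads[block] >= REFRESH_AT:
            self.pending_refresh.add(block)

    def refresh(self, block):
        """Model a refresh: the data is copied elsewhere and the block is
        erased, so its read count starts again from zero."""
        self.reads[block] = 0
        self.pending_refresh.discard(block)
```

A block is marked well before READ_LIMIT so that the refresh can be scheduled as ordinary background work rather than as an emergency.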

Transaction rollback

When ETFS starts, it processes all transactions and rolls back (discards) the last partial or damaged transaction. The rollback code is designed to handle a power failure during a rollback operation, thus allowing the system to recover from multiple nested faults. The validity of a transaction is protected by CRC codes on each transaction.

Atomic file operations

ETFS implements a very simple directory structure on the device, allowing significant modifications with a single flash write. For example, the move of a file or directory to another directory is a multistage operation in most filesystems. In ETFS, a move is accomplished with a single flash write.

Automatic file defragmentation

Log-based filesystems often suffer from fragmentation, since each update or write to an existing file causes a new transaction to be created. ETFS uses write-buffering to combine small writes into larger write transactions in an attempt to minimize fragmentation caused by lots of very small transactions. ETFS also monitors the fragmentation level of each file and will do a background defrag operation on files that do become badly fragmented. Note that this background activity will always be preempted by a user data request in order to ensure immediate access to the file being defragmented.

QNX 4 filesystem

The QNX 4 filesystem (fs-qnx4.so) is a high-performance filesystem that shares the same on-disk structure as QNX 4.

The QNX 4 filesystem implements an extremely robust design, utilizing an extent-based, bitmap allocation scheme with fingerprint control structures to safeguard against data loss and to provide easy recovery. Features include:

• extent-based POSIX filesystem

• robustness: all sensitive filesystem info is written through to disk

• on-disk “signatures” and special key information to allow fast data recovery in the event of disk damage

• 505-character filenames

• multi-threaded design

• client-driven priority

• same disk format as the filesystem under QNX 4.

Since the release of 6.2.1, the 48-character filename limit has increased to 505 characters via a backwards-compatible extension. The same on-disk format is retained, but new systems will see the longer name, while old ones will see a truncated 48-character name.

For more information, see “QNX 4 filesystem” in the Working With Filesystems chapter of the QNX Neutrino User’s Guide.


DOS Filesystem

The DOS Filesystem, fs-dos.so, provides transparent access to DOS disks, so you can treat DOS filesystems as though they were POSIX filesystems. This transparency allows processes to operate on DOS files without any special knowledge or work on their part.

The structure of the DOS filesystem on disk is old and inefficient, and lacks many desirable features. Its only major virtue is its portability to DOS and Windows environments. You should choose this filesystem only if you need to transport DOS files to other machines that require it. Consider using the QNX filesystem alone if DOS file portability isn’t an issue or in conjunction with the DOS filesystem if it is.

If there’s no DOS equivalent to a POSIX feature, fs-dos.so will either return an error or a reasonable default. For example, an attempt to create a link() will result in the appropriate errno being returned. On the other hand, if there’s an attempt to read the POSIX times on a file, fs-dos.so will treat any of the unsupported times the same as the last write time.

DOS version support

The fs-dos.so program supports both floppies and hard disk partitions from DOS version 2.1 to Windows 98 with long filenames.

DOS text files

DOS terminates each line in a text file with two characters (CR/LF), while POSIX (and most other) systems terminate each line with a single character (LF). Note that fs-dos.so makes no attempt to translate text files being read. Most utilities and programs won’t be affected by this difference.

Note also that some very old DOS programs may use a Ctrl-Z (^Z) as a file terminator. This character is also passed through without modification.
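Since fs-dos.so passes both conventions through untouched, a program that does care must translate the text itself. A minimal sketch, assuming the input really is text (the first Ctrl-Z byte is treated as end of file, which would truncate binary data); dos_to_posix_text() is a hypothetical helper:

```python
def dos_to_posix_text(raw: bytes) -> bytes:
    """Convert DOS text (CR/LF line ends, optional Ctrl-Z terminator)
    to POSIX text (LF line ends)."""
    end = raw.find(b"\x1a")   # Ctrl-Z used as a file terminator
    if end != -1:
        raw = raw[:end]
    return raw.replace(b"\r\n", b"\n")
```

Text that already uses LF line ends passes through unchanged, so the function is safe to apply to files of either origin.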


QNX-to-DOS filename mapping

In DOS, a filename cannot contain any of the following characters:

/ \ [ ] : * | + = ; , ?

An attempt to create a file that contains one of these invalid characters will return an error. DOS (8.3 format) also expects all alphabetical characters to be uppercase, so fs-dos.so maps these characters to uppercase when creating a filename on disk. But it maps a filename to lowercase by default when returning a filename to a QNX Neutrino application, so that QNX Neutrino users and programs can always see and type lowercase (via the sfn=sfn_mode option).
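The default mapping for short names can be summarized in code. This sketch covers only the character check and the case mapping described above; it doesn’t enforce the 8.3 length limit, and the function names are hypothetical rather than part of fs-dos.so.

```python
# Characters that a DOS filename cannot contain (from the list above).
INVALID_DOS_CHARS = set('/\\[]:*|+=;,?')


def to_disk_name(name: str) -> str:
    """Map a name to the uppercase form stored in a DOS directory entry."""
    bad = INVALID_DOS_CHARS.intersection(name)
    if bad:
        raise ValueError(f"invalid DOS filename characters: {sorted(bad)}")
    return name.upper()


def from_disk_name(name: str) -> str:
    """Map an on-disk short name to the lowercase form returned by default."""
    return name.lower()
```

A name such as readme.txt is therefore stored as README.TXT and reported back as readme.txt, while a:b.txt is rejected.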

Handling filenames

You can specify how you want fs-dos.so to handle long filenames (via the lfn=lfn_mode option):

• Ignore them: display/create only 8.3 filenames.

• Show them: if filenames are longer than 8.3 or if mixed case is used.

• Always create both short and long filenames.

If you use the ignore option, you can specify whether or not to silently truncate filename characters beyond the 8.3 limit.

International filenames

The DOS filesystem supports DOS “code pages” (international character sets) for locale filenames. Short 8.3 names are stored using a particular character set (typically the most common extended characters for a locale are encoded in the 8th-bit character range). All the common American as well as Western and Eastern European code pages (437, 850, 852, 866, 1250, 1251, 1252) are supported. If you produce software that must access a variety of DOS/Windows hard disks, or operate in non-US-English countries, this feature offers important portability: filenames will be created with both a Unicode and locale name and are accessible via either name.


The DOS filesystem supports international text in filenames only. No attempt is made to be aware of data contents, with the sole exception of Windows “shortcut” (.LNK) files, which will be parsed and translated into symbolic links if you’ve specified that option (lnk=lnk_mode).

DOS volume labels

DOS uses the concept of a volume label, which is an actual directory entry in the root of the DOS filesystem. To distinguish between the volume label and an actual DOS directory, fs-dos.so reports the volume label according to the way you specify its vollabel option. You can choose to:

• Ignore the volume label.

• Display the volume label as a name-special file.

• Display the volume label as a name-special file with an equal sign (=) as the first character of the volume name (the default).

DOS-QNX permission mapping

DOS doesn’t support all the permission bits specified by POSIX. It has a READ_ONLY bit in place of separate READ and WRITE bits; it doesn’t have an EXECUTE bit. When a DOS file is created, the DOS READ_ONLY bit will be set if all the POSIX WRITE bits are off. When a DOS file is accessed, the POSIX READ bit is always assumed to be set for user, group, and other.

Since you can’t execute a file that doesn’t have EXECUTE permission, fs-dos.so has an option (exe=exec_mode) that lets you specify how to handle the POSIX EXECUTE bit for executables.
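The two directions of the mapping can be expressed with the standard POSIX mode bits. In this sketch, two points are assumptions rather than documented behavior: a file without the DOS read-only bit is reported as writable by user, group, and other, and the executable argument merely stands in for whatever the exe= option decides. The function names are hypothetical.

```python
import stat

READ_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def dos_read_only(mode: int) -> bool:
    """On create: the DOS read-only bit is set if all POSIX write bits are off."""
    return (mode & WRITE_BITS) == 0


def reported_mode(read_only: bool, executable: bool = False) -> int:
    """On access: read is always granted to user, group, and other."""
    mode = READ_BITS
    if not read_only:
        mode |= WRITE_BITS   # assumption: writable by everyone
    if executable:
        mode |= EXEC_BITS    # assumption: decided by the exe= option
    return mode
```

Note that the round trip loses information: a file created with mode 0640 comes back as 0666 under these assumptions, because DOS has nowhere to store the distinction.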

File ownership

Although the DOS file structure doesn’t support user IDs and group IDs, fs-dos.so (by default) won’t return an error code if an attempt is made to change them. An error isn’t returned because a number of utilities attempt to do this and failure would result in unexpected errors. The approach taken is “you can change anything to anything since it isn’t written to disk anyway.”

The posix= options let you set stricter POSIX checks and enable POSIX emulation. For example, in POSIX mode, an error of EINVAL is flagged for attempts to do any of the following:

• Set the user ID or group ID to something other than the default (root).

• Remove an r (read) permission.

• Set an s (set ID on execution) permission.

If you set the posix option to emulate (the default) or strict, you get the following benefits:

• The . and .. directory entries are created in the root directory.

• The directory size is calculated.

• The number of links in a directory is calculated, based on its subdirectories.

CD-ROM filesystem

The CD-ROM filesystem provides transparent access to CD-ROM media, so you can treat CD-ROM filesystems as though they were POSIX filesystems. This transparency allows processes to operate on CD-ROM files without any special knowledge or work on their part.

The fs-cd.so manager implements the ISO 9660 standard as well as a number of extensions, including Rock Ridge (RRIP), Joliet (Microsoft), and multisession (Kodak Photo CD, enhanced audio).

FFS3 filesystem

The FFS3 filesystem drivers implement a POSIX-like filesystem on NOR flash memory devices. The drivers are standalone executables that contain both the flash filesystem code and the flash device code.


There are versions of the FFS3 filesystem driver for different embedded systems hardware as well as PCMCIA memory cards.

The naming convention for the drivers is devf-system, where system describes the embedded system. For example, the devf-800fads driver is for the 800FADS PowerPC evaluation board.

To find out what flash devices we currently support, please refer to the following sources:

• the boards and mtd-flash directories under bsp_working_dir/src/hardware/flash.

• QNX Neutrino OS docs (devf-* entries in Utilities Reference).

• the QNX Software Systems website (www.qnx.com).

Customization

Along with the pre-built flash filesystem drivers, including the “generic” driver (devf-generic), we provide the libraries and source code needed to build custom flash filesystem drivers for different embedded systems. For information on how to do this, see the Customizing the Flash Filesystem chapter in the Building Embedded Systems book.

Organization

The FFS3 filesystem drivers support one or more logical flash drives. Each logical drive is called a socket, which consists of a contiguous and homogeneous region of flash memory. For example, in a system containing two different types of flash device at different addresses, where one flash device is used for the boot image and the other for the flash filesystem, each flash device would appear in a different socket.

Each socket may be divided into one or more partitions. Two types of partitions are supported: raw partitions and flash filesystem partitions.


Raw partitions

A raw partition in the socket is any partition that doesn’t contain a flash filesystem. The driver doesn’t recognize any filesystem types other than the flash filesystem. A raw partition may contain an image filesystem or some application-specific data.

The filesystem will make accessible through a raw mountpoint (see below) any partitions on the flash that aren’t flash filesystem partitions. Note that the flash filesystem partitions are available as raw partitions as well.

Filesystem partitions

A flash filesystem partition contains the POSIX-like flash filesystem, which uses a QNX proprietary format to store the filesystem data on the flash devices. This format isn’t compatible with either the Microsoft FFS2 or PCMCIA FTL specification.

The filesystem allows files and directories to be freely created and deleted. It recovers space from deleted files using a reclaim mechanism similar to garbage collection.

Mountpoints

When you start the flash filesystem driver, it will by default mount any partitions it finds in the socket. Note that you can specify the mountpoint using mkefs or flashctl (e.g. /flash).

Mountpoint   Description
/dev/fsX     raw mountpoint, socket X
/dev/fsXpY   raw mountpoint, socket X, partition Y
/fsXpY       filesystem mountpoint, socket X, partition Y
/fsXpY/.cmp  filesystem compressed mountpoint, socket X, partition Y
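The default names follow directly from the socket and partition numbers, as this short sketch shows. The function ffs3_mountpoints() and its dictionary keys are hypothetical; only the path patterns come from the table.

```python
def ffs3_mountpoints(socket: int, partition: int) -> dict:
    """Return the default mountpoint names for one socket/partition pair."""
    return {
        "raw_socket": f"/dev/fs{socket}",
        "raw_partition": f"/dev/fs{socket}p{partition}",
        "filesystem": f"/fs{socket}p{partition}",
        "compressed": f"/fs{socket}p{partition}/.cmp",
    }
```

For socket 0, partition 1, the filesystem mountpoint is /fs0p1 and the corresponding raw mountpoint is /dev/fs0p1.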


Features

The FFS3 filesystem supports many advanced features, such as POSIX compatibility, multiple threads, background reclaim, fault recovery, transparent decompression, endian-awareness, wear-leveling, and error-handling.

POSIX

The filesystem supports the standard POSIX functionality (including long filenames, access privileges, random writes, truncation, and symbolic links) with the following exceptions:

• You can’t create hard links.

• Access times aren’t supported (but file modification times and attribute change times are).

These design compromises allow this filesystem to remain small and simple, yet include most features normally found with block device filesystems.

Background reclaim

The FFS3 filesystem stores files and directories as a linked list of extents, which are marked for deletion as they’re deleted or updated. Blocks to be reclaimed are chosen using a simple algorithm that finds the block with the most space to be reclaimed while keeping level the amount of wear of each individual block. This wear-leveling increases the MTBF (mean time between failures) of the flash devices, thus increasing their longevity.

The background reclaim process is performed when there isn’t enough free space. The reclaim process first copies the contents of the reclaim block to an empty spare block, which then replaces the reclaim block. The reclaim block is then erased. Unlike rotating media with a mechanical head, proximity of data isn’t a factor with a flash filesystem, so data can be scattered on the media without loss of performance.
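One way to balance “most space to be reclaimed” against “level wear” is a weighted score, sketched below. The scoring formula and the wear_weight parameter are assumptions made for illustration; the text doesn’t specify the algorithm FFS3 actually uses, and choose_reclaim_block() is a hypothetical name.

```python
def choose_reclaim_block(blocks, wear_weight=1.0):
    """Return the index of the block to reclaim next.

    `blocks` is a list of dicts with 'stale' (bytes marked for deletion)
    and 'erases' (erase count so far). Stale space raises a block's score;
    being more worn than average lowers it.
    """
    mean_erases = sum(b["erases"] for b in blocks) / len(blocks)

    def score(i):
        b = blocks[i]
        return b["stale"] - wear_weight * (b["erases"] - mean_erases)

    return max(range(len(blocks)), key=score)
```

With wear_weight set to 0, the sketch always picks the block with the most stale space; raising it steers reclaims away from blocks that have already been erased more often than the rest.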


Fault recovery

The filesystem has been designed to minimize corruption due to accidental loss-of-power faults. Updates to extent headers and erase block headers are always executed in carefully scheduled sequences. These sequences allow the recovery of the filesystem’s integrity in the case of data corruption.

Note that properly designed flash hardware is essential for effective fault-recovery systems. In particular, special reset circuitry must be in place to hold the system in “reset” before power levels drop below critical. Otherwise, spurious or random bus activity can form write/erase commands and corrupt the flash beyond recovery.

Rename operations are guaranteed atomic, even through loss-of-power faults. This means, for example, that if you lost power while giving an image or executable a new name, you would still be able to access the file via its old name upon recovery.

When the FFS3 filesystem driver is started, it scans the state of every extent header on the media (in order to validate its integrity) and takes appropriate action, ranging from a simple block reclamation to the erasure of dangling extent links. This process is merged with the filesystem’s normal mount procedure in order to achieve optimal bootstrap timings.

Compression/decompression

For fast and efficient compression/decompression, you can use the deflate and inflator utilities, which rely on popular deflate/inflate algorithms.

The deflate algorithm combines two algorithms. The first takes care of removing data duplication in files; the second algorithm handles data sequences that appear the most often by giving them shorter symbols. Those two algorithms provide excellent lossless compression of data and executable files. The inflate algorithm simply reverses what the deflate algorithm does.
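Python’s standard zlib module implements the same family of deflate/inflate algorithms, so it can demonstrate the lossless round trip. This is an illustration of the algorithm only: the functions below are hypothetical wrappers around zlib, not the QNX utilities, and the output isn’t claimed to match the file format those utilities produce.

```python
import zlib


def deflate(data: bytes, level: int = 9) -> bytes:
    """Compress with the DEFLATE algorithm (duplicate-string removal
    followed by shorter codes for frequent symbols)."""
    return zlib.compress(data, level)


def inflate(blob: bytes) -> bytes:
    """Reverse deflate(), returning the original bytes."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    original = b"QNX Neutrino " * 200
    packed = deflate(original)
    assert inflate(packed) == original
    print(len(original), "->", len(packed), "bytes")
```

Highly repetitive input such as the sample above shrinks dramatically; typical executables and data files compress less, and the ratio depends entirely on the content.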


The deflate utility is intended for use with the filter attribute for mkefs. You can also use it to precompress files intended for a flash filesystem.

The inflator resource manager sits in front of the other filesystems that were previously compressed using the deflate utility. It can almost double the effective size of the flash memory.

Compressed files can be manipulated with standard utilities such as cp or ftp; they can display their compressed and uncompressed size with the ls utility if used with the proper mountpoint. These features make the management of a compressed flash filesystem seamless to a systems designer.

Flash errors

As flash hardware wears out, its write state-machine may find that it can’t write or erase a particular bit cell. When this happens, the error status is propagated to the flash driver so it can take proper action (i.e. mark the bad area and try to write/erase in another place).

This error-handling mechanism is transparent. Note that as flash errors accumulate, failing writes and erases will eventually render the flash read-only. Fortunately, this situation shouldn’t happen before several years of flash operation. Check your flash specification and analyze your application’s data flow to flash in order to calculate its potential longevity or MTBF.

Endian awareness

The FFS3 filesystem is endian-aware, making it portable across different platforms. The optimal approach is to use the mkefs utility to select the target’s endian-ness.
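Endian-ness matters because the same multi-byte field is laid out differently on big-endian and little-endian targets. The sketch below packs a made-up two-field header both ways; the field names and layout are hypothetical and aren’t the FFS3 on-media format.

```python
import struct


def pack_header(magic: int, size: int, big_endian: bool) -> bytes:
    """Pack two 32-bit unsigned fields in the chosen byte order."""
    fmt = ">II" if big_endian else "<II"
    return struct.pack(fmt, magic, size)


def unpack_header(raw: bytes, big_endian: bool):
    """Unpack the two fields, interpreting them in the chosen byte order."""
    fmt = ">II" if big_endian else "<II"
    return struct.unpack(fmt, raw)
```

Reading an image with the wrong byte order doesn’t fail; it silently yields different numbers, which is why the byte order is best fixed once, when the image is built.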

Utilities

The filesystem supports all the standard POSIX utilities such as ls, mkdir, rm, ln, mv, and cp. There are also some QNX Neutrino utilities for managing the flash:

flashctl  Erase, format, and mount flash partitions.

deflate   Compress files for flash filesystems.

mkefs     Create flash filesystem image files.

System callsThe filesystem supports all the standard POSIX I/O functions such asopen(), close(), read(), andwrite(). Special functions such as erasingare supported using thedevctl() function.

NFS filesystemThe Network File System (NFS) allows a client workstation toperform transparent file access over a network. It allows a clientworkstation to operate on files that reside on a server across a varietyof operating systems. Client file access calls are converted to NFSprotocol requests, and are sent to the server over the network. Theserver receives the request, performs the actual filesystem operation,and sends a response back to the client.

The Network File System operates in a stateless fashion by usingremote procedure calls (RPC) and TCP/IP for its transport. Therefore,to usefs-nfs2 or fs-nfs3, you’ll also need to run the TCP/IPclient for Neutrino.

Any POSIX limitations in the remote server filesystem will be passedthrough to the client. For example, the length of filenames may varyacross servers from different operating systems. NFS (versions 2 and3) limits filenames to 255 characters;mountd (versions 1 and 3)limits pathnames to 1024 characters.

Although NFS (version 2) is older than POSIX, it was designed to emulate UNIX filesystem semantics and happens to be relatively close to POSIX.


CIFS filesystem

Formerly known as SMB, the Common Internet File System (CIFS) allows a client workstation to perform transparent file access over a network to a Windows 98 or NT system, or a UNIX system running an SMB server. Client file access calls are converted to CIFS protocol requests and are sent to the server over the network. The server receives the request, performs the actual filesystem operation, and sends a response back to the client.

☞ The CIFS protocol makes no attempt to conform to POSIX.

The fs-cifs manager uses TCP/IP for its transport. Therefore, to use fs-cifs (SMBfsys in QNX 4), you’ll also need to run the TCP/IP client for Neutrino.

Linux Ext2 filesystem

The Ext2 filesystem (fs-ext2.so) provides transparent access to Linux disk partitions. This implementation supports the standard set of features found in Ext2 versions 0 and 1.

Sparse file support is included in order to be compatible with existing Linux partitions. Other filesystems can only be “stacked” read-only on top of sparse files. There are no such restrictions on normal files.

If an Ext2 filesystem isn’t unmounted properly, a filesystem checker is usually responsible for cleaning up the next time the filesystem is mounted. Although the fs-ext2.so module is equipped to perform a quick test, it automatically mounts the filesystem as read-only if it detects any significant problems (which should be fixed using a filesystem checker).

Virtual filesystems

QNX Neutrino provides two types of virtual filesystems:

• Package filesystem

• Inflator

Package filesystem

Our package filesystem component is a virtual filesystem manager that presents a customized view of a set of files and directories to a client. The actual files are present on some medium (e.g. a server’s hard disk); the package filesystem presents a “virtual” view of selected files to the client.

This can be useful in a networked environment, where a centralized server maintains a separate package of files for each client machine. Each client may want to see their own custom versions of certain files or unique combinations of files in a package.

Traditionally, you would achieve this either by copying all the files for all the packages into directories, with one directory per machine, or by creating various symbolic links.

By capitalizing on QNX Neutrino’s pathname space and resource manager concepts, the package filesystem manager alleviates these problems. You can easily install a set of directories and files, and just as easily uninstall them, all without having to copy the files themselves.

What’s in a package?

A package consists of a number of files and directories that are related to each other by virtue of being part of a product or of a particular release. For example, the QNX Neutrino OS, including its binary files and online documentation, is considered a package. A set of updates (called a patch) to the OS might contain only selected files and documentation (i.e. the files that have changed); this would be a different package.

The purpose of the package filesystem is to manage these packages in such a way that various nodes can pick and choose which particular packages they want to use. For example, one node might be an x86 box running, say, Patch B of a certain release, while another node might be a PowerPC box running Patch C.


Since these two machines are different CPU types, we need to give each of them not only different types of executables in their /bin directories (one set of x86 executables and one set of PowerPC executables), but also different contents based on the desired version of the software and patch levels.

Single definition file

In a nutshell, the package filesystem presents, on each node, a virtual filesystem that contains only selected files from the master package database. The selection is controlled by a definition file that indicates:

• which packages should be presented

• where the master package database resides

• where the virtual filesystem is to be rooted.

The advantage of a single definition file is clear: When the time comes to change the package lineup on a particular node, this can be accomplished by a simple (and fast!) change to the definition file.

Inflator

Inflator is a resource manager that sits in front of other filesystems and inflates files that were previously deflated (using the deflate utility).

The inflator utility is typically used when the underlying filesystem is a flash filesystem. Using it can almost double the effective size of the flash memory.

If a file is being opened for a read, inflator attempts to open the file itself on an underlying filesystem. It reads the first 16 bytes and checks for the signature of a deflated file. If the file was deflated, inflator places itself between the application and the underlying filesystem. All reads return the original file data before it was deflated.

From the application’s point of view, the file appears to be uncompressed. Random seeks are also supported. If the application does a stat() on the file, the size of the inflated file (the original size before it was deflated) is returned.
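To make the stat() behavior concrete, here is a small C sketch that returns the size reported for a path. The helper name is an assumption made for illustration. According to the description above, when inflator sits in front of a deflated file, the value returned would be the original (inflated) size rather than the number of bytes actually stored in flash.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return the size that stat() reports for path,
 * or -1 if the file can't be examined. */
off_t reported_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;

    return st.st_size;
}
```

An application that sizes a buffer from this value and then reads the file sees consistent results, because both the size and the data refer to the uncompressed contents.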


Chapter 9

Character I/O

In this chapter...
Introduction
Console devices
Serial devices
Parallel devices
Pseudo terminal devices (ptys)


Introduction

A key requirement of any realtime operating system is high-performance character I/O. Character devices can be described as devices to which I/O consists of a sequence of bytes transferred serially, as opposed to block-oriented devices (e.g. disk drives).

As in the POSIX and UNIX tradition, these character devices are located in the OS pathname space under the /dev directory. For example, a serial port to which a modem or terminal could be connected might appear in the system as:

/dev/ser1

Typical character devices found on PC hardware include:

• serial ports

• parallel ports

• text-mode consoles

• pseudo terminals (ptys)

Programs access character devices using the standard open(), close(), read(), and write() API functions. Additional functions are available for manipulating other aspects of the character device, such as baud rate, parity, flow control, etc.
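The following C sketch shows the standard calls applied to a character device. The helper name is an assumption for illustration, and the device path is supplied by the caller; on a system with a serial driver running, it might be /dev/ser1 as in the example above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a character device, write a string to it,
 * and close it again.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_text(const char *device, const char *text)
{
    /* O_NOCTTY keeps the device from becoming the controlling terminal. */
    int fd = open(device, O_WRONLY | O_NOCTTY);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, text, strlen(text));

    close(fd);
    return n;
}
```

Nothing in the function is specific to serial ports; the same calls apply to consoles, parallel ports, and ptys.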

Since it’s common to run multiple character devices, they have been designed as a family of drivers and a library called io-char to maximize code reuse.


[Figure: The io-char module is implemented as a library. The serial, parallel, console, and pty drivers each use io-char.]

As shown in this diagram, io-char is implemented as a library. The io-char module contains all the code to support POSIX semantics on the device. It also contains a significant amount of code to implement character I/O features beyond POSIX but desirable in a realtime system. Since this code is in the common library, all drivers inherit these capabilities.

The driver is the executing process that calls into the library. In operation, the driver starts first and invokes io-char. The drivers themselves are just like any other QNX Neutrino process and can run at different priorities according to the nature of the hardware being controlled and the clients requesting service.

Once a single character device is running, the memory cost of adding additional devices is minimal, since only the code to implement the new driver structure would be new.

Driver/io-char communication

The io-char library manages the flow of data between an application and the device driver. Data flows between io-char and the driver through a set of memory queues associated with each character device.

Three queues are used for each device. Each queue is implemented using a first-in, first-out (FIFO) mechanism.
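To illustrate what a first-in, first-out byte queue looks like, here is a minimal ring-buffer sketch in C. It isn't the io-char implementation: the type, the function names, and the fixed QUEUE_SIZE are all assumptions made for illustration, and the sketch ignores the synchronization that a queue shared with an interrupt handler would need.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8   /* hypothetical capacity, chosen only for the example */

struct byte_fifo {
    unsigned char data[QUEUE_SIZE];
    size_t head;    /* index of the oldest queued byte */
    size_t count;   /* number of bytes currently queued */
};

/* Append one byte; returns false if the queue is full. */
bool fifo_put(struct byte_fifo *q, unsigned char c)
{
    if (q->count == QUEUE_SIZE)
        return false;

    q->data[(q->head + q->count) % QUEUE_SIZE] = c;
    q->count++;
    return true;
}

/* Remove the oldest byte into *out; returns false if the queue is empty. */
bool fifo_get(struct byte_fifo *q, unsigned char *out)
{
    if (q->count == 0)
        return false;

    *out = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    return true;
}
```

A zero-initialized struct byte_fifo is an empty queue. Bytes come out in the order they went in, which is the property the raw input, output, and canonical queues all rely on.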


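[Figure: Device I/O in QNX Neutrino. Application processes (A, B, and C) communicate with io-char, which exchanges data through the out, in, and canon queues at the driver interface with the console, parallel, and serial drivers; these manage the system console, the parallel printer, and the serial communication ports.]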

Received data is placed into the raw input queue by the driver and is consumed by io-char only when application processes request data. (For details on raw versus edited or canonical input, see the section “Input modes” later in this chapter.)

Interrupt handlers within drivers typically call a trusted library routine within io-char to add data to this queue; this ensures a consistent input discipline and minimizes the responsibility of the driver (and effort required to create new drivers).

The io-char module places output data into the output queue to be consumed by the driver as characters are physically transmitted to the device. The module calls a trusted routine within the driver each time new data is added so it can “kick” the driver into operation (in the event that it was idle). Since output queues are used, io-char implements write-behind for all character devices. Only when the output buffers are full will io-char cause a process to block while writing.

The canonical queue is managed entirely by io-char and is used while processing input data in edited mode. The size of this queue determines the maximum edited input line that can be processed for a particular device.

The sizes of these queues are configurable using command-line options. Default values are usually more than adequate to handle most hardware configurations, but you can “tune” these to reduce overall system memory requirements, to accommodate unusual hardware situations, or to handle unique protocol requirements.

Device drivers simply add received data to the raw input queue or consume and transmit data from the output queue. The io-char module decides when (and if) output transmission is to be suspended, how (and if) received data is echoed, etc.

Device control

Low-level device control is implemented using the devctl() call. The POSIX terminal control functions are layered on top of devctl() as follows:

tcgetattr() Get terminal attributes.

tcsetattr() Set terminal attributes.

tcgetpgrp() Get ID of process group leader for a terminal.

tcsetpgrp() Set ID of process group leader for a terminal.


tcsendbreak() Send a break condition.

tcflow() Suspend or restart data transmission/reception.
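As a sketch of how the POSIX terminal control functions listed above are typically combined, the following C function reads the current attributes with tcgetattr(), switches the device to an unedited 8-bit, no-parity, one-stop-bit configuration at a caller-supplied speed, and applies the result with tcsetattr(). The function name and the particular choice of flags are assumptions made for illustration; a real application would set only what its device needs.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: configure the terminal device open on fd for
 * unedited 8N1 operation at the given speed (e.g. B9600).
 * Returns 0 on success, or -1 on error (for example, if fd doesn't
 * refer to a terminal device). */
int configure_port(int fd, speed_t speed)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) == -1)
        return -1;

    /* Turn off line editing, echo, and signal-generating characters. */
    tio.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);

    /* Turn off software flow control and CR/NL translation on input. */
    tio.c_iflag &= ~(IXON | IXOFF | ICRNL | INLCR);

    /* Turn off output postprocessing. */
    tio.c_oflag &= ~OPOST;

    /* 8 data bits, no parity, 1 stop bit; enable the receiver. */
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tio.c_cflag |= CS8 | CREAD | CLOCAL;

    if (cfsetispeed(&tio, speed) == -1 || cfsetospeed(&tio, speed) == -1)
        return -1;

    /* Apply the new attributes immediately. */
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Reading the attributes first and modifying only selected bits preserves any settings the sketch doesn't know about.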

QNX extensions

The QNX extensions to the terminal control API are as follows:

tcdropline() Initiate a disconnect. For a serial device, this will pulse the DTR line.

tcinject() Inject characters into the canonical buffer.

The io-char module acts directly on a common set of devctl() commands supported by most drivers. Applications send device-specific devctl() commands through io-char to the drivers.

Input modes

Each device can be in a raw or edited input mode.

Raw input mode

In raw mode, io-char performs no editing on received characters. This reduces the processing done on each character to a minimum and provides the highest performance interface for reading data.

Fullscreen programs and serial communications programs are examples of applications that use a character device in raw mode.

In raw mode, each character is received into the raw input buffer by the interrupt handler. When an application requests data from the device, it can specify under what conditions an input request is to be satisfied. Until the conditions are satisfied, the interrupt handler won’t signal the driver to run, and the driver won’t return any data to the application. The normal case of a simple read by an application would block until at least one character was available.

The following diagram shows the full set of available conditions:


MIN Respond when at least this number of characters arrives.

TIME Respond if a pause in the character stream occurs.

TIMEOUT Respond if an overall amount of time passes.

FORWARD Respond if a framing character arrives.

[Figure: Conditions for satisfying an input request. MIN, TIME, TIMEOUT, and FORWARD are combined with OR while io-char reads n bytes.]

In the case where multiple conditions are specified, the read will be satisfied when any one of them is satisfied.

MIN

The qualifier MIN is useful when an application has knowledge of the number of characters it expects to receive.

Any protocol that knows the character count for a frame of data can use MIN to wait for the entire frame to arrive. This significantly reduces IPC and process scheduling. MIN is often used in conjunction with TIME or TIMEOUT. MIN is part of the POSIX standard.

TIME

The qualifier TIME is useful when an application is receiving streaming data and wishes to be notified when the data stops or pauses. The pause time is specified in 1/10ths of a second. TIME is part of the POSIX standard.
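Because MIN and TIME are part of POSIX, they can be set through the standard termios interface. The following C sketch shows one way to do it; the helper name is an assumption for illustration. The QNX-specific TIMEOUT and FORWARD qualifiers aren't available through these two POSIX fields and aren't shown.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: ask for reads on fd to be satisfied once
 * min_chars characters have arrived, or once a pause of pause_tenths
 * tenths of a second occurs in the incoming stream.
 * Returns 0 on success, or -1 on error. */
int set_min_time(int fd, cc_t min_chars, cc_t pause_tenths)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) == -1)
        return -1;

    /* MIN and TIME apply only when line editing is turned off. */
    tio.c_lflag &= ~ICANON;

    tio.c_cc[VMIN]  = min_chars;
    tio.c_cc[VTIME] = pause_tenths;

    return tcsetattr(fd, TCSANOW, &tio);
}
```

For example, set_min_time(fd, 32, 5) would suit a protocol with 32-byte frames: when both values are nonzero, a read() returns when 32 bytes are available or when half a second passes after a received byte with nothing following it.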

TIMEOUT

The qualifier TIMEOUT is useful when an application has knowledge of how long it should wait for data before timing out. The timeout is specified in 1/10ths of a second.

Any protocol that knows the character count for a frame of data it expects to receive can use TIMEOUT. This in combination with the baud rate allows a reasonable guess to be made when data should be available. It acts as a deadman timer to detect dropped characters. It can also be used in interactive programs with user input to time out a read if no response is available within a given time.

TIMEOUT is a QNX extension and is not part of the POSIX standard.

FORWARD

The qualifier FORWARD is useful when a protocol is delimited by a special framing character. For example, the PPP protocol used for TCP/IP over a serial link starts and ends its packets with a framing character. When used in conjunction with TIMEOUT, the FORWARD character can greatly improve the efficiency of a protocol implementation. The protocol process will receive complete frames, rather than character by character. In the case of a dropped framing character, TIMEOUT or TIME can be used to quickly recover.

This greatly minimizes the amount of IPC work for the OS and results in a much lower processor utilization for a given TCP/IP data rate. It’s interesting to note that PPP doesn’t contain a character count for its frames. Without the data-forwarding character, an implementation might be forced to read the data one character at a time.

FORWARD is a QNX extension and is not part of the POSIX standard.
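To show the kind of check that FORWARD moves out of the application, here is a small C sketch that scans a buffer for a framing character and reports how many bytes make up the first complete frame. The function and macro names are assumptions made for illustration; 0x7E is the flag byte used by PPP’s HDLC-like framing. This is application-side code, not the io-char implementation.

```c
#include <stddef.h>

#define FRAME_FLAG 0x7E   /* PPP flag byte; macro name is hypothetical */

/* Return the number of bytes up to and including the first framing
 * character in buf, or 0 if no framing character is present yet. */
size_t frame_length(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == FRAME_FLAG)
            return i + 1;
    }
    return 0;
}
```

Without a forwarding character, a protocol process would have to run a test like this after every small read. With FORWARD, the read itself completes when the framing character arrives, so the process is scheduled once per frame.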

The ability to “push” the processing for application notification into the service-providing components of the OS reduces the frequency with which user-level processing must occur. This minimizes the IPC work to be done in the system and frees CPU cycles for application processing. In addition, if the application implementing the protocol is executing on a different network node than the communications port, the number of network transactions is also minimized.

For intelligent, multiport serial cards, the data-forwarding character recognition can also be implemented within the intelligent serial card itself, thereby significantly reducing the number of times the card must interrupt the host processor for interrupt servicing.


Edited input mode

In edited mode, io-char performs line-editing operations on each received character. Only when a line is “completely entered”, typically when a carriage return (CR) is received, will the line of data be made available to application processes. This mode of operation is often referred to as canonical or sometimes “cooked” mode.

Most nonfullscreen applications run in edited mode, because this allows the application to deal with the data a line at a time, rather than have to examine each character received, scanning for an end-of-line character.

In edited mode, each character is received into the raw input buffer by the interrupt handler. Unlike raw mode where the driver is scheduled to run only when some input conditions are met, the interrupt handler will schedule the driver on every received character.

There are two reasons for this. First, edited input mode is rarely used for high-performance communication protocols. Second, the work of editing is significant and not suitable for an interrupt handler.

When the driver runs, code in io-char will examine the character and apply it to the canonical buffer in which it’s building a line. When a line is complete and an application requests input, the line will be transferred from the canonical buffer to the application; the transfer is direct from the canonical buffer to the application buffer without any intervening copies.

The editing code correctly handles multiple pending input lines in the canonical buffer and allows partial lines to be read. This can happen, for example, if an application asked only for 1 character when a 10-character line was available. In this case, the next read will continue where the last one left off.

The io-char module provides a rich set of editing capabilities, including full support for moving over the line with cursor keys and for changing, inserting, or deleting characters. Here are some of the more common capabilities:


LEFT Move the cursor one character to the left.

RIGHT Move the cursor one character to the right.

HOME Move the cursor to the beginning of the line.

END Move the cursor to the end of the line.

ERASE Erase the character to the left of the cursor.

DEL Erase the character at the current cursor position.

KILL Erase the entire input line.

UP Erase the current line and recall a previous line.

DOWN Erase the current line and recall the next line.

INS Toggle between insert mode and typeover mode (every new line starts in insert mode).

Line-editing characters vary from terminal to terminal. The console always starts out with a full set of editing keys defined.

If a terminal is connected via a serial channel, you need to define the editing characters that apply to that particular terminal. To do this, you can use the stty utility. For example, if you have an ANSI terminal connected to a serial port (called /dev/ser1), you would use the following command to extract the appropriate editing keys from the terminfo database and apply them to /dev/ser1:

stty term=ansi </dev/ser1

Device subsystem performance

The flow of events within the device subsystem is engineered to minimize overhead and maximize throughput when a device is in raw mode. To accomplish this, the following rules are used:


• Interrupt handlers place received data directly into a memory queue. Only when a read operation is pending, and that read operation can be satisfied, will the interrupt handler schedule the driver to run. In all other cases, the interrupt simply returns. Moreover, if io-char is already running, no scheduling takes place, since the availability of data will be noticed without further notification.

• When a read operation is satisfied, the driver replies to the application process directly from the raw input buffer into the application’s receive buffer. The net result is that the data is copied only once.

These rules, coupled with the extremely small interrupt and scheduling latencies inherent within the OS, result in a very lean input model that provides POSIX conformance together with extensions suitable to the realtime requirements of protocol implementations.
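The first rule above amounts to a simple predicate that the interrupt handler evaluates after queuing data. The following C sketch expresses it under simplifying assumptions: the structure and all of its field names are hypothetical, and “can be satisfied” is reduced to a character-count test, leaving out the TIME, TIMEOUT, and FORWARD conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one device's receive state. */
struct rx_state {
    bool   read_pending;    /* an application is waiting in a read */
    bool   iochar_running;  /* io-char code is already executing */
    size_t queued;          /* bytes now in the raw input queue */
    size_t min_needed;      /* bytes needed to satisfy the pending read */
};

/* Decide whether the interrupt handler should schedule the driver. */
bool should_schedule_driver(const struct rx_state *s)
{
    if (s->iochar_running)
        return false;   /* the new data will be noticed anyway */

    if (!s->read_pending)
        return false;   /* nobody is waiting; just return */

    return s->queued >= s->min_needed;
}
```

In every case where the function returns false, the interrupt handler would simply return after queuing the data, which is what keeps the per-character cost low in raw mode.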

Console devices

System consoles (with VGA-compatible graphics chips in text mode) are managed by the devc-con or devc-tcon driver. The video display card/screen and the system keyboard are collectively referred to as the physical console.

The devc-con driver permits multiple sessions to be run concurrently on a physical console by means of virtual consoles. The devc-con console driver process typically manages more than one set of I/O queues to io-char, which are made available to user processes as a set of character devices with names like /dev/con1, /dev/con2, etc. From the application’s point of view, there “really are” multiple consoles available to be used.

Of course, there’s only one physical console (screen and keyboard), so only one of these virtual consoles is actually displayed at any one time. The keyboard is “attached” to whichever virtual console is currently visible.


The devc-tcon manager, a “tiny” implementation of devc-con, is intended for memory-constrained systems and supports only a single console.

Terminal emulation

Both console drivers emulate an ANSI terminal; devc-tcon implements a subset of devc-con’s extended ANSI emulation.

Serial devices

Serial communication channels are managed by the devc-ser* family of driver processes. These drivers can manage more than one physical channel and provide character devices with names such as /dev/ser1, /dev/ser2, etc.

When devc-ser* is started, command-line arguments can specify which (and how many) serial ports are installed. On a PC-compatible system, this will typically be the two standard serial ports often referred to as com1 and com2. The devc-ser* driver directly supports most nonintelligent multiport serial cards.

QNX Neutrino has several serial drivers (e.g. devc-ser8250, devc-serppc800, etc.). For details, see the devc-ser* entries in the Utilities Reference.

Many drivers have a “tiny” version (devc-t*) intended for memory-constrained systems. Note that the tiny versions support raw mode only.

The devc-ser* drivers support hardware flow control (except under edited mode) provided that the hardware supports it. Loss of carrier on a modem can be programmed to deliver a SIGHUP signal to an application process (as defined by POSIX).


Parallel devices

Parallel printer ports are managed by the devc-par driver. When devc-par is started, command-line arguments can specify which parallel port is installed.

The devc-par driver is an output-only driver, so it has no raw input or canonical input queues. The size of the output buffer can be configured with a command-line argument. If configured to a large size, this creates the effect of a software print buffer.

Pseudo terminal devices (ptys)

Pseudo terminals are managed by the devc-pty driver. Command-line arguments to devc-pty specify the number of pseudo terminals to create.

A pseudo terminal (pty) is a pair of character devices: a master device and a slave device. The slave device provides an interface identical to that of a tty device as defined by POSIX. However, while other tty devices represent hardware devices, the slave device instead has another process manipulating it through the master half of the pseudo terminal. That is, anything written on the master device is given to the slave device as input; anything written on the slave device is presented as input to the master device. As a result, pseudo-ttys can be used to connect processes that would otherwise expect to be communicating with a character device.


[Figure: Pseudo-ttys. An application process reaches a serial line through devc-ser*; in the same way, an application process reaches another process through devc-pty.]

Ptys are routinely used to create pseudo-terminal interfaces for programs like pterm, a terminal emulator that runs under the Photon microGUI, and telnet, which uses TCP/IP to provide a terminal session to a remote system.
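The master/slave relationship can be demonstrated with a short C sketch that opens a pty pair, writes a line on the master, and reads it back on the slave. The sketch assumes a system that provides the POSIX posix_openpt(), grantpt(), unlockpt(), and ptsname() functions; whether a particular QNX Neutrino release offers them, or instead expects the pty device names to be opened directly, isn’t stated here. The helper name is also an assumption.

```c
#define _XOPEN_SOURCE 600   /* for posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `line` (which should end in a newline)
 * into the master side and read it from the slave side into buf.
 * Returns the number of bytes the slave received, or -1 on error. */
ssize_t pty_roundtrip(const char *line, char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;

    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master == -1)
        return -1;

    if (grantpt(master) == -1 || unlockpt(master) == -1) {
        close(master);
        return -1;
    }

    const char *slave_name = ptsname(master);
    if (slave_name == NULL) {
        close(master);
        return -1;
    }

    int slave = open(slave_name, O_RDWR | O_NOCTTY);
    if (slave == -1) {
        close(master);
        return -1;
    }

    ssize_t n = -1;
    size_t len = strlen(line);

    /* Anything written on the master appears as input on the slave. */
    if (write(master, line, len) == (ssize_t)len) {
        n = read(slave, buf, buflen - 1);
        if (n >= 0)
            buf[n] = '\0';
    }

    close(slave);
    close(master);
    return n;
}
```

A terminal emulator or telnet-style server plays the role of the code holding the master, while the shell or application it serves opens the slave and treats it as an ordinary tty.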


Chapter 10

Networking Architecture

In this chapter...
Introduction
Network manager (io-net)
Filter module
Converter module
Protocol module
Driver module


Introduction

As with other service-providing processes in QNX Neutrino, the networking services execute outside the kernel. Developers are presented with a single unified interface, regardless of the configuration and number of networks involved.

This architecture allows:

• Network drivers to be started and stopped dynamically.

• Qnet, TCP/IP, and other protocols to run together in any combination.

Our native network subsystem consists of the network manager executable (io-net), plus one or more shared library modules. These modules can include protocols (e.g. npm-qnet.so, npm-tcpip.so), drivers (e.g. devn-ne2000.so), and filters.


[Figure: The io-net process. Client processes communicate with io-net, which hosts the npm-tcpip.so and npm-qnet.so protocol modules and the network drivers (devn-*.so) that control the hardware.]

Network manager (io-net)

The io-net component is the active executable within the network subsystem. Acting as a kind of packet redirector/multiplexer, io-net is responsible for loading protocol and driver modules based on the configuration given to it on its command line (or via the mount command after it’s started).

Employing a zero-copy architecture, the io-net executable efficiently loads multiple networking protocols, filters, or drivers (e.g. npm-qnet.so, npm-tcpip.so) on the fly; these modules are shared objects that install into io-net.


Filter module

The io-net framework lets you set up a filter module, which can be registered above or below a producer module, such as a driver or protocol module.

A filter module allows you to intercept data packets as they’re passed from this producer io-net module. This allows you to modify, drop, or simply monitor these packets. You can also direct packets to other interfaces (e.g. bridge or forward). Typically, a filter module would be registered above a network driver module (e.g. Ethernet).
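The idea of intercepting packets to monitor or drop them can be sketched in C as a chain of callbacks. This is a conceptual illustration only: every type and function name below is hypothetical, and none of it is the io-net module interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* What a filter decides about a packet (hypothetical). */
enum verdict { PKT_PASS, PKT_DROP };

/* A filter may inspect or modify the packet in place. */
typedef enum verdict (*filter_fn)(unsigned char *pkt, size_t len);

size_t packets_seen = 0;

/* Monitoring filter: counts every packet and lets it through. */
enum verdict count_packets(unsigned char *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    packets_seen++;
    return PKT_PASS;
}

/* Dropping filter: rejects packets too short to hold a 14-byte
 * Ethernet header. */
enum verdict drop_runts(unsigned char *pkt, size_t len)
{
    (void)pkt;
    return len < 14 ? PKT_DROP : PKT_PASS;
}

/* Pass a packet through each filter in order, stopping at the first
 * drop. Returns true if the packet should continue up the stack. */
bool run_filters(const filter_fn *filters, size_t nfilters,
                 unsigned char *pkt, size_t len)
{
    for (size_t i = 0; i < nfilters; i++) {
        if (filters[i](pkt, len) == PKT_DROP)
            return false;
    }
    return true;
}
```

Placing such a chain directly above a driver means that every packet from the interface is examined before any protocol module sees it.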

Converter module

The basic role of the converter module is to encapsulate and de-encapsulate packets as they pass from one network layer to another (e.g. IP to Ethernet). You use converters to connect producer modules together (e.g. a network protocol stack to a network driver).

These modules may also implement the protocols used to resolve the addressing used by the network protocol module to the physical network addresses supported by the network driver. For example, the ARP protocol (IP-to-Ethernet address translation) could be implemented as part of a converter module.
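As an illustration of IP-to-Ethernet encapsulation, the following C sketch builds an Ethernet II frame around a payload: a 6-byte destination address, a 6-byte source address, and a 2-byte EtherType (0x0800 for IPv4), followed by the payload. The function and macro names are assumptions made for illustration. For clarity the sketch copies the payload, which a zero-copy design would avoid, and it leaves out minimum-length padding and the frame check sequence.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN   6
#define ETH_HDR_LEN    14       /* destination + source + EtherType */
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical helper: write an Ethernet header followed by the
 * payload into frame. Returns the total frame length, or 0 if frame
 * (of capacity frame_cap) is too small. */
size_t eth_encapsulate(uint8_t *frame, size_t frame_cap,
                       const uint8_t dst[ETH_ADDR_LEN],
                       const uint8_t src[ETH_ADDR_LEN],
                       const uint8_t *payload, size_t payload_len)
{
    if (frame_cap < ETH_HDR_LEN + payload_len)
        return 0;

    memcpy(frame, dst, ETH_ADDR_LEN);
    memcpy(frame + ETH_ADDR_LEN, src, ETH_ADDR_LEN);

    /* The EtherType is stored most significant byte first. */
    frame[12] = (uint8_t)(ETHERTYPE_IPV4 >> 8);
    frame[13] = (uint8_t)(ETHERTYPE_IPV4 & 0xFF);

    memcpy(frame + ETH_HDR_LEN, payload, payload_len);
    return ETH_HDR_LEN + payload_len;
}
```

The destination address passed in is what an address-resolution step such as ARP would supply; de-encapsulation is the reverse operation of checking the EtherType and handing the bytes after the header to the protocol module.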

Protocol module

The networking protocol module is responsible for implementing the details of a particular protocol (e.g. Qnet, TCP/IP, etc.). Each protocol component is packaged as a shared object (e.g. npm-qnet.so). One or more protocol components may run concurrently.

For example, the following line from a buildfile showsio-netloading two protocols (TCP/IP and Qnet) via its-p protocolcommand-line option:

io-net -dne2000 -ptcpip -pqnet


tcpip    We’ve implemented the BSD TCP/IP stack as a protocol module.

qnet    Qnet is the QNX Neutrino native networking protocol. Its main purpose is to extend the OS’s powerful message-passing IPC transparently over a network of microkernels.

Qnet also provides Quality of Service policies to help ensure reliable network transactions.

For more information on the Qnet and TCP/IP protocols, see the following chapters in this book:

• Native Networking (Qnet)

• TCP/IP Networking

Driver module

The network driver module is responsible for managing the details of a particular network adaptor (e.g. an NE-2000 compatible Ethernet controller). Each driver is packaged as a shared object and installs into the io-net component.

Loading and unloading a driver

Once io-net is running, you can dynamically load drivers at the command line using the mount command. For example:

io-net &
mount -T io-net devn-ne2000.so

would start io-net and then mount the driver for an NE-2000 Ethernet adapter. All network device drivers are shared objects of the form:

devn-driver.so


Once the shared object is loaded, io-net will then initialize it. The driver and io-net are then effectively bound together: the driver will call into io-net (for example when packets arrive from the interface) and io-net will call into the driver (for example when packets need to be sent from an application to the interface).

You can also use the umount command to unload a driver:

umount /dev/io-net/en0

For more information on network device drivers, see their individual utility pages (devn-*) in the Utilities Reference.

Network DDK

Although several network drivers are shipped with the OS, you may want to write your own driver for your particular networking hardware. The Network Driver Development Kit makes this task relatively easy. The DDK provides full source code for several sample drivers as well as detailed instructions for handling the hardware-dependent issues involved in developing custom drivers for the io-net infrastructure.


Chapter 11

Native Networking (Qnet)

In this chapter...

QNX Neutrino distributed
Name resolution and lookup
Quality of Service (QoS) and multiple paths
Examples
Custom device drivers


QNX Neutrino distributed

Earlier in this manual, we described message passing in the context of a single node (see the section “QNX Neutrino IPC” in the microkernel chapter). But the true power of QNX Neutrino lies in its ability to take the message-passing paradigm and extend it transparently over a network of microkernels.

This chapter describes QNX Neutrino native networking (via the Qnet protocol). For information on TCP/IP networking, please refer to the next chapter.

At the heart of QNX Neutrino native networking is the Qnet protocol, which is deployed as a network of tightly coupled trusted machines. Qnet lets these machines share their resources efficiently with little overhead. Using Qnet, you can use the standard OS utilities (cp, mv, and so on) to manipulate files anywhere on the Qnet network as if they were on your machine. In addition, the Qnet protocol doesn’t do any authentication of remote requests; files are protected by the normal permissions that apply to users and groups. Besides files, you can also access and start/stop processes, including managers, that reside on any machine on the Qnet network.

The distributed processing power of Qnet lets you do the following tasks efficiently:

• Access your remote filesystem.

• Scale your application with unprecedented ease.

• Write applications using a collection of cooperating processes that communicate transparently with each other using Neutrino message-passing.

• Extend your application easily beyond a single processor or SMP machine to several single-processor machines and distribute your processes among those CPUs.


• Divide your large application into several processes, where each process can perform different functions. These processes coordinate their work using message passing.

• Take advantage of Qnet’s inherent remote procedure call functionality.

Moreover, since Qnet extends Neutrino message passing over the network, other forms of IPC (e.g. signals, message queues, named semaphores) also work over the network.

To understand how network-wide IPC works, consider two processes that wish to communicate with each other: a client process and a server process (in this case, the serial port manager process). In the single-node case, the client simply calls open(), read(), write(), etc. As we’ll see shortly, a high-level POSIX call such as open() actually entails message-passing kernel calls “underneath” (ConnectAttach(), MsgSend(), etc.). But the client doesn’t need to concern itself with those functions; it simply calls open().

fd = open("/dev/ser1", O_RDWR....); /* Open a serial device */

Now consider the case of a simple network with two machines: one contains the client process, the other contains the server process.


Figure: A simple network where the client (on node lab1) and the server (on node lab2) reside on separate machines.

The code required for client-server communication is identical to the code in the single-node case, but with one important exception: the pathname. The pathname will contain a prefix that specifies the node that the service (/dev/ser1) resides on. As we’ll see later, this prefix will be translated into a node descriptor for the lower-level ConnectAttach() kernel call that will take place. Each node in the network is assigned a node descriptor, which serves as the only visible means to determine whether the OS is running as a network or standalone.

For more information on node descriptors, see the Transparent Distributed Networking via Qnet chapter of the Programmer’s Guide.

Name resolution and lookup

When you run Qnet, the pathname space of all the nodes in your Qnet network is added to yours. Recall that a pathname is a symbolic name that tells a program where to find a file within the directory hierarchy based at root (/).


The pathname space of remote nodes will appear under the prefix /net (the directory created by the Qnet protocol manager, npm-qnet.so, by default).

For example, remote node1 would appear as:

/net/node1/dev/socket
/net/node1/dev/ser1
/net/node1/home
/net/node1/bin
....

So with Qnet running, you can now open pathnames (files or managers) on other remote Qnet nodes, just as you open files locally on your own node. This means you can access regular files or manager processes on other Qnet nodes as if they were executing on your local node.

Recall our open() example above. If you wanted to open a serial device on node1 instead of on your local machine, you simply specify the path:

fd = open("/net/node1/dev/ser1", O_RDWR...); /* Open a serial device on node1 */
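As a rough sketch of this point, the following C helpers open a path either locally or on a named Qnet node; only the pathname differs. The names qnet_path() and open_on_node() are our own, chosen for illustration (they aren't part of any QNX library), and the sketch assumes the default /net prefix.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/net/<node><path>" into buf. A NULL node means "this node",
 * so the path is copied unchanged. Returns 0 on success, -1 if buf is
 * too small. The /net prefix is the Qnet default (an assumption here). */
static int qnet_path(char *buf, size_t size, const char *node, const char *path)
{
    int n = (node == NULL)
        ? snprintf(buf, size, "%s", path)
        : snprintf(buf, size, "/net/%s%s", node, path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Open a path on the local node (node == NULL) or on a named Qnet node.
 * The caller gets an ordinary file descriptor either way. */
static int open_on_node(const char *node, const char *path, int flags)
{
    char full[512];

    if (qnet_path(full, sizeof full, node, path) != 0)
        return -1;
    return open(full, flags);
}
```

For example, open_on_node("node1", "/dev/ser1", O_RDWR) would open /net/node1/dev/ser1, while open_on_node(NULL, "/dev/ser1", O_RDWR) would open the local device.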

For client-server communications, how does the client know what node descriptor to use for the server?

The client uses the filesystem’s pathname space to “look up” the server’s address. In the single-machine case, the result of that lookup will be a node descriptor, a process ID, and a channel ID. In the networked case, the results are the same; the only difference will be the value of the node descriptor.

If node descriptor is:       Then the server is:
0 (or ND_LOCAL_NODE)         Local (i.e. “this node”)
Nonzero                      Remote


File descriptor (connection ID)

The practical result in both the local and networked case is that when the client connects to the server, the client gets a file descriptor (or connection ID in the case of kernel calls such as ConnectAttach()). This file descriptor is then used for all subsequent message-passing operations. Note that from the client’s perspective, the file descriptor is identical for both the local and networked case.

Behind a simple open()

Let’s return to our open() example. Suppose a client on one node (lab1) wishes to use the serial port (/dev/ser1) on another node (lab2). The client will effectively perform an open() on the pathname /net/lab2/dev/ser1.

The following diagram shows the steps involved when the client open()’s /net/lab2/dev/ser1:

Figure: A client-server message pass across the network. On lab1: the client, the process manager, and Qnet. On lab2: Qnet, the process manager, and the serial driver. The numbers 1 to 4 mark the interactions.

Here are the interactions:


1 A message is sent from the client to its local process manager, effectively asking who should be contacted to resolve the pathname /net/lab2/dev/ser1.

Since the native network manager (npm-qnet) has taken over the entire /net namespace, the process manager returns a redirect message, saying that the client should contact the local network manager for more information.

2 The client then sends a message to the local network manager, again asking who should be contacted to resolve the pathname.

The local network manager then replies with another redirect message, giving the node descriptor, process ID, and channel ID of the process manager on node lab2, effectively deferring the resolution of the request to node lab2.

3 The client then creates a connection to the process manager on node lab2, once again asking who should be contacted to resolve the pathname.

The process manager on node lab2 returns another redirect, this time with the node descriptor, channel ID, and process ID of the serial driver on its own node.

4 The client creates a connection to the serial driver on node lab2, and finally gets a connection ID that it can then use for subsequent message-passing operations.

After this point, from the client’s perspective, message passing to the connection ID is identical to the local case. Note that all further message operations are now direct between the client and server.

The key thing to keep in mind here is that the client isn’t aware of the operations taking place; these are all handled by the POSIX open() call. As far as the client is concerned, it performs an open() and gets back a file descriptor (or an error indication).


In each subsequent name-resolution step, the request from the client is stripped of already-resolved name components; this occurs automagically within the resource manager framework. This means that in step 2 above, the relevant part of the request is lab2/dev/ser1 from the perspective of the local network manager. In step 3, the relevant part of the request has been stripped to just dev/ser1, because that’s all that lab2’s process manager needs to know. Finally, in step 4, the relevant part of the request is simply ser1, because that’s all the serial driver needs to know.
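The stripping described here can be modelled with a few lines of C. This is only an illustration of the idea, not the resource manager framework’s actual code; strip_resolved() is a hypothetical helper of our own.

```c
#include <stddef.h>
#include <string.h>

/* Return the part of `path` that follows `prefix`, skipping the slash
 * that separates them, or NULL if `path` doesn't start with `prefix`
 * on a component boundary. */
static const char *strip_resolved(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(path, prefix, n) != 0)
        return NULL;
    if (path[n] == '\0')
        return path + n;        /* the whole path was consumed */
    if (path[n] != '/')
        return NULL;            /* e.g. "/network" doesn't match "/net" */
    return path + n + 1;
}
```

Applied to /net/lab2/dev/ser1, stripping /net leaves lab2/dev/ser1 (step 2), stripping lab2 leaves dev/ser1 (step 3), and stripping dev leaves ser1 (step 4).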

Global Name Service (GNS)

In the examples shown so far, remote services or files are located on known nodes or at known pathnames. For example, the serial port on lab1 is found at /net/lab1/dev/ser1.

GNS allows you to locate services via an arbitrary name wherever the service is located, whether on the local system or on a remote node. For example, if you wanted to locate a modem on the network, you could simply look for the name “modem.” This would cause the GNS server to locate the “modem” service, instead of using a static path such as /net/lab1/dev/ser1. The GNS server can be deployed such that it services all or a portion of your Qnet nodes. And you can have redundant GNS servers.

Network naming

As mentioned earlier, the pathname prefix /net is the most common name that npm-qnet uses. In resolving names in a network-wide pathname space, the following terms come into play:

node name    A character string that identifies the node you’re talking to. Note that a node name can’t contain slashes or dots. In the example above, we used lab2 as one of our node names. The default is fetched via confstr() with the _CS_HOSTNAME parameter.


node domain    A character string that’s “tacked” onto the node name by npm-qnet. Together the node name and node domain must form a string that’s unique for all nodes that are talking to each other. The default is fetched via confstr() with the _CS_DOMAIN parameter.

fully qualified node name (FQNN)

The string formed by tacking the node name and node domain together. For example, if the node name is lab2 and the node domain name is qnx.com, the resulting FQNN would be: lab2.qnx.com.

network directory

A directory in the pathname space implemented by npm-qnet. Each network directory (there can be more than one on a node) has an associated node domain. The default is /net, as used in the examples in this chapter.

name resolution    The process by which npm-qnet converts an FQNN to a list of destination addresses that the transport layer knows how to get to.

name resolver    A piece of code that implements one method of converting an FQNN to a list of destination addresses. Each network directory has a list of name resolvers that are applied in turn to attempt to resolve the FQNN. The default is ndp (see the next section).

Quality of Service (QoS)

A definition of connectivity between two nodes. The default QoS is loadbalance (see the section on QoS later in this chapter).
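To make the node name and FQNN definitions concrete, here is a minimal sketch in C. The helper names are hypothetical, and the sketch assumes a non-empty node domain; it takes the name and domain as arguments rather than fetching the defaults via confstr().

```c
#include <stdio.h>
#include <string.h>

/* A node name can't be empty and can't contain slashes or dots. */
static int node_name_is_valid(const char *name)
{
    return name[0] != '\0' && strpbrk(name, "/.") == NULL;
}

/* Join node name and node domain into "name.domain".
 * Returns 0 on success, -1 on an invalid name or a too-small buffer. */
static int make_fqnn(char *buf, size_t size, const char *name, const char *domain)
{
    int n;

    if (!node_name_is_valid(name))
        return -1;
    n = snprintf(buf, size, "%s.%s", name, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the values used in the definitions, make_fqnn(buf, sizeof buf, "lab2", "qnx.com") fills buf with lab2.qnx.com, while a name such as lab.2 is rejected.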


Resolvers

The following resolvers are built into the network manager:

• ndp: Node Discovery Protocol for broadcasting name resolution requests on the LAN (similar to the TCP/IP ARP protocol). This is the default.

• dns: Take the node name, add a dot (.) followed by the node domain, and send the result to the TCP/IP gethostbyname() function.

• file: Search for accessible nodes, including the relevant network address, in a static file.
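The idea of a list of resolvers applied in turn can be sketched as a chain of function pointers. This is a simplified model under our own assumptions, not npm-qnet’s implementation: each resolver returns a single address string rather than a list, the static table stands in for the file resolver, and its entries are made up.

```c
#include <stddef.h>
#include <string.h>

/* A resolver tries to turn an FQNN into an address string.
 * It returns 0 and fills `addr` on success, -1 if it can't resolve. */
typedef int (*resolver_fn)(const char *fqnn, char *addr, size_t size);

/* Stand-in for the "file" resolver: a static table instead of a file.
 * The entries are invented for illustration. */
static int resolve_static(const char *fqnn, char *addr, size_t size)
{
    static const struct { const char *fqnn, *addr; } table[] = {
        { "lab1.qnx.com", "00:00:5e:00:53:01" },
        { "lab2.qnx.com", "00:00:5e:00:53:02" },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(fqnn, table[i].fqnn) == 0 && strlen(table[i].addr) < size) {
            strcpy(addr, table[i].addr);
            return 0;
        }
    }
    return -1;
}

/* Stand-in for a resolver that finds nothing (e.g. no reply on the LAN). */
static int resolve_none(const char *fqnn, char *addr, size_t size)
{
    (void)fqnn; (void)addr; (void)size;
    return -1;
}

/* Apply each resolver in turn; the first success wins.
 * Returns the index of the resolver that succeeded, or -1. */
static int resolve_fqnn(const resolver_fn *chain, size_t count,
                        const char *fqnn, char *addr, size_t size)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (chain[i](fqnn, addr, size) == 0)
            return (int)i;
    return -1;
}
```

With a chain of { resolve_none, resolve_static }, a lookup of lab2.qnx.com falls through the first resolver and is answered by the second.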

Quality of Service (QoS) and multiple paths

Quality of Service (QoS) is an issue that often arises in high-availability networks as well as realtime control systems. In the Qnet context, QoS really boils down to transmission media selection: in a system with two or more network interfaces, Qnet will choose which one to use according to the policy you specify.

If you have only a single network interface, the QoS policies don’t apply at all.

QoS policies

Qnet supports transmission over multiple networks and provides the following policies for specifying how Qnet should select a network interface for transmission:

loadbalance (the default)

Qnet is free to use all available network links, and will share transmission equally among them.

preferred    Qnet uses one specified link, ignoring all other networks (unless the preferred one fails).


exclusive    Qnet uses one, and only one, link, ignoring all others, even if the exclusive link fails.

To fully benefit from Qnet’s QoS, you need to have physically separate networks. For example, consider a network with two nodes and a hub, where each node has two connections to the hub:

Figure: Qnet and a single network (Node 1 and Node 2, each connected to one hub).

If the link that’s currently in use fails, Qnet detects the failure, but doesn’t switch to the other link because both links go to the same hub. It’s up to the application to recover from the error; when the application reestablishes the connection, Qnet switches to the working link.

Now, consider the same network, but with two hubs:


Figure: Qnet and physically separate networks (Node 1 and Node 2 connected through two hubs).

If the networks are physically separate and a link fails, Qnet automatically switches to another link, depending on the QoS that you chose. The application isn’t aware that the first link failed.

Let’s look in more detail at the QoS policies.

loadbalance

Qnet decides which links to use for sending packets, depending on current load and link speeds as determined by io-net. A packet is queued on the link that can deliver the packet the soonest to the remote end. This effectively provides greater bandwidth between nodes when the links are up (the bandwidth is the sum of the bandwidths of all available links), and allows a graceful degradation of service when links fail.

If a link does fail, Qnet will switch to the next available link. This switch takes a few seconds the first time, because the network driver on the bad link will have timed out, retried, and finally died. But once Qnet “knows” that a link is down, it will not send user data over that link. (This is a significant improvement over the QNX 4 implementation.)


While load-balancing among the live links, Qnet will send periodic maintenance packets on the failed link in order to detect recovery. When the link recovers, Qnet places it back into the pool of available links.

☞ The loadbalance QoS policy is the default.

preferred

With this policy, you specify a preferred link to use for transmissions. Qnet will use only that one link until it fails. If your preferred link fails, Qnet will then turn to the other available links and resume transmission, using the loadbalance policy.

Once your preferred link is available again, Qnet will again use only that link, ignoring all others (unless the preferred link fails).

exclusive

You use this policy when you want to lock transmissions to only one link. Regardless of how many other links are available, Qnet will latch onto the one interface you specify. And if that exclusive link fails, Qnet will NOT use any other link.

Why would you want to use the exclusive policy? Suppose you have two networks, one much faster than the other, and you have an application that moves large amounts of data. You might want to restrict transmissions to only the fast network in order to avoid swamping the slow network under failure conditions.
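The three policies can be summarized as a rule for which links may carry traffic at a given moment. The following C sketch models that rule only; it is our own simplification (the type and function names are hypothetical) and says nothing about how Qnet weighs load or link speed among eligible links.

```c
#include <stdbool.h>
#include <stddef.h>

enum qos_policy { QOS_LOADBALANCE, QOS_PREFERRED, QOS_EXCLUSIVE };

struct link {
    const char *name;   /* e.g. "en0" */
    bool up;            /* true if the link is currently usable */
};

/* Mark in use[i] which links may carry traffic under `policy`.
 * `chosen` is the index of the preferred/exclusive link (ignored for
 * loadbalance). Returns the number of usable links. */
static size_t eligible_links(const struct link *links, size_t count,
                             enum qos_policy policy, size_t chosen, bool *use)
{
    size_t i, n = 0;
    bool chosen_up = chosen < count && links[chosen].up;

    for (i = 0; i < count; i++) {
        switch (policy) {
        case QOS_LOADBALANCE:   /* any live link */
            use[i] = links[i].up;
            break;
        case QOS_PREFERRED:     /* chosen link; the others only if it is down */
            use[i] = chosen_up ? (i == chosen) : links[i].up;
            break;
        case QOS_EXCLUSIVE:     /* chosen link or nothing */
            use[i] = (i == chosen) && chosen_up;
            break;
        default:
            use[i] = false;
            break;
        }
        if (use[i])
            n++;
    }
    return n;
}
```

With two links and en0 chosen, a failure of en0 leaves one eligible link under loadbalance and preferred, and none under exclusive.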

Specifying QoS policies

You specify the QoS policy as part of the pathname. For example, to access /net/lab2/dev/ser1 with a QoS of exclusive, you could use the following pathname:

/net/lab2~exclusive:en0/dev/ser1


The QoS parameter always begins with a tilde (~) character. Here we’re telling Qnet to lock onto the en0 interface exclusively, even if it fails.
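If your application builds such pathnames at run time, a small helper keeps the syntax in one place. This sketch covers only the node~policy:interface form shown here; qnet_qos_path() is a hypothetical name, and the /net prefix is assumed to be the default.

```c
#include <stdio.h>

/* Build "/net/<node>~<policy>:<iface><path>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int qnet_qos_path(char *buf, size_t size, const char *node,
                         const char *policy, const char *iface,
                         const char *path)
{
    int n = snprintf(buf, size, "/net/%s~%s:%s%s", node, policy, iface, path);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, qnet_qos_path(buf, sizeof buf, "lab2", "exclusive", "en0", "/dev/ser1") produces /net/lab2~exclusive:en0/dev/ser1, which you can then pass to open().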

Symbolic links

You can set up symbolic links to the various “QoS-qualified” pathnames:

ln -sP /net/lab2~preferred:en1 /remote/sql_server

This assigns an “abstracted” name of /remote/sql_server to the node lab2 with a preferred QoS (i.e. over the en1 link).

You can’t create symbolic links inside /net because Qnet takes over that namespace.

Abstracting the pathnames by one level of indirection gives you multiple servers available in a network, all providing the same service. When one server fails, the abstract pathname can be “remapped” to point to the pathname of a different server. For example, if lab2 failed, then a monitoring program could detect this and effectively issue:

rm /remote/sql_server
ln -sP /net/lab1 /remote/sql_server

This would remove lab2 and reassign the service to lab1. The real advantage here is that applications can be coded based on the abstract “service name” rather than be bound to a specific node name.
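A monitoring program could perform the same remapping from C. The sketch below uses the POSIX unlink() and symlink() calls, so it creates an ordinary filesystem link; it doesn’t reproduce the -P option of ln (which, as we understand it, places the link in the process manager’s prefix tree). The function name remap_service() is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Point `link_path` at `new_target`, replacing any existing link.
 * Mirrors "rm <link>" followed by "ln -s <target> <link>".
 * Returns 0 on success, -1 on failure (errno is set). */
static int remap_service(const char *link_path, const char *new_target)
{
    /* A missing link is fine: there is simply nothing to remove. */
    if (unlink(link_path) != 0 && errno != ENOENT)
        return -1;
    return symlink(new_target, link_path);
}
```

Note that the two steps aren’t atomic: for a brief moment the service name doesn’t exist, so a client that opens it at exactly that time should be prepared to retry.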

Examples

Let’s look at a few examples of how you’d use the network manager.


The QNX Neutrino native network manager npm-qnet is actually a shared object that installs into the executable io-net.

Local networks

If you’re using QNX Neutrino on a small LAN, you can use just the default ndp resolver. When a node name that’s currently unknown is being resolved, the resolver will broadcast the name request over the LAN, and the node that has the name will respond with an identification message. Once the name’s been resolved, it’s cached for future reference.

Since ndp is the default resolver when you start npm-qnet.so, you can simply issue commands like:

ls /net/lab2/

If you have a machine called “lab2” on your LAN, you’ll see the contents of its root directory.

Remote networks

CAUTION: For security reasons, you should have a firewall set up on your network before connecting to the Internet. For more information, see the section “Setting up a firewall” in the chapter Securing Your System in the User’s Guide.

Qnet uses DNS (Domain Name System) when resolving remote names. To use npm-qnet.so with DNS, you specify this resolver on mount’s command line:

mount -Tio-net -o"mount=:,resolve=dns,mount=.com:.net:.edu" /lib/dll/npm-qnet.so

In this example, Qnet will use both its native ndp resolver (indicated by the first mount= command) and DNS for resolving remote names.

Note that we’ve specified several types of domain names (mount=.com:.net:.edu) as mountpoints, simply to ensure better remote name resolution.


Now you could enter a command such as:

ls /net/qnet.qnx.com/repository

and you’d get a listing of the repository directory at the qnet.qnx.com site.

Custom device drivers

In most cases, you can use standard QNX drivers to implement Qnet over a local area network or to encapsulate Qnet messages in IP (TCP/IP) to allow Qnet to be routed to remote networks. But suppose you want to set up a very tightly coupled network between two CPUs over a super-fast interconnect (e.g. PCI or RapidIO)?

You can easily take advantage of the performance of such a high-speed link, because Qnet can talk directly to your hardware driver. There’s no io-net layer in this case. All you need is a little code at the very bottom of the Qnet layer that understands how to transmit and receive packets. This is simple, thanks to a standard internal API between the rest of Qnet and this very bottom portion, the driver interface.

Qnet already supports different packet transmit/receive interfaces, so adding another is reasonably straightforward. Qnet’s transport mechanism (called “L4”) is quite generic, and can be configured for different size MTUs, for whether or not ACK packets or CRC checks are required, etc., to take full advantage of your link’s advanced features (e.g. guaranteed reliability).

A Qnet software development kit is available to help you write custom drivers and/or modify Qnet components to suit your particular application.


Chapter 12

TCP/IP Networking

In this chapter...

Introduction
Structure of the TCP/IP manager
Socket API
Multiple stacks
SCTP
IP filtering and NAT
NTP
Dynamic host configuration
PPP over Ethernet
/etc/autoconnect
SNMP support
Embedded web server


Introduction

As the Internet has grown to become more and more visible in our daily lives, the protocol it’s based on, IP (Internet Protocol), has become increasingly important. The IP protocol and tools that go with it are ubiquitous, making IP the de facto choice for many private networks.

IP is used for everything from simple tasks (e.g. remote login) to more complicated tasks (e.g. delivering realtime stock quotes). Most businesses are turning to the World Wide Web, which commonly rides on IP, for communication with their customers, advertising, and other business connectivity. QNX Neutrino is well-suited for a variety of roles in this global network, from embedded devices connected to the Internet, to the routers that are used to implement the Internet itself.

Given these and many other user requirements, we’ve made our TCP/IP stack (npm-tcpip) relatively light on resources, while using the common BSD API.

Stack configurations

We provide three stack configurations:

NetBSD TCP/IP stack

Supports the latest RFCs, including UDP, IP, TCP, and SCTP. Also supports forwarding, broadcast and multicast, hardware checksum support, routing sockets, Unix domain sockets, multilink PPP, PPPoE, supernetting (CIDR), NAT/IP filtering, ARP, ICMP, and IGMP, as well as CIFS, DHCP, AutoIP, DNS, NFS (v2 and v3 server/client), NTP, RIP, RIPv2, and an embedded web server.

To create applications for this stack, you use the industry-standard BSD socket API. This stack also includes optimized forwarding code for additional performance and efficient packet routing when the stack is functioning as a network gateway.


Enhanced NetBSD stack with IPsec and IPv6

Includes all the features in the standard stack, plus the functionality targeted at the new generation of mobile and secure communications. This stack provides full IPv6 and IPsec (both IPv4 and IPv6) support through KAME extensions, as well as support for VPNs over IPsec tunnels.

This dual-mode stack supports IPv4 and IPv6 simultaneously and includes IPv6 support for autoconfiguration, which allows device configuration in plug-and-play network environments. IPv6 support includes IPv6-aware utilities and RIP/RIPng to support dynamic routing. An Advanced Socket API is also provided to supplement the standard socket API to take advantage of IPv6 extended-development capabilities.

IPsec support allows secure communication between hosts or networks, providing data confidentiality via strong encryption algorithms and data authentication features. IPsec support also includes the IKE (ISAKMP/Oakley) key management protocol for establishing secure host associations.

Tiny TCP/IP stack

Designed for memory-constrained systems, this small-footprint (80K) stack provides complete support for IP, TCP, and UDP over Ethernet, PPP, and PPPoE interfaces. To develop applications, you use the BSD socket interface; in fact, you can switch between the tiny stack and NetBSD stacks without having to recompile your code.
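Because every stack configuration presents the BSD socket interface, application code looks the same whichever stack is loaded. As a minimal, hedged illustration, the following C function uses only standard socket calls; the function name bind_loopback_tcp() is our own, and on QNX Neutrino you would typically also link against the socket library.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to the loopback address on a port chosen by
 * the stack. On success, returns the descriptor and stores the port in
 * *port (host byte order); on failure, returns -1. */
static int bind_loopback_tcp(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);           /* 0 = let the stack pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

Nothing in this code refers to a particular stack, which is why the same application can run over any of the configurations listed here.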

The QNX TCP/IP suite is also modular. For example, it provides NFS as separate modules. With this kind of modularity, together with small-sized modules, embedded systems developers can more easily and quickly build small TCP/IP-capable systems.

To customize our TCP/IP suite, developers can also obtain a source release of several components (e.g. io-net, npm-tcpip.so). The source kit also includes simple source examples of io-net modules to help developers create or port source code for the io-net architecture.


Structure of the TCP/IP manager

As a resource manager, the npm-tcpip module benefits from the code savings and standard interface that all native resource managers enjoy. Due to the natural priority inheritance of QNX Neutrino IPC, clients will be dealt with in priority and time order, which leads to a more natural allocation of CPU resources.

[Figure: The npm-tcpip suite and dependents. Components shown: io-net, npm-tcpip.so, npm-pppmgr.so, npm-pppoe.so, lsm-ipfilter.so, lsm-sctp.so, devn-*.so network drivers, devc-ser*, pppd, pppoed, ntpd, snmpd, telnetd, ftpd, inetd, routed, syslogd, fs-nfs2, fs-cifs, and user applications.]


Most of the link-layer support is implemented outside of npm-tcpip. PPP is implemented as separate shared objects. Since our io-net PPP modules (npm-pppmgr.so, npm-pppoe.so) handle the transmission of PPP packets, there’s no need for a memory copy of the packet data. This approach allows for high-performance PPPoE connections.

Note that npm-tcpip acts as a protocol shared library to the io-net process, which directs data to the correct driver. Network drivers are shared libraries as well.

Other components of the npm-tcpip suite (such as NFS, the snmpd daemon, etc.) are implemented outside of npm-tcpip. This leads to better modularity and fault-tolerance.

Socket API

The BSD Socket API was the obvious choice for QNX Neutrino. The Socket API is the standard API for TCP/IP programming in the UNIX world. In the Windows world, the Winsock API is based on and shares a lot with the BSD Socket API. This makes conversion between the two fairly easy.

All the routines that application programmers would expect are available, including (but not limited to):

accept()            bind()              bindresvport()      connect()
dn_comp()           dn_expand()         endprotoent()       endservent()
gethostbyaddr()     gethostbyname()     getpeername()       getprotobyname()
getprotobynumber()  getprotoent()       getservbyname()     getservent()
getsockname()       getsockopt()        herror()            hstrerror()
htonl()             htons()             h_errlist()         h_errno()
h_nerr()            inet_addr()         inet_aton()         inet_lnaof()
inet_makeaddr()     inet_netof()        inet_network()      inet_ntoa()
ioctl()             listen()            ntohl()             ntohs()
recv()              recvfrom()          res_init()          res_mkquery()
res_query()         res_querydomain()   res_search()        res_send()
select()            send()              sendto()            setprotoent()
setservent()        setsockopt()        shutdown()          socket()

For more information, see the Neutrino Library Reference.

The common daemons and utilities from the Internet will easily port or just compile in this environment. This makes it easy to leverage what already exists for your applications.
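To illustrate how several of the routines listed above fit together, here is a minimal sketch that sends a short message over a loopback TCP connection inside a single process. It is an illustration under stated assumptions rather than code from the QNX documentation: the function name loopback_roundtrip() is hypothetical, the message is assumed to be short and non-empty, and error handling is reduced to returning -1. On a QNX target you would typically also link against the socket library.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: sends msg from a client socket to an accepted
 * server socket on 127.0.0.1 and copies what arrived into out.
 * Returns the number of bytes received, or -1 on failure. */
int loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int lsock = -1, csock = -1, ssock = -1;
    ssize_t n = -1;

    if (outlen == 0)
        return -1;

    lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* let the stack pick a free port */

    if (bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &alen) < 0)
        goto done;                       /* addr now holds the chosen port */

    csock = socket(AF_INET, SOCK_STREAM, 0);
    if (csock < 0 ||
        connect(csock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    ssock = accept(lsock, NULL, NULL);
    if (ssock < 0)
        goto done;

    if (send(csock, msg, strlen(msg), 0) < 0)
        goto done;
    n = recv(ssock, out, outlen - 1, 0);
    if (n >= 0)
        out[n] = '\0';

done:
    if (ssock >= 0) close(ssock);
    if (csock >= 0) close(csock);
    if (lsock >= 0) close(lsock);
    return (int)n;
}
```

The same calls, in the same order, are what a ported daemon would use; only the addresses and the split between client and server processes change.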

Database routines

The database routines have been modified to better suit embedded systems.

/etc/resolv.conf

You can use configuration strings (via the confstr() function) to override the data usually contained in the /etc/resolv.conf file. You can also use the RESCONF environment variable to do this. Either method lets you use a nameserver without /etc/resolv.conf. This affects gethostbyname() and other resolver routines.

/etc/protocols

The getprotobyname() and getprotobynumber() functions have been modified to contain a small number of built-in protocols, including IP, ICMP, UDP, and TCP. For many applications, this means that the /etc/protocols file doesn’t need to exist.


/etc/services

The getservbyname() function has been modified to contain a small number of built-in services, including ftp, telnet, smtp, domain, nntp, netbios-ns, netbios-ssn, sunrpc, and nfsd. For many applications, this means that the /etc/services file doesn’t need to exist.
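Application code doesn’t change when the protocol and service data come from built-in tables rather than from files. The following sketch shows two small lookups written against the standard routines; the wrapper names proto_number() and service_port() are hypothetical, and both return -1 when a name isn’t known to the system they run on.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical wrapper: protocol number for a name such as "tcp",
 * or -1 if the name is unknown. */
int proto_number(const char *name)
{
    struct protoent *p = getprotobyname(name);
    return p ? p->p_proto : -1;
}

/* Hypothetical wrapper: port (host byte order) for a service such as
 * "telnet" over the given protocol, or -1 if the service is unknown. */
int service_port(const char *service, const char *proto)
{
    struct servent *s = getservbyname(service, proto);
    return s ? (int)ntohs((unsigned short)s->s_port) : -1;
}
```

Note that s_port is stored in network byte order, which is why the sketch converts it with ntohs() before returning it.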

Multiple stacks

The QNX Neutrino network manager (io-net) lets you load multiple protocol shared objects (e.g. npm-tcpip.so). This means, for example, that you can load several instances of the TCP/IP stack on the same physical interface, making it easy to create multiple virtual networks (VLANs). You can even run multiple, independent instances of the network manager (io-net) itself. As with all QNX Neutrino system components, each io-net naturally benefits from complete memory protection thanks to our microkernel architecture.

SCTP

The SCTP (Stream Control Transmission Protocol) module (lsm-sctp.so) is a dynamically loadable module that adds SCTP protocol processing to your TCP/IP stack. SCTP is a reliable connection-oriented transport protocol with features such as:

• acknowledged error-free, non-duplicated transfer of user data

• data fragmentation to conform to the discovered path MTU size

• sequenced delivery of user messages within multiple streams

• multi-homing.

The QNX SCTP implementation makes use of the BSD socket-based API routines along with SCTP-specific API extensions, including:

• sctp_peeloff()

• sctp_bindx()

• sctp_connectx()

• sctp_getpaddrs()

• sctp_freepaddrs()

• sctp_getladdrs()

• sctp_freeladdrs()

• sctp_opt_info()

• sctp_sendmsg()

• sctp_recvmsg()
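Because SCTP is reached through the socket API, a one-to-one SCTP endpoint is created with the same socket() call used for TCP, with only the protocol argument changed. The sketch below probes whether such a socket can be created. It is an assumption-laden illustration: sctp_available() is a hypothetical name, and the sketch assumes that IPPROTO_SCTP is defined by <netinet/in.h> on the target (some systems define it in an SCTP-specific header instead).

```c
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical probe: returns 1 if a one-to-one SCTP socket can be
 * created (i.e. SCTP processing is present in the stack), else 0. */
int sctp_available(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);

    if (fd < 0)
        return 0;    /* e.g. the SCTP module isn't loaded */
    close(fd);
    return 1;
}
```

Once the socket exists, the usual bind(), listen(), connect(), and accept() sequence applies; the SCTP-specific extensions listed above come into play only for features such as multi-homing.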

IP filtering and NAT

The IP filtering and NAT (Network Address Translation) io-net module (lsm-ipfilter-*.so) is a dynamically loadable TCP/IP stack module. This module provides high-efficiency firewall services and includes such features as:

• rule grouping: to apply different groups of rules to different packets

• stateful filtering: an optional configuration to allow packets related to an already authorized connection to bypass the filter rules

• NAT: for mapping several internal addresses into a public (Internet) address, allowing several internal systems to share a single Internet IP address

• proxy services: to allow ftp, netbios, and H.323 to use NAT

• port redirection: for redirecting incoming traffic to an internal server or to a pool of servers.

The IP filtering and NAT rules can be added or deleted dynamically to a running system. Logging services are also provided with the suite of utilities to monitor and control this module.


NTP

NTP (Network Time Protocol) allows you to keep the time of day for the devices in your network synchronized with the Internet standard time servers. The QNX NTP daemon supports both server and client modes.

In server mode, a daemon on the local network synchronizes with the standard time servers. It will then broadcast or multicast what it learned to the clients on the local network, or wait for client requests. The client NTP systems will then be synchronized with the server NTP system. The NTP suite implements NTP v4 while maintaining compatibility with v3, v2, and v1.

Dynamic host configuration

We support DHCP (Dynamic Host Configuration Protocol), which is used to obtain TCP/IP configuration parameters. Our DHCP client (dhcp.client) will obtain its configuration parameters from the DHCP server and configure the TCP/IP host for the user. This allows the user to add a host to the network without knowing what parameters (IP address, gateway, etc.) are required for the host. DHCP also allows a system administrator to control how hosts are added to the network. A DHCP server daemon (dhcpd) and relay agent (dhcprelay) are also provided to manage these clients.

For more information, see the dhcp.client, dhcpd, and dhcprelay entries in the Utilities Reference.

AutoIP

Developed from the Zeroconf IETF draft, nfm-autoip.so is an io-net module that automatically configures the IPv4 address of your interface without the need of a server (as per DHCP) by negotiating with its peers on the network. This module can also coexist with DHCP (dhcp.client), allowing your interface to be assigned both a link-local IP address and a DHCP-assigned IP address at the same time.
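When an interface can carry both a link-local and a DHCP-assigned address, an application may want to tell the two apart. IPv4 link-local addresses fall in the 169.254.0.0/16 block, so a simple mask test is enough. The following sketch is illustrative only; is_ipv4_link_local() is a hypothetical helper and isn’t part of the AutoIP module.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if text is an IPv4 address inside
 * 169.254.0.0/16, 0 if it is some other IPv4 address, and -1 if it
 * can't be parsed as a dotted-quad address. */
int is_ipv4_link_local(const char *text)
{
    struct in_addr a;

    if (inet_pton(AF_INET, text, &a) != 1)
        return -1;
    /* Compare the top 16 bits, in host byte order, with 169.254 (0xA9FE). */
    return (ntohl(a.s_addr) & 0xFFFF0000UL) == 0xA9FE0000UL;
}
```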


PPP over Ethernet

We support the Point-to-Point Protocol over Ethernet (PPPoE), which is commonly deployed by broadband service providers. Our PPPoE support consists of the npm-pppoe.so shared object as well as the pppoed daemon, which negotiates the PPPoE session. Once the PPPoE session is established, the pppd daemon creates a PPP connection.

When you use PPPoE, you don’t need to specify any configuration parameters; our modules will automatically obtain the appropriate configuration data from your ISP and set everything up for you.

For more information, see the following in the Utilities Reference:

npm-pppoe.so   Shared object that provides PPP-to-Ethernet services.

pppoed         Daemon to negotiate the PPPoE session.

phlip          Photon TCP/IP and dialup configuration tool.

phdialer       Photon modem dialer.

/etc/autoconnect

Our autoconnect feature automatically sets up a connection to your ISP whenever a TCP/IP application is started. For example, suppose you want to start a dialup connection to the Internet. When your Web browser is started, it will pause and the /etc/autoconnect script will automatically dial your ISP. The browser will resume when the PPP session is established.

For more information, see the entry for /etc/autoconnect in the Utilities Reference.


SNMP support

The SRI SNMP Suite for QNX Neutrino consists primarily of ports of the EMANATE and EMANATE/Lite technologies developed by SNMP Research International (SRI). EMANATE/Lite is a statically extensible agent; the EMANATE agent can be extended dynamically. Both agents support SNMP V1, V2, and V3, and include development kits for developers to extend the agents.

Embedded web server

Our embedded web server, slinger, uses very little memory and communicates over TCP/IP sockets. The embedded web server supports CGI 1.1, HTTP 1.1, and dynamic HTML (via SSI commands).

[Figure: Embedded web server. Elements shown: Internet, TCP/IP, Slinger, web pages (HTML, SHTML) on the filesystem, data server, control apps, and hardware; connections labeled HTTP and Tag Get/Set.]

Many embedded servers force the user to relink the server in order to add pages, which compromises reliability as vendor and user code compete in a shared memory space.


Despite its size, our embedded web server provides enough functionality to support accessing generated (dynamic) HTML via CGI or SSI.

CGI method

The embedded web server supports the Common Gateway Interface (CGI) 1.1, a readily available means of handling dynamic data. The downside of CGI is that it’s resource-heavy because an interpreted language is often involved.
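A CGI program doesn’t have to be interpreted; a small compiled program can produce the same kind of response. The sketch below shows the essential shape of a CGI reply: a header block, a blank line, then the body. The function name cgi_write_response() and the page content are hypothetical and chosen only for illustration; the sketch reports the length of the query string rather than echoing it, so no HTML escaping is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a minimal CGI response to out.
 * query is the request's query string (may be NULL).
 * Returns 0 on success, -1 if writing fails. */
int cgi_write_response(FILE *out, const char *query)
{
    size_t qlen = query ? strlen(query) : 0;

    /* Header block, terminated by an empty line. */
    if (fprintf(out, "Content-Type: text/html\r\n\r\n") < 0)
        return -1;

    /* Body: a tiny generated page. */
    if (fprintf(out, "<html><body><p>Query length: %lu</p></body></html>\n",
                (unsigned long)qlen) < 0)
        return -1;
    return 0;
}
```

In a complete CGI program, main() would typically call cgi_write_response(stdout, getenv("QUERY_STRING")), since CGI passes the query string to the program through that environment variable.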

SSI method

With SSI (Server Side Includes), a type of command language that can be embedded in HTML files, you can add dynamic content to your HTML. For example, the embedded server can:

• execute utilities at user-defined points in an HTML document (the output of these utilities can be optionally placed in the document).

• insert contents of other HTML files at a user-defined point.

• handle conditional statements (if, break, goto) so you can define what parts of an HTML file are transmitted.

Note that SSI tags are available to interact with a data server.

Data server method

You can also handle dynamic HTML by using what we call a data server. The data server allows multiple threads to share data without regard for process boundaries. Since the embedded web server supports SSI, we’ve extended this support by adding the ability to talk to the data server.

Now you can have a process updating the data server about the state of a hardware device while the embedded web server accesses that state in a decoupled but reliable manner.


You can write a simple I/O manager to provide dynamic data. Note that this method isn’t specific to the embedded web server and that the I/O manager can handle only output, not posts.


Chapter 13

High Availability

In this chapter...

• What is High Availability?

• Client library

• High Availability Manager


What is High Availability?

The term High Availability (HA) is commonly used in telecommunications and other industries to describe a system’s ability to remain up and running without interruption for extended periods of time. The celebrated “five nines” availability metric refers to the percentage of uptime a system can sustain in a year: 99.999% uptime amounts to about five minutes of downtime per year.
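The “five minutes” figure follows directly from the arithmetic: a 365-day year has 525,600 minutes, and 0.001% of that is a little over five minutes. The short sketch below makes the calculation explicit; the function name downtime_minutes_per_year() is hypothetical, and a 365-day year is assumed.

```c
/* Hypothetical helper: expected downtime, in minutes per 365-day year,
 * for a given availability expressed as a fraction (e.g. 0.99999). */
double downtime_minutes_per_year(double availability)
{
    const double minutes_per_year = 365.0 * 24.0 * 60.0;   /* 525,600 */

    return (1.0 - availability) * minutes_per_year;
}
```

For comparison, the same function applied to “three nines” (0.999) gives roughly 525.6 minutes, or close to nine hours of downtime per year.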

Obviously, an effective HA solution involves various hardware and software components that conspire to form a stable, working system. Assuming reliable hardware components with sufficient redundancy, how can an OS best remain stable and responsive when a particular component or application program fails? And in cases where redundant hardware may not be an option (e.g. consumer appliances), how can the OS itself support HA?

An OS for HA

If you had to design an HA-capable OS from the ground up, would you start with a single executable environment? In this simple, high-performance design, all OS components, device drivers, applications, the works, would all run without memory protection in kernel mode.

On second thought, maybe such an OS wouldn’t be suited for HA, simply because if a single software component were to fail, the entire system would crash. And if you wanted to add a software component or otherwise modify the HA system, you’d have to take the system out of service to do so. In other words, the conventional realtime executive architecture wasn’t built with HA in mind.

Suppose, then, that you base your HA-enabled OS on a separation of kernel space and user space, so that all applications would run in user mode and enjoy memory protection. You’d even be able to upgrade an application without incurring any downtime.

So far so good, but what would happen if a device driver, filesystem manager, or other essential OS component were to crash? Or what if you needed to add a new driver to a live system? You’d have to rebuild and restart the kernel. Based on such a monolithic kernel architecture, your HA system wouldn’t be as available as it should be.

Inherent HA

A true microkernel that provides full memory protection is inherently the most stable OS architecture. Very little code is running in kernel mode that could cause the kernel itself to fail. And individual processes, whether applications or OS services, can be started and stopped dynamically, without jeopardizing system uptime.

QNX Neutrino inherently provides several key features that are well-suited for HA systems:

• System stability through full memory protection for all OS and user processes.

• Dynamic loading and unloading of system components (device drivers, filesystem managers, etc.).

• Separation of all software components for simpler development and maintenance.

While any claims regarding “five nines” availability on the part of an OS must be viewed only in the context of the entire hardware/software HA system, one can always ask whether an OS truly has the appropriate underlying architecture capable of supporting HA.

HA-specific modules

Apart from its inherently robust architecture, Neutrino also provides several components to help developers simplify the task of building and maintaining effective HA systems:

• HA client-side library: cover functions that allow for automatic and transparent recovery mechanisms for failed server connections.

• HA Manager: a “smart watchdog” that can perform multistage recovery whenever system services or processes fail.


Custom hardware support

While many operating systems provide HA support in a hardware-specific way (e.g. via PCI Hot Plug), QNX Neutrino isn’t tied to PCI. Your particular HA system may be built on a custom chassis, in which case an OS that offers a PCI-based HA “solution” may not address your needs at all.

QNX Software Systems is an actively contributing member of the Service Availability Forum (www.saforum.org), an industry body dedicated to developing open, industry-standard specifications for building HA systems.

Client library

The HA client-side library provides a drop-in enhancement solution for many standard C Library I/O operations. The HA library’s cover functions allow for automatic and transparent recovery mechanisms for failed connections that can be recovered from in an HA scenario. Note that the HA library is both thread-safe and cancellation-safe.

The main principle of the client library is to provide drop-in replacements for all the message-delivery functions (i.e. MsgSend*). A client can select which particular connections it would like to make highly available, thereby allowing all other connections to operate as ordinary connections (i.e. in a non-HA environment).

Normally, when a server that the client is talking to fails, or if there’s a transient network fault, the MsgSend* functions return an error indicating that the connection ID (or file descriptor) is stale or invalid (e.g. EBADF). But in an HA-aware scenario, these transient faults are recovered from almost immediately, thus making the services available again.

Recovery example

The following example demonstrates a simple recovery scenario, where a client opens a file across a network file system. If the NFS server were to die, the HA Manager would restart it and remount the filesystem. Normally, any clients that previously had files open across the old connection would now have a stale connection handle. But if the client uses the ha_attach functions, it can recover from the lost connection.

The ha_attach functions allow the client to provide a custom recovery function that’s automatically invoked by the cover-function library. This recovery function could simply reopen the connection (thereby getting a connection to the new server), or it could perform a more complex recovery (e.g. adjusting the file position offsets and reconstructing its state with respect to the connection). This mechanism thus lets you develop arbitrarily complex recovery scenarios, while the cover-function library takes care of the details (detecting a failure, invoking recovery functions, and retransmitting state information).

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <ha/cover.h>

#define TESTFILE "/net/machine99/home/test/testfile"

typedef struct handle {
    int nr;             // number of successful recoveries
    int curr_offset;    // file offset reached so far
} Handle;

int recover_conn(int oldfd, void *hdl)
{
    int newfd;
    Handle *thdl;

    thdl = (Handle *)hdl;
    newfd = ha_reopen(oldfd, TESTFILE, O_RDONLY);
    if (newfd >= 0) {
        // adjust file offset to previously known point
        lseek(newfd, thdl->curr_offset, SEEK_SET);
        // increment our count of successful recoveries
        (thdl->nr)++;
    }
    return(newfd);
}

int main(int argc, char *argv[])
{
    int status;
    int fd;
    int fd2;
    Handle hdl;
    char buf[80];

    hdl.nr = 0;
    hdl.curr_offset = 0;
    // open a connection
    // recovery will be using "recover_conn", and "hdl" will
    // be passed to it as a parameter
    fd = ha_open(TESTFILE, O_RDONLY, recover_conn, (void *)&hdl, 0);
    if (fd < 0) {
        printf("could not open file\n");
        exit(EXIT_FAILURE);
    }
    status = read(fd, buf, 15);
    if (status < 0) {
        printf("error: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    else {
        hdl.curr_offset += status;
    }
    fd2 = ha_dup(fd);
    // fs-nfs2 fails, and is restarted, the network mounts
    // are re-instated at this point.
    // Our previous "fd" to the file is stale
    sleep(18);
    // reading from the stale fd
    // will fail, and will recover via recover_conn
    status = read(fd, buf, 15);
    if (status < 0) {
        printf("error: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    else {
        hdl.curr_offset += status;
    }
    printf("total recoveries, %d\n", hdl.nr);
    ha_close(fd);
    ha_close(fd2);
    exit(0);
}


Since the cover-function library takes over the lowest MsgSend*() calls, most standard library functions (read(), write(), printf(), scanf(), etc.) are also automatically HA-aware. The library also provides an ha_dup() function, which is semantically equivalent to the standard dup() function in the context of HA-aware connections. You can replace recovery functions during the lifetime of a connection, which greatly simplifies the task of developing highly customized recovery mechanisms.
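The general idea of a cover function (try the operation, detect a stale descriptor, invoke the recovery function, retry) can be sketched in portable C. The code below is a simplified model for illustration only and isn’t the HA library’s implementation: covered_conn, covered_read(), and recover_from_spare() are hypothetical names, and the real library hooks in at the MsgSend*() level rather than around read().

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical recovery callback: receives the stale descriptor and a
 * user handle, returns a replacement descriptor or -1. */
typedef int (*recover_fn)(int oldfd, void *hdl);

/* Hypothetical bookkeeping for one "highly available" connection. */
typedef struct {
    int        fd;          /* current descriptor */
    recover_fn recover;     /* invoked when fd turns out to be stale */
    void      *hdl;         /* passed through to the recovery function */
    int        recoveries;  /* number of successful recoveries */
} covered_conn;

/* Cover for read(): on EBADF, run the recovery function once and retry. */
ssize_t covered_read(covered_conn *c, void *buf, size_t len)
{
    ssize_t n = read(c->fd, buf, len);

    if (n < 0 && errno == EBADF && c->recover != NULL) {
        int newfd = c->recover(c->fd, c->hdl);
        if (newfd >= 0) {
            c->fd = newfd;
            c->recoveries++;
            n = read(c->fd, buf, len);   /* retry on the new descriptor */
        }
    }
    return n;
}

/* Sample recovery function: hdl points at a spare, valid descriptor. */
int recover_from_spare(int oldfd, void *hdl)
{
    (void)oldfd;
    return *(int *)hdl;
}
```

The caller sees a single successful read even though the first attempt failed, which is the transparency the cover-function library aims for.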

High Availability Manager

The High Availability Manager (HAM) provides a mechanism for monitoring processes and services on your system. The goal is to provide a resilient manager (or “smart watchdog”) that can perform multistage recovery whenever system services or processes fail, no longer respond, or are detected to be in a state where they cease to provide acceptable levels of service. The HA framework, including the HAM, uses a simple publish/subscribe mechanism to communicate interesting system events between interested components in the system. By automatically integrating itself into the native networking mechanism (Qnet), this framework transparently extends a local monitoring mechanism to a network-distributed one.

The HAM acts as a conduit through which the rest of the system can both obtain and deliver information regarding the state of the system as a whole. Again, the system could be simply a single node or a collection of nodes connected via Qnet. The HAM can monitor specific processes and can control the behavior of the system when specific components fail and need to be recovered. The HAM also allows external detectors to detect and report interesting events to the system, and can associate actions with the occurrence of those events.

In many HA systems, each single point of failure (SPOF) must be identified and dealt with carefully. Since the HAM maintains information about the health of the system and also provides the basic recovery framework, the HAM itself must never become a SPOF.


HAM and the Guardian

As a self-monitoring manager, the HAM is resilient to internal failures. If, for whatever reason, the HAM itself is stopped abnormally, it can immediately and completely reconstruct its own state. A mirror process called the Guardian perpetually stands ready and waiting to take over the HAM’s role. Since all state information is maintained in shared memory, the Guardian can assume the exact same state that the original HAM was in before the failure.

But what happens if the Guardian terminates abnormally? The Guardian (now the new HAM) creates a new Guardian for itself before taking the place of the original HAM. Practically speaking, therefore, one can’t exist without the other.

Since the HAM/Guardian pair monitor each other, the failure of either one can be completely recovered from. The only way to stop the HAM is to explicitly instruct it to terminate the Guardian and then to terminate itself.

HAM hierarchy

HAM consists of three main components:

• Entities

• Conditions

• Actions

Entities

Entities are the fundamental units of observation/monitoring in the system. Essentially, an entity is a process (pid). As processes, all entities are uniquely identifiable by their pids. Associated with each entity is a symbolic name that can be used to refer to that specific entity. Again, the names associated with entities are unique across the system. Managers are currently associated with a node, so uniqueness rules apply to a node. As we’ll see later, this uniqueness requirement is very similar to the naming scheme used in a hierarchical filesystem.


There are three fundamental entity types:

• Self-attached entities (HA-aware components): processes that choose to send heartbeats to the HAM, which will then monitor them for failure. Self-attached entities can, on their own, decide at exactly what point in their lifespan they want to be monitored, what conditions they want acted upon, and when they want to stop the monitoring. In other words, this is a situation where a process says, “Do the following if I die.”

• Externally attached entities (HA-unaware components): generic processes (including legacy components) in the system that are being monitored. These could be arbitrary daemons/service providers whose health is deemed important. This method is useful for the case where Process A says, “Tell me when Process B dies” but Process B needn’t know about this at all.

• Global entity: a place holder for matching any entity. The global entity can be used to associate actions that will be triggered when an interesting event is detected with respect to any entity on the system. The term “global” refers to the set of entities being monitored in the system, and allows a process to say things like, “When any process dies or misses a heartbeat, do the following.” The global entity is never added or removed, but only referred to. Conditions can be added to or removed from the global entity, of course, and actions can be added to or removed from any of the conditions.
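To make the heartbeat idea concrete, the sketch below shows one way a monitor could decide that a self-attached entity has missed heartbeats. It is a simplified model rather than the HAM’s actual algorithm: classify_heartbeat(), the beat_status values, and the two threshold parameters are all hypothetical, and time is represented as plain millisecond counts.

```c
/* Hypothetical result of a heartbeat check. */
typedef enum { BEAT_OK, BEAT_MISSED_LOW, BEAT_MISSED_HIGH } beat_status;

/* Hypothetical check: counts the whole heartbeat intervals that have
 * elapsed since the last heartbeat and compares that count with two
 * thresholds (low_missed <= high_missed is assumed). */
beat_status classify_heartbeat(unsigned long long now_ms,
                               unsigned long long last_beat_ms,
                               unsigned long long interval_ms,
                               unsigned low_missed,
                               unsigned high_missed)
{
    unsigned long long missed;

    if (interval_ms == 0 || now_ms <= last_beat_ms)
        return BEAT_OK;

    missed = (now_ms - last_beat_ms) / interval_ms;
    if (missed >= high_missed)
        return BEAT_MISSED_HIGH;
    if (missed >= low_missed)
        return BEAT_MISSED_LOW;
    return BEAT_OK;
}
```

With a 100 ms interval and thresholds of 2 and 5, an entity that has been silent for 250 ms would be reported at the lower severity, and one silent for a full second at the higher severity.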

Conditions

Conditions are associated with entities; a condition represents the entity’s state.

Condition              Description

CONDDEATH              The entity has died.

CONDABNORMALDEATH      The entity has died an abnormal death. Whenever an
                       entity dies, this condition is triggered by a
                       mechanism that results in the generation of a core
                       dump file.

CONDDETACH             The entity that was being monitored is detaching.
                       This ends the HAM’s monitoring of that entity.

CONDATTACH             An entity for whom a place holder was previously
                       created (i.e. some process has subscribed to events
                       relating to this entity) has joined the system. This
                       is also the start of the HAM’s monitoring of the
                       entity.

CONDBEATMISSEDHIGH     The entity missed sending a “heartbeat” message
                       specified for a condition of “high” severity.

CONDBEATMISSEDLOW      The entity missed sending a “heartbeat” message
                       specified for a condition of “low” severity.

CONDRESTART            The entity was restarted. This condition is true
                       after the entity is successfully restarted.

CONDRAISE              An externally detected condition is reported to the
                       HAM. Subscribers can associate actions with these
                       externally detected conditions.

CONDSTATE              An entity reports a state transition to the HAM.
                       Subscribers can associate actions with specific
                       state transitions.

CONDANY                This condition type matches any condition type. It
                       can be used to associate the same actions with one
                       of many conditions.

For the conditions listed above (except CONDSTATE, CONDRAISE, and CONDANY), the HAM is the publisher: it automatically detects and/or triggers the conditions. For the CONDSTATE and CONDRAISE conditions, external detectors publish the conditions to the HAM.

For all conditions, subscribers can associate lists of actions that will be performed in sequence when the condition is triggered. Both the CONDSTATE and CONDRAISE conditions provide filtering capabilities, so subscribers can selectively associate actions with individual conditions based on the information published.

Any condition can be associated as a wild card with any entity, so a process can associate actions with any condition in a specific entity, or even in any entity. Note that conditions are also associated with symbolic names, which also need to be unique within an entity.
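The wild-card behavior described above can be pictured with a small, self-contained C sketch. It doesn't use the HAM library: the sk_cond enum, the subscription structure, the ANY_ENTITY marker, and subscription_matches() are all hypothetical names invented for illustration, and the HAM's real matching logic may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical condition types, loosely named after the table above. */
typedef enum {
    SK_ABNORMALDEATH, SK_DETACH, SK_ATTACH, SK_RESTART, SK_ANY
} sk_cond;

/* Hypothetical marker meaning "any entity"; not a HAM constant. */
#define ANY_ENTITY "*"

typedef struct {
    const char *entity;   /* entity name, or ANY_ENTITY */
    sk_cond     type;     /* condition type, or SK_ANY */
} subscription;

/* Return true if an event (entity, type) should trigger the subscription. */
bool subscription_matches(const subscription *sub,
                          const char *entity, sk_cond type)
{
    assert(sub != NULL && entity != NULL);

    bool entity_ok = strcmp(sub->entity, ANY_ENTITY) == 0 ||
                     strcmp(sub->entity, entity) == 0;
    bool type_ok   = sub->type == SK_ANY || sub->type == type;

    return entity_ok && type_ok;
}
```

In this sketch, a subscription of { "fs-nfs2", SK_ANY } fires for any condition in that one entity, while { ANY_ENTITY, SK_ANY } fires for every condition in every entity.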

Actions

Actions are associated with conditions. Actions are executed when the appropriate conditions are true with respect to a specific entity. The HAM API includes several functions for different kinds of actions:

Action    Description

ham_action_restart()    This action restarts the entity.

ham_action_execute()    Executes an arbitrary command (e.g. to start a process).


ham_action_notify_pulse()    Notifies some process that this condition has occurred. This notification is sent using a specific pulse with a value specified by the process that wished to receive this notify message.

ham_action_notify_signal()    Notifies some process that this condition has occurred. This notification is sent using a specific realtime signal with a value specified by the process that wished to receive this notify message.

ham_action_notify_pulse_node()    This is the same as ham_action_notify_pulse() above, except that the node name specified for the recipient of the pulse can be the fully qualified node name.

ham_action_notify_signal_node()    This is the same as ham_action_notify_signal() above, except that the node name specified for the recipient of the signal can be the fully qualified node name.

ham_action_waitfor()    Lets you insert delays between consecutive actions in a sequence. You can also wait for certain names to appear in the namespace.


ham_action_heartbeat_healthy()    Resets the heartbeat mechanism for an entity that had previously missed sending heartbeats and had triggered a missed heartbeat condition, but has now recovered.

ham_action_log()    Reports this condition to a logging mechanism.

Actions are also associated with symbolic names, which are unique within a specific condition.

Alternate actions

What happens if an action itself fails? You can specify an alternate list of actions to be performed to recover from that failure. These alternate actions are associated with the primary actions through several ham_action_fail_* functions:

ham_action_fail_execute()

ham_action_fail_notify_pulse()

ham_action_fail_notify_signal()

ham_action_fail_notify_pulse_node()

ham_action_fail_notify_signal_node()

ham_action_fail_waitfor()

ham_action_fail_log()
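The relationship between a primary action and its alternate actions can be sketched in plain C. This is an illustration only and doesn't call the HAM library: the action structure, run_actions(), and the demo_* helpers are hypothetical. The sketch also assumes that the sequence carries on with the next primary action once the alternates have run, which may not match the HAM's actual behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FAIL_ACTIONS 4

/* An action is modeled as a function that reports success or failure. */
typedef bool (*action_fn)(void *arg);

typedef struct {
    const char *name;                       /* symbolic name, unique in a condition */
    action_fn   run;                        /* primary action */
    action_fn   on_fail[MAX_FAIL_ACTIONS];  /* alternate actions, in order */
    size_t      n_fail;                     /* number of alternates in use */
} action;

/*
 * Run each action in order. If a primary action fails, run its alternate
 * actions in sequence. Returns the number of primary actions that succeeded.
 */
size_t run_actions(const action *list, size_t n, void *arg)
{
    assert(list != NULL || n == 0);

    size_t ok = 0;
    for (size_t i = 0; i < n; i++) {
        if (list[i].run(arg)) {
            ok++;
            continue;
        }
        for (size_t j = 0; j < list[i].n_fail; j++) {
            list[i].on_fail[j](arg);
        }
    }
    return ok;
}

/* Demo helpers: one succeeds, one fails, one counts how often it's called. */
bool demo_ok(void *arg)    { (void)arg; return true; }
bool demo_fail(void *arg)  { (void)arg; return false; }
bool demo_count(void *arg) { ++*(int *)arg; return true; }
```

With a list whose first action uses demo_fail and has two demo_count alternates, run_actions() invokes both alternates and then moves on to the second primary action.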

Publishing autonomously detected conditions

Entities or other components in the system can inform the HAM about conditions (events) that they deem interesting, and the HAM in turn can deliver these conditions (events) to other components in the system that have expressed interest in (subscribed to) them.


This publishing feature allows arbitrary components that are capable of detecting error conditions (or potentially erroneous conditions) to report these to the HAM, which in turn can notify other components to start corrective and/or preventive action.

There are currently two different ways of publishing information to the HAM; both of these are designed to be general enough to permit clients to build more complex information exchange mechanisms:

• publishing state transitions

• publishing other conditions.

State transitions

An entity can report its state transitions to the HAM, which maintains every entity’s current state (as reported by the entity). The HAM doesn’t interpret the meaning of the state value itself, nor does it try to validate the state transitions, but it can generate events based on transitions from one state to another.

Components can publish transitions that they want the external world to know about. These states needn’t necessarily represent a specific state the application uses internally for decision making.

To notify the HAM of a state transition, components can use the ham_entity_condition_state() function. Since the HAM is interested only in the next state in the transition, this is the only information that’s transmitted to the HAM. The HAM then triggers a condition state-change event internally, which other components can subscribe to using the ham_condition_state() API call (see below).

Other conditions

In addition to the above, components on the system can also publish autonomously detected conditions by using the ham_entity_condition_raise() API call. The component raising the condition can also specify a type, class, and severity of its choice, to allow subscribers further granularity in filtering out specific conditions to subscribe to. As a result of this call, the HAM triggers a condition-raise event internally, which other components can subscribe to using the ham_condition_raise() API call (see below).

Subscribing to autonomously published conditions

To express their interest in events published by other components, subscribers can use the ham_condition_state() and ham_condition_raise() API calls. These are similar to the ham_condition() API call (e.g. they return a handle to a condition), but they allow the subscriber to customize which of several possible published conditions they’re interested in.

Trigger based on state transition

When an entity publishes a state transition, a state transition condition is raised for that entity, based on the two states involved in the transition (the from state and the to state). Subscribers indicate which states they’re interested in by specifying values for the fromstate and tostate parameters in the API call. For more information, see the API reference documentation for the ham_condition_state() call in the High Availability Toolkit Developer’s Guide.
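The following C sketch models the idea of triggering on a from/to pair while the publisher reports only the next state. It is a simulation under stated assumptions, not the HAM API: the STATE_ANY wild card, the state_filter and entity_state structures, and report_state() are hypothetical names, and the real parameters of ham_condition_state() are described in the Developer’s Guide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wild card meaning "any state"; not a HAM constant. */
#define STATE_ANY (-1)

typedef struct {
    int fromstate;   /* state being left, or STATE_ANY */
    int tostate;     /* state being entered, or STATE_ANY */
} state_filter;

typedef struct {
    int current;     /* last state reported by the entity */
} entity_state;

/*
 * Record a reported transition to `next` and say whether the given filter
 * would be triggered by it. Only the next state is reported; the "from"
 * state is whatever was stored for the entity previously.
 */
bool report_state(entity_state *e, int next, const state_filter *f)
{
    assert(e != NULL && f != NULL);

    int from = e->current;
    e->current = next;

    bool from_ok = f->fromstate == STATE_ANY || f->fromstate == from;
    bool to_ok   = f->tostate   == STATE_ANY || f->tostate   == next;
    return from_ok && to_ok;
}
```

A filter of { 1, 2 } fires only on the transition from state 1 to state 2, whereas { STATE_ANY, 0 } fires whenever the entity enters state 0.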

Trigger based on specific published condition

To express interest in conditions raised by entities, subscribers can use the API call ham_condition_raise(), indicating as parameters to the call what sort of conditions they’re interested in. For more information, refer to the API documentation for the ham_condition_raise() call in the High Availability Toolkit Developer’s Guide.

HAM as a “filesystem”

Effectively, HAM’s internal state is like a hierarchical filesystem, where entities are like directories, conditions associated with those entities are like subdirectories, and actions inside those conditions are like leaf nodes of this tree structure.


HAM also presents this state as a read-only filesystem under /proc/ham. As a result, arbitrary processes can also view the current state (e.g. you can do ls /proc/ham).

The /proc/ham filesystem presents a lot of information about the current state of the system’s entities. It also provides useful statistics on heartbeats, restarts, and deaths, giving you a snapshot in time of the system’s various entities, conditions, and actions.
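Because this view is an ordinary read-only directory tree, a program can walk it with the standard POSIX directory calls, much as ls does. The sketch below is a minimal example of that idea; list_dir() is a hypothetical helper, and what each entry under /proc/ham contains isn't assumed here.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print the entries of `path`, one per line, skipping "." and "..".
 * Returns the number of entries printed, or -1 if the directory
 * can't be opened (for example, if no HAM is running).
 */
int list_dir(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }
        printf("%s/%s\n", path, ent->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_dir("/proc/ham") on a system where the HAM is running would print the top-level entries; on any other system it simply returns -1.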

Multistage recovery

HAM can perform a multistage recovery, executing several actions in a certain order. This technique is useful whenever strict dependencies exist between various actions in a sequence. In most cases, recovery requires more than a single restart mechanism in order to properly restore the system’s state to what it was before a failure.

For example, suppose you’ve started fs-nfs2 (the NFS filesystem) and then mounted a few directories from multiple sources. You can instruct HAM to restart fs-nfs2 upon failure, and also to remount the appropriate directories as required after restarting the NFS process.

As another example, suppose io-net (network I/O manager) were to die. We can tell HAM to restart it and also to load the appropriate network drivers (and maybe a few more services that essentially depend on network services in order to function).

HAM API

The basic mechanism to talk to HAM is to use its API. This API is implemented as a library that you can link against. The library is thread-safe as well as cancellation-safe.

To control exactly what/how you’re monitoring, the HAM API provides a collection of functions, including:


Function    Description

ham_action_control()    Perform control operations on an action object.

ham_action_execute()    Add an execute action to a condition.

ham_action_fail_execute()    Add to an action an execute action that will be executed if the corresponding action fails.

ham_action_fail_log()    Insert a log message into the activity log.

ham_action_fail_notify_pulse()    Add to an action a notify pulse action that will be executed if the corresponding action fails.

ham_action_fail_notify_pulse_node()    Add to an action a node-specific notify pulse action that will be executed if the corresponding action fails.

ham_action_fail_notify_signal()    Add to an action a notify signal action that will be executed if the corresponding action fails.

ham_action_fail_notify_signal_node()    Add to an action a node-specific notify signal action that will be executed if the corresponding action fails.


ham_action_fail_waitfor()    Add to an action a waitfor action that will be executed if the corresponding action fails.

ham_action_handle()    Get a handle to an action in a condition in an entity.

ham_action_handle_node()    Get a handle to an action in a condition in an entity, using a nodename.

ham_action_handle_free()    Free a previously obtained handle to an action in a condition in an entity.

ham_action_heartbeat_healthy()    Reset a heartbeat’s state to healthy.

ham_action_log()    Insert a log message into the activity log.

ham_action_notify_pulse()    Add a notify-pulse action to a condition.

ham_action_notify_pulse_node()    Add a notify-pulse action to a condition, using a nodename.

ham_action_notify_signal()    Add a notify-signal action to a condition.

ham_action_notify_signal_node()    Add a notify-signal action to a condition, using a nodename.

ham_action_remove()    Remove an action from a condition.


ham_action_restart()    Add a restart action to a condition.

ham_action_waitfor()    Add a waitfor action to a condition.

ham_attach()    Attach an entity.

ham_attach_node()    Attach an entity, using a nodename.

ham_attach_self()    Attach an application as a self-attached entity.

ham_condition()    Set up a condition to be triggered when a certain event occurs.

ham_condition_control()    Perform control operations on a condition object.

ham_condition_handle()    Get a handle to a condition in an entity.

ham_condition_handle_node()    Get a handle to a condition in an entity, using a nodename.

ham_condition_handle_free()    Free a previously obtained handle to a condition in an entity.

ham_condition_raise()    Attach a condition associated with a condition raise condition that’s triggered by an entity raising a condition.

ham_condition_remove()    Remove a condition from an entity.


ham_condition_state()    Attach a condition associated with a state transition condition that’s triggered by an entity reporting a state change.

ham_connect()    Connect to a HAM.

ham_connect_nd()    Connect to a remote HAM.

ham_connect_node()    Connect to a remote HAM, using a nodename.

ham_detach()    Detach an entity from a HAM.

ham_detach_name()    Detach an entity from a HAM, using an entity name.

ham_detach_name_node()    Detach an entity from a HAM, using an entity name and a nodename.

ham_detach_self()    Detach a self-attached entity from a HAM.

ham_disconnect()    Disconnect from a HAM.

ham_disconnect_nd()    Disconnect from a remote HAM.

ham_disconnect_node()    Disconnect from a remote HAM, using a nodename.

ham_entity()    Create entity placeholder objects in a HAM.

ham_entity_condition_raise()    Raise a condition.


ham_entity_condition_state()    Notify the HAM of a state transition.

ham_entity_control()    Perform control operations on an entity object in a HAM.

ham_entity_handle()    Get a handle to an entity.

ham_entity_handle_node()    Get a handle to an entity, using a nodename.

ham_entity_handle_free()    Free a previously obtained handle to an entity.

ham_entity_node()    Create entity placeholder objects in a HAM, using a nodename.

ham_heartbeat()    Send a heartbeat to a HAM.

ham_stop()    Stop a HAM.

ham_stop_nd()    Stop a remote HAM.

ham_stop_node()    Stop a remote HAM, using a nodename.

ham_verbose()    Modify the verbosity of a HAM.


Chapter 14

Power Management

In this chapter. . .
Power management for embedded systems
Application-driven framework


Power management for embedded systems

Traditional power management (PM) is aimed at conserving the power of computers that are usually left on. The general-purpose approach to PM for desktop PCs (or even for “mobile” PCs, such as laptops) doesn’t take into account the specific demands of embedded systems, which can be off (or on standby) much of the time, yet must respond to external events in predictable ways.

The two main industry standards for PM, APM and its successor ACPI, deal with PCs, not embedded systems. The older APM standard was BIOS-oriented (how many embedded systems even have a BIOS?). The newer ACPI is more OS-focused and therefore more sophisticated, but the standard expressly concerns itself with desktops, laptops, and servers. An OS can never anticipate the application-specific scenarios of embedded systems and make appropriate power policy decisions.

Application-driven framework

Given the sheer variety of embedded systems (e.g. hand-held medical instruments, network routers, in-car telematics systems, to name but a few), the application developer is in the best position to know the specific power-related needs of the system in question.

The QNX PM framework is precisely that: a framework or set of mechanisms that developers can use to control and manage power in their embedded systems. The OS doesn’t impose a power policy. Instead, you create your own policy based on the specific needs of your application.

Framework components

The PM framework consists of these main components:


Framework component    Description

Power manager    A process that implements the system’s PM policy. This policy is user-written: the system designer creates a custom policy for the embedded system, and has full control over device power modes for the various operational modes of the system.

Power-managed objects    These represent hardware or software components that can be independently power-managed. Device drivers for power-managed devices interact with the power manager to control device power modes according to the system’s PM policy.

Power-aware applications    These are user-written applications that interact with the power manager. They provide functionality specific to the embedded system (e.g. monitoring environmental conditions that could affect the desired operational mode of the system).

Persistent storage services    These allow device drivers to preserve hardware and software configuration details when the system is placed in a standby mode. The developer can customize the physical storage location of this persistent data for the embedded system.

BSP support    Code within the board-specific IPL and startup programs implements the board- and CPU-specific support for sleep/standby modes.

Libraries and BSPs

These components are built using the following:

libpm    Client library that applications and device drivers use to interact with the power manager.

libps    Persistent storage library used to manipulate data managed by the persistent storage services.


libpmm    Server library used to build the power manager. This library implements the underlying protocol for communicating with device drivers and client applications.

BSPs    These provide reference implementations for the CPU power-mode support for reference hardware platforms (e.g. the Renesas Biscayne SH7760). You can customize a reference implementation for your particular embedded system.


[Figure: The QNX PM framework facilitates application-specific designs. Labeled blocks: power manager with server library (libpmm); device drivers, power-monitoring application, and power-sensitive application with client library (libpm); startup power() callout; CPU; peripherals. Grouping labels: user applications; OS and BSP components.]

Power manager

As the central component within the PM framework, the power manager is responsible for coordinating the power modes of each power-managed object according to the system’s PM policy.


The power manager is implemented as a user-written process that uses the libpmm library to manage the interactions with device drivers and power-aware applications.

The power manager consists of three main components:

• A resource manager interface used to represent power-managed objects. Device drivers and power-aware applications use this interface to manipulate the objects controlled by the power manager.

  The functionality for handling this communication is implemented entirely within the libpmm library.

• A policy that governs the behavior of power-managed objects in response to requests made via the resource manager interface.

  The policy is customizable. Your policy is informed whenever an operation is performed on an object. This allows your policy code to track the state of an object, and where appropriate, to allow or deny certain operations. For example, your policy may deny requests to change the power mode of an object if the new mode isn’t compatible with the system’s current power-mode state.

  The libpm library provides basic default behavior for these operations, so the system-specific policy needs to override only those operations that are required to implement the system’s PM policy.

• Code to manage system power states. This user-written code defines the range of operational modes for the system along with their corresponding device and CPU power modes. This code is responsible for system initialization and actions that need to be performed based on system-specific events that are relevant to deciding the system’s optimum power-mode state.

Resource manager interface

The most basic power manager simply initializes the resource manager interface to handle driver and client requests, and then starts a thread pool to service them:


    #include <pthread.h>
    #include <sys/procmgr.h>
    #include <sys/pmm.h>

    int
    main()
    {
        // initialize and start resource manager interface
        pmm_init(0, 0);
        pmm_start(0, 0);

        // become a daemon
        procmgr_daemon(0, 0);

        // let the main thread exit; the thread pool keeps servicing requests
        pthread_detach(pthread_self());
        pthread_exit(0);
        return 0;
    }

This is sufficient to completely handle dynamic attachment by any drivers that start after the power manager; it implements a simple default policy that allows all power-mode changes to proceed as requested.

Power manager policy

The libpmm library handles the low-level details of interacting with device drivers and power-aware applications. However, the decisions about what actions are allowed to proceed are policy decisions that developers customize for their embedded system.

The library invokes this policy code whenever requests arrive that affect a power-managed object’s state. If you don’t supply your own policy function, an internal default is used.

The policy is informed (by invoking policy callback functions) when:

• An object is first created, and when it’s deleted. This allows the policy to allocate its own specific data structures to be associated with the object.

• A client process attaches to or detaches from an object. The policy can thus identify the client process and allow or reject access to the object. Since objects are accessed via a resource manager interface, basic authentication/access control is performed using the standard file access permissions of the object within the power manager namespace. Customizing this callback can allow additional authentication mechanisms.

• An object is assigned a name in the namespace. This allows the policy to determine how this object is related to other objects in the namespace, for example to manage power dependencies. As with the client attachment, basic permission checking is provided via file access permissions.

• An object is removed from the namespace. This allows the policy to remove any interobject dependencies.

• A driver asks to manage an object. The policy can thus discover the supported device modes and can specify what initial mode the device should be placed in. For warm reboots from system standby modes, the policy can also instruct the driver to recover any data it saved in persistent storage.

• A driver detaches. This informs the policy that the object can no longer be power-managed.

• A client (or the driver itself) requests a power-mode change to an object. The policy can validate that the requested mode is compatible with the current system power state and allow or reject the request.

• A property is added to an object, or a property value is changed. This allows the policy to react to property changes that may affect the system’s power-mode state.
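The pattern of a callback table with built-in defaults, where a policy overrides only the operations it cares about, can be sketched as follows. None of these names come from libpmm: pm_object, pm_policy, request_mode(), and the integer power modes (where a larger number is assumed to mean a higher power level) are hypothetical and exist only to illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; the real libpmm structures aren't shown here. */
typedef struct {
    const char *name;   /* e.g. a path in the power manager namespace */
    int         mode;   /* current power mode of the object */
} pm_object;

typedef struct {
    /* Each callback returns true to allow the operation. NULL = default. */
    bool (*attach)(pm_object *obj, int client_pid);
    bool (*set_mode)(pm_object *obj, int new_mode, void *policy_data);
} pm_policy;

/* Default behavior: allow every power-mode change to proceed as requested. */
bool default_set_mode(pm_object *obj, int new_mode, void *policy_data)
{
    (void)obj; (void)new_mode; (void)policy_data;
    return true;
}

/* Dispatch a mode-change request through the policy, then apply it. */
bool request_mode(const pm_policy *policy, pm_object *obj,
                  int new_mode, void *policy_data)
{
    assert(policy != NULL && obj != NULL);

    bool (*cb)(pm_object *, int, void *) =
        policy->set_mode ? policy->set_mode : default_set_mode;

    if (!cb(obj, new_mode, policy_data)) {
        return false;          /* the policy rejected the request */
    }
    obj->mode = new_mode;
    return true;
}

/* Example override: policy_data points to the highest mode now allowed. */
bool capped_set_mode(pm_object *obj, int new_mode, void *policy_data)
{
    (void)obj;
    return new_mode <= *(const int *)policy_data;
}
```

A policy that leaves every slot NULL behaves like the basic power manager shown earlier, while one that sets only set_mode to capped_set_mode rejects requests above the current cap and leaves the object's mode unchanged.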

System power states

An embedded system may pass through a number of different power states. Each state defines the functionality and allowed power modes for the various power-managed objects in the system.

For example, an automotive device may define the following system states:


System state    Description

ACTIVE    A fully operational state where all devices may be powered up. Within this state, individual devices can be powered down when not being actively used, but are allowed to be powered up when required.

CRANK    A nonoperational state where the system is powered down during an engine crank. When the engine has started, the system returns to an active state.

SLEEP    A nonoperational state where devices are powered down and system RAM is held in self-refresh mode. This implements a “soft off” state that allows the system to become fully operational with short latency. Certain devices are configured to act as wakeup sources. For example, an RTC may be used to limit the time spent in this state. If the RTC timeout expires, the system will move to a lower-power standby state to further reduce power consumption.

STANDBY    A nonoperational state where devices are powered down and system RAM is disabled. This provides a lower-power “soft off” state, where the system can return to its previous operational state with a slightly longer latency, since it requires a warm reboot. Certain devices are configured to act as wakeup sources. For example, activating a key lock or phone interface would return the system to an active state, or an RTC can be used to periodically wake the system to ensure the system shuts down completely when the battery level reaches a certain threshold.


OFF    The entire system is powered down to prevent its consuming any battery power.

These system states can typically be modeled using a state machine, where system-specific events are used to cause transitions between states:

• Externally generated events can be implemented using properties (e.g. bus control messages associated with a vehicle bus, or discrete events such as key/switch status).

• Wakeup events are provided by the BSP-level IPL/startup code.

The policy callbacks interact with the state machine implementation to limit the allowed operations for the current system state, and to keep track of the status of each power-managed object so that they can be appropriately managed during state transitions.
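As a rough illustration, the automotive states in the table above could be driven by a transition function like the one below. The state names follow the table, but the event names and the exact set of transitions are assumptions made for this sketch; a real system defines its own events and rules.

```c
#include <assert.h>

typedef enum {
    SYS_ACTIVE, SYS_CRANK, SYS_SLEEP, SYS_STANDBY, SYS_OFF, SYS_NSTATES
} sys_state;

/* Hypothetical system-specific events. */
typedef enum {
    EV_CRANK_START,      /* engine crank begins */
    EV_ENGINE_STARTED,   /* engine crank has finished */
    EV_IGNITION_OFF,     /* assumed trigger for entering SLEEP */
    EV_RTC_TIMEOUT,      /* RTC limit on time spent in SLEEP */
    EV_WAKEUP,           /* wakeup source, e.g. key lock or phone interface */
    EV_BATTERY_LOW,      /* battery level reached the shutdown threshold */
    EV_NEVENTS
} sys_event;

/* Return the state that follows `s` when `ev` occurs; unknown pairs stay put. */
sys_state next_state(sys_state s, sys_event ev)
{
    assert(s < SYS_NSTATES && ev < EV_NEVENTS);

    switch (s) {
    case SYS_ACTIVE:
        if (ev == EV_CRANK_START)    return SYS_CRANK;
        if (ev == EV_IGNITION_OFF)   return SYS_SLEEP;
        break;
    case SYS_CRANK:
        if (ev == EV_ENGINE_STARTED) return SYS_ACTIVE;
        break;
    case SYS_SLEEP:
        if (ev == EV_WAKEUP)         return SYS_ACTIVE;
        if (ev == EV_RTC_TIMEOUT)    return SYS_STANDBY;
        break;
    case SYS_STANDBY:
        if (ev == EV_WAKEUP)         return SYS_ACTIVE;
        if (ev == EV_BATTERY_LOW)    return SYS_OFF;
        break;
    default:
        break;                       /* SYS_OFF: nothing runs to handle events */
    }
    return s;
}
```

Keeping the transitions in one pure function makes it straightforward for policy callbacks to ask which state is current and to decide which operations to allow in it.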

Power-managed objects

Each entity that can be independently power-managed is represented by the power manager as a power-managed object.

The power manager’s system PM policy determines the allowable power mode for each object, based on the system’s current operational state. For example:

• System standby states require all power-managed objects to be placed in an appropriate standby mode.

• Some system states may operate with reduced power, providing a limited subset of the full system functionality. In this case, some devices are allowed to be powered up, while others must be powered down.

Power-managed objects are typically used to represent power-managed peripheral devices, allowing the power manager to control the device’s power level. However, power-managed objects can also represent specialized applications. In this case, the application’s operational characteristics are exposed to the power manager via its power mode; the power manager can control the application’s behavior by changing its mode when the power manager changes the system’s power-mode state.

Power manager namespace

The power manager presents all power-managed objects via a hierarchical namespace under /dev/pmm. This namespace provides a simple configuration database that can represent both physical and logical relationships between objects:

• The hierarchy can follow the hardware topology between buses and devices. Nonleaf nodes represent buses, whose children are devices or bridges to other buses. Leaf nodes represent individual devices.

• Logical relationships can be created where nonleaf nodes can represent system services or subsystems whose children are the power-managed objects managed by that subsystem.

Device drivers for power-managed devices register with the power manager by attaching to a node within the namespace or by creating a new node in the appropriate location. Once attached, the driver is responsible for responding to requests from the power manager to manage the device’s power mode.

The framework allows for both statically and dynamically configuredsystems:

• A statically configured system contains a fixed hardware configuration, where all devices that are present are known to the power manager. In this case, the power manager can “pre-populate” the namespace with entries for each device, then start all device drivers during system initialization.

• A dynamically configured system contains devices that can be dynamically added to the system (e.g. devices on a USB bus or PCMCIA controller). In this case, drivers for these devices are often started after the power manager has initialized the system. When the drivers start up, they create nodes in the namespace to represent the new device.

How and when a driver is started makes no difference to the power manager; as soon as a driver registers with the power manager, its device will be managed according to the system’s PM policy.
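
As a rough illustration of how such a namespace behaves, the following C sketch models the tree in memory. It isn’t the power manager’s implementation or API: the ns_node type and the ns_attach() and ns_count() functions are hypothetical names invented for this example, and the path names in the usage note are made up. The sketch shows only that attaching to a path reuses the nodes that already exist and creates the ones that don’t, which is why the order in which drivers start doesn’t matter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NS_NAME_MAX 32   /* arbitrary limit chosen for this sketch */

/* Hypothetical in-memory model of one namespace node. */
typedef struct ns_node {
    char name[NS_NAME_MAX];
    struct ns_node *child;    /* first child */
    struct ns_node *sibling;  /* next node with the same parent */
    int attached;             /* nonzero once a driver has attached */
} ns_node;

/* Allocate a zeroed node named by the first len bytes of name. */
static ns_node *ns_new(const char *name, size_t len)
{
    ns_node *n = calloc(1, sizeof *n);
    assert(n != NULL && len < NS_NAME_MAX);
    memcpy(n->name, name, len);
    return n;
}

/* Find one path component among parent's children; create it if missing. */
static ns_node *ns_child(ns_node *parent, const char *name, size_t len)
{
    for (ns_node *c = parent->child; c != NULL; c = c->sibling)
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    ns_node *n = ns_new(name, len);
    n->sibling = parent->child;
    parent->child = n;
    return n;
}

/* Attach to the node named by a '/'-separated path below root,
 * creating any intermediate nodes that don't exist yet. */
ns_node *ns_attach(ns_node *root, const char *path)
{
    ns_node *cur = root;
    while (*path != '\0') {
        size_t len = strcspn(path, "/");
        if (len > 0)
            cur = ns_child(cur, path, len);
        path += len;
        if (*path == '/')
            path++;
    }
    cur->attached = 1;
    return cur;
}

/* Count the nodes in the subtree rooted at n, including n itself. */
size_t ns_count(const ns_node *n)
{
    size_t total = 1;
    for (const ns_node *c = n->child; c != NULL; c = c->sibling)
        total += ns_count(c);
    return total;
}
```

For example, attaching "pci/usb0/hub" and then "pci/usb0/camera" below a root node yields five nodes in total; the shared "pci" and "usb0" nodes are created once and stay unattached until a driver claims them.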

Power modes

The framework defines four generic power modes that are universally applied to all power-managed objects. These primarily define the operational state of the object, and by implication, its power level:

Power mode   Description

ACTIVE       A fully operational state where hardware is powered up. This is the normal operating mode.

IDLE         A partially operational state, where hardware can be partially powered. From a user’s point of view, the object is fully operational; if necessary, it will become ACTIVE to service user requests that can’t be handled within the IDLE power mode. This mode is typically used only for dynamic power optimizations by a driver itself. For example, if it determines that its device isn’t in use, it may enter an IDLE mode to conserve power. When the device is next used, it will move back to ACTIVE.

STANDBY      A nonoperational state, where hardware is powered down. This mode is used by the power manager when the system as a whole is being powered down to a low-power standby state. Entering a STANDBY state typically requires a driver to save the context needed to reinitialize a device to an ACTIVE state (e.g. because the device hardware context or the system standby mode disables system RAM, requiring the driver to preserve the software state).

OFF          A nonoperational state, where hardware is powered off. Entering the OFF state doesn’t require the driver to save any context.

Note that all power-managed objects support the four operating modes, even if there’s no specific IDLE or STANDBY mode.

If there’s no IDLE mode, the object implements the IDLE mode using the same operational characteristics and hardware power level as its ACTIVE mode instead.

If there’s no specific STANDBY, the object implements the STANDBY mode using the same physical power level as its OFF mode. In this case, the only difference is that entry into STANDBY may require the object to save context needed to reinitialize to an ACTIVE state.

These generic power modes provide a uniform way for the power manager (or power-aware applications) to control and reason about the power levels and functionality of a power-managed object, without requiring any device-specific knowledge.
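
The fallback rules above can be sketched in a few lines of C. This is only a model for illustration: the power_mode enumerators, the device_caps structure, and both functions are hypothetical names, not the framework’s actual constants or API.

```c
#include <assert.h>
#include <stdbool.h>

/* The four generic modes named in the text. The enumerator names and
 * values are illustrative only. */
typedef enum { MODE_ACTIVE, MODE_IDLE, MODE_STANDBY, MODE_OFF } power_mode;

/* Hypothetical description of what a particular device can do. */
typedef struct {
    bool has_idle;     /* device has a distinct partially powered state */
    bool has_standby;  /* device has a distinct standby power level */
} device_caps;

/* Physical power level actually applied for a requested generic mode. */
power_mode effective_level(device_caps caps, power_mode requested)
{
    assert(requested >= MODE_ACTIVE && requested <= MODE_OFF);
    if (requested == MODE_IDLE && !caps.has_idle)
        return MODE_ACTIVE;   /* IDLE behaves exactly like ACTIVE */
    if (requested == MODE_STANDBY && !caps.has_standby)
        return MODE_OFF;      /* STANDBY uses the OFF power level */
    return requested;
}

/* Whether entering the requested mode may require saving context first.
 * This holds for STANDBY even when it shares the OFF power level. */
bool may_need_context_save(power_mode requested)
{
    return requested == MODE_STANDBY;
}
```

A caller can therefore request any of the four modes on any object; a device with neither an IDLE nor a STANDBY level simply maps them to ACTIVE and OFF, while still being given the chance to save context on entry to STANDBY.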

The framework allows devices to implement an arbitrary number of additional device-specific modes that are sublevels of these four generic modes. For example, a driver may define:

• active modes that differ in their performance and power consumption

• idle modes that differ in their power consumption and the latency to return to an active mode

• standby modes that provide system wakeup functionality.

These device-specific modes allow finer-grained control over a device’s power mode and operational characteristics where appropriate.

Properties

Each power-managed object may have arbitrary properties associated with it. Each property consists of an (identifier, value) pair:

identifier   An integer value used to define the property type.

value        A data structure whose size and format are defined by the identifier.

The power manager’s policy code is told whenever a property value is changed. This allows the policy code to respond to changes that are relevant to evaluating the optimum system power state.

The framework allows user-defined properties to be associated with any object, so you can define arbitrary PM-related properties or attributes that can be manipulated by the power manager or power-aware applications. For example, you can have properties defined that describe battery power levels, vehicle bus status, or other environmental conditions relevant to the embedded system.
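
To make the (identifier, value) idea concrete, here is a minimal C sketch of an object that stores properties and tells a policy callback about every update. All of the names (pm_object, object_set_property(), PROP_BATTERY_PERCENT, record_battery()) are hypothetical and exist only in this example; the real interface is the one provided by the power manager and its client library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROP_MAX      8    /* arbitrary limits chosen for this sketch */
#define PROP_DATA_MAX 16

typedef struct {
    int    id;                       /* identifier: defines the property type */
    size_t size;                     /* size of the value for this identifier */
    unsigned char data[PROP_DATA_MAX];
} property;

/* Callback standing in for the power manager's policy code. */
typedef void (*policy_fn)(int id, const void *value, size_t size, void *arg);

typedef struct {
    property  props[PROP_MAX];
    size_t    count;
    policy_fn on_change;
    void     *arg;
} pm_object;

/* Store a value under id, then notify the policy.
 * Returns 0 on success, -1 if the value or the table is too large. */
int object_set_property(pm_object *obj, int id, const void *value, size_t size)
{
    assert(obj != NULL && value != NULL);
    if (size > PROP_DATA_MAX)
        return -1;
    property *p = NULL;
    for (size_t i = 0; i < obj->count; i++)
        if (obj->props[i].id == id)
            p = &obj->props[i];
    if (p == NULL) {
        if (obj->count == PROP_MAX)
            return -1;
        p = &obj->props[obj->count++];
        p->id = id;
    }
    p->size = size;
    memcpy(p->data, value, size);
    if (obj->on_change != NULL)
        obj->on_change(id, p->data, p->size, obj->arg);
    return 0;
}

/* Hypothetical user-defined property: battery charge in percent. */
enum { PROP_BATTERY_PERCENT = 1 };

/* Example policy hook: remembers the most recent battery level in *arg. */
void record_battery(int id, const void *value, size_t size, void *arg)
{
    if (id == PROP_BATTERY_PERCENT && size == sizeof(int))
        memcpy(arg, value, sizeof(int));
}
```

A monitoring application would call something like object_set_property() each time it reads a new battery level; the policy hook then sees the value without knowing where it came from.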

Power-aware applications

Power-aware applications communicate with the power manager using an API implemented by the libpm library. The API allows the application to manipulate power-managed objects by:

• querying an object’s power-mode status

• modifying an object’s power-mode status (if allowed by the policy)

• receiving notification (via a SIGEVENT) when the power-mode status changes

• querying or modifying properties associated with an object.

Power-aware applications fall into two main categories:

• system-monitoring applications: these monitor aspects of the system that may be relevant to the PM policy (e.g. control messages on a vehicle bus, discrete events such as battery-level indications, ignition key insertion/removal, transducer readings, etc.).

  This information can be represented using power manager properties. The monitoring application can update the property value to inform the power manager’s PM policy. The policy can then use this information to evaluate the appropriate system power state.

• power-sensitive applications: these register with the power manager for notification when the state of a power-managed object changes. The application can then change its behavior as required (e.g. by enabling or disabling features based on available devices).

  Power-sensitive applications can also request to change the power mode of an object, which can be used by the power manager’s PM policy to intelligently decide which devices are eligible to be powered down at any given time.

Persistent storage services

Most CPUs support a low-power mode, where the CPU and peripherals are powered down with DRAM maintained in self-refresh mode. But for embedded systems that spend prolonged periods in a standby state, this is unacceptable, because the power used for self-refresh over those periods will exceed the reserve capacity provided by limited battery sources.

For these systems, the typical standby state will disable DRAM, so that only the limited hardware necessary to wake the system remains powered. When the system is woken up, a warm reboot is performed; the power manager then needs to reinitialize the system according to the power-mode state it decides the system should be in. This may require drivers or power-aware applications to save volatile configuration information in persistent storage so that this information can be recovered when they are restarted.

Services for persistent storage are provided by a persistent storage manager that controls one or more persistent stores. These persistent stores can make use of a number of different media:

Media                    Features

Shared memory            Can preserve data across application restarts. Note that this data isn’t preserved across system reboots.

Raw flash partitions     Can be used when system RAM isn’t preserved. This data can then be retrieved when the system is restarted.

Flash filesystem         As with raw flash partitions, this data is preserved across system reboots.

Specialized memory (a)   Can be maintained for limited periods of time even if system RAM is disabled. Although these do require some power to maintain, the latency to save and restore persistent data is much lower than flash. This may be useful where the system must be powered down very quickly and will be powered back up within a reasonably short period of time (e.g. during an engine crank).

Custom storage           Can be implemented using an internal storage interface in the persistent storage manager.

(a) For example, SRAM or a limited set of SDRAM banks that remain in self-refresh mode.

The persistent storage manager can manage multiple storage media, with one of those selected as the currently “active store.” The underlying storage is hidden from clients, who simply request to save or restore their data.

Your power manager controls the choice of media used. Given your custom set of available stores for your embedded system, your power manager policy sets the “active store” based on the nature of the standby state it is entering before powering down the system.
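
The division of labor described here (the policy picks the active store; clients just save and restore) can be sketched as follows. This is a simplified model under stated assumptions, not the persistent storage manager’s real interface: the store and ps_manager types and the ps_*() functions are hypothetical, and a small RAM buffer stands in for each medium.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_CAPACITY 64   /* arbitrary sizes chosen for this sketch */
#define STORE_MAX      4

/* One back end per medium (shared memory, flash, SRAM, ...). */
typedef struct store {
    const char *name;
    unsigned char buf[STORE_CAPACITY];  /* stand-in for the real medium */
    size_t used;
} store;

typedef struct {
    store *stores[STORE_MAX];
    size_t count;
    size_t active;                      /* index of the "active store" */
} ps_manager;

/* Policy side: choose the active store by name.
 * Returns 0 on success, -1 if no store has that name. */
int ps_set_active(ps_manager *m, const char *name)
{
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->stores[i]->name, name) == 0) {
            m->active = i;
            return 0;
        }
    }
    return -1;
}

/* Client side: save data without knowing which medium is in use. */
int ps_save(ps_manager *m, const void *data, size_t size)
{
    assert(m->active < m->count);
    store *s = m->stores[m->active];
    if (size > STORE_CAPACITY)
        return -1;
    memcpy(s->buf, data, size);
    s->used = size;
    return 0;
}

/* Client side: restore previously saved data of the same size. */
int ps_restore(ps_manager *m, void *data, size_t size)
{
    assert(m->active < m->count);
    const store *s = m->stores[m->active];
    if (size != s->used)
        return -1;
    memcpy(data, s->buf, size);
    return 0;
}
```

In this model a policy entering a long standby might call ps_set_active(&m, "flash"), while one expecting a brief power dip might select a lower-latency store; the clients’ calls to ps_save() and ps_restore() are identical in both cases.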

CPU power management

Many CPUs offer a variety of available operating modes that provide different levels of power consumption:

• Frequency or voltage scaling can reduce the power consumption of the CPU while maintaining normal instruction execution.

  The system as a whole is fully operational; these different modes simply provide a tradeoff between performance and power consumption.

• The CPU can be halted, with RAM and peripherals remaining powered. The CPU will be powered up when an interrupt occurs; the latency to return to normal operation is very short, typically only a few CPU cycles.

  The system as a whole is fully operational; this mode simply reduces the power consumption due to CPU activity. However, the device will still draw power for peripherals and RAM refresh.

  This mode is generally used by the kernel’s idle loop to reduce power consumption when there are no runnable threads, without impacting the system’s realtime performance in responding to peripheral interrupts.

• The CPU can be placed in a low-power sleep or suspend mode where RAM is placed into self-refresh mode. The CPU will be powered up by some configured wakeup source (e.g. an RTC, peripheral interrupt, or GPIO line).

  For some system-on-chip processors, this mode may also power down on-chip peripherals, further reducing the overall power consumption.

  The latency to return to normal operation is relatively long because the memory controller and on-chip peripherals may need to be reinitialized. However, the system state remains preserved in RAM, so this mode can significantly reduce system power consumption while still providing a relatively low-latency return to a fully active system.

• The CPU can be placed in a sleep or suspend mode where RAM is powered down, along with all on-chip peripherals. The CPU will be powered up by an appropriately configured wakeup source (e.g. an RTC or GPIO line).

  Returning to normal operation is essentially a complete system reboot, because all system state information preserved in RAM is lost. However, this provides the lowest power consumption of all, requiring power for only those components required to wake up the CPU.

The power manager’s PM policy determines which CPU modes are appropriate for the various system states required for your device. Remember that this policy is user-written: as the embedded system designer, you are in the best position to decide on the optimum balance between power consumption and performance or latency to return to an operational mode.
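
One way a policy might weigh these tradeoffs is sketched below in C. The four modes follow the list above, but everything else is an assumption made for the example: the cpu_mode names, the choose_cpu_mode() function, and especially the wakeup latencies, which are invented placeholders rather than figures for any real processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    CPU_SCALED,             /* frequency/voltage scaling, still executing */
    CPU_HALTED,             /* halted; RAM and peripherals stay powered */
    CPU_SLEEP_RAM_REFRESH,  /* sleep; RAM held in self-refresh */
    CPU_SLEEP_RAM_OFF       /* sleep; RAM powered down, wakeup is a reboot */
} cpu_mode;

typedef struct {
    cpu_mode      mode;
    unsigned long wake_latency_us;  /* made-up figures, for illustration only */
    bool          ram_preserved;
} cpu_mode_info;

/* Ordered from highest to lowest power consumption. */
static const cpu_mode_info cpu_modes[] = {
    { CPU_SCALED,            0UL,       true  },
    { CPU_HALTED,            1UL,       true  },
    { CPU_SLEEP_RAM_REFRESH, 5000UL,    true  },
    { CPU_SLEEP_RAM_OFF,     2000000UL, false },
};

/* Pick the lowest-power mode whose wakeup latency is acceptable and that
 * keeps RAM contents if the caller needs them. */
cpu_mode choose_cpu_mode(unsigned long max_latency_us, bool need_ram)
{
    cpu_mode best = CPU_SCALED;
    for (size_t i = 0; i < sizeof cpu_modes / sizeof cpu_modes[0]; i++) {
        const cpu_mode_info *m = &cpu_modes[i];
        if (m->wake_latency_us <= max_latency_us &&
            (m->ram_preserved || !need_ram))
            best = m->mode;   /* later table entries draw less power */
    }
    return best;
}
```

A real policy would replace the table with values measured on the target board and would usually consider more than two constraints, but the shape of the decision is the same.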

These CPU modes are highly processor-specific. Also, the nature of the wakeup sources depends on the system hardware implementation. For this reason, the implementation is provided by the power() callout within the BSP-specific startup program, and may also require support from the BSP-specific IPL (if the CPU wakeup causes a warm reboot via the reset vector).

Reference implementations provide the basic CPU-level support required for the modes supported by a particular CPU. You may then customize this as required to handle board-specific wakeup configuration, if necessary.

Chapter 15

Adaptive Partitioning

In this chapter. . .
What are partitions?
Why adaptive?
Benefits of adaptive partitioning
Adaptive partitioning scheduler

QNX Neutrino supports adaptive partitioning to let you engineer how competing processes and threads share system resources.

In order to use adaptive partitioning, you must install the Adaptive Partitioning Technology Development Kit (TDK). For more information, see the Adaptive Partitioning User’s Guide.

What are partitions?

In many computer systems, it’s important to protect different applications or groups of applications from others. You don’t want one application, whether defective or malicious, to corrupt another or prevent it from running.

To address this issue, some systems use virtual walls, called partitions, around a set of applications to ensure that each partition is given an engineered set of resources. The primary resource considered is CPU time, but partitions can also cover any shared resource, such as memory and file space (disk or flash).

[Figure: a partition scheduler dividing CPU time among Partition 1, Partition 2, and Partition 3. The shares shown are 50%, 20%, and 30%.]

Static partitions guarantee that processes get the resources specified by the system designer.

Partitions provide:

• memory protection: each partition is discrete and controlled by the Memory Management Unit (MMU)

• overload protection: each partition is guaranteed a slice of execution time, as specified by the system designer

By using multiple partitions, you can avoid having a single point of failure. For example, a runaway process can’t tie up the entire system; processes in other partitions still get their allocated share of system resources.

Neutrino’s process model already provides a lot more protection than some other operating systems do, including:

• full memory protection between processes

• message-passing to provide uniform and controlled IPC

• priority inheritance with a clean client-server model

• hard realtime deterministic scheduling

• a detailed permission model for devices, files, and memory

• memory, file-descriptor, CPU, and priority limits, using the POSIX setrlimit() function to constrain runaway processes
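
As a small example of the last item, the following C function uses the standard POSIX getrlimit() and setrlimit() calls to lower a process’s soft limit on open file descriptors. The function name limit_open_files() and the choice of RLIMIT_NOFILE are just for illustration; which limits make sense, and which resources your target supports, depend on your system.

```c
#include <assert.h>
#include <sys/resource.h>

/* Lower the soft limit on open file descriptors to at most `wanted`.
 * The hard limit is left alone, and the soft limit is never raised.
 * Returns the soft limit now in effect, or -1 on error. */
long limit_open_files(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || wanted < rl.rlim_cur)
        rl.rlim_cur = wanted;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return (long)rl.rlim_cur;
}
```

A process that calls limit_open_files(64) early in main() can’t accumulate more than 64 descriptors afterwards: once the limit is reached, calls such as open() fail instead of consuming more of the system’s resources.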

The initial version of the Adaptive Partitioning TDK implements the budgeting of CPU time, via the adaptive partitioning scheduler. Future releases will address budgeting memory and other resources.

Why adaptive?

Rigid partitions work best in fairly static systems with little or no dynamic deployment of software. In dynamic systems, static partitions can be inefficient. For example, the static division of execution time between partitions can waste CPU time and introduce delays:

• If most of the partitions are idle, and one is very busy, the busy partition isn’t given any extra execution time, while background threads in the other partitions waste CPU time.

• If an interrupt is scheduled for a partition, it has to wait until the partition runs. This can cause unacceptable latency, especially if bursts of interrupts occur.

To provide realtime performance with guarantees against overloading, QNX Neutrino introduces adaptive partitioning.

An adaptive partition is a set of threads that work on a common or related goal or activity. Like a static partition, an adaptive partition has a budget allocated to it that guarantees its minimum share of the CPU’s resources. Unlike a static partition, an adaptive partition:

• isn’t locked to a fixed set of code in a static partition; you can dynamically add and configure partitions

• behaves as a global hard realtime scheduler under normal load, but can still provide minimal interrupt latencies even under overload conditions

• maximizes the usage of the CPU’s resources, by distributing a partition’s unused budget among partitions that require extra resources when the system isn’t loaded
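
The arithmetic behind the last point can be illustrated with a short C sketch. It is not the adaptive partitioning scheduler’s algorithm: this section doesn’t say how spare time is divided, so the sketch simply assumes that unused budget is shared among the partitions that want more in proportion to their budgets. The function name allocate_cpu() and the demand figures in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EPS 1e-9

/*
 * budget[i]: guaranteed minimum share of partition i, in percent (sum <= 100)
 * demand[i]: share partition i would use if unconstrained, in percent
 * alloc[i]:  resulting share, in percent
 */
void allocate_cpu(const double budget[], const double demand[],
                  double alloc[], size_t n)
{
    double spare = 100.0;
    double total_budget = 0.0;

    /* Step 1: every partition gets what it asks for, up to its budget. */
    for (size_t i = 0; i < n; i++) {
        alloc[i] = demand[i] < budget[i] ? demand[i] : budget[i];
        spare -= alloc[i];
        total_budget += budget[i];
    }
    assert(total_budget <= 100.0 + EPS);

    /* Step 2: share out unused time among partitions that still want more,
     * weighted by budget (an assumption of this sketch). Repeat because a
     * partition may need less than its weighted share. */
    while (spare > EPS) {
        double hungry_budget = 0.0;
        for (size_t i = 0; i < n; i++)
            if (demand[i] - alloc[i] > EPS)
                hungry_budget += budget[i];
        if (hungry_budget <= 0.0)
            break;                      /* nobody can use the spare time */

        double given = 0.0;
        for (size_t i = 0; i < n; i++) {
            double want = demand[i] - alloc[i];
            if (want > EPS) {
                double share = spare * budget[i] / hungry_budget;
                double take = share < want ? share : want;
                alloc[i] += take;
                given += take;
            }
        }
        spare -= given;
    }
}
```

With budgets of 50%, 20%, and 30%, suppose the first partition needs only 10%, the second could use the whole CPU, and the third needs exactly its 30%. A static scheme would hold the second partition to 20% and leave 40% idle; under the assumption above, the second partition receives that 40% as well. If all three partitions are fully loaded, each gets exactly its budget.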

You can introduce adaptive partitioning without changing, or even recompiling, your application code, although you do have to rebuild your system’s OS image.

Benefits of adaptive partitioning

Adaptive partitioning provides a number of benefits to the design, development, running, and debugging of your system, as described in the sections that follow.

Engineering your product’s performance

Adaptive partitioning ensures that any free time available in the system (i.e. time in a partition’s budget that the partition doesn’t need) is made available to other partitions. This lets the system handle bursty processing demands that occur in normal system operation. With a cyclic scheduler, there’s a “use it or lose it” approach where unused CPU time is spent running an idler thread in partitions that don’t use their full budget.

Another important feature of adaptive partitioning is the concept of partition inheritance. This lets designers develop server processes that run with no (or minimal) budget. When the server performs requests from clients, the client partition is billed for the time. Without this feature, CPU budget would be allocated to a server regardless of how much or often it’s used. The benefits of these features include:

• You don’t have to overengineer the system, so the overall cost decreases.

• If you add an application, you don’t have to reengineer the budget of common services, such as filesystems or servers.

• The system is faster and more responsive to the user.

• The system guarantees time for important tasks.

• You can use priorities to specify a process’s urgency, and a partition’s CPU budget to specify its importance.
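
Partition inheritance, described above, comes down to a simple accounting rule, sketched here in C. The partition structure and the bill_server_time() function are hypothetical names used only to show who pays for a server’s CPU time; they aren’t part of the scheduler’s interface.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char   *name;
    unsigned      budget_percent;  /* guaranteed minimum share of CPU time */
    unsigned long billed_us;       /* CPU time charged so far */
} partition;

/*
 * Charge `us` microseconds of server execution. While the server is working
 * on a client's request, the client's partition pays; when it runs on its
 * own behalf (client == NULL), the server's own partition does.
 */
void bill_server_time(partition *server, partition *client, unsigned long us)
{
    assert(server != NULL);
    partition *payer = (client != NULL) ? client : server;
    payer->billed_us += us;
}
```

For instance, a filesystem server placed in a partition with a zero budget can spend 300 microseconds on a request from an application in a 30% partition; the application’s partition is billed for that time, so the filesystem doesn’t need a budget sized for its busiest client.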

Dealing with design complexity

Designing large-scale distributed systems is inherently complex. Typical systems have a large number of subsystems, processes, and threads that are developed in isolation from each other. The design is divided among groups with differing system performance goals, different schemes for determining priorities, and different approaches to runtime optimization.

This can be further compounded by product development in different geographic locations and time zones. Once all of these disparate subsystems are integrated into a common runtime environment, all parts of the system need to provide adequate response under all operating scenarios such as:

• normal system loading

• peak periods

• failure conditions

Given the parallel development paths, system issues invariably arise when integrating the product. Typically, once a system is running, unforeseen interactions that cause serious performance degradations are uncovered. When situations such as this arise, there are usually very few designers or architects who can diagnose and solve these problems at a system level. Solutions often take a lot of tweaking (frequently by trial and error) to get right. This extends system integration, impacting the time to market.

Problems of this nature can take a week or more to troubleshoot and several weeks to adjust priorities across the system, retest and refine. Product scalability is limited if these problems can’t be solved effectively.

This is largely due to the fact that there’s no effective way to “budget” CPU use across these groups. Thread priorities provide a way to ensure that critical tasks run, but don’t provide guaranteed CPU time for important, noncritical tasks, which can be starved in normal operations. In addition, a common approach to establishing thread priorities is difficult to scale across a large development team.

Adaptive partitioning lets architects define high-level CPU budgets per subsystem, allowing development groups to implement their own priority schemes and optimizations within a given budget. This approach lets design groups develop subsystems independently and eases the integration effort. The net effect is to improve time-to-market and facilitate product scaling.

Providing security

Many systems are vulnerable to Denial of Service (DOS) attacks. For example, a malicious user could bombard a system with requests that need to be processed by one process. When under attack, this process overloads the CPU and effectively starves the rest of the system.

[Figure: CPU usage under a DOS attack. Labels: Operating System, DOS attack! Values shown: 99%, 1%, and 0% for the remaining functions.]

Without adaptive partitioning, a DOS attack on one process can starve other critical functions.

Some systems try to overcome this problem by implementing a monitor process that detects CPU utilization and invokes corrective actions when it deems that a process is using too much CPU. This approach has a number of drawbacks, including:

• Response time is typically slow.

• This approach caps the CPU usage in times when legitimate processing is required.

• It isn’t foolproof; it depends on appropriate thread priorities to ensure the monitor process gets enough CPU time.

Adaptive partitioning can solve this problem by providing separate budgets to the system’s various functions. This ensures that the system always has some CPU capacity for important tasks. Threads can change their own priorities, which can be a security hole, but you can configure the adaptive partitioning scheduler to prevent code running in a partition from changing its own budget.

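[Figure: CPU usage under a DOS attack with separate partition budgets. Labels: Operating System, DOS attack! Values shown: 60%, 20%, 5%, and 15%.]

With adaptive partitioning, a DOS attack is contained.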

Since adaptive partitioning can allocate any unused CPU time to partitions that need it, it doesn’t unnecessarily cap control-plane activity when there’s a legitimate need for increased processing.

Debugging

Adaptive partitioning can even make debugging an embedded system easier (during development or deployment) by providing an “emergency door” into the system.

Simply create a partition that you can run diagnostic tools in; if you don’t need to use the partition, the scheduler allocates its budget among the other partitions. This gives you access to the system without compromising its performance. For more information, see the Testing and Debugging chapter of the Adaptive Partitioning User’s Guide.

Adaptive partitioning scheduler

The adaptive partitioning scheduler is an optional thread scheduler that lets you guarantee minimum percentages of the CPU’s throughput to groups of threads, processes, or applications. The percentage of the CPU time allotted to a partition is called a budget.

The adaptive partitioning scheduler has been designed on top of the core QNX Neutrino architecture primarily to solve these problems in embedded systems design:

• guaranteeing a specified minimum share of CPU time when the system is overloaded

• preventing unimportant or untrusted applications from monopolizing the system

For more information, see the Adaptive Partitioning User’s Guide.

Chapter 16

The Photon microGUI

In this chapter. . .
A graphical microkernel
The Photon event space
Graphics drivers
Font support
Unicode multilingual support
Animation support
Multimedia support
Printing support
The Photon Window Manager
Widget library
Driver development kits
Summary

A graphical microkernel

This chapter provides an overview of the Photon microGUI, the graphical environment for QNX Neutrino. For more information, see the Photon documentation set.

Many embedded systems require a UI so that users can access and control the embedded application. For complex applications, or for maximum ease of use, a graphical windowing system is a natural choice. However, the windowing systems on desktop PCs simply require too much in the way of system resources to be practical in an embedded system where memory and cost are limiting factors.

Drawing upon the successful approach of the QNX Neutrino microkernel architecture to achieve a POSIX OS environment for embedded systems, we have followed a similar course in creating the Photon microGUI windowing system.

To implement an effective microkernel OS, we first had to tune the microkernel so that the kernel calls for IPC were as lean and efficient as possible (since the performance of the whole OS rests on this message-based IPC). Using this low-overhead IPC, we were able to structure a GUI as a graphical “microkernel” process with a team of cooperating processes around it, communicating via that IPC.

While at first glance this might seem similar to building a graphical system around the classic client/server paradigm already used by the X Window System, the Photon architecture differentiates itself by restricting the functionality implemented within the graphical microkernel (or server) itself and distributing the bulk of the GUI functionality among the cooperating processes.

The Photon microkernel runs as a tiny process, implementing only a few fundamental primitives that external, optional processes use to construct the higher-level functionality of a windowing system. Ironically, for the Photon microkernel itself, “windows” do not exist. Nor can the Photon microkernel “draw” anything, or manage a pen, mouse, or keyboard.

To manage a GUI environment, Photon creates a 3-dimensional event space and confines itself only to handling regions and processing the clipping and steering of various events as they flow through the regions in this event space.

This abstraction is roughly parallel to the concept of a microkernel OS being incapable of filesystem or device I/O, but relying instead on external processes to provide these high-level services. Just as this allows a microkernel OS to scale up or down in size and functionality, so too can a microkernel GUI.

The core microkernel “abstraction” implemented by the Photon microkernel is that of a graphical event space that other processes can populate with regions. Using native IPC to communicate with the Photon microkernel, these other processes manipulate their regions to provide higher-level graphical services or to act as user applications. By removing service-providing processes, Photon can be scaled down for limited-resource systems; by adding service-providing processes, Photon can be scaled up to full desktop functionality.

The Photon event space

The “event space” managed by the Photon microkernel can be visualized as an empty void with a “root region” at the back. The end-user can be imagined to be “looking into” this event space from the front. Applications place “regions” into the 3-dimensional space between the root region and the end-user; they use those regions to generate and accept various types of “events” within this space.

Processes that provide device driver services place regions at the front of the event space. In addition to managing the event space and root region, the Photon microkernel projects draw events.

[Figure: Photon regions. Labels: Root, Workspace (PWM), Backdrop (PWM), Window (PWM), Application, Focus (PWM), Device, Input Group, Pointer/Keyboard, Graphics.]

We can think of these events that travel through this space as “photons” (from which this windowing system gets its name). Events themselves consist of a list of rectangles with some attached data. As these events flow through the event space, their rectangle lists intersect the “regions” placed there by various processes (applications).

Events traveling away from the root region of the event space are said to be traveling outwards (or towards the user), while events from the user are said to be traveling inwards towards the root region at the back of the event space.

The interaction between events and regions is the basis for the input and output facilities in Photon. Pen, mouse, and keyboard events travel away from the user towards the root plane. Draw events originate from regions and travel towards the device plane and the user.

Regions

Regions are managed in a hierarchy associated as a family of rectangles that define their location in the 3-dimensional event space. A region also has attributes that define how it interacts with various classes of events as they intersect the region. The interactions a region can have with events are defined by two bitmasks:

• sensitivity bitmask

• opaque bitmask.

The sensitivity bitmask uses specific event types to define which intersections the process owning the region wishes to be informed of. A bit in the sensitivity bitmask defines whether or not the region is sensitive to each event type. When an event intersects a region for which the bit is set, a copy of that event is enqueued to the application process that owns the region, notifying the application of events traveling through the region. This notification doesn’t modify the event in any way.

The opaque bitmask defines which events can or can’t pass through the region. For each event type, a bit in the opaque mask defines whether the region is opaque or transparent to that event. The optical property of “opaqueness” is implemented by modifying the event itself as it passes through the intersection.

These two bitmasks can be combined to accomplish a variety of effects in the event space. The four possible combinations are:

Bitmask combination: Description:

Not sensitive, transparent
    The event passes through the region, unmodified, without the region owner being notified. The process owning the region simply isn’t interested in the event.

Not sensitive, opaque
    The event is clipped by the region as it passes through; the region owner isn’t notified. For example, most applications would use this attribute combination for draw event clipping, so that an application’s window wouldn’t be overwritten by draw events coming from underlying windows.

Sensitive, transparent
    A copy of the event is sent to the region owner; the event then continues, unmodified, through the event space. A process wishing to log the flow of events through the event space could use this combination.

Sensitive, opaque
    A copy of the event is sent to the region owner; the event is also clipped by the region as it passes through. By setting this bitmask combination, an application can act as an event filter or translator. For every event received, the application can process and regenerate it, arbitrarily transformed in some manner, possibly traveling in a new direction, and perhaps sourced from a new coordinate in the event space. For example, a region could absorb pen events, perform handwriting recognition on those events, and then generate the equivalent keystroke events.
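The four combinations can be sketched in a few lines of C. This is only an illustration of how two per-region bitmasks could decide notification and clipping; the names used here (EV_KEY, EV_PTR, EV_DRAW, region_t, outcome_t, intersect) are hypothetical and are not taken from the Photon API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event-type bits (not the real Photon constants). */
enum {
    EV_KEY  = 1 << 0,   /* keystrokes          */
    EV_PTR  = 1 << 1,   /* pen/mouse actions   */
    EV_DRAW = 1 << 2    /* draw events         */
};

/* Hypothetical region holding only the two bitmasks discussed above. */
typedef struct {
    uint32_t sense;     /* event types the owner wants copies of  */
    uint32_t opaque;    /* event types the region clips (blocks)  */
} region_t;

typedef struct {
    bool notify_owner;  /* a copy of the event goes to the owner    */
    bool clipped;       /* the region is clipped out of the event   */
} outcome_t;

/* Decide what happens when an event of one type intersects a region. */
outcome_t intersect(const region_t *r, uint32_t ev_type)
{
    assert(r != NULL);
    outcome_t o;
    o.notify_owner = (r->sense  & ev_type) != 0;
    o.clipped      = (r->opaque & ev_type) != 0;
    return o;
}
```

Under these assumptions, an application window would set its draw bit only in `opaque` (not sensitive, opaque), while an event logger would set bits only in `sense` (sensitive, transparent).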

Events

Like regions, events also come in various classes and have various attributes. An event is defined by:

• an originating region

• a type

• a direction

• an attached list of rectangles

• some event-specific data (optional).

Unlike most windowing systems, Photon classifies both input (pen, mouse, keyboard, etc.) and output (drawing requests) as events. Events can be generated either from the regions that processes have placed in the event space or by the Photon microkernel itself. Event types are defined for:

• keystrokes

• pen and mouse button actions

• pen and mouse motion

• region boundary crossings

• expose and covered events

• draw events

• drag events

• drag-and-drop events.

Application processes can either poll for these events, block and wait for them to occur, or be asynchronously notified of a pending event.

The rectangle list attached to the event can describe one or more rectangular regions, or it can be a “point-source” — a single rectangle where the upper-left corner is the same as the lower-right corner.

When an event intersects a region that is opaque to it, that region’s rectangle is “clipped out” of the event’s list of rectangles such that the list describes only the portion of the event that would ultimately be visible.

The best way to illustrate how this clipping is performed is to examine the changes in the rectangle list of a draw event as it passes through various intersecting regions. When the draw event is first generated, the rectangle list consists of only a single, simple rectangle describing the region that the event originated from.

If the event goes through a region that clips the upper-left corner out of the draw event, the rectangle list is modified to contain only the two rectangles that would define the area remaining to be drawn. These resulting rectangles are called “tiles.”

Likewise, every time the draw event intersects a region opaque to draw events, the rectangle list will be modified to represent what will remain visible of the draw event after the opaque region has been “clipped out.” Ultimately, when the draw event arrives at a graphics driver ready to be drawn, the rectangle list will precisely define only the portion of the draw event that is to be visible.

[Figure: Opaque regions are clipped out. Labels: Region A, Region B, Region C, Region D; Tile 1, Tile 2, Tile 3, Tile 4; “Event received by Region D”.]

If the event is entirely clipped by the intersection of a region, the draw event will cease to exist. This mechanism of “opaque” windows modifying the rectangle list of a draw event is how draw events from an underlying region (and its attached process) are properly clipped for display as they travel towards the user.
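The clipping of one rectangle by one opaque region can be sketched as follows. This is a minimal illustration, not Photon's actual tiling algorithm: the names rect_t and clip_out are hypothetical, corners are assumed to be inclusive (so a point-source has equal corners), and the order in which the remaining area is split into tiles is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle with inclusive corners: x1 <= x2, y1 <= y2. */
typedef struct { int x1, y1, x2, y2; } rect_t;

/*
 * Clip the opaque rectangle `op` out of `src`.
 * Writes the surviving tiles (at most four) into `tiles` and returns
 * their count; a return value of 0 means the rectangle was entirely
 * clipped.
 */
int clip_out(rect_t src, rect_t op, rect_t tiles[4])
{
    assert(tiles != NULL);
    int n = 0;

    /* No overlap: the source rectangle survives unchanged. */
    if (op.x2 < src.x1 || op.x1 > src.x2 ||
        op.y2 < src.y1 || op.y1 > src.y2) {
        tiles[n++] = src;
        return n;
    }
    /* Full-width band above the opaque rectangle. */
    if (op.y1 > src.y1) {
        tiles[n++] = (rect_t){ src.x1, src.y1, src.x2, op.y1 - 1 };
        src.y1 = op.y1;
    }
    /* Full-width band below the opaque rectangle. */
    if (op.y2 < src.y2) {
        tiles[n++] = (rect_t){ src.x1, op.y2 + 1, src.x2, src.y2 };
        src.y2 = op.y2;
    }
    /* Strip to the left, limited to the rows the opaque rect covers. */
    if (op.x1 > src.x1)
        tiles[n++] = (rect_t){ src.x1, src.y1, op.x1 - 1, src.y2 };
    /* Strip to the right, same rows. */
    if (op.x2 < src.x2)
        tiles[n++] = (rect_t){ op.x2 + 1, src.y1, src.x2, src.y2 };
    return n;
}
```

With these assumptions, clipping the upper-left corner out of a rectangle leaves two tiles, and an opaque rectangle that covers the whole source leaves none, which corresponds to the draw event ceasing to exist. Applying the function to every rectangle in an event's list, once per opaque region, would yield the final clip list.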

Graphics drivers

Graphics drivers are implemented as processes that place a region at the front of the event space. Rather than inject pen, mouse, or keyboard events (as would an input driver), a graphics driver’s region is sensitive to draw events coming out of the event space. As draw events intersect the region, those events are received by the graphics driver process. In effect, the region can be imagined to be coated in “phosphor,” which is illuminated by the impact of “photons.”

Since the Photon drawing API accumulates draw requests into batches emitted as single draw events, each draw event received by the driver contains a list of individual graphical primitives to be rendered. By the time the draw event intersects the graphics driver region, its rectangle list will also contain a “clip list” describing exactly which portions of the draw list are to be rendered to the display. The driver’s job is to transform this clipped draw list into a visual representation on whatever graphics hardware the driver is controlling.

One advantage of delivering a “clip list” within the event passed to the driver is that each draw request then represents a significant “batch” of work. As graphics hardware advances, more and more of this “batch” of work can be pushed directly into the graphics hardware. Many display controller chips already handle a single clip rectangle; some handle multiple clip rectangles.

Although using the OS IPC services to pass draw requests from the application to the graphics driver may appear to be an unacceptable overhead, our performance measurements demonstrate that this implementation performs as well as having the application make direct calls into a graphics driver. One reason for such performance is that multiple draw calls are batched with the event mechanism, allowing the already minimal overhead of our lightweight IPC to be amortized over many draw calls.
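The amortization argument can be illustrated with a small batching buffer. This is a sketch under stated assumptions, not the Photon drawing library: batch_t, batch_add, batch_flush, the 256-byte buffer size, and the count_messages callback (which stands in for sending one IPC message) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_BYTES 256  /* hypothetical batch buffer size */

/* Stand-in for "send one message to the driver". */
typedef void (*send_fn)(const unsigned char *buf, size_t len, void *ctx);

typedef struct {
    unsigned char buf[BATCH_BYTES];
    size_t        used;   /* bytes of encoded draw commands queued */
    send_fn       send;
    void         *ctx;
} batch_t;

void batch_init(batch_t *b, send_fn send, void *ctx)
{
    assert(b != NULL && send != NULL);
    b->used = 0;
    b->send = send;
    b->ctx  = ctx;
}

/* Emit everything queued so far as a single message. */
void batch_flush(batch_t *b)
{
    if (b->used > 0) {
        b->send(b->buf, b->used, b->ctx);
        b->used = 0;
    }
}

/* Queue one encoded draw command; flush first if it would not fit. */
int batch_add(batch_t *b, const void *cmd, size_t len)
{
    if (len > BATCH_BYTES)
        return -1;                 /* command can never fit */
    if (b->used + len > BATCH_BYTES)
        batch_flush(b);
    memcpy(b->buf + b->used, cmd, len);
    b->used += len;
    return 0;
}

/* Example sender that only counts how many messages were emitted. */
void count_messages(const unsigned char *buf, size_t len, void *ctx)
{
    (void)buf; (void)len;
    ++*(int *)ctx;
}
```

In this sketch, sixteen 16-byte commands cost one message instead of sixteen, so the fixed per-message cost is spread across all of them.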

Multiple graphics drivers

Since graphics drivers simply put a region into the Photon event space, and since that region describes a rectangle to be intersected by draw events, it naturally follows that multiple graphics drivers can be started for multiple graphics controller cards, all with their draw-event-sensitive regions present in the same event space.

These multiple regions could be placed adjacent to each other, describing an array of “drawable” tiles, or overlapped in various ways. Since Photon inherits the OS’s network transparency, Photon applications or drivers can run on any network node, allowing additional graphics drivers to extend the graphical space of Photon to the physical displays of many networked computers. By having the graphics driver regions overlap, the draw events can be replicated onto multiple display screens.

Many interesting applications become possible with this capability. For example, a factory operator with a wireless-LAN handheld computer could walk up to a workstation and drag a window from a plant control screen onto the handheld, and then walk out onto the plant floor and interact with the control system.

In other environments, an embedded system without a UI could project a display onto any network-connected computer. This connectivity also enables useful collaborative modes of work for people using computers — a group of people could simultaneously see the same application screen and cooperatively operate the application.

From the application’s perspective, this looks like a single unified graphical space. From the user’s perspective, this looks like a seamlessly connected set of computers, where windows can be dragged from physical screen to physical screen across network links.

Color model

Colors processed by the graphics drivers are defined by a 24-bit RGB quantity (8 bits for each of red, green, and blue), providing a total range of 16,777,216 colors. Depending on the actual display hardware, the driver will either invoke the 24-bit color directly from the underlying hardware or map it into the color space supported by the less-capable hardware.

Since the graphics drivers use a hardware-independent color representation, applications can be displayed without modifications on hardware that has varied color models. This allows applications to be “dragged” from screen to screen, without concern for what the underlying display hardware’s color model might be.
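As an illustration of mapping the 24-bit representation onto less-capable hardware, the sketch below packs three 8-bit channels into one value and reduces it to a 16-bit 5-6-5 format by dropping low-order bits. The choice of 5-6-5 as the target, the truncation method, and the function names rgb24 and rgb24_to_565 are assumptions made for the example; the text doesn't say how any particular driver performs its mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit red, green, and blue into a 24-bit 0xRRGGBB value. */
uint32_t rgb24(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/*
 * Map a 24-bit color to a 16-bit 5-6-5 value by keeping the
 * most significant bits of each channel (simple truncation).
 */
uint16_t rgb24_to_565(uint32_t c)
{
    assert(c <= 0xFFFFFFu);            /* only 24 bits are meaningful */
    uint32_t r = (c >> 16) & 0xFF;
    uint32_t g = (c >> 8)  & 0xFF;
    uint32_t b =  c        & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Because the application only ever supplies the 24-bit value, the same drawing code works whether the driver uses the color directly or converts it in this way.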

Font support

Photon uses Bitstream’s Font Fusion object-oriented font engine, which provides developers with full font fidelity and high-quality typographic output at any resolution on any device, while maintaining the integrity of the original character shapes.

Photon is shipped with a limited number of TrueType fonts. These industry-standard fonts are readily available from various sources.

Stroke-based fonts

To support Asian languages (e.g. Chinese, Japanese, and Korean), Photon relies on Bitstream’s stroke-based fonts. These high-speed fonts are ideal for memory-constrained environments. For example, a complete traditional Chinese font with over 13,000 characters can occupy as much as 8M in a conventional desktop system — a stroke-based version of the same font occupies less than 0.5M!

Apart from their compact size and fast rasterization, these fonts are also fully scalable, which makes them perfect for various nondesktop displays such as LCDs, TVs, PDAs, and so on.

Unicode multilingual support

Photon is designed to handle international characters. Following the Unicode Standard (ISO/IEC 10646), Photon provides developers with the ability to create applications that can easily support the world’s major languages and scripts.

Scripts that read from right to left, such as Arabic, aren’t supported at this time.

Unicode is modeled on the ASCII character set, but uses a 16-bit (or 32-bit) encoding to support full multilingual text. There’s no need for escape sequences or control codes when specifying any character in any language. Note that Unicode encoding conveniently treats all characters — whether alphabetic, ideographs, or symbols — in exactly the same way.

UTF-8 encoding

Formerly known as UTF-2, the UTF-8 (for “8-bit form”) transformation format is designed to address the use of Unicode character data in 8-bit UNIX environments.

Here are some of the main features of UTF-8:

• Unicode characters from U+0000 to U+007F (ASCII set) map to UTF-8 bytes 00 to 7F (ASCII values).

• ASCII values don’t otherwise occur in a UTF-8 transformation, giving complete compatibility with historical filesystems that parse for ASCII bytes.

• The first byte indicates the number of bytes to follow in a multibyte sequence, allowing for efficient forward parsing.

• Finding the start of a character from an arbitrary location in a byte stream is efficient, because you need to search at most four bytes backwards to find an easily recognizable initial byte. For example:

    isInitialByte = ((byte & 0xC0) != 0x80);

• UTF-8 is reasonably compact in terms of the number of bytes used for encoding.

• UTF-8 strings are terminated with a single NULL byte, like traditional ASCII C strings.
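The forward- and backward-parsing properties listed above can be sketched in C. The helper names (is_initial_byte, utf8_char_start, utf8_seq_len) are hypothetical, and the length function assumes sequences of at most four bytes; it performs no validation of continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* True for any byte that is not a continuation byte (10xxxxxx). */
int is_initial_byte(unsigned char byte)
{
    return (byte & 0xC0) != 0x80;
}

/* Step backwards from `pos` to the first byte of the character
 * that contains it. */
size_t utf8_char_start(const unsigned char *s, size_t pos)
{
    assert(s != NULL);
    while (pos > 0 && !is_initial_byte(s[pos]))
        pos--;
    return pos;
}

/* Length of a sequence, read from its first byte alone.
 * Returns 0 if the byte cannot start a sequence. */
int utf8_seq_len(unsigned char first)
{
    if (first < 0x80)           return 1;  /* 0xxxxxxx: ASCII        */
    if ((first & 0xE0) == 0xC0) return 2;  /* 110xxxxx               */
    if ((first & 0xF0) == 0xE0) return 3;  /* 1110xxxx               */
    if ((first & 0xF8) == 0xF0) return 4;  /* 11110xxx               */
    return 0;                              /* continuation or invalid */
}
```

A forward scan advances by `utf8_seq_len()` bytes per character, while a backward scan from an arbitrary offset uses `utf8_char_start()` to resynchronize.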

Animation support

Photon provides flicker-free animations by employing off-screen video memory where possible. For instance, a special container widget (PtOSContainer) creates a dedicated off-screen memory context for drawing images. The PtOSContainer widget uses a block of video memory large enough to hold an image the size of its canvas. (For more information about widgets, see the section “Widget library” in this chapter.)

Photon’s graphics drivers also maximize the use of off-screen memory to enhance the perceptual performance of animated images. The graphics drivers support other advanced techniques, such as direct graphics mode, alpha-blending, chroma-key substitution, and more.

Video overlay

Besides the ability to superimpose a semi-transparent image on top of a background (alpha-blending) or to place a color-masked foreground image on top of a separate background (chroma-key), Photon also supports video overlay — a full-motion video image is rendered within a window on the display.

Layers

Some display controllers let you transparently overlay multiple “screens” on a single display. Each overlay is called a layer.

You can use layers to combine independent display elements. Since the graphics hardware performs the overlaying, this can be more efficient than rendering all of the display elements onto a single layer. For example, a fast navigational display can have a scrolling navigational map on a background layer and a web browser or other popup GUI element on a foreground layer.

The images on all the active layers of a display are combined, using alpha-blending, chroma-keying, or both, to produce the final image on the display.
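The per-pixel arithmetic behind these two combining methods can be sketched in software. This is an illustration only: in the systems described here the display hardware does the combining, and the function names, the 0xRRGGBB pixel layout, and the rounding used below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: alpha 0 shows only the background,
 * alpha 255 shows only the foreground. Rounds to nearest. */
uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t alpha)
{
    return (uint8_t)((fg * alpha + bg * (255 - alpha) + 127) / 255);
}

/* Alpha-blend two 0xRRGGBB pixels with a single alpha value. */
uint32_t alpha_blend(uint32_t fg, uint32_t bg, uint8_t alpha)
{
    assert(fg <= 0xFFFFFFu && bg <= 0xFFFFFFu);
    uint32_t r = blend_channel((fg >> 16) & 0xFF, (bg >> 16) & 0xFF, alpha);
    uint32_t g = blend_channel((fg >> 8)  & 0xFF, (bg >> 8)  & 0xFF, alpha);
    uint32_t b = blend_channel( fg        & 0xFF,  bg        & 0xFF, alpha);
    return (r << 16) | (g << 8) | b;
}

/* Chroma-key: wherever the foreground pixel equals the key color,
 * the background pixel shows through instead. */
uint32_t chroma_key(uint32_t fg, uint32_t bg, uint32_t key)
{
    return (fg == key) ? bg : fg;
}
```

Combining both methods for a pixel would mean applying `chroma_key()` first and blending only the pixels that were not keyed out.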

Multimedia support

Many developers of Photon applications need to target resource-constrained embedded systems. While conventional multimedia solutions rely on powerful CPUs and graphics hardware, Photon’s multimedia support is ideal for embedded systems equipped with “low-end” processors (e.g. Cyrix MediaGX 200) and limited system RAM (e.g. 16M).

Plugin architecture

Photon provides a suite of multimedia plugins that are as fast and light as possible, yet able to deliver the functionality of high-end systems. Using Photon’s extensible plugin architecture and media API, developers can easily integrate their own multimedia components.

To enable embedded systems to play audio/video files without depending on disk-based access, Photon supports streaming technologies (MPEG-1 System Stream and MPEG-2 Program Stream).

Media Player

This Photon application forms the core of our multimedia support. Its interface is simple — the user selects an audio or video file to play, and the Media Player “plays” it. But behind the scenes, the Media Player:

• identifies and loads/unloads the appropriate plugins

• initializes the plugins

• translates user input into plugin commands

• handles plugin-generated events (callbacks)

• identifies minor file types (registry bypass)

• supports playlists

• provides the GUI for QNX plugins

• provides OEM panes for custom plugins.

Media Player plugins

Currently, the multimedia plugins include:

• Audio CD Player — a multitrack plugin that plays audio CDs

• MPEG Audio Player — decodes MPEG-1 audio layers 1, 2, and 3 (MP3); also supports MPEG-2 low-bit rate streams and MPEG 2.5 audio

• MPEG Audio/Video Player — decodes MPEG-1 System and MPEG-2 Program streams

• MPEG Video Player — decodes MPEG-1 video streams

• Soundfile Player — plays simple audio files; supports Wave, AIFF, and other formats.

QNX Software Systems provides licenses for some, but not all, of the above technologies. For details, contact your sales representative.

Printing support

Photon provides built-in printing support for a variety of outputs, including:

• bitmap files

• PostScript

• Hewlett-Packard PCL

• Epson ESC/P2

• Epson IJS

• Canon

• Lexmark

Photon also comes with a print-selection widget/convenience dialog to make printing simpler within developers’ own applications.

The Photon Window Manager

Adding a window manager to Photon creates a fully functional desktop-class GUI. The window manager is entirely optional and can be omitted for most classes of embedded systems. If present, the window manager allows the user to manipulate application windows by resizing, moving, and iconifying them.

The window manager is built on the concept of filtering events with additional regions placed behind application regions, upon which a title bar, resize handles, and other elements are drawn and interacted with. Since the replaceable window manager implements the actual “look and feel” of the environment, various UI flavors can be optionally installed.

Widget library

Photon provides a library of components known as widgets — objects that can manage much of their on-screen behavior automatically, without explicit programming effort. As a result, a complete application can be quickly assembled by combining widgets in various ways and then attaching C code to the appropriate callbacks the widgets provide.

The Photon Application Builder (PhAB), which is included as part of the Photon development system, provides an extensive widget palette in its visual development environment.

Photon provides a wide range of widgets, which can be classified as follows:

• fundamental widgets (e.g. a button)

• container widgets (e.g. a window widget)

• advanced widgets (e.g. an HTML display widget).

Fundamental widgets

Label widget (PtLabel)

The label widget can display bitmaps, images, or textual information. The PtLabel widget is the superclass for all text-based widgets, providing many customizable attributes (e.g. font typeface, pop-up balloons, colors, borders, alignment, margins, etc.), all of which are inherited by all its subclasses.

Push-button widget (PtButton)

Push buttons are a necessary component in every windowing system. They have a raised look that changes to depressed when pushed, giving a visual cue to let the user know the button has been selected. In addition to this visual behavior, push buttons automatically invoke an application-defined callback when they’re selected.
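The callback pattern described for push buttons can be sketched in plain C. This is not the Photon widget API: button_t, button_set_callback, button_press, button_release, and count_clicks are hypothetical names chosen to show how a widget could keep its visual state itself and call application code only when it is activated.

```c
#include <assert.h>
#include <stddef.h>

typedef struct button button_t;

/* Signature of an application-defined activation callback. */
typedef void (*activate_cb)(button_t *btn, void *data);

struct button {
    const char *label;
    int         pressed;      /* 1 while drawn in the depressed state */
    activate_cb on_activate;  /* application code to run on selection */
    void       *cb_data;      /* passed back to the callback          */
};

/* Attach application code to the button. */
void button_set_callback(button_t *b, activate_cb cb, void *data)
{
    assert(b != NULL);
    b->on_activate = cb;
    b->cb_data     = data;
}

/* The widget manages its own look: raised -> depressed. */
void button_press(button_t *b)
{
    b->pressed = 1;
}

/* Releasing a pressed button restores the raised look and
 * invokes the callback, if one is attached. */
void button_release(button_t *b)
{
    if (!b->pressed)
        return;
    b->pressed = 0;
    if (b->on_activate != NULL)
        b->on_activate(b, b->cb_data);
}

/* Example application callback: counts activations. */
void count_clicks(button_t *btn, void *data)
{
    (void)btn;
    ++*(int *)data;
}
```

The application supplies only `count_clicks()`; the press and release handling stays inside the widget, which is the division of labor the widget library aims for.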

Text input widgets (PtText, PtMultiText)

Photon provides two text-input widgets:

• a simple single-line input widget (PtText) commonly used in forms

• a powerful wordprocessor-like multi-line widget (PtMultiText) providing full editing capabilities, word wrapping, automatic scrolling, and multi-font line segments.

Toggle-button widgets (PtToggleButton)

Toggle buttons are objects that display two states — on or off. Photon provides various styles of toggle buttons, each with a different visual appearance. Toggle buttons are used to display or request state information related to a command or action about to be performed.

Graphical widgets (PtArc, PtPixel, PtRect, PtLine, PtPolygon, PtEllipse, PtBezier, PtGrid)

Photon has no shortage of graphical widgets. There’s a widget to accomplish everything from simple lines and rectangles to complex multi-segmented Bézier curves. Graphical widgets provide attributes for color, fills, patterns, line thickness, joins, and much more.

Scrollbar widget (PtScrollbar)

A scrollbar widget is used to scroll the display of a viewable area. The scrollbar is combined with other widgets (e.g. PtList, PtScrollContainer) to allow scrolling.

Separator widget (PtSeparator)

The separator widget is used to separate two or more different areas, such as the menu items as shown in this example. The separator can be customized for many different styles and looks.

Slider widget (PtSlider)

Sliders are different from scrollbars. A scrollbar defines a range, whereas a slider defines a single value.

Image widgets (PtLabel, PtButton)

Photon supports every major graphics file standard, so you can import graphics and display them inside widgets. Many Photon widgets directly support displaying graphics — the most common are PtButton for making push-button toolbars and PtLabel for displaying images.

Progress-bar widget (PtProgress)

If an application needs to do something that takes a fair amount of time (e.g. loading a file), it can use the progress bar widget to let the user know what’s happening and, more importantly, how much longer the process is going to take. The progress bar has many attributes for customization — it can be horizontal or vertical, it can display specific or indeterminate values (both are shown here), etc.

Numeric widgets (PtNumericInteger, PtNumericFloat)

The PtNumericInteger class lets the user specify integer values between given minimum and maximum values. The PtNumericFloat class lets the user enter floating-point values.

Container widgets

A container widget is a powerful and convenient interface tool that holds other widgets and manages their layout. Containers are used extensively in most Photon applications.

Window widget (PtWindow)

Windows are the main application containers. The main UI components (menu bars, toolbars, etc.) appear with the window widget. The widget automatically handles all the necessary interactions with the Photon Window Manager (PWM) — all you need to specify is what should and shouldn’t be rendered or managed.

Group widget (PtGroup)

The group widget is a very powerful widget that manages the geometry of all its child widgets. It can align the widgets horizontally, vertically, or in a matrix. The widget also provides attributes that let you specify whether the children should be stretched to fit the group if it’s resized larger due to anchoring.

Panel group widget (PtPanelGroup)

The panel group widget is a container for panels, a useful element for organizing data in dialogs. Besides managing the geometry and layout of panels, PtPanelGroup also provides two selection modes to switch between panels: multiple-tab selection (each panel has its own tab to select) and single-tab selection (clicking the tab pops up a menu to select other panels).

Viewport widget (PtScrollContainer)

The PtScrollContainer widget is a very powerful widget that provides a viewport into a potentially larger container. You can place any number of widgets inside a scroll container and it will automatically display a scrollbar if the widgets extend beyond the viewable area. PtScrollContainer widgets could be used to implement a text file viewer, wordprocessor, customized list display, and so on.

To scroll child widgets quickly, the scrolling area widget uses a hardware blitter (provided the underlying graphics driver supports it).

Background widget (PtBkgd)

The background widget provides a way to create fancy background displays, from simple color gradations to tiled textures. This widget can handle just about any background requirement.

Advanced widgets

Armed with Photon’s rich set of widgets, developers can build practically any graphical application imaginable. Here are some of the more powerful widgets at your disposal:

Menu-related widgets (PtMenu, PtMenuBar, PtMenuButton)

Photon has a widget for every menu-related requirement. There’s a widget to simplify the creation of a standard menu bar. The menu widget handles the pop-up display, press-drag-release, point and click, keyboard traversal, and selection of menu items. The menu button widget is used for creating individual menu items.

Toolbar widgets (PtToolbar, PtToolbarGroup)

This container holds buttons, labels, images, whatever widgets you wish, and aligns them either vertically or horizontally in a toolbar. The toolbar group widget lets you combine a toolbar with a menu bar to create a very flexible access element for your applications.

List widget (PtList)

The list widget is a very powerful widget that manages a list of items. It provides many different selection modes, including single selection, multiple selection and range selection. The list widget also supports multi-columned lists through the use of a divider widget (PtDivider).

Pulldown list widget (PtComboBox)

The pulldown list widget combines the PtText widget (for text input) with a pulldown button for displaying a list widget. When the user selects from the list, the text widget is automatically updated with the current selection. The pulldown list widget is very useful for displaying a list of items using a small space. Dialogs and containers use a lot less screen real-estate, which is important in embedded environments.

Tree widget (PtTree)

The tree widget is similar to the list widget — in fact they both have the same ancestors. The main difference is that the tree widget displays the items in a hierarchical manner. Items, called branches, can be expanded or collapsed; any number of tree branches can be created. Each branch can define its own unique image to display. Trees are useful because they display information in a very logical manner.

Photon applications that use the tree widget include the File Manager (directory display), PhAB (widget layout), Helpviewer, and many others.

Terminal widgets (PtTty, PtTerminal)

A terminal widget creates and manages an entire text-mode terminal inside a widget. Just drop it into your application and you’ve created your very own pterm (Photon’s terminal application).

The terminal widget doesn’t stop there — it also provides complete cut-and-paste functionality and quick-launch help by highlighting any text within the widget.

Divider widget (PtDivider)

This powerful widget manages its children in a unique and useful way. When you place two or more widgets inside a divider widget, it automatically puts little separators in between the child widgets. Using these separators, the user can drag back and forth, causing the child widgets on either side of the separator to be resized. This is very useful for creating resizable column headings for lists. In fact, if you drop a divider widget into a list widget, it will automatically turn your simple list into a resizable multi-column list.

Dividers aren’t limited to just labels or buttons. Any widgets can be placed inside to create side-by-side resizable trees, scroll areas, and so on.

Trend graph widgets (PtTrend and PtMTrend)

Realtime systems often require trend graphs. Photon comes with a trend bar widget, PtTrend, that supports the display of multiple trend lines simultaneously. If your graphics hardware supports masked blits, it can even smooth-scroll the trend across grid lines. The PtMTrend widget has additional features, such as a trace line, which make it appropriate for medical applications.

Color-selection widgets (PtColorSel, PtColorPanel, PtColorPatch, PtColorSelGroup, PtColorWell)

Photon provides several handy controls for building color-selection dialogs. This convenient set of widgets includes PtColorPanel, a compound widget that provides several ways to easily select a color.

Web client widget (PtWebClient)

The PtWebClient widget is used to start, interact with, and control a web browser. The widget also provides a user-defined area within the application for the server to format and display web pages.

The application controls the server by setting widget resources. The server communicates status information and user interaction back to the application using the widget callbacks. The PtWebClient widget transparently supports the version of HTML that the server supports.

Convenience functions

Once a widget has been created, you can take advantage of Photon’s convenience functions to easily set up dialogs and control the widget.

Here are some examples of common dialogs created using the following convenience functions in Photon’s widget toolkit:

• PtFileSelection() — create a file-selector dialog

• PtFontSelection() — display a modal dialog for selecting a font

• PtPrintSelection() — display a modal dialog for selecting print options

• PtAlert() — display a message and request a response from the user

• PtNotice() — display a message and wait for acknowledgment by the user

• PtPrompt() — display a message and get textual input from the user.

For more information about these functions, see the Photon Library Reference.

File-selection dialog (PtFileSelection())

The PtFileSelection() function incorporates a tree widget that displays files, directories, links to files or directories, and custom entries. Besides selecting a particular file in response to an application prompt, users can also navigate an entire filesystem and choose their own file and directory.

Font-selection dialog (PtFontSelection())

To accommodate the wide selection of fonts available, Photon provides a handy font-selector. The dialog displays a list of all available fonts and allows the user to choose the typeface and style (bold, italic, etc.) and to indicate whether the font should be anti-aliased.

Print-selection dialog (PtPrintSelection())

The print selector lets a user select a printer or control its properties. The user may also select a range of pages as well as the number of copies to print.

Alert dialog (PtAlert())

This modal dialog is useful for informing the user of some significant event and waiting for a response (e.g. clicking a button).

Notice dialog (PtNotice())

This dialog can display a message that may or may not require a response. This type of dialog often contains an “OK” button for the user to acknowledge the notice.

Prompt dialog (PtPrompt())

Like the alert dialog, the prompt dialog displays a message that requires the user to respond, but it also provides a field for inputting text within the dialog.

Driver development kits

As a product geared towards developers, Photon offers all the tools needed to build high-performance, accelerated graphics drivers that can be readily tailored for particular graphics cards and chipsets.

Developers will be able to create drivers that support advanced graphics techniques (e.g. alpha-blending or chroma-key substitution) through a software-only approach, a perfect fallback for “simple” hardware that doesn’t directly handle such techniques.

The Photon graphics driver development kit provides full source code for several sample drivers as well as detailed instructions for handling the hardware-dependent issues involved in developing custom drivers.

Summary

Photon represents a new approach to GUI building — using a microkernel and a team of cooperating processes, rather than the monolithic approach typified by other windowing systems. As a result, Photon exhibits a unique set of capabilities:

• Low memory requirements enable Photon to deliver a high level of windowing functionality to environments where only a graphics library might have been allowed within the memory constraints.

• Photon provides a very flexible, user-extensible architecture that allows developers to extend the GUI in directions unique to their applications.

• With flexible cross-platform connectivity, Photon applications can be used from virtually any connected desktop environment.

Glossary

A20 gate

On x86-based systems, a hardware component that forces the A20 address line on the bus to zero, regardless of the actual setting of the A20 address line on the processor. This component is in place to support legacy systems, but the QNX Neutrino OS doesn’t require any such hardware. Note that some processors, such as the 386EX, have the A20 gate hardware built right into the processor itself — our IPL will disable the A20 gate as soon as possible after startup.

adaptive

Scheduling algorithm whereby a thread’s priority is decayed by 1. See also FIFO, round robin, and sporadic.

adaptive partitioning

A method of dividing, in a flexible manner, CPU time, memory, file resources, or kernel resources with some policy of minimum guaranteed usage.

asymmetric multiprocessing (AMP)

A multiprocessing system where a separate OS, or a separate instantiation of the same OS, runs on each CPU.

atomic

Of or relating to atoms. :-)

In operating systems, this refers to the requirement that an operation, or sequence of operations, be considered indivisible. For example, a thread may need to move a file position to a given location and read data. These operations must be performed in an atomic manner; otherwise, another thread could preempt the original thread and move the file position to a different location, thus causing the original thread to read data from the second thread’s position.
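As a minimal sketch of the file-position example above: POSIX pread() takes the offset as an argument and performs the positioning and the read in one call, so no other thread sharing the descriptor can move the file position in between. The helper name read_at() is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes starting at offset.
 * pread() takes the offset directly and leaves the descriptor's
 * shared file position untouched, so the "position, then read"
 * pair cannot be split by another thread. */
ssize_t read_at(int fd, off_t offset, void *buf, size_t len)
{
    assert(buf != NULL);
    return pread(fd, buf, len, offset);
}
```

A caller that previously issued lseek() followed by read() would call read_at(fd, offset, buf, len) instead and check the returned byte count in the usual way.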

attributes structure

Structure containing information used on a per-resource basis (as opposed to the OCB, which is used on a per-open basis).

This structure is also known as a handle. The structure definition is fixed (iofunc_attr_t), but may be extended. See also mount structure.

bank-switched

A term indicating that a certain memory component (usually the device holding an image) isn’t entirely addressable by the processor. In this case, a hardware component manifests a small portion (or “window”) of the device onto the processor’s address bus. Special commands have to be issued to the hardware to move the window to different locations in the device. See also linearly mapped.
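The windowing scheme can be sketched as simple address arithmetic. The following is a simulation only: the 4 KB window size, the bank_dev structure, and the function names are assumptions made for illustration and don't correspond to any particular hardware or QNX API. A real driver would replace the body of select_bank() with whatever command its hardware requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SIZE 0x1000u   /* assumed window size: 4 KB */

/* Simulated bank-switched device: only one window is visible at a time. */
struct bank_dev {
    const uint8_t *storage;   /* full device contents (not directly addressable) */
    size_t         size;      /* total device size in bytes */
    uint32_t       bank;      /* currently selected window */
};

/* Stand-in for the hardware command that moves the window. */
static void select_bank(struct bank_dev *dev, uint32_t bank)
{
    dev->bank = bank;
}

/* Read one byte at a device-relative address through the window. */
uint8_t bank_read(struct bank_dev *dev, size_t addr)
{
    assert(dev != NULL && addr < dev->size);
    uint32_t bank   = (uint32_t)(addr / WINDOW_SIZE);
    size_t   offset = addr % WINDOW_SIZE;
    if (bank != dev->bank)
        select_bank(dev, bank);   /* move the window before touching the data */
    /* Only the selected window's WINDOW_SIZE bytes are visible at once. */
    return dev->storage[(size_t)dev->bank * WINDOW_SIZE + offset];
}
```

With a linearly mapped device, the same read would be a plain array access with no bank selection step.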

base layer calls

Convenient set of library calls for writing resource managers. These calls all start with resmgr_*(). Note that while some base layer calls are unavoidable (e.g. resmgr_pathname_attach()), we recommend that you use the POSIX layer calls where possible.

BIOS/ROM Monitor extension signature

A certain sequence of bytes indicating to the BIOS or ROM Monitor that the device is to be considered an “extension” to the BIOS or ROM Monitor — control is to be transferred to the device by the BIOS or ROM Monitor, with the expectation that the device will perform additional initializations.

On the x86 architecture, the two bytes 0x55 and 0xAA must be present (in that order) as the first two bytes in the device, with control being transferred to offset 0x0003.
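A minimal sketch of the x86 signature test described above. The function and macro names are hypothetical, and the code checks only what the text states (the two bytes, their order, and the entry offset); real firmware may apply further checks that aren't covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_SIG0         0x55u     /* first signature byte  */
#define EXT_SIG1         0xAAu     /* second signature byte */
#define EXT_ENTRY_OFFSET 0x0003u   /* where control is transferred */

/* Return the entry offset if the device image begins with the x86
 * extension signature, or -1 if it does not. */
int ext_entry_offset(const uint8_t *dev, size_t len)
{
    assert(dev != NULL);
    if (len <= EXT_ENTRY_OFFSET)
        return -1;                 /* too short to hold signature and entry */
    if (dev[0] != EXT_SIG0 || dev[1] != EXT_SIG1)
        return -1;                 /* bytes missing or in the wrong order */
    return (int)EXT_ENTRY_OFFSET;
}
```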

block-integral

The requirement that data be transferred such that individual structure components are transferred in their entirety — no partial structure component transfers are allowed.

In a resource manager, directory data must be returned to a client as block-integral data. This means that only complete struct dirent structures can be returned — it’s inappropriate to return partial structures, assuming that the next _IO_READ request will “pick up” where the previous one left off.
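A minimal sketch of the block-integral rule, using a hypothetical fixed-size record in place of struct dirent (whose real layout isn't shown here). The function copies only as many whole records as fit in the reply buffer and never a fragment; all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size record standing in for a directory entry. */
struct record {
    char name[12];
    int  id;
};

/* Copy as many complete records as fit into dst; returns the count copied.
 * Leftover space smaller than one record is simply left unused. */
size_t pack_whole_records(void *dst, size_t dst_len,
                          const struct record *src, size_t nrec)
{
    assert(dst != NULL && src != NULL);
    size_t fit = dst_len / sizeof(struct record);   /* whole records only */
    size_t n   = (nrec < fit) ? nrec : fit;
    memcpy(dst, src, n * sizeof(struct record));
    return n;
}
```

The caller would remember the returned count and start the next reply at the first record that wasn't sent, so each reply always holds complete entries.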

bootable

An image can be either bootable or nonbootable. A bootable image is one that contains the startup code that the IPL can transfer control to.

bootfile

The part of an OS image that runs the startup code and the Neutrino microkernel.

bound multiprocessing (BMP)

A multiprocessing system where a single instantiation of an OS manages all CPUs simultaneously, but you can lock individual applications or threads to a specific CPU.

budget

In sporadic scheduling, the amount of time a thread is permitted to execute at its normal priority before being dropped to its low priority.

buildfile

A text file containing instructions for mkifs specifying the contents and other details of an image, or for mkefs specifying the contents and other details of an embedded filesystem image.

canonical mode

Also called edited mode or “cooked” mode. In this mode the character device library performs line-editing operations on each received character. Only when a line is “completely entered” — typically when a carriage return (CR) is received — will the line of data be made available to application processes. Contrast raw mode.

channel

A kernel object used with message passing.

In QNX Neutrino, message passing is directed towards a connection (made to a channel); threads can receive messages from channels. A thread that wishes to receive messages creates a channel (using ChannelCreate()), and then receives messages from that channel (using MsgReceive()). Another thread that wishes to send a message to the first thread must make a connection to that channel by “attaching” to the channel (using ConnectAttach()) and then sending data (using MsgSend()).

CIFS

Common Internet File System (aka SMB) — a protocol that allows a client workstation to perform transparent file access over a network to a Windows 95/98/NT server. Client file access calls are converted to CIFS protocol requests and are sent to the server over the network. The server receives the request, performs the actual filesystem operation, and sends a response back to the client.

CIS

Card Information Structure — a data block that maintains information about flash configuration. The CIS description includes the types of memory devices in the regions, the physical geometry of these devices, and the partitions located on the flash.

combine message

A resource manager message that consists of two or more messages. The messages are constructed as combine messages by the client’s C library (e.g. stat(), readblock()), and then handled as individual messages by the resource manager.

The purpose of combine messages is to conserve network bandwidth and/or to provide support for atomic operations. See also connect message and I/O message.

connect message

In a resource manager, a message issued by the client to perform an operation based on a pathname (e.g. an io_open message). Depending on the type of connect message sent, a context block (see OCB) may be associated with the request and will be passed to subsequent I/O messages. See also combine message and I/O message.

connection

A kernel object used with message passing.

Connections are created by client threads to “connect” to the channels made available by servers. Once connections are established, clients can MsgSendv() messages over them. If a number of threads in a process all attach to the same channel, then the one connection is shared among all the threads. Channels and connections are identified within a process by a small integer.

The key thing to note is that connections and file descriptors (FD) are one and the same object. See also channel and FD.

context

Information retained between invocations of functionality.

When using a resource manager, the client sets up an association or context within the resource manager by issuing an open() call and getting back a file descriptor. The resource manager is responsible for storing the information required by the context (see OCB). When the client issues further file-descriptor based messages, the resource manager uses the OCB to determine the context for interpretation of the client’s messages.

cooked mode

See canonical mode.

core dump

A file describing the state of a process that terminated abnormally.

critical section

A code passage that must be executed “serially” (i.e. by only one thread at a time). The simplest form of critical section enforcement is via a mutex.

deadlock

A condition in which one or more threads are unable to continue due to resource contention. A common form of deadlock can occur when one thread sends a message to another, while the other thread sends a message to the first. Both threads are now waiting for each other to reply to the message. Deadlock can be avoided by good design practices or massive kludges — we recommend the good design approach.

device driver

A process that allows the OS and application programs to make use of the underlying hardware in a generic way (e.g. a disk drive, a network interface). Unlike OSs that require device drivers to be tightly bound into the OS itself, device drivers for QNX Neutrino are standard processes that can be started and stopped dynamically. As a result, adding device drivers doesn’t affect any other part of the OS — drivers can be developed and debugged like any other application. Also, device drivers are in their own protected address space, so a bug in a device driver won’t cause the entire OS to shut down.

discrete (or traditional) multiprocessor system

A system that has separate physical processors hooked up in multiprocessing mode over a board-level bus.

DNS

Domain Name Service — an Internet protocol used to convert ASCII domain names into IP addresses. In QNX native networking, dns is one of Qnet’s built-in resolvers.

dynamic bootfile

An OS image built on the fly. Contrast static bootfile.

dynamic linking

The process whereby you link your modules in such a way that the Process Manager will link them to the library modules before your program runs. The word “dynamic” here means that the association between your program and the library modules that it uses is done at load time, not at link time. Contrast static linking. See also runtime loading.

edge-sensitive

One of two ways in which a PIC (Programmable Interrupt Controller) can be programmed to respond to interrupts. In edge-sensitive mode, the interrupt is “noticed” upon a transition to/from the rising/falling edge of a pulse. Contrast level-sensitive.

edited mode

See canonical mode.

EOI

End Of Interrupt — a command that the OS sends to the PIC after processing all Interrupt Service Routines (ISR) for that particular interrupt source so that the PIC can reset the processor’s In Service Register. See also PIC and ISR.

EPROM

Erasable Programmable Read-Only Memory — a memory technology that allows the device to be programmed (typically with higher-than-operating voltages, e.g. 12V), with the characteristic that any bit (or bits) may be individually programmed from a 1 state to a 0 state. Changing a bit from a 0 state to a 1 state can be accomplished only by erasing the entire device, setting all of the bits to a 1 state. Erasing is accomplished by shining an ultraviolet light through the erase window of the device for a fixed period of time (typically 10-20 minutes). The device is further characterized by having a limited number of erase cycles (typically 10e5 - 10e6). Contrast flash and RAM.
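The programming rule above (bits can go from 1 to 0 individually, but only a full erase returns them to 1) can be modeled with bitwise operations. This is a software model for illustration only, not device-programming code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Programming can only clear bits, so the stored result is old AND new. */
uint8_t eprom_program_byte(uint8_t current, uint8_t wanted)
{
    return current & wanted;
}

/* True if storing `wanted` would need some bit to go from 0 back to 1,
 * which is possible only by erasing the whole device. */
bool eprom_needs_erase(uint8_t current, uint8_t wanted)
{
    return (uint8_t)(~current & wanted) != 0;
}

/* Erasing sets every bit in the entire device to 1. */
void eprom_erase(uint8_t *dev, size_t len)
{
    assert(dev != NULL);
    memset(dev, 0xFF, len);
}
```

The same model applies to flash memory, except that an erase there may cover one block rather than the whole device.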

event

A notification scheme used to inform a thread that a particular condition has occurred. Events can be signals or pulses in the general case; they can also be unblocking events or interrupt events in the case of kernel timeouts and interrupt service routines. An event is delivered by a thread, a timer, the kernel, or an interrupt service routine when appropriate to the requestor of the event.

FD

File Descriptor — a client must open a file descriptor to a resource manager via the open() function call. The file descriptor then serves as a handle for the client to use in subsequent messages. Note that a file descriptor is the exact same object as a connection ID (coid, returned by ConnectAttach()).

FIFO

First In First Out — a scheduling algorithm whereby a thread is able to consume CPU at its priority level without bounds. See also adaptive, round robin, and sporadic.

flash memory

A memory technology similar in characteristics to EPROM memory, with the exception that erasing is performed electrically instead of via ultraviolet light, and, depending upon the organization of the flash memory device, erasing may be accomplished in blocks (typically 64k bytes at a time) instead of the entire device. Contrast EPROM and RAM.

FQNN

Fully Qualified NodeName — a unique name that identifies a QNX Neutrino node on a network. The FQNN consists of the nodename plus the node domain tacked together.

garbage collection

Aka space reclamation, the process whereby a filesystem manager recovers the space occupied by deleted files and directories.

HA

High Availability — in telecommunications and other industries, HA describes a system’s ability to remain up and running without interruption for extended periods of time.

handle

A pointer that the resource manager base library binds to the pathname registered via resmgr_attach(). This handle is typically used to associate some kind of per-device information. Note that if you use the iofunc_*() POSIX layer calls, you must use a particular type of handle — in this case called an attributes structure.

hard thread affinity

A user-specified binding of a thread to a set of processors, done by means of a runmask. Contrast soft thread affinity.

image

In the context of embedded QNX Neutrino systems, an “image” can mean either a structure that contains files (i.e. an OS image) or a structure that can be used in a read-only, read/write, or read/write/reclaim FFS-2-compatible filesystem (i.e. a flash filesystem image).

inherit mask

A bitmask that specifies which processors a thread’s children can run on. Contrast runmask.

interrupt

An event (usually caused by hardware) that interrupts whatever the processor was doing and asks it to do something else. The hardware will generate an interrupt whenever it has reached some state where software intervention is required.

interrupt handler

See ISR.

interrupt latency

The amount of elapsed time between the generation of a hardware interrupt and the first instruction executed by the relevant interrupt service routine. Also designated as “Til”. Contrast scheduling latency.

interrupt service routine

See ISR.

interrupt service thread

A thread that is responsible for performing thread-level servicing of an interrupt.

Since an ISR can call only a very limited number of functions, and since the amount of time spent in an ISR should be kept to a minimum, generally the bulk of the interrupt servicing work should be done by a thread. The thread attaches the interrupt (via InterruptAttach() or InterruptAttachEvent()) and then blocks (via InterruptWait()), waiting for the ISR to tell it to do something (by returning an event of type SIGEV_INTR). To aid in minimizing scheduling latency, the interrupt service thread should raise its priority appropriately.

I/O message

A message that relies on an existing binding between the client and the resource manager. For example, an _IO_READ message depends on the client’s having previously established an association (or context) with the resource manager by issuing an open() and getting back a file descriptor. See also connect message, context, combine message, and message.

I/O privileges

A particular right that, if enabled for a given thread, allows the thread to perform I/O instructions (such as the x86 assembler in and out instructions). By default, I/O privileges are disabled, because a program with them enabled can wreak havoc on a system. To enable I/O privileges, the thread must be running as root, and call ThreadCtl().

IPC

Interprocess Communication — the ability for two processes (or threads) to communicate. QNX Neutrino offers several forms of IPC, most notably native messaging (synchronous, client/server relationship), POSIX message queues and pipes (asynchronous), as well as signals.

IPL

Initial Program Loader — the software component that either takes control at the processor’s reset vector (e.g. location 0xFFFFFFF0 on the x86), or is a BIOS extension. This component is responsible for setting up the machine into a usable state, such that the startup program can then perform further initializations. The IPL is written in assembler and C. See also BIOS extension signature and startup code.

IRQ

Interrupt Request — a hardware request line asserted by a peripheral to indicate that it requires servicing by software. The IRQ is handled by the PIC, which then interrupts the processor, usually causing the processor to execute an Interrupt Service Routine (ISR).

ISR

Interrupt Service Routine — a routine responsible for servicing hardware (e.g. reading and/or writing some device ports), for updating some data structures shared between the ISR and the thread(s) running in the application, and for signalling the thread that some kind of event has occurred.

kernel

See microkernel.

level-sensitive

One of two ways in which a PIC (Programmable Interrupt Controller) can be programmed to respond to interrupts. If the PIC is operating in level-sensitive mode, the IRQ is considered active whenever the corresponding hardware line is active. Contrast edge-sensitive.
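The difference between the two PIC modes can be shown with a small simulation over sampled line states. This is a model for illustration, not PIC programming code: it assumes an active-high line, treats an inactive-to-active transition as the edge of interest, and uses hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Level-sensitive: the IRQ is active for as long as the line is active. */
bool level_active(bool line)
{
    return line;
}

/* Edge-sensitive: the IRQ is noticed only when the line changes state
 * (here: from inactive to active). */
bool edge_detected(bool previous, bool current)
{
    return !previous && current;
}

/* Count the samples at which each mode would report the interrupt. */
void count_triggers(const bool *samples, size_t n,
                    size_t *level_count, size_t *edge_count)
{
    assert(samples != NULL && level_count != NULL && edge_count != NULL);
    bool prev = false;              /* line assumed inactive before sampling */
    *level_count = *edge_count = 0;
    for (size_t i = 0; i < n; i++) {
        if (level_active(samples[i]))
            (*level_count)++;
        if (edge_detected(prev, samples[i]))
            (*edge_count)++;
        prev = samples[i];
    }
}
```

A line that stays active across several samples is reported on every sample in the level-sensitive model, but only once in the edge-sensitive model.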

linearly mapped

A term indicating that a certain memory component is entirely addressable by the processor. Contrast bank-switched.

message

A parcel of bytes passed from one process to another. The OS attaches no special meaning to the content of a message — the data in a message has meaning for the sender of the message and for its receiver, but for no one else.

Message passing not only allows processes to pass data to each other, but also provides a means of synchronizing the execution of several processes. As they send, receive, and reply to messages, processes undergo various “changes of state” that affect when, and for how long, they may run.

microkernel

A part of the operating system that provides the minimal services used by a team of optional cooperating processes, which in turn provide the higher-level OS functionality. The microkernel itself lacks filesystems and many other services normally expected of an OS; those services are provided by optional processes.

mount structure

An optional, well-defined data structure (of type iofunc_mount_t) within an iofunc_*() structure, which contains information used on a per-mountpoint basis (generally used only for filesystem resource managers). See also attributes structure and OCB.

mountpoint

The location in the pathname space where a resource manager has “registered” itself. For example, the serial port resource manager registers mountpoints for each serial device (/dev/ser1, /dev/ser2, etc.), and a CD-ROM filesystem may register a single mountpoint of /cdrom.

multicore system

A chip that has one physical processor with multiple CPUs interconnected over a chip-level bus.

mutex

Mutual exclusion lock, a simple synchronization service used to ensure exclusive access to data shared between threads. It is typically acquired (pthread_mutex_lock()) and released (pthread_mutex_unlock()) around the code that accesses the shared data (usually a critical section). See also critical section.
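A minimal sketch of the acquire/release pattern around a critical section, using the POSIX calls named above. The shared_counter type and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

struct shared_counter {
    pthread_mutex_t lock;
    long            value;   /* data shared between threads */
};

/* Prepare the counter for use; returns 0 on success. */
int counter_init(struct shared_counter *c)
{
    assert(c != NULL);
    c->value = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Add delta to the counter; the read-modify-write is the critical section. */
long counter_add(struct shared_counter *c, long delta)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);      /* acquire */
    c->value += delta;
    long result = c->value;
    pthread_mutex_unlock(&c->lock);    /* release */
    return result;
}
```

Any number of threads can call counter_add() on the same counter; the mutex ensures that only one of them is inside the update at a time.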

name resolution

In a QNX Neutrino network, the process by which the Qnet network manager converts an FQNN to a list of destination addresses that the transport layer knows how to get to.

name resolver

Program code that attempts to convert an FQNN to a destination address.

NDP

Node Discovery Protocol: a proprietary QNX Software Systems protocol for broadcasting name resolution requests on a QNX Neutrino LAN.


network directory

A directory in the pathname space that’s implemented by the Qnet network manager.

Neutrino

Name of an OS developed by QNX Software Systems.

NFS

Network FileSystem: a TCP/IP application that lets you graft remote filesystems (or portions of them) onto your local namespace. Directories on the remote systems appear as part of your local filesystem and all the utilities you use for listing and managing files (e.g. ls, cp, mv) operate on the remote files exactly as they do on your local files.

NMI

Nonmaskable Interrupt: an interrupt that can’t be masked by the processor. We don’t recommend using an NMI!

Node Discovery Protocol

See NDP.

node domain

A character string that the Qnet network manager tacks onto the nodename to form an FQNN.

nodename

A unique name consisting of a character string that identifies a node on a network.

nonbootable

A nonbootable OS image is usually provided for larger embedded systems or for small embedded systems where a separate, configuration-dependent setup may be required. Think of it as a second “filesystem” that has some additional files on it. Since it’s nonbootable, it typically won’t contain the OS, startup file, etc. Contrast bootable.

OCB

Open Control Block (or Open Context Block): a block of data established by a resource manager during its handling of the client’s open() function. This context block is bound by the resource manager to this particular request, and is then automatically passed to all subsequent I/O functions generated by the client on the file descriptor returned by the client’s open().

package filesystem

A virtual filesystem manager that presents a customized view of a set of files and directories to a client. The “real” files are present on some medium; the package filesystem presents a virtual view of selected files to the client.

partition

A division of CPU time, memory, file resources, or kernel resources with some policy of minimum guaranteed usage.

pathname prefix

See mountpoint.

pathname space mapping

The process whereby the Process Manager maintains an association between resource managers and entries in the pathname space.

persistent

When applied to storage media, the ability for the medium to retain information across a power-cycle. For example, a hard disk is a persistent storage medium, whereas a ramdisk is not, because the data is lost when power is lost.


Photon microGUI

The proprietary graphical user interface built by QNX Software Systems.

PIC

Programmable Interrupt Controller: a hardware component that handles IRQs. See also edge-sensitive, level-sensitive, and ISR.

PID

Process ID. Also often pid (e.g. as an argument in a function call).

POSIX

An IEEE/ISO standard. The term is an acronym (of sorts) for Portable Operating System Interface; the “X” alludes to “UNIX”, on which the interface is based.

POSIX layer calls

Convenient set of library calls for writing resource managers. The POSIX layer calls can handle even more of the common-case messages and functions than the base layer calls. These calls are identified by the iofunc_*() prefix. In order to use these (and we strongly recommend that you do), you must also use the well-defined POSIX-layer attributes (iofunc_attr_t), OCB (iofunc_ocb_t), and (optionally) mount (iofunc_mount_t) structures.

preemption

The act of suspending the execution of one thread and starting (or resuming) another. The suspended thread is said to have been “preempted” by the new thread. Whenever a lower-priority thread is actively consuming the CPU, and a higher-priority thread becomes READY, the lower-priority thread is immediately preempted by the higher-priority thread.


prefix tree

The internal representation used by the Process Manager to store the pathname table.

priority inheritance

The characteristic of a thread that causes its priority to be raised or lowered to that of the thread that sent it a message. Also used with mutexes. Priority inheritance is a method used to prevent priority inversion.
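For the mutex case, POSIX exposes priority inheritance as a mutex attribute. The sketch below is an illustration under assumptions: init_prio_inherit_mutex() is a hypothetical helper name, and it uses the standard pthread_mutexattr_setprotocol() call with PTHREAD_PRIO_INHERIT, which some platforms may reject (typically with ENOTSUP). It does not show the message-passing form of inheritance.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Initializes *m as a mutex whose owner inherits the priority of the
 * highest-priority thread blocked on it.
 * Returns 0 on success, or an error number (for example ENOTSUP). */
int init_prio_inherit_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request the priority-inheritance protocol for this mutex. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex set up this way is locked and unlocked with the usual pthread_mutex_lock() and pthread_mutex_unlock() calls; only the initialization differs.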

priority inversion

A condition that can occur when a low-priority thread consumes CPU at a higher priority than it should. This can be caused by not supporting priority inheritance, such that when the lower-priority thread sends a message to a higher-priority thread, the higher-priority thread consumes CPU on behalf of the lower-priority thread. This is solved by having the higher-priority thread inherit the priority of the thread on whose behalf it’s working.

process

A nonschedulable entity, which defines the address space and a few data areas. A process must have at least one thread running in it; this thread is then called the first thread.

process group

A collection of processes that permits the signalling of related processes. Each process in the system is a member of a process group identified by a process group ID. A newly created process joins the process group of its creator.
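The rule that a new process joins its creator's process group can be checked with standard POSIX calls. This is a small sketch, assuming a system that supports fork(); child_inherits_pgrp() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child reports the same process group ID as
 * its parent, 0 if it does not, and -1 on error. */
int child_inherits_pgrp(void)
{
    pid_t parent_pgid = getpgrp();
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: exit status 0 means "same group as my creator". */
        _exit(getpgrp() == parent_pgid ? 0 : 1);
    }

    /* Parent: collect the child's verdict. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A process that wants to leave the inherited group would call setpgid() or setsid(); the sketch deliberately does neither, so the child stays in its creator's group.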

process group ID

The unique identifier representing a process group during its lifetime. A process group ID is a positive integer. The system may reuse a process group ID after the process group dies.


process group leader

A process whose ID is the same as its process group ID.

process ID (PID)

The unique identifier representing a process. A PID is a positive integer. The system may reuse a process ID after the process dies, provided no existing process group has the same ID. Only the Process Manager can have a process ID of 1.

pty

Pseudo-TTY: a character-based device that has two “ends”: a master end and a slave end. Data written to the master end shows up on the slave end, and vice versa. These devices are typically used to interface between a program that expects a character device and another program that wishes to use that device (e.g. the shell and the telnet daemon process, used for logging in to a system over the Internet).

pulses

In addition to the synchronous Send/Receive/Reply services, QNX Neutrino also supports fixed-size, nonblocking messages known as pulses. These carry a small payload (four bytes of data plus a single byte code). A pulse is also one form of event that can be returned from an ISR or a timer. See MsgDeliverEvent() for more information.

Qnet

The native network manager in QNX Neutrino.

QoS

Quality of Service: a policy (e.g. loadbalance) used to connect nodes in a network in order to ensure highly dependable transmission. QoS is an issue that often arises in high-availability (HA) networks as well as realtime control systems.


RAM

Random Access Memory: a memory technology characterized by the ability to read and write any location in the device without limitation. Contrast flash and EPROM.

raw mode

In raw input mode, the character device library performs no editing on received characters. This reduces the processing done on each character to a minimum and provides the highest performance interface for reading data. Also, raw mode is used with devices that typically generate binary data; you don’t want any translations of the raw binary stream between the device and the application. Contrast canonical mode.
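From an application's point of view, raw input is usually requested through the POSIX termios interface. The sketch below only edits a termios structure in memory; make_raw_input() is a hypothetical helper, and the particular set of flags cleared here is one reasonable choice rather than a definition taken from this manual.

```c
#include <termios.h>

/* Modifies *t in place so that it describes raw (non-canonical) input:
 * no line editing, no echo, no signal characters, no input translation. */
void make_raw_input(struct termios *t)
{
    /* Turn off line editing, echo, and signal-generating characters. */
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO | ISIG);

    /* Turn off CR/NL translation, 8th-bit stripping, and XON/XOFF. */
    t->c_iflag &= ~(tcflag_t)(ICRNL | INLCR | ISTRIP | IXON);

    /* Return from read() as soon as one byte is available, no timeout. */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}
```

In a real program the structure would first be filled by tcgetattr() on an open terminal descriptor and then applied with tcsetattr(); saving the original settings and restoring them on exit is the usual courtesy.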

replenishment

In sporadic scheduling, the period of time during which a thread is allowed to consume its execution budget.

reset vector

The address at which the processor begins executing instructions after the processor’s reset line has been activated. On the x86, for example, this is the address 0xFFFFFFF0.

resource manager

A user-level server program that accepts messages from other programs and, optionally, communicates with hardware. QNX Neutrino resource managers are responsible for presenting an interface to various types of devices, whether actual (e.g. serial ports, parallel ports, network cards, disk drives) or virtual (e.g. /dev/null, a network filesystem, and pseudo-ttys).

In other operating systems, this functionality is traditionally associated with device drivers. But unlike device drivers, QNX Neutrino resource managers don’t require any special arrangements with the kernel. In fact, a resource manager looks just like any other user-level program. See also device driver.


RMA

Rate Monotonic Analysis: a set of methods used to specify, analyze, and predict the timing behavior of realtime systems.

round robin

Scheduling algorithm whereby a thread is given a certain period of time to run. Should the thread consume CPU for the entire period of its timeslice, the thread will be placed at the end of the ready queue for its priority, and the next available thread will be made READY. If a thread is the only thread READY at its priority level, it will be able to consume CPU again immediately. See also adaptive, FIFO, and sporadic.

runmask

A bitmask that indicates which processors a thread can run on. Contrast inherit mask.

runtime loading

The process whereby a program decides while it’s actually running that it wishes to load a particular function from a library. Contrast static linking.
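Runtime loading is normally done with dlopen() and dlsym(), both of which appear in this manual's index. The following is a minimal sketch: lookup_int_from_str() and the int_from_str_fn type are hypothetical names, and passing NULL as the library (to search the running program and the libraries already loaded with it) is just a convenient way to keep the example independent of any particular library file name.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Type of the function we expect to find: int f(const char *). */
typedef int (*int_from_str_fn)(const char *);

/* Looks up `symbol` at run time. `library` names a shared object, or is
 * NULL to search the running program and its already-loaded libraries.
 * Returns NULL if the library or the symbol cannot be found. */
int_from_str_fn lookup_int_from_str(const char *library, const char *symbol)
{
    int_from_str_fn fn = NULL;

    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return NULL;

    /* The handle is deliberately left open: the returned pointer stays
     * valid only while the library remains loaded. */
    void *addr = dlsym(handle, symbol);

    /* Conversion idiom suggested by POSIX for object-to-function pointers. */
    *(void **)&fn = addr;
    return fn;
}
```

For example, looking up "atoi" with a NULL library name should yield a pointer that converts "42" to 42 when called. On some toolchains the program may need to be linked with -ldl.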

scheduling latency

The amount of time that elapses between the point when one thread makes another thread READY and when the other thread actually gets some CPU time. Note that this latency is almost always at the control of the system designer.

Also designated as “Tsl”. Contrast interrupt latency.

session

A collection of process groups established for job control purposes. Each process group is a member of a session. A process belongs to the session that its process group belongs to. A newly created process joins the session of its creator. A process can alter its session membership via setsid(). A session can contain multiple process groups.

session leader

A process whose death causes all processes within its process group to receive a SIGHUP signal.

soft thread affinity

The scheme whereby the microkernel tries to dispatch a thread to the processor where it last ran, in an attempt to reduce thread migration from one processor to another, which can affect cache performance. Contrast hard thread affinity.

software interrupts

Similar to a hardware interrupt (see interrupt), except that the source of the interrupt is software.

sporadic

Scheduling algorithm whereby a thread’s priority can oscillate dynamically between a “foreground” or normal priority and a “background” or low priority. A thread is given an execution budget of time to be consumed within a certain replenishment period. See also adaptive, FIFO, and round robin.

startup code

The software component that gains control after the IPL code has performed the minimum necessary amount of initialization. After gathering information about the system, the startup code transfers control to the OS.

static bootfile

An image created at one time and then transmitted whenever a node boots. Contrast dynamic bootfile.


static linking

The process whereby you combine your modules with the modules from the library to form a single executable that’s entirely self-contained. The word “static” implies that it’s not going to change: all the required modules are already combined into one.

symmetric multiprocessing (SMP)

A multiprocessor system where a single instantiation of an OS manages all CPUs simultaneously, and applications can float to any of them.

system page area

An area in the kernel that is filled by the startup code and contains information about the system (number of bytes of memory, location of serial ports, etc.). This is also called the SYSPAGE area.

thread

The schedulable entity under QNX Neutrino. A thread is a flow of execution; it exists within the context of a process.

timer

A kernel object used in conjunction with time-based functions. A timer is created via timer_create() and armed via timer_settime(). A timer can then deliver an event, either periodically or on a one-shot basis.
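The create-then-arm sequence can be sketched with the POSIX timer calls named above. This is an illustration under assumptions: wait_one_shot_ms() is a hypothetical helper, the event is delivered as the signal SIGUSR1 (a pulse or other event type could be used instead on QNX Neutrino), and the caller is assumed to be single-threaded so that sigprocmask() is appropriate.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Arms a one-shot timer for `ms` milliseconds and blocks until its
 * signal arrives. Returns 0 on success, -1 on failure or if ms <= 0. */
int wait_one_shot_ms(long ms)
{
    sigset_t set;
    struct sigevent sev;
    struct itimerspec its;
    timer_t timer;
    int signo = 0;

    if (ms <= 0)
        return -1;  /* a zero it_value would disarm the timer */

    /* Block the signal so it stays pending until sigwait() collects it. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    /* Create the timer: on expiry, deliver SIGUSR1 to this process. */
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGUSR1;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    /* Arm it: it_interval stays zero, which makes the timer one-shot. */
    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timer_settime(timer, 0, &its, NULL) != 0) {
        timer_delete(timer);
        return -1;
    }

    /* Wait for the event, then release the timer. */
    int rc = sigwait(&set, &signo);
    timer_delete(timer);
    return (rc == 0 && signo == SIGUSR1) ? 0 : -1;
}
```

Setting it_interval to a nonzero value in the same structure would turn this into a periodic timer that delivers the event repeatedly until it is disarmed or deleted.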

timeslice

A period of time assigned to a round-robin or adaptive scheduled thread. This period of time is small (on the order of tens of milliseconds); the actual value shouldn’t be relied upon by any program (it’s considered bad design).


Index

!

.longfilenames 201
/ directory 151
/dev
    hd* 197
    mem 151
    mq and mqueue 82
    pmm 312
    ser* 225
    shmem 202
    zero 151
/net directory 251
/proc
    boot 151
    ham 294
    pid 151
/tmp directory 202

A

abort() 79
accept() 271
actions (HA) 290
adaptive partitioning 323
    debugging with 329
    partitions 324
    scheduler 330
affinity, processor 127, 130
alarm() 98
Asymmetric Multiprocessing (AMP) 121, 122
atomic operations 44, 54
attributes structure (resource manager) 188
autoconnect 275
AutoIP 274

B

background priority (sporadic scheduling) 37
barriers 21, 44
    and threads 48
bind() 271
bindresvport() 271
block-oriented devices 225
boot processor 125


Bound Multiprocessing (BMP) 121, 130
budgets
    CPU 323, 330
    file space (not implemented) 323
    memory (not implemented) 323

C

canonical input mode 232
cd command 158
CD-ROM filesystem 212
CGI (Common Gateway Interface) 277
ChannelCreate() 65
ChannelDestroy() 65
channels 65, 66
character devices 225
chmod() 190
chown() 190
CIFS filesystem 219
clipping (Photon) 338
clock services 94
ClockAdjust() 95, 96
ClockCycles() 95, 96
clock_getcpuclockid() 95
clock_getres() 95
clock_gettime() 95
ClockId() 95
ClockPeriod() 95
clock_settime() 95
ClockTime() 95
close() 86, 190
COFF (Common Object File Format) 168
color model (Photon) 341
combine messages 187
Common Gateway Interface (CGI) 277
conditions (HA entity states) 288
CONDVAR (thread state) 31
condvars 21, 42, 44, 46
    example 47
    operations 46
    SMP 128
confstr() 255
connect messages 184
connect() 271
ConnectAttach() 65
ConnectDetach() 65
consoles
    physical 234
    virtual 234
convenience functions (Photon widget toolkit) 360
conventions, typographical xix
cooked input mode 232
cooperating processes
    FIFOs 94
    pipes 93
CPU
    affinity 127, 130
    usage, budgeting 323, 330
CRC 206
critical section 45–47, 53, 100
    defined 42
    SMP 128
_CS_DOMAIN 256
_CS_HOSTNAME 255


D

data server 277
DEAD (thread state) 31
deadlock-free systems, rules for 69
debugging, using adaptive partitions for 329
design goals for QNX Neutrino 3, 20
devctl() 190, 228
device control 228
device drivers See also resource managers
    no need to link into kernel 105
    similarity to standard processes 13
device names, creating 157
directories, changing 158
discrete multiprocessors 121
disks
    DOS disks, accessing 209
    partitions 198
dladdr() 172
dlclose() 172
dlopen() 171, 172
dlsym() 171
dn_comp() 271
dn_expand() 271
domains of authority 150
DOS filesystem manager 209
draw events (Photon) 334
dup() 160, 161, 185, 186
dup2() 160
dynamic interpreter 169
dynamic linking 165

E

edited input mode 232
editing capabilities (io-char) 232
ELF format 143, 167
EMANATE, EMANATE/Lite 276
Embedded Transaction Filesystem (ETFS) 203
embedded web server 276
endprotoent() 271
endservent() 271
entities (HA process) 287
environment variables
    LD_LIBRARY_PATH 170
    RESCONF 271
ETFS 203
event space (Photon) 334
events 71
    “unblock” 72
    draw (Photon) 334
    instrumented kernel 111
exclusive (QoS policy) 257
exec*() functions 138, 142
Executable and Linking Format See ELF
executable, partially bound 166
Ext2 filesystem 219
extensibility of OS 12
extensions to OS
    user-written, won’t affect reliability of core OS 8

F

fast emitting mode (instrumented kernel) 113


fcntl() 160
FIFO (scheduling method) 34, 35, 44
FIFOs 94. See also pipes
    creating 94
    removing 94
file descriptors (FDs)
    duplicating 160
    inheritance 160, 161
    open control blocks (OCBs) 158
    several FDs referring to the same OCB 160
files
    DOS files, operating on 209
    FIFOs 94
    opened by different processes 159
    opened twice by same process 159
    pipes 93
    space, budgeting (not implemented) 323
filesystems
    accessing a filesystem on another node 156
    CD-ROM 212
    CIFS 219
    DOS 209
    Embedded Transaction (ETFS) 203
    HAM 294
    Image 201
    Linux Ext2 219
    NFS 218
    QNX 151, 208
    RAM 202
    seek points 159
five nines (HA metric) 281
Flash 143
foreground priority (sporadic scheduling) 37
fork() 138, 141, 142
    POSIX process model and 8
fpathconf() 190
FQNN (fully qualified node name) 256
fs-cd.so 212
fs-cifs 219
fs-dos.so 209
fs-ext2.so 219
fs-nfs2, fs-nfs3 218
fs-qnx4.so 151, 208
fseek() 190
fstat() 190
ftruncate() 87

G

gethostbyaddr() 271
gethostbyname() 271
getpeername() 271
getprotobyname() 271
getprotobynumber() 271
getprotoent() 271
getservbyname() 271
getservent() 271
getsockname() 271
getsockopt() 271
global list 172
GNS (Global Name Service) 255
graphical microkernel 333


graphics drivers 339
    multiple 340
Guardian (HAM “stand-in”) 287

H

HA 281–300
    client-side library 283
    microkernel architecture inherently suited for 282
    recovery example 284
HAM 286
    API 295
    hierarchy 287
h_errlist() 271
h_errno() 271
herror() 271
high availability See HA
High Availability Manager See HAM
h_nerr() 271
hstrerror() 271
htonl() 271
htons() 271
Hyper-Threading 124

I

I/O messages 184
I/O resources 150
i8259 interrupt control hardware 105
idle thread 33, 107
Image filesystem 201
inet_addr() 271
inet_aton() 271
inet_lnaof() 271
inet_makeaddr() 271
inet_netof() 271
inet_network() 271
inet_ntoa() 271
inheritance structure 131
initial budget (sporadic scheduling) 37
input mode
    edited 232
    raw 229
input, redirecting 93
instrumentation
    interrupts can be traced 111
    kernel can be used in prototypes or final products 111
    works on SMP systems 111
interprocess communication See IPC
interprocessor interrupts (IPIs) 128
INTERRUPT (thread state) 31
interrupt control hardware (i8259 on a PC) 105
interrupt handlers 21, 99. See also ISR
    SMP 128
interrupt latency 100
Interrupt Service Routine See ISR
InterruptAttach() 102
InterruptAttachEvent() 102
InterruptDetach() 102
InterruptDisable() 102
    problem on SMP systems 128
InterruptEnable() 102


    problem on SMP systems 128
InterruptHookIdle() 107
InterruptHookTrace() 107
InterruptLock() 102, 129
InterruptMask() 102
    problem on SMP systems 129
interrupts
    nested 101
    priorities 101
InterruptUnlock() 102, 129
InterruptUnmask() 102
    problem on SMP systems 129
InterruptWait() 29, 102
intr_timed_wait() 98
io-net 241
ioctl() 271
iofunc_*() shared library 188
ionotify() 73
_IO_STAT 180
IP filtering 273
IPC 13
    forms of 56
    term qualified to apply to “threads” 26
ISR 103, 106. See also interrupt handlers
    attaching to PC timer interrupt 103

J

JOIN (thread state) 31

K

kill() 74

L

languages 342
latency
    interrupt 100, 105
    scheduling 100, 105
layers, display 344
LD_LIBRARY_PATH 170
libpm 304
libpmm 305
libps 304
link() 209
linking
    dynamically 165
    sections 167
    statically 165
Linux Ext2 filesystem 219
listen() 271
loadbalance (QoS policy) 257
lock() 190
lseek() 190

M

malloc() 165
MAP_ANON 89
MAP_BELOW16M 89
MAP_FIXED 89
MAP_NOINIT 89, 92
MAP_NOX64K 89


MAP_PHYS 89
MAP_PRIVATE 89
MAP_SHARED 89
memory
    budgeting (not implemented) 323
    initializing 92
    protection, advantage of for embedded systems 144
    shared 42, 53, 63, 84–91
Memory Management Units (MMUs) 144
message copying 60
message passing 13, 21, 57
    API 68
    as means of synchronization 14
    network-wide 249
message queues 81
messages
    contents of 14
    multipart 60, 61
    tend to be tiny 42
microkernel 19
    comparable to a realtime executive 8
    defined 8
    general responsibilities 11
    instrumentation 111
    instrumented 111
    locking 127
    managing team of cooperating processes 10
    modularity as key aspect 8
    services 11
    SMP 124, 126
mkfifo 94
mkfifo() 94
mmap() 86, 87, 92, 190
MMU 62, 63, 145
mount structure (resource manager) 188
mountpoints 150, 193
mprotect() 86, 91
mq server 81
mq_close() 82, 83
mq_getattr() 83
mq_notify() 73, 83
mq_open() 82, 83
mq_receive() 82, 83
mq_send() 82, 83
mq_setattr() 83
mqueue resource manager 81
mq_unlink() 82, 83
MsgDeliverEvent() 30, 68, 72
MsgError() 59, 60
MsgInfo() 68
MsgKeyData() 68
MsgRead() 68
MsgReadv() 64
MsgReceive() 30, 57, 68
MsgReceivePulse() 68
MsgReceivePulsev() 64
MsgReceivev() 64
MsgReply() 57, 60, 64, 68
MsgReply*() 30
MsgReplyv() 64
MsgSend() 30, 57, 60, 64, 68
MsgSendPulse() 30, 68
MsgSendsv() 64
MsgSendv() 64, 137
MsgSendvs() 64
MsgWrite() 68
msync() 86


multicore processors 121
multiprocessing 121
munmap() 86, 92
munmap_flags() 92
MUTEX (thread state) 31
mutexes 21, 42, 44, 45
    attributes 46
    priority inheritance 46
    SMP 128

N

name resolution
    network 252
name resolver 256
NAND flash 203
NANOSLEEP (thread state) 31
nanosleep() 30, 98
NAT 273
ND_LOCAL_NODE 252
nested interrupts 101
NET_REPLY (thread state) 31
NET_SEND (thread state) 31
network
    as homogeneous set of resources 14
    Driver Development Kit (DDK) 245
    flexibility 15
    message passing 249
    name resolution 252
    pathnames 158
    protocol
        Qnet 249
    transparency 15, 182
Neutrino
    microkernel 8
NFS filesystem 218
NMI 107
node descriptor
    network 252
node domain 256
node name 255
non-Ethernet interconnect 263
npm-qnet 256
npm-tcpip 269
ntohl() 271
ntohs() 271
_NTO_TCTL_RUNMASK 131
_NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT 131
NTP 274

O

object files, sections of 167
on 131
opaque bitmask (Photon) 336
open control blocks (OCBs) 158, 159
open resources
    active information contained in OCBs 159
open() 86, 190
operations, atomic 54
output, redirecting 93


P

pages 144
parallel devices 236
partially bound executable 166
partitions (disk) 198
partitions (scheduling) 323
    adaptive 324
    static 323
pathconf() 190
pathname
    converting relative to network 158
pathname delimiter in QNX Momentics documentation xx
pathname space 150, 252
    mapping 177
pause() 74
performance
    context-switch 43
    realtime 99
persistent storage 316
Photon
    architecture differs from X Window System 333
    architecture similar to that of OS microkernel 333
    event space 334
    event types 338
    graphics drivers 339, 341
    microkernel runs as tiny process 333
    regions 336
    widgets 347–364
    window manager 346
pipe manager 93
pipe() 93
pipes 93
    creating 93
Point-to-Point Protocol (PPP) 269
Point-to-Point Protocol over Ethernet (PPPoE) 275
popen() 93
POSIX
    defines interface, not implementation 3
    message queues 81
    profiles 5
    realtime extensions 5
    standards of interest to embedded systems developers 4
    suitable for embedded systems 4, 6
    threads 5
        library calls not involving kernel calls 24
        library calls with corresponding kernel calls 25
    UNIX and 3
power management 303–320
    CPU modes 318
    framework components 303
    in embedded systems 303
    industry standards are PC-oriented 303
power manager 306
    components of 307
    is written by the user 307
power policy 307, 308
    determined by the developer 303


power states 309
power-aware applications 316
power-managed objects 311
PPP (Point-to-Point Protocol) 269
PPPoE (Point-to-Point Protocol over Ethernet) 275
preferred (QoS policy) 257
prefix 150
printf() 165
priority 32
    background and foreground (sporadic scheduling) 37
    inheritance 141
    inversion 34, 46
    range 33
process groups
    membership, inheriting 140
    remote node 141
Process Manager
    capabilities of 137
    required when creating multiple POSIX processes 137
processes
    as container for threads 23
    cooperating
        via pipes and FIFOs 93
    loading 143
    management 137
    model, required for POSIX compliance 21
    opening the same file twice 159
    OS as team of cooperating 10
    primitives 138
    system 12
processors
    locking processes to 127, 130
    multiple 121
procnto* See also microkernel, process manager
    image filesystem 194
    instrumented 111
    RAM filesystem 202
    SMP 124
product line, using a single OS for 4
PROT_EXEC 88
PROT_NOCACHE 88
PROT_NONE 88
protocols 271
PROT_READ 88
PROT_WRITE 88
pthread_attr_destroy() 24
pthread_attr_getdetachstate() 24
pthread_attr_getinheritsched() 24
pthread_attr_getschedparam() 24
pthread_attr_getschedpolicy() 24
pthread_attr_getscope() 24
pthread_attr_getstackaddr() 24
pthread_attr_getstacksize() 24
pthread_attr_init() 24
pthread_attr_setdetachstate() 24
pthread_attr_setinheritsched() 24
pthread_attr_setschedparam() 24
pthread_attr_setschedpolicy() 24
pthread_attr_setscope() 24
pthread_attr_setstackaddr() 24
pthread_attr_setstacksize() 24
pthread_barrierattr_destroy() 50
pthread_barrierattr_getpshared() 50
pthread_barrierattr_init() 50
pthread_barrierattr_setpshared() 50


pthread_barrier_destroy() 50
pthread_barrier_init() 48
pthread_barrier_wait() 48
pthread_cancel() 25
pthread_cleanup_pop() 24
pthread_cleanup_push() 24
pthread_cond_broadcast() 25, 55
pthread_cond_destroy() 25, 55
pthread_cond_init() 25, 55
pthread_cond_signal() 25, 55
pthread_cond_timedwait() 55, 98
pthread_cond_wait() 25, 29, 55
pthread_create() 25
pthread_detach() 25
pthread_equal() 24
pthread_exit() 25
pthread_getcpuclockid() 95
pthread_getschedparam() 25
pthread_getspecific() 24, 28
pthread_join() 25, 30
pthread_key_create() 24, 28
pthread_key_delete() 24
pthread_kill() 25, 74
pthread_mutexattr_setrecursive() 46
pthread_mutex_destroy() 25, 55
pthread_mutex_init() 25, 55
pthread_mutex_lock() 25, 30, 55, 123
pthread_mutex_trylock() 25, 55, 98
pthread_mutex_unlock() 25, 55, 123
pthread_once() 24
pthread_rwlock_rdlock() 51
pthread_rwlock_tryrdlock() 51
pthread_rwlock_trywrlock() 51
pthread_rwlock_unlock() 51
pthread_rwlock_wrlock() 51
pthread_self() 24
pthread_setcancelstate() 25
pthread_setcanceltype() 25
pthread_setschedparam() 25
pthread_setspecific() 24
pthread_sigmask() 25
pthread_sleepon_lock() 51
pthread_spin_lock() 123
pthread_spin_unlock() 123
pthread_testcancel() 24
pty (pseudo terminal)
    as pair of character devices 236
pulses 67, 71

Q

Qnet 244, 249–263
QNX 4 filesystem 151, 208
QNX Neutrino
    design goals 3, 20
    extensibility 12
    flexibility 7
    network as homogeneous set of resources 14
    network flexibility 15
    network transparency 15
    preemptible even during message pass 22
    realtime applications, suitability for 7
    services 21
    single-computer model 15
QoS (Quality of Service) 256
    policies 257


R

raise() 74
RAM 143
RAM “filesystem” 202
RAM disk 197
RapidIO 263
rate monotonic analysis (RMA) 37
raw input mode 229
    conditions for input request 230
    FORWARD qualifier 231
    MIN qualifier 230
    TIME qualifier 230
    TIMEOUT qualifier 230
read() 73
readblock() 187
readdir() 155, 194
reader/writer locks 44, 51
READY (thread state) 31
realtime performance 99
    interrupt latency and 100
    nested interrupts and 101
    scheduling latency and 100
RECEIVE (thread state) 31
rectangle list (Photon) 338
recv() 271
recvfrom() 271
redirecting
    input 93
    output 93
regions (Photon) 334, 336
relative pathnames, converting to network pathnames 158
remove() 94
replenishment period (sporadic scheduling) 38
REPLY (thread state) 31
RESCONF 271
res_init() 271
res_mkquery() 271
resolv.conf 271
resource managers 193
    atomic operations 187
    attributes structure 188
    can be started and stopped dynamically 177
    communicate with clients via IPC 182
    context for client requests 184
    defined 177
    don’t require any special arrangements with the kernel 177
    iofunc_*() shared library 188
    message types 184
    mount structure 188
    shared library 184
    similarity to traditional device drivers 177
    similarity to user-level servers 177
    thread management 186
    unique features of 182
resources
    accessing on other machines 15
    no real difference between local and remote 14
    open 159
res_query() 271
res_querydomain() 271
res_search() 271
res_send() 271


rm 94
robustness
    improved via memory protection 144
    of application architectures via Send/Receive/Reply 69
    of implementations with Send/Receive/Reply 68
    of IPC 56
    of signal handlers 72
ROM 143
round-robin scheduling 34, 36
runmask 130
    inheriting 130
RUNNING (thread state) 31
runtime linker 169

S

scaling
    advantages of 4
    of applications 4
scatter/gather DMA 60
SchedGet() 25, 41
sched_getparam() 41
sched_getscheduler() 41
SchedSet() 25, 41
sched_setparam() 41
sched_setscheduler() 41
scheduling
    FIFO 34, 35
    latency 100
    method
        determining 35
        setting 35
    round-robin 34, 36
    SMP systems 127
    sporadic 34, 37
    threads 32
sched_yield() 32
SCTP 272
seek points 159, 161
segments 167
select() 73, 271
SEM (thread state) 31
semaphores 21, 42, 44, 52
    named 52
    SMP 128
sem_destroy() 55
sem_init() 55
sem_post() 55
sem_trywait() 55
sem_wait() 55
SEND (thread state) 31
send() 271
sendto() 271
sensitivity bitmask (Photon) 336
serial
    devices 235
    drivers 235
        tiny versions 235
Server Side Includes (SSI) 277
services 272
services, handled by system processes 12
sessions, remote node 141
setprotoent() 271
setservent() 271
setsockopt() 271
shared libraries (.so files) 19, 137, 170
shared memory 42, 53, 63, 84–91


shm_ctl() 86, 87
shm_open() 86
shm_unlink() 86
shutdown() 271
SIGABRT 79
sigaction() 74
SIGALRM 79
SIGBUS 79
SIGCHLD 79
SIGCONT 79
SIGDEADLK 79
SIGEMT 79
SIGFPE 79
SIGHUP 79
SIGILL 79
SIGINT 79
SIGIOT 79
SIGKILL 79
signal() 77
SignalAction() 74
SignalKill() 25, 30, 74
SignalProcmask() 25, 74
signals 21
    POSIX and UNIX 74
    queuing of 76
    rules for a multithreaded process 74
    similarity to pulses 76
    targeted at specific threads 74
SignalSuspend() 74
SignalWaitinfo() 74
SIGPIPE 79
SIGPOLL 79
sigprocmask() 74
sigqueue() 74
SIGQUIT 79
SIGSEGV 79
SIGSTOP 79
SIGSUSPEND (thread state) 31
sigsuspend() 30, 74
SIGSYS 79
SIGTERM 79
sigtimedwait() 98
SIGTRAP 79
SIGTSTP 79
SIGTTIN 79
SIGTTOU 79
SIGURG 79
SIGUSR1 79
SIGUSR2 79
SIGWAITINFO (thread state) 31
sigwaitinfo() 30, 74
SIGWINCH 79
single-computer model 15
slay 131
sleep() 98
sleepon locks 44, 51
slinger 276
SMP (Symmetric Multiprocessing) 121, 123
SNMP (Simple Network Management Protocol) 276
socket() 271
sockets (logical flash drives) 213
software interrupt See signals
spawn() 131
    family of functions 138, 139
    QNX implementation 139
SPAWN_EXPLICIT_CPU 131
SPAWN_SETGROUP 140
SPAWN_SETSIGDEF 140
SPAWN_SETSIGMASK 140
spinlocks 129


SPOF 286
sporadic scheduling 34, 37
SSI (Server Side Includes) 277
STACK (thread state) 31
startup code (startup-*) 125, 126
stat() 190
states
    CONDVAR 31
    DEAD 31
    INTERRUPT 31
    JOIN 31
    MUTEX 31
    NANOSLEEP 31
    NET_REPLY 31
    NET_SEND 31
    READY 31
    RECEIVE 31
    REPLY 31
    RUNNING 31
    SEM 31
    SEND 31
    SIGSUSPEND 31
    SIGWAITINFO 31
    STACK 31
    STOPPED 31
    WAITCTX 31
    WAITPAGE 31
    WAITTHREAD 31
static linking 165
static partitions 323
STOPPED (thread state) 31
stty 233
symbol names, resolving 172
symbolic links
    cd command and 158
symbolic prefixes 155
Symmetric Multiprocessing See SMP
SyncCondvarSignal() 25, 55
SyncCondvarWait() 25, 55
SyncDestroy() 25, 55
synchronization services 44, 55
SyncMutexEvent() 25
SyncMutexLock() 25, 55
SyncMutexUnlock() 25, 55
SyncSemPost() 55
SyncSemWait() 30, 55
SyncTypeCreate() 25, 55
system
    emergency access to 329
    page 95
    processes 12
        similarity to user-written processes 12

T

tcdropline() 228
tcgetattr() 228
tcgetpgrp() 228
tcinject() 228
TCP/IP 267–277
    resource manager (npm-tcpip) 269
    stack configurations 267
tcsendbreak() 228
tcsetattr() 228
tcsetpgrp() 228
terminal emulation 235
textto 209
ThreadCancel() 25


ThreadCreate() 25, 31
ThreadCtl() 25, 131
ThreadDestroy() 25
ThreadDetach() 25
ThreadJoin() 25
threads 21, 186
    all share same kernel interface 137
    and barriers 48
    attributes of 26
    cancellation handlers 26
    concurrency advantages 43
    defined 23
    life cycle 28
    migration, reducing 127
    priority inversion 34
    process must contain one or more 23
    register set 26
    scheduling 31
        FIFO 35
        round-robin 36
        sporadic 37
    signal mask 26
    stack 26
    states 29
    synchronization 42
    tid 26
    TLS (thread local storage) 26
timeout service 97
TimerAlarm() 98
TimerCreate() 98
timer_create() 98
timer_delete() 98
TimerDestroy() 98
timer_getoverrun() 98
timer_gettime() 98
TimerInfo() 98
timers 21, 96
    cyclical mode 97
TimerSettime() 98
timer_settime() 98
TimerTimeout() 98
timeslice 36
TLB (translation look-aside buffer) 145
TLS (thread local storage) 27
TraceEvent() 116
transactions 203
transparency of network 15
typographical conventions xix

U

UART 106
Unicode 342
unlink() 94
UNMAP_INIT_OPTIONAL 92
UNMAP_INIT_REQUIRED 92
UTF-8 encoding 343
utime() 190

V

vfork() 138, 142
virtual addresses 145
virtual consoles 234


W

WAITCTX (thread state) 31
WAITPAGE (thread state) 31
WAITTHREAD (thread state) 31
watchdog 146
web server 276
wide emitting mode (instrumented kernel) 113
widgets (Photon) 347–364
window manager 346
windows, dragging across network links 341
write() 73, 165

Z

zero-copy architecture (io-net) 242
