EmbdedSysts.doc


Brief document for M.Tech DECS students.


UNIT-I INTRODUCTION

EMBEDDED SYSTEMS OVERVIEW

EMBEDDED HARDWARE UNITS

EMBEDDED SOFTWARE IN A SYSTEM

EMBEDDED SYSTEMS ON CHIP (SOC)

DESIGN PROCESS

CLASSIFICATION OF EMBEDDED SYSTEMS

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

MEMORY DEVICES

COMPONENT INTERFACING

NETWORKS OF EMBEDDED SYSTEMS

COMMUNICATION INTERFACINGS

RS232/UART, RS422/RS485, IEEE 488 BUS

UNIT-III SURVEY OF SOFTWARE ARCHITECTURE

ROUND ROBIN

ROUND ROBIN WITH INTERRUPTS

FUNCTION QUEUE SCHEDULING ARCHITECTURE

SELECTING AN ARCHITECTURE, SAVING MEMORY SPACE

UNIT-IV EMBEDDED SOFTWARE DEVELOPMENT TOOLS

HOST AND TARGET MACHINES

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

INTERRUPT SERVICE ROUTINES

SEMAPHORES

MESSAGE QUEUES

PIPES

UNIT-VI INSTRUCTION SETS

INTRODUCTION

PRELIMINARIES

ARM PROCESSOR

SHARC PROCESSOR

UNIT-VII SYSTEM DESIGN TECHNIQUES

DESIGN METHODOLOGIES

REQUIREMENT ANALYSIS

SPECIFICATIONS

SYSTEM ANALYSIS AND ARCHITECTURE DESIGN

UNIT-VIII DESIGN EXAMPLES

TELEPHONE PBX

INK JET PRINTER

WATER TANK MONITORING SYSTEM

GPRS

PERSONAL DIGITAL ASSISTANTS

SET TOP BOXES

UNIT-I INTRODUCTION

DEFINITION

An embedded system can be defined as a computing device that does a specific, focused job.

Embedded systems usually consist of a processor plus some special hardware, along with embedded software, both designed to meet the specific requirements of the application.

SPECIAL FEATURES

They do a very specific task and cannot be programmed to do different things. The software in embedded systems is always fixed.

They have to work against deadlines: a specific job has to be completed within a specific time.

Resources at their disposal are limited, particularly memory; usually they don't have secondary storage devices. Power is another resource of limited availability.

They need to be highly reliable, and they may need to work in extreme environmental conditions.

Embedded systems targeted at the consumer market are very cost sensitive.

In the area of embedded systems there is a wide variety of processors and operating systems; selecting the appropriate ones is a difficult task.

APPLICATION AREAS

Consumer appliances: digital camera, digital diary, DVD players, electronic toys, remotes for TV and microwave oven, etc.

Office automation: copying machine, fax machine, printer, scanner, etc.

Industrial automation: equipment to measure temperature, pressure, voltage and current; robots to avoid hazardous jobs.

Medical electronics: ECG, EEG, blood pressure measuring devices, X-ray scanners, equipment used for colonoscopy and endoscopy.

Computer networks: bridges, routers, ISDN, ATM, frame relay switches, etc.

Telecommunications: key telephones, ISDN telephones, terminal adapters, web cameras, multiplexers, IP phone, IP gateway, IP gatekeeper.

Wireless technologies: mobile phones, base station controllers, personal digital assistants, palmtops, etc.

Instrumentation: oscilloscopes, spectrum analyzers, logic analyzers, protocol analyzers, etc.

Security: encryption devices and biometric systems; security devices at homes, offices and airports for authentication and verification.

Finance: smart cards, ATMs, etc.

EMBEDDED SYSTEMS OVERVIEW

An embedded system usually consists of custom-built hardware woven around a CPU.

The custom-built hardware also contains memory chips onto which software, called firmware, is loaded.

When represented in a layered architecture, the OS runs over the hardware and the application software runs over the OS.

EMBEDDED HARDWARE UNITS

Central processing unit

ROM and RAM

Input devices such as sensors, A/D converters, keypad

Output devices such as D/A converters, LEDs, LCD

Debug port

Communication interface

Power supply unit

EMBEDDED SOFTWARE IN A SYSTEM

EMBEDDED SYSTEMS ON CHIP (SOC)

A SoC is an embedded system on a VLSI chip that has all the necessary analog and digital circuits, processors and software. A SoC may be embedded with the following components:

Embedded processor GPP or ASIP core

Single purpose processing cores or multiple processors

A network bus protocol core

An encryption function unit

DCT for signal processing applications

Memories

PLDs and FPGA cores

Other logic and analog units

An application of SoC is the mobile phone.

DESIGN PROCESS

In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of:

Requirements: the customer's description of the system they require. Requirements may be functional or nonfunctional; the second category includes performance, cost, physical size and weight, power consumption, etc.

Specifications: the contract between the customer and the architect, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave.

Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system has to be built is addressed, and the details of the system internals and components begin to take shape.

Components: here the design of components, including both software and hardware modules, takes place.

System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and any bugs found are addressed.

CLASSIFICATION OF EMBEDDED SYSTEMS

Stand alone ESs

They work in stand-alone mode: they take inputs, process them and produce the desired output. Embedded systems in automobiles and consumer electronic items are examples.

Real time systems

Embedded systems in which specific work has to be done in a specific time period are called real-time (RT) systems.

Hard RTSs are systems which are required to adhere to deadlines strictly.

Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe.

Networked information appliances

These are connected to a network, provided with network interfaces, and are accessible to LANs and the Internet. A web camera connected to the net is an example.

Mobile devices

Mobile phones, personal digital assistants, smart phones, etc. are examples of this category.

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

It is a mechanism by which the CPU communicates with memory and devices

A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory and devices communicate. One of the major roles of a bus is to provide an interface to memory.

HANDSHAKE

The basic building block of a bus protocol is the four-cycle handshake.

It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.

It uses a pair of dedicated wires for the handshake: enq and ack. Extra wires are used for data transmission during the handshake.

The four cycles are

o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.

o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point devices 1 and 2 can transmit or receive.

o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.

o After seeing that ack has been released, device 1 lowers its output.
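The four cycles above can be traced in a small simulation. This is an illustrative sketch only; the boolean wires and the recorded event trace are modeling assumptions, not part of any real bus.

```python
# Illustrative simulation of the four-cycle handshake described above.
# The enq/ack wires are modeled as booleans; 'events' records each cycle.

def four_cycle_handshake(data):
    enq = ack = False
    events = []
    # Cycle 1: device 1 raises enq to tell device 2 to get ready.
    enq = True
    events.append(("enq", enq))
    # Cycle 2: device 2 raises ack; data can now be transferred.
    ack = True
    events.append(("ack", ack))
    received = data  # data moves over the extra data wires
    # Cycle 3: device 2 lowers ack once the data has been received.
    ack = False
    events.append(("ack", ack))
    # Cycle 4: device 1 lowers enq after seeing ack released.
    enq = False
    events.append(("enq", enq))
    return received, events

value, trace = four_cycle_handshake(0x5A)
print(value, trace)
```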

The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer.

Major components of the typical bus structure that supports read and write are

o Clock provides synchronization to the bus components

o RW is true when the bus is reading and false when the bus is writing

o Address is an a-bit bundle of signals that transmits the address for an access

o Data is an n-bit bundle of signals that can carry data to or from the CPU

o Data ready signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers.

In the burst read transaction the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address.

One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst.

Releasing the burst signal tells the device that enough data has been transmitted.
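As a sketch of the idea (the memory map and word values here are invented for illustration), a burst read can be modeled as one starting address expanded into successive locations:

```python
# Sketch (not tied to any real bus) of a burst read: the CPU supplies one
# starting address and receives data from successive memory locations
# until the burst' line is released.

def burst_read(memory, start, length):
    # While burst' stays asserted, data keeps coming from consecutive
    # addresses; releasing it after 'length' words ends the transaction.
    return [memory[start + i] for i in range(length)]

mem = {0x100: 11, 0x101: 22, 0x102: 33, 0x103: 44}
print(burst_read(mem, 0x100, 3))  # successive locations from one address
```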

Disconnected transfers: in these buses the request and response are separate.

A first operation requests the transfer. The bus can then be used for other operations. The transfer is completed later, when the data are ready.

DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.

A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below:

o Higher speed buses may provide wider data connections.

o A high speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus.

o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations.

ARM Bus

o ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors.

o The AMBA bus supports CPUs, memories and peripherals integrated in a system on silicon (SoS).

o The AMBA specification includes two buses: AHB (AMBA high-performance bus) and APB (AMBA peripherals bus).

o AHB: it is optimized for high-speed transfers and is directly connected to the CPU.

o It supports several high-performance features: pipelining, burst transfers, split transactions and multiple bus masters.

o APB: it is simple and easy to implement, and consumes relatively little power.

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller.

o It does not perform pipelined operations, which simplifies the bus logic.

SHARC Bus

o It contains both program and data memory on-chip.

o There are two external interfaces of interest: the external memory interface and the host interface.

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data.

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access.

o Different units in the processor have different amounts of access to external memory: the DM bus and I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords.

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory.

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting.

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional.

o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers also.

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs:

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled.

o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM.

o Adrs specifies the address for the read or write.

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs.
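The four SRAM inputs can be modeled as a toy Python class. This is a behavioral sketch, not a timing-accurate model; the 256-word size is an assumption.

```python
# Toy model of the SRAM control inputs listed above (CE', R/W', Adrs, Data).
# Active-low signals are represented as 0 = asserted, 1 = deasserted.

class SRAM:
    def __init__(self, size):
        self.cells = [0] * size

    def access(self, ce_n, rw_n, adrs, data=None):
        if ce_n == 1:            # chip disabled: data pins float
            return None
        if rw_n == 1:            # read: data pins are outputs
            return self.cells[adrs]
        self.cells[adrs] = data  # write: data pins are inputs
        return None

ram = SRAM(256)
ram.access(ce_n=0, rw_n=0, adrs=0x10, data=0xAB)  # write
print(ram.access(ce_n=0, rw_n=1, adrs=0x10))      # reads back 171 (0xAB)
print(ram.access(ce_n=1, rw_n=1, adrs=0x10))      # chip disabled -> None
```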

DRAM inputs and refresh

o They have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; the signals are needed because address lines are provided for only half the address.

o DRAMs must be refreshed because the values they store can leak away. A single refresh request can refresh an entire row of the DRAM.

o CAS-before-RAS refresh

It is a special quick refresh mode.

This mode is initiated by setting CAS' to 0 first, then RAS' to 0.

It causes the current memory row to get refreshed and the corresponding counter updated.

A memory controller is a logic circuit, external to the DRAM, that performs CAS-before-RAS refresh on the entire memory at the required rate.
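The row/column multiplexing described above can be sketched as follows. The 16-bit address and the 8-row/8-column split are assumptions for illustration; real parts differ.

```python
# Hedged sketch of DRAM RAS'/CAS' address multiplexing: the address pins
# carry only half the address, so a full address is split into a row part
# (latched on RAS') and a column part (latched on CAS').

ROW_BITS = 8  # assumed: 16-bit address split into 8 row + 8 column bits

def split_address(addr):
    row = addr >> ROW_BITS               # presented first, latched by RAS'
    col = addr & ((1 << ROW_BITS) - 1)   # presented next, latched by CAS'
    return row, col

def join_address(row, col):
    return (row << ROW_BITS) | col

row, col = split_address(0xA57F)
print(hex(row), hex(col))  # 0xa5 0x7f
```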

Page mode

o Developed to improve the performance of DRAM.

o Useful to access several locations in the same region of the memory.

o In page mode access, one row address and several column addresses are supplied.

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses.

o It is typically supported for both reads and writes.

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode.

Synchronous DRAMs

o It was developed to improve the performance of DRAMs by introducing a clock.

o Changes to the inputs and outputs of the DRAM occur on clock edges.

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor.

RAMBUS

o It offers high performance at a relatively low cost.

o It has multiple memory banks that can be addressed in parallel.

o It has separate data and control buses.

o It is capable of sustained data rates well above 1 Gbyte/sec.

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation-induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs.

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners.

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed.

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s.

While using the floating-gate principle, it is designed such that large blocks of memory can be erased all at once.

It uses standard system voltage for erasing and programming, allowing programming in a typical system.

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage.

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones and medical monitoring equipment.

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.
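The read-modify-write drawback described above can be illustrated with a small simulation. The 8-word block size and the list-based flash model are assumptions for the sketch.

```python
# Illustration of the drawback noted above: updating one word in flash
# means reading the whole block, changing the word, erasing the block
# (all cells go to 1s), and writing the block back.

BLOCK_WORDS = 8  # hypothetical erase-block size

def write_word(flash, addr, value):
    base = (addr // BLOCK_WORDS) * BLOCK_WORDS
    block = flash[base:base + BLOCK_WORDS]    # read the entire block
    block[addr - base] = value                # update the word within it
    flash[base:base + BLOCK_WORDS] = [0xFF] * BLOCK_WORDS  # erase (all 1s)
    flash[base:base + BLOCK_WORDS] = block    # write the block back

flash = list(range(16))
write_word(flash, 10, 99)
print(flash[8:16])  # [8, 9, 99, 11, 12, 13, 14, 15]
```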

IO devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:

DRAM's RAS/CAS multiplexing

Need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it is not designed.

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus, used in microcontroller-based systems

CAN (controller area network): developed for automotive electronics, it provides megabit rates and can handle a large number of devices

Echelon LON network, used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol.

This protocol enables peripheral ICs to communicate with each other using simple communication hardware.

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.

Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.

The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF.

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data.

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions.

It is a well-known bus to link microcontrollers with a system. It is low cost, easy to implement and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line.

Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously.

o 11110XX is reserved for the extended 10-bit addressing scheme.

Bus transaction: it is comprised of a series of one-byte transmissions: an address followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave, and 1 for reading from slave to master.

The bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
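A bus transaction as described (start condition, address byte with direction bit, data bytes, stop condition) can be sketched as a list of symbolic events. The device address 0x50 is just an example (a typical EEPROM address); the frame representation is an assumption of this sketch.

```python
# Hedged sketch of an I2C bus transaction: start condition, 7-bit address
# plus direction bit, one or more data bytes, then a stop condition.

def address_byte(addr7, read):
    # 7-bit address in the upper bits, direction in bit 0
    # (1 = read from slave, 0 = write to slave)
    return (addr7 << 1) | (1 if read else 0)

def transaction(addr7, read, data):
    frame = ["START", address_byte(addr7, read)]
    frame += list(data)
    frame.append("STOP")
    return frame

# Write 0x42 to the device at example address 0x50.
print(transaction(0x50, read=False, data=[0x42]))
```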

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also.

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities.

Common applications other than automobiles include elevator controllers, copiers, telescopes, production line control systems and medical instruments.

The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules: how to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.

It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used.

The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar.

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion.

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1.

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.

The data field is from 0 to 8 bytes (64 bits), depending on the value given in the control field.

A cyclic redundancy check (CRC) is sent after the data field for error detection.

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit ('1') in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value ('0'). If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
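The wired-AND arbitration just described can be simulated directly: each contender shifts out its identifier bit by bit, the bus value is the AND of all transmitted bits, and a node drops out when it sends a recessive 1 but hears a dominant 0. The identifiers below are invented for illustration.

```python
# Simulation of CSMA/AMP arbitration on the wired-AND CAN bus:
# 0 (dominant) wins over 1 (recessive), and a node that hears a dominant
# bit while sending a recessive one stops transmitting.

def arbitrate(identifiers, bits=11):
    contenders = list(identifiers)
    for i in range(bits - 1, -1, -1):          # MSB of the identifier first
        sent = [(ident >> i) & 1 for ident in contenders]
        bus = min(sent)                        # wired-AND: any 0 pulls the bus dominant
        contenders = [ident for ident, b in zip(contenders, sent) if b == bus]
    assert len(contenders) == 1                # exactly one transmitter remains
    return contenders[0]

# Lower identifier = higher priority; the all-0 identifier always wins.
print(arbitrate([0b00000000011, 0b00000000101, 0b00000000110]))  # 3
```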

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS

These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).

Data terminal equipment (DTE) can be a PC, serial printer or plotter; data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer or scanner.

Communication between the two devices is full duplex, i.e. data transfer can take place in both directions.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted.

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved.

Possible data rates depend upon the UART chip and the clock used.

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.

Data rate: it represents the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7 or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.

Parity bit: it is added for error checking on the receiver side.

Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission; it is also known as handshake. It can be hardware type or software type.
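The framing parameters above can be combined in a small sketch that builds the bit stream for one character. Even parity and one stop bit are chosen here as an example configuration; LSB-first transmission is the usual UART convention.

```python
# Sketch of asynchronous framing: start bit, data bits (LSB first),
# optional parity bit, then stop bit(s). Parameters are one example
# configuration (8 data bits, even parity, 1 stop bit).

def frame_char(ch, data_bits=8, parity="even", stop_bits=1):
    bits = [0]                                              # start bit
    data = [(ord(ch) >> i) & 1 for i in range(data_bits)]   # LSB first
    bits += data
    if parity == "even":
        bits.append(sum(data) % 2)   # makes the total number of 1s even
    bits += [1] * stop_bits          # stop bit(s): line returns to idle
    return bits

print(frame_char("A"))  # 'A' = 0x41
```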

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.

UART chip: universal asynchronous receiver/transmitter chip.

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel format to serial format.

It also adds start, stop and parity bits.

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it is able to make RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices (up to 512) in a network.

An RS485 controller chip is used on each device.

The network with the RS485 protocol uses the master-slave configuration.

With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: this method is adopted to determine the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 in its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 in its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 in its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
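The PI/PO rules above can be modeled as a walk down the chain. The device order, request flags and VAD values are invented for illustration.

```python
# Model of the daisy-chain priority logic described above: the CPU's
# acknowledge enters the first device's PI; the first requesting device
# with PI=1 blocks it (PO=0) and supplies its vector address (VAD).

def daisy_chain(requests, vads):
    pi = 1  # acknowledge from the CPU enters the highest-priority device
    for req, vad in zip(requests, vads):
        if pi == 1 and req:
            return vad   # this device intercepts the acknowledge (PO=0)
        # a non-requesting device passes the acknowledge along (PO = PI)
    return None          # no device was requesting an interrupt

# Devices listed highest priority first; devices 2 and 3 both request,
# so device 2 (the nearer one) wins and supplies its VAD.
print(daisy_chain([False, True, True], [0x10, 0x20, 0x30]))
```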

The slowest device participating in the control and data-transfer handshakes determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshaking, and five for bus management) plus eight ground-return lines.

In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now IEEE 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like.

The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even between models. A standard for device commands, SCPI, was introduced in the 1990s. Because of its late introduction, it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI Standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

The Commodore PET/CBM range of personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures

1. Round robin
2. Round robin with interrupts
3. Function-queue scheduling
4. Real-time operating system (RTOS)

Choosing an Architecture

The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin

1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin

Pros: simple; no shared data; no interrupts

Cons:

o Maximum delay is the maximum time to traverse the loop, if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs a fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality; adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service the hardware and (b) set flags
o Main routine checks the flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities

Round Robin with Interrupts

Pros:

o Still relatively simple
o Hardware timing requirements better met

Cons:

o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which flags are checked (e.g., check A between every other device: A, B, A, C, A, D)
o Improves the response of A
o Increases the latency of other tasks
o Move some task code into the interrupt
o Decreases the response time of lower-priority interrupts
o May not be able to ensure that lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls

Pros:

o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve the best response time by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)

Real Time Operating System

Architecture:

o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences from the previous architectures:
   o We don't write signaling flags (the RTOS takes care of it)
   o No loop in our code decides what is executed next (the RTOS does this)
   o The RTOS knows the relative task priorities and controls what is executed next
   o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems

Pros:

o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools

Cons:

o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks.

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can be created using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously).

The scheduler is concerned mainly with

o Throughput - number of processes that complete their execution per time unit

o Latency, specifically:

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

o Fairness / waiting time - equal CPU time for each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), so a scheduler will implement a suitable compromise. Preference is given to any one of the concerns mentioned above, depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process that has a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent, packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order in which they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high for the same reason.

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process at a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not on the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated against each other at the same time. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

        LDR r0,[r8]     ; a comment

label   ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are the arithmetic status register (ASTAT), the sticky register (STKY), and mode register 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared by subsequent operations. STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not in the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The distance to shift supplied by the Ry register may be positive for left shift or negative for right shift Shift operation is set the SZ(shifter zero) SV(shifter overflow) and SS(shifter input sign) bits in the ASTAT register

The SHARC is a load-store architecture-operands must be loaded into registers before operating on them SHARC supplies two special registers that are used to control loading and storing They are called DATA ADDRESS GENERATORS(DAGs) one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAGI while registers 8 through 15 belong to DAG2

The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset.

The DAGs also support circular buffers which are commonly used in signal processing
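Circular-buffer addressing can be modeled in C as a post-modify update that wraps the index back into the buffer (a hypothetical sketch; the field names mirror the DAG's base, length, and index registers):

```c
/* Model of DAG circular-buffer addressing: after each post-modify
   update, the index wraps so it always stays inside a buffer that
   starts at 'base' and is 'len' words long. */
typedef struct {
    int base;   /* B register: start of buffer  */
    int len;    /* L register: buffer length    */
    int index;  /* I register: current position */
} circ_dag;

/* Return the current address, then advance the index by modifier m. */
int circ_next(circ_dag *d, int m)
{
    int addr = d->index;
    d->index += m;
    while (d->index >= d->base + d->len) d->index -= d->len; /* wrap forward  */
    while (d->index <  d->base)          d->index += d->len; /* wrap backward */
    return addr;
}
```

Sweeping a 4-word buffer at address 100 with modifier 3 visits 100, 103, 102, 101 and then repeats, which is exactly the access pattern used by digital filters.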

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority ready task (in a preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: a counting semaphore, which can take non-negative integer values, and a binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
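The acquire/release semantics behind these calls can be illustrated with a toy counting semaphore in C (a single-threaded model with hypothetical names; a real RTOS would block the calling task when the count is 0 rather than return failure):

```c
/* Toy counting semaphore: illustrates acquire/release semantics only.
   The count tracks how many units of a resource remain available. */
typedef struct { int count; } counting_sem;

void sem_create(counting_sem *s, int initial) { s->count = initial; }

/* Returns 1 on success, 0 if no unit is available (a real RTOS
   would block the task here instead of returning). */
int sem_acquire(counting_sem *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

void sem_release(counting_sem *s) { s->count++; }
```

A binary semaphore is the special case created with an initial count of 1.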

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
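A minimal fixed-length message queue can be sketched in C (a hypothetical model of the mailbox-array view above; real RTOS queue calls additionally manage the waiting lists and blocking):

```c
#define QLEN 4   /* queue length, fixed at creation time */

/* Minimal FIFO message queue: tasks or ISRs deposit messages with
   mq_send, and a waiting task removes them with mq_receive. */
typedef struct {
    int msgs[QLEN];
    int head, tail, count;
} msg_queue;

int mq_send(msg_queue *q, int msg)      /* 0 on success, -1 if full  */
{
    if (q->count == QLEN) return -1;
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;     /* wrap around the array     */
    q->count++;
    return 0;
}

int mq_receive(msg_queue *q, int *msg)  /* 0 on success, -1 if empty */
{
    if (q->count == 0) return -1;
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 0;
}
```

Messages come out in the order they were deposited, which is the behavior needed for keyboard input or packet transmission buffers.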

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
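The POSIX pipe API gives a concrete analogue of these calls (RTOS pipe APIs differ in names and details, but the create/write/read model is the same):

```c
#include <unistd.h>
#include <string.h>

/* One end of the pipe is written (producer task), the other read
   (consumer task), giving byte-stream inter-task communication.
   Returns the number of bytes received, or -1 on error. */
int demo_pipe(char *out, size_t outlen)
{
    int fd[2];
    if (pipe(fd) != 0) return -1;               /* create the pipe      */

    const char *msg = "sensor:42";
    if (write(fd[1], msg, strlen(msg) + 1) < 0) /* producer writes      */
        return -1;

    ssize_t n = read(fd[0], out, outlen);       /* consumer reads       */
    close(fd[0]);
    close(fd[1]);
    return (int)n;
}
```

Here both ends live in one function for demonstration; in a real system the two file descriptors would be held by different tasks.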

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while still improving performance, a number of non-RISC features are introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that load/store up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features result in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, but it can be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags control the masking of normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if a fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in either little-endian or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations are carried out via multiple register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o The instructions are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run. The RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or several tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
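A toy C model makes the shared-data problem concrete. The "interrupt" here is simulated by a function call placed between two non-atomic reads, and all names are hypothetical:

```c
#include <stdint.h>

/* Shared data, updated by the (simulated) ISR and read by task code. */
static uint8_t hours = 0, minutes = 59;

/* Pretend timer interrupt: time rolls over from 0:59 to 1:00. */
static void tick_isr(void)
{
    minutes = 0;
    hours   = 1;
}

/* Non-atomic read: the interrupt may fire between the two loads,
   so the task can observe a time that never existed. */
int read_time_torn(void)
{
    int h = hours;      /* reads the old hour (0)        */
    tick_isr();         /* interrupt fires right here     */
    int m = minutes;    /* reads the new minute (0)       */
    return h * 60 + m;  /* 0:00, neither 0:59 nor 1:00    */
}
```

Disabling interrupts around the two loads, making the section atomic, would eliminate the torn value.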

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
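The first rule can be illustrated by contrasting a non-reentrant function with a reentrant one (hypothetical example functions):

```c
/* Non-reentrant: keeps its running total in a static variable, a
   hidden piece of shared state. Two tasks calling this would corrupt
   each other's sums when the RTOS switches between them. */
int sum_static(int x)
{
    static int total = 0;   /* shared across all callers */
    total += x;
    return total;
}

/* Reentrant: the caller owns the accumulator (typically on its own
   task stack), so any number of tasks can use it independently. */
int sum_reentrant(int *total, int x)
{
    *total += x;
    return *total;
}
```

In the reentrant version each task passes its own accumulator, so the per-task sums stay independent; in the static version the totals of different callers bleed into each other.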

Task states under an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other one gets the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
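The scheduler's core decision, picking the highest-priority non-blocked task, can be sketched as follows (the data structures are hypothetical; real RTOSs typically keep per-priority ready queues instead of scanning an array):

```c
/* Task states as described above. */
enum state { BLOCKED, READY, RUNNING };

/* Higher number = higher priority (a convention chosen for this sketch). */
struct task { int priority; enum state st; };

/* Return the index of the highest-priority schedulable task,
   or -1 when every task is blocked (the scheduler then idles). */
int scheduler_pick(const struct task *t, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (t[i].st != BLOCKED &&
            (best < 0 || t[i].priority > t[best].priority))
            best = i;
    return best;
}
```

Note that the currently running task competes too: if it is still the highest-priority non-blocked task, it simply keeps the processor.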

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements: analyzes and determines the basic characteristics of the system

o Architecture: decomposes the functionality into major components

o Coding: implements the components and integrates them

o Testing: uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of suppliers' capabilities.

Early and continual customer focus helps ensure that the product best meets customers' needs.


Unit -I INTRODUCTION

DEFINITION

It can be defined as a computing device that does a specific, focused job.

Embedded systems usually consist of a processor plus some special hardware, along with embedded software, both designed to meet the specific requirements of the application.

SPECIAL FEATURES

They do a very specific task and cannot be programmed to do different things. The software in embedded systems is always fixed.

They have to work against deadlines: a specific job has to be completed within a specific time.

Resources at their disposal are limited, particularly memory; usually they don't have secondary storage devices. Power is another resource of limited availability.

They need to be highly reliable, and they may need to work in extreme environmental conditions.

Embedded systems targeted at the consumer market are very cost sensitive.

In the area of embedded systems there is a wide variety of processors and operating systems, and selecting the appropriate ones is a difficult task.

APPLICATION AREAS

Consumer appliances: digital cameras, digital diaries, DVD players, electronic toys, remotes for TV and microwave oven, etc.

Office automation: copying machines, fax machines, printers, scanners, etc.

Industrial automation: equipment to measure temperature, pressure, voltage, and current; robots to avoid hazardous jobs

Medical electronics: ECG, EEG, blood pressure measuring devices, X-ray scanners, equipment used for colonoscopy and endoscopy

Computer networks: bridges, routers, ISDN, ATM, frame relay switches, etc.

Telecommunications: key telephones, ISDN telephones, terminal adapters, web cameras, multiplexers, IP phones, IP gateways, IP gatekeepers

Wireless technologies: mobile phones, base station controllers, personal digital assistants, palmtops, etc.

Instrumentation: oscilloscopes, spectrum analyzers, logic analyzers, protocol analyzers, etc.

Security: encryption devices and biometric systems; security devices at homes, offices, and airports for authentication and verification

Finance: smart cards, ATMs, etc.

EMBEDDED SYSTEMS OVERVIEW

An ES usually consists of custom-built hardware woven around a CPU.

The custom-built hardware also contains memory chips onto which software, called firmware, is loaded.

When represented in a layered architecture, the OS runs over the hardware and the application software runs over the OS.

EMBEDDED HARDWARE UNITS

Central processing unit

ROM and RAM

Input devices such as sensors, A/D converters, keypad

Output devices such as D/A converters, LEDs, LCD

Debug port

Communication interface

Power supply unit

EMBEDDED SOFTWARE IN A SYSTEM

EMBEDDED SYSTEMS ON CHIP (SOC)

An SoC is an ES on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. An SoC may be embedded with the following components:

Embedded processor GPP or ASIP core

Single purpose processing cores or multiple processors

A network bus protocol core

An encryption function unit

DCT for signal processing applications

Memories

PLDs and FPGA cores

Other logic and analog units

An application of SoC is the mobile phone.

DESIGN PROCESS

In the top-down view, we start with the most abstract description of the system and conclude with concrete details. It consists of:

Requirements: the customer's description of the system they require. Requirements may be functional or non-functional; the second category includes performance, cost, physical size and weight, power consumption, etc.

Specifications: serves as the contract between the customer and the architects, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave.

Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape.

Components: here the design of components, including both software and hardware modules, takes place.

System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and the bugs found are addressed.

CLASSIFICATION OF EMBEDDED SYSTEMS

Stand alone ESs

They work in stand-alone mode: they take inputs, process them, and produce the desired output. ESs in automobiles and consumer electronic items are examples.

Real time systems

Embedded systems in which specific work has to be done in a specific time period are called RT systems.

Hard RTSs are systems that are required to adhere to deadlines strictly.

Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe.

Networked information appliances

These are connected to a network, provided with network interfaces, and accessible to LANs and the Internet. A web camera connected to the net is an example.

Mobile devices

Mobile phones Personal Digital Assistants smart phones etc are examples for this category

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

It is the mechanism by which the CPU communicates with memory and devices.

A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory.

HANDSHAKE

The basic building block of a bus protocol is the four-cycle handshake.

It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.

It uses a pair of dedicated wires for the handshake: enq (enquiry) and ack (acknowledge). Extra wires are used for the data transmitted during the handshake.

The four cycles are:

o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.

o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point, devices 1 and 2 can transmit or receive.

o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.

o After seeing that ack has been released, device 1 lowers its output.
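The four cycles above can be modeled as a small state machine in C (an illustrative sketch of the line transitions, not real bus logic):

```c
#include <stdbool.h>

/* Model of the four-cycle handshake: enq and ack each rise and then
   fall in the order described above. */
struct handshake { bool enq, ack; int cycle; };

/* Advance one cycle; returns true while the handshake is in progress. */
bool handshake_step(struct handshake *h)
{
    switch (h->cycle++) {
    case 0: h->enq = true;  return true;  /* device 1 raises enq        */
    case 1: h->ack = true;  return true;  /* device 2 ready: data moves */
    case 2: h->ack = false; return true;  /* device 2 got the data      */
    case 3: h->enq = false; return false; /* device 1 releases enq      */
    }
    return false;
}
```

After the fourth step both lines are low again, which is what allows the next transfer to begin from a known idle state.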

The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer.

The major components of the typical bus structure that supports read and write are:

o Clock: provides synchronization to the bus components

o R/W': true when the bus is reading and false when the bus is writing

o Address: an a-bit bundle of signals that transmits the address for an access

o Data: an n-bit bundle of signals that can carry data to or from the CPU

o Data ready: signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers.

In a burst read transaction, the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address.

One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst.

Releasing the burst signal tells the device that enough data has been transmitted.

Disconnected transfers: in these buses the request and response are separate.

A first operation requests the transfer. The bus can then be used for other operations; the transfer is completed later, when the data are ready.

DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.

A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below:

o Higher-speed buses may provide wider data connections

o A high-speed bus usually requires more expensive circuits and connectors; the cost of low-speed devices can be held down by using a lower-speed, lower-cost bus

o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations

ARM Bus

o The buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors.

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS).

o The AMBA specification includes two buses: AHB, the AMBA high-performance bus, and APB, the AMBA peripherals bus.

o AHB: it is optimized for high-speed transfers and is directly connected to the CPU. It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters.

o APB: it is simple and easy to implement, and consumes relatively little power. It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller. It does not perform pipelined operations, which simplifies the bus logic.

SHARC Bus

o It contains both program and data memory on-chip.

o There are two external interfaces of interest: the external memory interface and the host interface.

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data.

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access.

o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords.

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory.

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting.

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels: the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional.

o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers as well.

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled

o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs

DRAM inputs and refresh

o They have two inputs in addition to those of static RAM: row address select (RAS') and column address select (CAS'), which are designed to minimize the number of required pins. These signals are needed because address lines are provided for only half the address
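As a sketch of the multiplexing this describes: the address pins carry the row half of the address while RAS' is asserted and the column half while CAS' is asserted. The geometry below (a hypothetical part with 22 address bits but only 11 address pins) is illustrative only:

```c
#include <stdint.h>

/* Hypothetical DRAM: 22-bit addresses, but only 11 address pins.
   The memory controller presents the row half with RAS' and the
   column half with CAS', so the same pins carry both halves. */
#define ADDR_PINS 11
#define PIN_MASK  ((1u << ADDR_PINS) - 1u)

uint16_t dram_row(uint32_t addr) { return (addr >> ADDR_PINS) & PIN_MASK; }
uint16_t dram_col(uint32_t addr) { return addr & PIN_MASK; }
```

Splitting the address this way is what allows a 4M-location part to get by with an 11-pin address bus.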

o DRAMs must be refreshed because they store values as electrical charge, which can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh

It is a special quick refresh mode

This mode is initiated by setting CAS' to 0 first, then RAS' to 0

It causes the current memory row to be refreshed and the corresponding row counter to be updated

The memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate
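The required rate follows from simple arithmetic. Assuming, for illustration, a part with 4096 rows that must each be refreshed within a 64 ms retention window, the controller has to issue one CAS-before-RAS refresh roughly every 15.6 microseconds:

```c
/* Assumed figures: every one of 'rows' rows must be refreshed within
   a 'retention_ms' window, so refreshes must be issued one row at a
   time at intervals of retention / rows. */
unsigned refresh_interval_ns(unsigned retention_ms, unsigned rows)
{
    return (retention_ms * 1000000u) / rows;   /* interval in nanoseconds */
}
```

With 64 ms and 4096 rows this gives 15625 ns, i.e. about 15.6 us between row refreshes.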

Page mode

o Developed to improve the performance of DRAM

o Useful to access several locations in the same region of the memory

o In a page mode access, one row address and several column addresses are supplied

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode

Synchronous DRAMs

o They were developed to improve the performance of DRAMs by introducing a clock

o Changes to the inputs and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display while the parallel

interface to the microprocessor

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems since a great deal of the code and perhaps

some data does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory (or mask) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners

o Field-programmable ROMs can be of two types: antifuse-programmable ROMs, which are programmable only once, and UV-erasable PROMs, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While it uses the floating-gate principle, it is designed so that large blocks of memory can be erased all at once

It uses standard system voltages for erasing and programming, allowing it to be reprogrammed in a typical system

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory: systems like digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
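This read-update-write-back cycle can be sketched in C. The block size and the RAM-simulated "flash" below are illustrative, not a real flash driver:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 128   /* assumed erase-block size */

/* Simulated flash block; a real part would require an erase command
   before the block could be reprogrammed. */
static uint32_t flash_block[BLOCK_WORDS];

/* Updating one word costs a whole-block cycle: copy the block to a
   RAM shadow, patch the word, erase the block, program it back. */
void flash_write_word(unsigned index, uint32_t value)
{
    uint32_t shadow[BLOCK_WORDS];
    memcpy(shadow, flash_block, sizeof shadow);      /* read entire block */
    shadow[index] = value;                           /* update one word   */
    memset(flash_block, 0xFF, sizeof flash_block);   /* erase to all 1s   */
    memcpy(flash_block, shadow, sizeof shadow);      /* program back      */
}
```

The three bulk operations around the one-word change are exactly why a single-word flash write can be slower than the same write to EEPROM.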

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons:

the DRAM's RAS/CAS multiplexing

the need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus used in microcontroller based systems

CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago; it is a two-wire serial bus protocol

This protocol enables peripheral ICs to communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing to an I2C bus include EPROM, Flash, and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions

It is a well-known bus for linking a microcontroller with its system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from the masters

It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
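The address byte can be assembled with one shift and OR. The helper name below is ours, not part of any I2C library:

```c
#include <stdint.h>

/* First byte of an I2C transaction: the 7-bit slave address in the
   upper bits, followed by the direction bit in bit 0
   (0 = master writes to slave, 1 = master reads from slave). */
uint8_t i2c_addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1u : 0u));
}
```

For example, a device at address 0x50 is addressed as 0xA0 for a write and 0xA1 for a read.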

The bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL

The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

Since it uses bit-serial transmission, CAN can run at rates of up to 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of the data frame: The data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 64 bytes depending on the value given in the control field

A cyclic redundancy check (CRC)is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected no error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows the frame was received; otherwise it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
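This arbitration can be simulated: in each bit time the bus level is the wired-AND of all still-contending identifier bits, and a node that sends recessive (1) but reads dominant (0) withdraws. The function below (assuming distinct identifiers and at most 32 nodes) is a sketch of the mechanism, not driver code:

```c
/* Simulate bit-serial CAN arbitration over the 11 identifier bits,
   MSB first. The bus carries the AND of all active transmitters;
   a node sending recessive (1) while the bus reads dominant (0)
   loses and drops out, so the lowest identifier always wins. */
int can_arbitrate(const unsigned ids[], int n)
{
    int active[32];                            /* assumes n <= 32 */
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = 10; bit >= 0; bit--) {
        unsigned bus = 1;                      /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i]) bus &= (ids[i] >> bit) & 1u;
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1u) != bus)
                active[i] = 0;                 /* sent 1, heard 0: lost */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;               /* index of the winner */
    return -1;
}
```

Running three nodes with identifiers 0x55, 0x23, and 0x100 against each other, the node with 0x23 (the numerically lowest, hence highest-priority, identifier) survives arbitration.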

Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)

A data terminal equipment (DTE) can be a PC, serial printer, or plotter, and a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions simultaneously

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems

Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended

Parity bit: it is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions; it is also known as handshake. It can be of hardware type or software type
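Since each character carries a start bit, data bits, an optional parity bit, and stop bits, the character rate follows directly from the baud rate. A small helper (the parameter mixes used below are example configurations, not mandated by RS232):

```c
/* Characters per second on an asynchronous serial link.
   Each frame = 1 start bit + data bits + parity bits + stop bits,
   so the character rate is the baud rate divided by the frame length. */
unsigned uart_chars_per_sec(unsigned baud, unsigned data_bits,
                            unsigned parity_bits, unsigned stop_bits)
{
    unsigned frame_bits = 1u + data_bits + parity_bits + stop_bits;
    return baud / frame_bits;
}
```

At 115200 baud with 8 data bits, no parity, and 1 stop bit (a 10-bit frame), the link carries 11520 characters per second, i.e. only 80% of the raw bit rate is payload.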

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor

The transmit section takes the data from the processor and converts it from parallel format to serial format

It also adds the start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices (up to 512) in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication is possible; with two twisted pairs, full-duplex communication can be achieved

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy Chaining: This method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation.

The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority.
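The PI/PO rule can be condensed into a few lines of C. The request flags and chain length here are illustrative:

```c
/* Daisy-chain priority resolution. The CPU's acknowledge enters
   device 0's PI input and ripples down the chain: each device
   forwards PO = PI AND (not requesting). The device seeing PI = 1
   while holding PO = 0 is the highest-priority requester and would
   place its vector address (VAD) on the data bus. */
int daisy_chain_winner(const int requesting[], int n)
{
    int pi = 1;                       /* acknowledge enters the chain */
    for (int i = 0; i < n; i++) {
        int po = pi && !requesting[i];
        if (pi && !po)
            return i;                 /* PI = 1, PO = 0: this device wins */
        pi = po;                      /* ripple to the next device */
    }
    return -1;                        /* no device was requesting */
}
```

With devices 1 and 2 requesting (and device 0 idle), device 1 wins, matching the rule that the device closest to the CPU has the highest priority.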

The slowest device participating in the control and data transfer handshakes determines the speed of the transaction

The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines

In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to this late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures

1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. Real-time operating system (RTOS)

Choosing an Architecture: The best architecture depends on several factors

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority

Round Robin

1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin

Pros: simple; no shared data; no interrupts

Cons:

o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced

o The architecture fails if any one device requires a shorter response time

o Most I/O needs a fast response time (buttons, serial ports, etc.)

o Lengthy processing adversely affects even soft time deadlines

o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin Uses

o Simple devices: watches, possibly microwave ovens

o Devices where operations are all user initiated and process quickly
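The main-loop structure can be sketched as a single polling pass; in a real system this pass runs inside an endless for(;;) loop, and the flags would be hardware status bits rather than the simulated variables used here:

```c
#include <stdbool.h>

/* Simulated device-ready flags; on real hardware these would be
   status registers polled by the loop. */
static bool flag[3];
static int  serviced[3];          /* counts services, for illustration */

static void service(int i)
{
    serviced[i]++;                /* placeholder for real device work */
    flag[i] = false;
}

/* One pass of the round-robin loop: check each device in fixed order
   and service whichever needs it. No interrupts, no priorities. */
void round_robin_pass(void)
{
    for (int i = 0; i < 3; i++)
        if (flag[i])
            service(i);
}
```

The fixed order makes the worst-case delay for any device equal to one full traversal of the loop with every device needing service, which is exactly the weakness listed above.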

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service the hardware and (b) set flags

o The main routine checks the flags and does any lower-priority follow-up processing

o Why? It gives more control over priorities

Round Robin with Interrupts

Pros:

o Still relatively simple

o Hardware timing requirements are better met

Cons:

o All task code still executes at the same priority

o Maximum delay is unchanged

o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which flags are checked (e.g. ABABAD): improves the response of A but increases the latency of the other tasks

o Move some task code into the interrupt: decreases the response time of lower-priority interrupts, but it may not be possible to ensure that the lower-priority interrupt code executes fast enough
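A minimal sketch of the ISR-plus-flag split, using a hypothetical ADC as the device; the "urgent" and "follow-up" work shown are placeholders:

```c
#include <stdbool.h>

/* Data shared between the ISR and the main loop must be volatile. */
static volatile bool adc_ready;
static volatile int  adc_value;

static int processed_value;       /* result of the follow-up processing */

/* Would be registered as the ADC interrupt handler on real hardware:
   do only the urgent work, then set a flag for the main loop. */
void adc_isr(int sample)
{
    adc_value = sample;           /* grab the data before it is lost */
    adc_ready = true;             /* defer the slow work */
}

/* One pass of the main loop: check the flag, do the follow-up
   processing at task (non-interrupt) priority. */
void main_loop_pass(void)
{
    if (adc_ready) {
        adc_ready = false;
        processed_value = adc_value * 2;   /* placeholder computation */
    }
}
```

The hardware deadline is met inside the ISR, while the lengthy processing runs at loop priority, which is the whole point of this architecture.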

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue

o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)

o Better response time for the highest-priority task = length of the longest function code

o Best response time can be improved by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
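A sketch of the queue itself: an ISR enqueues pointers to follow-up work, and the main routine pops and calls them. The FIFO ring buffer and queue depth are illustrative choices; a real ISR/main-loop pair would also need to guard head and tail against concurrent access:

```c
#define QSIZE 8                      /* illustrative queue depth */
typedef void (*task_fn)(void);

static task_fn queue[QSIZE];
static int head, tail;

/* Called from an ISR: append a follow-up function to the queue. */
int enqueue_task(task_fn f)
{
    int next = (tail + 1) % QSIZE;
    if (next == head) return -1;     /* queue full */
    queue[tail] = f;
    tail = next;
    return 0;
}

/* Called from the main loop: pop one function and run it.
   A priority-based scheme could scan the queue instead of
   taking the head (this FIFO version is the simplest choice). */
void run_one_task(void)
{
    if (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QSIZE;
        f();
    }
}

/* Example tasks that just count their invocations. */
static int a_runs, b_runs;
static void task_a(void) { a_runs++; }
static void task_b(void) { b_runs++; }
```

After enqueueing task_a then task_b, two calls to run_one_task() execute them in FIFO order.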

Real Time Operating System

Architecture

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for task code

o Differences from the previous architectures:

We don't write signaling flags (the RTOS takes care of them)

No loop in our code decides what is executed next (the RTOS does this)

The RTOS knows the relative task priorities and controls what is executed next

The RTOS can suspend a task in the middle to execute code of higher priority

o Now we can control task response AND interrupt response

Real Time Operating Systems

Pros:

o Worst-case response time for the highest-priority function is zero

o The system's high-priority response time is relatively stable when extra functionality is added

o Useful functionality comes pre-written

o RTOSs generally come with vendor tools

Cons:

o An RTOS has a cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = total time to complete 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms
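The number of allocations in the example is just the ceiling of the job's total time over the quantum:

```c
/* Number of time slices a job needs: ceiling of total time over the
   quantum length. For the example above, 250 ms at a 100 ms quantum
   gives 3 allocations (the last one only partly used). */
unsigned slices_needed(unsigned total_ms, unsigned quantum_ms)
{
    return (total_ms + quantum_ms - 1) / quantum_ms;   /* ceiling division */
}
```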

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can be created by using semaphores

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)

The scheduler is concerned mainly with:

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround time - total time between submission of a process and its completion

o Response time - amount of time from when a request is submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), so a scheduler implements a suitable compromise. Preference is given to any one of the above concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks may also be sent to remote devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates which processes are to run on a system, the degree of concurrency to be supported at any one time (i.e., whether many or few processes are to be executed concurrently), and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms. In these cases, special-purpose job scheduler software is typically used to assist these functions, in addition to any underlying admission-scheduling support in the operating system.

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, that has a low priority, that is page-faulting frequently, or that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembly language, because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

Switching context

Switching to user mode

Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooling), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come, first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First-Come, First-Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reason.

No prioritization occurs, so this scheme has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than under FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible when large numbers of high-priority processes are queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor; waiting time depends on the number of processes, not on average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview:

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks, such as string searching; they also generally use a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which in turn minimizes the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions. The instructions are also chosen so that they can be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations.

ARM PROCESSOR

The ARM is a family of RISC architectures.

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example:

        LDR  r0,[r8]      ; a comment
label   ADD  r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

The SHARC is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example:

        R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label:  R3=R1+R2;

The SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction is 48 bits long, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data:

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex.

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits, but are not cleared with them. STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets, called data address generators (DAGs), that are used to control loading and storing: one for data memory and the other for program memory.

Each DAG has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, a non-preemptive kernel returns the CPU to the interrupted task; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority ready task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a high-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.

SEMAPHORES

A semaphore is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: counting semaphores, whose value can be any non-negative integer, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls:

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking input from a keyboard

o Displaying output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue can take the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls:

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements that improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o A load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features result in high performance, small code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are:

R15 - the program counter, although it can be manipulated as a general-purpose register

R13 - used as the stack pointer

R14 - has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) - an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code; once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if a fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all operating modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

ARM has two instruction sets: 32-bit ARM and 16-bit THUMB.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply-accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o The instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences from ARM:

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB:

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared data problem.

Shared data problem: it arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic.

Atomic section: a part of a program that cannot be interrupted.
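The shared data problem and its atomic-section fix can be sketched in C. The names here (timer_isr, read_time_atomic) and the interrupt-enable flag are illustrative assumptions, not part of any particular RTOS:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: an interrupt routine maintains a two-part count
   that the task code must read as one consistent pair. */
static volatile uint16_t seconds_hi = 0, seconds_lo = 0; /* shared data */
static int interrupts_enabled = 1;         /* stand-in for a hardware flag */

static void disable_interrupts(void) { interrupts_enabled = 0; }
static void enable_interrupts(void)  { interrupts_enabled = 1; }

/* Interrupt routine: updates both halves together. */
void timer_isr(void) { seconds_lo++; if (seconds_lo == 0) seconds_hi++; }

/* Atomic section: the task code guards the two reads so the interrupt
   routine cannot run between them and leave the pair inconsistent. */
uint32_t read_time_atomic(void) {
    disable_interrupts();
    uint32_t t = ((uint32_t)seconds_hi << 16) | seconds_lo;
    enable_interrupts();
    return t;
}
```

Without the disable/enable pair, an interrupt arriving between the two reads could increment seconds_lo past zero and seconds_hi, so the task would combine a new high half with an old low half.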

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
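These rules can be illustrated with a minimal C sketch; the function names are hypothetical:

```c
#include <assert.h>

static int total_errors = 0;   /* shared, non-stack data */

/* NON-reentrant: updates shared data non-atomically. If the RTOS
   switches tasks between the read and the write of total_errors,
   one task's count can be lost. */
void log_error_bad(void) { total_errors = total_errors + 1; }

/* Reentrant: uses only its parameter and a stack variable, so any
   number of tasks may be inside it at once without interference. */
int celsius_to_fahrenheit(int c) {
    int f = c * 9 / 5 + 32;    /* stack variable, private to each caller */
    return f;
}
```

log_error_bad breaks the first rule by touching shared non-stack data; celsius_to_fahrenheit is safe because every variable it uses lives on the calling task's own stack.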

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: once a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
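The state and priority rules above can be condensed into a toy scheduler. This is a sketch only (a real RTOS also saves and restores task contexts); the names are assumptions:

```c
#include <assert.h>

/* The three task states from the text. */
enum state { BLOCKED, READY, RUNNING };

struct task { enum state st; int priority; };  /* higher number = higher priority */

/* Pick the highest-priority task that is not blocked, move it to
   RUNNING, and demote every other runnable task to READY.
   Returns the index of the chosen task, or -1 if all are blocked. */
int schedule(struct task t[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (t[i].st == BLOCKED) continue;       /* blocked tasks never get the CPU */
        if (best < 0 || t[i].priority > t[best].priority) best = i;
        t[i].st = READY;
    }
    if (best >= 0) t[best].st = RUNNING;        /* only one task runs at a time */
    return best;
}
```

Note that the scheduler only moves tasks between READY and RUNNING; nothing here moves a task into or out of BLOCKED, which matches the rules above.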

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it goes to the blocked state before control passes to the other.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can specify any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements: analyzes and determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the processes and integrates them

o Testing: uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization: process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


Embedded systems usually consist of a processor plus some special hardware, along with embedded software, both designed to meet the specific requirements of the application.

SPECIAL FEATURES

They do a very specific task and cannot be programmed to do different things. The software in embedded systems is always fixed.

They have to work against deadlines: a specific job has to be completed within a specific time.

The resources at their disposal are limited, particularly the memory; usually they don't have secondary storage devices. Power is another resource of limited availability.

They need to be highly reliable, and they need to work in extreme environmental conditions.

Embedded systems targeted at the consumer market are very cost sensitive.

In the area of embedded systems there is a wide variety of processors and operating systems; selecting the appropriate ones is a difficult task.

APPLICATION AREAS

Consumer appliances: digital camera, digital diary, DVD players, electronic toys, remotes for TV and microwave oven, etc.

Office automation: copying machine, fax machine, printer, scanner, etc.

Industrial automation: equipment to measure temperature, pressure, voltage, and current; robots to avoid hazardous jobs

Medical electronics: ECG, EEG, blood pressure measuring devices, X-ray scanners, equipment used for colonoscopy and endoscopy

Computer networks: bridges, routers, ISDN, ATM, frame relay switches, etc.

Telecommunications: key telephones, ISDN telephones, terminal adapters, web cameras, multiplexers, IP phone, IP gateway, IP gatekeeper

Wireless technologies: mobile phones, base station controllers, personal digital assistants, palmtops, etc.

Instrumentation: oscilloscopes, spectrum analyzers, logic analyzers, protocol analyzers, etc.

Security: encryption devices and biometric systems; security devices at homes, offices, and airports for authentication and verification

Finance: smart cards, ATMs, etc.

EMBEDDED SYSTEMS OVERVIEW

An embedded system usually consists of custom-built hardware woven around a CPU.

The custom-built hardware also contains memory chips onto which software called firmware is loaded.

When represented in a layered architecture, the OS runs over the hardware and the application software runs over the OS.

EMBEDDED HARDWARE UNITS

Central processing unit

ROM and RAM

Input devices such as sensors, A/D converters, and keypads

Output devices such as D/A converters, LEDs, and LCDs

Debug port

Communication interface

Power supply unit

EMBEDDED SOFTWARE IN A SYSTEM

EMBEDDED SYSTEMS ON CHIP (SOC)

A SoC is an embedded system on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. A SoC may be embedded with the following components:

Embedded processor GPP or ASIP core

Single purpose processing cores or multiple processors

A network bus protocol core

An encryption function unit

DCT for signal processing applications

Memories

PLDs and FPGA cores

Other logic and analog units

An application of SoC is the mobile phone

DESIGN PROCESS

In the top-down view, we start with the most abstract description of the system and conclude with concrete details. It consists of:

Requirements: the customer's description of the system they require. Requirements may be functional or non-functional; the second category includes performance, cost, physical size and weight, power consumption, etc.

Specifications: serves as the contract between the customer and the architect, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave.

Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system has to be built is addressed, and the details of the system internals and components begin to take shape.

Components: here the design of the components, including both software and hardware modules, takes place.

System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and any bugs found are addressed.

CLASSIFICATION OF EMBEDDED SYSTEMS

Stand-alone embedded systems

They work in stand-alone mode: they take inputs, process them, and produce the desired output. Embedded systems in automobiles and consumer electronic items are examples.

Real-time systems

Embedded systems in which specific work has to be done in a specific time period are called real-time systems.

Hard real-time systems are systems that are required to adhere to deadlines strictly.

Soft real-time systems are systems in which non-adherence to deadlines doesn't lead to catastrophe.

Networked information appliances

These are connected to a network, provided with network interfaces, and accessible to LANs and the Internet. A web camera connected to the Internet is an example.

Mobile devices

Mobile phones, personal digital assistants, smart phones, etc. are examples of this category.

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

It is a mechanism by which the CPU communicates with memory and devices

A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory.

HANDSHAKE

The basic building block of a bus protocol is the four-cycle handshake.

It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.

It uses a pair of dedicated wires for the handshake: enq (enquiry) and ack (acknowledge). Extra wires are used for data transmission during the handshake.

The four cycles are:

o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data

o When device 2 is ready to receive, it raises its output to signal an acknowledgement; at this point, devices 1 and 2 can transmit or receive

o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data

o After seeing that ack has been released, device 1 lowers its output
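The four cycles can be traced in a small simulation. The line names enq and ack follow the text; device 2's side is run inline purely for illustration, since a real handshake involves two independent devices:

```c
#include <assert.h>

/* Simulated handshake lines and data wires. */
static int enq = 0, ack = 0, data_bus = 0, received = 0;

void device1_send(int value) {
    enq = 1;              /* cycle 1: device 1 raises its enquiry line        */
    ack = 1;              /* cycle 2: device 2 is ready and raises ack        */
    data_bus = value;     /* data moves while both enq and ack are high       */
    received = data_bus;
    ack = 0;              /* cycle 3: device 2 lowers ack: data received      */
    enq = 0;              /* cycle 4: device 1 sees ack released, lowers enq  */
}
```

After a transfer both lines are low again, so the bus is ready for the next handshake.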

The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer.

The major components of the typical bus structure that supports reads and writes are:

o Clock: provides synchronization to the bus components

o R/W': true when the bus is reading and false when the bus is writing

o Address: an a-bit bundle of signals that transmits the address for an access

o Data: an n-bit bundle of signals that can carry data to or from the CPU

o Data ready: signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers.

In a burst read transaction, the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address.

One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst.

Releasing the burst' signal tells the device that enough data has been transmitted.

Disconnected transfers: in these buses the request and response are separate.

A first operation requests the transfer. The bus can then be used for other operations. The transfer is completed later, when the data are ready.

DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.

A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the buses to each other. Three reasons to do this are summarized below:

o Higher-speed buses may provide wider data connections

o A high-speed bus usually requires more expensive circuits and connectors; the cost of low-speed devices can be held down by using a lower-speed, lower-cost bus

o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations

ARM Bus

o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)

o The AMBA specification includes two buses: the AMBA high-performance bus (AHB) and the AMBA peripherals bus (APB)

o AHB: optimized for high-speed transfers and directly connected to the CPU

o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters

o APB: simple and easy to implement; it consumes relatively little power

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller

o It does not perform pipelined operations, which simplifies the bus logic

SHARC Bus

o It contains both program and data memory on-chip

o There are two external interfaces of interest: the external memory interface and the host interface

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access

o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels: the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional

o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers as well

MEMORY DEVICES

The important types of memories are RAMs and ROMs.

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs:

o CE' is the chip enable input. When CE' = 1 the SRAM's data pins are disabled, and when CE' = 0 the data pins are enabled

o R/W' controls whether the current operation is a read (R/W' = 1) from RAM or a write (R/W' = 0) to RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W' = 1 the pins are outputs, and when R/W' = 0 the pins are inputs
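A behavioral model of these four inputs can make the active-low conventions concrete. This is a sketch only, with an assumed 256-byte array standing in for the cell array:

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_SIZE 256
static uint8_t mem[SRAM_SIZE];   /* stand-in for the SRAM cell array */

/* ce_n: chip enable, active low (CE' = 0 enables the chip).
   rw_n: R/W' line, 1 = read, 0 = write.
   Returns 1 if the access happened, 0 if the chip was disabled. */
int sram_access(int ce_n, int rw_n, uint8_t adrs, uint8_t *data) {
    if (ce_n) return 0;            /* CE' = 1: data pins disabled, no access */
    if (rw_n) *data = mem[adrs];   /* read: data pins act as outputs */
    else      mem[adrs] = *data;   /* write: data pins act as inputs */
    return 1;
}
```

The single bidirectional Data bundle is modeled by the pointer argument, which carries data out on a read and in on a write.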

DRAM inputs and refresh

o They have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; they are needed because address lines are provided for only half the address

o DRAMs must be refreshed because they store values as charge, which can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh:

It is a special quick refresh mode

This mode is initiated by setting CAS' to 0 first, then RAS' to 0

It causes the current memory row to get refreshed and the corresponding counter to be updated

A memory controller is a logic circuit, external to the DRAM, that performs CAS-before-RAS refresh to the entire memory at the required rate

Page mode

o Developed to improve the performance of DRAM

o Useful for accessing several locations in the same region of the memory

o In a page mode access, one row address and several column addresses are supplied

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode

Synchronous DRAMs

o They were developed to improve the performance of DRAMs by introducing a clock

o Changes to the inputs and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While it uses the floating-gate principle, it is designed such that large blocks of memory can be erased all at once

It uses standard system voltages for erasing and programming, allowing it to be reprogrammed in a typical system

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons:

DRAM's RAS/CAS multiplexing

The need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it is not designed.

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing include:

The I2C bus, used in microcontroller-based systems

CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices

The Echelon LON network, used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC (I2C) bus nearly 30 years ago. It is a two-wire serial bus protocol.

This protocol enables peripheral ICs to communicate with each other using simple communication hardware.

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.

Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF.

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data.

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer takes place between the start and stop conditions.

It is a well-known bus for linking a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which confirms when valid data are on the line.

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from masters.

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition.

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master.

The bus transaction is begun by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.

The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus.
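The address byte described above, 7 address bits followed by the direction bit, can be assembled with one line of C; the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Build the first byte of an I2C transaction: the 7-bit address in the
   upper bits, the data-direction bit in the LSB (0 = master writes to
   slave, 1 = master reads from slave). */
uint8_t i2c_first_byte(uint8_t addr7, int read) {
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}
```

For example, a device at address 0x50 is addressed with 0xA0 for a write and 0xA1 for a read.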

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well.

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities.

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments.

The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.

It uses bit-serial transmission and can run at rates of up to 1 Mbps over a twisted-pair connection of up to 40 meters. An optical link can also be used.

The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar.

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion.

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1.

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of a data frame: the data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier.

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.

The data field is from 0 to 8 bytes (64 bits), depending on the value given in the control field.

A cyclic redundancy check (CRC) is sent after the data field for error detection.

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter is left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
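The wired-AND arbitration can be simulated bit by bit. This sketch (assumed helper name arbitrate, limited to 8 nodes) shows why the lowest identifier, i.e. the highest priority, always wins:

```c
#include <assert.h>

/* Simulate CSMA/AMP arbitration over 11-bit identifiers.
   Each node sends its identifier MSB first; the wired-AND bus makes
   0 dominant, so a node that sends a recessive 1 but hears a dominant
   0 drops out. Returns the index of the winning node, or -1. */
int arbitrate(const unsigned id[], int n) {
    int active[8];                              /* sketch: up to 8 nodes */
    for (int i = 0; i < n; i++) active[i] = 1;
    for (int bit = 10; bit >= 0; bit--) {
        int bus = 1;                            /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i] && !((id[i] >> bit) & 1)) bus = 0;
        for (int i = 0; i < n; i++)             /* recessive senders back off */
            if (active[i] && ((id[i] >> bit) & 1) && bus == 0) active[i] = 0;
    }
    for (int i = 0; i < n; i++) if (active[i]) return i;
    return -1;
}
```

With identifiers 0x100, 0x0F3, and 0x0F0 contending, the node sending 0x0F0 survives every bit of the arbitration field, which matches the rule that the all-0 identifier would have the highest priority of all.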

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).

A DTE can be a PC, serial printer, or plotter; a DCE can be a modem, mouse, digitizer, or scanner.

Communication between the two devices is full duplex, ie the data transfer can take place in both directions.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved.

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.

Data rate: it represents the rate at which data communication takes place. PCs support 50, 150, 300, …, 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.

Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.

Parity bit It is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmissions, also known as handshaking. It can be of hardware or software type.
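The framing described above can be sketched as follows (illustrative only; real UART hardware does this in shift registers, and the parameters here - 8 data bits, even parity, one stop bit - are just one common setting):

```python
def frame_8e1(byte):
    """Frame one byte as RS232 8E1: start bit, 8 data bits (LSB first),
    even parity bit, one stop bit.  The line idles at '1'; start bit is '0'."""
    data = [(byte >> i) & 1 for i in range(8)]   # LSB goes on the wire first
    parity = sum(data) % 2                       # even parity: total 1s even
    return [0] + data + [parity] + [1]           # start + data + parity + stop

bits = frame_8e1(ord('A'))    # 'A' = 0x41
print(bits)                   # 11 bits on the line per 8-bit character
```

The sketch makes the overhead visible: each 8-bit character costs 11 bit times on the line with this parameter set.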

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin.

For transmission of 1s and 0s, the voltage levels are defined in the standard. Voltage levels are different for data and control signals.

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.

UART chip: Universal Asynchronous Receiver/Transmitter chip.

It has two sections receive section and transmit section

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel format to serial format.

It also adds start stop and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices up to 512 in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections.

Daisy chaining: This method is adopted to find the priority of devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired logic connection. If any device has its interrupt signal in the low level state, the interrupt line goes to the low level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure works as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have pending interrupts, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
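The PI/PO grant logic above can be sketched as a small simulation (illustrative; real hardware does this with gates, and the list here simply models the chain order, index 0 being the highest priority):

```python
def daisy_chain_grant(requesting):
    """Propagate the interrupt acknowledge along the daisy chain.

    `requesting` is a list of booleans, index 0 = highest priority.
    A device passes PI through to PO (as 1) only if it is not requesting;
    a requesting device with PI=1 keeps the grant and drives PO=0,
    blocking every device below it.  Returns the granted index, or None.
    """
    pi = 1                      # CPU asserts acknowledge into the chain
    for idx, wants in enumerate(requesting):
        if pi == 1 and wants:   # PI=1 and requesting: this device wins
            return idx          # it places its VAD on the data bus
        # not requesting: the acknowledge passes through unchanged
    return None

print(daisy_chain_grant([False, True, True]))   # prints 1: device 1 blocks device 2
```

Position in the chain, not any programmable register, decides the priority - exactly the property stated above.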

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management), plus eight ground return lines.

In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin Pros and Cons
Pros: simple; no shared data; no interrupts
Cons:
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most IO needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts
o Based on Round Robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities

Round Robin with Interrupts: Pros and Cons
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (eg ABABAD)
  o Improves response of A
  o Increases latency of other tasks
o Move some task code to an interrupt
  o Decreases response time of lower priority interrupts
  o May not be able to ensure lower priority interrupt code executes fast enough

Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower priority code (no guarantee it will actually run)
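A minimal sketch of this architecture (illustrative; the task names are invented, and in a real system the enqueue calls would come from ISRs, with interrupts briefly disabled around queue accesses):

```python
from collections import deque

task_queue = deque()

def enqueue_task(fn, priority):
    """Called from an ISR: record follow-up work for the main routine.
    Lower number = higher priority."""
    task_queue.append((priority, fn))

def main_loop_once(log):
    """One pass of the main routine: run the highest-priority queued task.
    Note the choice is by priority, not arrival order (not FIFO)."""
    if task_queue:
        entry = min(task_queue)        # pick the best-priority entry
        task_queue.remove(entry)
        entry[1](log)                  # call the queued function

def serial_follow_up(log): log.append("serial")
def button_follow_up(log): log.append("button")

log = []
enqueue_task(serial_follow_up, priority=2)   # arrived first, lower priority
enqueue_task(button_follow_up, priority=1)   # arrived later, higher priority
main_loop_once(log)
main_loop_once(log)
print(log)   # ['button', 'serial']: priority beats arrival order
```

This is exactly the "pro" listed above: the main routine is free to pick any ordering policy over the queue, at the cost of no guarantee for low-priority entries.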

Real Time Operating System

Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures:
  o We don't write signaling flags (the RTOS takes care of it)
  o No loop in our code decides what is executed next (the RTOS does this)
  o The RTOS knows relative task priorities and controls what is executed next
  o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems: Pros and Cons
Pros:
o Worst case response time for the highest priority function is zero
o The system's high priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally come with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
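The example above can be reproduced with a short sketch that lists the CPU allocations one job receives (illustrative; the time other jobs spend between allocations is ignored here):

```python
def rr_allocations(total_ms, quantum_ms):
    """Return the list of CPU allocations a job receives under round-robin.
    The job runs for a full quantum unless less work remains."""
    allocations = []
    remaining = total_ms
    while remaining > 0:
        slice_ms = min(quantum_ms, remaining)   # last slice may be partial
        allocations.append(slice_ms)
        remaining -= slice_ms
    return allocations

# Job1: 250 ms of work, 100 ms quantum -> three slots, as in the example.
print(rr_allocations(250, 100))   # [100, 100, 50]
```

The third allocation self-terminates at 50 ms, matching step 3 of the worked example.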

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can be created by using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
  o Turnaround - total time between submission of a process and its completion
  o Response time - amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (eg throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - ie how many processes are to be executed concurrently, and how the split between IO-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next following a clock interrupt, an IO interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high for the same reasons as above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on Queuing
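A short worked sketch of FCFS timing (burst times invented; all processes assumed to arrive at t=0) shows how a long first process inflates everyone else's waiting time - the reason turnaround and waiting times can be high:

```python
def fcfs_times(bursts):
    """Given CPU bursts in arrival order (all arriving at t=0), return
    per-process (waiting, turnaround) times under FCFS."""
    times, elapsed = [], 0
    for burst in bursts:
        waiting = elapsed                  # time spent behind earlier bursts
        elapsed += burst
        times.append((waiting, elapsed))   # turnaround = completion time here
    return times

# A long first job makes everyone queued behind it wait ("convoy effect").
print(fcfs_times([24, 3, 3]))   # [(0, 24), (24, 27), (27, 30)]
```

Reordering the same bursts shortest-first would cut the later processes' waiting times sharply, which motivates the next algorithm.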

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than under FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run
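The preemption described above can be sketched with a millisecond-resolution simulation (illustrative; job names, arrivals, and bursts are invented). B arrives while A is running and, being shorter, preempts it and finishes first:

```python
import heapq

def srt_completion(jobs):
    """Shortest-remaining-time scheduling at 1 ms resolution.
    `jobs` is a list of (name, arrival, burst); returns {name: finish_time}."""
    jobs = sorted(jobs, key=lambda j: j[1])     # by arrival time
    ready, finish, t, i = [], {}, 0, 0
    while len(finish) < len(jobs):
        while i < len(jobs) and jobs[i][1] <= t:
            heapq.heappush(ready, [jobs[i][2], jobs[i][0]])  # [remaining, name]
            i += 1
        if not ready:
            t = jobs[i][1]                      # idle until the next arrival
            continue
        ready[0][0] -= 1                        # run shortest-remaining 1 ms
        t += 1                                  # (shrinking the root keeps the heap valid)
        if ready[0][0] == 0:
            _, name = heapq.heappop(ready)
            finish[name] = t
    return finish

# B (3 ms) arrives at t=2, preempts A (7 ms), finishes at t=5; A resumes.
print(srt_completion([("A", 0, 7), ("B", 2, 3)]))
```

A's execution is split into two blocks (t=0..2 and t=5..10), showing the context-switch overhead the text mentions; a steady stream of short jobs like B would starve A indefinitely.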

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes and not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First in first out            Low            Low          High              High
Shortest job first            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

Logic analyzer is another but a major piece of instrumentation useful for debugging It can sample or analyze several different signals simultaneously but display only 0 or 1 or changing values for each

Hardwaresoftware co-verification it allows hardware and software designs to be validated at the same time against each other The types of techniques available are

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs are of Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column

Comments begin with a semicolon and continue until the end of the line

Example

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example

        R1=DM(M0,I0), R2=PM(M8,I8);     ! a comment
label:  R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three mode registers most significant for data operations are the arithmetic status register (ASTAT), the sticky register (STKY), and mode register 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and starting the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are

o Large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features are introduced:

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; instruction opcodes are preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 - it is the program counter, but it can be manipulated as a general-purpose register

R13 - it is used as the stack pointer

R14 - it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) - it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags disable normal and fast interrupts, respectively, when set. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: it is the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions generally do not affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, including global, static, initialized, and uninitialized variables and everything else, is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code, or two pieces of task code, share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.

States of tasks under the RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before control goes to the other one.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it is the first model proposed for the software development process.

This model has five major phases

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


authentication and verification

Finance: smart cards, ATMs, etc.

EMBEDDED SYSTEMS OVERVIEW

An ES usually consists of custom-built hardware woven around a CPU.

The custom-built hardware also contains memory chips onto which the software, called firmware, is loaded.

When represented as a layered architecture, the OS runs over the hardware and the application software runs over the OS.

EMBEDDED HARDWARE UNITS

Central processing unit

ROM and RAM

Input devices such as sensors, A/D converters, and keypads

Output devices such as D/A converters, LEDs, and LCDs

Debug port

Communication interface

Power supply unit

EMBEDDED SOFTWARE IN A SYSTEM

EMBEDDED SYSTEMS ON CHIP (SOC)

A SoC is an ES on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. A SoC may be embedded with the following components:

Embedded processor GPP or ASIP core

Single purpose processing cores or multiple processors

A network bus protocol core

An encryption function unit

DCT for signal processing applications

Memories

PLDs and FPGA cores

Other logic and analog units

An application of the SoC is the mobile phone.

DESIGN PROCESS

In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of:

Requirements - the customer's description of the system they require. Requirements may be functional or non-functional; the second category includes performance, cost, physical size and weight, power consumption, etc.

Specifications - this serves as the contract between the customer and the architects, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and of how the system is expected to behave.

Architecture - basically, this is a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape.

Components - here the design of the components, including both software and hardware modules, takes place.

System integration - this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and any bugs found are addressed.

CLASSIFICATION OF EMBEDDED SYSTEMS

Stand-alone ESs

They work in stand-alone mode: they take inputs, process them, and produce the desired output. ESs in automobiles and consumer electronic items are examples.

Real time systems

Embedded systems in which specific work has to be done within a specific time period are called RT systems.

Hard RTSs are systems that are required to adhere to deadlines strictly.

Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe.

Networked information appliances

These are connected to a network, provided with network interfaces, and accessible from LANs and the Internet. A Web camera connected to the net is an example.

Mobile devices

Mobile phones, personal digital assistants, smart phones, etc. are examples of this category.

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

It is a mechanism by which the CPU communicates with memory and devices

A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory.

HANDSHAKE

The basic building block of a bus protocol is the four-cycle handshake.

It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.

It uses a pair of dedicated wires for the handshake: enq and ack. Extra wires are used for data transmission during the handshake.

The four cycles are

o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data

o When device 2 is ready to receive, it raises its output to signal an acknowledgement; at this point devices 1 and 2 can transmit or receive

o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data

o After seeing that ack has been released, device 1 lowers its output

The fundamental bus operations are reading and writing All transfers on the basic bus are controlled by CPU- the CPU can read or write a device or memory but devices or memory cannot initiate a transfer

Major components of the typical bus structure that supports read and write are

o Clock provides synchronization to the bus components

o R/W' is true when the bus is reading and false when the bus is writing

o Address is an a-bit bundle of signals that transmits the address for an access

o Data is an n-bit bundle of signals that can carry data to or from the CPU

o Data ready signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers

In a burst read transaction, the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address

One extra line, called 'burst', is added to the bus; it signals when a transaction is actually a burst

Releasing the burst signal tells the device that enough data has been transmitted

Disconnected transfers: in these buses the request and response are separate

A first operation requests the transfer. The bus can then be used for other operations. The transfer is completed later, when the data are ready

DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory

A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below

o Higher-speed buses may provide wider data connections

o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus

o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations

ARM Bus

o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)

o The AMBA specification includes two buses: AHB (AMBA high-performance bus) and APB (AMBA peripherals bus)

o AHB: it is optimized for high-speed transfers and is directly connected to the CPU

o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters

o APB: it is simple and easy to implement, and consumes relatively little power

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller

o It does not perform pipelined operations, which simplifies the bus logic

SHARC Bus

o It contains both program and data memory on-chip

o There are two external interfaces of interest: the external memory interface and the host interface

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access

o Different units in the processor have different amounts of access to external memory: the DM bus and I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional

o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs:

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled

o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write to (R/W'=0) RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
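The behaviour of these inputs can be mimicked with a toy model; the following Python sketch (the class and method names are invented purely for illustration) shows how CE' gates the chip and R/W' selects the operation:

```python
class SRAM:
    """Toy model of an SRAM with active-low CE' and a R/W' input."""

    def __init__(self, words=16):
        self.mem = [0] * words

    def access(self, ce_n, rw_n, adrs, data=None):
        if ce_n == 1:            # CE'=1: data pins disabled, no access
            return None
        if rw_n == 1:            # R/W'=1: read, data pins act as outputs
            return self.mem[adrs]
        self.mem[adrs] = data    # R/W'=0: write, data pins act as inputs
        return None
```

Writing and then reading location 3 returns the stored value, while any access with CE'=1 is ignored.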

DRAM inputs and refresh

o DRAMs have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; the two signals are needed because address lines are provided for only half the address

o DRAMs must be refreshed because they store values as charge, which can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh

It is a special quick refresh mode

This mode is initiated by setting CAS' to 0 first, then RAS' to 0

It causes the current memory row to be refreshed and the corresponding counter to be updated

A memory controller is a logic circuit, external to the DRAM, that performs CAS-before-RAS refresh on the entire memory at the required rate

Page mode

o Developed to improve the performance of DRAM

o Useful to access several locations in the same region of the memory

o In page mode access one row address and several column addresses are supplied

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode

Synchronous DRAMs

o It was developed to improve the performance of DRAMs by introducing a clock

o Changes to the inputs and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While it uses the floating-gate principle, it is designed such that large blocks of memory can be erased all at once

It uses standard system voltages for erasing and programming, allowing it to be programmed in a typical system

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage

Its fast erase ability can vastly improve the performance of embedded systems in which large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons:

the DRAM's RAS/CAS multiplexing

the need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it is not designed

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus: used in microcontroller-based systems

CAN (controller area network): developed for automotive electronics, it provides megabit rates and can handle a large number of devices

Echelon LON network: used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol

This protocol enables peripheral ICs to communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing to an I2C bus include EEPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers

The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer is in between the start and stop conditions

It is a well-known bus for linking a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which confirms when valid data are on the line

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from the masters

It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition

o 0000000 is used for general call, or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it is composed of a series of one-byte transmissions; an address followed by one or more data bytes

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master

The bus transaction is opened by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL

The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus
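The address byte and transaction framing described above can be illustrated in Python (a sketch only; real I2C controllers do this in hardware, and the helper names here are invented):

```python
def i2c_address_byte(address, read):
    """Build the first byte of a standard-mode transaction:
    the 7-bit address followed by the data-direction bit
    (1 = read from slave, 0 = write to slave)."""
    assert 0 <= address < 128, "standard I2C addresses are 7 bits"
    return (address << 1) | (1 if read else 0)

def i2c_transaction(address, read, data_bytes):
    """Frame a transaction: start condition, address byte,
    one or more data bytes, stop condition."""
    return ["START", i2c_address_byte(address, read)] + list(data_bytes) + ["STOP"]
```

For example, writing byte 0x10 to a device at address 0x50 produces the sequence `["START", 0xA0, 0x10, "STOP"]`, since the direction bit 0 is appended below the shifted address.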

CAN Bus

The controller area network(CAN) bus is a robust serial communication bus protocol for real time applications possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted pair connection of up to 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in a wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1

When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between

The data field is from 0 to 64 bytes, depending on the value given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
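The wired-AND contest can be simulated bit by bit; this Python sketch (illustrative only; the function name and width parameter are invented) shows how a node drops out when it sends a recessive '1' but hears a dominant '0':

```python
def can_arbitrate(identifiers, width=11):
    """Simulate CSMA/AMP arbitration among nodes that start
    transmitting their identifier fields at the same moment.
    Returns the winning identifier (lowest value = highest priority)."""
    contenders = list(identifiers)
    for bit in range(width - 1, -1, -1):        # MSB is sent first
        sent = {(ident >> bit) & 1 for ident in contenders}
        bus = min(sent)                         # wired-AND: any 0 wins the bit
        # Nodes that sent recessive 1 but see dominant 0 stop transmitting.
        contenders = [i for i in contenders if (i >> bit) & 1 == bus]
    return contenders[0]
```

With three nodes sending identifiers 0x555, 0x3FF, and 0x100, the node with identifier 0x100 survives arbitration, matching the rule that the lowest identifier has the highest priority.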

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged

COMMUNICATION INTERFACINGS These are used to communicate with the external world, such as transmitting data to a host PC or interacting with another embedded system to share data

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)

Data terminal equipment (DTE) can be a PC, a serial printer, or a plotter, and data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., the data transfer can take place in both directions simultaneously

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended

Parity bit: it is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake, and can be of hardware type or software type

RS232 connector configurations

It specifies two types of connectors 9 pin and 25 pin

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section

Receive section: receives data, converts it from serial form to parallel form, and gives the data to the processor

Transmit section: takes the data from the processor and converts it from parallel format to serial format

It also adds the start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5 V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
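The framing performed by the transmit section can be sketched in a few lines of Python (an illustration of the frame format only; even parity and the function name are assumptions for the example):

```python
def uart_frame(ch, data_bits=8, stop_bits=1):
    """Build the bit sequence for one character on the wire:
    start bit (0), data bits LSB first, even parity bit,
    then the stop bit(s) (1)."""
    data = [(ord(ch) >> i) & 1 for i in range(data_bits)]
    parity = sum(data) % 2      # even parity: makes the total number of 1s even
    return [0] + data + [parity] + [1] * stop_bits
```

For 'A' (0x41) with 8 data bits and 1 stop bit, the frame is 11 bits: `[0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]`, i.e. a start bit, the character LSB first, a parity bit of 0, and a stop bit.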

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices, up to 512, in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment

It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy chaining: this method is adopted to find the priority of the devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 at its PI input generates a 0 on its PO output, to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
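The PI/PO propagation above can be traced with a short behavioural sketch in Python (the function name, request list, and return convention are invented for illustration):

```python
def daisy_chain_grant(requests):
    """requests: device request flags ordered from highest to lowest
    priority (device 1 first). Propagate the CPU's acknowledge along
    the PI/PO chain and return the index of the device that captures
    it, or None when no device is requesting."""
    pi = 1                          # interrupt acknowledge from the CPU enters device 0
    for idx, requesting in enumerate(requests):
        if pi == 1 and requesting:
            return idx              # PI=1, PO=0: this device blocks the chain and wins
        # otherwise PO = PI and the acknowledge ripples on to the next device
    return None
```

For example, with requests `[False, True, True]` the second device (index 1) captures the grant, because the non-requesting first device passes the acknowledge through.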

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines: eight bidirectional lines used for data transfer, three for handshake, and five for bus management, plus eight ground return lines

In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc., by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc., to their workstation products and to HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures

1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: the best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin

1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin: Pros and Cons

Pros: simple, no shared data, no interrupts.

Cons:

o Max delay is the maximum time to traverse the loop if all devices need to be serviced

o The architecture fails if any one device requires a shorter response time

o Most I/O needs fast response times (buttons, serial ports, etc.)

o Lengthy processing adversely affects even soft time deadlines

o The architecture is fragile to added functionality

o Adding one more device to the loop may break everything

Round Robin: Uses

o Simple devices

o Watches

o Possibly microwave ovens

o Devices where operations are all user initiated and process quickly
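The main-loop structure can be sketched as follows in Python (the device dictionaries with `ready`/`service` callables are an invented stand-in for polling real hardware):

```python
def round_robin(devices, iterations=1):
    """Poll each device in fixed order and service whichever is ready.
    No interrupts, no priorities: service order depends purely on
    position in the loop."""
    log = []
    for _ in range(iterations):
        for dev in devices:
            if dev["ready"]():          # check the device's status
                log.append(dev["service"]())  # service it immediately
    return log
```

With a toy device A that is always ready and a device B that never is, two trips around the loop service A twice and B not at all, which also illustrates why one slow or demanding device can break the whole loop's timing.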

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service the hardware and (b) set flags

o Main routine checks the flags and does any lower-priority follow-up processing

o Why? It gives more control over priorities

Round Robin with Interrupts: Pros and Cons

Pros:

o Still relatively simple

o Hardware timing requirements are better met

Cons:

o All task code still executes at the same priority

o Maximum delay is unchanged

o Worst-case response time = sum of all other task execution times + execution times of any other interrupts that occur

How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which flags are checked (e.g., ABABAD)

- Improves the response of A

- Increases the latency of other tasks

o Move some task code to the interrupt

- Decreases the response time of lower-priority interrupts

- May not be able to ensure that lower-priority interrupt code executes fast enough
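The flag-based division of labour can be sketched in Python (the "ISR" here is an ordinary function standing in for a hardware interrupt; all names are invented for the example):

```python
# Interrupts do the urgent work and set flags; the main loop polls
# the flags and performs the lower-priority follow-up processing.
flags = {"serial": False, "timer": False}

def serial_isr():
    """Stand-in ISR: service the (simulated) hardware, then set the flag."""
    flags["serial"] = True

def main_loop_once(log):
    """One pass of the main routine: check each flag in turn and do
    the follow-up work for any flag that an ISR has set."""
    if flags["serial"]:
        flags["serial"] = False
        log.append("serial follow-up")
    if flags["timer"]:
        flags["timer"] = False
        log.append("timer follow-up")
    return log
```

After `serial_isr()` fires, the next pass of the main loop performs only the serial follow-up and clears the flag; all follow-up code still runs at the same priority, which is the architecture's main limitation.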

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue

o Main routine reads the queue and executes the calls

Pros:

o Main routine can use any algorithm to choose what order to execute the functions (not necessarily FIFO)

o Better response time: the delay for the highest-priority task = length of the longest function code

o Can improve the best response time by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
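A minimal Python sketch of the idea, using a priority heap rather than strict FIFO (the class and method names are invented for illustration):

```python
import heapq

class FunctionQueue:
    """Interrupts enqueue (priority, function) pairs; the main routine
    pops and runs the highest-priority function first (lower number =
    higher priority), not necessarily in FIFO order."""

    def __init__(self):
        self._q = []
        self._seq = 0            # tie-breaker keeps equal priorities FIFO

    def enqueue(self, priority, fn):
        heapq.heappush(self._q, (priority, self._seq, fn))
        self._seq += 1

    def run_next(self):
        """Main routine: execute the most urgent queued function."""
        if not self._q:
            return None
        _, _, fn = heapq.heappop(self._q)
        return fn()
```

If an ISR enqueues a low-priority function and then a high-priority one, the main routine still runs the high-priority function first; a function enqueued with a low priority may wait indefinitely, which is the architecture's stated drawback.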

Real-Time Operating System

Architecture:

o Most complex

o Interrupts handle the urgent operations, then signal that there is more work to do for task code

o Differences from the previous architectures:

- We don't write signaling flags (the RTOS takes care of it)

- No loop in our code decides what is executed next (the RTOS does this)

- The RTOS knows the relative task priorities and controls what is executed next

- The RTOS can suspend a task in the middle to execute code of higher priority

- Now we can control task response AND interrupt response

Pros:

o Worst-case response time for the highest-priority function is zero

o The system's high-priority response time is relatively stable when extra functionality is added

o Useful functionality comes pre-written

o RTOSs generally come with vendor tools

Cons:

o An RTOS has cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
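The quantum arithmetic in the example above can be sketched in C. This is only an illustration of the allocation count (the helper name `rr_slices` is hypothetical, and context-switch cost is ignored):

```c
/* Count how many round-robin time slices a job consumes:
   one slice per allocation, the last one possibly partial. */
static int rr_slices(int burst_ms, int quantum_ms) {
    int slices = 0;
    while (burst_ms > 0) {
        burst_ms -= (burst_ms < quantum_ms) ? burst_ms : quantum_ms;
        slices++;                        /* one allocation per pass */
    }
    return slices;
}
```

For the 250 ms job with a 100 ms quantum, `rr_slices(250, 100)` yields 3 allocations, matching the step-by-step trace above.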

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.
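As a rough sketch of the weighted variant (all names here are hypothetical): in naive WRR, each flow is served a number of times per round proportional to its weight.

```c
/* Build one weighted round-robin service round: flow i appears
   weight[i] times in the service order. Returns the number of
   service slots written into order[]. */
static int wrr_round(const int *weight, int nflows, int *order) {
    int n = 0;
    for (int i = 0; i < nflows; i++)
        for (int w = 0; w < weight[i]; w++)
            order[n++] = i;              /* serve flow i once */
    return n;
}
```

A production WRR or DRR implementation would interleave the service slots and track per-flow deficits to avoid bursts; this naive version serves each flow's whole share back-to-back.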

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same concept, and can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

o Throughput - the number of processes that complete their execution per time unit

o Latency, specifically:

o Turnaround time - the total time between submission of a process and its completion

o Response time - the amount of time from when a request is submitted until the first response is produced

o Fairness / waiting time - equal CPU time to each process (or, more generally, times appropriate to each process's priority)
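These metrics can be made concrete with a small FCFS sketch (the helper name `fcfs_metrics` is hypothetical; all jobs are assumed to arrive at t=0):

```c
/* For FCFS with all jobs arriving at time 0:
   turnaround[i] = completion time of job i,
   waiting[i]    = turnaround[i] - burst[i]. */
static void fcfs_metrics(const int *burst, int n,
                         int *turnaround, int *waiting) {
    int t = 0;
    for (int i = 0; i < n; i++) {
        t += burst[i];                   /* job i finishes at time t */
        turnaround[i] = t;
        waiting[i] = t - burst[i];
    }
}
```

With bursts {24, 3, 3}, the turnaround times are {24, 27, 30} and the waiting times {0, 24, 27}: the long first job inflates everyone else's waiting time, which is exactly the FIFO weakness discussed below.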

In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates which processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399].

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process with a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370].

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394].

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated

or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized

In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or OFDMA multi-carriers or other frequency-domain equalization components are assigned to the users that can best utilize them.

First in first out

First In First Out, also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high, for the same reasons as above.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on Queuing

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than under FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
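The selection step of shortest-remaining-time scheduling can be sketched as follows (a minimal illustration; the name `srt_pick` is hypothetical, and remaining times are assumed known):

```c
/* Return the index of the job with the smallest non-zero
   remaining time, or -1 if every job has finished. */
static int srt_pick(const int *remaining, int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (remaining[i] > 0 &&
            (best < 0 || remaining[i] < remaining[best]))
            best = i;                    /* shorter job found */
    return best;
}
```

The scheduler would call this after every arrival or completion; a stream of small arrivals keeps winning the comparison, which is how the starvation noted above arises.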

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview:

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories; having two separate memories provides higher memory bandwidth. Most DSPs are of Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
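The same distinction can be checked from C on whatever machine the code runs on (a sketch; the helper name `is_little_endian` is hypothetical):

```c
#include <stdint.h>

/* Returns 1 if the lowest-order byte of a 32-bit word sits at
   the lowest address (little-endian), 0 otherwise (big-endian). */
static int is_little_endian(void) {
    uint32_t word = 0x01020304u;
    const unsigned char *p = (const unsigned char *)&word;
    return p[0] == 0x04;                 /* low byte first? */
}
```

On a little-endian configuration, the byte at the lowest address is 0x04; on a big-endian one it is 0x01.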

SHARC PROCESSOR

It is a family of DSPs which use the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It uses a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits, but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
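The behaviour can be sketched in C (an illustration of saturation arithmetic in general, not SHARC library code; the name `sat_add32` is hypothetical):

```c
#include <stdint.h>

/* 32-bit fixed-point saturating add: on overflow, clamp to the
   maximum-range value instead of wrapping around. */
static int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;  /* widen so the sum cannot overflow */
    if (s > INT32_MAX) return INT32_MAX;  /* clamp positive overflow */
    if (s < INT32_MIN) return INT32_MIN;  /* clamp negative overflow */
    return (int32_t)s;
}
```

Without saturation, adding 1 to the maximum 32-bit value would wrap to the minimum; with it, the result pins at the maximum, which is usually the better behaviour for signal samples.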

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
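A few of these addressing behaviours can be sketched in C (illustrative helpers only, not SHARC code; all names here are hypothetical):

```c
/* Post-modify with update: access data[I], then I = I + M. */
static int post_modify_load(const int *data, int *I, int M) {
    int v = data[*I];
    *I += M;                             /* update after the access */
    return v;
}

/* Circular buffer: advance an index and wrap it into [0, len).
   Assumes |step| <= len. */
static int circ_next(int i, int len, int step) {
    i = (i + step) % len;
    return (i < 0) ? i + len : i;        /* wrap negative remainders */
}

/* Bit-reversal of an nbits-wide index, as used to order FFT results. */
static unsigned bit_reverse(unsigned x, int nbits) {
    unsigned r = 0;
    for (int b = 0; b < nbits; b++) {
        r = (r << 1) | (x & 1u);         /* shift the low bit in */
        x >>= 1;
    }
    return r;
}
```

On the SHARC these updates happen in the DAG hardware in parallel with the data access, which is what makes tight filter and FFT loops possible without address-computation overhead.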

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority ready task (preemptive kernel). In a non-preemptive kernel, it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than 1, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
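The create/acquire/release calls above can be illustrated with a toy counting semaphore (a single-threaded sketch with hypothetical names; a real kernel version would also block the calling task, maintain a waiting list, and guard the count against interrupts):

```c
/* Toy counting semaphore: just an integer count, as described above. */
typedef struct { int count; } ksem_t;

/* Create: set the initial number of available permits. */
static void ksem_create(ksem_t *s, int initial) { s->count = initial; }

/* Acquire: returns 1 on success; 0 means the caller would block. */
static int ksem_acquire(ksem_t *s) {
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: give the semaphore back (or signal an event). */
static void ksem_release(ksem_t *s) { s->count++; }
```

Initializing the count to 1 gives binary-semaphore behaviour: a second acquire fails until someone releases.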

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, either the highest-priority task or the first task waiting in the queue takes the message.
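A minimal fixed-length message queue can be sketched as a ring of messages with FIFO delivery (all names hypothetical; a real RTOS queue also maintains the waiting lists mentioned above):

```c
/* Toy message queue: a ring of integer messages with FIFO order. */
#define QLEN 4
typedef struct { int buf[QLEN]; int head, tail, count; } msgq_t;

static void msgq_create(msgq_t *q) { q->head = q->tail = q->count = 0; }

/* Deposit a message; returns -1 if the queue is full. */
static int msgq_send(msgq_t *q, int msg) {
    if (q->count == QLEN) return -1;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;      /* wrap around the ring */
    q->count++;
    return 0;
}

/* Take the oldest message; returns -1 if the queue is empty. */
static int msgq_receive(msgq_t *q, int *msg) {
    if (q->count == 0) return -1;
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 0;
}
```

In a kernel, the full and empty cases would block the sender or receiver on the corresponding waiting list instead of returning an error.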

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
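The create/write/read calls can be illustrated with a POSIX pipe, used here only as a stand-in for an RTOS pipe (the helper `pipe_roundtrip` is a hypothetical name):

```c
#include <unistd.h>
#include <string.h>

/* Create a pipe, write a message into one end, read it from the
   other, and return the number of bytes received. */
static int pipe_roundtrip(const char *msg, char *out, int outlen) {
    int fd[2];
    if (pipe(fd) != 0) return -1;                  /* create the pipe */
    write(fd[1], msg, strlen(msg));                /* writer task side */
    int n = (int)read(fd[0], out, (size_t)outlen); /* reader task side */
    close(fd[0]);                                  /* close both ends */
    close(fd[1]);
    return n;
}
```

In an RTOS, the write side would typically be one task and the read side another, with the kernel blocking the reader until data arrives.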

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file, with 16 general-purpose registers

o A load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading or storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 - the program counter, which can also be manipulated as a general-purpose register

R13 - used as a stack pointer

R14 - has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) - an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags disable normal and fast interrupts, respectively, when set. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault
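The CPSR fields described above sit at fixed bit positions (condition flags N=31, Z=30, C=29, V=28; mode field in bits 4:0, per the ARM architecture manuals). A small decoding sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Extract the CPSR condition flags and the mode field. */
static unsigned cpsr_n(uint32_t cpsr)    { return (cpsr >> 31) & 1u; } /* negative */
static unsigned cpsr_z(uint32_t cpsr)    { return (cpsr >> 30) & 1u; } /* zero     */
static unsigned cpsr_c(uint32_t cpsr)    { return (cpsr >> 29) & 1u; } /* carry    */
static unsigned cpsr_v(uint32_t cpsr)    { return (cpsr >> 28) & 1u; } /* overflow */
static unsigned cpsr_mode(uint32_t cpsr) { return cpsr & 0x1Fu; }      /* mode bits */
```

For example, a CPSR value of 0x40000010 has only the Z flag set and a mode field of 0x10 (user mode).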

The user registers R0 to R7 are common to all operating modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set

ARM: the standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions

o ARM supports both little-endian and big-endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privilege mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instructions: in the ARM processor the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, excepting the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in others, the interrupt routines take care of the most urgent operations; they signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task

Under most RTOSs a task is simply a subroutine

At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need
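The parameters passed to a task-creation call can be sketched in C. This is a minimal sketch with a hypothetical parameter structure (make_task_params, task_params, and the task names are invented for illustration); real RTOS APIs differ, but they all take roughly these items: the starting subroutine, a priority, and stack memory.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS 256

/* Hypothetical descriptor holding the parameters a typical RTOS
   task-creation call takes: entry subroutine, priority, and stack. */
typedef struct {
    void     (*entry)(void *);   /* starting-point subroutine for the task */
    uint8_t    priority;         /* smaller number = higher priority (assumed) */
    uint32_t  *stack;            /* memory location for the task's stack */
    size_t     stack_words;
} task_params;

static uint32_t vTaskAStack[STACK_WORDS];

static void vTaskA(void *arg) {
    (void)arg;
    /* task body: wait for an event, process it, loop forever ... */
}

/* Collect the parameters the RTOS needs to start a task. */
task_params make_task_params(void (*entry)(void *), uint8_t prio,
                             uint32_t *stack, size_t words) {
    task_params p = { entry, prio, stack, words };
    return p;
}
```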

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem

Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic

An atomic section is a part of a program that cannot be interrupted
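The shared-data problem and its atomic-section fix can be illustrated with a C sketch. The data and function names are invented for illustration, and disable()/enable() stand in for the real disable/enable-interrupt instructions of a target processor.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two temperature readings that an interrupt routine updates together.
   If task code reads them non-atomically, it can see one old and one
   new value: the shared-data problem. */
static volatile int16_t temperatures[2];
static volatile bool interrupts_enabled = true;

/* Stand-ins for the real enable/disable-interrupt instructions. */
static void disable(void) { interrupts_enabled = false; }
static void enable(void)  { interrupts_enabled = true;  }

/* Called from the interrupt routine: writes both values. */
void isr_read_temperatures(int16_t t0, int16_t t1) {
    temperatures[0] = t0;
    temperatures[1] = t1;
}

/* BUGGY task code: the two reads are not atomic; an interrupt between
   them can compare values from two different measurements. */
bool temperatures_differ_buggy(void) {
    return temperatures[0] != temperatures[1];
}

/* FIXED task code: make the section atomic by disabling interrupts
   around the shared-data access. */
bool temperatures_differ_fixed(void) {
    disable();                       /* start of atomic section */
    int16_t t0 = temperatures[0];
    int16_t t1 = temperatures[1];
    enable();                        /* end of atomic section */
    return t0 != t1;
}
```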

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
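These rules can be shown with a small C sketch (the function names are invented). The first version breaks the first rule by updating a shared static variable non-atomically; the second touches only its parameters and locals, which live on the stack of whichever task called it.

```c
/* NOT reentrant: cErrors is static (shared among all tasks), and the
   update is a read-modify-write that the RTOS may interrupt mid-way. */
static int cErrors = 0;
void vCountErrors_nonreentrant(int cNewErrors) {
    cErrors += cNewErrors;          /* non-atomic use of a shared variable */
}

/* Reentrant: operates only on its parameters and locals, which are
   private to the calling task's stack, so a task switch in the middle
   of the function cannot corrupt another call's data. */
int add_errors_reentrant(int cErrorsSoFar, int cNewErrors) {
    int total = cErrorsSoFar + cNewErrors;  /* local: private to this call */
    return total;
}
```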

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
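A minimal C sketch of the three states and the legal moves between them, consistent with the scheduler rules that follow (a blocked task must be signaled into the ready state before the scheduler can run it):

```c
/* The three task states an RTOS scheduler tracks. */
typedef enum { BLOCKED, READY, RUNNING } task_state;

/* Returns 1 if a move from 'from' to 'to' is legal under the rules:
   tasks block themselves, ISRs/tasks unblock them into READY, and
   only the scheduler moves tasks between READY and RUNNING. */
int transition_allowed(task_state from, task_state to) {
    switch (from) {
    case RUNNING: return to == READY      /* preempted by higher priority */
                      || to == BLOCKED;   /* task blocks itself           */
    case READY:   return to == RUNNING;   /* scheduler dispatches it      */
    case BLOCKED: return to == READY;     /* an ISR or task signals it    */
    }
    return 0;
}
```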

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state

The task that is assigned the highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks; in some RTOSs, time slicing between the tasks is done; in others, one task runs until it goes to the blocked state before the other one runs

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand

Waterfall model

It was introduced by Royce and is the first model proposed for the software development process

This model has five major phases:

o Requirements Analysis: determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases, since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built

At each level of the design, the designers go through requirements, construction, and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement

Its advantage is that it adopts a successive-refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs


An application of SoC is the mobile phone

DESIGN PROCESS

In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of:

Requirements: the customer's description of the system they require. Requirements may be functional or non-functional; the second category includes performance, cost, physical size and weight, power consumption, etc.

Specifications: the contract between the customer and the architect, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave

Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape

Components: here the design of the components, including both software and hardware modules, takes place

System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested and any bugs found are addressed

CLASSIFICATION OF EMBEDDED SYSTEMS

Stand alone ESs

They work in stand-alone mode: they take inputs, process them, and produce the desired output. ESs in automobiles and consumer electronic items are examples

Real time systems

Embedded systems in which specific work has to be done in a specific time period are called RT systems

Hard RTSs are systems which are required to adhere to deadlines strictly

Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe

Networked information appliances

These are connected to a network, provided with network interfaces, and accessible from LANs and the Internet. A web camera connected to the Internet is an example

Mobile devices

Mobile phones Personal Digital Assistants smart phones etc are examples for this category

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

It is a mechanism by which the CPU communicates with memory and devices

A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory

HANDSHAKE

The basic building block of a bus protocol is the four-cycle handshake

It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive

It uses a pair of dedicated wires for the handshake: enq and ack. Extra wires are used for data transmission during the handshake

The four cycles are

o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data

o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point devices 1 and 2 can transmit or receive

o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data

o After seeing that ack has been released, device 1 lowers its output
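The four cycles can be condensed into a small C sketch that drives the two wires in order. This is a behavioral model of the protocol, not driver code for any real bus; the variable names are invented.

```c
#include <stdbool.h>

/* The two handshake wires and the data wires they guard. */
static bool enq, ack;
static int  data_wire, received;

/* The four cycles of the handshake, driven to completion in order. */
void four_cycle_handshake(int value) {
    enq = true;            /* 1: device 1 raises enq (enquiry)            */
    ack = true;            /* 2: device 2 raises ack; transfer may start  */
    data_wire = value;     /*    device 1 drives the data wires           */
    received  = data_wire; /*    device 2 latches the data                */
    ack = false;           /* 3: device 2 drops ack: data received        */
    enq = false;           /* 4: device 1 sees ack low and drops enq      */
}
```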

The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer

Major components of the typical bus structure that supports read and write are

o Clock provides synchronization to the bus components

o R/W' is true when the bus is reading and false when the bus is writing

o Address is an a-bit bundle of signals that transmits the address for an access

o Data is an n-bit bundle of signals that can carry data to or from the CPU

o Data ready signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers

In a burst read transaction the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address

One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst

Releasing the burst' signal tells the device that enough data has been transmitted

Disconnected transfers: in these buses the request and response are separate

A first operation requests the transfer. The bus can then be used for other operations, and the transfer is completed later when the data are ready

DMA: direct memory access is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory
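Programming a DMA transfer typically means writing a few memory-mapped controller registers and then setting a start bit; the controller then moves the data while the CPU is free for other work. The register layout and names below are hypothetical, for illustration only, not those of any real controller.

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers: source and
   destination addresses, a word count, and a control register whose
   bit 0 starts the transfer. */
typedef struct {
    volatile uint32_t src;
    volatile uint32_t dst;
    volatile uint32_t count;
    volatile uint32_t control;
} dma_regs;

#define DMA_START 0x1u

/* Program a block transfer and kick it off; after this the controller
   requests the bus from the CPU and performs the copy itself. */
void dma_copy(dma_regs *dma, uint32_t src, uint32_t dst, uint32_t words) {
    dma->src     = src;
    dma->dst     = dst;
    dma->count   = words;
    dma->control |= DMA_START;
}
```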

A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the buses to each other. Three reasons to do this are summarized below

o Higher speed buses may provide wider data connections

o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus

o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations

ARM Bus

o The ARM buses provided off-chip vary from chip to chip, as ARM CPUs are manufactured by many vendors

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)

o The AMBA specification includes two buses: AHB, the AMBA high-performance bus, and APB, the AMBA peripherals bus

o AHB: it is optimized for high-speed transfers and is directly connected to the CPU

o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters

o APB: it is simple and easy to implement, and consumes relatively little power

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller

o It does not perform pipelined operations, which simplifies the bus logic

SHARC Bus

o It contains both program and data memory on-chip

o There are two external interfaces of interest: the external memory interface and the host interface

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access

o Different units in the processor have different amounts of access to external memory: the DM bus and I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional

o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers also

MEMORY DEVICES

Important types of memory are RAMs and ROMs

Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs:

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled

o R/W' controls whether the current operation is a read (R/W'=1) from the RAM or a write (R/W'=0) to the RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
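A small behavioral model in C makes the CE' and R/W' encodings concrete. The function and array names are invented; the active-low conventions match the description above (CE'=0 enables the chip, R/W'=1 reads, R/W'=0 writes).

```c
#include <stdint.h>
#include <stdbool.h>

#define SRAM_WORDS 16

/* The storage array behind the model. */
static uint8_t cells[SRAM_WORDS];

/* Returns true on a completed access: sets *data on a read (R/W'=1),
   stores *data on a write (R/W'=0), and does nothing when the chip
   is not enabled (CE'=1), since the data pins are then disabled. */
bool sram_access(bool ce_n, bool rw_n, unsigned adrs, uint8_t *data) {
    if (ce_n || adrs >= SRAM_WORDS)
        return false;                /* chip disabled or bad address */
    if (rw_n)
        *data = cells[adrs];         /* read  */
    else
        cells[adrs] = *data;         /* write */
    return true;
}
```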

DRAM inputs and refresh

o They have two inputs in addition to those of static RAM: row address select (RAS') and column address select (CAS'), which are designed to minimize the number of required pins. These signals are needed because address lines are provided for only half the address

o DRAMs must be refreshed because they store values as charge, which can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh

It is a special quick refresh mode

This mode is initiated by setting CAS' to 0 first, then RAS' to 0

It causes the current memory row to get refreshed and the corresponding counter to be updated

A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate

Page mode

o Developed to improve the performance of DRAM

o Useful for accessing several locations in the same region of the memory

o In a page-mode access, one row address and several column addresses are supplied

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode

Synchronous DRAMs

o They were developed to improve the performance of DRAMs by introducing a clock

o Changes to the inputs and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation-induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While it uses the floating-gate principle, it is designed such that large blocks of memory can be erased all at once

It uses standard system voltages for erasing and programming, allowing it to be programmed in a typical system

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, as in digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons:

the DRAM's RAS/CAS multiplexing

the need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus: used in microcontroller-based systems

CAN (controller area network): developed for automotive electronics, it provides megabit rates and can handle large numbers of devices

Echelon LON network: used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol

This protocol enables peripheral ICs to communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers

The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically in a microcontroller-based system the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer occurs between the start and stop conditions

It is a well-known bus for linking a microcontroller with the rest of a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended bus (10-bit)

It has two lines: the serial data line (SDL) for data and the serial clock line (SCL), which indicates when valid data are on the line

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes act as slaves that only respond to requests from the masters

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition

o 0000000 is used for a general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master

The bus transaction is opened by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
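The address byte that starts a transaction packs the 7-bit address and the direction bit together; a one-line C helper shows the layout. The helper name is invented, and the 0x50 address in the test is an arbitrary EEPROM-style example address.

```c
#include <stdint.h>

#define I2C_READ  1u   /* slave-to-master */
#define I2C_WRITE 0u   /* master-to-slave */

/* Build the first byte of an I2C transaction: the 7-bit slave address
   in the upper bits and the data-direction bit in the LSB. */
uint8_t i2c_address_byte(uint8_t addr7, uint8_t read) {
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read & 1u));
}
```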

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but now it finds use in other applications also

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

Since it uses bit-serial transmission, CAN can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node pulls it down, making 0 dominant over 1

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame

Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between

The data field is from 0 to 8 bytes (64 bits), depending on the value given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier where it tried to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field thus acts as a priority identifier, with the all-0 identifier having the highest priority.
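The arbitration rule can be sketched in C: each node shifts out its identifier MSB first, the wired-AND bus makes any 0 dominant, and a node that sends a recessive 1 but reads back a dominant 0 drops out. The function below is a simulation sketch, not driver code; the 11-bit identifier width and the 32-node limit are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Simulate CAN-style bitwise arbitration among n_nodes nodes, each
 * driving an 11-bit identifier MSB first. The bus carries the AND of
 * all driven bits (dominant 0 wins); a node that sent 1 but reads 0
 * has lost and backs off. Returns the bit pattern left on the bus,
 * which is the winning (numerically lowest) identifier. */
uint16_t can_arbitrate(const uint16_t *ids, size_t n_nodes)
{
    int active[32];                   /* nodes still driving (<= 32) */
    for (size_t i = 0; i < n_nodes; i++) active[i] = 1;

    uint16_t winner = 0;
    for (int bit = 10; bit >= 0; bit--) {
        int bus = 1;                              /* recessive default */
        for (size_t i = 0; i < n_nodes; i++)
            if (active[i] && !((ids[i] >> bit) & 1))
                bus = 0;                          /* any 0 dominates */
        for (size_t i = 0; i < n_nodes; i++)
            if (active[i] && ((ids[i] >> bit) & 1) && bus == 0)
                active[i] = 0;                    /* sent 1, saw 0: lost */
        winner = (uint16_t)((winner << 1) | bus);
    }
    return winner;
}
```

Running this with identifiers 0x155, 0x0A3, and 0x700 leaves 0x0A3 on the bus, the lowest identifier, as the arbitration rule predicts.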

Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that holds the requested value.

Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS: These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).

A DTE can be a PC, a serial printer, or a plotter; a DCE can be a modem, mouse, digitizer, or scanner.

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions at the same time.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

The data bits are prefixed by a start bit and suffixed by a stop bit. In addition, a parity bit can be added, which is useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted.

The RS232 standard specifies a distance of 19.2 meters, but with good-quality RS232 cable a distance of up to 100 meters can be achieved.

The possible data rates depend upon the UART chip and the clock used.

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.

Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., up to 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6 data bits, two stop bits are appended.

Parity bit: added for error checking on the receiver side.

Flow control: useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission; it is also known as handshaking, and it can be implemented in hardware or in software.
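The framing described by these parameters can be illustrated with a small C function that packs one character into an asynchronous frame: a start bit (0), eight data bits LSB first, an even-parity bit, and a stop bit (1). This is a sketch for illustration, not a UART driver; the 8-data-bit, even-parity, one-stop-bit format is an assumption.

```c
#include <stdint.h>

/* Pack one character into an 11-bit asynchronous serial frame:
 * bit 0 = start bit (0), bits 1..8 = data LSB first, bit 9 = even
 * parity, bit 10 = stop bit (1). Bit 0 is sent first on the line. */
uint16_t uart_frame(uint8_t ch)
{
    uint16_t frame = 0;               /* start bit = 0 at position 0 */
    int ones = 0;

    for (int i = 0; i < 8; i++) {
        int b = (ch >> i) & 1;
        ones += b;
        frame |= (uint16_t)b << (1 + i);  /* data bits, LSB first */
    }
    frame |= (uint16_t)(ones & 1) << 9;   /* even parity bit */
    frame |= (uint16_t)1 << 10;           /* stop bit = 1 */
    return frame;
}
```

For example, the character 0x41 ('A') has two 1-bits, so its even-parity bit is 0 and the frame comes out as 0x482.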

RS232 connector configurations

The standard specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.

UART chip: universal asynchronous receiver/transmitter chip.

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it from serial to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel to serial format.

It also adds the start, stop, and parity bits.

The voltage levels used in RS232 are different from those used in embedded systems (5 V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 line and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as the transmission medium.

It uses balanced transmission.

Two channels are used, one each for the transmit and receive paths.

RS485

It is a variation of RS422, created to connect a number of devices (up to 512) in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication is possible; with two twisted pairs, full duplex can be achieved.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General-Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: This method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that can request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low level and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-logic OR operation.

The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure works as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 on its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther a device is from the first position, the lower is its priority.
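The PI/PO rules above condense into a short C sketch. Here requests[i] models device i's pending-interrupt flag, with device 0 closest to the CPU (highest priority); the function returns the index of the device that ends up with PI = 1 and PO = 0, i.e., the one that takes the grant.

```c
/* Sketch of daisy-chain interrupt acknowledgement. The CPU's
 * acknowledge enters device 0's PI input as 1. A requesting device
 * with PI = 1 takes the grant and drives PO = 0; a non-requesting
 * device passes PI through to the next device's PI unchanged.
 * Returns the granted device's index, or -1 if none requested. */
int daisy_chain_grant(const int *requests, int n)
{
    int pi = 1;                    /* acknowledge from the CPU */
    for (int i = 0; i < n; i++) {
        if (requests[i] && pi)
            return i;              /* PI = 1, PO = 0: this device wins */
        /* no request: PO = PI, so the grant travels down the chain */
    }
    return -1;                     /* no device was requesting */
}
```

With requests {0, 0, 1, 1}, device 2 wins: it is the first requesting device the acknowledge reaches, so device 3 never sees PI = 1.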

The slowest device participates in the control and data-transfer handshakes, so it determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.

In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.

The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures

1. Round robin
2. Round robin with interrupts
3. Function-queue scheduling
4. Real-time operating system (RTOS)

Choosing an Architecture: The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin

1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin

Pros: simple; no shared data; no interrupts.

Cons:
o The maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs a fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service the hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Round Robin with Interrupts

Pros:
o Still relatively simple
o Hardware timing requirements are better met

Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other task execution times + execution times of any other interrupts that occur

How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which the flags are checked (e.g., A B A C A D)
  - Improves the response of A
  - Increases the latency of the other tasks
o Move some task code into the interrupt
  - Decreases the response time of lower-priority interrupts
  - May not be able to ensure that lower-priority interrupt code executes fast enough
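A minimal sketch of the round-robin-with-interrupts architecture in C: the ISRs do only the urgent work and set a flag, and one pass of the main loop polls the flags and performs the lower-priority follow-up processing. The device names, flags, and counters here are hypothetical placeholders; a real build would hook the ISRs to actual interrupt vectors.

```c
#include <stdbool.h>

volatile bool flag_a = false, flag_b = false;  /* set by ISRs */
int handled_a = 0, handled_b = 0;              /* follow-up work done */

/* ISRs: service the hardware (urgent part, omitted) and set a flag. */
void isr_device_a(void) { flag_a = true; }
void isr_device_b(void) { flag_b = true; }

/* One pass of the main loop: check each flag in round-robin order
 * and do the non-urgent follow-up at task (non-interrupt) priority.
 * In a real system this sits inside an endless for(;;) loop. */
void main_loop_pass(void)
{
    if (flag_a) { flag_a = false; handled_a++; }
    if (flag_b) { flag_b = false; handled_b++; }
}
```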

Function-Queue Scheduling Architecture

o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls

Pros:
o The main routine can use any algorithm to choose the order in which to execute the functions (not necessarily FIFO)
o Better response time for the highest-priority task: its worst-case delay is the length of the longest function
o The best response time can be improved by cutting long functions into several pieces

Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
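A sketch of the function-queue idea in C, assuming a fixed-size queue and an integer priority per entry (lower number = higher priority, an assumption for illustration). A real system would protect fq_post/fq_pop against concurrent ISR access, for example by briefly disabling interrupts around them.

```c
#include <stddef.h>

#define QUEUE_MAX 8

typedef void (*task_fn)(void);

static task_fn queue_fn[QUEUE_MAX];
static int     queue_pri[QUEUE_MAX];
static size_t  queue_len = 0;

/* Called from an ISR: enqueue a function pointer with a priority.
 * Returns 0 on success, -1 if the queue is full. */
int fq_post(task_fn fn, int priority)
{
    if (queue_len == QUEUE_MAX) return -1;
    queue_fn[queue_len]  = fn;
    queue_pri[queue_len] = priority;
    queue_len++;
    return 0;
}

/* Called from the main loop: remove and return the highest-priority
 * entry (lowest priority number) - not necessarily FIFO order. */
task_fn fq_pop(void)
{
    if (queue_len == 0) return NULL;
    size_t best = 0;
    for (size_t i = 1; i < queue_len; i++)
        if (queue_pri[i] < queue_pri[best])
            best = i;
    task_fn fn = queue_fn[best];
    queue_len--;
    queue_fn[best]  = queue_fn[queue_len];  /* fill the hole */
    queue_pri[best] = queue_pri[queue_len];
    return fn;
}
```

The main routine then simply loops: pop a function pointer and call it if non-NULL.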

Real-Time Operating System Architecture

o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences from the previous architectures:
  - We don't write signaling flags (the RTOS takes care of them)
  - No loop in our code decides what is executed next (the RTOS does this)
  - The RTOS knows the relative task priorities and controls what is executed next
  - The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control both task response AND interrupt response

Real-Time Operating System

Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time stays relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSes generally come with vendor tools

Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling.

In sports tournaments and other games, round-robin scheduling arranges to have all teams or players take turns playing each other, with the winner emerging from the succession of events.

A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute, and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round-robin stories, with each person posting the next part of the story as part of an online conference thread.

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (this is also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks.

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = total time to complete: 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms
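The arithmetic behind this example can be captured in two small helper functions (the names are illustrative): a 250 ms job with a 100 ms quantum needs ceil(250/100) = 3 allocations, the last of which uses only 50 ms.

```c
/* Number of quantum allocations a job needs: ceiling division. */
int rr_slices_needed(int job_ms, int quantum_ms)
{
    return (job_ms + quantum_ms - 1) / quantum_ms;
}

/* CPU time actually used in the job's final allocation. */
int rr_last_slice_ms(int job_ms, int quantum_ms)
{
    int rem = job_ms % quantum_ms;
    return rem ? rem : quantum_ms;  /* full quantum if it divides evenly */
}
```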

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come, first-served queuing.

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
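The work-conserving selection rule can be sketched as a C function: starting after the flow served last, pick the next flow with queued packets, skipping empty queues so the link is never left idle while data is waiting. queue_len[] is a hypothetical per-flow packet count for illustration.

```c
/* Work-conserving round-robin flow selection. Returns the index of
 * the next flow to serve after last_served, skipping flows whose
 * queues are empty, or -1 if all queues are empty. */
int rr_next_flow(const int *queue_len, int n_flows, int last_served)
{
    for (int step = 1; step <= n_flows; step++) {
        int f = (last_served + step) % n_flows;  /* circular order */
        if (queue_len[f] > 0)
            return f;                            /* skip empty flows */
    }
    return -1;                                   /* nothing to send */
}
```

With queue lengths {0, 2, 0, 1} and flow 1 served last, the scheduler skips the empty flow 2 and serves flow 3 next.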

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited longest is given scheduling priority. It may not be desirable if the sizes of the data packets vary widely from one job to another: a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as Token Ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to "expensive" users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX: the same round-robin scheduler concept applies, and it can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously).

The scheduler is concerned mainly with:

o Throughput: the number of processes that complete their execution per time unit

o Latency, specifically:

  o Turnaround time: the total time between submission of a process and its completion

  o Response time: the amount of time from when a request is submitted until the first response is produced

o Fairness / waiting time: equal CPU time to each process (or, more generally, appropriate CPU time according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler implements a suitable compromise. Preference is given to any one of the concerns mentioned above depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which their functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates which processes are to run on a system, and the degree of concurrency to be supported at any one time: whether many or few processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process with a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or cooperative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic), in operating systems (to share CPU time among both threads and processes), in disk drives (I/O scheduling), in printers (print spooler), in most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as first come, first served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order in which they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high for the same reason.

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing

Shortest remaining time

Similar to shortest job first (SJF). With this strategy the scheduler arranges the processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process in a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than under FIFO, however, since no process has to wait for the termination of the longest process.
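The effect on waiting time can be seen with a small helper function (illustrative; it assumes non-preemptive runs to completion): each job's waiting time is the sum of the burst times of the jobs that run before it, so putting short jobs first shrinks the total.

```c
/* Total waiting time when jobs run to completion in the given order.
 * Job i waits for the combined burst time of all earlier jobs. */
int total_waiting_ms(const int *burst_ms, int n)
{
    int wait = 0, elapsed = 0;
    for (int i = 0; i < n; i++) {
        wait += elapsed;          /* job i waits for all earlier jobs */
        elapsed += burst_ms[i];
    }
    return wait;
}
```

For bursts of 24, 3, and 3 ms, FIFO order gives a total wait of 0 + 24 + 27 = 51 ms, while shortest-first order (3, 3, 24) gives only 0 + 3 + 6 = 9 ms.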

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm         CPU overhead   Throughput   Turnaround time   Response time
First in, first out          Low            Low          High              High
Shortest job first           Medium         High         Medium            Medium
Priority-based scheduling    Medium         Low          High              High
Round-robin scheduling       High           Medium       Medium            High
Multilevel queue scheduling  High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually carries a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program; from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of the program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o The CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o The RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8]     ; a comment

label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data:

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 at location 4, word 2 at location 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);  ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits long, a basic data word is 32 bits, and an address is 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex.

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are placed in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are the arithmetic status register (ASTAT), the sticky register (STKY), and mode register 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the corresponding ASTAT bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow yields the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before being operated on. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, either the highest-priority task or the first task waiting in the queue takes the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large, uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features have been introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, small code size, low power consumption, and small silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, which can also be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets

The mode field selects one of the six execution modes as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all operation modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions ARM provide several versions of multiplications These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o The instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Lower power consumption

o Less space occupied

o It is faster when memory is organized as 16 bits wide; however, ARM is faster when memory is organized as 32 bits wide

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two or more tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.

Task states under an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it blocks before the other one gets the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding implements the processes and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization: process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.


Mobile devices

Mobile phones, personal digital assistants, smart phones, and the like are examples of this category.

UNIT-II EMBEDDED COMPUTING PLATFORM

CPU BUS

It is the mechanism by which the CPU communicates with memory and devices.

A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory.

HANDSHAKE

The basic building block of most bus protocols is the four-cycle handshake.

It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.

It uses a pair of dedicated wires for the handshake: enq (enquiry) and ack (acknowledge). Extra wires are used for data transmission during the handshake.

The four cycles are:

o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.

o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point, devices 1 and 2 can transmit or receive.

o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.

o After seeing that ack has been released, device 1 lowers its output.

The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer.

Major components of the typical bus structure that supports read and write are

o Clock provides synchronization to the bus components

o R/W is true when the bus is reading and false when the bus is writing

o Address is an a-bit bundle of signals that transmits the address for an access

o Data is an n-bit bundle of signals that can carry data to or from the CPU

o Data ready signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers.

In a burst read transaction, the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address.

One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst.

Releasing the burst signal tells the device that enough data has been transmitted.

Disconnected transfers: in these buses, the request and response are separate.

A first operation requests the transfer. The bus can then be used for other operations, and the transfer is completed later, when the data are ready.

DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.

A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below

o Higher-speed buses may provide wider data connections

o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus

o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations

ARM Bus

o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)

o The AMBA specification includes two buses: the AHB (AMBA high-performance bus) and the APB (AMBA peripherals bus)

o AHB: it is optimized for high-speed transfers and is directly connected to the CPU

o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters

o APB: it is simple and easy to implement, and consumes relatively little power

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller

o It does not perform pipelined operations, which simplifies the bus logic

SHARC Bus

o It contains both program and data memory on-chip

o There are two external interfaces of interest: the external memory interface and the host interface

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access

o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels: the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional

o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled

o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
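As a quick check on the control signals just listed, here is a tiny behavioral decode function. It is a sketch for illustration (the enum names are invented), not a model of any particular SRAM part.

```c
/* Behavioral model of the SRAM control inputs described above:
   CE' gates everything; R/W' selects read (1) or write (0). */
typedef enum { SRAM_DISABLED, SRAM_READ, SRAM_WRITE } sram_op_t;

sram_op_t sram_decode(int ce_n, int rw_n) {
    if (ce_n) return SRAM_DISABLED;        /* CE' = 1: data pins disabled */
    return rw_n ? SRAM_READ : SRAM_WRITE;  /* R/W' = 1 read, = 0 write    */
}
```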

DRAM inputs and refresh

o DRAMs have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; the signals are needed because address lines are provided for only half the address

o DRAMs must be refreshed because the values they store can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh

It is a special quick refresh mode

This mode is initiated by setting CAS' to 0 first, then RAS' to 0

It causes the current memory row to be refreshed and the corresponding counter to be updated

A memory controller is a logic circuit, external to the DRAM, that performs CAS-before-RAS refresh on the entire memory at the required rate

Page mode

o Developed to improve the performance of DRAM

o Useful to access several locations in the same region of the memory

o In a page mode access, one row address and several column addresses are supplied

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode

Synchronous DRAMs

o They were developed to improve the performance of DRAMs by introducing a clock

o Changes to the inputs and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While using the floating-gate principle, it is designed so that large blocks of memory can be erased all at once

It uses standard system voltage for erasing and programming, allowing it to be programmed in a typical system

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, which is an advantage

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block must be read, the word within it updated, and then the block written back

IO devices

Timers and counters

AD and DA converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons

the DRAM's RAS/CAS multiplexing

the need to refresh

Device interfacing

Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed

An IO device typically requires a much smaller range of addresses than a memory so addresses must be decoded much more finely

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus used in microcontroller based systems

CAN (controller area network): developed for automotive electronics, it provides megabit rates and can handle a large number of devices

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol

This protocol enables peripheral ICs to communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing to an I2C bus include EEPROM, Flash, and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions

It is a well-known bus used to link microcontrollers with systems. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which confirms when valid data are on the line

Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters

It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition

o 0000000 is used for a general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes

The address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master

The bus transaction is signaled by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
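The address-plus-direction byte described above is easy to show in code. The sketch below simply composes that first byte of a transaction; it is illustrative (the function and macro names are invented) and does not drive any real bus.

```c
#include <stdint.h>

/* Compose the first byte of an I2C transaction: the 7-bit address in the
   upper bits, followed by the direction bit (0 = master writes to slave,
   1 = master reads from slave), as described in the notes above. */
#define I2C_WRITE 0
#define I2C_READ  1

uint8_t i2c_address_byte(uint8_t addr7, int read) {
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}

/* Address 0000000 is the general call (bus broadcast) address. */
int is_general_call(uint8_t addr7) { return (addr7 & 0x7F) == 0; }
```

For example, reading from a device at 7-bit address 0x50 puts 0xA1 on the bus as the first byte.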

CAN Bus

The controller area network(CAN) bus is a robust serial communication bus protocol for real time applications possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but now it finds use in other applications also

Some important characteristics of the CAN protocol are high-integrity serial data communication, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted pair connection of up to 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1

When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of a data frame: a data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 64 bytes depending on the value given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to let the identifier signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
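The arbitration rule above can be simulated bit by bit. The sketch below (illustrative only, not part of any CAN stack) models the wired-AND bus: a transmitted 0 is dominant, and a node that sends recessive but hears dominant drops out, so the lowest identifier always wins.

```c
#include <stdint.h>

/* Simulate CSMA/AMP arbitration over an 11-bit identifier field for up to
   16 nodes. Returns the index of the winning node (lowest identifier). */
int can_arbitrate(const uint16_t *ids, int n_nodes) {
    int active[16];
    for (int i = 0; i < n_nodes; i++) active[i] = 1;
    for (int bit = 10; bit >= 0; bit--) {          /* MSB of identifier first */
        int bus = 1;                               /* recessive unless pulled low */
        for (int i = 0; i < n_nodes; i++)
            if (active[i] && !((ids[i] >> bit) & 1))
                bus = 0;                           /* any dominant 0 wins the wire */
        for (int i = 0; i < n_nodes; i++)          /* sent 1 but heard 0: drop out */
            if (active[i] && ((ids[i] >> bit) & 1) && bus == 0)
                active[i] = 0;
    }
    for (int i = 0; i < n_nodes; i++)
        if (active[i]) return i;
    return -1;
}
```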

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row

The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged

COMMUNICATION INTERFACINGS These are used to communicate with the external world like transmitting data to a host PC or interacting with another embedded system for sharing data etc

RS232UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)

The data terminal equipment (DTE) can be a PC, serial printer, or plotter, and the data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended

Parity bit: it is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmission. It is also known as handshake. It can be of hardware type or software type
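The framing parameters above can be tied together with a small sketch that assembles one asynchronous character frame. This is an illustrative bit-pattern builder (the function name and 8E1 choice are assumptions for the example), not a UART driver.

```c
#include <stdint.h>

/* Assemble one asynchronous frame with 8 data bits, even parity, and one
   stop bit ("8E1"). Bit 0 of the result is sent first on the line:
   start bit (0), then data bits LSB first, then parity, then stop (1). */
uint16_t uart_frame_8e1(uint8_t ch) {
    uint16_t f = 0;                      /* bit 0: start bit = 0 */
    int ones = 0;
    for (int i = 0; i < 8; i++) {
        int b = (ch >> i) & 1;
        ones += b;
        f |= (uint16_t)b << (1 + i);     /* data bits, LSB first */
    }
    f |= (uint16_t)(ones & 1) << 9;      /* even parity: total 1s is even */
    f |= (uint16_t)1 << 10;              /* stop bit = 1 */
    return f;
}
```

At 9600 bps this 11-bit frame takes about 1.15 ms on the wire, which is why the effective character rate is always lower than the raw bit rate.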

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section

The receive section receives data, converts it from serial form into parallel form, and gives the data to the processor

The transmit section takes data from the processor and converts it from parallel format to serial format

It also adds the start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it is able to make RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices, up to 512, in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy chaining: this method is adopted to find the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing 0 in its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as below

A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 in its PO output. If the device does not have pending interrupts, it transmits the acknowledgement signal to the next device by placing a 1 in its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority
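The PI/PO rule just described can be captured in a few lines. The sketch below simulates acknowledge propagation down the chain and returns which device captures the grant (PI=1, PO=0); it is an illustrative model, not hardware description code.

```c
/* Daisy-chain acknowledge propagation: requesting[i] is 1 if device i has
   a pending interrupt (device 0 is closest to the CPU, highest priority).
   Returns the index of the device that captures the acknowledge, or -1. */
int daisy_chain_grant(const int *requesting, int n) {
    int pi = 1;                                   /* CPU asserts acknowledge */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i]) return i;        /* PI=1, PO=0: grant here  */
        pi = (pi && !requesting[i]) ? 1 : 0;      /* pass ack only if idle   */
    }
    return -1;                                    /* no device was requesting */
}
```

Note how a device farther down the chain can never win while a nearer device is requesting, which is exactly the position-based priority described above.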

The slowest device participates in the control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management), plus eight ground return lines

In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 computers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures

1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: the best architecture depends on several factors

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority

Round Robin

1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin Pros: simple, no shared data, no interrupts

Cons:

o Max delay is the max time to traverse the loop if all devices need to be serviced

o Architecture fails if any one device requires a shorter response time

o Most IO needs fast response time (buttons, serial ports, etc)

o Lengthy processing adversely affects even soft time deadlines

o Architecture is fragile to added functionality

o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices

o Watches

o Possibly microwave ovens

o Devices where operations are all user initiated and process quickly
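The round-robin main loop described above is short enough to show in full. This is a sketch with two hypothetical devices modeled by ready flags so that one pass can be exercised in isolation; a real system would poll hardware status registers inside the infinite loop.

```c
/* Round-robin architecture: no interrupts, each device polled and
   serviced in turn. Device names and flags are illustrative stand-ins. */
int a_ready, b_ready;           /* device status (stand-ins for hardware) */
int a_served, b_served;         /* counts of service routine invocations  */

void service_device_a(void) { a_served++; a_ready = 0; }
void service_device_b(void) { b_served++; b_ready = 0; }

void round_robin_pass(void) {   /* one trip around the main loop */
    if (a_ready) service_device_a();
    if (b_ready) service_device_b();
}

void round_robin_main(void) {   /* the actual architecture: loop forever */
    for (;;) round_robin_pass();
}
```

The worst-case delay for any device is one full traversal of the loop, which is exactly the fragility listed in the cons above.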

Round Robin with Interrupts

o Based on Round Robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service hardware and (b) set flags

o Main routine checks flags and does any lower-priority follow-up processing

o Why? Gives more control over priorities

Round Robin with Interrupts

Pros:

o Still relatively simple

o Hardware timing requirements better met

Cons:

o All task code still executes at the same priority

o Maximum delay unchanged

o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Adjustments:

o Change the order flags are checked (e.g. A, B, A, C, A, D): improves response of A, but increases latency of other tasks

o Move some task code into the interrupt: decreases response time of lower-priority interrupts, but may not be able to ensure lower-priority interrupt code executes fast enough
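The ISR-sets-flag, main-loop-follows-up split can be sketched as below. The device names are invented for illustration; the point is the division of labor, and the `volatile` qualifier on the flags, which is needed because they are written from interrupt context.

```c
/* Round robin with interrupts: ISRs do the urgent hardware work and set
   flags; the main loop polls the flags and does the follow-up processing. */
volatile int adc_flag, uart_flag;   /* set by ISRs, cleared by main loop */
int adc_handled, uart_handled;      /* follow-up work counters (demo)    */

void adc_isr(void)  { /* ...read converter result... */ adc_flag = 1; }
void uart_isr(void) { /* ...read received byte...    */ uart_flag = 1; }

void rri_main_pass(void) {          /* one trip around the main loop */
    if (adc_flag)  { adc_flag = 0;  adc_handled++;  /* lower-priority work */ }
    if (uart_flag) { uart_flag = 0; uart_handled++; /* lower-priority work */ }
}
```

Note that all the follow-up work still runs at a single priority in the main loop, which is the architecture's main limitation noted above.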

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue

o Main routine reads the queue and executes the calls

Pros:

o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)

o Better response time for the highest-priority task = length of the longest function code

o Can improve best response time by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
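A minimal function queue looks like the sketch below: ISRs post function pointers, and the main loop pops and calls them. A fixed-size FIFO is used here for simplicity; as noted above, a priority-ordered insert is an equally valid choice. All names are illustrative.

```c
#include <stddef.h>

/* Function-queue scheduling: a fixed-size FIFO of task function pointers. */
typedef void (*task_fn)(void);

#define QLEN 8
static task_fn queue[QLEN];
static int head, tail;

int fq_post(task_fn f) {             /* called from an ISR */
    int next = (tail + 1) % QLEN;
    if (next == head) return 0;      /* queue full: post rejected */
    queue[tail] = f;
    tail = next;
    return 1;
}

void fq_run_one(void) {              /* called from the main loop */
    if (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QLEN;
        f();                         /* run the queued follow-up work */
    }
}

/* Demo task so the queue can be exercised. */
int fq_calls;
void demo_task(void) { fq_calls++; }
```

In a real system the post would run with interrupts disabled (or use an atomic index update), since it is shared between ISR and main-loop context.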

Real Time Operating System

Architecture:

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for task code

o Differences with previous architectures:

o We don't write signaling flags (the RTOS takes care of it)

o No loop in our code decides what is executed next (the RTOS does this)

o The RTOS knows relative task priorities and controls what is executed next

o The RTOS can suspend a task in the middle to execute code of higher priority

o Now we can control task response AND interrupt response

Real Time Operating Systems

Pros:

o Worst-case response time for the highest-priority function is near zero

o The system's high-priority response time is relatively stable when extra functionality is added

o Useful functionality is pre-written

o Generally come with vendor tools

Cons:

o An RTOS has cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
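The allocation count in the worked example is just a ceiling division, which can be sketched as a one-line helper (illustrative, ignoring context-switch overhead):

```c
/* Number of quanta a job needs: ceiling of job length over quantum size,
   matching the example above (250 ms job, 100 ms quantum -> 3 slots,
   with the job self-terminating partway through the last one). */
int rr_allocations(int job_ms, int quantum_ms) {
    return (job_ms + quantum_ms - 1) / quantum_ms;   /* ceiling division */
}
```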

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come first-served queuing.

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel, in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as token ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum-throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows the same concept and can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

Throughput - the number of processes that complete their execution per unit time

Latency, specifically:

o Turnaround time - the total time between submission of a process and its completion

o Response time - the time from when a request is submitted until the first response is produced

Fairness / waiting time - equal CPU time for each process (or, more generally, appropriate time according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. whether many or few processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory. [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembly language, because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (the print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or OFDMA multi-carriers or other frequency-domain equalization components are assigned to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reason.

No prioritization occurs, so this system has trouble meeting process deadlines.

The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than with FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit.

Throughput sits between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor; waiting time is dependent on the number of processes, not on the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview:

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel Queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, from which the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated against each other at the same time. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks, such as string searching; they also generally use a number of different instruction formats of varying lengths.

o CISC architectures were developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions. The instructions are also chosen so that they can be efficiently executed in pipelined processors.

o RISC architectures are optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o They require a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures.

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example:

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example:

R1=DM(M0,I0), R2=PM(M8,I8);    ! a comment
label: R3=R1+R2;

SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex.

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing; they are called data address generators (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a high-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than one, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o Displaying output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o A load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features are introduced:

o Each instruction controls the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features result in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, but it can be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits, when set, disable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code; once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions ARM provide several versions of multiplications These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n

Execution of the instruction causes SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on raising of an exception the processor always enters into the

ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they then signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task

Under most RTOSs, a task is simply a subroutine

At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority and the memory location for the task's stack

Most RTOSs allow as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared data problem

Shared data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic

Atomic section: a part of a program that cannot be interrupted

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

Task states under an RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever

Scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it goes to the blocked state before the other gets the processor

If a higher priority task unblocks what happens to the running task

In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can specify any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process

This model has five major phases

o Requirements Analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding implements the pieces and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that with too many spirals, it may take too long when design time is a major requirement

Its advantage is that it adopts a successive refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs


The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer

Major components of the typical bus structure that supports read and write are

o Clock provides synchronization to the bus components

o R/W′ is true when the bus is reading and false when the bus is writing

o Address is an a-bit bundle of signals that transmits the address for an access

o Data is an n-bit bundle of signals that can carry data to or from the CPU

o Data ready signals when the values on the data bundle are valid

Burst transfer: handshaking signals are also used for burst transfers

In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address

One extra line, called burst′, is added to the bus; it signals when a transaction is actually a burst

Releasing the burst signal actually tells the device that enough data has been transmitted

Disconnected transfers: in these buses, the request and response are separate

A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready

DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory

A microprocessor system often has more than one bus, with high speed devices connected to a high performance bus and lower speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below

o Higher speed buses may provide wider data connections

o A high speed bus usually requires more expensive circuits and connectors The

cost of low-speed devices can be held down by using a lower-speed lower-cost bus

o Bridge may allow the buses to operate independently thereby providing some

parallelism in IO operations

ARM Bus

o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)

o The AMBA specification includes two buses: AHB, the AMBA high performance bus, and APB, the AMBA peripherals bus

o AHB: it is optimized for high speed transfers and is directly connected to the CPU

o It supports several high performance features: pipelining, burst transfers, split transactions, and multiple bus masters

o APB: it is simple and easy to implement, and consumes relatively little power

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller

o It does not perform pipelined operations, which simplifies the bus logic

SHARC Bus

o It contains both program and data memory on-chip

o There are two external interfaces of interest: the external memory interface and the host interface

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access

o Different units in the processor have different amounts of access to external memory: the DM bus and the IO processor can access the entire external address space, while the PM address bus can access only 12 Mwords

o The external memory is divided into four banks of equal size The memory above the

banks is known as unbanked external memory

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting

o The SHARC includes an on-board DMA controller as part of the IO processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels: the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional

o Each DMA channel has its own interrupt and the DMA controller supports chained

transfers also

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs

o CE′ is the chip enable input. When CE′=1 the SRAM's data pins are disabled, and when CE′=0 the data pins are enabled

o R/W′ controls whether the current operation is a read (R/W′=1) from RAM or a write to (R/W′=0) RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W′=1 the pins are outputs, and when R/W′=0 the pins are inputs

DRAMs inputs and refresh

o They have two inputs in addition to those of static RAM: row address select (RAS′) and column address select (CAS′). These signals are designed to minimize the number of required pins; they are needed because address lines are provided for only half the address

o DRAMs must be refreshed because the values they store can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh

It is a special quick refresh mode

This mode is initiated by setting CAS′ to 0 first and then RAS′ to 0

It causes the current memory row to get refreshed and the corresponding counter updated

A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate

Page mode

o Developed to improve the performance of DRAM

o Useful to access several locations in the same region of the memory

o In page mode access one row address and several column addresses are supplied

o RAS′ is held down while CAS′ is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS′, rather than its rising edge as in page mode

Synchronous DRAMs

o It is developed to improve the performance of the DRAMs by introducing a clock

o Changes to input and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display while the parallel

interface to the microprocessor

RAMBUS

o It is high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1Gbytessec

ROMs are

o programmed with fixed data

o very useful in embedded systems since a great deal of the code and perhaps

some data does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory (or mask) programmed ROMs and field programmable ROMs

o Factory programming is useful only when ROMs are installed in some quantity

Field programmable ROMs are programmed in the laboratory using ROM burners

o Field programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While using floating gate principle it is designed such that large blocks of memory can be erased all at once

It uses standard system voltage for erasing and programming allowing programming in a typical system

Early flash memories had to be erased in their entirety but modern devices allow the memory to be erased in blocks an advantage

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory systems like digital cameras TV set-top boxes cell phones and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back

IO devices

Timers and counters

AD and DA converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons

DRAM's RAS/CAS multiplexing

The need to refresh

Device interfacing

Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed

An IO device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus used in microcontroller based systems

CAN (controller area network): developed for automotive electronics, it provides megabit rates and can handle large numbers of devices

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I 2 C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two wire serial bus protocol

This protocol enables peripheral ICs communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing an I2C bus include EPROMs Flash and some RAM memory devices real time clocks watch dog timers and microcontrollers

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions

It is a well known bus for linking microcontrollers into systems. It is low cost, easy to implement, and of moderate speed: up to 100 kbit/s for the standard (7-bit) bus and up to 3.4 Mbit/s for the extended (10-bit) bus

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line

Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters

It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition

o 0000000 is used for general call or bus broadcast useful to signal all

devices simultaneously

o 11110XX is reserved for the extended 10 bit addressing scheme

Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master

The bus transaction is opened by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1 to 0 transition on SDL, and a stop is signaled by leaving SCL high and sending a 0 to 1 transition on SDL

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus

CAN Bus

The controller area network(CAN) bus is a robust serial communication bus protocol for real time applications possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also

Some important characteristics of the CAN protocol are high integrity serial data communications, real-time support, data rates of up to 1 Mbit/s, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

It uses bit-serial transmission; it can run at rates of up to 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down by making 0 dominant over 1

When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 8 bytes, depending on the value given in the control field

A cyclic redundancy check (CRC)is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier where it tried to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row

The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged

COMMUNICATION INTERFACINGS These are used to communicate with the external world, such as transmitting data to a host PC or interacting with another embedded system for sharing data

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect data terminal equipment (DTE) to data circuit-terminating equipment (DCE)

The data terminal equipment (DTE) can be a PC, serial printer, or plotter, and the data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions

In RS232 the sender gives out data character by character The bits corresponding to a characters are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, …, 115200 bps

Data bits: the number of bits transmitted for each character; it can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended

Parity bit It is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmissions, also known as handshaking. It can be hardware or software based

RS232 connector configurations

It specifies two types of connectors 9 pin and 25 pin

For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections receive section and transmit section

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor

Transmit section takes the data from the processor converts the data from parallel format to serial format

It also adds start stop and parity bits

The voltage levels used in RS232 are different from that of embedded systems (5V) And RS232 uses data in serial form where as processor uses the data in parallel form The importance of UART chip lies in the fact it is able to make the RS232 and processor compatible

UART operates on 5volts The level conversion is done by the level shifter and the signals are then passed on to the RS232 connector

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices (up to 512) in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input of the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device whose request the CPU acknowledges accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows.

A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 in its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 in its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
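The PI/PO logic described above can be expressed in a few lines of C. This is a software model for study purposes only (the function name and array representation are assumptions); in hardware the same decision is made by combinational logic in each device.

```c
#include <assert.h>

/* Model of daisy-chain priority resolution. requesting[i] is 1 if device i
   (device 0 = highest priority) has a pending interrupt. The acknowledge
   enters device 0 with PI = 1; a requesting device with PI = 1 keeps the
   acknowledge (drives PO = 0), otherwise it passes PI through unchanged.
   Returns the index of the device that wins, or -1 if none is requesting. */
int daisy_chain_grant(const int *requesting, int ndev)
{
    int pi = 1;                    /* CPU asserts acknowledge toward device 0 */
    for (int i = 0; i < ndev; i++) {
        if (pi && requesting[i])
            return i;              /* PI = 1 and request pending: PO = 0, grant here */
        /* not requesting: PO = PI, acknowledge ripples to the next device */
    }
    return -1;
}
```

Note how a device's position in the chain, not any programmable register, fixes its priority, which matches the last sentence above.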

The slowest device participates in the control and data-transfer handshakes, and so determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshaking and five for bus management) plus eight ground-return lines.

In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS

Choosing an Architecture: the best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)

Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality: adding one more device to the loop may break everything
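The round-robin main loop can be sketched in miniature. In this study sketch the "devices" are plain ready flags and servicing appends to a log so the polling order is visible; the names are invented for the example, and a real loop would poll hardware status registers instead.

```c
#include <assert.h>
#include <string.h>

#define NDEV 3

static int ready[NDEV];        /* stand-ins for device status registers */
static char log_buf[32];       /* records the order devices were serviced */

static void service(int dev)
{
    size_t n = strlen(log_buf);
    log_buf[n] = (char)('A' + dev);   /* device 0 = 'A', 1 = 'B', ... */
    log_buf[n + 1] = '\0';
    ready[dev] = 0;                   /* device handled for now */
}

/* One trip around the round-robin loop: check each device in fixed order
   and service whichever needs it. No interrupts, no priorities. */
void round_robin_pass(void)
{
    for (int d = 0; d < NDEV; d++)
        if (ready[d])
            service(d);
}
```

The fixed loop order is exactly why one slow device delays everything behind it: `service()` must return before the next flag is even looked at.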

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged: worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?

Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g. ABABAD)
  - Improves the response of A
  - Increases the latency of other tasks
o Move some task code into the interrupt
  - Decreases response time of lower-priority interrupts
  - May not be able to ensure lower-priority interrupt code executes fast enough
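The "interrupt sets a flag, main loop follows up" pattern can be sketched as below. The ISR is called directly here so the sketch runs anywhere; on real hardware it would be wired to an interrupt vector, and the names are invented for the example.

```c
#include <assert.h>

/* Data shared between ISR and main loop is marked volatile so the
   compiler re-reads it on every access. */
volatile int rx_flag = 0;
volatile unsigned char rx_byte;
int processed = -1;

/* The interrupt does only the urgent work: capture the byte before it is
   overwritten, then set a flag for the main loop. */
void uart_isr(unsigned char b)
{
    rx_byte = b;
    rx_flag = 1;
}

/* The main loop does the lower-priority follow-up processing at its leisure. */
void main_loop_once(void)
{
    if (rx_flag) {
        rx_flag = 0;
        processed = rx_byte;
    }
}
```

Note that `processed` is only updated when the main loop next gets around to the flag, which is why all task code still runs at one priority and the worst-case delay is unchanged.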

Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
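A minimal function queue can be sketched with a ring buffer of function pointers. This sketch uses FIFO order and omits the interrupt locking a real system would need around enqueue/dequeue; the names and queue size are assumptions for the example.

```c
#include <assert.h>

typedef void (*task_fn)(void);

#define QSIZE 8
static task_fn queue[QSIZE];
static int head = 0, tail = 0, count = 0;

/* Called from an ISR in a real system (with interrupts disabled around it). */
int enqueue_task(task_fn f)
{
    if (count == QSIZE) return 0;      /* queue full */
    queue[tail] = f;
    tail = (tail + 1) % QSIZE;
    count++;
    return 1;
}

/* Main routine: pop and call the next queued function. FIFO here, but the
   main routine could instead pick the highest-priority entry. */
int run_next_task(void)
{
    if (count == 0) return 0;          /* nothing to do */
    task_fn f = queue[head];
    head = (head + 1) % QSIZE;
    count--;
    f();
    return 1;
}

/* Sample tasks for demonstration. */
static int a_runs = 0, b_runs = 0;
static void task_a(void) { a_runs++; }
static void task_b(void) { b_runs++; }
```

The worst-case wait for the highest-priority function is one longest-function execution, since the main routine only consults the queue between calls, never during one.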

Real Time Operating System

Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from previous architectures:
  - We don't write signaling flags (the RTOS takes care of it)
  - No loop in our code decides what is executed next (the RTOS does this)
  - The RTOS knows relative task priorities and controls what is executed next
  - The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
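The job1 example above can be worked in code. A small sketch (function name assumed for illustration) counts how many time slices a job needs and how long the final, possibly partial, slice runs:

```c
#include <assert.h>

/* A job needing 'total' ms of CPU is served in slices of 'quantum' ms.
   Reports the number of allocations and the length of the last one. */
void rr_allocations(int total, int quantum, int *slices, int *last_ms)
{
    int remaining = total;
    *slices = 0;
    *last_ms = 0;
    while (remaining > 0) {
        int run = (remaining < quantum) ? remaining : quantum;
        remaining -= run;
        (*slices)++;
        *last_ms = run;    /* final slice may be shorter than the quantum */
    }
}
```

With total = 250 ms and quantum = 100 ms this reproduces the three allocations listed above, the last lasting only 50 ms before job1 self-terminates.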

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can be created by using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

o Throughput - the number of processes that complete their execution per unit time
o Latency, specifically:
  - Turnaround time - total time between submission of a process and its completion
  - Response time - amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet-radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel-state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

First In First Out, also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing
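The FCFS waiting-time behaviour described above can be made concrete with a small sketch (function name assumed for illustration): each job waits for the total burst time of everything ahead of it in the queue.

```c
#include <assert.h>

/* Total waiting time under FCFS: jobs run in arrival order, so job i
   waits for the sum of the burst times of jobs 0..i-1. */
int fcfs_total_waiting(const int *burst, int n)
{
    int wait = 0, elapsed = 0;
    for (int i = 0; i < n; i++) {
        wait += elapsed;       /* job i has waited 'elapsed' ms so far */
        elapsed += burst[i];   /* then it runs to completion */
    }
    return wait;
}
```

With bursts {24, 3, 3} the total wait is 0 + 24 + 27 = 51 ms: one long process hogging the CPU drives up the waiting time of everything behind it. Running the same jobs shortest-first, {3, 3, 24}, drops the total to 0 + 3 + 6 = 9 ms, which is the intuition behind Shortest Job First in the next subsection.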

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs; it is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
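A typical LED-debugging helper looks like the sketch below. GPIO_OUT is a hypothetical stand-in for a memory-mapped output register (modeled here as an ordinary variable so the sketch runs anywhere); on real hardware its address and bit layout come from the board's datasheet.

```c
#include <assert.h>

static unsigned int GPIO_OUT;          /* stand-in for a hardware output register */

#define LED_ERROR (1u << 0)            /* lit when an error path is entered */
#define LED_IDLE  (1u << 1)            /* toggled in the idle loop */

void led_on(unsigned int mask)     { GPIO_OUT |=  mask; }
void led_off(unsigned int mask)    { GPIO_OUT &= ~mask; }
void led_toggle(unsigned int mask) { GPIO_OUT ^= mask; }
```

Calling `led_toggle(LED_IDLE)` once per main-loop pass gives a visible "heartbeat": if the blinking stops, the loop is stuck, with no debugger attached at all.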

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each.

Hardware/software co-verification: it allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide some what fewer and simpler instructions The instructions were also chosen so that they could be efficiently executed in pipelined processors

o RISC architecture is optimized to achieve short clock cycles small numbers of

cycles per instruction and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line

Example

LDR r0,[r8] ; a comment

LABEL ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
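The two byte orderings and the byte addressing of words can be illustrated with a small C sketch (an illustration only; the helper names are ours, not part of any ARM toolchain):

```c
#include <stdint.h>
#include <assert.h>

/* Store a 32-bit word into a 4-byte buffer in the chosen byte order,
   mirroring the two modes the ARM can select at power-up. */
void store_word(uint8_t buf[4], uint32_t w, int big_endian) {
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? (24 - 8 * i) : (8 * i);
        buf[i] = (uint8_t)(w >> shift);
    }
}

/* Byte address of word n in the ARM address space: words are 4 bytes apart. */
uint32_t word_address(uint32_t n) { return n * 4; }
```

In little-endian mode the low-order byte lands at the lowest byte address of the word; in big-endian mode it lands at the highest.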

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

LABEL: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word is 32 bits, and an address is 32 bits

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
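Saturation on overflow can be sketched in C (a sketch of the behavior only, not of the SHARC hardware):

```c
#include <stdint.h>
#include <assert.h>

/* Saturating 32-bit signed addition, as the SHARC does when the ALUSAT
   bit in the MODE1 register is set: on overflow the result clamps to
   the maximum-range value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t r = (int64_t)a + (int64_t)b;   /* compute in a wider type */
    if (r > INT32_MAX) return INT32_MAX;   /* clamp positive overflow */
    if (r < INT32_MIN) return INT32_MIN;   /* clamp negative overflow */
    return (int32_t)r;
}
```

For example, adding 1 to INT32_MAX yields INT32_MAX rather than wrapping to a negative value.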

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. A shift operation sets the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that are used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform
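The post-modify and circular-buffer behavior described above can be sketched in C (register and field names are illustrative, not the SHARC's):

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of SHARC-style post-modify addressing: the I register supplies
   the address for this access, then is updated by the modifier M. When a
   buffer length is set, the index wraps back within the buffer, as the
   DAGs' circular-buffer support does for signal processing. */
typedef struct { uint32_t i, base, length; } dag_reg;

uint32_t postmodify(dag_reg *r, int32_t m) {
    uint32_t addr = r->i;                 /* address used by this access */
    r->i += m;                            /* then update by the modifier */
    if (r->length && r->i >= r->base + r->length)
        r->i = r->base + (r->i - r->base) % r->length;  /* circular wrap */
    return addr;
}
```

A sweep through a four-word circular buffer at base 100 visits 100, 101, 102, 103 and then wraps back toward the base.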

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management

Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority task (in a preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
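The three definitions above reduce to simple sums, sketched below with hypothetical cycle counts (the struct and field names are ours, for illustration only):

```c
#include <assert.h>

/* Component times, in arbitrary units (e.g. CPU cycles); all figures
   used with this struct are hypothetical. */
typedef struct {
    unsigned max_disabled;  /* longest interval with interrupts disabled */
    unsigned start_isr;     /* time to start the first ISR instruction   */
    unsigned save_ctx;      /* time to save the CPU register context     */
    unsigned restore_ctx;   /* time to restore a CPU register context    */
    unsigned ret_instr;     /* time for the return-from-interrupt        */
    unsigned sched_check;   /* time to check for a ready higher-priority task */
} irq_times;

unsigned latency(const irq_times *t)  { return t->max_disabled + t->start_isr; }
unsigned response(const irq_times *t) { return latency(t) + t->save_ctx; }

/* Preemptive-kernel recovery: scheduler check + context restore + return. */
unsigned recovery_preemptive(const irq_times *t) {
    return t->sched_check + t->restore_ctx + t->ret_instr;
}
```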

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
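The create/acquire/release calls above can be sketched as a minimal counting semaphore in C (no real blocking or waiting list; the function names are illustrative, not from any particular RTOS API):

```c
#include <assert.h>

/* Minimal counting-semaphore sketch. A real kernel would block the
   calling task when the count is zero; here acquire simply fails. */
typedef struct { int count; } semaphore;

void sem_create(semaphore *s, int initial) { s->count = initial; }

int sem_acquire(semaphore *s) {      /* returns 1 on success */
    if (s->count > 0) { s->count--; return 1; }
    return 0;                        /* would block in a real kernel */
}

void sem_release(semaphore *s) { s->count++; }
```

With an initial count of 1 this behaves as a binary semaphore: a second acquire fails until the first holder releases.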

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list

Some of the applications of message queues are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message
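The deposit/take pattern above can be sketched as a fixed-length queue in C (fixed depth, no waiting lists; names and the depth of 4 are illustrative):

```c
#include <assert.h>

/* Fixed-length message queue sketch: a task or ISR deposits at the
   tail with q_send, and a waiting task takes from the head with
   q_receive. */
#define QLEN 4
typedef struct { int buf[QLEN]; int head, count; } msgq;

int q_send(msgq *q, int msg) {               /* returns 1 on success */
    if (q->count == QLEN) return 0;          /* queue full */
    q->buf[(q->head + q->count++) % QLEN] = msg;
    return 1;
}

int q_receive(msgq *q, int *msg) {           /* returns 1 on success */
    if (q->count == 0) return 0;             /* nothing waiting */
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```

Messages come out in the order they were deposited; a real RTOS queue adds blocking and task waiting lists on top of this core.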

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are

o Large uniform register file, with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all the ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15 ----------- it is the program counter but can be manipulated as a general-purpose register

R13----------- it is used as a stack pointer

R14 ----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags mask normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets

The mode field selects one of the six execution modes as follows:

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format

ARM instruction sets

It has two instruction sets: 32-bit ARM and 16-bit THUMB

ARM: it is the standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations, on the other hand, can be carried out via multiple-register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instructions: in the ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit. However, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task

Under most RTOSs, a task is simply a subroutine

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem

Shared-data problem: it is one that arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic

An atomic section is a part of a program that cannot be interrupted
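The shared-data problem can be demonstrated in C: the task code reads one half of a two-part value, the "ISR" updates both halves in between, and the task then combines a stale half with a fresh one. Disabling interrupts (simulated here by a flag; on real hardware it would be the enable/disable instructions) makes the read atomic:

```c
#include <assert.h>

/* lo and hi are shared between the task code and the "ISR". */
static unsigned lo = 0xFFFF, hi = 0x0000;
static int interrupts_enabled = 1;   /* stand-in for the real interrupt mask */

void isr_update(unsigned new_hi, unsigned new_lo) { hi = new_hi; lo = new_lo; }

/* Non-atomic: the ISR may fire after this read, before hi is read. */
unsigned read_first_half(void) { return lo; }

/* Atomic section: interrupts off around the two-part read. */
unsigned read_atomic(void) {
    interrupts_enabled = 0;
    unsigned v = (hi << 16) | lo;
    interrupts_enabled = 1;
    return v;
}
```

In the torn read below, the combined value matches neither the old value nor the new one, which is exactly the bug described above.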

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
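The first rule can be illustrated in C: the non-reentrant version below keeps its result in a shared static buffer, which a second caller clobbers, while the reentrant version uses only caller-supplied storage (both functions are our own illustrations):

```c
#include <assert.h>
#include <string.h>

static char shared_buf[16];              /* shared state: non-reentrant */

/* Non-reentrant: a task switch between the call and the use of the
   result lets another caller overwrite shared_buf. */
const char *to_hex_nonreentrant(unsigned v) {
    for (int i = 3; i >= 0; i--, v >>= 4)
        shared_buf[i] = "0123456789ABCDEF"[v & 0xF];
    shared_buf[4] = '\0';
    return shared_buf;
}

/* Reentrant: all state lives on the caller's stack. */
char *to_hex_reentrant(unsigned v, char out[5]) {
    for (int i = 3; i >= 0; i--, v >>= 4)
        out[i] = "0123456789ABCDEF"[v & 0xF];
    out[4] = '\0';
    return out;
}
```

Two tasks calling to_hex_reentrant with their own buffers cannot disturb each other; two callers of the non-reentrant version share one buffer, so the second call destroys the first result.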

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is started.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
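The scheduling rule described above, picking the highest-priority task that is not blocked, can be sketched in C (the states and fields are illustrative, not from a specific RTOS):

```c
#include <assert.h>

enum state { BLOCKED, READY, RUNNING };
typedef struct { int priority; enum state st; } task;

/* Returns the index of the task to run: the highest-priority task that
   is not blocked, or -1 if all tasks are blocked (the scheduler would
   then spin, waiting for an interrupt to unblock something). */
int schedule(const task *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (tasks[i].st != BLOCKED &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = i;
    return best;
}
```

In a preemptive kernel this selection is re-run whenever a task unblocks, so a newly ready higher-priority task immediately displaces the running one.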

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process

This model has five major phases

o Requirements: analysis determines the basic characteristics of the system

o Architecture: design decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations

ARM Bus

o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors

o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)

o The AMBA specification includes two buses: the AMBA high-performance bus (AHB) and the AMBA peripherals bus (APB)

o AHB: it is optimized for high-speed transfers and is directly connected to the CPU

o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters

o APB: it is simple and easy to implement, and consumes relatively little power

o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller

o It does not perform pipelined operations which simplifies the bus logic

SHARC Bus

o It contains both program and data memory on-chip

o There are two external interfaces of interest: the external memory interface and the host interface

o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data

o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access

o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords

o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory

o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting

o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels: the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional

o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs:

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled

o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM

o Adrs specifies the address for the read or write

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs

DRAM inputs and refresh

o They have two inputs in addition to those of the static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; the signals are needed because address lines are provided for only half the address

o DRAMs must be refreshed because the values they store can leak away. A single refresh request can refresh an entire row of the DRAM

o CAS-before-RAS refresh

It is a special quick refresh mode

This mode is initiated by setting CAS' to 0 first, then RAS' to 0

It causes the current memory row to be refreshed and the corresponding counter to be updated

A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate

Page mode

o Developed to improve the performance of DRAM

o Useful to access several locations in the same region of the memory

o In a page mode access, one row address and several column addresses are supplied

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses

o It is typically supported for both reads and writes

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode

Synchronous DRAMs

o They were developed to improve the performance of DRAMs by introducing a clock

o Changes to the inputs and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation-induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s

While it uses the floating-gate principle, it is designed so that large blocks of memory can be erased all at once

It uses standard system voltages for erasing and programming, allowing it to be programmed in a typical system

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as in digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back

IO devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:

DRAM's RAS/CAS multiplexing

The need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus used in microcontroller based systems

CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol

This protocol enables peripheral ICs to communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing to an I2C bus include EPROM, Flash, and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions

It is a well-known bus used to link a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes may act as slaves that only respond to requests from the masters

It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10 bit addressing scheme

Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave, 1 for reading from slave to master

The bus transaction is signaled by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
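As a concrete illustration of the address byte layout described above, the following C sketch (the helper name is hypothetical) packs a 7-bit slave address with the direction bit in the least significant position:

```c
#include <stdint.h>

/* Hypothetical helper: build the first byte of an I2C transaction.
   The 7-bit slave address occupies bits 7..1; bit 0 is the data
   direction: 0 = master writes to slave, 1 = master reads. */
uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1u : 0u));
}
```

For example, a write to address 0x50 produces the byte 0xA0, and the general call address 0000000 with the write bit gives 0x00.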

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of a data frame: A data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 64 bytes depending on the value given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
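The arbitration rule above can be illustrated with a small C simulation (a hypothetical sketch, not taken from the CAN specification): identifiers go out MSB first over a wired-AND bus, and a node that sends a recessive 1 while the bus reads a dominant 0 drops out, so the numerically lowest identifier wins.

```c
#include <stdint.h>

/* Simulate CSMA/AMP arbitration among up to 32 nodes with 11-bit
   identifiers.  The bus is dominant (0) if any active node drives 0;
   a node that sends recessive 1 but observes 0 loses and backs off.
   Returns the index of the winning node. */
int can_arbitrate(const uint16_t *ids, int n)
{
    int active[32];
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = 10; bit >= 0; bit--) {
        int bus = 1;                         /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i] && !((ids[i] >> bit) & 1))
                bus = 0;                     /* someone drives dominant */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1) && bus == 0)
                active[i] = 0;               /* lost arbitration, back off */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;
    return -1;
}
```

With identifiers 0x025, 0x024 and 0x100, the node sending 0x024 wins, matching the all-0-is-highest-priority rule.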

Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged

COMMUNICATION INTERFACINGS

These are used to communicate with the external world, such as transmitting data to a host PC or interacting with another embedded system to share data

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)

A data terminal equipment (DTE) can be a PC, serial printer, or plotter; a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, …, 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended

Parity bit It is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at a rate too high to be absorbed by the receiver. It is a protocol to stop and resume data transmissions, also known as handshaking. It can be of hardware or software type
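A minimal C sketch (names hypothetical) of the asynchronous framing described by the parameters above, assuming 8 data bits, even parity, and one stop bit:

```c
#include <stdint.h>

/* Build the 11-bit on-wire frame for one character: start bit (0),
   8 data bits sent LSB first, an even-parity bit, and a stop bit (1).
   frame[0] is transmitted first.  Returns the number of bits. */
int uart_frame_8e1(uint8_t ch, int frame[11])
{
    int parity = 0;
    frame[0] = 0;                      /* start bit */
    for (int i = 0; i < 8; i++) {
        frame[1 + i] = (ch >> i) & 1;  /* data bits, LSB first */
        parity ^= frame[1 + i];
    }
    frame[9] = parity;                 /* even parity: XOR of data bits */
    frame[10] = 1;                     /* stop bit */
    return 11;
}
```

Both ends must agree on all of these parameters, which is exactly why the list above must be set identically on both systems.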

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin

For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section

The receive section receives data, converts it from serial to parallel form, and gives the data to the processor

The transmit section takes data from the processor and converts it from parallel to serial format

It also adds the start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices up to 512 in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy Chaining: This method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation.

The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request accepts the acknowledgement from the CPU and does not pass the acknowledge signal on to the next device. The procedure is defined as follows: a device with a 0 at its PI input places a 0 on its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first; the farther a device is from the first position, the lower its priority.
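The PI/PO grant logic above can be modeled with a short C sketch (function name hypothetical): the acknowledge enters the chain at the highest-priority device, and the first requesting device captures it.

```c
/* Hypothetical model of daisy-chain grant propagation: the CPU's
   acknowledge enters device 0's PI input as 1; a device that is
   requesting an interrupt and sees PI = 1 captures the grant (forcing
   PO = 0), while a device with no pending interrupt passes PI through
   to PO.  Returns the index of the device that wins, or -1 if no
   device is requesting. */
int daisy_chain_grant(const int *requesting, int n)
{
    for (int i = 0; i < n; i++) {   /* device 0 has the highest priority */
        if (requesting[i])
            return i;               /* PI = 1, PO = 0: grant captured here */
        /* otherwise PO = PI and the acknowledge ripples onward */
    }
    return -1;                      /* no pending interrupt anywhere */
}
```

This reflects the rule that PI = 1 and PO = 0 marks the highest-priority requesting device.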

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines

In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like

The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI Standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures

1 Round robin

2 Round robin with interrupts

3 Function queue scheduling

4 Real-time operating system (RTOS)

Choosing an Architecture

The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority

Round Robin

1 Simplest architecture

2 No interrupts

3 Main loop checks each device one at a time and services whichever needs to be serviced

4 Service order depends on position in the loop

5 No priorities

6 No shared data

7 No latency issues (other than waiting for other devices to be serviced)

Round Robin Pros and Cons

Pros: simple; no shared data; no interrupts

Cons:

o Max delay is the max time to traverse the loop if all devices need to be serviced

o Architecture fails if any one device requires a shorter response time

o Most I/O needs fast response time (buttons, serial ports, etc.)

o Lengthy processing adversely affects even soft time deadlines

o Architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin Uses

o Simple devices

o Watches

o Possibly microwave ovens

o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service the hardware and (b) set flags

o The main routine checks the flags and does any lower-priority follow-up processing

o Why? It gives more control over priorities

Round Robin with Interrupts Pros and Cons

Pros:

o Still relatively simple

o Hardware timing requirements better met

Cons:

o All task code still executes at the same priority

o Maximum delay unchanged

o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which flags are checked (e.g., ABABAD): improves response of A; increases latency of other tasks

o Move some task code to the interrupt: decreases response time of lower-priority interrupts; may not be able to ensure lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue

o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose the order in which to execute functions (not necessarily FIFO)

o Better response time for the highest-priority task = length of the longest function code

o Can improve best response time by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
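A minimal sketch of the function-queue idea in C, with hypothetical names and a plain FIFO policy; a real system would disable interrupts around the queue operations to guard against concurrent ISR access:

```c
#include <stddef.h>

typedef void (*task_fn)(void);

#define QCAP 8
static task_fn queue[QCAP];
static int head = 0, count = 0;

/* Called from an interrupt routine: append a function pointer. */
int fq_enqueue(task_fn f)
{
    if (count == QCAP) return 0;           /* queue full, request dropped */
    queue[(head + count) % QCAP] = f;
    count++;
    return 1;
}

/* Called from the main routine: take the next function (FIFO here,
   though any priority policy could be substituted). */
task_fn fq_dequeue(void)
{
    if (count == 0) return NULL;
    task_fn f = queue[head];
    head = (head + 1) % QCAP;
    count--;
    return f;
}

/* Main-loop helper: drain the queue, calling each queued function;
   returns how many functions ran. */
int fq_run_all(void)
{
    int n = 0;
    task_fn f;
    while ((f = fq_dequeue()) != NULL) { f(); n++; }
    return n;
}

/* Demo task for illustration only. */
static int work_done = 0;
static void demo_task(void) { work_done++; }
```

Replacing the FIFO dequeue with a priority-ordered pick is what gives this architecture its better response time for the highest-priority task.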

Real Time Operating System

Architecture:

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for the task code

o Differences from the previous architectures: we don't write signaling flags (the RTOS takes care of them); no loop in our code decides what is executed next (the RTOS does this); the RTOS knows the relative task priorities and controls what is executed next; the RTOS can suspend a task in the middle to execute code of higher priority

o Now we can control task response AND interrupt response

Real Time Operating Systems Pros and Cons

Pros:

o Worst-case response time for the highest-priority function is zero

o The system's high-priority response time is relatively stable when extra functionality is added

o Useful functionality comes pre-written

o RTOSs generally come with vendor tools

Cons:

o An RTOS has a cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms, but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
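The quantum arithmetic in the example above can be sketched as a small C function (hypothetical helper, ignoring the time other jobs run between slices):

```c
/* Count how many time slices a job needs when each slice is
   `quantum` ms long; the job may finish partway through its
   final slice, as job1 does above. */
int rr_slices(int job_ms, int quantum)
{
    int slices = 0;
    while (job_ms > 0) {
        job_ms -= quantum;   /* the job runs for one slice (or finishes) */
        slices++;
    }
    return slices;
}
```

For job1, rr_slices(250, 100) gives the three allocations listed above.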

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place; hence the scheduling tries to prevent link resources from going unused
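A work-conserving round robin over per-flow queues can be sketched in C (a hypothetical simulation, sending one packet per visit and skipping empty flows so the link never idles while any queue is backlogged):

```c
/* backlog[f] holds the packet count of flow f; order[] records the
   flow index of each transmitted packet.  Returns packets sent. */
int rr_schedule(int *backlog, int nflows, int *order, int max)
{
    int sent = 0, progress = 1;
    while (sent < max && progress) {
        progress = 0;
        for (int f = 0; f < nflows && sent < max; f++) {
            if (backlog[f] > 0) {       /* skip empty flows: work-conserving */
                backlog[f]--;
                order[sent++] = f;
                progress = 1;
            }
        }
    }
    return sent;
}
```

With backlogs {2, 0, 1}, the transmission order is flow 0, flow 2, flow 0: the empty flow 1 is skipped rather than leaving the channel idle.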

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same round-robin scheduler concept and can be implemented using semaphores

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher, the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness among the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come, first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reason.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible when large numbers of high-priority processes are queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor; waiting time depends on the number of processes, not on the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time

First In First Out            Low            Low          High              High

Shortest Job First            Medium         High         Medium            Medium

Priority-based scheduling     Medium         Low          High              High

Round-robin scheduling        High           Medium       Medium            High

Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows the hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

Label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data:

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

Label: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits; a basic data word is 32 bits, and an address is 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It uses a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code or to the highest-priority task. In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

A semaphore is a kernel object that is used for both resource synchronization and task synchronization.

It is essentially just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, the highest-priority task or the first task waiting in the queue takes the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large, uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features are introduced:

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15----------- it is the program counter, but can be manipulated as a general-purpose register

R13----------- it is used as the stack pointer

R14----------- it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows:

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to.

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operating modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: it is the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long integer multiplication (64-bit result)

o Multiply-accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, uninitialized, and everything else, is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code, or two tasks, share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: it means this task has not got anything to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is started.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality.

Design flow

A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system.

o Architecture design decomposes the functionality into major components.

o Coding implements the design and integrates the components.

o Testing uncovers the bugs.

o Maintenance entails deployment in the field, bug fixes, and upgrades.

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


o Host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting.

o The SHARC includes an on-board DMA controller as part of the IO processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels: the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional.

o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers also.

MEMORY DEVICES

Important types of memories are RAMs and ROMs

Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).

o SRAM is faster than DRAM

o SRAM consumes more power than DRAM

o More DRAM can be put on a single chip

o DRAM values must be periodically refreshed

Static RAM has four inputs:

o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled.

o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write to (R/W'=0) RAM.

o Adrs specifies the address for the read or write.

o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs.
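The four signals can be mimicked with a small behavioral model in C. This is a teaching sketch under the conventions above (CE'=0 enables the chip, R/W'=1 reads, R/W'=0 writes), not a model of any real part.

```c
#include <stdint.h>

#define SRAM_WORDS 256
static uint8_t sram_cells[SRAM_WORDS];   /* toy 256-word SRAM array */

/* One bus cycle: returns the value on the data pins for a read, the
   written value for a write, or -1 when CE'=1 (on a real chip the
   data pins would simply be high-impedance). */
int sram_cycle(int ce_n, int rw_n, uint8_t adrs, uint8_t data_in) {
    if (ce_n == 1) return -1;               /* CE'=1: data pins disabled */
    if (rw_n == 1) return sram_cells[adrs]; /* R/W'=1: read from RAM     */
    sram_cells[adrs] = data_in;             /* R/W'=0: write to RAM      */
    return data_in;
}
```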

DRAM inputs and refresh:

o DRAMs have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; they are needed because address lines are provided for only half the address.

o DRAMs must be refreshed because they store values that can leak away. A single refresh request can refresh an entire row of the DRAM.

o CAS-before-RAS refresh:

It is a special quick refresh mode.

This mode is initiated by setting CAS' to 0 first, then RAS' to 0.

It causes the current memory row to get refreshed and the corresponding counter to be updated.

A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh to the entire memory at the required rate.
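The required refresh rate follows from simple arithmetic. As a hedged example, assuming a hypothetical DRAM with 8192 rows and a 64 ms retention time (typical textbook figures, not from this document), the controller must issue one CAS-before-RAS refresh roughly every 7.8 µs:

```c
/* Per-row refresh interval in microseconds. Integer division rounds
   down, which errs on the safe (more frequent) side for the controller. */
unsigned refresh_interval_us(unsigned retention_ms, unsigned rows) {
    return (retention_ms * 1000u) / rows;   /* e.g. 64 ms / 8192 rows */
}
```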

Page mode:

o Developed to improve the performance of DRAM.

o Useful to access several locations in the same region of the memory.

o In page mode access, one row address and several column addresses are supplied.

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses.

o It is typically supported for both reads and writes.

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode.
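The benefit of page mode can be seen with a small timing model. The timings are hypothetical: a row access (RAS') costs `t_ras` cycles and each column strobe (CAS') costs `t_cas` cycles, so n accesses to the same row pay the row cost once in page mode but n times without it.

```c
/* Cycles for n same-row accesses with page mode: one RAS' plus n CAS'
   strobes while RAS' stays held down. */
unsigned page_mode_cycles(unsigned n, unsigned t_ras, unsigned t_cas) {
    return t_ras + n * t_cas;
}

/* Without page mode every access pays the full RAS' + CAS' sequence. */
unsigned nonpage_cycles(unsigned n, unsigned t_ras, unsigned t_cas) {
    return n * (t_ras + t_cas);
}
```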

Synchronous DRAMs

o It is developed to improve the performance of the DRAMs by introducing a clock

o Changes to input and outputs of the DRAM occur on clock edges

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display while the parallel

interface to the microprocessor

RAMBUS

o It is high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are:

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation-induced errors

Varieties of ROMs:

o Factory (or mask) programmed ROMs and field programmable ROMs.

o Factory programming is useful only when ROMs are installed in some quantity. Field programmable ROMs are programmed in the laboratory using ROM burners.

o Field programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed.

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s.

While using the floating gate principle, it is designed such that large blocks of memory can be erased all at once.

It uses standard system voltage for erasing and programming, allowing it to be reprogrammed in a typical system.

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage.

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment.

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.

IO devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM for two reasons:

DRAM's RAS/CAS multiplexing

The need to refresh

Device interfacing

Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed.

An IO device typically requires a much smaller range of addresses than a memory so addresses must be decoded much more finely
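The finer decoding can be expressed as simple address comparisons. The memory map below is hypothetical: an eight-register I/O device at 0x4000-0x4007 needs all but the last three address bits decoded, while a 32 KB RAM in the upper half of a 16-bit address space needs only the top bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device select: match the upper 13 address bits (0x4000-0x4007),
   a much finer decode than the memory needs. */
bool selects_device(uint16_t addr) { return (addr & 0xFFF8) == 0x4000; }

/* RAM select: decoding a single high-order address bit is enough. */
bool selects_ram(uint16_t addr) { return (addr & 0x8000) != 0; }
```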

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus used in microcontroller based systems

CAN (controller area network): developed for automotive electronics, it provides megabit rates and can handle a large number of devices.

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago, and it is a two-wire serial bus protocol.

This protocol enables peripheral ICs to communicate with each other using simple communication hardware.

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.

Common devices capable of interfacing an I2C bus include EPROMs Flash and some RAM memory devices real time clocks watch dog timers and microcontrollers

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF.

Only master devices can initiate a data transfer on the I2C bus The protocol does not limit the number of master devices on an I2C bus but typically in a microcontroller based system the microcontroller serves as the master Both master and servant devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions

It is a well-known bus used to link microcontrollers into systems. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended bus (10-bit).

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which confirms when valid data are on the line.

Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously.

o 11110XX is reserved for the extended 10-bit addressing scheme.

Bus transaction: it is composed of a series of one-byte transmissions, an address followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master.

The bus transaction is started by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
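The address-plus-direction byte described above is easy to construct. A minimal sketch, using the convention that the 7-bit address occupies the upper bits of the byte and the LSB is the direction flag (0 = write, 1 = read):

```c
#include <stdint.h>

/* First byte of an I2C transaction: 7-bit slave address followed by
   the data-direction bit (0 = master writes, 1 = master reads). */
uint8_t i2c_addr_byte(uint8_t addr7, int read) {
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}
```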

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also.

Some important characteristics of CAN protocol are high integrity serial data communications real-time support data rates of up to 1Mbitssec 11 bit addressing error detection and confinement capabilities

Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.

Since it uses bit-serial transmission, CAN can run at rates of 1 Mbps over a twisted pair connection of up to 40 meters. An optical link can also be used.

The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant.

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down by making 0 dominant over 1

When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state

Data are sent on the network in packets known as data frames CAN is a synchronous bus-all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of data frame: the data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long.

The trailing bit is the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 64 bytes depending on the value given in the control field

A cyclic redundancy check (CRC)is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field.

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
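Because the bus is wired-AND and identifiers are sent MSB first, the net effect of arbitration is that the numerically smallest identifier wins. A sketch of that outcome (the bit-by-bit dropout itself is not modeled):

```c
/* Arbitration outcome on a wired-AND bus: every node that sends a
   recessive 1 while the bus carries a dominant 0 drops out, so the
   survivor is the lowest identifier, i.e. the highest-priority message. */
unsigned can_arbitrate(const unsigned *ids, int n) {
    unsigned winner = ids[0];
    for (int i = 1; i < n; i++)
        if (ids[i] < winner) winner = ids[i];
    return winner;
}
```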

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS

These are used to communicate with the external world, such as transmitting data to a host PC or interacting with another embedded system for sharing data.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)

Data terminal equipment(DTE) can be a PC serial printer or a plotter and data circuit terminating equipment (DCE) can be a modem mouse digitizer or a scanner

Communication between the two devices is in full duplex ie the data transfer can take place in both the directions

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved.

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended.

Parity bit It is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at a rate too high to be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake, and can be of hardware type or software type.

RS232 connector configurations

It specifies two types of connectors 9 pin and 25 pin

For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals

The voltage level is with respect to the local ground, and hence RS232 uses unbalanced transmission.

UART chip: universal asynchronous receiver/transmitter chip.

It has two sections receive section and transmit section

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel format to serial format.

It also adds the start, stop, and parity bits.

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 sends data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 line and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
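The framing a UART performs can be sketched as bit manipulation. This example assumes one hypothetical format, 8 data bits sent LSB first, one even-parity bit, and one stop bit, giving an 11-bit frame with the start bit in the LSB of the result:

```c
#include <stdint.h>

/* Build an 11-bit asynchronous frame for one character:
   bit 0      start bit (0)
   bits 1-8   data bits, LSB first
   bit 9      even parity over the data bits
   bit 10     stop bit (1) */
unsigned uart_frame(uint8_t ch) {
    unsigned parity = 0;
    for (int i = 0; i < 8; i++)
        parity ^= (ch >> i) & 1u;           /* even parity computation */
    return (1u << 10) | (parity << 9) | ((unsigned)ch << 1);
}
```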

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices, up to 512, in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved

IEEE 488 BUS

It is a short range digital communications bus specification Originally created by HP for use with automated test equipment

It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections.

Daisy chaining: to find the priority of devices that send interrupt requests, this method is adopted. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 in its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sends the request to the CPU accepts the acknowledgement from the CPU; that device will not pass the acknowledgement signal on to the next device. This procedure is defined as below.

A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 in its PO output. If the device does not have pending interrupts, it transmits the acknowledgement signal to the next device by placing a 1 in its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
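The PI/PO rule reduces to "the first requesting device in the chain takes the grant." A sketch, with device 0 closest to the CPU (highest priority):

```c
/* Propagate the acknowledge down the chain: a requesting device with
   PI=1 forces PO=0, blocking all lower-priority devices; with no
   pending interrupts the grant falls off the end of the chain (-1). */
int daisy_chain_grant(const int *requesting, int n) {
    for (int i = 0; i < n; i++)
        if (requesting[i]) return i;  /* this device places its VAD on the bus */
    return -1;
}
```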

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.

In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures:

1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: the best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin:

1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin:

Pros: simple, no shared data, no interrupts.

Cons:

o Max delay is the maximum time to traverse the loop if all devices need to be serviced

o The architecture fails if any one device requires a shorter response time

o Most IO needs fast response time (buttons, serial ports, etc.)

o Lengthy processing adversely affects even soft time deadlines

o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin Uses:

o Simple devices

o Watches

o Possibly microwave ovens

o Devices where operations are all user initiated and process quickly
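One pass of the round-robin main loop can be sketched as a fixed-order poll. The device handlers and the "needs service" flag array here are hypothetical stand-ins:

```c
#define NDEV 3
int serviced[NDEV];          /* how many times each device was serviced */

static void service(int dev) { serviced[dev]++; }   /* stand-in device handler */

/* Check each device one at a time, in loop order, and service
   whichever needs it: no interrupts, no priorities. */
void round_robin_pass(int *needs_service, int n) {
    for (int i = 0; i < n; i++)
        if (needs_service[i]) { service(i); needs_service[i] = 0; }
}
```

A real system would wrap this in an infinite loop, which is exactly why one slow device in the loop delays everyone else.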

Round Robin with Interrupts:

o Based on round robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service hardware and (b) set flags

o The main routine checks flags and does any lower priority follow-up processing

o Why? It gives more control over priorities

Round Robin with Interrupts:

Pros:

o Still relatively simple

o Hardware timing requirements better met

Cons:

o All task code still executes at the same priority

o Maximum delay unchanged

o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this? Adjustments:

o Change the order in which flags are checked (e.g. A, B, A, B, A, D): improves the response of A but increases the latency of the other tasks

o Move some task code into the interrupt: this delays lower priority interrupts, and it may not be possible to ensure the lower priority interrupt code executes fast enough

Function Queue Scheduling Architecture:

o Interrupts add function pointers to a queue

o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose what order to execute the functions (not necessarily FIFO)

o Better response time for the highest priority task = length of the longest function code

o Best response time can be improved by cutting long functions into several pieces

Cons:

o Worse response time for lower priority code (no guarantee it will actually run)
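A minimal sketch of the function-queue idea: interrupts enqueue function pointers with a priority, and the main routine always pops the highest-priority one first (FIFO is not required). The task names and priority convention (lower number = higher priority) are hypothetical.

```c
typedef void (*taskfn)(void);
#define QMAX 8
static taskfn q[QMAX];
static int qprio[QMAX];      /* lower number = higher priority */
static int qn = 0;

/* Called from an interrupt routine: queue the follow-up work. */
void fq_enqueue(taskfn f, int prio) {
    if (qn < QMAX) { q[qn] = f; qprio[qn] = prio; qn++; }
}

/* Called from the main routine: pop the highest-priority function.
   Any ordering policy could be plugged in here instead. */
taskfn fq_dequeue(void) {
    if (qn == 0) return 0;
    int best = 0;
    for (int i = 1; i < qn; i++)
        if (qprio[i] < qprio[best]) best = i;
    taskfn f = q[best];
    qn--;
    q[best] = q[qn]; qprio[best] = qprio[qn];  /* fill the hole */
    return f;
}

/* Demo tasks recording their call order (hypothetical names). */
static int call_log[QMAX], ncalls = 0;
static void serial_task(void) { call_log[ncalls++] = 1; }
static void button_task(void) { call_log[ncalls++] = 2; }
```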

Real Time Operating System

Architecture:

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for task code

o Differences from the previous architectures:

o We don't write signaling flags (the RTOS takes care of it)

o No loop in our code decides what is executed next (the RTOS does this)

o The RTOS knows the relative task priorities and controls what is executed next

o The RTOS can suspend a task in the middle to execute code of higher priority

o Now we can control task response AND interrupt response

Real Time Operating Systems:

Pros:

o Worst case response time for the highest priority function is zero

o The system's high priority response time is relatively stable when extra functionality is added

o Useful functionality comes pre-written

o RTOSs generally come with vendor tools

Cons:

o An RTOS has cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms, but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
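The allocation steps above can be sketched as a small simulation (a hypothetical Python sketch, not production scheduler code; the job name and 100 ms quantum follow the worked example):

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate round-robin CPU allocation. jobs maps name -> total time
    needed (ms); returns the sequence of (job, slice_used) allocations."""
    queue = deque(jobs.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()
        used = min(quantum, remaining)        # a job may finish mid-quantum
        trace.append((name, used))
        if remaining > used:
            queue.append((name, remaining - used))  # back of the queue
    return trace

# round_robin({"job1": 250}, quantum=100) allocates 100 + 100 + 50 ms,
# matching the worked example
```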

Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.

Data packet scheduling

In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come first-served queuing.

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
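A minimal sketch of such a work-conserving per-flow round robin (flow names and packet labels are invented for illustration):

```python
from collections import deque

def rr_packet_schedule(flows):
    """flows: dict flow_id -> deque of queued packets. Emit one packet per
    active flow per round; flows with nothing queued are skipped, so the
    schedule is work-conserving."""
    order = []
    while any(flows.values()):
        for fid, q in flows.items():
            if q:                       # skip flows that are out of packets
                order.append((fid, q.popleft()))
    return order

# two flows share the link; once "B" runs dry, "A" keeps the link busy
flows = {"A": deque(["a1", "a2", "a3"]), "B": deque(["b1"])}
order = rr_packet_schedule(flows)
```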

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token passing channel access schemes such as token ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions are improved, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. whether many or few processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399].

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, a process which has a low priority, a process which is page faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370].

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394].

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155].

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with the assignment of OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high, for the same reasons as above.

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.
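The "long processes hog the CPU" effect can be illustrated with a small calculation (a hypothetical sketch; burst times are invented and all jobs are assumed to arrive at t=0):

```python
def fcfs_metrics(bursts):
    """bursts: CPU burst times in arrival order (all arriving at t=0).
    Returns each process's waiting time and the average turnaround time."""
    waits, t = [], 0
    for b in bursts:
        waits.append(t)          # each process waits for all earlier ones
        t += b
    turnarounds = [w + b for w, b in zip(waits, bursts)]
    return waits, sum(turnarounds) / len(turnarounds)

# with a long job first, the two short jobs behind it wait 100 and 105 ms
waits, avg_tat = fcfs_metrics([100, 5, 5])
```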

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimations about, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
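The preemption behaviour described above can be sketched roughly as follows (unit time steps and made-up job names; a real scheduler would work from estimates, not known burst times):

```python
import heapq

def srt_schedule(arrivals):
    """arrivals: list of (arrival_time, name, burst). Run shortest-
    remaining-time-first in unit time steps; return completion order."""
    arrivals = sorted(arrivals)
    ready, done, t, i = [], [], 0, 0
    while i < len(arrivals) or ready:
        # admit everything that has arrived by time t
        while i < len(arrivals) and arrivals[i][0] <= t:
            _, name, burst = arrivals[i]
            heapq.heappush(ready, (burst, name))
            i += 1
        if not ready:                # CPU idle until the next arrival
            t = arrivals[i][0]
            continue
        rem, name = heapq.heappop(ready)
        rem -= 1                     # run the shortest job for one tick
        t += 1
        if rem == 0:
            done.append(name)
        else:
            heapq.heappush(ready, (rem, name))
    return done

# a short job arriving mid-run preempts the longer one:
# srt_schedule([(0, "long", 5), (1, "short", 2)]) yields ["short", "long"]
```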

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower priority processes get interrupted by incoming higher priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large numbers of high priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor: waiting time depends on the number of processes, not on the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview

Scheduling algorithm         CPU overhead  Throughput  Turnaround time  Response time
First In First Out           Low           Low         High             High
Shortest Job First           Medium        High        Medium           Medium
Priority based scheduling    Medium        Low         High             High
Round-robin scheduling       High          Medium      Medium           High
Multilevel queue scheduling  High          High        Medium           Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require exceptions or external devices.

LEDs also play an important role in debugging. They can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but can display only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8]        ; a comment

label ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann architecture machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
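The two byte orders can be illustrated with a short sketch (an illustrative Python function, not ARM code):

```python
def word_bytes(word, big_endian=False):
    """Split a 32-bit word into its four bytes in memory order."""
    bs = [(word >> (8 * i)) & 0xFF for i in range(4)]  # little-endian order
    return bs[::-1] if big_endian else bs

# for the word 0x11223344, little-endian puts 0x44 at the lowest byte
# address, while big-endian puts 0x11 there
```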

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);        ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
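The clamping behaviour can be sketched as follows (a Python illustration of the idea, not SHARC code; a 32-bit signed range is assumed):

```python
def sat_add32(a, b):
    """Saturating signed 32-bit addition: on overflow the result clamps
    to the maximum/minimum representable value instead of wrapping."""
    INT_MAX, INT_MIN = 2**31 - 1, -2**31
    return max(INT_MIN, min(INT_MAX, a + b))

# wrapping arithmetic would turn INT_MAX + 1 into INT_MIN;
# saturation keeps it pinned at INT_MAX
```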

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
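The difference between the two shift kinds can be illustrated like this (a Python sketch of 32-bit right-shift semantics, not SHARC code):

```python
def shifts(x, n, width=32):
    """Contrast logical and arithmetic right shifts of a width-bit value.
    Logical shifts fill with zeroes; arithmetic shifts copy the sign bit."""
    mask = (1 << width) - 1
    logical = (x & mask) >> n
    if x & (1 << (width - 1)):                   # sign bit set: replicate it
        arithmetic = logical | (mask & ~(mask >> n))
    else:
        arithmetic = logical
    return logical, arithmetic

# shifting 0xF0000000 right by 4:
# logical gives 0x0F000000, arithmetic gives 0xFF000000
```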

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset.

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
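Bit-reversed addressing can be sketched as follows (an illustrative Python function; the 8-point FFT ordering is a standard textbook example, not taken from the SHARC manual):

```python
def bit_reverse(i, bits):
    """Reverse the low `bits` bits of index i, as used by bit-reversed
    addressing when reordering FFT inputs/outputs."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # shift the lowest bit of i into r
        i >>= 1
    return r

order = [bit_reverse(i, 3) for i in range(8)]   # 8-point FFT ordering
```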

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the sum of the interrupt latency and the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest priority task (in a preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high priority task is ready, the time to restore the CPU context of the highest priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: a counting semaphore, whose value can be greater than 1, and a binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
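The acquire/release calls above can be illustrated with a counting semaphore guarding a pool of two resources (a Python threading sketch with invented names; Python's threading.Semaphore stands in for an RTOS semaphore object):

```python
import threading

pool = threading.Semaphore(2)        # create a semaphore with count 2
in_use = []
lock = threading.Lock()

def worker(n):
    pool.acquire()                   # acquire: blocks while count is 0
    with lock:
        in_use.append(n)             # use the shared resource
    pool.release()                   # release: count goes back up

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# every worker eventually acquired the semaphore exactly once
```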

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest priority task or the first task waiting in the queue can take the message.
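A minimal producer/consumer sketch of this pattern (Python's queue module stands in for an RTOS message queue; the sensor readings are invented):

```python
import queue
import threading

msgq = queue.Queue(maxsize=8)        # queue length is fixed at creation

def producer():
    # a task or ISR deposits messages, e.g. voltages read from a sensor
    for reading in (3.3, 5.0, 1.8):
        msgq.put(reading)
    msgq.put(None)                   # sentinel: no more data

received = []

def consumer():
    for msg in iter(msgq.get, None): # take messages until the sentinel
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()
```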

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
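The create/read/write calls can be sketched with an OS-level pipe (a POSIX-flavoured Python illustration; an RTOS pipe object would offer similar calls):

```python
import os

# one task's output becomes another task's input via a pipe
r, w = os.pipe()                  # create a pipe (read end, write end)
os.write(w, b"stage-1 output")    # the producing task writes to the pipe
os.close(w)                       # closing the write end signals end-of-data
data = os.read(r, 1024)           # the consuming task reads from the pipe
os.close(r)
```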

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is, in most respects, a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; instruction opcodes are preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15 ----------- it is the program counter but can be manipulated as a general purpose register

R13 ----------- it is used as the stack pointer

R14 ----------- it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits, when set, disable (mask) normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in either little-endian or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long integer multiplication (64-bit result)

o Multiply-accumulate instruction

Software interrupt (SWI) instructions: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most of the existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All the branches are relative to the program counter.

o A jump is always within a limit of ±32 MB.

o Conditional branches are formed by using the condition codes.

o The subroutine call instruction is also modeled as a variant of the branch instruction.
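The conditional-execution mechanism described above can be sketched in C: the 4-bit condition field of each instruction is tested against the N, Z, C, V flags, and the instruction executes only if the test passes. The subset of condition codes modeled and the helper name are illustrative:

```c
#include <assert.h>

/* Illustrative evaluation of a few ARM condition codes against the
   N, Z, C, V flags; this is the test the processor applies to the
   4-bit condition field prefixed to every instruction. */
enum { COND_EQ = 0x0, COND_NE = 0x1, COND_CS = 0x2, COND_CC = 0x3,
       COND_MI = 0x4, COND_PL = 0x5, COND_AL = 0xE };

static int cond_passed(unsigned cond, int n, int z, int c, int v)
{
    (void)v;                     /* V is not needed for this subset */
    switch (cond) {
    case COND_EQ: return z;      /* equal: Z flag set        */
    case COND_NE: return !z;     /* not equal: Z flag clear  */
    case COND_CS: return c;      /* carry set                */
    case COND_CC: return !c;     /* carry clear              */
    case COND_MI: return n;      /* minus (negative)         */
    case COND_PL: return !n;     /* plus (positive or zero)  */
    case COND_AL: return 1;      /* always execute           */
    default:      return 0;
    }
}
```

An instruction whose condition fails is simply skipped, which is how ARM avoids many short branches.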

THUMB

o These instructions are 16 bits in length.

o Stored in a compressed form.

o The instructions are decompressed into ARM instructions and then executed by the processor.

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction.

Differences with ARM

o THUMB instructions are executed unconditionally, excepting the branch instructions.

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set.

o No MSR and MRS instructions.

o The maximum number of SWI calls is restricted to 256.

o On reset and on raising of an exception, the processor always enters the ARM instruction set mode.

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit. However, ARM is faster when the memory is organized as 32-bit.

RTOS architecture

In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters like the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared data problem.

Shared data problem: it arises when an interrupt routine and a task (or two tasks) share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program which cannot be interrupted.
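The classic fix for the shared data problem is to make the task code's access atomic by disabling interrupts around it. A minimal C sketch follows; DISABLE_INTERRUPTS/ENABLE_INTERRUPTS are hypothetical stand-ins for the port-specific intrinsics, and the two-variable reading is an illustrative example:

```c
#include <assert.h>

/* Sketch of the shared-data problem: an ISR updates two halves of a
   reading while task code copies them. If the task were interrupted
   between the two reads it could see a mismatched pair. The fix shown
   (disabling interrupts around the copy) creates an atomic section.
   The macros are hypothetical placeholders for real port-specific
   intrinsics such as __disable_irq()/__enable_irq(). */
static volatile int temp_high, temp_low;   /* written by a timer ISR */

#define DISABLE_INTERRUPTS()  /* port-specific: mask interrupts   */
#define ENABLE_INTERRUPTS()   /* port-specific: unmask interrupts */

static void read_temps(int *hi, int *lo)
{
    DISABLE_INTERRUPTS();      /* start of atomic section            */
    *hi = temp_high;           /* the ISR cannot run between these   */
    *lo = temp_low;            /* two reads, so the pair stays       */
    ENABLE_INTERRUPTS();       /* consistent; end of atomic section  */
}

static int demo(void)          /* returns 1 if the pair is consistent */
{
    int hi, lo;
    temp_high = 42;
    temp_low  = 42;
    read_temps(&hi, &lo);
    return hi == lo;
}
```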

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
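The first rule above can be illustrated with a contrasting pair of functions; the function names and the running-total example are illustrative:

```c
#include <assert.h>

/* Sketch of the reentrancy rules: the first function keeps its state
   in a file-scope variable accessed non-atomically, so two tasks
   calling it can corrupt each other; the second keeps everything on
   the caller's stack and is therefore safe to share between tasks. */
static int running_total;                 /* shared data: breaks rule 1 */

static int add_unsafe(int x)              /* NOT reentrant */
{
    running_total += x;                   /* non-atomic read-modify-write
                                             of a shared variable */
    return running_total;
}

static int add_safe(int total, int x)     /* reentrant */
{
    return total + x;                     /* uses only stack variables */
}
```

If the RTOS switches tasks between the read and the write inside add_unsafe, the update from one task can be lost; add_safe has no such window.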

Task states under the RTOS

Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: it means this task has not got anything to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state. Otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that the tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other is run.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
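The scheduler's core decision described above can be sketched in C: among all tasks not in the blocked state, run the one with the highest priority. The task structure, the lowest-number-is-highest-priority convention, and the function names are illustrative, not from any particular RTOS:

```c
#include <assert.h>

/* Minimal sketch of a priority scheduler's decision: among all tasks
   in the READY (or RUNNING) state, pick the one with the highest
   priority. Here a lower number means higher priority, a common RTOS
   convention; names are illustrative. */
enum task_state { RUNNING, READY, BLOCKED };

struct task { int priority; enum task_state state; };

static int pick_next(const struct task *t, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (t[i].state == BLOCKED)
            continue;                       /* blocked tasks never run */
        if (best < 0 || t[i].priority < t[best].priority)
            best = i;                       /* keep highest priority   */
    }
    return best;    /* -1: all blocked, RTOS idles in a tight loop */
}

static int demo(void)
{
    struct task tasks[] = {
        { 3, READY   },     /* low priority, ready            */
        { 1, BLOCKED },     /* highest priority, but blocked  */
        { 2, READY   },     /* highest-priority READY task    */
    };
    return pick_next(tasks, 3);
}
```

In a preemptive RTOS this decision is re-evaluated every time a task changes state, which is why a higher-priority task can take the processor the moment it unblocks.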

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality.

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it is the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis: determines the basic characteristics of the system.

o Architecture design: decomposes the functionality into major components.

o Coding: implements the processes and integrates them.

o Testing: uncovers the bugs.

o Maintenance: entails deployment in the field, bug fixes and upgrades.

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


o DRAMs must be refreshed because they store values that can leak away. A single refresh request can refresh an entire row of the DRAM.

o CAS-before-RAS refresh

It is a special quick refresh mode.

This mode is initiated by setting CAS' to 0 first, then RAS' to 0.

It causes the current memory row to get refreshed and the corresponding counter updated.

A memory controller is a logic circuit that is external to the DRAM and performs CAS-before-RAS refresh on the entire memory at the required rate.

Page mode

o Developed to improve the performance of DRAM.

o Useful to access several locations in the same region of the memory.

o In a page mode access, one row address and several column addresses are supplied.

o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses.

o It is typically supported for both reads and writes.

o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS' rather than its rising edge, as in page mode.

Synchronous DRAMs

o It was developed to improve the performance of DRAMs by introducing a clock.

o Changes to the inputs and outputs of the DRAM occur on clock edges.

RAMs for specialized applications

o Video RAM

o RAMBUS

Video RAM

o It is designed to speed up video operations

o It includes a standard parallel interface as well as a serial interface

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor.

RAMBUS

o It offers high performance at a relatively low cost.

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec.

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code and perhaps some data does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners.

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed.

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s.

While using the floating-gate principle, it is designed such that large blocks of memory can be erased all at once.

It uses standard system voltages for erasing and programming, allowing programming in a typical system.

Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage.

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones and medical monitoring equipment.

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.
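The read-update-erase-write sequence behind that drawback can be sketched in C. The 256-byte block size and the flash_* helper names are illustrative assumptions, not a real part's API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of why a single-word flash write can be slow: the whole erase
   block must be copied out, patched, erased, and written back. The
   256-byte block size and helper names are illustrative. */
#define BLOCK_SIZE 256

static uint8_t flash_block[BLOCK_SIZE];    /* stands in for one erase block */

static void flash_erase_block(void)        /* erasing sets all bits to 1 */
{
    memset(flash_block, 0xFF, BLOCK_SIZE);
}

static void flash_write_byte(unsigned offset, uint8_t value)
{
    uint8_t copy[BLOCK_SIZE];
    memcpy(copy, flash_block, BLOCK_SIZE); /* 1. read the entire block */
    copy[offset] = value;                  /* 2. update the one word   */
    flash_erase_block();                   /* 3. erase the block       */
    memcpy(flash_block, copy, BLOCK_SIZE); /* 4. write the block back  */
}

static int demo(void)
{
    flash_erase_block();
    flash_write_byte(10, 0x5A);
    /* the patched byte is written and its neighbors stay erased */
    return flash_block[10] == 0x5A && flash_block[11] == 0xFF;
}
```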

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:

DRAM's RAS/CAS multiplexing

The need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed.

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus: used in microcontroller-based systems

CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle a large number of devices

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago, and it is a two-wire serial bus protocol.

This protocol enables peripheral ICs to communicate with each other using simple communication hardware.

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.

Common devices capable of interfacing to an I2C bus include EPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers and microcontrollers.

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF.

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data.

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer is in between the start and stop conditions.

It is a well-known bus to link a microcontroller with the system. It is low cost, easy to implement and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which confirms when valid data are on the line.

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from the masters.

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message.

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition.

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously.

o 11110XX is reserved for the extended 10-bit addressing scheme.

Bus transaction: it is composed of a series of one-byte transmissions, an address followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master.

The bus transaction is signaled by a start signal and completed by an end signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.

The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus.
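The address byte that opens every transaction, the 7-bit device address followed by the data-direction bit, can be sketched in C; the helper name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the first byte of an I2C transaction: the 7-bit device
   address occupies the upper bits and the data-direction bit is the
   LSB (0 = master writes to slave, 1 = master reads from slave). */
#define I2C_WRITE 0
#define I2C_READ  1

static uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    /* keep only 7 address bits, shift them up, append direction bit */
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? I2C_READ : I2C_WRITE));
}
```

For example, a read addressed to device 0x50 (a common serial-EEPROM address) puts 0xA1 on the bus, and a write puts 0xA0.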

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but now it finds use in other applications also.

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities.

Common applications other than automobiles include elevator controllers, copiers, telescopes, production line control systems and medical instruments.

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages and distinguish between a permanent failure of a node and temporary errors.

It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used.

The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar.

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion.

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant.

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus drives a 0, making 0 dominant over 1.

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame.

Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long.

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier.

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.

The data field is from 0 to 64 bits, depending on the value given in the control field.

A cyclic redundancy check (CRC) is sent after the data field for error detection.

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field.

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
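The arbitration rule above can be simulated bit by bit in C: each bit time, the bus level is the wired-AND of everything transmitted (0 is dominant), and a node that sends a recessive 1 but hears a dominant 0 drops out. The function name and the up-to-8-node limit are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of CAN's CSMA/AMP arbitration over a wired-AND bus. Nodes
   clock out their 11-bit identifiers MSB first; a node that transmits
   a recessive 1 while the bus carries a dominant 0 backs off, so the
   lowest numeric identifier (highest priority) always survives.
   Supports up to 8 competing nodes in this sketch. */
static int can_arbitrate(const uint16_t *ids, int n)
{
    int active[8];
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = 10; bit >= 0; bit--) {          /* 11-bit identifier */
        int bus = 1;                               /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i] && !((ids[i] >> bit) & 1))
                bus = 0;                           /* any dominant 0 wins the AND */
        for (int i = 0; i < n; i++)                /* recessive senders that hear */
            if (active[i] && ((ids[i] >> bit) & 1) && bus == 0)
                active[i] = 0;                     /* a dominant bit back off     */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;                   /* the surviving transmitter */
    return -1;
}

static int demo(void)
{
    uint16_t ids[] = { 0x3A5, 0x123, 0x7FF };
    return can_arbitrate(ids, 3);   /* node with identifier 0x123 wins */
}
```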

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS

These are used to communicate with the external world, like transmitting data to a host PC or interacting with another embedded system for sharing data, etc.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).

Data terminal equipment (DTE) can be a PC, serial printer or a plotter, and data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer or a scanner.

Communication between the two devices is in full duplex, i.e. the data transfer can take place in both directions.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted.

The RS232 standard specifies a distance of 19.2 meters. But using RS232 cable, a distance of up to 100 meters can be achieved.

Possible data rates depend upon the UART chip and the clock used.

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both the systems.

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7 or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two bits are appended.

Parity bit: it is added for error checking on the receiver side.

Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake. It can be a hardware type or a software type.
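The framing just described (start bit, data bits, parity, stop bit) can be sketched in C. The example assumes 8 data bits sent LSB first with even parity and one stop bit; the function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of RS232 asynchronous framing: one start bit (0), eight data
   bits LSB first, an even-parity bit, and one stop bit (1), for 11
   bits on the wire per character. Assumes 8-E-1 framing. */
static int even_parity(uint8_t data)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1;
    return ones % 2;        /* bit that makes the total count of 1s even */
}

static int frame_char(uint8_t data, int bits[11])
{
    int n = 0;
    bits[n++] = 0;                          /* start bit               */
    for (int i = 0; i < 8; i++)
        bits[n++] = (data >> i) & 1;        /* data bits, LSB first    */
    bits[n++] = even_parity(data);          /* parity bit              */
    bits[n++] = 1;                          /* stop bit                */
    return n;                               /* 11 bits per character   */
}

static int demo(void)
{
    int bits[11];
    int n = frame_char(0x41, bits);         /* 'A' = 0100 0001, two 1s */
    /* start bit low, parity 0 (count already even), stop bit high */
    return n == 11 && bits[0] == 0 && bits[9] == 0 && bits[10] == 1;
}
```

The receiver performs the inverse: it detects the start bit's falling edge, samples the data bits, recomputes the parity and checks the stop bit.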

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. Voltage levels are different for data and control signals.

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it into parallel form from serial form, and gives the data to the processor.

The transmit section takes the data from the processor and converts the data from parallel format to serial format.

It also adds start, stop and parity bits.

The voltage levels used in RS232 are different from those of embedded systems (5V). And RS232 uses data in serial form, whereas the processor uses the data in parallel form. The importance of the UART chip lies in the fact that it is able to make RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by the level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices (up to 512) in a network.

An RS485 controller chip is used on each device.

A network with the RS485 protocol uses the master-slave configuration.

With one twisted pair, half-duplex communication can be achieved, and with two twisted pairs, full-duplex.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: this method is adopted to find the priority of the devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
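The PI/PO ripple described above can be sketched in C: the acknowledge enters the chain at device 0; a requesting device with PI=1 wins, while a non-requesting device passes PI through to its PO. The function name and four-device example are illustrative:

```c
#include <assert.h>

/* Sketch of daisy-chain priority resolution: the CPU's interrupt
   acknowledge enters device 0's PI input. A device that is requesting
   an interrupt blocks the signal (PO = 0) and wins; a device that is
   not requesting passes the signal along (PO = PI). The winner is
   the requesting device closest to the CPU. */
static int daisy_chain_winner(const int *requesting, int n)
{
    int pi = 1;                    /* acknowledge from the CPU */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;              /* PI=1 and requesting: this device
                                      sets PO=0 and places its VAD */
        /* not requesting: PO = PI, the acknowledge ripples onward */
    }
    return -1;                     /* no device was requesting */
}

static int demo(void)
{
    int req[] = { 0, 1, 1, 0 };    /* devices 1 and 2 both request */
    return daisy_chain_winner(req, 4);   /* device 1 is nearer the CPU */
}
```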

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.

IEEE 488 connector has 24 pins The bus employs 16 signal lines---eight bidirectional lines used for data transfer three for handshake and five for bus management---plus eight ground return lines

In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS

Choosing an Architecture The best architecture depends on several factors

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection: a tradeoff between complexity and control over response and priority

Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
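A minimal C sketch of such a polling loop; the device names and their ready/service functions are hypothetical stand-ins for real hardware status checks, and a real system would run the pass in an endless loop:

```c
#include <stdbool.h>

/* Counters stand in for real device state so the sketch is self-contained. */
static int serviced_a, serviced_b;

static bool device_a_ready(void) { return serviced_a < 2; }  /* pretend A needs 2 services */
static bool device_b_ready(void) { return serviced_b < 3; }  /* pretend B needs 3 services */

static void service_device_a(void) { serviced_a++; }
static void service_device_b(void) { serviced_b++; }

/* One pass of the round-robin loop: check each device in turn,
   service whichever needs it. A real main() would call this forever. */
void round_robin_pass(void)
{
    if (device_a_ready()) service_device_a();
    if (device_b_ready()) service_device_b();
}
```

Note how service order is fixed by position in the loop and no device has priority over another, exactly as the list above states.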

Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly

Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities

Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., ABABAD)
o Improves response of A
o Increases latency of other tasks
o Move some task code to the interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
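The ISR/flag division of labor above can be sketched in C. The UART names, the flag, and the counter are hypothetical illustrations; a real ISR would take no argument and be registered with the interrupt controller:

```c
#include <stdbool.h>

/* Shared between ISR and main loop, hence volatile. */
static volatile bool uart_rx_flag;
static volatile unsigned char uart_rx_byte;

static int bytes_processed;  /* follow-up work done at task priority */

/* Would be installed as the UART receive interrupt handler.
   It does only the urgent part: grab the data and set a flag. */
void uart_rx_isr(unsigned char byte_from_hw)
{
    uart_rx_byte = byte_from_hw;
    uart_rx_flag = true;        /* defer the rest to the main loop */
}

/* One iteration of the main loop's flag polling: the lower-priority
   follow-up processing runs here, not in the ISR. */
void main_loop_pass(void)
{
    if (uart_rx_flag) {
        uart_rx_flag = false;
        bytes_processed++;
    }
}
```

All follow-up code still runs at the same (task) priority — the con listed above — because the main loop, not the hardware, decides when flags are serviced.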

Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
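A minimal C sketch of the function queue; the queue length, task names, and helper functions are illustrative, FIFO pop is shown for simplicity, and a real version would guard the indices against concurrent ISR access (e.g., by briefly disabling interrupts):

```c
#define QUEUE_LEN 8

typedef void (*task_fn)(void);

static task_fn queue[QUEUE_LEN];
static int head, tail;              /* indices into the circular queue */

/* Called from an ISR: enqueue a pointer to the follow-up function. */
void enqueue_task(task_fn fn)
{
    queue[tail] = fn;
    tail = (tail + 1) % QUEUE_LEN;
}

/* Called from the main routine: pop and run one queued function.
   Returns 1 if a task ran, 0 if the queue was empty. A priority
   scheme could instead scan the queue for the most urgent entry. */
int run_one_task(void)
{
    if (head == tail) return 0;     /* queue empty */
    task_fn fn = queue[head];
    head = (head + 1) % QUEUE_LEN;
    fn();
    return 1;
}

/* Sample tasks so the sketch is self-contained. */
static int a_runs, b_runs;
void task_a(void) { a_runs++; }
void task_b(void) { b_runs++; }
```

Because the main routine chooses what to dequeue, the highest-priority function waits at most for the longest single queued function to finish — the pro listed above.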

Real Time Operating System

Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real-Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally come with vendor tools
Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
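The quantum arithmetic in the job1 example can be captured in a small C helper (the function names are illustrative):

```c
/* A job needing total_ms of CPU with a quantum_ms time slice gets
   ceil(total_ms / quantum_ms) allocations, the last possibly partial. */
int allocations_needed(int total_ms, int quantum_ms)
{
    return (total_ms + quantum_ms - 1) / quantum_ms;  /* ceiling division */
}

/* How much of the final time slice the job actually uses before
   self-terminating. */
int last_slice_ms(int total_ms, int quantum_ms)
{
    int rem = total_ms % quantum_ms;
    return rem ? rem : quantum_ms;
}
```

For the example above, a 250 ms job with a 100 ms quantum needs 3 allocations and self-terminates 50 ms into the third, matching the listed schedule.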

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same round-robin scheduler concept and can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit
Latency, specifically:

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing

Shortest remaining time

Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead especially with a small time unit Balanced throughput between FCFS and SJF shorter jobs are completed faster than in

FCFS and longer processes are completed faster than in SJF

Poor average response time waiting time is dependent on number of processes and not average process length

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

Serial port available on the evaluation boards is one of the most important debugging tools

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. An advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on.
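This byte-addressing rule amounts to multiplying the word index by 4, as in this small illustrative C helper:

```c
/* Byte address of 32-bit word n in a word-aligned ARM address space:
   each word occupies 4 bytes, so word n starts at byte 4*n. */
unsigned word_address(unsigned word_index)
{
    return word_index * 4u;
}
```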

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
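Byte order can be observed from software with a sketch like the following (a common C idiom, not ARM-specific code):

```c
#include <stdint.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one:
   store a known 32-bit pattern and inspect the byte at the lowest
   address. Accessing the word through a uint8_t pointer is a legal
   character-type access in C. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    uint8_t *bytes = (uint8_t *)&word;
    /* little-endian: the lowest-order byte (0x04) lives at the lowest address */
    return bytes[0] == 0x04;
}
```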

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon

A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared by subsequent operations; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
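Saturation on overflow can be sketched in C for 32-bit fixed-point addition; this is an illustrative helper, not SHARC code — the hardware performs this clamping automatically when ALUSAT is set:

```c
#include <stdint.h>

/* Saturating 32-bit add: compute in 64 bits (which cannot overflow
   for two 32-bit operands), then clamp to the 32-bit range instead
   of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```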

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
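
Bit-reversed addressing can be mimicked in C to see what the DAG hardware computes for free. The helper below reverses the low n bits of an index, which is exactly the reordering an FFT's output needs; the function name and widths are illustrative choices.

```c
#include <stdint.h>

/* Reverse the low `nbits` bits of `index`. An FFT that processes
 * data in place leaves its results in bit-reversed order; indexing
 * the output through this function restores natural order. */
uint32_t bit_reverse(uint32_t index, unsigned nbits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < nbits; i++) {
        r = (r << 1) | (index & 1);   /* shift the LSB into the result */
        index >>= 1;
    }
    return r;
}
```

For a 3-bit index, 1 (001) maps to 4 (100) and 6 (110) maps to 3 (011).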

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, in a non-preemptive kernel the CPU returns to the interrupted task; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority task (in a preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: the counting semaphore, whose value can exceed 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
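
As a rough sketch of what these calls manage, here is a minimal counting-semaphore model in C. It is deliberately non-blocking (acquire simply fails at zero), whereas a real RTOS would move the calling task to the blocked state; all names are illustrative, not any kernel's API.

```c
/* A minimal counting-semaphore model. A real kernel's acquire
 * blocks the caller when the count is zero; here we just report
 * failure so the behavior is easy to demonstrate. */
typedef struct { int count; } ksem_t;

void ksem_create(ksem_t *s, int initial)  /* create with an initial count */
{
    s->count = initial;
}

int ksem_acquire(ksem_t *s)   /* returns 1 on success, 0 if it would block */
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

void ksem_release(ksem_t *s)  /* release: make one more unit available */
{
    s->count++;
}
```

Created with an initial count of 1, this behaves as a binary semaphore: a second acquire fails until a release occurs.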

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue takes the message.
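
A message queue can be modeled as a small array of mailboxes served in FIFO order. The C sketch below is illustrative only: the depth, message type, and names are arbitrary, and a real kernel would also maintain the waiting-task lists described above.

```c
#define Q_LEN 4   /* queue length, fixed at creation time */

typedef struct {
    int msgs[Q_LEN];   /* the array of mailboxes            */
    int head;          /* index of the oldest message       */
    int count;         /* number of messages currently held */
} msgq_t;

int msgq_send(msgq_t *q, int msg)      /* 0 on success, -1 if queue full */
{
    if (q->count == Q_LEN) return -1;
    q->msgs[(q->head + q->count) % Q_LEN] = msg;  /* append at the tail */
    q->count++;
    return 0;
}

int msgq_receive(msgq_t *q, int *msg)  /* 0 on success, -1 if queue empty */
{
    if (q->count == 0) return -1;
    *msg = q->msgs[q->head];           /* take the oldest message (FIFO) */
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 0;
}
```

Messages come out in the order they were deposited, which matches the "first task waiting" behavior; a priority-ordered variant would insert by message priority instead.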

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input of another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
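
POSIX pipes behave much like the pipe object described above, so they make a convenient stand-in for a demonstration; RTOS pipe APIs differ in names but not in concept. The function name below is an illustrative choice.

```c
#include <unistd.h>

/* Create a pipe, write a few bytes into one end (the "producer
 * task"), and read them back from the other end (the "consumer
 * task"). Returns the number of bytes read, or -1 on error. */
int pipe_demo(char *out, int outlen)
{
    int fd[2];
    if (pipe(fd) != 0) return -1;              /* create the pipe        */
    (void)write(fd[1], "data", 4);             /* producer writes 4 bytes */
    int n = (int)read(fd[0], out, (size_t)outlen); /* consumer reads     */
    close(fd[0]);
    close(fd[1]);
    return n;                                  /* bytes moved through    */
}
```

The bytes come out in the same order they went in, which is the defining property of a pipe.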

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:

o Each instruction controls both the ALU and the shifter, making instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow up to 16 registers to be loaded/stored at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, though it can be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- has special significance and is called the link register: when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all operation modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: 32-bit ARM and 16-bit THUMB.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o The instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences from ARM:

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o THUMB is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
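
The shared-data problem can be demonstrated with a simulated ISR in plain C. In the sketch below, the task's read-modify-write of a shared counter is "interrupted" between the read and the write, and the ISR's update is lost; all names are illustrative, and on real hardware the fix is to disable interrupts around the section so it becomes atomic.

```c
static int shared_count = 0;        /* data shared by task code and ISR */

static void simulated_isr(void)     /* stands in for a hardware ISR */
{
    shared_count++;
}

/* Task-code update of the shared variable. The read-modify-write is
 * NOT atomic: if the ISR fires between the read and the write-back,
 * the ISR's increment is overwritten and lost. */
int unsafe_update(int fire_isr_midway)
{
    int tmp = shared_count;         /* task reads the shared variable   */
    if (fire_isr_midway)
        simulated_isr();            /* interrupt strikes mid-section    */
    shared_count = tmp + 1;         /* write-back clobbers the ISR's ++ */
    return shared_count;
}
```

Starting from zero, one clean update yields 1; a second update with the ISR firing midway yields 2 instead of the correct 3, which is exactly the lost-update bug the atomic section prevents.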

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
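
The difference is easiest to see side by side. Below, a hypothetical non-reentrant formatter keeps its result in a shared static buffer, while the reentrant version uses only caller-supplied storage; both function names are illustrative.

```c
#include <stdio.h>

static char shared_buf[16];   /* one buffer shared by ALL callers */

/* NOT reentrant: if the RTOS switches tasks between two calls, the
 * second call silently overwrites the first caller's result. */
char *format_bad(int v)
{
    snprintf(shared_buf, sizeof shared_buf, "v=%d", v);
    return shared_buf;        /* every caller gets the same buffer */
}

/* Reentrant: the buffer lives on the calling task's stack, so each
 * task's data is private and task switches cannot corrupt it. */
char *format_ok(char *buf, int len, int v)
{
    snprintf(buf, len, "v=%d", v);
    return buf;
}
```

With `format_bad`, two successive calls return the same pointer, so the earlier result is gone; with `format_ok`, each caller's result survives.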

Task states under an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

The scheduler is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task with the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
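
The scheduler's core decision can be sketched in a few lines of C: among the tasks that are not blocked, pick the highest-priority one (lowest number here, an arbitrary convention); returning -1 models the all-blocked case. Names and the representation are illustrative, not any RTOS's internals.

```c
enum state { BLOCKED, READY, RUNNING };

typedef struct {
    enum state st;
    int priority;      /* lower number = higher priority (convention) */
} task_t;

/* Pick the highest-priority non-blocked task. Returns its index,
 * or -1 when every task is blocked (the scheduler would then idle). */
int schedule(task_t *tasks, int ntasks)
{
    int best = -1;
    for (int i = 0; i < ntasks; i++) {
        if (tasks[i].st == BLOCKED)
            continue;                       /* blocked tasks never run */
        if (best < 0 || tasks[i].priority < tasks[best].priority)
            best = i;
    }
    return best;
}
```

Blocking the currently chosen task and calling `schedule` again naturally hands the processor to the next-highest-priority ready task.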

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is started.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process.

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage: with too many spirals, it may take too long when design time is a major requirement.

Its advantage: it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware and software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of suppliers' capabilities.

Early and continual customer focus helps ensure that the product best meets customers' needs.

o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor.

RAMBUS

o It offers high performance at a relatively low cost

o It has multiple memory banks that can be addressed in parallel

o It has separate data and control buses

o It is capable of sustained data rates well above 1 Gbyte/sec

ROMs are

o programmed with fixed data

o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time

o less sensitive to radiation induced errors

Varieties of ROMs

o Factory- (or mask-) programmed ROMs and field-programmable ROMs

o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners.

o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed.

o Flash memory

It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s.

While it uses the floating-gate principle, it is designed so that large blocks of memory can be erased all at once.

It uses standard system voltages for erasing and programming, allowing it to be reprogrammed inside a typical system.

Early flash memories had to be erased in their entirety; modern devices allow the memory to be erased in blocks, an advantage.

Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory: systems like digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment.

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:

DRAM's RAS/CAS multiplexing

The need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. Glue logic is required when a device is connected to a bus for which it is not designed.

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing include:

The I2C bus, used in microcontroller-based systems

CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle large numbers of devices

The Echelon LON network, used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol.

This protocol enables peripheral ICs to communicate with each other using simple communication hardware.

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.

Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.

The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF.

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data.

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions.

It is a well-known bus for linking a microcontroller with its system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line.

Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes may act as slaves that only respond to requests from the masters.

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting, to be sure that it is not interfering with another message.

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition.

o 0000000 is used for a general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it comprises a series of one-byte transmissions, an address byte followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave, and 1 for reading from slave to master.

The bus transaction is begun by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.
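
The address byte of a transaction can be assembled with a shift and an OR. The helper below packs a 7-bit address with the direction bit in the least significant position; the function name is an illustrative choice.

```c
#include <stdint.h>

/* Build the first byte of an I2C transaction: the 7-bit servant
 * address in bits 7..1 and the direction bit in bit 0
 * (0 = master writes, 1 = master reads). */
uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}
```

For example, a write to address 0x50 (a common EEPROM address) produces the byte 0xA0, and a read produces 0xA1; address 0000000 with a write yields 0x00, the general-call broadcast.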

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well.

Some important characteristics of the CAN protocol are high-integrity serial data communication, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities.

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments.

The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules, to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.

It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used.

The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar.

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion.

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1.

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state.

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame.

Format of a data frame: the data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.

The data field is 0 to 8 bytes (64 bits) long, depending on the value given in the control field.

A cyclic redundancy check (CRC) is sent after the data field for error detection.

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit ('1') in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant ('0') value. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.
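
CRCs of this kind are computed bit-serially over the frame contents. The sketch below uses a 16-bit CRC purely for illustration; CAN's actual CRC is 15 bits wide with its own generator polynomial, so the width and the polynomial chosen here are assumptions, not the CAN values.

```c
#include <stdint.h>

/* Bit-serial CRC over a byte buffer. Each data bit is shifted into
 * the CRC register; when a 1 falls off the top, the register is
 * XORed with the generator polynomial (0x1021 here, an assumption). */
uint16_t crc16(const uint8_t *data, int len)
{
    uint16_t crc = 0;
    for (int i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int b = 0; b < 8; b++)          /* one bit at a time */
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```

The receiver runs the same computation over the received bits; any single-byte corruption changes the CRC, which is what triggers the error signaling described above.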

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus's arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
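
The wired-AND arbitration can be simulated in C: every active node drives its identifier bit, the bus carries the AND of them, and a node that sends recessive (1) but hears dominant (0) drops out. The survivor is always the node with the numerically lowest, i.e. highest-priority, identifier. The function name and the 32-node limit are illustrative choices.

```c
/* Simulate CSMA/AMP arbitration among up to 32 nodes with distinct
 * identifiers. Returns the index of the winning node, or -1 if none. */
int can_arbitrate(const unsigned *ids, int n, int id_bits)
{
    unsigned char active[32];
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = id_bits - 1; bit >= 0; bit--) {
        unsigned bus = 1;                       /* recessive unless pulled low */
        for (int i = 0; i < n; i++)
            if (active[i])
                bus &= (ids[i] >> bit) & 1;     /* wired-AND of transmitters  */
        for (int i = 0; i < n; i++)             /* recessive senders that hear */
            if (active[i] && ((ids[i] >> bit) & 1) != bus)
                active[i] = 0;                  /* dominant drop out           */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;
    return -1;
}
```

With 11-bit identifiers 0x55, 0x23, and 0x60 contending, the node sending 0x23 wins, since it is the lowest identifier and therefore the highest priority.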

Remote frame: a remote frame is used to request data from another node. The requester sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row.

The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS These are used to communicate with the external world like transmitting data to a host PC or interacting with another embedded system for sharing data etc

RS232UART

It is a standard for serial communication developed by electronic industries association(EIA)

It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)

A data terminal equipment (DTE) can be a PC, serial printer, or plotter, and a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner.

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions at the same time.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, which is useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved.

The possible data rates depend upon the UART chip and the clock used.

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.

Data rate: it represents the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.

Parity bit: it is added for error checking on the receiver side.

Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission, also known as handshake. It can be of hardware type or software type.
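As a small illustration of the parity parameter, the sender can compute an even-parity bit so that the character plus the parity bit contains an even number of 1s; the receiver recomputes it and flags a mismatch as an error. The helper below is a hypothetical sketch assuming 8 data bits.

```c
/* Hypothetical even-parity helper for an 8-data-bit character: returns the
 * parity bit the transmitter should append so the total count of 1s is even. */
int even_parity_bit(unsigned char c)
{
    int ones = 0;
    for (int b = 0; b < 8; b++)
        ones += (c >> b) & 1;      /* count the 1 bits in the character */
    return ones & 1;               /* 1 only if the data alone has odd parity */
}
```

The receiver applies the same function to the received character and compares the result with the received parity bit; any single flipped bit makes them disagree.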

RS232 connector configurations

The standard specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.

UART chip: Universal Asynchronous Receiver/Transmitter chip.

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel format to serial format.

It also adds start, stop, and parity bits.
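The transmit section's job can be sketched as building the on-wire bit sequence for one character. The frame layout used here (one start bit, 8 data bits LSB first, even parity, one stop bit) is a common configuration assumed for illustration; the function name is hypothetical.

```c
/* Sketch of UART transmit framing: fills bits[] with the sequence that would
 * be shifted out on the TxD line and returns the bit count (11 for 8E1). */
int uart_frame(unsigned char c, int bits[11])
{
    int n = 0, ones = 0;
    bits[n++] = 0;                     /* start bit (line driven low) */
    for (int b = 0; b < 8; b++) {      /* data bits, least significant first */
        int bit = (c >> b) & 1;
        ones += bit;
        bits[n++] = bit;
    }
    bits[n++] = ones & 1;              /* even parity bit */
    bits[n++] = 1;                     /* stop bit (line returns to idle high) */
    return n;
}
```

A real UART shifts these bits out at the configured baud rate; the receiver uses the start-bit edge to synchronize, since no clock is transmitted.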

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 transfers data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 line and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices, up to 512, in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: This method is adopted to determine the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 in its PI input generates a 0 on its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 on its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther a device is from the first position, the lower is its priority.
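The PI/PO rule can be expressed compactly: the acknowledge enters device 0's PI as 1, each device forwards PO = PI AND (not requesting), and the granted device is the first one with PI = 1 that is requesting. Below is a small simulation, with a hypothetical `requesting[]` array standing in for the interrupt request lines.

```c
/* Simulates daisy-chain acknowledge propagation. requesting[i] is nonzero if
 * device i has a pending interrupt; returns the index of the device that is
 * granted (PI = 1, PO = 0), or -1 when no interrupt is pending. */
int daisy_chain_grant(const int requesting[], int n)
{
    int pi = 1;                        /* CPU acknowledge into the first device */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;                  /* this device blocks PO and supplies its VAD */
        pi = pi && !requesting[i];     /* otherwise pass the acknowledge along */
    }
    return -1;
}
```

Device 0 always has the highest priority; the farther down the chain a device sits, the more devices can block the acknowledge before it arrives.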

The slowest device participates in the control and data transfer handshakes, which determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.

In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
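A minimal round-robin main loop might look like the sketch below. The three devices and their status flags are hypothetical, and a real system would loop forever (`while (1)`) rather than run a single pass.

```c
#define NUM_DEVICES 3

/* Hypothetical device-status flags; in real firmware these would be read
 * from hardware status registers, not a plain array. */
int needs_service[NUM_DEVICES] = { 0, 1, 1 };

static void service_device(int d)
{
    /* talk to device d's registers here ... */
    needs_service[d] = 0;              /* request handled */
}

/* One pass of the round-robin loop: check each device in fixed order and
 * service whichever needs it. No priorities, no interrupts, no shared data. */
void round_robin_pass(void)
{
    for (int d = 0; d < NUM_DEVICES; d++)
        if (needs_service[d])
            service_device(d);
}
```

The worst-case latency for any device is one full traversal of the loop with every other device needing service, which is exactly the weakness listed under the cons.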

Round Robin
Pros: simple; no shared data; no interrupts
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
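The flag-based division of labour can be sketched as follows. The UART receive interrupt is a hypothetical example, and on real hardware `uart_rx_isr` would be registered as an interrupt vector rather than called directly from C.

```c
/* Flag shared between interrupt and task code; volatile so the compiler
 * re-reads it on every pass of the main loop. */
volatile int rx_flag = 0;
int chars_processed = 0;

void uart_rx_isr(void)          /* urgent part: runs at interrupt priority */
{
    /* read the receive data register here ... */
    rx_flag = 1;                /* defer follow-up work to the main loop */
}

void main_loop_pass(void)       /* follow-up part: runs at task priority */
{
    if (rx_flag) {
        rx_flag = 0;
        chars_processed++;      /* lower-priority processing of the data */
    }
}
```

The ISR stays short (hardware timing is met), while all flag-triggered task code still runs at one common priority in the main loop, which is the architecture's main limitation.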

Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., ABABAD)
  - Improves response of A
  - Increases latency of other tasks
o Move some task code to the interrupt
  - Decreases response time of lower-priority interrupts
  - May not be able to ensure lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
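A sketch of the function-queue idea: ISRs enqueue function pointers with a priority, and the main routine always runs the highest-priority entry first. All names here are illustrative, and a real implementation would also disable interrupts around the queue accesses.

```c
typedef void (*task_fn)(void);

struct entry { task_fn fn; int priority; };

static struct entry queue[8];
static int qlen = 0;

/* Called from an ISR: append a deferred function to the queue. */
void enqueue_task(task_fn fn, int priority)
{
    if (qlen < 8) {
        queue[qlen].fn = fn;
        queue[qlen].priority = priority;
        qlen++;
    }
}

/* Called by the main routine: pick and run the highest-priority entry,
 * so the ordering need not be FIFO. */
void run_one_task(void)
{
    if (qlen == 0)
        return;
    int best = 0;
    for (int i = 1; i < qlen; i++)
        if (queue[i].priority > queue[best].priority)
            best = i;
    task_fn fn = queue[best].fn;
    queue[best] = queue[--qlen];       /* swap-remove the chosen entry */
    fn();
}

/* Two toy tasks used to demonstrate the ordering. */
int last_run = 0;
void task_a(void) { last_run = 1; }
void task_b(void) { last_run = 2; }
```

The worst case for the highest-priority task is now the longest single function, not the whole loop; low-priority entries, however, may sit in the queue indefinitely.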

Real Time Operating System

Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from previous architectures:
  - We don't write signaling flags (the RTOS takes care of it)
  - No loop in our code decides what is executed next (the RTOS does this)
  - The RTOS knows relative task priorities and controls what is executed next
  - The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o System's high-priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = total time to complete: 250 ms (quantum 100 ms)
1. First allocation = 100 ms
2. Second allocation = 100 ms
3. Third allocation = 100 ms, but job1 self-terminates after 50 ms
4. Total CPU time of job1 = 250 ms
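The allocation arithmetic above generalizes directly; the helper below (names are illustrative) computes the time slices a job receives under a given quantum.

```c
/* Fills slices[] with the CPU time a job receives on each round-robin turn
 * and returns the number of turns; total_ms is the job's total CPU demand. */
int rr_allocations(int total_ms, int quantum_ms, int slices[], int max_turns)
{
    int n = 0;
    while (total_ms > 0 && n < max_turns) {
        int slice = (total_ms < quantum_ms) ? total_ms : quantum_ms;
        slices[n++] = slice;           /* last turn may be a partial quantum */
        total_ms -= slice;
    }
    return n;
}
```

For job1 above, `rr_allocations(250, 100, ...)` yields the three turns of 100, 100, and 50 ms.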

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not utilize this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can be created using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously).

The scheduler is concerned mainly with:

o Throughput - the number of processes that complete their execution per time unit

o Latency, specifically:

o Turnaround - the total time between submission of a process and its completion

o Response time - the amount of time from when a request was submitted until the first response is produced

o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process with a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing
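The FIFO drawbacks above are easy to quantify: each job waits for the combined burst time of everything ahead of it, so one long job inflates every later job's waiting time. A small sketch with illustrative burst times:

```c
/* Total waiting time under FCFS: job i waits for the sum of the burst times
 * of jobs 0..i-1, taken in arrival order. */
int fcfs_total_waiting(const int burst[], int n)
{
    int total = 0, elapsed = 0;
    for (int i = 0; i < n; i++) {
        total += elapsed;              /* time job i spent waiting */
        elapsed += burst[i];           /* CPU stays busy for burst[i] more ms */
    }
    return total;
}
```

With bursts of 24, 3, and 3 ms, the waits are 0, 24, and 27 ms (total 51); arriving in the order 3, 3, 24 the total drops to 9, which is the intuition behind the shortest-remaining-time strategy discussed next.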

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than with FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to indicate when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification: it allows the hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

LABEL ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 is at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
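Byte addressing and the endianness question can be demonstrated on a host machine in C. This is a host-side illustration of the concepts, not ARM-specific code:

```c
#include <stdint.h>
#include <string.h>

/* Store the word 0x11223344 and look at the byte at the word's lowest
 * address. Little-endian: the lowest-order byte (0x44) comes first;
 * big-endian: the highest-order byte (0x11) comes first. */
int is_little_endian(void)
{
    uint32_t word = 0x11223344u;
    uint8_t first_byte;
    memcpy(&first_byte, &word, 1);   /* byte at the word's base address */
    return first_byte == 0x44;
}

/* ARM addresses name bytes, not words: word n of a 32-bit machine
 * lives at byte address 4*n. */
uint32_t word_address(uint32_t word_index) { return word_index * 4u; }
```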

SHARC PROCESSOR

The SHARC is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

LABEL: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared afterward. STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
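Saturating addition is easy to model in C. This is a host-side sketch of the behavior that the ALUSAT bit selects, not SHARC code:

```c
#include <stdint.h>

/* Saturating 32-bit signed add: on overflow, return the maximum-range
 * value instead of wrapping, as the SHARC ALU does when the ALUSAT bit
 * in MODE1 is set. The check is done in 64-bit arithmetic so the
 * intermediate sum itself cannot overflow. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;   /* positive overflow clips */
    if (sum < INT32_MIN) return INT32_MIN;   /* negative overflow clips */
    return (int32_t)sum;
}
```

Saturation matters in signal processing because a clipped sample distorts far less audibly than a wrapped one, which jumps from full positive to full negative.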

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory; this allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
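The post-modify and circular-buffer behavior described above can be modeled in a few lines of C. This is a host-side sketch; the `dag_t` type and field names are illustrative, not SHARC register syntax:

```c
#include <stddef.h>

/* Software model of a DAG doing post-modify addressing over a circular
 * buffer: the I (index) register supplies the current address, then is
 * updated by the M (modify) value and wrapped into [base, base+length). */
typedef struct {
    size_t base;    /* start address of the buffer */
    size_t length;  /* number of elements          */
    size_t i;       /* current index (I register)  */
} dag_t;

/* Return the address to access now, then post-modify I with wraparound,
 * as the circular-buffer hardware does automatically on every access. */
size_t dag_post_modify(dag_t *dag, size_t m)
{
    size_t addr = dag->i;
    dag->i = dag->base + (dag->i - dag->base + m) % dag->length;
    return addr;
}
```

Doing the wraparound in hardware is what makes circular buffers attractive for filters: the inner loop needs no explicit bounds check.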

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management

Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed next. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority ready task). In a non-preemptive kernel, it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

A semaphore is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
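The acquire/release semantics behind these calls can be sketched with a single-threaded model in C. The names are illustrative, not any real RTOS API, and a real kernel would also block the calling task and maintain a waiting list:

```c
/* Minimal single-threaded model of a counting semaphore. */
typedef struct {
    int count;   /* can exceed 1 for counting use; 0/1 for binary use */
} sem_model_t;

void sem_create(sem_model_t *s, int initial) { s->count = initial; }

/* Acquire: returns 1 on success; a return of 0 marks the point where
 * a real kernel would block the task until a release arrives. */
int sem_acquire(sem_model_t *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: a real kernel would also wake the highest-priority waiter. */
void sem_release(sem_model_t *s) { s->count++; }
```

Initializing the count to the number of identical resources (for example, buffer slots) is what makes the counting form useful for resource synchronization.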

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, either the highest-priority task or the first task waiting in the queue takes the message.
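A fixed-length FIFO captures the "array of mailboxes" view. This is a simplified C model; a real RTOS queue would also maintain the sender and receiver waiting lists:

```c
#include <stddef.h>

#define QUEUE_LEN 4   /* queue length is fixed at creation time */

/* Fixed-size FIFO message queue over int-sized messages. */
typedef struct {
    int msgs[QUEUE_LEN];
    size_t head;    /* index of the oldest message  */
    size_t count;   /* number of messages queued    */
} msgq_t;

/* A task or ISR deposits a message; 0 means the queue is full and a
 * real kernel would put the sender on the waiting list. */
int msgq_send(msgq_t *q, int msg)
{
    if (q->count == QUEUE_LEN) return 0;
    q->msgs[(q->head + q->count) % QUEUE_LEN] = msg;
    q->count++;
    return 1;
}

/* A task takes the oldest message; 0 means the queue is empty and a
 * real kernel would block the receiver until a message arrives. */
int msgq_receive(msgq_t *q, int *msg)
{
    if (q->count == 0) return 0;
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 1;
}
```

This is exactly the keyboard/sensor pattern above: the ISR calls `msgq_send` and the processing task calls `msgq_receive`.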

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
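An RTOS pipe behaves much like a POSIX pipe: one task writes bytes into one end, another reads them from the other end. The host-side sketch below uses the real POSIX `pipe` API rather than any particular RTOS call set; the `"sensor:42"` message is purely illustrative:

```c
#include <unistd.h>
#include <string.h>

/* Create a pipe, push one message through it, and read it back.
 * Returns the number of bytes read, or -1 on failure. */
int pipe_demo(char *out, size_t outlen)
{
    int fd[2];
    if (pipe(fd) != 0) return -1;              /* create the pipe */

    const char *msg = "sensor:42";             /* producer task's output */
    (void)write(fd[1], msg, strlen(msg));      /* write end */

    ssize_t n = read(fd[0], out, outlen - 1);  /* consumer task's input */
    if (n < 0) n = 0;
    out[n] = '\0';

    close(fd[0]);
    close(fd[1]);
    return (int)n;
}
```

In a real system, the producer and consumer would be separate tasks, and the kernel would block the reader while the pipe is empty.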

INSTRUCTION SET ARCHITECTURE ISA

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are

o Large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are

R15----------- it is the program counter but can be manipulated as a general-purpose register

R13----------- it is used as the stack pointer

R14----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags mask normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
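The flag and mode fields can be picked out of a CPSR value with simple masks, following the classic ARM layout (N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4 to 0):

```c
#include <stdint.h>

/* Field extraction from a CPSR value. */
int cpsr_n(uint32_t cpsr) { return (cpsr >> 31) & 1; }  /* negative  */
int cpsr_z(uint32_t cpsr) { return (cpsr >> 30) & 1; }  /* zero      */
int cpsr_c(uint32_t cpsr) { return (cpsr >> 29) & 1; }  /* carry     */
int cpsr_v(uint32_t cpsr) { return (cpsr >> 28) & 1; }  /* overflow  */
int cpsr_i(uint32_t cpsr) { return (cpsr >> 7)  & 1; }  /* IRQ mask  */
int cpsr_f(uint32_t cpsr) { return (cpsr >> 6)  & 1; }  /* FIQ mask  */
int cpsr_t(uint32_t cpsr) { return (cpsr >> 5)  & 1; }  /* THUMB bit */

uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & 0x1Fu; } /* mode bits */

#define MODE_USER 0x10u  /* user-mode encoding in the mode field */
#define MODE_SVC  0x13u  /* supervisor-mode encoding             */
```

A monitor program displaying the CPSR during debugging decodes it in exactly this way.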

The mode field selects one of the six execution modes as follows

o User mode: it is used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if a fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o THUMB instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are expanded into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in the application code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOS a task is simply a subroutine

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static initialized, static uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
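The rules are easiest to see by contrast. Below is a hypothetical non-reentrant function alongside a reentrant rewrite; the code is illustrative C, not taken from any RTOS:

```c
/* NON-REENTRANT: the running total lives in a static variable shared by
 * every caller. If the RTOS switches tasks between the read and the
 * write of shared_total, the result depends on switch timing: a
 * shared-data problem, because the update is not atomic. */
static int shared_total;
int add_to_total(int x)
{
    shared_total += x;      /* read-modify-write on shared state */
    return shared_total;
}

/* REENTRANT: all state is in parameters and locals, which live on the
 * calling task's own stack, so any number of tasks can be inside this
 * function at once without interfering with each other. */
int add(int total, int x)
{
    return total + x;
}
```

The reentrant version pushes the responsibility for the running total onto each caller, which is exactly what keeps the state private to one task.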

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with same priority

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it goes to the blocked state before the other is given the processor.

If a higher priority task unblocks what happens to the running task

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
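The core scheduling decision described above can be sketched in C. This is a simplified model, assuming a lower number means higher priority; the type and state names are illustrative, not any RTOS's API:

```c
#include <stddef.h>

/* Task states as described in the notes. */
enum state { BLOCKED, READY, RUNNING };

typedef struct {
    enum state st;
    int priority;    /* assumption in this sketch: lower = more urgent */
} task_t;

/* Scheduler core: among the non-blocked tasks, pick the one with the
 * highest priority. Returns its index, or -1 if every task is blocked,
 * in which case a real RTOS would spin in its idle loop. */
int schedule(task_t *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].st == BLOCKED)
            continue;
        if (best < 0 || tasks[i].priority < tasks[best].priority)
            best = (int)i;
    }
    return best;
}
```

In a preemptive kernel, this selection would be rerun every time a task unblocks; in a non-preemptive kernel, only when the running task blocks.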

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding implements the design and integrates the components

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered as an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that with too many spirals it may take too long, which matters when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps ensure that the product best meets the customer's needs.


Flash memory's fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment.

Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.

I/O devices

Timers and counters

A/D and D/A converters

Keyboards

LEDs

Displays

Touch screens

COMPONENT INTERFACING

Memory interfacing

Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:

DRAM's RAS/CAS multiplexing

Need to refresh

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it is not designed.

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.

NETWORKS OF EMBEDDED SYSTEMS

Interconnect networks specialized for distributed embedded computing are

I2C bus used in microcontroller based systems

CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle large numbers of devices

Echelon LON network used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol.

This protocol enables peripheral ICs to communicate with each other using simple communication hardware.

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.

Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.

The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF.

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data.

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions.

It is a well-known bus for linking a microcontroller into a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.

It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line.

Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters

It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition.

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it is composed of a series of one-byte transmissions, an address followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave, and 1 for reading from slave to master.

The bus transaction is signaled by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.
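The address byte that starts a transaction can be composed in C, following the 7-bit-address-plus-direction-bit format described above (a sketch; the 0x50 device address in the usage note is just an example):

```c
#include <stdint.h>

#define I2C_WRITE 0u   /* direction bit 0: master writes to the device */
#define I2C_READ  1u   /* direction bit 1: master reads from the device */

/* First byte of an I2C transaction: the 7-bit address goes in bits
 * 7..1 and the direction bit in bit 0. */
uint8_t i2c_address_byte(uint8_t addr7, uint8_t dir)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (dir & 1u));
}

/* 0000000 with the write bit is the general call (broadcast) address. */
uint8_t i2c_general_call(void)
{
    return i2c_address_byte(0x00, I2C_WRITE);
}
```

For example, a device at address 0x50 is addressed with byte 0xA0 for a write and 0xA1 for a read.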

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well.

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in a wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of a data frame: A data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 8 bytes (up to 64 bits), depending on the length given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to signal whether the frame was correctly received: the sender puts a recessive bit ('1') in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value ('0'). If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit on the bus while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
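The arbitration process above can be reproduced with a small simulation. This is only a sketch of the wired-AND mechanism, with made-up example identifiers; real CAN controllers do this bit-by-bit in hardware.

```python
def arbitrate(identifiers, width=11):
    """Simulate CAN CSMA/AMP arbitration over the 11-bit identifier field.

    Each contending node sends its identifier MSB first. The wired-AND bus
    carries the dominant level (0) whenever any contender sends a 0; a node
    that sends a recessive 1 but hears a dominant 0 drops out. The survivor
    is therefore the numerically lowest identifier, i.e. highest priority."""
    contenders = set(identifiers)
    for bit in range(width - 1, -1, -1):
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus = min(sent.values())  # wired-AND: any 0 makes the bus 0
        contenders = {i for i in contenders if sent[i] == bus}
    assert len(contenders) == 1   # distinct identifiers: exactly one survives
    return contenders.pop()

# Three nodes start transmitting at once; the lowest identifier wins:
print(hex(arbitrate([0x1A5, 0x0F3, 0x2C0])))  # 0xf3
```

Because a 0 always overrides a 1 on the bus, arbitration never destroys the winning message; the losers simply retry later.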

Remote frame: A remote frame is used to request data from another node. The requester sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged

COMMUNICATION INTERFACINGS: These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect data terminal equipment (DTE) to data circuit-terminating equipment (DCE)

Data terminal equipment (DTE) can be a PC, serial printer, or plotter; data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., the data transfer can take place in both directions

In RS232, the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using a suitable RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6 data bits, two stop bits are appended

Parity bit It is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmissions, also known as handshaking. It can be of hardware or software type
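The framing rules above (start bit, data bits, parity, stop bits) can be sketched as a short simulation. The function name and defaults are illustrative assumptions; a real UART shifts these bits out in hardware, with the line idling at 1.

```python
def frame_char(ch, data_bits=8, parity="even", stop_bits=1):
    """Return the bit sequence a UART would shift out for one character.

    A start bit (0), then the data bits LSB first, then an optional parity
    bit, then stop bit(s) (1). Even/odd parity makes the total count of 1s
    in data+parity even/odd; the receiver recomputes it to detect errors."""
    value = ord(ch) & ((1 << data_bits) - 1)
    bits = [0]                                            # start bit
    bits += [(value >> i) & 1 for i in range(data_bits)]  # data, LSB first
    if parity is not None:
        ones = sum(bits[1:])
        bits.append(ones % 2 if parity == "even" else 1 - ones % 2)
    bits += [1] * stop_bits                               # stop bit(s)
    return bits

# 'A' = 0x41: start, 8 data bits LSB first, even parity bit, one stop bit
print(frame_char("A"))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

The receiver reverses the process: it detects the 1-to-0 start transition, samples the data bits at the agreed rate, and checks parity and the stop bit.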

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin

For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals

The voltage level is with respect to the local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section

The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor

The transmit section takes data from the processor and converts it from parallel format to serial format

It also adds the start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices (up to 512) in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy Chaining: This method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority
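The PI/PO propagation rules above can be modeled with a short simulation. The function and the boolean device list are illustrative assumptions; position 0 is the device nearest the CPU.

```python
def daisy_chain_grant(requesting):
    """Simulate daisy-chain interrupt acknowledgement.

    `requesting` lists, per device, whether it has a pending interrupt,
    ordered from the device nearest the CPU (highest priority) outward.
    The acknowledge signal (PI=1) travels down the chain; the first
    requesting device keeps it (PO=0) and wins. Returns the index of the
    granted device, or None if nothing is pending."""
    pi = 1  # acknowledge from the CPU enters the first device
    granted = None
    for index, wants in enumerate(requesting):
        if pi == 1 and wants:
            granted = index  # PI=1, PO=0: this device places its VAD
            po = 0
        elif wants:
            po = 0           # requesting, but acknowledge already blocked
        else:
            po = pi          # not requesting: pass the acknowledge along
        pi = po              # PO of this device feeds PI of the next
    return granted

print(daisy_chain_grant([False, True, True]))  # 1  (device 1 beats device 2)
```

The simulation makes the priority rule concrete: with several devices requesting at once, the one closest to the CPU always wins.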

The slowest device participates in the control and data transfer handshakes, so it determines the speed of the transaction

The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines

In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures

1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority

Round Robin

1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
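The main-loop structure can be sketched as follows. The device list and its check/service functions are hypothetical; in a real embedded system each "ready" check would poll a hardware status register, and the loop would run forever.

```python
# Minimal round-robin architecture: a loop polls each device in turn and
# services it only if it needs attention. No interrupts, no priorities;
# position in the loop is the only ordering there is.

def round_robin(devices, cycles=1):
    """Poll every (ready, service) pair in order, `cycles` times over."""
    serviced = []
    for _ in range(cycles):           # a real system would loop forever
        for ready, service in devices:
            if ready():               # does this device need service?
                serviced.append(service())
    return serviced

# Hypothetical devices: a pressed button and an idle serial port
devices = [
    (lambda: True,  lambda: "handle button"),
    (lambda: False, lambda: "handle serial"),
]
print(round_robin(devices))  # ['handle button']
```

The weakness listed below follows directly from this shape: a device's worst-case wait is one full trip around the loop with every other device needing service.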

Round Robin

Pros: simple; no shared data; no interrupts

Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly

Round Robin with Interrupts

o Based on Round Robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities

Round Robin with Interrupts

Pros:
o Still relatively simple
o Hardware timing requirements better met

Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this? Adjustments:
o Change the order in which flags are checked (e.g., ABABAD): improves the response of A, but increases the latency of other tasks
o Move some task code into an interrupt: decreases the response time of lower-priority interrupts, but may not be able to ensure that lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls

Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces

Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
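A minimal sketch of the function-queue idea: interrupt handlers enqueue work with a priority, and the main routine always runs the most urgent queued function first. The names and the priority-heap choice are illustrative assumptions, not a prescribed implementation.

```python
import heapq

# (priority, sequence, function); lower number = higher priority, and the
# sequence counter keeps FIFO order among equal priorities.
task_queue = []
_seq = 0

def enqueue(priority, function):
    """Called from an interrupt handler: queue follow-up processing."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, function))
    _seq += 1

def main_loop_once():
    """Main routine: run the most urgent queued function, if any."""
    if task_queue:
        _, _, function = heapq.heappop(task_queue)
        return function()
    return None

enqueue(2, lambda: "log temperature")
enqueue(1, lambda: "read ADC")        # more urgent, so it runs first
print(main_loop_once())  # read ADC
print(main_loop_once())  # log temperature
```

In real firmware the queue would be filled inside ISRs with interrupts briefly masked around the push, which is the shared-data point this architecture must protect.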

Real Time Operating System

Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code

Differences from previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems

Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools

Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
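The allocation arithmetic in the example above can be reproduced with a short simulation. This is only an illustration of the quantum accounting, using the same numbers as the example (250 ms burst, 100 ms quantum).

```python
def round_robin_completion(burst_times, quantum):
    """Simulate round-robin time slicing.

    Returns how many time slices each job received before finishing.
    Each pass hands every unfinished job at most one quantum of CPU."""
    remaining = list(burst_times)
    slices = [0] * len(burst_times)
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                remaining[i] -= min(quantum, r)  # last slice may be partial
                slices[i] += 1
    return slices

# Job1 needs 250 ms with a 100 ms quantum: three slices (100 + 100 + 50)
print(round_robin_completion([250], 100))  # [3]
```

With a second 100 ms job added, job1 still takes three slices but now waits for the other job's turn between its own allocations.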

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-Robin Scheduling in UNIX: the same round-robin scheduler concept applies, and it can be created by using semaphores

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)

The scheduler is concerned mainly with

Throughput - number of processes that complete their execution per time unit Latency specifically

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end

Types of operating system schedulers

Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated

or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges the processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
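A minimal tick-by-tick sketch of shortest-remaining-time scheduling (process names and times are hypothetical) makes both the preemption and the bias against long processes visible:

```python
def srt_schedule(processes):
    """Shortest-remaining-time, simulated in 1-unit ticks.
    processes = [(name, arrival, burst)]; returns {name: completion_time}."""
    remaining = {name: burst for name, _, burst in processes}
    arrival = {name: arr for name, arr, _ in processes}
    done, t = {}, 0
    while remaining:
        ready = [name for name in remaining if arrival[name] <= t]
        if not ready:
            t += 1
            continue
        # Preemption happens implicitly: every tick the least-remaining ready
        # process is chosen, even if a different one ran during the last tick.
        name = min(ready, key=lambda p: remaining[p])
        remaining[name] -= 1
        t += 1
        if remaining[name] == 0:
            del remaining[name]
            done[name] = t
    return done

# P1 is preempted twice; the short jobs finish first and P1 finishes last:
print(srt_schedule([("P1", 0, 8), ("P2", 1, 4), ("P3", 2, 2)]))
```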

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
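Fixed-priority preemption can be sketched the same way (task names, times, and priorities are made up; "lower number means higher priority" is purely a convention of this sketch):

```python
def fpps(processes):
    """Fixed-priority preemptive scheduling in 1-unit ticks.
    processes = [(name, arrival, burst, priority)]; a lower number means a
    higher priority here. Returns the completion order."""
    remaining = {name: burst for name, _, burst, _ in processes}
    meta = {name: (arr, prio) for name, arr, _, prio in processes}
    order, t = [], 0
    while remaining:
        ready = [name for name in remaining if meta[name][0] <= t]
        if ready:
            name = min(ready, key=lambda p: meta[p][1])  # highest priority wins
            remaining[name] -= 1
            if remaining[name] == 0:
                del remaining[name]
                order.append(name)
        t += 1
    return order

# The low-priority logger keeps being pushed back by later, higher-priority work:
print(fpps([("logger", 0, 3, 9), ("ctrl", 1, 2, 1), ("ui", 2, 2, 5)]))
```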

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
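A round-robin sketch (quantum and burst values invented; context-switch cost ignored) makes the time slicing visible:

```python
from collections import deque

def round_robin(processes, quantum):
    """RR with a fixed time quantum; processes = [(name, burst)], all ready
    at t = 0. Returns {name: completion_time}."""
    queue = deque(processes)
    t, done = 0, {}
    while queue:
        name, burst = queue.popleft()
        run = min(quantum, burst)              # run for at most one quantum
        t += run
        if burst > run:
            queue.append((name, burst - run))  # unfinished: back of the queue
        else:
            done[name] = t
    return done

# A cycles through three times; the short job B finishes first:
print(round_robin([("A", 5), ("B", 2), ("C", 3)], quantum=2))
```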

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for the shared-memory problem.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of the breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be executed efficiently in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example:

LDR r0,[r8] ; a comment

label ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann architecture machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
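The two byte orderings can be checked on a host machine with Python's struct module (the word value below is arbitrary):

```python
import struct

word = 0x0A0B0C0D                      # an arbitrary 32-bit word value

little = struct.pack("<I", word)       # little-endian: low-order byte first
big = struct.pack(">I", word)          # big-endian: high-order byte first

print(little.hex())                    # 0d0c0b0a
print(big.hex())                       # 0a0b0c0d

# Byte addressing: word n lives at byte address 4*n.
assert [4 * n for n in range(3)] == [0, 4, 8]
```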

SHARC PROCESSOR

It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example:

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex.

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
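The difference between saturating and wrap-around addition can be sketched in a few lines (this models the behaviour only, not the SHARC hardware itself):

```python
INT32_MAX, INT32_MIN = 2**31 - 1, -2**31

def add32(a, b, saturate=True):
    """32-bit fixed-point addition. With saturation, an overflow clamps to the
    range limit; without it, the result wraps around (two's complement)."""
    r = a + b
    if saturate:
        return max(INT32_MIN, min(INT32_MAX, r))
    r &= 0xFFFFFFFF                             # keep the low 32 bits
    return r - 2**32 if r & 0x80000000 else r   # reinterpret as signed

print(add32(INT32_MAX, 1))          # clamps to 2147483647
print(add32(INT32_MAX, 1, False))   # wraps around to -2147483648
```

For signal processing, clamping at the rail is usually far less damaging than the sign flip produced by wrap-around, which is why DSPs provide a saturation mode.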

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
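The post-modify, circular-buffer, and bit-reversal modes can be modelled in software (a behavioural sketch only; the I and M register names come from the text, everything else here is invented):

```python
def postmodify_sweep(memory, i, m, count):
    """Post-modify with update: fetch at address I, then I += M."""
    fetched = []
    for _ in range(count):
        fetched.append(memory[i])
        i += m                               # the I register is updated after use
    return fetched, i

def circular_fetch(memory, base, length, i, m, count):
    """Circular buffer: the index wraps around within [base, base + length)."""
    fetched = []
    for _ in range(count):
        fetched.append(memory[base + i])
        i = (i + m) % length
    return fetched

def bit_reverse(index, bits):
    """Bit-reversed addressing, as used to reorder FFT inputs/outputs."""
    return int(format(index, f"0{bits}b")[::-1], 2)

mem = list(range(100, 108))                   # 8 words of "data memory"
print(postmodify_sweep(mem, 0, 2, 4)[0])      # every second word
print(circular_fetch(mem, 0, 8, 6, 1, 4))     # wraps from the end to the start
print([bit_reverse(n, 3) for n in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```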

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.

Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority ready task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can have an integer value greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
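The five calls listed above can be sketched as a toy counting semaphore (the method names are illustrative, not from any particular RTOS API; a real kernel would block the calling task on acquire instead of returning False):

```python
class CountingSemaphore:
    """Toy counting semaphore mirroring the management calls listed above."""
    def __init__(self, initial):             # "create a semaphore"
        self.count = initial
    def acquire(self):                       # "acquire a semaphore"
        if self.count > 0:
            self.count -= 1
            return True
        return False                         # a real kernel would block here
    def release(self):                       # "release a semaphore"
        self.count += 1
    def query(self):                         # "query a semaphore"
        return self.count

sem = CountingSemaphore(initial=2)           # e.g. two identical resources
assert sem.acquire() and sem.acquire()       # both instances granted
print(sem.acquire())                         # False: no resource left
sem.release()                                # one instance returned
print(sem.query())                           # 1
```

A binary semaphore is simply the `initial=1` case, where the count only moves between 0 and 1.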

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, the highest-priority task or the first task waiting in the queue can take the message.
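The deposit-and-take behaviour can be sketched as a toy fixed-length queue (a real kernel would move a sender or receiver onto the queue's waiting lists instead of returning False/None; all names here are invented):

```python
from collections import deque

class MessageQueue:
    """Toy fixed-length message queue with a name/ID and a queue length."""
    def __init__(self, name, length):
        self.name, self.length = name, length
        self.messages = deque()
    def send(self, msg):                     # called by a task or an ISR
        if len(self.messages) >= self.length:
            return False                     # queue full
        self.messages.append(msg)
        return True
    def receive(self):                       # called by the consuming task
        return self.messages.popleft() if self.messages else None

kbd = MessageQueue("keyboard", length=2)     # e.g. buffering key presses
assert kbd.send("K1") and kbd.send("K2")
print(kbd.send("K3"))                        # False: only two slots
print(kbd.receive())                         # K1: messages come out in order
```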

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
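On a desktop host, the same create/write/read/close sequence can be tried with an OS-level pipe (os.pipe is the host operating system's facility, not an RTOS call; the payload is invented):

```python
import os

# One "task" writes its output into the pipe; another reads it as input.
read_fd, write_fd = os.pipe()            # create/open the pipe

os.write(write_fd, b"sensor:42\n")       # write to the pipe (producer side)
os.close(write_fd)                       # close the writing end

data = os.read(read_fd, 64)              # read from the pipe (consumer side)
os.close(read_fd)
print(data)
```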

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large, uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, small code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- it is the program counter but can be manipulated as a general-purpose register

R13 ----------- it is used as a stack pointer

R14 ----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags, when set, mask the normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operating modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM and the 16-bit THUMB.

ARM: it is the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long integer multiplication (64-bit result)

o Multiply-accumulate instructions

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, and so on.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, and uninitialized data and everything else, is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, the sharing of data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
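The shared-data problem can be replayed deterministically by invoking the "interrupt" at the worst possible moment (a contrived sketch; in real code the interrupt's timing is not controllable, which is what makes this bug so hard to find):

```python
# The task reads a shared counter, is "interrupted" mid-update,
# and the interrupt routine's increment is silently lost.
shared_count = 0

def interrupt_routine():
    global shared_count
    shared_count += 1                 # the interrupt routine updates the data

def task_code():
    global shared_count
    temp = shared_count               # task reads the shared variable ...
    interrupt_routine()               # ... the interrupt fires mid-update ...
    shared_count = temp + 1           # ... and the task overwrites its write

task_code()
print(shared_count)                   # 1, not 2: one increment was lost
```

The read-modify-write in `task_code` is not atomic; making it an atomic section (for example, by disabling interrupts around it) removes the bug.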

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
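The first rule can be illustrated by contrasting a function that keeps state in a shared global with one that uses only its stack parameters (both functions are invented examples):

```python
total = 0                                # global data, shared by all "tasks"

def add_non_reentrant(x):
    global total                         # non-atomic use of a shared variable:
    total += x                           # unsafe if a task switch lands here
    return total

def add_reentrant(running_total, x):
    return running_total + x             # only stack (local) data: reentrant

# Two tasks using the reentrant version keep fully independent state:
a = add_reentrant(0, 5)
b = add_reentrant(100, 5)
print(a, b)                              # 5 105
```

Interleaved calls to `add_non_reentrant` from two tasks would corrupt `total`, while `add_reentrant` is safe because each caller's state lives on its own task stack.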

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it blocks before the other is given the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed at the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the pieces and integrates them

o Testing uncovers the bugs

o Maintenance entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.


I2C bus: used in microcontroller-based systems

CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle a large number of devices

Echelon LON network: used for home and industrial automation

DSPs usually supply their own interconnects for multiprocessing

I2C bus

Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago, and it is a two-wire serial bus protocol

This protocol enables peripheral ICs to communicate with each other using simple communication hardware

Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus

Common devices capable of interfacing to an I2C bus include EPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers

The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF

Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data

All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. Actual data transfer takes place between the start and stop conditions

It is a well-known bus to link a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus

It has two lines: a serial data line (SDL) for data and a serial clock line (SCL), which confirms when valid data are on the line

Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters

It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition

o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master

The bus transaction is begun by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1 to 0 transition on SDL, and stop is signaled by leaving SCL high and sending a 0 to 1 transition on SDL

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
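The start/stop signaling and address-byte layout described above can be sketched in C. This is a minimal illustrative sketch, not a real driver: `set_sda`/`set_scl` are assumed stand-ins for platform-specific open-drain GPIO access, and `i2c_address_byte` shows how the 7-bit address and direction bit combine into the first byte of a transaction.

```c
#include <stdint.h>

/* Hypothetical GPIO hooks; a real driver would drive open-drain pins. */
static int sda = 1, scl = 1;                  /* bus idle: both lines high */
static void set_sda(int v) { sda = v; }
static void set_scl(int v) { scl = v; }
int i2c_bus_idle(void) { return sda && scl; }

/* Start condition: SDL falls 1 -> 0 while SCL is held high. */
void i2c_start(void) { set_scl(1); set_sda(0); set_scl(0); }

/* Stop condition: SDL rises 0 -> 1 while SCL is held high. */
void i2c_stop(void) { set_sda(0); set_scl(1); set_sda(1); }

/* Address transmission: 7-bit address plus one direction bit
   (0 = master writes to slave, 1 = master reads from slave). */
uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1 : 0));
}
```

For example, a write to a device at address 0x50 would begin with the byte 0xA0, and a read with 0xA1.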

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also

Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities

Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors

It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many details of CAN and the I2C bus are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of data frame: A data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 64 bytes depending on the value given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority

Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
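The wired-AND arbitration described above can be illustrated with a small simulation. This is purely an illustrative sketch (`can_arbitrate` is a made-up name, not part of any CAN API): each active node shifts out its 11-bit identifier MSB first, the bus carries the AND of all transmitted bits, and a node that sends a recessive 1 but observes a dominant 0 backs off, so the lowest identifier (highest priority) wins.

```c
#include <stdint.h>

/* Simulate CSMA/AMP arbitration among n nodes with 11-bit identifiers.
   Returns the index of the winning node, or -1 if n == 0. */
int can_arbitrate(const uint16_t *ids, int n)
{
    int active[32];
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = 10; bit >= 0; bit--) {
        int bus = 1;                              /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i] && !((ids[i] >> bit) & 1))
                bus = 0;                          /* wired-AND: dominant 0 wins   */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1) && bus == 0)
                active[i] = 0;                    /* heard 0 while sending 1: stop */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;                  /* survivor has lowest identifier */
    return -1;
}
```

With identifiers {0x123, 0x0A0, 0x7FF}, the node sending 0x0A0 wins arbitration, since the all-0 end of the identifier range has the highest priority.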

COMMUNICATION INTERFACINGS

These are used to communicate with the external world, like transmitting data to a host PC or interacting with another embedded system for sharing data, etc

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)

Data terminal equipment (DTE) can be a PC, serial printer, or a plotter, and data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or a scanner

Communication between the two devices is in full duplex, i.e., the data transfer can take place in both directions

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters. But using RS232 cable, a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both the systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended

Parity bit: it is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake, and it can be a hardware type or a software type

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin

For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections: a receive section and a transmit section

Receive section: receives data, converts it from serial form into parallel form, and gives the data to the processor

Transmit section: takes the data from the processor and converts the data from parallel format to serial format

It also adds start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V). And RS232 uses data in serial form, whereas the processor uses the data in parallel form. The importance of the UART chip lies in the fact that it is able to make the RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by the level shifter, and the signals are then passed on to the RS232 connector
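The framing done by the transmit section can be sketched in C. This is an illustrative sketch assuming the common 8E1 format (8 data bits, even parity, 1 stop bit); `uart_frame_8e1` is a hypothetical helper name, not a real UART API.

```c
#include <stdint.h>

/* Build the bit sequence the transmit section would shift out for one
   character: start bit, 8 data bits LSB first, even parity, stop bit.
   Returns the number of bits written into out[] (always 11 for 8E1). */
int uart_frame_8e1(uint8_t ch, int out[11])
{
    int n = 0, ones = 0;
    out[n++] = 0;                     /* start bit (the line idles high)  */
    for (int i = 0; i < 8; i++) {     /* data bits, least significant first */
        int b = (ch >> i) & 1;
        ones += b;
        out[n++] = b;
    }
    out[n++] = ones & 1;              /* even parity: total count of 1s even */
    out[n++] = 1;                     /* stop bit                           */
    return n;
}
```

For 'A' (0x41) the frame is 0 10000010 0 1: start bit, the eight data bits LSB first, a parity bit of 0 (two 1s already make the count even), and the stop bit.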

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices up to 512 in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved, and with two twisted pairs, full duplex

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy chaining: This method is adopted to find the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledge signal on to the next device. This procedure is defined below

A device with a 0 at its PI input generates a 0 on its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have pending interrupts, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority
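The PI/PO rules above can be condensed into a short simulation. This is an illustrative sketch (`daisy_chain_grant` is a made-up name): the acknowledge travels down the chain, each non-requesting device passes it along (PO = PI), and the first requesting device that sees PI = 1 captures it.

```c
/* Simulate daisy-chain priority resolution: requesting[i] is 1 if
   device i has a pending interrupt.  Returns the index of the device
   that captures the acknowledge (the nearest requester to the CPU),
   or -1 if no device is requesting. */
int daisy_chain_grant(const int requesting[], int n)
{
    int pi = 1;                    /* CPU asserts acknowledge toward device 0 */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;              /* PI=1 with a pending request: this device
                                      places its VAD on the data bus          */
        pi = pi && !requesting[i]; /* requester forces PO=0; others pass PO=PI */
    }
    return -1;                     /* no pending interrupts */
}
```

With devices 1 and 2 both requesting, device 1 (closer to the CPU) wins, matching the rule that distance from the first position lowers priority.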

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines

In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS

Choosing an Architecture The best architecture depends on several factors

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority

Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)

Round Robin
Pros: Simple, no shared data, no interrupts
Cons:
o Max delay is max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
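The round-robin main loop can be sketched as a single polling pass. This is an illustrative sketch (`round_robin_pass` is a hypothetical helper): in a real system the loop runs forever and the ready flags would be hardware status bits, with device service routines called in place of clearing the flag.

```c
/* One pass over n device-ready flags in fixed order: service (here,
   simply clear) each flag that is set and report how many were serviced.
   The real main loop is just:  for (;;) round_robin_pass(ready, n);   */
int round_robin_pass(int ready[], int n)
{
    int serviced = 0;
    for (int i = 0; i < n; i++)       /* service order = position in loop */
        if (ready[i]) {
            ready[i] = 0;             /* a real system would call the
                                         device's service routine here   */
            serviced++;
        }
    return serviced;
}
```

Note how a device's worst-case wait is one full traversal of the loop, which is exactly the fragility listed in the cons above.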

Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities

Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this?

Round Robin with Interrupts: Adjustments
o Change order flags are checked (e.g., ABABAD)
  - Improves response of A
  - Increases latency of other tasks
o Move some task code to interrupt
  - Decreases response time of lower-priority interrupts
  - May not be able to ensure lower-priority interrupt code executes fast enough
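The interrupt-plus-flag split can be sketched as follows. This is an illustrative sketch with hypothetical names (`uart_rx_isr`, `poll_rx`); in a real system the ISR would be attached to a hardware interrupt vector, and `volatile` is required because the flag is shared between interrupt and task code.

```c
#include <stdbool.h>

static volatile bool rx_flag = false;        /* set by ISR, cleared by main */
static volatile unsigned char rx_byte;

/* Interrupt service routine: do only the urgent hardware work and
   set a flag for the main loop. */
void uart_rx_isr(unsigned char hw_data)
{
    rx_byte = hw_data;        /* urgent: capture data before it is lost */
    rx_flag = true;           /* defer follow-up processing to task code */
}

/* Called from the round-robin main loop at task priority.
   Returns the received byte, or -1 if no byte is pending. */
int poll_rx(void)
{
    if (!rx_flag) return -1;
    rx_flag = false;
    return rx_byte;           /* lengthy follow-up processing goes here */
}
```

This shows why all task code still runs at one priority: the urgent part is protected by the interrupt, but the follow-up still waits its turn in the loop.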

Function Queue Scheduling
Architecture:
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for highest-priority task = length of longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
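A minimal function queue can be sketched as a ring buffer of function pointers. This is an illustrative sketch (names like `enqueue_task` are made up); a FIFO pick is shown, and replacing it with a priority-ordered pick is the refinement the architecture allows. A real implementation would also disable interrupts around the queue updates.

```c
typedef void (*task_fn)(void);

#define QSIZE 8
static task_fn queue[QSIZE];
static int head = 0, tail = 0;

/* Called from interrupt routines: add follow-up work to the queue. */
int enqueue_task(task_fn f)
{
    int next = (tail + 1) % QSIZE;
    if (next == head) return -1;      /* queue full: caller must cope */
    queue[tail] = f;
    tail = next;
    return 0;
}

/* Called from the main routine: run the next queued function, if any.
   Returns 1 if a task ran, 0 if the queue was empty. */
int run_one_task(void)
{
    if (head == tail) return 0;
    task_fn f = queue[head];
    head = (head + 1) % QSIZE;
    f();
    return 1;
}

/* Tiny demonstration task (hypothetical). */
int demo_runs = 0;
void demo_task(void) { demo_runs++; }
```

The main loop is then simply `for (;;) run_one_task();`, with ISRs enqueueing work as it arrives.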

Real Time Operating System

Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures:
  - We don't write signaling flags (RTOS takes care of it)
  - No loop in our code decides what is executed next (RTOS does this)
  - RTOS knows relative task priorities and controls what is executed next
  - RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems
Pros:
o Worst-case response time for highest-priority function is zero
o System's high-priority response time relatively stable when extra functionality added
o Useful functionality pre-written
o Generally come with vendor tools
Cons:
o RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example: If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms, but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
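The allocation count in the example is just the ceiling of the job time divided by the quantum, which can be sketched as (`allocations_needed` is a hypothetical helper):

```c
/* Number of time slices a job needs: ceiling of job_ms / quantum_ms. */
int allocations_needed(int job_ms, int quantum_ms)
{
    return (job_ms + quantum_ms - 1) / quantum_ms;
}
```

For the example above, a 250 ms job with a 100 ms quantum needs 3 allocations (100 + 100 + 50 ms).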

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
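Work-conserving round robin over per-flow queues can be sketched as follows. This is an illustrative sketch (`next_flow` is a made-up helper): a cursor rotates through the flows, empty queues are skipped so the channel never idles while any queue holds packets, and the cursor resumes just past the flow that was served.

```c
/* Pick the next flow to send from, round-robin and work-conserving.
   backlog[f] is the number of queued packets for flow f; *cursor is
   the rotating position.  Returns the chosen flow, or -1 if all
   queues are empty. */
int next_flow(const int backlog[], int nflows, int *cursor)
{
    for (int i = 0; i < nflows; i++) {
        int f = (*cursor + i) % nflows;
        if (backlog[f] > 0) {
            *cursor = (f + 1) % nflows;   /* resume after the served flow */
            return f;
        }
    }
    return -1;                            /* link would go idle: no work */
}
```

With backlogs {0, 2, 1}, the scheduler serves flow 1 first (skipping the empty flow 0), then flow 2, alternating between the non-empty queues.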

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows this same round-robin concept, and such a scheduler can be created by using semaphores

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)

The scheduler is concerned mainly with

o Throughput - the number of processes that complete their execution per time unit

o Latency, specifically:

  - Turnaround - total time between submission of a process and its completion

  - Response time - amount of time it takes from when a request was submitted until the first response is produced

o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end

Types of operating system schedulers

Operating systems may feature up to 3 distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time: whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come, first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information: if the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high, for the same reasons as above.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on Queuing
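The FCFS behaviour above can be sketched in a few lines of Python. This is an illustrative simulation only (the helper name and the process list are invented for this sketch), not code from any operating system:

```python
def fcfs(processes):
    """Simulate First Come, First Served scheduling.

    processes: list of (name, arrival_time, burst_time), assumed sorted
    by arrival time. Returns {name: (waiting_time, turnaround_time)}.
    """
    clock = 0
    metrics = {}
    for name, arrival, burst in processes:
        start = max(clock, arrival)   # CPU may sit idle until arrival
        waiting = start - arrival     # time spent in the ready queue
        clock = start + burst         # run to completion, no preemption
        metrics[name] = (waiting, clock - arrival)
    return metrics

# A long first job makes the short ones wait (the "convoy effect"):
m = fcfs([("P1", 0, 24), ("P2", 0, 3), ("P3", 0, 3)])
```

Here the 24-unit job forces both 3-unit jobs to wait behind it, which is exactly why FCFS has high waiting and turnaround times.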

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run
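A minimal sketch of preemptive shortest-remaining-time scheduling, simulated one time unit at a time (the names and workloads are invented illustration values):

```python
def srt(processes):
    """Preemptive shortest-remaining-time scheduling, simulated one
    time unit at a time. processes: list of (name, arrival, burst).
    Returns {name: completion_time}."""
    remaining = {name: burst for name, _, burst in processes}
    arrival = {name: t for name, t, _ in processes}
    done, clock = {}, 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:                 # CPU idle until the next arrival
            clock = min(arrival[n] for n in remaining)
            continue
        n = min(ready, key=lambda x: remaining[x])  # least time left wins
        remaining[n] -= 1
        clock += 1
        if remaining[n] == 0:
            del remaining[n]
            done[n] = clock
    return done

# P2 arrives while P1 runs and preempts it, since it is shorter:
d = srt([("P1", 0, 8), ("P2", 1, 4)])
```

The short arrival (P2) finishes first, at the cost of an extra context switch for P1, illustrating both the throughput benefit and the preemption overhead described above.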

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
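The fixed-priority preemptive policy can be sketched the same way; here a lower number means a higher priority, which is just a convention of this invented helper:

```python
def priority_schedule(processes):
    """Fixed-priority preemptive scheduling (lower number = higher
    priority), simulated one time unit at a time.
    processes: list of (name, arrival, burst, priority).
    Returns the completion order."""
    remaining = {n: b for n, _, b, _ in processes}
    info = {n: (a, p) for n, a, _, p in processes}
    order, clock = [], 0
    while remaining:
        ready = [n for n in remaining if info[n][0] <= clock]
        if not ready:                 # CPU idle until the next arrival
            clock = min(info[n][0] for n in remaining)
            continue
        n = min(ready, key=lambda x: info[x][1])  # highest priority runs
        remaining[n] -= 1
        clock += 1
        if remaining[n] == 0:
            del remaining[n]
            order.append(n)
    return order

# A high-priority arrival interrupts the running low-priority task:
o = priority_schedule([("low", 0, 5, 2), ("high", 2, 2, 1)])
```

The high-priority task completes first even though it arrived later; a steady stream of such arrivals is what starves the low-priority task.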

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between that of FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor: waiting time is dependent on the number of processes, and not on the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
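A round-robin sketch using a simple queue; the quantum and the workloads are arbitrary illustration values, not from any real system:

```python
from collections import deque

def round_robin(processes, quantum):
    """Round-robin scheduling: each process runs for at most `quantum`
    time units, then goes to the back of the queue.
    processes: list of (name, burst), all assumed ready at t=0.
    Returns {name: completion_time}."""
    queue = deque(processes)
    clock, done = 0, {}
    while queue:
        name, burst = queue.popleft()
        run = min(quantum, burst)
        clock += run
        if burst > run:
            queue.append((name, burst - run))  # not finished: requeue
        else:
            done[name] = clock
    return done

d = round_robin([("P1", 5), ("P2", 3)], quantum=2)
```

Every process makes steady progress (so no starvation), but each one is context-switched several times, which is the overhead cost noted above.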

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements, and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two separate memories provides higher memory bandwidth. Most DSPs are of Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
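The two byte orders can be demonstrated with Python's struct module; this is a host-side illustration of the concept, not ARM code:

```python
import struct

# The same 32-bit word, 0x12345678, laid out in both byte orders.
word = 0x12345678
little = struct.pack("<I", word)   # low-order byte stored first
big = struct.pack(">I", word)      # high-order byte stored first
```

The two packings contain the same four bytes in reversed order, which is exactly what the little-endian/big-endian configuration bit controls.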

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits; a basic data word is 32 bits, and an address is 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It has a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits, but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
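Saturation on overflow can be sketched as follows; this is an illustrative Python model of the behaviour, not SHARC code, and the helper name is invented:

```python
def saturating_add(a, b, bits=32):
    """Signed fixed-point addition with saturation: on overflow the
    result clamps to the maximum (or minimum) representable value
    instead of wrapping around the numeric range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))
```

Clamping to the extreme value is usually the right behaviour in signal processing, where wrap-around would turn a slightly-too-loud sample into a wildly wrong one.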

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
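Bit-reversal addressing can be modelled by reversing the low-order bits of an index, as sketched below (an illustrative helper, not the DAG hardware itself):

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of an index, as a DAG in
    bit-reversal mode would when generating FFT data addresses."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the low bit
        index >>= 1
    return result

# The access order for an 8-point FFT (3 address bits):
order = [bit_reverse(i, 3) for i in range(8)]
```

The hardware performs this permutation for free as addresses are generated, saving the explicit reordering pass that a software FFT would otherwise need.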

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services include memory management, device management, interrupt handling, and time management.

Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and starting the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code, or to the highest-priority task. In a non-preemptive kernel, it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
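The acquire/release bookkeeping of a counting semaphore can be sketched as a tiny class; a real kernel would block the caller instead of returning False, and the class and method names here are invented for illustration:

```python
class CountingSemaphore:
    """Minimal sketch of a counting semaphore's bookkeeping.
    No real blocking: acquire simply reports failure at zero."""

    def __init__(self, count=1):
        self.count = count          # number of available resources

    def acquire(self):
        if self.count > 0:
            self.count -= 1
            return True             # resource granted
        return False                # caller would block in a real kernel

    def release(self):
        self.count += 1             # give the resource back

s = CountingSemaphore(count=2)      # e.g. two identical resources
```

With count fixed at 1 the same class behaves as a binary semaphore, which is why the two types are usually described together.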

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
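The deposit-and-collect pattern above can be sketched with Python's standard queue.Queue; the fixed maxsize plays the role of the queue length given at creation, and the sensor readings are invented values:

```python
from queue import Queue

# A bounded message queue: an ISR-side producer deposits readings,
# a task-side consumer drains them in FIFO order.
mq = Queue(maxsize=4)             # queue length fixed at creation

for reading in (101, 102, 103):   # e.g. sensor voltages
    mq.put(reading)

received = [mq.get() for _ in range(3)]
```

Messages come out in the order they were deposited, which matches the first-waiting-task delivery described above.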

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
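The create/write/read/close sequence can be illustrated with an OS-level pipe from Python's os module; this is a host-side sketch of the idea, not an RTOS API:

```python
import os

# One task writes to the pipe; another reads from it.
read_fd, write_fd = os.pipe()          # create the pipe
os.write(write_fd, b"stage-1 output")  # producer task's output
data = os.read(read_fd, 64)            # consumer task's input
os.close(read_fd)                      # close both ends when done
os.close(write_fd)
```

The pipe carries bytes in order from writer to reader, which is exactly the "output of one task as input to another" pattern described above.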

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file, with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers, and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are:

R15 ----------- it is the program counter, but it can be manipulated as a general-purpose register

R13 ----------- it is used as the stack pointer

R14 ----------- it has special significance and is called the link register: when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags, when set, disable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes, as follows:

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: it is the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is: SWI n

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit. However, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is the most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOS a task is simply a subroutine

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, and some other parameters, like the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, and uninitialized variables and everything else, is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem.

Shared-data problem: it is one that arises when an ISR and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
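The shared-data problem and its cure, an atomic section, can be sketched with two threads standing in for tasks; the lock makes the read-modify-write atomic, and the variable names are invented for this sketch:

```python
import threading

counter = 0                      # data shared among all "tasks"
lock = threading.Lock()

def increment(n):
    """Read-modify-write on shared data, made atomic with a lock.
    Without the atomic section, two tasks interleaving the read and
    the write here could lose updates."""
    global counter
    for _ in range(n):
        with lock:               # the atomic section
            counter += 1

tasks = [threading.Thread(target=increment, args=(10_000,))
         for _ in range(4)]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
```

With the lock, the final count is exactly the number of increments performed; remove the atomic section and updates can be silently lost whenever a switch lands between the read and the write.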

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function, or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
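The difference can be sketched with two versions of the same function; only the second obeys the rules above, and both helpers are invented for illustration:

```python
total = 0   # shared variable: touching it makes a function non-reentrant

def sum_non_reentrant(values):
    """Not reentrant: it uses the shared `total` in a non-atomic way,
    so a task switch mid-loop lets another caller corrupt the result."""
    global total
    total = 0
    for v in values:
        total += v
    return total

def sum_reentrant(values):
    """Reentrant: only parameters and stack (local) variables are used,
    so any number of tasks may execute it concurrently."""
    acc = 0                      # private to this invocation
    for v in values:
        acc += v
    return acc
```

Both return the same answer when called alone; the difference only appears when the RTOS switches tasks in the middle of the loop, which is exactly when the non-reentrant version fails.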

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: it means this task has not got anything to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.

Tasks and ISRs move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems, it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before going on to the other one.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can mandate any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the processes and integrates them

o Testing uncovers the bugs

o Maintenance entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort:

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.

Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message.

Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition.

o 0000000 is used for a general call or bus broadcast, useful to signal all devices simultaneously

o 11110XX is reserved for the extended 10-bit addressing scheme

Bus transaction: a transaction comprises a series of one-byte transmissions, an address followed by one or more data bytes.

Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master.

The bus transaction is signaled by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.
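The address byte just described can be sketched as follows; the device address 0x50 and the helper name are illustrative:

```python
# Sketch of the first byte of an I2C transaction: the 7-bit slave
# address followed by the direction bit (0 = master writes to slave,
# 1 = master reads from slave), as described above.

def i2c_address_byte(address, read):
    """Pack a 7-bit address and a direction flag into one byte."""
    if not 0 <= address <= 0x7F:
        raise ValueError("standard I2C addresses are 7 bits")
    return (address << 1) | (1 if read else 0)

# Write to a hypothetical device at 0x50, then read from it:
print(hex(i2c_address_byte(0x50, read=False)))  # 0xa0
print(hex(i2c_address_byte(0x50, read=True)))   # 0xa1
```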

The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus

CAN Bus

The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.

It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well.

Some important characteristics of the CAN protocol are high-integrity serial data communication, real-time support, data rates of up to 1 Mbit/s, 11-bit addressing, and error detection and confinement capabilities.

Common applications other than automobiles include elevator controllers, copiers, telescopes, production line control systems, and medical instruments.

The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules: to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.

Because it uses bit-serial transmission, CAN can run at rates up to 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used.

The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in a wired-AND fashion.

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1.

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.

Nodes synchronize themselves to the bus by listening to the bit transitions on it. The first bit of a data frame provides the first synchronization opportunity in a frame.

Format of a data frame: a data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long.

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.

The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.

The data field is from 0 to 8 bytes (0 to 64 bits), depending on the length given in the control field.

A cyclic redundancy check (CRC) is sent after the data field for error detection.
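The CRC's role can be illustrated with a short sketch. The 15-bit polynomial 0x4599 is the one commonly quoted for CAN, but treat this as an illustration of the error-detection principle rather than a byte-exact CAN implementation (a real controller computes the CRC bit by bit over the stuffed frame fields):

```python
# Bitwise CRC sketch: a single flipped bit in the data changes the CRC,
# so the receiver can detect the corruption.

def crc15(data, poly=0x4599):
    """Compute a 15-bit CRC over a byte string, MSB first."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):            # process bits MSB first
            bit = (byte >> i) & 1
            top = (crc >> 14) & 1             # bit about to shift out
            crc = (crc << 1) & 0x7FFF         # keep register at 15 bits
            if top ^ bit:
                crc ^= poly
    return crc

frame = bytes([0x12, 0x34, 0x56])
corrupt = bytes([0x12, 0x34, 0x57])           # one flipped bit
print(crc15(frame) != crc15(corrupt))         # True: the error is detected
```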

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field thus acts as a priority identifier, with the all-0 identifier having the highest priority.
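The wired-AND arbitration can be sketched as a small simulation; the identifiers are made up, and since 0 is dominant, the lowest identifier wins:

```python
# Sketch of CAN arbitration: each node sends its identifier bit by bit,
# MSB first; the bus level is the AND of all transmitted bits (0 is
# dominant). A node that sends recessive (1) but hears dominant (0)
# drops out, so the numerically lowest identifier wins the bus.

def arbitrate(identifiers, width=11):
    """Return the (distinct) identifier that wins bus arbitration."""
    contenders = list(identifiers)
    for i in range(width - 1, -1, -1):           # MSB transmitted first
        bits = [(ident >> i) & 1 for ident in contenders]
        bus = min(bits)                          # wired-AND: any 0 pulls the bus low
        # nodes whose bit disagrees with the bus stop transmitting
        contenders = [c for c, b in zip(contenders, bits) if b == bus]
    assert len(contenders) == 1
    return contenders[0]

print(hex(arbitrate([0x1A5, 0x0F0, 0x2B1])))  # 0xf0: lowest id = highest priority
```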

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row.

The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS

These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).

A data terminal equipment (DTE) can be a PC, serial printer, or plotter; a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner.

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit can be added, useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved.

Possible data rates depend upon the UART chip and the clock used.

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both systems.

Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.

Parity bit: added for error checking on the receiver side.

Flow control: useful when the sender pushes out data at a rate too high for the receiver to absorb. It is a protocol to stop and resume data transmission, also known as handshaking. It can be of hardware or software type.
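The framing just described can be sketched as follows, assuming 8 data bits, even parity, and one stop bit (the LSB-first bit order is standard UART practice; the function name is illustrative):

```python
# Sketch of asynchronous character framing: one start bit (0), the data
# bits sent LSB first, an even-parity bit, and one stop bit (1). The
# line idles high. This illustrates the bit pattern only; it is not a
# driver for any particular UART.

def frame_char(byte, data_bits=8):
    bits = [0]                                          # start bit
    data = [(byte >> i) & 1 for i in range(data_bits)]  # LSB first
    bits += data
    bits.append(sum(data) % 2)                          # even parity bit
    bits.append(1)                                      # stop bit
    return bits

# 'A' = 0x41 = 0b01000001, so LSB-first data is 1,0,0,0,0,0,1,0
print(frame_char(0x41))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```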

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.

The voltage level is with respect to local ground; hence RS232 uses unbalanced transmission.

UART chip: universal asynchronous receiver/transmitter chip.

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it from serial to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel to serial format.

It also adds the start, stop, and parity bits.

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices (up to 512) in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication is possible; with two twisted pairs, full-duplex communication can be achieved.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: this method is adopted to establish the priority of the devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all the devices that can request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input of the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement, and it does not pass the acknowledge signal on to the next device. This procedure works as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther a device is from the first position, the lower is its priority.
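The PI/PO logic above can be sketched as a small simulation; the list order is the chain order (first device = highest priority), and all names are illustrative:

```python
# Sketch of the daisy-chain acknowledge logic: the CPU's acknowledge
# enters device 1's PI input; each device passes PO = PI unless it is
# requesting an interrupt, in which case it blocks the signal (PO = 0)
# and claims the acknowledge itself.

def daisy_chain(requests):
    """requests: list of booleans, one per device, in chain order.
    Returns the index of the device whose interrupt is acknowledged,
    or None if no device is requesting."""
    pi = 1                                   # CPU asserts the acknowledge line
    for idx, requesting in enumerate(requests):
        if pi and requesting:
            return idx                       # PI = 1 and PO = 0: this device wins
        pi = 0 if requesting else pi         # pass the signal along only if idle
    return None

print(daisy_chain([False, True, True]))  # 1: the requester nearest the CPU wins
```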

The slowest device participates in the control and data transfer handshakes to determine the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.

In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of architectures:
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an architecture: the best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round robin:
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round robin pros: simple, no shared data, no interrupts.

Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response times (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round robin uses:
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly

Round robin with interrupts:
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Round robin with interrupts:

Pros:
o Still relatively simple
o Hardware timing requirements are better met

Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other task execution times + execution times of any other interrupts that occur

How could you fix this?

Round robin with interrupts, adjustments:
o Change the order in which flags are checked (e.g. ABABAD): improves the response of A but increases the latency of the other tasks
o Move some task code into an interrupt: decreases the response time of lower-priority interrupts, and it may not be possible to ensure that lower-priority interrupt code executes fast enough

Function queue scheduling architecture:
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls

Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Best response time can be improved by cutting long functions into several pieces

Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
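A minimal sketch of this architecture, using Python's heapq as the queue and illustrative task names (a real system would enqueue from ISRs and loop forever):

```python
# Sketch of function-queue scheduling: interrupt handlers enqueue
# function pointers with a priority, and the main routine repeatedly
# pops and calls the highest-priority one (lower number = more urgent).

import heapq

task_queue = []   # entries are (priority, sequence, function)
_seq = 0          # sequence number keeps FIFO order within a priority

def enqueue(priority, fn):
    """Called from an ISR: schedule fn for follow-up processing."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, fn))
    _seq += 1

ran = []
enqueue(2, lambda: ran.append("log"))
enqueue(0, lambda: ran.append("motor"))    # most urgent
enqueue(1, lambda: ran.append("display"))

while task_queue:                          # the main routine's loop
    _, _, fn = heapq.heappop(task_queue)
    fn()

print(ran)  # ['motor', 'display', 'log']
```

Because the main routine picks by priority rather than arrival order, the urgent "motor" work runs first even though it was enqueued second.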

Real-time operating system

Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows the relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real-time operating systems:

Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally come with vendor tools

Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = total time to complete 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
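The allocation arithmetic in the job1 example can be sketched as a small simulation; the job names and lengths are just the example's values:

```python
# Sketch of round-robin time slicing: each job gets at most one quantum
# per turn, in circular order, until all jobs finish.

from collections import deque

def round_robin(jobs, quantum):
    """jobs: dict of name -> remaining time. Returns list of (name, slice)."""
    queue = deque(jobs.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)          # one quantum, or less if done
        trace.append((name, run))
        if remaining > run:
            queue.append((name, remaining - run))  # back of the line
    return trace

print(round_robin({"job1": 250}, 100))
# [('job1', 100), ('job1', 100), ('job1', 50)]
```

With a second job in the mix, e.g. `round_robin({"a": 150, "b": 100}, 100)`, the jobs alternate: a gets 100 ms, b gets 100 ms and finishes, then a gets its remaining 50 ms.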

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer switch or router that provides round-robin scheduling has a separate queue for every data flow where a data flow may be identified by its source and destination address The algorithm lets every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order The scheduling is work-conserving meaning that if one flow is out of packets the next data flow will take its place Hence the scheduling tries to prevent link resources from going unused
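A minimal sketch of this work-conserving behavior, with made-up flow names and packets:

```python
# Sketch of work-conserving round-robin packet scheduling: the
# multiplexer visits each flow's queue in turn, sends one packet if the
# queue is non-empty, and otherwise moves straight on, so the link
# never idles while any flow still has packets.

def rr_packets(flows):
    """flows: dict of name -> list of packets. Returns transmission order."""
    order = []
    while any(flows.values()):
        for name, queue in flows.items():
            if queue:                          # skip empty flows: work-conserving
                order.append((name, queue.pop(0)))
    return order

flows = {"A": ["a1", "a2", "a3"], "B": ["b1"]}
print(rr_packets(flows))
# [('A', 'a1'), ('B', 'b1'), ('A', 'a2'), ('A', 'a3')]
```

Once flow B runs out of packets, its turn is skipped and flow A's remaining packets go out back to back.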

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX: the same round-robin scheduler concept applies, and it can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:

o Turnaround - the total time between submission of a process and its completion

o Response time - the amount of time from when a request was submitted until the first response is produced

o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which their functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa; this is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, a process which has a low priority, a process which is page faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded. [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or medium-term schedulers: a scheduling decision must at a minimum be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembly language because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) and in operating systems (to share CPU time among both threads and processes), as well as in disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing systems, the notion of a scheduling algorithm is used as an alternative to first-come, first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information: if the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reasons as above.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.
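A minimal sketch of the FCFS metrics discussed above (Python is used here purely for illustration; the process list and burst times are made-up examples):

```python
def fcfs_metrics(bursts):
    """Compute per-process waiting and turnaround times under
    First Come First Served, given CPU burst times in arrival order."""
    waiting, turnaround, clock = [], [], 0
    for burst in bursts:
        waiting.append(clock)        # time spent waiting in the ready queue
        clock += burst               # process runs to completion (no preemption)
        turnaround.append(clock)     # turnaround = waiting + burst
    return waiting, turnaround

# One long job at the front makes every later job wait: the reason
# throughput, waiting time, and response time can be high under FIFO.
w, t = fcfs_metrics([24, 3, 3])
print("waiting:", w, "turnaround:", t)   # waiting: [0, 24, 27] turnaround: [24, 27, 30]
```

Reordering the same jobs so the short ones go first drops the average waiting time sharply, which motivates the shortest-job-first family described next.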

Shortest remaining time

This is similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
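The preemption behaviour described above can be simulated one time unit at a time (an illustrative sketch; the job names and times are invented for the example):

```python
def srt_schedule(jobs):
    """Simulate Shortest Remaining Time scheduling, one time unit at a time.
    jobs: list of (name, arrival_time, burst). Returns (name, completion_time)
    tuples in completion order."""
    remaining = {name: burst for name, _, burst in jobs}
    arrival = {name: at for name, at, _ in jobs}
    done, clock = [], 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:                 # CPU idle until the next arrival
            clock += 1
            continue
        # Preemptive choice: always run the smallest remaining time among ready jobs.
        current = min(ready, key=lambda n: remaining[n])
        remaining[current] -= 1
        clock += 1
        if remaining[current] == 0:
            del remaining[current]
            done.append((current, clock))
    return done

# A short job arriving at t=1 preempts the long job that started at t=0,
# splitting the long job into two computing blocks.
print(srt_schedule([("long", 0, 7), ("short", 1, 2)]))
```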

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
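The priority ordering can be sketched with a heap (the task names and the lower-number-is-higher-priority convention are assumptions for this example, not part of any particular RTOS):

```python
import heapq

def priority_order(ready_tasks):
    """Return task names in the order a fixed-priority scheduler would run
    them, given (name, priority) pairs. Lower number = higher priority."""
    heap = [(priority, name) for name, priority in ready_tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# The low-priority logger only runs once higher-priority work is done;
# if high-priority tasks keep arriving it may starve, as noted above.
print(priority_order([("logger", 9), ("motor_ctrl", 1), ("sensor_read", 3)]))
```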

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor; waiting time depends on the number of processes, not on the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
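The cycling behaviour can be sketched with a queue (an illustrative simulation; process names, bursts, and the quantum are invented for the example):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round-robin: each process runs at most `quantum` time units,
    then goes to the back of the queue. Returns (name, completion_time) pairs."""
    queue = deque((name, burst) for name, burst in bursts)
    clock, finished = 0, []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))   # not done: requeue at the tail
        else:
            finished.append((name, clock))
    return finished

# The short job B finishes early, but A's completion is delayed by the
# cycling: the balanced-throughput behaviour described above.
print(round_robin([("A", 5), ("B", 2), ("C", 3)], quantum=2))
```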

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for the shared-memory problem.

Overview:

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

The major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of the breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs are of Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example:

        LDR r0,[r8]      ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
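The two byte orders can be seen by packing the same 32-bit word both ways (an illustrative sketch using Python's `struct` module; the word value is arbitrary):

```python
import struct

# The same 32-bit word stored under the two byte orders the ARM can
# select at power-up.
word = 0x12345678
little = struct.pack("<I", word)   # lowest-order byte at the lowest address
big    = struct.pack(">I", word)   # lowest-order byte at the highest address

print(little.hex())   # 78563412
print(big.hex())      # 12345678
```

Reading back with the matching format recovers the same value either way, which is why the choice is invisible as long as producer and consumer agree.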

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example:

R1=DM(M0,I0), R2=PM(M8,I8);    ! a comment
label: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits, but are not cleared; STKY bits always remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
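The difference between saturating and wrapping behaviour can be sketched as follows (an illustrative model, not SHARC code; the 32-bit width is chosen to match the data word above):

```python
def saturating_add(a, b, bits=32):
    """Signed fixed-point addition with saturation: on overflow the result
    clamps to the maximum (or minimum) representable value instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))

INT32_MAX = (1 << 31) - 1
print(saturating_add(INT32_MAX, 1))        # clamps to 2147483647 instead of wrapping negative
print(saturating_add(-INT32_MAX - 1, -5))  # clamps to -2147483648
```

For signal processing this matters: a wrapped overflow flips a large positive sample to a large negative one, while a saturated overflow merely clips, which is far less audible or visible.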

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
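The address transformation behind bit-reversal addressing can be sketched as follows (an illustrative model of the index mapping, not SHARC code):

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of an index: the address transformation
    a bit-reversal addressing mode performs when reordering FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

# For an 8-point FFT (3 address bits), natural order 0..7 maps to the
# bit-reversed order in which a radix-2 FFT consumes or produces data.
print([bit_reverse(i, 3) for i in range(8)])   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Doing this in the address generator rather than in software removes an entire reordering pass from the FFT inner loop.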

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
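The acquire/release pair from the list above can be sketched with a binary semaphore guarding a shared resource (an illustrative sketch using Python's `threading` module; a counting semaphore would simply be created with a larger initial value):

```python
import threading

sem = threading.Semaphore(1)     # binary semaphore: created with value 1
shared = []

def worker(n):
    sem.acquire()                # "acquire a semaphore"
    try:
        shared.append(n)         # critical section: one task at a time
    finally:
        sem.release()            # "release a semaphore"

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(shared))   # [0, 1, 2, 3]
```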

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message in the queue; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
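The keyboard-input application above can be sketched with a producer depositing messages and a consumer taking them in order (an illustrative sketch using Python's `queue` and `threading` modules; the function names are invented stand-ins):

```python
import queue
import threading

msgq = queue.Queue(maxsize=8)    # queue length is fixed at creation, as above

def keyboard_isr_stub():
    """Stands in for an ISR depositing keystrokes into the queue."""
    for ch in "ok":
        msgq.put(ch)
    msgq.put(None)               # sentinel: no more input

def display_task():
    out = []
    while (ch := msgq.get()) is not None:
        out.append(ch)           # the waiting task takes messages in FIFO order
    return "".join(out)

producer = threading.Thread(target=keyboard_isr_stub)
producer.start()
result = display_task()
producer.join()
print(result)   # ok
```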

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
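The create/write/read/close calls from the list above can be sketched with an OS-level pipe (an illustrative sketch using Python's `os.pipe`; the payload is an invented example):

```python
import os

# One side writes, the other reads: sending the output of one task
# to another for further processing.
read_fd, write_fd = os.pipe()      # "create a pipe"

os.write(write_fd, b"sensor:42")   # "write to the pipe"
os.close(write_fd)                 # "close a pipe" (writer side)

data = os.read(read_fd, 64)        # "read from the pipe"
os.close(read_fd)
print(data.decode())               # sensor:42
```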

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are:

R15----------- it is the program counter, but can be manipulated as a general-purpose register

R13----------- it is used as the stack pointer

R14----------- it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets

The mode field selects one of the six execution modes as follows

o User mode: it is used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privilege mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS's scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem.

Shared-data problem: it is one that arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
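The shared-data problem can be shown deterministically by simulating an interrupt firing in the middle of a non-atomic read-modify-write (an illustrative sketch; the counter and function names are invented for the example):

```python
# A deterministic simulation of the shared-data problem: the "task" reads a
# shared counter, is interrupted before writing back, and the "ISR" update
# that happened in between is lost.
shared_counter = 100

def isr():
    global shared_counter
    shared_counter += 1          # ISR increments the shared variable

def task_non_atomic():
    global shared_counter
    local = shared_counter       # read
    isr()                        # interrupt fires mid-update...
    shared_counter = local + 1   # ...write-back overwrites the ISR's increment

task_non_atomic()
print(shared_counter)   # 101, not the 102 we expect: one update was lost
```

Making the read-modify-write atomic (for example, by disabling interrupts around it) removes the window in which the ISR's update can be lost.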

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
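The first rule above can be illustrated by contrasting a function that keeps state in shared storage with one that uses only stack locals (an illustrative sketch; both functions and the shared `_scratch` buffer are invented examples):

```python
# Non-reentrant vs reentrant: the first keeps state in a shared
# (static-like) buffer, the second only in locals on the caller's stack.
_scratch = []                     # shared state: makes the function non-reentrant

def swap_halves_nonreentrant(data):
    _scratch.clear()
    _scratch.extend(data)         # a task switch here would let another caller
    mid = len(_scratch) // 2      # overwrite _scratch mid-operation
    return _scratch[mid:] + _scratch[:mid]

def swap_halves_reentrant(data):
    mid = len(data) // 2          # only stack-local state: safe to reenter
    return data[mid:] + data[:mid]

print(swap_halves_reentrant([1, 2, 3, 4]))   # [3, 4, 1, 2]
```

Both give the same answer when called from a single task; the difference only appears when the RTOS switches tasks mid-call, which is exactly when the non-reentrant version corrupts its shared buffer.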

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that the tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the processor goes to the other one

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
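As a rough sketch of the state rules above (the task list and `schedule` function are invented for illustration, not a real RTOS API): the scheduler always picks the highest-priority task that is not blocked, and an unblocking higher-priority task immediately displaces the running one, as in a preemptive RTOS.

```python
# Hypothetical sketch of a priority scheduler's choice rule: the
# highest-priority non-blocked task runs; BLOCKED tasks are never picked.

READY, RUNNING, BLOCKED = "ready", "running", "blocked"

def schedule(tasks):
    """Return the name of the task to run: highest priority, not blocked."""
    runnable = [t for t in tasks if t["state"] != BLOCKED]
    if not runnable:
        return None  # all tasks blocked: the RTOS spins in its idle loop
    best = max(runnable, key=lambda t: t["priority"])
    for t in tasks:  # demote any other runnable task back to READY
        t["state"] = RUNNING if t is best else (
            t["state"] if t["state"] == BLOCKED else READY)
    return best["name"]

tasks = [
    {"name": "logger",  "priority": 1, "state": READY},
    {"name": "control", "priority": 3, "state": BLOCKED},
    {"name": "ui",      "priority": 2, "state": READY},
]
```

With "control" blocked, "ui" runs; the moment "control" unblocks, a preemptive scheduler switches to it.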

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand

Waterfall model

It was introduced by Royce and is the first model proposed for the software development process

This model has five major phases

o Requirements: analysis determines the basic characteristics of the system

o Architecture: design decomposes the functionality into major components

o Coding: implements the design and integrates the components

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered an unrealistic design process

Spiral model

It is an alternative model for the software development

The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement

Its advantage is that it adopts a successive refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs

Page 15: EmbdedSysts.doc

It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used

The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar

Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in a wired-AND fashion

In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant

The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1

When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state

Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work

Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame

Format of a data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.

The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier

The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between

The data field is from 0 to 8 bytes (0 to 64 bits), depending on the value given in the control field

A cyclic redundancy check (CRC) is sent after the data field for error detection

The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit ('1') in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value ('0'). If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field

Arbitration

Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
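The arbitration rule can be sketched as a small simulation (hypothetical code, not a real CAN stack): the bus value is the wired-AND of all transmitted bits, and a node that sends a recessive '1' while the bus carries a dominant '0' drops out, so the lowest identifier, i.e. the highest priority, wins.

```python
# Hypothetical sketch of CAN's CSMA/AMP arbitration: all nodes transmit
# their 11-bit identifiers together, MSB first; the bus is the wired-AND
# of the transmitted bits, and a node that sends recessive '1' while the
# bus shows dominant '0' drops out of contention.

def can_arbitrate(identifiers, bits=11):
    """Return the index of the node whose identifier wins arbitration."""
    contenders = set(range(len(identifiers)))
    for bit in range(bits - 1, -1, -1):              # MSB first
        sent = {i: (identifiers[i] >> bit) & 1 for i in contenders}
        bus = min(sent.values())                     # wired-AND: any 0 wins
        # nodes that sent recessive 1 while the bus is dominant 0 drop out
        contenders = {i for i in contenders if sent[i] == bus}
    return min(contenders)

# The node sending 0x120 beats 0x121 and 0x300: lowest ID = highest priority.
```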

Remote frame: a remote frame is used to request data from another node. The requester sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged

COMMUNICATION INTERFACINGS: these are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA)

It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)

Data terminal equipment (DTE) can be a PC, serial printer, or plotter, and data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner

Communication between the two devices is full duplex, i.e., the data transfer can take place in both directions

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, which is useful for error checking on the receiver side

This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems

Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended

Parity bit: it is added for error checking on the receiver side

Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmission; it is also known as handshaking. It can be of hardware type or software type
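The framing described by these parameters can be sketched as follows (a hypothetical illustration; real UARTs do this in hardware): the character's bits are sent LSB first, bracketed by a start bit ('0') and stop bit(s) ('1'), with an optional parity bit.

```python
# Hypothetical sketch of asynchronous serial framing: one character becomes
# a start bit, data bits (LSB first, as UARTs send them), an optional
# parity bit, and stop bit(s). The line idles at 1.

def frame_char(byte, data_bits=8, parity="even", stop_bits=1):
    bits = [0]                                          # start bit
    data = [(byte >> i) & 1 for i in range(data_bits)]  # LSB first
    bits += data
    if parity is not None:
        p = sum(data) % 2                               # even parity makes the
        bits.append(p if parity == "even" else 1 - p)   # total count of 1s even
    bits += [1] * stop_bits                             # stop bit(s)
    return bits

# 'A' = 0x41: start 0, data 1,0,0,0,0,0,1,0, even parity 0, stop 1
# -> 11 line bits to send one 8-bit character.
```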

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin

For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals

The voltage level is with respect to local ground; hence RS232 uses unbalanced transmission

UART chip: universal asynchronous receiver/transmitter chip

It has two sections receive section and transmit section

Receive section: receives data, converts it from serial form to parallel form, and gives the data to the processor

Transmit section: takes data from the processor and converts it from parallel format to serial format

It also adds the start, stop, and parity bits

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices, up to 512, in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved

IEEE 488 BUS

It is a short-range digital communications bus specification. It was originally created by HP for use with automated test equipment

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections

Daisy chaining: to find the priority of the devices that send interrupt requests, this method is adopted. The daisy chaining method of establishing priority consists of a serial connection of all the devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle

The device that sends the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 at its PI input generates a 0 on its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority
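The PI/PO logic can be sketched as a small model (hypothetical function, invented for illustration): the acknowledge enters the first device's PI, each non-requesting device passes it along (PO = PI), and the first requesting device that sees PI = 1 blocks it (PO = 0) and claims the interrupt.

```python
# Hypothetical sketch of daisy-chain acknowledge propagation. Device 0 is
# closest to the CPU and therefore has the highest priority.

def daisy_chain_grant(requests):
    """requests[i] is True if device i has a pending interrupt.
    Return the index of the device that wins the acknowledge, or None."""
    pi = 1                          # CPU acknowledge enters device 0's PI
    for i, pending in enumerate(requests):
        if pending and pi == 1:
            po = 0                  # block the acknowledge from going further
            return i                # PI = 1, PO = 0: this device claims it
        po = pi                     # not requesting: pass acknowledge along
        pi = po                     # next device's PI is this device's PO
    return None                     # no device was requesting
```

The winner is always the requesting device nearest the CPU, matching the text: distance from the first position determines priority.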

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines---eight bidirectional lines used for data transfer, three for handshake, and five for bus management---plus eight ground return lines

In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures:
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an architecture: the best architecture depends on several factors

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
o Architecture selection: a tradeoff between complexity and control over response and priority

Round Robin
1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin: Pros and Cons
Pros: simple; no shared data; no interrupts
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Round Robin with Interrupts: Pros and Cons
Pros:
o Still relatively simple
o Hardware timing requirements are better met
Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this? Adjustments:
o Change the order in which flags are checked (e.g., ABACAD)
o Improves the response of A
o Increases the latency of other tasks
o Move some task code into the interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = the length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
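A minimal sketch of this architecture (hypothetical names; a real embedded system would also disable interrupts around the queue operations): interrupts enqueue (priority, function) pairs and the main routine always pops the most urgent one, so ordering is by priority rather than FIFO.

```python
# Hypothetical sketch of function-queue scheduling: ISRs enqueue follow-up
# work as (priority, function) pairs; the main loop pops the best priority
# first (lower number = more urgent), not in arrival (FIFO) order.

import heapq

task_queue = []   # heap of (priority, sequence, function)
_seq = 0          # tie-breaker so equal priorities stay in arrival order

def isr_enqueue(priority, fn):
    """Called from an interrupt routine: queue follow-up work for main."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, fn))
    _seq += 1

def main_loop_once(log):
    """Main routine: drain the queue, most urgent function first."""
    while task_queue:
        _, _, fn = heapq.heappop(task_queue)
        fn(log)

# If interrupts fire in the order "slow" then "urgent", the main routine
# still runs "urgent" first -- that is the point of this architecture.
```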

Real Time Operating System

Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows the relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems: Pros and Cons
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspending that process to give another process a turn (or "time-slice"). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU

Job1 = total time to complete 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms
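The arithmetic in this example can be sketched with a small helper (hypothetical function, for illustration only):

```python
# Hypothetical sketch of round-robin quantum allocation: a job receives at
# most one quantum per turn, and the final turn may be a partial slice.

def allocations(total_ms, quantum_ms):
    """Return the list of CPU slices a job of total_ms receives."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        slices.append(min(quantum_ms, remaining))  # last turn may be partial
        remaining -= slices[-1]
    return slices

# A 250 ms job with a 100 ms quantum gets slices of 100, 100, and 50 ms,
# matching steps 1-4 above.
```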

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in its queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
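The work-conserving behavior can be sketched with per-flow queues (hypothetical example): the scheduler visits the flows in circular order, transmits one packet from each non-empty queue, and skips empty flows so the channel never idles while packets are waiting.

```python
# Hypothetical sketch of work-conserving round-robin packet scheduling:
# one queue per flow, one packet per non-empty queue per circular visit.

from collections import deque

def round_robin_schedule(flows):
    """flows: list of deques of packet labels. Return transmission order."""
    order = []
    while any(flows):
        for q in flows:          # circular visit over the flows
            if q:                # work-conserving: empty flows are skipped
                order.append(q.popleft())
    return order

# Flow A (3 packets) and flow B (1 packet) alternate until B runs dry;
# then A keeps the channel, with no idle slots.
```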

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX: the same round-robin scheduler concept can also be implemented using semaphores

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit
Latency, specifically:

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the concerns mentioned above depending upon the user's needs and objectives

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded. [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler, because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic), in operating systems (to share CPU time among both threads and processes), in disk drives (I/O scheduling), in printers (the print spooler), in most embedded systems, and so on.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come, first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information: if the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with the assignment of OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reason.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on Queuing
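The queuing behaviour above can be sketched with a short simulation (illustrative Python, not part of the original notes): each job's waiting time is simply the total burst time of the jobs ahead of it in the queue.

```python
# Illustrative FCFS simulation: processes run strictly in arrival order,
# so a long early job delays everyone queued behind it.

def fcfs_waiting_times(burst_times):
    """Return per-process waiting times when jobs run in arrival order."""
    waits, elapsed = [], 0
    for burst in burst_times:
        waits.append(elapsed)   # each job waits for all jobs ahead of it
        elapsed += burst
    return waits

# A 10-unit job arriving first makes two short jobs wait behind it,
# illustrating the "long processes can hog the CPU" drawback.
print(fcfs_waiting_times([10, 1, 1]))  # [0, 10, 11]
```

Swapping the arrival order to put the short jobs first drops the waits to [0, 1, 2], which is exactly the observation SJF exploits.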

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges the processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or an estimate of, the time required for each process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as a process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than under FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run
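A minimal sketch of the non-preemptive variant (illustrative Python; the function name is mine): sorting jobs by burst length before running them shows why overall waiting time beats FIFO.

```python
# Illustrative non-preemptive SJF: run the shortest remaining job first.

def sjf_waiting_times(burst_times):
    """Return per-process waiting times when jobs run shortest-first."""
    order = sorted(range(len(burst_times)), key=lambda i: burst_times[i])
    waits, elapsed = [0] * len(burst_times), 0
    for i in order:
        waits[i] = elapsed          # job i waits for all shorter jobs
        elapsed += burst_times[i]
    return waits

# Same workload as the FCFS example: the long job now waits for the short
# ones instead of the other way round, so the average wait falls sharply.
print(sjf_waiting_times([10, 1, 1]))  # [2, 0, 1]  (average 1 vs. 7 for FCFS)
```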

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible when large numbers of high-priority processes are queuing for CPU time.
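The ready-queue behaviour can be sketched with a priority heap (an illustrative Python model, not an RTOS implementation; the function and process names are mine):

```python
import heapq

def fpps_dispatch_order(processes):
    """Return the order in which ready processes get the CPU under
    fixed-priority scheduling (lower number = higher priority)."""
    ready = []
    for prio, name in processes:
        heapq.heappush(ready, (prio, name))    # ready queue keyed on priority
    order = []
    while ready:
        order.append(heapq.heappop(ready)[1])  # always pick highest priority
    return order

# An arriving higher-priority process would simply be pushed onto the heap
# and picked before any lower-priority one, modelling preemption at the
# next scheduling point.
print(fpps_dispatch_order([(3, "logger"), (1, "motor"), (2, "display")]))
```

The starvation risk is visible in the model: as long as priority-1 processes keep arriving, the priority-3 "logger" is never popped.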

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput sits between that of FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor: waiting time depends on the number of processes, not on the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
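The cycling behaviour can be modelled in a few lines (illustrative Python; assumes all processes arrive at time 0):

```python
from collections import deque

def round_robin_completion(burst_times, quantum):
    """Return the completion time of each process under round-robin."""
    remaining = list(burst_times)
    ready = deque(range(len(burst_times)))   # all arrive at time 0
    done = [0] * len(burst_times)
    clock = 0
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])     # run for one quantum at most
        clock += run
        remaining[i] -= run
        if remaining[i]:
            ready.append(i)                  # rotate back to the queue tail
        else:
            done[i] = clock
    return done

# The 1-unit job finishes before the 3-unit job even though it was queued
# second: shorter jobs complete faster than under FCFS.
print(round_robin_completion([3, 1], quantum=2))  # [4, 3]
```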

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually carries a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that each unit is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows the hardware and software designs to be validated against each other at the same time. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be executed efficiently in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

Label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture; however, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at location 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
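The two byte orders can be demonstrated with Python's struct module (an illustration of the concept, not ARM-specific code):

```python
import struct

# The same 32-bit word laid out in memory under the two byte orders.
word = 0x12345678

little = struct.pack("<I", word)   # low-order byte at the lowest address
big    = struct.pack(">I", word)   # high-order byte at the lowest address

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```

Byte 0x78, the lowest-order byte, sits at the lowest address in little-endian mode and at the highest address in big-endian mode, exactly as described above.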

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

Label: R3=R1+R2;

SHARC uses different word and address sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
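The difference between saturating and wrapping addition can be sketched in Python (an illustrative model of the concept; the SHARC does this in hardware):

```python
# Illustrative 32-bit signed addition: saturating vs. ordinary wrapping.
INT32_MAX, INT32_MIN = 2**31 - 1, -2**31

def sat_add32(a, b):
    """Add two 32-bit signed values; clamp instead of wrapping on overflow."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

def wrap_add32(a, b):
    """Ordinary two's-complement addition, wrapping modulo 2**32."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

# Overflowing past the maximum: saturation clamps, wrapping flips sign.
print(sat_add32(INT32_MAX, 1))   # 2147483647 (stays at the maximum)
print(wrap_add32(INT32_MAX, 1))  # -2147483648 (wraps to the minimum)
```

Clamping to the range limit is usually the lesser evil in signal processing, since a slightly clipped sample distorts far less than one whose sign has flipped.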

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
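The two right-shift flavours can be modelled on 32-bit values (illustrative Python; the helper names are mine):

```python
# Illustrative 32-bit right shifts: logical (zero-fill) vs. arithmetic
# (sign-extending), as performed by the shifter unit.
MASK32 = 0xFFFFFFFF

def logical_shift_right(x, n):
    """Fill the vacated high bits with zeroes."""
    return (x & MASK32) >> n

def arithmetic_shift_right(x, n):
    """Copy the sign bit into the vacated high bits."""
    signed = x - 2**32 if (x & MASK32) >= 2**31 else (x & MASK32)
    return (signed >> n) & MASK32   # Python's >> on ints is arithmetic

x = 0x80000000                      # negative in two's complement
print(hex(logical_shift_right(x, 4)))     # 0x8000000
print(hex(arithmetic_shift_right(x, 4)))  # 0xf8000000
```

The arithmetic variant preserves the value's sign, which is why it is the right choice for dividing signed fixed-point data by powers of two.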

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
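Post-modify addressing over a circular buffer can be modelled as follows (an illustrative Python sketch of the I/M register behaviour, not SHARC code; the class name is mine):

```python
class CircularDAG:
    """Toy model of a DAG index register sweeping a circular buffer:
    post-modify addressing with wrap-around at the buffer boundary."""

    def __init__(self, base, length):
        self.base, self.length = base, length
        self.i = base                     # I register: current address

    def post_modify(self, m):
        """Return the current address, then advance I by modifier M,
        wrapping back into [base, base + length)."""
        addr = self.i
        self.i = self.base + (self.i - self.base + m) % self.length
        return addr

# Sweeping a 4-word buffer at address 100 with modifier 1: after word
# 103 the index wraps back to 100, as a signal-processing delay line needs.
dag = CircularDAG(base=100, length=4)
print([dag.post_modify(1) for _ in range(6)])  # [100, 101, 102, 103, 100, 101]
```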

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. In a non-preemptive kernel, after the ISR processes the event the CPU returns to the interrupted task; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority ready task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.
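With some assumed figures, the definitions above work out as follows (the numbers are hypothetical, chosen only to make the sums concrete):

```python
# Hypothetical timing figures in microseconds, for illustration only.
max_disabled_time    = 8.0  # longest stretch with interrupts disabled
start_first_isr_insn = 2.0  # time to reach the ISR's first instruction
save_context         = 3.0  # time to save the CPU register context

# Latency: worst-case disabled time plus reaching the first ISR instruction.
interrupt_latency = max_disabled_time + start_first_isr_insn

# Response time in a preemptive kernel: latency plus the context save.
response_preemptive = interrupt_latency + save_context

print(interrupt_latency, response_preemptive)  # 10.0 13.0
```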

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core a semaphore is just an integer. Semaphores are of two types: the counting semaphore, whose value can be any non-negative integer, and the binary semaphore, which takes only the values 0 and 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
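The acquire/release pairing behind these calls can be sketched with Python's threading module standing in for the kernel object (illustrative only; an RTOS exposes the same pattern through its own function calls):

```python
import threading

# A counting semaphore guards a pool of 3 identical buffers; a binary
# semaphore provides mutual exclusion over one shared list.
counting = threading.Semaphore(3)
binary = threading.Semaphore(1)

def use_buffer(results):
    with counting:      # acquire; blocks when all 3 buffers are taken
        with binary:    # acquire; makes the list update atomic
            results.append(threading.current_thread().name)
    # both semaphores are released on leaving the with-blocks

results = []
tasks = [threading.Thread(target=use_buffer, args=(results,)) for _ in range(5)]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(len(results))  # 5 -- every task eventually acquired and released
```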

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message.
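The deposit/take pattern can be sketched with Python's queue.Queue standing in for the kernel object and an ordinary function standing in for the ISR (both stand-ins are assumptions for illustration):

```python
import queue
import threading

# A bounded queue models the kernel object: length fixed at creation.
keyboard_q = queue.Queue(maxsize=8)

def fake_isr(chars):
    """Stand-in for a keyboard ISR: deposit one message per character."""
    for c in chars:
        keyboard_q.put(c)

def reader_task(out):
    """Stand-in for the waiting task: take messages in deposit order."""
    for _ in range(5):
        out.append(keyboard_q.get())   # blocks until a message arrives

received = []
t = threading.Thread(target=reader_task, args=(received,))
t.start()
fake_isr("hello")   # the "ISR" fires while the task waits on the queue
t.join()
print("".join(received))  # hello
```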

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
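The read-end/write-end pairing behind these calls can be sketched with an OS-level pipe (illustrative Python; a kernel pipe in an RTOS offers the same model between tasks):

```python
import os

# Create a pipe: one read end, one write end.
read_fd, write_fd = os.pipe()

# Producer side: one task writes its output into the pipe.
os.write(write_fd, b"sensor:42")
os.close(write_fd)            # closing lets the reader see end-of-data

# Consumer side: another task reads that output as its input.
data = os.read(read_fd, 64)
os.close(read_fd)
print(data.decode())  # sensor:42
```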

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but it has several enhancements that improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all the ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple while improving performance, a number of non-RISC features have also been introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, small code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, though it can be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- it has a special significance and is called the link register: when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively, and the T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations, on the other hand, can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decomposed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is the most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, including global, static, initialized, and uninitialized variables and everything else, is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function, or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
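The distinction can be illustrated by two versions of the same helper (hypothetical function names; illustrative Python rather than RTOS code):

```python
# Shared static storage: any function that builds its result here in a
# non-atomic way is NOT reentrant -- two tasks calling it concurrently
# could interleave their writes and corrupt each other's result.
shared_buffer = []

def format_reading_reentrant(value, unit):
    """Reentrant: touches only its arguments and locals (the caller's
    stack), so a task switch mid-call cannot corrupt anything."""
    text = "%d %s" % (value, unit)   # local variable only
    return text

def format_reading_shared(value, unit):
    """Not reentrant: builds its result in the shared static buffer.
    A second caller between clear() and the read would destroy it."""
    shared_buffer.clear()
    shared_buffer.append("%d %s" % (value, unit))
    return shared_buffer[0]

print(format_reading_reentrant(5, "V"))  # 5 V
```

Both versions return the same string when called by one task at a time; only the first remains correct when the RTOS switches tasks in the middle of the call.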

Task states in an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will remain blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in a tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it blocks before the other gets a turn.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
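The selection rule described above (highest-priority ready task runs; blocked tasks are never touched by the scheduler) can be sketched in C. This is a minimal illustration, not the API of any particular RTOS; the task_t layout, the names, and the higher-number-is-higher-priority convention are all assumptions:

```c
/* Minimal sketch of a priority scheduler's "pick next task" rule. */
#include <assert.h>
#include <stddef.h>

typedef enum { BLOCKED, READY, RUNNING } state_t;

typedef struct {
    const char *name;
    int priority;      /* assumed convention: higher number = higher priority */
    state_t state;
} task_t;

/* The scheduler moves the highest-priority non-blocked task to RUNNING.
 * It never unblocks a task: only an ISR or another task can do that. */
task_t *pick_next(task_t tasks[], size_t n) {
    task_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state == BLOCKED) continue;
        if (best == NULL || tasks[i].priority > best->priority)
            best = &tasks[i];
    }
    if (best) best->state = RUNNING;
    return best;   /* NULL means all tasks blocked: RTOS idles in a tight loop */
}
```

Note that a blocked task is skipped no matter how high its priority is, which is exactly why a task that is never signalled stays blocked forever.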

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements: analyzes and determines the basic characteristics of the system

o Architecture: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage: with too many spirals, it may take too long when design time is a major requirement.

Its advantage: it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.

Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row.

The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.

COMMUNICATION INTERFACINGS: these are used to communicate with the external world, such as transmitting data to a host PC or interacting with another embedded system to share data.

RS232/UART

It is a standard for serial communication developed by the Electronic Industries Association (EIA).

It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).

A data terminal equipment (DTE) can be a PC, serial printer or plotter; a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer or scanner.

Communication between the two devices is full duplex, i.e., data transfer can take place in both directions.

In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.

Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side.

This mode of communication is called asynchronous communication because no clock signal is transmitted.

The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved.

The possible data rates depend upon the UART chip and the clock used.

Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.

Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7 or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.

Parity bit: it is added for error checking on the receiver side.

Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission; it is also known as handshaking. It can be hardware or software flow control.
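Putting the framing parameters above together, the character framing a transmit section performs can be sketched in C. The function name is an illustrative assumption; LSB-first data-bit order and even parity are assumed here for concreteness, and an 8-data-bit, one-stop-bit configuration is used:

```c
/* Sketch: frame one 8-bit character as start + data + parity + stop bits. */
#include <assert.h>

/* Fills out[] with the 11 line bits (the line idles at 1; start bit is 0):
 * 1 start + 8 data (LSB first) + 1 even-parity bit + 1 stop bit.
 * Returns the number of bits written. */
int frame_char(unsigned char c, int out[11]) {
    int n = 0, ones = 0;
    out[n++] = 0;                         /* start bit */
    for (int i = 0; i < 8; i++) {         /* data bits, LSB first */
        int bit = (c >> i) & 1;
        ones += bit;
        out[n++] = bit;
    }
    out[n++] = ones % 2;                  /* even parity: total count of 1s even */
    out[n++] = 1;                         /* stop bit */
    return n;
}
```

The receiver performs the mirror image: it detects the start bit, samples the data bits, recomputes parity to check for errors, and verifies the stop bit.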

RS232 connector configurations

The standard specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.

The voltage level is with respect to local ground; hence RS232 uses unbalanced transmission.

UART chip: universal asynchronous receiver/transmitter chip.

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it from serial to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel to serial format.

It also adds the start, stop and parity bits.

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices (up to 512) in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.

Daisy chaining: this method is adopted to determine the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt-request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt-acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt-acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority.
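The PI/PO rule above reduces to a simple consequence: the first requesting device in the chain captures the grant and blocks everyone behind it. A minimal sketch of that selection (function and variable names are illustrative, not part of any standard):

```c
/* Sketch of daisy-chain interrupt-grant propagation. */
#include <assert.h>

/* requesting[i] is 1 if device i (position i in the chain) has a pending
 * interrupt. The acknowledge enters device 0 with PI = 1; each device that
 * is not requesting passes it through (PO = 1), while the first requesting
 * device that sees PI = 1 keeps the grant (PO = 0), blocking all devices
 * after it. Returns the index of the granted device, or -1 if none. */
int daisy_chain_grant(const int requesting[], int n) {
    for (int i = 0; i < n; i++)
        if (requesting[i])
            return i;   /* PI = 1 and pending: this device takes the grant */
    return -1;          /* acknowledge propagated through the whole chain */
}
```

This makes the priority ordering explicit: position in the chain, not any programmed value, decides who wins.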

The slowest device participates in the control and data-transfer handshakes, and so determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshaking, and five for bus management) plus eight ground-return lines.

In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions, as well as device-independent commands, data structures, error protocols and the like.

The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures:

1. Round robin
2. Round robin with interrupts
3. Function-queue scheduling
4. RTOS

Choosing an Architecture: the best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin:

1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin pros: simple; no shared data; no interrupts.

Round Robin cons:

o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response times (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin uses:

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly
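The round-robin main loop described above can be sketched in a few lines of C. The device table and field names are illustrative assumptions; in a real system needs_service would be a poll of a status register and the service action would be device-specific I/O:

```c
/* Sketch of one pass of a round-robin main loop: poll each device in a
 * fixed order and service whichever needs attention. No interrupts, no
 * priorities: position in the loop is the only "priority" there is. */
#include <assert.h>
#include <stddef.h>

typedef struct {
    int needs_service;   /* would be a status-register read in practice */
    int serviced;        /* count of times this device has been handled */
} device_t;

void round_robin_pass(device_t devs[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (devs[i].needs_service) {
            devs[i].serviced++;        /* "service" the device */
            devs[i].needs_service = 0;
        }
    }
}

/* In a real system: for (;;) round_robin_pass(devs, n); */
```

The cons listed above fall directly out of this structure: a device's worst-case wait is one full traversal of the loop with every other device needing service.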

Round Robin with Interrupts:

o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service the hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Pros:

o Still relatively simple
o Hardware timing requirements are better met

Cons:

o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this? Adjustments:

o Change the order in which the flags are checked (e.g. ABABAD): this improves the response of A but increases the latency of the other tasks
o Move some task code into the interrupt: this decreases the response time of lower-priority interrupts, but it may not be possible to ensure that lower-priority interrupt code executes fast enough

Function-Queue-Scheduling Architecture:

o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose the order in which to execute the functions (not necessarily FIFO)
o Better response time for the highest-priority task = the length of the longest function
o The best response time can be improved by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
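The queue-of-function-pointers mechanism can be sketched as below. The ring buffer, its size, and all names are illustrative assumptions; in real code enqueue() would run in interrupt context, so the head/tail indices would need protection (e.g. disabling interrupts around the update):

```c
/* Sketch of function-queue scheduling: interrupts enqueue function
 * pointers; the main routine dequeues and calls them. */
#include <assert.h>

typedef void (*task_fn)(void);

#define QSIZE 8
static task_fn queue[QSIZE];
static int head = 0, tail = 0;

/* Called from an ISR after servicing the hardware. Returns 0 if full. */
int enqueue(task_fn f) {
    int next = (tail + 1) % QSIZE;
    if (next == head) return 0;          /* queue full: work is dropped */
    queue[tail] = f;
    tail = next;
    return 1;
}

/* Main routine: drain the queue, FIFO here, but any priority policy works. */
void run_queue(void) {
    while (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QSIZE;
        f();                             /* follow-up processing for one event */
    }
}

/* Two demo tasks standing in for real follow-up work. */
static int a_runs = 0, b_runs = 0;
static void task_a(void) { a_runs++; }
static void task_b(void) { b_runs++; }
```

Replacing the FIFO dequeue with "pick the highest-priority entry" is what gives this architecture its better response time for the most urgent task.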

Real Time Operating System

Architecture:

o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences from the previous architectures:
  - We do not write signaling flags (the RTOS takes care of them)
  - No loop in our code decides what is executed next (the RTOS does this)
  - The RTOS knows the relative task priorities and controls what is executed next
  - The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real-Time Operating Systems pros:

o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time stays relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSs generally come with vendor tools

Cons:

o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks.

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1: total time to complete = 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms
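The job1 arithmetic above can be checked with two small helpers (names are illustrative): how many quanta a job needs, and how long its final slice actually runs.

```c
/* Sketch: round-robin quantum arithmetic for a single job. */
#include <assert.h>

/* Number of time slots a job of total_ms needs with a quantum of quantum_ms. */
int allocations_needed(int total_ms, int quantum_ms) {
    return (total_ms + quantum_ms - 1) / quantum_ms;   /* ceiling division */
}

/* Duration of the final slice: a job that is not a multiple of the quantum
 * self-terminates partway through its last allocation. */
int last_slice_ms(int total_ms, int quantum_ms) {
    int rem = total_ms % quantum_ms;
    return rem ? rem : quantum_ms;
}
```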

Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at about the same time.

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in its queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
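The work-conserving rule just described, where the round-robin pointer skips flows with empty queues so the channel never idles while any flow has packets, can be sketched as follows. The per-flow backlog array and function name are illustrative assumptions:

```c
/* Sketch of one work-conserving round-robin service decision. */
#include <assert.h>

/* backlog[i] = packets waiting for flow i. Starting just after *cursor,
 * pick the next flow with packets, transmit one packet from it, and leave
 * the cursor there. Returns the flow served, or -1 if all queues are empty. */
int rr_serve(int backlog[], int nflows, int *cursor) {
    for (int k = 1; k <= nflows; k++) {
        int i = (*cursor + k) % nflows;
        if (backlog[i] > 0) {
            backlog[i]--;          /* "transmit" one packet */
            *cursor = i;
            return i;
        }
    }
    return -1;                     /* nothing to send anywhere */
}
```

Empty flows cost nothing: the loop passes over them, which is exactly the work-conserving property.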

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same round-robin scheduler concept, and it can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

o Throughput: the number of processes that complete their execution per time unit

o Latency, specifically:

o Turnaround: the total time between submission of a process and its completion

o Response time: the amount of time from when a request is submitted until the first response is produced

o Fairness / waiting time: equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates which processes are to run on a system and the degree of concurrency to be supported at any one time: whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process that has a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or medium-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembly language because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent, packet-by-packet dynamic channel allocation, or OFDMA multi-carriers or other frequency-domain equalization components are assigned to the users that can best utilize them.

First in first out

First In First Out, also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reasons.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.
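The FCFS behaviour above can be sketched in a few lines (a simulation sketch under the assumption that all jobs arrive at time zero; the 24/3/3 burst times are just an illustrative workload):

```python
def fcfs(bursts):
    """First-Come-First-Served: run each burst to completion in arrival
    order (all processes assumed to arrive at t=0).
    Returns (waiting_times, turnaround_times), one entry per process."""
    waiting, turnaround, t = [], [], 0
    for burst in bursts:
        waiting.append(t)        # each process waits for every earlier one
        t += burst               # runs to completion, no preemption
        turnaround.append(t)     # turnaround = waiting + burst
    return waiting, turnaround

# One long first process delays everything behind it (the "convoy effect"):
w, ta = fcfs([24, 3, 3])
print(w)    # [0, 24, 27]
print(ta)   # [24, 27, 30]
```

The convoy effect in the output is exactly the "long processes can hog the CPU" problem noted above.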

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
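The claim that overall waiting time is smaller than under FIFO can be checked with a small sketch (non-preemptive SJF with all jobs arriving at time zero; real schedulers must estimate burst lengths, while here they are simply known):

```python
def sjf(bursts):
    """Non-preemptive Shortest-Job-First for jobs all arriving at t=0:
    always run the shortest remaining job next.
    Returns the average waiting time."""
    waiting, t = [], 0
    for b in sorted(bursts):     # shortest job first
        waiting.append(t)
        t += b
    return sum(waiting) / len(waiting)

print(sjf([24, 3, 3]))   # 3.0, versus an average wait of 17.0 under FCFS
```

On the same workload, FCFS gives waits of 0, 24, and 27 (average 17.0); letting the short jobs overtake the long one cuts the average to 3.0.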

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between that of FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.
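The time-slice cycling can be sketched as follows (a minimal simulation assuming all processes arrive at time zero and ignoring context-switch cost; the quantum of 4 is an illustrative choice):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Cycle through the processes, giving each at most one quantum per
    turn; returns each process's completion time (all arrive at t=0)."""
    ready = deque(range(len(bursts)))
    remaining = list(bursts)
    t, completion = 0, [0] * len(bursts)
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            ready.append(i)          # back to the end of the line
        else:
            completion[i] = t
    return completion

print(round_robin([24, 3, 3], 4))    # [30, 7, 10]
```

Note how the short jobs finish at 7 and 10 (faster than under FCFS, where they would finish at 27 and 30), while the long job still finishes last.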

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system by a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of the breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification: it allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs are of the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

        LDR r0,[r8]      ; a comment

Label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 is at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
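The two byte orders can be illustrated with a short host-side sketch using Python's struct module (the word value 0x01020304 is an arbitrary example):

```python
import struct

# The same 32-bit word, 0x01020304, laid out in byte-addressed memory:
little = struct.pack('<I', 0x01020304)   # little-endian layout
big    = struct.pack('>I', 0x01020304)   # big-endian layout

print(list(little))   # [4, 3, 2, 1]  lowest-order byte at the lowest address
print(list(big))      # [1, 2, 3, 4]  lowest-order byte at the highest address
```

Reading a word written in one mode on a machine configured for the other silently reverses the bytes, which is why the power-up configuration matters.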

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because they use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment

Label: R3=R1+R2;

SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction is 48 bits long, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It uses a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
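Saturating addition can be sketched in a few lines (a behavioural sketch of what the hardware does when ALUSAT is set, not SHARC code; the helper name is ours):

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def sat_add32(a, b):
    """Saturating 32-bit addition: on overflow the result clamps to the
    end of the numeric range instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

print(sat_add32(INT32_MAX, 1))   # 2147483647, not the wrapped -2147483648
```

Clamping rather than wrapping is what makes saturation useful in signal processing: an overflowed sample stays at full scale instead of flipping sign.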

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify-with-update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the Fast Fourier Transform.
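The post-modify circular-buffer behaviour can be sketched as follows (a behavioural model of a DAG index register, not SHARC code; the function and parameter names are ours):

```python
def circular_update(i, m, base, length):
    """Post-modify with update into a circular buffer, mimicking a DAG
    index register: advance i by modifier m, wrapping inside
    [base, base + length)."""
    i += m
    if i >= base + length:
        i -= length
    elif i < base:
        i += length
    return i

# Sweeping an 8-entry buffer with a stride of 3, starting at index 6:
i, visited = 6, []
for _ in range(4):
    i = circular_update(i, 3, base=0, length=8)
    visited.append(i)
print(visited)   # [1, 4, 7, 2]
```

The wrap-around is what lets a filter kernel walk a delay line forever without any explicit bounds check in the inner loop.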

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are: tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes a value of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
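The create/acquire/release calls above can be sketched with a host-side analogue (threading.Semaphore here stands in for the RTOS calls; the pool size and worker function are illustrative, not part of any particular RTOS API):

```python
import threading

# Sketch: a counting semaphore guarding a pool of 2 identical resources.
pool = threading.Semaphore(2)          # "create" with an initial count of 2

def worker(results, idx):
    pool.acquire()                     # "acquire": blocks while the count is 0
    results[idx] = True                # use one of the shared resources
    pool.release()                     # "release": increments the count, waking a waiter

results = [False] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(results))                    # every task eventually got a resource
```

With an initial count of 1 this degenerates into the binary semaphore described above.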

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
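The deposit/take pattern can be sketched with a host-side analogue (Python's queue.Queue stands in for the RTOS message queue; the function names and sensor readings are illustrative):

```python
import queue

# An "ISR" deposits sensor readings; a task later takes them in FIFO order.
mq = queue.Queue(maxsize=8)      # queue length is fixed at creation time

def isr_deposit(reading):
    mq.put(reading)              # e.g. a voltage read from a transducer

def task_take():
    return mq.get()              # the first waiting task gets the message

for v in (3.3, 1.8, 5.0):
    isr_deposit(v)
print([task_take() for _ in range(3)])   # [3.3, 1.8, 5.0]
```

The fixed maxsize mirrors the queue length chosen at creation: a real ISR must decide what to do (drop or overwrite) when the queue is full.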

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all the ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; instruction opcodes are preceded by a 4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- it is the program counter, but can be manipulated as a general-purpose register

R13 ----------- it is used as a stack pointer

R14 ----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags mask normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions ARM provide several versions of multiplications These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, excepting the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit. However, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem.

Shared-data problem: it is one that arises when an interrupt routine and task code (or several pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
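The difference can be sketched in a host language (a deliberately non-reentrant function using shared data versus a reentrant one that touches only its caller's arguments; the names are ours):

```python
total = 0  # shared (static) data, visible to every task

def add_shared(x):
    """NOT reentrant: the read-modify-write of the shared variable is
    not atomic if another task can preempt between the steps."""
    global total
    total += x
    return total

def add_local(running_total, x):
    """Reentrant: uses only its arguments (the caller's stack), so
    concurrent callers cannot disturb each other."""
    return running_total + x

print(add_local(10, 5))  # 15, regardless of what any other task does
```

If two tasks call add_shared concurrently, one update can be lost; add_local has no such window because each caller owns its own operands.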

States of tasks in an RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, therefore, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other one is started.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can mandate any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it is the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


This mode of communication is called asynchronous communication because no clock signal is transmitted

The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved.

Possible Data rates depend upon the UART chip and the clock used

Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both systems.

Data rate: the rate at which the data communication takes place. PCs support 50, 150, 300, …, 115200 bps.

Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.

Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6 data bits, two stop bits are appended.

Parity bit: it is added for error checking on the receiver side.

Flow control: it is useful when the sender pushes out data at a rate that cannot be absorbed by the receiver. It is a protocol to stop and resume data transmission, also known as handshaking. It can be implemented in hardware or software.

RS232 connector configurations

It specifies two types of connectors: 9-pin and 25-pin.

For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.

The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.

UART chip: Universal Asynchronous Receiver/Transmitter chip.

It has two sections: a receive section and a transmit section.

The receive section receives data, converts it from serial to parallel form, and gives the data to the processor.

The transmit section takes data from the processor and converts it from parallel to serial format.

It also adds the start, stop, and parity bits.

The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
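The framing parameters above determine how much of the raw bit rate is left for data. As a rough illustration (a sketch only, not part of any standard; the function names are invented):

```c
/* Bits on the wire per character frame: 1 start bit + data bits
   + optional parity bit + stop bits. */
static int frame_bits(int data_bits, int has_parity, int stop_bits)
{
    return 1 + data_bits + (has_parity ? 1 : 0) + stop_bits;
}

/* Characters per second at a given baud rate (bits per second).
   For example, 8 data bits, no parity, 1 stop bit ("8N1") gives a
   10-bit frame, so 115200 baud carries 11520 characters/second. */
static int chars_per_second(int baud, int data_bits, int has_parity, int stop_bits)
{
    return baud / frame_bits(data_bits, has_parity, stop_bits);
}
```

This is why an "8N1" link delivers only one byte per 10 bit times: the start and stop bits are pure framing overhead.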

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422 created to connect a number of devices up to 512 in a network

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections.

Daisy chaining: this method is used to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that can request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledge signal on to the next device. This procedure is defined as follows:

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that first receives the interrupt acknowledge signal from the CPU. The farther a device is from the first position, the lower is its priority.
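The PI/PO grant logic above can be modeled in a few lines of C (a simplified illustration, not hardware-accurate; device 0 is the one electrically closest to the CPU):

```c
/* Model of daisy-chain grant propagation: the CPU's acknowledge
   enters device 0's PI input.  A device that is requesting an
   interrupt intercepts the grant (PI=1, PO=0); a device that is not
   requesting passes the grant through (PO = PI).  Returns the index
   of the device that wins the grant, or -1 if none is requesting. */
int daisy_chain_grant(const int requesting[], int n)
{
    for (int i = 0; i < n; i++) {      /* device 0 has highest priority */
        if (requesting[i])
            return i;                  /* intercepts: PI=1, PO=0 */
        /* otherwise the acknowledge ripples on: PO = PI = 1 */
    }
    return -1;                         /* acknowledge falls off the chain */
}
```

Note how priority is purely positional: the first requesting device in chain order always wins, which matches the text's observation that distance from the first position determines priority.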

The slowest device participates in control and data transfer handshakes to determine the speed of the transaction

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.

In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.

The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures:

1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture: The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin:

1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin Pros: Simple, no shared data, no interrupts.

Cons:

o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced

o The architecture fails if any one device requires a shorter response time

o Most I/O needs fast response time (buttons, serial ports, etc.)

o Lengthy processing adversely affects even soft time deadlines

o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin Uses:

o Simple devices

o Watches

o Possibly microwave ovens

o Devices where operations are all user initiated and process quickly
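The loop structure described above can be sketched in a few lines of C (the device model here is hypothetical, standing in for real hardware polling):

```c
#include <stdbool.h>

/* Hypothetical device model for illustration: each device reports
   whether it needs service, and servicing consumes the request. */
struct device {
    bool needs_service;
    int  serviced_count;
};

static void service(struct device *d)
{
    d->needs_service = false;
    d->serviced_count++;
}

/* One pass of the round-robin main loop: check each device in a
   fixed order and service whichever needs it.  No interrupts and no
   priorities -- position in the loop is the only ordering. */
void round_robin_pass(struct device devs[], int n)
{
    for (int i = 0; i < n; i++)
        if (devs[i].needs_service)
            service(&devs[i]);
}
```

In a real system this pass would sit inside an infinite `while (1)` loop; the worst-case latency for any device is one full traversal of the loop, as the cons list above notes.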

Round Robin with Interrupts:

o Based on round robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service hardware and (b) set flags

o The main routine checks the flags and does any lower-priority follow-up processing

o Why? It gives more control over priorities

Round Robin with Interrupts Pros:

o Still relatively simple

o Hardware timing requirements are better met

Cons:

o All task code still executes at the same priority

o Maximum delay is unchanged: worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this? Adjustments:

o Change the order in which the flags are checked (e.g. ABABAD): this improves the response of A but increases the latency of the other tasks

o Move some task code into an interrupt: this decreases the response time of lower-priority interrupts, and it may not be possible to ensure that lower-priority interrupt code executes fast enough
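The interrupts-set-flags pattern above can be sketched as follows (the ADC and UART device names are hypothetical, and the ISRs are called directly here rather than by hardware):

```c
#include <stdbool.h>

/* Flags shared between interrupt and task code.  They are declared
   'volatile' because in a real system they are written from
   interrupt context. */
static volatile bool adc_ready;
static volatile bool uart_rx_ready;
static int adc_handled, uart_handled;

/* Interrupt handlers do only the urgent work and set a flag. */
void adc_isr(void)  { adc_ready = true; }
void uart_isr(void) { uart_rx_ready = true; }

/* The main routine checks the flags and performs the lower-priority
   follow-up processing.  All of this code runs at the same (task)
   priority -- the limitation noted in the cons above. */
void main_loop_once(void)
{
    if (adc_ready)     { adc_ready = false;     adc_handled++; }
    if (uart_rx_ready) { uart_rx_ready = false; uart_handled++; }
}
```

In real firmware, clearing the flag and doing the follow-up work may need interrupts briefly disabled to avoid a shared-data race between the ISR and the main loop.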

Function Queue Scheduling Architecture:

o Interrupts add function pointers to a queue

o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose what order to execute the functions (not necessarily FIFO)

o Better response time for the highest-priority task = the length of the longest function code

o The best response time can be improved by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
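A minimal sketch of this architecture (illustrative only; a plain FIFO is used here, though the main routine could reorder by priority):

```c
#include <stddef.h>

#define QLEN 8

typedef void (*task_fn)(void);

/* Circular function-pointer queue shared between interrupt and task
   code.  An ISR enqueues a follow-up function; the main routine
   dequeues and calls them. */
static task_fn queue[QLEN];
static size_t head, tail;

int enqueue_task(task_fn f)
{
    if ((tail + 1) % QLEN == head)
        return -1;                     /* queue full */
    queue[tail] = f;
    tail = (tail + 1) % QLEN;
    return 0;
}

/* Main routine: run every queued function, here in FIFO order. */
void run_queued_tasks(void)
{
    while (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QLEN;
        f();                           /* follow-up processing */
    }
}

/* Two example follow-up tasks that record their execution order. */
static char order_log[4];
static int  order_n;
static void task_a(void) { order_log[order_n++] = 'a'; }
static void task_b(void) { order_log[order_n++] = 'b'; }
```

Replacing the FIFO dequeue with a highest-priority-first pick is what gives this architecture its better response time for the most urgent task.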

Real Time Operating System

Architecture:

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for task code

o Differences with the previous architectures: we don't write the signaling flags (the RTOS takes care of it), and no loop in our code decides what is executed next (the RTOS does this)

o The RTOS knows the relative task priorities and controls what is executed next

o The RTOS can suspend a task in the middle to execute code of higher priority

o Now we can control task response AND interrupt response

Real Time Operating Systems Pros:

o Worst-case response time for the highest-priority function is zero

o The system's high-priority response time is relatively stable when extra functionality is added

o Useful functionality comes pre-written

o RTOSes generally come with vendor tools

Cons:

o An RTOS has cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = total time to complete: 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms
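The allocation arithmetic in the example above can be reproduced with a short sketch (illustrative only; other jobs' turns between the slices are omitted):

```c
/* Compute the CPU slices a single job receives under round-robin
   with a fixed quantum: full quanta until the remaining work is
   less than one quantum, then a final short slice.  Records each
   slice length in 'slices' and returns how many there were. */
int rr_slices(int total_ms, int quantum_ms, int slices[])
{
    int n = 0;
    while (total_ms > 0) {
        int slice = total_ms < quantum_ms ? total_ms : quantum_ms;
        slices[n++] = slice;       /* job runs, then yields the CPU */
        total_ms -= slice;
    }
    return n;
}
```

For the example's 250 ms job and 100 ms quantum this yields three allocations of 100, 100, and 50 ms, matching the list above.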

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
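The work-conserving behaviour described above can be sketched with per-flow packet counts standing in for real queues (a simplified model, not a router implementation):

```c
/* Work-conserving round-robin over per-flow packet backlogs: each
   flow with packets sends one packet per turn, and empty flows are
   skipped so the link never idles while any flow has work.  Writes
   the flow index of each transmitted packet into 'order' and
   returns the number of packets sent. */
int rr_schedule(int backlog[], int nflows, int order[])
{
    int sent = 0, remaining = 0;
    for (int i = 0; i < nflows; i++)
        remaining += backlog[i];

    while (remaining > 0) {
        for (int i = 0; i < nflows; i++) {   /* one full round */
            if (backlog[i] > 0) {
                backlog[i]--;                /* transmit one packet */
                order[sent++] = i;
                remaining--;
            }
        }
    }
    return sent;
}
```

With backlogs {2, 0, 1}, flow 1 is simply skipped and the transmission order is flow 0, flow 2, flow 0: no slot is wasted on the empty queue.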

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can be created using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

o Throughput - the number of processes that complete their execution per time unit

o Latency, specifically:

o Turnaround - the total time between submission of a process and its completion

o Response time - the amount of time it takes from when a request was submitted until the first response is produced

o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time, i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399].

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or which has a low priority, or which is page-faulting frequently, or which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370].

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required it can be swapped in on demand, or lazy-loaded [Stallings 394].

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing
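The "long processes hog the CPU" effect noted above is easy to see numerically (an illustrative sketch; all jobs are assumed to arrive at time 0 in queue order):

```c
/* FCFS completion times: each job runs to completion in arrival
   order, so every job behind a long one absorbs its full burst.
   finish[i] is the turnaround time of job i (arrival at time 0). */
void fcfs_times(const int burst[], int n, int finish[])
{
    int t = 0;
    for (int i = 0; i < n; i++) {
        t += burst[i];             /* run to completion, no preemption */
        finish[i] = t;
    }
}
```

With bursts {100, 10, 10}, the two short jobs finish at 110 and 120 instead of 10 and 20: a single long job at the head of the queue inflates everyone else's turnaround time.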

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimations about, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run
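The claim that overall waiting time beats FIFO can be checked in the simple case where all jobs are present at time 0, in which shortest-remaining-time reduces to shortest-job-first (an illustrative sketch, not a full preemptive simulator):

```c
#include <stdlib.h>

/* Comparator for ascending integer sort. */
static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* With all jobs available at time 0, SRT runs jobs in ascending
   burst order.  Returns the total waiting time summed over all
   jobs; this ordering minimizes it.  Note: sorts 'burst' in place. */
int sjf_total_wait(int burst[], int n)
{
    int wait = 0, t = 0;
    qsort(burst, n, sizeof burst[0], cmp_int);
    for (int i = 0; i < n; i++) {
        wait += t;                 /* job i waited while earlier jobs ran */
        t += burst[i];
    }
    return wait;
}
```

For bursts {100, 10, 10} the total waiting time is 0 + 10 + 20 = 30 ms, versus 0 + 100 + 110 = 210 ms if the same jobs ran in FIFO order with the long job first.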

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory systems.

Overview:

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, from which the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. They can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmers interface to the hardware

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8]        ; a comment

LABEL  ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);  ! a comment

LABEL: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared automatically: STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management

Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high priority task is ready, the time to restore the CPU context of the highest priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes only the values 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest priority task or the first task waiting in the queue can take the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all the ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15----------- it is the program counter but can be manipulated as a general purpose register

R13----------- it is used as a stack pointer

R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register

CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits mask normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode(FIQ) supports high speed interrupt handling

o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode it is entered in response to memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets 32-bit ARM and 16-bit THUMB

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privilege mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n

Execution of the instruction causes SWI exception handler to be called

Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o Jumps are always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit. However, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions

RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is task

Under most RTOS a task is simply a subroutine

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, like the task's priority, the memory location for the task's stack, etc.

Most RTOS allows as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared data problem.

Shared data problem: it arises when an interrupt routine and task code (or several pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

Scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in a tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is started.

If a higher priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can constrain any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered as an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that some one is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 transmits data in serial form, whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it is able to make the RS232 and the processor compatible.

The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.

RS422

It is a standard for serial communication in noisy environments

The distance between the devices can be up to 1200 meters

Twisted copper cable is used as a transmitting medium

It uses balanced transmission

Two channels are used for transmit and receive paths

RS485

It is a variation of RS422, created to connect a number of devices (up to 512) in a network.

RS485 controller chip is used on each device

The network with RS485 protocol uses the master slave configuration

With one twisted pair, half-duplex communication can be achieved, and with two twisted pairs, full-duplex.

IEEE 488 BUS

It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.

It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).

It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections.

Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

Which device sends the request to the CPU will accepts the acknowledgement from the CPU and that device will not send that acknowledgement signal to the next device This procedure is defined as below

A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 in its PO output If the device does not have pending interrupts it transmits the acknowledgement signal to the next device by placing a 1 in its PO output Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt and this device places its VAD on the data bus The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU The farther the device is from the first position the lower is its priority

The slowest device participates in the control and data transfer handshakes, so it determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground-return lines.

In 1975 the bus was standardized by IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.

The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher Performance Protocol for the Standard Digital Interface for Programmable Instrumentation. The American National Standards Institute's corresponding standard was known as ANSI Standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc., via the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc., to their workstation products and to HP 2100 and HP 3000 minicomputers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures

1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS

Choosing an Architecture

The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin

1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)

Round Robin

Pros: simple; no shared data; no interrupts

Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response times (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Round Robin with Interrupts

Pros:
o Still relatively simple
o Hardware timing requirements are better met

Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

o How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which flags are checked (e.g., ABABAD)
  o Improves the response of A
  o Increases the latency of the other tasks
o Move some task code into the interrupt
  o Decreases the response time of lower-priority interrupts
  o May not be able to ensure that lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls

Pros:
o The main routine can use any algorithm to choose the order in which to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Best response time can be improved by cutting long functions into several pieces

Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)

Real Time Operating System

Architecture

o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences with the previous architectures:
  o We don't write signaling flags (the RTOS takes care of it)
  o No loop in our code decides what is executed next (the RTOS does this)
  o The RTOS knows the relative task priorities and controls what is executed next
  o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real-Time Operating Systems

Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally come with vendor tools

Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games, round-robin scheduling arranges to have all teams or players take turns playing each other, with the winner emerging from the succession of events.

A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute, and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round-robin stories, with each person posting the next part of the story as part of an online conference thread.

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks.

The name of the algorithm comes from the round-robin principle known from other fields, where each person takes an equal share of something in turn.

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms, but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of time quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.

Data packet scheduling

In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come first-served queuing.

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination addresses. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel, in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place; hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as token ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait until the channel conditions improve, or at least to give scheduling priority to less expensive users; round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows the same round-robin scheduler concept, and it can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround - the total time between submission of a process and its completion

o Response time - the amount of time from when a request was submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), so a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms. In these cases, special-purpose job scheduler software is typically used to assist these functions, in addition to any underlying admission scheduling support in the operating system.

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus, the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), in disk drives (I/O scheduling), in printers (print spooler), in most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reason.

No prioritization occurs, so this system has trouble meeting process deadlines.

The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit.

Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint lets the user specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to indicate that the code has entered certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o The CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o The RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8]      ; a comment

LABEL ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 is at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

LABEL: R3=R1+R2;

SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word is 32 bits, and an address is 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It uses a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three mode registers most significant for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the corresponding ASTAT bits, but are not cleared automatically; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow yields the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
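The effect of saturation mode can be sketched in C (this models the behaviour, not the SHARC hardware itself):

```c
#include <stdint.h>

/* Saturating 32-bit signed addition: on overflow, clamp the result
   to the maximum (or minimum) value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + b;   /* widen so the true sum always fits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

For example, sat_add32(INT32_MAX, 1) returns INT32_MAX rather than wrapping to a large negative number.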

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers which are commonly used in signal processing
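Circular buffering can be modelled in C as post-modify addressing with wraparound (a sketch of the behaviour; the function name is illustrative):

```c
/* Write a sample into a circular delay line: store at the current
   index, then post-modify the index, wrapping at the buffer length,
   the way a DAG updates its I register with circular buffering on. */
void circ_write(int *buf, int len, int *index, int sample) {
    buf[*index] = sample;
    *index = (*index + 1) % len;   /* wrap back to the buffer start */
}
```

Writing a fifth sample into a four-entry buffer wraps around and overwrites the oldest entry, so the buffer always holds the most recent samples.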

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT)
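Bit-reversed addressing can likewise be sketched in C: the address bits of the index are presented in reverse order, which is exactly the reordering an FFT needs (the function name is ours):

```c
#include <stdint.h>

/* Reverse the low `bits` bits of `index`; with 3 bits, index 1 (001)
   maps to 4 (100) and index 6 (110) maps to 3 (011). */
uint32_t bit_reverse(uint32_t index, int bits) {
    uint32_t r = 0;
    for (int i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1);   /* shift the LSB into the result */
        index >>= 1;
    }
    return r;
}
```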

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The kernel objects include tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, a non-preemptive kernel returns the CPU to the interrupted task, whereas a preemptive kernel runs the highest-priority ready task. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it equals the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it equals the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it equals the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.

SEMAPHORES

A semaphore is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: a counting semaphore, which can take any non-negative integer value, and a binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
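The bookkeeping behind these calls can be sketched in C. This is only a model of the counter (a real kernel also maintains a waiting list and blocks the caller); all names are illustrative:

```c
typedef struct { int count; } semaphore_t;

/* Create: set the initial count (1 gives a binary semaphore). */
void sem_create(semaphore_t *s, int initial) { s->count = initial; }

/* Acquire: returns 1 if the semaphore was taken, 0 if the calling
   task would have to block and wait. */
int sem_acquire(semaphore_t *s) {
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: give the semaphore back (or wake a waiting task). */
void sem_release(semaphore_t *s) { s->count++; }
```

With an initial count of 1, a second acquire fails until the first holder releases, which is the resource-synchronization use of a binary semaphore.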

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message.
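A minimal fixed-length queue of this kind can be sketched in C (a FIFO model with illustrative names; real kernels add blocking and waiting lists):

```c
#define Q_LEN 4   /* queue length fixed when the queue is created */

typedef struct {
    int msgs[Q_LEN];
    int head, tail, count;
} msgq_t;

/* A task or ISR deposits a message at the tail; 0 means queue full. */
int mq_send(msgq_t *q, int msg) {
    if (q->count == Q_LEN) return 0;
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % Q_LEN;
    q->count++;
    return 1;
}

/* The receiving task takes the oldest message from the head;
   0 means nothing is waiting. */
int mq_receive(msgq_t *q, int *msg) {
    if (q->count == 0) return 0;
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 1;
}
```

Messages come out in the order they were deposited, which suits keyboard input or sensor-reading streams.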

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
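The read/write flow can be illustrated with the POSIX pipe() call (used here only as a stand-in; an RTOS supplies its own create/open/read/write calls, and the function name is ours):

```c
#include <unistd.h>

/* Send n bytes through a pipe and read them back, so one task's
   output becomes another task's input. Returns the number of bytes
   read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t n) {
    int fds[2];
    ssize_t got = -1;
    if (pipe(fds) != 0) return -1;   /* fds[0] = read end, fds[1] = write end */
    if (write(fds[1], msg, n) == (ssize_t)n)
        got = read(fds[0], out, n);
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

In a real system the producer task would hold the write end and the consumer task the read end.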

INSTRUCTION SET ARCHITECTURE (ISA)

The ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow up to 16 registers to be loaded or stored at once have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, but it can be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- has special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits control normal and fast interrupts, respectively. The T bit is used to switch between the ARM and THUMB instruction sets.
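The flag and field positions can be decoded in C (bit positions follow the ARM CPSR layout: N, Z, C, V in bits 31-28, I, F, T in bits 7-5, mode in bits 4:0; the helper names are ours):

```c
#include <stdint.h>

int cpsr_n(uint32_t cpsr) { return (cpsr >> 31) & 1; }    /* negative flag */
int cpsr_z(uint32_t cpsr) { return (cpsr >> 30) & 1; }    /* zero flag */
int cpsr_c(uint32_t cpsr) { return (cpsr >> 29) & 1; }    /* carry flag */
int cpsr_v(uint32_t cpsr) { return (cpsr >> 28) & 1; }    /* overflow flag */
int cpsr_i(uint32_t cpsr) { return (cpsr >> 7) & 1; }     /* IRQ mask bit */
int cpsr_f(uint32_t cpsr) { return (cpsr >> 6) & 1; }     /* FIQ mask bit */
int cpsr_t(uint32_t cpsr) { return (cpsr >> 5) & 1; }     /* THUMB state bit */
uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & 0x1F; } /* mode field */
```

For instance, a CPSR value of 0x60000010 has Z and C set, with the mode field equal to 0x10 (user mode).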

The mode field selects one of the six execution modes as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long integer multiplication (64-bit result)

o Multiply-accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and when an exception is raised, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Lower power consumption

o Less space occupied

o It is faster when memory is organized as 16 bits wide; however, ARM is faster when memory is organized as 32 bits wide

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They then signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run. The RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
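The problem can be shown in miniature in C. Here a two-byte counter is read one byte at a time, and a simulated timer interrupt fires between the two reads, so the task assembles a value made of halves from two different counter states (the names and the interrupt_fires switch are purely illustrative):

```c
#include <stdint.h>

static volatile uint8_t time_hi = 0, time_lo = 255;  /* 16-bit count = 255 */

/* Simulated timer ISR: advance the two-byte counter. */
static void tick_isr(void) {
    if (++time_lo == 0) time_hi++;
}

/* Non-atomic read of the counter; if the "interrupt" fires between
   the two loads, the halves come from different counter values. */
uint16_t read_time_unsafe(int interrupt_fires) {
    uint8_t hi = time_hi;              /* first load */
    if (interrupt_fires) tick_isr();   /* interrupt hits mid-read */
    uint8_t lo = time_lo;              /* second load */
    return (uint16_t)((hi << 8) | lo);
}
```

With the interrupt firing mid-read, the task sees 0 even though the counter only moved from 255 to 256; disabling interrupts around the two loads (making the section atomic) removes the bug.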

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function, or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
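The contrast can be made concrete in C (illustrative functions): the first updates a static variable in a read-modify-write sequence and is not reentrant; the second uses only its arguments and stack locals, so a task switch in the middle cannot corrupt another caller:

```c
static int total = 0;   /* shared, static data */

/* NOT reentrant: two tasks calling this can interleave the
   read-modify-write of `total` and lose an update. */
int add_to_total(int x) {
    total = total + x;
    return total;
}

/* Reentrant: every variable it uses lives on the caller's stack. */
int add(int a, int b) {
    int sum = a + b;    /* private to this invocation */
    return sum;
}
```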

Task states under an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it blocks before the other gets the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of suppliers' capabilities.

Early and continual customer focus helps ensure that the product best meets customers' needs.


position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt-request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low state, the interrupt line goes low and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays high and no interrupts are recognized by the CPU. This is equivalent to a negative-logic OR operation. The CPU responds to an interrupt request by enabling the interrupt-acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledge signal on to the next device. This procedure is defined as follows.

A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt-acknowledge signal from the CPU first. The farther the device is from the first position, the lower its priority.
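The per-device grant logic can be written out in C (a model of the combinational logic, with illustrative names):

```c
/* PO = PI AND (NOT requesting): the grant ripples onward only if
   this device is not itself asking for service. */
int po_out(int pi, int requesting) {
    return pi && !requesting;
}

/* The winning device is the one with PI = 1 while requesting
   (so its PO is 0); it places its VAD on the data bus. */
int wins_grant(int pi, int requesting) {
    return pi && requesting;
}
```

Chaining three devices where only device 2 requests: device 1 passes the grant (PO = 1), device 2 blocks it and wins, and device 3 never sees the acknowledge.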

The slowest device participates in the control and data-transfer handshakes, so it determines the speed of the transaction.

The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.

The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshaking, and five for bus management) plus eight ground-return lines.

In 1975 the bus was standardized by IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data.

The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.

The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.

Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher Performance Protocol for the Standard Digital Interface for Programmable Instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).

Applications

Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc., over the IEEE-488 bus.

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc., to their workstation products and to HP 2100 and HP 3000 computers.

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.

Survey of Architectures

1 Round robin

2 Round robin with interrupts

3 Function queue scheduling

4 RTOS

Choosing an Architecture: The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin

1 Simplest architecture

2 No interrupts

3 Main loop checks each device one at a time and services whichever needs to be serviced

4 Service order depends on position in the loop

5 No priorities

6 No shared data

7 No latency issues (other than waiting for other devices to be serviced)

Round Robin

Pros: simple, no shared data, no interrupts

Cons:

o Max delay is the max time to traverse the loop if all devices need to be serviced

o The architecture fails if any one device requires a shorter response time

o Most I/O needs a fast response time (buttons, serial ports, etc.)

o Lengthy processing adversely affects even soft time deadlines

o The architecture is fragile to added functionality

o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices

o Watches

o Possibly microwave ovens

o Devices where operations are all user-initiated and process quickly

Round Robin with Interrupts

o Based on round robin, but interrupts deal with urgent timing requirements

o Interrupts (a) service the hardware and (b) set flags

o The main routine checks the flags and does any lower-priority follow-up processing

o Why? It gives more control over priorities

Round Robin with Interrupts

Pros:

o Still relatively simple

o Hardware timing requirements are better met

Cons:

o All task code still executes at the same priority

o Maximum delay unchanged

o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this? Adjustments:

o Change the order in which flags are checked (e.g. ABABAD): improves the response of A but increases the latency of the other tasks

o Move some task code into the interrupt routine: decreases that code's response time, but may make it impossible to ensure that lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue

o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)

o Better response time for the highest-priority task = length of the longest function code

o Best response time can be improved by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)
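The queue of function pointers can be sketched in C (FIFO order here, though a priority ordering could replace it; all names are illustrative):

```c
#define MAXQ 8

typedef void (*task_fn)(void);

static task_fn queue[MAXQ];
static int q_head = 0, q_tail = 0;

/* Called from an interrupt routine: remember work for the main loop. */
void enqueue_task(task_fn f) {
    queue[q_tail] = f;
    q_tail = (q_tail + 1) % MAXQ;
}

/* Main-loop step: run one queued function, if any; returns 1 if
   something ran, 0 if the queue was empty. */
int run_next_task(void) {
    if (q_head == q_tail) return 0;
    task_fn f = queue[q_head];
    q_head = (q_head + 1) % MAXQ;
    f();
    return 1;
}

/* Example task an ISR might queue. */
static int blinks = 0;
static void blink_led(void) { blinks++; }
```

The main routine simply loops calling run_next_task(); replacing the FIFO dequeue with a highest-priority-first search gives the improved response time described above.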

Real-Time Operating System

Architecture:

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for the task code

Differences with the previous architectures:

o We don't write signaling flags (the RTOS takes care of it)

o No loop in our code decides what is executed next (the RTOS does this)

o The RTOS knows the relative task priorities and controls what is executed next

o The RTOS can suspend a task in the middle to execute code of higher priority

o Now we can control task response AND interrupt response

Real-Time Operating Systems

Pros:

o Worst-case response time for the highest-priority function is zero

o The system's high-priority response time is relatively stable when extra functionality is added

o Useful functionality comes pre-written

o RTOSs generally come with vendor tools

Cons:

o An RTOS has a cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally, in some rational order, usually from the top to the bottom of a list and then starting again at the top, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling.

In sports tournaments and other games, round-robin scheduling arranges to have all teams or players take turns playing each other, with the winner emerging from the succession of events.

A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute, and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round-robin stories, with each person posting the next part of the story as part of an online conference thread.

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = Total time to complete 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.

Data packet scheduling

In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come first-served queuing.

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as token ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can also be created using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround time - the total time between submission of a process and its completion

o Response time - the amount of time from when a request is submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), so a scheduler will implement a suitable compromise. Preference is given to any one of the concerns mentioned above depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., how many processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms. In these cases, special-purpose job scheduler software is typically used to assist these functions, in addition to any underlying admission scheduling support in the operating system.

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reasons.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimates of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit.

Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Average response time is poor; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but display only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures.

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example:

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data:

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example:

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data:

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex.

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits, but are not cleared by subsequent operations; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that are used to control loading and storing; they are called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls:

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking input from a keyboard

o Displaying output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue takes the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls:

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE ISA

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o A load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple yet improve performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are:

R15 - the program counter, though it can also be manipulated as a general-purpose register

R13 - used as a stack pointer

R14 - has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) - an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags control the normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set

ARM It is the standard 32-bit instruction set

Data processing instructions The ARM architecture provides a range of arithmetic operations such as addition, subtraction, multiplication etc and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value

o Interesting feature of the ARM architecture is that the modification of the

condition flags by arithmetic instructions is optional

Data transfer instructions ARM supports two types of data transfer instructions: single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions The SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called

Conditional execution While most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
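As an illustrative sketch (registers and constants chosen for the example, not taken from the notes), conditional execution lets a short if/else sequence be written without any branch instructions: the condition flags set by CMP decide whether each later instruction takes effect.

```asm
        CMP   r0, #5          ; set condition flags from r0 - 5
        ADDEQ r1, r1, #1      ; executes only if r0 == 5
        SUBNE r1, r1, #1      ; executes only if r0 != 5  (no S suffix, flags kept)
        MOVGT r2, r0          ; executes only if r0 > 5 (signed compare)
```

Because ADDEQ and SUBNE do not carry the S suffix, they leave the flags untouched, so MOVGT still tests the result of the original CMP.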

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on raising of an exception the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized in 16 bits. However, ARM is faster when the memory is organized in 32 bits

RTOS architecture

In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task

Under most RTOSs a task is simply a subroutine

At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters like the task's priority, the memory location for the task stack etc

Most RTOSs allow as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter and a stack. All other data, which includes global, static, initialized, uninitialized and everything else, is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared data problem

Shared data problem it is one that arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic

An atomic section is a part of a program which cannot be interrupted
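The shared data problem described above can be sketched in a few lines (a simulation, not RTOS code: the "interrupt" is called by hand at the worst possible moment to make the failure deterministic):

```python
# Simulating the shared data problem: the task reads a shared counter,
# an "interrupt" fires before the task writes back, and the interrupt's
# update is lost because the read-modify-write was not atomic.

shared_counter = 0

def interrupt_routine():
    """Runs asynchronously; increments the shared counter."""
    global shared_counter
    shared_counter += 1

def task_code_non_atomic():
    """Non-atomic read-modify-write on the shared variable."""
    global shared_counter
    local = shared_counter        # read
    interrupt_routine()           # interrupt fires between read and write
    shared_counter = local + 1    # writes back a stale value

task_code_non_atomic()
# Two increments happened (task + interrupt) but only one survived:
print(shared_counter)   # 1, not 2 -- the interrupt's update was lost
```

Disabling interrupts around the read-modify-write (making the section atomic) is the classic fix in the architectures discussed earlier.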

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

States of RTOS

Running It means the microprocessor is executing the instructions that make up this task. Therefore, in a single processor system only one task can be in the running state at any given time

Ready It means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state

Blocked It means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state

Tasks and interrupt routines move tasks out of the blocked state When a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked for ever

Scheduler controls the running state Scheduling of the tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that the tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked?

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other one is run

If a higher priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
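The scheduling rule described above (run the highest-priority task that is not blocked; idle if everything is blocked) can be sketched as follows. The task names and the dict-based API are purely illustrative, not a real RTOS interface:

```python
# Minimal sketch of an RTOS scheduler's core decision: among the tasks
# that are in the "ready" state, pick the one with the highest priority.

def schedule(tasks):
    """tasks: dict name -> (priority, state). Returns the task to run,
    or None if every task is blocked (the RTOS then idles)."""
    ready = [(prio, name) for name, (prio, state) in tasks.items()
             if state == "ready"]
    if not ready:
        return None          # all blocked: spin until an event occurs
    return max(ready)[1]     # larger number = higher priority

tasks = {"ui": (1, "ready"), "logger": (0, "ready"), "net": (2, "blocked")}
print(schedule(tasks))       # "ui": "net" is higher priority but blocked

tasks["net"] = (2, "ready")  # an interrupt routine unblocks "net"
print(schedule(tasks))       # a preemptive RTOS switches to "net" at once
```

In a preemptive RTOS this decision is re-evaluated whenever a task unblocks; in a non-preemptive one, only when the running task blocks.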

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional A functional requirement states what the system must do, such as compute an FFT

o Non-functional A non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on

A good set of requirements reflects

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process

This model has five major phases

o Requirements Analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the pieces and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases, since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built

At each level of the design, the designers go through the requirements, construction and testing phases

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement

Its advantage is that it adopts a successive refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs


The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2

While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented

National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003

Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)

Applications

Commodore PET/CBM range personal computers connected their disk drives, printers, modems etc by the IEEE-488 bus

HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters etc to their workstation products and to the HP 2100 and HP 3000 minicomputers

Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface

Survey of Architectures

1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS

Choosing an Architecture The best architecture depends on several factors

o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features

Architecture Selection Tradeoff between complexity and control over response and priority

Round Robin

1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
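The main loop described above can be sketched as follows (device names and the "pending" flag are illustrative; a real loop would poll hardware registers):

```python
# Round-robin architecture sketch: no interrupts, no priorities -- the main
# loop simply checks each device in turn and services it if it needs it.

def make_device(name, pending):
    return {"name": name, "pending": pending}

def service(device, log):
    log.append(device["name"])      # stand-in for real device handling
    device["pending"] = False

def round_robin(devices, cycles=1):
    """One or more passes of the main loop. Service order depends only
    on position in the list -- that is the architecture's whole policy."""
    log = []
    for _ in range(cycles):
        for dev in devices:
            if dev["pending"]:
                service(dev, log)
    return log

devices = [make_device("adc", True), make_device("uart", False),
           make_device("lcd", True)]
print(round_robin(devices))   # ['adc', 'lcd'] -- uart needed no service
```

Note how a slow handler anywhere in the list delays every other device: that is exactly the fragility listed in the cons below.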

Round Robin Pros Simple, no shared data, no interrupts

Cons

o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most IO needs fast response time (buttons, serial ports etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly

Round Robin with Interrupts

o Based on Round Robin, but interrupts deal with urgent timing requirements
o Interrupts a) service hardware and b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities

Round Robin with Interrupts Pros

o Still relatively simple
o Hardware timing requirements better met

Cons

o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?

Round Robin with Interrupts Adjustments

o Change the order flags are checked (eg ABABAD)
o Improves response of A
o Increases latency of other tasks
o Move some task code to the interrupt
o Decreases response time of lower priority interrupts
o May not be able to ensure lower priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue
o Main routine reads the queue and executes calls

Pros

o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces

Cons

o Worse response time for lower priority code (no guarantee it will actually run)
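The function-queue architecture above can be sketched with a priority queue of function references (the interrupt sources and function names are made up for illustration):

```python
import heapq

# Function-queue scheduling sketch: "interrupts" enqueue function pointers
# with a priority; the main loop always calls the highest-priority
# (lowest number) function next, so the order need not be FIFO.

task_queue = []   # min-heap of (priority, sequence, function)
_seq = 0          # tie-breaker so equal priorities run in arrival order

def enqueue(priority, func):
    """Called from an interrupt routine: queue follow-up processing."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, func))
    _seq += 1

def main_loop(log):
    """Main routine: pop and call functions until the queue is empty."""
    while task_queue:
        _, _, func = heapq.heappop(task_queue)
        func(log)

# Simulated interrupts enqueue work out of priority order:
enqueue(2, lambda log: log.append("log_data"))      # low priority
enqueue(0, lambda log: log.append("read_sensor"))   # high priority
enqueue(1, lambda log: log.append("update_lcd"))

run_order = []
main_loop(run_order)
print(run_order)   # ['read_sensor', 'update_lcd', 'log_data']
```

The worst-case wait for the highest-priority function is one longest function body, which is why cutting long functions into pieces improves the best response time.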

Real Time Operating System

Architecture

o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures
o We don't write signaling flags (RTOS takes care of it)
o No loop in our code decides what is executed next (RTOS does this)
o RTOS knows relative task priorities and controls what is executed next
o RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems Pros

o Worst case response time for the highest priority function is near zero
o System's high priority response time relatively stable when extra functionality added
o Useful functionality pre-written
o Generally come with vendor tools

Cons

o RTOS has cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms
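The worked example above can be reproduced with a few lines of simulation (the function name is illustrative):

```python
# Round-robin quantum sketch: a job needing 250 ms of CPU time under a
# 100 ms quantum receives three allocations of 100, 100 and 50 ms.

def rr_allocations(total_ms, quantum_ms):
    """Return the list of CPU slices the job receives under round-robin,
    ignoring the time other jobs run in between."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        run = min(quantum_ms, remaining)   # job may finish mid-quantum
        slices.append(run)
        remaining -= run
    return slices

print(rr_allocations(250, 100))        # [100, 100, 50]
print(sum(rr_allocations(250, 100)))   # 250 -- total CPU time unchanged
```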

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
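A work-conserving round-robin packet scheduler of this kind can be sketched as follows (flow names and packet labels are illustrative):

```python
from collections import deque

# Work-conserving round-robin packet scheduling: visit each flow's queue
# in turn; a flow with no packets is skipped, never waited on.

def rr_transmit(flows):
    """flows: dict flow_name -> deque of packets. Returns transmit order."""
    order = []
    ring = deque(flows.keys())
    while any(flows[f] for f in flows):
        flow = ring[0]
        ring.rotate(-1)            # move on to the next flow for next turn
        if flows[flow]:            # work-conserving: skip empty queues
            order.append(flows[flow].popleft())
    return order

flows = {"A": deque(["A1", "A2", "A3"]), "B": deque(["B1"]),
         "C": deque(["C1", "C2"])}
print(rr_transmit(flows))   # ['A1', 'B1', 'C1', 'A2', 'C2', 'A3']
```

With equally sized packets this yields the max-min fairness described above; with widely varying packet sizes, fair queuing would be preferable.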

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-Robin Scheduling in UNIX follows the same round-robin scheduler concept; such a scheduler can be created by using semaphores

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit

Latency, specifically

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (eg throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end

Types of operating system schedulers

Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in the Main Memory) that is when an attempt is made to execute a program its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - ie whether a high or low amount of processes are to be executed concurrently and how the split between IO intensive and CPU intensive processes is to be handled In modern operating systems this is used to make sure that real time processes get enough CPU time to finish their tasks Without proper real time scheduling modern GUI interfaces would seem sluggish The long term queue exists in the Hard Disk or the Virtual Memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes are to be executed (allocated a CPU) next following a clock interrupt an IO interrupt an operating system call or another form of signal Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice and these are very short This scheduler can be preemptive implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process or non-preemptive (also known as voluntary or co-operative) in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases short-term scheduler is written in assembler because it is a critical part of the operating system

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

First In First Out (FIFO), also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing
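The FCFS behaviour described above can be sketched in C. This is an illustrative sketch, not a kernel implementation; `fcfs_times` is a hypothetical helper name, and all processes are assumed to arrive at time 0:

```c
#include <stddef.h>

/* FCFS: processes run to completion in arrival order.
 * With all processes arriving at t=0, the waiting time of process i
 * is simply the sum of the bursts of everything queued ahead of it. */
void fcfs_times(const int burst[], int wait[], int turnaround[], size_t n)
{
    int t = 0;                      /* current time */
    for (size_t i = 0; i < n; i++) {
        wait[i] = t;                /* waits for all earlier bursts */
        t += burst[i];
        turnaround[i] = t;          /* completion time measured from t=0 */
    }
}
```

For bursts {24, 3, 3} this yields waiting times {0, 24, 27}, showing how one long process at the head of the queue can hog the CPU and inflate everyone else's waiting time.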

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than under FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
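The preemption behaviour described above can be shown with a small time-step simulation. A sketch only, assuming at most 16 processes; `srt_completion` is an illustrative name:

```c
#include <stddef.h>

/* Shortest-remaining-time: at every time step run the arrived process
 * with the least remaining burst, preempting the current one whenever
 * a shorter job appears.  Fills in each process's completion time.
 * Sketch assumption: n <= 16. */
void srt_completion(const int arrival[], const int burst[],
                    int completion[], size_t n)
{
    int remaining[16];
    size_t done = 0;
    int t = 0;
    for (size_t i = 0; i < n; i++) remaining[i] = burst[i];
    while (done < n) {
        int best = -1;
        for (size_t i = 0; i < n; i++)       /* pick shortest arrived job */
            if (arrival[i] <= t && remaining[i] > 0 &&
                (best < 0 || remaining[i] < remaining[best]))
                best = (int)i;
        if (best < 0) { t++; continue; }     /* CPU idle until next arrival */
        remaining[best]--;
        t++;
        if (remaining[best] == 0) { completion[best] = t; done++; }
    }
}
```

With arrivals {0,1,2,3} and bursts {8,4,9,5}, process 1 preempts process 0 at t=1 and the long 9-unit job finishes last, illustrating both preemption and the starvation risk for long processes.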

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
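The dispatch rule of fixed-priority preemptive scheduling reduces to "pick the highest-priority ready task". A minimal sketch, with `fpps_pick` as a hypothetical helper and the convention that a lower number means higher priority:

```c
#include <stddef.h>

/* Fixed-priority preemptive scheduling: the ready task with the
 * highest priority always gets the CPU.  Convention here: lower
 * number = higher priority.  Returns the index of the task to
 * dispatch, or -1 if nothing is ready. */
int fpps_pick(const int priority[], const int ready[], size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (ready[i] && (best < 0 || priority[i] < priority[best]))
            best = (int)i;
    return best;
}
```

Calling this on every scheduling point models the preemption above: the moment a higher-priority task becomes ready, it is the one selected, and a low-priority task can starve while higher-priority tasks keep arriving.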

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
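The fixed-time-unit cycling can be sketched as a small simulation. Illustrative only, assuming at most 16 processes, all ready at t=0; `rr_completion` is a hypothetical name:

```c
#include <stddef.h>

/* Round robin: cycle through the processes, giving each at most
 * `quantum` time units per turn, until every burst is exhausted.
 * Fills in completion times.  Sketch assumption: n <= 16, all
 * processes ready at t=0. */
void rr_completion(const int burst[], int completion[], size_t n, int quantum)
{
    int remaining[16];
    size_t left = n;
    int t = 0;
    for (size_t i = 0; i < n; i++) remaining[i] = burst[i];
    while (left > 0) {
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] <= 0) continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            t += slice;                    /* run for one time slice */
            remaining[i] -= slice;
            if (remaining[i] == 0) { completion[i] = t; left--; }
        }
    }
}
```

For bursts {24, 3, 3} and a quantum of 4, the short jobs finish at t=7 and t=10 instead of waiting behind the 24-unit job as they would under FCFS, which is exactly the "balanced throughput" point above.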

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs; it is also very useful for the shared-memory problem.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification: it allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU.

o A cycle-level simulator tool may be used for faster simulation of parts of the system.

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail.

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

LABEL ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 is at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
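The byte-addressing and byte-order points above can be made concrete in C. A sketch with illustrative helper names (`byte_of_word`, `word_address`); the observed byte order depends on the mode the machine runs in:

```c
#include <stdint.h>

/* Viewing the same 32-bit word byte by byte reveals the byte order:
 * little-endian puts the lowest-order byte at the lowest address,
 * big-endian puts it at the highest. */
uint8_t byte_of_word(uint32_t word, int byte_index)
{
    const uint8_t *p = (const uint8_t *)&word;
    return p[byte_index];          /* byte at address (word + byte_index) */
}

/* Addresses refer to bytes, so word n lives at byte address 4*n. */
uint32_t word_address(uint32_t word_index)
{
    return word_index * 4;
}
```

On a little-endian machine `byte_of_word(0x11223344, 0)` returns 0x44; on a big-endian one it returns 0x11.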

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

LABEL: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared by subsequent operations; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
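The clamping behaviour of saturation mode can be sketched in C. This models the idea only, not the SHARC ALU itself; `sat_add32` is an illustrative name:

```c
#include <stdint.h>

/* Saturating 32-bit signed add: on overflow the result clamps to
 * INT32_MAX / INT32_MIN instead of wrapping around the numeric
 * range, which is what the ALUSAT mode bit selects on the SHARC. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;            /* widen to detect overflow */
    if (s > INT32_MAX) return INT32_MAX;   /* clamp positive overflow */
    if (s < INT32_MIN) return INT32_MIN;   /* clamp negative overflow */
    return (int32_t)s;
}
```

Clamping matters in signal processing because a wrapped overflow flips the sign of a large sample, producing a far worse artifact than a clipped one.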

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that are used to control loading and storing. They are called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset.

The DAGs also support circular buffers which are commonly used in signal processing
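Post-modify addressing over a circular buffer can be sketched in C. This only mimics the idea of an I register sweeping a delay line; the names (`dag_index`, `circ_fetch`) are illustrative, not SHARC register names:

```c
#include <stddef.h>

/* Post-modify with update over a circular buffer: fetch at the
 * current index, then add the modifier m and wrap into the buffer
 * length, the way a DAG I register sweeps a signal-processing
 * delay line. */
typedef struct { size_t i, len; } dag_index;

int circ_fetch(const int buf[], dag_index *idx, size_t m)
{
    int v = buf[idx->i];                  /* use the current address */
    idx->i = (idx->i + m) % idx->len;     /* then post-modify, with wrap */
    return v;
}
```

Circular addressing lets a filter keep the last N samples in a fixed buffer and step through them without ever testing for the end of the array explicitly.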

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
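What bit-reversal addressing computes can be shown directly; a sketch, with `bit_reverse` as an illustrative helper:

```c
#include <stdint.h>

/* Bit-reversed addressing as used for FFT reordering: reverse the
 * low `bits` bits of an index.  For an 8-point FFT (3 bits),
 * index 1 (001b) maps to 4 (100b). */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (index & 1);   /* shift low bits out in reversed order */
        index >>= 1;
    }
    return r;
}
```

A radix-2 FFT produces (or consumes) its data in exactly this permuted order, which is why DSPs provide the address computation in hardware.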

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: a counting semaphore, which can take any non-negative integer value, and a binary semaphore, which takes only the values 0 and 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
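The create/acquire/release calls above can be reduced to their integer core. A single-threaded sketch only: a real kernel makes acquire/release atomic and blocks the caller instead of returning failure, and these function names are illustrative, not any RTOS API:

```c
/* A counting semaphore reduced to its integer core.  With an
 * initial count of 1 it behaves as a binary semaphore. */
typedef struct { int count; } semaphore;

void sem_create(semaphore *s, int initial) { s->count = initial; }

/* Returns 1 on success; a real RTOS would block the task here
 * instead of returning 0. */
int sem_acquire(semaphore *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

void sem_release(semaphore *s) { s->count++; }
```

Resource synchronization uses the counter to guard a shared resource; task synchronization uses a semaphore created with count 0, so one task blocks in acquire until another task releases.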

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message and, based on the application, the highest-priority task or the first task waiting in the queue takes the message.
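The "array of mailboxes" view can be sketched as a ring buffer. Illustrative names and a fixed queue length; the waiting lists of a real kernel queue are omitted:

```c
#include <stddef.h>

#define QLEN 4   /* queue length, fixed at creation as in the notes */

/* A message queue as an array of mailboxes: tasks or ISRs post at
 * the tail, a reader takes from the head, in FIFO order. */
typedef struct { int msg[QLEN]; size_t head, tail, count; } msg_queue;

int mq_send(msg_queue *q, int m)
{
    if (q->count == QLEN) return 0;        /* queue full */
    q->msg[q->tail] = m;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

int mq_receive(msg_queue *q, int *m)
{
    if (q->count == 0) return 0;           /* queue empty */
    *m = q->msg[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```

An ISR reading a keyboard, for instance, would call `mq_send` with each scan code, and the task that processes keystrokes would drain the queue with `mq_receive` at its own pace.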

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
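Unlike a message queue, a pipe carries an unstructured byte stream. A sketch of the read/write calls above, with hypothetical names and a small fixed capacity; a real kernel pipe would also block readers and writers:

```c
#include <stddef.h>

#define PIPE_CAP 8

/* A pipe as a byte stream between two tasks: one task writes bytes
 * in, the other reads them out in the same order. */
typedef struct { unsigned char buf[PIPE_CAP]; size_t r, w, used; } pipe_t;

size_t pipe_write(pipe_t *p, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n && p->used < PIPE_CAP; i++) {
        p->buf[p->w] = src[i];
        p->w = (p->w + 1) % PIPE_CAP;
        p->used++;
    }
    return i;                               /* bytes actually written */
}

size_t pipe_read(pipe_t *p, unsigned char *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n && p->used > 0; i++) {
        dst[i] = p->buf[p->r];
        p->r = (p->r + 1) % PIPE_CAP;
        p->used--;
    }
    return i;                               /* bytes actually read */
}
```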

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. While keeping the architecture simple, a number of non-RISC features have been introduced to improve performance:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15: the program counter, but it can also be manipulated as a general-purpose register

R13: used as the stack pointer

R14: has special significance and is called the link register. When a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits control normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes
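The CPSR layout described above can be decoded with a few shifts and masks. A sketch, using the standard ARM bit positions (N, Z, C, V in bits 31 to 28, mode field in bits 4 to 0); the struct and function names are illustrative:

```c
#include <stdint.h>

/* Decoding a CPSR value: the four condition flags sit in the top
 * four bits and the execution mode in the bottom five. */
typedef struct { int n, z, c, v; unsigned mode; } cpsr_fields;

cpsr_fields cpsr_decode(uint32_t cpsr)
{
    cpsr_fields f;
    f.n = (cpsr >> 31) & 1;     /* negative flag */
    f.z = (cpsr >> 30) & 1;     /* zero flag */
    f.c = (cpsr >> 29) & 1;     /* carry flag */
    f.v = (cpsr >> 28) & 1;     /* overflow flag */
    f.mode = cpsr & 0x1F;       /* execution mode field */
    return f;
}
```

For example, the mode field value 0x10 is user mode, and a CPSR of 0x60000010 has Z and C set with N and V clear.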

DATA TYPES

The ARM instruction set supports six different data types, namely:

o 8-bit signed and unsigned

o 16-bit signed and unsigned

o 32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets 32-bit ARM and 16-bit THUMB

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decomposed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, excepting the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized in 16 bits. However, ARM is faster when the memory is organized in 32 bits

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the registers, a program counter, and a stack. All other data, which includes global, static, initialized, and uninitialized variables and everything else, is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
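The shared-data problem can be shown in miniature. This sketch only models the idea: `interrupts_off`/`interrupts_on` are stand-ins for the real enable/disable-interrupt calls, and the two-part value stands for any data the task code cannot read in one indivisible step:

```c
/* Task code reads a two-part shared value that an ISR may update
 * between the two reads.  Wrapping the reads in an atomic section
 * (interrupts disabled) keeps the pair consistent. */
static volatile int shared_hi, shared_lo;
static int interrupts_enabled = 1;

static void interrupts_off(void) { interrupts_enabled = 0; }
static void interrupts_on(void)  { interrupts_enabled = 1; }

/* What the interrupt routine does: update both halves. */
void isr_update(int hi, int lo)
{
    shared_hi = hi;
    shared_lo = lo;
}

/* Atomic section: no interrupt can split the two reads apart. */
int read_shared_atomic(void)
{
    int hi, lo;
    interrupts_off();
    hi = shared_hi;
    lo = shared_lo;
    interrupts_on();
    return hi * 100 + lo;
}
```

Without the atomic section, an interrupt landing between the two reads could update both halves and leave the task with a mixed old/new value.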

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
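The first rule above can be illustrated by contrasting two small functions; both names are invented for illustration:

```c
/* Reentrant by the rules above: this function touches only its
 * parameters and locals, which live on the calling task's stack,
 * so two tasks may be inside it at once without interfering. */
int reentrant_average(int a, int b)
{
    int sum = a + b;          /* local: each caller has its own copy */
    return sum / 2;
}

/* Counter-example: the static accumulator is shared by every
 * caller and is used non-atomically, so a task switch in the
 * middle of a call can corrupt it - not reentrant. */
int nonreentrant_total(int x)
{
    static int total;         /* shared data, non-atomic use */
    total += x;
    return total;
}
```

If two tasks call `nonreentrant_total` and the RTOS switches between the read and the write of `total`, one task's update is lost; `reentrant_average` has no such window.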

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other is started.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it is the first model proposed for the software development process.

This model has five major phases

o Requirements: analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS

Choosing an Architecture
The best architecture depends on several factors:

o Real-time requirements of the application (absolute response time)

o Available hardware (speed, features)

o Number and complexity of different software features

o Number and complexity of different peripherals

o Relative priority of features

Architecture selection is a tradeoff between complexity and control over response and priority.

Round Robin
1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)

Round Robin
Pros: simple; no shared data; no interrupts.
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response times (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality: adding one more device to the loop may break everything

Round Robin Uses

o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly

Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service the hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities

Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements are better met
Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this?

Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., A B A C A D)
o This improves the response of A, but increases the latency of the other tasks
o Move some task code into the interrupt
o This slows the response of lower-priority interrupts, and it may not be possible to ensure that lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Best response time can be improved by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)

Real Time Operating System

Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response

Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSes generally come with vendor tools
Cons:
o The RTOS has a cost
o Added processing time
o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks.

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This continues until the job finishes and needs no more time on the CPU.

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions are improved, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can also be created by using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

o Throughput - number of processes that complete their execution per time unit

o Latency, specifically:

o Turnaround - total time between submission of a process and its completion

o Response time - amount of time it takes from when a request was submitted until the first response is produced

o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high for the same reason.

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process in a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit.

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time depends on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements, and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview

Scheduling algorithm           CPU overhead   Throughput   Turnaround time   Response time
First In First Out             Low            Low          High              High
Shortest Job First             Medium         High         Medium            Medium
Priority-based scheduling      Medium         Low          High              High
Round-robin scheduling         High           Medium       Medium            High
Multilevel queue scheduling    High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. They can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that could perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment
label ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann architecture machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Thus word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs using the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word is 32 bits, and an address is 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits, but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. A shift operation sets the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.
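The circular-buffer mode is essentially a post-modify update with wraparound, which the DAGs perform in hardware. A minimal C sketch of the idea (buffer length, names, and the float payload are illustrative):

```c
/* Sketch of circular-buffer (modulo) addressing, as used for
   signal-processing delay lines.  The index plays the role of an
   I register: it is post-modified after each access and wraps
   around the buffer length. */
#define BUF_LEN 8

static float delay_line[BUF_LEN];
static unsigned idx;                   /* current position, like an I register */

void circ_write(float sample)
{
    delay_line[idx] = sample;
    idx = (idx + 1) % BUF_LEN;         /* post-modify with wraparound */
}
```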

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority ready task (in a preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
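The three definitions above are simple sums, which can be written out directly. The cycle counts used below are hypothetical; only the formulas come from the text:

```c
/* The preemptive-kernel timing definitions as arithmetic.
   All arguments are hypothetical cycle/time counts. */
unsigned response_time_preemptive(unsigned latency, unsigned save_ctx)
{
    return latency + save_ctx;             /* latency + context save */
}

unsigned recovery_time_preemptive(unsigned check_ready,
                                  unsigned restore_ctx,
                                  unsigned ret_insn)
{
    /* check for higher-priority ready task + restore its context
       + execute the return-from-interrupt instruction */
    return check_ready + restore_ctx + ret_insn;
}
```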

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: a counting semaphore, which can take any non-negative integer value, and a binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
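The create/acquire/release semantics of the calls above can be modeled with a minimal single-threaded sketch. This is illustrative only (a real RTOS semaphore would block the calling task rather than return failure, and all names here are assumptions):

```c
/* Minimal single-threaded model of a counting semaphore, showing the
   semantics of the management calls listed above.  Illustrative only:
   a real kernel would block the caller instead of returning 0. */
typedef struct { int count; } ksem_t;

void ksem_create(ksem_t *s, int initial) { s->count = initial; }

int ksem_acquire(ksem_t *s)               /* returns 1 on success */
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;                             /* would block in a real kernel */
}

void ksem_release(ksem_t *s) { s->count++; }
```

A binary semaphore is the special case where the count is created as 0 or 1 and never exceeds 1.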

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message.
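The "array of mailboxes" idea can be sketched as a fixed-length FIFO in C. This is illustrative only: a real RTOS queue would also maintain the two waiting lists, and the names and int payload are assumptions:

```c
/* Minimal fixed-length message queue (FIFO of "mailboxes").
   In a real RTOS, a full queue would put the sender on the
   sending-task waiting list, and an empty queue would put the
   receiver on the receiving-task waiting list. */
#define Q_LEN 4

typedef struct {
    int msgs[Q_LEN];
    int head, tail, count;
} msgq_t;

int msgq_send(msgq_t *q, int msg)         /* returns 1 on success */
{
    if (q->count == Q_LEN) return 0;      /* queue full: sender would wait */
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % Q_LEN;
    q->count++;
    return 1;
}

int msgq_receive(msgq_t *q, int *msg)     /* returns 1 on success */
{
    if (q->count == 0) return 0;          /* queue empty: receiver would wait */
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 1;
}
```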

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
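On a POSIX host, the same producer/consumer idea can be tried out with the standard pipe() call. This is a host-side illustration, not an RTOS API (an RTOS supplies its own create/open/read/write calls as listed above):

```c
#include <string.h>
#include <unistd.h>

/* Host-side illustration of a pipe: data written on one end is read
   from the other in FIFO order.  A small write fits in the kernel's
   pipe buffer, so a single process can demonstrate the round trip. */
int pipe_roundtrip(const char *msg, char *out, int outlen)
{
    int fds[2];                           /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0) return -1;

    ssize_t w = write(fds[1], msg, strlen(msg));  /* producer's output */
    (void)w;
    int n = (int)read(fds[0], out, (size_t)(outlen - 1));
    out[n >= 0 ? n : 0] = '\0';           /* consumer's input */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```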

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple while improving performance, a number of non-RISC features have been introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode:

R15 ----------- it is the program counter, but it can be manipulated as a general-purpose register

R13----------- it is used as a stack pointer

R14 ----------- it has special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags control the normal and fast interrupts, respectively (each interrupt source is disabled while its flag is set). The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply-accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
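Conditional execution lets short if/else sequences compile without any branch. The C function below is the kind of code that benefits; the ARM mapping in the comment uses standard mnemonics, but the exact code a compiler emits is an assumption:

```c
/* A short if/else that ARM conditional execution can turn branchless.
   With flags set by a compare, a compiler can emit something like:
       CMP   R0, R1
       MOVLT R0, R1      ; executed only if R0 < R1 (signed)
   instead of a conditional branch around the assignment. */
int max_signed(int a, int b)
{
    if (a < b)
        a = b;            /* candidate for a conditional MOV */
    return a;
}
```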

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o THUMB instructions are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they then signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
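The classic shared-data bug and its fix can be sketched as follows. The interrupt-control macros are stand-ins for real hardware intrinsics, and the hours/minutes example is illustrative, not from the text:

```c
/* Sketch of the shared-data problem.  The macros are no-op stand-ins
   for real interrupt-control intrinsics (e.g. CPSID/CPSIE on ARM). */
#define DISABLE_INTERRUPTS()
#define ENABLE_INTERRUPTS()

static volatile int hours, minutes;   /* updated together by a timer ISR */

/* Buggy version: the ISR can fire between the two reads, so the task
   may combine an old hours value with a new minutes value - a time
   that never existed.  The use of the shared data is not atomic. */
int read_time_buggy(void)
{
    return hours * 60 + minutes;
}

/* Fixed version: the two reads are wrapped in an atomic section. */
int read_time_atomic(void)
{
    DISABLE_INTERRUPTS();
    int t = hours * 60 + minutes;
    ENABLE_INTERRUPTS();
    return t;
}
```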

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
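The first rule is easiest to see with a static buffer. Both versions below are illustrative (names and the hex-formatting task are assumptions); the second keeps all state on the caller's stack and so satisfies the rules above:

```c
/* Non-reentrant vs. reentrant versions of the same helper. */
static char shared_buf[3];            /* one copy shared by all tasks: unsafe */

char *byte_to_hex_nonreentrant(unsigned char b)
{
    static const char d[] = "0123456789ABCDEF";
    shared_buf[0] = d[b >> 4];
    shared_buf[1] = d[b & 0x0F];
    shared_buf[2] = '\0';
    return shared_buf;                /* another task may overwrite this
                                         if the RTOS switches mid-call */
}

void byte_to_hex_reentrant(unsigned char b, char out[3])
{
    static const char d[] = "0123456789ABCDEF";  /* read-only: safe to share */
    out[0] = d[b >> 4];               /* all writable state lives in the     */
    out[1] = d[b & 0x0F];             /* caller-supplied buffer (its stack)  */
    out[2] = '\0';
}
```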

Task states under an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine, or some other task in the system, must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks with the same priority are ready?

It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task runs until it goes to the blocked state before the other gets a turn.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the design and integrates the pieces

o Testing uncovers the bugs

o Maintenance entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles, at the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long; this matters when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps ensure that the product best meets customers' needs.

ROUND ROBIN (PROS)

o No latency issues (other than waiting for other devices to be serviced)

Round Robin with Interrupts

Pros:

o Still relatively simple

o Hardware timing requirements better met

Cons:

o All task code still executes at the same priority

o Maximum delay unchanged

o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur

How could you fix this?

Round Robin with Interrupts: Adjustments

o Change the order in which flags are checked (e.g. A, B, A, C, A, D)

  Improves the response of A

  Increases the latency of the other tasks

o Move some task code to an interrupt

  Decreases the response time of lower-priority interrupts

  May not be able to ensure that lower-priority interrupt code executes fast enough

Function Queue Scheduling Architecture

o Interrupts add function pointers to a queue

o The main routine reads the queue and executes the calls

Pros:

o The main routine can use any algorithm to choose the order in which to execute functions (not necessarily FIFO)

o Better response time for the highest-priority task = length of the longest function code

o Best response time can be improved by cutting long functions into several pieces

Cons:

o Worse response time for lower-priority code (no guarantee it will actually run)

Real Time Operating System

Architecture:

o Most complex

o Interrupts handle urgent operations, then signal that there is more work to do for the task code

o Differences with the previous architectures:

  We don't write signaling flags (the RTOS takes care of it)

  No loop in our code decides what is executed next (the RTOS does this)

  The RTOS knows the relative task priorities and controls what is executed next

  The RTOS can suspend a task in the middle to execute code of higher priority

  Now we can control task response AND interrupt response

Real Time Operating Systems

Pros:

o Worst-case response time for the highest-priority function is zero

o The system's high-priority response time stays relatively stable when extra functionality is added

o Useful functionality is pre-written

o Generally come with vendor tools

Cons:

o The RTOS has a cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling.

In sports tournaments and other games, round-robin scheduling arranges to have all teams or players take turns playing each other, with the winner emerging from the succession of events.

A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute, and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round-robin stories, with each person posting the next part of the story as part of an online conference thread.

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks.

The name of the algorithm comes from the round-robin principle known from other fields, where each person takes an equal share of something in turn.

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds, and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time, and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = total time to complete: 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms
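The worked example can be checked with a tiny simulation. This sketch only counts how many quanta a job consumes; names are illustrative:

```c
/* Count the time slices a job needs under round-robin scheduling:
   the job runs for one quantum (or whatever remains) per turn. */
int rr_slices(int total_ms, int quantum_ms)
{
    int slices = 0;
    while (total_ms > 0) {
        total_ms -= quantum_ms;   /* one turn on the CPU */
        slices++;
    }
    return slices;
}
```

For the example above, a 250 ms job with a 100 ms quantum needs three slices, the last one only partially used.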

Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.

Data packet scheduling

In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come, first-served queuing.

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow takes its place; hence the scheduling tries to prevent link resources from going unused.
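The work-conserving behavior (empty flows are skipped so the link never idles) can be sketched as follows. Packets are modeled as ints and all names are illustrative:

```c
/* Work-conserving round-robin over per-flow queues: on each pass,
   every flow that still has packets sends one; empty flows are
   skipped so the shared channel never goes unused. */
#define NFLOWS  3
#define MAXPKTS 8

typedef struct { int pkts[MAXPKTS]; int head, count; } flow_t;

/* Drain all queues in round-robin order into out[]; returns packets sent. */
int rr_drain(flow_t flows[NFLOWS], int out[])
{
    int sent = 0, remaining = 1;
    while (remaining) {
        remaining = 0;
        for (int f = 0; f < NFLOWS; f++) {
            if (flows[f].count > 0) {          /* skip empty flows */
                out[sent++] = flows[f].pkts[flows[f].head++];
                flows[f].count--;
                remaining |= (flows[f].count > 0);
            }
        }
    }
    return sent;
}
```

With flows holding [1,2,3], [4], and [5,6], the packets leave in the interleaved order 1, 4, 5, 2, 6, 3.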

Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another: a user that produces large packets would be favored over other users. In that case, fair queuing would be preferable.

If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.

In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes, such as token ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum-throughput scheduling; note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX: the same round-robin scheduler concept applies, and it can be implemented using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously).

The scheduler is concerned mainly with:

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround - the total time between submission of a process and its completion

o Response time - the amount of time it takes from when a request is submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice, these goals often conflict (e.g. throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the concerns mentioned above, depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates which processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUIs would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings, 399].

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, has a low priority, is page-faulting frequently, or is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings, 396] [Stallings, 370].

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler, because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come, first-served queuing of data packets.

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information: if the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with the assignment of OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time and response time can be high, for the same reasons.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.
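The waiting-time behaviour above can be sketched in C; a toy model with made-up burst times, all jobs assumed ready at t = 0:

```c
#include <stdio.h>

/* FCFS toy model: each job waits for the sum of the bursts queued
   ahead of it. Burst values are illustrative, not from the text. */
void fcfs_waiting(const int burst[], int n, int wait[]) {
    wait[0] = 0;
    for (int i = 1; i < n; i++)
        wait[i] = wait[i - 1] + burst[i - 1];
}
```

With bursts {24, 3, 3}, the waits come out as 0, 24 and 27: the long job arriving first hogs the CPU, which is exactly why throughput and waiting time can be poor under FIFO.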

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimations about, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
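For contrast with FCFS, a non-preemptive shortest-job-first sketch (a toy model; with the same bursts {24, 3, 3}, total waiting drops from 51 to 9):

```c
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* Non-preemptive SJF for jobs all ready at t = 0: run the shortest
   burst first, so no job waits behind the longest one.
   Returns the total waiting time; sorts burst[] in place. */
int sjf_total_wait(int burst[], int n) {
    qsort(burst, n, sizeof burst[0], cmp_int);
    int total = 0, wait = 0;
    for (int i = 1; i < n; i++) {
        wait += burst[i - 1];   /* each job waits for all shorter ones */
        total += wait;
    }
    return total;
}
```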

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is not minimal, nor is it significant; FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
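The selection step can be sketched as a scan of the ready queue; this is a sketch only, and the convention that a lower number means higher priority is an assumption of this example:

```c
/* Return the index of the highest-priority ready task, or -1 if no
   task is ready. Lower number = higher priority (assumed convention). */
int pick_highest(const int prio[], const int ready[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (ready[i] && (best < 0 || prio[i] < prio[best]))
            best = i;
    return best;
}
```

Starvation is visible here: a low-priority task is picked only when nothing with a higher priority is ready.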

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. It gives balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not on average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
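A round-robin toy model in C (burst times and the quantum are made-up figures; all jobs are assumed ready at t = 0, and at most 16 jobs are supported):

```c
/* Cycle through the jobs, giving each at most one quantum per pass,
   and record each job's completion time. */
void rr_completion(const int burst[], int n, int quantum, int done[]) {
    int remain[16];                 /* n is assumed to be <= 16 */
    int t = 0, left = n;
    for (int i = 0; i < n; i++) remain[i] = burst[i];
    while (left > 0) {
        for (int i = 0; i < n; i++) {
            if (remain[i] == 0) continue;
            int slice = remain[i] < quantum ? remain[i] : quantum;
            t += slice;             /* run this job for one slice */
            remain[i] -= slice;
            if (remain[i] == 0) { done[i] = t; left--; }
        }
    }
}
```

With bursts {24, 3, 3} and a quantum of 4, the completion times are 30, 7 and 10: the short jobs finish far earlier than under FCFS, while the long job finishes last, as the balanced-throughput remark above suggests.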

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements, and so may have different scheduling needs. It is very useful for shared-memory problems.

Overview:

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATORS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, from which the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of the breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

        LDR r0, [r8]      ; a comment
Label   ADD r4, r0, r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data:

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
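The two byte orders can be observed from C by aliasing a word with a byte pointer; which value appears depends on the machine this sketch runs on:

```c
#include <stdint.h>

/* Return the byte stored at the lowest address of the word 0x0A0B0C0D.
   Little-endian machines give 0x0D (lowest-order byte at the low
   address); big-endian machines give 0x0A. */
uint8_t first_byte(void) {
    uint32_t word = 0x0A0B0C0Du;
    return *(const uint8_t *)&word;
}
```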

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example

       R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
Label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY) and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits, but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
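A C model of what saturating addition does when ALUSAT is set (a host-side sketch, not SHARC code):

```c
#include <stdint.h>

/* Saturating 32-bit signed add: widen to 64 bits so the true sum is
   exact, then clamp to the representable range instead of wrapping. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```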

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs); one serves data memory and the other program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
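The post-modify and circular-buffer behaviour can be modelled in C (the names i, m and len mirror the I register, modifier and buffer length described above; this is a sketch of the idea, not SHARC syntax):

```c
/* Post-modify with circular wrap: the current index is used for the
   access, then advanced by the modifier, wrapping inside a buffer of
   length len - the access pattern of an FIR filter delay line.
   Assumes 0 <= i < len and 0 <= m < len. */
int circ_advance(int i, int m, int len) {
    return (i + m) % len;
}
```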

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed instead. In an RTOS, the interrupt latency, interrupt response time and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU context (registers).

Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority ready task (preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
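The preemptive-kernel relations above, written out as arithmetic; the microsecond figures in the usage note are made up purely for illustration:

```c
/* Preemptive-kernel interrupt response: latency plus context save. */
int response_time_us(int latency, int save_context) {
    return latency + save_context;
}

/* Preemptive-kernel interrupt recovery: priority check, plus context
   restore of the highest-priority task, plus return-from-interrupt. */
int recovery_time_us(int check_ready, int restore_context, int reti) {
    return check_ready + restore_context + reti;
}
```

For example, a 10 us latency and a 4 us context save give a 14 us response time.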

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
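A host-side sketch of the call sequence, using POSIX semaphores and threads as stand-ins for the RTOS objects (sem_init ~ create, sem_wait ~ acquire, sem_post ~ release, sem_destroy ~ delete); the counter and iteration count are made up:

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t lock;                /* binary semaphore guarding counter */
static long counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&lock);          /* acquire */
        counter++;                /* critical section: non-atomic update */
        sem_post(&lock);          /* release */
    }
    return NULL;
}

/* Two tasks increment a shared counter; the semaphore serializes the
   updates, so no increments are lost. Returns the final count. */
long run_demo(void) {
    pthread_t t1, t2;
    counter = 0;
    sem_init(&lock, 0, 1);        /* initial value 1 -> binary semaphore */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&lock);
    return counter;
}
```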

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can then take the message.
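The mailbox-array view can be modelled with a small ring buffer in C (a sketch of the idea, not a real RTOS API; the queue length of 4 is arbitrary):

```c
#define QLEN 4                       /* queue length chosen at creation */

typedef struct {
    int buf[QLEN];
    int head, tail, count;
} msgq_t;

/* Deposit a message; returns 0 on success, -1 if the queue is full. */
int queue_send(msgq_t *q, int msg) {
    if (q->count == QLEN) return -1;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 0;
}

/* Take the oldest message; returns 0 on success, -1 if empty. */
int queue_receive(msgq_t *q, int *msg) {
    if (q->count == 0) return -1;
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 0;
}
```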

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
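A host-side sketch of the idea using the POSIX pipe() call; here the producer and consumer ends live in one process for simplicity:

```c
#include <unistd.h>
#include <string.h>

/* Write a NUL-terminated message into one end of a pipe and read it
   back from the other. Returns the number of bytes read, or -1. */
int pipe_roundtrip(const char *msg, char *out, int outlen) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    write(fd[1], msg, strlen(msg) + 1);    /* producer task's end */
    int n = (int)read(fd[0], out, outlen); /* consumer task's end */
    close(fd[0]);
    close(fd[1]);
    return n;
}
```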

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large, uniform register file with 16 general-purpose registers

o A load/store architecture: the instructions that process data operate only on the registers, and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple while improving performance, a number of non-RISC features have been introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow up to 16 registers to be loaded/stored at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 ----------- the program counter, but it can be manipulated as a general-purpose register

R13 ----------- used as the stack pointer

R14 ----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor

The I and F flags, when set, mask (disable) normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. Multiple register load/store operations, on the other hand, can be carried out via multiple register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is: SWI n

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, the sharing of data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function, or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
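The first rule can be illustrated in C (hypothetical functions; the unsafe version keeps its state in a shared variable, the safe one keeps it in the caller's own storage, typically the calling task's stack):

```c
static int shared_total = 0;        /* shared between tasks */

/* Non-reentrant: two tasks calling this concurrently interleave their
   read-modify-write sequences on shared_total and can lose updates. */
int add_unsafe(int x) {
    shared_total += x;
    return shared_total;
}

/* Reentrant: the running total lives in storage owned by the caller,
   so tasks cannot interfere with each other. */
int add_safe(int *callers_total, int x) {
    *callers_total += x;
    return *callers_total;
}
```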

Task states in an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines unblock tasks: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked

RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one

If a higher priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it is the first model proposed for the software development process.

This model has five major phases

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.

Real Time Operating Systems

Pros:

o Worst case response time for the highest priority function is zero

o The system's high priority response time stays relatively stable when extra functionality is added

o Useful functionality comes pre-written

o Generally come with vendor tools

Cons:

o An RTOS has cost

o Added processing time

o Code out of your control may contain bugs

ROUND ROBIN

A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.

In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.

In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events

A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread

Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks

The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU

Job1 = Total time to complete 250 ms (quantum 100 ms)

1 First allocation = 100 ms

2 Second allocation = 100 ms

3 Third allocation = 100 ms but job1 self-terminates after 50 ms

4 Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer switch or router that provides round-robin scheduling has a separate queue for every data flow where a data flow may be identified by its source and destination address The algorithm lets every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order The scheduling is work-conserving meaning that if one flow is out of packets the next data flow will take its place Hence the scheduling tries to prevent link resources from going unused

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows the same round-robin scheduler concept, and it can be implemented by using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with

Throughput - the number of processes that complete their execution per time unit

Latency, specifically:

o Turnaround - the total time between submission of a process and its completion

o Response time - the amount of time it takes from when a request was submitted until the first response is produced

Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether many or few processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time, waiting time, and response time can be high for the same reasons as above.

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC (program counter) reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8]        ; a comment

LABEL ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at location 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment

LABEL: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared by later operations; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two sets of special registers used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAGI while registers 8 through 15 belong to DAG2

The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed next. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority ready task (in a preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority ready task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

A semaphore is a kernel object that is used for both resource synchronization and task synchronization.

It is essentially just an integer. Semaphores are of two types: the counting semaphore, whose value can be greater than one, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking input from a keyboard

o Displaying output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, the highest-priority task or the first task waiting in the queue takes the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements that improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o Load-store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while still improving performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading or storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features result in high performance, small code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15: the program counter, which can nevertheless be manipulated as a general-purpose register

R13: used as the stack pointer

R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags mask (disable) normal and fast interrupts, respectively, when set. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code; once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if a fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operating modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location; multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long integer multiplication (64-bit result)

o Multiply-accumulate instructions

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o The instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences from ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Lower power consumption

o Less memory occupied

o THUMB is faster when memory is organized as 16 bits wide; however, ARM is faster when memory is organized as 32 bits wide

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed: in this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: the problem that arises when an interrupt routine and the task code (or two or more tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.

Task states in an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is run.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

The model has five major phases:

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the components and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

The model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It works well only when the early design phases can rely on good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Elements of the concurrent engineering effort:

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware and software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.

ROUND ROBIN

Process scheduling

In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.

Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.

Job1 = total time to complete: 250 ms (quantum 100 ms)

1. First allocation = 100 ms

2. Second allocation = 100 ms

3. Third allocation = 100 ms, but job1 self-terminates after 50 ms

4. Total CPU time of job1 = 250 ms

Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time

Data packet scheduling

In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing

A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.

Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable

If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered

In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station

In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation

Round-robin scheduling in UNIX follows this same concept, and such a scheduler can be created by using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously).

The scheduler is concerned mainly with:

Throughput: the number of processes that complete their execution per unit time

Latency, specifically:

o Turnaround time: the total time between submission of a process and its completion

o Response time: the amount of time from when a request is submitted until the first response is produced

Fairness/waiting time: equal CPU time to each process (or, more generally, appropriate times according to each process's priority)

In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time, i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings, 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process with a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or cooperative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembly language because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher, the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

Switching context

Switching to user mode

Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spoolers), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First-Come, First-Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reason.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.
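The behaviour described above can be checked with a small simulation. The following is a minimal Python sketch (the function and variable names are illustrative, not part of any OS API) that computes per-process waiting and turnaround times under FCFS:

```python
def fcfs_metrics(arrivals, bursts):
    """Simulate First-Come, First-Served scheduling.

    arrivals/bursts list each process's arrival time and CPU burst,
    in arrival order. Returns (waiting, turnaround) time lists.
    """
    t = 0
    waiting, turnaround = [], []
    for arr, burst in zip(arrivals, bursts):
        t = max(t, arr)           # CPU may idle until the process arrives
        waiting.append(t - arr)   # time spent in the ready queue
        t += burst                # run the process to completion
        turnaround.append(t - arr)
    return waiting, turnaround

# One long early job makes everyone behind it wait (the "convoy effect"):
w, tat = fcfs_metrics([0, 1, 2], [10, 1, 1])
print(w, tat)  # [0, 9, 9] [10, 10, 10]
```

The output illustrates why throughput and waiting times suffer: the two 1-unit jobs each wait 9 time units behind the 10-unit job.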

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios.

Waiting time and response time increase as a process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
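The preemption behaviour can be sketched as a unit-time simulation; this is an illustrative Python model, not production scheduler code:

```python
def srt_schedule(arrivals, bursts):
    """Shortest-remaining-time (preemptive SJF): at each time unit, run
    the arrived process with the least work left. Returns finish times."""
    n = len(bursts)
    remaining = list(bursts)
    finish = [0] * n
    t = 0
    while any(r > 0 for r in remaining):
        ready = [i for i in range(n) if arrivals[i] <= t and remaining[i] > 0]
        if not ready:
            t += 1                              # CPU idles until an arrival
            continue
        i = min(ready, key=lambda j: remaining[j])
        remaining[i] -= 1                       # one time unit for job i
        t += 1
        if remaining[i] == 0:
            finish[i] = t
    return finish

# The short job arriving at t=1 preempts the long one started at t=0:
print(srt_schedule([0, 1], [5, 2]))  # [7, 3]
```

The long job is split into two computing blocks (t=0..1 and t=3..7), exactly the preemption overhead described above.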

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. Throughput sits between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time; waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.
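Round-robin cycling can be sketched with a ready queue and a fixed quantum; again a hedged Python illustration rather than real scheduler code:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Round-robin with a fixed time quantum; all processes arrive at t=0.
    Returns each process's completion time."""
    ready = deque((i, b) for i, b in enumerate(bursts))
    t = 0
    done = [0] * len(bursts)
    while ready:
        i, remaining = ready.popleft()
        run = min(quantum, remaining)
        t += run                                 # run for at most one quantum
        if remaining > run:
            ready.append((i, remaining - run))   # back of the ready queue
        else:
            done[i] = t
    return done

# The 1-unit job finishes at t=5 here, versus t=9 under FCFS:
print(round_robin([5, 3, 1], quantum=2))  # [9, 8, 5]
```

This shows the balanced-throughput claim: the short job beats its FCFS finish time, while the long job finishes no later than under SJF-style starvation-free ordering.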

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory systems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures.

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

        LDR r0,[r8]        ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
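The byte addressing and the two endian modes can be illustrated on a host machine. This is a Python sketch of the layouts, not ARM code; `word_address` is a helper invented here for illustration:

```python
import struct
import sys

# Pack the 32-bit word 0x01020304 both ways and inspect the byte layout:
word = 0x01020304
print(struct.pack('<I', word).hex())  # little-endian: '04030201'
print(struct.pack('>I', word).hex())  # big-endian:    '01020304'

# Word n of a byte-addressed 32-bit memory lives at byte address 4*n:
def word_address(n):
    return 4 * n

print([word_address(n) for n in (0, 1, 2)])  # [0, 4, 8]
print(sys.byteorder)  # byte order of the host running this sketch
```

In little-endian mode the lowest-order byte (0x04) occupies the lowest byte address, matching the description above.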

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex.

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
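The difference between saturating and wrap-around arithmetic can be sketched in a few lines; this is a generic 32-bit illustration in Python, not SHARC-specific behaviour:

```python
INT32_MAX = (1 << 31) - 1
INT32_MIN = -(1 << 31)

def sat_add32(a, b):
    """32-bit saturating add: clamp to the range limits on overflow."""
    s = a + b
    return max(INT32_MIN, min(INT32_MAX, s))

def wrap_add32(a, b):
    """Ordinary two's-complement wrap-around, shown for contrast."""
    s = (a + b) & 0xFFFFFFFF
    return s - (1 << 32) if s >= (1 << 31) else s

print(sat_add32(INT32_MAX, 1))   # 2147483647: pegged at the maximum
print(wrap_add32(INT32_MAX, 1))  # -2147483648: wrapped around
```

Saturation is preferred in signal processing because a clipped sample distorts far less than one that wraps to the opposite extreme.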

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
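The effect of circular-buffer and bit-reversal addressing can be sketched in Python; the helper names and the base/length values below are illustrative, standing in for a DAG's index, modify, base and length registers:

```python
def circular_update(i, m, base, length):
    """Post-modify with update, wrapped inside a circular buffer of
    `length` words starting at `base` (a sketch of what a DAG does)."""
    return base + (i - base + m) % length

def bit_reverse(i, bits):
    """Bit-reversed index, as used to reorder FFT inputs or results."""
    return int(format(i, f'0{bits}b')[::-1], 2)

# Sweeping a 4-word circular buffer at address 100 with modifier +1:
addr, trace = 100, []
for _ in range(6):
    trace.append(addr)
    addr = circular_update(addr, 1, base=100, length=4)
print(trace)                                  # [100, 101, 102, 103, 100, 101]
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The wrap from 103 back to 100 is what makes circular buffers ideal for FIR delay lines, and the bit-reversed index sequence is the access order an 8-point FFT needs.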

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are: tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU registers (the context).

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority ready task (in a preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
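The acquire/release pair can be sketched on a host using Python's `threading.Semaphore`; this is only an illustration of the calls above (here "delete" is just garbage collection, and "query" has no direct equivalent), not a real RTOS API:

```python
import threading

# Create a binary semaphore guarding a shared resource:
sem = threading.Semaphore(1)
shared = []

def task(n):
    with sem:              # acquire ... release around the critical section
        shared.append(n)   # access to the resource is serialized

threads = [threading.Thread(target=task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))  # [0, 1, 2, 3]
```

Each task must acquire the semaphore before touching the shared list, which is exactly the resource-synchronization use described above.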

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
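The deposit/receive pattern can be sketched on a host with Python's `queue.Queue`; the function names and the scan-code values are hypothetical, chosen to mirror the keyboard example above:

```python
import queue

# Create a message queue with a fixed length, as described above:
mq = queue.Queue(maxsize=8)

def isr_deposit(msg):
    mq.put_nowait(msg)   # an ISR must never block: raise if the queue is full

def task_receive():
    return mq.get()      # the waiting task takes the next message

isr_deposit(0x1C)        # hypothetical keyboard scan codes
isr_deposit(0x32)
print(task_receive(), task_receive())  # 28 50 -- FIFO order preserved
```

Fixing the queue length at creation time bounds the memory used, which is why RTOS message queues require it up front.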

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
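The create/write/read/close calls listed above can be sketched with a POSIX pipe via Python's `os` module; this is a host-side illustration, and the "sensor" payload is invented for the example:

```python
import os

read_end, write_end = os.pipe()        # create a pipe
os.write(write_end, b"sensor:42\n")    # producer task writes its output
os.close(write_end)                    # close the write side when done
data = os.read(read_end, 64)           # consumer task reads it as input
os.close(read_end)
print(data)  # b'sensor:42\n'
```

Closing the write end is what signals end-of-data to the reader, mirroring the close call in the list above.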

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15: the program counter, but it can be manipulated as a general-purpose register

R13: used as the stack pointer

R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operating modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations, on the other hand, can be carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long integer multiplication (64-bit result)

o Multiply-accumulate instructions

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
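The shared-data problem can be made concrete with a deterministic sketch: a task reads a shared counter, a simulated "interrupt" fires between the read and the write-back, and the interrupt's update is lost. The names below are invented for the illustration:

```python
shared_counter = 0

def task_increment_nonatomic(interrupt):
    global shared_counter
    tmp = shared_counter       # read  -+
    interrupt()                #        | an ISR may run in this window
    shared_counter = tmp + 1   # write -+ ...and its update is overwritten

def isr():
    global shared_counter
    shared_counter += 10       # the ISR's change to the shared variable

task_increment_nonatomic(isr)
print(shared_counter)  # 1, not 11 -- the ISR's +10 was lost
```

Making the read-modify-write sequence atomic (for example, by disabling interrupts around it) is exactly what an atomic section provides.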

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
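The first rule can be demonstrated with a contrived Python sketch: the `preempt` callback simulates the RTOS switching to another task that calls the same function mid-execution. All names here are invented for the illustration:

```python
shared_buf = []   # global used non-atomically by the non-reentrant version

def format_nonreentrant(values, preempt=lambda: None):
    """Builds its output in a shared global -- NOT reentrant."""
    shared_buf.clear()
    for v in values:
        shared_buf.append(str(v))
        preempt()                     # a task switch may happen here
    return ",".join(shared_buf)

def format_reentrant(values, preempt=lambda: None):
    """Builds its output in a local (stack) variable -- reentrant."""
    buf = []
    for v in values:
        buf.append(str(v))
        preempt()
    return ",".join(buf)

# The "other task" calls the same function while the first call is paused:
print(format_nonreentrant([1, 2], preempt=lambda: format_nonreentrant([9])))
# -> "9" : corrupted, the second caller clobbered the shared global
print(format_reentrant([1, 2], preempt=lambda: format_reentrant([9])))
# -> "1,2" : correct, each call owns its own buffer
```

Keeping state on the caller's stack is what makes the second version safe under task switching.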

Task states in an RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is run.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process.

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

There are two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can mandate any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce, and it was the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.


In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as token ring, or by polling or resource reservation from a central control station.

In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take a much longer time to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions have improved, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not utilize this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.

Round-robin scheduling in UNIX follows this same concept, and a round-robin scheduler can be created by using semaphores.

Scheduling (computing)

In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).

The scheduler is concerned mainly with:

Throughput - the number of processes that complete their execution per unit time

Latency, specifically:

o Turnaround time - total time between submission of a process and its completion

o Response time - amount of time from when a request is submitted until the first response is produced

Fairness / waiting time - equal CPU time to each process (or, more generally, time appropriate to each process's priority)

In practice these goals often conflict (e.g. throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.

In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.

Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory. [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms. In these cases, special-purpose job scheduler software is typically used to assist these functions, in addition to any underlying admission-scheduling support in the operating system.

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembly, because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic), in operating systems (to share CPU time among both threads and processes), in disk drives (I/O scheduling), in printers (print spooler), in most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel-state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

First in, first out, also known as first come, first served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order in which they arrive in the ready queue.

Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.

Throughput can be low, since long processes can hog the CPU.

Turnaround time, waiting time, and response time can be high, for the same reasons.

No prioritization occurs; thus this system has trouble meeting process deadlines.

The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.

It is based on queuing.
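The behaviour above - minimal overhead but potentially high waiting times - can be seen in a small sketch (illustrative Python, not from the notes; the function name and time units are invented):

```python
# Minimal sketch (assumption, not from the notes): FIFO/FCFS scheduling
# of processes that are all ready at time 0, computing per-process
# waiting and turnaround times.

def fifo_schedule(burst_times):
    """Return (waiting, turnaround) lists for FCFS order."""
    waiting, turnaround = [], []
    clock = 0
    for burst in burst_times:
        waiting.append(clock)      # time spent queued before running
        clock += burst             # process runs to completion
        turnaround.append(clock)   # completion time = waiting + burst
    return waiting, turnaround

# A long first job makes every later job wait:
waiting, turnaround = fifo_schedule([24, 3, 3])
print(waiting)      # [0, 24, 27]
print(turnaround)   # [24, 27, 30]
```

The long first job delays every later one, which is why turnaround, waiting, and response times can all be high under FIFO.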

Shortest remaining time

Similar to shortest job first (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimations of, the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.

Starvation is possible, especially in a busy system with many small processes being run.
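A unit-time simulation makes the preemption behaviour concrete (a hypothetical sketch; the job format and tie-breaking rule are assumptions of this model, not part of the notes):

```python
# Hypothetical sketch of shortest-remaining-time (preemptive SJF):
# at every time unit, the ready process with the least remaining work runs.

def srt_schedule(jobs):
    """jobs: name -> (arrival, burst). Returns completion times."""
    remaining = {name: burst for name, (arrival, burst) in jobs.items()}
    done = {}
    clock = 0
    while remaining:
        ready = [n for n in remaining if jobs[n][0] <= clock]
        if not ready:
            clock += 1                 # idle until the next arrival
            continue
        n = min(ready, key=lambda r: remaining[r])   # least remaining time
        remaining[n] -= 1              # run for one time unit
        clock += 1
        if remaining[n] == 0:
            del remaining[n]
            done[n] = clock
    return done

# A short job arriving at t=1 preempts the long job A:
print(srt_schedule({"A": (0, 7), "B": (1, 2)}))
```

Job B finishes quickly at the cost of interrupting A, which is exactly the preemption overhead described above.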

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.

Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.

Deadlines can be met by giving processes with deadlines a higher priority.

Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
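The same unit-time style of simulation can illustrate fixed-priority pre-emption (a sketch under the assumption that a lower number means a higher priority; names and values are invented):

```python
# Illustrative sketch of fixed-priority pre-emptive scheduling, one time
# unit per step. Assumption of this sketch: lower number = higher priority.

def priority_schedule(jobs):
    """jobs: name -> (arrival, burst, priority). Returns finish times."""
    remaining = {name: burst for name, (arrival, burst, prio) in jobs.items()}
    finish = {}
    clock = 0
    while remaining:
        ready = [n for n in remaining if jobs[n][0] <= clock]
        if not ready:
            clock += 1
            continue
        n = min(ready, key=lambda r: jobs[r][2])   # highest priority wins
        remaining[n] -= 1
        clock += 1
        if remaining[n] == 0:
            del remaining[n]
            finish[n] = clock
    return finish

# "high" arrives at t=2 and immediately preempts "low":
print(priority_schedule({"low": (0, 5, 2), "high": (2, 3, 1)}))
```

If high-priority jobs kept arriving, "low" would never be picked by `min` - the starvation case noted above.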

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them.

RR scheduling involves extensive overhead, especially with a small time unit. It gives balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes, not on average process length.

Because of high waiting times, deadlines are rarely met in a pure RR system.

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.
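A minimal sketch of the quantum-cycling idea (illustrative Python; the process names and quantum value are invented):

```python
# Hedged sketch of round-robin time slicing: each process gets a fixed
# quantum; an unfinished process goes to the back of the queue.

from collections import deque

def round_robin(bursts, quantum):
    """bursts: name -> burst time. Returns completion times."""
    queue = deque(bursts.items())
    clock = 0
    finish = {}
    while queue:
        name, left = queue.popleft()
        run = min(quantum, left)
        clock += run
        if left > run:
            queue.append((name, left - run))   # back of the line
        else:
            finish[name] = clock
    return finish

print(round_robin({"A": 5, "B": 3, "C": 1}, quantum=2))
```

With quantum = 2, even the one-unit job C must wait for one slice of A and of B - poor response time - yet every process is guaranteed to finish, so starvation is impossible.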

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First in first out            Low            Low          High              High
Shortest job first            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program; from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of the breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: Logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o The CISC approach was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o The RISC approach is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example:

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data:

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
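The two byte orders can be illustrated in a short sketch (plain Python, not ARM code; the word value is invented):

```python
# Sketch: how the same 32-bit word maps to a sequence of bytes under the
# two byte orders the ARM can be configured for at power-up.

word = 0x11223344

little = word.to_bytes(4, "little")   # low-order byte at the low address
big    = word.to_bytes(4, "big")      # low-order byte at the high address

print([hex(b) for b in little])   # ['0x44', '0x33', '0x22', '0x11']
print([hex(b) for b in big])      # ['0x11', '0x22', '0x33', '0x44']
```

The value is the same either way; only the byte-to-address mapping differs.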

SHARC PROCESSOR

It is a family of DSPs which use the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;

The SHARC uses different word sizes and address-space sizes for instructions and data: a SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It uses a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow yields the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
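A sketch of what saturating addition does compared with ordinary wraparound (the 32-bit width matches the SHARC's fixed-point words; the function names are invented for illustration):

```python
# Illustrative sketch of 32-bit saturating addition - the behaviour the
# ALUSAT bit enables - contrasted with two's-complement wraparound.

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def sat_add32(a, b):
    """Add two 32-bit signed values, clamping instead of wrapping."""
    result = a + b
    if result > INT32_MAX:
        return INT32_MAX       # positive overflow saturates to the max
    if result < INT32_MIN:
        return INT32_MIN       # negative overflow saturates to the min
    return result

def wrap_add32(a, b):
    """Ordinary two's-complement wraparound, for contrast."""
    result = (a + b) & 0xFFFFFFFF
    return result - 2**32 if result > INT32_MAX else result

print(sat_add32(INT32_MAX, 1))    # 2147483647 (clamped)
print(wrap_add32(INT32_MAX, 1))   # -2147483648 (wrapped)
```

Clamping is what makes saturation attractive in signal processing: a slightly-too-loud sample stays loud instead of flipping to the opposite extreme.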

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies special registers that are used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.

Each DAG has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory; this allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

A post-modify-with-update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset.
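The post-modify and base-plus-offset computations can be modelled in a few lines (a toy Python model, not SHARC code; the class name and register values are assumptions):

```python
# Toy model of two DAG address computations:
# post-modify-with-update (use I, then I += M) and base-plus-offset (I + M).

class DAGRegister:
    def __init__(self, i):
        self.i = i                      # I register: current address

    def post_modify(self, m):
        """Return the current address, then update I by the modifier M."""
        addr = self.i
        self.i += m
        return addr

    def base_plus_offset(self, m):
        """Return I + M without changing I."""
        return self.i + m

dag = DAGRegister(0x1000)
print([hex(dag.post_modify(4)) for _ in range(3)])   # sweeps 0x1000, 0x1004, 0x1008
print(hex(dag.base_plus_offset(0x20)))               # offset from the updated I
```

Post-modify is what lets a single-instruction loop walk through an array; base-plus-offset leaves the base pointer untouched.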

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
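Hypothetical sketches of the address arithmetic behind these last two modes (plain Python, not SHARC instructions; the function names and parameters are invented):

```python
# Sketch 1: circular-buffer addressing - an index advances inside a
# fixed window [base, base + length) and wraps around at the end.

def circular_next(index, modifier, base, length):
    """Advance an index inside the buffer, wrapping around."""
    return base + (index - base + modifier) % length

# Sketch 2: bit-reversed addressing - reverse the low `bits` bits of an
# index, which produces the data ordering the FFT needs.

def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

print(circular_next(7, 2, base=0, length=8))     # steps past the end, wraps to 1
print([bit_reverse(i, 3) for i in range(8)])     # [0, 4, 2, 6, 1, 5, 3, 7]
```

Having the DAG do this arithmetic in hardware means a filter's delay line or an FFT's reordering costs no extra instructions in the inner loop.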

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.
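The three definitions reduce to simple sums; a worked example for a preemptive kernel (all numbers are made-up microsecond values, purely for illustration):

```python
# Worked example of the timing definitions above (preemptive kernel).
# Every value here is an invented figure in microseconds.

max_disable_time = 10   # longest stretch with interrupts disabled
first_isr_instr  = 2    # time to start the first ISR instruction
save_context     = 5    # time to save CPU registers
restore_context  = 5    # time to restore the CPU context
check_ready_task = 3    # time to check whether a higher-priority task is ready
return_instr     = 1    # time to execute the return-from-interrupt instruction

latency  = max_disable_time + first_isr_instr            # interrupt latency
response = latency + save_context                        # interrupt response time
recovery = check_ready_task + restore_context + return_instr   # recovery time

print(latency, response, recovery)   # 12 17 9
```

The worst-case interrupt-disable time dominates the latency, which is why RTOS kernels keep their critical sections as short as possible.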

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
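The acquire/release pairing can be sketched with Python's threading.Semaphore standing in for the kernel object (the task names and the guarded resource are invented; real RTOS call names vary by kernel):

```python
# Sketch: a binary semaphore serializing access to a shared resource,
# mirroring the "acquire a semaphore" / "release a semaphore" calls above.

import threading

printer = threading.Semaphore(1)     # binary semaphore guarding one resource
log = []

def task(name):
    printer.acquire()                # block until the resource is free
    log.append(name + " printing")   # critical section: exclusive access
    printer.release()                # let the next waiting task in

threads = [threading.Thread(target=task, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)                           # both tasks ran, one at a time
```

Creating the semaphore with a count greater than 1 would make it a counting semaphore, allowing that many tasks into the resource at once.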

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue takes the message.
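The deposit/take pattern can be sketched with Python's queue.Queue standing in for a kernel message queue (the queue length and the messages are invented):

```python
# Sketch: a message queue decoupling a producer (e.g. a keyboard ISR)
# from a consumer task, mirroring the applications listed above.

import queue

msg_q = queue.Queue(maxsize=8)          # queue length is fixed at creation

# Producer side - an ISR-like routine deposits messages:
for key in ["H", "I"]:
    msg_q.put(key)

# Consumer side - the waiting task takes messages in FIFO order:
received = []
while not msg_q.empty():
    received.append(msg_q.get())

print(received)   # ['H', 'I']
```

In a real RTOS the consumer would block on the queue instead of polling, so the task sleeps until a message arrives.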

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
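The pipe idea - one task's output feeding another's input - can be sketched with os.pipe (an OS pipe standing in for the RTOS object; the message content is invented):

```python
# Sketch: a pipe carrying one task's output to another task's input,
# mirroring the create/read/write calls listed above.

import os

read_end, write_end = os.pipe()         # "create a pipe"

os.write(write_end, b"sensor=42")       # producer task writes its output
os.close(write_end)                     # producer is done writing

data = os.read(read_end, 64)            # consumer task reads it as input
os.close(read_end)

print(data.decode())   # sensor=42
```

Unlike a message queue, a pipe carries an unstructured byte stream, so the two tasks must agree on how to delimit messages.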

INSTRUCTION SET ARCHITECTURE (ISA)

In most respects ARM is a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers

o A load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15 - the program counter, but it can be manipulated as a general-purpose register

R13 - used as a stack pointer

R14 - has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register) - an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes, as follows:

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to.

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling.

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system.

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction.

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction.

o Abort mode: entered in response to a memory fault.

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instructions: in the ARM processor, the branch instructions have the following features

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task

Under most RTOSs, a task is simply a subroutine

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack

Most RTOSs allow as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, the sharing of data can create bugs, leading to the shared-data problem

Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data, and the task code uses the data in a way that is not atomic

An atomic section is a part of a program that cannot be interrupted
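The shared-data problem can be made concrete with a small C sketch. Everything below (the timestamp variables, simulated_isr, the disable_irq/enable_irq stubs) is invented for illustration; on real hardware the stubs would map to the processor's interrupt-disable and interrupt-enable instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared data: an "ISR" maintains a two-word timestamp. */
static volatile uint32_t seconds = 0;
static volatile uint32_t ticks   = 0;

void simulated_isr(void)            /* pretend this runs on a timer interrupt */
{
    ticks++;
    if (ticks == 1000) { ticks = 0; seconds++; }
}

/* NON-atomic read: an interrupt between the two loads can pair a new
   'seconds' with an old 'ticks' -- the shared-data problem. */
void read_time_buggy(uint32_t *s, uint32_t *t)
{
    *s = seconds;
    /* an interrupt firing here makes *s and *t inconsistent */
    *t = ticks;
}

/* Atomic read: the copy is an atomic section because interrupts are
   disabled around it (stubs here; real code would use CPU intrinsics). */
void disable_irq(void) {}
void enable_irq(void)  {}

void read_time_atomic(uint32_t *s, uint32_t *t)
{
    disable_irq();
    *s = seconds;
    *t = ticks;
    enable_irq();
}
```

If simulated_isr fired between the two loads in read_time_buggy just as ticks wrapped from 999 to 0, the caller would see the old seconds paired with the new ticks; the interrupt-disabled copy cannot be split that way.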

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
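A short C illustration of these rules (the function names are hypothetical): the first version keeps its result in a static buffer, so a task switch between the call and the use of its result can corrupt the data; the second keeps all state on the caller's stack and therefore satisfies the rules above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* NON-reentrant: the static buffer is shared by every caller. If the RTOS
   switches tasks mid-call, a second task's call overwrites the first
   task's result before it has been used. */
char *itoa_shared(int n)
{
    static char buf[12];            /* shared, non-atomic use of a variable */
    sprintf(buf, "%d", n);
    return buf;
}

/* Reentrant: all state lives in the calling task's private context (the
   caller supplies the buffer), so concurrent calls cannot interfere. */
void itoa_reentrant(int n, char *buf, size_t len)
{
    snprintf(buf, len, "%d", n);
}
```

The reentrant version can safely be called from several tasks at once, because each task passes a buffer on its own stack.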

Task states

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state

The task that is assigned the highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor, so an interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state. Otherwise the task will be blocked forever

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked

The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the processor is given to the other

If a higher priority task unblocks what happens to the running task

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
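The scheduling rule described in this section -- the highest-priority task that is not blocked gets the processor -- can be sketched in C as follows. The data structures are invented for illustration; real RTOS kernels keep ready queues rather than scanning a task table.

```c
#include <assert.h>

/* Minimal task model: each task has a priority and a state. A RUNNING task
   still competes for the CPU, which is what lets a preemptive scheduler
   replace it the moment a higher-priority task unblocks. */
enum state { READY, RUNNING, BLOCKED };

struct task {
    int priority;          /* bigger number = higher priority (a convention) */
    enum state st;
};

/* Return the index of the task that should run next, or -1 if every task
   is blocked (in which case the scheduler idles, as described above). */
int pick_next(const struct task t[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (t[i].st != BLOCKED &&
            (best < 0 || t[i].priority > t[best].priority))
            best = i;
    return best;
}
```

Note that pick_next never changes a task's state: tasks block themselves, and interrupt routines or other tasks unblock them, exactly as stated above.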

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding implements the components and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases, since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage: with too many spirals, the design may take too long when design time is a major requirement

Its advantage: it adopts a successive-refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs


Types of operating system schedulers

Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run

Long-term scheduling

The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]

Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system

Medium-term scheduling

The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]

In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394]

Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler, because it is a critical part of the operating system

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following

o Switching context

o Switching to user mode

o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated

or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on Queuing
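The waiting-time remarks above can be checked numerically. This helper (invented for illustration, not part of the text) computes the average waiting time when processes are served strictly in arrival order:

```c
#include <assert.h>

/* Average waiting time under FIFO/FCFS: process i waits for the combined
   burst time of every process that arrived before it. */
double fcfs_avg_wait(const int burst[], int n)
{
    int elapsed = 0, total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += elapsed;   /* job i waits for all earlier jobs */
        elapsed    += burst[i];  /* then occupies the CPU to completion */
    }
    return (double)total_wait / n;
}
```

For bursts {24, 3, 3} the average wait is (0+24+27)/3 = 17 time units; serving the same jobs shortest-first, {3, 3, 24}, drops it to (0+3+6)/3 = 3, which is why a long first job is said to hog the CPU.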

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit

Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF

Poor average response time: waiting time is dependent on the number of processes, and not the average process length

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
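A small simulation (invented for illustration) shows how the fixed time unit shapes completion order: each pass hands every unfinished process at most q units of CPU, in arrival order.

```c
#include <assert.h>

/* Round-robin with quantum q over bursts[0..n-1] (n <= 16 assumed).
   Returns the time at which process 'which' completes, or -1 if 'which'
   is out of range. Context-switch overhead is ignored for simplicity. */
int rr_completion(const int burst[], int n, int q, int which)
{
    int rem[16], t = 0, done = -1;
    for (int i = 0; i < n; i++) rem[i] = burst[i];
    for (;;) {
        int active = 0;
        for (int i = 0; i < n; i++) {
            if (rem[i] == 0) continue;
            active = 1;
            int slice = rem[i] < q ? rem[i] : q;  /* run at most one quantum */
            t += slice;
            rem[i] -= slice;
            if (rem[i] == 0 && i == which) done = t;
        }
        if (!active) return done;   /* all processes finished */
    }
}
```

With bursts {3, 5} and q = 2, process 0 finishes at time 5 and process 1 at time 8: neither starves, but both finish later than they would if run to completion back-to-back, which is the overhead the text mentions.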

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time

First In First Out            Low            Low          High              High

Shortest Job First            Medium         High         Medium            Medium

Priority-based scheduling     Medium         Low          High              High

Round-robin scheduling        High           Medium       Medium            High

Multilevel queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation

The serial port available on evaluation boards is one of the most important debugging tools

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories; having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths

o The CISC approach was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors

o The RISC approach is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column

Comments begin with a semicolon and continue until the end of the line

Example

LDR r0,[r8]        ; a comment
label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at address 0, word 1 is at address 4, word 2 is at address 8, and so on

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
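The two byte orders can be modeled in C. These helper functions are illustrative (not ARM code): each packs a 32-bit word into four memory bytes the way the corresponding mode lays it out.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian: the lowest-order byte goes at the lowest byte address. */
void store_le(uint8_t mem[4], uint32_t w)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)((w >> (8 * i)) & 0xFF);
}

/* Big-endian: the highest-order byte goes at the lowest byte address. */
void store_be(uint8_t mem[4], uint32_t w)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)((w >> (8 * (3 - i))) & 0xFF);
}
```

Storing 0x11223344 little-endian yields the byte sequence 44 33 22 11 starting at the word's byte address (4n for word n); big-endian yields 11 22 33 44.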

SHARC PROCESSOR

It is a family of DSPs which use the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three mode registers most important for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register

The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
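Saturation behavior is easy to model in C. The sketch below (a C model, not SHARC code) widens to 64 bits so the true sum is known, then clamps exactly as the text describes for the ALUSAT mode:

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 32-bit fixed-point addition: on overflow the result clamps
   to the maximum (or minimum) representable value instead of wrapping. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;            /* widen: the true sum is exact */
    if (s > INT32_MAX) return INT32_MAX;   /* positive overflow: clamp high */
    if (s < INT32_MIN) return INT32_MIN;   /* negative overflow: clamp low  */
    return (int32_t)s;
}
```

Clamping matters in signal processing because a wrapped overflow flips the sign of a large sample, producing a much worse audible or numeric artifact than a clipped one.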

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there

The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing; they are called data address generators (DAGs), one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier (offset).

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code or to the highest-priority task. In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

A semaphore is a kernel object that is used for both resource synchronization and task synchronization.

Internally it is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than one, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instruction fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in the user mode. They are:

R15: the program counter, but it can also be manipulated as a general-purpose register.

R13: used as the stack pointer.

R14: has special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register.

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor.

The I and F flags control the masking of normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to.

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling.

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system.

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction.

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction.

o Abort mode: entered in response to a memory fault.

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

ARM has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences from ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They then signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

Task states under an RTOS

Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks with the same priority are ready?

It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time-slicing between the tasks is done. In others, one task is run until it blocks before the other is run.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality.

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered as an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps ensure that the product best meets customers' needs.


Short-term scheduling

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers; a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.

Dispatcher

Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]

Scheduling disciplines

Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.

The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them

In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets

The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on queuing.

Shortest remaining time

Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete.

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes, and not on the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for the shared-memory problem.

Overview

Scheduling algorithm           CPU overhead   Throughput   Turnaround time   Response time
First In First Out             Low            Low          High              High
Shortest Job First             Medium         High         Medium            Medium
Priority-based scheduling      Medium         Low          High              High
Round-robin scheduling         High           Medium       Medium            High
Multilevel queue scheduling    High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks, such as string searching; they also generally use a number of different instruction formats of varying lengths.

o The CISC approach was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, making CISC unsuitable for high-performance processors.

Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions. The instructions are also chosen so that they can be efficiently executed in pipelined processors.

o The RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations.

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

LABEL ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
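The little-/big-endian distinction can be made concrete in C. The sketch below is illustrative host code, not ARM-specific: it probes the byte order of the machine it runs on and extracts individual bytes of a 32-bit word regardless of that order.

```c
#include <stdint.h>

/* Return 1 if the machine stores the lowest-order byte of a word
   at the lowest address (little-endian), 0 if big-endian. */
static int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    /* Inspect the byte stored at the word's lowest address. */
    return *(const uint8_t *)&word == 0x04;
}

/* Extract byte n (0 = least significant) of a 32-bit word,
   independent of the machine's endianness. */
static uint8_t byte_of(uint32_t word, int n)
{
    return (uint8_t)(word >> (8 * n));
}
```

Note that `byte_of` works by shifting rather than by pointer aliasing, which is why its result does not depend on how the word is laid out in memory.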

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

LABEL: R3=R1+R2;

The SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction is 48 bits long, a basic data word 32 bits and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY) and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the corresponding ASTAT bits but are not cleared automatically; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow produces the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
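Saturation versus wrap-around can be sketched in C. `sat_add32` below is an illustrative stand-in for what the hardware does when ALUSAT is set, not SHARC code: the sum is computed in a wider type so the true result is visible, then clamped to the 32-bit range.

```c
#include <stdint.h>

/* Saturating 32-bit signed addition: on overflow, clamp to the
   maximum (or minimum) representable value instead of wrapping. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;   /* widen so the true sum fits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

For signal processing this matters because a wrapped overflow flips a large positive sample to a large negative one (a loud click), while a saturated overflow merely flattens the peak.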

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
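What the last two modes compute can be sketched in plain C (illustrative host code, not SHARC instructions). A circular buffer wraps its write index modulo the buffer length, so a fixed-size array always holds the most recent samples; bit-reversal reverses the low bits of an index, which is the reordering the FFT needs.

```c
#include <stdint.h>

/* Circular (ring) buffer of the kind DSP filters use. */
#define CBUF_LEN 4

struct cbuf {
    int32_t data[CBUF_LEN];
    int head;                             /* next slot to write */
};

static void cbuf_push(struct cbuf *b, int32_t sample)
{
    b->data[b->head] = sample;
    b->head = (b->head + 1) % CBUF_LEN;   /* index wraps around */
}

/* Bit-reversed index: reverse the low 'bits' bits of i
   (for an 8-point FFT, bits = 3). */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (i & 1u);          /* move LSB of i into r */
        i >>= 1;
    }
    return r;
}
```

The point of the hardware support is that the SHARC performs these index manipulations for free as part of the address calculation, where the C versions cost explicit instructions.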

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is, at its core, just an integer. Semaphores are of two types: counting semaphores, which can take any non-negative integer value, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
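The acquire/release calls above can be modelled with a toy single-threaded sketch. The struct and function names are illustrative, not any particular RTOS API, and where a real kernel would block the caller, this model simply reports failure.

```c
/* Toy model of a counting semaphore's state transitions.
   Acquire succeeds and decrements the count while it is positive;
   release increments it (and, in a real RTOS, may wake a waiter). */
struct semaphore {
    int count;
};

static int sem_acquire(struct semaphore *s)
{
    if (s->count > 0) {
        s->count--;
        return 1;          /* acquired */
    }
    return 0;              /* caller would block in a real RTOS */
}

static void sem_release(struct semaphore *s)
{
    s->count++;
}
```

Initializing the count to 1 gives the binary-semaphore behaviour described above; a larger initial count models a pool of identical resources.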

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. When a queue is created, it is given a name or ID, a queue length, a sending-task waiting list and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, the highest-priority task or the first task waiting in the queue takes the message.
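The deposit/retrieve behaviour can be sketched as a fixed-length FIFO queue. `mq_send` and `mq_receive` are illustrative names, not a specific RTOS API; a real kernel would block a task on the full/empty cases instead of returning 0.

```c
/* Illustrative fixed-length message queue: an ISR or task deposits
   with mq_send, a task retrieves with mq_receive, in FIFO order. */
#define MQ_LEN 8

struct msgq {
    int msgs[MQ_LEN];
    int head;      /* next message to read  */
    int tail;      /* next slot to write    */
    int count;     /* messages in the queue */
};

static int mq_send(struct msgq *q, int msg)
{
    if (q->count == MQ_LEN)
        return 0;                      /* queue full */
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % MQ_LEN;
    q->count++;
    return 1;
}

static int mq_receive(struct msgq *q, int *msg)
{
    if (q->count == 0)
        return 0;                      /* queue empty */
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % MQ_LEN;
    q->count--;
    return 1;
}
```

The queue length given at creation time corresponds to `MQ_LEN` here: it bounds how far the producer (for example a keyboard ISR) can run ahead of the consuming task.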

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
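As a concrete instance of the idea, the POSIX `pipe()` call shows the same model: bytes written at one end are read from the other. RTOS pipe APIs differ in detail; this sketch assumes a POSIX host and simply round-trips a message through a pipe.

```c
#include <unistd.h>
#include <string.h>

/* Write msg into a pipe and read it back out, NUL-terminating the
   result. Returns the number of bytes read, or -1 on error. */
static int pipe_roundtrip(const char *msg, char *out, int outlen)
{
    int fds[2];                        /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, strlen(msg)) < 0)   /* producer side */
        return -1;
    int n = (int)read(fds[0], out, (size_t)(outlen - 1));  /* consumer side */
    if (n < 0)
        n = 0;
    out[n] = '\0';
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

In an RTOS the two ends would typically be held by different tasks, with the kernel blocking the reader until data arrives.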

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all the ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15 ----------- it is the program counter but can be manipulated as a general-purpose register

R13 ----------- it is used as the stack pointer

R14 ----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor

The I and F flags mask (i.e., when set, disable) the normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access only to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Lower power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; ARM, however, is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run. The RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS's scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOS allows as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
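A minimal sketch of the shared-data problem: task code reads a two-part value that an interrupt updates. The "ISR" here is an ordinary function called deliberately between the two reads to stand in for a real interrupt firing at the worst possible moment, and the variable names are illustrative.

```c
/* A software clock kept in two shared variables. */
static volatile int seconds = 59;
static volatile int minutes = 0;

/* Pretend timer interrupt: 0:59 rolls over to 1:00. */
static void timer_isr(void)
{
    seconds = 0;
    minutes = 1;
}

/* Non-atomic read of the pair: the "interrupt" lands between the
   two loads, so the task sees a mixed (torn) value. */
static int torn_read_demo(void)
{
    int s = seconds;       /* reads 59 from the old time... */
    timer_isr();           /* ...interrupt updates both halves... */
    int m = minutes;       /* ...then reads 1 from the new time */
    return m * 60 + s;     /* 119 seconds: a time that never existed */
}
```

The fix is to make the two reads atomic, typically by disabling interrupts around them or by having the RTOS protect the data with a semaphore.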

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
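The first rule can be illustrated by contrasting a function that keeps working state in a shared static variable with one that keeps it on the caller's stack. Both function names are illustrative; in a single-threaded test the two return the same result, but under an RTOS a task switch mid-loop could corrupt `scratch`.

```c
/* Non-reentrant: working state lives in a variable shared by every
   caller, so a second task entering the function mid-way would
   clobber the first task's partial sum. */
static int scratch;

static int sum_nonreentrant(const int *a, int n)
{
    scratch = 0;
    for (int i = 0; i < n; i++)
        scratch += a[i];
    return scratch;
}

/* Reentrant: all working state is on the calling task's own stack,
   so any number of tasks can run this concurrently. */
static int sum_reentrant(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```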

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other is given the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability and so on.

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


Where guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.

First in first out

Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.

Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal

Throughput can be low since long processes can hog the CPU

Turnaround time waiting time and response time can be high for the same reasons above

No prioritization occurs thus this system has trouble meeting process deadlines

The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation

It is based on queuing.
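The way long processes hog the CPU is easy to quantify: under FCFS, each process waits for the total burst time of everything ahead of it. An illustrative sketch:

```c
/* Total waiting time under FCFS for n processes with the given CPU
   bursts, arriving in array order at time 0. Process i waits for
   the sum of all earlier bursts -- the "convoy" effect. */
static int fcfs_total_waiting(const int *bursts, int n)
{
    int wait = 0;    /* waiting time of the current process */
    int total = 0;   /* sum of all waiting times so far      */
    for (int i = 0; i < n; i++) {
        total += wait;
        wait += bursts[i];
    }
    return total;
}
```

For the classic textbook-style example of bursts {24, 3, 3}, the waits are 0, 24 and 27, totalling 51 (average 17); arriving in the order {3, 3, 24} instead, the waits would be 0, 3 and 6.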

Shortest remaining time

Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete

If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.

This algorithm is designed for maximum throughput in most scenarios

Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process

No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes

Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling

Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.

Poor average response time: waiting time is dependent on the number of processes, not the average process length.

Because of high waiting times deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS.
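The quantum-cycling behaviour described above can be simulated in a few lines. This is an illustrative sketch with a fixed process count and all processes arriving at time 0, not a real scheduler.

```c
/* Round-robin simulation: each process gets up to 'quantum' time
   units per turn. bursts[] holds remaining CPU time per process and
   is consumed; done[] receives each process's completion time.
   Returns the total elapsed time. */
#define NPROC 3

static int round_robin(int bursts[NPROC], int quantum, int done[NPROC])
{
    int t = 0;          /* simulated clock           */
    int left = NPROC;   /* processes not yet finished */
    while (left > 0) {
        for (int i = 0; i < NPROC; i++) {
            if (bursts[i] == 0)
                continue;                   /* already finished */
            int slice = bursts[i] < quantum ? bursts[i] : quantum;
            t += slice;
            bursts[i] -= slice;
            if (bursts[i] == 0) {
                done[i] = t;                /* completion time */
                left--;
            }
        }
    }
    return t;
}
```

Running it with bursts {5, 3, 1} and a quantum of 2 shows the trade-off in the notes: the 1-unit job finishes at time 5 instead of waiting behind the 5-unit job as it would under FCFS, at the cost of extra switching.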

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs.

Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time

First In First Out            Low            Low          High              High

Shortest Job First            Medium         High         Medium            Medium

Priority based scheduling     Medium         Low          High              High

Round-robin scheduling        High           Medium       Medium            High

Multilevel Queue scheduling   High           High         Medium            Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, from which the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.

LEDs also play an important role in debugging. LEDs can be used to show error conditions, to indicate when the code enters certain routines, or to show idle-time activity.

A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be executed efficiently in pipelined processors

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line

Example

LDR r0,[r8]        ; a comment

label  ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)

SHARC PROCESSOR

It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations

All the data registers are 40 bits long. When 32-bit data types are stored in the registers, they are put in the most significant bits of the register

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT)

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization

At bottom it is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced

o Each instruction controls the ALU and the shifter, thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading or storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are

R15: it is the program counter but can be manipulated as a general-purpose register

R13: it is used as the stack pointer

R14: it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register): it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits control the normal and fast interrupts, respectively (when set, the corresponding interrupt is disabled). The T flag is used to switch between the ARM and THUMB instruction sets

The mode field selects one of the six execution modes as follows

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple-register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privilege mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instructions: in the ARM processor, the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do

The differences between this architecture and the previous ones are the following

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task

Under most RTOSs, a task is simply a subroutine

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, and so on

Most RTOSs allow as many tasks as we need

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem

Shared-data problem: it arises when an interrupt routine and task code (or several tasks) share data and the task code uses the data in a way that is not atomic

An atomic section is a part of a program that cannot be interrupted

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

States of tasks under an RTOS

Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time

Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the processor is given to the other one

If a higher priority task unblocks what happens to the running task

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are a more detailed, precise, and consistent description of the system that can be used to create the architecture

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the pieces and integrates them

o Testing uncovers bugs

o Maintenance entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement

Its advantage is that it adopts a successive-refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs

Scheduling disciplines

First in first out

No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible

Starvation is possible, especially in a busy system with many small processes being run

Fixed priority pre-emptive scheduling

The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes

Overhead is not minimal, nor is it significant; FPPS has no particular advantage in terms of throughput over FIFO scheduling

Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times

Deadlines can be met by giving processes with deadlines a higher priority

Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time

Round-robin scheduling

The scheduler assigns a fixed time unit per process and cycles through them

RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF

Average response time is poor: waiting time depends on the number of processes, not on the average process length

Because of high waiting times, deadlines are rarely met in a pure RR system

Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS

Multilevel queue scheduling

This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for the shared-memory problem

Overview

Scheduling algorithm         CPU overhead  Throughput  Turnaround time  Response time
First in first out           Low           Low         High             High
Shortest job first           Medium        High        Medium           Medium
Priority-based scheduling    Medium        Low         High             High
Round-robin scheduling       High          Medium      Medium           High
Multilevel queue scheduling  High          High        Medium           Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

Major portion of software debugging is done by compiling and executing the code on PC or work station

Serial port available on the evaluation boards is one of the most important debugging tools

Another important debugging tool is breakpoint The simplest form of breakpoint is for the used to specify an address at which the programs execution is to break When PC reaches that address control is returned to the monitor program and from this program the user can examine andor modify CPU registers after which execution can be continued Advantage of breakpoint is it does not require using exceptions or external devices

LED s also play an important role in debugging LED s can be used to show error conditions when the code enters certain routines or to show idle time activity

Microprocessor in-circuit emulator ICE is a specialized hardware tool that can help debug software in a working embedded system Its main drawback is the machine is specific to a particular microprocessor

Logic analyzer is another but a major piece of instrumentation useful for debugging It can sample or analyze several different signals simultaneously but display only 0 or 1 or changing values for each

Hardwaresoftware co-verification it allows hardware and software designs to be validated at the same time against each other The types of techniques available are

o An instruction level simulator may be used to debug code running on the

CPU

o A cycle level simulator tool may be used for faster simulation of parts of

the system

o A hardwaresoftware co-simulator may be used to simulate various parts

of the system at different levels of detail

Debugging challenges Logical errors in the soft ware can be hard to track down but errors in real-time code can create problems that are even harder to diagonise

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmers interface to the hardware

The ARM processor is widely used in personal digital assistants(PDAs) video games telephones and many other systems where as the SHARC is well known digital signal processor(DSP)

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories; having two separate memories provides higher memory bandwidth. Most DSPs use the Harvard architecture.

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column

Comments begin with a semicolon and continue until the end of the line

Example

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1

Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
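The byte-addressing rule and the two endian modes can be illustrated with a small host-side C sketch (illustrative only, not ARM code): word n starts at byte address 4n, and the two modes differ in which end of the word byte 0 holds.

```c
#include <stdint.h>

/* Byte address of ARM word n: each 32-bit word occupies 4 byte addresses. */
uint32_t word_address(uint32_t n) { return n * 4u; }

/* Little-endian layout: byte 0 of a word holds its least significant 8 bits. */
uint8_t little_endian_byte(uint32_t word, unsigned i) {
    return (uint8_t)(word >> (8u * i));
}

/* Big-endian layout: byte 0 of a word holds its most significant 8 bits. */
uint8_t big_endian_byte(uint32_t word, unsigned i) {
    return (uint8_t)(word >> (8u * (3u - i)));
}
```

For the word 0x11223344, byte 0 is 0x44 in little-endian mode but 0x11 in big-endian mode.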

SHARC PROCESSOR

It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

        R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label:  R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the corresponding ASTAT bits but are not cleared; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
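Clamping on overflow can be sketched in C as follows; this is a host-side illustration of the behavior selected by the ALUSAT bit, not SHARC code.

```c
#include <stdint.h>

/* Saturating 32-bit signed addition: on overflow the result clamps to
   INT32_MAX or INT32_MIN instead of wrapping around the numeric range. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;   /* compute without overflow */
    if (wide > INT32_MAX) return INT32_MAX;   /* clamp positive overflow */
    if (wide < INT32_MIN) return INT32_MIN;   /* clamp negative overflow */
    return (int32_t)wide;
}
```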

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
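The zero-fill versus sign-copy distinction for right shifts can be shown with a short host-side C sketch (a model of the behavior, not SHARC shifter code):

```c
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeroes. */
uint32_t shift_logical_right(uint32_t x, unsigned d) {
    return x >> d;
}

/* Arithmetic right shift: vacated high bits are copies of the sign bit.
   Implemented portably rather than relying on the implementation-defined
   behaviour of >> on negative signed ints. */
int32_t shift_arith_right(int32_t x, unsigned d) {
    if (x < 0) return (int32_t)~(~(uint32_t)x >> d);  /* propagate sign */
    return (int32_t)((uint32_t)x >> d);
}
```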

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

Post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing
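The post-modify-with-update and circular-buffer modes can be modelled in C. The struct below is a simplified, hypothetical stand-in for an I/M register pair, indexing into ordinary arrays rather than SHARC memory; it assumes a positive modifier smaller than the buffer length.

```c
#include <stdint.h>

/* Model of a DAG I (index) / M (modifier) register pair. */
typedef struct { uint32_t i; int32_t m; } dag_reg;

/* Post-modify with update: the I register supplies the address for this
   access, then is updated by the modifier. */
uint32_t post_modify(dag_reg *r) {
    uint32_t addr = r->i;                      /* address used now */
    r->i = (uint32_t)((int32_t)r->i + r->m);   /* then update by M */
    return addr;
}

/* Circular-buffer variant: the updated index wraps within a buffer of
   'len' words starting at 'base' (the delay-line pattern used in
   signal processing). Assumes 0 < m < len. */
uint32_t post_modify_circular(dag_reg *r, uint32_t base, uint32_t len) {
    uint32_t addr = r->i;
    uint32_t next = (uint32_t)((int32_t)r->i + r->m);
    if (next >= base + len) next -= len;       /* wrap back to the start */
    r->i = next;
    return addr;
}
```

Sweeping a buffer of length 4 at base 4 with modifier 1, an access at index 7 wraps the index back to 4.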

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
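What bit-reversal addressing computes can be sketched in C: for an N-point FFT (N a power of two), the low log2(N) bits of the index are reversed.

```c
#include <stdint.h>

/* Reverse the low 'bits' bits of 'index', as bit-reversal addressing
   does in hardware for FFT data reordering. */
uint32_t bit_reverse(uint32_t index, unsigned bits) {
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; ++k) {
        r = (r << 1) | (index & 1u);  /* shift each LSB into the result */
        index >>= 1;
    }
    return r;
}
```

For an 8-point FFT (3 bits), index 1 (001) maps to 4 (100) and index 6 (110) maps to 3 (011).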

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is the sum of the interrupt latency and the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core it is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than 1, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
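The acquire/release bookkeeping behind these calls can be sketched as a single-threaded C model. This is an illustrative sketch only: all names are hypothetical, and a real RTOS call would additionally block the task and manipulate the scheduler's waiting list.

```c
/* Minimal counting-semaphore model (bookkeeping only, no blocking). */
typedef struct { int count; } semaphore;

void sem_create(semaphore *s, int initial) { s->count = initial; }

/* Try to acquire: succeeds (returns 1) and decrements when count > 0;
   otherwise returns 0, which is where an RTOS would move the calling
   task to the blocked state. */
int sem_acquire(semaphore *s) {
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: increments the count, where an RTOS would instead ready
   the highest-priority waiting task if one exists. */
void sem_release(semaphore *s) { s->count++; }
```

Created with an initial count of 1, the semaphore behaves as a binary semaphore: a second acquire fails until a release occurs.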

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, either the highest-priority task or the first task waiting in the queue takes the message.
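The queue bookkeeping can be modelled as a fixed-size ring buffer in C. This is an illustrative sketch of the data structure only; a real RTOS queue also maintains the sending/waiting task lists described above, and all names here are hypothetical.

```c
#define Q_LEN 4   /* queue length chosen at creation time */

/* Fixed-size message queue holding integer messages. */
typedef struct {
    int buf[Q_LEN];
    int head, tail, count;   /* read index, write index, messages held */
} msg_queue;

void mq_init(msg_queue *q) { q->head = q->tail = q->count = 0; }

/* Deposit a message (what a task or ISR does): 0 on success, -1 if full. */
int mq_send(msg_queue *q, int msg) {
    if (q->count == Q_LEN) return -1;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % Q_LEN;   /* advance write index, wrapping */
    q->count++;
    return 0;
}

/* Take the oldest message: 0 on success, -1 if the queue is empty. */
int mq_receive(msg_queue *q, int *msg) {
    if (q->count == 0) return -1;
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % Q_LEN;   /* advance read index, wrapping */
    q->count--;
    return 0;
}
```

Messages are delivered in first-in, first-out order, matching the keyboard-input and sensor-reading uses listed above.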

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instruction fields: all ARM instructions are 32 bits long, and most have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading or storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and small silicon area.

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15----------- it is the program counter but can be manipulated as a general-purpose register

R13----------- it is used as a stack pointer

R14----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register)----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits disable normal and fast interrupts, respectively, when set. The T flag is used to switch between the ARM and THUMB instruction sets.
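The CPSR layout described above can be made concrete with a C decoder. It assumes the standard ARM field positions: condition flags N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, T in bit 5, and the mode field in bits 4 to 0.

```c
#include <stdint.h>

/* Decoded CPSR fields. */
typedef struct {
    int n, z, c, v;   /* condition flags */
    int i, f, t;      /* interrupt-disable bits and Thumb state bit */
    unsigned mode;    /* 5-bit mode field, e.g. 0x10 = User, 0x13 = SVC */
} cpsr_fields;

cpsr_fields decode_cpsr(uint32_t cpsr) {
    cpsr_fields out;
    out.n = (int)((cpsr >> 31) & 1u);
    out.z = (int)((cpsr >> 30) & 1u);
    out.c = (int)((cpsr >> 29) & 1u);
    out.v = (int)((cpsr >> 28) & 1u);
    out.i = (int)((cpsr >> 7) & 1u);
    out.f = (int)((cpsr >> 6) & 1u);
    out.t = (int)((cpsr >> 5) & 1u);
    out.mode = cpsr & 0x1Fu;
    return out;
}
```

Decoding 0x60000013 yields Z and C set, the other flags clear, and the supervisor (SVC) mode value 0x13.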

The mode field selects one of the six execution modes as follows

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode(FIQ) supports high speed interrupt handling

o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM It is standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations are carried out via multiple-register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run: the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS's scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions generally do not affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or several tasks) share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
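The shared-data problem can be demonstrated with a deliberately simplified C sketch: the task reads a two-word shared value non-atomically, and a simulated "interrupt" (here just a direct function call between the two reads) updates both words, leaving the task with an inconsistent mix. All names are illustrative.

```c
/* Two halves of a shared value; the ISR always writes them as a pair. */
static int shared_hi = 0, shared_lo = 0;

/* Simulated ISR: writes a new, internally consistent pair. */
void simulated_isr(void) {
    shared_hi = 1;
    shared_lo = 1;
}

/* Non-atomic read by task code. 'interrupt_fires' models an interrupt
   arriving between the two halves of the read. Returns 1 if the pair
   the task observed was consistent, 0 if the read was torn. */
int task_reads_pair(int interrupt_fires) {
    int hi = shared_hi;                  /* first half of the read ...  */
    if (interrupt_fires) simulated_isr();
    int lo = shared_lo;                  /* ... second half, now stale? */
    return hi == lo;
}
```

With no interrupt the pair is consistent; with an interrupt between the two reads, the task sees the old high half and the new low half, the classic symptom of a non-atomic access to shared data.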

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
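The first rule can be illustrated in C with a hypothetical pair of helper functions: the non-reentrant version keeps its result in a static buffer that all callers share, while the reentrant version keeps all state in a buffer supplied by the caller (i.e., on the calling task's stack).

```c
#include <string.h>

/* NON-reentrant sketch: the result lives in one static buffer, so two
   tasks calling this concurrently would overwrite each other's result. */
static char shared_buf[16];
const char *label_bad(int on) {
    strcpy(shared_buf, on ? "ON" : "OFF");
    return shared_buf;               /* shared state: not reentrant */
}

/* Reentrant version: all state lives in the caller-supplied buffer,
   satisfying the rules above. */
char *label_ok(int on, char *buf, unsigned len) {
    strncpy(buf, on ? "ON" : "OFF", len - 1);
    buf[len - 1] = '\0';             /* ensure termination */
    return buf;
}
```

Two tasks calling label_ok with their own stack buffers cannot disturb each other, which is exactly what the reentrancy rules require.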

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with same priority

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task runs until it goes to the blocked state before the other gets the processor.

If a higher priority task unblocks what happens to the running task

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps are performed by hand.

Waterfall model

It was introduced by Royce, and it is the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.


First In First Out: Low, Low, High, High

Shortest Job First: Medium, High, Medium, Medium

Priority-based scheduling: Medium, Low, High, High

Round-robin scheduling: High, Medium, Medium, High

Multilevel queue scheduling: High, High, Medium, Medium

HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERSLOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

Serial port available on the evaluation boards is one of the most important debugging tools

Another important debugging tool is breakpoint The simplest form of breakpoint is for the used to specify an address at which the programs execution is to break When PC reaches that address control is returned to the monitor program and from this program the user can examine andor modify CPU registers after which execution can be continued Advantage of breakpoint is it does not require using exceptions or external devices

LED s also play an important role in debugging LED s can be used to show error conditions when the code enters certain routines or to show idle time activity

Microprocessor in-circuit emulator ICE is a specialized hardware tool that can help debug software in a working embedded system Its main drawback is the machine is specific to a particular microprocessor

Logic analyzer is another but a major piece of instrumentation useful for debugging It can sample or analyze several different signals simultaneously but display only 0 or 1 or changing values for each

Hardwaresoftware co-verification it allows hardware and software designs to be validated at the same time against each other The types of techniques available are

o An instruction level simulator may be used to debug code running on the

CPU

o A cycle level simulator tool may be used for faster simulation of parts of

the system

o A hardwaresoftware co-simulator may be used to simulate various parts

of the system at different levels of detail

Debugging challenges Logical errors in the soft ware can be hard to track down but errors in real-time code can create problems that are even harder to diagonise

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmers interface to the hardware

The ARM processor is widely used in personal digital assistants(PDAs) video games telephones and many other systems where as the SHARC is well known digital signal processor(DSP)

A van Neumann machine is one whose memory holds both data and instructions where as in Harvard architecture the there exists separate memories for data and program Harvard architecture is widely used today for its higher performance which is a result of the separation of program and data memories Having two memories with separate provides higher memory bandwidth Most of the DSPs are of Harvard architecture

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths

o CISC architecture was developed to reduce the number of instructions in the

compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles

leading to unsuitability for high performance processors

Reduced instruction set computers (RISC) tended to provide some what fewer and simpler instructions The instructions were also chosen so that they could be efficiently executed in pipelined processors

o RISC architecture is optimized to achieve short clock cycles small numbers of

cycles per instruction and efficient pipelining of instruction streams

o It requires a more sophisticated compiler and the compiler needs to use a

sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin t with a semi colon and continue until the end of the line

Example

LDR r0[r8] a comment

Label ADD r4r0r1

Some versions of ARM architecture like ARM7 are von Neumann architecture machines where as ARM9 uses a Harvard architecture However the difference is invisible to the assembly language programmer

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
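The byte-ordering rules above can be checked on any host machine in C. A small sketch (function names are illustrative, not part of any ARM toolchain):

```c
#include <stdint.h>

/* Returns 1 if this machine stores the lowest-order byte of a word
   at the lowest address (little-endian mode), 0 otherwise. */
int is_little_endian(void) {
    uint32_t word = 0x11223344u;
    uint8_t first = *(uint8_t *)&word;  /* byte at the word's lowest address */
    return first == 0x44;
}

/* Byte address of 32-bit word n, mirroring the ARM rule that
   word n lives at byte address 4*n. */
uint32_t word_address(uint32_t word_index) {
    return 4u * word_index;
}
```

For example, `word_address(2)` gives 8, matching "word 2 at 8" above.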

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment

Label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the corresponding ASTAT bits but are not cleared by later operations; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
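The saturating behavior can be sketched in C as a host-side model (the SHARC ALU does this in hardware when ALUSAT is set; `sat_add32` is an illustrative name):

```c
#include <stdint.h>

/* Saturating 32-bit signed add: on overflow, clamp to the maximum or
   minimum representable value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;   /* exact result in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;   /* positive overflow: clamp high */
    if (wide < INT32_MIN) return INT32_MIN;   /* negative overflow: clamp low */
    return (int32_t)wide;
}
```

So `sat_add32(INT32_MAX, 1)` stays at `INT32_MAX` rather than wrapping to a large negative number, which is usually the desired behavior in signal processing.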

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. To control loading and storing, the SHARC supplies two special units called data address generators (DAGs), one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.

The post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value; the I register supplies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset.

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
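The circular-buffer and bit-reversal modes can be modeled in C. A host-side sketch with illustrative function names (the real DAG hardware performs these index updates as a side effect of the memory access):

```c
#include <stdint.h>

/* Post-modify update over a circular buffer: index i advances by
   modifier m and wraps so that it stays inside [base, base+len). */
uint32_t circ_update(uint32_t i, int32_t m, uint32_t base, uint32_t len) {
    int64_t next = (int64_t)i + m;
    if (next >= (int64_t)(base + len)) next -= len;  /* wrap forward */
    if (next < (int64_t)base)          next += len;  /* wrap backward */
    return (uint32_t)next;
}

/* Bit-reversed addressing over n-bit indices, as used to reorder
   FFT inputs or outputs. */
uint32_t bit_reverse(uint32_t index, unsigned bits) {
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (index & 1u);  /* shift each low bit of index into r */
        index >>= 1;
    }
    return r;
}
```

For an 8-entry buffer starting at 0, stepping by 2 from index 7 wraps to index 1; for a 3-bit (8-point) FFT, index 1 (binary 001) maps to 4 (binary 100).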

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it equals the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority ready task). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

At its core a semaphore is just an integer. Semaphores are of two types: the counting semaphore, whose value can be greater than 1, and the binary semaphore, which takes values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
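To make the five calls concrete, here is a minimal host-side model of a counting semaphore in C. The names are illustrative, not any particular RTOS's API; a real kernel performs these operations atomically and blocks a task whose acquire cannot be satisfied, rather than returning a failure code:

```c
/* A minimal counting-semaphore model mirroring the five management calls. */
typedef struct { int count; } ksem_t;

/* create: set the initial count (1 for a binary semaphore) */
void ksem_create(ksem_t *s, int initial) { s->count = initial; }

/* acquire: decrement and return 1 if available; return 0 if not
   (a real RTOS would block the calling task here instead) */
int ksem_acquire(ksem_t *s) {
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* release: increment the count, potentially waking a waiting task */
void ksem_release(ksem_t *s) { s->count++; }

/* query: report the current count */
int ksem_query(const ksem_t *s) { return s->count; }

/* delete: invalidate the semaphore */
void ksem_delete(ksem_t *s) { s->count = 0; }
```

With an initial count of 1 this behaves as a binary semaphore: a second acquire fails (would block) until the holder releases.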

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time a queue is created, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits a message in the queue; depending on the application, the highest-priority task or the first task waiting in the queue then takes the message.
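The "array of mailboxes" view can be sketched as a fixed-size FIFO in C. The queue length and message size below are arbitrary illustrative choices, and a real RTOS would additionally handle blocking and task priorities:

```c
#include <string.h>

#define QLEN 4        /* queue length chosen for illustration */
#define MSG_SIZE 16   /* bytes per mailbox, also illustrative */

typedef struct {
    char slots[QLEN][MSG_SIZE];  /* the "array of mailboxes" */
    int head, tail, count;
} msgq_t;

void mq_init(msgq_t *q) { q->head = q->tail = q->count = 0; }

/* A task or ISR deposits a message; returns 0 when the queue is full. */
int mq_send(msgq_t *q, const char *msg) {
    if (q->count == QLEN) return 0;
    strncpy(q->slots[q->tail], msg, MSG_SIZE - 1);
    q->slots[q->tail][MSG_SIZE - 1] = '\0';
    q->tail = (q->tail + 1) % QLEN;     /* wrap around the mailbox array */
    q->count++;
    return 1;
}

/* A waiting task takes the oldest message; returns 0 when empty. */
int mq_receive(msgq_t *q, char *out) {
    if (q->count == 0) return 0;
    strncpy(out, q->slots[q->head], MSG_SIZE);
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```

Messages come out in the order they were deposited, which is the behavior a keyboard-input or packet-transmission queue relies on.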

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

In most respects the ARM is a typical RISC architecture, but it has several enhancements to improve performance further. The RISC features present are

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instruction fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features have been introduced

o Each instruction can control both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that load/store up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are

R15: the program counter, but it can also be manipulated as a general-purpose register

R13: used as the stack pointer

R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and fields representing the execution state of the processor

The I and F bits disable normal and fast interrupts, respectively, when set. The T flag is used to switch between the ARM and THUMB instruction sets.
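The flag and field layout can be made concrete by decoding a CPSR value in C, using the bit positions defined by the ARM architecture (N=31, Z=30, C=29, V=28, I=7, F=6, T=5, mode=bits 4:0):

```c
#include <stdint.h>

/* Decoded view of the CPSR's condition flags and state fields. */
typedef struct { int n, z, c, v, i, f, t; unsigned mode; } cpsr_fields;

cpsr_fields decode_cpsr(uint32_t cpsr) {
    cpsr_fields r;
    r.n = (cpsr >> 31) & 1;  /* negative flag */
    r.z = (cpsr >> 30) & 1;  /* zero flag */
    r.c = (cpsr >> 29) & 1;  /* carry flag */
    r.v = (cpsr >> 28) & 1;  /* overflow flag */
    r.i = (cpsr >> 7) & 1;   /* IRQ disable bit */
    r.f = (cpsr >> 6) & 1;   /* FIQ disable bit */
    r.t = (cpsr >> 5) & 1;   /* THUMB state bit */
    r.mode = cpsr & 0x1Fu;   /* execution mode field */
    return r;
}
```

For instance, the value 0x60000010 decodes to Z and C set with mode 0b10000, which is user mode.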

The mode field selects one of six execution modes as follows

o User mode: used to run application code; once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

The ARM instruction set supports six different data types, namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location; multiple register load/store operations are carried out via multiple register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These instructions are 16 bits in length

o Stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized as 16 bits wide; however, ARM is faster when the memory is organized as 32 bits wide

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they then signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed: in this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with other parameters, such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or several tasks) share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
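The first rule can be illustrated in C with a deliberately illustrative pair of functions: the reentrant version keeps its state on the caller's stack, while the non-reentrant version shares one static variable among every caller:

```c
/* Reentrant: uses only parameters and locals on the calling task's
   stack, so two tasks may safely be inside it at the same time. */
int average_reentrant(int a, int b) {
    int sum = a + b;          /* private to this particular call */
    return sum / 2;
}

/* NOT reentrant: the static temporary is one shared copy for all
   tasks, so a task switch between the two statements below can let
   another caller overwrite shared_sum and corrupt this result. */
static int shared_sum;
int average_nonreentrant(int a, int b) {
    shared_sum = a + b;       /* a task switch here is unsafe */
    return shared_sum / 2;
}
```

Both give the same answer when called from a single task; the difference only shows up when the RTOS switches tasks mid-function, which is exactly the situation the rules above guard against.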

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task with the highest assigned priority gets the processor.

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can move a task into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other gets the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
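The scheduler's selection step can be sketched in C as a host-side model. The names and the lower-number-is-more-urgent convention are illustrative; real RTOS schedulers use more efficient structures such as per-priority ready lists:

```c
/* Task states as described above. */
typedef enum { READY, RUNNING, BLOCKED } task_state;

typedef struct {
    task_state state;
    int priority;      /* 0 is the most urgent, by convention here */
} task;

/* Pick the highest-priority task that is not blocked.
   Returns the task's index, or -1 if every task is blocked
   (the RTOS would then idle until an interrupt unblocks one). */
int schedule(const task tasks[], int ntasks) {
    int best = -1;
    for (int i = 0; i < ntasks; i++) {
        if (tasks[i].state == BLOCKED) continue;   /* never pick blocked tasks */
        if (best < 0 || tasks[i].priority < tasks[best].priority)
            best = i;
    }
    return best;
}
```

Note that the function only reads task states; consistent with the rules above, it never blocks a task itself, it only chooses among tasks that are already ready.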

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can constrain any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the pieces and integrates them

o Testing uncovers bugs

o Maintenance entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.


HOST AND TARGET MACHINES

The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following

o Load programs into the target

o Start and stop program execution on the target

o Examine memory and CPU registers

The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.

LINKERS/LOCATERS FOR EMBEDDED SOFTWARE

A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.

GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM

DEBUGGING TECHNIQUES

A major portion of software debugging is done by compiling and executing the code on a PC or workstation.

The serial port available on evaluation boards is one of the most important debugging tools.

Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program; from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.

LEDs also play an important role in debugging. They can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.

The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.

The logic analyzer is another major piece of instrumentation useful for debugging. It can sample several different signals simultaneously, but it can display only 0, 1, or changing values for each.

Hardware/software co-verification allows hardware and software designs to be validated against each other at the same time. The types of techniques available are

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware.

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).

A van Neumann machine is one whose memory holds both data and instructions where as in Harvard architecture the there exists separate memories for data and program Harvard architecture is widely used today for its higher performance which is a result of the separation of program and data memories Having two memories with separate provides higher memory bandwidth Most of the DSPs are of Harvard architecture

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths

o CISC architecture was developed to reduce the number of instructions in the

compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles

leading to unsuitability for high performance processors

Reduced instruction set computers (RISC) tended to provide some what fewer and simpler instructions The instructions were also chosen so that they could be efficiently executed in pipelined processors

o RISC architecture is optimized to achieve short clock cycles small numbers of

cycles per instruction and efficient pipelining of instruction streams

o It requires a more sophisticated compiler and the compiler needs to use a

sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin t with a semi colon and continue until the end of the line

Example

LDR r0[r8] a comment

Label ADD r4r0r1

Some versions of ARM architecture like ARM7 are von Neumann architecture machines where as ARM9 uses a Harvard architecture However the difference is invisible to the assembly language programmer

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte not a word The word 0 in the ARM address space is at location 0 the word 1 in is at 4 word 2 at 8 and so on

The processor can be configured at power-up to address the bytes in a word in either little-endian mode( with the lowest order byte residing in the low-order bits of the word) or big- endian mode( with the lowest order byte stored in the highest bits of the word)

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0I0) R2=PM(M8I8) a comment

Label R3= R1+ R2

SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations

All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register

The CPU has three major data function units an ALU a multiplier and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT) sticky(STKY) and mode I(MODEI)

All the ALU operations set the AZ(ALU result zero) AN (ALU result negative) A V(ALU result overflow) AC (ALU fixed-point carry) and AI(floating point invalid) bits in ASTAT register

The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
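
As a sketch of what saturation mode does, the following C function (our own illustration, not SHARC code) clamps a 32-bit addition to the range limits instead of letting it wrap:

```c
#include <stdint.h>

/* Saturating 32-bit add: on overflow, clamp to INT32_MAX / INT32_MIN
   instead of wrapping. This mimics what the SHARC ALU does when the
   ALUSAT bit in MODE1 is set. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;  /* clamp positive overflow */
    if (wide < INT32_MIN) return INT32_MIN;  /* clamp negative overflow */
    return (int32_t)wide;
}
```

With saturation off, `INT32_MAX + 1` would wrap to a large negative number; with it on, the result stays pinned at the maximum, which is usually the less harmful behavior in signal processing.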

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing, called data address generators (DAGs), one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
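
The post-modify and circular-buffer modes above can be modeled in C roughly as follows. The register names I and M are borrowed from the description; the functions themselves are our own sketch, not a hardware interface:

```c
/* Toy C model of two SHARC DAG addressing modes (illustrative only). */

/* Post-modify with update: return the element at index I, then
   advance I by the modifier M -- useful for sweeping an array. */
static int post_modify(const int *mem, int *I, int M)
{
    int v = mem[*I];
    *I += M;              /* I is updated *after* the access */
    return v;
}

/* Circular-buffer update: advance I by M, wrapping around within
   [base, base+length), as used for delay lines in signal processing. */
static int circular_next(int I, int M, int base, int length)
{
    int off = (I - base + M) % length;
    if (off < 0) off += length;   /* C's % can be negative; wrap up */
    return base + off;
}
```

In hardware both updates happen for free as a side effect of the memory access, which is why DSP inner loops can fetch, compute, and update pointers in a single cycle.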

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.

Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority ready task). In a non-preemptive kernel it is the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: counting semaphores, which can take any non-negative integer value, and binary semaphores, which take values of either 0 or 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
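
The call sequence above can be sketched on a host machine with POSIX semaphores, which expose essentially the same create/acquire/release/query/delete operations; this is a stand-in for an RTOS-specific API, not any particular kernel's calls:

```c
#include <semaphore.h>

/* Walk through the semaphore lifecycle using the POSIX API.
   Returns the count observed after two acquires. */
static int semaphore_demo(void)
{
    sem_t sem;
    int count = -1;

    sem_init(&sem, 0, 2);        /* "create": counting semaphore, initial count 2 */
    sem_wait(&sem);              /* "acquire": count 2 -> 1 */
    sem_wait(&sem);              /* "acquire": count 1 -> 0 */
    sem_getvalue(&sem, &count);  /* "query": read the current count */
    sem_post(&sem);              /* "release": count 0 -> 1 */
    sem_destroy(&sem);           /* "delete" */
    return count;
}
```

A third `sem_wait()` here would block until some other thread posted, which is exactly the behavior an RTOS uses for resource and task synchronization.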

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, the queue length, a sending-task waiting list, and a receiving-task waiting list

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message
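
A minimal message queue can be sketched in C as a ring buffer. The structure, sizes, and function names below are our own choices for illustration, not any particular RTOS API:

```c
#include <string.h>

#define MQ_LEN  4    /* queue length: number of "mailboxes" */
#define MSG_SZ 16    /* size of one message, in bytes */

struct msg_queue {
    char buf[MQ_LEN][MSG_SZ];
    int  head, tail, count;
};

/* A task or ISR deposits a message at the tail; fails if full. */
static int mq_send(struct msg_queue *q, const char *msg)
{
    if (q->count == MQ_LEN) return -1;         /* queue full */
    strncpy(q->buf[q->tail], msg, MSG_SZ - 1);
    q->buf[q->tail][MSG_SZ - 1] = '\0';
    q->tail = (q->tail + 1) % MQ_LEN;
    q->count++;
    return 0;
}

/* A waiting task takes the oldest message from the head; fails if empty. */
static int mq_receive(struct msg_queue *q, char *out)
{
    if (q->count == 0) return -1;              /* queue empty */
    strcpy(out, q->buf[q->head]);
    q->head = (q->head + 1) % MQ_LEN;
    q->count--;
    return 0;
}
```

A real RTOS adds what this sketch omits: blocking a receiver on an empty queue, blocking (or rejecting) a sender on a full one, and waking tasks in priority or FIFO order.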

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
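
On a POSIX host the same read/write pattern can be tried with the standard `pipe()` call; an RTOS pipe exposes similar create/open/close/read/write operations. The message contents here are made up for the example:

```c
#include <unistd.h>
#include <string.h>

/* "Create" a pipe, write a message into one end, and read it back
   from the other, as one task would hand data to another.
   Returns the number of bytes read, or -1 on error. */
static ssize_t pipe_demo(char *out, size_t cap)
{
    int fd[2];
    if (pipe(fd) != 0) return -1;                    /* create: fd[0]=read end, fd[1]=write end */

    const char *msg = "sensor:42";
    if (write(fd[1], msg, strlen(msg)) < 0)          /* producer task writes to the pipe */
        return -1;

    ssize_t n = read(fd[0], out, cap);               /* consumer task reads from the pipe */
    close(fd[0]);                                    /* close both ends */
    close(fd[1]);
    return n;
}
```

Bytes come out in the same order they went in, which is the defining property that makes pipes useful for streaming one task's output into another.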

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features result in high performance, low code size, low power consumption, and low silicon area

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are

R15----------- it is program counter but can be manipulated as a general purpose register

R13----------- it is used as a stack pointer

R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register

CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F bits disable normal (IRQ) and fast (FIQ) interrupts respectively when they are set. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if a fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes
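
The CPSR layout described above can be made concrete with a few C helpers that extract the flag and mode bits from a 32-bit CPSR value (N, Z, C, and V occupy bits 31 down to 28, and the mode field occupies bits 4 down to 0; the helper names are our own):

```c
#include <stdint.h>

/* Extract the N (negative) condition flag, bit 31 of the CPSR. */
static int cpsr_n_flag(uint32_t cpsr) { return (cpsr >> 31) & 1u; }

/* Extract the Z (zero) condition flag, bit 30 of the CPSR. */
static int cpsr_z_flag(uint32_t cpsr) { return (cpsr >> 30) & 1u; }

/* Extract the 5-bit mode field, bits 4..0 of the CPSR
   (e.g. 0x10 = user mode, 0x13 = supervisor mode). */
static uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & 0x1Fu; }
```

A debugger or fault handler does essentially this when it decodes a saved CPSR to report which mode the processor was in and which condition flags were set.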

DATA TYPES

The ARM instruction set supports six different data types, namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in either little- or big-endian format. However, most ARM silicon implementations use the little-endian format

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set

ARM: it is the standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instructions: in the ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o Jumps are always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decomposed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupation

o THUMB is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
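
The shared-data problem and the atomic-section fix can be sketched in C as follows; the `disable_irq()`/`enable_irq()` stubs stand in for CPU-specific interrupt-control intrinsics and are our own names:

```c
#include <stdint.h>

/* On real hardware these would be intrinsics such as __disable_irq();
   stubbed here so the sketch compiles and runs on a host. */
static void disable_irq(void) { /* mask interrupts */ }
static void enable_irq(void)  { /* unmask interrupts */ }

/* A two-word timestamp written by a (hypothetical) timer ISR. */
volatile uint32_t seconds, milliseconds;

/* NOT atomic: if the timer interrupt fires between the two reads,
   the task can see a torn pair (new seconds, old milliseconds). */
void read_time_buggy(uint32_t *s, uint32_t *ms)
{
    *s  = seconds;
    *ms = milliseconds;
}

/* Atomic section: interrupts are disabled around both reads, so the
   ISR cannot run in the middle and the pair is always consistent. */
void read_time_safe(uint32_t *s, uint32_t *ms)
{
    disable_irq();
    *s  = seconds;
    *ms = milliseconds;
    enable_irq();
}
```

The price of the fix is added interrupt latency, which is why atomic sections should be kept as short as possible.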

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
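
A minimal illustration of these rules in C (the function names are our own): the first function updates a shared variable non-atomically and so is not reentrant; the second keeps all of its state in parameters and locals, which live on the calling task's private stack:

```c
static long calls_so_far;          /* shared among all tasks */

/* NOT reentrant: the increment is a read-modify-write on a shared
   variable; a task switch in the middle can lose an update. */
int average_bad(int a, int b)
{
    calls_so_far++;
    return (a + b) / 2;
}

/* Reentrant: every variable it touches is a parameter or a local,
   so each task's invocation is fully independent of the others. */
int average_good(int a, int b)
{
    int sum = a + b;               /* private to this invocation */
    return sum / 2;
}
```

The same reasoning is why library functions that keep hidden static state (a classic example is `strtok`) are unsafe to call from multiple tasks.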

Task states under an RTOS

Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time

Ready: it means some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
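
The legal moves between these three states can be captured as a tiny C state machine (the names are our own, not any real RTOS's):

```c
enum task_state { BLOCKED, READY, RUNNING };

/* Returns 1 if the transition is legal under the rules described in
   the text: a signal unblocks a task to READY (never straight to
   RUNNING), only the scheduler moves tasks between READY and RUNNING,
   and only a running task can block itself. */
static int legal_transition(enum task_state from, enum task_state to)
{
    if (from == BLOCKED && to == READY)   return 1;  /* event signaled */
    if (from == READY   && to == RUNNING) return 1;  /* dispatched by scheduler */
    if (from == RUNNING && to == READY)   return 1;  /* preempted */
    if (from == RUNNING && to == BLOCKED) return 1;  /* waits for an event */
    return 0;                                        /* everything else is illegal */
}
```

Note in particular that BLOCKED to RUNNING is not a legal move: an unblocked task must first become ready and then be chosen by the scheduler.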

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it goes to the blocked state before the other one runs.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on

A good set of requirements reflects

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the components and integrates them

o Testing uncovers bugs

o Maintenance entails deployment in the field, bug fixes, and upgrades

This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built.

At each level of the design the designers go through requirements construction and testing phases

The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.

Hardware/software co-verification: it allows hardware and software designs to be validated at the same time against each other. The types of techniques available are

o An instruction-level simulator may be used to debug code running on the CPU

o A cycle-level simulator tool may be used for faster simulation of parts of the system

o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail

Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose

UNIT-VI INSTRUCTION SETS

INTRODUCTION PRELIMINARIES

Instruction sets are the programmer's interface to the hardware

The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)

A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs are of Harvard architecture

Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths

o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors

o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line starting after the first column

A label which gives a name to a memory location comes at the beginning of the line starting in the first column

Comments begin with a semicolon and continue until the end of the line

Example

LDR r0,[r8] ; a comment

label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer

It supports two basic types of data

o The standard ARM word is 32 bit long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at address 4, word 2 is at 8, and so on

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
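
The difference between the two byte orders can be observed in C by viewing a 32-bit word as bytes (a host-side sketch; ARM silicon is usually configured little-endian):

```c
#include <stdint.h>

/* Store a known 32-bit pattern and return the byte at the lowest
   address. On a little-endian machine this is the low-order byte
   (0x44); on a big-endian machine it is the high-order byte (0x11). */
static uint8_t low_byte_first(void)
{
    uint32_t word = 0x11223344u;
    uint8_t *bytes = (uint8_t *)&word;  /* view the word as 4 bytes */
    return bytes[0];
}
```

This is why raw memory dumps and byte-wise network transfers must agree on endianness: the same word produces a different byte sequence on the two configurations.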

SHARC PROCESSOR

It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures

The SHARC is designed to perform floating point intensive computations

The instructions are written one per line and terminated by a semicolon

A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon

Comments start with an exclamation point and continue until the end of the line

Example

R1=DM(M0I0) R2=PM(M8I8) a comment

Label R3= R1+ R2

SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended -precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)

The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations

All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register

The CPU has three major data function units an ALU a multiplier and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT) sticky(STKY) and mode I(MODEI)

All the ALU operations set the AZ(ALU result zero) AN (ALU result negative) A V(ALU result overflow) AC (ALU fixed-point carry) and AI(floating point invalid) bits in ASTAT register

The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction

The SHARC can perform saturation arithmetic on fixed point values In saturation arithmetic an overflow results in the maximum-range value not the result of wrapping around the numeric range Saturation mode is controlled by the ALUSAT bit in the MODEI register

The multiplier performs fixed-point and floating multiplication It can also perform saturation rounding and setting the result to 0 Fixed point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there

The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The distance to shift supplied by the Ry register may be positive for left shift or negative for right shift Shift operation is set the SZ(shifter zero) SV(shifter overflow) and SS(shifter input sign) bits in the ASTAT register

The SHARC is a load-store architecture-operands must be loaded into registers before operating on them SHARC supplies two special registers that are used to control loading and storing They are called DATA ADDRESS GENERATORS(DAGs) one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAGI while registers 8 through 15 belong to DAG2

The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management

Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers

INTERRUPT SERVICE ROUTINES

Interrupt is hardware signal that informs the CPU that an important event has occurred When interrupt occurs CPU saves the contents of the register and jumps to the ISR After ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel In case of preemptive kernel highest priority task gets executed In RTOs the interrupt latency interrupt response time and interrupt recovery time are very important

Interrupt latency Max time for which interrupts are disabled plus time to start the execution of the first instruction in the ISR

Interrupt response time Time between receipt of interrupt signal and starting the code that handles the interrupt In a preemptive kernel it is equal to the sum of interrupt latency plus time to save CPU registers context

Interrupt recovery time Time required for CPU to return to the interrupted codehighest priority task is called interrupt recovery time In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and time to execute the return instruction from the interrupted instruction In a preemptive kernel it is equal to the sum of the time to check whether a high priority a high priority task is ready plus time to restore CPU context of the highest priority task plus time to execute the return instruction from the interrupt instruction

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task

Synchronization

It is just an integer These are of two types counting semaphore which will have an integer value greater than 1 and binary semaphore which takes values of either 0 or1

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
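The calls above have counterparts in most kernels. As a rough sketch (using Python's threading.Semaphore in place of any particular RTOS API), a binary semaphore protecting a shared counter looks like:

```python
import threading

lock = threading.Semaphore(1)   # "create": binary semaphore, initially free
shared_count = 0

def update_shared():
    global shared_count
    lock.acquire()              # "acquire": blocks if another task holds it
    shared_count += 1           # critical section, protected by the semaphore
    lock.release()              # "release": wakes one waiting task, if any

tasks = [threading.Thread(target=update_shared) for _ in range(4)]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(shared_count)             # 4: every update survived
```

The "delete" and "query" calls have no direct analogue here; in a real RTOS they would destroy the semaphore object and read its current count.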

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list and a receiving-task waiting list.

Some of the applications of message queues are:

o Taking input from a keyboard

o Displaying output

o Reading voltages from sensors or transducers

o Transmitting data packets in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message.
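The deposit/take pattern can be sketched with a thread-safe FIFO; here Python's queue.Queue stands in for a kernel message queue, and the names sensor_task and display_task are illustrative, not from any RTOS:

```python
import queue
import threading

msgq = queue.Queue(maxsize=8)          # queue length is fixed at creation

def sensor_task():
    for reading in (3.3, 2.9, 3.1):    # pretend ADC voltage readings
        msgq.put(reading)              # a task (or ISR) deposits a message
    msgq.put(None)                     # sentinel: no more data

def display_task(out):
    while True:
        msg = msgq.get()               # the waiting task takes the message
        if msg is None:
            break
        out.append(msg)

received = []
producer = threading.Thread(target=sensor_task)
producer.start()
display_task(received)
producer.join()
print(received)                        # [3.3, 2.9, 3.1]
```

A real kernel queue would additionally block or fail the sender when the queue is full, according to the sending-task waiting list described above.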

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
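A minimal illustration of the same calls, using an OS-level pipe (os.pipe) in place of the kernel object:

```python
import os

read_fd, write_fd = os.pipe()          # "create a pipe": returns both ends

os.write(write_fd, b"filtered-sample") # producer task writes to the pipe
os.close(write_fd)                     # close the write end when done

data = os.read(read_fd, 64)            # consumer task reads from the pipe
os.close(read_fd)                      # close the read end
print(data.decode())                   # filtered-sample
```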

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large, uniform register file with 16 general-purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow up to 16 registers to be loaded or stored at once have been introduced

o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are:

R15: the program counter, but it can be manipulated as a general-purpose register

R13: used as the stack pointer

R14: the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively; the T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of six execution modes, as follows:

o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all operating modes.

DATA TYPES

The ARM instruction set supports six data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in either little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. Multiple register load/store operations are carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction
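The three variants can be modeled in software; a sketch that masks plain Python arithmetic to mimic the 32-bit and 64-bit register results (the function names are illustrative, chosen to echo the ARM mnemonics MUL, UMULL and MLA):

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul32(a, b):              # MUL-style: integer multiply, 32-bit result
    return (a * b) & MASK32

def mull64(a, b):             # UMULL-style: long multiply, 64-bit result
    return (a * b) & MASK64

def mla(a, b, acc):           # MLA-style: multiply-accumulate, 32-bit result
    return (a * b + acc) & MASK32

print(hex(mul32(0x10000, 0x10000)))   # 0x0 -- the high bits are lost
print(hex(mull64(0x10000, 0x10000)))  # 0x100000000 -- kept in 64 bits
print(hex(mla(3, 4, 5)))              # 0x11
```

The first two lines show why the long-multiply variant exists: the same operands overflow a 32-bit result but fit comfortably in 64 bits.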

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o The instructions are 16 bits in length

o Stored in a compressed form

o The instructions are expanded into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o No MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared data problem: this arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
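The shared-data problem can be made concrete with a small sketch (plain Python; a direct function call stands in for the interrupt firing mid-read, and the clock variables are purely illustrative):

```python
seconds = 59
minutes = 0

def timer_isr():                 # pretend ISR: advances the clock
    global seconds, minutes
    seconds, minutes = 0, minutes + 1

def read_time_non_atomic():
    s = seconds
    timer_isr()                  # the interrupt fires mid-read...
    m = minutes
    return m, s                  # ...yielding a time that never existed

def read_time_atomic():
    # in a real system: disable interrupts or take a semaphore here
    m, s = minutes, seconds      # both parts read in one atomic section
    return m, s

print(read_time_non_atomic())    # (1, 59)
```

The non-atomic read reports minute 1, second 59, a combination the clock never showed; reading both halves inside an atomic section prevents this.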

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
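The rules above can be illustrated with a small sketch (Python; working only with parameters and locals corresponds to keeping variables on the calling task's stack, and the function names are invented for this example):

```python
error_count = 0                 # shared, static-lifetime variable

def count_errors_non_reentrant(code):
    global error_count
    if code != 0:
        error_count += 1        # read-modify-write of shared data:
    return error_count          # not atomic, so not reentrant

def count_errors_reentrant(code, count):
    if code != 0:
        count += 1              # touches only parameters and locals
    return count                # (the caller's "stack"), so reentrant

total = 0
for c in (1, 0, 2):
    total = count_errors_reentrant(c, total)
print(total)                    # 2
```

If two tasks called the first version and the RTOS switched between them in the middle of the increment, an update could be lost; the second version has no shared state to corrupt.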

States of RTOS

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it blocks before the other gets the processor.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process.

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

There are two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps are performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the components and integrates them

o Testing uncovers the bugs

o Maintenance entails deployment in the field, bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during the early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of suppliers' capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.

o Drawbacks include difficulty in instruction pipelining and longer clock cycles, making them unsuitable for high-performance processors

Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be executed efficiently in pipelined processors.

o The RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction and efficient pipelining of instruction streams

o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations

ARM PROCESSOR

It is a family of RISC architectures

The instructions are written one per line, starting after the first column.

A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.

Comments begin with a semicolon and continue until the end of the line.

Example

LDR r0,[r8] ; a comment

label ADD r4,r0,r1

Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.

It supports two basic types of data

o The standard ARM word is 32 bits long

o The word may be divided into four 8-bit bytes

An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.

The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).

SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word 32 bits and an address 32 bits.

The SHARC supports the following data types:

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY) and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU overflow), AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT bits but are not cleared by later operations; they remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
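Saturating addition can be modeled in software; a sketch of the 32-bit behavior selected by the ALUSAT bit (the function name sat_add32 is invented for this illustration):

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def sat_add32(a, b):
    """Add two 32-bit values; clamp instead of wrapping on overflow."""
    r = a + b
    if r > INT32_MAX:
        return INT32_MAX        # positive overflow saturates to the max
    if r < INT32_MIN:
        return INT32_MIN        # negative overflow saturates to the min
    return r

print(sat_add32(INT32_MAX, 1))  # 2147483647, not the wrapped -2147483648
print(sat_add32(-5, 3))         # -2
```

Clamping is usually the right behavior for signal samples: a slightly-too-loud sample should clip, not flip sign.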

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets to control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address.

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in it.

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register provides the address and is then updated by the modifier value.
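Post-modify with update can be modeled with a toy I/M register pair (the DAG class here is illustrative, not an Analog Devices API):

```python
class DAG:
    """Toy model of one SHARC data-address-generator I/M register pair."""
    def __init__(self, i, m):
        self.i = i                       # I register: current address
        self.m = m                       # M register: modifier

    def post_modify(self):
        addr = self.i                    # use the current address...
        self.i += self.m                 # ...then update I by the modifier
        return addr

dag = DAG(i=0x1000, m=4)                 # sweep one 4-byte word at a time
print([hex(dag.post_modify()) for _ in range(3)])
# ['0x1000', '0x1004', '0x1008']
```

Each access returns the old address and advances I, which is exactly what lets a loop sweep an array with no separate address arithmetic.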

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier (offset).

The DAGs also support circular buffers, which are commonly used in signal processing.

The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
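Bit-reversal addressing can be modeled directly; a sketch of reversing the low bits of an index, as a DAG would when generating FFT access addresses:

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index` (models DAG bit-reversal)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)   # shift the current LSB into the result
        index >>= 1
    return out

# The access order an 8-point FFT would see:
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Hardware support matters here because computing this permutation in software would otherwise cost a loop per access in the FFT's inner stages.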


Under most RTOS a task is simply a subroutine

At some point in program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine the starting point for each task and some other parameters like taskrsquos priority memory location for task stack etc

Most RTOS allows as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context which includes the register variable a program counter and a stack All other data which includes Global static initialized un-initialized and everything else is shared among all the tasks in the system

Since several data variable are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs leading to shared data problem

Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic

Atomic section is a part of program which can not be interrupted

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

States of RTOS

Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time

Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state

Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well

Scheduler

It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state

Tasks and IR move tasks from blocked state when a task is blocked it never gets microprocessor An IR or some other task in the system must be able to send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever

Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler

How does the scheduler know when a task has become blocked or unblocked

RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one

If a higher priority task unblocks what happens to the running task

In preemptive RTOS lower priority task is stopped as soon as an higher priority task unblocks

In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent description of the system that can be used to create architecture

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional A functional requirements states what the system must do such as compute

an FFT

o Non-functional A non-functional requirements can be any no of attributes including

physical size cost power consumption design time reliability and so on

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o traceability

Design methodology

It refers to ldquothe sequence of steps necessary to build some thing useful The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality Customers not only want their products fast and cheap they also want them to be of

right quality

Design flow

A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand

Waterfall model

It was introduced by ROYCE and it is the first model proposed for the software development process

This model has five major phases

o Requirements Analysis and determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It not suitable where design entails experimentation and changes that require bottom up feedback

Nowadays it is considered as an unrealistic design process

Spiral model

It is an alternative model for the software development

Waterfall model assumes that the system is built once in its entirety the spiral model assumes that the several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles over the top of the spiral are very small and short while the final cycles at the spirals bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is with too many spirals it may take too long when design time is a major requirement

Its advantage is It adopts successive refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate ldquo over-the-well rdquo design steps

Concurrent engineering effort

Cross functional teams include member from various disciplines involved in the process including manufacturing HWSW design marketing and so forth

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimization of the chance that concurrent product realization will lead to sub-process

Integrated project management ensures that some one is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets customerrsquos needs


SHARC PROCESSOR

It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.

The SHARC is designed to perform floating-point-intensive computations.

The instructions are written one per line and terminated by a semicolon.

A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.

Comments start with an exclamation point and continue until the end of the line.

Example

R1=DM(M0,I0), R2=PM(M8,I8); ! a comment

Label: R3=R1+R2;

The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits; a basic data word is 32 bits and an address is 32 bits.

The SHARC supports the following types of data

32-bit IEEE single-precision floating point

40-bit IEEE extended-precision floating point

32-bit integers

The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).

The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.

Data operations

The programming model for the SHARC is rather large and complex

The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.

All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.

The CPU has three major data function units: an ALU, a multiplier and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY) and mode 1 (MODE1).

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register.

The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared. STKY bits remain set until cleared by an instruction.

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
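The difference between wrap-around and saturation can be sketched in C. This is an illustration only, not SHARC code; the function name is invented.

```c
#include <stdint.h>

/* Saturating 32-bit signed addition: on overflow the result clamps to the
   maximum-range value instead of wrapping around, which is what the SHARC
   does for fixed-point ALU results when the ALUSAT bit in MODE1 is set. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;  /* compute without overflow */
    if (wide > INT32_MAX) return INT32_MAX;  /* clamp positive overflow  */
    if (wide < INT32_MIN) return INT32_MIN;  /* clamp negative overflow  */
    return (int32_t)wide;                    /* in range: normal result  */
}
```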

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.

The shifter performs several operations. Logical shifts fill with zeroes while arithmetic shifts copy the sign bit. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register.

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory.

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.

DAGs provide the following addressing modes

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.

The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier (offset).

The DAGs also support circular buffers which are commonly used in signal processing
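The post-modify and circular-buffer behavior described above can be sketched in C. The names and data layout are invented for illustration; this is not how a DAG is actually programmed.

```c
/* Sketch of DAG-style post-modify addressing on a circular buffer:
   'i' plays the role of the I register, 'm' the modifier, and the
   index wraps within [base, base+len). */
typedef struct {
    int base;  /* first address of the buffer      */
    int len;   /* buffer length in words           */
    int i;     /* current address (the I register) */
} circ_t;

/* Return the address to access now, then post-modify I with wraparound. */
int circ_post_modify(circ_t *c, int m) {
    int addr = c->i;
    int next = c->i + m - c->base;
    next = ((next % c->len) + c->len) % c->len;  /* wrap, even for m < 0 */
    c->i = c->base + next;
    return addr;
}
```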

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling and time management.

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals and timers.

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority ready task gets executed. In an RTOS the interrupt latency, interrupt response time and interrupt recovery time are very important.

Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start execution of the first instruction in the ISR.

Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.

Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest priority ready task (in a preemptive kernel). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher priority task is ready, plus the time to restore the CPU context of the highest priority task, plus the time to execute the return-from-interrupt instruction.
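The three quantities combine by simple addition; a minimal sketch with purely illustrative figures (the numbers are assumptions, not from any datasheet):

```c
/* The interrupt timing quantities, combined per the definitions above.
   All arguments are illustrative microsecond figures, not real data. */
int interrupt_latency(int max_disabled, int start_first_instr) {
    return max_disabled + start_first_instr;
}
int response_time_preemptive(int latency, int save_context) {
    return latency + save_context;     /* latency + context-save time */
}
int recovery_time_preemptive(int check_ready, int restore_context, int ret_instr) {
    return check_ready + restore_context + ret_instr;
}
```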

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

A semaphore is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes the values 0 and 1.

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
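A toy, single-threaded model of these calls may help (invented names; a real RTOS semaphore would block the calling task in "acquire" rather than return failure, and "delete" would also release any waiting tasks):

```c
/* Toy counting semaphore modeling the calls listed above. */
typedef struct { int count; } toy_sem;

void toy_sem_create(toy_sem *s, int initial) { s->count = initial; }

/* Acquire: decrement if possible; returns 1 on success, 0 if the
   caller would have had to block. */
int toy_sem_acquire(toy_sem *s) {
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

void toy_sem_release(toy_sem *s) { s->count++; }

int toy_sem_query(const toy_sem *s) { return s->count; }
```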

MESSAGE QUEUES

A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list and a receiving-task waiting list.

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; based on the application, the highest priority task or the first task waiting in the queue takes the message.
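The array-of-mailboxes view can be sketched as a fixed-size ring buffer in C (invented names; real RTOS queues also manage the two waiting lists described above):

```c
/* Minimal fixed-size message queue of integers, FIFO order. */
#define QLEN 4

typedef struct {
    int buf[QLEN];
    int head, tail, count;
} msgq_t;

void q_init(msgq_t *q) { q->head = q->tail = q->count = 0; }

/* A task or ISR deposits a message; returns 0 if the queue is full. */
int q_send(msgq_t *q, int msg) {
    if (q->count == QLEN) return 0;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

/* The waiting task takes the oldest message; returns 0 if empty. */
int q_receive(msgq_t *q, int *msg) {
    if (q->count == 0) return 0;
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```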

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
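On a desktop OS, the analogous POSIX pipe() call can illustrate the write/read pattern (a POSIX sketch for illustration, not an RTOS API; the function name and message are invented):

```c
#include <string.h>
#include <unistd.h>

/* One "task" writes into the pipe, another reads from it. */
int pipe_demo(char *out, int outlen) {
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0) return -1;
    const char *msg = "sensor:42";
    write(fds[1], msg, strlen(msg));        /* producer task writes  */
    int n = (int)read(fds[0], out, outlen - 1);  /* consumer task reads */
    out[n < 0 ? 0 : n] = '\0';
    close(fds[0]);
    close(fds[1]);
    return n;                         /* number of bytes transferred */
}
```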

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:

o A large uniform register file with 16 general purpose registers

o A load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform, fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions that allow up to 16 registers to be loaded/stored at once have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15----------- it is the program counter but can be manipulated as a general-purpose register

R13----------- it is used as a stack pointer

R14----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows

o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode(FIQ) supports high speed interrupt handling

o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system

o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: it is entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instructions can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o THUMB instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode

Advantages of THUMB

o Higher code density

o Less power consumption

o Less space occupied

o THUMB code is faster when memory is organized as 16-bit; ARM code, however, is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and task code (or several tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.
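A minimal C sketch of the shared-data problem, with the "interrupt" simulated by a direct call between the task's two reads (a hypothetical example; a real ISR would preempt asynchronously):

```c
/* Shared data that the interrupt routine always updates together. */
static int temps[2] = {70, 70};

/* Simulated interrupt routine. */
static void isr_update(int t) {
    temps[0] = t;
    temps[1] = t;
}

/* Task code reads the pair non-atomically; returns 1 if the interrupt
   firing between the two reads made the task see inconsistent values. */
int torn_read_demo(void) {
    int t0 = temps[0];
    isr_update(75);     /* interrupt arrives between the two reads */
    int t1 = temps[1];
    return t0 != t1;
}
```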

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
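These rules can be illustrated with a non-reentrant function and a reentrant rewrite (invented names; a sketch only):

```c
/* Non-reentrant: 'total' is shared, and 'total += x' is not atomic, so two
   tasks calling this concurrently could corrupt it. */
static int total = 0;
int add_to_total(int x) { total += x; return total; }

/* Reentrant version: all state lives in caller-private storage (the
   caller's accumulator and the stack), so any task can use it safely. */
int add_to(int *accumulator, int x) {
    *accumulator += x;
    return *accumulator;
}
```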

Task states

Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time.

Ready: it means some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.

Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and ISRs move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
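The selection rule (run the highest priority task that is not blocked) can be sketched in C; the names and data layout are invented for illustration:

```c
/* Task states as described above. */
enum state { BLOCKED, READY, RUNNING };

typedef struct {
    int priority;     /* convention here: higher number = higher priority */
    enum state st;
} task_t;

/* Return the index of the task to run, or -1 if all tasks are blocked
   (in which case a real scheduler spins waiting for an interrupt). */
int schedule(task_t *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (tasks[i].st == BLOCKED) continue;   /* blocked tasks never run */
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;
    }
    return best;
}
```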

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task runs until it blocks before the other gets the processor.

If a higher priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are a more detailed, precise and consistent description of the system that can be used to create an architecture.

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis determines the basic characteristics of the system

o Architecture design decomposes the functionality into major components

o Coding implements the pieces and integrates them

o Testing uncovers bugs

o Maintenance entails deployment in the field, bug fixes and upgrades

This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It is not suitable where the design entails experimentation and changes that require bottom-up feedback

Nowadays it is considered as an unrealistic design process

Spiral model

It is an alternative model for the software development

While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint

Its advantage is that it adopts a successive-refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate "over-the-wall" design steps

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs

All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register

The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; they remain set until cleared by an instruction

The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
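
The effect of saturation mode can be sketched in C (an illustrative model, not SHARC code; the function name is invented for the example):

```c
#include <stdint.h>

/* Illustrative model of a 32-bit signed saturating add, mimicking what
   the ALU does when the ALUSAT mode bit is set: on overflow, clamp to
   the maximum-range value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* compute without overflow */
    if (wide > INT32_MAX) return INT32_MAX;   /* clamp to maximum value   */
    if (wide < INT32_MIN) return INT32_MIN;   /* clamp to minimum value   */
    return (int32_t)wide;                     /* in range: normal result  */
}
```

For example, `sat_add32(INT32_MAX, 1)` returns `INT32_MAX` rather than wrapping to a large negative number.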

The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there

The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy the sign bit. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
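
The logical/arithmetic distinction can be shown with a small C sketch (illustrative; note that C's right shift of a negative signed value is implementation-defined, though mainstream compilers shift arithmetically):

```c
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeroes. */
uint32_t shift_logical_right(uint32_t x, int n)
{
    return x >> n;                 /* unsigned shift always fills with 0 */
}

/* Arithmetic right shift: vacated high bits are copies of the sign bit.
   On mainstream compilers, a signed right shift behaves this way. */
int32_t shift_arith_right(int32_t x, int n)
{
    return x >> n;                 /* sign bit is propagated downward */
}
```

So shifting `0x80000000` logically right by 4 gives `0x08000000`, while shifting the signed value -16 arithmetically right by 2 gives -4.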

The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory

Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2

The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches

The DAGs provide the following addressing modes:

The simplest addressing mode provides an immediate value that can represent an address

An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction

A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value

The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
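
The post-modify and circular-buffer modes above can be modeled in C as follows (a sketch with invented names, not DSP intrinsics; the struct loosely mirrors the DAG's I, M, and length registers):

```c
#include <stddef.h>

/* Toy model of a DAG performing post-modify-with-update addressing over
   a circular buffer: fetch the element at index I, then advance I by the
   modifier M, wrapping around the buffer length. */
typedef struct {
    float  *base;   /* start of the buffer                       */
    size_t  len;    /* buffer length (circular-buffer L register) */
    size_t  i;      /* current index (I register)                 */
    size_t  m;      /* modifier/stride (M register)               */
} dag_t;

/* Post-modify: return the element at I, then update I by M with wrap. */
float dag_fetch(dag_t *d)
{
    float v = d->base[d->i];
    d->i = (d->i + d->m) % d->len;   /* circular-buffer wraparound */
    return v;
}
```

Sweeping a 4-element buffer with stride 3 visits indices 0, 3, 2, 1 and then wraps back to 0, which is exactly the access pattern used for delay lines and FIR filters in signal processing.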

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management

The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers

INTERRUPT SERVICE ROUTINES

An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important

Interrupt latency: the maximum time for which interrupts are disabled plus the time to start the execution of the first instruction in the ISR

Interrupt response time: the time between receipt of the interrupt signal and starting the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context

Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel, it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task synchronization.

It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore
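
These calls can be illustrated with a toy, single-threaded model of a counting semaphore (invented names; a real RTOS implements acquire/release with task blocking and waiting lists rather than a plain failure return):

```c
/* Toy model of a counting semaphore showing the create/acquire/release
   semantics. A real RTOS (e.g. its SemaphoreCreate/SemaphorePend calls,
   whose exact names vary by vendor) would block the calling task instead
   of returning failure when the count is zero. */
typedef struct { int count; } toy_sem_t;

void toy_sem_create(toy_sem_t *s, int initial) { s->count = initial; }

int toy_sem_acquire(toy_sem_t *s)   /* returns 1 on success, 0 if busy */
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;                       /* real RTOS: put task on waiting list */
}

void toy_sem_release(toy_sem_t *s)  { s->count++; }
```

Created with an initial count of 1, this behaves as a binary semaphore: the first acquire succeeds, a second acquire fails until a release occurs.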

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list

Some of the applications of message queue are

o Taking input from a keyboard

o Displaying output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications, a task or an ISR deposits the message and, based on the application, the highest-priority task or the first task waiting in the queue can take the message
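
The "array of mailboxes" view can be sketched as a toy fixed-size queue in C (invented names; a real RTOS adds the waiting lists and blocks tasks when the queue is full or empty):

```c
#include <string.h>

#define QLEN     4     /* queue length fixed at creation time  */
#define MSG_SIZE 16    /* size of one mailbox slot             */

/* Toy message queue: an array of mailbox slots used circularly. */
typedef struct {
    char slots[QLEN][MSG_SIZE];
    int  head, tail, count;
} msgq_t;

int mq_send_toy(msgq_t *q, const char *msg)
{
    if (q->count == QLEN) return -1;     /* full: real RTOS blocks the sender   */
    strncpy(q->slots[q->tail], msg, MSG_SIZE - 1);
    q->slots[q->tail][MSG_SIZE - 1] = '\0';
    q->tail = (q->tail + 1) % QLEN;      /* advance to the next mailbox slot    */
    q->count++;
    return 0;
}

int mq_receive_toy(msgq_t *q, char *out)
{
    if (q->count == 0) return -1;        /* empty: real RTOS blocks the receiver */
    strcpy(out, q->slots[q->head]);
    q->head = (q->head + 1) % QLEN;      /* oldest message is delivered first    */
    q->count--;
    return 0;
}
```

A keyboard ISR, for instance, would call the send side with each keystroke, and the display task would drain the queue on the receive side.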

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write to the pipe
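
RTOS pipe APIs differ by vendor, but the create/write/read pattern above can be illustrated with the POSIX pipe() call (a sketch; an embedded RTOS would use its own function names):

```c
#include <unistd.h>
#include <string.h>

/* Create a pipe, write a message into one end (the producer task's role),
   and read it back from the other end (the consumer task's role). */
int pipe_demo(char *out, size_t n)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;           /* create: fds[0] = read end,
                                                fds[1] = write end          */
    const char *msg = "sensor=42";
    (void)write(fds[1], msg, strlen(msg) + 1); /* producer writes the data  */
    (void)read(fds[0], out, n);                /* consumer reads same bytes */
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

Because the kernel buffers the pipe, the write completes even before the reader runs, which is what lets one task hand data to another asynchronously.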

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are

o Large uniform register file with 16 general purpose registers

o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding

These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and still improve performance, a number of non-RISC features have been introduced:

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple load/store instructions, which allow loading/storing of up to 16 registers at once, have been introduced

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code

All these features have resulted in high performance, low code size, low power consumption, and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15 - it is the program counter, but it can be manipulated as a general-purpose register

R13 - it is used as the stack pointer

R14 - it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register

CPSR (Current Program Status Register) - an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor

The I and F flags mask (disable) normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets
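
The four condition flags occupy the top bits of the CPSR (N = bit 31, Z = bit 30, C = bit 29, V = bit 28 in the ARM architecture); decoding them can be sketched in C:

```c
#include <stdint.h>

/* Extract the four 1-bit condition flags from a CPSR value.
   Bit positions follow the ARM architecture: N=31, Z=30, C=29, V=28. */
typedef struct { int n, z, c, v; } cpsr_flags_t;

cpsr_flags_t decode_cpsr(uint32_t cpsr)
{
    cpsr_flags_t f;
    f.n = (cpsr >> 31) & 1;   /* negative */
    f.z = (cpsr >> 30) & 1;   /* zero     */
    f.c = (cpsr >> 29) & 1;   /* carry    */
    f.v = (cpsr >> 28) & 1;   /* overflow */
    return f;
}
```

For example, a CPSR value of 0x60000000 decodes to Z and C set, N and V clear.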

The mode field selects one of the six execution modes as follows

o User mode: used to run application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operating modes

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format

ARM instruction sets

It has two instruction sets 32-bit ARM and 16-bit THUMB

ARM: it is the standard 32-bit instruction set

Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value

o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either

o Any subset of the current bank of registers (default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n

Execution of the instruction causes the SWI exception handler to be called

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally

Branch instructions: in the ARM processor, the branch instructions have the following features

o All the branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o THUMB instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Lower power consumption

o Less space occupied

o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions

RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems

Disadvantages

The RTOS itself uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority, the memory location for the task's stack, and so on.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem

Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted
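
The classic instance can be sketched in C (illustrative; the interrupt-control functions are stand-ins for the target's real intrinsics, stubbed out here so the sketch is self-contained):

```c
#include <stdint.h>

/* Two halves of a timer value updated together by a (hypothetical) ISR. */
volatile uint16_t time_hi, time_lo;

/* Stubs standing in for the CPU's real interrupt-control intrinsics
   (e.g. __disable_irq()/__enable_irq() on ARM). */
static void disable_interrupts(void) { /* stub for the example */ }
static void enable_interrupts(void)  { /* stub for the example */ }

/* BUGGY: not atomic. The ISR may fire between the two reads, so the
   returned high and low halves can come from different time values. */
uint32_t read_time_unsafe(void)
{
    return ((uint32_t)time_hi << 16) | time_lo;
}

/* FIXED: the two reads form an atomic section that the ISR cannot split. */
uint32_t read_time_atomic(void)
{
    uint32_t t;
    disable_interrupts();    /* nothing can interrupt this section */
    t = ((uint32_t)time_hi << 16) | time_lo;
    enable_interrupts();
    return t;
}
```

The bug only shows up when the interrupt lands between the two reads, which makes it intermittent and hard to reproduce; making the section atomic removes the window entirely.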

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way
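
The first rule can be illustrated with a small C sketch (invented function names): the first function is non-reentrant because it modifies shared data non-atomically; the second is reentrant because it touches only its arguments and locals, which live on the calling task's stack.

```c
/* Shared (static) state: every caller sees and modifies the same variable. */
static int shared_total;

/* NON-reentrant: the read-modify-write of shared_total is not atomic, so
   a task switch in the middle can corrupt the total. */
int add_to_total(int x)
{
    shared_total += x;
    return shared_total;
}

/* Reentrant: operates only on parameters and locals (stack data private
   to the calling task), so any number of tasks may call it concurrently. */
int add_local(int total, int x)
{
    return total + x;
}
```

If two tasks call `add_to_total` and the RTOS switches between them mid-update, one increment can be lost; `add_local` has no such window because nothing is shared.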

States of RTOS

Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time

Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state

Blocked: it means this task has not got anything to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well

Scheduler

It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state

The task that is assigned the highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, and the scheduler, cannot decide for a task to go into the blocked state

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever

The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler

How does the scheduler know when a task has become blocked or unblocked

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time-slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other gets the processor

If a higher priority task unblocks what happens to the running task

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
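
The scheduler's core decision, picking the highest-priority ready task, can be modeled with a short C sketch (invented names, not a particular RTOS's structures):

```c
/* Toy model of a priority scheduler's decision: among all READY tasks,
   the one with the highest priority is chosen to run next. */
enum state { BLOCKED, READY, RUNNING };

typedef struct {
    int        priority;   /* larger number = higher priority */
    enum state st;
} task_t;

/* Return the index of the highest-priority READY task, or -1 if every
   task is blocked (the RTOS then idles until an interrupt unblocks one). */
int schedule(const task_t tasks[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (tasks[i].st == READY &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = i;
    return best;
}
```

In a preemptive kernel, this decision is re-evaluated whenever a task unblocks, so a newly ready high-priority task displaces the running one immediately; in a non-preemptive kernel, it is re-evaluated only when the running task blocks.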

Page 36: EmbdedSysts.doc

The DAGs also support circular buffers which are commonly used in signal processing

The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform

UNIT-V RTOS CONCEPTS

ARCHITECTURE OF THE KERNEL

The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management

Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers

INTERRUPT SERVICE ROUTINES

Interrupt is hardware signal that informs the CPU that an important event has occurred When interrupt occurs CPU saves the contents of the register and jumps to the ISR After ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel In case of preemptive kernel highest priority task gets executed In RTOs the interrupt latency interrupt response time and interrupt recovery time are very important

Interrupt latency Max time for which interrupts are disabled plus time to start the execution of the first instruction in the ISR

Interrupt response time Time between receipt of interrupt signal and starting the code that handles the interrupt In a preemptive kernel it is equal to the sum of interrupt latency plus time to save CPU registers context

Interrupt recovery time Time required for CPU to return to the interrupted codehighest priority task is called interrupt recovery time In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and time to execute the return instruction from the interrupted instruction In a preemptive kernel it is equal to the sum of the time to check whether a high priority a high priority task is ready plus time to restore CPU context of the highest priority task plus time to execute the return instruction from the interrupt instruction

SEMAPHORES

It is a kernel object that is used for both resource synchronization and task

Synchronization

It is just an integer These are of two types counting semaphore which will have an integer value greater than 1 and binary semaphore which takes values of either 0 or1

Semaphore management function calls

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

Message queue can be considered as an array of mailboxes At the time of creating a queue it is given a name or ID queue length sending task waiting list and waiting task waiting list

Some of the applications of message queue are

o Taking the input from a keyboard

o To display output

o Reading voltages from sensors or transducers

o Data packet transmission in a network

In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message

PIPES

Pipe is kernel object for inter task communication It is used to send output of one task as input to another task for further processing

Pipe management function calls

o Create a Pipe

o Open a Pipe

o Close a pipe

o Read from the pipe

o Write from the pipe

INSTRUCTION SET ARCHITECTURE ISA

ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are

o Large uniform register file with 16 general purpose registers

o Loadstore architecture the instructions that process data operate only on the registers and are

separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed length fields All the ARM instructions are 32-bit long and most of them have

a regular three operand encoding

These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been

introduced

o Conditional execution of instructions has been introduced Instruction opcodes is preceded by a

4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15----------- it is program counter but can be manipulated as a general purpose register

R13----------- it is used as a stack pointer

R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register

CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor

The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets

The mode field selects one of the six execution modes as follows

o User mode It is used to run the application code Once in user mode the CPSR can not be

written to

o Fast interrupt processing mode(FIQ) supports high speed interrupt handling

o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system

o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt

instruction

o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM

instruction or a coprocessor instruction

o Abort mode it is entered in response to memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format

ARM instruction sets

ARM has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations (addition, subtraction, multiplication, etc.) and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instructions can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional.

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions move 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations, on the other hand, are carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access.

Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o any subset of the current bank of registers (the default), or

o any subset of the user bank of registers, when in a privileged mode.

Multiplication instructions: ARM provides several versions of multiplication. These are:

o integer multiplication (32-bit result)

o long integer multiplication (64-bit result)

o multiply-accumulate instructions

Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n. Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor the branch instructions have the following features:

o All branches are relative to the program counter.

o A jump is always within a limit of ±32 MB.

o Conditional branches are formed by using the condition codes.

o The subroutine call instruction is also modeled as a variant of the branch instruction.

THUMB

o THUMB instructions are 16 bits in length and are stored in a compressed form.

o The instructions are decompressed into ARM instructions and then executed by the processor.

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction.

Differences from ARM:

o THUMB instructions are executed unconditionally, except for the branch instructions.

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set.

o There are no MSR and MRS instructions.

o The maximum number of SWI calls is restricted to 256.

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode.

Advantages of THUMB:

o Higher code density.

o Lower power consumption.

o Less memory occupied.

o THUMB is faster when memory is organized as 16-bit; ARM, however, is faster when memory is organized as 32-bit.

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.

The differences between this architecture and the previous ones are:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs a task is simply a subroutine.

At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority, the memory location for the task's stack, and so on.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTIONS

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.

Task states

Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

The scheduler is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state. The task that is assigned the highest priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks with the same priority are ready?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other is started.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks. In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process.

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

A design methodology is "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality.

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

Introduced by Royce, it was the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis determines the basic characteristics of the system.

o Architecture design decomposes the functionality into major components.

o Coding implements the components and integrates them.

o Testing uncovers bugs.

o Maintenance entails deployment in the field, bug fixes, and upgrades.

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

The spiral model is an alternative model for software development.

Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage: with too many spirals, it may take too long when design time is a major constraint.

Its advantage: it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.

It tries to eliminate "over-the-wall" design steps.

Elements of a concurrent engineering effort:

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.

Page 37: EmbdedSysts.doc

A semaphore is just an integer. Semaphores are of two types: a counting semaphore, whose value can be any non-negative integer (including values greater than 1), and a binary semaphore, which takes only the values 0 or 1.

Semaphore management function calls:

o Create a semaphore

o Delete a semaphore

o Acquire a semaphore

o Release a semaphore

o Query a semaphore

MESSAGE QUEUES

A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.

Some applications of message queues are:

o taking input from a keyboard

o displaying output

o reading voltages from sensors or transducers

o data packet transmission in a network

In each of these applications, a task or an ISR deposits the message; depending on the application, the highest-priority task or the first task waiting in the queue takes the message.

PIPES

A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.

Pipe management function calls:

o Create a pipe

o Open a pipe

o Close a pipe

o Read from the pipe

o Write to the pipe

INSTRUCTION SET ARCHITECTURE (ISA)

ARM is in most respects a typical RISC architecture, but with several enhancements that improve performance further. The RISC features present are:

o A large uniform register file with 16 general-purpose registers.

o A load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory.

o Simple addressing modes.

o Uniform and fixed-length instruction fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding.

These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:

o Each instruction controls both the ALU and the shifter, making the instructions more powerful.

o Auto-increment and auto-decrement addressing modes have been incorporated.

o Multiple load/store instructions, which allow up to 16 registers to be loaded or stored at once, have been introduced.

o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code.

All these features result in high performance, small code size, low power consumption, and low silicon area.

REGISTERS

The ARM ISA has 16 general-purpose registers in user mode. They are:

R15: the program counter, but it can be manipulated as a general-purpose register.

R13: used as the stack pointer.

R14: has special significance and is called the link register. When a procedure call is made, the return address is automatically stored into this register.

CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor.

The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets

The mode field selects one of the six execution modes as follows

o User mode It is used to run the application code Once in user mode the CPSR can not be

written to

o Fast interrupt processing mode(FIQ) supports high speed interrupt handling

o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system

o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt

instruction

o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM

instruction or a coprocessor instruction

o Abort mode it is entered in response to memory fault

The user registers R0 to R7 are common to all the operation modes

DATA TYPES

ARM instruction set supports six different data types namely

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format

ARM instruction sets

It has two instruction sets 32-bit ARM and 16-bit THUMB

ARM It is standard 32-bit instruction set

Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value

o Interesting feature of the ARM architecture is that the modification of the

condition flags by arithmetic instructions is optional

Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privilege mode

Multiplication instructions ARM provide several versions of multiplications These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n

Execution of the instruction causes SWI exception handler to be called

Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o Jump is always with in a limit of plusmnMB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decomposed into ARM instructions and then executed

by the processor

o THUMB instruction set must always be entered running BXBLX (Branch

Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally excepting the branch

instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15

A reduced no of instructions can access the full register set

o No MSR and MRS instructions

o Maximum no of SWI calls is restricted to 256

o On reset and on raising of an exception the processor always enters into the

ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized in 16-bit However ARM is faster

when the memory is organized in 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose

No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another

Advantages

RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions

RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems

Disadvantages

RTOS it self uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is task

Under most RTOS a task is simply a subroutine

At some point in program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine the starting point for each task and some other parameters like taskrsquos priority memory location for task stack etc

Most RTOS allows as many tasks as we need

TASKS AND DATA

Each of the tasks has its own private context which includes the register variable a program counter and a stack All other data which includes Global static initialized un-initialized and everything else is shared among all the tasks in the system

Since several data variable are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs leading to shared data problem

Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic

Atomic section is a part of program which can not be interrupted

REENTRANT FUNCTION

These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

States of RTOS

Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time

Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state

Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well

Scheduler

It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state

Tasks and IR move tasks from blocked state when a task is blocked it never gets microprocessor An IR or some other task in the system must be able to send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever

Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler

How does the scheduler know when a task has become blocked or unblocked

RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one

If a higher priority task unblocks what happens to the running task

In preemptive RTOS lower priority task is stopped as soon as an higher priority task unblocks

In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent description of the system that can be used to create architecture

Both requirements and specifications are however directed to the outward behavior of the system not its internal structure

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional A functional requirements states what the system must do such as compute

an FFT

o Non-functional A non-functional requirements can be any no of attributes including

physical size cost power consumption design time reliability and so on

A good set of requirements reflect

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o traceability

Design methodology

It refers to ldquothe sequence of steps necessary to build some thing useful The goals of the design process are

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality Customers not only want their products fast and cheap they also want them to be of

right quality

Design flow

A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand

Waterfall model

It was introduced by ROYCE and it is the first model proposed for the software development process

This model has five major phases

o Requirements Analysis and determines the basic characteristics of the system

o Architecture Design decomposes the functionality into major components

o Coding Implements the process and integrates them

o Testing Uncovers the bugs

o Maintenance Entails deployment in the field bug fixes and upgrades

This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps

It is ideal during early design phases since it implies good foreknowledge of the implementation

It not suitable where design entails experimentation and changes that require bottom up feedback

Nowadays it is considered as an unrealistic design process

Spiral model

It is an alternative model for the software development

Waterfall model assumes that the system is built once in its entirety the spiral model assumes that the several versions of the system will be built

At each level of the design the designers go through requirements construction and testing phases

The first cycles over the top of the spiral are very small and short while the final cycles at the spirals bottom add detail learned from the earlier cycles

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is with too many spirals it may take too long when design time is a major requirement

Its advantage is It adopts successive refinement approach

CONCURRENT ENGINEERING

Its important goals are reduced design time increased reliability and performance reduced power consumption etc

It tries to eliminate ldquo over-the-well rdquo design steps

Concurrent engineering effort

Cross functional teams include member from various disciplines involved in the process including manufacturing HWSW design marketing and so forth

Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time

Incremental information sharing and use helps minimization of the chance that concurrent product realization will lead to sub-process

Integrated project management ensures that some one is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets customerrsquos needs

  • 7 No latency issues (other than waiting for other devices to be serviced)
  • 1048708 How could you fix
  • ROUND ROBIN
    • Process scheduling
    • Data packet scheduling
      • Scheduling (computing)
        • Types of operating system schedulers
          • Long-term scheduling
          • Medium-term scheduling
          • Short-term scheduling
          • Dispatcher
            • Scheduling disciplines
              • First in first out
              • Shortest remaining time
              • Fixed priority pre-emptive scheduling
              • Round-robin scheduling
              • Multilevel queue scheduling
              • Overview
Page 38: EmbdedSysts.doc

o Read from the pipe

o Write from the pipe

INSTRUCTION SET ARCHITECTURE ISA

ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are

o Large uniform register file with 16 general purpose registers

o Loadstore architecture the instructions that process data operate only on the registers and are

separate from the instructions that access memory

o Simple addressing modes

o Uniform and fixed length fields All the ARM instructions are 32-bit long and most of them have

a regular three operand encoding

These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced

o Each instruction controls the ALU and the shifter thus making the instructions more powerful

o Auto-increment and auto-decrement addressing modes have been incorporated

o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been

introduced

o Conditional execution of instructions has been introduced Instruction opcodes is preceded by a

4-bit condition code

All these features have resulted in high performance low code size low power consumption and low silicon area

REGISTERS

The ARM-ISA has 16 general purpose registers in the user mode They are

R15----------- it is program counter but can be manipulated as a general purpose register

R13----------- it is used as a stack pointer

R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register

CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor

The I and F bits control the masking of normal (IRQ) and fast (FIQ) interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.

The mode field selects one of the six execution modes as follows:

o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to

o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling

o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system

o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction

o Undefined instruction mode (UNDEF): entered if the fetched opcode is neither an ARM instruction nor a coprocessor instruction

o Abort mode: entered in response to a memory fault

The user registers R0 to R7 are common to all the operation modes.

DATA TYPES

The ARM instruction set supports six different data types, namely:

8-bit signed and unsigned

16-bit signed and unsigned

32-bit signed and unsigned

The ARM instruction set has been designed to support these data types in either little- or big-endian format. However, most ARM silicon implementations use the little-endian format.

ARM instruction sets

It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.

ARM: the standard 32-bit instruction set.

Data processing instructions: the ARM architecture provides a range of arithmetic operations (addition, subtraction, multiplication, etc.) and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.

o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional

Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations, on the other hand, are carried out via multiple register transfer instructions.

o ARM supports both little-endian and big-endian formats for data access

Block data transfer: the load multiple and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:

o Any subset of the current bank of registers (the default)

o Any subset of the user bank of registers, when in a privileged mode

Multiplication instructions: ARM provides several versions of multiplication. These are:

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.

Execution of the instruction causes the SWI exception handler to be called.

Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.

Branch instructions: in the ARM processor, the branch instructions have the following features:

o All branches are relative to the program counter

o A jump is always within a limit of ±32 MB

o Conditional branches are formed by using the condition codes

o The subroutine call instruction is also modeled as a variant of the branch instruction

THUMB

o These instructions are 16 bits in length

o They are stored in a compressed form

o The instructions are decompressed into ARM instructions and then executed by the processor

o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally, except for the branch instructions

o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set

o There are no MSR and MRS instructions

o The maximum number of SWI calls is restricted to 256

o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode

Advantages of THUMB

o Higher code density

o Lower power consumption

o Smaller memory footprint

o THUMB code is faster when memory is organized as 16-bit; ARM code, however, is faster when memory is organized as 32-bit

RTOS architecture

In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They then signal that there is work for the task code to do.

The differences between this architecture and the previous ones are that:

The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.

No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.

The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.

Advantages

The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions generally do not affect the response of higher-priority functions.

RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.

Disadvantages

The RTOS itself uses a certain amount of processing time.

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs, a task is simply a subroutine.

At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and supplying other parameters such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.

Since many data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.

An atomic section is a part of a program that cannot be interrupted.

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.

Task states

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.

Scheduler

The scheduler is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task with the highest assigned priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.

The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it blocks before the other one is run.

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and designers

Two types of requirements

o Functional: a functional requirement states what the system must do, such as compute an FFT

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability and so on

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the processes and integrates them

o Testing: uncovers the bugs

o Maintenance: entails deployment in the field, bug fixes and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps to ensure that the product best meets the customer's needs.

Page 40: EmbdedSysts.doc

Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions

o ARM supports both little endian and big endian formats for data access

Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either

o Any subset of the current bank of registers(default)

o Any subset of the user bank of registers when in a privilege mode

Multiplication instructions ARM provide several versions of multiplications These are

o Integer multiplication (32-bit result)

o Long Integer multiplication (64-bit result)

o Multiply accumulate instruction

Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n

Execution of the instruction causes SWI exception handler to be called

Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally

Branch instruction In ARM processor the branch instructions have the following features

o All the branches are relative to the program counter

o Jump is always with in a limit of plusmnMB

o Conditional branches are formed by using the condition codes

o Subroutine call instruction is also modeled as a variant of branch instruction

THUMB

o These are 16-bit in length

o Stored in a compressed form

o The instructions are decomposed into ARM instructions and then executed

by the processor

o THUMB instruction set must always be entered running BXBLX (Branch

Exchange) instruction

Differences with ARM

o THUMB instructions are executed unconditionally excepting the branch

instructions

o THUMB instructions have unlimited access to registers R0-R7 and R13-R15

A reduced no of instructions can access the full register set

o No MSR and MRS instructions

o Maximum no of SWI calls is restricted to 256

o On reset and on raising of an exception the processor always enters into the

ARM instruction set mode

Advantages of THUMB

o More code density

o Less power consumption

o Less space occupation

o It is faster when the memory is organized in 16-bit However ARM is faster

when the memory is organized in 32-bit

RTOS architecture

In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do

The differences between this architecture and the previous ones are that

The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose

No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time

The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another

Advantages

RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions

RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems

Disadvantages

RTOS it self uses a certain amount of processing time

TASK

The basic building block of software written under an RTOS is the task.

Under most RTOSs a task is simply a subroutine.

At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with other parameters such as the task's priority and the memory location for the task's stack.

Most RTOSs allow as many tasks as we need.
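A task-creation call can be sketched as below. `rtos_task_create` and its exact parameters are hypothetical (real APIs such as FreeRTOS's `xTaskCreate` or µC/OS's `OSTaskCreate` differ in detail), but the information passed is typical: the subroutine that is the task's starting point, a priority, and memory for the task's private stack.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn)(void *);

typedef struct {
    task_fn  entry;      /* subroutine that is the task's starting point */
    void    *arg;
    uint8_t  priority;   /* priority assumption: a plain number, convention varies by RTOS */
    uint8_t *stack;      /* private stack memory for this task */
    size_t   stack_size;
} task_t;

static task_t task_table[8];
static int    task_count;

/* Hypothetical creation call: records the task so the scheduler can run it. */
int rtos_task_create(task_fn entry, void *arg, uint8_t priority,
                     uint8_t *stack, size_t stack_size)
{
    if (task_count >= 8 || entry == NULL || stack == NULL)
        return -1;                       /* invalid parameters or table full */
    task_table[task_count] = (task_t){ entry, arg, priority, stack, stack_size };
    return task_count++;                 /* returns a task id */
}

static void idle_task(void *arg) { (void)arg; /* example task body */ }
static uint8_t demo_stack[128];          /* example stack memory */
```

Usage would look like `rtos_task_create(idle_task, NULL, 1, demo_stack, sizeof demo_stack);`.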

TASKS AND DATA

Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.

Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.

Shared-data problem: arises when an interrupt routine and the task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.

Atomic section: a part of the program that cannot be interrupted.
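A minimal sketch of the shared-data problem and an atomic fix follows. The `disable_interrupts`/`enable_interrupts` calls are stand-ins for real intrinsics (e.g. `__disable_irq()` on ARM), and the `irq_enabled` flag merely simulates the hardware interrupt mask so the sketch can run on a host machine.

```c
#include <stdint.h>
#include <stdbool.h>

static volatile uint16_t temperature;  /* shared: written by an ISR, read by task code */
static bool irq_enabled = true;        /* simulated interrupt mask (assumption) */

static void disable_interrupts(void) { irq_enabled = false; }
static void enable_interrupts(void)  { irq_enabled = true;  }

void temperature_isr(uint16_t raw) {   /* urgent work: record the new sample */
    if (irq_enabled)                   /* simulation: masked interrupts are dropped */
        temperature = raw;
}

/* Task code: on a small CPU, reading a multi-byte variable may take several
   instructions; if the ISR fires halfway through, the task sees a torn value.
   The disable/enable pair makes the read an atomic section. */
uint16_t read_temperature_atomic(void) {
    disable_interrupts();
    uint16_t copy = temperature;       /* atomic section: cannot be interrupted */
    enable_interrupts();
    return copy;
}
```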

REENTRANT FUNCTION

These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.

A reentrant function may not call any other functions that are not themselves reentrant.

A reentrant function may not use the hardware in a non-atomic way.
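The first rule can be illustrated with a pair of functions (the names and the formatting task are invented for the example): one keeps its result in a static buffer shared by every caller, the other writes into a buffer supplied by the caller, which lives on the calling task's own stack.

```c
#include <stdio.h>

/* NON-reentrant: the static buffer is shared by every task that calls this
   function, so a task switch mid-call can overwrite another task's result. */
char *format_reading_bad(int value) {
    static char buf[16];               /* shared variable used non-atomically */
    snprintf(buf, sizeof buf, "%d mV", value);
    return buf;
}

/* Reentrant: the caller supplies the buffer, so it is private to the task
   that called the function (typically on that task's stack). */
void format_reading(int value, char *buf, size_t len) {
    snprintf(buf, len, "%d mV", value);
}

/* Demonstration: two "tasks" format concurrently held readings without
   interfering, because each uses its own stack buffer. */
int demo_reentrant(void) {
    char b1[16], b2[16];
    format_reading(5, b1, sizeof b1);
    format_reading(7, b2, sizeof b2);
    return b1[0] == '5' && b2[0] == '7';
}
```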

Task states

Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.

Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.

Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
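The three states and the legal moves between them (described in the scheduler rules that follow) can be captured as a small sketch:

```c
/* Task states as described above. */
typedef enum { BLOCKED, READY, RUNNING } task_state;

/* Returns 1 if the transition is one the rules allow, 0 otherwise. */
int transition_allowed(task_state from, task_state to) {
    switch (from) {
    case RUNNING: return to == READY       /* scheduler preempts the task   */
                      || to == BLOCKED;    /* the task blocks itself        */
    case READY:   return to == RUNNING;    /* scheduler dispatches the task */
    case BLOCKED: return to == READY;      /* an ISR or another task signals it */
    }
    return 0;
}
```

Note that no path goes directly from BLOCKED to RUNNING: an unblocked task becomes ready, and only the scheduler moves it into the running state.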

Scheduler

The scheduler is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.

The task with the highest assigned priority gets the processor.

The scheduler does not fiddle with task priorities.

Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.

Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked it never gets the microprocessor, so an interrupt routine or some other task in the system must send a signal to bring it out of the blocked state. Otherwise the task will be blocked forever.

The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler.

How does the scheduler know when a task has become blocked or unblocked?

The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.

What happens if all the tasks are blocked?

If all the tasks are blocked, the scheduler spins in a tight loop somewhere inside the RTOS, waiting for something to happen.

What happens if two tasks are ready with the same priority?

It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it blocks before the other gets the processor.
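The time-slicing option can be sketched as a round-robin pick among equally prioritized ready tasks: on each tick, the processor is handed to the next ready task in circular order (a simplified model; a real RTOS interleaves this with priority handling).

```c
#define NTASKS 4

/* Given which tasks are ready (1 = ready) and the index of the currently
   running task, return the next task to run in round-robin order. */
int next_task_round_robin(const int ready[NTASKS], int current) {
    for (int step = 1; step <= NTASKS; step++) {
        int candidate = (current + step) % NTASKS;  /* walk the ring */
        if (ready[candidate])
            return candidate;
    }
    return current;  /* nothing else ready: keep running the current task */
}
```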

If a higher-priority task unblocks, what happens to the running task?

In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.

In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
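The difference between the two policies reduces to a single decision, sketched below (the priority convention "larger number = higher priority" is an assumption for the example; real RTOSs differ):

```c
#include <stdbool.h>

/* Should the scheduler take the processor away from the running task
   when a task of priority 'unblocked_prio' leaves the blocked state? */
bool should_preempt(bool preemptive, int running_prio, int unblocked_prio) {
    if (!preemptive)
        return false;                     /* switch only when the running task blocks */
    return unblocked_prio > running_prio; /* preemptive: higher priority wins immediately */
}
```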

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process.

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.

Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

Two types of requirements:

o Functional: a functional requirement states what the system must do, such as compute an FFT.

o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases:

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for software development.

Whereas the waterfall model assumes the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Elements of a concurrent engineering effort:

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities.

Early and continual customer focus helps ensure that the product best meets the customer's needs.

Page 43: EmbdedSysts.doc

A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task

A reentrant function may not call any other functions that are not themselves reentrant

A reentrant function may not use the hardware in a non-atomic way

States of RTOS

Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time

Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state

Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well

Scheduler

It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state

The task that is assigned highest priority gets the processor

The scheduler does not fiddle with task priorities

Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state

Tasks and IR move tasks from blocked state when a task is blocked it never gets microprocessor An IR or some other task in the system must be able to send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever

Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler

How does the scheduler know when a task has become blocked or unblocked

RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened

What happens if all the tasks are blocked

If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen

What happens if two tasks are ready with same priority

It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one

If a higher priority task unblocks what happens to the running task

In preemptive RTOS lower priority task is stopped as soon as an higher priority task unblocks

In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks

REQUIREMENTS ANALYSIS

Requirements and specifications are related but distinct steps in the design process.

Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers.

Two types of requirements

o Functional: A functional requirement states what the system must do, such as compute an FFT.

o Non-functional: A non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.

A good set of requirements reflects:

o Correctness

o Unambiguousness

o Completeness

o Verifiability

o Consistency

o Modifiability

o Traceability

Design methodology

It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:

o Functionality

o Manufacturing cost

o Performance

o Power consumption

o Time-to-market

o Design cost

o Quality: Customers not only want their products fast and cheap, they also want them to be of the right quality.

Design flow

A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps are performed by hand.

Waterfall model

It was introduced by Royce and was the first model proposed for the software development process.

This model has five major phases

o Requirements analysis: determines the basic characteristics of the system

o Architecture design: decomposes the functionality into major components

o Coding: implements the pieces and integrates them

o Testing: uncovers bugs

o Maintenance: entails deployment in the field, bug fixes, and upgrades

This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.

It is ideal during early design phases, since it implies good foreknowledge of the implementation.

It is not suitable where the design entails experimentation and changes that require bottom-up feedback.

Nowadays it is considered an unrealistic design process.

Spiral model

It is an alternative model for the software development

The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built.

At each level of the design, the designers go through requirements, construction, and testing phases.

The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.

Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design

Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.

Its advantage is that it adopts a successive-refinement approach.

CONCURRENT ENGINEERING

Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.

It tries to eliminate "over-the-wall" design steps.

Concurrent engineering effort

Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.

Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.

Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.

Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.

Early and continual supplier involvement helps make the best use of supplier capabilities

Early and continual customer focus helps to ensure that the product best meets the customer's needs.
