EmbdedSysts.doc
Unit -I INTRODUCTION
EMBEDDED SYSTEMS OVERVIEW
EMBEDDED HARDWARE UNITS
EMBEDDED SOFTWARE IN A SYSTEM
EMBEDDED SYSTEMS ON CHIP (SOC)
DESIGN PROCESS
CLASSIFICATION OF EMBEDDED SYSTEMS
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
MEMORY DEVICES
COMPONENT INTERFACING
NETWORKS OF EMBEDDED SYSTEMS
COMMUNICATION INTERFACINGS
RS232/UART RS422/RS485 IEEE 488 BUS
UNIT-III SURVEY OF SOFTWARE ARCHITECTURE
ROUND ROBIN
ROUND ROBIN WITH INTERRUPTS
FUNCTION QUEUE SCHEDULING ARCHITECTURE
SELECTING AN ARCHITECTURE SAVING MEMORY SPACE
UNIT-IV EMBEDDED SOFTWARE DEVELOPMENT TOOLS
HOST AND TARGET MACHINES
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
INTERRUPT SERVICE ROUTINES
SEMAPHORES
MESSAGE QUEUES
PIPES
UNIT-VI INSTRUCTION SETS
INTRODUCTION
PRELIMINARIES
ARM PROCESSOR
SHARC PROCESSOR
UNIT-VII SYSTEM DESIGN TECHNIQUES
DESIGN METHODOLOGIES
REQUIREMENT ANALYSIS
SPECIFICATIONS
SYSTEM ANALYSIS AND ARCHITECTURE DESIGN
UNIT-VIII DESIGN EXAMPLES
TELEPHONE PBX
INK JET PRINTER
WATER TANK MONITORING SYSTEM
GPRS
PERSONAL DIGITAL ASSISTANTS
SET TOP BOXES
Unit -I INTRODUCTION
DEFINITION
An embedded system can be defined as a computing device that does a specific, focused job.
It usually consists of a processor plus some special hardware, along with embedded software, both designed to meet the specific requirements of the application.
SPECIAL FEATURES
They do a very specific task and cannot be programmed to do different things; the software in embedded systems is fixed.
They have to work against deadlines: a specific job has to be completed within a specific time.
The resources at their disposal are limited, particularly memory; they usually don't have secondary storage devices. Power is another resource of limited availability.
They need to be highly reliable, and they may need to work in extreme environmental conditions.
Embedded systems targeted at the consumer market are very cost sensitive.
In the area of embedded systems there is a wide variety of processors and operating systems, and selecting an appropriate one is a difficult task.
APPLICATION AREAS
Consumer appliances: digital camera, digital diary, DVD players, electronic toys, remotes for TV and microwave oven, etc.
Office automation: copying machine, fax machine, printer, scanner, etc.
Industrial automation: equipment to measure temperature, pressure, voltage, and current; robots to avoid hazardous jobs
Medical electronics: ECG, EEG, blood pressure measuring devices, X-ray scanners, equipment used for colonoscopy and endoscopy
Computer networks: bridges, routers, ISDN, ATM, frame relay switches, etc.
Telecommunications: key telephones, ISDN telephones, terminal adapters, web cameras, multiplexers, IP phone, IP gateway, IP gatekeeper
Wireless technologies: mobile phones, base station controllers, personal digital assistants, palmtops, etc.
Instrumentation: oscilloscopes, spectrum analyzers, logic analyzers, protocol analyzers, etc.
Security: encryption devices and biometric systems; security devices at homes, offices, and airports for authentication and verification
Finance: smart cards, ATMs, etc.
EMBEDDED SYSTEMS OVERVIEW
An embedded system usually consists of custom-built hardware woven around a CPU.
The custom-built hardware also contains memory chips, onto which software called firmware is loaded.
When represented as a layered architecture, the OS runs over the hardware and the application software runs over the OS.
EMBEDDED HARDWARE UNITS
Central processing unit
ROM and RAM
Input devices such as sensors, A/D converters, keypad
Output devices such as D/A converters, LEDs, LCD
Debug port
Communication interface
Power supply unit
EMBEDDED SOFTWARE IN A SYSTEM
EMBEDDED SYSTEMS ON CHIP (SOC)
An SoC is an embedded system on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. An SoC may be embedded with the following components:
Embedded processor GPP or ASIP core
Single purpose processing cores or multiple processors
A network bus protocol core
An encryption function unit
DCT for signal processing applications
Memories
PLDs and FPGA cores
Other logic and analog units
A mobile phone is an example application of an SoC.
DESIGN PROCESS
In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of:
Requirements: the customer's description of the system they require. Requirements may be functional or nonfunctional; the second category includes performance, cost, physical size and weight, power consumption, etc.
Specifications: the contract between the customer and the architect, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave.
Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape.
Components: here the design of the components, including both software and hardware modules, takes place.
System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and any bugs found are addressed.
CLASSIFICATION OF EMBEDDED SYSTEMS
Stand alone ESs
They work in stand-alone mode: they take inputs, process them, and produce the desired output. Embedded systems in automobiles and consumer electronic items are examples.
Real time systems
Embedded systems in which specific work has to be done in a specific time period are called real-time systems.
Hard real-time systems are systems that are required to adhere to deadlines strictly.
Soft real-time systems are systems in which non-adherence to deadlines doesn't lead to catastrophe.
Networked information appliances
These are connected to a network, provided with network interfaces, and accessible over LANs and the Internet. A web camera connected to the net is an example.
Mobile devices
Mobile phones Personal Digital Assistants smart phones etc are examples for this category
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
It is the mechanism by which the CPU communicates with memory and devices.
A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory.
HANDSHAKE
The basic building block of a bus protocol is the four-cycle handshake.
It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.
It uses a pair of dedicated wires, enq (enquiry) and ack (acknowledge), to handshake; extra wires are used for data transmission during the handshake.
The four cycles are
o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.
o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point devices 1 and 2 can transmit or receive.
o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.
o After seeing that ack has been released, device 1 lowers its output.
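The four cycles above can be sketched as a small state machine. The enum names, struct layout, and step function below are illustrative assumptions for a behavioural model, not part of any bus standard; data wires are omitted.

```c
#include <stdbool.h>

/* Phases of the four-cycle handshake on the enq and ack wires. */
enum hs_phase { HS_IDLE, HS_ENQ, HS_ACK, HS_DONE };

struct handshake {
    bool enq;            /* driven by device 1 (the sender)    */
    bool ack;            /* driven by device 2 (the receiver)  */
    enum hs_phase phase;
};

/* Advance the handshake one cycle; returns the new phase. */
enum hs_phase hs_step(struct handshake *h)
{
    switch (h->phase) {
    case HS_IDLE:                  /* cycle 1: device 1 raises enq       */
        h->enq = true;  h->phase = HS_ENQ;  break;
    case HS_ENQ:                   /* cycle 2: device 2 raises ack;
                                      data can now be transferred        */
        h->ack = true;  h->phase = HS_ACK;  break;
    case HS_ACK:                   /* cycle 3: device 2 lowers ack after
                                      receiving the data                 */
        h->ack = false; h->phase = HS_DONE; break;
    case HS_DONE:                  /* cycle 4: device 1 lowers enq       */
        h->enq = false; h->phase = HS_IDLE; break;
    }
    return h->phase;
}

/* Run one complete handshake; returns the number of cycles taken. */
int hs_run(void)
{
    struct handshake h = { false, false, HS_IDLE };
    int cycles = 0;
    do {
        hs_step(&h);
        cycles++;
    } while (h.phase != HS_IDLE);
    return cycles;
}
```

Running `hs_run()` walks through exactly the four transitions listed above and returns to the idle state.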
The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer.
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components.
o R/W' is true when the bus is reading and false when the bus is writing.
o Address is an a-bit bundle of signals that transmits the address for an access.
o Data is an n-bit bundle of signals that can carry data to or from the CPU.
o Data ready signals when the values on the data bundle are valid.
Burst transfer: handshaking signals are also used for burst transfers.
In a burst read transaction the CPU sends one address but receives a sequence of data values. The data values come from successive memory locations starting at the given address.
One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst.
Releasing the burst signal tells the device that enough data has been transmitted.
Disconnected transfers: in these buses the request and response are separate.
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.
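As a sketch of how a CPU might program such a controller, the register block below is entirely hypothetical (no real chip's register map is implied). A real driver would use the vendor's register layout and typically take a completion interrupt rather than polling.

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers; the layout and
   bit assignments are illustrative assumptions. */
struct dma_regs {
    volatile uint32_t src;    /* source address            */
    volatile uint32_t dst;    /* destination address       */
    volatile uint32_t len;    /* transfer length in bytes  */
    volatile uint32_t ctrl;   /* bit 0: start, bit 1: busy */
};

#define DMA_START (1u << 0)
#define DMA_BUSY  (1u << 1)

/* Request a transfer and busy-wait until the controller clears busy.
   The controller, not the CPU, moves the data once started. */
void dma_copy(struct dma_regs *dma, uint32_t src, uint32_t dst, uint32_t len)
{
    dma->src  = src;
    dma->dst  = dst;
    dma->len  = len;
    dma->ctrl = DMA_START;          /* controller requests the bus...     */
    while (dma->ctrl & DMA_BUSY)    /* ...and transfers without the CPU   */
        ;
}

/* Host-side demo: the model "completes" at once because nothing sets the
   busy bit.  Returns 1 when the registers were programmed as expected. */
int dma_demo(void)
{
    struct dma_regs r = { 0, 0, 0, 0 };
    dma_copy(&r, 0x1000, 0x2000, 64);
    return r.src == 0x1000 && r.dst == 0x2000 && r.len == 64
        && (r.ctrl & DMA_START) != 0;
}
```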
A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below:
o Higher-speed buses may provide wider data connections.
o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus.
o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations.
ARM Bus
o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors.
o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS).
o The AMBA specification includes two buses: AHB, the AMBA high-performance bus, and APB, the AMBA peripherals bus.
o AHB is optimized for high-speed transfers and is directly connected to the CPU.
o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters.
o APB is simple and easy to implement and consumes relatively little power.
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller.
o It does not perform pipelined operations, which simplifies the bus logic.
SHARC Bus
o It contains both program and data memory on-chip.
o There are two external interfaces of interest: the external memory interface and the host interface.
o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data.
o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access.
o Different units in the processor have different amounts of access to external memory: the DM bus and I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords.
o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory.
o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting.
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional.
o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers.
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).
o SRAM is faster than DRAM.
o SRAM consumes more power than DRAM.
o More DRAM can be put on a single chip.
o DRAM values must be periodically refreshed.
Static RAM has four inputs:
o CE' is the chip enable input. When CE' = 1 the SRAM's data pins are disabled, and when CE' = 0 the data pins are enabled.
o R/W' controls whether the current operation is a read (R/W' = 1) from RAM or a write (R/W' = 0) to RAM.
o Adrs specifies the address for the read or write.
o Data is a bidirectional bundle of signals for data transfer. When R/W' = 1 the pins are outputs, and when R/W' = 0 the pins are inputs.
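The pin behaviour described above can be captured in a small behavioural model. The function name, word size, and array-backed storage are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdbool.h>

#define SRAM_WORDS 256

/* Behavioural model of the SRAM control pins: ce_n = 0 enables the chip,
   rw_n = 1 reads, rw_n = 0 writes.  Returns true when *data_out is valid
   (i.e. on an enabled read cycle). */
bool sram_cycle(uint8_t mem[SRAM_WORDS], int ce_n, int rw_n,
                uint8_t adrs, uint8_t data_in, uint8_t *data_out)
{
    if (ce_n == 1)            /* chip disabled: data pins are disabled */
        return false;
    if (rw_n == 1) {          /* read cycle: data pins are outputs     */
        *data_out = mem[adrs];
        return true;
    }
    mem[adrs] = data_in;      /* write cycle: data pins are inputs     */
    return false;
}

/* Demo: write 0xAB to address 42, read it back, then disable the chip. */
int sram_demo(void)
{
    uint8_t mem[SRAM_WORDS] = { 0 };
    uint8_t out = 0;
    sram_cycle(mem, 0, 0, 42, 0xAB, &out);          /* CE'=0, R/W'=0: write */
    bool ok  = sram_cycle(mem, 0, 1, 42, 0, &out);  /* CE'=0, R/W'=1: read  */
    bool off = sram_cycle(mem, 1, 1, 42, 0, &out);  /* CE'=1: disabled      */
    return ok && out == 0xAB && !off;
}
```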
DRAM inputs and refresh:
o DRAMs have two inputs in addition to those of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; the signals are needed because address lines are provided for only half the address.
o DRAMs must be refreshed because the values they store can leak away. A single refresh request can refresh an entire row of the DRAM.
o CAS-before-RAS refresh:
It is a special quick refresh mode.
This mode is initiated by setting CAS' to 0 first, then RAS' to 0.
It causes the current memory row to be refreshed and the corresponding counter to be updated.
A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate.
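The "required rate" is simple arithmetic: every row must be refreshed once per retention period. A minimal sketch, where the figures used in the demo (8192 rows, 64 ms retention) are typical values taken as assumptions:

```c
/* Interval between successive row refreshes, in nanoseconds, when every
   one of `rows` rows must be refreshed once per `retention_ms`. */
long refresh_interval_ns(long retention_ms, long rows)
{
    return (retention_ms * 1000000L) / rows;
}
```

With 8192 rows and 64 ms retention, the controller must issue a row refresh roughly every 7.8 microseconds.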
Page mode
o Developed to improve the performance of DRAM.
o Useful for accessing several locations in the same region of the memory.
o In a page mode access, one row address and several column addresses are supplied.
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses.
o It is typically supported for both reads and writes.
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode.
Synchronous DRAMs
o Synchronous DRAM was developed to improve the performance of DRAMs by introducing a clock.
o Changes to the inputs and outputs of the DRAM occur on clock edges.
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor.
RAMBUS
o It offers high performance at a relatively low cost.
o It has multiple memory banks that can be addressed in parallel.
o It has separate data and control buses.
o It is capable of sustained data rates well above 1 Gbyte/sec.
ROMs are
o programmed with fixed data
o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs.
o Factory programming is useful only when the ROMs are installed in some quantity; field-programmable ROMs are programmed in the laboratory using ROM burners.
o Field-programmable ROMs can be of two types: antifuse-programmable ROMs, which are programmable only once, and UV-erasable PROMs, which can be erased using ultraviolet light and then reprogrammed.
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s.
While it uses the floating-gate principle, it is designed so that large blocks of memory can be erased all at once.
It uses standard system voltages for erasing and programming, allowing it to be programmed in a typical system.
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage.
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment.
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.
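The read-update-erase-program sequence behind that drawback can be sketched as follows. The block size is an assumption, and the erase and program steps are modelled here as ordinary memory operations rather than real flash commands.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 128

/* Why a single-word flash write is slow: the whole block containing the
   word is read out, the block is erased, and the patched copy is
   programmed back. */
void flash_write_word(uint32_t block[BLOCK_WORDS], int index, uint32_t value)
{
    uint32_t buf[BLOCK_WORDS];
    memcpy(buf, block, sizeof buf);       /* 1. read the entire block      */
    buf[index] = value;                   /* 2. update the word in RAM     */
    memset(block, 0xFF, sizeof buf);      /* 3. erase the block (all ones) */
    memcpy(block, buf, sizeof buf);       /* 4. program the block back     */
}

/* Demo: returns 1 when updating word 5 leaves its neighbours intact. */
int flash_demo(void)
{
    uint32_t blk[BLOCK_WORDS];
    for (int i = 0; i < BLOCK_WORDS; i++)
        blk[i] = (uint32_t)i;
    flash_write_word(blk, 5, 0xDEADBEEFu);
    return blk[5] == 0xDEADBEEFu && blk[4] == 4 && blk[6] == 6;
}
```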
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:
the DRAM's RAS/CAS multiplexing
the need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it was not designed.
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.
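A sketch of such fine decoding for a memory-mapped device; the base address and register count below are invented for illustration.

```c
#include <stdint.h>

/* A device occupying only four 32-bit registers: far more address bits
   must be compared than when selecting a large memory bank.
   DEV_BASE and DEV_SIZE are hypothetical. */
#define DEV_BASE 0x40001000u
#define DEV_SIZE 0x10u            /* four 32-bit registers */

/* Chip-select logic: 1 when the address falls in the device's window. */
int dev_selected(uint32_t addr)
{
    return addr >= DEV_BASE && addr < DEV_BASE + DEV_SIZE;
}
```

In hardware this comparison is what the glue logic (or the device's own decoder, in a glueless interface) implements.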
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing include:
the I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices
the Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol.
This protocol enables peripheral ICs to communicate with each other using simple communication hardware.
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.
The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF.
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data.
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions.
It is a well-known bus for linking a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line.
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes may act as slaves that only respond to requests from masters.
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition.
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously.
o 11110XX is reserved for the extended 10-bit addressing scheme.
Bus transaction: a transaction is comprised of a series of one-byte transmissions, an address followed by one or more data bytes.
The address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master.
A bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
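The address byte that opens every transaction packs the 7-bit device address and the direction bit, as described above. A minimal sketch (the device address 0x50 used in the usage note is an arbitrary example):

```c
#include <stdint.h>

/* First byte of an I2C transaction: 7-bit address in the upper bits,
   direction bit in bit 0 (0 = master writes to slave, 1 = master reads
   from slave). */
uint8_t i2c_addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}

/* The general-call address 0000000 broadcasts to every device. */
#define I2C_GENERAL_CALL 0x00
```

For a device at address 0x50, a write transaction starts with byte 0xA0 and a read transaction with byte 0xA1.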
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also.
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities.
Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments.
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.
It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted-pair connection of up to 40 meters. An optical link can also be used.
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.
The driving circuits on the bus cause the bus to be pulled down to 0 if any node pulls the bus down, making 0 dominant over 1.
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of the data frame: the data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames).
The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.
The data field is from 0 to 8 bytes (64 bits), depending on the value given in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
The acknowledge field is used to let the sender know whether the frame was correctly received: the sender puts a recessive bit ('1') in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value ('0'). If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus's arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
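The wired-AND arbitration just described can be simulated bit by bit. The helper below is a behavioural sketch (not driver code); the identifier values in the test are arbitrary, and it assumes at most 16 competing nodes.

```c
/* Simulate CSMA/AMP arbitration: each node drives its identifier MSB
   first onto a wired-AND bus, so a 0 (dominant) overrides a 1
   (recessive).  A node that sends recessive but hears dominant drops
   out.  Returns the index of the winning node, i.e. the one with the
   numerically lowest identifier. */
int can_arbitrate(const unsigned ids[], int n_nodes, int id_bits)
{
    int active[16];
    for (int i = 0; i < n_nodes; i++)
        active[i] = 1;

    for (int bit = id_bits - 1; bit >= 0; bit--) {
        unsigned bus = 1;                       /* recessive unless pulled down */
        for (int i = 0; i < n_nodes; i++)
            if (active[i])
                bus &= (ids[i] >> bit) & 1;     /* wired-AND of all drivers     */
        for (int i = 0; i < n_nodes; i++)       /* recessive sender hearing a
                                                   dominant bit stops driving   */
            if (active[i] && ((ids[i] >> bit) & 1) != bus)
                active[i] = 0;
    }
    for (int i = 0; i < n_nodes; i++)
        if (active[i])
            return i;
    return -1;
}
```

With identifiers 0x55, 0x12, and 0x13 competing, the node sending 0x12 survives arbitration, since the lowest identifier has the highest priority.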
Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row.
The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA).
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).
A DTE can be a PC, serial printer, or plotter; a DCE can be a modem, mouse, digitizer, or scanner.
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions.
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.
Data bits are prefixed by a start bit and suffixed by a stop bit. In addition, a parity check bit is added, useful for error checking on the receiver side.
This mode of communication is called asynchronous communication because no clock signal is transmitted.
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved.
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both systems.
Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.
Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.
Parity bit: added for error checking on the receiver side.
Flow control: useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission; it is also known as handshake. It can be hardware or software based.
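The framing described above (start bit, data bits LSB first, parity, stop bit) can be sketched for the common 8-data-bit, even-parity, one-stop-bit case. Packing the frame into an integer, LSB first, is just a convenience for illustration; real hardware shifts the bits out on the line.

```c
#include <stdint.h>

/* Build one asynchronous serial frame for an 8-bit character with even
   parity: start bit (0), eight data bits LSB first, parity bit, stop
   bit (1).  Bit 0 of the result is the first bit on the wire. */
unsigned uart_frame_8e1(uint8_t ch)
{
    unsigned frame = 0;
    int bit = 0, ones = 0;

    frame |= 0u << bit++;                   /* start bit: 0          */
    for (int i = 0; i < 8; i++) {           /* data bits, LSB first  */
        int d = (ch >> i) & 1;
        ones += d;
        frame |= (unsigned)d << bit++;
    }
    frame |= (unsigned)(ones & 1) << bit++; /* even parity bit       */
    frame |= 1u << bit;                     /* stop bit: 1           */
    return frame;
}
```

For the character 'A' (0x41, two 1-bits, so the even-parity bit is 0) the 11-bit frame has its start bit low, data bits 1 and 7 high, and the stop bit high.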
RS232 connector configurations
It specifies two types of connectors 9 pin and 25 pin
For the transmission of 1s and 0s, the voltage levels are defined in the standard; the levels are different for data and control signals.
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.
UART chip: universal asynchronous receiver/transmitter chip.
It has two sections: a receive section and a transmit section.
The receive section receives data, converts it from serial to parallel form, and gives the data to the processor.
The transmit section takes data from the processor and converts it from parallel to serial format.
It also adds the start, stop, and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5 V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network.
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation.
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
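The PI/PO rules above condense into a small model: the acknowledge propagates down the chain until the first requesting device blocks it. The function below is an illustrative sketch in which device 0 is the highest-priority device.

```c
/* Daisy-chain priority resolution: requests[i] is 1 when device i has
   an interrupt pending (device 0 is closest to the CPU, so it has the
   highest priority).  Returns the index of the device that wins the
   acknowledge (PI = 1, PO = 0), or -1 when nothing is pending. */
int daisy_chain_grant(const int requests[], int n_devices)
{
    int pi = 1;                    /* CPU asserts interrupt acknowledge  */
    for (int i = 0; i < n_devices; i++) {
        if (pi && requests[i])
            return i;              /* this device blocks PO and wins     */
        /* a non-requesting device passes PI through unchanged to PO */
    }
    return -1;
}
```

With devices 1 and 2 both requesting, device 1 wins because it sees the acknowledge first and blocks it from device 2.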
The slowest device participates in the control and data transfer handshakes to determine the speed of the transaction.
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture
The best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
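A minimal C sketch of that loop, with stub devices standing in for real hardware status registers (all names are illustrative):

```c
#include <stdbool.h>

#define NUM_DEVICES 2

/* Stub devices standing in for real hardware. */
static bool dev_ready[NUM_DEVICES];    /* would be a status register */
static int  dev_serviced[NUM_DEVICES]; /* counts services, for visibility */

static bool ready(int i)   { return dev_ready[i]; }
static void service(int i) { dev_ready[i] = false; dev_serviced[i]++; }

/* One pass of the round-robin main loop. A real system simply runs
 * while (1) round_robin_pass();  -- no interrupts, no priorities. */
void round_robin_pass(void)
{
    for (int i = 0; i < NUM_DEVICES; i++)
        if (ready(i))
            service(i);
}
```

Service order is fixed by position in the loop, which is exactly why the architecture has no priorities.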
Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most IO needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
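A sketch of the flag pattern, assuming a hypothetical serial device: the ISR does only the urgent hardware work and sets a volatile flag; the main loop does the follow-up processing at task priority.

```c
#include <stdbool.h>

static volatile bool serial_flag; /* set by ISR, cleared by main loop */
static int chars_processed;       /* non-urgent work done so far */

/* In a real system this would be installed as the UART interrupt
 * handler; here it is an ordinary function we can call directly. */
void serial_isr(void)
{
    /* ...read the character from hardware (the urgent part)... */
    serial_flag = true;           /* defer the rest to the main loop */
}

/* One pass of the main loop: check flags, do follow-up processing. */
void main_loop_pass(void)
{
    if (serial_flag) {
        serial_flag = false;
        chars_processed++;        /* lower-priority follow-up work */
    }
}
```

The `volatile` qualifier matters: the flag is written in interrupt context and read in the main loop, so the compiler must not cache it in a register.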
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this?
Round Robin with Interrupts
Adjustments:
o Change the order in which flags are checked (eg ABABAD)
o Improves response of A
o Increases latency of other tasks
o Move some task code into the interrupt
o Increases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time: for the highest-priority task it equals the length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
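A minimal function-queue sketch in C, using a ring buffer of function pointers and FIFO dispatch (a real main routine could instead pick the highest-priority entry); all names are illustrative.

```c
#include <stddef.h>

#define QUEUE_LEN 8

typedef void (*task_fn)(void);

static task_fn queue[QUEUE_LEN];
static size_t head, tail;          /* ring buffer indices */

/* Called from ISRs: enqueue the follow-up work. Returns -1 if full. */
int enqueue_task(task_fn f)
{
    size_t next = (tail + 1) % QUEUE_LEN;
    if (next == head)
        return -1;                 /* queue full, work would be lost */
    queue[tail] = f;
    tail = next;
    return 0;
}

/* Called from the main routine: drain the queue, calling each task. */
void run_queued_tasks(void)
{
    while (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QUEUE_LEN;
        f();
    }
}

/* Demo tasks so the behavior can be observed. */
static int order[4], norder;
static void task_a(void) { order[norder++] = 'a'; }
static void task_b(void) { order[norder++] = 'b'; }
```

In a real system the enqueue would need to be safe against the ISR interrupting the main routine mid-update (for instance by briefly disabling interrupts around the index manipulation); that detail is omitted here.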
Real Time Operating System Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
Differences from previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o System's high-priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally comes with vendor tools
Cons:
o RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks.
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
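The allocation count above is simple arithmetic: a job needs the ceiling of total time divided by quantum. A small C helper (illustrative, not part of any scheduler API) makes this concrete:

```c
/* Number of round-robin time slices a job needs: ceil(total/quantum),
 * computed with the usual integer-ceiling idiom. */
int allocations(int total_ms, int quantum_ms)
{
    return (total_ms + quantum_ms - 1) / quantum_ms;
}
```

For the example above, allocations(250, 100) gives 3, matching the three allocations listed.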
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
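A work-conserving round-robin packet scheduler can be sketched in C as follows (flows, queues, and names are illustrative): on each turn the scheduler tries the next flow and skips any flow whose queue is empty, so the channel never idles while packets wait elsewhere.

```c
#include <stddef.h>

#define NUM_FLOWS 2
#define MAX_PKTS  4

static int    flow_q[NUM_FLOWS][MAX_PKTS]; /* packet ids per flow */
static size_t flow_len[NUM_FLOWS];         /* packets enqueued */
static size_t flow_pos[NUM_FLOWS];         /* packets already sent */

/* Return the next packet id to transmit, or -1 if all queues are empty.
 * Each call advances the round-robin turn by at least one flow. */
int rr_next_packet(void)
{
    static size_t turn;                    /* whose turn it is */
    for (int tries = 0; tries < NUM_FLOWS; tries++) {
        size_t f = turn;
        turn = (turn + 1) % NUM_FLOWS;
        if (flow_pos[f] < flow_len[f])
            return flow_q[f][flow_pos[f]++];
    }
    return -1;                             /* link would go idle */
}
```

With two packets in flow 0 and one in flow 1, transmission alternates (flow 0, flow 1, flow 0) rather than letting the link sit idle on flow 1's empty queue.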
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can be built using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - number of processes that complete their execution per time unit
Latency, specifically
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency), so a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - that is, whether a high or low number of processes are to be executed concurrently, and how the split between IO-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required it can be swapped in on demand, or lazy loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an IO interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or cooperative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
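The selection rule can be sketched in C (illustrative only, ignoring arrival and preemption mechanics): pick the ready process with the least estimated remaining time.

```c
#include <stddef.h>

/* Return the index of the ready process with the shortest remaining
 * time, or -1 if none is ready. A remaining time of 0 (or less) here
 * means the process is finished or not ready. */
int pick_srt(const int remaining_ms[], size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (remaining_ms[i] <= 0)
            continue;                                /* not a candidate */
        if (best < 0 || remaining_ms[i] < remaining_ms[best])
            best = (int)i;
    }
    return best;
}
```

Because short jobs always win this scan, a steady stream of small processes can keep a long process from ever being selected, which is exactly the starvation risk noted above.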
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Average response time is poor: waiting time depends on the number of processes, not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems
Overview
Scheduling algorithm         CPU overhead   Throughput   Turnaround time   Response time
First In First Out           Low            Low          High              High
Shortest Job First           Medium         High         Medium            Medium
Priority based scheduling    Medium         Low          High              High
Round-robin scheduling       High           Medium       Medium            High
Multilevel Queue scheduling  High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample several different signals simultaneously, but can display only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architectures were developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architectures are optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o They require a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]      ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
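The configured byte order can be observed from C. The sketch below (the function name is illustrative) stores a known 32-bit word and inspects the byte at its lowest address; in little-endian mode that byte is the low-order byte of the word.

```c
#include <stdint.h>

/* Return 1 if the machine stores the low-order byte of a word at the
 * word's lowest address (little-endian), 0 otherwise (big-endian). */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    uint8_t first = *(uint8_t *)&word; /* byte at the lowest address */
    return first == 0x04;              /* 0x04 is the low-order byte */
}
```

On a little-endian ARM the bytes of 0x01020304 appear in memory as 04 03 02 01; in big-endian mode as 01 02 03 04.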
SHARC PROCESSOR
It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most important mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
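The behavior enabled by ALUSAT can be mimicked in C. This is a sketch of saturating 32-bit addition, not the SHARC instruction itself: the sum is computed in a wider type and then clamped to the 32-bit range instead of wrapping.

```c
#include <stdint.h>

/* Saturating 32-bit add: clamp to INT32_MAX / INT32_MIN on overflow
 * instead of wrapping around the numeric range. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;   /* widen so the true sum fits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

Clamping is usually the right behavior for signal processing, where wraparound would turn a slightly-too-loud sample into a full-scale glitch of the opposite sign.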
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
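The circular-buffer addressing that the DAGs perform in hardware amounts to modulo index arithmetic on the I register, sketched here in C (an illustrative software model; names are invented):

```c
/* Advance a circular-buffer index by a modifier, wrapping within the
   buffer length, as a DAG does after each post-modify access. */
int circ_next(int index, int modifier, int length) {
    index += modifier;
    if (index >= length) index -= length;  /* wrapped past the end   */
    if (index < 0)       index += length;  /* wrapped past the start */
    return index;
}
```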
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.
Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority ready task (in a preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
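The quantities above compose by simple addition. A back-of-envelope sketch (all cycle counts are invented for illustration, not figures for any real CPU):

```c
/* Interrupt response time per the definition above: latency (worst-case
   interrupt-disabled time plus time to reach the ISR's first
   instruction) plus the context-save time. Example cycle counts only. */
int interrupt_response(int max_disabled, int to_first_instr, int ctx_save) {
    int latency = max_disabled + to_first_instr;
    return latency + ctx_save;
}
```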
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is essentially an integer. Semaphores are of two types: the counting semaphore, which can take any non-negative integer value, and the binary semaphore, which takes only the values 0 and 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
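A minimal counting-semaphore sketch in C (the type and function names are illustrative, not from any particular RTOS; a real kernel would move a task that cannot acquire onto the semaphore's waiting list rather than return immediately):

```c
/* Toy counting semaphore: the count tracks available resource units. */
typedef struct { int count; } sem_sketch_t;

void sem_create(sem_sketch_t *s, int initial) { s->count = initial; }

int sem_acquire(sem_sketch_t *s) {          /* the "wait" / P operation  */
    if (s->count > 0) { s->count--; return 1; }  /* acquired            */
    return 0;                                    /* would block         */
}

void sem_release(sem_sketch_t *s) { s->count++; } /* "signal" / V       */
```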
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
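The mailbox-array view above can be sketched as a fixed-size ring buffer in C (a simplified illustration; the field names and QLEN size are invented, and a real kernel would also manage the waiting lists):

```c
#define QLEN 4  /* example queue length */

/* Fixed-size message queue: an array of "mailboxes" with head/tail
   indices and a count of queued messages. */
typedef struct { int msgs[QLEN]; int head, tail, count; } mq_t;

void mq_init(mq_t *q) { q->head = q->tail = q->count = 0; }

int mq_send(mq_t *q, int msg) {        /* a task or ISR deposits a message */
    if (q->count == QLEN) return 0;    /* queue full                        */
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

int mq_receive(mq_t *q, int *msg) {    /* a waiting task takes a message   */
    if (q->count == 0) return 0;       /* queue empty                       */
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```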
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input of another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
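As a rough illustration of the write-then-read flow, a plain POSIX pipe() can stand in for an RTOS pipe object (the helper function is hypothetical; real RTOS pipe APIs differ from POSIX):

```c
#include <unistd.h>
#include <string.h>

/* Send a message through a pipe and read it back, modeling one task's
   output becoming another task's input. Returns bytes read, or -1. */
int pipe_roundtrip(const char *msg, char *buf, int buflen) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    write(fd[1], msg, strlen(msg));          /* producer task writes  */
    int n = (int)read(fd[0], buf, buflen);   /* consumer task reads   */
    close(fd[0]);
    close(fd[1]);
    return n;
}
```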
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load-store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15 - it is the program counter but can be manipulated as a general-purpose register
R13 - it is used as a stack pointer
R14 - it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register) - it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F bits mask (disable) normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes SWI exception handler to be called
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jumps are always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Smaller memory footprint
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
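A classic instance of the problem: task code assembles a 32-bit time value from two 16-bit halves that a timer ISR updates. The sketch below shows the atomic-section fix; the disable_ints/enable_ints primitives are hypothetical stand-ins for real interrupt masking:

```c
#include <stdint.h>

/* Shared variables updated by a (hypothetical) timer ISR. */
volatile uint16_t time_hi, time_lo;

/* Stand-ins for real interrupt-masking primitives; on hardware these
   would clear/set the CPU's interrupt-enable flag. */
static volatile int ints_enabled = 1;
static void disable_ints(void) { ints_enabled = 0; }
static void enable_ints(void)  { ints_enabled = 1; }

/* Reading the two halves separately is non-atomic: the ISR could fire
   between the reads. Bracketing them with disable/enable makes the
   section atomic. */
uint32_t read_time_atomic(void) {
    disable_ints();
    uint32_t t = ((uint32_t)time_hi << 16) | time_lo;
    enable_ints();
    return t;
}
```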
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
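A minimal illustration of the first rule (the functions are invented for this example):

```c
/* Shared static state: any function that updates it non-atomically is
   not reentrant, because two tasks may interleave the update. */
static int error_count = 0;

/* Reentrant: touches only its parameters and stack locals. */
int add_reentrant(int a, int b) {
    return a + b;
}

/* Not reentrant if called from two tasks: error_count++ is a
   read-modify-write of shared data and is not atomic. */
int add_counting(int a, int b) {
    error_count++;
    return a + b;
}
```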
States of RTOS
Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: it means some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked: it means this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
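The scheduler's core decision can be sketched as picking the highest-priority non-blocked task (a toy model with an invented task table; real schedulers keep ready queues rather than scanning linearly):

```c
enum state { BLOCKED, READY, RUNNING };

typedef struct { int priority; enum state st; } task_t;

/* Return the index of the highest-priority task that is not blocked,
   or -1 if every task is blocked (the RTOS then idles). */
int pick_task(task_t t[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (t[i].st != BLOCKED &&
            (best < 0 || t[i].priority > t[best].priority))
            best = i;
    return best;
}
```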
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, a task runs until it goes to the blocked state before the other one runs.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding implements the processes and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction, and testing phases.
The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
They need to be highly reliable and they need to work under extreme environmental conditions.
Embedded systems targeted to consumer market are very cost sensitive
In the area of embedded systems there is a wide variety of processors and operating systems; selecting an appropriate one is a difficult task.
APPLICATION AREAS
Consumer appliances - digital cameras, digital diaries, DVD players, electronic toys, remotes for TVs and microwave ovens, etc.
Office automation - copying machines, fax machines, printers, scanners, etc.
Industrial automation - equipment to measure temperature, pressure, voltage, and current; robots for hazardous jobs
Medical electronics - ECG, EEG, blood pressure measuring devices, X-ray scanners, equipment used for colonoscopy and endoscopy
Computer networks - bridges, routers, ISDN, ATM, frame relay switches, etc.
Telecommunications - key telephones, ISDN telephones, terminal adapters, web cameras, multiplexers, IP phones, IP gateways, IP gatekeepers
Wireless technologies - mobile phones, base station controllers, personal digital assistants, palmtops, etc.
Instrumentation - oscilloscopes, spectrum analyzers, logic analyzers, protocol analyzers, etc.
Security - encryption devices and biometric systems; security devices at homes, offices, and airports for authentication and verification
Finance - smart cards, ATMs, etc.
EMBEDDED SYSTEMS OVERVIEW
An embedded system usually consists of custom-built hardware woven around a CPU.
The custom-built hardware also contains memory chips onto which software, called firmware, is loaded.
When represented in a layered architecture, the OS runs over the hardware and the application software runs over the OS.
EMBEDDED HARDWARE UNITS
Central processing unit
ROM and RAM
Input devices such as sensors, A/D converters, keypads
Output devices such as D/A converters, LEDs, LCDs
Debug port
Communication interface
Power supply unit
EMBEDDED SOFTWARE IN A SYSTEM
EMBEDDED SYSTEMS ON CHIP (SOC)
A SoC is an embedded system on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. A SoC may be embedded with the following components
Embedded processor GPP or ASIP core
Single purpose processing cores or multiple processors
A network bus protocol core
An encryption function unit
DCT for signal processing applications
Memories
PLDs and FPGA cores
Other logic and analog units
An application of SoC is mobile phone
DESIGN PROCESS
In the top-down view, we start with the most abstract description of the system and conclude with concrete details. It consists of
Requirements - it is the customer's description of the system they require. Requirements may be functional or nonfunctional; the second category includes performance, cost, physical size and weight, power consumption, etc.
Specifications - it serves as the contract between the customer and the architect, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave.
Architecture - basically it is a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape.
Components - here the design of the components, including both software and hardware modules, takes place.
System integration - this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested and any bugs found are addressed.
CLASSIFICATION OF EMBEDDED SYSTEMS
Stand-alone ESs
They work in stand-alone mode: they take inputs, process them, and produce the desired output. ESs in automobiles and consumer electronic items are examples.
Real time systems
Embedded systems in which specific work has to be done in a specific time period are called RT systems.
Hard RTSs are systems which are required to adhere to deadlines strictly.
Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe.
Networked information appliances
These are connected to a network, provided with network interfaces, and accessible over LANs and the Internet. A web camera connected to the net is an example.
Mobile devices
Mobile phones Personal Digital Assistants smart phones etc are examples for this category
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
It is a mechanism by which the CPU communicates with memory and devices
A bus is at a minimum a collection of wires but it also defines a protocol by which the CPU memory and devices communicate One of the major roles of bus is to provide an interface to memory
HANDSHAKE
The basic building block of the bus protocol is the four-cycle handshake.
It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.
It uses a pair of dedicated wires, enq (enquiry) and ack (acknowledge), to handshake. Extra wires are used for data transmission during the handshake.
The four cycles are
o Device 1 raises its output to signal an enquiry which tells device 2 that it should get
ready to listen for data
o When device 2 is ready to receive it raises its output to signal an acknowledgement At
this point devices 1 and 2 can transmit or receive
o Once the data transfer is complete device 2 lowers its output signaling that it has
received the data
o After seeing that ack has been released device 1 lowers its output
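The four cycles above can be stepped through in software, with two flag bits standing in for the enq and ack wires (a simulation sketch only; real handshaking happens on bus signals):

```c
enum { ENQ = 1, ACK = 2 };  /* one bit per handshake wire */

/* Step the four-cycle handshake and record the (enq, ack) wire states
   after each cycle. Returns the number of cycles recorded. */
int four_cycle_handshake(int trace[4]) {
    int wires = 0;
    wires |= ENQ;  trace[0] = wires;  /* 1: device 1 raises enq           */
    wires |= ACK;  trace[1] = wires;  /* 2: device 2 raises ack; transfer */
    wires &= ~ACK; trace[2] = wires;  /* 3: device 2 drops ack: received  */
    wires &= ~ENQ; trace[3] = wires;  /* 4: device 1 drops enq            */
    return 4;
}
```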
The fundamental bus operations are reading and writing All transfers on the basic bus are controlled by CPU- the CPU can read or write a device or memory but devices or memory cannot initiate a transfer
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components
o R/W' is true when the bus is reading and false when the bus is writing
o Address is an a-bit bundle of signals that transmits the address for an access
o Data is an n-bit bundle of signals that can carry data to or from the CPU
o Data ready signals when the values on the data bundle are valid
Burst transfer hand shaking signals are also used for burst transfers
In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address
One extra line is added to the bus, called burst', which signals when a transaction is actually a burst.
Releasing the burst' signal tells the device that enough data has been transmitted.
Disconnected transfers in these buses the request and response are separate
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.
A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below.
o Higher speed buses may provide wider data connections
o A high speed bus usually requires more expensive circuits and connectors The
cost of low-speed devices can be held down by using a lower-speed lower-cost bus
o Bridge may allow the buses to operate independently thereby providing some
parallelism in IO operations
ARM Bus
o ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors
o AMBA Bus supports CPUs memories and peripherals integrated in a system on
silicon(SoS)
o The AMBA specification includes two buses: the AHB (AMBA high-performance bus) and the APB (AMBA peripherals bus)
o AHB: it is optimized for high-speed transfers and is directly connected to the CPU
o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters
o APB: it is simple and easy to implement and consumes relatively little power
o it assumes that all peripherals act as slaves simplifying the logic required in both the
peripherals and the bus controller
o It does not perform pipelined operations which simplifies the bus logic
SHARC Bus
o It contains both program and data memory on-chip
o There are two external interfaces of interest: the external memory interface and the host interface
o The external memory interface allows the SHARC to address up to four gigawords of
external memory which can hold either instructions or data
o The external data bus can vary in width from 16 to 48 bits depending upon the type of
memory access
o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords
o The external memory is divided into four banks of equal size The memory above the
banks is known as unbanked external memory
o Host interface is used to connect the SHARC to standard microprocessor bus The host
interface implements a typical handshake for bus granting
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional
o Each DMA channel has its own interrupt and the DMA controller supports chained
transfers also
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random access memories can be both read and written Two major categories are static RAM SRAM and dynamic RAM DRAM
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs
o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled
o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM
o Adrs specifies the address for the read or write
o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
DRAMs inputs and refresh
o They have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'), which are designed to minimize the number of required pins. These signals are needed because address lines are provided for only half the address
o DRAMs must be refreshed because they store values that can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS-before-RAS refresh
It is a special quick refresh mode.
This mode is initiated by setting CAS' to 0 first, then RAS' to 0.
It causes the current memory row to get refreshed and the corresponding counter to be updated.
Memory controller is a logic circuit that is external to DRAM and performs CAS before RAS refresh to the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In page mode access one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive
column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are
held valid until the falling edge of CAS', rather than its rising edge as in page mode
Synchronous DRAMs
o It is developed to improve the performance of the DRAMs by introducing a clock
o Changes to input and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel
interface is connected to the microprocessor
RAMBUS
o It offers high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems since a great deal of the code and perhaps
some data does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when ROMs are installed in some quantity
Field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM,
which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
It uses the floating-gate principle and is designed such that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming, allowing programming in a typical system
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM for two reasons
DRAM's RAS/CAS multiplexing
Need to refresh
Device interfacing
Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it was not designed
An IO device typically requires a much smaller range of addresses than a memory so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus used in microcontroller based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices
Echelon LON network used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I 2 C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing an I2C bus include EPROMs Flash and some RAM memory devices real time clocks watch dog timers and microcontrollers
The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus The protocol does not limit the number of master devices on an I2C bus but typically in a microcontroller based system the microcontroller serves as the master Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions
It is a well known bus to link a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus
It has two lines serial data line(SDL) for data and serial clock line(SCL) which confirms when valid data are on the line
Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the
clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition
o 0000000 is used for general call or bus broadcast useful to signal all
devices simultaneously
o 11110XX is reserved for the extended 10 bit addressing scheme
Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1 to 0 transition on SDL, and stop is signaled by leaving SCL high and sending a 0 to 1 transition on SDL
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network(CAN) bus is a robust serial communication bus protocol for real time applications possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but now it finds use in other applications also
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
It uses bit-serial transmission; it can run at rates of 1 Mbit/sec over a twisted pair connection of up to 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1
When all nodes are transmitting 1s the bus is said to be in the recessive state; when any node transmits a 0 the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame
A data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames)
The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between
The data field is from 0 to 64 bytes depending on the value given in the control field
A cyclic redundancy check (CRC)is sent after the data field for error detection
The acknowledge field is used to let the identifier signal whether the frame is correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier where it tried to send a recessive bit, it stops transmitting. By the end of the arbitration field only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
Remote frame A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS These are used to communicate with the external world like transmitting data to a host PC or interacting with another embedded system for sharing data etc
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)
Data terminal equipment(DTE) can be a PC serial printer or a plotter and data circuit terminating equipment (DCE) can be a modem mouse digitizer or a scanner
Communication between the two devices is in full duplex ie the data transfer can take place in both the directions
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters. But using RS232 cable a distance of up to 100 meters can be achieved
Possible Data rates depend upon the UART chip and the clock used
Communication parameters For the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both the systems
Data rate it represents the rate at which the data communication takes place. PCs support 50, 150, 300, …, 115200 bps
Data bits number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended
Parity bit it is added for error checking on the receiver side
Flow control it is useful when the sender pushes out data at a rate that cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake. It can be of hardware type or software type
RS232 connector configurations
It specifies two types of connectors 9 pin and 25 pin
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission
UART chip universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section
The receive section receives data, converts it from serial form into parallel form, and gives the data to the processor
The transmit section takes the data from the processor and converts the data from parallel format to serial format
It also adds start, stop, and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses the data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by the level shifter and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections
Daisy Chaining To determine the priority of devices that send interrupt requests, this method is adopted. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first
position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation.
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions
IEEE 488 connector has 24 pins The bus employs 16 signal lines---eight bidirectional lines used for data transfer three for handshake and five for bus management---plus eight ground return lines
In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc to their workstation products and HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture The best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
o Architecture selection is a tradeoff between complexity and control over response
and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros
o Simple, no shared data, no interrupts
Cons
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most IO needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on Round Robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros
o Still relatively simple
o Hardware timing requirements better met
Cons
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts Adjustments
o Change the order flags are checked (eg ABABAD)
Improves response of A
Increases latency of other tasks
o Move some task code to interrupt
Decreases response time of lower priority interrupts
May not be able to ensure lower priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons
o Worse response time for lower priority code (no guarantee it will actually run)
Real Time Operating System Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures
We don't write signaling flags (RTOS takes care of it)
No loop in our code decides what is executed next (RTOS does this)
RTOS knows relative task priorities and controls what is executed next
RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros
o Worst case response time for the highest priority function is zero
o System's high priority response time relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally come with vendor tools
Cons
o RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept and can be implemented using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - the number of processes that complete their execution per time unit
Latency, specifically
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives
In real-time environments such as embedded systems for automatic control in industry (for example robotics) the scheduler also must ensure that processes can meet deadlines this is
crucial for keeping the system stable Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates which processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, a process which has a low priority, a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision must, at a minimum, be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic), in operating systems (to share CPU time among both threads and processes), in disk drives (I/O scheduling), in printers (print spooler), in most embedded systems, etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet-radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems, such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
First In First Out, also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal
Throughput can be low, since long processes can hog the CPU
Turnaround time, waiting time, and response time can be high for the same reason
No prioritization occurs, so this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation
It is based on queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes
Overhead is neither minimal nor significant; FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process higher-priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes, not the average process length
Because of high waiting times, deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
SHARC PROCESSOR
It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
The SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs); one is for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields All ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general-purpose register
R13----------- it is used as a stack pointer
R14----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags, when set, disable (mask) normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC) it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF) it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jumps are always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o The instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, uninitialized, and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem
Shared-data problem it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state
Blocked It means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs time slicing between the tasks is done. In others, one task runs until it blocks before the other gets the processor
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS the lower-priority task is stopped as soon as the higher-priority task unblocks
In a non-preemptive RTOS the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce, and it was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases, since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design, the designers go through requirements, construction, and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customerrsquos needs
They need to be highly reliable and they need to work in extreme environmental conditions
Embedded systems targeted to consumer market are very cost sensitive
In the area of embedded systems there is a wide variety of processors and operating systems; selecting the appropriate ones is a difficult task
APPLICATION AREAS
Consumer appliances: digital cameras, digital diaries, DVD players, electronic toys, remotes for TV and microwave ovens, etc.
Office automation: copying machines, fax machines, printers, scanners, etc.
Industrial automation: equipment to measure temperature, pressure, voltage, and current; robots to avoid hazardous jobs
Medical electronics: ECG, EEG, blood pressure measuring devices, X-ray scanners, equipment used for colonoscopy and endoscopy
Computer networks: bridges, routers, ISDN, ATM, frame relay switches, etc.
Telecommunications: key telephones, ISDN telephones, terminal adapters, web cameras, multiplexers, IP phones, IP gateways, IP gatekeepers
Wireless technologies: mobile phones, base station controllers, personal digital assistants, palmtops, etc.
Instrumentation: oscilloscopes, spectrum analyzers, logic analyzers, protocol analyzers, etc.
Security: encryption devices and biometric systems; security devices at homes, offices, and airports for authentication and verification
Finance: smart cards, ATMs, etc.
EMBEDDED SYSTEMS OVERVIEW
An ES usually consists of custom-built hardware woven around a CPU
The custom-built hardware also contains memory chips, onto which software called firmware is loaded
When represented as a layered architecture, the OS runs over the hardware and the application software runs over the OS
EMBEDDED HARDWARE UNITS
Central processing unit
ROM and RAM
Input devices such as sensors, A/D converters, keypad
Output devices such as D/A converters, LEDs, LCD
Debug port
Communication interface
Power supply unit
EMBEDDED SOFTWARE IN A SYSTEM
EMBEDDED SYSTEMS ON CHIP (SOC)
An SoC is an ES on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. An SoC may be embedded with the following components
Embedded processor GPP or ASIP core
Single purpose processing cores or multiple processors
A network bus protocol core
An encryption function unit
DCT for signal processing applications
Memories
PLDs and FPGA cores
Other logic and analog units
An application of SoC is mobile phone
DESIGN PROCESS
In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of
Requirements: the customer's description of the system they require. Requirements may be functional or non-functional; the second category includes performance, cost, physical size and weight, power consumption, etc.
Specifications: this serves as the contract between the customer and the architects, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave
Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape
Components: here the design of components, including both software and hardware modules, takes place
System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested and any bugs found are addressed
CLASSIFICATION OF EMBEDDED SYSTEMS
Stand alone ESs
They work in stand-alone mode: they take inputs, process them, and produce the desired output. ESs in automobiles and consumer electronic items are examples
Real time systems
Embedded systems in which specific work has to be done in a specific time period are called RT systems
Hard RTSs are systems that are required to adhere to deadlines strictly
Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe
Networked information appliances
These are connected to a network, provided with network interfaces, and accessible from LANs and the Internet. A web camera connected to the net is an example
Mobile devices
Mobile phones Personal Digital Assistants smart phones etc are examples for this category
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
It is the mechanism by which the CPU communicates with memory and devices
A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory
HANDSHAKE
The basic building block of a bus protocol is the four-cycle handshake
It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive
It uses a pair of dedicated wires for the handshake: enq (enquiry) and ack (acknowledge). Extra wires are used for data transmission during the handshake
The four cycles are
o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data
o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point devices 1 and 2 can transmit or receive
o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data
o After seeing that ack has been released, device 1 lowers its output
The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components
o RW is true when the bus is reading and false when the bus is writing
o Address is an a-bit bundle of signals that transmits the address for an access
o Data is an n-bit bundle of signals that can carry data to or from the CPU
o Data ready signals when the values on the data bundle are valid
Burst transfer hand shaking signals are also used for burst transfers
In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address
One extra line, called 'burst', is added to the bus; it signals when a transaction is actually a burst
Releasing the burst signal tells the device that enough data has been transmitted
Disconnected transfers in these buses the request and response are separate
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory
A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below
o Higher speed buses may provide wider data connections
o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus
o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations
ARM Bus
o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors
o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)
o The AMBA specification includes two buses: the AHB (AMBA high-performance bus) and the APB (AMBA peripherals bus)
o AHB is optimized for high-speed transfers and is directly connected to the CPU
o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters
o APB is simple and easy to implement and consumes relatively little power
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller
o It does not perform pipelined operations, which simplifies the bus logic
SHARC Bus
o It contains both program and data memory on-chip
o There are two external interfaces of interest: the external memory interface and the host interface
o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data
o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access
o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords
o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory
o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional
o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers as well
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs
o CE' is the chip enable input. When CE' = 1 the SRAM's data pins are disabled, and when CE' = 0 the data pins are enabled
o R/W' controls whether the current operation is a read (R/W' = 1) from RAM or a write (R/W' = 0) to RAM
o Adrs specifies the address for the read or write
o Data is a bidirectional bundle of signals for data transfer. When R/W' = 1 the pins are outputs, and when R/W' = 0 the pins are inputs
DRAM inputs and refresh
o They have two inputs in addition to those of static RAM: row address select (RAS') and column address select (CAS'). These signals are designed to minimize the number of required pins and are needed because address lines are provided for only half the address
o DRAMs must be refreshed because the values they store can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS-before-RAS refresh
It is a special quick refresh mode
This mode is initiated by setting CAS' to 0 first, then RAS' to 0
It causes the current memory row to get refreshed and the corresponding counter to be updated
A memory controller is a logic circuit, external to the DRAM, that performs CAS-before-RAS refresh on the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In page mode access, one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode
Synchronous DRAMs
o It is developed to improve the performance of the DRAMs by introducing a clock
o Changes to input and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display while the parallel
interface to the microprocessor
RAMBUS
o It is high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While it uses the floating-gate principle, it is designed such that large blocks of memory can be erased all at once
It uses standard system voltages for erasing and programming, allowing it to be programmed in a typical system
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM, due to two reasons
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces; but glue logic is required when a device is connected to a bus for which it is not designed
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices
Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions
It is a well-known bus for linking a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes may act as slaves that only respond to requests from masters
It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting, to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node pulls the bus down, making 0 dominant over 1
When all nodes are transmitting 1s the bus is said to be in the recessive state; when any node transmits a 0 the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame
Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames)
The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between
The data field is from 0 to 8 bytes, depending on the value given in the control field
A cyclic redundancy check (CRC) is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP. It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
Remote frame: a remote frame is used to request data from another node. The requester sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness
If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect data terminal equipment (DTE) to data circuit-terminating equipment (DCE)
Data terminal equipment (DTE) can be a PC, serial printer, or plotter; data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions simultaneously
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved
Possible data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character; it can be 5, 6, 7, or 8 bits
Start and stop bits: they identify the beginning and end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended
Parity bit: it is added for error checking on the receiver side
Flow control: useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmissions; it is also known as handshake. It can be hardware type or software type
RS232 connector configurations
The standard specifies two types of connectors: 9-pin and 25-pin.
The standard defines the voltage levels used to transmit 1s and 0s; the levels are different for data and control signals.
The voltage level is measured with respect to a local ground, hence RS232 uses unbalanced transmission.
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section.
The receive section receives data, converts it from serial form to parallel form and gives the data to the processor.
The transmit section takes data from the processor and converts it from parallel format to serial format.
It also adds the start, stop and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 line and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted-pair copper cable is used as the transmission medium.
It uses balanced transmission.
Two separate channels are used for the transmit and receive paths.
RS485
It is a variation of RS422 created to connect a number of devices, up to 512, in a network.
RS485 controller chip is used on each device
A network using the RS485 protocol works in a master-slave configuration.
With one twisted pair, half-duplex communication is possible; with two twisted pairs, full-duplex communication can be achieved.
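As a rough illustration of the master-slave discipline on a half-duplex pair, the following Python sketch polls hypothetical slaves one address at a time; only the addressed slave drives the bus. The Slave class and its readings are invented for the example.

```python
# Hypothetical model of RS485 master-slave polling: the master transmits one
# poll at a time, and only the slave whose address matches replies (all other
# drivers stay tri-stated, so the shared pair is never driven twice at once).

class Slave:
    def __init__(self, address, reading):
        self.address = address
        self.reading = reading            # e.g. an invented sensor value
    def on_poll(self, address):
        if address == self.address:       # reply only when addressed
            return (self.address, self.reading)
        return None                       # otherwise stay silent

def poll_cycle(addresses, slaves):
    replies = []
    for addr in addresses:                # master polls each address in turn
        for s in slaves:
            r = s.on_poll(addr)
            if r is not None:
                replies.append(r)         # exactly one device answers per poll
    return replies

bus = [Slave(1, 20), Slave(2, 35), Slave(3, 7)]
print(poll_cycle([1, 2, 3], bus))
```

Because the master initiates every transfer, no two slaves ever contend for the shared pair.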
IEEE 488 BUS
It is a short-range digital communication bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).
It allows up to 15 devices to share a single 8-bit parallel electrical bus by daisy-chaining connections.
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all the devices that can request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.

The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input of the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-logic OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

The device that sent the request accepts the acknowledgement from the CPU and does not pass the acknowledge signal on to the next device. The procedure is as follows: a device with a 0 at its PI input generates a 0 at its PO output, to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal first. The farther a device is from the first position, the lower its priority.
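The PI/PO logic described above can be simulated in a few lines of Python. This is an illustrative model only; the vector addresses are made up.

```python
# Simulation of daisy-chain priority resolution: the acknowledge enters the
# first (highest-priority) device at PI=1. A requesting device that sees PI=1
# takes the acknowledge (PO=0, blocking everyone downstream) and places its
# vector address (VAD) on the bus; a non-requesting device passes PI through.

def daisy_chain(requests, vads):
    """requests: list of booleans, index 0 = highest priority.
    Returns the VAD of the device that wins the acknowledge, or None."""
    pi = 1                         # CPU asserts interrupt acknowledge
    for req, vad in zip(requests, vads):
        if req:
            if pi == 1:
                return vad         # PI=1 and requesting: intercept, PO=0
            pi = 0                 # blocked upstream: keep PO low as well
        # a non-requesting device simply passes PI straight through to PO
    return None                    # no device was requesting

print(hex(daisy_chain([False, True, True], [0x40, 0x44, 0x48])))
```

Note how device 3 never sees the acknowledge even though it is also requesting; its position in the chain fixes its priority.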
The slowest device participates in the control and data transfer handshakes, so it determines the speed of the transaction.
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshaking and five for bus management) plus eight ground-return lines.
In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols and the like.
The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to this late introduction it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems etc. via the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS
Choosing an Architecture
The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response times (buttons, serial ports etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
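A minimal Python sketch of this architecture: the main loop polls each device in a fixed order and services whichever has work pending. The device names and pending counts are invented for illustration.

```python
# Round-robin architecture in miniature: no interrupts, no priorities -- just
# a loop that asks each device in turn "do you need service?" and services it.

def round_robin(devices, iterations):
    """devices: list of (name, pending_requests) pairs, in loop order."""
    pending = {name: n for name, n in devices}
    order = [name for name, _ in devices]
    serviced = []
    for _ in range(iterations):        # one trip around the main loop
        for name in order:             # position in the loop fixes service order
            if pending[name] > 0:      # device needs service?
                pending[name] -= 1
                serviced.append(name)  # service it, then move to the next device
    return serviced

print(round_robin([("adc", 2), ("keypad", 1), ("display", 0)], 3))
```

The weakness is visible in the structure: a device is serviced at most once per trip, so the worst-case delay is the time to traverse the whole loop.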
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this?
Adjustments:
o Change the order in which flags are checked (e.g. ABABAD): improves the response of A but increases the latency of the other tasks
o Move some task code into the interrupt: decreases the response time of lower-priority interrupts, but it may not be possible to ensure that lower-priority interrupt code executes fast enough
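The flag-passing idea can be sketched as follows. This is illustrative only: the "ISRs" here are plain functions standing in for real interrupt handlers, and the device names are made up.

```python
# Round robin with interrupts: the ISR does only the urgent hardware work and
# sets a flag; the main loop polls the flags and does the follow-up processing
# at task level, where everything still runs at the same (low) priority.

flags = {"serial": False, "timer": False}
log = []

def serial_isr():                   # stands in for an interrupt handler
    log.append("isr:serial")        # urgent part, done at interrupt priority
    flags["serial"] = True          # defer the rest to the main loop

def timer_isr():
    log.append("isr:timer")
    flags["timer"] = True

def main_loop_pass():               # task code: flags checked in fixed order
    for name in ("serial", "timer"):
        if flags[name]:
            flags[name] = False
            log.append("task:" + name)   # lower-priority follow-up processing

serial_isr()                        # pretend two interrupts arrive
timer_isr()
main_loop_pass()
print(log)
```

Both ISRs run before any task-level work, which is exactly the priority split this architecture provides: urgent work is prompt, follow-up work waits its turn in the loop.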
Function Queue Scheduling
Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time: for the highest-priority task it equals the length of the longest function code
o Best response time can be improved by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
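A minimal sketch of this architecture in Python, using a priority queue of function pointers; the task names and priorities are made up for illustration.

```python
# Function-queue-scheduling architecture: "interrupts" enqueue function
# pointers with a priority, and the main routine always pops and calls the
# most urgent one -- not necessarily in FIFO order.

import heapq

queue = []   # entries are (priority, sequence, function); lower number = more urgent
seq = 0

def enqueue(priority, fn):          # what an ISR would do
    global seq
    heapq.heappush(queue, (priority, seq, fn))  # seq keeps equal priorities FIFO
    seq += 1

ran = []
enqueue(2, lambda: ran.append("logging"))       # low-priority housekeeping
enqueue(0, lambda: ran.append("motor_fault"))   # urgent follow-up work
enqueue(1, lambda: ran.append("keypad"))

while queue:                        # the main routine
    _, _, fn = heapq.heappop(queue)
    fn()                            # worst wait for the top task = longest function

print(ran)
```

Even though "logging" was queued first, "motor_fault" runs first; that reordering freedom is the whole point of the architecture.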
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows the relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control both task response and interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.
Example: if the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.
Job1 = total time to complete 250 ms (quantum 100 ms)
1. First allocation = 100 ms
2. Second allocation = 100 ms
3. Third allocation = 100 ms, but job1 self-terminates after 50 ms
4. Total CPU time of job1 = 250 ms
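The allocation trace above can be reproduced mechanically: give the job up to one quantum per turn until its remaining time is exhausted.

```python
# Reproducing the job1 example: a job receives at most one quantum per turn,
# and its final slice may be shorter if it self-terminates mid-quantum.

def rr_allocations(total_ms, quantum_ms):
    allocations = []
    remaining = total_ms
    while remaining > 0:
        slice_ms = min(quantum_ms, remaining)  # job may finish before the quantum ends
        allocations.append(slice_ms)
        remaining -= slice_ms
    return allocations

print(rr_allocations(250, 100))   # the three allocations from the example
```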
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
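A sketch of this work-conserving round robin over per-flow queues; the flow names and packets are invented for illustration.

```python
# Work-conserving round-robin packet scheduling: each flow has its own queue,
# the scheduler visits active flows in a fixed circular order, and a flow
# with nothing to send is simply skipped so the link never idles needlessly.

from collections import deque

def rr_packet_schedule(flows):
    """flows: dict mapping flow name -> list of packets in arrival order."""
    queues = {name: deque(pkts) for name, pkts in flows.items()}
    order = list(flows)
    sent = []
    while any(queues.values()):          # stop when every queue is empty
        for name in order:
            if queues[name]:             # skip empty flows (work-conserving)
                sent.append(queues[name].popleft())
    return sent

print(rr_packet_schedule({"A": ["A1", "A2", "A3"], "B": ["B1"]}))
```

Once flow B runs dry, flow A gets every turn; no transmission slot is wasted waiting on the empty queue.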
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same round-robin concept; such a scheduler can also be created using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround time - the total time between submission of a process and its completion
o Response time - the amount of time from when a request is submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time, i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
Also known as first come, first served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process in a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
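A time-stepped sketch of fixed-priority pre-emptive scheduling; the process names, arrival times and priorities are invented (lower number = higher priority).

```python
# Fixed-priority pre-emptive scheduling: at every time unit the ready process
# with the highest priority runs, so an arriving high-priority process
# interrupts a lower-priority one mid-execution.

def fpps(procs):
    """procs: list of (name, arrival, burst, priority) tuples.
    Returns the per-time-unit execution timeline."""
    remaining = {name: burst for name, _, burst, _ in procs}
    timeline, t = [], 0
    while any(remaining.values()):
        ready = [p for p in procs if p[1] <= t and remaining[p[0]] > 0]
        if not ready:
            t += 1                     # CPU idles until something arrives
            continue
        name = min(ready, key=lambda p: p[3])[0]   # highest priority wins
        remaining[name] -= 1
        timeline.append(name)
        t += 1
    return timeline

# low-priority "logger" starts first; urgent "alarm" arrives at t=2 and preempts it
print(fpps([("logger", 0, 4, 2), ("alarm", 2, 2, 1)]))
```

The logger is suspended the moment the alarm becomes ready and resumes only after the alarm finishes, which is exactly the starvation risk the text notes when many high-priority processes queue up.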
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes and not on average process length.
Because of high waiting times, deadlines are rarely met in a pure RR system.
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
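The waiting-time behaviour claimed in this section can be checked with a small simulation, assuming all jobs arrive at time 0 (the burst times are invented).

```python
# Comparing average waiting time under FCFS and round robin for one workload.
# All jobs arrive at t=0; waiting time = completion time - burst time.

def fcfs_waiting(bursts):
    waits, t = [], 0
    for b in bursts:                 # each job waits for everything queued before it
        waits.append(t)
        t += b
    return sum(waits) / len(waits)

def rr_waiting(bursts, quantum):
    remaining = list(bursts)
    finish = [0] * len(bursts)
    t = 0
    while any(remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                run = min(quantum, r)   # one quantum (or less) per turn
                t += run
                remaining[i] -= run
                if remaining[i] == 0:
                    finish[i] = t
    waits = [finish[i] - bursts[i] for i in range(len(bursts))]
    return sum(waits) / len(waits)

# one long job followed by two short ones: RR rescues the short jobs
print(fcfs_waiting([24, 3, 3]), rr_waiting([24, 3, 3], 4))
```

With the long job first, FCFS makes the short jobs wait behind it, while round robin lets them finish after a single quantum of the long job, cutting the average wait sharply.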
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs; it is very useful for shared-memory problems.
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
The major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks, such as string searching; they also generally use a number of different instruction formats of varying lengths.
o The CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line.
Example
        LDR r0,[r8]      ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
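The word-to-byte address mapping and the effect of byte ordering can be illustrated with a short host-side C sketch (the function names are mine, and the result of the endianness probe depends on the machine running it, not on any ARM configuration):

```c
#include <stdint.h>
#include <string.h>

/* Word n lives at byte address 4*n when words are 32 bits wide. */
uint32_t word_address(uint32_t word_index) {
    return word_index * 4u;
}

/* Return the byte found at the lowest address of a 32-bit word:
   0x78 on a little-endian host, 0x12 on a big-endian host. */
uint8_t first_byte_of_word(void) {
    uint32_t word = 0x12345678u;
    uint8_t b;
    memcpy(&b, &word, 1);   /* read the lowest-addressed byte only */
    return b;
}
```

Running `first_byte_of_word` on a typical PC (little-endian) yields 0x78, which is exactly the low-order-byte-at-low-address rule described above.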
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating-point-intensive computations.
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared by later operations; they remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow yields the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
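Saturation versus wraparound can be sketched in C (a host-side illustration of the behavior the ALUSAT bit selects, not SHARC code):

```c
#include <stdint.h>
#include <limits.h>

/* Saturating 32-bit signed addition: on overflow, clamp to the maximum
   or minimum representable value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;  /* widen so overflow is visible */
    if (sum > INT32_MAX) return INT32_MAX;  /* clamp high */
    if (sum < INT32_MIN) return INT32_MIN;  /* clamp low  */
    return (int32_t)sum;
}
```

For a signal-processing filter, clamping a too-loud sample to full scale distorts far less than the sign flip that wraparound would produce.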
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy the sign bit. The shift distance, supplied in the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in it.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset.
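The post-modify and base-plus-offset modes can be sketched in C (a behavioral illustration over a tiny static "data memory"; the register names and array size are mine):

```c
#include <stdint.h>

/* A tiny stand-in for data memory and one I (index) register. */
static int32_t dm[4] = {10, 20, 30, 40};
static uint32_t i0 = 0;

/* Post-modify with update: use dm[i0] first, then update i0 by m. */
int32_t dm_post_modify(int32_t m) {
    int32_t value = dm[i0];   /* the address is used... */
    i0 += (uint32_t)m;        /* ...and then updated by the modifier */
    return value;
}

/* Base-plus-offset: fetch dm[i0 + m]; i0 itself is left unchanged. */
int32_t dm_base_offset(int32_t m) {
    return dm[i0 + (uint32_t)m];
}
```

Calling `dm_post_modify` repeatedly with m = 1 sweeps through successive locations, which is exactly the access pattern of a filter walking a sample buffer.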
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
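The index arithmetic behind circular buffers and bit-reversed addressing can be sketched in C (behavioral only; the real DAG performs these updates in hardware as a side effect of the access):

```c
#include <stdint.h>

/* Circular-buffer update: advance the index by the modifier and wrap
   it back into [0, len), as a DAG does for a circular buffer. */
uint32_t circular_next(uint32_t i, uint32_t m, uint32_t len) {
    return (i + m) % len;
}

/* Bit-reversed index, as used to reorder FFT data: reverse the low
   `bits` bits of the index (e.g., bits = 3 for an 8-point FFT). */
uint32_t bit_reverse(uint32_t index, unsigned bits) {
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (index & 1u);  /* shift LSBs out into the result */
        index >>= 1;
    }
    return r;
}
```

For an 8-point FFT, input sample 1 (binary 001) is fetched from bit-reversed position 4 (binary 100), sample 6 (110) from position 3 (011), and so on.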
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code or to the highest-priority task. In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than 1, and binary semaphores, which take values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
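The create/acquire/release calls listed above map naturally onto POSIX semaphores, as in this host-side C sketch (POSIX names, not the API of any specific RTOS; the counter it protects is my example resource):

```c
#include <semaphore.h>

/* A binary semaphore guarding a shared counter: enter() is "acquire",
   leave() is "release", init_lock() is "create". */
static sem_t lock;
static int shared_count = 0;

void init_lock(void) { sem_init(&lock, 0, 1); } /* binary: initial value 1 */
void enter(void)     { sem_wait(&lock); }       /* acquire (blocks if taken) */
void leave(void)     { sem_post(&lock); }       /* release */

int bump_counter(void) {
    enter();                 /* only one task at a time past this point */
    int v = ++shared_count;  /* the protected, shared resource */
    leave();
    return v;
}
```

With two real tasks, `sem_wait` in one of them would block until the other calls `sem_post`, which is the resource-synchronization behavior described above.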
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
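The "array of mailboxes" view can be sketched in C as a fixed-length ring of slots (a minimal illustration; a real RTOS queue also blocks tasks and maintains the waiting lists mentioned above):

```c
/* A message queue of Q_LEN integer mailboxes, delivered in FIFO order. */
#define Q_LEN 4

typedef struct {
    int slots[Q_LEN];
    int head, tail, count;
} msg_queue;

void mq_create(msg_queue *q) { q->head = q->tail = q->count = 0; }

int mq_send(msg_queue *q, int msg) {      /* a task or ISR deposits a message */
    if (q->count == Q_LEN) return -1;     /* queue full */
    q->slots[q->tail] = msg;
    q->tail = (q->tail + 1) % Q_LEN;
    q->count++;
    return 0;
}

int mq_receive(msg_queue *q, int *msg) {  /* a waiting task takes the oldest */
    if (q->count == 0) return -1;         /* queue empty */
    *msg = q->slots[q->head];
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 0;
}
```

In the keyboard example above, the ISR would call `mq_send` with each scan code and the display task would drain the queue with `mq_receive`.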
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
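These calls correspond closely to the POSIX `pipe()` API, as in this host-side C sketch (RTOS pipe calls differ in name, and in a real system the producer and consumer would be separate tasks rather than one function):

```c
#include <unistd.h>
#include <string.h>

/* Create a pipe, write a message into one end, read it back from the
   other, and check the round trip. Returns 0 on success. */
int pipe_roundtrip(void) {
    int fds[2];
    char buf[16] = {0};
    if (pipe(fds) != 0) return -1;          /* create the pipe */
    if (write(fds[1], "sensor", 6) != 6)    /* producer writes its output */
        return -1;
    if (read(fds[0], buf, sizeof buf) != 6) /* consumer reads it as input */
        return -1;
    close(fds[0]);
    close(fds[1]);
    return strcmp(buf, "sensor") == 0 ? 0 : -1;
}
```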
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers.
o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory.
o Simple addressing modes.
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding.
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:
o Each instruction controls both the ALU and the shifter, making the instructions more powerful.
o Auto-increment and auto-decrement addressing modes have been incorporated.
o Multiple load/store instructions that load or store up to 16 registers at once have been introduced.
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code.
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in user mode. They are:
R15: the program counter, but it can be manipulated as a general-purpose register.
R13: used as a stack pointer.
R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register.
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor.
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of six execution modes, as follows:
o User mode: used to run application code. Once in user mode, the CPSR cannot be written to.
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling.
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system.
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction.
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction.
o Abort mode: entered in response to a memory fault.
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional.
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations, on the other hand, are carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (the default).
o Any subset of the user bank of registers, when in a privileged mode.
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes SWI exception handler to be called
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor the branch instructions have the following features:
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB.
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction.
THUMB
o The instructions are 16 bits in length.
o They are stored in a compressed form.
o The instructions are decompressed into ARM instructions and then executed by the processor.
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction.
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions.
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set.
o There are no MSR and MRS instructions.
o The maximum number of SWI calls is restricted to 256.
o On reset and on raising of an exception, the processor always enters ARM instruction set mode.
Advantages of THUMB
o Higher code density.
o Lower power consumption.
o Less space occupied.
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit.
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
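The problem and its fix can be sketched in C: the task reads a two-part value that an ISR may update between the two reads, so the read is wrapped in an atomic section. The interrupt enable/disable calls below are stand-ins simulated with a flag, not real hardware macros:

```c
#include <stdint.h>

/* A 32-bit tick count kept in two shared 16-bit halves, as on a CPU
   whose timer variables the ISR updates piecewise. */
static volatile uint16_t time_hi = 0, time_lo = 0;
static int interrupts_enabled = 1;

void disable_interrupts(void) { interrupts_enabled = 0; } /* stand-in */
void enable_interrupts(void)  { interrupts_enabled = 1; } /* stand-in */

/* What the timer ISR would do on each tick. */
void isr_tick(void) {
    if (++time_lo == 0) time_hi++;   /* carry into the high half */
}

/* Atomic snapshot: without the bracketing calls, the ISR could fire
   between reading time_hi and time_lo and deliver a corrupt value. */
uint32_t read_time_atomic(void) {
    disable_interrupts();            /* start of atomic section */
    uint32_t t = ((uint32_t)time_hi << 16) | time_lo;
    enable_interrupts();             /* end of atomic section */
    return t;
}
```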
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
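The difference can be illustrated in C: the first function below keeps its result in a static buffer and is therefore not reentrant, while the second uses only its arguments and the caller's stack (the function names are mine):

```c
#include <stdio.h>
#include <string.h>

/* NOT reentrant: the static buffer is shared, so if the RTOS switches
   tasks mid-call, a second caller overwrites the first caller's result. */
char *format_bad(int v) {
    static char buf[16];
    snprintf(buf, sizeof buf, "%d", v);
    return buf;
}

/* Reentrant: the caller supplies the storage, so each task formats into
   its own stack buffer; only arguments and locals are touched. */
void format_good(int v, char *out, size_t len) {
    snprintf(out, len, "%d", v);
}
```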
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.
Tasks and interrupt routines unblock tasks: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSs time-slice between the tasks. Others run one task until it blocks before running the other.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT.
o Non-functional: a non-functional requirement can mandate any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality.
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis: determines the basic characteristics of the system.
o Architecture design: decomposes the functionality into major components.
o Coding: implements the pieces and integrates them.
o Testing: uncovers bugs.
o Maintenance: entails deployment in the field, bug fixes, and upgrades.
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for software development.
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction, and testing phases.
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
authentication and verification
Finance: smart cards, ATMs, etc.
EMBEDDED SYSTEMS OVERVIEW
An ES usually consists of custom-built hardware woven around a CPU.
The custom-built hardware also contains memory chips onto which the software, called firmware, is loaded.
When represented as a layered architecture, the OS runs over the hardware and the application software runs over the OS.
EMBEDDED HARDWARE UNITS
Central processing unit
ROM and RAM
Input devices such as sensors, A/D converters, and keypads
Output devices such as D/A converters, LEDs, and LCDs
Debug port
Communication interface
Power supply unit
EMBEDDED SOFTWARE IN A SYSTEM
EMBEDDED SYSTEMS ON CHIP (SOC)
An SoC is an ES on a VLSI chip that has all the necessary analog and digital circuits, processors, and software. An SoC may be embedded with the following components:
Embedded processor GPP or ASIP core
Single purpose processing cores or multiple processors
A network bus protocol core
An encryption function unit
DCT for signal processing applications
Memories
PLDs and FPGA cores
Other logic and analog units
An application of SoC is the mobile phone.
DESIGN PROCESS
In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of:
Requirements: the customer's description of the system they require. Requirements may be functional or non-functional; the second category includes performance, cost, physical size and weight, power consumption, etc.
Specifications: serve as the contract between the customer and the architects, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave.
Architecture: basically a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the question of how the system is to be built is addressed, and the details of the system internals and components begin to take shape.
Components: here the design of the components, including both software and hardware modules, takes place.
System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and any bugs found are addressed.
CLASSIFICATION OF EMBEDDED SYSTEMS
Stand alone ESs
They work in stand-alone mode: they take inputs, process them, and produce the desired output. ESs in automobiles and consumer electronic items are examples.
Real time systems
Embedded systems in which specific work has to be done in a specific time period are called real-time systems.
Hard RTSs are systems that are required to adhere to deadlines strictly.
Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe.
Networked information appliances
These are connected to a network, provided with network interfaces, and accessible from LANs and the Internet. A web camera connected to the net is an example.
Mobile devices
Mobile phones, personal digital assistants, smart phones, etc. are examples of this category.
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
It is a mechanism by which the CPU communicates with memory and devices
A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory.
HANDSHAKE
The basic building block of a bus protocol is the four-cycle handshake.
It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.
It uses a pair of dedicated handshake wires: enq and ack. Extra wires are used for data transmission during the handshake.
The four cycles are
o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.
o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point devices 1 and 2 can transmit or receive.
o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.
o After seeing that ack has been released, device 1 lowers its output.
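The four cycles above can be sketched as a small C state machine over the enq and ack lines (a behavioral illustration, not bus-accurate timing; the `delivered` flag marks when the data transfer completes):

```c
/* State of the two handshake wires plus a completion flag. */
typedef struct { int enq, ack, delivered; } handshake;

void hs_init(handshake *h) { h->enq = h->ack = h->delivered = 0; }

/* Advance the handshake by one of its four cycles. */
void hs_step(handshake *h) {
    if (!h->enq && !h->ack && !h->delivered)
        h->enq = 1;                          /* 1: device 1 raises enq        */
    else if (h->enq && !h->ack && !h->delivered)
        h->ack = 1;                          /* 2: device 2 raises ack; data  */
    else if (h->enq && h->ack) {             /*    moves while both are high  */
        h->delivered = 1;                    /* 3: transfer done,             */
        h->ack = 0;                          /*    device 2 drops ack         */
    } else if (h->enq && h->delivered)
        h->enq = 0;                          /* 4: device 1 sees ack released */
}                                            /*    and drops enq              */
```

After four steps both wires are back low and `delivered` is set, ready for the next transfer.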
The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer.
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components.
o R/W' is true when the bus is reading and false when the bus is writing.
o Address is an a-bit bundle of signals that transmits the address for an access.
o Data is an n-bit bundle of signals that can carry data to or from the CPU.
o Data ready signals when the values on the data bundle are valid.
Burst transfer: handshaking signals are also used for burst transfers.
In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address
One extra line, called burst, is added to the bus; it signals when a transaction is actually a burst.
Releasing the burst signal tells the device that enough data has been transmitted.
Disconnected transfers: in these buses the request and response are separate.
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.
A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below:
o Higher-speed buses may provide wider data connections.
o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus.
o The bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations.
ARM Bus
o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors.
o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS).
o The AMBA specification includes two buses: AHB, the AMBA high-performance bus, and APB, the AMBA peripherals bus.
o AHB is optimized for high-speed transfers and is directly connected to the CPU.
o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters.
o APB is simple and easy to implement, and consumes relatively little power.
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller.
o It does not perform pipelined operations, which simplifies the bus logic.
SHARC Bus
o It contains both program and data memory on-chip.
o There are two external interfaces of interest: the external memory interface and the host interface.
o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data.
o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access.
o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords.
o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory.
o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting.
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional.
o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers.
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs:
o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled.
o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM.
o Adrs specifies the address for the read or write.
o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs.
DRAM inputs and refresh
o DRAMs have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). They are designed to minimize the number of required pins; these signals are needed because address lines are provided for only half the address.
o DRAMs must be refreshed because they store values that can leak away. A single refresh request can refresh an entire row of the DRAM.
o CAS-before-RAS refresh
It is a special quick refresh mode.
This mode is initiated by setting CAS' to 0 first, then RAS' to 0.
It causes the current memory row to get refreshed and the corresponding counter to be updated.
A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate.
Page mode
o Developed to improve the performance of DRAM.
o Useful to access several locations in the same region of the memory.
o In a page mode access, one row address and several column addresses are supplied.
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses.
o It is typically supported for both reads and writes.
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode.
Synchronous DRAMs
o It was developed to improve the performance of DRAMs by introducing a clock.
o Changes to the inputs and outputs of the DRAM occur on clock edges.
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor.
RAMBUS
o It offers high performance at a relatively low cost.
o It has multiple memory banks that can be addressed in parallel.
o It has separate data and control buses.
o It is capable of sustained data rates well above 1 Gbyte/sec.
ROMs are
o programmed with fixed data
o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory (or mask) programmed ROMs and field programmed ROMs
o Factory programming is useful only when ROMs are installed in some quantity
Field programmable ROMs are programmed in the laboratory using ROM burners
o Field programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed.
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s.
While it uses the floating gate principle, it is designed so that large blocks of memory can be erased all at once.
It uses standard system voltage for erasing and programming, allowing it to be reprogrammed in a typical system.
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage.
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones and medical monitoring equipment.
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.
I/O devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it is not designed.
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are:
The I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle large numbers of devices
The Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol.
This protocol enables peripheral ICs to communicate with each other using simple communication hardware.
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.
Common devices capable of interfacing to an I2C bus include EPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers and microcontrollers.
The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 40 pF.
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data.
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer is in between the start and stop conditions.
It is a well known bus to link microcontrollers with a system. It is low cost, easy to implement and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.
It has two lines: the serial data line (SDL) for data and the serial clock line (SCL), which confirms when valid data are on the line.
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from the masters.
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.
Each bus master device must listen to the bus while transmitting, to be sure that it is not interfering with another message.
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition.
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously.
o 11110XX is reserved for the extended 10-bit addressing scheme.
Bus transaction: it is comprised of a series of one-byte transmissions: an address followed by one or more data bytes.
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master.
The bus transaction is signaled by a start signal and completed by an end signal. A start is signaled by leaving SCL high and sending a 1 to 0 transition on SDL, and a stop is signaled by leaving SCL high and sending a 0 to 1 transition on SDL.
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires.
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also.
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities.
Common applications other than automobiles include elevator controllers, copiers, telescopes, production line control systems and medical instruments.
The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors.
It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used.
The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar.
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion.
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1.
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of the data frame: The data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames).
The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier.
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.
The data field is from 0 to 64 bytes, depending on the value given in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
The acknowledge field is used to signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
Remote frame: A remote frame is used to request data from another node. The requester sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA).
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).
A data terminal equipment (DTE) can be a PC, serial printer or a plotter, and a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer or a scanner.
Communication between the two devices is in full duplex, i.e. the data transfer can take place in both directions.
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side.
This mode of communication is called asynchronous communication, because no clock signal is transmitted.
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved.
Possible data rates depend upon the UART chip and the clock used.
Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both the systems.
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7 or 8 bits.
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended.
Parity bit: it is added for error checking on the receiver side.
Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake. It can be of hardware type or software type.
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin.
For transmission of 1s and 0s, the voltage levels are defined in the standard. Voltage levels are different for data and control signals.
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section.
The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.
The transmit section takes the data from the processor and converts the data from parallel format to serial format.
It also adds the start, stop and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it is able to make RS232 and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network.
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short range digital communications bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections.
Daisy chaining: This method is adopted to find the priority of devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation.
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have pending interrupts, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.
IEEE 488 connector has 24 pins The bus employs 16 signal lines---eight bidirectional lines used for data transfer three for handshake and five for bus management---plus eight ground return lines
In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB, but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and HP 2100 and HP 3000 computers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture: The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
o Architecture selection: a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: Simple, no shared data, no interrupts
Cons:
o Max delay is the max time to traverse the loop, if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts a) service hardware and b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order flags are checked (e.g. ABABAD)
o Improves response of A
o Increases latency of other tasks
o Move some task code to the interrupt
o Increases response time of lower priority interrupts
o May not be able to ensure lower priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower priority code (no guarantee it will actually run)
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
Differences with previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst case response time for the highest priority function is zero
o The system's high priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept and can be implemented using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - the number of processes that complete their execution per time unit
Latency, specifically
o Turnaround time - total time between submission of a process and its completion
o Response time - amount of time from when a request was submitted until the first response is produced
Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
First In First Out (FIFO), also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS and longer processes are completed faster than in SJF
Poor average response time waiting time is dependent on number of processes and not average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems
Overview
Scheduling algorithm          CPU overhead  Throughput  Turnaround time  Response time
First In First Out            Low           Low         High             High
Shortest Job First            Medium        High        Medium           Medium
Priority-based scheduling     Medium        Low         High             High
Round-robin scheduling        High          Medium      Medium           High
Multilevel queue scheduling   High          High        Medium           Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann architecture machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating multiplication It can also perform saturation rounding and setting the result to 0 Fixed point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and starting the code that handles the interrupt. In a preemptive kernel it is equal to the sum of the interrupt latency and the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or the highest-priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register
CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16 bits. However, ARM is faster when the memory is organized in 32 bits
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, etc
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter and a stack. All other data, which includes global, static, initialized, uninitialized and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared data problem
Shared data problem: it is one that arises when an interrupt routine and task code (or two pieces of task code) share data and the task code uses the data in a way that is not atomic
Atomic section: a part of the program that cannot be interrupted
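The shared data problem can be made concrete with a minimal C sketch (names invented for illustration): a 32-bit tick count shared between an interrupt routine and task code, read 16 bits at a time as an 8/16-bit CPU would, with the interrupt deliberately fired between the two loads.

```c
#include <stdint.h>

static volatile uint16_t ticks_hi = 0x0000;  /* 32-bit count held in two halves */
static volatile uint16_t ticks_lo = 0xFFFF;

/* Interrupt routine: advances the count; 0x0000FFFF -> 0x00010000. */
void timer_isr(void) {
    if (++ticks_lo == 0)
        ++ticks_hi;
}

/* Task code reading the two halves non-atomically.  If the interrupt
 * strikes between the two loads, the task sees a value that never
 * existed on the counter. */
uint32_t read_with_isr_between(void) {
    uint16_t hi = ticks_hi;            /* reads 0x0000 ... */
    timer_isr();                       /* interrupt strikes mid-read */
    uint16_t lo = ticks_lo;            /* ... then reads 0x0000 */
    return ((uint32_t)hi << 16) | lo;  /* 0x00000000, off by 65536 */
}
```

Making the two loads an atomic section (e.g. by disabling interrupts around them) removes the bug.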
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
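The first rule can be illustrated with a hedged C sketch (function names are invented): a non-reentrant version that keeps its result in a shared static buffer, and a reentrant version that keeps all working data in caller-owned storage.

```c
#include <stdio.h>
#include <string.h>

/* NON-reentrant: the static buffer is shared state used non-atomically.
 * If the RTOS switches tasks between this call and the caller's use of
 * the returned pointer, another task's call overwrites the result. */
static char shared_buf[16];
char *format_reading_bad(int value) {
    snprintf(shared_buf, sizeof shared_buf, "val=%d", value);
    return shared_buf;
}

/* Reentrant: all working data lives on the calling task's stack or in
 * memory the caller owns, so simultaneous calls cannot collide. */
void format_reading(int value, char *out, size_t out_len) {
    snprintf(out, out_len, "val=%d", value);
}
```

Two tasks calling format_reading with their own buffers always get their own answers; two tasks calling format_reading_bad may not.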
States of RTOS
Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time
Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor became available. Any number of tasks can be in this state
Blocked: it means this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
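The three states and the transitions described in the scheduler rules below can be encoded as a small C sketch (names are invented for illustration):

```c
typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } task_state_t;

/* Encodes the usual rules: the scheduler moves tasks between ready and
 * running; a running task may block itself; an interrupt routine or
 * another task may unblock a blocked task (blocked -> ready, never
 * blocked -> running directly). */
int transition_allowed(task_state_t from, task_state_t to) {
    switch (from) {
    case TASK_RUNNING: return to == TASK_READY || to == TASK_BLOCKED;
    case TASK_READY:   return to == TASK_RUNNING;
    case TASK_BLOCKED: return to == TASK_READY;
    }
    return 0;
}
```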
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked for ever
Scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler
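The scheduler's core decision ("run the highest-priority unblocked task") can be sketched in C, reduced to a linear scan; a real RTOS uses ready queues, and the struct and names here are invented for the sketch:

```c
typedef struct {
    int priority;   /* higher number = more urgent (one common convention) */
    int blocked;    /* 1 = waiting for an event, 0 = ready to run */
} task_t;

/* Returns the index of the highest-priority task that is not blocked,
 * or -1 when every task is blocked (the RTOS then idles in its loop). */
int pick_next(const task_t tasks[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (!tasks[i].blocked &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = i;
    return best;
}
```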
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that the tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked?
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, a task is run until it goes to the blocked state before control goes to the other one
If a higher priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases, since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customers' needs
An application of SoC is mobile phone
DESIGN PROCESS
In the top-down view we start with the most abstract description of the system and conclude with concrete details. It consists of
Requirements: this is the customer's description of the system they require. Requirements may be functional or nonfunctional; the second category includes performance, cost, physical size and weight, power consumption, etc
Specifications: these serve as the contract between the customer and the architects, accurately reflecting the customer's requirements. In this stage we create a more detailed description of what we want and how the system is expected to behave
Architecture: basically it is a plan for the overall structure of the system that will be used later to design the components that make up the architecture. In this stage the aspect of how the system has to be built is addressed, and the details of the system internals and components begin to take shape
Components: here the design of components, including both software and hardware modules, takes place
System integration: this phase is difficult and challenging because it usually uncovers problems. In this phase the system is tested, and the bugs, if found, are addressed
CLASSIFICATION OF EMBEDDED SYSTEMS
Stand alone ESs
They work in stand-alone mode: they take inputs, process them and produce the desired output. ESs in automobiles and consumer electronic items are examples
Real time systems
Embedded systems in which specific work has to be done in a specific time period are called RT systems
Hard RTSs are systems which are required to adhere to deadlines strictly
Soft RTSs are systems in which non-adherence to deadlines doesn't lead to catastrophe
Networked information appliances
These are connected to a network, provided with network interfaces, and are accessible to LANs and the INTERNET. A web camera connected to the net is an example
Mobile devices
Mobile phones Personal Digital Assistants smart phones etc are examples for this category
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
It is a mechanism by which the CPU communicates with memory and devices
A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory and devices communicate. One of the major roles of a bus is to provide an interface to memory
HANDSHAKE
The basic building block of a bus protocol is the four-cycle handshake
It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive
It uses a pair of dedicated wires for the handshake: enq and ack. Extra wires are used for data transmission during the handshake
The four cycles are
o Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data
o When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point devices 1 and 2 can transmit or receive
o Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data
o After seeing that ack has been released, device 1 lowers its output
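The four cycles above can be modeled as a tiny C state machine on the two wires (a sketch with invented names; real buses implement this in logic, not software):

```c
typedef struct { int enq, ack, phase; } handshake_t;

/* Advance the four-cycle handshake by one step.  phase 0..3 correspond
 * to the four bullets above; after step 4 both wires are idle again. */
void hs_step(handshake_t *h) {
    switch (h->phase) {
    case 0: h->enq = 1; break;   /* device 1 raises enquiry            */
    case 1: h->ack = 1; break;   /* device 2 acknowledges; data moves
                                    while both wires are high          */
    case 2: h->ack = 0; break;   /* device 2 signals transfer complete */
    case 3: h->enq = 0; break;   /* device 1 releases enquiry          */
    }
    h->phase = (h->phase + 1) % 4;
}
```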
The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components
o RW is true when the bus is reading and false when the bus is writing
o Address is an a-bit bundle of signals that transmits the address for an access
o Data is an n-bit bundle of signals that can carry data to or from the CPU
o Data ready signals when the values on the data bundle are valid
Burst transfers: handshaking signals are also used for burst transfers
In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address
One extra line is added to the bus, called burst, which signals when a transaction is actually a burst
Releasing the burst signal tells the device that enough data has been transmitted
Disconnected transfers in these buses the request and response are separate
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory
A microprocessor system often has more than one bus, with high speed devices connected to a high performance bus and lower speed devices connected to a different bus. A small block of logic known as a bridge connects the buses to each other. Three reasons to do this are summarized below
o Higher speed buses may provide wider data connections
o A high speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus
o A bridge may allow the buses to operate independently, thereby providing some parallelism in IO operations
ARM Bus
o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors
o The AMBA bus supports CPUs, memories and peripherals integrated in a system on silicon (SoS)
o The AMBA specification includes two buses: the AMBA high-performance bus (AHB) and the AMBA peripherals bus (APB)
o AHB: it is optimized for high speed transfers and is directly connected to the CPU
o It supports several high performance features: pipelining, burst transfers, split transactions and multiple bus masters
o APB: it is simple and easy to implement, and consumes relatively little power
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller
o It does not perform pipelined operations, which simplifies the bus logic
SHARC Bus
o It contains both program and data memory on-chip
o There are two external interfaces of interest: the external memory interface and the host interface
o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data
o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access
o Different units in the processor have different amounts of access to external memory: the DM bus and IO processor can access the entire external address space, while the PM address bus can access only 12 megawords
o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory
o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting
o The SHARC includes an on-board DMA controller as part of the IO processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional
o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers also
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs
o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled
o R/W' controls whether the current operation is a read (R/W'=1) from RAM or a write (R/W'=0) to RAM
o Adrs specifies the address for the read or write
o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
DRAMs inputs and refresh
o They have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; the signals are needed because address lines are provided for only half the address
o DRAMs must be refreshed because they store values as charge, which can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS before RAS refresh
It is a special quick refresh mode
This mode is initiated by setting CAS' to 0 first, then RAS' to 0
It causes the current memory row to get refreshed and the corresponding counter to be updated
Memory controller: a logic circuit that is external to the DRAM and performs CAS-before-RAS refresh of the entire memory at the required rate
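The required refresh rate is simple arithmetic: every row must be refreshed once per retention period. The figures below (64 ms retention, 4096 rows) are assumed typical values for illustration, not taken from the text.

```c
/* How often the memory controller must issue a CAS-before-RAS refresh:
 * one row per (retention period / number of rows).  Integer division
 * rounds down, which errs on the safe (more frequent) side. */
unsigned refresh_interval_us(unsigned retention_ms, unsigned rows) {
    return retention_ms * 1000u / rows;   /* e.g. 64 ms / 4096 rows */
}
```

With the assumed figures, the controller refreshes one row roughly every 15 microseconds.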
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In page mode access, one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode
Synchronous DRAMs
o It is developed to improve the performance of the DRAMs by introducing a clock
o Changes to input and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor
RAMBUS
o It is high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1Gbytessec
ROMs are
o programmed with fixed data
o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory (or mask)programmed ROMs and field programmed ROMs
o Factory programming is useful only when ROMs are installed in some quantity. Field programmable ROMs are programmed in the laboratory using ROM burners
o Field programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While using the floating-gate principle, it is designed such that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming, allowing it to be programmed in a typical system
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
IO devices
Timers and counters
AD and DA converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM due to two reasons:
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An IO device typically requires a much smaller range of addresses than a memory so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus: used in microcontroller-based systems
CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle a large number of devices
Echelon LON network: used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer takes place between the start and stop conditions
It is a well known bus to link microcontrollers in a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended bus (10-bit)
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it is comprised of a series of one-byte transmissions: an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1 to 0 transition on SDL, and stop is signaled by leaving SCL high and sending a 0 to 1 transition on SDL
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
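The address byte described above (7-bit address plus direction bit) is easy to construct; a minimal C sketch, with the function names invented for illustration:

```c
#include <stdint.h>

/* First byte of an I2C transaction: the 7-bit device address in the
 * upper bits, followed by the data-direction bit (0 = master writes
 * to slave, 1 = master reads from slave). */
uint8_t i2c_addr_byte(uint8_t addr7, int read) {
    return (uint8_t)((addr7 & 0x7F) << 1) | (read ? 1 : 0);
}

/* Address 0000000 is the general call (bus broadcast) address. */
int is_general_call(uint8_t addr_byte) {
    return (addr_byte >> 1) == 0;
}
```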
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but now it finds use in other applications also
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology a logical lsquo1rsquo on the bus is called recessive and a logical lsquo0rsquo on the bus is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame: the data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between
The data field is from 0 to 8 bytes, depending on the value given in the control field
A cyclic redundancy check (CRC)is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field
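The frame fields described above can be summarized field by field in a C struct. This is only a sketch of the logical layout (names invented); bit stuffing and the exact on-wire encoding are omitted.

```c
#include <stdint.h>

/* Logical view of a standard CAN data frame (11-bit identifier). */
typedef struct {
    uint16_t identifier;  /* 11-bit arbitration field; lower value wins  */
    uint8_t  rtr;         /* remote transmission request bit             */
    uint8_t  dlc;         /* 4-bit data length code                      */
    uint8_t  data[8];     /* data field, 0..8 bytes used                 */
    uint16_t crc;         /* 15-bit cyclic redundancy check              */
    uint8_t  ack;         /* sender sends recessive 1; receiver may
                             force dominant 0 on error                   */
} can_frame_t;

/* A standard identifier must fit in 11 bits. */
int can_id_valid(uint16_t id) { return id <= 0x7FF; }
```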
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
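The arbitration mechanism can be simulated in a few lines of C: each bit time the bus level is the wired-AND of what every still-active node sends (0 dominant, 1 recessive), and a node that sends recessive but hears dominant drops out. This is an illustrative sketch, not driver code.

```c
/* Simulate CSMA/AMP arbitration among n contenders (n <= 32).
 * Returns the index of the winning node: the lowest identifier,
 * i.e. the highest priority.  MSB of the identifier is sent first. */
int arbitrate(const unsigned ids[], int n, int id_bits) {
    int active[32];
    for (int i = 0; i < n; i++) active[i] = 1;
    for (int b = id_bits - 1; b >= 0; b--) {
        unsigned bus = 1;                        /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i]) bus &= (ids[i] >> b) & 1;   /* wired-AND of senders */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> b) & 1) != bus)
                active[i] = 0;                   /* sent recessive, heard dominant */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;
    return -1;
}
```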
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, like transmitting data to a host PC or interacting with another embedded system for sharing data, etc
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)
Data terminal equipment(DTE) can be a PC serial printer or a plotter and data circuit terminating equipment (DCE) can be a modem mouse digitizer or a scanner
Communication between the two devices is in full duplex ie the data transfer can take place in both the directions
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7 or 8 bits
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended
Parity bit It is added for error checking on the receiver side
Flow control It is useful when the sender pushes out data at such a high rate which can not be absorbed by the receiver It is a protocol to stop and resume data transmissions It is also known as handshake It can be a hardware type or software type
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin
For transmission of 1s and 0s, the voltage levels are defined in the standard. Voltage levels are different for data and control signals
The voltage level is with respect to local ground, hence RS232 uses unbalanced transmission
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section
Receive section: receives data, converts it from serial to parallel form and gives the data to the processor
Transmit section: takes data from the processor and converts it from parallel to serial format
It also adds the start, stop and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5 V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
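The framing performed by the transmit section can be sketched in C. The helper below builds the 11-bit pattern a UART would shift out for one byte with 8 data bits, even parity and one stop bit; the function name and the packing of the frame into an integer are illustrative:

```c
#include <stdint.h>

/* Build one asynchronous-serial frame for a data byte:
   start bit (0), 8 data bits LSB first, even parity bit, stop bit (1).
   Bit 0 of the returned value is the first bit on the wire; the frame
   is 11 bits long, matching what a transmit section shifts out. */
uint16_t uart_frame_8e1(uint8_t data)
{
    uint16_t frame = 0;
    int ones = 0;

    /* start bit is 0, so bit 0 of frame stays clear */
    for (int i = 0; i < 8; i++) {            /* data bits, LSB first */
        int bit = (data >> i) & 1;
        ones += bit;
        frame |= (uint16_t)bit << (1 + i);
    }
    frame |= (uint16_t)(ones & 1) << 9;      /* even parity bit */
    frame |= (uint16_t)1 << 10;              /* stop bit is 1 */
    return frame;
}
```

The receive section does the reverse: it strips the start, parity and stop bits and reassembles the data bits into a parallel byte for the processor.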
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network
RS485 controller chip is used on each device
The network with the RS485 protocol uses a master-slave configuration
With one twisted pair, half-duplex communication is possible; with two twisted pairs, full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy chaining: this method is adopted to establish the priority of devices that raise interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all the devices that can request an interrupt. The device with the highest priority is placed in the first
position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt-request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input of the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt-acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device that sent the request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledge signal on to the next device. The procedure is defined as below
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and it is this device that places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt-acknowledge signal from the CPU first; the farther a device is from the first position, the lower its priority
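The PI/PO propagation rule above can be sketched as a small C routine. The array representation is purely illustrative; real hardware does this with gates:

```c
/* Walk the daisy chain: the acknowledge enters device 0's PI input and
   is passed along each PO output until a requesting device blocks it.
   Returns the index of the device that captures the acknowledge
   (the highest-priority requester), or -1 if no device is requesting. */
int daisy_chain_grant(const int requesting[], int n)
{
    int pi = 1;                      /* acknowledge asserted by the CPU */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;                /* PI=1 and request pending: PO forced to 0 */
        /* not requesting: pass the acknowledge along unchanged (PO = PI) */
    }
    return -1;
}
```

Note how a requester early in the array shadows every requester after it, which is exactly the position-based priority described above.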
The slowest device participates in the control and data-transfer handshakes, so it determines the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshaking and five for bus management) plus eight ground-return lines
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols and the like
IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s; owing to this late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and the HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instruments via the HP-IB interface
SURVEY OF ARCHITECTURES
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 Real-time operating system (RTOS)
SELECTING AN ARCHITECTURE
The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
ROUND ROBIN
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple; no shared data; no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs a fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality: adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly
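A minimal sketch of the polling loop in C, with the device checks simulated by flags. In a real system each check would read a hardware status register, and a main() would call the pass function forever:

```c
/* Simulated device-ready flags; in a real system each check would
   read the device's status register. */
#define NDEV 3
int dev_ready[NDEV];
int dev_serviced[NDEV];     /* counts how often each device was serviced */

/* One pass around the round-robin loop: service, in fixed order, every
   device whose flag is set.  No priorities, no interrupts, no shared
   data.  Returns how many devices were serviced on this pass. */
int round_robin_pass(void)
{
    int n = 0;
    for (int i = 0; i < NDEV; i++)
        if (dev_ready[i]) {
            dev_serviced[i]++;   /* stand-in for the real service code */
            dev_ready[i] = 0;
            n++;
        }
    return n;
}
```

The worst-case delay for any one device is one full traversal of the loop with every other device needing service, which is exactly the fragility listed above.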
ROUND ROBIN WITH INTERRUPTS
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements are better met
Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Adjustments:
o Change the order in which the flags are checked (e.g. ABABAD)
  Improves the response of A
  Increases the latency of the other tasks
o Move some task code into the interrupt
  Decreases the response time of lower-priority interrupts
  May not be able to ensure lower-priority interrupt code executes fast enough
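The interrupt-sets-flag, main-loop-follows-up structure can be sketched in C. The ADC device, its sample value and the function names are made up for illustration; on real hardware the ISR would be wired to an interrupt vector and would read a device register:

```c
#include <stdbool.h>

/* Flags shared between interrupt and task code; volatile because the
   ISR can change them at any moment. */
volatile bool adc_data_ready = false;
volatile int adc_sample;
int processed_sample;

/* Interrupt routine: do only the urgent work (capture the sample),
   set a flag, and return quickly. */
void adc_isr(void)
{
    adc_sample = 42;            /* stand-in for reading the ADC register */
    adc_data_ready = true;
}

/* One iteration of the main loop: lower-priority follow-up processing.
   Returns true if there was work to do on this pass. */
bool main_loop_pass(void)
{
    if (adc_data_ready) {
        adc_data_ready = false;
        processed_sample = adc_sample * 2;   /* the non-urgent part */
        return true;
    }
    return false;
}
```

Because the follow-up work runs in the loop rather than in the ISR, the hardware deadline is met by the ISR while the task code keeps its single shared priority.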
FUNCTION QUEUE SCHEDULING ARCHITECTURE
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute the functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve the best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
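A minimal sketch of the queue in C, with a hypothetical example task. FIFO order is used here for simplicity; as noted above, the main routine is free to pick entries by priority instead. A real version would also protect the queue updates from interrupt corruption, e.g. by briefly disabling interrupts:

```c
#include <stddef.h>

#define QLEN 8
typedef void (*task_fn)(void);

static task_fn queue[QLEN];
static size_t q_count = 0;

/* Called from interrupt routines: enqueue follow-up work.
   Returns 0 on success, -1 if the queue is full. */
int enqueue_task(task_fn f)
{
    if (q_count == QLEN) return -1;
    queue[q_count++] = f;
    return 0;
}

/* Main routine: take the next function pointer and call it.
   Returns 1 if a task ran, 0 if the queue was empty. */
int run_one_task(void)
{
    if (q_count == 0) return 0;
    task_fn f = queue[0];
    for (size_t i = 1; i < q_count; i++)   /* shift the queue down */
        queue[i - 1] = queue[i];
    q_count--;
    f();
    return 1;
}

/* Example task an ISR might queue (hypothetical). */
int log_count = 0;
void log_task(void) { log_count++; }
```

Swapping the FIFO pick for a highest-priority pick is what gives this architecture its better response time for the most urgent task.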
REAL-TIME OPERATING SYSTEM
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences from the previous architectures:
  We don't write the signaling flags (the RTOS takes care of it)
  No loop in our code decides what is executed next (the RTOS does this)
  The RTOS knows the relative task priorities and controls what is executed next
  The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSs generally come with vendor tools
Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example: If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
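The allocation arithmetic above can be captured in two small helpers (the names are illustrative):

```c
/* Number of quanta a job needs: a 250 ms job with a 100 ms quantum
   needs 3 allocations (100 + 100 + 50 ms). */
int quanta_needed(int job_ms, int quantum_ms)
{
    return (job_ms + quantum_ms - 1) / quantum_ms;   /* ceiling division */
}

/* CPU time used in the final allocation, where the job may
   self-terminate before its quantum expires. */
int last_slice_ms(int job_ms, int quantum_ms)
{
    int r = job_ms % quantum_ms;
    return r ? r : quantum_ms;
}
```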
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in its queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
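The work-conserving rotation described for packet scheduling can be sketched in C over simulated per-flow backlogs; a real router would dequeue actual packets rather than decrement counters:

```c
#define NFLOWS 3
int backlog[NFLOWS];        /* packets waiting in each flow's queue */
static int next_flow = 0;   /* where the rotation resumes */

/* Pick the next flow to transmit from, rotating over the flows and
   skipping empty queues (work-conserving).  Returns the flow index
   that gets to send one packet, or -1 if every queue is empty. */
int rr_next_flow(void)
{
    for (int i = 0; i < NFLOWS; i++) {
        int f = (next_flow + i) % NFLOWS;
        if (backlog[f] > 0) {
            backlog[f]--;                     /* transmit one packet */
            next_flow = (f + 1) % NFLOWS;     /* resume after this flow */
            return f;
        }
    }
    return -1;   /* link idle: nothing to send */
}
```

Because empty queues are skipped rather than waited on, the link is never left idle while any flow has packets, which is the work-conserving property described above.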
Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can also be created using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - number of processes that complete their execution per time unit
Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency) thus a scheduler will implement a suitable compromise Preference is given to any one of the above mentioned concerns depending upon the users needs and objectives
In real-time environments such as embedded systems for automatic control in industry (for example robotics) the scheduler also must ensure that processes can meet deadlines this is
crucial for keeping the system stable Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. how many processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with the assignment of OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
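The effect of long processes hogging the CPU can be checked with a small helper that computes the average waiting time for jobs that all arrive at time 0. The classic illustration uses bursts of 24, 3 and 3 ms: served in arrival order the average wait is 17 ms, while serving the short jobs first drops it to 3 ms (the burst values are illustrative):

```c
/* Average waiting time under first-come first-served, for jobs that all
   arrive at time 0: each job waits for the total burst time of every
   job ahead of it.  Result is in the same units as the bursts. */
double fcfs_avg_wait(const int burst[], int n)
{
    long elapsed = 0, total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += elapsed;   /* job i waits for everything before it */
        elapsed += burst[i];
    }
    return (double)total_wait / n;
}
```

The same function applied to the bursts in shortest-first order gives SJF's average wait, which is why SJF appears with higher throughput in the overview table later in this section.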
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
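The scheduler's core decision under fixed-priority preemptive scheduling can be sketched as picking the highest-priority ready process. Here higher numbers mean higher priority, which is just a convention chosen for the sketch (some systems rank the other way):

```c
#define NPROC 4
int priority[NPROC];   /* fixed rank assigned by the OS; higher = more urgent */
int ready[NPROC];      /* nonzero if the process is in the ready queue */

/* Choose which process runs next: the highest-priority ready one.
   An arriving higher-priority process preempts simply by changing the
   outcome of this choice.  Returns -1 if nothing is ready. */
int fpps_pick(void)
{
    int best = -1;
    for (int i = 0; i < NPROC; i++)
        if (ready[i] && (best < 0 || priority[i] > priority[best]))
            best = i;
    return best;
}
```

Starvation is visible directly in this sketch: a low-priority process is simply never returned while any higher-priority process stays ready.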
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time: waiting time is dependent on the number of processes, not on the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs
Overview
Scheduling algorithm         CPU overhead  Throughput  Turnaround time  Response time
First In First Out           Low           Low         High             High
Shortest Job First           Medium        High        Medium           Medium
Priority-based scheduling    Medium        Low         High             High
Round-robin scheduling       High          Medium      Medium           High
Multilevel queue scheduling  High          High        Medium           Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERS/LOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
Major portion of software debugging is done by compiling and executing the code on PC or work station
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the program counter reaches that address, control is returned to the monitor program, from which the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each
Hardware/software co-verification: it allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs are of Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r0,[r8]        ; a comment
Label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
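The byte-addressing and endianness rules above can be illustrated with a short C sketch. The function names here are made up for illustration; the point is only how a 32-bit word maps to byte addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the lowest-order byte of a word at the
   lowest byte address (little-endian mode), 0 if big-endian. */
int is_little_endian(void) {
    uint32_t word = 0x11223344u;
    uint8_t first_byte;
    memcpy(&first_byte, &word, 1);   /* the byte at the lowest address */
    return first_byte == 0x44;
}

/* Extracts byte n (0 = lowest-order byte) of a 32-bit word,
   independent of the host's byte order. */
uint8_t word_byte(uint32_t word, int n) {
    return (uint8_t)(word >> (8 * n));
}
```

Note that word k of the address space starts at byte address 4*k, which is why the words sit at 0, 4, 8, and so on.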
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
        R1=DM(M0,I0), R2=PM(M8,I8);    ! a comment
Label:  R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
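Saturation arithmetic as described above can be sketched in C. This is a minimal illustration of the clamping behavior, not the SHARC's actual ALU implementation.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit saturating addition: on overflow the result clamps to the
   maximum (or minimum) representable value instead of wrapping around
   the numeric range. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;  /* compute without overflow */
    if (wide > INT32_MAX) return INT32_MAX;  /* clamp positive overflow */
    if (wide < INT32_MIN) return INT32_MIN;  /* clamp negative overflow */
    return (int32_t)wide;
}
```

With wraparound arithmetic, INT32_MAX + 1 would become a large negative number; saturation keeps it pinned at INT32_MAX, which is usually the less harmful error in signal processing.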
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
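The difference between logical and arithmetic right shifts can be shown in C. The arithmetic version is written portably, since right-shifting a negative signed integer is implementation-defined in older C standards; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: the vacated high bits fill with zeroes. */
uint32_t shift_right_logical(uint32_t x, int n) {
    return x >> n;                 /* unsigned shift is logical in C */
}

/* Arithmetic right shift: the vacated high bits copy the sign bit. */
int32_t shift_right_arithmetic(int32_t x, int n) {
    if (x >= 0)
        return x >> n;
    /* complement, shift in zeroes, complement back: net effect is
       shifting in ones, i.e. propagating the sign bit */
    return (int32_t)~(~(uint32_t)x >> n);
}
```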
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
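The post-modify-with-update and circular-buffer modes above can be sketched in C. The struct and field names here are hypothetical; the sketch models only the index arithmetic a DAG performs, not the real register set.

```c
#include <assert.h>

/* A minimal model of a DAG performing post-modify-with-update access
   over a circular buffer: the I register supplies the address, then is
   advanced by the modifier M and wrapped at the buffer boundary. */
typedef struct {
    int base;    /* start index of the circular buffer */
    int length;  /* buffer length in words             */
    int i;       /* current index (the "I register")   */
} dag_t;

/* Fetch buf[i], then post-modify i by m with circular wraparound. */
int dag_fetch(dag_t *dag, const int *buf, int m) {
    int value = buf[dag->i];
    dag->i += m;                       /* post-modify with update */
    if (dag->i >= dag->base + dag->length)
        dag->i -= dag->length;         /* wrap past the end       */
    else if (dag->i < dag->base)
        dag->i += dag->length;         /* wrap before the start   */
    return value;
}
```

Sweeping a filter over a sample window in signal processing uses exactly this pattern, which is why the hardware supports it directly.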
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management
Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and starting the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU registers context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest priority task (in a preemptive kernel). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a high priority task is ready, plus the time to restore the CPU context of the highest priority task, plus the time to execute the return-from-interrupt instruction
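The three definitions above are simple sums, so a worked example helps. All cycle counts below are assumed values chosen only for illustration; the formulas follow the preemptive-kernel definitions.

```c
#include <assert.h>

/* Interrupt latency = max interrupt-disabled time + time to start the
   first ISR instruction. */
int interrupt_latency(int max_disabled, int start_first_instr) {
    return max_disabled + start_first_instr;
}

/* Preemptive kernel: response time = latency + time to save context. */
int interrupt_response(int latency, int save_context) {
    return latency + save_context;
}

/* Preemptive kernel: recovery time = check-ready time + restore-context
   time + return-from-interrupt time. */
int interrupt_recovery(int check_ready, int restore_context, int ret_instr) {
    return check_ready + restore_context + ret_instr;
}
```

For example, with interrupts disabled at most 50 cycles, 10 cycles to reach the ISR, and 30 cycles to save context, the response time is 50 + 10 + 30 = 90 cycles.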
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
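The function calls listed above can be sketched with a minimal counting-semaphore model in C. This is a single-threaded illustration that only tracks the count (no real blocking or waiting lists); the names are hypothetical and do not belong to any particular RTOS.

```c
#include <assert.h>

/* A counting semaphore reduced to its integer count. */
typedef struct { int count; } sem_demo_t;

/* "Create a semaphore" with an initial count (1 = binary semaphore). */
void sem_create(sem_demo_t *s, int initial) { s->count = initial; }

/* "Acquire a semaphore": returns 1 on success, 0 if the calling task
   would have to block because the count is already 0. */
int sem_acquire(sem_demo_t *s) {
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* "Release a semaphore": makes the resource available again. */
void sem_release(sem_demo_t *s) { s->count++; }
```

In a real RTOS, a failed acquire would move the task into the blocked state, and a release would unblock the highest priority waiting task.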
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest priority task or the first task waiting in the queue can take the message
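The deposit/take behavior described above can be sketched as a fixed-length queue in C. This single-threaded model shows only the FIFO storage; a real RTOS queue would also manage the two waiting lists, and all names here are illustrative.

```c
#include <assert.h>

#define QLEN 4                    /* queue length fixed at creation */

/* A message queue modeled as a ring buffer of integer messages. */
typedef struct {
    int buf[QLEN];
    int head, tail, count;
} msgq_t;

void mq_init(msgq_t *q) { q->head = q->tail = q->count = 0; }

/* A task or ISR deposits a message; returns 0 if the queue is full. */
int mq_send(msgq_t *q, int msg) {
    if (q->count == QLEN) return 0;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

/* A task takes the oldest message; returns 0 if the queue is empty. */
int mq_receive(msgq_t *q, int *msg) {
    if (q->count == 0) return 0;
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```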
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
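The create/write/read calls listed above can be demonstrated with the POSIX pipe() API, used here as a stand-in for an RTOS's pipe calls (the RTOS names differ; the flow is the same: one side writes, the other reads the bytes in order).

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Creates a pipe, writes a message into it, reads it back, and closes
   both ends. Returns the number of bytes received, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen) {
    int fds[2];                        /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0) return -1;     /* "Create a pipe" */
    ssize_t w = write(fds[1], msg, strlen(msg));  /* "Write to the pipe" */
    if (w < 0) { close(fds[0]); close(fds[1]); return -1; }
    ssize_t n = read(fds[0], out, outlen);        /* "Read from the pipe" */
    close(fds[0]);                     /* "Close a pipe" */
    close(fds[1]);
    return n;
}
```

In an RTOS the two ends would typically be held by two different tasks, with the reader blocking until data arrives.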
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve the performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all the ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced. Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general purpose registers in the user mode. They are
R15 ----------- it is the program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14 ----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags control normal and fast interrupts, respectively (each type is disabled while its flag is set). The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions: single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit memory. However, ARM is faster when the memory is organized as 32-bit memory
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOS a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared data problem
Shared data problem: it is one that arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
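The shared data problem can be sketched in C with the classic two-part-value example: an interrupt routine updates hours and minutes, and if it fires between the task's two reads, the task sees an inconsistent pair. This simulation models "disabling interrupts" simply by not calling the ISR inside the guarded region; all names are illustrative.

```c
#include <assert.h>

/* Shared data: a two-part value updated by the "ISR". */
static int hours = 1, minutes = 59;

/* Simulated timer interrupt: advances the time by one minute. */
void tick_isr(void) {
    minutes++;
    if (minutes == 60) { minutes = 0; hours++; }
}

/* Atomic snapshot of the shared pair. On real hardware the two reads
   would be bracketed by disabling and re-enabling interrupts, so the
   ISR cannot run between them. */
void read_time_atomic(int *h, int *m) {
    /* DISABLE_INTERRUPTS();  -- on real hardware */
    *h = hours;
    *m = minutes;
    /* ENABLE_INTERRUPTS(); */
}
```

Without the guard, an interrupt landing between the two reads at 1:59 could yield the pair (1, 0) — a time that never existed.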
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
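The first rule above can be shown with a small C contrast: one function keeps its state in a shared static variable (non-reentrant), the other keeps everything in parameters and locals on the caller's stack (reentrant). The function names are illustrative.

```c
#include <assert.h>

/* NOT reentrant: state lives in a static variable shared by all
   callers, used in a non-atomic read-modify-write. If the RTOS
   switched tasks mid-update, two tasks would corrupt each other. */
static int accumulated = 0;
int add_nonreentrant(int x) {
    accumulated += x;          /* shared static, non-atomic use */
    return accumulated;
}

/* Reentrant: uses only its parameters and a local variable, both on
   the calling task's stack, so any number of tasks can call it
   concurrently with correct results. */
int add_reentrant(int total, int x) {
    int result = total + x;    /* local: private to this call */
    return result;
}
```

The reentrant version also satisfies the other two rules trivially: it calls no other functions and touches no hardware.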
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task. Therefore, in a single processor system, only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked It means this task has not got anything to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves. A task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state. When a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state. Otherwise, the task will be blocked forever
The scheduler controls the running state. Scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before going on to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
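The scheduler's core decision described above (run the highest priority ready task, or idle if all are blocked) can be sketched in C. This model assumes lower numbers mean higher priority, which is a convention of this sketch only; RTOSs differ on this.

```c
#include <assert.h>

enum { BLOCKED = 0, READY = 1 };

typedef struct {
    int state;      /* BLOCKED or READY (RUNNING is chosen below) */
    int priority;   /* lower number = higher priority (sketch only) */
} task_t;

/* Returns the index of the task that should run, or -1 if every task
   is blocked (the scheduler would then spin in its idle loop). */
int pick_task(const task_t *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (tasks[i].state != READY) continue;
        if (best < 0 || tasks[i].priority < tasks[best].priority)
            best = i;
    }
    return best;
}
```

In a preemptive RTOS this decision is re-evaluated whenever a task unblocks, so a newly ready high priority task immediately displaces the running one; a non-preemptive RTOS re-evaluates only when the running task blocks.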
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce, and it is the first model proposed for the software development process
This model has five major phases
o Requirements Analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the design and integrates the pieces
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field, bug fixes, and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware and software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to sub-optimal results
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
Mobile devices
Mobile phones, personal digital assistants, smart phones, etc. are examples of this category
UNIT-II EMBEDDED COMPUTING PLATFORM
CPU BUS
It is a mechanism by which the CPU communicates with memory and devices
A bus is, at a minimum, a collection of wires, but it also defines a protocol by which the CPU, memory, and devices communicate. One of the major roles of a bus is to provide an interface to memory
HANDSHAKE
The basic building block of most bus protocols is the four-cycle handshake
It ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive
It uses a pair of dedicated wires for the handshake: enq (enquiry) and ack (acknowledge). Extra wires are used for data transmission during the handshake
The four cycles are
o Device 1 raises its output to signal an enquiry which tells device 2 that it should get
ready to listen for data
o When device 2 is ready to receive it raises its output to signal an acknowledgement At
this point devices 1 and 2 can transmit or receive
o Once the data transfer is complete device 2 lowers its output signaling that it has
received the data
o After seeing that ack has been released device 1 lowers its output
The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices or memory cannot initiate a transfer
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components
o R/W' is true when the bus is reading and false when the bus is writing
o Address is an a-bit bundle of signals that transmits the address for an access
o Data is an n-bit bundle of signals that can carry data to or from the CPU
o Data ready signals when the values on the data bundle are valid
Burst transfer handshaking signals are also used for burst transfers
In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address
One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst
Releasing the burst signal actually tells the device that enough data has been transmitted
Disconnected transfers: in these buses, the request and response are separate
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory
A microprocessor system often has more than one bus, with high speed devices connected to a high performance bus and lower speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below
o Higher speed buses may provide wider data connections
o A high speed bus usually requires more expensive circuits and connectors The
cost of low-speed devices can be held down by using a lower-speed lower-cost bus
o Bridge may allow the buses to operate independently thereby providing some
parallelism in IO operations
ARM Bus
o The buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors
o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)
o The AMBA specification includes two buses: the AMBA high-performance bus (AHB) and the AMBA peripherals bus (APB)
o AHB: it is optimized for high speed transfers and is directly connected to the CPU
o It supports several high performance features: pipelining, burst transfers, split transactions, and multiple bus masters
o APB: it is simple and easy to implement, and it consumes relatively little power
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller
o It does not perform pipelined operations which simplifies the bus logic
SHARC Bus
o It contains both program and data memory on-chip
o There are two external interfaces of interest: the external memory interface and the host interface
o The external memory interface allows the SHARC to address up to four gigawords of
external memory which can hold either instructions or data
o The external data bus can vary in width from 16 to 48 bits depending upon the type of
memory access
o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords
o The external memory is divided into four banks of equal size The memory above the
banks is known as unbanked external memory
o Host interface is used to connect the SHARC to standard microprocessor bus The host
interface implements a typical handshake for bus granting
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional
o Each DMA channel has its own interrupt and the DMA controller supports chained
transfers also
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM)
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs
o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled
o R/W' controls whether the current operation is a read from RAM (R/W'=1) or a write to RAM (R/W'=0)
o Adrs specifies the address for the read or write
o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
DRAMs inputs and refresh
o They have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). These are designed to minimize the number of required pins; they are needed because address lines are provided for only half the address
o DRAMs must be refreshed because they store values that can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS before RAS refresh
it is a special quick refresh mode
This mode is initiated by setting CASrsquo to 0 first then RASrsquo to 0
It causes the current memory row get refreshed and the corresponding counter updated
Memory controller is a logic circuit that is external to DRAM and performs CAS before RAS refresh to the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In page mode access, one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode
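The benefit of page mode can be seen with a simple cycle-count comparison. The timing constants below are assumptions chosen for illustration, not figures for any particular DRAM:

```c
/* Illustrative cycle counts (hypothetical timing units) contrasting
 * one-word-at-a-time DRAM access with page-mode access to several
 * words in the same row. */
#define T_RAS 3  /* assumed cycles to strobe a row address */
#define T_CAS 2  /* assumed cycles to strobe a column address */

/* Conventional access: a row and a column strobe for every word. */
int dram_cycles_normal(int nwords)
{
    return nwords * (T_RAS + T_CAS);
}

/* Page mode: the row is opened once, then only columns are strobed. */
int dram_cycles_page_mode(int nwords)
{
    return T_RAS + nwords * T_CAS;
}
```

For a 4-word burst this model gives 20 cycles for conventional access versus 11 for page mode; the saving grows with burst length.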
Synchronous DRAMs
o They were developed to improve the performance of DRAMs by introducing a clock
o Changes to the inputs and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor
RAMBUS
o It offers high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems since a great deal of the code and perhaps
some data does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While using the floating gate principle, it is designed such that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming, allowing it to be programmed in a typical system
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
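The read-update-erase-write sequence behind that drawback can be sketched in C. The block size and the RAM buffer are assumptions for illustration; a real flash driver would issue device-specific erase and program commands:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of why a single-word flash write is slow: the whole block
 * containing the word must be read out, erased, updated, and written
 * back. The block is modeled as an in-memory array. */
#define FLASH_BLOCK_WORDS 64

typedef struct {
    uint32_t words[FLASH_BLOCK_WORDS];
} flash_block_t;

static void flash_erase(flash_block_t *b)   /* erase sets all bits to 1 */
{
    memset(b->words, 0xFF, sizeof b->words);
}

/* Update one word: copy the block to RAM, erase, write back the change. */
void flash_write_word(flash_block_t *b, int index, uint32_t value)
{
    uint32_t buf[FLASH_BLOCK_WORDS];
    memcpy(buf, b->words, sizeof buf);   /* read the entire block */
    buf[index] = value;                  /* update the one word */
    flash_erase(b);                      /* erase the block */
    memcpy(b->words, buf, sizeof buf);   /* program it back */
}
```

An EEPROM, by contrast, could rewrite the single word directly.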
I/O devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM for two reasons
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices
Echelon LON network used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer takes place between the start and stop conditions
It is a well-known bus to link a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard bus (7-bit addressing) and up to 3.4 Mbits/sec for the extended bus (10-bit addressing)
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is opened by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus
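The framing described above can be sketched in C. The two bus lines are simulated with variables (a real port would toggle open-drain GPIOs), and the function and variable names are hypothetical:

```c
#include <stdint.h>

/* Illustrative helpers for I2C framing. Bus lines are simulated;
 * bus idle is both lines high. */
static int scl = 1, sdl = 1;
static int started = 0;

/* Address byte: 7-bit address followed by the direction bit,
 * 0 = master writes to slave, 1 = master reads from slave. */
uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read & 1));
}

void i2c_start(void)      /* SCL left high, SDL falls 1 -> 0 */
{
    scl = 1;
    sdl = 0;
    started = 1;
}

void i2c_stop(void)       /* SCL left high, SDL rises 0 -> 1 */
{
    scl = 1;
    sdl = 1;
    started = 0;
}
```

For example, a write to a device at address 0x50 would begin with `i2c_start()` followed by the byte `i2c_address_byte(0x50, 0)`.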
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but now it finds use in other applications also
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers, copiers, telescopes, production line control systems, and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
It uses bit-serial transmission; it can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus. Many details of CAN and the I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame
The data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames)
The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between
The data field is from 0 to 8 bytes, depending on the value given in the control field
A cyclic redundancy check (CRC) is sent after the data field for error detection
The acknowledge field is used to signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
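The arbitration rule above can be simulated in C: the bus is the wired-AND of what every active node drives, and a node that sends recessive (1) but hears dominant (0) drops out, so the lowest identifier wins. This is a behavioral sketch, not driver code:

```c
/* Simulation of CAN's CSMA/AMP arbitration over the 11-bit
 * identifier field. Assumes at most 8 competing nodes. */
#define ID_BITS 11

int can_arbitrate(const int ids[], int n)
{
    int active[8];
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = ID_BITS - 1; bit >= 0; bit--) {
        int bus = 1;                 /* recessive unless someone pulls low */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1) == 0)
                bus = 0;             /* dominant 0 wins the wired-AND */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1) == 1 && bus == 0)
                active[i] = 0;       /* heard dominant while sending recessive */
    }
    for (int i = 0; i < n; i++)
        if (active[i])
            return ids[i];           /* sole surviving transmitter */
    return -1;
}
```

With identifiers 0x65, 0x23 and 0x41 contending, 0x23 (the lowest, hence highest priority) survives arbitration.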
Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value
Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row
The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, like transmitting data to a host PC or interacting with another embedded system for sharing data
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)
A data terminal equipment (DTE) can be a PC, serial printer or plotter, and a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer or scanner
Communication between the two devices is in full duplex, ie the data transfer can take place in both directions simultaneously
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7 or 8 bits
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended
Parity bit: it is added for error checking on the receiver side
Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake, and can be of hardware type or software type
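The character framing built from these parameters can be sketched as a C function that lays out the bit sequence a transmitter would shift onto the line. Even parity and the function name are assumptions for illustration:

```c
#include <stdint.h>

/* Sketch of asynchronous character framing: a start bit, the data
 * bits (LSB first), an even-parity bit, and a stop bit. Writes the
 * bits into out[] and returns how many bits were produced. */
int uart_frame(uint8_t ch, int data_bits, int out[])
{
    int n = 0, ones = 0;
    out[n++] = 0;                    /* start bit */
    for (int i = 0; i < data_bits; i++) {
        int b = (ch >> i) & 1;       /* LSB is transmitted first */
        ones += b;
        out[n++] = b;
    }
    out[n++] = ones & 1;             /* even parity: total count of 1s even */
    out[n++] = 1;                    /* stop bit */
    return n;
}
```

Framing 'A' (0x41) with 8 data bits yields 11 bits on the line: start, eight data bits, parity, stop.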
RS232 connector configurations
The standard specifies two types of connectors: 9-pin and 25-pin
For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section
The receive section receives data, converts it from serial form into parallel form, and gives the data to the processor
The transmit section takes the data from the processor and converts the data from parallel format to serial format
It also adds the start, stop and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices, up to 512, in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half duplex communication can be achieved, and with two twisted pairs, full duplex
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections
Daisy Chaining: This method is adopted to establish the priority of devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired logic connection. If any device has its interrupt signal in the low level state, the interrupt line goes to the low level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as below
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther a device is from the first position, the lower is its priority
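The PI/PO rule can be captured in a few lines of C. This is a behavioral simulation of the chain, with devices ordered highest priority first; the function name is hypothetical:

```c
/* Simulation of daisy-chain acknowledge propagation: the CPU's
 * acknowledge enters device 0 as PI=1; a requesting device with PI=1
 * keeps the acknowledge (drives PO=0), while a non-requesting device
 * passes it along (PO = PI). Returns the index of the device that
 * wins the acknowledge, or -1 when no interrupt is pending. */
int daisy_chain_grant(const int requesting[], int n)
{
    int pi = 1;                   /* acknowledge into the first device */
    for (int i = 0; i < n; i++) {
        if (pi == 1 && requesting[i])
            return i;             /* PI=1 and requesting: PO=0, grant won */
        /* otherwise PO = PI, so the acknowledge ripples onward */
    }
    return -1;
}
```

Because the acknowledge enters at position 0, a requesting device closer to the CPU always beats a requesting device farther down the chain.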
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines---eight bidirectional lines used for data transfer, three for handshake, and five for bus management---plus eight ground return lines
In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols and the like
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
The Commodore PET/CBM range of personal computers connected their disk drives, printers, modems, etc by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc to their workstation products and HP 2100 and HP 3000 computers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture: The best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin Pros and Cons
Pros: simple, no shared data, no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
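The round robin main loop can be sketched in C. The device-poll and service functions here are illustrative stand-ins for real hardware accesses:

```c
/* Minimal round-robin loop: poll each device in a fixed order and
 * service whichever needs it. Devices are modeled with flags. */
#define NDEV 3

static int pending[NDEV];              /* 1 when device i needs service */
static int serviced[NDEV];             /* count of services per device */

static int device_needs_service(int i) { return pending[i]; }
static void service_device(int i)      { pending[i] = 0; serviced[i]++; }

/* One trip around the loop; a real system would call this forever. */
void round_robin_pass(void)
{
    for (int i = 0; i < NDEV; i++)
        if (device_needs_service(i))
            service_device(i);
}
```

In a real system this body sits inside `while (1) { ... }`, which is exactly why the worst-case delay is one full traversal of the loop.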
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service the hardware and (b) set flags
o The main routine checks the flags and does any lower priority follow-up processing
o Why? It gives more control over priorities
Pros:
o Still relatively simple
o Hardware timing requirements are better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other task execution times + execution times of any other interrupts that occur
o How could you fix this?
Adjustments
o Change the order in which the flags are checked (eg A, B, A, C, A, D)
Improves the response of A
Increases the latency of the other tasks
o Move some task code into the interrupt
Decreases the response time of that code
May not be able to ensure that lower priority interrupt code executes fast enough
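The ISR-sets-flag, main-loop-follows-up split can be sketched in C. Since there is no real hardware here, the "ISR" is an ordinary function called by hand, and the names are hypothetical:

```c
#include <stdint.h>

/* Round robin with interrupts: the ISR does the urgent work (capture
 * the sample) and sets a flag; the main loop does the follow-up at
 * task priority. volatile marks data shared with interrupt context. */
static volatile int adc_ready;
static volatile uint16_t adc_value;
static uint16_t processed;

void adc_isr(uint16_t sample)     /* urgent part: capture and flag */
{
    adc_value = sample;
    adc_ready = 1;
}

void main_loop_pass(void)         /* lower-priority follow-up work */
{
    if (adc_ready) {
        adc_ready = 0;
        processed = adc_value;    /* eg scale, log, or display it */
    }
}
```

On real hardware the flag and value form shared data between interrupt and task code, which is why this architecture introduces the shared-data problem that plain round robin avoids.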
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute the functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower priority code (no guarantee it will actually run)
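A minimal function queue can be sketched with a circular buffer of function pointers. FIFO dequeue is used here for simplicity, though, as noted above, the main routine could instead pick the highest-priority entry; all names are illustrative:

```c
/* Function-queue scheduling sketch: interrupts enqueue function
 * pointers, the main routine dequeues and calls them. */
#define QLEN 8

typedef void (*task_fn)(void);

static task_fn queue[QLEN];
static int head, tail;

int enqueue_task(task_fn f)            /* called from an ISR */
{
    int next = (tail + 1) % QLEN;
    if (next == head) return -1;       /* queue full */
    queue[tail] = f;
    tail = next;
    return 0;
}

int run_one_task(void)                 /* called from the main loop */
{
    if (head == tail) return 0;        /* nothing queued */
    task_fn f = queue[head];
    head = (head + 1) % QLEN;
    f();                               /* execute the queued call */
    return 1;
}

/* Sample task for demonstration. */
static int demo_runs;
static void demo_task(void) { demo_runs++; }
```

In a real system the enqueue would run with interrupts disabled (or use an atomic index) since the queue is shared between ISR and task code.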
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences with previous architectures:
We don't write signaling flags (the RTOS takes care of it)
No loop in our code decides what is executed next (the RTOS does this)
The RTOS knows the relative task priorities and controls what is executed next
The RTOS can suspend a task in the middle to execute code of higher priority
Now we can control task response AND interrupt response
Pros:
o Worst case response time for the highest priority function is zero
o The system's high priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSs generally come with vendor tools
Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms, but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
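The arithmetic behind this example generalizes to two small helper functions (names are illustrative): the number of allocations is the ceiling of total time over quantum, and the last slice is whatever remains:

```c
/* Round-robin quantum arithmetic for a single job. */

/* Number of time slots a job needs: ceiling(total / quantum). */
int allocations_needed(int total_ms, int quantum_ms)
{
    return (total_ms + quantum_ms - 1) / quantum_ms;
}

/* Length of the final (possibly partial) slice. */
int last_slice_ms(int total_ms, int quantum_ms)
{
    int rem = total_ms % quantum_ms;
    return rem ? rem : quantum_ms;
}
```

For the 250 ms job with a 100 ms quantum this gives three allocations, the last lasting 50 ms, matching the trace above.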
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
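A work-conserving round robin over per-flow queues can be sketched in C. The queues are modeled as simple backlog counters; the names and fixed flow count are assumptions for illustration:

```c
/* Work-conserving round-robin packet scheduler: each flow takes a
 * turn; empty flows are skipped so the link never idles while any
 * queue still holds a packet. */
#define NFLOWS 3

static int backlog[NFLOWS];        /* packets waiting per flow */
static int current;                /* next flow to be offered a turn */

/* Sends one packet; returns the flow served, or -1 if all are empty. */
int rr_send_one(void)
{
    for (int k = 0; k < NFLOWS; k++) {
        int f = (current + k) % NFLOWS;
        if (backlog[f] > 0) {
            backlog[f]--;
            current = (f + 1) % NFLOWS;   /* next turn: the next flow */
            return f;
        }
    }
    return -1;                            /* nothing queued anywhere */
}
```

The skip-if-empty step is exactly the work-conserving property: an idle flow forfeits its turn to the next backlogged flow.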
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept; such a scheduler can also be created using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - the number of processes that complete their execution per time unit
Latency, specifically
o Turnaround - the total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency) thus a scheduler will implement a suitable compromise Preference is given to any one of the above mentioned concerns depending upon the users needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether many or few processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory [Stallings 399].
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, which has a low priority, which is page-faulting frequently, or which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370].
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded [Stallings 394].
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155].
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or OFDMA multi-carriers or other frequency-domain equalization components are assigned to the users that can best utilize them.
First in first out
Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
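The FIFO behaviour above can be sketched with a toy simulation (the workload is made up for illustration): processes run to completion in arrival order, so one long early job inflates the waiting times of everything behind it.

```python
# FIFO/FCFS sketch: all processes arrive at t=0 and run in order.
def fcfs(bursts):
    """bursts: CPU times in arrival order.
    Returns per-process (waiting_time, completion_time)."""
    t, out = 0, []
    for b in bursts:
        out.append((t, t + b))   # waits until t, finishes at t+b
        t += b
    return out

# A 10-unit job ahead of two 1-unit jobs makes the short jobs wait long.
times = fcfs([10, 1, 1])
```

Note how the two 1-unit jobs each wait 10+ units - the "long processes can hog the CPU" point above.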
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
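The preemption described above can be sketched with a unit-time simulation (a toy workload with made-up arrival times and bursts):

```python
def srt(procs):
    """procs: list of (arrival, burst). Runs shortest-remaining-time
    scheduling one time unit at a time; returns completion time per process."""
    remaining = [b for _, b in procs]
    done = [None] * len(procs)
    t = 0
    while any(r > 0 for r in remaining):
        ready = [i for i, (a, _) in enumerate(procs)
                 if a <= t and remaining[i] > 0]
        if not ready:            # CPU idle until the next arrival
            t += 1
            continue
        i = min(ready, key=lambda i: remaining[i])  # least time left wins
        remaining[i] -= 1
        t += 1
        if remaining[i] == 0:
            done[i] = t
    return done

# A 2-unit job arriving at t=1 preempts the 5-unit job that started at t=0.
finish = srt([(0, 5), (1, 2)])
```

The long job's completion is pushed back by every shorter arrival - the starvation risk noted above when many small processes keep arriving.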
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
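The selection rule can be sketched in a few lines (task names and the lower-number-is-higher-priority convention are assumptions for illustration; some kernels use the opposite convention):

```python
# Fixed-priority selection sketch: among ready processes, the highest
# priority always gets the CPU; an arriving higher-priority process
# would preempt the running one.
def pick(ready):
    return min(ready, key=lambda p: p["prio"])  # lowest number = highest prio

running = pick([{"name": "logger", "prio": 5},
                {"name": "motor_ctl", "prio": 1}])
```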
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. It gives balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time: waiting time is dependent on the number of processes, not on the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
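The fixed time unit and cycling described above can be sketched as follows (a toy workload; all processes assumed to arrive at t=0):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Cycle through processes, giving each at most `quantum` units;
    returns the completion time of each process."""
    q = deque(enumerate(bursts))
    t, finish = 0, [None] * len(bursts)
    while q:
        i, rem = q.popleft()
        run = min(quantum, rem)
        t += run
        if rem - run > 0:
            q.append((i, rem - run))   # unfinished: back of the queue
        else:
            finish[i] = t
    return finish

finish = round_robin([3, 1, 2], quantum=1)
```

A smaller quantum means fairer interleaving but more trips around the loop, i.e. more context-switch overhead, as noted above.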
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
The major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmers interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architectures were developed to reduce the number of instructions in the compiled code, which in turn minimizes the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architectures are optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o They require a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line.
Example
        LDR r0,[r8]      ; a comment
Label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at address 0, word 1 is at address 4, word 2 at address 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
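The two byte orderings can be illustrated with a short sketch (Python here purely for illustration; on the ARM itself the mode is fixed at power-up):

```python
# The same 32-bit word laid out in memory under the two endianness modes.
word = 0x11223344

little = word.to_bytes(4, "little")  # lowest-order byte at the lowest address
big = word.to_bytes(4, "big")        # lowest-order byte at the highest address
```

In little-endian mode the byte at the word's base address is 0x44; in big-endian mode it is 0x11.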
SHARC PROCESSOR
It is a family of DSPs which use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line
Example
        R1=DM(M0,I0), R2=PM(M8,I8);    ! a comment
Label:  R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three mode registers most significant for data operations are the arithmetic status register (ASTAT), the sticky register (STKY), and mode register 1 (MODE1).
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow yields the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
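The clamp-versus-wrap difference can be sketched for signed 32-bit values (Python purely for illustration of the arithmetic, not SHARC code):

```python
# Saturating add for signed 32-bit values: on overflow the result clamps
# to the end of the range instead of wrapping around.
INT32_MAX, INT32_MIN = 2**31 - 1, -(2**31)

def sat_add32(a, b):
    s = a + b
    return max(INT32_MIN, min(INT32_MAX, s))

wrapped = (INT32_MAX + 1) - 2**32     # what plain two's-complement wrap gives
clamped = sat_add32(INT32_MAX, 1)     # saturating add stays at the maximum
```

Wrapping turns the most positive value into the most negative one - exactly the artifact saturation mode avoids in signal processing.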
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
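Two of these DAG modes can be sketched as plain index arithmetic (an illustrative model, not SHARC code; buffer length and bit widths are made up):

```python
# Post-modify circular addressing: the index steps by the modifier M and
# wraps inside a buffer of length L, as in a DAG-managed circular buffer.
def post_modify_circular(i, m, length):
    return (i + m) % length          # address wraps around the buffer

# Bit-reversed addressing: reverse the low-order address bits, the access
# pattern the fast Fourier transform needs.
def bit_reverse(i, bits):
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out

nxt = post_modify_circular(6, 3, 8)  # steps past the end of an 8-entry buffer
rev = bit_reverse(0b001, 3)          # 001 -> 100
```

On the SHARC these updates happen in the DAG hardware for free alongside the data access, which is why DSP inner loops rely on them.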
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code or to the highest-priority task. In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.
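The definitions above are simple sums, which a small worked example makes concrete (all the cycle counts below are made-up numbers for illustration, not figures for any real CPU):

```python
# Hypothetical cycle budget for one interrupt.
latency = 12        # worst-case cycles with interrupts disabled (assumed)
save_ctx = 8        # cycles to save the CPU register context (assumed)
restore_ctx = 8     # cycles to restore the context (assumed)
ret_instr = 2       # cycles for the return-from-interrupt instruction (assumed)

# Preemptive-kernel response time: latency plus context save.
response = latency + save_ctx

# Non-preemptive recovery time: context restore plus the return instruction.
recovery = restore_ctx + ret_instr
```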
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: a counting semaphore, whose value can exceed one, and a binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
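The create/acquire/release calls above can be sketched for task synchronization using Python's threading primitives (illustrative only; an RTOS exposes its own equivalents of these calls):

```python
import threading

# Binary-semaphore sketch: one task blocks on acquire until another
# task releases the semaphore to signal that an event has happened.
sem = threading.Semaphore(0)     # "create" with initial value 0
log = []

def waiter():
    sem.acquire()                # task blocks here until signalled
    log.append("event handled")

t = threading.Thread(target=waiter)
t.start()
log.append("signalling")
sem.release()                    # "release" unblocks the waiting task
t.join()
```

Because the waiter cannot proceed until the release, the signalling side is guaranteed to log first.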
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits a message; based on the application, the highest-priority task or the first task waiting in the queue takes the message.
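The deposit-and-take pattern can be sketched with Python's standard queue (illustrative; the keyboard example and the queue length of 4 are assumptions):

```python
import queue

# Message-queue sketch: a producer (e.g. a keyboard ISR) deposits messages
# and a consumer task takes them in FIFO order; the queue length, fixed at
# creation, bounds how many undelivered messages can wait.
mq = queue.Queue(maxsize=4)

for key in ["H", "i"]:           # producer side deposits messages
    mq.put(key)

received = [mq.get(), mq.get()]  # consumer task drains the queue
```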
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
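The create/write/read/close calls above map directly onto an OS pipe, which a short sketch shows (the "sensor:42" payload is made up for illustration):

```python
import os

# Pipe sketch: one end carries a byte stream from writer to reader, the
# way one task's output feeds another task's input.
r, w = os.pipe()                 # "create a pipe": read and write ends
os.write(w, b"sensor:42")        # writing task sends its output
os.close(w)                      # writer done; reader will see end of stream
data = os.read(r, 64)            # reading task takes the bytes as input
os.close(r)
```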
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15----------- it is program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jumps are always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs a task is simply a subroutine.
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state.
Tasks and ISRs move tasks out of the blocked state: when a task is blocked it never gets the microprocessor, so an ISR or some other task in the system must be able to send a signal to bring the task out of the blocked state. Otherwise the task will be blocked forever.
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs time slicing between the tasks is done. In others one task is run until it goes to the blocked state before the processor goes to the other one.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can be any of a number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by ROYCE and it is the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals the design may take too long when design time is a major constraint
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
The fundamental bus operations are reading and writing. All transfers on the basic bus are controlled by the CPU: the CPU can read or write a device or memory, but devices and memory cannot initiate a transfer.
Major components of the typical bus structure that supports read and write are
o Clock provides synchronization to the bus components
o R/W' is true when the bus is reading and false when the bus is writing
o Address is an a-bit bundle of signals that transmits the address for an access
o Data is an n-bit bundle of signals that can carry data to or from the CPU
o Data ready signals when the values on the data bundle are valid
Burst transfer: handshaking signals are also used for burst transfers
In the burst read transaction the CPU sends one address but receives a sequence of data values The data values come from successive memory locations starting at the given address
One extra line, called burst', is added to the bus; it signals when a transaction is actually a burst
Releasing the burst' signal tells the device that enough data has been transmitted
Disconnected transfers: in these buses the request and response are separate
A first operation requests the transfer The bus can then be used for other operations The transfer is completed later when the data are ready
DMA (direct memory access) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.
A microprocessor system often has more than one bus, with high-speed devices connected to a high-performance bus and lower-speed devices connected to a different bus. A small block of logic known as a bridge connects the two buses to each other. Three reasons to do this are summarized below
o Higher-speed buses may provide wider data connections
o A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed devices can be held down by using a lower-speed, lower-cost bus
o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations
ARM Bus
o The ARM buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors
o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)
o The AMBA specification includes two buses: AHB, the AMBA high-performance bus, and APB, the AMBA peripherals bus
o AHB is optimized for high-speed transfers and is directly connected to the CPU
o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters
o APB is simple and easy to implement and consumes relatively little power
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller
o It does not perform pipelined operations, which simplifies the bus logic
SHARC Bus
o It contains both program and data memory on-chip
o There are two external interfaces of interest: the external memory interface and the host interface
o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data
o The external data bus can vary in width from 16 to 48 bits depending upon the type of memory access
o Different units in the processor have different amounts of access to external memory: the DM bus and I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords
o The external memory is divided into four banks of equal size. The memory above the banks is known as unbanked external memory
o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels; the external- and link-port DMA channels can be used for bidirectional transfers, while the serial-port DMA channels are unidirectional
o Each DMA channel has its own interrupt, and the DMA controller supports chained transfers as well
MEMORY DEVICES
Important types of memories are RAMs and ROMs
Random access memories can be both read and written Two major categories are static RAM SRAM and dynamic RAM DRAM
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs
o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled
o R/W' controls whether the current operation is a read (R/W'=1) from the RAM or a write (R/W'=0) to the RAM
o Adrs specifies the address for the read or write
o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs
DRAMs inputs and refresh
o DRAMs have two inputs in addition to the inputs of static RAM: row address select (RAS') and column address select (CAS'). They are designed to minimize the number of required pins; these signals are needed because address lines are provided for only half the address
o DRAMs must be refreshed because they store values as charge, which can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS-before-RAS refresh
It is a special quick refresh mode
This mode is initiated by setting CAS' to 0 first, then RAS' to 0
It causes the current memory row to get refreshed and the corresponding counter updated
A memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh on the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In a page mode access, one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS' rather than its rising edge as in page mode
Synchronous DRAMs
o It is developed to improve the performance of the DRAMs by introducing a clock
o Changes to input and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor
RAMBUS
o It is high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems since a great deal of the code and perhaps
some data does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory (or mask) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While using the floating-gate principle, it is designed such that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming allowing programming in a typical system
Early flash memories had to be erased in their entirety but modern devices allow the memory to be erased in blocks an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory systems like digital cameras TV set-top boxes cell phones and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
IO devices
Timers and counters
AD and DA converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM for two reasons
o DRAM's RAS/CAS multiplexing
o The need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle large numbers of devices
Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I 2 C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions
It is a well-known bus for linking microcontrollers into systems. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard bus (7-bit addressing) and up to 3.4 Mbits/sec for the extended bus (10-bit addressing)
It has two lines: the serial data line (SDL) for data and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters
It is designed as a multimaster bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
Using bit-serial transmission, it can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s the bus is said to be in the recessive state; when any node transmits a 0 the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame: the data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1' the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between
The data field is from 0 to 64 bytes depending on the value given in the control field
A cyclic redundancy check (CRC) is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it was trying to send a recessive bit, it stops transmitting. By the end of the arbitration field only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row
The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, like transmitting data to a host PC or interacting with another embedded system for sharing data
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)
A data terminal equipment (DTE) can be a PC, serial printer, or a plotter; a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or a scanner
Communication between the two devices is full duplex, i.e. data transfer can take place in both directions
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate: it represents the rate at which data communication takes place. PCs support 50, 150, 300, …, 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended
Parity bit: it is added for error checking on the receiver side
Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions, also known as handshake. It can be of hardware type or software type
RS232 connector configurations
It specifies two types of connectors 9 pin and 25 pin
For the transmission of 1s and 0s, voltage levels are defined in the standard. The voltage levels are different for data and control signals
The voltage level is with respect to a local ground, hence RS232 uses unbalanced transmission
UART chip: universal asynchronous receiver/transmitter chip
It has two sections receive section and transmit section
The receive section receives data, converts it from serial form into parallel form, and gives the data to the processor
The transmit section takes data from the processor and converts it from parallel format to serial format
It also adds start stop and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy chaining connections
Daisy chaining: this method is adopted to find the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation.
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledge signal on to the next device. This procedure is defined as below
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower its priority.
The slowest device participates in the control and data transfer handshakes and so determines the speed of the transaction
The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines
In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 4882 standard Codes Formats Protocols and Common commands for IEEE 4881 provided for basic syntax and format conventions as well as device independent commands data structures error protocols and the like
The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture
The best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
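The polling structure above can be sketched as one pass of a round-robin main loop. The device structure and names below are illustrative, not from any particular board; `pending` stands in for reading a hardware status register.

```c
#include <stdbool.h>

/* Illustrative device model: 'pending' stands in for a hardware status
   flag; 'serviced' counts completed service calls. */
typedef struct {
    bool pending;
    int  serviced;
} device_t;

static void service(device_t *d)
{
    d->pending = false;
    d->serviced++;
}

/* One pass of a round-robin main loop: poll each device in fixed order
   and service whichever needs it. No interrupts, no priorities --
   service order depends only on position in the loop. */
void round_robin_pass(device_t *devs, int n)
{
    for (int i = 0; i < n; i++)
        if (devs[i].pending)
            service(&devs[i]);
}
```

In real firmware this pass would run forever inside `while (1)`; the worst-case delay for any one device is one full traversal of the loop, which is exactly the weakness listed in the cons above.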
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order flags are checked (eg ABABAD)
  o Improves response of A
  o Increases latency of other tasks
o Move some task code to an interrupt
  o Decreases response time of lower-priority interrupts
  o May not be able to ensure lower-priority interrupt code executes fast enough
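The ISR-sets-flag, main-loop-checks-flag split can be sketched as follows. The device names (UART, timer) are illustrative; the `volatile` qualifier matters because the flags are written in interrupt context.

```c
#include <stdbool.h>

/* Flags shared between ISRs and the main loop; volatile because they
   are modified in interrupt context. */
static volatile bool uart_flag  = false;
static volatile bool timer_flag = false;
static int uart_handled, timer_handled;

/* ISRs do only the urgent hardware work, then set a flag. */
void uart_isr(void)  { /* read received byte from hardware here */ uart_flag  = true; }
void timer_isr(void) { /* acknowledge timer hardware here */       timer_flag = true; }

/* Main loop body: check the flags and do the lower-priority follow-up.
   Checking uart_flag more often than the others (the ABABAD idea) would
   improve the UART's response at the cost of extra latency for the rest. */
void main_loop_pass(void)
{
    if (uart_flag)  { uart_flag  = false; uart_handled++;  }
    if (timer_flag) { timer_flag = false; timer_handled++; }
}
```

All follow-up work still runs at the same (task) priority, which is why the maximum delay is unchanged relative to plain round robin.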
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose the order in which to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
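A minimal function-queue sketch, under the assumption of a fixed-size circular buffer; in real firmware the enqueue would run with interrupts masked, a detail omitted here. `demo_task` is a placeholder for real follow-up work.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

#define QUEUE_LEN 8
static task_fn queue[QUEUE_LEN];
static size_t head, tail;           /* zero-initialized: queue empty */

/* Called from an ISR: record follow-up work for the main routine. */
int enqueue_task(task_fn f)
{
    size_t next = (tail + 1) % QUEUE_LEN;
    if (next == head)
        return -1;                  /* queue full: caller must handle it */
    queue[tail] = f;
    tail = next;
    return 0;
}

/* Main routine: run queued functions in FIFO order. A priority-aware
   variant could instead scan the queue for the most urgent entry. */
void run_queue(void)
{
    while (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QUEUE_LEN;
        f();                        /* execute the deferred work */
    }
}

/* Placeholder task used to demonstrate the queue. */
static int demo_calls;
void demo_task(void) { demo_calls++; }
```

The response time for the highest-priority function is bounded by the longest single queued function, since the main routine can reorder but cannot preempt a function already running.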
Real Time Operating System Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from previous architectures:
  o We don't write signaling flags (the RTOS takes care of it)
  o No loop in our code decides what is executed next (the RTOS does this)
  o The RTOS knows relative task priorities and controls what is executed next
  o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute, and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round-robin stories, with each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
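The slot arithmetic in the example above can be expressed directly. This is a small sketch with times in milliseconds; the function names are invented for the illustration.

```c
/* How many round-robin time slots a job needs: the job's burst time
   divided by the quantum, rounded up (ceiling division). */
int rr_slots_needed(int burst_ms, int quantum_ms)
{
    return (burst_ms + quantum_ms - 1) / quantum_ms;
}

/* How long the job actually runs in its final slot: the remainder of
   the burst, or a full quantum if the burst divides evenly. */
int rr_last_slot_ms(int burst_ms, int quantum_ms)
{
    int rem = burst_ms % quantum_ms;
    return rem ? rem : quantum_ms;
}
```

For the worked example (250 ms burst, 100 ms quantum) this gives three slots, with the job self-terminating 50 ms into the third, matching steps 1-4 above.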
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process. Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can also be built using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - number of processes that complete their execution per time unit
Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
Fairness/Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - ie whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time, waiting time, and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation
It is based on Queuing
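The "long processes can hog the CPU" effect is easy to quantify. This sketch computes the average waiting time for an FCFS queue; the burst times used in the test are made-up illustrative numbers.

```c
/* Average waiting time under FCFS: each job waits for the sum of the
   burst times of all jobs queued ahead of it. Times in milliseconds. */
double fcfs_avg_wait(const int *burst_ms, int n)
{
    long total_wait = 0;   /* sum of every job's waiting time */
    long elapsed = 0;      /* time consumed by jobs already run */
    for (int i = 0; i < n; i++) {
        total_wait += elapsed;   /* job i waits for all earlier jobs */
        elapsed += burst_ms[i];
    }
    return n > 0 ? (double)total_wait / n : 0.0;
}
```

With bursts {24, 3, 3} the waits are 0, 24, and 27 ms (average 17 ms); reordering the long job last would drop the average sharply, which is the intuition behind shortest-job-first.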
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than FIFO, however, since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes and not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. Order of time unit allocation is based upon process arrival time, similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel Queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the program counter reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmers interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r4,[r8]     ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
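The little/big-endian distinction can be made concrete with a short C sketch; the helper names are invented for the illustration. The run-time check inspects which byte of a known word sits at the lowest address, which is exactly the choice ARM makes at power-up.

```c
#include <stdint.h>

/* Extract byte n (0 = low-order byte) of a 32-bit word, independent of
   how the host happens to store the word in memory. */
uint8_t word_byte(uint32_t w, int n)
{
    return (uint8_t)(w >> (8 * n));
}

/* Detect the byte order of the machine we are running on: in
   little-endian mode the low-order byte sits at the lowest address. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    return *(uint8_t *)&word == 0x04;   /* low-order byte first? */
}
```

On a little-endian configuration the word 0x01020304 is stored in memory as the byte sequence 04 03 02 01; on a big-endian one, as 01 02 03 04.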
SHARC PROCESSOR
It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared. STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
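The clamping behavior of saturation arithmetic can be sketched in C. This models the behavior described in the text (clamp instead of wrap) rather than any actual SHARC instruction; the function name is invented.

```c
#include <stdint.h>

/* Saturating 32-bit addition: an overflowing sum clamps to the largest
   (or smallest) representable value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* widen so the true sum fits */
    if (sum > INT32_MAX) return INT32_MAX;  /* clamp positive overflow */
    if (sum < INT32_MIN) return INT32_MIN;  /* clamp negative overflow */
    return (int32_t)sum;
}
```

Saturation is preferred in signal processing because a clamped sample is a small distortion, while a wrapped sample (a large positive value suddenly becoming large negative) is an audible or visible glitch.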
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
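The logical/arithmetic distinction can be shown in C. This is a sketch of the behavior the text describes, not SHARC code; the arithmetic version is written via unsigned arithmetic because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Logical right shift: vacated high bits fill with zeroes. */
uint32_t lsr32(uint32_t x, int n)
{
    return x >> n;
}

/* Arithmetic right shift: vacated high bits copy the sign bit,
   replicated portably via unsigned arithmetic. Assumes 0 <= n <= 31. */
int32_t asr32(int32_t x, int n)
{
    uint32_t u = (uint32_t)x >> n;
    if (x < 0 && n > 0)
        u |= ~(UINT32_MAX >> n);   /* replicate sign into the top n bits */
    return (int32_t)u;
}
```

Arithmetic shifts preserve the value's sign (shifting -8 right by one gives -4), which is why they, not logical shifts, implement division by powers of two for signed data.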
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
The DAGs provide the following addressing modes:
The simplest addressing mode provides an immediate value that can represent an address.
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in it.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register supplies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I + M, where I is the base and M is the modifier, or offset.
The DAGs also support circular buffers, which are commonly used in signal processing.
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
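The post-modify and circular-buffer modes above can be modeled in C roughly as follows (an illustrative sketch; the names and buffer length are assumptions, not SHARC code):

```c
/* Model of DAG-style post-modify addressing over a circular buffer.
 * The "I register" (*i) holds the current index; after each access it
 * is advanced by the modifier m and wrapped at the buffer length, as
 * the DAG does in hardware. */
#define BUF_LEN 8

int circ_read(const int *base, unsigned *i, unsigned m)
{
    int value = base[*i];            /* fetch at address I */
    *i = (*i + m) % BUF_LEN;         /* post-modify with wraparound */
    return value;
}
```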
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed next. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority ready task (preemptive kernel). In a non-preemptive kernel, it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
A semaphore is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: a counting semaphore, whose value can be greater than 1, and a binary semaphore, which takes only the values 0 and 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
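A toy model of the acquire/release calls, assuming each operation runs atomically (a real kernel would block the calling task on a waiting list rather than return failure):

```c
/* Toy counting-semaphore model (illustrative only; real RTOS calls
 * would atomically block the caller instead of returning failure). */
typedef struct { int count; } semaphore_t;

void sem_create(semaphore_t *s, int initial) { s->count = initial; }

int sem_acquire(semaphore_t *s)      /* returns 1 on success, 0 if unavailable */
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;                        /* real kernel: move task to waiting list */
}

void sem_release(semaphore_t *s) { s->count++; }
```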
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
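One plausible way to model such a queue is a fixed-length ring buffer; the names and capacity below are illustrative, not a specific RTOS API:

```c
/* Sketch of a fixed-length message queue as a ring buffer. A task or
 * ISR deposits messages with mq_send; a waiting task removes them in
 * FIFO order with mq_receive. */
#define MQ_LEN 4

typedef struct {
    int msgs[MQ_LEN];
    int head, tail, count;
} msgq_t;

void mq_init(msgq_t *q) { q->head = q->tail = q->count = 0; }

int mq_send(msgq_t *q, int msg)          /* 0 on success, -1 if full */
{
    if (q->count == MQ_LEN) return -1;
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % MQ_LEN;
    q->count++;
    return 0;
}

int mq_receive(msgq_t *q, int *msg)      /* 0 on success, -1 if empty */
{
    if (q->count == 0) return -1;
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % MQ_LEN;
    q->count--;
    return 0;
}
```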
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
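As a rough analogy on a general-purpose OS (this uses the POSIX pipe() API, not an RTOS call), one task's output can be read as another's input:

```c
#include <unistd.h>
#include <string.h>

/* POSIX-pipe analogy for the RTOS pipe object: one end is written by
 * a producer, the other read by a consumer. Returns 0 when the
 * message arrives intact. */
int pipe_demo(void)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    char buf[6] = {0};

    if (pipe(fds) != 0) return -1;
    if (write(fds[1], "hello", 5) != 5) return -1;
    if (read(fds[0], buf, 5) != 5) return -1;
    close(fds[0]);
    close(fds[1]);
    return strcmp(buf, "hello");
}
```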
INSTRUCTION SET ARCHITECTURE (ISA)
The ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:
o Each instruction controls the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that can load or store up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in user mode. They are:
R15: the program counter, but it can be manipulated as a general-purpose register
R13: used as the stack pointer
R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of six execution modes, as follows:
o User mode: used to run application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations, on the other hand, can be carried out via multiple-register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (the default)
o Any subset of the user bank of registers, when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jumps are always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have full access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Lower power consumption
o Less space occupied
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: the problem that arises when an interrupt routine and the task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function, or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
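A small C contrast may make the first rule concrete; the helper names are hypothetical. The first version keeps its result in a static buffer shared by all callers, so a task switch between two callers corrupts the data; the second writes only through the caller's own buffer, which lives on that task's stack:

```c
#include <stdio.h>
#include <string.h>

/* NOT reentrant: one static buffer is shared by every task that calls it. */
char *format_bad(int value)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "v=%d", value);
    return buf;                      /* every caller gets the same pointer */
}

/* Reentrant: all state lives in the caller's own (stack) buffer. */
void format_ok(int value, char *buf, unsigned len)
{
    snprintf(buf, len, "v=%d", value);
}
```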
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which one task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor, so an interrupt routine or some other task in the system must send a signal to bring it out of the blocked state. Otherwise, the task will be blocked forever.
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
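The scheduler's core decision can be sketched in C as picking the highest-priority ready task; the data layout is illustrative, not any particular RTOS:

```c
/* Among the READY tasks, pick the one with the highest priority to
 * move into the RUNNING state; -1 means everything is blocked and
 * the scheduler idles until an event unblocks a task. */
enum state { BLOCKED, READY, RUNNING };

typedef struct { int priority; enum state st; } task_t;

int pick_next(const task_t *tasks, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (tasks[i].st == READY &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = i;
    return best;
}
```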
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task runs until it blocks before the other gets the processor.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as the higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and the designers.
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can mandate any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction, and testing phases.
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.
Its main disadvantage: with too many spirals, it may take too long when design time is a major requirement.
Its advantage: it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.
Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use help minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps ensure that the product best meets the customer's needs.
o A bridge may allow the buses to operate independently, thereby providing some parallelism in I/O operations
ARM Bus
o The buses provided off-chip vary from chip to chip, as the ARM CPU is manufactured by many vendors
o The AMBA bus supports CPUs, memories, and peripherals integrated in a system on silicon (SoS)
o The AMBA specification includes two buses: the AHB (AMBA high-performance bus) and the APB (AMBA peripherals bus)
o AHB: optimized for high-speed transfers and directly connected to the CPU
o It supports several high-performance features: pipelining, burst transfers, split transactions, and multiple bus masters
o APB: simple and easy to implement, and consumes relatively little power
o It assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller
o It does not perform pipelined operations, which simplifies the bus logic
SHARC Bus
o It contains both program and data memory on-chip
o There are two external interfaces of interest: the external memory interface and the host interface
o The external memory interface allows the SHARC to address up to four gigawords of external memory, which can hold either instructions or data
o The external data bus can vary in width from 16 to 48 bits, depending upon the type of memory access
o Different units in the processor have different amounts of access to external memory: the DM bus and the I/O processor can access the entire external address space, while the PM address bus can access only 12 megawords
o The external memory is divided into four banks of equal size; the memory above the banks is known as unbanked external memory
o The host interface is used to connect the SHARC to a standard microprocessor bus; it implements a typical handshake for bus granting
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels; the external and link port DMA channels can be used for bidirectional transfers, while the serial port DMA channels are unidirectional
o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers
MEMORY DEVICES
The important types of memories are RAMs and ROMs.
Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
Static RAM has four inputs:
o CE' is the chip enable input. When CE' = 1, the SRAM's data pins are disabled, and when CE' = 0, the data pins are enabled
o R/W' controls whether the current operation is a read from RAM (R/W' = 1) or a write to RAM (R/W' = 0)
o Adrs specifies the address for the read or write
o Data is a bidirectional bundle of signals for data transfer. When R/W' = 1 the pins are outputs, and when R/W' = 0 the pins are inputs
DRAM inputs and refresh
o DRAMs have two inputs in addition to those of static RAM: row address select (RAS') and column address select (CAS'). These signals are designed to minimize the number of required pins and are needed because address lines are provided for only half the address
o DRAMs must be refreshed because they store values that can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS-before-RAS refresh:
It is a special quick refresh mode
This mode is initiated by setting CAS' to 0 first, then RAS' to 0
It causes the current memory row to be refreshed and the corresponding counter to be updated
A memory controller is a logic circuit, external to the DRAM, that performs CAS-before-RAS refresh to the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful for accessing several locations in the same region of the memory
o In a page-mode access, one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode
Synchronous DRAMs
o Developed to improve the performance of DRAMs by introducing a clock
o Changes to the inputs and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor
RAMBUS
o It offers high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are:
o Programmed with fixed data
o Very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time
o Less sensitive to radiation-induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when the ROMs are installed in some quantity; field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While it uses the floating-gate principle, it is designed so that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming, allowing it to be programmed in a typical system
Early flash memories had to be erased in their entirety; modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory: systems like digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
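The read-modify-write cycle behind that drawback can be sketched as follows (simulated in RAM; the block size and function name are illustrative):

```c
#include <string.h>

/* Why single-word flash writes are slow: updating one word means
 * reading the whole block, patching the word, erasing the block,
 * and programming the block back. (RAM simulation, not a driver.) */
#define BLOCK_WORDS 8

void flash_write_word(unsigned *flash_block, int index, unsigned value)
{
    unsigned ram_copy[BLOCK_WORDS];

    memcpy(ram_copy, flash_block, sizeof ram_copy);  /* 1. read block */
    ram_copy[index] = value;                         /* 2. patch one word */
    memset(flash_block, 0xFF, sizeof ram_copy);      /* 3. "erase" block */
    memcpy(flash_block, ram_copy, sizeof ram_copy);  /* 4. program block */
}
```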
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM, for two reasons:
the DRAM's RAS/CAS multiplexing
the need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. Glue logic is required when a device is connected to a bus for which it is not designed.
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing include:
the I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle a large number of devices
the Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing.
I2C bus
Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol.
This protocol enables peripheral ICs to communicate with each other using simple communication hardware.
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus.
Common devices capable of interfacing to an I2C bus include EPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.
The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF.
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and slave devices can be senders or receivers of data.
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions.
It is a well-known bus for linking a microcontroller into a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus.
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line.
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes may act as slaves that only respond to requests from the masters.
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high.
Each bus master device must listen to the bus while transmitting, to be sure that it is not interfering with another message.
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended definition.
o 0000000 is used for general call, or bus broadcast, which is useful for signaling all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: comprised of a series of one-byte transmissions, an address followed by one or more data bytes.
Address transmission includes the 7-bit address and a 1-bit data direction: 0 for writing from master to slave, and 1 for reading from slave to master.
The bus transaction is initiated by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL.
The bus does not define particular voltages to be used for high or low, so that either bipolar or MOS circuits can be connected to the bus.
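The first byte of a transaction, combining the 7-bit address with the direction bit, can be sketched as follows (the helper name is illustrative):

```c
#include <stdint.h>

/* First byte of an I2C transaction: the 7-bit device address followed
 * by the direction bit (0 = master writes to slave, 1 = master reads
 * from slave). */
uint8_t i2c_first_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1 : 0));
}
```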
CAN Bus
The controller area network(CAN) bus is a robust serial communication bus protocol for real time applications possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electrons but now it finds in other applications also
Some important characteristics of CAN protocol are high integrity serial data communications real-time support data rates of up to 1Mbitssec 11 bit addressing error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself It actually defines data packet format and transmission rules to prioritize messages guarantee latency times allow for multiple masters handles transmission errors retransmits the corrupt messages and distinguish between a permanent failure of a node versus temporary errors
It uses bit-serial transmission it can run at rates of 1Mbps over a twisted pair connection of 40 meters An optical link can also be used
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology a logical lsquo1rsquo on the bus is called recessive and a logical lsquo0rsquo on the bus is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down by making 0 dominant over 1
When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.
Nodes synchronize themselves to the bus by listening to the bit transitions on it. The first bit of a data frame provides the first synchronization opportunity in a frame.
Format of data frame: The data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.
The data field is from 0 to 8 bytes (0 to 64 bits), depending on the value given in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
The acknowledge field is used to signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.
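The field widths described above can be tallied in a short Python sketch. The 15-bit CRC and its delimiter are taken from the CAN specification rather than from the text above, so treat those two widths as assumptions:

```python
def can_frame_bits(n_data_bytes):
    """Total bits in a CAN data frame for the field layout described above.
    Control field = identifier-extension bit + fixed bit + 4-bit length."""
    if not 0 <= n_data_bytes <= 8:
        raise ValueError("CAN data field is 0 to 8 bytes")
    fields = {
        "start_of_frame": 1,    # the leading '1'
        "identifier": 11,       # arbitration field
        "rtr": 1,               # remote transmission request bit
        "control": 6,
        "data": 8 * n_data_bytes,
        "crc": 15,              # width from the CAN spec (assumption)
        "crc_delimiter": 1,     # width from the CAN spec (assumption)
        "ack_slot": 1,
        "ack_delimiter": 1,
        "end_of_frame": 7,      # the trailing string of seven bits
    }
    return sum(fields.values())
```

Under these assumptions an empty data frame occupies 44 bits and a full 8-byte frame 108 bits, before any bit stuffing.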
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
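The wired-AND arbitration just described can be simulated in a few lines of Python; the node identifiers below are made up for illustration:

```python
def arbitrate(identifiers, width=11):
    """Simulate CSMA/AMP arbitration on a wired-AND bus.
    Each node sends its identifier MSB first; a node that sends a
    recessive 1 but hears a dominant 0 drops out. Returns the index of
    the surviving node, i.e. the one with the lowest identifier."""
    active = set(range(len(identifiers)))
    for bit in range(width - 1, -1, -1):
        sent = {n: (identifiers[n] >> bit) & 1 for n in active}
        bus = min(sent.values())    # wired-AND: any 0 pulls the bus low
        active = {n for n in active if sent[n] == bus}
    return min(active)
```

Note how the loop never needs a central referee: each losing node silences itself, exactly as on the real bus.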
Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS These are used to communicate with the external world, e.g., transmitting data to a host PC or interacting with another embedded system to share data.
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA).
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).
A data terminal equipment (DTE) can be a PC, serial printer, or plotter; a data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner.
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions simultaneously.
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit may be added, useful for error checking on the receiver side.
This mode of communication is called asynchronous communication because no clock signal is transmitted.
The RS232 standard specifies a maximum distance of 19.2 meters, but with good RS232 cable a distance of up to 100 meters can be achieved.
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.
Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.
Parity bit: added for error checking on the receiver side.
Flow control: useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission, also known as handshake. It can be hardware or software type.
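The framing rules above (start bit, data bits LSB first, optional parity, stop bits) can be sketched in Python; this is an illustrative bit-assembly exercise, not a UART driver:

```python
def uart_frame(ch, data_bits=8, parity="even", stop_bits=1):
    """Bit sequence on the line for one character: a start bit (0),
    the data bits LSB first, an optional parity bit, then stop bit(s) (1).
    parity may be "even", "odd", or None."""
    data = [(ch >> i) & 1 for i in range(data_bits)]
    bits = [0] + data                    # start bit, then data bits
    if parity is not None:
        ones = sum(data) % 2             # 1 if the data has an odd number of 1s
        bits.append(ones if parity == "even" else 1 - ones)
    return bits + [1] * stop_bits        # stop bit(s)
```

For example, 'A' (0x41) with 8 data bits, even parity, and one stop bit occupies 11 bit times on the line.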
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin.
For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.
UART chip: universal asynchronous receiver/transmitter chip.
It has two sections: a receive section and a transmit section.
The receive section receives data, converts it from serial to parallel form, and gives the data to the processor.
The transmit section takes data from the processor and converts it from parallel to serial format.
It also adds the start, stop, and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.
Daisy Chaining: This method is adopted to determine the priority of devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledgement signal on to the next device. This procedure is defined as follows.
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower its priority.
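The PI/PO propagation rule can be checked with a small Python sketch; device 0 here stands for the device nearest the CPU (highest priority):

```python
def daisy_chain(requests):
    """Propagate the CPU's interrupt acknowledge down the chain.
    requests[i] is True if device i has a pending interrupt (device 0 is
    closest to the CPU, i.e. highest priority). Returns the index of the
    device that intercepts the acknowledge (PI = 1, PO = 0), or None,
    together with the list of PO outputs, one per device."""
    winner, pi, po = None, 1, []
    for i, req in enumerate(requests):
        if pi == 1 and req:
            winner = i
            po.append(0)     # block the acknowledge from lower devices
        else:
            po.append(pi)    # pass the acknowledge along unchanged
        pi = po[-1]
    return winner, po
```

With devices 1 and 2 both requesting, device 1 wins and its PO of 0 keeps device 2 from ever seeing the acknowledge.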
The slowest device participating in the control and data transfer handshakes determines the speed of the transaction.
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines---eight bidirectional lines used for data transfer, three for handshake, and five for bus management---plus eight ground return lines.
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI Standard MC1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 computers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture: The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response times (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
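The main polling loop described above can be sketched in a few lines of Python; the device names and needs-service flags are invented for illustration:

```python
def round_robin(devices):
    """One pass of the round-robin main loop: visit each device in fixed
    order and service it only if its needs-service flag is set.
    devices maps a device name to that flag; servicing clears it.
    Returns the order in which devices were actually serviced."""
    serviced = []
    for name in devices:            # fixed order, hence no priorities
        if devices[name]:
            devices[name] = False   # "service" the device
            serviced.append(name)
    return serviced
```

In a real system this loop runs forever; the fixed visiting order is exactly why one slow device delays every other device by the full loop time.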
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., ABABAD)
o Improves the response of A
o Increases latency of other tasks
o Move some task code to interrupts
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
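A Python sketch of the function-queue idea, using a heap so the main routine always pops the most urgent entry first; the priority numbers and task names are invented for illustration:

```python
import heapq

def run_function_queue(queue):
    """Main routine of the function-queue architecture: repeatedly pop
    the most urgent (lowest-numbered) entry and call it. In a real
    system, interrupt routines would push (priority, name, fn) entries
    onto the queue while this loop runs."""
    heapq.heapify(queue)
    log = []
    while queue:
        _, name, fn = heapq.heappop(queue)
        fn(log, name)              # task code runs at main-loop level
    return log

def task(log, name):               # stand-in for real follow-up processing
    log.append(name)
```

The heap realizes "any algorithm to choose what order to execute functions": here it is priority order rather than FIFO.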
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
Differences from previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
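The allocation sequence above can be reproduced with a small Python round-robin simulator (quantum and job times taken from the example):

```python
from collections import deque

def rr_schedule(jobs, quantum=100):
    """Round-robin time slicing: run each job for at most one quantum,
    then move it to the back of the queue until it completes.
    jobs maps a job name to its total CPU demand (ms).
    Returns the list of (job, slice_ms) allocations in order."""
    queue = deque(jobs.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            queue.append((name, remaining - run))
    return timeline
```

With job1 alone needing 250 ms, the simulator produces the same 100, 100, 50 ms slices the example lists; adding more jobs interleaves their slices in arrival order.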
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in its queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another; a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.
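A Python sketch of a work-conserving round-robin packet scheduler over per-flow queues (the flow names and packet labels are illustrative):

```python
def packet_round_robin(flows):
    """Work-conserving round-robin over per-flow queues: visit the flows
    in a repeated fixed order, sending one packet from each non-empty
    queue; an empty flow is simply skipped, so the link never idles
    while any packet is waiting. Returns the transmission order."""
    queues = {name: list(pkts) for name, pkts in flows.items()}
    sent = []
    while any(queues.values()):
        for name, q in queues.items():
            if q:
                sent.append((name, q.pop(0)))
    return sent
```

Once flow B runs dry, flow A's remaining packets go out back to back instead of the link waiting for B's turn, which is exactly the work-conserving property.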
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept, and such a scheduler can be implemented using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with:
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround - the total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency) thus a scheduler will implement a suitable compromise Preference is given to any one of the above mentioned concerns depending upon the users needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., how many processes are to be executed concurrently and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory. [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembly language because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access ) 35G cellular system channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined by channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
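The effect of ordering on waiting time can be seen in a small Python sketch. With all jobs arriving at time 0, preemption never triggers, so shortest-remaining-time reduces to plain SJF; the burst values are illustrative:

```python
def avg_waiting_time(bursts, order):
    """Average waiting time of a non-preemptive schedule: each job waits
    for the total burst time of everything scheduled before it."""
    elapsed, total_wait = 0, 0
    for job in order:
        total_wait += elapsed
        elapsed += bursts[job]
    return total_wait / len(order)

# Illustrative bursts: one long job and two short ones, all ready at t = 0.
bursts = {"p1": 24, "p2": 3, "p3": 3}
fcfs = avg_waiting_time(bursts, ["p1", "p2", "p3"])            # arrival order
sjf = avg_waiting_time(bursts, sorted(bursts, key=bursts.get))  # shortest first
```

Here FCFS averages 17 ms of waiting while SJF averages 3 ms: no job ever waits behind the long p1, which is exactly why overall waiting time is smaller than under FIFO.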
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes, not the average process length.
Because of high waiting times, deadlines are rarely met in a pure RR system.
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
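A minimal model of the quantum-by-quantum rotation, assuming each task is reduced to its remaining work units (purely illustrative; a real scheduler dispatches code, not counters):

```c
#include <stddef.h>

/* Simulate pure round-robin with a fixed time quantum over tasks whose
 * only attribute here is remaining work.  Each pass hands one quantum
 * to every unfinished task in arrival order.  Returns the number of
 * quanta consumed until all tasks finish. */
int rr_run(int *work, size_t n, int quantum) {
    int quanta = 0;
    size_t done = 0;
    while (done < n) {
        done = 0;
        for (size_t i = 0; i < n; i++) {
            if (work[i] > 0) {
                work[i] -= (work[i] < quantum) ? work[i] : quantum;
                quanta++;            /* one time slice consumed */
            }
            if (work[i] <= 0) done++;
        }
    }
    return quanta;
}
```

Running this with tasks of mixed lengths shows the property in the text: short tasks finish within a few rotations instead of waiting behind the longest one.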
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for the shared-memory problem.
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r4,[r8]     ; a comment
Label   ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
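The byte-addressing and endianness rules above can be checked with a short host-side C sketch (word_byte_address and is_little_endian are illustrative helpers, not ARM APIs):

```c
#include <stdint.h>

/* ARM addresses refer to bytes: word n lives at byte address 4n. */
uint32_t word_byte_address(uint32_t word_index) {
    return word_index * 4;
}

/* Report which byte order the host stores words in: on a little-endian
 * configuration the lowest-order byte of the word sits at the lowest
 * address, so the first byte of 0x01020304 is 0x04. */
int is_little_endian(void) {
    uint32_t word = 0x01020304u;
    const uint8_t *bytes = (const uint8_t *)&word;
    return bytes[0] == 0x04;
}
```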
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
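A C model of the saturating behaviour selected by ALUSAT, shown here for 32-bit addition (the function name is illustrative):

```c
#include <stdint.h>

/* Saturating 32-bit fixed-point addition: on overflow the result
 * clamps to the end of the numeric range instead of wrapping around,
 * which is the behaviour saturation mode selects. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;   /* exact sum in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```

Clamping rather than wrapping matters in signal processing: a wrapped overflow flips the sign of a large sample, while a saturated one merely clips it.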
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
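The post-modify and base-plus-offset modes above can be modelled in C as follows (the I and M names follow the text; the functions themselves are illustrative):

```c
#include <stdint.h>

/* Post-modify with update: the I register supplies the address for
 * this access, then is updated by the modifier M for the next one --
 * the mode used to sweep through a range of addresses. */
uint32_t dag_post_modify(uint32_t *i_reg, int32_t m) {
    uint32_t addr = *i_reg;           /* address used now       */
    *i_reg = (uint32_t)(*i_reg + m);  /* update for next access */
    return addr;
}

/* Base-plus-offset: the address is computed as I + M, with the base
 * register I left unchanged. */
uint32_t dag_base_offset(uint32_t i_reg, int32_t m) {
    return (uint32_t)(i_reg + m);
}
```

Calling dag_post_modify repeatedly with M = 1 walks a buffer one word at a time, which is the "sweep through a range of addresses" use case in the text.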
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than 1, and binary semaphores, which take values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
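A single-threaded C model of these calls, using illustrative ksem_* names rather than any real RTOS API; where a real kernel would block the caller, the model simply returns -1:

```c
/* Minimal model of a counting semaphore: the create/acquire/release/
 * query calls mirror the list above (delete would just discard the
 * object).  A binary semaphore is the initial == 1 case. */
typedef struct { int count; } ksem_t;

int ksem_create(ksem_t *s, int initial) {
    s->count = initial;
    return 0;
}

int ksem_acquire(ksem_t *s) {
    if (s->count == 0) return -1;   /* a real kernel blocks the task here */
    s->count--;
    return 0;
}

void ksem_release(ksem_t *s) { s->count++; }

int ksem_query(const ksem_t *s) { return s->count; }
```

With an initial count of 2, two tasks may acquire the resource, a third must wait, and a release lets the waiter proceed.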
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
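The array-of-mailboxes view can be sketched as a C ring buffer (the queue length, names, and int messages are illustrative):

```c
#include <stddef.h>

#define Q_LEN 4   /* illustrative queue length fixed at creation */

/* Fixed-length message queue modelled as a ring of "mailboxes". */
typedef struct {
    int    slots[Q_LEN];
    size_t head, tail, count;
} msgq_t;

/* Deposit a message; -1 means full (a real RTOS would put the sender
 * on the sending-task waiting list instead). */
int msgq_send(msgq_t *q, int msg) {
    if (q->count == Q_LEN) return -1;
    q->slots[q->tail] = msg;
    q->tail = (q->tail + 1) % Q_LEN;
    q->count++;
    return 0;
}

/* Take the oldest message; -1 means empty (a real RTOS would put the
 * receiver on the receiving-task waiting list). */
int msgq_receive(msgq_t *q, int *msg) {
    if (q->count == 0) return -1;
    *msg = q->slots[q->head];
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 0;
}
```

An ISR reading a sensor would call msgq_send, and the consuming task would call msgq_receive, matching the deposit/take pattern described above.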
PIPES
A pipe is a kernel object for intertask communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
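As a concrete stand-in for these calls, a POSIX pipe can carry one task's output to another; the helper below is illustrative and simply copies a message through the pipe:

```c
#include <unistd.h>

/* Create a pipe, write a message into one end ("producer" task) and
 * read it back from the other ("consumer" task).  Copies the message
 * into buf and returns the number of bytes read, or -1 on failure. */
long pipe_demo(char *buf, size_t bufsize) {
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    const char msg[] = "reading";
    if (pipe(fds) != 0) return -1;    /* create */
    ssize_t w = write(fds[1], msg, sizeof msg);             /* write  */
    ssize_t r = (w > 0) ? read(fds[0], buf, bufsize) : -1;  /* read   */
    close(fds[0]);                    /* close both ends */
    close(fds[1]);
    return (long)r;
}
```

In an RTOS the two ends would belong to different tasks; here both ends live in one function purely to show the data flowing through.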
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, small code size, low power consumption, and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in user mode. They are:
R15 ----------- it is the program counter, but can be manipulated as a general-purpose register
R13 ----------- it is used as the stack pointer
R14 ----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
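The four condition flags occupy bits 31..28 of the CPSR (N, Z, C, V in that order), which a short C sketch can decode:

```c
#include <stdint.h>

/* Extract the ARM condition flags from their architectural positions:
 * N (negative) bit 31, Z (zero) bit 30, C (carry) bit 29,
 * V (overflow) bit 28. */
int cpsr_n(uint32_t cpsr) { return (cpsr >> 31) & 1; }
int cpsr_z(uint32_t cpsr) { return (cpsr >> 30) & 1; }
int cpsr_c(uint32_t cpsr) { return (cpsr >> 29) & 1; }
int cpsr_v(uint32_t cpsr) { return (cpsr >> 28) & 1; }
```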
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in the system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: it is the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (the default)
o Any subset of the user bank of registers, when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor, the branch instructions have the following features:
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and the task code (or several tasks) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
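A C sketch contrasting a non-reentrant function (static buffer, i.e. shared non-stack state) with a reentrant rewrite (caller-supplied buffer). Function names are illustrative:

```c
#include <stdio.h>
#include <stddef.h>

/* Non-reentrant: the static buffer is shared state.  If the RTOS
 * switches tasks in the middle of a call, a second caller overwrites
 * the first caller's result (violates the first rule above). */
const char *format_reading_bad(int value) {
    static char buf[16];
    snprintf(buf, sizeof buf, "%d C", value);
    return buf;
}

/* Reentrant: all state lives on the calling task's stack, so any
 * number of tasks may be inside this function at once. */
void format_reading_ok(int value, char *buf, size_t bufsize) {
    snprintf(buf, bufsize, "%d C", value);
}
```

The fix is mechanical: move every piece of persistent state out of the function and onto the caller's stack (or into the task's private data).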
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which one task should go into the running state.
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.
Tasks and ISRs move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other gets the processor.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
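The scheduling rules above condense into a small C sketch; the task structure and numeric priorities are illustrative (here a larger number means higher priority):

```c
#include <stddef.h>

enum task_state { RUNNING, READY, BLOCKED };

struct task { enum task_state state; int priority; };

/* Core of a priority-based scheduler: among all tasks that are not
 * blocked, the highest-priority one gets the processor.  Returns the
 * chosen task's index, or -1 if every task is blocked -- the case in
 * which the RTOS just spins waiting for an event. */
int schedule(const struct task *t, size_t n) {
    int pick = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].state == BLOCKED) continue;
        if (pick < 0 || t[i].priority > t[pick].priority)
            pick = (int)i;
    }
    return pick;
}
```

In a preemptive kernel this selection reruns whenever a task unblocks; in a non-preemptive kernel it reruns only when the running task blocks.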
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of suppliers' capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
o The host interface is used to connect the SHARC to a standard microprocessor bus. The host interface implements a typical handshake for bus granting.
o The SHARC includes an on-board DMA controller as part of the I/O processor. It performs external-port block data transfers and data transfers on the link and serial ports. It has ten channels; the external- and link-port DMA channels can be used for bidirectional transfers, while the serial-port DMA channels are unidirectional.
o Each DMA channel has its own interrupt, and the DMA controller also supports chained transfers.
MEMORY DEVICES
The important types of memories are RAMs and ROMs.
Random-access memories can be both read and written. The two major categories are static RAM (SRAM) and dynamic RAM (DRAM).
o SRAM is faster than DRAM
o SRAM consumes more power than DRAM
o More DRAM can be put on a single chip
o DRAM values must be periodically refreshed
A static RAM has four inputs:
o CE' is the chip enable input. When CE'=1 the SRAM's data pins are disabled, and when CE'=0 the data pins are enabled.
o R/W' controls whether the current operation is a read from RAM (R/W'=1) or a write to RAM (R/W'=0).
o Adrs specifies the address for the read or write.
o Data is a bidirectional bundle of signals for data transfer. When R/W'=1 the pins are outputs, and when R/W'=0 the pins are inputs.
DRAMs inputs and refresh
o They have two inputs in addition to the inputs of static RAM They are row
address select(RASrsquo) and column address select(CASrsquo)are designed to minimize the number of required pins These signals are needed because address lines are provided for only half the address
o DRAMs must be refreshed because the values they store leak away over time. A single refresh request can refresh an entire row of the DRAM
o CAS before RAS refresh
It is a special quick refresh mode
This mode is initiated by setting CAS' to 0 first, then RAS' to 0
It causes the current memory row to be refreshed and the corresponding counter updated
Memory controller is a logic circuit that is external to DRAM and performs CAS before RAS refresh to the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In page mode access one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode
Synchronous DRAMs
o It is developed to improve the performance of the DRAMs by introducing a clock
o Changes to input and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display while the parallel
interface to the microprocessor
RAMBUS
o It is high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems since a great deal of the code and perhaps
some data does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when ROMs are installed in quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While using floating gate principle it is designed such that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming allowing programming in a typical system
Early flash memories had to be erased in their entirety but modern devices allow the memory to be erased in blocks an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory systems like digital cameras TV set-top boxes cell phones and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block must be read, the word within it updated, and then the block written back
IO devices
Timers and counters
AD and DA converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM due to two reasons:
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An IO device typically requires a much smaller range of addresses than a memory, so its addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle large numbers of devices
Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROM, Flash, and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions
It is a well-known bus to link a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10 bit addressing scheme
Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is started by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL, and a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications also
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
It uses bit-serial transmission and can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' on the bus is dominant
The driving circuits cause the bus to be pulled down to 0 if any node on the bus drives a 0, making 0 dominant over 1
When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame: A data frame starts with '1' and ends with a string of seven zeros (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long
The trailing bit is used for remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between
The data field is from 0 to 64 bytes depending on the value given in the control field
A cyclic redundancy check (CRC)is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, like transmitting data to a host PC or interacting with another embedded system for sharing data
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)
Data terminal equipment(DTE) can be a PC serial printer or a plotter and data circuit terminating equipment (DCE) can be a modem mouse digitizer or a scanner
Communication between the two devices is in full duplex ie the data transfer can take place in both the directions
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set on both the systems
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits: They identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended, and for 5 or 6, two stop bits are appended
Parity bit It is added for error checking on the receiver side
Flow control: It is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions. It is also known as handshake. It can be of hardware type or software type
RS232 connector configurations
It specifies two types of connectors 9 pin and 25 pin
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission
UART (universal asynchronous receiver transmitter) chip
It has two sections receive section and transmit section
Receive section: receives data, converts it from serial form to parallel form, and gives the data to the processor
Transmit section: takes the data from the processor and converts it from parallel format to serial format
It also adds start, stop, and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy Chaining: This method is adopted to find the priority of devices that send interrupt requests. The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired logic connection. If any device has its interrupt signal in the low level state, the interrupt line goes to the low level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation.
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined below
A device with a 0 at its PI input generates a 0 on its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines
In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture
The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin Pros and Cons
Pros: simple, no shared data, no interrupts
Cons:
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most IO needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts Pros and Cons
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order flags are checked (eg ABACAD)
- Improves response of A
- Increases latency of other tasks
o Move some task code to the interrupt
- Decreases response time of lower priority interrupts
- May not be able to ensure lower priority interrupt code executes fast enough
Function Queue Scheduling
Architecture:
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for highest priority task = length of longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower priority code (no guarantee it will actually run)
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures:
o We don't write signaling flags (RTOS takes care of it)
o No loop in our code decides what is executed next (RTOS does this)
o RTOS knows relative task priorities and controls what is executed next
o RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems Pros and Cons
Pros:
o Worst case response time for highest priority function is zero
o System's high priority response time relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally come with vendor tools
Cons:
o RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept; a round-robin scheduler can also be created by using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
- Throughput - number of processes that complete their execution per time unit
- Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
- Fairness / Waiting Time - equal CPU time to each process (or more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, the degree of concurrency to be supported at any one time (i.e. how many processes are to be executed concurrently), and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process that has a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
First In First Out (FIFO), also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. It gives balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time waiting time is dependent on number of processes and not average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs; it is also useful for shared-memory problems
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification: it allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION
PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r0,[r8]      ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture; however, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
        R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label:  R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction consists of 48 bits, a basic data word is 32 bits, and an address is 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services include memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority ready task (in a preemptive kernel). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, whose value can be any non-negative integer (with a maximum greater than 1), and the binary semaphore, which takes only the values 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message and, depending on the application, the highest-priority task or the first task waiting in the queue takes the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all the ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are
R15 - the program counter, but it can be manipulated as a general-purpose register
R13 - used as the stack pointer
R14 - has special significance and is called the link register: when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) - an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags, when set, disable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to memory faults
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions: single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple-register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOS a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem
Shared data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
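A minimal sketch of the first rule (hypothetical names): the first function keeps its working data on the calling task's stack and is reentrant; the second caches state in a shared static variable in a non-atomic way and is not.

```c
/* Reentrant: uses only its parameters and stack locals, so two tasks
   can be inside it at the same time without interfering. */
int average_reentrant(int a, int b)
{
    int sum = a + b;        /* lives on the calling task's stack */
    return sum / 2;
}

/* NOT reentrant: 'shared_sum' is shared by every caller. If the RTOS
   switches tasks between the two statements below, another task can
   overwrite it and corrupt this task's result. */
static int shared_sum;
int average_not_reentrant(int a, int b)
{
    shared_sum = a + b;     /* shared variable used non-atomically */
    return shared_sum / 2;
}
```

Both return the same value when called from a single task; the difference only shows up when a task switch lands between the two statements of the second function.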
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves. A task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state. While a task is blocked it never gets the microprocessor, so an interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before control passes to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
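The scheduling rules above can be sketched as a small model (illustrative only, not any particular RTOS; here a lower number means higher priority, an assumption of this sketch):

```c
/* Illustrative model of the scheduler's rule: pick the
   highest-priority task that is not blocked, preempting the
   currently running task if necessary. */
enum state { RUNNING, READY, BLOCKED };
struct task { int priority; enum state st; };

/* Returns the index of the task to run, or -1 if all are blocked
   (a real scheduler would then spin, waiting for an interrupt). */
int schedule(struct task t[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (t[i].st == BLOCKED)
            continue;                        /* blocked tasks never run */
        if (best < 0 || t[i].priority < t[best].priority)
            best = i;
    }
    if (best >= 0) {
        for (int i = 0; i < n; i++)          /* preempt the running task */
            if (t[i].st == RUNNING)
                t[i].st = READY;
        t[best].st = RUNNING;
    }
    return best;
}
```

Calling `schedule` again after a higher-priority task unblocks models the preemptive behavior described above; a non-preemptive RTOS would only call it when the running task blocks.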
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases, since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
o DRAMs must be refreshed because they store values as charge, which can leak away. A single refresh request can refresh an entire row of the DRAM
o CAS-before-RAS refresh
it is a special quick refresh mode
This mode is initiated by setting CAS' to 0 first, then RAS' to 0
It causes the current memory row to get refreshed and the corresponding counter to be updated
The memory controller is a logic circuit external to the DRAM that performs CAS-before-RAS refresh to the entire memory at the required rate
Page mode
o Developed to improve the performance of DRAM
o Useful to access several locations in the same region of the memory
o In page-mode access, one row address and several column addresses are supplied
o RAS' is held down while CAS' is strobed to signal the arrival of successive column addresses
o It is typically supported for both reads and writes
o EDO (extended data out) is an improved version of page mode. Here the data are held valid until the falling edge of CAS', rather than its rising edge as in page mode
Synchronous DRAMs
o It was developed to improve the performance of DRAMs by introducing a clock
o Changes to the inputs and outputs of the DRAM occur on clock edges
RAMs for specialized applications
o Video RAM
o RAMBUS
Video RAM
o It is designed to speed up video operations
o It includes a standard parallel interface as well as a serial interface
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor
RAMBUS
o It offers high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems, since a great deal of the code, and perhaps some data, does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when the ROMs are installed in some quantity; field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While it uses the floating-gate principle, it is designed such that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming, allowing it to be programmed in a typical system
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM due to two reasons
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus, used in microcontroller-based systems
CAN (controller area network), developed for automotive electronics; it provides megabit rates and can handle large numbers of devices
Echelon LON network, used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition. The actual data transfer takes place between the start and stop conditions
It is a well-known bus for linking a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master. Other nodes may act as slaves that only respond to requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it is comprised of a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave, 1 for reading from slave to master
The bus transaction is opened by a start signal and completed by a stop signal. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
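These rules can be sketched in a bit-banged style (a hedged illustration: plain variables stand in for the open-collector SDL/SCL lines, and all names are hypothetical):

```c
#include <stdint.h>

int sdl = 1, scl = 1;   /* idle bus: both lines high */

/* First byte of a transaction: the 7-bit address plus the direction
   bit (0 = master writes to slave, 1 = master reads from slave). */
uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}

/* Start: a 1-to-0 transition on SDL while SCL stays high. */
void i2c_start(void)
{
    scl = 1;
    sdl = 1;
    sdl = 0;            /* falling edge on SDL with SCL high = START */
}

/* Stop: a 0-to-1 transition on SDL while SCL stays high. */
void i2c_stop(void)
{
    scl = 1;
    sdl = 0;
    sdl = 1;            /* rising edge on SDL with SCL high = STOP */
}
```

For example, a write to a device at address 0x50 would begin with `i2c_start()` followed by the address byte 0xA0.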
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It actually defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
Since it uses bit-serial transmission, CAN can run at rates of 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus. Many details of CAN and the I2C bus are similar
Each node on the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeros (there are at least three bit fields between data frames)
The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between
The data field is from 0 to 64 bits, depending on the value given in the control field
A cyclic redundancy check (CRC) is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier where it tried to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row
The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
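The arbitration rule described above can be sketched with a small simulation (illustrative only; real CAN controllers do this in hardware, bit by bit):

```c
#include <stdint.h>

/* Sketch of CSMA/AMP arbitration over the 11-bit identifier field.
   The bus is wired-AND: a dominant 0 overrides a recessive 1. A node
   that sends a 1 but hears a 0 drops out, so the lowest identifier,
   i.e. the highest priority, wins. This sketch supports up to 8
   competing nodes. */
uint16_t can_arbitrate(const uint16_t ids[], int n)
{
    int active[8];
    for (int i = 0; i < n; i++)
        active[i] = 1;

    for (int bit = 10; bit >= 0; bit--) {          /* MSB first */
        int bus = 1;                               /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i] && !((ids[i] >> bit) & 1))
                bus = 0;                           /* someone sent a dominant 0 */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1) && bus == 0)
                active[i] = 0;                     /* heard 0 while sending 1: stop */
    }
    for (int i = 0; i < n; i++)
        if (active[i])
            return ids[i];                         /* sole remaining transmitter */
    return 0;
}
```

Note that losing nodes simply retry in the next arbitration round; no data is destroyed on the bus.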
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)
The data terminal equipment (DTE) can be a PC, serial printer, or plotter; the data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is also added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended
Parity bit: it is added for error checking on the receiver side
Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmissions; it is also known as handshake. It can be of hardware or software type
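The framing described above (start bit, data bits, parity, stop bit) can be sketched as a frame builder. This is a hedged illustration with hypothetical names, assuming 7 data bits, even parity, and one stop bit:

```c
#include <stdint.h>

/* Build the bit sequence for one asynchronous character frame:
   start(0), 7 data bits LSB first, even-parity bit, stop(1).
   Returns the number of bit periods written (10 for this format). */
int uart_frame(uint8_t ch, int bits[11])
{
    int n = 0, ones = 0;
    bits[n++] = 0;                       /* start bit */
    for (int i = 0; i < 7; i++) {
        int b = (ch >> i) & 1;
        ones += b;
        bits[n++] = b;                   /* data bits, LSB first */
    }
    bits[n++] = ones & 1;                /* even parity: total 1s becomes even */
    bits[n++] = 1;                       /* stop bit: line returns to idle */
    return n;
}
```

The receiver resynchronizes on the start bit of every character, which is why no separate clock line is needed.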
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin
For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section
The receive section receives data, converts it from serial to parallel form, and gives the data to the processor
The transmit section takes data from the processor and converts it from parallel format to serial format
It also adds the start, stop, and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 uses data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device that sent the request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority
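The PI/PO propagation can be sketched as a small simulation (a simplified, hypothetical model of the chain logic above, ignoring the VAD transfer):

```c
/* Daisy-chain acknowledge propagation: the CPU's interrupt
   acknowledge enters device 0 at PI. A requesting device that sees
   PI=1 captures the acknowledge and forces its PO to 0; a
   non-requesting device passes PI through to PO unchanged.
   Returns the index of the device that captures the acknowledge,
   or -1 if no device is requesting. */
int daisy_chain_grant(const int requesting[], int n)
{
    int pi = 1;                       /* acknowledge arriving from the CPU */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;                 /* PI=1 and requesting: PO=0, captured */
        /* not requesting: PO = PI, acknowledge ripples to the next device */
    }
    return -1;
}
```

Because the acknowledge enters at device 0 and is consumed by the first requester it reaches, position in the chain alone determines priority.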
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Owing to its late introduction, it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/sec, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function-queue scheduling
4 RTOS
Choosing an Architecture: the best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
o Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
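The polling loop described above can be sketched as follows (a hedged illustration: the devices, flags, and service routines are all hypothetical stand-ins for real hardware polling):

```c
/* Round-robin sketch: the main loop polls each device in turn and
   services whichever needs attention. No interrupts, no priorities;
   position in the loop fixes the service order. */
int a_pending = 0, b_pending = 0;   /* stand-ins for device status flags */
int a_served = 0, b_served = 0;

void service_a(void) { a_served++; a_pending = 0; }
void service_b(void) { b_served++; b_pending = 0; }

/* One trip around the loop; a real system runs this forever:
   while (1) { round_robin_pass(); } */
void round_robin_pass(void)
{
    if (a_pending)
        service_a();
    if (b_pending)
        service_b();
}
```

The worst-case latency for any device is one full trip around the loop with every other device needing service, which is exactly the fragility listed in the cons below.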
Round Robin
Pros: simple, no shared data, no interrupts
Cons
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts a) service hardware and b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
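The ISR/flag split described above can be sketched as follows (hypothetical names; in a real system the ISR would be invoked by hardware, and the constant sample value here is purely illustrative):

```c
/* Round robin with interrupts: the ISR does the urgent hardware
   work and sets a flag; the main loop does the follow-up
   processing at task-code priority. */
volatile int adc_flag = 0;
volatile int adc_raw  = 0;
int processed = 0;

void adc_isr(void)                 /* urgent part, run at interrupt priority */
{
    adc_raw = 42;                  /* grab the sample immediately */
    adc_flag = 1;                  /* tell the main loop there is work */
}

void main_loop_pass(void)          /* lower-priority follow-up work */
{
    if (adc_flag) {
        adc_flag = 0;
        processed = adc_raw * 2;   /* illustrative processing step */
    }
}
```

Note that `adc_flag` and `adc_raw` are shared between the ISR and task code, so this architecture is exactly where the shared-data problem from Unit V can appear.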
Round Robin with Interrupts
Pros
o Still relatively simple
o Hardware timing requirements better met
Cons
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g. ABACAD)
o Improves response of A
o Increases latency of other tasks
o Move some task code to an interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling
Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons
o Worse response time for lower-priority code (no guarantee it will actually run)
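The queue of function pointers can be sketched as follows (a hedged, FIFO-only illustration; a priority-ordered insert could replace the FIFO to get the pros listed above):

```c
/* Function-queue scheduling sketch: ISRs enqueue function pointers;
   the main routine dequeues and calls them one at a time. */
#define QSIZE 8
typedef void (*task_fn)(void);
static task_fn queue[QSIZE];
static int head = 0, tail = 0;

int enqueue(task_fn f)                 /* called from an ISR */
{
    int next = (tail + 1) % QSIZE;
    if (next == head)
        return -1;                     /* queue full, request dropped */
    queue[tail] = f;
    tail = next;
    return 0;
}

void run_one(void)                     /* called repeatedly by the main loop */
{
    if (head != tail) {
        task_fn f = queue[head];
        head = (head + 1) % QSIZE;
        f();                           /* execute the queued follow-up work */
    }
}

/* Hypothetical task used to demonstrate the queue. */
int demo_runs = 0;
void demo_task(void) { demo_runs++; }
```

In a real system the enqueue would run with interrupts disabled (or use an atomic index update), since the queue itself is shared data between ISRs and the main loop.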
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally come with vendor tools
Cons
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games, round-robin scheduling arranges to have all teams or players take turns playing each other, with the winner emerging from the succession of events.
A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute, and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round-robin stories, with each person posting the next part of the story as part of an online conference thread.
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks.
The name of the algorithm comes from the round-robin principle known from other fields, where each person takes an equal share of something in turn.
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.
Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.
Job1 = Total time to complete 250 ms (quantum 100 ms)
1. First allocation = 100 ms
2. Second allocation = 100 ms
3. Third allocation = 100 ms, but job1 self-terminates after 50 ms
4. Total CPU time of job1 = 250 ms
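The allocation arithmetic above can be checked with a small Python sketch (illustrative only; the function name is invented):

```python
def round_robin_slices(total_ms, quantum_ms):
    """Return the list of CPU allocations a job receives under round robin."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        run = min(quantum_ms, remaining)   # a full quantum, or whatever is left
        slices.append(run)
        remaining -= run
    return slices

# Job1 from the example: 250 ms of work with a 100 ms quantum
print(round_robin_slices(250, 100))  # [100, 100, 50]
```

The three slices sum to the 250 ms total, matching steps 1-4 of the worked example.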
Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.
Data packet scheduling
In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come first-served queuing.
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
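The work-conserving behaviour described here — empty flows are simply skipped — can be sketched in Python. This is an illustrative model, not real router code; the flow names and packet labels are invented.

```python
from collections import deque

def round_robin_transmit(flows):
    """flows: dict of flow-id -> deque of packets.
    Returns packets in round-robin order, skipping empty flows
    (work-conserving: no slot is wasted on a flow with nothing to send)."""
    order = list(flows)
    sent = []
    while any(flows[f] for f in order):
        for f in order:
            if flows[f]:
                sent.append((f, flows[f].popleft()))
    return sent

flows = {"A": deque(["a1", "a2"]), "B": deque(["b1"])}
print(round_robin_transmit(flows))  # [('A', 'a1'), ('B', 'b1'), ('A', 'a2')]
```

Once flow B runs dry, flow A's remaining packet goes out immediately instead of the link idling through B's turn.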
Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another: a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.
If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.
In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as Token Ring, or by polling or resource reservation from a central control station.
In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take much longer to transmit a given amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait until the channel conditions improve, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.
Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can be built using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
Throughput - the number of processes that complete their execution per unit time
Latency, specifically:
o Turnaround - the total time between submission of a process and its completion
o Response time - the amount of time from when a request is submitted until the first response is produced
Fairness / Waiting time - equal CPU time for each process (or, more generally, times appropriate to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency), so a scheduler implements a suitable compromise. Preference is given to any one of the above concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates which processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between IO-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory. [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms. In these cases, special-purpose job scheduler software is typically used to assist these functions, in addition to any underlying admission scheduling support in the operating system.
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process that has a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an IO interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (IO scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order in which they arrive in the ready queue.
o Since context switches occur only upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal
o Throughput can be low, since long processes can hog the CPU
o Turnaround time, waiting time, and response time can be high, for the same reason
o No prioritization occurs, so this system has trouble meeting process deadlines
o The lack of prioritization means that as long as every process eventually completes, there is no starvation; in an environment where some processes might not complete, there can be starvation
o It is based on queuing
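The FCFS waiting-time behaviour can be made concrete with a short Python sketch (illustrative, with invented job values; the classic "convoy effect" shows up when a long job arrives first):

```python
def fcfs_metrics(arrivals_bursts):
    """For (arrival, burst) pairs given in arrival order, return each
    process's (waiting time, turnaround time) under first-come first-served."""
    t = 0
    out = []
    for arrival, burst in arrivals_bursts:
        start = max(t, arrival)          # wait for the CPU to free up
        finish = start + burst
        out.append((start - arrival, finish - arrival))
        t = finish
    return out

# One long job ahead of two short ones: the short jobs inherit huge waits
print(fcfs_metrics([(0, 24), (1, 3), (2, 3)]))  # [(0, 24), (23, 26), (25, 28)]
```

The two 3-unit jobs wait 23 and 25 units behind the 24-unit job — exactly the "long processes can hog the CPU" problem noted above.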
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges the processes with the least estimated processing time remaining next in the queue. This requires advance knowledge of, or an estimate of, the time required for each process to complete.
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process at a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios.
Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than with FIFO, however, since no process has to wait for the termination of the longest process.
No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.
Starvation is possible, especially in a busy system with many small processes being run.
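The preemption described above can be simulated one time unit at a time. This is an illustrative Python model with invented job names and lengths, not a real scheduler:

```python
def srtf_completion_times(jobs):
    """jobs: name -> (arrival, burst). Simulates shortest-remaining-time-first
    in unit time steps; returns the completion time of each job."""
    remaining = {n: b for n, (a, b) in jobs.items()}
    done = {}
    t = 0
    while remaining:
        ready = [n for n in remaining if jobs[n][0] <= t]
        if not ready:               # CPU idle until the next arrival
            t += 1
            continue
        n = min(ready, key=lambda x: remaining[x])  # least time remaining wins
        remaining[n] -= 1
        t += 1
        if remaining[n] == 0:
            done[n] = t
            del remaining[n]
    return done

# A 7-unit job is preempted when a 2-unit job arrives at t=2
print(srtf_completion_times({"long": (0, 7), "short": (2, 2)}))
```

The short job finishes at t=4, splitting the long job into two computing blocks, which then completes at t=9 — the extra context switches are the overhead the text mentions.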
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.
Overhead is not minimal, nor is it significant; FPPS has no particular advantage in terms of throughput over FIFO scheduling.
Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.
Deadlines can be met by giving processes with deadlines a higher priority.
Starvation of lower-priority processes is possible when large numbers of high-priority processes are queuing for CPU time.
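Fixed-priority preemption can likewise be traced per time unit. An illustrative Python sketch (job names and priorities invented; lower number = higher priority):

```python
def fpps_trace(jobs, horizon):
    """jobs: name -> (arrival, burst, priority). Returns which job holds the
    CPU at each time unit ('-' when idle) under fixed-priority preemption."""
    remaining = {n: b for n, (a, b, p) in jobs.items()}
    trace = []
    for t in range(horizon):
        ready = [n for n in remaining if jobs[n][0] <= t and remaining[n] > 0]
        if not ready:
            trace.append("-")
            continue
        n = min(ready, key=lambda x: jobs[x][2])  # highest priority wins
        remaining[n] -= 1
        trace.append(n)
    return trace

# The low-priority job is interrupted by a higher-priority arrival at t=1
print(fpps_trace({"lo": (0, 4, 2), "hi": (1, 2, 1)}, 6))
```

The trace shows "lo" running, being preempted for both units of "hi", then resuming — the interruption behaviour described in the first bullet.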
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Average response time is poor; waiting time depends on the number of processes, not the average process length.
Because of high waiting times, deadlines are rarely met in a pure RR system.
Starvation can never occur, since no priority is given. The order of time-unit allocation is based on process arrival time, similar to FCFS.
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs; it is very useful for shared-memory problems.
Overview

Scheduling algorithm         CPU overhead   Throughput   Turnaround time   Response time
First In First Out           Low            Low          High              High
Shortest Job First           Medium         High         Medium            Medium
Priority-based scheduling    Medium         Low          High              High
Round-robin scheduling       High           Medium       Medium            High
Multilevel queue scheduling  High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the program counter reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks, such as string searching; they also generally use a number of different instruction formats of varying lengths.
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow, and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, making CISC unsuitable for high-performance processors.
Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions. The instructions are also chosen so that they can be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r0,[r8]      ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at location 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
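The two byte orders can be demonstrated with Python's standard `struct` module (an illustrative sketch; the word value 0x11223344 is an arbitrary example):

```python
import struct

# The same 32-bit word laid out in memory under the two byte orders
word = 0x11223344
little = struct.pack("<I", word)   # lowest-order byte at the lowest address
big = struct.pack(">I", word)      # lowest-order byte at the highest address

print(little.hex())  # 44332211
print(big.hex())     # 11223344
```

In little-endian mode the byte 0x44 sits at the word's base address; in big-endian mode 0x11 does.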
SHARC PROCESSOR
It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
        R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label:  R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared. STKY bits remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
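The difference between saturating and wrapping overflow can be modelled in Python (an illustrative sketch of the arithmetic only, not the SHARC's actual hardware path; the function names are invented):

```python
def sat_add32(a, b):
    """Saturating signed 32-bit add: clamp to the range limits on overflow."""
    INT32_MAX, INT32_MIN = 2**31 - 1, -2**31
    return max(INT32_MIN, min(INT32_MAX, a + b))

def wrap_add32(a, b):
    """Ordinary wrapping two's-complement 32-bit add, for comparison."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

print(sat_add32(2**31 - 1, 1))   # 2147483647  (clamped at the maximum)
print(wrap_add32(2**31 - 1, 1))  # -2147483648 (wrapped around)
```

Saturation matters in signal processing: a clamped sample distorts the waveform slightly, while a wrapped one jumps from full positive to full negative.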
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address.
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.
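The two I/M address computations above can be contrasted in a small Python model (illustrative only; memory is modelled as a dict and the function names are invented):

```python
def post_modify_fetch(memory, i, m):
    """Post-modify with update: fetch memory[I], then I <- I + M.
    Returns (value, new I) - the pointer advances after the access."""
    return memory[i], i + m

def base_plus_offset_fetch(memory, i, m):
    """Base-plus-offset: fetch memory[I + M]; I is left unchanged."""
    return memory[i + m], i

mem = {100: "x", 101: "y", 102: "z"}
v, i = post_modify_fetch(mem, 100, 1)       # reads mem[100], I becomes 101
print(v, i)   # x 101
v, i = base_plus_offset_fetch(mem, 100, 2)  # reads mem[102], I stays 100
print(v, i)   # z 100
```

Post-modify is what makes sweeping an array one element per instruction cheap; base-plus-offset leaves the base pointer stable for repeated structure accesses.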
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
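Bit-reversal addressing exists because the radix-2 FFT consumes its input in bit-reversed index order. A Python sketch of the address transformation (illustrative; the hardware does this for free on each access):

```python
def bit_reverse(n, bits):
    """Reverse the low 'bits' bits of n (the bit-reversed address)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (n & 1)   # shift the result left, pull in n's low bit
        n >>= 1
    return r

# Bit-reversed access order for an 8-point FFT (3 address bits)
print([bit_reverse(n, 3) for n in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Index 1 (binary 001) maps to 4 (binary 100), and so on; a DAG in bit-reverse mode generates this sequence directly, avoiding an explicit reordering pass.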
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority ready task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes only the values 0 and 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
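The acquire/release calls listed above can be illustrated with Python's `threading.Semaphore` standing in for the RTOS semaphore API (an illustrative sketch — the resource-pool scenario and names are invented, and a desktop thread library is not an RTOS):

```python
import threading

# Counting semaphore guarding a pool of 2 identical resources
pool = threading.Semaphore(2)
in_use = []
log_lock = threading.Lock()

def worker(name):
    pool.acquire()            # "acquire a semaphore": blocks while the count is 0
    with log_lock:
        in_use.append(name)   # critical work with the resource
    pool.release()            # "release a semaphore": count goes back up

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(in_use))  # ['t0', 't1', 't2', 't3']
```

With an initial count of 2 at most two workers hold the resource at once; a binary semaphore is simply the same object created with a count of 1.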
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message.
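The deposit-and-take pattern can be sketched with Python's standard `queue.Queue` standing in for an RTOS message queue (illustrative; the sensor-reading values are invented, and here the consumer simply takes messages in FIFO order):

```python
import queue

# Queue length is fixed at creation time, as in an RTOS
msg_q = queue.Queue(maxsize=8)

# 'ISR' side: deposit sensor readings into the queue
for reading in (3.3, 3.4, 3.2):
    msg_q.put(reading)

# Task side: drain the queue in arrival order
received = []
while not msg_q.empty():
    received.append(msg_q.get())
print(received)  # [3.3, 3.4, 3.2]
```

A real RTOS adds the waiting lists described above: a task calling get on an empty queue blocks on the receiving-task waiting list until an ISR deposits something.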
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
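The create/write/read/close calls map directly onto an OS pipe. A minimal sketch using Python's `os.pipe` (illustrative; a POSIX pipe stands in for the RTOS pipe object, and the message content is invented):

```python
import os

# Create a pipe: one end for writing, one for reading
read_fd, write_fd = os.pipe()
os.write(write_fd, b"sensor:42\n")   # producer task writes to the pipe
os.close(write_fd)                   # closing the write end signals EOF
data = os.read(read_fd, 1024)        # consumer task reads from the pipe
os.close(read_fd)
print(data)  # b'sensor:42\n'
```

Bytes come out in the order they went in, which is exactly the one-task's-output-feeds-another behaviour described above.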
INSTRUCTION SET ARCHITECTURE (ISA)
The ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o A large uniform register file with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:
o Each instruction controls both the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions, which allow loading/storing up to 16 registers at once, have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features result in high performance, small code size, low power consumption, and low silicon area.
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15: the program counter, but it can also be manipulated as a general-purpose register
R13: used as the stack pointer
R14: has special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and fields representing the execution state of the processor
The I and F flags mask (disable) normal and fast interrupts respectively when set. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
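The CPSR layout described above can be sketched in a few lines of Python. Bit positions (N=31, Z=30, C=29, V=28, I=7, F=6, T=5, mode in bits 4:0) and the mode encodings follow the ARM architecture; the example value is only illustrative.

```python
# Sketch: unpacking the ARM CPSR fields described in the notes.
# Mode encodings per the ARM architecture manual.
MODES = {0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ",
         0b10011: "SVC", 0b10111: "Abort", 0b11011: "Undef"}

def decode_cpsr(cpsr):
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # IRQ disable when set
        "F": (cpsr >> 6) & 1,    # FIQ disable when set
        "T": (cpsr >> 5) & 1,    # THUMB state
        "mode": MODES.get(cpsr & 0x1F, "?"),
    }

# Example value: Z and C set, IRQ/FIQ masked, supervisor mode
print(decode_cpsr(0x600000D3))
```

Decoding a value like this is essentially what a debugger does when it displays the processor status.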
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
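The little- versus big-endian distinction can be made concrete with Python's `struct` module: the same 32-bit value is stored with its bytes in opposite orders.

```python
import struct

# Sketch: the same 32-bit value laid out in memory both ways.
value = 0x12345678
little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first

print(little.hex())   # 78563412
print(big.hex())      # 12345678
```

A little-endian ARM core reading the first byte of `little` sees 0x78, the least significant byte of the word.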
ARM instruction sets
It has two instruction sets: 32-bit ARM and 16-bit THUMB
ARM: the standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is modeled as a variant of the branch instruction
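The ±32 MB branch range follows directly from the encoding: the B/BL instruction carries a signed 24-bit word offset that is shifted left by 2 and added to the PC (which, due to pipelining, is 8 bytes ahead of the branch). A short sketch, assuming that standard ARM encoding:

```python
# Sketch: ARM branch target computation from a 24-bit signed word offset.
def branch_target(pc, offset24):
    if offset24 & 0x800000:        # sign-extend the 24-bit field
        offset24 -= 1 << 24
    return pc + 8 + (offset24 << 2)  # PC reads 8 bytes ahead

max_forward = branch_target(0, 0x7FFFFF)   # largest positive offset
max_backward = branch_target(0, 0x800000)  # most negative offset
print(max_forward)    # 33554436, roughly +32 MB
print(max_backward)   # -33554424, roughly -32 MB
```

So the reachable window is about 2^25 bytes in each direction, i.e. ±32 MB.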
THUMB
o These are 16 bits in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupation
o THUMB is faster when the memory is organized as 16 bits wide; however, ARM is faster when the memory is organized as 32 bits wide
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
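The idea that "code inside the RTOS decides which task runs" can be sketched with a toy priority dispatcher. This is an illustration only, not a real RTOS: the task names, the ready set, and the convention that a lower number means higher priority are all made up for the example, and there is no true preemption here.

```python
# Toy sketch of RTOS-style dispatching: ISRs signal that work is
# pending; a scheduler (not an application loop) picks the most
# urgent ready task to run next.

tasks = {}    # name -> (priority, function); lower number = more urgent
ready = set()
log = []

def create_task(name, priority, fn):
    tasks[name] = (priority, fn)

def signal(name):
    """An ISR would call this to mark a task's work as pending."""
    ready.add(name)

def scheduler_run():
    while ready:
        # pick the highest-priority (lowest-numbered) ready task
        name = min(ready, key=lambda n: tasks[n][0])
        ready.discard(name)
        tasks[name][1]()

create_task("display", 2, lambda: log.append("display"))
create_task("serial", 1, lambda: log.append("serial"))
signal("display")
signal("serial")
scheduler_run()
print(log)   # ['serial', 'display'] - the more urgent task ran first
```

Note that the application code contains no main loop deciding what to do next; that decision lives entirely in `scheduler_run`.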
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared data problem
Shared data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
Scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it goes to the blocked state before the processor goes to the other
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis: determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the pieces and integrates them
o Testing: uncovers bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major constraint
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to sub-optimal results
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
o Typically the serial interface is connected to a video display, while the parallel interface is connected to the microprocessor
RAMBUS
o It offers high performance at a relatively low cost
o It has multiple memory banks that can be addressed in parallel
o It has separate data and control buses
o It is capable of sustained data rates well above 1 Gbyte/sec
ROMs are
o programmed with fixed data
o very useful in embedded systems since a great deal of the code and perhaps
some data does not change over time
o less sensitive to radiation induced errors
Varieties of ROMs
o Factory- (or mask-) programmed ROMs and field-programmable ROMs
o Factory programming is useful only when the ROMs are installed in some quantity. Field-programmable ROMs are programmed in the laboratory using ROM burners
o Field-programmable ROMs can be of two types: antifuse-programmable ROM, which is programmable only once, and UV-erasable PROM, which can be erased using ultraviolet light and then reprogrammed
o Flash memory
It is a modern form of electrically erasable PROM (EEPROM), developed in the late 1980s
While it uses the floating-gate principle, it is designed so that large blocks of memory can be erased all at once
It uses standard system voltage for erasing and programming allowing programming in a typical system
Early flash memories had to be erased in their entirety, but modern devices allow the memory to be erased in blocks, an advantage
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory systems like digital cameras TV set-top boxes cell phones and medical monitoring equipment
Its drawback is that writing a single word to flash may be slower than writing a single word to EEPROM, since an entire block must be read, the word within it updated, and then the block written back
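That read-modify-write cost can be sketched in a few lines. The block size and the dictionary-of-blocks model below are purely illustrative, not taken from any particular flash part; the point is that changing one word moves a whole block's worth of data.

```python
# Sketch: why a single-word flash write is slow - the entire block
# holding the word is read out, modified, erased, and written back.
BLOCK_WORDS = 1024   # illustrative block size

def flash_write_word(blocks, addr, value):
    """Update one word; returns how many words were transferred."""
    b, off = divmod(addr, BLOCK_WORDS)
    block = list(blocks[b])   # 1. read the entire block out
    block[off] = value        # 2. modify the single word
    blocks[b] = block         # 3. erase and write the block back
    return 2 * BLOCK_WORDS    # read + write traffic for one word changed

blocks = {0: [0] * BLOCK_WORDS}
cost = flash_write_word(blocks, 5, 0xABCD)
print(blocks[0][5], cost)   # 43981 2048
```

One changed word cost 2048 word transfers here, which is why block-oriented workloads (cameras, set-top boxes) suit flash far better than word-at-a-time updates.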
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM due to two reasons
DRAM's RAS/CAS multiplexing
Need to refresh
Device interfacing
Some IO devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An IO device typically requires a much smaller range of addresses than a memory so addresses must be decoded much more finely
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus used in microcontroller based systems
CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle large numbers of devices
Echelon LON network used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EEPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus The protocol does not limit the number of master devices on an I2C bus but typically in a microcontroller based system the microcontroller serves as the master Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions
It is a well-known bus for linking a microcontroller with a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard bus (7-bit addressing) and up to 3.4 Mbits/sec for the extended bus (10-bit addressing)
It has two lines: the serial data line (SDL) for data and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition
o 0000000 is used for general call or bus broadcast useful to signal all
devices simultaneously
o 11110XX is reserved for the extended 10 bit addressing scheme
Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
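The address transmission described above can be sketched as a one-line bit manipulation: the 7-bit address is shifted left and the read/write direction bit is appended. The example address 0x50 is hypothetical (a common convention for serial EEPROMs, but any 7-bit value works here).

```python
# Sketch of I2C address-byte framing: 7-bit address + direction bit.
def address_byte(addr7, read):
    """0 = master writes to slave, 1 = master reads from slave."""
    assert 0 <= addr7 < 128
    return (addr7 << 1) | (1 if read else 0)

GENERAL_CALL = 0b0000000   # broadcast address: signals all devices

print(bin(address_byte(0x50, read=False)))   # 0b10100000, write to 0x50
print(bin(address_byte(0x50, read=True)))    # 0b10100001, read from 0x50
```

The receiver strips the low bit to recover the direction and compares the upper seven bits against its own address (or against the general-call address).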
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network(CAN) bus is a robust serial communication bus protocol for real time applications possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it is now found in other applications as well
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
It uses bit-serial transmission; it can run at rates up to 1 Mbps over a twisted-pair connection of up to 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s the bus is said to be in the recessive state when a node transmits a 0 the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame: the data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the 'arbitration field'. The destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between
The data field is from 0 to 64 bytes depending on the value given in the control field
A cyclic redundancy check (CRC) is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier-sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter is left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
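The arbitration rule above can be simulated directly: model the wired-AND bus one identifier bit at a time and drop any node that sends recessive '1' while the bus carries dominant '0'. The identifier values in the demo are arbitrary examples.

```python
# Sketch of CSMA/AMP arbitration on a wired-AND bus: the lowest
# numeric identifier (most dominant bits early) always wins, and the
# winning frame is never corrupted.

def arbitrate(identifiers, bits=11):
    contenders = set(identifiers)
    for i in reversed(range(bits)):              # MSB is sent first
        sent = {ident: (ident >> i) & 1 for ident in contenders}
        bus = min(sent.values())                 # wired-AND: any 0 wins
        # nodes that sent recessive 1 but hear dominant 0 stop sending
        contenders = {ident for ident, b in sent.items() if b == bus}
    assert len(contenders) == 1                  # CAN ids must be unique
    return contenders.pop()

print(arbitrate([0x300, 0x0A5, 0x1FF]))   # 165: lowest id, highest priority
```

Because the loser simply falls silent, no bandwidth is wasted on collisions, which is the key difference from Ethernet-style CSMA/CD.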
Remote frame: a remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame transmission can resume.
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row.
The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect data terminal equipment (DTE) to data circuit-terminating equipment (DCE)
Data terminal equipment (DTE) can be a PC, serial printer, or plotter; data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions at once
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a lsquostartrsquo bit and suffixed by a lsquostoprsquo bit In addition a parity check bit is also added useful for error check on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended
Parity bit: it is added for error checking on the receiver side
Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmissions, also known as handshaking. It can be hardware or software based
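The framing parameters above can be tied together in a short sketch that builds one asynchronous character frame: start bit, data bits sent LSB first, an optional even-parity bit, then the stop bit(s). The defaults chosen (8 data bits, 1 stop bit, even parity) are just one common configuration.

```python
# Sketch of RS232/UART asynchronous character framing.
def uart_frame(byte, data_bits=8, stop_bits=1, parity=True):
    bits = [0]                                       # start bit
    data = [(byte >> i) & 1 for i in range(data_bits)]  # LSB first
    bits += data
    if parity:
        bits.append(sum(data) % 2)                   # even-parity bit
    bits += [1] * stop_bits                          # stop bit(s)
    return bits

frame = uart_frame(ord("A"))   # 'A' = 0x41 = 0b01000001
print(frame)   # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

No clock is transmitted; the receiver resynchronizes on each start bit and samples the following bits at the agreed data rate, which is why both sides must be configured with the same parameters.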
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is with respect to local ground; hence RS232 uses unbalanced transmission
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section
The receive section receives data, converts it from serial to parallel form, and gives the data to the processor
The transmit section takes data from the processor and converts it from parallel to serial format
It also adds the start, stop, and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half-duplex communication is possible; with two twisted pairs, full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation.
The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device whose request the CPU acknowledges accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the one with the highest priority that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal first. The farther the device is from the first position, the lower is its priority
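The PI/PO rule above reduces to a simple scan down the chain, which a short sketch makes explicit. The list-of-booleans model is illustrative; device 0 is the one closest to the CPU.

```python
# Sketch of daisy-chain interrupt priority: the acknowledge ripples
# from the CPU down the chain; the first requesting device keeps it
# (PO = 0), so position in the chain determines priority.

def daisy_chain(pending):
    """pending[i] is True if device i has an interrupt request.
    Returns the index of the device whose VAD goes on the bus."""
    pi = 1                      # CPU asserts interrupt acknowledge
    for i, requesting in enumerate(pending):
        if pi and requesting:
            return i            # PI=1 and requesting: sets PO=0, wins
        # not requesting: acknowledge passes through (PO = PI)
    return None                 # no device was requesting

print(daisy_chain([False, True, True]))   # 1: nearest requester wins
```

Note how device 2's request is ignored until device 1's is serviced, which is exactly the "farther from the first position, lower priority" behavior described above.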
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/sec in the original standard and about 8 Mbytes/sec with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines
In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.
The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (e.g., multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
The Commodore PET/CBM range of personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 computers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture
The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
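The loop described above can be sketched in a few lines of Python (a minimal illustration, not from the source; the Device class and its methods are hypothetical stand-ins for real device I/O):

```python
class Device:
    """Hypothetical polled device used only for this illustration."""
    def __init__(self, name, pending=False):
        self.name, self.pending, self.log = name, pending, []
    def needs_service(self):
        return self.pending
    def service(self):
        self.log.append(self.name)   # stand-in for real device handling
        self.pending = False

def round_robin(devices, cycles=1):
    # No interrupts, no priorities: service order is simply loop position.
    for _ in range(cycles):
        for dev in devices:
            if dev.needs_service():
                dev.service()
```

Note that a device's worst-case wait is one full traversal of the loop, which is exactly the weakness listed in the cons below.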
Round Robin
Pros: Simple, no shared data, no interrupts
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities
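The flag-passing pattern above can be sketched as follows (an illustrative Python model, not from the source; the ISR and flag names are hypothetical, and in real firmware the flags would be volatile variables set from hardware interrupt context):

```python
serial_flag = False
button_flag = False

def serial_isr():
    """Runs at interrupt priority: do only the urgent hardware work,
    then set a flag so the main loop does the rest later."""
    global serial_flag
    # ...read the hardware FIFO here (omitted in this sketch)...
    serial_flag = True

def main_loop_once(log):
    """One pass of the main routine: check flags, do follow-up work
    at task priority."""
    global serial_flag, button_flag
    if serial_flag:
        serial_flag = False
        log.append("serial follow-up")
    if button_flag:
        button_flag = False
        log.append("button follow-up")
```

The key property is that hardware is serviced immediately in the ISR while the lengthy follow-up runs later, so interrupt timing requirements are met even though all task code still shares one priority level.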
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g. ABABAD): improves the response of A but increases the latency of other tasks
o Move some task code into the interrupt: decreases the response time of lower priority interrupts, but it may not be possible to ensure the lower priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower priority code (no guarantee it will actually run)
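A minimal sketch of the function-queue idea (illustrative only; the names enqueue/run_one are hypothetical). Interrupts append (priority, function) pairs; the main routine is free to pick the highest-priority entry rather than serving FIFO:

```python
task_queue = []   # shared between "ISRs" and the main routine

def enqueue(priority, fn):
    """Called from an ISR: record follow-up work as a function pointer.
    Lower number = higher priority in this sketch."""
    task_queue.append((priority, fn))

def run_one():
    """Called from the main routine: run the highest-priority pending
    function. Returns False when the queue is empty."""
    if not task_queue:
        return False
    task_queue.sort(key=lambda entry: entry[0])  # any policy could go here
    _, fn = task_queue.pop(0)
    fn()
    return True
```

Because the main routine chooses the ordering, the highest-priority task waits at most for the longest single function to finish, which is the pro listed above; a real implementation would also need to disable interrupts around queue manipulation.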
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst case response time for the highest priority function is zero
o The system's high priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
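The allocation arithmetic above can be reproduced with a small helper (an illustrative sketch, not from the source): with a 100 ms quantum, a 250 ms job needs three time slices, the last only partially used.

```python
def slices_used(total_ms, quantum_ms):
    """Return the list of CPU-time slices a job consumes under
    round-robin with a fixed quantum."""
    slices, remaining = [], total_ms
    while remaining > 0:
        used = min(quantum_ms, remaining)  # last slice may be partial
        slices.append(used)
        remaining -= used
    return slices
```

For job1 this yields slices of 100, 100 and 50 ms, summing to the full 250 ms of CPU time.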
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer switch or router that provides round-robin scheduling has a separate queue for every data flow where a data flow may be identified by its source and destination address The algorithm lets every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order The scheduling is work-conserving meaning that if one flow is out of packets the next data flow will take its place Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
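The work-conserving behaviour described above can be sketched as follows (illustrative only; flow names and the rr_transmit helper are hypothetical). Each active flow sends one packet per turn, and empty flows are skipped so the link never idles while any queue holds a packet:

```python
from collections import deque

def rr_transmit(flows):
    """flows: dict mapping flow name -> deque of packets.
    Returns the packets in the order they are sent on the shared link."""
    order = list(flows)          # fixed round-robin order over the flows
    sent = []
    while any(flows[f] for f in order):
        for f in order:
            if flows[f]:                      # skip flows with no packets
                sent.append(flows[f].popleft())
    return sent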
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-Robin Scheduling in UNIX: the same round-robin concept applies, and such a scheduler can be built using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput - number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives.
In real-time environments such as embedded systems for automatic control in industry (for example robotics) the scheduler also must ensure that processes can meet deadlines this is
crucial for keeping the system stable Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates which processes are to run on a system and the degree of concurrency to be supported at any one time, i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory. [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes are to be executed (allocated a CPU) next following a clock interrupt an IO interrupt an operating system call or another form of signal Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice and these are very short This scheduler can be preemptive implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process or non-preemptive (also known as voluntary or co-operative) in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access ) 35G cellular system channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined by channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimates of, the time required for a process to complete.
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
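The preemption behaviour described above can be sketched as a unit-time simulation (illustrative only; job tuples and the srt_schedule helper are hypothetical):

```python
def srt_schedule(jobs):
    """jobs: list of (name, arrival_time, burst_time).
    Simulates shortest-remaining-time scheduling one time unit at a
    time and returns a dict of finish times."""
    remaining = {n: b for n, a, b in jobs}
    arrival = {n: a for n, a, b in jobs}
    finish, t = {}, 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= t]
        if not ready:
            t += 1                       # CPU idles until next arrival
            continue
        n = min(ready, key=lambda n: remaining[n])  # least work left wins
        remaining[n] -= 1                # run the chosen job for one unit
        t += 1
        if remaining[n] == 0:
            del remaining[n]
            finish[n] = t
    return finish
```

In the test below a 1-unit job arriving at time 1 preempts a 4-unit job that started at time 0, finishing at time 2 while the long job is pushed back to time 5, which is exactly the favouring of short work (and the starvation risk for long work) described above.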
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
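The selection rule for fixed-priority preemptive scheduling reduces to "the highest-priority ready process always gets the CPU", which can be sketched in one function (illustrative; the pick_next name and tuple format are hypothetical):

```python
def pick_next(ready):
    """ready: list of (name, priority) pairs, higher number = higher
    priority. Returns the process that should hold the CPU, or None.
    An arriving higher-priority process simply wins the next selection,
    which is what preempts the running one."""
    if not ready:
        return None
    return max(ready, key=lambda p: p[1])[0]
```

The starvation problem noted above is visible here: as long as any high-priority entry stays in the ready list, a low-priority entry is never returned.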
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS and longer processes are completed faster than in SJF.
Poor average response time: waiting time is dependent on the number of processes, not the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the program counter reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmers interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two separate memories provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles
leading to unsuitability for high performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line.
Example
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
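The two byte orderings can be made concrete with a small helper (an illustrative sketch, not from the source; the function name is hypothetical) that assembles a 32-bit word from the four bytes at addresses 4n .. 4n+3:

```python
def word_from_bytes(b0, b1, b2, b3, little_endian=True):
    """b0 is the byte at the lowest address. In little-endian mode it
    lands in the low-order bits of the word; in big-endian mode it
    lands in the high-order bits."""
    order = (b0, b1, b2, b3) if little_endian else (b3, b2, b1, b0)
    return order[0] | order[1] << 8 | order[2] << 16 | order[3] << 24
```

For the bytes 0x78, 0x56, 0x34, 0x12 stored at increasing addresses, little-endian mode reads the word 0x12345678; storing them in the reverse order gives the same word in big-endian mode.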
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
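The clamping behaviour can be sketched for 32-bit values as follows (an illustrative model of saturation in general, not SHARC-exact arithmetic):

```python
INT32_MAX, INT32_MIN = 2**31 - 1, -(2**31)

def sat_add32(a, b):
    """Saturating signed 32-bit add: on overflow the result clamps to
    the maximum (or minimum) representable value instead of wrapping."""
    s = a + b
    return max(INT32_MIN, min(INT32_MAX, s))
```

So adding 1 to the largest positive value stays at the largest positive value, which is usually the right behaviour for signal samples, where wrap-around would produce a huge audible or visible glitch.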
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers used to control loading and storing, called data address generators (DAGs), one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAGI while registers 8 through 15 belong to DAG2
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
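Post-modify and circular-buffer addressing can be modeled in C to show what the DAG computes each cycle; the function below is an illustrative sketch, not actual SHARC code:

```c
/* Model of DAG circular-buffer post-modify addressing: the index register
   value i is used for the current access, then advanced by the modifier m
   and wrapped at the buffer length len, as the DAG does in hardware. */
unsigned circ_next(unsigned i, int m, unsigned len)
{
    int next = ((int)i + m) % (int)len; /* C's % may leave this negative */
    if (next < 0)
        next += (int)len;               /* wrap back into the buffer */
    return (unsigned)next;
}
```

With an 8-entry buffer, stepping from index 6 by +3 wraps to index 1, and stepping from index 0 by -1 wraps to index 7 — exactly the sweep pattern a digital filter needs over its delay line.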
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the sum of the interrupt latency and the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest priority ready task (preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher priority task is ready, the time to restore the CPU context of the highest priority task, and the time to execute the return-from-interrupt instruction
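These quantities can be illustrated with a small worked calculation; the timing figures below are invented for illustration and do not describe any particular CPU:

```c
/* Assumed figures, all in microseconds (illustrative only). */
#define MAX_INT_DISABLE_TIME 10 /* longest stretch with interrupts disabled */
#define START_ISR_TIME        2 /* time to reach the first ISR instruction */
#define SAVE_CONTEXT_TIME     8 /* time to save the CPU registers */

/* Interrupt latency = max disable time + time to start the ISR. */
int interrupt_latency_us(void)
{
    return MAX_INT_DISABLE_TIME + START_ISR_TIME;
}

/* Response time in a preemptive kernel = latency + context save time. */
int interrupt_response_us(void)
{
    return interrupt_latency_us() + SAVE_CONTEXT_TIME;
}
```

With these assumed numbers the latency is 12 microseconds and the preemptive response time is 20 microseconds.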
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: a counting semaphore, whose value can be greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
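The acquire/release semantics behind these calls can be sketched as a toy counting semaphore in C. This models the bookkeeping only: a real RTOS would move the calling task to the blocked state instead of returning failure, and the names here are illustrative:

```c
/* Toy counting semaphore: acquire decrements the count if it is positive,
   release increments it. A binary semaphore is the special case where the
   count is initialized to 1. */
typedef struct { int count; } semaphore_t;

void sem_create(semaphore_t *s, int initial) { s->count = initial; }

int sem_acquire(semaphore_t *s) /* 0 on success, -1 if the caller would block */
{
    if (s->count > 0) { s->count--; return 0; }
    return -1;
}

void sem_release(semaphore_t *s) { s->count++; }
```

A binary semaphore created with an initial count of 1 lets exactly one task into a critical section; a second acquire fails (in a real RTOS, blocks) until the first task releases.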
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
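The "array of mailboxes" view can be sketched as a fixed-size ring buffer in C; waiting lists and blocking are omitted, and all names are illustrative:

```c
#include <string.h>

#define Q_LEN    4   /* queue length, fixed at creation time */
#define MSG_SIZE 16  /* size of one mailbox */

typedef struct {
    char slots[Q_LEN][MSG_SIZE]; /* the mailboxes */
    int head, tail, count;
} msg_queue;

int queue_send(msg_queue *q, const char *msg)
{
    if (q->count == Q_LEN) return -1;        /* queue full */
    strncpy(q->slots[q->tail], msg, MSG_SIZE - 1);
    q->slots[q->tail][MSG_SIZE - 1] = '\0';
    q->tail = (q->tail + 1) % Q_LEN;         /* advance write mailbox */
    q->count++;
    return 0;
}

int queue_receive(msg_queue *q, char *out)
{
    if (q->count == 0) return -1;            /* queue empty */
    strcpy(out, q->slots[q->head]);
    q->head = (q->head + 1) % Q_LEN;         /* advance read mailbox */
    q->count--;
    return 0;
}
```

In the keyboard application above, for instance, the ISR would call queue_send with each keystroke and a task would call queue_receive to process them in arrival order.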
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
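On a hosted system, the same create/write/read pattern can be tried with a POSIX pipe, which stands in here for the RTOS pipe object (the message string is made up for the example):

```c
#include <unistd.h>
#include <string.h>

/* One task's output fed to another: the producer deposits a message into
   the pipe and the consumer takes it out for further processing. */
int pipe_demo(char *out, size_t outlen)
{
    int fd[2];
    if (pipe(fd) != 0) return -1;            /* create the pipe */
    const char *msg = "sensor:42";
    write(fd[1], msg, strlen(msg) + 1);      /* producer writes */
    read(fd[0], out, outlen);                /* consumer reads */
    close(fd[0]);                            /* close both ends */
    close(fd[1]);
    return 0;
}
```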
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15 ----------- it is the program counter, but it can be manipulated as a general-purpose register
R13 ----------- it is used as the stack pointer
R14 ----------- it has some special significance and is called the link register: when a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively; the T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location; multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem
Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
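A classic instance of the shared-data problem, sketched in C: a 32-bit tick count written by a timer ISR is read by task code. On a 16-bit CPU the read takes two instructions, so an interrupt between them yields a torn value; disabling interrupts around the read makes the section atomic. The disable/enable macros here are stubs standing in for real CPU intrinsics:

```c
#include <stdint.h>

volatile uint32_t tick_count; /* updated by a timer ISR */

/* Stubs for illustration: on real hardware these would be intrinsics
   such as __disable_irq()/__enable_irq() on a Cortex-M. */
#define DISABLE_INTERRUPTS()
#define ENABLE_INTERRUPTS()

uint32_t read_ticks(void)
{
    uint32_t copy;
    DISABLE_INTERRUPTS();  /* start of the atomic section */
    copy = tick_count;     /* the ISR cannot tear this read now */
    ENABLE_INTERRUPTS();   /* end of the atomic section */
    return copy;
}
```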
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
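The rules above can be seen in a small C contrast; the scaling function itself is an arbitrary example:

```c
/* Non-reentrant: state lives in a shared static variable, so if the RTOS
   switches tasks between the store and the return, another task's call
   can overwrite last_value and corrupt the result. */
static int last_value;

int scale_bad(int x)
{
    last_value = x * 10;  /* shared variable used in a non-atomic way */
    return last_value;
}

/* Reentrant: everything lives on the calling task's stack, so any number
   of tasks can be inside this function at the same time. */
int scale_good(int x)
{
    int result = x * 10;  /* private to this call */
    return result;
}
```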
States of RTOS
Running: it means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: it means some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it goes to the blocked state before the other gets the processor
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
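The scheduler's core decision can be sketched in a few lines of C: pick the highest priority task that is not blocked. The state names and task table here are illustrative, not from any particular RTOS:

```c
typedef enum { BLOCKED, READY, RUNNING } task_state;

typedef struct {
    int priority;      /* larger number = higher priority */
    task_state state;
} task;

/* Return the index of the task that should run, or -1 if every task is
   blocked (the RTOS then idles until an interrupt unblocks one). */
int pick_task(const task *tasks, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (tasks[i].state != BLOCKED &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = i;
    return best;
}
```

Note that the scheduler only chooses among tasks that are ready; it never pulls a task out of the blocked state, matching the rules above.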
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements: analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement
Its advantage is It adopts successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
Its fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, such as digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment
Its drawback is that writing a single word in flash may be slower than writing a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back
IO devices
Timers and counters
A/D and D/A converters
Keyboards
LEDs
Displays
Touch screens
COMPONENT INTERFACING
Memory interfacing
Static RAM is simpler to interface to a bus than dynamic RAM for two reasons:
DRAM's RAS/CAS multiplexing
The need to refresh
Device interfacing
Some I/O devices are designed to interface directly to a particular bus, forming GLUELESS INTERFACES. But glue logic is required when a device is connected to a bus for which it is not designed
An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely
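The finer decoding can be illustrated in C; the base address and register range below are hypothetical:

```c
#include <stdint.h>

#define DEV_BASE 0x4000A000u /* hypothetical device base address */
#define DEV_SIZE 0x10u       /* the device occupies only 16 bytes */

/* A large memory chip might be selected by comparing just a few high
   address bits; an I/O device with a handful of registers needs almost
   every address bit compared before its select line can be asserted. */
int device_selected(uint32_t addr)
{
    return addr >= DEV_BASE && addr < DEV_BASE + DEV_SIZE;
}
```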
NETWORKS OF EMBEDDED SYSTEMS
Interconnect networks specialized for distributed embedded computing are
I2C bus used in microcontroller based systems
CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle a large number of devices
Echelon LON network used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I 2 C bus
Philips Semiconductors developed the Inter-IC or I2C bus nearly 30 years ago. It is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EPROMs, flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus The protocol does not limit the number of master devices on an I2C bus but typically in a microcontroller based system the microcontroller serves as the master Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition Actual data transfer is in between start and stop conditions
It is a well-known bus for linking a microcontroller into a system. It is low cost, easy to implement, and of moderate speed: up to 100 kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus
It has two lines: the serial data line (SDL) for data and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL Some nodes may act as masters and the bus may have more than one master Other nodes may act as slaves that only respond to the requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address 7 bits in standard I2C definition and 10 bits in extended I2C definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10 bit addressing scheme
Bus transaction: it is composed of a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start condition and completed by a stop condition. Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
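The address transmission described above can be sketched in C: the first byte of a transaction is just the 7-bit device address followed by the direction bit.

```c
#include <stdint.h>

/* First byte of an I2C bus transaction: the 7-bit device address in the
   upper bits, direction bit in the LSB (0 = master writes to the slave,
   1 = master reads from the slave). */
uint8_t i2c_addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}
```

For a device at address 0x50, a write transaction would begin with 0xA0 and a read transaction with 0xA1.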
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
Since it uses bit-serial transmission, CAN can run at rates of 1 Mbps over a twisted pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus Many of details of CAN and I2C bus are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of the data frame: the data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames)
The first field in the packet contains the packet's destination address and is known as the arbitration field; the destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field with a 1 in between
The data field is from 0 to 64 bytes depending on the value given in the control field
A cyclic redundancy check (CRC)is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
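The arbitration rule can be simulated in C: at each bit time the wired-AND bus carries a 0 if any active node sends a dominant 0, and a node that sent a recessive 1 but hears a 0 drops out. This is a behavioral sketch (limited to 8 contending nodes), not driver code:

```c
#define ID_BITS  11 /* standard CAN identifier length */
#define MAX_NODE 8  /* this sketch supports up to 8 contending nodes */

/* Returns the index of the node whose identifier wins arbitration: the
   one with the lowest (most dominant) identifier. */
int can_arbitrate(const unsigned ids[], int n)
{
    int active[MAX_NODE];
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = ID_BITS - 1; bit >= 0; bit--) {
        unsigned bus = 1;                      /* recessive unless pulled down */
        for (int i = 0; i < n; i++)
            if (active[i]) bus &= (ids[i] >> bit) & 1u;
        for (int i = 0; i < n; i++)            /* recessive senders that hear
                                                  dominant stop transmitting */
            if (active[i] && ((ids[i] >> bit) & 1u) != bus)
                active[i] = 0;
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;               /* the surviving node */
    return -1;
}
```

With identifiers 0x65A, 0x123, and 0x3FF contending, the node sending 0x123 wins, since the lowest identifier has the most dominant bits.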
Remote frame: a remote frame is used to request data from another node. The requester sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value
Error handling: an error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame: an overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row
The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, such as transmitting data to a host PC or interacting with another embedded system for sharing data
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE)
A DTE can be a PC, serial printer, or plotter, and a DCE can be a modem, mouse, digitizer, or scanner
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions at the same time
In RS232, the sender gives out data character by character. The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using RS232 cable a distance of up to 100 meters can be achieved
Possible data rates depend upon the UART chip and the clock used
Communication parameters For the two devices to communicate in a meaningful way the parameter mentioned below should beset on both the systems
Data rate: it represents the rate at which the data communication takes place. PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits: they identify the beginning and the end of the character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended
Parity bit: it is added for error checking on the receiver side
Flow control: it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmission; it is also known as handshaking. It can be hardware or software based
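Putting these parameters together, the framing of one character can be sketched in C for the common 8-data-bit, even-parity, one-stop-bit case. Packing the frame into a single integer is purely for illustration; a real UART shifts the bits out serially:

```c
#include <stdint.h>

/* Build an 11-bit asynchronous frame for one character, bit 0 sent first:
   start bit (0), eight data bits LSB first, even parity bit, stop bit (1). */
uint16_t uart_frame(uint8_t ch)
{
    uint16_t frame = 0;                /* bit 0 is the start bit = 0 */
    unsigned parity = 0;

    for (int i = 0; i < 8; i++) {      /* bits 1..8: data, LSB first */
        unsigned bit = (ch >> i) & 1u;
        frame |= (uint16_t)(bit << (1 + i));
        parity ^= bit;                 /* track the count of 1s */
    }
    frame |= (uint16_t)(parity << 9);  /* bit 9: even parity bit */
    frame |= (uint16_t)(1u << 10);     /* bit 10: stop bit = 1 */
    return frame;
}
```

For 'A' (0x41, which has two 1 bits) the parity bit is 0 and the resulting frame is 0x482.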
RS232 connector configurations
It specifies two types of connectors 9 pin and 25 pin
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is wrt to local ground and hence RS232 used unbalanced transmission
UART chip: universal asynchronous receiver/transmitter chip
It has two sections: a receive section and a transmit section
The receive section receives data, converts it from serial form to parallel form and gives the data to the processor
The transmit section takes data from the processor and converts it from parallel format to serial format
It also adds the start, stop and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first
position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device that sent the request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledge signal on to the next device. This procedure is defined as follows:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 at its PO output. If the device has no pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first; the farther a device is from the first position, the lower its priority
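The PI/PO propagation described above can be sketched as a small simulation. This Python function is an illustration only (daisy_chain_ack is an invented name): each device either forwards the acknowledge when idle or blocks it when requesting, and the function reports which device wins the acknowledge.

```python
def daisy_chain_ack(requests):
    """Propagate the interrupt acknowledge through a daisy chain.

    requests[i] is True if device i (index 0 = highest priority) has a
    pending interrupt.  An idle device passes PI through to PO; a
    requesting device drives PO = 0 to block the devices behind it.
    Returns (winner_index_or_None, list_of_PO_outputs).
    """
    pi = 1                        # acknowledge from the CPU enters device 0
    winner = None
    po_outputs = []
    for i, requesting in enumerate(requests):
        if requesting:
            po = 0                # block the acknowledge downstream
            if pi == 1 and winner is None:
                winner = i        # PI = 1 and PO = 0: this device places its VAD
        else:
            po = pi               # idle device forwards the acknowledge
        po_outputs.append(po)
        pi = po                   # this PO feeds the next device's PI
    return winner, po_outputs

# devices 1 and 2 both request; device 1 is closer to the CPU and wins
print(daisy_chain_ack([False, True, True]))   # -> (1, [1, 0, 0])
```

The simulation makes the priority rule concrete: the winner is simply the requesting device nearest the CPU, because everyone behind it sees PI = 0.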
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake and five for bus management) plus eight ground return lines
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols and the like
IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s; due to its late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures:
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 Real-time operating system (RTOS)
Choosing an Architecture: the best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin:
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin Pros: simple, no shared data, no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses:
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts:
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts Adjustments:
o Change the order in which flags are checked (e.g. ABABAD)
o Improves response of A, but increases latency of other tasks
o Move some task code to an interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling Architecture:
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
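The queue mechanics can be sketched as follows. This Python sketch is illustrative only (the class name FunctionQueue is an assumption, and in a real system the posts would come from interrupt service routines): it uses a heap so the main routine always pops the highest-priority posted function rather than strict FIFO.

```python
import heapq
from itertools import count

class FunctionQueue:
    """Sketch of function-queue scheduling: "interrupts" enqueue function
    pointers with a priority; the main routine pops and calls the
    highest-priority function first (not necessarily FIFO)."""

    def __init__(self):
        self._heap = []
        self._seq = count()      # tie-breaker keeps equal priorities FIFO

    def post(self, priority, func):     # called from an "interrupt"
        heapq.heappush(self._heap, (priority, next(self._seq), func))

    def run_pending(self):              # the main routine's loop body
        results = []
        while self._heap:
            _, _, func = heapq.heappop(self._heap)
            results.append(func())
        return results

fq = FunctionQueue()
fq.post(2, lambda: "low-priority follow-up")
fq.post(0, lambda: "urgent follow-up")   # lower number = higher priority
print(fq.run_pending())   # -> ['urgent follow-up', 'low-priority follow-up']
```

The design choice the notes describe is visible here: response time for the top-priority function is bounded by the longest single function, because the scheduler can reorder between calls but cannot preempt one.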
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
Differences from previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems Pros:
o Worst case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin
In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
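The allocation sequence above can be reproduced with a short simulation. This Python sketch is illustrative, not from the notes: it models the ready queue as a deque and charges each job at most one quantum per turn, requeueing any job that has time remaining.

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate round-robin time slicing.

    jobs: {name: burst_time_ms} in arrival order.
    Returns {name: completion_time_ms}.
    """
    queue = deque(jobs.items())
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)      # at most one quantum per turn
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the queue
        else:
            finished[name] = clock                 # job self-terminates
    return finished

# job1 needs 250 ms with a 100 ms quantum: 100 + 100 + 50,
# interleaved with job2's single 100 ms slice
print(round_robin({"job1": 250, "job2": 100}, 100))
# -> {'job2': 200, 'job1': 350}
```

As in the worked example, job1 still consumes exactly 250 ms of CPU time; the quantum only changes when that time is delivered.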
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer switch or router that provides round-robin scheduling has a separate queue for every data flow where a data flow may be identified by its source and destination address The algorithm lets every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order The scheduling is work-conserving meaning that if one flow is out of packets the next data flow will take its place Hence the scheduling tries to prevent link resources from going unused
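The work-conserving behaviour can be sketched as follows. In this illustrative Python function (the names are made up), each active flow gets one packet per cycle and empty flows are simply skipped, so the shared channel never idles while any queue holds packets.

```python
from collections import deque

def rr_packet_schedule(flows):
    """Work-conserving round-robin over per-flow packet queues.

    flows: {flow_id: [packet, ...]}.  Returns the transmission order
    as (flow_id, packet) pairs.
    """
    queues = {fid: deque(pkts) for fid, pkts in flows.items()}
    order = []
    while any(queues.values()):
        for fid, q in queues.items():
            if q:                        # skip flows that are out of packets
                order.append((fid, q.popleft()))
    return order

# once flow B runs dry, flow A takes its turns: the link stays busy
print(rr_packet_schedule({"A": ["a1", "a2", "a3"], "B": ["b1"]}))
# -> [('A', 'a1'), ('B', 'b1'), ('A', 'a2'), ('A', 'a3')]
```

Note this sketch counts packets, not bytes, which is exactly why plain round robin loses max-min fairness when packet sizes vary; deficit round-robin fixes that by budgeting bytes per turn.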
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same round-robin concept; a round-robin scheduler can also be built using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously)
The scheduler is concerned mainly with:
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency) thus a scheduler will implement a suitable compromise Preference is given to any one of the above mentioned concerns depending upon the users needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes are to be executed (allocated a CPU) next following a clock interrupt an IO interrupt an operating system call or another form of signal Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice and these are very short This scheduler can be preemptive implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process or non-preemptive (also known as voluntary or co-operative) in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
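As a worked illustration of why turnaround and waiting times can be high under FCFS, the following Python sketch (the helper name fcfs_metrics is hypothetical) computes both metrics for jobs that all arrive at time 0, with one long job at the head of the queue hogging the CPU.

```python
def fcfs_metrics(bursts):
    """Compute waiting and turnaround time under FCFS.

    bursts: CPU bursts (ms) of jobs in arrival order, all arriving at t=0.
    Waiting time is the time queued behind earlier jobs; turnaround is
    waiting plus the job's own burst.
    """
    clock = 0
    table = []
    for burst in bursts:
        waiting = clock              # everything ahead must finish first
        clock += burst
        table.append({"waiting": waiting, "turnaround": clock})
    return table

# one 100 ms job ahead of two 10 ms jobs drives everyone's waiting time up
print(fcfs_metrics([100, 10, 10]))
# -> [{'waiting': 0, 'turnaround': 100},
#     {'waiting': 100, 'turnaround': 110},
#     {'waiting': 110, 'turnaround': 120}]
```

The two short jobs together need only 20 ms of CPU yet each waits over 100 ms, which is the convoy effect the notes attribute to FIFO.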
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
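Preemption under shortest remaining time can be sketched with a tick-by-tick simulation. This Python function is an illustration only (srt_schedule is an invented name, and real schedulers are event-driven rather than polled every millisecond): a newly arrived short job preempts a longer running one because, at each tick, the runnable job with the least remaining time is chosen.

```python
def srt_schedule(jobs):
    """Preemptive shortest-remaining-time simulation in 1 ms ticks.

    jobs: {name: (arrival_ms, burst_ms)}.  Returns completion times.
    """
    remaining = {name: burst for name, (arrival, burst) in jobs.items()}
    done, clock = {}, 0
    while remaining:
        ready = [n for n in remaining if jobs[n][0] <= clock]
        if not ready:
            clock += 1                 # CPU idles until the next arrival
            continue
        job = min(ready, key=lambda n: remaining[n])   # preemption point
        remaining[job] -= 1
        clock += 1
        if remaining[job] == 0:
            done[job] = clock
            del remaining[job]
    return done

# "short" arrives at t=2 and preempts "long" (8 ms burst), finishing first
print(srt_schedule({"long": (0, 8), "short": (2, 2)}))
# -> {'short': 4, 'long': 10}
```

The long job is split into two computing blocks (t=0-2 and t=4-10), which is exactly the context-switch overhead and the long-job penalty described above.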
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS and longer processes are completed faster than in SJF
Poor average response time: waiting time is dependent on the number of processes and not on average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC (program counter) reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o The CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o The RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line, starting after the first column
A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r0,[r8]        ; a comment
LABEL   ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture; however, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bits long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word: word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at location 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
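The two byte orderings can be sketched in plain C (this is an illustrative model, not ARM-specific code; the function names are made up for the example):

```c
#include <stdint.h>

/* Extract byte n (0 = lowest address) of a 32-bit word as it would
   appear in memory under each byte ordering. */
uint8_t byte_little_endian(uint32_t word, int n) {
    /* little-endian: lowest-order byte sits at the lowest address */
    return (uint8_t)(word >> (8 * n));
}

uint8_t byte_big_endian(uint32_t word, int n) {
    /* big-endian: highest-order byte sits at the lowest address */
    return (uint8_t)(word >> (8 * (3 - n)));
}
```

For the word 0x11223344, byte 0 is 0x44 in little-endian mode but 0x11 in big-endian mode.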
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating-point-intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
        R1=DM(M0,I0), R2=PM(M8,I8);    ! a comment
LABEL:  R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are placed in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the corresponding ASTAT bits but are not cleared by later operations; STKY bits remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
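Saturating addition can be sketched in C, showing the clamping behavior the ALUSAT mode enables (the function name is illustrative, not a SHARC intrinsic):

```c
#include <stdint.h>

/* Saturating 32-bit signed addition: on overflow, clamp the result to
   the maximum/minimum representable value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen to detect overflow */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

With wraparound arithmetic, INT32_MAX + 1 would become a large negative number; with saturation it stays pinned at INT32_MAX, which is usually the preferable behavior for signal samples.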
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy the sign bit. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
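The logical-versus-arithmetic distinction can be shown in portable C. Since right-shifting a negative signed value is implementation-defined in C, the arithmetic shift below is built by sign extension; the function names are illustrative:

```c
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeroes. */
uint32_t lsr(uint32_t x, int n) {
    return x >> n;
}

/* Arithmetic shift right: vacated high bits are copies of the sign
   bit, implemented portably via complement-and-shift. */
int32_t asr(int32_t x, int n) {
    if (x >= 0)
        return (int32_t)((uint32_t)x >> n);
    return (int32_t)~(~(uint32_t)x >> n);   /* sign-extend negatives */
}
```

Note that asr(-1, n) stays -1 for any n, because the sign bit keeps refilling the word, whereas lsr would drain it to zero.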
The SHARC is a load-store architecture: operands must be loaded into registers before being operated on. The SHARC supplies two special sets of registers used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value; the I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing: the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
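Post-modify addressing combined with a circular buffer can be modeled in C. This is a simplified software sketch of the DAG behavior, not the hardware mechanism itself, and the names are illustrative:

```c
#include <stddef.h>

/* Post-modify-with-update over a circular buffer: the I register
   supplies the address, then is advanced by the modifier M and
   wrapped back into the buffer's range. */
typedef struct {
    size_t base;    /* first address of the circular buffer */
    size_t length;  /* buffer length in words */
    size_t i;       /* index register, as an offset from base */
} dag_t;

/* Return the current address, then post-modify I with circular wrap. */
size_t dag_post_modify(dag_t *dag, size_t m) {
    size_t addr = dag->base + dag->i;
    dag->i = (dag->i + m) % dag->length;
    return addr;
}

/* Sweep n accesses and return the last address generated. */
size_t dag_last_addr(size_t base, size_t len, size_t m, int n) {
    dag_t d = { base, len, 0 };
    size_t a = 0;
    for (int k = 0; k < n; k++)
        a = dag_post_modify(&d, m);
    return a;
}
```

With base 100, length 4, and modifier 3, successive accesses hit 100, 103, 102, 101, 100, ... — the wraparound is what makes circular buffers convenient for FIR filters and similar signal-processing loops.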
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services include memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is essentially an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
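The five calls above can be sketched as operations on a plain counting-semaphore object. No particular RTOS API is implied; the names are illustrative, and a real acquire would block the caller rather than return -1:

```c
/* A minimal counting semaphore object; a binary semaphore is the
   special case where count only takes the values 0 and 1. */
typedef struct {
    int count;
} semaphore_t;

void sem_create(semaphore_t *s, int initial) { s->count = initial; }

/* Acquire: decrement if the count is positive; otherwise the caller
   would block in a real RTOS (modeled here by returning -1). */
int sem_acquire(semaphore_t *s) {
    if (s->count > 0) { s->count--; return 0; }
    return -1;
}

void sem_release(semaphore_t *s) { s->count++; }

int sem_query(const semaphore_t *s) { return s->count; }

/* Binary semaphore guarding a resource: a second acquire must wait. */
int sem_demo(void) {
    semaphore_t s;
    sem_create(&s, 1);
    if (sem_acquire(&s) != 0) return 0;
    int blocked = sem_acquire(&s);   /* count is 0: would block */
    sem_release(&s);
    return (blocked == -1) && (sem_query(&s) == 1);
}
```

Deleting the semaphore is omitted here since the object holds no dynamic resources in this sketch.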
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority task or the first task waiting in the queue takes the message
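The "array of mailboxes" idea can be sketched as a fixed-length FIFO in C. This is an illustrative model, not a specific RTOS API:

```c
#include <string.h>

/* A minimal fixed-length message queue: an array of "mailboxes"
   managed with head/tail indices. */
#define Q_LEN 4
#define MSG_SIZE 16

typedef struct {
    char slots[Q_LEN][MSG_SIZE];
    int head, tail, count;
} msgq_t;

/* Deposit a message; returns -1 if the queue is full. */
int msgq_send(msgq_t *q, const char *msg) {
    if (q->count == Q_LEN) return -1;
    strncpy(q->slots[q->tail], msg, MSG_SIZE - 1);
    q->slots[q->tail][MSG_SIZE - 1] = '\0';
    q->tail = (q->tail + 1) % Q_LEN;
    q->count++;
    return 0;
}

/* Take the oldest message; returns -1 if the queue is empty. */
int msgq_receive(msgq_t *q, char *out) {
    if (q->count == 0) return -1;
    strcpy(out, q->slots[q->head]);
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 0;
}

/* Round-trip one message, as if an ISR deposited a sensor reading
   and a task picked it up; returns 1 on success. */
int msgq_demo(void) {
    msgq_t q;
    char buf[MSG_SIZE];
    memset(&q, 0, sizeof q);
    if (msgq_send(&q, "sensor:42") != 0) return 0;
    if (msgq_receive(&q, buf) != 0) return 0;
    return strcmp(buf, "sensor:42") == 0;
}
```

A real kernel object would additionally block senders on a full queue and receivers on an empty one, using the waiting lists mentioned above.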
PIPES
A pipe is a kernel object used for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
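As a stand-in for the RTOS pipe object, a POSIX pipe shows the same create/write/read/close lifecycle (an assumption for illustration: the text names no specific API, and POSIX `pipe()` is used here only because it is widely available):

```c
#include <unistd.h>
#include <string.h>

/* One task writes bytes into one end of the pipe and another task
   reads them from the other end, in FIFO order. */
int pipe_demo(void) {
    int fd[2];
    char buf[8] = {0};
    if (pipe(fd) != 0) return 0;                  /* create the pipe */
    if (write(fd[1], "data", 4) != 4) return 0;   /* producer's end  */
    if (read(fd[0], buf, 4) != 4) return 0;       /* consumer's end  */
    close(fd[0]);                                 /* close both ends */
    close(fd[1]);
    return strcmp(buf, "data") == 0;
}
```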
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced:
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and small silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15 ----------- it is the program counter but can be manipulated as a general-purpose register
R13 ----------- it is used as a stack pointer
R14 ----------- it has special significance and is called the link register: when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM: it is the standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location; multiple-register load/store operations are carried out via multiple-register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (the default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instructions: in the ARM processor, the branch instructions have the following features
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
THUMB
o These are 16 bits in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they then signal that there is work for the task code to do
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed: in this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem
Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data, and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
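The shared-data problem and its atomic-section fix can be shown in miniature. Suppose an ISR maintains a 64-bit tick count in two 32-bit halves; on a 32-bit CPU, task code needs two reads to fetch it, and an interrupt between the two reads could yield a half-updated value. The enable/disable calls below are platform-specific and only stubbed here:

```c
#include <stdint.h>

/* Shared between an ISR (writer) and task code (reader). */
volatile uint32_t tick_hi, tick_lo;

void disable_interrupts(void) { /* platform-specific; stub */ }
void enable_interrupts(void)  { /* platform-specific; stub */ }

/* Read both halves inside an atomic section so the ISR cannot fire
   between the two accesses. */
uint64_t read_ticks_atomic(void) {
    disable_interrupts();                     /* begin atomic section */
    uint64_t t = ((uint64_t)tick_hi << 32) | tick_lo;
    enable_interrupts();                      /* end atomic section */
    return t;
}

uint64_t ticks_demo(void) {
    tick_hi = 1;      /* as if the ISR had counted past 2^32 */
    tick_lo = 2;
    return read_ticks_atomic();
}
```

Without the atomic section, a task preempted after reading tick_hi but before tick_lo could combine halves from two different counter values.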
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
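The stack rule is the crux. A sketch contrasting a non-reentrant function (static state shared by every caller) with a reentrant one (all state in parameters and locals, which live on the calling task's private stack):

```c
/* NOT reentrant: the static counter is shared by every task, so a
   task switch in the middle of the increment can corrupt it. */
int count_calls_nonreentrant(void) {
    static int calls = 0;
    return ++calls;
}

/* Reentrant: all state is in parameters and locals, so each task
   works on its own private copy on its own stack. */
int sum_reentrant(const int *data, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    return total;
}

int sum_demo(void) {
    int a[3] = {1, 2, 3};
    return sum_reentrant(a, 3);
}
```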
Task states
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor, so an interrupt routine or some other task in the system must send a signal to bring it out of the blocked state; otherwise, the task will be blocked forever
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it blocks before the processor is given to the other
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
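The scheduler's core decision described above can be sketched as "pick the highest-priority task that is not blocked". This is an illustrative model only; a real scheduler also performs the context switch and handles equal priorities per its policy:

```c
/* Task states as described above. */
typedef enum { BLOCKED, READY, RUNNING } task_state_t;

typedef struct {
    task_state_t state;
    int priority;   /* larger value = higher priority (assumed convention) */
} task_t;

/* Return the index of the task that should run, or -1 if all tasks
   are blocked (the scheduler would then idle until an interrupt). */
int schedule(task_t *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (tasks[i].state == BLOCKED)
            continue;   /* blocked tasks never get the processor */
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;
    }
    return best;
}

/* A blocked task is never chosen, no matter how urgent its priority. */
int sched_demo(void) {
    task_t t[3] = { {READY, 1}, {BLOCKED, 9}, {READY, 5} };
    return schedule(t, 3);
}
```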
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the design and integrates the pieces
o Testing uncovers bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering: doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
I2C bus: used in microcontroller-based systems
CAN (controller area network): developed for automotive electronics; it provides megabit rates and can handle a large number of devices
Echelon LON network: used for home and industrial automation
DSPs usually supply their own interconnects for multiprocessing
I2C bus
Philips Semiconductors developed the Inter-IC (I2C) bus nearly 30 years ago; it is a two-wire serial bus protocol
This protocol enables peripheral ICs to communicate with each other using simple communication hardware
Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus
Common devices capable of interfacing to an I2C bus include EEPROM, flash, and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers
The I2C specification does not limit the length of the bus wires, as long as the total capacitance remains under 400 pF
Only master devices can initiate a data transfer on the I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data
All data transfers on an I2C bus are initiated by a start condition and terminated by a stop condition; the actual data transfer takes place between the start and stop conditions
It is a well-known bus for linking microcontrollers into systems. It is low cost, easy to implement, and of moderate speed: up to 100 Kbits/sec for the standard (7-bit) bus and up to 3.4 Mbits/sec for the extended (10-bit) bus
It has two lines: the serial data line (SDL) for data, and the serial clock line (SCL), which indicates when valid data are on the line
Every node in the network is connected to both SCL and SDL. Some nodes may act as masters, and the bus may have more than one master; other nodes may act as slaves that only respond to requests from the masters
It is designed as a multi-master bus: any one of several different devices may act as master at various times. As a result, there is no global master to generate the clock signal on SCL; instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting, to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition
o 0000000 is used for general call, or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is signaled by a start signal and completed by an end signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
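Building the first byte of a transaction follows directly from the address-plus-direction rule above; a small C sketch (the function name is illustrative):

```c
#include <stdint.h>

/* First byte of an I2C transaction: the 7-bit slave address in the
   upper bits and the direction bit in bit 0
   (0 = master writes to slave, 1 = master reads from slave). */
uint8_t i2c_address_byte(uint8_t addr7, int read) {
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}
```

For example, a common serial-EEPROM address of 0x50 becomes 0xA0 for a write and 0xA1 for a read.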
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, possibly carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well
Some important characteristics of the CAN protocol are high-integrity serial data communication, real-time support, data rates of up to 1 Mbit/sec, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupt messages, and distinguish between a permanent failure of a node and temporary errors
Using bit-serial transmission, CAN can run at rates of up to 1 Mbps over a twisted-pair connection of up to 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame
Format of the data frame:
A data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames)
The first field in the packet contains the packet's destination address and is known as the arbitration field; the destination identifier is 11 bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between
The data field is from 0 to 8 bytes (64 bits), depending on the value given in the control field
A cyclic redundancy check (CRC) is sent after the data field for error detection
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit ('1') in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant ('0') value. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP, which is similar to the I2C bus's arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority
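The wired-AND arbitration can be simulated bit by bit in C: each bus bit is the AND of all transmitted bits (0 is dominant), and a node that sends a recessive 1 but hears a dominant 0 drops out, so the lowest identifier wins without corrupting the frame. This is a simplified model for illustration, not a CAN controller implementation:

```c
/* Simulate CSMA/AMP arbitration among n contenders (n <= 8) sending
   their identifiers MSB-first; returns the index of the winner. */
int can_arbitrate(const unsigned *ids, int n, int bits) {
    int alive[8];
    for (int i = 0; i < n; i++) alive[i] = 1;
    for (int b = bits - 1; b >= 0; b--) {
        unsigned bus = 1;                  /* recessive unless pulled low */
        for (int i = 0; i < n; i++)
            if (alive[i] && !((ids[i] >> b) & 1))
                bus = 0;                   /* any dominant 0 wins the bit */
        for (int i = 0; i < n; i++)        /* recessive senders back off */
            if (alive[i] && ((ids[i] >> b) & 1) && bus == 0)
                alive[i] = 0;
    }
    for (int i = 0; i < n; i++)
        if (alive[i]) return i;            /* the surviving transmitter */
    return -1;
}

/* Three nodes contend with 11-bit identifiers; 0x23 is lowest. */
int arb_demo(void) {
    unsigned ids[3] = { 0x65, 0x23, 0x7F };
    return can_arbitrate(ids, 3, 11);
}
```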
Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA).
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).
A DTE can be a PC, serial printer, or plotter; a DCE can be a modem, mouse, digitizer, or scanner.
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions at the same time.
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, useful for error checking on the receiver side.
This mode of communication is called asynchronous communication because no clock signal is transmitted.
The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved.
Possible data rates depend upon the UART chip and the clock used.
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.
Data rate: the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.
Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.
Parity bit: it is added for error checking on the receiver side.
Flow control: useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission; it is also known as handshake. It can be hardware type or software type.
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin.
For the transmission of 1s and 0s, the voltage levels are defined in the standard. The voltage levels are different for data and control signals.
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.
UART chip: universal asynchronous receiver/transmitter chip.
It has two sections: a receive section and a transmit section.
The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.
The transmit section takes data from the processor and converts it from parallel format to serial format.
It also adds the start, stop, and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network.
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half-duplex communication is possible; with two twisted pairs, full duplex can be achieved.
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that can request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows.
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority.
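The PI/PO rules above can be sketched as a single pass down the chain; the helper name `daisy_chain` is illustrative, and `True` in the list means that the device at that position is requesting an interrupt:

```python
def daisy_chain(requests):
    """Propagate the interrupt acknowledge along the chain.
    PI of device 0 is 1; a device with PI=1 that is requesting keeps
    the acknowledge (its PO becomes 0), otherwise it passes PI on
    unchanged. Returns the index of the device that gets served."""
    pi = 1
    served = None
    for i, requesting in enumerate(requests):
        if pi == 1 and requesting:
            served = i       # this device places its VAD on the data bus
            pi = 0           # PO=0 blocks the acknowledge from devices below
        # when not requesting, PO simply equals PI
    return served

print(daisy_chain([False, True, True]))   # device 1 outranks device 2
```

Because the acknowledge signal is consumed by the first requesting device it reaches, priority is fixed purely by physical position in the chain, exactly as the text states.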
The slowest device participates in the control and data transfer handshakes and so determines the speed of the transaction.
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
The Commodore PET/CBM range of personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and the HP 2100 and HP 3000 computers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS
Choosing an Architecture: the best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple; no shared data; no interrupts.
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
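The polling loop above can be sketched in a few lines; this is a hedged illustration (device names and callables are made up for the example), not a real device driver:

```python
def round_robin(devices):
    """One pass of a round-robin main loop: poll each device in a fixed
    order and service it only if it needs attention. `devices` is a list
    of (name, needs_service, service) triples of callables."""
    serviced = []
    for name, needs_service, service in devices:
        if needs_service():      # poll the device status
            service()            # do the work right here, no priorities
            serviced.append(name)
    return serviced

# hypothetical devices: the button needs service this pass, the display
# does not, so only the button is serviced
log = round_robin([
    ("button",  lambda: True,  lambda: None),
    ("display", lambda: False, lambda: None),
])
print(log)
```

Note that a device's worst-case wait is one full trip around the loop, which is exactly the fragility listed in the cons above: adding one slow device lengthens every other device's worst case.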
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., ABABAD)
  - Improves the response of A
  - Increases the latency of other tasks
o Move some task code to an interrupt
  - Decreases the response time of lower-priority interrupts
  - May not be able to ensure lower-priority interrupt code executes fast enough
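The flag-based division of labor between the ISRs and the main loop can be sketched as follows. This is a hedged illustration: the "ISR" is just a function called by hand, since plain Python has no hardware interrupts, and the device names are made up:

```python
# Flags set by (simulated) interrupt service routines; the main loop
# checks them and runs the lower-priority follow-up processing.
flags = {"serial": False, "timer": False}

def serial_isr():
    """The ISR does only the urgent hardware work, then sets a flag."""
    flags["serial"] = True

def main_loop_once():
    """One pass of the main loop: check each flag in a fixed order.
    The check order fixes the relative priority of the follow-up work."""
    handled = []
    for name in ("serial", "timer"):
        if flags[name]:
            flags[name] = False
            handled.append(name)   # follow-up task code would run here
    return handled

serial_isr()              # pretend the serial interrupt fired
print(main_loop_once())
```

The flags are exactly the shared data this architecture introduces, which is why real implementations must guard them (for example by disabling interrupts briefly) when the main loop tests and clears them.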
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
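A minimal sketch of the function-queue idea, using a priority heap so that the main routine need not execute in FIFO order; the names `enqueue` and `run_one` are illustrative, not from any particular RTOS:

```python
import heapq

task_queue = []   # (priority, sequence, function); lowest priority value first
_seq = 0          # tie-breaker so equal-priority entries stay FIFO

def enqueue(priority, fn):
    """Called from an interrupt handler: add a function pointer to the queue."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, fn))
    _seq += 1

def run_one():
    """Main routine: pop and run the highest-priority pending function."""
    if task_queue:
        _, _, fn = heapq.heappop(task_queue)
        return fn()

enqueue(2, lambda: "housekeeping")
enqueue(0, lambda: "urgent")
print(run_one())    # 'urgent' runs first even though it was queued later
```

This illustrates both the pro and the con listed above: the highest-priority function waits at most one function's execution time, while low-priority entries can sit in the heap indefinitely if urgent work keeps arriving.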
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from the previous architectures:
  - We don't write signaling flags (the RTOS takes care of it)
  - No loop in our code decides what is executed next (the RTOS does this)
  - The RTOS knows relative task priorities and controls what is executed next
  - The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality is pre-written
o Generally come with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = total time to complete 250 ms (quantum 100 ms)
1. First allocation = 100 ms
2. Second allocation = 100 ms
3. Third allocation = 100 ms, but job1 self-terminates after 50 ms
4. Total CPU time of job1 = 250 ms
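The job1 arithmetic above can be checked with a short sketch (the helper name `rr_allocations` is hypothetical):

```python
def rr_allocations(total_ms, quantum_ms):
    """List the CPU allocations a single job receives under round-robin
    with the given time slice; the last slice ends early when the job
    finishes before its quantum expires."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        run = min(quantum_ms, remaining)   # job may self-terminate mid-slice
        slices.append(run)
        remaining -= run
    return slices

print(rr_allocations(250, 100))   # [100, 100, 50], matching the job1 example
```

Between any two of these slices, every other ready job receives its own quantum, which is where the 250 ms of CPU time spreads out over a longer wall-clock interval.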
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in its queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
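The per-flow, work-conserving behavior just described can be sketched as follows; the flow names and packet labels are made up for the example:

```python
from collections import deque

def packet_round_robin(flows):
    """Work-conserving round-robin over per-flow queues: visit the flows
    in a fixed cycle, transmit one packet from each non-empty queue, and
    skip empty ones so the link never idles while packets are waiting."""
    queues = {name: deque(pkts) for name, pkts in flows.items()}
    order = []
    while any(queues.values()):
        for q in queues.values():
            if q:                        # an empty flow gives up its turn
                order.append(q.popleft())
    return order

sent = packet_round_robin({"A": ["a1", "a2", "a3"], "B": ["b1"]})
print(sent)    # ['a1', 'b1', 'a2', 'a3']
```

Once flow B runs dry, flow A gets every turn, which is exactly the work-conserving property; it is also why, with equally sized packets, the schedule approaches max-min fairness as noted below.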
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept; such a round-robin scheduler can be implemented using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
  - Turnaround - the total time between submission of a process and its completion
  - Response time - the amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time, i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists on the hard disk or in virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers; a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
Also known as first-come, first-served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
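A unit-time simulation of this policy, with a hypothetical helper name, shows the preemption described above: the longer job is interrupted when a shorter one arrives and resumes only after it completes:

```python
def shortest_remaining_time(jobs):
    """Simulate preemptive shortest-remaining-time scheduling in unit
    time steps. `jobs` maps name -> (arrival, burst); returns a map of
    name -> completion time."""
    remaining = {name: burst for name, (_, burst) in jobs.items()}
    finish, t = {}, 0
    while remaining:
        ready = [n for n in remaining if jobs[n][0] <= t]
        if not ready:
            t += 1                  # CPU idles until the next arrival
            continue
        # re-chosen every tick, so a newly arrived shorter job preempts
        n = min(ready, key=lambda j: remaining[j])
        remaining[n] -= 1
        t += 1
        if remaining[n] == 0:
            del remaining[n]
            finish[n] = t
    return finish

print(shortest_remaining_time({"long": (0, 5), "short": (1, 2)}))
```

Here "long" runs for one tick, is preempted at t=1 by "short" (2 remaining vs. 4), and finishes only at t=7; a steady stream of short arrivals would starve it indefinitely, which is the starvation risk noted above.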
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes and not the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs; this approach is also very useful for shared-memory problems.
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
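A common way to drive such debug LEDs is through a memory-mapped register, one bit per LED. The register address and bit assignments below are purely hypothetical; for testability the helpers operate on a plain value rather than real hardware:

```c
#include <stdint.h>

/* Hypothetical: each bit of an 8-bit LED register lights one LED.
   On real hardware 'reg' would live behind a volatile pointer, e.g.
   volatile uint8_t *leds = (uint8_t *)0x40001000;  (made-up address) */
#define LED_ERROR (1u << 0)  /* lit when an error path is entered  */
#define LED_IDLE  (1u << 1)  /* toggled from the idle loop          */

static uint8_t led_set(uint8_t reg, uint8_t mask)    { return reg | mask; }
static uint8_t led_clear(uint8_t reg, uint8_t mask)  { return (uint8_t)(reg & ~mask); }
static uint8_t led_toggle(uint8_t reg, uint8_t mask) { return reg ^ mask; }
```

Read-modify-write helpers like these keep unrelated debug LEDs untouched when one condition changes.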
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmers interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use a Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]        ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
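The two byte orders can be made concrete with a pair of helpers that store a 32-bit word to memory in each ordering. This is a host-side sketch of the concept, not ARM configuration code:

```c
#include <stdint.h>

/* Store a 32-bit word to four bytes in little-endian order
   (lowest-order byte at the lowest address). */
static void store_le(uint32_t w, uint8_t b[4])
{
    b[0] = w & 0xff;         b[1] = (w >> 8) & 0xff;
    b[2] = (w >> 16) & 0xff; b[3] = (w >> 24) & 0xff;
}

/* Store the same word in big-endian order
   (highest-order byte at the lowest address). */
static void store_be(uint32_t w, uint8_t b[4])
{
    b[3] = w & 0xff;         b[2] = (w >> 8) & 0xff;
    b[1] = (w >> 16) & 0xff; b[0] = (w >> 24) & 0xff;
}
```

The same word 0x11223344 lands in memory as 44 33 22 11 in little-endian mode and 11 22 33 44 in big-endian mode.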
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared by later operations. STKY bits remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
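The difference between wrapping and saturating overflow can be sketched in C. This models the behavior that the SHARC's ALUSAT mode selects; it is an illustration, not SHARC code:

```c
#include <stdint.h>

/* Saturating 32-bit signed add: on overflow, clamp to the end of the
   representable range instead of wrapping around. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;           /* widen to detect overflow */
    if (s > INT32_MAX) return INT32_MAX;  /* positive saturation */
    if (s < INT32_MIN) return INT32_MIN;  /* negative saturation */
    return (int32_t)s;
}
```

With saturation off, INT32_MAX + 1 would wrap to a large negative number; with saturation on, the result stays pinned at INT32_MAX, which is usually the less harmful error for signal-processing data.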
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT)
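Two of the address-update rules above can be sketched in a few lines of C. This models the arithmetic a DAG performs, with made-up register values rather than real SHARC state:

```c
/* Post-modify with update: the I register supplies the address,
   then is advanced by the modifier M. */
static int postmodify(int i, int m) { return i + m; }

/* Circular buffer: the updated index wraps around a buffer that
   starts at 'base' and is 'len' locations long (|m| < len). */
static int circular_update(int i, int m, int base, int len)
{
    i += m;
    if (i >= base + len) i -= len;   /* wrapped past the end */
    else if (i < base)   i += len;   /* wrapped before the start */
    return i;
}
```

Sweeping an 8-entry delay line with circular_update visits indices 0, 1, ..., 7 and then returns to 0 without any explicit test in the inner loop of a real DAG, since the hardware does this wrap automatically.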
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a high-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
At its core it is just an integer. Semaphores are of two types: the counting semaphore, whose value can be greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
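The acquire/release calls in the list above can be illustrated with a toy counting semaphore. A real RTOS semaphore would also block the caller and maintain a waiting-task list; this sketch only tracks the count:

```c
/* Toy counting semaphore: only the count, no task blocking. */
typedef struct { int count; } toy_sem;

static void sem_create(toy_sem *s, int initial) { s->count = initial; }

/* Returns 1 if the semaphore was acquired, 0 if the caller would
   have to block (a real kernel would suspend the task here). */
static int sem_acquire(toy_sem *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

static void sem_release(toy_sem *s) { s->count++; }
```

Creating the semaphore with an initial count of 1 gives binary-semaphore behavior: the second acquire fails until someone releases.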
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, either the highest-priority task or the first task waiting in the queue can take the message
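The storage behind such a queue is typically a fixed-size ring buffer. The sketch below shows the send/receive mechanics for integer messages; queue length and message type are assumptions for illustration:

```c
/* Minimal fixed-size message queue (ring buffer of int messages). */
#define QLEN 4

typedef struct { int buf[QLEN]; int head, tail, n; } msgq;

static void mq_init(msgq *q) { q->head = q->tail = q->n = 0; }

/* Deposit a message; returns 0 on success, -1 if the queue is full. */
static int mq_send(msgq *q, int msg)
{
    if (q->n == QLEN) return -1;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;   /* wrap around the array */
    q->n++;
    return 0;
}

/* Take the oldest message; returns 0 on success, -1 if empty. */
static int mq_receive(msgq *q, int *msg)
{
    if (q->n == 0) return -1;
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->n--;
    return 0;
}
```

In a real kernel the full/empty cases would block or wake tasks via the waiting lists mentioned above instead of returning an error code.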
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
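The create/write/read sequence above can be demonstrated with the POSIX pipe() call as a desktop analogue of the RTOS pipe object (the RTOS API itself differs between kernels):

```c
#include <unistd.h>
#include <string.h>

/* Create a pipe, write a message into one end (producer task's role),
   read it back from the other (consumer task's role).
   Returns the number of bytes received, or -1 on error. */
static int pipe_roundtrip(const char *msg, char *out, int outlen)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;
    if (write(fds[1], msg, strlen(msg)) < 0) return -1; /* producer end */
    int n = (int)read(fds[0], out, outlen);             /* consumer end */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

As with RTOS pipes, the data is a byte stream: the reader sees the bytes in the order the writer sent them, with no message boundaries.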
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced
o Each instruction controls the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general-purpose register
R13----------- it is used as the stack pointer
R14----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively, and the T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode is used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ) supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC) is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF) is entered if a fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
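The CPSR fields described above sit at ARM-defined bit positions (N, Z, C, V at bits 31 to 28; I, F, T at bits 7, 6, 5; the mode field in bits 4:0, with 0x10 encoding user mode). A small decoder makes the layout concrete; it is a host-side sketch, not code that reads a live CPSR:

```c
#include <stdint.h>

/* Extract CPSR fields (ARM-defined bit positions). */
static int cpsr_n(uint32_t cpsr)    { return (cpsr >> 31) & 1; } /* negative */
static int cpsr_z(uint32_t cpsr)    { return (cpsr >> 30) & 1; } /* zero     */
static int cpsr_c(uint32_t cpsr)    { return (cpsr >> 29) & 1; } /* carry    */
static int cpsr_v(uint32_t cpsr)    { return (cpsr >> 28) & 1; } /* overflow */
static int cpsr_t(uint32_t cpsr)    { return (cpsr >> 5) & 1;  } /* THUMB    */
static int cpsr_mode(uint32_t cpsr) { return cpsr & 0x1f;      } /* 0x10 = user */
```

A conditional instruction's 4-bit condition code is evaluated against exactly these N, Z, C, and V flags.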
DATA TYPES
The ARM instruction set supports six different data types, namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set
ARM: it is the standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, excepting the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupied
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in others, the interrupt routines take care of the most urgent operations; they then signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
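The first rule is easiest to see side by side. In the sketch below, the first function keeps its running total in a shared static variable, so two tasks calling it would corrupt each other's totals; the second keeps all state on the caller's stack and is reentrant:

```c
/* Non-reentrant: state lives in a variable shared by all callers. */
static int shared_total;                 /* shared data: the hazard */

static int add_nonreentrant(int x)
{
    shared_total += x;                   /* non-atomic use of shared data */
    return shared_total;
}

/* Reentrant: all state is passed in and returned, so each task's
   copy lives on that task's own stack. */
static int add_reentrant(int total, int x)
{
    return total + x;
}
```

If the RTOS switched tasks between the read and the write inside add_nonreentrant, both tasks' updates could be applied to a stale value; add_reentrant has no such window.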
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the processor goes to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
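The scheduler's core decision, pick the highest-priority task that is not blocked, can be sketched directly from the state and priority rules above. This is an illustrative model, not any particular RTOS's implementation (here a lower number means higher priority, an assumed convention):

```c
enum task_state { RUNNING, READY, BLOCKED };

struct task { enum task_state state; int prio; };

/* Return the index of the task that should run, or -1 if every task
   is blocked (the case where the scheduler just spins and waits). */
static int schedule(struct task t[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (t[i].state == BLOCKED) continue;          /* never runnable */
        if (best < 0 || t[i].prio < t[best].prio)     /* higher priority */
            best = i;
    }
    return best;
}
```

In a preemptive kernel this decision is re-made whenever a task unblocks, which is exactly why a newly unblocked higher-priority task takes the processor immediately.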
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce, and it was the first model proposed for the software development process
This model has five major phases
o Requirements: analyzes and determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the pieces and integrates them
o Testing: uncovers the bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases, since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design, the designers go through requirements, construction, and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage: with too many spirals, it may take too long when design time is a major requirement
Its advantage: it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps ensure that the product best meets the customer's needs
clock signal on SCL. Instead, a master drives both SCL and SDL when it is sending data. When the bus is idle, both SDL and SCL remain high
Each bus master device must listen to the bus while transmitting to be sure that it is not interfering with another message
Every I2C device has an address: 7 bits in the standard I2C definition and 10 bits in the extended I2C definition
o 0000000 is used for general call or bus broadcast, useful to signal all devices simultaneously
o 11110XX is reserved for the extended 10-bit addressing scheme
Bus transaction: it comprises a series of one-byte transmissions, an address followed by one or more data bytes
Address transmission includes the 7-bit address and a one-bit data direction: 0 for writing from master to slave and 1 for reading from slave to master
The bus transaction is initiated by a start signal and completed by a stop signal. A start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL; a stop is signaled by leaving SCL high and sending a 0-to-1 transition on SDL
The bus does not define particular voltages to be used for high or low so that either bipolar or MOS circuits can be connected to the bus
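The address-plus-direction byte described above can be built with a one-line helper. This is a sketch of the wire format only; the example address 0x50 (a common EEPROM address) is just an illustration:

```c
#include <stdint.h>

/* First byte of an I2C transaction: the 7-bit slave address in the
   upper bits, followed by the direction bit in bit 0
   (0 = master writes to slave, 1 = master reads from slave). */
#define I2C_WRITE 0
#define I2C_READ  1

static uint8_t i2c_addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1 : 0));
}
```

So addressing device 0x50 for a read puts 0xA1 on the bus, and for a write, 0xA0.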
CAN Bus
The controller area network (CAN) bus is a robust serial communication bus protocol for real-time applications, typically carried over a twisted pair of wires
It was developed by Robert Bosch GmbH for automotive electronics, but it now finds use in other applications as well
Some important characteristics of the CAN protocol are high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/s, 11-bit addressing, and error detection and confinement capabilities
Common applications other than automobiles include elevator controllers copiers telescopes production line control systems and medical instruments
The CAN specification does not specify the actual layout and structure of the physical bus itself. It defines the data packet format and transmission rules: prioritizing messages, guaranteeing latency times, allowing for multiple masters, handling transmission errors, retransmitting corrupt messages, and distinguishing between a permanent failure of a node and temporary errors
It uses bit-serial transmission and can run at rates of up to 1 Mbps over a twisted-pair connection of 40 meters. An optical link can also be used
The bus protocol supports multiple masters on the bus. Many of the details of the CAN and I2C buses are similar
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in a wired-AND fashion
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant
The driving circuits cause the bus to be pulled down to 0 if any node on the bus pulls it down, making 0 dominant over 1
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when any node transmits a 0, the bus is in the dominant state
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus The first bit of a data frame provides the first synchronization opportunity in a frame
Format of data frame Data frame starts with lsquo1rsquo and ends with a string of seven zeros(there are
at least three bit fields between data frames) The first field in the packet contains the packets destination address and
is known as the lsquoarbitration fieldrsquo The destination identifier is lsquo11rsquo bits long
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier; when it is set to '1', the packet is used to write data to the destination identifier.
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.
The data field is from 0 to 8 bytes (64 bits) long, depending on the value given in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
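The CRC used by CAN is a 15-bit CRC; the generator polynomial (0x4599, i.e. x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1) comes from the CAN specification. A Python sketch of the bit-serial shift-register algorithm (the function name is our own, for illustration only):

```python
def can_crc15(bits):
    """Bit-serial CRC-15 as used by CAN (generator polynomial 0x4599).

    The CRC register is shifted once per transmitted bit; whenever the
    bit shifted out of the register differs from the incoming bit, the
    register is XORed with the generator polynomial.
    """
    crc = 0
    for b in bits:
        crcnxt = b ^ ((crc >> 14) & 1)   # incoming bit vs. MSB of register
        crc = (crc << 1) & 0x7FFF        # shift left, keep 15 bits
        if crcnxt:
            crc ^= 0x4599                # x^15+x^14+x^10+x^8+x^7+x^4+x^3+1
    return crc

print(hex(can_crc15([1, 0, 1, 1])))
```

The receiver runs the same computation over the received bit stream; a non-matching CRC triggers the error-handling mechanism described below.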
The acknowledge field is used to let the receiver signal whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to the dominant value '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it tries to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field acts as a priority identifier, with the all-0 identifier having the highest priority.
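The arbitration scheme just described can be simulated in a few lines. The following Python sketch (function name and node identifiers are our own, for illustration) models the wired-AND bus: each contender transmits its identifier MSB first, and a node that sends a recessive 1 but reads back a dominant 0 drops out, so the lowest identifier wins.

```python
def arbitrate(identifiers, width=11):
    """Simulate CAN bitwise arbitration over a wired-AND bus.

    Each node transmits its identifier MSB first.  The bus level is the
    AND of all transmitted bits (0 is dominant).  A node that sends a
    recessive 1 but hears a dominant 0 has lost and stops transmitting.
    The node with the numerically lowest identifier wins.
    """
    contenders = list(identifiers)
    for bit in range(width - 1, -1, -1):
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus = min(sent.values())          # wired-AND: any 0 pulls the bus to 0
        contenders = [i for i in contenders if sent[i] == bus]
    assert len(contenders) == 1
    return contenders[0]

# Three nodes start transmitting at once; the lowest identifier wins.
print(hex(arbitrate([0x123, 0x456, 0x7FF])))
```

Note that arbitration is non-destructive: the winning message is transmitted without interruption, and the losers simply retry later.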
Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.

Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.

Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS: These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.
RS232UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA).
It is used to connect data terminal equipment (DTE) to data circuit-terminating equipment (DCE). The DTE can be a PC, serial printer, or plotter; the DCE can be a modem, mouse, digitizer, or scanner.
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions simultaneously.
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits. Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity bit is added, useful for error checking on the receiver side.
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved.
Possible Data rates depend upon the UART chip and the clock used
Communication parameters: For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.
Data rate: it represents the rate at which data communication takes place. PCs support 50, 150, 300, ..., 115200 bps.
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits.
Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6, two stop bits are appended.
Parity bit: it is added for error checking on the receiver side.
Flow control: it is useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission, also known as handshaking; it can be implemented in hardware or software.
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin.
For the transmission of 1s and 0s, the voltage levels are defined in the standard; the levels are different for data and control signals. The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission.
UART chip: universal asynchronous receiver/transmitter chip.
It has two sections: a receive section and a transmit section. The receive section receives data, converts it from serial form into parallel form, and gives the data to the processor. The transmit section takes data from the processor and converts it from parallel format to serial format. It also adds the start, stop, and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5 V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes RS232 and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
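The framing performed by the transmit section can be sketched in software. The following Python sketch (function name is ours; it assumes 8 data bits, even parity, and one stop bit, transmitted LSB first as a UART does) shows the bit sequence shifted out for one character:

```python
def uart_frame(byte, parity="even"):
    """Build the bit sequence a UART transmit section shifts out for one
    character: start bit (0), eight data bits LSB first, a parity bit,
    and a stop bit (1)."""
    data = [(byte >> i) & 1 for i in range(8)]   # LSB is transmitted first
    ones = sum(data)
    # Even parity: the parity bit makes the total number of 1s even.
    parity_bit = ones % 2 if parity == "even" else 1 - (ones % 2)
    return [0] + data + [parity_bit] + [1]

print(uart_frame(0x55))
```

The receive section performs the inverse: it detects the start bit, samples the data bits, checks the parity bit, and verifies the stop bit before handing the assembled byte to the processor.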
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used, one each for the transmit and receive paths.
RS485
It is a variation of RS422, created to connect a number of devices, up to 512, in a network.
RS485 controller chip is used on each device
A network using the RS485 protocol uses a master-slave configuration.
With one twisted pair, half-duplex communication is possible; with two twisted pairs, full duplex can be achieved.
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.
Daisy chaining: This method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that can request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input of the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledge signal on to the next device. This procedure is defined as follows.
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther a device is from the first position, the lower its priority.
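The PI/PO logic above can be captured in a short simulation. This Python sketch (function name is ours) walks the acknowledge signal down the chain and returns the index of the device that is granted the interrupt, which is always the highest-priority (closest to the CPU) requester:

```python
def daisy_chain_grant(requests):
    """Propagate the interrupt acknowledge through the PI/PO chain.

    requests[i] is True if device i has a pending interrupt; device 0
    is closest to the CPU and has the highest priority.  Each device
    passes PO = PI AND (not requesting); the device seeing PI = 1 while
    requesting blocks the signal and takes the grant (it would then put
    its vector address on the data bus)."""
    pi = True  # CPU asserts interrupt acknowledge into the first device
    for i, requesting in enumerate(requests):
        if pi and requesting:
            return i              # PI = 1, PO forced to 0: granted
        pi = pi and not requesting  # pass the acknowledge down the chain
    return None                   # no device was requesting

print(daisy_chain_grant([False, True, True]))
```

Running the example grants device 1: device 0 is not requesting so it passes the acknowledge through, and device 1 intercepts it before device 2 can see it.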
The slowest device participating in the control and data transfer handshakes determines the speed of the transaction.
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management), plus eight ground return lines.
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even between models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures:
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS
Choosing an Architecture: The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin:
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
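The main loop described above can be sketched as a polling loop. In this Python sketch the device names and their ready/service functions are hypothetical stand-ins for real hardware checks:

```python
def round_robin_poll(devices, passes=1):
    """Round-robin main loop: poll each device in turn, with no
    interrupts and no priorities; service order is loop position.

    devices is a list of (name, ready_fn, service_fn) tuples.
    Returns the order in which devices were serviced."""
    serviced = []
    for _ in range(passes):
        for name, ready, service in devices:
            if ready():          # does this device need attention now?
                service()        # service it before moving on
                serviced.append(name)
    return serviced

# Hypothetical devices: A and C need service this pass, B does not.
devs = [("A", lambda: True,  lambda: None),
        ("B", lambda: False, lambda: None),
        ("C", lambda: True,  lambda: None)]
print(round_robin_poll(devs))
```

Notice that a device early in the loop is simply reached sooner; there is no other notion of priority.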
Round Robin Pros: simple; no shared data; no interrupts
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs a fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses:
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly
Round Robin with Interrupts:
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this?
Round Robin with Interrupts Adjustments:
o Change the order in which flags are checked (e.g., A B A C A D)
o Improves response of A
o Increases latency of other tasks
o Move some task code to the interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling Architecture:
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Best response time can be improved by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
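The interaction between interrupts and the main routine can be sketched as follows. The ISR and handler names here are hypothetical; in Python we use callables on a deque where C code would use function pointers in a ring buffer:

```python
from collections import deque

task_queue = deque()   # interrupt routines append "function pointers" here
log = []               # record of follow-up work, for demonstration

def serial_isr():
    """Interrupt routine: do the urgent hardware part (omitted), then
    queue the lower-priority follow-up work for the main routine."""
    task_queue.append(handle_serial)

def timer_isr():
    task_queue.append(handle_timer)

def handle_serial():
    log.append("serial")

def handle_timer():
    log.append("timer")

def main_loop_step():
    """Main routine: take queued functions and call them.  FIFO here,
    but any priority-based choice could be substituted."""
    while task_queue:
        task = task_queue.popleft()
        task()

serial_isr()
timer_isr()
main_loop_step()
print(log)
```

Replacing `popleft()` with a highest-priority-first selection is exactly the "any algorithm" flexibility listed in the pros above.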
Real Time Operating System Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control both task response AND interrupt response
Real Time Operating Systems Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
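The allocation pattern in the example above can be computed with a small helper (the function name is ours):

```python
def rr_allocations(total_ms, quantum_ms):
    """CPU time slices a single job receives under round-robin: full
    quanta until the remaining work fits in less than one quantum."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        slices.append(min(quantum_ms, remaining))  # one turn on the CPU
        remaining -= slices[-1]
    return slices

# job1: 250 ms of work, 100 ms quantum -> allocations of 100, 100, 50 ms
print(rr_allocations(250, 100))
```

Between consecutive slices, every other ready job receives its own quantum, which is what bounds each job's waiting time.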
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
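The per-flow, work-conserving behaviour can be sketched directly (flow contents are hypothetical packet labels):

```python
from collections import deque

def rr_packet_schedule(flows):
    """Serve one packet per active flow per round.  Empty flows are
    skipped, so the link never idles while any queue holds a packet
    (the work-conserving property)."""
    queues = [deque(flow) for flow in flows]
    order = []
    while any(queues):
        for q in queues:
            if q:                       # skip flows with no packets
                order.append(q.popleft())
    return order

# Flow A has 2 packets, B has 1, C has 3.
print(rr_packet_schedule([["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]))
```

Once flow B empties after the first round, its turn is simply skipped rather than leaving the channel idle.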
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same round-robin scheduler concept, and it can be implemented using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates which processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or in virtual memory. [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process that has a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness among the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time, waiting time, and response time can be high for the same reason.
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
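The strategy above can be simulated compactly. This Python sketch (function name and job tuples are our own) makes the simplifying assumption that preemption decisions are checked only at job arrival instants, which is when the set of remaining times can change:

```python
import heapq

def srt_schedule(jobs):
    """Preemptive shortest-remaining-time simulation.

    jobs: list of (arrival, burst, name).  Returns a list of
    (name, completion_time) in completion order."""
    jobs = sorted(jobs)                 # by arrival time
    t, i, done, ready = 0, 0, [], []    # ready: heap of [remaining, name]
    while i < len(jobs) or ready:
        if not ready:                   # CPU idle: jump to next arrival
            t = jobs[i][0]
        while i < len(jobs) and jobs[i][0] <= t:
            heapq.heappush(ready, [jobs[i][1], jobs[i][2]])
            i += 1
        remaining, name = heapq.heappop(ready)  # least remaining time
        next_arrival = jobs[i][0] if i < len(jobs) else float("inf")
        run = min(remaining, next_arrival - t)  # run until done or arrival
        t += run
        if run < remaining:             # preemption check at the arrival
            heapq.heappush(ready, [remaining - run, name])
        else:
            done.append((name, t))
    return done

print(srt_schedule([(0, 8, "A"), (1, 4, "B"), (2, 2, "C")]))
```

In the example, long job A is preempted by B at t=1, B is preempted by the even shorter C at t=2, and A (the longest) finishes last, illustrating both the throughput benefit and the starvation risk described above.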
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.
Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
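A small simulation makes the preemption behaviour concrete. In this Python sketch (names and job tuples are ours; lower priority number means higher priority, and preemption is checked at arrival instants), a running low-priority job is suspended as soon as a high-priority job arrives:

```python
import heapq

def fpps(jobs):
    """Fixed-priority preemptive scheduling trace.

    jobs: list of (arrival, burst, priority, name); a lower priority
    number means higher priority.  Returns (name, completion_time)
    pairs in completion order."""
    jobs = sorted(jobs)                 # by arrival time
    t, i, done, ready = 0, 0, [], []    # ready: heap of [prio, remaining, name]
    while i < len(jobs) or ready:
        if not ready:                   # CPU idle until the next arrival
            t = jobs[i][0]
        while i < len(jobs) and jobs[i][0] <= t:
            arrival, burst, prio, name = jobs[i]
            heapq.heappush(ready, [prio, burst, name])
            i += 1
        prio, remaining, name = heapq.heappop(ready)  # highest priority
        next_arrival = jobs[i][0] if i < len(jobs) else float("inf")
        run = min(remaining, next_arrival - t)        # run until done/arrival
        t += run
        if run < remaining:             # preempted: back into the queue
            heapq.heappush(ready, [prio, remaining - run, name])
        else:
            done.append((name, t))
    return done

print(fpps([(0, 5, 2, "low"), (1, 2, 1, "high")]))
```

Here "low" starts at t=0, is preempted at t=1 when "high" arrives, and only resumes after "high" completes at t=3; with a steady stream of high-priority arrivals, "low" would starve.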
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. It gives balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes, not on average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also useful where processes share memory.
Overview

Scheduling algorithm         CPU overhead  Throughput  Turnaround time  Response time
First In First Out           Low           Low         High             High
Shortest Job First           Medium        High        Medium           Medium
Priority-based scheduling    Medium        Low         High             High
Round-robin scheduling       High          Medium      Medium           High
Multilevel queue scheduling  High          High        Medium           Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.
o CISC architectures were developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architectures are optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o They require a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line, starting after the first column.
A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.
Comments begin with a semicolon and continue until the end of the line.
Example:
LDR r0,[r8] ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data:
o The standard ARM word is 32 bits long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating-point-intensive computations.
The instructions are written one per line and terminated by a semicolon.
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line.
Example:
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
The SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).
The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.
Data operations
The programming model for the SHARC is rather large and complex.
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address.
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in it.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.
The DAGs also support circular buffers, which are commonly used in signal processing.
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: the counting semaphore, which can take any non-negative integer value, and the binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:
o Each instruction controls the ALU and the shifter, thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15----------- the program counter, but it can also be manipulated as a general-purpose register
R13----------- used as a stack pointer
R14----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register)----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of the six execution modes as follows:
o User mode: used to run the application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes.
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: 32-bit ARM and 16-bit THUMB.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
Task states in an RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other gets the processor.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements: analysis determines the basic characteristics of the system
o Architecture: design decomposes the functionality into major components
o Coding: implements the processes and integrates them
o Testing: uncovers the bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
It uses bit-serial transmission. It can run at rates up to 1 Mbps over a twisted-pair connection of up to 40 meters. An optical link can also be used.
The bus protocol supports multiple masters on the bus. Many details of the CAN and I2C buses are similar.
Each node in the CAN bus has its own electrical drivers and receivers that connect the node to the bus in a wired-AND fashion.
In CAN terminology, a logical '1' on the bus is called recessive and a logical '0' is dominant.
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls the bus down, making 0 dominant over 1.
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node transmits a 0, the bus is in the dominant state.
Data are sent on the network in packets known as data frames. CAN is a synchronous bus: all transmitters must send at the same time for bus arbitration to work.
Nodes synchronize themselves to the bus by listening to the bit transitions on the bus. The first bit of a data frame provides the first synchronization opportunity in a frame.
Format of the data frame: a data frame starts with a '1' and ends with a string of seven zeroes (there are at least three bit fields between data frames). The first field in the packet contains the packet's destination address and is known as the arbitration field. The destination identifier is 11 bits long.
The trailing bit is used for the remote transmission request (RTR). It is set to '0' when the data frame is used to request data from the device specified by the identifier. When it is set to '1', the packet is used to write data to the destination identifier.
The control field provides an identifier extension and a 4-bit length for the data field, with a 1 in between.
The data field is from 0 to 8 bytes (64 bits), depending on the value given in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
The acknowledge field is used to let the sender know whether the frame was correctly received: the sender puts a recessive bit '1' in the ACK slot of the acknowledge field; if the receiver detected an error, it forces the value to a dominant '0'. If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the end-of-frame field.
Arbitration
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority). It is similar to the I2C bus arbitration method. Network nodes transmit synchronously, so they all start sending their identifier fields at the same time. When a node hears a dominant bit in the identifier while it is trying to send a recessive bit, it stops transmitting. By the end of the arbitration field, only one transmitter will be left. The identifier field thus acts as a priority identifier, with the all-0 identifier having the highest priority.
Remote frame: A remote frame is used to request data from another node. The requestor sets the RTR bit to '0' to specify a remote frame; it also specifies zero data bits. The node specified in the identifier field will respond with a data frame that has the requested value.
Error handling: An error frame can be generated by any node that detects an error on the bus. Upon detecting an error, a node interrupts the current transmission with an error frame, which consists of an error flag followed by an error delimiter field of 8 recessive bits. The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume.
Overload frame: An overload frame signals that a node is overloaded and will not be able to handle the next message. The node can delay the transmission of the next frame with up to two overload frames in a row. The CRC field can be used to check a message's data field for correctness. If a transmitting node does not receive an acknowledgement for a data frame, it should retransmit the data frame until the data is acknowledged.
COMMUNICATION INTERFACINGS
These are used to communicate with the external world, for example transmitting data to a host PC or interacting with another embedded system to share data.
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA).
It is used to connect a data terminal equipment (DTE) to a data circuit-terminating equipment (DCE).
The data terminal equipment (DTE) can be a PC, serial printer, or plotter; the data circuit-terminating equipment (DCE) can be a modem, mouse, digitizer, or scanner.
Communication between the two devices is full duplex, i.e., data transfer can take place in both directions at once.
In RS232 the sender gives out data character by character. The bits corresponding to a character are called data bits.
The data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit. In addition, a parity check bit is added, which is useful for error checking on the receiver side.
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a maximum distance of 19.2 meters, but using low-capacitance RS232 cable a distance of up to 100 meters can be achieved.
Possible Data rates depend upon the UART chip and the clock used
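As a sketch of that dependency: many common UARTs (assumed here: a 16x-oversampling part such as the 8250/16550 family) derive the bit rate from the clock through a programmable divisor.

```c
/* For a 16x-oversampling UART, baud rate = clock / (16 * divisor),
   so the divisor programmed into the chip is clock / (16 * baud). */
unsigned uart_divisor(unsigned clock_hz, unsigned baud)
{
    return clock_hz / (16u * baud);
}
```

With the classic 1.8432 MHz UART clock, 9600 bps needs a divisor of 12, and 115200 bps a divisor of 1.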
Communication parameters: for the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems.
Data rate: the rate at which data communication takes place. PCs support the standard rates 50, 150, 300, …, up to 115200 bps.
Data bits: the number of bits transmitted for each character; it can be 5, 6, 7, or 8 bits.
Start and stop bits: they identify the beginning and end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6 data bits, two stop bits are appended.
Parity bit: added for error checking on the receiver side.
Flow control: useful when the sender pushes out data at a higher rate than the receiver can absorb. It is a protocol to stop and resume data transmission, also known as handshaking; it can be implemented in hardware or in software.
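The framing parameters above can be illustrated with a small sketch that builds the line-level sequence for one character. The helper name and the int-per-bit representation are assumptions; the sketch assumes even parity and one stop bit.

```c
/* Build the line levels for one character: start bit (0), data bits
   LSB first, an even-parity bit, and one stop bit (1). */
int rs232_frame(unsigned char ch, int data_bits, int bits[])
{
    int n = 0, ones = 0;
    bits[n++] = 0;                        /* start bit */
    for (int i = 0; i < data_bits; i++) {
        int b = (ch >> i) & 1;
        ones += b;
        bits[n++] = b;                    /* data bits, LSB first */
    }
    bits[n++] = ones & 1;                 /* even parity: total 1s become even */
    bits[n++] = 1;                        /* stop bit */
    return n;                             /* bits on the line per character */
}
```

For 8 data bits this gives 11 bits on the line per character, which is why the effective character rate is lower than the raw bit rate.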
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin.
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is with respect to local ground; hence RS232 uses unbalanced transmission.
UART (Universal Asynchronous Receiver/Transmitter) chip
It has two sections: a receive section and a transmit section.
The receive section receives data, converts it from serial form to parallel form, and gives the data to the processor.
The transmit section takes data from the processor and converts it from parallel format to serial format.
It also adds the start, stop, and parity bits.
The voltage levels used in RS232 are different from those of embedded systems (5 V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 interface and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
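The receive section's serial-to-parallel conversion can be sketched as follows. The helper name is hypothetical; it reassembles the data bits only, with the framing bits assumed already stripped.

```c
/* Reassemble a data byte from successive line samples (LSB first),
   mirroring the receive section's serial-to-parallel conversion. */
unsigned char uart_deserialize(const int bits[], int data_bits)
{
    unsigned char ch = 0;
    for (int i = 0; i < data_bits; i++)
        ch |= (unsigned char)((bits[i] & 1) << i);  /* LSB arrived first */
    return ch;
}
```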
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full-duplex.
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment.
It is also commonly known as HP-IB (Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.
Daisy chaining: this method is adopted to establish the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request accepts the acknowledgement from the CPU, and that device does not pass the acknowledge signal on to the next device. This procedure is defined below:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower its priority.
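The PI/PO propagation described above can be sketched as a short simulation. The helper is hypothetical; device 0 is closest to the CPU and therefore has the highest priority.

```c
/* Daisy-chain acknowledge propagation: PI of device 0 is the CPU's
   acknowledge; each device passes PO = PI only when it has no pending
   request. Returns the index of the device that captures the
   acknowledge (the highest-priority requester), or -1 if none. */
int daisy_chain_grant(const int requesting[], int n)
{
    int pi = 1;                     /* CPU asserts interrupt acknowledge */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;               /* PI=1 and request pending: block PO,
                                       place VAD on the data bus */
        /* no pending request here: PO = PI, acknowledge passes along */
    }
    return -1;
}
```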
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground return lines.
In 1975 the bus was standardized by IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc., via the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc., to their workstation products and to the HP 2100 and HP 3000 minicomputers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS
Choosing an Architecture: the best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
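The loop described above can be sketched as follows. The device table, the stub devices, and the iteration bound are hypothetical scaffolding so the sketch runs standalone; a real round-robin main loop runs forever.

```c
#include <stdbool.h>

/* Round-robin main loop: poll each device in turn and service
   whichever is ready. No interrupts, no priorities, no shared data. */
typedef struct {
    bool (*ready)(void);
    void (*service)(void);
} device_t;

void round_robin(device_t devs[], int ndevs, int iterations)
{
    for (int it = 0; it < iterations; it++)   /* a real system loops forever */
        for (int i = 0; i < ndevs; i++)       /* service order = loop position */
            if (devs[i].ready())
                devs[i].service();
}

/* Stub devices for demonstration. */
int a_count = 0, b_count = 0;
bool a_ready(void)   { return true; }         /* always needs service */
void a_service(void) { a_count++; }
bool b_ready(void)   { return b_count < 2; }  /* needs service only twice */
void b_service(void) { b_count++; }
```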
Round Robin: Pros and Cons
Pros: simple; no shared data; no interrupts.
Cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities
Round Robin with Interrupts: Pros and Cons
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., A B A B A D)
o Improves the response of A
o Increases the latency of other tasks
o Move some task code into the interrupt
o Decreases the response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
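The ISR-sets-a-flag pattern of this architecture can be sketched as below. The names are hypothetical; `volatile` matters because the flags are shared between interrupt and task code.

```c
#include <stdbool.h>

/* The ISR does only the urgent hardware work and sets a flag; the main
   loop polls the flags and does the lower-priority follow-up. */
volatile bool rx_flag = false;
volatile unsigned char rx_byte;
unsigned bytes_processed = 0;

void uart_rx_isr(unsigned char hw_data)   /* would be wired to the UART vector */
{
    rx_byte = hw_data;                    /* urgent: grab data from hardware */
    rx_flag = true;                       /* leave follow-up for the main loop */
}

void main_loop_once(void)
{
    if (rx_flag) {                        /* lower-priority follow-up */
        rx_flag = false;
        bytes_processed++;                /* e.g. parse or buffer rx_byte */
    }
    /* ... poll the other devices' flags in round-robin order ... */
}
```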
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose the order in which to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Best response time can be improved by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
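A minimal sketch of the queue itself: a fixed-size ring buffer stands in for whatever queue a real system uses, and a real ISR-shared queue would also need interrupts disabled around the head/tail updates. Names are illustrative.

```c
/* Interrupts enqueue function pointers; the main routine dequeues
   and calls them. */
#define QSIZE 8
typedef void (*task_fn)(void);

static task_fn task_queue[QSIZE];
static int q_head = 0, q_tail = 0;

int enqueue_task(task_fn f)              /* called from interrupt routines */
{
    int next = (q_tail + 1) % QSIZE;
    if (next == q_head) return 0;        /* queue full: drop or flag overrun */
    task_queue[q_tail] = f;
    q_tail = next;
    return 1;
}

int run_next_task(void)                  /* called from the main routine */
{
    if (q_head == q_tail) return 0;      /* nothing pending */
    task_fn f = task_queue[q_head];
    q_head = (q_head + 1) % QSIZE;       /* FIFO here; any priority order works */
    f();
    return 1;
}

/* Demonstration task. */
int ticks = 0;
void tick_task(void) { ticks++; }
```

Replacing the FIFO dequeue with a priority-ordered pick is exactly the "any algorithm" flexibility listed in the pros.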
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows the relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems: Pros and Cons
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality is pre-written
o Generally come with vendor tools
Cons:
o The RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
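The allocation arithmetic above amounts to ceiling division, sketched here with a hypothetical helper:

```c
/* A job of total_ms needs ceil(total_ms / quantum_ms) time slices,
   the last one possibly only partially used. */
int slices_needed(int total_ms, int quantum_ms)
{
    return (total_ms + quantum_ms - 1) / quantum_ms;   /* ceiling division */
}
```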
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same round-robin scheduler concept, and it can be implemented using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput: the number of processes that complete their execution per time unit
o Latency, specifically:
  o Turnaround: the total time between submission of a process and its completion
  o Response time: the amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time: equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, the degree of concurrency to be supported at any one time (i.e., whether a high or low number of processes are to be executed concurrently), and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUIs would seem sluggish. The long-term queue exists on the hard disk or in virtual memory. [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers; a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access ) 35G cellular system channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined by channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
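The reason FIFO turnaround and waiting times are high can be made concrete: each process waits for the sum of all earlier burst times, so one long process delays everything behind it. A sketch with a hypothetical helper:

```c
/* Total waiting time under FCFS: process i waits for the sum of the
   burst times of all processes ahead of it in the queue. */
int fcfs_total_wait(const int burst[], int n)
{
    int wait = 0, total = 0;
    for (int i = 0; i < n; i++) {
        total += wait;        /* process i waits for all earlier bursts */
        wait += burst[i];
    }
    return total;
}
```

With bursts of 24, 3, and 3 time units, the waits are 0, 24, and 27 (total 51); had the short jobs run first, the total would be only 3 + 6 = 9, which is the intuition behind shortest-job-first.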
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes, not on the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems.
Overview
Scheduling algorithm         CPU overhead   Throughput   Turnaround time   Response time
First In First Out           Low            Low          High              High
Shortest Job First           Medium         High         Medium            Medium
Priority-based scheduling    Medium         Low          High              High
Round-robin scheduling       High           Medium       Medium            High
Multilevel queue scheduling  High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target The target usually has a small amount of software to talk to the host system
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint The simplest form of breakpoint lets the user specify an address at which the program's execution is to break When the PC reaches that address control is returned to the monitor program from which the user can examine and/or modify CPU registers after which execution can be continued The advantage of the breakpoint is that it does not require using exceptions or external devices
LEDs also play an important role in debugging LEDs can be used to show error conditions to show when the code enters certain routines or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging It can sample or analyze several different signals simultaneously but displays only 0 1 or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated against each other at the same time The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs) video games telephones and many other systems whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions whereas in a Harvard architecture there are separate memories for data and program The Harvard architecture is widely used today for its higher performance which results from the separation of program and data memories having two memories with separate ports provides higher memory bandwidth Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks such as string searching they also generally use a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles making CISC unsuitable for high-performance processors
Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions The instructions are also chosen so that they can be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a sequence of RISC instructions to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]        ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture like the ARM7 are von Neumann machines whereas the ARM9 uses a Harvard architecture However the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte not a word Word 0 in the ARM address space is at location 0 word 1 is at 4 word 2 at 8 and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
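The two byte orders can be made concrete with a short host-side sketch (the helper name is illustrative): it shows where each byte of a 32-bit word lands in memory under each mode.

```python
import struct

def word_bytes(value, little_endian=True):
    """Return the four bytes of a 32-bit word in memory order
    (lowest address first) for the chosen endianness."""
    fmt = '<I' if little_endian else '>I'
    return list(struct.pack(fmt, value))

# For 0x11223344, little-endian puts the low-order byte 0x44 at the
# lowest address; big-endian puts the high-order byte 0x11 there.
```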
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating-point-intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;
The SHARC uses different word sizes and address-space sizes for instructions and data A SHARC instruction is 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long When 32-bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units an ALU a multiplier and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT) sticky (STKY) and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero) AN (ALU result negative) AV (ALU result overflow) AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared automatically STKY bits remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values In saturation arithmetic an overflow yields the maximum-range value not the result of wrapping around the numeric range Saturation mode is controlled by the ALUSAT bit in the MODE1 register
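The difference between wrapping and saturating overflow can be shown with a small host-side sketch (the function names are illustrative, not SHARC instructions): one add wraps around the 32-bit two's-complement range, the other clamps at the range limits as saturation mode does.

```python
INT32_MAX, INT32_MIN = 2**31 - 1, -2**31

def wrap_add(a, b):
    """Ordinary 32-bit two's-complement addition: wraps on overflow."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s > INT32_MAX else s

def saturating_add(a, b):
    """Addition as in saturation mode: clamp to the range limits
    instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, a + b))
```

Adding 1 to the largest positive value wraps to the most negative value in ordinary arithmetic, but stays pinned at the maximum in saturation mode.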
The multiplier performs fixed-point and floating-point multiplication It can also perform saturation rounding and setting the result to 0 Fixed-point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there
The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The shift distance supplied in the Ry register may be positive for a left shift or negative for a right shift Shift operations set the SZ (shifter zero) SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load/store architecture operands must be loaded into registers before being operated on The SHARC supplies two special register sets that control loading and storing called DATA ADDRESS GENERATORS (DAGs) one for data memory and the other for program memory
Each DAG has eight sets of primary registers Registers 0 through 7 belong to DAG1 while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred When an interrupt occurs the CPU saves the contents of its registers and jumps to the ISR After the ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel In a preemptive kernel the highest priority task gets executed In an RTOS the interrupt latency interrupt response time and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled plus the time to start executing the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code (or to the highest priority task) In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction In a preemptive kernel it is equal to the time to check whether a higher priority task is ready plus the time to restore the CPU context of the highest priority task plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer Semaphores are of two types counting semaphores which can take integer values greater than 1 and binary semaphores which take values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
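The acquire/release calls above can be sketched as a toy counting semaphore (this class is an illustration, not any particular RTOS API): acquire succeeds while the count is positive, and release gives the resource back. A binary semaphore is just the special case where the count is at most 1.

```python
class CountingSemaphore:
    """Minimal counting-semaphore sketch. In a real RTOS a failed
    acquire would block the calling task; here it just returns False."""
    def __init__(self, count):
        self.count = count          # number of available resources
    def acquire(self):
        if self.count > 0:
            self.count -= 1
            return True
        return False                # resource unavailable: task would block
    def release(self):
        self.count += 1
```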
MESSAGE QUEUES
A message queue can be considered an array of mailboxes At the time of creating a queue it is given a name or ID a queue length a sending-task waiting list and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and depending on the application the highest priority task or the first task waiting in the queue takes the message
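A bounded message queue of the kind described can be sketched as follows (the class and method names are illustrative, not a specific RTOS API): an ISR or task deposits messages up to the queue length, and a receiving task takes them in order.

```python
from collections import deque

class MessageQueue:
    """Bounded FIFO message queue sketch. In a real RTOS a send to a
    full queue or a receive from an empty one could block the task."""
    def __init__(self, length):
        self.length = length
        self.messages = deque()
    def send(self, msg):
        if len(self.messages) == self.length:
            return False            # queue full: sender would wait or drop
        self.messages.append(msg)
        return True
    def receive(self):
        return self.messages.popleft() if self.messages else None
```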
PIPES
A pipe is a kernel object for inter-task communication It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture but with several enhancements to further improve performance The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields All the ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow load/store of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in user mode They are
R15----------- it is the program counter but can be manipulated as a general-purpose register
R13----------- it is used as the stack pointer
R14----------- it has special significance and is called the link register When a procedure call is made the return address is automatically stored in this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run application code Once in user mode the CPSR cannot be written to
o Fast interrupt processing mode (FIQ) supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC) it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF) it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to a memory fault
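The CPSR layout described above (condition flags in the top four bits, the T bit, and the 5-bit mode field in the low bits) can be illustrated with a small decoder sketch; the function name is illustrative, and the bit positions follow the standard ARM layout.

```python
def decode_cpsr(cpsr):
    """Pick apart a CPSR value: N,Z,C,V live in bits 31..28, I and F in
    bits 7 and 6, T in bit 5, and the mode field in bits 4..0."""
    return {
        'N': (cpsr >> 31) & 1,      # negative
        'Z': (cpsr >> 30) & 1,      # zero
        'C': (cpsr >> 29) & 1,      # carry
        'V': (cpsr >> 28) & 1,      # overflow
        'I': (cpsr >> 7) & 1,       # normal interrupt disable
        'F': (cpsr >> 6) & 1,       # fast interrupt disable
        'T': (cpsr >> 5) & 1,       # THUMB state
        'mode': cpsr & 0x1F,        # execution mode field
    }
```

For example 0x60000013 has Z and C set and a mode field of 0x13 (supervisor mode in the ARM encoding).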
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format However most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations such as addition subtraction and multiplication and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single-register transfers and multiple-register transfers Single-register transfer instructions can be used to transfer 1 2 or 4 bytes of data between a register and a memory location Multiple-register load/store operations are carried out via multiple-register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provides several versions of multiplication These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
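The PC-relative branch can be illustrated with a small sketch of the standard ARM encoding (the function name is illustrative): the 24-bit signed offset in the instruction is shifted left by 2 (word-aligned targets) and added to the PC, which reads as the instruction address plus 8 because of the pipeline. This is what limits the range to roughly ±32 MB.

```python
def branch_target(instr_addr, offset24):
    """Compute an ARM branch target from the raw 24-bit offset field."""
    if offset24 & 0x800000:          # sign-extend the 24-bit field
        offset24 -= 1 << 24
    # PC reads as instruction address + 8 (pipeline effect)
    return instr_addr + 8 + (offset24 << 2)
```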
THUMB
o These are 16 bits in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15 A reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception the processor always enters ARM instruction-set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupied
o It is faster when the memory is organized as 16-bit However ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS Shared variables are not required for this purpose
No loop in our code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
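The scheduling decision the RTOS makes can be sketched in a few lines (the function name and task data are illustrative): among the tasks that are ready, the highest-priority one gets the processor; blocked tasks are ignored.

```python
def pick_task(tasks):
    """Choose the task to run as a priority-based RTOS scheduler would.
    tasks: list of (name, priority, state) tuples; higher number = higher
    priority. Returns the chosen task name, or None if all are blocked."""
    ready = [t for t in tasks if t[2] == 'ready']
    if not ready:
        return None                 # everything blocked: the RTOS idles
    return max(ready, key=lambda t: t[1])[0]
```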
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOS a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine is the starting point for each task and some other parameters such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context which includes the register values a program counter and a stack All other data (global static initialized un-initialized and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks it is easy to move data from one task to another However sharing data can create bugs leading to the shared-data problem
Shared-data problem it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task Therefore in a single-processor system only one task can be in the running state at any given time
Ready It means some other task is in the running state but this task has things that it could do if the microprocessor becomes available Any number of tasks can be in this state
Blocked It means this task has nothing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state When a task is blocked it never gets the microprocessor An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state Otherwise the task will be blocked forever
Scheduler controls the running state Scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked?
If all the tasks are blocked then the scheduler spins in some tight loop somewhere inside the RTOS waiting for something to happen
What happens if two tasks are ready with the same priority?
It depends upon the RTOS In some systems it is illegal to assign the same priority to two tasks In some RTOSs time slicing between the tasks is done In others one task is run until it goes to the blocked state before the other is run
If a higher priority task unblocks what happens to the running task?
In a preemptive RTOS the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes including physical size cost power consumption design time reliability and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful" The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the design and integrates the components
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals the process may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process including manufacturing HW/SW design marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
Control of the CAN bus is arbitrated using a technique known as CSMA/AMP (carrier sense multiple access with arbitration on message priority) It is similar to the I2C bus arbitration method Network nodes transmit synchronously so they all start sending their identifier fields at the same time When a node hears a dominant bit in the identifier while it is trying to send a recessive bit it stops transmitting By the end of the arbitration field only one transmitter will be left The identifier field acts as a priority identifier with the all-0 identifier having the highest priority
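The bitwise arbitration described above can be sketched as a small simulation (the function name is illustrative): nodes send their identifiers MSB first, and any node that sends a recessive 1 while another sends a dominant 0 drops out, so the lowest (highest-priority) identifier wins without any bus collision.

```python
def arbitrate(identifiers, width=11):
    """Simulate CAN bitwise arbitration over an 11-bit identifier field.
    Returns the identifier of the node that wins the bus."""
    contenders = list(identifiers)
    for bit in reversed(range(width)):          # MSB first
        dominant = [i for i in contenders if not (i >> bit) & 1]
        if dominant:
            contenders = dominant   # recessive (1) senders hear 0 and back off
    return contenders[0]
```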
Remote frame A remote frame is used to request data from another node The requestor sets the RTR bit to 0 to specify a remote frame it also specifies zero data bits The node specified in the identifier field will respond with a data frame that has the requested value
Error handling An error frame can be generated by any node that detects an error on the bus Upon detecting an error a node interrupts the current transmission with an error frame which consists of an error flag followed by an error delimiter field of 8 recessive bits The error delimiter field allows the bus to return to the quiescent state so that data frame retransmission can resume
Overload frame An overload frame signals that a node is overloaded and will not be able to handle the next message The node can delay the transmission of the next frame with up to two overload frames in a row
The CRC field can be used to check a message's data field for correctness If a transmitting node does not receive an acknowledgement for a data frame it should retransmit the data frame until the data is acknowledged
COMMUNICATION INTERFACINGS These are used to communicate with the external world like transmitting data to a host PC or interacting with another embedded system for sharing data etc
RS232/UART
It is a standard for serial communication developed by the Electronic Industries Association (EIA)
It is used to connect a data terminal equipment(DTE) to a data circuit terminating equipment (DCE)
A data terminal equipment (DTE) can be a PC serial printer or plotter and a data circuit terminating equipment (DCE) can be a modem mouse digitizer or scanner
Communication between the two devices is full duplex ie data transfer can take place in both directions
In RS232 the sender gives out data character by character The bits corresponding to a character are called data bits
Data bits are prefixed by a 'start' bit and suffixed by a 'stop' bit In addition a parity check bit is added which is useful for error checking on the receiver side
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters But using RS232 cable a distance of up to 100 meters can be achieved
The possible data rates depend upon the UART chip and the clock used
Communication parameters For the two devices to communicate in a meaningful way the parameters mentioned below should be set identically on both systems
Data rate it represents the rate at which the data communication takes place PCs support 50, 150, 300, ..., 115200 bps
Data bits: the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits: they identify the beginning and the end of a character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6 data bits, two stop bits are appended
Parity bit It is added for error checking on the receiver side
Flow control: useful when the sender pushes out data at a rate higher than the receiver can absorb. It is a protocol to stop and resume data transmission, also known as handshake. It can be implemented in hardware or in software
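The start/stop/parity framing described above can be illustrated with a small C sketch that builds the bit sequence for one character. The function name and the choice of 8 data bits with even parity are illustrative assumptions, not part of any particular UART:

```c
#include <assert.h>
#include <stdint.h>

/* Frame one character as: start bit + 8 data bits (LSB first) +
 * even parity bit + stop bit. Writes the bits into out[] and
 * returns how many bits were produced. */
static int frame_char(uint8_t ch, int out[11]) {
    int n = 0, parity = 0;
    out[n++] = 0;                     /* start bit is a 0 (space) */
    for (int i = 0; i < 8; i++) {     /* data bits, least significant first */
        int bit = (ch >> i) & 1;
        parity ^= bit;                /* running XOR of the data bits */
        out[n++] = bit;
    }
    out[n++] = parity;                /* even parity: makes the count of 1s even */
    out[n++] = 1;                     /* stop bit is a 1 (mark) */
    return n;
}
```

For the character 'A' (0x41), which has two 1-bits, the parity bit comes out 0 and the full frame is 11 bits long.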
RS232 connector configurations
It specifies two types of connectors 9 pin and 25 pin
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is with respect to the local ground; hence RS232 uses unbalanced transmission
UART chip: Universal Asynchronous Receiver/Transmitter chip
It has two sections receive section and transmit section
The receive section receives data, converts it from serial to parallel form, and gives it to the processor
The transmit section takes data from the processor and converts it from parallel to serial form
It also adds the start, stop, and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 interface and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy chaining: To determine the priority of devices that send interrupt requests, this method is adopted. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain.
The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sends a request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledgement signal on to the next device. This procedure is defined as below:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 at its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU. The farther the device is from the first position, the lower is its priority.
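The PI/PO propagation just described can be modeled with a short C sketch. Device 0 has the highest priority, and the `requesting` array is an illustrative stand-in for the devices' interrupt lines:

```c
#include <assert.h>

/* Simulate daisy-chain acknowledge propagation: requesting[i] is 1 if
 * device i has a pending interrupt. Returns the index of the device that
 * captures the acknowledge (PI=1, PO=0), or -1 if none is requesting. */
static int daisy_chain_grant(const int requesting[], int n) {
    int pi = 1;                  /* CPU asserts acknowledge into device 0 */
    for (int i = 0; i < n; i++) {
        if (pi && requesting[i])
            return i;            /* this device forces PO=0 and supplies its VAD */
        /* not requesting: PO = PI, so the acknowledge passes down the chain */
    }
    return -1;
}
```

Because the acknowledge enters the chain at device 0, a requesting device always shadows every device behind it, which is exactly the fixed-priority behaviour described above.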
The slowest device participating in the control and data transfer handshakes determines the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions
IEEE 488 connector has 24 pins The bus employs 16 signal lines---eight bidirectional lines used for data transfer three for handshake and five for bus management---plus eight ground return lines
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture: The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple; no shared data; no interrupts
Cons:
o Max delay is the maximum time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
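A minimal round-robin main loop might look like the following C sketch. The device-ready checks and service routines are hypothetical stand-ins for real hardware polling:

```c
#include <assert.h>

/* Simulated device state: need_X is set when device X wants service. */
static int need_a, need_b, served_a, served_b;

static int  device_a_ready(void) { return need_a; }
static void service_a(void)      { need_a = 0; served_a++; }
static int  device_b_ready(void) { return need_b; }
static void service_b(void)      { need_b = 0; served_b++; }

/* One pass of the round-robin loop: check each device in a fixed
 * order and service whichever needs it. A real system would wrap
 * this in an endless while (1) loop. */
static void round_robin_iteration(void) {
    if (device_a_ready()) service_a();
    if (device_b_ready()) service_b();
}
```

Note that a device checked late in the loop can wait nearly a full traversal before being serviced, which is the worst-case delay listed in the cons above.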
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other task execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (e.g., A B A C A D)
o Improves response of A
o Increases latency of other tasks
o Move some task code into the interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
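The flag-setting pattern behind round robin with interrupts can be sketched as follows. `uart_isr` is a hypothetical interrupt handler, simulated here by an ordinary function call:

```c
#include <assert.h>

/* Flag shared between the ISR and task code: volatile because it is
 * modified asynchronously from the main loop's point of view. */
static volatile int data_ready;
static int processed;

/* Would be registered as an interrupt handler on real hardware.
 * It does only the urgent work and sets a flag. */
static void uart_isr(void) {
    /* ...read the hardware data register here... */
    data_ready = 1;              /* signal follow-up work to the main loop */
}

/* One pass of the main loop: lower-priority follow-up processing. */
static void main_loop_once(void) {
    if (data_ready) {
        data_ready = 0;
        processed++;             /* the non-urgent part of the work */
    }
}
```

A production version would also guard against the race where the ISR fires between testing and clearing the flag, typically by briefly disabling interrupts.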
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
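A function-queue skeleton in C might look like this. The queue size and names are illustrative, and a real version would disable interrupts around queue accesses since ISRs and the main routine share it:

```c
#include <assert.h>

typedef void (*task_fn)(void);

/* Small circular queue of function pointers. */
#define QCAP 8
static task_fn queue[QCAP];
static int head, tail;

/* Called from ISRs: enqueue the follow-up work. */
static void enqueue(task_fn f) { queue[tail++ % QCAP] = f; }

/* Main routine: drain the queue, calling each queued function.
 * This version runs FIFO; a priority version could pick any entry. */
static void main_routine(void) {
    while (head != tail)
        queue[head++ % QCAP]();
}

/* A sample task function for demonstration. */
static int ran;
static void work(void) { ran++; }
```

The key property is that response time for the highest-priority function is bounded by the longest single queued function, not by a whole loop traversal.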
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally comes with vendor tools
Cons:
o The RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
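The slice arithmetic in the example above generalizes as in this small C sketch (the function name is illustrative):

```c
#include <assert.h>

/* Split a job's total CPU time into round-robin slices of at most
 * quantum_ms each. Writes each slice length into out[] and returns
 * the number of slices (capped at cap). */
static int rr_slices(int total_ms, int quantum_ms, int out[], int cap) {
    int n = 0;
    while (total_ms > 0 && n < cap) {
        int slice = total_ms < quantum_ms ? total_ms : quantum_ms;
        out[n++] = slice;          /* full quantum, or the remainder */
        total_ms -= slice;
    }
    return n;
}
```

With a total of 250 ms and a 100 ms quantum this yields the three allocations listed above: 100, 100, and 50 ms.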
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer switch or router that provides round-robin scheduling has a separate queue for every data flow where a data flow may be identified by its source and destination address The algorithm lets every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order The scheduling is work-conserving meaning that if one flow is out of packets the next data flow will take its place Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-Robin Scheduling in UNIX: this follows the same concept of a round-robin scheduler, and it can be implemented using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
o Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives
In real-time environments such as embedded systems for automatic control in industry (for example robotics) the scheduler also must ensure that processes can meet deadlines this is
crucial for keeping the system stable Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in the Main Memory) that is when an attempt is made to execute a program its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - ie whether a high or low amount of processes are to be executed concurrently and how the split between IO intensive and CPU intensive processes is to be handled In modern operating systems this is used to make sure that real time processes get enough CPU time to finish their tasks Without proper real time scheduling modern GUI interfaces would seem sluggish The long term queue exists in the Hard Disk or the Virtual Memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access ) 35G cellular system channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined by channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimations about, the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
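The claim above that overall waiting time under shortest-job-first ordering is smaller than under FIFO can be checked with a small C sketch, for the simple case where all jobs arrive at time 0 and run to completion:

```c
#include <assert.h>

/* Total waiting time when jobs run to completion in the given order:
 * job i waits until all jobs before it in the array have finished. */
static int total_wait(const int burst[], int n) {
    int wait = 0, clock = 0;
    for (int i = 0; i < n; i++) {
        wait += clock;       /* job i has waited 'clock' time units */
        clock += burst[i];   /* job i now runs for burst[i] */
    }
    return wait;
}
```

For bursts of 24, 3, and 3 time units, FIFO order gives a total wait of 0 + 24 + 27 = 51, while SJF order (3, 3, 24) gives 0 + 3 + 6 = 9, illustrating why no process has to wait behind the longest one.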
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time: waiting time is dependent on the number of processes, not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups For example a common division is made between foreground (interactive) processes and background (batch) processes These two types of processes have different response-time requirements and so may have different scheduling needs it is very useful for shared memory problem
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called host The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as target The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC (program counter) reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of a breakpoint is that it does not require the use of exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the emulator is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification: it allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are:
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use a Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
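The difference between the two byte orders can be sketched in C. This is an illustration of the concept only, not ARM-specific code: extracting "byte 0" of a 32-bit word yields different bits depending on the configured mode.

```c
#include <stdint.h>

/* Byte 0 of a 32-bit word, as the two endianness modes would address it. */
uint8_t byte0_little_endian(uint32_t word) {
    return (uint8_t)(word & 0xFF);          /* lowest-order byte */
}

uint8_t byte0_big_endian(uint32_t word) {
    return (uint8_t)((word >> 24) & 0xFF);  /* highest-order byte */
}
```

For the word 0x11223344, byte 0 is 0x44 in little-endian mode but 0x11 in big-endian mode.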
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits long, a basic data word 32 bits, and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared by subsequent operations; they remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic an overflow produces the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
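Saturating addition can be sketched in C. This models the behavior that the ALUSAT bit enables, as an illustrative sketch rather than the processor's exact datapath:

```c
#include <stdint.h>

/* 32-bit signed saturating addition: on overflow the result clamps to
   the maximum-range value instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so overflow is visible */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

With saturation off, INT32_MAX + 1 would wrap to INT32_MIN; with saturation on, the result stays pinned at INT32_MAX, which is usually the less harmful behavior for signal data.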
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing, called data address generators (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
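These two modes can be simulated in C. The function names and the flat `mem` array are illustrative stand-ins; on the SHARC the update happens in the DAG hardware, not in software:

```c
#include <stdint.h>

/* Post-modify with update: fetch from address I, then I += M. */
int32_t postmodify_fetch(const int32_t *mem, uint32_t *I, int32_t M) {
    int32_t value = mem[*I];
    *I = (uint32_t)((int32_t)*I + M);   /* I updated for the next access */
    return value;
}

/* Base-plus-offset: fetch from I + M, leaving I unchanged. */
int32_t base_offset_fetch(const int32_t *mem, uint32_t I, int32_t M) {
    return mem[I + (uint32_t)M];
}
```

Post-modify is what lets a loop sweep through an array with no explicit address arithmetic in the loop body.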
The DAGs also support circular buffers which are commonly used in signal processing
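A minimal circular-buffer sketch in C shows the wrap-around behavior that the SHARC DAGs provide automatically in hardware (the struct and names here are illustrative):

```c
#include <stdint.h>

#define CBUF_LEN 4

typedef struct {
    int32_t data[CBUF_LEN];
    uint32_t index;            /* next write position */
} cbuf_t;

/* Write a value; when the index walks past the end it wraps to the start,
   as a DSP filter's delay line requires. */
void cbuf_push(cbuf_t *b, int32_t v) {
    b->data[b->index] = v;
    b->index = (b->index + 1) % CBUF_LEN;
}
```

In SHARC hardware the modulo step costs nothing: the DAG performs the wrap as part of the address update.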
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
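Bit-reversal mirrors the bits of an address, which is the access order the FFT's butterfly stages need for an N-point transform (N a power of two). A software sketch of the address transformation:

```c
#include <stdint.h>

/* Mirror the low 'bits' bits of x: for a 3-bit address space,
   001 -> 100 and 110 -> 011. */
uint32_t bit_reverse(uint32_t x, unsigned bits) {
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (x & 1);   /* shift lowest bit of x into r */
        x >>= 1;
    }
    return r;
}
```

The DAG performs this reversal during address generation, so FFT code avoids a separate reordering pass over the data.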
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services include memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU registers (context)
Interrupt recovery time The time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
At its core it is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue takes the message
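The "array of mailboxes" idea can be sketched as a fixed-size queue in C. All names here are illustrative, not a real kernel API, and a real RTOS would block tasks on the waiting lists instead of returning an error:

```c
#include <stdint.h>

#define Q_LEN 4

typedef struct {
    int32_t slots[Q_LEN];           /* the "mailboxes" */
    unsigned head, tail, count;
} msgq_t;

/* Deposit a message; 0 on success, -1 if the queue is full. */
int msgq_send(msgq_t *q, int32_t msg) {
    if (q->count == Q_LEN) return -1;
    q->slots[q->tail] = msg;
    q->tail = (q->tail + 1) % Q_LEN;
    q->count++;
    return 0;
}

/* Take the oldest message; 0 on success, -1 if the queue is empty. */
int msgq_receive(msgq_t *q, int32_t *msg) {
    if (q->count == 0) return -1;
    *msg = q->slots[q->head];
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 0;
}
```

An ISR reading a sensor would call msgq_send, and a task would drain the queue with msgq_receive at its own pace.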
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
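Desktop operating systems expose the same idea through the POSIX pipe() call, which can be used to try the pattern outside an RTOS. This sketch assumes a POSIX environment; the "sensor:42" payload is just illustrative:

```c
#include <string.h>
#include <unistd.h>

/* One task writes into the pipe, the other reads the bytes back out. */
int pipe_demo(char *out, int outlen) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    const char *msg = "sensor:42";
    (void)write(fd[1], msg, strlen(msg) + 1);  /* producer side */
    (void)read(fd[0], out, (size_t)outlen);    /* consumer side */
    close(fd[0]);
    close(fd[1]);
    return 0;
}
```

In an RTOS the two ends would belong to two different tasks, with the kernel blocking the reader until data arrives.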
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields All ARM instructions are 32 bits long and most of them have
a regular three operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general-purpose register
R13----------- it is used as the stack pointer
R14----------- it has some special significance and is called the link register When a procedure call is made the return address is automatically stored in this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode (UNDEF) it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
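The CPSR fields described above can be picked apart in C. The bit positions used below are the architectural ones (condition flags N, Z, C, V in bits 31..28, the mode field in bits 4..0); the helper names themselves are just illustrative:

```c
#include <stdint.h>

/* Condition flags live in the top four bits of the CPSR. */
unsigned cpsr_n(uint32_t cpsr)    { return (cpsr >> 31) & 1; }  /* negative */
unsigned cpsr_z(uint32_t cpsr)    { return (cpsr >> 30) & 1; }  /* zero     */
unsigned cpsr_c(uint32_t cpsr)    { return (cpsr >> 29) & 1; }  /* carry    */
unsigned cpsr_v(uint32_t cpsr)    { return (cpsr >> 28) & 1; }  /* overflow */

/* The 5-bit mode field selects the execution mode. */
unsigned cpsr_mode(uint32_t cpsr) { return cpsr & 0x1F; }
```

For example, a CPSR value of 0x40000000 has only the Z flag set, meaning the last flag-setting instruction produced a zero result.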
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations; they then signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOS a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, like the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, uninitialized, and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem
Shared-data problem it arises when an interrupt routine and task code (or two tasks) share data, and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
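The classic shared-data scenario can be sketched in C: an ISR updates a tick count that task code reads, and interrupts are disabled around the read to make it atomic. The disable/enable calls here are hypothetical stand-ins (modeled with a plain flag) for the real port macros, which on a real CPU would touch the interrupt-enable bit:

```c
#include <stdint.h>

static volatile uint32_t tick_count;     /* shared between ISR and task */
static int interrupts_enabled = 1;

/* Stand-ins for the real enable/disable-interrupt primitives. */
void disable_interrupts(void) { interrupts_enabled = 0; }
void enable_interrupts(void)  { interrupts_enabled = 1; }

void timer_isr(void) { tick_count++; }   /* interrupt-routine side */

uint32_t read_ticks(void) {              /* task-code side */
    uint32_t t;
    disable_interrupts();                /* make the read atomic */
    t = tick_count;
    enable_interrupts();
    return t;
}
```

Without the atomic section, a multi-byte read on a small CPU could be interrupted halfway through and combine bytes from two different counter values.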
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
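The first rule can be illustrated with a pair of C functions (both names are illustrative). The first keeps state in a static variable shared by every caller, so a task switch mid-execution can corrupt it; the second keeps everything in parameters and locals on the caller's stack:

```c
static int shared_total;                 /* shared variable: breaks reentrancy */

/* NOT reentrant: two tasks calling this concurrently can interleave
   the read-modify-write of shared_total and lose an update. */
int add_non_reentrant(int x) {
    shared_total += x;
    return shared_total;
}

/* Reentrant: all data is passed in or lives on the caller's stack. */
int add_reentrant(int total, int x) {
    return total + x;
}
```

Each task calling add_reentrant gets its own arguments and locals on its own stack, so the RTOS can switch tasks at any instruction without ill effect.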
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked It means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and ISRs move tasks out of the blocked state When a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever
The scheduler controls the running state Scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other gets the processor
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent description of the system that can be used to create architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful" The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements Analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
This mode of communication is called asynchronous communication because no clock signal is transmitted
The RS232 standard specifies a distance of 19.2 meters, but using good RS232 cable a distance of up to 100 meters can be achieved
Possible Data rates depend upon the UART chip and the clock used
Communication parameters For the two devices to communicate in a meaningful way, the parameters mentioned below should be set identically on both systems
Data rate it represents the rate at which the data communication takes place. PCs support 50, 150, 300, …, 115200 bps
Data bits the number of bits transmitted for each character. It can be 5, 6, 7, or 8 bits
Start and stop bits they identify the beginning and end of the character. If the number of data bits is 7 or 8, one stop bit is appended; for 5 or 6 data bits, two stop bits are appended
Parity bit It is added for error checking on the receiver side
Flow control it is useful when the sender pushes out data at such a high rate that it cannot be absorbed by the receiver. It is a protocol to stop and resume data transmissions; it is also known as handshaking. It can be of hardware or software type
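The framing parameters above determine the effective throughput: each character carries a start bit, the data bits, an optional parity bit, and the stop bit(s). A small back-of-envelope sketch (the function name is illustrative):

```c
/* Characters per second for an asynchronous serial link:
   each frame is 1 start bit + data bits + parity bits + stop bits. */
unsigned chars_per_second(unsigned baud, unsigned data_bits,
                          unsigned parity_bits, unsigned stop_bits) {
    unsigned frame_bits = 1 + data_bits + parity_bits + stop_bits;
    return baud / frame_bits;
}
```

At 115200 bps with the common 8N1 setting (8 data bits, no parity, 1 stop bit), the frame is 10 bits, so the link carries 11520 characters per second, not 14400.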
RS232 connector configurations
It specifies two types of connectors: 9-pin and 25-pin
For transmission of 1s and 0s the voltage levels are defined in the standard Voltage levels are different for data and control signals
The voltage level is with respect to local ground, and hence RS232 uses unbalanced transmission
UART chip universal asynchronous receiver/transmitter chip
It has two sections receive section and transmit section
Receive section receives data, converts it from serial to parallel form, and gives the data to the processor
Transmit section takes the data from the processor and converts it from parallel format to serial format
It also adds start stop and parity bits
The voltage levels used in RS232 are different from those of embedded systems (5V), and RS232 carries data in serial form, whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 link and the processor compatible
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422 created to connect a number of devices up to 512 in a network
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair, half-duplex communication can be achieved; with two twisted pairs, full duplex
IEEE 488 BUS
It is a short-range digital communications bus specification, originally created by HP for use with automated test equipment
It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections
Daisy chaining this method is adopted to determine the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-logic OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. This procedure is defined as follows
A device with a 0 at its PI input generates a 0 on its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input will intercept the acknowledge signal by placing a 0 on its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 on its PO output. Thus the device with PI=1 and PO=0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device closest to the source of the interrupt acknowledge signal from the CPU. The farther a device is from the first position, the lower is its priority
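The PI/PO propagation rule reduces to "the first requesting device in the chain wins," which can be sketched in C (the function and the NO_DEVICE sentinel are illustrative):

```c
#define NO_DEVICE -1

/* Daisy-chain priority resolution: the acknowledge enters at device 0
   (highest priority) and propagates down the chain until a requesting
   device absorbs it (PI=1, PO=0) and supplies its vector address. */
int daisy_chain_grant(const int requesting[], int n) {
    for (int i = 0; i < n; i++) {
        if (requesting[i]) return i;  /* this device blocks PO */
        /* a non-requesting device passes the acknowledge on (PO=1) */
    }
    return NO_DEVICE;                 /* no interrupt pending anywhere */
}
```

With devices 1 and 2 both requesting, device 1 wins because it sits earlier in the chain, exactly as the PI/PO rules dictate.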
The slowest device participates in the control and data transfer handshakes, and thus determines the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines---eight bidirectional lines used for data transfer, three for handshake, and five for bus management---plus eight ground return lines
In 1975 the bus was standardized by the IEEE as the IEEE standard digital interface for programmable instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB but said nothing about the format of commands or data
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions as well as device-independent commands, data structures, error protocols, and the like
The IEEE 488.2 standard is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI Standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
The Commodore PET/CBM range of personal computers connected their disk drives, printers, modems, etc. via the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture
The best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
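The polling structure above can be sketched as a plain loop. This is an illustrative simulation only: the `Device` class and device names are hypothetical, and a real system would poll hardware status registers rather than Python objects.

```python
# Minimal round-robin polling loop (hypothetical devices, illustration only).
# Each device is polled in a fixed order; a device is serviced only when it
# reports that it needs attention. There are no interrupts and no priorities.

class Device:
    def __init__(self, name):
        self.name = name
        self.pending = False

    def needs_service(self):
        return self.pending

    def service(self):
        self.pending = False
        return f"serviced {self.name}"

def round_robin(devices, cycles):
    log = []
    for _ in range(cycles):        # stands in for the embedded "while (1)"
        for dev in devices:        # service order = position in the loop
            if dev.needs_service():
                log.append(dev.service())
    return log

a, b, c = Device("A"), Device("B"), Device("C")
b.pending = True
c.pending = True
log = round_robin([a, b, c], cycles=1)
```

Note the absence of priorities: if one device suddenly needs faster service, the only fix is restructuring the whole loop, which is why the architecture is fragile.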
Round Robin
Pros: simple, no shared data, no interrupts
Cons:
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros:
o Still relatively simple
o Hardware timing requirements better met
Cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other task execution times + execution times of any other interrupts that occur
o How could you fix this?
Round Robin with Interrupts - Adjustments
o Change the order in which flags are checked (e.g., check device A's flag between every other check: A, B, A, C, A, D)
o Improves response of A, but increases latency of the other tasks
o Move some task code into the interrupt
o Decreases response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
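The flag-based split between ISR and main loop can be sketched as follows. The flag names are illustrative stand-ins for the volatile flags a real ISR would set; here the "ISR" is just called directly.

```python
# Round robin with interrupts (sketch): the ISR does only the urgent work and
# sets a flag; the main loop polls the flags and performs the lower-priority
# follow-up processing. Flag names are hypothetical, not from any real system.

flags = {"serial": False, "timer": False}
results = []

def serial_isr():
    # The urgent part would run here (e.g., read a hardware data register);
    # everything else is deferred to task code by raising the flag.
    flags["serial"] = True

def main_loop_once():
    # Task-level code: every flag is checked at the same priority.
    for name in flags:
        if flags[name]:
            flags[name] = False
            results.append(f"handled {name}")

serial_isr()       # hardware would invoke this asynchronously
main_loop_once()
```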
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros:
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
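A minimal sketch of the function-queue idea, assuming a priority-ordered queue rather than FIFO. The `enqueue` helper and the priority numbers are invented for illustration; function pointers become plain Python callables.

```python
# Function-queue scheduling (sketch): interrupts enqueue callables with a
# priority; the main routine always pops the highest-priority entry, so the
# queue need not be FIFO. Lower number = higher priority.

import heapq

task_queue = []   # entries: (priority, sequence, function)
seq = 0

def enqueue(priority, fn):
    """Called from 'interrupt' context to defer work to task level."""
    global seq
    heapq.heappush(task_queue, (priority, seq, fn))
    seq += 1

ran = []
enqueue(5, lambda: ran.append("low"))
enqueue(1, lambda: ran.append("high"))   # enqueued later, but runs first

while task_queue:                         # the "main routine"
    _, _, fn = heapq.heappop(task_queue)
    fn()
```

The sequence counter breaks ties so that equal-priority functions run in arrival order and the heap never tries to compare the callables themselves.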
Real Time Operating System Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally comes with vendor tools
Cons:
o An RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.
Example: If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
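The walkthrough above can be checked with a few lines of arithmetic. The function below is a simulation of quantum allocation, not scheduler code.

```python
# The job1 example as a simulation: a 250 ms job under a 100 ms quantum
# receives 100 + 100 + 50 ms of CPU across three turns.

def rr_allocations(total_ms, quantum_ms):
    """Return the list of CPU slices a single job receives."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        run = min(quantum_ms, remaining)  # job self-terminates inside the last slice
        slices.append(run)
        remaining -= run
    return slices

slices = rr_allocations(250, 100)
```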
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
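A work-conserving round-robin packet scheduler can be sketched as follows; the flow names and packet labels are hypothetical.

```python
# Work-conserving round-robin over per-flow queues (sketch): flows take turns
# in a fixed, periodically repeated order; an empty flow is simply skipped, so
# the link never idles while any flow still has packets waiting.

from collections import deque

def rr_schedule(flows):
    """flows: dict of flow name -> deque of packets; returns transmission order."""
    order = []
    while any(flows.values()):
        for name, queue in flows.items():   # fixed repeated order
            if queue:                        # skip flows with no packets
                order.append((name, queue.popleft()))
    return order

flows = {"A": deque(["a1", "a2"]), "B": deque(["b1"]), "C": deque([])}
order = rr_schedule(flows)
```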
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can be created by using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput - number of processes that complete their execution per time unit
o Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency) thus a scheduler will implement a suitable compromise Preference is given to any one of the above mentioned concerns depending upon the users needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory. [Stallings, 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process with a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded. [Stallings, 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will, at a minimum, have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
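The hogging effect described above can be made concrete with a small simulation; the burst times below are invented for illustration.

```python
# FCFS/FIFO in numbers (illustrative burst times in ms): each job waits for
# everything queued ahead of it, so one long process inflates every later
# job's waiting time - this is why long processes can hog the CPU.

def fifo_waits(bursts):
    """Waiting time of each job when served strictly in arrival order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)   # job waits for all earlier jobs to finish
        clock += burst
    return waits

waits = fifo_waits([24, 3, 3])       # one long job arrives first
avg_wait = sum(waits) / len(waits)
```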
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
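Re-running the same illustrative bursts in shortest-job-first order shows the smaller overall waiting time claimed above (for the FIFO order 24, 3, 3, the average wait would be 17 ms).

```python
# Shortest-job-first on illustrative bursts (ms): serving the shortest work
# first minimizes the average waiting time compared with FIFO order.

def sjf_waits(bursts):
    """Waiting time of each job when served shortest-first (no arrivals mid-run)."""
    waits, clock = [], 0
    for burst in sorted(bursts):     # shortest estimated work first
        waits.append(clock)
        clock += burst
    return waits

waits = sjf_waits([24, 3, 3])        # served as 3, 3, 24
avg_wait = sum(waits) / len(waits)   # (0 + 3 + 6) / 3
```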
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes and not the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas a Harvard architecture has separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use a Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
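The two byte orders can be illustrated in a few lines; this is a host-side sketch of the byte layout, not ARM code.

```python
# Endianness sketch: the same 32-bit word laid out at byte addresses 0..3 in
# little-endian versus big-endian order.

word = 0x12345678

little = word.to_bytes(4, "little")  # lowest-order byte at the lowest address
big = word.to_bytes(4, "big")        # lowest-order byte at the highest address
```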
SHARC PROCESSOR
It is a family of DSPs that use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
The SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word is 32 bits, and an address is 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
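Saturation versus wraparound can be sketched for 32-bit signed values; this models the behaviour described above in plain code, not the SHARC instruction set itself.

```python
# Saturation arithmetic sketch: a fixed-point overflow clamps to the
# maximum-range value instead of wrapping around the numeric range.

def sat_add32(a, b):
    """Signed 32-bit saturating add (what ALUSAT mode produces on overflow)."""
    lo, hi = -2**31, 2**31 - 1
    return max(lo, min(hi, a + b))

def wrap_add32(a, b):
    """Ordinary wraparound 32-bit add, shown for contrast."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

saturated = sat_add32(2**31 - 1, 1)   # clamps at the largest positive value
wrapped = wrap_add32(2**31 - 1, 1)    # wraps to the most negative value
```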
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that are used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. Registers 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
The DAGs provide the following addressing modes:
The simplest addressing mode provides an immediate value that can represent an address.
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in it.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I + M, where I is the base and M is the modifier or offset.
The DAGs also support circular buffers, which are commonly used in signal processing.
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
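Two of these modes can be sketched in Python (this models the address arithmetic only, not SHARC code; the function names are illustrative):

```python
def postmodify_sweep(mem, i, m, n):
    """Post-modify with update: access the address in the I register,
    then add the modifier M -- sweeping through a range of addresses."""
    out = []
    for _ in range(n):
        out.append(mem[i])  # use the current I-register value
        i += m              # then update I by the modifier M
    return out, i

def circular_next(i, m, base, length):
    """Circular-buffer update: advance i by m, wrapping inside
    [base, base + length), as a DAG does in circular-buffer mode."""
    return base + (i - base + m) % length
```

The circular update is what lets a filter kernel walk a delay line forever without ever testing for the end of the buffer.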
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel, it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
A semaphore is a kernel object that is used for both resource synchronization and task
synchronization.
It is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than 1, and binary semaphores, which take values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
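As an illustration of task synchronization with a binary semaphore, here is a Python sketch using the standard threading module; the create/acquire/release operations mirror the generic calls listed above, not any particular RTOS API, and the task names are illustrative:

```python
import threading

# "Create" a binary semaphore with an initial count of 0: the consumer
# task will block until the producer task signals.
data_ready = threading.Semaphore(0)
shared = []

def producer_task():
    shared.append("reading")   # deposit the data
    data_ready.release()       # "release": signal the waiting task

def consumer_task():
    data_ready.acquire()       # "acquire": block until signaled
    return shared[0]           # safe: producer has already run

t = threading.Thread(target=producer_task)
t.start()
value = consumer_task()
t.join()
```

Because the consumer blocks on acquire() until the producer releases, the shared data is never read before it is written.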
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; depending on the application, the highest-priority task or the first task waiting in the queue takes the message.
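A toy Python sketch of a fixed-length message queue; the class and method names are illustrative, not a specific RTOS API:

```python
from collections import deque

class MessageQueue:
    """A fixed-length FIFO message queue (an 'array of mailboxes')."""
    def __init__(self, name, length):
        self.name = name        # queue name or ID
        self.length = length    # queue length fixed at creation
        self._q = deque()

    def send(self, msg):
        if len(self._q) >= self.length:
            return False        # full: a real RTOS would block or fail
        self._q.append(msg)
        return True

    def receive(self):
        if not self._q:
            return None         # empty: a real RTOS would block
        return self._q.popleft()  # first deposited message out

q = MessageQueue("keyboard", 2)
ok1 = q.send("key: A")
ok2 = q.send("key: B")
overflow = q.send("key: C")     # rejected: queue is full
first = q.receive()
```

A real kernel adds the waiting lists mentioned above so that senders and receivers can block instead of getting an immediate failure.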
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as the input of another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
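The create/read/write calls above map naturally onto the POSIX pipe primitive, which Python exposes as os.pipe; the producer/consumer framing here is illustrative:

```python
import os

# "Create" a pipe: it returns a read end and a write end.
r, w = os.pipe()

os.write(w, b"sensor:42")   # the producer task writes its output
os.close(w)                 # "close" the write end when done

data = os.read(r, 64)       # the consumer task reads it as input
os.close(r)                 # "close" the read end
```

In an RTOS the two ends would be held by two different tasks; the kernel buffers the bytes in between.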
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers
o Load-store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instructions: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To improve performance while keeping the architecture simple, a number of non-RISC features are introduced:
o Each instruction controls both the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading or storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15 ----------- the program counter, but it can be manipulated as a general-purpose register
R13 ----------- used as the stack pointer
R14 ----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F bits control the masking of normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of six execution modes, as follows:
o User mode: used to run application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
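The flag and field positions in the CPSR are architecturally fixed (N, Z, C, V in bits 31 to 28; I, F, T in bits 7 to 5; the mode number in bits 4:0). A small Python sketch that decodes a CPSR value:

```python
# Architecturally defined mode numbers for the modes listed above.
MODES = {0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ",
         0b10011: "SVC", 0b10111: "Abort", 0b11011: "UNDEF"}

def decode_cpsr(cpsr):
    """Extract the condition flags, control bits, and mode field."""
    return {
        "N": (cpsr >> 31) & 1,  # negative
        "Z": (cpsr >> 30) & 1,  # zero
        "C": (cpsr >> 29) & 1,  # carry
        "V": (cpsr >> 28) & 1,  # overflow
        "I": (cpsr >> 7) & 1,   # IRQ mask bit
        "F": (cpsr >> 6) & 1,   # FIQ mask bit
        "T": (cpsr >> 5) & 1,   # THUMB state bit
        "mode": MODES.get(cpsr & 0x1F, "?"),
    }
```

For example, a CPSR with the Z flag set and mode bits 0b10011 decodes as a zero result while in supervisor (SVC) mode.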
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
ARM has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional.
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers (the default)
o Any subset of the user bank of registers, when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
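The three variants can be sketched in Python; the 32-bit results model the truncating behavior of a 32-bit destination register (MUL, UMULL, and MLA are the usual ARM mnemonics for these operations):

```python
M32 = 0xFFFFFFFF

def mul32(a, b):
    """Integer multiply: only the low 32 bits of the product are kept."""
    return (a * b) & M32

def mul64(a, b):
    """Long multiply: the full 64-bit product is returned."""
    return a * b

def mla(a, b, acc):
    """Multiply-accumulate: a*b + acc, truncated to 32 bits."""
    return (a * b + acc) & M32
```

Note that 0x10000 * 0x10000 overflows a 32-bit result entirely, which is why the long-multiply form exists.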
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor, the branch instructions have the following features:
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
THUMB
o THUMB instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are expanded into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Greater code density
o Less power consumption
o Less space occupied
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: arises when an interrupt routine and task code (or multiple tasks) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of the program that cannot be interrupted.
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
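A Python sketch of the distinction, with a shared module-level list standing in for a static/global variable (the function names are illustrative):

```python
buffer = []  # shared data, like a static or global variable

def tally_nonreentrant(values):
    """Not reentrant: it uses the shared 'buffer' non-atomically, so if
    the RTOS switched tasks in the middle, a second caller would
    corrupt the first caller's partial results."""
    buffer.clear()
    for v in values:
        buffer.append(v * 2)
    return sum(buffer)

def tally_reentrant(values):
    """Reentrant: every variable lives in the caller's own stack frame,
    so any number of tasks can execute it at the same time safely."""
    local = [v * 2 for v in values]
    return sum(local)
```

Both return the same answer for a single caller; the difference only shows up when two tasks are inside the function at once.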
Task states
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
The scheduler is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it goes to the blocked state before the other gets a turn.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements:
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements: analyzes and determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic model of the design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps ensure that the product best meets the customer's needs.
The voltage levels used in RS232 differ from those of embedded systems (5 V), and RS232 carries data in serial form, whereas the processor uses data in parallel form. The importance of the UART chip lies in the fact that it makes the RS232 interface and the processor compatible.
The UART operates on 5 volts. The level conversion is done by a level shifter, and the signals are then passed on to the RS232 connector.
RS422
It is a standard for serial communication in noisy environments
The distance between the devices can be up to 1200 meters
Twisted copper cable is used as a transmitting medium
It uses balanced transmission
Two channels are used for transmit and receive paths
RS485
It is a variation of RS422, created to connect a number of devices (up to 512) in a network.
RS485 controller chip is used on each device
The network with RS485 protocol uses the master slave configuration
With one twisted pair half duplex communication and with two twisted pairs full duplex can be achieved
IEEE 488 BUS
It is a short range digital communications bus specification Originally created by HP for use with automated test equipment
It is also commonly known as HP-IB(Hewlett-Packard Interface Bus) and GPIB(General Purpose Interface Bus)
It allows up to 15 devices to share a single eight-bit parallel electrical bus by daisy-chaining connections.
Daisy chaining: this method is adopted to determine the priority of devices that send interrupt requests. The daisy-chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired-logic connection. If any device has its interrupt signal in the low state, the interrupt line goes low and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays high and no interrupts are recognized by the CPU. This is equivalent to a negative-logic OR operation. The CPU responds to an interrupt request by enabling the interrupt-acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU, and that device does not pass the acknowledgement signal on to the next device. The procedure works as follows:
A device with a 0 at its PI input generates a 0 at its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 at its PO output. If the device does not have a pending interrupt, it transmits the acknowledge signal to the next device by placing a 1 at its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy-chain arrangement gives the highest priority to the device that first receives the interrupt-acknowledge signal from the CPU; the farther a device is from the first position, the lower its priority.
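The PI/PO propagation rule can be sketched in Python; device 0 is nearest the CPU and therefore has the highest priority (the function name is illustrative):

```python
def daisy_chain(requests):
    """Propagate the acknowledge down the chain: each device drives
    PO = 1 only if its PI is 1 and it is not requesting. The first
    requesting device that sees PI = 1 wins (lowest index = highest
    priority). Returns the granted device's index, or None."""
    pi = 1                      # CPU asserts interrupt acknowledge
    for idx, req in enumerate(requests):
        if pi and req:
            return idx          # PI=1, PO=0: this device places its VAD
        pi = pi and not req     # otherwise pass the acknowledge along
    return None
```

For example, with devices 1 and 2 both requesting, device 1 blocks the acknowledge and device 2 never sees it.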
The slowest device participates in control and data transfer handshakes to determine the speed of the transaction
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbyte/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs 16 signal lines (eight bidirectional lines used for data transfer, three for handshake, and five for bus management) plus eight ground-return lines.
In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical, and basic protocol parameters of GPIB, but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols, and Common Commands for IEEE 488.1, provided basic syntax and format conventions, as well as device-independent commands, data structures, error protocols, and the like.
IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instrument (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbyte/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard has been a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM-range personal computers connected their disk drives, printers, modems, etc. by the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc. to their workstation products and to HP 2100 and HP 3000 minicomputers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of Architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS
Choosing an Architecture: the best architecture depends on several factors:
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round Robin
1. Simplest architecture
2. No interrupts
3. The main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros: simple; no shared data; no interrupts.
Cons:
o Maximum delay is the maximum time to traverse the loop, if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs a fast response time (buttons, serial ports, etc.)
o Lengthy processing adversely affects even soft deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin uses:
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user-initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service the hardware and (b) set flags
o The main routine checks the flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities
Pros:
o Still relatively simple
o Hardware timing requirements are better met
Cons:
o All task code still executes at the same priority
o Maximum delay is unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this? Adjustments:
o Change the order in which flags are checked (e.g., A B A C A D)
o Improves the response of A
o Increases the latency of other tasks
o Move some task code into the interrupt
o Decreases the response time of lower-priority interrupts
o May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose what order to execute the functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function
o Best response time can be improved by cutting long functions into several pieces
Cons:
o Worse response time for lower-priority code (no guarantee it will actually run)
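A minimal Python sketch of this architecture; the queue holds function references (the "pointers") and the main loop simply pops and calls them (plain FIFO here, though any priority rule could reorder the queue; the device names are illustrative):

```python
from collections import deque

task_queue = deque()
log = []

def handle_button():
    """Lower-priority follow-up work, run later by the main routine."""
    log.append("button handled")

def isr_button():
    """The 'interrupt routine' does only the urgent part, then queues
    the follow-up work as a function pointer."""
    task_queue.append(handle_button)

def main_loop(steps):
    """Main routine: pop and call queued functions, one per step."""
    for _ in range(steps):
        if task_queue:
            task_queue.popleft()()

isr_button()   # an interrupt fires...
main_loop(1)   # ...and the main loop later runs the queued handler
```

Replacing popleft() with a search for the highest-priority queued function is what gives this architecture its edge over round robin with interrupts.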
Real Time Operating System
Architecture:
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
Differences with the previous architectures:
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows the relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Pros:
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSs generally come with vendor tools
Cons:
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list, and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example: If the time slot is 100 milliseconds and job1 takes a total of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.
Job1 = Total time to complete 250 ms (quantum 100 ms)
1. First allocation = 100 ms
2. Second allocation = 100 ms
3. Third allocation = 100 ms, but job1 self-terminates after 50 ms
4. Total CPU time of job1 = 250 ms
Another approach is to divide all processes into an equal number of timing quanta, such that the quantum size is proportional to the size of the process. Hence all processes end at the same time.
Data packet scheduling
In best-effort packet switching and other statistical multiplexing, round-robin scheduling can be used as an alternative to first-come first-served queuing.
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited the longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another: a user that produces large packets would be favored over other users. In that case fair queuing would be preferable.
If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered.
In multiple-access networks, where several terminals are connected to a shared physical medium, round-robin scheduling may be provided by token-passing channel access schemes such as token ring, or by polling or resource reservation from a central control station.
In a centralized wireless packet radio network, where many stations share one frequency channel, a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness. However, if link adaptation is used, it will take a much longer time to transmit a certain amount of data to expensive users than to others, since the channel conditions differ. It would be more efficient to wait with the transmission until the channel conditions are improved, or at least to give scheduling priority to less expensive users. Round-robin scheduling does not exploit this. Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling, for example a proportionally fair algorithm or maximum throughput scheduling. Note that the latter is characterized by undesirable scheduling starvation.
Round-robin scheduling in UNIX follows the same concept, and such a scheduler can be created using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with:
Throughput - number of processes that complete their execution per time unit
Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time from when a request was submitted until the first response is produced
Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory. [Stallings, 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource. [Stallings, 396] [Stallings, 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded. [Stallings, 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU. [Stallings, 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency. [Galvin, 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them.
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
In advanced packet radio wireless networks such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with the assignment of OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
First In First Out, also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low, since long processes can hog the CPU.
Turnaround time, waiting time, and response time can be high, for the same reasons as above.
No prioritization occurs; thus this system has trouble meeting process deadlines.
The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.
It is based on queuing.
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimations about the time required for a process to complete.
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than FIFO, however, since no process has to wait for the termination of the longest process.
No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.
Starvation is possible, especially in a busy system with many small processes being run.
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.
Overhead is neither minimal nor significant; FPPS has no particular advantage in terms of throughput over FIFO scheduling.
Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times.
Deadlines can be met by giving processes with deadlines a higher priority.
Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes and not the average process length.
Because of high waiting times, deadlines are rarely met in a pure RR system.
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel Queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs are of Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line.
Example:
        LDR r0,[r8]      ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at location 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating-point-intensive computations.
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line.
Example:
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).
The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address.
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient, because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier, or offset.
The DAGs also support circular buffers, which are commonly used in signal processing.
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority task (in a preemptive kernel). In a non-preemptive kernel, it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is equal to the time to check whether a high-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction.
SEMAPHORES
A semaphore is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: a counting semaphore, whose value can be any non-negative integer, and a binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features are introduced:
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15: the program counter, but it can be manipulated as a general-purpose register
R13: used as a stack pointer
R14: has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags, when set, disable normal and fast interrupts, respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of the six execution modes as follows:
o User mode: used to run application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
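The difference between the 32-bit and long 64-bit multiply results can be illustrated with a short sketch. This is Python used purely for illustration (real ARM code would use the MUL, UMULL and MLA instructions); the masking models the fixed register width:

```python
def mul32(a, b):
    """32-bit multiply (like ARM MUL): keep only the low 32 bits."""
    return (a * b) & 0xFFFFFFFF

def umull(a, b):
    """Long unsigned multiply (like ARM UMULL): the full 64-bit product,
    split into a (high, low) register pair."""
    p = (a * b) & 0xFFFFFFFFFFFFFFFF
    return (p >> 32) & 0xFFFFFFFF, p & 0xFFFFFFFF

def mla32(a, b, acc):
    """Multiply-accumulate (like ARM MLA): a*b + acc, 32-bit result."""
    return (a * b + acc) & 0xFFFFFFFF

# A product that overflows 32 bits: only the long multiply preserves it.
hi, lo = umull(0x12345678, 0x1000)
assert mul32(0x12345678, 0x1000) == 0x45678000   # high bits lost
assert (hi, lo) == (0x123, 0x45678000)           # full 64-bit product
```

The same masking idea explains why a 32-bit multiply of large operands silently discards the upper half of the product.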
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor, the branch instructions have the following features
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
THUMB
o These instructions are 16 bits in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Lower power consumption
o Less space occupied
o THUMB is faster when memory is organized as 16-bit; ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They then signal that there is work for the task code to do.
The differences between this architecture and the previous ones are that:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared data problem.
Shared data problem: it arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
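The shared data problem can be demonstrated with a small simulation. Here a task reads a two-part shared value (an hours/minutes clock, invented for illustration) non-atomically, and a simulated interrupt fires between the two reads, so the task sees a combination that never existed. This is a Python sketch of the failure mode, not real interrupt code:

```python
shared = {"hours": 1, "minutes": 59}

def isr_tick():
    """Simulated timer interrupt: advance the clock from 1:59 to 2:00."""
    shared["hours"], shared["minutes"] = 2, 0

def read_clock_non_atomic(interrupt_between_reads):
    """Read the clock with two separate, interruptible accesses."""
    h = shared["hours"]          # first read
    if interrupt_between_reads:  # the interrupt fires mid-way
        isr_tick()
    m = shared["minutes"]        # second read
    return h, m

# Interrupted between the reads, the task sees 1:00 --
# a time that never existed (it was 1:59, then 2:00).
assert read_clock_non_atomic(True) == (1, 0)
```

Making the two reads an atomic section (for example, by disabling interrupts around them) removes the bug, at the cost of added interrupt latency.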
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
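A minimal sketch of the difference (the function names are invented): the first version keeps its intermediate result in a module-level variable, so a task switch between its statements could let another caller overwrite it; the second keeps everything in locals, which live on the calling task's stack and therefore satisfy the rules above.

```python
temp = 0  # shared variable: makes the first version non-reentrant

def swap_non_reentrant(pair):
    """Not reentrant: 'temp' is shared by every caller."""
    global temp
    temp = pair[0]
    # ... a task switch here lets another task overwrite 'temp' ...
    pair[0] = pair[1]
    pair[1] = temp

def swap_reentrant(pair):
    """Reentrant: uses only locals, which live on the caller's stack."""
    t = pair[0]
    pair[0] = pair[1]
    pair[1] = t

data = [1, 2]
swap_reentrant(data)
assert data == [2, 1]
```

Both functions behave identically when only one task calls them; the difference appears only under preemption, which is exactly why non-reentrancy bugs are hard to find by testing.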
Task states
Running: the microprocessor is executing the instructions that make up this task. In a single processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
The scheduler is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that the tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it blocks before the other one runs.
If a higher priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks.
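The scheduling rules above condense into one decision: among the tasks that are not blocked, always pick the highest priority one. A toy Python model of that decision (the task names, priorities and state strings are invented for illustration):

```python
def schedule(tasks):
    """Pick the highest-priority task that is not blocked.
    tasks: dict of name -> (priority, state); higher number = higher priority.
    Returns the name of the task to run, or None if all tasks are blocked."""
    runnable = [(prio, name) for name, (prio, state) in tasks.items()
                if state != "blocked"]
    if not runnable:
        return None  # scheduler idles until an interrupt unblocks a task
    return max(runnable)[1]

tasks = {"logger": (1, "ready"), "ui": (2, "running"), "alarm": (3, "blocked")}
assert schedule(tasks) == "ui"      # highest-priority runnable task
tasks["alarm"] = (3, "ready")       # an interrupt unblocks the alarm task
assert schedule(tasks) == "alarm"   # a preemptive RTOS switches immediately
```

In a non-preemptive RTOS the second decision would only take effect once the running task blocked; the selection rule itself is the same.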
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT.
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on.
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and is the first model proposed for the software development process.
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the processes and integrates them
o Testing uncovers bugs
o Maintenance entails deployment in the field, bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage: with too many spirals it may take too long when design time is a major requirement.
Its advantage: it adopts a successive refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to sub-process problems.
Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of suppliers' capabilities.
Early and continual customer focus helps ensure that the product best meets the customer's needs.
position, followed by lower priority devices, up to the device with the lowest priority, which is placed last in the chain. The interrupt request line is common to all devices and forms a wired logic connection. If any device has its interrupt signal in the low level state, the interrupt line goes to the low level state and enables the interrupt input in the CPU. When no interrupts are pending, the interrupt line stays in the high level state and no interrupts are recognized by the CPU. This is equivalent to a negative OR operation. The CPU responds to an interrupt request by enabling the interrupt acknowledge line. This signal is received by device 1 at its PI (priority in) input. The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt. If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 in its PO output. It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.
The device that sent the request to the CPU accepts the acknowledgement from the CPU and does not pass the acknowledgement signal on to the next device. This procedure is defined as follows:
A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that the acknowledge signal has been blocked. A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by placing a 0 in its PO output. If the device does not have a pending interrupt, it transmits the acknowledgement signal to the next device by placing a 1 in its PO output. Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its VAD on the data bus. The daisy chain arrangement gives the highest priority to the device that receives the interrupt acknowledge signal from the CPU first. The farther the device is from the first position, the lower is its priority.
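The PI/PO rule can be modeled in a few lines: each device passes the acknowledge along (PO = PI) unless it is requesting an interrupt with PI = 1, in which case it claims the acknowledge and blocks everything further down the chain. A Python sketch of the selection logic (the device list is invented for illustration):

```python
def daisy_chain_select(requests):
    """requests: each device's interrupt request line (True/False), listed
    in priority order starting with the device closest to the CPU.
    Returns the index of the device whose VAD goes on the bus, or None."""
    pi = 1  # the CPU's acknowledge arrives at the first device's PI input
    winner = None
    for i, requesting in enumerate(requests):
        if requesting and pi == 1:
            winner = i  # PI=1 and requesting: intercept the acknowledge
            po = 0      # block the acknowledge from lower-priority devices
        else:
            po = pi     # pass the acknowledge through unchanged
        pi = po         # the next device's PI is this device's PO
    return winner

assert daisy_chain_select([False, True, True]) == 1  # nearer requester wins
assert daisy_chain_select([False, False, False]) is None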
The slowest device participates in the control and data transfer handshakes to determine the speed of the transaction.
The maximum data rate is about 1 Mbyte/s in the original standard and about 8 Mbytes/s with later extensions.
The IEEE 488 connector has 24 pins. The bus employs sixteen signal lines (eight bidirectional lines used for data transfer, three for handshake and five for bus management) plus eight ground return lines.
In 1975 the bus was standardized by the IEEE as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE 488-1975 (now 488.1). IEEE 488.1 formalized the mechanical, electrical and basic protocol parameters of GPIB, but said nothing about the format of commands or data.
The IEEE 488.2 standard, Codes, Formats, Protocols and Common Commands for IEEE 488.1, provided for basic syntax and format conventions, as well as device-independent commands, data structures, error protocols and the like.
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2.
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to its late introduction, it has not been universally implemented.
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003.
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher Performance Protocol for the Standard Digital Interface for Programmable Instrumentation. The American National Standards Institute's corresponding standard was known as ANSI Standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993).
Applications
Commodore PET/CBM range personal computers connected their disk drives, printers, modems etc. via the IEEE-488 bus.
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters etc. to their workstation products and to the HP 2100 and HP 3000 minicomputers.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface.
Survey of architectures
1. Round robin
2. Round robin with interrupts
3. Function queue scheduling
4. RTOS
Choosing an architecture: the best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority.
Round robin
1. Simplest architecture
2. No interrupts
3. Main loop checks each device one at a time and services whichever needs to be serviced
4. Service order depends on position in the loop
5. No priorities
6. No shared data
7. No latency issues (other than waiting for other devices to be serviced)
Round robin pros: simple, no shared data, no interrupts
Round robin cons:
o Maximum delay is the maximum time to traverse the loop if all devices need to be serviced
o The architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports etc)
o Lengthy processing adversely affects even soft time deadlines
o The architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round robin uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
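The round robin main loop can be sketched as follows. The device names and service functions here are invented for illustration; a real embedded loop would poll hardware status registers rather than Python callables.

```python
def round_robin(devices, iterations=1):
    """devices: list of (name, needs_service, service) triples.
    Poll each device in a fixed order and service whichever needs it."""
    serviced = []
    for _ in range(iterations):
        for name, needs_service, service in devices:
            if needs_service():
                service()
                serviced.append(name)
    return serviced

pending = {"serial": True, "button": False, "display": True}
devices = [(name,
            lambda n=name: pending[n],
            lambda n=name: pending.update({n: False}))
           for name in ("serial", "button", "display")]

# One pass of the loop services exactly the devices that asked for it.
assert round_robin(devices) == ["serial", "display"]
```

Note how service order is fixed by position in the list: if "display" needed a fast response, it would still have to wait for "serial" to finish, which is exactly the fragility listed in the cons above.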
Round robin with interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o The main routine checks the flags and does any lower priority follow-up processing
o Why? It gives more control over priorities
Round robin with interrupts pros:
o Still relatively simple
o Hardware timing requirements better met
Round robin with interrupts cons:
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this? Adjustments:
o Change the order in which the flags are checked (e.g. check A between every other device)
   Improves the response of A
   Increases the latency of the other tasks
o Move some task code into the interrupt
   Worsens the response time of lower priority interrupts
   May not be able to ensure that the lower priority interrupt code executes fast enough
Function queue scheduling architecture
o Interrupts add function pointers to a queue
o The main routine reads the queue and executes the calls
Pros:
o The main routine can use any algorithm to choose the order in which to execute functions (not necessarily FIFO)
o Better response time for the highest priority task: the delay equals the length of the longest function code
o Best response time can be improved by cutting long functions into several pieces
Cons:
o Worse response time for lower priority code (no guarantee it will actually run)
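A sketch of the function queue architecture: "interrupt routines" enqueue (priority, function) pairs and the main routine repeatedly pops and calls the most urgent one. This uses Python's heapq as the priority queue; in a real system the queue would be filled from ISRs with interrupts briefly disabled, and the function names below are invented for illustration.

```python
import heapq

task_queue = []  # (priority, sequence, function); lower number = more urgent
_seq = 0
log = []

def enqueue(priority, fn):
    """Called from an 'interrupt routine': queue follow-up work."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, fn))
    _seq += 1  # sequence number keeps equal priorities in FIFO order

def main_loop():
    """Main routine: pop and run the highest-priority function first."""
    while task_queue:
        _, _, fn = heapq.heappop(task_queue)
        fn()

enqueue(5, lambda: log.append("housekeeping"))
enqueue(1, lambda: log.append("motor-off"))   # urgent follow-up work
main_loop()
assert log == ["motor-off", "housekeeping"]   # not FIFO: priority order
```

The cons above are visible here too: if urgent work keeps arriving, low-priority entries can sit in the queue indefinitely.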
Real time operating system architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for the task code
o Differences with the previous architectures:
   We don't write signaling flags (the RTOS takes care of it)
   No loop in our code decides what is executed next (the RTOS does this)
   The RTOS knows the relative task priorities and controls what is executed next
   The RTOS can suspend a task in the middle to execute code of higher priority
   Now we can control task response AND interrupt response
Real time operating system pros:
o Worst case response time for the highest priority function is zero
o The system's high priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally come with vendor tools
Real time operating system cons:
o The RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games, round-robin scheduling arranges to have all teams or players take turns playing each other, with the winner emerging from the succession of events.
A round-robin story is one that is started by one person and then continued successively by others in turn. Whether an author can get additional turns, how many lines each person can contribute and how the story can be ended depend on the rules. Some Web sites have been created for the telling of round robin stories, with each person posting the next part of the story as part of an online conference thread.
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement and starvation-free. Round-robin scheduling can also be applied to other scheduling problems, such as data packet scheduling in computer networks.
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes.
Example: if the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete, the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU. Once the other jobs have had their equal share (100 ms each), job1 will get another allocation of CPU time and the cycle will repeat. This process continues until the job finishes and needs no more time on the CPU.
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
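The allocation pattern in the example can be computed directly: the scheduler repeatedly grants min(quantum, remaining time) until the job's remaining time reaches zero. A small sketch (the interleaved turns of the other jobs are omitted):

```python
def rr_allocations(total_ms, quantum_ms):
    """The time slices a round-robin scheduler grants one job,
    ignoring the interleaved turns of other jobs."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        grant = min(quantum_ms, remaining)  # job may finish mid-quantum
        slices.append(grant)
        remaining -= grant
    return slices

# Job1 from the example: 250 ms of work, 100 ms quantum.
assert rr_allocations(250, 100) == [100, 100, 50]
assert sum(rr_allocations(250, 100)) == 250
```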
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused.
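A work-conserving round-robin packet scheduler can be sketched as follows: visit the flow queues in circular order, transmit one packet from each non-empty queue, and skip empty queues so the link never idles while packets wait. The flow names and packet labels are invented for illustration.

```python
from collections import deque

def rr_packet_schedule(flows):
    """flows: dict of flow name -> deque of packets.
    Returns the transmission order on the shared channel."""
    order = []
    names = list(flows)
    while any(flows[n] for n in names):
        for n in names:
            if flows[n]:                 # skip empty queues: work-conserving
                order.append(flows[n].popleft())
    return order

flows = {"A": deque(["a1", "a2", "a3"]),
         "B": deque(["b1"]),
         "C": deque(["c1", "c2"])}
assert rr_packet_schedule(flows) == ["a1", "b1", "c1", "a2", "c2", "a3"]
```

With equally sized packets each active flow gets an equal share of the channel, which is the max-min fairness property described below; with widely varying packet sizes a flow sending large packets would be favoured.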
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can be created by using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically:
   Turnaround - the total time between submission of a process and its completion
   Response time - the amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency) thus a scheduler will implement a suitable compromise Preference is given to any one of the above mentioned concerns depending upon the users needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399].
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
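FCFS behaviour is easy to see in a short simulation. The C sketch below is not from the original notes; the function name and the burst times in the test are illustrative. It computes per-process waiting times for jobs that all arrive at time 0 and run to completion in arrival order:

```c
#include <assert.h>

/* FIFO/FCFS: processes run to completion in arrival order.
 * Given burst times (all arriving at t=0), fill in each
 * process's waiting time and return the total waiting time. */
int fifo_total_waiting(const int *burst, int n, int *waiting)
{
    int t = 0, total = 0;
    for (int i = 0; i < n; i++) {
        waiting[i] = t;      /* waits for all earlier processes */
        total += t;
        t += burst[i];       /* runs to completion, no preemption */
    }
    return total;
}
```

With bursts {24, 3, 3} the short jobs wait behind the long one (waiting times 0, 24, 27), illustrating how long processes can hog the CPU.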
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
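A minimal sketch of the non-preemptive special case (shortest job first, all jobs ready at time 0; function name illustrative, not from the notes) shows why this policy minimizes total waiting time:

```c
#include <assert.h>

/* Non-preemptive shortest-job-first for jobs all ready at t=0:
 * run the shortest burst first. 'order' receives the run order
 * (indices into burst); returns total waiting time. */
int sjf_total_waiting(const int *burst, int n, int *order)
{
    for (int i = 0; i < n; i++) order[i] = i;
    /* selection sort of indices by ascending burst time */
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (burst[order[j]] < burst[order[i]]) {
                int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
            }
    int t = 0, total = 0;
    for (int i = 0; i < n; i++) {
        total += t;              /* this job waited t units */
        t += burst[order[i]];
    }
    return total;
}
```

For the same bursts {24, 3, 3}, total waiting drops from 51 (FIFO) to 9, because no short job waits for the termination of the longest process.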
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
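The scheduling decision in FPPS is a simple selection among ready tasks. The sketch below is illustrative only (the convention that a larger number means higher priority is an assumption here; real RTOSs differ):

```c
#include <assert.h>

#define IDLE (-1)

/* Fixed-priority scheduling decision: among ready tasks, pick
 * the one with the highest priority (larger number = higher
 * priority in this sketch). Returns IDLE when nothing is ready. */
int pick_task(const int *priority, const int *ready, int n)
{
    int best = IDLE;
    for (int i = 0; i < n; i++)
        if (ready[i] && (best == IDLE || priority[i] > priority[best]))
            best = i;
    return best;
}
```

An incoming higher priority task simply wins the next selection, which is how lower priority tasks get preempted, and how they starve if high priority tasks never block.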
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes and not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
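The cycling behaviour can be sketched in a few lines of C (illustrative, not from the notes; arrival order is the array order, and the quantum and burst values in the test are made up):

```c
#include <assert.h>

/* Round-robin sketch: cycle through tasks, each runs for at most
 * one quantum, until all bursts are exhausted. 'burst' is
 * consumed in place; 'finish' receives each task's completion
 * time; returns the number of time slices dispatched. */
int round_robin(int *burst, int n, int quantum, int *finish)
{
    int t = 0, remaining = n, slices = 0;
    while (remaining > 0) {
        for (int i = 0; i < n; i++) {
            if (burst[i] <= 0) continue;
            int slice = burst[i] < quantum ? burst[i] : quantum;
            t += slice;
            burst[i] -= slice;
            slices++;
            if (burst[i] == 0) { finish[i] = t; remaining--; }
        }
    }
    return slices;
}
```

With bursts {24, 3, 3} and a quantum of 4, the short jobs finish at t=7 and t=10 rather than waiting behind the long one, at the cost of 8 dispatches instead of 3.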
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared memory problems
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the program counter reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of the breakpoint is that it does not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification: it allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction level simulator may be used to debug code running on the CPU
o A cycle level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]        ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann architecture machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
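The two byte orderings can be made concrete with a small C sketch (illustrative, not from the notes): it extracts the byte stored at a given byte offset of a 32-bit word under each convention.

```c
#include <assert.h>
#include <stdint.h>

/* Byte addressing sketch: return the byte at byte offset 'i'
 * (0..3) of a 32-bit word. In little-endian mode byte 0 is the
 * least significant byte; in big-endian mode byte 0 is the most
 * significant byte. */
uint8_t byte_at(uint32_t word, int i, int big_endian)
{
    int shift = big_endian ? (3 - i) * 8 : i * 8;
    return (uint8_t)(word >> shift);
}
```

For the word 0x11223344, byte 0 is 0x44 in little-endian mode but 0x11 in big-endian mode.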
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction is 48 bits long, a basic data word is 32 bits, and an address is 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
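Saturation versus wraparound can be sketched in C (a software model of the idea, not the SHARC hardware itself; the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 32-bit fixed-point addition: on overflow the result
 * clamps to INT32_MAX or INT32_MIN instead of wrapping around,
 * which is the behaviour saturation mode provides in hardware. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;   /* widen so overflow is visible */
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```

Clamping is usually the right behaviour for signal samples: a slightly-too-loud sample clips, rather than flipping sign as wraparound would.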
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. SHARC supplies two special register sets that are used to control loading and storing. They are called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
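Post-modify-with-update and circular buffering can be modeled together in C (a software sketch of the addressing idea, not the DAG hardware; 'i' plays the role of an I register and 'm' an M register):

```c
#include <assert.h>

/* Post-modify with update over a circular buffer: return buf[*i],
 * then advance *i by m, wrapping at the buffer length the way a
 * DAG circular buffer does. */
int circ_fetch(const int *buf, int len, int *i, int m)
{
    int v = buf[*i];          /* use the address, then update it */
    *i = (*i + m) % len;
    if (*i < 0) *i += len;    /* keep the index in [0, len) for m < 0 */
    return v;
}
```

Sweeping an index register through a wrapped range like this is exactly the access pattern of an FIR filter's delay line, which is why DSPs support it in hardware.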
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU registers (context)
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest priority ready task (preemptive kernel). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher priority task is ready, plus the time to restore the CPU context of the highest priority task, plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: a counting semaphore, whose integer value can be greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
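The acquire/release pair can be sketched in C. This is a toy model, not a real RTOS API: the type and function names are invented for illustration, and no blocking or waiting list is shown (a real kernel would block the caller instead of returning failure):

```c
#include <assert.h>

/* Counting semaphore sketch: 'count' is the semaphore's integer.
 * A binary semaphore is the special case where count is 0 or 1. */
typedef struct { int count; } ksem_t;

/* Acquire: succeed and decrement when the count is positive;
 * otherwise fail (a real kernel would block the calling task). */
int ksem_acquire(ksem_t *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: increment the count (a real kernel would also wake
 * the first task on the semaphore's waiting list). */
void ksem_release(ksem_t *s)
{
    s->count++;
}
```

Initializing the count to 1 gives mutual exclusion on one resource; initializing it to N allows N simultaneous holders.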
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: instruction opcodes are preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter, but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags control normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions: ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
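The PC-relative target computation for the 32-bit ARM B instruction can be sketched in C (illustrative, not from the notes): the branch encodes a signed word offset, and the PC reads as the instruction's address plus 8 because of the ARM pipeline.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative branch target for a 32-bit ARM branch: the encoded
 * offset is a signed count of words (shifted left by 2), added to
 * the PC, which reads as the instruction address + 8. */
uint32_t branch_target(uint32_t instr_addr, int32_t word_offset)
{
    return instr_addr + 8 + (uint32_t)(word_offset * 4);
}
```

A 24-bit signed word offset scaled by 4 is what gives the ±32 MB branch range noted above.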
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, excepting the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16 bits; however, ARM is faster when the memory is organized in 32 bits
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, and so on
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, un-initialized, and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared data problem
Shared data problem: it is one that arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
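The shared data problem can be made concrete with a C sketch (illustrative, not from the notes; the "interrupt" is simulated by a flag so the failure is reproducible): task code reads a 16-bit value that an ISR updates one byte at a time, and a badly timed interrupt produces a torn value the variable never actually held.

```c
#include <assert.h>
#include <stdint.h>

/* Shared data, updated by the "ISR" one byte at a time. */
static volatile uint8_t lo, hi;

void isr_store(uint16_t v)
{
    lo = (uint8_t)v;
    hi = (uint8_t)(v >> 8);
}

/* Non-atomic read by task code: 'interrupt_between' simulates an
 * ISR firing between the two byte accesses. */
uint16_t task_read(int interrupt_between, uint16_t isr_value)
{
    uint8_t l = lo;                         /* first half of the read */
    if (interrupt_between) isr_store(isr_value);
    uint8_t h = hi;                         /* second half */
    return (uint16_t)((h << 8) | l);
}
```

The fix is to make the two-byte read an atomic section, typically by disabling interrupts around it or by asking the RTOS for protection.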
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
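The stack rule can be illustrated with a small C sketch (function names invented for illustration): one counter keeps its state in a shared static variable and is not reentrant, while the other keeps everything in parameters and stack variables.

```c
#include <assert.h>

/* NOT reentrant: keeps state in a shared static variable, used
 * non-atomically, so calls from two tasks interfere. */
int count_up_static(void)
{
    static int total = 0;
    return ++total;
}

/* Reentrant: uses only caller-private state (parameters and the
 * caller's stack), so any number of tasks can call it safely. */
int count_up(int previous)
{
    return previous + 1;
}
```

A second "task" calling count_up_static sees the first task's state, whereas count_up gives every caller an independent result.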
States of RTOS
Running: it means the microprocessor is executing the instructions that make up this task. Therefore, in a single processor system only one task can be in the running state at any given time
Ready: it means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: it means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked for ever
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that the tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, a task is run until it goes to the blocked state before control goes to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis: determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the pieces and integrates them
o Testing: uncovers the bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic model of the design process
Spiral model
It is an alternative model for the software development
The waterfall model assumes that the system is built once in its entirety; the spiral model assumes that several versions of the system will be built
At each level of the design, the designers go through requirements, construction, and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customers' needs
The standard IEEE 488.2 is built on IEEE 488.1 without superseding it; equipment can conform to IEEE 488.1 without following IEEE 488.2
While IEEE 488.1 defined the hardware and IEEE 488.2 defined the syntax, there was still no standard for instrument-specific commands. Commands to control the same class of instruments (for example, multimeters) would vary between manufacturers and even models. A standard for device commands, SCPI, was introduced in the 1990s. Due to the late introduction it has not been universally implemented
National Instruments introduced a backwards-compatible extension to IEEE 488.1, originally known as HS-488. It increased the maximum data rate to 8 Mbytes/s, although the rate decreases as more devices are connected to the bus. This was incorporated into the standard in 2003 as IEEE 488.1-2003
Since 2004 the standard is a dual-label IEEE/IEC standard known as IEC 60488, Higher performance protocol for the standard digital interface for programmable instrumentation. The American National Standards Institute's corresponding standard was known as ANSI standard MC 1.1, and the International Electrotechnical Commission formerly had IEC Publication 60625-1 (1993)
Applications
The Commodore PET/CBM range of personal computers connected their disk drives, printers, modems, etc by the IEEE-488 bus
HP and Tektronix used IEEE-488 as a peripheral interface to connect disk drives, tape drives, printers, plotters, etc to their workstation products and HP 2100 and HP 3000 minicomputers
Additionally, some of HP's advanced pocket calculators/computers of the 1980s worked with various instrumentation via the HP-IB interface
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture
The best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin
Pros
o Simple, no shared data, no interrupts
Cons
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality
o Adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
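The round-robin main loop described above can be sketched in a few lines. This is an illustrative Python model rather than embedded C; the device names and the needs_service flag are hypothetical stand-ins for real hardware status checks.

```python
# Sketch of a round-robin architecture: a single loop polls each device in
# turn and services whichever one needs attention. No interrupts, no
# priorities, no shared data. Device names are hypothetical.

class Device:
    def __init__(self, name):
        self.name = name
        self.needs_service = False   # would be a hardware status check
        self.serviced = 0

    def service(self):
        self.needs_service = False
        self.serviced += 1

def round_robin(devices, iterations):
    """One pass per iteration: check each device in fixed loop order."""
    for _ in range(iterations):
        for dev in devices:          # service order = position in the loop
            if dev.needs_service:
                dev.service()

devices = [Device("keypad"), Device("display"), Device("serial")]
devices[2].needs_service = True
round_robin(devices, 1)              # only the serial device gets serviced
```

Note how the worst-case latency for any device is one full traversal of the loop, which is exactly the fragility listed in the cons above.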
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts (a) service hardware and (b) set flags
o Main routine checks flags and does any lower priority follow-up processing
o Why? Gives more control over priorities
Round Robin with Interrupts
Pros
o Still relatively simple
o Hardware timing requirements better met
Cons
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst case response time = sum of all other task execution times + execution times of any other interrupts that occur
How could you fix this?
Round Robin with Interrupts: Adjustments
o Change the order in which flags are checked (eg ABABAD)
o Improves response of A but increases latency of other tasks
o Move some task code into the interrupt
o Decreases response time of lower priority interrupts
o May not be able to ensure lower priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons
o Worse response time for lower priority code (no guarantee it will actually run)
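The function-queue architecture above can be sketched as follows. The task names and priority numbers are illustrative assumptions, and Python's heapq stands in for the priority-ordered queue that the main routine drains.

```python
# Sketch of function-queue scheduling: "interrupt" code enqueues function
# pointers with a priority, and the main routine always runs the
# highest-priority pending function first (not FIFO order).
import heapq

task_queue = []   # entries: (priority, sequence, function); lower = higher priority
_seq = 0          # tiebreaker so equal-priority tasks run in arrival order

def enqueue(priority, fn):
    """What an ISR would do: push a function pointer onto the queue."""
    global _seq
    heapq.heappush(task_queue, (priority, _seq, fn))
    _seq += 1

log = []
enqueue(2, lambda: log.append("housekeeping"))   # lowest priority
enqueue(0, lambda: log.append("urgent"))         # highest priority
enqueue(1, lambda: log.append("normal"))

# Main routine: drain the queue in priority order.
while task_queue:
    _, _, fn = heapq.heappop(task_queue)
    fn()
```

The "urgent" task runs first even though it was enqueued second, which is exactly the pro listed above; a steady stream of high-priority work would starve "housekeeping", which is the con.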
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences from previous architectures
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems
Pros
o Worst case response time for the highest priority function is zero
o System's high priority response time relatively stable when extra functionality is added
o Useful functionality pre-written
o Generally come with vendor tools
Cons
o RTOS has cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
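The allocation sequence above can be reproduced with a short simulation; this is a sketch, and the function name is ours:

```python
# Simulate how a round-robin scheduler slices one job's CPU time:
# a 250 ms job under a 100 ms quantum receives 100, 100, then 50 ms.
def rr_allocations(total_ms, quantum_ms):
    allocations = []
    remaining = total_ms
    while remaining > 0:
        slice_ms = min(quantum_ms, remaining)  # last slice may be partial
        allocations.append(slice_ms)
        remaining -= slice_ms
    return allocations

print(rr_allocations(250, 100))  # → [100, 100, 50]
```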
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
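The work-conserving behavior described above can be sketched as follows; the flow names and packet labels are hypothetical:

```python
# Sketch of work-conserving round-robin packet scheduling: each flow has
# its own queue; the scheduler visits flows in a fixed circular order and
# skips flows whose queues are empty, so the link never idles while any
# flow still has packets.
from collections import deque

flows = {
    "A": deque(["A1", "A2", "A3"]),
    "B": deque(["B1"]),
    "C": deque([]),                  # idle flow: skipped (work-conserving)
}

def schedule(flows):
    order = []
    queues = list(flows.values())
    while any(queues):               # some flow still has packets
        for q in queues:
            if q:                    # skip empty queues
                order.append(q.popleft())
    return order

order = schedule(flows)
print(order)  # → ['A1', 'B1', 'A2', 'A3']
```

With equally sized packets each active flow gets one packet per round, which is the max-min fairness property noted above.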
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept and can be implemented using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
o Throughput - the number of processes that complete their execution per time unit
o Latency, specifically
o Turnaround - the total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
o Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (eg throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above mentioned concerns depending upon the user's needs and objectives
In real-time environments such as embedded systems for automatic control in industry (for example robotics) the scheduler also must ensure that processes can meet deadlines this is
crucial for keeping the system stable Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system, and the degree of concurrency to be supported at any one time - i.e. whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as first come, first served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to shortest job first (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or estimations about, the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
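The behavior described above can be shown with a small simulation, assuming one time unit per step and hypothetical process names:

```python
# Sketch of shortest-remaining-time scheduling: at every time unit the
# ready process with the least remaining work runs, so a newly arrived
# short job preempts a longer one already in progress.
def srt(processes):
    """processes: list of (name, arrival, burst). Returns completion order."""
    remaining = {name: burst for name, _, burst in processes}
    arrival = {name: a for name, a, _ in processes}
    t, done = 0, []
    while remaining:
        ready = [n for n in remaining if arrival[n] <= t]
        if not ready:                 # nothing has arrived yet; idle
            t += 1
            continue
        current = min(ready, key=lambda n: remaining[n])  # shortest remaining
        remaining[current] -= 1       # run for one time unit
        t += 1
        if remaining[current] == 0:
            del remaining[current]
            done.append(current)
    return done

print(srt([("long", 0, 5), ("short", 1, 2)]))  # → ['short', 'long']
```

The short job arrives at t=1, preempts the long one, and finishes first; a steady stream of short arrivals would starve the long job, which is the risk noted above.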
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
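Fixed-priority pre-emption as described above can be sketched as a small simulation; process names, arrival times, and priority numbers (lower number = higher priority) are illustrative assumptions:

```python
# Sketch of fixed-priority pre-emptive scheduling: the ready process with
# the highest priority always runs; a newly arrived higher-priority
# process preempts the currently running one.
def fpps(processes):
    """processes: list of (name, arrival, burst, priority). Returns run trace."""
    remaining = {n: b for n, _, b, _ in processes}
    arrival = {n: a for n, a, _, _ in processes}
    prio = {n: p for n, _, _, p in processes}
    t, trace = 0, []
    while remaining:
        ready = [n for n in remaining if arrival[n] <= t]
        if not ready:
            t += 1
            continue
        current = min(ready, key=lambda n: prio[n])  # highest priority wins
        trace.append(current)                        # run one time unit
        remaining[current] -= 1
        if remaining[current] == 0:
            del remaining[current]
        t += 1
    return trace

print(fpps([("low", 0, 3, 2), ("high", 1, 2, 1)]))
# → ['low', 'high', 'high', 'low', 'low']
```

The low-priority process is interrupted at t=1 and only resumes once the high-priority process completes; with enough high-priority arrivals it would never resume, which is the starvation risk noted above.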
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes, not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also useful for problems where processes share memory
Overview

Scheduling algorithm           CPU overhead   Throughput   Turnaround time   Response time
First in first out             Low            Low          High              High
Shortest job first             Medium         High         Medium            Medium
Priority based scheduling      Medium         Low          High              High
Round-robin scheduling         High           Medium       Medium            High
Multilevel queue scheduling    High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as target The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. An advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles
leading to unsuitability for high performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example

        LDR r0,[r8]     ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
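The two byte orderings can be made concrete with Python's struct module; this illustrates the general concept rather than ARM-specific behavior:

```python
# Show how one 32-bit word maps to four bytes under each endianness.
import struct

word = 0x12345678
little = struct.pack("<I", word)   # lowest-order byte (0x78) stored first
big    = struct.pack(">I", word)   # highest-order byte (0x12) stored first

print(little.hex())  # → '78563412'
print(big.hex())     # → '12345678'
```

The same word therefore occupies memory as 78 56 34 12 on a little-endian configuration and 12 34 56 78 on a big-endian one.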
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example

R1=DM(M0,I0), R2=PM(M8,I8);    ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits; a basic data word is 32 bits, and an address is 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are the arithmetic status register (ASTAT), the sticky register (STKY), and mode register 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared by later operations; STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers used to control loading and storing, called data address generators (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services include memory management, device management, interrupt handling, and time management
Various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority ready task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
At its core it is just an integer. Semaphores are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue takes the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow load/store of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15 ----------- the program counter, but it can be manipulated as a general-purpose register
R13 ----------- used as the stack pointer
R14 ----------- has some special significance and is called the link register: when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS. Shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem
Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done; in others, one task runs until it goes to the blocked state before the processor goes to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce, and it was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals, the process may take too long when design time is a major requirement
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware and software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
Survey of Architectures
1 Round robin
2 Round robin with interrupts
3 Function queue scheduling
4 RTOS
Choosing an Architecture The best architecture depends on several factors
o Real-time requirements of the application (absolute response time)
o Available hardware (speed, features)
o Number and complexity of different software features
o Number and complexity of different peripherals
o Relative priority of features
o Architecture selection is a tradeoff between complexity and control over response and priority
Round Robin
1 Simplest architecture
2 No interrupts
3 Main loop checks each device one at a time and services whichever needs to be serviced
4 Service order depends on position in the loop
5 No priorities
6 No shared data
7 No latency issues (other than waiting for other devices to be serviced)
Round Robin Pros and Cons
Pros Simple, no shared data, no interrupts
Cons
o Max delay is the max time to traverse the loop if all devices need to be serviced
o Architecture fails if any one device requires a shorter response time
o Most I/O needs fast response time (buttons, serial ports, etc)
o Lengthy processing adversely affects even soft time deadlines
o Architecture is fragile to added functionality; adding one more device to the loop may break everything
Round Robin Uses
o Simple devices
o Watches
o Possibly microwave ovens
o Devices where operations are all user initiated and process quickly
Round Robin with Interrupts
o Based on round robin, but interrupts deal with urgent timing requirements
o Interrupts a) service hardware and b) set flags
o Main routine checks flags and does any lower-priority follow-up processing
o Why? It gives more control over priorities
Round Robin with Interrupts Pros and Cons
Pros
o Still relatively simple
o Hardware timing requirements better met
Cons
o All task code still executes at the same priority
o Maximum delay unchanged
o Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this?
Round Robin with Interrupts Adjustments
o Change the order in which flags are checked (eg ABABAD): improves the response of A but increases the latency of the other tasks
o Move some task code to an interrupt: decreases the response time of lower-priority interrupts, but may not be able to ensure that lower-priority interrupt code executes fast enough
Function Queue Scheduling Architecture
o Interrupts add function pointers to a queue
o Main routine reads the queue and executes the calls
Pros
o Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
o Better response time for the highest-priority task = length of the longest function code
o Can improve best response time by cutting long functions into several pieces
Cons
o Worse response time for lower-priority code (no guarantee it will actually run)
Real Time Operating System
Architecture
o Most complex
o Interrupts handle urgent operations, then signal that there is more work to do for task code
o Differences with previous architectures
o We don't write signaling flags (the RTOS takes care of it)
o No loop in our code decides what is executed next (the RTOS does this)
o The RTOS knows relative task priorities and controls what is executed next
o The RTOS can suspend a task in the middle to execute code of higher priority
o Now we can control task response AND interrupt response
Real Time Operating Systems Pros and Cons
Pros
o Worst-case response time for the highest-priority function is zero
o The system's high-priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o Generally come with vendor tools
Cons
o The RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or "time-slice"). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept, and a round-robin scheduler can be created by using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
Throughput - the number of processes that complete their execution per time unit
Latency, specifically
o Turnaround time - the total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the concerns mentioned above depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates which processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., how many processes are to be executed concurrently and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks; without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks, such as the HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or with assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.
First in first out
First In First Out, also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.
It is based on Queuing
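As a minimal sketch of FIFO behaviour, the following computes per-process waiting and turnaround times for jobs that all arrive at time 0 (the burst values are invented; note how the long first job delays everyone behind it):

```python
def fcfs_metrics(bursts):
    """Given CPU burst times of processes in arrival order (all arriving
    at t=0), return per-process (waiting, turnaround) times under FCFS."""
    t, out = 0, []
    for burst in bursts:
        out.append((t, t + burst))   # waits t, finishes at t + burst
        t += burst
    return out

# A long first job makes every later job wait (the "convoy effect")
print(fcfs_metrics([24, 3, 3]))   # [(0, 24), (24, 27), (27, 30)]
```

With the long job first, the average waiting time is (0 + 24 + 27) / 3 = 17; running the short jobs first would cut it to 3, which is exactly why throughput can be low under FIFO.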
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete.
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
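A small simulation can make the preemption behaviour concrete; this is an illustrative model (the process names, arrival times, and bursts are invented), not production scheduler code:

```python
import heapq

def srt_schedule(procs):
    """procs: list of (name, arrival, burst). Simulate preemptive
    shortest-remaining-time scheduling; return finish time per name."""
    procs = sorted(procs, key=lambda p: p[1])   # order by arrival time
    ready, finish, t, i = [], {}, 0, 0
    while len(finish) < len(procs):
        while i < len(procs) and procs[i][1] <= t:
            heapq.heappush(ready, (procs[i][2], procs[i][0]))  # (remaining, name)
            i += 1
        if not ready:
            t = procs[i][1]             # idle until the next arrival
            continue
        rem, name = heapq.heappop(ready)
        # run until the next arrival (a shorter job may preempt) or completion
        run = min(rem, procs[i][1] - t) if i < len(procs) else rem
        t += run
        if run < rem:
            heapq.heappush(ready, (rem - run, name))  # preempted, back to ready
        else:
            finish[name] = t
    return finish

print(srt_schedule([("P1", 0, 7), ("P2", 2, 4), ("P3", 4, 1)]))
```

In this run P1 is preempted twice: P2 arrives at t=2 with a shorter remaining time, and P3 in turn preempts P2, so the long process P1 finishes last even though it arrived first.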
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with a large number of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit.
Throughput sits between that of FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Average response time is poor: waiting time depends on the number of processes, not the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
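The fixed time unit (quantum) behaviour can be sketched as follows, assuming all processes arrive at time 0 (the names and burst values are invented):

```python
from collections import deque

def rr_schedule(bursts, quantum):
    """bursts: {name: burst time}, all arriving at t=0. Return completion
    times under round-robin with the given time quantum."""
    rq = deque(bursts.items())
    t, done = 0, {}
    while rq:
        name, rem = rq.popleft()
        run = min(rem, quantum)
        t += run
        if rem > run:
            rq.append((name, rem - run))   # unfinished: back of the queue
        else:
            done[name] = t
    return done

print(rr_schedule({"P1": 5, "P2": 3, "P3": 1}, quantum=2))
# P3 finishes at 5, P2 at 8, P1 at 9
```

Shrinking the quantum makes the schedule fairer but adds a context switch at every quantum boundary, which is the overhead trade-off mentioned above.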
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. This approach is also useful when tasks share memory.
Overview

Scheduling algorithm         CPU overhead   Throughput   Turnaround time   Response time
First In First Out           Low            Low          High              High
Shortest Job First           Medium         High         Medium            Medium
Priority-based scheduling    Medium         Low          High              High
Round-robin scheduling       High           Medium       Medium            High
Multilevel queue scheduling  High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, from which the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmers interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r0,[r8]        ; a comment
Label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
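The effect of the two byte orders on a 32-bit word can be seen with Python's struct module (the word value 0x12345678 is just an example):

```python
import struct

# Pack the 32-bit word 0x12345678 both ways; the byte stored at the
# lowest address differs between the two modes.
word = 0x12345678
little = struct.pack("<I", word)   # little-endian layout
big    = struct.pack(">I", word)   # big-endian layout
print(little.hex())   # 78563412 -> low-order byte at the lowest address
print(big.hex())      # 12345678 -> high-order byte at the lowest address
```

The value of the word is the same either way; only the in-memory byte layout changes, which is why mixed-endian data exchange needs explicit conversion.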
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
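Saturating addition on signed 32-bit values can be modelled as a simple clamp; this is an illustrative sketch of the arithmetic, not SHARC code:

```python
def sat_add32(a, b):
    """Signed 32-bit addition with saturation: on overflow the result
    clamps to the range limit instead of wrapping around."""
    INT_MAX, INT_MIN = 2**31 - 1, -2**31
    return max(INT_MIN, min(INT_MAX, a + b))

print(sat_add32(2**31 - 1, 1))   # 2147483647, not a wrap to -2147483648
print(sat_add32(-2**31, -1))     # -2147483648
print(sat_add32(5, 7))           # 12 (in-range values are unaffected)
```

Clamping instead of wrapping matters in signal processing: a saturated sample is a small distortion, while a wrapped one flips sign and sounds like a loud click.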
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing, called data address generators (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
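Two of these DAG addressing modes are easy to model in a few lines: post-modify update combined with a circular-buffer wrap, and bit-reversed addressing as used by the FFT. The sketch below models only the address arithmetic (the buffer contents and sizes are invented):

```python
def post_modify_sweep(buf, start, step, n):
    """Model an I register with post-modify update that wraps inside a
    circular buffer: read buf[i], then i = (i + step) mod len(buf)."""
    i, out = start, []
    for _ in range(n):
        out.append(buf[i])             # use the address, then update it
        i = (i + step) % len(buf)
    return out

def bit_reverse(i, bits):
    """Reverse the low `bits` bits of index i, as used to reorder FFT
    data without an explicit permutation pass."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

# Sweep a 5-word circular buffer with modifier M = 2
print(post_modify_sweep([10, 11, 12, 13, 14], start=0, step=2, n=6))
# [10, 12, 14, 11, 13, 10]

# Bit-reversed order for an 8-point FFT (3 address bits)
print([bit_reverse(i, 3) for i in range(8)])   # [0, 4, 2, 6, 1, 5, 3, 7]
```

On the SHARC these address updates happen in the DAG hardware in parallel with the data operation, which is why a filter loop can fetch a new sample every cycle without spending instructions on index arithmetic.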
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest priority task). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a higher priority task is ready, plus the time to restore the CPU context of the highest priority task, plus the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
At its core it is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
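The acquire/release pattern behind these calls can be illustrated on a host with a counting semaphore, using Python's threading module as a stand-in for an RTOS kernel object (the count of 2 is an arbitrary example):

```python
import threading

# A counting semaphore initialized to 2 lets at most two tasks hold the
# resource at once; a third acquire must wait for a release.
sem = threading.Semaphore(2)

assert sem.acquire(blocking=False)       # task 1 acquires
assert sem.acquire(blocking=False)       # task 2 acquires
assert not sem.acquire(blocking=False)   # task 3 would have to block
sem.release()                            # task 1 releases
assert sem.acquire(blocking=False)       # now task 3 can acquire
print("counting semaphore admits at most 2 holders at a time")
```

In a real RTOS the failed acquire would put the calling task on the semaphore's waiting list instead of returning immediately; `blocking=False` is used here only to keep the demonstration single-threaded.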
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
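This deposit/take pattern can be sketched on a host using Python's thread-safe queue as a stand-in for an RTOS message queue (the sensor readings and queue length are invented):

```python
import queue
import threading

q = queue.Queue(maxsize=8)   # bounded, like a fixed queue length

def producer():
    """Plays the role of a task or ISR depositing messages."""
    for reading in (3.3, 4.9, 5.1):   # e.g. voltages from a sensor
        q.put(reading)                # deposit a message
    q.put(None)                       # sentinel: no more data

threading.Thread(target=producer).start()

received = []
while (msg := q.get()) is not None:   # the waiting task takes messages
    received.append(msg)
print(received)   # [3.3, 4.9, 5.1]
```

`q.get()` blocks when the queue is empty, just as a waiting task blocks on an empty RTOS message queue until a message is deposited.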
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
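The read/write calls can be illustrated with an OS-level pipe; here both ends live in one process for simplicity, and the message content is invented:

```python
import os

# One task writes to the pipe; another reads from it. Data flows one
# way, from the write end (w) to the read end (r).
r, w = os.pipe()
os.write(w, b"sensor:42\n")   # producer task's output
os.close(w)                   # close the write end when done
data = os.read(r, 64)         # consumer task's input
os.close(r)
print(data)                   # b'sensor:42\n'
```

As with the RTOS object, the pipe is unidirectional and byte-oriented: unlike a message queue, it carries a stream of bytes with no built-in message boundaries.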
INSTRUCTION SET ARCHITECTURE (ISA)
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general purpose register
R13----------- it is used as the stack pointer
R14----------- it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags mask (disable) normal and fast interrupts respectively; the T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions: ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n. Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Lower power consumption
o Smaller memory footprint
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
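The shared-data problem above can be made concrete with a small simulation; all names here are invented for illustration, and the "interrupt" is just a function call placed between the task's two reads.

```python
# Hedged sketch of the shared-data problem (all names are invented for
# illustration): the "task code" reads a two-part counter in two steps,
# and a simulated "interrupt routine" may fire between the two reads.
shared = {"high": 0, "low": 0}   # one logical value split across two words

def interrupt_routine():
    # Advances the counter; low rolls over into high at 100.
    shared["low"] += 1
    if shared["low"] == 100:
        shared["low"] = 0
        shared["high"] += 1

def read_counter(interrupt_fires_between_reads=False):
    # Non-atomic read: the two accesses can be separated by an interrupt.
    high = shared["high"]
    if interrupt_fires_between_reads:
        interrupt_routine()
    low = shared["low"]
    return high * 100 + low
```

With the counter at 99, an undisturbed read returns 99; if the interrupt strikes between the two reads, the task sees 0, a value the counter never actually held (it went from 99 straight to 100). Making the two reads an atomic section, for example by disabling interrupts around them, removes the bug.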
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
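The rules above can be illustrated with a pair of sketch functions (names invented): the first breaks the rules by keeping its temporary in a shared module-level variable, so a task switch in mid-call could corrupt it; the second keeps everything on the caller's stack and is therefore reentrant.

```python
# Hedged sketch of reentrancy: _shared_temp is shared among all "tasks",
# a non-atomic use of data, so swap_non_reentrant can be corrupted if
# the RTOS switches tasks in the middle of the call.
_shared_temp = 0   # module-level: one copy shared by every caller

def swap_non_reentrant(pair):
    global _shared_temp
    _shared_temp = pair[0]    # a task switch here lets another task
    pair[0] = pair[1]         # overwrite _shared_temp before we finish
    pair[1] = _shared_temp

def swap_reentrant(pair):
    temp = pair[0]            # local variable: lives on the caller's stack
    pair[0] = pair[1]
    pair[1] = temp
```

Both behave identically when only one task calls them; only swap_reentrant stays correct if the RTOS suspends one task mid-call and runs another task that calls the same function.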
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked it never gets the microprocessor, so an interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: scheduling tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked
If all the tasks are blocked, the scheduler spins in a tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSs time-slice between the tasks. In others, one task is run until it goes to the blocked state before the processor moves to the other.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
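The scheduler's core rule in the preemptive case can be sketched in a few lines; the class and function names here are invented, and this models only the selection rule, not a full RTOS.

```python
# Minimal sketch of the preemptive scheduling rule: the highest-priority
# task in the ready state gets the processor, so the running task loses
# the CPU the moment a higher-priority task unblocks.
READY, BLOCKED = "ready", "blocked"

class Task:
    def __init__(self, name, priority):
        self.name, self.priority, self.state = name, priority, READY

def pick_running_task(tasks):
    # Returns the task that should be running, or None if every task is
    # blocked (the scheduler then spins waiting for an event).
    ready = [t for t in tasks if t.state == READY]
    return max(ready, key=lambda t: t.priority) if ready else None
```

With a high-priority task blocked, the low-priority task runs; as soon as an interrupt routine marks the high-priority task ready, pick_running_task switches to it at once.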
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the processes and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.
The spiral model is more realistic than the waterfall model, because multiple iterations are often necessary to add enough detail to complete a design.
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.
Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to sub-optimal results.
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
Topics in this survey:
- Round robin
- Process scheduling
- Data packet scheduling
- Scheduling (computing)
- Types of operating system schedulers: long-term scheduling, medium-term scheduling, short-term scheduling, dispatcher
- Scheduling disciplines: first in first out, shortest remaining time, fixed-priority pre-emptive scheduling, round-robin scheduling, multilevel queue scheduling
- Overview
Round Robin with Interrupts
Pros:
- Still relatively simple
- Hardware timing requirements better met
Cons:
- All task code still executes at the same priority
- Maximum delay unchanged
- Worst-case response time = sum of all other execution times + execution times of any other interrupts that occur
How could you fix this?
Round Robin with Interrupts: Adjustments
- Change the order in which flags are checked (e.g. A B A B A D)
  - Improves the response of A
  - Increases the latency of the other tasks
- Move some task code to an interrupt
  - Decreases response time of lower-priority interrupts
  - May not be able to ensure lower-priority interrupt code executes fast enough
Function Queue Scheduling: Architecture
- Interrupts add function pointers to a queue
- Main routine reads the queue and executes the calls
Pros:
- Main routine can use any algorithm to choose what order to execute functions (not necessarily FIFO)
- Better response time for the highest-priority task = length of the longest function code
- Can improve best response time by cutting long functions into several pieces
Cons:
- Worse response time for lower-priority code (no guarantee it will actually run)
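The function-queue architecture can be sketched as follows; the names are invented, and a priority heap stands in for whatever ordering algorithm the main routine chooses.

```python
import heapq

# Hedged sketch of function-queue scheduling: interrupt routines enqueue
# a function pointer with a priority, and the main routine repeatedly
# calls the most urgent queued function, which need not be FIFO order.
function_queue = []   # heap of (priority, sequence_no, function)
_seq = 0              # sequence number keeps equal priorities in FIFO order

def enqueue(priority, func):
    # Called from an interrupt routine: just records that func must run.
    global _seq
    heapq.heappush(function_queue, (priority, _seq, func))
    _seq += 1

def main_routine_step(log):
    # Main routine: pop and call the highest-priority (lowest number)
    # pending function, if there is one.
    if function_queue:
        _, _, func = heapq.heappop(function_queue)
        func(log)
```

If an interrupt enqueues function A at priority 2 and then function B at priority 1, the main routine runs B first even though A was queued earlier, which is exactly how this architecture improves response for high-priority work.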
Real Time Operating System: Architecture
- Most complex
- Interrupts handle urgent operations, then signal that there is more work to do for task code
- Differences with previous architectures:
  - We don't write signaling flags (the RTOS takes care of it)
  - No loop in our code decides what is executed next (the RTOS does this)
  - The RTOS knows relative task priorities and controls what is executed next
  - The RTOS can suspend a task in the middle to execute code of higher priority
  - Now we can control task response AND interrupt response
Pros:
- Worst-case response time for the highest-priority function is zero
- System's high-priority response time is relatively stable when extra functionality is added
- Useful functionality comes pre-written
- RTOSs generally come with vendor tools
Cons:
- The RTOS has a cost
- Added processing time
- Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order, usually from the top to the bottom of a list and then starting again at the top of the list, and so on. A simple way to think of round robin is that it is about taking turns. Used as an adjective, round robin becomes round-robin.
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or time-slice). This is often described as round-robin process scheduling.
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority (also known as cyclic executive). Round-robin scheduling is simple, easy to implement, and starvation-free. It can also be applied to other scheduling problems, such as data packet scheduling in computer networks.
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
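The worked example above can be reproduced with a short simulation; the function name is invented, and jobs are simply (name, total time) pairs.

```python
from collections import deque

# Sketch of round-robin process scheduling: each job runs for at most
# one quantum, then goes to the back of the queue if it is unfinished.
def round_robin(jobs, quantum):
    # jobs: list of (name, total_time); returns the sequence of
    # (name, time_ran) CPU allocations.
    queue = deque(jobs)
    allocations = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        allocations.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))   # back of the queue
    return allocations
```

With job1 needing 250 ms, a second job needing 100 ms, and a 100 ms quantum, the allocations come out as job1: 100, job2: 100, job1: 100, job1: 50, matching the three allocations listed in the example above.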
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer switch or router that provides round-robin scheduling has a separate queue for every data flow where a data flow may be identified by its source and destination address The algorithm lets every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order The scheduling is work-conserving meaning that if one flow is out of packets the next data flow will take its place Hence the scheduling tries to prevent link resources from going unused
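The work-conserving behavior described above can be sketched as a function over per-flow queues; the function name is invented and packets are plain strings.

```python
# Hedged sketch of work-conserving round-robin over per-flow queues:
# flows take turns sending one packet each, and a flow that is out of
# packets is simply skipped, so the shared link never idles while any
# queue still holds packets.
def rr_transmit_order(flows):
    # flows: dict flow_id -> list of packets; returns transmission order.
    queues = {f: list(pkts) for f, pkts in flows.items()}
    order = []
    while any(queues.values()):
        for f in queues:
            if queues[f]:                  # work-conserving: skip empty flows
                order.append(queues[f].pop(0))
    return order
```

With flow A holding three packets and flow B one, the order is a1, b1, a2, a3: once B runs out, A takes every remaining turn instead of the link going unused.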
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept and can be implemented using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with
Throughput - number of processes that complete their execution per time unit
Latency, specifically:
o Turnaround - total time between submission of a process and its completion
o Response time - amount of time it takes from when a request was submitted until the first response is produced
Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g. throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to 3 distinct types of scheduler a long-term scheduler (also known as an admission scheduler or high-level scheduler) a mid-term or medium-term scheduler and a short-term scheduler The names suggest the relative frequency with which these functions are performed The Scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in the Main Memory) that is when an attempt is made to execute a program its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - ie whether a high or low amount of processes are to be executed concurrently and how the split between IO intensive and CPU intensive processes is to be handled In modern operating systems this is used to make sure that real time processes get enough CPU time to finish their tasks Without proper real time scheduling modern GUI interfaces would seem sluggish The long term queue exists in the Hard Disk or the Virtual Memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes are to be executed (allocated a CPU) next following a clock interrupt an IO interrupt an operating system call or another form of signal Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice and these are very short This scheduler can be preemptive implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process or non-preemptive (also known as voluntary or co-operative) in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access ) 35G cellular system channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined by channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
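The throughput and waiting-time drawbacks noted above follow from a simple property that a short sketch can show (the function name is invented): under FCFS, each job's waiting time is the sum of all bursts queued ahead of it.

```python
# Sketch of FCFS waiting times: each job waits for everything queued
# before it, so one long job at the front of the queue inflates every
# later job's waiting time (the "convoy" behind a CPU-hogging process).
def fcfs_waiting_times(bursts):
    waits, elapsed = [], 0
    for burst in bursts:
        waits.append(elapsed)     # waits for all jobs ahead of it
        elapsed += burst
    return waits
```

A 24 ms job followed by two 3 ms jobs gives waits of 0, 24, and 27 ms (average 17 ms); the same jobs in the order 3, 3, 24 give waits of 0, 3, and 6 ms (average 3 ms), which is why long processes hogging the CPU drive turnaround, waiting, and response times up.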
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
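The preemption behavior described above can be sketched with a one-time-unit-at-a-step simulation; the names are invented, and real schedulers estimate rather than know the remaining time.

```python
# Hedged sketch of shortest-remaining-time scheduling: at each time
# unit, whichever admitted process has the least remaining work runs,
# so a short late arrival preempts a long job already in progress.
def srt_finish_order(processes):
    # processes: list of (name, arrival_time, burst); returns the order
    # in which processes complete.
    pending = sorted(processes, key=lambda p: p[1])   # by arrival time
    remaining, finished, t = {}, [], 0
    while pending or remaining:
        while pending and pending[0][1] <= t:         # admit new arrivals
            name, _, burst = pending.pop(0)
            remaining[name] = burst
        if remaining:
            name = min(remaining, key=remaining.get)  # least work left
            remaining[name] -= 1                      # run one time unit
            if remaining[name] == 0:
                finished.append(name)
                del remaining[name]
        t += 1
    return finished
```

A 5-unit job starting at t=0 is preempted by a 2-unit job arriving at t=1, so the short job finishes first. This illustrates both the extra context switching and the starvation risk for long processes when short ones keep arriving.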
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes, not the average process length.
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.
Overview

Scheduling algorithm         CPU overhead   Throughput   Turnaround time   Response time
First in first out           Low            Low          High              High
Shortest job first           Medium         High         Medium            Medium
Priority-based scheduling    Medium         Low          High              High
Round-robin scheduling       High           Medium       Medium            High
Multilevel queue scheduling  High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. They can be used to show error conditions, to indicate when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows the hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, making CISC unsuitable for high-performance processors
Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions. The instructions are also chosen so that they can be efficiently executed in pipelined processors
o RISC architectures are optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o RISC requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, such as the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at address 0, word 1 is at address 4, word 2 at address 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared. STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code (or to the highest priority task). In a non-preemptive kernel it is the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the time to check whether a higher priority task is ready, plus the time to restore the CPU context of the highest priority task, plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, whose value can be an integer greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest priority task or the first task waiting in the queue can take the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve the performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields All ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15 ----------- it is the program counter but can be manipulated as a general-purpose register
R13 ----------- it is used as a stack pointer
R14 ----------- it has some special significance and is called the link register: when a procedure call is made, the return address is automatically stored in this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode(UNDEF) it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations, on the other hand, can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction The SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution While most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOS a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem
Shared-data problem It arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked It means this task has not got anything to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state When a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
Scheduler controls the running state Scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before control goes to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and is the first model proposed for the software development process
This model has five major phases
o Requirements Analyzes and determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short while the final cycles at the spirals bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps ensure that the product best meets the customer's needs
Real Time Operating Systems
Pros
o Worst-case response time for the highest priority function is zero
o The system's high priority response time is relatively stable when extra functionality is added
o Useful functionality comes pre-written
o RTOSs generally come with vendor tools
Cons
o An RTOS has a cost
o Added processing time
o Code out of your control may contain bugs
ROUND ROBIN
A round robin is an arrangement of choosing all elements in a group equally in some rational order usually from the top to the bottom of a list and then starting again at the top of the list and so on A simple way to think of round robin is that it is about taking turns Used as an adjective round robin becomes round-robin
In computer operation, one method of having different program processes take turns using the resources of the computer is to limit each process to a certain short time period, then suspend that process to give another process a turn (or "time-slice"). This is often described as round-robin process scheduling
In sports tournaments and other games round-robin scheduling arranges to have all teams or players take turns playing each other with the winner emerging from the succession of events
A round-robin story is one that is started by one person and then continued successively by others in turn Whether an author can get additional turns how many lines each person can contribute and how the story can be ended depend on the rules Some Web sites have been created for the telling of round robin stories by each person posting the next part of the story as part of an online conference thread
Round-robin (RR) is one of the simplest scheduling algorithms for processes in an operating system As the term is generally used time slices are assigned to each process in equal portions and in circular order handling all processes without priority (also known as cyclic executive) Round-robin scheduling is simple easy to implement and starvation-free Round-robin scheduling can also be applied to other scheduling problems such as data packet scheduling in computer networks
The name of the algorithm comes from the round-robin principle known from other fields where each person takes an equal share of something in turn
Process scheduling
In order to schedule processes fairly a round-robin scheduler generally employs time-sharing giving each job a time slot or quantum[1] (its allowance of CPU time) and interrupting the job if it is not completed by then The job is resumed next time a time slot is assigned to that process In the absence of time-sharing or if the quanta were large relative to the sizes of the jobs a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized since the data flow that has waited the longest time is given scheduling priority It may not be desirable if the size of the data packets varies widely from one job to another A user that produces large packets would be favored over other users In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered and not only best-effort communication deficit round-robin (DRR) scheduling weighted round-robin (WRR) scheduling or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same round-robin scheduler concept, and such a scheduler can be built using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (eg processor time, communications bandwidth). This is usually done to load balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - number of processes that complete their execution per time unit
Latency, specifically
o Turnaround time - total time between submission of a process and its completion
o Response time - amount of time from when a request was submitted until the first response is produced
Fairness/Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - i.e., whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process with a low priority, a process that is page-faulting frequently, or a process that is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy-loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating-system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
First In First Out (FIFO), also known as First Come First Served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge or estimation of the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
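The selection step described above can be sketched as a small C function. This is an illustrative sketch only; the `remaining` times and `ready` flags are assumed inputs, not part of any real scheduler API.

```c
#include <stddef.h>

/* Pick the index of the ready process with the least estimated remaining
   time, as a shortest-remaining-time scheduler would. Returns -1 if no
   process is ready. */
int srt_pick(const int remaining[], const int ready[], size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (ready[i] && (best < 0 || remaining[i] < remaining[best]))
            best = (int)i;
    }
    return best;
}
```

A real kernel would run this (or an equivalent priority-queue lookup) on every arrival and completion, which is where the extra overhead mentioned above comes from.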
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is neither minimal nor significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes, not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS
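The fixed-time-unit cycling described above can be sketched in C. This is a simulation for illustration, not kernel code; the burst-time array and quantum are assumptions.

```c
#include <stddef.h>

/* Simulate round-robin over n processes with CPU bursts burst[i] and a
   fixed time quantum, recording the order in which processes complete.
   burst[] is consumed in place. */
void rr_simulate(int burst[], size_t n, int quantum, int order[])
{
    size_t done = 0;
    while (done < n) {
        for (size_t i = 0; i < n; i++) {
            if (burst[i] <= 0) continue;      /* already finished */
            burst[i] -= quantum;              /* run one time slice */
            if (burst[i] <= 0)
                order[done++] = (int)i;       /* finished this slice */
        }
    }
}
```

Note how a short job (small burst) finishes in an early cycle regardless of where it sits in the queue, which is the balanced-throughput property noted above.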
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is also very useful for shared-memory problems
Overview
Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification: it allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]        ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann architecture machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
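The byte-ordering distinction above can be checked from C. A common portable sketch (not specific to ARM) inspects which byte of a known word lands at the lowest address:

```c
#include <stdint.h>

/* Returns 1 if the CPU stores the lowest-order byte at the lowest
   address (little-endian), 0 if at the highest (big-endian). */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    return *(const uint8_t *)&word == 0x04; /* low byte stored first? */
}
```

On an ARM configured little-endian at power-up this returns 1; the same source gives 0 on a big-endian configuration, which is why portable embedded code avoids assuming one ordering.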
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations, and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
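Saturation arithmetic can be mimicked in C to make the difference from wraparound concrete. This is a behavioural sketch of 32-bit signed saturating addition, not the SHARC hardware itself:

```c
#include <stdint.h>

/* 32-bit signed saturating addition: on overflow the result clamps to
   the maximum (or minimum) representable value instead of wrapping. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;          /* widen to detect overflow */
    if (sum > INT32_MAX) return INT32_MAX; /* clamp positive overflow  */
    if (sum < INT32_MIN) return INT32_MIN; /* clamp negative overflow  */
    return (int32_t)sum;
}
```

In signal processing, clamping to full scale distorts far less than the sign flip produced by wraparound, which is why DSPs such as the SHARC provide a saturation mode.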
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing. They are called data address generators (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
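The post-modify-with-update and circular-buffer modes above can be emulated together in C. This is an illustrative sketch of the addressing behaviour, with the index `i` and modifier `m` standing in for the DAG's I and M registers:

```c
#include <stddef.h>

/* Emulate circular-buffer post-modify addressing: fetch the value at
   index *i, then advance *i by m, wrapping modulo the buffer length,
   as a DSP sweeping a delay line would. */
int circ_fetch(const int buf[], size_t len, size_t *i, size_t m)
{
    int value = buf[*i];   /* post-modify: use the address first  */
    *i = (*i + m) % len;   /* then wrap the update into the range */
    return value;
}
```

On the SHARC the wraparound is done in hardware by the DAG, so a filter kernel pays no per-sample cost for the modulo step that this sketch performs explicitly.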
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest-priority task (in a preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, whose value can be greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
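The acquire/release calls above can be sketched as operations on a single counter. This is a minimal single-threaded illustration of the semantics only; a real kernel would also block the caller and maintain a task waiting list, and the type and function names here are invented for the sketch:

```c
/* A minimal counting-semaphore core showing acquire/release semantics. */
typedef struct { int count; } csem;

/* Returns 1 on success, 0 where a real kernel would block the task. */
int csem_acquire(csem *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;              /* kernel would move the task to blocked */
}

void csem_release(csem *s)
{
    s->count++;            /* kernel would wake a waiting task */
}
```

Initializing the count to 1 gives a binary semaphore guarding one resource; initializing it to N lets N tasks hold the resource at once, which is the counting case.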
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message
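The array-of-mailboxes idea can be sketched as a ring buffer in C. This is a single-task illustration without the waiting lists a real RTOS queue keeps; the names and the queue length are assumptions:

```c
#include <stddef.h>

#define QLEN 4  /* queue length fixed at creation time */

/* A message queue as an array of mailboxes managed as a ring buffer. */
typedef struct { int slot[QLEN]; size_t head, tail, count; } msgq;

int msgq_send(msgq *q, int msg)        /* deposit at the tail */
{
    if (q->count == QLEN) return 0;    /* full: sender would wait */
    q->slot[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

int msgq_receive(msgq *q, int *msg)    /* take from the head */
{
    if (q->count == 0) return 0;       /* empty: receiver would wait */
    *msg = q->slot[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```

An ISR reading a keyboard, for example, would call `msgq_send` with each scan code, and the display or processing task would drain the queue with `msgq_receive`.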
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
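The create/write/read calls above map directly onto a POSIX pipe, which makes a convenient host-side illustration. Here both ends live in one process for simplicity; in an RTOS the writer and reader would be separate tasks:

```c
#include <unistd.h>
#include <string.h>

/* Round-trip a message through a pipe: write at one end (the producer
   task's output), read at the other (the consumer task's input).
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fd[2];
    if (pipe(fd) != 0) return -1;         /* fd[0] = read end, fd[1] = write end */
    if (write(fd[1], msg, strlen(msg)) < 0) return -1;
    ssize_t n = read(fd[0], out, outlen);
    close(fd[0]);
    close(fd[1]);
    return n;
}
```

The key property this demonstrates is that the data flows one way, in order: whatever the first task writes is exactly what the second task reads.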
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions, which allow up to 16 registers to be loaded/stored at once, have been introduced
o Conditional execution of instructions has been introduced. Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are
R15 ----------- it is the program counter, but can be manipulated as a general-purpose register
R13----------- it is used as a stack pointer
R14 ----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations, on the other hand, can be carried out via multiple-register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers, when in a privileged mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15. A reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16 bits. However, ARM is faster when the memory is organized in 32 bits
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters like the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the registers, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variable are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs leading to shared data problem
Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic
Atomic section is a part of program which can not be interrupted
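The shared-data problem can be sketched in plain C. The snippet below is a simulation, not real interrupt code: the names are illustrative, and calling isr_update() by hand stands in for an interrupt firing at the worst possible moment between two reads.

```c
#include <stdint.h>

/* A pair of shared variables that the interrupt routine always keeps
   equal; task code that sees them differ has hit the shared-data bug. */
static volatile uint16_t temp_a, temp_b;

/* Simulated interrupt routine: updates both halves together. */
void isr_update(uint16_t v) { temp_a = v; temp_b = v; }

/* Buggy task code: reads the two variables non-atomically. The call to
   isr_update() between the reads simulates an interrupt striking
   mid-copy. Returns 1 if the snapshot is consistent, 0 otherwise. */
int compare_nonatomic(uint16_t init, uint16_t update) {
    isr_update(init);
    uint16_t a = temp_a;
    isr_update(update);        /* interrupt strikes between the reads */
    uint16_t b = temp_b;
    return a == b;
}

/* Fixed task code: the two reads form an atomic section. On real
   hardware, interrupts would be disabled around them. */
int compare_atomic(uint16_t init, uint16_t update) {
    isr_update(init);
    /* disable interrupts here on real hardware */
    uint16_t a = temp_a;
    uint16_t b = temp_b;
    /* re-enable interrupts */
    isr_update(update);        /* interrupt can only arrive afterwards */
    return a == b;
}
```

The non-atomic version reports a phantom mismatch that neither the interrupt routine nor the task code ever wrote, which is exactly the kind of bug the shared-data problem produces.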
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
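The first rule can be illustrated with a small C sketch (the function names are invented for illustration). The non-reentrant version keeps its result in a static buffer shared by every caller, so a task switch between the call and the use of the result lets a second task clobber the first task's data; the reentrant version works only with caller-supplied storage, which lives on each task's private stack.

```c
#include <stdio.h>
#include <string.h>

/* NOT reentrant: returns a pointer to a static buffer shared by all
   callers. If the RTOS switches tasks after this returns, another
   task's call overwrites the first task's string. */
char *format_reading_bad(int n) {
    static char buf[16];               /* shared, not on a task stack */
    snprintf(buf, sizeof buf, "val=%d", n);
    return buf;
}

/* Reentrant: the caller supplies the buffer, so the result lives in
   the calling task's private storage and no state is shared. */
void format_reading_ok(char *buf, size_t len, int n) {
    snprintf(buf, len, "val=%d", n);
}
```

Calling format_reading_bad() from a "second task" silently changes what the first caller's pointer refers to, while two calls to format_reading_ok() with separate buffers cannot interfere.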
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
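The scheduler's core decision can be sketched in a few lines of C. This is a toy model with invented names, not the API of any real RTOS: among all tasks, pick the highest-priority one that is ready; blocked tasks are never chosen.

```c
/* Toy model of an RTOS scheduler's decision (illustrative only). */
enum task_state { READY, BLOCKED };

struct task {
    enum task_state state;
    int priority;              /* larger number = more urgent */
};

/* Returns the index of the task to move into the running state, or -1
   if every task is blocked; a real RTOS would then spin in its idle
   loop until an interrupt unblocks something. */
int schedule(const struct task *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (tasks[i].state != READY)
            continue;                        /* blocked: never runs */
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;                        /* most urgent ready task */
    }
    return best;
}
```

Note that the model also captures the "what if all tasks are blocked" case discussed below: schedule() returns -1 and the RTOS idles.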
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSs time-slice between such tasks. Others run one task until it goes to the blocked state before going on to the other
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases, since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design, the designers go through requirements, construction, and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product-realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
ROUND-ROBIN SCHEDULING
Process scheduling
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. In the absence of time-sharing, or if the quanta were large relative to the sizes of the jobs, a process that produced large jobs would be favoured over other processes
Example If the time slot is 100 milliseconds and job1 takes a total time of 250 ms to complete the round-robin scheduler will suspend the job after 100 ms and give other jobs their time on the CPU Once the other jobs have had their equal share (100 ms each) job1 will get another allocation of CPU time and the cycle will repeat This process continues until the job finishes and needs no more time on the CPU
Job1 = Total time to complete 250 ms (quantum 100 ms)
1 First allocation = 100 ms
2 Second allocation = 100 ms
3 Third allocation = 100 ms but job1 self-terminates after 50 ms
4 Total CPU time of job1 = 250 ms
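The quantum arithmetic in the example above can be sketched in C (illustrative helper names, assuming integer millisecond bursts):

```c
/* Number of time slots a job needs under round-robin with a fixed
   quantum: ceiling of job length over quantum length. */
int rr_allocations(int job_ms, int quantum_ms) {
    return (job_ms + quantum_ms - 1) / quantum_ms;   /* ceiling division */
}

/* How much of its final slot the job actually uses before it
   self-terminates (job1 uses only 50 ms of its third slot). */
int rr_last_slot_used_ms(int job_ms, int quantum_ms) {
    int rem = job_ms % quantum_ms;
    return rem == 0 ? quantum_ms : rem;
}
```

With job1's figures, rr_allocations(250, 100) gives the three allocations listed above and rr_last_slot_used_ms(250, 100) gives the 50 ms used in the third slot.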
Another approach is to divide all processes into an equal number of timing quanta such that the quantum size is proportional to the size of the process Hence all processes end at the same time
Data packet scheduling
In best-effort packet switching and other statistical multiplexing round-robin scheduling can be used as an alternative to first-come first-served queuing
A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm lets every active data flow that has data packets in the queue take turns in transferring packets on a shared channel in a periodically repeated order. The scheduling is work-conserving, meaning that if one flow is out of packets, the next data flow will take its place. Hence the scheduling tries to prevent link resources from going unused
Round-robin scheduling results in max-min fairness if the data packets are equally sized, since the data flow that has waited longest is given scheduling priority. It may not be desirable if the size of the data packets varies widely from one job to another: a user that produces large packets would be favored over other users. In that case fair queuing would be preferable
If guaranteed or differentiated quality of service is offered, and not only best-effort communication, deficit round-robin (DRR) scheduling, weighted round-robin (WRR) scheduling, or weighted fair queuing (WFQ) may be considered
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept; it can be implemented using semaphores
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously)
The scheduler is concerned mainly with
Throughput - the number of processes that complete their execution per time unit
Latency, specifically
o Turnaround time - the total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
Fairness / waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency), thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives
In real-time environments, such as embedded systems for automatic control in industry (for example, robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time: whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or the virtual memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, or a process which has a low priority, or a process which is page-faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as first-come, first-served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal
Throughput can be low, since long processes can hog the CPU
Turnaround time, waiting time, and response time can be high for the same reason
No prioritization occurs, thus this system has trouble meeting process deadlines
The lack of prioritization means that, as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation
It is based on queuing
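The "long processes hog the CPU" point can be made concrete with a small sketch (illustrative helper, burst times in milliseconds): under FIFO, a process waits for every burst queued ahead of it, so one long job inflates all later waiting times.

```c
/* Waiting time of the process at position idx in a FIFO queue:
   the sum of all bursts queued ahead of it. */
int fifo_waiting_time(const int *burst_ms, int idx) {
    int wait = 0;
    for (int i = 0; i < idx; i++)
        wait += burst_ms[i];     /* everything ahead must finish first */
    return wait;
}
```

With bursts {100, 5, 5}, the two 5 ms jobs each wait behind the 100 ms hog, giving waits of 100 ms and 105 ms even though their own work is tiny.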
Shortest remaining time
Similar to shortest job first (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advance knowledge of, or an estimate of, the time required for each process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process
No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible, especially in a busy system with many small processes being run
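Why shortest-job-first ordering reduces overall waiting time can be shown with a short sketch (illustrative names, burst times in milliseconds): running the short jobs first means the long job's burst is counted in fewer processes' waits.

```c
#include <stdlib.h>

/* Sum of waiting times when jobs run in the given order: each job
   waits for the total burst time of all jobs ahead of it. */
int total_wait(const int *burst_ms, int n) {
    int wait = 0, clock = 0;
    for (int i = 0; i < n; i++) {
        wait += clock;            /* this job waited for all earlier ones */
        clock += burst_ms[i];
    }
    return wait;
}

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* Shortest-job-first: sort by burst length, then total the waits. */
int sjf_total_wait(int *burst_ms, int n) {
    qsort(burst_ms, n, sizeof *burst_ms, cmp_int);
    return total_wait(burst_ms, n);
}
```

For bursts {100, 5, 5}, FIFO order gives a total wait of 0 + 100 + 105 = 205 ms, while SJF order {5, 5, 100} gives 0 + 5 + 10 = 15 ms.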
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes
Overhead is neither minimal nor significant; FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process: higher-priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower-priority processes is possible with a large number of high-priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes, not the average process length
Because of high waiting times, deadlines are rarely met in a pure RR system
Starvation can never occur, since no priority is given. The order of time-unit allocation is based upon process arrival time, similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. This approach is very useful for shared-memory systems
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First in, first out           Low            Low          High              High
Shortest job first            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program; from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use a Harvard architecture
Complex instruction set computers (CISC) provide a variety of instructions that may perform very complex tasks, such as string searching; they also generally use a number of different instruction formats of varying lengths
o The CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tend to provide somewhat fewer and simpler instructions. The instructions are also chosen so that they can be efficiently executed in pipelined processors
o The RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line, starting after the first column
A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR  r0,[r8]      ; a comment
label   ADD  r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bits long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
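The byte ordering a CPU uses can be probed at run time with a short C sketch (an illustrative helper; the ARM fixes this property at power-up, so on a real board the answer is determined by the configuration):

```c
#include <stdint.h>

/* Returns 1 on a little-endian machine (lowest-order byte of a word at
   the lowest address), 0 on a big-endian machine. */
int is_little_endian(void) {
    uint32_t word = 0x01020304u;
    /* Look at the byte stored at the word's lowest address. */
    uint8_t first_byte = *(const uint8_t *)&word;
    return first_byte == 0x04;   /* low-order byte first => little-endian */
}
```

The result depends on the machine running the code, which is exactly the point: the same word laid out in memory reads differently under the two modes.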
SHARC PROCESSOR
It is a family of DSPs which use the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating-point-intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
        R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label:  R3=R1+R2;
The SHARC uses different word sizes and address-space sizes for instructions and data. A SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words. The SHARC uses a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
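What saturation mode buys can be sketched in C. This is a software model of the behaviour, not SHARC code: on overflow the result clamps to the end of the numeric range instead of wrapping around.

```c
#include <stdint.h>

/* Model of a saturating 32-bit fixed-point add: compute in a wider
   type so the true sum is known, then clamp it to the int32_t range. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + b;           /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;  /* clamp positive overflow */
    if (sum < INT32_MIN) return INT32_MIN;  /* clamp negative overflow */
    return (int32_t)sum;
}
```

In a signal-processing context this is usually the right behaviour: a clipped sample distorts the waveform slightly, whereas a wrapped sample jumps to the opposite extreme.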
The multiplier performs fixed-point and floating multiplication It can also perform saturation rounding and setting the result to 0 Fixed point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there
The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The distance to shift supplied by the Ry register may be positive for left shift or negative for right shift Shift operation is set the SZ(shifter zero) SV(shifter overflow) and SS(shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing, called data address generators (DAGs), one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes:
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing, in which the address of the location to be fetched is computed as I+M, where I is the base and M is the modifier (offset).
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
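The circular-buffer mode above can be sketched in C (a hypothetical model of the post-modify update with wraparound, not the actual DAG hardware):

```c
/* Model of DAG-style circular addressing: after each access the index
 * register I is advanced by the modifier M and wraps at the buffer length,
 * so a program can sweep through a delay line endlessly. BUF_LEN is an
 * illustrative buffer length. */
enum { BUF_LEN = 5 };

int circ_post_modify(int i, int m)       /* returns the updated I register */
{
    int next = (i + m) % BUF_LEN;        /* wrap at the buffer boundary */
    if (next < 0) next += BUF_LEN;       /* handle negative modifiers too */
    return next;
}
```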
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services are memory management, device management, interrupt handling, and time management.
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
A semaphore is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: a counting semaphore, whose integer value can be greater than 1, and a binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
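A minimal sketch of these calls, assuming a made-up `ksem` type rather than any particular RTOS API (and returning failure instead of blocking the caller, for simplicity):

```c
/* Single-threaded model of the semaphore management calls listed above.
 * Names and behavior are illustrative only. */
typedef struct { int count; } ksem;

void ksem_create(ksem *s, int initial) { s->count = initial; }

int ksem_acquire(ksem *s)          /* returns 1 on success, 0 if unavailable */
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;                      /* a real RTOS would block the task here */
}

void ksem_release(ksem *s) { s->count++; }

int ksem_query(const ksem *s) { return s->count; }
```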
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queues are:
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
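A message queue of this kind can be sketched as a ring buffer in C; the `msgq_t` type, queue length, and function names below are illustrative, not a real RTOS API:

```c
/* Fixed-length message queue modeled as a ring buffer of integers. */
#define QLEN 4

typedef struct {
    int buf[QLEN];
    int head, tail, count;   /* head = next read, tail = next write */
} msgq_t;

void mq_init(msgq_t *q) { q->head = q->tail = q->count = 0; }

int mq_send(msgq_t *q, int msg)          /* a task or ISR deposits a message */
{
    if (q->count == QLEN) return 0;      /* queue full */
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

int mq_receive(msgq_t *q, int *msg)      /* a waiting task takes a message */
{
    if (q->count == 0) return 0;         /* queue empty */
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}
```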
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
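A minimal sketch using the POSIX `pipe()` call (RTOS pipe APIs differ in names but follow the same create/write/read pattern); the `demo_pipe` helper and its message are hypothetical:

```c
#include <unistd.h>
#include <string.h>

/* One task writes into the pipe's write end; another task reads the data
 * from the read end for further processing. Returns the number of bytes
 * read, or -1 on failure. */
int demo_pipe(char *out, size_t outsz)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;           /* create the pipe */
    const char *msg = "sensor:42";
    (void)write(fds[1], msg, strlen(msg) + 1); /* producer: write end */
    ssize_t n = read(fds[0], out, outsz);      /* consumer: read end */
    close(fds[0]);
    close(fds[1]);
    return (int)n;
}
```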
INSTRUCTION SET ARCHITECTURE (ISA)
In most respects the ARM is a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:
o Each instruction controls the ALU and the shifter, thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15 ----------- it is the program counter but can be manipulated as a general-purpose register
R13 ----------- it is used as the stack pointer
R14 ----------- it has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F bits disable normal (IRQ) and fast (FIQ) interrupts, respectively, when set. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of the six execution modes as follows:
o User mode: it is used to run the application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: it is the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor, the branch instructions have the following features:
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is modeled as a variant of the branch instruction
THUMB
o These instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o Higher code density
o Lower power consumption
o Less space occupied
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are that:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run; the RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data and the code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
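The shared-data problem can be sketched in C by modeling an 8-bit CPU that must read a 16-bit shared variable one byte at a time; the "ISR" firing between the two byte reads is simulated by an ordinary assignment:

```c
#include <stdint.h>

/* A task reads a 16-bit shared variable in two non-atomic byte accesses.
 * If an interrupt updates the variable between the two reads, the task
 * sees a value that never existed. All names here are illustrative. */
static volatile uint16_t shared = 0x00FF;

static uint8_t read_lo(void) { return (uint8_t)(shared & 0xFF); }
static uint8_t read_hi(void) { return (uint8_t)(shared >> 8); }

uint16_t broken_read(void)
{
    uint8_t lo = read_lo();              /* reads 0xFF */
    shared = 0x0100;                     /* "ISR" fires here and updates it */
    uint8_t hi = read_hi();              /* reads 0x01 */
    return (uint16_t)((uint16_t)hi << 8 | lo);  /* 0x01FF: neither the old
                                                   nor the new value */
}
```

Making the two reads an atomic section (for example, by briefly disabling interrupts around them) removes the bug.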
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
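A sketch of the difference (both function names are hypothetical): the first version breaks the rules above by keeping its result in static data shared by every caller, while the second keeps all state in caller-supplied storage:

```c
#include <stdio.h>
#include <string.h>

/* Non-reentrant: the static buffer is shared by every caller, so a task
 * switch in the middle of the call can corrupt another task's result. */
static char shared_buf[16];

char *to_hex_nonreentrant(unsigned v)
{
    sprintf(shared_buf, "%x", v);        /* uses non-private, static data */
    return shared_buf;
}

/* Reentrant: all state lives on the caller's stack or in caller-supplied
 * storage, so concurrent calls cannot interfere with each other. */
void to_hex_reentrant(unsigned v, char *out, size_t outsz)
{
    snprintf(out, outsz, "%x", v);       /* writes only to caller's buffer */
}
```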
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other one runs.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements:
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis: determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding: implements the processes and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction, and testing phases.
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps ensure that the product best meets the customer's needs.
ROUND ROBIN
In multiple-access networks where several terminals are connected to a shared physical medium round-robin scheduling may be provided by token passing channel access schemes such as token ring or by polling or resource reservation from a central control station
In a centralized wireless packet radio network where many stations share one frequency channel a scheduling algorithm in a central base station may reserve time slots for the mobile stations in a round-robin fashion and provide fairness However if link adaptation is used it will take a much longer time to transmit a certain amount of data to expensive users than to others since the channel conditions differ It would be more efficient to wait with the transmission until the channel conditions are improved or at least to give scheduling priority to less expensive users Round-robin scheduling does not utilize this Higher throughput and system spectrum efficiency may be achieved by channel-dependent scheduling for example a proportionally fair algorithm or maximum throughput scheduling Note that the latter is characterized by undesirable scheduling starvation
Round-robin scheduling in UNIX follows the same concept and can be implemented using semaphores.
Scheduling (computing)
In computer science, scheduling is the method by which threads, processes, or data flows are given access to system resources (e.g., processor time, communications bandwidth). This is usually done to load-balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously).
The scheduler is concerned mainly with:
Throughput - the number of processes that complete their execution per time unit
Latency, specifically:
o Turnaround - the total time between submission of a process and its completion
o Response time - the amount of time from when a request was submitted until the first response is produced
Fairness / Waiting time - equal CPU time to each process (or, more generally, appropriate times according to each process's priority)
In practice these goals often conflict (e.g., throughput versus latency); thus a scheduler will implement a suitable compromise. Preference is given to any one of the above-mentioned concerns depending upon the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term or admission scheduler decides which jobs or processes are to be admitted to the ready queue (in the Main Memory) that is when an attempt is made to execute a program its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler Thus this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time - ie whether a high or low amount of processes are to be executed concurrently and how the split between IO intensive and CPU intensive processes is to be handled In modern operating systems this is used to make sure that real time processes get enough CPU time to finish their tasks Without proper real time scheduling modern GUI interfaces would seem sluggish The long term queue exists in the Hard Disk or the Virtual Memory [Stallings 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive) or vice versa This is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in) The medium-term scheduler may decide to swap out a process which has not been active for some time or a process which has a low priority or a process which is page faulting frequently or a process which is taking up a large amount of memory in order to free up main memory for other processes swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file) the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped out processes upon their execution In this way when a segment of the binary is required it can be swapped in on demand or lazy loaded [stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes are to be executed (allocated a CPU) next following a clock interrupt an IO interrupt an operating system call or another form of signal Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice and these are very short This scheduler can be preemptive implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process or non-preemptive (also known as voluntary or co-operative) in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (IO scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access ) 35G cellular system channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined by channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue.
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.
Overhead is not minimal, nor is it significant. FPPS has no particular advantage in terms of throughput over FIFO scheduling.
Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.
Deadlines can be met by giving processes with deadlines a higher priority.
Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them.
RR scheduling involves extensive overhead, especially with a small time unit. Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF.
Poor average response time; waiting time is dependent on the number of processes and not on average process length.
Because of high waiting times, deadlines are rarely met in a pure RR system.
Starvation can never occur, since no priority is given. The order of time unit allocation is based upon process arrival time, similar to FCFS.
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems.
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host. The host should be able to do the following:
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system.
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in.
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation.
The serial port available on evaluation boards is one of the most important debugging tools.
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices.
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor.
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in the Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate buses provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line, starting after the first column.
A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.
Comments begin with a semicolon and continue until the end of the line.
Example
LDR r0,[r8] ; a comment
LABEL ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bits long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating-point-intensive computations.
The instructions are written one per line and terminated by a semicolon.
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line.
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
LABEL: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits; a basic data word is 32 bits and an address is 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM).
The SHARC memory is internally organized as 32-bit words. It is a modified Harvard architecture that allows the program memory to hold both data and instructions.
Data operations
The programming model for the SHARC is rather large and complex.
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register. The STKY bits are set along with the ASTAT register bits but are not cleared; STKY bits remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special register sets that are used to control loading and storing. They are called data address generators (DAGs), one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction. This mode is relatively space-inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in it.
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value.
The DAGs also support base-plus-offset addressing. The address of the location to be fetched is computed as I+M, where I is the base and M is the modifier or offset.
The DAGs also support circular buffers, which are commonly used in signal processing.
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform.
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.
The various kernel objects are tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel, it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or to the highest-priority task). In a non-preemptive kernel, it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel, it is the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
It is just an integer. Semaphores are of two types: the counting semaphore, which can take any non-negative integer value, and the binary semaphore, which takes a value of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queues are:
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message.
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all the ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:
o Each instruction controls the ALU and the shifter, thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15----------- it is the program counter but can be manipulated as a general-purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of the six execution modes as follows:
o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes.
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: 32-bit ARM and 16-bit THUMB.
ARM: it is the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions.
o ARM supports both little-endian and big-endian formats for data access.
Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor, the branch instructions have the following features:
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
THUMB
o These instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupied
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task-code functions should run. The RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing is done between the tasks. In others, one task is run until it blocks before the other one gets the processor.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process.
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and the designers.
Two types of requirements:
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce, and it was the first model proposed for the software development process.
This model has five major phases:
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the processes and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for software development.
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction, and testing phases.
The first cycles, at the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.
Its main disadvantage is that with too many spirals it may take too long, which matters when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product realization activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
… crucial for keeping the system stable. Scheduled tasks are sent to mobile devices and managed through an administrative back end.
Types of operating system schedulers
Operating systems may feature up to three distinct types of scheduler: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which these functions are performed. The scheduler is an operating-system module that selects the next jobs to be admitted into the system and the next process to run.
Long-term scheduling
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time: whether a high or low number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish. The long-term queue exists in the hard disk or virtual memory. [Stallings, 399]
Long-term scheduling is also important in large-scale systems such as batch processing systems computer clusters supercomputers and render farms In these cases special purpose job scheduler software is typically used to assist these functions in addition to any underlying admission scheduling support in the operating system
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as swapping out or swapping in (also, incorrectly, as paging out or paging in). The medium-term scheduler may decide to swap out a process which has not been active for some time, a process which has a low priority, a process which is page faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource [Stallings 396] [Stallings 370]
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required, it can be swapped in on demand, or lazy loaded [Stallings 394]
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers; a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU [Stallings 396]. In most cases the short-term scheduler is written in assembly language because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms; in this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling, and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
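The FIFO behaviour above can be sketched in a few lines of Python (the process list is hypothetical; waiting time is time spent in the ready queue, and turnaround is waiting plus burst time):

```python
# FCFS/FIFO scheduling sketch: processes run to completion in arrival order.
# Each process is (name, arrival_time, burst_time); the names are made up.
def fcfs(processes):
    """Return {name: (waiting_time, turnaround_time)} under FIFO scheduling."""
    clock = 0
    metrics = {}
    for name, arrival, burst in sorted(processes, key=lambda p: p[1]):
        clock = max(clock, arrival)     # CPU may idle until the process arrives
        waiting = clock - arrival       # time spent waiting in the ready queue
        clock += burst                  # run to completion, no preemption
        metrics[name] = (waiting, clock - arrival)
    return metrics

# A long process hogs the CPU: B and C wait behind A's entire burst.
print(fcfs([("A", 0, 10), ("B", 1, 2), ("C", 2, 3)]))
```

This makes the "long processes can hog the CPU" point concrete: B and C each need only a few ticks but wait for all of A.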
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
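A minimal, tick-by-tick sketch of shortest-remaining-time scheduling (the process names and times are invented; ties are broken arbitrarily):

```python
def srt(processes):
    """Shortest-remaining-time: at every tick, run the arrived process with
    the least remaining burst, preempting if a shorter job arrives.
    processes: list of (name, arrival, burst). Returns {name: completion_time}."""
    remaining = {name: burst for name, _, burst in processes}
    arrival = {name: t for name, t, _ in processes}
    done = {}
    clock = 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:                   # CPU idles until the next arrival
            clock += 1
            continue
        n = min(ready, key=lambda x: remaining[x])
        remaining[n] -= 1               # run the shortest job for one tick
        clock += 1
        if remaining[n] == 0:
            del remaining[n]
            done[n] = clock
    return done

# "short" arrives at t=2 and preempts "long", finishing first.
print(srt([("long", 0, 7), ("short", 2, 2)]))
```

Note how the long process's completion is pushed out by every short arrival, which is exactly the starvation risk described above.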
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time; waiting time is dependent on the number of processes, not on average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
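The round-robin cycle can be sketched as follows (all processes are assumed to arrive at time 0, a simplification of the arrival-order rule above; names and bursts are invented):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Cycle through processes, each getting `quantum` ticks per turn.
    bursts: {name: burst_time}. Returns {name: completion_time}."""
    queue = deque(bursts.items())
    clock = 0
    completion = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining:
            queue.append((name, remaining))   # back of the queue for another turn
        else:
            completion[name] = clock
    return completion

# The 1-tick job C still waits for one quantum each of A and B first,
# but never starves: every process keeps getting turns.
print(round_robin({"A": 5, "B": 3, "C": 1}, quantum=2))
```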
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared-memory problems
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link, or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to indicate when the code enters certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated against each other at the same time. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmers interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
        LDR r0,[r8]      ; a comment
label   ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
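The two byte orderings can be illustrated with Python's struct module; this shows how the same 32-bit word lands in memory under each mode (it demonstrates endianness generally, not ARM configuration itself):

```python
import struct

# Byte layout of the 32-bit word 0x12345678 under each endianness.
word = 0x12345678
little = struct.pack("<I", word)   # lowest-order byte stored first
big = struct.pack(">I", word)      # highest-order byte stored first

print(little.hex())                # bytes appear as 78 56 34 12
print(big.hex())                   # bytes appear as 12 34 56 78
```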
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU, because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
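Saturation can be sketched generically; this is plain Python rather than the SHARC ALU, and the bit width is a parameter:

```python
# Saturation arithmetic sketch: an overflowing result clamps to the
# maximum or minimum representable value instead of wrapping around.
def saturating_add(a, b, bits=32):
    """Add two signed integers, clamping into the `bits`-wide two's-complement range."""
    lo = -(1 << (bits - 1))            # most negative representable value
    hi = (1 << (bits - 1)) - 1         # most positive representable value
    return max(lo, min(hi, a + b))

# 2e9 + 2e9 overflows 32-bit range, so the result clamps to 2**31 - 1
# rather than wrapping to a negative number.
print(saturating_add(2_000_000_000, 2_000_000_000))
```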
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that are used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT)
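Bit-reversal addressing can be illustrated in Python; the DAG performs this address transformation in hardware, whereas here the reversal is computed explicitly:

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index` - the address transformation
    used to reorder the inputs/outputs of a radix-2 FFT."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # shift the low bit into the result
        index >>= 1
    return result

# Order in which an 8-point FFT visits its inputs (3 address bits):
print([bit_reverse(i, 3) for i in range(8)])
```

For example, index 3 (binary 011) maps to 6 (binary 110), so element 3 swaps places with element 6 during the FFT reordering.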
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: a counting semaphore, whose value can be greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
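The create/acquire/release pattern can be sketched with Python's threading.Semaphore, standing in loosely for the RTOS calls above; the "resource" and the counts here are invented for illustration:

```python
import threading, time

# Counting-semaphore sketch: at most 2 workers hold the resource at once.
sem = threading.Semaphore(2)           # create: initial count of 2
lock = threading.Lock()
active = 0                             # workers currently holding the resource
peak = 0                               # highest number of simultaneous holders

def worker():
    global active, peak
    sem.acquire()                      # blocks while the count is 0
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)                   # pretend to use the resource
    with lock:
        active -= 1
    sem.release()                      # increment the count, waking a waiter

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads: t.start()
for t in threads: t.join()
print(peak)                            # bounded by the semaphore's count of 2
```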
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
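The deposit/take pattern can be sketched with Python's queue module as a producer/consumer pair; the "ISR" here is just a thread, and the sample values are invented:

```python
import queue, threading

# Message-queue sketch: a producer deposits messages, a task drains
# them in FIFO order.
msgs = queue.Queue(maxsize=8)          # create the queue with a length

def producer():
    for sample in (10, 20, 30):        # e.g. readings from a sensor
        msgs.put(sample)               # deposit a message
    msgs.put(None)                     # sentinel: no more data

received = []
t = threading.Thread(target=producer)
t.start()
while True:
    m = msgs.get()                     # waiting task takes the next message
    if m is None:
        break
    received.append(m)
t.join()
print(received)                        # messages arrive in deposit order
```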
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
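The same one-way byte stream can be sketched with os.pipe(), an OS-level pipe standing in for the RTOS pipe object:

```python
import os

# Pipe sketch: one end writes, the other reads, giving the same
# one-directional byte stream an RTOS pipe provides between two tasks.
read_fd, write_fd = os.pipe()          # create a pipe (read end, write end)
os.write(write_fd, b"task output")     # "task A" writes into the pipe
os.close(write_fd)                     # close the write end when done
data = os.read(read_fd, 64)           # "task B" reads from the pipe
os.close(read_fd)
print(data)                            # the bytes arrive unchanged
```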
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. While keeping the architecture simple, a number of non-RISC features have been introduced to improve performance
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general purpose registers in the user mode. They are
R15----------- it is the program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has special significance and is called the link register. When a procedure call is made, the return address is automatically stored in this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: it is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC): it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets: 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. On the other hand, multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs, a task is simply a subroutine
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority, the memory location for the task's stack, etc.
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
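The shared-data problem and its fix can be sketched with two threads standing in for tasks; the lock plays the role that disabling interrupts (or a semaphore) plays around an atomic section in an RTOS:

```python
import threading

# Shared-data sketch: two "tasks" increment a shared counter.
# counter += 1 is a read-modify-write sequence; the lock makes it
# atomic so the two tasks cannot interleave in the middle of it.
counter = 0
lock = threading.Lock()

def task():
    global counter
    for _ in range(100_000):
        with lock:                # atomic section: no interleaving here
            counter += 1

threads = [threading.Thread(target=task) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                    # with the lock, no increments are lost
```

Without the lock, the two read-modify-write sequences could interleave and increments would be lost, which is exactly the shared-data bug described above.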
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system there is only one task in the running state at any given time
Ready: some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state
Tasks and interrupt routines unblock tasks: when a task is blocked, it never gets the microprocessor. An ISR or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
RTOS provides a collection of functions that a task can call to tell the scheduler what events it wants to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked then the scheduler spins in some tight loop somewhere inside the RTOS waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS In some systems it is illegal to assign the same priority to two tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to the blocked state before going to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS the lower priority task is stopped as soon as a higher priority task unblocks
In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent descriptions of the system that can be used to create an architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes including physical size cost power consumption design time reliability and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful" The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers want products that are not only fast and cheap but also of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by ROYCE and it is the first model proposed for the software development process
This model has five major phases
o Requirements Analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the design and integrates the components
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where the design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major constraint
Its advantage is It adopts successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross functional teams include members from the various disciplines involved in the process including manufacturing HW/SW design marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that some one is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customerrsquos needs
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready in-memory processes is to be executed (allocated a CPU) next following a clock interrupt an I/O interrupt an operating system call or another form of signal Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers - a scheduling decision will at a minimum have to be made after every time slice and these are very short This scheduler can be preemptive implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process or non-preemptive (also known as voluntary or co-operative) in which case the scheduler is unable to force processes off the CPU [Stallings 396] In most cases the short-term scheduler is written in assembler because it is a critical part of the operating system
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler This function involves the following
o Switching context
o Switching to user mode
o Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible since it is invoked during every process switch The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency [Galvin 155]
Scheduling disciplines
Scheduling disciplines are algorithms used for distributing resources among parties which simultaneously and asynchronously request them Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes) disk drives (I/O scheduling) printers (print spooler) most embedded systems etc
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources There are many different scheduling algorithms In this section we introduce several of them
In packet-switched computer networks and other statistical multiplexing the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets
The simplest best-effort scheduling algorithms are round-robin fair queuing (a max-min fair scheduling algorithm) proportionally fair scheduling and maximum throughput If differentiated
or guaranteed quality of service is offered as opposed to best-effort communication weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access) 3.5G cellular systems channel-dependent scheduling may be used to take advantage of channel state information If the channel conditions are favourable the throughput and system spectral efficiency may be increased In even more advanced systems such as LTE the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that best can utilize them
First in first out
Also known as First Come First Served (FCFS) it is the simplest scheduling algorithm FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
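The high waiting times under FIFO can be seen with a short worked calculation. This is a hedged sketch with made-up burst times, assuming all processes arrive at time 0 and are served in arrival order:

```c
#include <assert.h>

/* Average waiting time under FIFO/FCFS: process i waits for the
 * bursts of all processes ahead of it in the queue. */
double fcfs_avg_wait(const int burst[], int n) {
    int wait = 0, total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += wait;      /* accumulated wait of process i */
        wait += burst[i];        /* later processes also wait for i */
    }
    return (double)total_wait / n;
}
```

With bursts {24, 3, 3} the waits are 0, 24 and 27, so the average is 17; with the long job last, {3, 3, 24}, the average drops to 3. This is exactly the "long processes can hog the CPU" effect noted above.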
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution the currently running process may be interrupted (known as preemption) dividing that process into two separate computing blocks This creates excess overhead through additional context switching The scheduler must also place each incoming process into a specific place in the queue creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead especially with a small time unit Balanced throughput between FCFS and SJF shorter jobs are completed faster than in
FCFS and longer processes are completed faster than in SJF
Poor average response time waiting time is dependent on number of processes and not average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
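A round-robin schedule can be simulated in a few lines of C. This is an illustrative sketch with hypothetical burst times, assuming all processes arrive at time 0 and at most 16 processes:

```c
#include <assert.h>

/* Cycle through the processes giving each at most `quantum` units
 * until every burst is finished; returns the total waiting time
 * (completion time minus burst time, summed over all processes). */
int rr_total_wait(const int burst[], int n, int quantum) {
    int rem[16], t = 0, total_wait = 0, remaining = n;
    for (int i = 0; i < n; i++) rem[i] = burst[i];
    while (remaining > 0)
        for (int i = 0; i < n; i++) {
            if (rem[i] == 0) continue;
            int slice = rem[i] < quantum ? rem[i] : quantum;
            t += slice;          /* run process i for one time slice */
            rem[i] -= slice;
            if (rem[i] == 0) {   /* finished: wait = completion - burst */
                total_wait += t - burst[i];
                remaining--;
            }
        }
    return total_wait;
}
```

With bursts {24, 3, 3} and a quantum of 4 the total wait is 17, against 51 under FCFS for the same bursts: the short jobs no longer sit behind the long one, which is the balanced-throughput behaviour described above.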
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups For example a common division is made between foreground (interactive) processes and background (batch) processes These two types of processes have different response-time requirements and so may have different scheduling needs This approach is also useful for the shared memory problem
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel Queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called the host The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break When the PC reaches that address control is returned to the monitor program and from this program the user can examine and/or modify CPU registers after which execution can be continued The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging LEDs can be used to show error conditions when the code enters certain routines or to show idle time activity
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system Its main drawback is that the machine is specific to a particular microprocessor
Logic analyzer is another but a major piece of instrumentation useful for debugging It can sample or analyze several different signals simultaneously but display only 0 or 1 or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardware/software co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs) video games telephones and many other systems whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions whereas in a Harvard architecture there exist separate memories for data and program The Harvard architecture is widely used today for its higher performance which is a result of the separation of program and data memories Having two memories with separate ports provides higher memory bandwidth Most DSPs are of Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles
leading to unsuitability for high performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
Label ADD r4,r0,r1
Some versions of the ARM architecture like the ARM7 are von Neumann architecture machines whereas the ARM9 uses a Harvard architecture However the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte not a word Word 0 in the ARM address space is at location 0 word 1 is at 4 word 2 at 8 and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
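Endianness can be observed directly in C by looking at the byte layout of a 32-bit word in memory. This is a host-side sketch, not ARM code:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 on a little-endian machine (lowest-order byte of the
 * word stored at the lowest address), 0 on a big-endian one. */
int is_little_endian(void) {
    uint32_t word = 0x01020304u;
    uint8_t *bytes = (uint8_t *)&word;  /* view the word as 4 bytes */
    return bytes[0] == 0x04;
}
```

The same word 0x01020304 is stored as the byte sequence 04 03 02 01 on a little-endian configuration and 01 02 03 04 on a big-endian one.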
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
Label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units an ALU a multiplier and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT) sticky (STKY) and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero) AN (ALU result negative) AV (ALU result overflow) AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed point values In saturation arithmetic an overflow results in the maximum-range value not the result of wrapping around the numeric range Saturation mode is controlled by the ALUSAT bit in the MODE1 register
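Saturation arithmetic is easy to sketch in C. This is an illustrative emulation of the behaviour described above, not SHARC code:

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit saturating add: on overflow the result clamps to the end
 * of the range instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + b;   /* widen so the true sum is exact */
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```

With wrap-around arithmetic 2000000000 + 2000000000 would come out as a large negative number; in saturation mode it clamps to INT32_MAX, which is usually the less harmful behaviour in signal processing.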
The multiplier performs fixed-point and floating-point multiplication It can also perform saturation rounding and setting the result to 0 Fixed point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there
The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The distance to shift supplied by the Ry register may be positive for left shift or negative for right shift Shift operations set the SZ (shifter zero) SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture operands must be loaded into registers before operating on them The SHARC supplies two special address generators that control loading and storing They are called DATA ADDRESS GENERATORS (DAGs) one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAG1 while registers 8 through 15 belong to DAG2
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
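Two of these modes can be emulated in a few lines of C. The register names I and M mirror the text but the functions themselves are illustrative, not SHARC code:

```c
#include <assert.h>

/* Post-modify with update: use the address in *I, then add the
 * modifier M so the next access sweeps to the following location. */
int post_modify_load(const int mem[], unsigned *I, int M) {
    int value = mem[*I];
    *I += M;                      /* I is updated after the access */
    return value;
}

/* Sweep a range: sum `count` elements starting at `start`,
 * stepping the I register by M each time. */
int sweep_sum(const int mem[], unsigned start, int M, int count) {
    unsigned I = start;
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += post_modify_load(mem, &I, M);
    return sum;
}

/* Bit-reversal of an n-bit index, the access order used to
 * unscramble fast Fourier transform results. */
unsigned bit_reverse(unsigned x, int n) {
    unsigned r = 0;
    for (int i = 0; i < n; i++) {
        r = (r << 1) | (x & 1u);  /* shift bits in reversed order */
        x >>= 1;
    }
    return r;
}
```

For example, sweeping {1, 2, 3, 4, 5, 6} from index 0 with modifier 2 visits elements 1, 3 and 5, and the 3-bit index 6 (binary 110) bit-reverses to 3 (binary 011).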
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred When an interrupt occurs the CPU saves the contents of the registers and jumps to the ISR After the ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel In case of a preemptive kernel the highest priority task gets executed In an RTOS the interrupt latency interrupt response time and interrupt recovery time are very important
Interrupt latency Max time for which interrupts are disabled plus time to start the execution of the first instruction in the ISR
Interrupt response time Time between receipt of the interrupt signal and starting the code that handles the interrupt In a preemptive kernel it is equal to the sum of the interrupt latency plus the time to save the CPU register context
Interrupt recovery time Time required for the CPU to return to the interrupted code (in a non-preemptive kernel) or to the highest priority task (in a preemptive kernel) In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction In a preemptive kernel it is equal to the sum of the time to check whether a higher priority task is ready plus the time to restore the CPU context of the highest priority task plus the time to execute the return-from-interrupt instruction
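The definitions above are simple sums, which a short worked example makes concrete. All the microsecond figures below are hypothetical, chosen only to illustrate the arithmetic:

```c
#include <assert.h>

/* Preemptive kernel: response = interrupt latency + time to save
 * the CPU register context (all times in microseconds). */
int response_time(int latency_us, int save_context_us) {
    return latency_us + save_context_us;
}

/* Preemptive kernel: recovery = time to check for a ready higher
 * priority task + time to restore its context + time to execute
 * the return-from-interrupt instruction. */
int recovery_time_preemptive(int check_ready_us, int restore_context_us,
                             int return_instr_us) {
    return check_ready_us + restore_context_us + return_instr_us;
}
```

So with a latency of 10 µs and a 4 µs context save, the response time is 14 µs.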
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer These are of two types counting semaphore whose integer value can be greater than 1 and binary semaphore which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
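The function calls above can be sketched with a minimal single-threaded model of a counting semaphore. The names are illustrative, not from any real RTOS API, and a real acquire would block the calling task instead of simply failing:

```c
#include <assert.h>

typedef struct { int count; } semaphore;

void sem_create(semaphore *s, int initial) { s->count = initial; }

/* Returns 1 on success; 0 means the caller would block here. */
int sem_acquire(semaphore *s) {
    if (s->count == 0) return 0;
    s->count--;
    return 1;
}

void sem_release(semaphore *s)       { s->count++; }
int  sem_query(const semaphore *s)   { return s->count; }

/* Demo: a binary semaphore guarding one resource. */
int demo_semaphore(void) {
    semaphore s;
    sem_create(&s, 1);
    int got = sem_acquire(&s);      /* first taker succeeds */
    int blocked = sem_acquire(&s);  /* second taker would block */
    sem_release(&s);                /* resource given back */
    return got * 10 + blocked + sem_query(&s);
}
```

The demo returns 11: the first acquire succeeds (10), the second would block (0), and after the release the query reports a count of 1 again.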
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes At the time of creating a queue it is given a name or ID a queue length a sending task waiting list and a receiving task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
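The mailbox-array view of a message queue can be sketched as a fixed-size ring buffer in C. This is an illustrative model with hypothetical names, not a real RTOS API:

```c
#include <assert.h>

#define QLEN 4                       /* queue length fixed at creation */

typedef struct {
    int slot[QLEN];                  /* the array of mailboxes */
    int head, tail, count;
} msg_queue;

void mq_init(msg_queue *q) { q->head = q->tail = q->count = 0; }

/* A task or ISR deposits a message; returns 0 if the queue is full. */
int mq_send(msg_queue *q, int msg) {
    if (q->count == QLEN) return 0;
    q->slot[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

/* A waiting task takes the oldest message; returns 0 if empty. */
int mq_receive(msg_queue *q, int *msg) {
    if (q->count == 0) return 0;
    *msg = q->slot[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}

/* Demo: messages come out in the order they were deposited. */
int mq_demo(void) {
    msg_queue q;
    int m;
    mq_init(&q);
    mq_send(&q, 7);
    mq_send(&q, 8);
    mq_receive(&q, &m);
    return m;                        /* the first message, 7 */
}
```

A real kernel adds the waiting lists on top of this buffer: a task that receives from an empty queue is blocked on the receiving list until some task or ISR deposits a message.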
PIPES
A pipe is a kernel object for inter task communication It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
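The same idea can be tried on a development host with the POSIX pipe() call, one end written by a "producer" and the other read by a "consumer"; in an RTOS the two ends would belong to two tasks. The function names here are illustrative:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write a message into a pipe and read it back out. */
int pipe_roundtrip(const char *msg, char *out, int outlen) {
    int fd[2];
    if (pipe(fd) != 0) return -1;       /* fd[0] = read end, fd[1] = write end */
    write(fd[1], msg, strlen(msg));     /* task A writes into the pipe */
    int n = (int)read(fd[0], out, (size_t)(outlen - 1));
    out[n] = '\0';                      /* task B reads it back out */
    close(fd[0]);
    close(fd[1]);
    return n;                           /* number of bytes transferred */
}

int pipe_demo(void) {
    char buf[16];
    return pipe_roundtrip("hello", buf, (int)sizeof buf);
}
```

The pipe preserves byte order, so the consumer sees exactly the stream the producer wrote, which is what makes it suitable for chaining one task's output into another task's input.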
INSTRUCTION SET ARCHITECTURE ISA
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Loadstore architecture the instructions that process data operate only on the registers and are
separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed length fields All the ARM instructions are 32-bit long and most of them have
a regular three operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been
introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a
4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags disable normal and fast interrupts respectively when set The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR can not be
written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode (UNDEF) it is entered if the fetched opcode is not an ARM
instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format However most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location On the other hand multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered by running a BX/BLX (Branch and
Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally excepting the branch
instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15
A reduced no of instructions can access the full register set
o No MSR and MRS instructions
o Maximum no of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16-bit However ARM is faster
when the memory is organized in 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations and signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next; code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed. In this architecture, changes to lower priority functions do not generally affect the response of higher priority functions
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data, which includes global, static, initialized, uninitialized, and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem
Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state
Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked it never gets the microprocessor, so an ISR or some other task in the system must send a signal to bring the task out of the blocked state. Otherwise the task will be blocked forever
The scheduler controls the running state: scheduling tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task runs until it goes to the blocked state before control passes to the other
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce and is the first model proposed for the software development process
This model has five major phases
o Requirements Analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps
It works well when there is good foreknowledge of the implementation, since the one-way flow leaves little room for revisiting early decisions
It is not suitable where the design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use help minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customers' needs
Scheduling (computing)
Where guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access), a 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information: if the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, the scheduling is combined with channel-dependent packet-by-packet dynamic channel allocation, or by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them
First in first out
Also known as First Come, First Served (FCFS), this is the simplest scheduling algorithm: FIFO simply queues processes in the order that they arrive in the ready queue
Since context switches only occur upon process termination and no reorganization of the process queue is required scheduling overhead is minimal
Throughput can be low since long processes can hog the CPU
Turnaround time waiting time and response time can be high for the same reasons above
No prioritization occurs thus this system has trouble meeting process deadlines
The lack of prioritization means that as long as every process eventually completes there is no starvation In an environment where some processes might not complete there can be starvation
It is based on Queuing
Shortest remaining time
Similar to Shortest Job First (SJF) With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue This requires advance knowledge or estimations about the time required for a process to complete
If a shorter process arrives during another process's execution, the currently running process may be interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead
This algorithm is designed for maximum throughput in most scenarios
Waiting time and response time increase as the process computational requirements increase Since turnaround time is based on waiting time plus processing time longer processes are significantly affected by this Overall waiting time is smaller than FIFO however since no process has to wait for the termination of the longest process
No particular attention is given to deadlines the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large amounts of high priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit
Balanced throughput between FCFS and SJF: shorter jobs are completed faster than in FCFS, and longer processes are completed faster than in SJF
Poor average response time: waiting time is dependent on the number of processes, not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First In First Out            Low            Low          High              High
Shortest Job First            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERS/LOCATORS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show that the code has entered certain routines, or to show idle-time activity
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1, or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there are separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which results from the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architectures were developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architectures are optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams
o They require a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]      ; a comment
LABEL ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
LABEL: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data. A SHARC instruction consists of 48 bits, a basic data word of 32 bits, and an address of 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the corresponding ASTAT bits but are not cleared by later operations; they remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation and rounding, and can set the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two data address generators (DAGs) that control loading and storing, one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post-modify with update mode allows the program to sweep through a range of addresses. This mode uses an I register and a modifier, which may be either an M register or an immediate value. The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest priority ready task (preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher priority task is ready, the time to restore the CPU context of the highest priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest priority task or the first task waiting in the queue can then take the message
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements that improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all the ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features were introduced
o Each instruction controls both the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are
R15: the program counter, but it can be manipulated as a general-purpose register
R13: used as the stack pointer
R14: it has some special significance and is called the link register; when a procedure call is made, the return address is automatically stored into this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode: used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location; multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provides several versions of multiplication These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions The SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally except the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15 Only a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16 bits wide However ARM is faster when the memory is organized as 32 bits wide
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
RTOS it self uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is task
Under most RTOS a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority and the memory location for the task's stack
Most RTOS allows as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter and a stack All other data, which includes global, static, initialized and un-initialized variables and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks it is easy to move data from one task to another However sharing of data can create bugs, leading to the shared-data problem
Shared data problem it arises when an interrupt routine and task code (or two tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
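The shared-data problem above can be made concrete with a small C sketch. Here a simulated interrupt routine increments a two-part time value; a task that reads the two halves non-atomically can see a "torn" value that never existed, while reading inside an atomic section (interrupts disabled on real hardware) is safe. The interrupt firing between the two reads is simulated by a direct call, and all names are illustrative:

```c
#include <stdint.h>

/* Shared data updated by a (simulated) interrupt routine */
volatile uint16_t time_hi, time_lo;

/* Pretend ISR: increments the 32-bit time kept in two 16-bit halves */
void isr_tick(void) {
    if (++time_lo == 0) ++time_hi;   /* low half wraps, carry into high half */
}

/* BUGGY: reads the two halves non-atomically; if the interrupt strikes
   between the reads, the combined value never actually existed */
uint32_t read_time_torn(void) {
    uint32_t hi = time_hi;
    isr_tick();                      /* the interrupt fires here in this simulation */
    uint32_t lo = time_lo;
    return (hi << 16) | lo;
}

/* FIXED: an atomic section; on real hardware interrupts would be
   disabled around the two reads and re-enabled afterwards */
uint32_t read_time_atomic(void) {
    /* interrupts disabled here */
    uint32_t v = ((uint32_t)time_hi << 16) | time_lo;
    /* interrupts re-enabled here */
    return v;
}
```

Starting from time 0x0001FFFF, the torn read returns 0x00010000, a value the counter never held; the atomic read returns the true value.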
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
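The first rule above can be illustrated with a pair of C functions (the names and the formatting task are made up for the example). The first version keeps its result in a static buffer shared by all callers, so a task switch in mid-call can corrupt another task's result; the second works only on its parameters and the caller's buffer, which live on the calling task's private stack:

```c
#include <stdio.h>

/* NON-REENTRANT: the static buffer is shared by every caller, so two
   tasks calling this function overwrite each other's result */
char *format_temp_bad(int celsius) {
    static char buf[16];
    snprintf(buf, sizeof buf, "%d C", celsius);
    return buf;
}

/* REENTRANT: uses only its arguments and the caller-supplied buffer,
   which is private to the calling task (typically on its stack) */
char *format_temp_ok(char *buf, size_t n, int celsius) {
    snprintf(buf, n, "%d C", celsius);
    return buf;
}
```

Every call to the non-reentrant version returns the same buffer address, which is exactly why it fails under an RTOS that can switch tasks inside the function.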
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task Therefore in a single-processor system only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available Any number of tasks can be in this state
Blocked It means this task has nothing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any number of tasks can be in this state as well
Scheduler
It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state When a task is blocked it never gets the microprocessor, so an interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever
Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler
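The scheduler rules above can be sketched in a few lines of C: the scheduler never unblocks anybody, it only picks the highest-priority task that is not blocked. This is an illustrative sketch, not a real RTOS API, and it assumes a larger number means higher priority (RTOSes differ on this convention):

```c
/* Task states as described above */
typedef enum { BLOCKED, READY, RUNNING } state_t;

typedef struct {
    state_t state;
    int priority;     /* assumption for this sketch: higher number = more urgent */
} task_t;

/* Scheduler core: pick the highest-priority task that is not blocked.
   Returns the index of the chosen task, or -1 if every task is blocked
   (in which case a real RTOS spins in an idle loop) */
int schedule(task_t *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; ++i) {
        if (tasks[i].state == BLOCKED) continue;  /* scheduler never unblocks a task */
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;
    }
    return best;
}
```

Note that unblocking happens outside this function, by an interrupt routine or another task signalling the blocked task, which is exactly the division of labour described above.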
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked then the scheduler spins in some tight loop somewhere inside the RTOS waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS the lower priority task is stopped as soon as a higher priority task unblocks
In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create an architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do such as compute an FFT
o Non-functional A non-functional requirement can concern any number of attributes including physical size, cost, power consumption, design time, reliability and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful" The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap they also want them to be of
right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements Analysis determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the components and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many iterations it may take too long when design time is a major constraint
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
Scheduling disciplines
- First in first out
- Shortest remaining time
- Fixed priority pre-emptive scheduling
- Round-robin scheduling
- Multilevel queue scheduling
First in first out
No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible
Starvation is possible, especially in a busy system with many small processes being run
Fixed priority pre-emptive scheduling
The OS assigns a fixed priority rank to every process and the scheduler arranges the processes in the ready queue in order of their priority Lower priority processes get interrupted by incoming higher priority processes
Overhead is not minimal nor is it significant FPPS has no particular advantage in terms of throughput over FIFO scheduling
Waiting time and response time depend on the priority of the process Higher priority processes have smaller waiting and response times
Deadlines can be met by giving processes with deadlines a higher priority
Starvation of lower priority processes is possible with large numbers of high-priority processes queuing for CPU time
Round-robin scheduling
The scheduler assigns a fixed time unit per process and cycles through them
RR scheduling involves extensive overhead, especially with a small time unit Throughput is balanced between FCFS and SJF: shorter jobs are completed faster than in FCFS and longer processes are completed faster than in SJF
Poor average response time: waiting time depends on the number of processes, not the average process length
Because of high waiting times deadlines are rarely met in a pure RR system
Starvation can never occur since no priority is given Order of time unit allocation is based upon process arrival time similar to FCFS
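The round-robin behaviour described above (fixed time unit, cycling in arrival order) can be simulated in a few lines of C. This is a simplified sketch that assumes all processes arrive at time 0; the function name and the capacity limit are illustrative:

```c
/* Round-robin simulation: each process receives a fixed time unit
   (quantum) in arrival order, cycling until all bursts are consumed.
   Returns the elapsed time at which process `who` completes, or -1. */
int rr_finish_time(const int *burst, int n, int quantum, int who) {
    int remain[16];                   /* sketch limit on process count */
    for (int i = 0; i < n; ++i) remain[i] = burst[i];
    int t = 0;
    for (;;) {
        int active = 0;
        for (int i = 0; i < n; ++i) {
            if (remain[i] <= 0) continue;
            active = 1;
            int slice = remain[i] < quantum ? remain[i] : quantum;
            t += slice;               /* this process holds the CPU for one slice */
            remain[i] -= slice;
            if (remain[i] == 0 && i == who) return t;
        }
        if (!active) return -1;       /* `who` never had work to do */
    }
}
```

With bursts {3, 5, 2} and a quantum of 2, the shortest job (process 2) finishes first at t=6, illustrating why shorter jobs complete faster than in FCFS.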
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups For example a common division is made between foreground (interactive) processes and background (batch) processes These two types of processes have different response-time requirements and so may have different scheduling needs It is very useful for shared-memory problems
Overview

Scheduling algorithm          CPU overhead   Throughput   Turnaround time   Response time
First in first out            Low            Low          High              High
Shortest job first            Medium         High         Medium            Medium
Priority-based scheduling     Medium         Low          High              High
Round-robin scheduling        High           Medium       Medium            High
Multilevel queue scheduling   High           High         Medium            Medium
HOST AND TARGET MACHINES
The PC or workstation on which the software for embedded systems gets developed is called host The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as target The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
Major portion of software debugging is done by compiling and executing the code on PC or work station
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued The advantage of the breakpoint is that it does not require using exceptions or external devices
LEDs also play an important role in debugging LEDs can be used to show error conditions, when the code enters certain routines, or to show idle-time activity
Microprocessor in-circuit emulator ICE is a specialized hardware tool that can help debug software in a working embedded system Its main drawback is the machine is specific to a particular microprocessor
Logic analyzer is another but a major piece of instrumentation useful for debugging It can sample or analyze several different signals simultaneously but display only 0 or 1 or changing values for each
Hardwaresoftware co-verification it allows hardware and software designs to be validated at the same time against each other The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories Having two memories with separate ports provides higher memory bandwidth Most DSPs use a Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles
leading to unsuitability for high performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]        ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture However the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode( with the lowest order byte residing in the low-order bits of the word) or big- endian mode( with the lowest order byte stored in the highest bits of the word)
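The byte addressing and the two endian modes described above can be made concrete with a short C sketch (the helper names are illustrative). Word n lives at byte address 4n, and the same four bytes assemble into different 32-bit values depending on where the low-order byte resides:

```c
#include <stdint.h>

/* Word addresses step by 4 bytes: word 0 at byte address 0, word 1 at 4... */
uint32_t word_address(uint32_t word_index) { return word_index * 4; }

/* Assemble a word little-endian style: lowest-order byte at the lowest
   address, as most ARM parts are configured */
uint32_t word_from_le_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Same four bytes assembled big-endian style: lowest-order byte at the
   highest address */
uint32_t word_from_be_bytes(const uint8_t b[4]) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

The bytes 78 56 34 12 (in address order) read as 0x12345678 in little-endian mode but 0x78563412 in big-endian mode, which is why the mode must be fixed at power-up.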
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units an ALU, a multiplier and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY) and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed point values In saturation arithmetic an overflow results in the maximum-range value not the result of wrapping around the numeric range Saturation mode is controlled by the ALUSAT bit in the MODEI register
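Saturation arithmetic as described above can be sketched in portable C (this is an illustration of the behaviour, not the SHARC's actual datapath): on overflow the result clamps to the maximum-range value instead of wrapping around.

```c
#include <stdint.h>

/* Saturating 32-bit fixed-point addition, in the spirit of the SHARC's
   ALUSAT mode: overflow clamps to the range limit rather than wrapping */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + b;        /* widen so the true sum is exact */
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```

Clamping matters in signal processing because a wrapped overflow flips the sign of a large sample, producing a loud artifact, while a saturated value merely clips.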
The multiplier performs fixed-point and floating-point multiplication It can also perform saturation and rounding, and can set the result to 0 Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift Shift operations set the SZ (shifter zero), SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register
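The logical-versus-arithmetic distinction and the signed shift distance can be illustrated in C (a behavioural sketch, not SHARC code; the arithmetic right shift is written portably rather than relying on C's implementation-defined `>>` for negative values):

```c
#include <stdint.h>

/* Logical shift right: vacated bits are filled with zeroes */
uint32_t lsr(uint32_t x, int n) { return x >> n; }

/* Arithmetic shift right: the sign bit is copied into the vacated bits */
int32_t asr(int32_t x, int n) {
    if (x >= 0) return x >> n;
    return (int32_t)~(~(uint32_t)x >> n);   /* portable sign-extending shift */
}

/* Signed distance, like the SHARC's Ry operand: positive shifts left,
   negative shifts right (arithmetically here) */
int32_t shift_by(int32_t x, int dist) {
    return dist >= 0 ? (int32_t)((uint32_t)x << dist) : asr(x, -dist);
}
```

For example, logically shifting 0x80000000 right by 4 gives 0x08000000, while arithmetically shifting -8 right by 1 gives -4 because the sign bit is replicated.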
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them The SHARC supplies two special register sets that are used to control loading and storing They are called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
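The post-modify-with-update mode and circular-buffer support described above can be sketched together in C (a behavioural illustration with made-up names, not the SHARC's register file): the I register supplies the address, is then updated by the modifier M, and with a base and length configured the update wraps around the buffer.

```c
/* Sketch of a DAG register set: I supplies the address, M post-modifies
   it, and base/length (when length > 0) define a circular buffer */
typedef struct {
    int i;        /* index (I) register  */
    int m;        /* modify (M) register */
    int base;     /* circular-buffer base address */
    int length;   /* circular-buffer length; 0 means no wrap */
} dag_t;

/* Post-modify with update: return the address to use for this access,
   then update I by M, wrapping into [base, base+length) if circular */
int dag_post_modify(dag_t *d) {
    int addr = d->i;
    d->i += d->m;
    if (d->length > 0) {          /* wrap in either direction */
        while (d->i >= d->base + d->length) d->i -= d->length;
        while (d->i < d->base) d->i += d->length;
    }
    return addr;
}
```

Sweeping a buffer at addresses 8..15 with a modifier of 3 visits 10, 13, then wraps from 16 back to 8, which is exactly the access pattern a signal-processing filter wants for its delay line.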
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred When an interrupt occurs the CPU saves the contents of the registers and jumps to the ISR After the ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel In a preemptive kernel the highest priority task gets executed In an RTOS the interrupt latency, interrupt response time and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled plus the time to start the execution of the first instruction in the ISR
Interrupt response time Time between receipt of interrupt signal and starting the code that handles the interrupt In a preemptive kernel it is equal to the sum of interrupt latency plus time to save CPU registers context
Interrupt recovery time The time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest priority task (preemptive kernel) In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction In a preemptive kernel it is equal to the sum of the time to check whether a high priority task is ready, the time to restore the CPU context of the highest priority task and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
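The create/acquire/release calls listed above can be sketched as a minimal counting semaphore in C. This is an illustration, not a real RTOS API: the names are made up, acquire fails instead of blocking the caller, and on real hardware the test-and-decrement would have to be atomic (for example, performed with interrupts disabled).

```c
/* Minimal counting-semaphore sketch (illustrative names, non-blocking) */
typedef struct { int count; } sem_sketch_t;

void sem_create(sem_sketch_t *s, int initial) { s->count = initial; }

/* Try to acquire: returns 1 and decrements if the count is positive,
   else returns 0 (a real RTOS would block the calling task instead) */
int sem_acquire(sem_sketch_t *s) {
    /* on real hardware this test-and-decrement must be atomic */
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: increments the count; a real RTOS would also wake the
   highest-priority task waiting on the semaphore */
void sem_release(sem_sketch_t *s) { s->count++; }
```

Created with an initial count of 1 it behaves as a binary semaphore guarding a single resource; a larger initial count models a pool of identical resources.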
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes At the time of creating a queue it is given a name or ID, a queue length, a sending-task waiting list and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
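The array-of-mailboxes idea can be sketched as a fixed-length ring buffer in C, with a deposit call for tasks or ISRs and a FIFO retrieve call. The names, the message type and the queue length are illustrative, and a real RTOS would block or wake tasks instead of returning error codes:

```c
/* Message queue sketch: MQ_LEN "mailboxes" used as a ring buffer */
#define MQ_LEN 4

typedef struct {
    int msgs[MQ_LEN];
    int head, tail, count;   /* read index, write index, messages queued */
} mq_t;

void mq_create(mq_t *q) { q->head = q->tail = q->count = 0; }

/* Deposit a message (from a task or an ISR): 0 on success, -1 if full */
int mq_send(mq_t *q, int msg) {
    if (q->count == MQ_LEN) return -1;
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % MQ_LEN;   /* advance write index, wrapping */
    q->count++;
    return 0;
}

/* Retrieve the oldest message in FIFO order: 0 on success, -1 if empty */
int mq_receive(mq_t *q, int *msg) {
    if (q->count == 0) return -1;
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % MQ_LEN;   /* advance read index, wrapping */
    q->count--;
    return 0;
}
```

In the keyboard example above, the key-press ISR would call mq_send and the input-handling task would call mq_receive, decoupling the two.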
PIPES
A pipe is a kernel object for inter-task communication It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields All the ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been
introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15: the program counter, but it can be manipulated as a general purpose register
R13: used as a stack pointer
R14: it has some special significance and is called the link register When a procedure call is made the return address is automatically stored into this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR can not be
written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM
instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o THUMB is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
RTOS it self uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is task
Under most RTOS a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack
Most RTOS allows as many tasks as we need
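The task-starting call described above can be sketched as follows. This is a minimal stand-in, not any particular RTOS's API: the names `task_create`, `task_t`, and the field layout are all hypothetical, and a real RTOS would also set up the task's stack frame and scheduling state.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 8

typedef void (*task_entry_t)(void);

/* Illustrative task control block: the parameters the text says we hand
   to the RTOS when starting a task. */
typedef struct {
    task_entry_t entry;      /* subroutine that is the task's starting point */
    int          priority;   /* task's priority (convention: chosen by app)  */
    uint8_t     *stack;      /* memory location for the task's stack         */
    size_t       stack_size;
} task_t;

static task_t task_table[MAX_TASKS];
static int    task_count = 0;

/* Example task body, used below. */
static void idle_task(void) { /* would loop forever in a real system */ }

/* Hypothetical creation call: records the starting subroutine and its
   parameters. Returns a task id, or -1 when the table is full. */
int task_create(task_entry_t entry, int priority,
                uint8_t *stack, size_t stack_size)
{
    if (task_count >= MAX_TASKS)
        return -1;
    task_table[task_count] = (task_t){ entry, priority, stack, stack_size };
    return task_count++;
}
```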
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data, including global, static (initialized and uninitialized) data and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared data problem
Shared data problem it is one that arises when an interrupt routine and task code (or two task codes) share data and the task code uses the data in a way that is not atomic
Atomic section a part of a program that cannot be interrupted
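The shared data problem can be demonstrated with a small simulation. Here a two-halves time value is shared between a (simulated) timer interrupt and task code; if the interrupt fires between the task's two reads, the task sees a value that never existed. The variable names are illustrative only.

```c
#include <stdint.h>

/* Two halves of a shared time value. Together they should always be
   read as one consistent 32-bit quantity. */
static volatile uint16_t time_lo = 0xFFFF;
static volatile uint16_t time_hi = 0x0000;

/* Simulated interrupt routine: rolls the counter over to 0x0001_0000. */
void timer_isr(void)
{
    time_lo = 0x0000;
    time_hi = 0x0001;
}

/* Non-atomic task-code read. The flag stands in for the ISR firing at
   the worst possible moment, between the two loads. */
uint32_t read_time_unprotected(int isr_fires_between_reads)
{
    uint16_t lo = time_lo;           /* first load */
    if (isr_fires_between_reads)
        timer_isr();                 /* interrupt strikes here */
    uint16_t hi = time_hi;           /* second load */
    return ((uint32_t)hi << 16) | lo;
}
```

With the interrupt in the middle, the task reads 0x0001FFFF, which is neither the old value (0x0000FFFF) nor the new one (0x00010000): exactly the non-atomic bug described above.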
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system, only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked It means this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state Once a task is blocked, it never gets the microprocessor, so an interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler
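The scheduler's core decision, pick the highest-priority unblocked task and put it in the running state, can be sketched in a few lines of C. This is a toy model with made-up names, not any real RTOS's scheduler.

```c
#define NTASKS 4

typedef enum { BLOCKED, READY, RUNNING } state_t;

typedef struct {
    state_t state;
    int     priority;   /* larger number = more urgent (a convention) */
} tcb_t;

/* Example task set: task 1 has the highest priority but is blocked,
   so it must never be chosen. */
static tcb_t tasks[NTASKS] = {
    { READY,   1 },
    { BLOCKED, 9 },
    { READY,   5 },
    { BLOCKED, 3 },
};

/* Among unblocked tasks, the highest-priority one gets the processor.
   Returns its index, or -1 when every task is blocked (a real RTOS
   would then spin in a tight loop waiting for an interrupt). */
int schedule(void)
{
    int best = -1;
    for (int i = 0; i < NTASKS; i++) {
        if (tasks[i].state == BLOCKED)
            continue;
        tasks[i].state = READY;             /* demote all runnables... */
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;
    }
    if (best >= 0)
        tasks[best].state = RUNNING;        /* ...then crown the winner */
    return best;
}
```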
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that those events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSs time-slice between such tasks. In others, one task runs until it goes to the blocked state before the processor is given to the other
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements Analyzes and determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the design and integrates the pieces
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where the design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major constraint
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system gets developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross compiler is a compiler that runs on one type of machine but generates code for another After compilation the executable code is downloaded to the embedded system by a serial link or burned in a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
Major portion of software debugging is done by compiling and executing the code on PC or work station
Serial port available on the evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. An advantage of breakpoints is that they do not require using exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
The logic analyzer is another major piece of instrumentation useful for debugging. It can sample several different signals simultaneously but can display only 0, 1, or changing values for each
Hardwaresoftware co-verification it allows hardware and software designs to be validated at the same time against each other The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardwaresoftware co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks such as string searching they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles
leading to unsuitability for high performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
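The byte-addressing and endianness rules above can be expressed directly in C (a behavioral illustration, not ARM code; the function names are made up):

```c
#include <stdint.h>

/* ARM addresses bytes, not words: word n lives at byte address 4*n. */
uint32_t word_address(uint32_t word_index) { return word_index * 4u; }

/* Byte i (0..3) of a 32-bit word under each byte order.
   Little-endian: byte 0 is the lowest-order 8 bits of the word.
   Big-endian:    byte 0 is the highest-order 8 bits. */
uint8_t byte_little_endian(uint32_t word, int i)
{
    return (uint8_t)(word >> (8 * i));
}
uint8_t byte_big_endian(uint32_t word, int i)
{
    return (uint8_t)(word >> (8 * (3 - i)));
}
```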
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory. The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most important mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
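Saturation versus wraparound is easy to show in C. The sketch below models the behavior the text describes (it is not SHARC code); computing the exact sum in 64 bits and then clamping is one simple way to implement it.

```c
#include <stdint.h>

/* Saturating 32-bit add: on overflow, clamp to the end of the numeric
   range instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + b;           /* exact result in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;  /* clamp positive overflow */
    if (wide < INT32_MIN) return INT32_MIN;  /* clamp negative overflow */
    return (int32_t)wide;
}
```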
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
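The logical-versus-arithmetic distinction can be made concrete in C. Since right-shifting a signed value is implementation-defined in C, the arithmetic shift is written out explicitly (function names are illustrative):

```c
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeroes. */
uint32_t lshift_right(uint32_t x, int n) { return x >> n; }

/* Arithmetic right shift: vacated high bits are copies of the sign bit. */
uint32_t ashift_right(uint32_t x, int n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)             /* negative: replicate the sign bit */
        r |= ~(0xFFFFFFFFu >> n);    /* set the top n bits */
    return r;
}
```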
The SHARC is a load-store architecture: operands must be loaded into registers before being operated on. The SHARC supplies two special register sets used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle; a DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
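Post-modify-with-update and the circular-buffer mode can be sketched together as a C behavioral model (not SHARC code; the struct fields mirror the I register, a base B, and a length L from the text):

```c
/* Behavioral model of one DAG address register with circular wraparound:
   the I register supplies the address, then is advanced by the modifier
   M, wrapping within the buffer [base, base+length). */
typedef struct {
    int base;     /* B: start of the circular buffer   */
    int length;   /* L: buffer length in words         */
    int index;    /* I: current address register value */
} dag_t;

/* Example: a 4-word circular buffer at address 100, I starting at 102. */
static dag_t dag = { 100, 4, 102 };

/* Post-modify with update: return the address to access now, then
   advance I by M circularly. */
int post_modify(dag_t *d, int m)
{
    int addr = d->index;                       /* use I first          */
    int off  = addr + m - d->base;             /* offset from the base */
    d->index = d->base +
        ((off % d->length + d->length) % d->length);  /* wrap into [B, B+L) */
    return addr;
}
```

Sweeping with M = 1 visits 102, 103, and then wraps back to 100, which is exactly the behavior a circular buffer needs in signal-processing loops.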
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a high-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
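The timing definitions above are simple sums, which can be written out as arithmetic. The cycle counts used in the example are purely illustrative, not figures for any particular CPU.

```c
/* Interrupt response time (preemptive kernel):
   latency + time to save the CPU register context. */
int response_time_preemptive(int latency, int save_context)
{
    return latency + save_context;
}

/* Interrupt recovery time (non-preemptive kernel):
   restore context + execute the return-from-interrupt instruction. */
int recovery_time_nonpreemptive(int restore_context, int ret_from_int)
{
    return restore_context + ret_from_int;
}

/* Interrupt recovery time (preemptive kernel): additionally includes the
   check for a ready high-priority task. */
int recovery_time_preemptive(int check_ready, int restore_context,
                             int ret_from_int)
{
    return check_ready + restore_context + ret_from_int;
}
```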
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
At its core it is just an integer. Semaphores are of two types: the counting semaphore, whose value can be greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
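The create/acquire/release calls listed above can be sketched as a toy counting semaphore. This is a single-threaded model with made-up names, not a real RTOS API: a real acquire would block the calling task instead of returning failure, and the operations would be made atomic.

```c
/* A counting semaphore reduced to its essentials: an integer plus
   acquire/release. A binary semaphore is the special case whose count
   only ever holds 0 or 1. */
typedef struct { int count; } rtos_sem_t;

static rtos_sem_t uart_lock;    /* e.g. a binary semaphore guarding a UART */

/* "Create": set the initial count. */
void sem_init(rtos_sem_t *s, int initial) { s->count = initial; }

/* "Acquire": take the semaphore if available. Returns 1 on success,
   0 where a real RTOS would block the calling task. */
int sem_acquire(rtos_sem_t *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* "Release": give the semaphore back; a real RTOS would also wake one
   waiting task. */
void sem_release(rtos_sem_t *s) { s->count++; }
```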
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue takes the message
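The "array of mailboxes" view maps naturally onto a circular buffer. The sketch below (illustrative names, fixed length, no blocking or waiting lists) shows a keyboard-style use: an ISR posts keystrokes, a task drains them in FIFO order.

```c
#include <stdint.h>

#define QUEUE_LEN 4     /* queue length, fixed at creation time */

/* A message queue as a circular array of mailboxes. */
typedef struct {
    int32_t slots[QUEUE_LEN];
    int     head, tail, count;
} msgq_t;

static msgq_t  keypad_q;   /* e.g. keystrokes deposited by a keyboard ISR */
static int32_t rx;         /* scratch variable for the receive examples   */

/* Deposit a message at the tail; returns 0 when the queue is full. */
int msgq_send(msgq_t *q, int32_t msg)
{
    if (q->count == QUEUE_LEN) return 0;
    q->slots[q->tail] = msg;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    return 1;
}

/* Take the oldest message from the head; returns 0 when empty. */
int msgq_recv(msgq_t *q, int32_t *msg)
{
    if (q->count == 0) return 0;
    *msg = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 1;
}
```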
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
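The read/write calls above can be sketched as an unstructured byte stream between a writer task and a reader task. Unlike a message queue, a pipe has no message boundaries; the names and fixed capacity below are illustrative, and a real RTOS pipe would block when full or empty.

```c
#include <stddef.h>

#define PIPE_CAP 16

/* A pipe as a byte-stream ring buffer between two tasks. */
typedef struct {
    char   buf[PIPE_CAP];
    size_t head, tail, used;
} pipe_t;

static pipe_t log_pipe;    /* e.g. one task's output fed to another */
static char   rxbuf[8];    /* scratch buffer for the reader side    */

/* Write up to n bytes; returns how many were actually accepted. */
size_t pipe_write(pipe_t *p, const char *src, size_t n)
{
    size_t written = 0;
    while (written < n && p->used < PIPE_CAP) {
        p->buf[p->tail] = src[written++];
        p->tail = (p->tail + 1) % PIPE_CAP;
        p->used++;
    }
    return written;
}

/* Read up to n bytes in order; returns how many were delivered. */
size_t pipe_read(pipe_t *p, char *dst, size_t n)
{
    size_t nread = 0;
    while (nread < n && p->used > 0) {
        dst[nread++] = p->buf[p->head];
        p->head = (p->head + 1) % PIPE_CAP;
        p->used--;
    }
    return nread;
}
```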
INSTRUCTION SET ARCHITECTURE ISA
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Loadstore architecture the instructions that process data operate only on the registers and are
separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed length fields All the ARM instructions are 32-bit long and most of them have
a regular three operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been
introduced
o Conditional execution of instructions has been introduced Instruction opcodes is preceded by a
4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR can not be
written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM
instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jump is always with in a limit of plusmnMB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered running BXBLX (Branch
Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally excepting the branch
instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15
A reduced no of instructions can access the full register set
o No MSR and MRS instructions
o Maximum no of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16-bit However ARM is faster
when the memory is organized in 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task along with some other parameters such as the task's priority and the memory location for the task's stack
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each task has its own private context, which includes the registers, a program counter and a stack. All other data, which includes global, static, initialized, uninitialized and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks it is easy to move data from one task to another. However, sharing of data creates bugs, leading to the shared-data problem
Shared-data problem it is one that arises when an interrupt routine and the task code (or two or more tasks) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
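The shared-data problem can be simulated in plain C. The "interrupt" here is an ordinary function call so the sketch runs anywhere, and disabling interrupts is modeled with a flag; all names are illustrative, not a real RTOS API:

```c
#include <stdint.h>

/* Shared data written by a (simulated) timer interrupt routine. */
static volatile uint32_t g_seconds = 0;
static volatile uint32_t g_millis  = 0;

static int g_interrupts_enabled = 1;
static int g_isr_pending = 0;

/* Hypothetical timer ISR: rolls milliseconds over into seconds. */
static void timer_isr(void)
{
    g_seconds += 1;
    g_millis = 0;
}

/* Simulated interrupt arrival: runs now, or is held pending while disabled. */
static void interrupt_arrives(void)
{
    if (g_interrupts_enabled) timer_isr();
    else g_isr_pending = 1;
}

/* Non-atomic read: the interrupt may fire between the two loads, yielding a
   seconds/milliseconds pair that never existed. */
static void read_time_unprotected(uint32_t *s, uint32_t *ms)
{
    *s = g_seconds;
    interrupt_arrives();             /* interrupt happens mid-read */
    *ms = g_millis;
}

/* Atomic section: interrupts stay disabled across both loads. */
static void read_time_atomic(uint32_t *s, uint32_t *ms)
{
    g_interrupts_enabled = 0;        /* enter atomic section */
    *s = g_seconds;
    interrupt_arrives();             /* held off until later */
    *ms = g_millis;
    g_interrupts_enabled = 1;        /* leave atomic section */
    if (g_isr_pending) { g_isr_pending = 0; timer_isr(); }
}
```

If the time is 5.999 when the interrupt strikes mid-read, the unprotected reader sees 5.000, a time that never existed; the atomic version sees a consistent pair.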
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
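The difference can be seen in a small C sketch: the first function keeps its result in a static buffer shared by all callers, so a task switch between two callers corrupts the result, while the reentrant version keeps all state in caller-owned memory (function names are illustrative):

```c
#include <stdio.h>
#include <stddef.h>

/* NON-reentrant: the static buffer is shared by every caller, so a task
   switch between two callers clobbers the earlier result. */
static char *itoa_nonreentrant(int value)
{
    static char buf[12];
    snprintf(buf, sizeof buf, "%d", value);
    return buf;
}

/* Reentrant: all state lives in memory owned by the caller (normally the
   calling task's stack), so any number of tasks may run it concurrently. */
static char *itoa_reentrant(int value, char *buf, size_t len)
{
    snprintf(buf, len, "%d", value);
    return buf;
}
```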
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time
Ready It means some other task is in the running state, but this task has things that it could do if the microprocessor becomes available. Any number of tasks can be in this state
Blocked It means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state
Tasks and interrupt routines unblock tasks When a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever
Scheduler controls the running state Scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it blocks before the other one gets the processor
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks
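The scheduler's core decision described above can be sketched in C. This is an illustration only: the type names and the higher-number-means-higher-priority convention are assumptions, not any particular RTOS's API:

```c
#include <stddef.h>

typedef enum { TASK_BLOCKED, TASK_READY, TASK_RUNNING } task_state_t;

typedef struct {
    task_state_t state;
    int priority;            /* convention here: larger number = higher priority */
} task_t;

/* Pick the highest-priority task that is not blocked. Returns its index,
   or -1 when every task is blocked (a real RTOS would then idle-loop). */
static int schedule(const task_t *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state == TASK_BLOCKED)
            continue;
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = (int)i;
    }
    return best;
}
```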
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create an architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers bugs
o Maintenance entails deployment in the field, bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic model of the design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement
Its advantage is that it adopts a successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
HOST AND TARGET MACHINES
The PC or workstation on which the software for an embedded system is developed is called the host. The host should be able to do the following
o Load programs into the target
o Start and stop program execution on the target
o Examine memory and CPU registers
The hardware on which the code will finally run is known as the target. The target usually has a small amount of software to talk to the host system
LINKERSLOCATERS FOR EMBEDDED SOFTWARE
A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system over a serial link or burned into a PROM and plugged in
GETTING EMBEDDED SOFTWARE INTO TARGET SYSTEM
DEBUGGING TECHNIQUES
A major portion of software debugging is done by compiling and executing the code on a PC or workstation
The serial port available on evaluation boards is one of the most important debugging tools
Another important debugging tool is the breakpoint. The simplest form of breakpoint is for the user to specify an address at which the program's execution is to break. When the PC reaches that address, control is returned to the monitor program, and from this program the user can examine and/or modify CPU registers, after which execution can be continued. The advantage of breakpoints is that they do not require the use of exceptions or external devices
LEDs also play an important role in debugging. LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity
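A minimal sketch of LED-based debugging, assuming a memory-mapped GPIO output register. On real hardware LED_REG would be a fixed address from the board's datasheet; here an ordinary variable stands in for the register (and the bit position is an assumption) so the sketch runs anywhere:

```c
#include <stdint.h>

/* On real hardware this would be something like
   #define LED_REG (*(volatile uint32_t *)0x40020014)  -- address hypothetical.
   A plain variable stands in for the register in this sketch. */
static volatile uint32_t fake_gpio_odr;
#define LED_REG   (fake_gpio_odr)
#define ERROR_LED (1u << 5)      /* bit position is an assumption */

static void led_on(uint32_t mask)  { LED_REG |=  mask; }
static void led_off(uint32_t mask) { LED_REG &= ~mask; }

/* Typical debug use: light the LED when a suspect routine hits an error path. */
static void suspect_routine(int error_detected)
{
    if (error_detected) led_on(ERROR_LED);
    else                led_off(ERROR_LED);
}
```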
A microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. Its main drawback is that the machine is specific to a particular microprocessor
A logic analyzer is another major piece of instrumentation useful for debugging. It can sample or analyze several different signals simultaneously, but displays only 0, 1 or changing values for each
Hardware/software co-verification allows hardware and software designs to be validated at the same time, against each other. The types of techniques available are
o An instruction level simulator may be used to debug code running on the
CPU
o A cycle level simulator tool may be used for faster simulation of parts of
the system
o A hardware/software co-simulator may be used to simulate various parts
of the system at different levels of detail
Debugging challenges Logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones and many other systems, whereas the SHARC is a well-known digital signal processor (DSP)
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories. Having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths
o CISC architecture was developed to reduce the number of instructions in the
compiled code which would in turn minimize the number of memory accesses required for fetching instructions Complex instruction sets were useful when the memory was relatively small and slow and when programmers frequently worked in assembly language
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8 and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
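The two byte orderings can be illustrated with a small C sketch that extracts byte n of a 32-bit word under each convention (byte 0 being the byte at the word's lowest address); this models the layout only and assumes nothing about the host machine's own endianness:

```c
#include <stdint.h>

/* Little-endian: byte 0 (lowest address) holds the low-order bits. */
static uint8_t byte_of_word_le(uint32_t w, int n)
{
    return (uint8_t)(w >> (8 * n));
}

/* Big-endian: byte 0 (lowest address) holds the high-order bits. */
static uint8_t byte_of_word_be(uint32_t w, int n)
{
    return (uint8_t)(w >> (8 * (3 - n)));
}
```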
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units: an ALU, a multiplier and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY) and mode 1 (MODE1)
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry) and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there
The shifter performs several operations. Logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow) and SS (shifter input sign) bits in the ASTAT register
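The saturation and shift behaviors described above can be sketched in portable C. This models the arithmetic only, not SHARC instructions; the arithmetic right shift is written portably because `>>` on negative ints is implementation-defined in C:

```c
#include <stdint.h>

/* Saturating 32-bit add: on overflow the result clamps to the end of the
   range instead of wrapping (what the SHARC does with ALUSAT set). */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so overflow is visible */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Logical shift right: vacated bits fill with zeroes. */
static uint32_t lsr32(uint32_t x, unsigned n) { return x >> n; }

/* Arithmetic shift right: vacated bits copy the sign bit. */
static int32_t asr32(int32_t x, unsigned n)
{
    if (x >= 0) return x >> n;
    return (int32_t)~(~(uint32_t)x >> n);    /* replicate the sign bit */
}
```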
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers used to control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory. This allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
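Two of the addressing modes above can be modeled in C. This is a behavioral simulation, not SHARC code: ordinary variables stand in for the I and M registers:

```c
#include <stdint.h>
#include <stddef.h>

/* Post-modify with update: the I register supplies the address, then is
   advanced by the modifier M, so a loop sweeps through a buffer with no
   separate pointer arithmetic. */
static int32_t dm_post_modify(const int32_t *mem, size_t *i_reg, ptrdiff_t m)
{
    int32_t value = mem[*i_reg];                 /* fetch at address I ... */
    *i_reg = (size_t)((ptrdiff_t)*i_reg + m);    /* ... then I = I + M    */
    return value;
}

/* Bit-reversed addressing as used by the radix-2 FFT: reverse the low
   `bits` bits of an index (for an 8-point FFT, bits = 3). */
static uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);
        index >>= 1;
    }
    return r;
}
```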
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of the registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority task gets executed. In an RTOS, the interrupt latency, interrupt response time and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority task (preemptive kernel). In a non-preemptive kernel it is equal to the time to restore the CPU context plus the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the time to check whether a high-priority task is ready, plus the time to restore the CPU context of the highest-priority task, plus the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer. Semaphores are of two types: the counting semaphore, which can take integer values greater than 1, and the binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
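The semaphore calls listed above can be sketched as a minimal counting semaphore in C. Function names are illustrative, not a specific RTOS API, and for simplicity acquire simply fails when the count is zero; a real RTOS would move the calling task to the blocked state instead:

```c
/* Minimal counting-semaphore sketch. */
typedef struct { int count; } semaphore_t;

/* Create: set the initial count (1 gives binary-semaphore behavior). */
static void semaphore_create(semaphore_t *s, int initial) { s->count = initial; }

/* Acquire: returns 1 on success, 0 when the caller would have to block. */
static int semaphore_acquire(semaphore_t *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

/* Release: makes the resource available again (may wake a blocked task). */
static void semaphore_release(semaphore_t *s) { s->count++; }
```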
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list and a waiting-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, the highest-priority task or the first task waiting in the queue can take the message
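A message queue of the kind described behaves like the ring buffer below (an illustrative sketch: QUEUE_LEN stands in for the length given at creation, and the waiting lists are omitted):

```c
#include <stdint.h>

#define QUEUE_LEN 4   /* queue length fixed at creation time (an assumption) */

typedef struct {
    int32_t  msg[QUEUE_LEN];
    unsigned head, tail, count;
} msg_queue_t;

/* A task or ISR deposits a message; returns 0 when the queue is full. */
static int queue_send(msg_queue_t *q, int32_t m)
{
    if (q->count == QUEUE_LEN) return 0;
    q->msg[q->tail] = m;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    return 1;
}

/* The receiving task takes the oldest message; returns 0 when empty. */
static int queue_receive(msg_queue_t *q, int32_t *out)
{
    if (q->count == 0) return 0;
    *out = q->msg[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 1;
}
```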
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
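On a POSIX host the same idea can be tried with the standard pipe() call; this is an analogy to the RTOS pipe object, with both ends used in one process purely for illustration:

```c
#include <unistd.h>
#include <string.h>

/* Write a message into one end of a pipe and read it back from the other.
   Returns the number of bytes read, or -1 on error. */
static ssize_t pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int fds[2];                     /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0) return -1;

    ssize_t w = write(fds[1], msg, strlen(msg));   /* "producer task" */
    if (w < 0) { close(fds[0]); close(fds[1]); return -1; }

    ssize_t n = read(fds[0], buf, buflen);         /* "consumer task" */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```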
INSTRUCTION SET ARCHITECTURE ISA
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instruction fields All ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general-purpose register
R13----------- it is used as the stack pointer
R14----------- it has special significance and is called the link register When a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run application code Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ) supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC) it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF) it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations such as addition, subtraction and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions: single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction The SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered running BXBLX (Branch
Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally excepting the branch
instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15
A reduced no of instructions can access the full register set
o No MSR and MRS instructions
o Maximum no of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16-bit However ARM is faster
when the memory is organized in 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
RTOS it self uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is task
Under most RTOS a task is simply a subroutine
At some point in program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine the starting point for each task and some other parameters like taskrsquos priority memory location for task stack etc
Most RTOS allows as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context which includes the register variable a program counter and a stack All other data which includes Global static initialized un-initialized and everything else is shared among all the tasks in the system
Since several data variable are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs leading to shared data problem
Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic
Atomic section is a part of program which can not be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time
Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state
Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which one task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal that brings the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSes time-slice between the tasks. In others, one task is run until it goes to the blocked state before the processor is given to the other.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
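The scheduling rules above can be sketched in a few lines of C (the task table and names are invented; a higher number means a higher priority):

```c
#include <assert.h>

enum state { BLOCKED, READY, RUNNING };
struct task { int priority; enum state st; };

/* The scheduler's whole job: hand the CPU to the highest-priority task
   that is not blocked.  Returns -1 when every task is blocked, in which
   case a real RTOS would spin in an idle loop waiting for an interrupt. */
static int schedule(const struct task *t, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (t[i].st == BLOCKED) continue;   /* blocked tasks never run */
        if (best < 0 || t[i].priority > t[best].priority)
            best = i;
    }
    return best;
}
```

In a preemptive RTOS this decision is re-made the moment any task unblocks; in a non-preemptive one, only when the running task blocks.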
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT.
o Non-functional: a non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on.
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality.
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases:
o Requirements analysis: determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the pieces and integrates them
o Testing: uncovers the bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles, over the top of the spiral, are very small and short, while the final cycles, at the spiral's bottom, add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage: with too many spirals it may take too long when design time is a major requirement.
Its advantage: it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
Hardware/software co-verification allows hardware and software designs to be validated at the same time against each other. The types of techniques available are:
o An instruction-level simulator may be used to debug code running on the CPU
o A cycle-level simulator tool may be used for faster simulation of parts of the system
o A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail
Debugging challenges: logical errors in the software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
UNIT-VI INSTRUCTION SETS
INTRODUCTION PRELIMINARIES
Instruction sets are the programmer's interface to the hardware.
The ARM processor is widely used in personal digital assistants (PDAs), video games, telephones, and many other systems, whereas the SHARC is a well-known digital signal processor (DSP).
A von Neumann machine is one whose memory holds both data and instructions, whereas in a Harvard architecture there exist separate memories for data and program. The Harvard architecture is widely used today for its higher performance, which is a result of the separation of program and data memories: having two memories with separate ports provides higher memory bandwidth. Most DSPs use the Harvard architecture.
Complex instruction set computers (CISC) provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.
o CISC architecture was developed to reduce the number of instructions in the compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. Complex instruction sets were useful when memory was relatively small and slow and when programmers frequently worked in assembly language.
o Drawbacks include difficulty in instruction pipelining and longer clock cycles, leading to unsuitability for high-performance processors.
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions. The instructions were also chosen so that they could be efficiently executed in pipelined processors.
o RISC architecture is optimized to achieve short clock cycles, small numbers of cycles per instruction, and efficient pipelining of instruction streams.
o It requires a more sophisticated compiler, and the compiler needs to use a sequence of RISC instructions in order to implement complex operations.
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line, starting after the first column.
A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.
Comments begin with a semicolon and continue until the end of the line.
Example:
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Some versions of the ARM architecture, like the ARM7, are von Neumann machines, whereas the ARM9 uses a Harvard architecture. However, the difference is invisible to the assembly language programmer.
It supports two basic types of data
o The standard ARM word is 32 bits long
o The word may be divided into four 8-bit bytes
An address refers to a byte, not a word. Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on.
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word).
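The two byte orders can be sketched portably in C (helper names invented): each function returns the byte that would sit at byte address n of a 32-bit word under the given convention:

```c
#include <stdint.h>
#include <assert.h>

/* Little-endian: byte address 0 holds the lowest-order byte. */
static uint8_t byte_at_little(uint32_t w, int n) {
    return (uint8_t)(w >> (8 * n));
}

/* Big-endian: byte address 0 holds the highest-order byte. */
static uint8_t byte_at_big(uint32_t w, int n) {
    return (uint8_t)(w >> (8 * (3 - n)));
}
```

For the word 0x11223344, byte address 0 reads 0x44 in little-endian mode but 0x11 in big-endian mode.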
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture. It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures.
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon.
A label, which gives a name to an instruction, comes at the beginning of the line and ends with a colon.
Comments start with an exclamation point and continue until the end of the line.
Example:
R1=DM(M0,I0), R2=PM(M8,I8); ! a comment
label: R3=R1+R2;
The SHARC uses different word sizes and address space sizes for instructions and data: a SHARC instruction is 48 bits, a basic data word 32 bits, and an address 32 bits.
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended-precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names: they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations.
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register.
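The left-justified layout can be modelled with a 64-bit integer in C (a sketch, not the hardware): the 32-bit value occupies the top 32 of the 40 bits, and the low 8 bits read as zero:

```c
#include <stdint.h>
#include <assert.h>

#define REG40_MASK 0xFFFFFFFFFFull   /* forty one-bits */

/* Store a 32-bit value in the most significant bits of a 40-bit register. */
static uint64_t to_reg40(uint32_t v)   { return ((uint64_t)v << 8) & REG40_MASK; }
static uint32_t from_reg40(uint64_t r) { return (uint32_t)(r >> 8); }
```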
The CPU has three major data function units: an ALU, a multiplier, and a shifter. The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1).
All ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the ASTAT register bits but are not automatically cleared; they remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
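Saturation versus wrap-around can be shown in C (function name invented; this sketches the behaviour the ALUSAT bit selects, not the actual silicon):

```c
#include <stdint.h>
#include <assert.h>

/* Saturating 32-bit add: on overflow, clamp to the range limit instead of
   wrapping around as ordinary two's-complement addition does. */
static int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;   /* widen so the true sum fits */
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```

Clamping is usually the right behaviour for signal samples, where wrap-around would turn a loud positive peak into a loud negative one.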
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special sets of registers that are used to control loading and storing; they are called data address generators (DAGs), one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARC's modified Harvard architecture allows data to be stored in the program memory, which allows two data fetches per cycle. A DAG is provided for the program memory to support such data fetches.
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing, which is useful in the fast Fourier transform (FFT).
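Post-modify addressing over a circular buffer can be sketched in C (the struct and field names are invented, mirroring the I and M registers): the I value supplies the address for this access, is then advanced by M, and wraps around inside the buffer:

```c
#include <assert.h>

/* One DAG index/modifier pair plus circular-buffer bounds (a sketch). */
struct dag {
    int i;      /* I register: address for the next access */
    int m;      /* M register: post-modify amount          */
    int base;   /* start of the circular buffer            */
    int len;    /* length of the circular buffer           */
};

/* Return the address for this access, then post-modify I with wrap-around. */
static int dag_next(struct dag *d) {
    int addr = d->i;
    d->i += d->m;
    if (d->i >= d->base + d->len)
        d->i -= d->len;     /* circular wrap, as used in signal processing */
    return addr;
}
```

A negative M (a right-to-left sweep) would need the symmetric wrap at the lower bound; the sketch handles only the forward case.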
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects. These services are memory management, device management, interrupt handling, and time management.
The kernel objects include tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (non-preemptive kernel) or to the highest-priority ready task (preemptive kernel). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
At its core it is just an integer. Semaphores are of two types: a counting semaphore, whose value can exceed 1, and a binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
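The acquire/release pair can be sketched in C (names invented; a single-task illustration in which "would block" is reported as failure rather than actually suspending the caller):

```c
#include <stdbool.h>
#include <assert.h>

struct semaphore { int count; };   /* binary if count is only ever 0 or 1 */

/* Acquire: take the semaphore if available.  A real RTOS would move the
   calling task to the blocked state instead of returning false. */
static bool sem_acquire(struct semaphore *s) {
    if (s->count > 0) { s->count--; return true; }
    return false;
}

/* Release: give the semaphore back, potentially unblocking a waiting task. */
static void sem_release(struct semaphore *s) {
    s->count++;
}
```

A task that acquires the semaphore before touching a shared resource, and releases it afterward, keeps the access effectively atomic with respect to other tasks that follow the same protocol.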
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some applications of message queues are:
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest-priority waiting task or the first task waiting in the queue takes the message.
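The array-of-mailboxes view can be sketched as a FIFO ring in C (sizes and names invented; the waiting lists are omitted):

```c
#include <stdbool.h>
#include <assert.h>

#define QLEN 4                      /* queue length, fixed at creation */

struct msgq { int slot[QLEN]; int head, tail, count; };

/* A task or ISR deposits a message at the tail. */
static bool q_send(struct msgq *q, int msg) {
    if (q->count == QLEN) return false;     /* full: sender would wait   */
    q->slot[q->tail] = msg;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return true;
}

/* The receiving task takes the oldest message from the head. */
static bool q_receive(struct msgq *q, int *msg) {
    if (q->count == 0) return false;        /* empty: receiver would wait */
    *msg = q->slot[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return true;
}
```

In a real RTOS the full/empty cases block the caller on the corresponding waiting list instead of returning false.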
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
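On a POSIX host the same idea is available directly as pipe(): one task writes its output into one end, and another task reads it as input (the message text here is invented):

```c
#include <unistd.h>
#include <string.h>
#include <assert.h>

/* Send one task's output through a pipe to another task's input. */
static int pipe_demo(char *out, int len) {
    int fd[2];
    if (pipe(fd) != 0) return -1;          /* fd[0]=read end, fd[1]=write end */
    const char *msg = "reading:42";
    write(fd[1], msg, strlen(msg) + 1);    /* producer writes to the pipe  */
    read(fd[0], out, (size_t)len);         /* consumer reads from the pipe */
    close(fd[0]);
    close(fd[1]);
    return 0;
}
```

An RTOS pipe behaves the same way at the API level, with the read side blocking until data arrives.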
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file, with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features were introduced:
o Each instruction controls both the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that load/store up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general-purpose registers in user mode. They are:
R15: the program counter, but it can be manipulated as a general-purpose register
R13: used as the stack pointer
R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of six execution modes as follows:
o User mode: used to run application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if a fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result. The multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional.
Data transfer instructions: ARM supports two types of data transfer instructions, single-register transfers and multiple-register transfers. Single-register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple-register load/store operations can be carried out via multiple-register transfer instructions.
o ARM supports both little-endian and big-endian formats for data access.
Block data transfer: the load-multiple and store-multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (the default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor the branch instructions have the following features:
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
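ARM's conditional execution can be mimicked in C (the flag struct and function name are invented): an instruction whose condition fails simply has no effect, which removes many short branches:

```c
#include <stdbool.h>
#include <assert.h>

struct flags { bool n, z, c, v; };   /* the four CPSR condition flags */

/* ADDEQ rd, rd, rn: the add takes effect only when the Z flag is set
   (the EQ condition); otherwise the instruction behaves as a no-op. */
static int addeq(struct flags f, int rd, int rn) {
    return f.z ? rd + rn : rd;
}
```

Predicating the instruction itself avoids a conditional branch around it, which keeps the pipeline full for short if-then sequences.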
THUMB
o The instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences from ARM:
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupied
o It is faster when the memory is organized as 16-bit; however, ARM is faster when the memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next: code inside the RTOS decides which of the task-code functions should run. The RTOS knows about the task-code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task-code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSes with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSes, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters, such as the task's priority and the memory location for the task's stack.
Most RTOSes allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing of data can create bugs, leading to the shared-data problem.
Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic
Atomic section is a part of program which can not be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time
Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state
Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well
Scheduler
It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state
Tasks and IR move tasks from blocked state when a task is blocked it never gets microprocessor An IR or some other task in the system must be able to send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever
Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler
How does the scheduler know when a task has become blocked or unblocked
RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one
If a higher priority task unblocks what happens to the running task
In preemptive RTOS lower priority task is stopped as soon as an higher priority task unblocks
In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent description of the system that can be used to create architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirements states what the system must do such as compute
an FFT
o Non-functional A non-functional requirements can be any no of attributes including
physical size cost power consumption design time reliability and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o traceability
Design methodology
It refers to ldquothe sequence of steps necessary to build some thing useful The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap they also want them to be of
right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by ROYCE and it is the first model proposed for the software development process
This model has five major phases
o Requirements Analysis and determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It not suitable where design entails experimentation and changes that require bottom up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
Waterfall model assumes that the system is built once in its entirety the spiral model assumes that the several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short while the final cycles at the spirals bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is with too many spirals it may take too long when design time is a major requirement
Its advantage is It adopts successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate ldquo over-the-well rdquo design steps
Concurrent engineering effort
Cross functional teams include member from various disciplines involved in the process including manufacturing HWSW design marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimization of the chance that concurrent product realization will lead to sub-process
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets the customer's needs
o Drawbacks include difficulty in instruction pipelining and longer clock cycles
leading to unsuitability for high performance processors
Reduced instruction set computers (RISC) tended to provide somewhat fewer and simpler instructions The instructions were also chosen so that they could be efficiently executed in pipelined processors
o RISC architecture is optimized to achieve short clock cycles small numbers of
cycles per instruction and efficient pipelining of instruction streams
o It requires a more sophisticated compiler and the compiler needs to use a
sequence of RISC instructions in order to implement complex operations
ARM PROCESSOR
It is a family of RISC architectures
The instructions are written one per line starting after the first column
A label which gives a name to a memory location comes at the beginning of the line starting in the first column
Comments begin with a semicolon and continue until the end of the line
Example
LDR r0,[r8]        ; a comment
LABEL ADD r4,r0,r1
Some versions of ARM architecture like ARM7 are von Neumann architecture machines where as ARM9 uses a Harvard architecture However the difference is invisible to the assembly language programmer
It supports two basic types of data
o The standard ARM word is 32 bit long
o The word may be divided into four 8-bit bytes
An address refers to a byte not a word Word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 at 8, and so on
The processor can be configured at power-up to address the bytes in a word in either little-endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian mode (with the lowest-order byte stored in the highest bits of the word)
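The byte ordering can be illustrated with a small host-side C sketch (illustrative only, not ARM code; the function names are invented for this example). The same 32-bit word yields opposite byte sequences in memory depending on the configured mode:

```c
#include <stdint.h>

/* Return byte n (0 = lowest address) of a 32-bit word for each mode. */
static uint8_t byte_at_little(uint32_t word, int n) {
    return (uint8_t)(word >> (8 * n));        /* lowest-order byte at lowest address */
}
static uint8_t byte_at_big(uint32_t word, int n) {
    return (uint8_t)(word >> (8 * (3 - n)));  /* highest-order byte at lowest address */
}
```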
SHARC PROCESSOR
It is a family of DSPs that uses the Harvard architecture It is a good complement to the ARM CPU because the two use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the line and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0,I0), R2=PM(M8,I8);   ! a comment
LABEL: R3=R1+R2;
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction is 48 bits long, a basic data word is 32 bits, and an address is 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory (PM) and data memory (DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as F0 through F15 when used for floating-point operations
All the data registers are 40 bits long; when 32-bit data types are stored in the registers, they are put in the most significant bits of the register
The CPU has three major data function units an ALU, a multiplier, and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT), sticky (STKY), and mode 1 (MODE1)
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed-point values In saturation arithmetic an overflow results in the maximum-range value, not the result of wrapping around the numeric range Saturation mode is controlled by the ALUSAT bit in the MODE1 register
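Saturation on overflow can be sketched in C as follows (a host-side model of the behavior, not SHARC code; the function name is illustrative):

```c
#include <stdint.h>

/* Saturating 32-bit signed add: on overflow the result clamps to the
   range limit instead of wrapping, as the SHARC does when ALUSAT is set. */
static int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;   /* widen so the true sum is exact */
    if (s > INT32_MAX) return INT32_MAX;   /* clamp positive overflow */
    if (s < INT32_MIN) return INT32_MIN;   /* clamp negative overflow */
    return (int32_t)s;
}
```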
The multiplier performs fixed-point and floating-point multiplication It can also perform saturation, rounding, or setting the result to 0 Fixed-point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there
The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The shift distance, supplied by the Ry register, may be positive for a left shift or negative for a right shift Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register
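The difference between the two shift flavours, with a SHARC-style signed distance (positive shifts left, negative shifts right), can be modeled in C. This is a sketch assuming a distance in the range -31..31; the function names are illustrative:

```c
#include <stdint.h>

/* Logical shift: vacated bit positions are filled with zeroes. */
static uint32_t shift_logical(uint32_t x, int d) {
    return d >= 0 ? x << d : x >> -d;
}

/* Arithmetic shift: on a right shift the sign bit is replicated. */
static int32_t shift_arith(int32_t x, int d) {
    if (d >= 0) return (int32_t)((uint32_t)x << d);
    uint32_t u = (uint32_t)x >> -d;
    if (x < 0) u |= ~(UINT32_MAX >> -d);   /* copy the sign bit into the top */
    return (int32_t)u;
}
```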
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them SHARC supplies two special sets of registers that are used to control loading and storing They are called DATA ADDRESS GENERATORS (DAGs), one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAG1 while registers 8 through 15 belong to DAG2
The SHARC's modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
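A C model of post-modify-with-update addressing over a circular buffer (the way a DAG steps through a delay line, for instance) might look like the following; the buffer length and the names are illustrative:

```c
#include <stdint.h>

#define BUF_LEN 8                        /* circular buffer length (illustrative) */

/* b->i plays the role of the I register and m of the modifier: the access
   uses the current index, then the index is post-modified with wraparound,
   which a DAG performs in hardware at no extra cycle cost. */
typedef struct { int32_t data[BUF_LEN]; int i; } circ_buf;

static int32_t circ_read(circ_buf *b, int m) {
    int32_t v = b->data[b->i];           /* access at the current index */
    b->i = (b->i + m) % BUF_LEN;         /* post-modify with update */
    if (b->i < 0) b->i += BUF_LEN;       /* wrap negative modifiers too */
    return v;
}
```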
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred When an interrupt occurs the CPU saves the contents of the registers and jumps to the ISR After the ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel the highest-priority ready task gets executed In an RTOS the interrupt latency, interrupt response time, and interrupt recovery time are very important
Interrupt latency The maximum time for which interrupts are disabled plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code or to the highest-priority task In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization
It is just an integer These are of two types: a counting semaphore, which can take integer values greater than 1, and a binary semaphore, which takes values of either 0 or 1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
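The calls above map naturally onto a counting semaphore's behavior. A minimal single-threaded C model follows (the function names are illustrative, and a real kernel would block the caller on a failed acquire rather than return a failure code):

```c
/* Toy model of a counting semaphore's state and operations. */
typedef struct { int count; } ksem;

static void ksem_create(ksem *s, int initial) { s->count = initial; }
static int  ksem_acquire(ksem *s) {            /* take: succeeds if count > 0 */
    if (s->count > 0) { s->count--; return 1; }
    return 0;                                  /* a real kernel would block here */
}
static void ksem_release(ksem *s) { s->count++; }  /* give the semaphore back */
static int  ksem_query(ksem *s)   { return s->count; }
```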
MESSAGE QUEUES
A message queue can be considered as an array of mailboxes At the time of creating a queue it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
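The deposit/take behavior can be sketched as a fixed-length FIFO in C (a host-side model; the length and names are illustrative, and a real kernel would additionally manage the waiting lists and priorities):

```c
#define Q_LEN 4                       /* queue length fixed at creation (illustrative) */

/* Array-of-mailboxes model: senders deposit at the tail, the waiting
   task takes from the head, so messages are delivered in FIFO order. */
typedef struct { int msg[Q_LEN]; int head, count; } msgq;

static int q_send(msgq *q, int m) {
    if (q->count == Q_LEN) return 0;                 /* queue full */
    q->msg[(q->head + q->count) % Q_LEN] = m;        /* deposit at the tail */
    q->count++;
    return 1;
}
static int q_receive(msgq *q, int *m) {
    if (q->count == 0) return 0;                     /* queue empty */
    *m = q->msg[q->head];                            /* take from the head */
    q->head = (q->head + 1) % Q_LEN;
    q->count--;
    return 1;
}
```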
PIPES
A pipe is a kernel object for inter-task communication It is used to send the output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
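As a rough analogy, the POSIX pipe() call shows the same create/write/read pattern (this is host-side POSIX code, not an RTOS API; the function name is illustrative):

```c
#include <unistd.h>

/* One task's output becomes another task's input: create the pipe, the
   producer writes into one end, the consumer reads from the other. */
static int pipe_demo(char *out, int n) {
    int fd[2];
    if (pipe(fd) != 0) return -1;                   /* create the pipe */
    if (write(fd[1], "sensor", 6) != 6) return -1;  /* producer writes */
    int r = (int)read(fd[0], out, n);               /* consumer reads */
    close(fd[0]);                                   /* close both ends */
    close(fd[1]);
    return r;                                       /* bytes moved through */
}
```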
INSTRUCTION SET ARCHITECTURE (ISA)
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instruction fields All ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced Each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is the program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called the link register When a procedure call is made the return address is automatically stored into this register
CPSR (Current Program Status Register)----------- it is an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F flags, when set, mask (disable) normal and fast interrupts respectively The T flag is used to switch between the ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR cannot be written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode (UNDEF) it is entered if the fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format However most ARM silicon implementations use the little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location On the other hand multiple register load/store operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred registers can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
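For example, the maximum of two registers can be computed without any branch (an illustrative sketch in the assembly style used above; the register choice is arbitrary):

```
        CMP   r0, r1        ; set the condition flags from r0 - r1
        MOVGE r2, r0        ; executes only if r0 >= r1
        MOVLT r2, r1        ; executes only if r0 < r1
```

Either MOV is skipped when its condition fails, so no branch disturbs the pipeline.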
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB state must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally except the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o No MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16 bits wide However ARM is faster when the memory is organized as 32 bits wide
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by the RTOS Shared variables are not required for this purpose
No loop in our code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
The RTOS itself uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is the task
Under most RTOSs a task is simply a subroutine
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, etc
Most RTOSs allow as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack All other data, which includes global, static, initialized, un-initialized, and everything else, is shared among all the tasks in the system
Since several data variables are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs, leading to the shared-data problem
Shared-data problem it is one that arises when an interrupt routine and task code (or two pieces of task code) share data and the task code uses the data in a way that is not atomic
An atomic section is a part of a program that cannot be interrupted
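A C sketch of the standard fix: wrap the non-atomic use of shared data in an atomic section. The interrupt-masking macros here are hypothetical placeholders, defined as no-ops so the sketch is self-contained; on a real target they would map to the port's disable/enable-interrupt calls:

```c
#include <stdint.h>

/* Hypothetical port macros; no-ops here, interrupt masking on a real target. */
#define DISABLE_INTERRUPTS()
#define ENABLE_INTERRUPTS()

static volatile uint16_t temp_high, temp_low;   /* written by an ISR (assumed) */

/* Atomic section: no interrupt can split the two reads, so the pair of
   halves is always consistent; this avoids the shared-data problem. */
static uint32_t read_temperature(void) {
    DISABLE_INTERRUPTS();
    uint32_t v = ((uint32_t)temp_high << 16) | temp_low;
    ENABLE_INTERRUPTS();
    return v;
}
```

Without the atomic section, an interrupt arriving between the two reads could pair a new high half with a stale low half.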
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
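A C illustration of the first rule (the function and variable names are illustrative):

```c
/* Non-reentrant: uses a file-scope variable in a non-atomic way, so a
   task switch mid-loop lets another caller corrupt the running total. */
static int shared_total;
static int sum_bad(const int *a, int n) {
    shared_total = 0;
    for (int i = 0; i < n; i++) shared_total += a[i];
    return shared_total;
}

/* Reentrant: all state lives on the calling task's stack, so an RTOS
   switch in mid-execution cannot corrupt another task's call. */
static int sum_ok(const int *a, int n) {
    int total = 0;                        /* private to this invocation */
    for (int i = 0; i < n; i++) total += a[i];
    return total;
}
```

Both return the same answer when called from a single task; only sum_ok stays correct when two tasks call it concurrently.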
States of RTOS
Running It means the microprocessor is executing the instructions that make up this task Therefore in a single-processor system only one task can be in the running state at any given time
Ready It means some other task is in the running state but this task has things that it could do if the microprocessor became available Any number of tasks can be in this state
Blocked It means this task has not got anything to do right now even if the microprocessor were available Tasks get into this state because they are waiting for some external event Any number of tasks can be in this state as well
Scheduler
It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned the highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into the blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler cannot decide for a task to go into the blocked state
Tasks and interrupt routines move tasks out of the blocked state When a task is blocked it never gets the microprocessor; an interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state Otherwise the task will be blocked forever
The scheduler controls the running state Scheduling of the tasks between the ready and running states is entirely the work of the scheduler
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that the tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked then the scheduler spins in some tight loop somewhere inside the RTOS waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS In some systems it is illegal to assign the same priority to two tasks In some RTOSs time slicing between the tasks is done In others a task is run until it goes to the blocked state before the other one runs
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS the lower-priority task is stopped as soon as a higher-priority task unblocks
In a non-preemptive RTOS the processor is taken away from the lower-priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are a more detailed, precise, and consistent description of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful" The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
- 7 No latency issues (other than waiting for other devices to be serviced)
- 1048708 How could you fix
- ROUND ROBIN
-
- Process scheduling
- Data packet scheduling
-
- Scheduling (computing)
-
- Types of operating system schedulers
-
- Long-term scheduling
- Medium-term scheduling
- Short-term scheduling
- Dispatcher
-
- Scheduling disciplines
-
- First in first out
- Shortest remaining time
- Fixed priority pre-emptive scheduling
- Round-robin scheduling
- Multilevel queue scheduling
- Overview
-
![Page 34: EmbdedSysts.doc](https://reader035.fdocuments.net/reader035/viewer/2022062512/553ed601550346777c8b45fb/html5/thumbnails/34.jpg)
SHARC PROCESSOR
It is a family of DSPs which uses the Harvard architecture it is a good complement to the ARM CPU because they use fairly different techniques in many aspects of their architectures
The SHARC is designed to perform floating point intensive computations
The instructions are written one per line and terminated by a semicolon
A label which gives a name to an instruction comes at the beginning of the lines and ends with a colon
Comments start with an exclamation point and continue until the end of the line
Example
R1=DM(M0I0) R2=PM(M8I8) a comment
Label R3= R1+ R2
SHARC uses different word sizes and address space sizes for instructions and data A SHARC instruction consists of 48 bits a basic data word 32 bits and an address 32 bits
The SHARC supports the following types of data
32-bit IEEE single-precision floating point
40-bit IEEE extended -precision floating point
32-bit integers
The SHARC family includes a significant amount of on-chip memory The internal memory is evenly split between program memory(PM) and data memory(DM)
The SHARC memory is internally organized as 32-bit words It is a modified Harvard architecture that allows the program memory to hold both data and instructions
Data operations
The programming model for the SHARC is rather large and complex
The primary data registers have two different names they are known as R0 through R15 when used for integer operations and as f0 through f15 when used for floating-point operations
All the data registers are 40 bits long when 32 bit data types are stored in the registers they are put in the most significant bits of the register
The CPU has three major data function units an ALU a multiplier and a shifter The three most significant mode registers for data operations are arithmetic status (ASTAT) sticky(STKY) and mode I(MODEI)
All the ALU operations set the AZ(ALU result zero) AN (ALU result negative) A V(ALU result overflow) AC (ALU fixed-point carry) and AI(floating point invalid) bits in ASTAT register
The STKY register is a sticky version of some bits in the ASTAT register The STKY bits are set along with the ASTAT register bits but are not cleared STKY bits always remain set until cleared by an instruction
The SHARC can perform saturation arithmetic on fixed point values In saturation arithmetic an overflow results in the maximum-range value not the result of wrapping around the numeric range Saturation mode is controlled by the ALUSAT bit in the MODEI register
The multiplier performs fixed-point and floating multiplication It can also perform saturation rounding and setting the result to 0 Fixed point multiplication produces an 80-bit result which can be stored in the MR register and manipulated there
The shifter performs several operations Logical shifts fill with zeroes while arithmetic shifts copy sign bits The distance to shift supplied by the Ry register may be positive for left shift or negative for right shift Shift operation is set the SZ(shifter zero) SV(shifter overflow) and SS(shifter input sign) bits in the ASTAT register
The SHARC is a load-store architecture-operands must be loaded into registers before operating on them SHARC supplies two special registers that are used to control loading and storing They are called DATA ADDRESS GENERATORS(DAGs) one for data memory and the other for program memory
Each of the DAGs has eight sets of primary registers The registers numbered 0 through 7 belong to DAGI while registers 8 through 15 belong to DAG2
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest-priority ready task gets executed. In an RTOS the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency: the maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR.
Interrupt response time: the time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the interrupt latency plus the time to save the CPU register context.
Interrupt recovery time: the time required for the CPU to return to the interrupted code (or, in a preemptive kernel, to the highest-priority ready task). In a non-preemptive kernel it is the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is the sum of the time to check whether a higher-priority task is ready, the time to restore the CPU context of the highest-priority task, and the time to execute the return-from-interrupt instruction.
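The three definitions above are simple sums, which can be encoded directly. The helper names and time units (e.g. CPU cycles) below are illustrative, not part of any RTOS API:

```c
#include <assert.h>

/* Interrupt response time in a preemptive kernel:
   latency plus the time to save the CPU register context. */
unsigned response_preemptive(unsigned latency, unsigned save_context) {
    return latency + save_context;
}

/* Interrupt recovery time, non-preemptive kernel:
   restore context, then execute the return-from-interrupt instruction. */
unsigned recovery_nonpreemptive(unsigned restore_context, unsigned ret_instr) {
    return restore_context + ret_instr;
}

/* Interrupt recovery time, preemptive kernel:
   also check whether a higher-priority task became ready. */
unsigned recovery_preemptive(unsigned check_ready, unsigned restore_context,
                             unsigned ret_instr) {
    return check_ready + restore_context + ret_instr;
}
```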
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
At its core it is just an integer. Semaphores are of two types: counting semaphores, which can take integer values greater than 1, and binary semaphores, which take values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
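The five management calls above can be modelled with a toy single-threaded semaphore in C. This is a sketch only: a real RTOS acquire call would block the task (or time out) instead of returning -1, and the names below are invented, not any vendor's API:

```c
#include <assert.h>

/* Toy model of a counting/binary semaphore. */
typedef struct { int count; } semaphore_t;

void sem_create (semaphore_t *s, int initial) { s->count = initial; }
void sem_delete (semaphore_t *s)              { s->count = 0; }

/* Acquire: take the semaphore, or report "would block" (-1). */
int  sem_acquire(semaphore_t *s) {
    if (s->count > 0) { s->count--; return 0; }
    return -1;
}
void sem_release(semaphore_t *s)              { s->count++; }
int  sem_query  (semaphore_t *s)              { return s->count; }
```

With an initial value of 1 this behaves as a binary semaphore; a larger initial value gives a counting semaphore guarding multiple resource instances.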
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queues are:
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; based on the application, either the highest-priority task or the first task waiting in the queue takes the message.
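A minimal fixed-length "array of mailboxes" can be sketched as a ring buffer in C. The queue length, message type, and function names here are illustrative assumptions, and blocking/waiting-list behaviour is reduced to full/empty return codes:

```c
#include <assert.h>

#define QUEUE_LEN 4

typedef struct {
    int msgs[QUEUE_LEN];      /* the mailboxes            */
    int head, tail, count;    /* FIFO read/write indices  */
} msg_queue_t;

void mq_create(msg_queue_t *q) { q->head = q->tail = q->count = 0; }

/* A task or an ISR deposits a message. */
int mq_send(msg_queue_t *q, int msg) {
    if (q->count == QUEUE_LEN) return -1;   /* queue full */
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    return 0;
}

/* A waiting task takes the oldest message. */
int mq_receive(msg_queue_t *q, int *msg) {
    if (q->count == 0) return -1;           /* queue empty */
    *msg = q->msgs[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 0;
}
```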
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
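The calls listed above map naturally onto the POSIX pipe API, used here only as a concrete stand-in for the RTOS pipe object (the demo function name is invented; both "tasks" run in one process for simplicity):

```c
#include <string.h>
#include <unistd.h>

/* Create a pipe, have a "producer task" write a message into it,
   and have a "consumer task" read it back. Returns bytes read. */
int demo_pipe(char *out, int outlen) {
    int fds[2];
    if (pipe(fds) != 0) return -1;          /* create the pipe        */

    const char *msg = "sensor:42";
    write(fds[1], msg, strlen(msg) + 1);    /* producer task's output */

    int n = read(fds[0], out, outlen);      /* consumer task's input  */
    close(fds[0]);                          /* close both ends        */
    close(fds[1]);
    return n;
}
```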
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o Large uniform register file with 16 general-purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features are introduced:
o Each instruction controls the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced; each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance, low code size, low power consumption, and low silicon area.
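The conditional-execution feature above is what lets an ARM compiler turn a short if/else body into branch-free code: instead of a conditional jump, it can emit an instruction that executes only when the condition flags match its 4-bit condition field. An illustrative C source of the kind that typically compiles this way (the suggested ARM encoding in the comment is an assumption about one plausible compilation, not a guarantee):

```c
/* Absolute difference: the subtraction sets the condition flags, and the
   negation can be emitted as a conditionally executed instruction
   (e.g. an RSBLT-style "reverse subtract if less than"), avoiding a branch. */
int abs_diff(int a, int b) {
    int d = a - b;
    if (d < 0)      /* condition: N flag set by the subtraction */
        d = -d;     /* executed only when the condition holds   */
    return d;
}
```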
REGISTERS
The ARM ISA has 16 general-purpose registers in the user mode. They are:
R15: the program counter, but it can be manipulated as a general-purpose register
R13: used as the stack pointer
R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and four fields representing the execution state of the processor
The I and F bits control the masking of normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of the six execution modes as follows:
o User mode: used to run the application code; once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ): supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ): supports all other interrupt services in a system
o Supervisor mode (SVC): entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF): entered if the fetched opcode is neither an ARM instruction nor a coprocessor instruction
o Abort mode: entered in response to a memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely:
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM instruction set has been designed to support these data types in little- or big-endian format; however, most ARM silicon implementations use the little-endian format.
ARM instruction sets
ARM has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM: the standard 32-bit instruction set.
Data processing instructions: the ARM architecture provides a range of arithmetic operations such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instruction can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that the modification of the condition flags by arithmetic instructions is optional.
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations can be carried out via multiple register transfer instructions.
o ARM supports both little-endian and big-endian formats for data access.
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (default)
o Any subset of the user bank of registers, when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n.
Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor, the branch instructions have the following features:
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
THUMB
o THUMB instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by executing a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Smaller memory footprint
o THUMB is faster when the memory is organized as 16-bit memory; however, ARM is faster when the memory is organized as 32-bit memory
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run: the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed: in this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task and some other parameters such as the task's priority, the memory location for the task's stack, and so on.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: the problem that arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
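The shared-data problem can be simulated in plain C by firing the "interrupt" by hand between the two halves of a non-atomic read. All names here are illustrative; in real firmware the interrupt would arrive asynchronously:

```c
/* Two variables that the ISR always keeps equal; task code that reads
   them non-atomically can observe a mismatched (corrupted) pair. */
static volatile int temp_a, temp_b;

static void isr_update(int v) { temp_a = v; temp_b = v; }

/* Non-atomic task-code read; fire_isr simulates an interrupt arriving
   between the two reads. Returns 0 for a consistent snapshot. */
static int task_read_diff(int fire_isr) {
    int a = temp_a;
    if (fire_isr) isr_update(100);   /* interrupt strikes mid-read */
    int b = temp_b;
    return b - a;                    /* nonzero => corrupted snapshot */
}
```

Disabling interrupts around the two reads (making them an atomic section) is the classic fix.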
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
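The first rule above can be illustrated by contrasting two versions of the same helper. The static variable in the first version is shared by every caller, so a task switch mid-call can corrupt another task's result; the second version keeps its state on the caller's stack and is therefore reentrant. Function names are illustrative:

```c
static int shared_total;             /* shared, used non-atomically */

/* NOT reentrant: a task switch during the loop lets another caller
   clobber shared_total before this caller returns it. */
int sum_nonreentrant(const int *v, int n) {
    shared_total = 0;
    for (int i = 0; i < n; i++) shared_total += v[i];
    return shared_total;
}

/* Reentrant: all state lives on the calling task's stack. */
int sum_reentrant(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += v[i];
    return total;
}
```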
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler which events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other is started.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
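The state rules above (the scheduler moves tasks between ready and running, a task blocks itself, an ISR or another task unblocks it) can be sketched as a tiny state machine. This is a toy model, not a real scheduler; priority is modelled simply as array index, lowest index highest priority:

```c
typedef enum { READY, RUNNING, BLOCKED } task_state_t;

/* Scheduler: preempt the current task and run the highest-priority
   ready task. Returns the index of the task now running, or -1 if
   every task is blocked (the scheduler would spin, waiting). */
int schedule(task_state_t *t, int n) {
    int run = -1;
    for (int i = 0; i < n; i++)
        if (t[i] == RUNNING) t[i] = READY;              /* preempt  */
    for (int i = 0; i < n; i++)
        if (run < 0 && t[i] == READY) { t[i] = RUNNING; run = i; }
    return run;
}

/* Only the task itself decides to block. */
void task_block(task_state_t *t, int i) { t[i] = BLOCKED; }

/* An ISR (or another task) signals to unblock a task. */
void isr_unblock(task_state_t *t, int i) {
    if (t[i] == BLOCKED) t[i] = READY;
}
```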
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process.
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.
Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and the designers.
There are two types of requirements:
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can concern any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis: determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the pieces and integrates them
o Testing: uncovers the bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development process.
Whereas the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction, and testing phases.
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.
Its main disadvantage is that with too many spirals it may take too long, which matters when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, and so on.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware/software design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project, and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of suppliers' capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of some bits in the ASTAT register: the STKY bits are set along with the corresponding ASTAT bits but are not automatically cleared; they remain set until cleared by an instruction.
The SHARC can perform saturation arithmetic on fixed-point values. In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range. Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
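The saturation behaviour just described can be sketched in C for a 32-bit add. This is illustrative C, not SHARC code; the function name is invented:

```c
#include <stdint.h>

/* Saturating 32-bit addition: on overflow, clamp to the end of the
   numeric range instead of wrapping around (as SHARC ALUSAT mode does). */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t r = (int64_t)a + b;            /* widen to detect overflow */
    if (r > INT32_MAX) return INT32_MAX;   /* clamp positive overflow  */
    if (r < INT32_MIN) return INT32_MIN;   /* clamp negative overflow  */
    return (int32_t)r;
}
```

For signal processing, clamping is usually far less damaging than wraparound, which flips a large positive sample to a large negative one.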
The multiplier performs fixed-point and floating-point multiplication. It can also perform saturation, rounding, and setting the result to 0. Fixed-point multiplication produces an 80-bit result, which can be stored in the MR register and manipulated there.
The shifter performs several operations: logical shifts fill with zeroes, while arithmetic shifts copy sign bits. The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift. Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
The SHARC is a load-store architecture: operands must be loaded into registers before operating on them. The SHARC supplies two special units that control loading and storing, called DATA ADDRESS GENERATORS (DAGs): one for data memory and the other for program memory.
Each of the DAGs has eight sets of primary registers. The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
The SHARCrsquos modified Harvard architecture allows data to be stored in the program memory This allows two data fetches per cycle A DAG is provided for the program memory to support for such data fetches
DAGs provide the following addressing modes
The simplest addressing mode provides an immediate value that can represent an address
An absolute address has the entire address in the instruction This mode is relatively space inefficient because the address bits take up most of the instruction and preclude other operations from being encoded in the instruction
A post modify with update mode allows the program to sweep through a range of addresses This mode uses an I register and a modifier which may be either an M register or an immediate value The I register specifies the address and is then updated by the modifier value
The DAGs also support base-plus-offset addressing The address of the location to be fetched is computed as I+M where I is the base and M is the modifier or offset
The DAGs also support circular buffers which are commonly used in signal processing
The DAGs also support bit-reversal addressing which is useful in the fast Fourier Transform
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on the kernel objects These services are memory management device management interrupt handling and time management
Various kernel objects are tasks task scheduler interrupt service routines semaphores mutexes mailboxes message queues event registers pipes signals and timers
INTERRUPT SERVICE ROUTINES
Interrupt is hardware signal that informs the CPU that an important event has occurred When interrupt occurs CPU saves the contents of the register and jumps to the ISR After ISR processes the event the CPU returns to the interrupted task in a non-preemptive kernel In case of preemptive kernel highest priority task gets executed In RTOs the interrupt latency interrupt response time and interrupt recovery time are very important
Interrupt latency Max time for which interrupts are disabled plus time to start the execution of the first instruction in the ISR
Interrupt response time Time between receipt of interrupt signal and starting the code that handles the interrupt In a preemptive kernel it is equal to the sum of interrupt latency plus time to save CPU registers context
Interrupt recovery time Time required for CPU to return to the interrupted codehighest priority task is called interrupt recovery time In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and time to execute the return instruction from the interrupted instruction In a preemptive kernel it is equal to the sum of the time to check whether a high priority a high priority task is ready plus time to restore CPU context of the highest priority task plus time to execute the return instruction from the interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task
Synchronization
It is just an integer These are of two types counting semaphore which will have an integer value greater than 1 and binary semaphore which takes values of either 0 or1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
Message queue can be considered as an array of mailboxes At the time of creating a queue it is given a name or ID queue length sending task waiting list and waiting task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
PIPES
Pipe is kernel object for inter task communication It is used to send output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write from the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Loadstore architecture the instructions that process data operate only on the registers and are
separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed length fields All the ARM instructions are 32-bit long and most of them have
a regular three operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been
introduced
o Conditional execution of instructions has been introduced Instruction opcodes is preceded by a
4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR can not be
written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM
instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jump is always with in a limit of plusmnMB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered running BXBLX (Branch
Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally excepting the branch
instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15
A reduced no of instructions can access the full register set
o No MSR and MRS instructions
o Maximum no of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16-bit However ARM is faster
when the memory is organized in 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
RTOS it self uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is task
Under most RTOS a task is simply a subroutine
At some point in program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine the starting point for each task and some other parameters like taskrsquos priority memory location for task stack etc
Most RTOS allows as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data and the task code uses the data in a way that is not atomic.
Atomic section: a part of a program that cannot be interrupted.
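The shared-data problem and its standard cure, an atomic section, can be sketched in C. The interrupt enable/disable calls here are stubs standing in for the real processor instructions, and the two-element temperature array is an invented example of data shared between an interrupt routine and task code.

```c
#include <assert.h>
#include <stdint.h>

/* Stubs standing in for the processor's interrupt enable/disable
   instructions, so the shape of an atomic section can be shown in C. */
static volatile int interrupts_enabled = 1;
static void disable_interrupts(void) { interrupts_enabled = 0; }
static void enable_interrupts(void)  { interrupts_enabled = 1; }

/* Invented example of shared data: an interrupt routine writes both
   readings; the task code expects the pair to be consistent. */
static volatile uint16_t temperatures[2];

/* Task code: the two reads form an atomic section, so an interrupt
   cannot fire between them and leave the pair half-updated. */
static int temperatures_match(void)
{
    uint16_t t0, t1;
    disable_interrupts();          /* begin atomic section */
    t0 = temperatures[0];
    t1 = temperatures[1];
    enable_interrupts();           /* end atomic section */
    return t0 == t1;
}
```

Without the disable/enable pair, an interrupt arriving between the two reads could update the array and make the task code compare one old and one new value.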
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
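A minimal illustration of the first rule, as a C sketch with invented function names: a function that updates a shared static variable in a read-modify-write is not reentrant, while one that keeps everything on the caller's stack is.

```c
#include <assert.h>

/* NOT reentrant: the read-modify-write of a variable shared by every
   caller is non-atomic, so a task switch mid-update can corrupt the total. */
static int running_total = 0;
int add_to_total(int x)
{
    running_total += x;
    return running_total;
}

/* Reentrant: every variable lives on the calling task's stack, so any
   number of tasks may safely be inside this function at once. */
int add_locals(int a, int b)
{
    int sum = a + b;   /* private to this invocation */
    return sum;
}
```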
Task states in an RTOS
Running It means the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time.
Ready It means some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked It means this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task with the highest assigned priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves A task moves into the blocked state when it decides it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines unblock tasks While a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
Scheduler controls the running state Scheduling of the tasks between the ready and running states is entirely the work of the scheduler.
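The three states and the scheduler's rule (run the highest-priority ready task) can be sketched in plain C. The three-task table and the priority values are invented for illustration; a real scheduler would also perform the context switch, which is omitted here.

```c
#include <assert.h>

/* Invented three-task system: each task is BLOCKED, READY, or RUNNING. */
typedef enum { BLOCKED, READY, RUNNING } task_state_t;

#define NTASKS 3
static task_state_t state[NTASKS];
static const int priority[NTASKS] = { 2, 1, 3 };  /* bigger = more urgent */

/* Moves the highest-priority READY task to RUNNING (demoting whichever
   task was running) and returns its index, or -1 if every task is
   blocked -- in which case a real scheduler would spin and wait. */
int schedule(void)
{
    int best = -1;
    for (int i = 0; i < NTASKS; i++)
        if (state[i] == RUNNING)
            state[i] = READY;
    for (int i = 0; i < NTASKS; i++)
        if (state[i] == READY && (best < 0 || priority[i] > priority[best]))
            best = i;
    if (best >= 0)
        state[best] = RUNNING;
    return best;
}
```

Note how the sketch mirrors the rules above: only the scheduler moves tasks between ready and running, while blocking and unblocking are driven from outside it.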
How does the scheduler know when a task has become blocked or unblocked
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked, then the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSs time-slice between the tasks. Others run one task until it goes to the blocked state before going on to the other one.
If a higher priority task unblocks what happens to the running task
In a preemptive RTOS, the lower priority task is stopped as soon as a higher priority task unblocks
In a non-preemptive RTOS, the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirement states what the system must do, such as compute an FFT
o Non-functional A non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap they also want them to be of
right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process
This model has five major phases
o Requirements analysis determines the basic characteristics of the system
o Architecture design decomposes the functionality into major components
o Coding implements the pieces and integrates them
o Testing uncovers the bugs
o Maintenance entails deployment in the field, bug fixes, and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where design entails experimentation and changes that require bottom-up feedback
Nowadays it is considered an unrealistic design process
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is that with too many spirals it may take too long when design time is a major requirement
Its advantage is that it adopts a successive-refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate "over-the-wall" design steps
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises
Integrated project management ensures that someone is responsible for the entire project and that responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customers' needs
The DAGs (data address generators) also support circular buffers, which are commonly used in signal processing
The DAGs also support bit-reversed addressing, which is useful in the Fast Fourier Transform (FFT)
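Both addressing patterns are easy to mimic in C, which shows what the DAG hardware computes for free: a circular-buffer index wraps modulo the buffer length, and a bit-reversed index reverses the low address bits, as the FFT's data reordering requires. The function names below are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Circular buffer: advancing an index wraps modulo the buffer length,
   which a DAG performs in hardware with no extra cycles. */
uint32_t circ_advance(uint32_t index, uint32_t step, uint32_t length)
{
    return (index + step) % length;
}

/* Bit-reversed addressing: reverse the low `bits` bits of an index,
   the reordering an FFT needs on its inputs or outputs. */
uint32_t bit_reverse(uint32_t index, int bits)
{
    uint32_t r = 0;
    for (int i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);
        index >>= 1;
    }
    return r;
}
```

For an 8-point FFT (3 address bits), index 1 (binary 001) maps to 4 (binary 100), which is exactly the shuffle the hardware addressing mode provides.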
UNIT-V RTOS CONCEPTS
ARCHITECTURE OF THE KERNEL
The kernel provides various services through operations on kernel objects. These services include memory management, device management, interrupt handling, and time management.
The kernel objects include tasks, the task scheduler, interrupt service routines, semaphores, mutexes, mailboxes, message queues, event registers, pipes, signals, and timers.
INTERRUPT SERVICE ROUTINES
An interrupt is a hardware signal that informs the CPU that an important event has occurred. When an interrupt occurs, the CPU saves the contents of its registers and jumps to the ISR. After the ISR processes the event, the CPU returns to the interrupted task in a non-preemptive kernel; in a preemptive kernel, the highest priority ready task gets executed. In an RTOS, the interrupt latency, interrupt response time, and interrupt recovery time are very important.
Interrupt latency The maximum time for which interrupts are disabled, plus the time to start the execution of the first instruction in the ISR
Interrupt response time The time between receipt of the interrupt signal and the start of the code that handles the interrupt. In a preemptive kernel it is equal to the sum of the interrupt latency and the time to save the CPU register context
Interrupt recovery time The time required for the CPU to return to the interrupted code (or to the highest priority task). In a non-preemptive kernel it is equal to the sum of the time to restore the CPU context and the time to execute the return-from-interrupt instruction. In a preemptive kernel it is equal to the sum of the time to check whether a higher priority task is ready, the time to restore the CPU context of the highest priority task, and the time to execute the return-from-interrupt instruction
SEMAPHORES
It is a kernel object that is used for both resource synchronization and task synchronization.
At its core it is just an integer. Semaphores are of two types: the counting semaphore, whose value can be any non-negative integer, and the binary semaphore, which takes values of either 0 or 1.
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
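The acquire/release semantics can be sketched with a toy counting semaphore in C. Unlike a real RTOS semaphore, an acquire on an unavailable semaphore here returns a failure code instead of blocking the calling task; the type and function names are invented.

```c
#include <assert.h>

/* Toy counting semaphore. A failed acquire returns 0 rather than
   blocking the caller, as a real RTOS call would. */
typedef struct { int count; } demo_sem_t;

void sem_create(demo_sem_t *s, int initial) { s->count = initial; }

/* Returns 1 if the semaphore was taken, 0 if the caller would block. */
int sem_acquire(demo_sem_t *s)
{
    if (s->count > 0) {
        s->count--;
        return 1;
    }
    return 0;
}

void sem_release(demo_sem_t *s) { s->count++; }
```

Initializing the count to 1 gives the binary-semaphore behavior described above; a larger initial count models a pool of identical resources.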
MESSAGE QUEUES
A message queue can be considered an array of mailboxes. At the time of creating a queue, it is given a name or ID, a queue length, a sending-task waiting list, and a receiving-task waiting list.
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications, a task or an ISR deposits the message; depending on the application, either the highest priority task or the first task waiting in the queue takes the message.
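A queue of the kind described (a fixed-length array of mailboxes) reduces to a ring buffer. The sketch below, with invented names, omits the two waiting lists and the blocking behavior a real RTOS queue would provide.

```c
#include <assert.h>

/* Fixed-length queue of integer messages as a ring buffer. */
#define QUEUE_LEN 4
static int queue[QUEUE_LEN];
static int q_head = 0, q_tail = 0, q_count = 0;

int mq_send(int msg)            /* 0 on success, -1 if the queue is full */
{
    if (q_count == QUEUE_LEN)
        return -1;
    queue[q_tail] = msg;
    q_tail = (q_tail + 1) % QUEUE_LEN;
    q_count++;
    return 0;
}

int mq_receive(int *msg)        /* 0 on success, -1 if the queue is empty */
{
    if (q_count == 0)
        return -1;
    *msg = queue[q_head];
    q_head = (q_head + 1) % QUEUE_LEN;
    q_count--;
    return 0;
}
```

Messages come out in the order they were deposited, which is the behavior a keyboard-input or packet-transmission queue relies on.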
PIPES
A pipe is a kernel object for inter-task communication. It is used to send the output of one task as input to another task for further processing.
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are
o Large uniform register file with 16 general purpose registers
o Load/store architecture: the instructions that process data operate only on the registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform, fixed-length instructions: all ARM instructions are 32 bits long and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. To keep the architecture simple while improving performance, a number of non-RISC features have been introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions that allow loading/storing of up to 16 registers at once have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM ISA has 16 general purpose registers in the user mode. They are
R15 ----------- the program counter, though it can be manipulated as a general purpose register
R13 ----------- used as the stack pointer
R14 ----------- has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register) ----------- an important register containing four 1-bit condition flags (negative, zero, carry, and overflow) and fields representing the execution state of the processor
The I and F bits, when set, disable normal and fast interrupts respectively. The T flag is used to switch between the ARM and THUMB instruction sets.
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code. Once in user mode, the CPSR cannot be written to
o Fast interrupt processing mode (FIQ) supports high-speed interrupt handling
o Normal interrupt processing mode (IRQ) supports all other interrupt services in a system
o Supervisor mode (SVC) it is entered when the processor encounters a software interrupt instruction
o Undefined instruction mode (UNDEF) it is entered if a fetched opcode is not an ARM instruction or a coprocessor instruction
o Abort mode it is entered in response to a memory fault
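The CPSR fields described above sit at fixed bit positions in the 32-bit register (N=31, Z=30, C=29, V=28, I=7, F=6, T=5, mode=bits 4:0), so they can be extracted with shifts and masks, as this C sketch shows.

```c
#include <assert.h>
#include <stdint.h>

/* Extracting the CPSR fields with shifts and masks. Bit positions
   follow the ARM architecture. */
int cpsr_n(uint32_t cpsr) { return (cpsr >> 31) & 1; }     /* negative    */
int cpsr_z(uint32_t cpsr) { return (cpsr >> 30) & 1; }     /* zero        */
int cpsr_c(uint32_t cpsr) { return (cpsr >> 29) & 1; }     /* carry       */
int cpsr_v(uint32_t cpsr) { return (cpsr >> 28) & 1; }     /* overflow    */
int cpsr_t(uint32_t cpsr) { return (cpsr >> 5)  & 1; }     /* THUMB state */
uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & 0x1Fu; } /* 0x10 = User */
```

For example, a CPSR value of 0x40000010 has only the Z flag set and a mode field of 0x10, i.e. user mode with the last result equal to zero.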
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
The ARM instruction set supports six different data types, namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little- or big-endian format. However, most ARM silicon implementations use the little-endian format.
ARM instruction sets
It has two instruction sets: the 32-bit ARM set and the 16-bit THUMB set.
ARM It is the standard 32-bit instruction set
Data processing instructions The ARM architecture provides a range of arithmetic operations, such as addition, subtraction, and multiplication, and a set of bit-wise logical operations. All these instructions take two 32-bit operands and return a 32-bit result; the multiplication instructions can return a 32- or 64-bit value.
o An interesting feature of the ARM architecture is that modification of the condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2, or 4 bytes of data between a register and a memory location. Multiple register load/store operations are carried out via multiple register transfer instructions.
o ARM supports both little endian and big endian formats for data access
Block data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either
o any subset of the current bank of registers (the default)
o any subset of the user bank of registers when in a privileged mode
Multiplication instructions ARM provides several versions of multiplication. These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction The SWI instruction forces the CPU into supervisor mode. Its format is SWI n
Execution of the instruction causes the SWI exception handler to be called
Conditional execution While most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is modeled as a variant of the branch instruction
THUMB
o These are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, excepting the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters the ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized as 16 bits wide; however, ARM is faster when the memory is organized as 32 bits wide
It is just an integer These are of two types counting semaphore which will have an integer value greater than 1 and binary semaphore which takes values of either 0 or1
Semaphore management function calls
o Create a semaphore
o Delete a semaphore
o Acquire a semaphore
o Release a semaphore
o Query a semaphore
MESSAGE QUEUES
Message queue can be considered as an array of mailboxes At the time of creating a queue it is given a name or ID queue length sending task waiting list and waiting task waiting list
Some of the applications of message queue are
o Taking the input from a keyboard
o To display output
o Reading voltages from sensors or transducers
o Data packet transmission in a network
In each of these applications a task or an ISR deposits the message and based on the application the highest priority task or the first task waiting in the queue can take the message
PIPES
Pipe is kernel object for inter task communication It is used to send output of one task as input to another task for further processing
Pipe management function calls
o Create a Pipe
o Open a Pipe
o Close a pipe
o Read from the pipe
o Write from the pipe
INSTRUCTION SET ARCHITECTURE ISA
ARM in most respects is a typical RISC architecture but with several enhancements to improve the performance further The RISC features present are
o Large uniform register file with 16 general purpose registers
o Loadstore architecture the instructions that process data operate only on the registers and are
separate from the instructions that access memory
o Simple addressing modes
o Uniform and fixed length fields All the ARM instructions are 32-bit long and most of them have
a regular three operand encoding
These features help in the implementation of pipelining in the ARM architecture In order to keep the architecture simple and improve performance a number of non-RISC features are introduced
o Each instruction controls the ALU and the shifter thus making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple loadstore instructions that allow loadstore up to 16 registers at once have been
introduced
o Conditional execution of instructions has been introduced Instruction opcodes is preceded by a
4-bit condition code
All these features have resulted in high performance low code size low power consumption and low silicon area
REGISTERS
The ARM-ISA has 16 general purpose registers in the user mode They are
R15----------- it is program counter but can be manipulated as a general purpose register
R13----------- it is used as a stack pointer
R14----------- it has some special significance and is called link register When a procedure call is made the return address is automatically stored into this register
CPSR(Current Program Status Register) ----------- it is an important register containing four 1-bit condition flags(negative zero carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR can not be
written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM
instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions: ARM supports two types of data transfer instructions, single register transfers and multiple register transfers. Single register transfer instructions can be used to transfer 1, 2 or 4 bytes of data between a register and a memory location. Multiple register load/store operations, on the other hand, are carried out via multiple register transfer instructions.
o ARM supports both little-endian and big-endian formats for data access.
Block data transfer: the load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory. The transferred registers can be either:
o Any subset of the current bank of registers (the default)
o Any subset of the user bank of registers, when in a privileged mode
Multiplication instructions: ARM provides several versions of multiplication. These are:
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instruction: the SWI instruction forces the CPU into supervisor mode. Its format is SWI n. Execution of the instruction causes the SWI exception handler to be called.
Conditional execution: while most existing architectures allow only branches to be executed conditionally, ARM allows all instructions to be executed conditionally.
Branch instructions: in the ARM processor the branch instructions have the following features:
o All branches are relative to the program counter
o A jump is always within a limit of ±32 MB
o Conditional branches are formed by using the condition codes
o The subroutine call instruction is also modeled as a variant of the branch instruction
THUMB
o These instructions are 16 bits in length
o They are stored in a compressed form
o The instructions are decompressed into ARM instructions and then executed by the processor
o The THUMB instruction set must always be entered by running a BX/BLX (Branch and Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15; only a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset, and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupied
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose.
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run. The RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time.
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program, we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter and a stack. All other data (global, static, initialized, uninitialized and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and the task code (or two tasks) share data, and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
Task states in an RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor became available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor were available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which one task should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides it has run out of things to do. Neither the other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for, and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the processor goes to the other.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process.
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications, however, are directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and the designers.
Two types of requirements:
o Functional: a functional requirement states what the system must do, such as compute an FFT.
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability and so on.
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
Introduced by Royce, it was the first model proposed for the software development process.
This model has five major phases:
o Requirements analysis: determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the pieces and integrates them
o Testing: uncovers bugs
o Maintenance: entails deployment in the field, bug fixes and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases, since it implies good foreknowledge of the implementation.
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for software development.
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design, the designers go through requirements, construction and testing phases.
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
The spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design.
Its main disadvantage is that, with too many spirals, it may take too long when design time is a major requirement.
Its advantage is that it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering efforts:
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing and so forth.
Concurrent product realization: process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps ensure that the product best meets the customer's needs.
o Read from the pipe
o Write to the pipe
INSTRUCTION SET ARCHITECTURE (ISA)
ARM is in most respects a typical RISC architecture, but with several enhancements to improve performance further. The RISC features present are:
o A large uniform register file, with 16 general-purpose registers
o A load/store architecture: the instructions that process data operate only on registers and are separate from the instructions that access memory
o Simple addressing modes
o Uniform, fixed-length fields: all ARM instructions are 32 bits long, and most of them have a regular three-operand encoding
These features help in the implementation of pipelining in the ARM architecture. In order to keep the architecture simple and improve performance, a number of non-RISC features have been introduced:
o Each instruction controls both the ALU and the shifter, making the instructions more powerful
o Auto-increment and auto-decrement addressing modes have been incorporated
o Multiple load/store instructions, which allow loading/storing up to 16 registers at once, have been introduced
o Conditional execution of instructions has been introduced: each instruction opcode is preceded by a 4-bit condition code
All these features result in high performance, low code size, low power consumption and low silicon area.
REGISTERS
The ARM ISA has 16 general-purpose registers in user mode. They are:
R15: the program counter, but it can be manipulated as a general-purpose register
R13: used as the stack pointer
R14: has special significance and is called the link register; when a procedure call is made, the return address is automatically stored in this register
CPSR (Current Program Status Register): an important register containing four 1-bit condition flags (negative, zero, carry and overflow) and four fields representing the execution state of the processor
The I and F flags enable normal and fast interrupts respectively The T flag is used to switch between ARM and THUMB instruction sets
The mode field selects one of the six execution modes as follows
o User mode It is used to run the application code Once in user mode the CPSR can not be
written to
o Fast interrupt processing mode(FIQ) supports high speed interrupt handling
o Normal interrupt processing mode(IRQ) supports all other interrupt services in a system
o Supervisor mode(SVC) it is entered when the processor encounters a software interrupt
instruction
o Undefined instruction mode(UNDEF) it is entered if the fetched opcodes is not an ARM
instruction or a coprocessor instruction
o Abort mode it is entered in response to memory fault
The user registers R0 to R7 are common to all the operation modes
DATA TYPES
ARM instruction set supports six different data types namely
8-bit signed and unsigned
16-bit signed and unsigned
32-bit signed and unsigned
The ARM processor instruction set has been designed to support these data types in little-or-big endian format However most of the ARM silicon implementations used little-endian format
ARM instruction sets
It has two instruction sets 32-bit ARM and 16-bit THUMB
ARM It is standard 32-bit instruction set
Data processing instructions ARM architecture provides a range of arithmetic operations such as addition subtraction multiplication etc and a set of bit-wise logical operations All these instructions take two 32-bit operands and return a 32-bit result The multiplication instruction can return a 32-or 64-bit value
o Interesting feature of the ARM architecture is that the modification of the
condition flags by arithmetic instructions is optional
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jump is always with in a limit of plusmnMB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered running BXBLX (Branch
Exchange) instruction
Differences from ARM
o THUMB instructions are executed unconditionally, except for the branch instructions
o THUMB instructions have unrestricted access to registers R0-R7 and R13-R15; a reduced number of instructions can access the full register set
o There are no MSR and MRS instructions
o The maximum number of SWI calls is restricted to 256
o On reset and on the raising of an exception, the processor always enters ARM instruction set mode
Advantages of THUMB
o Higher code density
o Less power consumption
o Less space occupied
o It is faster when memory is organized as 16-bit; however, ARM is faster when memory is organized as 32-bit
RTOS architecture
In this architecture, as in the others, the interrupt routines take care of the most urgent operations. They then signal that there is work for the task code to do.
The differences between this architecture and the previous ones are:
The necessary signaling between the interrupt routines and the task code is handled by the RTOS; shared variables are not required for this purpose
No loop in our code decides what needs to be done next. Code inside the RTOS decides which of the task code functions should run; the RTOS knows about the task code subroutines and runs whichever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run another
Advantages
The RTOS scheduling mechanism gives a relatively stable system response, even when the code is changed. In this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with some other parameters such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the register values, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data can create bugs, leading to the shared-data problem.
Shared-data problem: it arises when an interrupt routine and task code (or two pieces of task code) share data, and the task code uses the data in a way that is not atomic.
Atomic section: a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running: the microprocessor is executing the instructions that make up this task. Therefore, in a single-processor system only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which task should go into the running state.
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Neither other tasks in the system nor the scheduler can decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: while a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise, the task will be blocked forever.
The scheduler controls the running state: moving tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in a tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends upon the RTOS. In some systems it is illegal to assign the same priority to two tasks. Some RTOSs time-slice between the tasks. Others run one task until it blocks before going on to the other.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications, however, describe the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional: A functional requirement states what the system must do, such as compute an FFT
o Non-functional: A non-functional requirement can cover any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects:
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful". The goals of the design process are:
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: Customers not only want their products fast and cheap, they also want them to be of the right quality
Design flow
A design flow is a sequence of steps to be followed during a design. Some of the steps can be performed by tools such as compilers or CAD systems; other steps can be performed by hand.
Waterfall model
It was introduced by Royce and was the first model proposed for the software development process.
This model has five major phases
o Requirements analysis: determines the basic characteristics of the system
o Architecture design: decomposes the functionality into major components
o Coding: implements the processes and integrates them
o Testing: uncovers bugs
o Maintenance: entails deployment in the field, bug fixes, and upgrades
This model involves a largely one-way flow of work and information from higher levels of abstraction to more detailed design steps.
It is ideal during early design phases since it implies good foreknowledge of the implementation
It is not suitable where the design entails experimentation and changes that require bottom-up feedback.
Nowadays it is considered an unrealistic design process.
Spiral model
It is an alternative model for the software development
While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.
At each level of the design the designers go through requirements construction and testing phases
The first cycles at the top of the spiral are very small and short, while the final cycles at the spiral's bottom add detail learned from the earlier cycles.
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage: with too many spirals, it may take too long when design time is a major requirement.
Its advantage: it adopts a successive-refinement approach.
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps ensure that the product best meets the customer's needs.
Data transfer instructions ARM supports two types of data transfer instructions single register transfers and multiple register transfers Single register transfer instructions can be used to transfer 12 or 4 bytes of data between a register and a memory location On the other hand multiple register loadstore operations can be carried out via multiple register transfer instructions
o ARM supports both little endian and big endian formats for data access
Block Data transfer The load and store multiple instructions allow between 1 and 16 registers to be transferred to or from memory The transferred instructions can be either
o Any subset of the current bank of registers(default)
o Any subset of the user bank of registers when in a privilege mode
Multiplication instructions ARM provide several versions of multiplications These are
o Integer multiplication (32-bit result)
o Long Integer multiplication (64-bit result)
o Multiply accumulate instruction
Software interrupt (SWI) instructions SWI instruction forces the CPU into supervisor mode Its format is SWI n
Execution of the instruction causes SWI exception handler to be called
Conditional execution While most of the existing architectures allow only branches to be executed conditionally ARM allows all instructions to be executed conditionally
Branch instruction In ARM processor the branch instructions have the following features
o All the branches are relative to the program counter
o Jump is always with in a limit of plusmnMB
o Conditional branches are formed by using the condition codes
o Subroutine call instruction is also modeled as a variant of branch instruction
THUMB
o These are 16-bit in length
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered running BXBLX (Branch
Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally excepting the branch
instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15
A reduced no of instructions can access the full register set
o No MSR and MRS instructions
o Maximum no of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16-bit However ARM is faster
when the memory is organized in 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
RTOS it self uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is task
Under most RTOS a task is simply a subroutine
At some point in program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine the starting point for each task and some other parameters like taskrsquos priority memory location for task stack etc
Most RTOS allows as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context which includes the register variable a program counter and a stack All other data which includes Global static initialized un-initialized and everything else is shared among all the tasks in the system
Since several data variable are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs leading to shared data problem
Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic
Atomic section is a part of program which can not be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time
Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state
Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well
Scheduler
It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state
Tasks and IR move tasks from blocked state when a task is blocked it never gets microprocessor An IR or some other task in the system must be able to send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever
Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler
How does the scheduler know when a task has become blocked or unblocked
RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one
If a higher priority task unblocks what happens to the running task
In preemptive RTOS lower priority task is stopped as soon as an higher priority task unblocks
In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent description of the system that can be used to create architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirements states what the system must do such as compute
an FFT
o Non-functional A non-functional requirements can be any no of attributes including
physical size cost power consumption design time reliability and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o traceability
Design methodology
It refers to ldquothe sequence of steps necessary to build some thing useful The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap they also want them to be of
right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by ROYCE and it is the first model proposed for the software development process
This model has five major phases
o Requirements Analysis and determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It not suitable where design entails experimentation and changes that require bottom up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
Waterfall model assumes that the system is built once in its entirety the spiral model assumes that the several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short while the final cycles at the spirals bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is with too many spirals it may take too long when design time is a major requirement
Its advantage is It adopts successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time increased reliability and performance reduced power consumption etc
It tries to eliminate ldquo over-the-well rdquo design steps
Concurrent engineering effort
Cross functional teams include member from various disciplines involved in the process including manufacturing HWSW design marketing and so forth
Concurrent product realization process activities are at the heart of concurrent engineering Doing several things at once such as designing various subsystems simultaneously is central to reducing design time
Incremental information sharing and use helps minimization of the chance that concurrent product realization will lead to sub-process
Integrated project management ensures that some one is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done
Early and continual supplier involvement helps make the best use of supplier capabilities
Early and continual customer focus helps to ensure that the product best meets customerrsquos needs
- 7 No latency issues (other than waiting for other devices to be serviced)
- 1048708 How could you fix
- ROUND ROBIN
-
- Process scheduling
- Data packet scheduling
-
- Scheduling (computing)
-
- Types of operating system schedulers
-
- Long-term scheduling
- Medium-term scheduling
- Short-term scheduling
- Dispatcher
-
- Scheduling disciplines
-
- First in first out
- Shortest remaining time
- Fixed priority pre-emptive scheduling
- Round-robin scheduling
- Multilevel queue scheduling
- Overview
-
![Page 41: EmbdedSysts.doc](https://reader035.fdocuments.net/reader035/viewer/2022062512/553ed601550346777c8b45fb/html5/thumbnails/41.jpg)
o Stored in a compressed form
o The instructions are decomposed into ARM instructions and then executed
by the processor
o THUMB instruction set must always be entered running BXBLX (Branch
Exchange) instruction
Differences with ARM
o THUMB instructions are executed unconditionally excepting the branch
instructions
o THUMB instructions have unlimited access to registers R0-R7 and R13-R15
A reduced no of instructions can access the full register set
o No MSR and MRS instructions
o Maximum no of SWI calls is restricted to 256
o On reset and on raising of an exception the processor always enters into the
ARM instruction set mode
Advantages of THUMB
o More code density
o Less power consumption
o Less space occupation
o It is faster when the memory is organized in 16-bit However ARM is faster
when the memory is organized in 32-bit
RTOS architecture
In this architecture as in others the interrupt routines take care of the most urgent operations They signal that there is work for the task code to do
The differences between this architecture and the previous ones are that
The necessary signaling between the interrupt routines and the task code is handled by RTOS Shared variables are not required for this purpose
No loop in the code decides what needs to be done next Code inside the RTOS decides which of the task code functions should run RTOS knows about task code subroutines ad runs which ever of them is most urgent at any given time
The RTOS can suspend one task code subroutine in the middle of its processing in order to run the another
Advantages
RTOS scheduling mechanism gives a relatively stable system response even when the code is changed In this architecture changes to lower priority functions do not generally affect the response of higher priority functions
RTOS with debugging facilities are widely available for purchase giving immediate solutions to some of the response problems
Disadvantages
RTOS it self uses a certain amount of processing time
TASK
The basic building block of software written under an RTOS is task
Under most RTOS a task is simply a subroutine
At some point in program we make one or more calls to a function in the RTOS that starts tasks telling it which subroutine the starting point for each task and some other parameters like taskrsquos priority memory location for task stack etc
Most RTOS allows as many tasks as we need
TASKS AND DATA
Each of the tasks has its own private context which includes the register variable a program counter and a stack All other data which includes Global static initialized un-initialized and everything else is shared among all the tasks in the system
Since several data variable are shared among the tasks it is easy to move data from one task to another However sharing of data creates bugs leading to shared data problem
Shared data problem it is one that arises when IR and TC or TCs share data ad the task code uses the data in a way that is not atomic
Atomic section is a part of program which can not be interrupted
REENTRANT FUNCTION
These are the functions that can be called by more than one task and that will always work correctly even if the RTOS switches from one task to another in the middle of executing the function
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of task that called the function or are otherwise the private variables of that task
A reentrant function may not call any other functions that are not themselves reentrant
A reentrant function may not use the hardware in a non-atomic way
States of RTOS
Running It means the microprocessor is executing instructions that make up this task Therefore in a single processor system only one task that is in the running state at any given time
Ready It means some other task in the running state but this task has things that it could do if the microprocessor is available Any no of tasks can be in this state
Blocked It means this task has not got any thing to do right now even if the microprocessor is available Tasks get into this state because they are waiting for some external event Any no of tasks can be in this state as well
Scheduler
It is part of RTOS It keeps track of the state of each task and decides which one task should go into the running state
The task that is assigned highest priority gets the processor
The scheduler does not fiddle with task priorities
Tasks block themselves A task moves into blocked state when it decides that it has run out of things to do Other tasks in the system or the scheduler can not decide for a task to go into blocked state
Tasks and IR move tasks from blocked state when a task is blocked it never gets microprocessor An IR or some other task in the system must be able to send a signal to bring the task out of the blocked state Otherwise the task will be blocked for ever
Scheduler controls running state scheduling of the tasks between ready and running states is entirely the work of scheduler
How does the scheduler know when a task has become blocked or unblocked
RTOS provides a collection of functions that the tasks can call to tell the scheduler what events it want to wait for and to signal that events have happened
What happens if all the tasks are blocked
If all the tasks are blocked then the scheduler spins in some tight loop some where inside the RTOS waiting for some thing to happen
What happens if two tasks are ready with same priority
It depends upon the RTOS In some systems it is illegal to assign same priority to tasks In some RTOS time slicing between the tasks is done In some the task is run until it goes to blocked state before going to the other one
If a higher priority task unblocks what happens to the running task
In preemptive RTOS lower priority task is stopped as soon as an higher priority task unblocks
In non-preemptive RTOS the processor is taken away from the lower priority task only when that task blocks
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants while specifications are more detailed precise and consistent description of the system that can be used to create architecture
Both requirements and specifications are however directed to the outward behavior of the system not its internal structure
The overall goal of creating a requirements document is effective communication between the customers and designers
Two types of requirements
o Functional A functional requirements states what the system must do such as compute
an FFT
o Non-functional A non-functional requirements can be any no of attributes including
physical size cost power consumption design time reliability and so on
A good set of requirements reflect
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o traceability
Design methodology
It refers to ldquothe sequence of steps necessary to build some thing useful The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality Customers not only want their products fast and cheap they also want them to be of
right quality
Design flow
A design flow is a sequence of steps to be followed during a design Some of the steps can be performed by tools such as compilers or CAD systems other steps can be performed by hand
Waterfall model
It was introduced by ROYCE and it is the first model proposed for the software development process
This model has five major phases
o Requirements Analysis and determines the basic characteristics of the system
o Architecture Design decomposes the functionality into major components
o Coding Implements the process and integrates them
o Testing Uncovers the bugs
o Maintenance Entails deployment in the field bug fixes and upgrades
This model involves largely one way flow of work and information from higher levels of abstraction to more detailed design steps
It is ideal during early design phases since it implies good foreknowledge of the implementation
It not suitable where design entails experimentation and changes that require bottom up feedback
Nowadays it is considered as an unrealistic design process
Spiral model
It is an alternative model for the software development
Waterfall model assumes that the system is built once in its entirety the spiral model assumes that the several versions of the system will be built
At each level of the design the designers go through requirements construction and testing phases
The first cycles over the top of the spiral are very small and short while the final cycles at the spirals bottom add detail learned from the earlier cycles
Spiral model is more realistic than the waterfall model because multiple iterations are often necessary to add enough detail to complete a design
Its main disadvantage is with too many spirals it may take too long when design time is a major requirement
Its advantage is It adopts successive refinement approach
CONCURRENT ENGINEERING
Its important goals are reduced design time, increased reliability and performance, reduced power consumption, etc.
It tries to eliminate "over-the-wall" design steps.
Concurrent engineering effort
Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, HW/SW design, marketing, and so forth.
Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is central to reducing design time.
Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises.
Integrated project management ensures that someone is responsible for the entire project and that the responsibility is not abdicated once one aspect of the work is done.
Early and continual supplier involvement helps make the best use of supplier capabilities.
Early and continual customer focus helps to ensure that the product best meets the customer's needs.
The RTOS can suspend one task's code (a subroutine) in the middle of its processing in order to run another.
Advantages
The RTOS scheduling mechanism gives a relatively stable system response even when the code is changed; in this architecture, changes to lower-priority functions do not generally affect the response of higher-priority functions.
RTOSs with debugging facilities are widely available for purchase, giving immediate solutions to some of the response problems.
Disadvantages
The RTOS itself uses a certain amount of processing time.
TASK
The basic building block of software written under an RTOS is the task.
Under most RTOSs, a task is simply a subroutine.
At some point in the program we make one or more calls to a function in the RTOS that starts tasks, telling it which subroutine is the starting point for each task, along with other parameters such as the task's priority and the memory location for the task's stack.
Most RTOSs allow as many tasks as we need.
TASKS AND DATA
Each task has its own private context, which includes the registers, a program counter, and a stack. All other data (global, static, initialized, uninitialized, and everything else) is shared among all the tasks in the system.
Since several data variables are shared among the tasks, it is easy to move data from one task to another. However, sharing data creates bugs, leading to the shared-data problem.
Shared-data problem: arises when an interrupt routine and task code, or two or more tasks, share data and the task code uses the data in a way that is not atomic.
An atomic section is a part of a program that cannot be interrupted.
REENTRANT FUNCTION
These are functions that can be called by more than one task and that will always work correctly, even if the RTOS switches from one task to another in the middle of executing the function.
A reentrant function may not use variables in a non-atomic way unless they are stored on the stack of the task that called the function or are otherwise the private variables of that task.
A reentrant function may not call any other functions that are not themselves reentrant.
A reentrant function may not use the hardware in a non-atomic way.
Task states under an RTOS
Running: the microprocessor is executing the instructions that make up this task. In a single-processor system, only one task can be in the running state at any given time.
Ready: some other task is in the running state, but this task has things it could do if the microprocessor becomes available. Any number of tasks can be in this state.
Blocked: this task has nothing to do right now, even if the microprocessor is available. Tasks get into this state because they are waiting for some external event. Any number of tasks can be in this state as well.
Scheduler
It is part of the RTOS. It keeps track of the state of each task and decides which one should go into the running state.
The task that is assigned the highest priority gets the processor.
The scheduler does not fiddle with task priorities.
Tasks block themselves: a task moves into the blocked state when it decides that it has run out of things to do. Other tasks in the system, or the scheduler, cannot decide for a task to go into the blocked state.
Tasks and interrupt routines move tasks out of the blocked state: when a task is blocked, it never gets the microprocessor. An interrupt routine or some other task in the system must be able to send a signal to bring the task out of the blocked state; otherwise the task will be blocked forever.
The scheduler controls the running state: scheduling of tasks between the ready and running states is entirely the work of the scheduler.
How does the scheduler know when a task has become blocked or unblocked?
The RTOS provides a collection of functions that tasks can call to tell the scheduler what events they want to wait for and to signal that events have happened.
What happens if all the tasks are blocked?
If all the tasks are blocked, the scheduler spins in some tight loop somewhere inside the RTOS, waiting for something to happen.
What happens if two tasks are ready with the same priority?
It depends on the RTOS. In some systems it is illegal to assign the same priority to two tasks. In some RTOSs, time slicing between the tasks is done. In others, one task is run until it goes to the blocked state before the other gets the processor.
If a higher-priority task unblocks, what happens to the running task?
In a preemptive RTOS, the lower-priority task is stopped as soon as a higher-priority task unblocks.
In a non-preemptive RTOS, the processor is taken away from the lower-priority task only when that task blocks.
REQUIREMENTS ANALYSIS
Requirements and specifications are related but distinct steps in the design process
Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create an architecture.
Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.
The overall goal of creating a requirements document is effective communication between the customers and the designers.
Two types of requirements
o Functional: a functional requirement states what the system must do, such as compute an FFT
o Non-functional: a non-functional requirement can be any number of attributes, including physical size, cost, power consumption, design time, reliability, and so on
A good set of requirements reflects
o Correctness
o Unambiguousness
o Completeness
o Verifiability
o Consistency
o Modifiability
o Traceability
Design methodology
It refers to "the sequence of steps necessary to build something useful." The goals of the design process are
o Functionality
o Manufacturing cost
o Performance
o Power consumption
o Time-to-market
o Design cost
o Quality: customers not only want their products fast and cheap, they also want them to be of the right quality