
Vol 03, Issue 01; January-April 2012 International Journal of VLSI and Embedded Systems-IJVES

http://technicaljournals.org ISSN: 2249 – 6559

2011 - TECHNICALJOURNALS®, Peer Reviewed International Journals-IJCEA, IJESR, RJCSE, PAPER, ERL, IRJMWC, IRJSP,

IJEEAR, IJCEAR, IJMEAR, ICEAR, IJVES, IJGET, IJBEST – TJ-PBPC, India;

Indexing in Process - EMBASE, EmCARE, Electronics & Communication Abstracts, SCIRUS, SPARC, GOOGLE Database, EBSCO, NewJour, Worldcat,

DOAJ, and other major databases etc.,


DESIGN OF ROUND ROBIN AND INTERLEAVING ARBITRATION ALGORITHM FOR NOC

AMRUT RAJ NALLA, P. SANTHOSHKUMAR

1 M.Tech (Embedded Systems), 2 Assistant Professor, Department of Electronics and Communication Engineering,

SR Engineering College, Varadha Reddy College of Engineering, Warangal dist, AP, India.

[email protected], [email protected]

ABSTRACT

Networks on Chip (NOCs) are increasing in popularity because of their advantages, such as larger bandwidth and lower power dissipation through shorter wire segments. Communication in large SOCs is so important that many designers have adopted the NOC approach. The challenge lies in offering the best connectivity and throughput with the simplest and cheapest architecture. This paper proposes a multi-module architecture (masters or slaves) for a Network on Chip configuration. When multiple modules are involved, bus contention among the modules, masters and slaves, must be resolved. To resolve this bus contention problem, two algorithms, interleaving and normal round robin, are used in this architecture. In the normal mode, the round-robin architecture is used and, as a matter of configurability, an interleaving mode is also incorporated. The main design modules are the interleaving timer module, request register, data mux, mask register, state machine controller, and grant module. The design implements the interleaving mechanism to overcome the output delay (latency) in many-to-one and one-to-many data communication. The primitive modules of the overall architecture are designed, simulated, and synthesized in VHDL using various FPGA-based EDA tools.

Key Words: Network on Chip, FPGA, Throughput

1. INTRODUCTION

A round-robin token-passing bus or switch arbiter guarantees that no master starves and allows any unused time slot to be allocated to a master whose round-robin turn is later but who is ready now. The protocol of a round-robin token-passing bus or switch arbiter works as follows. In each cycle, one of the masters (in round-robin order) has the highest priority (i.e., owns the token) for access to a shared resource. If the token-holding master does not need the resource in this cycle, the master with the next highest priority that sends a request can be granted the resource. At the end of the cycle, the highest-priority master passes the token (in round-robin order) to the next master, which then has the highest priority for the next time slot. The next section shows the design of 2x2, 3x3 and 4x4 Bus Arbiters (BAs) generated by our tool. MxM SAs and BAs generated by our Round-robin Arbiter Generator (RAG) have a hierarchical structure for values of M greater than four.

The on-chip bus plays a key role in system-on-a-chip (SOC) design by enabling the efficient integration of heterogeneous system components such as CPUs, DSPs, application-specific cores, memories, and custom logic. Recently, as design complexity has increased, SOC designs have come to require a system bus with high bandwidth to perform multiple operations in parallel. To solve the bandwidth problem, several types of high-performance on-chip buses have been proposed, such as the multilayer AHB (ML-AHB) bus matrix from ARM, the PLB crossbar switch from IBM, and CONMAX from Silicore. Among them, the ML-AHB bus matrix has been widely used in many SOC designs. This is because of the simplicity of the AMBA bus of ARM, which attracts many IP designers, and because the architecture of the AMBA bus is well suited to low-power embedded systems. The ML-AHB bus matrix is an interconnection scheme based on the AMBA AHB protocol, which enables parallel access paths between multiple masters and slaves in a system.
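The token-passing protocol described in the Introduction can be sketched in a few lines of Python. This is only a behavioural model under stated assumptions, not the VHDL of this design: the function names `token_passing_arbiter` and `simulate` and the list-of-booleans request encoding are hypothetical and chosen for illustration.

```python
def token_passing_arbiter(requests, token):
    """Arbitrate one cycle.

    requests: list of booleans, one per master (True = requesting).
    token:    index of the master that owns the token in this cycle.
    Returns (grant, next_token); grant is None when nobody requests.
    """
    n = len(requests)
    grant = None
    # Scan in round-robin order starting at the token holder, so an
    # unused slot falls through to the next requesting master.
    for offset in range(n):
        candidate = (token + offset) % n
        if requests[candidate]:
            grant = candidate
            break
    # The token moves to the next master at the end of every cycle.
    next_token = (token + 1) % n
    return grant, next_token


def simulate(request_trace, token=0):
    """Run the arbiter over a list of per-cycle request vectors."""
    grants = []
    for requests in request_trace:
        grant, token = token_passing_arbiter(requests, token)
        grants.append(grant)
    return grants
```

In this sketch a master that is ready early can use a slot its predecessors leave idle, while the token still advances by one position per cycle, which is what bounds the waiting time of every master.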

2. ARBITRATION SCHEMES FOR THE ML-AHB BUSMATRIX OF ARM

The ML-AHB bus matrix of ARM consists of the input stage, the decoder, and the output stage, which includes an arbiter. Fig 1 shows the overall structure of the ML-AHB bus matrix of ARM. The input stage is responsible for holding the address and control information when a transfer to a slave cannot commence immediately. The decoder determines which slave a transfer is destined for. The output stage selects which of the various master input ports is routed to the slave. Each output stage has an arbiter. The arbiter determines which input stage performs a transfer to the slave and decides which input currently has the highest priority. The ML-AHB bus matrix employs slave-side arbitration, in which the arbiters are located in front of each slave port, as shown in Fig. 1. The master simply starts a transaction and waits for the slave response to proceed to the next transfer. Therefore, the unit of arbitration can be a transaction or a transfer. However, the ML-AHB bus matrix of ARM furnishes only transfer-based arbitration schemes, specifically transfer-based fixed-priority and round-robin arbitration schemes. The transfer-based fixed-priority (round-robin) arbiter multiplexes the data transfer based on a single transfer in a fixed-priority (round-robin) fashion.

Fig 1 Overall structure of the ML-AHB bus matrix of ARM

Arbitration scheme examples in an embedded system.

Internal structure of Round Robin arbiter

In Fig. 3, S_Number indicates the target slave number, P_Level is the priority level of a master, T_Length denotes the desired transfer length of a master, and Offset_Add specifies the internal address of the target slave. S_Number and P_Level each consist of 3 bits because the maximum number of master–slave sets is 8. T_Length is composed of 4 bits because the maximum burst length is 16. In total, 7 bits of the 32-bit address bus are used for P_Level and T_Length to notify the arbiters of the priority level and the desired transfer length of a master.
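The field widths given above can be illustrated with a small packing routine. The text fixes only the widths (3 bits for S_Number, 3 for P_Level, 4 for T_Length); the bit positions used here, with S_Number in the most significant bits and Offset_Add in the remaining 22 low bits, are an assumption made for this sketch, as are the function names `pack_address` and `unpack_address`.

```python
S_NUMBER_BITS = 3   # up to 8 master-slave sets
P_LEVEL_BITS = 3    # up to 8 priority levels
T_LENGTH_BITS = 4   # 16 possible transfer-length codes
# Assumed: whatever remains of the 32-bit address is Offset_Add.
OFFSET_BITS = 32 - S_NUMBER_BITS - P_LEVEL_BITS - T_LENGTH_BITS


def pack_address(s_number, p_level, t_length, offset_add):
    """Pack the four fields into one 32-bit word (assumed field order)."""
    fields = [
        ("S_Number", s_number, S_NUMBER_BITS),
        ("P_Level", p_level, P_LEVEL_BITS),
        ("T_Length", t_length, T_LENGTH_BITS),
        ("Offset_Add", offset_add, OFFSET_BITS),
    ]
    word = 0
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_address(word):
    """Inverse of pack_address: returns the four fields as a tuple."""
    offset_add = word & ((1 << OFFSET_BITS) - 1)
    word >>= OFFSET_BITS
    t_length = word & ((1 << T_LENGTH_BITS) - 1)
    word >>= T_LENGTH_BITS
    p_level = word & ((1 << P_LEVEL_BITS) - 1)
    word >>= P_LEVEL_BITS
    s_number = word & ((1 << S_NUMBER_BITS) - 1)
    return s_number, p_level, t_length, offset_add
```

With this assumed layout, an arbiter only needs to look at the top 10 bits of the address to learn the target slave, the priority level, and the requested transfer length.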


3. BLOCK DIAGRAM OF A CONVENTIONAL CROSSBAR SWITCH

As chip manufacturing technology shrinks toward the nanometre scale, a chip die will contain more and more processing blocks. The interconnection, communication, and shared-resource utilization of these blocks are becoming more complicated for a System-on-Chip (SoC). This complex structure poses an important challenge to the designer: fast and fault-free on-chip communication. When processing blocks access a shared resource simultaneously, these clients must be arbitrated. Priority encoders (PE) and arbiters are widely used to allow only one block at a time to access a shared resource. A priority encoding scheme always selects the request with the highest precedence, as defined by a fixed priority sequence. Because this scheme is static, it is unfair: a low-priority block may never be served, which is called starvation. In contrast, a programmable priority encoder (PPE) provides a non-static scheme in which the priority sequence can be altered during operation. PPEs are the core functional blocks of round-robin arbiters (RRA). If an arbiter is designed in round-robin fashion, the priority of the system is altered in every cycle, and the round-robin arbiter selects the highest-priority request starting from the last selected request. This technique almost always guarantees fairness.
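The difference between a static priority encoder and a PPE-based round-robin arbiter can be made concrete with a short behavioural sketch. The names `priority_encode`, `programmable_priority_encode`, and `RoundRobinArbiter` are hypothetical, and the choice to restart the search just after the last granted index is one common reading of "starting from the last selected request", assumed here.

```python
def priority_encode(requests):
    """Static priority encoder: the lowest index always wins."""
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None


def programmable_priority_encode(requests, start):
    """PPE: same search, but the highest-priority position is `start`."""
    n = len(requests)
    for offset in range(n):
        index = (start + offset) % n
        if requests[index]:
            return index
    return None


class RoundRobinArbiter:
    """Round-robin arbiter built around a programmable priority encoder."""

    def __init__(self, n):
        self.n = n
        self.pointer = 0  # position that currently has top priority

    def arbitrate(self, requests):
        grant = programmable_priority_encode(requests, self.pointer)
        if grant is not None:
            # Assumed update rule: the winner becomes lowest priority.
            self.pointer = (grant + 1) % self.n
        return grant
```

When clients 0 and 2 request continuously, the static encoder grants client 0 in every cycle, while the round-robin arbiter alternates between the two, which is the starvation-avoidance property described above.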

RRAs are widely used in network switches and routers, which use crossbar switches as their internal switching fabric, as shown in Fig 3.8. A crossbar switch comprises three macro blocks: an input FIFO buffer, an arbiter/scheduler, and a crossbar fabric core. Switch scheduling algorithms are an important part of implementing high-speed network switches. These algorithms are implemented in schedulers/arbiters. A scheduling algorithm selects input packets and generates the proper control signals for the crossbar fabric to set up conflict-free paths between input ports and output ports. Then, the crossbar core transfers the requests or packets according to the granted control signals. Hence, to ensure high speed and fairness, a crossbar switch requires an intelligent, centralized, and conflict-free scheduler/arbiter algorithm. Most crossbar schedulers are implemented in round-robin fashion to prevent starvation of input ports. For example, a well-known design is used in Stanford University's Tiny Tera prototype, whose scheduler combines the iSLIP unicast scheduling algorithm and the mRRM multicast scheduling algorithm.
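To show how a scheduler can set up conflict-free paths, the following sketch places one round-robin pointer in front of each output port. It is deliberately much simpler than iSLIP or mRRM and is not a model of the Tiny Tera scheduler: each input is assumed to request at most one output per cycle (the head of its FIFO), and the function name `schedule` and its arguments are hypothetical.

```python
def schedule(dest_of_input, pointers):
    """One scheduling cycle for a unicast crossbar.

    dest_of_input[i]: output requested by input i, or None if idle.
    pointers[o]:      round-robin pointer of output o (updated in place).
    Returns a dict mapping each granted output to the chosen input.
    """
    n_inputs = len(dest_of_input)
    grants = {}
    for out in range(len(pointers)):
        # Search the inputs in round-robin order for this output.
        for offset in range(n_inputs):
            inp = (pointers[out] + offset) % n_inputs
            if dest_of_input[inp] == out:
                grants[out] = inp
                pointers[out] = (inp + 1) % n_inputs
                break
    return grants
```

Because every input names a single output and every output grants a single input, no input or output appears twice in the result, so the grants can drive the crossbar select lines directly.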

4. SWITCH-BASED CROSSBAR STRUCTURE

The switch-based crossbar is made up of three major components: the Input Port module, the Switch module, and the Output Port module. Depending on the kind of channels and switches, there are two alternatives for designing a switch-based crossbar: unidirectional and bidirectional. In this paper, a unidirectional switch-based crossbar is selected and, because of resource constraints, simplified to the 6x6 switch-based crossbar structure shown in Fig. 4.1. Transmitting data from a Source Node to a Destination Node requires crossing the link between the Source Node and the Input Port module and the link between the Output Port module and the Destination Node; the Switch module in the data path dynamically establishes the link to the Output Port module according to the switching protocol. The switching protocol of the 6x6 switch-based crossbar architecture is shown in Fig. 4.2.


4.1 Switch Based Cross Bar Switch

Fig 4.2 16-bit Switching Protocol

Because of resource constraints on the target FPGA, the data width, the N. of words register, and the destination port register are 16, 10 and 6 bits, respectively. Moreover, a synchronous protocol, consisting of the Clock signal, the Ready signal and the Acknowledge signal, is applied to synchronize all modules in the switch-based crossbar.
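The register widths above suggest one way a transfer could be framed. The exact layout of Fig. 4.2 is not reproduced in the text, so the following is a guess made for illustration only: a single 16-bit header word carrying a 6-bit destination port mask (one bit per output port, so one-to-many transfers are possible) and the 10-bit N. of words count, followed by the 16-bit data words. The names `build_frame` and `parse_header` are hypothetical.

```python
DATA_BITS = 16       # data width
N_WORDS_BITS = 10    # width of the N. of words register
DEST_PORT_BITS = 6   # width of the destination port register


def build_frame(dest_ports, payload):
    """Return [header, word0, word1, ...] under the assumed layout."""
    if not 0 < len(payload) < (1 << N_WORDS_BITS):
        raise ValueError("payload length does not fit the 10-bit counter")
    dest_mask = 0
    for port in dest_ports:
        if not 1 <= port <= DEST_PORT_BITS:
            raise ValueError("port numbers run from 1 to 6")
        dest_mask |= 1 << (port - 1)  # assumed: bit 0 is port 1
    for word in payload:
        if not 0 <= word < (1 << DATA_BITS):
            raise ValueError("data words are 16 bits wide")
    header = (dest_mask << N_WORDS_BITS) | len(payload)
    return [header] + list(payload)


def parse_header(header):
    """Split a header word into (destination ports, number of words)."""
    n_words = header & ((1 << N_WORDS_BITS) - 1)
    dest_mask = header >> N_WORDS_BITS
    ports = [p for p in range(1, DEST_PORT_BITS + 1)
             if dest_mask & (1 << (p - 1))]
    return ports, n_words
```

Under this assumption the 6 + 10 header bits fill exactly one 16-bit word, which is why the header and the data can share the same data path.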

4.2 Input Port module:

In this section, the architecture and behavior of the Input Port module are detailed. As shown in Fig. 4.3(a), this module has three components: a 16-bit tri-state buffer, a 1-bit tri-state buffer, and a finite state machine (FSM) component, in which the N. of words register and the destination port register reside. Its behavior, which follows the switching concept, is shown in Fig. 4.3(b).

INPUT MODULE

4.3 Output Port module:

In this section, the Output Port module architecture and its flow diagram are explained in detail. As shown in Fig. 4.4(a), this module has three components: a 16-bit six-port multiplexer, which multiplexes the Data in signals coming from the Switch module; the Round-Robin component, which guarantees fairness among all input requests; and the FSM component, which holds the 10-bit counter register, used to count down the number of packets passing through, and the 6-bit Port Status register, used to identify which inputs may occupy the path of this Output Port module. Since the interleaving mechanism is based on Time-Division Multiplexing (TDM), the 16-bit six-port multiplexer can multiplex all Data in signals according to the time slot. For instance, Fig. 4.5(a) and 4.5(b) show the timing diagram and the 6-bit Port Status register for two cases: when all Source Nodes, and when Source Nodes 1, 2, 5 and 6, need to transfer data to one Destination Node, the 6-bit Port Status register in the Destination Node will be 111111 and 110011, respectively.
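The time-slot assignment implied by the Port Status register can be sketched as follows. The register is read with the leftmost bit as port 6 and the rightmost as port 1, which matches the grant register example in the text (101100 for ports 3, 4 and 6). That every active port owns one slot per round, in ascending port order, is an assumption of this sketch, and the names `active_ports` and `tdm_schedule` are hypothetical.

```python
def active_ports(status):
    """Decode a register string such as '110011' into port numbers.

    The leftmost character is taken as the highest-numbered port.
    """
    n = len(status)
    return sorted(n - i for i, bit in enumerate(status) if bit == "1")


def tdm_schedule(status, n_slots):
    """Return the port that owns each of the first n_slots time slots."""
    ports = active_ports(status)
    if not ports:
        return []
    # Assumed policy: slots rotate over the active ports only.
    return [ports[slot % len(ports)] for slot in range(n_slots)]
```

For the register value 110011 this yields the slot order 1, 2, 5, 6 and then repeats, so the multiplexer select input can be derived from the slot counter and the register alone.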

Fig. 4.4(b) explains its behavior. At first, the FSM is in the Idle state, and the mode signals are read; there are two possible modes, normal mode and interleaving mode. The Round Robin component is enabled in the normal mode and disabled in the interleaving mode. Whenever a Source Node needs to transfer its data to a Destination Node, the req signal of the Input Port module connected to the Source Node is driven HIGH. Then, at the Output Port module connected to the Destination Node, each bit of the 6-bit grant register that corresponds to an asserted req signal is set to “1”. The N. of words signal and the N. of words valid signal are read and stored in a temporary register. Then, the state goes to the Write state. In the Write state, each bit of the grant register is read to check the Source Nodes’ requests and to identify the next state. For simplicity, suppose the grant register contains 101100. This implies that Input Port modules 3, 4 and 6 will forward their data to the Output Port module. Therefore, the FSM starts at the Run3 state, and Data in ready signals 3, 4 and 6 are driven HIGH. After the Run6 state has executed, the state goes to the Check state, where the counter register that counts the forwarded packets is checked to determine whether all packets have been sent. Once they have all been sent, the counter register is “0000000000” and the FSM enters the Clear state. In the Clear state, req signals 3, 4 and 6 are driven LOW, the grant register is reset to “000000”, and the FSM begins the next cycle.
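The state sequence described above can be traced with a small behavioural model. The text does not say what happens when the Check state finds a non-zero counter, so this sketch assumes that the FSM loops back through the Run states and that the counter decrements once per pass; both are assumptions, as are the names `granted_ports` and `output_port_trace`.

```python
def granted_ports(grant):
    """Decode a grant register string; leftmost bit is port 6."""
    n = len(grant)
    return sorted(n - i for i, bit in enumerate(grant) if bit == "1")


def output_port_trace(grant, n_words):
    """List the states visited for one transfer cycle.

    grant:   6-character grant register, e.g. '101100'.
    n_words: initial value of the 10-bit counter register.
    """
    ports = granted_ports(grant)
    if not ports or n_words <= 0:
        return ["Idle"]  # nothing requested: stay in Idle
    trace = ["Idle", "Write"]
    counter = n_words
    while counter > 0:
        # One pass over the Run states of the granted ports only.
        for port in ports:
            trace.append(f"Run{port}")
        counter -= 1  # assumed: one decrement per pass
        trace.append("Check")
    trace.append("Clear")  # counter reached zero
    trace.append("Idle")   # the next cycle begins
    return trace
```

For the example grant register 101100 the trace visits Run3, Run4 and Run6 before each Check, and reaches Clear only once the counter has run down to zero.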

RESULTS

Simulation Results generated by using Active HDL Tool

5. CONCLUSION

In this project, a flexible arbiter based on the SM arbitration scheme for the ML-AHB bus matrix is proposed. The proposed arbiter supports three priority policies (fixed priority, round-robin, and dynamic priority) and three approaches to data multiplexing (transfer, transaction, and desired transfer length). Thus there are nine possible arbitration schemes. In addition, the proposed SM arbiter selects one of the nine possible arbitration schemes based on the priority-level notifications and the desired transfer length from the masters, so that the arbitration leads to the maximum performance. Experimental results show that, although the area of the proposed SM arbitration scheme is 9%–25% larger than those of other arbitration schemes, the proposed arbiter improves the throughput by 14%–62% compared with other schemes. Hence, the SM arbitration scheme is expected to be well suited to application-specific systems, because the arbitration scheme is easy to tune to the features of the target system.


ACKNOWLEDGEMENTS

The authors would like to thank the management of SR Educational Society and the Department of ECE for their encouragement and support during this work. Special thanks go to Dr. Prasad Rao, HOD of the ECE department, for his continuous help in pursuing our research work.


Authors Biography:

Amrut Raj Nalla received his B.Tech. from Kakatiya University, Warangal (Dist), AP, India in 2007 and is pursuing the M.Tech. degree in Embedded Systems at J.N.T. University (2012). He worked as an Assistant Professor at Kakatiya Institute of Technology and Sciences from 2007 to 2010. He is a member of IETE, has guided many B.Tech. projects, and has taught several subjects, including VLSI Design, Microprocessors and Microcontrollers, Integrated Electronics, and Wireless Communication and Networks.

P. Santhosh Kumar received his B.Tech. degree from J.N.T. University, Hyderabad, India in 2009 and the M.Tech. degree in Embedded Systems from J.N.T. University in 2011. He has been an Assistant Professor with the Department of Electronics and Communication Engineering at S.R. College of Engineering (Autonomous University) since 2010. His research interests include an Ogg Vorbis decoder on the ARM platform. He has taught several subjects, including Analog Communication, Radar Systems, Electronics and Measurements Instrumentation, Digital Signal Processor Architectures, and Microprocessors and Controllers.