Page 1: [IEEE 2011 4th International Conference on Emerging Trends in Engineering and Technology (ICETET) - Port Louis, Mauritius (2011.11.18-2011.11.20)] 2011 Fourth International Conference

Design and implementation of High speed, Low area Multiported loadless 4T Memory cell

Deepa Yagain#1, Ankit Parakh#2, Akriti Kedia#3, Gunjan kumar Gupta#4

#Dept of ECE, PESIT Bangalore, Karnataka, India

[email protected],[email protected], [email protected],[email protected]

Abstract: In several applications, embedded SRAMs can occupy the majority of the chip area and contain hundreds of millions of transistors. Since RAMs are critical to processor performance, researchers have sought to optimize their performance and efficiency through reconfiguration [1]. This paper presents the architecture and circuit design for a multiported SRAM building block. An SRAM cell with load (6T) and one without load (loadless 4T) are designed and implemented in 180nm technology and compared in terms of power consumed, area used and access time. The loadless 4T SRAM cell is found to consume less power and occupy less area than the 6T SRAM cell. As an application example, an 8X1 memory using the loadless 4T cell is implemented. Its address decoding is done with both a traditional CMOS decoder and a Lyon-Schediwy decoder; the latter is observed to perform much better in terms of power and timing and to be more area efficient. The 6T and loadless 4T memory cells are then extended for multiport operation, simulated, and compared on area, power and delay.

Keywords- 6T Memory, 4T Memory, Loadless, Lyon Schediwy decoder, access time.

I. INTRODUCTION

SRAM cell design considerations are important for a number of reasons. First, the design of an SRAM cell is key to ensuring stable and robust SRAM operation. Second, owing to the continuous drive to enhance on-chip storage capacity, SRAM designers are motivated to increase the packing density. Besides the basic storage cells (the conventional 6T-SRAM cell), a memory array needs other circuits: precharge circuits, sense amplifiers, decoders and write drivers. In this paper these circuits are assembled to form a functional memory array in 180nm technology. The power and area of the memory array are then optimized by using the new loadless 4T-SRAM cell. A multiport memory cell is one that can be read and written through more than one port. Such cells are much needed in high speed processors.

The general block diagram of an SRAM array is shown in Figure 1. An n*n SRAM array contains a total of $n^2$ SRAM cells, each storing one bit of information. The decoder block selects one of the n rows by enabling a particular word line (WL) of the SRAM cell array, depending on the input address given to it. Sense amplifiers amplify the voltage coming off the bit lines when the Read Enable (RE) signal is asserted, which increases the speed and decreases the power dissipation of the SRAM. Power consumption is strongly affected by the choice of supply voltage; the circuits in this paper are therefore simulated using a 180nm technology file, one of the deep submicron technologies, with a 1.8V power supply [2][3].

Figure 1: Block diagram of SRAM array.
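The array organisation described above can be sketched as a small behavioural model. This is only an illustration of the row-select idea, not the circuit simulated in the paper; the class name `SramArray` and its methods are hypothetical names introduced here.

```python
class SramArray:
    """Behavioural sketch of an n x n SRAM array with a one-hot row decoder.

    Hypothetical model for illustration only: it ignores bit-line
    precharge, sensing delay and all electrical effects.
    """

    def __init__(self, n):
        self.n = n
        # n rows of n one-bit cells (n * n cells in total), all cleared to 0.
        self.cells = [[0] * n for _ in range(n)]

    def decode(self, address):
        """Return the word-line vector: exactly one of n lines is high."""
        if not 0 <= address < self.n:
            raise ValueError("address out of range")
        return [1 if row == address else 0 for row in range(self.n)]

    def write_row(self, address, bits):
        """Store n bits in the row whose word line is enabled."""
        if len(bits) != self.n:
            raise ValueError("expected one bit per column")
        for row, wl in enumerate(self.decode(address)):
            if wl:
                self.cells[row] = list(bits)

    def read_row(self, address, read_enable=True):
        """Return the selected row, or None when Read Enable is low."""
        if not read_enable:
            return None
        row = self.decode(address).index(1)
        return list(self.cells[row])
```

For example, `SramArray(4)` holds 16 cells, and `decode(2)` enables only the third word line.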

II. ANALYSIS OF MEMORY MODULES

SRAM cell types differ in the kind of load used in the elementary inverters of the flip-flop cell. Two types of SRAM memory cell are presented in this paper:

• The 6T cell (six transistors - four NMOS transistors plus two PMOS transistors).

• The new loadless 4T cell [4] (four transistors - two NMOS transistors plus two PMOS transistors).

A. 6T Memory Cell

In the 6T memory cell shown in Figure 2, the load is a PMOS transistor rather than a resistor. The cell is composed of six transistors: one NMOS transistor and one PMOS transistor for each inverter, plus two NMOS transistors for access. This configuration is called a 6T cell [6]. It offers better electrical performance (speed, noise immunity, standby current) than a resistive-load 4T structure. Its main disadvantage is its large size. Each bit in an SRAM is stored on four transistors that form two cross-coupled inverters. This storage cell has two stable states, which are used to denote 0 and 1. Two additional access transistors control access to the storage cell during read and write operations. Access to the cell is enabled by the word line (WL in Figure 2), which controls the two access transistors M5 and M6; these in turn control whether the cell is connected to the bit lines BL and ~BL. The bit lines are used to transfer data for both read and write operations. Although two bit lines are not strictly necessary, both the signal and its inverse are typically provided in order to improve noise margins.

2011 Fourth International Conference on Emerging Trends in Engineering & Technology

978-0-7695-4561-5/11 $26.00 © 2011 IEEE

DOI 10.1109/ICETET.2011.23

Figure 2: 6T memory cell.

A.1 Reading

Assume that the content of the memory is a 1, stored at Q. The read cycle starts by precharging both bit lines to a logical 1 and then asserting the word line WL, which enables both access transistors. In the second step, the values stored at Q and ~Q are transferred to the bit lines: BL is left at its precharged value, while ~BL (also written BLB) is discharged towards a logical 0 through M5 and M1 (M1 is turned on because Q is at logical 1). On the BL side, transistors M4 and M6 hold the bit line at VDD, a logical 1 (M4 is turned on because ~Q is at logical 0). If the content of the memory were a 0, the opposite would happen: ~BL would be held at 1 and BL would be pulled towards 0. BL and ~BL therefore develop a small voltage difference, delta, and feed a sense amplifier, which senses which line has the higher voltage and thus determines whether a 1 or a 0 was stored. The higher the sensitivity of the sense amplifier, the faster the read operation.

Figure 3: Read Operation.
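The read sequence (precharge, word-line assertion, partial discharge of one bit line, sensing) can be sketched behaviourally as follows. The function name `read_6t` and the bit-line swing `delta_v` are assumptions made for illustration; only the 1.8V supply comes from the paper.

```python
VDD = 1.8  # supply voltage used in the paper, in volts


def read_6t(q, delta_v=0.1):
    """Sketch of one read of a 6T cell storing bit q (0 or 1).

    delta_v is a hypothetical bit-line swing developed before sensing.
    Returns (bl, blb, sensed_bit).
    """
    if q not in (0, 1):
        raise ValueError("q must be 0 or 1")
    # Step 1: precharge both bit lines to VDD (logical 1).
    bl, blb = VDD, VDD
    # Step 2: WL is asserted; the side of the cell holding 0
    # discharges its bit line by a small amount.
    if q == 1:
        blb -= delta_v  # ~Q = 0 pulls ~BL down
    else:
        bl -= delta_v   # Q = 0 pulls BL down
    # Step 3: the sense amplifier reports which line is higher.
    sensed_bit = 1 if bl > blb else 0
    return bl, blb, sensed_bit
```

The stored value is never overwritten in this model, which mirrors the requirement that the read be non-destructive.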

A.2 Writing

A write cycle begins by applying the value to be written to the bit lines. To write a 0, BL is set to 0 and ~BL to 1. A 1 is written by inverting the values of the bit lines. WL is then asserted and the value to be stored is latched in. This works because the bit-line input drivers are designed to be much stronger than the relatively weak transistors in the cell itself, so they can easily override the previous state of the cross-coupled inverters. Careful sizing of the transistors in an SRAM cell is needed to ensure proper operation.

Figure 4: Write Operation.
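A minimal logic-level sketch of the write and hold behaviour is given below. It assumes ideal, sufficiently strong write drivers, so transistor sizing is not modelled; the class `Cell6T` and its `access` method are hypothetical names.

```python
class Cell6T:
    """Logic-level sketch of a 6T cell (illustrative, not a circuit model)."""

    def __init__(self, q=0):
        self.q = q

    @property
    def q_bar(self):
        # The cross-coupled inverters keep ~Q as the complement of Q.
        return 1 - self.q

    def access(self, wl, bl=None, blb=None):
        """Apply word-line and bit-line values to the cell.

        With wl low the access transistors are off and the cell holds.
        With wl high and complementary bit lines driven, the (assumed
        stronger) write drivers override the stored state.
        """
        if not wl:
            return  # hold: cell is isolated from the bit lines
        if bl is None or blb is None:
            return  # bit lines not driven: a read, state unchanged
        if bl == blb:
            raise ValueError("bit lines must be complementary for a write")
        self.q = bl
```

Writing a 0 is then `cell.access(wl=1, bl=0, blb=1)`, and any bit-line activity with `wl=0` leaves the stored bit unchanged.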

If the word line is not asserted, the access transistors M5 and M6 disconnect the cell from the bit lines. The two cross-coupled inverters formed by M1–M4 will continue to reinforce each other as long as they are connected to the supply. The schematic showing the 6T cell along with the sense amplifier, precharge and write driver circuits is given in Figure 5.

Figure 5: A schematic showing 6T cell along with sense amplifier, precharge and write drive circuit.

B. New Loadless 4T-SRAM Cell

In the new loadless 4T-SRAM cell [5], two NMOS transistors are used as pass transistors to access the cell and two PMOS transistors are used as drivers for the cell. The bit lines are precharged to ground instead of VDD. For comparable speed and stability, the area and the power consumption of the new loadless 4T-SRAM cell are lower than those of the conventional 6T-SRAM cell.


Compared to the 6T and the earlier 4T designs, this cell has lower power consumption, smaller area and shorter access time. Read and write operations can be performed on the 4T circuit as shown in Figure 7.

Figure 6: loadless 4T SRAM cell.

Figure 7: Schematic of 4T cell along with sense amplifier, precharge and write drive circuit.

III. ANALYSIS OF SENSE AMPLIFIERS, WRITE DRIVERS AND PRECHARGE CIRCUITS

A. Sense Amplifier

Sense amplifiers (SA) [7] are an important component in memory design. The primary function of a SA in SRAMs is to amplify a small analog differential voltage developed on the bit lines by a read-accessed cell to the full swing digital output signal, thus greatly reducing the time required for a read operation. Since SRAMs do not feature data refresh after sensing, the sensing operation must be nondestructive.

A latch-type SA is shown in Figure 8. This type of SA is formed by a pair of cross-coupled inverters, much like a 6T SRAM cell. Sensing starts with biasing the latch-type SA in the high-gain metastable region. When a cell is accessed by the word line WL, one of the bit lines, BL or BLB, is discharged until a sufficient voltage differential has developed, and the SA is then enabled by a high-to-low transition of the SAE pulse. This lets BL and BLB reach the required voltage levels sooner.

Figure 8: Typical circuit with a latch-type sense amplifier.
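As a rough illustration of why sensing a small differential shortens the read, the bit-line discharge can be approximated with a constant cell current. The symbols $C_{BL}$ (bit-line capacitance), $I_{cell}$ (cell read current) and $\Delta V$ (differential required by the SA), and all numerical values below except the 1.8V supply, are assumptions chosen for illustration, not figures from the paper:

```latex
\[
t_{\text{sense}} \approx \frac{C_{BL}\,\Delta V}{I_{cell}}
  = \frac{100\,\text{fF} \times 0.1\,\text{V}}{50\,\mu\text{A}} = 0.2\,\text{ns},
\qquad
t_{\text{full}} \approx \frac{C_{BL}\,V_{DD}}{I_{cell}}
  = \frac{100\,\text{fF} \times 1.8\,\text{V}}{50\,\mu\text{A}} = 3.6\,\text{ns}.
\]
```

Under these assumed values the SA can be enabled about 18 times earlier than if the cell had to swing the bit line fully.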

B. Write Driver

The function of the SRAM write driver is to quickly discharge one of the bit lines from the precharge level to below the write margin of the SRAM cell. Normally, the write driver is enabled by the Write Enable (WE) signal and drives the bit line with a full-swing discharge from the precharge level to ground. The order in which the word line is enabled and the write drivers are activated is not crucial for a correct write operation. WE and its complement WEB are used to discharge BL or BLB through the NMOS transistors. The write driver is presented in Figure 9.

Figure 9: Write driver circuit.


C. Precharge Circuits

Integrated semiconductor memory devices must be capable of high speed operation. To improve the overall speed of memory devices, a bit-line precharge circuit is conventionally used to set the bit lines to a desired voltage level prior to a write or read operation, so that the speed of the memory device is maximized during those operations. The precharge circuit precharges the bit lines of the SRAM cell every clock cycle by raising both BL and BLB to VDD during the low phase of the clock. The pFETs on either side are the main functioning units of the precharge circuit, while the pFET at the bottom serves as an equalizer, compensating for non-idealities between the other two pFETs [6].

Figure 10.1: Schematic of precharge circuit for 6T SRAM.

The precharge circuit for loadless 4T RAMs is shown in Figure 10.2. Two transistors precharge the bit lines, while the third equalizes them to ensure that both bit lines of a pair are at the same potential before the cell is read.

Figure 10.2: Schematic of precharge circuit for new loadless 4T SRAM.

IV. DESIGN & ANALYSIS OF MULTIPORTED SRAM

Multi-port memories are commonly used components in VLSI systems, for example as register files in microprocessors and as storage for media or network applications. The content of a multi-port memory can be accessed through different ports simultaneously. This feature is especially valuable for high speed processors, media processors, and communication processors. Multi-port memories require more testing effort since all ports have to be verified. Register files are generally fast SRAMs with multiple read and write ports. A general multi-ported 6T SRAM cell and a multi-ported loadless 4T SRAM cell are shown in Figures 11.1 and 11.2.

Figure 11.1: Multiported 6T SRAM Cell.

Figure 11.2: Multi-ported 4T loadless SRAM.
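Since the paper measures area in number of transistors, the effect of adding ports can be sketched with a simple count. This sketch assumes that every port contributes one differential pair of access transistors; the circuits in Figures 11.1 and 11.2 may use a different port structure, so the numbers are illustrative only. All function names are hypothetical.

```python
def transistor_count(storage_transistors, ports, access_per_port=2):
    """Transistors per cell, ASSUMING each port adds a pair of access devices."""
    if ports < 1:
        raise ValueError("a cell needs at least one port")
    return storage_transistors + access_per_port * ports


def cell_6t(ports=1):
    # Two cross-coupled inverters: 4 storage transistors.
    return transistor_count(4, ports)


def cell_4t_loadless(ports=1):
    # Two PMOS drivers only: 2 storage transistors.
    return transistor_count(2, ports)
```

With one port the counts reduce to the familiar 6 and 4 transistors; under the stated assumption the loadless cell stays two transistors smaller for any number of ports.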

V. SIMULATION RESULTS

Figure 12: Simulation waveforms of basic 6T memory cell.


Figure 13: Simulation waveforms of basic 4T Load less memory cell.

Figure 14: Simulation waveforms of basic 6T multiport memory cell.

Figure 15: Simulation waveforms of 4T Load less multiport memory cell.

The comparison between the basic and multiport 4T and 6T memory cells is presented in Tables 1 and 2. The parameters compared are area (in terms of number of transistors), power and delay. The 4T memory cell performs better on all of these parameters.

Table 1: Comparison between 6T and 4T loadless memory cells.

Table 2: Comparison between 6T and 4T loadless multiport memory cells.

VI. APPLICATION EXAMPLE

In this paper an 8X1 memory is implemented as an application example. Decoding is the process by which the word line for a particular memory cell is enabled. Here the decoding operation is performed using:

• Conventional decoder circuit

• Lyon-Schediwy decoder

An address decoder is inserted for selecting memory words. A word is selected by providing a binary-encoded address word (A0 to Ak-1), which the decoder translates into $N = 2^k$ select lines. The address decoder thus reduces the number of external address lines from N to $\log_2 N$. The logical effort of a decoder can be reduced by observing that only one of the outputs will be high, so the pMOS transistors can be shared among many outputs. The Lyon-Schediwy decoder [8] can be viewed as $2^n$ n-input NOR gates sharing pMOS pull-ups, as shown in Figure 17 for a 3:8 decoder. Relative transistor widths are chosen to present the same capacitance to each input while providing current drive equal to that of a unit inverter. The logical effort of each input also decreases. Decoders need many stages, so the fastest design is the one that minimizes the logical effort, which the Lyon-Schediwy decoder achieves well.
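The logical function of a NOR-based decoder can be sketched as below. This models only the truth table (k address bits in, $2^k$ one-hot select lines out); it does not capture the shared pMOS pull-ups or the transistor sizing that distinguish the Lyon-Schediwy circuit. The function names `nor` and `decode` are hypothetical.

```python
def nor(*inputs):
    """Logical NOR: high only when every input is low."""
    return 0 if any(inputs) else 1


def decode(address_bits):
    """One-hot decode of address bits [A0, ..., Ak-1] (A0 is the LSB).

    Output i is the NOR of k signals: A_j where bit j of i is 0,
    and the complement of A_j where bit j of i is 1. All k signals
    are low, and the output high, only when the address equals i.
    """
    k = len(address_bits)
    outputs = []
    for i in range(2 ** k):
        gate_inputs = [
            a if ((i >> j) & 1) == 0 else 1 - a
            for j, a in enumerate(address_bits)
        ]
        outputs.append(nor(*gate_inputs))
    return outputs
```

For a 3:8 decoder, the address A2A1A0 = 101 is passed as `decode([1, 0, 1])` and raises only select line 5.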


Figure 16: Conventional decoder.

Figure 17: Lyon-Schediwy decoder.

Figure 18: Schematic of 8X1 Memory.

VII. CONCLUSION

The new loadless multiport 4T-SRAM cell is designed and analysed in 180nm and 130nm. It has established technology independence and is consistent in performance in the deep sub-micron regime. The multiport loadless 4T SRAM array consumes less power and less area than the conventional 6T SRAM array.

This multiport loadless 4T SRAM array can be used for on-chip caches in embedded microprocessors, high-density SRAMs embedded in any logic devices, as well as for stand-alone SRAM applications.

VIII. FUTURE SCOPE

The loadless 4T-SRAM cell can be analyzed for its noise immunity with respect to the 6T multiport RAM cell. As an enhancement, a higher density memory array can be simulated along with the peripherals and its performance observed. The design also needs to be verified for deep submicron technologies such as 90nm and 65nm.

IX. REFERENCES

[1] “Architecture and Circuit Techniques for a 1.1-GHz 16-kb Reconfigurable Memory in 0.18-μm CMOS”, IEEE Journal of Solid-State Circuits, vol. 40, no. 1, January 2005.

[2] J. B. Burr and A. M. Peterson, “Ultra low power CMOS technology,” in Proc. NASA VLSI Design Symp., Oct. 1991, pp. 4.2.1–4.2.13.

[3] A. Bryant et al., “Low-power CMOS at Vdd = 4kT/q,” in Proc. Device Research Conf., Jun. 25–27, 2001, pp. 22–23.

[4] Yen-Jen Chang, Shanq-Jang Ruan, and Feipei Lai, “Design and Analysis of Low-Power Cache using Two-Level Filter Scheme”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 4, August 2003.

[5] Jinshen Yang and Li Chen, “A New loadless 4-transistor SRAM cell with a 0.18μm CMOS technology”, Electrical and Computer Engineering, CCECE Canadian Conference, April 2007.

[6] Neil H. E. Weste, David Harris and Ayan Banerjee, “CMOS VLSI design: a circuit and system perspective”.

[7] Jan M. Rabaey, Anantha P. Chandrakasan, and Borivoje Nikolic, “Digital Integrated Circuits”, PHI, 2000.
