ee457 Quiz Fall2016 -


Page 1: ee457 Quiz Fall2016 -

September 22, 2016 10:09 am    EE457 Quiz - Fall 2016    1 / 12    © Copyright 2016 Gandhi Puvvada

EE457 Quiz (~10%)
Closed-book Closed-notes Exam; No cheat sheets; No cell phones or computers

Calculators and Verilog Guides are not needed and hence not allowed.

Fall 2016
Instructor: Gandhi Puvvada

Thursday, 9/22/2016 (A 2H 50M exam)
05:30 PM - 08:20 PM (170 min) in THH101

Viterbi School of Engineering
University of Southern California

Ques#  Topic                                  Page#  Time      Points  Score
1      State Diagram, RTL Design              2-5    50 min.   68
2      CPU Performance                        5      20 min.   28
3      Unsigned and Signed numbers            6      20 min.   35
4      MIPS ISA, Byte-addressable processors  7-9    45 min.   62
5      Single-Cycle CPU                       10-11  25 min.   45
Total                                         11     160 min.  238

Perfect Score 230

Student’s Last Name: _______________________________________

Student’s First Name: _______________________________________

Student’s DEN D2L username: ______________________________ @usc.edu

Page 2: ee457 Quiz Fall2016 -


1 ( 7 + 8 + 3 + 12 + 28 = 58 points) 45 min.

State Diagram and RTL design (Iteration counter advancement and terminal count checking):

1.1 A rectangular array of bits, A[I,J], is to be cleared starting from A[1,5] and ending at A[5,1]. It is a total of 25 bits as shown in the diagram. Row index I goes from 1 to 5 and column index J goes from 5 to 1. The clearing is done row by row from the top row (I == 1) to the bottom row (I == 5). Within a row, we go from the right end (J == 5) to the left end (J == 1). It is a row-major order as shown in the pseudo code. Complete the state diagram below.

1.1.1 You know that a later assignment overrides an earlier assignment in a procedural block of Verilog HDL. Complete the two RTL snippets for the CLR case branch. You can use a statement such as I <= I; or J <= J; if needed in your if statement.

In the left-side RTL, J is by default decremented and this decrement is overridden as needed.

In the right-side RTL, I is by default incremented and this increment is overridden as needed.

for (I = 1; I <= 5; I++) {
    for (J = 5; J >= 1; J--)
        A[I][J] = 0;
}

[Figure: 5 x 5 array of bits; rows indexed by I = 1..5, columns indexed by J = 1..5.]

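State diagram for Q#1.1 (7pts)

States: INI, CLR, DONE. RESET enters INI.
Edge labels (overbars on complemented labels are not preserved in this transcript): START, START, C, C, ACK, ACK.

INI:   I <= 1; J <= 5;

CLR:   A[I,J] <= 0;
       if (J != ____) J <= J - 1;
       else { J <= ____; I <= I + 1; }

C = (I == ____) (J == ____)

When you reach the DONE state, what are the values of I, J? I = ____; J = ____;

Number of clocks spent in the CLR state = ____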

RTL snippets for Q#1.1.1 (8pts)

Left-side RTL:
CLR: begin
       A[I,J] <= 0;
       J <= J - 1; // by default
       if (

     end

Right-side RTL:
CLR: begin
       A[I,J] <= 0;
       I <= I + 1; // by default
       if (

     end
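The row-major order described in Q#1.1 can be checked with a short simulation. The following is a minimal Python sketch that is not part of the original quiz; the function name `clearing_order` is hypothetical. It mirrors only the given `for` loops and does not fill in the blanks of the state diagram.

```python
def clearing_order(rows=5, cols=5):
    """Return the (I, J) pairs in the order the pseudo code clears them."""
    order = []
    for i in range(1, rows + 1):        # I = 1 .. 5, top row to bottom row
        for j in range(cols, 0, -1):    # J = 5 .. 1, right end to left end
            order.append((i, j))
    return order


order = clearing_order()
# Number of locations visited, then the first and the last location cleared.
print(len(order), order[0], order[-1])
```

Each entry of the list corresponds to one A[I,J] <= 0 assignment, so the list length is the number of locations cleared.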

Page 3: ee457 Quiz Fall2016 -


1.2 How many B[K] locations are cleared in the completed state diagram below? ____________

1.3 Complete the state diagram below to clear 50 bits of A[I,J,K] (25 bits of A[I,J,2] and 25 bits of A[I,J,1]). As in Q#1.1, for each of the two values of K, I and J can be made to go from [I,J] = [1,5] to [I,J] = [5,1]. The array A is maintained in a flash memory. You perhaps know that a flash memory takes several clocks to complete a write. For the sake of this problem, let us say that we need a minimum of two clocks to write to it, and that we need to wait until we get a positive indication from the flash memory status output FWD (FWD = Flash memory Writing Done) in the form of (FWD == 1). The flash memory is not expected to update FWD properly in the first clock when you start writing. So we should not assume that it has finished writing even if we see FWD = 1 during the first clock. A flag F, initially cleared to zero, can be used to make sure that we keep I, J, and K unchanged for at least one clock. At the end of the first clock, the flag F is set, indicating that FWD can be relied upon. From the 2nd clock onwards, we wait for (FWD == 1) to update the location coordinates I, J, and K, and to clear the flag F.

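State diagram for Q#1.2 (3pts)

States: INI, CLR, DONE. RESET enters INI.
Edge labels (overbars on complemented labels are not preserved in this transcript): START, START, ACK, ACK, K = 1, K = 1.

INI:   K <= 2;

CLR:   B[K] <= 0;
       K <= K - 1;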

for (K = 2; K >= 1; K--) {
    for (I = 1; I <= 5; I++) {
        for (J = 5; J >= 1; J--)
            A[I,J,K] = 0;
    }
}

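State diagram for Q#1.3 (12pts)

States: INI, CLR, DONE. RESET enters INI.
Edge labels (overbars on complemented labels are not preserved in this transcript): START, START, ACK, ACK, C, C.

INI:   I <= 1; J <= 5; K <= 2; F <= 0;

CLR:   A[I,J,K] <= 0;
       if (J != ____) J <= J - 1;
       else if ( ________ ) { J <= ____; I <= I + 1; }
       else {

       }

C = ____________

When you reach the DONE state, what are the values of I, J, K, F? I = ____; J = ____; K = ____; F = ____;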
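The flash write handshake of Q#1.3 (ignore FWD in the first clock, set F, then wait for FWD == 1) can be sketched as a small clock-by-clock model. This is a minimal Python illustration, not the quiz solution. The function name `clocks_for_one_write` and the idea of passing FWD as a list of per-clock samples are assumptions made for the example.

```python
def clocks_for_one_write(fwd_samples):
    """Count the clocks spent on one A[I,J,K] location.

    fwd_samples[n] is the FWD value seen during clock n + 1 (hypothetical trace).
    """
    f = 0                                   # flag F, cleared before the write starts
    for clock, fwd in enumerate(fwd_samples, start=1):
        if f == 0:
            f = 1                           # first clock: FWD is not trusted, only set F
        elif fwd == 1:
            return clock                    # FWD is reliable and asserted: move on (F <= 0)
    raise ValueError("FWD never asserted after the first clock")


print(clocks_for_one_write([1, 1]))         # FWD = 1 in the first clock is ignored
print(clocks_for_one_write([1, 0, 0, 1]))   # a slower write
```

With the fastest possible flash (FWD asserted in the 2nd clock of every write), each location costs two clocks under this model.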

Page 4: ee457 Quiz Fall2016 -


1.4 You have done the Min/Max lab Part 1 (with two comparators) and Part 2 (with one comparator). Part 1 takes a constant number of clocks (1 + 15 = 16), whereas Part 2 takes a variable number of clocks (1 + 15 = 16 at minimum for ascending data and 1 + 15*2 = 31 at maximum for descending data). Here you are given a dual-port memory M and two comparators, and are asked to improve the best case. Comparator #0 looks at all even-numbered locations M[I] and arrives at one set of results, Max0 and Min0. Comparator #1 looks at all odd-numbered locations M[J] and arrives at another set of results, Max1 and Min1. Finally we have a Merge state, where we compare the two sets of results (Comp#0 compares Max0 and Max1 while Comp#1 compares Min0 and Min1) and arrive at the final results, Max and Min. So the best case is 1 + 7 + 1 = 9 clocks for ascending data. The worst case is 1 + 14 + 1 = 16 clocks for descending data. While in the CMxMn state, if one of the two, Comp#0 or Comp#1, has finished its job before the other, it should just wait. You should prepare to move to the Merge state only when both are about to be done together, or when one is done and the other is about to be done. An inferior designer waits until both are done (rather than about to be done) and wastes one clock. We will send him back to EE354L.

Here the I and J counters are 5-bit counters. I starts at 00000 and J starts at 00001. Each is incremented by 2 whenever it needs to be incremented (I <= I + 2; J <= J + 2;). So I always remains even and J always remains odd. When I is done, it becomes 16 (10000) and when J is done, it becomes 17 (10001). You may be able to use some or all of the following conditions in your design:
(I==14), (I!=14), (I==16), (I!=16), (J==15), (J!=15), (J==17), (J!=17), M[I] >= Max0, M[I] <= Min0, M[J] >= Max1, M[J] <= Min1, Max1 >= Max0, Min1 <= Min0
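The clock counts quoted in Q#1.4 (16 and 31 for one comparator, 9 and 16 for the dual-port design) can be reproduced with a simple cost model. The sketch below is an illustration in Python, not the state machine itself. It assumes that the first element of a lane is consumed in LOAD and that each later element costs 1 clock when it is >= the running maximum and 2 clocks otherwise. The function names are hypothetical.

```python
def lane_clocks(values):
    """Clocks one comparator spends comparing the elements of its lane."""
    running_max = values[0]                 # first element is taken in the LOAD state
    clocks = 0
    for v in values[1:]:
        if v >= running_max:                # max test succeeds: one clock
            running_max = v
            clocks += 1
        else:                               # max test fails: a second clock for the min test
            clocks += 2
    return clocks


def single_port_clocks(m):
    """Part 2 of the lab: 1 clock for LOAD plus one lane over all of M."""
    return 1 + lane_clocks(m)


def dual_port_clocks(m):
    """Dual-port design: 1 (LOAD) + the slower of the two lanes + 1 (Merge)."""
    even, odd = m[0::2], m[1::2]
    return 1 + max(lane_clocks(even), lane_clocks(odd)) + 1


ascending = list(range(16))
descending = ascending[::-1]
print(single_port_clocks(ascending), single_port_clocks(descending))
print(dual_port_clocks(ascending), dual_port_clocks(descending))
```

The `max` in `dual_port_clocks` reflects the requirement that the faster comparator waits for the slower one before the Merge state.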

State Diagram for Part 2 for Dual-Port Memory (16pts)
2 flags, 2 comparators, 2 counters.

States: INI, LOAD, CMxMn, Merge, DONE. Reset enters INI.
Edge labels (overbars on complemented labels are not preserved in this transcript): Start, Start, 1, 1, 1, C, C.

INI:    I <= 0; Flag0 <= 0;
        J <= 1; Flag1 <= 0;

LOAD:   Min0 <= M[I]; Max0 <= M[I]; I <= I + 2;
        Min1 <= M[J]; Max1 <= M[J]; J <= J + 2;

Merge:  if (Max1 >= Max0) Max <= Max1; else Max <= Max0;
        if (Min1 <= Min0) Min <= Min1; else Min <= Min0;

Page 5: ee457 Quiz Fall2016 -


You need to complete the design on the previous page by (a) filling in "0" or "1" in the six boxes, (b) writing I <= I + 2; or J <= J + 2; at appropriate places in the code, and (c) finally figuring out the condition "C" governing the state transition from the CMxMn state to the Merge state. Write the long boolean expression below for the condition C. (12pts)

C = ____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________

2 ( 14 + 8 = 22 points) 15 min. Performance

2.1 (14pts) Let us say we are a hardware IP (Intellectual Property) vendor. We designed a Multiply instruction execution IP which takes 20 clocks at 2 GHz and sold the IP to two processor manufacturers, XYZ and ABC, who implemented our USC CISC ISA. The frequency of usage of the multiply instruction (in the dynamic execution trace of the binary produced by compiling the benchmark with a third-party compiler) is 10%. Both processors ran at 2 GHz. But the percentage of execution time spent on our Mult instruction is 20% in XYZ and 25% in ABC.

1. If you have adequate data, compare the performance of the two processors. If the data is inadequate, explain how it is inadequate. Also, if there is excessive data, state what is excessive.
____________________________________________________________________________________
____________________________________________________________________________________

2. If you have adequate data, compare the native MIPS ratings of the two processors. If the data is inadequate, explain how it is inadequate.
____________________________________________________________________________________

3. If you have adequate data, compare the relative MIPS ratings of the two processors. If the data is inadequate, explain how it is inadequate.
____________________________________________________________________________________

2.2 (8pts) A 10% improvement in Instruction A is better than a 10% improvement in Instruction B, if (select)
(a) frequency of occurrence of A in the dynamic execution trace is more than that of B
(b) percentage of execution time spent on A is higher than that of B
(c) CPI of A is higher than CPI of B
(d) none of the above

Reducing the clocks taken by Instruction A by 1 clock is better than reducing the clocks taken by Instruction B by 1 clock, if (select)
(a) frequency of occurrence of A in the dynamic execution trace is more than that of B
(b) percentage of execution time spent on A is higher than that of B
(c) CPI of A is higher than CPI of B
(d) none of the above
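The quantities in Q#2.1 are tied together by one relation: the share of execution time spent on an instruction equals (its frequency x its clocks) / (average CPI). The following Python sketch works through that arithmetic. It is a hedged illustration rather than the official answer. It assumes that both processors run the same binary, so the instruction count is identical, and the function names are hypothetical.

```python
def average_cpi(mult_freq, mult_clocks, mult_time_share):
    """Average CPI implied by the multiply instruction's share of execution time."""
    # time share = (freq * clocks) / average CPI, solved for the average CPI
    return mult_freq * mult_clocks / mult_time_share


def native_mips(clock_hz, cpi):
    """Native MIPS rating = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


cpi_xyz = average_cpi(0.10, 20, 0.20)
cpi_abc = average_cpi(0.10, 20, 0.25)
print(cpi_xyz, cpi_abc)
print(native_mips(2e9, cpi_xyz), native_mips(2e9, cpi_abc))
```

With equal clock rates and equal instruction counts, the ratio of the two average CPIs is also the ratio of the two execution times.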

Page 6: ee457 Quiz Fall2016 -


3 ( 10 + 25 = 35 points) 20 min. unsigned and signed numbers

3.1 If you are allowed to use numbers from the SW (South-West) quadrant only (as shown in the figure below), you can only arrive at some of the 8 combinations in the table on the side. Cross off the rows which are not possible to fill with these limited choices. Complete the last two columns for the remaining rows.

3.2 Given two 4-bit numbers A (a3a2a1a0) and B (b3b2b1b0), produce 2AleB_BW. 2AleB_BW stands for: 2A (double of A) is le (less than or equal to) B, BW (both ways, i.e., whether we treat A and B as signed numbers in 2’s complement notation or as unsigned numbers). Let us analyze to see which cases need an actual comparator and which can be concluded easily.

To double A (a3a2a1a0), we can append a zero at the right end whether A is signed (2’s comp) or unsigned, so basically we are comparing a3a2a1a00 with b3b2b1b0. _________ T / F

If a3 is a 1, we can conclude 2AleB_BW as ________ (true/false) without any comparator, because when A and B are considered to be _____________ (signed/unsigned), 2A is too ________ (big/small) for any B to match up.

For the remaining part, i.e., for (a3=0), we need to consider the two cases, (b3=0) and (b3=1). For the case [(a3=0) and (b3=1)], we can conclude 2AleB as false, without any comparator, if both A and B are treated as ______________ (signed/unsigned) numbers.

For the case [(a3=0) and (b3=0)], since both are positive, we can use an unsigned 4-bit comparator to find if 2AleB is true or not by comparing a2a1a00 with b3b2b1b0. ______ T / F

If a2a1a0 is lower than b3b2b1, then a2a1a00 is lower than b3b2b1b0 even if b0 is a 1. __ T / F

However, for a2a1a00 to be equal to b3b2b1b0, b0 needs to be a ___ (0 / 1).

Complete the 3-bit unsigned comparison below to compare a2a1a0 with b3b2b1. Combine it with b0 to produce an IntR (= intermediate result) standing for "a2a1a00 is lower than or equal to b3b2b1b0". Combine this with the requirements on a3 and b3 to produce the final inference 2AleB_BW.

Table for Q#3.1 (10pts)

Operation    Result if treated    Result if treated      V           Raw Carry C4
             as signed numbers    as unsigned numbers    (OVERFLOW)  (COUT)
Addition     Right                Right
Addition     Right                Wrong
Addition     Wrong                Right
Addition     Wrong                Wrong
Subtraction  Right                Right
Subtraction  Right                Wrong
Subtraction  Wrong                Right
Subtraction  Wrong                Wrong

[Figure: 4-bit number circles, just FYI. UNSIGNED circle: codes 0000 through 1111 labeled 0 through 15, with SMALLER mag. and LARGER mag. markings and the error point where the C bit is set. SIGNED circle: codes 0000 through 0111 labeled +0 through +7 and codes 1000 through 1111 labeled -8 through -1, with SMALLER mag. and LARGER mag. markings and the error point where the V bit is set. The SW quadrant is marked on both circles.]
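The relationship between the V flag, the raw carry C4 and the Right/Wrong columns of the Q#3.1 table can be explored by brute force. The Python sketch below models a 4-bit adder/subtractor in which subtraction adds the 1's complement of B with a carry-in of 1, which is the usual construction and an assumption about the figure. The function name `add_sub_4bit` is hypothetical.

```python
def add_sub_4bit(a, b, subtract=False):
    """4-bit add/subtract; returns (sum_bits, V, C4, signed_right, unsigned_right)."""
    y = (b ^ 0xF) if subtract else b            # subtraction adds the 1's complement of B ...
    cin = 1 if subtract else 0                  # ... plus 1 through the carry-in
    total = a + y + cin
    c4 = total >> 4                             # raw carry out of the MSB
    c3 = ((a & 0x7) + (y & 0x7) + cin) >> 3     # carry into the MSB
    v = c3 ^ c4                                 # overflow flag
    s = total & 0xF

    def signed(x):
        return x - 16 if x & 0x8 else x

    true_signed = signed(a) - signed(b) if subtract else signed(a) + signed(b)
    true_unsigned = a - b if subtract else a + b
    return s, v, c4, signed(s) == true_signed, s == true_unsigned


print(add_sub_4bit(7, 1))                   # 0111 + 0001
print(add_sub_4bit(3, 5, subtract=True))    # 0011 - 0101
```

Looping over all 256 operand pairs and both operations shows which flag value goes with each Right/Wrong combination, and restricting the operands to one quadrant of the number circle shows which rows remain reachable.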

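[Figure for Q#3.2, 25pts: a 3-bit adder/subtractor built from three full-adder cells (pins a, b, cin, s, cout). Labels: inputs X2 X1 X0 and Y2 Y1 Y0, outputs S2 S1 S0, C0, ADD/SUB, RawCarry, Carry, V. Signals to connect: a2 a1 a0, three b inputs with blank subscripts, d2 d1 d0, VDD, Zero. Outputs to produce: IntR_Lower, IntR_Equal, IntR, 2AleB_BW.]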
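A candidate design for 2AleB_BW can be checked exhaustively against the definition, since there are only 256 input pairs. The Python sketch below does that. It is a self-check a reader might use, not the official solution. The function `candidate_2AleB_BW` encodes one hypothetical decomposition along the lines of the analysis in Q#3.2 (requirements on a3 and b3 plus a 3-bit comparison of a2a1a0 with b3b2b1).

```python
def to_signed4(x):
    """Interpret a 4-bit pattern as a 2's complement number."""
    return x - 16 if x & 0x8 else x


def two_a_le_b_bw(a, b):
    """Definition: 2A <= B must hold for both the unsigned and the signed reading."""
    unsigned_ok = 2 * a <= b
    signed_ok = 2 * to_signed4(a) <= to_signed4(b)
    return unsigned_ok and signed_ok


def candidate_2AleB_BW(a, b):
    """Hypothetical structure: a3 = 0, b3 = 0, and a2a1a0 <= b3b2b1 (unsigned)."""
    a3, b3 = (a >> 3) & 1, (b >> 3) & 1
    int_r = (a & 0x7) <= (b >> 1)           # compares a2a1a0 with b3b2b1
    return a3 == 0 and b3 == 0 and int_r


mismatches = [(a, b) for a in range(16) for b in range(16)
              if two_a_le_b_bw(a, b) != candidate_2AleB_BW(a, b)]
print(mismatches)
```

An empty mismatch list means the candidate agrees with the definition for every 4-bit A and B.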

Page 7: ee457 Quiz Fall2016 -
Page 8: ee457 Quiz Fall2016 -


4 ( 6+3+3+4+3+6+6+6+3+8+20 = 68 points) 50 min. MIPS ISA, Byte-addressable processors

4.1 It is possible to replace the unconditional branch, beq $0,$0, L2 with (i) a jump instruction: j L2 (ii) a jump register instruction jr $8 where $8 was previously preloaded with the address 0000_006C using the following instructions: lui $8, 0000h; ori $8, $8, 006Ch

How do you compare the three choices? _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

4.1.1 Arrive at the 16-bit offset filed in the translation of beq $0,$0, L2 on the side.

4.1.2 Arrive at the 26-bit filed in the translation of j L2 on the side.

4.2 Reproduced below is an extract from your class-notes

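6pts

// upstream code
40        beq  $1, $2, L1;
44        addu $4, $0, $0;
48        beq  $0, $0, L2;
L1(=4C)   ................
60        ....
64        ....
L2(=6C)   ....

[Answer boxes: the beq translation shows 00000 00000 followed by a blank 16-bit offset field (3pts); the j translation shows a blank 26-bit field (3pts).]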
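The two fields asked for in Q#4.1.1 and Q#4.1.2 follow from the standard MIPS encoding rules: a branch offset is counted in words from the instruction after the branch (PC + 4), and a jump field is the target address with its top 4 bits and bottom 2 bits dropped. The Python sketch below applies those rules to the addresses in the listing, treating them as hexadecimal. The function names are hypothetical.

```python
def beq_offset16(branch_addr, target_addr):
    """16-bit word offset, relative to the instruction after the branch (PC + 4)."""
    diff = target_addr - (branch_addr + 4)
    if diff % 4 != 0:
        raise ValueError("branch targets must be word aligned")
    return (diff >> 2) & 0xFFFF             # 2's complement, truncated to 16 bits


def jump_field26(target_addr):
    """26-bit jump field: target address bits [27:2]."""
    return (target_addr >> 2) & 0x03FFFFFF


print(hex(beq_offset16(0x48, 0x6C)))        # beq $0, $0, L2 at address 48
print(hex(jump_field26(0x6C)))              # j L2
```

The same helper applies to the first branch in the listing, beq $1, $2, L1 at address 40 with L1 = 4C.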

Page 9: ee457 Quiz Fall2016 -


4.2.1 I have corrected the textbook code incorrectly because I assumed that _____________________________________________________________________________________________________________________________________________________________________________

4.2.2 The "add" in the five-instruction rectangle can be replaced by "addi $29, $29, 4" to add a 4 to move the stack pointer $29. Similarly the "sub" can be replaced by "_____________________________________________" to subtract a 4 to move the stack pointer $29.

4.2.3 A student wrote the above code initially and later while debugging he decided that the call to subroutine C was not required. But he just deleted the jal C instruction instead of deleting all the 5 instructions in the rectangles! Is it that it is just wasteful that he is executing 4 instructions or is it harmful? _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

4.2.4 Answer the above question again, this time assuming that it is a conditional call instruction beqal $1, $2, C in the place of jal C . Recall the made-up instruction beqal (branch if equal and link) we used in an earlier exam question, where we write to the link register $31 conditionally. _____________ ________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

4.2.4.1 Based on the above analysis, Mr._______________ (Bruin / Trojan) recommends removal of the AND gate and OR gate in the rectangle by redefining the beqal as conditional call but unconditional deposition of the return address into the link register $31. And he further requires that every use of beqal should be preceded by two lines to __________ (save/retrieve) the contents of $31 on the compiler designated stack and followed by two lines to __________ (save/retrieve) the return address saved on the stack into the link register $31.

4.3 The textbook and the class-notes show a ___________ (SRAM / SSRAM) for the Data Memory _______________ (though/because) in real design we use ___________ (SRAM / SSRAM) for the Data Memory (Data Cache).

Points for 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.4.1 and 4.3, in that order: 4pts, 3pts, 6pts, 6pts, 6pts, 3pts

Page 10: ee457 Quiz Fall2016 -


4.4 Intel follows the ___________ (Little Endian / Big Endian) system. In the Intel 80486 processor system address space, byte 0000_747CH is the ____________ (most / least) significant byte of the 32-bit word with system address ______________ (state in hexadecimal). State the next three 32-bit word addresses, next to the 32-bit word 40: __________________________________. State the next three 64-bit long word addresses, next to the 64-bit long word 40, in the context of the i860 (a 64-bit data, 32-bit logical address, byte-addressable processor): ______________________.

4.5 Shown on the side is the memory interface to a byte-wide memory chip in a memory system based on the minimum number of byte-wide banks for a USC128 processor (a 128-bit data, 32-bit logical address, byte-addressable processor). USC processors are similar to Intel processors so far as byte-enable pins are concerned. The address pins on the processor are (select)
(i) A[31:0]
(ii) A[31:3], /BE[7:0]
(iii) A[31:4], /BE[15:0]

Fill in the 3 blanks (marked by the 3 arrows) in the figure on the side. Also find the system addresses corresponding to the lowest-addressed two bytes of this memory chip. The lowest-addressed two bytes of this chip map to the system byte addresses (in hex) _______________________________ _________________________________________________. The system addresses mapping to any location in this memory chip will have the same upper ________ (state a number) bits, namely ______________ (state their labels in the form X[13:2]). If this chip goes bad, until you replace it, you should avoid using memory in the system address range (state the range in hex): ______________________________________________________

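8pts

[Figure for Q#4.5, 20pts: memory interface to a byte-wide memory chip. Chip pins: CS, WE, RD, A[ ], D[7:0]. Processor-side labels: A31 A30 A29 A28 A27 A26 A25 A24 A23 A22 A21 A20, A[19:4], D[ ], BE4. Chip size: ____KB. A "Note" callout accompanies the figure.]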

Blank area (for rough work)
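The address arithmetic behind Q#4.4 and Q#4.5 reduces to splitting a byte address into an aligned word address and a byte lane. The Python sketch below shows that split for buses of 4, 8 and 16 bytes. It is an illustration under two assumptions: addresses such as 40 are read as hexadecimal, and lane 0 carries the least significant byte (little-endian convention). The function names are hypothetical.

```python
def locate_byte(byte_addr, bus_bytes):
    """Return (aligned word address, byte lane) for a data bus bus_bytes wide."""
    lane = byte_addr % bus_bytes            # lane 0 = least significant byte (little endian)
    return byte_addr - lane, lane


def next_word_addresses(word_addr, bus_bytes, count=3):
    """Addresses of the next `count` aligned words after word_addr."""
    return [word_addr + bus_bytes * n for n in range(1, count + 1)]


print([hex(x) for x in locate_byte(0x747C, 4)])             # 32-bit data bus
print([hex(x) for x in next_word_addresses(0x40, 4)])       # 32-bit words
print([hex(x) for x in next_word_addresses(0x40, 8)])       # 64-bit long words
print(locate_byte(0x747C, 16))                              # 128-bit data bus
```

On a 128-bit bus the lane number (0 to 15) identifies which byte-wide bank, and hence which byte-enable pin, a given system address uses.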

Page 11: ee457 Quiz Fall2016 -


5 ( 15 + 30 = 45 points) 25 min. Single-cycle CPU:

You are familiar with the ordinary jump instruction J (Jump with the 26-bit jump address field), Jal (Jump and Link), Jr rs (Jump register rs), and Beq (Branch if Equal). In class we discussed a made-up instruction, Beqal (Branch if equal and link, a conditional call instruction). For this question, assume that Beqal unconditionally writes the return address (PC+4) to the link register $31. Hence the control signal "beqal" is crossed off in the control signal table below.

5.1 The data path on the next page is nearly complete. Complete the connections to the 7 loose ends which are marked with numbered arrows.

5.2 Control Signal Table: Complete the three rows and three columns. Whenever possible, use don’t cares.

Instruction  ALUSrc  ALUOp1  ALUOp0  RegWrite  RegDst  MemtoReg  MemRead  MemWrite  Branch  beqal (not needed)  JUMP  Jal  JR
R-format     0       1       0       1         1       0         0        0         0       X
lw           1       0       0       1         0       1         1        0         0       X
sw           1       0       0       0         X       X         0        1         0       X
beq          0       0       1       0         X       X         0        0         1       X
beqal                                                                                       X
J            X       X       X       0         X       X         0        0         X       X                   1
Jal                                                                                         X                         1
JR rs                                                                                       X                              1

11+4pts

1

21+9pts

Blank area

It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The next three topics, pipelined CPU, cache and virtual memory are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best! Gandhi, TAs: Sanmukh, Pezhman, Fangzhou, Mentors: Bo, Monisha, Nishant HW Graders: Hongtai, Aashish, Yashah Lab graders: Neil, Dong, Congyi

Page 12: ee457 Quiz Fall2016 -

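[Figure: single-cycle CPU data path for Q#5.1. Signal labels at the Control block: Jump, JR, Jal, PCSrc, RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. Other labels: Zero, ALU control, multiplexers with inputs marked 1 and 0 and selects Jump, JR and Jal, Jump Address [31:0], Instruction [31:0], PC+4 [31:28]. The seven loose ends are marked with arrows numbered 1 to 7.]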
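The next-PC selection that the Jump, JR and Branch signals control can be sketched in a few lines. The Python function below is an illustration only: the order in which the multiplexers are checked (JR, then Jump, then Branch) is an assumption about the figure, the Jal case is assumed to share the Jump path, and the name `next_pc` is hypothetical.

```python
def next_pc(pc, instr, rs_value, zero, branch, jump, jr):
    """Select the next PC of a single-cycle MIPS-style CPU (assumed mux ordering)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    imm16 = instr & 0xFFFF
    if imm16 & 0x8000:                      # sign-extend the 16-bit branch offset
        imm16 -= 0x10000
    branch_target = (pc_plus_4 + (imm16 << 2)) & 0xFFFFFFFF
    # Jump Address [31:0] = PC+4 [31:28] : Instruction [25:0] : 00
    jump_address = (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)

    if jr:                                  # assumed priority: JR, then Jump, then Branch
        return rs_value & 0xFFFFFFFF
    if jump:
        return jump_address
    if branch and zero:
        return branch_target
    return pc_plus_4


j_l2 = (0x02 << 26) | 0x1B                  # j L2 with L2 = 6C (hex)
beq_l2 = (0x04 << 26) | 0x0008              # beq $0, $0, L2 at address 48 (hex)
print(hex(next_pc(0x48, j_l2, 0, 0, 0, 1, 0)))
print(hex(next_pc(0x48, beq_l2, 0, 1, 1, 0, 0)))
```

All three ways of reaching L2 discussed in Q#4.1 (beq, j, and jr with a preloaded register) end at the same address in this model.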

Page 13: ee457 Quiz Fall2016 -


Blank page: Please write your name and email. Tear it off and use for rough work. Do not submit.
Student’s Last Name: ____________________ email: __________________