CSE-316 Computer Architecture Midterm Solutions

byMaterial Notes •November 06, 2024

0

Question 1

What Out-Of-Order processor hardware structure can be used to enforce that instructions commit in order?

Answer: Re-order buffer

Question 2

Register renaming is able overcome which of the data hazards? Select all that apply

Answer: WAW,WAR

Question 3

How many SRAM bits are needed to implement an 8KB two-way set associative cache with 64B block size? Assume that each line (entry) has a single valid bit and no dirty bits. There is one bit per set for true LRU.

Assume that the address size of the machine is 32-bits and that the machine allows for byte addressing.

Answer: 68288

Question 4

Which of the following two processors will execute a program with the given instruction mix faster?

Instruction Mix:

50% ALU Instructions

10% Branch Instructions

40% Memory Instructions

Answer: Processor B

Question 5

In a

pipelined processor, a single instruction takes the following synchronous

exceptions (interrupts): Divide-by-Zero fault and Invalid Opcode. What should the interrupt cause be loaded

with?

Answer: Invalid OPD code

Question 6

Use the following architecture for questions 6-9:

Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.

Answer: 4: ADD R14,R11,R15

OR R11,R26,R18

Question 7

Use the following architecture for questions 6-9:

Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.

Answer: 7:LW R22, 4(R19)

9:LW R25, 12(R19)

12:R13,R17,R29

Question 8

Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.

Answer: 7:LW R22, 4(R19

12: AND R13,R17,R29

Question 9

Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.

Answer: 4: ADD R14,R11,R15

9:LW R25, 12(R19) ,

OR R11,R26,R18

Question 10

Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: 6:ADDIU R14,R18,1

5:ADDIU R18,R11,1

4: MUL R7,R5,R6

Question 11

Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: 6:ADDIU R14,R18,1

7:ADDIU R13,R18,2

Question 12

Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: 14

Question 13

Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: 12

Question 14

Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: NO

Question 15

Use the following architecture for questions 15-17:

Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L excutes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: 5: ADDIU R18,R11,1

6:ADDIU R14,R18,1

7:ADDIU R13,R18,2

Question 16

Use the following architecture for questions 15-17:

Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L excutes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: 18

Question 17

Use the following architecture for questions 15-17:

Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L excutes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.

Answer: NO

Question 18

The following code is to be

executed on a processor with 32 architectural registers. The processor is able to issue instructions

out-of-order. The processor is a single

issue machine. The processor has

different functional unit latencies with multiply instructions having a latency

of 4 cycles, ALU operations having a latency of 1 cycles, and loads and stores

having a latency of 2 cycles. The

processor stalls on WAW and WAR dependencies.

Pretend that you are the compiler and perform changes to the following

code to increase the performance of the code when executing on this out-of-order

processor. Assume that all registers not

used are free to be used by the compiler.

Which of the following code sequences would increase the performance of the code on this OoO processor? Select all that apply.

Answer: MUL R5,R6,R7

ADD R8,R5,R6

MUL R13,R18,R8

SW R12, 0(R18)

SUB R10,R6,R4

MUL R17,R10,R5

ADDIU R19,R5,1

MUL R19,R6,R7

ADD R8,R19,R6

MUL R13,R10,R8

SW R12, 0(R10)

SUB R10,R6,R4

MUL R17,R10,R5

ADDIU R19,R15,1

CSE-316 Computer Architecture Midterm Solutions

Post a Comment

Contact Form