Question 1
What Out-Of-Order processor hardware structure can be used to enforce that instructions commit in order?
Answer: Re-order buffer
Question 2
Register renaming is able overcome which of the data hazards? Select all that apply
Answer: WAW,WAR
Question 3
How many SRAM bits are needed to implement an 8KB two-way set associative cache with 64B block size? Assume that each line (entry) has a single valid bit and no dirty bits. There is one bit per set for true LRU.
Assume that the address size of the machine is 32-bits and that the machine allows for byte addressing.
Answer: 68288
Question 4
Which of the following two processors will execute a program with the given instruction mix faster?
Instruction Mix:
50% ALU Instructions
10% Branch Instructions
40% Memory Instructions
Answer: Processor B
Question 5
In a
pipelined processor, a single instruction takes the following synchronous
exceptions (interrupts): Divide-by-Zero fault and Invalid Opcode. What should the interrupt cause be loaded
with?
Answer: Invalid OPD code
Question 6
Use the following architecture for questions 6-9:
Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.
Answer: 4: ADD R14,R11,R15
OR R11,R26,R18
Question 7
Use the following architecture for questions 6-9:
Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.
Answer: 7:LW R22, 4(R19)
9:LW R25, 12(R19)
12:R13,R17,R29
Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.
Answer: 7:LW R22, 4(R19
12: AND R13,R17,R29
Question 9
Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer question 6-9, showing for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can excute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.
Answer: 4: ADD R14,R11,R15
9:LW R25, 12(R19) ,
OR R11,R26,R18
Question 10
Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: 6:ADDIU R14,R18,1
5:ADDIU R18,R11,1
4: MUL R7,R5,R6
Question 11
Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: 6:ADDIU R14,R18,1
7:ADDIU R13,R18,2
Question 12
Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: 14
Question 13
Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: 12
Question 14
Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can excute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: NO
Question 15
Use the following architecture for questions 15-17:
Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L excutes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: 5: ADDIU R18,R11,1
6:ADDIU R14,R18,1
7:ADDIU R13,R18,2
Question 16
Use the following architecture for questions 15-17:
Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L excutes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: 18
Question 17
Use the following architecture for questions 15-17:
Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L excutes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.
Answer: NO
Question 18
The following code is to be
executed on a processor with 32 architectural registers. The processor is able to issue instructions
out-of-order. The processor is a single
issue machine. The processor has
different functional unit latencies with multiply instructions having a latency
of 4 cycles, ALU operations having a latency of 1 cycles, and loads and stores
having a latency of 2 cycles. The
processor stalls on WAW and WAR dependencies.
Pretend that you are the compiler and perform changes to the following
code to increase the performance of the code when executing on this out-of-order
processor. Assume that all registers not
used are free to be used by the compiler.
Which of the following code sequences would increase the performance of the code on this OoO processor? Select all that apply.
Answer: MUL R5,R6,R7
ADD R8,R5,R6
MUL R13,R18,R8
SW R12, 0(R18)
SUB R10,R6,R4
MUL R17,R10,R5
ADDIU R19,R5,1
MUL R19,R6,R7
ADD R8,R19,R6
MUL R13,R10,R8
SW R12, 0(R10)
SUB R10,R6,R4
MUL R17,R10,R5
ADDIU R19,R15,1













