04/29/08: Solutions for homework 5 are posted. Average is 42.3/50.

04/27/08: As promised, here is an outline of the review lecture on Tuesday.

Format of the Exam: It is out of 80. There are five 10-mark questions and two 15-mark questions. There is also one 2-mark bonus question - normally I advise students not to do the bonus unless they have time, but in this case I recommend that you look at it because it is not hard.

If you budget about a minute for each mark, you will have time to check one or two answers about which you are not sure. Do allocate time to read the questions carefully, as some are easier than they may first appear to be.

The emphasis of the exam questions is about one quarter on material before the midterm and about three-quarters on material afterwards.

In reviewing the course for the exam, do not worry about memorizing details. The exam tests understanding of concepts, and if you understand the idea behind something, you will have a good start on answering the questions. Here is the material that I would review:

- Material from CAQA (with emphasis):

Chapter 1: especially 1.8; also 1.1-1.5, 1.7 in less detail; 1.6, and 1.9-1.13 not at all.

Appendix A: particularly A.1,A.2,A.3,A.4; A.5 in less detail; A.6-A.10 not at all - but note that A.7 and A.8 give useful reviews of some of the main ideas, particularly misunderstandings that are worth sorting through.

Chapter 2: particularly 2.1-2.5 and 2.7; 2.6, 2.8, and 2.9 in less detail; 2.10-2.13 not at all.

Appendix C: particularly C.1, C.2, C.3; C.4 very briefly; C.5 very very briefly; C.6-C.8 not at all.

Chapter 5: particularly 5.1 and 5.2; 5.3 in less detail; 5.4-5.9 not at all.

Chapter 3: particularly 3.1, 3.2, 3.3; 3.5 in less detail; 3.4 and 3.6-3.9 not at all. - Material Covered on the Homeworks and Midterm: Pay particular attention to the concepts behind the homework questions rather than the specific calculations that you may have performed. Of course on the exam you have much less time, so get the main ideas down.

One final word of advice. Explain your answers. A `wrong' answer at the end of a good explanation will most likely earn a lot of partial credit.

Finally since we did not have a homework on Chapter 3, here are a couple of sample questions:

- We argued in class that in exploiting instruction-level parallelism, the window size is limited by our ability to determine dependences among the next N instructions. However, in that discussion we assumed that branch prediction is perfect. Can the window size that we can exploit effectively also be severely limited by the accuracy of branch prediction? Explain.
- Suppose that a sequence of N instructions on a register-register machine is provided, and we are to determine all data dependences among these N instructions, classifying them according to whether they are true, anti, or output dependences. In the worst case, how many comparisons might be needed to determine these dependences? Explain. If a comparison can be made in a single clock cycle, how many clock cycles are sufficient to make all comparisons? Explain.

04/17/08: We will finish Chapter 3 on Tuesday April 22. Class on Thursday April 24 will be a question and answer session, no lecture. Send questions to me early if you have some; otherwise the class may be quite short!! On Tuesday April 29 we will have a semester review and exam preview. Then on Thursday May 01 we will have the exam.

04/07/08:
Because some of you seem worried about grades, now that 52 percent of the grades in the course have been earned, I computed a distribution of grades as follows. First I removed all students who are either not registered or who completed no recent work.
Then I summed up the four assignment grades and converted it to a grade out of 32.
Then I converted the scaled midterm grade to be out of 20. Adding these two gives the score out of 52. I converted this to a grade out of 100.
In order to report the current `grade estimates' that result, I have
partitioned the class into two groups, those who have turned in every homework and written the midterm,
and those who have missed at least one.

Because a better final exam replaces a worse midterm, some (hopefully many) of the midterm grades will increase in the final computation. Thus these estimates should be used only to understand your current placement in the class.

Students who have missed at least one homework or test but who remain registered:
16.5
29.7
36.2
52.8
56.5
63.1
63.7

Students who have submitted all work to date:
62.5 65.7 66.2 70.8 70.9 73.8 75.5 76.6 77.4 77.4
77.8 77.8 78.5 78.6 79.4 80.2 81.1 81.5 82.2 82.5
82.6 82.6 83.4 83.4 84.5 85.8 85.8 86.5 87.5 87.7
88.8 89.1 89.4 90.6 91.7 92.2 94.3

04/07/08: Homework 4 will be returned tomorrow. The average is 42.3 and the grades in sorted order are: 22 28 35 35 36 38 38 40 40 40 40 40 41 41 41 41 42 43 43 43 43 44 44 44 44 45 45 45 45 45 47 47 48 48 48 48 49 49 49 49

04/05/08: Homework 5 is posted.

03/25/08: Solutions for homework 3 are posted. It is out of 50. Average is 47 and grades are 29 40 40 40 42 44 44 46 46 47 47 47 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 49 49 50 50 50 50 50 50 50 50 50 50 50

03/25/08: For question 4 on the midterm, here is a discussion and sample solution.

(a) Although not needed to answer this or later parts, I observed that dividing by 2 is the same as multiplying by .5. Depending on the details of the
pipeline, this observation makes the question a little shorter.

So I assumed that F10 contains the constant 0.5.
Recall also that R1 contains the address of x[0], and R2 the address of x[999].
Here is the code:

1. LD F0, 0(R1)

2. loop: LD F2, 8(R1)

3. ADDI R1,R1,#8

4. ADDD F4,F2,F0

5. MULTD F0,F4,F10

6. SD F0, 0(R1)

7. BLT R1, R2, loop

Most people did two loads each time through the loop, which was not necessary but was ok. Most people used divide instead of multiply, which was ok but made the later parts lengthier.

(b) There are two sensible sets of assumptions to make based on the book (the first one discussed has different latencies for floating point add, multiply, and divide; the second is the latencies in Figure 2.2 on page 75 of the book.) I will
assume the second because it is simpler. So here is the schedule:

1. LD F0, 0(R1)

2. loop: LD F2, 8(R1)

3. ADDI R1,R1,#8

4. ADDD F4,F2,F0

*stall*

*stall*

*stall*

5. MULTD F0,F4,F10

*stall*

*stall*

6. SD F0, 0(R1)

7. BLT R1, R2, loop

2. loop: LD F2, 8(R1)

3. ADDI R1,R1,#8

etc.

A stage-by-stage schedule was not needed.

(c) This part turned out to be quite difficult, dependending on the assumptions that you made. So I will give two correct answers. Here is one.

For this one, I assume that I must unroll my solution from (a) without adding any more instructions. When I try to do this, I find five instructions other than the branch. But the MULTD in instruction 5 must start at least 4 time slots after the ADDD
in instruction 4. But the ADDD in instruction 4 the next time through the loop must start at least 4 time slots after the MULTD in instruction 5. Hence I must need at least 8 time slots for 5 instructions to start, and there must be stalls.
So under my assumption that I must use the assembly code from (a) without additions, this cannot be done. (NOTE: A very clear explanation of why stalls cannot be avoided, and the assumptions that underlie this, were needed.)

Here is another answer. Loop unrolling works when two iterations through the loop do not have true data dependences. So to unroll effectively, I would like to rewrite the loop so that all dependences between iterations are removed.
If I let y[0] ... y[999] be the values of the array x before the loop is started, and z[0] ... z[999] be the values after the loop is run, we note that

z[0] = y[0]

z[1] = .5 y[0] + .5 y[1]

z[2] = .25 y[0] + .25 y[1] + .5 y[2]

z[3] = .125 y[0] + .125 y[1] + .25 y[2] + .5 y[3]

etc. etc.
So I could "unroll" the loop into 1000 different computations that have no data dependences between them.
One could argue that this is cheating, because this unrolled code does many more arithmetic operations than seem to be implied by the original code.
But it does make each iteration through the loop data independent, and hence makes a good candidate for unrolling.

The key to the second answer is to establish that iterations of the loop can be made data independent; the key to the first is to show that under the stated assumptions there is no way to eliminate all stalls.

Because part (c) of this question was too open-ended for the time available, I have considered it as a bonus part in the final grading (See class news poting of 03/18).

03/18/08: Homework 4 is posted. If it looks a lot like the midterm, that's because it is! Many people thought that if they only had more time, they could have done a lot more -- this is your chance to do that!

03/18/08: The midterm exam was returned in class today. I wlll be taking up the midterm in class on Thursday, and other than the discussions in class,
solutions will not be posted.
The grades on the midterm out of 50 are: 16 17 19 20 20 21 21 22 22 22 22 23 24 25 25
25 26 26 26 27 27 27 28 28 29 29 30 30 30 30
32 32 33 35 35 36 36 36 36 38 39 39.

Average is 27.7. This is on the low side (!), and so I propose adjusting the grades as follows. If you got n/50, your grade will be calculated as
n+5+ceiling(n/10) out of 50. For example, 17 becomes 17+5+2=24; 25 becomes 25+5+3=33; and 39 becomes 39+5+4=48. This results in a new average of 35.9 out of 50. This is meant to raise the grades into a more typical range, without taking away any of
the incentive to write a better final exam!

02/28/08: Homework 2 was returned in class today, and solutions are posted.

02/28/08: The midterm on 6 March covers Chapters 1 and 2 and Appendix A, i.e. all material covered before the class today.
Here is a * sample * midterm, actually the one from last fall.

02/27/08: Homework 2 grades out of 50 are: 23 27 31 33 35 36 37 37 37 37 37 37 38 38 39 40 41 41 42 42 42 42 44 44 44 45 45 46 47 48 48 48 48 48 49 50 50 50 50 50 Average is 41.65. Solutions will be posted soon.

02/25/08: Homework 3 is posted, due 18 March (after spring break).

02/18/08: I corrected a typo in the description of indirect addressing on homework two.

02/15/08: Homework 1 solutions are posted.

02/15/08: Homework 1 grades out of 50 are: 50 49 48 48 48 47 46 46 45 45 45 44 44 44 44 44 44 43 43 43 42 42 41 41 41 40 40 39 39 39 39 38 38 36 36 35 35 35 35 31 30 30 13

02/07/08: Homework 2 is posted.

02/07/08: We will post solutions for homework 1 soon. As remarked in class, these will be sample solutions, and many other solutions are possible and also correct.

01/24/08: Homework 1 is posted. You can get to it via the syllabus page.

01/24/08: The TA office hours are posted. He is a great resource person for helping with the course concepts generally, not just the homeworks. So please don't let him be *too* lonely during his office hours.

11/14/07: This will be updated with news about the class as it becomes available.