Aviral Shrivastava: Teaching



CSE 591: Advances in Reliable Computing

Course Objective: To develop a technique that improves processor reliability and to submit a paper describing it to an international conference.

Course Abstract

As technology scales and transistor sizes shrink to deliver better performance and lower power, maintaining system reliability becomes a challenge. This course covers recent advances in reliable computing, focusing in particular on the increasing threat from soft errors and the techniques researchers have developed to meet this challenge. This is a seminar-plus-project course, in which we will learn primarily through reading papers and through projects. In class we will read papers, understand fault mechanisms, study typical fault models, and browse the literature to see what mechanisms researchers have developed to make computation reliable. The second part, which runs in parallel, is the project, in which you have to propose and implement your own technique for reliable computing. Your solutions can range from circuit-level to system-level techniques, protecting anything from embedded systems to distributed supercomputing systems. You are also encouraged to come up with theoretical or analytical contributions.

Teaching Staff

Instructor: Aviral Shrivastava, aviral.shrivastava@asu.edu, 203-15 Centerpoint.

Class time

TT BYAC 270, 6:00 pm - 7:15 pm.

Office Hours

Aviral: TT 7:15 pm - 8:00 pm, MW 8:00 am - 9:00 am, 203-15 Centerpoint

Pre-requisites

You must have taken a computer architecture course equivalent to CSE 420.
Courses in operating systems, digital system design, and parallel programming are preferable.

Textbooks

Course Contents

Course Structure

Grading

Reading List

Teams and Topic Choices

Class Schedule

Send Aviral the summary of all the papers in your problem domain.
Date Class Notes Deadline
01/13/2015 Introduction by Aviral slides
01/15/2015 Guest lecture by Dr. Jeyapaul slides
01/20/2015 Class structure and deliverables
01/22/2015 Introduction to soft errors slides Send the names of your team members and your research topic choices to Aviral
01/27/2015 Core-level redundancy paper
01/29/2015 Redundant multi-threading paper Send 4 papers that you will survey to Aviral
02/03/2015 In application instruction duplication paper
02/05/2015 Control flow checking paper
02/10/2015 Symptom-based fault detection paper Submit a 1-page summary of the 4 papers in your topic area.
02/12/2015 Algorithm-based fault tolerance paper
02/17/2015 Fault tolerance in GPUs paper
02/19/2015 GemV Tutorial Send Aviral a mail indicating which paper/technique you are going to implement.
02/24/2015 Fault tolerance at exascale paper Send mail to Aviral indicating what software/hardware tools you will develop your technique on, e.g., gem5, llvm.
02/26/2015 Redundant multi-threading paper
03/03/2015 In application instruction duplication paper Send Aviral the problem definition of your survey and the list of all the papers that you will be reading and including in the survey.
03/05/2015 Control flow checking paper
03/10/2015 Spring break
03/12/2015 Spring break
03/17/2015 Project Update and Multi-threading
03/19/2015 Project Update Present the project, your progress, and the timeline in 5 slides. Send a first draft of the survey. This must be in LaTeX, in the format described below.
03/24/2015 Project Update HERMES paper
03/26/2015 Symptom-based fault detection paper
03/31/2015 Algorithm-based fault tolerance paper
04/02/2015 Fault tolerance in GPUs paper
04/07/2015 Fault tolerance at exascale paper
04/09/2015 Survey on Redundant Multithreading
04/14/2015 Survey on In application instruction duplication
04/16/2015 Survey on Control flow checking Submit experimental results to Aviral
04/21/2015 Survey on Symptom-based fault detection
04/23/2015 Survey on Algorithm-based fault tolerance
04/28/2015 Survey on Fault tolerance in GPUs
04/30/2015 Survey on Fault tolerance at exascale

Survey/Paper Formatting

The paper must be written in ACM proceedings format, 9-point type, and may not exceed 8 pages (all inclusive). Word and LaTeX templates for this format are available here. Submissions must be in PDF, printable on US Letter-sized paper.

Academic Integrity

Any incident of cheating in this class will be dealt with severely. This applies to homework assignments and tests. The minimum penalty for cheating is that the student will receive no credit for that particular assignment or test (that is, a student found to have cheated on a test or assignment will receive a zero for it). Students are encouraged to discuss the material covered in class with others. However, students should not discuss problems in assignments or tests. Two identically wrong results showing up in a homework assignment or test will raise suspicion.