Self-Study Topics on Dependable Computing, 1997

Dependability is the property of a system such that reliance can justifiably be placed on the service it delivers. Dependable computing covers a wide range of subjects. This year I am offering 7 self-study topics in two dependable computing subjects: Dependable Real-Time Systems and Software Dependability.

General reference on basic concepts and terminology: J.-C. Laprie, Dependability: Basic Concepts and Terminology. This publication can be found in:
(1) the book: J.-C. Laprie (Ed.), Dependability: Basic Concepts and Terminology, Springer-Verlag, Vienna, 1992;
(2) Proc. 15th International Symposium on Fault-Tolerant Computing (FTCS-15), 1985;
(3) the special edition of FTCS-25, Pasadena, 1995, highlights from 25 years.

Subject I

Dependable Real-Time Systems

In this subject, 5 self-study topics are offered. These topics are related to a project supported by FRD. Diagram 1 gives an example of a dependable system with five UNIX machines, three of which form a reliable hard-core. Diagram 2 gives more details of the hard-core.

Introduction

A real-time system is one whose basic specification, design and implementation must meet both the functionality and the timing constraints. This implies that system correctness depends not only on the logical correctness but also on the timeliness of its actions. Although it is commonly believed that meeting the timing requirements is a matter of increasing system speed or throughput, research in real-time systems has discredited this notion. In fact, the computational structures appropriate for systems requiring bounded response time are fundamentally different from those appropriate for systems requiring high throughput. The progress in hardware technology in recent years has made high-performance computing and communication feasible. However, it has become clear that high-speed execution alone may not solve all the problems that real-time computing needs to address.

Real-time systems are widely applied in safety-critical areas, such as nuclear power plants, aerospace systems, industrial automation, telecommunication, banking, and traffic control systems, where the consequence of a computer failure is a significant economic impact or even loss of human life. High dependability is therefore a fundamental requirement of real-time system design. Fault-tolerant computing ensures that a system functions correctly even if a limited number of components are faulty, and it is a major technique for increasing system dependability. The challenge is to meet the functionality and the timing constraints of a system even if some of its components are faulty. Dependable real-time computing is a broad area of research.
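As a small illustration of the idea that a system can function correctly even if a limited number of components are faulty, the following is a minimal sketch of majority voting over replicated results. It is not taken from the FRD project or from the diagrams; the function name majority_vote and the sample values are assumptions made for this example only.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas.

    Hypothetical helper for illustration: `outputs` is a list with one
    result per replica. Returns None when no value has a strict majority,
    which a real system would treat as an unmasked fault.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the replicas to agree.
    return value if count > len(outputs) // 2 else None


# Three replicas, one of which delivers a wrong result: the fault is masked.
masked = majority_vote([42, 42, 17])

# Three replicas that all disagree: no majority, the fault is only detected.
unmasked = majority_vote([1, 2, 3])
```

With three replicas a single faulty result is outvoted, while two simultaneous faults can no longer be masked. The sketch says nothing about timing; in a real-time setting the vote itself must also complete within a bounded time.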
In the self-study topics you are going to concentrate on the following points (each point is considered a separate self-study topic):

1 Formal methods for specification and implementation of real-time systems

The problem in this area which you are expected to address is: how to specify and ensure the timing of real-time systems. Programming languages, compilers, and tools which support the specification and implementation of real-time systems shall be studied.

References
[1] J. Vytopil (ed), Formal Techniques in Real-Time and Fault-Tolerant Systems, pp.
[2] B. Dasarathy, et al, Timing constraints of real-time systems, IEEE Trans. Soft. Eng., SE-11, Jan. 1985, pp. 80-86.
[3] R. Gerber, et al, Compiler support for real-time programs, in S.H. Son (ed), Advances in real-time systems, Prentice Hall, 1995.
[4] K.M. Kavi, et al, Specification and analysis of real-time systems using CSP and Petri nets, in R. Mittal et al (ed), Fault-tolerant system and software, Narosa, 1996, pp. 141-147.
[5] M. Heisel, et al, Formal Specification of Safety-Critical Software with Z and CSP, in Proc. 15th International Conference on Computer Safety, Reliability and Security, Vienna, Oct. 1996, pp. 31-45.

2 Fault-tolerant system architecture with bounded fault handling time

The problem in this area which you are expected to address is: to study the fault-tolerant techniques which can be applied in real-time systems, and to study some existing fault-tolerant real-time systems.

References
[1] H. Kopetz, The time-triggered approach to real-time system design, in B. Randell, J.-C. Laprie, et al (ed), Predictably dependable computing systems, Springer, 1995.
[2] H.-P. Meske, et al, A Processor Architecture Designed to Facilitate the Safety Certification of Hard Real Time Systems, in Proc. 15th International Conference on Computer Safety, Reliability and Security, Vienna, Oct. 1996, pp. 31-45.
[3] Y. Chen, et al, Implementing Fault-Tolerance via Modular Redundancy with Comparison, IEEE Trans. on Reliability, Vol. 39, No. 2, June 1990, pp. 217-225.

3 Operating systems for predictable operations in a complex and unpredictable environment with multiprocessors and possible processor faults

The problem in this area which you are expected to address is: to examine some existing real-time operating systems, and the scheduling and resource management algorithms which ensure that timing requirements are met, possibly also considering the case of component faults.

References
[1] J. Stankovic, et al, A reflective architecture for real-time operating systems, in S.H. Son (ed), Advances in real-time systems, Prentice Hall, 1995.
[2] K. Shin, A software overview of HARTS: A distributed real-time system, in S.H. Son (ed), Advances in real-time systems, Prentice Hall, 1995.
[3] N. Audsley, Real-time system scheduling, in B. Randell, J.-C. Laprie, et al (ed), Predictably dependable computing systems, Springer, 1995.
[4] J. Lehoczky, Scheduling periodic and aperiodic tasks using slack stealing algorithm, in S.H. Son (ed), Advances in real-time systems, Prentice Hall, 1995.

4 Real-time communication

The problem in this area which you are expected to address is: to study real-time communication mechanisms which support real-time traffic in satisfying the timing constraints of individual messages, under the condition that some communication links may be broken.

References
[1] S. Rangarajan, A fault-tolerant protocol for location directory maintenance in mobile networks, in Proc. 25th International Symposium on Fault-Tolerant Computing, Pasadena, 1995, pp. 164-173.
[2] D. Ferrari, A new admission control method for real-time communication in an internet, in S.H. Son (ed), Advances in real-time systems, Prentice Hall, 1995.
[3] M. Hamdaoui, et al, Selection of Timed Token Protocol Parameters to Guarantee Message Deadlines, IEEE/ACM Trans. on Networking, Vol. 3, No. 3, June 1995, pp. 340-351.

5 Industry applications

The problem in this area which you are expected to address is: to review dependable real-time computer systems used in industry, and to analyse the industry requirements and how these requirements are met.

References
[1] R. Eriksen, et al, Reliability and vulnerability assessment as decision support during purchase and design of complex, technical systems, in Proc. 15th International Conference on Computer Safety, Reliability and Security, Vienna, Oct. 1996, pp. 207-218.
[2] H. Kantz, Ch. Koza, The Electra railway signalling-system, in Proc. 25th International Symposium on Fault-Tolerant Computing, Pasadena, 1995, pp. 453-463.
[3] Safety Analysis and Evaluation of an Air Traffic Control Computing System, in Proc. 15th International Conference on Computer Safety, Reliability and Security, Vienna, Oct. 1996, pp. 219-229.

Subject II

Software Dependability

With increasing software complexity in computer systems, software dependability is becoming an increasing concern for the dependability of the overall system. One way of increasing software dependability is to test the software thoroughly (software testing) and to make sure that its dependability achieves the given standard (dependability evaluation). In this subject, 2 self-study topics are offered.

6 Software Dependability Model

The problem in this area which you are expected to address is: to discuss the different software dependability models which are used to estimate software dependability. Discuss the relations and differences between these models.

References
[1] C. V. Ramamoorthy, et al, Software reliability - status and perspectives, IEEE Trans. Soft. Eng., SE-8, No. 4, July 1982, pp. 354-371.
[2] D. Hamlet, Connecting test coverage to software reliability, in Proc. 5th International Symp. on Software Reliability Engineering, Monterey, Nov. 1994, pp. 158-165.
[3] Y. Chen, et al, Modelling Software Dependability Growth under Input Partition Testing, in Proc. 15th International Conference on Computer Safety, Reliability and Security, Vienna, Oct. 1996, pp. 183-192.
[4] W.J. Gutjahr, et al, Failure risk estimation via Markov Software Usage Models, in Proc. 15th International Conference on Computer Safety, Reliability and Security, Vienna, Oct. 1996, pp. 136-145.

7 Comparing testing strategies

The problem in this area which you are expected to address is: to study different software testing strategies, especially random and partition testing. Discuss the relations and differences between these strategies.

References
[1] T. Y. Chen, et al, On the relationship between partition and random testing, IEEE Trans. Soft. Eng., SE-20, No. 12, Dec. 1994, pp. 977-980.
[2] J. W. Duran, et al, An evaluation of random testing, IEEE Trans. Soft. Eng., SE-10, July 1984, pp. 438-444.
[3] Y. Chen, et al, Comparing Software Testing Strategies Using Reliability Growth, in R. Mittal et al (ed), Fault-tolerant system and software, Narosa, 1996.
[4] D. Hamlet, R. Taylor, Partition testing does not inspire confidence, IEEE Trans. Soft. Eng., SE-16, No. 12, Dec. 1990, pp. 1402-1411.
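
As a starting point for topic 7, the following is a minimal sketch of one measure commonly used in the literature to compare random and partition testing: the probability of detecting at least one failure with a given number of test cases. The function names, the two-subdomain example and all numeric values are assumptions made for illustration; they are not results from the references above.

```python
def p_random(theta, n):
    """Probability that n random tests reveal at least one failure.

    theta: overall failure rate of the program (assumed known here).
    """
    return 1.0 - (1.0 - theta) ** n


def p_partition(thetas, ns):
    """Probability that partition testing reveals at least one failure.

    thetas[i]: failure rate inside subdomain i (assumed known here).
    ns[i]:     number of tests drawn at random from subdomain i.
    """
    prob_no_failure = 1.0
    for theta_i, n_i in zip(thetas, ns):
        prob_no_failure *= (1.0 - theta_i) ** n_i
    return 1.0 - prob_no_failure


# Hypothetical program: two equally likely subdomains, failures only in one.
weights = [0.5, 0.5]   # probability that an input falls in each subdomain
thetas = [0.0, 0.02]   # failure rate within each subdomain
overall_theta = sum(w * t for w, t in zip(weights, thetas))

random_detection = p_random(overall_theta, 10)        # 10 random tests
partition_detection = p_partition(thetas, [5, 5])     # 5 tests per subdomain
```

In this hypothetical case partition testing is only marginally better than random testing, and when all subdomains have the same failure rate the two probabilities coincide. How the failure rates are distributed over the subdomains is therefore the key question when comparing the two strategies.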