Fail-safeness in a multiprocessor system: a distributed strategy based on backward error recovery
Other Research Product
Publication Date:
1982
abstract:
A method for fault handling is presented, designed for multiprocessor systems supporting concurrent processes that cooperate through message exchange. The proposal is described in reference to a specific system, i.e., the MuTEAM prototype developed in Pisa; our requirement was that no erroneous output be generated by the system under a single-fault hypothesis. The fault-handling model adopted is based on backward error recovery: the set of all application processes is partitioned into disjoint subsets (called families), each of which is an atomic unit of recovery. Recovery points are established on communications among families. A single consistent recovery line is maintained, thereby avoiding the domino effect. The model does not rely on mass storage devices; rather, the recovery information pertinent to all the processes is kept in the distributed main memory of the system.
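The recovery scheme outlined in the abstract can be illustrated with a minimal sketch. This is not the MuTEAM implementation: the class names (`Family`, `System`), the state layout, and the choice to refresh the whole recovery line right after each inter-family message is delivered are all assumptions made for illustration. The sketch only shows how a single in-memory recovery line, updated on inter-family communication, lets every family roll back to one mutually consistent state.

```python
import copy


class Family:
    """Hypothetical recovery unit: a group of processes modelled as one state dict."""

    def __init__(self, name):
        self.name = name
        self.state = {"counter": 0, "inbox": []}


class System:
    """Hypothetical system holding all families and one global recovery line."""

    def __init__(self, names):
        self.families = {n: Family(n) for n in names}
        self.recovery_line = {}
        self.establish_recovery_line()

    def establish_recovery_line(self):
        # Recovery information is kept in main memory (a plain dict here),
        # not on mass storage. One snapshot per family forms the line.
        self.recovery_line = {
            n: copy.deepcopy(f.state) for n, f in self.families.items()
        }

    def local_step(self, name):
        # Computation inside a family does not create a recovery point.
        self.families[name].state["counter"] += 1

    def send(self, src, dst, payload):
        # Assumption: the recovery line is refreshed once the message has
        # been delivered, so sender and receiver snapshots agree on it.
        self.families[dst].state["inbox"].append((src, payload))
        self.establish_recovery_line()

    def recover(self):
        # Backward error recovery: every family returns to the single
        # consistent line, so no cascade of further rollbacks is needed.
        for n, f in self.families.items():
            f.state = copy.deepcopy(self.recovery_line[n])


system = System(["A", "B"])
system.local_step("A")
system.send("A", "B", "hello")   # inter-family message: line is refreshed
system.local_step("A")
system.local_step("B")
system.recover()                 # simulated fault: roll back to the line
```

Because there is only one recovery line and it always records a state in which each delivered message is known to both sender and receiver, rolling back never invalidates an older recovery point, which is the sense in which the domino effect is avoided in this simplified model.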
Iris type:
05.12 Other
Keywords:
Multiprocessor system; Distributed strategy; Backward error recovery
List of contributors: