Publication date:
1996
Abstract:
The APEmille, the third evolution of the APE family of SIMD machines, is structured as a three-dimensional array of processors. In its largest configuration, the number of processors is 4096. Its typical application range covers massive computations (e.g., those needed to solve some problems in physics research), which may require as many as 10^17 floating point operations. Given the long time needed to complete such jobs, the machine should be able to tolerate the occurrence of multiple faults during the job execution. To this purpose, self-diagnosis capabilities have been incorporated in its design, using an approach inspired by a family of algorithms recently introduced to perform the system-level diagnosis of regular architectures. The machine is partitioned into three subsystems, each structured as a three-dimensional array, which are diagnosed separately using slightly different variants of the same diagnosis algorithm. The system units are tested by means of comparisons, either concurrently with the job execution or during special diagnosis sessions. The strategy to test the units and the diagnosis algorithms are described, and the diagnosis correctness and completeness are evaluated both theoretically and experimentally.
CRIS type:
04.01 Contribution in conference proceedings
Keywords:
Fault-tolerance; System-level diagnosis; Self-diagnosis; SIMD machines; Grid interconnection
Author list:
Maestrini, Piero