Reliability in LAM/MPI Requirements Specification
Abstract
This document describes the software requirements that allow a parallel application running on top of LAM/MPI to detect and recover from a catastrophic fault, such as a compute node crash. The requirements include:
- A definition and categorization of the failures to be handled by a reliable LAM/MPI application,
- The behavioral (implementation) and interface requirements for LAM/MPI to provide reliable execution capabilities, and
- A preliminary design of the interface between LAM/MPI and an application that wishes to recover from such a fault.
Rights
This work is protected by copyright unless stated otherwise.