Reliability in LAM/MPI Requirements Specification

Abstract

This document describes the software requirements necessary to allow a parallel software application running on top of LAM/MPI to detect and recover from a catastrophic fault such as a compute node crash. The requirements include:

- Definition and categorization of failures to be handled by a reliable LAM/MPI application,
- The behavioral (implementation) and interface requirements for LAM/MPI to provide reliable execution capabilities, and
- The development of a preliminary design interface between LAM/MPI and an application wishing to recover from such an error.

Rights

This work is protected by copyright unless stated otherwise.