;login: The Magazine of USENIX & SAGE
Standards

 

Our Standards Reports editor, David Blackwood, welcomes dialogue between this column and you, the readers. Please send your comments to <dave@usenix.org>.

SRASS and Checkpoint Restart: Snitch Report for the PASC .1h SRASS and .1m Checkpoint Restart Working Groups, July 2000

by Helmut Roth
<hroth@nswc.navy.mil>

Helmut Roth is the chair of the Services for Reliable, Available, and Serviceable Systems (SRASS) Working Group.

The PASC P1003.1h Services for Reliable, Available, and Serviceable Systems (SRASS) and P1003.1m Checkpoint Restart working groups met in Nashua, New Hampshire. The SRASS working group is developing a set of APIs for fault-management and serviceability applications. Its goal is to support fault-tolerant, serviceable, reliable, and available systems in a portable way. Where feasible, P1003.1h should also be useful for general applications, such as distributed, parallel, database, transaction, and safety-related systems.

Due to insufficient technical support and resources, the chair of the Checkpoint Restart working group has asked the PASC Sponsor Executive Committee (SEC) to withdraw support of P1003.1m. This request has been approved and will be forwarded to the IEEE so that support for this standard can be officially withdrawn. The working group felt that it could not devote enough time to solving all the difficult checkpointing problems at the scale this API requires.

The SRASS working group has requested that the PASC SEC approve a change to its Project Authorization Request (PAR). This will change the project from an amendment to IEEE Std 1003.1 to a stand-alone standard. Consequently, the SRASS P1003.1h project has been assigned the new project number P1003.25. The current work will continue, but the document will go out for a full ballot again. Although the draft is currently in ballot resolution, a full ballot will give it a more thorough review by the members of the ballot pool.

The logging APIs allow an application to log application-specific events and system events to a system log, and provide for the subsequent processing of those events. Fault-management applications can use this API to register for notification of events that enter the system log. Events of interest may be those that exceed some limit, and a notification can have a severity associated with it. A notification gives an application a way to react proactively and take steps to prevent a later system failure.

There is a single core-dump control API that lets an application specify the file path where the core dump file is written if a process terminates with a core dump.

A shutdown/reboot API has been included in the draft. After careful review, several options were considered for inclusion, such as fast shutdown, graceful shutdown, and optional features such as rebooting with optional scripts.

The configuration space-management API is intended to provide a portable method of traversing the configuration space and of manipulating the data content of nodes in that configuration space. This API will give fault-management applications access to underlying system-configuration information and the means to direct reconfiguration of the system.

These APIs will be ready for ballot once the current edits are incorporated into the draft, which has been delayed and is now being reworked. The ballot pool has been formed and is closed at this time. It closed after the July 1998 PASC meeting and the final signatures from the PASC chair, Lowell Johnson.
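As an illustration of the logging and notification model described above, the following is a minimal sketch in Python. It is not the P1003.1h interface: the draft's function names and signatures are not given in this report, so `EventLog`, `Event`, `Severity`, `register`, and `log` are all hypothetical names chosen for this example. It only shows the general idea of a fault-management application registering to be notified when an event at or above a chosen severity enters the log.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Tuple


class Severity(IntEnum):
    """Hypothetical severity levels; the draft's actual levels are not stated here."""
    INFO = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Event:
    source: str        # component that reported the event
    severity: Severity
    message: str


class EventLog:
    """Toy in-memory stand-in for a system log with notification registration."""

    def __init__(self) -> None:
        self.records: List[Event] = []
        self._subscribers: List[Tuple[Severity, Callable[[Event], None]]] = []

    def register(self, min_severity: Severity,
                 callback: Callable[[Event], None]) -> None:
        # Ask to be notified of any event at or above min_severity.
        self._subscribers.append((min_severity, callback))

    def log(self, event: Event) -> None:
        # Every event is recorded; only matching subscribers are notified.
        self.records.append(event)
        for min_severity, callback in self._subscribers:
            if event.severity >= min_severity:
                callback(event)


def demo() -> Tuple[EventLog, List[Event]]:
    log = EventLog()
    alerts: List[Event] = []
    # A fault manager that only wants to hear about ERROR and above.
    log.register(Severity.ERROR, alerts.append)
    log.log(Event("disk0", Severity.INFO, "self-test passed"))
    log.log(Event("disk0", Severity.ERROR, "correctable read errors rising"))
    return log, alerts
```

In this sketch both events are stored in the log, but only the second one reaches the registered callback, which is where a fault-management application could start preventive action before an outright failure.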

If you are interested in helping support fault management (including serviceability and fault-tolerance aspects of systems), please get in touch with Helmut Roth at <hroth@nswc.navy.mil>.

 

Last changed: 26 Dec. 2000 ah