POSIX.1h SRASS and POSIX.1m Checkpoint Restart

Helmut Roth <hroth@nswc.navy.mil> reports on the April 1999 meeting in Charlotte, NC.

The POSIX.1h Services for Reliable, Available, and Serviceable Systems (SRASS) and POSIX.1m Checkpoint Restart working groups are developing a set of APIs for fault management and serviceability applications. The goal of the SRASS Working Group is to support fault-tolerant, serviceable, reliable, and available systems in a useful, portable way. Where feasible, POSIX.1h also needs to be useful for general applications such as distributed parallel database transaction systems and safety-related systems. The work of Checkpoint Restart overlaps this goal, so the two working groups have effectively merged while still producing two separate standards.

The most recent Checkpoint Restart ballot achieved 59 percent approval, and the Working Group is currently resolving ballot objections and revising the draft. It is hoped that the next draft will be out for recirculation soon.

The SRASS document contains several groups of interfaces:

  • Logging interfaces, aimed at allowing an application to log application-specific and system events to a system log and subsequently to process those events. Fault-management applications can use these interfaces to register for the notification of events that enter the system log, providing a rudimentary event-management system. Notifications provide a way to proactively manage problems and initiate steps to prevent a system failure later. We now have copyright release from The Open Group to include syslog() to support backward compatibility. (Thanks to all of you who helped on this matter.)
  • A core-dump control interface that lets an application specify the pathname of the core-dump file to be used if the process terminates abnormally.
  • A shutdown/reboot interface supporting several options such as fast shutdown, graceful shutdown, rebooting with optional scripts, and so on.
  • Configuration-space management interfaces, intended to provide a portable method of traversing the configuration space and manipulating the data content of its nodes. These interfaces will give a fault-management application access to underlying system-configuration information and the means to direct reconfiguration of the system. The model has been changed from a tree-traversal mechanism to a directed graph, since a graph better represents the complex interrelations among configuration items.
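
As a rough illustration of the logging side only, the sketch below sends an application-specific event to the system log through the traditional syslog() call, which the draft includes for backward compatibility. It does not show the new SRASS logging or notification interfaces, whose names and signatures are not given in this report. The example uses Python's standard syslog module, a thin wrapper over the C syslog() function; the severity table, the record layout, and the names format_event, log_fault_event, and "srass-demo" are hypothetical and chosen just for this example.

```python
import syslog

# Hypothetical mapping from an application's own severity names
# to standard syslog priorities (an assumption for this sketch).
SEVERITY_TO_PRIORITY = {
    "info": syslog.LOG_INFO,
    "warning": syslog.LOG_WARNING,
    "error": syslog.LOG_ERR,
    "critical": syslog.LOG_CRIT,
}


def format_event(component, severity, detail):
    """Build a one-line event record; the layout is illustrative only."""
    return f"component={component} severity={severity} detail={detail}"


def log_fault_event(component, severity, detail):
    """Send an application event to the system log and return the text sent."""
    priority = SEVERITY_TO_PRIORITY[severity]
    message = format_event(component, severity, detail)
    syslog.syslog(priority, message)
    return message


if __name__ == "__main__":
    # Tag each entry with a program name and the process ID.
    syslog.openlog("srass-demo", syslog.LOG_PID, syslog.LOG_DAEMON)
    log_fault_event("disk0", "warning", "correctable read errors rising")
    syslog.closelog()
```

A fault-management application would consume such entries on the other side; with plain syslog() that means reading the log after the fact, which is the gap the notification interfaces described above are meant to close.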

The working group has yet to finalize the changes in the SRASS draft (draft 4.0). Additional meetings were planned for early June to finish ballot resolution on the previous draft and to get draft 4.0 completed.

If you are interested in helping support fault management (including serviceability and fault-tolerance aspects of systems), please get in touch with Helmut Roth <hroth@nswc.navy.mil> or Dr. Arun Chandra <achandra@vnet.ibm.com>. A mailing list for this work also exists at <srass@pasc.org>; to subscribe send email to <srass-request@pasc.org> with the body "subscribe".

 

Last changed: 18 Nov. 1999 jr