Abstracts - 1997 ANNUAL TECHNICAL CONFERENCE
Adaptive and Reliable Parallel Computing
on Networks of Workstations
Robert D. Blumofe, University of Texas, and Philip A. Lisiecki, MIT
Abstract
In this paper, we present the design of Cilk-NOW, a runtime system
that adaptively and reliably executes functional Cilk programs in
parallel on a network of UNIX workstations. Cilk (pronounced "silk")
is a parallel multithreaded extension of the C language, and all Cilk runtime
systems employ a provably efficient thread-scheduling algorithm. Cilk-NOW
is such a runtime system, and in addition, Cilk-NOW automatically delivers
adaptive and reliable execution for a functional subset of Cilk programs.
By adaptive execution, we mean that each Cilk program dynamically utilizes
a changing set of otherwise-idle workstations. By reliable execution, we
mean that the Cilk-NOW system as a whole and each executing Cilk program
are able to tolerate machine and network faults. Cilk-NOW provides these
features while programs remain fault oblivious, meaning that Cilk
programmers need not code for fault tolerance. Throughout this paper, we
focus on end-to-end design decisions, and we show how these decisions allow
the design to exploit high-level algorithmic properties of the Cilk programming
model in order to simplify and streamline the implementation.