Jeffrey C. Mogul
wiess02chair@usenix.org
Program Chair, WIESS-2
Compaq Western Research Lab
INTRODUCTION
The goal of WIESS-2, the Second Workshop on Industrial
Experiences with Systems Software, is
[to] help bridge the
gulf between industrial experience with systems software
(such as operating systems, compilers, large applications,
embedded systems, etc.) and the research community. We're
looking for submissions from people from industry,
reporting on their experiences, successes, failures, and needs.
We're also looking for submissions from researchers,
reporting on their own experiences in relevant areas such as
technology transfer, collaborations between research and
industry, etc.
(More information about WIESS-2
is available at https://www.usenix.org/events/wiess02/.)
Potential authors planning to submit a paper to a conference or
workshop often look at the papers accepted by previous conferences,
to see what might be expected. WIESS is explicitly
"meant to provide a venue for material that would
not normally be accepted at a conference such as SOSP or OSDI,"
so it's hard to know what kind of papers WIESS is looking for.
I've written this short article to answer that question. It lists
a sampling of papers, published at previous conferences and workshops,
that fit our idea of the kind of paper WIESS-2 would accept.
This is a somewhat arbitrary sample; certainly there are
many good "WIESS-like" papers that aren't listed here.
Since most of these papers were presented at traditional conferences,
they might be more "scientific" than is necessary
for a WIESS workshop paper, and more polished than we would
expect for an initial submission.
CATEGORIES
I found it useful to think about these papers as falling into
several categories:
- Design of novel software products
There is often a big gap between research-generated innovations and
commercially-successful product innovations. Real products have to
solve problems that researchers don't always find interesting.
For example, Greg Minshall, Drew Major, and Kyle Powell described
the design of the NetWare operating system, which "differs in several
respects from more general-purpose operating systems" because it had
very specific goals [6].
Gaurav Banga described how a server appliance product could automatically
diagnose its own failures, which matters for a product sold largely
on the promise of lower management costs [1].
- Mission-critical systems
Researchers often focus on making a system faster. Customers
are often willing to pay more for a system that works well, and
works always, than for one that is faster. This often means finding
ways to add new features without breaking something that already works.
Van Oleson, Karsten Schwan, Greg Eisenhauer, Beth Plale, Calton Pu,
and Dick Amin described how Delta Air Lines was able to tap into its
legacy transaction-processing system to add new features without
destabilizing the airline's core systems [7].
- Applications
Research papers often pay more attention to generic system software
(such as operating systems and compilers) than to the applications that
rely on such software, or to whether those applications get what they need.
Papers that describe how real applications are built on top of existing
system software can be useful to guide the future evolution of such software.
For example, Nick Christenson, David Beckemeyer, and Trent Baker
wrote about how they were able to "scale USENET News service
to very large numbers of users" [2].
- Failures
Research papers typically describe successes.
As engineers, we learn as much (or more) from failure as
from success. Microkernel operating systems have been successful in
the research world, for example, but Freeman Rawson described how
an attempt to apply these to a large commercial OS project did not
work as expected [5]. Brett Fleisch provided an outsider's
point of view of the same project [3]; Fleisch and
Rawson don't always agree, partly because Fleisch (as an outsider) had
to speculate about some of the details.
- Challenges
You can write a good paper even if you don't know how to solve a problem.
Just explaining what the problem is, if the explanation is clear, can
be valuable. Researchers are always looking for problems to solve, but
cannot always find them in their own experience, or may not clearly
understand the ones they do find. A good paper can
challenge us to look at something we haven't been working on.
For example, Margo Seltzer and a group of her students have described
some of the research problems in "no-futz computing" [4].
Please note that WIESS is certainly interested in papers that fall
outside these categories!
A NOTE ON THE BIBLIOGRAPHY
The bibliography lists the papers in alphabetical order; it includes
the abstract for each paper. URLs are provided for papers available
online.
BIBLIOGRAPHY
[1] Gaurav Banga.
Auto-diagnosis of field problems in an appliance operating system.
In Proc. USENIX 2000 Technical Conference, pages 293-306, San
Diego, CA, June 2000. USENIX.
https://www.usenix.org/publications/library/proceedings/usenix2000/general/banga.html.
Abstract: The use of network appliances,
i.e., computer systems specialized to perform a single function, is becoming
increasingly widespread. Network appliances have many advantages over
traditional general-purpose systems such as higher performance/cost metrics,
easier configuration and lower costs of management.
Unfortunately, while
the complexity of configuration and management of network appliances in
normal usage is much lower than that of general-purpose systems, this is not
always true in problem situations. The debugging of configuration and
performance problems with appliance computers is a task similar to the
debugging of such problems with general-purpose systems, and requires
substantial expertise.
This paper examines the issues of appliance-like
management and performance debugging. We present a number of techniques that
enable appliance-like problem diagnosis. These include continuous monitoring
for abnormal conditions, diagnosis of configuration problems of network
protocols via protocol augmentation, path-based problem isolation via
cross-layer analysis, and automatic configuration change tracking. We also
describe the use of these techniques in a problem auto-diagnosis subsystem
that we have built for the Data ONTAP operating system. Our experience with
this system indicates a significant reduction in the cost of problem
debugging and a much simpler user experience.
[2] Nick Christenson, David Beckemeyer, and Trent Baker.
A scalable news architecture on a single spool.
;login:, 22(3):XXX-XXX, June 1997.
https://www.jetcafe.org/~npc/doc/news_arch.html.
Abstract: This article describes a scalable news architecture
built around a single news spool. This architecture is in operation at
EarthLink, supporting over 1700 concurrent newsreading connections and will
scale, without alteration of the general principles, to well over 10000
concurrent client connections. This article enumerates the advantages of such
a system, the problems that we have overcome, and the hurdles we are likely
to face in the near future.
[3] Brett D. Fleisch.
The Failure of Personalities to Generalize.
In Proc. 6th Workshop on Hot Topics in Operating Systems, pages
8-13, Cape Cod, MA, May 1997. IEEE Computer Society.
https://www.cs.ucr.edu/~brett/PAPERS/HOT_OS97/FINAL/hot-final-ieee.ps.
Abstract: IBM's adoption of operating system
personalities was one of the most publicized issues in operating systems
design. The basic premise of Workplace OS work was: 1) IBM would adopt and
improve the CMU Mach 3.0 microkernel for use on PDAs, the desktop, and
massively parallel machines, and 2) that several operating system
personalities would execute on the microkernel platform concurrently. This
architecture would provide users the best worlds as they switch between
applications written for different operating systems. IBM would also benefit
from significant cost savings by having one common platform for all product
lines.
IBM's plans for use of the microkernel and
multiple-personalities, as a unifying mechanism for a widely diverse set of
hardware products, have failed. Here we examine why IBM's microkernel and
multi-personality system was not successful from a technical and business
standpoint. We also discuss Power Personal systems, which were introduced
during these radical software changes, and then later abandoned.
[4] David A. Holland, William Josephson, Ada Lim, Kostas Magoutis,
Margo I. Seltzer, and Christopher A. Stein.
Research issues in no-futz computing.
In Proc. 8th Workshop on Hot Topics in Operating Systems
(HotOS-VIII), pages 106-110, Elmau, Germany, May 2001. IEEE Computer
Society.
Abstract: At the 1999 Workshop on Hot Topics in Operating
Systems (HotOS VII), the attendees reached consensus that the most important
issue facing the OS research community was No-Futz computing; eliminating the
ongoing futzing that characterizes most systems today. To date, little
research has been accomplished in this area. Our goal in writing this paper
is to focus the research community on the challenges we face if we are to
design systems that are truly futz-free, or even low-futz.
[5] Freeman L. Rawson III.
Experience with the Development of a Microkernel-Based, Multi-Server
Operating System.
In Proc. 6th Workshop on Hot Topics in Operating Systems, pages
2-7, Cape Cod, MA, May 1997. IEEE Computer Society.
Abstract: During the first half of the 1990s
IBM developed a set of operating system products called Workplace OS that was
based on the Mach 3.0 microkernel and Taligent's object-oriented TalOS. These
products were intended to be scalable, portable and capable of concurrently
running multiple operating system personalities while sharing as much code as
possible between personalities. Based on the design suggested by Julin and
others, the operating system personalities were constructed out of a set of
user-level personality and personality-neutral servers and libraries. While
we made a number of important changes to Mach 3.0, we maintained its
fundamentals and the multi-server design throughout our project. In
evaluating the resulting system, a number of problems are apparent. There is
no good way to factor multiple existing systems into a set of functional
servers without creating excessively large and complex servers. In addition,
the message-passing nature of the microkernel turns out to be a poor match
for the characteristics of modern processors, causing performance problems.
Finally, the use of fine-grained objects complicated the design and further
reduced the performance of the system. Based on this experience, I believe
that more modest, more targeted operating systems consume fewer resources,
offer better performance and can provide the desired semantics with fewer
compromises.
[6] Greg Minshall, Drew Major, and Kyle Powell.
An Overview of the NetWare Operating System.
In Proc. Winter 1994 USENIX Technical Conference, pages
355-372, San Francisco, CA, January 1994. USENIX.
https://www.usenix.org/publications/library/proceedings/sf94/minshall.html.
Abstract: The NetWare operating system is
designed specifically to provide service to clients over a computer network.
This design has resulted in a system that differs in several respects from
more general-purpose operating systems. In addition to highlighting the
design decisions that have led to these differences, this paper provides an
overview of the NetWare operating system, with a detailed description of its
kernel and its software-based approach to fault tolerance.
[7] Van Oleson, Karsten Schwan, Greg Eisenhauer, Beth Plale, Calton Pu, and
Dick Amin.
Operational Information Systems - An Example from the Airline
Industry.
In Proc. 1st Workshop on Industrial Experiences with Systems
Software (WIESS 2000), pages 1-10, San Diego, CA, October 2000. USENIX.
https://www.usenix.org/publications/library/proceedings/osdi2000/wiess2000/oleson.html.
Abstract: Our research is motivated by the
scaleability, availability, and extensibility challenges in deploying open
systems based, enterprise operational applications. We present Delta's
mid-tier Operational Information Systems (OIS) as an approach for leveraging
its legacy operational OLTP infrastructure, to participate in the emerging
world of electronic commerce, as well as enable new applications. The
approach is to place minimally intrusive 'taps' into the legacy OLTP systems
to capture transactions as they occur for consistent replay in the mid-tier
OIS. One important issue addressed by our work is the processing, and
dissemination of information in the mid-tier system itself, potentially
serving hundreds of thousands of access and display points, distributed
across a highly geographically distributed system (e.g. airports world wide),
and also involving large 'working sets' of operational data, used by
applications that require rapid response and also rapid recovery from
failures. To address the scaleability, availability, and cost of this OIS
infrastructure, we are researching cluster computing techniques, as well as,
devising replication and failover techniques. To address the communications
scaleability requirements, we are experimenting with novel event-based
implementations of information transport and processing, that include
reliable multicast variations.