USENIX - WIESS '02 - Examples of WIESS-like papers

Jeffrey C. Mogul
wiess02chair@usenix.org
Program Chair, WIESS-2
Compaq Western Research Lab

INTRODUCTION

The goal of WIESS-2, the Second Workshop on Industrial Experiences with Systems Software, is

[to] help bridge the gulf between industrial experience with systems software (such as operating systems, compilers, large applications, embedded systems, etc.) and the research community. We're looking for submissions from people from industry, reporting on their experiences, successes, failures, and needs. We're also looking for submissions from researchers, reporting on their own experiences in relevant areas such as technology transfer, collaborations between research and industry, etc.

(More information about WIESS-2 is available at https://www.usenix.org/events/wiess02/.)

Potential authors planning to submit a paper to a conference or workshop often look at the papers accepted by previous conferences, to see what might be expected. WIESS is explicitly "meant to provide a venue for material that would not normally be accepted at a conference such as SOSP or OSDI," so it's hard to know what kind of papers WIESS is looking for.

I've written this short article to answer that question. It lists a sampling of papers, published at previous conferences and workshops, that fit our idea of the kind of paper WIESS-2 would accept. This is a somewhat arbitrary sample; certainly there are many good "WIESS-like" papers that aren't listed here.

Since most of these papers were presented at traditional conferences, they might be more "scientific" than is necessary for a WIESS workshop paper, and more polished than we would expect for an initial submission.

CATEGORIES

I found it useful to think about these papers as falling into several categories:

Design of novel software products: There is often a big gap between research-generated innovations and commercially successful product innovations. Real products have to solve problems that researchers don't always find interesting. For example, Greg Minshall, Drew Major, and Kyle Powell described the design of the NetWare operating system, which "differs in several respects from more general-purpose operating systems" because it had very specific goals [6]. Gaurav Banga described how a server appliance product could automatically diagnose its own failures, which is important for a product sold (in large part) on the basis of lower management costs [1].
Mission-critical systems: Researchers often focus on making a system faster. Customers are often willing to pay more for a system that works well, and works always, than for one that is faster. This often means finding ways to add new features without breaking something that already works. Van Oleson, Karsten Schwan, Greg Eisenhouer, Beth Plale, Calton Pu, and Dick Amin described how Delta Airlines was able to tap into its legacy transaction-processing system to add new features without destabilizing the airline's core systems [7].
Applications: Papers often pay more attention to generic system software (such as operating systems and compilers) than to the applications that rely on such software, or to whether the applications get what they need. Papers that describe how real applications are built on top of existing system software can be useful to guide the future evolution of such software. For example, Nick Christenson, David Beckemeyer, and Trent Baker wrote about how they were able to "scale USENET News service to very large numbers of users" [2].
Failures: Research papers typically describe successes. As engineers, we learn as much (or more) from failure as from success. Microkernel operating systems have been successful in the research world, for example, but Freeman Rawson described how an attempt to apply them to a large commercial OS project did not work as expected [5]. Brett Fleisch provides an outsider's point of view of the same project [3]; Fleisch and Rawson don't always agree, partly because Fleisch (as an outsider) had to speculate about some of the details.
Challenges: You can write a good paper even if you don't know how to solve a problem. Just explaining what the problem is, if the explanation is clear, can be valuable. Researchers are always looking for problems to solve, and can't always get these problems from their own experience, or perhaps don't have a clear understanding of the problem. A good paper can challenge us to look at something we haven't been working on. For example, Margo Seltzer and a group of her students have described some of the research problems in "no-futz computing" [4].

Please note that WIESS is certainly interested in papers that fall outside these categories!

A NOTE ON THE BIBLIOGRAPHY

The bibliography lists the papers in roughly alphabetical order by first author; each entry includes the paper's abstract. URLs are provided for papers available online.

BIBLIOGRAPHY

1

Gaurav Banga.
Auto-diagnosis of field problems in an appliance operating system.
In Proc. USENIX 2000 Technical Conference, pages 293-306, San Diego, CA, June 2000. USENIX.
https://www.usenix.org/publications/library/proceedings/usenix2000/general/banga.html.

Abstract: The use of network appliances, i.e., computer systems specialized to perform a single function, is becoming increasingly widespread. Network appliances have many advantages over traditional general-purpose systems such as higher performance/cost metrics, easier configuration and lower costs of management.

Unfortunately, while the complexity of configuration and management of network appliances in normal usage is much lower than that of general-purpose systems, this is not always true in problem situations. The debugging of configuration and performance problems with appliance computers is a task similar to the debugging of such problems with general-purpose systems, and requires substantial expertise.

This paper examines the issues of appliance-like management and performance debugging. We present a number of techniques that enable appliance-like problem diagnosis. These include continuous monitoring for abnormal conditions, diagnosis of configuration problems of network protocols via protocol augmentation, path-based problem isolation via cross-layer analysis, and automatic configuration change tracking. We also describe the use of these techniques in a problem auto-diagnosis subsystem that we have built for the Data ONTAP operating system. Our experience with this system indicates a significant reduction in the cost of problem debugging and a much simpler user experience.

2

Nick Christenson, David Beckemeyer, and Trent Baker.
A scalable news architecture on a single spool.
;login:, 22(3):XXX-XXX, June 1997.
https://www.jetcafe.org/~npc/doc/news_arch.html.

Abstract: This article describes a scalable news architecture built around a single news spool. This architecture is in operation at EarthLink, supporting over 1700 concurrent newsreading connections and will scale, without alteration of the general principles, to well over 10000 concurrent client connections. This article enumerates the advantages of such a system, the problems that we have overcome, and the hurdles we are likely to face in the near future.

3

Brett D. Fleisch.
The Failure of Personalities to Generalize.
In Proc. 6th Workshop on Hot Topics in Operating Systems, pages 8-13, Cape Cod, MA, May 1997. IEEE Computer Society.
https://www.cs.ucr.edu/~brett/PAPERS/HOT_OS97/FINAL/hot-final-ieee.ps.

Abstract: IBM's adoption of operating system personalities was one of the most publicized issues in operating systems design. The basic premise of Workplace OS work was: 1) IBM would adopt and improve the CMU Mach 3.0 microkernel for use on PDAs, the desktop, and massively parallel machines, and 2) that several operating system personalities would execute on the microkernel platform concurrently. This architecture would provide users the best worlds as they switch between applications written for different operating systems. IBM would also benefit from significant cost savings by having one common platform for all product lines.

IBM's plans for use of the microkernel and multiple-personalities, as a unifying mechanism for a widely diverse set of hardware products, have failed. Here we examine why IBM's microkernel and multi-personality system was not successful from a technical and business standpoint. We also discuss Power Personal systems, which were introduced during these radical software changes, and then later abandoned.

4

David A. Holland, William Josephson, Ada Lim, Kostas Magoutis, Margo I. Seltzer, and Christopher A. Stein.
Research issues in no-futz computing.
In Proc. 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), pages 106-110, Elmau, Germany, May 2001. IEEE Computer Society.

Abstract: At the 1999 Workshop on Hot Topics in Operating Systems (HotOS VII), the attendees reached consensus that the most important issue facing the OS research community was No-Futz computing; eliminating the ongoing futzing that characterizes most systems today. To date, little research has been accomplished in this area. Our goal in writing this paper is to focus the research community on the challenges we face if we are to design systems that are truly futz-free, or even low-futz.

5

Freeman L. Rawson III.
Experience with the Development of a Microkernel-Based, Multi-Server Operating System.
In Proc. 6th Workshop on Hot Topics in Operating Systems, pages 2-7, Cape Cod, MA, May 1997. IEEE Computer Society.

Abstract: During the first half of the 1990s IBM developed a set of operating system products called Workplace OS that was based on the Mach 3.0 microkernel and Taligent's object-oriented TalOS. These products were intended to be scalable, portable and capable of concurrently running multiple operating system personalities while sharing as much code as possible between personalities. Based on the design suggested by Julin and others, the operating system personalities were constructed out of a set of user-level personality and personality-neutral servers and libraries. While we made a number of important changes to Mach 3.0, we maintained its fundamentals and the multi-server design throughout our project. In evaluating the resulting system, a number of problems are apparent. There is no good way to factor multiple existing systems into a set of functional servers without creating excessively large and complex servers. In addition, the message-passing nature of the microkernel turns out to be a poor match for the characteristics of modern processors, causing performance problems. Finally, the use of fine-grained objects complicated the design and further reduced the performance of the system. Based on this experience, I believe that more modest, more targeted operating systems consume fewer resources, offer better performance and can provide the desired semantics with fewer compromises.

6

Greg Minshall, Drew Major, and Kyle Powell.
An Overview of the NetWare Operating System.
In Proc. Winter 1994 USENIX Technical Conference, pages 355-372, San Francisco, CA, January 1994. USENIX.
https://www.usenix.org/publications/library/proceedings/sf94/minshall.html.

Abstract: The NetWare operating system is designed specifically to provide service to clients over a computer network. This design has resulted in a system that differs in several respects from more general-purpose operating systems. In addition to highlighting the design decisions that have led to these differences, this paper provides an overview of the NetWare operating system, with a detailed description of its kernel and its software-based approach to fault tolerance.

7

Van Oleson, Karsten Schwan, Greg Eisenhouer, Beth Plale, Calton Pu, and Dick Amin.
Operational Information Systems - An Example from the Airline Industry.
In Proc. 1st Workshop on Industrial Experiences with Systems Software (WIESS 2000), pages 1-10, San Diego, CA, October 2000. USENIX.
https://www.usenix.org/publications/library/proceedings/osdi2000/wiess2000/oleson.html.

Abstract: Our research is motivated by the scaleability, availability, and extensibility challenges in deploying open systems based, enterprise operational applications. We present Delta's mid-tier Operational Information Systems (OIS) as an approach for leveraging its legacy operational OLTP infrastructure, to participate in the emerging world of electronic commerce, as well as enable new applications. The approach is to place minimally intrusive 'taps' into the legacy OLTP systems to capture transactions as they occur for consistent replay in the mid-tier OIS. One important issue addressed by our work is the processing, and dissemination of information in the mid-tier system itself, potentially serving hundreds of thousands of access and display points, distributed across a highly geographically distributed system (e.g. airports world wide), and also involving large 'working sets' of operational data, used by applications that require rapid response and also rapid recovery from failures. To address the scaleability, availability, and cost of this OIS infrastructure, we are researching cluster computing techniques, as well as, devising replication and failover techniques. To address the communications scaleability requirements, we are experimenting with novel event-based implementations of information transport and processing, that include reliable multicast variations.