|
Paper - 1999 USENIX Annual Technical Conference, June 6-11, 1999, Monterey, California, USA   
[Technical Program]
Porting the Coda File System to Windows Peter J. Braam, Carnegie Mellon University
We first describe how the Coda distributed filesystem was ported to Windows 95 and 98. Coda consists of user level cache managers and servers and kernel level code for filesystem support. Severe reentrancy difficulties in the Win32 environment on this platform were overcome by extending the DJGPP DOS C compiler package with kernel level support for sockets and more flexible memory management. With this support library and kernel modules for Windows 9x filesystems in place, the Coda file system client could be ported with very little patching and will likely soon run as well on Windows 9x as on Linux. We ported Coda file servers to Windows NT. For fileservers the Cygwin32 kit was used. We will not report here on the port of the Coda client to Windows NT, which is in an early stage. In both cases cross compilation from a Linux environment was most helpful to get a good development environment. Introduction The purpose of this paper is to convey our progress on porting a sophisticated distributed file system running on the Unix platform to Windows NT and Windows 9x. Coda [1], [10] boasts many valuable features such as read/write server replication, a persistent client cache, a good security model, access control lists, disconnected and low bandwith operation for laptops, ability to continue operation in the presence of network and server failures and well-defined consistency semantics. Using Coda as a vehicle to study the feasibility of porting complex Unix systems to Windows should be very interesting. This paper is a status report on our progress in this direction. The Coda project began in 1987 with the goal of building a distributed file system that had the location transparency, scalability and security characteristics of AFS [11] but offered substantially greater resilience in the face of failures of servers or network connections. As the project evolved, it became apparent that Coda's mechanisms for high availability provided an excellent base for exploring the new field of mobile computing. Coda pioneered the concept of disconnected operation and was the first distributed file system to provide this capability [13]. Coda has been the vehicle for many other original contributions including read/write server replication [1], log-based directory resolution [14], application-specific conflict resolution [15], exploitation of weak connectivity [12], isolation-only transactions, and translucent cache management [17]. Coda has been in daily use for several years now on a variety of hardware platforms running the Mach 2.6, Linux, NetBSD and FreeBSD operating systems, and most of its features are working satisfactorily. Therefore we have started to focus on further improvement of performance and ports of Coda to other platforms. Ports to NetBSD, FreeBSD and Linux have confirmed that the underlying code relies almost solely on the BSD Unix API -- as intended -- and avoids Mach specific features to the maximum possible extent. During these ports various subsystems were recognized to depend too closely BSD specific file system data structures, and we replaced these with platform independent code, anticipating that this would greatly ease further ports in particular those to Windows. Our optimism about porting the user level code to Windows NT or Windows 95 using a POSIX implementation turned out to be justified. Our code compiled with very few patches. We will describe how this led to Coda servers running on Windows 9x and Windows NT. However, a filesystem also depends on kernel code, and it turned out to be much less trivial to get that working on Windows 9x. We will describe these difficulties in detail. We will not report here on the kernel module for the Coda client on Windows NT. This paper is organized by expanding briefly on the features and implementation of Coda. We then summarize some of the differences between the Win32 API and the Unix API which affect Coda. Finally we describe how we overcame serious difficulties to get clients running on Windows 9x. We finish by describing the porting effort for servers to Windows NT, and by summarizing the lessons we learned.
The file server Vice is implemented entirely as a user-level program servicing network requests from a variety of clients. (For performance reasons, Vice can use a few custom system calls to access files by inode, but this is a detail.) On the client the kernel module does some caching of names and attributes but is mostly there to re-direct system calls to Venus. The kernel module is called the Minicache. The user level programs make sophisticated use of the standard Unix API and they are multithreaded using a user level co-routine thread package. The filesystem metadata on clients and servers are mapped into the Vice/Venus address space and are manipulated using a lightweight transaction package called rvm [16]. The experience of the Coda project has been that placing complicated code in user-level processes offers tremendous development advantages without incurring unacceptable performance compromises [2]. This arrangement is shown in figure 1. Figure 1 To the greatest extent possible, we wanted to replicate this structure -- and reuse code -- in the Windows ports. We faced several challenges in doing so. First, the user-level programs themselves represent a significant porting effort. Second, we must provide a kernel module to provide the Win32 file system services for Coda volumes. The second task has several components. The kernel module must translate Win32 requests to requests that Venus can service. Since the Venus interface was designed to do this for BSD Unix filesystems, and not for Win32 filesystems, some plumbing between the two filesystem models was needed. More fundamentally, given that the design uses a user level cache manager, we had to provide an environment for the user-level code that reproduces the concurrency properties of Unix upon which the Coda implementation model is based. This is discussed in more detail below. This appeared to be a highly difficult task for Windows 9x. It did not represent a fundamental problem for the Windows NT port. On DOS and Coda for Windows 9x We rapidly concluded that the limitations of the Win32 API implementation in Windows 9x ruled it out as a host environment for Venus. Two possibilities at least remained. A somewhat unrealistic option would have been to implement all of Coda in the installable filesystem VMM component. A more attractive alternative was to implement the cache manager not as a Win32 process but as a DOS program. While Win32 applications often contend for the single Win16Mutex, the Windows 9x design actually gives DOS programs significantly better treatment: the actual VMM core is more reasonably flexible OS kernel which, like a Unix kernel, permits multiple VMs to have simultaneous outstanding system calls. This approach sidesteps the Win16Mutex problem and frees us from worrying about further non-reentrancy in the Windows 95 Win32 implementation. Furthermore, the application environment in a DOS box need not be as hostile as one might imagine: D.J. Delorie's DJGPP port of gcc provides a 32bit Unix-like libc, and Phar Lap sells a Win32-subset implemention that runs in DOS boxes. We decided to explore the possibility of hosting Venus in a DOS box, using DJGPP’s compiler and libc. The first step was to identify what APIs Venus needed that were not already provided. They were:
The TCP/IP implementation in Windows 9x runs in kernel mode as part of the VMM. The published Win32 networking API is implemented by a user-level library that uses undocumented VMM calls. There is no built-in support for TCP/IP access from a DOS box. However, there is a sketchily-documented internal VMM interface to the networking stack. The approach we adopted was to use this internal interface to implement a sockets-like API which could be exposed to DOS boxes. A second, related issue was support for the select() call that drives the Venus event loop. This select() call needs to wait on multiple UDP and TCP sockets and simultaneously on the queue of requests from the Minicache. To support this, our kernel sockets module exposes an interface that the Minicache uses to participate in select() calls. Finally, as mentioned above, Venus uses a package called rvm that provides a transactional, memory-resident database of filesystem metadata. This package relies on being able to read the database contents into the same region of virtual memory every time Venus runs. Unfortunately, DPMI, the interface that allows 32-bit applications running in DOS boxes to allocate memory, does not permit the application to specify the virtual address where the newly-allocated memory block will land. (To be precise: there is a version of DPMI that does specify a way to do this, but the Windows 95 implementation does not implement the feature.) As a result, we wrote a kernel module that implemented a separate memory allocation system for 32-bit DOS applications and which does permit the application to choose which virtual pages to allocate. Once these pieces were in place, our port became surprisingly straightforward. To manage our source code effectively we resisted the temptation to go native. Instead, we cross-compile from Linux workstations. For debugging, we use gdb’s remote debugging feature: a specially modified debugging stub runs Venus under debugger on Windows 95, communicating over a TCP socket (using our socket implementation) with a gdb process running on Linux. The result is a highly effective work environment for Coda development: we can compile and debug the Windows 95 Venus without leaving xemacs! It is perhaps worth mentioning two other aspects of our approach which significantly speeded the porting process. First, we implemented a work-around for Windows 95’s lack of support of dynamically loaded filesystem drivers. We wrote a small shim filesystem driver that would load and register itself at boot time as required by the Windows 95 architecture. All actual filesystem requests it would divert to a dynamically-loaded Minicache module that we could update multiple times without rebooting. Second, during development we modified Venus to use an intermediate relay program to converse with the Minicache. That is, instead of having Venus read requests from the Minicache and send replies directly back, we had Venus read and reply to Minicache requests using a UDP socket. A separate relay program would transfer requests and replies between the Minicache and the UDP socket that Venus used. This had a couple significant advantages: a) the relay program provided an excellent debugging log of interactions between Venus and the Minicache and helped identify quickly, when a bug occurred, which component was at fault; b) by using a modified relay program, it was possible to test Venus with synthetic requests. Coda Client on Windows 9x: Results The experiment of porting the Coda client on Windows 95 has been gratifyingly successful. Although it is not stable enough to use in production, it is possible to install Microsoft Office into a Coda filesystem and then use the resulting installation to do real work—including using Office while disconnected from the Coda server cluster. In fact, it has even worked to install Office into a Coda filesystem while in disconnected mode and then to allow the reintegration process to update the servers with all hundreds-of-megabytes of Office code. While it is too early to report precise figures, Coda performance when in connected mode with a cluster of Linux Coda servers is within a factor of two or so of the built-in SMB client when run against Samba on similar Linux servers. Despite the enormous differences between Unix and Windows environments Coda, with the exception of the kernel modules, builds from a single source archive, with very little conditional compilation. The overall arrangements of source are shown in figure 2. Figure 2 Coda servers on Windows 95 and NT The Coda servers are much easier to port. They do not rely on the upcall mechanism and do not need any kernel support for their operation. Here we chose to use the Cygnus Cygwin32 library [18], which implements a Unix C library on the Win32 environment. This port proceeded smoothly and led to working servers within a few months. During the port of Coda to Linux those subsystems which appeared to rely on BSD specific data structures defined by the host environment were replaced with data structures private to the Coda package, that could exist on all platforms. This strategy proved very valuable, for porting both servers and clients, since the cache manager and file server compiled with very few patches under DJGPP and Cygwin. One notable difficulty is that Windows NT executes asynchronous procedure calls which do not appear to mix well with Coda's user level thread library. While a work around could be found by modifying Cygnus' select call implementation, we expect that a better solution will come forward when we implement Coda threads on NT Fibers. Initial impression were that the performance will need tuning and that some more native features are desirable, particularly on Windows NT. Summary and lessons learned Using a variety of freely available software packages, a very complex porting problem could be tackled. A severe lack of clear documentation on Microsoft's part, made us go wrong in several ways before finding a solution to our problems. The kernel environment for Windows 95, while difficult and hostile as a development environment, allowed the implementation of socket, special memory allocation and filesystem support in a way that circumvented the user level Win32/Win16 mutex problems. Porting our servers was comparatively straightforward using the Cygwin32 library from Cygnus. Looking back at our work, we vividly remember that initially and particularly after the first set backs, we were very doubtful if Coda could run on Windows. It turned out that with some creativity it did become possible to achieve our goals and obtain acceptable quality. Kernel related software for Windows is difficult to write, but the POSIX environments offered by DJGPP and Cygwin32 are truly remarkable, and should allow may other ports to proceed smoothly.
The development of Coda over the last decade has been made possible through the generosity and sustained support of many funding agencies and corporations. The funding agencies include the National Science Foundation (NSF), under contract CCR-8657907, and the United States Air Force (USAF) and Defense Advanced Research Projects Agency (DARPA), under contract numbers F33615-90-C-1465, F19628-93-C-0193 and F19628-96-C-0061. Corporate sponsors include the Intel Corporation, and Novell Inc. The views and conclusions contained here are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either express or implied, of the USAF, DARPA, Intel, Novell, CMU, or the U.S. Government. We wish to thank members of the Coda team for discussion.
|
This paper was originally published in the
Proceedings of the 1999 USENIX Annual Technical Conference, June 6-11, 1999, Monterey, California, USA
Last changed: 1 Mar 2002 ml |
|