
Introduction

In this paper, we set out to answer a simple question: how do we minimize the latency of small synchronous writes to disk?

The performance of small synchronous disk writes impacts the performance of important applications such as recoverable virtual memory [27], persistent object stores [2, 18], and database applications [33, 34]. These systems have become more complex in order to deal with the increasing relative cost of small writes [19].

Similarly, most existing file systems are carefully structured to avoid small synchronous disk writes. UFS by default delays data writes to disk; metadata writes can also be delayed if they are carefully ordered [9]. The Log-structured File System (LFS) [25] batches small writes. While the structural integrity of these file systems can be maintained, none of them supports small synchronous writes efficiently. Write-ahead logging systems [4, 6, 12, 32] accumulate small updates in a log and replay the modifications later by updating in place. Databases often place the log on a separate disk to keep the small log updates from conflicting with reads. Our interest is in the limits to small write performance on a single disk.
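The write-ahead logging pattern described above can be sketched in a few lines of Python. This is a toy in-memory model to illustrate the deferred-update idea only; the class and method names are illustrative and do not correspond to the interface of any of the cited systems:

```python
class WriteAheadLog:
    """Toy model of write-ahead logging: updates are appended to a
    sequential log first, then replayed to their home locations later."""

    def __init__(self, num_blocks):
        self.log = []                      # sequential log of (block, data) records
        self.disk = [None] * num_blocks    # update-in-place home locations

    def write(self, block, data):
        # A small synchronous write only appends one log record;
        # the expensive in-place update is deferred.
        self.log.append((block, data))

    def checkpoint(self):
        # Replay the logged updates in order to their home locations,
        # then truncate the log.
        for block, data in self.log:
            self.disk[block] = data
        self.log.clear()

wal = WriteAheadLog(8)
wal.write(3, "a")
wal.write(3, "b")
wal.checkpoint()
print(wal.disk[3])  # -> "b": the later log record wins on replay
```

The sketch shows why the log and the home locations compete for the same disk arm when they share one disk, which is what motivates placing the log on a separate spindle.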

Because of the limitations imposed by disks, non-volatile RAM (NVRAM) or an uninterruptible power supply (UPS) is often used to provide fast stable writes [3, 14, 15, 19]. However, when write locality exceeds buffer capacity, performance degrades. There are also applications that demand stricter reliability and integrity guarantees than either NVRAM or a UPS can provide. Fast small disk writes can be a cost-effective complement to NVRAM.

Our basic approach is to write to a disk location that is close to the head location. We call this eager writing. Eager writing requires the file system to be aware of the precise disk head location and disk geometry. One way to satisfy this requirement is to enhance the disk interface to the host so that the host file system can have precise knowledge of the disk state. A second solution is to migrate into the disk some of the file system responsibilities that are traditionally executed on the host. In the rest of this paper, we will assume this second approach, although our techniques do not necessarily depend on the ability to run file systems inside disks.
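The core of eager writing can be illustrated with a deliberately simplified rotational model: given the head's current angular position on a track, write to the free sector that will pass under the head soonest. This is a minimal sketch under assumed parameters (a single idealized track, uniform sector times); the function and constant names are ours, not the paper's implementation:

```python
# Minimal sketch of eager writing on one idealized track: given the
# head's current rotational position, pick the nearest free sector in
# the direction of rotation. All names and parameters are illustrative.

SECTORS_PER_TRACK = 16

def eager_write_target(head_pos, free_sectors):
    """Return the free sector the head will reach soonest."""
    def rotational_delay(sector):
        # Sectors pass under the head in increasing order, modulo the
        # track size, so this is the wait in sector times.
        return (sector - head_pos) % SECTORS_PER_TRACK
    return min(free_sectors, key=rotational_delay)

# With the head at sector 14, a free sector at 1 is reached after only
# 3 sector times, while sector 10 would cost 12 sector times.
print(eager_write_target(14, {1, 10}))  # -> 1
```

A real implementation must also weigh seek time across tracks against rotational delay, which is precisely why it needs the detailed head-position and geometry knowledge discussed above.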

Several technology trends have simultaneously enabled and necessitated the approach of migrating file system responsibility into the disk. First, Moore's Law has driven down the cost of CPU power relative to disk bandwidth, making it feasible to embed powerful processors in disk devices [1]. As this trend continues, it will soon be possible to run an entire file system on the disk. Second, disk bandwidth, growing at 40% per year [11], has been scaling faster than other aspects of the disk system, while I/O bus performance has been scaling less quickly [21]. The ability of the file system to communicate with the disk (to reorganize it, for example) without consuming valuable I/O bus bandwidth has therefore become increasingly important. Disk latency improves even more slowly (at an annual rate of 10% over the past decade [21]), so a file system whose small write latency is determined largely by disk bandwidth rather than by other disk parameters will continue to perform well. Third, the increasing complexity of modern disk drives and their fast product cycles make it increasingly difficult for operating system vendors to incorporate useful device heuristics into their file systems. By running file system code inside the disk, we can combine precise knowledge of file system semantics with detailed knowledge of the disk mechanism to perform optimizations that are otherwise impossible.
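Compounding the quoted annual rates over a decade shows how quickly the gap between bandwidth and latency widens, which is the quantitative basis for preferring a bandwidth-bound write path:

```python
# Compound the quoted annual improvement rates (40% bandwidth,
# 10% latency) over ten years to see why latency, not bandwidth,
# becomes the dominant cost of small writes.
years = 10
bandwidth_growth = 1.40 ** years   # roughly a 29x improvement
latency_growth = 1.10 ** years     # roughly a 2.6x improvement
print(round(bandwidth_growth, 1), round(latency_growth, 1))
```

At these rates, bandwidth improves more than an order of magnitude faster than latency over a decade, so a small-write strategy whose cost tracks bandwidth inherits that faster curve.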

The basic concept of performing writes near the disk head position is by no means a new one [5, 8, 10, 13, 23]. But these systems either do not guarantee atomic writes, have poor failure recovery times, or require NVRAM. In this work, we present the design of a virtual log, a logging strategy based on eager writing that delivers low write latency while providing atomic writes and fast failure recovery without requiring NVRAM.

We discuss two designs in which the virtual log can be used to improve file system performance. The first uses it to implement a logical disk interface. This design, called a Virtual Log Disk (VLD), does not alter the existing disk interface and can deliver the performance advantage of eager writing to an unmodified file system. In the second approach, which we have not implemented, we seek a tighter integration of the virtual log into the file system; we present the design of a variation of LFS, called VLFS. We also develop analytical models and algorithms to answer a number of fundamental questions about eager writing.
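The logical-disk idea behind the VLD can be sketched as a remapping layer: logical block addresses presented to the file system stay stable, while the physical location of each block follows wherever eager writing last placed it. This is a toy model under our own naming; a real VLD would choose the physical block nearest the head (and must make the mapping itself recoverable), which we stub out here with a simple free list:

```python
class VirtualLogDisk:
    """Toy model of a logical disk built on eager writing: logical
    block addresses stay stable while physical placement moves with
    each write. Names and structure are illustrative only."""

    def __init__(self, num_physical):
        self.mapping = {}                      # logical block -> physical block
        self.free = list(range(num_physical))  # stand-in for head-aware allocation

    def write(self, logical, data, storage):
        # A real VLD would pick the free physical block nearest the
        # head; popping from a free list is just a placeholder.
        physical = self.free.pop(0)
        old = self.mapping.get(logical)
        if old is not None:
            self.free.append(old)  # the superseded copy becomes free space
        self.mapping[logical] = physical
        storage[physical] = data

    def read(self, logical, storage):
        # Reads go through the indirection map, so the file system
        # above never sees the moving physical addresses.
        return storage[self.mapping[logical]]

storage = [None] * 8
vld = VirtualLogDisk(8)
vld.write(5, "v1", storage)
vld.write(5, "v2", storage)
print(vld.read(5, storage))  # -> "v2", now living at a new physical block
```

Because the indirection is hidden behind the ordinary block interface, an unmodified file system such as UFS can sit on top, which is exactly what makes the VLD evaluation below possible.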

We evaluate our approach against update-in-place and logging by modifying the Solaris kernel to serve as a simulation engine. Our evaluations show that an unmodified UFS on an eager writing disk runs about ten times faster than an update-in-place system for small synchronous random updates. Eager writing's economical use of bandwidth also allows it to significantly improve on LFS in cases where delaying small writes is not an option or where on-line cleaning would degrade performance. The performance advantage of eager writing should become even more pronounced as technology improves.

Of course, as with LFS, these benefits may come at the price of reduced read performance, since data may not be optimally placed for future reads. But as increasingly large file caches are employed, modern file systems such as the Network Appliance file system report predominantly write traffic [15]. Large caches also provide the opportunity to reorganize data for reads before the reads happen [22].

Although the virtual log shows significant performance promise, this paper remains a preliminary study. A full evaluation would require answers to algorithmic and policy questions of the data reorganizer as part of a complete VLFS implementation. These issues, as well as questions such as how to extend the virtual logging technique to multiple disks, are subjects of our ongoing research.

The remainder of the paper is organized as follows. Section 2 presents the eager writing analytical models. Section 3 presents the design of the virtual log and the virtual log based LFS. Section 4 describes the experimental platform that is used to evaluate the update-in-place, logging, and eager writing strategies. Section 5 shows the experimental results. Section 6 describes some of the related work. Section 7 concludes.



Randolph Wang
Tue Jan 5 14:30:32 PST 1999