Check out the new USENIX Web site. next up previous
Next: 2. Implementation Up: FiST: A Language for Previous: FiST: A Language for

Subsections

   
1. Introduction

File systems have proven to be useful in enriching system functionality. The abstraction of folders with files containing data is natural for use with existing file browsers, text editors, and other tools. Modifying file systems became a popular method of extending new functionality to users. However, developing file systems is difficult and involved. Developers often use existing code for native in-kernel file systems as a starting point[15,23]. Such file systems are difficult to write and port because they depend on many operating system specifics, and they often contain many lines of complex operating systems code, as seen in Table 1.


 
Table: Common Native Unix File Systems and Code Sizes for Each Medium
Media Common Avg. Code Size
Type File System (C lines)
Hard Disks UFS, FFS, EXT2FS 5,000-20,000
Network NFS 6,000-30,000
CD-ROM HSFS, ISO-9660 3,000-6,000
Floppy PCFS, MS-DOS 5,000-6,000

 

User-level file systems are easier to develop and port because they reside outside the kernel[16]. However, their performance is poor due to the extra context switches these file systems must incur. These context switches can affect performance by as much as an order of magnitude[26,27].

Stackable file systems[19] promise to speed file system development by providing an extensible file system interface. This extensibility allows new features to be added incrementally. Several new extensible interfaces have been proposed and a few have been implemented[8,15,18,22]. To improve performance, these stackable file systems were designed to run in the kernel. Unfortunately, using these stackable interfaces often requires writing lots of complex C kernel code that is specific to a single operating system platform and also difficult to port.

More recently, we introduced a stackable template system called Wrapfs[27]. It eases up file system development by providing some built-in support for common file system activities. It also improves portability by providing kernel templates for several operating systems. While working with Wrapfs is easier than with other stackable file systems, developers still have to write kernel C code and port it using the platform-specific templates.

In previous approaches, performance and portability could not be achieved together. To perform well, a file system should run in the kernel, not at user level. Kernel code, however, is much more difficult to write and port than user-level code. To ease the problems of developing and porting stackable file systems that perform well, we propose a high-level language to describe such file systems. There are three benefits to using a language:

1.
Simplicity: A file system language can provide familiar higher-level primitives that simplify file system development. The language can also define suitable defaults automatically. These reduce the amount of code that developers need to write, and lessen their need for extensive knowledge of kernel internals, allowing even non-experts to develop file systems.

2.
Portability: A language can describe file systems using an interface abstraction that is common to operating systems. The language compiler can bridge the gaps among different systems' interfaces. From a single description of a file system, we could generate file system code for different platforms. This improves portability considerably. At the same time, however, the language should allow developers to take advantage of system-specific features.

3.
Specialization: A language allows developers to customize the file system to their needs. Instead of having one large and complex file system with many features that may be configured and turned on or off, the compiler can produce special-purpose file systems. This improves performance and memory footprint because specialized file systems include only necessary code.

This paper describes the design and implementation of FiST, a File System Translator language for stackable file systems. FiST lets developers describe stackable file systems at a high level, using operations common to file system interfaces. With FiST, developers need only describe the core functionality of their file systems. The FiST language code generator, fistgen, generates kernel file system modules for several platforms using a single description. We currently support Solaris, FreeBSD, and Linux.

To assist fistgen with generating stackable file systems, we created a minimal stackable file system template called Basefs. Basefs adds stacking functionality missing from systems and relieves fistgen from dealing with many platform-dependent aspects of file systems. Basefs does not require changes to the kernel or existing file systems. Its main function is to handle many kernel details relating to stacking. Basefs provides simple hooks for fistgen to insert code that performs common tasks desired by file system developers, such as modifying file data or inspecting file names. That way, fistgen can produce file system code for any platform we port Basefs to. The hooks also allow fistgen to include only necessary code, improving performance and reducing kernel memory usage.

We built several example file systems using FiST. Our experiences with these examples shows the following benefits of FiST compared with other stackable file systems: average code size is reduced ten times; development time is reduced seven times; performance overhead of stacking is less than 2%, and unlike other stacking systems, there is no performance overhead for native file systems.

Our focus in this paper is to demonstrate how FiST simplifies the development of file systems, provides write-once run-anywhere portability across UNIX systems, and reduces stacking overhead through file system specialization. The rest of this paper is organized as follows. Section 2 details the design of FiST, and describes the FiST language, fistgen, and Basefs. Section 3 discusses key implementation and portability details. Section 4 describes several example file systems written using FiST. Section 5 evaluates the ease of development, the portability, and the performance of our file systems. Section 6 surveys related work. Finally, Section 7 concludes and explores future directions.

   
1.1 The FiST Language

The FiST language is a high-level language that uses file system features common to several operating systems. It provides file system specific language constructs for simplifying file system development. In addition, FiST language constructs can be used in conjunction with additional C code to offer the full flexibility of a system programming language familiar to file system developers. The ability to integrate C and FiST code is reflected in the general structure of FiST input files. Figure 5 shows the four main sections of a FiST input file.


  
Figure: FiST Grammar Outline
\begin{figure}
\begin{center}
\begin{tabular}{c\vert l\vert}
\cline{2-2}
& {\tt\...
...nal C Code}\\
\cline{2-2}
\end{tabular}\end{center}\vspace{-0.5em}
\end{figure}

The FiST grammar was modeled after yacc[9] input files, because yacc is familiar to programmers and the purpose for each of its four sections (delimited by ``%%'') matches with four different subdivisions of desired file system code: raw included header declarations, declarations that affect the produced code globally, actions to perform when matching vnode operations, and additional code.

C Declarations (enclosed in ``{% %}'') are used to include additional C headers, define macros or typedefs, list forward function prototypes, etc. These declarations are used throughout the rest of the code.

FiST Declarations define global file system properties that affect the overall semantics of the produced code and how a mounted file system will behave. These properties are useful because they allow developers to make common global changes in a simple manner. In this section we declare if the file system will be read-only or not, whether or not to include debugging code, if fan-in is allowed or not, and what level (if any) of fan-out is used. FiST Declarations can also define special data structures used by the rest of the code for this file system. We can define mount-time data that can be passed with the mount(2) system call. A versioning file system, for example, can be passed a number indicating the maximum number of versions to allow per file. FiST can also define new error codes that can be returned to user processes, for the latter to understand additional modes of failure. For example, an encryption file system can return a new error code indicating that the cipher key in use has expired.

FiST Rules define actions that generally determine the behavior for individual files. A FiST rule is a piece of code that executes for a selected set of vnode operations, for one operation, or even a portion of a vnode operation. Rules allow developers to control the behavior of one or more file system functions in a portable manner. The FiST rules section is the primary section, where most of the actions for the produced code are written. In this section, for example, we can choose to change the behavior of unlink to rename the target file, so it might be restored later. We separated the declarations and rules sections for programming ease: developers know that global declarations go in the former, and actions that affect vnode operations go in the latter.

Additional C Code includes additional C functions that might be referenced by code in the rest of the file system. We separated this section from the rules section for code modularity: FiST rules are actions to take for a given vnode function, while the additional C code may contain arbitrary code that could be called from anywhere. This section provides a flexible extension mechanism for FiST-based file systems. Code in this section may use any basic FiST primitives (discussed in Section 2.3.1) which are helpful in writing portable code. We also allow developers to write code that takes advantage of system-specific features; this flexibility, however, may result in non-portable code.

The remainder of this section introduces the FiST language primitives, the various participants in a file system (such as files, mounts, and processes), their attributes and how to extend them and store them persistently, and how to control the execution flow in a file system. The examples in Section 4 are also helpful because they further illustrate the FiST language.

   
1.1.1 FiST Syntax

FiST syntax allows referencing mounted file systems and files, accessing attributes, and calling FiST functions. Mount references begin with $vfs, while file references use a shorter ``$'' syntax because we expect them to appear more often in FiST code. References may be followed by a name or number that distinguishes among multiple instances (e.g., $1, $2, etc.) especially useful when fan-out is used (Figure 4). Attributes of mounts and files are specified by appending a dot and the attribute name to the reference (e.g., $vfs.blocksize, $1.name, $2.owner, etc.) The scope of these references is the current vnode function in which they are executing.

There is only one instance of a running operating system. Similarly, there is only one process context executing that the file system has to be concerned with. Therefore FiST need only refer to their attributes. These read-only attributes are summarized in Table 2. The scope of all read-only ``%'' attributes is global.


  
Table 2: Global Read-Only FiST Variables
\begin{table}
\centering
\begin{tabularx}{\linewidth}{\vert l\vert X\vert}
\hlin...
...\
\%uid & effective user ID\\
\hline
\end{tabularx}\vspace{-0.5em}
\end{table}

FiST code can call FiST functions from anywhere in the file system, some of which are shown in Table 3. The scope of FiST functions is global in the mounted file system. These functions form a comprehensive library of portable routines useful in writing file systems. The names of these functions begin with ``fist.'' FiST functions can take a variable number of arguments, omit some arguments where suitable defaults exist, and use different types for each argument. These are true functions that can be nested and may return any single value.


  
Table 3: A sample of FiST functions used in this paper
\begin{table}
\centering
\begin{tabularx}{\linewidth}{\vert l\vert X\vert}
\hlin...
...fistOp & execute an arbitrary vnode operation\\ \hline
\end{tabularx}\end{table}

Each mount and file has attributes associated with it. FiST recognizes common attributes of mounted file systems and files that are defined by the system, such as the name, owner, last modification time, or protection modes. FiST also allows developers to define new attributes and optionally store them persistently. Attributes are accessed by appending the name of the attribute to the mount or file reference, with a single dot in between, much the same way that C dereferences structure field names. For example, the native block size of a mounted file system is accessed as $vfs.blocksize and the name of a file is $0.name.

FiST allows users to create new file attributes. For example, an ACL file system may wish to add timed access to certain files. The following FiST Declaration can define the new file attributes in such a file system:

per_vnode {
int user; /* extra user */
int group; /* extra group */
time_t expire; /* access expiration time */
};

With the above definition in place, a FiST file system may refer to the additional user and group who are allowed to access the file as $0.user and $0.group, respectively. The expiration time is accessed as $0.expire.

The per_vnode declaration defines new attributes for files, but those attributes are only kept in memory. FiST also provides different methods to define, store, and access additional attributes persistently. This way, a file system developer has the flexibility of deciding if new attributes need only remain in memory or saved more permanently.

For example, an encrypting file system may want to store an encryption key, cipher ID, and Initialization Vector (IV) for each file. This can be declared in FiST using:

fileformat SECDAT {
char key[16]; /* cipher key */
int cipher; /* cipher ID */
char iv[16]; /* initialization vector */
};

Two FiST functions exist for handling file formats: fistSetFileData and fistGetFileData. These two routines can store persistently and retrieve (respectively) additional file system and file attributes, as well as any other arbitrary data. For example, to save the cipher ID in a file called .key, use:

int cid;
/* set cipher ID */
fistSetFileData(".key", SECDAT, cipher, cid);

The above FiST function will produce kernel code to open the file named ``.key'' and write the value of the ``cid'' variable into the ``cipher'' field of the ``SECDAT'' file format, as if the latter had been a data structure stored in the ``.key'' file.

Finally, the mechanism for adding new attributes to mounts is similar. For files, the declaration is per_vnode while for mounts it is per_vfs. The routines fistSetFileData and fistGetFileData can be used to access any arbitrary persistent data, for both mounts and files.

   
1.1.2 Rules for Controlling Execution and Information Flow

In the previous sections we considered how FiST can control the flow of information between the various layers. In this section we describe how FiST can control the execution flow of various operations using FiST rules.

FiST does not change the interfaces that call it, because such changes will not be portable across operating systems and may require changing many user applications. FiST therefore only exchanges information with applications using existing APIs (e.g., ioctls) and those specific applications can then affect change.

The most control FiST file systems have is over the file system (vnode) operations that execute in a normal stackable setting. Figure 6 highlights what a typical stackable vnode operation does: (1) find the vnode of the lower level mount, and (2) repeat the same operation on the lower vnode.


  
Figure: A skeleton of typical kernel C code for stackable vnode functions. FiST can control all three sections of every vnode function: pre-call, post-call, and the call itself.
\begin{figure}
\begin{center}\footnotesize
\begin{alltt}int {\it fsname}_getattr...
...es here */}
return error;
\}\end{alltt}\end{center}\vspace{-0.5em}
\end{figure}

The example vnode function receives a pointer to the vnode on which to apply the operation, and other arguments. First, the function finds the corresponding vnode at the lower level mount. Next, the function actually calls the lower level mounted file system through a standard VOP_* macro that applies the same operation, but on the file system corresponding to the type of the lower vnode. The macro uses the lower level vnode, and the rest of the arguments unchanged. Finally, the function returns to the caller the status code which the lower level mount passed to the function.

There are three key parts in any stackable function that FiST can control: the code that may run before calling the lower level mount (pre-call), the code that may run afterwards (post-call), and the actual call to the lower level mount. FiST can insert arbitrary code in the pre-call and post-call sections, as well as replace the call part itself with anything else.

By default, the pre-call and post-call sections are empty, and the call section contains code to pass the operation to the lower level file system. These defaults produce a file system that stacks on another but does not change behavior, and that was designed so developers do not have to worry about the basic stacking behavior--only about their changes.

For example, a useful pre-call code in an encryption file system would be to verify the validity of cipher keys. A replication file system may insert post-call code to repeat the same vnode operation on other replicas. A versioning file system could replace the actual call to remove a file with a call to rename it; an example FiST code for the latter might be:

%op:unlink:call {
fistRename($name, fistStrAdd($name, ".unrm"));
}

The general form for a FiST rule is:


\begin{displaymath}\%callset:optype:part ~ \{ code \}
\end{displaymath} (1)

Table 4 summarizes the possible values that a FiST rule can have. Callset defines a collection of operations to operate on. Optype further defines the call set to a subset of operations or a single operation. Part defines the part of the call that the following code refers to: pre-call, call, post-call, or the name of a newly defined ioctl. Finally, code contains any C code enclosed in braces.


  
Table: Possible Values in FiST Rules
\begin{table}
\centering
\begin{tabularx}{\linewidth}{\vert l\vert X\vert}
\hlin...
...ame of a newly defined ioctl\\
\hline
\end{tabularx}\vspace{-0.5em}
\end{table}

   
1.1.3 Filter Declarations and Filter Functions

FiST file systems can perform arbitrary manipulations of the data they exchange between layers. The most useful and at the same time most complex data manipulations in a stackable file system involve file data and file names. To manipulate them consistently without FiST or Wrapfs, developers must make careful changes in many places. For example, file data is manipulated in read, write, and all of the MMAP functions; file names also appear in many places: lookup, create, unlink, readdir, mkdir, etc.

FiST simplifies the task of manipulating file data or file names using two types of filters. A filter is a function like Unix shell filters such as sed or sort: they take some input, and produce possibly modified output.

If developers declare filter:data in their FiST file, fistgen looks for two data coding functions in the Additional C Code section of the FiST File: encode_data and decode_data. These functions take an input data page, and an allocated output page of the same size. Developers are expected to implement these coding functions in the Additional C Code section of the FiST file. The two functions must fill in the output page by encoding or decoding it appropriately and return a success or failure status code. Our encryption file system uses a data filter to encrypt and decrypt data (Section 4.1).

With the FiST declaration filter:name, fistgen inserts code and calls to encode or decode strings representing file names. The file name coding functions (encode_name and decode_name) take an input file name string and its length. They must allocate a new string and encode or decode the file name appropriately. Finally, the coding functions return the number of bytes in the newly allocated string, or a negative error code. Fistgen inserts code at the caller's level to free the memory allocated by file name coding functions.

Using FiST filters, developers can easily produce file systems that perform complex manipulations of data or names exchanged between file system layers.

   
1.2 Fistgen

Fistgen is the FiST language code generator. Fistgen reads in an input FiST file, and using the right Basefs templates, produces all the files necessary to build a new file system described in the FiST input file. These output files include C file system source files, headers, sources for user level utilities, and a Makefile to compile them on the given platform.

Fistgen implements a subset of the C language parser and a subset of the C preprocessor. It handles conditional macros (such as #ifdef and #endif). It recognizes the beginning of functions after the first set of declarations and the ending of functions. It parses FiST tags inserted in Basefs (explained in the next section) used to mark special places in the templates. Finally, fistgen handles FiST variables (beginning with $ or %) and FiST functions (such as fistLookup) and their arguments.

After parsing an input file, fistgen builds internal data structures and symbol tables for all the keywords it must handle. Fistgen then reads the templates, and generates output files for each file in the template directory. For each such file, fistgen inserts needed code, excludes unused code, or replaces existing code. In particular, fistgen conditionally includes large portions of code that support FiST filters: code to manipulate file data or file names. It also produces several new files (including comments) useful in the compilation for the new file system: a header file for common definitions, and two source files containing auxiliary code.

The code generated by fistgen may contain automatically generated functions that are necessary to support proper FiST function semantics. Each FiST function is replaced with one true C function--not a macro, inlined code, a block of code statements, or any feature that may not be portable across operating systems and compilers. While it might have been possible to use other mechanisms such as C macros to handle some of the FiST language, it would have resulted in unmaintainable and unreadable code. One of the advantages of the FiST system is that it produces highly readable code. Developers can even edit that code and add more features by hand, if they so choose.

Fistgen also produces real C functions for specialized FiST syntax that cannot be trivially handled in C. For example, the fistGetIoctlData function takes arguments that represent names of data structures and names of fields within. A C function cannot pass such arguments; C++ templates would be needed, but we opted against C++ to avoid requiring developers to know another language, because modern Unix kernels are still written in C, and to avoid interoperability problems between C++ produced code and C produced code in a running kernel. Preprocessor macros can handle data structure names and names of fields, but they do not have exact or portable C function semantics. To solve this problem, fistgen replaces calls to functions such as fistGetIoctlData with automatically generated specially named C functions that hard-code the names of the data structures and fields to manipulate. Fistgen generates these functions only if needed and only once.

   
1.3 Basefs

Basefs is a template system which was derived from Wrapfs[27]. It provides basic stacking functionality without changing other file systems or the kernel. To achieve this functionality, the kernel must support three features. First, in each of the VFS data structures, Basefs requires a field to store pointers to data structures at the layer below. Second, new file systems should be able to call VFS functions. Third, the kernel should export all symbols that may be needed by new loadable kernel modules. The last two requirements are needed only for loadable kernel modules.


  
Figure: Where Basefs fits inside the kernel
\begin{figure}
\begin{centering}
\epsfig{file=figures/basefs-boundaries.eps, height=1.2in}\vspace{-0.5em}
\end{centering}\end{figure}

Basefs handles many of the internal details of operating systems, thus freeing developers from dealing with kernel specifics. Basefs provides a stacking layer that is independent from the layers above and below it. Figure 7 shows this. Basefs appears to the upper VFS as a lower level file system. Basefs also appears to file systems below it as a VFS. All the while, Basefs repeats the same vnode operation on the lower level file system.

Basefs performs all data reading and writing on whole pages. This simplifies mixing regular reads and writes with memory-mapped operations, and gives developers a single paged-based interface to work with. Currently, file systems derived from Basefs manipulate data in whole pages and may not change the data size (e.g., compression).

To improve performance, Basefs copies and caches data pages in its layer and the layers below it.1 Basefs saves memory by caching at the lower layer only if file data is manipulated and fan-in was used; these are the usual conditions that require caching at each layer.

Basefs is different from Wrapfs in four ways. First, substantial portions of code to manipulate file data and file names, as well as debugging code are not included in Basefs by default. These are included only if the file system needs them. By including only code that is necessary we generate output code that is more readable than code with multi-nested #ifdef/#endif pairs. Conditionally including this code also resulted in improved performance, as reported in Section 5.3. Matching or exceeding the performance of other layered file systems was one of the design goals for Basefs.

Second, Basefs adds support for fan-out file systems natively. This code is also conditionally included, because it is more complex than single-stack file systems, adds more performance overhead, and consumes more memory. A complete discussion of the implementation and behavior of fan-out file systems is beyond the scope of this paper.

Third, Basefs includes (conditionally compiled) support for many other features that had to be written by hand in Wrapfs. This added support can be thought of as a library of common functions: opening, reading or writing, and then closing arbitrary files; storing extended attributes persistently; user-level utilities to mount and unmount file systems, as well as manipulate ioctls; inspecting and modifying file attributes, and more.

Fourth, Basefs includes special tags that help fistgen locate the proper places to insert certain code. Inserting code at the beginning or the end of functions is simple, but in some cases the code to add has to go elsewhere. For example, handling newly defined ioctls is done (in the basefs_ioctl vnode function) at the end of a C ``switch'' statement, right before the ``default:'' case.


next up previous
Next: 2. Implementation Up: FiST: A Language for Previous: FiST: A Language for
Erez Zadok
2000-04-25