
Comparison with other projects

Several of the design goals of the DMFS layer are best understood in light of the data migration efforts that preceded it.

The DMFS layer is based on NASA's experience with the RASHFS file system used in the NAStore 2 system. NAStore 2 was deployed at NAS from 1991 until June 1999, initially on an Amdahl 5880 running UTS and later on two Convex 3820 computers running ConvexOS. RASHFS was a modified version of the native FFS file system implementation: the changes needed to support data migration were grafted into the FFS implementation, extra space in the inode was used to store metadata, and a few system calls were modified to call into the RASHFS implementation to ensure that a file was restored before proceeding. (A file had to be fully restored before it could be executed.) This implementation was very successful. The system deployed at NAS had about 4 TB of disk in the file systems and a couple of petabytes of on-line tape.

One difficulty with RASHFS is that it was, in effect, RASH-FFS: its migration support was bound to the FFS implementation. Adding the same migration abilities on top of a different file system type would require a substantial amount of re-implementation. Moreover, because modern (4.4BSD and later) FFS implementations leave less unused space in the inode, the metadata might no longer fit. These limitations are among the main reasons we chose a layered file system when designing DMFS. Although this is untested, DMFS should work with NetBSD's LFS, and possibly even EXT2FS, about as well as it does with FFS. Additionally, while DMFS was being developed, a tremendous amount of file system development was going on within the *BSD community. Kirk McKusick was working on adding "soft updates" to FFS, a technology intended to give FFS the resiliency of a journaling file system without requiring a journal. Konrad Schroder has been working to make NetBSD's LFS implementation robust and functional. Because DMFS is not integrated into any one file system, it stays independent of these developments: a site can choose to follow them or not, using the same DMFS code either way.

NAStore 3 is not the first tertiary storage system built on top of 4.4BSD. One predecessor was HighLight, a migration file system built above LFS. The major difference between it and DMFS is that HighLight, like RASHFS, integrates data migration tightly into a specific file system. HighLight, however, turns this tight integration to its advantage by exploiting certain aspects of the LFS structure. LFS operates in terms of written segments [4,6], and a file is described as having contents in specific segments. HighLight implements data migration by including, in the file system block space (where these segments may reside), the tapes in the robotics assigned to the file system. Thus when a file is migrated to tape storage, the inode is modified to point to the tape block. When a migrated file is read, portions of the on-tape segments are read onto disk. Otherwise, the file system behaves much as a non-migrating LFS does. One advantage of this approach is that it is possible to create dense files larger than the disk storage allocated to the migration file system (something not possible with DMFS).
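
The idea of a single block space spanning disk and tape can be illustrated with a minimal sketch. This is not HighLight's actual layout or code: the function name `resolve`, the `Location` record, and the assumption that the disk occupies the low addresses and that all tapes hold the same number of blocks are hypothetical, chosen only to show how one address can name either medium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    medium: str   # "disk" or "tape"
    unit: int     # tape index; always 0 for the disk
    offset: int   # block offset within that disk or tape


def resolve(block_addr, disk_blocks, tape_blocks, num_tapes):
    """Map an address in a unified block space to a physical location.

    Assumed (hypothetical) layout: addresses [0, disk_blocks) are on
    disk, followed by num_tapes tapes of tape_blocks blocks each.
    """
    if block_addr < 0:
        raise ValueError("negative block address")
    if block_addr < disk_blocks:
        return Location("disk", 0, block_addr)
    tape, offset = divmod(block_addr - disk_blocks, tape_blocks)
    if tape >= num_tapes:
        raise ValueError("block address beyond the configured tapes")
    return Location("tape", tape, offset)
```

In a scheme of this kind, the number and size of the tapes are parameters of the address mapping itself, which is why they would have to be fixed when the file system is laid out.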

The DMFS layer does have a few advantages over HighLight. One is that it has no knowledge of where migrated files are stored. As mentioned above, HighLight treats the blocks stored on tape as part of the file system. It therefore requires knowledge, at file system creation time, of how many tapes of which size (in which robots) are available. While it would certainly be feasible to extend this knowledge base for an operational file system (by adding new tape robotics, for instance), keeping this knowledge separate from the migration file system is simpler. Additionally, decoupling the tape storage information permits more sophisticated tape policies. For instance, NAStore 3 will by default archive a file into two independent vv's (which will reside on two separate tapes) before making it non-resident (deallocating its blocks on disk). This behavior permits a level of redundancy not readily obtained with HighLight.
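
The default archive policy described above can be sketched as a small state check. This is an illustration only, not NAStore 3 code: `FileState`, `record_archive`, `release`, and `REQUIRED_COPIES` are hypothetical names, and the independence of the two copies is modeled simply as two distinct volume identifiers.

```python
from dataclasses import dataclass, field

REQUIRED_COPIES = 2  # copies required before disk blocks may be freed


@dataclass
class FileState:
    name: str
    resident: bool = True                              # blocks still on disk
    archive_volumes: set = field(default_factory=set)  # ids of volumes holding a copy


def record_archive(f, volume_id):
    """Note that a complete copy of the file now exists on volume_id."""
    f.archive_volumes.add(volume_id)


def release(f, required=REQUIRED_COPIES):
    """Make the file non-resident only if enough distinct copies exist.

    Returns True if the file was released, False if it was left resident.
    """
    if not f.resident or len(f.archive_volumes) < required:
        return False
    f.resident = False  # stands in for deallocating the on-disk blocks
    return True
```

Because a policy like this lives outside the file system layer, the required number of copies could be changed without touching the migration file system itself.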


Bill Studenmund
2000-04-24