
Introduction

The need for reliable, efficient, fast and easily manageable storage has increased dramatically with the growth of the Web, since web servers and database servers depend on all of these properties. Management cost is now much higher than the raw storage cost, often by a factor of 4 to 7. Consider the case of a caching web proxy that maintains a large cache in persistent storage. Not all of the cached data is useful; hit rates of only about 30% have been reported. The performance of caching proxies and database servers can be improved if frequently accessed data is kept on a device that allows fast retrieval. At the same time, the storage should be reliable: it should sustain disk failures without bringing the system down, a necessity for highly available applications. Further, even if the system crashes, it should recover as quickly as possible so that down time is minimal. Another desirable feature is efficient use of the available storage, given the huge amount of data that database and web servers may potentially handle.

Disk access patterns display good locality of reference [unix-diskpatterns], especially in non-scientific environments. To build cost-effective storage systems holding terabytes of data, this locality can be exploited by a multi-tiered storage system with tiers at different price/performance points, one that adapts to the access patterns by automatically migrating data between the tiers.
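As a rough illustration only (the tier names, thresholds and per-block metadata below are illustrative assumptions, not our driver's actual data structures), a per-block access count can drive migration between a fast tier and a slower, cheaper tier:

    #include <stdint.h>

    enum tier { TIER_FAST, TIER_SLOW };

    struct block_md {
        enum tier tier;     /* tier where the block currently resides */
        uint32_t  hits;     /* accesses seen in the current epoch     */
    };

    #define PROMOTE_HITS  8     /* assumed thresholds */
    #define DEMOTE_HITS   1

    /* Evaluated periodically: hot blocks move to the fast tier,
     * cold blocks move down, everything else stays put. */
    static enum tier choose_tier(const struct block_md *b)
    {
        if (b->hits >= PROMOTE_HITS)
            return TIER_FAST;
        if (b->hits <= DEMOTE_HITS)
            return TIER_SLOW;
        return b->tier;
    }

The actual migration policy can of course be more sophisticated; the point is that the decision is made automatically from observed access patterns rather than by an administrator.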

This approach has a lot in common with memory caches. Like caches, we try to improve performance by layering a small, fast store on top of a bigger, slower and relatively cheaper one. Our problem differs from caching in that caches need not provide reliable persistence semantics across a system failure. In addition, latencies in our case can be much longer than in memory caches because lower speed devices such as disks are involved. Also, in a cache the same data can occupy space in multiple tiers, whereas in our case data resides in exactly one tier. Our approach also has some similarities with hierarchical storage management (HSM) solutions; however, HSM is a more general storage system comprising secondary (disk) and tertiary (tape) storage, while our solution uses only secondary storage.

Our design currently has three tiers: declustered RAID1, RAID5 and compressed RAID5 (cRAID5). Redundant Arrays of Inexpensive Disks (RAID) is a technique for improving the reliability and performance of secondary storage. Of the various RAID levels discussed in [raid], RAID1 and RAID5 have become the most popular, for ease of use and price/performance respectively. Mirroring, or RAID1, maintains two copies of the same data; it generally provides the best performance and is easier to configure. The rotating parity scheme, or RAID5, costs the least of all the RAID levels for the reliability and performance it provides, but it suffers from poor small-update performance and is more involved to configure. Declustered RAID1 differs from RAID1 in that the data is striped across multiple disks: two physical stripes constitute one logical RAID1 stripe, with each stripe unit present on two different disks, thus ensuring resilience to a single disk failure. cRAID5 is the same as RAID5 except that the data is compressed before being written. Parity is computed on the written (compressed) data; typically, the parity computation spans the compressed data of more than one stripe. A random read into a cRAID5 stripe is handled by decompressing only the relevant compressed stripe (and not the full stripe, since we lock at the right granularity). Writes are more involved; we discuss them later.
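The following sketch illustrates the two layout ideas behind the first two tiers; the disk count, stripe unit size and function names are illustrative assumptions and do not reflect our driver's code.

    #include <stddef.h>
    #include <stdint.h>

    #define NDISKS      4       /* disks in the array (assumed)   */
    #define UNIT_BYTES  4096    /* stripe unit size (assumed)     */

    /* Declustered RAID1: logical stripe unit `lu' is striped across
     * the array and its mirror copy is forced onto a different disk,
     * so a single disk failure always leaves one copy intact. */
    static void raid1_declustered_map(unsigned lu,
                                      unsigned *primary_disk,
                                      unsigned *mirror_disk)
    {
        *primary_disk = lu % NDISKS;
        *mirror_disk  = (lu + 1) % NDISKS;   /* always a different disk */
    }

    /* RAID5: parity is the XOR of the data units in a stripe; any one
     * lost unit can be rebuilt by XOR-ing the surviving units with
     * the parity unit. */
    static void raid5_parity(const uint8_t data[NDISKS - 1][UNIT_BYTES],
                             uint8_t parity[UNIT_BYTES])
    {
        for (size_t i = 0; i < UNIT_BYTES; i++) {
            uint8_t p = 0;
            for (size_t d = 0; d < NDISKS - 1; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }

The XOR parity also explains the poor small-update behaviour of RAID5: updating a single stripe unit requires reading the old data and old parity (or the rest of the stripe) before the new parity can be written.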



 