In order to perform this work, we have used HRaid [4], which is a storage-system simulator.
All tests presented in this paper were performed simulating an array with a combination of slow and fast disks. The model used for these disks is the one proposed by Ruemmler and Wilkes [16]. The parameters for the slow disks were taken from the Seagate Barracuda 4LP [18], and to emulate the fast disks we used the parameters of a Cheetah 4LP [18], which is also a Seagate disk. A list of the most important characteristics of each disk (controller and drive) is presented in Table 1. Finally, the size used for the striping unit is 128 Kbytes. This size was computed using the ideas presented by Chen et al. [2]. Although the formulas presented in that paper were derived for homogeneous disk arrays, we have assumed they are also adequate for heterogeneous ones.
Table 1: Main characteristics of each disk (controller and drive).

|                           | Fast Disk     | Slow Disk     |
| Size                      |               |               |
|   Disk size               | 4.339 Gbytes  | 2.061 Gbytes  |
|   Cache size              | 512 Kbytes    | 128 Kbytes    |
|   Sector size             | 512 Bytes     | 512 Bytes     |
| Cache model               |               |               |
|   Read/Write fence        | 64 Kbytes     | 64 Kbytes     |
|   Prefetching             | YES           | YES           |
|   Immediate report        | YES           | YES           |
| Overheads                 |               |               |
|   New-command             | 1100 µs       | 1100 µs       |
|   Track switch            | 800 µs        | 800 µs        |
| Bandwidth                 |               |               |
|   RPM                     | 10033         | 7200          |
| Seek model                |               |               |
|   Limit (in cylinders)    | 600           | 600           |
|   Short: a + b*sqrt(d) µs | a = 1.55      | a = 3.0       |
|                           | b = 0.155134  | b = 0.232702  |
|   Long: a + b*d µs        | a = 4.2458    | a = 7.2814    |
|                           | b = 0.001740  | b = 0.002364  |
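To make the 128 Kbyte striping unit concrete, the following sketch shows how a logical byte offset could be mapped onto the disks of a conventionally striped array. It is a plain round-robin layout for illustration only; the function and parameter names are ours, and neither the RAID-5 parity layout nor AdaptRaid5's heterogeneous block distribution is modeled here.

```python
STRIPE_UNIT = 128 * 1024   # 128 Kbytes, as used in the experiments
NUM_DISKS = 9              # disks in the simulated array

def map_logical_offset(logical_offset, stripe_unit=STRIPE_UNIT, num_disks=NUM_DISKS):
    """Map a logical byte offset to (disk index, offset inside that disk)
    for a plain round-robin striping layout (no parity, illustration only)."""
    stripe_number = logical_offset // stripe_unit    # which striping unit overall
    offset_in_unit = logical_offset % stripe_unit    # position inside that unit
    disk = stripe_number % num_disks                 # round-robin over the disks
    row = stripe_number // num_disks                 # full rows before this unit
    return disk, row * stripe_unit + offset_in_unit

# Example: a request starting at 1 Mbyte falls on disk 8 of the first row.
print(map_logical_offset(1024 * 1024))
```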
These disks and the hosts were connected through a Gigabit network (10 µs latency and 1 Gbit/s bandwidth). We simulated the contention of the network, but not the protocol overhead.
We also have to keep in mind that the simulations only took the network and the disks (controller and drive) into account. The possible overhead of the requesting hosts was not simulated because it depends heavily on the implementation of the file system. The only file-system behavior we did simulate is that it can handle only 10 requests at a time; the remaining requests wait in a queue until one of the outstanding requests has been served.
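As a minimal sketch of this queuing behavior (the class and names are our own illustration, not part of HRaid), the simulated file system can be thought of as a fixed-size window of outstanding requests:

```python
from collections import deque

class OutstandingWindow:
    """Admit at most `limit` requests at a time; the rest wait in FIFO order."""
    def __init__(self, limit=10):
        self.limit = limit
        self.in_service = 0
        self.waiting = deque()

    def submit(self, request):
        if self.in_service < self.limit:
            self.in_service += 1
            return request          # issued to the array immediately
        self.waiting.append(request)
        return None                 # held back until a slot frees up

    def complete(self):
        """Called when the array finishes a request; promotes the next waiter."""
        if self.waiting:
            return self.waiting.popleft()   # the waiter takes over the freed slot
        self.in_service -= 1
        return None
```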
Finally, we have to mention that when using the synthetic traces presented in the next section, we made 10 runs of each one (each with a different seed to generate the access pattern) and report the average value. These runs always produced very similar results, and the difference was never larger than 2%.
In order to get the first results, we have studied the behavior of the system on a set of synthetic workloads based on the following parameters: the request size, whether requests are aligned, and the type of operation (reads or writes).
Table 2 presents the characteristics of the synthetic workloads used.
Table 2: Characteristics of the synthetic workloads.

|       | Request Size | Aligned | Operation Type |
| W8    | 8 Kbytes     | No      | Writes         |
| W256  | 256 Kbytes   | No      | Writes         |
| W1024 | 1024 Kbytes  | Yes     | Writes         |
| W2048 | 2048 Kbytes  | Yes     | Writes         |
| R8    | 8 Kbytes     | No      | Reads          |
| R2048 | 2048 Kbytes  | Yes     | Reads          |
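As an illustration of how such a workload reduces to a stream of requests with a fixed size, an alignment rule, and a single operation type, the following generator is our own sketch (the extent size, request count, and alignment granularity are assumptions; HRaid's actual trace format is not shown here):

```python
import random

def synthetic_workload(name, size_bytes, aligned, op, num_requests, extent_bytes,
                       stripe_unit=128 * 1024, seed=0):
    """Yield (operation, offset, size) tuples for one synthetic workload.

    `aligned` snaps the starting offset to a multiple of the striping unit,
    mirroring the 'Aligned' column of Table 2."""
    rng = random.Random(seed)
    for _ in range(num_requests):
        offset = rng.randrange(0, extent_bytes - size_bytes)
        if aligned:
            offset -= offset % stripe_unit   # align the start to a stripe-unit boundary
        yield (op, offset, size_bytes)

# Example: a few requests of the W2048 workload (2048 Kbyte aligned writes).
for req in synthetic_workload("W2048", 2048 * 1024, True, "write",
                              num_requests=3, extent_bytes=2 ** 30):
    print(req)
```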
On the other hand, we also wanted to obtain results for a real system, so we used a portion of the traces gathered by the Storage System Group at HP Laboratories (Palo Alto) in 1999 [22]. These traces are a detailed characterization of every low-level disk access generated in the system over a 6-month period. The traced system contained a file server and the workstations used by the members of the Storage System Group for their everyday work (compilations, editing, databases, simulations, etc.). As the full traces (6 months) were too large, we only present the results obtained during the busiest hour of February 14th. The tested portion contains 159208 reads and 115044 writes, and the average request size is around 12 Kbytes. As with most traces, dependencies between operations (e.g., that a given operation must follow another one) are not recorded. However, this does not invalidate the results presented because the overall load the traces represent is still real.
All the experiments presented in this paper were performed using disk arrays with 9 disks. This number of disks is large enough to expose the potential advantages and limitations of the proposal, yet small enough to keep the results easy to interpret.
Another important issue is the way small writes are handled. All the arrays we evaluated use the read-modify-write algorithm, which means that the blocks read in a small write are the same ones that are written [3]. This option was chosen because it increases the parallelism between requests.
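For reference, the parity update performed by such a small write can be sketched as follows (a generic RAID-5 illustration, not code taken from the simulator): the new parity is the old parity XORed with the old and new versions of the data block, so only the data block and the parity block are touched and the other disks stay free for unrelated requests.

```python
def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write parity update for a RAID-5 small write.

    Only the old data block and the old parity block are read; the new data
    and the new parity are then written back."""
    assert len(old_data) == len(new_data) == len(old_parity)
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))
```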
For simplicity, the configurations used always place all the fast disks in the first positions of the array and the slow ones in the last positions.
Finally, we have chosen a single SIP of 19 for all experiments, also for simplicity. Regarding the utilization factors, we have used a UF of 1 for the fast disks and 0.46 for the slow disks. These values were determined experimentally, and a sensitivity analysis for this parameter is presented in Section 6.6. We know that better values could be found for some of the experiments, but this is not the important issue: we want to show that the idea is sound, not to propose the best possible parameters.
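To give an intuition for these two parameters, the small sketch below shows one simplified reading of the utilization factor: a disk with UF = u takes part in roughly u of every SIP stripes that form a pattern. This is our own illustration of the order of magnitude involved, not AdaptRaid5's exact block-distribution rules.

```python
def stripes_per_pattern(uf, sip=19):
    """Illustrative assumption only: a disk with utilization factor `uf`
    participates in about uf * SIP of the SIP stripes of one pattern."""
    return round(uf * sip)

print(stripes_per_pattern(1.0))   # fast disks:  19 stripes per pattern
print(stripes_per_pattern(0.46))  # slow disks: about 9 stripes per pattern
```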
We have compared AdaptRaid5 with the following two base configurations: