While real-world traces give a realistic representation of some real systems, synthetic traces have the advantage of isolating specific behaviors that are not clearly expressed in recorded traces. We therefore also generated a set of synthetic traces, varying the trace characteristics to cover a wide range of workloads.
We generated the following four sets of synthetic traces:
Each file has equal likelihood of being selected.
Files are divided into two groups. One group contains 10% of the files; it is called the hot group because its files are visited 90% of the time. The other group, called the cold group, contains 90% of the files, but they are visited only 10% of the time. Within each group, every file is equally likely to be visited. This access pattern models a simple form of locality.
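The uniform and hot-and-cold selection rules described above can be illustrated with a minimal Python sketch. It is not the generator used for the traces in this paper; the function names `pick_uniform` and `pick_hot_cold`, the convention that the hot group occupies the lowest file indices, and the file count in the demo are all assumptions made for illustration.

```python
import random


def pick_uniform(num_files, rng):
    """Uniform pattern: every file index is equally likely."""
    return rng.randrange(num_files)


def pick_hot_cold(num_files, rng, hot_fraction=0.10, hot_access_prob=0.90):
    """Hot-and-cold pattern: a small hot group receives most accesses.

    Assumption for this sketch: the hot group is the first
    hot_fraction of the file indices.
    """
    if num_files < 2:
        raise ValueError("need at least one hot and one cold file")
    num_hot = max(1, int(num_files * hot_fraction))
    if rng.random() < hot_access_prob:
        # Uniform choice within the hot group.
        return rng.randrange(num_hot)
    # Uniform choice within the cold group.
    return rng.randrange(num_hot, num_files)


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so runs are repeatable
    num_files = 1000          # hypothetical file count
    accesses = [pick_hot_cold(num_files, rng) for _ in range(100_000)]
    hot_hits = sum(1 for f in accesses if f < 100)
    print(f"fraction of accesses to hot group: {hot_hits / len(accesses):.3f}")
```

With the default parameters, the measured fraction of accesses that land in the hot group should be close to 0.9, which is a quick sanity check on any generator of this kind.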
This suite contains small files and models the behavior of systems such as electronic mail or network news. File sizes range from 1 KB to 1 MB. Files are frequently created, deleted, and updated. The data lifetime in this suite is the shortest in this paper (90% of byte lifetimes are less than 5 minutes).
This trace consists of a typical TPC-D benchmark that accesses twenty large database files ranging from 512 MB to 10 GB. The number of records per database file ranges from 2,000,000 to 40,000,000, and each record is 100 bytes. Most transaction operations in this benchmark are queries and updates. The I/O access pattern is random writes followed by sequential reads: random updates are applied to the active portion of the database, and some time later large sweeping queries read relations sequentially [18]. This represents the typical I/O behavior of a decision support database. In this trace, we use sequential file reads to simulate 17 SQL queries for business questions. To implement the TPC-D update functions, we generate random writes representing the following categories: updating 0.1% of the data per query, inserting new sales data amounting to 0.1% of the table size, and deleting old sales data amounting to 0.1% of the table size.
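The "random writes followed by sequential reads" pattern can be sketched as follows. This is only an illustration under stated assumptions, not the trace generator used here: the record tuple layout `(op, file_id, offset, length)`, the function names, and the scan chunk size are hypothetical, updates are spread over the whole file rather than an active portion, and the insert and delete categories are not modeled.

```python
import random

RECORD_SIZE = 100  # bytes per record, as stated in the text


def random_updates(file_id, num_records, rng):
    """Random single-record writes touching 0.1% of the file's records."""
    count = max(1, num_records // 1000)  # 0.1% of the records
    for _ in range(count):
        rec = rng.randrange(num_records)
        yield ("write", file_id, rec * RECORD_SIZE, RECORD_SIZE)


def sequential_scan(file_id, num_records, chunk_records=10_000):
    """Sequential reads covering the whole file in fixed-size chunks.

    chunk_records is an assumed request size, not taken from the text.
    """
    for start in range(0, num_records, chunk_records):
        n = min(chunk_records, num_records - start)
        yield ("read", file_id, start * RECORD_SIZE, n * RECORD_SIZE)


def tpcd_like_trace(file_records, rng):
    """Emit all random writes first, then a sequential scan of each file."""
    for file_id, num_records in enumerate(file_records):
        yield from random_updates(file_id, num_records, rng)
    for file_id, num_records in enumerate(file_records):
        yield from sequential_scan(file_id, num_records)


if __name__ == "__main__":
    rng = random.Random(7)
    # Scaled-down record counts so the demo runs quickly.
    trace = list(tpcd_like_trace([20_000, 50_000], rng))
    writes = sum(1 for op, *_ in trace if op == "write")
    print(f"{writes} writes, {len(trace) - writes} reads")
```

Generating the trace lazily with generators keeps memory use flat, which matters if the record counts are raised to the 2,000,000 to 40,000,000 range described above.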
Further details of these four synthetic traces are given in Table 1.