In the following, we describe a number of relations that cover the common types of semantic information listed in Section 1. An RDF schema is defined for each of these relations, but it is not provided here due to space restrictions. Nor do we use RDF notation to describe the relations. Instead, we use an informal triplet notation, as above, with curly brackets representing composite properties (constructed by means of blank nodes or containers in RDF).
File versioning. Each file in pStore corresponds to one file object and multiple file version objects. Each update to the file automatically creates a new file version. The notion of a ``file'' is represented by a data object that captures some of the basic attributes of the file (owner, file name, etc.). For example, it could be the root node in a hierarchical content-addressable storage system [17]. As soon as the file has some content, each version of the file is represented by a separate object.
There are two types of relations between a file and its versions. The relation o1 has_version {o2, v1} states that the object with id o2 is version v1 of o1. Similarly, o1 latest_version {o2} states that object o2 is the latest version of o1. The has_version property may carry additional attributes, such as creation_time and comment.
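The two versioning relations can be illustrated with a minimal sketch. It assumes a plain in-memory list of triples; the helper names (add_version, latest_version, versions) and the dictionary layout of the composite value are hypothetical and are not part of pStore.

```python
# Hypothetical in-memory triple list; this is not pStore's storage layer.
triples = []

def add_version(file_id, object_id, version, **attrs):
    """Record `file_id has_version {object_id, version}` and update latest_version."""
    # The dict stands in for the composite value written in curly brackets.
    value = {"object": object_id, "version": version, **attrs}
    triples.append((file_id, "has_version", value))
    # Keep exactly one latest_version triple per file: drop the old one, add the new.
    triples[:] = [t for t in triples
                  if not (t[0] == file_id and t[1] == "latest_version")]
    triples.append((file_id, "latest_version", {"object": object_id}))

def latest_version(file_id):
    """Return the object id of the latest version, or None if there is none."""
    for subject, prop, value in triples:
        if subject == file_id and prop == "latest_version":
            return value["object"]
    return None

def versions(file_id):
    """Return the composite values of all has_version triples, in creation order."""
    return [value for subject, prop, value in triples
            if subject == file_id and prop == "has_version"]

# Two updates to file o1; the attribute values are arbitrary placeholders.
add_version("o1", "o2", "v1", creation_time=1000, comment="initial")
add_version("o1", "o3", "v2")
print(latest_version("o1"))
print([v["version"] for v in versions("o1")])
```

Storing latest_version as its own triple, rather than deriving it from the has_version set, mirrors the two relations in the text and makes the lookup independent of how versions are ordered.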
Hierarchical name space. The traditional hierarchical name space is defined using the is_parent_of and in_directory properties. For example, ``movie1 is_parent_of sequence2'' represents the file path ``movie1/sequence2''. File system access control is represented by the access_control property. The range of this property is a Class that defines, e.g., an ACL structure.
Dependencies. In addition to the hierarchical relations, a user can define other types of dependencies among objects. In fact, is_parent_of is just one instance of Property schema Depend_on. Instances of this Property may be application specific. For example, the relation Shrek char_dep Ogre, where char_dep is an instance of Depend_on, means that file Shrek has a dependency on file Ogre. Another example of dependency is the relationship between the master copy of the data and its replicas.
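As a rough sketch of how hierarchy and dependencies share one schema, the fragment below treats is_parent_of and char_dep as instances of Depend_on. The lookup table PROPERTY_SCHEMA and the helpers dependencies and path are hypothetical names introduced for illustration, and the hierarchy is assumed to be acyclic.

```python
# Maps each property instance to its Property schema (layout is an assumption).
PROPERTY_SCHEMA = {"is_parent_of": "Depend_on", "char_dep": "Depend_on"}

# Triples taken from the examples in the text.
triples = [
    ("movie1", "is_parent_of", "sequence2"),
    ("Shrek", "char_dep", "Ogre"),
]

def dependencies(subject):
    """Objects related to `subject` through any instance of Depend_on."""
    return [obj for s, prop, obj in triples
            if s == subject and PROPERTY_SCHEMA.get(prop) == "Depend_on"]

def path(obj):
    """Rebuild a hierarchical path by following is_parent_of links upward."""
    parts = [obj]
    while True:
        parents = [s for s, prop, o in triples
                   if prop == "is_parent_of" and o == parts[0]]
        if not parents:
            return "/".join(parts)
        parts.insert(0, parents[0])  # assumes a single parent and no cycles

print(path("sequence2"))
print(dependencies("Shrek"))
```

Because both properties resolve to the same schema, a single query over Depend_on returns hierarchical and application-specific dependencies alike.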
Associative semantics. Another common relationship is that of a metadata object describing an ordinary file. For instance, Fiona comments text indicates that object text describes the Fiona character. Such metadata will, in many cases, be automatically extracted and used for searching, as explained in the next section.
Context information. The data model can also be used to track context information from the file system and user behavior. Examples of related properties include no_reads, no_writes, accessed_before, accessed_by, and accessed_from. For example, we can use hair accessed_before {time=5s, nose} to record the fact that file hair is accessed 5 seconds before file nose. This information can be used to gather statistics that pStore (or applications) can use to improve the performance of the system. Examples include prefetching and caching in distributed environments, data placement, and advanced searching.
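One way such context could be collected and used is sketched below. The access log format, the 10-second window, and the function names accessed_before and prefetch_candidate are all assumptions made for illustration; the text does not specify how pStore derives these relations.

```python
from collections import Counter

def accessed_before(log, window=10.0):
    """Derive `f1 accessed_before {time, f2}` triples from consecutive accesses.

    `log` is a time-ordered list of (timestamp_seconds, file_name) pairs.
    Pairs further apart than `window` seconds are ignored (assumed threshold).
    """
    result = []
    for (t1, f1), (t2, f2) in zip(log, log[1:]):
        gap = t2 - t1
        if f1 != f2 and gap <= window:
            result.append((f1, "accessed_before", {"time": gap, "file": f2}))
    return result

def prefetch_candidate(triples, current):
    """Return the file most often accessed right after `current`, if any."""
    counts = Counter(value["file"] for subject, prop, value in triples
                     if subject == current and prop == "accessed_before")
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical access log: hair is usually followed by nose.
log = [(0.0, "hair"), (5.0, "nose"), (60.0, "hair"),
       (64.0, "nose"), (120.0, "hair"), (123.0, "ear")]
ctx = accessed_before(log)
print(ctx[0])
print(prefetch_candidate(ctx, "hair"))
```

Here the first derived triple corresponds to the example hair accessed_before {time=5s, nose}, and the frequency count is a simple stand-in for the statistics a prefetching policy might use.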
An important challenge that needs to be addressed is the automatic extraction of various types of semantic information from data. For example, vector space models are used to extract features from text documents and images [2,5]. Similarly, frequency, amplitude, and tempo feature vectors can be derived from music data [6]. More recently, Soules and Ganger [19] proposed methods for capturing file attributes and inter-file relations by analyzing user access patterns.