



Framework

The pStore framework offers built-in support for representing and accessing semantic metadata in file stores.

Event model/consistency control. Inter-file dependencies are an important type of semantic information captured by pStore. Often, such dependencies imply a consistency requirement that users assume holds between the related files. Such requirements vary across different instances of a relation, and may even change over time.

We capture such consistency requirements by augmenting dependency relations with an associated relation of type Event. An event consists of an ordered list of <precondition: action> tuples (implemented as an rdf:Seq container in RDF). When a data object is accessed (e.g., open, write), the system checks each precondition in order and executes the corresponding action if the precondition holds. Suppose that object Shrek depends on object Ogre. One of the events associated with that relation may look like <modified: rebuild(Shrek)>, specifying that Shrek needs to be regenerated if Ogre is modified.
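
The event mechanism can be illustrated with a minimal Python sketch. This is not the pStore implementation: the names Dependency, Store, modified, and rebuild are hypothetical, the ordered Python list merely stands in for the RDF sequence container, and we assume that a "write" access counts as a modification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signatures: a precondition inspects the kind of access
# ("open", "write", ...); an action receives the dependent object's name.
Precondition = Callable[[str], bool]
Action = Callable[[str], None]


@dataclass
class Dependency:
    """Relation '<dependent> depends on <source>' with an attached event."""
    dependent: str
    source: str
    # Ordered list of (precondition, action) tuples, mirroring the event.
    event: List[Tuple[Precondition, Action]] = field(default_factory=list)


class Store:
    """Toy stand-in for the metadata store; not the pStore implementation."""

    def __init__(self) -> None:
        self.dependencies: List[Dependency] = []

    def access(self, obj: str, kind: str) -> None:
        """Check the events of every dependency whose source is `obj`."""
        for dep in self.dependencies:
            if dep.source != obj:
                continue
            for precondition, action in dep.event:  # evaluated in order
                if precondition(kind):
                    action(dep.dependent)


def modified(kind: str) -> bool:
    # Assumption: a "write" access counts as a modification.
    return kind == "write"


rebuilt: List[str] = []


def rebuild(name: str) -> None:
    # Placeholder action: record the request instead of regenerating data.
    rebuilt.append(name)


store = Store()
store.dependencies.append(Dependency("Shrek", "Ogre", [(modified, rebuild)]))
store.access("Ogre", "open")   # precondition does not hold; nothing happens
store.access("Ogre", "write")  # precondition holds; rebuild("Shrek") runs
```

In this sketch the check runs synchronously on every access; whether a real system would evaluate events eagerly or defer them is a design choice the text leaves open.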

Customized name space views. In addition to the conventional hierarchical name space, the data model provides the basis on which customized per-user or per-application name spaces can be constructed. We sketch several ways that this can be done.

One way to construct customized name spaces is by constraining the corresponding relations. A special case is when the customized name space is a sub-graph of the original file system hierarchy. For instance, Shrek is_parent_of {user=Mary, script} states that object Shrek is a parent directory of object script only for user Mary. Another possibility is to exploit Property inheritance in the schema. For example, Property land_mammal{feet} can be regarded as a super class of Property elephant{feet, trunk}.
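
A constrained relation of this kind can be sketched in a few lines of Python. The ParentOf class, the children function, and the extra child object models are hypothetical names introduced only for illustration; the sketch assumes that a constraint is a set of key/value pairs that must all match the context of the requesting user or application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParentOf:
    """One is_parent_of relation, optionally constrained (e.g. user=Mary)."""
    parent: str
    child: str
    # An empty constraint set means the relation holds in every view.
    constraints: Dict[str, str] = field(default_factory=dict)


def children(relations: List[ParentOf], parent: str,
             context: Dict[str, str]) -> List[str]:
    """Return the children of `parent` visible in the given context."""
    return [
        r.child
        for r in relations
        if r.parent == parent
        and all(context.get(k) == v for k, v in r.constraints.items())
    ]


relations = [
    # Shrek is_parent_of {user=Mary, script}
    ParentOf("Shrek", "script", {"user": "Mary"}),
    # Hypothetical unconstrained entry, visible to every user.
    ParentOf("Shrek", "models"),
]

mary_view = children(relations, "Shrek", {"user": "Mary"})
bob_view = children(relations, "Shrek", {"user": "Bob"})
```

Here Mary's view of directory Shrek contains both script and models, while any other user sees only models, so each view is a sub-graph of the full hierarchy.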

In principle, a virtual directory can be created to include links to an arbitrary set of files, e.g., search results [8].

Security and access control. In an enterprise environment such as a digital movie studio, data is the biggest asset, so data dependability is of paramount importance. Such enterprises use mechanisms such as encryption and access control to protect the data, and mechanisms such as erasure coding and replication to provide high reliability and availability. We envision that such data dependability mechanisms can be represented using our data model.

RDF Property inheritance can be used to fine-tune the relations for certain types of data.

Advanced searching capabilities. One of the open research questions in storage systems today is how to perform advanced and efficient searching of content in large corpora of data. Our model and framework provide a uniform platform for integrating content-, attribute-, and context-based searching. For example, it can be used in combination with information retrieval algorithms [2] that depend on semantic information from the data. Similarly, our model can capture context information (such as access patterns) and inter-file relationships that can be used for advanced context-based searching [19]. We would also like to support searching with variable recall and precision, so that both can be traded off against speed. Ranking of the search results becomes important, especially for queries where recall and precision are below 100%. This is an area where context information has been used successfully, for example by Google.

Archival support. An on-line archival storage system is one of the main applications we envision for pStore. Compression and versioning are essential given the volume and complexity of the data [17]. The semantic information that our model captures about the data can be used to reduce storage consumption [11] and to organize data for fast storage and retrieval.






Magnus Karlsson
ti 17 jun 2003 14.32.10