
Related Work

The difficulties of using relational databases stored on tape are surveyed in [3]. Sarawagi [11] modifies a relational query optimizer to account for large tape archives in its cost formula and caches tape data on faster storage. Video-on-demand systems [4] might use tape storage, but in these workloads many users randomly access independent large objects rather than sequences of small ones.

A temporal DBMS usually treats time as an additional dimension [16]. Implementation issues include multidimensional indices [6] and disk-based temporal joins [8]. Network traces are temporal, but Tribeca treats them as one-dimensional streams. Also, so far, traffic analysts have treated the data as events rather than intervals, which makes temporal joins simpler in Tribeca. Illustra [17] implements time-series data as an ADT, but lacks operators such as demux that traffic analysis requires.

The SEQ sequence DBMS [12][13][14] integrates sequence operations into an RDBMS. It has operators analogous to those defined in Tribeca, although the data flow style of Tribeca's query language should make it easier to construct large batches of sequence queries. Because Tribeca eliminates the RDBMS features that would slow sequence queries, it runs considerably faster than SEQ on sequence data. However, Tribeca does not support queries that mix relational and sequence data, as SEQ does. Also, many of the SEQ optimization strategies involve teaching the relational optimizer to distinguish sequences from relations so that the executor can access and buffer them differently. Because Tribeca does not support relations, its optimizer does not face these problems. For example, Tribeca implements a window primitive instead of teaching the optimizer to distinguish a relation joined to itself from a window scan.
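The contrast between a window scan and a relation joined to itself can be illustrated with a minimal Python sketch. This is not Tribeca or SEQ code: the `(timestamp, value)` record layout, the function names `window_scan` and `self_join`, and the choice of a running sum as the aggregate are all assumptions made for illustration. The sketch also assumes strictly increasing timestamps and a positive window width.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (timestamp, value). Timestamps are assumed
# to be strictly increasing and the window width to be positive.
Event = Tuple[int, int]


def window_scan(events: Iterable[Event], width: int) -> Iterator[Event]:
    """Single pass over an ordered stream.

    For each event at time t, yields (t, sum of values whose timestamps
    lie in the half-open interval (t - width, t]). Only the events inside
    the current window are buffered.
    """
    buf: deque = deque()
    total = 0
    for t, v in events:
        buf.append((t, v))
        total += v
        # Evict events that have fallen out of the window. The event just
        # appended always satisfies t > t - width, so buf is never empty here.
        while buf[0][0] <= t - width:
            _, old_v = buf.popleft()
            total -= old_v
        yield (t, total)


def self_join(events: List[Event], width: int) -> List[Event]:
    """Same result computed as a nested-loop join of the data with itself.

    Every event is compared against every other event, so the whole input
    must be materialized and the work grows quadratically.
    """
    out = []
    for t, _ in events:
        total = sum(v2 for t2, v2 in events if t - width < t2 <= t)
        out.append((t, total))
    return out


if __name__ == "__main__":
    trace = [(1, 10), (2, 20), (4, 5), (7, 1)]
    print(list(window_scan(trace, 3)))
    print(self_join(trace, 3))
```

Both functions return the same pairs, but `window_scan` reads the stream once and holds only the current window in memory. A relational optimizer would have to recognize that the self-join is equivalent to such a scan, whereas a dedicated window primitive makes the access pattern explicit.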

The Tangram system [10], implemented in Prolog, has operators similar to Tribeca's basic stream processing operators, but did not include more complex operators such as demux, mux, window, and window filter. Tangram was important early work in stream processing. However, processing stream data in a Prolog system has some potential performance drawbacks. Rule processing in Prolog systems is typically made efficient by carefully tuned main-memory data structures, which do not handle data on secondary or tertiary storage well. Recognizing this, the Tangram project used a relational system as a front end to handle the bulk of the I/O processing and filtering for the Prolog back end. Still, implementing both data management and stream processing together in the same engine will reduce data handling overheads. It will also allow users to write queries in a single query language instead of composing them partly in SQL and partly in Prolog. Further, as in SEQ, trying to implement two kinds of systems in one program will inevitably lead to performance compromises.

There have been several efforts at querying live networks. Datacycle [1] used a specialized network interface to query data circulating through a high-speed local network. The Berkeley Packet Filter [9] allows users to load simple filters into the operating system kernel so that traces containing only the qualifying packets can be generated efficiently.
