Check out the new USENIX Web site. next up previous
Next: Performance Measurements Up: Implementation and Optimization Issues Previous: Implementing Fixed and Moving

Optimization Issues

Query optimization in Tribeca has two competing goals. The first is to minimize query execution time. The second is to ensure that the intermediate state associated with the query fits in main memory. Tribeca approaches the first goal like a standard RDBMS. For the second goal, it must pay special attention to two Tribeca operators because they require a data-dependent amount of memory. First, a demux operator is implemented as a hash table whose size depends on the cardinality of the demultiplex field. Second, moving windows and window filters require buffer space to hold their contents. The buffer is data-dependent if the window size is a function of the sort field.

In both space- and time-based optimization, Tribeca benefits from some standard RDBMS optimizations. As is often the case in an RDBMS, pushing qualifications towards the source of the data is helpful so Tribeca migrates quals towards the source stream. Successive qualifications are then grouped together so that the optimizer can use constraint minimization to eliminate redundant quals and order the predicates to minimize expected cost. Migration is, of course, not always possible. For instance, a qual cannot move past an aggregate operator and cannot be moved past a positional window. Tribeca also does some simple common sub-expression elimination. If two instances of the same operator have the same input, they are combined. It should eventually be possible to combine and eliminate larger sequences of operators.

Qual migration can help reduce the number of tuples in windows and reduce the cardinality of demuxes. To further minimize the storage cost of windows that require buffering, Tribeca does several things. First, it combines overlapping windows when it can. Second, it migrates any functional projections whose inputs are smaller than their outputs in front of the window. Still, overflow can occur. On overflow, Tribeca drops records entering the full buffer rather than allow disk I/O. Better user control of overflow error handling is an area for future work.


next up previous
Next: Performance Measurements Up: Implementation and Optimization Issues Previous: Implementing Fixed and Moving