Network traffic analysis is used in a variety of applications, ranging from performance analysis and network provisioning to fraud detection. The characteristics of the data sets and of the analyses performed on them do not lend themselves to a conventional DBMS, yet writing a custom program for each query is not desirable either. The data sets are extremely large (the IP-over-ATM trace mentioned earlier in the paper consists of approximately 176 GBytes of data) and arrive already ordered by timestamp. The only practical ways to cope with the data are to analyze it in real time as it is captured or to record it on tape.
Tribeca is a stream-oriented database management system designed to support network traffic analysis. Its query language has a data flow character that is familiar to network analysts, and it supports the sequence operators they use in their work. Tribeca's executor is tuned for sequential I/O, and its optimizer is concerned with memory and processor limitations rather than with join ordering and access path selection. The current implementation of Tribeca has been used to perform useful analyses on several large data sets. The overhead Tribeca introduces relative to a special-purpose analysis program is modest, so its convenience and flexibility give analysts more than enough incentive to use it.
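The data flow style of processing summarized above can be illustrated with a minimal Python sketch. It is not Tribeca's query language or executor; the `Packet` record layout and the names `qualify`, `bytes_per_window`, and `tcp_bytes_per_window` are hypothetical and chosen only for illustration. The sketch assumes integer timestamps and a stream already ordered by timestamp, which allows a single sequential pass that keeps only one running total in memory.

```python
from collections import namedtuple

# Hypothetical record layout; a real trace carries many more header fields.
Packet = namedtuple("Packet", ["timestamp", "protocol", "length"])


def qualify(stream, protocol):
    """Pass through only the packets of the given protocol (a filter stage)."""
    return (p for p in stream if p.protocol == protocol)


def bytes_per_window(stream, width):
    """Yield (window_start, total_bytes) for each non-empty time window.

    Relies on the stream being ordered by timestamp, so only the running
    total of the current window is held in memory.
    """
    current = None
    total = 0
    for p in stream:
        start = (p.timestamp // width) * width
        if current is None:
            current = start
        elif start != current:
            # The window has closed; emit it and start the next one.
            yield current, total
            current, total = start, 0
        total += p.length
    if current is not None:
        yield current, total


def tcp_bytes_per_window(stream, width):
    """Compose the two stages into one single-pass query."""
    return list(bytes_per_window(qualify(stream, "tcp"), width))


if __name__ == "__main__":
    trace = [
        Packet(0, "tcp", 100),
        Packet(1, "udp", 50),
        Packet(2, "tcp", 200),
        Packet(5, "tcp", 40),
        Packet(11, "tcp", 10),
    ]
    print(tcp_bytes_per_window(trace, 5))  # [(0, 300), (5, 40), (10, 10)]
```

Because each stage is a generator, the composed query reads the input once and in order, which is the access pattern that suits data arriving from a live feed or from tape.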