We have shown that FFPF increases packet filtering efficiency even for relatively simple tasks. These tests do not show, however, where the performance gains originate or how the system would behave with more complex filters. Table 1 breaks down the overhead into several subtasks.
One group of rows covers general overhead: the cost of calling a filter, the total framework overhead per filter in the flowgraph (measured with filters that return immediately after being called, so that only framework overhead remains), the cost of saving an element in an index buffer, and the cost of saving a 1500-byte packet to the packet buffer. Saving a reference in an index buffer is a factor of 50 cheaper than saving a full packet, which shows that in the presence of overlapping flows, FFPF's flowgroups can truly increase efficiency. This, combined with the memory mapping of buffers, is perhaps the most important reason why performance degrades only gradually when running multiple applications.
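As a rough illustration of this factor-50 gap (the buffer names and ring layout below are our own sketch, not FFPF's actual code), saving a reference amounts to a single word-sized write into an index buffer, whereas saving a packet copies up to 1500 bytes into the shared packet buffer:

    /* Sketch only: hypothetical pbuf/ibuf rings illustrating the cost gap. */
    #include <stdint.h>
    #include <string.h>

    #define PKT_SIZE   1500
    #define RING_SLOTS 1024

    static uint8_t  pbuf[RING_SLOTS][PKT_SIZE];  /* shared packet buffer  */
    static uint32_t ibuf[RING_SLOTS];            /* per-flow index buffer */
    static uint32_t pbuf_head, ibuf_head;

    /* Expensive path: copy the entire packet into the shared buffer
     * (done only once per packet, regardless of how many flows want it). */
    uint32_t store_packet(const uint8_t *pkt, size_t len)
    {
        uint32_t slot = pbuf_head++ % RING_SLOTS;
        memcpy(pbuf[slot], pkt, len);            /* touches up to 1500 bytes */
        return slot;
    }

    /* Cheap path: flows in the same flowgroup only record a reference. */
    void store_reference(uint32_t slot)
    {
        ibuf[ibuf_head++ % RING_SLOTS] = slot;   /* a single 4-byte write */
    }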
Other rows show resource consumption for a number of frequently executed filters: the Aho-Corasick pattern-matching algorithm used in snort [31], and a simple tcpdump filter executed in FPL-2 code and in BPF, respectively.
These results show that FPL-2 is four times as efficient as BPF, even for such a trivial filter. While not shown, the cost savings grow with expression complexity, as expected. Unfortunately, the performance of truly elaborate filters, such as those shown in Figures 6 and 7, cannot be compared, because filters of that complexity cannot be expressed in BPF.
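For a sense of scale (the exact expression used in the test is not reproduced here; the one below is merely an assumed filter of comparable triviality), such a BPF filter can be compiled with libpcap as follows, after which the resulting bytecode is interpreted per packet, whereas FPL-2 compiles an equivalent expression to native code:

    /* Sketch: compiling a trivial tcpdump-style filter to classic BPF. */
    #include <pcap/pcap.h>
    #include <stdio.h>

    int main(void)
    {
        pcap_t *p = pcap_open_dead(DLT_EN10MB, 1500);  /* offline handle */
        struct bpf_program prog;

        if (p == NULL)
            return 1;
        /* The filter expression is an assumption, not the paper's. */
        if (pcap_compile(p, &prog, "tcp and dst port 80", 1,
                         PCAP_NETMASK_UNKNOWN) < 0) {
            fprintf(stderr, "compile failed: %s\n", pcap_geterr(p));
            return 1;
        }
        pcap_freecode(&prog);
        pcap_close(p);
        return 0;
    }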
Pattern matching can also be seen to be costly. We show the case where an application (e.g., snort) is interested only in packets that contain a signature. Processing costs are especially high when the signature is not found, because the entire packet must then be scanned (the result shown is for 1500-byte packets).
By executing this function in the kernel, FFPF eliminates a journey to
userspace for every packet, avoiding unnecessary packet copies,
context switches and signalling. Note that even compared to the high
overhead of pattern matching, the overhead of storing packets is
significant.
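The worst-case behaviour on a miss is visible even in the simplest matcher sketch below (a naive single-pattern scan standing in for snort's Aho-Corasick automaton, which shares the property that a packet without the signature forces the matcher to visit every payload byte):

    /* Sketch: why a miss is the worst case for in-kernel signature scanning. */
    #include <stddef.h>
    #include <string.h>

    int contains_signature(const unsigned char *payload, size_t len,
                           const unsigned char *sig, size_t sig_len)
    {
        /* On a miss, the loop visits all (len - sig_len + 1) positions:
         * for a 1500-byte packet, essentially the whole packet is touched. */
        for (size_t i = 0; i + sig_len <= len; i++)
            if (memcmp(payload + i, sig, sig_len) == 0)
                return 1;   /* hit: hand the packet to the application */
        return 0;           /* miss: drop in the kernel, no copy to userspace */
    }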
The complete cost of context switching is hard to measure, due largely to the asynchronous nature of userspace/kernel communication. One cost that is quantifiable is that of waking up a user process, also shown in Table 1. At 600 cycles (four times the overhead of a filter stage), this is significant. To minimize this overhead, users can reduce communication by batching packets: waking up a client process only once every n packets reduces this type of overhead by a factor of n. In FFPF, n is determined by the size of the circular buffers and can be thousands of packets.
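The batching effect can be sketched as follows (the structure and function names are hypothetical, not FFPF's API): because the reader drains every packet that accumulated in the circular buffer before blocking again, the 600-cycle wake-up is paid once per batch rather than once per packet.

    /* Sketch of wake-up batching over a shared circular buffer. */
    #include <stdint.h>

    struct ring {
        volatile uint32_t head;   /* advanced by the kernel on arrival */
        uint32_t          tail;   /* advanced by the userspace reader  */
        uint32_t          size;   /* slot count: the upper bound on n  */
    };

    /* Hypothetical stand-ins for the actual wait and processing calls. */
    extern void wait_for_wakeup(struct ring *r);  /* blocks; ~600 cycles */
    extern void process_packet(struct ring *r, uint32_t slot);

    void reader_loop(struct ring *r)
    {
        for (;;) {
            wait_for_wakeup(r);              /* wake-up cost paid once...      */
            while (r->tail != r->head) {     /* ...then amortized over a batch */
                process_packet(r, r->tail);
                r->tail = (r->tail + 1) % r->size;
            }
        }
    }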
Furthermore, comparing the filtering and framework overheads in Table 1 shows that the cost due to FFPF's complexity contributes only a moderate amount to overall processing. Finally, we discuss in a related publication that the IXP implementation is able to sustain full Gigabit rates for the same simple filter that was used for Figure 1, while a few hundred Mbps can still be sustained for complex filters that check every byte in the packet [29]. Because the FPL-2 code on the IXP serves as a pre-filtering stage, we are able to support line rates without being hampered by bottlenecks such as the PCI bus and host memory latency, which is not true for most existing approaches. We conclude that FFPF is an efficient solution for both simple (e.g., BPF) and more complex (sampling, pattern matching) tasks.