The core flow slicing algorithm addresses the problem of reducing the memory
usage of the flow measurement module. Sampled NetFlow and Adaptive NetFlow use
random packet sampling: they only handle sampled packets. Just as in sample and
hold [11], flow slicing uses sampling only to control the creation of flow
entries: once a sampled packet creates an entry for a flow, all of the flow's
subsequent packets are counted (not just the sampled ones). This increases the
accuracy of the packet count estimates without changing the memory requirement.
We use the ``flow slicing probability'' p to control the creation of flow
entries. We expire and report each entry exactly t time units after its
creation, irrespective of the rate at which packets arrive for that flow. We
call this core algorithm ``flow slicing'' because each entry tracks a ``slice''
of length t from the flow.
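The core loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class and method names (`FlowSlicer`, `packet`, `expire`) are hypothetical, packets are assumed to arrive in nondecreasing timestamp order, and the `reports` list stands in for exporting records to a collector.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    created: float   # arrival time of the packet that created the entry
    packets: float   # packet counter (scaled at creation, see below)
    nbytes: float    # byte counter (scaled at creation, see below)


class FlowSlicer:
    """Sketch of core flow slicing; all names here are hypothetical.

    p: flow slicing probability, t: slice length in seconds.
    """

    def __init__(self, p, t, rng=None):
        self.p = p
        self.t = t
        self.rng = rng or random.Random()
        self.table = {}       # flow key -> Entry
        self.fifo = deque()   # (created, key) pairs in creation order
        self.reports = []     # (key, Entry) records handed to the collector

    def expire(self, now):
        # Every entry lives exactly t, so entries expire in creation order
        # and a FIFO queue finds them without scanning the whole table.
        while self.fifo and now - self.fifo[0][0] >= self.t:
            _, key = self.fifo.popleft()
            self.reports.append((key, self.table.pop(key)))

    def packet(self, key, size, now):
        self.expire(now)
        entry = self.table.get(key)
        if entry is not None:
            # Once an entry exists, every packet of the flow is counted.
            entry.packets += 1
            entry.nbytes += size
        elif self.rng.random() < self.p:
            # Sampling only decides whether to create an entry. The byte
            # counter starts at size/p; scaling the packet counter to 1/p
            # in the same way is our assumption.
            self.table[key] = Entry(now, 1 / self.p, size / self.p)
            self.fifo.append((now, key))
```

Because all entries share the same lifetime t, a plain queue suffices for expiry; a NetFlow-style cache with per-entry inactivity timeouts would need a different structure.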
Just as in the case of NetFlow, the entry associated with a flow has byte and
packet counters updated at every packet and timestamps for the first and last
packets, and it stores protocol information such as whether any of the packets
counted against the entry had the SYN flag set. To ensure that the estimators
are unbiased, on creation of an entry we do not initialize the byte counter to
the number of bytes b in the packet that caused the creation of the entry, but
to b/p (see for more details).
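A short calculation shows why scaling the first packet's bytes by the inverse of the flow slicing probability removes bias. Write p for the flow slicing probability and b for the size of the packet that creates the entry. The sketch below assumes, for simplicity, that all packets of the flow fall within a single slice; the function name and the packet sizes are illustrative only. It computes the exact expected byte counter with `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction


def expected_byte_counter(sizes, p, scale_first=True):
    """Exact expected byte counter of one flow under flow slicing.

    Simplifying assumption: all packets fall within a single slice, so at
    most one entry is created. Packet i (0-based) creates the entry with
    probability (1 - p)**i * p; the counter then starts at sizes[i] / p
    (or at sizes[i] when scale_first is False) and every later packet adds
    its size.
    """
    p = Fraction(p)
    expected = Fraction(0)
    for i, b in enumerate(sizes):
        first = b / p if scale_first else Fraction(b)
        expected += (1 - p) ** i * p * (first + sum(sizes[i + 1:]))
    return expected


sizes = [1500, 40, 40, 1500]
print(expected_byte_counter(sizes, Fraction(1, 10)))         # 3080, the true total
print(expected_byte_counter(sizes, Fraction(1, 10), False))  # 68429/100: biased low
```

The same result follows packet by packet: a packet arriving when the flow has an entry adds b with certainty, and a packet arriving when it has none adds b/p with probability p, so each packet contributes b in expectation.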
The slice length t is related to the ``active timeout'' of NetFlow, which
controls how long an active entry is kept before expiring and being reported
(default 30 minutes). Both parameters limit the staleness of the data: if we
have a long-lived flow, we know that its traffic will be reported with at most
this much delay.
By dynamically adapting the flow slicing probability, we can control the rate at
which entries are created and freed, thus ensuring that the algorithm stays
within its allocated memory budget of M entries. Since every entry is reported
exactly t after its creation, records are reported at the same rate at which
entries were created t earlier. By keeping the entry creation rate on average
slightly below M/t, we therefore also keep the rate at which flow records are
reported smooth. In contrast, Adaptive NetFlow proposes expiring all active
entries at the end of the measurement bin, so it either has a large peak in
reports, or it requires buffers that increase the memory usage by almost a
factor of two if the reporting of the records is smoothed out over the next
measurement bin. We do not, however, discuss dynamic adaptation in much detail
in this paper, as adaptation techniques similar to those in [10] can be applied
in this context using feedback from the current memory usage. Note, however,
that our adaptation does not require the costly renormalization operation that
Adaptive NetFlow requires. Next we discuss some of the tuning knobs we provide
to control the three resource bottlenecks (CPU, Memory, Bandwidth).
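The memory budget argument can be checked with a little arithmetic. Because every entry lives exactly the slice length t, the entries alive at any instant are exactly those created during the preceding t seconds, so a creation rate r keeps about r·t entries in memory, and staying below a budget of M entries means keeping r below M/t. The sketch below assumes sorted creation timestamps; the function name and the numbers (M = 1000 entries, t = 60 s, 15 creations per second) are illustrative and not taken from the text.

```python
from bisect import bisect_right


def live_entries(creation_times, t, now):
    """Number of entries alive at `now` when every entry lives exactly t.

    creation_times must be sorted. An entry created at c is alive while
    c <= now < c + t, i.e. for c in the half-open window (now - t, now].
    """
    return bisect_right(creation_times, now) - bisect_right(creation_times, now - t)


# Illustrative numbers (not from the text): budget M = 1000 entries and
# slice length t = 60 s, so entries may be created at up to M/t ~ 16.7 per s.
M, t = 1000, 60.0
rate = 15  # creation rate kept below M/t
times = [i / rate for i in range(rate * 600)]  # ten minutes of even creations
print(live_entries(times, t, 300.0))  # 900 = rate * t, below M
```

With bursty rather than evenly spaced creations the count fluctuates around r·t, which is why the text keeps the average rate slightly below M/t instead of exactly at it.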
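One way to see why changing the flow slicing probability needs no renormalization: if each entry's byte counter is initialized with the probability in effect when that entry was created, the expected counter still equals the flow's true byte count, so existing entries never have to be adjusted. The sketch below assumes that design, a single slice per flow, and a hypothetical per-packet probability schedule; it is an illustration, not the paper's adaptation scheme.

```python
from fractions import Fraction
from math import prod


def expected_counter_varying_p(sizes, probs):
    """Exact expected byte counter when the slicing probability changes.

    probs[i] (> 0) is the probability in effect when packet i arrives (a
    hypothetical schedule). A creating packet initializes the counter to
    sizes[i] / probs[i]; every later packet adds its size. All packets are
    assumed to fall in a single slice.
    """
    probs = [Fraction(q) for q in probs]
    expected = Fraction(0)
    for i, (b, q) in enumerate(zip(sizes, probs)):
        # The entry is created at packet i only if packets 0..i-1 did not create it.
        prob_create_here = prod((1 - r for r in probs[:i]), start=Fraction(1)) * q
        expected += prob_create_here * (b / q + sum(sizes[i + 1:]))
    return expected


schedule = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 10), Fraction(1, 4)]
print(expected_counter_varying_p([1500, 40, 40, 1500], schedule))  # 3080
```

The per-packet argument is unchanged: a packet that finds no entry contributes b/q with probability q, whatever q currently is, so lowering or raising the probability affects only future entries.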