To reduce the number of flow records while maintaining accurate byte counts,
smart sampling [8] proposes sampling the flow records with a size-dependent
probability $r = \min(1, b/z)$, where $b$ is the byte count of the record and
$z$ is a threshold parameter controlling the trade-off between the loss in
accuracy and the reduction in the volume of reports. We can adapt smart sampling
to flow slices using $r = \min(1, \hat{b}/z)$, where $\hat{b}$ is the byte
estimate of the record, and we could still estimate byte, packet and flow
arrival counts based on the smart sampled flow records using $\hat{b}/r$,
$\hat{p}/r$, and $\hat{f}/r$. But using this formula for $r$ results in a much
larger variance for the flow arrival estimate, because it discriminates against
flows with few bytes: since most flows have few bytes, they also produce most of
the flow records with the SYN flag set, and these are exactly the records the
flow arrival estimators rely on.
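
As a minimal sketch of the byte-based scheme described above, the following
Python code keeps each record with probability min(1, bytes / z) and scales the
surviving records by 1/r to estimate totals. The record layout (`bytes`,
`packets`, `syn`), the function names, and the use of a 0/1 SYN indicator as the
per-record flow arrival contribution are assumptions made for illustration, not
the estimators defined in this paper.

```python
import random


def smart_sample(records, z, rng=random.random):
    """Keep each record with probability r = min(1, bytes / z).

    `records` is a list of dicts with hypothetical keys "bytes", "packets"
    and "syn" (1 if the record has the SYN flag set, else 0).
    The sampling probability is stored with each kept record.
    """
    kept = []
    for rec in records:
        r = min(1.0, rec["bytes"] / z)
        if rng() < r:
            kept.append({**rec, "r": r})
    return kept


def estimate_totals(kept):
    """Estimate totals by weighting each kept record by 1 / r."""
    return {
        key: sum(rec[key] / rec["r"] for rec in kept)
        for key in ("bytes", "packets", "syn")
    }
```

Small records that carry a SYN flag are kept only rarely and then receive a
large weight 1/r, which is why the flow arrival total fluctuates far more than
the byte total under this rule.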
We propose a new variant of smart sampling, multi-factor smart sampling, which
takes into consideration not just byte counts, but also packet counts and SYN
flags. By picking a smart sampling probability of
$r = \min(1, \hat{b}/z_b + \hat{p}/z_p + \hat{f}/z_f)$, where $\hat{b}$,
$\hat{p}$ and $\hat{f}$ are the byte, packet and flow arrival estimates of the
record, we can balance the requirements of the three estimators. The three
individual thresholds $z_b$, $z_p$ and $z_f$ control the trade-off between
accuracy and reduction in report volume separately for the three estimators of
bytes, packets and flow arrivals. Note that multi-factor smart sampling is a
generalization of smart sampling: if we set $z_b = z$, $z_p = \infty$, and
$z_f = \infty$, it will assign the exact same sampling probabilities to records
as smart sampling.
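
The following Python sketch illustrates a multi-factor sampling probability and
the reduction to plain smart sampling. It assumes the three factors are combined
additively and capped at 1. That combination, the record keys, and the function
names are assumptions of this sketch, not a definitive implementation.

```python
import math


def smart_probability(rec, z):
    """Byte-only smart sampling probability: min(1, bytes / z)."""
    return min(1.0, rec["bytes"] / z)


def multifactor_probability(rec, z_b, z_p, z_f):
    """Multi-factor probability with one threshold per estimator.

    Assumed form: min(1, bytes/z_b + packets/z_p + syn/z_f), where "syn"
    is a hypothetical 0/1 stand-in for the record's flow arrival estimate.
    """
    return min(
        1.0,
        rec["bytes"] / z_b + rec["packets"] / z_p + rec["syn"] / z_f,
    )


# A small record with the SYN flag set.
rec = {"bytes": 100, "packets": 2, "syn": 1}

# Finite thresholds: the packet and SYN terms raise the probability.
p_multi = multifactor_probability(rec, z_b=1000, z_p=20, z_f=5)

# Infinite packet and flow thresholds: only the byte term remains.
p_reduced = multifactor_probability(rec, z_b=1000, z_p=math.inf, z_f=math.inf)
p_smart = smart_probability(rec, z=1000)
```

With infinite packet and flow thresholds the last two terms are zero, so
`p_reduced` equals `p_smart`. With finite thresholds the same small SYN record
is sampled with a higher probability than under the byte-only rule.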