Smart sampling has been proposed as a way of reducing the number of flow records
without causing much error. Smart sampling focuses on measuring the number of
bytes in arbitrary aggregates of traffic, and thus it favors flow records with
large byte counters over those with small byte counters. Common
packet sizes fall within a limited range, so while the packet counts are not
proportional to the byte counts, they are closely correlated. Thus smart
sampling will ensure that the errors introduced in packet counts are also small.
The situation is different with flow arrival counts. These depend heavily on
flow records with the SYN flag set, and most such records come from small flows,
which smart sampling discriminates against. Thus the errors that smart sampling
introduces in the flow arrival counts are significant.
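
The contrast between byte, packet, and flow arrival counts can be illustrated with a small sketch. It assumes the usual threshold form of smart sampling, in which a record with x bytes is kept with probability min(1, x/z) and a kept record is weighted by the inverse of that probability. The threshold z, the record layout, and the synthetic traffic mix are hypothetical choices made only for illustration; they are not taken from the text above.

```python
import random

def smart_sample(flows, z, rng):
    """Keep each record with probability min(1, bytes / z).

    Kept records carry weight 1/p so that weighted sums are unbiased
    estimates of the original totals.
    """
    kept = []
    for f in flows:
        p = min(1.0, f["bytes"] / z)
        if rng.random() < p:
            kept.append({**f, "weight": 1.0 / p})
    return kept

def estimate(kept, field):
    """Weighted estimate of the total of `field` over the original records."""
    return sum(r[field] * r["weight"] for r in kept)

def relative_std_error(flows, z, field):
    """Analytic relative standard error of the estimated total of `field`.

    Each record is sampled independently, so the variance of the
    weighted sum is the sum over records of v^2 * (1 - p) / p.
    """
    total = sum(f[field] for f in flows)
    var = 0.0
    for f in flows:
        p = min(1.0, f["bytes"] / z)
        if 0.0 < p < 1.0:
            var += f[field] ** 2 * (1.0 - p) / p
    return var ** 0.5 / total

# Hypothetical traffic mix: many small flows and a few large ones,
# each represented by a single record with the SYN flag set.
flows = (
    [{"bytes": 180, "packets": 3, "syn": 1} for _ in range(1000)]
    + [{"bytes": 1_000_000, "packets": 1000, "syn": 1} for _ in range(10)]
)
Z = 10_000  # assumed byte threshold

kept = smart_sample(flows, Z, random.Random(1))
print("records kept:", len(kept), "of", len(flows))
for field in ("bytes", "packets", "syn"):
    print(field, "relative std error:", relative_std_error(flows, Z, field))
```

Under these assumptions the relative error is smallest for bytes, somewhat larger for packets, and largest by a wide margin for the SYN-based flow arrival count, because almost all SYN records belong to small flows that are rarely kept.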
We propose a new variant of smart sampling, multi-factor smart sampling,
which takes into consideration not just byte counts, but also packet counts and
SYN flags. While multi-factor smart sampling still favors flow records with large
byte and packet counts, it also favors records with the SYN flag, thus ensuring
that the errors introduced into the flow arrival counts are not large either.
Because the exact rule used to determine the multi-factor smart sampling
probability depends on estimators of byte and packet counts, we postpone its
discussion until those estimators have been introduced.
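
Until then, the intent can be illustrated with a purely hypothetical rule. The sketch below is not the rule defined later in the paper. It assumes an additive form in which byte, packet, and SYN contributions, each divided by its own threshold, are summed and capped at 1. The names `z_bytes`, `z_packets`, and `z_syn`, their values, and the synthetic traffic mix are all assumptions made for illustration.

```python
def byte_only(rec, z_bytes):
    """Sampling probability that looks at byte counts alone."""
    return min(1.0, rec["bytes"] / z_bytes)

def multi_factor(rec, z_bytes, z_packets, z_syn):
    """Hypothetical additive rule: bytes, packets and SYN each contribute."""
    score = (rec["bytes"] / z_bytes
             + rec["packets"] / z_packets
             + rec["syn"] / z_syn)
    return min(1.0, score)

def syn_relative_std_error(flows, prob):
    """Analytic relative standard error of the estimated number of SYN records.

    `prob` maps a record to its sampling probability. Records are
    sampled independently and weighted by 1 / probability.
    """
    total = sum(f["syn"] for f in flows)
    var = 0.0
    for f in flows:
        p = prob(f)
        if 0.0 < p < 1.0:
            var += f["syn"] ** 2 * (1.0 - p) / p
    return var ** 0.5 / total

def expected_records(flows, prob):
    """Expected number of records kept under the given probability rule."""
    return sum(prob(f) for f in flows)

# Hypothetical traffic mix: many small SYN-bearing flows, a few large ones.
flows = (
    [{"bytes": 180, "packets": 3, "syn": 1} for _ in range(1000)]
    + [{"bytes": 1_000_000, "packets": 1000, "syn": 1} for _ in range(10)]
)

def by_bytes(rec):
    return byte_only(rec, z_bytes=10_000)

def by_factors(rec):
    return multi_factor(rec, z_bytes=10_000, z_packets=100, z_syn=4)

for name, rule in (("byte-only", by_bytes), ("multi-factor", by_factors)):
    print(name,
          "expected records:", expected_records(flows, rule),
          "SYN relative std error:", syn_relative_std_error(flows, rule))
```

With these arbitrary thresholds the SYN term raises the sampling probability of small SYN-bearing records, which lowers the error in the flow arrival count. It also keeps more records, so in practice the thresholds would have to be chosen to balance accuracy against the number of records retained.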