To reduce the number of flow records while maintaining accurate byte counts, smart sampling [8] proposes sampling the flow records with a size-dependent probability $r = \min(1, s/z)$, where $s$ is the size of the flow in bytes and $z$ is a threshold parameter controlling the trade-off between the loss in accuracy and the reduction in the volume of reports. We can adapt smart sampling to flow slices using $r = \min(1, \hat{s}/z)$, and we could still estimate byte, packet and flow arrival counts from the smart sampled flow records using $\hat{s}/r$, $\hat{p}/r$, and $\hat{f}/r$, where $\hat{s}$, $\hat{p}$ and $\hat{f}$ are the byte, packet and flow arrival estimates carried by the record. But using this formula for $r$ results in a variance for the flow arrival estimate much larger than that of the byte estimate, because it discriminates against flows with few bytes. Since most flows have few bytes, they also produce most of the flow records with the SYN flag set, and these are exactly the records the flow arrival estimators rely on.
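The sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the usual smart sampling form min(1, bytes / z), and the `FlowRecord` layout, its field names and the helper `smart_sample` are hypothetical names introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class FlowRecord:
    # Hypothetical record layout: per-record estimates of bytes,
    # packets and flow arrivals (flows_est is nonzero for SYN records).
    bytes_est: float
    packets_est: float
    flows_est: float


def smart_sampling_probability(bytes_est: float, z: float) -> float:
    """Size-dependent probability r = min(1, bytes / z) (assumed form)."""
    return min(1.0, bytes_est / z)


def smart_sample(records, z, rng=random):
    """Keep each record with probability r and divide its counters by r."""
    kept = []
    for rec in records:
        r = smart_sampling_probability(rec.bytes_est, z)
        # Records with zero bytes have r = 0 and are never kept.
        if r > 0 and rng.random() < r:
            kept.append(
                FlowRecord(rec.bytes_est / r, rec.packets_est / r, rec.flows_est / r)
            )
    return kept
```

With an illustrative threshold of z = 1000, a 100-byte record with the SYN flag set survives with probability 0.1 and, when it does, is scaled up to count as roughly 10 flow arrivals. Records above the threshold are always kept and left unscaled, which is why the byte estimate stays tight while the flow arrival estimate becomes noisy.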
We propose a new variant of smart sampling, multi-factor smart sampling, which takes into consideration not just byte counts, but also packet counts and SYN flags. By picking a smart sampling probability of $r = \min(1, \hat{s}/z_s + \hat{p}/z_p + \hat{f}/z_f)$, where $\hat{s}$, $\hat{p}$ and $\hat{f}$ are the record's byte, packet and flow arrival estimates, we can balance the requirements of the three estimators. The three individual thresholds $z_s$, $z_p$ and $z_f$ control the trade-off between accuracy and reduction in report volume separately for the three estimators of bytes, packets and flow arrivals. Note that multi-factor smart sampling is a generalization of smart sampling: if we set $z_s = z$ (the smart sampling threshold), $z_p = \infty$, and $z_f = \infty$, it assigns exactly the same sampling probabilities to records as smart sampling.
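To make the generalization concrete, here is a small sketch that compares the two probabilities. It assumes the multi-factor probability is the capped sum of the three ratios (bytes, packets and flow arrivals, each divided by its own threshold). The function names and all threshold values are illustrative assumptions, not values from the paper.

```python
import math


def smart_probability(bytes_est, z):
    """Plain smart sampling: r = min(1, bytes / z)."""
    return min(1.0, bytes_est / z)


def multifactor_probability(bytes_est, packets_est, flows_est, z_s, z_p, z_f):
    """Multi-factor form (assumed): r = min(1, bytes/z_s + packets/z_p + flows/z_f)."""
    return min(1.0, bytes_est / z_s + packets_est / z_p + flows_est / z_f)


# A small record with the SYN flag set: few bytes, one flow arrival.
small_syn = (100.0, 2.0, 1.0)

# Byte-only sampling rarely keeps this record.
r_smart = smart_probability(small_syn[0], z=1000.0)

# A finite flow threshold raises the probability for SYN records.
r_multi = multifactor_probability(*small_syn, z_s=1000.0, z_p=100.0, z_f=2.0)

# Infinite packet and flow thresholds reduce to plain smart sampling,
# because dividing a finite count by infinity yields 0.0.
r_reduced = multifactor_probability(
    *small_syn, z_s=1000.0, z_p=math.inf, z_f=math.inf
)

print(r_smart, r_multi, r_reduced)
```

In this sketch the flow term dominates for the small SYN record, so it is kept far more often than under byte-only sampling, while large records still reach the cap of 1 through the byte term alone.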