

Multi-factor smart sampling

To reduce the number of flow records while maintaining accurate byte counts, smart sampling [8] samples the flow records with a size-dependent probability $r=\min(1,b/z)$, where $b$ is the byte count of the record and $z$ is a threshold parameter controlling the trade-off between the loss in accuracy and the reduction in the volume of reports. We can adapt smart sampling to flow slices using $r=\min(1,\widehat{B}/z)$, and we can still estimate packet, byte and flow arrival counts from the smart sampled flow records using $\widehat{\cal{S}}=\widehat{S}/r$, $\widehat{\cal{B}}=\widehat{B}/r$, and $\widehat{\cal{A}}=\widehat{A}/r$. However, using this formula for $r$ results in a variance for $\widehat{\cal{A}}$ much larger than that of $\widehat{A}$, because it discriminates against flows with few bytes. Since most flows have few bytes, these flows also produce most of the flow records with the SYN flag set, and these are exactly the records that $\widehat{A}^{(1)}$ and $\widehat{A}^{(2)}$ rely on.
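As a rough illustration of byte-based smart sampling applied to per-record estimates, the following Python sketch keeps each record with probability $r$ and divides its estimates by $r$. The record layout (a dict with keys `S`, `B`, `A`), the function names, and the threshold value are assumptions made for this example, not part of the original system.

```python
import random


def smart_sampling_probability(b_hat, z):
    """Return r = min(1, B_hat / z) for one flow record."""
    return min(1.0, b_hat / z)


def smart_sample(records, z, rng):
    """Keep each record with probability r and rescale its estimates by 1/r.

    Each record is a dict with hypothetical keys "S" (packets), "B" (bytes)
    and "A" (flow arrivals) holding the per-record estimates.
    """
    kept = []
    for rec in records:
        r = smart_sampling_probability(rec["B"], z)
        # A record with B_hat = 0 has r = 0 and is never kept,
        # so the division below is always safe.
        if rng.random() < r:
            kept.append({key: value / r for key, value in rec.items()})
    return kept


# Illustrative use: many small SYN records and an assumed threshold z.
small_syn_records = [{"S": 2, "B": 100, "A": 1}] * 2000
sampled = smart_sample(small_syn_records, z=10000, rng=random.Random(1))
print(len(sampled), "records kept out of", len(small_syn_records))
```

With these assumed numbers each small record survives with probability 0.01 and, when it does, counts as 100 flow arrivals. The few surviving records carry large weights, which is the source of the high variance of the flow arrival estimate.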

We propose a new variant of smart sampling, multi-factor smart sampling, which takes into consideration not just byte counts, but also packet counts and SYN flags. By picking a smart sampling probability of $r=\min(1,\widehat{S}/z_s+\widehat{B}/z_b+\widehat{A}/z_a)$ we can balance the requirements of the three estimators. The three individual thresholds $z_s$, $z_b$ and $z_a$ control the trade-off between accuracy and reduction in report volume separately for the estimators of packets, bytes and flow arrivals. Note that multi-factor smart sampling is a generalization of smart sampling: if we set $z_b=z$, $z_s=\infty$, and $z_a=\infty$, the packet and flow arrival terms vanish and it assigns exactly the same sampling probabilities to records as smart sampling.
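The multi-factor sampling probability can be sketched in Python as follows. The function name and all threshold values are hypothetical and chosen only to show how the three terms combine; the sketch also checks the special case in which infinite packet and flow arrival thresholds reduce the formula to byte-based smart sampling.

```python
import math


def multi_factor_probability(s_hat, b_hat, a_hat, z_s, z_b, z_a):
    """Return r = min(1, S_hat/z_s + B_hat/z_b + A_hat/z_a) for one record.

    A threshold of math.inf makes its term contribute 0.0 to the sum.
    """
    return min(1.0, s_hat / z_s + b_hat / z_b + a_hat / z_a)


# A small record with the SYN flag set (assumed values for illustration).
s_hat, b_hat, a_hat = 2, 120, 1

# Byte-only smart sampling as a special case: z_b = z, z_s = z_a = infinity.
r_bytes_only = multi_factor_probability(
    s_hat, b_hat, a_hat, z_s=math.inf, z_b=100000, z_a=math.inf
)

# Multi-factor sampling with an assumed flow arrival threshold z_a = 10,
# which raises the probability of keeping records that contribute to A_hat.
r_multi = multi_factor_probability(
    s_hat, b_hat, a_hat, z_s=1000, z_b=100000, z_a=10
)

print("byte-only r:", r_bytes_only)
print("multi-factor r:", r_multi)
```

Under these assumed thresholds the flow arrival term dominates for the small SYN record, so it is kept far more often than under byte-only sampling, while large records still reach $r=1$ through the byte or packet terms.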


Ramana Rao Kompella 2005-08-12