Multi-factor smart sampling

To reduce the number of flow records while maintaining accurate byte counts, smart sampling [8] proposes sampling the flow records with a size-dependent probability $r=\min(1,b/z)$, where $z$ is a threshold parameter controlling the trade-off between the loss in accuracy and the reduction in the volume of reports. We can adapt smart sampling to flow slices using $r=\min(1,\widehat{B}/z)$, and we could still estimate packet, byte and flow arrival counts from the smart sampled flow records using $\widehat{\cal{S}}=\widehat{S}/r$, $\widehat{\cal{B}}=\widehat{B}/r$, and $\widehat{\cal{A}}=\widehat{A}/r$. But using this formula for $r$ results in a variance for $\widehat{\cal{A}}$ much larger than that of $\widehat{A}$, because it discriminates against flows with few bytes. Since most flows have few bytes, they also produce most of the flow records with the SYN flag set, and these are exactly the records that $\widehat{A}^{(1)}$ and $\widehat{A}^{(2)}$ rely on.
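As an illustration of smart sampling applied to flow slice records, the following is a minimal Python sketch, not the authors' implementation. The record layout (a dict with keys `"S"`, `"B"` and `"A"` holding the per-record packet, byte and flow arrival estimates) and the function names are assumptions made for this example only.

```python
import random


def smart_sampling_probability(byte_estimate, z):
    """Return r = min(1, B_hat / z) for one flow record."""
    if z <= 0:
        raise ValueError("threshold z must be positive")
    return min(1.0, byte_estimate / z)


def smart_sample(records, z, rng=random):
    """Keep each record with probability r and scale its estimates by 1/r.

    `records` is a list of dicts with keys "S", "B", "A" (a hypothetical
    layout: packet, byte and flow arrival estimates of one record).
    """
    kept = []
    for rec in records:
        r = smart_sampling_probability(rec["B"], z)
        # Records with r == 0 are never kept, which also avoids dividing by 0.
        if r > 0 and rng.random() < r:
            kept.append({key: rec[key] / r for key in ("S", "B", "A")})
    return kept
```

With $z=1000$, a 40-byte record carrying a SYN ($\widehat{A}=1$) is kept with probability $0.04$ and, when kept, contributes $1/0.04=25$ to the flow arrival estimate. This heavy scaling of rarely kept small records is the source of the large variance discussed above.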

We propose a new variant of smart sampling, multi-factor smart sampling, which takes into consideration not just byte counts, but also packet counts and SYN flags. By picking a smart sampling probability of $r=\min(1,\widehat{S}/z_s+\widehat{B}/z_b+\widehat{A}/z_a)$ we can balance the requirements of the three estimators. The three individual thresholds control the trade-off between accuracy and reduction in report volume separately for the three estimators of packets, bytes and flow arrivals. Note that multi-factor smart sampling is a generalization of smart sampling: if we set $z_b=z$, $z_s=\infty$, and $z_a=\infty$, it assigns exactly the same sampling probabilities to records as smart sampling.
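The multi-factor probability and its reduction to plain smart sampling can be checked with a short Python sketch. The function name and argument order are assumptions made for this illustration, and the threshold values in the usage note are arbitrary, not values recommended by the text.

```python
import math


def multifactor_probability(s_hat, b_hat, a_hat, z_s, z_b, z_a):
    """Return r = min(1, S_hat/z_s + B_hat/z_b + A_hat/z_a) for one record.

    A threshold of math.inf removes the corresponding factor, because a
    finite estimate divided by infinity is 0.0.
    """
    if min(z_s, z_b, z_a) <= 0:
        raise ValueError("all thresholds must be positive")
    return min(1.0, s_hat / z_s + b_hat / z_b + a_hat / z_a)


def smart_probability(b_hat, z):
    """Plain smart sampling, r = min(1, B_hat/z), used for comparison."""
    return min(1.0, b_hat / z)
```

For example, `multifactor_probability(3, 500, 1, math.inf, 1000, math.inf)` returns the same value as `smart_probability(500, 1000)`. With finite thresholds such as $z_s=100$, $z_b=1000$, $z_a=2$, a one-packet, 40-byte record with $\widehat{A}=1$ gets $r=0.01+0.04+0.5=0.55$ instead of $0.04$, so records that matter for the flow arrival estimate are kept far more often.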

Ramana Rao Kompella 2005-08-12