Estimators based on flow slices

Table 3: Notation used in this paper.

Name	Meaning
	flow slicing probability
	packet sampling probability
	smart sampling probability
$s$	size of flow (in packets) before flow slicing
	packet counter in flow record
$\widehat{s}$	estimate of the size of flow before flow slicing (0 if flow not sliced)
$S$	original size of flow (in packets) before packet sampling
$\widehat{S}$	estimate of the original size of flow (0 if flow not sampled or not sliced)
$b$	size of flow in bytes before flow slicing
	byte counter in flow record
$\widehat{b}$	estimate of the number of bytes in flow based on flow slices (0 if flow not sliced)
$B$	original size of flow in bytes before packet sampling
$\widehat{B}$	estimate of the original size of flow in bytes (0 if flow not sampled or not sliced)
$\widehat{f}$	contribution to the estimate of the number of active flows (0 if flow not sliced)
$\widehat{a}$	contribution to the estimate of the number of flow arrivals (0 if flow not sliced)
$\widehat{A}^{(1)}$	contribution to first estimator of number of flow arrivals (0 if flow not sampled or not sliced)
$\widehat{A}^{(2)}$	contribution to second estimator of number of flow arrivals (0 if flow not sampled or not sliced)
	smart sampling threshold controlling the influence of $\widehat{S}$ on
	smart sampling threshold controlling the influence of $\widehat{B}$ on
	smart sampling threshold controlling the influence of $\widehat{A}^{(1)}$ on

In this section, we discuss formulae for estimating traffic from the flow records provided by Flow Slices. In practice, the user is interested in the number of bytes, packets, or flows in the entire traffic mix or in a portion of it (e.g., the HTTP traffic). All our estimators focus on a single flow. To compute the total traffic, the user sums the contributions of all individual flow records. If the estimators for individual flows are unbiased, the errors in the estimates for individual flows do not accumulate but cancel out to some extent.
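The summation step can be illustrated with a minimal Python sketch. It does not implement the Flow Slices estimators themselves: as a stand-in, it assumes a generic unbiased estimator in which a flow is kept with probability `p` and a kept flow contributes `packets / p`, while a dropped flow contributes 0. The record fields (`app`, `packets`, `estimate`) and the function names are hypothetical and chosen only for this illustration.

```python
import random


def sample_flows(flows, p, rng):
    """Keep each flow with probability p (assumed stand-in sampling scheme).

    A kept flow yields a record whose contribution is packets / p, which is
    unbiased for the flow's true size. A dropped flow yields no record, so
    its contribution to any total is 0.
    """
    records = []
    for flow in flows:
        if rng.random() < p:
            records.append({"app": flow["app"], "estimate": flow["packets"] / p})
    return records


def estimate_total(records, matches=lambda record: True):
    """Sum the per-flow contributions of the records of interest."""
    return sum(r["estimate"] for r in records if matches(r))


def mean_estimate(flows, p, trials, rng, matches=lambda record: True):
    """Average the estimated total over independent sampling runs."""
    totals = [estimate_total(sample_flows(flows, p, rng), matches)
              for _ in range(trials)]
    return sum(totals) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic traffic mix: 200 flows with sizes 1..200 packets.
    flows = [{"app": "http" if i % 2 else "dns", "packets": i}
             for i in range(1, 201)]
    true_total = sum(f["packets"] for f in flows)
    one_run = estimate_total(sample_flows(flows, 0.5, rng))
    averaged = mean_estimate(flows, 0.5, 1000, rng)
    print("true total:", true_total)
    print("single-run estimate:", one_run)
    print("mean over 1000 runs:", averaged)
```

Restricting the sum to a portion of the traffic only requires a filter, for example `estimate_total(records, lambda r: r["app"] == "http")`. Because each per-flow contribution is unbiased in this sketch, the individual errors are positive for some flows and negative for others, and the average of the estimated totals over many runs approaches the true total.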

For the purposes of our analysis, a bin is an arbitrary interval of time of interest to traffic analysis. To simplify the analysis, we start with the simple case of a single bin, with the slice length and the inactivity timeout $t_{inactive}$ both larger than the size of the bin, and with flow memory empty at the beginning of the bin. Next, we look at how the estimators generalize when we remove these constraints. Table 3 summarizes the notation used throughout the paper.