Before discussing how to estimate byte counts from flow slices, we show
why a simpler solution does not work. We could have the byte counter in
the flow entry just count the total number of bytes in the packets seen once the
flow record is created. Just like with the packet counter, we need an additive
correction to account for the packets missed before the creation of the entry.
We can get an unbiased estimate for the number of packets missed, but not for
their total size, because we do not know their sizes. We could assume that the
packet sizes are uniform within the flow, but this would lead to systematic
biases because they are not.
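The bias can be seen in a small exact calculation. The sketch below is illustrative only and rests on assumptions that are not in the text: `p` is the probability that a packet creates the entry, every packet after entry creation is counted exactly, the packet-count correction is `1/p - 1`, and the missed packets are assumed to have the mean size of the counted ones. The function name is hypothetical.

```python
from fractions import Fraction

def naive_expected_estimate(sizes, p):
    """Exact expected value of the uniform-size byte estimate for one flow.

    sizes -- packet sizes in arrival order
    p     -- probability that a packet creates the entry (assumed name)
    """
    expected = Fraction(0)
    for k in range(len(sizes)):
        # Packets 0..k-1 are missed; packet k creates the entry.
        prob = (1 - p) ** k * p
        counted = sizes[k:]                 # counted exactly from here on
        mean_size = Fraction(sum(counted), len(counted))
        missed_packets = 1 / p - 1          # additive packet-count correction
        expected += prob * (sum(counted) + missed_packets * mean_size)
    return expected  # the case where no packet is sampled contributes 0

p = Fraction(1, 2)
print(naive_expected_estimate([100, 100], p))   # 200, true total 200
print(naive_expected_estimate([1500, 40], p))   # 1175, true total 1540
print(naive_expected_estimate([40, 1500], p))   # 1905, true total 1540
```

With equal packet sizes the estimate is exact in expectation, but with unequal sizes it under- or overestimates depending on where the large packets fall in the flow.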
As the proof below shows, storing the size of the
sampled packet that led to the creation of the entry would solve the problem,
because using it to estimate the total number of bytes in the packets not
counted does lead to an unbiased estimator. But this would require another entry
in the flow record. Instead, we store this information in the byte counter
itself by initializing $c_b$ to $b_{first}/p$ when the entry is created
($b_{first}$ is the size in bytes of the sampled packet and $p$ is the
sampling probability used by flow slicing). Let $b$ be the number
of bytes of the flow at the input of the flow slicing algorithm. The byte
counter $c_b$ is then an unbiased estimator of $b$, that is, $E[c_b] = b$.
Proof: By induction on the number of packets in the flow $s$. Let $b_i$ for
$i$ from $1$ to $s$ be the sizes of the individual
packets. By definition the number of bytes in the flow is
$b = \sum_{i=1}^{s} b_i$. For convenience of notation, we index the packet
sizes in reverse order, so $b_1$ will be the size of the last packet
and $b_s$ the size of the first one.
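Base case: If $s=1$, the only packet is sampled with probability $p$,
and in that case it is counted as $b_1/p$ bytes. With
probability $1-p$, it is not sampled (and it counts as 0). Thus the
expected value of the byte counter $c_b$ is
$E[c_b] = p \cdot b_1/p + (1-p) \cdot 0 = b_1 = b$.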
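Inductive step: By the induction hypothesis, we know that if the first
packet is not sampled we are left with the last $s-1$ packets and the byte
counter $c_b$ satisfies $E[c_b] = \sum_{i=1}^{s-1} b_i$. If the first packet
gets sampled (with probability $p$), we count it as $b_s/p$
and we count the rest exactly because the flow slice length
and the inactivity timeout are larger than the bin
size. Combining the two cases:

$$
\begin{aligned}
E[c_b] &= p \left( \frac{b_s}{p} + \sum_{i=1}^{s-1} b_i \right) + (1-p) \sum_{i=1}^{s-1} b_i \\
       &= b_s + \sum_{i=1}^{s-1} b_i \\
       &= \sum_{i=1}^{s} b_i = b
\end{aligned}
$$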
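The induction can be checked numerically by enumerating which packet creates the entry. This is a minimal sketch under the proof's assumptions (once the entry exists, all later packets are counted exactly); the function names and the symbol `p` for the sampling probability are illustrative, and packets are indexed here in arrival order rather than in the reverse order used by the proof.

```python
from fractions import Fraction

def byte_counter(sizes, k, p):
    """Byte counter value when packet k (0-based, arrival order) creates the entry."""
    # The sampled packet is counted as size / p; later packets are counted exactly.
    return Fraction(sizes[k]) / p + sum(sizes[k + 1:])

def expected_byte_counter(sizes, p):
    """Exact expected byte counter over all possible entry-creating packets."""
    # Packet k creates the entry with probability (1 - p)^k * p.
    # If no packet is sampled, the counter is 0 and contributes nothing.
    return sum((1 - p) ** k * p * byte_counter(sizes, k, p)
               for k in range(len(sizes)))

sizes = [40, 1500, 576, 40]
for p in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    # The expectation equals the true byte count for every p tried.
    assert expected_byte_counter(sizes, p) == sum(sizes)
```

Exact rational arithmetic is used so that the equality holds without floating-point tolerance.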
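If we sample packets randomly with probability $q$ before applying the
flow slicing algorithm, we will want to estimate the number of bytes $B$
at the input of the packet sampling stage. Since the number of bytes $b$
reaching the flow slicing algorithm satisfies $E[b] = qB$, it is
easy to show that the byte counter $c_b$ scaled by $1/q$, that is
$\hat{B} = c_b/q$, is an unbiased
estimator for $B$.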
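The scaling can also be checked exactly on a small flow. The sketch below assumes independent per-packet sampling with probability `q` ahead of flow slicing, a flow slicing probability `p`, and exact counting once the entry exists; all names are illustrative rather than taken from the text.

```python
from fractions import Fraction
from itertools import product

def expected_slice_counter(sizes, p):
    """Exact expected byte counter of flow slicing for the given packets."""
    return sum((1 - p) ** k * p * (Fraction(sizes[k]) / p + sum(sizes[k + 1:]))
               for k in range(len(sizes)))

def expected_scaled_estimate(sizes, p, q):
    """Exact expected value of (byte counter / q) with packet sampling first."""
    total = Fraction(0)
    # Enumerate every subset of packets that survives packet sampling.
    for keep in product([False, True], repeat=len(sizes)):
        prob = Fraction(1)
        for kept in keep:
            prob *= q if kept else 1 - q
        survivors = [b for b, kept in zip(sizes, keep) if kept]
        total += prob * expected_slice_counter(survivors, p) / q
    return total

sizes = [40, 1500, 576]
assert expected_scaled_estimate(sizes, Fraction(1, 10), Fraction(1, 4)) == sum(sizes)
```

The enumeration grows as 2^s in the number of packets, so it is only practical as a check on short flows.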