The most elementary relaxation of the assumption is to consider continuous operation of the algorithm: records still last longer than the bin length, and we still have separate counters for each bin, but some records may already be active at the start of our bin because they were created earlier.
The simplest case is that of records spanning the entire bin. The byte and
packet counters will reflect the actual traffic, so we use
and
. If we do not have a packet sampling stage we can also
compute
if
and
otherwise.
because the flow started in an earlier bin.
If a flow record expires within the bin we run the analysis on, it can be the
only record for the flow, but it is also possible that another record for the
same flow would get created after the first record's expiration. For byte and
packet counts which are additive we can just add the counters from the first
record to the estimates from the second
and
. The analysis of unbiasedness carries
through because we can consider that the bin is actually two sub-bins, one
ending when the first record ends and the other starting at the same time. Since
we have unbiased byte and packet estimates for both sub-bins, our estimates for
the sum of the bins will still be unbiased.
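The sub-bin argument is an application of linearity of expectation. As a sketch in hypothetical notation (these symbols are introduced here for illustration only and are not the paper's own): let $b_1$ and $b_2$ be the true byte counts of the flow in the two sub-bins, and let $\hat{b}_1$ and $\hat{b}_2$ be the corresponding unbiased estimates, so that $E[\hat{b}_1] = b_1$ and $E[\hat{b}_2] = b_2$. Then
\[
E[\hat{b}_1 + \hat{b}_2] = E[\hat{b}_1] + E[\hat{b}_2] = b_1 + b_2 ,
\]
and no independence between the two estimates is required. The same argument applies to the packet counts.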
If , we know that the flow sent packets during the bin, so we set
to 1, otherwise we use with
since an unbiased estimator for whether the flow was active in the second
sub-bin will tell us whether it was active overall. This approach preserves
overall unbiasedness, but it makes analysis more complicated because the two
flow records representing the flow cannot be processed independently anymore:
the contribution of the second record to the flow count of the bin depends on
whether there was a first record with the same flow identifier. When the router
reports the records, they might not be near each other, so the analysis has to
do ``flow reconstruction'': keep a hash table with flow identifiers and find
flow records with the same flow identifier covering parts of the same bin. The
consequence of not doing flow reconstruction is running the risk of double
counting such flows with more than one record (which might be acceptable in many
settings).
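The hash-table step of flow reconstruction can be sketched as follows. This is a minimal illustration, not the implementation used in the analysis: the record layout (`FlowRecord` with `flow_id`, `start`, `end`) and the function name `reconstruct_flows` are assumptions made for the example, and a record is taken to cover the bin if the interval between its first and last packet overlaps it.

```python
from collections import defaultdict
from typing import Hashable, NamedTuple


class FlowRecord(NamedTuple):
    # Hypothetical record layout, assumed for this sketch only.
    flow_id: Hashable  # flow identifier, e.g. the 5-tuple
    start: float       # timestamp of the record's first packet
    end: float         # timestamp of the record's last packet


def reconstruct_flows(records, bin_start, bin_end):
    """Group the records overlapping [bin_start, bin_end) by flow identifier.

    Returns a dict mapping each flow identifier to its records in the bin,
    ordered by start time. Records need not be adjacent in the input.
    """
    table = defaultdict(list)
    for rec in records:
        # Keep only records that cover some part of the bin.
        if rec.end >= bin_start and rec.start < bin_end:
            table[rec.flow_id].append(rec)
    for recs in table.values():
        recs.sort(key=lambda r: r.start)
    return dict(table)
```

Flows whose list holds more than one record are the ones at risk of double counting; the analysis can then treat the second record conditionally on the first instead of processing it independently.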
By our definition of flow arrivals from , as long as
assumption 1 holds, if a flow has a record that starts before the start of the
bin, we should use
, irrespective of whether we have a second
flow record (possibly with a SYN flag) or not. If we have a second flow record
with the SYN flag set we can clearly say that assumption 1 does not hold, but
without flow reconstruction we might count it separately against the flow
arrival count. In many settings this type of over-counting is not a
serious concern.
should not be used because
assumption 2 does not hold.