4 Computation Model
Hancock's computation model is built around the notion of iterating
over a sorted stream of calls. Sorting call records ensures good
locality for references to the signature data that follow the sorting
order.
Off-direction references may not have good locality, however.
For example, if we sort the call records by the originating number,
then updating the usage for that number would have good
locality, while updating the usage for the dialed number would suffer from
bad locality.
Consequently, signature
computations are typically done with multiple passes over the data,
each sorting the data in a different order and updating a different
part of the profile data. We call each such pass, represented in
Figure 1 as a dashed box, a phase.
A phase starts by specifying a name and a parameter list. The body of
a phase has three pieces:
an iterate clause,
a list of variable declarations,
and a list of event clauses.
The following pseudo-code outlines the outgoing phase of the usage signature:
phase out(callStream calls, uMap usage) {
iterate clause
variable declarations
event clauses
}
This code defines a phase out that takes two parameters:
a stream of calls and a usage map. The iterate clause specifies an
initial stream and a set of transformations to produce a new stream.
Variables declared at the level of a phase are visible throughout that
phase.
The event clauses specify how to process the transformed stream.
The next two subsections describe these clauses.
4.1 Iterate Clause
Through the iterate clause, Hancock allows programmers to transform a
stream of logical call records into a stream of records tailored to the
particular signature computation.
The iterate clause has the following form:
iterate
over stream variable
sortedby sorting order
filteredby filter predicate
withevents event list
We explain each of these pieces in turn.
The over clause names an initial stream to transform.
The sortedby clause specifies a sorting order for this
stream.
For example, the calls could be sorted by the originating phone number or by
the connect time.
At present, the
allowable sorting orders are hard-coded into Hancock. The filteredby clause specifies a predicate that is used to remove
unneeded records from the stream. For example, a call stream may
include incomplete calls, which are not used by the Usage signature.
Removing unneeded call records before processing the stream simplifies
the processing code.
The withevents clause specifies which events are relevant to
this signature. We call the occurrence of a group of calls in the
stream an event. Depending on the sorting order, different
groups of calls can be identified in the call stream. For example, if
the calls are sorted by originating number, the possible groups are:
-
calls for the same area code,
- calls for the same exchange,
- calls for the same line, or
- a single call.
Within an event there are two important sub-events: the beginning of
the block of calls and its ending. The possible events and sub-events
are related hierarchically; calls are nested with lines, lines within
exchanges, and exchanges within area codes.
Putting all these pieces together, the iterate clause for the outgoing
phase of Usage has the following form:
iterate
over calls
sortedby origin
filteredby noIncomplete
withevents line, call;
This code specifies that calls is the initial stream, that it
should be sorted by the originating number, that it should be filtered
using function noIncomplete, which removes incomplete calls from the
stream, and that two events, line and call, are of
interest.
4.2 Event Clauses
Figure 4: Hierarchical event structure
The iterate clause specifies the events of interest in the stream, but
it does not indicate what to do when an event is detected. The event
clauses of a phase specify code to execute in response to a given
event. We illustrate this structure in Figure 4.
Programmers supply event code for the boxes,
while Hancock generates the control-flow code that
corresponds to the arrows.
Each event has a name that corresponds to the event name listed in the
iterate clause. Each event takes as a parameter the portion of the call
record that triggered the event. For example, an area code event is
passed the area code shared by the block of calls that triggered the event.
The body of each event has three parts: a list of variable
declarations, a begin sub-event, and an end sub-event. Variables
declared at the level of an event are shared by both its sub-events and
are available to events lower down in the hierarchy, using a scoping
operator (event name::variable name). A common pattern is for
the programmer to declare a variable in the code for a line event, to
initialize it in the begin-line sub-event, to accumulate information
using that variable while processing calls, and then to store the
accumulated information during the end-line sub-event code.
A sub-event is a kind (begin or end) followed by a C-style
block that contains a list of variable declaration and Hancock and C
statements. Variables declared in a sub-event are visible only within
that sub-event. The code in Figure 5 implements the
line and call events for the outgoing phase of Usage.
event line(line_t pn) {
uSig cumSec;
begin {
cumSec.out = 0;
cumSec.outTF = 0;
}
/* process calls */
end {
uSig us;
us = usage<:pn:>$uSig;
us.outTF = blend(cumSec.outTF, us.outTF);
us.out = blend(cumSec.out, us.out);
usage<:pn:> = us$uApprox;
}
}
event call(callRec_t c) {
uSig line::cumSec;
if (c.isTollFreeCall)
cumSec.outTF += c.duration;
else
cumSec.out += c.duration;
}
Figure 5: Event code for Usage signature
Note that the call event does not have sub-events. We call such
events bottom-level events. The lowest event in any list of
events is a bottom-level event. For example, Frequency, a
signature that we discuss later, does not collect summary
information for a set of calls. It tracks only the existence of at
least one call, so line is its bottom-level event.
4.3 Discussion
Hancock's event specifications have several advantages. First, the
specifications have the flavor of function definitions with their
attendant modularity advantages, but without their usual cost because
the Hancock compiler expands the event definitions in-line with the
control-flow code. Second, having the compiler generate the
control flow removes a significant source of bugs and complexity from
Hancock programs. Finally, programmers can use the variable-sharing
mechanism to share information across events.
A common question when thinking about designing a domain-specific
language is whether or not a library would suffice. We rejected the
library option largely because of Hancock's control-flow abstractions.
In particular, expressing Hancock's event model and the
information sharing it provides proved awkward in a
call-back framework, the usual technique for implementing such
abstractions.