Computation Model

4 Computation Model

Hancock's computation model is built around the notion of iterating over a sorted stream of calls. Sorting call records ensures good locality for references to the signature data that follow the sorting order. Off-direction references may not have good locality, however. For example, if we sort the call records by the originating number, then updating the usage for that number would have good locality, while updating the usage for the dialed number would suffer from bad locality. Consequently, signature computations are typically done with multiple passes over the data, each sorting the data in a different order and updating a different part of the profile data. We call each such pass, represented in Figure 1 as a dashed box, a phase.

A phase starts by specifying a name and a parameter list. The body of a phase has three pieces: an iterate clause, a list of variable declarations, and a list of event clauses. The following pseudo-code outlines the outgoing phase of the usage signature:

phase out(callStream calls, uMap usage) { 
    iterate clause 
    variable declarations
    event clauses 
}

This code defines a phase out that takes two parameters: a stream of calls and a usage map. The iterate clause specifies an initial stream and a set of transformations to produce a new stream. Variables declared at the level of a phase are visible throughout that phase. The event clauses specify how to process the transformed stream. The next two subsections describe these clauses.

4.1 Iterate Clause

Through the iterate clause, Hancock allows programmers to transform a stream of logical call records into a stream of records tailored to the particular signature computation. The iterate clause has the following form:

iterate  
   over  stream variable
   sortedby  sorting order 
   filteredby  filter predicate
   withevents  event list

We explain each of these pieces in turn. The over clause names an initial stream to transform. The sortedby clause specifies a sorting order for this stream. For example, the calls could be sorted by the originating phone number or by the connect time. At present, the allowable sorting orders are hard-coded into Hancock. The filteredby clause specifies a predicate that is used to remove unneeded records from the stream. For example, a call stream may include incomplete calls, which are not used by the Usage signature. Removing unneeded call records before processing the stream simplifies the processing code.

The withevents clause specifies which events are relevant to this signature. We call the occurrence of a group of calls in the stream an event. Depending on the sorting order, different groups of calls can be identified in the call stream. For example, if the calls are sorted by originating number, the possible groups are:

calls for the same area code,
calls for the same exchange,
calls for the same line, or
a single call.

Within an event there are two important sub-events: the beginning of the block of calls and its ending. The possible events and sub-events are related hierarchically; calls are nested with lines, lines within exchanges, and exchanges within area codes.

Putting all these pieces together, the iterate clause for the outgoing phase of Usage has the following form:

iterate  
   over calls  
   sortedby origin  
   filteredby noIncomplete
   withevents line, call;

This code specifies that calls is the initial stream, that it should be sorted by the originating number, that it should be filtered using function noIncomplete, which removes incomplete calls from the stream, and that two events, line and call, are of interest.

4.2 Event Clauses

Figure 4: Hierarchical event structure

The iterate clause specifies the events of interest in the stream, but it does not indicate what to do when an event is detected. The event clauses of a phase specify code to execute in response to a given event. We illustrate this structure in Figure 4. Programmers supply event code for the boxes, while Hancock generates the control-flow code that corresponds to the arrows.

Each event has a name that corresponds to the event name listed in the iterate clause. Each event takes as a parameter the portion of the call record that triggered the event. For example, an area code event is passed the area code shared by the block of calls that triggered the event. The body of each event has three parts: a list of variable declarations, a begin sub-event, and an end sub-event. Variables declared at the level of an event are shared by both its sub-events and are available to events lower down in the hierarchy, using a scoping operator (event name::variable name). A common pattern is for the programmer to declare a variable in the code for a line event, to initialize it in the begin-line sub-event, to accumulate information using that variable while processing calls, and then to store the accumulated information during the end-line sub-event code.

A sub-event is a kind (begin or end) followed by a C-style block that contains a list of variable declaration and Hancock and C statements. Variables declared in a sub-event are visible only within that sub-event. The code in Figure 5 implements the line and call events for the outgoing phase of Usage.

event line(line_t pn) { 
  uSig cumSec;   

  begin {  
    cumSec.out = 0; 
    cumSec.outTF = 0; 
  } 

  /* process calls */  

  end {
    uSig us;  

    us = usage<:pn:>$uSig; 
    us.outTF = blend(cumSec.outTF, us.outTF);
    us.out = blend(cumSec.out, us.out);
    usage<:pn:> = us$uApprox; 
  } 
} 

event call(callRec_t c) { 
  uSig line::cumSec;  

  if (c.isTollFreeCall) 
    cumSec.outTF += c.duration;  
  else 
    cumSec.out +=  c.duration;
}

Figure 5: Event code for Usage signature

Note that the call event does not have sub-events. We call such events bottom-level events. The lowest event in any list of events is a bottom-level event. For example, Frequency, a signature that we discuss later, does not collect summary information for a set of calls. It tracks only the existence of at least one call, so line is its bottom-level event.

4.3 Discussion

Hancock's event specifications have several advantages. First, the specifications have the flavor of function definitions with their attendant modularity advantages, but without their usual cost because the Hancock compiler expands the event definitions in-line with the control-flow code. Second, having the compiler generate the control flow removes a significant source of bugs and complexity from Hancock programs. Finally, programmers can use the variable-sharing mechanism to share information across events.

A common question when thinking about designing a domain-specific language is whether or not a library would suffice. We rejected the library option largely because of Hancock's control-flow abstractions. In particular, expressing Hancock's event model and the information sharing it provides proved awkward in a call-back framework, the usual technique for implementing such abstractions.