3 Data model

This section describes Hancock's data model, which includes a model for collections of call records and a model for profile data.

3.1 Call stream

We model a collection of call records as a stream in Hancock. Programmers use the stream type operator to declare a new stream type. Such a declaration names the new type and specifies both the physical and the logical representations of the records in the stream. Intuitively, the physical representation describes the (highly encoded) structure of the records as they exist on disk, while the logical representation describes an expanded form convenient for programming. The declaration specifies a function to convert from encoded physical to expanded logical records. For example, the following code declares a stream type callStream:

stream callStream { 
  getvalidcall : PCallRec_t => callRec_t; 
}

For this stream, the physical type is PCallRec_t, the logical type is callRec_t, and the conversion function getvalidcall constructs a logical record from a physical one. Function getvalidcall has type

char getvalidcall(PCallRec_t *pc, callRec_t *c)

This function checks that the record *pc is valid, and if so, unpacks *pc into *c and returns true to indicate a successful conversion. Otherwise, getvalidcall simply returns false. Programmers can declare variables of type callStream using standard C syntax (for example, callStream calls).
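To make the conversion function concrete, the following C sketch shows one possible shape for getvalidcall. The field layouts of PCallRec_t and callRec_t are not given in this section, so every field below, along with the validity test, is hypothetical. Only the function signature and the true/false return convention come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical (on-disk) record; the layout is assumed for illustration. */
typedef struct {
    uint8_t  flags;    /* assumed: bit 0 set means the record is complete */
    uint32_t origin;   /* assumed: encoded originating number */
    uint32_t dialed;   /* assumed: encoded dialed number */
    uint16_t tenths;   /* assumed: call duration in tenths of a second */
} PCallRec_t;

/* Hypothetical logical record: an expanded form convenient for programming. */
typedef struct {
    long origin;
    long dialed;
    int  seconds;
} callRec_t;

/* Returns 1 and fills *c when *pc is valid; returns 0 and leaves *c untouched otherwise. */
char getvalidcall(PCallRec_t *pc, callRec_t *c)
{
    assert(pc != NULL && c != NULL);
    if ((pc->flags & 0x01) == 0 || pc->origin == 0)
        return 0;                      /* assumed validity rule */
    c->origin  = (long)pc->origin;
    c->dialed  = (long)pc->dialed;
    c->seconds = pc->tenths / 10;      /* unpack the encoded duration */
    return 1;
}
```

Under these assumptions, a record whose flag bit is clear is rejected without modifying the caller's logical record, which matches the behavior described above.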

We represent a stream on disk as a directory that contains binary files. Hancock's wiring-diagram mechanism, which we discuss in Section 5, provides a way to match the name of such a directory to a stream.

3.2 Signature data

Hancock provides two mechanisms for describing signature data. Programmers use the record declaration to specify the format of a profile and the map declaration to specify the mapping between phone numbers and profiles.

Records are designed to capture the relationship depicted in Figure 2. They specify the types for the signature and approximation views of a profile, as well as the freeze and thaw expressions for converting between these types. We use the following simple record to introduce the pieces of a record declaration.

record uField(ufSig, ufApprox){
  int <=> char;
  ufSig(b) = bucketToSec[b];
  ufApprox(s) = secToBucket(s);
}

This declaration introduces three types:

uField: the type of the record,
ufSig: the type of the left-hand view (int), and
ufApprox: the type of the right-hand view (char).

The ufSig(b) = ... portion of the record declaration specifies how to thaw a ufApprox b to produce a ufSig value. Similarly, ufApprox(s) = ... specifies how to freeze a ufSig s to obtain a ufApprox value. In this application, ufApprox(s) uses the function secToBucket to convert the seconds stored in integer s into a bucket number, and ufSig(b) uses the array bucketToSec to convert a bucket stored in b into the corresponding mean number of seconds for that bucket. Because many different durations fall into the same bucket, these expressions are an example in which thaw(freeze(s)) need not equal s.
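The lossy round trip can be seen in a small C sketch. The bucket boundaries and bucket means below are invented for illustration, since this section does not list them, and bucketLimit and thawField are hypothetical names. Only secToBucket, bucketToSec, and the int/char views come from the declaration above.

```c
#include <assert.h>

#define NBUCKETS 4

/* Assumed exclusive upper bounds, in seconds, of all buckets but the last. */
static const int bucketLimit[NBUCKETS - 1] = { 60, 300, 1800 };

/* Assumed mean number of seconds for each bucket. */
static const int bucketToSec[NBUCKETS] = { 30, 180, 900, 3600 };

/* Freeze: map a duration in seconds to a bucket number. */
char secToBucket(int s)
{
    int b = 0;
    while (b < NBUCKETS - 1 && s >= bucketLimit[b])
        b++;
    return (char)b;
}

/* Thaw: map a bucket number back to that bucket's mean duration. */
int thawField(char b)
{
    int i = b;
    assert(i >= 0 && i < NBUCKETS);
    return bucketToSec[i];
}
```

With these assumed tables, a 200-second value freezes to bucket 1 and thaws to 180 seconds, so the round trip does not return the original value.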

Records can have more than one field (in which case the fields are named), and they can be included in other record declarations. For example, a second record declaration that appears in the Usage signature, uLine, has the following form:

record uLine(uSig, uApprox){
  uField in;
  uField out;
  uField outTF;
}

As in our earlier example, this declaration introduces three types: uLine, uSig, and uApprox. The type uLine is the type of the record. The types uSig and uApprox are equivalent to C structures constructed from the left and right types of uField:

typedef struct {  
   int in;
   int out;
   int outTF;  
} uSig;  

typedef struct {  
   char in;
   char out;
   char outTF;  
} uApprox;

Because the record uLine does not include explicit freeze and thaw expressions, Hancock constructs them automatically from the freeze and thaw expressions of the record's fields. For this record, the compiler constructs the following freeze function:

uApprox freeze(uSig s){
  uApprox a;
  a.in  = secToBucket(s.in);
  a.out = secToBucket(s.out);
  a.outTF = secToBucket(s.outTF);
  return a;
}

The thaw function is constructed similarly.
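By analogy with the freeze function above, the generated thaw function would plausibly look like the following C sketch. This is an assumption about the compiler's output rather than a listing of it. The contents of bucketToSec are placeholders, and thawField is a hypothetical helper standing in for the field-level expression ufSig(b) = bucketToSec[b].

```c
#include <assert.h>

typedef struct { int  in; int  out; int  outTF; } uSig;
typedef struct { char in; char out; char outTF; } uApprox;

/* Placeholder table: mean seconds per bucket (values assumed). */
static const int bucketToSec[4] = { 30, 180, 900, 3600 };

/* Field-level thaw, mirroring ufSig(b) = bucketToSec[b]. */
static int thawField(char b)
{
    int i = b;
    assert(i >= 0 && i < 4);
    return bucketToSec[i];
}

/* Record-level thaw: apply the field-level thaw to each field in turn. */
uSig thaw(uApprox a)
{
    uSig s;
    s.in    = thawField(a.in);
    s.out   = thawField(a.out);
    s.outTF = thawField(a.outTF);
    return s;
}
```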

Although this example record only contains fields with record types, fields may also have regular C types. In this context, C types can be thought of as records with the same left- and right-hand type and the identity function for freezing and thawing.

To convert between views, Hancock provides the view operator ($). The expression ua$uSig converts a uApprox value ua to a uSig, using the thaw conversion that Hancock derives implicitly from the uLine declaration. The expression us$uApprox behaves analogously, applying the freeze conversion to a uSig value us.

Hancock's map declaration provides a way to associate data with keys. Typically a map does not contain data for every possible key. Consequently, Hancock supports the notion of a default value, which is returned when a programmer requests data for a key that does not have a value stored in the map. For example, in the following map declaration, the keys have type line_t, the data are structures of type uApprox, and the default value is the constant uApprox structure consisting of all zeros.

map uMap {
   key line_t; 
   value uApprox; 
   default {0,0,0};
}

Defaults may also be specified as functions that use the key in question to compute an appropriate default for that key. For example, the map declaration below specifies a function, lineToDefault, to call with the line in question when a default record is needed.

map uMapF {  
   key line_t; 
   value uApprox; 
   default lineToDefault;
}

A common use for this mechanism is to construct defaults by querying another data source.
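The default mechanism can be approximated in plain C as a lookup that falls back either to a constant or to a function of the key. The sketch below is not Hancock's map implementation, which this section does not describe. The representation of line_t as a long, the fixed-size linear table, the names toyMap, mapGet, and mapPut, and the body of lineToDefault are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef long line_t;                          /* assumed key representation */
typedef struct { char in; char out; char outTF; } uApprox;

#define CAPACITY 8                            /* toy capacity for the sketch */

typedef struct {
    line_t  keys[CAPACITY];
    uApprox vals[CAPACITY];
    size_t  count;
    uApprox (*defaultFn)(line_t);             /* NULL means use defaultVal */
    uApprox defaultVal;
} toyMap;

/* Hypothetical default function: derive a default from the key. */
uApprox lineToDefault(line_t pn)
{
    uApprox a = { 0, 0, 0 };
    a.out = (char)(pn % 2);                   /* arbitrary rule for illustration */
    return a;
}

/* Read: return the stored value, or the default when the key is absent. */
uApprox mapGet(const toyMap *m, line_t pn)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->count; i++)
        if (m->keys[i] == pn)
            return m->vals[i];
    return m->defaultFn ? m->defaultFn(pn) : m->defaultVal;
}

/* Write: overwrite an existing entry or append a new one. */
void mapPut(toyMap *m, line_t pn, uApprox v)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->count; i++)
        if (m->keys[i] == pn) { m->vals[i] = v; return; }
    assert(m->count < CAPACITY);
    m->keys[m->count] = pn;
    m->vals[m->count] = v;
    m->count++;
}
```

In this sketch a read never fails: a key with no stored value yields the default, and a stored value always takes precedence over it.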

The identifier uMap names a new map type. Variables of this type can be declared using the usual C syntax (for example, uMap usage). Hancock provides an indexing operator <:...:> to access values in a map. The code:

line_t pn;
uApprox u;

u = usage<:pn:>; 
... 
usage<:pn:> = u;

gives an example of reading from and writing to a map. The usual idiom for accessing map data combines the indexing and view operators as in:

us = usage<:pn:>$uSig;

Hancock also provides an operator :=: to copy maps. In particular, the statement

 new_usage :=: usage;

causes the uMap new_usage to be initialized with the data from usage.

3.3 Discussion

By providing the programmer with appropriate abstractions, Hancock reduces the intellectual burden of writing signatures. Although programmers were freezing and thawing their data prior to Hancock, they had not abstracted this idea. The result was numerous bugs caused by confusing the types of the two views. The structure enforced by records eliminates many of these bugs by requiring programmers to document the relationship between the two views, and to apply the view operator to convert between them explicitly. As an added benefit, records simplify signature code by generating conversion functions automatically from record fields when possible.

Maps provide an efficient implementation for the most performance-critical part of signature programs. The index operation is more convenient than a library interface, and it provides stronger type-checking.