3 Data model
This section describes Hancock's data model, which includes a model
for collections of call records and a model for profile data.
3.1 Call stream
We model a collection of call records as a stream in Hancock.
Programmers use the stream type operator to declare a new stream
type. Such a declaration names the new type and specifies both the
physical and the logical representations of the records in
the stream. Intuitively, the physical representation describes the
(highly encoded) structure of the records as they exist on disk, while
the logical representation describes an expanded form convenient for
programming. The declaration specifies a function to convert from
encoded physical to expanded logical records. For example, the
following code declares a stream type callStream:
stream callStream {
  getvalidcall : PCallRec_t => callRec_t;
}
For this stream, the physical type is PCallRec_t, the logical
type is callRec_t, and the conversion function getvalidcall constructs a logical record from a physical one.
Function getvalidcall has type
char getvalidcall(PCallRec_t *pc, callRec_t *c)
This function checks that the record *pc is valid, and if so,
unpacks *pc into *c and returns true to indicate a
successful conversion. Otherwise, getvalidcall simply returns
false. Programmers can declare variables of type callStream
using standard C syntax (for example, callStream calls).
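To make the contract of the conversion function concrete, the following C sketch shows one way a function with the signature of getvalidcall could be written. The field layouts of PCallRec_t and callRec_t, the 0xFFFF marker for an invalid record, and the toll-free flag bit are all assumptions made for illustration; the actual record formats are not given here.

```c
#include <stdint.h>

/* Hypothetical packed on-disk record; the real layout is not specified. */
typedef struct {
    uint32_t origin;    /* encoded originating number */
    uint32_t dialed;    /* encoded dialed number */
    uint16_t duration;  /* seconds; 0xFFFF marks an invalid record (assumed) */
    uint8_t  flags;     /* bit 0 set for toll-free calls (assumed) */
} PCallRec_t;

/* Hypothetical expanded record, convenient for programming. */
typedef struct {
    unsigned long origin;
    unsigned long dialed;
    int duration;
    int isTollFree;
} callRec_t;

/* Returns 1 and fills *c when *pc is valid; otherwise returns 0
   and leaves *c untouched. */
char getvalidcall(PCallRec_t *pc, callRec_t *c)
{
    if (pc->duration == 0xFFFF)
        return 0;
    c->origin = pc->origin;
    c->dialed = pc->dialed;
    c->duration = pc->duration;
    c->isTollFree = pc->flags & 1;
    return 1;
}
```

Because validation and unpacking are combined in one function, a consumer of the stream sees only logical records that passed the check.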
We represent streams on disk as a directory that contains binary
files. Hancock's wiring-diagram mechanism, which we discuss in
Section 5, provides a way to match the name of a
directory to a stream.
3.2 Signature data
Hancock provides two mechanisms for describing signature data.
Programmers use the record declaration to specify the format of
a profile and the map declaration to
specify the mapping between phone numbers and
profiles.
Records are designed to capture the relationship depicted in Figure
2. They specify the types for the signature and
approximation views of a profile, as well as the freeze and thaw
expressions for converting between these types. We use the following
simple record to introduce the pieces of a record declaration.
record uField(ufSig, ufApprox){
  int <=> char;
  ufSig(b) = bucketToSec[b];
  ufApprox(s) = secToBucket(s);
}
This declaration introduces three types:
- uField: the type of the record,
- ufSig: the type of the left-hand view (int), and
- ufApprox: the type of the right-hand view (char).
The ufSig(b) = ... portion of the record declaration specifies
how to thaw a ufApprox b to produce a ufSig value.
Similarly, ufApprox(s) = ... specifies how to freeze a ufSig s to obtain a ufApprox value. In this application,
ufApprox(s) uses the function secToBucket to convert the
seconds stored in integer s into a bucket number, and ufSig(b) uses the array bucketToSec to convert a bucket
stored in b into the corresponding mean number of seconds for
that bucket. Together, these expressions show that freezing can lose information: thaw(freeze(s)) need not equal s, because every duration that falls in a bucket thaws to that bucket's mean.
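The loss of information in the freeze/thaw round trip can be seen in a small C sketch. The bucket boundaries and mean values below are invented for illustration, and thawSec is a hypothetical wrapper around the bucketToSec lookup; only the names secToBucket and bucketToSec come from the record declaration.

```c
/* Assumed table: mean call length in seconds for each of five buckets. */
static const int bucketToSec[] = { 15, 45, 120, 450, 1800 };

/* Assumed upper bounds in seconds for buckets 0..3; bucket 4 is open-ended. */
static const int bucketLimit[] = { 30, 60, 180, 900 };

/* Freeze: map a duration in seconds to a bucket number. */
char secToBucket(int s)
{
    int b = 0;
    while (b < 4 && s >= bucketLimit[b])
        b++;
    return (char)b;
}

/* Thaw: map a bucket number back to the bucket's mean duration. */
int thawSec(char b)
{
    return bucketToSec[(int)b];
}
```

Under these assumed tables, a 100-second call freezes to bucket 2 and thaws to 120 seconds, so the original value is not recovered. The reverse composition is stable: freezing the mean of any bucket yields that same bucket.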
Records can have more than one field (in which case the fields are
named), and they can be included in other record declarations. For
example, a second record declaration that appears in the Usage
signature, uLine, has the following form:
record uLine(uSig, uApprox){
  uField in;
  uField out;
  uField outTF;
}
As in our earlier example, this declaration introduces three types:
uLine, uSig, and uApprox. The type uLine is the
type of the record. The types uSig and uApprox are
equivalent to C structures constructed from the left and right types of
uField:
typedef struct {
  int in;
  int out;
  int outTF;
} uSig;
typedef struct {
  char in;
  char out;
  char outTF;
} uApprox;
Because the record uLine does not include explicit freeze and
thaw expressions, Hancock constructs them automatically from the
freeze and thaw expressions of the record's fields. For this record,
the compiler constructs the following freeze function:
uApprox freeze(uSig s){
  uApprox a;
  a.in = secToBucket(s.in);
  a.out = secToBucket(s.out);
  a.outTF = secToBucket(s.outTF);
  return a;
}
The thaw function is constructed similarly.
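For comparison, a thaw function built field by field from the uField thaw expression might look like the following C sketch. This is an illustration of the idea, not the compiler's actual output, and the contents of bucketToSec are assumed values.

```c
typedef struct { int in; int out; int outTF; } uSig;
typedef struct { char in; char out; char outTF; } uApprox;

/* Assumed table: mean call length in seconds for each bucket. */
static const int bucketToSec[] = { 15, 45, 120, 450, 1800 };

/* Thaw each field with the uField thaw expression, bucketToSec[b]. */
uSig thaw(uApprox a)
{
    uSig s;
    s.in    = bucketToSec[(int)a.in];
    s.out   = bucketToSec[(int)a.out];
    s.outTF = bucketToSec[(int)a.outTF];
    return s;
}
```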
Although this example record only contains fields with record types,
fields may also have regular C types. In this context, C types can be
thought of as records with the same left- and right-hand type and
the identity function for freezing and thawing.
To convert between views, Hancock provides the view operator ($).
The expression ua$uSig converts the uApprox value ua to a uSig, using the conversion specified
implicitly in the uLine declaration. The expression us$uApprox converts the uSig value us in the opposite direction.
Hancock's map declaration provides a way to associate data
with keys. Typically a map does not contain data for every possible
key. Consequently, Hancock supports the notion of a default
value, which is returned when a programmer requests data for a
key that does not have a value stored in the map. For example, in the
following map declaration, the keys have type
line_t, the data are structures of type uApprox, and
the default value is the constant uApprox structure consisting
of all zeros.
map uMap {
  key line_t;
  value uApprox;
  default {0,0,0};
}
Defaults may also be specified as functions that
use the key in question to compute an appropriate default for that key.
For example, the map declaration below specifies a function,
lineToDefault, to call with the line in question when a default
record is needed.
map uMapF {
  key line_t;
  value uApprox;
  default lineToDefault;
}
A common use for this mechanism is to construct defaults by querying
another data source.
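The two kinds of default can be modeled in plain C as follows. This is a toy sketch and says nothing about how Hancock implements maps: the type uMapSketch, the functions mapGet and mapPut, the fixed capacity, the representation of line_t, and the body of lineToDefault are all hypothetical.

```c
#include <stddef.h>

typedef unsigned long long line_t;   /* assumed key representation */
typedef struct { char in; char out; char outTF; } uApprox;

#define MAX_ENTRIES 16               /* toy capacity, for illustration only */

typedef struct {
    line_t  keys[MAX_ENTRIES];
    uApprox vals[MAX_ENTRIES];
    size_t  n;                       /* number of stored entries */
    uApprox (*defaultFn)(line_t);    /* NULL: fall back to defaultVal */
    uApprox defaultVal;              /* constant default */
} uMapSketch;

/* Return the stored value for k, or the map's default if k is absent. */
uApprox mapGet(const uMapSketch *m, line_t k)
{
    for (size_t i = 0; i < m->n; i++)
        if (m->keys[i] == k)
            return m->vals[i];
    return m->defaultFn ? m->defaultFn(k) : m->defaultVal;
}

/* Store v under k; returns 1 on success, 0 if the toy map is full. */
int mapPut(uMapSketch *m, line_t k, uApprox v)
{
    for (size_t i = 0; i < m->n; i++)
        if (m->keys[i] == k) {
            m->vals[i] = v;
            return 1;
        }
    if (m->n == MAX_ENTRIES)
        return 0;
    m->keys[m->n] = k;
    m->vals[m->n] = v;
    m->n++;
    return 1;
}

/* Hypothetical default function: derives a default record from the key. */
uApprox lineToDefault(line_t k)
{
    uApprox d = { (char)(k % 2), 0, 0 };
    return d;
}
```

In this sketch a lookup never fails: a missing key yields either the constant default or the result of calling the default function with that key, which mirrors the behavior described for uMap and uMapF.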
The identifier uMap names a new map type. Variables of this
type can be declared using the usual C syntax (for example, uMap
usage). Hancock provides an indexing operator <:...:> to
access values in a map. The code:
line_t pn;
u = usage<:pn:>;
...
usage<:pn:> = u;
gives an example of reading from and writing to a map. The usual idiom
for accessing map data combines the indexing and view operators as in:
us = usage<:pn:>$uSig;
Hancock also provides an operator :=: to copy maps.
In particular, the statement
new_usage :=: usage;
causes uMap map new_usage to be initialized with the data from
usage.
3.3 Discussion
By providing the programmer with appropriate abstractions, Hancock
reduces the intellectual burden of writing signatures. Although
programmers were freezing and thawing their data prior to
Hancock, they had not abstracted this idea.
The result was numerous bugs caused by confusing the types of the two views.
The structure enforced by records eliminates many of these
bugs by requiring programmers to document the relationship between the
two views, and to apply the view operator to convert between
them explicitly. As an added benefit, records simplify signature code
by generating conversion functions automatically from record fields when
possible.
Maps provide an efficient implementation for the most
performance-critical part of signature programs. The index operation is more
convenient than a library interface, and it provides stronger
type-checking.