Bro supports two levels of scoping: local to a function or event handler, and global to the entire Bro script. Experience has already shown that we would benefit by adding a third, intermediate level of scoping, perhaps as part of a ``module'' or ``object'' facility, or even as simple as C's static scoping. Local variables are declared using the keywork local, and the declarations must come inside the body of a function or event handler. There is no requirement to declare variables at the beginning of the function. The scope of the variable ranges from the point of declaration to the end of the body. Global variables are declared using the keyword global and the declarations must come outside of any function bodies. For either type of declaration, the keyword can be replaced instead by const, which indicates that the variable's value is constant and cannot be changed.
Syntactically, a variable declaration looks like:
{class} {identifier} [':' {type}] ['=' {init}]That is, a class (local or global scope, or the const qualifier), the name of the variable, an optional type, and an optional initialization value. One of the latter two must be specified. If both are, then naturally the type of the initialization much agree with the specified type. If only a type is given, then the variable is marked as not having a value yet; attempting to access its value before first setting it results in a run-time error.
If only an initializer is specified, then Bro infers the variable's type from the form of the initializer. This proves quite convenient, as does the ease with which complex tables and sets can be initialized. For example,
const IRC = { 6666/tcp, 6667/tcp, 6668/tcp };infers a type of set[port] for IRC, while:
const ftp_serv = { ftp.lbl.gov, www.lbl.gov };infers a type of set[addr] for ftp_serv, and initializes it to consist of the IP addresses for ftp.lbl.gov and www.lbl.gov, which, as noted above, may encompass more than two addresses. Bro infers compound indices by use of [] notation:
const allowed_services = { [ftp.lbl.gov, ftp], [ftp.lbl.gov, smtp], [ftp.lbl.gov, auth], [ftp.lbl.gov, 20/tcp], [www.lbl.gov, ftp], [www.lbl.gov, smtp], [www.lbl.gov, auth], [www.lbl.gov, 20/tcp], [nntp.lbl.gov, nntp] };results in allowed_services having type set[addr, port]. Here again, the hostname constants may result in more than one IP address. Any time Bro encounters a list of values in an initialization, it replicates the corresponding index. Furthermore, one can explicitly introduce lists in initializers by enclosing a series of values (with compatible types) in []'s, so the above could be written:
const allowed_services: set[addr, port] = { [ftp.lbl.gov, [ftp, smtp, auth, 20/tcp]], [www.lbl.gov, [ftp, smtp, auth, 20/tcp]], [nntp.lbl.gov, nntp] };The only cost of such an initialization is that Bro's algorithm for inferring the variable's type from its initializer currently gets confused by these embedded lists, so the type now needs to be explicitly supplied, as shown.
In addition, any previously-defined global variable can be used in the initialization of a subsequent global variable. If the variable used in this fashion is a set, then its indices are expanded as if enclosed in their own list. So the above could be further simplified to:
const allowed_services: set[addr, port] = { [ftp_serv, [ftp, smtp, auth, 20/tcp]], [nntp.lbl.gov, nntp] };Initializing table values looks very similar, with the difference that a table initializer includes a yield value, too. For example:
global port_names = { [7/tcp] = "echo", [9/tcp] = "discard", [11/tcp] = "systat", ... };which infers a type of table[port] of string.
We find that these forms of initialization shorthand are much more than syntactic sugar. Because they allow us to define large tables in a succinct fashion, by referring to previously-defined objects and by concisely capturing forms of replication in the table, we can specify intricate policy relationships in a fashion that's both easy to write and easy to verify. Certainly, we would prefer the final definition of allowed_services above to any of its predecessors, in terms of knowing exactly what the set consists of.
Along with clarity and conciseness, another important advantage of Bro's emphasis on tables and sets is speed. Consider the common problem of attempting to determine whether access is allowed to service S of host H. Rather than using (conceptually):
if ( H == ftp.lbl.gov || H == www.lbl.gov ) if ( S == ftp || S == smtp || ... ) else if ( H == nntp.lbl.gov ) if ( S == nntp ) ...we can simply use:
if ( [S, H] in allowed_services ) ... it's okay ...The in operation translates into a single hash table lookup, avoiding the cascaded if's and clearly showing the intent of the test.