|
Perl Practicum: Thanks for the Reference
by Hal Pomeranz
In the last Perl Practicum, I covered the new Perl5 construct,
references. Because the syntax and usage for references are confusing
in places, I promised an extended example that would cover, in a
real-world example, all the material I had introduced.
The problem is to "marshall" data into format such that if we do
|
$string = marshall($some_ref);
eval("\$other_ref = $string");
|
then the data structure pointed to by $other_ref will
have the same contents as the data structure pointed to by
$some_ref . If this is true, then we can save
$string to a file and read it back in later in some other
application. We use this kind of functionality here at NetMarket to
allow Web CGI applications to share data because the format is
extremely portable.
Thinking Recursively
The key to unlocking this problem is really a mode of thought, rather
than a programming trick. Sometimes assuming you already know how to
solve the problem enables you to develop a solution - this is the
heart of "recursive thinking."
Remember from last time that the way to create a reference to an
anonymous list was
|
$a_ref = [1, 2, 3];
|
where the elements of the list could be lists (or associative arrays,
etc.)
|
$other_ref = [1, ["a", "b"], 3];
|
In general, then, an anonymous list reference is assigned with
|
$some_ref = [EXPR1, ..., EXPRn];
|
where EXPR1, ..., EXPRn are either a scalar or a
reference to some other complex data structure (list, associative
array, list of lists, etc.).
This is where recursive thinking comes in. We assume that we already
have a marshall() function which can encode EXPR1,
..., EXPRn properly. With this simplifying assumption, we can
quickly write a function that creates a string containing the
righthand side of the expression above. All we have to do is put
commas after successive calls to the marshall() function
and put square brackets around the whole affair:
|
sub encode_list {
my($list_ref) = @_;
my($string);
$string = "[ ";
for (@$list_ref) {
$string .= marshall($_);
$string .= ", ";
}
$string .= "]";
return($string);
}
|
Notice the @$list in the for loop: remember that you can
use a scalar reference any place you would use the name of an
identifier. Yes, there is a dangling comma after the last element of
the list - Perl ignores it. If you are thinking this is all handwaving
and I have not even begun solving the problem, bear with me.
Dealing with Associative Arrays
Remember that the syntax for declaring associative array references
was
|
$mail_info = {
"hal" => "hal@netmarket.com",
"tina" => "tmd@iwi.com",
"rob" => "kolstad@bsdi.com",
};
|
Here the values in the hash can actually be references to complex data
structures, but the keys have to be scalars. To state things
generally, we assign hash references with expressions like:
|
$some_ref = {
KEY1 => EXPR1,
...,
KEYn => EXPRn
};
|
We are already assuming we have marshall() lying around
to encode EXPR1, ..., EXPRn . I promise that I will write
an encode_scalar() function next to handle encoding
KEY1, ..., KEYn . With those two assumptions, encoding an
associative array reference is now just a matter of putting commas in
the right place:
|
sub encode_hash {
my($hash_ref) = @_;
my($string, $key);
$string = "{ ";
foreach $key (keys(%$hash_ref)) {
$string .= encode_scalar($key);
$string .= "=> ";
$string .= marshall ($$hash_ref{$key});
$string .= ", ";
}
$string .= "}";
return($string);
}
|
Encoding Scalars
You may think that encoding scalar values is trivial. After all, one
of our simple examples above just used:
|
$mail_info = {
"hal" => "hal@netmarket.com",
"tina" => "tmd@iwi.com",
"rob" => "kolstad@bsdi.com",
};
|
You would guess from this example that you just throw quote marks
around the value and be done. Well, suppose your scalar value is one
of these strings:
|
got"ya
$variable
|
You will end up with encodings like:
|
"got"ya" # dangling quote
"$variable" # evaluates $variable
|
The safest thing to do is to backslash every nonalphanumeric character
in the scalar:
|
sub encode_scalar {
my($scalar) = @_;
$scalar =~ s/(\W)/\$1/g;
return("
}
|
Now our strings will get encoded as:
|
"got\"ya"
"$variable"
|
Putting It All Together
All along I have been defining the encoding routines in terms of this
mythical marshall() routine. It turns out that we can define
marshall() in terms of the encoding routines:
|
sub marshall {
my($thing) = @_;
$type = ref($thing);
if ($type eq "ARRAY") {
return(encode_list($thing));
}
elsif ($type eq "HASH") {
return(encode_hash($thing));
}
elsif (!$type) {
return(encode_scalar($thing));
}
else { die("Can't handle $type\n"); }
}
|
Remember that ref() is a function that returns what type
of data type its argument points to, returning undef if
its argument is not a reference.
How can this possibly work? Things will become clearer when we walk
through a simple example. Suppose we do:
|
$simple = [1, ["a", "b"], 3];
$output = marshall($simple);
|
In the first call to marshall() , $simple is
identified as a reference to a list and marshall() calls
encode_list() .
The encode_list() function starts building up
$string . First the function sets $string to be the
initial opening bracket. Then encode_list() begins
walking through the list and calling marshall() on each
element.
The first argument of the list is a scalar, 1. When
encode_list() calls marshall() on this
element, marshall() immediately calls
encode_scalar() , and encode_scalar() returns
the string:
|
"1"
|
marshall() simply returns this value back up to the
encode_list() function, which appends it to the value of
$string . At the end of one iteration of the loop,
$string looks like this:
|
[ "1",
|
Now encode_list() calls marshall() on the
second argument of the list. This argument is a list reference, so
marshall() calls encode_list() recursively.
This second call to encode_list() starts building a new
$string . First it initializes this new
$string with the opening square bracket. Next it calls
marshall() on each of the list elements in turn, both of
which are scalars. After the first iteration of the loop, the new
$string looks like this:
|
[ "a",
|
After the second iteration, we have:
|
[ "a", "b",
|
The loop terminates and the encode_list() function
appends a closing bracket and returns
|
[ "a", "b", ]
|
to the marshall() function that called it originally. In
turn, marshall() returns this value to the original
encode_list() call. This function appends the string
above to its own $string , which now looks like this:
|
[ "1", [ "a", "b", ],
|
This encode_list() function now moves onto the third and
final element of the list in this example. This is just another
scalar, so after the third iteration we have this:
|
[ "1", [ "a", "b", ], "3",
|
We have exhausted the elements of the example list, so we fall out of
the for loop. encode_list() appends the closing bracket
and returns the string:
|
[ "1", [ "a", "b", ], "3", ]
|
Try working out a simple example for yourself, possibly with an
associative array this time. Or just type in the code and try running
some examples.
Brevity is the Soul of Wit
We can use function references to simplify the marshall()
function. First, we build a "jump table": an associative array that
links certain keywords to various function references. In this case,
we link the strings returned by ref() to the function
used to encode each data type:
|
%encode = ("SCALAR" => \&encode_scalar,
"ARRAY" => \&encode_list,
"HASH" => \&encode_hash,);
|
marshall() now becomes extremely terse:
|
sub marshall {
my($thing) = @_;
my($type, $func_ref);
$type = ref($thing) || "SCALAR";
$func_ref = $encode{$type};
return(&$func_ref($thing)) if ($func_ref);
die("Can't handle $type\n");
}
|
First we assign $type to be whatever gets returned by
ref() or SCALAR if ref() returns
undef . Next we extract the appropriate function reference
from %encode , and call that function on the argument to
marshall() . If we cannot find an appropriate function to
call, we die() .
To make this even more terse, remember that you can use a block inside
of curly braces in place of a scalar reference. In other words, we can
use
|
{$encode{$type}}
|
in place of $func_ref . Our new version of the function
is:
|
sub marshall {
my($thing) = @_;
my($type);
$type = ref($thing) || "SCALAR";
return(&{$encode{$type}}($thing)) if (defined($encode{$type}));
die("Can't handle $type\n");
}
|
I generally hate using extra variables, but the function above is hard
for the human eye to comprehend.
Thanks for the Reference
Please type in this code and try it out on some complex examples. If
nothing else, the exercise will get you using references so that you
will remember how they work. Also, never forget this little
demonstration of the power of recursive thinking - a useful tool for
any programmer.
Reproduced from ;login: Vol. 21 No. 2, April 1996.
|