|
Perl Practicum: Failed To Understand the Reference
by Hal Pomeranz
One of the nice new features of Perl5 is the ability to create
references: scalars that point to other Perl data objects (e.g., a
list or an associative array). Along with references comes the ability
to create compound data types (lists of lists or arrays of lists, for
example), which were difficult to create in Perl4. These new compound
data objects have the typical properties of other Perl data structures;
most importantly, they automatically allocate storage for themselves,
unlike data structures in C.
Some Concrete Examples
Perl5 adds a new \ operator to create a reference to an existing Perl
variable. For example, here's how to create a reference to a simple
scalar variable:
|
$scalar_ref = \$a_scalar;
|
When you want to get to the value of the scalar, you just substitute
the reference for the name of the variable:
|
$$scalar_ref = "some value";
print "$$scalar_ref\n";
|
Note the double dollar signs. Perl uses the leftmost dollar sign to
recognize what type of object we are talking about - in this case a
scalar variable. With this information, Perl can appropriately
dereference anything that might follow.
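As a minimal sketch of the point above, the following shows that a write through the reference lands in the original variable. The initial value "original" is an illustrative assumption, not part of the column's example.

```perl
use strict;
use warnings;

my $a_scalar   = "original";     # illustrative starting value
my $scalar_ref = \$a_scalar;     # reference to $a_scalar

$$scalar_ref = "some value";     # write through the reference
print "$a_scalar\n";             # the original variable sees the change
print "$$scalar_ref\n";          # same storage, reached via the reference
```

Both print statements show the same string, because the reference and the variable name are two routes to one piece of storage.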
You can also create references to lists and associative arrays:
|
$list_ref = \@some_list;
$hash_ref = \%the_hash;
|
Again, the symbols surrounding the reference determine exactly how
Perl will dereference and use the object. Here are a couple of
examples using the list reference defined above:
|
@$list_ref = localtime();
$hour = $$list_ref[2];
|
In the first case we are resetting the entire contents of the list
pointed to by $list_ref. In the second we are
manipulating a single element; there, Perl deduces the
context from both the dollar sign to the left of the reference and
the square brackets following it.
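The two forms can be tried with fixed values in place of localtime(), so the result is predictable. The numbers below are stand-ins chosen for illustration.

```perl
use strict;
use warnings;

my @some_list = (10, 20, 30);
my $list_ref  = \@some_list;

@$list_ref    = (0, 15, 9);      # whole-list assignment (sec, min, hour stand-ins)
my $hour      = $$list_ref[2];   # read one element through the reference
$$list_ref[1] = 45;              # write one element through the reference

print "hour=$hour list=@some_list\n";   # prints "hour=9 list=0 45 9"
```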
The same idea applies to references to associative arrays, except the
special characters there are % instead of @
and curly braces instead of square brackets:
|
%$hash_ref = (
"January" => 1,
"February" => 2,
);
$$hash_ref{"March"} = 3;
|
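A short sketch of the same idea for associative arrays, assuming an initially empty %the_hash. It fills the hash through the reference and then reads it back by its real name.

```perl
use strict;
use warnings;

my %the_hash;
my $hash_ref = \%the_hash;

%$hash_ref = ("January" => 1, "February" => 2);   # replace the whole hash
$$hash_ref{"March"} = 3;                          # add a single key

# Read back through the real name, ordered by month number.
for my $month (sort { $the_hash{$a} <=> $the_hash{$b} } keys %the_hash) {
    print "$month => $the_hash{$month}\n";
}
```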
Things get even more complicated when we start having compound data
types (arrays of list references, etc.). Suppose we were going to
store various time vectors in an associative array. First we create
lists holding the values, and then we store references for those lists
in the array:
|
@gmtime = gmtime();
@localtime = localtime();
$time{"greenwich"} = \@gmtime;
$time{"localtime"} = \@localtime;
|
Sometime later, we want to get the hours value out of the lists. You
might be tempted to do:
|
# WRONG! WRONG! WRONG!
$gmhour = $$time{"greenwich"}[2];
|
but this does not work. There is a precedence problem: scalar
variables get dereferenced BEFORE key lookups, so Perl treats the scalar
$time as a hash reference instead of looking up the key in %time.
Because the scalar $time is undefined in our example, you will never
get the value you want.
What you have to do is enclose compound references in curly braces:
|
# CORRECT
$gmhour = ${$time{"greenwich"}}[2];
|
The formal rule at work here is that you can replace a scalar
reference with a Perl block - that is, an expression in curly
braces. So the expression above is the moral equivalent of writing:
|
$list_ref = $time{"greenwich"};
$gmhour = $$list_ref[2];
|
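The precedence rule is easier to see when a scalar named $time actually exists. In this sketch, the scalar $time and the value 99 are hypothetical additions used only to make the difference visible, and fixed numbers stand in for gmtime().

```perl
use strict;
use warnings;

my @gmtime = (0, 30, 14);                    # stand-in for gmtime(): sec, min, hour
my %time   = ("greenwich" => \@gmtime);
my $time   = { "greenwich" => [0, 0, 99] };  # unrelated scalar that happens to be named $time

# Dereferences the scalar $time first, then looks up the key.
my $from_scalar = $$time{"greenwich"}[2];

# The block looks up the key in %time first, then dereferences the result.
my $from_hash   = ${$time{"greenwich"}}[2];

print "$from_scalar $from_hash\n";           # prints "99 14"
```

Only the block form reaches the list stored in %time; the unbraced form never touches %time at all.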
This nested curly brace syntax is extremely cumbersome, so you can use
the following shortcut:
|
$gmhour = $time{"greenwich"}->[2];
|
C programmers should be familiar with the -> operator,
which means "follow pointer"; it means the same thing here. The left-hand
side of the -> is an expression whose result is a reference, and
the right-hand side is an index into the object that reference points
to.
Because this is Perl, there is yet another way to do the same
thing. You can omit the -> between adjacent subscripts
(i.e., things in square or curly brackets):
|
$gmhour = $time{"greenwich"}[2];
|
I generally prefer this last syntax, but your mileage may vary.
The -> was made optional for these operations simply
because programmers commonly want to use multidimensional arrays and
lists, and it is more natural to write
|
$coord[$x][$y] = $z;
|
than either of
|
$coord[$x]->[$y] = $z;
${$coord[$x]}[$y] = $z;
|
which are equivalent, but ugly and cumbersome.
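A quick check that the three spellings reach the same element. The index values and the stored string are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my @coord;
my ($x, $y) = (1, 2);

$coord[$x][$y] = "a";              # Perl creates the inner list automatically

my $plain = $coord[$x][$y];        # arrow omitted between subscripts
my $arrow = $coord[$x]->[$y];      # explicit arrow
my $block = ${$coord[$x]}[$y];     # block form

print "$plain $arrow $block\n";    # prints "a a a"
print ref($coord[$x]), "\n";       # prints "ARRAY"
```

The last line shows that each row of the "two-dimensional array" is really a reference to an ordinary list.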
Anonymous Data
You can actually create references to data objects that do not have an
explicit identifier associated with them. This allows you to have
static declarations for arrays and lists whose members are also lists
(there was no way to do this in Perl4). Objects "created on the fly"
in this way are generally referred to as "anonymous data objects."
The easiest cases are where we want to create an anonymous list or
associative array and a reference to the object:
|
$short_months = ["Sep", "Apr", "Jun", "Nov", "Feb"];
$mail_info = {
"hal" => 'hal@netmarket.com',
"tina" => 'tmd@iwi.com',
"rob" => 'kolstad@bsdi.com',
};
|
So, square brackets for anonymous lists and curlies for anonymous
hashes, just like their index brackets. These examples are not very
interesting, however, because we could have just explicitly declared a
list, @short_months, or an array, %mail_info.
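For illustration, here is how the anonymous objects are used once they exist, trimmed to two mail entries. The addresses are single-quoted so that Perl does not try to interpolate @netmarket or @iwi as arrays.

```perl
use strict;
use warnings;

my $short_months = ["Sep", "Apr", "Jun", "Nov", "Feb"];
my $mail_info    = {
    "hal"  => 'hal@netmarket.com',    # single quotes: no array interpolation
    "tina" => 'tmd@iwi.com',
};

my $count = scalar @$short_months;    # number of elements in the anonymous list
my $first = $$short_months[0];        # one element
my $addr  = $$mail_info{"tina"};      # one hash value

print "$count $first $addr\n";                           # prints "5 Sep tmd@iwi.com"
print ref($short_months), " ", ref($mail_info), "\n";    # prints "ARRAY HASH"
```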
Things get more interesting when we start declaring compound
objects. Here is an example of declaring an associative array that has
one value that is a list reference:
|
%hostinfo = (
"name" => "myhost",
"domain" => "netmarket.com",
"addrs" => ["199.79.247.20", "204.25.36.200"],
"owner" => "Hal Pomeranz",
);
|
You would print the second address with:
|
print "$hostinfo{'addrs'}[1]\n";
|
Yes, you can nest these kinds of declarations arbitrarily deeply:
|
@hosts = (
{"name" => "myhost",
"domain" => "netmarket.com",
"addrs" => ["199.79.247.20", "204.25.36.200"],
"owner" => "Hal Pomeranz",
},
{"name" => "thathost",
"domain" => "netmarket.com",
"addrs" => ["199.79.247.21"],
"owner" => "Bob Smith",
},
# etc, etc, etc,
);
|
Given the declaration above,
|
print "$hosts[1]{'addrs'}[0]\n";
|
would print "199.79.247.21". Just to reiterate, you could also rewrite
the above print statement in either of the following ways:
|
print "$hosts[1]->{'addrs'}->[0]\n";
print "${${$hosts[1]}{'addrs'}}[0]\n";
|
You can see now why I prefer the first syntax.
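Nested declarations like @hosts are usually walked with loops rather than fixed indexes. The following is one possible sketch, using a trimmed copy of the data above; the @lines array is a hypothetical helper added so the result can be inspected.

```perl
use strict;
use warnings;

my @hosts = (
    { "name" => "myhost",   "addrs" => ["199.79.247.20", "204.25.36.200"] },
    { "name" => "thathost", "addrs" => ["199.79.247.21"] },
);

my @lines;
for my $host (@hosts) {                       # each $host is a hash reference
    for my $addr (@{ $host->{"addrs"} }) {    # block form dereferences the inner list
        push @lines, $host->{"name"} . " " . $addr;
    }
}
print "$_\n" for @lines;
# prints:
#   myhost 199.79.247.20
#   myhost 204.25.36.200
#   thathost 199.79.247.21
```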
Putting Things Together
Use of anonymous data structures allows us to simplify that "array of
time vectors" example above. In that example, we explicitly created
lists and then used the backslash operator to create scalar references
to them:
|
@gmtime = gmtime();
@localtime = localtime();
$time{"greenwich"} = \@gmtime;
$time{"localtime"} = \@localtime;
|
Rather than creating the @gmtime and
@localtime arrays, we could assign through
reference variables and let Perl create the lists for us:
|
@$gm_vec_ref = gmtime();
@$loc_vec_ref = localtime();
$time{"greenwich"} = $gm_vec_ref;
$time{"localtime"} = $loc_vec_ref;
|
This is not very exciting. True, we got rid of those annoying
backslashes, but who really cares? Remember one of the early rules we
learned: you can put a block in place of a scalar reference. This
means that we can get rid of the extra assignment statements
altogether:
|
@{$time{"localtime"}} = localtime();
@{$time{"greenwich"}} = gmtime();
|
We are just replacing $gm_vec_ref with the block
{$time{"greenwich"}}, and the same for the
localtime() vector.
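A small sketch confirming what the block form leaves behind in the %time hash from the earlier example. No list is declared anywhere, yet each hash value ends up holding a list reference.

```perl
use strict;
use warnings;

my %time;
@{$time{"localtime"}} = localtime();   # Perl creates the anonymous list on assignment
@{$time{"greenwich"}} = gmtime();

my $kind  = ref($time{"greenwich"});        # what the hash value now holds
my $count = scalar @{$time{"greenwich"}};   # gmtime() returns nine fields in list context
my $hour  = $time{"greenwich"}[2];          # hours field, as before

print "$kind $count\n";                     # prints "ARRAY 9"
```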
References to Subroutines
Because subroutines are just another Perl data object, you can create
references to them as well:
|
sub hello {
print "Hello world!\n";
}
$sub_ref = \&hello;
&$sub_ref();
|
Perl5 allows you to call your own subroutines without the
&, but when you are dealing with references, Perl needs
the & as a hint to tell it what type of data the
reference points to.
You can also create references to anonymous subroutines:
|
$sub_ref = sub {
print "Hello World!\n";
};
&$sub_ref();
|
Notice the trailing semicolon after the closing curly brace: the whole
construct is an assignment statement, not a subroutine declaration.
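One common use of subroutine references is a table of actions keyed by name. This is a sketch only: the %action hash and its keys are hypothetical, and the arrow call syntax shown on the last call is accepted by current Perl 5 releases but is not covered in the text above.

```perl
use strict;
use warnings;

sub hello { return "Hello world!" }

my %action = (
    "greet"  => \&hello,                    # reference to a named subroutine
    "double" => sub { return 2 * $_[0] },   # anonymous subroutine
);

my $greeting = &{$action{"greet"}}();       # block form, as with lists and hashes
my $doubled  = $action{"double"}->(21);     # arrow form of the same kind of call

print "$greeting $doubled\n";               # prints "Hello world! 42"
```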
Other Useful Tidbits
Perl5 now has a ref() operator which tells you what kind
of object a given reference points to. So,
|
$hash_ref = \%this_hash;
print ref($hash_ref), "\n";
|
prints HASH. Other values returned by ref() include
SCALAR, ARRAY (for lists), and
CODE (for subroutine references). ref($foo)
returns a false value (the empty string) if $foo is not a reference.
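The return values of ref() can be listed side by side. In this sketch the variable names are illustrative, and the non-reference case is shown as "(empty)" because current Perl returns an empty string there.

```perl
use strict;
use warnings;

my $scalar_ref = \"text";
my $list_ref   = [1, 2];
my $hash_ref   = { "a" => 1 };
my $code_ref   = sub { 1 };

# Ask ref() about each reference, plus one plain string for comparison.
my @kinds = map { ref($_) } ($scalar_ref, $list_ref, $hash_ref, $code_ref, "plain");

print join(",", map { $_ eq "" ? "(empty)" : $_ } @kinds), "\n";
# prints "SCALAR,ARRAY,HASH,CODE,(empty)"
```

Because the non-reference result is false, ref() can be used directly in a condition before dereferencing a value of unknown origin.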
By the way, the following code:
|
$refname = "foo";
$$refname = "Surprise!";
print "$foo\n";
|
prints "Surprise!". In other words, if you use a variable as
a reference and if the value of that variable is not a reference, then
Perl interprets the value of the variable as the name of an
identifier (a symbolic reference). You can really shoot yourself in the
foot with this one.
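One way to guard against this is the strict pragma, which is not discussed in the text above: with "strict refs" in effect, Perl refuses to treat a plain string as a reference. The sketch below shows the refusal first and then the original behavior with the check switched off; the initial value "untouched" is an illustrative assumption.

```perl
use strict;
use warnings;

our $foo    = "untouched";     # package variable, so a name lookup can find it
my $refname = "foo";           # a plain string, not a reference

# With strict refs on, using the string as a reference dies at run time.
my $ok = eval { $$refname = "Surprise!"; 1 };
print $ok ? "assigned\n" : "refused\n";     # prints "refused"
print "$foo\n";                             # prints "untouched"

{
    no strict 'refs';          # deliberately allow the behavior described above
    $$refname = "Surprise!";   # the string "foo" is used as a variable name
}
print "$foo\n";                             # prints "Surprise!"
```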
Coming in the Next Article
Some of this reference stuff is mysterious at best, so in the next article we will look at an extended example
that covers all aspects of references discussed thus far. Here is the
problem, so you can practice on it before reading on.
In my last column I briefly mentioned the concept of "marshalling"
data: converting complex data objects to a format that can easily be
saved to disk and retrieved later. The idea is to create a function
marshall() such that if we write
|
$string = marshall($some_ref);
eval("\$other_ref = $string");
|
then the data structure pointed to by $other_ref will
have the same contents as the data structure pointed to by
$some_ref. Remember that the data structure pointed to by
$some_ref could be arbitrarily complex: a list of
associative arrays whose elements could be lists, arrays, and/or
scalars, for example.
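To make the contract concrete without giving away the exercise, here is a sketch in which a hand-written string stands in for the output of marshall(). The string's contents are an assumption about what a reasonable result could look like, not the solution from the next article.

```perl
use strict;
use warnings;

# Hand-written stand-in for what marshall($some_ref) might return.
my $string = '{ "addrs" => ["199.79.247.20"], "owner" => "Hal Pomeranz" }';

my $other_ref;
eval("\$other_ref = $string");     # rebuild the structure from its text form
die $@ if $@;                      # report any syntax error in the string

my $kind  = ref($other_ref);
my $owner = $other_ref->{"owner"};
my $addr  = $other_ref->{"addrs"}[0];

print "$kind $owner $addr\n";      # prints "HASH Hal Pomeranz 199.79.247.20"
```

The interesting part of the exercise is producing such a string automatically for any depth of nesting.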
Good luck with your coding. See you next time.
Reproduced from ;login: Vol. 21 No. 1, February 1996.
|