SAGE - Perl Practicum - Failed To Understand the Reference

by Hal Pomeranz

One of the nice new features of Perl5 is the reference: a scalar that points to another Perl data object (e.g., a list or an associative array). Along with references comes the ability to create compound data types (lists of lists or associative arrays of lists, for example), which were difficult to create in Perl4. These new compound data objects have the typical properties of other Perl data structures - most importantly, they allocate their own storage automatically, which data structures in C do not.

Some Concrete Examples

Perl5 adds a new \ operator to create a reference to an existing Perl variable. For example, here's how to create a reference to a simple scalar variable:

        $scalar_ref = \$a_scalar;

When you want to get to the value of the scalar, you just substitute the reference for the name of the variable:

        $$scalar_ref = "some value";
        print "$$scalar_ref\n";

Note the double dollar signs. Perl uses the leftmost dollar sign to recognize what type of object we are talking about - in this case a scalar variable. With this information, Perl can appropriately dereference anything that might follow.
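
Here is a minimal sketch of the same idea, written to run under a current Perl with strict and warnings switched on. The my declarations and the $copy variable are additions for illustration and are not part of the original example. It shows that assigning through the reference changes the original variable, while an ordinary copy does not follow along:

```perl
use strict;
use warnings;

my $a_scalar   = "original";
my $scalar_ref = \$a_scalar;     # reference to $a_scalar
my $copy       = $a_scalar;      # ordinary copy, for contrast

$$scalar_ref = "some value";     # writes through the reference

print "$a_scalar\n";             # the original variable has changed
print "$copy\n";                 # the copy still holds the old value
```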

You can also create references to lists and associative arrays:

        $list_ref = \@some_list;
        $hash_ref = \%the_hash;

Again, the symbols surrounding the reference determine exactly how Perl will dereference and use the object. Here are a couple of examples using the list reference defined above:

        @$list_ref = localtime();
        $hour = $$list_ref[2];

In the first case we are resetting the entire contents of the list pointed to by $list_ref. In the second we are fetching a single element: Perl deduces this from both the dollar sign to the left of the reference and the square brackets that follow it.

The same idea applies to references to associative arrays, except the special characters there are % instead of @ and curly braces instead of square brackets:

        %$hash_ref = (
             "January" => 1,
             "February" => 2,
        ); 	
        $$hash_ref{"March"} = 3;
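
The sketch below pulls the list and associative array cases together. It assumes a current Perl with strict and warnings enabled, and the sample data and the names $count and @names are made up for illustration. Each dereferenced expression behaves exactly like the variable the reference points to:

```perl
use strict;
use warnings;

my @some_list = (10, 20, 30);
my %the_hash  = ("January" => 1, "February" => 2);

my $list_ref = \@some_list;
my $hash_ref = \%the_hash;

$$list_ref[1] = 99;                  # same as $some_list[1] = 99
$$hash_ref{"March"} = 3;             # same as $the_hash{"March"} = 3

my $count = scalar @$list_ref;       # number of elements in @some_list
my @names = sort keys %$hash_ref;    # keys of %the_hash, sorted by name

print "list: @some_list ($count elements)\n";
print "months: @names\n";
```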

Things get even more complicated when we start having compound data types (arrays of list references, etc.). Suppose we were going to store various time vectors in an associative array. First we create lists holding the values, and then we store references for those lists in the array:

        @gmtime = gmtime();
        @localtime = localtime();

        $time{"greenwich"} = \@gmtime;
        $time{"localtime"} = \@localtime;

Sometime later, we want to get the hours value out of the lists. You might be tempted to do:

        # WRONG! WRONG! WRONG!
        $gmhour = $$time{"greenwich"}[2];

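but this does not work. There is a precedence problem - the scalar is dereferenced BEFORE the key lookup, so Perl reads the expression as ${$time}{"greenwich"}[2] and treats the scalar $time as a reference to an associative array. Because the scalar $time is undefined in our example (it has nothing to do with the array %time), you will never get the value you want.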

What you have to do is enclose compound references in curly braces:

        # CORRECT
        $gmhour = ${$time{"greenwich"}}[2];

The formal rule at work here is that you can replace a scalar reference with a Perl block - that is, an expression in curly braces. So the expression above is the moral equivalent of writing:

        $list_ref = $time{"greenwich"};
        $gmhour = $$list_ref[2];

This nested curly brace syntax is extremely cumbersome, so you can use the following shortcut:

        $gmhour = $time{"greenwich"}->[2];

C programmers should be familiar with the -> operator, which means "follow pointer" - same thing here. The left-hand side of the -> is an expression whose result is a reference, and the right-hand side is an index into the object that reference points to.

Because this is Perl, there is yet another way to do the same thing. You can omit the -> between list and array indexes (i.e., things in square or curly brackets):

        $gmhour = $time{"greenwich"}[2];

I generally prefer this last syntax, but your mileage may vary.
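
As a quick check that the three spellings really are interchangeable, here is a small sketch for a current Perl with strict and warnings enabled. The fixed timestamp passed to gmtime() and the names $braces, $arrow, and $implicit are assumptions made so the result is repeatable:

```perl
use strict;
use warnings;

my @gmtime = gmtime(18000);          # five hours after the epoch, so the hour field is 5
my %time   = ("greenwich" => \@gmtime);

my $braces   = ${$time{"greenwich"}}[2];   # block in place of the reference
my $arrow    = $time{"greenwich"}->[2];    # explicit arrow
my $implicit = $time{"greenwich"}[2];      # arrow omitted between subscripts

print "$braces $arrow $implicit\n";
```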

The -> was made optional for these operations simply because programmers commonly want to use multidimensional arrays and lists, and it is more natural to write

        $coord[$x][$y] = $z;

than

        $coord[$x]->[$y] = $z; 	
        ${$coord[$x]}[$y] = $z;

which are equivalent, but ugly and cumbersome.
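
For a feel of how this works in practice, the following sketch fills and prints a small two-dimensional grid. It assumes a current Perl with strict and warnings enabled, and the 3x3 size and the stored values are arbitrary. Note that the row lists are never declared: Perl creates each one the first time an element in it is assigned.

```perl
use strict;
use warnings;

my @coord;

# Fill a 3x3 grid; each row's list springs into existence on first use.
for my $x (0 .. 2) {
    for my $y (0 .. 2) {
        $coord[$x][$y] = $x * 10 + $y;
    }
}

# Each element of @coord is a list reference holding one row.
for my $row (@coord) {
    print join(" ", @$row), "\n";
}
```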

Anonymous Data

You can actually create references to data objects that do not have an explicit identifier associated with them. This allows you to have static declarations for arrays and lists whose members are also lists (there was no way to do this in Perl4). Objects "created on the fly" in this way are generally referred to as "anonymous data objects."

The easiest cases are where we want to create an anonymous list or associative array and a reference to the object:

        $short_months =  ["Sep", "Apr", "Jun", "Nov", "Feb"];

        $mail_info = {
             "hal" => "hal@netmarket.com",
             "tina" => "tmd@iwi.com",
             "rob" => "kolstad@bsdi.com",
        };

So, square brackets for anonymous lists and curlies for anonymous associative arrays, just like their index brackets. These examples are not very interesting, however, because we could have just explicitly declared a list, @short_months, or an associative array, %mail_info. Things get more interesting when we start declaring compound objects. Here is an example of declaring an associative array that has one value that is a list reference:

        %hostinfo = (
             "name" => "myhost",
             "domain" => "netmarket.com",
             "addrs" => ["199.79.247.20", "204.25.36.200"],
             "owner" => "Hal Pomeranz",
        );

You would print the second address with:

        print "$hostinfo{'addrs'}[1]\n";

Yes, you can nest these kinds of declarations arbitrarily deeply:

        @hosts = (
             {"name" => "myhost",
             "domain" => "netmarket.com",
             "addrs" => ["199.79.247.20", "204.25.36.200"],
             "owner" => "Hal Pomeranz",
             },
             {"name" => "thathost",
             "domain" => "netmarket.com",
             "addrs" => ["199.79.247.21"], 	
             "owner" => "Bob Smith",
             },
             # etc, etc, etc,
        );

Given the declaration above,

        print "$hosts[1]{'addrs'}[0]\n";

would print "199.79.247.21". Just to reiterate, you could also write the above print statement in either of the following ways:

        print "$hosts[1]->{'addrs'}->[0]\n";
        print "${${$hosts[1]}{'addrs'}}[0]\n";

You can see now why I prefer the first syntax.
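
Picking out one element is only half the story; more often you want to walk the whole structure. The sketch below does that for a two-entry version of the host list, assuming a current Perl with strict and warnings enabled. The loop variables and the @lines list are names invented for this illustration:

```perl
use strict;
use warnings;

my @hosts = (
    { "name" => "myhost",   "domain" => "netmarket.com",
      "addrs" => ["199.79.247.20", "204.25.36.200"], "owner" => "Hal Pomeranz" },
    { "name" => "thathost", "domain" => "netmarket.com",
      "addrs" => ["199.79.247.21"], "owner" => "Bob Smith" },
);

my @lines;
for my $host (@hosts) {                      # $host is a reference to one associative array
    for my $addr (@{ $host->{"addrs"} }) {   # dereference the inner list
        push @lines, "$host->{'name'}.$host->{'domain'} $addr";
    }
}
print "$_\n" for @lines;
```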

Putting Things Together

Use of anonymous data structures allows us to simplify that "array of time vectors" example above. In that example, we explicitly created lists and then used the backslash operator to create scalar references to them:

        @gmtime = gmtime();
        @localtime = localtime();
	
        $time{"greenwich"} = \@gmtime;
        $time{"localtime"} = \@localtime;

Rather than creating the @gmtime and @localtime arrays, we could assign through references that do not exist yet and let Perl create anonymous lists for them:

        @$gm_vec_ref = gmtime();
        @$loc_vec_ref = localtime();

        $time{"greenwich"} = $gm_vec_ref;
        $time{"localtime"} = $loc_vec_ref;

This is not very exciting. True, we got rid of those annoying backslashes, but who really cares? Remember one of the early rules we learned: you can put a block in place of a scalar reference. This means that we can get rid of the extra assignment statements altogether:

        @{$time{"localtime"}} = localtime();
        @{$time{"greenwich"}} = gmtime();

We are just replacing $gm_vec_ref with the block {$time{"greenwich"}}, and the same for the localtime() vector.
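
The following sketch confirms what those two assignments leave behind. It assumes a current Perl with strict and warnings enabled, and it passes a fixed timestamp to gmtime() purely so that the year it prints is predictable. The associative array starts out empty, and each slot ends up holding a reference to a list that Perl created on demand:

```perl
use strict;
use warnings;

my %time;                                   # starts out empty

@{$time{"greenwich"}} = gmtime(0);          # fixed timestamp for repeatability
@{$time{"localtime"}} = localtime();

# Each slot now holds a reference to a list that Perl created for us.
print ref($time{"greenwich"}), "\n";
print scalar(@{$time{"localtime"}}), " fields\n";
print "year: ", $time{"greenwich"}[5] + 1900, "\n";
```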

References to Subroutines

Because subroutines are just another Perl data object, you can create references to them as well:

        sub hello {
             print "Hello world!\n";
        }
        $sub_ref = \&hello;
        &$sub_ref();

Perl5 allows you to call your own subroutines without the &, but when you are dealing with references, Perl needs the & as a hint to tell it what type of data the reference points to.

You can also create references to anonymous subroutines:

        $sub_ref = sub {
             print "Hello World!\n";
        };
        &$sub_ref();

Notice the trailing semicolon after the closing curly brace.
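
Subroutine references become more useful once you store them in other data structures. The sketch below keeps two of them in an associative array and calls them by key. It assumes a current Perl with strict and warnings enabled; the %action table and its keys are invented for illustration, and the arrow form of the call on the last assignment is a syntax accepted by later Perl 5 releases rather than something taken from the text above.

```perl
use strict;
use warnings;

sub hello { return "Hello world!" }

my %action = (
    "greet"  => \&hello,                     # reference to a named subroutine
    "double" => sub { return 2 * $_[0] },    # anonymous subroutine
);

my $greeting = &{$action{"greet"}}();        # block form, as with other references
my $doubled  = &{$action{"double"}}(21);
my $arrow    = $action{"double"}->(21);      # arrow form of the same call

print "$greeting\n$doubled $arrow\n";
```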

Other Useful Tidbits

Perl5 now has a ref() operator which tells you what kind of object a given reference points to. So,

        $hash_ref = \%this_hash;
        print ref($hash_ref), "\n";

prints HASH. Other values returned by ref() include SCALAR, ARRAY (for lists), and CODE (for subroutine references). ref($foo) returns a false value (the empty string) if $foo is not a reference.
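
Here is a short sketch that runs ref() over each kind of reference discussed so far, assuming a current Perl with strict and warnings enabled. The variable names are placeholders. The last two lines check the non-reference case, where ref() hands back an empty (but defined) string:

```perl
use strict;
use warnings;

my $scalar = "x";
my @list   = (1, 2);
my %hash   = ("k" => "v");

my @kinds = (
    ref(\$scalar),          # SCALAR
    ref(\@list),            # ARRAY
    ref(\%hash),            # HASH
    ref(sub { 1 }),         # CODE
    ref([]),                # ARRAY (anonymous list)
    ref({}),                # HASH (anonymous associative array)
);
print "@kinds\n";

my $plain = ref($scalar);   # $scalar holds a string, not a reference
print "non-reference gives ", (length($plain) ? "a non-empty" : "an empty"), " string\n";
```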

By the way, the following code:

        $refname = "foo";
        $$refname = "Surprise!";
        print "$foo\n";

prints Surprise! In other words, if you use a variable as a reference and the value of that variable is not a reference, then Perl interprets the value as the name of a variable. You can really shoot yourself in the foot with this one.
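
One way to keep your feet intact is the strict pragma, which turns this kind of accidental name lookup into a run-time error. The sketch below assumes a current Perl; the eval wrapper and the $ok and $error variables are there only so the script can report the failure instead of dying. It then repeats the assignment with the check deliberately switched off:

```perl
use strict;
use warnings;

our $foo = "untouched";
my $refname = "foo";

# With "strict refs" in force, using a plain string as a reference is a
# run-time error, which eval traps here.
my $ok    = eval { $$refname = "Surprise!"; 1 };
my $error = $ok ? "" : $@;

print "assignment ", ($ok ? "succeeded" : "was refused"), "\n";
print "\$foo is still '$foo'\n";

{
    no strict 'refs';              # deliberately allow the name lookup
    $$refname = "Surprise!";       # now sets the package variable $foo
}
print "\$foo is now '$foo'\n";
```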

Coming in the Next Article

Some of this reference stuff is mysterious at best, so in the next article we will look at an extended example that covers all aspects of references discussed thus far. Here is the problem, so you can practice on it before reading on.

In my last column I briefly mentioned the concept of "marshalling" data: converting complex data objects to a format that can easily be saved to disk and retrieved later. The idea is to create a function marshall() such that if we write

        $string = marshall($some_ref);
        eval("\$other_ref = $string");

then the data structure pointed to by $other_ref will have the same contents as the data structure pointed to by $some_ref. Remember that the data structure pointed to by $some_ref could be arbitrarily complex: a list of associative arrays whose elements could be lists, arrays, and/or scalars, for example.
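
To make the contract concrete without giving the exercise away, the sketch below fakes marshall() by handing the work to the Data::Dumper module that ships with current Perl distributions. This is a stand-in chosen for illustration, not the hand-written solution the exercise asks for, and the sample data is made up. It shows the round trip the real function has to support:

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for the exercise's marshall(): delegates to Data::Dumper
# rather than walking the structure by hand.
sub marshall {
    my ($ref) = @_;
    local $Data::Dumper::Terse  = 1;   # emit a bare expression, no "$VAR1 ="
    local $Data::Dumper::Indent = 0;   # keep the result on one line
    return Dumper($ref);
}

my $some_ref = {
    "name"  => "myhost",
    "addrs" => ["199.79.247.20", "204.25.36.200"],
};

my $other_ref;
my $string = marshall($some_ref);
eval("\$other_ref = $string");
die $@ if $@;

print "$string\n";
print "$other_ref->{'addrs'}[1]\n";
```

Writing the function yourself means doing what the module does here: checking ref() at each level and emitting the matching brackets, braces, and quoted values.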

Good luck with your coding. See you next time.

Reproduced from ;login: Vol. 21 No. 1, February 1996.

Last changed: May 24, 1997 pc