|
Perl Practicum: Know All the Angles
by Hal Pomeranz
Perl 5 is Coming!
Good news, Perl enthusiasts! As of this writing, Larry Wall has just
announced the Alpha release of Perl 5. It passes all of the Perl 4
regression tests, but has no Configure script yet, and is only
guaranteed to build on a Sun SPARC machine. New FTP sites are
springing up daily, so consult comp.lang.perl for more details. A
stable release of Perl 5 by Christmas? I guess we'll have to wait and
see...
File Manipulation
File manipulation is a Perl fundamental. If you have been using the
language for any length of time, then you are probably more than
familiar with
|
open(FILE, "< myfile") || die "Can't open `myfile': $!\n";
while (<FILE>) { ... }    # each pass reads the next line into $_
|
However, you may not have caught on to everything that can go inside
those angle brackets. They're not just for file handles anymore.
Arguments as Filenames
First, we have the special file handle ARGV. When used in
a loop like
|
while (<ARGV>) {
    ...
}
|
each element of the argument list, @ARGV, will be treated
as a filename. Perl will attempt to open each file in the list, read
it line by line to the end, and move on to the next file. Perl prints
a warning if a file cannot be opened, but the loop continues
until all filenames are exhausted. If there are no arguments to the
program (i.e., if @ARGV is empty), then the loop above
will get lines from the standard input, as any good UNIX program
should. By the way, since this idiom is so common,
there is a shorthand notation, <>, which means the same
thing.
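To watch the loop shrug off an unreadable file, here is a minimal sketch. The scratch directory from File::Temp, the file names one, two, and missing, and the $SIG{__WARN__} hook that collects Perl's complaint are all assumptions made for illustration; none of them is part of the idiom itself.

```perl
use File::Temp qw(tempdir);

# Scratch directory with two readable files (the names are made up).
$dir = tempdir(CLEANUP => 1);
for $name ('one', 'two') {
    open(OUT, "> $dir/$name") || die "Can't create $dir/$name: $!\n";
    print OUT "a line from $name\n";
    close(OUT);
}

# Collect warnings instead of letting them go to standard error.
@complaints = ();
$SIG{__WARN__} = sub { push(@complaints, $_[0]) };

# The middle name does not exist; the loop should carry on past it.
@ARGV = ("$dir/one", "$dir/missing", "$dir/two");
@lines = ();
while (<>) {
    push(@lines, $_);
}

print @lines;
print "warnings: ", scalar(@complaints), "\n";
```

Both readable files should come through, with a single complaint recorded for the missing one. Had @ARGV been left empty, the same loop would sit waiting for lines on the standard input.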
Associated with the file handle ARGV is the scalar
$ARGV, which contains the name of the file currently
open. We can use this to write a simple-minded grep program:
|
$pat = shift @ARGV;         # first argument is the pattern
$many = @ARGV > 1;          # true if more than one file was named
while (<>) {
    next unless /$pat/;
    print "$ARGV:" if $many;
    print;
}
|
The program uses shift() to remove the first element, the
pattern to search for, from the argument list. All other arguments are
treated as file names. The name of the current file is printed before
the matching lines if more than one file name is given on the command
line (just like the UNIX grep program does).
The only rub with this whole business is that the special variable
$., which gives the current line number, is not reset as each new
file is opened. If this is a problem, employ the following trickery:
|
$oldname = '';
while (<>) {
    if ($ARGV ne $oldname) {
        $lineno = 0;
        $oldname = $ARGV;
    }
    $lineno++;
    ...
}
|
and use $lineno instead of the $. variable. You may ask yourself
why we are messing around with $lineno and not just assigning to
$. instead. The reason is that it simply does not work:
$. is only reset on an explicit close().
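Since an explicit close() does reset $., another way out is to close ARGV yourself at the end of each file. The sketch below assumes two throwaway sample files in a File::Temp scratch directory, and line_numbers is a hypothetical helper that exists only to make the effect visible.

```perl
use File::Temp qw(tempdir);

# Two small sample files so the sketch runs on its own (names are made up).
$dir = tempdir(CLEANUP => 1);
@names = ("$dir/first", "$dir/second");
for $name (@names) {
    open(OUT, "> $name") || die "Can't create $name: $!\n";
    print OUT "line one\nline two\n";
    close(OUT);
}

# Return the value of $. seen for each line read through <>.
sub line_numbers {
    local(@ARGV) = @_;
    local(@seen);
    while (<>) {
        push(@seen, $.);
        close(ARGV) if eof;    # eof with no parentheses: end of the current file
    }
    @seen;
}

print join(' ', &line_numbers(@names)), "\n";
```

With the close() in place each file starts again at line 1, so this prints 1 2 1 2; without it the numbers would run from 1 through 4.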
Indirect File Handles
A scalar variable inside angle brackets, <$file>,
indicates an indirect file handle. Perl attempts to read the next line
from the file handle whose name is the string value of
$file. For example:
|
open(FILE, "/etc/motd") || die "Can't open /etc/motd\n";
$file = 'FILE';
$line1 = <FILE>;     # direct: reads the first line
$line2 = <$file>;    # indirect: reads the second line from the same handle
|
Why is this at all useful? First, it allows us to pass file handles to
subroutines in a reasonable fashion:
|
open(FILE, "/etc/motd") || die "Can't open /etc/motd\n";
&mysub('FILE');    # pass the handle's name as a string
sub mysub {
    local($file) = @_;
    while (<$file>) {
        ...
    }
}
|
Second, the string contained in the variable need not be a valid
identifier. It could even, for example, be the name of the file that
we are opening:
|
for (0..8) {
    $file = "/var/adm/messages.$_";
    open($file, $file) || die "Can't open $file\n";
}
|
and then we could later do something like:
|
&do_something_with("/var/adm/messages.0");
sub do_something_with {
    local($file) = @_;
    while (<$file>) {
        ...
    }
}
|
This can be a big win as far as readability goes, e.g., if you have
lots of open file handles running around in your program.
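For the curious, the last two snippets can be strung together into something you can actually run. This is only a sketch: the scratch directory, the three messages.N files written into it, and the count_lines subroutine are hypothetical stand-ins for /var/adm/messages.* and do_something_with.

```perl
use File::Temp qw(tempdir);

# Write three small stand-in log files into a scratch directory.
$dir = tempdir(CLEANUP => 1);
for (0..2) {
    $file = "$dir/messages.$_";
    open(OUT, "> $file") || die "Can't create $file: $!\n";
    print OUT "entry A in file $_\n", "entry B in file $_\n";
    close(OUT);
}

# Open each file on a handle whose name is the file's own path.
for (0..2) {
    $file = "$dir/messages.$_";
    open($file, "< $file") || die "Can't open $file\n";
}

# Read whatever is left on the named handle and count the lines.
sub count_lines {
    local($file) = @_;
    local($count) = 0;
    while (<$file>) {
        $count++;
    }
    $count;
}

print &count_lines("$dir/messages.1"), "\n";
```

Note that the handle remembers its position: asking count_lines about messages.1 a second time would find nothing left to read.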
Third, we can build up lists and arrays of indirect file handles:
|
@myfiles = 0..7;
for (@myfiles) {
    # each value equals its own index here, so $myfiles[$_] is just $_
    open($myfiles[$_], "syslog.$_") || die "Can't open syslog.$_\n";
}
|
Note that though we are able to use an array element in the
open() call in the above example, we can only use a
scalar variable inside angle brackets to denote an indirect file
handle. Thus, we must first copy a value out of
@myfiles into a scalar before using it:
|
$file = $myfiles[3];
$line = <$file>;
print $line;
|
Globbing
If a string inside angle brackets is not a file handle (direct or
indirect), then it is passed to a subshell (the C shell if available,
otherwise the Bourne shell) to be globbed. You can use the glob in a
loop to get back the matching file names one at a time:
|
while (<*.c>) {
    print "Checking out $_...\n";
    system("co -l $_");
}
|
or you can slurp all the file names into a list:
|
chmod 0644, <*.c>;
|
However, don't think from the two examples above that the glob behaves
just like a file handle, because it doesn't. This example
|
$file1 = <*.c>;
print "$file1\n";
$file2 = <*.c>;
print "$file2\n";
|
prints the same filename twice (and spawns a subshell twice as well),
rather than printing the first and second matching filenames. Read on,
though, if you care to see some real tragedy.
One layer of variable interpolation will be done before the glob, but
you can't say <$glob> because that's an indirect file
handle. You have to throw curly braces around your variable name to
force interpolation:
|
$glob = "*.c";
@c_files = <${glob}>;
|
For those of you who weren't paying attention, we have just
illuminated one part of the seamy underbelly of Perl: a place where
$glob and ${glob} do not mean the same
thing.
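The difference is easy to see side by side. In the sketch below, the scratch directory and the files a.c and b.c are invented for the demonstration. It also shows that a single glob evaluated in list context hands back every match, which is one way around the same-name-twice surprise above.

```perl
use File::Temp qw(tempdir);

# Scratch directory with two C files, so the sketch is self-contained.
$dir = tempdir(CLEANUP => 1);
for $name ('a.c', 'b.c') {
    open(OUT, "> $dir/$name") || die "Can't create $dir/$name: $!\n";
    close(OUT);
}

$glob = "$dir/*.c";

# Braces force interpolation, so this is a glob; list context returns
# every match from a single expansion.
@c_files = <${glob}>;
($file1, $file2) = @c_files;

# Without braces this is a read from an (unopened) indirect file handle,
# so nothing comes back.
@nothing = <$glob>;

print "glob found: ", scalar(@c_files), "\n";
print "handle read found: ", scalar(@nothing), "\n";
```

Perl 5 also offers a glob() function, so glob($glob) is another way to spell the same request without the braces.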
Since Perl forks and execs a shell to glob the
files, rather than relying on some built-in globbing function, it is
almost always more efficient (in terms of run-time, but perhaps not in
terms of readability or amount of code) to use the built-in directory
operators:
|
opendir(DIR, ".") || die "Can't open directory `.'\n";
@c_files = grep(/\.c$/, readdir(DIR));
closedir(DIR);
|
Note that the glob will always return the file names in alphabetical
order while the above code won't (although you're always free to
sort() the list of files you get from the above method).
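If you want the glob's ordering back, the sort() can be folded into a small subroutine. This is a sketch under assumptions: c_files is a hypothetical helper, and the scratch directory and its three files exist only so the example can run anywhere.

```perl
use File::Temp qw(tempdir);

# Scratch directory with a few files, created out of alphabetical order.
$dir = tempdir(CLEANUP => 1);
for $name ('zeta.c', 'alpha.c', 'notes.txt') {
    open(OUT, "> $dir/$name") || die "Can't create $dir/$name: $!\n";
    close(OUT);
}

# Return the sorted names of the .c files in the given directory.
sub c_files {
    local($where) = @_;
    opendir(DIR, $where) || die "Can't open directory `$where'\n";
    local(@found) = grep(/\.c$/, readdir(DIR));
    closedir(DIR);
    return sort(@found);
}

print join(' ', &c_files($dir)), "\n";
```

This prints alpha.c zeta.c, with no shell involved.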
Summary
The <> idiom is a useful one and should be part of every
Perl programmer's toolkit. Indirect file handles and shell globs are
used less frequently but often to good effect in improving your code's
clarity and readability. Indirect file handles in particular can also
be used to needlessly obfuscate your code. So remember, as you try to
cloud the minds of lesser mortals, that in this life one sometimes
needs to maintain one's own code.
Reproduced from ;login: Vol. 18 No. 5, October 1993.
|