Perl Practicum: The Camel Spins a Web

by Hal Pomeranz

Those of you who have been living under a rock for the last twelve months may have missed out on this whole World Wide Web thing. Most of you have probably already tried your hand at some basic HTML authoring. The most interesting application of Web technology, though, is using the Web as an interface to arbitrary data from other sources such as databases and system applications. One mechanism for creating these interfaces is the Common Gateway Interface, CGI for short.

CGI Basics

Simply, a CGI program produces (on the standard output) a special header line followed by an arbitrary number of lines of output. The HTTP server running on your machine invokes your CGI script and feeds the output to the browser that requested the page (usually the hardest part of this whole equation is learning how to configure your HTTP server to execute your file). The CGI program can be written in any language you like, but why would you write in anything but Perl?

Here is a trivial example:
#!/bin/perl
print "Content-type: text/html\n\n";
print <<"EOmyPage";
<TITLE>"Hello World!" Page</TITLE>
<H2>HELLO WORLD!</H2>
EOmyPage
The first print statement of the script emits the header information,
specifying the type of document which follows the header. In this case,
we are saying that the document is an HTML text document. A blank line
must follow the header information (note the two \n characters). The
rest of the program is just a "here document" which prints a trivial
HTML page. If at this point you are thinking, "That's easy!", you are
absolutely correct: there is no great mystery to this CGI stuff.
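If you find yourself typing the header over and over, the header-plus-blank-line rule can be wrapped in a small helper. The following is only a sketch: the name cgi_response is made up for illustration and is not part of any standard library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a complete CGI response as one string.
# The header line ends with "\n" and is followed by an empty line,
# which is why two "\n" characters appear before the body.
sub cgi_response {
    my ($type, $body) = @_;
    return "Content-type: $type\n\n" . $body;
}

# Same page as the trivial example, produced through the helper.
print cgi_response("text/html", <<"EOmyPage");
<TITLE>"Hello World!" Page</TITLE>
<H2>HELLO WORLD!</H2>
EOmyPage
```

Keeping the header in one place makes it harder to forget the blank line when a script grows several output paths.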
External Files and Applications

However, the power of this mechanism cannot be overstated. As long as your program produces the correct output format, it can be arbitrarily complex. For example, you can read from and write to external files:
#!/bin/perl
print "Content-type: text/html\n\n";
# Backticks run the command and capture its output.
$visitors = `cat countfile`;
$visitors++;
if (open(OUT, "> countfile")) {
    print OUT $visitors;
    close(OUT);
    print <<"EOmyPage";
<TITLE>Welcome</TITLE>
Hello visitor number $visitors.
EOmyPage
}
else {
    print "Sorry, an error occurred\n";
}
Be warned that your HTTP server will probably be running under some
other user ID and will have that user's access rights to files on your
system (try to run your servers as a user with no privileges, like the
"nobody" user - NEVER give HTTP servers superuser access). Make sure
that whatever files you are manipulating have the correct access
rights.
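Because the server's user ID may lack rights to your files, every file operation can fail. Here is a sketch of the counter that stays inside Perl (no cat) and reports failure to the caller instead of dying. The name bump_counter is hypothetical, and the sketch assumes the counter file holds a single number on its first line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read, increment, and rewrite a counter file.
# Returns the new count, or undef if the file cannot be written
# (for example, because the server's user lacks write permission).
sub bump_counter {
    my ($file) = @_;
    my $count = 0;

    # A missing or unreadable file is treated as a count of zero.
    if (open(my $in, '<', $file)) {
        my $line = <$in>;
        close($in);
        $count = $1 if defined $line && $line =~ /^(\d+)/;
    }

    $count++;
    open(my $out, '>', $file) or return;
    print $out "$count\n";
    close($out) or return;
    return $count;
}

print "Content-type: text/html\n\n";
my $visitors = bump_counter("countfile");
if (defined $visitors) {
    print "<TITLE>Welcome</TITLE>\nHello visitor number $visitors.\n";
}
else {
    print "Sorry, an error occurred\n";
}
```

Note that this sketch does not lock the file, so two requests arriving at the same moment could lose a count.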
It is almost never a good idea to abort a CGI program in the middle of
execution. Remember that there is a user on the other side of the
Internet who is expecting some sort of page to be returned by your
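script. Notice that the script above prints an error message if the
open() call fails.

Also keep in mind that you can manipulate the output of other programs
from within your Perl script. In the example above, we used the UNIX
cat command to read the count file. The next example reads the output
of ps through a pipe: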
#!/bin/perl
print "Content-type: text/plain\n\n";
if (open(PS, "ps -ef |")) {
    while (<PS>) {
        print;
    }
}
else {
    print "An error occurred\n";
}
Note that we are using a different Content-type
header. Plain text is usually displayed by browsers in a fixed-width
font (Courier) with all whitespace preserved (unlike HTML). For those
of you familiar with HTML, the output usually looks like it has been
formatted in a <PRE> block.
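If you would rather send the same output as part of an HTML page, you can wrap it in a <PRE> block yourself. A minimal sketch follows; the helper names html_escape and pre_block are made up for illustration. The escaping step is there because program output may contain characters such as < and & that a browser would otherwise interpret as markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
# The ampersand is handled first so that the entities produced by the
# later substitutions are not escaped a second time.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap preformatted text in a <PRE> block.
sub pre_block {
    my ($text) = @_;
    return "<PRE>\n" . html_escape($text) . "</PRE>\n";
}

print "Content-type: text/html\n\n";
print "<TITLE>Process List</TITLE>\n";
if (open(my $ps, "ps -ef |")) {
    my $output = join('', <$ps>);
    close($ps);
    print pre_block($output);
}
else {
    print "An error occurred\n";
}
```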
You can call just about any program. You could interface with other
network information services like gopher and WAIS, or even NNTP (how
about a Web-based threaded newsreader?). You could interface with
pieces of your company database and write a company phone book page,
or allow people to review their benefits via the Web. However, think
about security before you go off and try to save the world with the
Web: you may not want everybody in the world to have easy access to
much of your data. Even the

The CGI Environment

Before executing your CGI program, your HTTP server will set a number of environment variables. The CGI specification (https://hoohoo.ncsa.uiuc.edu/cgi/interface.html) spells out exactly what information is provided, but here is a useful little test program to see for yourself:
#!/bin/perl
print "Content-type: text/plain\n\n";
foreach $var (sort keys %ENV) {
    print "\$ENV{$var} = '$ENV{$var}'\n";
}
For example, the REMOTE_HOST and REMOTE_ADDR
variables give the fully qualified hostname and the IP address of the
machine that is connecting to your HTTP server. At NetMarket we get a
lot of "How'd you do that?!?" comments because our home page prints a
little "Thanks for connecting from $ENV{'REMOTE_HOST'}"
message.
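A greeting like that might be sketched as follows. This is not the NetMarket code; the name visitor_name is hypothetical, and the fallback assumes that a server which does not look up hostnames leaves REMOTE_HOST unset and provides only REMOTE_ADDR.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick the best available name for the client.
# Takes a reference to an environment hash so it can be tried out
# without a running HTTP server.
sub visitor_name {
    my ($env) = @_;
    for my $key ('REMOTE_HOST', 'REMOTE_ADDR') {
        my $value = $env->{$key};
        return $value if defined $value && length $value;
    }
    return 'an unknown host';
}

print "Content-type: text/html\n\n";
print "<TITLE>Welcome</TITLE>\n";
print "Thanks for connecting from ", visitor_name(\%ENV), "<P>\n";
```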
The client browser can also send information to your HTTP server. Your
HTTP server will put this information into your CGI program's
environment using variables that are prefixed with HTTP_. For
example, HTTP_USER_AGENT identifies the browser making the request.
What good is identifying a browser? Remember that older browsers may not support all the latest features of the HTML specification. For example, you do not want to send a table to NCSA Mosaic 2.4 because the browser cannot format the table information, and you would not want to send an image map to a text-only browser like Lynx because the user would not be able to see the image.

Processing Forms

HTML allows you to create pages which allow the user to type in information and submit it to your server. Here is a simple HTML form:
<TITLE>Send Us Email!</TITLE>
We'd love to hear from you. Enter your email address and comments
in the spaces provided and we'll respond as quickly as we can!<P>
<FORM METHOD="POST" ACTION="bin/process_form">
Your E-mail address<BR>
<INPUT NAME="email" SIZE=45 MAXLENGTH=45><BR>
Your Message<BR>
<TEXTAREA NAME="comments" ROWS=12 COLS=45></TEXTAREA><P>
<INPUT TYPE="submit" VALUE="Send your comments">
</FORM>
The <FORM ... ACTION=" ... "> tag specifies what program
the user's browser should try to call when they submit the form
information. This form creates a space for the user to enter an email
address and a free-form text area for the user to type in a
message. Finally, there is a Send your comments button to
allow the user to submit the form information.
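When the user punches the Send your comments button, the browser sends the form data to your program. Because this form uses the POST method, the data arrives on your program's standard input, and the CONTENT_LENGTH environment variable tells you how many bytes to read: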
#!/bin/perl
read(STDIN, $stuff, $ENV{'CONTENT_LENGTH'});
...
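A slightly more defensive version of that read is sketched below. The helper name read_post_body is hypothetical. The sketch assumes only that CONTENT_LENGTH might be missing or malformed and that a single read() may return fewer bytes than requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read exactly $length bytes of form data from a
# filehandle, or as many as are available before end of file.
# Returns an empty string if the length is missing or not a number.
sub read_post_body {
    my ($fh, $length) = @_;
    return '' unless defined $length && $length =~ /^\d+$/;

    my $data = '';
    while (length($data) < $length) {
        # The fourth argument appends at the current end of $data.
        my $got = read($fh, $data, $length - length($data), length($data));
        last unless $got;    # undef on error, 0 at end of file
    }
    return $data;
}

my $stuff = read_post_body(\*STDIN, $ENV{'CONTENT_LENGTH'});
```

Passing the filehandle as an argument also lets you exercise the routine with test data instead of a live server.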
Now you have to break up the data into intelligible pieces. The data
comes to you in name=value pairs separated by
& characters. The names for each piece of data are
whatever you specified in the form using the <... NAME="..." ...>
tags: in the example above, the name for the email
field is email, and the name for the free-form text area
is comments. The other tricky part is that spaces are
converted to + signs and non-alphanumeric characters are
generally converted to %<hex> where <hex> is
the ASCII value for the character in hexadecimal notation. Typically,
the beginning of all form processing programs looks like:
#!/bin/perl
read(STDIN, $stuff, $ENV{'CONTENT_LENGTH'});
@pairs = split(/\&/, $stuff);
for (@pairs) {
    ($field, $val) = split(/=/);
    $field =~ s/\+/ /g;
    $field =~ s/%(\w\w)/sprintf("%c", hex($1))/eg;
    $val =~ s/\+/ /g;
    $val =~ s/%(\w\w)/sprintf("%c", hex($1))/eg;
    $entries{$field} = $val;
}
...
First, we read the data off the standard input and then break it up
into a list of name=value pairs. Then we iterate over
each pair, break the pair apart, and convert the plus signs and
hexadecimal escapes back to the original characters. Do not try to do
the substitutions before you split everything up because some of the
escaped characters may be & or =. Convert the + signs to spaces
before decoding the hexadecimal escapes because some of the escaped
characters may be +.
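To see why the order matters, here is the same decoding logic packaged as a routine and run on a sample string. This is a sketch: the name parse_form and the sample input are made up for illustration. The sample contains an escaped +, an escaped =, and an escaped &.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a raw form submission into a hash.
sub parse_form {
    my ($stuff) = @_;
    my %entries;
    for my $pair (split(/&/, $stuff)) {
        next unless length $pair;
        # Split first: an escaped "=" or "&" is still encoded here.
        my ($field, $val) = split(/=/, $pair, 2);
        $val = '' unless defined $val;
        for ($field, $val) {
            s/\+/ /g;                              # plus signs first
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # then hex escapes
        }
        $entries{$field} = $val;
    }
    return %entries;
}

my %entries = parse_form(
    "email=a%2Bb%40example.com&comments=1+%2B+1+%3D+2+%26+more");
print "$_ = $entries{$_}\n" for sort keys %entries;
# prints:
#   comments = 1 + 1 = 2 & more
#   email = a+b@example.com
```

Had the hexadecimal escapes been decoded before the plus signs, the + in the email address would have been turned into a space.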
Now that you have parsed the input into an associative array, you can do anything you like with the information. You must return a page to the user, however, as a result of their form submission:
print "Content-type: text/html\n\n";
if (open(MAIL, "| /usr/lib/sendmail webmaster")) {
    print MAIL <<"EOdoc";
From: The Comments Page <webmaster>
To: webmaster
Subject: Comments Mail

Mail from: $entries{"email"}

$entries{"comments"}
EOdoc
    close(MAIL);
    print <<"EOpage";
<TITLE>Thanks!</TITLE>
Thanks for taking the time to send us comments!<P>
We will be responding promptly.<P>
EOpage
}
else {
    print <<"EOpage";
<TITLE>Bummer!</TITLE>
We encountered an error trying to send your comments.<P>
Please send mail to <I>webmaster\@netmarket.com</I><P>
EOpage
}
Be VERY careful about what you do with the data you collect from a
form: remember that the user can type ANYTHING into that form and
could cause huge amounts of havoc if you trust what they type in. Do
not ever allow form data to be used as part of a command that you
execute from your script. Notice that I will not even put the user's
email address in the From: line of my message because
that data might be used to generate a sendmail command if the email
bounces.
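If you do need to act on something the user typed, one approach is to accept only input that matches a narrow pattern and reject everything else. The sketch below is illustrative only: the names safe_address and one_line are hypothetical, and the pattern is deliberately far stricter than real email syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the address only if it is built from a
# conservative set of characters and does not start with a hyphen.
# Shell metacharacters, whitespace, and newlines all cause rejection.
sub safe_address {
    my ($addr) = @_;
    return unless defined $addr;
    return $1
        if $addr =~ /\A([A-Za-z0-9][A-Za-z0-9._+-]*\@[A-Za-z0-9][A-Za-z0-9.-]*)\z/;
    return;
}

# Hypothetical helper: collapse line breaks so that user text cannot
# begin a new header line when placed in a mail header.
sub one_line {
    my ($text) = @_;
    $text =~ s/[\r\n]+/ /g;
    return $text;
}

for my $candidate ('user@example.com', 'user@example.com; rm -rf /') {
    my $addr = safe_address($candidate);
    print defined $addr ? "accepted: $addr\n" : "rejected\n";
}
```

Even with a check like this, the safest course remains the one described above: keep form data out of commands entirely.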
Further Study

The best way to become familiar with CGI is to start writing some CGI programs. You will probably want to install your own HTTP server so that you can play around with the configuration. NCSA httpd (available via anonymous FTP from ftp.ncsa.uiuc.edu) is free and easy to build and configure, though it is not the fastest server in the world. You will also want to study the CGI overview (https://hoohoo.ncsa.uiuc.edu/cgi/overview.html) and the tips for writing secure CGI scripts (https://hoohoo.ncsa.uiuc.edu/cgi/security.html).

Sample CGI programs are available all over the Web (NCSA has a small archive of examples to get you started).

Reproduced from ;login: Vol. 20 No. 4, August 1995.