|
Perl Practicum: Network Wiles (Part II)
by Hal Pomeranz
In the last installment, we saw how to program a network client by
writing a simple tool to get pages from remote Web servers. In this
issue, we will explore how to write a simple network server. As an
example project, we will actually write a simpleminded Web server (the
complete code is presented at the end of this article in case you find
it easier to follow along that way). Reread the previous issue if you
think you have forgotten any of the basic networking concepts I
presented there.
Getting Started
The first thing a network server must do is set up a socket upon which
it can accept requests. The first phase of this process looks a lot
like the initial code of a network client:
|
use Socket;
$this_host = 'my-server.netmarket.com';
$port = 8080;
$server_addr = (gethostbyname($this_host))[4];
$server_struct = pack("S n a4 x8", AF_INET, $port, $server_addr);
$proto = (getprotobyname('tcp'))[2];
socket(SOCK, PF_INET, SOCK_STREAM, $proto) ||
    die "Failed to initialize socket: $!\n";
|
First, the program has to pull in the Perl Socket.pm
module. The hostname of the machine upon which the server will run and
the port upon which it will accept requests are specified on the next
two lines (you can imagine getting these parameters out of a
configuration file or on the command line). The program then calls
gethostbyname() to get the IP address of the server
machine and uses that information to create a C structure which we
will use later. Finally, we call socket() to create a
file handle for the socket.
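The hand-written pack() template assumes one particular layout for the address structure, and that layout can differ between operating systems. As a hedged alternative, the sketch below builds the same kind of structure with the sockaddr_in() and inet_aton() helpers exported by Socket.pm. The loopback address 127.0.0.1 is an assumption, chosen only so that the example runs without a DNS lookup.

```perl
use strict;
use warnings;
use Socket;

my $port = 8080;

# inet_aton() turns a dotted quad (or a hostname) into the packed
# 4-byte address that gethostbyname() returns in element [4].
my $server_addr = inet_aton('127.0.0.1')
    or die "Could not resolve address\n";

# In scalar context, sockaddr_in() packs the port and address into
# the structure that bind() and connect() expect, using the local
# system's own layout.
my $server_struct = sockaddr_in($port, $server_addr);

# Unpack it again to confirm what was stored.
my ($stored_port, $stored_addr) = unpack_sockaddr_in($server_struct);
print inet_ntoa($stored_addr), ":$stored_port\n";    # prints 127.0.0.1:8080
```

The resulting $server_struct can be passed to bind() exactly as the hand-packed version is.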
Remember from the last article that Web servers usually wait for
connections on port 80. Why does the code above specify the port as
8080? As a security feature, only the superuser is allowed to run
servers that accept connections on ports below 1024. The thinking
behind this policy is that users should then be able to trust
connecting to unknown machines as long as they are connecting to
services (like Telnet, FTP, gopher, et al.) that listen for
connections at low port numbers because they will require the system
manager at the remote site to "approve" the service being run on those
ports. This reasoning probably no longer holds in this age of
workstations on every desk, but the rule remains.
Returning to our example, the server now needs to prepare to receive
connections at the given address and port combination:
|
setsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR, 1) ||
    die "setsockopt() failed: $!\n";
bind(SOCK, $server_struct) || die "bind() failed: $!\n";
listen(SOCK, SOMAXCONN) || die "listen() failed: $!\n";
|
The setsockopt() function allows the program to change
various parameters associated with the socket: more on
SO_REUSEADDR in a moment. The bind() call is
what actually associates the SOCK file handle with the
address and port number pair specified at the top of the program. As
long as any program has bound itself to a particular address and port,
no other program can bind to the same location. This is useful and
prevents confusion. However, even after a given server program has
exited, its address/port combination may not become available for
reuse right away - even if you rerun the exact same program. The
system can hold the address for some time (typically up to a few
minutes) while old connections finish closing. This is annoying and
creates bad feelings. Use setsockopt() to set the SO_REUSEADDR
option to 1 (true) - BEFORE the call to bind() - so that the same
port can be bound again as soon as the server program has exited.
Both the SOL_SOCKET and SO_REUSEADDR constants are defined in
Socket.pm.
The listen() call is probably misnamed. All this function
does is specify how long a queue of pending connection attempts the
server is willing to deal with. If the server queue is full, further
connection attempts will be rejected. On almost every socket
implementation in existence, the maximum queue length that you can set
is 5 (so handle incoming connection requests quickly!), and
SOMAXCONN (another helpful constant from
Socket.pm ) is usually set to 5. If you try to set the
queue length to a value above 5, the operating system silently
throttles the queue length back to the maximum value. Solaris 2.x is
the only modern operating system that I am aware of where you can
meaningfully specify queue length values that are greater than 5
(though interestingly SOMAXCONN is still given as 5 in
the Solaris 2.x system header files).
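One way to check these settings on a particular machine is to ask the system directly. The following is a minimal sketch, not part of the server itself: it sets SO_REUSEADDR on a fresh socket, reads the option back with getsockopt(), and prints the local value of SOMAXCONN. The lexical handle $sock and the fallback protocol number 0 are choices made for this example.

```perl
use strict;
use warnings;
use Socket;

# Fall back to 0 (the system default for SOCK_STREAM) if the
# protocol table cannot be read.
my $proto = getprotobyname('tcp') || 0;
socket(my $sock, PF_INET, SOCK_STREAM, $proto)
    or die "Failed to initialize socket: $!\n";

# Turn the option on, as the server does before calling bind().
setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "setsockopt() failed: $!\n";

# getsockopt() returns the option value as a packed integer.
my $packed = getsockopt($sock, SOL_SOCKET, SO_REUSEADDR)
    or die "getsockopt() failed: $!\n";
my $reuse = unpack("i", $packed);

print "SO_REUSEADDR is ", ($reuse ? "on" : "off"), "\n";
print "SOMAXCONN on this system is ", SOMAXCONN, "\n";
close($sock);
```

The SOMAXCONN value printed comes from the system headers Perl was built against, so it is only a hint about the queue length the operating system will actually honor.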
Dealing with Pending Requests
At this point, most network servers go into a tight loop so that they
can rapidly deal with their queue of pending network connections:
|
for (;;) {
    $remote_host = accept(NEWSOCK, SOCK);
    die "accept() error: $!\n" unless ($remote_host);
    # do some work here
    close(NEWSOCK);
}
|
The accept() call grabs the next connection request off
the pending queue for SOCK . (If there are no pending
connections, accept() pauses until one comes in.) A new
socket that is the local endpoint of this new communications channel
is created. If you print to NEWSOCK you are sending data
to the remote machine making the connection, and you can read data
from NEWSOCK just like any other file handle to get data
from the remote machine. Always remember to close NEWSOCK
when it is no longer needed.
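To watch accept() hand over a connection without a second machine, a single script can play both roles. The sketch below is an illustration under stated assumptions: it binds to the loopback address, asks for port 0 so that the operating system picks any free port, and uses lexical handles ($server, $client, $conn) rather than the bareword handles used elsewhere in this article. The client's connect() can complete before accept() is called because the system parks the new connection on the pending queue.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('tcp') || 0;

# Listening socket on the loopback address; port 0 means "any free port".
socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
setsockopt($server, SOL_SOCKET, SO_REUSEADDR, 1) or die "setsockopt: $!\n";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!\n";
listen($server, SOMAXCONN)                       or die "listen: $!\n";

# Find out which port the system assigned.
my ($port) = unpack_sockaddr_in(getsockname($server));

# The client side: this connection waits on the pending queue.
socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
connect($client, sockaddr_in($port, INADDR_LOOPBACK))
    or die "connect: $!\n";

# The server side: take the connection off the queue and answer it.
my $remote_host = accept(my $conn, $server) or die "accept: $!\n";
print $conn "hello from the server\n";
close($conn);    # closing flushes the buffered output

# Back on the client side, read what the server sent.
my $reply = <$client>;
close($client);
close($server);
print $reply;
```

Closing $conn matters here: output to a socket handle is buffered, so the client may see nothing until the handle is flushed or closed.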
The accept() function returns a C structure containing
the address of the remote machine (or undef if the
accept() fails for any reason). This structure is the
same as the one passed to bind() and
connect() , and you can extract the IP address of the
remote machine as follows:
|
$raw_addr = (unpack("S n a4 x8",$remote_host))[2];
@octets = unpack("C4", $raw_addr);
$address = join(".", @octets);
|
You can also obtain the hostname of the remote host (usually) with the
gethostbyaddr() function:
|
$hostname = (gethostbyaddr($raw_addr,AF_INET))[0];
|
This can be useful for logging purposes. Note the reappearance of
AF_INET - gethostbyaddr() needs to be told what type of
network address it is being given.
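Socket.pm also provides helpers that do this unpacking without a hand-written template. The sketch below wraps them in a helper named peer_info(), which is a hypothetical name introduced for this example. Its input is a hand-built stand-in for an accept() return value, so the code runs without a live connection.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: decode the structure returned by accept()
# into a dotted-quad address and the remote port number.
sub peer_info {
    my ($remote_host) = @_;
    my ($port, $raw_addr) = unpack_sockaddr_in($remote_host);
    return (inet_ntoa($raw_addr), $port);
}

# Stand-in for an accept() return value (address and port are made up).
my $fake_peer = sockaddr_in(51234, inet_aton('192.0.2.7'));

my ($address, $port) = peer_info($fake_peer);
print "$address (remote port $port)\n";    # prints 192.0.2.7 (remote port 51234)
```

The packed address returned by unpack_sockaddr_in() is the same 4-byte value that gethostbyaddr() expects as its first argument.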
A Simple Web Server
Up to this point, we've been fleshing out the basic skeleton that
every network server application has to have. Now let's do something
interesting with it.
HTTP is an incredibly simpleminded protocol. Requests sent by the Web
browser are simply lines of ASCII text, terminated by a blank
line. After seeing the blank line, the server sends back the requested
data and shuts down the connection. Although the client typically
sends over a great deal of useful information in its request, a simple
Web server can ignore everything except the line that looks like:
|
GET /some/path/to/file.html ...
|
Here's some code that reads the client request and extracts the path
to the information that the user is requesting:
|
while (<NEWSOCK>) {
    last if (/^\s*$/);
    next unless (/^GET /);
    $path = (split(/\s+/))[1];
}
|
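The same parsing can be tried without a socket by feeding the loop a canned request. In this sketch, the helper path_from_request_line() and the sample request lines are assumptions made for illustration. Declaring $path fresh for each request also prevents a path left over from an earlier request from being reused when a client sends no GET line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the path from a "GET" request line,
# or undef if the line is not a GET request.
sub path_from_request_line {
    my ($line) = @_;
    return undef unless $line =~ /^GET\s+(\S+)/;
    return $1;
}

# A canned request standing in for the lines read from NEWSOCK.
my @request = (
    "GET /index.html HTTP/1.0\r\n",
    "User-Agent: ExampleBrowser/1.0\r\n",
    "\r\n",
);

my $path;    # starts out undefined for every request
for my $line (@request) {
    last if $line =~ /^\s*$/;    # a blank line ends the request
    my $found = path_from_request_line($line);
    $path = $found if defined $found;
}

print defined($path) ? "$path\n" : "no GET line seen\n";    # prints /index.html
```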
Now the server has to respond. Typically $path is
relative to the top of some directory hierarchy where your Web
documentation lives - your $docroot in Web-speak. This
directory can be defined in a config file or on the
command line. Assuming that $docroot has been defined
elsewhere, we can simply write:
|
if (open(FILE, "< $docroot$path")) {
    @lines = <FILE>;
    print NEWSOCK @lines;
    close(FILE);
}
else {
    print NEWSOCK <<"EOErrMsg";
<TITLE>Error</TITLE><H2>Error</H2>
The following error occurred while
trying to retrieve your information:
$!
EOErrMsg
}
|
If we are able to open the requested file, we simply dump its contents
down NEWSOCK . Note that the server sends back an error
message if the open() fails. Never forget that there is
somebody on the other end of that connection who is waiting to hear
something back as a result of his or her request.
Congratulations. If you glue together all the code fragments in this
article, you will have a bare-bones Web server. You will find all of
the code in proper order at the end of this article to make it easier
to review all the concepts presented here.
That's Not All
Although this Web server "works" as far as answering simple requests
for information, it has a number of problems. First and foremost, it
can handle only one request at a time, whereas most production-quality
servers can handle hundreds or thousands of simultaneous requests. Second, if
you run this server on your machine, I can request
|
/../../../../../../../etc/passwd
|
and get a copy of your password file. Obviously, a better access
control mechanism is needed.
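As a first line of defense, the server could refuse any path that tries to climb out of the document tree. The following is only a rough sketch of that idea, with a hypothetical helper named path_is_safe(). It does not deal with symbolic links inside the document tree or with any other form of access control.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the requested path may be appended
# to $docroot under this sketch's simple rules.
sub path_is_safe {
    my ($path) = @_;
    return 0 unless defined $path && $path =~ m{^/};    # must start with "/"
    return 0 if $path =~ /\0/;                           # no embedded NUL bytes
    for my $part (split m{/+}, $path) {
        return 0 if $part eq '..';                       # no climbing upward
    }
    return 1;
}

for my $path ('/index.html', '/docs/../index.html', '/../../etc/passwd') {
    printf "%-22s %s\n", $path, path_is_safe($path) ? "ok" : "rejected";
}
```

A server using this check would send its error message instead of calling open() whenever path_is_safe() returns false. Note that the sketch also rejects harmless paths such as /docs/../index.html, which errs on the side of caution.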
In the third and final installment of this series, we will look at
ways to solve these (and other) problems with our mini Web server.
|
#!/packages/misc/bin/perl
use Socket;
$docroot = '/home/hal/public_html';
$this_host = 'my-server.netmarket.com';
$port = 8080;
# Initialize C structure
$server_addr = (gethostbyname($this_host))[4];
$server_struct = pack("S n a4 x8", AF_INET, $port, $server_addr);
# Set up socket
$proto = (getprotobyname('tcp'))[2];
socket(SOCK, PF_INET, SOCK_STREAM, $proto) ||
    die "Failed to initialize socket: $!\n";
# Bind to address/port and set up pending queue
setsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR, 1) ||
    die "setsockopt() failed: $!\n";
bind(SOCK, $server_struct) || die "bind() failed: $!\n";
listen(SOCK, SOMAXCONN) || die "listen() failed: $!\n";
# Deal with requests
for (;;) {
    # Grab next pending request
    #
    $remote_host = accept(NEWSOCK, SOCK);
    die "accept() error: $!\n" unless ($remote_host);
    # Read client request and get $path
    while (<NEWSOCK>) {
        last if (/^\s*$/);
        next unless (/^GET /);
        $path = (split(/\s+/))[1];
    }
    # Print a line of logging info to STDOUT
    $raw_addr = (unpack("S n a4 x8", $remote_host))[2];
    $dot_addr = join(".", unpack("C4", $raw_addr));
    $name = (gethostbyaddr($raw_addr, AF_INET))[0];
    print "$dot_addr\t$name\t$path\n";
    # Respond with info or error message
    if (open(FILE, "< $docroot$path")) {
        @lines = <FILE>;
        print NEWSOCK @lines;
        close(FILE);
    }
    else {
        print NEWSOCK <<"EOErrMsg";
<TITLE>Error</TITLE><H2>Error</H2>
The following error occurred while trying to retrieve your information: $!
EOErrMsg
    }
    # All done
    close(NEWSOCK);
}
|
Reproduced from ;login: Vol. 21 No. 5, October 1996.
|