June, 1997
JavaOne, the yearly Java developers' conference in San Francisco, has just
ended, and I have been poring over the press releases looking for
interesting tidbits. I recently heard someone ask if there were any necessary
Java applets on the Internet (this was with regard to security and letting Java
through a firewall). One participant replied that his company has an applet
that helps customers design an investment strategy. Many others knew only of
flashy applets used to decorate Web pages.
Java does appear to be moving in the direction I expected: that is, not
so much toward use within an Internet environment, but rather within an
organization. Corel has a beta release of their desktop productivity software,
all written in Java. Many vendors have built classes that enable others to
write Java applications that use these classes to interact with legacy systems.
And JavaSoft has created JavaPC, which permits the use of 486 or Pentium
systems as network computers, with or without Windows included. See
https://java.sun.com/nav/read/products.html for details.
The announcement I have been waiting for is the Java accelerator, a little
board that plugs into an ISA bus and has a picoJava chip on it. A picoJava chip
will run Java applications at least twice as fast as a Pentium running at the
same clock and executing just-in-time compiled Java code. This little
accelerator would become the heart of the PC, using the existing CPU for I/O
operations. You could feasibly turn a junker 386 into a quite respectable
network computer with such a card. I talked to Sun's hardware division and
asked if they had any plans in this direction. Apparently, Sun would be content
to let someone else build this board. They did say that a high-end picoJava
chip would cost less than $100.
Something Useful
In past columns, I tried to point out things I found interesting and important
about Java. But I have yet to produce something useful in itself. To remedy
this situation, I came up with an idea for an application that Web masters
could actually find useful. Ideally, it would also work as an applet, so anyone
with a Java-enabled browser could use it. Also, to make it more likely to run,
I have reverted to JDK 1.0.2. JavaSoft has released JDK 1.1.1, but at this
time, I have found that not many browsers will actually support the 1.1 APIs. I
apologize for this lapse.
Java, as mentioned in the December column, has very nice networking support. It
also applies severe restrictions on networking when run from within most
browsers (HotJava being the configurable exception; try https://java.sun.com/products/HotJava/).
My idea is an application/applet that checks all the links found in HTML
documents, a tame Web walker. If run as an applet, the Web walker will be able
to check only the server from which it was loaded. In application form, it
imposes its own restriction, refusing to check links on servers
other than the original target.
The results of the Web walker will show you whether all the local links in your
Web pages, including images, actually point to a resource. You can modify this
application so that it also checks whether remote resources exist,
although the program will not walk through remote hierarchies as written.
The code itself can be found here.
WebWalker
WebWalker.java works like this. The main() routine creates a
Frame, then calls init() and start(). These methods,
which will also be called by the applet version, set up a TextArea (for
displaying results), a Start button, and a TextField for entering the starting
URL. Pressing the return key within the TextField or clicking the Start button
invokes the checkStart() method, which gets the ball rolling.
At first glance, a Web walker sounds like a great opportunity for using
multiple threads. But the more I thought about it, the more sensible it seemed
to keep things simple. After all, I don't want to build the next version of the
Internet Worm, an application that will consume all network resources. It
doesn't have to be screamingly fast, just accurate enough to check newly added
HTML pages, or a Web server's file hierarchy.
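public void checkStart(String s) {
    // Trim stray whitespace, then parse the trimmed text
    String t = s.trim();
    try {
        u1 = new URL(t);
    } catch (MalformedURLException e) {
        append("---URL not recognized: " + t);
        return;
    }
    stack.push(u1);
    // Walk in a separate thread so the user interface stays responsive
    thread = new Thread(this);
    thread.start();
}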
WebWalker starts by attempting to create a URL object with the string entered
in the TextField. If this fails, an error message gets displayed in the
TextArea. Java is rather finicky about URL formats (much more so than friendly
browsers) and wants you to enter something like
"https://somehost/dir/filename.html". Also, the behavior will be different if
you run this as an applet within Netscape (I used the 3.0 Gold for testing).
Netscape includes support for more protocols in its Java libraries than does
the JDK (which supports only the HTTP protocol in JDK 1.0.2). Other protocols,
such as ftp or mailto, will be reported by WebWalker running stand-alone as
"Not recognized."
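The strictness described above is easy to see in isolation. The following is a minimal sketch in present-day Java, not an excerpt from WebWalker; the class name UrlCheck, the helper problemWith(), and the "bogus" protocol are made up for illustration. Which protocols are accepted depends on the handlers shipped with the Java runtime in use, so the sample deliberately uses a protocol that no runtime is expected to know.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlCheck {
    // Returns null when the text parses as a URL, otherwise the parser's complaint.
    static String problemWith(String text) {
        try {
            new URL(text.trim());
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://somehost/dir/filename.html", // full URL, accepted
            "somehost/dir/filename.html",         // no protocol, rejected
            "bogus://somehost/"                   // no handler for this protocol, rejected
        };
        for (String s : samples) {
            String problem = problemWith(s);
            System.out.println(s + " -> " + (problem == null ? "ok" : "rejected: " + problem));
        }
    }
}
```

A browser would happily guess the protocol for the second sample; the URL class does not, which is why WebWalker reports such input instead of trying to repair it.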
If a URL object is created, it is pushed onto a stack object. I used a stack so
that each HTML file could be scanned one at a time, rather than recursively
scanning each new HTML file when it is referenced in the file. Then a Thread is
created and started, which invokes the run() method. We need this thread:
without it, events, such as resizing the window where the application is
running, will be ignored.
public void run() {
    while (true) {
        try {
            u = (URL) stack.pop();
        } catch (EmptyStackException e) {
            append("DONE: stack empty.");
            return;
        }
        // Look the URL up by key; contains() would search the values instead
        if (table.containsKey(u)) continue; // We have already visited here
        table.put(u, u); // Else, add it to the hashed list
        append("Checking: " + u.toString());
        try {
            InputStream in = u.openStream();
            readStream(in);
        } catch (FileNotFoundException e) {
            append("===File Not Found: " + u.toString());
        } catch (IOException e) {
            append("===IOERROR: " +
                u.toString() + ": " + e.toString());
        }
    } // End while loop
}
Within the run() method, a while() loop is used to
pop the top URL off the stack until the stack is empty. To prevent checking the
same URL twice, a Hashtable is used. If the URL is found in the Hashtable, the
loop continues. Otherwise, the URL is added to the Hashtable. The
put() method obtains the key's hash, a four-byte int, by calling the
URL's hashCode() method.
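The combination of a stack of pending pages and a table of visited ones can be tried without any network access. The sketch below is an illustration under stated assumptions, not WebWalker's code: the class WorklistSketch, the method walk(), and the in-memory "site" map (a stand-in for fetching a page and scanning it for links) are all hypothetical, and pages are keyed by plain strings rather than URL objects.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Stack;

class WorklistSketch {
    // Visits every page reachable from start exactly once and returns the visit order.
    // 'links' maps a page to the links found in it (hypothetical stand-in for a fetch).
    static List<String> walk(String start, Map<String, List<String>> links) {
        Stack<String> stack = new Stack<>();
        Hashtable<String, String> visited = new Hashtable<>();
        List<String> order = new ArrayList<>();
        stack.push(start);
        while (!stack.empty()) {
            String page = stack.pop();
            if (visited.containsKey(page)) continue; // already checked
            visited.put(page, page);
            order.add(page);
            List<String> found = links.get(page);
            if (found == null) continue;             // nothing to scan: missing page
            for (String next : found) stack.push(next);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
            "/index.html", List.of("/a.html", "/b.html"),
            "/a.html", List.of("/index.html", "/b.html"),
            "/b.html", List.of("/missing.html"));
        // prints [/index.html, /b.html, /missing.html, /a.html]
        System.out.println(walk("/index.html", site));
    }
}
```

Even though /a.html links back to /index.html, the loop terminates, because the visited table stops a page from being scanned a second time.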
The private append() method displays messages in the TextArea
shown within the Frame or Applet panel. Next, WebWalker attempts to get an
InputStream from the URL and proceeds to scan it for further image or link
references. If the open fails, the resource was not found, and an error
message, "===File Not Found", gets displayed. In other words, this link does
not point to a valid resource.
I won't go into all the details of this longer-than-usual example. The fruit of
this application comes when it discovers links that do not point to existing
resources. WebWalker checks that all URLs have the same host portion as the
starting URL. For most browsers, the SecurityManager will enforce this
restriction.
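A same-host test of this kind might look like the sketch below. It is an assumption about how such a check could be written, not the method WebWalker uses; the class HostFilter and the method sameHost() are hypothetical names. The two-argument URL constructor resolves a relative link against the page that contains it, so relative links count as local.

```java
import java.net.MalformedURLException;
import java.net.URL;

class HostFilter {
    // True when 'link', resolved against 'page', names the same host as 'page'.
    // Host names are compared without regard to case; unparsable input counts as foreign.
    static boolean sameHost(String page, String link) {
        try {
            URL base = new URL(page);
            URL target = new URL(base, link); // resolves relative links against the page
            return base.getHost().equalsIgnoreCase(target.getHost());
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String page = "https://somehost/dir/index.html";
        System.out.println(sameHost(page, "images/logo.gif"));             // relative link
        System.out.println(sameHost(page, "https://otherhost/page.html")); // different host
    }
}
```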
WebWalker could be improved in several ways. For example, HTML files are
scanned a line at a time, which is slow. Also, each line is checked for one
HREF or one SRC attribute; if there is more than one of either per line,
the second one is ignored. Other tags, such as applet tags, are currently
ignored. It would be easy to add more tags, but I would prefer a more
elegant solution than rescanning each line using the
String.indexOf() method.
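One way to pick up every HREF and SRC on a line is a regular expression. This swaps in a different technique from the indexOf() scan described above: the java.util.regex package did not exist in JDK 1.0.2, so the sketch applies only to later Java versions. The class LinkScanner and the method linksIn() are hypothetical names, and the pattern handles only quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkScanner {
    // Matches href="..." or src="..." with single or double quotes, ignoring case.
    // Unquoted attribute values are not handled by this simple pattern.
    private static final Pattern LINK = Pattern.compile(
        "(?i)\\b(?:href|src)\\s*=\\s*[\"']([^\"'>]+)[\"']");

    // Returns every HREF or SRC value on the line, in the order found.
    static List<String> linksIn(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<a href=\"a.html\"><img src=\"b.gif\"></a> <A HREF='c.html'>c</A>";
        System.out.println(linksIn(line)); // prints [a.html, b.gif, c.html]
    }
}
```

Adding another attribute, such as the CODE attribute of an applet tag, would mean extending the alternation in the pattern rather than writing another scan.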
Instead of reading each non-HTML file, a subclass of URLConnection could be
used to simply check for the existence of the resource by getting just the
resource's header. Another idea would be to keep track of line numbers, so that
each invalid URL could be flagged with its line number within the file checked.
And the application version could save its results as a file.
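For the header-only check, later Java versions offer HttpURLConnection, a URLConnection subclass that can issue an HTTP HEAD request. The sketch below is written under that assumption and is not part of WebWalker; the class HeadCheck, the methods headStatus() and exists(), and the five-second timeouts are illustrative choices.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class HeadCheck {
    // Returns the HTTP status of a HEAD request, or -1 if the address is not an
    // HTTP URL or the request fails. Only headers are transferred, never the body.
    static int headStatus(String address) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // milliseconds; arbitrary choice
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | ClassCastException e) {
            return -1; // malformed URL, non-HTTP protocol, or network failure
        }
    }

    // Treats 2xx and 3xx responses as "the resource is there".
    static boolean exists(int status) {
        return status >= 200 && status < 400;
    }

    public static void main(String[] args) {
        for (String address : args) {
            int status = headStatus(address);
            System.out.println(address + " -> " + status
                + (exists(status) ? " (found)" : " (missing or unreachable)"));
        }
    }
}
```

Some servers answer HEAD differently from GET, so a walker built this way might fall back to opening the stream when the HEAD result looks suspicious.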
Mea Culpa
That's Latin for "my fault." I have made some mistakes in past columns and
would like a chance to rectify them.
David Holmes of the Microsoft Research Institute (Macquarie University, Sydney,
Australia) wrote me in March to tell me that my February column about threads
had some serious errors. I checked, and he is right. First, I was confused
about the role of ThreadDeath exceptions. When a thread exits normally,
returning from its run() method, it returns to a handling method
from which it was called. This method cleans up after the thread by removing
the thread from its ThreadGroup and invoking notifyAll() on the
Thread object. However, if a thread is stopped prematurely (with
stop()), ThreadDeath is thrown. The thread can catch the
ThreadDeath exception and clean up, but it must rethrow this exception because
it will be handled by the calling method, which then cleans up after the thread
as if its run() method had returned.
I was also quite wrong about the function of wait(). I had it
backwards, in fact, which came about from thinking like a UNIX programmer
instead of a Java programmer (and from not reading the Java Language Specification,
which David points out is the only correct source of information).
The wait() method can be called only by a thread that already holds the
object's lock. The wait() and notify() methods are designed to
handle synchronization between threads, allowing one thread to communicate with
others. As an example, imagine that one thread has locked an object, only to
discover that it needs another resource to complete processing the locked
object. By waiting, this thread releases the lock and will eventually reacquire
the lock after being notified by some other thread. The whole idea is a
nonspecific mechanism for interthread communication. Because a
wait() is nonspecific, it will typically be called within a
while() loop that checks a condition variable to see if the thread
should wait() again.
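The pattern just described, waiting inside a loop that rechecks a condition, can be sketched as a one-slot mailbox. This is an illustration only; the class Mailbox and its methods put(), take(), and demo() are hypothetical names, and the lambda syntax assumes a much later Java than the JDK 1.0.2 used elsewhere in this column.

```java
class Mailbox {
    private String message; // the condition: null means the box is empty

    // Both methods are synchronized, so the caller holds this object's lock.
    synchronized void put(String m) {
        message = m;
        notifyAll();              // wake any thread waiting on this object
    }

    synchronized String take() throws InterruptedException {
        while (message == null) { // recheck the condition after every wakeup
            wait();               // releases the lock, reacquires it before returning
        }
        String m = message;
        message = null;
        return m;
    }

    // Starts a consumer that blocks in take() until this thread calls put().
    static String demo() {
        Mailbox box = new Mailbox();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = box.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        box.put("hello");
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints hello
    }
}
```

The result is the same whichever thread runs first: if put() happens before take(), the loop condition is already false and the consumer never waits.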
I would like to write about JavaBeans sometime, but I am also looking for other
Java programmers who would like to write a Java column, about Beans or the
topic of your heart's desire. Send proposals by email to kolstad@usenix.org or to me, and your work
can fill this slot in ;login: some month.
First published in ;login:, Volume 22, No. 3, June
1997.