USENIX - Using Java 5

by Rik Farrow
rik@spirit.com

June, 1997

JavaOne, the yearly Java developers' conference in San Francisco, has just ended, and I have been poring over the press releases looking for interesting tidbits. I recently heard someone ask if there were any necessary Java applets on the Internet (this was with regard to security and letting Java through a firewall). One participant replied that his company has an applet that helps customers design an investment strategy. Many others knew only of flashy applets used to decorate Web pages.

Java does appear to be moving in the direction I expected: that is, not so much toward use within an Internet environment, but rather within an organization. Corel has a beta release of its desktop productivity software, all written in Java. Many vendors have built classes that let others write Java applications that interact with legacy systems. And JavaSoft has created JavaPC, which permits the use of 486 or Pentium systems as network computers, with or without Windows included. See https://java.sun.com/nav/read/products.html for details.

The announcement I have been waiting for is the Java accelerator, a little board that plugs into an ISA bus and has a picoJava chip on it. A picoJava chip will run Java applications at least twice as fast as a Pentium running at the same clock and executing just-in-time compiled Java code. This little accelerator would become the heart of the PC, using the existing CPU for I/O operations. You could feasibly turn a junker 386 into a quite respectable network computer with such a card. I talked to Sun's hardware division and asked if they had any plans in this direction. Apparently, Sun would be content to let someone else build this board. They did say that a high-end picoJava chip would cost less than $100.

Something Useful

In past columns, I tried to point out things I found interesting and important about Java. But I have yet to produce something useful in itself. To remedy this situation, I came up with an idea for an application that Web masters could actually find useful. Ideally, it would also work as an applet, so anyone with a Java-enabled browser could use it. Also, to make it more likely to run, I have reverted to JDK 1.0.2. JavaSoft has released JDK 1.1.1, but at this time, I have found that not many browsers will actually support the 1.1 APIs. I apologize for this lapse.

Java, as mentioned in the December column, has very nice networking support. It also applies severe restrictions on networking when operated from within most browsers (HotJava being the configurable exception; try https://java.sun.com/products/HotJava/). My idea is an application/applet that checks all the links found in HTML documents: a tame Web walker. If run as an applet, the Web walker will be able to check only the server from which it was loaded. In application form, it enforces its own restrictions, preventing it from checking links on servers other than the original target.

The results of the Web walker will show you whether all the local links in your Web pages, including images, actually point to a resource. You can modify this application so that it also checks whether remote resources exist, although the program will not walk through remote hierarchies as written.

The code itself can be found here.

WebWalker

WebWalker.java works like this. The main() routine creates a Frame, then calls init() and start(). These methods, which will also be called by the applet version, set up a TextArea (for displaying results), a Start button, and a TextField for entering the starting URL. Pressing the return key within the TextField or clicking the Start button invokes the checkStart() method which gets the ball rolling.

At first glance, a Web walker sounds like a great opportunity for using multiple threads. But the more I thought about it, the more sensible it seemed to keep things simple. After all, I don't want to build the next version of the Internet Worm, an application that will consume all network resources. It doesn't have to be screamingly fast, just accurate enough to check newly added HTML pages, or a Web server's file hierarchy.

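public void checkStart(String s) {
    String t = s.trim();    // ignore stray whitespace around the URL
    try {
        u1 = new URL(t);
    } catch (MalformedURLException e) {
        append("---URL not recognized: " + s);
        return;
    }
    stack.push(u1);         // seed the stack with the starting URL
    thread = new Thread(this);
    thread.start();         // run() does the walking in its own thread
}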

WebWalker starts by attempting to create a URL object with the string entered in the TextField. If this fails, an error message gets displayed in the TextArea. Java is rather finicky about URL formats (much more so than friendly browsers) and wants you to enter something like "https://somehost/dir/filename.html". Also, the behavior will be different if you run this as an applet within Netscape (I used 3.0 Gold for testing). Netscape includes support for more protocols in its Java libraries than does the JDK (which supports only the HTTP protocol in JDK 1.0.2). URLs that use other protocols, such as ftp or mailto, will be reported by WebWalker running stand-alone as "URL not recognized."
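To make the finickiness concrete, here is a minimal sketch, separate from WebWalker.java, that reports whether the URL constructor accepts a string. The class and method names (UrlCheck, isRecognized) are made up for illustration, and the code assumes a current JDK, which recognizes more protocols than JDK 1.0.2 did, so it only demonstrates the missing-protocol and unknown-protocol cases.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not part of WebWalker.java.
final class UrlCheck {
    // Returns true if java.net.URL accepts the string, false if the
    // constructor throws MalformedURLException (for example, when the
    // protocol is missing or no handler is known for it).
    @SuppressWarnings("deprecation") // new URL(String) is deprecated in recent JDKs
    static boolean isRecognized(String s) {
        try {
            new URL(s.trim());
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A complete URL with a protocol, and one without.
        System.out.println(isRecognized("https://somehost/dir/filename.html"));
        System.out.println(isRecognized("somehost/dir/filename.html"));
    }
}
```

A browser would happily guess the protocol for the second string; the URL class refuses it, which is why WebWalker reports such input rather than trying to repair it.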

If a URL object is created, it is pushed onto a stack object. I used a stack so that each HTML file could be scanned one at a time, rather than recursively scanning each new HTML file when it is referenced in the file. Then a Thread is created and the run() method started. We need this thread - without it, events, such as resizing the window where the application is running, will be ignored.

public void run() {
    while (true) {
        try {
            u = (URL) stack.pop();
        } catch (EmptyStackException e) {
            append("DONE: stack empty.");
            return;
        }
        if (table.containsKey(u)) continue; // We have already visited here
        table.put(u, u); // Else, add it to the hashed list
        append("Checking: " + u.toString());
        try {
            InputStream in = u.openStream();
            readStream(in);
        } catch (FileNotFoundException e) {
            append("===File Not Found: " + u.toString());
        } catch (IOException e) {
            append("===IOERROR: " +
                u.toString() + ": " + e.toString());
        }
    } // End while loop
}

Within the run() method, a while() loop is used to pop the top URL off the stack until the stack is empty. To prevent checking the same URL twice, a Hashtable is used. If the URL is found in the Hashtable, the loop continues. Otherwise, the URL is added to the Hashtable. The put() method obtains a four-byte hash by calling the URL's hashCode() method.
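The stack-plus-Hashtable pattern can be tried without any network at all. The following is only a sketch under stated assumptions: pages are plain strings, the "site" is an in-memory map from each page to the links it contains, and the class name Worklist and its walk() method are invented for this illustration. It uses generics, which JDK 1.0.2 does not have.

```java
import java.util.ArrayList;
import java.util.EmptyStackException;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Stack;

// Hypothetical stand-in for WebWalker's loop; not the column's code.
final class Worklist {
    // Visits every page reachable from start exactly once and returns
    // the pages in the order they were checked.
    static List<String> walk(String start, Map<String, String[]> site) {
        Stack<String> stack = new Stack<>();
        Hashtable<String, String> visited = new Hashtable<>();
        List<String> order = new ArrayList<>();
        stack.push(start);
        while (true) {
            String page;
            try {
                page = stack.pop();
            } catch (EmptyStackException e) {
                return order;                 // nothing left to check
            }
            if (visited.containsKey(page)) continue; // already visited
            visited.put(page, page);
            order.add(page);
            String[] links = site.get(page);
            if (links == null) continue;      // dead link: nothing to scan
            for (String link : links) stack.push(link);
        }
    }
}
```

Even when pages link back to each other, the loop terminates, because a page already in the table is skipped instead of being scanned again.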

The private append() method displays messages in the TextArea shown within the Frame or Applet panel. Next, WebWalker attempts to get an InputStream from the URL and proceeds to scan it for more image (SRC) or hypertext (HREF) references. If the open fails, the resource was not found, and an error message, "===File Not Found", gets displayed. In other words, this link does not point to a valid resource.

I won't go into all the details of this longer-than-usual example. The fruit of this application comes when it discovers links that do not point to existing resources. WebWalker checks that all URLs have the same host portion as the starting URL. For most browsers, the SecurityManager will enforce this restriction.
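A host comparison of this kind might look like the sketch below. It is not taken from WebWalker.java: the class HostCheck and the method sameHost() are hypothetical, the host names are placeholders, and java.net.URI is used instead of URL (an assumption made here for convenience; URI did not exist in JDK 1.0.2).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of WebWalker.java.
final class HostCheck {
    // True only if both strings parse and name the same host.
    // Host names are compared without regard to case.
    static boolean sameHost(String start, String link) {
        try {
            String a = new URI(start).getHost();
            String b = new URI(link).getHost();
            return a != null && a.equalsIgnoreCase(b);
        } catch (URISyntaxException e) {
            return false;   // unparsable links are never followed
        }
    }

    public static void main(String[] args) {
        System.out.println(sameHost("http://www.example.com/index.html",
                                    "http://www.example.com/img/logo.gif"));
        System.out.println(sameHost("http://www.example.com/index.html",
                                    "http://other.example.org/"));
    }
}
```

Links that have no host at all, such as mailto addresses, fail the comparison and are simply not walked.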

WebWalker could be improved in several ways. For example, HTML files are scanned a line at a time, which is slow. Also, each line is checked for one HREF or one SRC tag; if there is more than one of either of these per line, the second one is ignored. Other tags, such as applet tags, are currently ignored. It would be easy to add more tags, but I would prefer a more elegant solution than rescanning each line using the String.indexOf() method.
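One possible way to pick up every HREF and SRC on a line in a single pass is a regular expression. This is a sketch only, not the column's approach: java.util.regex arrived in much later JDKs than 1.0.2, the names LinkScanner and linksIn() are invented here, and the pattern handles only double-quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of WebWalker.java.
final class LinkScanner {
    // Matches href="..." or src="..." in any letter case.
    // Assumption: values are enclosed in double quotes.
    private static final Pattern LINK = Pattern.compile(
        "\\b(?:href|src)\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every HREF or SRC value on the line, in order of appearance.
    static List<String> linksIn(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Adding another attribute means extending the alternation in the pattern rather than adding another indexOf() pass over the line.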

Instead of reading each non-HTML file, a subclass of URLConnection could be used to simply check for the existence of the resource by getting just the resource's header. Another idea would be to keep track of line numbers, so that each invalid URL could be flagged with its line number within the file checked. And the application version could save its results as a file.
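The header-only check could be sketched with HttpURLConnection, a URLConnection subclass that is not available in JDK 1.0.2 but is in later releases. The class HeadCheck, the method headStatus(), and the five-second timeouts are assumptions made for this illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration; not part of WebWalker.java.
final class HeadCheck {
    // Sends an HTTP HEAD request and returns the status code,
    // or -1 if the address is unusable or the request fails.
    static int headStatus(String address) {
        try {
            HttpURLConnection c = (HttpURLConnection)
                URI.create(address).toURL().openConnection();
            c.setRequestMethod("HEAD");   // ask for headers only, no body
            c.setConnectTimeout(5000);    // assumed timeouts, in milliseconds
            c.setReadTimeout(5000);
            try {
                return c.getResponseCode();
            } finally {
                c.disconnect();
            }
        } catch (IOException | RuntimeException e) {
            // Bad syntax, a non-HTTP protocol, or a network failure.
            return -1;
        }
    }
}
```

A caller would treat a 200 status as "resource exists" and a 404 as a broken link, without ever downloading the image or archive itself.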

Mea Culpa

That's Latin for "my fault." I have made some mistakes in past columns and would like a chance to rectify them.

David Holmes of the Microsoft Research Institute (Macquarie University, Sydney, Australia) wrote me in March to tell me that my February column about threads had some serious errors. I checked, and he is right. First, I was confused about the role of ThreadDeath exceptions. When a thread exits normally, returning from its run() method, it returns to a handling method from which it was called. This method cleans up after the thread by removing the thread from its threadgroup and invoking notifyAll() on the Thread object. However, if a thread is stopped prematurely (with stop()), ThreadDeath is thrown. The thread can catch the ThreadDeath exception and clean up, but it must rethrow this exception because it will be handled by the calling method, which then cleans up after the thread as if its run() method had returned.

I was also quite wrong about the function of wait(). I had it backwards, in fact, which came about from thinking like a UNIX programmer instead of a Java programmer (and from not reading the Java Language Specification, which David points out is the only correct source of information). A thread can call wait() on an object only if it already holds that object's lock. The wait() and notify() methods are designed to handle synchronization between threads, allowing one thread to communicate with others. As an example, imagine that one thread has locked an object, only to discover that it needs another resource to complete processing the locked object. By waiting, this thread releases the lock and will eventually reacquire the lock after being notified by some other thread. The whole idea is a nonspecific mechanism for interthread communication. Because a wait() is nonspecific, it will typically be called within a while() loop that checks a condition variable to see if the thread should wait() again.
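A small example may help here. The one-slot mailbox below is a sketch written for this purpose, not code from any earlier column; the class Slot and its methods are hypothetical, and it uses modern Java syntax. Both methods are synchronized, so the calling thread holds the lock before it calls wait(), and each wait() sits inside a while() loop that rechecks the condition.

```java
// Hypothetical one-slot mailbox illustrating wait() and notifyAll().
final class Slot {
    private Integer value;   // null means the slot is empty

    synchronized void put(int v) throws InterruptedException {
        while (value != null) wait();  // slot full: release lock and wait
        value = v;
        notifyAll();                   // wake any thread waiting in take()
    }

    synchronized int take() throws InterruptedException {
        while (value == null) wait();  // slot empty: release lock and wait
        int v = value;
        value = null;
        notifyAll();                   // wake any thread waiting in put()
        return v;
    }

    // Runs a producer thread against a consumer (the calling thread)
    // and returns the sum of the five values passed through the slot.
    static int demo() {
        final Slot slot = new Slot();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) slot.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 5; i++) sum += slot.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```

The while() loops matter because a notification says only that something changed, not what changed; a thread that wakes up and finds the slot still unusable just waits again.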

I would like to write about JavaBeans sometime, but am also looking for other Java programmers who would like to write a Java column, about Beans or the topic of your heart's desire. Send proposals by email to kolstad@usenix.org or to me, and your work can fill this slot in ;login: some month.

First published in ;login:, Volume 22, No. 3, June 1997.
