The following paper was originally published in the Proceedings of the USENIX Fourth Annual Tcl/Tk Workshop, Monterey, California, July 1996.
For more information about USENIX Association contact:
1. Phone: (510) 528-8649
2. FAX: (510) 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org
The NR Newsreader
Jonathan L. Herlocker
herlocke@cs.umn.edu
Department of Computer Science
University of Minnesota
Minneapolis, MN 55455
Abstract
NR is a point and click GUI interface for browsing Usenet news. The NR interface is built using the Tk interface toolkit, and coded entirely in the Tcl scripting language. NR was designed as a framework for pursuing research into electronic information browsing. As a result, NR had strong requirements for configurability, extensibility, portability, and performance. This paper describes how those requirements were met through the use of features and extensions of the Tcl/Tk scripting language. The paper also describes some of the information filtering technologies implemented in NR.
Background
Usenet news is a medium that provides for collaboration and sharing of knowledge on a global scale through a bulletin-board model. Usenet is organized into thousands of newsgroups, each of which represents a general topic of interest. Every message posted to a newsgroup is seen by all readers who subscribe to that newsgroup. As Usenet news has increased its readership to millions, the quantity of text posted daily to popular newsgroups exceeds the amount of information that a person has time to process. As a result, reading Usenet news can be a frustrating process. A user must scan hundreds of subject lines each week in order to locate interesting information.
In an attempt to lessen the workload of users reading electronic news, I am experimenting with agent technologies that help a user locate information of interest. One of these technologies is content-based: an agent that observes and remembers the content of user-selected articles and then attempts to suggest new articles that contain similar content. The other technology, part of the GroupLens[1][7] project, is collaborative. GroupLens helps a user to locate new information based on the opinions of other users with similar interests. Both of these technologies are described in more detail at the end of the paper.
After designing the architecture of my agents, I found that there were no existing newsreaders that met my requirements for a prototype system. I wanted a GUI-based newsreader to provide more flexibility in creating interfaces for visual feedback. Existing GUI-based newsreaders such as XRN[6] did not lend themselves to easy extension of the graphical interface.
I chose to implement a prototype newsreader in Tcl/Tk, primarily because I knew from previous experience that I would be able to quickly build a working interface. In addition, the Tk text widget had the desirable property that it both understood how to lay out text efficiently and supported embedded graphics and controls.
Requirements
There were four major requirements that would make NR possible and successful.
Tcl is an ideal language for a Usenet client, because both the data and communication protocols in Usenet are entirely textual, and require little support for binary datatypes.
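As a small illustration of how well textual protocols fit Tcl's string and list commands, the following sketch parses one line of NNTP overview data. It assumes the server returns the standard tab-separated overview (XOVER) fields; the procedure name ParseOverviewLine, the field names, and the sample line are hypothetical, and NR's own parsing code is not shown in this paper.

```tcl
# Field order of an NNTP overview (XOVER) line; fields are separated by tabs.
# The names used here are illustrative, not taken from NR.
set overviewFields {number subject from date messageId references bytes lines}

# Turn one overview line into a flat name/value list suitable for "array set".
proc ParseOverviewLine { line } {
    global overviewFields
    set result {}
    foreach name $overviewFields value [split $line "\t"] {
        # Stop once the known field names run out (optional trailing fields).
        if { $name eq "" } { break }
        lappend result $name $value
    }
    return $result
}

# Made-up sample line for demonstration only.
set line "1234\tRe: Tk text widget\tuser@example.com\tMon, 8 Jul 1996 10:00:00 GMT\t<abc@example.com>\t<xyz@example.com>\t2048\t40"
array set article [ParseOverviewLine $line]
puts "$article(number): $article(subject) ($article(lines) lines)"
```

No binary unpacking is involved: a single split on the tab character yields every field as an ordinary Tcl string.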
One of my constant paranoias is that I will reinvent the wheel every time I write a piece of code. However, I was much more at ease when writing NR, due to the enormous amount of code that I reused in developing my application. Of the 16,360 lines of Tcl code currently in the NR application, only 6,350 lines were written by me.
The majority of the code I reused came from the exmh mail reader[10]. Organizing code in a fashion similar to exmh (as described in [10]) is a good way to create modularization for a language that provides little built-in support for it.
However, one of the frustrations that I experienced when writing NR was that there was no good place that I could go to locate code modules for code reuse. Current Tcl/Tk archives provide a centralized location for Tcl/Tk source code and applications, but these archives focus on applications and not on reusable code modules. In addition, all the sites provide the applications and code modules in the form of a monolithic list with nothing more than a one- or two-line abstract. A much more structured interface is needed for a Tcl/Tk code modules archive. This ideal archive for Tcl/Tk code modules will be hierarchically organized, so that if a programmer wants to find a common interface element (such as an extended listbox, file selection box, or extended canvas widget), he can move directly to a point in the hierarchy and only browse through applicable modules. The abstract text of all code modules and applications should be searchable, so that users can locate applications of interest, and programmers can locate Tcl code modules if they are unsure of their location in the hierarchy.
Configurability
Every user of Usenet news has developed their own habits of browsing electronic news, and many have no desire to change their habits. This was my stumbling block when I sought to obtain useful data to document my agent assistant. So NR had to support commonly desired features and be easily extended to support other features that users might desire. I also wanted to design NR such that it could be easily extended for research involving information browsing, such as user trace collection or development of agent assistants. Overall, NR had to be an application that was highly configurable in terms of features and interface.
    set moduleName "killFromAOL"
    set moduleVersion "1.0"
    set moduleDescription "This module will cause NR to discard all posts from authors at AOL (*@aol.com)"
    set moduleHelp "By default, the killFromAOL module will alert the user before discarding any posts, but killFromAOL can be configured to quietly and indiscriminately kill all posts from AOL. See the killFromAOL preferences for more."
    # Register one preference item: variable, resource name, default, label, help text
    Preferences_Add "Kill AOL" \
        "These options control preferences for the Kill AOL module" {
        {
            killAOL(showStats) \
            killAOLShowStats \
            ON \
            { Show statistics } \
            "This setting controls whether or not a dialog box will inform you whenever AOL articles are killed, report how many articles were killed, and provide you with the option of overriding."
        }
    }
    # Discard every article whose author matches the glob pattern
    NR_AddKillPattern author {*@aol.com}
Figure 2: Example feature module that will cause NR to discard all articles originating from authors at aol.com.
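How NR implements NR_AddKillPattern internally is not shown in this paper. As a rough sketch of the idea, glob-style kill patterns like the one in Figure 2 could be stored per header field and tested with the built-in string match command. The procedure names Kill_AddPattern and Kill_Matches, the killPatterns array, and the sample headers are all hypothetical.

```tcl
# Hypothetical kill-pattern table: header field -> list of glob patterns.
proc Kill_AddPattern { field pattern } {
    global killPatterns
    lappend killPatterns($field) $pattern
}

# Return 1 if any registered pattern matches the corresponding header value.
# headerList is a flat list of field/value pairs.
proc Kill_Matches { headerList } {
    global killPatterns
    foreach {field value} $headerList {
        if { ![info exists killPatterns($field)] } {
            continue
        }
        foreach pattern $killPatterns($field) {
            # -nocase: treat the address case-insensitively (an assumption of this sketch)
            if { [string match -nocase $pattern $value] } {
                return 1
            }
        }
    }
    return 0
}

Kill_AddPattern author {*@aol.com}
puts [Kill_Matches {author someone@AOL.com subject "Tcl question"}]
puts [Kill_Matches {author herlocke@cs.umn.edu subject "NR release"}]
```

Keeping the patterns in a plain array keyed by field name means a feature module only needs one call to register its pattern, which matches the style of the module in Figure 2.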
Adding new features to NR is made easy by providing a documented set of hook functions, which allow users to specify Tcl scripts to be executed at key points of the application. Statistics on newsreading can be collected through hooks that are called whenever an article or group is read. Extensions can control how and what articles are displayed through hook functions. Together these hooks can provide information and control to intelligent agent assistants.
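The hook interface itself is not listed in this paper, so the following is only a minimal sketch of how such a mechanism might look in Tcl. The names Hook_Add, Hook_Run, Stats_ArticleRead, and the articleRead hook are hypothetical and are not NR's documented API.

```tcl
# Hypothetical hook registry: hook name -> list of Tcl scripts.
proc Hook_Add { hookName script } {
    global hooks
    lappend hooks($hookName) $script
}

# Run every script registered for a hook, appending the call's arguments.
proc Hook_Run { hookName args } {
    global hooks
    if { ![info exists hooks($hookName)] } {
        return
    }
    foreach script $hooks($hookName) {
        # Evaluate at global level; a failing extension should not stop the others.
        if { [catch { uplevel #0 [concat $script $args] } err] } {
            puts stderr "hook $hookName failed: $err"
        }
    }
}

# Example extension: count how many articles are read in each group.
proc Stats_ArticleRead { group articleId } {
    global readCount
    if { ![info exists readCount($group)] } {
        set readCount($group) 0
    }
    incr readCount($group)
}

Hook_Add articleRead Stats_ArticleRead
Hook_Run articleRead comp.lang.tcl 1234
Hook_Run articleRead comp.lang.tcl 1235
puts "comp.lang.tcl: $readCount(comp.lang.tcl) articles read"
```

Because hooks are just stored scripts, an extension can be added without touching the code that calls Hook_Run, which is the property the paragraph above relies on.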
Good Performance
In order to create a viable user interface to Usenet news, NR has to provide performance comparable to that of existing newsreaders, which may be written in compiled languages such as C. Speed was a considerable problem with the initial implementation of NR. There were three sources of poor performance: the network protocol, content parsing of text, and insertion of text into the text widget.
Due to the inherent performance limitations of Tcl, post-processing of article text proved to be time consuming. In several cases, NR needs to parse large text articles, looking for regular expressions or patterns. Currently there is no good solution for these situations; however, in the future, NR may prefetch and preprocess article text off-line. Prefetching will also hide the network delay.
Portability
In order to reach a large cross section of users on the Internet, NR had to be portable to a variety of architectures. Tcl was an ideal language, because the Tcl language is almost entirely independent of architecture and operating system. However, there were some barriers to portability.
The cross-platform support introduced in Tcl7.5 and Tk4.1 has made NR accessible to millions of Microsoft Windows users on the Internet. However, making NR runnable on a Windows machine required considerable modification, due to heavy use of the "exec" Tcl command in some of NR's code modules. Here are some suggestions for writing portable Tcl code:
Resource Locking in Tcl
NR maintains a single network socket for communication with the server. Since NR allows the user to browse multiple groups at once (each in a separate window), access to the network socket must be controlled to prevent multiple groups from accessing the socket simultaneously. Some method of resource locking must be used. Locking is not completely intuitive in Tcl, because only a single execution context is supported. However, it turns out that there is a very simple solution to resource locking in Tcl. A single global variable is used to represent the lock, and processes desiring to obtain the lock enter a busy-wait loop, using the "after" command. For example, any Tcl procedure needing to access the network socket would have a form similar to the following:
    proc GetOverview { group } {
        global socketLock
        if { $socketLock == 1 } {
            # Lock is held: retry this same call in 500 ms
            after 500 [info level 0]
            return
        } else {
            set socketLock 1
            # Main code of procedure
            # follows
            ...
            set socketLock 0
        }
    }
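The pattern can be tried outside NR with a small self-contained sketch that runs in a plain tclsh. Everything here is hypothetical: UseSocket and ReleaseSocket stand in for procedures that use the shared socket, the lock is released by a timer to simulate an exchange that spans several event-loop cycles, and the retry interval is shortened to 50 ms so the demonstration finishes quickly.

```tcl
set socketLock 0
set log {}
set done 0

# Hypothetical stand-in for a procedure that needs the shared socket.
proc UseSocket { name holdMs } {
    global socketLock log
    if { $socketLock == 1 } {
        # Lock busy: reschedule this exact call (name and arguments) later.
        after 50 [info level 0]
        return
    }
    set socketLock 1
    lappend log "$name acquired"
    # Simulate an asynchronous exchange; the lock is released when it ends.
    after $holdMs [list ReleaseSocket $name]
}

proc ReleaseSocket { name } {
    global socketLock log done
    lappend log "$name released"
    set socketLock 0
    incr done
}

UseSocket groupA 100
UseSocket groupB 100
# Run the event loop until both callers have finished.
while { $done < 2 } { vwait done }
puts $log
```

The second caller finds the lock taken, reschedules itself, and only proceeds after the first caller has released the lock.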
Upon failing to acquire the lock, the procedure uses the after command to schedule itself again in 500 ms, in the hope that the lock will then be free. [info level 0] returns the current procedure name and arguments. This simple form of resource locking works because Tcl has only a single thread of execution, so reads and writes to the lock variable are guaranteed to be atomic. Those more familiar with the theory of process synchronization will point out that the method described above is not guaranteed to be fair. Processes waiting for a resource are not guaranteed to be serviced in a first-come, first-served manner. This was not a requirement in NR.
Examples of Using NR for Information Filtering
GroupLens is a collaborative filtering system that is currently targeted at Usenet news. A GroupLens-enhanced newsreader provides predictions as to how interesting a user will find each news article. GroupLens works by having users assign ratings to each news article they read based on how interesting they found that article. Ratings are sent to a central GroupLens server, known as the Better Bit Bureau or BBB. The BBB correlates all the users' ratings and clusters users of similar interests. The BBB then generates predictions for each user based on the ratings of users in the same interest cluster. The newsreader client can then choose how it wishes to use the predictions. It could just display them (as shown in Figure 3), sort by them, or even choose to display only articles with a prediction above a certain threshold.
Current Status of NR
An alpha release of NR is currently available from ftp://ftp.cs.umn.edu/dept/users/herlocke/nr. The extensibility interface and documentation have not been completed yet. Once work on both these issues is completed, a beta test of NR will be announced. If you wish to be added to the announcement mailing list, please send me an email note (herlocke@cs.umn.edu).
Conclusions
NR has grown from a simple prototype to a large application of around 16,000 lines. The process of evolving NR has been relatively painless because the Tcl language supports and encourages extensibility. I have shared some of my key experiences from developing NR, and I hope that the Tcl/Tk community will continue to develop highly configurable and extensible applications. To restate the key points:
Share your useful code. If you have written a useful code module, don't be afraid to share it with the rest of the Tcl/Tk community. Chances are that one of us would like to use it too. But please document it well.
Appendix - Features of NR
Figure 3: NR with GroupLens predictions. The predictions are represented here as ASCII scales. The larger the scale, the better the prediction.
Figure 4: NR with A-F predictions, based on keywords found in the subject line. Note that the ratings are actually menubuttons embedded into a text widget.
Figure 5: The embedded menubuttons are used both to provide more information about a prediction and to give the user the opportunity to provide feedback on a specific prediction.
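As a rough illustration of the ASCII scales in Figure 3 and of the threshold filtering mentioned in the information filtering section, the sketch below renders predictions as bars and keeps only articles above a cutoff. The 1 to 5 prediction range, the sample data, and the names AsciiScale and FilterByPrediction are assumptions made for this example; they are not taken from NR or from the GroupLens server.

```tcl
# Hypothetical predictions: article id followed by predicted interest,
# on an assumed 1-5 scale.
set predictions {101 4.6 102 1.8 103 3.2 104 4.9}

# Render a prediction as an ASCII bar; a longer bar means a better prediction.
proc AsciiScale { score {max 5} {width 10} } {
    set filled [expr { int(round(double($score) / $max * $width)) }]
    return "[string repeat # $filled][string repeat - [expr { $width - $filled }]]"
}

# Keep only articles at or above the threshold, best prediction first.
proc FilterByPrediction { predictions threshold } {
    set kept {}
    foreach {id score} $predictions {
        if { $score >= $threshold } {
            lappend kept [list $id $score]
        }
    }
    return [lsort -real -decreasing -index 1 $kept]
}

foreach item [FilterByPrediction $predictions 3.0] {
    foreach {id score} $item break
    puts [format "%-4s %s %.1f" $id [AsciiScale $score] $score]
}
```

Displaying, sorting, and thresholding are all client-side decisions here, so the same prediction list could drive any of the three presentations.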