2000 USENIX Annual Technical Conference, pp. 171-182 of the Proceedings
Integrating a Command Shell Into a Web Browser
 Robert C. Miller and Brad A. Myers Carnegie Mellon University
 {rcm,bam}@cs.cmu.edu
 Abstract
The transition from command-line interfaces to graphical interfaces has resulted
in programs that are easier to learn and use, but harder to automate and reuse.
Another transition is now underway, to HTML interfaces hosted by a web browser.
To help users automate HTML interfaces, we propose the browser-shell,
a web browser that integrates a command interpreter into the browser's Location
box. The browser-shell's command language is designed for extracting and manipulating
HTML and text, and commands can also invoke local programs. Command input is
drawn from the current browser page, and command output is displayed as a new
page. The browser-shell brings to web browsing many advantages of the Unix shell,
including scripting web services and creating pipelines of web services and
local programs. A browser-shell also allows legacy command-line programs to
be wrapped with an HTML/CGI interface that is graphical but still scriptable,
and offers a new shell interaction model, different from the conventional typescript
model, which may improve usability in some respects.
 
         
1  Introduction 
 
The transition from command-line interfaces to graphical interfaces carries
with it a significant cost. In the Unix shell, for example, programs accept
plain text as input and generate plain text as output. This makes it easy to
write scripts that automate user interaction. An expert Unix user can create
sophisticated programs on the spur of the moment, by hooking together simpler
programs with pipelines and command substitution. For example:
 
kill `ps ax | grep xterm | awk '{print $1;}'`
This command uses ps to list information about running processes, grep
to find just the xterm processes, awk to select just the process
identifiers, and finally kill to kill those processes. 
These capabilities are lost in the transition to a graphical user interface
(GUI). GUI programs accept mouse clicks and keystrokes as input and generate
raster graphics as output. Automating graphical interfaces is hard, unfortunately,
because mouse clicks and pixels are too low-level for effective automation and
interprocess communication. Attempts to introduce Unix shell features like pipelining
into graphical user interfaces [3,6,7,8,15,16]
have been unsuccessful, largely because they were not integrated well with existing
applications, required extra work from application developers to expose hooks
and programming interfaces, or were too hard to use.
 
With the advent of the World Wide Web, another transition is underway, this
time to distributed web applications that run on a web server and interact with
the user through a web browser. Most web services accept input from HTML forms
and generate output as HTML pages. Since HTML is textual and capable of being
parsed and manipulated, we have the opportunity to recover some of the interactive
automation capabilities that were available in the Unix shell, but missing in
graphical interfaces. Consider the following web-browsing tasks that could be
partially or totally automated:
 
 
 Download and print a group of links on a page; 
 Compare airfares and schedules for several choices of departure and arrival
dates; 
 Look up a colleague in the online university phonebook, obtain a home address,
locate the address on a map, and get driving directions; 
 Given a list of books to read, search for each book in the local library catalog,
and if the book is not on the shelves, buy it from an online bookstore; 
 Make a smart alarm clock that announces the current temperature from an online
weather report, and the time until the next bus departs from an online schedule,
while you dress in the morning. 
 
As a step towards automating these tasks and others, we have extended a web
browser in several ways:
 
 
 Embedding a pattern language for matching text and HTML, and a suite
of text-processing tools for extracting and manipulating web page data. High-level
pattern-matching and text manipulation are essential to web automation, acting
as a glue language for connecting unrelated web services and programs. 
 Embedding a scripting language and integrating a command interpreter
into the Location box. In addition to accepting a typed URL, the browser window's
Location box can also accept a typed command with arguments. A command may be
a built-in command, a user-defined script, or an external program. The built-in
scripting language includes commands for automatic web browsing, such as clicking
on hyperlinks, filling out forms, and extracting data from web pages.
 Using the browser window to display command output and construct pipelines
of commands. When a command is invoked, it takes its input from the current
page in the browser window, and sends its output back to the browser window
as a new page. 
 Including executed commands in the browsing history. Forward and Back
navigate through command output pages as well as web pages. Part of the history
can be extracted and saved as a script for later execution.
 
We have implemented these extensions in a prototype web browser named LAPIS
(Lightweight Architecture for Processing Information Structure). The first extension,
consisting of a pattern language and text-processing tools, was described in
a previous paper [14], which is summarized below. This paper focuses
on the other three features, which integrate a command shell into the web browser
to create a browser-shell.
 
The browser-shell addresses the problem of interactive web automation by allowing
the user to apply patterns, script commands, and external programs directly
to the browser page. For one-shot tasks, commands can be interleaved with manual
browsing to perform the task as quickly and directly as possible. For repeated
tasks, the user can interactively define a script by invoking a sequence of
commands on example data, using the Back button to correct mistakes, and then
copying the command sequence out of the browsing history and saving it as a
script.
 
The browser-shell concept has implications beyond web automation, two of which
are considered in this paper:
 
 
 HTML interfaces for local programs. Currently, programs with HTML interfaces
must be installed in a web server in order to handle form submissions. LAPIS
can submit forms to local programs by the Common Gateway Interface (CGI) [17],
an existing standard used by web servers. This opens the possibility of running
HTML applications entirely locally. HTML offers benefits of both a graphical
user interface (GUI) and a command-line interface (CLI). An HTML interface can
be as easy to learn and use as a GUI, yet still open to automation like a CLI.
As a demonstration, we have wrapped an HTML interface around the Unix find
program.
 Using the browser as a command shell, in place of the Unix shell or
MS-DOS command prompt. The browser-shell can be used to invoke local programs,
but it behaves differently from a conventional typescript shell. Whereas a typescript
shell interleaves commands with program output in the same window, a browser-shell
displays commands and program output in separate parts of the browser window,
and automatically redirects a program's input from the current page. These differences
make some tasks easier, such as viewing program output and constructing pipelines,
but others harder, such as running legacy programs that use standard input to
interact with the user. The tradeoffs are discussed in more detail in Section 5.
 
The remainder of this paper is organized as follows. Section 2
covers related work. Section 3 describes important features of the LAPIS
browser-shell, including the pattern language, the scripting language, and invocation
of external programs. Section 4 describes our prototype
implementation of LAPIS and contrasts some implementation alternatives. Section 5
discusses some of the implications of integrating a command shell into a web
browser, in particular creating local programs with HTML interfaces and using
the browser as an alternative interface to the system command prompt. Section 6
reports on the status of the LAPIS prototype, and Section 7
concludes.
 
         
2  Related Work 
 
Several systems have addressed the problem of web automation. One approach is
macro recording, typified by LiveAgent [11]. LiveAgent
automates a task by recording a sequence of browsing actions in Netscape through
a local HTTP proxy. Macro recording requires little learning on the part of
the user, but recorded macros suffer from limited expressiveness, often lacking
variables, conditionals, and iteration.
 
Another approach is scripting, writing a program in a scripting language
such as Perl, Tcl, or Python. These scripting languages are fully expressive,
Turing-complete programming languages, but programs written in these languages
must be developed, tested, and invoked outside the web browser, making them
difficult to incorporate into a web user's work flow. The overhead of switching
to an external scripting language tends to discourage the kind of spur-of-the-moment
automation required by the tasks described above, in which interactive operations
might be mixed with automation in order to finish a task more quickly.
 
A particularly relevant scripting language is WebL [9], which provides
high-level service combinators for invoking web services and a markup
algebra similar to the LAPIS pattern language for extracting results. Like
other scripting languages, WebL lacks tight integration with a web browser,
forcing a user to study the HTML source of a web service to develop markup patterns
and reverse-engineer form interfaces. In LAPIS, web automation can be done while
viewing rendered web pages in the browser, and simple tasks can be automated
entirely by demonstrating the steps on examples.
 
Other systems have tackled more restricted forms of web automation by demonstration.
Turquoise [13] and Internet Scrapbook [22] construct
a personalized newspaper, a dynamic collage of pieces clipped from other
web pages, by generalizing from a cut-and-paste demonstration. SPHINX [12]
creates a web crawler by demonstration, learning which URLs to follow from positive
and negative examples.
 
Wrapping GUI frontends around CLI programs is a common way to support both ease-of-use
and scriptability. Many integrated development environments follow this pattern,
in which the graphical user interface invokes the compiler, linker, and other
tools using command-line interfaces. Particularly relevant is the Commando dialog
box system in the Macintosh Programmer's Workshop [1], which allows
a developer to specify a dialog box interface for an arbitrary Macintosh command-line
program. A Commando dialog box resource is an abstract description specifying
the dialog box controls and how the controls are mapped to command-line options.
In that sense, it resembles an HTML interface, but is more platform-dependent
than HTML.
 
Others have investigated wrapping HTML interfaces around command-line programs
on a web server, but not on the client. For example, Phanouriou and
Abrams [19] described an HTML interface that presented status information
about a web server (network, filesystem, memory, kernel, etc.) obtained from
Unix commands.
 
The browser-shell is not the first alternative to the standard typescript Unix
shell. Another is Sam [21], a graphical text editor which integrates
external program execution in three ways: "< command"
replaces the current selection with the output of a command, "> command"
runs the command with the current selection as input, and "| command"
redirects both input and output. The Emacs shell-command-on-region command
provides similar capabilities. In a later editor, Acme [20], each external
command's output appears in a new window, with a tag line similar to
a browser's Location box that can be used to invoke another external program.
Unlike Sam, Acme had no provision for supplying a command's input from a window,
and both systems lacked the output history provided by a browser-shell.
 
         
3  User Interface 
 
We now describe some important features of the browser-shell user interface.
The first section is a summary of some previous work on which we are building.
Subsequent sections describe new work: the command interpreter, web automation,
creating web scripts by example, and invoking external programs and CGI programs.
 
       
 
  
 
Figure 1: 
 The LAPIS web browser, displaying a web page that
lists new cars.  The page structure is described by patterns shown in the inset
window (Text Constraint Editor).  Some of the terms used in these patterns (Row,
Link, etc.) are defined by other patterns not shown, and others are defined
by the built-in HTML parser.  The user has entered a pattern in the Find box
to highlight certain cars (rear-wheel drive Kias), and is now about to run a
command in the Command box to sort all cars by horsepower. 
 
The web browser we used to prototype the browser-shell is called LAPIS (Figure 1),
part of a system of generic tools for structured text that we call lightweight
structured text processing [14]. Lightweight structured text processing
enables users to define text structure interactively and incrementally, so that
generic tools can operate on the text in structured fashion. Our lightweight
structured text processing system has four components:
 
 
 a pattern language for describing text structure; 
 parsers for standard structure, such as HTML and programming language
syntax; 
 tools for manipulating text using structure, including sorting, searching,
extracting, reformatting, editing, computing statistics, graphing, etc; 
 a document viewer (in this case, a web browser) for viewing documents,
developing and testing patterns, and invoking tools. 
 
LAPIS includes a new pattern language called text constraints. Text constraints
describe a set of regions in a page in terms of relational operators, such as
before, after, in, and contains. Text constraints can refer to
structure defined by arbitrary parsers, such as the built-in HTML parser that
identifies HTML elements and assigns them names, such as Link, Paragraph, and
Heading. A single text constraint pattern can refer to multiple parsers;
for example, Line at start of Function refers to both Line,
a name defined by a line-scanning parser, and Function, a name defined
by a programming-language parser. In general, text constraints are designed
to be more readable and comprehensible for users than context-free grammars
or regular expressions, because a structure description can be reduced to a
list of simple, intuitive constraints which can be read and understood individually.
More details about the text constraints language can be found in a previous
paper [14].
 
The LAPIS browser includes several tools for transforming web pages. For example,
keep extracts a set of regions matching a text constraint pattern, delete
deletes a set of regions, sort sorts a set of regions in-place, and replace
replaces a set of regions with some replacement text. In the LAPIS browser described
in a previous paper [14], a tool could only be invoked from a menu,
and its output was directed to a new page in the browser. The browser-shell
extensions described in this paper make it possible to invoke these tools from
the Location box and from user-defined scripts.
 
       
3.2  The Browser-Shell
In order to create scripts of commands, we embedded Tcl [18] into LAPIS.
Tcl was chosen partly because of its syntactic simplicity, and partly because
a good Java implementation was available [5]. Tcl is also well-suited
to interactive command execution.
 
Instead of presenting a Tcl interpreter in a separate window, LAPIS integrates
the interpreter directly into the browser window. Tcl commands may be typed
into the Location box. The typed command is applied to the current page, and
its output is displayed in the browser as a new page that is added to the browsing
history.
 
Using the Location box as a command line has several advantages. The page generated
by a command can be browsed like a page generated by a URL. The browsing interface
(Back, Forward, Stop, and Reload) also applies to command outputs. The
Back button returns the browser to the previous page, Stop aborts a long-running
command, and Reload runs the command again.
 
Since either a URL or a command can be typed into the Location box, LAPIS must
be able to distinguish between them. The problem is trivial if the typed entry
begins with a protocol prefix, such as http: or file:, and
LAPIS also recognizes the protocol cmd: for invoking a command unambiguously.
If the typed entry does not begin with a prefix, LAPIS tries every possible
interpretation: first as a command to execute, then as a filename to display,
then as a domain name for a web server. This is an extension of the heuristics
already used by the Location box of most web browsers.
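
The dispatch order described above can be sketched in a few lines. The following Python is an illustration only, not the LAPIS implementation: the function name classify, the commands set standing in for command lookup, the file_exists hook, and the default http:// prefix for bare domain names are all assumptions made for the sketch.

```python
import os
import re

# Hypothetical sketch of the Location box heuristic. A scheme needs at least
# two characters here so that a Windows drive letter ("C:") is not mistaken
# for a protocol prefix (an assumption of this sketch).
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+:")


def classify(entry, commands, file_exists=os.path.exists):
    """Return (kind, target) for text typed into the Location box."""
    entry = entry.strip()
    # Explicit prefixes are unambiguous.
    if entry.startswith("cmd:"):
        return ("command", entry[len("cmd:"):].strip())
    if entry.startswith("exec:"):
        return ("external", entry[len("exec:"):].strip())
    if SCHEME.match(entry):
        return ("url", entry)
    # No prefix: try a command first, then a filename, then a domain name.
    words = entry.split()
    if words and words[0] in commands:
        return ("command", entry)
    if file_exists(entry):
        return ("file", entry)
    return ("url", "http://" + entry)
```

Passing file_exists as a parameter keeps the sketch testable without touching the real filesystem.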
 
For security reasons, LAPIS only executes a cmd: URL if it originates
locally, e.g., if it is typed into the Location box or found in a page loaded
from the local filesystem. A link in a remote web page cannot invoke a Tcl command.
 
       
3.3  Web Automation
Web browsing has two basic actions: clicking on hyperlinks and submitting forms.
Automating web browsing requires equivalent script commands for these actions.
 
Clicking on a link has the same result as typing in its URL in the Location
box. Thus the script command for clicking on a link is simply the link's URL,
such as:
 
 
http://weather.yahoo.com/
For some links, however, the URL varies depending on when the page is viewed.
Variable links are often found in online newspapers, for example, where links
to top stories change from day to day. The click command can be used
to click on a variable link by describing its location in the web page with
a LAPIS text constraint pattern. For example: 
# Start at Salon
http://www.salon.com/
# Click on top story (curly braces are Tcl quoting)
click {Link after Image in Column3}
For entering data into forms, the enter command is provided, with two
arguments. The first argument is a pattern describing the form field to affect.
Since HTML form fields are named, this pattern may simply be the field name.
Alternatively, the pattern may describe the field in terms of its context (e.g.,
TextBox just after "Email Address:"), which has
the advantage of being comprehensible without looking at the HTML source. The
second argument to enter specifies the value to enter in the field.
For text fields, this value is entered in the field directly. For menus or lists,
the value is selected in the list. For radio buttons or checkboxes, the value
should be "on" or "off" (or yes/no, true/false, or 0/1).
Forms are submitted by a click command describing the form's
submit button. For example, here is a complete script that searches Google for
the USENIX 2000 conference home page: 
 
http://www.google.com/
enter {Textbox just after 
       "Search the web using Google"} 
      {USENIX 2000}
click SubmitButton
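
How the enter command treats its value for each kind of form field can be illustrated with a small Python sketch. This is not the LAPIS implementation: the dictionary representation of a field, the names parse_toggle and apply_enter, and the case-insensitive matching of on/off values are assumptions made for illustration.

```python
# Hypothetical model of a form field: a dict with a "kind" key and,
# for menus and lists, an "options" key.
TRUE_WORDS = {"on", "yes", "true", "1"}
FALSE_WORDS = {"off", "no", "false", "0"}


def parse_toggle(value):
    """Map on/off, yes/no, true/false, 0/1 to a boolean."""
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError("not an on/off value: %r" % value)


def apply_enter(field, value):
    """Apply a value to a field according to the field's kind."""
    kind = field["kind"]
    if kind == "text":
        field["value"] = value                # entered directly
    elif kind in ("menu", "list"):
        if value not in field["options"]:
            raise ValueError("no such option: %r" % value)
        field["selected"] = value             # selected in the list
    elif kind in ("radio", "checkbox"):
        field["checked"] = parse_toggle(value)
    else:
        raise ValueError("unknown field kind: %r" % kind)
    return field
```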
LAPIS also provides script commands for other web browsing actions, including
Home, Back, Forward, Stop, Reload, and Save.
 
The examples presented so far have been web-site-specific, but some browsing
tasks are sufficiently uniform across web sites to be handled by a generic script.
For example, the following script can log into many web sites, assuming the
user's login name and password have been stored in the Tcl variables id
and password:
 
enter {Textbox 
        just after Text containing 
        ("login"|"email"|"id"|"user")} 
      $id
enter {Textbox just after Text containing 
         "password"} 
      $password
click SubmitButton
3.4  Automation by Demonstration
To create a browsing script quickly, the user can demonstrate it by
recording a browsing sequence. The demonstration begins with an arbitrary example
page, the input page, showing in the browser. Invoking the Demonstrate
command pops up a new browser window, in which the browsing demonstration will
take place. A new window is created so that the browsing sequence can refer
to the input page for parameters. Like any LAPIS browser window, the Demonstrate
window records a browsing history: URLs visited and commands typed. Unlike a
normal browser window, however, the Demonstrate window's history also records
user events in form controls. For example, if the user types into a form field,
the history will record an equivalent enter command.
 
To fill in a form with text from the input page, the user can make a selection
in the input page, then drag-and-drop (or copy-and-paste) to a form field in
the Demonstrate window. If the copied text was selected by searching for a pattern,
then this action records the command enter field-name
pattern in the history. If the copied data was selected manually,
then the command enter field-name {Selection}
is recorded in the history. When the script is run at a later time, Selection
will return the user's selection at that time. More complex dependencies can
be expressed by typing a Tcl command instead of pointing-and-clicking. For example,
if a radio button should be selected only if the input page has certain features,
then the user might type the command if {[find pattern]}
{click field-name}.
 
Using Back and Forward, the user can revise the demonstration as necessary until
the desired results are achieved. The browsing history, which is essentially
a Tcl script, can also be opened in an editing window, where the user can insert
conditionals, iteration, and comments, if desired. When the user is satisfied
with the demonstration, the Demonstrate window is closed, the history is saved
as a script, and the script becomes available as a named command.
 
LAPIS demonstrations have two advantages over the macro recorders in previous
systems, such as LiveAgent [11]. First, the recorded transcript
is represented by the browsing history, which is visible, easy to navigate,
and very familiar. A crucial part of making this work is that LAPIS inserts
commands as well as URLs in the browsing history. Second, an experienced user
can generalize the demonstration on-the-fly by typing commands at crucial points
instead of pointing-and-clicking. Since a full scripting language is supported,
the resulting scripts can be significantly more expressive than recorded macros,
without taking much more time to develop.
 
       
3.5  Script Optimization
A script created by demonstration may include unnecessary steps, which may be
expensive if they fetch web pages. To address this problem, LAPIS includes an
optimizer that tries to compact the browsing script. For example, a sequence
of simple link-clicking may result in a list of URLs:
 
# Start at Yahoo
http://www.yahoo.com/
 
# Click on Weather
http://weather.yahoo.com/
 
# Click on US
http://weather.yahoo.com/regional/US.html 
Since the URLs are constant, depending neither on the input page nor on previous
pages in the demonstration, the optimizer can delete all but the last, saving
several page fetches. 
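
A minimal sketch of this pruning step, written in Python for illustration, is shown here. It assumes the script is a list of command strings with comments already removed, and it treats a step as a constant URL when it looks like a URL and contains no Tcl substitution characters; the names is_constant_url and prune_constant_urls are hypothetical, and the actual LAPIS optimizer may work differently.

```python
import re

URL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+://\S+$")


def is_constant_url(step):
    """True if the step is a plain URL with no Tcl variable or command substitution."""
    return bool(URL.match(step)) and "$" not in step and "[" not in step


def prune_constant_urls(steps):
    """Collapse each run of consecutive constant URLs to its last URL."""
    pruned = []
    for step in steps:
        if is_constant_url(step) and pruned and is_constant_url(pruned[-1]):
            # The previous page was fetched only to be replaced by this one.
            pruned[-1] = step
        else:
            pruned.append(step)
    return pruned
```

The sketch collapses only adjacent URLs, since an intervening command such as a form submission may have side effects that must be preserved.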
The optimizer can also streamline form submissions. Submitting a form normally
requires two page fetches, one to retrieve the form and another to submit the
form. The optimizer can eliminate the first fetch by hard-coding the form submission
URL, the form field names, and their values.
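
To make the idea concrete, the following Python sketch builds a direct submission from a hard-coded submission URL and field values, so that the form page itself need not be fetched. The function name direct_submission is hypothetical, and the URL and field name in the usage note are illustrative assumptions rather than values taken from a real form.

```python
from urllib.parse import urlencode


def direct_submission(action_url, fields, method="GET"):
    """Return (url, body) for submitting a form without fetching it first.

    For GET the fields are appended to the URL and body is None;
    for POST the URL is unchanged and the fields become the request body.
    """
    query = urlencode(fields)
    if method.upper() == "GET":
        separator = "&" if "?" in action_url else "?"
        return (action_url + separator + query, None)
    return (action_url, query.encode("ascii"))
```

For example, direct_submission("http://www.google.com/search", {"q": "USENIX 2000"}) yields a single URL with the query string q=USENIX+2000, replacing the two fetches of the original script with one.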
 
These optimizations are not always safe, however. For example, some forms have
a variable submission URL or variable default values, often referring to unique
session identifiers or persistent state. Thus the optimizer does not run by
default. Instead, the user selects some or all of the script and invokes the
optimizer on it manually. In the future, the optimizer may be able to gather
information from repeated runs of a script to determine which optimizations
would be safe to make automatically.
 
An optimized form submission may stop working correctly if the form changes,
which happens from time to time when web sites are redesigned or moved. Gross
changes can be detected by various techniques, such as the modification time
or checksum of the form page, but the cost of detecting changes in just the
form (as opposed to page content around the form, which might change
often) would overwhelm the savings of optimization. This is a special case of
a general challenge for web automation: recognizing and dealing with change
on the Web. LAPIS helps with the problem by providing a rich pattern language,
enabling browsing scripts to be insulated from many kinds of changes, but otherwise
leaves detecting and debugging broken scripts to the user.
 
       
3.6  External Programs
 
In addition to built-in Tcl commands, LAPIS can also run an external command-line
program from the Location box. If the command name is not found as a built-in
Tcl command or user-defined script, then LAPIS searches for an external program
by that name. If an external program happens to share the same name as a Tcl
command, the user can force the external program to run with the exec:
prefix.
 
Like a Tcl command, an external program is applied to the current page and displays
its output as a new page added to the browser history. For example, if the user
types (on BSD-style Unix) ps aux, then the browser displays a list
of running processes. If the next command is grep xclock, then the
process listing is filtered to display only those lines containing "xclock."
 
To make this work with legacy programs such as ps and grep,
the external program is invoked in a subprocess with its input and output redirected.
Standard input is read from the current page of the browser, passing the HTML
source if the current page is a web page. Standard output is sent to a new page
of the browser, which is displayed incrementally as the program writes output.
Standard error is sent to a subframe of the page, to separate it from standard
output.
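
The redirection scheme can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the LAPIS implementation: it collects all output before returning, whereas LAPIS displays standard output incrementally, and the names run_on_page and grep_argv are hypothetical.

```python
import subprocess
import sys


def run_on_page(argv, page_source):
    """Run an external program with the current page as standard input.

    Returns (stdout, stderr) as separate strings: stdout would become the
    new page, and stderr would be shown apart from it.
    """
    proc = subprocess.run(argv, input=page_source,
                          capture_output=True, text=True)
    return proc.stdout, proc.stderr


def grep_argv(word):
    """Portable stand-in for `grep WORD`, so the sketch needs no Unix tools."""
    code = ("import sys\n"
            "for line in sys.stdin:\n"
            "    if %r in line:\n"
            "        sys.stdout.write(line)\n" % word)
    return [sys.executable, "-c", code]
```

A pipeline is then just repeated application: the stdout of one call becomes the page_source of the next, which mirrors how each command in the browser-shell reads the current page and produces a new one.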
 
A program's output may be parsed and manipulated like any other page in LAPIS.
For example, ps aux displays information about running processes:
 
USER    PID %CPU %MEM SIZE RSS TTY...
bin     160  0.0  0.4  752 320  ? ...
daemon  194  0.0  0.6  784 404  ? ...
rcm     294  0.0  1.0 1196 660  ? ...
The output of ps can be parsed by simple LAPIS text constraint patterns: 
Process is Line but not starting with "USER"
User is Alphanumeric at start of Process
PID is Number just after User
These identifiers can be used with LAPIS commands that search and manipulate
the output of ps: 
# sort processes by PID
sort Process -by PID -order numeric 
 
# display only xterm processes
keep {Process containing "xterm"} 
 
# kill all xterm processes
kill [extract {PID in Process 
               containing "xterm"}]
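
For readers more familiar with general-purpose languages, the effect of the three patterns and the commands above can be approximated in Python. This is only a rough analogue that assumes whitespace-separated fields; it does not reflect how text constraints are evaluated, and the names parse_ps, sort_by_pid, and pids_matching are hypothetical.

```python
def parse_ps(output):
    """Approximate the Process, User, and PID patterns on `ps aux` output."""
    processes = []
    for line in output.splitlines():
        # Process is Line but not starting with "USER" (blank lines skipped too).
        if not line.strip() or line.startswith("USER"):
            continue
        fields = line.split()
        processes.append({
            "line": line,
            "user": fields[0],      # User is Alphanumeric at start of Process
            "pid": int(fields[1]),  # PID is Number just after User
        })
    return processes


def sort_by_pid(processes):
    """Analogue of: sort Process -by PID -order numeric"""
    return sorted(processes, key=lambda p: p["pid"])


def pids_matching(processes, text):
    """Analogue of: extract {PID in Process containing "text"}"""
    return [p["pid"] for p in processes if text in p["line"]]
```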
By default, patterns and commands are applied only to standard output, but standard
error may also be processed by referring to the Tcl variable $error,
as in find {"Warning:"} $error.
       
3.7  CGI Programs
If an external program outputs HTML instead of plain text, the browser-shell
detects it and renders it as a web page. HTML output is detected by several
simple heuristics, such as an initial <html> tag or <!DOCTYPE>
declaration.
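
One such heuristic might look like the following Python sketch. It is an assumption about what a reasonable check could be, not the set of heuristics LAPIS actually uses; the function name looks_like_html and the 1024-character probe size are hypothetical.

```python
def looks_like_html(output, probe=1024):
    """Guess whether program output is HTML by inspecting its beginning.

    Leading whitespace is ignored and the comparison is case-insensitive.
    """
    head = output[:probe].lstrip().lower()
    return head.startswith("<html") or head.startswith("<!doctype html")
```

Checking only a short prefix lets the decision be made as soon as the first output arrives, before the program has finished.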
 
The HTML output may contain embedded forms. To submit a filled-out form back
to the external program, LAPIS passes form parameters using the Common Gateway
Interface (CGI) [17]. CGI passes form fields and other request information
by setting environment variables, such as QUERY_STRING. Although CGI
is commonly used by web servers to invoke external programs, no major web browser
can invoke a CGI program locally. (The closest we've found is the Help Viewer
in KDE 1.1, which displays HTML help documents and uses CGI to invoke a local
search engine.) One beneficial side-effect of using CGI to communicate with
external programs is that existing CGI scripts can be run directly by the browser-shell,
without installing them in a web server. This feature may be useful for developing
and testing CGI applications outside a web server.
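
A local CGI invocation of this kind can be sketched in Python as follows. The sketch assumes a GET-style submission and sets only three standard CGI variables (GATEWAY_INTERFACE, REQUEST_METHOD, and QUERY_STRING); which variables LAPIS actually sets is not specified here, and the names cgi_environment and invoke_cgi are hypothetical.

```python
import os
import subprocess
import sys
from urllib.parse import urlencode


def cgi_environment(fields, base_env=None):
    """Return a copy of the environment with form fields encoded for CGI."""
    env = dict(os.environ if base_env is None else base_env)
    env["GATEWAY_INTERFACE"] = "CGI/1.1"
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = urlencode(fields)
    return env


def invoke_cgi(argv, fields):
    """Run a local program as a CGI script and return its standard output."""
    proc = subprocess.run(argv, env=cgi_environment(fields),
                          capture_output=True, text=True)
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for a CGI script: a child process that echoes QUERY_STRING.
    child = "import os; print(os.environ['QUERY_STRING'])"
    print(invoke_cgi([sys.executable, "-c", child], {"name": "*.class"}))
```

Because the form data travels only through environment variables, the invoked program needs no knowledge of whether a web server or a browser-shell started it.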
 
Whether a form is being submitted or not, LAPIS always sets the CGI environment
variables when it invokes an external program. A program can use the presence
or absence of these variables to determine whether it was invoked from the browser-shell,
in which case it can present an HTML interface and act like a CGI program, or
from the ordinary typescript shell, in which case it should present a text-only
or command-line interface.
 
One use for this facility is wrapping a friendlier HTML interface around an
existing command-line program. For example, some users have trouble remembering
the syntax for the Unix find command, which searches for files matching
certain constraints. Find supports a variety of predicates on filename,
date, user ownership, etc., and Boolean operators for combining predicates.
We wrote a Perl CGI script wrapper around find which displays a simple
HTML form (Figure 2). The first part of the wrapper script (Figure 3)
tests whether the script is running under LAPIS. If not, or if the user passed
command line arguments, then the wrapper simply invokes the original find.
Otherwise, the script prints an HTML page containing the form. When the form
is filled out and submitted back to the wrapper script, the script invokes find
appropriately.
 
The HTML wrapper makes it possible to use find without learning or
remembering its command-line syntax. A GUI frontend for find would offer
the same benefits, but at greater cost: a GUI frontend has no ready hooks for
automation, but the HTML form interface can be scripted in LAPIS exactly as
if it were a web service. For example, a Java programmer may want a script that
searches all subdirectories for files ending with .class and stores
them in a ZIP file. The user pops up a Demonstrate window, invokes find
to display its HTML form, fills in the form to search for files named *.class,
and applies zip to the resulting list of files. This sequence of actions
is then saved as a script. Thus the user can include find in a script
without learning its more complex command-line interface.
 
 
 
  
 
Figure 2: 
HTML interface for Unix find. 
 
 
 
#!/usr/bin/perl -w 
 
# Check if invoked outside of browser-shell
# or passed arguments.
if (!defined $ENV{"GATEWAY_INTERFACE"} 
    || @ARGV > 0) {
    # Pass arguments directly to find
    exec ("/usr/bin/find", @ARGV);
}
 
# Otherwise act as CGI script.
use CGI qw/:standard/;
 
if (!param()) {
  # No form submitted.
  # Display the HTML interface.
  exec "cat /usr/doc/find/find-form.html";
} else {
  # Handle form submission.
  exec ("/usr/bin/find",
        param("directory"),
        param("search_subdirectories") 
         ? () : ("-maxdepth", "1"),
        "-name", param("name"),
        "-print");
}
Figure 3: 
Perl wrapper for find that displays the HTML form interface
shown in Figure 2 when invoked inside the browser-shell. Form
submissions are handled by the Perl CGI.pm module.
          
4  Implementation
The browser-shell prototype described in this paper was implemented by modifying
an existing web browser, LAPIS, originally designed to test new user interface
ideas. LAPIS is written in Java 1.1 using the HTML layout component JEditorPane
from the Java Foundation Classes. Before modification, LAPIS consisted of about
18,000 lines of code. The browser-shell features added about 2,000 lines of
code. The LAPIS browser-shell has been tested on Linux, Solaris, and Windows
NT.
 
Modifying a browser is not the only way to implement browser-shell capabilities.
Two other general strategies exist for adding features to web browsers. One
scheme uses an internal browser extension mechanism, such as a Netscape plugin,
an Internet Explorer ActiveX component, or a Java applet. The other scheme is
an HTTP proxy, external to the browser but running on the same machine, that
filters the browser's HTTP requests.
 
Both schemes have the advantage of working with existing browsers, but their lack
of tight integration with the browser makes some browser-shell features difficult
or impossible to implement. For example, neither scheme would allow commands
to be typed directly into the browser's Location box. Highlighting the results
of pattern matches would be much harder, as would monitoring the user's entries
in form fields to generate scripts by demonstration. The lack of control over
the browser's user interface makes these browser-extension schemes too constraining
for use as a research testbed. For a developed product, however, one of these
schemes may be the best bet, even if it can only deliver a subset of the capabilities
described in this paper.
 
We suggest that there are several levels of browser-shell complexity. Higher
levels are harder to design and implement, but deliver correspondingly greater
benefits. In increasing order of complexity, the levels are:
 
 
 Local program invocation. Implementing this level requires spawning a
subprocess and redirecting its input and output to the browser. This level is
sufficient for using the browser as a command shell.
 Local CGI invocation. Implementing this level requires encoding a form
submission into environment variables and invoking a local program. This level
is sufficient to support local HTML interfaces with form submission.
 Embedded scripting language. Many web browsers already embed Javascript,
but do not support automatic browsing (i.e., a sequence of script commands invoked
on successive web pages). With automatic browsing, this level is sufficient
to support web automation.
 Embedded pattern language. A pattern language like text constraints enables
the user to describe, manipulate, and extract parts of web pages and program
outputs. This level acts as a glue language for connecting unrelated
information sources or programs, so an ideal pattern language should be capable
of describing not only HTML, but also text and XML.
 Web automation by demonstration. Implementing this level requires recording
user events and generalizing them into script commands. This level helps novice
users learn the scripting language and helps expert users streamline the construction
of scripts.
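
The first two levels can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not the LAPIS implementation (which is written in Java). The function names run_local_program and run_local_cgi are hypothetical, and the form is assumed to be submitted in the style of a CGI GET request, using the standard CGI environment variables GATEWAY_INTERFACE, REQUEST_METHOD, and QUERY_STRING.

```python
import os
import subprocess
from urllib.parse import urlencode


def run_local_program(argv, current_page):
    """Level 1: spawn a subprocess, feed it the current page on standard
    input, and return its standard output as the text of the new page."""
    result = subprocess.run(argv, input=current_page,
                            capture_output=True, text=True)
    return result.stdout


def run_local_cgi(argv, form_fields):
    """Level 2: encode a form submission as a CGI GET request would, pass it
    to a local program through environment variables, and return the
    program's output."""
    env = dict(os.environ)
    env["GATEWAY_INTERFACE"] = "CGI/1.1"   # lets the program detect CGI mode
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = urlencode(form_fields)
    result = subprocess.run(argv, env=env, capture_output=True, text=True)
    return result.stdout
```

A wrapper such as the one in Figure 3 distinguishes the two cases by checking whether GATEWAY_INTERFACE is set, which is why the sketch sets it only in run_local_cgi.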
 
         
5  Discussion
 
We now discuss some general implications of integrating a command shell into
a web browser, in particular the new applications, architectures, and interaction
styles that such a hybrid enables.
 
       
5.1  HTML Interfaces
Much interest in recent years has focused on creating and deploying HTML-based
applications that run in web servers. The advantages of deploying
an application as a web service are well understood: it can be accessed by millions
of users at the click of a button, it can be upgraded easily, and it can even
be given away for free, paid for by advertising. The most popular sites on the
Web are HTML interfaces in this sense.
 
The browser-shell opens up a new possibility: deploying HTML interfaces on the
client. There are still many reasons to deploy applications
on the client, including performance, security, and ability to run disconnected
from the network. Current browsers cannot submit HTML forms to client-side programs,
however, forcing a client-side HTML application to handle its user input in
a more complicated way (e.g., with Javascript, Java, or ActiveX). The browser-shell's
ability to submit forms to local programs allows client-side programs to have
pure HTML user interfaces, displayed entirely in the browser.
 
HTML interfaces have several advantages. First, an HTML interface is easy to
implement portably, since it needs only the standard I/O library rather than
larger, less portable GUI libraries. Second, a wide variety of HTML editors
and CGI libraries already exist, making the job easier. Third, compared to a
command-line interface, an HTML interface is easy to use, not only because it
is visual, but also because users are familiar with similar interfaces on the
Web. Finally, compared to a GUI, an HTML interface is easier to script because
it is declarative and textual, allowing systems like LAPIS to parse the interface
and control it automatically.
 
Some applications are well-suited to HTML; others are not. User input is limited
to forms with standard controls such as buttons, menus, and text fields, so
applications that demand richer interaction would be poorly suited. On the other
hand, applications with high information content, such as detailed help or reference
materials, would be well-suited, since HTML makes it easy to intersperse forms
with formatted text, pictures, and hyperlinks.
 
Any program that already has a command-line interface is a prime candidate for
an HTML interface. As our find example showed, wrapping an HTML interface
around a legacy program is simple if the program takes all user input as command-line
arguments. Programs that conduct an interactive dialog with the user are trickier
to wrap, however, because the CGI protocol does not support persistent connections.
The wrapper must be reinvoked for every form submission. This problem could
be solved by a more complex wrapper that maintains its own persistent connection
to the legacy program, or by an alternative form submission protocol with a
persistent connection to the wrapper.
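
The first of these solutions, a wrapper that holds its own connection to the legacy program, might look like the following Python sketch. The class name PersistentSession and its methods are hypothetical, and the sketch assumes the simplest possible dialog: the legacy program answers every line of input with exactly one line of output. A real interactive program would need a more careful protocol for deciding when a reply is complete.

```python
import subprocess


class PersistentSession:
    """Keeps one legacy program alive across many form submissions."""

    def __init__(self, argv):
        # Pipes to the program stay open for the lifetime of the session.
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def submit(self, line):
        """Forward one form submission and return the program's reply.

        ASSUMPTION: one line of input yields exactly one line of output.
        """
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """End the dialog by closing the program's input and waiting for it."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Each form submission would then call submit on a long-lived session object instead of starting the legacy program again.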
 
HTML interfaces allow command-line programs to be self-describing. Instead of
the terse ``usage'' message printed by command-line interfaces, a program
running in a browser-shell would display its HTML documentation, and embedded
in the documentation itself would be the program's user interface. Thus, the
usage message of an HTML interface not only explains what the program does,
but also presents an interface for actually invoking it.
 
       
5.2  New Shell Interaction Model
The web browser is becoming a central part of the desktop interface. Modern
browsers, such as Microsoft Internet Explorer and KDE's kfm[10],
already include file management among the web browser's responsibilities. Integrating
the system command prompt is another step along the same path, which makes sense
because file management and command execution are often intertwined.
 
The browser-shell interface behaves differently from a traditional typescript
shell, however. Whereas a typescript shell interleaves commands with program
output in the same window, a browser-shell separates the command prompt from
program output. The browser-shell also automatically redirects program input
from the current browser page, and automatically sends program output to a new
browser page.
 
One effect of these differences is on scrolling. In a typescript interface,
long output may scroll out of the window. To view the start of the output, the
user must either scroll back, or else rerun the command with output redirected
to more or head. The browser-shell, by contrast, initially
displays the first windowful of output, rather than the last,
reducing the need for scrolling. When output is less than a windowful, a typescript
can become cluttered by outputs of several commands, forcing the user to scan
for the start of the latest output. The browser-shell displays each program
output on a new, blank page. The overall effect of the browser-shell is like
automatically redirecting output to more.
 
Unlike more, however, the browser-shell's display is not ephemeral.
The displayed output can be passed as input to another command, which allows
pipelines to be assembled more fluidly than in the typescript interface. Developing
a complicated pipeline, such as ps ax | grep xclock | cut -d
' ' -f 1, is often an incremental process. In typescript interfaces, where
input redirection must be specified explicitly, this process typically takes
one of two forms:
 
 
 Repeated execution: run A and view the output; then run A|B
and view the output; then (B turned out wrong) run A|B′
and view the output; etc. This strategy fails if any of the commands run slowly
or have side-effects.
 Temporary files: run A > t1 and examine t1; then run B
< t1 > t2 and examine t2; then
(B was wrong) run B′ < t1 >
t2, etc.
 
The browser-shell offers a third alternative: run A and view the output;
then run B (which automatically receives its input from A)
and view the output; then press Back (because B was wrong) and run
B′ instead. The browser-shell displays each intermediate result
of the pipeline while serving as automatic temporary storage.
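
This interaction can be modeled as a page history in which every command reads the current page. The Python sketch below is an illustration only; the class BrowserShellHistory and its methods are hypothetical names, and commands are given as argument lists rather than in the browser-shell's own command language.

```python
import subprocess


class BrowserShellHistory:
    """Linear page history in which each command reads the current page."""

    def __init__(self, first_page=""):
        self.pages = [first_page]   # page texts, oldest first
        self.current = 0            # index of the page being displayed

    def run(self, argv):
        """Run argv with the current page as input; show output as a new page."""
        output = subprocess.run(argv, input=self.pages[self.current],
                                capture_output=True, text=True).stdout
        # Like a browser, drop any forward history before adding the new page.
        del self.pages[self.current + 1:]
        self.pages.append(output)
        self.current += 1
        return output

    def back(self):
        """Return to the previous page, which is still stored in the history."""
        if self.current > 0:
            self.current -= 1
        return self.pages[self.current]
```

Running A, then B, then calling back() and running B′ never reruns A: its output is still held in pages, so the history plays the role of the temporary files t1 and t2.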
 
Automatic input redirection makes constructing a pipeline very fluid, but it
is inappropriate for programs that use standard input for interacting with the
user, such as passwd. Such programs cannot be run in a browser-shell
without modification, such as wrapping an HTML interface around the program,
or running the program in a terminal emulator, possibly embedded in the browser-shell
window.
 
One problem with the browser-shell model is the linear nature of the browsing
history. If the user runs A, backs up, and then runs B, the output of A disappears
from the browsing history. To solve this problem, the LAPIS prototype lets the
user duplicate the browser window, including its history, so that one window
preserves the original history while the other is used to backtrack. (Netscape's
New Window command worked similarly before version 4.0.) A more complex solution
might extend the linear browsing history to a branching tree [2]. 
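
As a rough illustration of the branching alternative, the Python sketch below stores the history as a tree, so that backing up and running a different command adds a sibling branch instead of discarding the earlier output. The names HistoryNode and BranchingHistory are hypothetical; this is not the design of [2] nor a feature of LAPIS.

```python
class HistoryNode:
    """One page in the history: the command that produced it and its output."""

    def __init__(self, command, output, parent=None):
        self.command = command
        self.output = output
        self.parent = parent
        self.children = []


class BranchingHistory:
    """Browsing history kept as a tree rather than a linear list."""

    def __init__(self, start_page=""):
        self.root = HistoryNode(None, start_page)
        self.current = self.root

    def visit(self, command, output):
        """Record a new page as a child of the current page and move to it."""
        node = HistoryNode(command, output, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def back(self):
        """Move to the parent page; earlier branches stay reachable."""
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current
```

With this structure, running A, backing up, and running B leaves both outputs reachable as children of the starting page.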
 
         
6  Status and Future Work
 
The LAPIS web browser described in this paper, including Java source code, is
available from 
 
    
http://www.cs.cmu.edu/~rcm/lapis/
 
LAPIS is only a prototype, but it demonstrates the basic ideas described in
this paper. Unfortunately, the LAPIS prototype is not robust enough for everyday
use, largely because JEditorPane renders many web pages poorly. An
important avenue of future work will be to convert a production-quality web
browser into a browser-shell and experiment with using it on a daily basis.
 
Several features are needed to make the browser-shell more useful and more efficient
as a command shell, including:
 
 
 Background processes. Web browsers generally stop loading a page when a new
URL is typed in the Location box. Similarly, LAPIS automatically stops the currently
executing command when a new command is typed. As a result, only one command
can be running in each LAPIS browser window. An improvement would be support
for background-process syntax. If a command ends with &, it could continue
running in the background, storing its output in case the user ever backs up
through the history. 
 Handling large outputs. A command may generate too much output for the browser
to display efficiently. The same problem often happens in typescript shells,
usually forcing the user to abort the program and run it again redirected to
a file. To handle this problem, the browser could automatically truncate the
display if the output exceeds a certain user-configurable length. The remaining
output would still be spooled to the browser cache, so that the entire output
can be viewed in full if desired, or passed as input to another program.
 Streaming I/O. A pipeline may process too much data for the browser's limited
cache to store efficiently. Although the browser-shell's automatic I/O redirection
could still be used to assemble the pipeline (presumably on a subset of the
data), the pipeline would run better on the real data if its constituent commands
were invoked in parallel with minimal buffering of intermediate results. The
browser-shell might do this automatically when invoking a script.
 Shell syntax. Expert users would be more comfortable in the browser-shell if
it also supported conventional operators for pipelining and I/O redirection,
such as |, <, >, and >>.
The most direct way to accommodate expert users might be to embed an existing
shell, such as bash or tcsh, as an alternative to Tcl.
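
Two of these proposals, truncated display and streaming I/O, can be sketched as follows. The Python below is a hypothetical illustration, not part of LAPIS: the function names are invented, an ordinary file stands in for the browser cache, and the streaming pipeline assumes that its first command produces the data itself rather than reading a browser page.

```python
import subprocess


def run_with_truncated_display(argv, spool_path, display_limit=10000):
    """Spool a command's complete output to spool_path and return only the
    first display_limit characters for display."""
    with open(spool_path, "w") as spool:
        subprocess.run(argv, stdout=spool)
    with open(spool_path) as spool:
        return spool.read(display_limit)


def run_streaming_pipeline(commands):
    """Start every command of the pipeline at once, connected by OS pipes,
    so intermediate results are never stored in full. Returns the output of
    the last command as text."""
    if not commands:
        raise ValueError("pipeline needs at least one command")
    procs = []
    for argv in commands:
        # The first command gets no input; each later one reads its
        # predecessor's output directly.
        stdin = procs[-1].stdout if procs else subprocess.DEVNULL
        proc = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Close our copy so only the downstream process holds the pipe.
            procs[-1].stdout.close()
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    return output.decode()
```

The truncated display keeps the full output available at spool_path, so it can still be viewed in full or passed to another program, while the streaming pipeline trades away the intermediate pages in exchange for minimal buffering.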
 
         
7  Conclusions
 
We have integrated a command shell into a web browser, and shown how this arrangement
delivers benefits in three areas: (1) web automation; (2) HTML user interfaces
for command-line applications; and (3) using a web browser as a new way to interact
with the system command prompt. 
 
We hope that the next generation of web browsers will include at least
some of these features, enabling future web users to put the power of automation
to work in browsing and manipulating the Web.
 
 Acknowledgements
The authors are grateful to David Garlan, John Pane, and the anonymous referees
for their helpful suggestions on improving this paper.
 
This research was funded in part by a USENIX Student Research Grant.
 
 References
[1]Apple Computer, Inc. Macintosh Programmer's Workshop. http://devworld.apple.com/tools/mpw-tools/
[2]E.Z. Ayers and J.T. Stasko. ``Using Graphic History in Browsing the World Wide
Web.'' Proc. 4th International World Wide Web Conference WWW4, December
1995, pp 259-270.
[3]K. Borg. ``IShell: A Visual UNIX Shell.'' Proc. Conference on Human
Factors in Computing Systems (CHI '90), 1990, pp 201-207.
[4]M. A. Cusumano and D. B. Yoffie. ``What Netscape Learned From Cross-Platform
Software Development.'' Comm. ACM, v42 n10, October 1999, pp 72-78. 
[5]M. DeJong, et al. Jacl and Tcl Blend. http://www.scriptics.com/software/java
[6]P. E. Haeberli. ``ConMan: A Visual Programming Language for Interactive Graphics.''
Proc. ACM SIGGRAPH 88, 1988, pp 103-111. 
[7]T. R. Henry and S. E. Hudson. ``Squish: A Graphical Shell for Unix.'' Graphics
Interface, 1988, pp 43-49. 
[8]B. Jovanovic and J. D. Foley. ``A Simple Graphics Interface to UNIX.'' Technical
Report GWU-IIST-86-23, George Washington University Institute for Information
Science and Technology, 1986. 
[9]T. Kistler and H. Marais. ``WebL - A Programming Language For the Web.'' In
Computer Networks and ISDN Systems (Proc. 7th International World
Wide Web Conference WWW7), v30, April 1998, pp 259-270. Also appeared as DEC
SRC Technical Note 1997-029. 
[10]K Desktop Environment. KFM. http://www.kde.org/
[11]B. Krulwich. ``Automating the Internet: Agents as User Surrogates.'' IEEE
Internet Computing, v1 n4, July/August 1997.
http://computer.org/internet/v1n4/krul9707.htm 
[12]R. C. Miller and K. Bharat. ``SPHINX: A Framework for Creating Personal, Site-Specific
Web Crawlers.'' In Computer Networks and ISDN Systems (Proc. 7th
International World Wide Web Conference WWW7), v30, April 1998. 
[13]R. C. Miller and Brad A. Myers. ``Creating Dynamic World Wide Web Pages By
Demonstration.'' Carnegie Mellon University School of Computer Science Tech
Report CMU-CS-97-131 (and CMU-HCII-97-101), May 1997. 
[14]R. C. Miller and B. A. Myers. ``Lightweight Structured Text Processing.''
Proc. USENIX 1999 Annual Technical Conference, Monterey, CA, June 1999,
pp 131-144. 
[15]F. Modugno and B. A. Myers. ``Typed Output and Programming in the Interface.''
Carnegie Mellon University School of Computer Science Technical Report, no.
CMU-CS-93-134. March 1993. 
[16]F. Modugno and B. A. Myers. ``Pursuit: Visual Programming in a Visual Domain.''
Carnegie Mellon University School of Computer Science Technical Report, no.
CMU-CS-94-109. January 1994. 
[17]NCSA. Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi/
[18]J. Ousterhout. ``Tcl: An Embeddable Command Language.'' Proc. USENIX
1990 Winter Technical Conference, pp 133-146. 
[19]C. Phanouriou and M. Abrams. ``Transforming Command-Line Driven Systems to
Web Applications.'' Proc. 6th International World Wide Web Conference
(WWW6), 1997, Santa Clara CA, pp 599-606. 
[20]R. Pike. ``Acme: A User Interface for Programmers.'' Proc. USENIX 1994
Winter Technical Conference. 
[21]R. Pike. ``The Text Editor sam.'' Software Practice & Experience,
v17 n11, November 1987, pp 813-845. 
[22]A. Sugiura and Y. Koseki. ``Internet Scrapbook: Automating Web Browsing Tasks
by Demonstration.'' Proc. ACM Symposium on User Interface Software
and Technology (UIST 98), 1998, pp 9-18. 
 
This paper was originally published in the
Proceedings of the 2000 USENIX Annual Technical Conference,
June 18-23, 2000, San Diego, California, USA.
 