################################################
	   #                                              #
	   # ##   ## ###### ####### ##    ## ## ##     ## #
	   # ##   ## ##  ## ##      ###   ## ##  ##   ##  #
	   # ##   ## ##     ##      ####  ## ##   ## ##   #
	   # ##   ## ###### ######  ## ## ## ##    ###    #
	   # ##   ##     ## ##      ##  #### ##   ## ##   #
	   # ##   ## ##  ## ##      ##   ### ##  ##   ##  #
	   # ####### ###### ####### ##    ## ## ##     ## #
	   #                                              #
	   ################################################


	 The following paper was originally presented at the

		     Third Annual Tcl/Tk Workshop
		 Toronto, Ontario, Canada, July 1995

	   sponsored by Unisys, Inc. and USENIX Association


	    It was published by USENIX Association in the
		  1995 Tcl/Tk Workshop Proceedings.
 
 
        For more information about USENIX Association contact:
 
                   1. Phone:    510 528-8649
                   2. FAX:      510 548-5738
                   3. Email:    office@usenix.org
                   4. WWW URL:  https://www.usenix.org
 
 
^L
RIVL: A Resolution Independent Video Language

by Jonathan Swartz and Brian C. Smith, Cornell University

Abstract

As common as video processing is, programmers still implement video
programs as manipulations of arrays of pixels. This paper presents an
extension to Tcl called Rivl (pronounced "rival") where video is a first
class data type. Programs in Rivl use high level operators that are
independent of video resolution and format, increasing portability of
programs and allowing rapid prototyping of video effects. This paper gives
several examples of still-image and video sequence programs in Rivl. It
also discusses efficiency issues and experiences with Tcl as a platform for
Rivl.

1. Introduction

The incorporation of video into our computing environment will change the
way we interact with computers as much as the shift from alphanumeric
terminals to graphical user interfaces. To realize this vision, video must
become as accessible in our computing environment as text and images are
today. Because video has different semantic, storage, and timing
requirements, the realization of truly programmable video requires research
in storage systems, transport protocols, compression methods, and
algorithms.

The way we encode video algorithms today is similar to the way we expressed
numerical algorithms in the days of assembly language. Video is composed of
a sequence of images, each of which is represented as a two dimensional
array of pixel values. In the past, floating point operations were
expressed by manipulating individual bits. Today, video and image
operations are expressed by manipulating pixel values. Some systems (e.g.,
Data Explorer[2] or Khoros[10]) provide a graphical programming environment
where programs are expressed as flowcharts. Although this is an
improvement, the limitations of flowcharts for expressing complex programs
are well-known.

What is needed is a language that incorporates video as a first class data
type, just as floating point numbers are first class data types in almost
all modern programming languages. Thus, just as the floating point addition
"A+B" is well-defined regardless of whether A and B are single, double, or
even quad-precision floating point numbers, the operation "cut the first
five seconds of the video clip" is well defined whether the film resolution
is 16 or 30 frames per second, whether the format is MPEG[5] or motion
JPEG[7], and whether the image size is 100x100 or 6000x4000. This paper
describes one such language, called Rivl (pronounced "Rival"). In Rivl,
video operations are expressed independent of the internal representation
of video data. Just as it is the responsibility of traditional
languages\x11to map a floating-point operation onto the underlying bit
manipulations, it is Rivl's responsibility to map image and video clip
operations onto the underlying pixel and frame manipulations.

Rivl is currently implemented as an extension to the Tcl language [9]. This
paper describes the design and implementation of Rivl, and discusses our
experiences with the language. The rest of the paper is organized as
follows. Section 2 illustrates the Rivl language through a series of
examples. Section 3 discusses efficiency considerations for video
languages. Section 4 discusses our experiences with Tcl as a language
platform for Rivl. Section 5 reviews related work, and concludes with
current status and future research directions.

2. Rivl: The Language

This section illustrates Rivl programs through a series of examples. The
intention is to give the reader a feel for the flavor of Rivl programs and
an idea of the expressive power of the language, rather than providing a
reference manual or an exhaustive catalog of Rivl's functions.

Since Rivl is an extension of Tcl, Rivl programs have access to all the
primitives of the Tcl language. Rivl extends Tcl with two data types:
images, which represent still images, and sequences, which represent video
segments (a timestamped set of images).

2.1. Images

Table 1 lists three classes of image primitives in Rivl. The first class
provides for image input/output, the second class provides geometric
operations on images, such as translation, rotation, and scaling, and the
third class provides miscellaneous operations including cropping,
overlaying, and fading.

The commands in table 1 can be used in Tcl procedures to create more
complex visual effects. For example, consider the following Tcl procedure:

proc whirlpool {image1 p} {

	set image2 [im_scaleC $image1 [expr 1 - $p]]

	set image3 [im_rotateC $image2 [expr 360 * $p]]
	return $image3
}

This procedure takes two arguments: a Rivl image (image1) and a double
value between 0 and 1 (p). The call to im_scaleC reduces image1 by a factor
of 1-p, and assigns the result to image2. The call to im_rotateC rotates
image2 about its center by an angle proportional to p, and stores the
result in image3, which is returned. Figure 1 shows examples of whirlpool
with several values of p.

Many Rivl procedures, like whirlpool, apply a sequence of operations to an
image. Since the operators in table 1 are all non-destructive, these
procedures consist of a sequence of statements of the form

set im [im_effect $im params]

We found this notation cumbersome, so we borrowed an idiom from Scheme. Any
operator with the character "!" appended destructively modifies its first
argument. This rule applies to all operators listed in table 1. Taking
advantage of this notation, we can write whirlpool as:

proc whirlpool {image p} {
	im_scaleC! image [expr 1 - $p]]
	im_rotateC! image [expr 360 * $p]
	return $image
}

Notice that the destructive operation omits the "$" in front of its first
argument, whereas the non-destructive form requires the "$". This artifact
is caused by the way pass-by-reference is implemented in Tcl, a point we
discuss in more detail in section 4.

For consistency, we felt that user defined effects should also have
destructive and non-destructive versions. To this end we provide the
rvl_proc command, which is similar to Tcl's proc command but also creates a
destructive form of the procedure. Thus, if we use rvl_proc in the
definition of whirlpool above, two commands, called whirlpool and
whirlpool!, are created.

Since the full Tcl language is available, Rivl procedures can be
constructed that include looping, branching, and recursion. For example,
suppose the following sequence of operations is applied to an image n
times:

1. Shrink the image to half size.

2. Create three duplicates of the image. Move them "north", "southwest",
and "southeast" respectively.

3. Merge the three duplicates into a single image.

In the limit of large n, a fractal (Sierpinski's Gasket) is created. The
corresponding Rivl program is given in figure 2. The first two lines
compute the translation distance dx and dy for the second step as a
function of the image size. The rest of the procedure follows from the
description given above. Figure 2 shows sample results for several values
of n.

2.2 Sequences

A sequence, the Rivl abstraction for video, can be thought of as a set of
time-stamped images. Table 2 lists four classes of sequence primitives in
Rivl. The first class provides for sequence input/output. The second class
provides operations to split and join sequences (called sequence assembly
operations.) The third class allows scaling and translation of sequences in
the time dimension. And, the final class provides a bridge between
sequences and images.

Like image commands, sequence commands can be composed to create
procedures. For example, consider the following Rivl procedure:

proc preview {movieList n} {
	# Create an empty sequence
	set result [seq_new]
	foreach movie $movieList {
	# Grab segment from next movie
	set chunk [seq_crop $movie 0.0 $n]
	# Concatenate segment onto result
	seq_concat! result $chunk
	}
	return $result
}

Preview extracts n seconds off the beginning of each movie in a list of
movies. The call to seq_new creates an empty sequence for the return
value. The loop repeatedly extracts the first n seconds from each sequence
in movieList and appends it to result, which is returned.

Sequences can also be scaled and shifted in the time dimension, operations
that are analogous to scaling and translating images in the spatial
dimension. Seq_speedup changes the speed of a movie, seq_reverse reverses a
movie, and seq_shift shifts the sequence in time.

Rivl has primitives to convert between sequences and lists of images, and
to apply an image operator to every image in a sequence. The operators
seq_to_ims and ims_to_seq convert a sequence to a list of images and
back. Seq_map executes a given script for each image of a sequence and
assembles the resulting images into a new sequence. Seq_map is similar to
map in Scheme. Consider the command

seq_map $clip {im_fade %1 0.5} 

Seq_map substitutes the handle of the current image wherever %1 appears in
the script. Thus, this command returns a new sequence containing the images
in clip faded by half.

Sometimes, rather than applying the exact same effect to each image, it is
desirable to vary the effect over time. For example, consider the
fade-to-black effect. This effect can be achieved by calling im_fade on
each image in the sequence with a parameter that decreases over time.In
this case, seq_map must call a procedure with a parameter that indicates
the time of the image being modified. To this end, seq_map performs the
following additional substitutions:

 %t: Substitute the time stamp of the current image, in seconds
 %l: Substitute the length of the sequence in seconds
 %p: Substitute the relative time of the current image: %t divided by %l

Using this mechanism, fade-to-black can be expressed

seq_map $clip {im_fade %1 [expr 1-%p]} 

If clip is 21 frames long, this expands into the following series of commands:

im_fade image1 1.00
im_fade image2 0.95
im_fade image3 0.90
...
im_fade image20 0.05
im_fade image21 0.00

The resulting images are concatenated into a new sequence, resulting in the
desired fade-out visual effect.

When combined with sequence assembly operations, seq_map simplifies the
expression of effects that are often used in transitions between two parts
of a movie. For example, the Rivl procedure connectWithTransition, shown in
figure 3, connects two sequences with a transition. The first parameter,
transition, is a script to be passed to seq_map. MovieA & movieB are the
two sequences to be joined, and duration is the time (in seconds) to apply
the transition effect. For example, connectWithTransition {im_fade %1 [expr
1-%p]} $jack $jill 5 connects two sequences jack and jill with a five
second fade.

3. Efficiency Considerations

This section discusses efficiency considerations involved in implementing
Rivl. Efficiency is always an issue for image processing because of the
large amount of data to process. The problem is compounded for video
streams, which have many images for each second of footage. There are two
ways to optimize image computing. First, we must make sure that individual
image operations, such as scales, rotations, etc., are efficient. These
issues have been addressed at length in the graphics literature, and good
algorithms are readily available[3]. Second, we must be intelligent about
which operations we call, in what order, to achieve our final result.

A feature of Rivl that allows us to exploit the second form of optimization
is lazy evaluation, also known as demand-driven execution[8]. Rivl only
computes video data when it is needed for output or display. The result is
that at computation time, Rivl can plan a more intelligent computing
strategy than if each command were executed immediately and independently.

Consider the following Rivl session:

% rivl
> set flowers [im_read flowers.jpg]
rvl_image2
> fractal! flowers 25
rvl_image9
> im_write images/newflowers.ppm
>

Objects such as rvl_image2 and rvl_image9 are actually image proxies. A
proxy contains the dimensions of an image, but no actual pixel data. As we
call image operations with proxies, Rivl builds up a directed graph
representing the operations to be executed without actually computing the
results of the image operators. In effect, the graph is a dynamic trace of
the Rivl program execution. For example, the first two lines of the above
session result in the graph shown in figure 4.

Every edge in the graph represents an image, and every node represents a
primitive image operation. Having this graph provides us with several
opportunities for optimization. Three optimizations are described below.

Computing only what you need. In the process of computing an output image,
we often compute several intermediate images. However, not all the pixels
of an intermediate image necessarily affect the result. For example, an
image might be scaled, rotated, and smoothed, only to be mostly obscured in
a later overlay operation. Rivl uses an algorithm described by Shantzis [8]
that guarantees that only those regions of an image which affect the output
are computed. We apply the same idea to sequences: a sequence frame is only
computed if it appears in the output sequence. Thus, if we apply effects to
all the frames in a sequence, and then cut most of the sequence, only the
frames that remain will need to be processed.

Graph modifications. The second optimization modifies the graph so its
output is equivalent, but the computation is more efficient. Such
modifications include combining or swapping adjacent nodes. For example, an
affine transformation (rotation, scale, translation) is characterized by a
3x3 matrix. Any adjacent affine nodes can be combined into a single node by
multiplying their matrices together.

Graphs can be optimized for speed of computation or quality of output. This
option allows Rivl users to prototype operations at low quality and then
compute the result off-line at high quality once the program is debugged.

Smart caching. Since the graph allows us to determine the exact order of
image computations in advance, we know how many times an image will be
used, and when it is safe to throw it away. Thus Rivl can keep an image in
memory for the optimal length of time. Furthermore, if there are too many
images in memory at once, we can page images to disk using an optimal
virtual memory policy because we know every future image reference. An
implementation of the latter idea is in progress.

4. Experiences

On the whole, our experience with Tcl as a platform for Rivl has been
favorable. Tcl's extensibility makes it easy to build new video processing
commands from Rivl. Tcl's interpreted nature allows for instant feedback on
the results of new video effects, facilitating experimentation and
fine-tuning. Access to Tk is another advantage. We are currently working on
a Tk video editor that will use an embedded Rivl interpreter for video
processing.

We encountered a few Tcl-related problems in our work. The most bothersome
syntactic issue was the lack of transparent pass-by-reference. Recall that
most Rivl commands come in two flavors: destructive (the argument is
modified) and non-destructive (the result is returned). To implement
destructive operations in Tcl, the name of the variable must be passed in
without the leading $. The result is an inconsistency in syntax, as the
following equivalent lines of code show:

im_scale! image 0.5
set image [im_scale $image 0.5]							
Predictably, a common error is to mistakenly insert or omit a dollar sign.

5. Related and Future Work

Many commercial packages are available that provide software libraries of
image manipulation functions. Some use demand-driven execution to achieve
similar optimizations as those mentioned in section 4. These include the
Pixar system described by Shantzis[8], and Silicon Graphics'
ImageVisionLibrary[11]. Holzmann's Popi[4] allows image transformations to
be specified with concise expressions at run-time, a mechanism that permits
rapid prototyping of new image effects. We are adapting this idea to Rivl.

Matthews, Gloor, and Makedon's VideoScheme [6] combines Apple's QuickTime
movie player with a Scheme-based video manipulation language. In
VideoScheme the user works with objects close to the underlying
implementation of video data, such as pixel arrays and frames. This
low-level access gives users considerable flexibility in creating new image
operations. For example, algorithms for detecting "cuts" in video can be
easily built out of pixel array primitives. In contrast, Rivl's high level
of abstraction allows it to exploit delayed computation for improved
efficiency, and its resolution independence makes programs more portable.

Rivl is implemented with 4000 lines of C code and 500 lines of Tcl code. It
has been ported to the Sun OS, HPUX, and Linux operating systems. We plan
to release Rivl version 1.0 in early summer 1995.

The Rivl language is still evolving. We are extending the core set of Rivl
primitives to support other types of video processing, such as image
analysis and vision, text titling, and morphing. We are also building a Tk
video editor on top of Rivl. The editor provides similar features to other
video editors such as Avid's VideoShop[1]. Users can edit multiple
sequences, perform assembly editing, and apply video effects in a
convenient manner. But since the editor is based on Rivl, users can exploit
the power of the Rivl interpreter to define effects and perform tasks
beyond the scope of the graphical user interface.

6. References

[1] Avid VideoShop. Avid Technology Inc. Tewksbury, MA.
[2] Data Explorer software package. IBM.
[3] J. D. Foley, et. al., Computer Graphics: Principles and Practice,
second edition, Reading, Mass. Addison-Wesley, 1990.
[4] Holzmann, Gerald J. Popi. AT&T Bell Laboratories. Murray Hill, NJ.
[5] D. Le Gall, MPEG: a video compression standard for multimedia
applications, Communications of the ACM, April 1991, Vol. 34, Num.4,
pp. 46-58.
[6] Matthews, James, Peter Gloor, and Fillia Makedon. "VideoScheme: A
Programmable Video Editing System for Automation and Media Recognition."
ACM Multimedia `93 Proceedings, pp. 419-426.
[7] W. B. Pennebaker, JPEG still image data compression standard, Van Nos
and Reinhold, New York, 1992.
[8] Shantzis, Michael. "A Model for Efficient and Flexible Image
Computing." SIGGRAPH `94 Proceedings, pp. 147-154.
[9] Ousterhout, John K. Tcl and the Tk Toolkit. Addison-Wesley,
Massachusetts, 1994.
[10] Rasure and Kubica, "The Khoros Application Development Environment",
Experimental Environments for Computer Vision and Image Processing, editor
H.I Christensen and J.L Crowley, World Scientific 1994.
[11] Silicon Graphics Inc. ImageVision Library. Silicon Graphics Inc.,
Mountain View, CA.

Table 1: Image primitives
im_read file: Read an image file (ppm, pgm, jpeg)
im_write image file: Write the image to a file
im_trans image dx dy: Translate an image
im_rotate[C] image theta: Rotate an image; C = around its center
im_scale[C] image s: Scale an image; C = around its center
im_crop image x1 y1 x2 y2: Crop the specified rectangle to make a new image
im_overlay image image ...: Overlay images
im_fade image factor: Darken the image according to factor
im_convolve image kernel: Convolve the image with specified kernel

Table 2: Sequence Primitives
seq_read filepat: Read sequence from MPEG or multiple image files
seq_write seq filepat: Write sequence to disk
seq_crop seq begin end: Crop the specified range to make a new sequence
seq_concat seq seq ...: Concatenate multiple sequences end to end
seq_overlay seq seq ...: Overlay the sequences to form one composite
seq_speedup seq factor: Speed up or slow down sequence
seq_shift seq delta: Shift sequence in time
seq_reverse seq delta: Reverse sequence in time
seq_to_ims seq: Return a list of images by sampling the sequence
ims_to_seq imagelist: Construct a sequence from the list of images
seq_map seq script: Apply a script to each image in the sequence

Figure 2: fractal program
rvl_proc fractal {image n} {
	set dx [expr 0.25 * [im_width $image]]
	set dy [expr 0.25 * [im_height $image]]
	for {set i 0} {$i < $n} {incr i} {
		# 1. Shrink to half size.
		im_scaleC! image 0.5
	
		# 2. Create and move three duplicates.
		set north [im_trans $image 0 -$dy]
		set sw [im_trans $image -$dx $dy]
		set se [im_trans $image $dx $dy]

		# 3. Merge the three duplicates.
		set image [im_overlay $north $sw $se]
	}
	return $image
}

Figure 3: connectWithTransition procedure
proc connectWithTransition {transition movieA movieB duration} {
	set lengthA [seq_length $movieA]
	set lengthB [seq_length $movieB]

	# Untouched parts of first and second movie
	set begin [seq_crop $movieA 0.0 [expr $lengthA-$duration]]
	set end [seq_crop $movieB $duration $lengthB]

	# Apply timed effect to end of first movie; overlay with
	# beginning of second movie
	set mid1 [seq_crop $movieA [expr $lengthA-$duration] $lengthA]
	set mid2 [seq_crop $movieB 0.0 $duration]
	set middle [seq_overlay [seq_map $transition $mid1] $mid2]
	
	seq_concat $begin $middle $end
}