LISA '06 Paper
Fighting Institutional Memory Loss: The Trackle Integrated Issue and Solution Tracking System
Daniel S. Crosta and Matthew J. Singleton - Swarthmore
College Computer Society
Benjamin A. Kuperman - Swarthmore College
Pp. 287-298 of the Proceedings of LISA '06:
20th Large Installation System Administration Conference (Washington, DC:
USENIX Association, December 3-8, 2006).
Abstract
For part-time sysadmins, a record of past actions is an invaluable
tool that provides guidance in repairing or extending system services.
However, requiring sysadmins to keep a detailed log of changes made to
a live system can often seem like a low priority task when compared to
addressing long and growing to-do lists. This problem is worse if the
system administrator is a part-time volunteer and an overworked
student. In this paper we present Trackle, an integrated trouble
ticket and solution tracking system which takes the legwork out of
creating and maintaining this sort of institutional memory.
Furthermore, Trackle is designed to allow untrained student sysadmins
to bootstrap their knowledge by peeking over the shoulders of their
more experienced colleagues - even if those colleagues graduated years
earlier. We accomplish this by tracking the exact actions taken by
sysadmins, showing what lines were changed and in which configuration
files. We allow experienced and inexperienced sysadmins alike to
freely annotate and cross-reference these shell session logs through
an integrated Wiki web interface.
Introduction
The Swarthmore College Computer Society (SCCS) is a group of
student volunteers who provide services to more than 2,000 current
students, alumni, faculty, and staff of Swarthmore College. We provide
UNIX shell accounts, email, group mailing lists, web space for
individuals and student groups, Wikis, database access, and general
computer expertise. We also maintain a public lab of eight Debian
GNU/Linux and two Apple Mac OS X computers. The sysadmins meet weekly
for an hour to keep up to date on projects, problems, and planned
improvements, and communicate via email between meetings.
Unlike many large UNIX administration setups, the SCCS has no
full-time trained staff. The student sysadmins are hired once a year,
usually in groups of two or three, and most graduate after serving as
a sysadmin for two years. At any time we have about eight sysadmins on
staff. Since a part of SCCS's mission is to provide an environment in
which interested students may learn the art of system administration,
we recruit students with technical backgrounds ranging from ``I can
email and browse the web,'' to ``my desktop PC is a Beowulf cluster.''
This disparity in background and the high turnover rate presents us
with the dual problems of how to preserve our collective knowledge and
how to train future generations.
For many years, our primary means of training sysadmins has been
the SCCS staff email list and archived diaries. The diaries are a
cluttered repository; we use the staff email list as often to tease
one another as to discuss pertinent policy, administrative issues, or
help requests from our users. Though sysadmins are supposed to report
changes to our servers and lab machines to the list, we often have to
rediscover particular configurations when problems arise. Furthermore,
mbox mailboxes are a hard format to browse,
particularly for new, inexperienced sysadmins who are eager to learn
but more comfortable browsing the web than navigating the sometimes
murky waters of UNIX shell prompts, commands, and filesystems.
In the last three years, SCCS staff have used a private Wiki to
document changes, configurations, and internal policies as well as for
collaborative planning. We have found the unconstrained and highly
interlinked environment very valuable to our operations, but
relatively little of the content on the Wiki is pertinent system
configurations or records of changes made. As with the email diaries,
the extra time and effort required to self-report changes made to our
systems presents too high a hurdle for our volunteer sysadmins, each
of whom has a range of other academic and extra-curricular
commitments.
For training, we needed a system that preserved the ability to
associate high level conceptual descriptions of desired configurations
or problems with the particular shell actions or configuration file
changes to put those ideas into effect. Unlike the Wiki and email
list, we wanted a system that did not require sysadmins to take on the
extra burden of having to write up complicated records of their
actions after long, sometimes frustrating problem-solving sessions.
Furthermore, since our sysadmins are hired without any requirement of
previous experience, we wanted a tool which could help new sysadmins
learn the subtleties of UNIX administration and our particular setup
without requiring any former familiarity with UNIX.
To respond to these concerns, we created Trackle, a web- and
console-based integrated issue and solution tracking system. Each
problem reported to the system, either by our users or by SCCS staff,
creates a ticket in the issue database. Open issues may be viewed
through the web where they can be freely annotated, edited, and cross-referenced
with the built-in Wiki; or through a console interface,
from which a tracked shell session may be started. All actions taken
in the shell session, and all files changed, are recorded in the
Trackle database, and create an entry in the ticket history log. These
shell session logs are then editable in Wiki fashion.
Existing and Related Work
We first attempted to discover if any existing tools could be used
to solve this problem. Since a sysadmin's job is primarily task-oriented,
we began our search with trouble ticket and issue tracking
systems. Issue tracking systems seem to have lost appeal in the
literature since the late 1990s. Before that, there were one or two
issue tracking papers per year in LISA, most of which explored
different extensions to the basic trouble ticket idea.
One early tool, Request 2 [8], was designed with a pedagogy of
training junior system administrators in mind. With Request 2, senior
sysadmins could divide and delegate work to less experienced admins,
and oversee their work or offer advice.
PLOD, the Personal LOgging Device [6], enables sysadmins to easily
self-report by providing a set of command-line Perl scripts which the
sysadmin may use to make notes to him or herself about work just
completed. PLOD simplifies the tasks of keeping records consistently
and making them centrally available. PLOD did not include, and was not
integrated with, any issue tracking system.
The LOS Task Request System [9], designed several years ago by
another SCCS sysadmin, allows users to create ticket-like task
requests through a web interface. Each task request is associated with
a task description, which includes validation logic and the commands
required to execute the task. This alleviates sysadmin workload by
automating the tedious tasks of collecting and verifying information
for repetitive changes to the system.
A related field of more current interest is configuration
management and version control. Configuration management systems
attempt to centralize or abstract configuration details for large
numbers of similar hosts. Tasks include initial setup, maintenance,
troubleshooting, and incorporating local changes back into the global
description. For a more careful consideration of the history and
future of configuration management, see [1, 3, 7].
The SCCS currently has no plans to use a configuration management
system at our site. Our two primary servers do not share enough common
configuration to yield much advantage from such a system, while our
public lab machines, which are configured similarly, are periodically
reinstalled to a known consistent state for security reasons. For
these reasons, we do not believe we would gain enough benefit from a
configuration management system to justify the setup effort.
Much of the time our sysadmins spend working involves user-owned
files, such as spam filter settings, dotfiles, and web access
configurations. Since these files may be created and deleted by users
at any time, a configuration management system would not be a good
choice for monitoring changes to these files.
Moreover, though configuration management may be considered an
industry best practice, we believe no automatic configuration system
can adequately replace a sysadmin's ability to get dirty with manual
configuration and hand-tuning of UNIX services and applications. We
would rather our sysadmins learn how to discover a problem at its
source and implement a solution appropriately than merely learn how to
control a single piece of software, no matter how powerful it may be.
One of the chief goals of Trackle is to provide a platform for
training inexperienced sysadmins. We believe that it is important to
gain an understanding of UNIX cause and effect by observing and
learning about configuration files, logs, commands, and scripts. If
new sysadmins only learn to rely on abstract configuration management
systems, they might find themselves unprepared to deal with problems
outside the scope of those systems.
None of the systems we considered deploying covered our needs for
a system to track issues, actions, and train new sysadmins. A ticket
tracking system would help organize and abstract the institutional
memory knowledgebase, but would still require self-reporting. A
configuration management system would keep accurate records for the
files that it tracked, but would not provide a good platform for
training incoming sysadmins. Having searched and failed to find just
what we wanted gave us a chance to reevaluate our problems, and we
decided that only by implementing a custom solution could we meet our
requirements.
Design of Trackle
From the beginning, the design of Trackle has been motivated by a
philosophy of minimalism and simplicity. We believe that the most
useful system for our needs is one that imposes as little as possible
on our sysadmins in terms of new workflows and interfaces to learn. We
want Trackle to stay out of the way of sysadmins, but at the same time
to automate as much of its own data collection and presentation tasks
as possible.
It is always difficult to find the right balance of the automatic
and the manual, the consistent and the customizable. Given the sort of
information Trackle manages, we believe the right place for the cut is
at data collection and initial presentation. Creating the complex
knowledge structures through annotation and cross referencing is left
to the sysadmins as a secondary task, since it is not one that a
computer is likely to do well.
Trackle is intended for smaller groups of sysadmins with a
relatively low volume of requests. It is designed to be used
effectively by both expert and inexperienced sysadmins. Trackle's
information organization methods are best suited to small groups who
have time to annotate and cross-reference collected data. The goal of
allowing this rehashing and reorganization of information is to
provide a platform for self-directed training of inexperienced
sysadmins on their own time.
Requirements
Two classes of users: Since the SCCS naturally has
two classes of users, sysadmins and SCCS end users, Trackle was
designed to accommodate both types. Differentiating the web interface
for the two types of users allows us to improve both security and
convenience. Unauthenticated end users are denied sensitive
information, such as email addresses in tickets or files in logged
shell sessions, and are not shown potentially confusing prompts when
creating tickets. Sysadmins, on the other hand, need access to all the
information about tickets and shell logs, and should know enough to
understand the more detailed ticket attributes.
Low barriers to use: One problem with existing
ticket tracking systems is the often overwhelming amount of
information that is requested to submit a ticket. With Trackle, we
wanted to make filing a ticket as easy as possible so that end users
and sysadmins alike can move quickly through the web forms. Our
interfaces are designed to be intuitive so that they can be used
without having to waste time reading documentation.
For end users, we wanted Trackle to be as easy to use the first
time as the tenth. End users do not have to register accounts with
Trackle to create or subscribe to tickets. Instead we use an email
confirmation system to verify an end user's identity.
High-level organizational tools: Trackle's central
goal is to provide a framework for flexible representation of the
collected data. It is designed to facilitate Wiki-like annotation and
cross-referencing rather than imposing a fixed organizational
structure. Objects in Trackle support Wiki formatting, enabling them
to link to each other. With these tools, the recorded data can be
arranged, indexed, and presented in the ways most useful to the
sysadmins who need access to it.
Few dependencies: Trackle was designed to be
dependent on as few external software packages as possible. It is
specifically designed to work with the packages and versions available
in the Debian Sarge GNU/Linux distribution (our deployment platform);
however, we wanted to make Trackle freely available for any interested
groups for use on many platforms. To ensure easy portability, it is
not closely tied to any particular versions or packages.
Ticket System
As discussed earlier, we decided to use a ticket tracking system
as the basis for Trackle's higher-level categorization of captured
data. We briefly considered RT,[Note 1]
but decided that it would be too confusing and cluttered with
information to be useful to inexperienced sysadmins. Instead, we
decided to use a relative newcomer to the trouble ticket world,
Trac,[Note 2] as our foundation.
Trac is a BSD-licensed bug tracking system with integrated version
control and Wiki. Trac embodies a minimalistic approach to ticketing
which fit well with our design goals: it presents a clean, interface
intuitive to both our end users and sysadmins, and has relatively few
features and ticket attributes specific to bug tracking. Many of the
features we wanted (easy annotation of tickets and shell histories,
high-level organization and cross-referencing through the integrated
Wiki, file change visualization) were already implemented, and adding
our own functionality was very easy due to the well-organized Trac
API.
However, Trac required more than cosmetic changes to fit our
needs. Despite its flexible permission system, Trac does not natively
support multiple classes of users, nor does it have ticket locking.
Additionally, we had to modify the ticket status logic to accommodate
unconfirmed tickets.
Tracking File Changes
Tracking file changes was the biggest challenge encountered when
designing Trackle. Because we do not use any configuration management
or version control, we needed to ensure we could get a before and
after view of each file modified during a sysadmin's shell session. We
considered several alternative strategies before eventually settling
on a custom interposition library. The possibilities may be roughly
split into kernel-space solutions (LVM snapshots, UnionFS, FUSE, C2
audit logs) and user-space solutions (plugins/history interpretation
and library interposition).
LVM snapshots: The Linux Logical Volume Manager[Note 3]
(LVM) provides a way to create a time-frozen snapshot of a volume. In
Trackle, we could use a LVM Snapshot to capture the pre-shell tracking
session state of the system and then compare the two filesystems at
the end of the session. LVM works below the filesystem and captures
changes by tracking changed disk blocks rather than changed files. To
use LVM with Trackle we would have to relate changed disk blocks to
changed files, or compute a recursive diff between the two
hierarchies; the former would be prohibitively hard and would be tied
to particular filesystem implementations, and the latter would be
prohibitively slow. It is also unclear how to differentiate what files
were changed by the user during the shell session and which files were
merely changed by other users or processes during that time.
Additionally, it would impose LVM as a runtime dependency for Trackle
and limit Trackle to use on Linux.
UnionFS: UnionFS[Note 4]
is a meta-filesystem that stacks one or more existing, mounted
filesystems into one hierarchy. It supports copy-on-write so that a
read-only filesystem can be made to appear as a read-write filesystem.
This functionality is used by many Live Linux Distributions, like
SLAX[Note 5]
and KNOPPIX.[Note 6]
When operating on a file in a UnionFS stack, the topmost instance of
the named file is used. This means that in order to keep a initial
copy of the file for Trackle's use, the underlying filesystem would
have to be mounted read-only. Since it would be impractical to remount
the entire filesystem hierarchy read-only, Trackle would have to
create a read-only bind mount of the root hierarchy, to use as the
bottom of the UnionFS stack. On top of that would be placed a read-write
filesystem mounted elsewhere to capture the revised versions of
files. Because this third filesystem is hierarchically removed from
the active root filesystem, any changes made during the shell tracking
session would not be reflected to the root filesystem unless manually
copied (for instance, by a special Trackle command) or when
synchronized at the end of the session. Additionally, UnionFS requires
the insertion of a new kernel module, which some sysadmins may be
hesitant to allow. At this time, UnionFS is only available for Linux.
FUSE: The Filesystem in Userspace[Note 7]
(FUSE) technique works similarly to the interposition library approach
we finally adopted. By use of a special kernel module, some or all
filesystem-related library calls can be delegated to a user space
daemon. Trackle could use FUSE to make note of which files the user is
accessing and copy the files' initial state before returning from the
user's library call. This provides about the same level of flexibility
as using an interposition library. Using FUSE requires loading a
kernel module and mounting and unmounting filesystems. In order to use
FUSE with Trackle, we would have to create additional setuid-root
binaries to handle these tasks. FUSE is currently only available for
Linux and FreeBSD.
C2-like audit logs: Many operating systems include auditing
facilities to track file accesses as one of the requirements for a
trusted computer system [10, 2]. There are projects to bring this type
of auditing to GNU/Linux such as SNARE for Linux.[Note 8]
and SELinux[Note 9]
These systems require kernel modifications, and are typically enabled
for the entire system instead of just a process and its subprocesses.
Consequently, it is difficult to dynamically enable auditing, which
results in the generation of overwhelming amounts data. Additionally,
these audit logs only notify us that a file has been modified after
the fact, limiting our ability to determine what changes were made.
Plugins and shell history interpretation: In theory it is possible
to capture the majority of file changes made by a sysadmin by writing
plugins for the editors Vim and Emacs to record files opened or saved,
and by looking at the command history of the shell to infer which
files might have been modified. This approach can never be more than
approximate as there are many other ways to change a file other than
by these two editors. Even writing plugins for these editors will not
capture changes made to underlying files by wrapper programs like
vipw that act on temporary files. Additionally, editor
plugins do not capture file changes made by users at the command line
either by shell redirection (``echo stuff >>
outfile'') or file-related commands (mv,
cp, rm, chmod, etc.).
While parsing a shell history file might be able to discover these
kinds of changes, it would have to happen after the commands changing
the files had run. This would prevent us from tracking file changes,
and the list of command patterns to check would be quite large.
Wrapping all file change tools and implementing a custom shell would
impose a large (if not insurmountable) maintenance task.
Library interposition: Most binary executables on UNIX are
dynamically linked and access files through standard C library
functions (open, unlink,
chmod, etc.) The dynamic linker/loader checks the
environment variable LD_PRELOAD for a list of shared
libraries to be loaded and searched before all others. These libraries
are called interposed libraries because any function that is
defined in one of these libraries intercepts the call to standard
libraries.
We use a custom interposition library to intercept file-related
library calls so that we can track changes to files during a shell
session. When we intercept the call, we are able to gather information
about the current state of the file before passing the request on to
the real library call. We are also able to observe the resulting
changes and collect additional information as needed.
We decided to use an interposition library for a number of
reasons. It can be written so that it only captures the events we are
concerned with, including privileged operations (using
sudo, su, etc.). It is a userspace
solution, so it is not dependent on any particular kernel and is
portable to most other UNIX platforms. Finally and perhaps most
importantly, the majority of the functionality was already present in
a component of Audlib [4, 5] designed to track user and sysadmin
actions for detecting abuse of authority.
Trackle Architecture
Trackle consists of four functional components, each of which
communicates with a central database:
-
The web interface, which allows full access to the ticket database,
Wiki pages, and shell session logs. It is the primary interface to all
Trackle data.
-
The console tools, which allow sysadmins to access the ticket
database, and begin shell tracking sessions to capture changes made to
resolve a problem.
-
The interposition library, which is responsible for tracking changes
to files during a tracked shell session.
-
The email notification system, which keeps end users and sysadmins
informed of ticket status changes, and the email confirmation system
which enables unauthenticated end users to interact with aspects of
the web interface.
The web interface, console tools, and email system are all written
in Python and communicate directly with the central PostgreSQL[Note 10]
database. The shell tracking session is written in C, and communicates
through the console tools (see Figure 1).
Figure 1: Trackle consists of four components which
store their data in a central PostgreSQL database.
Web Interface
The Trackle web interface provides full access to tickets, shell
logs, and an integrated Wiki. The web interface has been designed with
simplicity and integration in mind. Like most of Trackle, the web
interface is written mostly in Python. Trackle uses Clearsilver[Note 11]
for its HTML templates. The web interface recognizes two classes of
users, anonymous end users and authenticated sysadmins. Because they
are not required to log in, end users are required to complete email
confirmation when creating tickets. Sysadmins will be able to
authenticate with the system, which will allow them to perform
privileged tasks.
Tickets: Trackle's integrated issue tracking system is more than
just a to-do list. It provides the initial framework for organizing
automatically collected data, linking an abstract description of a
problem and the concrete steps required to solve it. All users may
create tickets. End users are presented with a streamlined form
(Figure 2) requesting:
-
contact email address
-
brief problem description
-
problem type
-
problem severity
Figure 2: The ticket creation form for anonymous
users is streamlined so that a user can file a ticket quickly. The set
of fields displayed and the explanation shown next to each is
configurable.
In addition to these fields, authenticated sysadmins are prompted
for more information which would not be relevant to end users:
-
priority
-
sysadmin assignment
-
keywords
-
subscription list
Ticket browsing is also available to anonymous and authenticated
users. Again, the two user groups have very different views. Anonymous
users are presented with a minimal amount of information about the
ticket, for convenience, security, and privacy. They can also
subscribe to existing tickets. Authenticated users have access to the
entire ticket history, have the ability to change the ticket fields,
and can close, reopen, and assign tickets.
Wiki pages: Trac's integrated Wiki required very little
modification for Trackle. A Wiki enables users to add, remove, and
modify content easily and quickly directly through their web browser
using a simple formatting language. Wiki pages are visible to both
authenticated and anonymous users, but are only modifiable by
authenticated users and can be made private. Most long text fields in
Trackle support Wiki formatting to allow even more possibilities for
cross-referencing and annotation.
Shell session logs: All the information recorded in a shell
tracking session is presented in the web interface in a shell session
log (Figure 3). They are accessible to authenticated sysadmins through
an index where they are sorted by ticket, with the most recently added
shell log appearing first. The logs are also linked individually from
the associated ticket. In keeping with the ideal of providing the most
functionality while imposing the least structure, Trackle shell logs
are editable and support Wiki formatting. The Wiki-like nature of the
shell logs allows for easy annotation and cross-referencing.
Figure 3: The shell session log displays all the data
that was recorded during the shell session. At the top, the shell
history, start and end times, and environment variables are displayed.
Below, the contents of files that were changed are shown with
deletions highlighted in red and additions in green.
After the Wiki portion of the log (which contains shell history,
environment variables, and any other pertinent shell data), the
modified files are displayed as colored diffs. All of the modified
files are stored in the database, but not all are displayed by
default. The shell log edit page provides a list of all files from
which individual files can be selected for display.
Console Tools
The Trackle console tools consist of several scripts supporting
the main interface, trackle-cli. Like most of the rest
of Trackle, the console tools are written in Python for easy
extensibility and maintenance.
Trackle-cli: The primary interface to the ticket database from the
console is trackle-cli, a screen-oriented program for
the curses environment. It is written using DTK,[Note 12]
a Pythonic wrapper for the curses Python module.
Following our goal of minimalistic interfaces, trackle-cli
is designed to give enough information that sysadmins can
quickly find a ticket and begin working on it, but not so much that
they are bogged down by text-filled screens of details. The two
primary screens of trackle-cli are the ticket overview
and ticket detail screens.
The ticket overview screen (Figure 4) lists the open tickets,
showing for each the values of a configurable set of ticket fields.
The default set shows the ticket's numeric ID, summary, owner, and
time of last change. Sysadmins navigate this list using the keyboard
and may either select an existing ticket or create a new ticket.
Figure 4: Trackle-cli's ticket
overview list, sorted by change date. The ``Help Bar'' at the top
lists currently active keybindings. Trackle-cli is
designed to work in an 80 × 24 character window.
The ticket detail screen (Figure 5) shows the most pertinent
details of an individual ticket, with short ticket fields displayed
above the fold and the longer ticket description shown below in
unformatted Wiki text. Some ticket information, notably the ticket
change log, is not available through the console interface due to
space constraints.
Figure 5: Trackle-cli displays the
most pertinent ticket details, and allows simple editing to correct
any mistakes. From this screen the sysadmin may begin recording shell
actions to be associated with this ticket.
Trackle-cli supports editing of all visible
information in the ticket detail screen. After selecting a field, the
focus is moved to either a text editing field or an enumerated
selection field which displays appropriate available values.
Enumerated fields can have possible values added to them through the
web interface or trackle-admin program. Ticket editing
in trackle-cli is designed to allow a sysadmin to
correct minor errors found in a ticket - full-featured editing is
available in the web interface.
Sysadmins can edit the ticket description using the editor set in
the VISUAL or EDITOR environment
variable. When the editor terminates, the description display area is
updated with the new value. If the sysadmin wishes to save the changes
back to the Trackle database, trackle-cli prompts for
a ticket changelog message using the same method.
From the ticket detail screen, the sysadmin can start a tracked
shell session. Trackle-cli is suspended and replaced
by a shell with a modified environment. Upon completion of the shell
session, the sysadmin is shown a list of all the files that were
modified during the session (see Figure 6). The sysadmin then selects
files to display in the web shell session log. This selection may be
changed later through the web interface. After selecting the files and
saving the session log, the sysadmin is returned to the ticket detail
screen.
Figure 6: Trackle-cli allows the
sysadmin to decide which files will be displayed in the web version of
the shell session log.
Trackle-shell-prompt: Our experience at the SCCS with the beta
release of Trackle showed that
even experienced sysadmins occasionally forgot whether they were in an
ordinary shell or in a tracked shell spawned by trackle-cli.
To address this, we created trackle-shell-prompt,
a helper script which detects whether the user is
running inside a tracked shell session by checking for certain
environment variables. If so, it prepends the prompt with ``Trackle''
in green boldface. Trackle-shell-prompt also sets its
exit code to 0 when in a shell tracking session, or 1 when not, so
that it can be used in shell scripts.
Trackle-admin: This tool is used to configure the run-time
settings of Trackle stored in the central database. Trackle-admin
is used to add, update, or remove accounts (only
sysadmins need an account for Trackle - end users may use the web
interface without authentication), and the following enumerated ticket
attribute fields: component (a brief description of the area of the
system affected by a problem), milestone, priority, user severity, and
ticket type. Wiki pages can be imported from or exported to plain text
files. Like trackle-shell-prompt, the exit code
returned by trackle-admin is set to 0 on success, or
another value on error.
Shell Session Tracking
The ability to track shell sessions sets Trackle apart from
previous solutions. The tracking system removes the burden of self-reporting
from the sysadmin. Relying on the sysadmin to report exactly
what changes were made can be problematic. After a long session, even
the most conscientious sysadmin may be unable to remember exactly what
they did, let alone record it accurately. A seemingly inconsequential
change made early on may be overlooked after tackling a more
frustrating issue.
Recording incorrect or incomplete information defeats the purpose
of tracking changes because it reduces the accuracy of the report and
degrades the utility of the system as a training tool. The goal of
Trackle is to automate as much record collection as possible in order
to avoid human error.
To provide an accurate summary of a shell session, Trackle
collects information about the session:
-
start and end time
-
environment variables
-
shell history
-
copies of all created, removed, or modified files
Most of these items are easy to collect using built-in Python
commands. The history of shell commands can be obtained from the
bash shell.[Note 13]
This information is stored in a state directory created for each
tracked shell session.
Recording file changes is the tricky part. For this we use library
interposition. Our interposition library, libtrackle,
is based on the work done by Kuperman and Paksoy for
Audlib. We can intercept library calls by writing our
own version of the call in question with the same signature.
Effectively, we wrap the library call with additional functionality to
support Trackle. In this function we record any information we need
and then pass the call on. This whole process is transparent to the
running program, and as long as the parameters and return values are
passed through to the real version without modification, interposition
does not change the program's behavior.
In order to catch file modifications we need to intercept file-related
calls (fopen, chmod, etc.)
before they get to the kernel so that we can make an initial copy of
the file in question. Once we have a copy, we allow all further calls
on that file to pass through to the kernel without logging. Just
before the program terminates, we make another copy of the file to
compare against the original version.
In addition to intercepting calls, interposition allows us to
define functions that are called when the library is first loaded and
when the process terminates. We use the initialization routine to
cache local state variables originally set in the environment by the
console tools. In the finalization function we iterate over all files
that have been accessed during the execution of the program and make a
final copy of all files which have been modified.
A file might be modified several times during the course of a
tracked shell session. We record the initial copy only once but
overwrite the final copy each time a program terminates. Therefore,
the difference information generated consists of changes made over the
entire session.
Some programs written with security in mind attempt to disable
library interposition (for good reasons), and sysadmins are encouraged
to use these sorts of tools (e.g., sudo) when changing
system configuration. In order to collect these changes, we need to
prevent these programs from disabling libtrackle. For
example, the environment created for executing vim by
``sudo vim /etc/filename.cfg'' lacks the
LD_PRELOAD environment variable, effectively disabling
libtrackle. We could not find a way to override this
behavior in sudo without modifying its source code. We
circumvented this issue by overriding the exec library
calls with our own versions that reset the LD_PRELOAD
variable to the proper value before executing the program.[Note 14]
This solution ensures that libtrackle is loaded for
all dynamically linked programs.
Email Notification and Confirmation
Email plays two roles in Trackle, providing notification to
sysadmins and end users, and email confirmation to prevent abuse of
the web interface by unauthenticated users.
Whenever a sysadmin changes a ticket's status, like closing an
open ticket or assigning a new ticket, a configurable list of staff
email addresses (in SCCS's case, our staff email list address) is sent
a notification indicating the status change and any comments or
changed fields changed at the same time. Subscribed users receive an
abbreviated version of this email containing just the status change
information and the comment.
Trackle also provides the trackle-notify script which
sends periodic updates to the staff email addresses. The email
contains a list of all open tickets, details for each open ticket, and
a list of ticket status changes since the previous periodic update was
sent. Trackle-notify is designed to be run through
cron or another task scheduler, therefore it produces
no output under normal circumstances.
The email confirmation system allows end users to confirm their
identity without requiring them to get, remember, and use a separate
set of user names and passwords. When an end user creates a ticket or
subscribes to an existing ticket, Trackle sends an email to the
address the end user entered. Users click the link in the email to
make the action take effect.
Sample Workflows
Trackle is designed with human interaction in mind, and as such
any description of its components does not fully capture the feel of
the application. In this section we describe two sample workflows to
demonstrate the expected usage of Trackle: 1) the lifecycle of a
typical ticket created by an end user, and 2) a sysadmin annotating
existing resources for future reference.
Ticket Lifecycle
Suppose SCCS user Alice is unable to access the administration
page for her group's email list. She points her browser at Trackle's
ticket creation page, https://bugs.sccs.swarthmore.edu/, and begins to
fill out the form. She is prompted for her email address, a brief
summary, a ticket type (software bug, software request, security
issue, etc.), a user severity (that is, how severe the issue is to
Alice) and a longer description of the problem. Alice notices the link
to a reference on Wiki formatting, and helpfully formats her
description into several sections including error output she received,
a link to the page with the problem, etc.
When Alice clicks the submit button, she is taken to a screen
informing her that she will receive an email containing instructions
and a link to confirm her ticket, and that her email address has been
automatically subscribed to the ticket for status updates. Without
clicking this link, her ticket will not show up in ticket reports, and
the sysadmins will not be notified. Once she clicks the link, Trackle
marks the ticket as confirmed and emails the sysadmins to notify them
of the newly created ticket.
Now suppose SCCS user Bob is also unable to access the
administration page for his group's email list. Because he read the
page on ticket etiquette, he knows to first use Trackle's built-in
search feature to see if any open tickets are already filed for this
problem. He notices the ticket Alice just created, and confirms that
it is for the same issue he is facing, so he does not need to file his
own ticket. But Bob is impatient and has important work to do on his
list, and wants to know as soon as the problem is resolved.
Fortunately, Trackle allows Bob to subscribe to the ticket Alice
created, again using email confirmation to verify his identity.
Now suppose SCCS sysadmin Melvin checks either his email or the
Trackle website and notices the newly created ticket. Being our
resident expert on Mailman list administration, Melvin SSH'es to our
primary web and email server, runs trackle-cli, and
begins working on the ticket. Behind the scenes, trackle-cli
has locked his ticket so that no other sysadmin can begin
working on the same issue. His shell history as well as any files he
changes are now being recorded, so as he investigates and solves the
problem, all his actions are logged. When he types
`exit' to leave the shell tracking session and returns
to trackle-cli, he is shown a list of all the files
whose contents or permissions he changed, and he can select which will
be included in the log by default. When he saves the shell session log
to the database, the ticket is automatically unlocked.
Since, by virtue of his Mailman wizardry, Melvin resolved the
problem for Alice and Bob, he opens up the ticket's page in Trackle's
web interface, makes some remarks to be appended to the ticket's
history, and marks the ticket closed. Trackle then emails all the
subscribed users to notify them that the issue has been resolved. This
notification includes the remarks Melvin appended to the ticket's
history.
Annotation and Cross-Referencing
Suppose SCCS sysadmin Wendy, our resident Wiki enthusiast, decides
it is time for her to learn more about Mailman. She begins by
searching through the existing Wiki pages, tickets and shell session
logs to find anything that might be related to Mailman lists. She
creates a new Wiki page, which she protects from anonymous access
since it may contain sensitive information that should not be leaked
to the public, and begins adding links.
In Trackle, all Wiki pages, tickets, shell session logs, and
milestones are linkable objects, and all have built-in Wiki support so
that any of them can link to any of the others. Wendy takes advantage
of this by not only including forward links from the new Wiki page to
the related tickets and shell session logs, but by editing those
objects to include return links to the new page she is creating.
By default, the shell session logs that are captured by Trackle
are pretty bare and contain just collected information. Since Wendy is
in no hurry, she is at ease to spend some time investigating the
effects of the commands recorded in the shell history of the session
log created by Melvin above. She can integrate this information, along
with links to other Trackle objects, online documentation, or any
website, by editing the Wiki page at work behind each shell session
log. She can remove commands from the history that are not pertinent
so that the shell session log contains just the relevant information
for future reference. She can also change the set of files which are
displayed in the shell session log, overriding the defaults Melvin
selected at the end of his shell tracking session.
Future Plans
As with any large software project, the initial public release of
Trackle is far from complete. As frequently happens with new software,
some of the most interesting features were not suggested until the
SCCS started using Trackle, and others were inspired as a byproduct of
the development process. Many of these new feature ideas that came to
us mid-project made it in to the initial public release, but some did
not. The following are planned features for Trackle that have not yet
been implemented.
Ticket extensions: Early feedback from SCCS
sysadmins has shown that some extensions to the ticket system may be
useful, particularly the ability to express relationships among active
tickets. A dependency relationship (the completion of ticket B can
only happen after ticket A is closed) could hide or deprioritize
tickets depending on others, or emhpasize tickets which are depended
on. A parent-child relationship would be used to group several related
tickets (e.g., configuration upgrades) under one parent (e.g., Linux
distribution upgrades). Ticket due dates would enable Trackle to
automatically escalate a ticket's importance over time. Finally,
private tickets (not visible to unauthenticated end users) would be
used to track sensitive information such as hiring or policy debates.
Multiple machine support: All of the components of
Trackle interact with one central PostgreSQL database. Currently, the
Trackle console tools do not keep track of host-specific info, so only
one machine can be tracked per instance of Trackle. Because PostgreSQL
communicates over TCP, it should not be difficult to add network
functionality to the console tools. It would then be possible to
install the console tools on any number of properly configured servers
and clients, and have them all report back to one central database.
There are, however, some situations where a per-machine instances
of Trackle might be useful. Trying to track many disparate issues
occurring on unrelated machines would become cumbersome and is
unnecessary. An alternative that might provide the best of both worlds
would be a hierarchical approach. Rather than storing all the data on
one central server, leave the data distributed, but allow
communication between the individual instances of Trackle. For
example, deploy one network-wide overview server, one site-wide
overview server per site, and install Trackle individually on the
machines at each site. This could help mitigate some of the scaling
issues that would come with very large databases.
File revision control: We briefly mentioned
configuration management tools, some of which include revision control
functionality. Trackle is based on the open source program Trac, which
includes tight integration with the Subversion revision control
system, and could provide revision control for all files touched
during shell session tracking. A straightforward approach of storing
one revision per session would not work, as the files in question may
also change between tracked shell sessions. A better approach is to
create a revision at each of the before and after states for each file
in the repository.
Further high-level abstractions: One recent
innovation of Web 2.0 technologies is the establishment of new
interface and organization paradigms that more closely model how
people think about information. In particular, the tagging concept,
where arbitrary words or phrases are associated with each idea or
object in a system, supported by an interface which makes suggestions
about tags to apply, would further increase the utility and
scalability of Trackle. Some tags for shell session logs could be
automatically generated, for instance, a tag for each file and each
directory involved in a particular shell tracking session. Tags could
also be assigned for software packages involved in a shell tracking
session, by integrating with package management tools
(dpkg, rpm, etc.).
Conclusions
Our experience with the SCCS staff email list and Wiki has shown
that relying on self-reporting leads to missing, incomplete, or
inaccurate reports of changes made to our systems. The poor quality of
these reports makes it hard to find the source of a particular change.
Further, an inaccurate report might cause a sysadmin trying to
duplicate past steps to cause new errors instead of repairing existing
ones. Additionally, the presence of inaccuracies degrades confidence
in all reports. Trackle alleviates these problems by keeping
consistent and accurate records so that sysadmins can focus on solving
problems rather than the tedious task of keeping logs.
Trackle's integrated Wiki has allowed us to begin collecting
related topics into a sysadmin training manual and how-to guide. This
evolving guide allows new sysadmins to ask more sophisticated
questions of their experienced colleagues. Trackle allows sysadmins to
learn on their own time and at their own pace so that no one gets
bored or left behind.
Often, volunteer sysadmins learn the intricacies of a system only
when it breaks. Trackle allows us to learn by reflection rather than
by struggling to fix a critical error. This leads to more efficient
use of office time for those not present when problems occur. We also
learn from authentic situations rather than toy problems or contrived
examples.
Because our shell tracking tools operate transparently, Trackle
can be used to complement existing change/configuration management
systems. Though many configuration management systems have already
solved the problem of discovering file changes, Trackle goes further
to associate file changes with a particular issue. Additionally,
Trackle detects changes to any files, not just those that are already
being monitored by a configuration management system.
Availability
Trackle is open source software, licensed under the BSD license.
You may download stable Trackle releases and documentation from
https://www.sccs.swarthmore.edu/org/trackle/.
Acknowledgements
We would like to thank Benjamin A. Kuperman and Mustafa Paksoy for
their work on Audlib [5], on which libtrackle is
based. We would also like to thank Edgewall Software and the numerous
contributors to Trac, without which Trackle would not have been
possible.
Author Biographies
Daniel S. Crosta graduated from Swarthmore College in June, 2006,
with a B.A. in Computer Science. During his tenure at Swarthmore, he
served three years as Systems Administrator for the Swarthmore College
Computer Society, most recently as Lead Systems Administrator. He has
also participated in research in Computer Graphics at Princeton
University, and in Computer Vision at Swarthmore College. Since July,
2006, he has been working as a Software Developer at Wireless
Generation in New York City. Contact him electronically at
dcrosta@sccs.swarthmore.edu .
Matthew J. Singleton is a currently a senior at Swarthmore College
in Swarthmore, PA, where he is a double-major in Computer Science and
Linguistics. He is also the Lead Systems Administrator for the
Swarthmore College Compter Society. He has participated in
Computational Linguistics research in the Department of Computer
Science at Swarthmore College. Reach him electronically at
msingle1@sccs.swarthmore.edu .
Benjamin A. Kuperman received the M.S. and Ph.D. degrees from the
Department of Computer Sciences at Purdue University in 1999 and 2004.
He is an assistant professor at Oberlin College in Ohio and previously
taught at Swarthmore College in Pennsylvania. While at Purdue, he was
a researcher in the Center for Education and Research in Information
Assurance and Security (CERIAS) for five years and was affiliated with
COAST before that. His main areas of research are on host-based
computer security monitoring systems and OS level audit systems. Reach
him electronically at Benjamin.Kuperman@oberlin.edu .
Bibliography
[1] Anderson, Paul and Edmund Smith, ``Configuration Tools: Working
Together,'' Proceedings of LISA 2005: 19th Systems Administration
Conference, pp. 31-37, December, 2005.
[2] Common Criteria for Information Technology Security
Evaluation, https://www.commoncriteria.org/.
[3] Evard, Rémy, ``An Analysis of UNIX System
Configuration,'' Proceedings of LISA 1997: 11th Systems
Administration Conference, pp. 179-193, October, 1997.
[4] Kuperman, Benjamin A., A Categorization of Computer
Security Monitoring Systems and the Impact on the Design of Audit
Sources, Ph.D. thesis, Purdue University, West Lafayette, IN,
August, CERIAS TR 2004-26, 2004.
[5] Paksoy, Mustafa and Benjamin A. Kuperman, ``Audlib:
Generating computer security audit logs with interposing libraries,''
Presented at 2005 Swarthmore College Sigma Xi poster session,
September, 2005.
[6] Pomeranz., Hal ``PLOD: Keep Track of What You're Doing,''
Proceedings of LISA 1993: 5th Systems Administration
Conference, November 1993.
[7] Roth, Mark D., ``Preventing Wheel Reinvention: The psgconf
System Configuration Framework,'' Proceedings of LISA 2003: 17th
Systems Administration Conference, pp. 205-211, October, 2003.
[8] Sharp, James M., ``Request: A Tool for Training New Sys
Admins and Managing Old Ones,'' Proceedings of LISA 1992: 4th
Systems Administration Conference, October, 1992.
[9] Stepleton, Thomas, ``Work-Augmented Laziness with the LOS
Task Request System,'' Proceedings of LISA 2002: 16th Systems
Administration Conference, November, 2002.
[10] US Department of Defense, Trusted Computer Systems
Evaluation Criteria (also known as the `Orange Book') Technical
Report DoD 5200.28-STD, DoD Computer Security Center, Fort Meade, MD,
December, 1985.
Footnotes:
Note 1:
https://www.bestpractical.com/rt/
Note 2: https://trac.edgewall.com/
Note 3:
https://sourceware.org/lvm2/
Note 4: https://www.unionfs.org/
Note 5: https://slax.linux-live.org/
Note 6: https://knoppix.org/
Note 7:
https://fuse.sourceforge.net/
Note 8:
https://www.intersectalliance.com/projects/snare/
Note 9:
https://www.nsa.gov/selinux/
Note 10:
https://www.postgresql.org/
Note 11:
https://www.clearsilver.net/
Note 12:
https://firefly.student.swarthmore.edu/trac/wiki/DTK
Note 13: Trackle currently only
supports the bash shell. Support for
tcsh is planned for a future release.
Note 14: For setuid programs,
ld.so only honors values of
LD_PRELOAD that are in the default library search
path and are also setuid. A user would have needed administrative
access to put such a library in place, so this is not a major
vulnerability.
|