USENIX 2003 Annual Technical Conference, FREENIX Track Paper   
[USENIX Annual Conference '03 Tech Program Index]
Pp. 15-28 of the Proceedings | |
CSE A C++ Servlet Environment
for High-Performance Web Applications
Thomas Gschwind
Benjamin A. Schmit
Abteilung für Verteilte Systeme
Technische Universität Wien
Argentinierstraße 8/E1841
A-1040 Wien, Austria, Europe
{tom,benjamin}@infosys.tuwien.ac.at
https://www.infosys.tuwien.ac.at/Staff/{tom,benjamin}/
Abstract
Current environments for web application development focus on Java or
scripting languages. Developers
that want to or have to use C or C++ are left behind with little
options. We have developed a C/C++ Servlet Environment (CSE) that
provides a high performance servlet engine for C and C++. One of the
biggest challenges we have faced while developing this environment was
to come up with an architecture that provides high performance while
not allowing a single servlet to crash the whole servlet environment,
a serious risk with C and C++ application development. In this paper
we explain our architecture, the challenges and trade-offs we have
faced, and compare the performance of our environment to that of top
servlet environments available today.
1 Introduction
Every day, the web is applied to more and new application domains.
In some cases people try to convert services that historically have
been handled by a different mechanism, such as USENET News, to the
web. In other cases, such as email, they are complemented by a
mechanism that allows users to access these services via a web
front-end. Due to this success and the increasing number of
application domains web server performance is of major
importance [8,11] and unresponsive and slow
web sites may send users seeking for alternatives [5].
Traditionally, web services have been implemented using CGI
programs in the form of compiled C and C++ programs or Perl scripts
that were executed by the web server. Although these approaches work
perfectly fine for simple dynamic web content they are cumbersome to
use if a whole business process should be modeled as a web
application. This stems from the fact that they do not take care of
the state-management between subsequent web requests. These issues,
however, are taken care of by newer approaches such as the Java
Servlet Technology [22], the Python Zope
Server [14], or
dedicated application servers (e.g., Bea Weblogic or JBoss).
One advantage of servlets is that they are executed as part of a
servlet environment that, unlike CGI scripts, needs not be restarted
at each invocation. To protect one servlet from another, these
environments use programming languages that take care of memory
management issues. Another advantage is
that these languages come bundled with standardized libraries
providing support for numerous different tasks.
These advantages, however, have a price. Legacy applications that
make use of C or C++ cannot be easily integrated. Although
technologies such as SWIG [4] or JNI [16,25] simplify the integration
of legacy applications into scripting languages they still require
developers of such systems to deal with different systems. Sometimes
the choice of language is mandated by management or the developers
simply prefer the use of C or C++.
Another advantage of C and C++ is that these languages provide
better performance and a finer grained integration of the operating
system's security mechanisms. This is probably one of the reasons why
the Apache HTTP Server and the Microsoft Internet Information Server,
the two most prominent web servers with a combined market share of over
85% [20], are written in
C and C++ respectively.
In this paper, we present a servlet environment that is completely
implemented in C++. To the best of our knowledge, our C/C++ Servlet
Environment (CSE) is the first servlet environment that uses the
potential of C++. Similar to the servlet engines implemented in Java,
the architecture of our servlet environment offers adequate protection
between the individual servlets to be executed. Hence, our servlet
engine provides the following advantages over those using Java or
scripting languages:
- It supports the integration of existing C/C++ code into web
applications without mixing different programming languages and
converting data types.
- It provides a better integration into the security mechanisms
provided by the operating system.
- The C++ programming language allows developers to write more
efficient code since they can choose from a wider range of
implementation choices [24].
This paper is structured as follows. In Section 2 we present the requirements for a
servlet environment as well as the advantages and challenges of using
C++. Section 3 shows how
these challenges have influenced the design and implementation of the
C/C++ Servlet Environment, and application development for the CSE is
discussed in Section 4.
Related work is presented in Section 5 and in Section 6 we compare the performance of our
approach with that of other approaches. Future work is discussed in
Section 7 and we draw our
conclusions in Section 8.
2 Requirements
The main goals during the design of the C/C++ Servlet Environment
were security, stability, ease of use, parallelism, and performance.
Before we can have a closer look at these requirements, however, it is
necessary to give a brief overview of the typical architecture of a
web site. This architecture is depicted in Figure 1.
Figure 1: A Typical Web Site Implementation
|
A web site must include a web server that handles requests from
multiple clients. Requests that cannot be handled by the web server
itself are forwarded to a CGI program or a servlet engine where a web
service is executed. The web service in turn may contact a database
or application server for persistent data storage. Since HTTP is
stateless [7], the client has
to transmit the application's current state or a session identifier
with each request. In the latter case a mapping from session
identifier to state has to be maintained by the servlet engine.
CGI
programs have the drawback that they have to be executed anew for each
client's request. Hence, they need to be started at each request and
then have to read in their configuration data and session state from
persistent storage. A servlet engine, on the other hand, is running
all the time and keeps several servlets as well as their session state
in memory. Servlets need to maintain the client's session state (for
instance, the contents of a shopping basket) because web clients only
provide limited functionality to maintain this kind of data.
Since many servlet engines execute several different servlets
within the same process stability is a major concern. If one
servlet crashes, the other servlets have to continue to run
unaffectedly. For a servlet engine, it is very important to recognize
and handle this kind of failure since there is no way of judging the
stability of a user-supplied servlet from within the servlet
engine.
Stability is one reason why Java and interpreted languages in
general are popular for this task. If a Java servlet crashes only its
thread of execution is terminated and all the other servlets continue
to run. Hence, the worst that can happen is that a servlet consumes
overly much resources such as processor time, memory in the form of
unused but still referenced Java objects, or network bandwidth. The
price for these benefits is that all servlets are executed with the
same privileges and that the operating system's security mechanisms
need to be re-implemented as part of the servlet engine. Another
drawback is a slight performance overhead since Java does not allow
developers to write code on a level as low as it can be written with C
and C++.
The disadvantage of C and C++ is that these programming languages
are not as safe. Bugs in C and C++ programs typically take down the
whole process and they tend to get apparent at a much later time
(typically at a point of execution that is unrelated to the place that
caused the error). Therefore, even if the application is catching the
signal that a segmentation violation has occurred, recovery is
difficult. Hence, the design of a servlet engine written in these
languages has to solve those challenges.
Security is necessary to minimize the chance that servlets
can be exploited by an intruder. Our architecture reuses the security
mechanisms that have been built into the operating system. It allows
sandboxes, and thus servlets, within the same servlet environment to
be executed with different user privileges. The advantage of this
approach is that system administrators can use standard access
privilege mechanisms and do not have to get familiar with a new
security management system which can lead to misunderstandings.
Additionally, our approach requires only a single security mechanism
to be checked for possible vulnerabilities.
A disadvantage of C and C++, however, is that such programs are
open to buffer overflow attacks if they have not been implemented
carefully. The threat and impact of such attacks can be minimized by
using the C++ Standard Template Library which has been designed in
order to minimize such programming mistakes and by using operating
system mechanisms such as a non-executable user stack area. No
programming language or servlet environment, however, can guarantee
that a program or servlet cannot be exploited by an intruder. The
final responsibility is always up to the developer.
Ease of use is another important aspect for a servlet
engine. Even though there are servlet engines for Java that provide
good performance, these servlet engines are of little use to C or C++
programmers that want to use existing code for their web applications.
Using C and C++ in combination with such servlet engines requires the
use of the Java Native Interface (JNI) [16,25], the conversion between C
and Java data types, and to deal with low-level details of both
languages. Although systems such as SWIG [4] can be used to help
with this issue they do not solve the problem of having to deal with
two different languages. Hence, a C++ servlet environment is more
convenient to use for developers that have to deal with C or C++
code.
A good servlet engine must not only be easy to use but has to
provide good performance as well. This gets apparent by the
fact that the two most popular web servers are implemented in C and
C++ respectively. A convenient servlet environment that requires load
balancing among multiple servers to be able to handle the incoming
requests loses much of its original appeal. This is one of the
reasons why we have developed the CSE. The CSE was designed towards
optimal performance.
Parallelism is, on a server machine, a prerequisite for
performance and scalability. On a network, the upper bound of the
access time to a service can be high and thus forbids handling
requests in serial. The CSE introduces parallelism by providing
several services on a single machine, by partitioning a service into
several parts with simple interfaces between them, and by replicating
those parts. Additionally, compared to using interpreted languages
that use a global interpreter lock for thread locking [28, Section 8.1],
the approach to use C++ has the advantage of having a fine grained
thread locking mechanism as provided by linux-threads or
pthreads [15].
3 Design and
Implementation
Figure 2 shows the
architecture of our C/C++ Servlet Environment. It consists of an
Apache module, a servlet server and several sandbox processes. The
CSE uses the Apache server to handle incoming HTTP requests, the
servlet server is used for the management of the sandbox processes and
the sandboxes are responsible for the execution of the servlets. We
use different sandbox processes because this protects a servlet in one
sandbox from an unstable one in another sandbox. Additionally, this
approach allows administrators to execute different sandboxes with
different user privileges. Although we have used C++ for the
implementation, our architecture can be easily reused for a servlet
environment implemented in plain C.
Figure 2: CSE Architecture
|
3.1 The Apache Module
We use the Apache web server [1] to handle HTTP requests. This
approach has several advantages over implementing a new web server
for our servlet engine such as provided by the Tomcat Servlet
Engine [2].
- Apache uses a modular design and can be extended easily.
- It is implemented in C facilitating data exchange with our C++
servlet engine. C structures are compatible to C++ structures.
- Its add-on modules provide features that we would never have
implemented as part of CSE.
- It has a 60% market share [20] and has proven to be a
reliable web server.
- It is free software.
Another advantage of the modular design we have chosen is that only
the web server's Servlet Module has to be re-implemented if a
different web server has to be used for a given web site.
The Apache Module registers the URLs that are implemented by the
servlet engine's servlets and the C++ Server Pages. Requests to other
URLs such as static web pages or images are handled by Apache itself
without consulting our module. Servlet requests are forwarded to the
servlet server which assigns them to one of the sandboxes. One risk
of this approach is that the servlet server might become a bottleneck.
Hence, in future versions of CSE, we plan to extend the Apache module
such that the requests are sent to the appropriate sandbox directly.
Going a step further and executing the servlets by the Apache module
directly, however, is not possible since that would compromise the web
server's stability.
3.2 The
Servlet Server
The Servlet Server is responsible for the management of the
sandboxes. To do that it processes requests forwarded by the Apache
module and determines, based on the servlet registry, the sandbox that
is responsible for the servlet's execution. The Servlet
Registry maintains a mapping of the servlets and the sandboxes
they should be executed in. This mapping defined within the server's
configuration file.
If a C++ Server Page (CSP) is used, the CSP Manager converts
the CSP document into a servlet and compiles it. Several Compiler
Threads can be started by the server in order to compile CSPs when
they are first requested or when they have changed. They are
synchronized so that no more than a single thread for a single servlet
can run at a time. The compiled servlet is put into a cache
directory, along with the servlet source code, a configuration file
containing the destination sandbox, and an error output file which
also serves as a timestamp of the last compilation attempt. A CSP is
only recompiled if its timestamp is newer than the timestamp of its
error output file. We do not use the CSP's object file since it is
only created if the compilation is successful. If the sandbox does
not exist yet, it will be created after compilation. Then, the
destination sandbox is configured to host the new servlet. If for
some reasons, however, CSP compilation during run-time is undesired or
impossible, it is possible to compile them before starting the
server.
A Guardian Thread watches over the state of the sandboxes
and restarts them if an unstable web application crashes. Instead of
using a separate thread that sleeps most of the time, it should also
be possible on UNIX systems to catch the child signal instead. From
the performance point of view, however, this poses no difference.
In order to minimize downtimes of the servlet server, its
configuration can be changed dynamically. The configuration file
tells the servlet registry which servlets should be loaded into which
sandboxes. When the application server processes a request, it first
checks the timestamp of the configuration file. If there was a change
(or there was no request after startup yet), the configuration file is
read, and the internal configuration data is updated. Part of the
configuration data, such as the location of the shared object files,
is passed on to the sandboxes where it is needed. This update process
is also invoked when a sandbox has to be restarted by the guardian
thread.
3.3 The Sandboxes
Several separate tasks, the Sandboxes, handle the actual
execution of the servlets. Each sandbox may encapsulate one or more
web applications consisting of one or more servlets. The purpose of
using several sandboxes is to take care of the session management and
to provide a barrier for unstable servlets.
The ability to execute several servlets within the same sandbox
increases the scalability of our servlet environment. This approach
allows developers to place servlets that frequently need to interact
with each other into the same sandbox and hence enables a more
efficient communication between the servlets and reduces the number of
context switches required.
The sandbox processes read requests from an input socket and
execute them. The servlets themselves are loaded as dynamically
shared objects. C++ Server Pages are translated into servlets which
in turn are compiled into shared objects. Shared objects can be
loaded into a running program on demand by a system call. The
disadvantage, however, is that the server's original programmer can
never be sure about their stability.
If a servlet crashes while processing a request, the servlet
server's guardian thread notices the crash and restarts and
reinitializes the failed sandbox. If a request comes in while the
destination sandbox is down, it is buffered and executed as soon as
the sandbox is up and running again.
This architecture has already proved its usefulness: Our original
implementation contained an error that would crash every sandbox after
a little more than 1000 requests. We have not discovered this error
until we have started running the benchmarks because the crashed
sandboxes were always restarted, and (except for a slight performance
loss) no problem was visible.
Additionally, since each sandbox is executed within its own
process, system administrators may choose to run different sandboxes
with different user privileges. This architecture provides a finer
grained access control than Java servlet environments which execute
all servlets within the same process and thus with the same user
privileges.
3.4 Persistent
Storage
The ability to provide persistent storage is of major importance
for most web applications. A voting application, for example, has to
manage the votes cast by its users. The persistence mechanism of the
CSE can be used internally by the sandbox's session management if the
servlet's sessionType attribute is set to
DatabaseSession. Alternatively, it may be used directly by
the application developer.
The C++ Servlet Environment has been designed so that the
persistence mechanism can be replaced easily. The persistence
mechanism is encapsulated by a traits class [19,24] with which its classes
are parameterized. A traits class can be compared to a set of
callback functions with the difference that they are known during
compile time and thus leave more room for optimization. This approach
allows application developers to choose a persistence level that fits
their needs best, e.g., a database such as MySQL [30], or a file-based
database such as PSTL, a persistent implementation of the Standard
Template Library [10].
We also provide a general-purpose database interface which is a set
of template classes that offer general database handling functions,
along with data-types used within a database. They do not contain
functions to access specific databases but are able to use concepts
like SQL statements and cursors. The template classes provided
are:
- DB
- is used to create, maintain, and close a database connection. How
this is done depends on the concrete database driver. Also, this
class allows for the execution of SQL statements that do not return
any data such as INSERT statements.
- DBQuery
- helps formulating SQL queries. The query is written to the object
as if it would be written to a stream.
- DBResponse
- executes an SQL query that returns some data. Again, the
implementation depends on the concrete database driver. The class has
many features similar to those of a C++ STL container [24].
- DBIterator
- implements a C++ STL input iterator [24] that iterates over
the elements of a result set and supports database drivers with and
without cursors.
- DBRow
- is created when the iterator is accessed. Its index operator
(``[]'') is used to retrieve an actual data item. It uses
either an integer (the column id) or a string (the column designation)
as the argument.
- DBObject
- is the class that contains data items. It has the subclasses
DBString, DBInt, DBLong, DBFloat,
DBDouble, DBBlob, and DBDateTime. They
store C++ strings, 32- and 64-bit integers, 32- and 64-bit floating
point numbers, binary content, and dates.
In order to access a given database server, a concrete database
driver (a traits class) for that database must be written. If the
database has a C++ interface, this task is usually trivial because
most work has already been done at the general-purpose database
interface. A concrete database driver for the successful
database MySQL is already available.
4 Application
Development
The C++ Servlet Environment provides two approaches for writing web
applications similar to those available in Java Servlet Environments:
Servlets that use only C/C++ and C++ Server Pages that use C/C++ code
embedded in HTML code.
4.1 Servlets
A Servlet is implemented as a C++ class inheriting from the
Servlet class as shown in Figure 3. Subsequently this
class is compiled into a shared object. Servlets can access the
parameters and cookies that have been passed as part of the web
request and output an HTML page onto their output stream.
Figure 3:
C++ Servlet Example
#include <cse/cse.h>
class HelloServlet : public Servlet {
protected:
int counter;
public:
HelloServlet() : counter(0) { session=false; }
virtual ~HelloServlet() { }
virtual void service(const ServletRequest& rq, ServletResponse& re,
Session* session=NULL) {
re << "<html><head><title>Hello World</title></head>" << endl
<< "<body><h1>Hello World</h1>" << endl
<< "<p>Servlet request count: " << ++counter << "</p>" << endl
<< "</body></html>" << endl;
}
};
Servlet* factory() {
return new HelloServlet();
}
|
The most important methods and attributes of the Servlet
class are:
- service():
- This method is called for each client request that has to be
processed. The default implementation calls either doGet()
for GET or doPost() for POST requests. If
necessary, this method may be overridden. The service()
method's parameters are a ServletRequest object providing
details about the current request, a ServletResponse object
handling the servlet's response, and a Session object
containing session information if requested.
- doGet(), doPost():
- These methods are similar to the service() method but
handle only GET or POST requests.
- getSession():
- Requests a Session object from the session manager. The
application server automatically executes this method if the servlet's
sessionType is set. The method is implemented within the
servlet base class.
- sessionType:
- This attribute indicates whether the sandbox should take care of
the servlet's session management and the type of session management
requested. Valid options are NULL for no session management,
MemorySession for transient session management, and
DatabaseSession for persistent session management surviving
sandbox or server restarts.
- threadSafe:
- Indicates whether the servlet is able to process multiple requests
simultaneously. This functionality, however, is not yet
implemented.
The ServletRequest class encapsulates information about
the current request. It provides access to the HTTP request
parameters using the getParameters() method. Such parameters
may be passed as part of the URL in case of a GET request and
as part of the request body in case of a POST request. The
request parameters are decoded and returned as a map using
the parameter names as keys.
Cookies sent with the HTTP request can be obtained with the
getCookies() method. They are automatically transformed into
Cookie objects. The return type is a vector of these
objects. Unless a cookie has been changed there is no need to include
it in the response sent back to the client.
Among other information, the ServletRequest class also
provides the URL of the request (getURL()) and whether the
request used the GET or POST method
(getMethod()).
The ServletResponse object contains a stream to which the
output of the servlet is written. For example,
re << rq.getURL();
writes the servlet's URL to the output. Other interesting attributes
of this class are contentType which specifies the MIME type
of the response and cookies which is an initially empty
vector containing the cookies to be sent to the client.
Session data is provided through the Session class. Each
session has a name and an ID. Together with the Sandbox name, they
uniquely identify the session. The name distinguishes between
different types of sessions within a single web application. The
session IDs are assigned in a random order, which makes guessing them
almost impossible and thus enhances data security.
The setParameter() and getParameter() methods
allow a servlet to store and retrieve session data. Depending on the
kind of session, these data are stored within the memory or within a
database.
The session identifier must be passed on between subsequent
requests to the web server. This can be done using cookies. The
function getCookie() provides a cookie (as defined
in [12]) that contains
the session information. Cookies, however, do not work with all web
clients. If the servlet programmer cannot rely on them, the methods
asLink() and asForm() transform the session
designation into a string suitable for links (in URL-encoded format)
or for forms (as a hidden field).
4.2 C++ Server Pages
C++ Server Pages (CSPs) are stored in the web server's
document root directory along with static HTML pages and can be
identified by their .csp-extension. When they are requested
for the first time these pages are converted into servlets and
subsequently into shared objects.
Figure 4: TableServlet.csp
<%#vector%>
<%!vector<string> strings;%>
<% // check whether a string should be removed/added
ServletRequest::Map::const_iterator mi;
if((mi=rq.getParameters().find("remove"))!=rq.getParameters().end())
strings.erase(strings.begin()+atoi(mi->second.c_str()));
if((mi=rq.getParameters().find("string"))!=rq.getParameters().end())
strings.push_back(mi->second); %>
<html><head><title>TableServlet</title></head><body>
<h1>TableServlet</h1>
<p>This servlet stores strings within a table.</p>
<form method=get action=TableServlet.csp><p>Please enter a string:
<input type=text size=64 name=string></input><input type=submit value="Go!">
</input></p></form>
<p><table border=1>
<tr><th colspan=2>Strings entered to date: <%=strings.size()%></th></tr>
<% for (vector<string>::iterator i=strings.begin(); i!=strings.end(); ++i) { %>
<tr>
<td><%=*i%></td>
<td><a href="TableServlet.csp?remove=<%=i - strings.begin()%>">remove</a></td>
</tr>
<% } %>
</table></p>
</body></html>
|
CSPs are HTML documents enriched with special tags containing,
among other things, C++ code. A sample CSP is shown in Figure 4. The tags used to identify
C++ declarations and code are described below. Except for the first
two tags, they have been designed similar to the JavaServer Pages
(JSP) specification [27] to
increase readability for people familiar with JSPs.
- <%$sandbox%>
- declares the sandbox the CSP belongs to. CSPs without this tag
are executed within the default sandbox.
- <%#include%>
- denotes a header file to be included. As in a normal C++ program,
header files provide declarations for functions and objects defined by
a library. The library itself can be loaded from the configuration
file. Different libraries may be used for different sandboxes.
- <%!definition%>
- declares an attribute or a method.
- <%@initialization%>
- indicates code to be placed into the constructor of the servlet's
class. Here, attributes (those declared within this class and those
inherited from the Servlet class) are initialized.
- <%code%>
- executes C++ code. It is written into the servlet's
service() function at the tag's location.
- <%=expression%>
- evaluates an expression and inserts the result into the servlet's
response. It may be of any type that can be written to an output
stream.
- <%--comment--%>
- declares a CSP comment. Unlike an HTML comment, its contents are
not sent to the web browser.
The example shown in Figure 4 first includes the
vector header file and declares a vector containing
strings. The second block checks for the parameters passed to the CSP
and based on these parameters removes or adds a new string to the
vector. The remaining part of the servlet is used to display
a form to add a new string and a table with the strings currently
stored in the vector. Additionally, the strings are supplied
with links to allow them to be removed.
Figure 5: TableServlet.cc Generated from TableServlet.csp
1 #include <vector>
2 #include <cse/cse.h>
3 #include <string>
4
5 class CSPServlet : public Servlet {
6 protected:
7 virtual void print(ostream& os) const;
8 vector<string> strings;
9
10 public:
11 CSPServlet() : Servlet() { }
12 virtual ~CSPServlet();
13 virtual void service(const ServletRequest& rq, ServletResponse& re,
14 Session* session= NULL);
15 };
16
17 // ... helper functions ...
18
19 void CSPServlet::service(const ServletRequest& rq, ServletResponse& re,
20 Session* session=NULL) {
21 re << "
22 ";
23 // check whether a string should be removed/added
24 ServletRequest::Map::const_iterator mi;
25 if((mi=rq.getParameters().find("remove"))!=rq.getParameters().end())
26 strings.erase(strings.begin()+atoi(mi->second.c_str()));
27 if((mi=rq.getParameters().find("string"))!=rq.getParameters().end())
28 strings.push_back(mi->second);
29 re << "
30 <html><head><title>TableServlet</title></head><body>
31 <h1>TableServlet</h1>
32 <p>This servlet stores strings within a table.</p>
33 <form method=get action=TableServlet.csp><p>Please enter a string:
34 <input type=text size=64 name=string></input><input type=submit value=\"Go!\">
35 </input></p></form>
36 <p><table border=1>
37 <tr><th colspan=2>Strings entered to date: ";
38 re << strings.size();
39 re << "</th></tr>
40 ";
41 for (vector<string>::iterator i=strings.begin(); i!=strings.end(); ++i) {
42 re << "
43 <tr>
44 <td>";
45 re << *i;
46 re << "</td>
47 <td><a href=\"TableServlet.csp?remove=";
48 re << i - strings.begin();
49 re << "\">remove</a></td>
50 </tr>
51 ";
52 }
53 re << "
54 </table></p>
55 </body></html>
56 ";
57 }
|
After the example servlet has been deployed, our CSE translates it
into a C++ servlet as shown in Figure 5. Include
directives of the C++ Server Page are converted into include
pre-processor macros at the beginning of the file (line 1),
definitions are converted into attribute and member function
definitions (line 8). Code and expression directives are used to
form the service() member function and HTML code is converted
into statements sending it unmodified to the web client
(lines 19-57).
5 Related Work
Since we have started with the implementation of our C++ Servlet
Environment other developers have also recognized the need for a
servlet environment for C++.
The commercial vendor Rogue Wave has developed
Bobcat [21], a C++
servlet engine that has an API similar to that of the Java Servlet
Specification [26].
Unfortunately, its evaluation license contains a non-disclosure
agreement. Hence, we cannot include their product in this paper and
have to assume that it is not yet ready for a production system.
Ape Software, an Indian company, has developed
Servlet++ [3]
under a BSD-like free license, which also supports a basic form of C++
Server Pages. Unlike CSE, it does not contain a stand-alone server
for the execution of the servlets but is implemented completely within
an Apache module. Hence, an unstable servlet can compromise the
stability of the Apache server itself. The Servlet++ module supports
C++ servlets with an interface similar to that of the Java Servlet
class. Servlets are loaded as shared objects. Unlike CSE, servlets
and C++ Server Pages have to be compiled manually. Also, Servlet++
currently lacks session management, a fundamental necessity for every
servlet environment, and hence we left it out of the comparison in
section 6.
C Server Pages [9] is a servlet engine that has
been designed with goals somewhat similar to those of the C++ Servlet
Environment, but does not use a free license (commercial use is
non-free). However, this system uses no sandboxes to encapsulate its
servlets, so that it is less secure. There is currently no support
for dynamic configuration, which means that each servlet has to be
compiled manually (using a tool to transform C Server Pages into
servlets and a C++ compiler), and the server then has to be restarted.
C Server Pages is implemented as a CGI script, but the author has also
built an Apache module which is, unfortunately, not yet available for
download.
Micronovae [17] is developing a C++
Server Pages engine which works together with Microsoft's Internet
Information Server (IIS). Like the previous system, it does not use
the concept of sandboxes. Additionally, it only provides support for
C++ Server Pages but not for servlets. Unfortunately, their download
(beta version) includes no source code, so that our information about
this system is solely based on our experiences with it.
The Weblet Application Server [29] seems to be a servlet engine
for C++ developed by Webletworks. Unfortunately, we were
unable to contact the web server of the application server's vendor
for several months now and were also unable to obtain a copy or other
information about the system through other web sites.
Besides C++ servlet engines, there are numerous such engines for
Java and scripting languages. One such servlet engine is the
Apache Tomcat [2]
servlet engine, a subproject of the Apache Jakarta Project. It has
complete support of the JavaServer Pages and Java Servlet
specifications and is included in Sun's reference implementation.
Although Tomcat contains its own web server it can also cooperate with
other web servers like the Apache HTTP Server. JavaServer Pages may
be compiled by the built-in Java compiler or by alternative Java
compilers such as Jikes.
Jetty [18] is a
Java Servlet engine developed by the Australian company Mort
Bay. It can cooperate with the Apache HTTP Server, but Mort Bay
suggests to use the included web server. Jetty is available under a
free license and offers both Java Servlets and JavaServer Pages. Like
Tomcat, Jetty is configured using a set of XML files and may use
alternative Java compilers for the compilation of JavaServer Pages.
Swill [13] is a lightweight
programming library that provides a simple web server for C and C++.
Unlike our servlet environment, its goal is not to provide a full
fledged web server that allows the execution of multiple web
applications. Instead, it allows developers to embed the web server
into their own programs. This web server can be used to control the
embedding application and to display the application's results using a
web browser.
Zope [14]
is a servlet environment that allows developers to use Python for
servlet development. It includes a web server and a web
administration front-end. Initial performance results have shown that
Zope cannot compete with the top servlet engines. We assume that this
is due to the performance penalty incurred by the Python interpreter.
Zope is probably the right environment for Python programmers
maintaining small web sites.
6 Evaluation
For the evaluation of the C++ Servlet Environment's performance we
have implemented a benchmark suite that tests various aspects of the
different servlet engines. The tests were performed with the built-in
web server. All servlets were compiled using the individual engine's
default servlet compiler before the execution of a benchmark.
Static Page Access. In this benchmark we measure
how long it takes to request a static HTML page together with 40
embedded images for 100 times. The primary goal of this benchmark is
to measure the throughput of the servlet engine's web server.
Bulletin Board System. We have implemented a small
bulletin board system that stores its entries in the file system. It
allows users to create messages, read them (using a session for
remembering the least recently read message), and deleting them
again.
The test creates 100 messages of 320 characters each. These
messages are then requested 50 times in batches of 10 messages.
Finally, the messages are deleted again resulting in another 100
requests. The time taken for these 800 requests (message creation is
verified with a second request) was measured. The goal of this
benchmark is to check how well the servlet engine maintains the
client's session state.
Dynamic Page Access. This benchmark uses a
shopping cart servlet that we have implemented for each servlet
engine.
For the measurement of this benchmark, we request the start page
once to obtain the cookie containing the session information. Then,
we add an item to the shopping cart, display the shopping cart, remove
the item, and display the shopping cart again. The empty shopping
cart page has 2048 bytes. A test run consists of
50 consecutive requests, with a total of 250 dynamic pages
served. This benchmark is intended to measure the servlet engine's
performance in a typical everyday situation.
Parallel Page Access. For this request, we used
the previous benchmark and accessed the server simultaneously from a
varying number of clients, each running on a different machine. We
have measured the time needed by a single client to execute the same
number of requests as in the previous test. The other clients in the
benchmark were started before we started the client to be measured and
also were terminated afterwards. The goal of this benchmark is to see
how well the servlet engines can cope with increasing load.
Mandelbrot Calculation. This test measures the
efficiency of calculation-intensive servlets by computing the
Mandelbrot fractal. It accepts the size of the image, the area of the
fractal to be calculated, and the maximum recursion depth as
parameters. Although we support the generation of an xpm graphics
file the output has been suppressed during this benchmark since we
wanted to measure the ``number crunching'' performance only.
A test run consists of 10 accesses to the servlet, calculating a
picture of the Mandelbrot set at a resolution of 1024x768 pixels and a
maximum of 256 iterations per pixel.
System Library Call. Since a main reason for the
design and implementation of the CSE was the possibility to easily
integrate legacy applications into servlets, this test evaluates the
inter-operation with legacy C and C++ code. For the test, we created
a small servlet that calls functions from a shared library.
A similar servlet might e.g. poll sensor data with high
frequency in order to calculate a mean value. For C and C++ programs
this is a straight-forward task. Java programs, however, have to use
the Java Native Interface [16] which requires a JNI wrapper
function to be written for each C or C++ function to be invoked. This
benchmark measures how much performance gets lost at that
interface.
The hardware for the performance tests consisted of two computers
with AMD Duron/800 MHz processors running Linux. They were linked via
a 100 Mbit/s switch in order to ensure constant network bandwidth.
For the dynamic and parallel tests, we used 1-8 identical Intel
Pentium II/350 MHz computers as clients, with the same server as
above.
The results of our tests are shown in Table 1. We have included Tomcat as
Sun's Java reference implementation, Jetty as an independent
implementation in Java, and CSP and Micronovae as other C/C++
solutions. Since little information about the inner workings of
Micronovae is available, we can only speculate why it performs better
or worse than the other systems.
Table 1:
Evaluation Results
Engine |
CSE |
Tomcat |
Jetty |
CSP |
Micronovae |
Static Page Access |
35.79s |
54.79s |
48.20s |
35.79s |
35.03s |
Bulletin Board System |
3.96s |
2.34s |
3.66s |
4.74s |
2.40s |
Dynamic Page Access |
20.34s |
18.83s |
21.73s |
24.62s |
25.77s |
Mandelbrot Calculation |
10.69s |
33.55s |
33.22s |
10.95s |
14.86s |
System Library Call |
12.94s |
209.39s |
207.15s |
14.16s |
7.58s |
|
Figure 6:
Parallel Page Access Benchmark Times
|
As shown by the static page access benchmark using Apache as front-end was the
right choice for this benchmark. Tomcat and Jetty both did not
perform as well. Although they can be set up to be used in
combination with Apache, this setup is complicated. Mort Bay even
recommends to use Jetty for static documents as well. In our
evaluation, the Windows Internet Information Server was slightly
faster than Apache on Linux. We assume that this effect is caused by
the operating system.
The dynamic and parallel access benchmark, whose result is
shown in Figure 6,
reveals why Tomcat has become Sun's reference implementation. Both
Java implementations perform much better than we would have assumed
initially. Although we knew that our implementation leaves room for
improvements, this was a surprise to us.
The CSE and Jetty scale equally well, but not as good as Tomcat and
slightly worse than Micronovae. The CSE does not perform as well
because in its current implementation the servlet server might be a
bottleneck, as we have mentioned in Section 3.1. Jetty performs
slower because it seems that its implementation has not been as well
optimized as Tomcat's. CSP scales worst in our benchmark. Obviously,
using the CGI for servlet execution is, at best, only a solution when
C/C++ applications should be integrated into a web server whose
performance is not an issue.
In the bulletin board system test all systems were relatively close together, which
indicates that bulk transfers in Java are similarly fast as in C/C++.
CSP's bad result is likely caused by the fact that it creates and
initializes a new task for every servlet invocation. The difference
of this benchmark over the others is that the bulletin board system's
entries are stored on the file system. Hence we assume that
Micronovae performs worse than the other approaches due to differences
in the file system implementation between Linux and Windows.
The C/C++ based systems have a clear advantage when performing CPU
intensive computations as shown by the Mandelbrot calculation. It seems that the Java just-in-time compiler
is unable to optimize the code as well as the GNU compiler. In the
direct comparison, the compiler from Microsoft Visual C++ which is
used within Micronovae performs worse than its GNU equivalent, but we
do not know what optimizations are performed.
The library benchmark shows severe limitations of the Java-based
systems. In our scenario, C code is more than 16 times faster when
many function calls into a legacy C application need to be done. It
seems that the Java Native Interface (JNI) which handles C/C++ library
calls has not been optimized at all. CSP takes slightly more time
than the CSE but is still an order of magnitude faster than the
Java-based systems. The very good performance of Micronovae is
probably caused by a different shared library mechanism provided by
the Windows platform.
7 Future Work
The current implementation of the CSE has some room for
optimization. This gets apparent by looking at the architecture
presented in Section 3. Requests to the individual
servlets are delegated to the sandboxes by the servlet server. This
is a potential bottleneck and could be solved by letting the Apache
module themselves delegate the requests to the individual sandboxes.
Additionally, our current implementation does not yet allow the
simultaneous execution of a servlet's service() method. This
stems from the fact that we do not yet honor a servlet's
threadSafe attribute, which results in a loss of
performance.
During our tests we identified that our servlet engine is not yet
capable of handling binary content correctly. We assume that this bug
is located within the Apache module that passes the result of the
sandboxes back to the clients. Our current assumption is that we do
not handle the NULL character correctly since it indicates the end of
a C string.
In future versions we also plan to implement a test environment for
servlets and C++ Server Pages that supports testing outside the CSE.
To test the servlet, the environment would be linked to the servlet
and could be debugged like a stand-alone C++ application.
In future versions of the CSE, we also plan submit it to the BOOST
web site [6] which
provides a collection of free peer-reviewed portable C++ source
libraries. The focus of BOOST is on libraries that work well with the
C++ Standard Library and are suitable for eventual
standardization.
8 Conclusions
The contribution of this paper is an architecture that enables the
implementation of high-performance web applications using C or C++
while providing the stability known from Java servlet engines.
We have also implemented a C++ Servlet Environment using this
architecture. Although our implementation has not yet been optimized
and provides enough room for further performance improvements, it
offers similar performance as the top servlet engines available
today.
Our C++ Servlet Environment uses an API which is based on that
provided by Java Servlets and JavaServer Pages. This design choice
has the advantage that developers familiar with this technology will
immediately be able to write C++ Servlets and C++ Server Pages.
Providing a servlet environment for C++ is important since it
allows developers to reuse existing C++ code without having to use
the Java Native Interface [16,25] and hence without having
to deal with different languages and the conversion of different type
systems. As we have explained in Section 5, this need has recently
been identified by other researchers as well. Although similar,
these products are slightly incompatible to each other. Hence, we
think that the standardization of a C++ Servlet Environment will be
important for the future of this technology.
Availability
The C/C++ Servlet Engine is freely available under the GNU General Public License.
A more detailed description of the design and implementation of the CSE can be found in [23]. The CSE as well as its
documentation is available for download from the CSE homepage at
https://www.infosys.tuwien.ac.at/CSE/.
Acknowledgements
We would like to thank Dave Beazley, Andreas Grünbacher, and the many
anonymous reviewers for their helpful comments.
We also gratefully acknowledge the financial support provided by the
USENIX Advanced Computing Systems Association and by the European
Union as part of the EASYCOMP project (IST-1999-14191).
- 1
- Apache Software Foundation.
The Apache HTTP Server.
https://httpd.apache.org/docs-2.0/.
- 2
- Apache Software Foundation.
The Apache Tomcat Servlet Engine.
https://jakarta.apache.org/tomcat/.
- 3
- Ape Software.
The Servlet++ Homepage, 2002.
https://www.apesoft.net/servlet++/.
- 4
- David M. Beazley.
SWIG: An easy to use tool for integrating scripting languages with
C and C++.
In Proceedings of the USENIX Fourth Annual Tcl/Tk Workshop.
USENIX, July 1996.
- 5
- Nina Bhatti, Anna Bouch, and Allan Kuchinsky.
Integrating user-perceived quality into web server design.
Technical Report HPL-2000-3, Hewlett-Packard Laboratories, January
2000.
- 6
- The Boost Homepage.
https://www.boost.org/.
- 7
- Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk, Larry Masinter,
Paul J. Leach, and Tim Berners-Lee.
Hypertext Transfer Protocol--HTTP/1.1, June 1999.
RFC2616.
- 8
- Pankaj K. Garg, Kave Eshgi, Thomas Gschwind, Boudewijn Haverkort, and Katinka
Wolter.
Enabling network caching of dynamic web objects.
In Proceedings of the 12th International Conference on Modelling
Tools and Techniques for Computer and Communication System Performance
Evaluation. Springer-Verlag, April 2002.
- 9
- Teodoro González.
C Server Pages.
https://www.cserverpages.com/.
- 10
- Thomas Gschwind.
PSTL--A C++ Persistent Standard Template Library.
In Proceedings of the 6th USENIX Conference on Object-Oriented
Technologies and Systems, pages 147-158. USENIX, January 2001.
- 11
- Arun Iyengar, Mark S. Squillante, and Li Zhang.
Analysis and characterization of large-scale web server access
patterns and performance.
The World Wide Web Journal, 2:85-100, 1999.
- 12
- Dave Kristol and Lou Montulli.
HTTP State Management Mechanism, February 1997.
RFC2109.
- 13
- Sotiria Lampoudi and David M. Beazley.
SWILL: A simple embedded web server library.
In Proceedings of the 2002 USENIX Annual Technical Conference.
USENIX, June 2002.
- 14
- Amos Latteier and Michel Pelletier.
The Zope Book.
New Riders Publishing, July 2001.
- 15
- Bil Lewis and Daniel J. Berg.
Multithreaded Programming with Pthreads.
Prentice-Hall, 1997.
- 16
- Sheng Liang.
The Java Native Interface: Programmer's Guide and
Specification.
Addison-Wesley, June 1999.
- 17
- Micronovae.
The Micronovae CSP Engine Homepage, 2002.
https://www.micronovae.com/CSP.html.
- 18
- Mort Bay Consulting.
Jetty--Java HTTP server and servlet container, 2002.
https://www.mortbay.org/jetty/.
- 19
- Nathan C. Myers.
Traits: A new and useful template technique.
C++ Report, June 1995.
- 20
- Netcraft.
Netcraft Web Server Survey, October 2002.
https://www.netcraft.com/survey/.
- 21
- Rogue Wave Software.
The Bobcat Servlet Container: Using C++ to Integrate
Applications and Business Logic with the Web.
- 22
- Peter Rossbach and Hendrik Schreiber.
Java Server and Servlets: Building Portable Web Applications.
Addison-Wesley, March 2000.
- 23
- Benjamin A. Schmit.
A C++ Servlet Environment.
Master's thesis, Technische Universität Wien, 2002.
Draft version.
- 24
- Bjarne Stroustrup.
The C++ Programming Language.
Addison-Wesley, special 3rd edition, February 2000.
- 25
- Sun Microsystems.
Java Native Interface Specification, May 1997.
https://java.sun.com/j2se/1.4/docs/guide/jni/.
- 26
- Sun Microsystems.
Java Servlet Specification (Version 2.3), August 2001.
- 27
- Sun Microsystems.
The JavaServer Pages Specification (Version 1.2), August
2001.
- 28
- Guido van Rossum.
Python/C API Reference Manual.
Python Labs, October 2002.
- 29
- Webletworks.
The Weblet Application Server Homepage.
https://www.webletworks.com/.
- 30
- Michael Widenius and David Axmark.
MySQL Reference Manual.
O'Reilly & Associates, June 2002.
Thomas Gschwind
and Benjamin
A. Schmit
This paper was originally published in the
Proceedings of the
USENIX Annual Technical Conference (FREENIX Track),
June 9 14, 2003,
San Antonio, TX, USA
Last changed: 3 Jun 2003 aw
|
|
|