USENIX ;login: - Standards Reports

The Single UNIX Specification, Version 2: Threads Extensions

Andrew Josey <a.josey@opengroup.org> continues his series of articles based on the new Single UNIX Specification, Version 2.

The Single UNIX Specification, Version 2, includes the threads model and interfaces defined in POSIX.1c-1995 together with a number of extensions. These extensions, known as the X/Open Threads Extension, are based on widely accepted existing industry practice. They were developed by the Aspen Group and submitted to The Open Group's Base Working Group (the group that develops operating system interface specifications within The Open Group). This article is a brief introduction to these extensions. It assumes a working knowledge of the threads model specified in POSIX.1c and threads programming concepts in general.

The X/Open Threads Extension is built upon the threads model and interfaces defined in POSIX.1c, otherwise known as Pthreads. POSIX.1c contains much optional functionality. When POSIX.1c was incorporated into the Single UNIX Specification, Version 2, the majority of this optional functionality was made mandatory, and additional functionality, known as the Aspen threads extensions submission, was incorporated.

The Aspen Group

Over the past few years almost all UNIX system vendors have implemented some flavor of a threads package based on the POSIX.1c interfaces. Each vendor found that the POSIX.1c interfaces were not complete in solving all their threads requirements. Consequently, most vendors implemented extensions to their thread packages to meet those requirements.

Unfortunately for application developers, not all vendors implemented the same set of extensions. To make things worse, where vendors added the same functionality, they often used different interface names or parameter sets. In short, this resulted in proprietary threads interfaces that were not portable across implementations, yet certain applications, such as database engines, were making heavy use of them.

Fortunately, many of the threads extensions developed were general enough that they are easily supported on any UNIX system threads implementation. In late 1995, the Aspen Group formed a subgroup to standardize the interfaces and functionality of the common thread extensions that various UNIX system vendors had implemented. The threads extensions that came out of this work by the Aspen Group comprise extensions that were made for OSF DCE 1.0 as well as others by Sun, HP, and Digital. The Aspen Group handed the completed work over to X/Open in 1996 as a submission for consideration for inclusion in the next revision of the Single UNIX Specification.

The Aspen Group extended the POSIX.1c interfaces in the following areas:

extended mutex attribute types
read-write locks and attributes
thread concurrency level
thread stack guard size
parallel I/O

The Aspen Group carefully followed the threads programming model specified in POSIX.1c when developing these extensions. As with POSIX.1c (and unlike traditional UNIX functions), all the new functions return zero if successful; otherwise an error number is returned to indicate the error.
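As a minimal sketch of this return convention, the helper below attempts to lock a mutex and reports any failure using the returned error number rather than errno. The name try_lock_reporting() is hypothetical and is not part of any specification.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to lock a mutex without blocking.
 * The error number is the function's return value; errno is not consulted. */
int try_lock_reporting(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_trylock: %s\n", strerror(rc));
    return rc;   /* 0 on success, otherwise an error number such as EBUSY */
}
```

A caller compares the result against zero or against a specific error number, for example EBUSY when the mutex is already locked.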

The concept of attribute objects was introduced in POSIX.1c to allow implementations to extend the standard without changing the existing interfaces. Attribute objects were defined for threads, mutexes, and condition variables. Attribute objects are defined as implementation-dependent opaque types to aid extensibility, and functions are defined to allow attributes to be set or retrieved. The Aspen Group followed this model when adding the new type attribute of pthread_mutexattr_t and the new read-write lock attributes object pthread_rwlockattr_t.

Extended Mutex Attributes

POSIX.1c defines a mutex attributes object as an implementation-dependent opaque object and specifies a number of attributes this object must have and a number of functions that manipulate these attributes.

The Single UNIX Specification, Version 2, specifies another mutex attribute called type. The type attribute allows applications to specify the behavior of mutex-locking operations in situations where the POSIX.1c behavior is undefined. The OSF DCE threads implementation, which was based on Draft 4 of POSIX.1c, specified a similar attribute, but the names of the attributes have changed somewhat from the OSF DCE threads implementation.

The Single UNIX Specification, Version 2, also extends the specification of the following POSIX.1c functions that manipulate mutexes:

pthread_mutex_lock()
pthread_mutex_trylock()
pthread_mutex_unlock()

These take account of the new mutex attribute type and specify behavior declared undefined in POSIX.1c. How a calling thread acquires or releases a mutex now depends upon the mutex type attribute.
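The following sketch shows how the type attribute might be set through a mutex attributes object, and how relocking then behaves. The constants PTHREAD_MUTEX_ERRORCHECK and PTHREAD_MUTEX_RECURSIVE and the function pthread_mutexattr_settype() come from the specification rather than from this article. The helper names init_typed_mutex() and relock_result() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Initialize *m with the given type (for example PTHREAD_MUTEX_ERRORCHECK).
 * Returns 0 on success or an error number. */
int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   /* not needed once the mutex exists */
    return rc;
}

/* Lock m twice from the calling thread and return the result of the second
 * pthread_mutex_lock() call; the mutex is left unlocked on return.
 * Only meaningful for error-checking or recursive mutexes: a normal mutex
 * would deadlock on the second call. */
int relock_result(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0)
        return rc;
    int second = pthread_mutex_lock(m);
    if (second == 0)
        pthread_mutex_unlock(m);        /* undo the recursive acquisition */
    pthread_mutex_unlock(m);
    return second;
}
```

Under these assumptions, relocking an error-checking mutex is expected to fail with EDEADLK, while relocking a recursive mutex succeeds and must be matched by a second unlock.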

Read-Write Locks and Attributes

Read-write locks (also known as readers-writer locks) allow a thread to exclusively lock some shared data while updating that data or allow any number of threads to have simultaneous read-only access to the data.

Unlike a mutex, a read-write lock distinguishes between reading data and writing data. A mutex excludes all other threads. A read-write lock allows other threads access to the data, provided no thread is modifying the data. Thus, a read-write lock is less primitive than either a mutex-condition variable pair or a semaphore.

Application developers should consider using a read-write lock rather than a mutex to protect data that is frequently referenced but seldom modified. Most threads (readers) will be able to read the data without waiting and will have to block only when some other thread (a writer) is in the process of modifying the data. Conversely, a thread that wants to change the data is forced to wait until there are no readers. This type of lock is often used to facilitate parallel access to data on multiprocessor platforms or to avoid context switches on single-processor platforms where multiple threads access the same data.
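A minimal sketch of this usage pattern follows, assuming a small structure that is read often and updated rarely. The structure and the settings_* function names are hypothetical. The pthread_rwlock_* calls are the ones described in this article.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared data: frequently read, seldom modified. */
struct settings {
    pthread_rwlock_t lock;
    int timeout_ms;
};

int settings_init(struct settings *s, int timeout_ms)
{
    s->timeout_ms = timeout_ms;
    return pthread_rwlock_init(&s->lock, NULL);   /* default attributes */
}

/* Reader: any number of threads may hold the read lock at once. */
int settings_get_timeout(struct settings *s, int *out)
{
    int rc = pthread_rwlock_rdlock(&s->lock);
    if (rc != 0)
        return rc;
    *out = s->timeout_ms;
    return pthread_rwlock_unlock(&s->lock);
}

/* Writer: waits until no reader or writer holds the lock. */
int settings_set_timeout(struct settings *s, int timeout_ms)
{
    int rc = pthread_rwlock_wrlock(&s->lock);
    if (rc != 0)
        return rc;
    s->timeout_ms = timeout_ms;
    return pthread_rwlock_unlock(&s->lock);
}
```

Each function returns zero or an error number, following the convention of the other threads interfaces.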

If a read-write lock becomes unlocked and there are multiple threads waiting to acquire the write lock, the implementation's scheduling policy determines which thread will acquire the read-write lock for writing. If there are multiple threads blocked on a read-write lock for both read locks and write locks, it is unspecified whether the readers or a writer acquire the lock first. However, for performance reasons, implementations often favor writers over readers to avoid potential writer starvation.

A read-write lock object is an implementation-dependent opaque object. There are two different sorts of locks associated with a read-write lock: a read lock and a write lock.

A thread that wants to apply a read lock to the read-write lock can use either pthread_rwlock_rdlock() or pthread_rwlock_tryrdlock(). If pthread_rwlock_rdlock() is used, the thread acquires a read lock if a writer does not hold the write lock and there are no writers blocked on the write lock. If a read lock is not acquired, the calling thread blocks until it can acquire a lock. However, if pthread_rwlock_tryrdlock() is used, the function returns immediately with the error EBUSY if any thread holds a write lock or there are blocked writers waiting for the write lock.

Similarly, a thread that wants to apply a write lock to the read-write lock can use either of two functions: pthread_rwlock_wrlock() or pthread_rwlock_trywrlock(). If pthread_rwlock_wrlock() is used, the thread acquires the write lock if no other reader or writer threads hold the read-write lock. If the write lock is not acquired, the thread blocks until it can acquire the write lock. However, if pthread_rwlock_trywrlock() is used, the function returns immediately with the error EBUSY if any thread is holding either a read or a write lock.
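The non-blocking variants can be explored with a small probe, sketched here under the assumption that the attempt is made from a second thread while the caller holds (or does not hold) the lock. The struct and function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical probe description passed to the helper thread. */
struct probe {
    pthread_rwlock_t *lock;
    int want_write;   /* nonzero: try the write lock; zero: try a read lock */
    int result;       /* return value of the try function */
};

static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    p->result = p->want_write ? pthread_rwlock_trywrlock(p->lock)
                              : pthread_rwlock_tryrdlock(p->lock);
    if (p->result == 0)
        pthread_rwlock_unlock(p->lock);   /* release what the probe acquired */
    return NULL;
}

/* Make one non-blocking lock attempt from another thread.
 * Returns the try function's result (0 or EBUSY in the cases described above),
 * or the error number from pthread_create()/pthread_join() if the probe
 * thread could not be run. */
int probe_from_other_thread(pthread_rwlock_t *lock, int want_write)
{
    struct probe p = { lock, want_write, 0 };
    pthread_t t;
    int rc = pthread_create(&t, NULL, probe_thread, &p);
    if (rc != 0)
        return rc;
    rc = pthread_join(t, NULL);
    return rc != 0 ? rc : p.result;
}
```

While the caller holds a read lock and no writer is waiting, a read probe is expected to succeed and a write probe to return EBUSY. While the caller holds the write lock, both probes are expected to return EBUSY.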

The pthread_rwlock_unlock() function is used to unlock a read-write lock object held by the calling thread. Results are undefined if the read-write lock is not held by the calling thread. If there are other read locks currently held on the read-write lock object, the read-write lock object remains in the read-locked state, but without the current thread as one of its owners. If this function releases the last read lock for this read-write lock object, the read-write lock object is put in the unlocked state. If this function is called to release a write lock for this read-write lock object, the read-write lock object is likewise put in the unlocked state.

The same POSIX working group that developed POSIX.1b and POSIX.1c is currently developing the POSIX.1j draft standard, which specifies a set of extensions for real-time and threaded programming. This includes readers-writer locks that are nearly identical to the Single UNIX Specification, Version 2, read-write locks. The Aspen Group was aware of this draft standard, but felt that there was an immediate and urgent need for standardization in the area of read-write locks.

The following table maps the Single UNIX Specification, Version 2, read-write lock functions to their equivalent POSIX.1j draft 5 functions:

SUS, V2                       IEEE PASC P1003.1j
pthread_rwlock_init()         rwlock_init()
pthread_rwlock_destroy()      rwlock_destroy()
pthread_rwlock_rdlock()       rwlock_rlock()
pthread_rwlock_tryrdlock()    rwlock_tryrlock()
pthread_rwlock_wrlock()       rwlock_wlock()
pthread_rwlock_trywrlock()    rwlock_trywlock()
pthread_rwlock_unlock()       rwlock_unlock()

Thread Concurrency Level

The pthread_setconcurrency() function enables an application to request more kernel entities by specifying a desired concurrency level. However, this function merely provides a hint to the implementation. The implementation is free to ignore this request or to provide some other number of kernel entities. If an implementation does not multiplex user threads onto a smaller number of kernel execution entities, the pthread_setconcurrency() function has no effect.

The pthread_setconcurrency() function may also have an effect on implementations where the kernel mode and user mode schedulers cooperate to ensure that ready user threads are not prevented from running by other threads blocked in the kernel.

The pthread_getconcurrency() function always returns the value set by a previous call to pthread_setconcurrency().
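A minimal sketch of requesting a concurrency level and reading it back follows. The helper name request_concurrency() is hypothetical. Because the request is only a hint, and some implementations may reject it outright, the sketch passes any error number back to the caller.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical helper: ask for `level` kernel execution entities.
 * Returns 0 on success or an error number. On success, *reported receives
 * the value that pthread_getconcurrency() returns afterwards. */
int request_concurrency(int level, int *reported)
{
    int rc = pthread_setconcurrency(level);
    if (rc != 0)
        return rc;
    *reported = pthread_getconcurrency();
    return 0;
}
```

When the call succeeds, the reported value should equal the requested level, whether or not the implementation actually changes the number of kernel entities.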

Thread Stack Guard Size

DCE threads introduced the concept of a thread stack guard size. Most thread implementations add a region of protected memory to a thread's stack, commonly known as a guard region, as a safety measure to prevent stack pointer overflow in one thread from corrupting the contents of another thread's stack. The default guard size is PAGESIZE bytes; the actual value of PAGESIZE is implementation-dependent.

Some application developers may wish to change the stack guard size. When an application creates a large number of threads, the extra page allocated for each stack may strain system resources. In addition to the extra page of memory, the kernel's memory manager has to keep track of the different protections on adjoining pages. When this is a problem, the application developer may request a guard size of 0 bytes to conserve system resources by eliminating stack overflow protection.

Conversely, an application that allocates large data structures such as arrays on the stack may wish to increase the default guard size in order to detect stack overflow. If a thread allocates two pages for a data array, a single guard page provides little protection against thread stack overflows because the thread can corrupt adjoining memory beyond the guard page.

The Single UNIX Specification, Version 2, defines a new attribute of the thread attributes object, guardsize, which allows applications to specify the size of the guard region of a thread's stack.

An implementation may round up the requested guard size to a multiple of the configurable system variable PAGESIZE. In this case, pthread_attr_getguardsize() returns the guard size specified by the previous pthread_attr_setguardsize() function call and not the rounded-up value.
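The guardsize attribute might be set as in the sketch below, which prepares a thread attributes object requesting a guard region of a given number of pages. The helper name make_guard_attr() is hypothetical, and the page size is obtained with sysconf(_SC_PAGESIZE) as an assumption about how an application would query PAGESIZE.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: initialize *attr and request a guard region of
 * `pages` pages (0 disables stack overflow protection).
 * Returns 0 on success or an error number; on failure *attr is not left
 * initialized. */
int make_guard_attr(pthread_attr_t *attr, size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setguardsize(attr, pages * (size_t)page);
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The resulting attributes object can be passed to pthread_create() and then released with pthread_attr_destroy(). Calling pthread_attr_getguardsize() on it returns the requested size.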

If an application is managing its own thread stacks using the stackaddr attribute, the guardsize attribute is ignored, and no stack overflow protection is provided. In this case, it is the responsibility of the application to manage stack overflow along with stack allocation.

Parallel I/O

Many I/O intensive applications, such as database engines, attempt to improve performance through the use of parallel I/O. However, POSIX.1 does not support parallel I/O very well because the current offset of a file is an attribute of the file descriptor.

Suppose two or more threads independently issue read requests on the same file. To read specific data from a file, a thread must first call lseek() to seek the proper offset in the file and then call read() to retrieve the required data. If more than one thread does this at the same time, the first thread may complete its seek call, but before it gets a chance to issue its read call, a second thread may complete its seek call, resulting in the first thread accessing incorrect data when it issues its read call. One workaround is to lock the file descriptor while seeking and reading or writing, but this reduces parallelism and adds overhead.

Instead, the Single UNIX Specification, Version 2, provides two functions to make seek/read and seek/write operations atomic. The file descriptor's current offset is unchanged, thus allowing multiple read and write operations to proceed in parallel. This improves the I/O performance of threaded applications. The pread() function is used to do an atomic read of data from a file into a buffer. Conversely, the pwrite() function does an atomic write of data from a buffer to a file.
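As an illustration, the sketch below reads and writes fixed-size records by position. The record size, the function names, and the scratch file created with mkstemp() are all assumptions made for the example. The demonstration runs in a single thread, but read_record() and write_record() keep no shared offset state, so several threads could call them on the same file descriptor without a lock around a seek and read pair.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 16   /* hypothetical fixed record length in bytes */

/* Write record `index` at its computed position.
 * Returns 0 on success, -1 on error or short transfer. */
int write_record(int fd, long index, const char *rec)
{
    off_t where = (off_t)index * RECORD_SIZE;
    return pwrite(fd, rec, RECORD_SIZE, where) == RECORD_SIZE ? 0 : -1;
}

/* Read record `index` from its computed position.
 * Returns 0 on success, -1 on error or short transfer. */
int read_record(int fd, long index, char *rec)
{
    off_t where = (off_t)index * RECORD_SIZE;
    return pread(fd, rec, RECORD_SIZE, where) == RECORD_SIZE ? 0 : -1;
}

/* Write two records out of order, read one back, and confirm that the
 * descriptor's file offset never moved. Returns 0 if everything matched. */
int demo_positioned_io(void)
{
    char path[] = "/tmp/preadXXXXXX";
    char first[RECORD_SIZE] = "first";
    char second[RECORD_SIZE] = "second";
    char out[RECORD_SIZE];
    int ok;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file is removed once fd is closed */

    ok = write_record(fd, 1, second) == 0
      && write_record(fd, 0, first) == 0
      && read_record(fd, 1, out) == 0
      && memcmp(out, second, RECORD_SIZE) == 0
      && lseek(fd, 0, SEEK_CUR) == 0;   /* offset is still at the start */
    close(fd);
    return ok ? 0 : -1;
}
```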

More Information

More information on the Single UNIX Specification, Version 2, can be obtained from The Open Group Source Book, Go Solo 2: The Authorized Guide to Version 2 of the Single UNIX Specification, 500 pages, ISBN 0-13-575689-8. This book provides complete information on what's new in Version 2, with technical papers written by members of the working groups that developed the specifications, and a CD-ROM containing the complete 3,000-page specification in both HTML and PDF formats (including PDF reader software). For more information on the book, see <https://www.UNIXsystems.org/gosolo2>. Additional information on the Single UNIX Specification can be obtained at The Open Group WWW site, <https://www.UNIXsystems.org/>.