Portably Solving File TOCTTOU Races with Hardness AmplificationAbstract:The file-system API of contemporary systems makes programs vulnerable to TOCTTOU (time of check to time of use) race conditions. Existing solutions either help users to detect these problems (by pinpointing their locations in the code), or prevent the problem altogether (by modifying the kernel or its API). The latter alternative is not prevalent, and the former is just the first step: programmers must still address TOCTTOU flaws within the limits of the existing API with which several important tasks can not be accomplished in a portable straightforward manner. Recently, Dean and Hu addressed this problem and suggested a probabilistic hardness amplification approach that alleviated the matter. Alas, shortly after, Borisov et al. responded with an attack termed “filesystem maze” that defeated the new approach. We begin by noting that mazes constitute a generic way to deterministically win many TOCTTOU races (gone are the days when the probability was small). In the face of this threat, we (1) develop a new user-level defense that can withstand mazes, and (2) show that our method is undefeated even by much stronger hypothetical attacks that provide the adversary program with ideal conditions to win the race (enjoying complete and instantaneous knowledge about the defending program's actions and being able to perfectly synchronize accordingly). The fact that our approach is immune to these unrealistic attacks suggests it can be used as a simple and portable solution to a large class of TOCTTOU vulnerabilities, without requiring modifications to the underlying operating system.
1 IntroductionThe TOCTTOU (time of check to time of use) race condition was characterized in 1974 by McPhee as the situation which occurs “if there exists a time interval between a validity-check and the operation connected with that validity-check [such that], through multitasking, the validity-check variables can deliberately be changed during this time interval, resulting in an invalid operation being performed by the control program.” [25]Dissecting a 1993 CERT advisory [7], Bishop was the first to systematically show that file-systems with weak consistency semantics (like Unix and Windows) are inherently vulnerable to TOCTTOU races [3,4]: First, a program checks the status of a file using the file's name. Then, depending on the status, it applies some operation to the file, unjustifiably assuming the status has not changed since it was checked. This error is caused by the fact that the mapping between file names and file objects (“inodes”) is mutable by design, and might therefore change between a status check and a subsequent operation. Researchers have put a lot of effort into trying to solve or alleviate the problem, (1) developing compile-time tools to pinpoint locations in the source code that are suspect of suffering from a TOCTTOU race [4,37,10,8,30], (2) modifying the kernel to log all relevant system calls and analyzing the log, postmortem, to detect TOCTTOU attacks [20,16,21,19,39,1], (3) having the kernel speculatively identify offending processes and temporarily suspend them or fail their respective suspected system calls [11,34,27,35,28], and finally (4) designing new file-system interfaces to make it easier for programmers to avoid the races. [3,29,24,40]. None of the above helps programmers to safely and portably accomplish a TOCTTOU-prone task on existing systems, as kernels that prevent races are currently an academic exercise, whereas new-and-improved file-systems are unfortunately not prevalent (and certainly not standard). Thus, regardless of how programmers become aware of the problem, whether through compile-time tools or just by being careful, they must still face the problem with the existing API. At the same time, resolving a TOCTTOU race is not as easy as, e.g., fixing a buffer overflow bug, because the programmer must somehow achieve atomicity of two operations using an API that was not designed for such a purpose. In fact, overcoming TOCTTOU races in a portable manner is notoriously hard, sometimes even for experts (see Section 2.3). Hence, it is probably impractical to expect average programmers to successfully accomplish such tasks (or attempt them) on a regular basis. Indeed, to date, TOCTTOU races pose a significant problem, as exemplified by Wei and Pu, which analyzed CERT [36] advisories between 2000 and 2004 and found 20 reports concerning the issue, 11 of which provided the attacker with unauthorized root access [39]. Figure 1 shows the yearly number of TOCTTOU “symlink attack” vulnerabilities reported by NVD (National Vulnerability Database) [26]. These affect a wide range of mainstream applications and tools (e.g., bzip2, gzip, FireFox, make, OpenOffice, OpenSSL, Kerberos, perl, samba, sh), environments (e.g., GNOME, KDE), distributions (e.g., Debian, Mandrake, RedHat, SuSE, Ubuntu), and operating systems (e.g., AIX, FreeBSD, HPUX, Linux, Solaris).
We contend that the situation can potentially be greatly improved if programmers are able to use some portable, standard, generic, user-mode check_use utility function that, given a 'check' operation and a 'use' operation, would perform the two as a kind of “transaction”, in a way that appears atomic for all relevant purposes. This paper takes a significant step towards achieving such a goal. The first step in this direction was taken in 2004 by Dean and Hu, which implemented a transaction-like access_open routine that set out to solve a single race [12]: the one which occurs between the access system call (used by root to check if a user has adequate privileges to open a file) and the subsequent open. Their idea (later termed K-race [5]) was to use hardness amplification as found in the cryptology literature [41], but applied to system calls rather than cryptologic primitives. In a nutshell, if an adversary has a probability p < 1 to win a race, then the probability pK to win K races can be made negligible by choosing a big enough K. Indeed, by mandating attackers to win K consecutive races before agreeing to open the file, access_open seemingly accomplished its “transactional” goal of aggregating access and open into a single “atomic” operation. But the new and intriguing K-race defense did not stand the test of time. In 2005, Borisov et al. orchestrated their filesystem maze attack and showed that an adversary can in fact win every race (hence making the assumption that p < 1 wrong) [5]. Roughly speaking, the adversary is able to slow down, and effectively “single step”, the proposed algorithm by feeding it with a carefully constructed file name (the “maze”) and polling the status of certain components within the name. This induces perfect synchronicity between the adversary and the K-race, thereby enabling the adversary to win all races (p ≈ 1). Indeed, in his on-line publication list, adjacent to his 2004 paper [12], Alan Hu concedes that “The scheme proposed here has been beautifully and thoroughly demolished by Borisov, Johnson, Sastry, and Wagner [5]. The theory is, of course, still valid, but it relies on an assumption of the attacker having a non-negligible probability of losing races. Borisov et al. came up with ingenious means (1) to force the victim to go to disk on each race, thereby allowing plenty of time for the attacker to win races, and (2) to determine precisely what protocol operation the victim is doing at any point in time, thereby foiling the randomized delays. The upshot is that they can win these TOCTTOU races with almost complete certainty.” [17] Dean and Hu were only concerned with finding a way to correctly use the acess system call; likewise, the explicit goal of Borisov et al. was to prove that access should never be used. But the consequences of the filesystem maze attack are much more general. In fact, mazes constitute a generic way to consistently win a large class of TOCTTOU races. This is true because any 'check' operation can be slowed down and single-stepped, if provided with a filesystem maze as an argument. Consequently, the common belief that
“TOCTTOU vulnerabilities are hard to exploit, because they [...] rely on whether the attacking code is executed within the usually narrow window of vulnerability (on the order of milliseconds)” [39] is no longer true: With filesystem mazes, the attacker can often proactively prolong the vulnerability window, while simultaneously finding out when it opens up. Motivated by the alarmingly wide applicability of the filesystem maze attack, we set out to search for an effective defense, with the long-term goal of providing programmers with a generic and portable check_use utility function that would allow for a pseudo-atomic transaction of the 'check' and 'use' operations. Importantly this should work on existing systems, without requiring changes to the kernel or the API it provides. This paper is structured as follows: After exemplifying the TOCTTOU problem in detail, surveying the existing solutions, and pointing out their shortcomings and the elusiveness of a contemporary practical solution (Section 2), we go on to explain how hardness amplification was applied to solve file TOCTTOU races, and why it has failed (Section 3). We then show how to turn this failure to success (Section 4) and experimentally evaluate our solution by subjecting it to a hypothetical attack far more powerful than filesystem mazes (Sections 5-6). We discuss how to generalize our solution, its limitations, and how/when its probabilistic aspect can be eliminated (Section 7). Finally, we present our conclusions (Section 8).
|
|
Another well known TOCTTOU example, initially documented by Bishop, is that of a mail server which appends a new message to the corresponding user's Inbox file [3,4]. Before open-ing the Inbox, the server lstat-s it to rule out the possibility the user has replaced it with some symbolic link pointing to a file that lies elsewhere. Table 2 shows how the inevitable associated TOCTTOU race can be exploited to add arbitrary data to the /etc/passwd file, providing the attacker with the ability to obtain permanent root access.
|
A third example concerns the setuid bit that Unix-like systems associate with an executable to indicate it should run with the privileges of its owner, rather than the user that invoked it (as is the normal case). Of course just handing off root privileges is not a good idea, which is why the access system call conveys setuid programs the ability to check whether an invoker has adequate privileges:
Alas, the access/open idiom constitutes the archetypal, and arguably the most infamous, TOCTTOU flaw.1 Table 3 illustrates how this race can be exploited to access any file; access was therefore deemed unusable, as e.g. indicated by its FreeBSD manual, explicitly stating that “the access system call is a potential security hole due to race conditions and should never be used.” [22]
|
Considerable research effort have been put into providing solutions for TOCTTOU vulnerabilities like the ones described above. In order to highlight the contribution of this paper we first survey this work, which can be subdivided into four categories:
In 2003, Tsyrklevich and Yee developed a more general approach that was capable of generically preventing most TOCTTOU attacks [34]. They patched the kernel to suspend any process that interferes with a “pseudo transaction” (check/use pair that agree on the target file), such that the worst outcome of a false-positive detection is a temporary suspension of the corresponding process. Several similar solutions followed [27,35,28]; the latter of which, by Pu and Wei, was argued to be “complete”, being based on their aforementioned earlier work [39].
To obtain a more general solution, a bigger change is needed, such as replacing (or augmenting) Unix semantics with that of a transactional file-system [29,40]: Atomicity would then insure that a check/use pair that was annotated by the programmer as a single transaction would be executed with no interference.
A more radical approach was suggested by Maziéres and Kaashoek [24]. They proposed to use the fact that the binding between file descriptors and inodes is immutable (and thus cannot be exploited) to devise a safer programming paradigm that would make it harder for the programmer to make mistakes. By this paradigm,
Notice that all the existing solutions surveyed above do not help programmers in resolving a known TOCTTOU flaw within existing systems. Static detection techniques are invaluable in locating such flaws, but what are programmers to do if/once they are aware of the vulnerability? Surely they cannot wait until all contemporary kernels employ dynamic prevention (if ever, as significant complexity and performance penalty might be involved). Likewise, programmers cannot wait until all contemporary OSs portably support transactional file-systems (or constructs like the aforementioned API suggested by Maziéres and Kaashoek).
The fact of the matter is that, in order to achieve a portable solution, programmers are bound to handling the matter with a decades-old API. Importantly, as mentioned earlier, a portable user-mode solution to a given TOCTTOU race (if exists) is often much harder and more elusive than e.g. fixing a buffer overflow bug: even experts that explicitly target a specific TOCTTOU problem are prone to getting it wrong.
Consider for example the access/open race depicted in Table 3. Tsyrklevich and Yee suggested two solutions to this flaw [34]. The first argues that “to avoid this race condition, an application should change its effective id [with set∗uid system calls] to that of a desired user and then make the open system call directly.” However, after carefully evaluating this suggestion, Dean and Hu found that
“Unfortunately, the setuid family of system calls is its own rats nest. On different Unix and Unix-like systems, system calls of the same name and arguments can have different semantics, including the possibility of silent failure [9]. Hence, a solution depending on user id juggling can be made to work, but is generally not portable.” [12]
The second suggestion by Tsyrklevich and Yee was “to use fstat after the open instead of invoking access”. As the input of fstat is a file descriptor, the latter is permanently mapped to the underlying inode and hence can never be abused by an attacker; the user is then expected to inspect the ownership information returned by fstat and check if the invoker was indeed allowed to open the file. But this will not work, as file access permissions can not be deduced in such a way; rather, they are the conjunction of all the (inode) permissions associated with each component in the respective path. For example, if a file's name is x/y such that x is solely accessible by its owner, then other users are forbidden from reading y even if fstat indicates it is readable by all (which may very well be the case when root invokes the fstat).
A third alternative is to fork a child that permanently drops all extra privileges and then attempts to open the file; if successful, the child can then pass the open file descriptor across a Unix-domain socket and exit. Borisov et al. [5] have mistakingly attributed the claim that this version is portable, to Dean and Hu [12]. But the latter have actually argued the contrary, stating that, with respect to the Unix-domain approach, “some of the above [user id juggling] caveats still apply”. Indeed, as mentioned earlier, dropping privileges is a non-portable operation [9]. (Regardless of whether it is being done by a parent or a forked child.) Furthermore, we find that passing an open descriptor alone, even without dropping privileges, suffers from serious portability issues.4
A fourth failed attempt will be discussed next.
In 2004, noting that no prior art helps programmers to portably resolve TOCTTOU vulnerabilities on existing systems, Dean and Hu took the first step towards a portable solution [12], explicitly focusing their efforts on the aforementioned access/open TOCTTOU race.
Their solution, termed “K-race”, was inspired by the hardness amplification technique that is commonly used in cryptology contexts [41]. The idea underlying hardness amplification is to use a problem which is computationally “somewhat hard”, in order do devise another computational problem that is “really hard”. In a TOCTTOU access/open scenario, the “somewhat hard” problem is timing and completing the attack (removing one file and linking another) within the exact window of opportunity delimited by the access and open calls (see Table 3). The “really hard” problem is requiring the attacker to succeed in doing this for 2K+1; consecutive times.
The K-race routine, shown in Figure 2, starts with a standard call to access, followed by an open, followed by K strengthening rounds. Each round consists of an additional access check and a corresponding open, which are then followed by a statement that verifies that the currently opened file is the same file that was opened in the previous round. Note that when K=0, the routine degenerates to the standard access/open TOCTTOU race.
|
To be successful, an attacker must indeed win 2K+1; races: This is true because, on each round, the access check must be applied to some user accessible file, or else permission is denied; On the other hand, every open must be applied to the same inaccessible target file, or else the verification that all file-descriptors refer to the same file-object would fail. Thus, assuming each race is an independent random event with some probability p < 1 for the attacker to win, the overall probability of tricking a K-race is p2K+1. (Independence of events is supposedly obtained by introducing short random delays between successive system call invocations: as delays are randomized, an adversary wouldn't be able to synchronize with the K-race.) After measuring several systems (among which are SMP systems), Dean and Hu concluded that K=7 is enough to make the probability of success negligible for all practical purposes.
In 2005, Borisov et al. defeated the K-race technique [5]. They have done so by refuting the (then widely accepted) assumption that the probability p for an attacker to win a race is significantly smaller than one. In fact, they have managed to effectively make it a certainty (p ≈ 1). The heart of the attack consists of a filesystem maze, which, in simple terms, is the longest and most nested filepath a user can pass as an argument to a system call, without causing it to fail due to hardcoded kernel limits.
To form a maze, the attacker connects chains by placing a symbolic link at the bottom of chaini+1 that points to chaini. The final symlink, at the bottom of chain0, points to an exit symlink which, in turn, points to the actual target file. Finally, the entry point to the maze, sentry, is a symlink pointing to the highest chain. This is illustrated in Figure 3.
Unix systems impose a limit on the total number of symlinks that a single filename lookup can traverse, e.g., Linux 2.6 limits this number to 40. This places a limit on the number of chains composing the maze. Still, even with this limit, a maze can be composed of nearly 80,000 directories which may require loading about 300MB from the disk, just to resolve the associated name.
Importantly, if even one of the corresponding directory entries is not found in-memory, in the filesystem cache, the process that invoked the system call on behalf of which the path resolution is performed would be put to sleep, blocked-waiting for IO.
The attack is started by invoking the access_open K-race routine with the following filepath as an argument
This filepath is then passed along to the initial access call, which forces the K-race routine into the first maze. As a result, two things occur
Eventually, the IO operations complete and the access finishes successfully. When the K-race calls the subsequent open, the exact same scenario occurs: the kernel updates the atime of dir2/sentry, the K-race routine sleeps on IO when loading parts of the respective maze that are not cached, the attacker consequently resumes and notices the updated atime of dir2/sentry, the attacker switches activedir to point to dir3, and the K-race routine completes the open successfully. This sequence of events repeats itself until all the system calls composing the K-race complete, and the attacker has managed to fool the K-race and open the protected file.
was found to be especially effective in this respect.
Finally, for completeness, Borisov et al. considered a K-race version that randomly flips the order of the calls to access and open within the strengthening loop (this is a valid and technically sound defense against their maze attack). They defeated this approach as well, by deducing which system call is currently being executed with the help of various kernel variables exported through the /proc file-system. For example, in Solaris 9, any process can read the current system call number of any other process from /proc/pid/psinfo.
The maze attack is a generic way to systematically win TOCTTOU races. By utilizing complex file names, an attacker can slowdown the victim application, effectively single-step it, and gain a decisive advantage, which allows it to defeat the probabilistic K-race approach. In this section we show that this advantage is in fact not inherent. Defenders need not play by the rules that are dictated by the attacker. Rather, they can impose new rules that make it practically impossible for an attacker to win.
The key observation is simple and well known: system calls like
open, stat, chdir, access,
chown etc. that operate on a specified file name, are in
fact O(n) algorithms, where n is the number of components
composing the name (n also embodies symlinks that are part of the
name as well as the components of the soft links that must be
recursively traversed).
And so, in order to resolve an n-component name, the associated
system call must sequentially iterate through n inodes.
In the case of the K-race approach this is done K times, so the number
of traversed inodes is actually n × K.
The order in which the traversal is performed is crucial for the
success of the maze attack; assuming a file name of the form
/f1/f2/f3 (with no symbolic links along the way) and assuming
K = 2, this order would be:
Our approach contends that row-orientated traversal, while seemingly
dictated by the system call API, is not carved in stone.
There is actually no technical difficulty preventing us from doing
a different inode traversal that would better suit our needs.
Specifically, column-oriented traversal is perfectly aligned with our
intent to make it harder for an adversary to win a race.
This approach is illustrated in Figure 4 (right).
The idea is to resolve a path one component at a time, atom by atom,
such that on each step we effectively conduct a kind of “short race”
or “atom race”, as part of the K-strengthening doctrine.
This approach provides a clear advantage: an adversary no longer has
control over the duration of the elapsed time between consecutive
visits at fi, e.g. the traversal order in the above example would
be:
We will now describe our algorithm in a bottom-up fashion (all source code included, as an indication of its simplicity). Doing a column-oriented traversal entails a price, which is having to handle the parsing of the file path ourselves when splitting it into atoms. For our purposes, however, the chop_1st function (as listed in Figure 5) was all that was needed in this respect. This function gets a relative path and “chops off” the first component while returning the remainder to the caller. By repeatedly invoking this function (using the remainder of the path from the previous invocation as the input to the current invocation), we gradually consume the file path in a column-oriented manner.
char* chop_1st(char *path) { // Find the end of the first component and // null-teminate it char *p = strchr(path,'/'); if( p == NULL ) return NULL; *p++ = '\0'; // Handle multiple consecutive occurrences // of '/'. This ensures that the remainder // of the path is returned in a "relative" // form (without preceding slashes) for(; *p == '/'; ++p) ; // Returning NULL to indicate end of path return *p ? p : NULL; } |
A second difficulty one faces when doing a user-level path resolution is having to handle atom components that are in fact symbolic links. To handle this caveat we used the simple is_symlink function (listed in Figure 6) that gets as input the atom that was just chopped off the prefix of the full file path. Note that by applying the lstat system call upon the given atom we make sure that the invoker is not forced to go through a maze. If this atom happens to be a symbolic link, then is_symlink copies the name of the target file to the memory pointed to by the appropriate argument; this would be later processed recursively. However, if the atom is a hard link (read: not a symlink), then the result of the lstat operation (as recorded by the given stat structure) will be used as a reference point within the race, when inodes are compared, as described next.
|
Having dealt with all the low-level details, we go on to consider how a race would actually be conducted when a hard link is finally encountered. Recall that the access permissions of a file are more than just the per-inode access bits (user/group/all read/write/execute etc.): they are the conjunction of all the permissions of each and every directory component along the path. For example, even if an inode indicates it is readable by all, if it nevertheless resides within a private directory, then obviously no one should be able to access the associated file. Therefore, before descending into the next directory component, the algorithm must verify that the invoker has the appropriate permissions. However, since this entails a TOCTTOU vulnerability, each such check must be K-strengthened.
Figure 7 shows how a per-atom K-race is conducted. Note that the security of our algorithm is reduced to the security of atom_race (all other functions are completely safe). The information encapsulated by the stat structure input was placed there by the is_symlink function that has just been invoked using the very same atom. Thus, it is likely that the inode (that is associated with the atom) is still in the cache. Further, since the atom is in fact an “atom” (one component file) that has just now been verified to be a hard link, it is also likely that the initial call to access and open would operate on the same inode. However, since there is a chance the attacker has managed to (1) unlink the previously lstated atom, and to (2) symlink it to a maze, strengthening steps are still required. The algorithm therefore continues into a K-loop that is almost identical to the one suggested by Dean and Hu (Figure 2). All the original operations are still present. The difference is that now, on each iteration, the algorithm also verifies that the atom is still a hard link. This check is necessary in order for the defense to recover, if the attacker somehow managed to win the first race and to force the algorithm into a maze while doing the access and open operations. Since the lstating of an atom is an operation that is not affected in any way by the target that a symbolic link might have, our algorithm is not vulnerable in this respect. The only other additions we have made are (1) to check that fstating the initial file we open (fd1) yields identical information to that pointed to by s0, as the K strengthening rounds utilize s0 for the verification checks, and (2) to check that the lstated inode matches the initial inode, similarly to the original check with regard to the information that is retrieved by fstat.
|
Note that the two invocations of DO_CMP within the strengthening loop insures that all three stat structures are equal (s0 = s1 = s2), a check that is needed for the following reasons. By verifying that s1 is equal to s2, we know for a fact that the lstated and the opened files are one and the same, which means we deterministically force an adversary to win a race involving a non-symlink atom, on each round. This by itself, however, is not enough, as we must also make sure that s1 and s2 are equal to s0: failing to do so would make the K-loop meaningless, allowing an attacker to unlawfully open the file after winning only two races, as follows
We now have everything we need in order to implement a column-oriented K-race traversal. The access_open procedure we implement does this in a straightforward manner, as is shown in Figure 8. The first chunk of code simply makes sure that the traversal is only conducted with the help of relative names (that do not start with a slash). The second chunk is the traversal per-se. This part simply iterates through the atom components, one component at a time, and takes the necessary action according to whether the atom is a symbolic link or not. The latter is the simpler alternative: if the atom is a hard link, a short atom_race is conducted and the atom is directly opened. However, if the atom is a symbolic link, the algorithm calls itself recursively to handle the newly encountered composite path. In both cases, if a valid file descriptor is returned, the algorithm is allowed to continue to the next step after fchdiring to the current directory component. This strategy ensures us that there is a high probability that all relevant inodes reside in the cache during the time in which this is critical: when the K-race takes place.
|
For brevity, the presented algorithm does not handle several minor details that should be addressed in a real implementation:
First, it lacks a defense mechanism against circular symbolic links. This can be easily incorporated within the procedure shown in Figure 8 in the exact same manner as it is done within the kernel, that is, by counting the number of traversed symbolic links and aborting the procedure if the count violates some predefined threshold.
Second, our algorithm opens a file for reading only. It does not allow the caller to specify other / additional flags to be passed along to open (such as O_RDWR, O_APPEND, etc). There is no technical difficulty preventing us from adding a “flags” parameter that allows this, as long as we provide special treatment for file truncation (O_TRUNC) and forbid file creation (O_CREAT). Truncation is problematic as the first open would truncate the file regardless of whether the real user has adequate permissions to do so; the solution is to access/open the file without O_TRUNC and, if successful, to ftruncate the resulting descriptor. File creation raises other (independent and well-known) TOCTTOU issues that are commonly associated with the problem of creating temporary files [11]; these are outside the scope of this paper.
Additional details that should be handled are (1) setting errno to EACCES when appropriate, namely, when DO_CMP and DO_CHK fail, (2) closing already opened file descriptors (if exist) upon errors, e.g., when fstat fails in Figure 7, and (3) saving and restoring the working directory before and after the invocation of access/open, to undo the effect of using fchdir.
The final item raises an important point we wish to make explicit: our access/open implementation is inadequate for multithreaded applications if some other thread (different than the one performing the access/open) requires the working directory to remain unchanged, as this directory is shared by all threads. We note in passing that the relatively new system call openat (which opens a filepath relative to a given directory file descriptor [23]) would solve this problem, as it will eliminate the need for using fchdir; openat is proposed for inclusion in the next revision of POSIX [18].
It should come as no surprise that the new access_open algorithm is completely immune from the maze attack, as the latter completely lost its timing ability: the attacker colossally fails to synchronize with the activities of the defender, and has no clue about when it would be most beneficial to unlink/link the targeted file in order to fool the defense. Nevertheless, while we believe it is improbable, it is still possible that somebody someday would come up with some surprising approach that would allow an attacker to achieve synchronicity once again. Hence, we seek a much stronger result.
To this end, we run an experiment in which the defender is completely “exposed”: any attacker would be able to precisely know which actions are taken by the defender and when. In other words, our experiment fully reinstates the synchronicity capabilities to potential attackers, make these capabilities orders of magnitude more powerful and precise, and measures the probability attackers have to win a single round in light of the new approach; the bigger question being: Do file TOCTTOU races still pose a problem in the face of a column-oriented traversal? And if so, to what extent?
To answer this question we have implemented a defender program that provides information regarding its activities to any interested party through a shared-memory integer variable (instated with the help of SysV IPC facilities). The code of the defender is listed in Figure 9. It essentially does all of the defense-steps that are listed in Figure 7, but now each step is executed only after the defender publishes (through the shared integer) the next action to be performed. Note that the DO_SYS macro is redefined to record a system-call failure (instead of returning). This is done so that the defender process will not terminate. But it also means the defender maintains a fixed order of operations and thereby simplifies the code of the attacker (which is exempt from considering various corner cases). Importantly, an attacker may safely assume that the defender performs the same exact operations in the same exact order within each iteration.
|
In accordance to the column-oriented doctrine, the defender is operating on a file which is an atom, namely, composed of only one component that is arbitrarily called “target”. Upon each iteration, after the operation sequence is over, the defender checks whether the attack was successful, and if so increments its losses count to be printed at the end of the run. The conditions that are asserted at the end of each iteration are identical to those that are checked on the fly within Figure 7, with only one addition: the defender is made aware beforehand of the inode of the private file that the attacker wants to read; obviously, an attack is successful only if it managed to fool the defender into opening this file.
We now go on to review the attacker's code, as given in Figure 10. Initially, the attacker must make sure that the file to be lstated is not a symbolic link. Additionally, since the defender is going to compare the inode of the lstated file to that of the opened file (which is the private file if the attacker gets his way), the 'target' file should point to the private file at this point. The attacker then waits until the defender is ready to lstat. As explained, the attacker's interest dictates that the defender would be able to successfully lstat the private file, and so the attacker must give it enough time to do so. This is also the reason for the next 'while' loop that ends when the defender finishes the lstat, or before, depending on the heuristic we have chosen to prematurely terminate the busy-waiting: We have evaluated a wide range of T1 values (see next section); Note that when T1 = 0, the busy wait period continues until the shared variable changes. But when T1 > 0 waiting may be shorter, as T1 bounds the number of busy-wait iterations and so the smaller it is, the shorter the wait.
|
After the defender lstats the private file, the real race is on, as the defender is about to check access and so the attacker must arrange things such that 'target' will point to an appropriate location. Additionally, the attacker aspires to slow down the defender by forcing him into a maze, in order to have a better chance of winning future races. The attacker therefore symlinks the target to a maze. Much like with the initial lstat operation, the attacker must now speculate when the access operation is already in flight. Once again, it may be advisable to end the busy waiting before the shared variable changes, and so another timer limit — T2 — is employed; We allow for two different limits so as to maximize the chances of success. The attacker is now hopeful that the defender has been forced into the maze, which would mean he can safely prepare towards the next open by linking to the private file. But even if the attacker was not successful, this is the correct thing to do in preparation for the defender's next lstat at the beginning of the next round.
Our goal is to find out whether the column-oriented traversal technique is effective against the above hypothetical attack. (If this turns out to be the case, we can be reasonably sure that our solution would be effective in real-life scenarios where the defender is not exposed.)
We obtain our goal by quantifying the expected time that a
hypothetical attack should run in order to achieve k consecutive
wins.
Let this time be denoted Bk.
If p is the probability for an attacker to win one round (iteration)
within the exposed defender's loop, and t is the time it takes to
conduct one round, then
In order to increase the attackers' chances to win, we run the experiments on multiprocessors only. This way, attackers will have processors of their own to continuously and repeatedly attempt to fool the defender. In an effort to generalize the results, the experiments are conducted on older and recent machines, from different vendors, running different operating systems, as follows
Processor | Operating system | CPUs | Clock | Memory |
UltraSPARC-II | Solaris 8 | 4 | 448 MHz | 2 GB |
Pentium-III | Linux 2.4.26 | 4 | 550 MHz | 1 GB |
Power4 | AIX 5.3 | 8 | 1450 MHz | 16 GB |
Dual Core AMD | Linux 2.6.22 | 4 | 2200 MHz | 8 GB |
Intel Core 2 Duo | Linux 2.6.20 | 2 | 2400 MHz | 4 GB |
The 'maze' file we use is constructed to be the biggest that is possible on the respective OS, considering the aforementioned limits on the size of a filepath and the number of symbolic links it entails. Like Dean and Hu [12] and Borisov et al. [5] before us, we use a local file system for our experiments. These are the results we next describe; Afterwords, we also describe our additional findings from when running the experiments across NFS.
All the machines we use have a relatively big memory (that is, relative to the size of mazes), which as argued by Borisov et al., works against the attacker (more inodes can reside in core). However, we had appropriate permissions to change the Linux kernel running on the Pentium-III machine to one that only utilizes 256MB of the available memory. Other techniques we have experimented with in an attempt to increase the chances of the attacker to win are to simultaneously run multiple recursive grep-s during attacks in accordance to the suggestion by Borisov et al. [5], to launch attacks from within a huge directory that contains tens of thousands of files in accordance to Maziéres and Kaashoek's suggestion [24], and to simultaneously run several exposed-defenders on the same machine. We found that none of these techniques had a significant affect on the results, and therefore we do not report them here.
Conversely, Wei and Pu have recently shown that simultaneously running multiple identical attackers (attacking the same file) on a multiprocessor system, dramatically increases the chance of a TOCTTOU attack to prevail [38]. This technique turned out to be rather successful (from the attackers' perspective) and is therefore explicitly addressed below.
Recall that the synchronized attacker has two tunable parameters — T1 and T2 — that place an upper bound on the two busy-wait loops the attacker must employ. We have independently set each of these two values to be either zero (no upper bound) or 2j, where j = 0, 1, 2, …, 20 . This means that we conduct 484 (= 222) experiments for any specified number of simultaneous attacker (1-6), amounting to a total of 2,904 runs, per machine.
The probability p to win a round is only one of two factors that determine the expected time Bk until a successful attack, as shown in Equation 1; The other factor is the time t it takes to complete the round, such that the bigger t is, the longer it would take to accomplish a successful attack. Figure 12 plots the values of t and shows that they too can be rather high with top values typically at tens of milliseconds, and outrageously, a few seconds in the case of Sparc/Solaris.
Importantly, the time to complete a round and the probability to win it are far from being independent variables. In fact, as shown in Figure 13, there is a distinct linear connection between the two, which means the bigger the probability to win the round, the longer the round takes. Indeed, this makes perfect sense, as the prime objective of an attacker is to slow down the defender by throwing it into a maze. These are the two opposing side effects of the attacker's actions: maximizing p immediately translates to maximizing t, and so whatever ends up happening, the attacker inevitably contributes, to some extent, to making Bk larger.
Figure 14 assigns the t and p values of each of our experiments into Equation 1 in order to finally compute Bk, namely, the expected number of years an attack should execute until k consecutive rounds are won, for three different k values. When using k = 7 (the value recommended by Dean and Hu [12]) we see that a successful attack is potentially possible after a bit more than a month, in the case of Power4/AIX. Increasing k to be 8 and 9 raises the minimal expected duration to be more than 2.5 and 53 years, respectively, making the latter a safer choice in the face of our theoretical attack.
|
“run some limited experiments attacking files across NFS and observed substantial numbers of successes. We chose not to continue these experiments, however, because NFS-accessed files are usually not the most security-critical, root privileges typically don't extend across NFS, the data displayed enormous variance depending on network and fileserver load.” [12]
But the set of attack experiments we conducted across NFS reveals that, while individual machines behave differently, the overall conclusion regarding the value of k does not dramatically change. The following table compares between minimal Bk values devised when running the attack on local and a networked filesystems (each table entry is the minimal result obtained across the 2,904 respective runs; values denote years, and, if bigger than 1000, are rounded down to the closest power of ten):
Platform | Local FS | NFS | ||||
k=8 | k=9 | k=10 | k=8 | k=9 | k=10 | |
SPARC Solaris 8 | 5.8 | 103 | 103 | 0.3 | 2.6 | 21 |
P-III Linux 2.4 | 109 | 1011 | 1013 | 0.1 | 0.8 | 5.8 |
Power4 AIX 5.3 | 2.5 | 53 | 951 | 108 | 1011 | 1013 |
AMD Linux 2.6 | 103 | 104 | 106 | ∞ | ∞ | ∞ |
Intel Linux 2.6 | 106 | 108 | 109 | 9.9 | 129 | 103 |
We see that machines can become less or more vulnerable to the hypothetical attack when it is conducted across NFS. The Pentium-III machine demonstrates the most notable change, being the least susceptible to the attack within a local file system (see also Figure 14) and becoming the most vulnerable with NFS. Conversely, with the Power4 machine, it's exactly the opposite, as it transitioned from being the most vulnerable to being nearly the least, second to only the AMD machine for which no attacker wins were observed with NFS.
Nevertheless, programmers can not be expected to tailor a CKR for every legitimate check-use scenario. We therefore aspire to devise a generic utility function that can e.g., be added to libc. A first immediate step is to convert our access_open into a check_open function, by allowing the caller to pass the check operation as a pointer-to-function argument (getting an atom hardlink filename and returning zero upon success.) This operation would replace the call to access in Figure 7, allowing programmers to pass along access, or stat, or any other conceivable filename check operation they may require.
Note that the focus on open as the 'use' operation is not as limited as might initially seem: Recall that bindings of file descriptors to file objects are immutable and therefore completely immune from TOCTTOU attacks. Thus, once a valid file descriptor is safely opened and returned, the programmer can securely use the wealth of system calls that operate on file descriptors (fchown, fchmod, fchdir, fstat, ftruncate, etc.), rather than their respective insecure TOCTTOU-prone counterparts that operate on file names (chown, chmod, chdir, stat, truncate etc.).
In addition to the filepath, check_use would get four pointer-to-function arguments Fchkdir, Fchklink, Fchklast, and Fuselast. The first three are 'check' operations, respectively applied to each directory, symlink, and the last component in the given filepath, at the time the associated atom component is consumed by the path resolution traversal. Their input arguments are the atom name and the respective 'stat' structure and file descriptor (-1 for symlinks); their return value is zero to indicate the path-resolution may continue, or nonzero to indicate it should fail. The Fuselast encapsulates the 'use' operation, but otherwise has the same input and output as of the 'check' operations. All operations are invoked while the working directory of check_use is that of the atom that is currently being processed. Finally, the return value of check_use is the return value of the last operation that has failed, or that of Fuselast if all other operations succeeded.
With this design it is trivial to solve e.g., the race in Table 1. The garbage collector defines Fchkdir and Fchklast to always return 0, Fchklink to always return -1, and Fuselast to unlink the atom file; thus, any symlink that is encountered along the way would make check_use fail, thereby insuring all deleted files are under the /tmp/ directory, as required. Importantly, it does not matter whether the last (unlinked) atom is juggled by the attacker (symlink/hardlink to some sensitive file), as in this case the outcome would merely be that some link created by an attacker is deleted, a fact that does not affect the target file.
A facility similar to the check_use function suggested above, if made a standard library function, would serve three purposes. First, it will allow programmers and designers to make conscientious decisions regarding the efficiency-safety tradeoff, e.g., between insecurely opening a file with a single open call, or doing it in user-mode, component by component, while enforcing repeated credential checks to avoid TOCTTOU races, or maybe making the effort to develop another alternative. Second, a well-designed check_use facility would encapsulate the execution of vulnerable check-use pairs. When the time comes and e.g. transactional filesystems (or other relevant improvements) are made more prevalent, the internal implementation can be replaced with a more efficient alternative. Thirdly, the inclusion of a check_use routine in the standard API would serve educational purposes, as new programmers get familiar with the API and through it become aware of the TOCTTOU problem.
The POSIX API is broken: Its semantics inherently promote TOCTTOU races between check-use operations and make systems vulnerable to malicious attacks. Existing solutions can help locate these problems, but otherwise relate to future non-prevalent systems, leaving programmers to individually come up with solutions from scratch, to numerous variants of what is provably a hard and elusive problem. We suggest to alleviate the situation by providing programmers with standard generic abstractions that effectively bind check-use pairs into a single pseudo-atomic transaction. We further show that this goal can be obtained, to a large extent, in a portable manner, in user-mode, without changing the kernel.