Prior to website authentication tools gaining popularity, Whalen and Inkpen conducted a study to evaluate web browser security indicators [15]. They collected data on which indicators users considered when evaluating a webpage’s security by asking participants to perform a set of tasks while focusing on each website’s security. Most participants checked for either the lock icon or https in the URL bar, but few checked for or understood certificates. Similarly, in an effort to understand why phishing attacks are successful, Dhamija et al. measured how users evaluate possible phishing websites [6]. Participants were presented with a series of websites and asked to determine whether each site was real or fake. Of the participants, 23% based their decision solely on indicators they found in the webpage content. Some participants looked for the lock icon but mistook lock images in the webpage content for trusted security indicators.
Web Wallet is an anti-phishing tool that alters how passwords are entered [16] by providing a dedicated interface for entering sensitive information, separate from the web form provided by the website. It helps the user by removing the guesswork of which websites have been visited in the past. A usability study showed Web Wallet was effective at helping participants identify the real website, but participants were easily tricked by spoofs of the interface itself. Wu et al. also evaluated the usability of anti-phishing toolbars to assess whether they assisted users in identifying phishing websites [17]. The results of the usability study indicate the toolbars are ineffective against well-designed spoofs. Another study evaluating website authentication was Jackson et al.’s evaluation of whether browser indicators for extended validation (EV) certificates assisted users in identifying phishing attacks [11]. The results showed that new indicators, such as a green URL bar for an EV certificate, did not offer an advantage over the existing indicators.
Schechter et al. [13] conducted an in-lab study to evaluate a website authentication technology in which each user has a personalized image. Participants were asked to perform a series of online banking tasks while security indicators were gradually removed. Their results show that participants fail to recognize the absence of security indicators, such as the SSL lock and HTTPS, and will enter their password even in the absence of their personalized image.
Usability study design is a well-studied area [10]; however, designing security usability studies creates additional challenges. One issue is how to design a study in which the test administrators attack the participants [3]. More recently, usability studies have been designed to evaluate methods of conducting security usability studies themselves. For example, Schechter et al. conducted a between-subjects usability study to measure the effect of asking a participant to play a role and use fake credentials rather than their personal information [13]. They found that participants who used their real data acted more securely during the tasks. To help usability study designers, SOUPS has made kits from the papers in its proceedings available [14]. The kits provide usability study material, but the material is fairly specific and reusing it would require a number of changes.
In RUST, usability is measured by the technology’s spoofability, learnability, and acceptability. Spoofability is an attacker’s ability to trick the participant into entering personal information on an illegitimate website. Learnability is the user’s ability to correctly use the technology with and without instruction. Acceptability is the user’s reaction to the technology; if users do not understand why a security process is necessary, they will find ways to break the process [1].
We chose an in-lab study as the method of evaluation so we could attack our participants and measure spoofability without raising ethical concerns [12]. Because we wanted to observe how participants behave under attack, we did not disclose the purpose of the study beforehand; doing so would place an unrealistic focus on security. We supplied participants with credentials both to eliminate privacy concerns and because users do not already have the credentials needed for the novel technologies we were testing. We asked participants to play a role during the session to justify the use of fake credentials and to motivate them to act securely.
To evaluate spoofability, learnability, and acceptability, we asked participants to complete tasks at real and spoofed banking websites. Spoofability is measured by the number of successful attacks in a session. To evaluate learnability, four tasks are given before the participant is provided with instructions in the fifth task. This provides the chance to gather data on the technology’s ease of use before the participant reads any documentation. The acceptability of the technology is based purely on a participant’s subjective opinion and cannot be measured through direct observation. Instead, we collected feedback through questionnaires, using Likert-scale questions to classify participants’ reactions and open-ended questions to capture their thoughts during the session.
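As a concrete illustration, the spoofability count can be derived mechanically from per-task session records. The sketch below assumes hypothetical field names (is_spoof, credentials_entered) rather than the actual RUST log schema.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    task_id: int
    is_spoof: bool             # task linked the participant to a spoofed site
    credentials_entered: bool  # participant submitted credentials on that site

def successful_attacks(results: list[TaskResult]) -> int:
    """Count spoof tasks in which the participant entered credentials,
    i.e., successful attacks in the session."""
    return sum(1 for r in results if r.is_spoof and r.credentials_entered)

# Shortened example session: three spoof tasks, two of which succeeded.
session = [
    TaskResult(1, False, True),
    TaskResult(2, True, True),
    TaskResult(3, True, False),
    TaskResult(4, True, True),
]
assert successful_attacks(session) == 2
```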
We designed the study as a within-subject study, in which each participant is given the same set of tasks under the same conditions. We collected session data through the test harness and through self-reported feedback. Before beginning the study, we gave each participant a demographic survey with questions to gauge their experience with web browsing. We also gave them copies of the study instructions, the role they were asked to play, and personal information to accompany the role. The instructions state that the goal of the study is to improve online banking. We asked participants to imagine that they have an uncle, John, who is in the hospital for an unexpected extended period of time and needs someone to assist in managing his finances. In addition, we asked participants to act normally and to treat their uncle’s information as they would their own.
During a session, we sent the participant eight emails, each of which contains a task and a link to the website where the task should be completed. Four of the emails are phishing attacks and direct the participant to an illegitimate site; the other four direct the participant to a real financial institution’s website. Some of the emails are requests from Uncle John for a specific action to be taken, and one is an email from the bank introducing the new technology and providing basic instructions. The first task directs the participant to the real site, allowing them to experience the technology working properly before an attempted spoofing attack. Between tasks, we asked participants to comment on their experience if they completed the task or, if they decided not to complete it, to explain why. After the tasks were completed, we asked participants to express their opinion of the technology in a post-study questionnaire.
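The email sequence lends itself to a small, declarative schedule. The sketch below uses placeholder senders, subjects, and URLs; the actual study material is not reproduced here.

```python
# Illustrative schedule for one session: eight task emails, four legitimate
# and four phishing. All values are hypothetical placeholders.
TASKS = [
    {"order": 1, "sender": "bank",       "spoof": False,
     "subject": "Getting started with your new login technology",
     "link": "https://bank.example.com/login"},
    {"order": 2, "sender": "Uncle John", "spoof": True,
     "subject": "Please pay my electric bill",
     "link": "https://bank-login.example.net/login"},
    # ... six more entries follow the same shape; the first task always
    # points at the real site so the participant sees the technology
    # working properly before any attack.
]
```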
The test harness component of the RUST testbed creates a transparent testing environment that can be easily configured for different technologies, simplifying the process of conducting usability evaluations. Proxies and logging tools collect data on the participant’s actions while serving the necessary study webpages. RUST collects data by monitoring the requested URLs and tracking the order in which pages are accessed, which indicates whether a phishing attack was successful. Additionally, the test harness monitors the time spent on each webpage. Time monitoring is important for a technology with multiple steps or pages because the time spent on a page may indicate problems in the login sequence. To monitor the order in which webpages are visited and the time spent on each page, a JavaScript beacon is inserted into each test webpage the participant may visit. The test harness also includes scripts to convert the logging output into a more manageable format. A Python-based script sends MIME-based emails during the test session, and a simple shell script included in the testbed sends specific emails to participants at fixed intervals during the session. The complete details of the design are contained in [4].
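To make the emailing step concrete, the following is a minimal sketch of a MIME email sender of the kind the testbed uses, combined with fixed-interval delivery analogous to the shell script. The SMTP host, addresses, and message fields are assumptions; the actual testbed scripts may differ.

```python
import smtplib
import time
from email.mime.text import MIMEText

def send_task_email(smtp_host: str, sender: str, recipient: str,
                    subject: str, body: str) -> None:
    # Build a MIME message; task emails carry an HTML link to the test site.
    msg = MIMEText(body, "html")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    with smtplib.SMTP(smtp_host) as server:
        server.sendmail(sender, [recipient], msg.as_string())

def run_session(tasks, smtp_host, sender, recipient, interval_secs=300):
    # Fixed-interval delivery: send each task email, then wait before
    # sending the next, mirroring the testbed's timed shell script.
    for task in tasks:
        body = f'<p>{task["subject"]}</p><a href="{task["link"]}">Log in</a>'
        send_task_email(smtp_host, sender, recipient, task["subject"], body)
        time.sleep(interval_secs)
```

Driving this loop from a declarative task schedule, such as the one sketched earlier, keeps the session configuration separate from the delivery mechanics.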