Should Security Researchers Experiment More and Draw More Inferences?
Two methodological practices are well established in other scientific disciplines yet remain rare in computer-security research: comparative experiments and statistical inferences. Comparative experiments offer the only way to control factors that might vary from one study to the next. Statistical inferences enable a researcher to draw general conclusions from empirical results.
Despite their widespread adoption in other sciences, these practices are applied only haphazardly in security research. Taking keystroke dynamics as a case study, we survey the literature. Of the 80 papers in which these practices would have been appropriate, only 43 (53.75%) performed comparative experiments, and only 6 (7.5%) drew statistical inferences.
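To make the second practice concrete, here is a minimal sketch of a statistical inference, assuming hypothetical evaluation counts for two keystroke-dynamics detectors; the detector names, trial counts, and error counts are illustrative and are not drawn from the surveyed papers.

```python
import math

def wilson_ci(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an observed error proportion."""
    p = errors / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

# Hypothetical results: two keystroke-dynamics detectors evaluated
# on the same 500 authentication attempts (numbers are illustrative).
for name, errors in [("detector A", 40), ("detector B", 25)]:
    lo, hi = wilson_ci(errors, 500)
    print(f"{name}: error rate {errors/500:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Overlapping intervals would caution against concluding that one detector outperforms the other; that kind of guarded general conclusion is exactly what the surveyed papers rarely attempt.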
In disciplines such as medicine, where rigorous methodological standards are required, comparative experiments and statistical inferences save lives and cut costs. We see no reason why security research, another discipline where the stakes are critically high, cannot or should not adopt these practices as well. Failure to take a more scientific approach to security research stalls progress and leaves us vulnerable.
No Plan Survives Contact: Experience with Cybercrime Measurement
An important mode of empirical security research involves analyzing the behavior, capabilities, and motives of adversaries. By definition, such measurements cannot be conducted in controlled settings and require direct "engagement" with adversaries, their infrastructure, or their ecosystem. However, the operational complexities of successfully carrying out such measurements are significant and rarely documented: blacklisting, payment instruments, fraud controls, and contact management all represent real challenges in such studies. In this paper, we document our experiences conducting such measurements over five years (covering a range of distinct studies) and distill effective operational practices for others who might conduct similar experiments in the future.
Salting Public Traces with Attack Traffic to Test Flow Classifiers
We consider the problem of using flow-level data to detect botnet command and control (C&C) activity. We find that current approaches do not consider timing-based calibration of the C&C traffic traces prior to using this traffic to salt a background traffic trace. Thus, timing-based features of the C&C traffic may be artificially distinctive, potentially leading to (unrealistically) optimistic flow-classification results. In this paper, we show that the round-trip times (RTTs) of the C&C traffic are significantly smaller than those of the background traffic. We present a method to calibrate the timing-based features of the simulated botnet traffic by estimating eligible RTT samples from the background traffic. We then salt the C&C traffic and design flow classifiers under four scenarios: with calibrated timing-based features of the C&C traffic, without calibration, without timing-based features at all, and with C&C traffic calibrated only in the test set. In the flow classifier, we strive to avoid features that are readily susceptible to obfuscation or tampering, such as port numbers or protocol-specific information in the payload header. We discuss results for several supervised classifiers, evaluating botnet C&C traffic precision, recall, and overall classification accuracy. Our experiments reveal to what extent the presence of timing artifacts in botnet traces leads to changes in classifier results.
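The following is a rough sketch of the calibration idea under assumed inputs: each C&C flow's RTT is replaced by a sample drawn from the background trace's empirical RTT distribution, and the flow's packet timestamps are rescaled to match. The flow record format ("rtt", "timestamps") is hypothetical, and the paper's actual procedure for selecting eligible RTT samples may differ.

```python
import random

def calibrate_rtts(cc_flows, background_rtts, seed=0):
    """Rescale each C&C flow's packet timestamps so that its round-trip
    time matches an RTT drawn from the background traffic's empirical
    RTT distribution (a rough stand-in for the paper's calibration).

    cc_flows: list of dicts with 'rtt' (seconds) and 'timestamps'
              (per-packet times relative to the flow's first packet).
    background_rtts: RTT samples measured from the background trace.
    """
    rng = random.Random(seed)
    calibrated = []
    for flow in cc_flows:
        target = rng.choice(background_rtts)   # eligible background RTT
        scale = target / flow["rtt"]           # stretch/compress timing
        calibrated.append({
            "rtt": target,
            "timestamps": [t * scale for t in flow["timestamps"]],
        })
    return calibrated

# Illustrative use: C&C flows captured in a lab have artificially
# small RTTs compared with wide-area background traffic.
cc = [{"rtt": 0.002, "timestamps": [0.0, 0.002, 0.004, 0.010]}]
bg_rtts = [0.045, 0.080, 0.120, 0.033]
print(calibrate_rtts(cc, bg_rtts))
```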
Beyond Simulation: Large-Scale Distributed Emulation of P2P Protocols
This paper presents the design and implementation of a scalable framework for evaluating peer-to-peer protocols. Unlike simulation-based systems, emulation-based systems enable the experimenter to obtain data that directly reflects the concrete implementation in much greater detail. This paper argues that emulation is a better model for experiments with peer-to-peer protocols, since it provides scalability and high flexibility while eliminating the cost of moving from experimentation to deployment. We discuss our experience with large-scale emulation using the GNUnet peer-to-peer framework and provide experimental results to support these claims.
Automating Network Monitoring on Experimental Testbeds
Experimental testbeds have grown rapidly and remain in strong demand among researchers, yet their power can be further increased by providing additional tools that help experimenters instrument their experiments. With improved instrumentation support, experimenters can deepen their understanding of experiment operation and more easily generate high-quality datasets to share with the community.
We introduce Netflowize, a prototype tool that automatically deploys an instrumentation overlay on an existing testbed experiment. Netflowize modifies instantiated experiments to collect experiment-wide flow statistics, with the resources consumed by the flow-collection process specified by the experimenter. NetFlow records are widely used by the networking and security research communities for tasks ranging from traffic engineering to detecting anomalous behaviors associated with zero-day attacks. We discuss the tool's design and implementation, present usage examples, and highlight the many challenges of auto-deploying an experiment-wide monitoring infrastructure.
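For readers unfamiliar with flow records, the toy sketch below aggregates packets into 5-tuple flow records of the general kind such an overlay collects; it is illustrative only and reflects neither Netflowize's implementation nor the NetFlow wire format.

```python
from collections import defaultdict

def aggregate_flows(packets):
    """Group packets into NetFlow-style flow records keyed by the
    classic 5-tuple; a toy stand-in for what an experiment-wide
    collection overlay would gather (not Netflowize itself).

    packets: iterable of (ts, src, dst, sport, dport, proto, nbytes) tuples.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0,
                                 "first": None, "last": None})
    for ts, src, dst, sport, dport, proto, nbytes in packets:
        rec = flows[(src, dst, sport, dport, proto)]
        rec["packets"] += 1
        rec["bytes"] += nbytes
        rec["first"] = ts if rec["first"] is None else rec["first"]
        rec["last"] = ts
    return flows

pkts = [(0.00, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 60),
        (0.10, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 1500)]
for key, rec in aggregate_flows(pkts).items():
    print(key, rec)
```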
Challenges in Experimenting with Botnet Detection Systems
In this paper, we examine the challenges faced when evaluating botnet detection systems. Many of these challenges stem from difficulties in obtaining and sharing diverse sets of real network traces, as well as in establishing botnet ground truth for such traces. On the one hand, there are good reasons, such as privacy concerns, why network traces should not be shared freely; on the other hand, the resulting data scarcity complicates quantitative comparison with other work and hinders independently repeatable experiments.
These challenges are similar to those faced by researchers studying large-scale distributed systems only a few years ago, many of which were overcome by collaborating to create a global testbed, PlanetLab. We speculate that a similar system for botnet detection research could help overcome the challenges in this domain, and we briefly discuss the associated research directions.
ExperimenTor: A Testbed for Safe and Realistic Tor Experimentation
Tor is one of the most widely used privacy-enhancing technologies for achieving online anonymity and resisting censorship. It is also an evolving research network on which investigators perform experiments to improve the network's resilience to attacks and enhance its performance. Existing methods for studying Tor include analytical modeling, simulation, small-scale network emulation, small-scale PlanetLab deployments, and measurement and analysis of the live Tor network. Despite the growing body of work concerning Tor, there is no widely accepted methodology for conducting Tor research in a manner that preserves realism while protecting live users' privacy. In an effort to propose a standard, rigorous experimental framework for conducting Tor research in a way that ensures safety and realism, we present the design of ExperimenTor, a large-scale Tor network emulation toolkit and testbed. We also report our early experiences with prototype testbeds currently deployed at four research institutions.
On the Design and Execution of Cyber-Security User Studies: Methodology, Challenges, and Lessons Learned
Real-world data collection poses an important challenge in the security field, and collecting insider and masquerader attack data poses an even greater one. Very few organizations acknowledge such breaches, because of liability concerns and potential implications for their market value. This has led to a scarcity of real-world data sets that could be used to study insider and masquerader attacks. Moreover, user studies conducted to collect such data often lack rigor in their design and execution. In this paper, we present the methodology we followed to conduct a user study and build a data set for evaluating masquerade-attack detection techniques. We discuss the design, technical, and procedural challenges encountered during our masquerade data-gathering project, and share some of the lessons learned from this several-year effort.
Experimental Challenges in Cyber Security: A Story of Provenance and Lineage for Malware
Rigorous experiments and empirical studies hold the promise of empowering researchers and practitioners to develop better approaches for cyber security. For example, understanding the provenance and lineage of polymorphic malware strains can lead to new techniques for detecting and classifying unknown attacks. Unfortunately, many challenges stand in the way: the lack of sufficient field data (e.g., malware samples and contextual information about their impact in the real world), the lack of metadata about the collection process of existing data sets, the lack of ground truth, and the difficulty of developing tools and methods for rigorous data analysis.
As a first step toward rigorous experimental methods, we introduce two techniques for reconstructing the phylogenetic trees and dynamic control-flow graphs of unknown binaries, inspired by research in software evolution, bioinformatics, and time-series analysis. Our approach is based on the observation that the long evolution histories of open-source projects provide an opportunity to create precise models of lineage and provenance, which can be used for detecting and clustering malware as well.
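As one well-known building block for lineage reconstruction, offered here as a minimal sketch rather than the paper's technique, binaries can be clustered by normalized compression distance (NCD), with the greedy merge order tracing a crude phylogenetic tree; the sample "strains" below are fabricated byte strings.

```python
import zlib
from itertools import combinations

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance, a standard similarity proxy
    in malware-phylogeny studies (not necessarily this paper's feature)."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

def single_linkage(samples):
    """Greedy single-linkage clustering of binaries by NCD; the merge
    order sketches a crude phylogenetic tree."""
    clusters = [[name] for name in samples]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: min(ncd(samples[x], samples[y])
                                     for x in clusters[p[0]]
                                     for y in clusters[p[1]]))
        print("merge:", clusters[i], "+", clusters[j])
        clusters[i] += clusters[j]
        del clusters[j]

# Hypothetical "strains": later variants append tweaked payloads.
samples = {"v1": b"mov eax, 1" * 50,
           "v2": b"mov eax, 1" * 50 + b"xor ebx, ebx" * 10,
           "v3": b"push ebp" * 60}
single_linkage(samples)
```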
As a second step, we present experimental methods that combine the use of a representative corpus of malware and contextual information (gathered from end hosts rather than from network traces or honeypots) with sound data collection and analysis techniques. While our experimental methods serve a concrete purpose, understanding lineage and provenance, they also provide a general blueprint for addressing the threats to the validity of cyber security studies.
Active Learning with the CyberCIEGE Video Game
Hands-on exercises promote active learning, in which student experience reinforces material presented in lectures or reading assignments [1]. Drawing the student into a meaningful context where decisions have clear consequences strengthens the learning experience and thus improves the potential for internalization of knowledge. The CyberCIEGE video game was designed to confront students with computer security decision points within an environment that encourages experimentation, failure, and reflection. The game includes over twenty scenarios that address a range of computer and network security concepts. CyberCIEGE is extensible through a scenario development language that allows instructors to create and customize game scenarios. The Naval Postgraduate School uses the game in its Introduction to Computer Security course, and it has been used by hundreds of educational institutions worldwide. The game's tools allow ongoing experimentation with the student's learning experience. Student assessment is facilitated by log generation, collection, and analysis; these logs help the game's developers identify areas within scenarios that may be confusing or may require additional player feedback. Ongoing development focuses on adapting the game and its student assessment functions for deployment in a broader range of formal education environments.
Investigating Energy and Security Trade-offs in the Classroom with the Atom LEAP Testbed
We recently used the Atom LEAP as the foundation for CS 188, an undergraduate research seminar investigating potential trade-offs between security and energy consumption in a hypothetical battery-powered tablet device. Twenty-three students, in five groups, researched the energy costs of full-disk encryption, network cryptography, and sandboxing techniques, as well as the potential savings from two concepts: offloading security computation, and enabling user-level applications to modulate their security behavior based on battery capacity and environmental security. The Atom LEAP is a powerful, self-contained energy measurement platform that can generate 10,000 component-level power samples per second at runtime. It synchronizes individual samples to the time stamp counter of the Intel Atom CPU, allowing us to measure small code segments in the kernel or in user space. The success of CS 188 was possible because of the Atom LEAP's unique capabilities and ease of use. Following the success of the class, we are working to improve the hardware and software tools, in the hope that the Atom LEAP might someday become a widespread tool for energy research and education.
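As a minimal post-processing sketch of how TSC-aligned power samples can be attributed to a code segment, the following integrates power over a TSC interval; the (tsc, watts) record format, the 1.6 GHz clock rate, and the sample data are assumptions for illustration, not the Atom LEAP's actual output format.

```python
def segment_energy(samples, tsc_start, tsc_end, tsc_hz):
    """Integrate power over the TSC interval covering one code segment.

    samples: list of (tsc, watts) pairs, assumed sorted by tsc
             (a hypothetical record format, not the Atom LEAP's).
    Returns energy in joules via trapezoidal integration.
    """
    window = [(t, w) for t, w in samples if tsc_start <= t <= tsc_end]
    joules = 0.0
    for (t0, w0), (t1, w1) in zip(window, window[1:]):
        joules += (w0 + w1) / 2 * (t1 - t0) / tsc_hz
    return joules

# Illustrative: 10 kHz sampling on a 1.6 GHz Atom means roughly one
# sample every 160,000 TSC ticks.
samples = [(i * 160_000, 2.0 + 0.1 * (i % 3)) for i in range(100)]
print(segment_energy(samples, 1_000_000, 5_000_000, 1.6e9))
```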
Experiences in Cyber Security Education: The MIT Lincoln Laboratory Capture-the-Flag Exercise
Many popular and well-established cyber security Capture the Flag (CTF) exercises are held each year in a variety of settings, including universities and semi-professional security conferences. CTF formats also vary greatly, ranging from linear puzzle-like challenges to team-based offensive and defensive free-for-all hacking competitions. While these events are exciting and important as contests of skill, they offer limited educational opportunities. In particular, since participation requires considerable a priori domain knowledge and practical computer security expertise, the majority of typical computer science students are excluded from taking part. Our goal in designing and running the MIT/LL CTF was to make the experience accessible to a wider community by providing an environment that would not only test and challenge the computer security skills of the participants, but also educate and prepare those without extensive prior expertise. This paper describes our experience in designing, organizing, and running an education-focused CTF, and discusses our teaching methods, game design, scoring measures, logged data, and lessons learned.