Visualizing NetFlows for Security at Line Speed: The SIFT Tool Suite

William Yurcik - National Center for Supercomputing Applications (NCSA)
Pp. 169-176 of the Proceedings of LISA '05: Nineteenth Systems Administration Conference,
(San Diego, CA: USENIX Association, December 2005).

Abstract

The first step in improving Internet security is measurement - security events must be made visible. The irony in making this happen is that there is no lack of security measurement data; in fact, quite the opposite is true. However, making security manifest faces a major challenge: the large volume and multi-dimensional nature of security data typically obscure valuable security events. NCSA has developed a suite of tools that solves this problem and is making this software available to the Internet community.

We present two visualization tools, [Note 1] (1) NVisionIP and (2) VisFlowConnect-IP. Both tools were developed from system administrator requirements, their designs have been peer-reviewed in security research forums, and usability testing is in progress. Both present large-volume, complex data transparently to system administrators in simple, intuitive visual interfaces that support human cognitive processes. NVisionIP visually represents the state of all IP addresses on large networks on a single screen window (we use a Class B address space as the default) with capabilities to filter and drill-down to subnets and individual machines for details-on-demand. VisFlowConnect-IP visually represents flows between internal network IP hosts and the Internet, showing who is connecting with whom, with capabilities to filter and drill-down to subnets and individual machines for details-on-demand. NVisionIP and VisFlowConnect-IP can be used individually or in unison for correlating events. This work is distinguished from others in that these are the first Internet security visualization tools to be freely available on the Internet and deployed in large production environments.

Introduction

Organizations use computer network infrastructures that hold a vast amount of information for system administrators and security engineers. There are typical logs common to most computer networks, but the systems are often large and dynamic, making it difficult to extract knowledge from the sea of information. Individually, each system log can be massive, causing operator overload. When overload occurs, security events can slide by unnoticed. Overload can also cause operators to disregard alarms due to high false positive rates. Even in homogeneous infrastructures, solutions from a single vendor fail to scale to medium or large networks. However, the problem is compounded because most organizations have network infrastructures from multiple vendors.

We have developed Security Incident Fusion Tools (SIFT) [9], an integrated suite of tools for evaluating the security of an entire computer network on a single screen. We address the need to discover security incidents that currently go undetected by security operations systems. Specifically, two SIFT tools, (1) NVisionIP and (2) VisFlowConnect-IP, leverage human visual cognitive abilities to process log data into knowledge for situational awareness of network security. It is estimated that human beings can visually process a screen of information at 150 Mbits per second [10], with the ability to discriminate relatively minor shifts in color, shape, and motion. When network data is presented visually, it can be scanned quickly, patterns in complex data rise to the surface, and inferences become intuitive. Once a security professional becomes familiar with the normal appearance of the network being monitored, it is much easier to spot attacks, including new so-called "zero-day attacks." The tools are designed to give security engineers situational awareness of an entire network in order to help them determine when a network is under attack, what is being attacked, and what form the attack is taking.

The remainder of the paper is organized as follows: The next section discusses NetFlows data management and introduces the first tool in the SIFT suite: CANINE. Subsequently we present the two SIFT visualization tools - NVisionIP and VisFlowConnect-IP - and close with a summary and ongoing future work.

Data Management

NetFlows Source Data

While this paper focuses on visualization, we would be remiss if we did not address data management since it is arguably the greatest obstacle in realizing any scalable visualization system. We address the challenge of processing high-bandwidth data streams by instrumenting networks with distributed NetFlows sensors and then combining this sensor data into a unified format. While in the recent past NetFlows were solely router-based, PC-based NetFlow sensors (Argus) make this a feasible solution for most organizations. The first tool in our suite is a NetFlows converter/anonymizer called CANINE which can handle different NetFlows formats so independent implementations can be interoperable with SIFT visualization tools. NetFlow logs have proven to be the appropriate granularity to process heavily loaded networks and high bandwidth connections (Gb/s) in near-real-time (five minute monitoring windows).

A network flow is defined as a sequence of packets that are transferred between two endpoints within a certain time interval. The endpoints are identified at the network layer by IP addresses and at the transport layer by port numbers. In addition to data format differences, there are other interoperability problems in practical NetFlows implementations:

Cisco NetFlows are defined as unidirectional and generated through intelligent flow cache management, which contains a set of specialized algorithms [4].
Argus NetFlows are defined as bidirectional containing two distinct sub-flows, one in each direction [2].
Cisco and Argus NetFlow formats have different fields (e.g., flags etc.) [3, 5].

For a more detailed comparison between different NetFlows formats see [15].
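The unidirectional/bidirectional difference described above can be illustrated with a minimal sketch. It is not CANINE's conversion logic; the `Flow` record and its field names are hypothetical, the time-interval part of the flow definition is ignored, and "forward" simply means the direction that starts at the endpoint which sorts first.

```python
from collections import namedtuple

# Hypothetical minimal record; real Cisco and Argus records carry many more fields.
Flow = namedtuple("Flow", "src_ip src_port dst_ip dst_port proto bytes")

def pair_unidirectional(flows):
    """Group unidirectional flows into bidirectional entries keyed by endpoint pair."""
    paired = {}
    for f in flows:
        a = (f.src_ip, f.src_port)
        b = (f.dst_ip, f.dst_port)
        # Order the two endpoints so that both directions share one key.
        key = (min(a, b), max(a, b), f.proto)
        entry = paired.setdefault(key, {"forward_bytes": 0, "reverse_bytes": 0})
        if a <= b:
            entry["forward_bytes"] += f.bytes
        else:
            entry["reverse_bytes"] += f.bytes
    return paired

flows = [
    Flow("10.0.0.5", 40000, "192.0.2.9", 80, "tcp", 1200),
    Flow("192.0.2.9", 80, "10.0.0.5", 40000, "tcp", 56000),
]
paired = pair_unidirectional(flows)
```

Here the two unidirectional records collapse into a single entry with one sub-flow total per direction, which is the shape a bidirectional format would carry.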

CANINE

With the increased use of NetFlows for security monitoring and the fact that NetFlows come in different and incompatible formats, we have developed CANINE (Converter and ANonymizer for Investigating Netflow Events) [7, 8] which can be downloaded from https://security.ncsa.uiuc.edu/distribution/CanineDownLoad.html. CANINE allows tools designed for a specific type of NetFlows to be interoperable with any NetFlow format. CANINE consists of two main modules: (1) the CANINE GUI and (2) the conversion/anonymization engines. For the purposes of this paper we will only discuss the conversion engine (for information about the anonymization engine see [7, 8]). The CANINE GUI accepts user input to identify the NetFlow file for conversion, sends the request to the processing engine, which performs the conversion to the newly specified output file, and lastly summarizes the results of the performed actions in a pop-up window. At present CANINE supports conversion to/from Cisco version 5/7, Argus, NFdump, and our own NCSA internal NetFlows format. Formats planned for future inclusion in CANINE are Cisco version 9 and the forthcoming IETF IPFIX standard.

Network Instrumentation

The development of high-speed network infrastructure has brought with it the need for high-speed security - security at line speed. For current 2005 networks this is 4 GB/s at the edge and higher within the core [15]. Unfortunately, high network bandwidths present special problems for security monitoring.

The first challenge is the streaming nature of security sensors. It is important to note that security sensors generate streaming data and not batch log files. Since streaming analysis is an open research question, security systems typically create batch log files by collecting streaming data over defined time periods. However, depending on the network size and traffic volume these log files can become large and difficult to handle. Tuning is required to determine the best time period of analysis to match the preferred log size to the network size and traffic volume. Creating logs over longer time intervals may risk losing NetFlows records upon high transmission rates from overflow or blocking.
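Turning a stream into fixed-period batches, as described above, might be sketched as follows. The five-minute default mirrors the monitoring window mentioned earlier; the `(timestamp, payload)` record shape and the function names are assumptions for illustration, not part of any SIFT tool.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # five-minute batches; tune to network size and traffic volume

def window_start(timestamp, window=WINDOW_SECONDS):
    """Return the start time of the window containing timestamp (epoch seconds)."""
    return timestamp - (timestamp % window)

def batch_by_window(records, window=WINDOW_SECONDS):
    """Group (timestamp, payload) records into per-window batches."""
    batches = defaultdict(list)
    for timestamp, payload in records:
        batches[window_start(timestamp, window)].append(payload)
    return dict(batches)

records = [(0, "a"), (299, "b"), (300, "c"), (905, "d")]
batches = batch_by_window(records)
```

Shrinking `window` produces more, smaller batches, which is the tuning knob the paragraph above refers to.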

The second challenge is the observation point. Security cannot be measured where it is not observed, thus sensors need to be placed to cover the entire network space. Typical deployment for NetFlows includes the border router for Internet traffic and Argus sensors for internal network observation. There are blind spots from VLANs and switched networks which do not leave IP (network layer) traces - future sensors based on S-Flows are being developed to address this gap.

The third challenge is CPU speed to generate and process NetFlows at line speed. As routers have increased speed, monitoring techniques have shifted to sampling NetFlows. While sampling is statistically sufficient for network planning, it is not a good idea for security analysis. NetFlow records are created by sampling packets (not flows), letting the majority of the packets go unnoticed, which may lead to missing important security events. A possible justification for sampling is that an attack may have high traffic volume, at least part of which may be captured with high probability (such as a high-volume denial-of-service attack or indiscriminate scanning by propagating worms and viruses). The approach we recommend for security at line speed is parallel processing of NetFlows in a distributed manner. Instead of instrumenting only the high-speed border router, which may only be able to generate sampled NetFlows, instrument all the routers feeding into the border router. This technique effectively relieves the load on each flow collector so that it will not be oversubscribed. The drawback is that multiple flow collectors are required and NetFlows records from different routers must be merged to eliminate duplicate flows (the same flow that passes through multiple routers).
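The merge step can be sketched as below. This is a simplification under stated assumptions: records are plain dictionaries with hypothetical field names, and two records count as the same flow only when their 5-tuple and start time match exactly. In practice router clocks differ, so a real merge would likely need a tolerance on the start time.

```python
def merge_collectors(*collector_outputs):
    """Merge records from several flow collectors, keeping one copy of each flow."""
    seen = set()
    merged = []
    for records in collector_outputs:
        for rec in records:
            # Hypothetical flow identity: 5-tuple plus flow start time.
            key = (rec["src"], rec["sport"], rec["dst"], rec["dport"],
                   rec["proto"], rec["start"])
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

web = {"src": "172.16.1.10", "sport": 40000, "dst": "198.51.100.7",
       "dport": 80, "proto": "tcp", "start": 1000, "bytes": 500}
dns = {"src": "172.16.2.20", "sport": 53000, "dst": "198.51.100.53",
       "dport": 53, "proto": "udp", "start": 1002, "bytes": 90}

router_a = [web]
router_b = [dict(web), dns]  # the web flow crossed both routers
merged = merge_collectors(router_a, router_b)
```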

NetFlows Visualization Tools

Design By Requirements

We firmly believe that the first step to improve Internet security is measurement. Measurement allows one to accurately assess the degree of the problem at a specified time, and further measurements then track whether solutions are having the desired effect. However, not all measurements are equal: users have a mental model based on experience, and tools should be designed to enhance and augment these mental models for the most effective results [16, 17, 18].

For this work, we did two important things often neglected in security tool design: (1) we took time to work with security engineers in their operational production environment in order to learn their mental models and thus tool requirements, and (2) we were able to design new visualization models from scratch to meet these requirements without having to incorporate legacy constructs. The results have been very satisfying in that most security engineers who view our visualization tools for the first time immediately begin inferring hypotheses based on the content displayed.

Our requirements analysis produced two primary findings. First, security engineers need to answer questions such as these posed by upper management: What is the state of the network? Is the network being attacked? How is the network being attacked? Who is attacking the network? While these may appear to be basic questions, the answers are not immediately available using current security tools, and when available after much analysis the answers are complex. Visualization provides a rich representation to help answer these questions concisely.

Second, security engineers have mental models based on their experience with the network infrastructure, knowledge of people within the organization, and security expertise learned over many years. While most tenets of information visualization design are useful in designing within our specific security domain, we did find that leveraging the mental model of security engineers caused us to break some of these consensus rules (after much consternation). Instances when the security engineer mental model overrides information visualization design best practices are highlighted in our discussion of each of our visualization tools.

NVisionIP

Our first and most mature security visualization tool is NVisionIP [1, 6] which we designed to answer the question: What is the state of the network?

Figure 1 shows the Galaxy view of NVisionIP which can be downloaded from https://security.ncsa.uiuc.edu/distribution/NVisionIPDownLoad.html. The Galaxy view represents an entire Class B IP address space (in this single window!) as a matrix with subnets along the horizontal axis and hosts along the vertical axis. Each IP address is represented as a dot (actually four pixels) and the state of each IP address is represented with color or shape as determined by the user in the color and shape legend. Two magnification options are available to see the IP addresses: linear and fisheye.
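The matrix layout can be sketched as a mapping from an address to a (subnet, host) cell. This is an illustration only, assuming that the third octet selects the subnet column and the fourth octet the host row; the function name and the example network `172.16.0.0/16` are placeholders, not NVisionIP internals.

```python
import ipaddress

def galaxy_cell(ip, network):
    """Map an address inside a /16 network to a (subnet, host) matrix cell.

    Returns None when the network is not a /16 or the address lies outside it.
    """
    net = ipaddress.ip_network(network)
    addr = ipaddress.ip_address(ip)
    if net.prefixlen != 16 or addr not in net:
        return None
    offset = int(addr) - int(net.network_address)
    # High byte of the offset is the subnet (horizontal axis),
    # low byte is the host (vertical axis).
    return offset >> 8, offset & 0xFF

cell = galaxy_cell("172.16.3.77", "172.16.0.0/16")
outside = galaxy_cell("10.1.2.3", "172.16.0.0/16")
```

A 256 x 256 grid of such cells covers the whole Class B space, which is why the full address space fits in one window.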

NVisionIP allows the security engineer to load one (or multiple) NetFlow files and perform visual queries. NVisionIP has taken all the possible NetFlow database query combinations and hard coded them into the tool as drop-down and point-and-click commands. A user would typically start with primary queries such as how many bytes per IP address or how many connections per IP address. A filter then allows the user to select secondary queries to view only source or destination traffic, different protocols (IP, UDP), and different ports (destination or source ports, specific ports or collections of ports) or any combination thereof.
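The primary and secondary queries described above amount to a filtered aggregation over flow records. The sketch below is a rough equivalent in code, not NVisionIP's implementation; the dictionary fields, the function name, and the parameter names are all hypothetical.

```python
from collections import Counter

def per_ip_totals(flows, direction="src", proto=None, dports=None):
    """Return (bytes per IP, connections per IP) after optional filtering.

    direction: "src" or "dst", selects which address of each flow is counted.
    proto:     keep only flows with this protocol, if given.
    dports:    keep only flows whose destination port is in this set, if given.
    """
    byte_totals, conn_totals = Counter(), Counter()
    for f in flows:
        if proto is not None and f["proto"] != proto:
            continue
        if dports is not None and f["dport"] not in dports:
            continue
        ip = f[direction]
        byte_totals[ip] += f["bytes"]
        conn_totals[ip] += 1
    return byte_totals, conn_totals

flows = [
    {"src": "172.16.1.10", "dst": "198.51.100.7", "proto": "tcp", "dport": 80, "bytes": 500},
    {"src": "172.16.1.10", "dst": "198.51.100.8", "proto": "tcp", "dport": 443, "bytes": 700},
    {"src": "172.16.2.20", "dst": "198.51.100.7", "proto": "udp", "dport": 53, "bytes": 90},
]
all_bytes, all_conns = per_ip_totals(flows)
web_bytes, web_conns = per_ip_totals(flows, proto="tcp", dports={80})
```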

At the Galaxy view, NVisionIP can identify large or small levels of traffic as measured in bytes (based on expectations for the class of machine - laptop or server). This may indicate malware is being served to/from a machine or the machine is involved in a denial-of-service event. Worm and virus scans as indicated by number of connections can also be easily detected based on variance from expected levels.

There are aspects of the Galaxy view design that are contrary to information visualization best practices: the IP address space is laid out logically in matrix space without organizing IP addresses into known classes or enlarging the part of the IP address space with more activity (thus patches of white space or inactive IP address space appears). This design was intentional to retain security engineer knowledge of the IP address space based on logical numbering for subnets/hosts and mental mapping between logical addresses (e.g., cluster compute nodes with contiguous IP addresses) and physical locations (IP subnets are usually physically located in the same area such as a building floor etc.). The white space of inactive IP addresses actually has other advantages and is not wasted space - any traffic activity shown there is anomalous (unallocated address space that should have no legitimate traffic).
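The observation that any activity in unallocated space is anomalous can be expressed as a simple membership check. The sketch assumes the operator maintains a list of allocated subnets; that list, the function name, and the sample addresses are placeholders.

```python
import ipaddress

def traffic_in_unallocated_space(flow_ips, allocated_networks):
    """Return the addresses from flow_ips that fall outside every allocated network."""
    nets = [ipaddress.ip_network(n) for n in allocated_networks]
    return [
        ip for ip in flow_ips
        if not any(ipaddress.ip_address(ip) in net for net in nets)
    ]

allocated = ["172.16.1.0/24", "172.16.2.0/24"]
seen_ips = ["172.16.1.5", "172.16.200.9", "172.16.2.250"]
suspicious = traffic_in_unallocated_space(seen_ips, allocated)
```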

While an overall view is important, it is of limited use without the ability to drill down to find more detailed information when something interesting is identified. Figure 2 shows the drill-down levels of NVisionIP which are activated with a mouse click and a drag over a region of interest. These levels are the (1) Small Multiple View and (2) Machine View.

The Small Multiple View allows the user to quickly scan and compare traffic activity across subnets on many machines simultaneously. Each machine is a box with two sets of histograms, an upper set of histograms representing traffic on well-known ports and a lower set of histograms representing traffic on ports over 1024. The well-known ports are color-coded in a user legend. The ports over 1024 are ordered from most active to least active (top N ports). Note that no numbers are shown in the Small Multiple View; this view is designed for the user to identify activity of interest and then drill-down for raw data details on-demand.
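The two histogram sets for one machine can be sketched as a split of per-port traffic into well-known ports and a top-N ranking of the rest. The function name and input shape are hypothetical, and treating port 1024 itself as a high port is an assumption made here for the boundary case.

```python
from collections import Counter

def small_multiple_histograms(port_bytes, top_n=3):
    """Split one machine's per-port byte counts into the two histogram sets.

    Returns (well_known, top_high) where well_known maps port -> bytes for
    ports below 1024 and top_high lists the top_n busiest remaining ports.
    """
    well_known = {p: b for p, b in port_bytes.items() if p < 1024}
    high = Counter({p: b for p, b in port_bytes.items() if p >= 1024})
    return well_known, high.most_common(top_n)

port_bytes = {22: 100, 80: 4000, 6667: 900, 31337: 50, 8080: 300, 5000: 10}
well_known, top_high = small_multiple_histograms(port_bytes)
```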

Figure 1: NVisionIP Galaxy view of an entire Class B IP address space.

Figure 2: The three levels of NVisionIP (top to bottom): (1) Machine view, (2) Small multiple view, and (3) Galaxy view.

Figure 3: VisFlowConnect-IP: Main view.

Figure 4: Drill-down layers of VisFlowConnect-IP: Domain View (left) and Internal View (right).

At the Small Multiple View, NVisionIP has been used to quickly identify anomalous services that violate security policy such as unauthorized services (web or mail server) and exposed services that need to be patched or otherwise protected due to vulnerabilities.

If a user sees a machine with interesting traffic activity in the Small Multiple View, they may click on that block to drill-down to the Machine View. The Machine View organizes all the data from a particular machine in multiple tabs - each tab showing aggregate activity in an upper histogram and source/destination directional flows in two lower histograms. Note in the Machine View each histogram is fully labeled with port numbers and traffic level. At the lowest level, the raw NetFlows data for that machine is available for inspection in this Machine View as a tab. At this lowest level Machine View, details of most security events can be revealed.

The overall effect of using NVisionIP (with its interactive drill-down levels) is that relationships between aggregate network activity and individual machine activity can be more easily discovered and comprehended by human operators - providing situational awareness of network system state. Future work is progressing on optimizing Galaxy View animation to show IP address state changes over time as well as a difference view to visually compare current network traffic versus benchmark network traffic.

VisFlowConnect-IP

The second security visualization tool in the SIFT tool suite focuses on answering the question: Who is connecting to whom on the network? This basic question has been attempted in the past with topology-based diagrams based on network infrastructure; however, these results either (1) did not provide information relevant to real-time traffic or (2) were not scalable, since representing network traffic over time produces obscured lines in relatively short order. We solve both of these problems with VisFlowConnect-IP [12, 13, 14] which is available at https://security.ncsa.uiuc.edu/distribution/VisFlowConnectDownLoad.html.

VisFlowConnect-IP is a security visualization tool based on the parallel axes concept drawn from data mining. It is a complementary tool to NVisionIP since it visualizes the same NetFlows source data - the design similarities will become apparent in the following description. VisFlowConnect-IP allows a user to visually assess the connectivity of large and complex networks (in a single window!) by providing a main view of the network with filter and drill-down views that provide more details on-demand. The three views of VisFlowConnect-IP are: (1) Main, (2) Domain, and (3) Internal. The Main View is shown in Figure 3 with the Domain and Internal views shown in Figure 4.

The VisFlowConnect-IP Main View utilizes the parallel axis view with the left-most and right-most vertical axes representing the external domains and the center vertical axis representing host IP addresses within the internal edge network domain (See Figure 3). Lines connecting external domains and internal hosts represent directional data flows, with line darkness being proportional to the logarithm of the volume of data transferred. VisFlowConnect-IP can filter/highlight flows to certain hosts or traffic on specific ports and protocols using a filter drop-down menu and selection boxes on the main view. Ports indicated in the selection boxes are represented in different colors within the network traffic or may be isolated from network traffic for focused analysis. The overall effect is visualization of traffic into-an-edge-network-from-the-Internet and traffic out-from-an-edge-network-to-the-Internet.
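A logarithmic darkness scale of the kind described above could look like the following sketch. The normalization against the largest visible flow and the `+ 1` offset (so that a one-byte flow does not map to log 0) are assumptions for illustration, not the tool's actual scaling.

```python
import math

def line_darkness(volume_bytes, max_volume_bytes):
    """Return a darkness value in [0.0, 1.0] that grows with log(volume)."""
    if volume_bytes <= 0 or max_volume_bytes <= 0:
        return 0.0
    ratio = math.log(volume_bytes + 1) / math.log(max_volume_bytes + 1)
    return min(1.0, ratio)

faint = line_darkness(10, 1_000_000)
strong = line_darkness(100_000, 1_000_000)
full = line_darkness(1_000_000, 1_000_000)
```

The logarithm keeps small flows visible next to very large ones, where a linear scale would render them nearly white.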

Figure 4 shows the two drill-down views within VisFlowConnect-IP. While we would have liked to represent each individual external host IP address connecting into the internal edge network symmetrically on both the left-most and right-most axes, this is not possible due to scalability. Preliminary measurements of NCSA's network showed over 100,000 different IP addresses commonly appeared in the NetFlow files we wished to visualize and this is too many for the vertical line pixel space of a single window without scrolling. Instead we implemented a drill-down Domain View which is invoked by the user clicking on a drop-down menu while having an external domain highlighted on the vertical external domain axis. The resulting Domain View is a mirror image of the Main View except it only shows traffic within the highlighted external network domain to/from the internal edge network. This has turned out to be very valuable since typically hackers "own" entire subnets or even "own" entire network domains so it is common to see malicious activity captured within a Domain View.

Figure 4 also shows the drill-down Internal View which is invoked as a toggle button on the Main View. While monitoring for external Internet hacker activity is sexy, we have found this Internal View very useful since it shows only traffic that both sources and sinks within the internal edge network. There are only two vertical lines in this view: internal edge network IP addresses are ordered symmetrically in a mirror image on the left-most and right-most axes (no middle axis). This Internal View has helped security engineers determine important security events like the initial source of a worm infection which infiltrated the edge network from the inside, and insider attacks from those misusing privileged access.

The VisFlowConnect-IP Main View has a time axis at the bottom which is used to solve the scalability problem we referred to as the major challenge for this tool. The user loads a NetFlow file for visualization and then may select multiple filters to determine how this traffic is to be represented in animation including intensity, byte size, and a sliding time window. The sliding time window provides scalability by only representing traffic within the window and ignoring traffic outside the window. Thus the sliding time window can be adjusted to any size network and any traffic volume - the general rule for clear viewing is the more traffic the smaller the sliding time window. The window size itself is represented to the user by a red box (where the length of the red box is proportional to window size) that travels along the time axis as the traffic is animated (as shown in the Domain and Internal Views within Figure 4).
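The sliding time window can be sketched as a filter that is re-evaluated as the window advances along the time axis. The record shape, function name, and step size are hypothetical; the sketch only shows which flows would be drawn in each animation frame.

```python
def flows_in_window(flows, window_end, window_size):
    """Return the flows whose timestamp lies in (window_end - window_size, window_end]."""
    window_begin = window_end - window_size
    return [f for f in flows if window_begin < f["time"] <= window_end]

flows = [
    {"id": "a", "time": 1},
    {"id": "b", "time": 5},
    {"id": "c", "time": 9},
    {"id": "d", "time": 14},
]

# Advance a 5-second window in 5-second steps; each frame shows only
# the traffic inside the window and ignores everything outside it.
frames = [
    (t, [f["id"] for f in flows_in_window(flows, t, 5)])
    for t in range(5, 20, 5)
]
```

With heavier traffic, a smaller `window_size` keeps the number of flows per frame, and therefore the number of lines drawn, manageable.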

VisFlowConnect-IP has also implemented a filter language using real expressions that is beyond the scope of this paper [12]. With this filter language capability, VisFlowConnect-IP can create mechanisms for storing/retrieving filter profiles. These profiles can store customized filters that remove "uninteresting" information from view, thus leaving only the more security relevant data to be displayed.

Summary

Visualization is the future of security monitoring and NetFlows are the source data for high-speed networks. In this paper we marry security visualization with NetFlows by presenting the SIFT suite of tools along with accompanying techniques for security at line speed. The goal is to enable security engineers to go beyond binary/text command line log file analysis toward real-time network security situational awareness. A growing community of researchers has formed around security visualization; see [11] for more information.

The three specific tools of the SIFT suite presented in this paper (CANINE, NVisionIP, and VisFlowConnect-IP) are available for download at the URLs provided in the text. We are currently conducting usability tests with human subjects to quantify the utility of these tools, and preliminary results from these tests are very promising. We intend to go open source with these tools after the software is stable; at present we are still developing the software, with new versions posted on the corresponding webpages. We enthusiastically invite feedback from users about the use of these tools.

Author Biography

William (Bill) Yurcik is currently Manager, Security R&D and Senior Systems Security Engineer at NCSA. Prior to this he was Head of Security Operations at NCSA, so he has both a theoretical and practical background in computer network security. Prior to joining NCSA he has 12 years of professional experience as a Network Engineer for large networks (Naval Research Laboratory, NASA, Verizon, and MITRE). He is a graduate of Johns Hopkins University (MS Electrical Engineering 1990, MS Computer Science 1987), the University of Maryland (BS Electrical Engineering 1984), and is Ph.D. ABD from the University of Pittsburgh (1994-99). Bill can be reached at byurcik@ncsa.uiuc.edu.

References

[1] Bearavolu, Ratna, Kiran Lakkaraju, and William Yurcik, "NVisionIP: An Animated State Analysis Tool for Visualizing NetFlows," FLOCON, 2005.
[2] Bullard, Carter, Argus, the network Audit Record Generation and Utilization System, https://www.qosient.com/argus/, accessed 26 September, 2005.
[3] Bullard, Carter, Argus Record Format, https://www.qosient.com/argus/argus.5.htm/, accessed 26 September, 2005.
[4] Cisco Systems, Cisco NetFlow Services and Applications White Paper, https://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm, accessed 26 September, 2005.
[5] Cisco Systems, NetFlow Overview Presentation, https://www.cisco.com/application/vnd.mspowerpoint/en/us/guest/tech/tk362/c1482/ccmigration_09186a0080182b50.ppt, accessed 26 September, 2005.
[6] Lakkaraju, Kiran, William Yurcik, Adam J. Lee, Ratna Bearavolu, Yifan Li, and Xiaoxin Yin, "NVisionIP: NetFlow Visualizations of System State for Security Situational Awareness," CCS Workshop on Visualization and Data Mining for Computer Security (VizSEC/DMSEC), 2004.
[7] Li, Yifan, Adam Slagell, Katherine Luo, and William Yurcik, "CANINE: A Combined Converter and Anonymizer Tool for Processing NetFlows for Security," International Conference on Telecommunication Systems - Modeling and Analysis (ICTSM), 2005.
[8] Luo, Katherine, Yifan Li, Adam Slagell, and William Yurcik, "CANINE: A NetFlows Converter/Anonymizer Tool for Format Interoperability and Secure Sharing," FLOCON, 2005.
[9] SIFT Project Webpage, https://www.ncassr.org/projects/sift/, accessed 26 September, 2005.
[10] Tufte, Edward, A One-Day Course: Presenting Data and Information, Madison, WI, (https://www.edwardtufte.com/tufte/courses, accessed 26 September, 2005), August, 2005.
[11] VizSEC Community Homepage, https://www.ncassr.org/projects/sift/vizsec/, accessed 26 September, 2005.
[12] Yin, Xiaoxin, William Yurcik, and Adam Slagell, "VisFlowConnect-IP: An Animated Link Analysis Tool for Visualizing NetFlows," FLOCON, 2005.
[13] Yin, Xiaoxin, William Yurcik, and Adam Slagell, "The Design of VisFlowConnect-IP: a Link Analysis System for IP Security Situational Awareness," Third IEEE International Workshop on Information Assurance (IWIA), 2005.
[14] Yin, Xiaoxin, William Yurcik, Michael Treaster, Yifan Li, and Kiran Lakkaraju, "VisFlowConnect: NetFlow Visualizations of Link Relationships for Security Situational Awareness," CCS Workshop on Visualization and Data Mining for Computer Security (VizSEC/DMSEC), 2004.
[15] Yurcik, William and Yifan Li, "Case Study: Instrumenting a Network for NetFlow Security Visualization Tools," 21st Annual Computer Security Applications Conference (ACSAC), 2005.
[16] Yurcik, William, Kiran Lakkaraju, James Barlow, and Jeff Rosendale, "A Prototype Tool for Visual Data Mining of Network Traffic for Intrusion Detection," Workshop on Data Mining for Computer Security (DMSEC), 2003.
[17] Yurcik, William, James Barlow, and Jeff Rosendale, "Maintaining Perspective on Who Is The Enemy in the Security Systems Administration of Computer Networks," ACM CHI Workshop on System Administrators Are Users, Too: Designing Workspaces for Managing Internet-Scale Systems, 2003.
[18] Yurcik, William, James Barlow, Kiran Lakkaraju, and Mike Haberman, "Two Visual Computer Network Security Monitoring Tools Incorporating Operator Interface Requirements," ACM CHI Workshop on Human-Computer Interaction and Security Systems (HCISEC), 2003.

Footnotes:
Note 1: Funded in part by grants from the Office of Naval Research (ONR) under the auspices of the Technology Research, Education, and Commercialization Center (TRECC) and the National Center for Advanced Secure Systems Research (NCASSR) both established at NCSA/University of Illinois.

Last changed: 11 Nov. 2005 jel