T2 Solid-State Storage: Technology, Design, and Application UPDATED!
Richard Freitas and Larry Chiu, IBM Almaden Research Center
Most system designers dream of replacing slow, mechanical storage
(disk drives) with fast, non-volatile memory. The advent of inexpensive
solid-state disks (SSDs) based on flash memory technology and,
eventually, on storage class memory technology is bringing
this dream closer to reality.
This tutorial will briefly examine the leading solid-state memory
technologies and then focus on the impact the introduction of
such technologies will have on storage systems. It will include
a discussion of SSD design, storage system architecture, applications,
and performance assessment.
Richard Freitas is a Research Staff Member at the IBM Almaden Research Center.
Dr. Freitas received his PhD in EECS from the University of California
at Berkeley in 1976. He then joined IBM at the IBM T.J. Watson
Research Lab. He has held various management and research positions
in architecture and design for storage systems, servers, workstations,
and speech recognition hardware at the IBM Almaden Research Center
and the IBM T.J. Watson Research Center. His current interest lies
in exploring the use of emerging non-volatile solid-state memory
technology in storage systems for commercial and scientific computing.
Larry Chiu is Storage Research Manager and a Senior Technical Staff
Member at the IBM Almaden Research Center. He co-founded the SAN Volume Controller
product, a leading storage virtualization engine which has held
the fastest SPC-1 benchmark record for several years. In 2008,
he led a research team in the US and the UK that demonstrated a
one-million-IOPS storage system using solid-state disks. He is
currently working on expanding solid-state disk use cases in
enterprise systems and software. He has an MS in computer engineering from the
University of Southern California and another MS in
technology commercialization from the University of Texas at Austin.
T3 Storage and Network Deduplication Technologies NEW!
Michael Condict, NetApp
Economic and environmental concerns are currently motivating a push
across the computing industry to do more with less: less energy and
less money. Deduplication of data is one of the most effective tools
to accomplish this. Removing redundant copies of stored data reduces
hardware requirements, lowering capital expenses and using less power.
Avoiding sending the same data repeatedly across a network increases
the effective bandwidth of the link, reducing networking expenses.
This tutorial will provide a detailed look at the many ways
deduplication can be used to improve the efficiency of storage
and networking devices. It will consist of two parts.
The first part will introduce the basic concepts of deduplication and
compare it to the related technique of file compression. A taxonomy of
basic deduplication techniques will be covered, including the unit of
deduplication (file, block, or variable-length segment), the deduplication
scope (file system, storage system, or cluster), in-line vs. background
deduplication, trusted fingerprints, and several other design choices.
The relative merits of each will be analyzed.
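To make the taxonomy above concrete, here is a minimal sketch of one
point in the design space: fixed-size block deduplication with trusted
content-hash fingerprints. This example is not from the tutorial; the
class name, block size, and in-memory index are invented for
illustration (a real system would persist the index and handle
reference counting for deletes):

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size


class DedupStore:
    """Toy block-level deduplicating store keyed by content hash."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block data (stored once)
        self.files = {}   # filename -> ordered list of fingerprints

    def write(self, name, data):
        fps = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            # A "trusted fingerprint": the hash alone identifies the
            # block, with no byte-by-byte verification on a match.
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # store only if new
            fps.append(fp)
        self.files[name] = fps

    def read(self, name):
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def unique_bytes(self):
        return sum(len(b) for b in self.blocks.values())
```

Writing two files with identical contents stores each distinct block
once; only the per-file fingerprint lists grow.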
The second part will discuss advanced techniques, such as the use of
fingerprints other than a content hash to uniquely identify data,
techniques for deduplicating across a storage cluster, and the use of
deduplication within a client-side cache.
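The variable-length-segment approach mentioned above relies on
content-defined chunking: boundaries are chosen from the data itself,
so an edit shifts only nearby chunk boundaries rather than every later
one. The following is a simplified sketch of that idea, with all
parameter values invented for illustration; production systems use a
true sliding-window fingerprint such as a Rabin fingerprint rather
than this simple running hash:

```python
def chunk_boundaries(data, mask=0x3FF, min_len=64, max_len=4096):
    """Toy content-defined chunking: cut wherever a running hash of
    the bytes since the last cut matches a bit pattern, so boundaries
    depend on content rather than absolute offsets. min_len/max_len
    bound the chunk size."""
    cuts, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * 31 + byte) & 0xFFFFFFFF  # simple polynomial hash
        length = i - start + 1
        if (length >= min_len and (h & mask) == 0) or length >= max_len:
            cuts.append(i + 1)   # cut after byte i
            start, h = i + 1, 0  # restart hash for the next chunk
    if start < len(data):
        cuts.append(len(data))   # final partial chunk
    return cuts
```

Each chunk would then be fingerprinted and deduplicated exactly as in
the block-based case, but with boundaries that survive insertions and
deletions in the data stream.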
Michael Condict received his BS in mathematics from Lehigh University
in 1976 and an MS in computer science from Cornell University in 1981.
He worked on the first Ada compiler while a research scientist at New
York University, investigated circuit-design languages at Bell Labs,
Murray Hill, and contributed to the Amoeba OS project under Andrew
Tanenbaum at the Free University, Amsterdam. Returning to industry,
he spent seven years at the Open Software Foundation Research Institute,
helping to design and build the Mach
micro-kernel-based version of OSF/1 and also OSF/AD, the version that
ran on several commercial massively parallel computing systems.
Following this, he joined several startups, including InfoLibria (Web
caching), Oryxa (component-based storage programming), and BladeLogic
(data-center automation), the last of which subsequently went public.
Currently he is a member of the Advanced Technologies Group at NetApp,
where his research interests include deduplication and the
innovative use of flash technology.
T4 Clustered and Parallel Storage System Technologies UPDATED!
Marc Unangst, Panasas
Cluster-based parallel storage technologies can now deliver
performance that scales from tens to hundreds of GB/s. This tutorial will examine
state-of-the-art high-performance file systems and the underlying
technologies employed to deliver scalable performance across a range of
scientific and industrial applications.
The tutorial has two main sections. In the first section, we will describe the
architecture of clustered, parallel storage systems, including the Parallel
NFS (pNFS) and Object Storage Device (OSD) standards. We will compare several
open-source and commercial parallel file systems, including Panasas, Lustre,
GPFS, and PVFS2. We will also discuss the impact of solid state disk technology
on large-scale storage systems. The second half of the tutorial will cover
performance, including what benchmarking tools are available, how to use
them to evaluate a storage system, and how to optimize application
I/O patterns to exploit the strengths and weaknesses of clustered,
parallel storage systems.
Marc Unangst is a Software Architect at Panasas, where he has been a
leading contributor to the design and implementation of the PanFS
distributed file system. He represents Panasas on the SPEC SFS
benchmark committee and he authored draft specification documents for
the POSIX High End Computing Extensions Working Group (HECEWG).
Previously, Marc was a staff programmer in the Parallel Data Lab at
Carnegie Mellon, where he worked on the Network-Attached Storage
Device (NASD) project. He holds a BS in electrical
and computer engineering from Carnegie Mellon.