FAST '08 – Abstract
Pp. 111–125 of the Proceedings
Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics
Weihang Jiang, Chongfeng Hu, and Yuanyuan Zhou, University of Illinois at Urbana-Champaign; Arkady Kanevsky, Network Appliance, Inc.
Abstract
Building reliable storage systems becomes increasingly challenging as
the complexity of modern storage systems continues to grow.
Understanding storage failure characteristics is crucial
for designing and building a reliable storage system. While
several recent studies have examined storage
failures, almost all of them focus on the failure characteristics of
one component, disks, and do not study failures of other storage components.
This paper analyzes the failure
characteristics of storage subsystems.
More specifically, we analyzed the storage logs
collected from about 39,000 storage systems commercially deployed
at various customer sites. The data set covers a period of 44 months
and includes about 1,800,000 disks hosted in about 155,000 storage
shelf enclosures. Our study reveals many interesting findings
that provide useful guidelines for designing reliable storage systems.
Some of our major findings include: (1) In addition to disk failures,
which contribute 20-55% of storage subsystem failures, other components such
as physical interconnects and protocol stacks also account for
significant percentages of storage subsystem failures.
(2) Each individual storage subsystem failure type and storage subsystem failure as a whole
exhibit strong self-correlations.
In addition, these failures exhibit "bursty" patterns. (3) Storage
subsystems configured with redundant interconnects experience 30-40%
lower failure rates than those with a single interconnect. (4)
Spanning the disks of a RAID group across multiple shelves provides a more
resilient solution for storage subsystems than keeping them within a single
shelf.
The Proceedings are published as a collective work, © 2008 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.