MSST Conference

Program

May 6^th — 10^th, 2013
The Queen Mary
Long Beach, CA

Massive Data Storage (Tutorials)

Monday, May 6^th, 2013

8:00—9:45 Ceph (Object Storage) (Presentation)

Dr. Sage Weil,

Ceph (Bio)

10:15—12:00 iRods (Federation and Metadata Management) (Presentation)

Reagan Moore,

Renaissance Computing Institute (Bio)

1:00—3:00 End-To-End System Design for Reliability and Resilience (Presentation) (Presentation)

Henry Newman,

Instrumental (Bio)

Bernie Siebers,

NOAA

Abstract: In this tutorial, we discuss how to implement end-to-end reliability and resilience into very large storage systems. We start with the basics, including how applications interface to the operating system and local and global namespace semantics and implementation. Then we cover low-level hardware issues, including storage interfaces like SAS and SATA, RAID and parity checking, checksums and ECC, comparing and contrasting different approaches for achieving resilience in current systems. Then we show specific examples from today's large storage systems where low-level bits can change in a way that remains undetected in the system. Finally, we discuss current efforts in standards like T10 for new interface design and new approaches to system design to address these issues at scale.

3:30—5:30 Using Solr and Cassandra to Implement Big Data Analytics on the Web (Presentation)

Jason Rutherglen,

Datastax

Abstract: In this talk, we discuss the latest releases of Solr and Cassandra and how they can be used to build fast, web-enabled search and analytics software.

Massive Data Storage

Tuesday, May 7^th, 2013

The 1000x Rule: How to Design for Scalability at Internet Scale (Presentation)

Justin Stottlemyer,

Fellow, Storage Architecture, Shutterfly

Abstract: Shutterfly has built a nearly 100-petabyte digital photo archive, and plans to scale it aggressively in the future. In his keynote, Justin will outline the techniques, based on his experiences at Shutterfly, Facebook, eBay and PayPal, that he and his team used to accomplish this, and he will describe the general rules for building scalable web storage systems.

Chair: Matthew O'Keefe,

University of Minnesota

Accelerating Lustre Development (Presentation)

Brent Gorda,

Intel (Bio)

Abstract: In July of last year, Intel acquired Whamcloud, which was the company created by the community to preserve Lustre. In the past three years, the community has come together to ensure Lustre survives and today there is a vibrant ecosystem of developers, maintainers and vendors offering it. As a result, there has been a regular cadence of feature and maintenance releases and building momentum of users of the technology. We will discuss the current state and future direction of this work.

HPSS for Exascale Computing (Presentation)

Jim Gerry,

IBM (Bio)

DOE FastForward Program (Presentation)

Gary Grider,

Los Alamos National Laboratory (Bio)

Abstract: The US DOE Office of Science and National Nuclear Security Administration Exascale activities leading to Exascale class computing in the next decade involves a number of initiatives, including the Fast Forward industry technology concepts funding activity. The current Exascale activities and coordination mechanisms for those activities will be explained, including the Fast Forward initiative. The Storage and IO Fast Forward project will also be described, including schedules, project management, and technical aspects.

Chair: Ken Wood,

Hitachi

Optical Media Technical Roadmap: The Revival of Optical Storage (Presentation)

Ken Wood,

Hitachi (Bio)

Abstract: Optical storage has been seeing a resurgence in many industry verticals for it's unique preservation and environmental qualities. Recent developments have increased capacities and functionality while maintaining decades of backwards compatibility. This is due to the wide range of industries and markets that support this medium.

Optical Library System with Extended Error-Correction Coding for Long-Term Preservation (Presentation)

Akinobu Watanabe,

Hitachi (Bio)

Abstract: Hitachi has developed an archive system with long-term preservation capability, storing data on optical disks. Extended parity mounting technology improves durability against scratches while maintaining compatibility with optical disk specifications.

Achieving 1000-year Data Persistence: "Engraved in Stone" (Presentation)

Doug Hansen,

MDisc

Abstract: Proper choices in materials coupled with the flexibility of Optical Data Storage hardware enables the implementation of truly persistent digital data on a DVD or Blu-ray disc. Recently completed accelerated lifetime studies conducted in accordance with the ISO 10995 test standard demonstrate that a lifetime on the order of 1,000 years is achievable in a mass-market-priced product.

Chair: Wook,

House of Moves

Issues in Large-Scale Storage and Computing Systems for Film Production

Andy Hendickson,

CTO, Walt Disney Animation Studios (Bio)

Wook,

Data Architect, House of Moves

Darin Grant,

CTO, Method Studios (Bio)

Petascale Storage Solutions (Presentation)

Mike Feuerstein,

Xyratex

Storage Networking Industry Association Long Term Retention Technical Working Group(Presentation)

Sam Fineberg,

HPC Storage Solutions for Research Markets (Presentation)

John Buchanan,

Hitachi Data Systems

Wednesday, May 8^th, 2013

Analytics Drives Big Data Drives Infrastructure - Confessions of Storage turned Analytics Geeks (Presentation)

Aloke Guha,

Cruxly

Abstract:This talk is about how "form follows function" bu,t in iterative steps, on how the infrastructure of big data analytics has evolved, from early days preceding Hadoop and Map Reduce, and continues to evolve even today. With new and emerging data sources, data types and diverse analytics on them, there is different and growing needs on the processing, the storage of persistent data and its subsequent access. We believe data processing and the associated data and storage architectures to support today's and future analytics will become even more demanding, and simple Hadoop processing with local storage will not suffice. We will illustrate our learning from our own experiences with two analytics services that evolved, from batch mode analytics processing in 2008, to today's hybrid of both real-time processing and batch mode querying on stored data used at Cruxly.

Chair: Ben Kobler,

NASA

Advanced Tape Technologies for Future Archive Storage Systems (Presentation)

Dr. Mark Watson,

Oracle (Bio)

Building Blocks Required for Long Term Retention and Access to Enormous Quantities of Data (Presentation)

Matt Starr,

CTO, Spectra Logic

Abstract: This session is designed to give an introduction to the newest file management technologies that have emerged in 2013 for Active Archive. Anyone managing a mass storage infrastructure for HPC, Big Data, Cloud, research, etc., is painfully aware that the growth, access requirements and retention needs for data are relentless, and show no sign of letting up. This growth directly represents the increase in storage infrastructure for business, new file-oriented applications used in both enterprise and technical computing, and expanded server and desktop virtualization projects.

The result is a flood of data that needs to be readily available, on the appropriate media in an active state at all times, even though the bulk of that data is seldom accessed. And at the heart of that problem is the need to (1) rationalize the way that data is managed; and (2) create online access to all that data without maintaining it in a continuous, power-consuming state.

The solution lies in creating an active archive that enables straight-from-the-desktop access to files stored at any tier for rapid data access. Active archive software technologies allow existing file systems to expand over flash, disk and tape library storage technologies. Long term planning is also a key factor when considering an active archive approach. Storage hardware is by its nature short term, while data longevity is long term. A true active archive environment should contemplate and provide for a seamless upgrade to future technologies across any or all performance tiers.

The Economics of Tape, Disk, and Flash for Petabyte Storage (Presentation)

Robert Fontana,

IBM (Bio)

Chair: Sean Roberts,

Yahoo!

Building Private Clouds with OpenStack (Presentation)

Joshua McKenty,

Piston Cloud Computing

Scalable Object Storage for Public and Private Clouds (Presentation)

John Dickinson,

SwiftStack (Bio)

How Rackspace is using Private Cloud for Big Data (Presentation) (White Paper)

Bryan Thompson,

Rackspace

Chair: Matthew O'Keefe,

University of Minnesota

PCIe FLASH and Stacked DRAM for Scalable Systems (Presentation)

Joe Jeddeloh,

Micron (Bio)

Abstract: Low power, high bandwidth main memory systems and storage will be a major architectural focus in the next three to five years. Chip stacking with through-silicon via's (TSV) opens a door of innovation not available to computer architects in the past 25 years. The NAND roadmap is providing new opportunities to SSD developers. This presentation will cover both volatile and non-volatile memory trends and roadmaps. The Hybrid Memory Cube and Micron’s PCIe SSD architecture will be introduced.

Hard Drives: Obstacles and Opportunities in the Next Three Years (Presentation)

Dave Anderson,

Seagate (Bio)

Abstract: This talk will share data on recent changes in the areal density growth rate and describe techniques under development to enable resumption of higher rates in capacity growth in the future.

Revolutions in Storage (Presentation)

James Hughes,

Seagate (Bio)

Abstract: The trends of technology are rocking the storage industry. Fundamental changes in basic technology, combined with massive scale, new paradigms, and fundamental economics leads to predictions of a new storage programming paradigm. The growth of low cost/GB disk is continuing with technologies such as Shingled Magnetic Recording. Flash and RAM are continuing to scale with roadmaps, some argue, down to atom scale. These technologies do not come without a cost. It is time to reevaluate the interface that we use to all kinds of storage, RAM, Flash and Disk. The discussion starts with the unique economics of storage (as compared to processing and networking), discusses technology changes, posits a set of open questions and ends with predictions of fundamental shifts across the entire storage hierarchy.

The Elusive "Promise" of Holographic Data Storage (Presentation)

Dr. Ken Anderson,

CEO, Akonia Holographics

Lustre Acquisition: Details of Xyratex Lustre® Assets (Presentation)

Mike Feuerstein,

Xyratex

Massive Data Storage (Research Track)

(Presenter names are in bold font)

Thursday, May 9^th, 2013

Chair: Dr. Theodore Wong,

IBM Research

The Impact of Areal Density and Millions of Square Inches (MSI) of Produced Memory on
Petabyte Shipments of TAPE, NAND Flash, and HDD Storage Class Memories (Presentation)

Robert Fontana (Bio), Gary Decad, and Steven Hetzler,

IBM