Finding a needle in Haystack: Facebook’s photo storage

Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, Peter Vajgel
Facebook Inc.
{doug, skumar, hcli, jsobel, pvg}@

Abstract

This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (60 terabytes) each week, and Facebook serves over one million images per second at peak. Haystack provides a less expensive and higher-performing solution than our previous approach, which leveraged network-attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per-photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.

1 Introduction

Sharing photos is one of Facebook’s most popular features. To date, users have uploaded over 65 billion photos, making Facebook the biggest photo-sharing website in the world. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to over 260 billion images and more than 20 petabytes of data.
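The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and variable names are hypothetical, decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) are an assumption, and because the text states each figure as "over" or "more than", the derived values are rough estimates rather than measurements.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 TB = 10**12 bytes).

PHOTOS_UPLOADED = 65 * 10**9       # "over 65 billion photos"
IMAGES_PER_PHOTO = 4               # "four images of different sizes"
TOTAL_STORED_BYTES = 20 * 10**15   # "more than 20 petabytes"
WEEKLY_PHOTOS = 10**9              # "one billion new photos ... each week"
WEEKLY_BYTES = 60 * 10**12         # "60 terabytes"
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Four stored images per uploaded photo gives the total image count.
total_images = PHOTOS_UPLOADED * IMAGES_PER_PHOTO

# Rough average size of one stored image, in bytes.
avg_bytes_per_image = TOTAL_STORED_BYTES / total_images

# Rough storage added per uploaded photo (all of its sizes together), in bytes.
weekly_bytes_per_photo = WEEKLY_BYTES / WEEKLY_PHOTOS

# Average upload rate implied by the weekly figure, in photos per second.
uploads_per_second = WEEKLY_PHOTOS / SECONDS_PER_WEEK

if __name__ == "__main__":
    print(f"total images:            {total_images:,}")
    print(f"avg bytes per image:     {avg_bytes_per_image:,.0f}")
    print(f"weekly bytes per photo:  {weekly_bytes_per_photo:,.0f}")
    print(f"avg uploads per second:  {uploads_per_second:,.0f}")
```

The first result reproduces the 260 billion images stated in the text; the remaining values are derived estimates and do not appear in the source.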
Users upload one billion new photos (60 terabytes) each week, and Facebook serves over one million images per second at peak. As we expect these numbers to increase in the future, photo storage poses a significant challenge for Facebook’s infrastructure.

This paper presents the design and implementation of Haystack, Facebook’s photo storage system that has been in production for the past 24 months. Haystack is an object store [7, 10, 12, 13, 25, 26] that we designed for sharing photos on Facebook, where data is written once, read often, never modified, and rarely deleted. We engineered our own storage system for photos because traditional .