Ray Valdes

A member of the Gartner Blog Network

Ray Valdes
Research VP
9 years at Gartner
30 years IT industry

Ray Valdes is a research director in Gartner Research, where he is part of the Internet Platforms and Web Services team.

Lossage and Leakage of Social Data

by Ray Valdes  |  February 23, 2009

Among all the big stories of the past week, a small one that may have escaped your attention is the unfortunate story of Magnolia, a Web 2.0 social bookmarking site that suffered a catastrophic data loss. The events began several weeks ago but came to a resolution at the end of last week.

Magnolia is a bookmarking site similar to Delicious, Furl, and Diigo. I don’t know how many users they had, but according to the founder and sole employee Larry Halff (in a podcast interview with Citizen Garden), the data repository consisted of about a half-terabyte of data. If each bookmark were to consume about 1 KB of storage (the URL plus metadata plus internal DBMS structures), that’s about 500 million bookmarks. I am not a heavy user of social-bookmark services, but I have stored about a thousand bookmarks in the cloud. If one extrapolates from that single data point, that could represent a half-million users affected. Some of those users reportedly had been stashing bookmarks for four years.
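
The arithmetic behind that estimate is easy to reproduce. The short sketch below simply restates it; every input is a rough assumption taken from the paragraph above (decimal units, 1 KB per bookmark, a thousand bookmarks per user), not a measured figure.

```python
# Back-of-envelope estimate. All inputs are rough assumptions, not measurements.
REPOSITORY_BYTES = 500 * 10**9   # "about a half-terabyte", in decimal units
BYTES_PER_BOOKMARK = 1_000       # assumed: URL + metadata + DBMS overhead
BOOKMARKS_PER_USER = 1_000       # assumed: extrapolated from one light user

bookmarks = REPOSITORY_BYTES // BYTES_PER_BOOKMARK
users = bookmarks // BOOKMARKS_PER_USER

print(f"{bookmarks:,} bookmarks")
print(f"{users:,} users")
```

Changing either assumption by a factor of two moves the user estimate by the same factor, so the half-million figure is best read as an order of magnitude.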

Now all that data has mostly vanished.

There was a backup system in place, but the incident was not a hard drive failure. It was data corruption above the physical layer, which means that the bad data was replicated to the backup system and overwrote the good data. And there was only a single backup, with no offsite replica. Despite the site’s professional appearance as a good-looking and functional Web 2.0 venture, the modest reality is that the Web site ran on four Macintosh minis (the same entry-level computer my 8-year-old uses) with a back end of two Apple Xserve pizzaboxes (of the kind available on the used market for about $600) connected via a FireWire cable.
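
To see why a single mirrored backup offers no protection against this kind of failure, here is a minimal sketch. It is a toy model, not a description of Magnolia’s actual setup: the function names `mirror_backup` and `snapshot_backup`, the `keep` parameter, and the sample data are all hypothetical, and in-memory dictionaries stand in for real storage.

```python
import copy

def mirror_backup(live, backup_store):
    """Single-copy mirror: each run overwrites the previous backup."""
    backup_store.clear()
    backup_store.update(copy.deepcopy(live))

def snapshot_backup(live, snapshots, keep=7):
    """Versioned backup: add a point-in-time copy, retain the newest `keep`."""
    snapshots.append(copy.deepcopy(live))
    while len(snapshots) > keep:
        snapshots.pop(0)  # discard the oldest snapshot

# Hypothetical live data: one user with one bookmark.
live = {"user1": ["http://example.com/a"]}
mirror, snapshots = {}, []

# First backup run, while the data is still good.
mirror_backup(live, mirror)
snapshot_backup(live, snapshots)

# Logical corruption above the physical layer: the disks are healthy,
# but the data itself is now wrong.
live["user1"] = []

# Second backup run faithfully copies the corrupted state.
mirror_backup(live, mirror)
snapshot_backup(live, snapshots)

print("mirror:", mirror["user1"])                  # corrupted copy only
print("oldest snapshot:", snapshots[0]["user1"])   # good data survives
```

The mirror ends up holding only the corrupted state, while the snapshot history still contains the earlier good copy. Retained versions, ideally with one held offsite, are what make recovery from logical corruption possible.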

Despite this catastrophic loss, community response has been even-tempered, perhaps due to the founder’s candid and open communication style. There are ongoing attempts to retrieve data echoed or replicated to other social aggregation sites such as FriendFeed, and from Web caches. These piecemeal efforts have panned out, but only for a portion of the data and only for some users.

Going back several weeks, a less visible but similar cautionary tale was the demise of Journalspace in January. Journalspace was a small social networking and blogging community similar to LiveJournal or Blogger. Details of the data disaster are not available, but it appears that either the data was not backed up, or corrupt data was replicated to the backup machine, or possibly the loss was even the work of a departing, disgruntled employee. The outcome was the total cessation of activities. The domain name and business were put up for sale on eBay, and were sold to a Russian company for a little over $5,000. There is now an entirely new site up at the old domain, running on a different software platform (and presumably with better backup procedures). Some former members of the original user community have found their way back and are starting fresh with new journals.

Some lessons:

  • Just because a Web service is available from the cloud does not mean that it is cloud computing in the sense we expect or would like. That is: “cloud” in the sense of robust, scalable, and secure, as offered by brand-name providers such as Amazon, Google, Microsoft and Salesforce.com. The reality may be that your favorite niche cloud service is run by one guy with a bunch of computers in his living room, flying without a net.
  • Even if the service is robust and secure, your data is still locked up in that provider’s repository, and vulnerable to any business vicissitudes. Before you spend years pouring data into an external repository, make sure you can get it back out (i.e., have a data extraction and recovery process, and make sure you run that process repeatedly).
  • Data that you think is private or contained within the confines of one site might not be. Data can leak or be echoed to aggregators, for better or worse. Be aware of what other sites your data is spilling onto, and make sure that is what you want.
  • Social data is not just lists of friends and contacts. There are as many types of social data as there are social sites. Each type of data may require its own approach to management.
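
The second lesson, having an extraction process and actually running it, can be sketched in a few lines. This is only an illustration under stated assumptions: `export_bookmarks` and `verify_export` are hypothetical helpers, and the `sample` list stands in for whatever a real provider’s export feature would return.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def export_bookmarks(bookmarks, path):
    """Write bookmarks to a local JSON file; return a SHA-256 digest of the payload."""
    payload = json.dumps(bookmarks, sort_keys=True)
    Path(path).write_text(payload, encoding="utf-8")
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def verify_export(path, expected_digest, expected_count):
    """Re-read the file and confirm it is both intact and complete."""
    payload = Path(path).read_text(encoding="utf-8")
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if digest != expected_digest:
        return False
    return len(json.loads(payload)) == expected_count

# Hypothetical stand-in for data pulled from a provider's export feature.
sample = [{"url": "https://example.com", "tags": ["demo"]}]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "bookmarks.json"
    digest = export_bookmarks(sample, out)
    ok = verify_export(out, digest, len(sample))

print("export verified:", ok)
```

The verification step matters as much as the export: a copy that has never been read back is only presumed to be good. Running both on a schedule turns the lesson into a habit.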

How do you manage your social data?
