Wikileaks has become the world’s most visible and newsworthy user of cloud computing. Its current situation provides some useful enterprise lessons on the unique attributes of digital information.
1) Digital data is very fragile: Even within something as simple as a PC filesystem, a digital file doesn’t exist as a single contiguous body of information. Files are composed of blocks that are referenced by a table, which is itself referenced by another table. If one of those links is removed or damaged, then the file is as good as gone (forensic recovery is often possible on small filesystems, but not on highly distributed ones that sprinkle chunks of the file across multiple devices). The Wikileaks data within Amazon’s cloud could be deleted in the blink of an electronic eye.
2) Redundancy more than compensates for fragility: Unlike physical representations of information, digital data can be immediately, precisely, and inexpensively reproduced. The best practice for maintaining the ongoing existence of electronic information is not to carefully protect the vessel and its contents, but to make multiple copies and ensure that a data loss event cannot affect all of them simultaneously. Amazon’s much-publicized attempt to digitally exile Wikileaks was effectively a recruiting call for the network army. Chopping a head from a cloud-based hydra just encourages further replication.
3) If you can’t find it, it doesn’t exist: Wikileaks pats itself on the back for having outed numerous organizations, but you can’t prove this from the current site and its many mirrors. If there are any links to leaks not related to Afghanistan or the US Dept of State, I can’t find them. Where’s the search window? The larger the body of information, the greater the need for an index. URLs that are undocumented and that nothing links to are useless.
4) You can’t recover from a broken mirror: Wikileaks is currently claiming over 1300 mirrors, many of which seem to be active and accurate. Whatever is on the primary site is faithfully replicated to all the mirrors. I can’t find the missing link to the non-US Govt leaks on any of these sites, nor do I expect to, because they are duplicates of today’s site, not last year’s. Depending upon the efficiency of replication, a successful data attack on the primary site would result in the damage or loss of the files on all the mirrors.
5) Online backup is an oxymoron: If the canonical copy of a file is damaged or lost, so are its real-time replicas. If you want to be able to recover from digital data damage, then you must keep a copy of your data that is fully segregated from the original. On the assumption that accidents happen and that mistakes are often not immediately discovered, 8000 years of experience in maintaining data has consistently demonstrated the advantages of maintaining multiple copies and multiple versions.
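The difference between points 4 and 5 can be shown with a small Python sketch. It is only an illustration under simple assumptions: the "sites" are in-memory dictionaries, and the file names and the `replicate`, `snapshot`, and `demo` functions are hypothetical, not a description of how Wikileaks or any mirror actually works. Real-time replication faithfully copies a deletion to every mirror, while a segregated, versioned snapshot still holds the lost file.

```python
def replicate(primary, mirrors):
    """Real-time mirroring: each mirror becomes an exact copy of the primary."""
    for mirror in mirrors:
        mirror.clear()
        mirror.update(primary)


def snapshot(primary, archive, label):
    """Versioned backup: store an independent copy under a label.

    The archive is never touched by replicate(), so it stays segregated
    from whatever later happens to the primary.
    """
    archive[label] = dict(primary)


def demo():
    # Hypothetical file names and contents, for illustration only.
    primary = {
        "leak_2009.txt": "older material",
        "leak_2010.txt": "current material",
    }
    mirrors = [{} for _ in range(3)]
    archive = {}

    replicate(primary, mirrors)
    snapshot(primary, archive, "before-damage")

    # Damage to the primary: an older file is deleted.
    del primary["leak_2009.txt"]

    # The mirrors faithfully reproduce the damage on the next sync.
    replicate(primary, mirrors)
    lost_on_all_mirrors = all("leak_2009.txt" not in m for m in mirrors)

    # Only the segregated snapshot can still supply the file.
    recovered = archive["before-damage"].get("leak_2009.txt")
    return lost_on_all_mirrors, recovered


if __name__ == "__main__":
    print(demo())
```

The design choice that matters here is that `snapshot` writes to a store the replication path never reaches. Any number of mirrors adds availability, but only a separate copy kept from an earlier point in time adds recoverability.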
Enterprise buyers are being subjected to a great deal of FUD about the relative advantages of cloud-based systems that ‘simultaneously replicate’ your data, and some cloud service providers are claiming that offline backups are more trouble than they are worth. You can bet that Wikileaks did not rely on Amazon as its sole storage location. The cloud of Wikileaks mirrors provides a reliable vehicle only for the dissemination of the most recent leaks. Beyond this public face, Wikileaks relies on the network army to keep multiple copies of its raw and edited data squirreled away offline in multiple locations.