by Jay Heiser | February 28, 2013 | Comments Off
Any time your internal policies include the lawyerly language “Includes, but not limited to…”, it should be a sign that somebody needs to reexamine the text.
This is often a sort of cop-out, an admission on the part of the policy writer that they do not actually know what the rules should be, but a warning that if you fail to follow these yet-to-be-specified rules, you will be in trouble. It doesn’t constitute useful guidance.
Choose your policy battles carefully. There is only so much influence you can exert over end user behavior through written policies, so don’t squander the attention and patience of your users with vague warnings, puzzles, and scavenger hunts.
If you cannot tell your end users what specifically they must do, or must not do, and if you cannot provide them with useful principles that would reasonably allow them to figure it out on their own, then you’ve got no basis for a policy element.
Category: IT Governance Policy Tags:
by Jay Heiser | February 27, 2013 | 2 Comments
“We have decided to do this new thing. We think it has risks. What should we do to make sure that it doesn’t have any risks? This new thing that we’ve decided to do. Without knowing what the risks are, or whether the best practices for risk mitigation have matured.”
Category: risk management Tags:
by Jay Heiser | February 15, 2013 | 1 Comment
As 4,200 disgruntled holiday-goers, trapped on the ironically named cruise ship Triumph, finally end their five-day ordeal, it serves as a reminder that the eggs can have more at stake in the state of the basket than the basket holder does.
From the point of view of the cruise line, each booked-up ship represents a concentration risk, containing thousands of human beings whose fate, indeed whose very lives, depend upon the correct functioning of a very large and complex system. From the point of view of the passengers, a cruise ship represents recovery risk.
While cloud computing has been relatively smooth sailing for the majority of its passengers, there have been multiple multi-day incidents that required a recovery process of uncertain duration, with ambiguous hopes for success. There have even been clouds that ran aground in the shallow waters of a highly competitive marketplace, leaving their passengers permanently stranded.
Most cloud service providers are able to weather a single packet storm, returning to operational status and compensating their customers with credit for time lost, and maybe even a bit of extra credit. For those who haven’t had their enthusiasm permanently squelched by five days without toilets or dinner, the cruise line is offering free cruises to the victims of this mishap. Many of the unfortunate Triumph passengers have lost a week’s worth of vacation—something that they can never recover. Likewise, when a cloud fails, thousands of customers are likely to experience forms of loss that cannot be compensated for.
Both the cloud and cruising industries have proven relatively reliable, but failures do happen. One lesson that cloud customers can take from a series of vacation-ending fires and floodings is that when a single incident simultaneously impacts thousands of customers, the recovery will be slow and frustrating, and the provider will have no way of compensating their customers for their lost time.
Category: Cloud risk management Tags: cloud failure, cloud risk, concentration risk, portfolio risk, recovery risk, risk, risk management
by Jay Heiser | January 9, 2013 | 1 Comment
Today’s library user takes electronic catalogs for granted. Being able to remotely search the contents of a library is not only convenient, but it also allows for tighter integration with lending practices—you can see whether a book is out on loan.
Over a period of several decades, a number of service firms made a very profitable business out of the digitization of the paper-based library catalogs used by public, educational, and private libraries. Old-fashioned card catalogs were a form of analog database, with each card constituting a single record.
The structured data (title, author, publication date, subject, LOC number, and so on) could easily make the transition from paper form to electronic database. However, many librarians had added unstructured data to many of the cards in their catalog, including information on the quality and status of the book, and other comments that would be useful for either the maintenance of or reference to the book. These annotations represented a rich store of knowledge that was largely lost during the brute-force digitization process.
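To make the distinction concrete, here is a minimal sketch of what a naive digitization sees (the record fields and the migration function are hypothetical, purely for illustration): the structured fields map cleanly onto database columns, while the scribbled annotations have no column to land in.

```python
# A hypothetical card-catalog record. The structured fields map cleanly
# onto database columns; the librarians' free-form annotations do not.
card = {
    "title": "The Mythical Man-Month",
    "author": "Brooks, Frederick P.",
    "published": 1975,
    "subject": "Software engineering",
    "loc_number": "QA76.6.B75",
    # Decades of scribbled institutional knowledge:
    "annotations": [
        "Spine repaired 1988; handle with care.",
        "See also the anniversary edition, shelved separately.",
    ],
}

def naive_migration(card: dict) -> dict:
    """Copy only the fields the new schema knows about."""
    schema = ("title", "author", "published", "subject", "loc_number")
    return {key: card[key] for key in schema}  # annotations silently dropped

print(naive_migration(card))
```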
Annotations are a form of metadata that, because of their informality, are typically not recognized as having organizational value. Life goes on, and the loss of several generations’ worth of neatly scribbled notations around the edges of well-rounded index cards is hardly the biggest problem confronting today’s library.
Are we likewise putting organizational knowledge at risk by not providing our users with a robust and portable annotation mechanism to support their use of digital documents? This has obviously not been an unsustainable problem so far. It’s debatable just how much electronic marking up has taken place on workstations and laptops, but the ubiquity of tablets, which are clearly much more convenient for the reading—and annotation—of longer documents and books, likely means that the sum total of digital annotations is growing at an accelerating rate.
What’s the value to the enterprise of the stored knowledge represented by digital document annotations?
Should the CIO be looking for ways to facilitate the creation and exploitation of this form of stored knowledge?
Does it represent a form of metadata that is worth managing and protecting to ensure that it is available as long as it is useful?
Category: Applications risk management Tags: annotations, metadata
by Jay Heiser | January 4, 2013 | 1 Comment
We’ve recently moved house, and my collection of books, many of them heavily marked up with multi-colored highlights, Post-Its, and bookmarks, remains something of a storage issue. Over the last several months, I’ve been experimenting with digital books on an iPad.
There’s a lot to be said both for and against services like Amazon’s Kindle and Apple’s iBooks. The selection and convenience are strong positives, and eBooks not only don’t fill up my groaning Swedish flatpack bookshelves, they also cost less, which is no small consideration for a heavy reader. I might read a paperback novel, or borrow one from the library, and never need to refer to the thing again. I’ve subscribed to a weekly UK photography magazine for 6 years, but it costs a lot more since we moved to the States. I rarely save the paper copy, so why not save some money and some trees by reading this, and other magazines, online?
However, if I spend hours working my way through a non-fiction book, marking it up and ‘penciling in’ comments, it’s done with the assumption of perpetual access to that book and my annotations. The primitive highlighting and markup functionality of Kindle and iBooks is annoying for the serious annotator, but my biggest concern about the commercial eBook model is that I’m totally beholden to the long-term viability of the vendor. If I’m using a proprietary file format, locked up with a digital rights mechanism, I’m dependent upon access to that vendor’s server, and I’m dependent upon reliable support for my device (and its successors)—indefinitely. It’s not a very open system.
On the plus side, if our house burns down, at least I’ve still got copies of all my eBooks. If I get stuck somewhere without my iPad, I can still access a relatively recent copy of an annotated book on my iPhone, and magazines can be downloaded on the fly. But for long term access to the intellectual property I’ve paid for, and for the added metavalue of my personal annotations, proprietary and rights-managed formats represent a significant risk. If the bookseller goes out of business, they take my books with them.
When you pay for paper, you are in control of the destiny of that document, and all of the metadata that you and other readers have added to that information medium. When you pay for an eBook, you are only leasing it. That’s a great model for light reading, but it’s detrimental to long-term scholarship.
Category: Applications BCP/DR Cloud risk management Tags: contingency planning, continuity, DRM, ebooks, Kindle, PDF, rights management, standards
by Jay Heiser | November 28, 2012 | 1 Comment
Anyone with a stake in the overall success of cloud computing should take a few minutes to read the recent NYT interview with Peter G. Neumann, a highly respected computer security researcher who, now entering his 9th decade, continues to do groundbreaking work on digital reliability.
Commercial cloud computing creates new levels of urgency for structural weaknesses that Dr. Neumann has been warning about for decades, including the dangers inherent in complex systems and in monocultures.
Outside the community of academics and government researchers who spend their lives working in the field of digital security, concerns such as these are often treated as hypothetical. Within that field, Neumann’s opinion represents scientific orthodoxy.
There really is no room for doubt that the robustness of our current computing environment, not the least of which is the complex Internet-enabled public ‘cloud’, depends to a large degree upon ‘band-aids’, and fails to take full advantage of a half century of research into computer security. The open question that Dr. Neumann cannot answer is how long this can remain sustainable.
The reality of most of the human-designed world is that it is non-optimal and kludged together, but we muddle along pretty well in spite of poor design and misplaced priorities. Today’s computing environment may last for decades, as we continue to extend last century’s flawed architectures and sloppy code across increasingly complex and exposed service offerings, patching security and reliability holes with digital chewing gum and baling wire. If this does eventually become unsustainable, it’s good to know that some highly qualified researchers have been putting a lot of effort into ‘rethinking the computer.’
Category: BCP/DR Cloud risk management security Tags: complexity, Peter G. Neumann, security, security history
by Jay Heiser | November 2, 2012 | 2 Comments
Our home telephone is totally dependent upon the electrical power grid, and a lead acid battery of unknown age is all that stands between us and total loss of external connectivity.
Fiber to the home, which we’ve now had in 2 different houses, offers high speed, flexibility, and economy, providing a single source for television, telephone, and Internet. Unlike analog phones and broadcast TV, ‘advanced residential communications’ in ‘Smart Neighborhoods’ offering ‘Blazing Speed’ that will ‘exceed your expectations’ are totally dependent upon a powered-up interface box. Unlike an old-fashioned copper phone line, or a TV antenna, you can’t receive a fiber optic transmission without a powered device that splits out the three services and interfaces them to the in-home wiring. If the power goes out, the fiber no longer blazes—it flares out.
In order to maintain telephone service, high-tech homes have a backup battery hidden in the customer premises equipment. Nobody claims these batteries last over 8 hours, they are not routinely maintained, and common wisdom is that they often do not last even that long.
It isn’t just the home fiber interface that requires power. The (currently unapproved) franchise agreement between our provider and the county requires 2 hours of backup for all distribution amplifiers and fiber optic nodes, 24 hours for all head-end towers and HVAC, and at least one dispatchable portable generator to do something somewhere. I don’t know how reassuring that is to people who have already experienced 2 multi-day power outages this year.
Clearly, there are reliability advantages to the plain old telephone system (POTS), which only requires emergency power at the central office. Given a choice, telecommuters with 2 lines sometimes do decide to make one of them analog—but increasingly, you don’t get that choice. Once a neighborhood switches over to fiber, the providers become extraordinarily reluctant to support copper. Our new neighborhood has no POTS, and the single telecom provider has exclusive cabling rights for the remainder of my lifetime—and well beyond.
Obviously, there are many advantages to wireless, which becomes the channel of choice when the home or office phone is powered out. Unfortunately, it tends to fail when it is most needed. After Hurricane Katrina, the FCC attempted to force providers to include 8 hours of backup power for all cell sites (which would barely last past the excitement of the storm). This 2007 blog post, correctly discussing the unlikelihood of that happening, states: “Well, we are likely headed for the big one here soon and it stands to reason we’ll want to have some cell phone service in the aftermath. As we saw last month during a 5.6 earthquake, you don’t have to have cell towers go down to lose service. There was enough congestion in that first hour to bring conversations to a halt. But in a much bigger scenario, having additional power could keep information flowing in the hours after a disaster, helping speed aid and relief to the right places.” New York and New Jersey have just had their big ones, and information is still not flowing in the aftermath of that disaster.
Reporters based in New York City, and Gartner staff living in the areas hardest hit by Sandy, have reported total failures of cell phone service in their neighborhoods, with some providers apparently doing worse than others. The FCC reported yesterday that “the number of cell site outages overall has declined from approximately 25 percent to 19 percent” (the perceptive observer might ask: a percentage of what population of sites?).
In addition to significant traffic increases during a natural disaster, there are at least 3 reasons for cell phone failure, with the first one being particularly acute for cell systems:
- Power: Batteries get drained pretty quickly (a rough runtime estimate follows this list). While a growing number of cells do have generators, the generators need fuel replenishment, which in the post-Sandy world is becoming a logistical problem for several reasons. At the same time that the power grid is coming back online, a growing number of cell sites are running out of backup power.
- Physical damage: Wind damage to antennas, or water damage to electronics, can impact service, and it takes time after a disaster to deploy existing repair crews across a transportation-challenged region.
- Network failures: The backhaul networks between towers and the switching offices are subject to physical damage, especially from flood water, and they require electrical power (see the first point above).
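To put the first point in perspective, here is some back-of-the-envelope arithmetic. Every figure below is an assumption chosen for illustration, not data from any particular provider.

```python
# Rough cell-site battery runtime. All figures are illustrative
# assumptions, not data from any particular provider.
battery_capacity_wh = 20_000   # assumed battery string: 20 kWh
site_load_w = 3_000            # assumed site draw: 3 kW
usable_fraction = 0.8          # lead-acid strings can't be drained to zero

runtime_h = battery_capacity_wh * usable_fraction / site_load_w
print(f"Battery runtime: {runtime_h:.1f} hours")  # ~5.3 hours

# A generator extends that, but only while its fuel holds out,
# which is exactly the post-Sandy logistical problem:
tank_liters = 200              # assumed on-site fuel tank
burn_rate_lph = 4.0            # assumed consumption at this load
print(f"Generator adds: {tank_liters / burn_rate_lph:.0f} hours")  # 50 hours
```

Under these assumptions, the batteries are gone within a working day, and a full tank buys roughly two more. After that, everything depends on a fuel truck getting through.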
There’s a lot to be said for the continuity advantages of POTS and analog phones, but outside of rural areas, they are likely to be phased out in favor of home digital connectivity and cell phones. If you want to do some contingency planning, you might want to scout your neighborhood for pay phones.
Specific details on the post-Sandy status of each wireless provider can be found in yesterday’s NYT blogs.
Category: BCP/DR risk management Tags: cell phones, Hurricane Sandy, power failure, redundancy, Sandy
by Jay Heiser | October 30, 2012 | 1 Comment
Preparing for Sandy’s imminent arrival, I didn’t fill up any bathtubs with water, but I did charge up all the phones, tablets, and MiFis in the house. Frankenstorm didn’t end up having a huge impact on my part of the country, and we never suffered a prolonged power outage. My son, holed up in his dorm at what is currently a very quiet university, has gone 14 hours without power. I suggested that it looked like he’d have an additional 2 days this week to study. He reminded me that all of his text books are digital.
Under pressure to reduce the weight and volume of printed matter in the house, I’ve been experimenting with eBooks on my iPad. Assured that I’ll love reading books electronically—once I get used to it—I’m still trying to figure out how to change the color of the highlighting. I miss all those colorful Post-It tabs sticking out the sides of the pages. Digital format seems like a great way to read things that you’ll throw away, like beach novels and magazines, but the annotation mechanisms are still weak, and the aesthetic satisfaction of a crowded bookshelf is totally missing.
Recognizing the convenience of being able to stuff multiple books and magazines, not to mention thousands of podcasts, into a single slim device, I’m ready for an upcoming multi-day trip. Even if I get delayed by weather, I should still have plenty to read. While my battery lasts. In many ways, the digital option is a lot more convenient, but it’s dependent upon external power. I wonder how many people Sandy has trapped between a tablet and an empty battery.
While it is way too early to begin collecting continuity and recovery lessons from Sandy’s aftermath, the fact that so few hospital outages have been reported suggests that a lot of emergency power systems worked very well last night. NYU’s Langone Medical Center lost power last night (and, less dramatically, so did Coney Island Hospital), and several sources today have reported that not only did the backup power fail, but so did the backup to the backup. Back in June (the other ‘storm of the century’ earlier this year, not to be confused with last year’s storm of the century), Amazon experienced a similar (failure)³ when a single incident took both utility substations offline, followed by an overheated generator, and then a failure due to the misconfiguration of the secondary backup.
Anyone who has spent significant time dealing with data centers, or any other critical system, likely has multiple war stories about failed power. It’s a mundane but important topic. Microsoft has been bemoaning the lack of researchers, developers, and engineers, but maybe what we really need are more mechanics and electricians.
Category: BCP/DR Cloud risk management Tags: contingency planning, electricity, Hurricane Sandy, power, redundancy, weather
by Jay Heiser | August 29, 2012 | Comments Off
Replacing a button on one of my customer-facing shirts this weekend motivated some thoughts on resiliency.
Why did the button fall off in the first place? It was sewn on by machine, a clever bit of automation based on interlocking threads from the top of the garment with threads from the bottom. It is a fast, efficient, and inexpensive manufacturing process. It’s a sort of leverage.
The downside of machine stitching is that it lacks resiliency. Once any part of the thread suffers a loss of integrity, the entire mechanism begins to unravel. The strength of the joint rests on every single link of the chain of fiber, which is under constant stress from motion and washing.
In contrast, when a human sews a button on manually, the same piece of thread is looped around and around. Unlike a sewing machine, a human can push the same needle down from the top, and up from the bottom. This is slow, inefficient, and expensive. The result is a highly robust join of button to shirt with no single point of failure.
The upside of manual stitching is that it maximizes resilience. Even if part of the thread breaks or is worn, the rest of the loops remain in place. The button cannot fall off unless every single loop fails.
No enterprise can afford to hand-stitch every single one of its digital buttons, but critical applications need to avoid shortcuts that introduce single points of failure.
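The analogy maps directly onto code. The sketch below is hypothetical (the storage calls and the failure rate are invented for illustration): the machine-stitched design threads every write through a single dependency, while the hand-stitched design loops the same write through several independent replicas, so the data is lost only if every loop fails.

```python
import random

def store(replica: str, data: str) -> bool:
    """Hypothetical storage call; fails randomly to simulate faults."""
    return random.random() > 0.2  # assume each replica fails 20% of the time

def machine_stitched(data: str) -> bool:
    # One interlocked thread: a single failure loses the button.
    return store("primary", data)

def hand_stitched(data: str, replicas=("a", "b", "c", "d")) -> bool:
    # Independent loops: the write survives unless every loop fails.
    return any(store(replica, data) for replica in replicas)
```

With a 20 percent per-replica failure rate, the machine-stitched version loses one write in five; four independent loops lose roughly one in 625.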
Category: risk management Tags: resilience, resiliency
by Jay Heiser | August 13, 2012 | 2 Comments
Has anyone ever created a web-based application that wasn’t flaky and prone to data loss?
Every time Facebook comes out with some new functionality, the entire service gets slower and harder to use. I’m not sure there could be a more efficient way to lose text than typing it into Facebook in real time, watching it disappear into limbo as some new advertising link downloads, or as a content change slides the whole page out of visibility. I’ve been using web-based email for over 15 years. Without any useful increase in the basic functionality (other than some vendors being able to control spam), every year it gets slower and less reliable. Recognizing my impatience with the thing, Yahoo Mail sometimes offers to let me revert to the ‘classic’ and simpler version, the less attractive but infinitely more reliable interface. Unfortunately, it never seems to make that offer unless I’m at such a low-bandwidth location that online text entry is infeasible.
Wacky web isn’t just a consumer problem. Like most Gartner bloggers, I use Live Writer to create my blog entries locally. Could there be a simpler text editing task than creating the short and minimally formatted material that constitutes the typical blog posting? Yet few bloggers have the patience to do all of their text entry and editing online through a web browser. If you’ve ever tried to paste some pictures into a blog and make sure the thing comes out the way you expected, you’ve had even bigger motivation to use a local client for composing blog server material. Inside Gartner, we’ve been experimenting for at least a year with a browser-enabled package of high interest to all customer-facing staff. The number of mobile code modules and the dynamic content downloaded on an ongoing basis mean that every user has their own special failure modes. There’s a lot to be said for the Lotus Notes and Outlook clients. I’m pretty happy with the calendar and email clients on my iPhone, which reliably cache data locally, gracefully and almost invisibly dealing with the inevitable perturbations of the packet-based Internet.
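The pattern those clients get right is easy to sketch. The outline below is a hypothetical illustration (the file name and endpoint are invented, and no real product’s API is used): every change is written to local storage first, and the network is treated as an optimization that can fail and be retried, so a dropped connection never loses the user’s text.

```python
import json, time, urllib.request
from pathlib import Path

OUTBOX = Path("outbox.json")  # local cache: survives any network failure

def save_locally(entry: dict) -> None:
    """Write to disk first; the network is an optimization, not a requirement."""
    pending = json.loads(OUTBOX.read_text()) if OUTBOX.exists() else []
    pending.append(entry)
    OUTBOX.write_text(json.dumps(pending))

def sync(url: str) -> None:
    """Background sync: push what we can, keep whatever fails for next time."""
    pending = json.loads(OUTBOX.read_text()) if OUTBOX.exists() else []
    still_pending = []
    for entry in pending:
        try:
            request = urllib.request.Request(
                url, data=json.dumps(entry).encode(),
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(request, timeout=5)
        except OSError:
            still_pending.append(entry)  # offline or flaky: retry later
    OUTBOX.write_text(json.dumps(still_pending))

save_locally({"text": "a paragraph the page reload cannot eat", "ts": time.time()})
sync("https://example.invalid/api/entries")  # hypothetical endpoint
```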
Thick clients are reliable and provide a rich experience, but are old-fashioned and don’t generate the right color of money. Thin clients are cheap and functional, but few have the courage to implement something so mundane. As a result, we’ve been saddled with the middleweight client, a beast of a thing that maximizes the disadvantages of all possible network architectures.
I’ve been told that the solution to this problem is HTML5, which I’m further told is not really a single technology so much as a set of related things that have equal rights to the same dorky logo. We will increasingly overload the stateless HTTP protocol with state mechanisms, becoming even more dependent upon ever greater volumes of code downloaded in real time. We’ve already reached the point where the browser (i.e., the malware invitation layer), just one layer in a growing stack of nested mechanisms, is itself more complex than a Sun 3.
I’ve got a simple solution to inflated expectations, technology bloat and pervasive plugins. I suggest that instead of using the old, unfashionable term ‘client server’ to refer to the simplest, most secure, and most reliable form of networked computing, we rename it as ‘HTML 6’.
Category: Applications Tags: client server, HTML5, malware, reliability, www