by Richard Hunter | August 6, 2012
In the days since Knight Capital Group suffered a “computer glitch” that cost the company $440M in losses, I’ve been discussing with my colleagues how this catastrophe might have been prevented.
Some of my colleagues have argued that the failure was basically about IT governance–that the IT team at Knight was responsible only for implementing flawed trading algorithms specified to them by their non-IT colleagues. The argument essentially boils down to this: the fault lies with those who did not understand, and therefore did not adequately specify or test, what would happen when the technology operated according to the rules it was given. The implication in this argument is that disaster could have been prevented if the people involved had made better decisions in the requirements/specification/design/testing/implementation/etc. of the software.
My argument is that none of that is sufficient to prevent disaster—including future disasters—and that focusing on the technology is the only effective approach to solving the problem, meaning to reduce the risks to manageable levels. Indeed, I would argue that the risks of high speed trading systems are intrinsic, ungovernable, and potentially threatening to all participants in the markets. “Intrinsic” means that the problems these systems supposedly model with logical rules are beyond the ability of logic to solve. “Ungovernable” means that the risks introduced by these systems can’t be resolved by the tools of governance, largely because of the intrinsic logic problem. You cannot govern reality away; you operate within the bounds of reality, or reality teaches you to do so, more or less brutally and directly.
Markets by their nature produce unforeseen circumstances. It is as impractical to expect a piece of technology to respond appropriately, or even predictably, to every unforeseen circumstance as it is to expect a human being to do so. When technology is empowered to execute massive trades instantly, guided by rules based on various combinations of market circumstances, bad things can be expected to happen as soon as unforeseen circumstances arrive; in fact, at the very moment they arrive.
What is happening now on Wall Street is that, in pursuit of competitive weapons, firms are empowering their machines to make bigger and bigger decisions faster and faster: literally, to trade millions or hundreds of millions of shares in small fractions of a second. It’s an arms race, and the weapons in question are being deployed at many, many firms, each of which has its own view of what constitutes “acceptable risk.”
If we define this as a governance issue, then the solution must be for the firms involved to make smarter decisions about the risks. The first problem with this approach is, as Harvey Keitel said in “Thelma and Louise,” that brains will only take you so far (to which he added that luck always runs out, something every executive should remember every day). High speed trading systems create severe risks that are not only unanticipated, but which realistically can never be anticipated in an environment where technology is continuously pushed to the limit.
In short, better governance won’t solve the problem because the people involved in governance are no more able to anticipate all possible failure modes than the people involved in designing and building the systems. Even if they were, it scarcely needs saying that Wall Street traders in general are heavily incented to take risks, and that they are often able to make others pay the price for risks gone bad–circumstances that do not inspire confidence in the “governor’s” ability to manage risks down. Finally (for the governance argument), there is no reason to believe that all players will adopt “good” (meaning in this case risk-aware) governance policies–and a single point of high-speed trading failure can potentially impact many players in the markets.
If you take the point of view that disasters such as this are the result of using technology in a way that it should not be used–to solve a problem that computer logic cannot solve, at least in the current state of the art–then the solution is to prevent the technology from being used in that way, either by banning it outright or by heavily taxing the proceeds of trades that are too short-lived to be called “investments.” I appreciate that regulating high-speed trading systems out of existence one way or another is a drastic approach. I believe that the risks–which extend to market participants far removed from the businesses that create these events–justify the means. There’s no more reason to allow individual trading companies to implement technology that potentially destroys markets than there is to allow private citizens to carry nuclear weapons. In both cases we could argue that careless or deranged (or whatever pejorative you like) individuals are the real problem. I agree that this argument is valid to a point; triggers don’t pull themselves, and we’d all be better off if everyone behaved decently. But positioning “better governance” as the solution to the problem doesn’t work when the consequences of one failure of governance are so severe.
As the saying goes, one atomic bomb can really ruin your day. That’s why we’re all glad that atomic bombs are not for sale to anyone who wants one, and why we should really, really question why we need automated trading programs on Wall Street.
There may be other solutions to this problem, and I’d be delighted to hear from readers about what they think might work. One thing I’m certain will not work is to continue on the current path, with the potential for bigger and bigger disasters. (But if you’d like to argue that point, feel free to do so.)
Category: IT risk Tags:
by Richard Hunter | August 2, 2012
My colleagues at Gartner and I have recently been discussing the importance of IT risk, often in the context of Cloud adoption, where the discussion is usually about the extent to which risks in the Cloud will slow adoption. (Our basic take on that question is some, but not enough to significantly impede the march to Cloud.) Some of my colleagues are skeptics about the potential impact of cloud failures; being a risk maven, I’m pretty bearish on the topic. The question my more bullish colleagues often ask is: how bad could it get, anyway? Has any company ever failed because of an IT risk come to fruition?
The answer is yes, of course. CardSystems Solutions lost 95% of its revenues within 3 weeks of the breach it announced in 2005, and was sold shortly thereafter for a fraction of its pre-incident worth. Comair didn’t fail as a result of the December 2004 incident in which its crew scheduling system went down on Christmas Eve, stranding an estimated 30,000 passengers during the Christmas holidays; however, the company lost 7% of its revenue for the year, a pretty big deal for an airline, and was the subject of an FTA investigation, which no one much enjoys. And the president of the company lost his job; not the CIO, the president. Most businesses would consider that to be a pretty steep price for IT failure.
The interesting thing isn’t that some companies have failed when their IT failed; the interesting thing is that the risks are almost certainly increasing. Plenty of executives don’t yet understand that while IT spend only represents 5% or less (on average) of enterprise revenues, the impact of IT on revenues is far higher than that. To put it another way, many executives don’t yet realize that their businesses don’t run much, if at all, without IT, and when IT is misused or fails, the impacts can be very large indeed. The recent events involving the Knight Capital Group make it clear how far we’ve come in terms of the importance of IT risks.
According to this NY Times article, Knight Capital Group lost $440 million on Wednesday in a matter of a few minutes when a “computer glitch” resulted in the purchase of a very large pile of stocks on behalf of the company. (The losses were incurred when the stocks were sold.) I quote the Times:
“In its statement, Knight Capital said its capital base, the money it uses to conduct its business, had been ‘severely impacted’ by the event and that it was ‘actively pursuing its strategic and financing alternatives.’”
The Times added: “The losses are greater than the company’s revenue in the second quarter of this year, when it brought in $289 million.” The article goes on to quote Christopher Nagy of KOR Trading as saying that this might be “the beginning of the end for Knight.”
So the basic story is that Knight put in a new trading system; the system went haywire; the malfunction produced $440 million in losses in less than 5 minutes; and the company may fail as a result. Let there be no doubt: in the modern era companies fail because of IT misuse or failure. Period. This is not the same as civilization failing, of course. But it’s pretty serious for the owners and employees (and maybe customers) of Knight Capital Group.
What this means is that it’s more important than ever for IT professionals to make the connection for the rest of the executive team between what IT does and what everybody in the enterprise does with IT–to identify clearly what business outcomes might result from an IT failure. It’s possible that doing so would not have prevented this incident; I have no idea how many tests would have been necessary to discover and eliminate the “glitch” that cost Knight Capital Group $440 million. But I wonder whether the executive team at Knight was fully aware of just how bad a “computer glitch” could be–and I know that executives at many other companies are not.
Category: cloud IT risk Tags:
by Richard Hunter | July 18, 2012
This is my first blog for Gartner. Anyone who’s read the books I co-wrote with George Westerman for Harvard Business Press, “IT Risk” and “The Real Business of IT,” knows that the issues that interest me most revolve around IT’s value to the enterprise. Since a blog is first and foremost a means of personal expression, I expect to write plenty here about IT value and its reflection in IT risk.
There’s plenty to write about in those terms. IT has become an essential lever for creating value in every enterprise of any size worldwide. As the importance of IT in creating value increases, so do the risks. This morning I was alerted by Paul Proctor, one of my colleagues, to a video of a fireworks show in San Diego on July 4th in which a purported computer glitch fired off 18 minutes’ worth of fireworks in about 30 seconds. (By the way, I found the 30-second version to be truly thrilling, to an extent far surpassing the usual fireworks show. Only the last 30 seconds are really exciting in most fireworks shows anyway.) When people lit fireworks by hand, you didn’t get that kind of outcome.
We are at the beginning of major platform and demographic changes in IT, and the role of IT organizations in the enterprise is changing even as the value they provide goes up and up. It’s an exciting time to be in IT, in fact the best time ever to be an IT professional, and I’m looking forward to writing about it. So stay tuned.
Category: IT risk value Tags: risk, value
by Richard Hunter | July 18, 2012
I hear a lot lately about cloud as a means to “transformation,” and when I hear that word my personal value-meter kicks in, especially since so many people use the word “transformation” in ways that conflict with its definition in IT portfolio management (from which the run-grow-transform model emerged in the early 2000s). In this post, I’m going to lay out what “run,” “grow,” and “transform” mean in this context, and how they relate to value.
By definition, to “transform” means to enter new markets with new value propositions for new customer segments. To “grow” means to enhance business performance in established markets serving established customer segments with established value propositions; to “run” means to carry out essential enterprise activities that do not connect directly to a particular customer segment (or, to put it another way, to a particular revenue stream).
When Apple entered the iTunes business, or IBM went full-tilt into the services business, those were transformations. When Apple brings out the iPhone 5, or IBM brings out a new mainframe, that’s growth, because the market, the customer segment(s), and the value proposition are well-established. When Apple or IBM add capacity to data centers that support a wide range of enterprise activities, that’s running the business.
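As a rough illustration, the definitions above can be written as a small decision rule in Python. The names (`Investment`, `classify`, and the boolean fields) are hypothetical and not part of any portfolio-management standard, and treating a partly new initiative as growth is an assumption the definitions themselves do not settle.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    RUN = "run"
    GROW = "grow"
    TRANSFORM = "transform"


@dataclass(frozen=True)
class Investment:
    """Hypothetical description of an investment, for illustration only."""
    name: str
    tied_to_revenue_stream: bool        # connects to a particular customer segment?
    new_market: bool = False
    new_customer_segment: bool = False
    new_value_proposition: bool = False


def classify(inv: Investment) -> Category:
    # Run: essential activity with no direct link to a revenue stream.
    if not inv.tied_to_revenue_stream:
        return Category.RUN
    # Transform: new market, new customer segment, and new value proposition.
    if inv.new_market and inv.new_customer_segment and inv.new_value_proposition:
        return Category.TRANSFORM
    # Assumption: anything else tied to revenue is treated as growth.
    return Category.GROW


examples = [
    Investment("Apple enters the iTunes business", True, True, True, True),
    Investment("Apple brings out the iPhone 5", True),
    Investment("Apple adds data center capacity", False),
]
for inv in examples:
    print(f"{inv.name}: {classify(inv).value}")
```

Note that the size of the change appears nowhere in the rule; only the market, the customer segment, and the value proposition matter.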
Lots of enterprises and vendors use the word “transformation” to mean a big change, but the extent of change isn’t what defines a transformation. It’s not a “transformation” when a supply chain’s costs are dramatically reduced. That’s “growth” in the run-grow-transform model unless a new market is being addressed with a new value proposition. So for example, if Wal-Mart goes into the logistics business, taking advantage of its supply chain capabilities to offer supply chain services to other businesses, that’s transformation (new market, new value prop, new customer segment); if Wal-Mart restructures its supply chain to deliver lower cost, even drastically lower cost, that’s growth (lower cost of doing business/higher margins/more capability in established markets). The value of both growth and transformation investment is ultimately expressed in terms of ROI, which is feasible because we can connect the investment in change to a paying customer (and so have actual returns for the investment).
The value of run-the-business stuff is expressed in terms of price-for-performance, not ROI, in particular because there is no revenue stream (returns) to which run-the-business services can be connected. In that sense, email in the cloud, which some enterprises think of as “transformational,” is anything but—it’s simply about achieving competitive price-for-performance for an essential enterprise service that can’t be tied to a particular revenue stream, which is a classic run-the-business value proposition.
As my colleague Matt Cain has pointed out in his research, the price-for-performance ratio for cloud-based email may not really be all that superior, depending on how “performance” is defined. Low prices for cloud services currently may reflect low levels of provider investment in security/availability/reliability/upgradeability/etc. Enterprises pay for that lower performance over time, by supplying mechanisms to fill the gaps in the performance that their own IT team used to provide. Internal IT organizations long ago priced the costs of high performance into their services, and cloud providers will do the same sooner or later, meaning that at some point cloud pricing for run-the-business services will approach internal provider unit costs.
Until that happens, remember to focus first on performance in any negotiation on services, because performance drives price–and if performance requirements can’t be met, price is completely irrelevant.
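That ordering, performance first and price second, can be sketched as a simple filter-then-compare rule. Everything in the example is hypothetical: the `Offer` fields, the single availability threshold standing in for “performance,” and the figures themselves.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    """Hypothetical service offer; the fields are illustrative only."""
    provider: str
    price_per_user: float   # made-up monthly price
    availability: float     # made-up availability, in percent


def choose_offer(offers: Sequence[Offer], min_availability: float) -> Optional[Offer]:
    # Step 1: drop every offer that misses the performance requirement.
    qualifying = [o for o in offers if o.availability >= min_availability]
    # If nothing qualifies, price is irrelevant and there is nothing to choose.
    if not qualifying:
        return None
    # Step 2: only now compare the remaining offers on price.
    return min(qualifying, key=lambda o: o.price_per_user)


offers = [
    Offer("lowest-price cloud", 3.0, 99.0),
    Offer("mid-price cloud", 6.0, 99.95),
    Offer("internal IT", 8.0, 99.9),
]
best = choose_offer(offers, min_availability=99.9)
print(best.provider if best else "no offer meets the requirement")
```

With a 99.9% requirement the cheapest offer never enters the comparison, and raising the requirement beyond what any provider delivers leaves no choice at all, whatever the prices.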
Category: cloud value Tags: cloud, grow, run, transform, value