Jay Heiser
Research VP
6 years at Gartner
24 years IT industry
Jay Heiser is a research vice president specializing in the areas of IT risk management and compliance, security policy and organization, forensics, and investigation. Current research areas include cloud and SaaS computing risk and control, technologies and processes for the secure sharing of data…
by Jay Heiser | August 29, 2012 | Comments Off
Replacing a button on one of my customer-facing shirts this weekend motivated some thoughts on resiliency.
Why did the button fall off in the first place? It was sewn on by machine, a clever bit of automation that is based on interlocking threads from the top of the garment with threads from the bottom. It is a fast, efficient, and inexpensive manufacturing process. It’s a sort of leverage.
The downside of machine stitching is that it lacks resiliency. Once any part of the thread suffers a loss of integrity, the entire mechanism begins to unravel. The entire strength of the joint rests on every single link of the chain of fiber, which is under constant stress from motion and washing.
In contrast, when a human sews a button on manually, the same piece of thread is looped around and around. Unlike a sewing machine, a human can push the same needle down from the top, and up from the bottom. This is slow, inefficient, and expensive. The result is a highly robust join of button to shirt with no single point of failure.
The upside of manual stitching is that it maximizes resilience. Even if part of the thread breaks or is worn, the rest of the loops remain in place. The button cannot fall off unless every single loop fails.
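The difference between the two stitches can be put in rough numbers. The sketch below is illustrative only: it assumes every link or loop fails independently with the same probability, and the function names and figures (1% per link, 20 links, 4 loops) are made up for the example, not measurements of any real shirt.

```python
def chain_failure_probability(p_link: float, links: int) -> float:
    """Machine stitch modeled as a chain: the joint fails if ANY link fails."""
    return 1.0 - (1.0 - p_link) ** links


def redundant_failure_probability(p_loop: float, loops: int) -> float:
    """Hand stitch modeled as redundant loops: the joint fails only if ALL loops fail."""
    return p_loop ** loops


# Assumed, purely illustrative numbers.
P_FAIL = 0.01
machine = chain_failure_probability(P_FAIL, links=20)
hand = redundant_failure_probability(P_FAIL, loops=4)

print(f"machine-stitched joint fails with probability {machine:.4f}")
print(f"hand-stitched joint fails with probability {hand:.2e}")
```

With these assumed numbers the chain is far more likely to fail than any single link, while the redundant loops are far less likely to fail than any single loop.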
No enterprise can afford to hand stitch every single one of its digital buttons, but critical applications need to avoid short cuts that introduce single points of failure.
Category: risk management Tags: resiliance, resiliency
by Jay Heiser | August 13, 2012 | 2 Comments
Has anyone ever created a web-based application that wasn’t flaky and prone to data loss?
Every time Facebook comes out with some new functionality, the entire service gets slower, and harder to use. I’m not sure that there could be a more efficient way to lose text as it is entered than by trying to type it into Facebook in real time, where it disappears into limbo as some new advertising link is downloaded, or a content change causes the whole page to slide out of visibility. I’ve been using web-based email for over 15 years. Without any useful increase in the basic functionality (other than some vendors being able to control spam), every year it gets slower and less reliable. Recognizing my impatience with the thing, Yahoo Mail sometimes offers to let me revert to the ‘classic’ and simpler version, the less attractive but infinitely more reliable interface. Unfortunately, it never seems to offer to let me do this unless I’m at such a low bandwidth location that online text entry is infeasible.
Wacky web isn’t just a consumer problem. Like most Gartner bloggers, I use Live Writer to create my blog entries locally. Could there be a simpler text editing process than creating the short and minimally-formatted material that constitutes the typical blog posting? Yet few bloggers have the patience to do all of their text entry and editing online through a web browser. If you’ve ever tried to paste some pictures into a blog and make sure the thing comes out the way you expected, you’ve had even bigger motivation to use a local client for composing blog server material. Inside Gartner, we’ve been experimenting for at least a year with a browser-enabled package of high interest to all customer-facing staff. The number of mobile code modules and the volume of dynamic content downloaded on an ongoing basis mean that every user has their own special failure modes. There’s a lot to be said for the Lotus Notes and Outlook clients. I’m pretty happy with the calendar and email clients on my iPhone, which reliably cache data locally, gracefully and almost invisibly dealing with the inevitable perturbations of the packet-based Internet.
Thick clients are reliable and provide a rich experience, but are old-fashioned and don’t generate the right color of money. Thin clients are cheap and functional, but few have the courage to implement something so mundane. As a result, we’ve been saddled with the middleweight client, a beast of a thing that maximizes the disadvantages of all possible network architectures.
I’ve been told that the solution to this problem is HTML5, which I’m further told is not really a single technology, so much as a set of related things that have equal rights to the same dorky logo. We will increasingly overload the stateless HTTP protocol with state mechanisms, becoming even more dependent upon ever greater volumes of code that is downloaded in real time. We’ve already reached the point where the browser (i.e., the malware invitation layer), just one layer in a growing stack of nested mechanisms, is itself more complex than a Sun 3.
I’ve got a simple solution to inflated expectations, technology bloat and pervasive plugins. I suggest that instead of using the old, unfashionable term ‘client server’ to refer to the simplest, most secure, and most reliable form of networked computing, we rename it as ‘HTML 6’.
Category: Applications Tags: client server, HTML5, malware, reliability, www
by Jay Heiser | August 10, 2012 | 2 Comments
The process in which the buyer asks a random list of questions that might have some minor relevance to some aspect of a provider’s security posture, and the potential provider pretends to answer them.
Category: Cloud risk management security Tags: cloud computing risk, cloud security standards, risk assessment, security
by Jay Heiser | August 8, 2012 | Comments Off
I was recently forced to change my password on a UK pension system, and my first 4 password offerings were unacceptable. I was baffled as to what part of the password didn’t meet the requirements. Today, I needed to log in and review a pay stub, had to reset my password, and the exact same thing happened.
My locally-generated 8-character password was h{35(Kmp . As happened on the pension site yesterday, today the payroll system responded that it was out of compliance with their requirements:
Length: between 8 and 14 characters
Must contain at least:
1 upper case letter
1 lower case letter
1 number
1 standard special character (like %,@, or #).
Passwords are case sensitive.
You may not re-use any of your last 4 passwords.
Logon ID and password cannot match.
Although this was a completely different site, on a different continent, I decided that it might be suffering from the exact same bug. They didn’t really mean ‘like %, @, or #’. What they really meant was ‘including one of the following 3 characters: %, @, or #’. Substituting a pair of @ symbols for the 2 characters that were much more special to me, the system immediately accepted my password choice. This was simultaneously a diction fault, and an egregious avoidance of entropy.
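The two readings of the rule are easy to tell apart in code. This is a hypothetical sketch, not the payroll site’s actual validation logic: `meets_policy`, `LITERAL_SPECIALS`, and `BROAD_SPECIALS` are names invented here, the broad set assumes that any ASCII punctuation counts as ‘special’, and the password reuse and logon ID rules are left out.

```python
import string

# Reading 1: only the three characters named in the policy text are accepted.
LITERAL_SPECIALS = set("%@#")
# Reading 2 (assumption): any ASCII punctuation character counts as special.
BROAD_SPECIALS = set(string.punctuation)


def meets_policy(password: str, specials: set) -> bool:
    """Check the length and character-class rules quoted in the policy."""
    return (
        8 <= len(password) <= 14
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in specials for c in password)
    )


rejected = "h{35(Kmp"   # the password from the post
accepted = "h@35@Kmp"   # same password with the two specials swapped for @

print(meets_policy(rejected, BROAD_SPECIALS))    # passes under the broad reading
print(meets_policy(rejected, LITERAL_SPECIALS))  # fails under the literal reading
print(meets_policy(accepted, LITERAL_SPECIALS))  # passes once @ is substituted
```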
A Google search for ‘standard special characters’ results in 40,000 hits. It’s anybody’s guess whether these 2 different financial systems are operating under IBM’s definition of standard special, the HTML definition, or what. I’m sure that the Linux and Unix communities are happy to consider parens and braces as standard special characters. Given that they are ASCII characters, widely available on digital input devices such as keyboards, I personally feel that they are very special characters.
Given that so few passwords are actually cracked these days, but are instead slurped by malware, complexity requirements are a farcical exercise. Pretending to be complex, while limiting a user’s choice of characters, is just stupid and annoying.
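For a sense of scale, the cost of the restriction can be estimated for randomly generated passwords. The sketch assumes each character is chosen uniformly and independently from the allowed alphabet (true of a password generator, not of human-chosen passwords). The alphabet sizes are also assumptions: 62 letters and digits, plus either the 3 named specials or the 32 ASCII punctuation characters.

```python
import math
import string


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


ALNUM = len(string.ascii_letters) + len(string.digits)  # 62 letters and digits
RESTRICTED = ALNUM + 3                                   # only %, @, # allowed
FULL = ALNUM + len(string.punctuation)                   # all ASCII punctuation

restricted_bits = entropy_bits(8, RESTRICTED)
full_bits = entropy_bits(8, FULL)
lost_bits = full_bits - restricted_bits

print(f"restricted alphabet: {restricted_bits:.1f} bits")
print(f"full alphabet:       {full_bits:.1f} bits")
print(f"lost to the policy:  {lost_bits:.1f} bits")
```

Under these assumptions an 8-character random password loses a little over 4 bits, which shrinks the search space by a factor of about 19.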
Category: Cloud security Tags: authentication, password complexity, password reuse, password slurping, passwords
by Jay Heiser | August 7, 2012 | Comments Off
The financial sector links otherwise weakly coupled economic sectors, particularly during economic declines. Such links increase economic risk and the extent of cascading failures. Our results suggest that firewalls between financial services for different sectors would reduce systemic risk without hampering economic growth.
From “Networks of Economic Market Interdependence and Systemic Risk”, by Dion Harmon, Blake Stacey, Yavni Bar-Yam, Yaneer Bar-Yam
As a society we have largely ignored the implications of rising complexity because we are adaptive to it. At its core, furthermore, grasping the vast conditional complexities of our dependencies is an intuitive exercise, which strives for a picture of the whole when we can see only the parts. This is an anathema to the analytic culture that prizes computable precision.
From “Trade-Off Financial System Supply-Chain Cross-Contagion: a study in global systemic collapse”. by David Korowicz
Category: risk management Tags: complexity, risk, systemic risk
by Jay Heiser | August 3, 2012 | Comments Off
If you wanted to sabotage a trading system, you might set out to design suicide mechanisms that look very much like today’s automated trading mechanisms. Blaming Knight Capital’s screwed pooch on a ‘software bug’ is a simplistic and flawed starting point for understanding the bigger risk picture.
Automated mechanisms within trading systems act as positive feedback loops, latching onto any tiny bits of information, and leveraging them into buys or sells. Any significant success in the use of a trading algorithm inevitably invites attention, which invariably results in reverse engineering and an escalating rate of copying until the opportunity is over-exploited (fished out) and disappears. Arbitrage does provide liquidity into a market, and those who engage in it claim, with some justification, that their efforts are ensuring that all buys and sells are based on the maximum access to market information.
Under normal circumstances, opportunity algorithms tend to work towards their own obsolescence, but if events unfold too rapidly for a graceful dismantling of a particularly popular algorithm, then the trading floor turns into the Indian electrical grid. Automated market runs happen when a perfect storm scenario is created by a sufficient number of identical trading algorithms that have not yet obsoleted each other, all suddenly kicking into extraordinary levels of play because of some significant new information.
Temporarily avoiding politically-loaded words such as ‘regulation’, let’s try to objectively understand what types of things could prevent feedback-based systems from spiraling into feedback-based overload. If you set up a microphone, an amplifier, and a speaker, and the mike can pick up sounds emanating from the speaker, the natural tendency is for a feedback loop, causing a loud and painful destabilization of the amplifier. All modern public address systems have mechanisms (now digital, originally variable capacitors) that avoid unwanted oscillations by damping feedback. AWS splits its cloud up into regions that prevent failures in one region from cascading into another. Lawnmower engines have a simple mechanical governor that feeds back information on engine speed into the fuel system. Splitting the US power grid into multiple systems ensures that no single point of failure can simultaneously impact 48 states. As explained in a recent MSDN blog, “Windows Azure’s network infrastructure uses a safety valve mechanism to protect against potential cascading networking failures by limiting the scope of connections that can be accepted by our datacenter network hardware devices.”
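The effect of a governor can be shown with a toy loop. This is a deliberately crude sketch, not a model of any real trading system or amplifier: `simulate`, `gain`, and `limit` are hypothetical names, the loop simply multiplies a signal by a fixed gain on every pass, and the governor is modeled as a hard clamp.

```python
from typing import List, Optional


def simulate(gain: float, steps: int, limit: Optional[float] = None,
             signal: float = 1.0) -> List[float]:
    """Feed a signal back through a fixed gain, optionally clamped by a governor."""
    history = []
    for _ in range(steps):
        signal *= gain  # positive feedback: the output becomes the next input
        if limit is not None:
            # Governor: never let the signal leave the range [-limit, +limit].
            signal = max(-limit, min(limit, signal))
        history.append(signal)
    return history


ungoverned = simulate(gain=1.5, steps=10)
governed = simulate(gain=1.5, steps=10, limit=5.0)

print("ungoverned final value:", ungoverned[-1])
print("governed final value:  ", governed[-1])
```

With a gain above 1 the ungoverned signal keeps growing on every pass, while the clamped one settles at the limit and stays there.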
ALL COMPLEX, AND MANY SIMPLE SYSTEMS, NEED GOVERNORS AND ANTI-FEEDBACK MECHANISMS TO MAINTAIN STABILITY. Given this basic fact of engineering design, it is a wonder of today’s economy that trading systems work as well as they do, given that the participants in the system have financial incentive to implement anti-anti-feedback mechanisms.
When the participants in a trading system refuse to self-govern, and game theory suggests that this will inevitably be the case, then the only possibility for self-governance is for the exchange to force a mechanism onto the market–or a government does so. This isn’t about finding and fixing ‘bugs’.
Category: Policy risk management Strategic Planning Tags: brittleness, cascading failure, reliability, resiliency, systemic risk, too big to fail
by Jay Heiser | August 1, 2012 | 1 Comment
I spent a frustrating 5 minutes this weekend enduring a forced password change on a retirement account containing $400. I was sure that the randomly generated and completely unmemorizable string my password utility came up with exceeded 7 characters, contained upper and lower case letters, at least 1 number, and a special character. It finally sank in that the detailed password complexity policy only considered 3 characters as special ones, an inexplicable avoidance of entropy that was incompatible with my automated choice of the underline character. What a useless exercise.
Speaking of password compromise, it had recently been reported that some individuals believed their email addresses had been compromised through Dropbox. After an investigation, Dropbox has determined what happened, explaining in a July 31 blog post:
Our investigation found that usernames and passwords recently stolen from other websites were used to sign in to a small number of Dropbox accounts. We’ve contacted these users and have helped them protect their accounts.
A stolen password was also used to access an employee Dropbox account containing a project document with user email addresses. We believe this improper access is what led to the spam. We’re sorry about this, and have put additional controls in place to help make sure it doesn’t happen again.
It seems that both a Dropbox employee and several Dropbox users had the same password within Dropbox as they had on at least one system outside of Dropbox. The passwords were stolen outside of Dropbox (my money is on password slurping malware), so no amount of password complexity on the part of Dropbox could have prevented these incidents. The only difference between a 1024 bit random string and a 3-character first name would be that one comes with a false sense of security.
As I pointed out last December, the exploit community has long recognized that a high percentage of people use the same password on multiple systems. Once somebody finally comes up with an acceptably complex password that they can remember, who can blame them for wanting to use that password on multiple systems?
Maybe I’m being overoptimistic, given their brain-dead complexity requirement, but I find it very hard to believe that the login mechanisms for my pension site, or Dropbox, would allow brute force attacks. So what’s the point of password complexity requirements, or any of the other useless and impractical policies typically associated with this fatally flawed mechanism? It’s a cynical exercise in denial.
The sad truth is that passwords are a problem that nobody really wants to solve. Users want to do whatever is easiest, and don’t want to be burdened by the inconvenience of strong authentication. System owners don’t want to spend any money on stronger authentication, and lack the will to enforce an unpopular mechanism on users.
If slurping is more common than cracking, then complexity is counterproductive. We are trapped in a cynical convention in which system owners can claim that they are doing everything possible to protect their users, when in reality, they are doing everything they can to leave their users out to dry. Recognizing that a vulnerability exists, and choosing to consciously live with it and manage it is an acceptable risk management decision. Pretending that an obsolete practice has solved the problem is a cynical exercise, an institutional abdication of responsibility that consists of dumping the hot potato of risk into the laps of a user base that has no choice but to play along if they want to participate in the economy.
Category: security Tags: authentication, Dropbox, hacking, password slurping, passwords, SaaS security, security
by Jay Heiser | July 4, 2012 | 3 Comments
I managed to miss the excitement of yet another weather-related disaster by being in Japan during the derecho incident that knocked out power to approximately 3 million customers, including my parents in Ohio and our house in Virginia, 400 miles away.
Another disaster that I just missed took place here in Japan, a major failure at a large and prominent cloud service provider, FirstServer. While this was extremely inconvenient for the approximately 5000 customers (1/10 of the total) who experienced unrecoverable data loss, it provided a great example to cite in my Disaster in the Cloud presentation, which I had been scheduled to present multiple times. Off the incident radar outside of the Japanese-speaking world, Gartner’s Tokyo-based security and consulting staff explained that this June 20th failure was a high-profile data loss event at a prominent national provider.
Like the Gmail outage early last year, which required 4 days to recover service for what Google described as constituting only 0.02% of their user base, and an AWS incident about the same time that resulted in some permanent loss of data, the FirstServer incident was the result of a software upgrade.
It is hardly surprising that live upgrades of clouds sometimes result in failures. Replacing a code module within a running service is the equivalent of transplanting an organ without any anesthetic. Not only does the patient not have any anesthetic, the patient isn’t even lying down. The operation takes place while the patient is hard at work, performing heavy lifting on behalf of thousands of tenants simultaneously.
The economics of cloud computing drive service providers to make frequent software upgrades, and to do it without any downtime. Success in a highly-leveraged market is dependent upon providing as many forms of service to as many tenants as possible, virtually ensuring that most commercial clouds will be highly dynamic collections of interdependent software modules. The large number of customers makes it difficult or impossible to schedule downtime.
The good news is that a huge number of live upgrades take place without any negative impact. Failures have been relatively uncommon, and most have not yet resulted in unrecoverable data loss.
Cloud service buyers should ask current and potential providers about their software upgrade practices, but should not expect any agreement on the best practices for evaluating any particular provider’s practices. One thing I am confident about: however useful an ISO 27000 certification or a SOC2 or SOC3 assessment may be, it is unlikely to go into much depth on this increasingly critical topic.
Category: Cloud risk management security Tags: data loss, failure, quality, resiliance, software, upgrades
by Jay Heiser | June 20, 2012 | Comments Off
It is only Wednesday, and already I’ve reviewed at least 3 different policies that require employees to obey applicable laws. This is not just self-evident; it’s a professional cop-out.
Somebody doesn’t need to spend years at a prestigious law school and then suffer through an 80-hour a week apprenticeship at a major law firm to provide you with the advice that the law requires…obeying the law. I would hazard a bet that virtually every one of your employees already knows this. Reminding them of this self-evident and universal requirement may provide the enterprise with some CYA, on the outside chance that one of your employees breaks the law on your time, but it has virtually no positive effect on what individuals actually do on a day to day basis. (“Doh! I suddenly remembered not to break the law!”)
When a lawyer provides you with verbiage that says “You must obey all applicable laws,” it either means that they are lazy, or they are ignorant, neither of which is considered desirable for high-paid corporate counselors. Any time a lawyer (or an auditor, or an information security specialist) provides you with either an open-ended category of risk, or an awkwardly long list of possible risks, take it as a sign that their priority is self-protection, and they really do not care to help you do your job. If anything bad happens, then they can say that they warned you, and it is your fault, not theirs.
A policy element that says “you must obey all applicable laws,” is useless, unless some legal expert has the courage to provide a list of what those laws are, and what needs to be done to follow them. Instead of demanding that the business units obey all laws, how about a policy requiring corporate legal to provide a list of all relevant regulations, in priority order?
The brutal fact of the matter is that nobody knows what laws potentially apply to the corporate use of information. The legal field can easily come up with a list of laws that have applied in the past, and it can provide some degree of speculation on where regulatory actions are likely to take place in the future. However, in our increasingly ambiguous and complex world, legal surprises are going to happen. That’s a cost of doing business. If ambiguity exists, it should be identified as such, it should be the subject of a business decision, and if the decision is made to accept the risk, then the situation should be monitored. That’s a perfectly reasonable approach. What is unreasonable is to expect business managers or end users to be legal experts, dumping responsibility in their laps for obscure regulatory risks that the legal profession refuses to take a stand on.
Category: IT Governance Policy risk management Tags: law, lawyers, policy, regulatory compliance
by Jay Heiser | June 19, 2012 | Comments Off
It’s not that I am categorically against the idea of law, but I am convinced that your typical corporate counsel is more motivated by personal convenience than by a sense of organizational proportion.
I recognize why virtually every organizational IT policy has the requirement “you must obey the law”, but I question the utility of it.
Has there EVER been a documented case in which an organization managed to protect itself by placing this bit of legal voodoo inside their end user or acceptable use policy? Has there EVER been an example of a company that actually could NOT discipline an employee who significantly broke a law through some IT-related activity, just because they had not proactively taken the time to write a generic policy against illegalities?
I’d love to see some case law on this one.
Category: Policy risk management Tags: law, legalism, policy