
On “Internally Lost Data” and DLP Discovery

By Anton Chuvakin | December 27, 2012 | 5 Comments


If a piece of sensitive data is visible to everybody with access to an organization’s network (say, posted to an internal file share), is that a data breach? Most people will say “no.” But what about an organization with 100,000 users, lots of Internet links and many facilities? Such an organization can never have tight control over who can access its network, and will also have at least a dozen compromised or infected endpoints on its network at any given time. Somehow, all the whining about “the perimeter is dead” never got connected to “the whole data breach thing.”

So, if …

  1. You have many thousands of users on your network
  2. You do not have tight Internet access policies, and old accounts are not always removed
  3. Your VPN access is via username/password, and such users are given access to the entire internal network (“just like in the office”)
  4. You probably allow BYOD
  5. Your anti-malware endpoint coverage is at 99%, with 99% of those updated (= you have thousands of machines with no effective AV – eh… provided you consider AV effective at all)
  6. You have legitimate and rogue wireless access all over the place.
  7. (in other words, you are a “typical” large company today)

Then why…

  • …do you not consider an Excel spreadsheet full of credit card numbers on an internal file share to be a data breach?

In any case, with this long preface out of the way, I wanted to focus on DLP discovery capabilities. “Security industry lore” indicates that at least some of the recent data incidents involved theft of data from “other than authorized” locations. Indeed, why hack SAP if you can own a mere workstation where Excel exports from the same application abound? Why hack payment systems if you can own test systems, where [in dire violation of PCI DSS] the same data resides? Why compromise a database inside the finance enclave if you can break into a backup server inside IT? The data in such locations is not as well protected (or: not protected at all), not encrypted, and access to it is not logged. It is essentially just there for the taking. Free data!!!

Thus, the phenomenon of “internally lost data” is way more pervasive than most people think. I’d bet that if you think it is pretty pervasive, then it is EVEN MORE pervasive. Confidential, regulated and “merely” sensitive data on “all access” internal file shares, SharePoint sites, team web servers, internal blogs, etc. is literally all over the place.

And the fact that you don’t know where the data is does NOT mean that the attacker won’t find it. Back in 2002, when I was heavily involved in honeypot research, we saw cases of attackers (who used to be called “script kiddies” back then) deploying simple data discovery scripts as part of their initial system takeover (along with backdooring, IRC botting and patching the holes they used to break in). Do you think this knowledge has been lost in the underground? No, it has not. Thus, you cannot simply rely on the obscurity of such data and the size of your messy, confusing IT environment.

Now, one may try to argue that, as far as DLP technology is concerned, it is STILL more useful to detect what is being stolen now vs. what is being exposed to the internal audience. “Yes, but!” If you only look at what is being taken out now, you are going to lose the DLP battle after a protracted, painful and frustrating fight. On the other hand, if you tighten down what is exposed internally AND watch for what is being taken out, you can lose the same battle with a lot more honor.

Moreover, you can do even better: a “sniff –> scan” approach has worked well for some organizations. They first saw *it* on the wire, got mad – and then got curious: where exactly is it stored internally? “Oh, in 537 different places!” Next they fought the battle to reduce the internal exposure, and then – surprise! – the occurrences of that piece of data being seen on the wire decreased as well…

So, if you have a DLP tool, plan to use its discovery capabilities. Hit those file shares, SharePoint sites, team servers, intranet web sites, etc., etc. And, yes, you need a process, not just a tool!
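The core of such a discovery scan is conceptually simple. As a rough illustration only – real DLP products use far richer contextual detection logic – here is a minimal Python sketch that walks a mounted file share and flags files containing Luhn-valid, card-number-looking digit runs. The function names and the deliberately naive regex are my own for illustration, not any particular product’s:

```python
import os
import re

# 13-16 digit runs, optionally separated by spaces or dashes
# (a deliberately naive pattern; real DLP engines use contextual rules)
PAN_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(candidate: str) -> bool:
    """Luhn mod-10 check, used to weed out random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(candidate)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pan_candidates(text: str) -> list:
    """Return Luhn-valid card-number-looking strings found in text."""
    hits = []
    for m in PAN_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_share(root: str) -> dict:
    """Walk a mounted file share; map file paths to candidate PANs found."""
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as fh:
                    hits = find_pan_candidates(fh.read())
            except OSError:
                continue  # unreadable file; skip and move on
            if hits:
                findings[path] = hits
    return findings
```

Even a toy like this, pointed at a few “all access” shares, tends to produce uncomfortable results – which is exactly why the process around triaging and remediating the findings matters more than the scanner itself.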


The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.



  • As every location of sensitive data is found and remediated, you are lowering your data breach risk. These unknown locations, often copies of production data, are not properly secured and present a huge risk to an organization. How can your data security staff protect data they do not even know exists?

  • Well, your argument makes sense. Sadly, this is not what many organizations actually do 🙁 I suspect they want to focus on actual, ongoing leakage and not on internal sprawl – without realizing that they will never win.

  • On a related note, one vendor describes it thus: “The problem with focusing on data-at-rest at the outset is that it focuses on implied risk, not actual risk.”

  • Terry Miller says:

    This is an age-old debate, pretty much along the lines of the “… Chicken or the Egg …” question. Truth is, neither the Chicken nor the Egg can substitute for the other. Now, in the context of DLP – if DLP is done properly and the right set of DLP tools and policies is deployed, the question of whether to do DLP at rest or in motion is greatly mitigated, if not completely obviated.

    If DLP is done properly, copying of important data to unsecured locations will not be allowed in the first place (is this a ‘Data in Motion’ problem?), while if important data has somehow landed on an unsecured location, then it should not go ‘undetected’ (is this a ‘Data at Rest’ problem?).

    To Anton’s point about the misguided approach of many (most?) companies – it is an unfortunate reality that many organizations are not all that astute about this. The more fundamental issue really seems to be that the traditional DLP vendors (and some of the self-proclaimed industry pundits on the payroll of these vendors) have done a tremendous job of screwing up the “belief system” as to how the DLP challenge should be addressed – and the IT folks in positions of responsibility in many organizations are really not the brightest lot, quite frankly, to be able to see through that.

  • Thanks for the comment.

    >Truth is neither Chicken nor Egg can substitute each other.

    Yes, indeed. It does sound like it sometimes and people still hold pretty fierce debates about this.

    >if DLP is done properly, copying of important data on unsecured
    >locations will not be allowed

    to unsecured INTERNAL locations? I have rarely seen a deployment of DLP that would block it (and not because the tool cannot do it, but because the team does not architect it that way)