Date: 2023-03-02

All these four-letter privacy acronyms continue to drive a bunch of client questions our way. GDPR, LGPD, CCPA/CPRA, PIPL… I bet we could fill a bingo card with a fair few more. Here’s another one: ‘DPIA’, the Data Protection Impact Assessment.



Number one: A DPIA is not to be confused with a PIA.

A PIA (Privacy Impact Assessment) ideally outlines, at the business-process level, which combinations of personal data are necessary for which predefined processing purposes, among other things. A DPIA, however, assesses the risks involved in more detail and shows you what ‘adequate’ security is. This includes considering what level of pseudonymization or anonymization to apply, to prevent further misuse of people’s information in identifiable form. But what is anonymous?



Number two: Compliance is not the same as acting with common sense.

If you had been about to launch a webshop in 1999 and had conducted a DPIA on your customers’ usernames and passwords, you may well have concluded that one-way hashing them with MD5 would suffice. Here’s the catch: regulations like GDPR may tell you to revise a DPIA for every new or considerably changed processing activity, but if you were happy as a niche webshop and did little more than keep basic security updated, you would never have given it another thought. Yet everyone knows that MD5 is as dead as can be, and any seven-year-old who can find the right URL (which I won’t post here, but there are literally dozens) can usually recover the original input for it and for many other unsalted hashes. That’s why security pros know to at least add a random salt and use a stronger hash, SHA-512 for example. You would have caught this in an annual revision of the DPIA, even in the absence of major changes to your webshop.
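To make the difference concrete, here is a minimal sketch in Python of the 1999-style approach next to a salted one. It is an illustration under assumptions, not a recommendation for your stack: the function names are made up for this post, the iteration count is an assumed work factor, and I have swapped plain salted SHA-512 for PBKDF2-HMAC-SHA512 (a salted, deliberately slow construction from Python’s standard library) because that is closer to what a current review would likely ask for.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 210_000


def md5_unsalted(password: str) -> str:
    """The 1999-style approach: fast, unsalted, same input gives same hash."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using a fresh random salt per password."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(key, expected)
```

Two customers with the same password get the same `md5_unsalted` value, which is exactly what makes lookup sites work. With `hash_password`, the random salt means the stored values differ even for identical passwords, so each one has to be attacked separately.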

Do privacy pros need to know these tech details?

Not necessarily. But they do need to know about reidentification risks. In 2012, a groundbreaking thesis on the topic rattled regulators, mainly European ones, and led to their guidance on singling out and anonymization in April 2014. That guidance was more or less overtaken in 2019 by researchers who showed that the problem keeps getting worse as time and technological capabilities progress. That was four years ago; who is to say we haven’t gotten even further today?
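For those who do want a feel for what ‘singling out’ means in practice, the sketch below counts how many people share each combination of seemingly harmless attributes. Everything in it is hypothetical: the records, the field names and the helper functions are invented for illustration, and a real assessment would look at far more than three columns.

```python
from collections import Counter

# Hypothetical records with names and SSNs already removed.
records = [
    {"zip": "1011", "birth_year": 1984, "gender": "F"},
    {"zip": "1011", "birth_year": 1984, "gender": "F"},
    {"zip": "1011", "birth_year": 1990, "gender": "M"},
    {"zip": "1012", "birth_year": 1975, "gender": "F"},
    {"zip": "1012", "birth_year": 1975, "gender": "F"},
    {"zip": "1013", "birth_year": 1962, "gender": "M"},
]


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def singled_out(rows, quasi_identifiers):
    """Return the rows whose combination of values is unique in the dataset."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row
        for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] == 1
    ]


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group: the 'k' in k-anonymity."""
    return min(group_sizes(rows, quasi_identifiers).values())


print(len(singled_out(records, ["zip", "birth_year", "gender"])))  # 2
print(k_anonymity(records, ["gender"]))  # 2
```

On gender alone nobody stands out, but combine it with zip code and birth year and two of the six people are unique, without a single direct identifier in sight.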

Then how to partake in this race?

Well, the above tells us that a focus on direct identifiers (‘PII’ in the old sense), like names and SSNs, isn’t enough for proper privacy. Even in the absence of strict regulation, we should know better by now. This is far from new. Silver bullets usually don’t exist either. Problems persist especially when a measure is implemented as a cure-all without detailed control. Take differential privacy. Several papers from several years ago already described it from various angles, but the key question from the start has been: ‘How do you manage the privacy budget?’ Many organizations say they protect privacy using the technology, but few of them are transparent about ‘where they put epsilon’, so to speak (I’m simplifying, but not by much). Would, for example, the U.S. Census Bureau approach be considered sufficient today? What about in two years?
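To show what ‘managing the privacy budget’ boils down to, here is a toy sketch: a counting query answered with Laplace noise, plus a small accountant that refuses further queries once the total epsilon is spent. The class and function names are my own, the accounting uses only basic sequential composition (epsilons simply add up), and Python’s `random` module with floating-point noise is not suitable for a real deployment. Treat it as a way to see the moving parts, nothing more.

```python
import random


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float) -> None:
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Deduct epsilon, or refuse if the query would exceed the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(
    true_count: int, epsilon: float, budget: PrivacyBudget, rng: random.Random
) -> float:
    """Release a count (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    budget.charge(epsilon)
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

With `PrivacyBudget(total_epsilon=1.0)`, two calls to `noisy_count` at epsilon 0.5 each go through and a third raises `RuntimeError`. A smaller epsilon means more noise per answer, a larger total means more can be learned overall, which is why ‘where they put epsilon’ is the number worth asking about.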

Prevent Pollution and Toxic Data

I firmly believe that the only perfect anonymization method is hard deletion. After all, data can hold value, but that value can certainly be negative as well:

– Data lying in wait, serving NO purpose established in the PIA, is only there to be breached. At that point it is a liability and has become nothing but toxic.

– ‘Eternity’ is an ill-advised retention period anyway: under privacy by design, one should keep data only while it serves such predefined, justified, documented purposes. Additional sanctions may just add insult to injury here.
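The two points above can be sketched as a simple retention check. The purposes, retention periods and record layout below are hypothetical; in practice they would come from the purposes documented in your PIA. Note that the sketch only decides what may be kept. Actually hard-deleting the rest from storage and backups is left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention schedule: documented purpose -> maximum age.
RETENTION = {
    "order_fulfilment": timedelta(days=730),
    "newsletter": timedelta(days=365),
}


def is_expired(record: dict, now: datetime) -> bool:
    """Expired if the purpose is undocumented or the record is too old."""
    limit = RETENTION.get(record["purpose"])
    if limit is None:
        return True  # no documented purpose means no reason to keep it
    return now - record["collected_at"] > limit


def purge(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records that may still be kept."""
    return [r for r in records if not is_expired(r, now)]
```

The design choice worth noticing is the default: a record whose purpose is not in the schedule is treated as expired, so data collected ‘just in case’ never survives a purge.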

So while we still hold the data for legitimate purposes, there certainly are ways to extract information without compromising privacy. At all times, however, realize:

Just because you have data, and you may not touch it in identifiable form, does that mean you can do anything with it?

That’s up to you. Did you obtain the data legally? Is it clear what you’re doing with it, and does it fit with why you obtained it? Are you about to use it in a potentially re-identifiable way? Then what transparently conveyed purpose does it serve? Whenever you’re not comfortable answering any of these questions, make sure you prevent even indirect re-identification, and know that your comfort zone of today may have been changed for you by tomorrow, while you were sleeping.