- From: Rigo Wenning <rigo@w3.org>
- Date: Sun, 23 Sep 2012 21:32:06 +0200
- To: public-tracking@w3.org
- Cc: Ed Felten <ed@felten.com>

Ed,

On Thursday 13 September 2012 17:03:09 Ed Felten wrote:
> I have several questions about what this means.
> (A) Why does the definition talk about a process of making data
> unlinkable, instead of directly defining what it means for data
> to be unlinkable? Some data needs to be processed to make it
> unlinkable, but some data is unlinkable from the start. The
> definition should speak to both, even though
> unlinkable-from-the-start data hasn't gone through any kind of
> process. Suppose FirstCorp collects data X; SecondCorp collects
> X+Y but then runs a process that discards Y to leave it with only
> X; and ThirdCorp collects X+Y+Z but then minimizes away Y+Z to
> end up with X. Shouldn't these three datasets be treated the
> same--because they are the same X--despite having been through
> different processes, or no process at all?

For data protection people like me, unlinkable data is not within the
scope of data protection measures, or "privacy" if you want. It is
therefore rather natural to talk only about linkable data in our
Specifications, meaning data linked to a person, and to address only
what to do with that linkable data and its link to a person. This may
encompass a definition of what makes data "linkable". But it would go
too far to define what is "unlinkable".

Having done research about data being "unlinkable" (Slim Trabelsi/SAP
has created a nice script to determine the entropy allowing for
de-anonymization), I think a definition of "unlinkable data" would
import that scientific dispute into the Specification. I would not
really like that to happen, as it would mean another point of endless
debate. You can see this already happening in the thread following
your message.

Just asking for data to be "unlinkable" leaves the art of making that
happen to every little webmaster in the world, instead of using the
expertise we have here to find the right compromise between the effort
of anonymization and the privacy threat involved.

> (B) Why "commercially
> reasonable" rather than just "reasonable"? The term "reasonable"
> already takes into account all relevant factors. Can somebody
> give an example of something that would qualify as "commercially
> reasonable" but not "reasonable", or vice versa? If not,
> "commercially" only makes the definition harder to understand.

Yes, I think "commercially" is definitely an accident in that
definition, especially as in a democratic society commercial companies
are allowed to be commercially unreasonable.

> (C) "there is confidence" seems to raise two questions. First,
> who is it that needs to be confident? Second, can the confidence
> be just an unsupported gut feeling of optimism, or does there
> need to be some valid reason for confidence? Presumably the
> intent is that the party holding the data has justified
> confidence that the data cannot be linked, but if so it might be
> better to spell that out.

I think the "confidence" is a null/zero requirement. If someone easily
de-anonymizes data that you were confident about, the legal system
will choose the horizon of a "reasonable person". And if you relied on
light anonymization tools, you were not reasonable to assume
confidence.

> (D) Why "it contains information which could not be linked" rather
> than the simpler "it could not be linked"? Do the extra words
> add any meaning? (E) What does "in a production environment" add?
> If the goal is to rule out results demonstrated in a research
> environment, I doubt this language would accomplish that goal,
> because all of the re-identification research I know of required
> less than a production environment. If the goal is to rule out
> linking approaches that aren't at all practical, some other
> language would probably be better.

Ed, you can link data together that is not personal data. The
definition needs some better wording here, because only the fact of
linking personal data with other personal data and other data creates
a problem.
The fact of linking data without personal connotation is simply out of
scope of the entire privacy concept. I agree that "in a production
environment" is meaningless.

Rigo

Received on Sunday, 23 September 2012 19:32:30 UTC