Re: Deidentification (ISSUE-188)

On 2014-07-23 06:11, Roy T. Fielding wrote:

> We aren't talking about a definition of significant chance.
> De-identified is a state of being -- either it is or it isn't.
> 
> Linking many records together with a transaction-id, for example, has
> nothing to do with whether the records can be linked to a user.
> They might just be all the records related to a product SKU.
> We can't say that a data set isn't de-identified just because there
> exists some common field among the records.
> 
> If the records cannot be linked to a user, they do not represent a
> privacy risk.  If any of the same-transaction-id records can be linked
> to the user, then all of them can and the data is not de-identified.
> The number of records simply doesn't matter.  What matters is at least
> one of them (or some correlation of them) can be linked to a particular
> user.

I'm starting to get the impression that you both want the same thing but
misunderstand each other. While I agree with most of the things you've
said above, I understand Vincent as trying to explain that beyond a
certain threshold a set of linked records will represent a pattern that
constitutes the link to the user. So the records themselves cannot be
linked to the user individually, but in concert they can. My suggestion
would be to include non-normative language that describes this issue (I
don't believe it is easily quantifiable now) and to provide a few
examples.
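
As one purely hypothetical illustration of records being linkable "in
concert", the sketch below assumes a tiny invented population and three
records that share a transaction-id. All names, attributes and values
are made up for the example and are not drawn from any real data set
or from the specification text.

```python
# Hypothetical population: person -> attributes (all invented for illustration).
population = {
    "alice": {"zip": "1011", "browser": "Firefox", "visits_at": "08:00"},
    "bob":   {"zip": "1011", "browser": "Chrome",  "visits_at": "08:00"},
    "carol": {"zip": "1012", "browser": "Firefox", "visits_at": "08:00"},
    "dave":  {"zip": "1011", "browser": "Firefox", "visits_at": "21:00"},
}

# Records assumed to share one transaction-id; each reveals a single attribute.
linked_records = [
    {"zip": "1011"},
    {"browser": "Firefox"},
    {"visits_at": "08:00"},
]


def candidates(record):
    """Return the people whose attributes are consistent with one record."""
    return {
        name
        for name, attrs in population.items()
        if all(attrs.get(key) == value for key, value in record.items())
    }


def candidates_for_linked(records):
    """Return the people consistent with every record in the linked set."""
    result = set(population)
    for record in records:
        result &= candidates(record)
    return result


# Candidate sets for each record on its own, then for the linked set.
per_record = [candidates(record) for record in linked_records]
combined = candidates_for_linked(linked_records)

for record, people in zip(linked_records, per_record):
    print(record, "->", sorted(people))
print("linked set ->", sorted(combined))
```

In this toy data each record on its own is consistent with three of the
four people, while the three records taken together are consistent with
only one. Where the threshold lies for real data is exactly the part
that is hard to quantify.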

Anything beyond that would warrant a WG on its own.

Regards,

  Walter

Received on Wednesday, 23 July 2014 07:00:08 UTC