RE: Proposal from Big Basin break out

Walter,

Much of this was discussed in the room:

Phase 1:  Raw Data Permitted Uses:  Security/Fraud, Frequency Capping, Debugging, (some) Financial/Audit
  <Areas where an operational ID is needed.  The goal here is to define shorter retention timeframes where possible>

Phase 2:  De-Identified Permitted Uses:  Financial/Audit, Product Improvement, Market Research
  <One-way secret hash, operational/administrative controls.  Data cannot be used for production purposes, i.e. it cannot be used to alter a specific user's online experience.>

Phase 3:  Unlinked Data:  Any use
  <Second one-way secret hash, further data minimization and/or aggregation, key is destroyed>
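
A minimal sketch of how the three phases could fit together, for illustration only. The thread does not say which hash is meant, so HMAC-SHA256 is an assumption here, as are the field names (user_id, ip, event) and the helper names keyed_hash, to_deidentified and to_unlinked. Letting the one-time key go out of scope merely stands in for real key destruction.

```python
import hashlib
import hmac
import secrets


def keyed_hash(value: str, key: bytes) -> str:
    """One-way keyed hash of a value (HMAC-SHA256, an assumed choice)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_deidentified(raw: dict, key: bytes) -> dict:
    """Phase 1 -> Phase 2: replace the operational ID and drop the IP address."""
    return {"pseudo_id": keyed_hash(raw["user_id"], key), "event": raw["event"]}


def to_unlinked(records: list) -> list:
    """Phase 2 -> Phase 3: re-hash under a one-time key that is never stored."""
    one_time_key = secrets.token_bytes(32)  # discarded when the function returns
    return [
        {"unlinked_id": keyed_hash(r["pseudo_id"], one_time_key), "event": r["event"]}
        for r in records
    ]


# Hypothetical Phase 1 records carrying an operational ID.
raw_records = [
    {"user_id": "cookie-abc", "ip": "192.0.2.17", "event": "impression"},
    {"user_id": "cookie-abc", "ip": "192.0.2.17", "event": "click"},
    {"user_id": "cookie-xyz", "ip": "198.51.100.4", "event": "impression"},
]

# The Phase 2 key would sit behind operational/administrative controls.
phase2_key = secrets.token_bytes(32)
phase2 = [to_deidentified(r, phase2_key) for r in raw_records]
phase3 = to_unlinked(phase2)
```

Within one Phase 3 batch the same user still maps to the same unlinked_id, so frequency and clustering analysis remain possible, but a later batch gets a fresh key and cannot be joined back to this one or to the Phase 2 data.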

- Shane

-----Original Message-----
From: Walter van Holst [mailto:walter.van.holst@xs4all.nl] 
Sent: Saturday, May 11, 2013 1:56 PM
To: Shane Wiley
Cc: Kevin Kiley; public-tracking@w3.org; Brad Kulick
Subject: RE: Proposal from Big Basin break out

On 2013-05-11 22:49, Shane Wiley wrote:
> Kevin,
> 
>  While the tri-state de-identification scheme does not dictate 
> specific IP Address replacement guardrails, I believe the "reasonable"
> tenet is the one to focus on here. For example, if IP Address is 
> replaced with Postal Code (5 digit, not 9 digit) then I believe most 
> record sets would continue to be deemed de-identified. But let's say 
> another team is looking only at a hyper-location data subset and the 
> record set contains only the de-identified ID (separate key from other
> systems) and the lat/long for that ID. With only these data points, a 
> team can look at the frequency of events and geospatial clusters 
> over time, but would not have the means to reverse identify the data 
> set as no side facts/data exist. It's this type of balance that is 
> difficult to prescriptively outline upfront and why standards focus on 
> principles and allow innovation to occur within those boundaries.

Dear Shane,

Before we go deeply into the details, I personally believe that the hashing steps at both the beginning and the end of the de-identification process are much more important than any postal codes (even the four-digit two-character ones of my country of origin). What kind of hashes would be part of the proposal?

Moreover, I feel that the proposed scheme is lacking in any prescriptive power for the permitted uses. For the permitted uses I would feel much more comfortable with some guidance on both pseudonymisation and de-identification. The latter is easily achieved if we get to a consensus on de-identification in general.

Regards,

  Walter

Received on Saturday, 11 May 2013 21:02:44 UTC