Re: Proposal from Big Basin break out

Kevin,

Given 3p mobile's explicit refusal to make any IPR commitments to this work, I'd like to request that you not engage in the technical work. Peter Cranstone should be able to explain the details to you.

Thanks,

Thomas Roessler, W3C <tlr@w3.org> (@roessler)




On 2013-05-11, at 22:31 +0200, Kevin Kiley <kevin.kiley@3pmobile.com> wrote:

> I think Walter is right to raise the issue of 'granularity' for geo data replacing IP addresses
> in (supposedly) de-identified data, and this needs more discussion. See my inline comments below.
>  
> > On May 8, 2013, at 10:58 AM, Walter van Holst wrote:
> > 
> >> Dear Brad,
> >>
> >> If I understand the document correctly, IP-addresses are 'de-identified' based on geolocation.
> >> What would be the lower bound on the granularity of such geolocation?
> >> Regards,
> >> Walter
> > 
> > Brad Kulick (Yahoo) responded:
> > 
> >> Walter,
> >> We did not explicitly discuss this point. Nor was there consideration to be prescriptive in this area.
>  
> Yet, after the Big Basin breakout, Shane Wiley (Yahoo) did report back to the group at the Sunnyvale F2F
> that the 'level of granularity' HAD (apparently) been discussed.
>  
> Minutes from the Sunnyvale F2F, Day 3, following the Big Basin breakout:
> http://www.w3.org/2013/05/08-dnt-minutes
>  
> Shane said (from the microphone, to the general assembly):
>  
> [snip]
>  
> Shane Wiley (Yahoo): Next step - remove IP and replace with *BROAD* geo data.
>  
> [/snip]
>  
> So this goes back to Walter's original question.
>  
> What did Shane mean by *BROAD* geo data (only)?
>  
> Country codes only? Postal codes only? NEVER any latitude/longitude?
>  
> Needs clarification, obviously.
>  
> >> Brad Kulick (Yahoo) also wrote...
> >> 
> >> The intention is that IP address is completely removed/replaced with geo data.
>  
> 'Completely removed' is good... still not sure about 'replacing' it with ANYTHING. See additional concerns below.
>  
> >> The granularity of the geo data would be determined with relation to the risk of re-identification that should be managed by the data controllers.
> >> Thanks,
> >> Brad (Kulick) (Yahoo)
>  
> I believe that converting IP address(es) to 'geo data' of almost ANY granularity creates a significant 'risk of re-identification', or
> at the very least directly violates BOTH of the pending 'Deidentified Data' definitions in the current TCS.
>  
> From the latest (published) Working Draft of the 'Tracking Compliance and Scope' (TCS) deliverable:
>  
> Published April 30, 2013
> http://www.w3.org/2011/tracking-protection/
>  
> [snip]
>  
> 3.7 Deidentified Data
>  
> OPTION 1
> Data is deidentified when a party:
> (1) has taken measures to ensure with a reasonable level of justified confidence that the data cannot be used to infer information about,
> or otherwise be linked to, a particular consumer, computer, or other device;
> (2) does not try to reidentify the data; and
> (3) contractually prohibits downstream recipients from trying to re-identify the data.
>  
> OPTION 2
> Data can be considered sufficiently deidentified to the extent that it has been deleted, modified, aggregated, anonymized or otherwise
> manipulated in order to achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information
> about, or otherwise be linked to, a particular user, user agent, or device.
>  
> Note(s):
>  
> The first option above is based on the definition of unlinkable data in the 2012 FTC privacy report;
> the second option was proposed by Daniel Kaufman. The group has a fundamental disagreement about whether internal
> access controls within an organization could be sufficient to de-identify data for the purposes of this standard.
>  
> Issue 188: Definition of unlinkable data
>  
> Issue 191: Non-normative Discussion of De-Identification
>  
> [/snip]
>  
> The latest 'proposal diagram' for de-identification, posted by Brad Kulick (Yahoo) on May 8, 2013:
> http://lists.w3.org/Archives/Public/public-tracking/2013May/att-0045/Proposal_rev_2.pdf
>  
> [snip]
>  
> Paramount rules...
>  
> 1. Once a record is de-identified, it can never be re-ID'd
> 2. You can never create a mapping between raw and de-identified records
>  
> Steps...
>  
> 1. Unique Ids
>     a. One-way secret Hash
> 2. IP Address
>     a. Replace w/geo data
> 3. URL cleanse
>     a. Filter user specific clues
> 4. Side facts
>     a. Remove elements that assist reverse ID
> 5. Unlink via 2nd application of one-way hash with salt/key #2, destroy salt/key #2 on some interval
>  
> Noteworthy: Accountability is required.
>  
> [/snip]
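For illustration only, steps 1a, 2a and 5 of the proposal can be sketched in Python. This is a minimal sketch under stated assumptions, not the Wiley/Kulick implementation: the "one-way secret hash" is assumed to be HMAC-SHA256, `GEO_TABLE` is a hypothetical country-level lookup standing in for a real geolocation database, the function names are hypothetical, and steps 3 and 4 (URL cleansing, side facts) are omitted.

```python
import hashlib
import hmac
import ipaddress
import secrets

# HYPOTHETICAL country-level lookup table (documentation address ranges).
# A real deployment would consult a geolocation database instead.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "NL",
}


def keyed_hash(key: bytes, value: str) -> str:
    """One-way secret hash: HMAC-SHA256 (an assumed choice of hash)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def coarse_geo(ip: str) -> str:
    """Map an IP address to a country code; 'ZZ' if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return "ZZ"


def deidentify(record: dict, key1: bytes, key2: bytes) -> dict:
    """Apply steps 1a, 2a and 5 to a copy of `record` (steps 3 and 4 omitted)."""
    out = dict(record)  # never modify the raw record in place
    # Step 1a: replace the unique ID with a one-way secret hash (key #1).
    pseudonym = keyed_hash(key1, out.pop("user_id"))
    # Step 2a: remove the IP address entirely; keep only coarse geo data.
    out["geo"] = coarse_geo(out.pop("ip"))
    # Step 5: unlink via a second keyed hash with key #2.
    out["user_id"] = keyed_hash(key2, pseudonym)
    return out


if __name__ == "__main__":
    key1 = secrets.token_bytes(32)
    key2 = secrets.token_bytes(32)
    raw = {"user_id": "cookie-123", "ip": "192.0.2.17", "url": "/news"}
    print(deidentify(raw, key1, key2))
    # "Destroy salt/key #2 on some interval": drop the only reference.
    # (This does not guarantee the key bytes are wiped from memory.)
    del key2
```

The values stored in `GEO_TABLE` are exactly the granularity question Walter raises: swapping country codes for postal codes or latitude/longitude would make the `geo` field far closer to the IP address it replaces.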
>  
> If step 2a is allowed (replacing the IP address with geo data rather than simply REMOVING it), then
> this (potentially) breaks 'Paramount rule 1' of the new Wiley/Kulick proposal under either of
> the optional definitions of 'de-identified data' currently codified in the TCS.
>  
> If the granularity of the geo data is not sufficiently restricted, then at any time the (supposedly)
> de-identified data can still (easily) be linked to 'a specific computer or device', depending on the
> realities of the underlying connection details.
>  
> If OPTION 2 becomes the accepted definition of de-identified data, then replacing the IP address with geo data would most certainly
> ALSO violate the 'used to infer information' clause of that definition under ANY circumstances.
>  
> Yours,
> Kevin Kiley

Received on Saturday, 11 May 2013 20:42:40 UTC