Re: June Change Proposal: text on de-identification

On Jun 25, 2013, at 16:04 , Shane Wiley <wileys@yahoo-inc.com> wrote:

> John,
> 
> We all agree with that sentiment.  

Yes!

> Where there is some disagreement is what is considered "bad" versus what is "okay".  

I think we probably should appeal to a standard, rather than identify and bless any method.  That method might not fit with some people's needs or practices, might de-identify less than they need, and might (despite our best efforts) later be found to have a flaw.

> We also agree on "perfect" on one extreme of the spectrum as well - full deletion or aggressive aggregation along the lines of k-anonymity.

Also true.  Can we hold people to a "generally accepted quality of de-identification" (at the time the de-id occurred)?  What's the best way to phrase that we want a quality result?

> 
> - Shane
> 
> -----Original Message-----
> From: John Simpson [mailto:john@consumerwatchdog.org] 
> Sent: Tuesday, June 25, 2013 3:44 PM
> To: David Singer
> Cc: Nicholas Doty; public-tracking@w3.org Mailing List
> Subject: Re: June Change Proposal: text on de-identification
> 
> I agree we don't want to allow badly de-identified data to be kept under DNT: 1.
> 
> On Jun 25, 2013, at 11:06 AM, David Singer <singer@apple.com> wrote:
> 
>> 
>> On Jun 25, 2013, at 10:08 , John Simpson <john@consumerwatchdog.org> wrote:
>> 
>>> I'm not sure what the out-of-scope statement is meant to signify...  Seems to me the key is whether the data is de-identified or not.
>> 
>> Right.
>> 
>> If data is truly not linked to anyone, their device etc., then we no longer care about it: it's out of our scope.  It's not that we're "allowing" you to remember you sold a copy of '1984' at 5:53pm on Thursday, it's that we have nothing to say about data that is not 'tracking'.  There is a lot of data around, only some of which is 'tracking'.  We only care about 'tracking'.
>> 
>> The first sentence could be seen as implying that even if your de-identification isn't so hot, and the data still is 'tracking' (it can be linked to someone, their device or user-agent), the spec. gives you permission to keep it.  I don't think we want or have agreed to this.
>> 
>> I think we mean "get it clean, make sure it's not tracking data any more, and you are off our patch".  That's the second sentence.
>> 
>>> 
>>> 
>>> On Jun 25, 2013, at 9:58 AM, David Singer <singer@apple.com> wrote:
>>> 
>>>> By the way, the draft contains two different statements about de-identified data in section 5:
>>>> 
>>>> "When a third party receives a DNT:1 signal, that third party may nevertheless collect, retain, share or use data related to that network interaction if the data is de-identified as defined in this specification."
>>>> 
>>>> and
>>>> 
>>>> "It is outside the scope of this specification to control the collection and use of de-identified data."
>>>> 
>>>> I think the second is the correct statement, isn't it?  (If it truly isn't 'tracking' data, it's not in scope).  Is this an editorial oversight, or should I post a change proposal?
>>>> 
>>>> 
>>>> On Jun 25, 2013, at 0:28 , Nicholas Doty <npdoty@w3.org> wrote:
>>>> 
>>>>> Hi David,
>>>>> 
>>>>> I've updated the de-identification page to include your proposal next to the other and the editors' draft text: http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Deidentification
>>>>> 
>>>>> I'm not sure what the intended exact language is for the level of confidence. Last year's Working Draft included an option that used "high probability that it contains only information that could not be linked ... by a skilled analyst", but I'm not sure that's from Ed in particular or what you had in mind.
>>>>> 
>>>>> Thanks,
>>>>> Nick
>>>>> 
>>>>> On Jun 20, 2013, at 3:06 PM, David Singer <singer@apple.com> wrote:
>>>>> 
>>>>>> Problem
>>>>>> 
>>>>>> "Data is deidentified when a party:
>>>>>> 
>>>>>> 	* has achieved a reasonable level of justified confidence that the data cannot be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device;
>>>>>> 	* commits to try not to reidentify the data; and
>>>>>> 	* contractually prohibits downstream recipients from trying to re-identify the data."
>>>>>> 
>>>>>> 1) We have had (from Ed?) text that suggests better wording than "reasonable level of justified confidence".
>>>>>> 
>>>>>> 2) If we have a definition of 'tracking' data, we should use it.  
>>>>>> 
>>>>>> 3) "downstream" is undefined, and actually we don't care where in a hypothetical stream you are, we want the data not to identify.  
>>>>>> 
>>>>>> Proposal:
>>>>>> 
>>>>>> 1)  I think it was something like "to a generally accepted high level of confidence".  I suggest we find text that says that basically you're doing as well as the normal state of the art.
>>>>>> 
>>>>>> 2) Suggest "the data is not, and cannot be made into, tracking data" instead of "cannot be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device"
>>>>>> 
>>>>>> 3) Delete "downstream" or replace it with "any".
>>>>>> 
>>>>>> 
>>>>>> David Singer
>>>>>> Multimedia and Software Standards, Apple Inc.

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Tuesday, 25 June 2013 23:11:28 UTC