RE: Proposed friendly amendments to industry draft

Shane,

If we stick to common definitions, we are not confusing the rest of the world. The NAI and the FTC have defined de-identified very clearly. Also, WP29 has defined pseudonymous data to be data about a person.

Your strategy looks similar to, if not the same as, the strategy outlined in: https://github.com/lobbyplag/lobbyplag-data/raw/master/raw/lobby-documents/Yahoo%20on%20Pseudonymous%20Data.pdf

If we want a strong DNT, one that is meaningful, we should make the standard in line with common definitions, instead of confusing the rest of the world.

Rob


Shane Wiley <wileys@yahoo-inc.com> wrote:

>David,
>
>Small correction:  Green is the "final" state - not Red.
>
>In the industry proposal:  Red = raw, Yellow = de-identified but event
>linkable, Green = de-identified and un-linkable
>
>The term de-identified has been used for many different purposes hence
>the issue we're having with some people falling back on uses they may
>have seen in other contexts and therefore having concerns.  If we stick
>to our own definitions and how those are leveraged within this
>standard, I believe we'll have less issue here.
>
>- Shane
>
>-----Original Message-----
>From: David Singer [mailto:singer@apple.com] 
>Sent: Tuesday, July 09, 2013 10:49 AM
>To: Shane Wiley
>Cc: Rob van Eijk; public-tracking@w3.org WG
>Subject: Re: Proposed friendly amendments to industry draft
>
>
>On Jul 9, 2013, at 18:18 , Shane Wiley <wileys@yahoo-inc.com> wrote:
>
>> I disagree with this naming change as much of the data in the "red"
>zone may also be considered to be "pseudonymized".  What is critical to
>this conversation are definitions associated with the terms being used.
>>  
>> If the definition of IDENTIFICATION is: an act of identifying : the
>state of being identified -OR- b : evidence of identity
>(Merriam-Webster), then de-identification would be the opposite of
>this.  Or plainly - removing "evidence of identity".  While there are
>many ways to remove evidence of identity, I'll continue to argue the
>removal of operational "linkability" from identifiers meets this
>definition as well (as the "evidence" of the actual user/device
>identity has been removed).
>>  
>> Red State:  Data is fully identifiable (Limited Permitted Uses only -
>> retention rates should be short)
>> Yellow State:  Data is de-identified but linkable (Permitted Uses
>> only - singular utility is analytics)
>> Green State:  Data is de-identified and de-linked (any use)
>>  
>> When you further layer these concepts into the definition of
>TRACKING, basically the pairing of a unique ID with non-affiliated site
>URLs, you create the foundation for the presentation I distributed to
>the group 2 weeks ago.
>>  
>> We're disagreeing on the term "de-identification" I believe more
>because some are still attached to the notion that de-identified data in
>and of itself is outside the scope of DNT.  This is incorrect in the new
>construct and only the combination of de-identification with de-linking
>reaches the bar of moving outside the scope of DNT.
>>  
>> I hope this is clearer.  For those that don't agree with this use of
>de-identification, could you please articulate what real-world use or
>loophole you feel this creates?  If we've appropriately contained the
>collection and use of data in the standard, then I'm not seeing a way
>to game the system (though I believe you see something here that
>I don't).
>>  
>> Thank you,
>> Shane
>
>I think that the point of my remark is that I am mostly concerned with
>data that is truly not associated with a person (their UA or device). 
>That's the only data that is out of scope in my mind.
>
>My perception is that the rest of the world uses "de-identified" to
>mean this.  Maybe I am wrong.
>
>I am fine with a best practices document saying that data that is NOT
>this strongly de-identified should have its content reduced and its
>identifiability weakened as much as possible, which I think is your
>yellow state.
>
>What I don't want is to have a requirement in the document that data
>be de-identified to be out of scope, when we re-define de-identified to
>be merely your yellow state.
>
>So, in summary:
>
>term A, your yellow:  data that has been minimized and pseudonymized so
>it's harder to re-identify
>term B, your red: data that truly no longer can be connected to anyone
>or their UA or device
>
>The spec must require B for data to be out of scope.
>
>I think I would prefer A: pseudonymized, B: de-identified
>
>I think you have A: de-identified, B: de-linked
>
>
>
>
>
>>  
>>  
>> From: Rob van Eijk [mailto:rob@blaeu.com]
>> Sent: Tuesday, July 09, 2013 9:51 AM
>> To: David Singer; public-tracking@w3.org WG
>> Subject: Re: Proposed friendly amendments to industry draft
>>  
>> 
>> David,
>> I support the proposed change of wording.
>> 
>> s/de-identified/pseudonymized/
>> AND
>> s/de-linked/de-identified/
>> 
>> Rob
>> 
>> 
>> 
>> David Singer <singer@apple.com> wrote:
>> 
>> On Jul 9, 2013, at 17:18 , Rob van Eijk <rob@blaeu.com> wrote:
>> 
>> I am considering formally objecting to the term de-identified in the
>DAA proposal.
>> 
>> The reasoning is that it has been used as a synonym for 'the data
>is not about a person anymore'. We need another word.
>> 
>> or we need to use de-identified in the way that it is commonly used? 
>do we need more than one term?
>> 
>> If we do, I'd rather use a new term for data that is identifiable but
>that takes some work (or access to keys) to be so, such as
>pseudonymized.
>> 
>> So, in the DAA text, I'd change:
>> 
>> de-identified (where it is defined) to pseudonymized
>> de-linked (where it is defined) to de-identified
>> 
>> and leave the requirement that data must be de-identified (in the
>> strong sense) to be out of scope.
>> 
>> I am proposing to simply use the term linkable.
>> 
>> Rob
>> 
>> 
>> "Israel, Susan" <Susan_Israel@Comcast.com> wrote:
>> his document and how they may be used elsewhere, it may help to
>introduce the definitions by saying, "For purposes of this
>specification, ...." 
>> 
>> Substantive:  To clarify one of the differences between the
>de-identified and de-linked categories as I understand them, it may be
>helpful to add language that indicates that the de-identified category
>permits reliance on operational controls in addition to technical
>controls, which I believe is consistent with the ideas Thomas Schauf
>presented.  
>> 
>> Thus, the definition would read, "Data is de-identified when a party
>> 
>> 1. has taken reasonable steps to ensure that the data cannot be
>> reasonably re-associated or connected to a specific user, computer, or
>> device without the use of additional data that is subject to separate
>> and distinct technical and organizational controls to ensure such
>> non-attribution, or when such attribution would require a
>> disproportionate amount of time, expense and effort; ...."
>> 
>> 
>> I also support adding the audience measurement language that has been
>discussed and revised with  several participants and submitted by
>Esomar to the permitted uses section, 5.2. 
>> 
>> 
>> 
>> 
>> Susan Israel
>> Comcast Cable
>> 215.286.3239
>> 215.767.3926 mobile
>> 917.934.1044 NY
>> susan_israel@comcast.com
>> 
>> This message and any attachments to it may contain PRIVILEGED AND
>> CONFIDENTIAL ATTORNEY-CLIENT INFORMATION AND/OR ATTORNEY WORK PRODUCT
>> exclusively for intended recipients. Please DO NOT FORWARD OR
>> DISTRIBUTE to anyone else. If you are not an intended recipient, please
>> contact the sender to report the error and then delete all copies of
>> this message from your system.
>> 
>> 
>> 
>> 
>> 
>> David Singer
>> Multimedia and Software Standards, Apple Inc.
>> 
>
>David Singer
>Multimedia and Software Standards, Apple Inc.

Received on Tuesday, 9 July 2013 19:21:18 UTC