Re: Proposed friendly amendments to industry draft

Hi Shane and Rob,

In Sunnyvale, I believe we agreed to use a different term for "yellow":
not de-identified, and not pseudonymous. Can't we just consider it an
open issue that we haven't selected that term yet, and be respectful of
this fact? I think the most polite approach is to just call it "yellow"
in all proposals until we separately resolve the open nomenclature
issue. But regardless, I don't think continuing to use one term or the
other ought to change the fact that the name for this state -- if and
however the state ends up being defined -- is an open and unresolved
issue.

Thanks,
Dan

On 07/10/2013 12:36 AM, Shane Wiley wrote:
>
> Rob,
>
>  
>
> We can look at a considerable amount of HIPAA text and other privacy
> process text that offers differing definitions of de-identified. 
> And while the A29WP has proposed a definition of pseudonymous – there
> are many that predate it (MSFT White Paper on this topic in 2008 I
> believe).  Yahoo!'s definition is more aligned with theirs and would
> fall in the “Red” data category in the current R-Y-G conceptual framework.
>
>  
>
> I again ask that we not try to select a single historical definition
> as being the definitive definition for all time – and instead focus on
> definitions that make sense for this standard and use them rigorously
> in that context.  If a previous definition works for us – great!  But
> let’s not be blindly bound to definitions that were developed for a
> different context and didn’t envision all that we are discussing in
> this working group.
>
>  
>
> - Shane
>
>  
>
> *From:*Rob van Eijk [mailto:rob@blaeu.com]
> *Sent:* Tuesday, July 09, 2013 8:21 PM
> *To:* Shane Wiley; David Singer
> *Cc:* public-tracking@w3.org WG
> *Subject:* RE: Proposed friendly amendments to industry draft
>
>  
>
>
> Shane,
>
> If we stick to common definitions, we are not confusing the rest of
> the world. The NAI and the FTC have defined de-identified very clearly.
> Also, WP29 has defined pseudonymous data to be data about a person.
>
> Your strategy looks similar to, if not the same as, the strategy outlined in:
> https://github.com/lobbyplag/lobbyplag-data/raw/master/raw/lobby-documents/Yahoo%20on%20Pseudonymous%20Data.pdf
>
> If we want a strong DNT, one that is meaningful, we should make the
> standard in line with common definitions, instead of confusing the
> rest of the world.
>
> Rob
>
> Shane Wiley <wileys@yahoo-inc.com> wrote:
>
> David,
>
> Small correction:  Green is the "final" state - not Red.
>
> In the industry proposal:  Red = raw, Yellow = de-identified but event linkable, Green = de-identified and un-linkable
>
> The term de-identified has been used for many different purposes, hence the issue we're having with some people falling back on uses they may have seen in other contexts and therefore having concerns.  If we stick to our own definitions and how those are leveraged within this standard, I believe we'll have fewer issues here.
>
> - Shane
>
> -----Original Message-----
> From: David Singer [mailto:singer@apple.com] 
> Sent: Tuesday, July 09, 2013 10:49 AM
> To: Shane Wiley
> Cc: Rob van Eijk; public-tracking@w3.org WG
> Subject: Re: Proposed friendly amendments to industry draft
>
>
> On Jul 9, 2013, at 18:18, Shane Wiley <wileys@yahoo-inc.com> wrote:
> I disagree with this naming change as much of the data in the "red" zone may also be considered to be "pseudonymized".  What is critical to this conversation are definitions associated with the terms being used.
>
> If the definition of IDENTIFICATION is: an act of identifying : the state of being identified -OR- b : evidence of identity (Merriam-Webster), then de-identification would be the opposite of this.  Or plainly - removing "evidence of identity".  While there are many ways to remove evidence of identity, I'll continue to argue the removal of operational "linkability" from identifiers meets this definition as well (as the "evidence" of the actual user/device identity has been removed).
>
> Red State:  Data is fully identifiable (Limited Permitted Uses only - retention rates should be short)
> Yellow State:  Data is de-identified but linkable (Permitted Uses only - singular utility is analytics)
> Green State:  Data is de-identified and de-linked (any use)
>
> When you further layer these concepts into the definition of TRACKING, basically the pairing of a unique ID with non-affiliated site URLs, you create the foundation for the presentation I distributed to the group 2 weeks ago.
>
> We're disagreeing on the term "de-identification", I believe, more because some are still attached to the notion that de-identified data in and of itself is outside the scope of DNT.  This is incorrect in the new construct, and only the combination of de-identification with de-linking reaches the bar of moving outside the scope of DNT.
>
> I hope this is clearer.  For those that don't agree with this use of de-identification, could you please articulate what real-world use or loophole you feel this creates?  If we've appropriately contained the collection and use of data in the standard, then I'm not seeing a way to game the system (though I believe you somehow see something here that I don't).
>
> Thank you,
> Shane
>
> I think that the point of my remark is that I am mostly concerned with data that is truly not associated with a person (their UA or device).  That's the only data that is out of scope in my mind.
>
> My perception is that the rest of the world uses "de-identified" to mean this.  Maybe I am wrong.
>
> I am fine with a best practices document saying that data that is NOT this strongly de-identified should have its content reduced and its identifiability weakened as much as possible, which I think is your yellow state.
>
> What I don't want is to have a requirement in the document that data be de-identified to be out of scope, when we re-define de-identified to be merely your yellow state.
>
> So, in summary:
>
> term A, your yellow:  data that has been minimized and pseudonymized so it's harder to re-identify
> term B, your red: data that truly no longer can be connected to anyone or their UA or device
>
> The spec must require B for data to be out of scope.
>
> I think I would prefer A: pseudonymized, B: de-identified
>
> I think you have A: de-identified, B: de-linked
>
>
>
>
>
>
> From: Rob van Eijk [mailto:rob@blaeu.com]
> Sent: Tuesday, July 09, 2013 9:51 AM
> To: David Singer; public-tracking@w3.org WG
> Subject: Re: Proposed friendly amendments to industry draft
>
>
> David,
> I support the proposed change of wording.
>
> s/de-identified/pseudonymized/
> AND
> s/de-linked/de-identified/
>
> Rob
>
>
>
> David Singer <singer@apple.com> wrote:
>
> On Jul 9, 2013, at 17:18, Rob van Eijk <rob@blaeu.com> wrote:
>
> I am considering formally objecting to the term de-identified in the DAA proposal.
>
> The reasoning is that it has been used as a synonym for 'the data is not about a person anymore'. We need another word. 
>
> or we need to use de-identified in the way that it is commonly used?  do we need more than one term?
>
> If we do, I'd rather use a new term for data that is identifiable but that takes some work (or access to keys) to be so, such as pseudonymized.
>
> So, in the DAA text, I'd change:
>
> de-identified (where it is defined) to pseudonymized
> de-linked (where it is defined) to de-identified
>
> and leave the requirement that data must be de-identified (in the strong sense) to be out of scope.
>
> I am proposing to simply use the term linkable.
>
> Rob
>
>
> "Israel, Susan" <Susan_Israel@Comcast.com> wrote:
> his document and how they may be used elsewhere, it may help to introduce the definitions by saying, "For purposes of this specification, ...." 
>
> Substantive:  To clarify one of the differences between the de-identified and de-linked categories as I understand them, it may be helpful to add language that indicates that the de-identified category permits reliance on operational controls in addition to technical controls, which I believe is consistent with the ideas Thomas Schauf presented.  
>
> Thus, the definition would read, "Data is de-identified when a party
>
> 1. has taken reasonable steps to ensure that the data cannot be reasonably re-associated or connected to a specific user, computer, or device without the use of additional data that is subject to separate and distinct technical and organizational controls to ensure such non-attribution, or when such attribution would require a disproportionate amount of time, expense and effort; ...." 
>
> I also support adding the audience measurement language that has been discussed and revised with several participants and submitted by Esomar to the permitted uses section, 5.2. 
>
>
>
>
> Susan Israel
> Comcast Cable
> 215.286.3239
> 215.767.3926 mobile
> 917.934.1044 NY
> susan_israel@comcast.com
>
> This message and any attachments to it may contain PRIVILEGED AND CONFIDENTIAL ATTORNEY-CLIENT INFORMATION AND/OR ATTORNEY WORK PRODUCT exclusively for intended recipients. Please DO NOT FORWARD OR DISTRIBUTE to anyone else. If you are not an intended recipient, please contact the sender to report the error and then delete all copies of this message from your system.
>
>
>
>
>
> David Singer
> Multimedia and Software Standards, Apple Inc.
>
>

Received on Wednesday, 10 July 2013 08:12:53 UTC