
Re: Additional documentation for Issue-231

From: Nicholas Doty <npdoty@w3.org>
Date: Sat, 19 Oct 2013 18:37:17 -0700
Cc: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <DA2D1040-F190-489B-B522-DF5B583E5151@w3.org>
To: Jack Hobaugh <jack@networkadvertising.org>

Hi Jack,

I've added this change proposal to the wiki of change proposals on issue-188 regarding de-identification:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Deidentification#Expert_review_or_safe_harbor

I suggest that we can close issue-231 as a duplicate of issue-188, which is currently tracking this topic.

Thanks,
Nick

On October 16, 2013, at 5:54 PM, Jack Hobaugh <jack@networkadvertising.org> wrote:

> Issue-231
> 
> 2.8 De-identified
> 
> Normative:
> 
> For the purpose of this specification, tracking data may be de-identified using one of two methods:
> 
> 1.     Expert Review 
> A qualified statistical or scientific expert concludes, through the use of accepted analytic techniques, that the risk the information could be used alone, or in combination with other reasonably available information, to identify a user is very small.
> 
> 
> 2.     Safe Harbor
> Removal of the following fields and/or data types from the tracking data:
> 
> 
> a)     End user information in URLs, such as names, IDs, or account-specific information (URLs must be cleansed to remove it)
> 
> b)     Any geographic information at a granularity finer than zip code
> 
> c)     Date information specific to the end user (e.g. DOB, graduation, anniversary, etc.)  Transaction dates (purchases, registration, shipping, etc.) specific to the end user can be retained as long as timestamp information is removed or obfuscated
> 
> d)     User age. Note that age group information (e.g. 30-40) can be kept so long as ages are generalized to at least year-of-birth granularity.  Multi-year age bands are preferred.
> 
> e)     Direct contact elements such as telephone numbers, email addresses, social network usernames, or other public “handles” that uniquely identify a user on a given service
> 
> f)      Social security numbers or other government issued identifiers (e.g. driver’s license number, registration numbers, tax id numbers, license plate information)
> 
> g)     Account numbers, membership numbers, or other static identifiers that can be used to identify the user on another site or service or to a place of business or other organization
> 
> h)     Full IP addresses and/or remote hostnames – may be converted to representative geolocation (no more granular than zip code)
> 
> i)      Biometric information, including video or images of the end user, voice prints/audio recording information
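
As a non-normative illustration of the Safe Harbor list above, here is a minimal Python sketch. It is not part of the proposal. The field names in DROP_FIELDS, the "age" and "purchase_time" keys, and the ten-year band width are all assumptions made up for the example; a real data set would need its own mapping from fields to items a) through i).

```python
from datetime import datetime

# Hypothetical field names standing in for items a) to i); not from the proposal.
DROP_FIELDS = {
    "name", "email", "phone", "ssn", "account_id",
    "date_of_birth", "ip", "remote_host", "photo",
}


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a multi-year band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def safe_harbor(record: dict) -> dict:
    """Return a copy of the record with the listed fields removed or coarsened."""
    out = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if "age" in out:
        # Item d): keep only an age band, never the exact age.
        out["age_band"] = age_band(out.pop("age"))
    if "purchase_time" in out:
        # Item c): keep the transaction date, drop the time of day.
        stamp = datetime.fromisoformat(out.pop("purchase_time"))
        out["purchase_date"] = stamp.date().isoformat()
    return out
```

For example, a record holding a name, an exact age of 34, and a purchase timestamp would come out with only the age band "30-39", the purchase date, and whatever non-listed fields it had. Dropping the IP address outright is the simplest reading of item h); converting it to a zip-level location would need a geolocation source that this sketch does not assume.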
> 
> In addition to the removal of the above information, the de-identifying entity must not have actual knowledge that the remaining information could be used alone or in combination with other reasonably available information to identify an individual who is the subject of the information.
> 
> Further, the de-identifying entity must implement:
> 
> 1.     Technical safeguards that prohibit re-identification of de-identified data and/or merging of the original tracking data and de-identified data
> 
> 2.     Business processes that specifically prohibit re-identification of de-identified data and/or merging of the original tracking data and de-identified data
> 
> 3.     Business processes that prevent inadvertent release of either the original tracking data or de-identified data
> 
> 4.     Administrative controls that limit access to both the original tracking data and de-identified data
> 
> If third parties will have access to the de-identified data, the de-identifying entity must have contractual protections in place that require the third parties (and their agents or affiliates) to:
> 
> 1.     Appropriately protect the data
> 
> 2.     Not attempt to re-identify the data
> 
> 3.     Only use the data for purposes specified by the first party
> 
> Regardless of the de-identification approach, unique keys can be used to correlate records within the de-identified dataset, provided the keys do not exist outside the de-identified dataset and/or have no meaning outside the de-identified dataset (i.e. no mapping table can exist that links the original identifiers to the keys in the de-identified dataset.)
> 
> A de-identified dataset becomes irrevocably de-identified if the algorithm information used to generate the unique identifiers (e.g. encryption key(s) or cryptographic hash “salts”) is destroyed after the data is de-identified.
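
One way to picture the unique-key paragraphs above is a keyed hash whose secret is generated for a single de-identification run and never stored. The Python sketch below is an assumption about how that could look, not the method the proposal prescribes; the function name pseudonymize and the "user_id" and "key" field names are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(records, id_field="user_id"):
    """Replace the original identifier with a key valid only inside this dataset."""
    secret = secrets.token_bytes(32)  # one-time secret, never written anywhere
    out = []
    for rec in records:
        token = hmac.new(
            secret, str(rec[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["key"] = token  # same user -> same key within this run only
        out.append(clean)
    # No mapping table is built, and the secret goes out of scope on return.
    return out
```

Records from the same user still correlate inside the output, but a second run produces unrelated keys because a fresh secret is drawn each time. Python cannot guarantee that the secret is wiped from memory, so an actual deployment would need its own key-destruction procedure.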
> 
> Non-normative:
> 
> Request data sent from user agents can contain information that could potentially be used to identify end users.  Such data must be de-identified prior to being used for purposes not listed under permitted uses.  While data de-identification does not guarantee complete anonymity, it greatly reduces the risk that a given end user can be re-identified.
> 
> Regardless of the method used (Expert Review or Safe Harbor), the de-identifying entity should document the processes it uses for de-identification and any instances where it has implemented de-identification techniques.  The entity should regularly review the processes and implementation instances to make sure the appropriate methods are followed.
> 
> Both tracking data and de-identified data must be appropriately protected using industry best practices, including:
> 
> ·       Access by authorized personnel only
> 
> ·       Rule of Least Privilege
> 
> ·       Use of secure transfer/access protocols
> 
> ·       Secure destruction of data once it is no longer needed
> 
> The de-identification and cleansing of URL data is particularly important, since the kinds and formats of identifying information found in URLs vary widely.  Considerations for cleansing URL information:
> 
> ·       Truncation to URL domain only where possible
> 
> ·       Where path and querystring information must be retained, key-value information should be scrubbed for known (proprietary) data types as well as data that matches patterns for known PII formats (e.g. telephone numbers, email addresses, etc.)
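
The two URL considerations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the SENSITIVE_KEYS set, the "REDACTED" placeholder, and the two regular expressions are invented for the example and are far from a complete list of PII formats.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Rough, illustrative patterns; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[^@\s/]+@[^@\s/]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\-. ]{7,}\d")
# Hypothetical parameter names treated as known identifying data types.
SENSITIVE_KEYS = {"user", "uid", "email", "account", "session"}


def domain_only(url: str) -> str:
    """Truncate a URL to scheme and host, dropping path, query, and credentials."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.hostname or "", "", "", ""))


def scrub(url: str) -> str:
    """Keep path and query, but remove known keys and PII-looking values."""
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in SENSITIVE_KEYS:
            continue
        if EMAIL_RE.search(value) or PHONE_RE.search(value):
            continue
        kept.append((key, value))
    path = EMAIL_RE.sub("REDACTED", parts.path)
    path = PHONE_RE.sub("REDACTED", path)
    return urlunsplit((parts.scheme, parts.hostname or "", path, urlencode(kept), ""))
```

The phone pattern will also redact long numeric path segments that are not phone numbers; the sketch deliberately errs toward removing too much rather than too little.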
> 
> Reasoning:
> 
> De-identification obligations under DNT closely follow those defined under the HIPAA Privacy Rule.  HIPAA de-identification practices (specified in the Privacy Rule) have been successfully used by companies to protect Protected Health Information in a variety of scenarios, including research, confidential sharing, and public disclosure.
> 
> 
> -- 
> Jack L. Hobaugh Jr
> Network Advertising Initiative | Counsel & Senior Director of Technology 
> 1634 Eye St. NW, Suite 750 Washington, DC 20006
> P: 202-347-5341 | jack@networkadvertising.org



Received on Sunday, 20 October 2013 01:37:39 UTC
