RE: ACTION-412, Naming R/Y/G

Hi Shane,


But presumably the reason someone sets DNT is because, in general, they do
not trust administrative controls & procedures. If they trust those of a
particular company, perhaps encouraged by a properly constructed
accountability model, they can give their consent. But if they do not they
should not have the decision made for them.






From: Shane Wiley 
Sent: 24 June 2013 04:58
To: Mike O'Neill; 'Peter Swire';
Subject: RE: ACTION-412, Naming R/Y/G




The goal here is to keep granular data but remove the linkage of the record
to a device/user in the real world (and remove the possibility of reverse
engineering the original production identifier).  If scrubbed appropriately,
these records should find the middle ground between supporting needed
reporting and NOT tracking a real person or device.  This of course requires
an accountability model (risk-based) to combine technical, operational, and
administrative controls to meet this goal.  Much like security
infrastructures and privacy policy promises, there is a degree of trust
imparted to the implementer to get it right.  But if they ever don't, then
they are appropriately held accountable for this failing.


- Shane


From: Mike O'Neill 
Sent: Saturday, June 22, 2013 9:27 AM
To: 'Peter Swire';
Subject: RE: ACTION-412, Naming R/Y/G




Converting a bit pattern encoding a persistent identifier to another using
a one-way hash or any other one-to-one mapping just generates another
persistent identifier. The next time the person/device/browser visits the
domain the one-way function is applied again, and tracking continues
completely unabated.
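
A minimal Python sketch of this point follows. The cookie values and the
choice of SHA-256 are assumptions made purely for illustration, not a
description of any company's actual scrubbing method. Because the mapping
is deterministic, the hashed value recurs on every visit and links those
visits just as the raw identifier did.

```python
import hashlib


def pseudonymize(identifier: str) -> str:
    """Apply a one-way hash (SHA-256, an assumed example) to an identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Hypothetical cookie ID seen on two separate visits by the same browser.
visit_1 = pseudonymize("cookie-id-12345")
visit_2 = pseudonymize("cookie-id-12345")

# A different browser with a different hypothetical cookie ID.
other = pseudonymize("cookie-id-67890")

# The hashed value is itself a stable, unique key for the browser.
same_browser_linked = visit_1 == visit_2
different_browsers_distinct = visit_1 != other
```

Both flags come out true: the hash hides the original value but still
serves as a persistent identifier.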


Moreover, if persistent identifiers, before or after applying a one-way hash,
are visible to passive examination of a data stream by a third party
(perhaps by methods such as same-origin script access or fibre-optic data
stream cloning), the individual can be singled out by that third party.


This is a "null" scrubbing method. 


If DNT is set, persistent identifiers should not be used unless for an
accepted permitted use, and then they should exist (duration limited) for
only as long as needed by the permitted use.


In my opinion, for a permitted use to be acceptable the duration of any
persistent identifiers should be justified and must be measured in no more
than hours. 






From: Peter Swire 
Sent: 22 June 2013 16:30
To: Group WG
Subject: ACTION-412, Naming R/Y/G


If the group decides to use a Red/Yellow/Green approach, one question has
been how to describe the three stages.  On the one hand, this may seem
trivial because the substance means more than the name.  On the other hand,
in my view, the names/descriptions are potentially important for two
reasons: (1) they provide intellectual clarity about what goes in each group;
and (2) they communicate the categories to a broader audience.


I was part of a briefing that Shane did on Friday on the phone to FTC
participants including Ed Felten and Paul Ohm.  The briefing was similar to
the approach Shane described at Sunnyvale.  In the move from red to yellow,
here were examples of what could be scrubbed:


1.  Unique IDs, to one-way secret hash.

2.  IP address, to geo data.

3.  URL cleanse, remove suspect query string elements.

4.  Side facts, remove link out data that could be used to reverse identify
the record.
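
As a rough illustration of the four steps listed above, here is a Python
sketch of a red-to-yellow scrub applied to one log record. Everything
specific in it is an assumption made for the example: the field names, the
secret key, the list of suspect query parameters, and the tiny lookup table
standing in for a real IP-to-geo service. It is not the method presented in
the briefing.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SECRET_KEY = b"example-secret"                # assumption: known only to the operator
SUSPECT_PARAMS = {"email", "uid", "session"}  # assumption: illustrative list
SIDE_FACT_FIELDS = {"referrer"}               # assumption: link-out data to drop
GEO_BY_PREFIX = {"203.0.113.": "Region A"}    # stand-in for a real geo lookup


def scrub(record: dict) -> dict:
    """Return a scrubbed copy of a hypothetical log record."""
    # 4. Side facts: drop fields that could be used to reverse identify the record.
    out = {k: v for k, v in record.items() if k not in SIDE_FACT_FIELDS}

    # 1. Unique ID: replace with a keyed (secret) one-way hash.
    out["uid"] = hmac.new(
        SECRET_KEY, record["uid"].encode("utf-8"), hashlib.sha256
    ).hexdigest()

    # 2. IP address: replace with coarse geo data.
    ip = out.pop("ip")
    out["geo"] = next(
        (geo for prefix, geo in GEO_BY_PREFIX.items() if ip.startswith(prefix)),
        "unknown",
    )

    # 3. URL cleanse: remove suspect query string elements.
    parts = urlsplit(record["url"])
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in SUSPECT_PARAMS
    ]
    out["url"] = urlunsplit(parts._replace(query=urlencode(kept)))
    return out


record = {
    "uid": "cookie-id-12345",
    "ip": "203.0.113.7",
    "url": "https://example.com/page?item=42&email=a%40b.com",
    "referrer": "https://other.example/",
}
scrubbed = scrub(record)
```

Note that step 1 on its own still yields a stable value per device, so the
sketch shows the mechanics of the list only, not a judgment on whether the
result is adequately de-identified.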


Here are some ways that I've thought to describe what gets scrubbed, based
on this sort of list:


1.  Remove identifiers (name) and what have been called pseudo-identifiers
in the deID debates (phone, passwords, etc.).  But I don't think there is a
generally accepted way to decide what pseudo-identifiers would be removed.


2.  Earlier, I had suggested "direct" and "indirect" identifiers, but I
agree with Ed's objection that these definitions are vague.


3.  I am interested in the idea that going from red to yellow means removing
information that is "exogenous" to the system operated by the company.  That
is, for names/identifiers/data fields that are used outside of the company,
scrub those.  Going to green would remove information that is "endogenous"
to the system operated by the company, that is, even those within the
company, with access to the system, could no longer reverse engineer the


When I suggested those terms on the call, someone basically said the terms
were academic gobbledygook.  The terms are defined here:  I acknowledge the gobbledygook
point, and the word "exogenous" is probably one only an economist could
love.  But I welcome comments on whether the idea is correct - data fields
that are generated or observable outside of the company are different from
those generated within the company's system.


4.  If exogenous/endogenous are correct in theory, but gobbledygook in
practice, then I wonder if there are plain language words that get at the
same idea.  My best current attempt is that red to yellow means scrubbing
fields that are "observable from outside of the company" or "outwardly
observable."


So, my suggestion is that red to yellow means scrubbing fields that are
"observable from outside of the company" or "outwardly observable."


If this is correct, then the concept of k-anonymity likely remains relevant.
Keeping broad demographic information such as male/female or age group can
be in the yellow zone.  However, a left-handed person under five feet with
red hair would in most settings be a bucket too small.
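
The bucket-size idea can be sketched in Python as follows. The records,
field names, and threshold k=2 are made up for illustration; a real
deployment would choose its own fields and a much larger k. The function
reports every combination of attribute values shared by fewer than k
records.

```python
from collections import Counter


def small_buckets(records, quasi_identifiers, k):
    """Return the attribute combinations held by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical scrubbed records.
records = [
    {"sex": "F", "age_group": "25-34", "hair": "brown"},
    {"sex": "F", "age_group": "25-34", "hair": "brown"},
    {"sex": "F", "age_group": "25-34", "hair": "red"},
    {"sex": "M", "age_group": "35-44", "hair": "brown"},
    {"sex": "M", "age_group": "35-44", "hair": "brown"},
]

# Broad demographics only: every bucket holds at least 2 records.
broad = small_buckets(records, ["sex", "age_group"], k=2)

# Adding a narrower attribute leaves one record in a bucket by itself.
fine = small_buckets(records, ["sex", "age_group", "hair"], k=2)
```

Here `broad` is empty, while `fine` contains the single red-haired record,
which is the "bucket too small" case described above.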


Clearly, the group has a variety of issues to address if we decide to go
with a three-part R/Y/G approach to de-identification.  The limited goal of
this post is to try to help with terminology.  Is it useful to say that the
yellow zone means scrubbing data that is "observable from outside of the
company", except for broad demographic data?




P.S.  After I wrote the above, I realized that "observable from outside of
the company" is similar in meaning to what can be "tracked" by those outside
of the company.  So scrubbing those items plausibly reduces tracking, at
least by the other companies.



Received on Monday, 24 June 2013 07:10:07 UTC