Re: de-identification text for Wednesday's call

Shane,

Labeling my view as "totalitarian" or "absolutist" is inaccurate and not
appreciated. My approach allows lots of leeway as anonymization
technology improves, and takes into account that perfect anonymization
is impossible. It seems to me that you plan to label anything
"totalitarian" that suggests that keeping raw logs is not a viable
approach in general. To be clear, I am not suggesting that keeping raw
logs is *always* prohibited, just that it should be prohibited in the
normal case where records have relatively high entropy. I'm also happy
to discuss examples in great detail, but in my experience you have been
unwilling to engage with the nitty-gritty details of what anonymization
should entail.

On 04/02/2013 01:09 PM, Shane Wiley wrote:
>
> Dan,
>
>
> As with HIPAA, I believe differentiated treatment of internal and
> external datasets is appropriate as this changes the risk profile of
> re-identification -- again, the root of the conversation being a
> "risk-based" approach versus a totalitarian approach as you suggest. 
> My solution meets the perceived consumer harm in this case
>
Do you have evidence for this claim?


> -- yours of course does as well but goes far too far over the top of
> what is actually needed.  If your goal is to create a compromise
> end-point that will likely be implemented by industry then my
> recommended approach gets us there.
>
Any compromise must provide meaningful protection to users. Industry
implementation of a standard that is too weak will be worse for users
than having no standard at all, since it will provide only the guise of
protection.

>   If you'd like to instead stand by absolutist approaches, that is of
> course your prerogative and we'll have those removed through the
> standard W3C process.  I'm simply trying to save everyone some time
> and get to a meaningful outcome quickly.
>
Glibly dismissing my concerns is not a way to gain allies, or move the
W3C process forward. I don't think your view is as universally shared as
you seem to think that it is.


>  
>
> - Shane
>
>  
>
> *From:*Dan Auerbach [mailto:dan@eff.org]
> *Sent:* Tuesday, April 02, 2013 12:21 PM
> *To:* public-tracking@w3.org
> *Subject:* Re: de-identification text for Wednesday's call
>
>  
>
> Shane,
>
> Why hash at all in this case? If you are relying on operational and
> administrative controls, you might as well just pledge not to look up
> the cookie when you receive it. If you are rotating (and discarding)
> salts frequently, then it will have a positive effect, but otherwise I
> don't think hashing provides any benefit here.
>
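To make the salt-rotation point concrete, here is a minimal sketch in
Python. The class name, the 32-byte salt size, and the cookie value are
all made up for illustration, and HMAC-SHA256 is just one reasonable
choice of keyed hash, not a description of anyone's actual pipeline.

```python
import hashlib
import hmac
import secrets


class RotatingSaltHasher:
    """Hypothetical helper: keyed hash of a cookie under a replaceable salt."""

    def __init__(self) -> None:
        # Random secret salt for the current rotation period.
        self._salt = secrets.token_bytes(32)

    def pseudonym(self, cookie: str) -> str:
        # HMAC-SHA256 of the cookie under the current salt, as a hex string.
        return hmac.new(
            self._salt, cookie.encode("utf-8"), hashlib.sha256
        ).hexdigest()

    def rotate(self) -> None:
        # Replace the salt. This sketch assumes no other copy of the old
        # salt is kept anywhere, so old pseudonyms cannot be recomputed.
        self._salt = secrets.token_bytes(32)


hasher = RotatingSaltHasher()
cookie = "uid=abc123"  # made-up cookie value

before = hasher.pseudonym(cookie)
same_period = hasher.pseudonym(cookie)
hasher.rotate()
after = hasher.pseudonym(cookie)

print(before == same_period)  # same salt: records are still linkable
print(before == after)        # new salt: the link to old records is gone
```

Within one rotation period the same cookie still maps to the same
pseudonym, so records stay linkable for that period. Only once the old
salt is discarded does the link break. Without rotation, the hash is
just a stable identifier under another name.
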
> But this is an aside to our main disagreement about the larger issue
> about the role that operational and administrative controls should
> play. I agree that they should play a role, but only after
> de-identification of data has been achieved. If the result of a DNT:1
> request is business as usual, with minor scrubbing and the caveat that
> only 4000 engineers at a large corporation get default access to a
> specially marked database instead of 10000, then that will not be a
> successful standard. (Of course I welcome more detailed information
> about operational and administrative controls.)
>
> One last point I wanted to make is that of course the data sets I
> mentioned refer to public data. We don't have access to internal
> corporate data sets. There are laws in place to prevent the pilfering
> of that data, so of course no-one is going to steal data then publish
> an academic paper about it, effectively painting a big target on
> themselves for federal prosecutors and corporate legal teams. In light
> of this, the right empirical question to ask is: of large publicly
> available data sets that contain user data and are somewhat akin to
> log data, how often are there successful re-identification or
> attribute disclosure attacks? Can you point to any public data sets
> where such an attack has not been found?
>
> If your argument is instead that public data should be treated
> differently from non-public data, then I'd suggest that this is out of
> scope for the DNT conversation. DNT is about giving users the choice
> to opt out of tracking by companies, which must entail meaningfully
> curbing data collection and retention by that company, not merely a
> request that a company not make public its collected data. (Indeed, in
> addition to being an excessively weak demand by the user, this would
> in some cases be a vacuous request, since making that information
> public is already prohibited by law.) The de-identification question
> exists within the scope of what the companies themselves can do with
> the data -- is the data de-identified with respect to the entity that
> collected the data?
>
> Best,
> Dan
>
> On 04/02/2013 11:03 AM, Shane Wiley wrote:
>
>     Dan,
>
>      
>
>     Once the one-way hash is applied (and other elements of record
>     appropriately cleansed) the data is moved to a system that is not
>     allowed to be accessed externally.  It's these operational and
>     administrative controls that are essential to ensure de-identified
>     data is not re-identified at some later time.  I believe you're
>     looking only at the technical merits which is only seeing a small
>     portion of the overall solution.
>
>      
>
>     - Shane
>
>      
>
>     *From:*Dan Auerbach [mailto:dan@eff.org]
>     *Sent:* Tuesday, April 02, 2013 10:59 AM
>     *To:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>     *Subject:* Re: de-identification text for Wednesday's call
>
>      
>
>     On 04/02/2013 08:50 AM, Shane Wiley wrote:
>
>         once the one-way hash function has been applied the data is
>         never again able to be accessed in real-time to modify the
>         user's experience.
>
>     I think I'm confused, can you explain this more? How is this
>     possible? If you are just hashing a cookie string, your web server
>     receives a request that includes a cookie string, you hash that
>     cookie string (which is an incredibly fast operation), match the
>     hashed cookie against the stored data, and return personalized
>     results.
>
>     Or are you salting the hash differently for every request, or
>     combining the cookie with an ephemeral piece of data (the
>     timestamp) before hashing and then throwing away the timestamp?
>
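The two cases in the question above can be sketched in a few lines of
Python. Everything here is hypothetical: the function names, the
in-memory dictionary standing in for stored log data, the cookie
values, and the use of unsalted SHA-256 as the "one-way hash".

```python
import hashlib


def hash_cookie(cookie: str) -> str:
    # Plain one-way hash of the cookie string, with no salt.
    return hashlib.sha256(cookie.encode("utf-8")).hexdigest()


def hash_with_timestamp(cookie: str, timestamp: str) -> str:
    # Hash of the cookie combined with an ephemeral value.
    return hashlib.sha256(f"{cookie}|{timestamp}".encode("utf-8")).hexdigest()


# Stand-in for stored log data, keyed by the hashed cookie.
stored_profiles = {hash_cookie("uid=abc123"): {"interests": ["cycling"]}}


def personalize(request_cookie: str):
    # Case 1: re-hash the incoming cookie and look it up in real time.
    return stored_profiles.get(hash_cookie(request_cookie))


# The static hash still works as a lookup key for a returning user.
print(personalize("uid=abc123"))

# Case 2: if a timestamp is mixed in and then thrown away, a later
# request hashes to a different key and the lookup fails.
logged_key = hash_with_timestamp("uid=abc123", "2013-04-02T10:59:00Z")
later_key = hash_with_timestamp("uid=abc123", "2013-04-02T11:03:00Z")
print(logged_key == later_key)
```

In the first case the hash changes nothing about real-time access: the
server can recompute the key on every request. Only in the second case,
and only if the ephemeral value really is discarded, does the stored
record become unreachable from a fresh cookie.
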
>     Thanks for clarifying, apologies if I'm just being dense.
>
>     Dan
>
>
>
>     -- 
>
>     Dan Auerbach
>
>     Staff Technologist
>
>     Electronic Frontier Foundation
>
>     dan@eff.org <mailto:dan@eff.org>
>
>     415 436 9333 x134
>
>
>
>
> -- 
> Dan Auerbach
> Staff Technologist
> Electronic Frontier Foundation
> dan@eff.org <mailto:dan@eff.org>
> 415 436 9333 x134


-- 
Dan Auerbach
Staff Technologist
Electronic Frontier Foundation
dan@eff.org
415 436 9333 x134

Received on Tuesday, 2 April 2013 20:39:21 UTC