RE: Proposed Text for Local Law and Public Purpose from Mike O'Neill on 2012-11-05 (public-tracking@w3.org from November 2012)

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Mon, 5 Nov 2012 10:25:49 -0000
To: <public-tracking@w3.org>
Message-ID: <093001cdbb3f$ed50ed50$c7f2c7f0$@baycloud.com>

I just found this ICO (UK DPA) draft code of practice that explains the
issues around data anonymisation pretty well.

http://www.ico.gov.uk/about_us/consultations/~/media/documents/library/Corporate/Research_and_reports/anonymisation_cop_draft_consultation.ashx

-----Original Message-----
From: Walter van Holst [mailto:walter.van.holst@xs4all.nl] 
Sent: 26 October 2012 20:40
To: public-tracking@w3.org
Subject: Re: Proposed Text for Local Law and Public Purpose

On 10/26/12 9:15 PM, Roy T. Fielding wrote:

>> A cryptographic hash of the IP-address, UA string, the first 7 bytes 
>> of a 64 bit Unix timestamp salted with the date string would suffice 
>> to provide a pretty hard to link identifier that would meet the needs 
>> as you just described.
> 
> I seriously doubt that an identifier that changes at least every 4.27 
> minutes, and also at 00:00 UTC, would be useful to anyone. Moreover, 
> it doesn't take IP masking into account (grouping identifiers by 
> allocation block).

First of all, it was a suggestion. If it would take a few bits fewer than the
first 7 bytes of the timestamp to get to a meaningful timeframe in which you
can detect click-fraud in retrospect, I would consider that a wholly
different debate from one in which it is stated that you cannot use
anonymisation for this purpose.
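
To make the trade-off concrete, here is a minimal sketch of such an
identifier. The hash function (SHA-256), the field separator, the date
format and the names pseudonymous_id and window_seconds are assumptions of
mine, not anything agreed in this thread. The dropped_bits parameter sets
how many low-order timestamp bits are discarded: 8 corresponds to "the first
7 bytes" and a 256-second (about 4.27 minute) window, and each extra bit
dropped doubles the window.

```python
import hashlib
from datetime import datetime, timezone


def window_seconds(dropped_bits: int) -> int:
    """Length of the time window during which the identifier stays stable."""
    return 1 << dropped_bits


def pseudonymous_id(ip: str, user_agent: str, unix_ts: int,
                    dropped_bits: int = 8) -> str:
    """Hash of IP, UA string and a coarse timestamp, salted with the UTC date.

    Illustrative only: SHA-256 and the NUL separator are assumptions.
    """
    # Keep only the high-order bits of the timestamp. With dropped_bits=8
    # this is the first 7 bytes of a 64-bit Unix timestamp.
    bucket = unix_ts >> dropped_bits
    # The date string acts as the salt, so the identifier also changes
    # at 00:00 UTC regardless of the window length.
    salt = datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y-%m-%d")
    material = "\x00".join([salt, ip, user_agent, str(bucket)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Two requests from the same IP and UA get the same identifier only if they
fall in the same window on the same UTC day, so widening the window is a
matter of raising dropped_bits rather than abandoning the approach.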

My point was that, in order to detect similar HTTP requests and discern
patterns that are highly likely to be fraudulent, it is probably as
important to be able to group similar HTTP requests as it is to retain
IP addresses, cookies, referrer URLs, URIs etc.
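
As a rough illustration of grouping without keeping the raw values, the
sketch below also covers the IP masking point raised above by hashing a
masked network rather than the full address. The fixed /24 prefix is only a
stand-in for a real allocation block, and the names masked_group_key and
suspicious_groups, the SHA-256 hash and the simple count threshold are all
assumptions for the sake of the example.

```python
import hashlib
import ipaddress
from collections import Counter


def masked_group_key(ip: str, user_agent: str, salt: str,
                     prefix_len: int = 24) -> str:
    """Hash the masked network and UA string instead of the raw IP address.

    The fixed prefix length is an assumption; a real allocation block
    would need a lookup that is not shown here.
    """
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    material = "\x00".join([salt, str(network), user_agent])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def suspicious_groups(requests, salt: str, threshold: int) -> dict:
    """Count requests per group key and keep groups at or above threshold.

    `requests` is an iterable of (ip, user_agent) pairs.
    """
    counts = Counter(masked_group_key(ip, ua, salt) for ip, ua in requests)
    return {key: n for key, n in counts.items() if n >= threshold}
```

Only the group keys and their counts need to be kept for the pattern
analysis; whether a plain count is a good enough fraud signal is a separate
question.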

> I know Walter wasn't here the last time around, so I'll say this
> again:  DNT will have no effect on data collection or retention for 
> the purpose of detecting or preventing malicious activity. Performing 
> that function in the real world requires both the collection of IP 
> addresses and the setting of various types of cookies, including 
> identifier cookies, though not necessarily retaining those cookies on 
> the server.  AFAICT, this is allowed by EU laws because they are 
> necessary to secure any online service from existing attacks.

You are right in that interpretation of EU rules. It should be added though
that any use beyond the purpose you mentioned is not necessarily lawful. And
it should also be added that if it is possible to achieve the purpose with
less data, you should use that avenue instead.

> There is no need to mention it in our specs, and no need for the specs 
> to include anything about local laws and public purpose. These are 
> simply not our concerns and we have wasted far too much of our time on 
> them already.

I would concur on the latter and tend to think the former is pretty close to
the truth. Local laws are indeed not our concern. Public purpose, I'm less
sure of. The core concern to me is whether self-regulatory requirements
should be included and so far my position would be that they should be
explicitly excluded, perhaps with a list of exceptions.

By now I am starting to get more interested in MRC's position on this, so I
hope Chris and Rigo can work out a way to involve them in this process.

Regards,

 Walter
Received on Monday, 5 November 2012 10:33:57 UTC