Re: ISSUE-24 (fraud detection and defense) from Nicholas Doty on 2014-10-15 (public-tracking@w3.org from October 2014)

From: Nicholas Doty <npdoty@w3.org>
Date: Tue, 14 Oct 2014 21:34:07 -0700
To: Shane M Wiley <wileys@yahoo-inc.com>, "David (Standards) Singer" <singer@apple.com>
Cc: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <9E512E87-50C0-460E-A277-8D95C9B9F59E@w3.org>
If the concern was just that the graduated response text should be non-normative, then perhaps we didn't have a disagreement at all! The current text doesn't have any normative requirements for graduated response; it just has a definition (which I believe is present in David's rewording below) and a note that it was preferred.

As I understood it at the time, we had agreement, from diverse participants in this Working Group who rarely agreed with one another, on the graduated response definition (which I believe came from Ian) and referring to it without adding an additional normative requirement. I believe the intent was to elaborate and give an example of how data minimization might apply for this permitted use, without proscribing any particular use case.

If the only remaining task is to emphasize the non-normativity, I'm happy to take an editorial action to word it as such.

Thanks,
Nick

On October 9, 2014, at 4:23 PM, Shane M Wiley <wileys@yahoo-inc.com> wrote:

> Thank you David - that works for me.
> 
> - Shane
> 
> -----Original Message-----
> From: David (Standards) Singer [mailto:singer@apple.com]
> Sent: Thursday, October 09, 2014 4:21 PM
> To: Shane M Wiley
> Cc: Justin Brookman; public-tracking@w3.org (public-tracking@w3.org)
> Subject: Re: ISSUE-24 (fraud detection and defense)
> 
> 
> On Oct 9, 2014, at 16:15 , Shane M Wiley <wileys@yahoo-inc.com> wrote:
> 
>> David,
>> 
>> I agree with the fix to the first paragraph.  I'm still of the opinion the 2nd paragraph is unneeded due to the data minimization principle already called out for all permitted uses.
>> 
>> Would you be comfortable switching this (2nd paragraph :-) ) to a non-normative element highlighting how data minimization may work for this permitted use?
> 
> Sure, we could try that.  Something like this?  (I guess it could be in a Note block, whereupon the disclaimer about being non-normative would be redundant).
> 
> For this permitted use, the general requirement for data minimization should be understood to mean that only relevant data is collected, and that not all data is collected all the time. Examples include (this list is neither exhaustive nor normative):
> * (focused collection) recording all use from a given IP address range, regardless of DNT signal, if the party believes it is seeing a coordinated attack on its service (such as click fraud) from that IP address range; similarly, if an attack shared some other identifiable fingerprint, such as a combination of User Agent and other protocol information, the party could retain logs on all interactions matching that fingerprint until it can be determined that they are not associated with such an attack or such retention is no longer necessary to support prosecution.
> * (sampling) collecting in-depth data from a random sample of visits, for statistical checking.
> * (watchfulness) start with a larger net and then minimize from there as you're able to determine traffic is not suspicious.
> 
>> 
>> - Shane
>> 
>> -----Original Message-----
>> From: David (Standards) Singer [mailto:singer@apple.com]
>> Sent: Thursday, October 09, 2014 4:11 PM
>> To: Shane M Wiley
>> Cc: Justin Brookman; public-tracking@w3.org (public-tracking@w3.org)
>> Subject: Re: ISSUE-24 (fraud detection and defense)
>> 
>> Hi Shane
>> 
>> I am not sure what your 'this' refers to. Let me try to be clearer...
>> 
>> 
>> 
>> My point is that we have the general requirements for permitted uses, and we shouldn't repeat them or (appear to) differ from them.  They already require:
>> 		* 3.3.1.1 No Secondary Uses
>> 		* 3.3.1.2 Data Minimization, Retention and Transparency
>> 		* 3.3.1.3 No Personalization
>> 		* 3.3.1.4 Reasonable Security
>> 
>> So my suggestion is that we shorten the first paragraph from:
>> 
>> Regardless of the tracking preference expressed, data may be collected and used to the extent reasonably necessary to detect security incidents, protect the service against malicious, deceptive, fraudulent, or illegal activity, and prosecute those responsible for such activity, provided that such data is not used for operational behavior (profiling or personalization) beyond what is reasonably necessary to protect the service or institute a graduated response.
>> 
>> to:
>> 
>> Regardless of the tracking preference expressed, data may be collected and used to the extent reasonably necessary to detect security incidents, protect the service against malicious, deceptive, fraudulent, or illegal activity, and prosecute those responsible for such activity, within the general requirements for a permitted use (above).
>> 
>> Then the second paragraph currently says:
>> 
>> When feasible, a graduated response to a detected security incident is preferred over widespread data collection. An example would be recording all use from a given IP address range, regardless of DNT signal, if the party believes it is seeing a coordinated attack on its service (such as click fraud) from that IP address range. Similarly, if an attack shared some other identifiable fingerprint, such as a combination of User Agent and other protocol information, the party could retain logs on all interactions matching that fingerprint until it can be determined that they are not associated with such an attack or such retention is no longer necessary to support prosecution.
>> 
>> and we change it (this is much rougher, and I use the word 'should'):
>> 
>> An approach that does not involve collecting either all data, or all the time, should be used. Examples include (this list is neither exhaustive nor normative):
>> * (focused collection) recording all use from a given IP address range, regardless of DNT signal, if the party believes it is seeing a coordinated attack on its service (such as click fraud) from that IP address range; similarly, if an attack shared some other identifiable fingerprint, such as a combination of User Agent and other protocol information, the party could retain logs on all interactions matching that fingerprint until it can be determined that they are not associated with such an attack or such retention is no longer necessary to support prosecution.
>> * (sampling) collecting in-depth data from a random sample of visits, for statistical checking.
>> * (watchfulness) start with a larger net and then minimize from there as you're able to determine traffic is not suspicious.
>> 
>> 
>> 
>> Feel free to add examples if you think we can clarify thereby...
>> 
>> 
>> 
>> On Oct 9, 2014, at 15:43 , Shane M Wiley <wileys@yahoo-inc.com> wrote:
>> 
>>> David,
>>> 
>>> I believe this is too prescriptive for a Policy Standard and should be based on principles rather than specifics.
>>> 
>>> We used to have language that applied several principles to all permitted uses:
>>> 
>>> - Data Minimization
>>> - Proportionality
>>> - Individual Use (Segmentation)
>>> 
>>> Specifically to the last one we should remove the "provided that such data is not used for operational behavior (profiling or personalization)" as this is covered by Individual Use (only used for that specific permitted use).
>>> 
>>> - Shane
>>> 
>>> -----Original Message-----
>>> From: David (Standards) Singer [mailto:singer@apple.com]
>>> Sent: Thursday, October 09, 2014 3:34 PM
>>> To: Shane M Wiley
>>> Cc: Justin Brookman; public-tracking@w3.org (public-tracking@w3.org)
>>> Subject: Re: ISSUE-24 (fraud detection and defense)
>>> 
>>> 
>>> On Oct 9, 2014, at 13:41 , Shane M Wiley <wileys@yahoo-inc.com> wrote:
>>> 
>>>> Justin,
>>>> 
>>>> I'd rather we drop the 2nd paragraph.  I've spoken to many security experts - including the security expert "live" at the October Sunnyvale F2F - and everyone agrees that the most effective security approach is to start with a larger net and then minimize from there as you're able to determine traffic is not suspicious.  Attempting to go the reverse direction is not effective and would allow many "bad guys" through.  As companies are all alone in this fight, we need every tool in our arsenal for this very specific permitted use.  Companies instead should look to data minimization and data segregation principles to ensure the proportional use of this information.
>>> 
>>> I think we can achieve that if we change the word/concept "graduated", which I understand you take to mean that you can only ramp up.  Instead, if we explain that you can't use this as a reason to collect all the data all the time, but instead, for example (this is not spec. text):
>>> 
>>> a) start with careful monitoring, and reduce from there as confidence grows and you learn what monitoring is effective to detect problems
>>> b) use light monitoring until a problem is detected, and then use focused heavier monitoring to determine its nature, details, origins etc.
>>> c) use statistical techniques or other quality control techniques for checking (e.g. random sampling)
>>> 
>>> We somehow have to leave it possible to do reasonable fraud detection and analysis while not leaving the barn door open to indifferent permanent collection. The first paragraph alone doesn't do this, unfortunately.
>>> 
>>> In addition, now we're back critiquing, the first paragraph implies that the only 'other use' restriction on this data is "not used for operational behavior (profiling or personalization) beyond what is reasonably necessary", but this is wrong.  Data collected for a permitted use must not be used for ANY other purpose; this is a general statement about all permitted uses.
>>> 
>>> Finally, I think that we may need to re-state the other general requirement on data collected for a permitted use: it has to be applicable to that use. Certain types of data are not going to help you either detect or analyze fraud, so don't collect it.  If you collect it, you should imagine that at some time you'll have to justify why it could be useful (or ideally was, in fact, used).
>>> 
>>> So, three problems:
>>> 
>>> * graduated response is only one way of achieving 'not everything all the time', we need to be more general here to allow security people to operate reasonably
>>> * this permitted use requires, like all permitted uses:
>>>   * that data collected for security and fraud detection and analysis needs to be usable for that purpose
>>>   * that it not be used for any other purpose
>>> 
>>>> 
>>>> - Shane
>>>> 
>>>> From: Justin Brookman [mailto:jbrookman@cdt.org]
>>>> Sent: Thursday, October 09, 2014 1:19 PM
>>>> To: public-tracking@w3.org (public-tracking@w3.org)
>>>> Subject: ISSUE-24 (fraud detection and defense)
>>>> 
>>>> Hello all, last October we came very close to final agreement on language on the fraud/security permitted use:
>>>> 
>>>> Regardless of the tracking preference expressed, data MAY be collected, retained, and used to the extent reasonably necessary to detect security incidents, protect the service against malicious, deceptive, fraudulent, or illegal activity, and prosecute those responsible for such activity, provided that such data is not used for operational behavior (profiling or personalization) beyond what is reasonably necessary to protect the service or institute a graduated response.
>>>> When feasible, a graduated response to a detected security incident is preferred over widespread data collection. An example would be recording all use from a given IP address range, regardless of DNT signal, if the party believes it is seeing a coordinated attack on its service (such as click fraud) from that IP address range. Similarly, if an attack shared some other identifiable fingerprint, such as a combination of User Agent and other protocol information, the party could retain logs on all transactions matching that fingerprint until it can be determined that they are not associated with such an attack or such retention is no longer necessary to support prosecution.
>>>> 
>>>> However, Shane strongly objected to the language and the issue has remained unresolved.  So I am inclined to go for a Call for Objections on the issue.  Shane, would your proposal just end in the first paragraph after "to protect the service"?  Or do you wish to propose something different?
>>>> 
>>>> Justin Brookman
>>>> Director, Consumer Privacy
>>>> Center for Democracy & Technology
>>>> 202.407.8812
>>>> @JustinBrookman
>>> 
>>> David Singer
>>> Manager, Software Standards, Apple Inc.
>>> 
>> 
>> David Singer
>> Manager, Software Standards, Apple Inc.
>> 
> 
> David Singer
> Manager, Software Standards, Apple Inc.
> 
>
Received on Wednesday, 15 October 2014 04:34:25 UTC