Re: action-190 from Aleecia M. McDonald on 2012-05-30 (public-tracking@w3.org from May 2012)

From: Aleecia M. McDonald <aleecia@aleecia.com>
Date: Tue, 29 May 2012 18:51:46 -0700
To: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <468975EA-57F3-4452-B36A-E39B8FF82790@aleecia.com>
Ah! OK. Thanks, Ian; this helps, and I also see a place where I am still not understanding you. 

I think the benefit is "you may hold data that you later realize does not comply with DNT for you, and you do not need to figure it out one way or another for six weeks." In other words, the advantage is avoiding real-time processing of log files.  

Anything that boils down to "you may do these special things in the first six weeks that you cannot do after week six" starts to look like a loophole. I do not believe that is what you are trying to do. Vincent's text neatly avoids the need for a new list of permitted uses for log files, or some other howling madness. 

The use case I believe we are trying to solve for, and the one that came up after we reached consensus in DC (that is, the reason we are not just writing down what we already agreed to), is this: a company wants to run a report every day that gives a monthly total (of page views, number of unique users, whatever). For the first six weeks, we say it is OK for the company to query the raw log file directly in order to pull out aggregate data. The resulting daily report contains non-identifiable aggregate data, which would be in full compliance with DNT. However, the raw data in the log file is very much identifiable, and holding that raw data might otherwise not comply with DNT (for example, for third parties not covered by a business use or user consent). For six weeks, we say that's OK.

After six weeks, companies may still produce daily reports with non-identifiable data as before, but must also get their house in order and hold on only to the data they are allowed to retain under DNT in their situation (e.g. based on their party type, any applicable business uses, and user consent). So the part that is the same in all cases is that the final use must be one that will still be permitted after the six weeks are up. The part that changes is the form of the data: raw or processed, where raw data may otherwise not be DNT-compliant.

Conversely, a third-party company could not, for example, sell a list of the websites that user Jane Smith has visited, since that use is not going to be DNT-compliant after six weeks; the question there is no longer whether the raw data will be compliant but whether the use of the data is.
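
To make the daily-report use case concrete, here is a minimal Python sketch, assuming a hypothetical raw log record holding a timestamp, IP address, cookie ID, and URL, and assuming unique users are counted by cookie ID. The names `LogRecord`, `RETENTION`, `split_by_retention`, and `monthly_report` are illustrative only and come from no draft text. The report emits only counts, while the raw records carry identifiers and are held only until the six-week window closes. Since a calendar month is at most 31 days and six weeks is 42 days, a month-to-date report run daily only ever needs raw records that are still inside the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the proposal's "period of 6 weeks" modeled as a fixed timedelta.
RETENTION = timedelta(weeks=6)


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical raw log entry: identifiable protocol data."""
    timestamp: datetime
    ip: str
    cookie_id: str
    url: str


def split_by_retention(records, now):
    """Split raw records into those still inside the six-week window and
    those that have aged out (exactly 42 days or older) and must be
    processed into a compliant form or discarded."""
    fresh = [r for r in records if now - r.timestamp < RETENTION]
    expired = [r for r in records if now - r.timestamp >= RETENTION]
    return fresh, expired


def monthly_report(records, year, month):
    """Aggregate one calendar month of raw records into counts only.

    The returned dict holds no IPs, cookies, or URLs, only totals.
    Assumption: one cookie ID is treated as one unique user.
    """
    in_month = [r for r in records
                if r.timestamp.year == year and r.timestamp.month == month]
    return {
        "page_views": len(in_month),
        "unique_users": len({r.cookie_id for r in in_month}),
    }


if __name__ == "__main__":
    now = datetime(2012, 5, 30)
    raw = [
        LogRecord(datetime(2012, 5, 1), "198.51.100.1", "a", "/home"),
        LogRecord(datetime(2012, 5, 2), "198.51.100.2", "b", "/news"),
        LogRecord(datetime(2012, 5, 2), "198.51.100.1", "a", "/sports"),
        LogRecord(datetime(2012, 4, 1), "198.51.100.3", "c", "/home"),
    ]
    fresh, expired = split_by_retention(raw, now)
    print(monthly_report(fresh, 2012, 5))  # {'page_views': 3, 'unique_users': 2}
    print(len(expired))  # 1
```

The sketch deliberately does not decide what happens to the `expired` records; under the proposal they would have to be reduced to whatever the collector may retain for its party type, business uses, and user consent, or else deleted.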

Does this make sense? 

I *think* what I am suggesting matches what I have heard from the group. But Ian, I am sure I am not fully understanding you. From your message below, you want time to get data into a form that meets DNT, yet you think that time is not a benefit? Or perhaps you mean something very specific by "assertions" that is tied to the discussion on auditing? I am missing something; I do not understand what trouble you are running into. 

There are two other obvious alternatives that do not involve real-time processing: (1) no logging for DNT:1 users unless you are always a first party, or (2) bifurcating the log files to keep one log for DNT:1 and one for everything else. These options might be useful to add as non-normative suggestions for any company that finds it easier to handle its log files that way. But this is an optional enhancement; it does not solve our issue.
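
Both alternatives can be sketched in a few lines of Python. This is a minimal illustration, assuming the request headers arrive as a plain dict and the preference is sent as a `DNT: 1` header; `dnt_enabled` and `write_log_line` are hypothetical names, and a real server would hook this into its own logging pipeline rather than write to in-memory buffers.

```python
import io


def dnt_enabled(headers):
    """Return True if the request carried "DNT: 1".

    HTTP field names are case-insensitive, so match the name that way.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def write_log_line(headers, line, dnt_log, other_log, skip_dnt=False):
    """Route one access-log line according to the request's DNT header.

    Default: alternative (2), separate logs for DNT:1 and everything else.
    skip_dnt=True: alternative (1), no logging of DNT:1 requests, for a
    collector that is not always a first party.
    """
    if dnt_enabled(headers):
        if skip_dnt:
            return  # alternative (1): drop the DNT:1 line entirely
        dnt_log.write(line + "\n")
    else:
        other_log.write(line + "\n")


if __name__ == "__main__":
    dnt_log, other_log = io.StringIO(), io.StringIO()
    write_log_line({"DNT": "1"}, "GET /a", dnt_log, other_log)
    write_log_line({"User-Agent": "demo"}, "GET /b", dnt_log, other_log)
    write_log_line({"dnt": "0"}, "GET /c", dnt_log, other_log)
    print(repr(dnt_log.getvalue()))    # 'GET /a\n'
    print(repr(other_log.getvalue()))  # 'GET /b\nGET /c\n'
```

In both variants the routing decision uses only the request header at write time, so it needs no real-time knowledge of whether the collector was acting as a first or third party for that request.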

As for auditing -- perhaps someone can help here or on the call tomorrow. I agree that thinking about how things work in practice is a very good and useful thing. Whatever choices we make, let us try to make them informed.

The last call left me frustrated enough (I doubt I am alone...) that I am trying to add a bit more structure and nudge things a little more. It seems as if this issue is one where all parties are basically trying to do the same thing yet we are still struggling as a group to find an acceptable end point, which usually means a bit of time and patience will let us work things through. Unless I am still missing something major, we ought to be able to get this done tomorrow.

 Aleecia

On May 29, 2012, at 4:31 PM, Ian Fette (イアンフェッティ) wrote:

> Thanks for the summary, Aleecia. What you describe as "C" around allowed uses is the thing that I'm having trouble figuring out how we pull off. Presumably, once the logs are processed and in their final resting place, you have to be able to stand up to an audit / inquiry of some sort: "prove to me you're honoring whatever commitments you made w.r.t. DNT". If we have strict requirements from time zero, "a data collector MUST NOT use the data for purposes other than those allowed outside of the six week period," then what have we gained? If I have to be able to make the same assertions from time zero that I would have to make at time t+6wk, then it seems like there is no benefit to the six week period at all; it is fundamentally no different from the period after six weeks as best as I can tell.
> 
> What I was trying to achieve was to say "Look, during this six week period, you have some time to get the data into a form or a system that meets the requirements of DNT."
> 
> I think it's instructive for people to think about how an audit might work / what it might imply. I don't mean to suggest that this group write audit requirements or guidelines, but merely think whether what we're proposing is actually implementable. 
> 
> -Ian
> 
> On Tue, May 29, 2012 at 2:46 PM, Aleecia M. McDonald <aleecia@aleecia.com> wrote:
> In the midst of writing the agenda for tomorrow I realized I was spending too much space on log files and should pull this out into a different message.
> 
> To go back to the point of this issue, we are trying to find a way to give companies flexibility when they do not yet know what data they hold in a log file. We are trying to find a path such that they do not have to operate in real time, with all of the engineering challenges that entails.  
> 
> We have proposed text from Ian, which we discussed on the 9 May conference call. We ran into a few issues on the call:
> 
>  A. People not supporting Ian's text simply because they had not reviewed it. At this point there has been AMPLE time for review. We shall not have that issue again tomorrow.
> 
>  B. Confusion about whether Ian's proposal applies to first parties. 
>   - My read is that some of this confusion stems from the mistaken notion that data must be discarded after six weeks, as opposed to processed. If that confusion is widespread, we may need to revise the text to make this clear. We can talk about this on the call if needed.
>   - As Roy points out, at the moment log files are written, it may not be clear if data are first- or third-party unless we want to insist on real-time processing, which is part of what we're trying to avoid in the first place. As such, any party that _could_ be collecting log file data as a third-party will run into wanting time to process their logs.
> 
>  SUGGESTION: we add text pointing out that those who know they are only ever first parties can do as they like with log file data, so long as they comply with other first-party data practices. That will be the end result either way, but I think we can make this clearer.
> 
>  C. Confusion around whether processing a log file is a one-time or a recurring event. The consensus we had in DC assumed processing was a one-time event: we were working on something like "you may hold log file data for a short time until you process it, at which time the data must then comply with DNT rules for you." What we have since heard from Ian is that log processing happens on a rolling basis. We then started down a complicated path of deciding which uses of log file data would or would not be permitted, and that created a new wave of confusion and frustration. This led to a counter-proposal from Vincent (http://lists.w3.org/Archives/Public/public-tracking/2012May/0171.html) of: 
>   Similarly, a data collector MUST NOT use the data for purposes other than those allowed outside of the six week period.
> 
>  SUGGESTION: we adopt Vincent's change, which simplifies much. 
> 
> We might also refer to the rest of the text for details on the fraud use rather than attempt to characterize it here, and illustrate more clearly that this is not a block on first parties. Specifically, we might tighten the original text of:
>   As examples, a data collector MAY use the raw data within a six week period to debug their system, a data collector MAY use the raw data within the six week period to build a profile of a user fraudulently or maliciously accessing the system for purposes such as blocking access to the system by that user, but the data collector MUST NOT build a profile to serve targeted advertisements based on the user's past six weeks of browsing activity.
> 
>  to: 
>   As examples, a data collector MAY use the raw data within a six week period for a permitted use like <link>fraud prevention</link> or to create reports with <link>unidentifiable data</link>, but a third party data collector MUST NOT build a profile to serve targeted advertisements based on the user's past six weeks of browsing activity. 
> 
> Here's how that all rolls up together:
> Protocol data, meaning data that is transmitted by a user agent, such as a web browser, in the process of requesting content from a provider, explicitly including items such as IP addresses, cookies, and request URIs, MAY be stored for a period of 6 weeks in a form that might not otherwise satisfy the requirements of this specification. For instance, the data may not yet be reduced to the subset of information allowed to be retained for permitted uses (such as fraud detection), and technical controls limiting access to the data for permitted uses may not be in place on things like raw logs data sitting on servers waiting for processing and aggregation into a centralized logs storage service.
> 
> Within this six week period, a data collector MUST NOT share data with other parties in a manner that would be prohibited outside of the six week period. Similarly, a data collector MUST NOT use the data for purposes other than would be allowed outside of the six week period. As examples, a data collector MAY use the raw data within a six week period for a permitted use like <link>fraud prevention</link> or to create reports with <link>unidentifiable data</link>, but a third party data collector MUST NOT build a profile to serve targeted advertisements based on the user's past six weeks of browsing activity.
> 
> After the six week period has passed, all other requirements of the DNT specification apply.
> Let's talk this through on the call and get this closed tomorrow. 
> 
>  Aleecia
>
Received on Wednesday, 30 May 2012 01:52:16 UTC