RE: action-190

Aleecia,

If our stance is a use-based one (you collect the data but do nothing with it outside of Permitted Uses) then a company is DNT compliant whether they hold the data for 1 day or 6 months.  It's from that perspective that I don't understand the goal here.

Attempting to state "what uses ARE NOT permitted within 6 weeks" and then also stating "what uses ARE permitted after 6 weeks" seems wasteful to me and is doubling the complexity of the problem.  I would suggest we either take one approach or the other and not attempt to address both simultaneously.

I personally like the "what uses ARE NOT permitted" as this was a simpler approach (no profiling, no passing to 3rd parties) but the group has far expanded the scope of the problem domain from online behavioral advertising to "all things online privacy" and therefore we've had to switch focus to a narrow permitted uses approach.

Please pick one.  :)

- Shane

From: Aleecia M. McDonald [mailto:aleecia@aleecia.com]
Sent: Wednesday, May 30, 2012 8:53 AM
To: public-tracking@w3.org (public-tracking@w3.org)
Subject: Re: action-190

Hi Shane,

My guess is you wrote this without seeing my reply to Ian. Short answer: the point of the six week window was, as I understood it in DC, to give companies time to hold data they otherwise would not be compliant to hold, without the burden of real-time data processing on log files to figure out what they may and may not write to the log files. I am seeing two concerns surface:
            - Responses that Ian is trying to sneak in some new type of data use during a six week loophole. I do not believe that is Ian's goal at all.
            - Shane's response below that giving companies more time to figure out what they have in their log files is a backdoor data retention attempt. I do not believe that was anyone's goal at all here, either.

This one is pretty close to a pure technical issue. As I watch responses, I cannot help thinking that too many people in the working group are jumping at shadows.

Just to be absolutely clear, in case people have forgotten the discussion in DC and missed it in Ian's original text, the end of the six week period is NOT "after this time you must delete all log files." Instead, by the end of the six week period the entity holding the log file must finish processing the data into a DNT-compliant form. The entity needs to be sure they are able to retain the data under DNT. Nothing after the initial six weeks is different from a use based approach, regardless of whether "use based" means many carve outs from frequency capping to anti-fraud with unique identifiers or if use based means any use you like so long as it is unidentifiable data. Either approach works the same way here.

I do not imagine anyone will read this prior to the call, but to keep things straight in my own head, let me give a go at what I believe the world looks like with or without a six week processing period for log files. I'll number these for ease of reference in follow up discussion.

A. Always a first party
            A1. Real time processing:                   Can use all log file data any way they like, provided it is not combined with data from other entities
            A2. Six week processing:                    Can use all log file data any way they like, provided it is not combined with data from other entities

B. Might be a first party, might be a third party
            B1. Real time processing:                   When unsure, must follow the rules for a third party (see below)
            B2. Six week processing:                    Can take the time to figure out what data to keep either as a first party, or as a third party under permitted uses. The result must match whatever promise the entity made while sending a response header (&c). Meanwhile, during that six weeks, the entity can use the data in ways that would be permitted after the six weeks. For example, a report that lists number of page views from DNT users would be sufficiently unidentifiable that it is fine. After six weeks, the results of aggregate reports are still fine because they are unidentifiable, but any raw data from a DNT:1 user that is not covered by a permitted use must be discarded.

C. Third party
            C1. Real time processing:                   Must not collect or use any data from a DNT:1 user that does not fall under a specific business use. That can mean either (1) never writing anything to a log file from a DNT:1 user, or (2) new engineering work to create a system that parses incoming data in real time to determine if it does or does not fall under a specific business use, and which one(s). Either way, any raw data from a DNT:1 user that is not covered by a permitted use must be discarded. I would like to understand how proponents think this would work for companies.
            C2. Six week processing:                    Can keep raw data for six weeks to understand which data is from DNT:1 users and which is not. For non-DNT:1 users, can keep data indefinitely or as otherwise promised by privacy policies, as today. For DNT:1 users, can keep data for specific users so long as that matches whatever promise the entity made while sending a response header (&c). Meanwhile, during that six weeks, the entity can use the data in ways that would be permitted after the six weeks. For example, a report that lists number of page views from DNT users would be sufficiently unidentifiable that it is fine. After six weeks, the results of aggregate reports are still fine because they are unidentifiable, but any raw data from a DNT:1 user that is not covered by a permitted use must be discarded.

D. Outsourced service on behalf of another party
            D1. Real time processing:                   Can use all log file data in any way the other party could. This means for DNT:1 users, if the outsourced service does not know if the primary party is first or third party in a given transaction, they must treat it as third party at the time of collection. This means for DNT:1 users, they must figure out if there are relevant business uses at the time of collection or not collect it at all. Again, I would like to understand how proponents think this would work for companies.
            D2. Six week processing:                    Can take the time to figure out what data to keep either as a first party, or as a third party under permitted uses. The result must match whatever promise the entity made while sending a response header (&c). Meanwhile, during that six weeks, the entity can use the data in ways that would be permitted after the six weeks. For example, a report that lists number of page views from DNT users would be sufficiently unidentifiable that it is fine. After six weeks, the results of aggregate reports are still fine because they are unidentifiable, but any raw data from a DNT:1 user that is not covered by a permitted use must be discarded.
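
As a rough illustration of the "six week processing" cases above (B2, C2, D2), here is a minimal Python sketch of a batch job run over stored log entries. It is only a sketch under stated assumptions: the LogEntry fields, the PERMITTED_USES labels, the function names, and treating "six weeks" as exactly 42 days are all hypothetical and do not come from the proposed text. It models retention only, not the limits on use during the window.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "six weeks" is modelled as exactly 42 days.
SIX_WEEKS = timedelta(weeks=6)

# Hypothetical labels; the actual permitted uses are defined by the spec text.
PERMITTED_USES = {"fraud_prevention", "frequency_capping"}


@dataclass
class LogEntry:
    timestamp: datetime
    dnt: bool                            # True if the request carried DNT:1
    url: str
    permitted_use: Optional[str] = None  # assigned by batch processing, if any


def aggregate_dnt_page_views(entries):
    """Unidentifiable report: page-view counts per URL for DNT:1 requests."""
    return Counter(e.url for e in entries if e.dnt)


def retain_after_window(entries, now):
    """Return the entries that may still be held at time `now`.

    Raw DNT:1 entries older than six weeks are kept only if they have been
    tied to a permitted use; everything else passes through unchanged.
    """
    kept = []
    for e in entries:
        within_window = now - e.timestamp <= SIX_WEEKS
        if not e.dnt or within_window or e.permitted_use in PERMITTED_USES:
            kept.append(e)
    return kept
```

Under these assumptions the aggregate report can be produced at any time, inside or outside the window, while retain_after_window is the step that has to run before the window closes.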

Please help me understand what I am missing. This is so much easier with a white board...

            Aleecia

On May 29, 2012, at 9:50 PM, Shane Wiley wrote:


Aleecia,

What does the six week period buy a 3rd party?  If our approach is "use based" (i.e. may only use data for a few very limited purposes) and those are in force from the moment you "collect" a log file entry, I'm not seeing why the six weeks is valuable.  As long as the data is not used for anything other than a Permitted Use, then the timeframe can be 1 second or 3 months prior to its use, unlinking, or destruction.  This seems like a backdoor approach to establishing an arbitrary data retention limit but I'm not seeing the value if the data is never "used" for anything outside of a Permitted Use.    If a 3rd party retains data for ANY of the Permitted Uses, it appears to trump this provision.  Is this simply to remove the risk of government intrusion requesting a 3rd party's raw log files?

I would recommend we remove this language altogether and move back to the current "use based" model for retention discussions.

- Shane

From: Aleecia M. McDonald [mailto:aleecia@aleecia.com]
Sent: Tuesday, May 29, 2012 2:46 PM
To: public-tracking@w3.org (public-tracking@w3.org)
Subject: action-190

In the midst of writing the agenda for tomorrow I realized I was spending too much space on log files and should pull this out into a different message.

To go back to the point of this issue, we are trying to find a way to give companies flexibility when they do not yet know what data they hold in a log file. We are trying to find a path such that they do not have to operate in real-time, with all of the engineering challenges entailed.

We have proposed text from Ian, which we discussed on the 9 May conference call. We ran into a few issues on the call:

            A. People not supporting Ian's text simply because they had not reviewed it. At this point there has been AMPLE time for review. We shall not have that issue again tomorrow.

            B. Confusion that Ian's proposal applies to first parties.
                        - My read is that some of this confusion stems from the mistaken notion that data after six weeks must be discarded, as opposed to processed. We may need to clarify the text to make that clear if that confusion is wide-spread. We can talk about this on the call if needed.
                        - As Roy points out, at the moment log files are written, it may not be clear if data are first- or third-party unless we want to insist on real-time processing, which is part of what we're trying to avoid in the first place. As such, any party that _could_ be collecting log file data as a third-party will run into wanting time to process their logs.

            SUGGESTION: we add additional text to point out that for those who know they are always only first parties, they can do as they like with log file data so long as they are in compliance with other first party data practices. That will be the end result either way, but we can make this clearer I think.

            C. Confusion around the notion of processing a log file as a one-time or multi-time event. The consensus we had in DC assumed processing as a one-time event: we were working on something like "you may hold log file data for a short time until you process it, at which time the data must then comply with DNT rules for you." What we have since heard from Ian is that log processing is something that happens on a rolling basis. We then started down a path of complexity of what would, or would not be, permitted uses for log file data, and that created a new wave of confusion and frustration. This led to a counter-proposal from Vincent (http://lists.w3.org/Archives/Public/public-tracking/2012May/0171.html) of:
                        Similarly, a data collector MUST NOT use the data for purposes other than those allowed outside of the six week period.

            SUGGESTION: we adopt Vincent's change, which simplifies much.

We might also refer to the rest of the text for details on the fraud use rather than attempt to characterize it here, and illustrate more clearly that this is not a block on first parties. Specifically, we might tighten the original text of:
                        As examples, a data collector MAY use the raw data within a six week period to debug their system, a data collector MAY use the raw data within the six week period to build a profile of a user fraudulently or maliciously accessing the system for purposes such as blocking access to the system by that user, but the data collector MUST NOT build a profile to serve targeted advertisements based on the user's past six weeks of browsing activity.

            to:
                        As examples, a data collector MAY use the raw data within a six week period for a permitted use like <link>fraud prevention</link> or to create reports with <link>unidentifiable data</link>, but a third party data collector MUST NOT build a profile to serve targeted advertisements based on the user's past six weeks of browsing activity.

Here's how that all rolls up together:

Protocol data, meaning data that is transmitted by a user agent, such as a web browser, in the process of requesting content from a provider, explicitly including items such as IP addresses, cookies, and request URIs, MAY be stored for a period of 6 weeks in a form that might not otherwise satisfy the requirements of this specification. For instance, the data may not yet be reduced to the subset of information allowed to be retained for permitted uses (such as fraud detection), and technical controls limiting access to the data for permitted uses may not be in place on things like raw logs data sitting on servers waiting for processing and aggregation into a centralized logs storage service.



Within this six week period, a data collector MUST NOT share data with other parties in a manner that would be prohibited outside of the six week period. Similarly, a data collector MUST NOT use the data for purposes other than would be allowed outside of the six week period. As examples, a data collector MAY use the raw data within a six week period for a permitted use like <link>fraud prevention</link> or to create reports with <link>unidentifiable data</link>, but a third party data collector MUST NOT build a profile to serve targeted advertisements based on the user's past six weeks of browsing activity.



After the six week period has passed, all other requirements of the DNT specification apply.

Let's talk this through on the call and get this closed tomorrow.

            Aleecia

Received on Wednesday, 30 May 2012 16:16:41 UTC