RE: New text Issue 25: Aggregated data: collection and use for audience measurement research

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Sun, 10 Mar 2013 23:31:21 -0000
To: "'Kathy Joe'" <kathy@esomar.org>
Cc: <public-tracking@w3.org>
Message-ID: <0bee01ce1de7$5ff42610$1fdc7230$@baycloud.com>
Hi Kathy,


If the market research data set is being retained for tabulation and
aggregation reruns, then there is no reason to keep the UIDs. The easiest
way to do that, and the most transparent, is to give the encoding cookies a
very short lifetime. The data records would still be addressable with a key
(the UID), but they would not be linkable back to the user-agent from which
they had been collected. In my opinion this is the only way that an
exemption for market research data (in the absence of DNT:0, aka Tracking
Consent) could be acceptable. Processing the retained data to remove
identifying data in the URLs (people's names, email addresses etc. that may
be in there), which is the only definition of pseudonymisation that makes
sense, is a good idea. I do not see much point in retaining truncated IP
addresses; they might as well be deleted.
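
To make the suggestion concrete, here is a minimal sketch in Python of the three steps described above: a UID cookie with a short lifetime, removal of identifying data from stored URLs, and deletion of the IP address field. The cookie name, the one-day lifetime, the list of sensitive query parameters, and the record layout are all assumptions chosen for illustration; none of them comes from a measurement standard or from the proposed text.

```python
import re
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical values, chosen only for illustration.
UID_COOKIE_NAME = "mr_uid"
UID_COOKIE_MAX_AGE = 24 * 60 * 60  # one day, in seconds
SENSITIVE_PARAMS = {"name", "email", "user", "username"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def short_lived_uid_cookie(uid: str) -> str:
    """Build a Set-Cookie header value whose UID expires after Max-Age."""
    cookie = SimpleCookie()
    cookie[UID_COOKIE_NAME] = uid
    cookie[UID_COOKIE_NAME]["max-age"] = UID_COOKIE_MAX_AGE
    cookie[UID_COOKIE_NAME]["path"] = "/"
    return cookie[UID_COOKIE_NAME].OutputString()


def scrub_url(url: str) -> str:
    """Drop sensitive query parameters, mask email addresses, drop the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, EMAIL_RE.sub("redacted", value))
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SENSITIVE_PARAMS
    ]
    path = EMAIL_RE.sub("redacted", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


def scrub_record(record: dict) -> dict:
    """Return a copy of a stored record without the IP and with a scrubbed URL."""
    cleaned = {key: value for key, value in record.items() if key != "ip"}
    cleaned["url"] = scrub_url(record["url"])
    return cleaned
```

The record keeps its UID as a key, so reruns can still group records, but once the cookie has expired nothing ties that key to a live user-agent. Pattern-based scrubbing of this kind will only catch what the patterns anticipate, so it should be read as a starting point rather than a guarantee.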




From: Kathy Joe [mailto:kathy@esomar.org] 
Sent: 09 March 2013 11:33
To: rigo@w3.org; public-tracking@w3.org
Cc: peter@peterswire.net; justin@cdt.org
Subject: New text Issue 25: Aggregated data: collection and use for audience
measurement research


Hi Rigo,

Thanks for your comments.  Reducing the calibration to a small percentage of
a specific group would create unreliable statistics because of bias, whilst
the objective of audience measurement research is to provide confidence in
the metrics.

The pseudonymised data is retained for that specific period so it can be
re-run if month by month checks are needed, as required by the audience
measurement standards defined by the joint industry bodies overseeing media
measurement around the world which also manage the auditing in their
particular market.



From: Rigo Wenning [mailto:rigo@w3.org]
To: public-tracking@w3.org
Cc: Kathy Joe [mailto:kathy@esomar.org], peter@peterswire.net,
Sent: Thu, 07 Mar 2013 18:15:03 +0100
Subject: Re: Fw: New text Issue 25: Aggregated data: collection and use for
audience measurement research

I think key points are: 

1/ Panels are not an issue as they are based on consent anyway. The 
question is rather how to leverage DNT:0 to better get to consent for 
panels. Everything that requires "out-of-band" stuff on the Web 
diminishes the utility of DNT and DNT:0. 

2/ There is the calibration part. The devil is in the detail here. 
What would be the smallest percentage of DNT:0 users in a given 
clickstream so that calibration could still happen? Because I think if 
calibration is done with DNT:0 users, there is no issue. (Get a web-wide 

Instead of keeping the data, what about aggregating on the fly and having 
the software certified by somebody? This would avoid the retention of 
data over 53 weeks. There is a very contentious discussion about data 
retention in Europe. Wanting 53 weeks data retention for DNT:1 while law 
enforcement will only get 24 weeks is a recipe for more contention. 
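
As a rough illustration of what "aggregate on the fly" could mean, the sketch below (Python) increments a per-content, per-month counter as each hit arrives and never stores the hit itself. The class name, the method names, and the choice of calendar month in UTC as the reporting bucket are assumptions made for this example only, not part of any proposal on the list.

```python
from collections import Counter
from datetime import datetime, timezone


class OnTheFlyAggregator:
    """Keeps only hit counts per (content, month); raw hits are never stored."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_hit(self, content_id: str, timestamp: datetime) -> None:
        # Reduce the timestamp to a month bucket; no UID, IP, or URL is kept.
        month = timestamp.astimezone(timezone.utc).strftime("%Y-%m")
        self.counts[(content_id, month)] += 1

    def report(self, month: str) -> dict:
        """Return {content_id: hits} for one month bucket such as '2013-03'."""
        return {cid: n for (cid, m), n in self.counts.items() if m == month}
```

The trade-off is that only the tabulations chosen in advance can be produced: a count like this cannot later be re-run along a dimension that was not kept, which is the re-run requirement that the retention period is meant to serve.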

=> Calibration is our central issue. How can we do calibration either 
with sufficient DNT:0 or without data collection that foils the DNT:1 
goals? This isn't easy.


On Wednesday 06 March 2013 13:34:22 Kathy Joe wrote:
> The panel output is calibrated by counting actual hits on tagged
> content and re-adjusting the results in order to ensure data produced
> from the panel accurately represents the whole audience. The counts
> must be pseudonymised. Counts are retained for sample, quality
> control, and auditing purposes during which time contractual measures
> must be in place to limit access to, and protect the data from other
> uses. A 53 week retention period is necessary so that month over
> month reports for a one year period may be re-run for quality
> checking purposes, after which the data must be de-identified. The
> counted data is largely collected on a first party basis, but to
> ensure complete representation, some will be third party placement.
> This collection tracks the content rather than involving the
> collection of a user's browser history.
Received on Sunday, 10 March 2013 23:32:01 UTC
