- From: Sigbjørn Vik <sigbjorn@opera.com>
- Date: Fri, 03 Sep 2010 12:34:01 +0200
- To: "Bryan McQuade" <bmcquade@google.com>, "Anderson Quach" <aquach@microsoft.com>
- Cc: "Jason Sobel" <jsobel@facebook.com>, "Zhiheng Wang" <zhihengw@google.com>, "public-web-perf@w3.org" <public-web-perf@w3.org>
Hi (new on this list, so forgive me if I lack some background),

For privacy reasons, I don't think allowing websites to read the history of third-party websites is a good idea. True, existing onload timing gives a rough indication of whether something is cached, which we can live with, but let us avoid making it worse. Even if embarrassingDomain.tld allows all other websites to read information about it, that doesn't imply that I as the user want this, particularly not to bigBrother.tld. If I visit embarrassingDomain.tld, I'd do it in private mode and/or delete the cache afterwards, maybe even do it in a different browser and/or restart the browser. Note that DNS information survives all of this (as it is cached in the OS and gateway), and a third-party website can still check whether I've been to embarrassingDomain.tld through DNS timing.

Note that embarrassingDomain.tld can explicitly opt in to communicating with third parties today using other technologies, but if I as a user have deleted my cookies and closed my tabs, embarrassingDomain.tld doesn't have any information to share, so even in those cases the user is in control.

I can just imagine a Facebook app: a list of websites users have recently visited, and a list of which websites are most popular among your friends... (Not trivial to implement, as reading a value also sets and thus destroys it, but doable.)

Some other random thoughts: such a Facebook app could already be made using CSS :visited styling, at least for still-cached history. Accurate third-party timing information also opens up cross-document messaging. A list of subdomains can each hold one bit (visited/not visited), and both parties can read and write these bits. Not directly a security problem, as this is strictly opt-in from both sites, but probably an unwanted side effect.
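As an editorial aside, the "one bit per subdomain" channel sketched above could look roughly like the following. All names (the `example.test` subdomains, the pixel URL) and the encoding scheme are illustrative assumptions, not anything from the mail or a real deployment; the browser-side "touch" step is only sketched, since it needs a DOM.

```javascript
// Sketch of the cross-document "subdomain bit" channel: each of N
// subdomains (b0.example.test .. bN-1.example.test) holds one bit,
// where "visited" (cached resource / warm DNS entry) = 1.

// Encode a small integer as the set of bit-subdomain indices to "touch".
function bitsToSet(value, width) {
  const indices = [];
  for (let i = 0; i < width; i++) {
    if ((value >> i) & 1) indices.push(i);
  }
  return indices; // subdomains the writing site should load from
}

// Decode: given which subdomains probed as "visited", recover the value.
function setToBits(indices, width) {
  let value = 0;
  for (const i of indices) {
    if (i < width) value |= 1 << i;
  }
  return value;
}

// Browser-side write: load a tiny resource from each chosen subdomain
// (hypothetical URL; shown as a definition only, since it needs a DOM).
function touch(i) {
  new Image().src = `https://b${i}.example.test/px.gif`;
}
```

The round trip (`setToBits(bitsToSet(v, w), w) === v`) is lossless; the lossy, probabilistic part in practice would be the timing probe that decides whether each subdomain reads as "visited".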
As a user, my browsing habits are personal, and I don't want to share them, even if they might be valuable to websites, and even if websites want them to be shared with other websites. (Just like cookies: even though websites want supercookies which can be shared among sites, this is disallowed.)

Users might want to explicitly opt in to sharing such information, so another possibility is that browsers add a preference toggle (defaulting to same-domain only) for timing information. We are unlikely to see zero performance impact from Resource Timing, so a user option might be the best way in any case. If we don't add this in the spec, privacy-conscious browsers/add-ons are likely to add a toggle anyway, and we'll see websites break due to unexpected JS behaviour.

Another option is that third-party DNS lookup timing information is off-limits in all cases, while other third-party timing information is available; but then the spec needs to ensure this isn't leaked by allowing a website to read all the other third-party timing information and deduce that the missing time is identical to the DNS lookup time. This still reduces privacy, though: it is possible to see which resources from a third-party site the user has loaded, and getting such information changes from an art to an exact science, thus making it easier.

On Fri, 03 Sep 2010 04:18:55 +0200, Anderson Quach <aquach@microsoft.com> wrote:

> Hi Bryan,
>
> Thanks for your thoughtful reply. I agree that much of the Resource
> Timing information, such as the time taken to retrieve and load a
> resource, can be easily determined, and thus whether or not the
> resource was cached can be easily discovered with script today.
>
> However, it has been brought to our attention that we should not
> make this a capability of new platform interfaces such as Resource
> Timing. We are actively brainstorming solutions to mitigate this
> privacy attack.
> In fact, I'd love to hear some of your thoughts on
> approaches to mitigate this issue.
>
> We also want the interface to be easy to use. One of our aspirations is
> to arrive at a solution where we can have Resource Timing on by default
> for all downloaded resources, though not at the cost of impeding the
> performance of the user-agent. This is an area where we will need
> technical investigations and prototypes.
>
> The goals for Resource Timing we have in mind are:
> * easy storage of and access to the resource timings
> * negligible impact on the user-agent's performance
> * efficient lifetime management of the resource timing objects
> * end-user security and privacy
>
> Best Regards,
> Anderson Quach
> IE Program Manager
>
> -----Original Message-----
> From: Bryan McQuade [mailto:bmcquade@google.com]
> Sent: Tuesday, August 31, 2010 8:56 AM
> To: Anderson Quach
> Cc: Jason Sobel; Zhiheng Wang; public-web-perf@w3.org
> Subject: Re: Resource Timing
>
> Hi Anderson and Zhiheng,
>
> I wanted to follow up on this and share more of my thoughts now that
> I've had more time to think about it.
>
> I do not know the background on the security decisions for resource
> timing. Zhiheng said: "In the example here, you can look into the DNS
> time and TCP time of the resource fetched from otherdomain.com and
> figure out if the user has recently (or even currently) visited
> otherdomain.com."
>
> This kind of information is already leaked by the browser, and it is
> relatively easy to ascertain whether a user has visited a site recently
> due to the shared nature of the browser cache. You are right that with
> resource timing it will be very clear (DNS/TCP times of zero), but it's
> easy enough to embed the URL of a resource known to be on some other
> page in your own page, and then time the onload for that resource. If
> it's short (under 10-15 ms), it's likely from cache, indicating that
> the user has visited the site.
> If you look at resource expirations from a
> site, you can even infer how recently the user visited that site. So
> the shared browser cache leaks more information today than the resource
> timing information would.
>
> I raise this issue because I do not expect we will see widespread
> adoption of this header-based opt-in approach, and the real value of
> resource timing hinges on web site operators being able to see how much
> latency is added by third-party content. You will see adoption of the
> opt-in headers for the big players like Google, Facebook, and MS, where
> they have the size to force their content providers to enable these
> headers (or in some cases where they run all the services and can just
> enable the headers themselves). But small hosters will not be able to
> force third-party providers to enable these headers, and they will be
> locked out of this valuable data by default.
>
> Further, adding new headers increases the weight of each response, which
> works against the goal of making the web faster.
>
> I hope you will consider providing this data by default. Resource timing
> is going to be very useful and will empower web site owners to
> understand what's slowing their sites down, and allow them to put
> pressure on slow third-party content providers. But this can happen
> only if the site providers have access to the data they need, which is
> why I am advocating making it available by default.
>
> -Bryan

-- 
Sigbjørn Vik
Quality Assurance
Opera Software
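As an editorial aside, the shared-cache probe Bryan describes in the quoted message could be sketched as below. The ~15 ms threshold is the heuristic from the mail; the URL, function names, and the use of `Image` with `Date.now()` timing are illustrative assumptions, and the browser-side `probe` is a definition only, since it needs a DOM.

```javascript
// Heuristic from the mail: an onload under roughly 10-15 ms suggests
// the resource came from the local cache, i.e. the user visited the
// page that embeds it.
const CACHE_THRESHOLD_MS = 15;

// Classify a measured load duration. Pure function, so it can be
// tested without a browser.
function classifyLoad(durationMs) {
  return durationMs < CACHE_THRESHOLD_MS ? "likely-cached" : "likely-network";
}

// Browser-side measurement: embed a resource known to be on some other
// site and time its onload (hypothetical URL; requires a DOM).
function probe(url, callback) {
  const img = new Image();
  const start = Date.now();
  img.onload = img.onerror = () =>
    callback(classifyLoad(Date.now() - start));
  img.src = url;
}
```

This illustrates Bryan's point that the shared cache already leaks visit information today; Resource Timing with explicit DNS/TCP times of zero would merely make the same inference exact rather than statistical.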
Received on Friday, 3 September 2010 10:34:59 UTC