
Re: Resource Timing

From: Nik Matsievsky <speed@webo.name>
Date: Sun, 05 Sep 2010 19:09:40 +0400
Message-ID: <4C83B2B4.3030803@webo.name>
To: Sigbjørn Vik <sigbjorn@opera.com>
CC: Bryan McQuade <bmcquade@google.com>, Anderson Quach <aquach@microsoft.com>, Jason Sobel <jsobel@facebook.com>, Zhiheng Wang <zhihengw@google.com>, "public-web-perf@w3.org" <public-web-perf@w3.org>
about security issues for performance timing measurement. I don't think
that the following is a very good idea, but it can be a compromise
between security and performance.

We could cache all timing information for a given resource when the user
agent loads it for the first time, just as browsers already cache valuable
HTTP headers (Cache-Control, ETag, Content-Type, maybe something more).
If a web app then asks for the timing info, it gets this first-fetch
data and cannot determine whether the 3rd-party asset is currently
cached. This makes it impossible to tell whether a resource is cached
(browsers could apply the technique to 3rd-party resources only), but it
lets browsers provide all timings for the asset without sharing the
user's private data.
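
To make the idea concrete, here is a minimal TypeScript sketch of such a
first-fetch store. Everything in it is an assumption for illustration:
the ResourceTiming fields, the FirstFetchTimingStore class and the
example URLs are hypothetical and not part of any browser or draft spec.

```typescript
// Hypothetical per-resource timing record; field names are illustrative only.
interface ResourceTiming {
  dnsMs: number;      // DNS lookup time
  connectMs: number;  // TCP connect time
  responseMs: number; // time to receive the response
}

// Sketch of a user-agent-side store that keeps the timings of the FIRST fetch.
class FirstFetchTimingStore {
  private readonly firstFetch = new Map<string, ResourceTiming>();

  // Assumed to be called by the network layer after every load.
  // Only the first call per URL is kept; later loads (cache hits) are ignored.
  record(url: string, timing: ResourceTiming): void {
    if (!this.firstFetch.has(url)) {
      this.firstFetch.set(url, { ...timing });
    }
  }

  // What a page may see: real timings for same-origin resources,
  // the stored first-fetch timings for 3rd-party resources.
  timingFor(
    pageOrigin: string,
    resourceOrigin: string,
    url: string,
    actual: ResourceTiming
  ): ResourceTiming {
    if (pageOrigin === resourceOrigin) {
      return actual;
    }
    return this.firstFetch.get(url) ?? actual;
  }
}

// Usage: a first uncached load, then a later cache hit of the same asset.
const timingStore = new FirstFetchTimingStore();
const libUrl = "https://cdn.example/lib.js";
timingStore.record(libUrl, { dnsMs: 40, connectMs: 30, responseMs: 120 });

const cachedLoad: ResourceTiming = { dnsMs: 0, connectMs: 0, responseMs: 1 };
timingStore.record(libUrl, cachedLoad); // ignored, first fetch already stored

const seenByThirdParty = timingStore.timingFor(
  "https://page.example", "https://cdn.example", libUrl, cachedLoad);
const seenBySameOrigin = timingStore.timingFor(
  "https://cdn.example", "https://cdn.example", libUrl, cachedLoad);
```

In this sketch the 3rd-party page sees the original network timings even
though the asset came from cache, so the zero DNS/TCP times that would
reveal a cache hit are never exposed to it.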

Maybe we could also check headers or a file on the external (3rd-party)
domain, similar to what Flash apps do. And/or we could make this optional:
only if the user wants to allow any web app to collect his/her browsing
history this way could it collect all timings for all resources.
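
A rough TypeScript sketch of that check follows. It assumes the 3rd-party
domain publishes a comma-separated list of allowed origins (via a header
or a policy file, no concrete name is proposed here), and it assumes the
"and/or" is resolved so that either the site's permission or the user's
global opt-in is enough. The function names are hypothetical.

```typescript
// Parse a comma-separated allow-list as it might appear in a (hypothetical)
// opt-in header or policy file on the 3rd-party domain.
function parseAllowList(value: string | undefined): string[] {
  if (value === undefined) {
    return [];
  }
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Decide whether detailed timings of a 3rd-party resource may be shown
// to the page at pageOrigin.
// Assumption: the user's opt-in OR the 3rd party's permission suffices.
function mayExposeTiming(
  pageOrigin: string,
  allowList: string[],
  userOptedIn: boolean
): boolean {
  if (userOptedIn) {
    return true;
  }
  return allowList.indexOf("*") !== -1 || allowList.indexOf(pageOrigin) !== -1;
}

// Usage with made-up origins.
const allowed = parseAllowList(" https://a.example , https://b.example ");
const aMaySee = mayExposeTiming("https://a.example", allowed, false);
const cMaySee = mayExposeTiming("https://c.example", allowed, false);
const cMaySeeWithOptIn = mayExposeTiming("https://c.example", allowed, true);
```

A stricter variant could require both conditions by replacing the early
return with a logical AND.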

> Hi
> (new on this list, so forgive me if I lack some background)
>
> For privacy reasons, I don't think allowing websites to read the
> history of third party websites is a good idea. True, existing onload
> timing gives a rough indication of something being cached, which we
> can live with, but let us avoid making it worse. Even if
> embarrassingDomain.tld allows all other websites to read information
> about it, that doesn't imply that I as the user want this,
> particularly not to bigBrother.tld. If I visit embarrassingDomain.tld,
> I'd do it in private mode and/or delete the cache afterwards, maybe
> even do it in a different browser and/or restart the browser. Note
> that DNS information survives all of this (as it is cached in the OS
> and gateway), and a third party website can still check if I've been
> to embarrassingDomain.tld through DNS timing. Note that
> embarrassingDomain.tld can explicitly opt in to communicate with third
> parties today using other technologies, but if I as a user have
> deleted my cookies and closed my tabs, embarrassingDomain.tld doesn't
> have any information to share, so even in those cases the user is in
> control.
>
> I can just imagine facebook's app, a list of websites users have
> recently visited, and a list of which websites are most popular among
> your friends... (Not trivial to implement as reading a value also sets
> and thus destroys it, but doable.)
>
> Another random thought is that such a facebook app could already
> be made using CSS visited styling, at least for still cached history.
> Accurate third party timing information also opens up for
> cross-document messaging. A list of subdomains can each hold one bit
> (visited/not visited), and both parties can read and write to these
> bits. Not directly a security problem as this is strictly opt-in from
> both sites, but probably an unwanted side effect.
>
> As a user, my browsing habits are personal, and I don't want to share
> those, even if they might be valuable to websites, and even if
> websites want them to be shared with other websites. (Just like
> cookies, even though websites want supercookies which can be shared
> among sites, this is disallowed.) Users might want to explicitly opt
> in to sharing such information, so another possibility is that
> browsers add a preference toggle (default same-domain only) for timing
> information. We are unlikely to see 0 performance impact of Resource
> Timing, so a user option might be the best way in any case. If we
> don't add this in the spec, privacy conscious browsers/add-ons are
> likely to add a toggle in any case, and we'll see websites break due
> to unexpected JS behaviour.
>
> Another option is that third party DNS lookup timing information is
> off-limits in all cases, but other third party timing information is
> available, but then the spec needs to ensure this isn't leaked by
> allowing a website to read all other third party timing information
> and deduce that the missing time is identical to the DNS lookup time.
> This still reduces privacy though, it is possible to see which
> resources from a third party site the user has loaded, and getting
> such information changes from an art to an exact science, thus making
> it easier.
>
>
> On Fri, 03 Sep 2010 04:18:55 +0200, Anderson Quach
> <aquach@microsoft.com> wrote:
>
>> Hi Bryan,
>>
>> Thanks for your thoughtful reply. I agree that much of the Resource
>> Timing information such as the time taken to retrieve and load a
>> resource can be easily determined and thus figuring out whether or
>> not the resource was cached can be easily discovered with script today.
>>
>> However, it has been brought to our attention that we should not
>> make this a capability of new platform interfaces such as
>> Resource Timing.  We are actively brainstorming solutions to mitigate
>> this privacy attack. In fact, I'd love to hear some of your thoughts
>> on approaches to mitigate this issue.
>>
>> We also want the interface to be easy to use. One of our aspirations
>> is to be able to arrive at a solution where we can have Resource
>> Timing on by default for all downloaded resources, however, not at
>> the cost of impeding performance of the user-agent. This is an area
>> where we will need technical investigations and prototypes.
>> The goals for Resource Timing we have in mind are:
>> * ease of store and access to the resource timings
>> * negligibly impacting the user-agent's performance
>> * efficient lifetime management of the resource timing objects
>> * end-user security and privacy conscious
>>
>> Best Regards,
>> Anderson Quach
>> IE Program Manager
>>
>> -----Original Message-----
>> From: Bryan McQuade [mailto:bmcquade@google.com]
>> Sent: Tuesday, August 31, 2010 8:56 AM
>> To: Anderson Quach
>> Cc: Jason Sobel; Zhiheng Wang; public-web-perf@w3.org
>> Subject: Re: Resource Timing
>>
>> Hi Anderson and Zhiheng,
>>
>> I wanted to follow up on this and share more of my thoughts now that
>> I've had more time to think about it.
>>
>> I do not know the background on the security decisions for resource
>> timing. Zhiheng said: "In the example here, you can look into the DNS
>> time and TCP time of the resource fetched from otherdomain.com and
>> figure if the user has recently (or even currently) visited
>> otherdomain.com."
>>
>> This kind of information is already leaked by the browser and it is
>> relatively easy to ascertain whether a user has visited a site
>> recently due to the shared nature of the browser cache. You are right
>> that for resource timing it will be very clear (dns/tcp times of
>> zero) but it's easy enough to embed the URL of a resource known to be
>> on some other page in your own page, and then time the onload for
>> that resource. If it's short (under 10-15ms) it's likely from cache,
>> indicating that the user has visited the site. If you look at
>> resource expirations from a site you can even infer how recently the
>> user visited that site. So the shared browser cache leaks more
>> information today than the resource timing information would.
>>
>> I raise this issue because I do not expect we will see widespread
>> adoption of this header-based opt-in approach, and the real value of
>> resource timing hinges on web site operators being able to see how
>> much latency is added by third party content. You will see adoption
>> of the opt-in headers for the big players like google, facebook, and
>> ms, where they have the size to force their content providers to
>> enable these headers (or in some cases where they run all the
>> services and can just enable these headers themselves). But small
>> hosters will not be able to force third party providers to enable
>> these headers, and they will be locked out of this valuable data by
>> default.
>>
>> Further, adding new headers increases the weight of each response,
>> which works against goals of making the web faster.
>>
>> I hope you will consider providing this data by default. Resource
>> timing is going to be very useful and will empower web site owners to
>> understand what's slowing their sites down, and allow them to put
>> pressure on the slow third party content providers. But this can
>> happen only if the site providers have access to the data they need,
>> which is why I am advocating to make it available by default.
>>
>> -Bryan
>
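
For reference, the onload-timing probe Bryan describes in the quoted
message can be sketched roughly as below. This is only an illustration:
the 15 ms default is the upper end of the "10-15ms" range he mentions,
the Probe type and function names are hypothetical, and the measured
load times are supplied by hand rather than taken from a real page.

```typescript
// One timed load of a resource known to belong to some other site.
interface Probe {
  site: string;   // site the probed resource belongs to
  loadMs: number; // time from setting the URL to the onload event
}

// A very short load suggests the resource was served from the shared cache.
function likelyFromCache(loadMs: number, thresholdMs: number = 15): boolean {
  return loadMs < thresholdMs;
}

// Sites whose probed resource appears to be cached, i.e. probably visited.
function probablyVisited(probes: Probe[], thresholdMs: number = 15): string[] {
  return probes
    .filter((probe) => likelyFromCache(probe.loadMs, thresholdMs))
    .map((probe) => probe.site);
}

// Usage with made-up measurements.
const visited = probablyVisited([
  { site: "a.example", loadMs: 8 },
  { site: "b.example", loadMs: 140 },
]);
```

This is the leak that already exists without Resource Timing; exact
DNS/TCP times would only make the same guess more precise.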


-- 
Thank you,
Nik Matsievsky, WEBO Software, www.webogroup.com
+7 926 7281964 / skype:nikolay.matsievsky
Received on Monday, 6 September 2010 07:21:58 UTC