Re: Media Queries and optimizing what data gets transferred

On Mon, Jan 28, 2013 at 9:10 PM, Ilya Grigorik <ilya@igvita.com> wrote:
> (2) Even if you provided all the markup, preload scanners in browsers don't
> always have all the viewport / CSS information at hand when kicking off
> request prefetching. Server negotiation resolves this.

In your proposal, server negotiation involves putting data in a
request header. How would the browser's HTTP stack have more
information than its preload scanners, etc.?

> (3) Your argument for "friendly to intermediate caches" is bogus. Vary
> exists for a reason.

Vary exists so that a cache can know when the bytes it has cached are
*not* valid for serving to a given request.

I think the argument that's bogus is calling something cache-friendly
when what Vary ends up doing is making cache entries practically never
usable without checking with the origin server. That is friendlier
than not being able to check with the origin server at all, but it's
substantially less friendly than using URL-only cache keys, where the
cache can respond without consulting the origin server whenever it has
bytes for the requested URL.
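
To make that concrete, consider a schematic exchange (the CH header
name and its values here are illustrative, not a concrete syntax
proposal):

    Client A:  GET /photo.jpg
               CH: dpr=1.0, dw=320

    Cache stores the 200 response, which carries Vary: CH.

    Client B:  GET /photo.jpg
               CH: dpr=2.0, dw=640

    The stored entry does not match client B's CH value, so the
    cache has to go back to the origin server before answering.

With URL-only cache keys (photo-320.jpg, photo-640.jpg, ...), the
cache could have answered client B immediately from any entry it
already held.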

> CDNs don't want to be "dumb" either - they're all
> increasingly offering edge optimization services. And guess what: people are
> buying their services!

Well, obviously, since the client-side functionality is not there at
present. Your solution is not there currently, either. That sites are
using the option(s) currently available to them is no proof that your
currently non-deployed solution, which resembles the currently
available one, is better than a different, presently non-deployed
solution.

> While, as I said earlier, I'm all for markup
> solutions, in practice many people are (rightfully) willing to pay a CDN or
> invest into own deployments, which can automate the problem. The cost of
> your development time >>> cost of automating the problem.

This assumes that solutions that generate multiple representations,
store them ahead of time, and give the browser a list of those
representations would be non-automatable.

>> Does anyone with a popular site (but not as popular as Google)
>> actually want to rescale images on the fly (as opposed to letting
>> Opera do it for them)? I really doubt that on-the-fly scaling to
>> precise client dimensions would become popular across the Web.
>> Instead, I expect sites to generate a handful of pre-scaled image files
>> and then arrange something for choosing between them. In such a
>> scenario where the set of pre-scaled images is known in advance, the
>> design trade-offs become similar to the <video> codec negotiation
>> case. If each scaled image has its distinct URL, the site can declare
>> all the available scaled images in HTML and let the browser choose.
>> And this will again work efficiently without any logic in CDNs.
>
>
> Yes, plenty of sites, both large and small. A random sample of examples:
>
> http://developer.wordpress.com/docs/photon/
> http://adaptive-images.com/

Do these two scale images on a per-request basis or do they cache a
few pre-scaled versions?

> http://docs.sencha.io/current/index.html#!/guide/src

Does this want to be scaling images on a per-request basis or is it
doing that because srcset/<picture> is not available?

> Once again, let's not pose CDNs as an adversary - they're not. Instead,
> they are the ones who can help us make the web faster.

Currently, if you tell people to make their sites https, they
complain that CDNs charge more for https. Surely, CDNs will charge
more for solutions that involve content negotiation on the CDN than
for solutions that involve a 1-to-1 mapping between URLs and the bytes
to serve.

I'm not saying that what you suggest could not be implemented on the
CDNs. I'm saying that a solution where the choice of representation
happens on the client and the bytes served for each URL don't depend
on server-side negotiation can be deployed without participation from,
or changes on, the CDN side.

Your proposal needs participation from both browsers and CDNs.
Non-eager Media Queries for CSS or srcset/<picture> for images need
more participation from browsers but no participation from CDNs.
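
For example, with srcset the pre-scaled variants are declared up
front and each one has its own stable URL (schematic markup; the file
names are made up):

    <img src="photo-320.jpg"
         srcset="photo-640.jpg 2x, photo-960.jpg 3x"
         alt="...">

A dumb cache can serve whichever file the browser picks by URL alone,
with no negotiation logic anywhere on the server side.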

>> > Having said that, images is just one example. Client-Hints is a generic,
>> > cache-friendly transport for client-server negotiation.
>>
>> Why do you characterize Client-Hints as cache-friendly? It seems to me
>> that with Vary: Client-Hints, even the local cache gets invalidated if
>> the user rotates the device 90° or if the current bandwidth
>> estimate changes.
>
> It's cache-friendly compared to any other existing alternative...

I think it's a mistake to compare it only to existing alternatives. I
think you should compare your non-deployed solution to other potential
solutions (which of course are also non-deployed at present).

In the case of loading stylesheets, I think you should compare your
solution to the solution of being able to opt out of synchronous CSSOM
access and browsers promising not to eagerly download inapplicable
stylesheets that have been opted out of synchronous CSSOM access.
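
As a sketch of what I mean (the attribute here is purely
hypothetical, not a proposed spelling):

    <!-- "deferred" is a hypothetical opt-out from synchronous
         CSSOM access -->
    <link rel="stylesheet" href="wide.css"
          media="(min-width: 600px)" deferred>

A browser that sees the opt-out would be free not to fetch wide.css
until the media query actually matches.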

In the case of images, I think you should compare your solution to
srcset and <picture> proposals.

> Finally, yes, by definition Vary *will* add more variants of the resource to
> your cache. But the fact that you can cache to begin with is already a
> breakthrough.

It's not really that much of a breakthrough if you can virtually never
serve bytes from the cache without checking back with the origin
server.

> Let's take bw off the table - I'm removing it from the spec. Before we talk
> about BW in Client-Hints, we need to fix NetInfo api.

That still leaves a plethora of different width, height, and pixel
density combinations, but without bandwidth, having an exact cache hit
isn't *completely* hopeless. Still pretty bad.

>> >>  * If the origin server doesn't get ETags right, intermediate caches
>> >> end up having a distinct copy of the data for each distinct
>> >> Client-Hints header value even if there is a smaller number of
>> >> different data alternatives on the origin server.
>> >
>> > Etags has *nothing* to do with this, and ETags is also not a mechanism
>> > to
>> > vary different responses to begin with.
>>
>> Have I misunderstood how HTTP cache validation works? If the cache
>> already has a response entity with an ETag and Vary: Client-Hints and
>> a new response to the cache comes in with a different value for
>> Client-Hints, isn't the cache supposed to issue a conditional request
>> with the ETag back to the origin server so that the origin server gets
>> to indicate whether the new Client-Hints value results in a different
>> response body or in the same one the cache already has?
>
>
> Yes, that's correct, that's the behavior with Vary. Having said that, a
> nitpick: ETag is an opaque token, and if the resource has been changed it
> should probably be a different value anyway.

I don't think I was suggesting anything about ETag not being opaque.

I'm talking about the case where what an image represents (i.e. the
original image that can be scaled) has not changed but Client-Hints is
different, so that the origin server would respond with different
bytes (i.e. a different scaling of the original image).
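
Schematically (header values made up):

    Cache holds:  200 OK for /photo.jpg
                  ETag: "w320"
                  Vary: CH          (stored under CH: dw=320)

    New request:  GET /photo.jpg
                  CH: dw=640

    Cache asks:   GET /photo.jpg
                  If-None-Match: "w320"
                  CH: dw=640

    Origin:       200 OK            (different scaling, new body)
                  ETag: "w640"

The original image never changed, yet the cache could not answer on
its own.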

> Once again, I think your concern is fragmentation - fair enough, see my
> earlier comment.

My concern is that keying the cache on the value of Client-Hints
makes it improbable to get cache hits that are known to be hits
without consulting the server that actually performs the negotiation
and understands the semantics of Client-Hints (which HTTP caches
don't).

>> >>  * Sending any HTTP header incurs extra traffic for all the sites that
>> >> don't pay attention to Client-Hints. That would be the whole Web at
>> >> least at first. That is, an HTTP-based solution involves a negative
>> >> externality for non-participating sites.
>> >
>> > This is easily addressed by making it an opt-in mechanism for HTTP 1.1.
>>
>> How would you handle the initial contact with the site? How would
>> opting into Client-Hints be better than setting a cookie? You
>> mentioned that cookies don't work cross-origin. How would Client-Hints
>> opt ins work cross origin?
>
>
> See my earlier document for why not cookies.

Do you mean the slide deck?

The reasons you give in the slide deck against using a cookie are
that it doesn't work for the first request and that it is
cache-unfriendly. As far as I can tell, opt-in to Client-Hints doesn't
work for the first request, either, and when Client-Hints has a
different value for pretty much every browser (true when bandwidth was
part of it and still mostly true if it's partitioned by different
screens), Vary: Client-Hints is not a cache-friendliness improvement
over Vary: Cookie. In fact, in the absence of login cookies, you could
make the values of Cookie exactly as partitioned as the values of
Client-Hints.
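
That is, nothing would stop a site from doing the equivalent of this
(the cookie name and encoding are made up):

    Set-Cookie: ch=dpr-2.0_dw-640

    Cookie: ch=dpr-2.0_dw-640

together with Vary: Cookie, which partitions the cache exactly the way
Vary: Client-Hints would.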

> For opt-in, a mechanism similar to Alternate-Protocol can be provided:
> http://www.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2#TOC-Server-Advertisement-of-SPDY-through-the-HTTP-Alternate-Protocol-header

This requires an HTTP round-trip to the server, so this kind of
opt-in does not solve the problem of varying the top-level HTML in a
low-latency way upon first contact. I thought addressing that problem
was in scope for Client-Hints and one of the main motivators for
choosing a solution that puts stuff in the HTTP request.
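
Concretely, the opt-in flow would look something like this (the
advertisement header is hypothetical, by analogy to
Alternate-Protocol):

    First request:   GET / HTTP/1.1
                     (no hints available yet)

    First response:  HTTP/1.1 200 OK
                     Accept-CH: 1      <- hypothetical opt-in signal

    Later requests:  GET /other HTTP/1.1
                     CH: dpr=2.0, dw=640

The first HTML response - the one you'd most want to tailor - is
served without any hints.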

>> > Further, the "cost" of upstream bytes, which is in the dozens of bytes,
>> > is
>> > easily offset by saving hundreds of kilobytes in the downstream (in case
>> > of
>> > images). The order of magnitude delta difference is well worth it.
>>
>> That might be true if you consider the cost in the context of a site
>> that actually pays attention to Client-Hints. But there's the
>> externality that you end up sending Client-Hints even to sites that
>> don't pay attention to it (unless it's opt in).
>
> 30 bytes or less + opt-in... Plus, as Patrick already pointed out, the 30
> bytes are not overflowing the CWND. I'm with you on concerns of adding any
> extra bytes, but this is not an argument against Client-Hints.

Even if it doesn't overflow the congestion window, do you have an
explanation for how it wouldn't count towards data metering on metered
mobile connections?

>> > correctly: wrong formats, wrong sizes, etc. Automation solves this
>> > problem.
>> > While not 100% related to this discussion, see my post here:
>> > http://www.igvita.com/2012/12/18/deploying-new-image-formats-on-the-web/
>>
>> This kind of "server automation will save us" argument would be easier
>> to buy if someone had already demonstrated a Web server that
>> automatically runs pngcrush on all PNG files and compresses JPEGs with
>> a better encoder than the one in libjpeg.
>>
>> Why isn't such a server in the popular use and why should we expect a
>> server that automatically scales images in response to Client-Hints to
>> enter into popular use?
>
>
> Oh hai: https://developers.google.com/speed/pagespeed/mod
>
> 200K+ sites using it + 3rd party integrations (EdgeCast, GoDaddy, Dreamhost)
> and others...

OK. That *is* a relevant data point. Thank you.

>> > Once again, they're not exclusive. If you don't have a server that can
>> > support image optimization, you should be able to hand-tune your markup.
>> > I'm
>> > all for that.
>>
>> "Not exclusive" means that there's more stuff—hence bad for
>> learnability of the platform.
>
>
> Nobody is forcing you to use it. If you only want to learn the markup way,
> then please be my guest!

"You don't need to use it" does not refute the learnability argument.
If there are more solutions to choose from, you need to learn about
them in order to make the choice what not to use.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
