Re: WebP, anyone using it? from Matthew Wilcox on 2012-10-18 (public-respimg@w3.org from October 2012)

From: Matthew Wilcox <elvendil@gmail.com>
Date: Thu, 18 Oct 2012 21:02:19 +0100
To: Marcos Caceres <w3c@marcosc.com>
Cc: David Demaree <ddemaree@adobe.com>, Tom Lane <tom@tomlane.me>, Peter Gasston <pgasston@gmail.com>, David Newton <david@davidnewton.ca>, François REMY <fremycompany_pub@yahoo.fr>, "public-respimg@w3.org" <public-respimg@w3.org>
Message-Id: <57C39DF2-3E92-4646-B6E4-07CD16F15269@gmail.com>

On 18 Oct 2012, at 20:25, Marcos Caceres <w3c@marcosc.com> wrote:

> On Thursday, October 18, 2012 at 7:44 PM, Matthew Wilcox wrote:
> 
>> That's my point - it's only a problem to people who have the scale issue. Which the majority of websites don't. I.e., the scale-to-very-high-traffic problem simply isn't an issue that a smaller site would have by definition of being a smaller site.
> 
> All small sites are big sites that haven't been found by [reddit, slashdot, whatever]. If they were to fall over because of an architectural issue on our side… that would be bad :)  This was David's point, I think: content negotiation does not scale well.  

Well, any small site hit by a DDoS (which is effectively what Reddit does) will fall over. Content negotiation is not going to be the problem there; getting a wave of traffic orders of magnitude over normal traffic is the issue. Content negotiation doesn't enter into it.

Content negotiation as done with cookie sniffing doesn't scale well, agreed. We're not saying that's how content negotiation needs to work though. You're building a response of "content negotiation is inefficient at scale" when all you really mean is "current techniques to do content negotiation don't scale". 

I'm not sure that adding a header to a URI request would involve anything like the same overhead as current content negotiation does, either over the network or on the server.

Let's not forget that servers do content negotiation all the time - any time they sniff the browser string that's sent with every request's headers. That has not caused them to fall over. I'm suggesting (and have for some time) that we add other attributes to those headers.
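
As a rough illustration of the idea, here is a minimal sketch of header-based negotiation: the server reads one request header and picks a directory, with no cookie and no per-user state. The header name `X-Device-Width`, the breakpoints, and the directory names are all hypothetical; no browser sends such a header today.

```python
# Hypothetical header name and breakpoints, chosen only for illustration.
WIDTH_HEADER = "X-Device-Width"
VARIANT_DIRS = [(480, "small"), (1024, "medium")]  # (max width in px, directory)
DEFAULT_DIR = "large"


def variant_path(request_path: str, headers: dict[str, str]) -> str:
    """Map a requested image path to a size-specific directory."""
    relative = request_path.lstrip("/")
    try:
        width = int(headers.get(WIDTH_HEADER, ""))
    except ValueError:
        # Missing or malformed header: serve the default variant.
        return f"/{DEFAULT_DIR}/{relative}"
    for max_width, directory in VARIANT_DIRS:
        if width <= max_width:
            return f"/{directory}/{relative}"
    return f"/{DEFAULT_DIR}/{relative}"
```

For example, `variant_path("/images/cat.jpg", {"X-Device-Width": "320"})` maps to `/small/images/cat.jpg`, and a request without the header falls back to `/large/images/cat.jpg`. The same lookup could be expressed as a rewrite rule in the server configuration instead of application code.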

>> As I understand the reported problem, it's not that CDNs can't do it - it's that their current business model makes it expensive.
> 
> No, I don't think that was said. It might be that you need so much computing power to respond in a reasonable amount of time that it's just not possible to do it at scale. If to get scale you need money, that is still a valid problem.

Again, you're focussing on *current implementations* which are by necessity sub-optimal.

> Consider I said: "If I had enough money, I could go to the moon." You'd probably laugh at me because simply having money won't get me there. Simply having lots of computers won't solve the problem either, as those computers need to be set up, maintained, etc.   

As above.

>> That's only because it's a niche technique at the moment. There's no reason not to believe that should this become common the CDNs would roll out the feature as standard, and thus the cost issue becomes moot.
> 
> If it's a computation problem of scale, the problem does not become moot. It can only be solved (maybe) by throwing oodles of computers at the problem or waiting for Moore's law to make the problem computationally feasible.
>> From an actual server perspective, content negotiation based on a header property sent along with a request is not that expensive - you don't even have to spool up a dynamic language like PHP; you just configure the server engine to look in a different directory using a re-write rule.
> 
> Ok, let's say it takes 3 operations to do it. Multiply that by 10,000,000 hits a day (we are talking CDN scale here). Consider moving stuff in and out of memory, writing to disk, etc. so it takes maybe ~1 millisecond... I can see that simple becomes a massive problem real quick at that scale.
>> The only reason Adaptive Images requires Cookies; and thus requires PHP (and thus doesn't work on CDNs) is because the browser doesn't send useful headers along with URI requests. It has to use a cookie instead. We are, theoretically, in a position to change that.
>> 
>> Again; I'm not arguing for server side techniques *over* client side ones; I'm arguing it would be beneficial to have both.
> Agreed.   

On top of not agreeing that this is a genuine problem if there's an efficient mechanism (as opposed to the hacking needed today), it is also the case that *it's only ever a problem for high traffic sites*. 90% of sites are not high traffic sites.
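
For a sense of scale, the figures quoted above (10,000,000 hits a day at roughly 1 millisecond each) can be worked through directly. This is a back-of-the-envelope sketch only: it assumes traffic is spread evenly over the day and that the 1 ms estimate holds, neither of which is established in this thread. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def negotiation_load(hits_per_day: float, seconds_per_hit: float):
    """Return (total seconds per day, average requests/s, fraction of one core)."""
    total_seconds = hits_per_day * seconds_per_hit
    average_rps = hits_per_day / SECONDS_PER_DAY
    # Share of a single core kept busy, assuming evenly spread traffic.
    core_fraction = total_seconds / SECONDS_PER_DAY
    return total_seconds, average_rps, core_fraction


# Figures taken from the quoted estimate: 10,000,000 hits/day, ~1 ms per hit.
total, rps, fraction = negotiation_load(10_000_000, 0.001)
print(f"{total:.0f} s/day, {rps:.1f} req/s on average, {fraction:.1%} of one core")
```

Under those assumptions the negotiation step adds up to about 10,000 seconds of work per day, or a little under 12% of one core on average. A traffic spike concentrates that work into a much shorter window, so the average says nothing about peak load.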

There's no reason *not* to offer a solution just because it can't work in every use case. It need only be beneficial for the majority. Heck, not even that - there still aren't many real-world benefits for most of the HTML5 semantics yet.

> --  
> Marcos Caceres

Received on Thursday, 18 October 2012 20:02:54 UTC