Re: Charset usage data

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 22 May 2008 23:32:18 +0000 (UTC)
To: Philip Taylor <pjt47@cam.ac.uk>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0805222331030.12911@hixie.dreamhostps.com>

On Thu, 22 May 2008, Philip Taylor wrote:
> 
> http://philip.html5.org/demos/html/charset-parsing/
> 
> All I tested (IE6/7, FF2/3, Opera 9.2/9.5, Safari 3) support 
> content="text/html; foo=bar; charset=...; baz=quux" (i.e. act 
> differently depending on the value of "...")

That's great test data. I've updated the spec accordingly.
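
For illustration, here is a minimal Python sketch of the behaviour the tests describe: picking the charset parameter out of a content value even when other parameters surround it. The function name extract_charset and the regular expression are assumptions made for this example. They only approximate what browsers do and are not the algorithm in the spec.

```python
import re

# Hypothetical pattern: "charset", optional whitespace, "=", then either a
# quoted value or an unquoted run that stops at whitespace or ";".
_CHARSET_RE = re.compile(
    r'''charset\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s;"']+))''',
    re.IGNORECASE,
)


def extract_charset(content):
    """Return the lowercased charset named in a content value, or None."""
    match = _CHARSET_RE.search(content)
    if match is None:
        return None
    # Exactly one of the three groups participates in a successful match.
    value = next((g for g in match.groups() if g is not None), "")
    return value.strip().lower() or None
```

With this sketch, extract_charset("text/html; foo=bar; charset=utf-8; baz=quux") returns "utf-8", and a value with no charset parameter returns None.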


> > As far as I can tell, 512 bytes is the sweet spot after which you get 
> > diminishing returns (you got 80% with 512, but doubling it only got 
> > you an extra 10%).
> 
> But on the other hand, doubling it got a huge 50% decrease in false 
> negatives :-) (Seems like it's just a tradeoff that can be interpreted 
> however you want, and I've got no idea what would be best in practice, 
> and 512 doesn't sound less reasonable than anything else.)

Yeah. I've left it at 512, which is just after the point where the line 
changes gradient quite dramatically. It's up to the UAs to decide exactly 
how many bytes (if any) they want to check. (Most seem to not check any.)
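
As a rough sketch of what checking only the first N bytes means in practice, the Python below looks for a meta charset declaration within a byte limit. The name prescan, the PRESCAN_LIMIT constant, and the regular expressions are assumptions for illustration. A real UA would use a proper tokenizer, and this is not the spec's prescan algorithm.

```python
import re

# The figure discussed in this thread; a UA may pick another value or none.
PRESCAN_LIMIT = 512

# Hypothetical patterns: a complete <meta ...> tag, and a charset value in it.
_META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
_CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([^\s;"'>/]+)""", re.IGNORECASE)


def prescan(data, limit=PRESCAN_LIMIT):
    """Return the charset declared in the first `limit` bytes, or None."""
    head = data[:limit]
    # A tag cut off by the limit has no closing ">" and is therefore ignored.
    for tag in _META_RE.finditer(head):
        match = _CHARSET_RE.search(tag.group(0))
        if match:
            return match.group(1).decode("ascii", "replace").lower()
    return None
```

A declaration that starts beyond the limit is a false negative: prescan(data) returns None for it, while the same call with a larger limit finds it. That is the tradeoff being weighed above.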

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 22 May 2008 23:33:00 GMT
