- From: Alexey Proskuryakov <ap@webkit.org>
- Date: Tue, 22 Jul 2008 15:44:18 +0400
- To: Philip Taylor <pjt47@cam.ac.uk>
- Cc: Ian Hickson <ian@hixie.ch>, HTML WG <public-html@w3.org>
On May 22, 2008, at 4:32 PM, Philip Taylor wrote:

>>> The encoding sniffing algorithm works significantly better with
>>> 1024 bytes (finds 92% of charsets) than with 512 (finds 82%). If
>>> anyone cares, I could try a more detailed comparison to see if
>>> there's a 'good' value that could be suggested to UA developers,
>>> since the 512 bytes used as an example in the spec is not great.
>>
>> As far as I can tell, 512 bytes is the sweet spot after which you
>> get diminishing returns (you got 80% with 512, but doubling it only
>> got you an extra 10%).
>
> But on the other hand, doubling it got a huge 50% decrease in false
> negatives :-)
>
> (Seems like it's just a tradeoff that can be interpreted however you
> want, and I've got no idea what would be best in practice, and 512
> doesn't sound less reasonable than anything else.)

FWIW, WebKit has just switched to checking 1024 bytes instead of 512
(and we ignore charset declarations outside of HEAD past that boundary).

- WBR, Alexey Proskuryakov
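(For illustration, a minimal Python sketch of a charset prescan bounded at 1024 bytes, as discussed above. The regex and the prescan_charset name are assumptions made for this example only; WebKit's actual implementation is C++ and follows the HTML spec's prescan algorithm rather than a regular expression.)

    import re

    PRESCAN_BYTES = 1024  # the boundary WebKit moved to, per the message above

    # Loose pattern for <meta charset=...> and <meta ... content="...; charset=...">
    META_CHARSET = re.compile(
        rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_-]+)',
        re.IGNORECASE)

    def prescan_charset(data):
        """Return a charset declared within the first PRESCAN_BYTES, else None."""
        match = META_CHARSET.search(data[:PRESCAN_BYTES])
        if match:
            return match.group(1).decode('ascii', 'replace').lower()
        return None

    # A declaration inside the first kilobyte is found...
    print(prescan_charset(b'<head><meta charset="koi8-r"></head>'))   # -> koi8-r
    # ...but the same declaration pushed past the boundary is ignored.
    print(prescan_charset(b' ' * 2048 + b'<meta charset="koi8-r">'))  # -> None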
Received on Tuesday, 22 July 2008 11:45:29 UTC