On May 22, 2008, at 4:32 PM, Philip Taylor wrote:

>>> The encoding sniffing algorithm works significantly better with
>>> 1024 bytes (finds 92% of charsets) than with 512 (finds 82%). If
>>> anyone cares, I could try a more detailed comparison to see if
>>> there's a 'good' value that could be suggested to UA developers,
>>> since the 512 bytes used as an example in the spec is not great.
>> As far as I can tell, 512 bytes is the sweet spot after which you
>> get diminishing returns (you got 80% with 512, but doubling it only
>> got you an extra 10%).
>
> But on the other hand, doubling it got a huge 50% decrease in false
> negatives :-)
> (Seems like it's just a tradeoff that can be interpreted however you
> want, and I've got no idea what would be best in practice, and 512
> doesn't sound less reasonable than anything else.)

FWIW, WebKit has just switched to checking 1024 bytes instead of 512
(and we ignore charset declarations outside of HEAD past that boundary).

- WBR, Alexey Proskuryakov

Received on Tuesday, 22 July 2008 11:45:29 GMT
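
For readers who want to see the 512 versus 1024 byte tradeoff concretely, here is a minimal sketch in Python. It is not the spec's prescan algorithm and not WebKit's code: it assumes a simplified regular expression stands in for real sniffing, and the names `sniff_charset`, `META_CHARSET`, and the sample `page` are hypothetical, made up for illustration.

```python
import re

# Simplified stand-in for charset sniffing (an assumption, not the spec's
# prescan): find "charset=<name>" inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_charset(data: bytes, limit: int = 1024):
    """Return the charset declared in the first `limit` bytes, or None."""
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Hypothetical page whose declaration starts after byte 512
# (for example, pushed down by a long comment or inline script).
padding = b"<!-- " + b"x" * 600 + b" -->"
page = b"<html><head>" + padding + b'<meta charset="ISO-8859-1"></head>'

print(sniff_charset(page, limit=512))   # declaration lies beyond the window
print(sniff_charset(page, limit=1024))  # declaration lies inside the window
```

A declaration that straddles the cutoff could be read in truncated form by this sketch; a real implementation would need to handle that case, along with the other details the spec covers.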