[whatwg] Encoding Sniffing from Alexey Proskuryakov on 2012-04-23 (public-whatwg-archive@w3.org from April 2012)

From: Alexey Proskuryakov <ap@webkit.org>
Date: Mon, 23 Apr 2012 10:58:17 -0700
Message-ID: <FAA095D7-6F14-4F76-9D30-8574C9B3D019@webkit.org>

On 21.04.2012, at 3:21, Anne van Kesteren wrote:

> 1) Is this something we want to define and eventually implement the same way?

I think that the general direction should be getting rid of encoding sniffing. It is very rarely helpful, if ever, and implementations are wildly different.

WebKit can optionally use ICU for charset detection. We also have custom built-in heuristics that only switch between Japanese encodings (think of rendering unlabeled EUC-JP pages when the default browser encoding is set to Shift-JIS). Safari doesn't enable ICU-based detection, and users do not appear to be bothered by that. I don't know whether the Japanese heuristics are still important.

> 2) Does this need to apply outside HTML? For JavaScript it is forbidden per the HTML standard at the moment. CSS and XML do not allow it either. Is it used for decoding text/plain at the moment?
> 3) Is there a limit to how many bytes we should look at?

Related to the last question, WebKit doesn't implement re-navigation (either for charset sniffing or for <meta charset>), and I don't think that we ever should.

- WBR, Alexey Proskuryakov

Received on Monday, 23 April 2012 10:58:17 UTC