[whatwg] Default encoding to UTF-8? from Leif Halvard Silli on 2011-12-11 (public-whatwg-archive@w3.org from December 2011)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Sun, 11 Dec 2011 12:44:37 +0100
Message-ID: <20111211124437163071.aefa65b3@xn--mlform-iua.no>

Leif Halvard Silli Sun Dec 11 03:21:40 PST 2011

> W.r.t. iframe, then the "big in Norway" newspaper Dagbladet.no is 
> declared ISO-8859-1 encoded and it includes at least one ads-iframe that 
  ...
> * Let's say that I *kept* ISO-8859-1 as default encoding, but instead 
> enabled the Universal detector. The frame then works.
> * But if I make the frame page very short, 10 * the letter "?" as 
> content, then the Universal detector fails - on a test on my own 
> computer, it guesses the page to be Cyrillic rather than Norwegian.
> * What's the problem? The Universal detector is too greedy - it tries 
> to fix more problems than I have. I only want it to guess on "UTF-8". 
> And if it doesn't detect UTF-8, then it should fall back to the locale 
> default (including fall back to the encoding of the parent frame).

The above illustrates that the current charset-detection solutions are 
starting to show their age: they are neither geared nor optimized towards 
UTF-8 as the firmly recommended and - in principle - anticipated default.

The above may also catch a real problem with switching to UTF-8: that 
one may need to embed pages which do not use UTF-8. If one could trust 
UAs to attempt UTF-8 detection (but not "Universal detection") before 
defaulting, then it would become virtually risk-free to switch a page to 
UTF-8, even if it contains iframe pages. No?
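
To make the idea concrete, here is a rough sketch of "UTF-8 detection 
only, then fall back". It is not how any UA actually implements it; the 
function name, its parameters, the windows-1252 locale default and the 
rule that pure ASCII counts as "no evidence" are all my own assumptions 
for illustration. The letter "ø" in the usage note is likewise only an 
example of a Norwegian letter.

```python
from typing import Optional


def guess_encoding(
    data: bytes,
    parent_encoding: Optional[str] = None,   # hypothetical: encoding of the parent frame
    locale_default: str = "windows-1252",    # hypothetical locale default
) -> str:
    """Guess only between UTF-8 and a fallback; never guess other legacy encodings."""
    fallback = parent_encoding or locale_default

    # Pure ASCII decodes the same way under UTF-8 and the usual legacy
    # encodings, so it gives no evidence for UTF-8 (assumption: fall back).
    if data.isascii():
        return fallback

    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback

    # Non-ASCII bytes that form valid UTF-8 are very unlikely to be an
    # accident in a legacy-encoded page.
    return "utf-8"
```

With this, a very short frame page such as ten times "ø" saved as 
ISO-8859-1 (`b"\xf8" * 10`) is not valid UTF-8, so it inherits the 
parent's encoding instead of being guessed as Cyrillic, while the same 
text saved as UTF-8 is detected as UTF-8.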

Leif H Silli

Received on Sunday, 11 December 2011 03:44:37 UTC