- From: Sergiusz Wolicki <sergiusz@wolicki.com>
- Date: Thu, 1 Dec 2011 18:48:16 +0100
I have read section 4.2.5.5 of the WHATWG HTML spec and I think it is
sufficient. It requires that any non-US-ASCII document have an explicit
character encoding declaration. It also recommends UTF-8 for all new
documents and as the default encoding for authoring tools. Therefore, a
document that conforms to HTML5 should not pose any problem in this area.

The default encoding issue therefore concerns only old content. But I
have seen a lot of pages, in browsers and in mail, that were tagged with
one encoding and encoded in another. Hence, documents without a charset
declaration are only one of the reasons for the garbage we see.

Therefore, I see no point in trying to fix anything in browsers by
changing the ancient defaults (risking compatibility issues). Energy
should go into filing bugs against misbehaving authoring tools and into
adding proper recommendations and education to HTML guidelines and
tutorials.

Thanks,
Sergiusz

On Thu, Dec 1, 2011 at 7:00 AM, L. David Baron <dbaron at dbaron.org> wrote:

> On Thursday 2011-12-01 14:37 +0900, Mark Callow wrote:
> > On 01/12/2011 11:29, L. David Baron wrote:
> > > The default varies by localization (and within that potentially by
> > > platform), and unfortunately that variation does matter.
> > In my experience this is what causes most of the breakage. It leads
> > people to create pages that do not specify the charset encoding. The
> > page works fine in the creator's locale but shows mojibake (garbage
> > characters) for anyone in a different locale.
> >
> > If the default was ASCII everywhere then all authors would see mojibake,
> > unless it really was an ASCII-only page, which would force them to set
> > the charset encoding correctly.
>
> Sure, if the default were consistent everywhere we'd be fine. If we
> have a choice in what that default is, UTF-8 is probably a good
> choice unless there's some advantage to another one. But nobody's
> figured out how to get from here to there.
>
> (I think this is legacy from the pre-Unicode days, when the browser
> simply displayed Web pages using the system character set, which
> led to a legacy of incompatible Web pages in different parts of the
> world.)
>
> -David
>
> --
> L. David Baron                         http://dbaron.org/
> Mozilla                           http://www.mozilla.org/
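
The mislabeling problem discussed in this thread (content tagged with one
encoding but encoded in another) can be illustrated with a minimal sketch.
It is not taken from any browser or authoring tool; the sample string and
the helper names decode_as and matches_declared are hypothetical and chosen
only for illustration.

```python
# Hypothetical sample text; any non-ASCII string would do.
text = "Zażółć gęślą jaźń"

utf8_bytes = text.encode("utf-8")
latin2_bytes = text.encode("iso-8859-2")


def decode_as(data: bytes, declared: str) -> str:
    """Decode bytes the way a consumer trusting the declared charset would."""
    return data.decode(declared, errors="replace")


def matches_declared(data: bytes, declared: str) -> bool:
    """Return True if the bytes decode strictly in the declared charset.

    This only catches mismatches when the declared charset can reject byte
    sequences (as UTF-8 can). ISO-8859-2 accepts any byte, so a wrong
    ISO-8859-2 label decodes without error and silently yields mojibake.
    """
    try:
        data.decode(declared, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 bytes labeled as ISO-8859-2: no error, but the text is garbage.
print(decode_as(utf8_bytes, "iso-8859-2") == text)    # False
# ISO-8859-2 bytes labeled as UTF-8: detectable, the bytes are invalid UTF-8.
print(matches_declared(latin2_bytes, "utf-8"))        # False
# A pure-ASCII page reads the same under either label.
print(decode_as(b"hello", "utf-8") == decode_as(b"hello", "iso-8859-2"))  # True
```

The asymmetry is the relevant point: a wrong UTF-8 label can be detected by
strict decoding, while a wrong single-byte label generally cannot, which is
one reason correct declarations by authoring tools matter.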
Received on Thursday, 1 December 2011 09:48:16 UTC