- From: Leif Halvard Silli <lhs@malform.no>
- Date: Tue, 02 Jun 2009 02:48:37 +0200
- To: Maciej Stachowiak <mjs@apple.com>
- CC: Geoffrey Sneddon <foolistbar@googlemail.com>, Larry Masinter <masinter@adobe.com>, Anne van Kesteren <annevk@opera.com>, Chris Wilson <Chris.Wilson@microsoft.com>, "M.T. Carrasco Benitez" <mtcarrascob@yahoo.com>, Travis Leithead <Travis.Leithead@microsoft.com>, Erik van der Poel <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Harley Rosnow <Harley.Rosnow@microsoft.com>
Maciej Stachowiak On 09-06-02 00.38:
> On Jun 1, 2009, at 2:09 PM, Geoffrey Sneddon wrote:
>> On 1 Jun 2009, at 19:37, Larry Masinter wrote:
>>
>>> New behavior: IF you see, say, <doctype html5> THEN assume default
>>> charset is UTF8, rather than applying heuristics to guess charset.
>
> The result of not applying heuristics would be Windows-1252 - that is
> the default except in the rare cases where the heuristics find a match.
> I still don't understand why disabling the heuristics should be tied to
> changing the default from Windows-1252 to UTF-8.

Is it the choice of UTF-8 as default you don't understand? If so, then
I'd like to quote the "Support World Languages" principle.

>> If you see it how? You need to have read the encoded string to see
>> such a string.
>>
>>> Yes, supplying explicit charset is preferable, but what would break
>>> if such a new rule were supplied?
>>
>> The problem is that any HTML 5 content served as text/html will be
>> treated as Windows-1252 by all existing user agents and UTF-8 by new
>> ones, which is problematic and will lead to problems (as people tend
>> to only test in one browser, and if it works in one browser assume it
>> works everywhere) as it is hence inconsistent.
>
> Good point. The Degrade Gracefully design principle says:
>
> "On the World Wide Web, authors are often reluctant to use new language
> features that cause problems in older user agents, or that do not
> provide some sort of graceful fallback. HTML 5 document conformance
> requirements should be designed so that Web content can degrade
> gracefully in older or less capable user agents, even when making use of
> new elements, attributes, APIs and content models."
>
> Making the doctype switch the default from Windows-1252 to UTF-8 will
> mean only ASCII documents work correctly in both older and newer user
> agents, unless the author explicitly declares an encoding. If you have
> to explicitly declare a UTF-8 charset to get UTF-8, then nothing has
> been gained for careful authors. But unaware authors face an unexpected
> hazard.

There is one aspect that you are - again - forgetting, and that is
authoring tools and web servers. If conforming authoring tools had to
default to UTF-8 whenever someone chooses to create an HTML 5 document
(much the same way that XML defaults to UTF-8/-16), then that would be a
bonus and simplification and _motivation_ for using HTML 5.

The next level should be that web servers default to sending a charset
header that says "UTF-8" whenever they see the HTML 5 doctype.

Thus we could leave the Web browser behaviour as drafted, but require
UTF-8 as the default from servers and authoring tools.

-- 
leif halvard silli
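
The claim quoted above, that only ASCII documents read the same under a Windows-1252 default and a UTF-8 default, can be checked with a small sketch. This is an illustration only, not any browser's detection logic: the function name and sample strings are made up here, and Python's "cp1252" codec is assumed to be close enough to what user agents call Windows-1252.

```python
def decodes_identically(text: str) -> bool:
    """Return True if the UTF-8 bytes of `text` read the same under both defaults."""
    raw = text.encode("utf-8")
    as_utf8 = raw.decode("utf-8")
    # errors="replace" because a few byte values are undefined in Python's
    # cp1252 codec and would otherwise raise UnicodeDecodeError.
    as_cp1252 = raw.decode("cp1252", errors="replace")
    return as_utf8 == as_cp1252


# Hypothetical sample content: one pure-ASCII string, one with non-ASCII letters.
print(decodes_identically("plain ASCII text"))
print(decodes_identically("blåbærsyltetøy"))
```

The first call reports a match and the second does not, which is the hazard for an undeclared UTF-8 document in a user agent that falls back to Windows-1252.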
Received on Tuesday, 2 June 2009 00:49:23 UTC