On Jun 1, 2009, at 2:09 PM, Geoffrey Sneddon wrote:

> On 1 Jun 2009, at 19:37, Larry Masinter wrote:
>
>> New behavior: IF you see, say, <doctype html5> THEN assume default
>> charset is UTF8, rather than applying heuristics to guess charset.

The result of not applying heuristics would be Windows-1252 - that is the default except in the rare cases where the heuristics find a match. I still don't understand why disabling the heuristics should be tied to changing the default from Windows-1252 to UTF-8.

> If you see it how? You need to have read the encoded string to see
> such a string.
>
>> Yes, supplying explicit charset is preferable, but what would break
>> if such a new rule were supplied?
>
> The problem is that any HTML 5 content served as text/html will be
> treated as Windows-1252 by all existing user agents and UTF-8 by new
> ones, which is problematic and will lead to problems (as people tend
> to only test in one browser, and if it works in one browser assume
> it works everywhere) as it is hence inconsistent.

Good point. The Degrade Gracefully design principle says:

"On the World Wide Web, authors are often reluctant to use new language features that cause problems in older user agents, or that do not provide some sort of graceful fallback. HTML 5 document conformance requirements should be designed so that Web content can degrade gracefully in older or less capable user agents, even when making use of new elements, attributes, APIs and content models."

Making the doctype switch the default from Windows-1252 to UTF-8 will mean only ASCII documents work correctly in both older and newer user agents, unless the author explicitly declares an encoding. If you have to explicitly declare a UTF-8 charset to get UTF-8, then nothing has been gained for careful authors. But unaware authors face an unexpected hazard.

Regards,
Maciej

Received on Monday, 1 June 2009 22:39:26 GMT
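The point that only ASCII documents survive both defaults can be checked with a small Python sketch. It assumes a simplified model: an older user agent decodes with Windows-1252 (Python's `cp1252` codec) and a hypothetical newer one decodes with UTF-8, with all sniffing heuristics ignored. Python's `cp1252` leaves five bytes undefined, so the sketch maps them to the same-numbered C1 control characters, as the WHATWG Encoding Standard's windows-1252 decoder does. The helper names `decode_both`, `renders_same` and the error-handler name `c1-fallback` are illustrative only.

```python
import codecs


def _c1_fallback(exc):
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    # map them to the same-numbered C1 control characters instead of failing.
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(b) for b in bad), exc.end


codecs.register_error("c1-fallback", _c1_fallback)


def decode_both(raw: bytes):
    """Decode the same bytes under both defaults (heuristics ignored).

    old: an older user agent that defaults to Windows-1252.
    new: a hypothetical user agent that defaults to UTF-8 for the new doctype.
    """
    old = raw.decode("cp1252", errors="c1-fallback")
    new = raw.decode("utf-8", errors="replace")
    return old, new


def renders_same(raw: bytes) -> bool:
    """True when both user agents would show identical text."""
    old, new = decode_both(raw)
    return old == new


if __name__ == "__main__":
    ascii_doc = b"<!DOCTYPE html><p>Hello</p>"
    utf8_doc = "<!DOCTYPE html><p>Café</p>".encode("utf-8")
    print(renders_same(ascii_doc))  # True
    print(decode_both(utf8_doc))    # ('<!DOCTYPE html><p>CafÃ©</p>', '<!DOCTYPE html><p>Café</p>')
```

Under this model every byte at or above 0x80 becomes exactly one non-replacement character in Windows-1252, while in UTF-8 it is either part of a multi-byte sequence (yielding fewer characters) or replaced with U+FFFD, so the two decodings agree only when every byte is ASCII.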
This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 19:17:19 GMT