- From: Larry Masinter <masinter@adobe.com>
- Date: Mon, 20 Jul 2009 07:01:55 -0700
- To: Ian Hickson <ian@hixie.ch>, "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>
- CC: HTML WG <public-html@w3.org>
What the document should say, rather than having a 'willful' misinterpretation, is that ISO-8859-1 means ISO-8859-1, but that for backward compatibility with existing (broken) web content, HTTP interpreting agents SHOULD treat characters outside of the ISO-8859-1 repertoire as if they were in Windows-1252.

This would allow and encourage HTML validators and HTML generation software to use the correct interpretation without a 'willful' disregard for compatibility with other standards and processing agents outside of the scope of the specifications of this committee.

IMHO, the willful disregard for compatibility with other specifications in the current specification reflects a consistent error in judgment. I reject as an unsound design principle the notion that, merely because some broken web content exists today, we are forced to encode that broken behavior in HTML forever.

Yes, HTML interpreting agents that wish to be compatible with existing content will need to apply some additional constraints and extensions, but it is unnecessary, and poor design, to fail to distinguish between advice to interpreting agents as to backward-compatibility behavior vs. advice to generating and authoring agents as to proper forward-looking behavior.

Larry
--
http://larry.masinter.net

-----Original Message-----
From: public-html-comments-request@w3.org [mailto:public-html-comments-request@w3.org] On Behalf Of Ian Hickson
Sent: Monday, July 20, 2009 1:57 AM
To: Dr. Olaf Hoffmann
Cc: public-html-comments@w3.org
Subject: Re: [HTML5] 2.8 Character encodings

On Mon, 6 Jul 2009, Dr. Olaf Hoffmann wrote:
>
> in the current draft are mentioned in 2.8
> http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#character-encodings-0
> some 'willful' misinterpretations of encoding information, for example
> to interpret a string like 'ISO-8859-1' as 'Windows-1252'.
>
> 1. Which string has an author to note, if he really wants to indicate, that
> the encoding is for example 'ISO-8859-1' and not 'Windows-1252'?

"ISO-8859-1". If the author has really used that encoding, then there is no difference between them (1252 is a superset).

> 2. As far as I have seen, HTML5 has no version indication like previous
> versions of HTML had and other popular formats like SVG have.
> How can a browser identify, that a document is really intended as
> 'HTML5' with the implicated 'willful' misinterpretations of encoding
> information and no other HTML version?

It doesn't matter; all versions of HTML are in practice processed with these mappings. It is indeed why HTML5 has these mappings -- because browsers already did this. We wouldn't add these mappings if we didn't need them to handle legacy content (content in previous versions of HTML).

> Assuming that a viewer is able to identify a document somehow being a
> HTML5 document after looking into the content and for example a server
> sent 'ISO-8859-1' before, does this mean, that the viewer switches to
> or reparses the document with 'Windows-1252' again?

I don't understand the question.

> Obviously it would be better to avoid such misinterpretation by using an
> encoding like UTF-8 not confused by the current HTML5 draft, however due
> to the history of older projects or server configurations it might be
> still convenient for many authors to continue to use 'ISO-8859-1'
> instead of other encodings, even if they switch for example from HTML4
> to HTML5 for some documents.

Hopefully my answers above will reassure you that this is not in fact a problem that authors will face.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
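
For readers who want to see the practical difference under discussion, here is a minimal Python sketch. It is an illustration only, not text from the HTML5 draft or from any browser. The names LEGACY_LABEL_OVERRIDES, effective_codec and decode_body are hypothetical, and the sample bytes are made up. It assumes Python's built-in "iso-8859-1" and "windows-1252" codecs, and it shows only the label override that the thread calls a 'willful' misinterpretation: content that stays within the printable ISO-8859-1 range decodes identically either way, while bytes in the 0x80-0x9F range do not.

```python
# Hypothetical table: declared label -> codec a lenient consumer actually uses.
LEGACY_LABEL_OVERRIDES = {"iso-8859-1": "windows-1252"}


def effective_codec(label: str) -> str:
    """Return the codec name to decode with for a declared charset label."""
    key = label.strip().lower()
    return LEGACY_LABEL_OVERRIDES.get(key, key)


def decode_body(data: bytes, label: str) -> str:
    """Decode bytes the way a backward-compatible consumer might."""
    return data.decode(effective_codec(label))


# Made-up sample: bytes 0x93, 0x94 and 0x80 are C1 control codes in
# ISO-8859-1 but curly quotes and the euro sign in Windows-1252.
sample = b"\x93caf\xe9\x94 costs 5\x80"

strict = sample.decode("iso-8859-1")         # label taken literally
lenient = decode_body(sample, "ISO-8859-1")  # label overridden

# Content limited to printable ISO-8859-1 characters is unaffected.
plain = b"caf\xe9"
same = plain.decode("iso-8859-1") == decode_body(plain, "iso-8859-1")
```

A strict consumer (a validator, say) would keep the literal decoding and could flag the control characters, while a lenient one would apply the override. That is roughly the split between authoring advice and interpreting advice that Larry argues for above.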
Received on Monday, 20 July 2009 14:02:40 UTC