[HTML5] 2.8 Character encodings from Dr. Olaf Hoffmann on 2009-07-06 (public-html-comments@w3.org from July 2009)

From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
Date: Mon, 6 Jul 2009 17:52:39 +0200
To: public-html-comments@w3.org
Message-Id: <200907061752.39623.Dr.O.Hoffmann@gmx.de>

Hello,

Section 2.8 of the current draft
http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#character-encodings-0
mentions some 'willful' misinterpretations of encoding information, for
example interpreting a string like 'ISO-8859-1' as 'Windows-1252'.
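
To illustrate why the distinction matters, here is a minimal Python
sketch. It is not taken from the draft; the sample bytes and variable
names are only chosen for illustration. The two encodings agree on all
bytes except the range 0x80-0x9F, which are C1 control characters in
ISO-8859-1 but mostly printable characters in Windows-1252.

```python
# Sample bytes (illustrative assumption): three from the 0x80-0x9F range
# where the two encodings differ, and one (0xE9) where they agree.
data = bytes([0x80, 0x93, 0x94, 0xE9])

# Decode the same bytes under each label.
as_latin1 = data.decode("iso-8859-1")
as_cp1252 = data.decode("windows-1252")

# Show the resulting code points side by side.
for byte, a, b in zip(data, as_latin1, as_cp1252):
    print(f"0x{byte:02X}  ISO-8859-1: U+{ord(a):04X}  Windows-1252: U+{ord(b):04X}")
```

So a document that really contains C1 control characters under the label
'ISO-8859-1' would be shown with different characters by a viewer that
treats the label as 'Windows-1252'.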

1. Which string does an author have to specify if he really wants to
indicate that the encoding is, for example, 'ISO-8859-1' and not
'Windows-1252'?

2. As far as I have seen, HTML5 has no version indication such as
previous versions of HTML had and other popular formats like SVG have.
How can a browser determine that a document is really intended as
'HTML5', with the implied 'willful' misinterpretations of encoding
information, and not as some other HTML version?
This assumes, of course, that for other versions and formats there
is no such 'willful' misinterpretation, and that the encoding is
identified as usual by interpreting the string as provided by the
author.

Assuming that a viewer is somehow able to identify a document as
an HTML5 document after looking into the content, and that a server
has, for example, sent 'ISO-8859-1' beforehand, does this mean that
the viewer switches to 'Windows-1252' or reparses the document with
it?

Obviously it would be better to avoid such misinterpretation by using
an encoding like UTF-8, which the current HTML5 draft does not
reinterpret. However, due to the history of older projects or server
configurations, it might still be convenient for many authors to
continue to use 'ISO-8859-1' instead of other encodings, even if they
switch, for example, from HTML4 to HTML5 for some documents.
Best wishes

Olaf

Received on Monday, 6 July 2009 16:08:22 UTC