
Re: [HTML5] 2.8 Character encodings

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 20 Jul 2009 08:56:48 +0000 (UTC)
To: "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>
Cc: public-html-comments@w3.org
Message-ID: <Pine.LNX.4.62.0907200853070.23663@hixie.dreamhostps.com>
On Mon, 6 Jul 2009, Dr. Olaf Hoffmann wrote:
> 
> Section 2.8 of the current draft
> http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#character-encodings-0
> mentions some 'willful' misinterpretations of encoding information, for
> example interpreting a string like 'ISO-8859-1' as 'Windows-1252'.
>
> 1. Which string must an author use to indicate that the encoding
> really is, for example, 'ISO-8859-1' and not 'Windows-1252'?

"ISO-8859-1". If the author has really used that encoding, then there is 
no difference between them: for every character HTML allows, Windows-1252 
is a superset.
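
As a rough check of the superset claim, the sketch below compares how Python's
standard codecs decode each single byte under both encodings. The helper name
differing_bytes is made up for this illustration, and Python's "cp1252" codec
is assumed to be a fair stand-in for what browsers call Windows-1252 (it leaves
a few bytes undefined, which are counted here as differing).

```python
def differing_bytes():
    """Return the byte values that decode differently under the two encodings."""
    out = []
    for b in range(256):
        raw = bytes([b])
        latin = raw.decode("iso-8859-1")  # defined for all 256 byte values
        try:
            win = raw.decode("cp1252")
        except UnicodeDecodeError:
            win = None  # byte is unassigned in Python's cp1252 table
        if latin != win:
            out.append(b)
    return out


if __name__ == "__main__":
    diff = differing_bytes()
    print("first differing byte: 0x%02X" % diff[0])
    print("last differing byte:  0x%02X" % diff[-1])
    print("count:", len(diff))
```

The only bytes that differ are those ISO-8859-1 assigns to C1 control
characters, which are not valid in HTML content, so a document that genuinely
uses ISO-8859-1 reads the same under either label.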


> 2. As far as I have seen, HTML5 has no version indication like previous
> versions of HTML had and other popular formats like SVG have.
> How can a browser identify that a document is really intended as
> 'HTML5', with the implied 'willful' misinterpretations of encoding
> information, and not as some other HTML version?

It doesn't matter: all versions of HTML are in practice processed with 
these mappings. That is indeed why HTML5 has these mappings -- because 
browsers already did this. We wouldn't add these mappings if we didn't 
need them to handle legacy content (content in previous versions of HTML).
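
A minimal sketch of what such a mapping amounts to in a decoder, assuming a
simple lookup applied to the declared label before decoding. The names
OVERRIDES, effective_encoding and decode_html_bytes are hypothetical, and the
table holds only the one mapping mentioned in this thread; it is not the
draft's full list.

```python
# Hypothetical override table. Only the ISO-8859-1 entry comes from this
# thread; a real implementation would carry more mappings.
OVERRIDES = {"iso-8859-1": "windows-1252"}


def effective_encoding(declared_label: str) -> str:
    """Map a declared encoding label to the encoding actually used."""
    label = declared_label.strip().lower()
    return OVERRIDES.get(label, label)


def decode_html_bytes(raw: bytes, declared_label: str) -> str:
    """Decode raw bytes using the overridden encoding, regardless of HTML version."""
    return raw.decode(effective_encoding(declared_label), errors="replace")


if __name__ == "__main__":
    # Legacy content labelled ISO-8859-1 but containing Windows-1252 quotes.
    print(decode_html_bytes(b"\x93quoted\x94", "ISO-8859-1"))
```

Note that nothing in the lookup depends on which HTML version the document
claims to be, which is the point made above.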


> Assuming that a viewer is somehow able to identify a document as an 
> HTML5 document after looking into the content, and that a server, for 
> example, sent 'ISO-8859-1' beforehand, does this mean that the viewer 
> switches to 'Windows-1252' or reparses the document with it?

I don't understand the question.


> Obviously it would be better to avoid such misinterpretation by using an 
> encoding like UTF-8, which the current HTML5 draft does not remap. 
> However, due to the history of older projects or server configurations, 
> it might still be convenient for many authors to continue to use 
> 'ISO-8859-1' instead of other encodings, even if they switch, for 
> example, from HTML4 to HTML5 for some documents.

Hopefully my answers above will reassure you that this is not in fact a 
problem that authors will face.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 20 July 2009 08:57:25 GMT
