- From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
- Date: Fri, 31 Jul 2009 12:53:27 +0200
- To: public-html-comments@w3.org
Hello,

over the last ten years or more I have gathered a lot of experience with the encoding problems of authors, especially German ones (they typically want to use umlauts, the ß ligature and sometimes the Euro sign). Every month I have to explain again how to distinguish between UTF-8 and ISO-8859-1 (or ISO-8859-15), and that ISO-8859-1 has no BOM, so that a BOM shows up as visible characters in some browser versions, and so on. One has to explain that the encoding indicated by the server (stupid or correct) takes precedence over indications within the document, how to get it right with PHP or with the .htaccess file of the Apache web server, and that one cannot change the encoding within a document.

And of course I still remember the behaviour of the browsers I used years ago, when the Euro was introduced in Europe and some authors tried to use the Euro sign unmasked (unescaped) within ISO-8859-1. Those legacy browsers clearly showed that this does not work. Therefore it is not true that interpreting 'ISO-8859-1' as 'Windows-1252' is compatible with older browsers: some of them had no bug and indicated the unrepresentable character with, for example, a question mark or a box. And indeed it was simple to explain that the author either has to use another encoding or has to mask the character to fix his/her bug. This simple approach may now be undermined by bugs in current browser versions.

'HTML5' seems to introduce a new rule for identifying the encoding. The encoding problem is obviously already hard to understand for many authors. This new rule, together with some opaque method of identifying 'HTML5' documents, complicates the situation even more and makes it much harder to explain what to do to get a well-defined document or script output and how to fix bugs.

Therefore the main questions remain open so far:

1. How can an author indicate the encoding 'ISO-8859-1', and not 'Windows-1252', within an 'HTML5' document, if he/she wants to specify 'ISO-8859-1' and nothing else?
2. How does a proper viewer/browser identify that a document is 'HTML5' and that this specific rule has to be applied if 'ISO-8859-1' is indicated?
3. At which point does the encoding information switch from the information given by the server or the XML processing instruction to the specific 'HTML5' rule that interprets the string 'ISO-8859-1' as an indication of 'Windows-1252'?

Indeed, up to here this is all about encoding information, not about how a document is decoded by the viewer (buggy or not).

Olaf
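
The byte-level difference discussed in the message can be illustrated with a minimal Python sketch. It is not part of the original message and relies only on the codecs shipped with Python's standard library; the helper names `can_encode` and `mask` are hypothetical and chosen for illustration. It shows that the Euro sign has no code point in ISO-8859-1, that Windows-1252 and ISO-8859-15 place it at different bytes, and that a UTF-8 BOM read as ISO-8859-1 becomes three visible characters.

```python
import codecs

EURO = "\u20ac"  # the Euro sign


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def mask(text, encoding):
    """Encode text, replacing unrepresentable characters with numeric
    character references (the 'masking' an author would do by hand)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # The Euro sign is not representable in ISO-8859-1 ...
    print(can_encode(EURO, "iso-8859-1"))      # False
    # ... but it is in Windows-1252 (byte 0x80) and ISO-8859-15 (byte 0xA4).
    print(EURO.encode("cp1252"))               # b'\x80'
    print(EURO.encode("iso-8859-15"))          # b'\xa4'

    # The same byte 0x80 read as ISO-8859-1 is a C1 control character,
    # not the Euro sign.
    print(b"\x80".decode("iso-8859-1") == "\u0080")  # True

    # Masking keeps the document valid in ISO-8859-1.
    print(mask("5 " + EURO, "iso-8859-1"))     # b'5 &#8364;'

    # A UTF-8 BOM interpreted as ISO-8859-1 shows up as visible characters.
    print(codecs.BOM_UTF8.decode("iso-8859-1"))  # ï»¿
```

A decoder that treats the label 'ISO-8859-1' as 'Windows-1252' would turn the byte 0x80 into the Euro sign instead of the control character, which is exactly the difference the three questions are about.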
Received on Friday, 31 July 2009 11:16:52 UTC