- From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
- Date: Tue, 4 Aug 2009 15:45:36 +0200
- To: public-html-comments@w3.org
Julian Reschke:
> Dr. Olaf Hoffmann wrote:
> > ...
> > I can write the string, but indeed, if I do it, it means 'Windows-1252'.
> > Therefore effectively, I cannot indicate, that something is
> > 'ISO-8859-1' and not 'Windows-1252'.
> > ...
>
> Olaf, from what you write it's not totally clear that you realize that
> ISO-8859-1 is a proper subset of Windows-1252?
>

This seems to fit what is noted, for example at Wikipedia, for
ISO/IEC 8859-1, not for ISO-8859-1; the two differ in some control
characters. Therefore ISO-8859-1 and Windows-1252 both seem to be
supersets of ISO/IEC 8859-1, but ISO-8859-1 is not a subset of
Windows-1252. The conflicting characters, however, are typically not
used in documents with a correctly indicated ISO-8859-1 encoding.
http://en.wikipedia.org/wiki/ISO/IEC_8859-1

What I personally use in (X)HTML is typically the ASCII subset, so I do
not even have to care about the differences between 'ISO-8859-1' and
'UTF-8' if a server administrator (or a browser implementor) decides
something surprising.

However, masking special characters like umlauts and the ß ligature in
the (X)HTML style is not always available. For an XML parser like
Opera's, for example, whether the predefined (X)HTML entities are known
seems to depend on the XHTML version/doctype. In such cases I have to
switch to more critical things.

Many others have relied on unmasked special characters for many years.
And if they want to use and indicate 'ISO-8859-1', this should be
possible. It is no problem either if they want to use and indicate
'Windows-1252', just as it is no problem to use and indicate 'UTF-8'.
But mixing these up in a specification basically means confusion for
some authors reading it, especially for those who already have problems
indicating properly the encoding they used.

Therefore what browsers currently do when 'ISO-8859-1' is specified
(and 'ISO-8859-1' is actually used) is not a big practical problem. It
is rather that readers of the draft are confused by the wording.
If the draft noted something like:
"If 'ISO-8859-1' etc. is indicated as the document encoding, an HTML5
parser will/may use 'Windows-1252' to decode the presentable
characters."
this would already be less confusing. It does not change the meaning of
the string or of the document; it describes only the behaviour of the
parser, which is a difference.

For more advanced authors one might add something like:
"Due to this rule, wrong encoding information may not be indicated.
Some non-presentable control characters of 'ISO-8859-1' might be
presented as presentable characters according to 'Windows-1252'."
This perhaps indicates the intended reason for this behaviour, and it
also indicates that such parsers should not be used to check proper
encoding/decoding (which is nevertheless done by many authors, with
known consequences ;o)

> So the only difference would be the ability to diagnose problems in
> documents that claim to be ISO-8859-1, but actually use C1 control codes.
>
> That being said, I do agree with Larry that the spec should phrase it
> differently.
>
> BR, Julian
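
To make the difference discussed above concrete, here is a minimal sketch in Python. It is not taken from the HTML5 draft and does not model an HTML5 parser; it only assumes Python's built-in 'iso-8859-1' and 'cp1252' codecs, and the names decode_both and conflicting_bytes are invented for this illustration. Note that Python's 'cp1252' codec leaves a few bytes undefined (for example 0x81), which is why errors="replace" is passed.

```python
# Bytes 0x80..0x9F are where 'ISO-8859-1' and 'Windows-1252' disagree:
# C1 control characters in the former, mostly printable characters in
# the latter.
C1_RANGE = range(0x80, 0xA0)


def decode_both(raw):
    """Decode the same bytes once per encoding (hypothetical helper)."""
    as_latin1 = raw.decode("iso-8859-1")
    # errors="replace": Python's cp1252 table has a few unassigned bytes.
    as_cp1252 = raw.decode("cp1252", errors="replace")
    return as_latin1, as_cp1252


def conflicting_bytes(raw):
    """Return the offsets of bytes that the two encodings read differently."""
    return [i for i, b in enumerate(raw) if b in C1_RANGE]


# 0xDF is the sharp s in both encodings; 0x80 is a control character in
# 'ISO-8859-1' but the euro sign in 'Windows-1252'.
sample = b"Stra\xdfe kostet 5 \x80"
latin1, cp1252 = decode_both(sample)

print(repr(latin1))               # ends with the control character U+0080
print(repr(cp1252))               # ends with the euro sign
print(conflicting_bytes(sample))  # [16]
```

A document that really is 'ISO-8859-1' and avoids the C1 range decodes identically both ways. A check like conflicting_bytes is the kind of diagnosis that is lost once the label is silently treated as 'Windows-1252'.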
Received on Tuesday, 4 August 2009 15:12:47 UTC