On Aug 7, 2007, at 19:56, Jukka K. Korpela wrote:
> When you have octet 146 in a document declared to be iso-8859-1
> encoded, it is interpreted as denoting a control code in the C1
> Controls area. The meanings of those control codes have not been
> defined in the ISO 8859-1 standard, but they correspond to the C1
> Controls area of Unicode, so that e.g. 146 decimal (92 hexadecimal)
> maps to the Unicode character U+0092.
> Such characters (code positions) are forbidden in HTML 4.01 (or any
> pre-XHTML version of HTML), so the validator correctly reports them
> as erroneous ("non SGML character"). However, in XML, and hence in
> XHTML, C1 Controls like U+0092 are allowed, though discouraged.
> Formally, thus, they cannot be reported as errors.

Exactly. This was (and still is to me, I can't claim to fully grasp it yet) a hairy issue which I think we settled in
http://www.w3.org/Bugs/Public/show_bug.cgi?id=3164

-- olivier

Received on Wednesday, 8 August 2007 06:36:01 GMT
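
The mapping described in the quoted text can be checked with a few lines of Python. This is only a sketch: the choice of Python and the helper names `is_c1_control` and `is_xml10_char` are assumptions made for illustration and have nothing to do with how the validator itself is implemented. The second helper follows the Char production of XML 1.0, which is the reason U+0092 cannot formally be rejected in XHTML.

```python
# Hypothetical helpers for illustration; only bytes.decode() is a real API here.

def is_c1_control(ch: str) -> bool:
    """True if ch lies in the Unicode C1 Controls range U+0080..U+009F."""
    return 0x80 <= ord(ch) <= 0x9F


def is_xml10_char(ch: str) -> bool:
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


octet = bytes([146])                  # 146 decimal == 0x92
decoded = octet.decode("iso-8859-1")  # ISO 8859-1 maps each octet to the same code point

print(f"U+{ord(decoded):04X}")        # U+0092
print(is_c1_control(decoded))         # True: a C1 control
print(is_xml10_char(decoded))         # True: allowed by XML 1.0, though discouraged
```
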