- From: olivier Thereaux <ot@w3.org>
- Date: Wed, 8 Aug 2007 15:36:42 +0900
- To: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Cc: Cristina Fiorentini <c.fiorentini@comune.fe.it>, www-validator@w3.org
On Aug 7, 2007, at 19:56, Jukka K. Korpela wrote:

> When you have octet 146 in a document declared to be iso-8859-1
> encoded, it is interpreted as denoting a control code in the C1
> Controls area. The meanings of those control codes have not been
> defined in the ISO 8859-1 standard, but they correspond to the C1
> Controls area of Unicode, so that e.g. 146 decimal (92 hexadecimal)
> maps to the Unicode character U+0092.
> Such characters (code positions) are forbidden in HTML 4.01 (or any
> pre-XHTML version of HTML), so the validator correctly reports them
> as erroneous ("non SGML character"). However, in XML, and hence in
> XHTML, C1 Controls like U+0092 are allowed, though discouraged.
> Formally, thus, they cannot be reported as errors.

Exactly. This was (and still is to me; I can't claim to fully grasp it
yet) a hairy issue, which I think we settled in
http://www.w3.org/Bugs/Public/show_bug.cgi?id=3164

-- 
olivier
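For readers following along, here is a minimal Python sketch of the mapping Jukka describes, assuming Python 3's built-in `iso-8859-1` codec (which maps every byte to the code point with the same value). The helpers `is_c1_control` and `check_char` and the labels `"html401"`/`"xhtml"` are hypothetical illustrations of the rule, not the validator's actual code.

```python
def is_c1_control(ch: str) -> bool:
    """True for code points U+0080..U+009F (the C1 Controls block)."""
    return 0x80 <= ord(ch) <= 0x9F


def check_char(ch: str, markup: str) -> str:
    """Hypothetical classifier for one character.

    'html401': C1 controls are errors ("non SGML character").
    'xhtml':   C1 controls are allowed by XML 1.0 but discouraged.
    """
    if not is_c1_control(ch):
        return "ok"
    if markup == "html401":
        return "error"
    return "discouraged"


raw = bytes([146])             # octet 146 decimal = 0x92
ch = raw.decode("iso-8859-1")  # ISO-8859-1 maps each byte to the same code point
print(f"U+{ord(ch):04X}")      # U+0092
print(check_char(ch, "html401"))  # error
print(check_char(ch, "xhtml"))    # discouraged
```

The point of the sketch is that the same decoded character gets a different verdict depending only on which document type's rules are applied, which is why the validator can flag it for HTML 4.01 but not for XHTML.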
Received on Wednesday, 8 August 2007 06:36:01 UTC