Re: non sgml characters

On Aug 7, 2007, at 19:56, Jukka K. Korpela wrote:
> When you have octet 146 in a document declared to be iso-8859-1  
> encoded, it is interpreted as denoting a control code in the C1  
> Controls area. The meanings of those control codes have not been  
> defined in the ISO 8859-1 standard, but they correspond to the C1  
> Controls area of Unicode, so that e.g. 146 decimal (92 hexadecimal)  
> maps to the Unicode character U+0092.
> Such characters (code positions) are forbidden in HTML 4.01 (or any  
> pre-XHTML version of HTML), so the validator correctly reports them  
> as erroneous ("non SGML character"). However, in XML, and hence in  
> XHTML, C1 Controls like U+0092 are allowed, though discouraged.  
> Formally, thus, they cannot be reported as errors.

Exactly. This was (and for me still is; I can't claim to fully grasp
it yet) a hairy issue, which I think we settled in
http://www.w3.org/Bugs/Public/show_bug.cgi?id=3164
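
Here is a small Python sketch that makes the quoted explanation concrete. It is not how the validator is implemented. The function names and the "html401"/"xhtml" labels are hypothetical, and only the C1 range (decimal 128-159) is modeled. The sketch decodes octet 146 as ISO-8859-1 and then classifies the resulting U+0092 under the two rule sets Jukka describes.

```python
# Hypothetical helper names; a sketch, not the validator's actual logic.
# Only the C1 Controls range is modeled; other ranges are reported as "ok".

def decode_latin1_octet(octet: int) -> str:
    # ISO-8859-1 maps every octet 0-255 to the code point with the same
    # value, so 146 (0x92) decodes to U+0092, a C1 control character.
    return bytes([octet]).decode("iso-8859-1")

def is_c1_control(cp: int) -> bool:
    # The C1 Controls block is U+0080..U+009F (decimal 128-159).
    return 0x80 <= cp <= 0x9F

def classify(ch: str, doctype: str) -> str:
    cp = ord(ch)
    if not is_c1_control(cp):
        return "ok"
    if doctype == "html401":
        # The HTML 4.01 SGML declaration marks 128-159 as UNUSED,
        # hence the "non SGML character" error.
        return "error: non SGML character"
    # XML 1.0's Char production includes U+0080..U+009F, so the character
    # is allowed; section 2.2 discourages [#x7F-#x84] and [#x86-#x9F],
    # i.e. every C1 control except U+0085.
    if cp == 0x85:
        return "allowed"
    return "allowed (discouraged)"

ch = decode_latin1_octet(146)
print(hex(ord(ch)))             # 0x92
print(classify(ch, "html401"))  # error: non SGML character
print(classify(ch, "xhtml"))    # allowed (discouraged)
```

The split between "error" and "allowed (discouraged)" is the whole point of the thread. The same character is a hard error under the SGML-based HTML 4.01 rules and merely a warning-level concern under XML.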

-- 
olivier

Received on Wednesday, 8 August 2007 06:36:01 UTC