- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Thu, 13 Oct 2005 02:51:46 +1000
Hi,

This should probably belong in the parsing section, when you get up to writing it.

In HTML4, according to SGML rules, numeric character references in the range from &#128; to &#159; are defined as UNUSED, which makes them "non-SGML characters". Strictly speaking, it's not an error to refer to these characters with character references (even the validator only issues a warning: reference to a non-SGML character); but, AIUI, neither SGML nor HTML4 assigns any meaning to them.

http://lachy.id.au/log/2005/10/char-refs

Technically, these character references should really refer to the Unicode control characters, but reality dictates otherwise for text/html, thanks to IE and countless (poorly written) books and tutorials. I, therefore, think the spec should say something along these lines:

    In HTML, decimal and hexadecimal numeric character references referring to code positions in the range from 128 to 159 (0x80 to 0x9F) should be re-mapped to code positions in the Unicode character repertoire according to the CP1252 to Unicode table [CP1252]. This does not apply to XHTML.

    HTML documents must not use decimal or hexadecimal numeric character references in this range, although browsers should support them for backwards compatibility. Authors should instead refer to the correct Unicode code position for these characters.

I think this would also be a nice conformance requirement to see for authoring tools:

    HTML authoring tools should automatically convert these character references to either the equivalent Unicode code position or, if the file's encoding supports it, the character itself, according to the CP1252 to Unicode table [CP1252].

[CP1252] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

None of that should apply to XHTML, since XML explicitly allows this range in the production for Char and, as far as I'm aware, no XHTML UA implements this buggy behaviour.

-- 
Lachlan Hunt
http://lachy.id.au/
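
For illustration only, here is a minimal Python sketch of the kind of conversion an authoring tool might perform under the proposal above. The function names `remap_code_point` and `normalize_references` are hypothetical, and the sketch leans on Python's built-in `cp1252` codec rather than reading the [CP1252] table directly. It also assumes that the five positions CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are simply left untouched, which the proposal does not specify.

```python
import re

# Matches decimal (&#128;) and hexadecimal (&#x80;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def remap_code_point(cp: int) -> int:
    """Map code positions 128..159 through CP1252; return others unchanged."""
    if 0x80 <= cp <= 0x9F:
        try:
            return ord(bytes([cp]).decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: positions CP1252 leaves undefined are kept as-is.
            return cp
    return cp


def normalize_references(html: str) -> str:
    """Rewrite references in the 128..159 range to their Unicode code positions."""

    def fix(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        new_cp = remap_code_point(cp)
        if new_cp == cp:
            return match.group(0)  # nothing to change
        # Keep the notation (hex or decimal) the author originally used.
        return f"&#x{new_cp:04X};" if hex_digits else f"&#{new_cp};"

    return _NCR.sub(fix, html)


print(normalize_references("&#128;5 and &#x99;"))
```

A browser following the proposal would apply the same mapping while parsing text/html instead of rewriting the source, and neither step would apply to XHTML.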
Received on Wednesday, 12 October 2005 09:51:46 UTC