- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 6 Jun 2007 00:20:09 +0000 (UTC)
On Thu, 13 Oct 2005, Lachlan Hunt wrote:
>
> In HTML4, according to SGML rules, numeric character references in the
> range from &#128; to &#159; are defined as UNUSED, which makes them
> "non-SGML characters". Strictly speaking, it's not an error to refer to
> these characters with character references (even the validator only
> issues a warning: reference to a non-SGML character); but, AIUI, neither
> SGML nor HTML4 assigns any meaning to them.
> http://lachy.id.au/log/2005/10/char-refs
>
> Technically, these character references should really refer to the
> Unicode control characters, but reality dictates otherwise for
> text/html, thanks to IE and countless (poorly written) books and
> tutorials. I, therefore, think the spec should say something along
> these lines:
>
>   In HTML, numeric and hexadecimal character references referring to
>   code positions in the range from 128 to 159 (0x80 to 0x9F) should be
>   re-mapped to code positions in the Unicode character repertoire
>   according to the CP1252 to Unicode table [CP1252]. This does not
>   apply to XHTML.

Done. (With a must, and with an explicit table, since CP1252 doesn't
define all those characters.)

>   HTML documents must not use numeric or hexadecimal character
>   references in this range, although browsers should support them for
>   backwards compatibility. Authors should instead refer to the correct
>   Unicode code position for these characters.

Done.

> Also, I think this would also be a nice conformance requirement to see for
> authoring tools:
>
>   HTML Authoring tools should automatically convert these character
>   references to either the equivalent Unicode code position or, if the
>   file's encoding supports it, the character itself, according to the
>   CP1252 to Unicode table [CP1252].

Not done, but it's redundant anyway since simply implementing the spec
will do this automatically (the spec doesn't round-trip the out-of-range
entities through the DOM).
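
For illustration only, here is a rough Python sketch of the re-mapping
discussed above. It is not the spec's table. It leans on Python's own
"cp1252" codec for the positions CP1252 defines. For the five positions
CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) it returns a
caller-supplied fallback, which is purely an assumption of this sketch.
The names remap_c1_reference and normalize_references are hypothetical.

```python
import re

# Positions in 0x80-0x9F for which CP1252 defines no character.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

# Matches decimal (&#128;) and hexadecimal (&#x80;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def remap_c1_reference(code_point, fallback="\ufffd"):
    """Return the character a text/html parser might produce for a
    numeric character reference with the given value.

    Values in 0x80-0x9F are read as CP1252 bytes. The fallback used for
    the undefined positions is an assumption, not taken from the spec.
    """
    if 0x80 <= code_point <= 0x9F:
        if code_point in CP1252_UNDEFINED:
            return fallback
        return bytes([code_point]).decode("cp1252")
    return chr(code_point)


def normalize_references(html_text):
    """Rewrite references in 0x80-0x9F to the Unicode code position,
    roughly what the proposed authoring-tool requirement asks for.
    References outside the range, or at undefined positions, are left
    untouched.
    """
    def fix(match):
        hex_digits, dec_digits = match.groups()
        value = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0x80 <= value <= 0x9F and value not in CP1252_UNDEFINED:
            return "&#x%X;" % ord(remap_c1_reference(value))
        return match.group(0)

    return _NUMERIC_REF.sub(fix, html_text)
```

For example, normalize_references("&#147;hi&#148;") rewrites the two
references to &#x201C; and &#x201D;. None of this would apply to XHTML,
where a reference such as &#128; keeps its literal code point.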

> None of that should apply to XHTML, since XML explicitly allows this
> range in the production for Char and, as far as I'm aware, no XHTML UA
> implements this buggy behaviour.

Indeed.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 5 June 2007 17:20:09 UTC