[whatwg] Parsing Numeric Character References from Ian Hickson on 2007-06-06 (public-whatwg-archive@w3.org from June 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 6 Jun 2007 22:38:45 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.0706062231260.9191@dhalsim.dreamhost.com>

On Sun, 12 Mar 2006, Lachlan Hunt wrote:
> 
> [The spec] does not cover [entities for] the characters in the range 
> from #x80 to #x9F, which have historically been treated as code points 
> from the Windows-1252 repertoire, rather than the control characters 
> from Unicode.  AFAIK, this is already interoperably implemented in all 
> browsers.

Fixed.
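A minimal sketch of the remapping described above, assuming Python's built-in "cp1252" codec as the source of the Windows-1252 repertoire. The function name is hypothetical. The handling of the five byte values that Windows-1252 leaves unassigned is an assumption, not something this message specifies.

```python
def resolve_numeric_reference(code_point: int) -> str:
    """Map a numeric character reference to the character a browser would use.

    References in the range 0x80-0x9F are reinterpreted as Windows-1252 bytes
    rather than as Unicode C1 controls; anything else is taken as a Unicode
    code point.
    """
    if 0x80 <= code_point <= 0x9F:
        try:
            # e.g. &#x80; -> U+20AC EURO SIGN, &#x92; -> U+2019 RIGHT SINGLE QUOTATION MARK
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252;
            # this sketch ASSUMES they fall back to the Unicode C1 control.
            return chr(code_point)
    return chr(code_point)
```

For example, `resolve_numeric_reference(0x80)` gives "€", while `resolve_numeric_reference(0x41)` gives "A". Delegating to the codec avoids hand-copying the 27-entry table.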

> Characters in the range from #x01 to #x19 (except for whitespace 
> characters) are not treated interoperably across platforms.  On Windows, 
> Firefox, IE and Opera all displayed characters from some repertoire I 
> couldn't identify.  But on Mac: all the browsers displayed either 
> nothing or a box (a place holder character).  I think these should all 
> return U+FFFD.

They return the appropriate <control> characters from Unicode. The reason 
they render on some platforms is that the fonts on some platforms (Windows 
in particular) have glyphs in those positions.

> The use of characters in either of these ranges should be an easy parse 
> error.

I've made the first set a parse error, since those actually don't 
roundtrip as one might expect. But the x01-x19 entities roundtrip fine, 
they just render funkily. We could define something special about these 
characters in the rendering section, but I don't think they should be 
parse errors. Do you agree?
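The distinction proposed above can be sketched as follows. References in 0x80-0x9F are flagged as parse errors, and those in 0x01-0x19 are not. The regex, the function name, and the requirement for a trailing semicolon are illustrative assumptions. Ranges this message does not discuss are simply not flagged here.

```python
import re

# Matches decimal (&#133;) and hexadecimal (&#x85; / &#X85;) references.
# Requiring the trailing ";" is an assumption of this sketch.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def classify_reference(text: str) -> tuple[int, bool]:
    """Return (code point, is_parse_error) for a single numeric reference.

    0x80-0x9F is a parse error because those references are remapped via
    Windows-1252 and so do not round-trip; 0x01-0x19 is not, since those
    references decode to the Unicode control characters and round-trip.
    """
    m = _NUMERIC_REF.fullmatch(text)
    if m is None:
        raise ValueError(f"not a numeric character reference: {text!r}")
    hex_digits, dec_digits = m.groups()
    code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    return code_point, 0x80 <= code_point <= 0x9F
```

So `classify_reference("&#x85;")` and `classify_reference("&#133;")` both report a parse error, while `classify_reference("&#x07;")` does not.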

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 6 June 2007 15:38:45 UTC