Re: [CSS21] out of range unicode escapes

On Mon, 16 Apr 2007, Bert Bos wrote:

> The CSS WG decided as follows on Björn Höhrmann's comment[1] about
> Unicode (numerical) escapes outside the legal Unicode range:
>
>  - Add this text to 4.1.3:
>
>    If the number is outside the range allowed by Unicode (e.g.,
>    "\110000" is above the maximum 10FFFF allowed in current Unicode),
>    the UA may replace the escape with the "replacement character"
>    (U+FFFD). If the character is to be displayed, the UA should show a
>    visible symbol, such as a "missing character" glyph (cf. 15.2, point
>    5).

The wording "current Unicode" sounds odd, since the Unicode Consortium has
committed to never assigning characters beyond U+10FFFF. If that decision
were ever changed, the result would effectively be a different Unicode.

I don't see why \110000 should be treated as anything but a malformed
value, to be ignored, if the specification is to define fixed error
handling for it at all.

Specifically, using U+FFFD is not suitable, since it's the replacement 
character to be used when data has been converted from some other 
character code and a particular character has no Unicode counterpart. This 
is quite different from having an out of range reference. If there has 
actually been some code conversion (so that U+FFFD might be adequate), 
then the data should of course contain \fffd and not something like \110000.

In practical terms, \110000 probably results from a typo (e.g., some digit
repeated too many times), so I'd compare it to, say, the seven-digit string
#fffffff appearing where a color value is expected.
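
To make the difference between the two behaviours concrete, here is a
minimal Python sketch. It is not taken from any CSS draft or UA: the names
decode_escapes, MalformedEscape and replace_out_of_range are made up for
illustration, and the optional whitespace terminator after an escape is
left out for brevity.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space

# A CSS 2.1 numeric escape: a backslash followed by one to six hex digits.
# Assumption: the optional trailing whitespace terminator is not handled.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})")


class MalformedEscape(ValueError):
    """Hypothetical error for the 'treat as malformed' policy."""


def decode_escapes(text, replace_out_of_range):
    """Decode numeric escapes in text under one of two error policies.

    replace_out_of_range=True  : substitute U+FFFD (the proposed wording).
    replace_out_of_range=False : raise MalformedEscape, so that the caller
                                 can ignore the construct containing it.
    Zero and surrogate values get no special treatment in this sketch.
    """
    def substitute(match):
        number = int(match.group(1), 16)
        if number <= MAX_CODE_POINT:
            return chr(number)
        if replace_out_of_range:
            return "\ufffd"
        raise MalformedEscape(match.group(0))

    return HEX_ESCAPE.sub(substitute, text)
```

With replace_out_of_range=True, the inputs r"\110000" and r"\fffd" decode
to the same string, so the typo becomes indistinguishable from a deliberate
replacement character. With False, the typo surfaces as an error that the
caller can handle the same way as any other malformed value.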

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 16 April 2007 18:00:33 UTC