Re: [CSS21] out of range unicode escapes

This may be based on a mistaken premise. While the primary use of U+FFFD is
as stated, it is also used as a replacement for ill-formed Unicode. See
http://www.unicode.org/reports/tr22/ for example.

"In the case of illegal source sequences, a conversion routine will
typically provide three options. It may stop with an error (or throw an
exception). Secondly, it may skip the source sequence. While this is
commonly an option, it can also hide corruption problems in the source text.
Lastly, it may map to a substitution character such as the Unicode
REPLACEMENT CHARACTER (U+FFFD)."
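
As a rough illustration of those three options (a sketch only, not taken
from UTR #22 or from any CSS implementation), Python's standard codec error
handlers "strict", "ignore" and "replace" behave in much the same way. The
helper resolve_css_escape and its substitute flag are hypothetical names,
used here to show the two policies under discussion for an out-of-range
escape such as "\110000".

```python
# Illustrative sketch: Python's built-in codec error handlers happen to
# mirror the three options described in the quoted passage.
ILL_FORMED = b"a\xffb"  # the byte 0xFF never occurs in well-formed UTF-8


def decode_with(policy: str) -> str:
    """Decode ILL_FORMED as UTF-8 using the named error handler."""
    return ILL_FORMED.decode("utf-8", errors=policy)


# Option 1: stop with an error (here, an exception).
try:
    decode_with("strict")
    stopped = False
except UnicodeDecodeError:
    stopped = True

# Option 2: skip the offending sequence, which silently hides the corruption.
skipped = decode_with("ignore")

# Option 3: map the offending sequence to U+FFFD REPLACEMENT CHARACTER.
substituted = decode_with("replace")

MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode allows


def resolve_css_escape(hex_digits: str, substitute: bool) -> str:
    """Hypothetical helper: resolve the hex digits of a CSS escape.

    With substitute=True an out-of-range value becomes U+FFFD (the CSS WG
    proposal); with substitute=False it raises ValueError, standing in for
    the "treat it as a parse error" alternative.
    """
    value = int(hex_digits, 16)
    if value > MAX_CODE_POINT:
        if substitute:
            return "\ufffd"
        raise ValueError(f"escape \\{hex_digits} is above 10FFFF")
    return chr(value)
```

Under the substitution policy, resolve_css_escape("110000", True) yields
U+FFFD, while the same call with substitute=False raises instead, so the
caller would have to drop the enclosing declaration.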

Mark

On 5/15/07, David Clarke <d.r.clarke@sheffield.ac.uk> wrote:
>
> This is cross posted to the public internationalisation core.
>
> On the I18N core, we have had some discussion on this topic, and this is
> the result of our discussion:
>
> In http://lists.w3.org/Archives/Public/www-style/2007Apr/0045.html the
> CSS WG propose replacing all out of range Unicode escapes with
> the "replacement character" (U+FFFD).
>
> This behaviour is not appropriate because U+FFFD is specified as a
> Replacement Character to be "used as a substitute for an uninterpretable
> character *from another encoding*".
> see: http://unicode.org/glossary/#replacement_character .
>
> The correct response to any invalid Unicode escape should be to treat it
> as a parse error (see section 4.1.8), in the same way that any other
> invalid or unexpected character would be.
>
> For clarity, add this text to section 4.1.3 of CSS 2.1
> http://www.w3.org/TR/CSS21/syndata.html#q6  :
>
>     If the number is outside the range allowed by Unicode (e.g.,
>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
>     then the parser should treat this as a parse error, and a user agent
>     must ignore a declaration containing this invalid property name or
>     value.
>
> see: http://www.w3.org/TR/CSS21/syndata.html#ignore
>
> ----
> David Clarke


-- 
Mark

Received on Tuesday, 15 May 2007 16:15:59 UTC