Re: [CSS21] out of range unicode escapes from David Clarke on 2007-05-31 (www-style@w3.org from May 2007)

From: David Clarke <d.r.clarke@sheffield.ac.uk>
Date: Thu, 31 May 2007 09:24:59 +0100
To: Mark Davis <mark.davis@icu-project.org>
CC: www-style@w3.org, public-i18n-core@w3.org
Message-ID: <465E865B.7090206@sheffield.ac.uk>

Mark et al,

I stand corrected on the option of parsing Unicode source sequences
and using the replacement character in general.

In my personal opinion, it would seem logical to treat any unexpected
character or sequence of characters in CSS in the same way: a CSS
parser should treat it as a parse error. This would provide a
consistent approach without adding special-case complexity to a parser.

I really feel that an invalid Unicode source sequence in a block of CSS
is of the same nature as any other invalid sequence of characters.
Replacing an invalid Unicode sequence with another character is likely
to hide errors and produce an unintended result.
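To make the difference concrete, here is a minimal sketch in Python of
decoding a CSS 2.1 hex escape such as "\110000" under each of the three
options from UTR #22 quoted below (stop with an error, skip, or
substitute U+FFFD). It is not taken from any real CSS parser: the names
decode_escape and CSSParseError and the policy strings are hypothetical.
It assumes the CSS 2.1 escape form (a backslash, one to six hex digits,
then at most one whitespace character, with CRLF counting as one) and it
deliberately ignores surrogate code points.

```python
import re

# Hypothetical pattern for a CSS 2.1 hex escape: "\" + 1-6 hex digits,
# optionally followed by one whitespace character (CRLF counts as one).
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(\r\n|[ \t\r\n\f])?")

MAX_CODE_POINT = 0x10FFFF          # highest code point Unicode allows
REPLACEMENT_CHARACTER = "\uFFFD"   # U+FFFD REPLACEMENT CHARACTER


class CSSParseError(ValueError):
    """Hypothetical error type for an escape the parser rejects."""


def decode_escape(text: str, pos: int, policy: str = "error") -> tuple[str, int]:
    """Decode the hex escape starting at text[pos].

    Returns (decoded_text, index_after_escape). `policy` picks one of the
    three UTR #22 options for an out-of-range value: "error", "skip" or
    "replace". Surrogate values (D800-DFFF) are not checked here.
    """
    m = HEX_ESCAPE.match(text, pos)
    if m is None:
        raise CSSParseError(f"no hex escape at position {pos}")
    value = int(m.group(1), 16)
    end = m.end()  # the optional trailing whitespace is consumed too

    if value <= MAX_CODE_POINT:
        return chr(value), end

    # Out of range: apply the selected option.
    if policy == "error":
        # Option 1: stop with an error; the caller drops the declaration.
        raise CSSParseError(f"\\{m.group(1)} is above U+10FFFF")
    if policy == "skip":
        # Option 2: skip the sequence (may hide corruption).
        return "", end
    if policy == "replace":
        # Option 3: substitute U+FFFD.
        return REPLACEMENT_CHARACTER, end
    raise ValueError(f"unknown policy {policy!r}")
```

For example, decode_escape(r"\41 B", 0) gives ("A", 4), while
decode_escape(r"\110000 ", 0, "replace") quietly gives ("\uFFFD", 8).
Under the "error" policy the caller would instead discard the whole
declaration, which is the consistent parse-error behaviour argued for
above; the "replace" result is the case that can silently hide errors.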

Mark Davis wrote:
> This may be based on a mistaken premise. While the primary use of
> U+FFFD is as stated, it is also used as a replacement for ill-formed
> Unicode. See http://www.unicode.org/reports/tr22/ for example.
>
> "In the case of illegal source sequences, a conversion routine will
> typically provide three options. It may stop with an error (or throw
> an exception). Secondly, it may skip the source sequence. While this
> is commonly an option, it can also hide corruption problems in the
> source text. Lastly, it may map to a substitution character such as
> the Unicode REPLACEMENT CHARACTER (U+FFFD)."
>
> Mark
>
>
>     This behaviour is not appropriate because U+FFFD is specified as a
>     Replacement Character to be "used as a substitute for an
>     uninterpretable character *from another encoding*".
>     see: http://unicode.org/glossary/#replacement_character .
>
>     The correct response to any invalid Unicode escape should be to
>     treat it as a parse error (see section 4.1.8), in the same way that
>     any other invalid or unexpected character would be.
>
>     For clarity, add this text to section 4.1.3 of CSS 2.1
>     http://www.w3.org/TR/CSS21/syndata.html#q6 :
>
>         If the number is outside the range allowed by Unicode (e.g.,
>         "\110000" is above the maximum 10FFFF allowed in current Unicode),
>         then the parser should treat this as a parse error, and a user
>         agent must ignore the declaration containing this invalid
>         property name or value.
>
>     see: http://www.w3.org/TR/CSS21/syndata.html#ignore
>
>     ----
>     David Clarke

Received on Thursday, 31 May 2007 08:25:38 UTC