Re: [CSS21] out of range unicode escapes from Mark Davis on 2007-05-31 (www-style@w3.org from May 2007)

From: Mark Davis <mark.davis@icu-project.org>
Date: Thu, 31 May 2007 08:43:04 -0700
To: "Paul Nelson (ATC)" <paulnel@winse.microsoft.com>
Cc: "David Clarke" <d.r.clarke@sheffield.ac.uk>, www-style@w3.org, public-i18n-core@w3.org, "Michel Suignard" <michelsu@windows.microsoft.com>
Message-ID: <30b660a20705310843u6295df4agc3b4e62745fade3@mail.gmail.com>

As Paul says, it depends very much on the context. U+FFFD works pretty well
in many cases. In literal text, it displays and shows that something *was*
there. In the middle of syntax, because it isn't normally a syntax character
(=, (, ), ...), it usually causes a syntax error.

The really serious security problems are caused by simply *removing* an
illegal or invalid sequence, or by replacing it with a character such as "?",
which has syntactic meaning in many contexts and can thus cause serious
misinterpretations.
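A small Python illustration of this point, using bytes rather than CSS escapes. The error handler name `question_mark` is a hypothetical example registered through the standard `codecs.register_error` hook; `ignore` and `replace` are Python's built-in decoding error handlers. The byte 0xC0 never occurs in well-formed UTF-8, so each input contains exactly one ill-formed byte.

```python
import codecs


def question_mark_handler(exc):
    """Hypothetical decode error handler that substitutes '?' for bad bytes."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # Return the replacement text and the index at which decoding resumes.
    return ("?", exc.end)


codecs.register_error("question_mark", question_mark_handler)

# 0xC0 is never valid in UTF-8, so each input has one ill-formed byte.
markup = b"<scr\xC0ipt>"
url = b"/page\xC0admin=1"

for policy in ("ignore", "question_mark", "replace"):
    # ignore removes the byte, question_mark inserts '?', replace inserts U+FFFD
    print(policy, ascii(markup.decode("utf-8", policy)), ascii(url.decode("utf-8", policy)))
```

Removal turns the markup into `<script>`, and the "?" substitution turns the path into `/page?admin=1`, which a URL parser would read as the start of a query string; U+FFFD leaves both inputs visibly broken.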

Mark

On 5/31/07, Paul Nelson (ATC) <paulnel@winse.microsoft.com> wrote:
>
>  Of course the issue is how one is consuming the stream of text coming in.
>
> For example, if the text is going to be displayed, it needs to be replaced.
> Thus, an error in an inline CSS property would have been replaced if the
> .html file had an error, as part of the initial parsing/converting to
> Unicode. If, however, the text is a .css file that is not displayed, and the
> CSS property parser is parsing it, it is easy to throw a parsing error and
> move on.
>
> When it comes time to render in the UA, who cares about trying to render
> correctly if there are invalid Unicode escapes? Whether the character is
> converted or turned into a replacement character, the result is the
> same: something other than what the author intended, unless they were a
> malicious person trying to crash your UA.
>
> Regards,
>
> Paul
>
> *From:* www-style-request@w3.org [mailto:www-style-request@w3.org] *On
> Behalf Of *David Clarke
> *Sent:* Thursday, May 31, 2007 4:25 PM
> *To:* Mark Davis
> *Cc:* www-style@w3.org; public-i18n-core@w3.org
> *Subject:* Re: [CSS21] out of range unicode escapes
>
> Mark et al,
>
> I stand corrected on the option of parsing Unicode source sequences and
> the use of the replacement character in general.
>
> As a personal opinion on this, it would seem logical to treat any
> unexpected character or sequence of characters in CSS in the same way,
> namely, for a CSS parser to treat it as a parse error. This would provide
> a consistent approach without adding special-case complexity to a parser.
>
> I really feel that an invalid Unicode source sequence in a block of CSS is
> of the same nature as any other invalid sequence of characters. Replacing an
> invalid Unicode sequence with another character is likely to hide errors
> and produce an unintended result.
>
> Mark Davis wrote:
>
> This may be based on a mistaken premise. While the primary use of U+FFFD
> is as stated, it is also used as a replacement for ill-formed Unicode. See http://www.unicode.org/reports/tr22/
> for example.
>
> "In the case of illegal source sequences, a conversion routine will
> typically provide three options. It may stop with an error (or throw an
> exception). Secondly, it may skip the source sequence. While this is
> commonly an option, it can also hide corruption problems in the source text.
> Lastly, it may map to a substitution character such as the Unicode
> REPLACEMENT CHARACTER (U+FFFD)."
>
> Mark
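The three options in the passage Mark quotes map directly onto Python's built-in decoding error handlers. A minimal sketch: the `convert` function and the policy names `stop`, `skip` and `substitute` are hypothetical labels for the report's three options, while `strict`, `ignore` and `replace` are real Python error-handler names.

```python
# Hypothetical policy names for the three options, mapped to Python's
# built-in error handlers for bytes.decode().
HANDLERS = {
    "stop": "strict",         # raise UnicodeDecodeError
    "skip": "ignore",         # drop the ill-formed sequence silently
    "substitute": "replace",  # map it to U+FFFD REPLACEMENT CHARACTER
}


def convert(data: bytes, policy: str) -> str:
    """Decode UTF-8 input under one of the three conversion options."""
    return data.decode("utf-8", errors=HANDLERS[policy])


corrupt = b"ab\xFFcd"  # 0xFF never appears in well-formed UTF-8

print(ascii(convert(corrupt, "skip")))        # looks like clean text
print(ascii(convert(corrupt, "substitute")))  # corruption stays visible
try:
    convert(corrupt, "stop")
except UnicodeDecodeError as err:
    print("stopped:", err.reason, "at byte", err.start)
```

The skip result, `abcd`, is indistinguishable from uncorrupted input, which is the hiding problem the report warns about.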
>
>
> This behaviour is not appropriate because U+FFFD is specified as a
> Replacement Character to be "used as a substitute for an uninterpretable
> character *from another encoding*".
> see: http://unicode.org/glossary/#replacement_character .
>
> The correct response to any invalid Unicode escape should be to treat it
> as a parse error (see section 4.1.8), in the same way that any other
> invalid or unexpected character would be.
>
> For clarity, add this text to section 4.1.3 of CSS 2.1
> (http://www.w3.org/TR/CSS21/syndata.html#q6):
>
>     If the number is outside the range allowed by Unicode (e.g.,
>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
>     then the parser should treat this as a parse error, and a user agent
>     must ignore a declaration containing this invalid property name or
>     value.
>
> see: http://www.w3.org/TR/CSS21/syndata.html#ignore
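A minimal Python sketch of the proposed rule, assuming a toy declaration parser. `unescape`, `parse_declarations`, `CssParseError` and the `on_out_of_range` flag are hypothetical names; the parser splits naively on ";" and ":" and ignores strings, comments and blocks, and it only checks the upper bound discussed here. The escape pattern follows the CSS 2.1 form: a backslash, one to six hex digits, and one optional whitespace character (CRLF counted as one). The default policy raises a parse error so the whole declaration is ignored, as proposed above; the "replace" policy instead substitutes U+FFFD, the alternative discussed earlier in the thread.

```python
import re

# CSS 2.1 escapes: a backslash followed either by 1-6 hex digits (plus one
# optional terminating whitespace character, CRLF counting as one) or by any
# other character except newline, CR, form feed or a hex digit.
ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|([^\n\r\f0-9a-fA-F]))")
MAX_CODE_POINT = 0x10FFFF


class CssParseError(ValueError):
    """Hypothetical error raised for an escape outside the Unicode range."""


def unescape(text, on_out_of_range="error"):
    """Expand escapes; on_out_of_range is a hypothetical policy flag.

    "error"   -> raise CssParseError (the behaviour proposed in this thread)
    "replace" -> substitute U+FFFD REPLACEMENT CHARACTER
    """
    def expand(match):
        hex_digits, literal = match.group(1), match.group(2)
        if literal is not None:
            return literal  # e.g. "\:" -> ":"
        code_point = int(hex_digits, 16)
        if code_point > MAX_CODE_POINT:
            if on_out_of_range == "error":
                raise CssParseError(f"escape {match.group(0)!r} exceeds U+10FFFF")
            return "\uFFFD"
        return chr(code_point)

    return ESCAPE.sub(expand, text)


def parse_declarations(block, on_out_of_range="error"):
    """Toy parser for 'name: value; ...' that drops declarations with bad escapes."""
    declarations = {}
    for declaration in block.split(";"):
        if ":" not in declaration:
            continue
        name, value = declaration.split(":", 1)
        try:
            declarations[unescape(name.strip(), on_out_of_range)] = unescape(
                value.strip(), on_out_of_range
            )
        except CssParseError:
            # Ignore the whole declaration and continue with the next one.
            continue
    return declarations


css = r'color: red; content: "\110000"; font-family: \41 rial'
print(parse_declarations(css))
print(ascii(parse_declarations(css, "replace")))
```

With the default policy only `color` and `font-family` (decoded to `Arial`) survive; with "replace", the `content` declaration is kept with U+FFFD in place of the escape.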
>
> ----
> David Clarke
>
> --
> Mark
>

-- 
Mark

Received on Thursday, 31 May 2007 15:43:13 UTC