Re: [CSS21] out of range unicode escapes

As Paul says, it depends very much on the context. FFFD works pretty well in
many cases. In literal text, it is displayed and shows that something *was*
there. In the middle of syntax, it usually causes a syntax error, because it
isn't normally a syntax character (such as =, (, or )).
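
To make the FFFD behaviour concrete, here is a minimal sketch of decoding CSS
hex escapes. It is an illustration only, not how any particular UA does it:
the names decode_escapes, CssParseError and on_error are made up for this
example, and the escape grammar is simplified from CSS 2.1 section 4.1.3.

```python
import re

MAX_CODE_POINT = 0x10FFFF   # highest code point Unicode allows
REPLACEMENT = "\ufffd"      # U+FFFD REPLACEMENT CHARACTER

# Simplified escape: backslash, 1-6 hex digits, one optional whitespace char.
# (Assumption: "\r\n" as a single terminator, zero, and surrogates are not
# handled in this sketch.)
_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?")


class CssParseError(ValueError):
    """Hypothetical error type, used only in this sketch."""


def decode_escapes(text: str, on_error: str = "replace") -> str:
    """Decode hex escapes; out-of-range values become U+FFFD or raise."""

    def substitute(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        if code_point <= MAX_CODE_POINT:
            return chr(code_point)
        if on_error == "replace":
            return REPLACEMENT
        raise CssParseError(f"escape out of range: {match.group(0)!r}")

    return _ESCAPE.sub(substitute, text)


print(repr(decode_escapes(r"\26 B")))      # in range: '&B'
print(repr(decode_escapes(r"a\110000b")))  # out of range: 'a\ufffdb'
```

With on_error="replace" the out-of-range escape stays visible as U+FFFD in
literal text. Passing any other value (for example "strict") raises instead,
which corresponds to the parse-error approach discussed further down the
thread.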

The really serious security problems are caused by simply *removing* an
illegal or invalid sequence, or by replacing it with a character such as "?",
which has syntactic meaning in many contexts and can thus cause serious
misinterpretations.
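
A small sketch of both failure modes, using Python's built-in UTF-8 decoder.
The inputs and the filter naive_filter_passes are hypothetical and
deliberately simplistic; the point is only to show what removal and "?"
substitution do to the text.

```python
from urllib.parse import urlsplit

# Hypothetical input: byte 0xFF (never valid in UTF-8) splits a keyword.
raw = b"java\xffscript:alert(1)"


def naive_filter_passes(data: bytes) -> bool:
    """Hypothetical check that runs on the raw bytes before decoding."""
    return b"javascript:" not in data.lower()


removed = raw.decode("utf-8", errors="ignore")    # silently drops 0xFF
replaced = raw.decode("utf-8", errors="replace")  # substitutes U+FFFD

print(naive_filter_passes(raw))  # True: the filter sees nothing suspicious
print(repr(removed))             # 'javascript:alert(1)' (keyword reassembled)
print(repr(replaced))            # 'java\ufffdscript:alert(1)'

# Substituting "?" instead of U+FFFD changes the structure of a URL.
raw_url = b"http://example.com/page\xffid=1"
with_fffd = raw_url.decode("utf-8", errors="replace")
with_question = with_fffd.replace("\ufffd", "?")

print(repr(urlsplit(with_fffd).query))      # '': still one path segment
print(repr(urlsplit(with_question).query))  # 'id=1': a query string appeared
```

In the first case removal reassembles a token that the earlier check never
saw. In the second, "?" turns part of a path into a query, while U+FFFD
leaves the structure alone.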

Mark

On 5/31/07, Paul Nelson (ATC) <paulnel@winse.microsoft.com> wrote:
>
>  Of course the issue is how one is consuming the stream of text coming in.
>
>
>
> For example, if the text is going to be displayed, it needs to be replaced.
> Thus, an error in an inline CSS property would already have been replaced
> during the initial parsing/conversion to Unicode if the .html file has an
> error. If, however, the text is a .CSS file that is not displayed and the
> CSS property parser is parsing it, it is easy to throw a parsing error and
> move on.
>
>
>
> When it comes time to render in the UA, who cares about trying to render
> right if there are invalid Unicode escapes? Whether the character is
> converted or turned into a replacement character, the result is the
> same... something other than what the author intended... unless they were a
> malicious person trying to crash your UA.
>
>
>
> Regards,
>
>
>
> Paul
>
>
>
> *From:* www-style-request@w3.org [mailto:www-style-request@w3.org] *On
> Behalf Of *David Clarke
> *Sent:* Thursday, May 31, 2007 4:25 PM
> *To:* Mark Davis
> *Cc:* www-style@w3.org; public-i18n-core@w3.org
> *Subject:* Re: [CSS21] out of range unicode escapes
>
>
>
> Mark et al,
>
> I stand corrected on the option of parsing of Unicode source sequences and
> use of the replacement character in general.
>
> As a personal opinion on this, it would seem logical to treat any
> unexpected character, or sequence of characters, in CSS in the same way:
> a CSS parser would treat it as a parse error. This would provide a
> consistent approach, without adding special-case complexity to a parser.
>
> I really feel that an invalid Unicode source sequence in a block of CSS is
> of the same nature as any other invalid sequence of characters. Replacing an
> invalid Unicode sequence with another character is likely to hide errors
> and produce an unintended result.
>
> Mark Davis wrote:
>
> This may be based on a mistaken premise. While the primary use of U+FFFD
> is as stated, it is also used as a replacement for ill-formed Unicode. See http://www.unicode.org/reports/tr22/
> for example.
>
> "In the case of illegal source sequences, a conversion routine will
> typically provide three options. It may stop with an error (or throw an
> exception). Secondly, it may skip the source sequence. While this is
> commonly an option, it can also hide corruption problems in the source text.
> Lastly, it may map to a substitution character such as the Unicode
> REPLACEMENT CHARACTER (U+FFFD)."
>
> Mark
>
>
> This behaviour is not appropriate because U+FFFD is specified as a
> Replacement Character to be "used as a substitute for an uninterpretable
> character *from another encoding*".
> see: http://unicode.org/glossary/#replacement_character .
>
> The correct response to any invalid Unicode escape should be to treat it
> as a parse error (see section 4.1.8), in the same way that any other
> invalid or unexpected character would be.
>
> For clarity, add this text to section 4.1.3 of CSS 2.1
> http://www.w3.org/TR/CSS21/syndata.html#q6 :
>
>     If the number is outside the range allowed by Unicode (e.g.,
>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
>     then the parser should treat this as a parse error, and a user agent
>     must ignore a declaration containing this invalid property name or
>     value.
>
> see: http://www.w3.org/TR/CSS21/syndata.html#ignore
>
> ----
> David Clarke



-- 
Mark

Received on Thursday, 31 May 2007 15:43:21 UTC