- From: Mark Davis <mark.davis@icu-project.org>
- Date: Thu, 31 May 2007 08:43:04 -0700
- To: "Paul Nelson (ATC)" <paulnel@winse.microsoft.com>
- Cc: "David Clarke" <d.r.clarke@sheffield.ac.uk>, www-style@w3.org, public-i18n-core@w3.org, "Michel Suignard" <michelsu@windows.microsoft.com>
- Message-ID: <30b660a20705310843u6295df4agc3b4e62745fade3@mail.gmail.com>
As Paul says, it depends very much on the context. FFFD works pretty well in
many cases. If in literal text, it displays, and shows that something *was*
there. If in the middle of syntax, because it isn't normally a syntax
character (=, (, ), ...), it usually causes a syntax error.

The really serious security problems are caused by simply *removing* an
illegal or invalid sequence, or by replacing it with a character such as "?"
which has syntactic meaning in many contexts, and can thus cause serious
misinterpretations.

Mark

On 5/31/07, Paul Nelson (ATC) <paulnel@winse.microsoft.com> wrote:
>
> Of course the issue is how one is consuming the stream of text coming in.
>
> For example, if the text is going to be displayed, it needs to be replaced.
> Thus, an error in an inline CSS property would have been replaced if the
> .html file has an error as part of the initial parsing/converting to
> Unicode. If, however, the text is a .CSS file that is not displayed and the
> css property parser is parsing it, it is easy to throw a parsing error and
> move on.
>
> When it comes time to render in the UA, who cares about trying to render
> right if there are invalid Unicode escapes. Whether the character is
> converted or turned into a replacement character the result is the
> same... something other than what the author intended... unless they were a
> malicious person trying to crash your UA.
>
> Regards,
>
> Paul
>
> *From:* www-style-request@w3.org [mailto:www-style-request@w3.org] *On
> Behalf Of *David Clarke
> *Sent:* Thursday, May 31, 2007 4:25 PM
> *To:* Mark Davis
> *Cc:* www-style@w3.org; public-i18n-core@w3.org
> *Subject:* Re: [CSS21] out of range unicode escapes
>
> Mark et al,
>
> I stand corrected on the option of parsing of Unicode source sequences and
> use of the replacement character in general.
>
> As a personal opinion on this, it would seem logical to treat any
> unexpected character, or sequence of characters in CSS in the same way.
> This would be for a CSS parser to treat it as a parse error. This would
> provide a consistent approach, without adding special case complexity to a
> parser.
>
> I really feel that an invalid Unicode source sequence in a block of CSS is
> of the same nature as any other invalid sequence of characters. Replacing
> an invalid Unicode sequence with another character is likely to hide
> errors, and produce an unintended result.
>
> Mark Davis wrote:
>
> This may be based on a mistaken premise. While the primary use of U+FFFD
> is as stated, it is also used as a replacement for ill-formed Unicode. See
> http://www.unicode.org/reports/tr22/ for example.
>
> "In the case of illegal source sequences, a conversion routine will
> typically provide three options. It may stop with an error (or throw an
> exception). Secondly, it may skip the source sequence. While this is
> commonly an option, it can also hide corruption problems in the source
> text. Lastly, it may map to a substitution character such as the Unicode
> REPLACEMENT CHARACTER (U+FFFD)."
>
> Mark
>
> This behaviour is not appropriate because U+FFFD is specified as a
> Replacement Character to be "used as a substitute for an uninterpretable
> character *from another encoding*".
> see: http://unicode.org/glossary/#replacement_character .
>
> The correct response to any invalid Unicode escape should be to treat it
> as a parse error (see section 4.1.8), in the same way that any other
> invalid or unexpected character would be.
>
> For clarity, add this text to 4.1.3 of CSS 2.1
> http://www.w3.org/TR/CSS21/syndata.html#q6 :
>
> If the number is outside the range allowed by Unicode (e.g.,
> "\110000" is above the maximum 10FFFF allowed in current Unicode),
> then the parser should treat this as a parse error, and a user agent
> must ignore a declaration containing this invalid property name or
> value.
>
> see: http://www.w3.org/TR/CSS21/syndata.html#ignore
>
> ----
> David Clarke
>
> --
> Mark

--
Mark
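
To make the three options quoted from TR22 concrete, here is a minimal Python sketch (an illustration only, not taken from any UA or CSS parser). It uses Python's standard "strict", "ignore" and "replace" error handlers for bytes.decode, which correspond roughly to stopping with an error, skipping the sequence, and substituting U+FFFD. The helper names decode_utf8 and resolve_css_escape, and the on_error parameter, are hypothetical and exist only for this example.

```python
REPLACEMENT = "\ufffd"        # U+FFFD REPLACEMENT CHARACTER
MAX_CODE_POINT = 0x10FFFF     # highest code point allowed by Unicode


def decode_utf8(data: bytes, policy: str) -> str:
    """Decode UTF-8 using one of Python's standard error handlers.

    policy is "strict" (raise an error), "ignore" (skip the ill-formed
    sequence) or "replace" (substitute U+FFFD).
    """
    return data.decode("utf-8", errors=policy)


def resolve_css_escape(hex_digits: str, on_error: str = "replace") -> str:
    """Hypothetical helper: map the hex digits of a CSS escape to a character.

    Out-of-range values either raise (the "parse error" position in this
    thread) or become U+FFFD (the "replacement character" position).
    """
    code_point = int(hex_digits, 16)
    if code_point > MAX_CODE_POINT:
        if on_error == "error":
            raise ValueError(f"escape \\{hex_digits} is above 10FFFF")
        return REPLACEMENT
    return chr(code_point)


if __name__ == "__main__":
    raw = b"<scr\xffipt>"     # 0xFF can never appear in well-formed UTF-8

    # Skipping the bad byte silently joins the text around it into "<script>".
    print(decode_utf8(raw, "ignore") == "<script>")

    # Substituting U+FFFD keeps a visible marker where the bad byte was.
    print(REPLACEMENT in decode_utf8(raw, "replace"))

    # Stopping with an error is the third option.
    try:
        decode_utf8(raw, "strict")
    except UnicodeDecodeError as exc:
        print("decode error:", exc.reason)

    # "\110000" from the proposed spec text is one above the maximum.
    print(resolve_css_escape("110000") == REPLACEMENT)
```

The "ignore" case is the one behind the security concern above: once the invalid byte is dropped, the surrounding text forms a token that a filter looking at the raw bytes would not have seen.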
Received on Thursday, 31 May 2007 15:43:21 UTC