RE: [CSS21] out of range unicode escapes from Paul Nelson (ATC) on 2007-06-09 (www-style@w3.org from June 2007)

From: Paul Nelson (ATC) <paulnel@winse.microsoft.com>
Date: Fri, 8 Jun 2007 21:11:47 -0700
To: David Clarke <d.r.clarke@sheffield.ac.uk>, Mark Davis <mark.davis@icu-project.org>
CC: <www-style@w3.org>, <public-i18n-core@w3.org>
Message-ID: <49C257E2C13F584790B2E302E021B6F91395AC6A@winse-msg-01.segroup.winse.corp.micros>

In the majority of cases, the author of the content is also the author of the CSS.

 

David, what are you proposing be done if invalid sequences are encountered?

 

Paul

 

From: David Clarke [mailto:d.r.clarke@sheffield.ac.uk] 
Sent: Friday, June 01, 2007 12:56 PM
To: Mark Davis
Cc: Paul Nelson (ATC); www-style@w3.org; public-i18n-core@w3.org
Subject: Re: [CSS21] out of range unicode escapes

 

Stepping back a little. If I understand correctly, Paul Nelson is supporting my reasoning.

If an invalid sequence appears in the CSS, surely this will not be what the author intended? 

If the standard indicates that parsers should silently replace an invalid sequence by a character that may be valid (e.g. within some literal text), then validators ought to accept the invalid sequence.

If, on the other hand, it is treated like any other invalid sequence of characters, then the CSS will legitimately fail validation and signal that something needs to be corrected or, if being processed by a UA, ignored.

This is the same as specifying a non-existent colour name: the CSS declaration is ignored.
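
To make the "treat it like any other invalid sequence" option concrete, here is a minimal sketch in Python. It is not taken from any spec or UA: the names has_invalid_escape and keep_valid_declarations are hypothetical, the split on ';' is deliberately naive (it ignores semicolons inside strings), and "invalid" is assumed to mean zero, a surrogate, or anything above U+10FFFF.

```python
import re

# Backslash followed by 1-6 hex digits, as in a CSS 2.1 style escape.
ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})")


def has_invalid_escape(text):
    """Return True if any hex escape names a code point assumed invalid."""
    for match in ESCAPE.finditer(text):
        cp = int(match.group(1), 16)
        # Assumption: zero, surrogates and values past U+10FFFF are invalid.
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return True
    return False


def keep_valid_declarations(block):
    """Naively split on ';' and drop declarations with an invalid escape."""
    declarations = [d.strip() for d in block.split(";") if d.strip()]
    return [d for d in declarations if not has_invalid_escape(d)]


if __name__ == "__main__":
    css = r'color: red; content: "\110000"; margin: 0'
    print(keep_valid_declarations(css))
```

With this behaviour the offending declaration is dropped as a whole, exactly as an unknown colour name would be, and a validator built on the same check would flag it.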


Mark Davis wrote: 

As Paul says, it depends very much on the context. FFFD works pretty well in many cases. If in literal text, it displays, and shows that something *was* there. If in the middle of syntax, because it isn't normally a syntax character (=, (, ), ...) it usually causes a syntax error. 

The really serious security problems are caused by simply *removing* an illegal or invalid sequence, or by replacing it with a character such as "?" which has syntactic meaning in many contexts, and can thus cause serious misinterpretations.
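
The difference between the three policies can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function name decode_escapes and its policy argument are hypothetical, the escape syntax is simplified to a backslash plus 1-6 hex digits plus one optional whitespace character, and "invalid" is assumed to mean zero, a surrogate, or anything above U+10FFFF.

```python
import re

# Backslash, 1-6 hex digits, then one optional whitespace character.
ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?")

MAX_CODE_POINT = 0x10FFFF


def is_valid_scalar(cp):
    """True for code points that are in range, non-zero and not surrogates."""
    return 0 < cp <= MAX_CODE_POINT and not (0xD800 <= cp <= 0xDFFF)


def decode_escapes(text, policy="fffd"):
    """Decode hex escapes; `policy` picks what an invalid escape becomes."""
    def substitute(match):
        cp = int(match.group(1), 16)
        if is_valid_scalar(cp):
            return chr(cp)
        if policy == "fffd":
            return "\ufffd"      # visible marker with no syntactic meaning
        if policy == "question":
            return "?"           # has syntactic meaning in many contexts
        if policy == "remove":
            return ""            # silently joins the neighbouring text
        raise ValueError(f"unknown policy: {policy}")
    return ESCAPE.sub(substitute, text)


if __name__ == "__main__":
    sample = r"ur\110000l"
    for policy in ("fffd", "question", "remove"):
        print(policy, repr(decode_escapes(sample, policy)))
```

Under the "remove" policy the sample collapses to "url", a token that was never present in the input and that an earlier filter would not have seen. Under "fffd" the damage stays visible and cannot be mistaken for syntax.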

Mark

On 5/31/07, Paul Nelson (ATC) <paulnel@winse.microsoft.com> wrote: 

Of course the issue is how one is consuming the stream of text coming in.

 

For example, if the text is going to be displayed, it needs to be replaced. Thus, if the .html file has an error, an error in an inline CSS property would already have been replaced as part of the initial parsing/converting to Unicode. If, however, the text is a .css file that is not displayed and the CSS property parser is parsing it, it is easy to throw a parsing error and move on.

 

When it comes time to render in the UA, who cares about trying to render correctly if there are invalid Unicode escapes? Whether the character is converted or turned into a replacement character, the result is the same…something other than what the author intended…unless they were a malicious person trying to crash your UA.

 

Regards,

 

Paul


-- 
Mark

----
David Clarke

Received on Saturday, 9 June 2007 04:11:02 UTC