Re: Proposed revision of CSS2.1 description of backslash escapes from L. David Baron on 2010-06-30 (www-style@w3.org from June 2010)

From: L. David Baron <dbaron@dbaron.org>
Date: Tue, 29 Jun 2010 18:47:47 -0700
To: Zack Weinberg <zweinberg@mozilla.com>
Cc: www-style@w3.org, fantasai <fantasai.lists@inkedblade.net>
Message-ID: <20100630014747.GA27484@pickering.dbaron.org>

In http://lists.w3.org/Archives/Public/www-style/2010Feb/0221.html ,
Zack Weinberg wrote:
>   <li><p>Backslash (\) characters are not significant inside
>       <a href="#comments">comments</a>.  Elsewhere, they
>       introduce <span class="index-def" title="backslash
>       escapes"><a name="escaped-characters"><dfn>character
>       escapes</dfn></a></span>.</p>

As an introductory piece of text, I think this is hard to scan,
since it puts the main point inside an "Elsewhere" clause.  I think
it would be clearer written as (with the same links as above):
  # Backslash (\) characters introduce character escapes, except
  # inside of comments, where they are not significant.

Your proposal uses the phrases "normal character" and, later, "normal
punctuation character", neither of which is a defined term.  I think
you mean "tokenized as a single-character DELIM token", though there
might be a better way to say that.

The rewording of the rules on escapes for the zero codepoint removes
the authoring conformance requirement in the old text "must not be
zero".  I think this could be solved by replacing:
  # If a hexadecimal escape would insert the character with code
  # point U+0000, the behavior is undefined.
with:
  # Style sheets must not contain escapes that would insert the code
  # point U+0000.  If a user-agent encounters such an escape, the
  # behavior is undefined.
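
As an illustration of the style sheet side of that requirement, here is
a minimal Python sketch of a check for it.  The name find_zero_escapes
is hypothetical, and the sketch assumes the input is raw style sheet
text; it does not skip comments, where backslashes are not significant.

```python
import re

# Match every backslash escape so that an escaped backslash ("\\") is
# consumed as a pair and cannot be mistaken for the start of a hex escape.
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})|.)", re.DOTALL)

def find_zero_escapes(css):
    """Return the offsets of hex escapes that would insert U+0000."""
    offsets = []
    for match in _ESCAPE.finditer(css):
        digits = match.group(1)
        # Only hex escapes have digits; literal escapes leave group 1 empty.
        if digits is not None and int(digits, 16) == 0:
            offsets.append(match.start())
    return offsets
```

A style sheet for which this returns a non-empty list would be
non-conforming under the wording above, whatever a user-agent then does
with it.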

Your proposal also erroneously drops this part of the current text:
  # In this case, user agents should treat a "CR/LF" pair
  # (U+000D/U+000A) as a single white space character.
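
A rough Python sketch of what that sentence asks of a processor follows.
The helper name skip_escape_terminator and its index-based interface are
assumptions made for illustration only.

```python
WHITESPACE = " \t\r\n\f"  # the CSS 2.1 white space characters

def skip_escape_terminator(text, i):
    """Return the index just past the white space ending a hex escape.

    `i` is the index of the first character after the hex digits.
    """
    if text.startswith("\r\n", i):
        return i + 2  # a CR/LF pair counts as one white space character
    if i < len(text) and text[i] in WHITESPACE:
        return i + 1  # any other single white space character
    return i          # no terminator present
```

Without the first branch, the LF of a CR/LF pair would be left behind
as part of the content.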

It would probably also be good to reincorporate this piece of the
current text:
  # Note that this means that a "real" space after the escape sequence
  # must itself either be escaped or doubled.
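
To show why the note matters to authors, here is a small Python sketch
of an escaper that always writes the terminating space.  The names
hex_escape and escape_text are hypothetical; always emitting the
terminator is one simple strategy, not the only valid one.

```python
def hex_escape(ch):
    """Escape one character as backslash, hex code point, and a space."""
    return "\\%x " % ord(ch)

def escape_text(text, to_escape):
    """Hex-escape every character of `text` that appears in `to_escape`."""
    return "".join(hex_escape(c) if c in to_escape else c for c in text)
```

With this, escape_text("&B", "&") gives \26 followed by one space and
B, while escape_text("& B", "&") gives \26 followed by two spaces and B:
the first space ends the escape and the second is the "real" one.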

I think it would also be beneficial in the introductory paragraph to
point out that there are three types of escaping: causing a newline
to be ignored, canceling the meaning of special characters, and
inserting a character by codepoint.
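
The three types can be sketched in Python as one pass over the contents
of a string token.  The function name unescape_css_string is
hypothetical.  Mapping U+0000 and out-of-range values to U+FFFD is an
assumption made so the sketch always returns something; CSS 2.1 leaves
that behavior undefined.

```python
import re

_ESCAPE = re.compile(
    r"\\(?:"
    r"(?P<newline>\r\n|[\n\r\f])"                       # escaped newline
    r"|(?P<hex>[0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?"  # code point escape
    r"|(?P<literal>.)"                                  # any other character
    r")",
    re.DOTALL,
)

def unescape_css_string(body):
    """Resolve backslash escapes in the contents of a CSS string."""
    def replace(match):
        if match.group("newline") is not None:
            return ""  # type 1: backslash and newline are both ignored
        if match.group("hex") is not None:
            code_point = int(match.group("hex"), 16)
            if code_point == 0 or code_point > 0x10FFFF:
                return "\ufffd"  # assumption: undefined in CSS 2.1
            return chr(code_point)  # type 3: character by code point
        return match.group("literal")  # type 2: special meaning canceled
    return _ESCAPE.sub(replace, body)
```

The order of the alternatives matters: a newline or a hex digit after
the backslash has to be recognized before the catch-all literal case.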

Otherwise the proposal seems fine, although:
 * I suspect others will find further issues,
 * I'm not sure such a big rewrite is really necessary, and
 * it does have the usual problem, present throughout CSS 2.1, of
   not specifying conformance requirements clearly using RFC 2119
   keywords, and not clearly distinguishing conformance requirements
   on different parties (style sheets, user-agents, etc.).
(I wonder whether it would be better to try to keep more of the
current text as a statement of style sheet conformance and then
write a separate statement of processor conformance.)

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

Received on Wednesday, 30 June 2010 01:48:21 UTC