- From: Zack Weinberg <zweinberg@mozilla.com>
- Date: Thu, 15 Jul 2010 12:49:38 -0700
- To: W3C Emailing list for WWW Style <www-style@w3.org>, L. David Baron <dbaron@dbaron.org>, fantasai <fantasai@inkedblade.net>
[ I never received dbaron's response to my proposal, so I'm replying to
the text at
http://lists.w3.org/Archives/Public/www-style/2010Jun/0658.html - sorry
for breaking threading ]
> In http://lists.w3.org/Archives/Public/www-style/2010Feb/0221.html ,
> Zack Weinberg wrote:
> > <li><p>Backslash (\) characters are not significant inside
> > <a href="#comments">comments</a>. Elsewhere, they
> > introduce <span class="index-def" title="backslash
> > escapes"><a name="escaped-characters"><dfn>character
> > escapes</dfn></a></span>.</p>
>
> As an introductory piece of text, I think this is hard to scan,
> since it puts the main point inside an "Elsewhere" clause. I think
> it would be clearer written as (with the same links as above):
> # Backslash (\) characters introduce character escapes, except
> # inside of comments, where they are not significant.
Good point. I'd be fine with that change.
> Your proposal includes the text "normal character" and later "normal
> punctuation character", which isn't a defined term. I think you
> mean "tokenized as a single-character DELIM token", though there
> might be a better way to say that.
That's not quite what I mean -- inside a string, for instance, it
wouldn't be tokenized as a DELIM. I don't immediately see a less
clunky way to put it, alas.
> The rewording of the rules on escapes for the zero codepoint removes
> the authoring conformance requirement in the old text "must not be
> zero". I think this could be solved by replacing:
> # If a hexadecimal escape would insert the character with code
> # point U+0000, the behavior is undefined.
> with:
> # Style sheets must not contain escapes that would insert the code
> # point U+0000. If a user-agent encounters such an escape, the
> # behavior is undefined.
Ok, except that HTML5 now requires U+0000 to be converted to U+FFFD
very early in processing (I believe this is still a "parse error" in
HTML5 terms, i.e. an authoring conformance violation, but whatwg.org
is down right now and I can't find the parser algorithm on the W3C
site), so it is tempting to make CSS2.1 match:
# Style sheets must not contain escapes that would insert the code
# point U+0000. If a user-agent encounters such an escape, it is to
# insert the REPLACEMENT CHARACTER, U+FFFD, instead.
and, perhaps as an additional bullet point to the list of "The
following rules always hold":
# Style sheets must not contain the character with code point U+0000,
# or characters in the range U+D800--U+DFFF (except as properly
# encoded UTF-16 surrogate pairs). If a user-agent encounters any of
# these characters, it is to behave as if it had encountered the
# REPLACEMENT CHARACTER, U+FFFD, instead.
(I also brought this up in
http://lists.w3.org/Archives/Public/www-style/2010Jun/0109.html .)
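For concreteness, the two replacement rules above could be sketched
roughly as follows. This is only an illustration in Python, not
proposed spec text: the function names are invented for the example,
the surrogate check covers the whole UTF-16 surrogate block (U+D800
through U+DFFF), and it assumes the style sheet has already been
decoded to a sequence of code points, so any surrogate still present
is an unpaired one.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def decode_hex_escape(hex_digits: str) -> str:
    """Return the character for the digits of a hex escape (hypothetical helper).

    Only the U+0000 case from the proposed wording is handled here;
    values above U+10FFFF are out of scope for this sketch.
    """
    value = int(hex_digits, 16)
    if value == 0:
        # Proposed rule: an escape for U+0000 inserts U+FFFD instead.
        return REPLACEMENT_CHARACTER
    return chr(value)


def sanitize_code_points(text: str) -> str:
    """Replace raw U+0000 and unpaired surrogates with U+FFFD (hypothetical helper)."""
    cleaned = []
    for ch in text:
        if ch == "\x00" or 0xD800 <= ord(ch) <= 0xDFFF:
            cleaned.append(REPLACEMENT_CHARACTER)
        else:
            cleaned.append(ch)
    return "".join(cleaned)
```

With this, decode_hex_escape("0") and decode_hex_escape("000000") both
give U+FFFD, while decode_hex_escape("26") gives "&".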
> Your proposal also erroneously drops this part of the current text:
> # In this case, user agents should treat a "CR/LF" pair
> # (U+000D/U+000A) as a single white space character.
Good catch. That was unintentional.
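To illustrate what that sentence asks for, here is a rough Python
sketch of a hex-escape decoder that swallows a CR/LF pair as a single
white space character. The pattern and function name are mine, not
anything from the spec, and the sketch ignores escapes whose value is
above U+10FFFF.

```python
import re

# "\r\n" is tried before the single-character class so that a CR/LF pair
# is consumed as one unit rather than leaving a stray LF behind.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def decode_hex_escapes(text: str) -> str:
    """Replace each hex escape, plus one optional trailing white space, by its character."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Given backslash, "26", CR, LF, "B" this returns "&B"; a second CR/LF
after the first is left alone, since only one white space character is
ignored.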
> It would probably also be good to reincorporate these pieces of the
> current text:
> # Note that this means that a "real" space after the escape sequence
> # must itself either be escaped or doubled.
Hang on, that's not quite right. a\26 \ x is the same identifier as
a\26\20x, whereas a\26  x is two identifiers, [a\26] [x], yes? So it
shouldn't say "escaped". But apart from that, I would be fine with
putting the note back.
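A rough way to check the escaped-versus-doubled distinction is the
Python sketch below. It is a deliberately simplified splitter, not the
CSS 2.1 tokenizer: the function name is made up, identifier-start
rules are ignored, anything that is not an identifier character or an
escape just ends the current identifier, and escapes above U+10FFFF
are not handled.

```python
import re

# A hex escape swallows at most one following white space character.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def split_identifiers(text: str) -> list:
    """Return the decoded identifiers found in text (simplified sketch)."""
    identifiers = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        match = HEX_ESCAPE.match(text, i)
        if match:
            # Hex escape: insert the character by code point.
            current.append(chr(int(match.group(1), 16)))
            i = match.end()
        elif ch == "\\" and i + 1 < len(text) and text[i + 1] not in "\r\n\f":
            # Literal escape: the next character loses any special meaning.
            current.append(text[i + 1])
            i += 2
        elif ch.isalnum() or ch in "-_" or ord(ch) > 0x7F:
            current.append(ch)
            i += 1
        else:
            # Anything else (including an unescaped space) ends the identifier.
            if current:
                identifiers.append("".join(current))
                current = []
            i += 1
    if current:
        identifiers.append("".join(current))
    return identifiers
```

Under these assumptions the escaped-space form and the \20 form both
come out as the single identifier "a& x", while the doubled-space form
comes out as two identifiers, "a&" and "x".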
> I think it would also be beneficial in the introductory paragraph to
> point out that there are three types of escaping: causing a newline
> to be ignored, canceling the meaning of special characters, and
> inserting a character by codepoint.
>
> Otherwise the proposal seems fine, although:
> * I suspect others will find further issues,
> * I'm not sure such a big rewrite is really necessary, and
> * it does have the usual problem, present throughout CSS 2.1, of
> not specifying conformance requirements clearly using RFC2119
> keywords, and not clearly distinguishing conformance requirements
> on different parties (style sheets, user-agents, etc.).
> (I wonder whether it would be better to try to keep more of the
> current text as a statement of style sheet conformance and then
> write a separate statement of processor conformance.)
I'd be okay with a much smaller patch. I didn't like my previous
attempts to just insert the new normative requirements without also
revising the whole section, but here's another go at it:
* Replace "indicates three types of character escapes" with "may
indicate one of three types of character escape. Inside a CSS
comment, a backslash has no special meaning, and if a backslash is
immediately followed by the end of the style sheet, it also has no
special meaning."
* Append "Outside a string, a backslash followed by a newline has no
special meaning." to the paragraph beginning "First, inside a
string".
* Delete "Except within CSS comments" from the paragraph beginning
"Second, it cancels".
* Delete ", where allowed," from the note at the bottom of the
section.
* Append this text to the first paragraph of the note at the bottom of
the section: "When a backslash has 'no special meaning', it is
tokenized like any other punctuation character without special
meaning: as part of a comment, part of a string, or as a DELIM,
based on the context."
* Possibly change "must itself either be escaped or doubled" to "must
be doubled", but this is a nitpick on a non-normative aside.
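In case it helps to see the combined effect of those edits, here is a
rough Python sketch of how a backslash would be classified. The names
(Escape, classify_backslash, the "comment"/"string"/"other" context
labels) are invented for illustration, and the caller is assumed to
already know which context the backslash is in.

```python
from enum import Enum


class Escape(Enum):
    NONE = "no special meaning"
    LINE_CONTINUATION = "newline ignored inside a string"
    LITERAL = "cancels the meaning of the next character"
    HEX = "inserts a character by code point"


HEX_DIGITS = set("0123456789abcdefABCDEF")
NEWLINES = ("\r\n", "\n", "\r", "\f")


def classify_backslash(text: str, pos: int, context: str) -> Escape:
    """Classify the backslash at text[pos]; context is 'comment', 'string' or 'other'."""
    if text[pos] != "\\":
        raise ValueError("text[pos] is not a backslash")
    rest = text[pos + 1:]
    # Inside a comment, or immediately before the end of the style sheet,
    # the backslash has no special meaning.
    if context == "comment" or rest == "":
        return Escape.NONE
    # Backslash-newline is only an escape inside a string.
    if rest.startswith(NEWLINES):
        return Escape.LINE_CONTINUATION if context == "string" else Escape.NONE
    if rest[0] in HEX_DIGITS:
        return Escape.HEX
    return Escape.LITERAL
```

When the result is Escape.NONE, the backslash is tokenized as part of
the comment, part of the string, or as a DELIM, depending on context.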
How does that sound?
zw
Received on Thursday, 15 July 2010 19:50:13 UTC