From: Zack Weinberg <zweinberg@mozilla.com>
Date: Thu, 15 Jul 2010 12:49:38 -0700
To: W3C Emailing list for WWW Style <www-style@w3.org>, L. David Baron <dbaron@dbaron.org>, fantasai <fantasai@inkedblade.net>
[ I never received dbaron's response to my proposal, so I'm replying to the text at http://lists.w3.org/Archives/Public/www-style/2010Jun/0658.html - sorry for breaking threading ]

> In http://lists.w3.org/Archives/Public/www-style/2010Feb/0221.html ,
> Zack Weinberg wrote:
> > <li><p>Backslash (\) characters are not significant inside
> > <a href="#comments">comments</a>. Elsewhere, they
> > introduce <span class="index-def" title="backslash
> > escapes"><a name="escaped-characters"><dfn>character
> > escapes</dfn></a></span>.</p>
>
> As an introductory piece of text, I think this is hard to scan,
> since it puts the main point inside an "Elsewhere" clause. I think
> it would be clearer written as (with the same links as above):
> # Backslash (\) characters introduce character escapes, except
> # inside of comments, where they are not significant.

Good point. I'd be fine with that change.

> Your proposal includes the text "normal character" and later "normal
> punctuation character", which isn't a defined term. I think you
> mean "tokenized as a single-character DELIM token", though there
> might be a better way to say that.

That's not quite what I mean -- inside a string, for instance, it wouldn't be tokenized as a DELIM. I don't immediately see a less clunky way to put it, alas.

> The rewording of the rules on escapes for the zero codepoint removes
> the authoring conformance requirement in the old text "must not be
> zero". I think this could be solved by replacing:
> # If a hexadecimal escape would insert the character with code
> # point U+0000, the behavior is undefined.
> with:
> # Style sheets must not contain escapes that would insert the code
> # point U+0000. If a user-agent encounters such an escape, the
> # behavior is undefined.

Ok, except that HTML5 now requires U+0000 to be converted to U+FFFD very early in processing (I believe this is still a "parse error" in HTML5 terms, i.e. an authoring conformance violation, but whatwg.org is down right now and I can't find the parser algorithm on the W3C site), so it is tempting to make CSS2.1 match:

# Style sheets must not contain escapes that would insert the code
# point U+0000. If a user-agent encounters such an escape, it is to
# insert the REPLACEMENT CHARACTER, U+FFFD, instead.

and, perhaps as an additional bullet point to the list of "The following rules always hold":

# Style sheets must not contain the character with code point U+0000,
# or characters in the range U+D800--U+DFFF (except as properly
# encoded UTF-16 surrogate pairs). If a user-agent encounters any of
# these characters, it is to behave as if it had encountered the
# REPLACEMENT CHARACTER, U+FFFD, instead.

(I also brought this up in http://lists.w3.org/Archives/Public/www-style/2010Jun/0109.html .)

> Your proposal also erroneously drops this part of the current text:
> # In this case, user agents should treat a "CR/LF" pair
> # (U+000D/U+000A) as a single white space character.

Good catch. That was unintentional.

> It would probably also be good to reincorporate these pieces of the
> current text:
> # Note that this means that a "real" space after the escape sequence
> # must itself either be escaped or doubled.

Hang on, that's not quite right. a\26 \ x is the same identifier as a\26\20x, whereas a\26  x (two spaces) is two identifiers, [a\26] [x], yes? So it shouldn't say "escaped". But apart from that, I would be fine with putting the note back.

> I think it would also be beneficial in the introductory paragraph to
> point out that there are three types of escaping: causing a newline
> to be ignored, canceling the meaning of special characters, and
> inserting a character by codepoint.
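The three kinds of escape listed above, and the trailing-whitespace rule behind the a\26 examples, can be illustrated with a minimal Python sketch. `consume_escape`, `scan_ident` and `scan_string` are hypothetical helpers written only for this illustration, with simplified identifier and string rules (no bad-string recovery); they are not taken from CSS 2.1 or from any user agent. U+0000 is replaced by U+FFFD as in the wording proposed above; surrogates and values above U+10FFFF are also replaced, an assumption borrowed from the later CSS Syntax Module Level 3 rather than from CSS 2.1.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
WHITESPACE = " \t\n\r\f"
NEWLINES = "\n\r\f"
REPLACEMENT = "\ufffd"


def consume_escape(text: str, i: int) -> tuple[str, int]:
    """Decode the escape whose backslash is at text[i].

    Returns (decoded_char, index_after_escape). The caller guarantees that
    text[i + 1] exists and is not a newline.
    """
    j = i + 1
    k = j
    # Codepoint escape: one to six hex digits.
    while k < len(text) and k - j < 6 and text[k] in HEX_DIGITS:
        k += 1
    if k == j:
        # Not hex: the backslash cancels the meaning of the next character.
        return text[j], j + 1
    cp = int(text[j:k], 16)
    if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        # U+0000 as proposed above; surrogates and out-of-range values
        # as in CSS Syntax Level 3 (assumption, not CSS 2.1).
        ch = REPLACEMENT
    else:
        ch = chr(cp)
    # One white space character after the hex digits ends the escape and
    # is swallowed; a CR/LF pair counts as a single white space character.
    if text.startswith("\r\n", k):
        k += 2
    elif k < len(text) and text[k] in WHITESPACE:
        k += 1
    return ch, k


def scan_ident(text: str) -> tuple[str, str]:
    """Read one identifier from the start of text; return (value, rest).

    Simplified: letters, digits, '-', '_' and non-ASCII characters are name
    characters; anything else ends the identifier unless it is escaped.
    """
    out, i = [], 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text) and text[i + 1] not in NEWLINES:
            ch, i = consume_escape(text, i)
            out.append(ch)
        elif c.isalnum() or c in "-_" or ord(c) > 0x7F:
            out.append(c)
            i += 1
        else:
            break
    return "".join(out), text[i:]


def scan_string(text: str) -> tuple[str, str]:
    """Read a string whose opening quote is text[0]; return (value, rest)."""
    quote, out, i = text[0], [], 1
    while i < len(text):
        c = text[i]
        if c == quote:
            return "".join(out), text[i + 1:]
        if c in NEWLINES:
            raise ValueError("unescaped newline in string")  # simplification
        if c == "\\":
            if i + 1 == len(text):
                break  # backslash at end of input; string is unterminated
            if text[i + 1] in NEWLINES:
                # Escaped newline: backslash and newline are both ignored.
                i += 3 if text.startswith("\r\n", i + 1) else 2
                continue
            ch, i = consume_escape(text, i)
            out.append(ch)
            continue
        out.append(c)
        i += 1
    raise ValueError("unterminated string")
```

For example, `scan_ident(r"a\26 \ x")` and `scan_ident(r"a\26\20x")` both return `("a& x", "")`, while with two spaces after `\26` the identifier stops at `a&` and `" x"` is left over, which is why a "real" space there has to be doubled rather than escaped if it is meant to separate tokens.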
>
> Otherwise the proposal seems fine, although:
> * I suspect others will find further issues,
> * I'm not sure such a big rewrite is really necessary, and
> * it does have the usual problem, present throughout CSS 2.1, of
>   not specifying conformance requirements clearly using RFC2119
>   keywords, and not clearly distinguishing conformance requirements
>   on different parties (style sheets, user-agents, etc.).
> (I wonder whether it would be better to try to keep more of the
> current text as a statement of style sheet conformance and then
> write a separate statement of processor conformance.)

I'd be okay with a much smaller patch. I didn't like my previous attempts to just insert the new normative requirements without also revising the whole section, but here's another go at it:

* Replace "indicates three types of character escapes" with "may indicate one of three types of character escape. Inside a CSS comment, a backslash has no special meaning, and if a backslash is immediately followed by the end of the style sheet, it also has no special meaning."

* Append "Outside a string, a backslash followed by a newline has no special meaning." to the paragraph beginning "First, inside a string".

* Delete "Except within CSS comments" from the paragraph beginning "Second, it cancels".

* Delete ", where allowed," from the note at the bottom of the section.

* Append this text to the first paragraph of the note at the bottom of the section: "When a backslash has 'no special meaning', it is tokenized like any other punctuation character without special meaning: as part of a comment, part of a string, or as a DELIM, based on the context."

* Possibly change "must itself either be escaped or doubled" to "must be doubled", but this is a nitpick on a non-normative aside.

How does that sound?

zw
Received on Thursday, 15 July 2010 19:50:13 UTC
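As a rough check of the smaller patch, the following Python sketch classifies each backslash in a style sheet by the role the proposed wording would give it: part of a comment, the start of an escape, an ignored escaped newline inside a string, or a character with "no special meaning" (part of a string at end of input, or a DELIM outside strings). `classify_backslashes` and its category names are hypothetical; it tracks only comments and strings and is not a full CSS 2.1 tokenizer.

```python
NEWLINES = "\n\r\f"


def classify_backslashes(css: str) -> list[tuple[int, str]]:
    """Return (index, kind) for every backslash in css."""
    result = []
    i, n = 0, len(css)
    quote = None  # the open string's delimiter, or None outside strings
    while i < n:
        c = css[i]
        if quote is None and css.startswith("/*", i):
            # Comment: backslashes inside have no special meaning.
            # An unterminated comment runs to the end of the style sheet.
            close = css.find("*/", i + 2)
            end = n if close == -1 else close + 2
            result.extend((k, "comment") for k in range(i, end) if css[k] == "\\")
            i = end
            continue
        if c == "\\":
            nxt = css[i + 1] if i + 1 < n else ""
            if nxt == "":
                # Backslash at end of style sheet: no special meaning.
                result.append((i, "delim" if quote is None else "string"))
                i += 1
            elif nxt in NEWLINES:
                if quote is None:
                    # Outside a string, backslash-newline: no special meaning.
                    result.append((i, "delim"))
                    i += 1
                else:
                    # Inside a string, the backslash and newline are ignored.
                    result.append((i, "ignored-newline"))
                    i += 3 if css.startswith("\r\n", i + 1) else 2
            else:
                # Any other backslash starts an escape; skipping the next
                # character is enough to keep string tracking correct.
                result.append((i, "escape"))
                i += 2
            continue
        if quote is None:
            if c in "\"'":
                quote = c
        elif c == quote or c in NEWLINES:
            # Closing quote, or an unescaped newline ending a bad string.
            quote = None
        i += 1
    return result
```

The classifier deliberately skips only one character after an escaping backslash: that is enough to stop an escaped quote from closing a string, and the remaining hex digits of a codepoint escape cannot be mistaken for quotes, comment openers or backslashes.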