Re: [css3-syntax] Thoughts on proposed Syntax module from Kang-Hao (Kenny) Lu on 2012-08-28 (www-style@w3.org from August 2012)

From: Kang-Hao (Kenny) Lu <kanghaol@oupeng.com>
Date: Wed, 29 Aug 2012 07:59:37 +0800
To: "L. David Baron" <dbaron@dbaron.org>
CC: WWW Style <www-style@w3.org>
Message-ID: <503D5B69.8080302@oupeng.com>
(12/08/29 5:28), L. David Baron wrote:
> As I said at the face-to-face meeting, I think the approach that the
> specification takes to CSS's (), [], and {} matching rules is going
> in the wrong direction.  I think the normative specification text
> for these should be the general statements about how the processing
> works, and not the code-like form of the current specification,
> since I really *don't* want bugs in the specification that break the
> general rules to end up being codified in the specification.  I
> worry that ending up with exceptions to these rules could prevent us
> from making general improvements to parsing technology that would
> otherwise (without exceptions) be possible.  (For example, we might
> at some point in the future have generated parsers based on two
> different but parallel state machines, one describing the correct
> syntax and another describing the error handling behavior (for when
> the first state machine goes into a failure state) -- done in a way
> that a state in the error handling state machine can be determined
> at parser generation time from the state in the correct-input state
> machine.)

It's a bit difficult for me to understand this concern without specific
examples of "bugs in the specification that break the general rules". Do
you mean, for example, that error handling for BAD_URI should be left
undefined?
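For concreteness, the kind of general statement at issue is CSS 2.1's rule
for malformed declarations (section 4.2): read until the end of the
declaration "while observing the rules for matching pairs of (), [], {},
"", and ''". Here is a minimal sketch of that rule over raw text. It
ignores escapes and comments for brevity, and the function name is
hypothetical.

```python
# Closing character expected for each opening bracket.
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def skip_malformed_declaration(text: str, pos: int) -> int:
    """Skip a malformed declaration starting at `pos`.

    Returns the index just past a top-level ';', or the index of a
    top-level '}' (left in place because it closes the enclosing rule),
    or len(text) at end of input.
    """
    stack = []  # closing characters still owed by currently open blocks
    i = pos
    while i < len(text):
        c = text[i]
        if c in ('"', "'"):
            # Skip the whole string so brackets and ';' inside it are inert.
            # Simplified: no escapes, and a string may span newlines.
            end = text.find(c, i + 1)
            i = len(text) if end == -1 else end + 1
            continue
        if c in CLOSERS:
            stack.append(CLOSERS[c])
        elif stack and c == stack[-1]:
            stack.pop()
        elif not stack and c == ";":
            return i + 1
        elif not stack and c == "}":
            return i
        i += 1
    return i
```

For example, `skip_malformed_declaration("a: f(;}) ; b: c", 0)` returns 10:
the `;` and `}` inside the parentheses are ignored, and parsing resumes at
" b: c".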

> That said, I think the problems with this approach don't show up
> much in the material currently specified; I think most of the
> problems appear when describing how to parse the syntax of all the
> property values, which is where the bulk of CSS parsing logic lives.

Hmm... I am intrigued. Do you have specific examples?

> I also think this sort of specification describing a state machine
> in prose is generally far less readable than a specification that
> describes a tokenization and grammar in a concise format.  I think
> the special case of HTML parsing (which has so many complex rules
> that it can't reasonably be written in a concise format) doesn't
> mean that all other languages should be described in the same prose
> style.  Yes, CSS 2.1's description of parsing is not as precise as
> it should be, but I'm not at all convinced that the fix to that
> problem needs to be as drastic as switching to a state machine
> written in prose.

This seems like a matter of stylistic choice. The spec can be written as

  1. A state machine
  2. Some formal grammar rules
  3. A bunch of rules in natural language

From the meeting minutes, it seems that Bert prefers 2. I wrote down a
universal extension to CSS 2.1's grammar rules [1] which, fed to flex or
another parser generator, would generate pretty much Tab's state machine
(except for the BAD_URI issue and potentially others), but I don't think
it's very readable either, albeit arguably concise.

I am not a fan of 3. Some wording in CSS 2.1 was simply vague and left
open questions (what counts as an "open construct"?), and I am not sure
we can keep it readable once we add details like error handling for
BAD_URI.

Therefore, I lean toward 1. as the best option, but I am happy to be
proved wrong by another attempt.

> Some specific comments on "3.5 Changes from the CSS 2.1 Tokenizer":
> ==================================================================
> 
>   # 2. The BAD-URI token (now bad-url) is "self-contained". In other
>   # words, once the tokenizer realizes it's in a bad-url rather than
>   # a url token, it just seeks forward to look for the closing ),
>   # ignoring everything else. This behavior is simpler than treating
>   # it like a FUNCTION token and paying attention to opened blocks
>   # and such. Only WebKit exhibits this behavior, but it doesn't
>   # appear that we've gotten any compat bugs from it. 
> 
> So if I'm understanding this correctly, this is more than the change
> we already made for issue 129 that's described in
> https://bugzilla.mozilla.org/show_bug.cgi?id=569646 .  You're saying
> that not only do we ignore [] and {} that are prior to the point at
> which the URL is known to be invalid, but that you also ignore []
> and {} that are *after* that point, until you reach a closing )?

and also (, " and ' after that point, if I understand this correctly.
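A minimal sketch of the self-contained behavior as the quoted change note
describes it: once a url( is known to be bad, scan forward to the next ),
ignoring (, [, {, " and ' along the way. The function name is
hypothetical, and honoring backslash escapes (so that \) does not end the
token) is an assumption of this sketch, not something the change note
states.

```python
def consume_bad_url_remnants(text: str, pos: int) -> int:
    """Scan forward from `pos` (inside a url( already known to be bad).

    Returns the index just past the closing ')', or len(text) at end of
    input. Brackets, braces and quotes are deliberately ignored; only a
    backslash escape keeps a ')' from ending the token.
    """
    i = pos
    while i < len(text):
        c = text[i]
        if c == ")":
            return i + 1
        if c == "\\" and i + 1 < len(text) and text[i + 1] != "\n":
            # Valid escape: skip the backslash and the next character.
            # That is enough here, since the remaining hex digits of a
            # longer escape can never be ')'.
            i += 2
            continue
        i += 1
    return i
```

On the test case below, scanning from just after "url(" stops right after
the ), leaving "; background: green; }" to be parsed normally.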

> I guess this change seems reasonable to me.

I don't have a strong opinion, but WebKit does not seem to exhibit this
behavior either. Test case:

  data:text/html,<style>body { url(a "); background: green; }</style>

(The page would be green if this were implemented; Chrome 23 renders it
white.)
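To make the expected results of this test case explicit, here is a
minimal sketch contrasting the self-contained model with a string-aware
one (paying attention to quotes, as the FUNCTION-like treatment would;
brackets are omitted for brevity). Only the question of where the bad url
ends is modeled. The function name and the `self_contained` switch are
hypothetical, and the string-aware branch is a simplification rather than
any engine's actual code.

```python
def remainder_after_bad_url(text: str, start: int, self_contained: bool) -> str:
    """Return the text left after a bad url whose contents begin at `start`.

    self_contained=True: scan straight to the next ')', ignoring quotes.
    self_contained=False: a quote opens a string that must close first;
    simplified so an unterminated string swallows the rest of the input
    (CSS 2.1 would end it at a newline, and the test case has none).
    """
    i = start
    while i < len(text):
        c = text[i]
        if c == ")":
            return text[i + 1:]
        if not self_contained and c in ('"', "'"):
            end = text.find(c, i + 1)
            if end == -1:
                return ""
            i = end + 1
            continue
        i += 1
    return ""


sheet = 'body { url(a "); background: green; }'
start = sheet.index("url(") + len("url(")
print(repr(remainder_after_bad_url(sheet, start, self_contained=True)))
# '; background: green; }'  -> the next declaration is still parsed
print(repr(remainder_after_bad_url(sheet, start, self_contained=False)))
# ''  -> the rest of the sheet is swallowed, so no green
```

Under the self-contained model `background: green` survives as the next
declaration; under the string-aware model the unterminated string eats
the rest of the sheet. That matches the white page Chrome 23 shows,
though this sketch does not claim that is how WebKit gets there.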

Tab, what test cases did you use?


In any case, I hope implementations converge as soon as possible;
otherwise authors may come up with lots of crazy hacks that rely on the
differences here.

> Some specific comments on "3.7. Changes from CSS 2.1 Core Grammar":
> ==================================================================
> 
>   # 1. No whitespace or comments are allowed between the DELIM(!)
>   # and IDENT(important) tokens when processing an !important
>   # directive at the end of a style rule. 
> 
> I disagree with this change; I think disallowing whitespace is a
> significant compatibility problem.  There are a significant number
> of uses with whitespace that people have written in Gecko's codebase
> (including the only use of !important in a code example in our
> userContent-example.css that explains how to write user style
> sheets).  Three of the examples in the cascading chapter of CSS 2.1
> also use whitespace.

Yeah, whitespace, at least, is a problem. For what it's worth, WebKit
also accepts whitespace between ! and important.
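Since both engines accept it, the end-of-declaration check needs to
tolerate whitespace between the ! and important. Here is a minimal sketch
operating on raw value text rather than tokens. Treating comments like
whitespace, as a CSS tokenizer generally would, is an assumption here, as
is the helper name. A real parser would do this on the token stream so
that strings and escapes are handled properly.

```python
from __future__ import annotations

import re

# '!' followed by optional CSS whitespace/comments, then 'important'
# (ASCII case-insensitive), then only whitespace/comments to the end.
_WS_OR_COMMENT = r"(?:[ \t\n\r\f]|/\*.*?\*/)*"
IMPORTANT_RE = re.compile(
    "!" + _WS_OR_COMMENT + "important" + _WS_OR_COMMENT + r"\Z",
    re.IGNORECASE | re.DOTALL | re.ASCII,
)


def split_important(value: str) -> tuple[str, bool]:
    """Split raw declaration value text into (value, is_important)."""
    m = IMPORTANT_RE.search(value)
    if m is None:
        return value.strip(), False
    return value[: m.start()].strip(), True
```

With this, "red !important", "red ! important" and
"red !/* c */IMPORTANT" all yield ("red", True), while "red !importantx"
is not treated as important.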


[1] http://lists.w3.org/Archives/Public/www-style/2012May/1163



Cheers,
Kenny
-- 
Web Specialist, Oupeng Browser, Beijing
Received on Wednesday, 29 August 2012 00:00:06 UTC