Re: [selectors3] WIP: a selector parser based on css3-syntax from Kang-Hao (Kenny) Lu on 2012-06-11 (www-style@w3.org from June 2012)

From: Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu>
Date: Mon, 11 Jun 2012 11:19:21 +0800
To: Simon Sapin <simon.sapin@kozea.fr>
CC: WWW Style <www-style@w3.org>
Message-ID: <4FD563B9.8010503@csail.mit.edu>

(12/06/10 1:19), Simon Sapin wrote:
> On 08/06/2012 02:06, fantasai wrote:
>> On 06/07/2012 03:48 PM, Simon Sapin wrote:
>>> Ideally I’d like to see everything defined in terms of the new
>>> Syntax3, but I understand that this stuff takes time and effort.
>>> Can any external help be useful?
>>
>> If you've got a patch for Selectors 3, I'll take it. ^_^
> 
> Any feedback at this point?

The patch looks big. :p I consider your "patch" a potential input to
selectors4 or css3-syntax, and I have feedback later in this mail, but
here's my patch for selectors3 (a .patch attached too):

  # The a and b values must be integers (positive, negative, or zero)

  | The a and b values must be integers (i.e. without a decimal dot
  | ".")

  # In addition to this, :nth-child() can take ‘odd’ and ‘even’ as
  # arguments instead.

  | In addition to this, :nth-child() can take a single IDENT token
  | representing ‘odd’ or ‘even’ instead.

(By representing, say, 'odd', I mean {O}{D}{D}. I'm not sure what the
right terminology is.)

  # The argument to :nth-child() must match the grammar below, where
  # INTEGER matches the token [0-9]+ and the rest of the tokenization
  # is given by the Lexical scanner in section 10.2:
  #
  # nth
  #   : S* [ ['-'|'+']? INTEGER? {N} [ S* ['-'|'+'] S* INTEGER ]? |
  #          ['-'|'+']? INTEGER | {O}{D}{D} | {E}{V}{E}{N} ] S*
  #   ;

  | (nothing)

(Fixing the formal grammar takes more effort, as Simon's "patch" shows,
so we should just remove it at this point. In general, I think the prose
here is OK. The point is to remove conflicting information that could
confuse people. In the same vein, I don't think the parsing rules in
CSS 2.1 are so hard to read that they can't be understood. The problem
is that the formal grammar contradicts the rules, and people get
confused as to which is correct. Therefore, I think we should remove the
formal grammar in CSS 2.1 too.)

  # When the value b is preceded by a negative sign, the "+" character
  # in the expression must be removed (it is effectively replaced by
  # the "-" character indicating the negative value of b).

  | If the b part is present, there must be one and only one sign ("+"
  | or "-") after the an part.

(The former prose forbids "+-" but not "++" and "--".)
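
To illustrate the difference, here is a minimal sketch in Python. It
works on the raw argument text rather than on css3-syntax tokens and
ignores whitespace, both of which are simplifications; the helper name
is hypothetical.

```python
import re

# Hypothetical helper, not part of any spec: captures every sign character
# found between the "an" part and the digits of b.
_AN_B = re.compile(r"^[+-]?\d*n(?P<signs>[+-]*)(?P<b>\d+)$", re.IGNORECASE)


def has_exactly_one_sign(arg: str) -> bool:
    """Return True if exactly one "+" or "-" separates the an part from b."""
    m = _AN_B.match(arg)
    return m is not None and len(m.group("signs")) == 1
```

Under the proposed wording "2n+1" and "2n-1" pass, while "2n+-1",
"2n++1" and "2n--1" are all rejected in the same way.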

  # Whitespace is permitted after the "(", before the ")", and on
  # either side of the "+" or "-" that separates the an and b parts
  # when both are present.

  | Whitespace and comments are permitted after the "(", before the
  | ")", and on either side of the "+" or "-" that separates the
  | an and b parts when both are present.

(This is a normative change. It assumes that the sign before a is part
of the DIMENSION token, and so this requires some new test cases, I assume.)
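
As a rough sketch of what the patched prose accepts, the following
Python function parses the argument text into an (a, b) pair. It is a
string-level approximation, not a token-based parser: it assumes that a
comment can be treated like a space, and the names parse_nth, _COMMENT
and _NTH are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: a comment behaves like whitespace for this purpose.
_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_WS = r"[ \t\r\n\f]*"
_NTH = re.compile(
    rf"^{_WS}(?:"
    rf"(?P<a>[+-]?\d*)n(?:{_WS}(?P<sign>[+-]){_WS}(?P<b>\d+))?"  # an, an+b
    rf"|(?P<int>[+-]?\d+)"                                       # b only
    rf"|(?P<kw>odd|even)"                                        # keywords
    rf"){_WS}$",
    re.IGNORECASE,
)


def parse_nth(arg: str) -> Optional[Tuple[int, int]]:
    """Return (a, b) for a valid argument, or None if it is invalid."""
    text = _COMMENT.sub(" ", arg)
    m = _NTH.match(text)
    if m is None:
        return None
    if m.group("kw"):
        return (2, 1) if m.group("kw").lower() == "odd" else (2, 0)
    if m.group("int") is not None:
        return (0, int(m.group("int")))
    a_text = m.group("a")
    # A bare "n", "+n" or "-n" means a is 1 or -1.
    a = int(a_text + "1") if a_text in ("", "+", "-") else int(a_text)
    b = 0
    if m.group("b") is not None:
        b = int(m.group("sign") + m.group("b"))
    return (a, b)
```

Whitespace between the sign and a (as in "- 2n") or between the number
and "n" (as in "2 n") is rejected here, which matches the assumption
that the sign before a belongs to the DIMENSION token.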

Feedback welcome.


Some feedback to Simon's input:

In general, I think you should target either selectors4 or css3-syntax
since I don't think we really want a new spec in the middle. Also, if
you are targeting selectors4 (which is likely a better target), the
descriptions of the syntax should be better integrated with the semantics.

  # An additional procedure should probably be added to Syntax3 for the
  # tree construction of a stand-alone selector, as found for example
  # in getElementsBySelector().

s/getElementsBySelector/querySelector/ ?

  # Issue 1:
  #     These definitions encode the constraint that a pseudo-element
  #     can only be last. Should they be more general, in case future
  #     levels want to relax the constraint?

Yes, selectors4 already allows pseudo-classes following a pseudo-element.

  # If at any point an invalid selector is encountered, the parser is
  # aborted and there is no output/result tree. It is up to the host
  # language to define what happens to invalid selectors.

I think you meant to say "host environment". The host language of
document.querySelector() will be JS, and I don't think it's up to the JS
spec to decide what will happen when an invalid selector is encountered.


In "nth-start mode",

  # ident token with the value 'n-'
  # ident token with the value '-n-'

, cases like "n-3" or "-n-0" are missing.
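
These arise because "-" and digits are valid name characters, so "n-3"
and "-n-0" each tokenize as a single ident. A minimal sketch of handling
such an ident value follows; the function name and the returned (a, b)
pair are hypothetical, not something from Simon's draft.

```python
import re
from typing import Optional, Tuple

# Matches ident values such as "n-3" or "-n-0": an optional leading "-",
# the letter n, a "-", then one or more digits.
_IDENT_N_DASH_DIGITS = re.compile(r"^(-?)n-(\d+)$", re.IGNORECASE)


def nth_from_ident(value: str) -> Optional[Tuple[int, int]]:
    """Return (a, b) if the ident value has the n-<digits> shape, else None."""
    m = _IDENT_N_DASH_DIGITS.match(value)
    if m is None:
        return None
    a = -1 if m.group(1) else 1
    return (a, -int(m.group(2)))
```

Idents such as "n-" and "-n-" do not match and would still need the
existing rules.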


In "nth-after-n mode",

  # number token with the integer flag
  #     Set b to the token’s value. Switch to the nth-end mode.

, if the number token does not have a sign, this is an error.
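
A sketch of the check I have in mind, assuming the number token keeps
its source text. The NumberToken class and its field names are
hypothetical, not the data model of the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumberToken:
    """Hypothetical token shape: source text plus the integer flag."""
    representation: str  # text as written in the source, e.g. "+3"
    is_integer: bool


def b_from_number_after_n(token: NumberToken) -> Optional[int]:
    """Return b for a signed integer token, or None to signal an error."""
    if not token.is_integer:
        return None
    if token.representation[:1] not in ("+", "-"):
        # Unsigned number right after the an part, as in "2n 3".
        return None
    return int(token.representation)
```

With this, "2n +3" gives b = 3, while "2n 3" is treated as invalid.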


Cheers,
Kenny
Attachments

text/plain attachment: remove-nth-formal-grammar.patch
Received on Monday, 11 June 2012 03:19:51 UTC