- From: Paul Duffin <pduffin@volantis.com>
- Date: Tue, 25 Mar 2008 12:26:07 -0600 (MDT)
- To: fantasai <fantasai.lists@inkedblade.net>
- Cc: www-style@w3.org
fantasai wrote:
> Paul Duffin wrote:
>>
>> Not too much more complex than allowing dimensions but makes it much
>> easier to specify, implement, and author. IMNSHO having to type a
>> couple of extra characters is less onerous than having to remember
>> lots of different rules about where white space is necessary and where
>> it is not.
>
> This is totally inconsistent with existing CSS syntax. Requiring whitespace
> between tokens is less of a burden than defining a new syntax for lengths.
>

The new syntax would only be needed within expressions.

The issue with the grammar is not that it requires whitespace between
tokens but that the same semantic construct can have a number of
different possible tokenizations depending on the presence or absence of
whitespace. For example, within the nth-child() function:

    2n-1        <DIMENSION>
    2n -1       <DIMENSION> <NUMBER>
    2 n -1      <NUMBER> <IDENTIFIER> <NUMBER>

All of these are semantically the same and differ only in the use of
whitespace (whether they are actually allowed at the moment is another
matter). The nth-child() function is relatively simple, but if arbitrary
expressions are allowed then the problem will only get worse, e.g.:

    2 n - 1     <NUMBER> <IDENTIFIER> <OPERATOR> <NUMBER>
    2n - 1      <DIMENSION> <OPERATOR> <NUMBER>

I do not know of any modern language that has a tokenization strategy
with this sort of behaviour (Fortran does have something similar, but its
syntax is hardly modern). In fact I think that is one reason why most
modern languages forbid identifiers from starting with a digit, which is
exactly what a <DIMENSION> token does.

The purpose of tokenization is to simplify the input in preparation for
syntax analysis. As it stands, the tokenization does the opposite: it
increases the number of possible token combinations (even now I am not
sure that I have enumerated them all), making the grammar much more
complex. In fact it is quite possible that in some cases the above
tokenization would result in ambiguous grammars.
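To make the whitespace-dependence concrete, here is a minimal Python sketch of a tokenizer. It uses simplified, hypothetical token rules modeled loosely on the CSS 2.1 core grammar; they are not the normative CSS or Selectors definitions. The assumptions: a DIMENSION is a number immediately followed by an identifier whose later characters may include digits and "-"; a NUMBER may carry a leading sign when a digit follows directly (chosen to match the listing above); a lone "+" or "-" is an OPERATOR. The names `TOKEN_SPEC`, `MASTER`, `tokenize`, and the token labels are illustrative only.

```python
import re

# Simplified, hypothetical token rules (not the normative CSS grammar).
# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    # A number glued to an identifier; identifier characters may include digits and "-".
    ("DIMENSION", r"[+-]?\d+(?:\.\d+)?[A-Za-z_][A-Za-z0-9_-]*"),
    # Assumption: a sign attaches to a number only when a digit follows immediately.
    ("NUMBER", r"[+-]?\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"-?[A-Za-z_][A-Za-z0-9_-]*"),
    ("OPERATOR", r"[-+*/]"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for source in ["2n-1", "2n -1", "2 n -1", "2 n - 1", "2n - 1"]:
        kinds = [kind for kind, _ in tokenize(source)]
        print(f"{source!r}: {kinds}")
```

Under these rules the loop reproduces the five tokenizations listed in the message: "2n-1" comes out as a single DIMENSION (its "-1" is swallowed as identifier characters), while inserting whitespace yields two, three, or four tokens for the same expression.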
>> I am concerned that unless the syntax is clearly defined in a
>> recognized format, e.g. BNF, then there will be all sorts of
>> ambiguities that will be resolved by each implementation in different
>> ways.
>
> CSS syntax is usually defined in both prose and grammar productions.
> The ambiguities usually arise from the grammar not being precise
> enough to reflect constraints from the prose.
>

Syntax should be defined first and foremost using standard grammar /
tokenizer mechanisms that can be automatically checked for ambiguity.
Prose should only be used to add constraints in exceptional
circumstances. The more you rely on prose, the more ambiguities (and
hence arguments) there will be about how it is supposed to behave, with
a corresponding detrimental impact on implementations.

My reference to compatibility with XPath was simply to raise the point
that XPath has already defined an expression language that can deal with
identifiers containing "-"s, and CSS should learn from that.

I agree that CSS must be easy to author, but that is about more than just
the number of characters authors have to type. Authors must also be able
to understand how they are supposed to write it, and have an expectation
that it will work across all browsers. These are just as important as
the former and are adversely impacted by a complex and ambiguous grammar.
Received on Tuesday, 25 March 2008 18:26:45 UTC