Re: [css3-syntax] Critique of Feb 15 draft from Tab Atkins Jr. on 2013-02-24 (www-style@w3.org from February 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Sun, 24 Feb 2013 10:51:06 -0800
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: Zack Weinberg <zackw@panix.com>, www-style list <www-style@w3.org>
Message-ID: <CAAWBYDAbdVAFUoG1HYhUx9jp2zRe8WbQzSWS_b7QU1c1+PHg7Q@mail.gmail.com>

On Sun, Feb 24, 2013 at 12:14 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> Le 24/02/2013 05:32, Tab Atkins Jr. a écrit :
>> In a string, a normal newline is invalid, but you can escape a newline
>> to include it in the string.  I think it makes the spec a little
>> simpler to collapse \r\n into a single newline char, so I don't have
>> to push those details into the string states.  That's all.  I could
>> simplify the preprocessing to only convert \r\n specifically into a
>> \n, as it's not actually necessary to do anything with a lone \r.
>> Thoughts?
>
> Apparently the starting point was: "Why is \r converted but not \f?" How
> about doing one of these, for consistency?
>
> * Not converting a lone \r. Newline pre-processing only affects \r\n.
> * Or, also convert \f to \n.
>
> (Of course, implementations are free not to have a separate pre-processing
> pass and do the same work inline in the tokenizer.)

I'm fine with doing either, with a preference toward the first bullet
if I change anything.

Zack, opinions?
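
To make the two options above concrete, here is a minimal sketch of each as a standalone preprocessing pass. The function names are made up for illustration and are not from the draft. As Simon notes, an implementation could just as well do the same work inline in the tokenizer.

```python
def preprocess_crlf_only(text: str) -> str:
    """First bullet: only a CR LF pair is collapsed into a single LF.

    A lone CR or FF passes through untouched and is left for the
    tokenizer to deal with.
    """
    return text.replace("\r\n", "\n")


def preprocess_all_newlines(text: str) -> str:
    """Second bullet: CR LF, lone CR, and FF all become LF.

    The CR LF pair is replaced first so that it yields one LF, not two.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")
```

For the input "a\r\nb\rc\fd", the first function leaves the lone \r and the \f in place, while the second turns every line break into \n.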


>>> §4: The algorithm in this section might benefit from conversion to a
>>> recursive-descent presentation as was already done for the parser (§5).
>>
>> I'm not certain.  Switching to a recursive-descent parser was great for the Parsing
>> section because those types of parsers are good for nested structures,
>> which is most of CSS.  Tokens only *barely* have a concept of nesting,
>> in things like the fact that both types of strings have nearly the
>> same internals, or url tokens contain strings.
>>
>> It would save some duplication, but I could probably also do that just
>> by defining some more functions.
>
> It’s clearly not recursive-descent since there is no recursion. Still, I
> think the "functions calling each other" style can be more readable than a
> state machine.
>
> We don’t have to switch everything at once. Can we start with a "Consume a
> quoted string" function used for both string tokens and url tokens? Later
> we’ll discuss a "Consume an identifier" function. (Although hash tokens
> are a bit tricky.) Then some other part of the tokenizer. And before you
> know it, the state machine will not be needed anymore ;)

Yeah, I've come around to this.  I'm starting with the recognition
functions, but I'll add some consuming functions as I go along.
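
As a rough illustration of the "functions calling each other" style, here is a sketch of a shared "consume a quoted string" helper with two callers. It is not the draft's algorithm. All names, the (kind, value) token shape, and the return conventions are assumptions. Escapes are simplified: a backslash followed by a newline puts that newline into the string (following the wording earlier in this thread), and a backslash followed by anything else yields that character literally, so hex escapes are not handled. The input is assumed to have had \r\n collapsed already.

```python
NEWLINES = "\n\r\f"
WHITESPACE = " \t" + NEWLINES


def consume_quoted_string(text, pos):
    """Consume a quoted string; text[pos] must be the opening ' or ".

    Returns (value, new_pos). value is None if an unescaped newline is
    hit (a "bad string"); new_pos then points at that newline.
    """
    quote = text[pos]
    pos += 1
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == quote:
            return "".join(chars), pos + 1
        if ch in NEWLINES:
            return None, pos  # unescaped newline is invalid in a string
        if ch == "\\":
            if pos + 1 < len(text):
                chars.append(text[pos + 1])  # escaped char, newline included
                pos += 2
            else:
                pos += 1  # lone backslash at end of input: dropped (assumption)
            continue
        chars.append(ch)
        pos += 1
    # End of input before the closing quote: return what we have (assumption).
    return "".join(chars), pos


def consume_string_token(text, pos):
    """String token: a thin wrapper around the shared helper."""
    value, pos = consume_quoted_string(text, pos)
    return ("string" if value is not None else "bad-string", value), pos


def consume_url_token(text, pos):
    """Quoted url token; pos points just past 'url('. Unquoted urls omitted."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    if pos >= len(text) or text[pos] not in "\"'":
        raise NotImplementedError("unquoted urls are outside this sketch")
    value, pos = consume_quoted_string(text, pos)  # same helper, no duplication
    if value is None:
        return ("bad-url", None), pos
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    if pos < len(text) and text[pos] == ")":
        return ("url", value), pos + 1
    return ("bad-url", None), pos
```

Both callers share the string internals, which is the duplication the state-machine presentation has to spell out twice.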

~TJ

Received on Sunday, 24 February 2013 18:51:52 UTC