- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Fri, 31 May 2013 10:47:49 -0700
- To: Simon Sapin <simon.sapin@exyr.org>
- Cc: Zack Weinberg <zackw@panix.com>, www-style list <www-style@w3.org>
On Fri, May 31, 2013 at 2:58 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> Le 31/05/2013 08:14, Tab Atkins Jr. a écrit :
>> Okay, I've just pushed a large commit that rewrites the tokenizer to
>> be recursive-descent.
>
>
> Here are a few comments:
>
> §4.3.1. Consume a token
>
> U+0023 NUMBER SIGN (#)
> […] If the first three characters of the 〈hash〉’s value
> would start an identifier, set the 〈hash〉’s type flag to "id".
>
> This sets the type flag based on the unescaped value of the token. It should
> be based on input characters instead.
Ah, sure. That was slightly more awkward to spec, so I did it the
current way, but you're right. Fixed.
> U+0055 LATIN CAPITAL LETTER U (U)
> U+0075 LATIN SMALL LETTER U (u)
> […] Otherwise, if the next 3 input characters are an ASCII
> case-insensitive match for "url(", consume them,
> consume a url token, and return it.
>
> Opposite problem here. U, R or L can be escaped, so you need to check the
> result of "Consume a name" rather than input characters. (We resolved on
> that a few months ago.)
Ah, thanks. Fixed.
> Also, this tokenizer does not generate 〈function〉s at all. I think it should
> have a "Consume an identifier-like token" algorithm (with a better name?)
> that returns one of 〈ident〉, 〈funtion〉 or 〈url〉; and use that whenever the
> current "Consume a token" returns an 〈ident〉.
Urf, I knew there had to be something wrong with my ident handling.
Yeah, fixed.
> §4.3.3. Consume a string token
>
> This algorithm must be called with an ending character,
> which denotes the character that ends the string.
>
> This is not necessary. Every time "Consume a string token" is called, the
> ending character is also the current input character and thus does not need
> to be passed explicitly. By the way, this character needs to be consumed at
> the beginning of "Consume a string token", which otherwise terminates
> immediately.
>
> Alternatively, consume the opening quote on ever call site. (I prefer the
> other solution.)
Every call site already consumes the starting character, so this is
fine as written.
> §4.3.13. Consume the remnants of a bad url
>
> the input stream starts with a valid escape
> Consume an escaped character.
>
> Perhaps add a note that this is useful for the \) sequence. (This is the
> only sequence that makes this different from "Anything else: Do nothing.")
Sure, done. (Good catch, by the way - when I was first rewriting this
section, I considered dropping it until I realized it was necessary
for that exact case.)
> Editorial: I’d move related sections together. For example "Consume a
> number" next to "Consume a numeric token", "Consume the remnants of a bad
> url" next to "Consume a url token", etc.
My strategy right now is to put the "consume an <x> token" sections
together, and then put the utility functions together. I don't like
interspersing them very much, because some are used by multiple
states, so it's unclear what they should be next to.
> Editorial, very minor: You use sometimes "would start {a number,an
> identifier}" and sometimes "starts with {a number,an identifier}" for the
> same thing. This confused me for a bit when using CTRL+F. Maybe pick one and
> use it everywhere? Consistency is good.
I know I created the two variant linktexts for a reason, but I'll see
if I can reword to a consistent pattern without the prose being
awkward.
~TJ
Received on Friday, 31 May 2013 17:48:35 UTC