- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Fri, 31 May 2013 10:47:49 -0700
- To: Simon Sapin <simon.sapin@exyr.org>
- Cc: Zack Weinberg <zackw@panix.com>, www-style list <www-style@w3.org>
On Fri, May 31, 2013 at 2:58 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> On 31/05/2013 08:14, Tab Atkins Jr. wrote:
>> Okay, I've just pushed a large commit that rewrites the tokenizer to
>> be recursive-descent.
>
> Here are a few comments:
>
> §4.3.1. Consume a token
>
>   U+0023 NUMBER SIGN (#)
>     […] If the first three characters of the 〈hash〉’s value
>     would start an identifier, set the 〈hash〉’s type flag to "id".
>
> This sets the type flag based on the unescaped value of the token. It should
> be based on input characters instead.

Ah, sure. That was slightly more awkward to spec, so I did it the
current way, but you're right. Fixed.

>   U+0055 LATIN CAPITAL LETTER U (U)
>   U+0075 LATIN SMALL LETTER U (u)
>     […] Otherwise, if the next 3 input characters are an ASCII
>     case-insensitive match for "url(", consume them,
>     consume a url token, and return it.
>
> Opposite problem here. U, R or L can be escaped, so you need to check the
> result of "Consume a name" rather than input characters. (We resolved on
> that a few months ago.)

Ah, thanks. Fixed.

> Also, this tokenizer does not generate 〈function〉s at all. I think it should
> have a "Consume an identifier-like token" algorithm (with a better name?)
> that returns one of 〈ident〉, 〈function〉 or 〈url〉; and use that whenever the
> current "Consume a token" returns an 〈ident〉.

Urf, I knew there had to be something wrong with my ident handling.
Yeah, fixed.

> §4.3.3. Consume a string token
>
>   This algorithm must be called with an ending character,
>   which denotes the character that ends the string.
>
> This is not necessary. Every time "Consume a string token" is called, the
> ending character is also the current input character and thus does not need
> to be passed explicitly. By the way, this character needs to be consumed at
> the beginning of "Consume a string token", which otherwise terminates
> immediately.
>
> Alternatively, consume the opening quote at every call site. (I prefer the
> other solution.)

Every call site already consumes the starting character, so this is
fine as written.

> §4.3.13. Consume the remnants of a bad url
>
>   the input stream starts with a valid escape
>     Consume an escaped character.
>
> Perhaps add a note that this is useful for the \) sequence. (This is the
> only sequence that makes this different from "Anything else: Do nothing.")

Sure, done. (Good catch, by the way - when I was first rewriting this
section, I considered dropping it until I realized it was necessary
for that exact case.)

> Editorial: I’d move related sections together. For example "Consume a
> number" next to "Consume a numeric token", "Consume the remnants of a bad
> url" next to "Consume a url token", etc.

My strategy right now is to put the "consume an <x> token" sections
together, and then put the utility functions together. I don't like
interspersing them very much, because some are used by multiple
states, so it's unclear what they should be next to.

> Editorial, very minor: You sometimes use "would start {a number,an
> identifier}" and sometimes "starts with {a number,an identifier}" for the
> same thing. This confused me for a bit when using CTRL+F. Maybe pick one and
> use it everywhere?

Consistency is good. I know I created the two variant linktexts for a
reason, but I'll see if I can reword to a consistent pattern without
the prose being awkward.

~TJ
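
For readers following along, here is a minimal Python sketch of the "Consume an identifier-like token" idea discussed above. It is not the spec's text: the function names (consume_escape, consume_name, consume_ident_like), the (kind, name) tuple representation, and the simplified name-character test are all assumptions made for illustration. The point it shows is that the "url" check runs on the unescaped result of consuming a name, so an escaped U, R or L still matches. A real tokenizer would go on to consume a url token; this sketch only reports the kind.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def is_name_char(c):
    # Simplified (assumed) test: letters, digits, '-', '_', or non-ASCII.
    return c.isalnum() or c in "-_" or ord(c) >= 0x80


def consume_escape(text, pos):
    """Consume an escape; text[pos] is the character after the backslash.

    Returns (unescaped_char, new_pos).
    """
    end = pos
    # Up to six hex digits, optionally followed by one whitespace character.
    while end < len(text) and end - pos < 6 and text[end] in HEX_DIGITS:
        end += 1
    if end > pos:
        code = int(text[pos:end], 16)
        char = chr(code) if 0 < code <= 0x10FFFF else "\ufffd"
        if end < len(text) and text[end] in " \t\n":
            end += 1
        return char, end
    # Not hex: the escaped character stands for itself.
    return text[pos], pos + 1


def consume_name(text, pos):
    """Consume name characters and valid escapes; return (name, new_pos)."""
    chars = []
    while pos < len(text):
        c = text[pos]
        if is_name_char(c):
            chars.append(c)
            pos += 1
        elif c == "\\" and pos + 1 < len(text) and text[pos + 1] != "\n":
            char, pos = consume_escape(text, pos + 1)
            chars.append(char)
        else:
            break
    return "".join(chars), pos


def consume_ident_like(text, pos):
    """Return ((kind, name), new_pos), kind being ident, function or url."""
    name, pos = consume_name(text, pos)
    if pos < len(text) and text[pos] == "(":
        pos += 1  # consume the "("
        # Compare the *unescaped* name, so "u\72l(" counts as url.
        kind = "url" if name.lower() == "url" else "function"
        return (kind, name), pos
    return ("ident", name), pos
```

With these definitions, consume_ident_like(r"u\72l(foo)", 0) yields the kind "url" even though the raw input characters do not spell "url(", while "calc(1)" yields "function" and "red;" yields "ident".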
Received on Friday, 31 May 2013 17:48:35 UTC