Re: [css3-syntax] Digest of remaining issues from recent discussions

On Tue, Feb 19, 2013 at 10:03 PM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> http://lists.w3.org/Archives/Public/www-style/2013Feb/0278.html
>
> 1. Perhaps for team-legal rather than this WG? When a spec contains a detailed
> algorithm in English, implementing it may look like "translating" it to a
> computer language, similar to translating the spec to another human
> language.
>
> Clarify that implementing is not a "derivative work" forbidden by the W3C
> Document License?
>
>
> http://lists.w3.org/Archives/Public/www-style/2013Feb/0402.html

As noted by others, this is perhaps a Team legal issue, but shouldn't
be an issue for Syntax until/unless they say something about it.

> 2. Possible security issue: Taking the stylesheet’s character encoding from
> the referring document should be same-origin only.

Haven't looked into this yet.  Added an issue.

> 3. Editorial: get rid of §3.2.1. "Preprocessing the input stream" by doing
> the same work in the tokenizer?

I haven't tried to do this yet.  Is it necessary?  You could consider
it part of tokenization, in the same way that "consume a component
value" is part of parsing.

> 4. Editorial: The tokenizer would be nicer (and could be less redundant)
> with a style closer to that of the parser: a bunch of "functions" that call
> each other rather than a state machine. (Not quite "recursive descent"
> though, there is no recursion.)

With the help of some identification/validation functions, I've
eliminated a *ton* of redundancy.  Let me know if you find any
remaining stuff that would benefit from being abstracted out.

> 5. Editorial: use more look-ahead to avoid "reconsuming"?

Most of the reconsuming is just for convenience.  If you spot places
where you think I could use lookahead rather than reconsuming (and
which wouldn't violate my "three characters of lookahead, one token of
lookahead" rule), let me know.
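
To make the trade-off concrete, here is a rough Python sketch (not spec
text) of the same check written both ways.  The CharStream class and
its method names are made up for illustration; the spec's own
algorithms are phrased differently.

```python
class CharStream:
    """Hypothetical input stream; names are illustrative, not from the spec."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def consume(self):
        """Return the next character and advance, or None at EOF."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def reconsume(self):
        """Push the most recently consumed character back onto the stream."""
        self.pos -= 1

    def peek(self, n=1):
        """Return up to n upcoming characters without consuming them."""
        return self.text[self.pos:self.pos + n]


def starts_number_reconsume(stream):
    """Consume one character, test it, then put it back."""
    ch = stream.consume()
    if ch is None:
        return False  # nothing was consumed at EOF, so nothing to put back
    stream.reconsume()
    return ch.isdigit()


def starts_number_lookahead(stream):
    """Same check with one character of lookahead; nothing is consumed."""
    return stream.peek(1).isdigit()
```

Both functions leave the stream position unchanged; the lookahead
version simply never moves it in the first place.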

> 6. *-match tokens: maybe add new tokens now for !#%+./?@ (each followed by an = sign)
> sign) in addition to the current ~|^$* so that future additions to Selectors
> don’t need to add new tokens. Maybe have a single "match" token with a
> character value (like delim) rather than many tokens.

Could, or we could just wait and add to the parser later.  Dunno what's best.
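
For reference, the "single match token with a character value" idea
could look roughly like the Python sketch below.  The Token tuple, the
"match" type name, and consume_match_or_delim are all hypothetical and
not what the draft currently specifies.

```python
from collections import namedtuple

# Hypothetical token shape: a type name plus a single-character value.
Token = namedtuple("Token", ["type", "value"])

# The prefixes that currently have *-match tokens, per item 6.
MATCH_PREFIXES = set("~|^$*")


def consume_match_or_delim(text, pos):
    """Return (token, new_pos) for the character at text[pos].

    A prefix character directly followed by "=" becomes one generic
    "match" token carrying the prefix; anything else is a delim token.
    """
    ch = text[pos]
    if ch in MATCH_PREFIXES and text[pos + 1:pos + 2] == "=":
        return Token("match", ch), pos + 2
    return Token("delim", ch), pos + 1
```

Under this sketch, supporting a future operator such as "!=" would mean
adding one character to MATCH_PREFIXES rather than defining a new token
type.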

> 7. *If* SVG2 wants some of its attributes values to have CSS syntax *but*
> not allow CSS comments, add a "no comment" flag to the tokenizer. Tab and I
> would rather just allow comments, though, if that’s not a web-compat issue.

Up to the SVGWG, but I don't think they need this control.

> 8a. Should EOF in quoted strings or urls not be an error at all, to be
> consistent with the rest of the "unexpected EOF" rules?

I'd be fine with this, but I'd need to check compat again, as I've
forgotten the original details.

> 8b. (Special case of the above) §4.2 of CSS 2.1 has an example where EOF in
> a string is acceptable, in contradiction with its own Core Grammar in §4.4.1
> where it’s a bad-string token.

Yay!

> 9. There is concern with bad-string and bad-url being "preserved". (Should
> always be errors caught as early as possible?) But I don’t see how to do
> this while enabling Media Queries’s fine-grained error handling.

Right, they need to stick around for various reasons.

> 10. Editorial: §4.4.12 has some redundant checks, since this mode is only
> ever entered in specific cases.

Removed as many redundant checks as I could find.  Let me know if
there are any left.

> 11. Apparently SVG requires scientific notation not only for numbers (which
> we now have in CSS) but also for percentages and dimensions.

Fixed.
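
As an illustration of what this means for tokenization, here is a rough
Python sketch that classifies a single numeric token, allowing an
exponent before a "%" or a unit.  The regular expression is a
simplification made up for this example (ASCII-only units, no escapes)
and is not the spec's algorithm.

```python
import re

# Simplified numeric token: optional sign, digits with optional fraction,
# optional exponent, then an optional "%" or unit identifier.
NUMERIC = re.compile(
    r"(?P<num>[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?)"
    r"(?P<unit>%|[A-Za-z_][A-Za-z0-9_-]*)?"
)


def classify(text):
    """Return (type, value, unit) for a numeric token, or None if no match."""
    m = NUMERIC.fullmatch(text)
    if m is None:
        return None
    value = float(m.group("num"))
    unit = m.group("unit")
    if unit is None:
        return ("number", value, None)
    if unit == "%":
        return ("percentage", value, None)
    return ("dimension", value, unit)
```

Note that "2em" still comes out as a dimension with unit "em": the
exponent only matches when "e" is followed by digits (optionally
signed), so in this sketch "1e3px" is 1000 with unit "px".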

> 12. Some concern about changes in bad-url tokenization. Did non-WebKit
> implementers discuss it? (No opinion from me.)

Not really.  This needs to be discussed with the WG to make sure it's fine.

> 13. Proposal: make at-rule syntax completely generic: get rid of the
> "recognized at-rule", "declaration-filled" and "rule-filled" concepts. Parse
> ';' or a generic {} block for at rules. Definitions of specific at-rules can
> call back into Syntax with one entry point or another to parse the contents
> of a {} block.

Done.
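
Roughly, the generic shape is "collect a prelude until ';' or a {}
block, and leave the block contents for the specific at-rule to
interpret".  The Python sketch below shows that shape over a list of
string tokens; the function names and the dict layout are made up for
illustration, and it ignores () and [] nesting in the prelude.

```python
def consume_block(tokens, pos):
    """tokens[pos] is "{". Return (contents, new_pos).

    Nested braces are kept as plain tokens inside contents; only the
    matching outer "}" ends the block. EOF also ends it.
    """
    depth = 0
    contents = []
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == "{":
            depth += 1
            if depth == 1:
                continue  # skip the opening brace of the outer block
        elif tok == "}":
            depth -= 1
            if depth == 0:
                return contents, pos
        contents.append(tok)
    return contents, pos


def consume_at_rule(tokens, pos):
    """tokens[pos] is the at-keyword. Return (rule, new_pos).

    The rule ends at ";" (no block), after a {} block, or at EOF.
    """
    rule = {"name": tokens[pos], "prelude": [], "block": None}
    pos += 1
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == ";":
            return rule, pos + 1
        if tok == "{":
            rule["block"], pos = consume_block(tokens, pos)
            return rule, pos
        rule["prelude"].append(tok)
        pos += 1
    return rule, pos
```

A definition of a specific at-rule would then hand rule["block"] back
to whichever Syntax entry point suits it (a list of declarations or a
list of rules).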

> 14. Editorial: Non-normative prose describing error recovery would be nice.
> (Like the diagrams describe valid syntax.)

Sounds fine.  Will do.

> 15. Quirks mode and transform function whitespace do not belong in the
> generic Syntax module, but in the grammar of the relevant
> attributes/properties.

Quirks mode has been removed.  Transform-function-whitespace needs to
be handled at Syntax level to be sane; it really is a tokenizer flag,
because it's invoked only when parsing the transform attribute in SVG.

> 16. Maybe an+b belongs in Selectors rather than Syntax?

I think it's appropriate to leave in Syntax, but my current approach
is bad.  Instead, I need to describe it in terms of tokens (so it can
interact with other tokens in grammars), then do a "reserialize and
reparse with simpler rules" thing.

> 17. Hash tokens need a new "is a valid ident" for ID selectors. The edit is
> not trivial: if 4. or 5. are to happen, might be better to do this
> afterwards or at the same time.

Done.  (Though we're trying to drop that now, which would make me happy.)

~TJ

Received on Thursday, 9 May 2013 00:14:19 UTC