- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Mon, 17 Nov 2014 13:24:06 -0800
- To: "L. David Baron" <dbaron@dbaron.org>
- Cc: www-style list <www-style@w3.org>
On Mon, Nov 17, 2014 at 11:31 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
> On Thu, Nov 13, 2014 at 5:27 PM, L. David Baron <dbaron@dbaron.org> wrote:
>> I'd prefer to leave which values are valid syntax and which aren't
>> the way they are; I don't see the point in introducing compatibility
>> risk without a good reason. Unless, that is, implementation
>> behavior doesn't actually match the current spec.
>
> While I don't *generally* agree that this is necessary, looking over
> the complications of handling the syntax properly when I take scinot
> into account, I'm going to switch to an approach that makes "match the
> current syntax" easy to do.
>
> (I'm just going to claim all the token combinations that show up,
> regardless of what's in them, then concatenate and re-parse their
> representations. This makes it much easier to the correct number of
> characters in each form. This makes <urange> a bit wider in
> syntax-space than I'd like, but it's not a big deal, and, like <anb>,
> you just have to be careful when using <urange> in new syntaxes in the
> future.)

And done. Review appreciated; I ended up taking the old
<unicode-range-token> spec text and just generalizing it to be
error-detecting. A valid <urange> now matches exactly the syntax of
the old <unicode-range-token>.

Some methodology information: to account for all possible token
combinations, I took the following primitive strings:

2 a e
2a 2e a2 e2
2a2 2e2 a2a e2e a2e e2a
2a2a 2a2e 2e2a 2e2e a2a2 a2e2 e2a2 e2e2
2a2a2 2a2e2 2e2a2 2e2e2
a2a2a a2a2e a2e2a a2e2e e2a2a e2a2e e2e2a e2e2e

These should have captured every possibility regarding
number/ident/dimension parsing, including any scinot issues in
numbers/dimensions. I then generated strings by running "u+{0}" on
all of them, and again with "u+{0}-{1}" (ranging over the
cross-product of the list with itself).
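For anyone who wants to reproduce the generation step, here is a minimal sketch of it. The choice of Python is an assumption (the "{0}" placeholders look like str.format, but nothing above says which tool was actually used), and the names PRIMITIVES and build_test_strings are made up for illustration. It covers only string generation, not the tokenizer run.

```python
from itertools import product

# The 33 primitive strings listed above, copied verbatim.
PRIMITIVES = """
2 a e
2a 2e a2 e2
2a2 2e2 a2a e2e a2e e2a
2a2a 2a2e 2e2a 2e2e a2a2 a2e2 e2a2 e2e2
2a2a2 2a2e2 2e2a2 2e2e2
a2a2a a2a2e a2e2a a2e2e e2a2a e2a2e e2e2a e2e2e
""".split()


def build_test_strings(primitives):
    """Hypothetical helper: return (single, ranged) candidate strings.

    single fills the "u+{0}" template with each primitive;
    ranged fills "u+{0}-{1}" with every ordered pair of primitives,
    i.e. the cross-product of the list with itself.
    """
    single = ["u+{0}".format(p) for p in primitives]
    ranged = [
        "u+{0}-{1}".format(first, second)
        for first, second in product(primitives, repeat=2)
    ]
    return single, ranged


single, ranged = build_test_strings(PRIMITIVES)
```

With the list above this gives 33 single-value strings and 33 * 33 = 1089 range strings, which can then be fed to a tokenizer one at a time.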
I ran all of these through the tokenizer at
<https://github.com/tabatkins/parse-css>, which matches the spec,
found all the unique token combinations so produced, and made a
grammar from that. Those patterns which were produced by the first
set (without the - character) got an optional "?" tacked onto their
end in the grammar, and I added an extra clause just for the u+????
form.

I believe this is an exhaustive cover of the syntax possibilities.

~TJ
Received on Monday, 17 November 2014 21:24:53 UTC