- From: Simon Sapin <simon.sapin@exyr.org>
- Date: Mon, 02 Sep 2013 09:52:09 +0100
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: www-style <www-style@w3.org>, John Daggett <jdaggett@mozilla.com>
On 02/09/2013 08:57, Tab Atkins Jr. wrote:
>>> I doubt we'll *ever* give "U+1?5-300" a valid meaning, because it's
>>> a nonsensical range. As I argued previously at the face-to-face,
>>> the only reason that these silly kinds of ranges were *ever* valid
>>> is because someone valued terseness over accuracy when writing the
>>> regex - it's trivial to make a slightly longer regex that only
>>> matches the ranges with sensical syntax.
>>
>> Actually, the Fonts spec now rejects many more corner cases than it
>> used to (eg. decreasing range.) It’s much easier IMO to say "drop
>> the declaration" than to try to encode all of these constraints in
>> the tokenizer.
>>
>> Yes, we could make the token definition more restrictive (and less
>> silly) but I think that the added complexity does not buy us
>> anything.
>
> Yeah, as I said (though perhaps not clearly enough), I'm fine with
> removing the additional checks that Syntax did to verify that the
> token "made sense". I'm okay with pushing at least that much to the
> individual specs that use the token. (Not happy, but okay with it.)
>
> What I'm against is forcing every use of the token to define how to
> *parse* it, and reject nonsensical tokens like "U+1?5-300". That
> particular sequence of characters will *never* be a valid
> unicode-range, no matter what we do, or what type of error-recovery
> a particular property ends up wanting to define.
>
> In other words:
>
> * "U+9-1" is okay - let's keep that valid at the Syntax level, and
> let Fonts deal with it as it wishes.
> * "U+1?5" is not okay - let's reject that early, because we know for
> certain that it's wrong.
> * "U+???" should be transformed into "U+000-999" at the Syntax
> level, because that's the way it'll *always* be interpreted, and we
> shouldn't force every usage of the token to re-define how to parse a
> token.
> We should just ensure that every unicode-range is turned into a
> start value and an optional end value, with both values being
> positive integers.

So, trying to interpret this, you’re proposing to keep "Consume a
unicode-range token" as it was, but skip the "Set the unicode-range’s
range" step. The token would have a start and an optional end that are
both integers. (Or the end could be non-optional, and set to the start
if not provided in the source.) Is this correct?

In the ED just before my edits:
https://dvcs.w3.org/hg/csswg/raw-file/aa1b58939f73/css-syntax/Overview.html#consume-a-unicode-range-token
https://dvcs.w3.org/hg/csswg/raw-file/aa1b58939f73/css-syntax/Overview.html#set-the-unicode-ranges-range

If the token’s model is two integers, I think the Fonts spec should be
changed to define its <urange> in terms of these integers. The current
definition is based on text, so it’s more consistent with a token
containing code points. John, what do you think?

>>> By making Syntax "agnostic" about this, we end up requiring every
>>> usage of the token to repeat the exact same parsing/validation
>>> logic every time. This is silly, when we can just bake that in
>>> once at the Syntax level,
>>
>> No need to repeat. If we ever need ranges of code points again, the
>> new feature can refer to the parsing defined in the Fonts spec. If
>> appropriate, we could then move it to the Values & Units spec.
>
> Unless we think there's the faintest possibility of "U+1?5" ever
> being considered valid, we should go ahead and do the parsing in the
> *parsing* spec. ^_^

I don’t think everything parsing-related *has* to be in the Syntax
spec. We already have lots of parsing definitions in other specs for
individual properties, Selectors, etc. In this case I still believe it
doesn’t buy us anything, but I’m not against doing a bit more than
CSS 2.1 in Syntax. See above.

>>> unless we really do think accidental usages of that character
>>> pattern are something to worry about.
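As an aside, the token model discussed above (a start integer and an optional end integer, with wildcards resolved early and "U+1?5"-style tokens rejected early) can be sketched roughly like this. This is hypothetical illustration code, not the spec's tokenizer; the function name and the choice to reject wildcard-plus-end combinations are mine, and '?' is expanded to 0 in the start and F in the end as in the "Set the unicode-range's range" step linked above:

```python
import re

def consume_unicode_range(text):
    """Sketch: parse a <unicode-range> token into (start, end) integers.

    * Wildcards may only trail the hex digits, so "U+1?5" is rejected
      early - it can never be valid.
    * "?" expands to 0 in the start value and F in the end value.
    * Semantic checks (e.g. decreasing ranges such as "U+9-1") are
      deliberately left to consumers like the Fonts spec.
    """
    m = re.fullmatch(
        r'[uU]\+([0-9a-fA-F?]{1,6})(?:-([0-9a-fA-F]{1,6}))?', text)
    # Reject tokens where a hex digit follows a "?", e.g. "U+1?5".
    if m is None or not re.fullmatch(r'[0-9a-fA-F]*\?*', m.group(1)):
        raise ValueError("not a <unicode-range> token: %r" % text)
    first, second = m.group(1), m.group(2)
    if '?' in first:
        if second is not None:
            # "U+1?-30" has no sensible reading; reject early too.
            raise ValueError("wildcards cannot be combined with an end")
        return int(first.replace('?', '0'), 16), int(first.replace('?', 'F'), 16)
    start = int(first, 16)
    return start, start if second is None else int(second, 16)
```

Under this sketch, `consume_unicode_range("U+4??")` yields `(0x400, 0x4FF)`, while `"U+9-1"` yields `(9, 1)` and is left for Fonts to drop.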
>>
>> I’m not worried about that. All definitions of <unicode-range> we
>> ever had require it to start with [uU]+[0-9a-fA-F?], which is
>> pretty characteristic.
>
> Still, though, that character pattern could show up in a base64
> value put directly in a custom property - if it was preceded by a
> delim character, it'll parse correctly. ^_^

That’s a separate but interesting question. What can go wrong if
authors expect random text to round-trip through Custom Properties
parsing and serialization? (Not just with <unicode-range>.)

-- 
Simon Sapin
Received on Monday, 2 September 2013 08:52:32 UTC