- From: Tab Atkins Jr. via GitHub <sysbot+gh@w3.org>
- Date: Fri, 01 Feb 2019 23:45:23 +0000
- To: public-css-archive@w3.org
tabatkins has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-syntax] Wrapping up the <unicode-range> thing ==
(migrated from the mailing list)

**Tab Atkins said:**

> So, unicode ranges aren't settled right now, and I'd like to wrap them up.
>
> Quick history lesson:
>
> 1. Unicode ranges were originally defined as a CSS token. They have
> to be specially handled, because they don't look like any other token.
>
> 2. FF got some bug reports about the selector `u+a {...}` failing -
> the reason is because it parses as a unicode-range token, which is
> invalid for selectors.
>
> 3. I proposed we eliminate unicode-range as a token, and break it down
> into a complicated reimagining based on existing tokens, like I did
> for An+B.
>
> The major problem with this is that some hex numbers look like
> exponented numbers. For example, "U+04e4" is supposed to be Ӥ, but it
> parses as:
>
> ident(U) delim(+) number(40000)
>
> Obviously, 0x4e4 and 40000 are very different numbers! (U+40000 is
> actually invalid!) I currently solve this by keeping around the
> "representation" of the number token, which is the actual characters
> it was written with, but no impl does that, or is willing to keep
> around a string for every number and dimension they parse.
>
> So I think there are two ways we can handle this:
>
> 1. Abandon the project, restore <unicode-range-token>, and live with
> the fact that we have a weird almost-useless token that will
> occasionally cause problems for authors in unrelated contexts. (We
> can't even really do something like make Selectors treat unicode-range
> specially, because it can cut selectors in pieces - "u+area" parses as
> a urange(a) ident(rea)!)
>
> 2. Produce a new, reliable syntax for unicode ranges, and keep around
> the old version for back-compat, with a warning that some values won't
> parse correctly. The most obvious fix is to just replace the + with a
> -, like "U-0404", "U-400-600", or "U-4??". This makes the entire
> thing an ident, which keeps around the characters properly (or an
> ident followed by some ? delims, which is also fine).
>
> Thoughts?

------------

**Simon Sapin said:**

> On 22/06/15 17:26, Tab Atkins Jr. wrote:
> > So I think there are two ways we can handle this:
> >
> > 1. Abandon the project, restore <unicode-range-token>, and live with
> > the fact that we have a weird almost-useless token that will
> > occasionally cause problems for authors in unrelated contexts. (We
> > can't even really do something like make Selectors treat unicode-range
> > specially, because it can cut selectors in pieces - "u+area" parses as
> > a urange(a) ident(rea)!)
>
> Not sure if this is a good idea, but we *could* handle that in the
> Selectors grammar as well. u+a/**/rea would also parse, which we might
> not want, but it’s much harder for authors to accidentally do that than u+a.
>
> > 2. Produce a new, reliable syntax for unicode ranges, and keep around
> > the old version for back-compat, with a warning that some values won't
> > parse correctly. The most obvious fix is to just replace the + with a
> > -, like "U-0404", "U-400-600", or "U-4??". This makes the entire
> > thing an ident, which keeps around the characters properly (or an
> > ident followed by some ? delims, which is also fine).
>
> `unicode-range: U+04e4` works today in multiple browsers. Breaking this
> seems worse than the u+a selector not working. (Introducing an
> alternative unicode-range syntax will not help existing unmaintained
> content.)

---------------

**fantasai said:**

> I agree with Simon. We should not break unicode-range syntax here.
>
> If it's possible to fix this by munging the Selectors grammar,
> that seems like the best option. I'd argue that we may want to
> allow implementations to use context-specific parsing rules as
> well, if they want to go that route instead, so the UA would be
> allowed to either accept or reject u+a/**/rea. (A full CSS parser
> might not want to do that, but a Selectors parser shouldn't have
> to deal with unicode-range token munging. Ditto An+B, now I think
> about it.)

-------------

**Simon Sapin said:**

> Allowing a different behavior without mandating it reduces interop, and
> this doesn’t seem to be a good enough reason to do it.

-------------

**fantasai said:**

> The cases where there wouldn't be interop are just weird edge cases
> like u+a/**/rea, right? I don't think interop on that case is worth
> imposing the complexity of a CSS-token-munging parsing model on all
> non-CSS implementations of Selectors.

-------------

**Tab Atkins said:**

> On Fri, Jun 26, 2015 at 3:37 PM, Simon Sapin <simon.sapin@exyr.org> wrote:
> > `unicode-range: U+04e4` works today in multiple browsers. Breaking this
> > seems worse than the u+a selector not working. (Introducing an alternative
> > unicode-range syntax will not help existing unmaintained content.)
>
> There's a difference between "it works" and "it's used". I'm going to
> run some searches over our corpus and see if I can find any actual
> uses of unicode-ranges that look like scinot numbers.

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3591 using your GitHub account
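The two parsing hazards raised in this thread can be reproduced with a small Python sketch. Everything in it is hypothetical and heavily simplified. `NumberToken`, `consume_number`, `codepoint_from_representation` and `split_legacy_urange` are illustrative names, not spec algorithms or browser APIs.

The sketch does three things:

- It shows that the digits of `U+04e4` read as a scientific-notation number worth 40000. This holds whether the `+` ends up as a delimiter or as the number's sign; here it is assumed to be consumed as the sign.
- It shows how keeping the token's original characters (its "representation") lets the intended hex value 0x4E4 be recovered.
- It shows how a greedy legacy unicode-range scan splits `u+area` into `a` and `rea`.

```python
import re
from dataclasses import dataclass


@dataclass
class NumberToken:
    """Hypothetical, simplified number token (illustration only)."""
    value: float          # numeric value as a number tokenizer computes it
    representation: str   # the exact characters the number was written with


# Simplified CSS-style number pattern: optional sign, digits, optional
# fraction, optional exponent. So "04e4" means 4 * 10**4, not hex 0x4e4.
NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def consume_number(text: str) -> NumberToken:
    """Read a number from the start of `text`, keeping its original spelling."""
    m = NUMBER_RE.match(text)
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    rep = m.group(0)
    return NumberToken(value=float(rep), representation=rep)


def codepoint_from_representation(tok: NumberToken) -> int:
    """Re-read the original characters as hex, dropping the leading '+'.

    Raises ValueError if the representation is not plain hex digits
    (for example "1.5"), since such a token cannot be a code point.
    """
    return int(tok.representation.lstrip("+"), 16)


# Simplified legacy unicode-range scan: after "u+", greedily take up to six
# hex digits; whatever follows becomes a separate token. ('?' wildcards and
# "start-end" ranges are omitted from this sketch.)
HEX_RUN_RE = re.compile(r"[0-9a-fA-F]{1,6}")


def split_legacy_urange(text: str) -> tuple[str, str]:
    """Return (hex digits consumed, leftover text) for a 'u+...' string."""
    if text[:2].lower() != "u+":
        raise ValueError(f"not a unicode-range: {text!r}")
    m = HEX_RUN_RE.match(text, 2)
    if m is None:
        raise ValueError(f"no hex digits after 'u+': {text!r}")
    return m.group(0), text[m.end():]


tok = consume_number("+04e4")                        # the text after "U"
print(tok.value)                                     # 40000.0
print(hex(codepoint_from_representation(tok)))      # 0x4e4
print(chr(codepoint_from_representation(tok)))      # Ӥ
print(split_legacy_urange("u+area"))                 # ('a', 'rea')
```

The `representation` field stands in for the approach Tab describes: the value alone has already lost the hex digits, so recovery only works if the original characters are kept. This is the per-token string cost that, per the thread, implementations were unwilling to pay.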
Received on Friday, 1 February 2019 23:45:24 UTC