- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Wed, 4 Sep 2013 10:10:22 -0700
- To: Lea Verou <lea@verou.me>
- Cc: www-style list <www-style@w3.org>
On Wed, Sep 4, 2013 at 9:40 AM, Lea Verou <lea@verou.me> wrote:
> Today’s telcon discussion on unicode-range reminded me of most authors’
> biggest gripes with it: That ranges need to be defined as unicode codepoints
> instead of strings, requiring a unicode table lookup for every single use of
> that descriptor. Strings seemed like a no-brainer: Figuring out the unicode
> codepoint for a specific character is something machines do much better than
> humans. I researched it and it seems that the reason this was not allowed
> was this [1]:
>
>> Makes sense but I think the implementation details could easily get a
>> bit hairy, you would end up with ambiguous situations involving things
>> like combining diacritics and shaped vowels. To authors they would
>> appear as a single character but underneath they would in fact be
>> multi-character strings:
>>
>> unicode-range: 'å'; /* could be a-ring or 'a' followed by ring diacritic */
>
> I understand the complexity, but it seems like one of those cases where we
> couldn’t decide what to do, so we did nothing and ended up with very
> author-unfriendly syntax. Most use cases would not be ambiguous, so as long
> as we define something reasonable for the ones that are, authors would
> rarely stumble on any complexity. On the plus side, authors would be able to
> use unicode-range without any need to look up unicode tables for the vast
> majority of their use cases. For the cases that are ambiguous, when authors
> want to do something different than the way we’ve defined strings to work,
> they can always use codepoints. Requiring them to use codepoints all the
> time because strings might be confusing in some cases does not seem
> reasonable to me.
>
> The way I picture it working, ranges would also be available (such as
> "a"-"z"). Single character strings would just be a shortcut to their unicode
> codepoint and they could be combined (e.g. ranges like "a"-U+7F).
> Multi-character strings would be invalid (including letters followed by
> diacritics).
>
> Thoughts? Is there any other reason this was not allowed, that I missed?
>
> [1]: http://lists.w3.org/Archives/Public/www-style/2009Jun/0000.html

I'm not a fan of the syntax, particularly when you mix codepoints and
characters, like "a"-U+7F, which parses as STRING(a) IDENT(-U)
DIMENSION(+7, F) rather than the intended STRING(a) DELIM(-)
UNICODE-RANGE(U+7F). A function would work better, as
"unicode-range('a', u+7f)".

I was considering suggesting that we could do normalization, to avoid
the "combined character or normal character + combining character"
ambiguity, but any normalization would also do unfortunate things with
characters like Angstrom (normalized to precomposed A with o-ring).

So yeah, just let a string of two characters be invalid. Some
keyboards/OSes have trouble typing characters precomposed, but oh
well, that's what the codepoint form is for.

~TJ
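
To make the two points above concrete, here is a minimal Python sketch. It assumes that "one character" means exactly one Unicode code point and that NFC is the normalization form in question. `codepoint_token` is a hypothetical helper written for illustration only, not part of any CSS parser or spec.

```python
import unicodedata


def codepoint_token(s: str) -> str:
    """Map a string of exactly one code point to 'U+XXXX' notation.

    Hypothetical helper: anything longer (e.g. a letter followed by a
    combining diacritic) is rejected, mirroring the "multi-character
    strings are invalid" rule discussed in the thread.
    """
    if len(s) != 1:
        raise ValueError(f"expected exactly one code point, got {len(s)}")
    return f"U+{ord(s):04X}"


precomposed = "\u00e5"  # a-ring as a single code point
decomposed = "a\u030a"  # 'a' followed by COMBINING RING ABOVE
angstrom = "\u212b"     # ANGSTROM SIGN

print(codepoint_token("a"))          # U+0061
print(codepoint_token(precomposed))  # U+00E5
print(len(decomposed))               # 2, so codepoint_token() would reject it

# NFC would fold the decomposed form into the precomposed one...
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# ...but it also silently rewrites the Angstrom sign to a different code point.
print(codepoint_token(unicodedata.normalize("NFC", angstrom)))  # U+00C5
```

The last line is the Angstrom problem: an author who typed U+212B would end up with a range covering U+00C5 instead, which is why rejecting multi-code-point strings outright is simpler than normalizing.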
Received on Wednesday, 4 September 2013 17:11:10 UTC