- From: Lea Verou <lea@verou.me>
- Date: Wed, 4 Sep 2013 19:40:21 +0300
- To: www-style list <www-style@w3.org>
- Message-Id: <397583B8-075A-4841-A828-0048FA76912B@verou.me>
Today’s telcon discussion on unicode-range reminded me of one of the biggest gripes most authors have with it: ranges must be defined as Unicode codepoints instead of strings, which requires a Unicode table lookup for every single use of that descriptor. Strings seemed like a no-brainer: figuring out the Unicode codepoint for a specific character is something machines do much better than humans. I looked into it, and it seems that this was the reason strings were not allowed [1]:

> Makes sense but I think the implementation details could easily get a bit hairy, you would end up with ambiguous situations involving things like combining diacritics and shaped vowels. To authors they would appear as a single character but underneath they would in fact be multi-character strings:
>
> unicode-range: 'å'; /* could be a-ring or 'a' followed by ring diacritic */

I understand the complexity, but this seems like one of those cases where we couldn’t decide what to do, so we did nothing and ended up with a very author-unfriendly syntax. Most use cases would not be ambiguous, so as long as we define something reasonable for the ones that are, authors would rarely run into any complexity. On the plus side, authors could use unicode-range without looking up Unicode tables for the vast majority of their use cases. In the ambiguous cases, if authors want something different from the way we’ve defined strings to work, they can always fall back to codepoints. Requiring them to use codepoints all the time because strings might be confusing in some cases does not seem reasonable to me.

The way I picture it, ranges would also be available (such as "a"-"z"). Single-character strings would just be a shortcut for their Unicode codepoint, and the two forms could be combined (e.g. ranges like "a"-U+7F). Multi-character strings would be invalid (including letters followed by combining diacritics).

Thoughts? Is there any other reason this was not allowed that I missed?
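For illustration, here is a minimal sketch of how the string-to-codepoint rule described above might resolve, written in Python. The grammar, the function names `endpoint_to_codepoint` and `parse_range`, and the restriction to the forms mentioned in this message ("a"-"z", "a"-U+7F, single characters) are hypothetical assumptions; this is not spec text or any browser's implementation, and it does not cover the existing wildcard (U+4??) or U+0-7F range forms.

```python
import re

# Hypothetical grammar for the proposal: an endpoint is either a U+XXXX codepoint
# or a quoted string that must contain exactly one character (one codepoint).
ENDPOINT = r"\"[^\"]*\"|'[^']*'|[Uu]\+[0-9A-Fa-f]{1,6}"
RANGE = re.compile(rf"\s*({ENDPOINT})\s*(?:-\s*({ENDPOINT})\s*)?")


def endpoint_to_codepoint(token: str) -> int:
    """Map one endpoint to its codepoint, rejecting multi-character strings."""
    if token[0] in "\"'":
        text = token[1:-1]
        # len() counts codepoints, so 'a' + U+030A (combining ring) has length 2
        # and is rejected, while precomposed U+00E5 has length 1 and is accepted.
        if len(text) != 1:
            raise ValueError(f"multi-character string is invalid: {token!r}")
        return ord(text)
    # Codepoint form: strip the "U+" prefix and parse the hex digits.
    return int(token[2:], 16)


def parse_range(spec: str) -> tuple[int, int]:
    """Parse e.g. '"a"-"z"', '"a"-U+7F' or "'å'" into an inclusive (start, end) pair."""
    m = RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a valid range: {spec!r}")
    start = endpoint_to_codepoint(m.group(1))
    # A single endpoint denotes a one-codepoint range.
    end = endpoint_to_codepoint(m.group(2)) if m.group(2) else start
    if start > end:
        raise ValueError(f"range is reversed: {spec!r}")
    return start, end
```

In this sketch a precomposed 'å' (U+00E5) resolves to a single codepoint, while 'a' followed by U+030A COMBINING RING ABOVE is rejected. No Unicode normalization is applied, which matches the rule that letters followed by diacritics are invalid; authors who want the decomposed sequence would use codepoints instead.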
[1]: http://lists.w3.org/Archives/Public/www-style/2009Jun/0000.html

– Lea Verou • lea.verou.me • @leaverou
Received on Wednesday, 4 September 2013 16:40:52 UTC