Re: [csswg-drafts] [css-fonts-5] Make `unicode-range` syntax suck less (#7921) from Addison Phillips via GitHub on 2022-10-23 (public-css-archive@w3.org from October 2022)

From: Addison Phillips via GitHub <sysbot+gh@w3.org>
Date: Sun, 23 Oct 2022 16:45:03 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-1288152032-1666543501-sysbot+gh@w3.org>

You might want to start by looking at what Unicode and ICU have done in this space. For example, the [UnicodeSet](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html) class in ICU4J is similar to the kinds of "range selection" you're describing here: one can add characters according to various Unicode properties, classes, and scripts to build up sets of ranges, invert them, etc.

I think the descriptions in the thread above need to be tighter. Are `greek` and `japanese` supposed to be script names, e.g. equivalent to ISO 15924 codes like `Grek` and `Jpan`? Or are they meant to describe specific character sets, such as the exemplar character sets for the `el` (Greek) and `ja` (Japanese) locales in CLDR (such as [this one](https://unicode-org.github.io/cldr-staging/charts/latest/summary/el.html#2703e9d07ab2ef3a))? These kinds of sets definitely *do* intersect in various ways (and most languages use at least some characters from the "Common" script, such as punctuation). I'll also call out that Unicode code points run all the way to `U+10FFFF`.
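To make the "sets intersect" point concrete, here is a rough Python sketch of range-set arithmetic over the full `U+0000`..`U+10FFFF` space. Everything in it is an assumption for illustration: the function names are made up (this is not the ICU `UnicodeSet` API), and the sample ranges are Unicode *blocks* plus ASCII punctuation, standing in for whatever `greek` and `japanese` end up meaning. They are not Script property values or CLDR exemplar sets.

```python
MAX_CODE_POINT = 0x10FFFF  # last valid Unicode code point


def normalize(ranges):
    """Sort inclusive (start, end) ranges and merge overlapping or adjacent ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a, b):
    """All code points in either set."""
    return normalize(list(a) + list(b))


def invert(ranges):
    """All code points in U+0000..U+10FFFF that are not in `ranges`."""
    result = []
    next_start = 0
    for start, end in normalize(ranges):
        if start > next_start:
            result.append((next_start, start - 1))
        next_start = end + 1
    if next_start <= MAX_CODE_POINT:
        result.append((next_start, MAX_CODE_POINT))
    return result


def intersect(a, b):
    """Code points in both sets, via De Morgan: not (not A or not B)."""
    return invert(union(invert(a), invert(b)))


def to_unicode_range(ranges):
    """Format ranges as a CSS `unicode-range` value."""
    parts = []
    for start, end in ranges:
        parts.append(f"U+{start:X}" if start == end else f"U+{start:X}-{end:X}")
    return ", ".join(parts)


# Illustrative stand-ins only (blocks, not scripts or exemplar sets).
ASCII_PUNCTUATION = [(0x21, 0x2F)]
GREEK_BLOCK = [(0x0370, 0x03FF)]        # Greek and Coptic block
KANA_BLOCKS = [(0x3040, 0x309F), (0x30A0, 0x30FF)]  # Hiragana, Katakana

greek_like = union(GREEK_BLOCK, ASCII_PUNCTUATION)
japanese_like = union(KANA_BLOCKS, ASCII_PUNCTUATION)

# The shared punctuation is exactly where the two sets overlap.
shared = intersect(greek_like, japanese_like)
print(to_unicode_range(shared))  # U+21-2F
```

Even with these toy inputs the two sets overlap on punctuation, so a keyword-based syntax would need to say what happens to code points that belong to more than one named set.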



-- 
GitHub Notification of comment by aphillips
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/7921#issuecomment-1288152032 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Sunday, 23 October 2022 16:45:05 UTC