Re: [csswg-drafts] [css‑fonts‑4] Create keywords for `unicode‑range` (#4573) from CSS Meeting Bot via GitHub on 2024-09-25 (public-css-archive@w3.org from September 2024)

From: CSS Meeting Bot via GitHub <sysbot+gh@w3.org>
Date: Wed, 25 Sep 2024 00:12:24 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-2372608901-1727223141-sysbot+gh@w3.org>
The CSS Working Group just discussed ``[css‑fonts‑4] Create keywords for `unicode‑range` ``, and agreed to the following:

* `RESOLVED: Add a set of keywords from Script and Script Extensions`
* `RESOLVED: Punting on General category for now.`
* `RESOLVED: script categories can be excluded (except for Common)`
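
As a rough illustration of how the three resolutions might fit together, here is a minimal Python sketch. Everything in it is an assumption rather than anything the group specified: the function name `script_set`, the tiny hand-picked sample table (real data would come from the Unicode `Scripts.txt` and `ScriptExtensions.txt` files), and in particular the rule that an excluded script also removes Common characters whose Script Extensions list it.

```python
# Hypothetical sample data: code point -> Script property (ISO 15924 code).
SCRIPT = {
    0x0915: "Deva",  # DEVANAGARI LETTER KA
    0x0964: "Zyyy",  # DEVANAGARI DANDA (Script=Common)
    0x002D: "Zyyy",  # HYPHEN-MINUS (Script=Common)
    0xAC00: "Hang",  # HANGUL SYLLABLE GA
    0x0041: "Latn",  # LATIN CAPITAL LETTER A
}

# Script Extensions, deliberately truncated to two scripts for this sketch.
SCRIPT_EXTENSIONS = {
    0x0964: {"Deva", "Beng"},
}

COMMON = "Zyyy"


def script_set(include=None, exclude=()):
    """Return the sample code points selected by script keywords.

    include=None means "everything"; Common is always included and,
    per the third resolution, may not be excluded.
    """
    exclude = set(exclude)
    if COMMON in exclude:
        raise ValueError("Common cannot be excluded")
    include = None if include is None else set(include)

    result = set()
    for cp, sc in SCRIPT.items():
        # Script Extensions, when present, replace the single Script value.
        effective = SCRIPT_EXTENSIONS.get(cp, {sc})
        if effective & exclude:
            continue  # assumption: exclusion wins over inclusion
        if include is None or sc == COMMON or effective & include:
            result.add(cp)
    return result
```

With this table, `script_set({"Deva"})` selects the Devanagari letter plus both Common characters, while `script_set(exclude={"Hang"})` selects everything except the Hangul syllable.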

<details><summary>The full IRC log of that discussion</summary>
&lt;TabAtkins> astearns: We had a resolution, but there was continued discussion. Chris, take it away<br>
&lt;TabAtkins> ChrisL: I was getting an idea of what we had consensus on, that was to use unicode properties Script and Script Extensions<br>
&lt;TabAtkins> ChrisL: so if Script Extension says it's "deva", you'd get all the characters with that extension from the keyword `deva`<br>
&lt;TabAtkins> ChrisL: and you'd get the Common block by default, but with a way to exclude perhaps<br>
&lt;TabAtkins> ChrisL: Also, the resolution covered a way to add some ranges, but people might want to exclude some ranges.<br>
&lt;TabAtkins> addison: so use the ISO 15924 script code to include scripts<br>
&lt;TabAtkins> addison: with maybe special handling for common?<br>
&lt;TabAtkins> ChrisL: common would always be included if not listed, but you could exclude it<br>
&lt;TabAtkins> addison: this is in addition to codepoints?<br>
&lt;TabAtkins> ChrisL: yes<br>
&lt;fantasai> TabAtkins: basically a shorthand for one or more codepoint ranges<br>
&lt;TabAtkins> addison: have you looked at - this isn't regex, but at the regex unicode categories?<br>
&lt;TabAtkins> addison: people might want character classes<br>
&lt;TabAtkins> addison: also, CLDR's sets of characters by locale or by language<br>
&lt;TabAtkins> addison: maybe a source<br>
&lt;TabAtkins> addison: just trying to think of why people would be using this<br>
&lt;TabAtkins> addison: a common thing I've seen is people only wanting to accept certain chars, so only the ones actually used by Finnish or Hungarian. that's a bigger list than just the alphabet. unicode has a list like that.<br>
&lt;TabAtkins> addison: otherwise this seems, well, not unreasonable<br>
&lt;TabAtkins> addison: don't want to sound reticent, no shade<br>
&lt;TabAtkins> addison: just want to suggest other places to potentially look<br>
&lt;florian> q?<br>
&lt;fantasai> https://www.w3.org/TR/css-text-4/#character-properties<br>
&lt;TabAtkins> fantasai: we have precedent for including script extensions<br>
&lt;astearns> ack fantasai<br>
&lt;TabAtkins> fantasai: we generically include it in Appendix E of Text, it's the right thing to do pretty much everywhere we reference the Script property<br>
&lt;TabAtkins> fantasai: including common makes sense; ability to exclude common seems interesting but tricky, especially with combining marks<br>
&lt;TabAtkins> ChrisL: yeah, couldn't think of a use-case for it<br>
&lt;TabAtkins> fantasai: yeah, having a hyphen or something probably doesn't want to use a different font<br>
&lt;TabAtkins> fantasai: so my suggestion is not have the common-exclusion ability unless people ask<br>
&lt;TabAtkins> astearns: So do you still want to exclude other keywords?<br>
&lt;TabAtkins> fantasai: seems reasonable, yes<br>
&lt;florian> q+<br>
&lt;TabAtkins> astearns: Big reason to exclude common is if you have a stack, the first font is for Korean, the rest of the stack is for everything else. You'd exclude common from the Korean font. But you can also do that by flipping the font stack and excluding Korean, instead<br>
&lt;astearns> ack florian<br>
&lt;TabAtkins> florian: Yes, but also affects which fonts line-sizing and units takes from. If it's predominantly from Korean, you might want to take from that font even if there's fallback<br>
&lt;TabAtkins> fantasai: you're more likely to want to exclude punctuation than Common, like combining marks are in Common. You don't want base characters from one font and combining from another.<br>
&lt;TabAtkins> TabAtkins: yeah, like addison said about the regex unicode categories, they have Punctuation<br>
&lt;TabAtkins> fantasai: not full power, you can match on like east-asian width, doesn't seem useful. just some things.<br>
&lt;TabAtkins> addison: Yeah, just looking at it for a few suggestions, not necessarily all. I'm spitballing.<br>
&lt;TabAtkins> fantasai: I think we really only need Script and General category.<br>
&lt;TabAtkins> astearns: So we'd only need Script, is that excluding Script Extensions?<br>
&lt;TabAtkins> fantasai: No, including that. No use of CSS wouldn't want Script Extensions. Our *definition* of the Script property includes that by default.<br>
&lt;TabAtkins> astearns: So can we resolve on using Script and Script Extensions to create keywords?<br>
&lt;TabAtkins> xfq: Also General?<br>
&lt;TabAtkins> fantasai: We can start from the Script and add a few others as needed<br>
&lt;TabAtkins> astearns: Any objections?<br>
&lt;TabAtkins> RESOLVED: Add a set of keywords from Script and Script Extensions<br>
&lt;TabAtkins> astearns: now about General<br>
&lt;TabAtkins> fantasai: Yeah, I'm not as sure about Common, if they're trying to include letters but don't get combining marks. But excluding General is okay<br>
&lt;fantasai> s/Common/General Category/<br>
&lt;TabAtkins> xfq: Yeah, and they can always generate a codepoint list if they need<br>
&lt;dbaron> fantasai: (clarifying) including General is bad, excluding General is ok<br>
&lt;TabAtkins> fantasai: *Including* General seems footgun-y, but *excluding* General seems reasonable.<br>
&lt;TabAtkins> astearns: that's fine by me<br>
&lt;TabAtkins> astearns: anyone want to argue for something more than that now, rather than waiting until it's justified later by requests?<br>
&lt;TabAtkins> RESOLVED: Punting on General category for now.<br>
&lt;TabAtkins> astearns: switching to the question of whether we do "exclusion" as well as inclusion<br>
&lt;TabAtkins> ChrisL: Yes, let's get a resolution<br>
&lt;TabAtkins> fantasai: When excluding, don't want to exclude Common alongside the others (but including it alongside the specified value is okay)<br>
&lt;TabAtkins> proposed: script and and script exclusions can be excluded (except for Common)<br>
&lt;fantasai> s/and and script exclusions/categories/<br>
&lt;TabAtkins> RESOLVED: script categories can be excluded (except for Common)<br>
&lt;TabAtkins> ChrisL: So extending the grammar will break current impls<br>
&lt;TabAtkins> fantasai: So you declare it twice?<br>
&lt;fantasai> TabAtkins: just like normal<br>
&lt;TabAtkins> ChrisL: Ah so last one that's valid<br>
&lt;fantasai> scribe+<br>
&lt;fantasai> TabAtkins: existing unicode grammar is the worst<br>
&lt;fantasai> TabAtkins: I tried, it cannot reasonably be described with CSS tokenization rules<br>
&lt;fantasai> TabAtkins: options are special tokenization (which breaks selector u+a)<br>
&lt;fantasai> TabAtkins: or do custom parsing of unicode-range, which is what we're doing now<br>
&lt;fantasai> TabAtkins: I suggest keeping to that, and add functional form that expresses with numbers<br>
&lt;fantasai> TabAtkins: and build on that<br>
&lt;kbabbitt> +1<br>
&lt;fantasai> florian: so unicode(\d\d\d\d)<br>
&lt;fantasai> TabAtkins: you can't directly express hex values because might be ident or dimension or number<br>
&lt;fantasai> TabAtkins: but you could do xHHHH<br>
&lt;fantasai> astearns: so you're proposing using the same descriptor that has the current unicode-range syntax<br>
&lt;fantasai> astearns: or a new functional value syntax that is cleaner and does what we want<br>
&lt;fantasai> astearns: is that better or worse than having an entirely separate descriptor?<br>
&lt;fantasai> TabAtkins: no opinion<br>
&lt;fantasai> astearns: I think it's probably better to re-use the name; invalidity interactions are more obvious<br>
&lt;fantasai> ChrisL: agreed<br>
&lt;fantasai> TabAtkins: actually, i change my mind. I have a very strong opinion which is to agree with you<br>
&lt;fantasai> TabAtkins: Right now unicode-range descriptor is special magic syntax<br>
&lt;fantasai> TabAtkins: so sure apply them both<br>
&lt;fantasai> astearns: what do we call the function?<br>
&lt;fantasai> florian: unicode()<br>
&lt;fantasai> fantasai: u()<br>
&lt;fantasai> ChrisL: what about negation?<br>
&lt;fantasai> TabAtkins: a "not" keyword to prefix<br>
&lt;fantasai> dbaron: maybe be more explicit about subsetting the font to only characters in that range?<br>
&lt;fantasai> TabAtkins: maybe just "codepoints()"<br>
&lt;fantasai> florian: I like u()<br>
&lt;fantasai> [some mixup]<br>
&lt;fantasai> s/mixup/mixup about parsing weirdness/<br>
&lt;fantasai> TabAtkins: Oh, actually I mean I disagree with astearns<br>
&lt;fantasai> TabAtkins: we should use a new property<br>
&lt;fantasai> s/property/descriptor name/<br>
&lt;fantasai> ?: Then how do they interact?<br>
&lt;fantasai> TabAtkins: then let's intersect them. Initial value is 'all'<br>
&lt;fantasai> ChrisL: Could also reset unicode-range to all when encountering the new thing<br>
&lt;fantasai> dbaron: then you have a weird ordering dependency<br>
&lt;TabAtkins> fantasai: it would be weird if you set the new thing, then unicode-range<br>
&lt;TabAtkins> fantasai: they're both setting the same thing, it's weird if one invalidates the other<br>
&lt;fantasai> fantasai: Maybe unicode-range and unicode-set, and take tab's suggestion to intersect them<br>
&lt;TabAtkins> addison: table this, since i18n isn't helpful? you're on the right track.<br>
&lt;TabAtkins> addison: A related topic<br>
&lt;addison> https://unicode.org/cldr/charts/45/summary/kam.html<br>
&lt;TabAtkins> addison: you can see that it has sets of chars in use for a language<br>
&lt;TabAtkins> addison: for a locale you can see what's pretty commonly used, if you use that as a range it's similar to what you'd want in a font<br>
&lt;addison> https://unicode.org/cldr/charts/45/summary/ks.html<br>
&lt;TabAtkins> addison: not *as* exhaustive as some things<br>
&lt;TabAtkins> addison: but it kinda describes what your font should support if it's rendering locale=ks, etc<br>
&lt;astearns> ack fantasai<br>
&lt;TabAtkins> fantasai: I think this is more restrictive than what you usually want. you might include words from another lang, and you've dropped chars you wouldn't otherwise drop<br>
&lt;TabAtkins> astearns: Any other comments before line-clamp?<br>
</details>
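
The unresolved suggestion near the end of the log, keeping `unicode-range` and adding a second descriptor (tentatively `unicode-set`) whose value is intersected with it, with an initial value of "all", could behave roughly like the Python sketch below. This is only an illustration of the intersection idea: the function names and the representation of each descriptor as a list of inclusive `(start, end)` ranges are assumptions, and the group did not resolve on any of it.

```python
# "all": the full Unicode code space as one inclusive range.
ALL = [(0x0, 0x10FFFF)]


def intersect(a, b):
    """Intersect two lists of inclusive (start, end) code point ranges."""
    out = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo <= hi:
                out.append((lo, hi))
    return sorted(out)


def effective_coverage(unicode_range=None, unicode_set=None):
    """Combine both descriptors; an unset descriptor defaults to ALL."""
    a = ALL if unicode_range is None else unicode_range
    b = ALL if unicode_set is None else unicode_set
    return intersect(a, b)
```

Because intersection is commutative, the result does not depend on which descriptor appears first, which sidesteps the ordering dependency dbaron raised about resetting one descriptor when the other is encountered.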


-- 
GitHub Notification of comment by css-meeting-bot
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/4573#issuecomment-2372608901 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Wednesday, 25 September 2024 00:12:25 UTC