RE: Emphasis mark skipping (i18n-action#49 csswg-drafts#839) from Addison Phillips on 2023-10-06 (public-i18n-core@w3.org from October to December 2023)

From: Addison Phillips <addisoni18n@gmail.com>
Date: Fri, 6 Oct 2023 11:42:51 -0700
To: "'Ken Whistler'" <kenwhistler@sonic.net>, "'Robin Leroy'" <eggrobin@unicode.org>
Cc: <unicoRe@unicode.org>, 'Mark Davis Ⓤ' <mark@unicode.org>, <public-i18n-core@w3.org>
Message-ID: <0a0e01d9f884$e9da4e30$bd8eea90$@gmail.com>
Thanks Ken, that’s helpful.

 

I think CSS’s concern is twofold: not overstepping Unicode’s perceived authority, and a desire to have Unicode handle any/all “character related stuff” rather than defining it in this or that W3C document. I am, of course, aware that Unicode is not some vast army of otherwise-unused encoding experts waiting to seize the opportunity to do useful work. What you suggest sounds like a reasonable path for folks (such as those involved with CSS) to publish material into the Unicode space (where it can help others) rather than burying it in some random specification.

 

Let’s see how UTC responds and, for that matter, how the CSS folks react.

 

Addison

 

From: Ken Whistler <kenwhistler@sonic.net> 
Sent: Friday, October 6, 2023 9:14 AM
To: Addison Phillips <addisoni18n@gmail.com>; 'Robin Leroy' <eggrobin@unicode.org>
Cc: unicoRe@unicode.org; 'Mark Davis Ⓤ' <mark@unicode.org>; public-i18n-core@w3.org
Subject: Re: Emphasis mark skipping (i18n-action#49 csswg-drafts#839)

 

Addison,

And the Catch-22 here is that the Unicode Consortium doesn't want to be in the business of maintaining what amounts effectively to an infinite list of possible interesting lists of characters specific to various application areas outside our control.

The way around this is for interested parties who have a pressing enough need in a specific area to write what amounts to a mini-specification concerning that area and then develop associated data files to support implementations in that specific area. That is how the mathematicians have been proceeding, for example, in trying to develop agreed-upon sets of math characters, behaviors, and specific lists of properties relevant to their implementations:

https://www.unicode.org/reports/tr25/

http://www.unicode.org/Public/math/revision-15/

Those are then published as part of the collection of Unicode specifications, even though math is outside the core competence of the UTC.

A relatively easy way to start down this road would be to put together a Unicode Technical Note (which doesn't require formal review and approval by the UTC) explaining the problem it addresses and providing some kind of solution -- which might include a specific property definition and/or lists or derivations. Then an external specification could refer to that UTN for what it wants. It is a lightweight way of kicking the can over the fence, as it were.

We have increasingly started taking that route as a way for specialists to write up detailed implementation models for complex scripts that exceeded what we could reasonably add to the core specification. See, for example, UTN #48, Implementing Kawi, or UTN #51, Musical Symbols and Sasak Characters in the Balinese Script. And Ken Lunde recently wrote up a UTN to provide a detailed and complete specification of a new, complicated Unihan database property, kStrange.

There is no reason, in principle, why a UTN could not have an accompanying data file that could then be maintained by whoever owns that UTN's content. Or, if the problem it addresses is of wide enough concern, such a UTN could also, in principle, be graduated to UTR status to get more formal review and approval and with its data posted in a more generic location for external specifications to pick up.

So my suggestion would be for somebody familiar with the requirements and implementation details to try writing up a short UTN on Japanese wakiten practice (and kindred emphasis systems in other East Asian writing systems) with a short analysis of how the required behavior articulates against Unicode character property assignments. That would raise the visibility of the problem in the Unicode context, outside private threads discussing the maintenance of CSS. If folks pursue an avenue like that, it could end up eventually with what you are after -- a list formally maintained in some context in the Unicode Consortium's stable of specifications, so it wouldn't have to be carried around as an exception list in CSS code.

--Ken

P.S. Again the caveat -- these are my personal ruminations. None of this has been presented to the UTC yet.

On 10/6/2023 7:52 AM, Addison Phillips wrote:

I do not know whether this approach is practical for CSS.

 

CSS doesn’t want to be in the business of making lists of characters and their properties. I think you could read this as a request that *Unicode* make such a list/derived property. I note that there is also this CLDR issue we recently filed (which doesn’t seem like a CLDR problem to me): https://unicode-org.atlassian.net/browse/CLDR-17044, about a mapping that CSS maintains of small kana to kana and which looks pretty similar to this.
Received on Friday, 6 October 2023 18:42:58 UTC