RE: Emphasis skip property? [W3C I18N Action #99]

Thanks Markus and Mark.

I did not expect that there would actually be a change to the GC (which is why I wrote the sentence in the circuitous way that I did). However, to Mark’s point, the effect needed is similar to “NotReallyPunctuation”.

My real question is: how do we get this on the agenda for the UTC in a future version of Unicode? And how do we track it? Do we need to produce some kind of proposal? Periodic emails are probably not the answer…

Addison

From: Mark Davis Ⓤ <mark@unicode.org> 
Sent: Friday, May 24, 2024 2:28 PM
To: Markus Scherer <markus.icu@gmail.com>
Cc: Addison Phillips <addisoni18n@gmail.com>; petercon@unicode.org; craig@unicode.org; asmus@unicode.org; Ken Whistler <kenwhistler@sonic.net>; public-i18n-core@w3.org; Florian Rivoal <florian@rivoal.net>; fantasai@inkedblade.net; unicoRe@unicode.org
Subject: Re: Emphasis skip property? [W3C I18N Action #99]

That may sound like a joke, and the name I mentioned certainly is, but we have done similar things before to address issues where compatibility constraints came into play.

On Fri, May 24, 2024 at 2:21 PM Mark Davis Ⓤ <mark@unicode.org> wrote:

I proposed changing some of the obvious errors in Po some years ago, but there was concern about disruption. I suppose we could have a NotReallyPunctuation property...

On Fri, May 24, 2024 at 12:12 PM Markus Scherer <markus.icu@gmail.com> wrote:

On Fri, May 24, 2024 at 11:28 AM Addison Phillips <addisoni18n@gmail.com> wrote:

The issue pertains to the use of emphasis marks (e.g. Japanese bouten). It is customary to skip punctuation characters in these emphasis systems. See [2] and [3] below for specific text (where there is a list of symbols affected).

CSS found that the Unicode general categories don’t align nicely with which characters to skip. W3C doesn’t want to maintain the list of characters to skip/not skip: it would probably make more sense for Unicode to maintain it. Participants speculate that this might be achieved by splitting a general category or via some Unicode property (or some other mechanism).
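
To make the mismatch concrete, here is a minimal sketch of a skip rule built on the general category plus an exception list. The function name `skips_emphasis`, the set name `NOT_REALLY_PUNCTUATION`, and the rule itself are assumptions for illustration only; the exception set holds just the characters named in this thread, not the full list from the CSS text, and the sketch covers only the punctuation part of the behavior.

```python
import unicodedata

# Hypothetical exception list: only the characters named in this thread.
NOT_REALLY_PUNCTUATION = set("#%‰&@")


def skips_emphasis(ch: str) -> bool:
    """Return True if an emphasis mark would be skipped for ch.

    Illustrative rule: skip anything in a punctuation category (P*),
    unless it is on the exception list.
    """
    is_punctuation = unicodedata.category(ch).startswith("P")
    return is_punctuation and ch not in NOT_REALLY_PUNCTUATION


text = "重要、#1!"
# Characters that would still receive an emphasis mark under this rule.
marked = [ch for ch in text if not skips_emphasis(ch)]
print(marked)
```

The exception set is exactly the piece that someone has to maintain, which is why a Unicode-maintained property would be preferable to a hand-kept list.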

Splitting a general category is verboten.

https://www.unicode.org/policies/stability_policy.html#Property_Value

"The enumeration of General_Category property values is fixed. No new values will be added."

It sounds like one of the things you are asking is whether Unicode would change the General_Category of #%‰&@... from Po to So. Is that right?

General_Category values can be changed, but the distinction between punctuation and symbols is fuzzy, and changing gc values for commonly used characters (especially ASCII) can be very disruptive.
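
For reference, the current assignments can be checked with Python's standard `unicodedata` module. This is only a quick sketch: the results depend on the Unicode version bundled with the Python build, and the comparison characters `+`, `$`, and `©` are chosen here for contrast and are not part of the original question.

```python
import unicodedata

# The characters named in the question above: all currently Po.
po_chars = {ch: unicodedata.category(ch) for ch in "#%‰&@"}

# A few characters that are already in symbol categories, for contrast.
symbol_chars = {ch: unicodedata.category(ch) for ch in "+$©"}

for ch, gc in {**po_chars, **symbol_chars}.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {gc}")
```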

markus

Received on Sunday, 26 May 2024 15:33:55 UTC