Re: [csswg-drafts] [css-text-4] Bikeshedding word-boundary-expansion (#7385)

I have looked into interpuncts, and changed my mind about them.

<details>
<summary>Click to expand a detailed explanation of why</summary>

It is tempting to support conversions to and from the interpunct as well, primarily for the sake of modern vs. ancient renditions of Latin text: usage of the Latin script shifted from word separation with interpuncts in the classical Roman period, to no delimiter in late antiquity, to space-separated words in the early Middle Ages, to space-separated words with punctuation around the Renaissance, with a number of variants along the way.

However, switching from one style to another involves more than just swapping one type of space for another: it would also require punctuation transformation, and a few other things.

For instance, here is a modern rendition of the beginning of <a href="https://en.wikipedia.org/wiki/Res_Gestae_Divi_Augusti"><cite>Res Gestae Divi Augusti</cite></a>, followed by a classical one.

<blockquote>
Annos undeviginti natus exercitum privato consilio et privata impensa comparavi, per quem rem publicam dominatione factionis oppressam in libertatem vindicavi. Quas ob res senatus decretis honorificis in ordinem suum me adlegit C.&nbsp;Pansa A.&nbsp;Hirtio consulibus, consularem locum sententiae dicendae simul dans, et imperium mihi dedit.
</blockquote>

<blockquote>
ANNOS·&#x200B;VNDEVIGINTI·&#x200B;NATVS·&#x200B;EXERCITVM·&#x200B;PRIVATO·&#x200B;CONSILIO·&#x200B;ET·&#x200B;PRIVATA·&#x200B;IMPENSA·&#x200B;COMPARAVI·&#x200B;PER·&#x200B;QVEM·&#x200B;REM·&#x200B;PVBLICAM·&#x200B;DOMINATIONE·&#x200B;FACTIONIS·&#x200B;OPPRESSAM·&#x200B;IN·&#x200B;LIBERTATEM·&#x200B;VINDICAVI· QVAS·&#x200B;OB·&#x200B;RES·&#x200B;SENATVS·&#x200B;DECRETIS·&#x200B;HONORIFICIS·&#x200B;IN·&#x200B;ORDINEM·&#x200B;SVVM·&#x200B;ME·&#x200B;ADLEGIT·&#x200B;C·PANSA·&#x200B;A·HIRTIO·&#x200B;CONSVLIBVS·&#x200B;CONSVLAREM·&#x200B;LOCVM·&#x200B;SENTENTIAE·&#x200B;DICENDAE·&#x200B;SIMVL·&#x200B;DANS·&#x200B;ET·&#x200B;IMPERIVM·&#x200B;MIHI·&#x200B;DEDIT·
</blockquote>

This involves transforming:
* lone spaces into interpunct+zero-width-space
* comma+space into interpunct+zero-width-space
* period+space into interpunct+space (or interpunct+zero-width-space, depending on style)
* period+NBSP into interpunct (or interpunct+zero-width-space, depending on style)
* not shown in this example, but ideally trailing interpuncts at the end of a line should be removed (and possibly `word-break: break-all` should be applied, depending on style).
* lower case to upper case (which can be handled with `text-transform`), alongside u to V and j to I (which `text-transform` theoretically could handle, but doesn't)
* not shown in this example, but if the text had been written to indicate long vowels, transforming from modern to classical would also involve transforming macrons to apices, except for ī, which maps to ꟾ (U+A7FE), so the first two words `Annōs ūndēvīgintī` become `ANNÓS·​V́NDÉVꟾGINTꟾ`.

This seems beyond the reasonable scope of this property, and the precise rules might even need to be fine-tuned for the particular content and styles in question, making it impractical to provide a generic built-in transform.
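
To give a sense of how much special-casing is involved, here is a rough Python sketch of the modern-to-classical direction, covering only the rules listed above. The function name `to_classical` and the constants are hypothetical, it picks one of the possible styles for each rule, it uses a combining acute accent to stand in for the apex, and it treats a period at the very end of the text as a bare interpunct (an assumption based on the example above). Line-end handling is left out entirely, since it depends on layout.

```python
import re
import unicodedata

INTERPUNCT = "\u00B7"
ZWSP = "\u200B"      # zero-width space: allows a line break after the interpunct
NBSP = "\u00A0"
MACRON = "\u0304"    # combining macron
APEX = "\u0301"      # combining acute, used here as a stand-in for the apex
I_LONGA = "\uA7FE"   # LATIN EPIGRAPHIC LETTER I LONGA

# Order matters: the two-character sequences must be tried before a lone space.
_SEPARATORS = re.compile(r"\." + NBSP + r"|\. |, | |\.$")


def _replace_separator(match: re.Match) -> str:
    token = match.group(0)
    if token == "." + NBSP:
        return INTERPUNCT            # abbreviation: no break opportunity
    if token == ". ":
        return INTERPUNCT + " "      # sentence end: keep a visible space
    if token == ".":
        return INTERPUNCT            # period at the very end of the text
    return INTERPUNCT + ZWSP         # lone space, or comma + space


def to_classical(text: str) -> str:
    # Decompose so that macrons become separate combining marks.
    s = unicodedata.normalize("NFD", text).upper()
    # Long i becomes I longa; every other macron becomes an apex.
    s = s.replace("I" + MACRON, I_LONGA).replace(MACRON, APEX)
    # u -> V and j -> I, which text-transform does not do.
    s = s.replace("U", "V").replace("J", "I")
    s = _SEPARATORS.sub(_replace_separator, s)
    # Recompose where a precomposed character exists (e.g. O + acute).
    return unicodedata.normalize("NFC", s)


print(to_classical("Annōs ūndēvīgintī"))
```

Even this ignores commas not followed by a space, other punctuation, and the reverse direction, which cannot be done mechanically because the classical form no longer distinguishes a comma from a plain space.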

Interpunct in other languages is typically used for different purposes, so if it cannot be done for Latin, it's not worth doing at all.
</details>

TL;DR: Transformations from ZWSP or space to interpunct, or the other way around, would be excessively complex, impractical to use, or both. Even though I was tempted, I think we should not attempt them here.

-- 
GitHub Notification of comment by frivoal
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/7385#issuecomment-1684987995 using your GitHub account



Received on Saturday, 19 August 2023 14:21:25 UTC