[csswg-drafts] [css-text] 'Punctuation space' handling for non-CJK languages (#8661)

r12a has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-text] 'Punctuation space' handling for non-CJK languages ==
I've been trying for a while now to get my head around the ramifications of spaces before punctuation, because I think the implications go well beyond the case of French. I still need to marshal more evidence, but here are some thoughts.

To be honest, at this point I'm inclined to wonder whether this should be disunified from the 'text-autospace' property. Punctuation spacing differs conceptually from CJK autospacing in the following ways:

1. punctuation spacing does not add balanced spaces around embedded items (which are from a different script)
2. punctuation spacing usually involves a requirement not to wrap the punctuation mark separately onto the following line during text wrap
3. the size of the gap is not the same as for autospacing, and may vary according to the script (or possibly the punctuation mark) in question


Here are some examples where punctuation marks may need to be separated by a gap from the preceding text. The examples are taken from a cursory browse through my orthography notes.

- N'Ko text may contain commas that are separated by a gap from the end of the previous word: see https://r12a.github.io/scripts/nkoo/nqo.html?showIndex#phrase
- Newar, ditto for danda and double danda: see https://r12a.github.io/scripts/newa/new.html?showIndex#phrase
- Odia, ditto for danda and double danda (to avoid serious ambiguity): see https://r12a.github.io/scripts/orya/or.html?showIndex#phrase
- Adlam text appears to separate most punctuation marks from preceding text: see https://r12a.github.io/scripts/adlm/fuf.html?showIndex#phrase
- Punjabi danda: see https://r12a.github.io/scripts/guru/pa.html?showIndex#phrase
- Santali danda & double danda: see https://r12a.github.io/scripts/olck/sat.html?showIndex#phrase
- Thai repetition mark is optionally preceded by a fixed-width, non-breaking gap: see https://r12a.github.io/scripts/thai/th.html?showIndex#phrase
- Tibetan often has a gap before or between punctuation marks: see https://r12a.github.io/scripts/tibt/bo.html?showIndex#phrase
- I have also come across Hindi and Mongolian texts where ordinary spaces have been used to move sentence-final punctuation away from the preceding text. 

Then there are characters with the PR (Prefix Numeric) line-break property value, which signals that they should not be wrapped when alongside a sequence of digits, even if separated by a space. I suspect that for several of these, the spacing may depend on the content author's preference, or could benefit from a CSS directive that standardises or changes the approach taken throughout a document.
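To make the PR case concrete, here is a minimal TypeScript sketch of the kind of author-side workaround that a CSS directive could make unnecessary: the ordinary space(s) between a prefix symbol and the following number are rewritten as a single U+00A0 NO-BREAK SPACE so the two cannot be separated at a line end. The character list is a small hand-picked subset of the PR class (a real implementation would use the Unicode LineBreak.txt data), only ASCII digits are handled, and the name `bindPrefixToDigits` is hypothetical.

```typescript
// Hand-picked subset of characters whose UAX #14 Line_Break value is PR
// (Prefix Numeric): $ + £ ¥ €. The full class is larger; this list is only
// for illustration. Inside a character class, "$" and "+" are literal.
// The lookahead requires an ASCII digit after the spaces (other scripts'
// digits are ignored in this sketch).
const PR_SPACES_BEFORE_DIGIT = /([$+\u00A3\u00A5\u20AC]) +(?=[0-9])/g;

// Hypothetical helper: bind a prefix symbol to the number that follows it by
// collapsing the intervening ordinary space(s) into one NO-BREAK SPACE.
function bindPrefixToDigits(text: string): string {
  return text.replace(PR_SPACES_BEFORE_DIGIT, (_match, symbol: string) => symbol + "\u00A0");
}
```

For example, `bindPrefixToDigits("Price: $ 42")` yields `"Price: $"` + U+00A0 + `"42"`, while `"$ x"` is left alone because no digit follows. Note that this rewrites the text itself; a CSS-level solution would instead adjust line breaking and rendering without changing the underlying characters.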



The size of the gap will vary from language to language, and possibly from punctuation mark to punctuation mark within a given language. But browsers may also apply different gap sizes if they feel those sizes better meet the needs of their users.

The replace function will still be valuable. In some cases a gap may be built into the font glyph for the punctuation; in others it is created by the author inserting a regular space. In still other cases, no gap is built into the font, but the gap applied varies from content author to content author, or even within the same document. A CSS property that produces standardised gaps across a document, where needed depending on the font or the author's preferences, would be a useful thing to have. If that property could also prompt a user agent to apply intelligence to establish the correctly sized gap and wrapping behaviour for a given punctuation mark or symbol in a given language (there probably aren't that many to consider), that would be an additional benefit.
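As a rough illustration of "standardised gaps" plus a replace behaviour, the TypeScript sketch below uses a per-language table of punctuation marks. Whatever the author typed before a listed mark (nothing, ordinary spaces, or no-break spaces) is normalised to exactly one non-breaking gap character. The table contents, the choice of U+00A0 versus U+202F for each mark, and the names `GAP_RULES` and `standardiseGaps` are all hypothetical placeholders, not recommendations for these orthographies.

```typescript
// Placeholder gap characters (assumptions, not orthographic recommendations).
const NBSP = "\u00A0";  // NO-BREAK SPACE
const NNBSP = "\u202F"; // NARROW NO-BREAK SPACE

// Hypothetical per-language table: [punctuation mark, standard gap before it].
const GAP_RULES: Record<string, [string, string][]> = {
  fr: [["!", NNBSP], ["?", NNBSP]],
  or: [["\u0964", NBSP], ["\u0965", NBSP]], // Odia: danda, double danda
  th: [["\u0E46", NNBSP]],                  // Thai repetition mark (MAIYAMOK)
};

// Escape regex metacharacters so marks such as "?" are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: after any non-whitespace character, a (possibly empty)
// run of ordinary, no-break, or narrow no-break spaces before a listed mark
// becomes exactly one standard gap. Marks at the start of the text, or
// preceded by other whitespace such as a tab or newline, are left untouched.
function standardiseGaps(text: string, lang: string): string {
  let out = text;
  for (const [mark, gap] of GAP_RULES[lang] ?? []) {
    const before = new RegExp(`(?<=\\S)[ \\u00A0\\u202F]*${escapeRegExp(mark)}`, "g");
    out = out.replace(before, () => gap + mark);
  }
  return out;
}
```

Because the gap character is non-breaking, the mark also cannot be wrapped onto the next line on its own, which is the wrapping requirement listed earlier. As with any text rewrite, this only approximates what a CSS property would do at layout time.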

In summary:

[1] should this behaviour be bundled with the somewhat different CJK autospace behaviour?

[2] perhaps we should edit the current spec text at https://www.w3.org/TR/css-text-4/#valdef-text-autospace-punctuation to make it clear that French is just one possible application among many, and make the case that browsers can begin addressing the needs of other languages right now without waiting for 'future specifications'. I'm not sure whether there needs to be a registry for such information.

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/8661 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Thursday, 30 March 2023 13:29:53 UTC