[csswg-drafts] [css-text] Should zero width space break Arabic shaping? (#3861)

From: Florian Rivoal via GitHub <sysbot+gh@w3.org>
Date: Mon, 22 Apr 2019 02:56:25 +0000
To: public-css-archive@w3.org
Message-ID: <issues.opened-435580699-1555901783-sysbot+gh@w3.org>
frivoal has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-text] Should zero width space break Arabic shaping? ==
This is probably more of a Unicode issue than a CSS issue, but we have a fair number of people involved with text layout and i18n over here, so I am filing it here first to figure out whether we should take it to Unicode or not.

When writing https://github.com/web-platform-tests/wpt/pull/14673, I had misread the Unicode standard, and thought that ZERO WIDTH SPACE was supposed to break Arabic shaping, based on a table that said "all spacing characters" do so. But there's a distinction between "spacing characters" and "space characters", and ZERO WIDTH SPACE is part of the latter, not the former.

https://www.unicode.org/Public/UCD/latest/ucd/ArabicShaping.txt gives further details about how each character affects shaping, and classifies ZERO WIDTH SPACE as T (transparent), which neither forces nor breaks shaping, and just behaves as if it weren't there for shaping purposes.
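
To make the "transparent" behaviour concrete, here is a minimal Python sketch of how a joining type lookup and a join check could work. It is only an illustration under stated assumptions: the names `EXPLICIT`, `joining_type`, and `joins_across` are hypothetical, the explicit table is a tiny hand-copied subset of ArabicShaping.txt (a real implementation would parse the whole file), and the fallback follows the default described in that file's header, where unlisted characters of General_Category Mn, Me, or Cf are T and all other unlisted characters are U.

```python
import unicodedata

# Hand-copied subset of explicit Joining_Type values (assumption: a real
# implementation would load every entry from ArabicShaping.txt instead).
EXPLICIT = {
    "\u0627": "R",  # ARABIC LETTER ALEF: right-joining
    "\u0628": "D",  # ARABIC LETTER BEH: dual-joining
    "\u200C": "U",  # ZERO WIDTH NON-JOINER: non-joining
    "\u200D": "C",  # ZERO WIDTH JOINER: join-causing
}


def joining_type(ch: str) -> str:
    """Return the joining type for ch, using the default rule for
    characters that are missing from the (partial) explicit table."""
    if ch in EXPLICIT:
        return EXPLICIT[ch]
    # Default rule: unlisted Mn, Me, Cf characters are transparent (T),
    # everything else unlisted is non-joining (U).
    if unicodedata.category(ch) in {"Mn", "Me", "Cf"}:
        return "T"
    return "U"


def joins_across(left: str, between: str, right: str) -> bool:
    """Check whether left and right would join when only the characters
    in between separate them. Transparent characters are skipped; any
    other character in between prevents the join."""
    if any(joining_type(ch) != "T" for ch in between):
        return False
    left_joins_forward = joining_type(left) in {"D", "L", "C"}
    right_joins_backward = joining_type(right) in {"D", "R", "C"}
    return left_joins_forward and right_joins_backward


BEH = "\u0628"
print(joining_type("\u200B"))            # ZERO WIDTH SPACE
print(joins_across(BEH, "\u200B", BEH))  # ZWSP between two BEH
print(joins_across(BEH, " ", BEH))       # SPACE between two BEH
print(joins_across(BEH, "\u200C", BEH))  # ZWNJ between two BEH
```

Under these assumptions, ZERO WIDTH SPACE falls into the T bucket purely because its General_Category is Cf, so the two BEH letters still join across it, whereas SPACE and ZERO WIDTH NON-JOINER both prevent the join.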

So Unicode has a definite answer as to what's supposed to happen, but several people in the thread about my tests were surprised by that answer (including @behdad, @r12a, and myself), because ZERO WIDTH SPACE is used as a word divider, and that suggests it ought to break shaping. @r12a [brought up Nastaliq](https://github.com/web-platform-tests/wpt/pull/14673#issuecomment-484388292) as a reasonable use case, because:
> when using nastaliq script, esp. in Urdu, inter-word spaces are often not applied, because words are separated enough by the arrangement of glyphs along the sloping baselines. If you do, however, want to indicate word boundaries in those situations without unsightly spacing, using ZWSP seems to be an obvious way of doing so.

So, what do we collectively think? Is Unicode likely enough to be mistaken that we should raise this issue with them? Is there a known good reason for why things are the way they are?

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3861 using your GitHub account
Received on Monday, 22 April 2019 02:56:27 UTC