Re: [charmod-norm] Does ZWJ/ZWNJ affect meaning?

On 1/15/2016 1:32 AM, r12a via GitHub wrote:
> r12a has just created a new issue for
> https://github.com/w3c/charmod-norm:
>
> == Does ZWJ/ZWNJ affect meaning? ==
> section 2.4 of the document[1] says
>
> "A special case are the Unicode control characters U+200D Zero Width
> Joiner (also known as ZWJ) and U+200C Zero Width Non-Joiner (also
> known as ZWNJ). These invisible controls sometimes do affect the
> meaning of characters sequences where they appear, although their
> usual use is to control ligature formation— either preventing the
> formation of undesirable ligatures or encouraging the formation for
> desirable ones."
>
> is this correct, ie. that the presence of ZWJ or ZWNJ can actually
> change the meaning of the text. If so, do we have an example?

The presence of a ligature can imply that there's no word boundary in a 
compound word at that location. There are cases where two different 
compounds have the same spelling, but differ in the way they break down. 
Requesting a ligature at a specific point would imply that that location 
isn't a word boundary and thus force the reader to disregard any reading 
that would place a word boundary there.

Wach-stube
Wachs-tube

is a canonical example from German for this kind of discussion 
(guardroom versus tube of wax). Requesting an "st" ligature (with a 
ZWJ) would rule out the second reading, Wachs-tube.
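
As a rough illustration of what this means at the code point level, here 
is a minimal Python sketch. The placement of the controls between "s" and 
"t" follows the description above; the helper names (strip_join_controls, 
show) are hypothetical and only for demonstration. It shows that the 
variants are distinct strings, and that a matcher which drops the controls 
can no longer tell them apart.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER: requests a joined form or ligature
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: inhibits a joined form or ligature


def strip_join_controls(text: str) -> str:
    """Remove ZWJ and ZWNJ, as a naive matcher might do (hypothetical helper)."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def show(text: str) -> str:
    """Make the invisible controls visible for inspection (hypothetical helper)."""
    return text.replace(ZWJ, "<ZWJ>").replace(ZWNJ, "<ZWNJ>")


plain = "Wachstube"                  # ambiguous without further context
st_joined = "Wachs" + ZWJ + "tube"   # ligature requested across s|t: no boundary there
st_split = "Wachs" + ZWNJ + "tube"   # ligature inhibited at s|t: boundary possible there

print(show(st_joined))
print(show(st_split))
print(st_joined == plain)
print(strip_join_controls(st_joined) == strip_join_controls(st_split))
```

The last comparison is the point: once the controls are stripped, whatever 
distinction the author encoded is gone.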

The rules of typesetting are in the process of being "dumbed down" to 
match the actual behavior of software (which cannot know the intended 
meaning). When that process is complete, ligatures and semantics may 
become orthogonal for some scripts, but in principle they are not.

Now, in many other scripts, ZWJ and ZWNJ are a required part of the 
spelling of certain words, for example the native name for Sri Lanka. 
In cases like this, leaving out a ZWJ/ZWNJ changes the spelling from 
correct to incorrect. Whether, like the German example, the incorrect 
spelling can ever be mistaken for the correct spelling of a different 
word (or phrase) is a separate issue.

It's enough to know that they are part of the orthography itself, not 
"merely" a stylistic element intended for fine-tuning the typography.
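
To make the orthographic case concrete, the sketch below spells out the 
first word of the Sinhala name for Sri Lanka with explicit escapes. The 
code point sequence is my assumption about the usual encoding (it is not 
given in this message), and the describe helper is hypothetical. The 
sketch lists the code points and checks that none of the four Unicode 
normalization forms removes the ZWJ, so normalization alone does not make 
the two spellings compare equal.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Assumed encoding of "Sri" in Sinhala script: letter, al-lakuna (virama),
# ZWJ, letter, vowel sign. Written as escapes so the ZWJ is visible in source.
SRI_WITH_ZWJ = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"
SRI_WITHOUT_ZWJ = SRI_WITH_ZWJ.replace(ZWJ, "")


def describe(text: str) -> list[str]:
    """List each code point with its Unicode character name (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe(SRI_WITH_ZWJ):
    print(line)

# The two spellings are different strings.
print(SRI_WITH_ZWJ == SRI_WITHOUT_ZWJ)

# Normalization keeps the ZWJ in every form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, ZWJ in unicodedata.normalize(form, SRI_WITH_ZWJ))
```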

A./
>
> [1] http://w3c.github.io/charmod-norm/#unicodeControls
>
> See https://github.com/w3c/charmod-norm/issues/44
> Further comments on this issue will NOT be notified to this list. If
> you'd like to follow the discussion, please do so by subscribing to
> the issue via the above link. Do not reply to this email.

Received on Friday, 15 January 2016 16:58:20 UTC