Re: NNBSP Impact

From: Mansour, Kamal <Kamal.Mansour@monotype.com>
Date: Wed, 15 Jul 2015 23:11:40 +0000
To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
CC: Greg Eck <greck@postone.net>
Message-ID: <D1CC2940.CC90%kamal.mansour@monotype.com>
Greg,
While I’m not qualified to reply as an expert in Mongolian script, I would like to present my analysis of the requirements for Mongolian as an experienced OpenType implementor.

Since Martin earlier referred to the use of the ZWNJ in Persian, I would like to expand on that a bit. As Martin stated, some white space still does appear between neighboring characters. The observed white space is not due to some characteristic of ZWNJ at all; it is the result of the natural spacing of the sequence of characters. The insertion of ZWNJ after a word causes the last letter to take on a final shape, while the character after ZWNJ will take on either an initial or an isolated form. In either case, initial and final forms in Persian are always “padded” with built-in white space (“sidebearing” in typographic jargon): final forms include white space on both sides, while initial forms have it on the right only. Consequently, when placed next to each other, a final form will normally be separated from the contiguous initial form by a discreetly small space without the insertion of any other space character.

The Unicode Standard (Ch. 13.4) states that
U+200C zero width non-joiner (ZWNJ) and U+200D zero width joiner (ZWJ) may be used to select a particular positional form of a letter in isolation or to override the expected positional form within a word. Basically, they evoke the same contextual selection effects in neighboring letters as do non-joining or joining regular letters, but are themselves invisible.
So, ZWNJ and ZWJ are ways of forcing the cursive joining behavior of Mongolian text to something other than the natural choice.
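
To make the joining behavior described above concrete, here is a minimal Python sketch. It is only a toy model: the joining-type table is a small hand-written subset for one Persian word (chosen purely for illustration), the function name `positional_forms` is made up for this example, and real shaping engines handle many more cases (ZWJ, transparent marks, ligatures).

```python
# Toy model of cursive joining. The table below is a hand-written subset
# for this example only; it is not read from the Unicode Character Database.
ZWNJ = "\u200c"

DUAL = "D"    # joins to the preceding and the following letter
RIGHT = "R"   # joins only to the preceding letter
NONE = "U"    # never joins (ZWNJ is treated this way here)

JOINING_TYPE = {
    "\u0645": DUAL,   # ARABIC LETTER MEEM
    "\u06cc": DUAL,   # ARABIC LETTER FARSI YEH
    "\u062e": DUAL,   # ARABIC LETTER KHAH
    "\u0647": DUAL,   # ARABIC LETTER HEH
    "\u0648": RIGHT,  # ARABIC LETTER WAW
    "\u0627": RIGHT,  # ARABIC LETTER ALEF
    ZWNJ: NONE,
}


def positional_forms(text):
    """Return the positional form of each letter, skipping ZWNJ itself."""
    types = [JOINING_TYPE[ch] for ch in text]
    forms = []
    for i, t in enumerate(types):
        if t == NONE:
            continue  # ZWNJ is invisible; it only affects its neighbors
        # A letter joins backward only if the previous character joins forward.
        joins_prev = i > 0 and types[i - 1] == DUAL
        # Only dual-joining letters can join forward.
        joins_next = (
            t == DUAL and i + 1 < len(types) and types[i + 1] in (DUAL, RIGHT)
        )
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# MEEM, FARSI YEH, ZWNJ, KHAH, WAW, ALEF, HEH, MEEM
with_zwnj = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
without_zwnj = with_zwnj.replace(ZWNJ, "")

print(positional_forms(with_zwnj))
print(positional_forms(without_zwnj))
```

With the ZWNJ present, the letter before it comes out as a final form and the letter after it as an initial form; without it, both become medial. The ZWNJ itself contributes no glyph, so any visible gap would come from the sidebearings of the selected forms.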

In examining the current use of NNBSP for Noto Mongolian, we do not use the spacing attributes of NNBSP at all; we simply treat it as a trigger to choose a different variant of a particular character. In that sense, it is functionally similar to MVS. Our logic examines the context for a match to a particular pattern including NNBSP or MVS, and when it finds it, it simply replaces one glyph by another. Character spacing is not modified in any way. The natural side bearings of the selected glyph are preserved. I realize that the Unicode Standard (Ch. 6.2) states that “in Mongolian text, the NNBSP is typically displayed with 1/3 the width of a normal space character”, but that didn’t seem necessary in our implementation. Maybe this spacing technique was necessary in hot-metal typesetting. Should we do the same today?
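
The "trigger" idea can be sketched roughly as follows in Python. This is not the Noto Mongolian logic, which lives in OpenType tables; the glyph names, the `VARIANTS` table, and the `shape` function are all hypothetical and only illustrate a one-for-one contextual replacement that leaves spacing alone.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
MVS = "\u180e"    # MONGOLIAN VOWEL SEPARATOR

# Hypothetical table: (preceding trigger, letter) -> variant glyph name.
# Real fonts express such rules as contextual substitution lookups.
VARIANTS = {
    (MVS, "\u1820"): "a.after_mvs",      # MONGOLIAN LETTER A
    (NNBSP, "\u1822"): "i.suffix_init",  # MONGOLIAN LETTER I
}


def default_glyph(ch):
    """Made-up default glyph name derived from the code point."""
    return f"uni{ord(ch):04X}"


def shape(text):
    """Map characters to glyph names, swapping in a variant after a trigger.

    Exactly one glyph is emitted per character, so the glyph count and the
    trigger's own glyph (and therefore its width) are left untouched.
    """
    glyphs = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        glyphs.append(VARIANTS.get((prev, ch), default_glyph(ch)))
    return glyphs


# MONGOLIAN LETTER NA, NNBSP, MONGOLIAN LETTER I
print(shape("\u1828\u202f\u1822"))
# The same letter without the trigger keeps its default glyph.
print(shape("\u1828\u1822"))
```

In this sketch the only effect of NNBSP or MVS is on which glyph is chosen for the following letter; whether the trigger character itself should also occupy a third of a space is a separate decision, which is the question raised above.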

Kamal

From: Greg Eck <greck@postone.net>
Date: Wednesday, 15 July 2015 at 09:55
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
Subject: RE: NNBSP Impact
Resent-From: <public-i18n-mongolian@w3.org>
Resent-Date: Wednesday, 15 July 2015 at 09:55


Hi Martin,

Thank you for your good comments. I have taken some time to review Chapter 23 of the Unicode Standard 7.0 as referenced below. I can see your point somewhat in the possibility of the ZWNJ taking the place of the NNBSP - even though it is a bit non-intuitive. I guess I am against the idea for two reasons. The first is that as the name implies, there is actually to be no space emitted by the rendering system - it is designed to have zero width. However the NNBSP_replacement needs to have space (while at the same time not being space). I say this recognizing the statement that some fonts render the ZWNJ with space. The second reason that I would not go for the idea is that time will probably tell us that we need a character specific to the Mongolian block that we can specifically tailor to the needs of this separation between a STEM+Suffix OR a Suffix+Suffix. If we go for another character that is multi-functional as the ZWNJ is and it fails to serve this new function as a replacement for the NNBSP, then we are in trouble again as we are now. I think we should still call for a completely new character that we can count on for time to come. The MVS was originally created for the sole purpose of separating the stem from the special final A/E. Let's create another sole-purpose character that will do the job specifically of separating the STEM/Suffix and the Suffix/Suffix.

Greg



I have created a spreadsheet as attached showing the features of the MVS as compared to the NNBSP. The differences between the two characters are highlighted in yellow. As the MVS appears to be doing pretty well in the areas where the NNBSP is deficient, I suggest that we study through the MVS features and use the MVS features to model the new NNBSP_replacement character. I do not understand all of the features attached to the MVS as listed. Do we have someone who could analyze the differences and start a features list for the new NNBSP_replacement character?



Thanks,
Greg





-----Original Message-----
From: Martin J. Dürst [mailto:duerst@it.aoyama.ac.jp]
Sent: Wednesday, July 15, 2015 7:15 PM
To: Greg Eck <greck@postone.net>; public-i18n-mongolian@w3.org
Subject: Re: NNBSP Impact



Hello Greg,



On 2015/07/15 11:08, Greg Eck wrote:
> Hi Martin,
>
> Thanks for the comment. No one has mentioned the ZWNJ yet. I have found that the ZWNJ is helpful in simulating context in Mongolian examples.



Yes, that's one of its two main usages. The other is for suffixes.





> But probably not what we need here in the case of glue-ing the suffixes together.



I suggest you look at Chapter 9 and Chapter 23.2 of the Unicode Standard.



In particular, I found the following text on page 800 of

http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf:




>>>>

Zero-Width Spaces and Joiner Characters. The zero-width spaces are not to be confused with the zero-width joiner characters. U+200C zero width non-joiner and U+200D zero width joiner have no effect on word or line break boundaries, and zero width nobreak space and zero width space have no effect on joining or linking behavior. The zero-width joiner characters should be ignored when determining word or line break boundaries. See “Cursive Connection” later in this section.

>>>>



The "ignore word break" is exactly what you are looking for, as far as I understand. As for line breaks, I have no idea how they work in Mongolian, but if there is something like intra-word linebreaks (with hyphenation or similar or without), then that will be handled by the language-dependent line breaking logic even if the zero-width non-joiner doesn't by default provide a line-break opportunity.
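
As a rough illustration of the word-boundary difference, the following Python sketch compares NNBSP and ZWNJ using the standard `unicodedata` module. The whitespace-based `naive_word_count` is an assumption standing in for the simple tools under discussion (word counters, spell-checkers); it is not the full Unicode word segmentation algorithm, and the stem and suffix are placeholder Mongolian letters rather than a real word.

```python
import unicodedata

NNBSP = "\u202f"
ZWNJ = "\u200c"


def describe(ch):
    """Return (name, general category, is-whitespace) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()


def naive_word_count(text):
    """Count words the way a simple whitespace-splitting tool would."""
    return len(text.split())


# Placeholder letters only: NA + A as the "stem", I as the "suffix".
stem, suffix = "\u1828\u1820", "\u1822"

for label, sep in (("NNBSP", NNBSP), ("ZWNJ", ZWNJ)):
    print(label, describe(sep), naive_word_count(stem + sep + suffix))
```

NNBSP is a space separator (general category Zs), so whitespace-based tools split the stem from its suffix, whereas ZWNJ is a format character (Cf) and leaves the sequence as a single word for such tools.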



I'm not at all an expert for Mongolian, and so I may be missing something. But I think there is a high chance that you will be asked similar questions if you send a formal proposal to the UTC, and so it may be worth a more careful check.



One thing I was concerned about in my previous mail is that a "zero width" non-breaking space would not be wide enough (because at least the name suggests that it's smaller than a "narrow" space). However, looking at the examples at the SampleOfDagDeg.pdf document, the 'spaces' between the stem and the suffix seem to be about the same as the 'spaces' where the letters cannot be connected, and would be a font matter anyway, so there shouldn't be any serious problems there.



Regards,   Martin.



> Greg
>
> -----Original Message-----
> From: Martin J. Dürst [mailto:duerst@it.aoyama.ac.jp]
> Sent: Wednesday, July 15, 2015 9:38 AM
> To: Greg Eck; public-i18n-mongolian@w3.org
> Subject: Re: NNBSP Impact
>
> Hello Greg, others,
>
> To me it looks like the situation for Mongolian suffixes is vaguely similar to the situation with Persian suffixes that are written with a slight separation. What is used in Persian is the ZERO WIDTH NON-JOINER (ZWNJ). Although its name includes "zero width", in all the examples I have seen there is actually some white space between the characters, i.e. they are not glued together.
>
> I'm sorry if this has already been considered.
>
> Regards,   Martin.
>
> On 2015/07/15 10:15, Greg Eck wrote:
>> I am calling for a new control character to replace the NNBSP (U+202F) for usage specifically in the Mongolian block (U+1800-18AF).
>> Given our discussion over the past few weeks, it appears that the NNBSP is too generic to handle the specific needs of the Mongolian script in at least the following areas:
>>
>> -          NNBSP (“Narrow Non-Breaking SPace”) actually is a space
>> -          The control character needed in the Mongolian Script needs to be a non-space
>> -          Word-count utility breaks as a result of the NNBSP presence
>> -          Spell-checkers have difficulty parsing as the word breaks upon encountering the NNBSP
>> -          Sort routines have the same difficulty
>> -          Word-jumping (as with MS Word CTL-RIGHT/LEFT) breaks due to the space feature inherent to the NNBSP
>> -          Cannot redefine the NNBSP as it is used as a bona fide space in other languages
>> -          Future utilities as yet undefined
>> -          Others?
>>
>> Means of implementation would be specific to the individual font developers.
>> The features of the new character would be very similar to the MVS (U+180E).
>> Suggested code-point: U+180F
>> Suggested name: Mongolian Suffix Separator (to match the similar name Mongolian Vowel Separator)
>> Can I call for individuals to speak up on backing the notion and also for individuals who might not agree with the notion?
>> There is a UTC meeting the end of July – if there is consensus, maybe we could get it on the docket?
>> Greg
>>
Received on Wednesday, 15 July 2015 23:12:23 UTC