Re: NNBSP Impact

Hello Greg,

On 2015/07/15 11:08, Greg Eck wrote:
> Hi Martin,
>
> Thanks for the comment. No one has mentioned the ZWNJ yet. I have found that the ZWNJ is helpful in simulating context in Mongolian examples.

Yes, that's one of its two main usages. The other is for suffixes.


> But probably not what we need here in the case of gluing the suffixes together.

I suggest you look at Chapter 9 and Section 23.2 of the Unicode Standard.

In particular, I found the following text on page 800 of
http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf:

 >>>>
Zero-Width Spaces and Joiner Characters. The zero-width spaces are not 
to be confused with the zero-width joiner characters. U+200C zero width 
non-joiner and U+200D zero width joiner have no effect on word or line 
break boundaries, and zero width no-break space and zero width space have 
no effect on joining or linking behavior. The zero-width joiner 
characters should be ignored when determining word or line break
boundaries. See “Cursive Connection” later in this section.
 >>>>

The "ignore word break" part is exactly what you are looking for, as far as I 
understand. As for line breaks, I have no idea how they work in 
Mongolian, but if there is something like intra-word line breaking (with 
hyphenation or something similar, or without), then that will be handled by the 
language-dependent line breaking logic even if the zero-width non-joiner 
doesn't provide a line-break opportunity by default.
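To make the word-count concern concrete, here is a minimal sketch in Python of a naive whitespace-based word counter. It assumes that simple tools count words by splitting on whitespace; real word processors, and the UAX #29 word-boundary algorithm, have their own rules, so this only illustrates the naive case. Python's str.split() with no arguments splits on any character for which str.isspace() is true, which includes U+202F (general category Zs) but not U+200C (a format character). The stem and suffix below are illustrative only.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER

# Illustrative stem and suffix: GA E RA ("ger") and UE NA ("un").
STEM = "\u182d\u1821\u1837"
SUFFIX = "\u1826\u1828"


def naive_word_count(text: str) -> int:
    """Count words by splitting on whitespace, as a simple tool might.

    str.split() with no arguments treats every character for which
    str.isspace() is True as a separator; U+202F is one of them.
    """
    return len(text.split())


if __name__ == "__main__":
    for label, sep in (("NNBSP", NNBSP), ("ZWNJ", ZWNJ)):
        word = STEM + sep + SUFFIX
        print(f"{label}: {naive_word_count(word)} word(s)")
```

With NNBSP the stem and suffix are counted as two words; with ZWNJ they stay one word.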

I'm not at all an expert on Mongolian, and so I may be missing 
something. But I think there is a high chance that you will be asked 
similar questions if you send a formal proposal to the UTC, and so it 
may be worth a more careful check.
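One possible starting point for such a check is the character properties themselves. The following is a minimal sketch using Python's standard unicodedata module. The list of candidate characters is an assumption drawn from this thread. It prints each character's name and general category, and whether Python treats it as whitespace. Results depend on the Unicode version bundled with the Python build (unicodedata.unidata_version); for example, U+180E MONGOLIAN VOWEL SEPARATOR was reclassified from Zs to Cf in Unicode 6.3.

```python
import unicodedata
from typing import Tuple

# Characters mentioned in this thread (selection is illustrative).
CANDIDATES = ["\u202f", "\u200c", "\u200d", "\u180e"]


def describe(ch: str) -> Tuple[str, str, str, bool]:
    """Return (code point, name, general category, isspace) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
    )


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for ch in CANDIDATES:
        cp, name, cat, space = describe(ch)
        print(f"{cp}  {name:<28} {cat}  isspace={space}")
```

In current builds, only U+202F comes out as a space (Zs); the joiners and the vowel separator are format characters (Cf).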

One thing I was concerned about in my previous mail is that a "zero 
width" non-breaking space would not be wide enough (because at least the 
name suggests that it's smaller than a "narrow" space). However, looking 
at the examples in the SampleOfDagDeg.pdf document, the 'spaces' between 
the stem and the suffix seem to be about the same width as the 'spaces' where 
the letters cannot be connected, and that would be a font matter anyway, so 
there shouldn't be any serious problems there.

Regards,   Martin.

> Greg
>
>
> -----Original Message-----
> From: Martin J. Dürst [mailto:duerst@it.aoyama.ac.jp]
> Sent: Wednesday, July 15, 2015 9:38 AM
> To: Greg Eck; public-i18n-mongolian@w3.org
> Subject: Re: NNBSP Impact
>
> Hello Greg, others,
>
> To me it looks like the situation for Mongolian suffixes is vaguely similar to the situation with Persian suffixes that are written with a slight separation. What is used in Persian is the ZERO WIDTH NON-JOINER (ZWNJ). Although its name includes "zero width", in all the examples I have seen there is actually some white space between the characters, i.e. they are not glued together.
>
> I'm sorry if this has already been considered.
>
> Regards,   Martin.
>
> On 2015/07/15 10:15, Greg Eck wrote:
>> I am calling for a new control character to replace the NNBSP (U+202F) for usage specifically in the Mongolian block (U+1800-18AF).
>> Given our discussion over the past few weeks, it appears that the NNBSP is too generic to handle the specific needs of the Mongolian script in at least the following areas:
>>
>> - NNBSP (“Narrow No-Break SPace”) actually is a space
>> - The control character needed in the Mongolian script needs to be a non-space
>> - Word-count utilities break as a result of the NNBSP's presence
>> - Spell-checkers have difficulty parsing, as the word breaks upon encountering the NNBSP
>> - Sort routines have the same difficulty
>> - Word-jumping (as with MS Word CTRL+RIGHT/LEFT) breaks due to the space feature inherent to the NNBSP
>> - The NNBSP cannot be redefined, as it is used as a bona fide space in other languages
>> - Future utilities as yet undefined
>> - Others?
>> Means of implementation would be specific to the individual font developers.
>> The features of the new character would be very similar to the MVS (U+180E).
>> Suggested code-point: U+180F
>> Suggested name: Mongolian Suffix Separator (to match the similar name
>> Mongolian Vowel Separator)
>> Can I call for individuals to speak up in support of the notion, and also for individuals who might not agree with it?
>> There is a UTC meeting at the end of July – if there is consensus, maybe we could get it on the docket?
>> Greg
>>

Received on Wednesday, 15 July 2015 11:15:29 UTC