- From: Greg Eck <greck@postone.net>
- Date: Wed, 15 Jul 2015 16:55:01 +0000
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
- Message-ID: <BN3PR10MB0321C0B7B114992B9C2D87B2AF9A0@BN3PR10MB0321.namprd10.prod.outlook.com>
Hi Martin,

Thank you for your good comments. I have taken some time to review Chapter 23 of the Unicode Standard 7.0 as referenced below. I can see your point somewhat in the possibility of the ZWNJ taking the place of the NNBSP, even though it is a bit non-intuitive. I guess I am against the idea for two reasons.

The first is that, as the name implies, no space is supposed to be emitted by the rendering system: the character is designed to have zero width. However, the NNBSP_replacement needs to have space (while at the same time not being a space). I say this recognizing the statement that some fonts render the ZWNJ with space.

The second reason that I would not go for the idea is that time will probably tell us that we need a character specific to the Mongolian block that we can specifically tailor to the needs of this separation between a STEM+Suffix OR a Suffix+Suffix. If we go for another character that is multi-functional, as the ZWNJ is, and it fails to serve this new function as a replacement for the NNBSP, then we are in trouble again, as we are now.

I think we should still call for a completely new character that we can count on for time to come. The MVS was originally created for the sole purpose of separating the stem from the special final A/E. Let's create another sole-purpose character that will do the job specifically of separating the STEM/Suffix and the Suffix/Suffix.

Greg

I have created a spreadsheet, as attached, showing the features of the MVS as compared to the NNBSP. The differences between the two characters are highlighted in yellow. As the MVS appears to be doing pretty well in the areas where the NNBSP is deficient, I suggest that we study the MVS features and use them to model the new NNBSP_replacement character. I do not understand all of the features attached to the MVS as listed. Do we have someone who could analyze the differences and start a features list for the new NNBSP_replacement character?
Thanks,
Greg

-----Original Message-----
From: Martin J. Dürst [mailto:duerst@it.aoyama.ac.jp]
Sent: Wednesday, July 15, 2015 7:15 PM
To: Greg Eck <greck@postone.net>; public-i18n-mongolian@w3.org
Subject: Re: NNBSP Impact

Hello Greg,

On 2015/07/15 11:08, Greg Eck wrote:
> Hi Martin,
>
> Thanks for the comment. No one has mentioned the ZWNJ yet. I have found that the ZWNJ is helpful in simulating context in Mongolian examples.

Yes, that's one of its two main usages. The other is for suffixes.

> But probably not what we need here in the case of glue-ing the suffixes together.

I suggest you look at Chapter 9 and Chapter 23.2 of the Unicode Standard. In particular, I found the following text on page 800 of http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf:

>>>>
Zero-Width Spaces and Joiner Characters. The zero-width spaces are not to be confused with the zero-width joiner characters. U+200C zero width non-joiner and U+200D zero width joiner have no effect on word or line break boundaries, and zero width nobreak space and zero width space have no effect on joining or linking behavior. The zero-width joiner characters should be ignored when determining word or line break boundaries. See “Cursive Connection” later in this section.
>>>>

The "ignore word break" is exactly what you are looking for, as far as I understand. As for line breaks, I have no idea how they work in Mongolian, but if there is something like intra-word line breaks (with or without hyphenation or similar), then that will be handled by the language-dependent line breaking logic, even if the zero-width non-joiner doesn't by default provide a line-break opportunity.

I'm not at all an expert on Mongolian, and so I may be missing something. But I think there is a high chance that you will be asked similar questions if you send a formal proposal to the UTC, and so it may be worth a more careful check.
One thing I was concerned about in my previous mail is that a "zero width" non-breaking space would not be wide enough (because at least the name suggests that it's smaller than a "narrow" space). However, looking at the examples in the SampleOfDagDeg.pdf document, the 'spaces' between the stem and the suffix seem to be about the same as the 'spaces' where the letters cannot be connected, and would be a font matter anyway, so there shouldn't be any serious problems there.

Regards, Martin.

> Greg
>
>
> -----Original Message-----
> From: Martin J. Dürst [mailto:duerst@it.aoyama.ac.jp]
> Sent: Wednesday, July 15, 2015 9:38 AM
> To: Greg Eck; public-i18n-mongolian@w3.org
> Subject: Re: NNBSP Impact
>
> Hello Greg, others,
>
> To me it looks like the situation for Mongolian suffixes is vaguely similar to the situation with Persian suffixes that are written with a slight separation. What is used in Persian is the ZERO WIDTH NON-JOINER (ZWNJ). Although its name includes "zero width", in all the examples I have seen there is actually some white space between the characters, i.e. they are not glued together.
>
> I'm sorry if this has already been considered.
>
> Regards, Martin.
>
> On 2015/07/15 10:15, Greg Eck wrote:
>> I am calling for a new control character to replace the NNBSP (U+202F) for usage specifically in the Mongolian block (U+1800-18AF).
>> Given our discussion over the past few weeks, it appears that the NNBSP is too generic to handle the specific needs of the Mongolian script in at least the following areas:
>>
>> - NNBSP (“Narrow Non-Breaking SPace”) actually is a space
>> - The control character needed in the Mongolian Script needs to be a non-space
>> - Word-count utility breaks as a result of the NNBSP presence
>> - Spell-checkers have difficulty parsing as the word breaks upon encountering the NNBSP
>> - Sort routines have the same difficulty
>> - Word-jumping (as with MS Word CTL-RIGHT/LEFT) breaks due to the space feature inherent to the NNBSP
>> - Cannot redefine the NNBSP as it is used as a bona fide space in other languages
>> - Future utilities as yet undefined
>> - Others?
>>
>> Means of implementation would be specific to the individual font developers.
>> The features of the new character would be very similar to the MVS (U+180E).
>> Suggested code-point: U+180F
>> Suggested name: Mongolian Suffix Separator (to match the similar name Mongolian Vowel Separator)
>>
>> Can I call for individuals to speak up on backing the notion and also for individuals who might not agree with the notion?
>> There is a UTC meeting the end of July – if there is consensus, maybe we could get it on the docket?
>> Greg
>>
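
The word-count problem listed in the original proposal, and the contrast with the ZWNJ discussed in the replies, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `stem` and `suffix` strings are Latin placeholders rather than real Mongolian text, `naive_word_count` is a hypothetical stand-in for a whitespace-based word counter, and real word processors may segment text differently.

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, the character currently used before suffixes
ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER, the alternative raised in the thread


def naive_word_count(text):
    """Hypothetical word counter: splits on anything Python treats as whitespace."""
    return len(text.split())


def describe(ch):
    """Return the Unicode name, general category, and whitespace status of one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()


# Placeholder strings (assumption): stand-ins for a Mongolian stem and suffix.
stem, suffix = "stem", "suffix"
with_nnbsp = stem + NNBSP + suffix
with_zwnj = stem + ZWNJ + suffix

if __name__ == "__main__":
    for label, ch in (("NNBSP", NNBSP), ("ZWNJ", ZWNJ)):
        print(label, describe(ch))
    print("words with NNBSP:", naive_word_count(with_nnbsp))
    print("words with ZWNJ: ", naive_word_count(with_zwnj))
```

Because the NNBSP has general category Zs (a space separator), a whitespace-based counter sees two words, whereas the ZWNJ is a format character (Cf) and leaves the stem and suffix counted as one word. This says nothing about rendering width, which is the separate concern raised in the reply above.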
Attachments
- application/oleobject attachment: Comparision of Ctl Characters4.ods
Received on Wednesday, 15 July 2015 16:55:33 UTC