- From: Richard Wordingham <richard.wordingham@ntlworld.com>
- Date: Wed, 29 Jul 2015 22:32:10 +0100
- To: public-i18n-mongolian@w3.org, unicore@unicode.org
(I've copied this to the UniCore list in case the discussion moves from there to the general Unicode list rather than to the public-i18n-mongolian list.)

Badral S. wrote on Wed, 29 Jul 2015 at 20:50:10 +0900

> I do not know france.
> When france word and mongolian word connected with NNBSP, the NNBSP
> belong to which one ? This case exists in Mongolian document like
> mongolian people studing france language. (asume the france languauge
> need NNBSP)

The present word-break property value of NNBSP is "Other". With this value, there is a word break on either side of it, so there would be three items:

1) The French word.
2) The NNBSP - not a word.
3) The Mongolian word.

French can use NNBSP to provide extra spacing between a word and a following punctuation mark, such as a full stop (.) or a comma (,). I do not believe it uses NNBSP to separate words. If French used U+2009 THIN SPACE instead, a line break could occur before the punctuation, which would be wrong for French. Therefore the French, or rather those of them who care about such small details, have apparently been using NNBSP.

Now, if the word-break property of NNBSP were given the value it should have been given in the first place, "MidLetter", we would see the following word-breaking patterns:

For Mongolian word, NNBSP, Mongolian suffix: one word, consisting of word + NNBSP + suffix.
For French word, NNBSP, Mongolian suffix: one word, consisting of French word + NNBSP + Mongolian suffix.
For French word, NNBSP, comma: three items:
1) The French word.
2) The NNBSP - not a word.
3) The comma - not a word.

I believe these are the desired outcomes. None of the Unicode *rules* have to change; all that has to change is one of the data files. (A list in UAX#29 would also need to change for consistency.) For programs that use ICU for word breaking, the change would take effect when they update to the first version of ICU released after the change to the Unicode Character Database (UCD).
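To see why only the property value matters, here is a minimal sketch of the relevant word-boundary rules from UAX #29 (WB5, WB6, WB7, plus the fallback WB999). It is an illustration under loudly stated assumptions, not ICU's implementation: the names `word_break_class` and `segment` are hypothetical, the character table is a toy (every alphabetic character, Latin or Mongolian, is treated as ALetter and the comma as MidNum), and all other rules (WB3, WB4, MidNumLet, Single_Quote, numerics, and so on) are omitted. Switching NNBSP from "Other" to "MidLetter" changes the segmentation without touching any rule.

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def word_break_class(ch: str, nnbsp_class: str) -> str:
    """Return a simplified Word_Break value for ch.

    Hypothetical toy table, not the UCD: every alphabetic character (Latin,
    Mongolian, ...) counts as ALetter, the comma as MidNum, NNBSP as whatever
    nnbsp_class says, and everything else as Other.
    """
    if ch == NNBSP:
        return nnbsp_class
    if ch == ",":
        return "MidNum"
    if ch.isalpha():
        return "ALetter"
    return "Other"


def segment(text: str, nnbsp_class: str) -> list[str]:
    """Split text at word boundaries using only rules WB5, WB6, WB7 and WB999."""
    classes = [word_break_class(c, nnbsp_class) for c in text]
    segments: list[str] = []
    start = 0
    for i in range(1, len(text)):
        before2 = classes[i - 2] if i >= 2 else None
        before = classes[i - 1]
        after = classes[i]
        after2 = classes[i + 1] if i + 1 < len(text) else None
        # WB5: ALetter x ALetter
        if before == "ALetter" and after == "ALetter":
            continue
        # WB6: ALetter x MidLetter ALetter (looks one character ahead)
        if before == "ALetter" and after == "MidLetter" and after2 == "ALetter":
            continue
        # WB7: ALetter MidLetter x ALetter (looks one character behind)
        if before2 == "ALetter" and before == "MidLetter" and after == "ALetter":
            continue
        # WB999: break everywhere else
        segments.append(text[start:i])
        start = i
    if text:
        segments.append(text[start:])
    return segments


# A French word joined by NNBSP to a sample Mongolian suffix (YA I NA).
french_plus_suffix = "français" + NNBSP + "\u1836\u1822\u1828"
print(len(segment(french_plus_suffix, "Other")))       # 3: French word, NNBSP, suffix
print(len(segment(french_plus_suffix, "MidLetter")))   # 1: a single word
print(len(segment("mot" + NNBSP + ",", "MidLetter")))  # 3: word, NNBSP, comma
```

The comma case still breaks under "MidLetter" because WB6 requires a letter after the NNBSP, and the comma is MidNum, not ALetter.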
As to what happens for other programs, that is unpredictable. They would change after the UCD changes, but there can be a long delay. At least Windows 10 users will not have to upgrade to another version of Windows.

Richard.
Received on Wednesday, 29 July 2015 22:31:03 UTC