Re: [UAX29] i18n comment 18: Scripts without spaces

From: Addison Phillips <addison@yahoo-inc.com>
Date: Fri, 07 Mar 2008 14:04:08 -0800
Message-ID: <47D1BBD8.3050500@yahoo-inc.com>
To: ishida@w3.org
CC: public-i18n-core@w3.org

ishida@w3.org wrote:
> Comment from the i18n review of:
> http://www.unicode.org/reports/tr29/tr29-12.html
> 
> Comment 18
> At http://www.w3.org/International/reviews/0801-uax29/
> Editorial/substantive: E
> Tracked by: AP
> 
> Location in reviewed document:
> 4 [http://www.unicode.org/reports/tr29/tr29-12.html#Word_Boundaries]
> 
> Comment: All of the examples include space-separated languages. No mention is made of the fact that some languages don't use spaces between words, which we think is an extremely important point to make. It should be explicitly mentioned here and possibly an example given.
> 
> 

In reviewing the text, this note seems to address this comment:

--
For Thai, Lao, Khmer, Myanmar, and other scripts that do not 
typically use spaces between words, a good implementation should not 
just depend on the default word boundary specification, but should use a 
more sophisticated mechanism, as is also required for line breaking. 
Ideographic scripts such as Japanese and Chinese are even more complex. 
Where Hangul text is written without spaces, the same applies. However, 
in the absence of such a more sophisticated mechanism, the rules 
specified in this annex at least supply a well-defined default.
--
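
To illustrate why this matters, here is a minimal Python sketch. It 
assumes a hypothetical helper, naive_words, that splits on whitespace 
only. That is the kind of shortcut a programmer might take, and it is 
not the UAX #29 default algorithm. The Thai sample sentence is my own 
illustration, not taken from the annex.

```python
def naive_words(text: str) -> list[str]:
    """Split on whitespace only. This is NOT the UAX #29 algorithm."""
    return text.split()


english = "The quick fox"
thai = "ฉันรักภาษาไทย"  # illustrative Thai sentence, written without spaces

# English text falls apart into three tokens at the spaces.
print(naive_words(english))

# The Thai sentence contains no whitespace, so it comes back as one token.
print(len(naive_words(thai)))
```

Finding the individual words in the Thai sentence requires the more 
sophisticated mechanism the note refers to, such as dictionary-based 
segmentation.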

On the other hand, I think it would be useful to make this point 
somewhere in the introductory area: too many programmers make assumptions about 
word-breaking behavior. So I would suggest adding something like the 
sentence marked ** to the second paragraph in Section 4 so that it reads 
like:

--
Word boundaries can also be used in intelligent cut and paste. With this 
feature, if the user cuts a selection of text on word boundaries, 
adjacent spaces are collapsed to a single space. For example, cutting 
“quick” from “The_quick_fox” would leave “The_ _fox”. Intelligent cut 
and paste collapses this text to “The_fox”. **Note that word break 
boundaries are not restricted to whitespace and punctuation. Indeed, 
some languages do not use spaces at all.** Figure 1 gives an example of 
word boundaries.
--
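
The cut-and-paste behavior described in that paragraph could be 
sketched roughly as follows. This is only an illustration under stated 
assumptions: smart_cut is a hypothetical name, the selection is given as 
plain string indices rather than found by a word-boundary algorithm, and 
only the ASCII space is treated as collapsible.

```python
def smart_cut(text: str, start: int, end: int) -> str:
    """Remove text[start:end]; if a space is left on each side, keep one."""
    before, after = text[:start], text[end:]
    # Cutting a word out of the middle leaves two adjacent spaces.
    # Drop one of them so that a single space remains.
    if before.endswith(" ") and after.startswith(" "):
        after = after[1:]
    return before + after


sentence = "The quick fox"
start = sentence.index("quick")
result = smart_cut(sentence, start, start + len("quick"))
print(result)  # The fox
```

A real implementation would take the selection boundaries from a word 
boundary detector, which is where the point about scripts without spaces 
becomes relevant.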


Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.
Received on Friday, 7 March 2008 22:04:30 GMT