Re: i18n-ISSUE-411: Definition of whitespace should come from Unicode from Martin J. Dürst on 2015-03-09 (www-international@w3.org from January to March 2015)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 09 Mar 2015 10:59:56 +0900
To: John Cowan <cowan@mercury.ccil.org>, Eric Prud'hommeaux <eric@w3.org>
CC: Andrew Sullivan <ajs@anvilwalrusden.com>, public-ldp-comments@w3.org, cowan@ccil.org, Steven Atkin <atkin@us.ibm.com>, www-international@w3.org
Message-ID: <54FCFE9C.5060902@it.aoyama.ac.jp>

On 2015/03/09 02:49, John Cowan wrote:
> Eric Prud'hommeaux scripsit:

> If we look in detail at the 25 characters that Unicode says are of type
> WS, here's what we find:

>> The downside is that someone typing in some script with its own
>> whitespace (does that exist?) must use ASCII space, but they have to
>> anyways because all of the language keywords are in ASCII.
>
> Of the three remaining space characters, two are like that:
>
> Ideographic space (U+3000):  yet another fixed-width space, but actually
> heavily used in Japanese text.

"heavily used in Japanese text" is relative. You won't find lots of 
ideographic spaces in Japanese because Japanese as such is written 
without spaces. But on the other hand, the chances are high that if 
there's a space somewhere in Japanese text (mostly for formatting 
purposes) it's an ideographic space.

Two anecdotes from personal experience:
1) When I get a new PC, one thing I do is adjust the settings of the 
Japanese input method. I make sure that it converts space input to the 
ASCII space rather than the ideographic space.
2) When teaching programming, one nasty error students run into is an 
ideographic space typed into source code where an ASCII space belongs. 
The two look alike on screen, so I have to teach them how to 
detect/avoid this error.
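
A minimal sketch of how such a check might look, in Python. The 
function name find_ideographic_spaces and the sample string are 
illustrative assumptions, not something taken from this thread; the 
sketch only reports where U+3000 occurs and does not try to fix it.

```python
IDEOGRAPHIC_SPACE = "\u3000"  # U+3000 IDEOGRAPHIC SPACE


def find_ideographic_spaces(source):
    """Return 1-based (line, column) pairs for every U+3000 in source."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == IDEOGRAPHIC_SPACE:
                hits.append((line_no, col_no))
    return hits


# Hypothetical sample: the first line hides a U+3000 after "x".
sample = "int x\u3000= 1;\nint y = 2;\n"
for line_no, col_no in find_ideographic_spaces(sample):
    print(f"line {line_no}, column {col_no}: U+3000 IDEOGRAPHIC SPACE")
```

Columns here count code points, not display cells, so an editor that 
shows full-width characters as two cells may report a different column.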

Regards,   Martin.

Received on Monday, 9 March 2015 02:00:24 UTC