Re: XML Blueberry (non-ASCII name characters in Japan) (fwd)

----- Forwarded message from Murata Makoto -----

Date: Sun, 08 Jul 2001 01:40:19 +0900
From: Murata Makoto <mura034@attglobal.net>
Subject: Re: XML Blueberry (non-ASCII name characters in Japan)
In-reply-to: <0GFB00C03YHR9D@eListX.com>
To: xml-dev@lists.xml.org
Message-id: <200107071640.AA05439@makoto.attglobal.net>
X-Mailer: AL-Mail32 Version 1.10
References: <0GFB00C03YHR9D@eListX.com>

> > So I think it would be appropriate, in this discussion,
> > to have some people in the mainframe trenches give us
> > a briefing on the scale and the difficulty of the problems
> > they face, and for some of our i18n gurus to highlight
> > the problems faced by an XML language designer who wants
> > to use one of the newly-added languages.
> 
> I second this.

Summary: Japanese characters have been heavily used for tag names 
and they have been very useful.  Addition of more characters 
(CJK ideographs introduced in Unicode 3.1, etc.) is highly 
desirable.

1. Current Status

XML 1.0 provides name characters for the Japanese language.  Since the 
inception of XML 1.0, people have used Japanese name characters 
for XML.  I believe that such use is very common.

Some people use Japanese name characters wherever possible.  Reasons: (1) 
the Japanese language is natural for Japanese people, (2) translation to 
English is sometimes impossible because of cultural differences, and (3) some 
topics (e.g., Buddhism research) are specific to Japan or Asia.

For example, an XML-based language for medical information uses 
Japanese name characters.  This language has been designed by doctors 
who read and write English well.  Nevertheless, they have chosen Japanese 
names because some terms simply cannot be translated to English.

Buddhism researchers have created a few DTDs which heavily use non-ASCII 
name characters.  Such names are very difficult to translate to English.  
Even when such translation is possible, these researchers strongly prefer 
non-ASCII names.

One of my DTDs is used for data interchange between two companies.  This 
application is not experimental but already plays a very important role in 
their main business.  All tag names in this DTD use Japanese characters.  
As far as I know, they have not caused any problems.  On the contrary, 
they are helpful in debugging, etc.

Others discourage use of Japanese name characters.  The reason is that 
some XML tools (e.g., the CSS support in Microsoft IE 5.5) fail to support 
non-ASCII markup characters.  I think that such XML tools are broken and we 
should try to change this situation.

2. Useful Additions

To my regret, KATAKANA MIDDLE DOT (U+30FB, which is used to connect two 
names) is missing from the list of name characters of Unicode 2.0 and 
thus it is also missing from XML 1.0.  Quite a few Japanese users have 
complained about this omission.  Adding this character will make a lot 
of Japanese users happier.  To me, this is already a good enough reason 
to create XML 1.1.
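
To make the gap concrete, here is a minimal Python sketch.  The range 
table is only the kana and kanji subset of the XML 1.0 (Second Edition) 
name-character productions, copied by hand, and the function name is 
made up for this illustration; it is not a full NameChar test.

```python
import unicodedata

# Hand-copied subset of the XML 1.0 (Second Edition) productions that
# cover kana and kanji. The real productions list many more ranges.
XML10_JAPANESE_NAME_RANGES = [
    (0x3041, 0x3094),  # BaseChar: hiragana
    (0x30A1, 0x30FA),  # BaseChar: katakana
    (0x4E00, 0x9FA5),  # Ideographic
    (0x3007, 0x3007),  # Ideographic
    (0x3021, 0x3029),  # Ideographic
    (0x3005, 0x3005),  # Extender
    (0x3031, 0x3035),  # Extender
    (0x309D, 0x309E),  # Extender
    (0x30FC, 0x30FE),  # Extender
]


def is_xml10_japanese_name_char(ch):
    """Hypothetical helper: True if ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_JAPANESE_NAME_RANGES)


for ch in ("\u30fa", "\u30fb", "\u30fc"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          is_xml10_japanese_name_char(ch))
```

U+30FB sits exactly between the katakana BaseChar range, which ends at 
U+30FA, and the Extender range, which starts at U+30FC, so it is the 
only one of the three neighbours that is rejected.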

Unicode 3.1 adds a great many CJK ideographs.  Quite a few people expect 
that these characters will also be allowed as name characters.
Unlike Rick Jelliffe, I don't agree that the newly introduced CJK ideographs 
are archaic.  First, national standards (e.g., JIS and CNS) have revisited 
unification: what was unified as a single character has occasionally become 
two characters.  One of the two characters has become a non-BMP 
character.  Second, quite a few Chu Nom characters are non-BMP characters.  
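
As a rough illustration of what "non-BMP" means in practice, the Python 
sketch below compares a BMP ideograph with U+20000, the first code point 
of the CJK block added in Unicode 3.1.  The helper names are invented for 
this example, and the Ideographic ranges are copied by hand from XML 1.0 
(Second Edition).

```python
def utf16_code_units(ch):
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


def in_xml10_ideographic(ch):
    """Hypothetical helper: XML 1.0 Ideographic production, hand-copied."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FA5) or cp == 0x3007 or (0x3021 <= cp <= 0x3029)


bmp_char = "\u6f22"          # an ideograph inside the BMP
non_bmp_char = "\U00020000"  # first ideograph beyond the BMP

for ch in (bmp_char, non_bmp_char):
    print(f"U+{ord(ch):04X}",
          "BMP" if ord(ch) <= 0xFFFF else "non-BMP",
          "UTF-16 units:", utf16_code_units(ch),
          "XML 1.0 Ideographic:", in_xml10_ideographic(ch))
```

The non-BMP character needs a surrogate pair in UTF-16 and falls outside 
every range of the XML 1.0 Ideographic production, so it cannot appear 
in an XML 1.0 name.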

Some of the compatibility ideographs, namely U+FA0E, U+FA0F, U+FA11, 
U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, and U+FA29, 
have become normal ideographs AFTER XML 1.0 was created.  Addition of 
these characters is very useful.
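
One way to see the distinction, assuming Python's unicodedata module 
reflects a reasonably recent Unicode version, is to check for a canonical 
decomposition: ordinary compatibility ideographs decompose to a unified 
ideograph, while the twelve code points above do not.  The helper name is 
made up for this sketch, and the fifth entry is read as U+FA14.

```python
import unicodedata

# The twelve code points from the list above (fifth entry read as U+FA14).
TWELVE = [0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
          0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29]


def has_decomposition(cp):
    """Hypothetical helper: True if the code point has a decomposition."""
    return unicodedata.decomposition(chr(cp)) != ""


for cp in TWELVE:
    print(f"U+{cp:04X}", "decomposes" if has_decomposition(cp) else "stands alone")

# A neighbouring compatibility ideograph, for contrast.
print("U+FA10", "decomposes" if has_decomposition(0xFA10) else "stands alone")
```

None of the twelve lies in the XML 1.0 Ideographic range, so they 
cannot be used in XML 1.0 names even though they behave like ordinary 
ideographs.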

MURATA Makoto


----- End of forwarded message from Murata Makoto -----

-- 
John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter

Received on Saturday, 7 July 2001 16:28:21 UTC