Re: Request for clarification on Canonical XML from Markus Scherer on 2003-07-29 (w3c-ietf-xmldsig@w3.org from July to September 2003)

From: Markus Scherer <markus.scherer@us.ibm.com>
Date: Tue, 29 Jul 2003 09:44:59 -0700
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: "Martin Duerst" <duerst@w3.org>, "Jungshik Shin" <jshin@i18nl10n.com>, "Joseph Reagle" <reagle@w3.org>, w3c-i18n-ig@w3.org, w3c-ietf-xmldsig@w3.org
Message-ID: <OFC97845BE.3FA9C5A6-ON88256D72.005AF2B3-88256D72.005C00AA@us.ibm.com>

Mark is correct that the ECMAScript Unicode improvements are on hold.
ECMAScript edition 3 is, and will be for some time, the current standard.
There will be an add-on standard, ECMAScript for XML, that will add some
language upgrades outside of its immediate scope, but no Unicode/I18N
improvements.

However, the committee has considered the issue and decided, like Java/ICU
and certainly many others, that it would be too disruptive to support
supplementary code points with anything other than UTF-16 (up from
effectively UCS-2). Most of the necessary work on this path was done, but
it has now been suspended.
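
To illustrate the step from UCS-2 to UTF-16: a minimal sketch of how a
supplementary code point becomes a surrogate pair of two 16-bit code units.
The function name to_utf16_units is hypothetical and chosen only for this
example; it does not come from ECMAScript, Java, or ICU.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # BMP code point: one 16-bit unit, identical in UCS-2 and UTF-16.
        return [code_point]
    # Supplementary code point: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # lead (high) surrogate
    low = 0xDC00 | (offset & 0x3FF)   # trail (low) surrogate
    return [high, low]


units = to_utf16_units(0x10400)  # U+10400 DESERET CAPITAL LETTER LONG I
print([hex(u) for u in units])   # ['0xd801', '0xdc00']
```

A UCS-2-only implementation treats these two units as two separate
characters, which is why string length and indexing differ between the two
models.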

In addition to Java/ICU/ECMAScript, there are many other examples of
16-bit Unicode, even on Linux. OpenOffice, the whole KDE/Qt stack, Opera,
Mozilla, SAP, MacOS X, and the Xerces XML parser suite implementing the
standard XML DOM API all use 16-bit Unicode, and many if not all support
supplementary code points now. Until now I was counting Python among them,
too; I need to follow the URL below.

markus

Markus Scherer  マルクス  IBM GCoC-Unicode/ICU  San José, CA 
markus.scherer@us.ibm.com

"Mark Davis" <mark.davis@jtcsv.com>
2003-07-29 08:03

        To:     "Jungshik Shin" <jshin@i18nl10n.com>, "Joseph Reagle" <reagle@w3.org>
        cc:     "Martin Duerst" <duerst@w3.org>, <w3c-ietf-xmldsig@w3.org>, <w3c-i18n-ig@w3.org>, Markus Scherer/Cupertino/IBM@IBMUS
        Subject:        Re: Request for clarification on Canonical XML

My understanding is that the ECMAScript Unicode improvements have been
effectively put on hold, because one of the leading contributors was
lost in the Netscape layoffs. I'm cc'ing Markus Scherer in case he can
add anything on that topic.

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message ----- 
From: "Jungshik Shin" <jshin@i18nl10n.com>
To: "Joseph Reagle" <reagle@w3.org>
Cc: "Martin Duerst" <duerst@w3.org>; <w3c-ietf-xmldsig@w3.org>; <w3c-i18n-ig@w3.org>
Sent: Tuesday, July 29, 2003 06:19
Subject: Re: Request for clarification on Canonical XML

>
> On Mon, 28 Jul 2003, Joseph Reagle wrote:
>
> > [[[
> > Note: Canonical XML is an octet sequence resulting from characters, from the
> > UCS character domain, encoded in UTF-8. Creating a deterministic octet
>
>   Two successive 'from ...'s just linked by a comma are a bit confusing,
> aren't they?
>
> ...
> > some applications might want a canonical form of XML in a different
> > encoding, or one that is simply a sequence of characters, without concern
> > for its encoding. For example, it may be appropriate to choose UTF-16
> > rather than UTF-8 as the encoding of an API in a programming language using
> > UTF-16 to represent Unicode strings, such as Java or Python. Or, one might
> ....
>
>   Python's use of UTF-16 (actually UCS-2) for the internal string
> representation appears to be going away.  See
> http://mail.nl.linux.org/linux-utf8/2003-07/msg00113.html
> Even if it's not going away, Python doesn't seem to be
> a typical case to use as an example.
> See also http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf
> (page 15 and page 27). When ECMAScript is updated to deal with UTF-16
> (as it is implemented and specified, its support of UTF-16 as opposed to
> UCS-2 is at most patchy) as planned, it might be a good example.
> On the other hand, as is well known, there are widely used/popular APIs
> that (exclusively) use UTF-16 (Win32 W APIs and ICU) that may be cited
> if Java alone is considered too 'lonely' :-)
>
>   Jungshik
>
> P.S. I'm humbled and grateful that Martin and Tex welcomed me to the list.
>
>

Received on Tuesday, 29 July 2003 12:49:17 UTC