Re: XMLP SOAP1.2 Issue 270: Further clarification...

We are hours away from submitting a request to go to CR, and so we will
consider your readability edits during the CR period.


............................................
David C. Fallside, IBM
Ext Ph: 530.477.7169
Int  Ph: 544.9665
fallside@us.ibm.com



"Addison Phillips [wM]" <aphillips@webmethods.com>
Sent by:  xmlp-comments-request@w3.org
12/11/2002 04:08 PM

To:       <xmlp-comments@w3.org>
cc:       <public-i18n-ws@w3.org>, <w3c-i18n-ig@w3.org>
Subject:  XMLP SOAP1.2 Issue 270: Further clarification...




Dear XMLP Editors,

Some time ago I worked with webMethods' XMLP rep, Asir, to produce a
rewrite of SOAP 1.2 Part 2 Appendix B [1]. This is issue #270 [2] on your
issues list. More recently I became the chair of the I18N-WG's Web Services
task force, which has been delegated the review of Web Services
recommendations, etc., by the I18N-WG Core group. At our recent
face-to-face we reviewed the various i18n issues in SOAP 1.2 to ensure that
we understood the actions taken by XMLP/SOAP WG and that we had no further
comments.

One of the issues our task force reviewed was Appendix B. As I remarked in
our last exchange (see the bottom of this email for a refresher), the text
was pretty hard to read and very difficult to understand. The I18N-WG
resolved therefore to prepare an improved version, which is included in
this message between the ---- lines.

This version is functionally identical to the previous version. However, it
should be easier to understand and therefore to correctly implement. Would
you please incorporate this version into SOAP 1.2 Part 2 in place of the
current version?

Best Regards,

Addison (on behalf of W3C-I18N-WG)

Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.

+1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture.
It is not a feature.

Chair, W3C-I18N-WG Web Services Task Force
To participate see http://www.w3.org/International/ws

[1] http://www.w3.org/2000/xp/Group/2/06/LC/soap12-part2.html#namemap
[2] http://www.w3.org/2000/xp/Group/xmlp-lc-issues.html#x270

----------------

B. Mapping Application Defined Names to XML Names

This appendix details an algorithm for taking an application defined name,
such as the name of a variable or field in a programming language, and
mapping it to the Unicode characters that are legal in the names of XML
elements and attributes as defined in [Namespaces in XML].

Note: Application defined names are generally subject to the specific
restrictions of their underlying development environment. Ideally these
names should be restricted to the subset of Unicode characters in accordance
with the guidelines in Unicode Standard Annex #15, Annex 7 ("Programming
Language Identifiers")[1]. Names that follow these guidelines will
generally also follow the guidelines in [Namespaces in XML].

Hex Digits

[5]    hexDigit    ::=    [0-9A-F]
* Note, only uppercase letters A-F are defined here.

B.1 Rules for mapping application defined names to XML Names

1. An XML Name has two parts: Prefix and LocalPart. Let Prefix be
determined per the rules and constraints specified in Namespaces in XML
[3]. The LocalPart will be determined by transforming the application name
of the object as follows.

2. Let "T" be the application name. "T" must be represented by a sequence
of Unicode characters in Unicode Normalization Form C (NFC) [2]. Let "M" be
the output of this algorithm, also as a sequence of Unicode characters.

Note: If the name in the application is represented in some non-Unicode
character encoding, it must be converted to Unicode before starting the
name mapping process. Ideally any such conversion will use a reversible
conversion (so that the original byte sequence can be obtained from the
Unicode sequence), although this is not possible for some encodings.

Note: Characters in the application name's original encoding that do not
have a mapping to Unicode should be handled in some reasonable, application
defined manner.

Note: "A sequence of Unicode characters" means a sequence of code points
(sometimes called Unicode Scalar Values). This should not be taken to imply
that the characters are using any particular encoding of Unicode. The
conversion algorithm itself may use whatever Unicode encoding is most
convenient on that platform. However, it is important to note that
surrogate pair sequences (pairs of UTF-16 code units that represent
characters in Unicode above U+FFFF) must be handled as a single Unicode
code point (their Scalar Value, that is the specific supplemental character
they represent), rather than as the individual bytes or surrogate
characters in UTF-16. Unpaired surrogates are not permitted. For example:

The UTF-16 sequence 0xD800 0xDC00 represents the Unicode character U+10000.
When performing the following steps, the value U+10000 is considered to be
a single Unicode character in the sequence T.

3. Let "i" be an integer representing the current position in "T", T(i),
with a starting value of 1, such that T(1) is the first character in T,
T(2) is the second, and so forth. Let "n" represent the last position in T,
T(n).

4. Starting with 1, iterate across T by increasing i by 1 and perform the
following evaluation on each character.

   a. If T begins with the string "xml" (or any upper/lower case variation,
such as "xmL", "XML", or "xMl"), encode T(1) using rule 4.c.i (that is,
output either "_x0078_" for "x" or "_x0058_" for "X") and increase "i" to
4.

   Note: if the sequence starts with "xml"/"XML"/"xMl"/etc., and is
followed by a Unicode combining character, the combining character is not
considered part of the letter "l" for processing purposes. By contrast,
precomposed characters such as U+013B (Latin letter capital L with cedilla)
do not trigger this rule. That is, "xmĻ" is encoded as xmĻ, whereas
xml(U+0300) is encoded as _x0078_ml_x0300_. [Note that Normalization Form C
means that this special case will always produce the same resulting
sequence.]

   b. If T(i) falls in the range 0xD800 through 0xDFFF (that is, there is
an unpaired surrogate character in the sequence), stop with an error.

   c. Else if T(i) is not a valid XML NCName character (see [3]) or if i=1
and T(i) is not a valid first character of an XML NCName (see [3]) then:

      i. If T(i) < U+10000 and T(i) is not in the range 0xD800 through
0xDFFF, output to "M" the sequence "_x" followed by four hexDigits
representing the Unicode Scalar Value followed by an underscore ("_") (for
example, "x" (U+0078) would be encoded as "_x0078_").

      ii. Else if T(i) > U+FFFF, output to "M" the sequence "_x" followed
by eight hexDigits representing the Unicode Scalar Value followed by an
underscore ("_"). For example, the character 0x10FFFE would be encoded as
"_x0010FFFE_".

   d. Else if T(i) = "_" (lowline) and T(i+1) = "x" or "X", output
"_x005F_" to M.

   e. Else output T(i) to M.
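
Step 2 and its notes require that T be a sequence of Unicode code points in
NFC before any of the rules run. A minimal Python sketch of that preparation
follows. The function names to_code_points and prepare_name are hypothetical
and are not part of the proposed appendix text; the choice to raise an error
for undecodable bytes is only one possible "application defined manner".

```python
import unicodedata


def to_code_points(units):
    """Turn a list of UTF-16 code units into a list of code points.

    A valid surrogate pair becomes one supplementary code point;
    an unpaired surrogate is rejected, as step 2 requires.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError("unpaired surrogate at index %d" % i)
        else:
            out.append(u)
            i += 1
    return out


def prepare_name(raw, encoding):
    """Decode a name from a legacy encoding, then normalize it to NFC.

    bytes.decode() is strict by default, so bytes with no mapping
    to Unicode raise UnicodeDecodeError (an assumption of this sketch).
    """
    return unicodedata.normalize("NFC", raw.decode(encoding))


# The UTF-16 sequence 0xD800 0xDC00 is the single character U+10000.
print(hex(to_code_points([0xD800, 0xDC00])[0]))  # 0x10000
```
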

[1]
http://www.unicode.org/unicode/reports/tr15/#Programming_Language_Identifiers

[2] http://www.unicode.org/unicode/reports/tr15/
[3] http://www.w3.org/TR/REC-xml-names/
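
For implementers, rules 2 through 4 might be sketched in Python roughly as
follows. This is an illustration under stated assumptions, not normative
text. The names map_to_local_part, escape, is_ascii_name_start and
is_ascii_name_char are hypothetical. The two default predicates accept
ASCII characters only, which is stricter than the NCName productions in [3]
(they would escape "Æ", for instance), so a real implementation should pass
predicates built from the XML 1.0 character classes. Copying "m" and "l"
through unchanged after an "xml" prefix is inferred from the example output
"_x0078_ml".

```python
import unicodedata


def is_ascii_name_start(ch):
    # Assumption: ASCII-only stand-in for "valid first character of an NCName".
    return ch == "_" or (ch.isascii() and ch.isalpha())


def is_ascii_name_char(ch):
    # Assumption: ASCII-only stand-in for "valid NCName character".
    return ch in "._-" or (ch.isascii() and ch.isalnum())


def escape(ch):
    """Rule 4.c: _xHHHH_ below U+10000, _xHHHHHHHH_ above U+FFFF."""
    cp = ord(ch)
    return "_x%04X_" % cp if cp < 0x10000 else "_x%08X_" % cp


def map_to_local_part(name, is_start=is_ascii_name_start,
                      is_char=is_ascii_name_char):
    # Rule 4.b: an unpaired surrogate is an error. Python strings are
    # sequences of code points, so any surrogate found here is unpaired.
    if any(0xD800 <= ord(c) <= 0xDFFF for c in name):
        raise ValueError("unpaired surrogate in application name")
    t = unicodedata.normalize("NFC", name)  # rule 2
    out = []
    start = 0
    if t[:3].lower() == "xml":  # rule 4.a
        out.append(escape(t[0]))
        out.append(t[1:3])
        start = 3
    for i in range(start, len(t)):
        ch = t[i]
        if not is_char(ch) or (i == 0 and not is_start(ch)):  # rule 4.c
            out.append(escape(ch))
        elif ch == "_" and t[i + 1:i + 2] in ("x", "X"):  # rule 4.d
            out.append("_x005F_")
        else:  # rule 4.e
            out.append(ch)
    return "".join(out)


print(map_to_local_part("Hello world"))  # Hello_x0020_world
print(map_to_local_part("xml"))          # _x0078_ml
```

The predicates are parameters so that the same loop can be reused with
whichever version of the XML character tables an implementation targets.
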

Examples:



Hello world -> Hello_x0020_world  // space not permitted
Hello_xorld -> Hello_x005F_xorld  // _x rule
Helloworld_ -> Helloworld_

          x -> x
        xml -> _x0078_ml   // starts with xml
       -xml -> _x002D_xml  // starts with hyphen-minus
       x-ml -> x-ml        // not the string xml

     Ælfred -> Ælfred
   άγνωστος -> άγνωστος
ᜉᜅᜎᜈ        -> _x1709__x1705__x170E__x1708_   // the Tagalog block is newer and not permitted in XML 1.0
ᏙᏚᎥ         -> _x13D9__x13DA__x13A5_  // The Cherokee block is newer and not permitted in XML 1.0

xml̀moo -> _x0078_ml_x0300_moo  // Starts with "xml". Note that combining
character U+0300 is considered as separate from the "l" in "xml".

Note to editor> The last example is the only one that I have changed
(added).




----------------


> -----Original Message-----
> From: Martin Gudgin [mailto:mgudgin@microsoft.com]
> Sent: Sunday, August 25, 2002 6:03 PM
> To: Addison Phillips [wM]; asirv@webmethods.com
> Cc: W3C Public Archive; Jean-Jacques Moreau; Marc Hadley; Nilo Mitra;
> Noah Mendelson; Henrik Frystyk Nielsen
> Subject: RE: Algorithm for mapping an application defined name to an XML
> name
>
>
> Addison,
>
> Thanks very much for your detailed comments. I've commented inline
>
> Martin
>
> > -----Original Message-----
> > From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
> > Sent: 24 August 2002 00:05
> > To: Martin Gudgin; asirv@webmethods.com
> > Cc: W3C Public Archive; Jean-Jacques Moreau; Marc Hadley;
> > Nilo Mitra; Noah Mendelson; Henrik Frystyk Nielsen
> > Subject: RE: Algorithm for mapping an application defined
> > name to an XML name
> >
> >
> > Hi Martin,
> >
> > Thanks for the note. It's been awhile since I thought about this.
>
> Sorry it's taken us so long to incorporate your feedback.
>
> >
> > My edits were done from the original proposal. Although I
> > modified the text to be more correct about various Unicode
> > issues, I didn't change the structure of the original at all.
> > (FWIW, I would have designed and written it differently. And
> > I hate standards that obfuscate what's going on as much as
> > this one does. It not being my document, I didn't rewrite it.
> > I just edited the text to be more correct.)
>
> If you have ( or have time to produce ) a more readable version, I'm
> sure the editorial team would be very grateful.
>
/* much more deleted... */

Received on Thursday, 12 December 2002 08:38:04 UTC