RE: Comments on WSDL 2.0 (Core, Adjuncts, Soap 1.1 Binding) from the i18n core wg

Hello Addison, Jonathan, Felix, others,

[Also copying and;
when replying, please reduce cross-posting if possible.]

Addison's description below is very much to the point,
but there are a few additions that I'd like to make.

At 03:31 05/11/02, Addison Phillips wrote:
 >Hi Jonathan/WSDL-WG,
 >This is a personal response. I hope that the WG can address this next week
 >and send you an official response.
 >The differences are actually quite simple. XLink 1.0 uses almost exactly
 >the same text as IRI Section 3.1 Step 2. It omits Step 1.
 >Step 1 consists of three things.
 >1a. covers when you transcribe an IRI into a digital form from a
 >non-digital form (e.g. when you type in the IRI you wrote down on a napkin
 >earlier). This can't apply to WSDL, of course, unless the WSDL starts to be
 >printed on the sides of buses or the backs of envelopes :-).
 >1b. covers conversion from a "legacy" (non-Unicode) encoding to a Unicode
 >encoding and requires normalizing the text using Unicode Normalization Form
 >C. Since WSDL is defining an XML document, which is defined as a sequence
 >of Unicode characters, this is really a consideration for the XML
 >processor, I believe, rather than something WSDL itself needs to address.

Yes, but then again, the XML processor doesn't necessarily know
which (parts of) elements or attributes will be interpreted as
IRIs. It turns out that in a well-layered architecture, it is
difficult to actually do the Unicode normalization correctly.
We have already received implementer feedback from the CSS WG
that the normalization in 1b. (which is currently a MUST) is
difficult to implement. As a consequence, there is some chance
that this gets revisited in a future version of the IRI spec,
where that normalization requirement may be downgraded to a
SHOULD or MAY or removed altogether. Any feedback on this,
in particular from implementers, is welcome at
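
As a rough illustration of what step 1b. asks for, here is a minimal
Python sketch. The function names and the example.org strings are
assumptions made for illustration only; they do not come from the IRI
spec. Note that the caller has to know which string is an IRI before it
can apply this, which is exactly the layering problem described above.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalize already-decoded text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def from_legacy(raw: bytes, encoding: str) -> str:
    """Decode bytes from a legacy (non-Unicode) encoding, then apply NFC.

    This mirrors the order described for step 1b.: convert first,
    normalize second.
    """
    return to_nfc(raw.decode(encoding))


# "e" followed by U+0301 COMBINING ACUTE ACCENT versus precomposed U+00E9.
decomposed = "http://example.org/re\u0301sume\u0301"
precomposed = "http://example.org/r\u00e9sum\u00e9"

print(decomposed == precomposed)          # False
print(to_nfc(decomposed) == precomposed)  # True
```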

 >1c. is basically a no-op: it covers normalization of IRIs already in a
 >Unicode encoding (you don't normalize if the IRI is already in Unicode).
 >So the difference is the application of (1b) when the document is encoded
 >using a non-Unicode encoding, something that probably doesn't apply to WSDL
 >directly anyway: it is something that happens at the XML processor level,
 >XML documents being a sequence of Unicode characters...
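
For readers who do not have the texts at hand, the part that XLink 1.0
and IRI Section 3.1 Step 2 share can be sketched roughly as follows.
This is a simplified illustration, not the normative algorithm: the
function name is hypothetical, and it treats every code point above
U+007F as needing escaping, whereas the IRI spec lists specific
character ranges.

```python
def iri_to_uri_step2(iri: str) -> str:
    """Percent-encode non-ASCII characters via their UTF-8 octets.

    Simplification (assumption): every code point above U+007F is
    escaped. ASCII characters, including existing %XX escapes, are
    passed through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) > 0x7F:
            # Convert the character to UTF-8, then escape each octet.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(iri_to_uri_step2("http://example.org/r\u00e9sum\u00e9"))
# http://example.org/r%C3%A9sum%C3%A9
```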

There is also another difference between IRIs and anyURIs as defined
by XML Schema (by reference to XLink). It is the fact that anyURI
allows the space character and a few other US-ASCII characters that
are not allowed in IRIs. The reason for this is that they were
allowed in an earlier draft of the IRI spec, on which XLink and
XML Schema were based, but that later on there was strong feedback
from the IETF that these characters should be disallowed.

Because of this history, the IRI spec still contains a 'backdoor'
paragraph in Section 3.1 that reads:

    Systems accepting IRIs MAY also deal with the printable characters in
    US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
    "{", "}", "|", "\", "^", and "`", in step 2 above.  If these
    characters are found but are not converted, then the conversion
    SHOULD fail.  Please note that the number sign ("#"), the percent
    sign ("%"), and the square bracket characters ("[", "]") are not part
    of the above list and MUST NOT be converted.  Protocols and formats
    that have used earlier definitions of IRIs including these characters
    MAY require percent-encoding of these characters as a preprocessing
    step to extract the actual IRI from a given field.  This
    preprocessing MAY also be used by applications allowing the user to
    enter an IRI.
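
The preprocessing step that this paragraph permits could look roughly
like the Python sketch below. The names EXTRA_ASCII and
escape_extra_ascii are hypothetical; the character list is taken
directly from the paragraph quoted above.

```python
# The printable US-ASCII characters named in the paragraph above:
# "<", ">", '"', space, "{", "}", "|", "\", "^", and "`".
EXTRA_ASCII = '<>" {}|\\^`'


def escape_extra_ascii(value: str) -> str:
    """Percent-encode only the characters listed in EXTRA_ASCII.

    "#", "%", "[" and "]" are deliberately left alone, since the
    paragraph says they MUST NOT be converted.
    """
    return "".join(
        f"%{ord(ch):02X}" if ch in EXTRA_ASCII else ch
        for ch in value
    )


print(escape_extra_ascii("http://example.org/a b#frag[1]%41"))
# http://example.org/a%20b#frag[1]%41
```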

For WSDL, this gives the following choices:
a) Allow these characters, and specify the additional escaping as above.
b) Disallow these characters. This can easily be done with a pattern.
c) Allow some of the above characters but not all (XLink 1.1 allows
    the space for the xlink:href attribute, but not other characters,
    although '^' is used in xpointers).
d) Choose a mixture of the above (XLink does not allow any of the
    above characters in some other fields that are defined as IRIs,
    such as role and arcrole).

For WSDL, I think b) is the best choice, but there may be some
feedback from implementers and users.
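
To give an idea of how small such a pattern for b) is, here is a sketch
of the check as a Python regular expression. This is only an analogue
for illustration: the names FORBIDDEN and is_allowed are hypothetical,
and the exact form a schema pattern would take is not worked out here.

```python
import re

# Matches any of the ten characters from the Section 3.1 paragraph:
# "<", ">", '"', space, "{", "}", "|", "\", "^", and "`".
FORBIDDEN = re.compile(r'[<>" {}|\\^`]')


def is_allowed(value: str) -> bool:
    """Return True if the value contains none of the ten characters."""
    return FORBIDDEN.search(value) is None


print(is_allowed("http://example.org/a%20b#x"))  # True
print(is_allowed("http://example.org/a b"))      # False
```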

Regards,    Martin.

 >In any case, I agree with Martin. I would suggest text more like the
 >following instead:
 >Note: The xs:anyURI type is defined so that xs:anyURI values are
 >essentially IRIs [RFC 3987]. The conversion from xs:anyURI values to an
 >actual URI is via an escaping procedure defined by [XLink 1.0], which is
 >identical in most respects to IRI Section 3.1. (The only difference being
 >that IRI defines handling of non-Unicode encoded byte sequences,
 >considerations which do not affect this document directly.)
 >Best Regards,
 >Addison P. Phillips
 >Globalization Architect, Quest Software
 >Chair, W3C Internationalization Core Working Group
 >Internationalization is not a feature.
 >It is an architecture.
 >> -----Original Message-----
 >> From: [mailto:public-i18n-core-
 >>] On Behalf Of Jonathan Marsh
 >> Sent: November 1, 2005 9:59
 >> To: Martin Duerst; Felix Sasaki;
 >> Cc:
 >> Subject: RE: Comments on WSDL 2.0 (Core, Adjuncts, Soap 1.1 Binding) from
 >> the i18n core wg
 >> Do you have the differences at your fingertips or will I have to do my
 >> own homework? :-)  And, which do you prefer, that we list diffs or stay
 >> quiet? I expect the WG to adopt the I18N suggestions without much
 >> dissent so having a clear position from the experts is valuable.
 >> -----Original Message-----
 >> From: Martin Duerst []
 >> Sent: Sunday, October 30, 2005 1:08 AM
 >> To: Felix Sasaki; Jonathan Marsh;
 >> Cc:
 >> Subject: Re: Comments on WSDL 2.0 (Core, Adjuncts, Soap 1.1 Binding)
 >> from the i18n core wg
 >> Same comment here as for XLink 1.1: I think it's not a good idea to
 >> use the text below (provided by Felix) as such, because it easily
 >> may give the impression that there are serious differences when
 >> the chance of differences is actually very small. So I think it's
 >> better to either list the differences or not say anything.
 >> Regards,   Martin.
 >> At 12:41 05/10/26, Felix Sasaki wrote:
 >>  >
 >>  >On Wed, 26 Oct 2005 06:32:11 +0900, Jonathan Marsh
 >> <>
 >>  >wrote:
 >>  >
 >>  >>
 >>  >> The WG had a hard time understanding your comment 3:
 >>  >>
 >>  >> "It would be good if you could mention that although xs:anyURI
 >> allows
 >>  >> for IRIs (see LC74a), the mapping from IRI to URI in xs:anyURI is
 >>  >> currently not defined in terms of IRI. This comment relates also for
 >>  >> example to the reference of xs:anyURI in sec. and sec.
 >>  >> and to the Adjuncts specification."
 >>  >>
 >>  >> Can you provide us with more background, or perhaps precise wording
 >> for
 >>  >> what you'd like to see?
 >>  >
 >>  >
 >>  >Sorry for being unclear. The problem is as follows, and this is also a
 >>  >proposal for some text which you might integrate as a note in WSDL
 >> 2.0:
 >>  >
 >>  >xs:anyURI defines a mapping from xs:anyURI values to URIs via a URI
 >>  >reference escaping procedure. In the current version of XML Schema 2,
 >> this
 >>  >procedure is defined in terms of XLink 1.0, and does not rely on the
 >>  >escaping procedure from RFC 3987 (IRI, sec. 3.1). Hence, relying on
 >>  >xs:anyURI might generate escaped URIs which are different from IRI
 >> based
 >>  >escaped URIs.
 >>  >
 >>  >Is that o.k. with you?
 >>  >
 >>  >Best regards,
 >>  >
 >>  >Felix
 >>  >
 >>  >>
 >>  >> -----Original Message-----
 >>  >> From:
 >>  >> [] On Behalf Of Felix
 >>  >> Sasaki
 >>  >> Sent: Saturday, October 08, 2005 8:54 PM
 >>  >> To:
 >>  >> Cc:
 >>  >> Subject: Comments on WSDL 2.0 (Core, Adjuncts, Soap 1.1 Binding)
 >> from
 >>  >> the i18n core wg
 >>  >>
 >>  >>
 >>  >> Dear Web Services Description Working Group,
 >>  >>
 >>  >> With this mail I am sending you i18n comments [1] on the WSDL 2.0
 >> WDs
 >>  >> (Core, Adjuncts, Soap 1.1 Binding). Since I am rather late (please
 >>  >> accept
 >>  >> my apologies), there was no time to get endorsement from the i18n
 >> core
 >>  >>
 >>  >> wg. So please regard these comments currently as my personal
 >> comments.
 >>  >>
 >>  >> I am looking forward to your feedback. Best regards,
 >>  >>
 >>  >> Felix Sasaki (team contact of the i18n core wg)
 >>  >>
 >>  >> [1]
 >>  >>
 >>  >>
 >>  >>
 >>  >
 >>  >
 >>  >

Received on Friday, 4 November 2005 07:07:34 UTC