
I18N Comments, WSDL 2.0 Part I (partial)

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Fri, 5 Nov 2004 10:28:10 -0800
To: <public-ws-desc-comments@w3.org>
Cc: <public-i18n-ws@w3.org>, <w3c-i18n-ig@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHMEDAIMAA.aphillips@webmethods.com>

Dear WSD WG,

Following is the first part of our comments on WSDL 2.0. These comments apply to Part I, "Core Language". We realize that these comments are late, but hope that you will consider them carefully. Internationalization-related comments are presented first. Editorial and other comments are presented at the end.

[I18N Comments]

1. Section 2.1. In the following quote, the reference to URI should explicitly include IRIs (as the type xs:anyURI allows for these). In fact, there should be some care taken to clarify that URIs in this document mean IRIs, if possible:

Note that it is RECOMMENDED that the value of the targetNamespace attribute information item SHOULD be a dereferencible URI and that it resolve to a WSDL document which provides service description information for that namespace.

2. Section 2.1. Name uniqueness. While QName's definition provides for uniqueness in an internationalized way, it may be useful to reference what makes a name unique in this document. In particular, there is a gap surrounding QName: although it is based on NCName and thus is "include normalized" according to the rules in CharMod:Normalization, there is no requirement that the name itself be in a normalized form (i.e. Unicode Normalization Form C, which is recommended but not required by XML 1.0/1.1). Some consideration should be given to how QNames are matched. This is a low-priority comment, since other groups are struggling with this issue (unsuccessfully).

Each WSDL or type system component MUST be uniquely identified by its qualified name.
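
To illustrate the gap, here is a minimal sketch in Python. The namespace, the sample name, and the qname_key helper are placeholders invented for this example; nothing in WSDL 2.0 defines such a comparison key. It shows two spellings of the same visible name that differ code point for code point and only compare equal after normalization to Form C:

```python
import unicodedata

# Two spellings of the same visible local name (hypothetical example).
precomposed = "caf\u00e9"   # ends in U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # ends in "e" + U+0301 COMBINING ACUTE ACCENT

def qname_key(namespace, local_name):
    """Hypothetical comparison key: namespace name plus NFC-normalized local part."""
    return (namespace, unicodedata.normalize("NFC", local_name))

ns = "http://example.org/wsdl"  # placeholder namespace, not a real WSDL namespace

# Plain code point comparison treats the two names as different components.
raw_equal = (ns, precomposed) == (ns, decomposed)

# Comparison after normalization treats them as the same component.
normalized_equal = qname_key(ns, precomposed) == qname_key(ns, decomposed)

print(raw_equal, normalized_equal)
```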

3. Section 2.1.2, Section 5.0. The <documentation> element presents a number of internationalization concerns to us. Here is some of the text of Section 5:

WSDL uses the optional documentation element information item as a container for human readable and/or machine processable documentation. The content of the element information item is arbitrary character information items and element information items ("mixed" content in XML Schema[XML Schema: Structures]). The documentation element information item is allowed inside any WSDL element information item.

The problem here is that the documentation is supposed to describe, either in plain-text or in markup, aspects of the WSDL for documentation purposes. Human readable text has two problems here. First, it needs to be tagged with the natural language (by referencing xml:lang, xsi:language, or by directly referencing RFC 3066 or its successors) and second, it may need to be repeated in different languages (which may be but are not necessarily translations).

We suggest that:

a) The <documentation> element require an xml:lang attribute. The attribute may be empty (xml:lang="")
b) The <documentation> element be allowed to be repeated, provided the xml:lang attributes in each of the elements be unique.
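
As a rough sketch of what suggestions (a) and (b) would mean for a processor, the Python below checks the <documentation> children of one element. The sample document, its namespace, and the check_documentation_langs function are assumptions made up for this illustration and are not taken from the WSDL 2.0 draft. Language tags are compared case-insensitively, as RFC 3066 tags are.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; the namespace is a placeholder.
SAMPLE = """<description xmlns="http://example.org/wsdl">
  <documentation xml:lang="en">Returns the current price.</documentation>
  <documentation xml:lang="fr">Renvoie le prix actuel.</documentation>
  <documentation xml:lang="">getPrice(sku)</documentation>
</description>"""

def check_documentation_langs(parent):
    """Return a list of problems for the direct <documentation> children of parent."""
    problems = []
    seen = set()
    for child in parent:
        if not child.tag.endswith("}documentation"):
            continue
        lang = child.get(XML_LANG)
        if lang is None:
            # Suggestion (a): the attribute is required, though it may be empty.
            problems.append("missing xml:lang")
        elif lang.lower() in seen:
            # Suggestion (b): each repeated element needs a distinct language.
            problems.append("duplicate xml:lang: " + repr(lang))
        else:
            seen.add(lang.lower())
    return problems

problems = check_documentation_langs(ET.fromstring(SAMPLE))
print(problems)
```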

You may wish to reference the <documentation> element (under annotations) in XMLSchema, although it is not as clear about the above as we would probably like :-).

4. Section 2.4.2. RPC Style. There is a requirement on the local part of the output element name that says:

The LocalPart of the output element's QName is obtained by concatenating the name of the operation and the string value "Response".

We understand that this is historically the way that it has been done and that some implementations, at least, rely on this. However, while it seems reasonable at first glance that, say, for operation "foobar", the response is called "foobarResponse", it may be less helpful in cases where the original operation name is non-ASCII in nature. It isn't clear why the name needs to be quite so determinate (and thus a concatenated construct) in the first place. Is there some reason why the return message needs a name based on the request message's plus some English ("Programmer-ese") token? Since in the RPC style the "out" message is presumably the response, can this requirement be relaxed?

5. Section 2.15. Simple Types. This section gave us a great deal of concern. In this section WSDL defines seven simple types used in the component model of WSDL 2.0. These types are: string, Token, NCName, anyURI, QName, boolean and int. The argument presented in this section is that these needed to be redefined because "the types defined here go beyond the capabilities of XML Schema to describe."

We are not sure why you consider this to be the case (our suspicion is that it is to ensure XML 1.1 compatibility). However, the definitions presented here are much less mature than those in XML Schema for internationalization purposes. We would strongly urge you to reconsider and use the XML Schema definitions directly. If there is a good reason not to use XML Schema directly, then we urge you to import, fully, the definitions in XML Schema for each of these types. A cursory summary of our issues with the types you define follows:

5a. string. The definition includes all code points between U+0000 and U+10FFFF. It doesn't deal with illegal characters in XML, such as surrogates, unassigned, or non-characters (like U+FFFF or U+10FFFF). XML 1.0 and XML 1.1 define various productions that can be used to avoid this problem, but we don't see why you don't just use the definition found in http://www.w3.org/TR/xmlschema-2/#string
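
For reference, the following Python sketch encodes the Char production of XML 1.0 alongside a separate test for Unicode noncharacters. The function names are ours and purely illustrative. The Char production by itself excludes U+0000, the surrogate range, U+FFFE and U+FFFF, while noncharacters such as U+10FFFF need the separate check.

```python
def is_xml10_char(cp):
    """True if code point cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_noncharacter(cp):
    """True for Unicode noncharacters: U+FDD0..U+FDEF and any U+xxFFFE or U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# A range of U+0000..U+10FFFF with no further restriction admits all of these.
for cp in (0x0000, 0xD800, 0xFFFF, 0x10FFFF):
    print("U+%04X" % cp, is_xml10_char(cp), is_noncharacter(cp))
```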

5b. Token. This definition is similar to the one in XML Schema, but leaves out the prohibition on character #0xD. It is not usefully different than the one in XML Schema.

QName, NCName. The NCName and QName definitions say more-or-less what they are, but the productions cited in XML Schema (Namespaces in XML, http://www.w3.org/TR/1999/REC-xml-names-19990114/) should be explicitly cited.

5c. anyURI. This implicitly disallows IRIs. You should include the text from the second and subsequent paragraphs in XML Schema's definition. In particular, anyURI in XML Schema represents the *unescaped* sequence (it is, effectively, an IRI).
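
A small Python sketch of the distinction, using a made-up path: the unescaped form is what an anyURI value would carry, and the percent-escaped form is what results when non-ASCII characters are encoded as UTF-8 and escaped for use as a URI. Only the path portion is shown; host names are handled differently and are outside this example.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a non-ASCII character (U+00E9).
iri_path = "/r\u00e9sum\u00e9"

# Escaping: encode as UTF-8, then percent-escape each non-ASCII byte.
uri_path = quote(iri_path, safe="/")

# Unescaping recovers the original sequence.
round_trip = unquote(uri_path)

print(uri_path)
```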

5d. int. This is problematic on two fronts. First, it is different from the "int" type in XML Schema (it is very similar to the "integer" type: "int" in XML Schema is derived from "long", which is derived from "integer" and has a maximum and minimum value corresponding to an integer type of a specific size). Second, you don't define the lexical representation, which may present problems for internationalization. One presumes that the lexical description is the same...
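
To make both points concrete, here is a hedged Python sketch of a parser for the XML Schema "int" type. The parse_xsd_int name is hypothetical, and whitespace collapsing is deliberately left out. It enforces the 32-bit value range and an ASCII-only lexical form; by contrast, Python's built-in int() accepts other Unicode decimal digits, which is the kind of ambiguity an undefined lexical representation leaves open.

```python
import re

# Value range of xs:int (a signed 32-bit integer).
INT_MIN, INT_MAX = -2147483648, 2147483647

# Lexical form: optional sign followed by ASCII decimal digits only.
_LEXICAL = re.compile(r"[+-]?[0-9]+")

def parse_xsd_int(text):
    """Parse text as xs:int, rejecting non-ASCII digits and out-of-range values."""
    if not _LEXICAL.fullmatch(text):
        raise ValueError("not a valid xs:int lexical form: %r" % text)
    value = int(text)
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError("out of range for xs:int: %r" % text)
    return value

arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

print(parse_xsd_int("-42"))
print(int(arabic_indic))  # the permissive built-in accepts these digits
```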

[Non-I18N Comments]

6. Section 1.1. (non-i18n) The various terms defined in this section should be made into glossary references, since these are the Ur-definitions used in WSDL.

7. Section 2.0 (editorial) Capitalization of 'schema' in last paragraph of this section.

Thank you for considering our comments. We may have (and intend to have) some additional comments on these documents within a week.

Best Regards,

Addison (For I18N WG and the Web Services I18N Task Force),

Addison P. Phillips
Director, Globalization Architecture

Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force

Internationalization is an architecture. 
It is not a feature.
Received on Friday, 5 November 2004 18:28:59 UTC
