Action: LC74e (I18N Comments) from Roberto Chinnici on 2004-11-17 (www-ws-desc@w3.org from November 2004)

From: Roberto Chinnici <Roberto.Chinnici@Sun.COM>
Date: Wed, 17 Nov 2004 11:21:09 -0800
To: WS Description List <www-ws-desc@w3.org>
Cc: aphillips@webmethods.com
Message-id: <419BA4A5.9040205@sun.com>
This message is in fulfilment of my action item around LC74e [1].

From the action item text [1]:

> 5. Section 2.15. Simple Types. This section gave us a great deal of concern.
> In this section WSDL defines seven simple types used in the component model of WSDL 2.0.
> These types are: string, Token, NCName, anyURI, QName, boolean and int. The argument
> presented in this section is that these needed to be redefined because "the types defined here
> go beyond the capabilities of XML Schema to describe."
> 
> We are not sure why you consider this to be the case (our suspicion is that it is to ensure
> XML 1.1 compatibility).

That's indeed the goal.

> However, the definitions presented here are much less mature than
> those in XML Schema for internationalization purposes. We would strongly urge you to reconsider
> and use the XML Schema definitions directly. If there is a good reason not to use XML Schema
> directly, then we urge you to import, fully, the definitions in XML Schema for each of these
> types.

Using XML Schema definitions directly would preclude supporting XML 1.1. This said, if the
alternative we explored has negative consequences on internationalization, I'd be in favor
of revisiting the whole XML 1.1 support issue.

> A cursory review of our issues with the types you define follows:
> 
> 5a. string. The definition includes all code points between U+0000 and U+10FFFF. It doesn't
> deal with illegal characters in XML, such as surrogates, unassigned, or non-characters (like
> U+FFFF or U+10FFFF). XML 1.0 and XML 1.1 define various productions that can be used to avoid
> this problem, but we don't see why you don't just use the definition found in
> http://www.w3.org/TR/xmlschema-2/#string

The definition of xs:string says:

  [Definition:]  The string datatype represents character strings in XML. The ·value space·
  of string is the set of finite-length sequences of characters (as defined in [XML 1.0
  (Second Edition)]) that ·match· the Char production from [XML 1.0 (Second Edition)].

It normatively references XML 1.0 and in particular its Char production. In XML 1.1 this
production has been extended, so that there are valid finite character sequences in XML 1.1
which cannot be modeled as xs:string(s). Hence the need for wsdls:string.
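
To make the difference concrete, here is a minimal sketch in Python of the two Char
productions. The function names are mine, purely for illustration; the ranges are the
ones given for Char in XML 1.0 and XML 1.1 respectively.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0 (the one xs:string is tied to)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1 (adds most C0 control characters)."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def fits_xs_string(s: str) -> bool:
    # Hypothetical helper: every character matches the XML 1.0 Char production.
    return all(is_xml10_char(ord(c)) for c in s)


def fits_xml11_string(s: str) -> bool:
    # Hypothetical helper: every character matches the XML 1.1 Char production.
    return all(is_xml11_char(ord(c)) for c in s)


sample = "bell\x07"  # U+0007 is a Char in XML 1.1 but not in XML 1.0
print(fits_xs_string(sample), fits_xml11_string(sample))  # False True
```

Note that neither production admits U+0000 or the surrogate range, and both stop at
U+FFFD in the BMP, so U+FFFE and U+FFFF are excluded either way.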
 
> 5b. Token. This definition is similar to the one in XML Schema, but leaves out the prohibition
> on character #0xD. It is not usefully different than the one in XML Schema.

The definition should be amended to exclude #0xD. It is different from the one in XML Schema
because it restricts wsdls:string, not xs:string. Thus there are valid wsdls:Token values
that are not valid xs:token(s) (see previous point).
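
As a rough sketch of what the amended definition would check (in Python, with a
function name of my own choosing), the whitespace constraints would be the same as
for xs:token: no tab, line feed or carriage return, no leading or trailing space,
and no run of two or more spaces. The check on the character repertoire itself is
assumed to be done separately, against wsdls:string.

```python
def satisfies_token_whitespace_rules(s: str) -> bool:
    """Whitespace constraints of xs:token, including the #xD exclusion.

    Illustrative only: this does not check which characters are allowed,
    only how whitespace may appear.
    """
    if any(c in "\t\n\r" for c in s):  # #x9, #xA and #xD are all excluded
        return False
    if s.startswith(" ") or s.endswith(" "):
        return False
    return "  " not in s  # no internal run of two or more spaces


print(satisfies_token_whitespace_rules("a b"))    # True
print(satisfies_token_whitespace_rules("a\rb"))   # False
```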
 
> QName, NCName. The NCName and QName definitions say more-or-less what they are, but the productions
> cited in XML Schema (Namespaces in XML, http://www.w3.org/TR/1999/REC-xml-names-19990114/) should
> be explicitly cited.

Actually the wsdls:NCName is closer to the definition of NCName in [2], which in turn refers
to the definition of NameChar in [3]. It's an attempt to define a minimally constrained non-colonized
name type. As such, it contains many more values than xs:NCName. The same applies to wsdls:QName,
since it builds on wsdls:NCName.
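
For illustration, the NCName production of [2] can be sketched in Python as a
regular expression. This encodes the NameStartChar and NameChar ranges of XML 1.1
minus the colon; it is not the text of the WSDL specification, and the names
NCNAME_11 and is_ncname_11 are hypothetical.

```python
import re

# NameStartChar from XML 1.1, with ":" removed (NCNameStartChar).
_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, U+00B7 and two combining/connector ranges.
_REST = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NCNAME_11 = re.compile(f"[{_START}][{_REST}]*")


def is_ncname_11(s: str) -> bool:
    """True if s matches NCName as defined by Namespaces in XML 1.1."""
    return NCNAME_11.fullmatch(s) is not None


print(is_ncname_11("portType"))      # True
print(is_ncname_11("tns:portType"))  # False
```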
 
> 5c. anyURI. This implicitly disallows IRIs. You should include the text from the second and
> subsequent paragraphs in XML Schema's definition. In particular, anyURI in XML Schema represents
> the *unescaped* sequence (it is, effectively, an IRI).

You're correct. The definition of wsdls:anyURI should be amended in the way you describe.

> 5d. int. This is problematic on two fronts. First, it is different from the "int" type in XML
> Schema (it is very similar to the "integer" type: "int" in XML Schema is derived from "long",
> which is derived from "integer" and has a maximum and minimum value corresponding to an integer
> type of a specific size). Second, you don't define the lexical representation, which may
> present problems for internationalization. One presumes that the lexical description is the same...

The definition of wsdls:int given in the spec is unnecessarily wide. Its only use in the
specification is as the value of the {http error status code} property, for which an xs:int
would be sufficient. So I propose to amend the specification to redefine wsdls:int as being
an alias for xs:int. In particular, its value space would be from -2147483648 to 2147483647
inclusive.
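
On the lexical side, xs:int allows an optional sign followed by the ASCII digits
0-9 only. A small Python sketch of the aliased type, assuming the input has already
been whitespace-normalized (parse_xs_int is just an illustrative name):

```python
import re

# Optional sign, then one or more ASCII digits. [0-9] is used rather than \d
# because \d also matches non-ASCII decimal digits in Python.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")

XS_INT_MIN = -2147483648
XS_INT_MAX = 2147483647


def parse_xs_int(lexical: str) -> int:
    """Parse a lexical form into the xs:int value space, or raise ValueError."""
    if _INT_LEXICAL.fullmatch(lexical) is None:
        raise ValueError(f"not a valid xs:int lexical form: {lexical!r}")
    value = int(lexical)
    if not XS_INT_MIN <= value <= XS_INT_MAX:
        raise ValueError(f"outside the xs:int value space: {value}")
    return value


print(parse_xs_int("404"))  # 404
```

A string such as "2147483648", or one written with non-ASCII digits, is rejected
by this check.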

[1] http://www.w3.org/2002/ws/desc/4/lc-issues/#LC74e
[2] http://www.w3.org/TR/xml-names11/#ns-qualnames
[3] http://www.w3.org/TR/xml-names11/#NT-NCNameChar

Roberto

-- 
Roberto Chinnici
Java Web Services
Sun Microsystems, Inc.
roberto.chinnici@sun.com
Received on Wednesday, 17 November 2004 19:17:04 UTC