RE: Action: LC74e (I18N Comments) from Addison Phillips [wM] on 2004-11-18 (www-ws-desc@w3.org from November 2004)

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Thu, 18 Nov 2004 09:40:22 -0800
To: "Roberto Chinnici" <Roberto.Chinnici@Sun.COM>, "WS Description List" <www-ws-desc@w3.org>
Cc: <public-i18n-ws@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHEECJINAA.aphillips@webmethods.com>
Hi Roberto,

Thanks for the note. I am preparing a detailed response, but Asir pinged me to send you a "placeholder" for now. The I18N WG has some problems with your proposed solution. We recognize the problems that XML Schema and XML 1.0 have because the character productions in XML are tied to a specific version of Unicode, and we understand that you are trying to avoid these issues. However, your specific text is insufficient to address all of the I18N issues. I should note that these problems did not originate with the WSD WG. I think we should arrange a time to discuss this topic.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Working Group
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: Roberto Chinnici [mailto:Roberto.Chinnici@Sun.COM]
> Sent: 17 November 2004 11:21
> To: WS Description List
> Cc: aphillips@webmethods.com
> Subject: Action: LC74e (I18N Comments)
> 
> 
> This message is in fulfilment of my action item around LC74e [1].
> 
> From the action item text [1]:
> 
> > 5. Section 2.15. Simple Types. This section gave us a great deal of concern.
> > In this section WSDL defines seven simple types used in the component model of WSDL 2.0.
> > These types are: string, Token, NCName, anyURI, QName, boolean and int. The argument
> > presented in this section is that these needed to be redefined because "the types defined here
> > go beyond the capabilities of XML Schema to describe."
> > 
> > We are not sure why you consider this to be the case (our suspicion is that it is to ensure
> > XML 1.1 compatibility).
> 
> That's indeed the goal.
> 
> > However, the definitions presented here are much less mature than
> > those in XML Schema for internationalization purposes. We would strongly urge you to reconsider
> > and use the XML Schema definitions directly. If there is a good reason not to use XML Schema
> > directly, then we urge you to import, fully, the definitions in XML Schema for each of these
> > types.
> 
> Using XML Schema definitions directly would preclude supporting XML 1.1. That said, if the
> alternative we explored has negative consequences for internationalization, I'd be in favor
> of revisiting the whole XML 1.1 support issue.
> 
> > A cursory review of our issues with the types you define follows:
> > 
> > 5a. string. The definition includes all code points between U+0000 and U+10FFFF. It doesn't
> > deal with illegal characters in XML, such as surrogates, unassigned, or non-characters (like
> > U+FFFF or U+10FFFF). XML 1.0 and XML 1.1 define various productions that can be used to avoid
> > this problem, but we don't see why you don't just use the definition found in
> > http://www.w3.org/TR/xmlschema-2/#string
> 
> The definition of xs:string says:
> 
>   [Definition:]  The string datatype represents character strings in XML. The ·value space·
>   of string is the set of finite-length sequences of characters (as defined in [XML 1.0
>   (Second Edition)]) that ·match· the Char production from [XML 1.0 (Second Edition)].
> 
> It normatively references XML 1.0 and in particular its Char production. In XML 1.1 this
> production has been extended, so that there are valid finite character sequences in XML 1.1
> which cannot be modeled as xs:string(s). Hence the need for wsdls:string.
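
A minimal sketch of the gap described above, assuming the Char productions as published in XML 1.0 and XML 1.1. The function names are hypothetical and appear in neither specification:

```python
from typing import List


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.1."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def only_in_xml11(text: str) -> List[str]:
    """List the code points in text that XML 1.1 accepts but XML 1.0 rejects."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if is_xml11_char(ord(ch)) and not is_xml10_char(ord(ch))
    ]


# The C0 controls other than tab, LF and CR are the characters that
# XML 1.1 adds to Char (usable there only as character references).
print(only_in_xml11("a\x01b\x1f"))
```

Any string containing one of the code points reported by only_in_xml11 is a sequence of XML 1.1 characters that falls outside the value space of xs:string.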
>  
> > 5b. Token. This definition is similar to the one in XML Schema, but leaves out the prohibition
> > on character #0xD. It is not usefully different than the one in XML Schema.
> 
> The definition should be amended to exclude #0xD. It is different from the one in XML Schema
> because it restricts wsdls:string, not xs:string. Thus there are valid wsdls:Token values
> that are not valid xs:token(s) (see previous point).
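
A rough sketch of the whitespace constraints involved, assuming the rules XML Schema states for xs:token (no tab, line feed or carriage return, no leading or trailing space, no run of two or more spaces). The function name is hypothetical, and the check deliberately says nothing about which characters the underlying string type admits:

```python
def is_token_like(value: str) -> bool:
    """Check the whitespace constraints that XML Schema places on xs:token."""
    # Tab (#x9), line feed (#xA) and carriage return (#xD) are all excluded.
    if any(ch in value for ch in ("\t", "\n", "\r")):
        return False
    # No leading or trailing spaces (#x20).
    if value != value.strip(" "):
        return False
    # No internal sequences of two or more spaces.
    return "  " not in value


print(is_token_like("http error status code"))
print(is_token_like("status\rcode"))
```

The carriage-return case is the one that the amendment proposed above would add to the wsdls:Token definition.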
>  
> > QName, NCName. The NCName and QName definitions say more-or-less what they are, but the productions
> > cited in XML Schema (Namespaces in XML, http://www.w3.org/TR/1999/REC-xml-names-19990114/) should
> > be explicitly cited.
> 
> Actually the wsdls:NCName is closer to the definition of NCName in [2], which in turn refers
> to the definition of NameChar in [3]. It's an attempt to define a minimally constrained non-colonized
> name type. As such, it contains many more values than xs:NCName. The same applies to wsdls:QName,
> since it builds on wsdls:NCName.
>  
> > 5c. anyURI. This implicitly disallows IRIs. You should include the text from the second and
> > subsequent paragraphs in XML Schema's definition. In particular, anyURI in XML Schema represents
> > the *unescaped* sequence (it is, effectively, an IRI).
> 
> You're correct. The definition of wsdls:anyURI should be amended in the way you describe.
> 
> > 5d. int. This is problematic on two fronts. First, it is different from the "int" type in XML
> > Schema (it is very similar to the "integer" type: "int" in XML Schema is derived from "long",
> > which is derived from "integer" and has a maximum and minimum value corresponding to an integer
> > type of a specific size). Second, you don't define the lexical representation, which may
> > present problems for internationalization. One presumes that the lexical description is the same...
> 
> The definition of wsdls:int given in the spec is unnecessarily wide. Its only use in the
> specification is as the value of the {http error status code} property, for which an xs:int
> would be sufficient. So I propose to amend the specification to redefine wsdls:int as being
> an alias for xs:int. In particular, its value space would be from -2147483648 to 2147483647
> inclusive.
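
A minimal sketch of what aliasing xs:int would imply for a validator, assuming the xs:int lexical form of an optional sign followed by ASCII decimal digits and ignoring whitespace normalization. The names parse_xs_int, XS_INT_MIN and XS_INT_MAX are hypothetical:

```python
import re

# Value space of xs:int as quoted in the proposal above.
XS_INT_MIN = -2**31      # -2147483648
XS_INT_MAX = 2**31 - 1   #  2147483647

# Assumed lexical form: optional sign, then ASCII digits only.
# [0-9] is used instead of \d, which would also match non-ASCII digits.
_LEXICAL = re.compile(r"[+-]?[0-9]+")


def parse_xs_int(lexical: str) -> int:
    """Parse a string as an xs:int, rejecting bad lexical forms and out-of-range values."""
    if not _LEXICAL.fullmatch(lexical):
        raise ValueError(f"not a valid xs:int lexical form: {lexical!r}")
    value = int(lexical)
    if not XS_INT_MIN <= value <= XS_INT_MAX:
        raise ValueError(f"outside the xs:int value space: {value}")
    return value


print(parse_xs_int("404"))
```

Pinning the lexical form to ASCII digits matters for the internationalization concern raised in 5d: Python's int() alone would also accept digits from other scripts, so the explicit pattern is what keeps the accepted spellings aligned with the assumed lexical space.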

> [1] http://www.w3.org/2002/ws/desc/4/lc-issues/#LC74e
> [2] http://www.w3.org/TR/xml-names11/#ns-qualnames
> [3] http://www.w3.org/TR/xml-names11/#NT-NCNameChar
> 
> Roberto
> 
> -- 
> Roberto Chinnici
> Java Web Services
> Sun Microsystems, Inc.
> roberto.chinnici@sun.com
Received on Thursday, 18 November 2004 17:42:01 UTC